Internet Engineering Task Force    K. Nichols / V. Jacobson / L. Zhang
INTERNET-DRAFT                                               Nov, 1997
draft-nichols-diff-svc-arch-00.txt                       Expires: 5/98

A Two-bit Differentiated Services Architecture for the Internet

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress".

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Abstract

This document presents a differentiated services architecture for
the internet. Dave Clark and Van Jacobson each presented work
on differentiated services at the Munich IETF meeting [2,3].
Each explained how to use one bit of the IP header to deliver a
new kind of service to packets in the internet. These were two very
different kinds of service with quite different policy assumptions.
Ensuing discussion has convinced us that both service types
have merit and that both service types can be implemented with
a set of very similar mechanisms. We propose an architectural
framework that permits the use of both of these service types
and exploits their similarities in forwarding path mechanisms.
The major goals of this architecture are each shared with one
or both of those two proposals: keep the forwarding path simple,
push complexity to the edges of the network to the extent possible,
provide a service that avoids assumptions about the type of traffic
using it, employ an allocation policy that will be compatible with
both long-term and short-term provisioning, and make it possible for
the dominant Internet traffic model to remain best-effort.

NOTE: This document includes figures that are an integral part of its
content. The IETF's choice of ascii as the standard document form
precludes the inclusion of those figures. The complete document,
with all its figures, is available at:
http://ftp.ee.lbl.gov/papers/dsarch.pdf

A Two-bit Differentiated Services Architecture for the Internet

K. Nichols
Bay Networks

V. Jacobson
LBNL

L. Zhang
UCLA

1. Introduction

This document presents a differentiated services architecture for
the internet. Dave Clark and Van Jacobson each presented work
on differentiated services at the Munich IETF meeting [2,3].
Each explained how to use one bit of the IP header to deliver a
new kind of service to packets in the internet. These were two very
different kinds of service with quite different policy assumptions.
Ensuing discussion has convinced us that both service types
have merit and that both service types can be implemented with
a set of very similar mechanisms. We propose an architectural
framework that permits the use of both of these service types
and exploits their similarities in forwarding path mechanisms.
The major goals of this architecture are each shared with one
or both of those two proposals: keep the forwarding path simple,
push complexity to the edges of the network to the extent possible,
provide a service that avoids assumptions about the type of traffic
using it, employ an allocation policy that will be compatible with
both long-term and short-term provisioning, and make it possible for
the dominant Internet traffic model to remain best-effort.

The major contributions of this document are to present two
distinct service types, a set of general mechanisms for the
forwarding path that can be used to implement a range of
differentiated services and to propose a flexible framework
for provisioning a differentiated services network.
It is precisely this kind of architecture that is needed for expedient
deployment of differentiated services: we need a framework and
set of primitives that can be implemented in the short-term and
provide interoperable services, yet can provide a "sandbox" for
experimentation and elaboration that can lead in time to more
levels of differentiation within each service as needed.

At the risk of belaboring an analogy, we are motivated to provide
service tiers in somewhat the same fashion as the airlines do
with first class, business class and coach class. The latter
also has tiering built in due to the various restrictions put on
the purchase. A part of the analogy we want to stress is that
best effort traffic, like coach class seats on an airplane,
is still expected to make up the bulk of internet traffic.
Business and first class carry a small number of passengers,
but are quite important to the economics of the airline industry.
The various economic forces and realities combine to dictate the
relative allocation of the seats and to try to fill the airplane.
We don't expect that differentiated services will comprise all
the traffic on the internet, but we do expect that new services
will lead to a healthy economic and service environment.

This document is organized into sections describing the service
architecture, mechanisms, the bandwidth allocation architecture,
how this architecture might interoperate with RSVP/int-serv work,
and recommendations for deployment.

2. Architecture

2.1 Background

The current internet delivers one type of service, best-effort,
to all traffic. A number of proposals have been made concerning
the addition of enhanced services to the Internet. We focus on two
particular methods of adding a differentiated level of service to
IP, each designated by one bit [1,2,3].
These services represent a
radical departure from the Internet's traditional service, but they
are also a radical departure from traditional "quality of service"
architectures which rely on circuit-based models. Both these
proposals seek to define a single common mechanism that is used by
interior network routers, pushing most of the complexity and state
of differentiated services to the network edges. Both use bandwidth
as the resource that is being requested and allocated. Clark and
Wroclawski defined an "Assured" service that follows "expected
capacity" usage profiles that are statistically provisioned [3].
The assurance that the user of such a service receives is that
such traffic is unlikely to be dropped as long as it stays within
the expected capacity profile. The exact meaning of "unlikely"
depends on how well provisioned the service is. An Assured service
traffic flow may exceed its Profile, but the excess traffic is
not given the same assurance level. Jacobson defined a "Premium"
service that is provisioned according to peak capacity Profiles
that are strictly not oversubscribed and that is given its own
high-priority queue in routers [2]. A Premium service traffic
flow is hard-limited to its provisioned peak rate and shaped so
that bursts are not injected into the network.
Premium service presents a "virtual wire" where a flow's bursts
may queue at the shaper at the edge of the network, but thereafter
only in proportion to the indegree of each router. Despite their
many similarities, these two approaches result in fundamentally
different services. The former uses buffer management to provide
a "better effort" service while the latter creates a service with
little jitter and queueing delay and no need for queue management
on the Premium packets' queue.

An Assured service was introduced in [3] by Clark and Wroclawski,
though we have made some alterations in its specification for
our architecture. Further refinements and an "Expected Capacity"
framework are given in Clark and Fang [10]. This framework is
focused on "providing different levels of best-effort service at
times of network congestion" but also mentions that it is possible
to have a separate router queue to implement a "guaranteed"
level of assurance. We believe this framework and our Two-bit
architecture are compatible but this needs further exploration.
As Premium service has not been documented elsewhere, we describe
it next and follow this with a description of the two-bit
architecture.

2.2 Premium service

In [2], a Premium service was presented that is fundamentally
different from the Internet's current best effort service.
This service is not meant to replace best effort but primarily to
meet an emerging demand for a commercial service that can share the
network with best effort traffic. This is desirable economically,
since the same network can be used for both kinds of traffic.
It is expected that Premium traffic would be allocated a small
percentage of the total network capacity, but that it would
be priced much higher. One use of such a service might be to
create "virtual leased lines", saving the cost of building and
maintaining a separate network. Premium service, not unlike
a standard telephone line, is a capacity which the customer
expects to be there when the receiver is lifted, although it may,
depending on the household, be idle a good deal of the time.
Provisioning Premium traffic in this way reduces the capacity
of the best effort internet by the amount of Premium allocated,
in the worst case, thus it would have to be priced accordingly.
On the other hand, whenever that capacity is not being used it
is available to best effort traffic. In contrast to normal best
effort traffic which is bursty and requires queue management
to deal fairly with congestive episodes, this Premium service
by design creates very regular traffic patterns and small or
nonexistent queues.

Premium service levels are specified as a desired peak bit-rate
for a specific flow (or aggregation of flows). The user contract
with the network is not to exceed the peak rate. The network
contract is that the contracted bandwidth will be available when
traffic is sent. First-hop routers (or other edge devices) filter
the packets entering the network, set the Premium bit of those
that match a Premium service specification, and perform traffic
shaping on the flow that smooths all traffic bursts before they
enter the network. This approach requires no changes in hosts.
A compliant router along the path needs two levels of priority
queueing, sending all packets with the Premium bit set first.
Best-effort traffic is unmarked and queued and sent at the lower
priority. This results in two "virtual networks": one which is
identical to today's Internet with buffers designed to absorb
traffic bursts; and one where traffic is limited and shaped to
a contracted peak-rate, but packets move through a network of
queues where they experience almost no queueing delay.

In this architecture, forwarding path decisions are made separately
and more simply than the setting up of the service agreements
and traffic profiles. With the exception of policing and shaping
at administrative or "trust" boundaries, the only actions that
need to be handled in the forwarding path are to classify a
packet into one of two queues on a single bit and to service
the two queues using simple priority.
Shaping must include both
rate and burst parameters; the latter is expected to be small,
in the one or two packet range. Policing at boundaries enforces
rate compliance, and may be implemented by a simple token bucket.
The admission and set-up procedures are expected to evolve, in
time, to be dynamically configurable and fairly complex while
the mechanisms in the forwarding path remain simple.

A Premium service built on this architecture can be deployed in
a useful way once the forwarding path mechanisms are in place
by making static allocations. Traffic flows can be designated
for special treatment through network management configuration.
Traffic flows should be designated by the source, the destination,
or any combination of fields in the packet header. First-hop (or
leaf) routers will filter flows on all or part of the header tuple
consisting of the source IP address, destination IP address,
protocol identifier, source port number, and destination
port number. Based on this classification, a first-hop router
performs traffic shaping and sets the designated Premium bit
of the precedence field. End-hosts are thus not required to be
"differentiated services aware", though if and when end-systems
become universally "aware", they might do their own shaping and
first-hop routers merely police.

Adherence to the subscribed rate and burst size must be enforced
at the entry to the network, either by the end-system or by the
first-hop router. Within an intranet, administrative domain, or
"trust region" the packets can then be classified and serviced
solely on the Premium bit. Where packets cross a boundary, the
policing function is critical. The entered region will check the
prioritized packet flow for conformance to a rate the two regions
have agreed upon, discarding packets that exceed the rate.
It is
thus in the best interests of a region to ensure conformance
to the agreed-upon rate at the egress. This requirement means
that Premium traffic is burst-free and, together with the no
oversubscription rule, leads directly to the observation that
Premium queues can easily be sized to prevent the need to drop
packets and thus the need for a queue management policy. At each
router, the largest queue size is related to the in-degree of
other routers and is thus quite small, on the order of ten packets.

Premium bandwidth allocations must not be oversubscribed as
they represent a commitment by the network and should be priced
accordingly. Note that, in this architecture, Premium traffic will
also experience considerably less delay variation than either best
effort traffic or the Assured data traffic of [3]. Premium rates
might be configured on a subscription basis in the near-term,
or on-demand when dynamic set-up or signaling is available.

Figure 1 shows how a Premium packet flow is established within a
particular administrative domain, Company A, and sent across the
access link to Company A's ISP. Assume that the host's first-hop
router has been configured to match a flow from the host's IP
address to a destination IP address that is reached through the ISP.
A Premium flow is configured from a host with a rate which is
both smaller than the total Premium allocation Company A has
from the ISP, r bytes per second, and smaller than the portion of
that allocation not yet assigned to other hosts in Company A.
Packets are not marked in any special way when they leave the host.
The first-hop router clears the Premium bit on all arriving
packets, sets the Premium bit on all packets in the designated
flow, shapes packets in the Premium flow to a configured rate
and burst size, queues best-effort unmarked packets in the low
priority queue and shaped Premium packets in the high priority
queue, and sends packets from those two queues at simple priority.
Intermediate routers internal to Company A enqueue packets in
one of two output queues based on the Premium bit and service
the queues with simple priority. Border routers perform quite
different tasks, depending on whether they are processing an egress
flow or an ingress flow. An egress border router may perform
some reshaping on the aggregate Premium traffic to conform to
rate r, depending on the number of Premium flows aggregated.
Ingress border routers only need to perform a simple policing
function that can be implemented with a token bucket. In the
example, the ISP accepts all Premium packets from A as long as
the flow does not exceed r bytes per second.

Figure 1. Premium traffic flow from end-host to
organization's ISP

2.3 Two-bit differentiated services architecture

Clark's and Jacobson's proposals are markedly similar in the
location and type of functional blocks that are needed to implement
them. Furthermore, they implement quite different services which
are not incompatible in a network. The Premium service implements
a guaranteed peak bandwidth service with negligible queueing delay
that cannot starve best effort traffic and can be allocated in a
fairly straightforward fashion. This service would seem to have
a strong appeal for commercial applications, video broadcasts,
voice-over-IP, and VPNs. On the other hand, this service may
prove both too restrictive (in its hard limits) and overdesigned
(no overallocation) for some applications.
The Assured service
implements a service that has the same delay characteristics as
(undropped) best effort packets and the firmness of its guarantee
depends on how well individual links are provisioned for bursts of
Assured packets. On the other hand, it permits traffic flows to use
any additional available capacity without penalty and occasional
dropped packets for short congestive periods may be acceptable
to many users. This service might be what an ISP would provide to
individual customers who are willing to pay a bit more for internet
service that seems unaffected by congestive periods. Both services
are only as good as their admission control schemes, though this
can be more difficult for traffic which is not peak-rate allocated.

There may be some additional benefits of deploying both services.
To the extent that Premium service is a conservative allocation
of resources, unused bandwidth that had been allocated to Premium
might provide some "headroom" for underallocated or burst periods
of Assured traffic or for best effort. Network elements that
deploy both services will be performing RED queue management on
all non-Premium traffic, as suggested in [4], and the effects of
mixing the Premium streams with best effort might serve to reduce
burstiness in the latter. A strength of the Assured service is that
it allows bursts to happen in their natural fashion, but this also
makes the provisioning, admission control and allocation problem
more difficult so it may take more time and experimentation before
the admission policy for this service is completely defined.
A Premium service could be deployed that employs static allocations
on peak rates with no statistical sharing.
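
A static Premium allocation with no oversubscription can be pictured
as simple bookkeeping. The Python sketch below is illustrative only:
the PremiumAllocator class, its method names, and the flow identifiers
are assumptions introduced for this example and are not part of the
architecture. It admits a new Premium flow only when its peak rate
fits within the part of the total allocation r that has not yet been
assigned to other flows.

```python
class PremiumAllocator:
    """Hypothetical bookkeeping for statically allocated Premium peak rates."""

    def __init__(self, total_rate):
        self.total_rate = total_rate   # r, in bytes per second
        self.flows = {}                # flow id -> assigned peak rate

    def unassigned(self):
        # Capacity not yet committed to any Premium flow.
        return self.total_rate - sum(self.flows.values())

    def admit(self, flow_id, peak_rate):
        # Reject duplicates, non-positive rates, and any request that
        # would oversubscribe the total Premium allocation.
        if flow_id in self.flows or peak_rate <= 0:
            return False
        if peak_rate > self.unassigned():
            return False
        self.flows[flow_id] = peak_rate
        return True

    def release(self, flow_id):
        # Return a flow's peak rate to the unassigned pool.
        self.flows.pop(flow_id, None)
```

Because every admitted flow is counted at its full peak rate, the sum
of admitted rates can never exceed r, which is the "no statistical
sharing" property of this kind of allocation.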

As there appear to be a number of advantages to an architecture
that permits these two types of service and because, as we shall
see, they can be made to share many of the same mechanisms, we
propose designating two bit-patterns from the IP header precedence
field. We leave the explicit designation of these bit-patterns
to the standards process; thus we use the shorthand notation of
denoting each pattern by a bit: one we will call the Premium or
P-bit, the other we call the assurance or A-bit. It is possible
for a network to implement only one of these services and to have
network elements that only look at the one applicable bit, but we
focus on the two service architecture. Further, we assume the case
where no changes are made in the hosts, appropriate packet marking
all being done in the network, at the first-hop, or leaf, router.
We describe the forwarding path architecture in this section,
assuming that the service has been allocated through mechanisms
we will discuss in section 4.

In a more general sense, Premium service denotes packets that are
enqueued at a higher priority than the ordinary best-effort queue.
Similarly, Assured service denotes packets that are treated
preferentially with respect to the dropping probability within
the "normal" queue. There are a number of ways to add more service
levels within each of these service types [7], but this document
takes the position of specifying the base-level services of
Premium and Assured.

The forwarding path mechanisms can be broken down into those
that happen at the input interface, before packet forwarding,
and those that happen at the output interface, after packet
forwarding. Intermediate routers only need to implement the
post packet forwarding functions, while leaf and border routers
must perform functions on arriving packets before forwarding.
We describe the mechanisms this way for illustration; other ways
of composing their functions are possible.

Leaf routers are configured with a traffic profile for a particular
flow based on its packet header. This functionality has been
defined by the RSVP Working Group in RFC 2205. Figure 2 shows
what happens to a packet that arrives at the leaf router, before
it is passed to the forwarding engine. All arriving packets must
have both the A-bit and the P-bit cleared after which packets
are classified on their header. If the header does not match any
configured values, the packet is immediately forwarded. Matched flows
pass through individual Markers that have been configured from the
usage profile for that flow: service class (Premium or Assured),
rate (peak for Premium, "expected" for Assured), and permissible
burst size (may be optional for Premium). Assured flow packets
emerge from the Marker with their A-bits set when the flow is in
conformance to its Profile, but the flow is otherwise unchanged.
For a Premium flow, the Marker will hold packets when necessary
to enforce their configured rate. Thus Premium flow packets
emerge from the Marker in a shaped flow with their P-bits set.
(It is possible for Premium flow packets to be dropped inside
of the Marker as we describe below.) Packets are passed to the
forwarding engine when they emerge from Markers. Packets that have
either their P or A bits set we will refer to as Marked packets.

Figure 2. Block diagram of leaf router input functionality

Figure 3 shows the inner workings of the Marker. For both Assured
and Premium packets, a token bucket "fills" at the flow rate
that was specified in the usage profile. For Assured service,
the token bucket depth is set by the Profile's burst size.
For Premium service, the token bucket depth must be limited to
the equivalent of only one or two packets.
(We suggest a depth of
one packet in early deployments.) When a token is present, Assured
flow packets have their A-bit set to one; in either case the packet
is then passed to the forwarding engine. For a Premium-configured
Marker, arriving packets that see a token present have their P-bits
set and are forwarded, but when no token is present, Premium flow
packets are held until a token arrives. If a Premium flow bursts
enough to overflow the holding queue, its packets will be dropped.
Though the flow set up data can be used to configure a size limit
for the holding queue (this would be the meaning of a "burst"
in Premium service), it is not necessary. Unconfigured holding
queues should be capable of holding at least two bandwidth-delay
products, adequate for TCP connections. A smaller value might
be used to suit delay requirements of a specific application.

Figure 3. Markers to implement the two different services

In practice, the token bucket should be implemented in bytes
and a token is considered to be present if the number of bytes
in the bucket is equal to or larger than the size of the packet.
For Premium, the bucket can only be allowed to fill to the
maximum packet size; while Assured may fill to the configured
burst parameter. Premium traffic is held until a sufficient byte
credit has accumulated and this holding buffer provides the only
real queue the flow sees in the network. For Assured traffic,
we just test if the bytes in the bucket are sufficient for the
packet size and set A if so. If not, the only difference is that
A is not set. Assured traffic goes into a queue following this
step and potentially sees a queue at every hop along its path.

Each output interface of a router must have two queues and must
implement a test on the P-bit to select a packet's output queue.
The two queues must be serviced by simple priority, Premium packets
first.
Each output interface must implement the RED-based RIO
mechanism described in [3] on the lower priority queue. RIO uses
two thresholds for when to begin dropping packets, a lower one
based on total queue occupancy for ordinary best effort traffic and
one based on the number of packets enqueued that have their A-bit
set. This means that any action preferential to Assured service
traffic will only be taken when the queue's occupancy exceeds the
threshold value for ordinary best effort service. In this case,
only unmarked packets will be dropped (using the RED algorithm)
unless the threshold value for Assured service is also reached.
Keeping an accurate count of the number of A-bit packets currently
in a queue requires either testing the A-bit at both entry and
exit of the queue or some additional state in the router. Figure 4
is a block diagram of the output interface for all routers.

Figure 4. Router output interface for two-bit architecture

The packet output of a leaf router is thus a shaped stream of
packets with P-bits set mingled with an unshaped best effort stream
of packets, some of which may have A-bits set. Premium service
clearly cannot starve best effort traffic because it is both burst
and bandwidth controlled. Assured service might rely only on a
conservative allocation to prevent starvation of unmarked traffic,
but bursts of Assured traffic might then close out best-effort
traffic at bottleneck queues during congestive periods.

After [3], we designate the forwarding path objects that test flows
against their usage profiles "Profile Meters". Border routers will
require Profile Meters at their input interfaces. The bilateral
agreement between adjacent administrative domains must specify a
peak rate on all P traffic and a rate and burst for A traffic (and
possibly a start time and duration).
A Profile Meter is required
at the ingress of a trust region to ensure that differentiated
service packet flows are in compliance with their agreed-upon
rates. Non-compliant packets of Premium flows are discarded while
non-compliant packets of Assured flows have their A-bits reset.
For example, in figure 1, if the ISP has agreed to supply Company
A with r bytes/sec of Premium service, P-bit marked packets that
enter the ISP through the link from Company A will be dropped if
they exceed r. If, instead, the service in figure 1 were Assured
service, the packets would simply be unmarked and forwarded as
best effort.

The simplest border router input interface is a Profile Meter
constructed from a token bucket configured with the contracted
rate across that ingress link (see figure 5). Each type, Premium
or Assured, and each interface must have its own profile meter
corresponding to a particular class across a particular boundary.
(This is in contrast to models where every flow that crosses the
boundary must be separately policed and/or shaped.) The exact
mechanisms required at a border router input interface depend
on the allocation policy deployed; a more complex approach is
presented in section 4.

Figure 5. Border router input interface Profile Meters

3. Mechanisms

3.1 Forwarding Path Primitives

Section 2.3 introduced the forwarding path objects of Markers and
Profile Meters. In this section we specify the primitive building
blocks required to compose them. The primitives are: general
classifier, bit-pattern classifier, bit setter, priority queues,
policing token bucket and shaping token bucket. These primitives
can compose a Marker (either a policing or a shaping token bucket
plus a bit setter) and a Profile Meter (a policing token bucket
plus a dropper or bit setter).
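
As an illustration only, the byte-based token bucket that underlies
both Markers and Profile Meters might be sketched as follows. This
is a minimal sketch under our own assumptions, not a specification:
the names TokenBucket, try_consume, wait_time and police are
hypothetical, rates are taken to be in bytes per second, and time is
passed in explicitly rather than read from a clock.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Byte-based token bucket (hypothetical helper, not from the draft)."""
    rate: float          # fill rate in bytes per second
    depth: float         # maximum credit in bytes
    tokens: float = 0.0  # current credit in bytes
    last: float = 0.0    # time of the last refill, in seconds

    def refill(self, now: float) -> None:
        # Add credit for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, size: int, now: float) -> bool:
        # A token is "present" when the credit covers the whole packet.
        self.refill(now)
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False

    def wait_time(self, size: int, now: float) -> float:
        # Seconds a shaper would hold the packet before it may be sent.
        self.refill(now)
        return max(0.0, (size - self.tokens) / self.rate)


def police(bucket: TokenBucket, size: int, now: float, service: str) -> str:
    """Profile Meter action: 'forward', or else 'drop' for Premium
    traffic and 'unmark' for Assured traffic."""
    if bucket.try_consume(size, now):
        return "forward"
    return "drop" if service == "premium" else "unmark"
```

A policing bucket only ever calls try_consume, while a shaping
bucket would use wait_time to decide how long to hold a packet. For
Premium, depth would be set to the maximum packet size; for Assured,
to the configured burst.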

General Classifier:
Leaf or first-hop routers must perform transport-level signature
matching based on a tuple in the packet header, a functionality
which is part of any RSVP-capable router. As described above,
packets whose tuples match one of the configured flows are
conformance tested and have the appropriate service bit set.
This function is memory- and processing-intensive, but is kept
at the edges of the network where there are fewer flows.

Bit-pattern classifier:
This primitive comprises a simple two-way decision based on
whether a particular bit-pattern in the IP header is set or not.
As in figure 4, the P-bit is tested when a packet arrives at a
non-leaf router to determine whether to enqueue it in the high
priority output queue or the low priority packet queue. The A-bit
of packets bound for the low priority queue is tested to 1)
increment the count of Assured packets in the queue if set and 2)
determine which drop probability will be used for that packet.
Packets exiting the low priority queue must also have the A-bit
tested so that the count of enqueued Assured packets can be
decremented if necessary.

Bit setter:
The A-bits and P-bits must be set or cleared in several places.
A functional block that sets the appropriate bits of the IP header
to a configured bit-pattern would be the most general.

Priority queues:
Every network element must include (at least) two levels of simple
priority queueing. The high priority queue is for the Premium
traffic and the service rule is to send packets in that queue
first and to exhaustion. Recall that Premium traffic must never be
oversubscribed, thus Premium traffic should see little or no queue.

Shaping token bucket:
This is the token bucket required at the leaf router for Premium
traffic and shown in figure 3.
As we shall see, shaping is also
useful at egress points of a trust region. An arriving packet is
immediately forwarded if there is a token present in the bucket,
otherwise the packet is enqueued until the bucket contains tokens
sufficient to send it. Shaping requires clocking mechanisms,
packet memory, and some state block for each flow and is thus a
memory- and computation-intensive process.

Policing token bucket:
This is the token bucket required for Profile Meters and shown in
figure 5. Policing token buckets never hold arriving packets, but
check on arrival to see if a token is available for the packet's
service class. If so, the packet is forwarded immediately.
If not, the policing action is taken, dropping for Premium and
reclassifying or unmarking for Assured.

3.2 Passing configuration information

Clearly, mechanisms are required to communicate the information
about the request to the leaf router. This configuration
information is the rate, burst, and whether it is a Premium or
Assured type. There may also need to be a specific field to set
or clear this configuration. This information can be passed in
a number of ways, including using the semantics of RSVP or SNMP,
or by a network administrator setting it directly in some other
way. There must be some mechanism for authenticating the sender of
this information. We expect configuration to be done in a variety
of ways in early deployments and a protocol and mechanism for
this to be a topic for future standards work.

3.3 Discussion

The requirements of shapers motivate their placement at the edges
of the network where the state per router can be smaller than
in the middle of a network. The greatest burden of flow matching
and shaping will be at leaf routers where the speeds and buffering
required should be less than those that might be required deeper in
the network.
This functionality is not required at every network
element on the path. Routers that are internal to a trust region
will not need to shape traffic. Border routers may need or desire
to shape the aggregate flow of Marked packets at their egress
in order to ensure that they will not burst into non-compliance
with the policing mechanism at the ingress to the other domain
(though this may not be necessary if the in-degree of the router
is low). Further, the shaping would be applied to an aggregation
of all the Premium flows that exit the domain via that path,
not to each flow individually.

These mechanisms are within reach of today's technology and
it seems plausible to us that Premium and Assured services are
all that is needed in the Internet. If, in time, these services
are found insufficient, this architecture provides a migration
path for delivering other kinds of service levels to traffic.
The A- and P-bits would continue to be used to identify traffic
that gets Marked service, but further filter matching could be
done on packet headers to differentiate service levels further.
Using the bits this way reduces the number of packets that have
to have further matching done on them rather than filtering every
incoming packet. More queue levels and more complex scheduling
could be added for P-bit traffic and more levels of drop priority
could be added for A-bit traffic if experience shows them to be
necessary and processing speeds are sufficient. We propose that
the services described here be considered as "at least" services.
Thus, a network element should at least be capable of mapping all
P-bit traffic to Premium service and of mapping all A-bit traffic
to be treated with one level of priority in the "best effort" queue
(it appears that the single level of A-bit traffic should map to
a priority that is equivalent to the best level in a multi-level
element that is also in the path).

On the other hand, what is the downside of deploying an
architecture for both classes of service if later experience
convinces us that only one of them is needed? The functional blocks
of both service classes are similar and can be provided by the same
mechanism, parameterized differently. If Assured service is not
used, very little is lost. A RED-managed best effort queue has been
strongly recommended in [4] and, to the extent that the deployment
of this architecture pushes the deployment of RED-managed best
effort queues, it is clearly a positive. If Premium service
goes unused, the two queues with simple priority service are not
required and the shaping function of the Marker may be unused;
these would thus impose an unnecessary implementation cost.

4. The Architectural Framework for Marked Traffic Allocation

Thus far we have focused on the service definitions and the
forwarding path mechanisms. We now turn to the problem of
allocating the level of Marked traffic throughout the Internet.
We observe that most organizations have fixed portions of their
budgets, including data communications, that are determined on
an annual or quarterly basis. Some additional monies might be
attached to specific projects for discretionary costs that arise
in the shorter term. In turn, service providers (ISPs and NSPs)
must do their planning on annual and quarterly bases and thus
cannot be expected to provide differentiated services purely
"on call".
Provisioning sets up static levels of Marked traffic
while call set-up creates an allocation of Marked traffic for
a single flow's duration. Static levels can be provisioned with
time-of-day specifications, but cannot be changed in response to
a dynamic message. We expect both kinds of bandwidth allocation
to be important. The purchasers of Marked services can generally
be expected to work on longer-term budget cycles where these
services will be accounted for similarly to many information
services today. A mail-order house may wish to purchase a fixed
allocation of bandwidth in and out of its web-server to give
potential customers a "fast" feel when browsing its site.
This allocation might be based on hit rates of the previous
quarter or some sort of industry-based averages. In addition,
there needs to be a dynamic allocation capability to respond to
particular events, such as a demonstration, a network broadcast
by a company's CEO, or a particular network test. Furthermore,
a dynamic capability may be needed in order to meet a precommitted
service level when the particular source or destination is allowed
to be "anywhere on the Internet". "Dynamic" covers the range
from a telephoned or e-mailed request to a signalling type model.
A strictly statically allocated scenario is expected to be useful
in initial deployment of differentiated services and to make up
a major portion of the Marked traffic for the foreseeable future.

Without a "per call" dynamic set up, the preconfiguring of
usage profiles can always be construed as "paying for bits you
don't use" whether the type of service is Premium or Assured.
We prefer to think of this as paying for the level of service that
one expects to have available at any time, for example paying
for a telephone line.
A customer might pay an additional flat
fee to have the privilege of calling a wide local area for no
additional charge or might pay by the call. Although a customer
might pay on a "per call" basis for every call made anywhere,
it generally turns out not to be the most economical option for
most customers. It's possible similar pricing structures might
arise in the internet.

We use Allocation to refer to the process of making Marked
traffic commitments anywhere along this continuum from strictly
preallocated to dynamic call set-up and we require an Allocation
architecture capable of encompassing this entire spectrum
in any mix. We further observe that Allocation must follow
organizational hierarchies, that is, each organization must
have complete responsibility for the Allocation of the Marked
traffic resource within its domain. Finally, we observe that
the only chance of success for incremental deployment lies in an
Allocation architecture that is made up of bilateral agreements,
as multilateral agreements are much too complex to administer.
Thus, the Allocation architecture is made up of agreements across
boundaries as to the amount of Marked traffic that will be allowed
to pass. This is similar to "settlement" models used today.

4.1 Bandwidth Brokers - Allocating and Controlling Bandwidth Shares

The goal of differentiated services is controlled sharing of
some organization's Internet bandwidth. The control can be done
independently by individuals, i.e., users set bit(s) in their
packets to distinguish their most important traffic, or it can
be done by agents that have some knowledge of the organization's
priorities and policies and allocate bandwidth with respect to
those policies.
Independent labeling by individuals is simple to
implement but unlikely to be sufficient since it's unreasonable to
expect all individuals to know all their organization's priorities
and current network use and always mark their traffic accordingly.
Thus this architecture is designed with agents called bandwidth
brokers (BB) [2] that can be configured with organizational
policies, keep track of the current allocation of marked traffic,
and interpret new requests to mark traffic in light of the policies
and current allocation.

We note that such agents are inherent in any but the most trivial
notions of sharing. Neither individuals nor the routers their
packets transit have the information necessary to decide which
packets are most important to the organization. Since these
agents must exist, they can be used to allocate bandwidth for
end-to-end connections with far less state and simpler trust
relationships than deploying per flow or per filter guarantees in
all network elements on an end-to-end path. BBs make it possible
for bandwidth allocation to follow organizational hierarchies
and, in concert with the forwarding path mechanisms discussed
in section 3, reduce the state required to set up and maintain a
flow over architectures that require checking the full flow header
at every network element. Organizationally, the BB architecture
is motivated by the observation that multilateral agreements
rarely work and this architecture allows end-to-end services to
be constructed out of purely bilateral agreements. BBs only need
to establish relationships of limited trust with their peers
in adjacent domains, unlike schemes that require the setting
of flow specifications in routers throughout an end-to-end path.
In practical technical terms, the BB architecture makes it possible
to keep state on an administrative domain basis, rather than at
every router, and the service definitions of Premium and Assured
service make it possible to confine per flow state to just the
leaf routers.

BBs have two responsibilities. Their primary one is to parcel
out their region's Marked traffic allocations and set up the
leaf routers within the local domain. The other is to manage the
messages that are sent across boundaries to adjacent regions' BBs.
A BB is associated with a particular trust region, one per domain.
A BB has a policy database that keeps the information on who can
do what when and a method of using that database to authenticate
requesters. Only a BB can configure the leaf routers to deliver a
particular service to flows, which is crucial for deploying a
secure system. If the deployment of Differentiated Services has
advanced to the stage where dynamically allocated, marked flows are
possible between two adjacent domains, BBs also provide the hook
needed to implement this. Each domain's BB establishes a secure
association with its peer in the adjacent domain to negotiate or
configure a rate and a service class (Premium or Assured) across
the shared boundary and through the peer's domain. As we shall see,
it is possible, for some types of service and particularly in early
implementations, that this "secure association" is not automatic
but accomplished through human negotiation and subsequent manual
configuration of the adjacent BBs according to the negotiated
agreement. This negotiated rate is a capability that a BB controls
for all hosts in its region.

When an allocation is desired for a particular flow, a request is
sent to the BB. Requests include a service type, a target rate,
a maximum burst, and the time period when service is required.
The request can be made manually by a network administrator
or a user or it might come from another region's BB. A BB first
authenticates the credentials of the requester, then verifies there
exists unallocated bandwidth sufficient to meet the request. If a
request passes these tests, the available bandwidth is reduced by
the requested amount and the flow specification is recorded. In the
case where the flow has a destination outside this trust region,
the request must fall within the class allocation through the
"next hop" trust region that was established through a bilateral
agreement of the two trust regions. The requester's BB informs
the adjacent region's BB that it will be using some of this rate
allocation. The BB configures the appropriate leaf router with
the information about the packet flow to be given a service at
the time that the service is to commence. This configuration is
"soft state" that the BB will periodically refresh. The BB in
the adjacent region is responsible for configuring the border
router to permit the allocated packet flow to pass and for any
additional configurations and negotiations within and across its
borders that will allow the flow to reach its final destination.

At DMZs, there must be an unambiguous way to determine the local
source of a packet. An interface's source could be determined
from its MAC address which would then be used to classify packets
as coming across a logical link directly from the source domain
corresponding to that MAC address. Thus with this understanding
we can continue to use figures illustrating a single pipe between
two different domains.

In this way, all agreements and negotiations are performed
between two adjacent domains. An initial request might cause
communication between BBs on several domains along a path, but
each communication is only between two adjacent BBs.
Initially,
these agreements will be prenegotiated and fairly static. Some may
become more dynamic as the service evolves.

4.2 Examples

This section gives examples of BB transactions in a non-trivial,
multi-transit-domain Internet. The BB framework allows operating
points across a spectrum from "no signalling across boundaries"
to "each flow set up dynamically". We might expect to move
across this spectrum over time, as the necessary mechanisms are
ubiquitously deployed and BBs become more sophisticated, but
the statically allocated portions of the spectrum should always
have uses. We believe the ability to support this wide spectrum
of choices simultaneously will be important both in incremental
deployment and in allowing ISPs to make a wide range of offerings
and pricings to users. The examples of this section roughly follow
the spectrum of increasing sophistication. Note that we assume
that domains contract for some amount of Marked traffic which can
be requested as either "Assured" or "Premium" in each individual
flow setup transaction. The examples say "Marked" although actual
transactions would have to specify either Assured or Premium.

A statically configured example with no BB messages exchanged

Here all allocations are statically preallocated through purely
bilateral agreements between users (individual TCPs, individual
hosts, campus networks, or whole ISPs) [6]. The allocations are in
the form of usage profiles of rate, burst, and a time during which
that profile is to be active. Users and providers negotiate these
Profiles which are then installed in the user domain BB and in the
provider domain BB. No BB messages cross the boundary; we assume
this negotiation is done by human representatives of each domain.
In this case, BBs only have to perform one of their two functions,
that of allocating this Profile within their local domain.
It is
even possible to set all of these suballocations up in advance and
then the BB only needs to set up and tear down the Profile at the
proper time and to refresh the soft state in the leaf routers.
From the user domain BB, the Profile is sent as soft state
to the first hop router of the flow during the specified time.
These Profiles might be set using RSVP, a variant of RSVP, SNMP, or
some vendor-specific mechanism. Although this static approach can
work for all Marked traffic, due to the strictly not oversubscribed
requirement, it is only appropriate for Premium traffic as long as
it is kept to a small percentage of the bottleneck path through
a domain or is otherwise constrained to a well-known behavior.
Similar restrictions might hold for Assured depending on the
expectation associated with the service.

In figure 6, we show an example of setting a Profile in a leaf
router. A usage profile has been negotiated with the ISP for the
entire domain and the BB parcels it out among individual flows
as requested. The leaf router mechanism is that shown in figure 3,
with the token bucket set to the parameters from the usage profile.
The ISP's BB would configure its own Profile Meter at the ingress
router from that customer to ensure the Profile was maintained.
This mechanism was shown in figure 5. We assume that the time
duration and start times for any Profile to be active are
maintained in the BB. The Profile is sent to the ingress device
or cleared from the ingress device by messages sent from the BB.
In this example, we assume that van@lbl wants to talk to ddc@mit.
The LBL-BB is sent a request from Van asking that premium service
be assigned to a flow that is designated as having source address
"V:4" and going to destination address "D:8". This flow should be
configured for a rate of 128kb/sec and allocated from 1pm to 3pm.
The request must be "signed" in a secure, verifiable manner.
The request might be sent as data to the LBL-BB, an e-mail message
to a network administrator, or in a phone call to a network
administrator. The LBL-BB receives this message, verifies that
there is 128kb/sec of unused Premium service for the domain from
1-3pm, then sends a message to Leaf1 that sets up an appropriate
Profile Meter. The message to Leaf1 might be an RSVP message,
or SNMP, or some proprietary method. All the domains passed must
have sufficient reserve capacity to meet this request.

Figure 6. Bandwidth Broker setting Profiles in leaf routers

A statically configured example with BB messages exchanged

Next we present an example where all allocations are statically
preallocated but BB messages are exchanged for greater flexibility.
Figure 7 shows an end-to-end example for Marked traffic in a
statically allocated internet. The numbers at the trust region
boundaries indicate the total statically allocated Marked packet
rates that will be accepted across those boundaries. For example,
100kbps of Marked traffic can be sent from LBL to ESNet; a Profile
Meter at the ESNet ingress boundary would have a token bucket set
to rate 100kbps. (There MAY be a shaper set at LBL's egress to
ensure that the Marked traffic conforms to the aggregate Profile.)
The tables inside the transit network "bubbles" show their policy
databases and reflect the values after the transaction is complete.
In figure 7, V wants to transmit a flow from LBL to D at MIT at
10kbps. As in figure 6, a request for this profile is made of LBL's
BB. LBL's BB authenticates the request and checks to see if there
is 10kbps left in its Marked allocation going in that direction.
There is, so the LBL-BB passes a message to the ESNet-BB saying
that it would like to use 10kbps of its Marked allocation for
this flow.
ESNet authenticates the message, checks its database
and sees that it has a 10kbps Marked allocation to NEARNet (the
next region in that direction) that is being unused. The policy
is that ESNet-BB must always inform ("ask") NEARNet-BB when it
is about to use part of its allocation. NEARNet-BB authenticates
the message, checks its database and discovers that 20kbps of the
allocation to MIT is unused and the policy at that boundary is to
not inform MIT when part of the allocation is about to be used
("<50 ok" where the total allocation is 50). The dotted lines
indicate the "implied" transaction, that is the transaction that
would have happened if the policy hadn't said "don't ask me".
Now each BB can pass an "ok" message to this request across
its boundary. This allows V to send to D, but not vice versa.
It would also be possible for the request to originate from D.

Figure 7. End-to-end example with static allocation.

Consider the same example where the ESNet-BB finds all of its
Marked allocation to NEARNet, 10 kbps, in use. With static
allocations, ESNet must transmit a "no" to this request back to
the LBL-BB. Presumably, the LBL-BB would record this information to
complain to ESNet about the overbooking at the end of the month!
One solution to this sort of "busy signal" is for ESNet to get
better at anticipating its customers' needs or require long advance
bookings for every flow, but it's also possible for bandwidth
brokerage decisions to become dynamic.

Figure 8. End-to-end static allocation example with no remaining
allocation

Dynamic Allocation and additional mechanism

As we shall see, dynamic allocation requires more complex BBs as
well as more complex border policing, including the necessity to
keep more state. However, it enables an important service with
a small increase in state.
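
For comparison with the dynamic case that follows, the static
transactions of figures 7 and 8 can be sketched as a simple
admission check along the chain of boundaries. This is only a
sketch under our own assumptions: Boundary and request_static are
hypothetical names, the check is done in one place rather than by
messages between authenticated BBs, and the LBL-to-ESNet usage is
assumed to start at zero because the text does not state it.

```python
from dataclasses import dataclass


@dataclass
class Boundary:
    """Static Marked allocation across one trust-region boundary (kbps)."""
    src: str
    dst: str
    allocated: int
    in_use: int = 0

    def unused(self) -> int:
        return self.allocated - self.in_use


def request_static(path: list, rate_kbps: int) -> str:
    """Grant the request only if every boundary on the path has room."""
    for b in path:
        if b.unused() < rate_kbps:
            # The "busy signal" of figure 8: the request is refused.
            return f"no ({b.src}->{b.dst} has {b.unused()} kbps unused)"
    for b in path:
        b.in_use += rate_kbps
    return "ok"


# Allocations as described for figure 7; LBL's current use is assumed.
path = [
    Boundary("LBL", "ESNet", allocated=100),
    Boundary("ESNet", "NEARNet", allocated=10),
    Boundary("NEARNet", "MIT", allocated=50, in_use=30),
]
first = request_static(path, 10)   # V -> D at 10kbps
second = request_static(path, 10)  # a second, identical request
```

With these numbers the first request is granted and the second is
refused at the ESNet-to-NEARNet boundary, whose 10kbps allocation is
then fully in use.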

The next set of figures (starting with figure 9) shows what happens
in the case of dynamic allocation. As before, V requests 10kbps
to talk to D at MIT. Since the allocation is dynamic, the border
policers do not have a preset value, instead being set to reflect
the current peak value of Marked traffic permitted to cross
that boundary. The request is sent to the LBL-BB.

Figure 9. First step in end-to-end dynamic allocation example.

In figure 10, note that ESNet has no allocation set up to NEARNet.
This system is capable of dynamic allocations in addition to
static, so it asks NEARNet if it can "add 10" to its allocation
from ESNet. As in the figure 7 example, MIT's policy is set to
"don't ask" for this case, so the dotted lines represent "implicit
transactions" where no messages were exchanged. However, NEARNet
does update its table to indicate that it is now using 20kbps of
the Marked allocation to MIT.

Figure 10. Second step in end-to-end dynamic allocation example

In figure 11, we see the third step where MIT's "virtual ok"
allows the NEARNet-BB to tell its border router to increase the
Marked allocation across the ESNet-NEARNet boundary by 10 kbps.

Figure 11. Third step in end-to-end dynamic allocation example

Figure 12 shows NEARNet-BB's "ok" for that request transmitted
back to ESNet-BB. This causes ESNet-BB to send its border router
a message to create a 10 kbps subclass for the flow "V->D".
This is required in order to ensure that the 10kbps that has just
been dynamically allocated gets used only for that connection.
Note that this does require that the per flow state be passed
from LBL-BB to ESNet-BB, but this is the only boundary that needs
that level of flow information and this further classification
will only need to be done at that one boundary router and only
on packets coming from LBL.
Thus dynamic allocation requires more
complex Profile Metering than that shown in figure 5.

Figure 12. Fourth step in end-to-end dynamic allocation example.

In figure 13, the ESNet border router gives the "ok" that a
subclass has been created, causing the ESNet-BB to send an "ok"
to the LBL-BB which lets V know the request has been approved.

Figure 13. Final step in end-to-end dynamic allocation example

For dynamic allocation, a basic version of a CBQ scheduler [5]
would have all the required functionality to set up the subclasses.
RSVP currently provides a way to move the TSpec for the flow.

For multicast flows, we assume that packets that are bound for
at least one egress can be carried through a domain at that level
of service to all egress points. If a particular multicast branch
has been subscribed to at best-effort when upstream branches are
Marked, it will have its bit settings cleared before it crosses
the boundary. The information required for this flow identification
is used to augment the existing state that is already kept on
this flow because it is a multicast flow. We note that we are
already "catching" this flow, but now we must potentially clear
the bit-pattern.

5. RSVP/int-serv and this architecture

Much work has been done in recent years on the definition of
related integrated services for the internet and the specification
of the RSVP signalling protocol. The two-bit architecture proposed
in this work can easily interoperate with those specifications.
In this section we first discuss how the forwarding mechanisms
described in section 3 can be used to support integrated
services. Second, we discuss how RSVP could interoperate with
the administrative structure of the BBs to provide better scaling.

5.1 Providing Controlled-Load and Guaranteed Service

We believe that the forwarding path mechanisms described in
section 3 are general enough that they can also be used to provide
the Controlled-Load service [8] and a version of the Guaranteed
Quality of Service [9], as developed by the int-serv WG. First note
that Premium service can be thought of as a constrained case of
Controlled-Load service where the burst size is limited to one
packet and where non-conforming packets are dropped. A network
element that has implemented the mechanisms to support premium
service can easily support the more general controlled-load
service by making one or more minor parameter adjustments, e.g.
by lifting the constraint on the token bucket size, or configuring
the Premium service rate with the peak traffic rate parameter in
the Controlled-Load specification, and by changing the policing
action on out-of-profile packets from dropping to sending the
packets to the Best-effort queue.

It is also possible to implement Guaranteed Quality of Service
using the mechanisms of Premium service. From RFC 2212 [9]:
"The definition of guaranteed service relies on the result that
the fluid delay of a flow obeying a token bucket (r, b) and being
served by a line with bandwidth R is bounded by b/R as long as R is
no less than r. Guaranteed service with a service rate R, where now
R is a share of bandwidth rather than the bandwidth of a dedicated
line approximates this behavior." The service model of Premium
clearly fits this model. RFC 2212 states that "Non-conforming
datagrams SHOULD be treated as best-effort datagrams." Thus, a
policing Profile Meter that drops non-conforming datagrams would
be acceptable, but it's also possible to change the action for
non-compliant packets from a drop to sending to the best-effort
queue.
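
To make the b/R bound quoted above concrete, a small worked example
may help. It is a sketch under stated assumptions: the function name
fluid_delay_bound is our own, the Premium bucket depth is taken to
be one 1500-byte packet (a packet size we assume, not one given in
this document), and the service rate equals the 128kb/sec token rate
used in the figure 6 example, with 1 kb taken as 1000 bits.

```python
def fluid_delay_bound(b_bytes: float, R_bytes_per_s: float,
                      r_bytes_per_s: float) -> float:
    """Return the fluid delay bound b/R in seconds.

    The bound holds only when the service rate R is no less than
    the token rate r, as in the RFC 2212 text quoted above.
    """
    if R_bytes_per_s < r_bytes_per_s:
        raise ValueError("the b/R bound requires R >= r")
    return b_bytes / R_bytes_per_s


# 128 kb/sec is 16000 bytes/sec; bucket depth is one 1500-byte packet.
rate = 128_000 / 8
bound_s = fluid_delay_bound(1500, rate, rate)
print(f"fluid delay bound: {bound_s * 1000:.2f} ms")
```

Under these assumptions the bound is 1500/16000 seconds, a little
under 94 ms. A larger service share R would shrink it in proportion.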

5.2 RSVP and BBs

In this section we discuss how RSVP signaling can be used in
conjunction with the BBs described in section 4 to deliver a
more scalable end-to-end resource set up for Integrated Services.
First we note that the BB architecture has three major differences
with the original RSVP resource set up model:

1. There exist a priori bilateral business relations between BBs of
adjacent trust regions before one can set up end-to-end resource
allocation; real-time signaling is used only to activate/confirm
the availability of pre-negotiated Marked bandwidth, and to
dynamically readjust the allocation amount when necessary. We note
that this real-time signaling across domains is not required,
but depends on the nature of the bilateral agreement (e.g., the
agreement might state "I'll tell you whenever I'm going to use
some of my allocation" or not).

2. A few bits in the packet header, i.e. the P-bit and A-bit,
are used to mark the service class of each packet, therefore a
full packet classification (by checking all relevant fields in
the header) need be done only once at the leaf router; after that
packets will be served according to their class bit settings.

3. RSVP resource set up assumes that resources will be reserved
hop-by-hop at each router along the entire end-to-end path.

RSVP messages sent to leaf routers by hosts can be intercepted
and sent to the local domain's BB. The BB processes the message
and, if the request is approved, forwards a message to the leaf
router that sets up appropriate per-flow packet classification.
A message should also be sent to the egress border router to add
to the aggregate Marked traffic allocation for packet shaping by
the Profile Meter on outbound traffic. (It's possible that this is
always set to the full allocation.)
An RSVP message must be sent
across the boundary to the adjacent ISP's border router, either from
the local domain's border router or from the local domain's BB.
If the ISP is also implementing RSVP with a BB and diff-serv
framework, its border router forwards the message to the ISP's
local BB. A similar process (to what happened in the first domain)
can be carried out in the ISP domain, then an RSVP message
gets forwarded to the next ISP along the path. Inside a domain,
packets are served solely according to the Marked bits. The local
BB knows exactly how much Premium traffic is permitted to enter
at each border router and from which border router packets exit.

6. Recommendations

This document has presented a reference architecture for
differentiated services. Several variations can be envisioned,
particularly for early and partial deployments, but we do not
enumerate all of these variations here. There has been a great
market demand for differentiated services lately. As one of the
many efforts to meet that demand, this draft sketches out the
framework of a flexible architecture for offering differential
services, and in particular defines a simple set of packet
forwarding path mechanisms to support two basic types of
differential services. Although there remain a number of issues
and parameters that need further exploration and refinement,
we believe it is both possible and feasible at this time to
start deployment of differentiated services incrementally. First,
given that the basic mechanisms required in the packet forwarding
path are clearly understood, both Assured and Premium services
can be implemented today with manually configured BBs and static
resource allocation. Initially we recommend conservative choices
on the amount of Marked traffic that is admitted into the network.
Second, we plan to continue the effort started with this draft
and the experimental work of the authors to define and deploy
increasingly sophisticated BBs. We hope to turn the experience
gained from in-progress trial implementations on ESNet and CAIRN
into future proposals to the IETF.

Future revisions of this draft will present the receiver-based
and multicast flow allocations in detail. After this step
is finished, we believe the basic picture of a scalable,
robust, secure resource management and allocation system will be
completed. In this draft we described how the proposed architecture
supports two services that seem to us to provide at least a good
starting point for trial deployment of differentiated services.
Our main intent is to define an architecture with three services,
Premium, Assured, and Best effort, that can be determined by
specific bit-patterns, but not to preclude additional levels
of differentiation within each service. It seems that more
experimentation and experience are required before we could
standardize more than one level per service class. Our base-level
approach says that everyone has to provide "at least" Premium
service and Assured service as documented. We feel rather strongly
about both 1) that we should not try to define, at this time,
something beyond the minimalist two service approach and 2) that
the architecture we define must be open-ended so that more levels
of differentiation might be standardized in the future. We believe
this architecture is completely compatible with approaches that
would define more levels of differentiation within a particular
service, if the benefits of doing so become well understood.

7. Acknowledgments

The authors have benefited from many discussions, both in
person and electronically, and wish to particularly thank Dave
Clark who has been responsible for the genesis of many of the
ideas presented here, though he does not agree with all of the
content of this document. We also thank Sally Floyd for comments
on an earlier draft. A comment from Jon Crowcroft was partially
responsible for our including section 5. Comments from Fred Baker
made us try to make it clearer that we are defining two base-level
services, irrespective of the bit patterns used to encode them.

8. References

[1] D. Clark, "Adding Service Discrimination to the Internet",
1995.

[2] V. Jacobson, "Differentiated Services Architecture", talk in
the Int-Serv WG at the Munich IETF, August, 1997.

[3] D. Clark and J. Wroclawski, "An Approach to Service Allocation
in the Internet", Internet Draft draft-clark-diff-svc-alloc-00.txt,
July 1997, also talk by D. Clark in the Int-Serv WG at the Munich
IETF, August, 1997.

[4] Braden et al., "Recommendations on Queue Management and
Congestion Avoidance in the Internet", Internet Draft, March, 1997.

[4] Braden, R., Ed., et al., "Resource Reservation Protocol (RSVP)
- Version 1 Functional Specification", RFC 2205, September, 1997.

[5] S. Floyd and V. Jacobson, "Link-sharing and Resource Management
Models for Packet Networks", IEEE/ACM Transactions on Networking,
pp 365-386, August 1995.

[6] D. Clark, private communication, October 26, 1997.

[7] "Advanced QoS Services for the Intelligent Internet", Cisco
Systems White Paper, 1997.

[8] J. Wroclawski, "Specification of the Controlled-Load Network
Element Service", RFC 2211, September, 1997.

[9] S. Shenker, et al., "Specification of Guaranteed Quality of
Service", RFC 2212, September, 1997.

[10] D. Clark and W. Fang, "Explicit Allocation of
Best Effort Packet Delivery Service", November, 1997.
http://diffserv.lcs.mit.edu/Papers/exp-alloc-ddc-wf.pdf

Authors' Addresses

Kathleen Nichols
Bay Networks, Inc.
Bay Architecture Lab
4401 Great America Parkway, SC1-04
Santa Clara, CA 95052-8185
Phone: 408-495-3252
Fax: 408-495-1299
Email: knichols@baynetworks.com

Van Jacobson
M/S 50B-2239
Lawrence Berkeley National Laboratory
One Cyclotron Rd
Berkeley, CA 94720
Email: van@ee.lbl.gov

Lixia Zhang
UCLA
4531G Boelter Hall
Los Angeles, CA 90095
Phone: 310-825-2695
Email: lixia@cs.ucla.edu

Internet Engineering Task Force K. Nichols / V. Jacobson / L. Zhang
draft-nichols-diff-svc-arch-00.txt Expires: 5/98