INTERNET DRAFT                                            K. Nichols
draft-nichols-diff-svc-arch-01.txt                       V. Jacobson
April, 1999                                                    Cisco
                                                            L. Zhang
                                                                UCLA

   A Two-bit Differentiated Services Architecture for the Internet

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document was originally submitted as an internet draft in November of 1997. It is one of the documents predating the formation of the IETF's Differentiated Services Working Group. Many of the ideas presented here, in concert with Dave Clark's subsequent presentation to the December 1997 meeting of the IETF Integrated Services Working Group, were key to the work which led to RFCs 2474 and 2475, and the section on allocation remains a timely proposal. For this reason, and to provide a reference, it is being submitted in its original form. The forwarding path portion of this document is intended as a record of where we were in late 1997 and not as an indication of future direction.

The postscript version of this document includes Clark's slides as an appendix, as well as many figures that aid greatly in its readability.

1. Introduction

This document presents a differentiated services architecture for the internet. Dave Clark and Van Jacobson each presented work on differentiated services at the Munich IETF meeting [2,3]. Each explained how to use one bit of the IP header to deliver a new kind of service to packets in the internet. These were two very different kinds of service with quite different policy assumptions.
Ensuing discussion has convinced us that both service types have merit and that both service types can be implemented with a set of very similar mechanisms. We propose an architectural framework that permits the use of both of these service types and exploits their similarities in forwarding path mechanisms. The major goals of this architecture are each shared with one or both of those two proposals: keep the forwarding path simple, push complexity to the edges of the network to the extent possible, provide a service that avoids assumptions about the type of traffic using it, employ an allocation policy that will be compatible with both long-term and short-term provisioning, and make it possible for the dominant Internet traffic model to remain best-effort.

The major contributions of this document are to present two distinct service types and a set of general mechanisms for the forwarding path that can be used to implement a range of differentiated services, and to propose a flexible framework for provisioning a differentiated services network. It is precisely this kind of architecture that is needed for expedient deployment of differentiated services: we need a framework and set of primitives that can be implemented in the short-term and provide interoperable services, yet can provide a "sandbox" for experimentation and elaboration that can lead in time to more levels of differentiation within each service as needed.

At the risk of belaboring an analogy, we are motivated to provide service tiers in somewhat the same fashion as the airlines do with first class, business class and coach class. The latter also has tiering built in due to the various restrictions put on the purchase. A part of the analogy we want to stress is that best effort traffic, like coach class seats on an airplane, is still expected to make up the bulk of internet traffic.
Business and first class carry a small number of passengers, but are quite important to the economics of the airline industry. The various economic forces and realities combine to dictate the relative allocation of the seats and to try to fill the airplane. We don't expect that differentiated services will comprise all the traffic on the internet, but we do expect that new services will lead to a healthy economic and service environment.

This document is organized into sections describing service architecture, mechanisms, the bandwidth allocation architecture, how this architecture might interoperate with RSVP/int-serv work, and recommendations for deployment.

2. Architecture

2.1 Background

The current internet delivers one type of service, best-effort, to all traffic. A number of proposals have been made concerning the addition of enhanced services to the Internet. We focus on two particular methods of adding a differentiated level of service to IP, each designated by one bit [1,2,3]. These services represent a radical departure from the Internet's traditional service, but they are also a radical departure from traditional "quality of service" architectures which rely on circuit-based models. Both these proposals seek to define a single common mechanism that is used by interior network routers, pushing most of the complexity and state of differentiated services to the network edges. Both use bandwidth as the resource that is being requested and allocated. Clark and Wroclawski defined an "Assured" service that follows "expected capacity" usage profiles that are statistically provisioned [3]. The assurance that the user of such a service receives is that such traffic is unlikely to be dropped as long as it stays within the expected capacity profile. The exact meaning of "unlikely" depends on how well provisioned the service is.
An Assured service traffic flow may exceed its Profile, but the excess traffic is not given the same assurance level. Jacobson defined a "Premium" service that is provisioned according to peak capacity Profiles that are strictly not oversubscribed and that is given its own high-priority queue in routers [2]. A Premium service traffic flow is hard-limited to its provisioned peak rate and shaped so that bursts are not injected into the network. Premium service presents a "virtual wire" where a flow's bursts may queue at the shaper at the edge of the network, but thereafter only in proportion to the in-degree of each router. Despite their many similarities, these two approaches result in fundamentally different services. The former uses buffer management to provide a "better effort" service while the latter creates a service with little jitter and queueing delay and no need for queue management on the Premium packets' queue.

An Assured service was introduced in [3] by Clark and Wroclawski, though we have made some alterations in its specification for our architecture. Further refinements and an "Expected Capacity" framework are given in Clark and Fang [10]. This framework is focused on "providing different levels of best-effort service at times of network congestion" but also mentions that it is possible to have a separate router queue to implement a "guaranteed" level of assurance. We believe this framework and our Two-bit architecture are compatible but this needs further exploration. As Premium service has not been documented elsewhere, we describe it next and follow this with a description of the two-bit architecture.

2.2 Premium service

In [2], a Premium service was presented that is fundamentally different from the Internet's current best effort service.
This service is not meant to replace best effort but primarily to meet an emerging demand for a commercial service that can share the network with best effort traffic. This is desirable economically, since the same network can be used for both kinds of traffic. It is expected that Premium traffic would be allocated a small percentage of the total network capacity, but that it would be priced much higher. One use of such a service might be to create "virtual leased lines", saving the cost of building and maintaining a separate network. Premium service, not unlike a standard telephone line, is a capacity which the customer expects to be there when the receiver is lifted, although it may, depending on the household, be idle a good deal of the time. Provisioning Premium traffic in this way reduces the capacity of the best effort internet by the amount of Premium allocated, in the worst case; thus it would have to be priced accordingly. On the other hand, whenever that capacity is not being used it is available to best effort traffic. In contrast to normal best effort traffic, which is bursty and requires queue management to deal fairly with congestive episodes, this Premium service by design creates very regular traffic patterns and small or nonexistent queues.

Premium service levels are specified as a desired peak bit-rate for a specific flow (or aggregation of flows). The user contract with the network is not to exceed the peak rate. The network contract is that the contracted bandwidth will be available when traffic is sent. First-hop routers (or other edge devices) filter the packets entering the network, set the Premium bit of those that match a Premium service specification, and perform traffic shaping on the flow that smooths all traffic bursts before they enter the network. This approach requires no changes in hosts.
A compliant router along the path needs two levels of priority queueing, sending all packets with the Premium bit set first. Best-effort traffic is unmarked and queued and sent at the lower priority. This results in two "virtual networks": one which is identical to today's Internet with buffers designed to absorb traffic bursts; and one where traffic is limited and shaped to a contracted peak-rate, but packets move through a network of queues where they experience almost no queueing delay.

In this architecture, forwarding path decisions are made separately and more simply than the setting up of the service agreements and traffic profiles. With the exception of policing and shaping at administrative or "trust" boundaries, the only actions that need to be handled in the forwarding path are to classify a packet into one of two queues on a single bit and to service the two queues using simple priority. Shaping must include both rate and burst parameters; the latter is expected to be small, in the one or two packet range. Policing at boundaries enforces rate compliance, and may be implemented by a simple token bucket. The admission and set-up procedures are expected to evolve, in time, to be dynamically configurable and fairly complex while the mechanisms in the forwarding path remain simple.

A Premium service built on this architecture can be deployed in a useful way once the forwarding path mechanisms are in place by making static allocations. Traffic flows can be designated for special treatment through network management configuration. Traffic flows should be designated by the source, the destination, or any combination of fields in the packet header.
First-hop (or leaf) routers will filter flows on all or part of the header tuple consisting of the source IP address, destination IP address, protocol identifier, source port number, and destination port number. Based on this classification, a first-hop router performs traffic shaping and sets the designated Premium bit of the precedence field. End-hosts are thus not required to be "differentiated services aware", though if and when end-systems become universally "aware", they might do their own shaping and first-hop routers merely police.

Adherence to the subscribed rate and burst size must be enforced at the entry to the network, either by the end-system or by the first-hop router. Within an intranet, administrative domain, or "trust region" the packets can then be classified and serviced solely on the Premium bit. Where packets cross a boundary, the policing function is critical. The entered region will check the prioritized packet flow for conformance to a rate the two regions have agreed upon, discarding packets that exceed the rate. It is thus in the best interests of a region to ensure conformance to the agreed-upon rate at the egress. This requirement means that Premium traffic is burst-free and, together with the no oversubscription rule, leads directly to the observation that Premium queues can easily be sized to prevent the need to drop packets and thus the need for a queue management policy. At each router, the largest queue size is related to the in-degree of other routers and is thus quite small, on the order of ten packets.

Premium bandwidth allocations must not be oversubscribed as they represent a commitment by the network and should be priced accordingly. Note that, in this architecture, Premium traffic will also experience considerably less delay variation than either best effort traffic or the Assured data traffic of [3].
Premium rates might be configured on a subscription basis in the near-term, or on-demand when dynamic set-up or signaling is available.

Figure 1 shows how a Premium packet flow is established within a particular administrative domain, Company A, and sent across the access link to Company A's ISP. Assume that the host's first-hop router has been configured to match a flow from the host's IP address to a destination IP address that is reached through the ISP. A Premium flow is configured from a host with a rate which is both smaller than the total Premium allocation Company A has from the ISP, r bytes per second, and smaller than the amount of that allocation that has not been assigned to other hosts in Company A. Packets are not marked in any special way when they leave the host. The first-hop router clears the Premium bit on all arriving packets, sets the Premium bit on all packets in the designated flow, shapes packets in the Premium flow to a configured rate and burst size, queues best-effort unmarked packets in the low priority queue and shaped Premium packets in the high priority queue, and sends packets from those two queues at simple priority. Intermediate routers internal to Company A enqueue packets in one of two output queues based on the Premium bit and service the queues with simple priority. Border routers perform quite different tasks, depending on whether they are processing an egress flow or an ingress flow. An egress border router may perform some reshaping on the aggregate Premium traffic to conform to rate r, depending on the number of Premium flows aggregated. Ingress border routers only need to perform a simple policing function that can be implemented with a token bucket. In the example, the ISP accepts all Premium packets from A as long as the flow does not exceed r bytes per second.

Figure 1.
Premium traffic flow from end-host to organization's ISP

2.3 Two-bit differentiated services architecture

Clark's and Jacobson's proposals are markedly similar in the location and type of functional blocks that are needed to implement them. Furthermore, they implement quite different services which are not incompatible in a network. The Premium service implements a guaranteed peak bandwidth service with negligible queueing delay that cannot starve best effort traffic and can be allocated in a fairly straightforward fashion. This service would seem to have a strong appeal for commercial applications, video broadcasts, voice-over-IP, and VPNs. On the other hand, this service may prove both too restrictive (in its hard limits) and overdesigned (no overallocation) for some applications. The Assured service implements a service that has the same delay characteristics as (undropped) best effort packets and the firmness of its guarantee depends on how well individual links are provisioned for bursts of Assured packets. On the other hand, it permits traffic flows to use any additional available capacity without penalty and occasional dropped packets for short congestive periods may be acceptable to many users. This service might be what an ISP would provide to individual customers who are willing to pay a bit more for internet service that seems unaffected by congestive periods. Both services are only as good as their admission control schemes, though this can be more difficult for traffic which is not peak-rate allocated.

There may be some additional benefits of deploying both services. To the extent that Premium service is a conservative allocation of resources, unused bandwidth that had been allocated to Premium might provide some "headroom" for underallocated or burst periods of Assured traffic or for best effort.
Network elements that deploy both services will be performing RED queue management on all non-Premium traffic, as suggested in [4], and the effects of mixing the Premium streams with best effort might serve to reduce burstiness in the latter. A strength of the Assured service is that it allows bursts to happen in their natural fashion, but this also makes the provisioning, admission control and allocation problem more difficult, so it may take more time and experimentation before the admission policy for this service is completely defined. A Premium service could be deployed that employs static allocations on peak rates with no statistical sharing.

As there appear to be a number of advantages to an architecture that permits these two types of service and because, as we shall see, they can be made to share many of the same mechanisms, we propose designating two bit-patterns from the IP header precedence field. We leave the explicit designation of these bit-patterns to the standards process; thus we use the shorthand notation of denoting each pattern by a bit, one we will call the Premium or P-bit, the other we call the assurance or A-bit. It is possible for a network to implement only one of these services and to have network elements that only look at the one applicable bit, but we focus on the two service architecture. Further, we assume the case where no changes are made in the hosts, appropriate packet marking all being done in the network, at the first-hop, or leaf, router. We describe the forwarding path architecture in this section, assuming that the service has been allocated through mechanisms we will discuss in section 4.

In a more general sense, Premium service denotes packets that are enqueued at a higher priority than the ordinary best-effort queue.
Similarly, Assured service denotes packets that are treated preferentially with respect to the dropping probability within the "normal" queue. There are a number of ways to add more service levels within each of these service types [7], but this document takes the position of specifying the base-level services of Premium and Assured.

The forwarding path mechanisms can be broken down into those that happen at the input interface, before packet forwarding, and those that happen at the output interface, after packet forwarding. Intermediate routers only need to implement the post packet forwarding functions, while leaf and border routers must perform functions on arriving packets before forwarding. We describe the mechanisms this way for illustration; other ways of composing their functions are possible.

Leaf routers are configured with a traffic profile for a particular flow based on its packet header. This functionality has been defined by the RSVP Working Group in RFC 2205. Figure 2 shows what happens to a packet that arrives at the leaf router, before it is passed to the forwarding engine. All arriving packets must have both the A-bit and the P-bit cleared, after which packets are classified on their header. If the header does not match any configured values, the packet is immediately forwarded. Matched flows pass through individual Markers that have been configured from the usage profile for that flow: service class (Premium or Assured), rate (peak for Premium, "expected" for Assured), and permissible burst size (may be optional for Premium). Assured flow packets emerge from the Marker with their A-bits set when the flow is in conformance to its Profile, but the flow is otherwise unchanged. For a Premium flow, the Marker will hold packets when necessary to enforce their configured rate.
Thus Premium flow packets emerge from the Marker in a shaped flow with their P-bits set. (It is possible for Premium flow packets to be dropped inside of the Marker as we describe below.) Packets are passed to the forwarding engine when they emerge from Markers. Packets that have either their P or A bits set we will refer to as Marked packets.

Figure 2. Block diagram of leaf router input functionality

Figure 3 shows the inner workings of the Marker. For both Assured and Premium packets, a token bucket "fills" at the flow rate that was specified in the usage profile. For Assured service, the token bucket depth is set by the Profile's burst size. For Premium service, the token bucket depth must be limited to the equivalent of only one or two packets. (We suggest a depth of one packet in early deployments.) When a token is present, Assured flow packets have their A-bit set to one; otherwise the packet is passed to the forwarding engine with its A-bit left clear. For a Premium-configured Marker, arriving packets that see a token present have their P-bits set and are forwarded, but when no token is present, Premium flow packets are held until a token arrives. If a Premium flow bursts enough to overflow the holding queue, its packets will be dropped. Though the flow set up data can be used to configure a size limit for the holding queue (this would be the meaning of a "burst" in Premium service), it is not necessary. Unconfigured holding queues should be capable of holding at least two bandwidth-delay products, adequate for TCP connections. A smaller value might be used to suit delay requirements of a specific application.

Figure 3. Markers to implement the two different services

In practice, the token bucket should be implemented in bytes and a token is considered to be present if the number of bytes in the bucket is equal to or larger than the size of the packet.
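
The Marker behaviour described for Figure 3 can be illustrated with a minimal sketch. It is not taken from the draft: the names Packet, TokenBucket, AssuredMarker and PremiumMarker are hypothetical, time is passed in explicitly instead of read from a clock, the bucket is assumed to start full, and the holding-queue limit is counted in packets for simplicity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    size: int            # packet length in bytes
    p_bit: bool = False  # Premium bit
    a_bit: bool = False  # Assured bit


class TokenBucket:
    """Byte-based token bucket; the caller supplies the current time."""

    def __init__(self, rate: float, depth: float):
        self.rate = rate      # fill rate in bytes per second
        self.depth = depth    # maximum credit in bytes
        self.tokens = depth   # assumption: the bucket starts full
        self.last = 0.0       # time of the last refill, in seconds

    def take(self, size: int, now: float) -> bool:
        """Refill up to `now`; spend `size` bytes if enough credit exists."""
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


class AssuredMarker:
    """Sets the A-bit on in-profile packets; never delays or drops."""

    def __init__(self, rate: float, burst: float):
        self.bucket = TokenBucket(rate, burst)

    def process(self, pkt: Packet, now: float) -> Packet:
        pkt.a_bit = self.bucket.take(pkt.size, now)
        return pkt


class PremiumMarker:
    """Holds packets until byte credit is available, then sets the P-bit."""

    def __init__(self, rate: float, max_packet: int, hold_limit: int):
        # Bucket depth is one maximum-size packet; larger packets never pass.
        self.bucket = TokenBucket(rate, max_packet)
        self.held = deque()
        self.hold_limit = hold_limit  # holding-queue limit, in packets

    def release(self, now: float) -> list:
        """Return the held packets that may be forwarded at time `now`."""
        out = []
        while self.held and self.bucket.take(self.held[0].size, now):
            pkt = self.held.popleft()
            pkt.p_bit = True
            out.append(pkt)
        return out

    def arrive(self, pkt: Packet, now: float) -> list:
        self.held.append(pkt)
        out = self.release(now)
        if len(self.held) > self.hold_limit:
            self.held.pop()  # holding queue overflow: drop the new arrival
        return out
```

A real shaper would call release() from a timer when the next token is due; here the caller is assumed to drive it.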
For Premium, the bucket can only be allowed to fill to the maximum packet size, while Assured may fill to the configured burst parameter. Premium traffic is held until a sufficient byte credit has accumulated and this holding buffer provides the only real queue the flow sees in the network. For Assured traffic, we just test if the bytes in the bucket are sufficient for the packet size and set A if so. If not, the only difference is that A is not set. Assured traffic goes into a queue following this step and potentially sees a queue at every hop along its path.

Each output interface of a router must have two queues and must implement a test on the P-bit to select a packet's output queue. The two queues must be serviced by simple priority, Premium packets first. Each output interface must implement the RED-based RIO mechanism described in [3] on the lower priority queue. RIO uses two thresholds for when to begin dropping packets, a lower one based on total queue occupancy for ordinary best effort traffic and one based on the number of packets enqueued that have their A-bit set. This means that any action preferential to Assured service traffic will only be taken when the queue's occupancy exceeds the threshold value for ordinary best effort service. In this case, only unmarked packets will be dropped (using the RED algorithm) unless the threshold value for Assured service is also reached. Keeping an accurate count of the number of A-bit packets currently in a queue requires either testing the A-bit at both entry and exit of the queue or some additional state in the router. Figure 4 is a block diagram of the output interface for all routers.

Figure 4.
Router output interface for two-bit architecture

The packet output of a leaf router is thus a shaped stream of packets with P-bits set mingled with an unshaped best effort stream of packets, some of which may have A-bits set. Premium service clearly cannot starve best effort traffic because it is both burst and bandwidth controlled. Assured service might rely only on a conservative allocation to prevent starvation of unmarked traffic, but bursts of Assured traffic might then close out best-effort traffic at bottleneck queues during congestive periods.

Following [3], we designate the forwarding path objects that test flows against their usage profiles "Profile Meters". Border routers will require Profile Meters at their input interfaces. The bilateral agreement between adjacent administrative domains must specify a peak rate on all P traffic and a rate and burst for A traffic (and possibly a start time and duration). A Profile Meter is required at the ingress of a trust region to ensure that differentiated service packet flows are in compliance with their agreed-upon rates. Non-compliant packets of Premium flows are discarded while non-compliant packets of Assured flows have their A-bits reset. For example, in figure 1, if the ISP has agreed to supply Company A with r bytes/sec of Premium service, P-bit marked packets that enter the ISP through the link from Company A will be dropped if they exceed r. If, instead, the service in figure 1 were Assured service, the excess packets would simply be unmarked and forwarded as best effort.

The simplest border router input interface is a Profile Meter constructed from a token bucket configured with the contracted rate across that ingress link (see figure 5). Each type, Premium or Assured, and each interface must have its own profile meter corresponding to a particular class across a particular boundary.
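
As an illustration only, the per-class policing just described might be sketched as follows. The names Packet, PolicingBucket and ProfileMeter are hypothetical, time is supplied by the caller, and each bucket is assumed to start full; none of this is specified by the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    size: int            # packet length in bytes
    p_bit: bool = False  # Premium bit
    a_bit: bool = False  # Assured bit


class PolicingBucket:
    """Byte-based token bucket that never holds packets."""

    def __init__(self, rate: float, depth: float):
        self.rate = rate      # contracted rate in bytes per second
        self.depth = depth    # maximum credit in bytes
        self.tokens = depth   # assumption: the bucket starts full
        self.last = 0.0       # time of the last refill, in seconds

    def conforms(self, size: int, now: float) -> bool:
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


class ProfileMeter:
    """One meter per ingress interface, with one bucket per service class."""

    def __init__(self, premium: PolicingBucket, assured: PolicingBucket):
        self.premium = premium
        self.assured = assured

    def police(self, pkt: Packet, now: float) -> Optional[Packet]:
        """Return the packet to forward, or None if it is discarded."""
        if pkt.p_bit:
            # Non-compliant Premium packets are dropped.
            return pkt if self.premium.conforms(pkt.size, now) else None
        if pkt.a_bit and not self.assured.conforms(pkt.size, now):
            # Non-compliant Assured packets continue as best effort.
            pkt.a_bit = False
        return pkt
```

The meter keeps state per class and per boundary only, so its cost does not grow with the number of individual flows crossing the link.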
(This is in contrast to models where every flow that crosses the boundary must be separately policed and/or shaped.) The exact mechanisms required at a border router input interface depend on the allocation policy deployed; a more complex approach is presented in section 4.

Figure 5. Border router input interface Profile Meters

3. Mechanisms

3.1 Forwarding Path Primitives

Section 2.3 introduced the forwarding path objects of Markers and Profile Meters. In this section we specify the primitive building blocks required to compose them. The primitives are: general classifier, bit-pattern classifier, bit setter, priority queues, policing token bucket and shaping token bucket. These primitives can compose a Marker (either a policing or a shaping token bucket plus a bit setter) and a Profile Meter (a policing token bucket plus a dropper or bit setter).

General Classifier: Leaf or first-hop routers must perform a transport-level signature matching based on a tuple in the packet header, a functionality which is part of any RSVP-capable router. As described above, packets whose tuples match one of the configured flows are conformance tested and have the appropriate service bit set. This function is memory- and processing-intensive, but is kept at the edges of the network where there are fewer flows.

Bit-pattern classifier: This primitive comprises a simple two-way decision based on whether a particular bit-pattern in the IP header is set or not. As in figure 4, the P-bit is tested when a packet arrives at a non-leaf router to determine whether to enqueue it in the high priority output queue or the low priority packet queue. The A-bit of packets bound for the low priority queue is tested to 1) increment the count of Assured packets in the queue if set and 2) determine which drop probability will be used for that packet.
Packets exiting the low priority queue must 513 also have the A-bit tested so that the count of enqueued Assured 514 packets can be decremented if necessary. 516 Bit setter: The A-bits and P-bits must be set or cleared in several 517 places. A functional block that sets the appropriate bits of the IP 518 header to a configured bit-pattern would be the most general. 520 Priority queues: Every network element must include (at least) 521 two levels of simple priority queueing. The high priority queue is 522 for the Premium traffic and the service rule is to send packets in 523 that queue first and to exhaustion. Recall that Premium traffic 524 must never be oversubscribed, thus Premium traffic should see 525 little or no queue. 527 Shaping token bucket: This is the token bucket required at the 528 leaf router for Premium traffic and shown in figure 3. As we 529 shall see, shaping is also useful at egress points of a trust region. 530 An arriving packet is immediately forwarded if there is a token 531 present in the bucket, otherwise the packet is enqueued until the 532 bucket contains tokens sufficient to send it. Shaping requires 533 clocking mechanisms, packet memory, and some state block for 534 each flow and is thus a memory- and computation-intensive 535 process. 537 Policing token bucket: This is the token bucket required for 538 Profile Meters and shown in figure 5. Policing token buckets 539 never hold arriving packets, but check on arrival to see if a token 540 is available for the packet's service class. If so, the packet is 541 forwarded immediately. If not, the policing action is taken, 542 dropping for Premium and reclassifying or unmarking for 543 Assured. 545 3.2 Passing configuration information 547 Clearly, mechanisms are required to communicate the 548 information about the request to the leaf router. This 549 configuration information is the rate, burst, and whether it is a 550 Premium or Assured type.
There may also need to be a specific 551 field to set or clear this configuration. This information can be 552 passed in a number of ways, including using the semantics of 553 RSVP, SNMP, or directly set by a network administrator in some 554 other way. There must be some mechanisms for authenticating 555 the sender of this information. We expect configuration to be 556 done in a variety of ways in early deployments and a protocol 557 and mechanism for this to be a topic for future standards work. 559 3.3 Discussion 561 The requirements of shapers motivate their placement at the 562 edges of the network where the state per router can be smaller 563 than in the middle of a network. The greatest burden of flow 564 matching and shaping will be at leaf routers where the speeds 565 and buffering required should be less than those that might be 566 required deeper in the network. This functionality is not required 567 at every network element on the path. Routers that are internal to 568 a trust region will not need to shape traffic. Border routers may 569 need or desire to shape the aggregate flow of Marked packets at 570 their egress in order to ensure that they will not burst into non- 571 compliance with the policing mechanism at the ingress to the 572 other domain (though this may not be necessary if the in-degree 573 of the router is low). Further, the shaping would be applied to an 574 aggregation of all the Premium flows that exit the domain via 575 that path, not to each flow individually. 577 These mechanisms are within reach of today's technology and it 578 seems plausible to us that Premium and Assured services are all 579 that is needed in the Internet. If, in time, these services are found 580 insufficient, this architecture provides a migration path for 581 delivering other kinds of service levels to traffic. 
The A- and P- 582 bits would continue to be used to identify traffic that gets 583 Marked service, but further filter matching could be done on 584 packet headers to differentiate service levels further. Using the 585 bits this way reduces the number of packets that have to have 586 further matching done on them rather than filtering every 587 incoming packet. More queue levels and more complex 588 scheduling could be added for P-bit traffic and more levels of 589 drop priority could be added for A-bit traffic if experience shows 590 them to be necessary and processing speeds are sufficient. We 591 propose that the services described here be considered as "at 592 least" services. Thus, a network element should at least be 593 capable of mapping all P-bit traffic to Premium service and of 594 mapping all A-bit traffic to be treated with one level of priority 595 in the "best effort" queue (it appears that the single level of A-bit 596 traffic should map to a priority that is equivalent to the best level 597 in a multi-level element that is also in the path). 599 On the other hand, what is the downside of deploying an 600 architecture for both classes of service if later experience 601 convinces us that only one of them is needed? The functional 602 blocks of both service classes are similar and can be provided by 603 the same mechanism, parameterized differently. If Assured 604 service is not used, very little is lost. A RED-managed best effort 605 queue has been strongly recommended in [4] and, to the extent 606 that the deployment of this architecture pushes the deployment of 607 RED-managed best effort queues, it is clearly a positive. If 608 Premium service goes unused, the two-queues with simple 609 priority service is not required and the shaping function of the 610 Marker may be unused, thus these would impose an unnecessary 611 implementation cost. 613 4. 
The Architectural Framework for Marked Traffic 614 Allocation 616 Thus far we have focused on the service definitions and the 617 forwarding path mechanisms. We now turn to the problem of 618 allocating the level of Marked traffic throughout the Internet. We 619 observe that most organizations have fixed portions of their 620 budgets, including data communications, that are determined on 621 an annual or quarterly basis. Some additional monies might be 622 attached to specific projects for discretionary costs that arise in 623 the shorter term. In turn, service providers (ISPs and NSPs) must 624 do their planning on annual and quarterly bases and thus cannot 625 be expected to provide differentiated services purely "on call". 626 Provisioning sets up static levels of Marked traffic while call set- 627 up creates an allocation of Marked traffic for a single flow's 628 duration. Static levels can be provisioned with time-of-day 629 specifications, but cannot be changed in response to a dynamic 630 message. We expect both kinds of bandwidth allocation to be 631 important. The purchasers of Marked services can generally be 632 expected to work on longer-term budget cycles where these 633 services will be accounted for similarly to many information 634 services today. A mail-order house may wish to purchase a fixed 635 allocation of bandwidth in and out of its web-server to give 636 potential customers a "fast" feel when browsing their site. This 637 allocation might be based on hit rates of the previous quarter or 638 some sort of industry-based averages. In addition, there needs to 639 be a dynamic allocation capability to respond to particular 640 events, such as a demonstration, a network broadcast by a 641 company's CEO, or a particular network test. Furthermore, a 642 dynamic capability may be needed in order to meet a 643 precommitted service level when the particular source or 644 destination is allowed to be "anywhere on the Internet". 
645 "Dynamic" covers the range from a telephoned or e-mailed 646 request to a signalling type model. A strictly statically allocated 647 scenario is expected to be useful in initial deployment of 648 differentiated services and to make up a major portion of the 649 Marked traffic for the foreseeable future. 651 Without a "per call" dynamic set up, the preconfiguring of usage 652 profiles can always be construed as "paying for bits you don't 653 use" whether the type of service is Premium or Assured. We 654 prefer to think of this as paying for the level of service that one 655 expects to have available at any time, for example paying for a 656 telephone line. A customer might pay an additional flat fee to 657 have the privilege of calling a wide local area for no additional 658 charge or might pay by the call. Although a customer might pay 659 on a "per call" basis for every call made anywhere, it generally 660 turns out not to be the most economical option for most 661 customers. It's possible similar pricing structures might arise in 662 the internet. 664 We use Allocation to refer to the process of making Marked 665 traffic commitments anywhere along this continuum from strictly 666 preallocated to dynamic call set-up and we require an Allocation 667 architecture capable of encompassing this entire spectrum in any 668 mix. We further observe that Allocation must follow 669 organizational hierarchies, that is each organization must have 670 complete responsibility for the Allocation of the Marked traffic 671 resource within its domain. Finally, we observe that the only 672 chance of success for incremental deployment lies in an 673 Allocation architecture that is made up of bilateral agreements, 674 as multilateral agreements are much too complex to administer. 675 Thus, the Allocation architecture is made up of agreements 676 across boundaries as to the amount of Marked traffic that will be 677 allowed to pass.
This is similar to "settlement" models used 678 today. 680 4.1 Bandwidth Brokers: Allocating and Controlling Bandwidth Shares 682 The goal of differentiated services is controlled sharing of some 683 organization's Internet bandwidth. The control can be done 684 independently by individuals, i.e., users set bit(s) in their packets 685 to distinguish their most important traffic, or it can be done by 686 agents that have some knowledge of the organization's priorities 687 and policies and allocate bandwidth with respect to those 688 policies. Independent labeling by individuals is simple to 689 implement but unlikely to be sufficient since it's unreasonable to 690 expect all individuals to know all their organization's priorities 691 and current network use and always mark their traffic 692 accordingly. Thus this architecture is designed with agents 693 called bandwidth brokers (BB) [2], that can be configured with 694 organizational policies, keep track of the current allocation of 695 marked traffic, and interpret new requests to mark traffic in light 696 of the policies and current allocation. 698 We note that such agents are inherent in any but the most trivial 699 notions of sharing. Neither individuals nor the routers their 700 packets transit have the information necessary to decide which 701 packets are most important to the organization. Since these 702 agents must exist, they can be used to allocate bandwidth for 703 end-to-end connections with far less state and simpler trust 704 relationships than deploying per flow or per filter guarantees in 705 all network elements on an end-to-end path. BBs make it 706 possible for bandwidth allocation to follow organizational 707 hierarchies and, in concert with the forwarding path mechanisms 708 discussed in section 3, reduce the state required to set up and 709 maintain a flow over architectures that require checking the full 710 flow header at every network element. 
Organizationally, the BB 711 architecture is motivated by the observation that multilateral 712 agreements rarely work and this architecture allows end-to-end 713 services to be constructed out of purely bilateral agreements. 714 BBs only need to establish relationships of limited trust with 715 their peers in adjacent domains, unlike schemes that require the 716 setting of flow specifications in routers throughout an end-to-end 717 path. In practical technical terms, the BB architecture makes it 718 possible to keep state on an administrative domain basis, rather 719 than at every router and the service definitions of Premium and 720 Assured service make it possible to confine per flow state to just 721 the leaf routers. 723 BBs have two responsibilities. Their primary one is to parcel out 724 their region's Marked traffic allocations and set up the leaf 725 routers within the local domain. The other is to manage the 726 messages that are sent across boundaries to adjacent regions' 727 BBs. A BB is associated with a particular trust region, one per 728 domain. A BB has a policy database that keeps the information 729 on who can do what when and a method of using that database to 730 authenticate requesters. Only a BB can configure the leaf routers 731 to deliver a particular service to flows, crucial for deploying a 732 secure system. If the deployment of Differentiated Services has 733 advanced to the stage where dynamically allocated, marked 734 flows are possible between two adjacent domains, BBs also 735 provide the hook needed to implement this. Each domain's BB 736 establishes a secure association with its peer in the adjacent 737 domain to negotiate or configure a rate and a service class 738 (Premium or Assured) across the shared boundary and through 739 the peer's domain. 
As we shall see, it is possible for some types 740 of service and particularly in early implementations, that this 741 "secure association" is not automatic but accomplished through 742 human negotiation and subsequent manual configuration of the 743 adjacent BBs according to the negotiated agreement. This 744 negotiated rate is a capability that a BB controls for all hosts in 745 its region. 747 When an allocation is desired for a particular flow, a request is 748 sent to the BB. Requests include a service type, a target rate, a 749 maximum burst, and the time period when service is required. 750 The request can be made manually by a network administrator or 751 a user or it might come from another region's BB. A BB first 752 authenticates the credentials of the requester, then verifies there 753 exists unallocated bandwidth sufficient to meet the request. If a 754 request passes these tests, the available bandwidth is reduced by 755 the requested amount and the flow specification is recorded. In 756 the case where the flow has a destination outside this trust 757 region, the request must fall within the class allocation through 758 the "next hop" trust region that was established through a 759 bilateral agreement of the two trust regions. The requester's BB 760 informs the adjacent region's BB that it will be using some of 761 this rate allocation. The BB configures the appropriate leaf router 762 with the information about the packet flow to be given a service 763 at the time that the service is to commence. This configuration is 764 "soft state" that the BB will periodically refresh. The BB in the 765 adjacent region is responsible for configuring the border router to 766 permit the allocated packet flow to pass and for any additional 767 configurations and negotiations within and across its borders that 768 will allow the flow to reach its final destination. 770 At DMZs, there must be an unambiguous way to determine the 771 local source of a packet. 
An interface's source could be 772 determined from its MAC address which would then be used to 773 classify packets as coming across a logical link directly from the 774 source domain corresponding to that MAC address. Thus with 775 this understanding we can continue to use figures illustrating a 776 single pipe between two different domains. 778 In this way, all agreements and negotiations are performed 779 between two adjacent domains. An initial request might cause 780 communication between BBs on several domains along a path, 781 but each communication is only between two adjacent BBs. 782 Initially, these agreements will be prenegotiated and fairly static. 783 Some may become more dynamic as the service evolves. 785 4.2 Examples 787 This section gives examples of BB transactions in a non-trivial, 788 multi-transit-domain Internet. The BB framework allows 789 operating points across a spectrum from "no signalling across 790 boundaries" to "each flow set up dynamically". We might expect 791 to move across this spectrum over time, as the necessary 792 mechanisms are ubiquitously deployed and BBs become more 793 sophisticated, but the statically allocated portions of the spectrum 794 should always have uses. We believe the ability to support this 795 wide spectrum of choices simultaneously will be important both 796 in incremental deployment and in allowing ISPs to make a wide 797 range of offerings and pricings to users. The examples of this 798 section roughly follow the spectrum of increasing sophistication. 799 Note that we assume that domains contract for some amount of 800 Marked traffic which can be requested as either Assured or 801 Premium in each individual flow setup transaction. The 802 examples say "Marked" although actual transactions would have 803 to specify either Assured or Premium. 
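As a non-normative illustration of the bookkeeping that the examples below rely on, the following Python sketch models a BB that authenticates a requester, checks its unallocated Marked rate, asks only its adjacent BB, and then records the flow. Everything named here (BandwidthBroker, trusted, next_hop, request) is a hypothetical construct for illustration; the set of trusted names is a stand-in for real authentication, time-of-day handling is omitted, and the rates in the usage note are made-up values.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BandwidthBroker:
    """Hypothetical model of one domain's BB (static allocations only)."""
    name: str
    unallocated_kbps: int            # Marked rate still free toward the next region
    trusted: set                     # stand-in for the policy database / authentication
    next_hop: Optional["BandwidthBroker"] = None
    flows: list = field(default_factory=list)

    def request(self, requester: str, rate_kbps: int) -> bool:
        # 1. Authenticate the requester (here: a simple membership test).
        if requester not in self.trusted:
            return False
        # 2. Verify that enough unallocated Marked bandwidth remains.
        if rate_kbps > self.unallocated_kbps:
            return False
        # 3. Inform ("ask") only the adjacent region's BB, which in turn
        #    deals with its own neighbor; no multilateral state is needed.
        if self.next_hop is not None:
            if not self.next_hop.request(self.name, rate_kbps):
                return False
        # 4. Commit: reduce the available rate and record the flow.
        self.unallocated_kbps -= rate_kbps
        self.flows.append((requester, rate_kbps))
        return True
```

With three hypothetical domains chained as A -> B -> C, a request to A succeeds only if every BB along the path still has room; a refusal anywhere leaves the upstream allocations untouched, which corresponds to the "busy signal" of a purely static allocation.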
805 A statically configured example with no BB messages 806 exchanged: Here all allocations are statically preallocated 807 through purely bilateral agreements between users (individual 808 TCPs, individual hosts, campus networks, or whole ISPs) [6]. 809 The allocations are in the form of usage profiles of rate, burst, 810 and a time during which that profile is to be active. Users and 811 providers negotiate these Profiles which are then installed in the 812 user domain BB and in the provider domain BB. No BB 813 messages cross the boundary; we assume this negotiation is done 814 by human representatives of each domain. In this case, BBs only 815 have to perform one of their two functions, that of allocating this 816 Profile within their local domain. It is even possible to set all of 817 these suballocations up in advance and then the BB only needs to 818 set up and tear down the Profile at the proper time and to refresh 819 the soft state in the leaf routers. From the user domain BB, the 820 Profile is sent as soft state to the first hop router of the flow 821 during the specified time. These Profiles might be set using 822 RSVP, a variant of RSVP, SNMP, or some vendor-specific 823 mechanism. Although this static approach can work for all 824 Marked traffic, due to the strictly not oversubscribed 825 requirement, it is only appropriate for Premium traffic as long as 826 it is kept to a small percentage of the bottleneck path through a 827 domain or is otherwise constrained to a well-known behavior. 828 Similar restrictions might hold for Assured depending on the 829 expectation associated with the service. 831 In figure 6, we show an example of setting a Profile in a leaf 832 router. A usage profile has been negotiated with the ISP for the 833 entire domain and the BB parcels it out among individual flows 834 as requested. The leaf router mechanism is that shown in figure 835 3, with the token bucket set to the parameters from the usage 836 profile.
The ISP's BB would configure its own Profile Meter at 837 the ingress router from that customer to ensure the Profile was 838 maintained. This mechanism was shown in figure 5. We assume 839 that the time duration and start times for any Profile to be active 840 are maintained in the BB. The Profile is sent to the ingress 841 device or cleared from the ingress device by messages sent from 842 the BB. In this example, we assume that van@lbl wants to talk to 843 ddc@mit. The LBL-BB is sent a request from Van asking that 844 premium service be assigned to a flow that is designated as 845 having source address "V:4" and going to destination address 846 "D:8". This flow should be configured for a rate of 128kb/sec 847 and allocated from 1pm to 3pm. The request must be "signed" in 848 a secure, verifiable manner. The request might be sent as data to 849 the LBL-BB, an e-mail message to a network administrator, or in 850 a phone call to a network administrator. The LBL-BB receives 851 this message, verifies that there is 128kb/sec of unused Premium 852 service for the domain from 1-3pm, then sends a message to 853 Leaf1 that sets up an appropriate Profile Meter. The message to 854 Leaf1 might be an RSVP message, or SNMP, or some 855 proprietary method. All the domains passed must have sufficient 856 reserve capacity to meet this request. 858 Figure 6. Bandwidth Broker setting Profiles in leaf routers 860 A statically configured example with BB messages 861 exchanged: Next we present an example where all allocations 862 are statically preallocated but BB messages are exchanged for 863 greater flexibility. Figure 7 shows an end-to-end example for 864 Marked traffic in a statically allocated internet. The numbers at 865 the trust region boundaries indicate the total statically allocated 866 Marked packet rates that will be accepted across those 867 boundaries. 
For example, 100kbps of Marked traffic can be sent 868 from LBL to ESNet; a Profile Meter at the ESNet egress 869 boundary would have a token bucket set to rate 100kbps. (There 870 MAY be a shaper set at LBL's egress to ensure that the Marked 871 traffic conforms to the aggregate Profile.) The tables inside the 872 transit network "bubbles" show their policy databases and reflect 873 the values after the transaction is complete. In Figure 7, V wants 874 to transmit a flow from LBL to D at MIT at 10 kbps. As in 875 figure 6, a request for this profile is made of LBL's BB. LBL's 876 BB authenticates the request and checks to see if there is 10kbps 877 left in its Marked allocation going in that direction. There is, so 878 the LBL-BB passes a message to the ESNet-BB saying that it 879 would like to use 10kbps of its Marked allocation for this flow. 880 ESNet authenticates the message, checks its database and sees 881 that it has a 10kbps Marked allocation to NEARNet (the next 882 region in that direction) that is being unused. The policy is that 883 ESNet-BB must always inform ("ask") NEARNet-BB when it is 884 about to use part of its allocation. NEARNet-BB authenticates 885 the message, checks its database and discovers that 20kbps of the 886 allocation to MIT is unused and the policy at that boundary is to 887 not inform MIT when part of the allocation is about to be used 888 ("<50 ok" where the total allocation is 50). The dotted lines 889 indicate the "implied" transaction, that is the transaction that 890 would have happened if the policy hadn't said "don't ask me". 891 Now each BB can pass an "ok" message to this request across its 892 boundary. This allows V to send to D, but not vice versa. It 893 would also be possible for the request to originate from D. 895 Figure 7. End-to-end example with static allocation. 897 Consider the same example where the ESNet-BB finds all of its 898 Marked allocation to NEARNet, 10 kbps, in use.
With static 899 allocations, ESNet must transmit a "no" to this request back to 900 the LBL-BB. Presumably, the LBL-BB would record this 901 information to complain to ESNet about the overbooking at the 902 end of the month! One solution to this sort of "busy signal" is for 903 ESNet to get better at anticipating its customers' needs or require 904 long advance bookings for every flow, but it's also possible for 905 bandwidth brokerage decisions to become dynamic. 907 Figure 8. End-to-end static allocation example with no remaining 908 allocation 910 Dynamic Allocation and additional mechanism: As we shall 911 see, dynamic allocation requires more complex BBs as well as 912 more complex border policing, including the necessity to keep 913 more state. However, it enables an important service with a small 914 increase in state. 916 The next set of figures (starting with figure 9) show what 917 happens in the case of dynamic allocation. As before, V requests 918 10kbps to talk to D at MIT. Since the allocation is dynamic, the 919 border policers do not have a preset value, instead being set to 920 reflect the current peak value of Marked traffic permitted to cross 921 that boundary. The request is sent to the LBL-BB. 923 Figure 9. First step in end-to-end dynamic allocation example. 925 In figure 10, note that ESNet has no allocation set up to 926 NEARNet. This system is capable of dynamic allocations in 927 addition to static, so it asks NEARNet if it can "add 10" to its 928 allocation from ESNet. As in the figure 7 example, MIT's policy 929 is set to "don't ask" for this case, so the dotted lines represent 930 "implicit transactions" where no messages were exchanged. 931 However, NEARNet does update its table to indicate that it is 932 now using 20kbps of the Marked allocation to MIT. 934 Figure 10.
Second step in end-to-end dynamic allocation example 936 In figure 11, we see the third step where MIT's "virtual ok" 937 allows the NEARNet-BB to tell its border router to increase the 938 Marked allocation across the ESNet-NEARNet boundary by 10 939 kbps. 941 Figure 11. Third step in end-to-end dynamic allocation example 943 Figure 12 shows NEARNet-BB's "ok" for that request 944 transmitted back to ESNet-BB. This causes ESNet-BB to send its 945 border router a message to create a 10 kbps subclass for the flow 946 "V->D". This is required in order to ensure that the 10kbps that 947 has just been dynamically allocated gets used only for that 948 connection. Note that this does require that the per flow state be 949 passed from LBL-BB to ESNet-BB, but this is the only boundary 950 that needs that level of flow information and this further 951 classification will only need to be done at that one boundary 952 router and only on packets coming from LBL. Thus dynamic 953 allocation requires more complex Profile Metering than that 954 shown in figure 5. 956 Figure 12. Fourth step in end-to-end dynamic allocation example. 958 In figure 13, the ESNet border router gives the "ok" that a 959 subclass has been created, causing the ESNet-BB to send an "ok" 960 to the LBL-BB which lets V know the request has been 961 approved. 963 Figure 13. Final step in end-to-end dynamic allocation example 965 For dynamic allocation, a basic version of a CBQ scheduler [5] 966 would have all the required functionality to set up the subclasses. 967 RSVP currently provides a way to move the TSpec for the flow. 969 For multicast flows, we assume that packets that are bound for at 970 least one egress can be carried through a domain at that level of 971 service to all egress points. If a particular multicast branch has 972 been subscribed to at best-effort when upstream branches are 973 Marked, it will have its bit settings cleared before it crosses the 974 boundary.
The information required for this flow identification is 975 used to augment the existing state that is already kept on this 976 flow because it is a multicast flow. We note that we are already 977 "catching" this flow, but now we must potentially clear the bit- 978 pattern. 980 5. RSVP/int-serv and this architecture 982 Much work has been done in recent years on the definition of 983 related integrated services for the internet and the specification 984 of the RSVP signalling protocol. The two-bit architecture 985 proposed in this work can easily interoperate with those 986 specifications. In this section we first discuss how the forwarding 987 mechanisms described in section 3 can be used to support 988 integrated services. Second, we discuss how RSVP could 989 interoperate with the administrative structure of the BBs to 990 provide better scaling. 992 5.1 Providing Controlled-Load and Guaranteed Service 994 We believe that the forwarding path mechanisms described in 995 section 3 are general enough that they can also be used to 996 provide the Controlled-Load service [8] and a version of the 997 Guaranteed Quality of Service [9], as developed by the int-serv 998 WG. First note that Premium service can be thought of as a 999 constrained case of Controlled-Load service where the burst size 1000 is limited to one packet and where non-conforming packets are 1001 dropped. A network element that has implemented the 1002 mechanisms to support premium service can easily support the 1003 more general controlled-load service by making one or more 1004 minor parameter adjustments, e.g. by lifting the constraint on the 1005 token bucket size, or configuring the Premium service rate with 1006 the peak traffic rate parameter in the Controlled-Load 1007 specification, and by changing the policing action on out-of- 1008 profile packets from dropping to sending the packets to the Best- 1009 effort queue. 
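The parameter adjustments described above can be sketched with a single policing token bucket whose depth and out-of-profile action are configurable. This is only an illustrative sketch under stated assumptions: the class and member names (PolicingTokenBucket, Action, police) are hypothetical, time is passed in explicitly rather than read from a clock, and the rates and depths in the usage note are made-up values rather than figures from this draft.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"          # in profile: forward with the service intact
    DROP = "drop"                # Premium-style policing action
    BEST_EFFORT = "best_effort"  # Controlled-Load-style policing action


class PolicingTokenBucket:
    """Hypothetical policer: never queues packets, only classifies them."""

    def __init__(self, rate_bps: float, depth_bytes: float, out_of_profile: Action):
        self.rate = rate_bps / 8.0       # token fill rate in bytes per second
        self.depth = depth_bytes         # maximum burst, in bytes
        self.tokens = depth_bytes        # bucket starts full
        self.out_of_profile = out_of_profile
        self.last = 0.0                  # time of the previous arrival, seconds

    def police(self, now: float, size_bytes: int) -> Action:
        # Refill for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return Action.FORWARD
        # Out of profile: apply the configured policing action.
        return self.out_of_profile
```

Under these assumptions, a Premium-like meter would be built with a depth of about one packet and Action.DROP, while a Controlled-Load-like meter would use a larger depth and Action.BEST_EFFORT; the mechanism itself is unchanged, only its parameters differ.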
1011 It is also possible to implement Guaranteed Quality of Service 1012 using the mechanisms of Premium service. From RFC 2212 [9]: 1013 "The definition of guaranteed service relies on the result that the 1014 fluid delay of a flow obeying a token bucket (r, b) and being 1015 served by a line with bandwidth R is bounded by b/R as long as 1016 R is no less than r. Guaranteed service with a service rate R, 1017 where now R is a share of bandwidth rather than the bandwidth 1018 of a dedicated line approximates this behavior." The service 1019 model of Premium clearly fits this model. RFC 2212 states that 1020 "Non-conforming datagrams SHOULD be treated as best-effort 1021 datagrams." Thus, a policing Profile Meter that drops non- 1022 conforming datagrams would be acceptable, but it's also possible 1023 to change the action for non-compliant packets from a drop to 1024 sending to the best-effort queue. 1026 5.2 RSVP and BBs 1028 In this section we discuss how RSVP signaling can be used in 1029 conjunction with the BBs described in section 4 to deliver a more 1030 scalable end-to-end resource set up for Integrated Services. First 1031 we note that the BB architecture has three major differences with 1032 the original RSVP resource set up model: 1034 1. There exist a priori bilateral business relations between BBs of 1035 adjacent trust regions before one can set up end-to-end resource 1036 allocation; real-time signaling is used only to activate/confirm 1037 the availability of pre-negotiated Marked bandwidth, and to 1038 dynamically readjust the allocation amount when necessary. We 1039 note that this real-time signaling across domains is not required, 1040 but depends on the nature of the bilateral agreement (e.g., the 1041 agreement might state "I'll tell you whenever I'm going to use 1042 some of my allocation" or not). 1044 2. A few bits in the packet header, i.e.
the P-bit and A-bit, are 1045 used to mark the service class of each packet, therefore a full 1046 packet classification (by checking all relevant fields in the 1047 header) need be done only once at the leaf router; after that 1048 packets will be served according to their class bit settings. 1050 3. RSVP resource set up assumes that resources will be reserved 1051 hop-by-hop at each router along the entire end-to-end path. 1053 RSVP messages sent to leaf routers by hosts can be intercepted 1054 and sent to the local domain's BB. The BB processes the 1055 message and, if the request is approved, forwards a message to 1056 the leaf router that sets up appropriate per-flow packet 1057 classification. A message should also be sent to the egress border 1058 router to add to the aggregate Marked traffic allocation for 1059 packet shaping by the Profile Meter on outbound traffic. (It's 1060 possible that this is always set to the full allocation.) An RSVP 1061 message must be sent across the boundary to the adjacent ISP's 1062 border router, either from the local domain's border router or 1063 from the local domain's BB. If the ISP is also implementing the 1064 RSVP with a BB and diff-serv framework, its border router 1065 forwards the message to the ISP's local BB. A similar process (to 1066 what happened in the first domain) can be carried out in the ISP 1067 domain, then an RSVP message gets forwarded to the next ISP 1068 along the path. Inside a domain, packets are served solely 1069 according to the Marked bits. The local BB knows exactly how 1070 much Premium traffic is permitted to enter at each border router 1071 and from which border router packets exit. 1073 6. Recommendations 1075 This document has presented a reference architecture for 1076 differentiated services. Several variations can be envisioned, 1077 particularly for early and partial deployments, but we do not 1078 enumerate all of these variations here.
There has been a great 1079 market demand for differentiated services lately. As one of the 1080 many efforts to meet that demand this draft sketches out the 1081 framework of a flexible architecture for offering differential 1082 services, and in particular defines a simple set of packet 1083 forwarding path mechanisms to support two basic types of 1084 differential services. Although there remain a number of issues 1085 and parameters that need further exploration and refinement, we 1086 believe it is both possible and feasible at this time to start 1087 deployment of differentiated services incrementally. First, given 1088 that the basic mechanisms required in the packet forwarding path 1089 are clearly understood, both Assured and Premium services can 1090 be implemented today with manually configured BBs and static 1091 resource allocation. Initially we recommend conservative choices 1092 on the amount of Marked traffic that is admitted into the 1093 network. Second, we plan to continue the effort started with this 1094 draft and the experimental work of the authors to define and 1095 deploy increasingly sophisticated BBs. We hope to turn the 1096 experience gained from in-progress trial implementations on 1097 ESNet and CAIRN into future proposals to the IETF. 1099 Future revisions of this draft will present the receiver-based and 1100 multicast flow allocations in detail. After this step is finished, 1101 we believe the basic picture of an scalable, robust, secure 1102 resource management and allocation system will be completed. 1103 In this draft we described how the proposed architecture supports 1104 two services that seem to us to provide at least a good starting 1105 point for trial deployment of differentiated services. 
Our main 1106 intent is to define an architecture with three services, Premium, 1107 Assured, and Best effort, that can be determined by specific bit- 1108 patterns, but not to preclude additional levels of differentiation 1109 within each service. It seems that more experimentation and 1110 experience is required before we could standardize more than 1111 one level per service class. Our base-level approach says that 1112 everyone has to provide "at least" Premium service and Assured 1113 service as documented. We feel rather strongly about both 1) that 1114 we should not try to define, at this time, something beyond the 1115 minimalist two service approach and 2) that the architecture we 1116 define must be open-ended so that more levels of differentiation 1117 might be standardized in the future. We believe this architecture 1118 is completely compatible with approaches that would define 1119 more levels of differentiation within a particular service, if the 1120 benefits of doing so become well understood. 1122 7. Acknowledgments 1124 The authors have benefited from many discussions, both in 1125 person and electronically and wish to particularly thank Dave 1126 Clark who has been responsible for the genesis of many of the 1127 ideas presented here, though he does not agree with all of the 1128 content this document. We also thank Sally Floyd for comments 1129 on an earlier draft. A comment from Jon Crowcroft was partially 1130 responsible for our including section 5. Comments from Fred 1131 Baker made us try to make it clearer that we are defining two 1132 base-level services, irrespective of the bit patterns used to encode 1133 them. 1135 8. References 1137 [1] D. Clark, "Adding Service Discrimination to the Internet", 1138 Proceedings of the 23rd Annual Telecommunications Policy Research 1139 Conference (TPRC), Solomons, MD, October 1995. 1141 [2] V. 
Jacobson, "Differentiated Services Architecture", talk in 1142 the Int-Serv WG at the Munich IETF, August, 1997. 1144 [3] D. Clark and J. Wroclawski, "An Approach to Service 1145 Allocation in the Internet", Internet Draft draft-clark-diff-svc- 1146 alloc-00.txt, July 1997, also talk by D. Clark in the Int-Serv WG 1147 at the Munich IETF, August, 1997. 1149 [4] Braden et. al., "Recommendations on Queue Management 1150 and Congestion Avoidance in the Internet", Internet Draft, 1151 March, 1997. 1153 [4] Braden, R., Ed., et. al., "Resource Reservation Protocol 1154 (RSVP) - Version 1 Functional Specification", RFC 2205, 1155 September, 1997. 1157 [5] S. Floyd and V. Jacobson, "Link-sharing and Resource 1158 Management Models for Packet Networks", IEEE/ACM 1159 Transactions on Networking, pp 365-386, August 1995. 1161 [6] D. Clark, private communication, October 26, 1997 1163 [7] "Advanced QoS Services for the Intelligent Internet", Cisco 1164 Systems White Paper, 1997. 1166 [8] J. Wroclawski, "Specification of the Controlled-Load 1167 Network Element Service", RFC 2211, September, 1997. 1169 [9] S. Shenker, et. al., "Specification of Guaranteed Quality of 1170 Service", RFC 2212, September, 1997. 1172 [10] D. Clark and W. Fang, "Explicit Allocation of Best Effort 1173 Packet Delivery Service", IEEE/ACM Transactions on Networking, 1174 August, 1998, Vol6, No 4, pp. 362-373. also at: http:// 1175 diffserv.lcs.mit.edu/Papers/exp-alloc-ddc-wf.pdf 1177 Authors' Addresses 1179 Kathleen Nichols 1180 Cisco Systems, Inc. 1181 170 West Tasman Drive 1182 San Jose, CA 95134-1706 1184 Phone: 408-525-4857 1185 Email: kmn@cisco.com 1187 Van Jacobson 1188 Cisco Systems, Inc. 1189 170 West Tasman Drive 1190 San Jose, CA 95134-1706 1192 Email: van@cisco.com 1194 Lixia Zhang 1195 UCLA 1196 4531G Boelter Hall 1197 Los Angeles, CA 90095 1199 Phone: 310-825-2695 1200 Email: lixia@cs.ucla.edu 1202 Appendix: A Combined Approach to Differential Service in the Internet by 1203 David D. 
Clark 1205 After the draft-nichols-diff-svc-00 was submitted, the co-authors had a 1206 discussion with Dave Clark and John Wroclawski which resulted in Clark's 1207 using the presentation slot for the draft at the December 1997 IETF 1208 Integrated Services Working Group meeting. A reading of the slides shows 1209 that it was Clark's proposal on "mechanisms", "services", and "rules" 1210 and how to proceed in the standards process that has guided much of the 1211 process in the subsequently formed IETF Differentiated Services Working 1212 Group. We believe Dave Clark's talk gave us a solid approach for 1213 bringing quality of service to the Internet in a manner that is 1214 compatible with its strengths. 1216 The slides presented at the December 1997 IETF Integrated Services 1217 Working Group are included with the Postscript version.