idnits 2.17.1 draft-blake-diffserv-marking-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-03-19) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. 
== No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 8 longer pages, the longest (page 2) being 59 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There is 1 instance of too long lines in the document, the longest one being 1 character in excess of 72. ** The abstract seems to contain references ([RFC795,RFC1349], [IPv6]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 1515 has weird spacing: '...itoring funct...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (December 1997) is 9591 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'Bohn93' is defined on line 1821, but no explicit reference was found in the text == Unused Reference: 'Crow97' is defined on line 1843, but no explicit reference was found in the text == Unused Reference: 'Ferg97' is defined on line 1866, but no explicit reference was found in the text == Unused Reference: 'May97' is defined on line 1902, but no explicit reference was found in the text == Unused Reference: 'RFC1812' is defined on line 1940, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'ACTIVE' -- Possible downref: Non-RFC (?) normative reference: ref. 'AH' -- Possible downref: Non-RFC (?) normative reference: ref. 'Bohn93' -- Possible downref: Non-RFC (?) normative reference: ref. 'CBQ' -- Possible downref: Non-RFC (?) normative reference: ref. 'CCBES' -- Possible downref: Non-RFC (?) normative reference: ref. 'Clark97' -- Possible downref: Non-RFC (?) normative reference: ref. 'CLASSY' -- Possible downref: Non-RFC (?) normative reference: ref. 'Crow97' -- Possible downref: Non-RFC (?) normative reference: ref. 'ECN94' -- Possible downref: Non-RFC (?) normative reference: ref. 'ECN97' -- Possible downref: Non-RFC (?) normative reference: ref. 'ESP' -- Possible downref: Non-RFC (?) normative reference: ref. 'Feng97' -- Possible downref: Non-RFC (?) normative reference: ref. 'Ferg97' -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd91' -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd97' -- Possible downref: Non-RFC (?) normative reference: ref. 'FRED' -- Possible downref: Non-RFC (?) normative reference: ref. 'GBH97' -- Possible downref: Non-RFC (?) normative reference: ref. 
'HFSC' -- Possible downref: Non-RFC (?) normative reference: ref. 'HPFQA' -- Possible downref: Non-RFC (?) normative reference: ref. 'IPv6' -- Possible downref: Non-RFC (?) normative reference: ref. 'IS802' -- Possible downref: Non-RFC (?) normative reference: ref. 'May97' -- Possible downref: Non-RFC (?) normative reference: ref. 'McCanne' -- Possible downref: Non-RFC (?) normative reference: ref. 'MPLS' -- Possible downref: Non-RFC (?) normative reference: ref. 'QOSP' -- Possible downref: Non-RFC (?) normative reference: ref. 'RED' ** Downref: Normative reference to an Historic RFC: RFC 795 ** Downref: Normative reference to an Unknown state RFC: RFC 1046 ** Obsolete normative reference: RFC 1349 (Obsoleted by RFC 2474) ** Obsolete normative reference: RFC 1583 (Obsoleted by RFC 2178) ** Downref: Normative reference to an Informational RFC: RFC 1633 ** Obsolete normative reference: RFC 1883 (Obsoleted by RFC 2460) -- Possible downref: Non-RFC (?) normative reference: ref. 'Shenker' -- Possible downref: Non-RFC (?) normative reference: ref. 'SIMA' -- Possible downref: Non-RFC (?) normative reference: ref. 'TWOBIT' Summary: 15 errors (**), 0 flaws (~~), 8 warnings (==), 31 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force Steven Blake 3 INTERNET-DRAFT IBM Corporation 4 Expires: June 1998 5 December 1997 7 Some Issues and Applications of Packet Marking 8 for Differentiated Services 10 12 Status of This Memo 14 This document is an Internet-Draft. Internet-Drafts are working 15 documents of the Internet Engineering Task Force (IETF), its areas, 16 and its working groups. Note that other groups may also distribute 17 working documents as Internet-Drafts. 
19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet-Drafts as reference 22 material or to cite them other than as "work in progress." 24 To learn the current status of any Internet-Draft, please check the 25 "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow 26 Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe), 27 munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or 28 ftp.isi.edu (US West Coast). 30 Abstract 32 ''Packet marking'' is proposed as an architectural generalization of 33 the type of service (TOS) and precedence facilities of IPv4 [RFC795, 34 RFC1349], as well as the traffic class facilities of IPv6 [IPv6]. It 35 is intended to encompass all mechanisms by which a host or a router 36 may mark a packet to invoke some differentiated packet handling 37 behavior by another node along the transit path of the packet. This 38 memo examines several proposed applications of a packet marking 39 facility and attempts to categorize each application in terms of the 40 behavioral requirements it imposes on hosts and routers. In 41 addition, issues related to the deployment of packet marking, 42 including provisioning, authorization, and security, are examined. 43 This memo is proposed as a framework to focus discussion on 44 implementation issues and mechanisms as new differentiated services 45 enabled by a packet marking facility are introduced into the 46 Internet. 48 Blake Expires: June 1998 [Page 1] 49 Table of Contents 51 1. Introduction .................................................... 3 53 2. Motivation ...................................................... 4 55 3. Some Proposed Applications of Packet Marking .................... 6 56 3.1 Explicit Priority ............................................ 6 57 3.1.1 Delay Priority ........................................... 
6 58 3.1.2 Drop Priority ............................................ 7 59 3.1.3 Network Control Priority ................................. 8 60 3.2 Explicit Service Class Indication ............................ 8 61 3.2.1 Precedence Service Classes ............................... 9 62 3.2.2 Transport Isolation ...................................... 10 63 3.2.3 Aggregated Integrated Services Classes ................... 10 64 3.2.4 Service-based Route Selection ............................ 11 65 3.3 Best-Effort Service Allocation ............................... 11 66 3.4 Integrated Services Conformant Packet Indication ............. 13 67 3.5 Forward Explicit Congestion Notification ..................... 14 69 4. Differentiation Mechanism Categorization ........................ 16 70 4.1 Host Packet Processing Mechanism Categorization .............. 16 71 4.2 Router Packet Processing Mechanism Categorization ............ 17 72 4.3 Biased vs. Substitute Best-Effort Router Mechanisms .......... 19 73 4.3.1 Transmission Mechanism Categorization .................... 19 74 4.3.2 Path Selection Mechanism Categorization .................. 22 76 5. Service Categorization .......................................... 22 77 5.1 Service Granularity .......................................... 22 78 5.2 Service Invocation ........................................... 23 79 5.3 Service Behavior ............................................. 24 80 5.4 Direction of Value ........................................... 25 82 6. Fairness and Congestion Control Considerations .................. 26 84 7. Provisioning Considerations ..................................... 27 86 8. Authorization Considerations .................................... 29 88 9. Routing Considerations .......................................... 30 90 10. System Implementation Considerations ............................ 31 92 11. Standardization Considerations .................................. 33 94 12. 
Security Considerations ......................................... 34 96 13. Acknowledgements ................................................ 35 98 14. References ...................................................... 35 100 Author's Address .................................................... 39 102 Blake Expires: June 1998 [Page 2] 103 1. Introduction 105 Best-effort networks such as the current Internet provide a service 106 to users which can best be characterized as delivering connectivity 107 plus a weak guarantee of fair access to network resources. In this 108 sort of network the performance of individual applications is highly 109 dependent on the instantaneous demand for network resources. 110 Although this level of service has proven satisfactory for a wide 111 variety of uses, there exist both applications and users which would 112 benefit substantially from a more predictable level of performance, 113 or from an "inequitable" share of the network resources. As a 114 consequence, there has been significant effort to define new 115 mechanisms to enable differentiated services within the Internet. 117 This document examines a particular class of differentiation 118 mechanisms that are triggered by a facility we refer to loosely as 119 "packet marking". Network nodes (routers and hosts) can implement, 120 in addition to the classical best-effort service, a variety of packet 121 processing, forwarding, buffer management, and scheduling behaviors 122 to differentiate packet queueing delay, packet loss, and application 123 flow throughput. These alternative differentiation mechanisms can 124 be invoked for a particular packet by marking it; i.e., by setting 125 some combination of one or more bits in a "packet handling" field in 126 the packet header. Note that both the IPv4 and IPv6 protocols 127 possess header fields intended specifically for this purpose (TOS/ 128 Precedence [RFC1349], and Class [IPv6], respectively). 
We will use 129 the term "PH field" to signify the packet handling field in the 130 general case. 132 A differentiated service is provided for a stream of packets by 133 marking (possibly a subset of) the constituent packets thereby 134 invoking some differentiation mechanism for each marked packet on the 135 nodes along the stream's path. A stream here may represent various 136 granularities of traffic ranging from an individual application flow 137 all the way to the aggregate traffic exchanged between a pair of 138 service providers. The differentiated service when invoked is 139 visible to the application or user as some change in a quantitative 140 characteristic of the aggregate packet transport (e.g., average delay, 141 loss, or throughput). 143 A distinguishing feature of the differentiated services mechanisms 144 examined here is that they treat packets with identical markings 145 equivalently; the mechanisms act on aggregated classes of packets 146 (where a class represents those packets with particular markings) and 147 they operate without per-flow state in every node along a packet's 148 path. This is in contrast to the Integrated Services model 149 [RFC1633], where per-flow classification, policing, and scheduling 150 state is installed and maintained on nodes along the path either via 151 application signaling [RSVP] or via administrative configuration. 152 While the Integrated Services mechanisms provide more granular QoS 153 guarantees to individual application flows, the requirement for 154 application signaling and per-flow state in the network introduces 156 Blake Expires: June 1998 [Page 3] 157 performance, scalability, and application compatibility issues. 158 Many applications can still benefit while utilizing a simpler and 159 more scalable set of differentiation mechanisms. Note also that 160 packet marking may help facilitate Integrated Services state 161 aggregation in the interior of the Internet (see Sec. 3.2.3). 
163 The concept of packet marking described in this document should be 164 distinguished from the efforts of the Multi-protocol Label Switching 165 working group [MPLS]. MPLS is primarily intended to simplify router 166 forwarding implementations and to enable enhanced routing services. 167 However, some of the issues discussed in this document may be 168 relevant to MPLS as well. 170 Section 2 of this document elaborates on the motivations for 171 deploying packet marking differentiated services. Section 3 outlines 172 some of the proposed applications of packet marking. Section 4 173 introduces a categorization of differentiation mechanisms for hosts 174 and routers and describes how the proposals examined in Sec. 3 fit 175 into this categorization. Section 5 provides a categorization of 176 differentiated services enabled by packet marking. Sections 6-12 177 address various issues of fairness, congestion control, provisioning, 178 authorization, routing, implementation, standardization, and security 179 which are introduced with the implementation and deployment of packet 180 marking differentiated services. 182 2. Motivation 184 As discussed in [Shenker], different applications have different 185 utility functions of the bandwidth provided for the application by 186 the network infrastructure. These different applications (e.g., 187 elastic (best-effort); hard, delay-adaptive, and rate-adaptive real- 188 time) exhibit varying sensitivity to the transient and steady- 189 state level of resources available from the network. This 190 sensitivity is often a function of human factors considerations, but 191 can also result from the fundamental characteristics of the 192 application (e.g., distributed database synchronization). 194 In addition, users and organizations may realize greater utility from 195 a particular subset of the applications (either elastic or inelastic) 196 in use, due to personal or business objectives. 
As an example, many 197 corporate networks prioritize transaction processing and interactive 198 traffic from some applications over batch and other applications' 199 traffic due to their immediate impact on business operations. 201 Finally, different customers of a network service provider may 202 realize greater or lesser utility from the network service they 203 receive. These customer utilities are not necessarily proportional 204 to the speeds of their access links to the service provider. 206 Network owners, be they private network managers or public Internet 207 service providers, wish to maximize their return on investment in 209 Blake Expires: June 1998 [Page 4] 210 network infrastructure. Their ability to increase revenue is 211 constrained by the elasticity of demand, the mixture of application 212 types, and the means of pricing and cost-recovery available to them, 213 but in general they stand to benefit if they can tailor their service 214 offerings and pricing structures to satisfy the entire range of 215 application and customer service requirements, by allowing each 216 customer to maximize his utility/cost function. Because of the 217 variation in customer utility functions, differentiated pricing 218 (e.g., by Service Level Agreements) is a key revenue-generating 219 mechanism, but its success depends on the ability to engineer the 220 network to satisfy the diversity of application/customer 221 service requirements. 223 One means of satisfying these requirements is to engineer 224 differentiated packet handling mechanisms into the network. These 225 mechanisms, which can be conceptualized as mechanisms for 226 prioritization and resource allocation, allow the service provider 227 to provision the network for each of the offered classes of service 228 so as to meet the application/customer requirements at the level of 229 statistical assurance promised. 
231 Another means of satisfying these requirements is to over-provision 232 the network, so that it tends to run at low utilization with minimal 233 congestion. One problem with over-provisioning as a strategy for 234 enabling differentiated services is that any best-effort network 235 (i.e., without admission control) with concentration points can 236 experience transient congestion and loss, which will make it 237 difficult to support the most rigorous of application requirements. 238 Another, more fundamental, difficulty is that over-provisioning 239 provides better service to some customers than they would be willing 240 to pay for (as judged by their utility functions). Whether over- 241 provisioning is a cost-effective method of service differentiation 242 as compared to providing differentiation mechanisms within the 243 network depends on the level and type of application/customer demand, 244 the incremental cost of additional network infrastructure, and the 245 rate of change in demand. However, it appears clear that in an 246 environment (such as today's) where there is explosive growth in the 247 users of and traffic in the Internet, low-complexity 248 differentiation mechanisms offer a more rapid and effective means of 249 tailoring network service offerings than network over-provisioning. 251 Classifying packets to determine their service class can be 252 implemented by a number of means, including source/destination 253 address, protocol, and TCP/UDP port filtering. A problem with 254 filtering at this level of granularity is that each router along the 255 path of a packet will require the necessary filtering rules to 256 determine the service class. This has scaling problems, both in 257 terms of the number of filtering rules and in the need for 258 mechanisms to dynamically add and delete rules according to 259 changes in customer and application traffic. 
A further problem is 260 that, with the deployment of IP Security, transport payloads 261 encrypted within an Encapsulated Security Payload (ESP) cannot be 263 Blake Expires: June 1998 [Page 5] 264 identified by a router, because the protocol and port values are 265 obscured [ESP]. This will prevent any service differentiation of 266 encrypted traffic at the application level. 268 A motivation for deploying packet marking is that the network routers 269 need only filter on the value of the PH field to determine the 270 appropriate differentiation mechanisms to apply to a packet. When 271 coupled with aggregate buffer management and packet scheduling 272 mechanisms, as well as network authorization of the PH values and 273 adequate provisioning, packet marking provides a scalable mechanism 274 for offering differentiated transport services to different traffic 275 streams. These differentiation mechanisms may be useful without 276 explicit network authorization and provisioning to allow best-effort 277 applications to trade some fraction of their fair share packet rate 278 for a lower loss rate or for lower average queueing delay [RFC1046]. 279 Packet marking may also be useful as a means for improving the 280 scalability of per-flow Integrated Services by simplifying the 281 implementation of flow aggregation and by improving the efficiency of 282 the Integrated Services packet classification mechanism [CLASSY, 283 GBH97]. 285 3. Some Proposed Applications of Packet Marking 287 In this section we describe some proposed applications of a packet 288 marking facility. We are using the term "application" here to refer 289 to one or more services which could be delivered by specifying the 290 semantics of one or more PH bits and by specifying the 291 differentiation mechanisms they invoke within the network. 
Some of 292 the proposed semantics are explicit regarding the service requested 293 (e.g., transport isolation) while others could be used to provide a 294 variety of services. Several proposals overlap in terms of the 295 differentiation mechanisms utilized, and as such, a common set of 296 PH bits could be used to enable these proposed services. We will 297 examine the proposals individually here and will then categorize them 298 more rigorously in Sec. 4. 300 3.1 Explicit Priority 302 One application of packet marking is to provide an explicit request 303 for priority handling of the packet by routers. Priorities are 304 usually ranked in a strict hierarchy relative to some metric (e.g., 305 delay, loss probability). The priority value is usually intended to 306 reflect the importance of a packet relative to other packets from the 307 source as well as to packets from other sources [RFC795, RFC1046, 308 RFC1812 Sec. 5.3.3]. In this section we assume the router does not 309 perform priority based path selection (this is discussed in Sec. 310 3.2.4). 312 3.1.1 Delay Priority 314 Delay priority indication within a packet is intended to convey the 316 Blake Expires: June 1998 [Page 6] 317 sensitivity of an application to router queueing delay. A possible 318 range of values is the following: 320 o High -- prefer low maximum queueing delay; jitter 321 sensitive, 322 o Interactive -- prefer low average queueing delay; jitter 323 insensitive, 324 o Regular -- can tolerate the normal delay distribution 325 delivered by best-effort FIFO queueing, 326 o Low -- can tolerate extensive queueing delay or jitter. 328 A greater granularity of delay priority values is possible. However, 329 without strict per-flow admission control and policing, quantifiable 330 bounds on the delay distribution at a particular priority level are 331 difficult to determine. 
Delay priority is useful for allowing 332 applications which are delay sensitive to avoid large queues, 333 possibly at the expense of a higher packet loss rate, while permitting 334 applications which are not sensitive to queueing delay to utilize 335 router buffers to avoid packet loss and achieve higher throughput (or 336 to avoid much worse round-trip time (RTT) delays). One simple 337 implementation mechanism is to provide a separate queue for each 338 delay priority level, with strict priority service between the 339 levels. A problem with this sort of implementation is possible 340 starvation of service at the lower delay priority levels. 341 Implementation issues of delay priority are further discussed in Sec. 342 4. 344 3.1.2 Drop Priority 346 Drop or discard priority indication within a packet is intended to 347 convey the sensitivity of an application (or a sub-layer of its 348 traffic) to packet losses. A possible range of values is the 349 following: 351 o Critical -- extremely loss sensitive; do not discard while other 352 non-critical packets are queued, 353 o High -- loss sensitive; drop only under extreme congestion, 354 o Regular -- can tolerate normal loss rates under active buffer 355 management [RED, ACTIVE], 356 o Low -- not sensitive to loss; discard under light 357 congestion. 359 A greater granularity of drop priority values is possible; however, 360 as with delay priority, in the absence of strict per-flow admission 361 control and policing, quantifiable bounds on the loss probabilities 362 at a particular priority level are difficult to determine. 363 Furthermore, it may be difficult to engineer several levels of drop 364 priority without introducing delay for the higher drop priority 365 levels under congestion. One possible implementation of drop 366 priority is to use multiple thresholds of packet occupancy in a 367 single FIFO queue to trigger the discard action for incoming packets 368 at a particular drop priority level. 
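As an illustration only, the multiple-threshold approach just described might be sketched as follows in Python. The sketch assumes deterministic discard based on the instantaneous queue occupancy, counted in packets. The class name ThresholdFifo, the priority constants, and the threshold values are hypothetical and are not taken from any of the cited proposals. It also does not attempt the stricter "do not discard while other non-critical packets are queued" behavior listed for the Critical level, which would need a pushout mechanism.

```python
from collections import deque

# Hypothetical numeric codes for the four drop priorities listed above.
LOW, REGULAR, HIGH, CRITICAL = 0, 1, 2, 3


class ThresholdFifo:
    """Single FIFO queue with one occupancy threshold per drop priority."""

    def __init__(self, thresholds):
        # thresholds[p] is the occupancy (in packets) at or above which
        # an arriving packet of drop priority p is discarded.
        self.thresholds = dict(thresholds)
        self.queue = deque()

    def enqueue(self, packet, drop_priority):
        """Return True if the packet was queued, False if it was discarded."""
        if len(self.queue) >= self.thresholds[drop_priority]:
            return False
        self.queue.append(packet)
        return True

    def dequeue(self):
        """Remove and return the oldest queued packet, or None if empty."""
        return self.queue.popleft() if self.queue else None
```

With assumed thresholds such as {LOW: 2, REGULAR: 4, HIGH: 6, CRITICAL: 8}, arriving Low packets are discarded once two packets are queued, while Critical packets are still accepted until the queue holds eight. All packets share one queue, so ordering within a flow is preserved regardless of marking.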
These thresholds could be based 370 Blake Expires: June 1998 [Page 7] 371 on the instantaneous queue occupancy with deterministic discard or on 372 an averaged queue occupancy with stochastic discard [Clark97, Feng97] 373 (see Sec. 4 for further discussion of implementation issues). Drop 374 priority is useful both for improving the throughput of more 375 important application flows and for enabling rate-adaptive 376 multi-layer audio and video applications, which can adjust their 377 rates after detecting impending congestion due to the drop of lower 378 priority packets of the encoded signal, while still protecting the 379 higher quality components of the signal from loss. Such an approach 380 to layering has superior control-plane scalability to alternatives 381 such as receiver-driven layered multicast [McCanne] (however, there 382 are issues of fairness and congestion which may bias an application toward 383 the alternative method (see Sec. 6)). 385 3.1.3 Network Control Priority 387 Network Control priority indication within a packet is intended to 388 indicate that the packet is a component of a network control protocol 389 exchange whose correct and timely operation is critical to the 390 stability of the network. It is primarily intended for use with 391 routing protocols (e.g., RIP, OSPF, IS-IS, BGP), but could also be 392 used for other network signaling and control protocols (e.g., SNMP, 393 RSVP, MPLS) [RFC1812 Sec. 7.1.2]. The value of prioritizing routing 394 traffic over data traffic is to prevent routing collapse under heavy 395 load (e.g., preventing BGP connection timeouts due to excessive TCP 396 losses and retransmits). The value of prioritizing SNMP traffic is 397 to eliminate a denial-of-service attack (where the network manager 398 cannot monitor or configure a network element). 
A sensible 399 implementation will both guarantee an extremely low loss rate for 400 network control packets (i.e., by never discarding a network control 401 packet when other types of packets are queued) and will attempt to 402 bound the queueing delay they experience. This could be accomplished 403 by implementing a separate network control queue with strict 404 priority, or by providing priority pushout within a single FIFO queue 405 (implementation issues are further discussed in Sec. 4). Because 406 network control traffic is usually a small fraction of the total 407 traffic within a network, this prioritization should not have a 408 noticeable impact on data transport performance. However, because of 409 the high priority provided for this class of traffic, only routers 410 and network management stations should be allowed to set the network 411 control priority indication, and the network should take steps to 412 authenticate the source of a packet with the priority indication set 413 (see Sec. 12). 415 3.2 Explicit Service Class Indication 417 Another application of packet marking is to explicitly indicate a 418 "service class" for a packet. A service class is a more general 419 concept than delay or drop priority. It can be associated with a set 420 of resources provisioned within the interior of the network (e.g., 421 bandwidth, buffers, routes) for a particular set of application/ 422 customer traffic flows which are mapped onto that class. The service 424 Blake Expires: June 1998 [Page 8] 425 class concept does not impose any strict hierarchy of delay, loss, 426 or throughput priority between classes, but instead may permit the 427 specification of quantitative bounds on delay, loss, or throughput 428 performance for a class for a particular traffic profile within that 429 class. Issues regarding the implementation of explicit service 430 classes are discussed in Sec. 4. 
432 3.2.1 Precedence Service Classes 434 One instantiation of the service class concept is to provide a set of 435 "precedence" service classes, in a manner very similar to the delay 436 and drop priorities discussed in Secs. 3.1.1 and 3.1.2, but with 437 potentially more flexibility in the provisioning of the classes. 438 Each individual class would be provisioned to provide "better" 439 service than the class of immediately lower rank, where the precise 440 definition of better service could for instance be defined as a 441 higher probability of timely delivery; i.e., lower probability of 442 loss and lower average delay. Each class in the hierarchy could be 443 engineered to provide a statistically quantifiable service for some 444 expected or regulated load, while also being engineered to prevent 445 starvation of service to the lowest precedence classes. Regulation 446 could be implemented in the form of dynamic service class mapping and 447 policing at the edge of the network. The result of implementing this 448 range of services would likely be improved throughput for application 449 flows in the higher precedence classes. 451 One application of this scheme would be to map business-critical 452 transaction traffic to a service class of high precedence, while 453 mapping casual web browsing traffic to a lower precedence class. 454 These classes could be implemented using a variety of methods, 455 including some variant of Weighted Fair Queueing (WFQ) or Class-based 456 Queueing (CBQ) [HPFQA, HFSC, CBQ]. Depending on the precise service 457 guarantees promised for the classes, they could potentially be 458 implemented using combinations of the explicit delay and drop 459 priority PH markings and router mechanisms described in Secs. 3.1.1 460 and 3.1.2. 
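As a rough illustration of how precedence classes might share a link without starving the lowest class, the following Python sketch uses a simple packet-count weighted round-robin scheduler. This is a deliberately simplified stand-in and is not Weighted Fair Queueing or Class-based Queueing as described in [HPFQA, HFSC, CBQ]: it ignores packet sizes and provides no delay bounds. The class name WeightedRoundRobin, the service class names, and the weights are all hypothetical.

```python
from collections import deque


class WeightedRoundRobin:
    """Packet-count weighted round-robin over per-class FIFO queues."""

    def __init__(self, weights):
        # weights maps a service class name to the maximum number of
        # packets served from that class in each round (assumed values).
        self.weights = dict(weights)
        self.queues = {name: deque() for name in self.weights}

    def enqueue(self, service_class, packet):
        """Queue a packet that the network edge has mapped to a class."""
        self.queues[service_class].append(packet)

    def next_round(self):
        """Serve each class in turn, up to its weight, and return the packets."""
        sent = []
        for name, weight in self.weights.items():
            queue = self.queues[name]
            for _ in range(min(weight, len(queue))):
                sent.append(queue.popleft())
        return sent
```

For example, with weights {"transaction": 3, "browsing": 1}, a backlog in both classes yields three transaction packets for every browsing packet per round. The lower precedence class still receives service in every round, which is the starvation-avoidance property the text asks for.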
[TWOBIT] describes a particular precedence service class implementation which relies on authorization and policing/shaping at the network edge and strict delay priority queueing in the interior routers. Flows or flow aggregates assigned to the "Premium" service class are policed based on a peak rate limit and any residual bursts are shaped at the network edge; this smooths the characteristics of the Premium traffic, which results in minimal accumulated queueing delay in the interior routers (when the total Premium service load is moderate). Packets which exceed the negotiated peak-rate limit are discarded. Per-router service class provisioning is not required in this scheme since two-level strict priority queueing is used as the differentiation mechanism. However, Premium service must be conservatively allocated to prevent starvation of the best-effort service queue.

3.2.2 Transport Isolation

Although TCP is the most widely used transport protocol in the public Internet, there are alternative transport protocols which may have a more or less aggressive response to network congestion or packet loss. When run in parallel with TCP traffic, these transport protocols, some of which are also used in alternative protocol packet networks, may be unable to achieve their fair share rate (if they are less aggressive), or may prevent TCP flows from achieving their fair share rates (if they are more aggressive). These transport protocols can be isolated from each other by mapping the flows which utilize them to different service classes which are appropriately provisioned for some level of minimal service. A router may queue these transport-isolated service classes separately.
The flows within each service class queue then only compete with each other for the minimal guaranteed bandwidth which is provisioned for that class, and can temporarily consume the bandwidth provisioned for other classes whenever those classes are unloaded. An advantage of transport isolation is that it can protect normal best-effort TCP traffic from some well-known misbehaved transport protocols. The minimal bandwidth for each transport service class must be provisioned at each router.

3.2.3 Aggregated Integrated Services Classes

Integrated Services and RSVP as currently specified depend on per-flow signaling and per-flow packet classification, using the destination address, protocol, and destination port of a packet, and often also the source address and source port [RFC1633, RSVP]. As has been mentioned previously, the requirement for per-flow control and classification state may introduce scalability problems in the interior of the Internet, where the demand for reservations on high-speed links may exceed several thousand simultaneous flows. Scalability can be improved by aggregating both the control and packet classification state generated by a set of (unicast) flows transiting a particular path through a segment of the Internet. This approach was introduced in [CLASSY], and is examined in more detail in [GBH97] and [TWOBIT]. It should be noted that no feasible design for the aggregation of multicast flows has been published.

The proposals for control-state aggregation are not the topic of this memo (the reader is encouraged to see [GBH97] and [TWOBIT]). However, the details of packet classification aggregation are relevant here.
The basic concept is to mark traffic corresponding to an Integrated Services traffic class with a particular PH service class marking (e.g., Controlled Load, Guaranteed) at the router along the flows' path which initiates aggregation. Subsequent routers within the aggregating region do not classify the aggregated reserved packets using the normal per-flow/session packet filters but instead classify them based on their PH service class marking. These flows are then serviced using a queue which has been either statically or dynamically provisioned to provide the required QoS for the total set of aggregated Integrated Services flows of a particular traffic class which traverses that link. Aggregated RSVP control messages between routers on the edge of the aggregating region could be used to specify the aggregated reservation request between those nodes, and the interior routers could use this information to perform admission control and to dynamically adjust the resources (e.g., bandwidth, buffers) which are allocated to each aggregated Integrated Services traffic class. More than two service classes could be used for aggregation, each provisioned to deliver a particular QoS for the flows utilizing that class.

One additional packet marking requirement introduced by aggregation is the need to explicitly mark those packets of a reserved flow which do not conform to the flow's reservation Tspec (see Sec. 3.4). This marking is necessary since per-flow policing is not possible within the interior of the aggregating region, and a single non-conformant flow could reduce the QoS delivered to all other flows aggregated into its service class. The alternative of marking these non-conformant flows as best-effort could lead to unnecessary packet re-ordering.
Furthermore, it is critical that flows not policed at an RSVP aggregation point not be marked with one of the aggregated Integrated Service class PH markings and serviced using the resources dedicated to the aggregated flows. These aggregated service classes require isolation from other (potentially real-time) traffic, since resources have been specifically dedicated to them based on advertised and regulated traffic loads. The routers on the edge of the aggregating region must prevent unauthorized use of these PH markings by non-reserved flows.

3.2.4 Service-based Route Selection

An alternative means of differentiating the service provided to a given class of traffic is to implement service-specific routes (i.e., TOS routes) [RFC1349, RFC1583]. Service-specific routes can be defined based on those characteristics of packet transport that are largely affected by the path selected between two end-nodes. The canonical set of characteristics used are delay, reliability, throughput, and cost [RFC1349], although routes based on a more detailed set of characteristics could in principle be defined. Routing protocols can then be used to compute service-specific routes by factoring in different link and path metrics (e.g., propagation delay, bit error rate, link rate, transport cost).

Issues surrounding service-specific route selection are examined in Sec. 9.

3.3 Best-Effort Service Allocation

There have been several recent proposals to use packet marking mechanisms to provide best-effort service allocation [Clark97, SIMA, Crow96, May97, Feng97, Bohn93, Ferg97, TWOBIT]. The term best-effort service allocation refers to the notion of providing different expectations of best-effort service to different categories of users or applications based on some negotiated service profile with the network.
These expectations could be characterized in terms of average throughput, loss, or delay, along with variance estimates (statistical assurance levels) for these metrics. These schemes are motivated primarily by the desire to provide different tiers of service to elastic best-effort applications; in their simplest form they rely on network dimensioning with authorization and enforcement at the network edge to provide statistical assurances on performance which may not be suitable for all application types. Explicit provisioning of resources in the interior of the network is not precluded, but the proposals are designed to work effectively using only simple differentiation mechanisms in the interior routers, such as strict drop or delay priority. We outline three of the proposed schemes here.

[Clark97] describes two proposals for service allocation; one relying on marking at the network edge near the sender (sender-based scheme), and the other relying on marking within the network and reaction at the network edge near the receiver (receiver-based scheme). We describe the sender scheme here and the receiver scheme in Sec. 3.5.

In the sender scheme, a profile meter monitors the traffic load generated by a source and marks traffic which falls within the negotiated profile (which might be defined by a token bucket, for example) with an in-profile indicator. The in-profile marking is interpreted as the inverse of a drop preference indicator by the interior routers, which preferentially discard drop preference traffic (i.e., packets without the in-profile marking) whenever impending congestion is detected. The proposed differentiation mechanism is a weighted variant of the RED algorithm [RED], termed "RED with In and Out" (RIO), which uses a single FIFO queue and two RED drop thresholds, the lower being assigned to the drop preference traffic.
This mechanism is designed to shut down out-of-profile flows as the in-profile traffic utilization on a link approaches one. No explicit provisioning of resources to the two levels of service is required in the interior. This basic scheme can be augmented by defining two or more levels of service assurance (e.g., statistical, assured). The service provider must dimension the network and provision the profiles of assured sources carefully to reduce the probability of congestion loss. In addition, some form of differentiation must be implemented in the router (such as separate provisioned queues) to preferentially deliver in-profile assured packets under congestion.

A conceptually similar scheme is described in [SIMA]. In this proposal, the user contracts with the network for a Nominal Bit Rate (NBR) of service. A monitor at the edge of the network measures the traffic load relative to the NBR and dynamically computes a level of drop preference (out of seven) for the packet. A packet flow at rate NBR would be marked with a mid-range drop-preference; the network would be dimensioned to provide a small loss ratio at this level. Pricing of service is governed by the size of the NBR; users can achieve better service by purchasing a larger NBR and underutilizing it (thereby receiving low drop-preference for their packets). The network routers implement preferential discard of traffic based on a series of thresholds for each drop-preference level. The system also allows the user to select real-time or non-real-time service. Real-time packets are served by a small queue which receives strict-priority service over the non-real-time queue. The small size of the queue and the potentially higher loss rate give the user an incentive not to utilize the real-time service for elastic applications.

[Feng97] describes a design that is similar to [Clark97].
One salient difference is that the profile-meter at the network edge (termed the Packet Marking Gateway (PMG)) statistically marks packets for priority service based on some computation of the number of priority marked packets needed to achieve a target throughput. The network routers perform service differentiation using a weighted RED implementation. One extension of this work is the incorporation of the packet marking facility within the TCP congestion control algorithm.

3.4 Integrated Services Conformant Packet Indication

Packet marking can be used to simplify and enhance the implementation of Integrated Services, by marking packets within an Integrated Services flow which conform to the flow's Tspec (or conversely by marking non-conformant packets). As was mentioned in Sec. 3.2.3, non-conformant packet marking is essential to permitting RSVP aggregation, as per-flow policing is not possible when the control-state is aggregated and non-conformant packets can degrade the QoS of other aggregated flows. Packet marking of conformant packets may be useful for non-aggregated Integrated Services flows as well, as it can provide a hint to routers as to which packets may require classification (a computationally expensive procedure) as well as providing an indication as to which packets of the flow have failed policing upstream. We describe both a conformant packet marking scheme and its dual below.

In the first scheme, a single bit is used to indicate that the packet belongs to a flow and that the packet has not failed a policing function (it may be conformant). Only packets which have this bit set are flow-classified by the routers, and only these packets are counted against the flow's Tspec.
There are three alternatives for where and when this bit should be set:

   o  the bit is set by the source when it has sent a PATH message for the flow,

   o  the bit is set by the source only when it has received a RESV message for the flow,

   o  the bit is set in the network by the farthest upstream router which accepted a RESV for the flow (often the source's first hop router).

An issue with alternative one is that the bit may be set for flows which are never reserved. An issue with alternative two is that the semantics of the bit do not permit partial reserved paths (where the reservation succeeds partially upstream from the receiver but fails before reaching the source) since the bit will never be set (by the host) and the routers will never classify the packet. This issue can be addressed in part by alternative three, but in this case, the farthest upstream router must classify every packet received on the same interface as traversed by the flow to identify its packets; this offsets the main advantage of providing the indication. Furthermore, this introduces a dependency on the behavior of an upstream router (since the farthest upstream router which accepted the reservation must wait for RESVERR messages to guess whether an upstream node has accepted the reservation and will mark the packets).

In the second scheme, the bit is only used to mark packets which have failed a policing function (non-conformant). Every packet which does not have the bit set is flow-classified, eliminating a potential performance advantage of the first scheme since in this case all best-effort packets are also classified. The bit is set if a packet fails a flow's Tspec policing function (token bucket). Downstream routers could choose not to classify these packets or could choose not to count them against the flow's Tspec.
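The second scheme's marking step can be sketched as a token bucket that sets a non-conformant flag rather than discarding the packet. This is only an illustrative sketch: the class name, the byte-based accounting, and the example rate and bucket depth are assumptions made here, and a real Tspec policer carries additional parameters (such as a peak rate) that are omitted.

```python
class TokenBucketMarker:
    """Marks packets that exceed a (rate, depth) token bucket profile."""

    def __init__(self, rate_bytes_per_s, depth_bytes):
        self.rate = float(rate_bytes_per_s)
        self.depth = float(depth_bytes)
        self.tokens = float(depth_bytes)  # bucket starts full
        self.last_arrival = 0.0

    def is_non_conformant(self, arrival_time, size_bytes):
        """Return True if the packet should carry the non-conformant bit."""
        # Refill tokens for the time elapsed since the previous packet,
        # never exceeding the bucket depth.
        elapsed = arrival_time - self.last_arrival
        self.last_arrival = arrival_time
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return False  # conformant: bit left clear
        return True  # non-conformant: bit set, no tokens consumed


# Hypothetical profile: 1000 bytes/s with a 1500 byte bucket.
marker = TokenBucketMarker(rate_bytes_per_s=1000, depth_bytes=1500)
arrivals = [(0.0, 1000), (0.1, 1000), (1.0, 1000)]  # (time in s, size in bytes)
marks = [marker.is_non_conformant(t, size) for t, size in arrivals]
print(marks)
```

In this sketch a marked packet does not consume tokens, so later conformant packets of the same flow are not penalized for it; whether downstream routers classify marked packets is left open, as in the text above.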
There are potential re-ordering hazards for both schemes, depending on how non-conformant packets are serviced at the router. If non-conformant packets are not classified and are serviced in the best-effort queue, then re-ordering is likely whenever there is a disparity in the queueing delay between the flow's normal service queue and the best-effort queue. The first scheme only appears to be able to reduce the number of packets which are classified while preserving the defined RSVP network behavior if alternative one is chosen. The second scheme can only reduce the number of packets which are classified significantly if a large fraction of the Integrated Services packets are non-conformant. However, the semantics of the second scheme much more closely match the requirements for aggregated flows (where flow-classification is eliminated). Both schemes are mutually compatible if separate PH bits are utilized for each.

3.5 Forward Explicit Congestion Notification

The final packet marking application we will discuss is Forward Explicit Congestion Notification (FECN). FECN is one-half of a bi-directional scheme where the network marks packets which are transmitted across a congested link, and some process at the receiving node sends a Backwards Explicit Congestion Notification (BECN) back to the source, to influence its rate of transmission so that the congestion within the network will subside. The BECN need not be returned in the PH field but may be sent in some higher-layer message. There are two proposed implementations of FECN which we will examine.

In the approach described in [ECN94] and [ECN97], a router sets the FECN bit stochastically based on the RED algorithm, which computes a probability of packet detection which is an increasing function of the queue's average packet occupancy [RED].
The router sets this bit as an alternative action to discarding the packet (which it would have done if the associated transport protocol had not advertised its ECN-capability). There are two variants proposed; one a single-bit scheme and the other a two-bit scheme. In the single-bit scheme, the application transport protocol (which is ECN-capable) sets the FECN bit, which is reset by any router which randomly detects the packet due to a build up of queued packets. The packet can then no longer be distinguished from packets utilizing non-ECN-capable transports, and if it is detected downstream at another congested router, it will be discarded. In the two-bit scheme, the transport protocol's ECN-capability is advertised explicitly in a separate bit, and packets which are detected at multiple routers are not discarded. Upon receipt of a packet with the FECN, the receiver sends a BECN back to the source, either as an IP-layer message (e.g., ICMP Source Quench) or as a transport-layer acknowledgement (e.g., TCP ACK option). The source transport protocol is supposed to treat the receipt of a BECN equivalently to the loss of a packet and back off its transmission rate accordingly. Such a mechanism when widely deployed may significantly reduce the number of lost packets and retransmissions in the network. This reduction in the number of packet losses is especially beneficial to interactive applications like Telnet which are sensitive to RTT delays which result from packet loss and retransmission intervals.

The approach described in [Clark97] is the receiver-based best-effort service allocation scheme mentioned in Sec. 3.3. In this (one-bit) scheme, routers set the FECN bit for every packet which experiences congestion (deterministic marking vs. stochastic marking).
A receiver profile meter further downstream monitors the number of packets which are marked by FECN and resets the FECN of all those packets within the receiver's service profile. Packets with FECN in excess of the profile are forwarded to the receiver with the FECN set. The receiver transport protocol is supposed to take the same action as described for the first scheme (send a BECN), and the source transport protocol is also supposed to behave as in the first scheme (back off). In addition, the receiver profile meter may take explicit action against flows from misbehaving sources (those which do not appear to honor the BECN).

The key difference between these two proposals is the marking action in the interior routers (stochastic vs. deterministic marking). As such, it does not appear that the schemes are compatible using the same PH FECN bit, unless the network is configured such that there is a receiver profile meter downstream of every interior router, in which case the interior routers can be configured to mark the FECN bit as described in [Clark97] and the receiver profile meters can filter the FECN indications as appropriate prior to forwarding the packet further downstream.

It is not clear how well ECN will scale for multicast traffic, due to the potential implosion of BECNs at a multicast source [CCBES].

4. Differentiation Mechanism Categorization

Packet marking is intended to invoke one or more differentiation mechanisms -- either in a host (source/destination) or in the routers along the packet's transit path -- so as to differentiate the data transport performance. In the above statement we are using the term "router" generally to refer to any node in the packet's path which forwards it towards the destination; e.g., a profile-meter [Clark97] or firewall. In Sec.
3 we examined several proposed uses of a packet marking facility and the differentiation mechanisms they might invoke. In this section we reverse the analysis by examining the differentiation mechanisms which could be implemented within a host or router, and then detail which of the mechanisms are invoked by the proposals described in Sec. 3. It should be noted that, for ease of deployment, and with the exception of FECN, most of the proposals attempt to provide differentiated services using only router mechanisms, without substantial changes to the host (if any).

4.1 Host Packet Processing Mechanism Categorization

Host packet processing mechanisms relevant to differentiated services can be categorized into the following functions:

   o  Path selection -- selection of the output interface or next-hop router,

   o  Transmission scheduling -- selection of the next packet to forward from the transmission queue; selection of the link-layer priority,

   o  Reception scheduling -- selection of the next packet to process and deliver to the transport layer from the reception queue,

   o  Congestion control -- selection of the transmission rate, or of the interval over which to suspend transmission, based on congestion indications from the network.

Host path selection is rarely invoked since most hosts are single-homed (single network interface) and most do not run a routing protocol which would allow them to intelligently select the next-hop router for service-specific routing (Sec. 3.2.4); however, nothing precludes a properly configured host from making service-specific route selections.

Non-FIFO host transmission scheduling may be invoked to promote a delay prioritized packet, or one within a precedence service class (or within an aggregated Integrated Services class if the host supports aggregation).
It may also be used to control the transmission rate of a flow, if that flow's service class is known to be rate regulated by the network. Host transmission scheduling may also invoke link-layer prioritization features; e.g., by selecting a particular ATM QoS VC, or by marking the packet with a particular 802.1p priority [IS802].

Non-FIFO reception scheduling may in principle be invoked by a delay prioritized or precedence service class packet, or by a transport-isolated class (where one transport protocol has priority over others). Loss priority may be utilized if the receiving host's buffer is saturated and packets must be discarded. However, any non-FIFO or drop-prioritized reception processing may introduce complexity in the receiving host's networking protocol stack that is not justified in practice by improved performance.

Receipt of a backwards explicit congestion notification should directly affect the congestion control function of the source's transport protocol (causing it to reduce its rate of transmission). Also, a transport protocol modified as described in [Feng97] will mark packets for priority as required to achieve a negotiated throughput.
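The effect of a BECN on the source's congestion control function can be sketched as a window update that treats the notification like a packet loss. The halving on BECN and the increase of one packet per interval are assumptions borrowed from TCP-like additive-increase/multiplicative-decrease behavior; they are not specified by the proposals discussed in this memo, and the function and parameter names are hypothetical.

```python
def next_window(cwnd, becn_received, min_cwnd=1.0):
    """Return the congestion window for the next interval.

    A BECN is treated like a loss (window halved, never below min_cwnd);
    otherwise the window grows by one packet.
    """
    if becn_received:
        return max(min_cwnd, cwnd / 2.0)
    return cwnd + 1.0


cwnd = 8.0
trace = []
for becn in [False, False, True, False]:
    cwnd = next_window(cwnd, becn)
    trace.append(cwnd)
print(trace)
```

Because the source reduces its rate on the notification rather than on a discard, no retransmission is needed for the marked packet.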
4.2 Router Packet Processing Mechanism Categorization

Router packet processing mechanisms relevant to differentiated services can be categorized into the following functions:

   o  Reception scheduling -- selection of the next packet to process from the reception queue,

   o  Packet classification -- identification of the flow by header filtering; identification of the differentiation mechanisms to apply by PH lookup,

   o  Path selection -- selection of the next-hop node,

   o  Traffic policing -- setting of PH based on the monitored rate of traffic within a flow/class,

   o  Buffer management -- selection of the queue/discard action; pushout under congestion,

   o  Transmission scheduling -- selection of the next queue to service for transmission.

Note that buffer management and transmission scheduling can be strongly coupled.

Reception scheduling is normally FIFO in routers. This is usually because the forwarding subsystem is integrated with the header processing subsystem, and the packet must be received from the reception queue (if any) before the PH field can be examined. In addition, most wire-speed routers do not encounter significant reception queues.

Any differentiation mechanism which utilizes packet marking will require that the routers check the PH field to determine the differentiation mechanisms to apply to the packet. An Integrated Services conformant packet indication may invoke flow classification. Also, some explicit service class may be defined which invokes some packet classification function at some point in the network for authorization purposes or for finer-granularity service differentiation.

Path selection may be affected if service-class based routing is configured. In this case the PH field will determine the set of routes to search first.
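The idea that the PH field determines the set of routes to search first can be sketched as a lookup that tries a service-specific table before falling back to the default table. The table layout, the "default" table name, the class names, and the exact-match destination keys are all simplifying assumptions for illustration; a real router would perform longest-prefix matching.

```python
def select_next_hop(route_tables, service_class, destination):
    """Search the service-specific routes first, then the default routes.

    Returns the next hop, or None if neither table has a route.
    """
    for table_name in (service_class, "default"):
        table = route_tables.get(table_name, {})
        if destination in table:
            return table[destination]
    return None


# Hypothetical route tables keyed by the service class derived from PH.
route_tables = {
    "low-delay": {"10.0.0.0/8": "router-A"},
    "default": {"10.0.0.0/8": "router-B", "192.168.0.0/16": "router-C"},
}

print(select_next_hop(route_tables, "low-delay", "10.0.0.0/8"))
print(select_next_hop(route_tables, "low-delay", "192.168.0.0/16"))
```

A packet whose service class has no dedicated route for its destination is simply forwarded along the default route, so configuring service-specific routes for only some destinations remains safe in this sketch.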
Traffic monitoring and policing are required by several of the best-effort service allocation proposals, to determine whether the traffic of a flow is within a negotiated profile (or how it varies relative to it). Integrated Services policing can utilize a non-conformant packet indication to signal an out-of-profile packet within a reserved flow.

Any of the delay priority or explicit service class markings may direct the buffer management subsystem to queue the packet into a non-default queue. Services requiring isolation and/or provisioned resources in the router (e.g., buffer space, bandwidth) generally require a separate queue. Within a queue, an explicit drop priority or precedence service class marking, or an out-of-profile indication, may invoke a buffer management discard action, depending on the current state and history of buffer occupancy in the queue [RED]. The same buffer discard logic may be utilized to set FECN (if the transport protocol is ECN-capable).

When multiple queues are supported to provide delay prioritization, provisioned service class bandwidth, or isolation, a queue scheduling algorithm must be implemented to determine which queue's head-of-line packet to transmit next. The scheduling algorithm could vary in complexity from simple strict priority, to one of the more sophisticated rate-based scheduling algorithms such as WFQ or CBQ [HPFQA, HFSC, CBQ]. While the complexity of strict priority queueing is usually O(1), the complexity of the more sophisticated rate-based scheduling algorithms is usually O(log N) for N queues. This may impose an upper bound on the number of economically implementable delay priorities or service classes.

It is useful to examine the amount of state that each packet marking application imposes within routers.
The stateless applications include an explicit delay/drop priority (when implemented as strict priority), and a service class or out-of-profile indicator that indicates drop-preference. These impose no per-flow or per-class state in the routers, although any network authorization or policing/monitoring function which sets these indicators may require per-flow state (these functions, however, are usually located near the source or receiver). An Integrated Services aggregating router requires per-flow classification and policing state prior to service class aggregation. Any provisioned explicit service class mechanism will impose per-class buffer management and scheduling state in the routers. In the general case, packet marking applications only impose per-flow state at aggregation points in the network where the number of flows is not large.

4.3 Biased vs. Substitute Best-Effort Router Mechanisms

We have examined in a general way the different router subsystems which may be parameterized to differentiate packet transport behavior. This analysis gives us the basis to more specifically examine the scope of differentiation that can be provided. We characterize a router's differentiation capabilities into two broad categories: biased and substitute best-effort mechanisms.

Normal best-effort service is generally considered to be fair (under the assumption of cooperating, well-behaved applications utilizing transport protocols with TCP-like congestion control). This service is often implemented using a single FIFO queue, with no per-flow identification, path selection, or special buffer management or scheduling (we accept the possibility of active buffer management, perhaps incorporating fairness enforcement mechanisms [ACTIVE, RED, FRED, Floyd97]).
Such a service can be considered fair because there are no explicit biases in the packet handling behavior. In fact, there are biases in best-effort service, even among well-behaved applications (e.g., flows with large RTTs achieve lower throughput [Floyd91]), but these biases are artifacts of the congestion control algorithms utilized and are not due to explicit biasing mechanisms in the network. We define a biased mechanism here as one which explicitly allocates more resources (e.g., buffers, queue service rate) to some set of traffic flows, permitting these flows to achieve superior (loss/delay/throughput) performance over other flows. We will more carefully define a substitute best-effort mechanism in Sec. 4.3.1, but for now we define it as a mechanism which does not provide additional resources to flows over any long time scale, but which may temporarily provide such resources over short time scales.

4.3.1 Transmission Mechanism Categorization

Router transmission mechanisms -- buffer management, packet scheduling, and link-layer priority selection -- are one basic means by which a router can differentiate the service of a flow. To provide a framework for the discussion of the possible axes of differentiation we will first describe a hypothetical router transmission subsystem which enforces per-flow link-fairness. This design is motivated by the discussion in [CCBES]. Such a system would incorporate per-flow queueing with a fair (equal service weight per queue) scheduling algorithm; for example a variant of WFQ [HPFQA]. In a system with finite buffers, per-flow buffer management would also be implemented. The buffer management system might impose absolute bounds on the instantaneous buffer occupancy of a flow.
To account for bursty flows, a RED-based discard policy might be implemented based on the long-term traffic history of the flow (and not the flow's short-term average queue occupancy). The goal of this buffer management system would be to deliver equivalent packet loss rates to flows with equal long-term average rates (within some time horizon), with minimal packet loss for flows under-utilizing their fair-share rate, and higher loss rates for flows attempting to exceed their fair-share rate. The goal of the scheduling system is to provide equal service rates to all flows under backlogged conditions. In practice, once flows had stabilized to their fair-share rate, only the bursty flows would queue more than one packet at a time. Note that maintaining per-flow state (dynamically generated in the case of best-effort sources) is probably too complex for economical implementation; it is only proposed as a hypothetical example to highlight some properties of flow service differentiation. Note also that link-fairness is not the only, nor necessarily the best definition of fairness in a network; it is used here only for illustrative purposes.

Flow service can be differentiated by modifying any of the parameters of our hypothetical transmission subsystem. For example, to increase the nominal throughput of a flow, that flow's queue weight could be increased, thereby increasing its backlog drain rate. To decrease the probability of packet loss, a flow's maximum allowable buffer consumption could be increased, and the parameters of the discard policy could be modified to preferentially allow the flow's packets to have access to buffer slots under congestion.
To increase the probability of loss under congestion, the reverse actions could be taken, and threshold mechanisms could be implemented if there are packets of multiple drop priorities within a flow (as in best-effort service allocation). To reduce the queueing delay of a flow's packets (without increasing the long-term service rate of the flow), the scheduling algorithm could be amended so that a delay-prioritized packet could be transmitted prior to that flow's queue receiving its turn at the scheduler. This delay prioritization capability would be bounded by a token bucket with a token pool scaled proportionally to the typical burst size of the flow, and with a token rate equal to the flow's fair-share rate (thus turning the flow's queue into a variable bit-rate shaper). Each of these modifications is an example of a biased differentiation mechanism. Note that the impact of this biasing is degradation of the service of other flows under contention for transmission resources.

We can further categorize biased transmission mechanisms into provisioned and non-provisioned mechanisms. Provisioned mechanisms are specifically configured to provide a particular service for a flow, such as explicit delay or drop priority, or throughput priority within a precedence service class. A non-provisioned biased mechanism such as FECN implements a biased discard policy for packets which are marked as ECN-capable. Such mechanisms are not intended to enable biased service for well-behaved applications; however, they introduce the possibility of service bias for badly behaved applications (e.g., those that do not honor BECN) by allowing them to achieve better-than-fair throughput due to the lower loss rate.

In contrast to fair and biased transmission mechanisms, we may also hypothesize the possibility of substitute best-effort mechanisms.
The stability of the current Internet depends on the fact that its existing service model is fair [CCBES]. Introduction of biased service capabilities will require provisioning and traffic regulation. However, the normal "best-effort" service available to applications may not suit all of their needs, and it may be the case that applications could improve their performance without subscribing to any particular provisioned differentiated service from the network. This would only be possible if these alternative mechanisms did not degrade network stability, which implies that they must also be fair. For the purposes of this discussion we define a substitute best-effort mechanism as fair if, when selected by a flow, it does not degrade the overall performance of other active flows, where we define "performance" for normal best-effort flows as average throughput, loss, and delay. One example of a substitute best-effort mechanism would be queueing isolation to protect a flow with a long RTT (note that this does not strictly meet our definition of fairness since, if selected, other flows are not able to achieve an unfair share of the link capacity). Another example would be a mechanism which allowed a flow to trade its fair-share service rate or its average packet loss rate for low queueing delay [RFC1046]. Trading rate for low delay could be achieved by giving a flow delay priority within a token bucket whose token rate was less than the flow's fair-share service rate. Trading loss for low delay could be achieved by queueing the flow in a delay-prioritized queue with a small per-flow buffer slot quota. Such a capability might be useful for low-throughput interactive applications like IP telephony.

An alternative mechanism would allow a flow to trade queueing delay or service rate for lower loss.
The former could be achieved by queueing the packet in a queue with a larger per-flow quota but with low delay priority. The latter could be achieved by queueing the flow in a larger buffer with a lower fair-share service rate. This capability might be useful for some short-term transaction traffic (e.g., RPC, some WWW) which is insensitive to queueing delay but which is sensitive to RTT delays.

The third alternative, allowing a flow to trade delay or loss for an improved service rate, does not make sense in the context of a congestion-controlled best-effort network (see Sec. 6).

It is not known to the author whether the substitute best-effort mechanisms proposed have been researched, and whether they degrade fairness and stability within a best-effort network. Furthermore, although we have discussed the transmission service mechanisms in the context of per-flow queueing and buffer management, in fact one of the goals of packet marking differentiated services is to eliminate per-flow state in the core of the network. Aggregate queueing and buffer management mechanisms which provide differentiated transport services may suffer from fairness problems within a service class similar to those of the current best-effort Internet with single FIFO queues [Floyd97].

4.3.2 Path Selection Mechanism Categorization

We can categorize path selection mechanisms using the same framework as was used in Sec. 4.3.1. A provisioned biased path selection mechanism would compute a route based on metrics (e.g., delay, loss, link rate, and transmission cost) that would suit the requirements of a particular class of traffic. The flows that had access to these paths would be authorized prior to entry, and their traffic would be regulated. The paths would be provisioned to satisfy the regulated traffic load (perhaps using statistical assumptions).
The "out-of-class" traffic taking these paths would also be regulated to preserve the service level of the in-class flows.

Non-provisioned biased path selection mechanisms (which also fit our definition of substitute fair mechanisms) would not utilize per-flow authorization and traffic regulation. Examples include computing paths which avoid satellite hops for delay-sensitive traffic, or which avoid wireless hops for loss-sensitive traffic. Whether these alternative paths would actually improve the service of the flows which took them may depend on the relative load on the paths from other traffic flows. The assumption that would justify their use by non-regulated flows is that these paths are in some other way inferior to the normal shortest-hop path (longer delay, higher loss, or lower link rate).

5. Service Categorization

Marking a packet invokes a particular router or host differentiation mechanism on that packet. This facility is used to instantiate a service for a flow of traffic. Some of the packet marking applications discussed in Sec. 3 imply a specific differentiation mechanism (e.g., FECN); others imply a general service from the network (e.g., precedence) without implying any particular differentiation mechanism implementation.

In this section we propose a set of criteria for categorizing differentiated service implementations.

5.1 Service Granularity

We can categorize differentiated services implementations by the granularity at which they act (at which they differentiate transport performance). At the lowest level of granularity are per-packet mechanisms such as the services described in Sec. 3.3 and 3.4, where packets are marked based on the characteristics of the corresponding packet flow relative to some traffic specification.
These differentiation mechanisms may be invoked for a subset of the flow's packets, with the aggregate effect (and its interaction with host congestion control) delivering the desired service. FECN is another example of a per-packet differentiation service.

Per-flow differentiated services utilize the same packet marking for each packet of a flow. Examples of this type include explicit delay/drop/network control priority, explicit service class indication, and service-based route selection. This granularity can be relaxed to provide per-source host differentiation, where all of the packets transmitted from a particular source receive the same packet marking (or in the case of the schemes described in Sec. 3.3, the individual flows of the source are not distinguished). Per-source differentiation is particularly suitable when there is no need for per-application differentiation (for example when all of the source's flows have homogeneous service requirements). Note that the distinction between per-packet services and per-flow/source services is not crisp.

Per-network differentiated services act on the aggregate of flows from a particular cluster of nodes, from a particular subnet, or from a particular site (e.g., VPN service). As is the case for per-source services, per-network services are appropriate whenever the aggregated flows have homogeneous service requirements.

Finally, we include per-receiver services such as that described in [Clark97] (and also RSVP aggregation). This class of services would require tight integration with host congestion control or network policing mechanisms to ensure appropriate behavior (i.e., reduction in transmission rate due to congestion experienced by out-of-profile packets).

5.2 Service Invocation

Another dimension of service categorization is the point of service invocation.
The earliest point of possible invocation is at the source (at the application layer). Examples of possible source-invoked services are explicit delay/drop/network control priority, explicit service class indication, and Integrated Services conformant packet indication. Source-invoked services are particularly useful where end-to-end differentiated service is required, since this exposes the service interface to the application [QOSP].

An alternative service invocation point is at some point(s) within the network. Examples here may include the services described in Sec. 3.2 and 3.3, Integrated Services non-conformant packet indication and aggregation, and FECN. The service may be invoked on any granularity of traffic (see Sec. 5.1) and requires configuration within the network to identify the flows or aggregate of flows to which the service should be applied. The scope of such a service is usually intermediate (across a network or network-to-network [QOSP]). Network-invoked services are useful whenever network authorization and policing are required, or whenever a set of flows with homogeneous service requirements can be aggregated.

A hybrid invocation model would permit the source to set the PH field to request a particular differentiated service, while allowing the network to authorize and police the traffic from any source which is allowed to utilize the service. Such a service model might permit less granular configuration and authorization state in the network (i.e., no per-flow and only per-source state).

Receiver-invocation of differentiated service is also possible, but requires some signaling mechanism to allow the receiver to control the sending rate of a source or the packet markings used across the network. The canonical example of a receiver-invoked service is Integrated Services via RSVP signaling.
Another example is ECN via a receiver-generated BECN (which could be influenced by a receiver profile meter as described in [Clark97]). A receiver-to-network signaling protocol similar to RSVP which did not rely on the appropriate behavior of the source to enable the differentiated service (or make it deployable) is conceivable, although the author is not aware of a proposal for such a signaling mechanism in the context of packet marking differentiated services.

Service invocation can be characterized in time as well as in space. An application, source, or group of sources may have negotiated an ongoing arrangement with a service provider to provide a differentiated service for marked packets. This type of static service allocation may involve a variety of time- and destination-specific constraints which limit service availability, but it does not require signaling or any form of immediate configuration to permit utilization of the service; sources meeting these constraints as well as constraints on traffic levels may begin to utilize the service immediately by marking packets (or the network may automatically mark them). In contrast, dynamic service allocation involves some form of pre-negotiation (i.e., via signaling) between the source(s)/receiver(s) and the network for service prior to availability. This negotiation may involve service start/stop times, traffic levels, service characteristics, pricing, etc. The network will be required to dynamically configure authorization and policing policy mechanisms to instantiate the service, and may also have to dynamically provision resources within the network interior.

5.3 Service Behavior

A differentiated service's behavior can be categorized by whether it is biased or offers a substitute best-effort service.
Biased services can be further categorized as to whether they are provisioned or non-provisioned. For a more detailed discussion of the differences see Sec. 4.3.

A major implementation issue with biased services is the need to regulate the amount of traffic which can invoke them, usually via source traffic shaping, or network authorization and traffic policing, as well as appropriate provisioning within the network. This is required to satisfy whatever delay/loss/throughput performance guarantees are associated with the service. The behavior of the service in the presence of non-conformant traffic can be characterized by how the non-conformant traffic is handled. Such traffic could be discarded automatically by the network [TWOBIT], or it could be handled with lower priority and suffer a higher probability of loss or delay [Clark97]. The effect of the presence of non-conformant traffic on the conformant subset is also relevant.

In general, a provisioned biased differentiated service could be defined as a set of probability density functions of packet delay, loss, and flow throughput relative to some statistical traffic model (and time interval). In practice, however, determining this level of information detail for each differentiated services customer would be difficult if not impossible. From these density functions service assurance levels, measured as the probability of service availability, could be inferred, although without stationarity assumptions the service failure modes could not be predicted. A differentiated services user will be primarily interested in the degradation behavior of the service.
Differentiated services implementations can be characterized by whether service failure (e.g., due to under-provisioning or network infrastructure failure) results in a soft degradation in the delay/loss/throughput metrics, a complete degradation to traditional best-effort service, or total interconnectivity failure. Furthermore, implementations can be characterized by the promised maximal duration and frequency of service failure.

5.4 Direction of Value

An important means of categorizing differentiated services is by examining in which direction value flows when a differentiated service is provided for a packet flow (either to the source or to the receiver). In any point-to-point exchange of traffic there is usually a benefit to both ends of the conversation; however, it is often the case that the level of direct benefit to one party exceeds the level to the other. This is important to a service provider as it might be more appropriate to implement pricing policies which target the primary beneficiary.

Most of the packet marking applications examined provide benefit to a source of traffic by preferentially handling that source's packets for improved transport performance. Since these mechanisms are not necessarily destination-specific, they can be viewed as primarily benefiting the source. As such one would expect that the source (or his proxy) would be the entity charged for the differentiated service (e.g., source-purchased traffic profile). When the traffic profile is associated with a particular set of destinations, and when reverse-path services are utilized, we can consider the value to be bi-directional, and the charges for the service can be distributed between the end-points (often the same organizational entity).

The model of source-pricing of differentiated service may not suit WWW-based information delivery, since the value of service flows primarily to the receiver of information and there is no incentive for the information source to request service differentiation for a particular subset of receivers (this changes if there is some charge to the receiver associated with information retrieval). For such applications a receiver-invoked service such as described in [Clark97] may be most appropriate. Signaling across the network to (or near) the source to initiate differentiation may permit more sophisticated receiver-invoked services (e.g., RSVP). However, there must likely be some associated settlement mechanism to incent the service provider or source to deploy such a protocol, and scalability and interoperability factors must also be weighed.

An interesting problem arises for multicast applications where the receiver is the main beneficiary (arguably commercial broadcast applications are an example). When packet marking is invoked at or near the traffic source, all receivers of the transmission receive the benefit of the differentiated service (if we assume that the network does not remark the packet along different branches of the multicast path). This may deliver greater value to some receivers than they are willing to pay for. Conversely, if the service provider charges more for differentiated multicast service, this may make it difficult for the source to provide the desired service to one or more particular receivers (in an efficient way). Receiver-invoked service mechanisms such as described in [Clark97] may scale poorly in a multicast environment due to BECN implosion at the source.
Also, Integrated Services aggregation suffers a variety of scaling and heterogeneity problems for IP multicast reservations, since the granularity of service is often too coarse (and due to control-plane scaling problems).

6. Fairness and Congestion Control Considerations

In the absence of traffic regulation and associated network provisioning, the stability of the Internet still depends on the use of cooperative congestion control by all applications [CCBES, Floyd97]. This is true even for application flows (or packet subsets) which specify (non-provisioned) drop-preference service. A form of congestion collapse can occur in the Internet if applications rely on the network to discard excess packets and do not implement closed-loop congestion control, because packets which will later be discarded downstream could utilize bandwidth on a link which could be better utilized by non-drop-preference congestion-controlled flows (e.g., normal TCP) [Floyd97]. The normal router mechanisms which would allow the non-drop-preference traffic to ramp up to the link-rate (e.g., weighted RED) may not function effectively if the drop-preference traffic is not well-behaved. One solution to this problem might be to introduce aggregate bounds on the amount of drop-preference traffic transmitted which would incent the application not to abuse the service. However, if a drop-preference marked packet has a very low probability of being delivered (due to aggregate constraints), the drop-preference facility is not very useful.

A similar form of congestion collapse can occur if badly behaved applications which advertise ECN-capability do not respond to a receiver BECN. This is because the router gives these packets preferential drop priority.
This could allow non-conformant transport protocols to achieve better throughput than conformant ECN-capable transports and non-ECN-capable transports. Although it is not clear whether ECN introduces a fairness problem that is any worse than the existing problem of badly behaved transports, its deployment should be approached cautiously. One way to alleviate these fairness problems might be to implement fairness enforcement mechanisms such as described in [Floyd97] and [FRED] (note that these mechanisms might contradict the scalability objectives addressed by packet marking).

In Sec. 4.3 we introduced the concept of a substitute best-effort service. Because substitute best-effort differentiation mechanisms provide short-term biasing at the expense of long-term throughput, delay, or loss, the router implementation must take active measures to ensure that these mechanisms do not jeopardize network fairness. The mechanisms should be engineered to ensure that sources cannot achieve an unfair share of network resources by modulating between substitute best-effort services. Badly behaved applications should not be able to achieve better throughput (or to further degrade the service of other flows) by selecting a substitute best-effort service. One possible means of achieving these objectives is to make the mechanisms non-work-conserving, thereby incenting the application to select these substitute services only if the trade-off they provide is absolutely beneficial, and to penalize badly behaved applications which select these services.

One of the scalability objectives of packet marking differentiated services is to eliminate per-flow state in the core of the network. We have examined a hypothetical per-flow router transmission system to highlight how differentiation might be provided.
However, in a scalable system, only aggregated state would be maintained (or per-flow state would only be maintained for a small subset of the active flows). Aggregated buffer management and queueing implementations may suffer the same fairness problems between flows within a service class as are exhibited today with best-effort traffic and single-queue FIFO routers. Provisioning and traffic regulation might alleviate these problems, but techniques such as described in [Floyd97] and [FRED] might also be required in some circumstances.

7. Provisioning Considerations

We have used the term "provisioning" in this document to describe the deployment and assignment of network resources for the exclusive or preferential use by certain (sets of) traffic flows. Aggregate differentiation mechanisms in and of themselves cannot deliver a quantifiable service without constraints on the aggregate amount of traffic which invokes those mechanisms. The resource allocation policy can be implemented in each interior router, for example by service class-specific queues with provisioned minimal bandwidth levels and buffer quotas. Alternatively, resource allocation can be implemented more globally, relying on traffic authorization and policing at the network edge and stateless differentiation mechanisms in the network interior. Which choice is preferred depends on scalability concerns, the aggregate amount of traffic utilizing a particular differentiated service, as well as the level of statistical performance assurance associated with the service.

A basic motivation of differentiated services is to provide tiered levels of statistical assurance of service for a particular traffic load, with tiered pricing to match the service provider cost and customer utility associated with each level. Service assurance was discussed in detail in Sec.
5.3. To elaborate on that discussion, the statistical assurance of a differentiated service depends upon uncertainty about the interior transit path taken by appropriately marked packets, on the statistical multiplexing gain assumed in the service allocation policy, and on the instantaneous behavior of other differentiated services users. As such, there is potentially a strong time-dependence in service quality.

When provisioning a differentiated service, a provider must take into account the dimensioning of the network, as well as statistical models of customer activity and traffic levels. Defining a service allocation policy which satisfies a particular statistical assurance level is equivalent to an admission control problem. The primary design choices in a service allocation policy are static vs. dynamic allocation, and domain-wide vs. hop-by-hop admission control. When using a static allocation policy, a provider must provision more resources than when using a dynamic allocation policy to achieve the same level of statistical assurance, since the admission control decision must be made on historical data or traffic models and not on instantaneous measurements of network activity. However, dynamic admission control requires some means of signaling and/or dynamic configuration to convey the service request to the network (and its affected elements). This dynamic admission control decision could be made by a centralized administrative entity (e.g., the Bandwidth Broker in [TWOBIT]) which bases its decision on a domain-wide view of existing service allocations (and possibly a coarse view of the instantaneous traffic activity). Alternatively, the decision could be made at each node along the hop-by-hop path taken by the affected packets (e.g., the RSVP/Integrated Services model).
It is difficult to provide a strict guarantee of service along an unspecified path at an unspecified time [Clark97], which implies that services which promise strict performance guarantees will usually include constraints on the available destinations or network egresses as well as the interval of service availability. Note also that the choices between static vs. dynamic allocation and domain-wide vs. hop-by-hop dynamic admission control are not mutually exclusive.

Implementation of a dynamic, domain-wide admission control policy, as well as long-term service planning, depends on the availability of statistics on service utilization and performance. Means of capturing the characteristics of marked traffic, such as the utilization of a particular service class or differentiation mechanism, packet discard distributions, queue delay distributions, etc., are required (e.g., via new router MIB variables). Service providers and customers may need to deploy test and measurement applications to characterize and validate the assurance level of a service. These mechanisms may also be needed to facilitate inter-provider monitoring and settlement.

Service provisioning must also take into account the scalability of the mechanisms used to provide the service. Scalability may be affected by the amount of configuration state, network monitoring state, router processing and transmission state, and dynamic service signaling traffic levels and state which are required for a particular service implementation.

8. Authorization Considerations

Authorization of use of packet marking-based biased differentiated services is required to permit any level of service assurance. Authorization is required at whichever point(s) in the network the service is invoked (see Sec. 5.2).
We break the authorization problem down into the components of packet classification, traffic policing, and PH marking.

The packet classification component matches received packets to statically or dynamically allocated service profiles, based on any combination of per-source address/subnet, per-destination address/subnet, or per-flow packet header filters. The classification function may be deployed on site subnet boundaries, on site backbone boundaries, on the site border to a service provider, on provider ingress boundaries, and/or on inter-provider boundaries. The granularity of packet classification will generally be relaxed as the classification component moves into the interior of the network to facilitate scalability. The classification component may also honor source service requests (based on the PH value set by the source).

The traffic policing component measures the instantaneous load of packets matching a classification entry relative to some traffic profile. This traffic profile is configured based on some source/network or network/network agreement on the amounts and signature of marked traffic. The traffic policing component could be based on a simple token bucket filter [TWOBIT], or on a more sophisticated monitoring function which takes into account the congestion-control behavior of TCP [Feng97, Clark97]. Non-conformant packets may be discarded, depending on the service policy.

The PH marking component sets the PH field of each service profile-conformant packet to invoke the differentiation mechanism(s) deployed within the network to instantiate the appropriate service. Non-conformant packets may be re-marked, depending on the service policy.

The classification, policing, and PH marking components within a network must be configured whenever a service is allocated to a source (or set of sources).
This may involve manual configuration for statically allocated services, or dynamic signaling (e.g., SNMP, RSVP) for dynamically allocated services.

Interoperability of packet marking differentiated services between service providers depends on a joint agreement on PH semantics, traffic profiles, and authorization policies for each service supported.

9. Routing Considerations

Service assurance may depend on the stability of the routing system (e.g., the prevalence of routing flap or the frequency of routing melt-down). The deployment of service-based routing (Sec. 3.2.4) introduces a variety of additional routing considerations. One issue involves the existence of an incomplete service-specific path between a source and destination (or across a domain which deploys service-specific routing). This incomplete path might exist due to router misconfiguration, due to different policy decisions among service providers, or due to routing transients. In the event that a service-specific route is not available at a router along the transit path, we assume that the default routing entry is followed [RFC1583]. The problem occurs when the service-specific and default routes are calculated using different sets of metrics. If the default route is followed, the packet may loop back to a node which has a matching service-specific route entry, and a stable routing loop may form. Although the effects of router misconfiguration and routing transients are hard to mitigate, this behavior may be particularly onerous since it could be hard to detect.
   This problem could be avoided if, whenever a router which
   implements service-specific routing has to forward a packet (with a
   PH marking associated with a service-specific routing class) using
   the default routing entry, the PH field is reset to indicate the
   default routing class. This solution avoids the stable routing loop
   problem; however, if the same PH bits are overloaded to specify both
   service class queueing and route selection (based on some request
   such as "Minimize Delay"), then whenever these PH bits are reset, the
   service class queueing indicators are erased, and the service
   provided to the flow may be degraded at downstream nodes.

   Another issue is the choice of behavior in the event that a matching
   service-specific routing entry is not available. The basic choices,
   "Strong TOS", "Weak TOS", and "Very Weak TOS", are defined in
   [RFC1349]. The "Strong TOS" model requires that a router forward a
   packet only if a matching service-specific route is available
   (otherwise the packet is discarded). The "Weak TOS" model requires
   that the router use the best-matching default routing entry if a
   matching service-specific route is not available (this is the
   behavior assumed in the example above). The "Very Weak TOS" model
   requires that the router attempt to utilize the best-matching,
   numerically lowest TOS entry if neither a matching service-specific
   nor a matching default entry is available. The "Very Weak TOS"
   model only makes sense if the services are somehow ranked in
   numerical order of precedence. Historically, the "Weak TOS" model
   has been favored, since the "Strong TOS" model may penalize packets
   utilizing a service class when service-specific routing is not
   deployed or a particular service-specific path is broken [RFC1349].
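
   The "Strong TOS" and "Weak TOS" fallback behaviors, together with
   the PH reset described earlier in this section, can be illustrated
   with a small lookup routine. This is a sketch under stated
   assumptions, not a router implementation: the table layout, the
   lookup function, and its model parameter are hypothetical;
   longest-prefix match is assumed to be applied before the service
   class match; and the "Very Weak TOS" model is omitted. The values 8
   ("Minimize Delay") and 4 ("Maximize Throughput") are the four-bit
   TOS values from [RFC1349].

```python
import ipaddress

DEFAULT_TOS = 0  # default routing class


def lookup(table, dst, tos, model="weak"):
    """Return (next_hop, outgoing_tos), or None if the packet is discarded.

    `table` maps (prefix, tos) -> next hop; this layout is an assumption
    made for illustration only.
    """
    addr = ipaddress.ip_address(dst)
    matches = [(net, t) for (net, t) in table if addr in net]
    if not matches:
        return None  # no route at all

    # Assumption: longest-prefix match first, then the service class match.
    longest = max(net.prefixlen for net, _ in matches)
    by_tos = {t: table[(net, t)] for net, t in matches
              if net.prefixlen == longest}

    if tos in by_tos:
        return by_tos[tos], tos  # service-specific route is available

    if model == "strong":
        return None  # Strong TOS: discard without a matching route

    if DEFAULT_TOS in by_tos:
        # Weak TOS: follow the default entry and reset the marking to the
        # default routing class so downstream routers cannot loop the
        # packet back onto a service-specific route.
        return by_tos[DEFAULT_TOS], DEFAULT_TOS

    return None
```

   For example, with a default route and a "Minimize Delay" route for
   10.1.0.0/16, a packet marked "Maximize Throughput" is forwarded on
   the default entry with its marking reset under the "Weak TOS" model,
   and is discarded under the "Strong TOS" model.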

   Service-specific routing introduces an additional criterion to the
   route lookup algorithm (the service class match). The route lookup
   algorithm may choose to prefer an exact matching service class value
   above a longest-prefix address match which does not support the
   specific service (e.g., the default service class). Whichever
   criterion is preferred must be standardized to prevent the formation
   of routing loops amongst routers which implement contrary policies.
   However, both [RFC1349] and [RFC1583] mandate the use of the
   longest-prefix match as the preferred criterion, as this appears to
   be the more robust option.

   Whenever service-specific routing is deployed, interoperability
   between service providers must be considered. There must exist some
   compatibility between service-specific route calculation mechanisms
   in the deployed IGP and EGP routing protocols to prevent interdomain
   routing loops, and peering service providers must agree to implement
   compatible policies (including the resetting of routing-sensitive PH
   bits) to avoid routing loops or sub-optimal routing paths.

   PH bits which affect route selection should not be modified
   dynamically within a flow (on a per-packet basis) since this may
   affect packet ordering and RTT estimation. Best-effort service
   allocation mechanisms such as described in Sec. 3.3 should not
   utilize routing-sensitive PH bit combinations to indicate the
   conformance of a packet.

   Deployment of service-specific routing may introduce scalability
   issues due to the increased amount of routing protocol state
   maintained in, as well as the increased amount of routing table
   computations performed by, the network routers.

10. System Implementation Considerations

   We reiterate that the goal of packet marking is to provide a
   simplified, scalable mechanism for invoking service differentiation
   which avoids per-flow state in the interior of the network to the
   maximum extent possible, as this appears to be a more scalable
   approach than alternatives such as the RSVP/Integrated Services
   model. Marked packets are handled as an aggregate. Note that the
   link-fairness model described in Sec. 4.3.1 is an idealized example
   which implies far too much dynamic per-flow state for practical
   deployment on high-speed nodes. Note also that aggregate
   differentiation mechanisms may suffer fairness problems within a
   service class (see Sec. 6).

   Various differentiation mechanisms may introduce performance and
   scalability problems within a router implementation. One particular
   example is the impact on router forwarding implementations which
   rely on dynamic per-flow caching of forwarding state (e.g., IPv6
   Source Address/Flow Label caching as described in [RFC1883]). Such
   implementations may enjoy a performance advantage since the first
   packet of a flow is searched using the traditional router forwarding
   and classification algorithms to determine the next outgoing link,
   the appropriate service class, the appropriate delay/drop priority,
   etc., while subsequent packets of the flow can be forwarded using a
   cache lookup which can usually be performed using an O(1) algorithm
   (although this advantage may come at the cost of scalability in terms
   of the number of simultaneous flows supported at wire-speed). The
   caching algorithm as described in [RFC1883] assumes that the PH field
   of a packet remains constant within the six-second caching window of
   a flow.
   If the PH field is used to affect the delay or drop priority
   of a packet, and if the PH field is modified dynamically to indicate
   conformance of the packet to some service profile (see Sec. 3.3),
   then the caching algorithm may prevent the router from taking this
   indication into account. This problem can be avoided if the PH field
   is defined as part of the cache lookup key. Modified values in the
   PH field signify a separate "flow" which will require traditional
   classification (at least for the first modified packet header).
   Alternatively, any differentiation mechanism which is determined by
   the PH value may be excluded from the set of cached state and checked
   for each individual packet.

   Another example is the requirement to recompute the IPv4 header
   checksum whenever the PH field is modified. This would be required
   for profile meters as defined for example in [Clark97], [SIMA], and
   [TWOBIT]. This would also be required for IPv4 routers which deploy
   FECN. IPv4 routers already recompute the IPv4 header checksum
   whenever they decrement the TTL of a packet. However, FECN
   introduces a potential implementation constraint for routers which
   utilize distributed forwarding across a switching fabric, since the
   processing component which performs routing and packet classification
   and which decrements the packet's TTL may lie across a switching
   fabric from the output interface queues. In general, the processing
   component which recomputes the IPv4 header checksum must have
   knowledge of the state of the targeted output queue whenever FECN is
   implemented.

   Router implementation complexity and performance scalability will be
   affected by the number of output interface queues which are
   implemented to provide service differentiation, as well as by the
   complexity of the scheduling algorithms used.
   In addition, per-class memory requirements and the processing
   requirements to maintain per-class state will also have an impact.
   Maintenance of configuration parameters (e.g., for
   flow/source/destination classification) and network management
   counters (e.g., for service performance monitoring) may increase
   memory requirements and introduce additional performance constraints.

   Compatibility with application expectations for network behavior is
   critical. Routers may implement aggregated service differentiation
   mechanisms using multiple queues. As a consequence, modulating
   between different PH markings may cause different packets of a flow
   to be serviced using different queues, which may result in packet
   reordering. Applications which modulate between PH markings (e.g.,
   to signify drop priority for multiple layers of a video signal)
   should expect that the packet ordering be maintained. Consequently,
   application-visible differentiation mechanisms, as well as
   network-invoked differentiation mechanisms, should utilize sets of PH
   markings which are guaranteed to be serviced within the same queue
   (and with the same routing metrics).

11. Standardization Considerations

   Packet marked differentiated services cannot be deployed within the
   public Internet without some level of standardization. In
   particular, the semantics of some of the PH bits must be defined to
   allow deployment of interoperable routers, authorization components,
   admission control components, and network management agents (we
   assume that some of the available PH bits may be reserved for
   network-specific use). Specification of those PH bits which may be
   changed dynamically in-flight is needed to avoid packet reordering
   problems (see Sec. 10).
   Specification of the PH bits which are
   allowed to affect route selection is required for interoperability of
   routing protocol implementations. Specification of the PH bits which
   may be set by the application or source host (e.g., for substitute
   best-effort services) and which are not likely to be changed or reset
   in-flight is required for interoperable application development.
   Network MIB variables and dynamic signaling protocols necessary for
   service configuration and monitoring must be specified. Furthermore,
   basic implementation requirements which are essential for the stable
   operation of the network should also be specified (e.g., thou shalt
   prevent drop-preference traffic from starving normal best-effort
   traffic).

   Incremental deployment strategies for packet marked differentiated
   services may be required if the IPv4 Precedence/TOS field semantics
   are redefined from their specification in [RFC795] and [RFC1349].
   This may be needed for example to preserve routing protocol traffic
   prioritization based on the IPv4 Precedence field in networks where
   the new PH semantics are incrementally honored. This may also be
   required where a "worse-than Routine" drop priority level must be
   defined to implement a particular differentiated service, and packets
   arrive to the network which are not sourced or re-marked to use the
   new PH semantics and are instead marked using the "Routine"
   Precedence value (the routers may interpret the "Routine" Precedence
   value to indicate "worse-than Routine" drop priority).

   Another area of potential standardization is the interaction and
   compatibility between packet marked differentiated services and the
   traditional Integrated Services. Also, service measurement
   methodologies may be defined and specified as a Best Current
   Practice.

   Interoperability of packet marked differentiated services between
   different service providers may require the standardization of the
   semantics and expected behavior of a small set of differentiation
   mechanisms and/or service classes to allow compatible exchange of
   traffic.

   Those aspects of packet marking which should remain
   implementation-dependent include the particular buffer management,
   scheduling, and authorization mechanisms and policies used to
   instantiate a set of differentiated services.

12. Security Considerations

   As discussed in Sec. 2, the widespread deployment of IP Security
   obscures the header fields which are traditionally used for per-flow
   packet classification. Therefore, deployment of packet marking
   differentiated services eliminates a disincentive to the deployment
   of IP Security.

   Because the differentiation mechanisms which are deployed will likely
   introduce service bias, new denial-of-service attacks may be
   introduced. As examples, host transport protocols which advertise
   ECN capability but which do not respond appropriately to a BECN may
   degrade the performance of other users and applications, as may
   unauthorized use of priority or service class indications.
   Unauthorized use of a network control priority indication may permit
   an attacker to severely degrade the performance of the network.
   Furthermore, an attack on the differentiated services authorization,
   signaling, or configuration mechanisms may permit theft-of-service or
   may enable a severe denial-of-service attack. As a consequence,
   authorization, signaling, and configuration mechanisms must be
   strongly protected (e.g., by authentication).
   Access to provisioned
   biased services must always be authorized, and routers must implement
   active measures (or intrinsic mechanism design) to enforce fairness
   amongst users of substitute best-effort services. Network control
   priority in particular must be authorized, for example by always
   resetting the associated PH bit(s) on host access links (this may be
   difficult to implement on shared-media subnets), or by only honoring
   the network control priority indication from configured peers.

   The IP Security Authentication Header (AH) does not cover the IPv4
   Precedence/TOS field in the integrity check value computation [AH].
   This behavior is in fact essential for the deployment of
   network-invoked differentiated services where the source host is
   unaware of the PH value which will be delivered to the destination
   host, since it may be changed in-flight. In the case where the
   source host is authorized to select the PH value, this AH behavior
   does not provide end-to-end authentication and integrity of the PH
   value. The AH header format and integrity check value computation
   could be redefined to incorporate an application-selectable mask on
   the PH field which would allow the application to specify the
   particular PH bits which might require end-to-end authentication (so
   as to help detect denial-of-service attacks within the network).
   However, end-to-end integrity of the PH field does not guarantee that
   a differentiated service has been delivered, since the network is
   free to ignore the PH field. Separate measurement and assurance
   mechanisms are needed to ensure that any negotiated differentiated
   services are being provided.

13. Acknowledgements

   The issues examined in this memo have been topics of discussion
   within the Internet community for many years.
   As such, the author
   does not claim credit for the originality of any of the ideas herein,
   and has made an earnest attempt to reference their original
   proponents. Assistance from the community in documenting the origins
   of these ideas is appreciated.

   The author would like to specifically acknowledge the assistance of
   Janet Andersen, Ed Bowen, Charles Burton, Ed Ellesson, Brian
   Haberman, and Hal Sandick. The author would also like to thank Fred
   Baker and Steve Deering for insights obtained during both public and
   private conversations.

14. References

   [ACTIVE]   B. Braden et al., "Recommendations on Queue Management
              and Congestion Avoidance in the Internet", Internet Draft,
              March 1997.

   [AH]       S. Kent and R. Atkinson, "IP Authentication Header",
              Internet Draft, October 1997.

   [Bohn93]   R. Bohn, H. Braun, K. Claffy, and S. Wolff, "Mitigating
              the coming Internet crunch: multiple service levels via
              Precedence", submitted for publication, November 1993,
              ftp://ftp.sdsc.edu/pub/sdsc/anr/papers/precedence.ps.Z.

   [CBQ]      S. Floyd and V. Jacobson, "Link-sharing and Resource
              Management Models for Packet Networks", IEEE/ACM
              Transactions on Networking, Vol. 3 no. 4, pp. 365-386,
              August 1995.

   [CCBES]    C. Lefelhocz, B. Lyles, S. Shenker, and L. Zhang,
              "Congestion Control for Best-Effort Service: Why We Need a
              New Paradigm", IEEE Network, Vol. 10, no. 1, January 1996.

   [Clark97]  D. Clark and J. Wroclawski, "An Approach to Service
              Allocation in the Internet", Internet Draft, July 1997.

   [CLASSY]   S. Berson and S. Vincent, "A "Classy" Approach to
              Aggregation for Integrated Services", Internet Draft,
              March 1997.

   [Crow97]   J. Crowcroft, "All You Need Is Just One Bit", keynote
              presentation, IFIP Conf. on Protocols for High Speed
              Networks, October 1996,
              http://www.cs.ucl.ac.uk/staff/jon/hipparch/dollarbit.

   [ECN94]    S. Floyd, "TCP and Explicit Congestion Notification",
              ACM Computer Communications Review, Vol. 24 no. 5, pp.
              10-23, October 1994.

   [ECN97]    K. Ramakrishnan and S. Floyd, "A Proposal to Add Explicit
              Congestion Notification (ECN) to IPv6 and to TCP",
              Internet Draft, November 1997.

   [ESP]      S. Kent and R. Atkinson, "IP Encapsulating Security
              Payload", Internet Draft, October 1997.

   [Feng97]   W. Feng, D. Kandlur, D. Saha, and K. Shin, "Adaptive
              Packet Marking for Providing Differentiated Services in
              the Internet", Univ. Michigan Technical Report
              CSE-TR-347-97, October 1997,
              http://www.eecs.umich.edu/~wuchang/work/pmg.ps.Z.

   [Ferg97]   P. Ferguson, "Simple Differential Services: IP TOS and
              Precedence, Delay Indication, and Drop Preference",
              Internet Draft, November 1997.

   [Floyd91]  S. Floyd, "Connections with Multiple Congested Gateways
              in Packet-Switched Networks Part 1: One-way Traffic",
              Computer Communications Review, Vol. 21, No. 5, October
              1991, pp. 30-47, ftp://ftp.ee.lbl.gov/papers/gates1.ps.Z.

   [Floyd97]  S. Floyd and K. Fall, "Router Mechanisms to Support
              End-to-End Congestion Control", LBNL Technical Report,
              February 1997, http://ftp.ee.lbl.gov/papers/collapse.ps.

   [FRED]     D. Lin and R. Morris, "Dynamics of Random Early
              Detection", Proc. ACM SIGCOMM 1997, September 1997.

   [GBH97]    R. Guerin, S. Blake, and S. Herzog, "Aggregating
              RSVP-based QoS Requests", Internet Draft, November 1997.

   [HFSC]     I. Stoica, H. Zhang, and T. Ng, "A Hierarchical Fair
              Service Curve Algorithm for Link-Sharing, Real-Time and
              Priority Services", Proc. ACM SIGCOMM 97, September 1997.

   [HPFQA]    J. Bennett and H. Zhang, "Hierarchical Packet Fair
              Queueing Algorithms", Proc. ACM SIGCOMM 96, August 1996.

   [IPv6]     S. Deering and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", Internet Draft, November 1997.

   [IS802]    M. Seaman, A. Smith, and E. Crawley, "Integrated Services
              Mappings on IEEE 802 Networks", Internet Draft, July 1997.

   [May97]    M. May, J. Bolot, C. Diot, and A. Jean-Marie, "1-Bit
              Schemes for Service Discrimination in the Internet:
              Analysis and Evaluation", INRIA Research Report, August
              1997,
              http://www.inria.fr/rodeo/personnel/mmay/papers/rr_1bit.ps.

   [McCanne]  S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven
              Layered Multicast", Proc. ACM SIGCOMM 96, August 1996.

   [MPLS]     R. Callon, P. Doolan, N. Feldman, A. Fredette, G. Swallow,
              and A. Viswanathan, "A Framework for Multiprotocol Label
              Switching", Internet Draft, July 1997.

   [QOSP]     S. Bradner, editor, "Internet Protocol Quality of Service
              Problem Statement", Internet Draft, September 1997.

   [RED]      S. Floyd and V. Jacobson, "Random Early Detection Gateways
              for Congestion Avoidance", IEEE/ACM Transactions on
              Networking, August 1993.

   [RFC795]   J. Postel, "Service Mappings", Internet RFC 795, September
              1981.

   [RFC1046]  W. Prue and J. Postel, "A Queuing Algorithm to Provide
              Type-of-Service for IP Links", Internet RFC 1046, February
              1988.

   [RFC1349]  P. Almquist, "Type of Service in the Internet Protocol
              Suite", Internet RFC 1349, July 1992.

   [RFC1583]  J. Moy, "OSPF Version 2", Internet RFC 1583, March 1994.

   [RFC1633]  R. Braden, D. Clark, and S. Shenker, "Integrated Services
              in the Internet Architecture: An Overview", Internet RFC
              1633, July 1994.

   [RFC1812]  F. Baker, editor, "Requirements for IP Version 4 Routers",
              Internet RFC 1812, June 1995.

   [RFC1883]  S. Deering and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", Internet RFC 1883, December 1995.

   [RSVP]     B. Braden et al., "Resource ReSerVation Protocol (RSVP)
              -- Version 1 Functional Specification", Internet RFC 2205,
              September 1997.

   [Shenker]  S. Shenker, "Fundamental Design Issues for the Future
              Internet", IEEE Journal on Selected Areas in
              Communications, vol. 13, no. 7, Sep. 1995.

   [SIMA]     K. Kilkki, "Simple Integrated Media Access (SIMA)",
              Internet Draft, June 1997.

   [TWOBIT]   K. Nichols, V. Jacobson, and L. Zhang, "A Two-bit
              Differentiated Services Architecture for the Internet",
              Internet Draft, November 1997.

Author's Address

   Steven Blake
   E95/664
   IBM Corporation
   800 Park Offices Drive
   Research Triangle Park, NC 27709
   Phone: +1-919-254-2030
   Fax: +1-919-254-5483
   E-mail: slblake@raleigh.ibm.com