Internet Draft                                               Mick Seaman
Expires November 1997                                               3Com
draft-ietf-issll-802-01.txt                                 Andrew Smith
                                                        Extreme Networks
                                                            Eric Crawley
                                                    Gigapacket Networks
                                                               June 1997

          Integrated Services over IEEE 802.1D/802.1p Networks

Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups.
Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.

Abstract

This document describes the support of IETF Integrated Services over LANs built from IEEE 802 network segments which may be interconnected by draft standard IEEE P802.1p switches.

It describes the practical capabilities and limitations of this technology for supporting Controlled Load [8] and Guaranteed Service [9] using the inherent capabilities of the relevant 802 technologies [5], [6] etc. and the proposed 802.1p queuing features in switches. IEEE P802.1p [2] is a superset of the existing IEEE 802.1D bridging specification. This document provides a functional model for the layer 3 to layer 2 and user-to-network dialogue which supports admission control, and defines requirements for interoperability between switches. The special case of such networks where the sender and receiver are located on the same segment is also discussed.

This scheme expands on the ISSLL over 802 LANs framework described in [7]. It makes reference to an admission control signaling protocol developed by the ISSLL WG, known as the "Subnet Bandwidth Manager". This is an extension to the IETF's RSVP protocol [4] and is described in a separate document [10].

1. Introduction

The IEEE 802.1 Interworking Task Group is currently enhancing the basic MAC Service provided in Bridged Local Area Networks (aka "switched LANs").
As a supplement to the original IEEE MAC Bridges standard [1], the update P802.1p [2] proposes differential traffic class queuing and access to media on the basis of a "user_priority" signaled in frames.

In this document we
* review the meaning and use of user_priority in LANs and the frame forwarding capabilities of a standard LAN switch.
* examine alternatives for identifying layer 2 traffic flows for admission control.
* review the options available for policing traffic flows.
* derive requirements for consistent traffic class handling in a network of switches, and use these requirements to discuss queue handling alternatives for 802.1p and the way in which these meet administrative and interoperability goals.
* consider the benefits and limitations of this switch-based approach, contrasting it with a full router-based RSVP implementation in terms of complexity, utilisation of transmission resources and administrative controls.

The model used is outlined in the "framework document" [7] which in summary:
* partitions the admission control process into two separable operations:
  * an interaction between the user of the integrated service and the local network elements ("provision of the service" in the terms of 802.1D) to confirm the availability of transmission resources for traffic to be introduced.
  * selection of an appropriate user_priority for that traffic on the basis of the service and service parameters to be supported.
* distinguishes between the user to network interface above and the mechanisms used by the switches ("support of the service"). These include communication between the switches (network to network signaling).
* describes a simple architecture for the provision and support of these services, broken down into components with functional and interface descriptions:
  * a single "user" component: a layer-3 to layer-2 negotiation and translation component for both sending and receiving, with interfaces to other components residing in the station.
  * processes residing in a bridge/switch to handle admission control and mapping requests, including proposals for actual traffic mappings to user_priority values.
* identifies a need for a signaling protocol to carry admission control requests between devices.

It will be noted that this document is written from the pragmatic viewpoint that there will be a widely deployed network technology and that we are evaluating it for its ability to support some or all of the defined IETF integrated services: this approach is intended to ensure development of a system which can provide useful new capabilities in existing (and soon to be deployed) network infrastructures.

2. Goals and Assumptions

It is assumed that typical subnetworks that are concerned about quality of service will be "switch-rich": that is to say, most communication between end stations using integrated services support will pass through at least one switch. The mechanisms and protocols described will be trivially extensible to communicating systems on the same shared media, but it is important not to allow problem generalisation to complicate the practical application that we target: the access characteristics of Ethernet and Token-Ring LANs are forcing a trend to switch-rich topologies, along with MAC enhancements to ensure access predictability on half-duplex switch-to-switch links.
Note that we illustrate most examples in this document using RSVP as an "upper-layer" QoS signaling protocol, but there are actually no real dependencies on this protocol: RSVP could be replaced by some other dynamic protocol, or else the requests could be made by network management or other policy entities. In any event, no extra modifications to the RSVP protocol are assumed.

There may be a heterogeneous mixture of switches with different capabilities, all compliant with IEEE 802.1p, but implementing queuing and forwarding mechanisms ranging from simple 2-queue-per-port strict priority up to more complex multi-queue (maybe even per-flow) WFQ or other algorithms.

The problem is broken down into smaller independent pieces: this may lead to sub-optimal usage of the network resources, but we contend that such benefits are often equivalent to very small improvements in network efficiency in a LAN environment. Therefore, it is a goal that the switches in the network operate using a much simpler set of information than the RSVP engine in a router. In particular, it is assumed that such switches do not need to implement per-flow queuing and policing (although they might do so).

It is a fundamental assumption of the int-serv model that flows are isolated from each other throughout their transit across a network. Intermediate queueing nodes are expected to police the traffic to ensure that it conforms to the pre-agreed traffic flow specification. In the architecture proposed here for mapping to layer-2, we diverge from that assumption in the interests of simplicity: the policing function is assumed to be implemented in the transmit schedulers of the layer-3 devices (end stations, routers).
In the LAN environments envisioned, it is reasonable to assume that end stations are "trusted" to adhere to their agreed contracts at the inputs to the network, and that we can afford to over-allocate resources at admission-control time to compensate for the inevitable extra jitter/bunching introduced by the switched network itself.

These divergences have some implications for the receiver heterogeneity that can be supported and the statistical multiplexing gains that might have been exploited, especially for Controlled Load flows.

3. User Priority and Frame Forwarding in IEEE 802 Networks

3.1 General IEEE 802 Service Model

User_priority is a value associated with the transmission and reception of all frames in the IEEE 802 service model: it is supplied by the sender that is using the MAC service, and is provided along with the data to a receiver using the MAC service. It may or may not be actually carried over the network: Token-Ring/802.5 carries this value (encoded in its FC octet); basic Ethernet/802.3 does not. 802.1p defines a way to carry this value over the network in a consistent way on Ethernet, Token Ring, FDDI or other MAC-layer media using an extended frame format. The usage of user_priority is summarised below but is more fully described in section 2.5 of 802.1D [1] and in 802.1p [2] "Support of the Internal Layer Service by Specific MAC Procedures"; readers are referred to these documents for further information.

If the user_priority is carried explicitly in packets, its utility is as a simple label in the data stream enabling packets in different classes to be easily discriminated by downstream nodes without their having to parse the packet in more detail.
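To make the labelling concrete, the following sketch recovers the user_priority label from a raw Ethernet frame, assuming the 802.1Q/p extended frame format (TPID 0x8100 in the EtherType position, followed by a 16-bit TCI whose top three bits are the user_priority). The function name and the default-value handling are illustrative, not taken from any specification:

```python
TPID_8021Q = 0x8100  # tag protocol identifier in the EtherType position

def user_priority(frame: bytes, default: int = 0) -> int:
    """Return the user_priority (0-7) signalled in an Ethernet frame."""
    if len(frame) >= 16:
        ethertype = (frame[12] << 8) | frame[13]
        if ethertype == TPID_8021Q:
            tci = (frame[14] << 8) | frame[15]
            return tci >> 13              # top 3 bits of the TCI
    return default                        # untagged: regenerate locally

# A tagged frame with user_priority 5 (TCI = 5 << 13, VID = 1):
hdr = bytes(12) + bytes([0x81, 0x00, 0xA0, 0x01])
assert user_priority(hdr) == 5
```

The `default` path mirrors the regeneration behaviour discussed below for basic Ethernet/802.3, where no explicit field exists and a receiver or switch must assign a value by local policy.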
Apart from making the job of desktop or wiring-closet switches easier, an explicit field means they do not have to change hardware or software as the rules for classifying packets evolve (e.g. based on new protocols or new policies). More sophisticated layer-3 switches, perhaps deployed towards the core of a network, can add value here by performing the classification more accurately and, hence, utilising network resources more efficiently or providing better protection of flows from one another: this appears to be a good economic choice since there are likely to be very many more desktop/wiring-closet switches in a network than switches requiring layer-3 functionality.

The IEEE 802 specifications make no assumptions about how user_priority is to be used by end stations or by the network. In particular, it can only be considered a "priority" in a loose sense, although the current 802.1p draft defines static priority queuing as the default mode of operation of switches that implement multiple queues (user_priority is defined as a 3-bit quantity, so strict priority queueing would give value 7 = high priority, 0 = low priority). The general switch algorithm is as follows: packets are placed onto a particular queue based on the received user_priority (taken from the packet if an 802.1p header or an 802.5 network was used, invented according to some local policy if not). The selection of queue is based on a mapping from user_priority [0,1,2,3,4,5,6 or 7] onto the number of available queues. Note that switches may implement any number of queues from 1 upwards, and it may not be visible externally, except through any advertised switch parameters and the switch's admission control behaviour, which user_priority values get mapped to the same vs. different queues internally. Other algorithms that a switch might implement include, e.g.,
weighted fair queueing or round robin.

In particular, IEEE makes no recommendations about how a sender should select the value for user_priority: one of the main purposes of this document is to propose such usage rules, and to describe how to communicate the semantics of the values between switches, end stations and routers. For the remainder of this document we use the term "traffic class" when discussing the treatment of packets with one of the user_priority values.

3.2 Ethernet/802.3

There is no explicit traffic class or user_priority field carried in Ethernet packets. This means that user_priority must be regenerated at a downstream receiver or switch according to some defaults or by parsing further into higher-layer protocol fields in the packet. Alternatively, the IEEE 802.1Q encapsulation [11] may be used, which provides an explicit traffic class field on top of a basic MAC format.

For the different IP packet encapsulations used over Ethernet/802.3, it will be necessary to adjust any admission-control calculations according to the framing and padding requirements:

   Encapsulation                              Framing Overhead  IP MTU
                                                  bytes/pkt      bytes

   IP EtherType            (ip_len<=46 bytes)     64-ip_len      1500
                           (1500>=ip_len>=46)         18         1500

   IP EtherType over 802.1p/Q   (ip_len<=42)      64-ip_len      1500*
                           (1500>=ip_len>=42)         22         1500*

   IP EtherType over LLC/SNAP   (ip_len<=40)      64-ip_len      1492
                           (1500>=ip_len>=40)         24         1492

   * Note that the draft IEEE 802.1Q specification exceeds the IEEE
     802.3 maximum packet length values by 4 bytes.

3.3 Token-Ring/802.5

The token ring standard [6] provides a priority mechanism that can be used to control both the queuing of packets for transmission and the access of packets to the shared media. The priority mechanisms are implemented using bits within the Access Control (AC) and the Frame Control (FC) fields of an LLC frame.
The first three bits of the AC field, the Token Priority bits, together with the last three bits of the AC field, the Reservation bits, regulate which stations get access to the ring. The last three bits of the FC field of an LLC frame, the User Priority bits, are obtained from the higher layer in the user_priority parameter when it requests transmission of a packet. This parameter also establishes the Access Priority used by the MAC. The user_priority value is conveyed end-to-end by the User Priority bits in the FC field and is typically preserved through Token-Ring bridges of all types. In all cases, 0 is the lowest priority.

Token-Ring also uses a concept of Reserved Priority: this is the priority value which a station uses to reserve the token for its next transmission on the ring. When a free token is circulating, only a station having an Access Priority greater than or equal to the Reserved Priority in the token will be allowed to seize the token for transmission. Readers are referred to [14] for further discussion of this topic.

A token ring station is theoretically capable of separately queuing each of the eight levels of requested user priority and then transmitting frames in order of priority. A station sets the Reservation bits according to the user priority of frames that are queued for transmission in the highest priority queue. This allows the access mechanism to ensure that the frame with the highest priority throughout the entire ring will be transmitted before any lower priority frame.
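The bit positions described above can be sketched directly as mask operations. This is a minimal illustration of the layout as stated in this section (AC: Token Priority in the first three bits, Reservation in the last three; FC of an LLC frame: User Priority in the last three bits); the helper names and the token-seizure check are illustrative only:

```python
def token_priority(ac: int) -> int:
    # First (most significant) three bits of the AC octet.
    return (ac >> 5) & 0x7

def reservation(ac: int) -> int:
    # Last three bits of the AC octet.
    return ac & 0x7

def fc_user_priority(fc: int) -> int:
    # Last three bits of the FC octet of an LLC frame.
    return fc & 0x7

def may_seize_token(access_priority: int, ac: int) -> bool:
    # Per the rule above: a station may seize a free token only if its
    # Access Priority is >= the Reserved Priority carried in the token.
    return access_priority >= reservation(ac)

# AC = 0b101_00_011: Token Priority 5, Reservation 3.
assert token_priority(0b10100011) == 5
assert reservation(0b10100011) == 3
```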
Annex I to the IEEE 802.5 token ring standard recommends that stations send/relay frames as follows:

   Application               user_priority
   non-time-critical data          0
   -                               1
   -                               2
   -                               3
   LAN management                  4
   time-sensitive data             5
   real-time-critical data         6
   MAC frames                      7

To reduce frame jitter associated with high-priority traffic, the annex also recommends that only one frame be transmitted per token and that the maximum information field size be 4399 octets whenever delay-sensitive traffic is traversing the ring. Most existing implementations of token ring bridges forward all LLC frames with a default access priority of 4. Annex I recommends that bridges forward LLC frames that have a user priority greater than 4 with a reservation equal to the user priority (although the draft IEEE P802.1p [2] permits network management to override this behaviour). The capabilities provided by token ring's user and reservation priorities and by IEEE 802.1p can provide effective support for Integrated Services flows that request QoS using RSVP. These mechanisms can provide, with few or no additions to the token ring architecture, bandwidth guarantees with the network flow control necessary to support such guarantees.

For the different IP packet encapsulations used over Token Ring/802.5, it will be necessary to adjust any admission-control calculations according to the framing requirements:

   Encapsulation                 Framing Overhead  IP MTU
                                     bytes/pkt      bytes

   IP EtherType over 802.1p/Q          29           4370*
   IP EtherType over LLC/SNAP          25           4370*

   * The suggested MTU from RFC 1042 [13] is 4464 bytes, but there are
     issues related to discovering the maximum supported MTU between
     any two points both within and between Token Ring subnets. We
     recommend here an MTU consistent with the 802.5 Annex I
     recommendation.
4. Integrated services through layer-2 switches

4.1 Summary of switch characteristics

For the sake of illustration, we divide layer-2 bridges/switches into several categories based on the level of sophistication of their QoS and software protocol capabilities. These categories are not intended to represent all possible implementation choices but, instead, to aid discussion of what QoS capabilities can be expected from a network made of these devices.

Class I   - 802.1p priority queueing between traffic classes.
          - No multicast heterogeneity.
          - 802.1p GARP/GMRP pruning of individual multicast addresses.

Class II  As (I) plus:
          - can map received user_priority on a per-input-port basis to
            some internal set of canonical values.
          - can map internal canonical values onto transmitted
            user_priority on a per-output-port basis, giving some
            limited form of multicast heterogeneity.
          - maybe implements IGMP snooping for pruning.

Class III As (II) plus:
          - per-flow classification.
          - maybe per-flow policing and/or reshaping.
          - WFQ or other transmit scheduling (probably not per-flow).

4.2 Queueing

Connectionless packet-based networks in general, and LAN-switched networks in particular, work today because of scaling choices in network provisioning. Consciously or (more usually) unconsciously, enough excess bandwidth and buffering is provisioned in the network to absorb the traffic sourced by higher-layer protocols, or to cause their transmission windows to run out, on a statistical basis, so that the network is only overloaded for a short duration and the average expected loading is less than 60% (usually much less).

With the advent of time-critical traffic such overprovisioning has become far less easy to achieve.
Time-critical frames may find themselves queued for annoyingly long periods of time behind temporary bursts of file transfer traffic, particularly at network bottleneck points, e.g. at the 100 Mb/s to 10 Mb/s transition that might occur between the riser to the wiring closet and the final link to the user from a desktop switch. In this case, however, if it is known (guaranteed by application design, merely expected on the basis of statistics, or just that this is all that the network guarantees to support) that the time-critical traffic is a small fraction of the total bandwidth, it suffices to give it strict priority over the "normal" traffic. The worst-case delay experienced by the time-critical traffic is roughly the maximum transmission time of a maximum-length non-time-critical frame: less than a millisecond for 10 Mb/s Ethernet, and well below an end-to-end budget based on human perception times.

When more than one "priority" service is to be offered by a network element, e.g. it supports Controlled Load as well as Guaranteed Service, the queuing discipline becomes more complex. In order to provide the required isolation between the service classes, it will probably be necessary to queue them separately. There is then an issue of how to service the queues; a combination of admission control and maybe weighted fair queuing may be required in such cases. As with the service specifications themselves, it is not the place of this document to specify queuing algorithms, merely to observe that the external behaviour must meet the services' requirements.

4.3 Multicast Heterogeneity

IEEE 802.1D and 802.1p specify a basic model for multicast whereby a switch makes multicast forwarding decisions based on the destination address: this produces a list of output ports to which the packet should be forwarded.
In its default mode, such a switch would use the user_priority value in received packets to enqueue the packets at each output port. All of the classes of switch identified above can support this operation.

At layer-3, the int-serv model allows heterogeneous multicast flows where different branches of a tree can have different types of reservations for a given multicast destination, and even supports the notion that some trees will have some branches with reserved flows and some using best effort (default) service.

If a switch selects per-port output queues based only on the incoming user_priority, as described by 802.1p, it must treat all branches of all multicast sessions within that user_priority class with the same queuing mechanism: no heterogeneity is then possible. If a switch were to implement a separate user_priority mapping at each output port, as described under "Class II switch" above, then some limited form of receiver heterogeneity can be supported, e.g. forwarding of traffic as user_priority 4 on one branch where receivers have performed admission control reservations and as user_priority 0 on one where they have not. We assume that per-user_priority queuing, without taking account of input or output ports, is the minimum standard functionality for systems in a LAN environment (Class I switch, as defined above). More functional layer-2 switches or even layer-3 switches (a.k.a. routers) can be used if even more flexible forms of heterogeneity are considered necessary: their behaviour is well standardised.

4.4 Override of incoming user_priority

In some cases, a network administrator may not trust the user_priority values contained in packets from a source and may wish to map these into some more suitable set of values.
Alternatively, due perhaps to equipment limitations or transition periods, values may need to be mapped to/from different regions of a network.

Some switches may implement such a function on input, mapping received user_priority into some internal set of values (this table is known in 802.1p as the "user_priority regeneration table"). These values can then be mapped using the output table described above onto outgoing user_priority values: these same mappings must also be used when applying admission control to requests that use the user_priority values (see e.g. [10]). More sophisticated approaches may also be envisioned where a device polices traffic flows and adjusts their onward user_priority based on their conformance to the admitted traffic flow specifications.

4.5 Remapping of non-conformant aggregated flows

One other topic under discussion in the int-serv context is how to handle the traffic for data flows from sources that are exceeding their currently agreed traffic contract with the network. An approach that shows much promise is to treat such traffic with "somewhat less than best effort" service in order to protect traffic that is normally given "best effort" service from having to back off (such traffic is often "adaptive", using TCP or other congestion control algorithms, and it would be unfair to penalise it due to badly behaved traffic from reserved flows, which are usually set up by non-adaptive applications).

A solution here might be to assign normal best effort traffic to one user_priority and to label excess non-conformant traffic as a "lower" user_priority. This topic is further discussed below.

5. Selecting traffic classes

One fundamental question is "who gets to decide what the classes mean and who gets access to them?"
One approach would be for the meanings of the classes to be "well-known": we would then need to standardise a set of classes, e.g. 1 = best effort, 2 = controlled-load, 3 = guaranteed (loose delay bound, high bandwidth), 4 = guaranteed (slightly tighter delay), etc. Choosing the values to encode in such a table in end stations, in isolation from the network to which they are connected, is problematic: one approach could be to define one user_priority value per int-serv service and leave it at that (reserving the rest of the combinations for future traffic classes - there are sure to be plenty!).

We propose here a more flexible mapping: clients ask "the network" which user_priority traffic class to use for a given traffic flow, as categorised by its flow-spec and layer-2 endpoints. The network provides a value back to the requester which is appropriate to the current network topology, load conditions, other admitted flows etc. The task of configuring switches with this mapping (e.g. through network management, a switch-switch protocol, or via some network-wide QoS-mapping directory service) is an order of magnitude less complex than performing the same function in end stations. Also, when new services (or other network reconfigurations) are added to such a network, the network elements will typically be the ones to be upgraded with new queuing algorithms etc. and can be provided with new mappings at this time.

Given a new session or "flow" requiring some QoS support, a client then needs answers to the following questions:

1. Which traffic class do I add this flow to?
   The client needs to know how to label the packets of the flow as it places them into the network.

2. Who do I ask/tell?
   The proposed model is that a client ask "the network" which user_priority traffic class to use for a given traffic flow.
This has several benefits as compared to a model which allows clients to select a class for themselves.

3. How do I ask/tell them?
   A request/response protocol is needed between client and network: in fact, the request can be piggy-backed onto an admission control request, and the response can be piggy-backed onto an admission control acknowledgment. This "one pass" assignment has the benefit of completing the admission control in a timely way and of reducing the exposure to changing conditions which could occur if clients cached the knowledge for extensive periods.

The network (i.e. the first network element encountered downstream from the client) must then answer the following questions:

1. Which traffic class do I add this flow to?
   This is a packing problem, difficult to solve in general, but many simplifying assumptions can be made: presumably some simple form of allocation can be done without a more complex scheme able to dynamically shift flows around between classes.

2. Which traffic class has worst-case parameters which meet the needs of this flow?
   This might be an ordering/comparison problem: which of two service classes is "better" than the other? Again, we can make this tractable by observing that all of the current int-serv classes can be ranked (best effort <= Controlled Load <= Guaranteed Service) in a simple manner. If any classes are implemented in the future that cannot be simply ranked, then the issue can be finessed either by a priori knowledge about what classes are supported or by configuration.

The network must then return the chosen user_priority value to the client.

Note that the client may be an end station, a router, or a first switch acting as a proxy for a client which does not participate in these protocols for whatever reason. Note also that a device, e.g.
a server or router, may choose to implement both the "client" and the "network" portions of this model so that it can select its own user_priority values. Such an implementation would, however, be discouraged unless the device really does have a close tie-in with the network topology and resource allocation policies, but it would work in some cases where there is known over-provisioning of resources.

6. Flow Identification

Several previous proposals for int-serv over lower layers have treated switches very much as a special case of routers: in particular, they assume that switches along the data path will make packet handling decisions based on the RSVP flow and filter specifications and use them to classify the corresponding data packets. However, filtering to the per-flow level becomes cost-prohibitive with increasing switch speed: devices with such filtering capabilities are unlikely to have a very different implementation cost from IP routers, in which case we must question whether a specification oriented toward switched networks is of any benefit at all.

This document proposes that "aggregated flow" identification based on user_priority be the minimum required of switches.

7. Reserving Network Resources - Admission Control

So far we have not discussed admission control. In fact, without admission control it is possible to scratch-build a LAN of some size capable of supporting real-time services, provided that the traffic fits within certain scaling constraints (relative link speeds, numbers of ports etc. - see below). This is not surprising, since it is possible to run a fair approximation to real-time services on small LANs today with no admission control or help from encoded priority bits.

Imagine a campus network providing dedicated 10 Mbps connections to each user.
Each floor of each building supports up to 96 users, organized into groups of 24, with each group supported by a 100 Mbps downlink to a basement switch. Each basement switch concentrates 5 floors (20 x 100 Mbps) and a data center (4 x 100 Mbps) onto a 1 Gbps link to an 8 Gbps central campus switch, which in turn hooks 6 buildings together (with 2 x 1 Gbps full-duplex links to support a corporate server farm). Such a network could support 1.5 Mbps of voice/video from every user to any other user or (for half the population) to the server farm, provided the video ran at high priority: this gives 3000 users, all with desktop video conferencing running alongside file transfer/email etc. In such a network, RSVP's role would be limited to ensuring resource availability at the communicating end stations and for connection to the wide area.

In such a network, a discussion as to the best service policy to apply to high and low priority queues may prove academic: while it is true that "normal" traffic may be delayed by bunches of high priority frames, queuing theory tells us that the average queue occupancy in the high priority queue at any switch port will be somewhat less than 1; with real user behaviour (i.e. not everyone watching video conferences all the time) it should be far less. A cheaper alternative to buying equipment with a fancy queue service policy may be to buy equipment with more bandwidth, lowering the average link utilisation by a few per cent.

In practice, a number of objections can be made to such a simple solution. There may be long-established, expensive equipment in the network which does not provide all the bandwidth required. There will be considerable concern over who is allowed to say what traffic is high priority. There may be a wish to give some form of "prioritised" service to crucial business applications, above that given to experimental video-conferencing.
The task that faces us is to provide a degree of control without making that control so elaborate to implement that the control-oriented solution is simply rejected in favor of providing yet more bandwidth at a lower cost.

The proposed admission control mechanism requires a query-response interaction, with the network returning a "YES/NO" answer and, if successful, a user_priority value with which to tag the data frames of this flow.

The relevant int-serv specifications describe the parameters which need to be considered when making an admission control decision at each node in the network path between sender and receiver. We discuss below how to calculate these parameters for different network technologies, but we do not specify admission control algorithms or mechanisms for progressing the admission control process across the network. One such mechanism, SBM, is described in [10].

Where there are multiple mechanisms in use for allocating resources, e.g. some combination of SBM and network management, it will be necessary to ensure that network resources are partitioned amongst the different mechanisms in some way: this could be by configuration, or perhaps by having the mechanisms allocate from a common resource pool within any device.

8. Mapping of integrated services to layer-2 in layer-3 devices

8.1 Layer-3 client

We assume the same client model as int-serv and RSVP, where we use the term "client" to mean the entity handling QoS in the layer-3 device at each end of a layer-2 hop (e.g. end station, router). The sending client itself is responsible for local admission control and for scheduling packets onto its link in accordance with the service agreed. Just as in the int-serv model, this involves per-flow schedulers (a.k.a. shapers) in every such data source.
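Such a per-flow shaper is conventionally modelled as a token bucket driven by the flow's Tspec. The following is a minimal sketch of that idea - the class name, interface and parameter choices are our own illustration, not taken from this document:

```python
class TokenBucketShaper:
    """Per-flow shaper: a packet of pkt_len bytes conforms once the
    bucket (rate r bytes/s, depth b bytes) holds enough tokens."""

    def __init__(self, rate, depth):
        self.rate = float(rate)     # r: sustained rate, bytes/second
        self.depth = float(depth)   # b: maximum burst, bytes
        self.tokens = float(depth)  # bucket starts full
        self.last = 0.0             # time of last update, seconds

    def delay_for(self, pkt_len, now):
        """Seconds to hold this packet before it conforms to the Tspec."""
        # refill at rate r, capped at the bucket depth b
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_len:
            self.tokens -= pkt_len  # conformant: send immediately
            return 0.0
        wait = (pkt_len - self.tokens) / self.rate
        # model the packet leaving at now + wait with an empty bucket
        self.tokens = 0.0
        self.last = now + wait
        return wait

# e.g. a 1 Mbps flow (125,000 bytes/s) allowed a 1500-byte burst
shaper = TokenBucketShaper(rate=125_000, depth=1500)
```

A scheduler in the driver would hold each packet for `delay_for()` seconds before transmission, guaranteeing that the flow's output never exceeds its declared burstiness.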
The client runs an RSVP process which presents a session establishment interface to applications, signals RSVP over the network, programs a scheduler and classifier in the driver, and interfaces to a policy control module. In particular, RSVP also interfaces to a local admission control module: it is this entity that we focus on here.

The following diagram is taken from the RSVP specification [4]:

       _____________________________
      |  _______                    |
      | |       |   _______         |
      | |Appli- |  |       |  RSVP
      | | cation|  | RSVP <-------------------->
      | |       <-->       |        |
      | |       |  |process|  _____ |
      | |_._____|  |       -->Polcy||
      |   |        |__.__._|  |Cntrl||
      |   |data       |  |    |_____||
      |===|===========|==|==========|
      |   |   --------|  |    _____ |
      |   |  |           ---->Admis||
      |  _V__V_   ___V____    |Cntrl||
      | |      | |        |   |_____||
      | |Class-| | Packet |         |
      | | ifier|==>Schedulr|====================>
      | |______| |________|         | data
      |                             |
      |_____________________________|

            Figure 1 - RSVP in Sending Hosts

Note that we illustrate examples in this document using RSVP as the "upper-layer" signaling protocol, but there are no actual dependencies on this protocol: RSVP could be replaced by some other dynamic protocol, or else the requests could be made by network management or other policy entities.

8.2 Requests to layer-2

The local admission control entity within a client is responsible for mapping these layer-3 requests into layer-2 language.

The upper-layer entity requests from ISSLL:

   "May I reserve for traffic with <traffic characteristics> with
   <performance requirements> from <here> to <there> and how should I
   label it?"

where
   <traffic characteristics> = Flow Spec, Tspec, Rspec (e.g.
                               bandwidth, burstiness, MTU etc.)
   <performance requirements> = latency, jitter bounds etc.
   <here>                    = IP address(es)
   <there>                   = IP address(es) - may be multicast

8.3 Sender

The ISSLL functionality in the sender is illustrated below and may be summarised as:
* maps the endpoints of the conversation to layer-2 addresses in the LAN, so it can determine where the traffic is really going (probably by reference to the ARP protocol cache for unicast, or an algorithmic mapping for multicast destinations).
* applies local admission control on the outgoing link and driver.
* formats an SBM request to the network with the mapped addresses and filter/flow specs.
* receives the response from the network and reports the YES/NO admission control answer back to the upper-layer entity, along with any negotiated modifications to the session parameters.
* stores any resulting user_priority to be associated with this session in an "802 header" lookup table for use when sending any future data packets.

          from IP       from RSVP
      ______|____________|____________
     |      |            |            |
     |    __V____     ___V___         |
     |   |       |   |       |        |
     |   | Addr  |<->|       |   SBM signaling
     |   |mapping|   |  SBM  |<------------------------>
     |   |_______|   |Client |        |
     |    ___|___    |       |        |
     |   |       |<->|       |        |
     |   |  802  |   |_______|        |
     |   | header|   /   |            |
     |   |_______|  /    |            |
     |       |     /     |     _____  |
     |       +----/      |  +->|Local| |
     |    __V_V_     ____V___  |Admis| |
     |   |      |   |        | |Cntrl| |
     |   |Class-|   | Packet | |_____| |
     |   | ifier|==>|Schedulr|======================>
     |   |______|   |________|    | data
     |________________________________|

        Figure 2 - ISSLL in End-station Sender

ISSLL manageable objects in the sender:
* 802 header table
* Local admission control resource status
* L2 additions to classifier/scheduler int-serv tables

8.4 Receiver

The ISSLL functionality in the receiver is a good deal simpler. It is summarised below and illustrated by the following picture:
* handles any received SBM protocol indications.
* applies local admission control to see if a request can be supported with appropriate local receive resources.
* passes indications up to RSVP if OK.
* accepts confirmations from RSVP and relays them back via SBM signaling towards the requester.
* may program a receive classifier and scheduler, if any is used, to identify traffic classes of received packets and accord them appropriate treatment, e.g. reserving some buffers for particular traffic classes.
* programs the receiver to strip any 802 header information from received packets.

                 to RSVP      to IP
                    ^           ^
      ______________|___________|___________
     |              |           |           |
     |            __|____       |           |
     |           |       |      |           |
  SBM signaling  |  SBM  |   ___|___        |
 <---------------|Client |  | Strip |       |
     |           |_______|  |802 hdr|       |
     |              |    \  |_______|       |
     |           ___v___  \     ^           |
     |          | Local |  \    |           |
     |          | Admis |   \   |           |
     |          | Cntrl |    \  |           |
     |          |_______|     \ |           |
     |           ______      v__|______     |
     |          |Class-|    | Packet   |    |
 ==============>| ifier|==>| Scheduler |    |
      data      |______|    |__________|    |
     |______________________________________|

       Figure 3 - ISSLL in End-station Receiver

9. Layer-2 Switch Functions

9.1 Switch Model

In this model of layer-2 switch behaviour, we define the following entities within the switch:

* Local admission control - one of these on each port accounts for the available bandwidth on the link attached to that port. For half-duplex links, this involves taking account of the resources allocated to both transmit and receive flows. For full-duplex links, the input port accountant's task is trivial.

* Input SBM module - one instance on each port performs the "network" side of the signaling protocol for peering with clients or other switches. It also holds knowledge of the mappings of int-serv classes to user_priority.

* SBM propagation - relays requests that have passed admission control at the input port to the relevant output ports' SBM modules.
This will require access to the switch's forwarding table (the layer-2 "routing table" - cf. the RSVP model) and port spanning-tree states.

* Output SBM module - forwards requests to the next layer-2 or layer-3 network hop.

* Classifier, Queueing and Scheduler - these functions are basically as described by the Forwarding Process of IEEE 802.1p (see section 3.7 of [2]). The Classifier module identifies the relevant QoS information from incoming packets and uses this, together with the normal bridge forwarding database, to decide to which output queue of which output port to enqueue the packet. In Class I switches, this information is the "regenerated user_priority" parameter, which has already been decoded by the receiving MAC service and potentially re-mapped by the 802.1p forwarding process (see the description in section 3.7.3 of [2]). This does not preclude more sophisticated classification rules, which may be applied in more complex Class III switches, e.g. matching on individual int-serv flows.

The Queueing and Scheduler module holds the output queues for ports and provides the algorithm for servicing the queues for transmission onto the output link in order to provide the promised int-serv service. Switches will implement one or more output queues per port, and all will implement at least a basic strict priority dequeueing algorithm as their default, in accordance with 802.1p.

* Ingress traffic class mapper and policing - as described in 802.1p section 3.7. This optional module may check whether the data within traffic classes conform to the patterns currently agreed: switches may police this and discard or re-map packets. The default behaviour is to pass things through unchanged.

* Egress traffic class mapper - as described in 802.1p section 3.7. This optional module may apply re-mapping of traffic classes, e.g. on a per-output-port basis.
The default behaviour is to pass things through unchanged.

These entities are shown in the following diagram, which is a superset of the IEEE 802.1D/802.1p bridge model:

                 _______________________________
                |  _____      ______     ______ |
  SBM signaling | |     |    |      |   |      || SBM signaling
 <----------------->| IN |<->| SBM  |<->| OUT |<---------------->
                | | SBM |    | prop.|   | SBM  ||
                | |_____|    |______|   |______||
                |    /  |       ^  /       |    |
  ______________|   /   |       |  |       |    |_____________
 |              \  /   __V__    |  |     __V__  /             |
 |               \    |Local|   |  |    |Local| /             |
 |                \   |Admis|   |  |    |Admis|/              |
 |                 \  |Cntrl|   |  |    |Cntrl|               |
 |   ______         \ |_____|   |  |    |_____|      _____    |
 |  |traff |         \   ___|__  V_______   /        |egrss|  |
 |  |class |          \ |Filter| |Queue & |/         |traff|  |
 |  |map & |===========>|Data-  |=| Packet |========>|class|  |
 |  |police|            |  base | |Schedule|         |map  |  |
 |  |______|            |_______| |________|         |_____|  |
 |_____^______________________________________________|_______|
  data in                                             |data out
 =======+                                             +=======>

            Figure 4 - ISSLL in Switches

9.2 Admission Control

On reception of an admission control request, a switch performs the following actions:

* ingress SBM module translates any received user_priority, or else selects a layer-2 traffic class which appears compatible with the request and whose use does not violate any administrative policies in force. In effect, it matches the requested service against those available in each of the user_priority classes and chooses the "best" one. It ensures that, if the reservation is successful, the selected value is passed back to the client.
* ingress SBM observes the current state of allocation of resources on the input port/link and then determines whether the new resource allocation from the mapped traffic class would be excessive. If accepted so far, the request is passed to the reservation propagator.
* reservation propagator relays the request to the bandwidth accountants on each of the switch's outbound links to which this reservation would apply (an implied interface to the routing/forwarding database).
* egress bandwidth accountant observes the current state of allocation of queueing resources on its outbound port and of bandwidth on the link itself, and determines whether the new allocation would be excessive. Note that this is only the local decision of this switch hop: each further layer-2 hop through the network gets a chance to veto the request as it passes along.
* the request, if accepted by this switch, is then passed on down the line on each output link selected. Any user_priority described in the forwarded request must be translated according to any egress mapping table.
* if accepted, the switch must notify the client of the user_priority to use for packets belonging to this flow. Note that this is only a "provisional YES" - we assume an optimistic approach here: switches further along the path can still say "NO" later.
* if this switch wishes to reject the request, it can do so by notifying the original client (by means of its layer-2 address).

10. Mappings from int-serv service models to IEEE 802

It is assumed that admission control will be applied when deciding whether or not to admit a new flow through a given network element, and that a device sending onto a link will be proxying the parameters and admission control decisions on behalf of that link: this process will require the device to be able to determine (by estimation, measurement or calculation) several parameters. It is assumed that details of the potential flow are provided to the device by some means (e.g. a signaling protocol, or network management). The service definition specifications themselves provide some implementation guidance as to how to calculate some of these quantities.
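The per-link bandwidth accounting performed by the ingress and egress accountants above can be sketched as follows. This is an illustrative model only: the class name, and the policy of making only a fraction of the raw link bandwidth reservable, are our assumptions rather than requirements of this document:

```python
class LinkAccountant:
    """Tracks reserved bandwidth on one link; admits a flow only if
    the sum of promised rates stays within a configured ceiling."""

    def __init__(self, link_bps, reservable_fraction=0.75):
        # Leave headroom for best-effort traffic by making only a
        # fraction of the link reservable (an illustrative policy).
        self.ceiling = link_bps * reservable_fraction
        self.reserved = 0.0
        self.flows = {}

    def admit(self, flow_id, rate_bps):
        """Return True (and commit the resources) if the flow fits."""
        if self.reserved + rate_bps > self.ceiling:
            return False                  # this hop vetoes the request
        self.flows[flow_id] = rate_bps
        self.reserved += rate_bps
        return True

    def release(self, flow_id):
        """Session tear-down: return the flow's bandwidth to the pool."""
        self.reserved -= self.flows.pop(flow_id, 0.0)

# e.g. a dedicated 10 Mbps link with 7.5 Mbps reservable
link = LinkAccountant(10_000_000)
assert link.admit("voice-1", 1_500_000)
```

Each hop runs such an accountant independently, which is why an upstream "provisional YES" can still be vetoed downstream.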
The accuracy of calculation of these parameters may not be very critical: indeed, it is an assumption of this model's use with relatively simple Class I switches that they merely provide values which describe the device and admit flows conservatively.

10.1 General characterisation parameters

There are some general parameters that a device will need to use and/or supply for all service types:

- Ingress link.
- Egress links and their MTUs, framing overheads and minimum packet sizes (see the media-specific information presented above).
- Available path bandwidth: updated hop-by-hop by any device along the path of the flow.
- Minimum latency.

10.2 Parameters to implement Guaranteed Service

A network element must be able to determine the following parameters:

- Constant delay bound through this device (in addition to any value provided by "minimum latency" above) and up to the receiver at the next network element for the packets of this flow if it were to be admitted: this includes any access latency bound to the outgoing link as well as propagation delay across that link.
- Rate-proportional delay bound through this device and up to the receiver at the next network element for the packets of this flow if it were to be admitted.
- Receive resources that would need to be associated with this flow (e.g. buffering, bandwidth) if it were to be admitted, such that it suffers no packet loss provided it keeps within its supplied Tspec/Rspec.
- Transmit resources that would need to be associated with this flow (e.g. buffering, bandwidth, constant and rate-proportional delay bounds) if it were to be admitted.

10.3 Parameters to implement Controlled Load

A network element must be able to determine the following parameters, which can be extracted from [8]:

- Receive resources that would need to be associated with this flow (e.g.
buffering) if it were to be admitted.
- Transmit resources that would need to be associated with this flow (e.g. buffering) if it were to be admitted.

10.4 Parameters to implement Best Effort

For a network element to implement best effort service, there are no explicit parameters that need to be characterised.

10.5 Mapping to IEEE 802 user_priority

There are many options available for mapping aggregations of flows described by the int-serv service models (Best Effort, Controlled Load and Guaranteed Service are the services considered here) onto user_priority classes. Very little practical experience currently exists with particular mappings to help determine the "best" mapping. In that spirit, the following options are presented in order to stimulate experimentation in this area. Note that this does not dictate what mechanisms/algorithms a network element (e.g. an Ethernet switch) needs to perform to implement these mappings: that is an implementation choice and does not matter so long as the requirements for the particular service model are met. Having said that, we do explore below the ability of a switch implementing strict priority queueing to support some or all of the service types under discussion: this is worthwhile because strict priority is likely to be the most widely deployed dequeueing algorithm in simple switches, as it is the default specified in 802.1p.

In order to reduce administrative problems, such a mapping table is held by *switches* (and routers if desired) but generally not by end-station hosts, and it is a read-write table. The values proposed below are defaults and can be overridden by management control, so long as all switches agree to some extent (the required level of agreement requires further analysis).

It is possible that some form of network-wide lookup service could be implemented that serviced requests from clients, e.g.
traffic_class = getQoSbyName("H.323 video"), and that notified switches of what sorts of traffic categories they were likely to encounter and how to allocate those requests into traffic classes: such mechanisms are for further study.

Proposal: A Simple Scheme

   user_priority    Service

   0                "less than" Best Effort
   1                Best Effort
   2                reserved
   3                reserved
   4                Controlled Load
   5                Guaranteed Service, 100ms bound
   6                Guaranteed Service, 10ms bound
   7                reserved

In this proposal, all traffic that uses the Controlled Load service is mapped to a single 802.1p user_priority, whilst Guaranteed Service traffic is placed into one of two user_priority classes with different delay bounds. Unreserved best effort traffic is mapped to another.

The use of classes 4, 5 and 6 for Controlled Load and Guaranteed Service is somewhat arbitrary so long as they are increasing. Any two classes greater than Best Effort can be used, as long as GS is "greater" than CL, although those proposed here have the advantage that, for transit through 802.1p switches with only two-level strict priority queuing, they both get "high priority" treatment (the current 802.1p default split is 0-3 and 4-7 for a device with two queues). The choice of delay bound is also arbitrary but potentially very significant: it can lead to a much more efficient allocation of resources as well as greater (though still not very good) isolation between flows.

The "less than best effort" class might be useful for devices that wish to tag packets which exceed a committed network capacity and which can optionally be discarded by a downstream device. Note that this is not *required* by any current int-serv models but is under study.

The advantage of this approach is that it puts some real delay bounds on the Guaranteed Service without adding any additional complexity to the other services.
It still ignores the amount of *bandwidth* available for each class. This should behave reasonably well as long as the total traffic of CL and GS flows does not exceed any resource capacities in the device. Some isolation between very delay-critical GS and less critical GS flows is provided, but there is still an overall assumption that flows will in general be well-behaved. In addition, this mapping still leaves room for future service models.

Expanding the number of classes for CL service is not as appealing, since there is no need to map to a particular delay bound. There may be cases where an administrator might map CL onto more classes for particular bandwidths or policy levels. It may also be desirable to further subdivide CL traffic in cases where the traffic is frequently non-conformant for certain applications.

11. Network Topology Scenarios

11.1 Switched networks using priority scheduling algorithms

In general, the int-serv standards work has tried to avoid any specification of scheduling algorithms, instead relying on implementers to deduce appropriate algorithms from the service definitions and on users to apply measurable benchmarks to check for conformance. However, since one standards body has chosen to specify a single default scheduling algorithm for switches [2], it seems appropriate to examine, to some degree, how well this "implementation" might actually support some or all of the int-serv services.

If the mappings of the simple scheme proposed above are applied in a switch implementing strict priority queueing between the 8 traffic classes (7 = highest), the result will be that all Guaranteed Service packets are transmitted in preference to any other service. Controlled Load packets will be transmitted next, with everything else waiting until both of these queues are empty.
If the admission control algorithms in use on the switch ensure that the sum of the "promised" bandwidths of all of the GS and CL sessions is never allowed to exceed the available link bandwidth, then things are looking good.

11.2 Full-duplex switched networks

We have up to now ignored the MAC access protocol. On a full-duplex switched LAN (of either the Ethernet or Token-Ring type - the MAC algorithm is, by definition, unimportant) this can be factored into the characterisation parameters advertised by the device, since the access latency is well controlled (jitter = one largest packet time). Some example characteristics (approximate):

   Type         Speed     Max Pkt   Max Access
                          Length    Latency

   Ethernet     10Mbps    1.2ms     1.2ms
                100Mbps   120us     120us
                1Gbps     12us      12us
   Token-Ring   4Mbps     9ms       9ms
                16Mbps    9ms       9ms
   FDDI         100Mbps   360us     8.4ms

These delays should also be considered in the context of speed-of-light delays of e.g. ~400ns for typical 100m UTP links and ~7us for typical 2km multimode fibre links.

We therefore see full-duplex switched network topologies as offering good QoS capabilities for both Controlled Load and Guaranteed Service.

11.3 Shared-media Ethernet networks

We have not mentioned the difficulty of dealing with allocation on a single shared CSMA/CD segment: as soon as any CSMA/CD algorithm is introduced, the ability to provide any form of Guaranteed Service is seriously compromised in the absence of any tight coupling between the multiple senders on the link. There are a number of reasons for not offering a better solution to this issue.

Firstly, we do not believe this is a truly solvable problem: it would seem to require a new MAC protocol.
Those who are interested in solving this problem per se should probably be following the BLAM developments in 802.3, but we would be suspicious of the interoperability characteristics of a series of new software MACs running above the traditional 802.3 MAC.

Secondly, we are not convinced that it is really an interesting problem. While not everyone in the world is buying desktop switches today, and there will be end stations living on repeated segments for some time to come, the number of switches is going up and the number of stations on repeated segments is going down. This trend is proceeding to the point that we may be happy with a solution which assumes that any network conversation requiring resource reservations will take place through at least one switch (be it layer-2 or layer-3). Put another way, the easiest QoS upgrade to a layer-2 network is to install segment switching: only when this has been done is it worthwhile to investigate more complex solutions involving admission control.

Thirdly, in the core of the network (as opposed to at the edges), there does not seem to be enough economic benefit in repeated-segment solutions as opposed to switched solutions. While repeated solutions *may* be 50% cheaper, their cost impact on the entire network is amortised across all of the edge ports. There may be special circumstances in the future (e.g. Gigabit buffered repeaters) but these have differing characteristics from existing CSMA/CD repeaters anyway.
   Type         Speed     Max Pkt   Max Access
                          Length    Latency

   Ethernet     10Mbps    1.2ms     unbounded
                100Mbps   120us     unbounded
                1Gbps     12us      unbounded

11.4 Half-duplex switched Ethernet networks

Many of the same arguments for sub-optimal support of Guaranteed Service apply to half-duplex switched Ethernet as to shared media: in essence, this topology is a medium that *is* shared between at least two senders contending for each packet transmission opportunity. Unless these are tightly coupled and cooperative, there is always the chance that the junk traffic of one will interfere with the other's important traffic. Such coupling would seem to require some form of modification to the MAC protocol (see above).

Notwithstanding this, these topologies do seem to offer the chance to provide Controlled Load service: with the knowledge that there are only a small limited number (e.g. two) of potential senders that are both using prioritisation for their CL traffic (with admission control for those CL flows based on the knowledge of the number of potential senders) over best effort, the media access characteristics, whilst not deterministic in the true mathematical sense, are somewhat predictable. This is probably a close enough approximation to CL to be useful.

   Type         Speed     Max Pkt   Max Access
                          Length    Latency

   Ethernet     10Mbps    1.2ms     unbounded
                100Mbps   120us     unbounded
                1Gbps     12us      unbounded

11.5 Half-duplex and shared Token Ring networks

In a shared Token Ring network, the network access time for high priority traffic at any station is bounded and is given by (N+1)*THTmax, where N is the number of stations sending high priority traffic and THTmax is the maximum token holding time [14].
This assumes that network adapters have priority queues, so that reservation of the token is done for the traffic with the highest priority currently queued in the adapter. It is easy to see that access times can be improved by reducing N or THTmax. The recommended default for THTmax is 10 ms [6]. N is an integer from 2 to 256 for a shared ring and 2 for a switched half-duplex topology. A similar analysis applies to FDDI. Using default values gives:

   Type                            Max Pkt   Max Access
                                   Length    Latency

   Token-Ring  4/16Mbps shared     9ms       2570ms
               4/16Mbps switched   9ms       30ms
   FDDI        100Mbps             360us     8ms

Given that access time is bounded, it is possible to provide an upper bound for end-to-end delays, as required by Guaranteed Service, assuming that traffic of this class uses the highest priority allowable for user traffic. The actual number of stations that send traffic mapped into the same traffic class as GS may vary over time but, from an admission control standpoint, this value is needed a priori. The admission control entity must therefore use a fixed value for N, which may be the total number of stations on the ring or some lower value if it is desired to keep the offered delay guarantees smaller. If the value of N used is lower than the total number of stations on the ring, admission control must ensure that the number of stations sending high priority traffic never exceeds this number. This approach allows admission control to estimate worst-case access delays on the assumption that all N stations are sending high priority data, even though in most cases this will mean that delays are significantly overestimated.

Assuming that Controlled Load flows use a traffic class lower than that used by GS, no upper bound on access latency can be provided for CL flows. However, CL flows will receive better service than best effort flows.
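The (N+1)*THTmax bound above can be checked directly; the helper below reproduces the "Max Access Latency" column of the table (the function name is our own):

```python
def token_ring_access_bound_ms(n_high_priority_stations, tht_max_ms=10.0):
    """Worst-case token access latency (N+1)*THTmax, where N is the
    number of stations sending high priority traffic and THTmax is
    the maximum token holding time (default 10 ms per the text)."""
    return (n_high_priority_stations + 1) * tht_max_ms

# Shared ring with the maximum N = 256 stations, and a switched
# half-duplex link with N = 2:
assert token_ring_access_bound_ms(256) == 2570.0   # ms, shared
assert token_ring_access_bound_ms(2) == 30.0       # ms, switched
```

This makes the admission control trade-off explicit: fixing a smaller N a priori tightens the delay bound, at the cost of having to police how many stations may send high priority traffic.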
   Note that, on many existing shared token rings, bridges will
   transmit frames using an Access Priority (see section 3.3) value of
   4 irrespective of the user_priority carried in the frame control
   field of the frame.  Therefore, existing bridges would need to be
   reconfigured or modified before the above access time bounds can
   actually be used.

12. Signaling protocol

   The mechanisms described in this document make use of a signaling
   protocol for devices to communicate their admission control requests
   across the network: the service definitions to be provided by such a
   protocol are described below.  The candidate IETF protocol for this
   purpose is called "Subnet Bandwidth Manager" and is described in
   [10].

   In all these cases, appropriate delete/cleanup mechanisms will also
   have to be provided for use when sessions are torn down.  All
   interactions are assumed to provide read as well as write
   capabilities.

12.1 Client service definitions

   The following interfaces are identified from Figures 2 and 3:

   SBM <-> Address mapping

   This is a simple lookup function which may cause ARP protocol
   interactions, may be just a lookup of an existing ARP cache entry,
   or may be an algorithmic mapping.
   The layer-2 addresses are needed by SBM for inclusion in its
   signaling messages to/from switches, which avoids the switches
   having to perform the mapping and, hence, having to know layer-3
   information for the complete subnet:

      l2_addr = map_address( ip_addr )

   SBM <-> Session/802 header

   This is for notifying the transmit path of how to associate
   user_priority values with the traffic of each outgoing session: the
   transmit path will provide the user_priority value when it requests
   a MAC-layer transmit operation for each packet (user_priority is one
   of the parameters defined by the IEEE 802 service model):

      bind_802_header( sessionid, user_priority )

   SBM <-> Classifier/Scheduler

   This is for notifying the transmit classifier/scheduler of
   additional layer-2 information associated with scheduling the
   transmission of a session's packets (it may be unused in some
   cases):

      bind_l2sessioninfo( sessionid, l2_header, traffic_class )

   SBM <-> Local Admission Control

   This is for applying local admission control for a session, e.g. is
   there enough transmit bandwidth still uncommitted for this potential
   new session?  Are there sufficient receive buffers?  This should
   commit the necessary resources if the checks succeed; it will be
   necessary to release these resources if a later stage of the session
   setup process fails.

      status = admit_l2txsession( Tspec, flowspec )
      status = admit_l2rxsession( Rspec, flowspec )

   SBM <-> RSVP

   This is outlined above in section 8.2 and fully described in [10].
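   As a non-normative sketch, the client-side interfaces above might be
   modelled as follows.  The class name, the static ARP-cache mapping
   and the simple committed-bandwidth admission test are assumptions of
   this illustration, not part of any specification:

   ```python
   class SbmClient:
       """Hypothetical sketch of the SBM client service interfaces of
       section 12.1; the bandwidth-counting admission test is an
       assumption for illustration only."""

       def __init__(self, link_capacity_bps, arp_cache):
           self.available_bps = link_capacity_bps
           self.arp_cache = arp_cache     # {ip_addr: l2_addr}
           self.sessions = {}             # sessionid -> user_priority

       def map_address(self, ip_addr):
           # SBM <-> Address mapping: here a plain cache lookup; a real
           # implementation might trigger ARP or map algorithmically.
           return self.arp_cache[ip_addr]

       def bind_802_header(self, sessionid, user_priority):
           # SBM <-> Session/802 header: record the user_priority that
           # the transmit path should use for this session's packets.
           self.sessions[sessionid] = user_priority

       def admit_l2txsession(self, tspec_rate_bps):
           # SBM <-> Local Admission Control: commit transmit bandwidth
           # if enough remains uncommitted; a real implementation must
           # release it again if a later setup stage fails.
           if tspec_rate_bps <= self.available_bps:
               self.available_bps -= tspec_rate_bps
               return True
           return False

   sbm = SbmClient(10_000_000, {"192.0.2.1": "00:00:5e:00:53:01"})
   print(sbm.map_address("192.0.2.1"))      # 00:00:5e:00:53:01
   print(sbm.admit_l2txsession(8_000_000))  # True
   print(sbm.admit_l2txsession(4_000_000))  # False: only 2Mbps left
   ```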
12.2 Switch service definitions

   The following interfaces are identified from Figure 4:

   SBM <-> Classifier

   This is for notifying the receive classifier of how to match up
   incoming layer-2 information with the associated traffic class: it
   may in some cases consist of a set of read-only default mappings:

      bind_l2classifierinfo( l2_header, traffic_class )

   SBM <-> Queue and Packet Scheduler

   This is for notifying the transmit scheduler of additional layer-2
   information associated with a given traffic class (it may be unused
   in some cases):

      bind_l2schedulerinfo( l2_header, traffic_class )

   SBM <-> Local Admission Control

   As for the host above.

   SBM <-> Traffic Class Map and Police

   Optional configuration of any layer-2 policing function and/or
   user_priority remapping that might be implemented on input to a
   switch:

      bind_l2classmapping( in_user_priority, remap_user_priority )
      bind_l2policing( l2_header, traffic_characteristics )

   SBM <-> Filtering Database

   SBM propagation rules need access to the layer-2 forwarding database
   to determine where to forward SBM messages (analogous to the RSRR
   interface in layer-3 RSVP):

      output_portlist = lookup_l2dest( l2_addr )

13. Compatibility and Interoperability with existing equipment

   Layer-2-only "standard" 802.1p switches will have to work together
   with routers and layer-3 switches.  Wide deployment of such 802.1p
   switches is envisaged, in a number of roles in the network.
   "Desktop switches" will provide dedicated 10/100 Mbps links to end
   stations at costs comparable with those of NICs/adapter cards.  Very
   high speed core switches may act as central campus switching points
   for layer-3 devices.  Real network deployments provide a wide range
   of examples today.  The question is "what functionality beyond that
   of the basic 802.1D bridge should such 802.1p switches provide?".
   In the abstract, the answer is "whatever they can do to broaden the
   applicability of the switching solution while still being
   economically distinct from the layer-3 switches in their cost of
   acquisition, speed/bandwidth, cost of ownership and administration".
   Broadening the applicability means both addressing the needs of new
   traffic types and building larger switched networks (or making
   larger portions of existing networks switched).  Thus one could
   imagine a network in which every device along a network path was
   layer-3 capable and intrusive into the full data stream; or one in
   which only the edge devices were pure layer-2; or one in which every
   alternate device lacked layer-3 functionality; or one in which most
   devices lacked it, excluding some key control points such as router
   firewalls.  Whatever the mix, the solution has to interoperate with
   these layer-3 QoS-aware devices.

   Of course, where int-serv flows pass through equipment which is
   ignorant of priority queuing and which places all packets through
   the same queuing/overload-dropping path, it is obvious that some of
   the characteristics of the flow become more difficult to support.
   Suitable courses of action in the cases where sufficient bandwidth
   or buffering is not available are of the form:

   (a) buy more (and bigger) routers
   (b) buy more capable switches
   (c) rearrange the network topology: 802.1Q VLANs [11] may help here
   (d) buy more bandwidth

   It would also be possible to pass more information between switches
   about the capabilities of their neighbours and to route around
   non-QoS-capable switches: such methods are for further study.

14. Justification

   An obvious comment is that this is all too complex and that it is
   what RSVP is doing already: why do we think we can do better by
   reinventing the solution to this problem at layer-2?
   The key is that we do not have to tackle the full problem space of
   RSVP: a number of simple scenarios cover a considerable proportion
   of the situations that occur in practice.  All we have to do here is
   cover 99% of the territory at significantly lower cost and leave the
   other applications to full RSVP running in strategically positioned
   high-function switches or routers.  This will allow a significant
   reduction in overall network cost (equipment and ownership).  This
   approach does mean that we have to discuss real-life situations
   instead of abstract topologies that "could happen".

   Sometimes, for example, simple bandwidth configuration in a few
   switches, e.g. to avoid overloading particular trunk links, can be
   used to overcome bottlenecks due to the network topology.  If there
   are issues with overloading end station "last hops", RSVP in the end
   stations would exert the correct controls simply by examining local
   resources, without much tie-in to the layer-2 topology.  In such
   cases there is no need to resort to any form of complex topology
   computation, and much complexity is avoided.

   In the more general case, there remains work to be done.  This will
   need to be done against the background constraint that changes to
   queue service policies and the addition of extra functionality to
   support new service disciplines will proceed at the rate of hardware
   product development cycles, and advance implementations of new
   algorithms may be pursued reluctantly or without the necessary 20/20
   foresight.

   However, compared to the alternative of no traffic classes at all,
   there is substantial benefit in even the simplest of approaches
   (e.g.
   2-4 queues with straight priority), so there is significant reward
   for doing something: wide acceptance of that "something" probably
   means that even the simplest queue service disciplines will be
   provided for.

15. References

   [1]  ISO/IEC 10038, ANSI/IEEE Std 802.1D-1993, "MAC Bridges"

   [2]  "Supplement to MAC Bridges: Traffic Class Expediting and
        Dynamic Multicast Filtering", IEEE P802.1p/D6, May 1997

   [3]  "Integrated Services in the Internet Architecture: an
        Overview", RFC 1633, June 1994

   [4]  "Resource Reservation Protocol (RSVP) - Version 1 Functional
        Specification", Internet Draft, June 1997

   [5]  "Carrier Sense Multiple Access with Collision Detection
        (CSMA/CD) Access Method and Physical Layer Specifications",
        ANSI/IEEE Std 802.3-1985

   [6]  "Token-Ring Access Method and Physical Layer Specifications",
        ANSI/IEEE Std 802.5-1995

   [7]  "A Framework for Providing Integrated Services Over Shared and
        Switched LAN Technologies", Internet Draft, May 1997

   [8]  "Specification of the Controlled-Load Network Element
        Service", Internet Draft, May 1997

   [9]  "Specification of Guaranteed Quality of Service", Internet
        Draft, February 1997

   [10] "SBM (Subnet Bandwidth Manager): A Proposal for Admission
        Control over Ethernet", Internet Draft, June 1997

   [11] "Draft Standard for Virtual Bridged Local Area Networks",
        IEEE P802.1Q/D6, May 1997

   [12] "General Characterization Parameters for Integrated Service
        Network Elements", Internet Draft, November 1996

   [13] "A Standard for the Transmission of IP Datagrams over IEEE
        802 Networks", RFC 1042, February 1988

   [14] C. Bisdikian, B. V. Patel, F. Schaffa and M. Willebeek-LeMair,
        "The Use of Priorities on Token-Ring Networks for Multimedia
        Traffic", IEEE Network, Nov/Dec 1995

16.
Security Considerations

   There are no known security issues over and above those inherent in
   the Integrated Services architecture and the network technologies
   referenced by this document.

17. Acknowledgments

   This document draws heavily on the work of the ISSLL WG of the IETF
   and the IEEE P802.1 Interworking Task Group.  In particular, it
   includes previous work on Token-Ring by Anoop Ghanwani, Wayne Pace
   and Vijay Srinivasan.

18. Authors' addresses

   Mick Seaman
   3Com Corp.
   5400 Bayfront Plaza
   Santa Clara CA 95052-8145
   USA
   +1 (408) 764 5000
   mick_seaman@3com.com

   Andrew Smith
   Extreme Networks
   10460 Bandley Drive
   Cupertino CA 95014
   USA
   +1 (408) 863 2821
   andrew@extremenetworks.com

   Eric Crawley
   Gigapacket Networks
   25 Porter Rd.
   Littleton MA 01460
   USA
   +1 (508) 486 0665
   esc@gigapacket.com