2 Internet Engineering Task Force Anoop Ghanwani 3 INTERNET DRAFT J. Wayne Pace 4 Expires May 1998 Vijay Srinivasan 5 IBM Corp. 6 Andrew Smith 7 Extreme Networks 8 Mick Seaman 9 3Com Corp. 10 November 1997 12 A Framework for Providing Integrated Services 13 Over Shared and Switched IEEE 802 LAN Technologies 15 draft-ietf-issll-is802-framework-03.txt 17 Status of This Memo 19 This document is an Internet-Draft. Internet Drafts are working 20 documents of the Internet Engineering Task Force (IETF), its areas, 21 and its working groups. Note that other groups may also distribute 22 working documents as Internet Drafts. Internet Drafts are draft 23 documents valid for a maximum of six months, and may be updated, 24 replaced, or obsoleted by other documents at any time.
It is not 25 appropriate to use Internet Drafts as reference material, or to cite 26 them other than as a ``working draft'' or ``work in progress.'' To 27 view the entire list of current Internet-Drafts, please check the 28 "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow 29 Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe), 30 munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or 31 ftp.isi.edu (US West Coast). This document is a product of the IS802 32 subgroup of the ISSLL working group of the Internet Engineering Task 33 Force. Comments are solicited and should be addressed to the working 34 group's mailing list at issll@mercury.lcs.mit.edu and/or the authors. 36 Abstract 38 This memo describes a framework for supporting IETF Integrated Services 39 on shared and switched LAN infrastructures. It includes background 40 material on the capabilities of IEEE 802-like networks with regard to 41 parameters that affect Integrated Services such as access latency, delay 42 variation and queueing support in LAN switches. It discusses aspects of 43 IETF's Integrated Services model that cannot easily be accommodated in 44 different LAN environments. It outlines a functional model for 45 supporting the Resource Reservation Protocol (RSVP) in such LAN 46 environments. Details of extensions to RSVP for use over LANs are 47 described in an accompanying memo [14]. Mappings of the assorted 48 Integrated Services onto IEEE LANs are described in another memo [13]. 50 1 Introduction 52 The Internet has traditionally provided support for best effort traffic 53 only. However, with the recent advances in link layer technology, and 54 with numerous emerging real-time applications such as video conferencing 55 and Internet telephony, there has been much interest in developing 56 mechanisms that enable real-time services over the Internet. A 57 framework for meeting these new requirements was set out in RFC 1633 [8] 58 and this has driven the specification of various classes of network 59 service by the Integrated Services working group of the IETF, such as 60 Controlled Load RFC 2211 [6] and Guaranteed Service RFC 2212 [7]. Each 61 of these service classes is designed to provide a certain Quality of 62 Service (QoS) to traffic conforming to a specified set of parameters. 63 Applications are expected to choose one of these classes according to 64 their QoS requirements. One mechanism for end-stations to utilise such 65 services in an IP network is provided by a QoS signaling protocol, the 66 Resource Reservation Protocol (RSVP) RFC 2205 [5] developed by the RSVP 67 working group of the IETF. The IEEE under its Project 802 has defined 68 standards for many different local area network technologies. These all 69 typically offer the same "MAC-layer" datagram service [1] to upper-layer 70 protocols such as IP although they often provide different dynamic 71 behaviour characteristics - it is these that are important when 72 considering their ability to support real-time services. Later in this 73 memo we describe some of the relevant characteristics of different MAC- 74 layer LAN technologies. In addition, IEEE 802 has defined standards 75 for bridging multiple LAN segments together using devices known as "MAC 76 Bridges" or "Switches" [2]. Newer work has also defined enhanced queuing 77 [3] and "virtual LAN" [4] capabilities for these devices.
Such LANs 78 often constitute the last hop or hops between users and the Internet as 79 well as being a primary building-block for complete private campus 80 networks. It is therefore necessary to provide standardized mechanisms 81 for using these technologies to support end-to-end real-time services. 82 In order to do this, there must be some mechanism for resource 83 management at the data-link layer. Resource management in this 84 context encompasses the functions of admission control, scheduling, 85 traffic policing, etc. The ISSLL (Integrated Services over Specific 86 Link Layers) working group in the IETF was chartered with the purpose of 87 exploring and standardizing such mechanisms for various link layer 88 technologies. 2 Document Outline 90 This document is concerned with specifying a framework for providing 91 Integrated Services over shared and switched LAN technologies such as 92 Ethernet/802.3, token ring/802.5, FDDI, etc. We begin in section 4 with 93 a discussion of the capabilities of various IEEE 802 MAC-layer 94 technologies. Section 5 lists the requirements and goals for a mechanism 95 capable of providing Integrated Services in a LAN. The resource 96 management functions outlined in Section 5 are provided by an entity 97 referred to as a Bandwidth Manager (BM): the architectural model of the 98 BM is described in section 6 and its various components are 99 discussed in section 7. Some implementation issues with respect to link 100 layer support for Integrated Services are examined in Section 8. We then 101 in section 9 discuss a taxonomy of topologies for the LAN technologies 102 under consideration with an emphasis on the capabilities of each which 103 can be leveraged for enabling Integrated Services. In this framework, no 104 assumptions are made about the topology at the link layer. The 105 framework is intended to be as exhaustive as possible; this means that 106 some of the functions discussed may not be supportable 107 by a particular topology or technology, but this should not preclude the 108 use of this model for it. 110 3 Definitions 112 The following is a list of terms used in this and other ISSLL 113 documents. 115 - Link Layer or Layer 2 or L2: We refer to data-link layer technologies 116 such as IEEE 802.3/Ethernet as L2 or layer 2. 118 - Link Layer Domain or Layer 2 domain or L2 domain: a set of nodes and 119 links interconnected without passing through an L3 forwarding function. 120 One or more IP subnets can be overlaid on an L2 domain. 122 - Layer 2 or L2 devices: We refer to devices that only implement Layer 2 123 functionality as Layer 2 or L2 devices. These include 802.1D bridges or 124 switches. 126 - Internetwork Layer or Layer 3 or L3: Layer 3 of the ISO 7 layer model. 127 This memo is primarily concerned with networks that use the Internet 128 Protocol (IP) at this layer. 130 - Layer 3 Device or L3 Device or End-Station: these include hosts and 131 routers that use L3 and higher layer protocols or application programs 132 that need to make resource reservations. 134 - Segment: An L2 physical segment that is shared by one or more senders. 135 Examples of segments include (a) a shared Ethernet or Token-Ring wire 136 resolving contention for media access using CSMA or token passing, (b) a 137 half duplex link between two stations or switches, (c) one direction of 138 a switched full-duplex link.
140 - Managed segment: A managed segment is a segment with a DSBM present 141 and responsible for exercising admission control over requests for 142 resource reservation. A managed segment includes those interconnected 143 parts of a shared LAN that are not separated by DSBMs. 145 - Traffic Class: An aggregation of data flows which are given similar 146 service within a switched network. 148 - Subnet: used in this memo to indicate a group of L3 devices sharing a 149 common L3 network address prefix along with the set of segments making 150 up the L2 domain in which they are located. 152 - Bridge/Switch: a layer 2 forwarding device as defined by IEEE 802.1D. 153 The terms bridge and switch are used synonymously in this memo. 155 4 Frame Forwarding in IEEE 802 Networks 157 4.1 General IEEE 802 Service Model 159 User_priority is a value associated with the transmission and reception 160 of all frames in the IEEE 802 service model: it is supplied by the 161 sender that is using the MAC service. It is provided along with the data 162 to a receiver using the MAC service. It may or may not be actually 163 carried over the network: Token-Ring/802.5 carries this value (encoded 164 in its FC octet), basic Ethernet/802.3 does not, 802.12 may or may not 165 depending on the frame format in use. 802.1p defines a consistent way to 166 carry this value over the bridged network on Ethernet, Token Ring, 167 Demand-Priority, FDDI or other MAC-layer media using an extended frame 168 format. The usage of user_priority is summarised below but is more fully 169 described in section 2.5 of 802.1D [2] and 802.1p [3] "Support of the 170 Internal Layer Service by Specific MAC Procedures" and readers are 171 referred to these documents for further information. 173 If the "user_priority" is carried explicitly in packets, its utility is 174 as a simple label in the data stream enabling packets in different 175 classes to be discriminated easily by downstream nodes without their 176 having to parse the packet in more detail. 178 Apart from making the job of desktop or wiring-closet switches easier, 179 an explicit field means they do not have to change hardware or software 180 as the rules for classifying packets evolve (e.g. based on new protocols 181 or new policies). More sophisticated layer-3 switches, perhaps deployed 182 towards the core of a network, can provide added value here by 183 performing the classification more accurately and, hence, utilising 184 network resources more efficiently or providing better protection of 185 flows from one another: this appears to be a good economic choice since 186 there are likely to be very many more desktop/wiring closet switches in 187 a network than switches requiring layer-3 functionality. 189 The IEEE 802 specifications make no assumptions about how user_priority 190 is to be used by end stations or by the network. In particular it can 191 only be considered a "priority" in a loose sense, although 802.1p 192 defines static priority queuing as the default mode of operation of 193 switches that implement multiple queues (user_priority is defined as a 194 3-bit quantity, so strict priority queueing would give value 7 = high 195 priority, 0 = low priority). The general switch algorithm is as follows: 196 packets are placed onto a particular queue based on the received 197 user_priority (perhaps directly from the packet if an 802.1p header or 198 802.5 network was used or else invented according to some local policy 199 if not).
The selection of a queue is based on a mapping from user_priority 200 [0,1,2,3,4,5,6 or 7] onto the number of available queues. Note that 201 switches may implement any number of queues from 1 upwards and it may 202 not be visible externally, except through any advertised int-serv 203 parameters and the switch's admission control behaviour, which 204 user_priority values get mapped internally onto the same or different 205 queues. Other algorithms that a switch might implement include, 206 for example, weighted fair queueing and round robin. 208 In particular, IEEE makes no recommendations about how a sender should 209 select the value for user_priority: one of the main purposes of this 210 document is to propose such usage rules and how to communicate 211 the semantics of the values between switches, end-stations and routers. 212 In the remainder of this document we use the term "traffic class" 213 synonymously with user_priority. 215 4.2 Ethernet/802.3 217 There is no explicit traffic class or user_priority field carried in 218 Ethernet packets. This means that user_priority must be regenerated at a 219 downstream receiver or switch according to some defaults or by parsing 220 further into higher-layer protocol fields in the packet. Alternatively, 221 the IEEE 802.1Q encapsulation [4] may be used which provides an explicit 222 traffic class field on top of a basic MAC format. 224 For the different IP packet encapsulations used over Ethernet/802.3, it 225 will be necessary to adjust any admission-control calculations according 226 to the framing and to the padding requirements: 228 Encapsulation Framing Overhead IP MTU 229 bytes/pkt bytes 231 IP EtherType (ip_len<=46 bytes) 64-ip_len 1500 232 (1500>=ip_len>=46 bytes) 18 1500 234 IP EtherType over 802.1p/Q (ip_len<=42) 64-ip_len 1500* 235 (1500>=ip_len>=42 bytes) 22 1500* 237 IP EtherType over LLC/SNAP (ip_len<=40) 64-ip_len 1492 238 (1500>=ip_len>=40 bytes) 24 1492 240 * note that the draft IEEE 802.1Q specification exceeds the current IEEE 241 802.3 maximum packet length values by 4 bytes although work is 242 proceeding within IEEE to address this issue. 244 4.3 Token-Ring/802.5 246 The token ring standard [6] provides a priority mechanism that can be 247 used to control both the queuing of packets for transmission and the 248 access of packets to the shared media. The priority mechanisms are 249 implemented using bits within the Access Control (AC) and the Frame 250 Control (FC) fields of an LLC frame. The first three bits of the AC 251 field, the Token Priority bits, together with the last three bits of the 252 AC field, the Reservation bits, regulate which stations get access to 253 the ring. The last three bits of the FC field of an LLC frame, the User 254 Priority bits, are obtained from the higher layer in the user_priority 255 parameter when it requests transmission of a packet. This parameter also 256 establishes the Access Priority used by the MAC. The user_priority value 257 is conveyed end-to-end by the User Priority bits in the FC field and is 258 typically preserved through Token-Ring bridges of all types. In all 259 cases, 0 is the lowest priority. 261 Token-Ring also uses a concept of Reserved Priority: this relates to the 262 value of priority which a station uses to reserve the token for the next 263 transmission on the ring.
When a free token is circulating, only a 264 station having an Access Priority greater than or equal to the Reserved 265 Priority in the token will be allowed to seize the token for 266 transmission. Readers are referred to [14] for further discussion of 267 this topic. 269 A token ring station is theoretically capable of separately queuing each 270 of the eight levels of requested user priority and then transmitting 271 frames in order of priority. A station sets Reservation bits according 272 to the user priority of frames that are queued for transmission in the 273 highest priority queue. This allows the access mechanism to ensure that 274 the frame with the highest priority throughout the entire ring will be 275 transmitted before any lower priority frame. Annex I to the IEEE 802.5 276 token ring standard recommends that stations send/relay frames as 277 follows: 279 Application user_priority 281 non-time-critical data 0 282 - 1 283 - 2 284 - 3 285 LAN management 4 286 time-sensitive data 5 287 real-time-critical data 6 288 MAC frames 7 290 To reduce frame jitter associated with high-priority traffic, the annex 291 also recommends that only one frame be transmitted per token and that 292 the maximum information field size be 4399 octets whenever delay- 293 sensitive traffic is traversing the ring. Most existing implementations 294 of token ring bridges forward all LLC frames with a default access 295 priority of 4. Annex I recommends that bridges forward LLC frames that 296 have a user priority greater than 4 with a reservation equal to the 297 user priority (although the draft IEEE P802.1p [2] permits network 298 management to override this behaviour). The capabilities provided by token 299 ring's user and reservation priorities and by IEEE 802.1p can provide 300 effective support for Integrated Services flows that request QoS using 301 RSVP. These mechanisms can provide, with few or no additions to the 302 token ring architecture, bandwidth guarantees with the network flow 303 control necessary to support such guarantees. 305 For the different IP packet encapsulations used over Token Ring/802.5, 306 it will be necessary to adjust any admission-control calculations 307 according to the framing requirements: 309 Encapsulation Framing Overhead IP MTU 310 bytes/pkt bytes 312 IP EtherType over 802.1p/Q 29 4370* 313 IP EtherType over LLC/SNAP 25 4370* 315 *the suggested MTU from RFC 1042 [13] is 4464 bytes but there are issues 316 related to discovering the maximum supported MTU between any two 317 points both within and between Token Ring subnets. We recommend here an 318 MTU consistent with the 802.5 Annex I recommendation. 320 4.4 FDDI 322 The Fiber Distributed Data Interface standard [16] provides a priority 323 mechanism that can be used to control both the queuing of packets for 324 transmission and the access of packets to the shared media. The priority 325 mechanisms are implemented using mechanisms similar to those of Token-Ring 326 described above. The standard also makes provision for "Synchronous" 327 data traffic with strict media access and delay guarantees - this mode 328 of operation is not discussed further here: this is an area within the 329 scope of the ISSLL WG that requires further work. In the remainder of 330 this document we treat FDDI as a 100Mbps Token Ring (which it is) using 331 a service interface compatible with IEEE 802 networks. 333 4.5 Demand-Priority/802.12 334 IEEE 802.12 [19] is a standard for a shared 100Mbit/s LAN.
Data packets 335 are transmitted using either 802.3 or 802.5 frame formats. The MAC 336 protocol is called Demand Priority. Its main characteristics with respect 337 to QoS are the support of two service priority levels (normal- and high- 338 priority) and the service order: data packets from all network nodes 339 (e.g. end-hosts and bridges/switches) are served using a simple round 340 robin algorithm. 342 If the 802.3 frame format is used for data transmission then 343 user_priority is encoded in the starting delimiter of the 802.12 data 344 packet. If the 802.5 frame format is used then the priority is 345 additionally encoded in the YYY bits of the AC field in the 802.5 packet 346 header (see also section 4.3). Furthermore, the 802.1p/Q encapsulation 347 may also be applied in 802.12 networks with its own user_priority field. 348 Thus, in all cases, switches are able to recover any user_priority 349 supplied by a sender. 351 The same rules apply for 802.12 user_priority mapping through a bridge 352 as with other media types: the only additional information is that 353 "normal" priority is used by default for user_priority values 0 through 354 4 inclusive and "high" priority is used for user_priority levels 5 355 through 7: this ensures that the default Token-Ring user_priority level 356 of 4 for 802.5 bridges is mapped to "normal" on 802.12 segments. 358 The medium access in 802.12 LANs is deterministic: the demand priority 359 mechanism ensures that, once the normal priority service has been pre- 360 empted, all high priority packets have strict priority over packets with 361 normal priority. In the abnormal situation that a normal-priority packet 362 has been waiting at the front of a MAC transmit queue for a time period 363 longer than PACKET_PROMOTION (200 - 300 ms [15]), its priority is 364 automatically 'promoted' to high priority. Thus, even normal-priority 365 packets have a maximum guaranteed access time to the medium. 367 Integrated Services can be built on top of the 802.12 medium access 368 mechanism. When combined with admission control and bandwidth 369 enforcement mechanisms, delay guarantees as required for a Guaranteed 370 Service can be provided without any changes to the existing 802.12 MAC 371 protocol. 373 Since the 802.12 standard supports the 802.3 and 802.5 frame formats, 374 the same framing overhead as reported in sections 4.2 and 4.3 must be 375 considered in the admission control equations for 802.12 links. 377 5 Requirements and Goals 379 This section discusses the requirements and goals which should drive the 380 design of an architecture for supporting Integrated Services over LAN 381 technologies. The requirements refer to functions and features which 382 must be supported, while goals refer to functions and features which are 383 desirable, but are not an absolute necessity. Many of the requirements 384 and goals are driven by the functionality supported by Integrated 385 Services and RSVP. 387 5.1 Requirements 389 - Resource Reservation: The mechanism must be capable of reserving 390 resources on a single segment or multiple segments and at 391 bridges/switches connecting them. It must be able to provide 392 reservations for both unicast and multicast sessions. It should be 393 possible to change the level of reservation while the session is in 394 progress.
396 - Admission Control: The mechanism must be able to estimate the level of 397 resources necessary to meet the QoS requested by the session in order to 398 decide whether or not the session can be admitted. For the purpose of 399 management, it is useful to provide the ability to respond to queries 400 about availability of resources. It must be able to make admission 401 control decisions for different types of services such as guaranteed 402 delay, controlled load, etc. 404 - Flow Separation and Scheduling: It is necessary to provide a 405 mechanism for traffic flow separation so that real-time flows can be 406 given preferential treatment over best effort flows. Packets of real- 407 time flows can then be isolated and scheduled according to their service 408 requirements. 410 - Policing: Traffic policing must be performed in order to ensure that 411 sources adhere to their negotiated traffic specifications. Policing must 412 be implemented at the sources and must ensure that violating traffic is 413 either dropped or transmitted as best effort. Policing may optionally be 414 implemented in the bridges and switches. Alternatively, traffic may be 415 shaped to ensure conformance to the negotiated parameters. 417 - Soft State: The mechanism must maintain soft state information about 418 the reservations. This means that state information must be 419 periodically refreshed if the reservation is to be maintained; otherwise 420 the state information and corresponding reservations will expire after 421 some pre-specified interval (see the illustrative sketch in section 5.4 below). 423 - Centralized or Distributed Implementation: In the case of a 424 centralized implementation, a single entity manages the resources of the 425 entire subnet. This approach has the advantage of being easier to deploy 426 since bridges and switches may not need to be upgraded with additional 427 functionality. However, this approach scales poorly with the geographical 428 size of the subnet and the number of end stations attached. In a fully 429 distributed implementation, each segment will have a local entity 430 managing its resources. This approach has better scalability than the 431 former. However, it requires that all bridges and switches in the 432 network support new mechanisms. It is also possible to have a semi- 433 distributed implementation where there is more than one entity, each 434 managing the resources of a subset of segments and bridges/switches 435 within the subnet. Ideally, implementation should be flexible; i.e. a 436 centralized approach may be used for small subnets and a distributed 437 approach can be used for larger subnets. Examples of centralized and 438 distributed implementations are discussed in Section 6. 440 - Scalability: The mechanism and protocols should have a low overhead 441 and should scale to the largest receiver groups likely to occur within a 442 single link layer domain. 444 - Fault Tolerance and Recovery: The mechanism must be able to function 445 in the presence of failures; i.e. there should not be a single point of 446 failure. For instance, in a centralized implementation, some mechanism 447 must be specified for back-up and recovery in the event of failure. 449 - Interaction with Existing Resource Management Controls: The 450 interaction with existing infrastructure for resource management needs 451 to be specified. For example, FDDI has a resource management mechanism 452 called the "Synchronous Bandwidth Manager".
The mechanism must be 453 designed so that it takes advantage of, and specifies the interaction 454 with, existing controls where available. 456 5.2 Goals 458 - Independence from higher layer protocols: The mechanism should, as far 459 as possible, be independent of higher layer protocols such as RSVP and 460 IP. Independence from RSVP is desirable so that it can interwork with 461 other reservation protocols such as ST2 [10]. Independence from IP is 462 desirable so that it can interwork with network layer protocols such as 463 IPX, NetBIOS, etc. 465 - Receiver heterogeneity: this refers to multicast communication where 466 different receivers request different levels of service. For example, in 467 a multicast group with many receivers, it is possible that one of the 468 receivers desires a lower delay bound than the others. A better delay 469 bound may be provided by increasing the amount of resources reserved 470 along the path to that receiver while leaving the reservations for the 471 other receivers unchanged. In its most complex form, receiver 472 heterogeneity implies the ability to simultaneously provide various 473 levels of service as requested by different receivers. In its simplest 474 form, receiver heterogeneity will allow a scenario where some of the 475 receivers use best effort service and those requiring service guarantees 476 make a reservation. Receiver heterogeneity, especially for the 477 reserved/best effort scenario, is a very desirable function. More 478 details on supporting receiver heterogeneity are provided in Section 6. 480 - Support for different filter styles: It is desirable to provide 481 support for the different filter styles defined by RSVP such as fixed 482 filter, shared explicit and wildcard. Some of the issues with respect 483 to supporting such filter styles in the link layer domain are examined 484 in Section 6. 486 - Path Selection: In source routed LAN technologies such as token 487 ring/802.5, it may be useful for the mechanism to incorporate the 488 function of path selection. Using an appropriate path selection 489 mechanism may optimize utilization of network resources. 491 5.3 Non-goals 493 This document describes service mappings onto existing IEEE- and ANSI- 494 defined standard MAC layers and uses standard MAC-layer services as in 495 IEEE 802.1 bridging. It does not attempt to make use of or describe the 496 capabilities of other proprietary or standard MAC-layer protocols 497 although it should be noted that there exists published work regarding 498 MAC layers suitable for QoS mappings: these are outside the scope of the 499 IETF ISSLL working group charter. 501 5.4 Assumptions 503 For this framework, it is assumed that typical subnetworks that are 504 concerned about quality-of-service will be "switch-rich": that is to say 505 most communication between end stations using integrated services 506 support will pass through at least one switch. The mechanisms and 507 protocols described will be trivially extensible to communicating 508 systems on the same shared media, but it is important not to allow 509 problem generalisation to complicate the practical application that we 510 target: the access characteristics of Ethernet and Token-Ring LANs are 511 forcing a trend to switch-rich topologies. In addition, there have been 512 developments in the area of MAC enhancements to ensure delay- 513 deterministic access on network links e.g. IEEE 802.12 [19] and also 514 proprietary schemes. 
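As a concrete illustration of the Admission Control and Soft State requirements of section 5.1 (referenced there), consider the following sketch. It is illustrative only: the class and function names are hypothetical and are defined by no ISSLL document. It shows a per-segment bandwidth accountant that admits or re-sizes a flow only while uncommitted bandwidth remains, and that silently expires reservations which are not refreshed within a soft-state timeout.

   import time

   class SegmentBandwidthAccountant:
       """Illustrative per-segment admission control with soft-state expiry.

       capacity_bps is the bandwidth assumed reservable on the managed
       segment; timeout is the soft-state lifetime after which a
       reservation that has not been refreshed is discarded.
       """

       def __init__(self, capacity_bps, timeout=30.0):
           self.capacity_bps = capacity_bps
           self.timeout = timeout
           self.reservations = {}  # flow_id -> (rate_bps, last_refresh)

       def _expire(self, now):
           # Soft state: drop any reservation not refreshed within 'timeout'.
           stale = [f for f, (_, t) in self.reservations.items()
                    if now - t > self.timeout]
           for f in stale:
               del self.reservations[f]

       def admitted_bps(self):
           return sum(rate for rate, _ in self.reservations.values())

       def admit(self, flow_id, rate_bps, now=None):
           """Admit a new flow, or re-size an existing one, only if it fits."""
           now = time.time() if now is None else now
           self._expire(now)
           already = self.reservations.get(flow_id, (0, 0))[0]
           if self.admitted_bps() - already + rate_bps > self.capacity_bps:
               return False  # rejected; the flow stays best effort
           self.reservations[flow_id] = (rate_bps, now)
           return True

       def refresh(self, flow_id, now=None):
           """Periodic refresh keeps the soft state alive."""
           now = time.time() if now is None else now
           if flow_id not in self.reservations:
               return False
           rate, _ = self.reservations[flow_id]
           self.reservations[flow_id] = (rate, now)
           return True

   # Example: a 10 Mb/s shared segment with a 30 second soft-state timeout.
   ba = SegmentBandwidthAccountant(capacity_bps=10_000_000, timeout=30.0)
   assert ba.admit("flow-1", 4_000_000) is True
   assert ba.admit("flow-2", 5_000_000) is True
   assert ba.admit("flow-3", 2_000_000) is False   # would exceed capacity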
516 Note that we illustrate most examples in this model using RSVP as an 517 "upper-layer" QoS signaling protocol but there are actually no real 518 dependencies on this protocol: RSVP could be replaced by some other 519 dynamic protocol or else the requests could be made by network 520 management or other policy entities. In particular, the SBM signaling 521 protocol [14], which is based upon RSVP, is designed to work seamlessly 522 in the architecture described in this memo. 524 There may be a heterogeneous mixture of switches with different 525 capabilities, all compliant with IEEE 802.1D [2] [3], but implementing 526 queuing and forwarding mechanisms in a range from simple 2-queue per 527 port, strict priority, up to more complex multi-queue (maybe even one 528 per-flow) WFQ or other algorithms. 530 The problem is broken down into smaller independent pieces: this may 531 lead to sub-optimal usage of the network resources but we contend that, 532 in a LAN environment, the benefits forgone amount to only very small 533 improvements in network efficiency. Therefore, it is a goal that the 534 switches in the network operate using a much simpler set of information 535 than the RSVP engine in a router. In particular, it is assumed that such 536 switches do not need to implement per-flow queuing and policing 537 (although they might do so). 539 It is a fundamental assumption of the int-serv model that flows are 540 isolated from each other throughout their transit across a network. 541 Intermediate queueing nodes are expected to police the traffic to ensure 542 that it conforms to the pre-agreed traffic flow specification. In the 543 architecture proposed here for mapping to layer-2, we diverge from that 544 assumption in the interests of simplicity: the policing function is 545 assumed to be implemented in the transmit schedulers of the layer-3 546 devices (end stations, routers). In the LAN environments envisioned, it 547 is reasonable to assume that end stations are "trusted" to adhere to 548 their agreed contracts at the inputs to the network and that we can 549 afford to over-allocate resources at admission-control time to 550 compensate for the inevitable extra jitter/bunching introduced by the 551 switched network itself. 553 These divergences have some implications for the types of receiver 554 heterogeneity that can be supported and the statistical multiplexing 555 gains that might have been exploited, especially for Controlled Load 556 flows: this is discussed in a later section of this document. 558 6 Basic Architecture 560 The functional requirements described in Section 5 will be performed by 561 an entity which we refer to as the Bandwidth Manager (BM). The BM is 562 responsible for providing mechanisms for an application or higher layer 563 protocol to request QoS from the network. For architectural purposes, 564 the BM consists of the following components. 566 6.1 Components 568 6.1.1 Requester Module 570 The Requester Module (RM) resides in every end station in the subnet. 572 One of its functions is to provide an interface between applications or 573 higher layer protocols such as RSVP, STII, SNMP, etc. and the BM. An 574 application can invoke the various functions of the BM by using the 575 primitives for communication with the RM and providing it with the 576 appropriate parameters.
To initiate a reservation in the link layer 577 domain, the following parameters must be passed to the RM: the service 578 desired (Guaranteed Service or Controlled Load), the traffic descriptors 579 contained in the TSpec, and an RSpec specifying the amount of resources 580 to be reserved [9]. More information on these parameters may be found 581 in the relevant Integrated Services documents [6,7,8,9]. When RSVP is 582 used for signaling at the network layer, this information is available 583 and needs to be extracted from the RSVP PATH and RSVP RESV messages (See 584 [5] for details). In addition to these parameters, the network layer 585 addresses of the end points must be specified. The RM must then 586 translate the network layer addresses to link layer addresses and 587 convert the request into an appropriate format which is understood by 588 other components of the BM responsible for admission control. The RM is 589 also responsible for returning the status of requests processed by the 590 BM to the invoking application or higher layer protocol. 592 6.1.2 Bandwidth Allocator 594 The Bandwidth Allocator (BA) is responsible for performing admission 595 control and maintaining state about the allocation of resources in the 596 subnet. An end station can request various services, e.g. bandwidth 597 reservation, modification of an existing reservation, queries about 598 resource availability, etc. These requests are processed by the BA. The 599 communication between the end station and the BA takes place through the 600 RM. The location of the BA will depend largely on the implementation 601 method. In a centralized implementation, the BA may reside on a single 602 station in the subnet. In a distributed implementation, the functions of 603 the BA may be distributed in all the end stations and bridges/switches 604 as necessary. The BA is also responsible for deciding how to label 605 flows, e.g. based on the admission control decision, the BA may 606 indicate to the RM that packets belonging to a particular flow be tagged 607 with some priority value which maps to the appropriate traffic class. 609 6.1.3 Communication Protocols 611 The protocols for communication between the various components of the BM 612 system must be specified. These include the following: 614 - Communication between the higher layer protocols and the RM: The BM 615 must define primitives for the application to initiate reservations, 616 query the BA about available resources, and change or delete 617 reservations, etc. These primitives could be implemented as an API for 618 an application to invoke functions of the BM via the RM. 620 - Communication between the RM and the BA: A signaling mechanism must be 621 defined for the communication between the RM and the BA. This protocol 622 will specify the messages which must be exchanged between the RM and the 623 BA in order to service various requests by the higher layer entity. 625 - Communication between peer BAs: If there is more than one BA in the 626 subnet, a means must be specified for inter-BA communication. 627 Specifically, the BAs must be able to decide among themselves about 628 which BA would be responsible for which segments and bridges or 629 switches. Further, if a request is made for resource reservation spanning 630 the domains of multiple BAs, the BAs must be able to handle such a 631 scenario correctly. Inter-BA communication will also be responsible for 632 back-up and recovery in the event of failure. 634 6.2 Centralised vs.
Distributed Implementations 636 Example scenarios are provided showing the location of the 637 components of the bandwidth manager in centralized and fully distributed 638 implementations. Note that in either case, the RM must be present in 639 all end stations which desire to make reservations. Essentially, 640 centralized or distributed refers to the implementation of the BA, the 641 component responsible for resource reservation and admission control. 642 In the figures below, "App" refers to the application making use of the 643 BM. It could either be a user application, or a higher layer protocol 644 process such as RSVP. 646 +---------+ 647 .-->| BA |<--. 648 / +---------+ \ 649 / .-->| Layer 2 |<--. \ 650 / / +---------+ \ \ 651 / / \ \ 652 / / \ \ 653 +---------+ / / \ \ +---------+ 654 | App |<----- /-/---------------------------\-\----->| App | 655 +---------+ / / \ \ +---------+ 656 | RM |<----. / \ .--->| RM | 657 +---------+ / +---------+ +---------+ \ +---------+ 658 | Layer 2 |<------>| Layer 2 |<------>| Layer 2 |<------>| Layer 2 | 659 +---------+ +---------+ +---------+ +---------+ 661 RSVP Host/ Intermediate Intermediate RSVP Host/ 662 Router Bridge/Switch Bridge/Switch Router 664 Figure 1 - Bandwidth Manager with centralized Bandwidth Allocator 666 Figure 1 shows a centralized implementation where a single BA is 667 responsible for admission control decisions for the entire subnet. Every 668 end station contains an RM. Intermediate bridges and switches in the 669 network need not have any functions of the BM since they will not be 670 actively participating in admission control. The RM at the end station 671 requesting a reservation initiates communication with its BA. For larger 672 subnets, a single BA may not be able to handle the reservations for the 673 entire subnet. In that case it would be necessary to deploy multiple 674 BAs, each managing the resources of a non-overlapping subset of 675 segments. In a centralized implementation, the BA must have some model 676 of the layer-2 topology of the subnet e.g. link layer spanning tree 677 information, in order to be able to reserve resources on appropriate 678 segments. Without this topology information, the BM would have to 679 reserve resources on all segments for all flows which, in a switched 680 network, would lead to very inefficient utilization of resources. 682 +---------+ +---------+ 683 | App |<-------------------------------------------->| App | 684 +---------+ +---------+ +---------+ +---------+ 685 | RM/BA |<------>| BA |<------>| BA |<------>| RM/BA | 686 +---------+ +---------+ +---------+ +---------+ 687 | Layer 2 |<------>| Layer 2 |<------>| Layer 2 |<------>| Layer 2 | 688 +---------+ +---------+ +---------+ +---------+ 690 RSVP Host/ Intermediate Intermediate RSVP Host/ 691 Router Bridge/Switch Bridge/Switch Router 693 Figure 2 - Bandwidth Manager with fully distributed Bandwidth Allocator 695 Figure 2 depicts the scenario of a fully distributed bandwidth manager. 696 In this case, all devices in the subnet have BM functionality. All the 697 end hosts are still required to have an RM. In addition, all stations 698 actively participate in admission control. With this approach, each BA 699 would need only local topology information since it is responsible for 700 the resources on segments that are directly connected to it.
This local 701 topology information, such as a list of ports active on the spanning 702 tree and which unicast addresses are reachable from which ports, is 703 readily available in today's switches. Note that in the figures above, 704 the arrows between peer layers are used to indicate logical 705 connectivity. 707 7 Model of the Bandwidth Manager in a Network 709 In this section we describe how the model above fits with the existing 710 IETF Integrated Services model of IP hosts and IP routers. First we 711 describe layer-3 host and router implementations; later we describe how 712 the model is applied in layer-2 switches. Throughout we indicate any 713 differences between centralised and distributed implementations. 715 7.1 End-station model 717 7.1.1 Layer-3 Client Model 719 We assume the same client model as int-serv and RSVP where we use the 720 term "client" to mean the entity handling QoS in the layer-3 device at 721 each end of a layer-2 hop (e.g. end-station, router). In this model, the 722 sending client is responsible for local admission control and scheduling 723 packets onto its link in accordance with the service agreed. As with the 724 current int-serv model, this involves per-flow scheduling (a.k.a. 725 traffic shaping) in every such originating source. 727 For now, we assume that the client is running an RSVP process which 728 presents a session establishment interface to applications, signals over 729 the network, programs a scheduler and classifier in the driver and 730 interfaces to a policy control module. In particular, RSVP also 731 interfaces to a local admission control module: it is this entity that 732 we focus on here. 734 The following diagram is taken from the RSVP specification [5]: 735 _____________________________ 736 | _______ | 737 | | | _______ | 738 | |Appli- | | | | RSVP 739 | | cation| | RSVP <--------------------> 740 | | <--> | | 741 | | | |process| _____ | 742 | |_._____| | -->Polcy|| 743 | | |__.__._| |Cntrl|| 744 | |data | | |_____|| 745 |===|===========|==|==========| 746 | | --------| | _____ | 747 | | | | ---->Admis|| 748 | _V__V_ ___V____ |Cntrl|| 749 | | | | | |_____|| 750 | |Class-| | Packet | | 751 | | ifier|==>Schedulr|====================> 752 | |______| |________| | data 753 | | 754 |_____________________________| 756 Figure 3 - RSVP in Sending Hosts 758 Note that we illustrate examples in this document using RSVP as the 759 "upper-layer" signaling protocol but there are no actual dependencies on 760 this protocol: RSVP could be replaced by some other dynamic protocol or 761 else the requests could be made by network management or other policy 762 entities. 764 7.1.2 Requests to layer-2 ISSLL 766 The local admission control entity within a client is responsible for 767 mapping these layer-3 session-establishment requests into layer-2 768 language. 770 The upper-layer entity makes a request, in generalised terms, to ISSLL of 771 the form: 773 "May I reserve for traffic with <traffic characteristic> with 774 <performance requirements> from <here> to <there> and how should I 775 label it?" 777 where 778 <traffic characteristic> = Sender Tspec 779 (e.g. bandwidth, burstiness, MTU) 780 <performance requirements> = FlowSpec 781 (e.g. latency, jitter bounds) 782 <here> = IP address(es) 783 <there> = IP address(es) - may be multicast 785 7.1.3 At the Layer-3 Sender 787 The ISSLL functionality in the sender is illustrated in Figure 4.
789 from IP from RSVP 790 ____|____________|____________ 791 | | | | 792 | __V____ ___V___ | 793 | | | | | | 794 | | Addr |<->| | | SBM signaling 795 | |mapping| |Request|<------------------------> 796 | |_______| |Module | | 797 | ___|___ | | | 798 | | |<->| | | 799 | | 802 | |_______| | 800 | | header| / | | | 801 | |_______| / | | | 802 | | / | | _____ | 803 | | +-----/ | +->|Band-| | 804 | __V_V_ _____V__ |width| | 805 | | | | | |Alloc| | 806 | |Class-| | Packet | |_____| | 807 | | ifier|==>Schedulr|======================> 808 | |______| |________| | data 809 |______________________________| 811 Figure 4 - ISSLL in End-station Sender 813 The functions of the Requestor Module may be summarised as: - maps the 814 endpoints of the conversation to layer-2 addresses in the LAN, so that 815 the client can figure out what traffic is really going where (probably 816 makes reference to the ARP protocol cache for unicast or an algorithmic 817 mapping for multicast destinations). 819 - communicates with any local Bandwidth Allocator module for local 820 admission control decisions 822 - formats a SBM request to the network with the mapped addresses and 823 filter/flow specs 825 - receives response from the network and reports the YES/NO admission 826 control answer back to the upper layer entity, along with any negotiated 827 modifications to the session parameters. 829 - saves any returned user_priority to be associated with this session in 830 a "802 header" table: this will be used when adding layer-2 header 831 before sending any future data packet belonging to this session. This 832 table might, for example, be indexed by the RSVP flow identifier. 834 The Bandwidth Allocator (BA) component is only present when a 835 distributed BA model is implemented: when present, its functions can be 836 summarised as: - applies local admission control on outgoing link 837 bandwidth and driver queueing resources 839 7.1.4 At the Layer-3 Receiver 841 The ISSLL functionality in the receiver is simpler. It is summarised 842 below and is illustrated by Figure 5. 844 The Requestor Module 846 - handles any received SBM protocol indications. 848 - communicates with any local BA for local admission control decisions 850 - passes indications up to RSVP if OK. 852 - accepts confirmations from RSVP and relays them back via SBM signaling 853 towards the requester. 855 - may program a receive classifier and scheduler, if any is used, to 856 identify traffic classes of received packets and accord them appropriate 857 treatment e.g. reserve some buffers for particular traffic classes. 859 - programs receiver to strip any 802 header information from received 860 packets. 862 The Bandwidth Allocator, present only in a distributed implementation 864 - applies local admission control to see if a request can be supported 865 with appropriate local receive resources. 
867 to RSVP to IP 868 ^ ^ 869 ____|____________|___________ 870 | | | | 871 | __|____ | | 872 | | | | | 873 SBM signaling | |Request| ___|___ | 874 <-----------------> |Module | | Strip | | 875 | |_______| |802 hdr| | 876 | | \ |_______| | 877 | __v___ \ ^ | 878 | | Band- |\ | | 879 | | width| \ | | 880 | | Alloc | \ | | 881 | |_______| \ | | 882 | ______ v___|____ | 883 | |Class-| | Packet | | 884 ===================>| ifier|==>|Scheduler| | 885 data | |______| |_________| | 886 |_____________________________| 888 Figure 5 - ISSLL in End-station Receiver 890 7.2 Switch Model 892 7.2.1 Centralised BA 894 Where a centralised Bandwidth Allocator model is implemented, switches 895 do not take part in the admission control process: all admission control 896 is implemented by a central BA e.g. a "Subnet Bandwidth Manager" (SBM) 897 as described in [14]. Note that this centralised BA may actually be co- 898 located with a switch but its functions would not necessarily then be 899 closely tied to the switch's forwarding functions as is the case with 900 the distributed BA described below. 902 7.2.2 Distributed BA 904 The model of layer-2 switch behaviour described here uses the 905 terminology of the SBM protocol as an example of an admission control 906 protocol: the model is equally applicable when other mechanisms, e.g. 907 static configuration or network management, are in use for admission 908 control. We define the following entities within the switch: 910 * Local admission control - one of these on each port accounts for the 911 available bandwidth on the link attached to that port. For half-duplex 912 links, this involves taking account of the resources allocated to both 913 transmit and receive flows. For full-duplex, the input port accountant's 914 task is trivial. 916 * Input SBM module: one instance on each port, performs the "network" 917 side of the signaling protocol for peering with clients or other 918 switches. Also holds knowledge of the mappings of int-serv classes to 919 user_priority. 921 * SBM propagation - relays requests that have passed admission control 922 at the input port to the relevant output ports' SBM modules. This will 923 require access to the switch's forwarding table (layer-2 "routing table" 924 cf. RSVP model) and port spanning-tree states. 926 * Output SBM module - forwards requests to the next layer-2 or -3 927 network hop. 929 * Classifier, Queueing and Scheduler - these functions are basically as 930 described by the Forwarding Process of IEEE 802.1p (see section 3.7 of 931 [3]). The Classifier module identifies the relevant QoS information from 932 incoming packets and uses this, together with the normal bridge 933 forwarding database, to decide to which output queue of which output 934 port to enqueue the packet. Different types of switches will use 935 different techniques for flow identification - see section 8.1 for details 936 of a taxonomy of switch types. In Class I switches, this information is 937 the "regenerated user_priority" parameter which has already been decoded 938 by the receiving MAC service and potentially re-mapped by the 802.1p 939 forwarding process (see description in section 3.7.3 of [3]). This does 940 not preclude more sophisticated classification rules which may be 941 applied in more complex Class III switches e.g. matching on individual 942 int-serv flows.
944 The Queueing and Scheduler module holds the output queues for ports and 945 provides the algorithm for servicing the queues for transmission onto 946 the output link in order to provide the promised int-serv service. 947 Switches will implement one or more output queues per port and all will 948 implement at least a basic strict priority dequeueing algorithm as their 949 default, in accordance with 802.1p. 951 * Ingress traffic class mapper and policing - as described in 802.1p 952 section 3.7. This optional module may check on whether the data within 953 traffic classes are conforming to the patterns currently agreed: 954 switches may police this and discard or re-map packets. The default 955 behaviour is to pass things through unchanged. 957 * Egress traffic class mapper - as described in 802.1p section 3.7. This 958 optional module may apply re-mapping of traffic classes e.g. on a per- 959 output port basis. The default behaviour is to pass things through 960 unchanged. 962 These are shown by the following diagram which is a superset of the IEEE 963 802.1D bridge model: 965 _______________________________ 966 | _____ ______ ______ | 967 SBM signaling | | | | | | | | SBM signaling 968 <------------------>| IN |<->| SBM |<->| OUT |<----------------> 969 | | SBM | | prop.| | SBM | | 970 | |_____| |______| |______| | 971 | / | ^ / | | 972 ______________| / | | | | |_____________ 973 | \ / __V__ | | __V__ / | 974 | \ ____/ |Local| | | |Local| / | 975 | \ / |Admis| | | |Admis| / | 976 | \/ |Cntrl| | | |Cntrl| / | 977 | _____V \ |_____| | | |_____| / _____ | 978 | |traff | \ ___|__ V_______ / |egrss| | 979 | |class | \ |Filter| |Queue & | / |traff| | 980 | |map & |=====|==========>|Data- |=| Packet |=|===>|class| | 981 | |police| | | base| |Schedule| | |map | | 982 | |______| | |______| |________| | |_____| | 983 |____^_________|_______________________________|______|______| 984 data in | |data out 985 ========+ +========> 986 Figure 6 - ISSLL in Switches 988 7.3 Admission Control 990 On reception of an admission control request, a switch performs the 991 following actions, again using SBM as an example: the behaviour is 992 different depending on whether the "Designated SBM" for this segment is 993 within this switch or not - see [14] for a more detailed specification 994 of the DSBM/SBM actions: 996 * if the ingress SBM is the "Designated SBM" for this link/segment, it 997 translates any received user_priority or else selects a layer-2 traffic 998 class which appears compatible with the request and whose use does not 999 violate any administrative policies in force. In effect, it matches up 1000 the requested service with those available in each of the user_priority 1001 classes and chooses the "best" one. It ensures that, if this reservation 1002 is successful, the selected value is passed back to the client. 1004 * ingress DSBM observes the current state of allocation of resources on 1005 the input port/link and then determines whether the new resource 1006 allocation from the mapped traffic class would be excessive. The request 1007 is passed to the reservation propagator if accepted so far. 1009 * if the ingress SBM is not the "Designated SBM" for this link/segment 1010 then it passes the request on directly to the reservation propagator 1012 * reservation propagator relays the request to the bandwidth accountants 1013 on each of the switch's outbound links to which this reservation would 1014 apply (implied interface to routing/forwarding database). 
1016 * egress bandwidth accountant observes the current state of allocation 1017 of queueing resources on its outbound port and bandwidth on the link 1018 itself and determines whether the new allocation would be excessive. 1019 Note that this is only the local decision of this switch hop: each 1020 further layer-2 hop through the network gets a chance to veto the 1021 request as it passes along. 1023 * the request, if accepted by this switch, is then passed on down the 1024 line on each output link selected. Any user_priority described in the 1025 forwarded request must be translated according to any egress mapping 1026 table. 1028 * if accepted, the switch must notify the client of the user_priority to 1029 use for packets belonging to this flow. Note that this is a 1030 "provisional YES" - we assume an optimistic approach here: later 1031 switches can still say "NO" later. 1033 * if this switch wishes to reject the request, it can do so by notifying 1034 the original client (by means of its layer-2 address). 1036 7.4 QoS Signaling 1038 The mechanisms described in this document make use of a signaling 1039 protocol for devices to communicate their admission control requests 1040 across the network: the service definitions to be provided by such a 1041 protocol e.g. [14] are described below. Below, we illustrate the 1042 primitives and information that need to be exchanged with such a 1043 signaling protocol entity - in all these examples, appropriate 1044 delete/cleanup mechanisms will also have to be provided for when 1045 sessions are torn down. 1047 7.4.1 Client service definitions 1049 The following interfaces can be identified from Figure 4 and Figure 5 1051 * SBM <-> Address mapping 1053 This is a simple lookup function which may cause ARP protocol 1054 interactions, may be just a lookup of an existing ARP cache entry or may 1055 be an algorithmic mapping. The layer-2 addresses are needed by SBM for 1056 inclusion in its signaling messages to/from switches which avoids the 1057 switches having to perform the mapping and, hence, have knowledge of 1058 layer-3 information for the complete subnet: 1060 l2_addr = map_address( ip_addr ) 1062 * SBM <-> Session/802 header 1064 This is for notifying the transmit path of how to add layer-2 header 1065 information e.g. user_priority values to the traffic of each outgoing 1066 flow: the transmit path will provide the user_priority value when it 1067 requests a MAC-layer transmit operation for each packet (user_priority 1068 is one of the parameters passed in the packet transmit primitive defined 1069 by the IEEE 802 service model): 1071 bind_l2_header( flow_id, user_priority ) 1073 * SBM <-> Classifier/Scheduler 1075 This is for notifying transmit classifier/scheduler of any additional 1076 layer-2 information associated with scheduling the transmission of a 1077 flow packets: this primitive may be unused in some implementations or it 1078 may be used, for example, to provide information to a transmit scheduler 1079 that is performing per-traffic_class scheduling in addition to the per- 1080 flow scheduling required by int-serv: the l2_header may be a pattern 1081 (additional to the FilterSpec) to be used to identify the flow's 1082 traffic. 1084 bind_l2schedulerinfo( flow_id, , l2_header, traffic_class ) 1086 * SBM <-> Local Admission Control 1088 For applying local admission control for a session e.g. is there enough 1089 transmit bandwidth still uncommitted for this potential new session? Are 1090 there sufficient receive buffers? 
This should commit the necessary 1091 resources if OK: it will be necessary to release these resources at a 1092 later stage if the session setup process fails. This call would be made 1093 by a segment's Designated SBM for example: 1095 status = admit_l2session( flow_id, Tspec, FlowSpec ) 1097 * SBM <-> RSVP - this is outlined above in section 7.1.2 and fully 1098 described in [14]. 1100 * Management Interfaces 1101 Some or all of the modules described by this model will also require 1102 configuration management: it is expected that details of the manageable 1103 objects will be specified by future work in the ISSLL WG. 1105 7.4.2 Switch service definitions 1107 The following interfaces are identified from Figure 6: 1109 * SBM <-> Classifier 1111 This is for notifying receive classifier of how to match up incoming 1112 layer-2 information with the associated traffic class: it may in some 1113 cases consist of a set of read-only default mappings: 1115 bind_l2classifierinfo( flow_id, l2_header, traffic_class ) 1117 * SBM <-> Queue and Packet Scheduler 1119 This is for notifying transmit scheduler of additional layer-2 1120 information associated with a given traffic class (it may be unused in 1121 some cases - see discussion in previous section): 1123 bind_l2schedulerinfo( flow_id, l2_header, traffic_class ) 1125 * SBM <-> Local Admission Control 1127 As for host above. 1129 * SBM <-> Traffic Class Map and Police 1131 Optional configuration of any user_priority remapping that might be 1132 implemented on ingress to and egress from the ports of a switch (note 1133 that, for Class I switches, it is likely that these mappings will have 1134 to be consistent across all ports): 1136 bind_l2ingressprimap( inport, in_user_pri, internal_priority ) 1137 bind_l2egressprimap( outport, internal_priority, out_user_pri ) 1139 Optional configuration of any layer-2 policing function to be applied 1140 on a per-class basis to traffic matching the l2_header. If the switch is 1141 capable of per-flow policing then existing int-serv/RSVP models will 1142 provide a service definition for that configuration: 1144 bind_l2policing( flow_id, l2_header, Tspec, FlowSpec ) 1146 * SBM <-> Filtering Database 1148 SBM propagation rules need access to the layer-2 forwarding database to 1149 determine where to forward SBM messages (analogous to RSRR interface in 1150 L3 RSVP): 1152 output_portlist = lookup_l2dest( l2_addr ) 1154 * Management Interfaces 1156 Some or all of the modules described by this model will also require 1157 configuration management: it is expected that details of the manageable 1158 objects will be specified by future work in the ISSLL WG. 1160 8 Implementation Issues 1162 As stated earlier, the Integrated Services working group has defined 1163 various service classes offering varying degrees of QoS guarantees. 1164 Initial effort will concentrate on enabling the Controlled Load [6] and 1165 Guaranteed Service classes [7]. The Controlled Load service provides a 1166 loose guarantee, informally stated as "the same as best effort would be 1167 on an unloaded network". The Guaranteed Service provides an upper-bound 1168 on the transit delay of any packet. The extent to which these services 1169 can be supported at the link layer will depend on many factors including 1170 the topology and technology used. Some of the mapping issues are 1171 discussed below in light of the emerging link layer standards and the 1172 functions supported by higher layer protocols. 
Given the limitations of some of the topologies under consideration, it
may not be possible to satisfy all the requirements for Integrated
Services on a given topology. In such cases, it is useful to consider
providing support for an approximation of the service which may suffice
in most practical instances. For example, it may not be feasible to
provide policing/shaping at each network element (bridge/switch) as
required by the Controlled Load specification. But if this task is left
to the end stations, a reasonably good approximation to the service can
be obtained.

8.1 Switch characteristics

For the sake of illustration, we divide layer-2 bridges/switches into
several categories, based on the level of sophistication of their QoS
and software protocol capabilities. These categories are not intended
to represent all possible implementation choices but, instead, to aid
discussion of what QoS capabilities can be expected from a network made
of these devices (the basic "Class 0" device is included for
completeness but cannot really provide useful integrated service).

Class 0
 - 802.1D MAC bridging
 - single queue per output port, no separation of traffic classes
 - Spanning-Tree to remove topology loops (single active path)

Class I
 - 802.1p priority queueing between traffic classes
 - no multicast heterogeneity
 - 802.1p GARP/GMRP pruning of individual multicast addresses

Class II - as (I) plus:
 - can map received user_priority on a per-input-port basis to some
   internal set of canonical values
 - can map internal canonical values onto transmitted user_priority on
   a per-output-port basis, giving some limited form of multicast
   heterogeneity
 - may implement IGMP snooping for pruning

Class III - as (II) plus:
 - per-flow classification
 - possibly per-flow policing and/or reshaping
 - more complex transmit scheduling (probably not per-flow)

8.2 Queueing

Connectionless packet-based networks in general, and LAN-switched
networks in particular, work today because of scaling choices in
network provisioning. Consciously or (more usually) unconsciously,
enough excess bandwidth and buffering is provisioned to absorb the
traffic sourced by higher-layer protocols, or to cause their
transmission windows to run out, so that, statistically, the network is
overloaded only for short periods and the average expected loading is
less than 60% (usually much less).

With the advent of time-critical traffic, such over-provisioning has
become far less easy to achieve. Time-critical frames may find
themselves queued for annoyingly long periods behind temporary bursts
of file transfer traffic, particularly at network bottleneck points,
e.g. at the 100 Mb/s to 10 Mb/s transition that might occur between the
riser to the wiring closet and the final link to the user from a
desktop switch. In this case, however, if it is known (guaranteed by
application design, merely expected on the basis of statistics, or
simply because this is all that the network guarantees to support) that
the time-critical traffic is a small fraction of the total bandwidth,
it suffices to give it strict priority over the "normal" traffic.
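As a sketch only (the data structures are illustrative and not drawn
from any 802.1 text), the strict-priority service referred to above
amounts to always taking from the highest non-empty queue; a frame
already on the wire is never preempted, which is what produces the
one-maximum-frame bound discussed next.

   #include <stddef.h>

   #define NUM_CLASSES 2           /* 0 = "normal", 1 = time-critical */

   struct frame { struct frame *next; /* payload omitted */ };

   struct out_port {
       struct frame *head[NUM_CLASSES];
       struct frame *tail[NUM_CLASSES];
   };

   /* Strict priority: always serve the highest-numbered non-empty
    * queue; lower classes are served only when all higher classes
    * are empty. */
   static struct frame *dequeue_strict(struct out_port *p)
   {
       for (int cls = NUM_CLASSES - 1; cls >= 0; cls--) {
           struct frame *f = p->head[cls];
           if (f != NULL) {
               p->head[cls] = f->next;
               if (p->head[cls] == NULL)
                   p->tail[cls] = NULL;
               return f;
           }
       }
       return NULL;                 /* nothing queued */
   }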
The worst-case delay experienced by the time-critical traffic is then
roughly the maximum transmission time of a maximum-length
non-time-critical frame - less than a millisecond for 10 Mb/s Ethernet,
and well below an end-to-end budget based on human perception times.

When more than one "priority" service is to be offered by a network
element, e.g. it supports Controlled Load as well as Guaranteed
Service, the queueing discipline becomes more complex. In order to
provide the required isolation between the service classes, it will
probably be necessary to queue them separately. There is then an issue
of how to service the queues: a combination of admission control and
more intelligent queueing disciplines, e.g. weighted fair queueing, may
be required in such cases. As with the service specifications
themselves, it is not the place of this document to specify queueing
algorithms, merely to require that the external behaviour meets the
services' requirements.

8.3 Mapping of Services to Link Level Priority

The number of traffic classes supported, and the access method of the
technology under consideration, determine how many and which services
may be supported. Native token ring/802.5, for instance, supports eight
priority levels which may be mapped to one or more traffic classes.
Ethernet/802.3 has no support for signaling priorities within frames.
However, the IEEE 802 standards committee has recently developed a new
standard for bridges/switches covering multimedia traffic expediting
and dynamic multicast filtering [3], and a packet format for carrying a
User Priority field on all IEEE 802 media types is now defined in [4].
Together these standards allow for up to eight traffic classes on all
media. The User Priority bits carried in the frame are mapped to a
particular traffic class within a bridge/switch. The User Priority is
signaled on an end-to-end basis, unless overridden by bridge/switch
management. The traffic class used by a flow should depend on the
quality of service desired and on whether the reservation is successful
or not. Therefore, a sender should use the User Priority value which
maps to the best effort traffic class until told otherwise by the BM.
The BM will, upon successful completion of resource reservation,
specify the User Priority to be used by the sender for that session's
data. An accompanying memo [13] addresses the issue of mapping the
various Integrated Services to appropriate traffic classes.

8.4 Re-mapping of non-conformant aggregated flows

One other topic under discussion in the int-serv context is how to
handle traffic from data flows whose sources exceed their currently
agreed traffic contract with the network. An approach that shows some
promise is to give such traffic "somewhat less than best effort"
service in order to protect traffic that is normally given "best
effort" service from having to back off. Best effort traffic is often
"adaptive", using TCP or other congestion control algorithms, and it
would be unfair to penalise it because of badly behaved traffic from
reserved flows, which are often set up by non-adaptive applications.
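Whatever a network element decides to do with excess packets, it first
has to classify them as conformant or not; in the int-serv model this
is a token bucket parameterised from the flow's TSpec. A minimal
sketch, with hypothetical names and the drop-or-remap decision left to
the caller:

   #include <stdbool.h>
   #include <stdint.h>

   /* Token bucket state for one flow, taken from its TSpec:
    * token rate r in bytes/second and bucket depth b in bytes. */
   struct policer {
       double rate;         /* r: token fill rate (bytes/s)  */
       double depth;        /* b: maximum credit (bytes)     */
       double tokens;       /* current credit, <= depth      */
       double last_time;    /* time of the previous update   */
   };

   /* Returns true if a packet of 'length' bytes arriving at time
    * 'now' (seconds) conforms to the agreed TSpec.  The caller
    * decides what to do with non-conformant packets: drop them, as
    * Controlled Load recommends, or re-mark them to a "lower"
    * user_priority. */
   static bool tb_conforms(struct policer *p, uint32_t length, double now)
   {
       p->tokens += (now - p->last_time) * p->rate;
       if (p->tokens > p->depth)
           p->tokens = p->depth;
       p->last_time = now;

       if ((double)length <= p->tokens) {
           p->tokens -= (double)length;
           return true;             /* within the traffic contract */
       }
       return false;                /* excess traffic */
   }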
1291 One solution here might be to assign normal best effort traffic to one 1292 user_priority and to label excess non-conformant traffic as a "lower" 1293 user_priority although the re-ordering problems that might arise from 1294 doing this may make this solution undesirable, particularly if the flows 1295 are using TCP: for this reason the controlled load service recommends 1296 dropping excess traffic, rather than re-mapping to a lower priority. 1297 This topic is further discussed below. 1299 8.5 Override of incoming user_priority 1301 In some cases, a network administrator may not trust the user_priority 1302 values contained in packets from a source and may wish to map these into 1303 some more suitable set of values. Alternatively, due perhaps to 1304 equipment limitations or transition periods, values may need to be 1305 mapped to/from different regions of a network. 1307 Some switches may implement such a function on input that maps received 1308 user_priority into some internal set of values (this table is known in 1309 802.1p as the "user_priority regeneration table"). These values can then 1310 be mapped using the output table described above onto outgoing 1311 user_priority values: these same mappings must also be used when 1312 applying admission control to requests that use the user_priority values 1313 (see e.g. [14]). More sophisticated approaches may also be envisioned 1314 where a device polices traffic flows and adjusts their onward 1315 user_priority based on their conformance to the admitted traffic flow 1316 specifications. 1318 8.6 Support for Different Reservation Styles 1320 +-----+ +-----+ +-----+ 1321 | S1 | | S2 | | S3 | 1322 +-----+ +-----+ +-----+ 1323 | | | 1324 | v | 1325 | +-----+ | 1326 +--------->| SW |<---------+ 1327 +-----+ 1328 | | 1329 +----+ +----+ 1330 | | 1331 v V 1332 +-----+ +-----+ 1333 | R1 | | R2 | 1334 +-----+ +-----+ 1336 Figure 7 - Illustration of filter styles. 1338 In the figure above, SW is a bridge/switch in the link layer domain. S1, 1339 S2, S3, R1 and R2 are end stations which are members of a group 1340 associated with the same RSVP flow. S1, S2 and S3 are upstream end 1341 stations. R1 and R2 are the downstream end-stations which receive 1342 traffic from all the senders. RSVP allows receivers R1 and R2 to 1343 specify reservations which can apply to: (a) one specific sender only 1344 (fixed filter); (b) any of two or more explicitly specified senders 1345 (shared explicit filter); and (c) any sender in the group (shared 1346 wildcard filter). Support for the fixed filter style is 1347 straightforward; a separate reservation is made for the traffic from 1348 each of the senders. However, support for the other two filter styles 1349 has implications regarding policing; i.e. the merged flow from the 1350 different senders must be policed so that they conform to traffic 1351 parameters specified in the filter's RSpec. This scenario is further 1352 complicated if the services requested by R1 and R2 are different. 1353 Therefore, in the absence of policing within bridges/switches, it may be 1354 possible to support only fixed filter reservations at the link layer. 1356 8.7 Supporting Receiver Heterogeneity 1358 At layer-3, the int-serv model allows heterogeneous multicast flows 1359 where different branches of a tree can have different types of 1360 reservations for a given multicast destination. It also supports the 1361 notion that trees may have some branches with reserved flows and some 1362 using best effort (default) service. 
If we were to treat a layer-2 1363 subnet as a single "network element", as defined in [8], then all of the 1364 branches of the distribution tree that lie within the subnet could be 1365 assumed to require the same QoS treatment and be treated as an atomic 1366 unit as regards admission control etc.. With this assumption, the model 1367 and protocols already defined by int-serv and RSVP already provide 1368 sufficient support for multicast heterogeneity. Note, however, that an 1369 admission control request may well be rejected because just one link in 1370 the subnet has reached its traffic limit and that this will lead to 1371 rejection of the request for the whole subnet. 1373 +-----+ 1374 | S | 1375 +-----+ 1376 | 1377 v 1378 +-----+ +-----+ +-----+ 1379 | R1 |<-----| SW |----->| R2 | 1380 +-----+ +-----+ +-----+ 1382 Figure 8 - Example of receiver heterogeneity 1384 As an example, consider Figure 8, SW is a Layer 2 device (bridge/switch) 1385 participating in resource reservation, S is the upstream source end 1386 station and R1 and R2 are downstream end station receivers. R1 would 1387 like to make a reservation for the flow while R2 would like to receive 1388 the flow using best effort service. S sends RSVP PATH messages which 1389 are multicast to both R1 and R2. R1 sends an RSVP RESV message to S 1390 requesting the reservation of resources. 1392 If the reservation is successful at Layer 2, the frames addressed to the 1393 group will be categorized in the traffic class corresponding to the 1394 service requested by R1. At SW, there must be some mechanism which 1395 forwards the packet providing service corresponding to the reserved 1396 traffic class at the interface to R1 while using the best effort traffic 1397 class at the interface to R2. This may involve changing the contents of 1398 the frame itself, or ignoring the frame priority at the interface to R2. 1400 Another possibility for supporting heterogeneous receivers would be to 1401 have separate groups with distinct MAC addresses, one for each class of 1402 service. By default, a receiver would join the "best effort" group 1403 where the flow is classified as best effort. If the receiver makes a 1404 reservation successfully, it can be transferred to the group for the 1405 class of service desired. The dynamic multicast filtering capabilities 1406 of bridges and switches implementing the emerging IEEE 802.1p standard 1407 would be a very useful feature in such a scenario. A given flow would 1408 be transmitted only on those segments which are on the path between the 1409 sender and the receivers of that flow. The obvious disadvantage of such 1410 an approach is that the sender needs to send out multiple copies of the 1411 same packet corresponding to each class of service desired thus 1412 potentially duplicating the traffic on a portion of the distribution 1413 tree. 1415 The above approaches would provide very sub-optimal utilisation of 1416 resources given the size and complexity of the layer-2 subnets 1417 envisioned by this document. Therefore, it is desirable to support the 1418 ability of layer-2 switches to apply QoS differently on different egress 1419 branches of a tree that divides at that switch: this is discussed in the 1420 following paragraphs. 1422 IEEE 802.1D and 802.1p specify a basic model for multicast whereby a 1423 switch performs multicast routing decisions based on the destination 1424 address: this would produce a list of output ports to which the packet 1425 should be forwarded. 
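To illustrate how this forwarding decision combines with the
per-output-port re-mapping of a Class II switch (section 8.1), a rough
sketch follows; the table and function names are inventions of this
example rather than anything defined by 802.1D/p.

   #include <stdint.h>

   #define NUM_PORTS       16
   #define NUM_PRIORITIES   8

   /* Per-output-port egress mapping from the switch's internal
    * canonical priority to the user_priority actually transmitted;
    * this is what permits the limited receiver heterogeneity
    * discussed below, e.g. priority 4 on a branch with a reservation
    * and 0 on a branch without one. */
   static uint8_t egress_map[NUM_PORTS][NUM_PRIORITIES];

   /* Consult the layer-2 filtering database for the set of output
    * ports; stubbed here - a real bridge uses its learned, static and
    * GMRP-registered entries. */
   static uint32_t lookup_l2dest_stub(const uint8_t dest_mac[6])
   {
       (void)dest_mac;
       return 0x0006;               /* e.g. forward on ports 1 and 2 */
   }

   /* Forward one multicast frame carried internally at priority
    * 'internal_pri': each selected output port applies its own egress
    * mapping before the frame is enqueued there. */
   static void forward_multicast(const uint8_t dest_mac[6],
                                 uint8_t internal_pri)
   {
       uint32_t out_ports = lookup_l2dest_stub(dest_mac);

       for (unsigned port = 0; port < NUM_PORTS; port++) {
           if (!(out_ports & (1u << port)))
               continue;
           uint8_t tx_pri = egress_map[port][internal_pri & 7];
           (void)tx_pri;  /* enqueue on 'port' in the queue for tx_pri,
                             as per the Queueing and Scheduler module */
       }
   }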
In its default mode, such a switch would use the user_priority value in
received packets (or a value regenerated on a per-input-port basis in
the absence of an explicit value) to enqueue the packets at each output
port. All of the classes of switch identified above can support this
operation.

If a switch selects per-port output queues based only on the incoming
user_priority, as described by 802.1p, it must treat all branches of
all multicast sessions within that user_priority class with the same
queueing mechanism: no heterogeneity is then possible, and this could
well lead to the failure of an admission control request for the whole
multicast session because a single link is at its maximum allocation,
as described above. Note that, in the layer-2 case, as distinct from
the layer-3 case with RSVP/int-serv, the option of having some
receivers get the session with the requested QoS while others get it
best effort does not exist, because Class I switches are unable to
re-map the user_priority on a per-link basis; this could well become an
issue with heavy use of dynamic multicast sessions. If a switch
implements a separate user_priority mapping at each output port, as
described under "Class II switch" above, then some limited form of
receiver heterogeneity can be supported, e.g. forwarding traffic as
user_priority 4 on a branch where receivers have performed admission
control reservations and as user_priority 0 on one where they have not.
We assume that per-user_priority queueing, without taking account of
input or output ports, is the minimum standard functionality for
switches in a LAN environment (the Class I switch defined above), but
that more functional layer-2 or even layer-3 switches (a.k.a. routers)
can be used if more flexible forms of heterogeneity are considered
necessary to achieve more efficient resource utilisation; note that the
behaviour of layer-3 switches in this context is already well
standardised by the IETF.

9 Network Topology Scenarios

As stated earlier, this memo is concerned with specifying a framework
for supporting Integrated Services in LAN technologies such as
Ethernet/IEEE 802.3, token ring/IEEE 802.5 and FDDI. The extent to
which service guarantees can be provided by a network depends to a
large degree on the ability to provide the key functions of flow
identification and scheduling, in addition to admission control and
policing. This section discusses some of the capabilities of these LAN
technologies and provides a taxonomy of possible topologies,
emphasizing the capabilities of each with regard to supporting the
above functions. For the technologies considered here, the basic
topology of a LAN may be shared, switched half duplex or switched full
duplex. In the shared topology, multiple senders share a single
segment. Contention for media access is resolved using protocols such
as CSMA/CD in Ethernet and token passing in token ring and FDDI.
Switched half duplex is essentially a shared topology with the
restriction that there are only two transmitters contending for
resources on any segment. Finally, in a switched full duplex topology,
a full bandwidth path is available to the transmitter at each end of
the link at all times.
Therefore, in this topology, there is no need for any access control
mechanism such as CSMA/CD or token passing, as there is no contention
between the transmitters; obviously, this topology provides the best
QoS capabilities. Another important element in the discussion of
topologies is the presence or absence of support for multiple traffic
classes; these were discussed earlier in section 4.1. Depending on the
basic topology used and the ability to support traffic classes, we
identify six scenarios as follows:

 1. Shared topology without traffic classes
 2. Shared topology with traffic classes
 3. Switched half duplex topology without traffic classes
 4. Switched half duplex topology with traffic classes
 5. Switched full duplex topology without traffic classes
 6. Switched full duplex topology with traffic classes

There is also the possibility of hybrid topologies where two or more of
the above coexist. For instance, it is possible that, within a single
subnet, some switches support traffic classes and some do not. If the
flow in question traverses both kinds of switches in the network, the
least common denominator will prevail. In other words, as far as that
flow is concerned, the network is of the type corresponding to the
least capable topology that is traversed. In the following sections, we
present these scenarios in further detail for some of the different
IEEE 802 network types, with discussion of their abilities to support
the Integrated Service classes.

9.1 Full-duplex switched networks

We have up to now ignored the MAC access protocol. On a full-duplex
switched LAN (of either the Ethernet or Token-Ring type - the MAC
algorithm is, by definition, unimportant) this can be factored into the
characterisation parameters advertised by the device, since the access
latency is well controlled (jitter = one largest packet time). Some
example characteristics (approximate):

   Type              Speed     Max Pkt    Max Access
                               Length     Latency

   Ethernet          10Mbps    1.2ms      1.2ms
                     100Mbps   120us      120us
                     1Gbps     12us       12us
   Token-Ring        4Mbps     9ms        9ms
                     16Mbps    9ms        9ms
   FDDI              100Mbps   360us      8.4ms
   Demand-Priority   100Mbps   120us      253us

        Table 1 - Full-duplex switched media access latency

These delays should also be considered in the context of speed-of-light
delays of e.g. ~400ns for typical 100m UTP links and ~7us for typical
2km multimode fibre links.

We therefore see full-duplex switched network topologies as offering
good QoS capabilities for both Controlled Load and Guaranteed Service
when supported by suitable queueing strategies in the switch nodes.

9.2 Shared-media Ethernet networks

We have not yet mentioned the difficulty of dealing with allocation on
a single shared CSMA/CD segment: as soon as any CSMA/CD algorithm is
introduced, the ability to provide any form of Guaranteed Service is
seriously compromised in the absence of tight coupling between the
multiple senders on the link. There are a number of reasons for not
offering a better solution for this issue.

Firstly, we do not believe this is a truly solvable problem: it would
seem to require a new MAC protocol. There have been proposals for
enhancements to the MAC layer protocols e.g.
BLAM and enhanced flow- 1543 control in IEEE 802.3; IEEE 802.1 has examined research showing 1544 disappointing simulation results for performance guarantees on shared 1545 CSMA/CD Ethernet without MAC enhancements. However, any solution 1546 involving a new "software MAC" running above the traditional 802.3 MAC 1547 or other proprietary MAC protocols is clearly outside the scope of the 1548 work of the ISSLL WG and this document. Secondly, we are not convinced 1549 that it is really an interesting problem. While not everyone in the 1550 world is buying desktop switches today and there will be end stations 1551 living on repeated segments for some time to come, the number of 1552 switches is going up and the number of stations on repeated segments is 1553 going down. This trend is proceeding to the point that we may be happy 1554 with a solution which assumes that any network conversation requiring 1555 resource reservations will take place through at least one switch (be it 1556 layer-2 or layer-3). Put another way, the easiest QoS upgrade to a 1557 layer-2 network is to install segment switching: only when this has been 1558 done is it worthwhile to investigate more complex solutions involving 1559 admission control. 1561 Thirdly, in the core of the network (as opposed to at the edges), there 1562 does not seem to be wide deployment of repeated segments as opposed to 1563 switched solutions. There may be special circumstances in the future 1564 (e.g. Gigabit buffered repeaters) but these have differing 1565 characteristics to existing CSMA/CD repeaters anyway. 1567 Type Speed Max Pkt Max Access 1568 Length Latency 1570 Ethernet 10Mbps 1.2ms unbounded 1571 100Mbps 120us unbounded 1572 1Gbps 12us unbounded 1574 Table 2 - Shared Ethernet media access latency 1576 9.3 Half-duplex switched Ethernet networks 1578 Many of the same arguments for sub-optimal support of Guaranteed Service 1579 apply to half-duplex switched Ethernet as to shared media: in essence, 1580 this topology is a medium that *is* shared between at least two senders 1581 contending for each packet transmission opportunity. Unless these are 1582 tightly coupled and cooperative then there is always the chance that the 1583 best-effort traffic of one will interfere with the important traffic of 1584 the other. Such coupling would seem to need some form of modifications 1585 to the MAC protocol (see above). 1587 Notwithstanding the above, half-duplex switched topologies do seem to 1588 offer the chance to provide Controlled Load service: with the knowledge 1589 that there are only a small limited number (e.g. two) of potential 1590 senders that are both using prioritisation for their CL traffic (with 1591 admission control for those CL flows based on the knowledge of the 1592 number of potential senders) over best effort, the media access 1593 characteristics, whilst not deterministic in the true mathematical 1594 sense, are somewhat predictable. This is probably a close enough 1595 approximation to CL to be useful. 
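One way such admission control might be realised - a sketch under
assumed names and thresholds, not a prescription of this framework - is
to account Controlled Load allocations for both senders against a
configured fraction of the segment, since on a half-duplex link
transmit and receive flows consume the same capacity (compare the local
admission control entity of section 7.2.2).

   #include <stdbool.h>

   /* Per-segment accounting for a half-duplex link: allocations made
    * for transmit and receive flows both draw on the same shared
    * capacity. */
   struct hd_segment {
       double link_bps;      /* raw media rate                       */
       double cl_fraction;   /* fraction usable by CL, e.g. 0.5      */
       double cl_allocated;  /* sum of admitted CL TSpec rates (bps) */
   };

   /* Admit a new Controlled Load flow of 'rate_bps' (from its TSpec)
    * only if the total stays within the configured CL share of the
    * segment; the resources must be released again at teardown. */
   static bool admit_cl_halfduplex(struct hd_segment *seg, double rate_bps)
   {
       double limit = seg->link_bps * seg->cl_fraction;
       if (seg->cl_allocated + rate_bps > limit)
           return false;             /* would overcommit the segment */
       seg->cl_allocated += rate_bps;
       return true;
   }

The raw access-latency figures for this topology are summarised in
Table 3 below.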
1597 Type Speed Max Pkt Max Access 1598 Length Latency 1600 Ethernet 10Mbps 1.2ms unbounded 1601 100Mbps 120us unbounded 1602 1Gbps 12us unbounded 1604 Table 3 - Half-duplex switched Ethernet media access latency 1606 9.4 Half-duplex and shared Token Ring networks 1608 In a shared Token Ring network, the network access time for high 1609 priority traffic at any station is bounded and is given by (N+1)*THTmax, 1610 where N is the number of stations sending high priority traffic and 1611 THTmax is the maximum token holding time [14]. This assumes that 1612 network adapters have priority queues so that reservation of the token 1613 is done for traffic with the highest priority currently queued in the 1614 adapter. It is easy to see that access times can be improved by 1615 reducing N or THTmax. The recommended default for THTmax is 10 ms [6]. 1616 N is an integer from 2 to 256 for a shared ring and 2 for a switched 1617 half duplex topology. A similar analysis applies for FDDI. Using default 1618 values gives: 1620 Type Speed Max Pkt Max Access 1621 Length Latency 1623 Token-Ring 4/16Mbps shared 9ms 2570ms 1624 4/16Mbps switched 9ms 30ms 1625 FDDI 100Mbps 360us 8ms 1627 Table 4 - Half-duplex and shared Token-Ring media access latency 1629 Given that access time is bounded, it is possible to provide an upper 1630 bound for end-to-end delays as required by Guaranteed Service assuming 1631 that traffic of this class uses the highest priority allowable for user 1632 traffic. The actual number of stations that send traffic mapped into 1633 the same traffic class as GS may vary over time but, from an admission 1634 control standpoint, this value is needed a priori. The admission 1635 control entity must therefore use a fixed value for N, which may be the 1636 total number of stations on the ring or some lower value if it is 1637 desired to keep the offered delay guarantees smaller. If the value of N 1638 used is lower than the total number of stations on the ring, admission 1639 control must ensure that the number of stations sending high priority 1640 traffic never exceeds this number. This approach allows admission 1641 control to estimate worst case access delays assuming that all of the N 1642 stations are sending high priority data even though, in most cases, this 1643 will mean that delays are significantly overestimated. 1645 Assuming that Controlled Load flows use a traffic class lower than that 1646 used by GS, no upper-bound on access latency can be provided for CL 1647 flows. However, CL flows will receive better service than best effort 1648 flows. 1650 Note that, on many existing shared token rings, bridges will transmit 1651 frames using an Access Priority (see section 4.3) value 4 irrespective 1652 of the user_priority carried in the frame control field of the frame. 1653 Therefore, existing bridges would need to be reconfigured or modified 1654 before the above access time bounds can actually be used. 1656 9.5 Half-duplex and shared Demand-Priority networks 1658 In 802.12 networks, communication between end-nodes and hubs and between 1659 the hubs themselves is based on the exchange of link control signals. 1660 These signals are used to control the shared medium access. If a hub, 1661 for example, receives a high-priority request while another hub is in 1662 the process of serving normal-priority requests, then the service of the 1663 latter hub can effectively be pre-empted in order to serve the high- 1664 priority request first. 
After the network has processed all high- 1665 priority requests, it resumes the normal-priority service at the point 1666 in the network at which it was interrupted. 1668 The time needed to preempt normal-priority network service (the high- 1669 priority network access time) is bounded: the bound depends on the 1670 physical layer and on the topology of the shared network. The physical 1671 layer has a significant impact when operating in half-duplex mode as 1672 e.g. used across unshielded twisted-pair cabling (UTP) links, because 1673 link control signals cannot be exchanged while a packet is transmitted 1674 over the link. Therefore the network topology has to be considered 1675 since, in larger shared networks, the link control signals must 1676 potentially traverse several links (and hubs) before they can reach the 1677 hub which possesses the network control. This may delay the preemption 1678 of the normal priority service and hence increase the upper bound that 1679 may be guaranteed. 1681 Upper bounds on the high-priority access time are given below for a UTP 1682 physical layer and a cable length of 100 m between all end-nodes and 1683 hubs using a maximum propagation delay of 570ns as defined in [15]. 1684 These values consider the worst case signaling overhead and assume the 1685 transmission of maximum-sized normal-priority data packets while the 1686 normal-priority service is being pre-empted. 1688 Type Speed Max Pkt Max Access 1689 Length Latency 1691 Demand Priority 100Mbps, 802.3pkt, UTP 120us 253us 1692 802.5pkt, UTP 360us 733us 1694 Table 5 - Half-duplex switched Demand-Priority UTP access latency 1696 Shared 802.12 topologies can be classified using the hub cascading level 1697 "N". The simplest topology is the single hub network (N = 1). For a UTP 1698 physical layer, a maximum cascading level of N = 5 is supported by the 1699 standard. Large shared networks with many hundreds nodes can however 1700 already be built with a level 2 topology. The bandwidth manager could be 1701 informed about the actual cascading level by using network management 1702 mechanisms and use this information in its admission control algorithms. 1704 Type Speed Max Pkt Max Access Topology 1705 Length Latency 1707 Demand Priority 100Mbps, 802.3pkt 120us 262us N=1 1708 120us 554us N=2 1709 120us 878us N=3 1710 120us 1.24ms N=4 1711 120us 1.63ms N=5 1713 Demand Priority 100Mbps, 802.5pkt 360us 722us N=1 1714 360us 1.41ms N=2 1715 360us 2.32ms N=3 1716 360us 3.16ms N=4 1717 360us 4.03ms N=5 1719 Table 6 - Shared Demand-Priority UTP access latency 1721 In contrast to UTP, the fibre-optic physical layer operates in dual 1722 simplex mode: Upper bounds for the high-priority access time are given 1723 below for 2 km multimode fibre links with a propagation delay of 10 us. 1725 Type Speed Max Pkt Max Access 1726 Length Latency 1728 Demand Priority 100Mbps,802.3pkt,Fibre 120us 139us 1729 802.5pkt,Fibre 360us 379us 1731 Table 7 - Half-duplex switched Demand-Priority Fibre access latency 1733 For shared-media with distances of 2km between all end-nodes and hubs, 1734 the 802.12 standard allows a maximum cascading level of 2. Higher levels 1735 of cascaded topologies are supported but require a reduction of the 1736 distances [15]. 
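These topology-dependent bounds are exactly the kind of information a
bandwidth manager, once told the cascading level via management, could
fold into the fixed-delay term it uses for Guaranteed Service admission
control. A small sketch (values transcribed from Table 6; the function
itself is illustrative only):

   /* Worst-case high-priority access time (microseconds) for a shared
    * 802.12 UTP topology carrying 802.3-format packets, indexed by
    * the hub cascading level N = 1..5 (values from Table 6). */
   static const double dp_utp_8023_access_us[6] =
       { 0.0, 262.0, 554.0, 878.0, 1240.0, 1630.0 };

   /* Return the access-latency bound to advertise for this topology,
    * or a negative value if the cascading level is unknown. */
   static double dp_access_bound_us(int cascade_level)
   {
       if (cascade_level < 1 || cascade_level > 5)
           return -1.0;
       return dp_utp_8023_access_us[cascade_level];
   }

The corresponding figures for fibre topologies, where the maximum
cascading level is 2, are given in Table 8 below.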
   Type              Speed                Max Pkt  Max Access  Topology
                                          Length   Latency

   Demand Priority   100Mbps, 802.3pkt    120us    160us       N=1
                                          120us    202us       N=2

   Demand Priority   100Mbps, 802.5pkt    360us    400us       N=1
                                          360us    682us       N=2

        Table 8 - Shared Demand-Priority Fibre access latency

The bounded access delay and deterministic network access allow the
support of service commitments required for Guaranteed Service and
Controlled Load, even on shared-media topologies. The support of just
two priority levels in 802.12, however, limits the number of services
that can be implemented simultaneously across the network.

10 Justification

An obvious criticism is that this whole model is too complex: it is
what RSVP already does, so why do we think we can do better by
reinventing the solution to this problem at layer-2?

The key is that there are a number of simple layer-2 scenarios that
cover a considerable proportion of the real QoS problems that will
occur: a solution that covers nearly all of the problems at
significantly lower cost is beneficial. Full RSVP/int-serv with
per-flow queueing in strategically positioned high-function switches or
routers may be needed to solve all issues completely, but devices
implementing the architecture described in this document will allow a
significantly simpler network.

11 Summary

This document has specified a framework for providing Integrated
Services over shared and switched LAN technologies. The ability to
provide QoS guarantees necessitates some form of admission control and
resource management. The requirements and goals of a resource
management scheme for subnets have been identified and discussed; we
refer to the entire resource management scheme as a Bandwidth Manager.
Architectural considerations were discussed and examples were provided
to illustrate possible implementations of a Bandwidth Manager. Some of
the issues involved in mapping the services from higher layers to the
link layer have also been discussed. Accompanying memos from the ISSLL
working group address service mapping issues [13] and provide a
protocol specification for the Bandwidth Manager protocol [14], based
on the requirements and goals discussed in this document.

12 References

[1]  IEEE Standards for Local and Metropolitan Area Networks: Overview
     and Architecture, ANSI/IEEE Std 802.1.

[2]  ISO/IEC 10038, ANSI/IEEE Std 802.1D-1993, "MAC Bridges".

[3]  ISO/IEC 15802-3, "Information technology - Telecommunications and
     information exchange between systems - Local and metropolitan
     area networks - Common specifications - Part 3: Media Access
     Control (MAC) Bridges" (current draft available as IEEE
     P802.1p/D8).

[4]  IEEE Standards for Local and Metropolitan Area Networks: Draft
     Standard for Virtual Bridged Local Area Networks, P802.1Q/D7,
     October 1997.

[5]  R. Braden, L. Zhang, S. Berson, S. Herzog and S. Jamin, "Resource
     Reservation Protocol (RSVP) - Version 1 Functional
     Specification", RFC 2205, September 1997.

[6]  J. Wroclawski, "Specification of the Controlled Load Network
     Element Service", RFC 2211, September 1997.

[7]  S. Shenker, C. Partridge and R. Guerin, "Specification of
     Guaranteed Quality of Service", RFC 2212, September 1997.

[8]  R. Braden, D. Clark and S. Shenker, "Integrated Services in the
     Internet Architecture: An Overview", RFC 1633, June 1994.

[9]  J. Wroclawski, "The Use of RSVP with IETF Integrated Services",
     RFC 2210, September 1997.

[10] S. Shenker and J. Wroclawski, "Network Element Service
     Specification Template", Internet Draft.

[11] S. Shenker and J. Wroclawski, "General Characterization
     Parameters for Integrated Service Network Elements", RFC 2215,
     September 1997.

[12] L. Delgrossi and L. Berger (Editors), "Internet Stream Protocol
     Version 2 (ST2) Protocol Specification - Version ST2+", RFC 1819,
     August 1995.

[13] M. Seaman, A. Smith and E. Crawley, "Integrated Service Mappings
     on IEEE 802 Networks", Internet Draft, November 1997.

[14] D. Hoffman et al., "SBM (Subnet Bandwidth Manager): A Proposal
     for Admission Control over Ethernet", Internet Draft, November
     1997.

[15] "Carrier Sense Multiple Access with Collision Detection (CSMA/CD)
     Access Method and Physical Layer Specifications", ANSI/IEEE Std
     802.3-1985.

[16] "Token-Ring Access Method and Physical Layer Specifications",
     ANSI/IEEE Std 802.5-1995.

[17] "A Standard for the Transmission of IP Datagrams over IEEE 802
     Networks", RFC 1042, February 1988.

[18] C. Bisdikian, B. V. Patel, F. Schaffa and M. Willebeek-LeMair,
     "The Use of Priorities on Token-Ring Networks for Multimedia
     Traffic", IEEE Network, Nov/Dec 1995.

[19] "Demand Priority Access Method, Physical Layer and Repeater
     Specification for 100Mbit/s", IEEE Std 802.12-1995.

[20] "Fiber Distributed Data Interface MAC", ANSI Std X3.139-1987.

13 Security Considerations

Implementation of the model described in this memo creates no known new
avenues for malicious attack on the network infrastructure, although
readers are referred to section 2.8 of the RSVP specification [5] for a
discussion of the impact of the use of admission control signaling
protocols on network security.

14 Acknowledgements

Much of the work presented in this document has benefited greatly from
discussion held at the meetings of the Integrated Services over
Specific Link Layers (ISSLL) working group. In particular we would like
to thank Eric Crawley, Don Hoffman and Raj Yavatkar.

Authors' Addresses

   Anoop Ghanwani
   IBM Corporation
   P.O. Box 12195
   Research Triangle Park, NC 27709
   USA
   +1 (919) 254-0260
   anoop@raleigh.ibm.com

   J. Wayne Pace
   IBM Corporation
   P.O. Box 12195
   Research Triangle Park, NC 27709
   USA
   +1 (919) 254-4930
   pacew@raleigh.ibm.com

   Vijay Srinivasan
   IBM Corporation
   P.O. Box 12195
   Research Triangle Park, NC 27709
   USA
   +1 (919) 254-2730
   vijay@raleigh.ibm.com

   Andrew Smith
   Extreme Networks
   10460 Bandley Drive
   Cupertino, CA 95014
   USA
   +1 (408) 863 2821
   andrew@extremenetworks.com

   Mick Seaman
   3Com Corp.
   5400 Bayfront Plaza
   Santa Clara, CA 95052-8145
   USA
   +1 (408) 764 5000
   mick_seaman@3com.com