draft-ramakrishnan-mpls-unite-00.txt
Internet Engineering Task Force                  K.K. Ramakrishnan
INTERNET DRAFT                                   Gisli Hjalmtysson
                                                 Kobus van der Merwe
                                                 (AT&T Labs. Research)
                                                 Flavio Bonomi
                                                 Sateesh Kumar
                                                 Michael Wong
                                                 (CSI Zeitnet/Cabletron)
                                                 August 1998

            UNITE: An Architecture for Lightweight Signaling

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at
any time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

(Note that this ID is also available in Postscript and PDF formats)

Abstract

Communication networks need to support a wide range of applications
with diverse service quality requirements. The current widespread
use of best-effort communication also suggests that the overhead
of establishing communication, in both processing and latency, needs
to be kept to a minimum. With ATM signaling, every flow, including
a best-effort flow, suffers the overhead of end-to-end connection
establishment. ATM signaling complexity is further exacerbated by
having variable-length messages with a large number of information
elements using a very flexible encoding, sent on a single control
channel. The inclusion of QoS processing and connectivity in
the initial setup of a connection requires sequential hop-by-hop
processing.
Variable-length messages involve both a single point of
resequencing and relatively slow, software-based processing.
In recognition of these shortcomings, the MPLS working group has
opted to use topology-driven label distribution as its default
label distribution mechanism, while at the same time acknowledging
the possible need for on-demand label distribution. We see these
different approaches as points on a range of solutions and we do
not wish to open a debate concerning the relative merits of each
approach. However, we believe that if there is a need for on-demand
label distribution, then there is a need to do this very efficiently.
In this light we have decided to bring to the MPLS working group our
architecture for lightweight signaling. While in its current form it
is applicable to an ATM environment, we believe that it represents a
step forward in the evolution of signaling for high-speed networks.
It holds the promise of processing signaling in hardware, thereby
enabling a substantial speed-up of connection setup, so as to meet the
needs of contemporary applications.

Our proposed lightweight architecture for ATM signaling is called
UNITE. The fundamental philosophy of UNITE is the separation of
connectivity from QoS control. This has the potential to eliminate
the round-trip connection setup delay before data transmission
begins. Using a single cell with proper encoding, we avoid the
overhead of reassembly and segmentation on the signaling channel.
With fixed formats, we believe that a hardware implementation is
feasible. Performing QoS negotiation in-band allows switches in the
path to process QoS requests in parallel, facilitates
connection-specific control policies, supports both sender- and
receiver-initiated QoS, and allows for uniform treatment of unicast
and multicast connections.

Note on Applicability

This Internet Draft is based on an ATM Forum contribution and as
such is written within an ATM context. However, we believe that
the UNITE approach to signaling might also be of value within the
context of MPLS and have therefore decided to present it to the MPLS
working group to solicit feedback. We hope to extend and modify this
Internet Draft to be applicable for on-demand label distribution in
MPLS based on the feedback received.

1. Introduction

The goal of lightweight signaling is to reduce the penalty of
connection setup, while supporting service guarantees. A lightweight
signaling protocol should ideally support and enhance both
connectionless and connection-oriented services. Because of a desire
to foresee the signaling needs of any and all applications that are
likely to use the network, current ATM signaling is complex and
slow: multiple messages are required to set up a connection, and
considerable processing is required to parse the complex signaling
messages.

In this Internet Draft, we describe UNITE, a lightweight signaling
protocol for ATM networks. We are motivated by the need to more
efficiently support data applications that typify current Internet
traffic while providing facilities to support applications that
require stringent quality of service, such as telephony. Furthermore,
this work is aimed at reducing the complexity of ATM signaling,
improving the performance of ATM call processing, and improving ATM
as a general-purpose transport infrastructure.

The principal idea behind UNITE is a complete separation of
connectivity from quality of service, or more generally, service
attributes. The connectivity setup message is reduced to a single
ATM cell, with fixed field sizes and positions, avoiding the overhead
of reassembly and segmentation on the signaling channel and allowing
it to be fully processed in hardware.
Exploiting per-VC queueing,
data can be forwarded immediately after a one-hop exchange, rather
than suffering a full round-trip latency. However, we recognize
that not all switches are likely to have per-VC queues, and switches
may initially want to support connection establishment in software.
For this reason UNITE accommodates both software processing and
FIFO switches using a marker/marker-acknowledgment protocol between
switches. UNITE reduces connection setup cost sufficiently that
establishing connectivity becomes comparable to forwarding
and populating a cache in a router. A UNITE switch can therefore
reasonably be expected to set up new connections at a rate competitive
with routing in a connectionless network. Conversely, IP-type
best-effort data flows suffer a sufficiently small delay penalty for
establishing a connection over the ATM infrastructure that it becomes
viable to set up a connection even for the shortest of flows. Thus,
UNITE is ideally matched to carry Internet traffic (IP) over ATM
networks.

UNITE uses in-band messages for QoS establishment. It builds on
the extensive work done for QoS in ATM networks, including the
specification of classes of service, admission control and related
issues such as conformance and policing. Because the QoS messages
are sent on the established VC, we can exploit parallelism to improve
the throughput and latency for QoS establishment.

In part due to its simplicity, UNITE supports both source- and
destination-initiated QoS, supports multipoint-to-multipoint
connections and recognizes the possible need for variable QoS to
different participants [i, ii] (variegated multicast trees).

UNITE has been implemented in a software prototype. Early
performance measurements confirm our expectations for a higher
signaling throughput and lower call setup latency.
In the next
section we describe UNITE's connection setup for best-effort
connections. Subsequently, in Section 3, we describe UNITE's support
for multicast. In Section 4, we provide details of UNITE's QoS
management, and then deal with interoperability issues, both with UNI
and with existing switches. Section 7 summarizes the benefits
of UNITE and then we conclude. Finally, in Section 9, we briefly
consider the applicability of UNITE in an MPLS environment.

2. UNITE Connection Setup

UNITE uses a separate, initial mechanism for setup of connectivity to
enable a fast connection setup. This is shown in Figure 1.

Figure 1: The UNITE Connection Setup

The calling station issues a micro-setup, which is a single cell, on
the signaling channel. It includes all the information necessary
to establish a best-effort connection with a remote called station.
The switch that receives the micro-setup determines the route (based
on the destination address and a broad QoS class identification)
so as to forward the micro-setup on the correct output port. After
allocating VC resources on the upstream link for the connection
and forwarding the micro-setup, the switch returns a single-cell
micro-acknowledgment to the upstream node on the signaling channel.
If appropriate, the switch may also allocate per-VC buffers at
that time. On receiving the micro-ack, the upstream
node transmits a single-cell marker on the established VC (in-band).
This marker serves as the third step of a three-way handshake.
After transmitting the marker, the upstream node may
transmit data on the VC, on a best-effort basis. The above sequence
of steps is repeated at each hop. Virtual Circuits established
are bi-directional, with VC-ids allocated in the conventional
manner by switches.
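
The per-hop exchange described above (micro-setup, micro-ack, in-band marker) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class and function names, the starting VC number 32, and the string message labels are hypothetical and are not defined by UNITE. The DISCARD and FORWARD state names follow the processing steps listed in Section 2.1. The sketch also serializes the hops, whereas a real switch forwards the micro-setup downstream before returning its micro-ack, so the hops overlap in time.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Hop:
    """One downstream node (switch or called station) on the path."""
    name: str
    # Hypothetical VC allocator; the starting value 32 is arbitrary.
    _next_vc: count = field(default_factory=lambda: count(32))
    # VC id -> "DISCARD" or "FORWARD"
    vc_state: dict = field(default_factory=dict)

    def on_micro_setup(self, flow_id):
        """Allocate a VC on the upstream link and answer with a micro-ack."""
        vc = next(self._next_vc)
        # Drop any stale cells on this VC until the marker arrives.
        self.vc_state[vc] = "DISCARD"
        return ("MICRO_ACK", flow_id, vc)

    def on_marker(self, vc):
        """The in-band marker completes the three-way handshake on this hop."""
        self.vc_state[vc] = "FORWARD"


def setup_path(flow_id, hops):
    """Run the handshake hop by hop and return the message trace."""
    trace = []
    upstream = "caller"
    for hop in hops:
        trace.append((upstream, hop.name, "MICRO_SETUP"))
        _, _, vc = hop.on_micro_setup(flow_id)
        trace.append((hop.name, upstream, "MICRO_ACK"))
        hop.on_marker(vc)
        trace.append((upstream, hop.name, "MARKER"))
        upstream = hop.name
    return trace
```

Calling setup_path("flow-1", [Hop("switch1"), Hop("called")]) yields six trace entries, three per hop, and leaves the VC allocated at each hop in the FORWARD state.
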
While we believe we can accommodate multiple
address formats, we are currently using existing NSAP addresses
and address allocation methodologies. We assume the existence of
link-layer management, such as ILMI. The micro-setup is routed to
the destination on a hop-by-hop basis, using routing tables that
are set up based on existing PNNI information dissemination and
route computations. We also use existing cell formats and currently
defined AAL5 framing.

The commitment provided by the connection is that data is transmitted
on a best-effort basis. Since the QoS class information is provided
in the micro-setup, the path selected even for the best-effort
connection may be chosen on a more informed basis than pure
best-effort with no a priori knowledge. Data may begin flowing from
an upstream node to the downstream node immediately upon completion
of the micro-setup on that hop. The latency suffered by a best-effort
flow to use the connection-oriented nature of ATM is thus only a
single-hop round-trip propagation delay, plus the time needed to set
up state on that hop. Data is buffered on a switch (with per-VC
buffers) when it arrives on a VC id for which a forwarding table
entry has yet to be set up. In a subsequent section of this draft, we
describe methods to accommodate FIFO switches, and switches in which
the processing of signaling messages is performed in software. This
is enabled by the use of an optional marker-acknowledge, which allows
a downstream switch (or node) to require the upstream switch (or
node) to delay transmitting data until it is ready to receive data.
To ensure that no persistent loops form, UNITE uses a combination of
a unique Flow-ID for the connection and an end-end acknowledgment.
When the destination receives the micro-setup, it sends an in-band
(on the established VC) end-end ack to the source.
This indicates to
the source that a loop-free path has been established. Only after
receiving the end-end ack may the source issue a RELEASE, at whatever
later time it needs to. Issuing a RELEASE prior to receiving
the end-end ack may erase the Flow-ID maintained at a switch. This
is undesirable because the switch would then be unable to recognize a
micro-setup that comes back as a result of a loop. The combination of
the unique Flow-ID and holding back the RELEASE until the end-end ack
is received enables us to avoid loops.

2.1. Establishing connectivity, Phase 1: The Micro-setup

The micro-setup and the associated micro-acknowledgment are sent on
a well-known signaling VC. The processing of the micro-setup at the
switch includes the following functions, in order:

1. A route lookup for the micro-setup, identifying the port on which
to forward the micro-setup.

2. Allocation of a VC from the VC address space on the upstream
link. We assume that all connections are created bi-directional (to
minimize the overhead of both ends establishing connections).

3. Allocation of a reasonable amount of buffering at the switch for
that VC, if appropriate.

4. Initiating an ACK-timer for the micro-setup. This timer is for
error recovery when the micro-setup is lost or when the downstream
switch does not successfully progress the micro-setup.

5. Forwarding the micro-setup downstream on the link that the
route-lookup function determined as the best path towards the
destination end-system.

6. Marking the incoming VC state as DISCARD, so that the switch
discards all incoming cells on this VC. This enables us to clear
previously buffered cells for the upstream link on the newly assigned
VC, if there are any. The VC state transitions to the FORWARD state
subsequently, when a MARKER acknowledging the ACK is received.

7. Finally, returning a VC id to the upstream switch in the
micro-ACK. The upstream node may begin transmitting data on receipt
of the micro-ACK. The forwarding of the data to the downstream next
hop has to await the completion of the processing at the next-hop
switch and the return of a corresponding VC id for the flow.

We have chosen to provide reliable delivery within the UNITE
signaling framework itself, rather than layering it on top of
another reliable transport mechanism. Current ATM UNI signaling
uses a reliable transport protocol, SSCOP, for transporting signaling
messages. This re-incorporates some of the overhead for processing a
signaling message, and makes it difficult to implement in hardware.
The 3-way handshake obviates the need for a reliable transport for
carrying signaling messages.

A simple, efficient encoding of the setup is vital: we use a single
cell for the micro-setup, with only essential components in it, thus
allowing for hardware implementation. In addition, it allows for
distributed call setup to be implemented in a switch (especially
important when there are a large number of ports). The micro-setup
uses a unique end-to-end Flow-id. All control exchanges use this
Flow-id. The micro-setup also indicates whether the call is unicast
or multicast capable. Multicast and unicast connections have nearly
identical mechanisms for both connection setup and QoS setup.

UNITE adopts hop-by-hop routing of the micro-setup, in contrast
to the traditional source-routing used in ATM's PNNI routing
protocols. However, source-routing has been used to avoid loops
in connection-oriented networks. Since UNITE's Flow-id is a unique
end-to-end call-reference identifier, it may be used to detect
loops.
A duplicate micro-setup with the same Flow-id that is not a
retransmission (or that arrives on a different port than the port
the earlier copy was received on) indicates a
routing loop. UNITE suppresses multiple micro-setups (a mechanism
we also use for multicast connections in normal operation). A
controller might also send a release in the backward direction for
the Flow-id (or allow timers to subsequently close the connection).
This mechanism, along with the rule that the source issues a RELEASE
only after it has received the end-end acknowledgment, ensures that a
successful connection does not contain a loop. Routing loops are
mostly transient inconsistencies in routing tables, which we expect
to be corrected by subsequent updates as part of the normal operation
of the routing protocols.

The micro-setup being a single cell allows the switch to avoid
re-assembly and segmentation. In addition, all of the requirements
to keep the cells in sequence may be ignored: a micro-setup cell
may be served in any order, relative to the others. Thus, we could
choose to process the micro-setup for some classes in hardware,
and others in software, if so desired. Furthermore, it allows for
a truly distributed implementation of the micro-setup because there
is no need for a single point of re-sequencing the cell streams for
signaling messages arriving on different ports. A fixed-format
micro-setup cell also assists hardware implementations.

The fields of the micro-setup cell are as follows, with reference to
Figure 2:

1. Flow-id (8 bytes) - A unique (end-to-end) Flow-id identifying the
micro-setup from the source. This comprises two sub-fields:

a) A unique source identifier (6 bytes). For example, this could be
the host Ethernet address, which is unique through the use of an
address ROM.

b) A source-unique sequence number (2 bytes).

2. Type (1 byte) - type of signaling cell. Includes a Retransmit
bit.

3. QoS Class (1 byte) - for minimal QoS-sensitive routing.
(Potentially broken up into a nibble for class definition and a
nibble for specification of the size of the dominant parameter for
that class.)

4. Reserved (1 byte) - for future use. Anticipating the potential
use of a Virtual Private Network Identifier, we could include 3 bytes
for a VPN ID by removing the User-User Information byte from the AAL5
trailer. The use of such a VPN ID is for further discussion.

5. Protocol ID (5 bytes) - allows the caller to specify the network
layer entity addressed at the called station and eliminates a need
for a second exchange to establish this connectivity. SNAP encoding
is assumed by default. The 5 bytes include the OUI and PID fields.

6. Destination Address (20 bytes) - destination NSAP address.

7. A VPI/VCI that is assigned by the upstream node for the
connection when it is appropriate. This is determined by which
end of a link is supposed to allocate the VPI/VCI value for a new
connection, just as in the current convention.

8. AAL5 Trailer (8 bytes) - the standard ATM AAL5 trailer including
the CRC and length. In addition, of course, there is the 5-byte ATM
cell header. The VC id on which the micro-setup is transmitted is a
common, well-known signaling VC.

A switch maintains a timer associated with the micro-setup that
has been transmitted to the downstream hop. This timer is cleared
upon receiving the ACK from the downstream switch. A switch that
has timed out after transmission of the micro-setup retransmits the
micro-setup request. The retransmitted micro-setup is identical to
the previous one except for a retransmit bit in the type field. As a
result it can be retransmitted by hardware.

2.2. Establishing connectivity, Phase 2: The ACK for the Micro-setup

Upon successful processing of the micro-setup, the
micro-acknowledgment of the connection setup is returned upstream to
the previous switch or host. The information provided has to be
adequate for appropriate processing at the upstream switch or the
original host requesting the connection. The downstream switch
maintains a timer associated with the micro-ACK transmitted upstream,
for retransmitting micro-ACKs. (This timer is cleared when the MARKER
is received in the third phase of the micro-setup and therefore also
protects against loss of the MARKER.) The micro-ACK has the following
fields:

1. Flow-id (8 bytes): the Flow-id received in the micro-setup, to
enable the upstream node to match this ACK to the request.

2. VPI/VCI returned for the request (3 bytes).

3. The NSAP address, identical to the one received in the
micro-setup.

4. A bit to indicate to the upstream node whether or not a
Marker-Acknowledge is to be expected before transmitting
data. A second bit is used to inform the upstream switch whether
it should delay sending its marker-acknowledge until it has received
one from downstream (refer to Section 5.2).

2.3. Establishing connectivity, Phase 3: The Use of a Marker

The final part of the 3-way handshake on the hop-by-hop micro-setup
is an in-band MARKER. The MARKER serves not only to acknowledge the
micro-ACK message, but is also essential to mark the beginning of the
new data flow. The use of the 3-way handshake ensures that both
the upstream node and the link are flushed of old data related to
the newly allocated VC id by the time the downstream switch
receives the MARKER. The 3-way handshake also allows for recovery
from loss of the micro-ACK. The MARKER is the first cell sent
in-band by the upstream node.
Everything that follows this marker is a
valid data cell for the new flow. The MARKER includes the Flow ID,
the NSAP address of the connection initiator (source), and a bit to
indicate whether this copy of the MARKER is a retransmission.
The switch controller may, for example, use the source NSAP address
for functions such as VC re-routing or generating an out-of-band
RELEASE.

The upstream node, after sending the MARKER, sends data on this VC id
if the downstream node has not requested that a Marker-Acknowledge
is to be expected. If a Marker-Acknowledge is to be expected,
then the upstream node transmits data only after receiving the
Marker-Acknowledge.

3. Call Setup for Multicast

UNITE incorporates multipoint-to-multipoint
communication [iv] as an integral part of the signaling architecture.
Point-to-multipoint multicast calls are
simple sub-cases of this overall multicast architecture. The
difference between a unicast call and a multicast call is that
the micro-setup issued indicates that the call is potentially a
multicast call. For the purposes of this discussion we assume that
the underlying network forwarding mechanism can manage issues such as
cell interleaving [iv]. We therefore describe procedures that are
applicable to core-initiated joins (for core-based trees [v,vi]);
the procedures for source-initiated joins on a source-based tree are
similar. We then describe leaf-initiated joins for other participants
that join after the call has been set up [vii,viii].

3.1. Core/Source Initiated Joins

Core-initiated joins (or source-initiated joins) are relevant when
the set of participants is known initially. The core issues a
micro-setup knowing the NSAP address of each individual participant.
Since there is no way to package more than one NSAP address in the
micro-setup, an individual micro-setup is issued for each of the
participants. We do not consider this a significant drawback, because
(a) the micro-setup is relatively cheap and (b) the number of
participants that subsequently join using leaf-initiated joins may
dominate.

The first micro-setup issued to a participant includes a label (in
the Type field) to indicate that it is a multicast-capable call
setup. The rest of the micro-setup is similar to that described
for a unicast call. The Flow-id is determined by the originator
(i.e., the core or sender). The Flow-id acts as a call-reference
identifier for the multicast call. The micro-setup issued for
joining subsequent participants uses the same Flow-id, again labeled
as multicast. The micro-ACK that comes back from the downstream
hop returns a VC id as with unicast calls. The MARKER transmitted by
the core (or source) is sent in-band, on the VC id returned in the
ACK.

The Flow-id used in the micro-setup is retained at the switch, as a
means of identifying the multicast call. During joins, the switch
sending the micro-setup maintains state, which includes the Flow-id
and the destination NSAP address to which the setup was issued (the
new leaf). This way, ACKs that return for the individual setups
issued may be matched up by the switch, for purposes of managing
their retransmission.

Figure 3: Core initiated join. Observe that the marker on the last
hop to B is generated by the controller at the branch point.

The initiator of the micro-setup (core or source) sends the MARKER
when it receives the first micro-ACK. Upon receiving subsequent
micro-ACKs, the source/core knows that the VC is already open
(operational) and therefore does not generate a further MARKER.
At a
new branch point on the multicast tree, however, a MARKER is required
to the new destination, because that branch of the tree
needs to be flushed of any old data that currently exists
for that VC identifier. The controller is responsible for generating
and sending this in-band MARKER. Subsequently, data may be forwarded
on that VC id, as a result of a proper 3-way handshake. Figure 3
illustrates this scenario.

3.2. Leaf Initiated Joins

Figure 4: Leaf Initiated Join. A's LIJ is progressed to Switch 4;
B's LIJ is suppressed at Switch 2.

The mechanisms for Leaf Initiated Joins (LIJ) are similar to those
suggested in the conventional ATM Forum UNI 4.0. However, instead of
having separate LIJ and Add-Party mechanisms, UNITE uses the same
micro-setup mechanism for performing a LIJ. Consider Figure
4, where two participants A and B wish to join the multicast tree
that currently ends at Switch 4. The LIJ is a micro-setup (the Type
indicator shows that this is a LIJ for a multicast call) from
one of the participants that is directed towards the core/source,
using the NSAP address corresponding to the core/source. The Flow ID
used in the micro-setup is the multicast call reference identifier,
and is stored at the switches as the micro-setup is forwarded
upstream towards the core. We assume that the underlying call
routing mechanisms direct the micro-setup towards the source/core
in accordance with the appropriate criterion (e.g., shortest path
or least cost). When a LIJ arrives at a switch from another
participant, such as B, the Flow ID is recognized as already existing
at the switch, and the forwarding of B's micro-setup is suppressed.
This may be done only if the core does not wish to be notified of
the address of each individual leaf that joins.
Note that this happens
even though the LIJ of the first participant added on this branch
has not yet reached the tree at Switch 4. When the micro-setup
from A is issued, the 3-way handshake results in the marker being
forwarded by the switches upstream. This effectively opens up the
VC from node A up to the branching point, at Switch 4. Along
with the suppression of the micro-setups, subsequent markers are also
suppressed at the switches.

4. Details of QoS Management

The second part of UNITE is a separate means of full QoS
specification and negotiation. This allows both a very flexible
QoS management process and the ability to incorporate QoS
renegotiation with ease. As discussed in the previous sections, the
micro-setup includes a QoS byte that can be used in the original
connection setup to support coarse or aggregate-level QoS (e.g.,
by allowing some differentiated decision for the forwarding of the
micro-setup). UNITE supports detailed QoS signaling (or full QoS
signaling) that is performed in-band on the already established
best-effort VC. We anticipate that a large subset of flows will
not use the additional phase of a QoS setup for establishing a
distinctive quality of service. The QoS class specification that is
provided in the initial micro-setup may be adequate for a reasonably
large subset of best-effort flows (e.g., a large class of TCP/IP and
UDP/IP flows carrying non-real-time data clearly do not need a
subsequent QoS setup phase). Similarly, well-understood real-time
flows such as uncompressed telephony traffic (mu-law, a-law) may be
adequately specified as being delay-sensitive. The assured QoS for
the flow begins after the QoS negotiation completes, end-to-end,
in a similar fashion to the conventional UNI QoS setup.
Also, we believe that most of the more sophisticated QoS management will be handled in software, as is the case in the current UNI framework. However, UNITE's framework provides more flexible and efficient QoS management in the following dimensions:

1. UNITE QoS requests may be initiated by the source or the destination of the original best effort connection setup. In the more general case of multicast connections, QoS requests may be source/core initiated or leaf initiated.

2. UNITE QoS in-band signaling allows QoS renegotiation originating from any of the connection end points.

3. UNITE QoS in-band signaling enables potentially different QoS negotiation modalities and implementations taking advantage of parallelism in the processing of the QoS setup across multiple switches in the end-to-end path.

Figure 5: Protocol for Establishing QoS in UNITE.

For those flows that require a detailed QoS negotiation, we use the process of QoS setup described in Figure 5. The QoS request may immediately follow the marker, as shown in Figure 5, or may be submitted after the call is established. The receiver, after processing the request, sends a QoS Commit that commits the reservation. To adjust over-committed reservations, and to confirm the QoS reservation to the receiver, the originator sends a QoS Ack. The delay until a QoS flow begins on the forward path is an end-to-end round-trip plus the processing at the destination. On the reverse path, a confirmed QoS flow begins one round-trip after the QoS Commit is issued from the destination. For compatibility with existing ATM, we anticipate that the QoS Request, Commit and Ack would be encoded as in the UNI connection setup and connect messages, as far as the QoS information is concerned. For our purposes in this section, we treat the end-system that initiates the QoS setup request as the QoS source.
The end-system that responds to the QoS setup request at the other end is the QoS destination. During the QoS negotiation, data may still flow on the connection on a best-effort basis. Cells that belong to the QoS negotiation message are marked with a distinct Payload-Type Indicator (PTI), possibly as RM cells, so that they may be directed to the controller on the switch. Thus, QoS signaling cells and data cells (or messages) may be interleaved, because their PTI values are distinct from one another.

Figure 6: Three Way QoS Setup.

Various alternatives for detailed QoS negotiation can be considered here, including the conventional three way setup described in Figure 6, and one which is consistent with the RSVP-like signaling proposed for IP networks. With reference to Figure 6, the QoS request is multicast to all switch controllers in the path and to the next link at each switch, facilitating parallel processing in the controllers (1). The Commit message traverses the reverse path, slaloming to every controller, collecting the commitments (2). The QoS Ack multicasts the commitment to all controllers (3).

In UNITE a QoS request may be initiated by any participant of a multicast: the core (if present), the source or a leaf. Moreover, unless otherwise dictated by higher level policies, core/source and leaf initiated QoS may all be used at different times for the same multicast. As an illustration of the potential of UNITE, we describe the case of a Leaf Initiated QoS request by referring to Figure 7.

The leaf initiated QoS requests carry the demand from the receivers upstream. When the QoS request arrives at a switch, the demand is noted at the switch. The switch conveys upstream the maximum of all the demands observed from the different branches (a leaf node or a switch may be at the end of the branch).
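
The per-switch bookkeeping for the demand phase can be sketched as follows. This is only an illustration under stated assumptions: demands are modelled as plain numbers (for example, a bandwidth figure), and the class and method names are hypothetical. A request is passed upstream only when it raises the maximum that has already been conveyed.

```python
class BranchDemandTable:
    """Per-call state at one switch: the demand noted on each branch.

    Hypothetical helper; UNITE does not define this structure.
    """

    def __init__(self):
        self.demand = {}          # branch id -> last demand noted on it
        self.sent_upstream = 0.0  # largest demand conveyed upstream so far

    def on_qos_request(self, branch, demand):
        """Note a demand; return the value to send upstream, or None."""
        self.demand[branch] = demand
        aggregate = max(self.demand.values())
        if aggregate > self.sent_upstream:
            # The maximum over all branches grew, so tell the upstream node.
            self.sent_upstream = aggregate
            return aggregate
        # Already covered by an earlier request: nothing is sent.
        return None


table = BranchDemandTable()
print(table.on_qos_request("branch-1", 2.0))  # 2.0 is conveyed upstream
print(table.on_qos_request("branch-2", 1.0))  # None: below the current maximum
print(table.on_qos_request("branch-2", 5.0))  # 5.0 is conveyed upstream
```

Keeping one entry per branch means leaves can issue their requests at different times without the switch losing track of earlier demands.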

Note that different leaves may issue their QoS requests at different times. The switch examines each QoS request and transmits a request upstream only if the QoS request is higher than the current maximum. When the demands arrive at the core/sender, the permit returned is the minimum of the offered capacity, the demands received from the leaves and the available link capacity. Note that each switch needs to maintain state, namely the demand and the permit returned for each branch of the multicast call. The leaf may be a source or a receiver, requesting a QoS on a shared distribution tree (e.g., CBT).

Figure 7: Multicast QoS. Leaf initiated QoS: a) demand phase, b) permit phase.

5. Interoperability Issues

In this section, we describe how to use UNITE with existing switches, including software based implementations and FIFO switches.

5.1. Interoperability with existing Switches

The proposed UNITE protocol discussed in Section 2 assumes that switches will be able to do per-VC queueing and furthermore will be able to handle the processing of the Marker in-band. Processing of the Marker involves changing the state of the per-VC queue so that arriving cells are buffered rather than dropped. (This ensures that valid data cells, which might follow the marker back-to-back, will be queued, while any invalid cells, e.g., cells in flight from an erroneous connection, will be dropped.) Current ATM switches do not necessarily provide these capabilities, however, and it is crucial that UNITE can still function on such legacy switches. An extra (but optional) Marker-Acknowledge message is introduced to deal with these issues.

If a switch is processing the Marker in software, it cannot guarantee that the queue state will change from discard to queueing in time to cater for valid data cells following the Marker.
In fact, because of different switch architectures and implementations, the amount of time it takes to process the Marker in software will vary greatly. An upstream node therefore has no way of knowing how long to delay before it can start forwarding data cells. This problem is solved by having a downstream node send the Marker-Acknowledge message only when it is ready to receive data from the upstream node. An illustration of the optional use of the Marker-Acknowledge is given in Figure 9. A downstream node indicates in the micro-acknowledgment message whether it requires the upstream node to wait for a Marker-Acknowledge or not. The Marker-Acknowledge mechanism can therefore be used on a hop-by-hop basis as dictated by local switch capabilities. When a downstream node has requested the use of the Marker-Acknowledge message, the upstream node starts a timer when it sends the Marker downstream. This timer is cleared when the Marker-Acknowledge message is received from downstream; if the timer expires, the Marker is retransmitted. Note that the penalty for using the Marker-Acknowledge is two hop round-trips' worth of delay, as opposed to one hop round-trip in the ideal case. The hop-by-hop nature of the original protocol is, however, maintained.

The Marker-Acknowledge message is also used to cater for FIFO switches, as illustrated in Figure 10. A FIFO switch will not be able to buffer data cells until it receives an acknowledgment from downstream. (Indeed, some FIFO switches might not even be able to accept cells into the switch without having received the outgoing VCI from the downstream switch.) A FIFO switch will then simply delay sending the Marker-Acknowledge until it is capable of forwarding data cells. This in itself is, however, not enough.
If the upstream switch is itself a FIFO switch, then the second FIFO switch also has to indicate to the upstream switch that it should not send a Marker-Acknowledge message upstream until it has received a Marker-Acknowledge message from downstream. (In the non-FIFO case described above, a switch can send a Marker-Acknowledge message upstream as soon as it is capable of receiving data. If both the upstream and the downstream switches are FIFO switches, however, the upstream switch should wait until the downstream switch is capable of receiving data.) The Acknowledgment message therefore also needs to indicate to the upstream node whether it should wait for a downstream Marker-Acknowledge before it can send its Marker-Acknowledge upstream. (If the upstream node is not a FIFO switch and is capable of buffering data, it can simply ignore this indication in the Acknowledgment message.)

Figure 9: Use of the Optional Marker Acknowledge to enable downstream switches to control upstream switch transmission of data until they are ready.

This approach has the effect of changing the hop-by-hop delay of the UNITE protocol into a partial end-to-end delay across consecutive FIFO switches. (A setup request will proceed, with data following, on a hop-by-hop basis until a FIFO switch or switches are reached. Forwarding of data will then be delayed until the last FIFO switch in the sequence is ready to receive the data.)

6. CONSIDERATIONS ON UNITE IMPLEMENTATION

The fundamental features of UNITE, namely the separation of connectivity from full QoS processing, the specification of single cell signaling messages and the simplified reliability support via timers and retransmission of basic messages, enable a broad range of implementation scenarios for UNITE.

At one extreme, UNITE may be implemented completely at the software level.
The only functionality required at the hardware level is the ability to recognize in-band control cells used for UNITE signaling arriving at the switch ports, and to route such cells to the switch controller. In the most basic software implementation, per-VC queuing would not be required, and early data transmission (before end-to-end acknowledgment) may not be supported. We believe that, while the full latency improvement potential of UNITE is not achieved with such an implementation, significant improvement in call processing capacity as well as fairness improvements may indeed be achieved.

Figure 10: Use of the Marker Ack with a sequence of FIFO switches.

At the opposite extreme in the range of implementations of UNITE is the scenario where as much call processing functionality as possible is implemented in hardware, most likely located in the switch line cards and host adapters, and the switch supports advanced queuing and scheduling schemes. In this scenario the full potential of UNITE may be manifested, with close to a single hop round trip latency before the inception of data transmission, and vast call setup capacity increases for best effort or basic QoS calls. Such capacity increases are naturally scalable with the switch port capacity and the number of ports on the switch, thanks to the distributed nature of the implementation enabled by UNITE. We believe that a call processing capacity of several thousand calls per second per OC-3 port is feasible within a UNITE framework. Note that even in an advanced implementation the full QoS call processing would be handled at the software level.

It is reasonable to conceive a UNITE implementation in which the port processing modules on the port cards support the following functions in hardware:

1. Capture/injection of UNITE signaling cells.

2. Management of timers, retransmissions and state changes in the call processing state machine.

3. Forwarding of micro-setup to the correct outgoing link, based on fast address lookup.

4. Allocation of incoming labels (i.e., incoming/outgoing VPI/VCI and Tags used for routing through the fabric) out of local label pools managed (on longer time scales) by the central switch controller.

5. Basic QoS support. This may imply forwarding and queueing/scheduling based on the QoS byte in the micro-setup.

6. Control of queue scheduling based on UNITE control messages received (e.g., blocking until a message is received).

A subset of the functionality listed above may also be implemented within the Adapter SAR ASIC, namely signaling control cell capture/injection and management of timers, retransmissions and state changes in the call processing state machine. The switch control processor would, in this scenario, be responsible for:

1. Monitoring and management of label pools allocated in real time by the Port Processing Modules.

2. Call accounting and monitoring.

3. Switch level resource management.

4. Full QoS call processing, including Call Admission Control and support of sophisticated bandwidth reservation schemes and management of appropriate scheduling schemes.

5. Initialization, monitoring of error conditions and switch level management.

A range of UNITE implementations falling in between the extremes described above can naturally be conceived, including the case of the current generation of switches supporting per-VC queuing in the switch fabric, but still handling UNITE control cells in software. Large signaling performance advantages could be gained in this case by exploiting the early data transmission feature of UNITE.
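
To give a feel for how small the timer and retransmission logic assigned to the port processing modules can be, the sketch below models the upstream side of the Marker / Marker-Acknowledge exchange from Section 5.1 as a three-state machine. It is an illustration only: the state names, the tick-based clock, the timeout of three ticks and the absence of a retry limit are all assumptions, since the draft does not specify them.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    WAIT_MARKER_ACK = "wait_marker_ack"
    OPEN = "open"  # data cells may now be forwarded downstream


class MarkerSender:
    """Upstream side of one hop; time advances only through tick()."""

    def __init__(self, send, timeout_ticks=3, wait_for_ack=True):
        self.send = send                  # callable that emits a cell
        self.timeout_ticks = timeout_ticks
        # Whether the downstream node asked for a Marker-Acknowledge
        # (signalled in its micro-acknowledgment).
        self.wait_for_ack = wait_for_ack
        self.state = State.IDLE
        self.remaining = 0

    def send_marker(self):
        self.send("MARKER")
        if self.wait_for_ack:
            self.state = State.WAIT_MARKER_ACK
            self.remaining = self.timeout_ticks  # start the timer
        else:
            self.state = State.OPEN              # data may follow at once

    def on_marker_ack(self):
        if self.state is State.WAIT_MARKER_ACK:
            self.state = State.OPEN              # timer is cleared

    def tick(self):
        if self.state is not State.WAIT_MARKER_ACK:
            return
        self.remaining -= 1
        if self.remaining == 0:
            self.send_marker()                   # timer expired: retransmit
```

Each hop needs only the current state and one counter, which is the kind of per-VC context that could plausibly be held on a line card.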

In order to explore the implications of a basic UNITE software implementation, we developed a UNITE prototype completely in software over a network including two ATM switches and two adapter cards. A picture of the prototype setup is shown in Figure 11. In the prototype, in-band UNITE signaling cells are supported by OAM cells. To evaluate the performance improvement with UNITE, we compared the performance of UNI 3.0 versus the prototype UNITE implementation. The tests used a mature UNI 3.0 implementation, a Radcom test box acting as a source and destination, and a Cabletron ATM Switch. The UNITE tests used two Sun workstations with Cabletron/Zeitnet ATM adapters.

One important issue for connection-oriented communication is the amount of memory used to keep state for each individual connection. At least comparatively, UNITE is efficient in using memory for VC state, using only 128 bytes per best-effort VC in our prototype. In contrast, UNI uses almost 1500 bytes for a best-effort VC. Thus, there is the potential for UNITE to support a much larger number of VCs on switch ports.

We measured the UNITE connection setup latency and throughput. Our preliminary results, using 100 microsecond clock granularity in our measurements, were as follows. The best-effort connection setup latency through an individual switch was 1.7 ms with UNITE. In comparison, a best-effort UNI connection took 10.9 ms. The various components of this service time are shown in Table 1. In terms of throughput, UNITE achieved approximately 700-800 calls/sec, while with UNI we got approximately 130 calls/sec. We believe that with some simple optimizations, UNITE could easily exceed 1000 calls/sec. We expect that even more substantial improvement could be achieved with UNITE with a streamlined/hardware implementation.

7. BENEFITS OF UNITE

In this section we summarize and reorganize, as a quick reference, the benefits of UNITE discussed in this Internet Draft.

1) Separating connectivity from QoS enables UNITE to:

   a) Achieve high throughput for establishing connections.

   b) Have a very low latency to begin data transmission because we do not have to wait for an end-end message exchange.

   c) Have throughput and latency for connection establishment be independent of the complexity of the QoS class and other service characteristics. Complex QoS specifications are allowed for those connections that need it.

   d) Support QoS establishment and renegotiation in a similar fashion. This enables simple ways to change parameters for flows.

   e) Allow for communication on a best-effort basis even upon failure of a QoS request.

2) UNITE is ideally matched to carry Internet traffic over ATM networks.

3) UNITE is optimized for distributed hardware implementation of signaling within a switch on a per-port basis.

   a) A single cell, fixed length, fixed format micro-setup allows for high-speed processing of the setup message.

   b) No single point of re-sequencing or SAR is needed, and no software stack such as SSCOP is required for supporting basic connection establishment.

   c) Reliability is achieved via simple timers and retransmissions that are easily implemented in hardware.

   d) State and context information for connectivity requires only a small amount of memory and can be kept in a distributed fashion, even on a line card.

4) Separation of connectivity and QoS and sending QoS related messages in-band allows the network to have QoS setup initiated by sources or destinations.

5) UNITE provides isolation of QoS negotiation to connections that require it.

   a) Multiple passes, complex QoS negotiation or other service characteristics may be allowed.

6) Supports a full range of multicast architectures, including multipoint-to-multipoint.

   a) QoS can adapt to the capabilities of the branches of the tree.

7) Builds on the QoS work done for ATM and IP.

   a) Accommodates a wide range of QoS models: UNI, RSVP and future evolution.

9) Builds on a substantial amount of the work done for PNNI for QoS-sensitive routing.

10) Allows communication on a path selected based on a coarse class specification. Hence even simple connectivity can be better than true best-effort.

11) Inter-works with existing UNI switches.

12) Allows for legacy switches and various levels of hardware implementation complexity.

8. SUMMARY

We have described a protocol for lightweight signaling. The key idea behind the protocol is the separation of connection establishment and QoS processing. This makes connection setup independent of QoS processing complexity, benefiting most flows and best effort flows in particular, as the channel becomes immediately available on a best effort basis. The separation allows all flow specific signaling, i.e., QoS messages, to be carried in-band, thus protecting the shared signaling resources. UNITE signaling unifies initial QoS setup and renegotiation, and supports source/core initiated QoS as well as receiver initiated QoS requests.

The connectivity setup message is a single ATM cell (called the micro-setup). The micro-setup and micro-acknowledgment are exchanged on a hop-by-hop basis on a signaling channel. By incorporating a minimal QoS class identification in the micro-setup request, UNITE has the ability to provide QoS sensitive routing. Data flow on the best-effort VC may begin without waiting for the setup to be completed over the entire end-end path.
The use of per-VC queuing permits the source to start sending data on a best-effort basis as soon as the connection has been set up on the next hop.

Subsequently, QoS setup requests and acknowledgments flow in-band on the best-effort VC that is initially set up. The QoS for the flow is assured upon completion of the end-end exchange of the QoS setup and ack. The complexity of QoS messages and their processing is isolated to those VCs requiring it, without impacting other VCs. In addition, it allows for the QoS requester to be either the source or destination of the connection. The architecture recognizes the need for multipoint-to-multipoint connections, and the possible need for variable QoS to different participants in the multicast group.

9. APPLICABILITY OF UNITE TO MPLS

As we stated earlier, this Internet Draft is based on an ATM Forum contribution and as such is written within an ATM context. However, we believe that UNITE might also be of value within the context of MPLS and have therefore decided to present it to the MPLS working group to solicit feedback.

Clearly, UNITE currently uses ATM addresses, to be applicable to ATM. However, we believe that the protocol could be used with IP addresses, with hop-by-hop forwarding of the micro-setup at the MPLS switches using conventional link-state routing, such as OSPF. Because of the inclusion of the QoS class in the micro-setup, we can take advantage of potential enhancements to IP to accommodate QoS-sensitive routing.

We believe that UNITE might be applicable to the following objectives of the MPLS working group. We reiterate these specific objectives below:

1. Specify standard protocol(s) for maintenance and distribution of label binding information to support unicast destination-based routing with forwarding based on label-swapping.

2. Specify standard protocol(s) for maintenance and distribution of label binding information to support multicast routing with forwarding based on label-swapping.

4. Specify standard protocol(s) for maintenance and distribution of label binding information to support explicit paths different from the one constructed by destination-based forwarding with forwarding based on label-swapping.

6. Specify a standard way to use the ATM user plane

   a) Allow operation/co-existence with standard (ATM Forum, ITU, etc.) ATM control plane and/or standard ATM hardware

   b) Specify a label swapping control plane

   c) Take advantage of possible mods/improvements in ATM hardware, for example the ability to merge VCs

7. Discuss support for QOS (e.g. RSVP)

UNITE is a framework that is efficient in providing connectivity, with very low latency for a source to begin transmitting data when using on-demand label distribution. An integral part of the framework is providing very flexible support for QoS, accommodating multiple QoS models including sender and receiver initiated QoS. Multicast support fits naturally in UNITE, with common procedures applicable for unicast and multicast (both source and receiver joins). UNITE allows the network to scale to large numbers of nodes because of the ability to support on-demand label distribution efficiently. Further, UNITE achieves scalability in the following dimensions:

1. UNITE can achieve high throughput for label distribution.

2. UNITE enables initiation of packet forwarding with low latency.

3. UNITE minimizes the amount of state needed in the network.

UNITE uses a QoS hint to route the setup. The explicit path established in this manner may therefore be different from "default" destination based forwarding because it uses QoS sensitive destination based routing.
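
The effect of the QoS hint on path selection can be illustrated with a toy next-hop lookup keyed on both the destination and the QoS class from the micro-setup. Everything concrete here is an assumption made for illustration: the table contents, the class values, the link names and the use of an exact-match destination key in place of a real routing lookup.

```python
DEFAULT_CLASS = 0  # assumed value meaning "plain best effort"

# (destination, QoS class) -> outgoing link; the entries are made up.
ROUTES = {
    ("10.1.0.0/16", DEFAULT_CLASS): "link-1",
    ("10.1.0.0/16", 5): "link-2",
    ("10.2.0.0/16", DEFAULT_CLASS): "link-3",
}


def next_hop(destination, qos_class):
    """Pick the outgoing link for a micro-setup.

    A class-specific route wins when one exists; otherwise the setup
    follows the default destination-based route. Returns None when the
    destination is unknown.
    """
    link = ROUTES.get((destination, qos_class))
    if link is None:
        link = ROUTES.get((destination, DEFAULT_CLASS))
    return link


print(next_hop("10.1.0.0/16", 5))  # link-2: differs from the default path
print(next_hop("10.2.0.0/16", 5))  # link-3: falls back to the default path
```

With such a table, two setups to the same destination can take different paths purely because their QoS class differs.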

As it is currently defined, UNITE is an ATM control protocol and is therefore directly applicable to objective (6).

UNITE also directly addresses QoS issues without making any assumptions about the specific QoS model that is used. For example, RSVP can be combined with UNITE to perform the actual resource reservations.

If particular MPLS switches do not support native IP forwarding, then the need for UNITE appears even more compelling, especially in an environment where services other than best-effort are provided (e.g., in a Diffserv type of environment, it would be wasteful to distribute labels for all service classes across the whole network).

10. References

[i] L. Zhang, S. Deering, D. Estrin, S. Shenker, "RSVP: A New Resource ReSerVation Protocol", IEEE Network Magazine, Sept. 1993.

[ii] Danny J. Mitzel, Deborah Estrin, Scott Shenker, and Lixia Zhang, "An Architectural Comparison of ST-II and RSVP", Proceedings of Infocom'94, 1994.

[iv] M. Grossglauser and K.K. Ramakrishnan, "SEAM: Scalable and Efficient ATM Multicast", Proceedings of IEEE Infocom'97, April 1997, Kobe, Japan.

[v] T. Ballardie, P. Francis, and J. Crowcroft, "Core Based Trees (CBT)", in Proc. ACM SIGCOMM'93, San Francisco, California, September 1993.

[vi] S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C. Liu, and L. Wei, "An Architecture for Wide-Area Multicast Routing", in Proc. ACM SIGCOMM'94, London, August 1994.

[vii] ATM Forum, "ATM User-Network-Interface (UNI) Signaling specification version 4.0", July 1996.

[viii] S. Deering, "Multicast Routing in Internetworks and Extended LANs", in Proc. ACM SIGCOMM'88, Stanford, California, August 1988.

Authors' Addresses

K. K. Ramakrishnan, Gisli Hjalmtysson, Kobus van der Merwe
AT&T Labs. Research
180 Park Avenue, Florham Park, N.J. 07932
kkrama@research.att.com, gisli@research.att.com, kobus@research.att.com
Phone: +1 973 360 8766
Fax: +1 973 360 8871

Flavio Bonomi, Sateesh Kumar, Michael Wong
CSI ZeitNet/Cabletron
5150 Great America Parkway, Santa Clara, CA, 95054
fbonomi@ctron.com, skumar@ctron.com, mwong@ctron.com
Phone: +1 408 565 9360
Fax: +1 408 565 6501