Internet Engineering Task Force                       K.K. Ramakrishnan
INTERNET DRAFT                                        Gisli Hjalmtysson
                                                    Kobus van der Merwe
                                                  (AT&T Labs. Research)
                                                          Flavio Bonomi
                                                          Sateesh Kumar
                                                           Michael Wong
                                                (CSI Zeitnet/Cabletron)
                                                            August 1998

           UNITE: An Architecture for Lightweight Signaling

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

(Note that this ID is also available in Postscript and PDF formats.)

Ramakrishnan et al.        Expires February 1999               [Page i]

Internet Draft                   UNITE                      August 1998

Abstract

Communication networks need to support a wide range of applications with diverse service quality requirements. The current widespread use of best-effort communication also suggests that the overhead for establishing communication, both in processing and in latency, needs to be kept to a minimum. With ATM signaling, every flow, including a best-effort flow, suffers the overhead of end-to-end connection establishment. ATM signaling complexity is further exacerbated by having variable length messages with a large number of information elements using a very flexible encoding, sent on a single control channel. The inclusion of QoS processing and connectivity in the initial setup of a connection requires sequential hop-by-hop processing.
Variable length messages involve both a single point of resequencing and relatively slow, software-based processing.

In recognition of these shortcomings, the MPLS working group has opted to use topology-driven label distribution as its default label distribution mechanism, while at the same time acknowledging the possible need for on-demand label distribution. We see these different approaches as points on a range of solutions and we do not wish to open a debate concerning the relative merits of each approach. However, we believe that if there is a need for on-demand label distribution, then there is a need to do this very efficiently. In this light we have decided to bring to the MPLS working group our architecture for lightweight signaling. While in its current form it is applicable to an ATM environment, we believe that it represents a step forward in the evolution of signaling for high speed networks. It holds the promise of processing signaling in hardware, thereby enabling a substantial speed-up of connection setup, so as to meet the needs of contemporary applications.

Our proposed lightweight architecture for ATM signaling is called UNITE. The fundamental philosophy of UNITE is the separation of connectivity from QoS control. This has the potential to eliminate the round-trip connection setup delay before initiating data transmission. Using a single cell with proper encoding, we avoid the overhead of reassembly and segmentation on the signaling channel. With fixed formats, we believe that a hardware implementation is feasible. Performing QoS negotiation in-band allows switches in the path to process QoS requests in parallel, facilitates connection-specific control policies, supports both sender-initiated and receiver-initiated QoS, and allows for uniform treatment of unicast and multicast connections.
Note on Applicability

This Internet Draft is based on an ATM Forum contribution and as such is written within an ATM context. However, we believe that the UNITE approach to signaling might also be of value within the context of MPLS and have therefore decided to present it to the MPLS working group to solicit feedback. We hope to extend and modify this Internet Draft to be applicable for on-demand label distribution in MPLS based on the feedback received.

1. Introduction

The goal of lightweight signaling is to reduce the penalty of connection setup, while supporting service guarantees. A lightweight signaling protocol should ideally support and enhance both connectionless and connection-oriented services. Because of a desire to foresee the signaling needs of any and all applications that are likely to use the network, current ATM signaling is complex and slow: multiple messages are required to set up a connection, and considerable processing is required to parse the complex signaling messages.

In this Internet Draft, we describe UNITE, a lightweight signaling protocol for ATM networks. We are motivated by the need to support more efficiently the data applications that typify current Internet traffic, while providing facilities to support applications that require stringent quality-of-service, such as telephony. Furthermore, this work is aimed at reducing the complexity of ATM signaling, improving the performance of ATM call processing, and improving ATM as a general purpose transport infrastructure.

The principal idea behind UNITE is a complete separation of connectivity from quality-of-service or, more generally, service attributes. The connectivity setup message is reduced to a single ATM cell with fixed field sizes and positions, avoiding the overhead of reassembly and segmentation on the signaling channel and allowing it to be fully processed in hardware.
Exploiting per-VC queueing, data can be forwarded immediately after a one-hop exchange, rather than suffering a full round-trip latency. However, we recognize that not all switches are likely to have per-VC queues, and switches may initially want to support connection establishment in software. For this reason UNITE accommodates both software processing and FIFO switches using a marker/marker-acknowledgment protocol between switches.

UNITE reduces connection setup cost sufficiently that establishing connectivity becomes comparable to forwarding and populating a cache in a router. A UNITE switch can therefore reasonably be expected to set up new connections at a rate competitive with routing in a connectionless network. Conversely, IP-type best-effort data flows suffer a sufficiently small delay penalty for establishing a connection over the ATM infrastructure that it becomes viable to set up a connection even for the shortest of flows. Thus, UNITE is ideally matched to carry Internet traffic (IP) over ATM networks.

UNITE uses in-band messages for QoS establishment. It builds on the extensive work done for QoS in ATM networks, including the specification of classes of service, admission control and related issues such as conformance and policing. Because the QoS messages are sent on the established VC, we can exploit parallelism to improve the throughput and latency for QoS establishment. In part due to its simplicity, UNITE supports both source-initiated and destination-initiated QoS, supports multipoint-to-multipoint connections, and recognizes the possible need for variable QoS to different participants [i, ii] (variegated multicast trees).

UNITE has been implemented in a software prototype. Early performance measurements confirm our expectations for a higher signaling throughput and lower call setup latency.
In the next section we describe UNITE's connection setup for best-effort connections. Subsequently, in Section 3, we describe UNITE's support for multicast. In Section 4, we provide details of UNITE's QoS management, and then deal with interoperability issues, both with UNI and with existing switches. Section 7 summarizes the benefits of UNITE and then we conclude. Finally, in Section 9, we briefly consider the applicability of UNITE in an MPLS environment.

2. UNITE Connection Setup

UNITE uses a separate, initial mechanism for setup of connectivity to enable a fast connection setup. This is shown in Figure 1.

Figure 1: The UNITE Connection Setup

The calling station issues a micro-setup, which is a single cell, on the signaling channel. It includes all the information necessary to establish a best-effort connection with a remote called station. The switch that receives the micro-setup determines the route (based on the destination address and a broad QoS class identification) in order to forward the micro-setup on the correct output port. After allocating VC resources on the upstream link for the connection and forwarding the micro-setup, the switch returns a single-cell micro-acknowledgment to the upstream node on the signaling channel. If appropriate, the switch may also allocate per-VC buffers at that time. On receiving the micro-ack, the upstream node transmits a single-cell marker on the established VC (in-band). This marker serves as the third step of a three-way handshake. Subsequent to transmitting the marker, the upstream node may transmit data on the VC, on a best-effort basis. The above sequence of steps is repeated at each hop. Virtual Circuits established are bi-directional, with VC-ids allocated in the conventional manner by switches.
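The per-hop exchange described above can be sketched in a few lines of Python. This is a minimal illustration only, not part of UNITE itself: the `Node` class, the message dictionaries, and the starting VC id of 32 are assumptions made for the example. For clarity the hops are run one after another; in the protocol a switch forwards the micro-setup downstream before its own micro-ack returns upstream, so the hops overlap in time.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    """A station or switch on the path (hypothetical model, not from the draft)."""
    name: str
    vc_ids: count = field(default_factory=lambda: count(32))  # assumed first free VC id
    vc_state: dict = field(default_factory=dict)              # vc id -> "DISCARD" or "FORWARD"

    def on_micro_setup(self, flow_id):
        """Step 1 received: allocate a VC on the upstream link, answer with a micro-ack."""
        vc = next(self.vc_ids)
        self.vc_state[vc] = "DISCARD"   # stale cells on this VC are dropped until the marker
        return {"type": "MICRO_ACK", "flow_id": flow_id, "vc": vc}

    def on_marker(self, vc):
        """Step 3 received: everything after the in-band marker is valid data."""
        self.vc_state[vc] = "FORWARD"


def setup_path(path, flow_id):
    """Run the three-way handshake on every hop and return the message trace."""
    trace = []
    for upstream, downstream in zip(path, path[1:]):
        trace.append((upstream.name, downstream.name, "MICRO_SETUP"))  # signaling channel
        ack = downstream.on_micro_setup(flow_id)
        trace.append((downstream.name, upstream.name, "MICRO_ACK"))    # signaling channel
        downstream.on_marker(ack["vc"])
        trace.append((upstream.name, downstream.name, "MARKER"))       # in-band, on the new VC
    return trace


path = [Node("caller"), Node("switch"), Node("called")]
trace = setup_path(path, flow_id=("caller", 1))
```

Each hop contributes exactly three messages to the trace, which reflects the point made in the text: the upstream node may start sending data after a single-hop exchange rather than an end-to-end round trip.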
While we believe we can accommodate multiple address formats, we are currently using existing NSAP addresses and address allocation methodologies. We assume the existence of link-layer management, such as ILMI. The micro-setup is routed to the destination on a hop-by-hop basis, using routing tables that are set up based on existing PNNI information dissemination and route computations. We also use existing cell formats and currently defined AAL5 framing.

The commitment provided by the connection is that data is transmitted on a best-effort basis. Since the QoS class information is provided in the micro-setup, the path selected even for the best-effort connection may be chosen on a more informed basis than pure best-effort with no a priori knowledge. Data may begin flowing from an upstream node to the downstream node immediately upon completion of the micro-setup on that hop. The latency suffered by a best-effort flow to use the connection-oriented nature of ATM is thus only a single-hop round-trip propagation delay, plus the time needed to set up state on that hop. Data is buffered on a switch (with per-VC buffers) when it arrives on a VC id for which a forwarding table entry has yet to be set up.

In a subsequent section in this draft, we describe methods to accommodate FIFO switches, and also the case where the processing of the signaling messages is performed in software. This is enabled by the use of an optional marker-acknowledge, which allows a downstream switch (or node) to require the upstream switch (or node) to delay transmitting data until it is ready to receive data.

To ensure that no persistent loops form, UNITE uses a combination of a unique Flow-ID for the connection and an end-end acknowledgement. When the destination receives the micro-setup, it sends an in-band (on the established VC) end-end ack to the source.
This indicates to the source that a loop-free path has been established. Only after receiving the end-end ack may the source issue a RELEASE, at whatever later time it needs to. Issuing a RELEASE prior to receiving the end-end ack may erase the Flow-ID maintained at a switch. This is undesirable because the switch would then be unable to recognize the micro-setup that may come back as a result of a loop. The combination of the unique Flow-ID and holding back the RELEASE until the end-end ack is received enables us to avoid loops.

2.1. Establishing connectivity, Phase 1: The Micro-setup

The micro-setup and the associated micro-acknowledgment are sent on a well-known signaling VC. The processing of the micro-setup at the switch includes the following functions, in order:

   1. A route lookup for the micro-setup, identifying the port on which to forward the micro-setup.

   2. Allocation of a VC from the VC address space on the upstream link. We assume that all connections are created bi-directional (to minimize the overhead of both ends establishing connections).

   3. Allocation of a reasonable amount of buffering at the switch for that VC, if appropriate.

   4. Initiating an ACK-timer for the micro-setup. This timer is for error recovery when the micro-setup is lost or when the downstream switch does not successfully progress the micro-setup.

   5. Forwarding the micro-setup downstream on the link that the route-lookup function determined as the best path towards the destination end-system.

   6. Marking the incoming VC state as DISCARD, so that the switch discards all incoming cells on this VC. This enables us to clear previously buffered cells for the upstream link on the newly assigned VC, if there are any. The VC state transitions to the FORWARD state subsequently, when a MARKER acknowledging the ACK is received.

   7. Finally, returning a VC id to the upstream switch in the micro-ACK.
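The seven steps above can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: the `Switch` fields, the dictionary-based messages, the `sent` log standing in for the physical links, the starting VC id, and the `ACK_TIMEOUT_S` value are all hypothetical names and values introduced for the example.

```python
from dataclasses import dataclass, field

ACK_TIMEOUT_S = 0.5  # assumed value; the draft does not specify the timer


@dataclass
class Switch:
    """Hypothetical switch state; field names are illustrative only."""
    routes: dict                                     # destination address -> output port
    next_vc: int = 32                                # assumed first free VC id
    vc_state: dict = field(default_factory=dict)     # (port, vc) -> "DISCARD" / "FORWARD"
    buffers: dict = field(default_factory=dict)      # (port, vc) -> queued cells
    ack_timers: dict = field(default_factory=dict)   # flow_id -> timeout in seconds
    sent: list = field(default_factory=list)         # (port, message) log standing in for links


def process_micro_setup(sw, setup, in_port):
    """Apply the seven steps, in order, to one received micro-setup."""
    out_port = sw.routes[setup["dest"]]                # 1. route lookup
    vc = sw.next_vc                                    # 2. allocate a VC on the upstream link
    sw.next_vc += 1
    sw.buffers[(in_port, vc)] = []                     # 3. per-VC buffering, if appropriate
    sw.ack_timers[setup["flow_id"]] = ACK_TIMEOUT_S    # 4. start the ACK timer
    sw.sent.append((out_port, setup))                  # 5. forward the micro-setup downstream
    sw.vc_state[(in_port, vc)] = "DISCARD"             # 6. discard cells until the MARKER
    micro_ack = {"type": "MICRO_ACK", "flow_id": setup["flow_id"], "vc": vc}
    sw.sent.append((in_port, micro_ack))               # 7. return the VC id upstream
    return micro_ack


def on_marker(sw, in_port, vc):
    """The MARKER acknowledging the micro-ACK moves the VC to FORWARD."""
    sw.vc_state[(in_port, vc)] = "FORWARD"


sw = Switch(routes={"dest-nsap": 2})
ack = process_micro_setup(
    sw, {"type": "MICRO_SETUP", "flow_id": "f1", "dest": "dest-nsap"}, in_port=1
)
```

Note that the micro-setup is forwarded downstream (step 5) before the micro-ACK is returned upstream (step 7), so downstream processing can begin while the upstream hop is still completing its handshake.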
The upstream node may begin transmitting data on receipt of the micro-ACK. The forwarding of the data to the downstream next hop has to await the completion of the processing at the next-hop switch and the return of a corresponding VC id for the flow.

We have chosen to provide reliable delivery within the UNITE signaling framework itself, rather than layering it on top of another reliable transport mechanism. Current ATM UNI signaling uses a reliable transport protocol, SSCOP, for transporting signaling messages. This re-incorporates some of the overhead for processing a signaling message and makes it difficult to implement in hardware. The 3-way handshake obviates the need for a reliable transport for carrying signaling messages.

A simple, efficient encoding of the setup is vital: we use a single cell for the micro-setup, with only essential components in it, thus allowing for hardware implementation. In addition, it allows for distributed call setup to be implemented in a switch (especially important when there are a large number of ports). The micro-setup uses a unique end-to-end Flow-id. All control exchanges use this Flow-id. Included in the micro-setup is whether the call is unicast or multicast capable. Multicast and unicast connections have nearly identical mechanisms for both connection setup and QoS setup.

UNITE adopts hop-by-hop routing of the micro-setup, in contrast to the traditional source-routing used in ATM's PNNI routing protocols. Source-routing has traditionally been used to avoid loops in connection-oriented networks. Since UNITE's Flow-id is a unique end-to-end call-reference identifier, it may be used to detect loops instead. When a duplicate micro-setup is received with the same Flow-id, without it being a retransmission (or on a different port than the port the earlier copy was received on), it indicates a routing loop. UNITE suppresses multiple micro-setups (a mechanism we also use for multicast connections in normal operation).
A controller might also send a release in the backward direction for the Flow-id (or allow timers to subsequently close the connection). This mechanism, along with the rule that the source issues a RELEASE only after it has received an end-end acknowledge, ensures that a successful connection does not contain a loop. Routing loops are mostly transient inconsistencies in routing tables, which we expect to be corrected by subsequent updates as part of the normal operation of the routing protocols.

The micro-setup being a single cell allows the switch to avoid re-assembly and segmentation. In addition, all of the requirements to keep the cells in sequence may be ignored: a micro-setup cell may be served in any order relative to the others. Thus, we could choose to process the micro-setup for some classes in hardware, and others in software, if so desired. Furthermore, it allows for a truly distributed implementation of the micro-setup because there is no need for a single point of re-sequencing the cell streams for signaling messages arriving on different ports. A fixed-format micro-setup cell also assists hardware implementations.

The fields of the micro-setup cell are as follows, with reference to Figure 2:

   1. Flow-id (8 bytes) - A unique (end-to-end) Flow-id identifying the micro-setup from the source. This comprises two sub-fields:

      a) A unique source identifier (6 bytes). For example, this could be the host Ethernet address, which is unique through the use of an address ROM.

      b) A source-unique sequence number (2 bytes).

   2. Type (1 byte) - type of signaling cell. Includes a Retransmit bit.

   3. QoS Class (1 byte) - for minimal QoS-sensitive routing. (Potentially broken up into a nibble for class definition and a nibble for specification of the size of the dominant parameter for that class.)

   4. Reserved (1 byte) - for future use.
      Anticipating the potential use of a Virtual Private Network Identifier, we could include 3 bytes for a VPN ID by removing the User-User Information byte from the AAL5 trailer. The use of such a VPN ID is for further discussion.

   5. Protocol ID (5 bytes) - allows the caller to specify the network layer entity addressed at the called station and eliminates the need for a second exchange to establish this connectivity. SNAP encoding is assumed by default. The 5 bytes include the OUI and PID fields.

   6. Destination Address (20 bytes) - destination NSAP address.

   7. VPI/VCI - a VPI/VCI that is assigned by the upstream node for the connection when it is appropriate. This is determined by which end of a link is supposed to allocate the VPI/VCI value for a new connection, as in the current convention.

   8. AAL5 Trailer (8 bytes) - the standard ATM AAL5 trailer including the CRC and length.

In addition, of course, there is the 5-byte ATM cell header. The VC id on which the micro-setup is transmitted is a common, well-known signaling VC.

A switch maintains a timer associated with the micro-setup that has been transmitted to the downstream hop. This timer is cleared upon receiving the ACK from the downstream switch. A switch that has timed out after transmission of the micro-setup retransmits the micro-setup request. The retransmitted micro-setup is identical to the previous one except for a retransmit bit in the Type field. As a result it can be retransmitted by hardware.

2.2. Establishing connectivity, Phase 2: The ACK for the Micro-setup

Upon successful processing of the micro-setup, the micro-Acknowledgment of the connection setup is returned upstream to the previous switch or host. The information provided has to be adequate for appropriate processing at the upstream switch or the original host requesting the connection.
The downstream switch maintains a timer associated with the micro-ACK transmitted upstream, for retransmitting micro-ACKs. (This timer is cleared when the MARKER is received in the third phase of the micro-setup and therefore also protects against loss of the MARKER.)

The micro-ACK has the following fields:

   1. Flow-id (8 bytes) - the Flow-id received in the micro-setup, to enable the upstream node to match this ACK to the request.

   2. VPI/VCI (3 bytes) - the VPI/VCI returned for the request.

   3. NSAP address - the same address as the one received in the micro-setup.

   4. A bit to indicate to the upstream node whether or not a Marker-Acknowledge is to be expected before transmitting data. A second bit is used to inform the upstream switch whether it should delay sending its own Marker-Acknowledge until it has received one from downstream (refer to Section 5.2).

2.3. Establishing connectivity, Phase 3: The Use of a Marker

The final part of the 3-way handshake on the hop-by-hop micro-setup is an in-band MARKER. The MARKER serves not only to acknowledge the micro-ACK message, but is also essential to mark the beginning of the new data flow. The use of the 3-way handshake ensures that both the upstream node and the link are flushed of old data related to the newly allocated VC id by the time the downstream switch receives the MARKER. The 3-way handshake also allows for recovery from loss of the micro-ACK. The MARKER is the first cell sent in-band by the upstream node. Everything that follows this marker is a valid data cell for the new flow.

The MARKER includes the Flow ID, the NSAP address of the connection initiator (source), and a bit to indicate whether or not this MARKER is a retransmission. The switch controller may, for example, use the source NSAP address for functions such as VC re-routing or generating an out-of-band RELEASE.
The upstream node, after sending the MARKER, sends data on this VC id if the downstream node has not requested that a Marker-Acknowledge is to be expected. If a Marker-Acknowledge is to be expected, then the upstream node transmits data only after receiving the Marker-Acknowledge.

3. Call Setup for Multicast

UNITE incorporates the functionality of multipoint-to-multipoint communication [iv] as an integral part of the signaling architecture. The simpler cases of point-to-multipoint multicast calls are sub-cases of this overall multicast architecture. The difference between a unicast call and a multicast call is simply that the micro-setup issued indicates that the call is potentially a multicast call. For the purposes of this discussion we assume that the underlying network forwarding mechanism can manage issues such as cell interleaving [iv]. We therefore describe procedures that are applicable to core-initiated joins (for core-based trees [v,vi]); the procedures for source-initiated joins on a source-based tree are similar. We then describe leaf-initiated joins for other participants that join subsequent to the call being set up [vii,viii].

3.1. Core/Source Initiated Joins

Core-initiated joins (or source-initiated joins) are relevant when the set of participants is known initially. The core issues a micro-setup knowing the NSAP address of each individual participant. Since there is no way to package more than one NSAP address in the micro-setup, an individual micro-setup is issued for each of the participants. We do not consider this a significant cost, because (a) the micro-setup is relatively cheap and (b) the number of participants that subsequently join using leaf-initiated joins may dominate.

The first micro-setup issued to a participant includes a label (in the Type field) to indicate that it is a multicast-capable call setup.
The rest of the micro-setup is similar to that described for a unicast call. The Flow-id is determined by the originator (i.e., the core or sender). The Flow-id acts as a call-reference identifier for the multicast call. The micro-setup issued for joining subsequent participants uses the same Flow-id, again labeled as multicast. The micro-ACK that comes back from the downstream hop returns a VC id as with unicast calls. The MARKER transmitted by the core (or source) is sent in-band, on the VC id returned in the ACK.

The Flow-id used in the micro-setup is retained at the switch as a means of identifying the multicast call. During joins, the switch sending the micro-setup maintains state, which includes the Flow-id and the destination NSAP address to which the setup was issued (the new leaf). This way, ACKs that return for the individual setups issued may be matched up by the switch, for purposes of managing their retransmission.

Figure 3: Core initiated join. Observe that the marker on the last hop to B is generated by the controller at the branch point.

The initiator of the micro-setup (core or source) sends the MARKER when it receives the first micro-ACK. Upon receiving subsequent micro-ACKs, the source/core knows that the VC is already open (operational) and therefore does not generate a further MARKER. At a new branch point on the multicast tree, however, a MARKER is required to the new destination: this is because that branch of the tree needs to be flushed of any old data that currently exists for that VC identifier. The controller is responsible for generating and sending this in-band MARKER. Subsequently, data may be forwarded on that VC id, as a result of a proper 3-way handshake. Figure 3 illustrates this scenario.

3.2. Leaf Initiated Joins

Figure 4: Leaf Initiated Join. A's LIJ is progressed to switch four; B's LIJ is suppressed at switch two.
The mechanisms for Leaf Initiated Joins (LIJ) are similar to those suggested in the conventional ATM Forum UNI 4.0. However, instead of having a separate LIJ and Add-Party mechanism, UNITE uses the same mechanisms of the micro-setup for performing a LIJ. Consider Figure 4, where two participants A and B wish to join the multicast tree that currently ends at Switch 4.

The LIJ is a micro-setup (the Type indicator indicates that this is a LIJ for a multicast call) from one of the participants, directed towards the core/source using the NSAP address corresponding to the core/source. The Flow ID used in the micro-setup is the multicast call reference identifier, and is stored at the switches as the micro-setup is forwarded upstream towards the core. We assume that the underlying call routing mechanisms direct the micro-setup towards the source/core in accordance with the appropriate criterion (e.g., shortest-path or least cost). When a LIJ arrives at a switch from another participant, such as B, the Flow ID is recognized as already existing at the switch, and the forwarding of B's micro-setup is suppressed. This may be done only if the core does not wish to be notified of the address of each individual leaf joining. Note that this happens even though the LIJ of the first participant added on this branch has not yet reached the tree at Switch 4.

When the micro-setup from A is issued, the 3-way handshake results in the marker being forwarded by the switches upstream. This effectively opens up the VC from node A up to the branching point at Switch 4. Along with the suppression of the micro-setups, subsequent markers are also suppressed at the switches.

4. Details of QoS Management

The second part of UNITE is a separate means of full QoS specification and negotiation. This allows both a very flexible QoS management process and the ability to incorporate QoS renegotiation with ease.
As discussed in the previous sections, the micro-setup includes a QoS byte that can be used in the original connection setup to support coarse or aggregate level QoS (e.g., by allowing some differentiated decision for the forwarding of the micro-setup). UNITE supports detailed QoS signaling (or full QoS signaling) that is performed in-band on the already established best-effort VC.

We anticipate that a large subset of flows will not use the additional phase of a QoS setup for establishing a distinctive quality of service. The QoS class specification that is provided in the initial micro-setup may be adequate for a reasonably large subset of best-effort flows (e.g., a large class of TCP/IP and UDP/IP flows carrying non-real-time data clearly do not need a subsequent QoS setup phase). Similarly, well-understood real-time flows such as uncompressed telephony traffic (mu-law, a-law) may be adequately specified as being delay-sensitive.

The assured QoS for the flow begins after the QoS negotiation completes, end-to-end, in a similar fashion to the conventional UNI QoS setup. Also, we believe that most of the more sophisticated QoS management will be handled in software, as is the case in the current UNI framework. However, UNITE's framework provides more flexible and efficient QoS management in the following dimensions:

   1. UNITE QoS requests may be initiated by the source or the destination of the original best-effort connection setup. In the more general case of multicast connections, QoS requests may be source/core initiated or leaf initiated.

   2. UNITE QoS in-band signaling allows QoS renegotiation originating from any of the connection end points.

   3. UNITE QoS in-band signaling enables potentially different QoS negotiation modalities and implementations, taking advantage of parallelism in the processing of the QoS setup across multiple switches in the end-to-end path.
Figure 5: Protocol for Establishing QoS in UNITE.

For those flows that require a detailed QoS negotiation, we use the process of QoS setup described in Figure 5. The QoS request may immediately follow the marker, as shown in Figure 5, or may be submitted after the call is established. The receiver, after processing the request, sends a QoS Commit that commits the reservation. To adjust over-committed reservations, and to confirm the QoS reservation to the receiver, the originator sends a QoS Ack. The delay until a QoS flow begins on the forward path is an end-to-end round-trip plus the processing at the destination. On the reverse path, a confirmed QoS flow begins one round-trip after the QoS Commit is issued from the destination. For compatibility with existing ATM, we anticipate that the QoS request, Commit and Ack would be encoded as in the UNI connection setup and connect messages, as far as the QoS information is concerned.

For our purposes in this section, we treat the end-system that initiates the QoS setup request as the QoS source. The end-system that responds to the QoS setup request at the other end is the QoS destination. During the QoS negotiation, data may still flow on the connection on a best-effort basis. Cells that belong to the QoS negotiation message are marked with a Payload-Type Indicator (PTI), possibly as RM cells, so that they may flow to the controller on the switch. Thus QoS signaling cells and data cells (or messages) may be interleaved, because their PTI values are distinct from one another.

Figure 6: Three Way QoS Setup.

Various alternatives for detailed QoS negotiation can be considered here, including the conventional three-way setup described in Figure 6, and one which is consistent with the RSVP-like signaling proposed for IP networks.
With reference to Figure 6, the QoS request is multicast to all switch controllers in the path and to the next link at each switch, facilitating parallel processing in the controllers (1). The Commit message traverses the reverse path, slaloming to every controller and collecting the commitments (2). The QoS Ack multicasts the commitment to all controllers (3).

In UNITE a QoS request may be initiated by any participant of a multicast: the core (if present), the source, or a leaf. Moreover, unless otherwise dictated by higher level policies, core/source-initiated and leaf-initiated QoS may all be used at different times for the same multicast. As an illustration of the potential of UNITE, we describe the case of a leaf-initiated QoS request by referring to Figure 7.

The leaf-initiated QoS requests carry the demand from the receivers upstream. When the QoS request arrives at a switch, the demand is noted at the switch. The switch conveys upstream the maximum of all the demands observed from the different branches (a leaf node or a switch may be at the end of a branch). Note that different leaves may issue their QoS requests at different times. The switch examines each QoS request and transmits a request upstream only if the QoS request is higher than the current maximum. When the demands arrive at the core/sender, the permit returned is the minimum of the offered capacity, the demands received from the leaves, and the available link capacity. Note that each switch needs to maintain state, namely the demand and the permit returned for each branch of the multicast call. The leaf may be a source or a receiver, requesting a QoS on a shared distribution tree (e.g., CBT).

Figure 7: Multicast QoS. Leaf initiated QoS: a) demand phase, b) permit phase.

5. Interoperability Issues

In this section, we describe how to use UNITE with existing switches, including software-based implementations and FIFO switches.
5.1. Interoperability with existing Switches

The UNITE protocol discussed in Section 2 assumes that switches are able to do per-VC queueing and, furthermore, are able to process the Marker in-band. Processing of the Marker involves changing the state of the per-VC queue so that arriving cells are buffered rather than dropped. (This ensures that valid data cells, which might follow the Marker back-to-back, will be queued, while any invalid cells, e.g., cells in flight from an erroneous connection, will be dropped.) Current ATM switches do not necessarily provide these capabilities, however, and it is crucial that UNITE can still function on such legacy switches. An extra (but optional) Marker-Acknowledge message is introduced to deal with these issues.

If a switch processes the Marker in software, it cannot guarantee that the queue state will change from discard to queueing in time to cater for valid data cells following the Marker. In fact, because of different switch architectures and implementations, the amount of time it takes to process the Marker in software will vary greatly. An upstream node therefore has no way of knowing how long to delay before it can start forwarding data cells. This problem is solved by having the downstream node send the Marker-Acknowledge message only when it is ready to receive data from the upstream node. An illustration of the optional use of the Marker-Acknowledge is given in Figure 9.

A downstream node indicates in the micro-Acknowledge message whether or not it requires the upstream node to wait for a Marker-Acknowledge. The Marker-Acknowledge mechanism can therefore be used on a hop-by-hop basis as dictated by local switch capabilities. When a downstream node has requested the use of the Marker-Acknowledge message, the upstream node starts a timer when it sends the Marker downstream.
This timer is cleared when the Marker-Acknowledge message is received from downstream; if the timer expires first, the Marker is retransmitted. Note that the penalty for using the Marker-Acknowledge is two hop round trips' worth of delay, as opposed to one hop round trip in the ideal case. The hop-by-hop nature of the original protocol is, however, maintained.

The Marker-Acknowledge message is also used to cater for FIFO switches, as illustrated in Figure 10. A FIFO switch will not be able to buffer data cells until it receives an acknowledgment from downstream. (Indeed, some FIFO switches might not even be able to accept cells into the switch without having received the outgoing VCI from the downstream switch.) A FIFO switch will then simply delay sending the Marker-Acknowledge until it is capable of forwarding data cells. This in itself is, however, not enough. If the upstream switch is itself a FIFO switch, then the second FIFO switch also has to indicate to the upstream switch that it should not send a Marker-Acknowledge message upstream until it has received a Marker-Acknowledge message from downstream. (In the non-FIFO case described above, a switch can send a Marker-Acknowledge message upstream as soon as it is capable of receiving data. If both the upstream and the downstream switches are FIFO switches, however, the upstream switch should wait until the downstream switch is capable of receiving data.) The Acknowledgment message therefore also needs to indicate to the upstream node whether it should wait for a downstream Marker-Acknowledge before it can send its own Marker-Acknowledge upstream. (If the upstream node is not a FIFO switch and is capable of buffering data, it can simply ignore this indication in the Acknowledgment message.)
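The Marker-Acknowledge rules above can be illustrated with a minimal sketch. It is not part of the UNITE specification: the names `MarkerSender` and `must_defer_marker_ack` are hypothetical, the timer is modeled as explicit events rather than a real clock, and the FIFO rule is our reading of the text (a FIFO switch holds its own Marker-Acknowledge only when its downstream neighbor is also a FIFO switch).

```python
class MarkerSender:
    """Upstream side of one hop (hypothetical model, not from the draft)."""

    def __init__(self, ack_requested):
        # ack_requested: the downstream node asked, in its micro-Acknowledge,
        # for the upstream node to wait for a Marker-Acknowledge.
        self.ack_requested = ack_requested
        self.markers_sent = 0
        self.timer_running = False
        self.may_send_data = False

    def send_marker(self):
        self.markers_sent += 1
        if self.ack_requested:
            self.timer_running = True   # wait for the Marker-Acknowledge
        else:
            self.may_send_data = True   # ideal case: data follows the Marker

    def on_timer_expired(self):
        if self.timer_running:
            self.send_marker()          # retransmit the Marker, timer restarts

    def on_marker_ack(self):
        self.timer_running = False      # timer is cleared
        self.may_send_data = True


def must_defer_marker_ack(is_fifo):
    """For switches ordered upstream to downstream, return for each one
    whether it must hold its own Marker-Acknowledge until it has received
    one from downstream (assumed rule: both it and its downstream
    neighbor are FIFO switches)."""
    defer = []
    for i, fifo in enumerate(is_fifo):
        downstream_is_fifo = i + 1 < len(is_fifo) and is_fifo[i + 1]
        defer.append(fifo and downstream_is_fifo)
    return defer
```

For a path with three consecutive FIFO switches between two non-FIFO switches, `must_defer_marker_ack([False, True, True, True, False])` marks the first two FIFO switches as deferring. Only the last FIFO switch in the run acknowledges as soon as it can forward data, and the acknowledgments then propagate upstream through the run.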
Figure 9: Use of the optional Marker-Acknowledge to enable a downstream switch to hold off the upstream switch's transmission of data until the downstream switch is ready.

This approach has the effect of changing the hop-by-hop delay of the UNITE protocol into a partial end-to-end delay across consecutive FIFO switches. (A setup request will proceed, with data following, on a hop-by-hop basis until a FIFO switch or a sequence of FIFO switches is reached. Forwarding of data will then be delayed until the last FIFO switch in the sequence is ready to receive the data.)

6. CONSIDERATIONS ON UNITE IMPLEMENTATION

The fundamental features of UNITE, namely the separation of connectivity from full QoS processing, the specification of single cell signaling messages, and the simplified reliability support via timers and retransmission of basic messages, enable a broad range of implementation scenarios for UNITE.

At one extreme, UNITE may be implemented completely at the software level. The only functionality required at the hardware level is the ability to recognize in-band control cells used for UNITE signaling arriving at the switch ports, and to route such cells to the switch controller. In the most basic software implementation, per-VC queuing would not be required, and early data transmission (before end-to-end acknowledgment) may not be supported. We believe that, while the full latency improvement potential of UNITE is not achieved with such an implementation, significant improvements in call processing capacity and in fairness may still be achieved.

Figure 10: Use of the Marker Ack with a sequence of FIFO switches.

At the opposite extreme in the range of implementations of UNITE is the scenario where as much call processing functionality as possible is implemented in hardware, most likely located in the switch line cards and host adapters, and the switch supports advanced queuing and scheduling schemes.
In this scenario the full potential of UNITE may be realized, with close to a single hop round trip latency before the inception of data transmission, and vast increases in call setup capacity for best effort or basic QoS calls. Such capacity increases scale naturally with the switch port capacity and the number of ports on the switch, thanks to the distributed nature of the implementation enabled by UNITE. We believe that a call processing capacity of several thousand calls per second per OC-3 port is feasible within a UNITE framework. Note that even in an advanced implementation the full QoS call processing would be handled at the software level.

It is reasonable to conceive a UNITE implementation in which the port processing modules on the port cards support the following functions in hardware:

1. Capture/injection of UNITE signaling cells.

2. Management of timers, retransmissions and state changes in the call processing state machine.

3. Forwarding of the micro-setup to the correct outgoing link, based on fast address lookup.

4. Allocation of incoming labels (i.e., incoming/outgoing VPI/VCI and Tags used for routing through the fabric) out of local label pools managed (on longer time scales) by the central switch controller.

5. Basic QoS support. This may imply forwarding and queueing/scheduling based on the QoS byte in the micro-setup.

6. Control of queue scheduling based on UNITE control messages received (e.g., blocking until a message is received).

A subset of the functionality listed above may also be implemented within the Adapter SAR ASIC, namely, signaling control cell capture/injection and management of timers, retransmissions and state changes in the call processing state machine.

The switch control processor would, in this scenario, be responsible for:

1. Monitoring and management of label pools allocated in real time by the Port Processing Modules.

2. Call accounting and monitoring.

3. Switch level resource management.

4. Full QoS call processing, including Call Admission Control and support of sophisticated bandwidth reservation schemes and management of appropriate scheduling schemes.

5. Initialization, monitoring of error conditions and switch level management.

A range of UNITE implementations falling in between the extremes described above can naturally be conceived, including the case of the current generation of switches supporting per-VC queuing in the switch fabric, but still handling UNITE control cells in software. Large signaling performance advantages could be gained in this case by exploiting the early data transmission feature of UNITE.

In order to explore the implications of a basic UNITE software implementation, we developed a UNITE prototype completely in software over a network including two ATM switches and two adapter cards. A picture of the prototype setup is shown in Figure 11. In the prototype, in-band UNITE signaling cells are implemented using OAM cells.

To evaluate the performance improvement with UNITE, we compared the performance of UNI 3.0 with that of the prototype UNITE implementation. The UNI tests used a mature UNI 3.0 implementation, a Radcom test box acting as a source and destination, and a Cabletron ATM Switch. The UNITE tests used two Sun workstations with Cabletron/Zeitnet ATM adapters.

One important issue for connection-oriented communication is the amount of memory used to keep state for each individual connection. UNITE is comparatively efficient in using memory for VC state, using only 128 bytes per best-effort VC in our prototype. In contrast, UNI uses almost 1500 bytes for a best-effort VC. Thus, there is the potential for UNITE to support a much larger number of VCs on switch ports.

We measured the UNITE connection setup latency and throughput.
Our preliminary results, using 100 microsecond clock granularity in our measurements, were as follows. The best-effort connection setup latency through an individual switch was 1.7 ms with UNITE. In comparison, a best-effort UNI connection took 10.9 ms. The various components of this service time are shown in Table 1. In terms of throughput, UNITE achieved approximately 700-800 calls/sec, while UNI achieved approximately 130 calls/sec. We believe that with some simple optimizations, UNITE could easily exceed 1000 calls/sec. We expect that even more substantial improvement could be achieved with a streamlined/hardware implementation of UNITE.

7. BENEFITS OF UNITE

In this section we summarize and reorganize, as a quick reference, the benefits of UNITE discussed in this Internet Draft.

1) Separating connectivity from QoS enables UNITE to:

   a) Achieve high throughput for establishing connections.

   b) Have a very low latency to begin data transmission, because we do not have to wait for an end-to-end message exchange.

   c) Have throughput and latency for connection establishment be independent of the complexity of the QoS class and other service characteristics. Complex QoS specifications are allowed for those connections that need them.

   d) Support QoS establishment and renegotiation in a similar fashion. This enables simple ways to change parameters for flows.

   e) Allow for communication on a best-effort basis even upon failure of a QoS request.

2) UNITE is ideally matched to carry Internet traffic over ATM networks.

3) UNITE is optimized for distributed hardware implementation of signaling within a switch on a per-port basis.

   a) A single cell, fixed length, fixed format micro-setup allows for high-speed processing of the setup message.

   b) No single point of re-sequencing or SAR is needed, and no software stack such as SSCOP is required for supporting basic connection establishment.

   c) Reliability is achieved via simple timers and retransmissions that are easily implemented in hardware.

   d) State and context information for connectivity requires only a small amount of memory and can be kept in a distributed fashion, even on a line card.

4) Separation of connectivity and QoS, and sending QoS related messages in-band, allows the network to have QoS setup initiated by sources or destinations.

5) UNITE isolates QoS negotiation to the connections that require it.

   a) Multiple passes, complex QoS negotiation or other service characteristics may be allowed.

6) Supports a full range of multicast architectures, including multipoint-to-multipoint.

   a) QoS can adapt to the capabilities of the branches of the tree.

7) Builds on the QoS work done for ATM and IP.

   a) Accommodates a wide range of QoS models: UNI, RSVP and future evolution.

9) Builds on a substantial amount of the work done for PNNI for QoS-sensitive routing.

10) Allows communication on a path selected based on a coarse class specification. Hence even simple connectivity can be better than true best-effort.

11) Inter-works with existing UNI switches.

12) Allows for legacy switches and various levels of hardware implementation complexity.

8. SUMMARY

We have described a protocol for lightweight signaling. The key idea behind the protocol is the separation of connection establishment and QoS processing. This makes connection setup independent of QoS processing complexity, benefiting most flows, and best-effort flows in particular, as the channel becomes immediately available on a best-effort basis. The separation allows all flow specific signaling, i.e., QoS messages, to be carried in-band, thus protecting the shared signaling resources. UNITE signaling unifies initial QoS setup and renegotiation, and supports source/core initiated QoS as well as receiver initiated QoS requests.
The connectivity setup message is a single ATM cell (called the micro-setup). The micro-setup and micro-acknowledgment are exchanged on a hop-by-hop basis on a signaling channel. By incorporating a minimal QoS class identification in the micro-setup request, UNITE has the ability to provide QoS sensitive routing. Data flow on the best-effort VC may begin without waiting for the setup to be completed over the entire end-to-end path. The use of per-VC queuing permits the source to start sending data on a best-effort basis as soon as the connection has been set up on the next hop.

Subsequently, QoS setup requests and acknowledgments flow in-band on the best-effort VC that was initially set up. The QoS for the flow is assured upon completion of the end-to-end exchange of the QoS setup and ack. The complexity of QoS messages and their processing is isolated to those VCs requiring it, without impacting other VCs. In addition, the architecture allows the QoS requester to be either the source or the destination of the connection. The architecture recognizes the need for multipoint-to-multipoint connections, and the possible need for variable QoS to different participants in the multicast group.

9. APPLICABILITY OF UNITE TO MPLS

As we stated earlier, this Internet Draft is based on an ATM Forum contribution and as such is written within an ATM context. However, we believe that UNITE might also be of value within the context of MPLS and have therefore decided to present it to the MPLS working group to solicit feedback.

Clearly, UNITE currently uses ATM addresses, to be applicable to ATM. However, we believe that the protocol could be used with IP addresses, with hop-by-hop forwarding of the micro-setup at the MPLS switches using conventional link-state routing, such as OSPF.
Because of the inclusion of the QoS class in the micro-setup, we can take advantage of potential enhancements to IP to accommodate QoS-sensitive routing.

We believe that UNITE might be applicable to the following objectives of the MPLS working group. We reiterate these specific objectives below:

1. Specify standard protocol(s) for maintenance and distribution of label binding information to support unicast destination-based routing with forwarding based on label-swapping.

2. Specify standard protocol(s) for maintenance and distribution of label binding information to support multicast routing with forwarding based on label-swapping.

4. Specify standard protocol(s) for maintenance and distribution of label binding information to support explicit paths different from the one constructed by destination-based forwarding with forwarding based on label-swapping.

6. Specify a standard way to use the ATM user plane

   a) Allow operation/co-existence with standard (ATM Forum, ITU, etc.) ATM control plane and/or standard ATM hardware

   b) Specify a label swapping control plane

   c) Take advantage of possible mods/improvements in ATM hardware, for example the ability to merge VCs

7. Discuss support for QoS (e.g. RSVP)

UNITE is a framework that is efficient in providing connectivity, with very low latency for a source to begin transmitting data when using on-demand label distribution. An integral part of the framework is providing very flexible support for QoS, accommodating multiple QoS models including sender and receiver initiated QoS. Multicast support fits naturally in UNITE, with common procedures applicable for unicast and multicast (both source and receiver joins).

UNITE allows the network to scale to large numbers of nodes because of the ability to support on-demand label distribution efficiently. Further, UNITE achieves scalability in the following dimensions:

1. UNITE can achieve high throughput for label distribution.

2. UNITE enables initiation of packet forwarding with low latency.

3. UNITE minimizes the amount of state needed in the network.

UNITE uses a QoS hint to route the setup. The explicit path established in this manner may therefore be different from "default" destination based forwarding, because it uses QoS sensitive destination based routing.

As it is currently defined, UNITE is an ATM control protocol and is therefore directly applicable to objective (6).

UNITE also directly addresses QoS issues without making any assumptions about the specific QoS model that is used. For example, RSVP can be combined with UNITE to perform the actual resource reservations.

If particular MPLS switches do not support native IP forwarding, then the need for UNITE appears even more compelling, especially in an environment where services other than best-effort are provided (e.g., in a Diffserv type of environment, it would be wasteful to distribute labels for all service classes across the whole network).

10. References

[i] L. Zhang, S. Deering, D. Estrin, S. Shenker, RSVP: A New Resource ReSerVation Protocol, IEEE Network Magazine, Sept. 1993.

[ii] Danny J. Mitzel, Deborah Estrin, Scott Shenker, and Lixia Zhang, An Architectural Comparison of ST-II and RSVP, Proceedings of Infocom'94, 1994.

[iv] M. Grossglauser & K.K. Ramakrishnan (1997) SEAM: Scalable and Efficient ATM Multicast, Proceedings of IEEE Infocom'97, April 1997, Kobe, Japan.

[v] T. Ballardie, P. Francis, and J. Crowcroft, Core Based Trees (CBT), in Proc. ACM SIGCOMM'93, San Francisco, California, September 1993.

[vi] S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C. Liu, and L. Wei, An Architecture for Wide-Area Multicast Routing, in Proc. ACM SIGCOMM'94, London, August 1994.

[vii] ATM Forum, ATM User-Network-Interface (UNI) Signaling specification version 4.0, July 1996.
[viii] S. Deering, Multicast Routing in Internetworks and Extended LANs, in Proc. ACM SIGCOMM'88, Stanford, California, August 1988.

Authors' Addresses

K. K. Ramakrishnan, Gisli Hjalmtysson, Kobus van der Merwe
AT&T Labs. Research
180 Park Avenue, Florham Park, N.J. 07932
kkrama@research.att.com, gisli@research.att.com, kobus@research.att.com
Phone: +1 973 360 8766
Fax: +1 973 360 8871

Flavio Bonomi, Sateesh Kumar, Michael Wong
CSI ZeitNet/Cabletron
5150 Great America Parkway, Santa Clara, CA, 95054
fbonomi@ctron.com, skumar@ctron.com, mwong@ctron.com
Phone: +1 408 565 9360
Fax: +1 408 565 6501