Network Working Group                                          M. Tuexen
INTERNET DRAFT                                                Siemens AG
                                                                  Q. Xie
                                                                Motorola
                                                              R. Stewart
                                                                M. Shore
                                                                   Cisco
                                                                  L. Ong
                                                    Point Reyes Networks
                                                             J. Loughney
                                                             M. Stillman
                                                                   Nokia
Expires October 2, 2001                                    April 2, 2001


                Architecture for Reliable Server Pooling

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of [RFC2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   The goal of the Reliable Server Pooling (RSerPool) work is to develop
   an architecture and protocols for the management and operation of
   server pools supporting highly reliable applications, and for client
   access mechanisms to a server pool. A proposed architecture is
   presented and illustrated by examples.

Tuexen et al.                                                   [Page 1]

Internet Draft  Architecture for Reliable Server Pooling      April 2001

1. Introduction

1.1. Overview

   The Internet is always on. Many users expect services to be always
   available; many businesses depend upon connectivity 24 hours a day,
   7 days a week, 365 days a year. To meet this expectation, many
   proprietary and operating-system-dependent solutions have been
   developed to provide highly reliable and highly available servers.

   This document defines a proposed architecture that can be used to
   provide highly available services. This is achieved by grouping
   servers into pools.
   Therefore, if a client wants to access a server pool, it will be able
   to use any of the servers in the server pool, taking into account the
   server pool policy.

   Highly available services also put the same high reliability
   requirements upon the transport layer protocol beneath RSerPool - it
   must provide strong survivability in the face of network component
   failures.

   Supporting real-time applications is another main focus of RSerPool,
   which leads to requirements on the processing time needed.

   Scalability is another important requirement.

1.2. Terminology

   This document uses the following terms:

   Operation scope:
      The part of the network that is visible to pool users through a
      specific instance of the reliable server pooling protocols.

   Pool (or server pool):
      A collection of servers providing the same application
      functionality.

   Pool handle (or pool name):
      A logical pointer to a pool. Each server pool will be identifiable
      in the operation scope of the system by a unique pool handle or
      "name".

   Pool element:
      A server entity that has registered with a pool.

   Pool user:
      A server pool user.

   Pool element handle (or endpoint handle):
      A logical pointer to a particular pool element in a pool,
      consisting of the name of the pool and a destination transport
      address of the pool element.

   Name space:
      A cohesive structure of pool names and relations that may be
      queried by an internal or external agent.

   Name server:
      The entity that is responsible for managing and maintaining the
      name space within the RSerPool operation scope.

1.3. Abbreviations

   ASAP: Aggregate Server Access Protocol

   ENRP: Endpoint Name Resolution Protocol

   PE:   Pool element

   PU:   Pool user

   SCTP: Stream Control Transmission Protocol

   TCP:  Transmission Control Protocol

2. Reliable Server Pooling Architecture

   In this section, we discuss what a typical reliable server pool
   architecture may look like.

2.1. Common RSerPool Functional Areas

   The following functional areas or components are likely to be present
   in a typical RSerPool system architecture:

   - A number of logical "Server Pools" to provide distinct application
     services.

     Each of those server pools will likely be composed of some number
     of "Pool Elements (PEs)", which are application programs running on
     distributed host machines, collectively providing the desired
     application services via, for example, data sharing and/or load
     sharing.

     Each server pool will be identifiable in the operation scope of the
     system by a unique "name".

   - Some "Pool Users (PUs)", which are the users of the application
     services provided by the various server pools.

   - PEs may or may not also be PUs, depending on whether or not they
     wish to access other pools in the operation scope of the system.

   - A "Name Space", which contains all the defined names within the
     operation scope of the system.

   - One or more "Name Servers", which carry out various maintenance
     functions (e.g., registration and de-registration, integrity
     checking) for the "Name Space".

2.2. RSerPool Protocol Overview

   The features required of RSerPool can be obtained with the help of
   two protocols: ENRP (Endpoint Name Resolution Protocol) and ASAP
   (Aggregate Server Access Protocol).

   ENRP is designed to provide a fully distributed, fault-tolerant,
   real-time translation service that maps a name to a set of transport
   addresses pointing to a specific group of networked communication
   endpoints registered under that name. ENRP employs a client-server
   model in which an ENRP server responds to name translation service
   requests from endpoint clients running on the same host or on
   different hosts.

   ASAP in conjunction with ENRP provides a fault tolerant data transfer
   mechanism over IP networks.
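   As a rough illustration of the name translation described above, the
   following Python sketch models a name space as a mapping from a pool
   handle to the set of transport addresses registered under it. It is
   not part of ENRP or ASAP: the class name NameSpace, its method names,
   and the addresses used are hypothetical, and distribution, fault
   tolerance, and pool policy are ignored entirely.

```python
from collections import defaultdict


class NameSpace:
    """Toy in-memory name space (hypothetical, for illustration only).

    Maps a pool handle to the set of (host, port) transport addresses
    of the pool elements registered under that handle.
    """

    def __init__(self):
        self._pools = defaultdict(set)

    def register(self, pool_handle, address):
        """Add a pool element's transport address under a pool handle."""
        self._pools[pool_handle].add(address)

    def deregister(self, pool_handle, address):
        """Remove an address; drop the handle once no members remain."""
        members = self._pools.get(pool_handle)
        if members is None:
            return
        members.discard(address)
        if not members:
            del self._pools[pool_handle]

    def resolve(self, pool_handle):
        """Return the addresses registered under a handle, sorted."""
        return sorted(self._pools.get(pool_handle, ()))


# Example use with made-up documentation addresses.
name_space = NameSpace()
name_space.register("File Transfer Pool", ("192.0.2.1", 2021))
name_space.register("File Transfer Pool", ("192.0.2.2", 2021))
name_space.deregister("File Transfer Pool", ("192.0.2.1", 2021))
print(name_space.resolve("File Transfer Pool"))
```

   A real name server would additionally have to replicate this state
   among its peers and audit registered elements, which the sketch does
   not attempt.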
   ASAP uses a name-based addressing model which isolates a logical
   communication endpoint from its IP address(es), thus effectively
   eliminating the binding between the communication endpoint and its
   physical IP address(es), which normally constitutes a single point of
   failure.

   In addition, ASAP defines each logical communication destination as a
   server pool, providing fully transparent support for server pooling
   and load sharing. It also allows dynamic system scalability - members
   of a server pool can be added or removed at any time without
   interrupting the service.

   Fault tolerant server pooling is achieved by combining two parts,
   namely ASAP and ENRP. ASAP provides the user interface for
   name-to-address translation, load sharing management, and fault
   management. ENRP defines the fault tolerant name translation service.

   The protocol stack used is shown in figure 1.

                *********        ***********
                * PE/PU *        *ENRP Srvr*
                *********        ***********

                +-------+        +----+----+
   To other <-->| ASAP  |<------>|ASAP|ENRP| <---To Peer ENRP
   PE/PU        +-------+        +----+----+     Name Servers
                | SCTP  |        |  SCTP   |
                +-------+        +---------+
                |  IP   |        |   IP    |
                +-------+        +---------+

   Figure 1: Typical protocol stack

2.3. Typical Interactions between RSerPool Components

   The following drawing shows the typical RSerPool components and their
   possible interactions with each other:

   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   ~                                                  operation scope ~
   ~  .........................          .........................    ~
   ~  .     Server Pool 1     .          .     Server Pool 2     .    ~
   ~  .  +-------+ +-------+  .    (d)   .  +-------+ +-------+  .    ~
   ~  .  |PE(1,A)| |PE(1,C)|<-------------->|PE(2,B)| |PE(2,A)|<---+  ~
   ~  .  +-------+ +-------+  .          .  +-------+ +-------+  . |  ~
   ~  .      ^            ^   .          .      ^         ^      . |  ~
   ~  .      |     (a)    |   .          .      |         |      . |  ~
   ~  .      +----------+ |   .          .      |         |      . |  ~
   ~  .  +-------+      | |   .          .      |         |      . |  ~
   ~  .  |PE(1,B)|<---+ | |   .          .      |         |      . |  ~
   ~  .  +-------+    | | |   .          .      |         |      . |  ~
   ~  .      ^        | | |   .          .      |         |      . |  ~
   ~  .......|........|.|.|....          .......|.........|....... |  ~
   ~         |        | | |                     |         |        |  ~
   ~      (c)|     (a)| | |(a)               (a)|      (a)|     (c)|  ~
   ~         |        | | |                     |         |        |  ~
   ~         |        v v v                     v         v        |  ~
   ~         |   +++++++++++++++    (e)     +++++++++++++++        |  ~
   ~         |   + ENRP-Server +<---------->+ ENRP-Server +        |  ~
   ~         |   +++++++++++++++            +++++++++++++++        |  ~
   ~         v            ^                        ^               |  ~
   ~     *********        |                        |               |  ~
   ~     * PU(A) *<-------+                     (b)|               |  ~
   ~     *********   (b)                           |               |  ~
   ~                                               v               |  ~
   ~       :::::::::::::::::      (f)      *****************       |  ~
   ~       : Other Clients :<------------->* Proxy/Gateway *   <---+  ~
   ~       :::::::::::::::::               *****************          ~
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   Figure 2: RSerPool components and their possible interactions.

   In figure 2 we can identify the following possible interactions:

   (a) Server Pool Elements <-> ENRP Server: (ASAP)

       Each PE in a pool uses ASAP to register or de-register itself as
       well as to exchange other auxiliary information with the ENRP
       Server. The ENRP Server also uses ASAP to monitor the operational
       status of each PE in a pool.

   (b) PU <-> ENRP Server: (ASAP)

       A PU normally uses ASAP to request a name-to-address translation
       from the ENRP Server before the PU can send user messages
       addressed to a server pool by the pool's name.

   (c) PU <-> PE: (ASAP)

       ASAP can be used to exchange some auxiliary information between
       the two parties before they engage in user data transfer.

   (d) Server Pool <-> Server Pool: (ASAP)

       A PE in a server pool can become a PU to another pool when the PE
       tries to initiate communication with the other pool. In such a
       case, the interactions described in (b) and (c) above will apply.

   (e) ENRP Server <-> ENRP Server: (ENRP)

       ENRP can be used to fulfill various Name Space operation,
       administration, and maintenance (OAM) functions.
   (f) Other Clients <-> Proxy/Gateway: standard protocols

       The proxy/gateway enables clients ("other clients") which are not
       RSerPool aware to access services provided by an RSerPool based
       server pool. It should be noted that these proxies/gateways may
       become a single point of failure.

3. Examples

   In this section the basic concepts of ENRP and ASAP are illustrated
   by examples. First, an RSerPool aware FTP server is considered, and
   its interaction with both an RSerPool aware client and an unaware
   client is described. Finally, a telephony example is considered.

3.1. Two File Transfer Examples

   In this section we present two separate file transfer examples using
   ENRP and ASAP: one with an ENRP/ASAP aware client, and one with a
   client that uses a Proxy or Gateway to perform the file transfer. In
   these examples we use an FTP [RFC959] model with some modifications.
   The first example (the RSerPool aware one) modifies FTP concepts so
   that the file transfer takes place over SCTP. In the second example
   we use TCP between the unaware client and the Proxy. The Proxy itself
   uses the modified FTP with RSerPool as illustrated in the first
   example.

   Please note that in these examples we do NOT follow FTP [RFC959]
   precisely, but use FTP-like concepts and attempt to adhere to the
   basic FTP model. FTP is used for illustrative purposes only; it was
   chosen because many of its basic concepts are well known and should
   be familiar to readers.

3.1.1. The RSerPool Aware Client

   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   ~                                                  operation scope ~
   ~  .........................                                       ~
   ~  . "File Transfer Pool"  .                                       ~
   ~  .  +-------+ +-------+  .                                       ~
   ~ +-> |PE(1,A)| |PE(1,C)|  .                                       ~
   ~ |.  +-------+ +-------+  .                                       ~
   ~ |.      ^            ^   .                                       ~
   ~ |.      +----------+ |   .                                       ~
   ~ |.  +-------+      | |   .                                       ~
   ~ |.  |PE(1,B)|<---+ | |   .                                       ~
   ~ |.  +-------+    | | |   .                                       ~
   ~ |.      ^        | | |   .                                       ~
   ~ |.......|........|.|.|....                                       ~
   ~ | ASAP  |    ASAP| | |ASAP                                       ~
   ~ |(d)    |(c)     | | |                                           ~
   ~ |       v        v v v                                           ~
   ~ |   *********  +++++++++++++++                                   ~
   ~ +-->* PU(X) *  + ENRP-Server +                                   ~
   ~     *********  +++++++++++++++                                   ~
   ~         ^   ASAP       ^                                         ~
   ~         |   <-(b)      |                                         ~
   ~         +--------------+                                         ~
   ~             (a)->                                                ~
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   Figure 3: Architecture for RSerPool aware client.

   To effect a file transfer, the following steps would take place:

   (1) The application in PU(X) would send a login request. PU(X)'s
       ASAP layer would send an ASAP request to its ENRP server to
       request the list of pool elements (using (a)). The pool handle
       used to identify the pool would be "File Transfer Pool". The
       ASAP layer queues the login request.

   (2) The ENRP server would return a list of the three PEs PE(1,A),
       PE(1,B) and PE(1,C) to the ASAP layer in PU(X) (using (b)).

   (3) The ASAP layer selects one of the PEs, for example PE(1,B). It
       transmits the login request and the other FTP control data, and
       finally starts the transmission of the requested files (using
       (c)). For this, the multiple stream feature of SCTP could be
       used.

   (4) If PE(1,B) fails during the file transfer conversation, and
       assuming the PEs were sharing the state of the file transfer, a
       fail-over to PE(1,A) could be initiated. PE(1,A) would continue
       the transfer until complete (see (d)). In parallel, a request
       from PE(1,A) would be made to ENRP to request a cache update for
       the server pool "File Transfer Pool", and a report would also be
       made that PE(1,B) is non-responsive (using (a)). This report
       would cause appropriate audits that may remove PE(1,B) from the
       pool if the ENRP servers had not already detected the failure.

3.1.2. The RSerPool Unaware Client

   In this example we investigate the use of a Proxy server, assuming
   the same scenario as illustrated above.

   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   ~                                                  operation scope ~
   ~  .........................                                       ~
   ~  . "File Transfer Pool"  .                                       ~
   ~  .  +-------+ +-------+  .                                       ~
   ~  .  |PE(1,A)| |PE(1,C)|  .                                       ~
   ~  .  +-------+ +-------+  .                                       ~
   ~  .      ^            ^   .                                       ~
   ~  .      +----------+ |   .                                       ~
   ~  .  +-------+      | |   .                                       ~
   ~  .  |PE(1,B)|<---+ | |   .                                       ~
   ~  .  +-------+    | | |   .                                       ~
   ~  .......^........|.|.|....                                       ~
   ~         |        | | |                                           ~
   ~         |    ASAP| | |ASAP                                       ~
   ~         |        | | |                                           ~
   ~         |        v v v                                           ~
   ~         |   +++++++++++++++          +++++++++++++++             ~
   ~         |   + ENRP-Server +<--ENRP-->+ ENRP-Server +             ~
   ~         |   +++++++++++++++          +++++++++++++++             ~
   ~         |                               ASAP    ^                ~
   ~         |  ASAP (c)                      (b) |  ^                ~
   ~         +---------------------------------+  |  |                ~
   ~                                           |  v  | (a)            ~
   ~                                           v     v                ~
   ~       :::::::::::::::::   (e)->       *****************          ~
   ~       :   FTP Client  :<------------->* Proxy/Gateway *          ~
   ~       :::::::::::::::::      (f)      *****************          ~
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   Figure 4: Architecture for RSerPool unaware client.

   In this example the following steps occur:

   (1) The FTP client and the Proxy/Gateway use the TCP-based FTP
       protocol. The client sends the login request to the proxy (using
       (e)).

   (2) The proxy behaves like a client and performs the actions
       described under (1), (2) and (3) of the previous example (using
       (a), (b) and (c)).

   (3) The FTP communication continues and is translated by the proxy
       into the RSerPool aware dialect. This interworking uses (f) and
       (c).

   Note that in this example high availability is maintained between the
   Proxy and the server pool, but a single point of failure exists
   between the FTP client and the Proxy, i.e., the TCP command
   connection and the single IP address it uses.

3.2. Telephony Signaling Example

   This example shows the use of ASAP/RSerPool to support server pooling
   for high availability of a telephony application such as a Voice over
   IP Gateway Controller (GWC) and Gatekeeper services (GK).

   In this example, we show two different scenarios of deploying these
   services using RSerPool in order to illustrate the flexibility of the
   RSerPool architecture.

3.2.1. Decomposed GWC and GK Scenario

   In this scenario, both GWC and GK services are deployed as separate
   pools with some number of PEs, as shown in the following diagram.
   Each of the pools will register its unique pool handle (i.e., name)
   with the ENRP Server. We also assume that there are a Signaling
   Gateway (SG) and a Media Gateway (MG) present and that both are
   RSerPool aware.

                            ...................
                            .     Gateway     .
                            . Controller Pool .
      .................     .   +-------+     .
      .  Gatekeeper   .     .   |PE(2,A)|     .
      .     Pool      .     .   +-------+     .
      .  +-------+    .     .   +-------+     .
      .  |PE(1,A)|    .     .   |PE(2,B)|     .
      .  +-------+    .     .   +-------+     .
      .  +-------+    . (d) .   +-------+     .
      .  |PE(1,B)|<------------>|PE(2,C)|<-------------+
      .  +-------+    .     .   +-------+     .        |
      .................     ........^..........        |
                                    |                  |
                                 (c)|               (e)|
                                    |                  v
         +++++++++++++++        *********      *****************
         + ENRP-Server +        * SG(X) *      * Media Gateway *
         +++++++++++++++        *********      *****************
                ^                   ^
                |                   |
                |       <-(a)       |
                +-------------------+
                        (b)->

   Figure 5: Deployment of Decomposed GWC and GK.

   As shown in figure 5, the following sequence takes place:

   (1) The Signaling Gateway (SG) receives an incoming signaling
       message to be forwarded to the GWC. SG(X)'s ASAP layer would
       send an ASAP request to its "local" ENRP server to request the
       list of pool elements (PEs) of the GWC (using (a)). The key used
       for this query is the pool handle of the GWC. The ASAP layer
       queues the data to be sent in local buffers until the ENRP
       server responds.

   (2) The ENRP server would return a list of the three PEs PE(2,A),
       PE(2,B) and PE(2,C) to the ASAP layer in SG(X), together with
       information to be used for load-sharing traffic across the
       gateway controller pool (using (b)).

   (3) The ASAP layer in SG(X) will select one PE (e.g., PE(2,C)) and
       send the signaling message to it (using (c)). The selection is
       based on the load sharing information of the gateway controller
       pool.
   (4) To progress the call, PE(2,C) finds that it needs to talk to the
       Gatekeeper. Assuming it already has the gatekeeper pool's
       information in its local cache (e.g., obtained and stored from a
       recent query to the ENRP Server), PE(2,C) selects PE(1,B) and
       sends the call control message to it (using (d)). We assume
       PE(1,B) responds to PE(2,C) and authorizes the call to proceed.

   (5) PE(2,C) issues media control commands to the Media Gateway
       (using (e)).

   RSerPool provides service robustness if a failure occurs in the
   system.

   For instance, if PE(1,B) in the Gatekeeper Pool crashed after
   receiving the call control message from PE(2,C) in step (4) above,
   what most likely will happen is that, due to the absence of a reply
   from the Gatekeeper, a timer expiration event will trigger the call
   state machine within PE(2,C) to resend the control message. The ASAP
   layer at PE(2,C) will then notice the failure of PE(1,B), most likely
   through the endpoint unreachability detection of the transport
   protocol beneath ASAP, and automatically deliver the re-sent call
   control message to the alternate GK pool member PE(1,A). With
   appropriate intra-pool call state sharing support, PE(1,A) will be
   able to handle the call correctly and reply to PE(2,C), and hence
   progress the call.

3.2.2. Collocated GWC and GK Scenario

   In this scenario, the GWC and GK services are collocated (e.g., they
   are implemented as a single process). In such a case, one can form a
   pool that provides both GWC and GK services, as shown in figure 6
   below.

      ........................................
      .  Gateway Controller/Gatekeeper Pool  .
      .           +-------+                  .
      .           |PE(3,A)|                  .
      .           +-------+                  .
      .           +-------+                  .
      .           |PE(3,C)|<---------------------------+
      .           +-------+                  .         |
      .  +-------+    ^                      .         |
      .  |PE(3,B)|    |                      .         |
      .  +-------+    |                      .         |
      ................|.......................         |
                      |                                |
                      +-------------+                  |
                                    |                  |
                                 (c)|               (e)|
                                    v                  v
         +++++++++++++++        *********      *****************
         + ENRP-Server +        * SG(X) *      * Media Gateway *
         +++++++++++++++        *********      *****************
                ^                   ^
                |                   |
                |       <-(a)       |
                +-------------------+
                        (b)->

   Figure 6: Deployment of Collocated GWC and GK.

   The same sequence as described in 3.2.1 takes place, except that step
   (4) now becomes internal to PE(3,C) (again, we assume PE(3,C) is
   selected by the SG).

4. Acknowledgements

   The authors would like to thank Bernard Aboba, Matt Holdrege,
   Christopher Ross, Werner Vogels and many others for their invaluable
   comments and suggestions.

5. References

   [RFC793]  J. B. Postel, "Transmission Control Protocol", RFC 793,
             September 1981.

   [RFC959]  J. B. Postel, J. Reynolds, "File Transfer Protocol (FTP)",
             RFC 959, October 1985.

   [RFC2026] S. Bradner, "The Internet Standards Process -- Revision 3",
             RFC 2026, October 1996.

   [RFC2608] E. Guttman et al., "Service Location Protocol, Version 2",
             RFC 2608, June 1999.

   [RFC2719] L. Ong et al., "Framework Architecture for Signaling
             Transport", RFC 2719, October 1999.

   [RFC2960] R. R. Stewart et al., "Stream Control Transmission
             Protocol", RFC 2960, November 2000.

6. Authors' Addresses

   Michael Tuexen                Tel.:   +49 89 722 47210
   Siemens AG                    e-mail: Michael.Tuexen@icn.siemens.de
   ICN WN CS SE 51
   D-81359 Munich
   Germany

   Qiaobing Xie                  Tel.:   +1 847 632 3028
   Motorola, Inc.                e-mail: qxie1@email.mot.com
   1501 W. Shure Drive, #2309
   Arlington Heights, IL 60004
   USA

   Randall Stewart               Tel.:   +1 815 477 2127
   Cisco Systems, Inc.           e-mail: rrs@cisco.com
   24 Burning Bush Trail
   Crystal Lake, IL 60012
   USA

   Melinda Shore                 Tel.:   +1 607 272 7512
   Cisco Systems, Inc.           e-mail: mshore@cisco.com
   809 Hayts Rd
   Ithaca, NY 14850
   USA

   Lyndon Ong                    Tel.:   +1 408 321 8237
   Point Reyes Networks          e-mail: long@pointreyesnet.com
   1991 Concourse Drive
   San Jose, CA
   USA

   John Loughney                 Tel.:
   Nokia Research Center         e-mail: john.loughney@nokia.com
   PO Box 407
   FIN-00045 Nokia Group
   Finland

   Maureen Stillman              Tel.:   +1 607 273 0724 62
   Nokia                         e-mail: maureen.stillman@nokia.com
   127 W. State Street
   Ithaca, NY 14850
   USA

   This Internet Draft expires October 2, 2001.