idnits 2.17.1

draft-chu-ip-cluster-00.txt:

  ** The Abstract section seems to be numbered

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.
     Expected boilerplate is as follows today (2024-04-19) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

        This Internet-Draft is submitted in full conformance with the
        provisions of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

        Copyright (c) 2024 IETF Trust and the persons identified as the
        document authors. All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

        This document is subject to BCP 78 and the IETF Trust's Legal
        Provisions Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document. Please review these documents
        carefully, as they describe your rights and restrictions with respect
        to this document. Code Components extracted from this document must
        include Simplified BSD License text as described in Section 4.e of
        the Trust Legal Provisions and are provided without warranty as
        described in the Simplified BSD License.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 6) being 59 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (August 1996) is 10109 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Downref: Normative reference to an Informational RFC: RFC 1794 (ref.
     '1')

     Summary: 9 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

INTERNET-DRAFT                                                   Chi Chu
Expires: February 21, 1997                           Research 2000, Inc.
                                                             August 1996

                               IP Cluster
                       draft-chu-ip-cluster-00.txt

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other than
as ``work in progress.''

To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the
Internet-Drafts Shadow Directories on ftp.is.co.za (Africa),
nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

1. Abstract

This Internet-Draft is intended to provide a means for
"IP Clustering" across multiple servers. It is meant as an improved
alternative to the various solutions for distributing WWW traffic
already attempted by the IETF DNS Working Group. In addition,
the clustering method can be applied not only to a heavily visited
web server, but also to any overloaded TCP/IP server, such as a
domain name server. The IP Cluster provides two primary functions:
IP traffic distribution to multiple servers and fault tolerance.

2. Introduction

The notion of distributing IP (Web) data traffic to multiple server
machines has already been explored by the various DNS methods
mentioned or described in RFC 1794 [1].
The basic drawbacks of
all these methods are similar:

* short or zero TTL for DNS records - this is not intended by
  the DNS specification and incurs a few unpleasant consequences;
* heavy DNS traffic - since secondary or non-authoritative DNS
  servers cannot effectively cache the data, all these methods
  generate heavy DNS queries across the global Internet,
  bombarding a chain of servers in the name space;
* potentially high delay - if any server in the DNS chain
  experiences an outage or bottleneck, the response to the initial
  query would be significantly delayed if an alternate DNS server
  were required to process the query;
* the primary DNS server becomes the single point of failure -
  since the TTL is very small or zero, an outage of the primary
  server for even a small period of time results in failed DNS
  lookups;
* easier to spoof - a DNS record can be easily "spoofed" to
  mislead a client with a bogus host name to IP address mapping;

among other drawbacks that are method specific. In short, these
DNS methods solve one problem that may be beneficial to a single
site, but create another that can be quite undesirable for the
Internet at large. Just imagine what would happen if every website
decided to implement a DNS method for distributing its web traffic.

Clearly, it is imperative and highly desirable that an alternative
solution be established that does not suffer the same drawbacks
discussed above, and that does not itself introduce a new
problem of equal severity to the network at large.

3. The Alternative

3.1 Applicable Topology

The proposed method requires an IP router connecting to multiple
servers in a switch-like configuration. That is, each server
machine is directly connected to a unique physical port/interface
on the router.
Since a router interface can be a LAN or serial
interface, this IP cluster can be formed locally or be
distributed over a wide-area network.

3.2 Description

The nature of IP load balancing requires that for a given IP host
name, typically corresponding to some network services, the data
traffic and thus the processing be actually distributed among
several server machines, with some control. This idea of a
"virtual server" provides transparent services to a remote client.
The virtual server itself consists of a number of host machines, or
cluster members, each performing a set of services. In a true
cluster environment, a cluster member performs a set of services
or functions that may be different from that of another member
within the cluster.

Similar to all the DNS load balancing methods, the proposed method
described in this document assumes symmetric host processing.
Namely, the so-called "IP Cluster" consists of a number of cluster
members each of which performs the same set of services (although
strictly speaking, it does not necessarily have to be so).

The alternative method does not rely on dynamic host name
to IP address mapping. Instead, it relies on the concept of a
Virtual IP (VIP) address. This VIP address is configured as the
host IP address for all cluster members. Each cluster member
is directly connected to a unique router interface port, much
like an Ether-Switch configuration, topologically.

The VIP address appears to the outside world as just another
unique IP address, with the usual DNS host name to IP address
mapping in the traditional static sense. Each IP cluster member
treats this VIP address as its own globally unique host address.

However, this VIP address appears very differently to the IP
router to which all the cluster members are directly connected.
With a careful and deliberate choice of the VIP address (e.g.,
xx.xx.xx.63 for a Class C network), and with the appropriate
subnet (or variable subnet) mask enabled in the router interface
ports, this unique host IP address is in effect a broadcast
address as far as the router is concerned. Consequently, upon
receiving an IP packet with destination address equal to this VIP
address, the router will, if configured properly, attempt to
broadcast the packet to all its relevant interfaces.

Each of these "broadcast" interfaces, however, is configured with
a simple filter. This simple filter basically filters on the IP
source address of the incoming packet. Thus with each interface
filter permitting only a unique and non-overlapping portion of
the IP address space to route through, we have effectively
achieved high-performance "IP-Switching".

Furthermore, since this partitioning of the IP address space can be
well controlled by each interface filter's bitmasking and
wildcarding, load balancing can now be accomplished with respect
to CPU, memory, IO, or all of the above, depending upon the
application nature of the IP cluster.

4. A Scalable Model

The IP clustering model described above should scale very well.
Physically, the "Virtual IP" clustering is primarily limited by
the number of router interface ports. In terms of performance, the
scalability of this model is limited mostly by network bandwidth
technology and the router performance, which is usually orders of
magnitude greater than a workstation server's ability to deliver
the same data throughput.

In short, this IP clustering model should scale quite linearly.

5. Fault Resilience and Fault Tolerance

Fault Resilience (FR) here means the ability of the IP cluster to

* automatically redistribute its parallel server processing in
  the event of any single cluster member (i.e., server machine)
  failure, and
* automatically restore the normal parallel processing once
  the failed server has recovered (by whatever means).

The IP clustering method described in this proposal should be able
to support the above requirements. There are a number of viable
implementations; I shall only briefly describe the basic
concept.

Essentially what needs to be done here to achieve FR is similar
to what is done in a "classic" cluster environment. Each cluster
member monitors the health and status of the other cluster members.
When a failure is detected, each monitoring member (which means
all but the failed machine) automatically enables itself to support
a portion of the services or functions that it is configured to
assume for that failed cluster member. When the failed cluster
member becomes alive again (usually detected through a heartbeat),
all other cluster members will fall back to their normal mode of
processing.

While I will not delve into all the relevant issues of building
a Fault Tolerant (FT) IP Cluster, suffice it to say that, with
this IP cluster model, one may easily build a Fault Tolerant IP
Cluster against any single point of host-or-network failure within
the cluster.

6. Implementation

The implementation of this IP cluster assumes that a router used
for IP switching is capable of forwarding IP broadcast packets.
While most routers have limited broadcast forwarding capability
(e.g., some may not forward TCP/IP broadcast packets), this
limitation should be easily removed by a prospective router vendor
by relaxing the artificially imposed transport-layer filtering
(which is not entirely a router's business to begin with).

Reconfiguration of the IP-Switch/router filters for achieving
better load balance should be performed by an automated script.
Since this type of reconfiguration is considered system down-time
for the IP cluster, the implementation of such a script should
minimize the down-time by, for instance, separating the step of
logging into the router from the step of actually modifying the
filters under human control.

As for communications between cluster members (i.e., heartbeat,
etc.), any number of protocols can be used. It may be as simple
as ping and tcp-echo, or as sophisticated as a new multicast
protocol.

7. Performance

As already mentioned in Section 4, the IP clustering method
described in this proposal should be extremely fast. The so-called
IP cluster here is essentially an IP-Switch (as opposed to an
Ether or FastEther Switch) connecting to a number of cluster
members, each taking full advantage of the underlying transmission
medium without the usual network contention.

Assuming that one is to configure a "Super IP-Switch" with
maximum IO ports and each port is connected to the highest
bandwidth technology and server machine available, the only issue
with regard to performance then is the router's routing
capability, particularly the router CPU required to perform
the interface filtering.

We can rest assured, however, that this interface filtering or
the router's routing performance cannot realistically be an issue
for two reasons. Reason one, because of bitmasking and wildcarding,
each interface filter list should be very short and compact.
(I don't see more than six lines in each access list unless the
same router is also used for firewalling, etc.) Reason two, long
before one reaches such routing performance issues, any reasonable
organization would want to add a second router into the same IP
cluster. The VIP clustering model supports multiple routers as
an integral part of a single IP cluster. In fact, building such
an IP cluster with multiple routers is one step towards building
a fault-tolerant IP cluster.

One question remains: How effective is the load balancing scheme
based on IP source address filtering? If it were not effective,
much of this high-performance claim would be defeated. I would say:
pretty effective, especially if the client base is very large
(which is what this proposal is intended to accomplish to begin
with).

This is simply a basic principle of statistical analysis: when
there is a large number of statistical samples, with each sample
behaving randomly and wildly, the overall statistical distribution
is often predictable and well behaved. In fact, the larger the
number, the more predictable and better behaved the statistical
envelope would be. Thus, this statistical property works greatly
in favor of this Internet-Draft's intent to use the IP cluster to
support a very large client base.

Assume one has set up the proposed IP cluster with multiple
servers. It makes no sense to talk about how good the load
balance actually is when the traffic is light enough that, even if
all the traffic were distributed to a single cluster member, that
member server would still not be overloaded. Good load balance
becomes relevant when traffic is heavy enough that some or all of
the cluster members must share significant (but still not
necessarily equal) portions of the traffic load.
It is important to keep the
perspective that the real purpose of clustering is to avoid
server overloading and not to artificially maintain equal load
balance at all times. The beauty of this IP clustering model is
that the more traffic and the larger the client base grows, the
better and more evenly the cluster distributes the load without
incurring any processing overhead.

The above load analysis simply means that an effective IP cluster
does not require fully dynamic load balancing per IP packet.
In fact, a truly dynamic load balancing scheme on a per-packet
basis would adversely affect the performance of such an IP cluster.
How often (e.g., once a month) and against what criteria (e.g., CPU,
memory, IO) the load balance should be sampled and analyzed
in order to re-tune, if necessary, the IP-Switch/router
access filter lists is application dependent.

8. Security Considerations

While the DNS methods for IP clustering rely on dynamic host
name to IP address mapping, which can easily be "spoofed",
the Virtual IP method does not suffer the same level of security
issues for the simple reason that it is more difficult to spoof
(and spoof it well) the routing topology of the Internet than to
spoof a DNS record.

Additionally, this Virtual IP clustering model does not preclude
any security schemes that are available under a non-cluster single
server environment, firewalls included.

9. Acknowledgments

Much appreciation is due to Mike Lee and Josh Sierles for
enlightening me with the DNS load balancing methods, and to Josh
again for referring me to RFC 1794.

10. References

[1] Brisco, T., "DNS Support for Load Balancing", RFC 1794,
    Rutgers University, April 1995.

11. Author's Address

Chi Chu
Research 2000, Inc.
265 Cherry Street, 16G
New York, New York 10002
USA

Phone: 212-598-9455
Email: chi@soho.ios.com
URL: http://soho.ios.com/~chi

This document expires February 21, 1997.
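
Illustrative sketch of the Section 3.2 mechanism

The two ideas in Section 3.2 (a VIP that coincides with a subnet
broadcast address, and per-interface filters that each admit a
non-overlapping share of the source address space) can be sketched
in Python as follows. This is a minimal sketch under stated
assumptions, not the draft's implementation: the /26 subnet, the
documentation addresses, the power-of-two member count, the choice
of the low-order source address bits as the filter key, and all
function names are hypothetical. Real router access-list syntax is
vendor specific and is not modeled here.

```python
import ipaddress


def is_subnet_broadcast(vip, subnet):
    """Return True if vip is the broadcast address of subnet.

    Assumption: the router sees the VIP through a subnet mask for
    which the VIP has all host bits set (e.g. x.x.x.63 with a /26).
    """
    network = ipaddress.ip_network(subnet)
    return ipaddress.ip_address(vip) == network.broadcast_address


def build_filters(num_members):
    """Build one (value, mask) filter per router interface.

    A source address passes filter i when (address & mask) == value.
    Assumption: num_members is a power of two, so the low-order bits
    of the source address split the address space evenly.
    """
    if num_members < 1 or num_members & (num_members - 1):
        raise ValueError("num_members must be a power of two in this sketch")
    mask = num_members - 1
    return [(member, mask) for member in range(num_members)]


def permitted_members(src, filters):
    """Return the indexes of the interfaces whose filter admits src."""
    addr = int(ipaddress.ip_address(src))
    return [
        index
        for index, (value, mask) in enumerate(filters)
        if (addr & mask) == value
    ]


if __name__ == "__main__":
    # The VIP is the broadcast address of the assumed /26 subnet.
    print(is_subnet_broadcast("192.0.2.63", "192.0.2.0/26"))

    # Four cluster members: each source address reaches exactly one.
    filters = build_filters(4)
    for client in ("198.51.100.7", "203.0.113.10"):
        print(client, "->", permitted_members(client, filters))
```

Because the filters are disjoint and together cover every value of
the masked bits, each broadcast packet is admitted on exactly one
interface. Re-tuning the load balance, as discussed in Section 7,
would amount to choosing different (value, mask) pairs.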