idnits 2.17.1

draft-ietf-ion-marsmcs-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.
     Expected boilerplate is as follows today (2024-04-27) according to
     https://trustee.ietf.org/license-info :

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

         This Internet-Draft is submitted in full conformance with the
         provisions of BCP 78 and BCP 79.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

         Copyright (c) 2024 IETF Trust and the persons identified as the
         document authors. All rights reserved.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

         This document is subject to BCP 78 and the IETF Trust's Legal
         Provisions Relating to IETF Documents
         (https://trustee.ietf.org/license-info) in effect on the date of
         publication of this document. Please review these documents
         carefully, as they describe your rights and restrictions with respect
         to this document. Code Components extracted from this document must
         include Simplified BSD License text as described in Section 4.e of
         the Trust Legal Provisions and are provided without warranty as
         described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date. The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** There are 200 instances of too long lines in the document, the longest
     one being 4 characters in excess of 72.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 167: '...An MCS MUST NOT share its ATM address ...'

     RFC 2119 keyword, line 177: '...ress of the MARS MUST be known to the ...'

     RFC 2119 keyword, line 180: '...startup, the MCS MUST open a point-to-...'

     RFC 2119 keyword, line 181: '...e MCS to the MARS MUST be carried over...'

     RFC 2119 keyword, line 182: '... MARSVC. The MCS MUST register with th...'

     (66 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 18, 1996) is 10022 days in the past. Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'ML93' is mentioned on line 48, but not defined

  == Unused Reference: 'BK95' is defined on line 741, but no explicit
     reference was found in the text

  == Unused Reference: 'LM93' is defined on line 744, but no explicit
     reference was found in the text

  -- Possible downref: Normative reference to a draft: ref. 'BK95'

  ** Obsolete normative reference: RFC 1577 (ref. 'LM93') (Obsoleted by RFC
     2225)

  -- No information found for draft-luciani-rolc-scsp - is the name correct?

  -- Possible downref: Normative reference to a draft: ref. 'LA96'

  -- No information found for draft-talpade-ion-multmcs - is the name correct?

  -- Possible downref: Normative reference to a draft: ref. 'TA96'


     Summary: 13 errors (**), 0 flaws (~~), 4 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    Internet Engineering Task Force                        Talpade and Ammar
2    INTERNET-DRAFT                           Georgia Institute of Technology
3                                                           November 18, 1996

5                                                          Expires: May, 1996

7
8         Multicast Server Architectures for MARS-based ATM multicasting.

10   Status of this Memo

12   This document is an Internet-Draft. Internet-Drafts are working documents
13   of the Internet Engineering Task Force (IETF), its areas, and its working
14   groups. Note that other groups may also distribute working documents as
15   Internet-Drafts.

17   Internet-Drafts are draft documents valid for a maximum of six months and
18   may be updated, replaced, or obsoleted by other documents at any time. It
19   is inappropriate to use Internet-Drafts as reference material or to cite
20   them other than as ``work in progress.''

22   To learn the current status of any Internet-Draft, please check the
23   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
24   Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe),
25   ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

27   Abstract

29   A mechanism to support the multicast needs of layer 3 protocols in general,
30   and IP in particular, over UNI 3.0/3.1 based ATM networks has been
31   described in RFC 2022. Two basic approaches exist for the intra-subnet
32   (intra-cluster) multicasting of IP packets. One makes use of a mesh of
33   point-to-multipoint VCs (the 'VC Mesh' approach), while the other uses a
34   shared point-to-multipoint tree rooted on a Multicast Server (MCS). This
35   memo provides details on the design and implementation of an MCS, building
36   on the core mechanisms defined in RFC 2022. It also provides a mechanism
37   for using multiple MCSs per group for providing fault tolerance. This
38   approach can be used with RFC 2022 based MARS server and clients, without
39   needing any change in their functionality.

41   1 Introduction

43   A solution to the problem of mapping layer 3 multicast service over the
44   connection-oriented ATM service provided by UNI 3.0/3.1 has been presented
45   in [GA96]. A Multicast Address Resolution Server (MARS) is used to
46   maintain a mapping of layer 3 group addresses to ATM addresses in that
47   architecture. It can be considered to be an extended analog of the ATM ARP
48   Server introduced in RFC 1577 ([ML93]). Hosts in the ATM network use the
49   MARS to resolve layer 3 multicast addresses into corresponding lists of ATM
50   addresses of group members. Hosts keep the MARS informed when they need to
51   join or leave a particular layer 3 group.

53   The MARS manages a "cluster" of ATM-attached endpoints. A "cluster" is
54   defined as

56     "The set of ATM interfaces choosing to participate in direct ATM
57     connections to achieve multicasting of AAL_SDUs between themselves."

59   In practice, a cluster is the set of endpoints that choose to use the same
60   MARS to register their memberships and receive their updates from.

62   A sender in the cluster has two options for multicasting data to the group
63   members. It can either get the list of ATM addresses constituting the
64   group from the MARS, set up a point-to-multipoint virtual circuit (VC) with
65   the group members as leaves, and then proceed to send data out on it.
66   Alternatively, the source can make use of a proxy Multicast Server (MCS).
67   The source transmits data to such an MCS, which in turn uses a
68   point-to-multipoint VC to get the data to the group members.

70   The MCS approach has been briefly introduced in [GA96]. This memo presents
71   a detailed description of MCS architecture and proposes a simple mechanism
72   for supporting multiple MCSs for fault tolerance. We assume an
73   understanding of the IP multicasting over UNI 3.0/3.1 ATM network concepts
74   described in [GA96], and access to it. This document is organized as
75   follows. Section 2 presents interactions with the local UNI 3.0/3.1
76   signaling entity that are used later in the document and were
77   originally described in [GA96]. Section 3 presents an MCS architecture,
78   along with a description of its interactions with the MARS. Section 4
79   describes the working of an MCS. The possibility of using multiple MCSs for
80   the same layer 3 group, and the mechanism needed to support such usage, is
81   described in section 5. A comparison of the VC Mesh approach and the MCS
82   approach is presented in Appendix A.

84   2 Interaction with the local UNI 3.0/3.1 signaling entity

86   The following generic signaling functions are presumed to be available to
87   local AAL Users:

89     L_CALL_RQ      - Establish a unicast VC to a specific endpoint.
90     L_MULTI_RQ     - Establish multicast VC to a specific endpoint.
91     L_MULTI_ADD    - Add new leaf node to previously established VC.
92     L_MULTI_DROP   - Remove specific leaf node from established VC.
93     L_RELEASE      - Release unicast VC, or all Leaves of a multicast VC.

95   The following indications are assumed to be available to AAL Users,
96   generated by the local UNI 3.0/3.1 signaling entity:

98     L_ACK          - Successful completion of a local request.
99     L_REMOTE_CALL  - A new VC has been established to the AAL User.
100    ERR_L_RQFAILED - A remote ATM endpoint rejected an L_CALL_RQ,
101                     L_MULTI_RQ, or L_MULTI_ADD.
102    ERR_L_DROP     - A remote ATM endpoint dropped off an existing VC.
103    ERR_L_RELEASE  - An existing VC was terminated.

105  3 MCS Architecture

107  The MCS acts as a proxy server which multicasts data received from a source
108  to the group members in the cluster. All multicast sources transmitting to
109  an MCS-based group send the data to the specified MCS. The MCS then
110  forwards the data over a point-to-multipoint VC that it maintains to group
111  members in the cluster. Each multicast source thus maintains a single
112  point-to-multipoint VC to the designated MCS for the group. The designated
113  MCS terminates one point-to-multipoint VC from each cluster member that is
114  multicasting to the layer 3 group. Each group member is the leaf of the
115  point-to-multipoint VC originating from the MCS.

117  A brief introduction to possible MCS architectures has been presented in
118  [GA96]. The main contribution of that document concerning the MCS approach
119  is the specification of the MARS interaction with the MCS. The next section
120  lists control messages exchanged by the MARS and MCS.

122  3.1 Control Messages exchanged by the MCS and the MARS

124  The following control messages are exchanged by the MARS and the MCS.

126    operation code              Control Message

128         1                      MARS_REQUEST
129         2                      MARS_MULTI
130         3                      MARS_MSERV
131         6                      MARS_NAK
132         7                      MARS_UNSERV
133         8                      MARS_SJOIN
134         9                      MARS_SLEAVE
135        12                      MARS_REDIRECT_MAP

137  MARS_MSERV and MARS_UNSERV are identical in format to the MARS_JOIN message.
138  MARS_SJOIN and MARS_SLEAVE are also identical in format to MARS_JOIN. As
139  such, their formats and those of MARS_REQUEST, MARS_MULTI, MARS_NAK and
140  MARS_REDIRECT_MAP are described in [GA96]. Their usage is described in
141  section 4. All control messages are LLC/SNAP encapsulated as described in
142  section 4.2 of [GA96]. (The "mar$" notation used in this document is
143  borrowed from [GA96], and indicates a specific field in the control
144  message.) Data messages are reflected without any modification by the MCS.

146  3.2 Association with a layer 3 group

148  The simplest MCS architecture involves taking incoming AAL_SDUs from the
149  multicast sources and sending them out over the point-to-multipoint VC to
150  the group members. The MCS can service just one layer 3 group using this
151  design, as it has no way of distinguishing between traffic destined for
152  different groups. So each layer 3 MCS-supported group will have its own
153  designated MCS.

155  However it is desirable in the interests of saving resources to utilize the
156  same MCS to support multiple groups. This can be done by adding minimal
157  layer 3 specific processing into the MCS. The MCS can now look inside the
158  received AAL_SDUs and determine which layer 3 group they are destined for.
159  A single instance of such an MCS could register its ATM address with the
160  MARS for multiple layer 3 groups, and manage multiple point-to-multipoint
161  VCs, one for each group. This capability is included in the MCS
162  architecture, as is the capability of having multiple MCSs per group
163  (section 5).
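
The per-group forwarding decision of section 3.2 can be sketched as follows. This is a minimal illustration, not part of the protocol: it assumes the layer 3 protocol is IPv4, that the encapsulation header has already been removed so the buffer starts at the IPv4 header, and that a VC is represented by an opaque handle. The names GroupDemux, register, select_vc and destination_group are hypothetical.

```python
import ipaddress


def destination_group(ip_packet: bytes) -> ipaddress.IPv4Address:
    """Return the IPv4 destination address (header bytes 16..19)."""
    if len(ip_packet) < 20:
        raise ValueError("truncated IPv4 header")
    return ipaddress.IPv4Address(ip_packet[16:20])


class GroupDemux:
    """Maps each supported layer 3 group to its outgoing VC (hypothetical)."""

    def __init__(self):
        # group address -> opaque handle of the point-to-multipoint VC
        self._vc_for_group = {}

    def register(self, group: str, vc) -> None:
        self._vc_for_group[ipaddress.IPv4Address(group)] = vc

    def select_vc(self, ip_packet: bytes):
        # None means no outgoing VC is known for the packet's group.
        return self._vc_for_group.get(destination_group(ip_packet))
```

An MCS serving a single group needs no such lookup; the table only becomes necessary once one MCS instance registers for several groups.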

165  4 Working of MCS

167  An MCS MUST NOT share its ATM address with any other cluster member (MARS
168  or otherwise). However, it may share the same physical ATM interface (even
169  with other MCSs or the MARS), provided that each logical entity has a
170  different ATM address. This section describes the working of MCS and its
171  interactions with the MARS and other cluster members.

173  4.1 Usage of MARS_MSERV and MARS_UNSERV

175  4.1.1 Registration (and deregistration) with the MARS

177  The ATM address of the MARS MUST be known to the MCS by out-of-band means
178  at startup. One possible approach for doing this is for the network
179  administrator to specify the MARS address at command line while invoking
180  the MCS. On startup, the MCS MUST open a point-to-point control VC (MARS_VC)
181  with the MARS. All traffic from the MCS to the MARS MUST be carried over
182  the MARS_VC. The MCS MUST register with the MARS using the MARS_MSERV
183  message on startup. To register, a MARS_MSERV MUST be sent by the MCS to
184  the MARS over the MARS_VC. On receiving this MARS_MSERV, the MARS adds the
185  MCS to the ServerControlVC. The ServerControlVC is maintained by the MARS
186  with all MCSs as leaves, and is used to disseminate general control
187  messages to all the MCSs. The MCS MUST terminate this VC, and MUST expect
188  a copy of the MCS registration MARS_MSERV on the MARS_VC from the MARS.

190  An MCS can deregister by sending a MARS_UNSERV to the MARS. A copy of this
191  MARS_UNSERV MUST be expected back from the MARS. The MCS will then be
192  dropped from the ServerControlVC.

194  No protocol specific group addresses are included in MCS registration
195  MARS_MSERV and MARS_UNSERV. The mar$flags.register bit MUST be set, the
196  mar$cmi field MUST be set to zero, the mar$flags.sequence field MUST be set
197  to zero, the source ATM address MUST be included and a null source protocol
198  address MAY be specified in these MARS_MSERV and MARS_UNSERV. All other
199  fields are set as described in section 5.2.1 of [GA96] (the MCS can be
200  considered to be a cluster member while reading that section). The MCS MUST
201  keep retransmitting (section 4.1.3) the MARS_MSERV/MARS_UNSERV over the
202  MARS_VC until it receives a copy back.

204  In case of failure to open the MARS_VC, or error on it, the reconnection
205  procedure outlined in section 4.5.2 is to be followed.

207  4.1.2 Registration (and deregistration) of layer 3 groups

209  The MCS can register with the MARS to support particular group(s). To
210  register a group X, a MARS_MSERV with a <min, max> pair of <X, X> MUST be
211  sent to the MARS. The MCS MUST expect a copy of the MARS_MSERV back from the
212  MARS. The retransmission strategy outlined in section 4.1.3 is to be
213  followed if no copy is received. Multiple groups can be supported by
214  sending a separate MARS_MSERV for each group.

216  The MCS MUST similarly use MARS_UNSERV if it wants to withdraw support for a
217  specific layer 3 group. A copy of the group MARS_UNSERV MUST be received,
218  failing which the retransmission strategy in section 4.1.3 is to be
219  followed.

221  The mar$flags.register bit MUST be reset and the mar$flags.sequence field
222  MUST be set to zero in the group MARS_MSERV and MARS_UNSERV. All other
223  fields are set as described in section 5.2.1 of [GA96] (the MCS can be
224  considered to be a cluster member when reading that section).

226  4.1.3 Retransmission of MARS_MSERV and MARS_UNSERV

228  Transient problems may cause loss of control messages. The MCS needs to
229  retransmit MARS_MSERV/MARS_UNSERV at regular intervals when it does not
230  receive a copy back from the MARS. This interval should be no shorter than
231  5 seconds, and a default value of 10 seconds is recommended. A maximum of
232  5 retransmissions is permitted before a failure is logged. This MUST be
233  considered a MARS failure, which SHOULD result in the MARS reconnection
234  mechanism described in section 4.5.2.
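
The retransmission rule of section 4.1.3 can be sketched as below. This is only an illustration under stated assumptions: the callables send and wait_for_copy are hypothetical stand-ins for the MCS's transmit path and for a timed wait on the MARS_VC, and the limit is read as one initial transmission followed by at most 5 retransmissions.

```python
MIN_INTERVAL = 5.0          # seconds; the interval should be no shorter
DEFAULT_INTERVAL = 10.0     # seconds; recommended default
MAX_RETRANSMISSIONS = 5     # after this many retries a failure is logged


class MarsFailure(Exception):
    """Raised when no copy is received; caller starts MARS reconnection."""


def send_until_copied(send, wait_for_copy, interval=DEFAULT_INTERVAL):
    """Send a MARS_MSERV/MARS_UNSERV until the MARS returns a copy.

    send()              -- transmits the message (hypothetical callable)
    wait_for_copy(secs) -- True if a valid copy arrived within secs
    Returns the number of retransmissions that were needed.
    """
    if interval < MIN_INTERVAL:
        raise ValueError("retransmission interval below 5 seconds")
    # Attempt 0 is the original transmission; the rest are retransmissions.
    for attempt in range(1 + MAX_RETRANSMISSIONS):
        send()
        if wait_for_copy(interval):
            return attempt
    raise MarsFailure("no copy received after 5 retransmissions")
```

Passing the wait as a callable keeps the sketch independent of any particular timer or event loop; a real MCS would also enforce the rule that only one MARS_MSERV or MARS_UNSERV is outstanding at a time.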

236  A "copy" is defined as a received message with the following fields
237  matching the previously transmitted MARS_MSERV/MARS_UNSERV:

239    - mar$op
240    - mar$flags.register
241    - mar$pnum
242    - Source ATM address
243    - first <min, max> pair

245  In addition, a valid copy MUST have the following field values:

247    - mar$flags.punched = 0
248    - mar$flags.copy = 1

250  If either of the above is not true, the message MUST be dropped without
251  resetting the MARS_MSERV/MARS_UNSERV timer. There MUST be only one
252  MARS_MSERV or MARS_UNSERV outstanding at a time.

254  4.1.4 Processing of MARS_MSERV and MARS_UNSERV

256  The MARS transmits copies of group MARS_MSERV and MARS_UNSERV on the
257  ServerControlVC. So they are also received by MCSs other than the
258  originating one. This section discusses the processing of these messages
259  by the other MCSs.

261  If a MARS_MSERV is seen that refers to a layer 3 group not supported by the
262  MCS, it MUST be used to track the Server Sequence Number (section 4.5.1)
263  and then silently dropped.

265  If a MARS_MSERV is seen that refers to a layer 3 group supported by the MCS,
266  the MCS learns of the existence of another MCS supporting the same group.
267  This possibility (of multiple MCSs per group) is incorporated in this
268  version of the MCS approach and is discussed in section 5.

270  4.2 Usage of MARS_REQUEST and MARS_MULTI

272  As described in section 5.1, the MCS learns at startup whether it is an
273  active or inactive MCS. After successful registration with the MARS, an MCS
274  which has been designated as inactive for a particular group MUST NOT
275  register to support that group with the MARS. It instead proceeds as in
276  section 5.4. The active MCS for a group also has to do some special
277  processing, which we describe in that section. The rest of section 4
278  describes the working of a single active MCS, with section 5 describing the
279  active MCS's actions for supporting multiple MCSs.
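
Looking back at the "copy" definition in section 4.1.3, the acceptance test can be sketched as a comparison of a few fields. The sketch assumes the message has already been parsed; the MarsMsg class and its attribute names are hypothetical and merely mirror the mar$ fields listed in that section.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MarsMsg:
    """Parsed subset of a MARS control message (hypothetical layout)."""
    op: int                 # mar$op
    register: bool          # mar$flags.register
    pnum: int               # mar$pnum
    src_atm: bytes          # source ATM address
    first_pair: tuple       # first address pair carried in the message
    punched: bool = False   # mar$flags.punched
    copy: bool = False      # mar$flags.copy


def is_valid_copy(sent: MarsMsg, received: MarsMsg) -> bool:
    """True if 'received' is a valid copy of the outstanding 'sent'."""
    same_fields = (
        received.op == sent.op
        and received.register == sent.register
        and received.pnum == sent.pnum
        and received.src_atm == sent.src_atm
        and received.first_pair == sent.first_pair
    )
    # A valid copy additionally has punched = 0 and copy = 1.
    return same_fields and not received.punched and received.copy
```

A message that fails this test is dropped and the retransmission timer keeps running, so a stray or altered message cannot end the retransmission sequence early.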

281  After the active MCS registers to support a layer 3 group, it uses
282  MARS_REQUEST and MARS_MULTI to obtain information about group membership
283  from the MARS. These messages are also used during the revalidation phase
284  (section 4.5) and when no outgoing VC exists for a received layer 3 packet
285  (section 4.3).

287  On registering to support a particular layer 3 group, the MCS MUST send a
288  MARS_REQUEST to the MARS. The mechanism to retrieve group membership and the
289  format of MARS_REQUEST and MARS_MULTI are described in sections 5.1.1 and
290  5.1.2 of [GA96] respectively. The MCS MUST use this mechanism for sending
291  (and retransmitting) the MARS_REQUEST and processing the returned
292  MARS_MULTI(/s). The MARS_MULTI MUST be received correctly, and the MCS MUST
293  use it to initialize its knowledge of group membership.

295  On successful reception of a MARS_MULTI, the MCS MUST attempt to open the
296  outgoing point-to-multipoint VC using the mechanism described in section
297  5.1.3 of [GA96], if any group members exist. The MCS however MUST start
298  transmitting data on this VC after it has opened it successfully with at
299  least one of the group members as a leaf, and after it has attempted to add
300  all the group members at least once.

302  4.3 Usage of outgoing point-to-multipoint VC

304  Cluster members which are sources for MCS-supported layer 3 groups send
305  (encapsulated) layer 3 packets to the designated MCSs. An MCS, on
306  receiving them from cluster members, has to send them out over the specific
307  point-to-multipoint VC for that layer 3 group. This VC is set up as
308  described in the previous section. However, it is possible that no group
309  members currently exist, thus causing no VC to be set up. So an MCS may
310  have no outgoing VC to forward received layer 3 packets on, in which case
311  it MUST initiate the MARS_REQUEST and MARS_MULTI sequence described in the
312  previous section. This new MARS_MULTI could contain new members, whose
313  MARS_SJOINs may not have been received by the MCS (and the loss not detected
314  due to absence of traffic on the ServerControlVC).

316  If an MCS learns that there are no group members (MARS_NAK received from
317  MARS), it MUST delay sending out a new MARS_REQUEST for that group for a
318  period no less than 5 seconds and no more than 10 seconds.

320  Layer 3 packets received from cluster members, while no outgoing
321  point-to-multipoint VC exists for that group, MUST be silently dropped
322  after following the guidelines in the previous paragraphs. This might
323  result in some layer 3 packets being lost until the VC is set up.

325  Each outgoing point-to-multipoint VC has a revalidate flag associated with
326  it. This flag MUST be checked whenever a layer 3 packet is sent out on
327  that VC. No action is taken if it is not set. If it is set, the packet is
328  sent out, the revalidation procedure (section 4.5.3) MUST be initiated for
329  this group, and the flag MUST be reset.

331  In case of error on a point-to-multipoint VC, the MCS MUST initiate
332  revalidation procedures for that VC as described in section 4.5.3.

334  Once a point-to-multipoint VC has been set up for a particular layer 3
335  group, the MCS MUST hold the VC open and mark it as the outgoing path for
336  any subsequent layer 3 packets being sent for that group address. A
337  point-to-multipoint VC MUST NOT have an activity timer associated with it.
338  It is to remain up at all times, unless the MCS explicitly stops supporting
339  that layer 3 group, or no more leaves exist on the VC which causes it to be
340  shut down. The VC is kept up in spite of non-existent traffic to reduce the
341  delay suffered by MCS supported groups. If the VC were to be shut down on
342  absence of traffic, the VC reestablishment procedure (needed when new
343  traffic for the layer 3 group appears) would further increase the initial
344  delay, which can be potentially higher than the VC mesh approach anyway as
345  two VCs need to be set up in the MCS case (one from source to MCS, second
346  from MCS to group) as opposed to only one (from source to group) in the VC
347  Mesh approach. This approach of keeping the VC from the MCS open even in
348  the absence of traffic is experimental. A decision either way can only be
349  made after gaining experience (either through implementation or simulation)
350  about the implications of keeping the VC open.

352  If the MCS supports multiple layer 3 groups, it MUST follow the procedure
353  outlined in the four previous subsections for each group for which it is an
354  active MCS. Each incoming data AAL_SDU MUST be examined for determining its
355  recipient group, before being forwarded onto the appropriate outgoing
356  point-to-multipoint VC.

358  4.3.1 Group member dropping off a point-to-multipoint VC

360  An ERR_L_DROP may be received during the lifetime of a point-to-multipoint
361  VC indicating that a leaf node has terminated its participation at the ATM
362  level. The ATM endpoint associated with the ERR_L_DROP MUST be removed from
363  the locally held set associated with the VC. The revalidate flag on the VC
364  MUST be set after a random interval of 1 through 10 seconds.

366  If an ERR_L_RELEASE is received for a VC, then the entire set is cleared and
367  the VC considered to be completely shut down. A new VC for this layer 3
368  group will be established only on reception of new traffic for the group
369  (as described in section 4.3).

371  4.4 Processing of MARS_SJOIN and MARS_SLEAVE

373  The MARS transmits equivalent MARS_SJOIN/MARS_SLEAVE on the ServerControlVC
374  when it receives MARS_JOIN/MARS_LEAVE from cluster members. The MCSs keep
375  track of group membership updates through these messages. The format of
376  these messages is identical to MARS_JOIN and MARS_LEAVE, which are
377  described in section 5.2.1 of [GA96]. It is sufficient to note here that
378  these messages carry the ATM address of the node joining/leaving the
379  group(/s), the group(/s) being joined or left, and a Server Sequence Number
380  from MARS.

382  When a MARS_SJOIN is seen which refers to (or encompasses) a layer 3 group
383  (or groups) supported by the MCS, the following action MUST be taken. The
384  new member's ATM address is extracted from the MARS_SJOIN. An L_MULTI_ADD is
385  issued for the new member for each of those referred groups which have an
386  outgoing point-to-multipoint VC. An L_MULTI_RQ is issued for the new member
387  for each of those referred groups which have no outgoing VCs.

389  When a MARS_SLEAVE is seen that refers to (or encompasses) a layer 3 group
390  (or groups) supported by the MCS, the following action MUST be taken. The
391  leaving member's ATM address is extracted. An L_MULTI_DROP is issued for
392  the member for each of the referred groups which have an outgoing
393  point-to-multipoint VC.

395  There is a possibility of the above requests (L_MULTI_RQ or L_MULTI_ADD or
396  L_MULTI_DROP) failing. The UNI 3.0/3.1 failure cause must be returned in
397  the ERR_L_RQFAILED signal from the local signaling entity to the AAL User.
398  If the failure cause is not 49 (Quality of Service unavailable), 51 (user
399  cell rate not available - UNI 3.0), 37 (user cell rate not available - UNI
400  3.1), or 41 (Temporary failure), the endpoint's ATM address is dropped from
401  the locally held view of the group by the MCS. Otherwise, the request MUST
402  be re-attempted with increasing delay (initial value between 5 and 10
403  seconds, with delay value doubling after each attempt) until it either
404  succeeds or the multipoint VC is released or a MARS_SLEAVE is received for
405  that group member. If the VC is open, traffic on the VC MUST continue
406  during these attempts.

408  MARS_SJOIN and MARS_SLEAVE are processed differently if multiple MCSs share
409  the members of the same layer 3 group (section 5.4). MARS_SJOIN and
410  MARS_SLEAVE that do not refer to (or encompass) supported groups MUST be
411  used to track the Server Sequence Number (section 4.5.1), but are otherwise
412  ignored.

414  4.5 Revalidation Procedures

416  The MCS has to initiate revalidation procedures in case of certain failures
417  or errors.

419  4.5.1 Server Sequence Number

421  The MCS needs to track the Server Sequence Number (SSN) in the messages
422  received on the ServerControlVC from the MARS. It is carried in the mar$msn
423  of all messages (except MARS_NAK) sent by the MARS to MCSs. A jump in SSN
424  implies that the MCS missed the previous message(/s) sent by the MARS. The
425  MCS then sets the revalidate flag on all outgoing point-to-multipoint VCs
426  after a random delay of between 1 and 10 seconds, to avoid all MCSs
427  inundating the MARS simultaneously in case of a more general failure.

429  The only exception to the rule is if a sequence number jump is detected during
430  the establishment of a new group's VC (i.e. a MARS_MULTI was correctly
431  received, but its mar$msn indicated that some previous MARS traffic had
432  been missed on ClusterControlVC). In this case every open VC, EXCEPT the
433  one just being established, MUST have its revalidate flag set at some
434  random interval between 1 and 10 seconds from the time the jump in SSN was
435  detected. (The VC being established is considered already validated in
436  this case).

438  Each MCS keeps its own 32 bit MCS Sequence Number (MSN) to track the SSN.
439  Whenever a message is received that carries a mar$msn field, the following
440  processing is performed:

442       Seq.diff = mar$msn - MSN

444       mar$msn -> MSN

446       (.... process MARS message ....)

448       if ((Seq.diff != 1) && (Seq.diff != 0))
449            then (.... revalidate group membership information ....)

451  The mar$msn value in an individual MARS_MULTI is not used to update the MSN
452  until all parts of the MARS_MULTI (if > 1) have arrived. (If the mar$msn
453  changes during reception of a MARS_MULTI series, the MARS_MULTI is discarded
454  as described in section 5.1.1 of [GA96]).

456  The MCS sets its MSN to zero on startup. It gets the current value of SSN
457  when it receives the copy of the registration MARS_MSERV back from the MARS.

459  4.5.2 Reconnecting to the MARS

461  The MCSs are assumed to have been configured with the ATM address of at
462  least one MARS at startup. MCSs MAY choose to maintain a table of ATM
463  addresses, each address representing an alternative MARS which will be
464  contacted in case of failure of the previous one. This table is assumed to
465  be ordered in descending order of preference.

467  An MCS will decide that it has problems communicating with a MARS if:

469    * It fails to establish a point-to-point VC with the MARS.

471    * MARS_REQUEST generates no response (no MARS_MULTI or MARS_NAK returned).

473    * ServerControlVC fails.

475    * MARS_MSERV or MARS_UNSERV do not result in their respective copies being
476      received.

478  Reconnection then proceeds as in section 5.4 of [GA96], with MCS-specific
479  actions used where needed.

481  4.5.3 Revalidating a point-to-multipoint VC

483  The revalidate flag associated with a point-to-multipoint VC is checked
484  when a layer 3 packet is to be sent out on the VC. Revalidation procedures
485  MUST be initiated for a point-to-multipoint VC that has its revalidate flag
486  set when a layer 3 packet is being sent out on it. Thus more active groups
487  get revalidated faster than less active ones. The revalidation process
488  MUST NOT result in disruption of normal traffic on the VC being
489  revalidated.

491  The revalidation procedure is as follows. The MCS reissues a MARS_REQUEST
492  for the VC being revalidated. The returned set of members is compared with
493  the locally held set; L_MULTI_ADDs MUST be issued for new members, and
494  L_MULTI_DROPs MUST be issued for non-existent ones. The revalidate flag
495  MUST be reset for the VC.

497  5 Multiple MCSs for a layer 3 group

499  Having a single MCS for a layer 3 group can cause it to become a single
500  point of failure and a bottleneck for groups with large numbers of active
501  senders. It is thus desirable to introduce a level of fault tolerance by
502  having multiple MCSs per group. Support for load sharing is not introduced
503  in this version of the draft so as to reduce the complexity of the
504  protocol.

506  5.1 Outline

508  The protocol described in this draft offers fault tolerance by using
509  multiple MCSs for the same group. This is achieved by having a standby MCS
510  take over from a failed MCS which had been supporting the group. The MCS
511  currently supporting a group is referred to as the active MCS, while the one
512  or more standby MCSs are referred to as inactive MCSs. There is only one
513  active MCS existing at any given instant for an MCS-supported group. The
514  protocol makes use of the HELLO messages as described in [LA96].

516  To reduce the complexity of the protocol, the following operational
517  guidelines need to be followed. These guidelines need to be enforced by
518  out-of-band means which are not specified in this document and can be
519  implementation dependent.

521    * The set of (one or more) MCSs (``mcslist'') that support a particular
522      IP Multicast group is predetermined and fixed. This set MUST be known
523      to each MCS in the set at startup, and the ordering of MCSs in the set
524      is the same for all MCSs in the set. An implementation of this would
525      be to maintain the set of ATM addresses of the MCSs in a file, an
526      identical copy of which is kept at each MCS in the set.

   * All MCSs in ``mcslist'' have to be started up together, with the first
     MCS in ``mcslist'' being the last to be started.

   * A failed MCS cannot be started up again.

5.2 Discussion of Multiple MCSs in operation

An MCS on startup determines its position in the ``mcslist''. If the MCS
is not the first in ``mcslist'', it does not register for supporting the
group with the MARS. If the MCS is first in the set, it does register to
support the group.

The first MCS thus becomes the active MCS and supports the group as
described in section 4. The active MCS also opens a point-to-multipoint VC
(HelloVC) to the remaining MCSs in the set (the inactive MCSs). It starts
sending HELLO messages on this VC at a fixed interval (HelloInterval
seconds). The inactive MCSs maintain a timer to keep track of the last
received HELLO message. If an inactive MCS does not receive a message
within HelloInterval*DeadFactor seconds (values of HelloInterval and
DeadFactor are the same at all the MCSs), or if the HelloVC is closed, it
assumes failure of the active MCS and attempts to elect a new one. The
election process is described in section 5.5.

If an MCS is elected as the new active one, it registers to support the
group with the MARS. It also initiates the transmission of HELLO messages
to the remaining inactive MCSs.

5.3 Inter-MCS control messages

The protocol uses HELLO messages in the heartbeat mechanism, and also
during the election process. The format of the HELLO message is based on
that described in [LA96]. The Hello message type code is 5.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Sender Len   |   Recvr Len   | State |  Type |    unused     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         HelloInterval         |          DeadFactor           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     IP Multicast address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Sender ATM address (variable length)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Receiver ATM address (variable length)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Sender Len
      This field holds the length in octets of the Sender ATM address.

   Recvr Len
      This field holds the length in octets of the Receiver ATM
      address.

   State
      Currently two states: No-Op (0x00) and Elected (0x01).
      It is used by a candidate MCS to indicate if it was successfully
      elected.

   Type
      This is the code for the message type.

   HelloInterval
      The hello interval advertises the time between sending of
      consecutive Hello Messages by an active MCS. If the time between
      Hello messages exceeds the HelloInterval then the Hello is to be
      considered late by the inactive MCS.

   DeadFactor
      This is a multiplier to the HelloInterval. If an inactive MCS
      does not receive a Hello message within the interval
      HelloInterval*DeadFactor from an active MCS that advertised
      the HelloInterval then the inactive MCS MUST consider the active
      one to have failed.

   IP Multicast address
      This field is used to indicate the group to associate the HELLO
      message with. It is useful if MCSs can support more than one
      group.

   Sender ATM address
      This is the protocol address of the server which is sending the
      Hello.

   Receiver ATM address
      This is the protocol address of the server which is to Reply to
      the Hello. If the sender does not know this address then the
      sender sets it to zero. (This happens in the HELLO messages sent
      from the active MCS to the inactive ones, as they are multicast
      and not sent to one specific receiver).

5.4 The Multiple MCS protocol

As is indicated in section 5.1, all the MCSs supporting the same IP
Multicast group MUST be started up together. The set of MCSs (``mcslist'')
MUST be specified to each MCS in the set at startup. After registering to
support the group with the MARS, the first MCS in the set MUST open a
point-to-multipoint VC (HelloVC) with the remaining MCSs in the ``mcslist''
as leaves, and thus assumes the role of active MCS. It MUST send HELLO
messages HelloInterval seconds apart on this VC. The Hello message sent by
the active MCS MUST have the Receiver Len set to zero, the State field set
to "Elected", with the other fields appropriately set. The Receiver ATM
address field does not exist in this HELLO message. The initial value of
HelloInterval and DeadFactor MUST be the same at all MCSs at startup. The
active MCS can choose to change these values by introducing the new value
in the HELLO messages that are sent out. The active MCS MUST support the
group as described in section 4.

The other MCSs in ``mcslist'' determine the identity of the first MCS from
the ``mcslist''. They MUST NOT register to support the group with the
MARS, and become inactive MCSs. On startup, an inactive MCS expects HELLO
messages from the active MCS. The inactive MCS MUST terminate the HelloVC.
A timer MUST be maintained, and if the inactive MCS does not receive a HELLO
message from the active one within a period of HelloInterval*DeadFactor
seconds, it assumes that the active MCS died, and initiates the election
process as described in section 5.5.
If a HELLO message is received within
this period, the inactive MCS does not initiate any further action, other
than restarting the timer. The inactive MCSs MUST set their values of
HelloInterval and DeadFactor to those specified by the active MCS in the
HELLO messages.

On failure of the active MCS, a new MCS assumes its role as described in
section 5.5. In this case, the remaining inactive MCSs will expect HELLO
messages from this new active MCS as described in the previous paragraph.

5.5 Failure handling

5.5.1 Failure of active MCS

The failure of the active MCS is detected by the inactive MCSs if no HELLO
message is received within an interval of HelloInterval*DeadFactor seconds,
or if the HelloVC is closed. In this case the next MCS in ``mcslist''
becomes the candidate MCS. It MUST open a point-to-multipoint VC to the
remaining inactive MCSs (HelloVC) and send a HELLO message on it with the
State field set to No-Op. The rest of the message is formatted as
described earlier.

On receiving a HELLO message from a candidate MCS, an inactive MCS MUST
open a point-to-point VC to that candidate. It MUST send a HELLO message
back to it, with the Sender and Receiver fields appropriately set (not
zero), and the State field being No-Op. If a HELLO message is received by
an inactive MCS from a non-candidate MCS, it is ignored. If no HELLO
message is received from the candidate with the State field set to
"Elected" in HelloInterval seconds, the inactive MCS MUST retransmit the
HELLO. If no HELLO message with State field set to "Elected" is received by
the inactive MCSs within an interval of HelloInterval*DeadFactor seconds,
the next MCS in ``mcslist'' is considered as the candidate MCS. Note that
the values used for HelloInterval and DeadFactor in the election phase are
the default ones.
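
The behaviour of an inactive MCS during failure detection and election, as
described above, can be illustrated with a minimal sketch. It is not part
of the protocol specification. The class and method names (InactiveMcs,
on_hello, tick) are hypothetical, time is passed in explicitly rather than
read from a clock, and the sketch assumes that the candidate pointer simply
stops at the last entry of ``mcslist''. The per-HelloInterval
retransmission of replies, closure of the HelloVC, and the case where this
MCS is itself the next candidate are not modelled.

```python
from dataclasses import dataclass

NO_OP, ELECTED = 0x00, 0x01  # State field values from section 5.3


@dataclass
class InactiveMcs:
    """Hypothetical model of one inactive MCS during failure detection."""
    mcslist: list              # ordered ATM addresses, identical at every MCS
    hello_interval: float      # seconds
    dead_factor: int
    expected: int = 0          # index of the MCS whose HELLOs are awaited
    last_elected: float = 0.0  # time of last "Elected" HELLO (or timer reset)

    def on_hello(self, now, sender, state):
        """Handle a HELLO; return True if a No-Op reply must be sent."""
        if sender != self.mcslist[self.expected]:
            return False                 # non-candidate HELLOs are ignored
        if state == ELECTED:
            self.last_elected = now      # heartbeat from the active MCS
            return False
        return True                      # candidate HELLO: reply to candidate

    def tick(self, now):
        """Move to the next candidate once the dead interval has elapsed."""
        if now - self.last_elected > self.hello_interval * self.dead_factor:
            # Assumption: the pointer stops at the last entry of mcslist.
            self.expected = min(self.expected + 1, len(self.mcslist) - 1)
            self.last_elected = now      # restart the timer for the candidate
        return self.mcslist[self.expected]


mcs = InactiveMcs(["A", "B", "C"], hello_interval=5.0, dead_factor=3)
print(mcs.tick(10.0))   # within HelloInterval*DeadFactor of the last HELLO
print(mcs.tick(16.0))   # dead interval exceeded: next MCS is the candidate
```

Only a HELLO with State set to "Elected" restarts the timer in this
sketch, so a candidate that keeps sending No-Op HELLOs without ever being
elected is still passed over after HelloInterval*DeadFactor seconds.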

The candidate MCS MUST wait for a period of HelloInterval*DeadFactor
seconds to receive HELLO messages from inactive MCSs. It MUST transmit
HELLO messages with State field set to No-Op at intervals of HelloInterval
seconds during this period. If it receives messages from at least half of
the remaining inactive MCSs during this period, it considers itself elected
and assumes the active MCS role. It then registers to support the group
with the MARS, and starts sending HELLO messages at HelloInterval second
intervals with State field set to "Elected" on the already existing
HelloVC. The active MCS can then alter the HelloInterval and DeadFactor
values if desired, and communicate the same to the inactive MCSs in the
HELLO message.

5.5.2 Failure of inactive MCS

If an inactive MCS drops off the HelloVC, the active MCS MUST attempt to
add that MCS back to the VC for three attempts, spaced
HelloInterval*DeadFactor seconds apart. If even the third attempt fails,
the inactive MCS is considered dead.

An MCS, active or inactive, MUST NOT be started up once it has failed.
Failed MCSs can only be started up by manual intervention after shutting
down all the MCSs, and restarting them together.

5.6 Compatibility with future MARS and MCS versions

Future versions of MCSs can be expected to use an enhanced MARS for load
sharing and fault tolerance ([TA96]). The MCS architecture described in
this document is compatible with the enhanced MARS and the future MCS
versions. This is because the active MCS is the only one which
communicates with the MARS about the group. Hence the active MCS will only
be informed by the enhanced MARS about the subset of the group that it is
to support. Thus MCSs conforming to this document are compatible with
[GA96] based MARS, as well as enhanced MARS.

6 Summary

This draft describes the architecture of an MCS.
It also provides a
mechanism for using multiple MCSs per group for providing fault tolerance.
This approach can be used with [GA96] based MARS server and clients,
without needing any change in their functionality. It uses the HELLO
packet format as described in [LA96] for the heartbeat messages.

7 Acknowledgements

We would like to acknowledge Grenville Armitage (Bellcore) for reviewing
the draft and suggesting improvements towards simplifying the multiple MCS
functionalities. Discussion with Joel Halpern (Newbridge) helped clarify
the multiple MCS problem. Anthony Gallo (IBM RTP) pointed out security
issues that are not adequately addressed in the current draft.

8 Authors' Address

Rajesh Talpade - taddy@cc.gatech.edu - (404)-894-6737
Mostafa H. Ammar - ammar@cc.gatech.edu - (404)-894-3292

College of Computing
Georgia Institute of Technology
Atlanta, GA 30332-0280

References

[GA96] Armitage, G.J., "Support for Multicast over UNI 3.0/3.1 based ATM
       networks", RFC 2022.

[BK95] Birman, A., Kandlur, D., Rubas, J., "An extension to the MARS
       model", Internet Draft, draft-kandlur-ipatm-mars-directvc-00.txt,
       November 1995.

[LM93] Laubach, M., "Classical IP and ARP over ATM", RFC 1577,
       Hewlett-Packard Laboratories, December 1993.

[LA96] Luciani, J., G. Armitage, and J. Halpern, "Server Cache
       Synchronization Protocol (SCSP) - NBMA", Internet Draft,
       draft-luciani-rolc-scsp-02.txt, April 1996.

[TA96] Talpade, R., and Ammar, M.H., "Multiple MCS support using an
       enhanced version of the MARS server.", Internet Draft (work in
       progress), draft-talpade-ion-multmcs-00.txt, June 1996.