Internet-Draft                                        Grenville Armitage
                                                                Bellcore
                                                      October 24th, 1995

       Support for Multicast over UNI 3.0/3.1 based ATM Networks.

Status of this Memo

   This document was submitted to the IETF IP over ATM WG.
   Publication of this document does not imply acceptance by the IP
   over ATM WG of any ideas expressed within. Comments should be
   submitted to the ip-atm@matmos.hpl.hp.com mailing list.

   Distribution of this memo is unlimited.

   This memo is an internet draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress".

   Please check the 1id-abstracts.txt listing contained in the
   internet-drafts shadow directories on ds.internic.net (US East
   Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or
   munnari.oz.au (Pacific Rim) to learn the current status of any
   Internet Draft.

Abstract

   Mapping the connectionless IP multicast service over the connection
   oriented ATM services provided by UNI 3.0/3.1 is a non-trivial
   task. This memo describes a mechanism to support the multicast
   needs of Layer 3 protocols in general, and describes its
   application to IP multicasting in particular.

   ATM based IP hosts and routers use a Multicast Address Resolution
   Server (MARS) to support RFC 1112 style Level 2 IP multicast over
   the ATM Forum's UNI 3.0/3.1 point to multipoint connection service.
   Clusters of endpoints share a MARS and use it to track and
   disseminate information identifying the nodes listed as receivers
   for given multicast groups. This allows endpoints to establish and
   manage point to multipoint VCs when transmitting to the group.
   The MARS behaviour allows Layer 3 multicasting to be supported
   using either meshes of VCs or ATM level multicast servers. This
   choice may be made on a per-group basis, and is transparent to the
   endpoints.

Contents.

   1. Introduction.
      1.1 The Multicast Address Resolution Server (MARS).
      1.2 The ATM level multicast Cluster.
      1.3 Document overview.
      1.4 Conventions.
   2. The IP multicast service model.
   3. UNI 3.0/3.1 support for intra-cluster multicasting.
      3.1 VC meshes.
      3.2 Multicast Servers.
      3.3 Tradeoffs.
      3.4 Interaction with local UNI 3.0/3.1 signalling entity.
   4. Overview of the MARS.
      4.1 Architecture.
      4.2 Control message format.
      4.3 Fixed header fields in MARS control messages.
          4.3.1 Hardware type.
          4.3.2 Protocol type.
          4.3.3 Checksum.
          4.3.4 Extensions Offset.
          4.3.5 Operation code.
          4.3.6 Reserved.
   5. Endpoint (MARS client) interface behaviour.
      5.1 Transmit side behaviour.
          5.1.1 Retrieving Group Membership from the MARS.
          5.1.2 MARS_REQUEST, MARS_MULTI, and MARS_NAK messages.
          5.1.3 Establishing the outgoing multipoint VC.
          5.1.4 Monitoring updates on ClusterControlVC.
                5.1.4.1 Updating the active VCs.
                5.1.4.2 Tracking the Cluster Sequence Number.
          5.1.5 Revalidating a VC's leaf nodes.
                5.1.5.1 When leaf node drops itself.
                5.1.5.2 When a jump is detected in the CSN.
      5.2 Receive side behaviour.
          5.2.1 Format of the MARS_JOIN and MARS_LEAVE Messages.
                5.2.1.1 Important IPv4 default values.
          5.2.2 Retransmission of MARS_JOIN and MARS_LEAVE messages.
          5.2.3 Cluster member registration and deregistration.
      5.3 Support for Layer 3 group management.
      5.4 Support for redundant/backup MARS entities.
          5.4.1 First response to MARS problems.
          5.4.2 Connecting to a backup MARS.
          5.4.3 Dynamic backup lists, and soft redirects.
      5.5 Data path LLC/SNAP encapsulations.
          5.5.1 Type #1 encapsulation.
          5.5.2 Type #2 encapsulation.
          5.5.3 A Type #1 example.
   6. The MARS in greater detail.
      6.1 Basic interface to Cluster members.
          6.1.1 Response to MARS_REQUEST.
          6.1.2 Response to MARS_JOIN and MARS_LEAVE.
          6.1.3 Generating MARS_REDIRECT_MAP.
          6.1.4 Cluster Sequence Numbers.
      6.2 MARS interface to Multicast Servers (MCSs).
          6.2.1 MARS_REQUESTs for MCS supported groups.
          6.2.2 MARS_MSERV and MARS_UNSERV messages.
          6.2.3 Registering a Multicast Server (MCS).
          6.2.4 Modified response to MARS_JOIN and MARS_LEAVE.
          6.2.5 Sequence numbers for ServerControlVC traffic.
      6.3 Why global sequence numbers?
      6.4 Redundant/Backup MARS Architectures.
   7. How an MCS utilises a MARS.
      7.1 Association with a particular Layer 3 group.
      7.2 Termination of incoming VCs.
      7.3 Management of outgoing VC.
      7.4 Use of a backup MARS.
   8. Support for IP multicast routers.
      8.1 Forwarding into a Cluster.
      8.2 Joining in 'promiscuous' mode.
      8.3 Forwarding across the cluster.
      8.4 Joining in 'semi-promiscuous' mode.
      8.5 An alternative to IGMP Queries.
      8.6 CMIs across multiple interfaces.
   9. Multiprotocol applications of the MARS and MARS clients.
   10. Supplementary parameter processing.
       10.1 Interpreting the ar$extoff field.
       10.2 The format of TLVs.
       10.3 Processing MARS messages with TLVs.
       10.4 Initial set of TLV elements.
   11. Key Decisions and open issues.
   Acknowledgments
   References
   Appendix A. Hole punching algorithms.
   Appendix B. Minimising the impact of IGMP in IPv4 environments.
   Appendix C. Further comments on 'Clusters'.
   Appendix D. TLV list parsing algorithm.
   Appendix E. Summary of timer values.
   Appendix F. Pseudo code for MARS operation.

1. Introduction.

   Multicasting is the process whereby a source host or protocol
   entity sends a packet to multiple destinations simultaneously using
   a single, local 'transmit' operation.
   The more familiar cases of Unicasting and Broadcasting may be
   considered to be special cases of Multicasting (with the packet
   delivered to one destination, or 'all' destinations, respectively).

   Most network layer models, like the one described in RFC 1112 [1]
   for IP multicasting, assume sources may send their packets to
   abstract 'multicast group addresses'. Link layer support for such
   an abstraction is assumed to exist, and is provided by technologies
   such as Ethernet.

   ATM is being utilized as a new link layer technology to support a
   variety of protocols, including IP. With RFC 1483 [2] the IETF
   defined a multiprotocol mechanism for encapsulating and
   transmitting packets using AAL5 over ATM Virtual Channels (VCs).
   However, the ATM Forum's currently published signalling
   specifications (UNI 3.0 [8] and UNI 3.1 [4]) do not provide the
   multicast address abstraction. Unicast connections are supported by
   point to point, bidirectional VCs. Multicasting is supported
   through point to multipoint unidirectional VCs. The key limitation
   is that the sender must have prior knowledge of each intended
   recipient, and explicitly establish a VC with itself as the root
   node and the recipients as the leaf nodes.

   This document has two broad goals:

      Define a group address registration and membership distribution
      mechanism that allows UNI 3.0/3.1 based networks to support the
      multicast service of protocols such as IP.

      Define specific endpoint behaviours for managing point to
      multipoint VCs to achieve multicasting of layer 3 packets.

   As the IETF is currently at the forefront of using wide area
   multicasting, this document's descriptions will often focus on the
   IP service model of RFC 1112. A final chapter will note the
   multiprotocol application of the architecture.
   This document avoids discussion of one highly non-trivial aspect of
   using ATM - the specification of QoS for VCs being established in
   response to higher layer needs. Research in this area is still very
   formative [7], and so it is assumed that future documents will
   clarify the mapping of QoS requirements to VC establishment. The
   default at this time is that VCs are established with a request for
   Unspecified Bit Rate (UBR) service, as typified by the IETF's use
   of VCs for unicast IP, described in RFC 1755 [6].

1.1 The Multicast Address Resolution Server (MARS).

   The Multicast Address Resolution Server (MARS) is an extended
   analog of the ATM ARP Server introduced in RFC 1577 [3]. It acts as
   a registry, associating layer 3 multicast group identifiers with
   the ATM interfaces representing the group's members. MARS messages
   support the distribution of multicast group membership information
   between MARS and endpoints (hosts or routers). Endpoint address
   resolution entities query the MARS when a layer 3 address needs to
   be resolved to the set of ATM endpoints making up the group at any
   one time. Endpoints keep the MARS informed when they need to join
   or leave particular layer 3 groups. To provide for asynchronous
   notification of group membership changes the MARS manages a point
   to multipoint VC out to all endpoints desiring multicast support.

   Valid arguments can be made for two different approaches to ATM
   level multicasting of layer 3 packets - through meshes of point to
   multipoint VCs, or ATM level multicast servers (MCS). The MARS
   architecture allows either VC meshes or MCSs to be used on a
   per-group basis.

1.2 The ATM level multicast Cluster.

   Each MARS manages a 'cluster' of ATM-attached endpoints.
   A Cluster is defined as

      The set of ATM interfaces choosing to participate in direct ATM
      connections to achieve multicasting of AAL_SDUs between
      themselves.

   In practice, a Cluster is the set of endpoints that choose to use
   the same MARS to register their memberships and receive their
   updates from.

   By implication of this definition, traffic between interfaces
   belonging to different Clusters passes through an inter-cluster
   device. (In the IP world an inter-cluster device would be an IP
   multicast router with logical interfaces into each Cluster.) This
   document explicitly avoids specifying the nature of inter-cluster
   (layer 3) routing protocols.

   The mapping of clusters to other constrained sets of endpoints
   (such as unicast Logical IP Subnets) is left to each network
   administrator. However, for the purposes of conformance with this
   document network administrators MUST ensure that each Logical IP
   Subnet (LIS) is served by a separate MARS, creating a one-to-one
   mapping between cluster and unicast LIS. IP multicast routers then
   interconnect each LIS as they do with conventional subnets.
   (Relaxation of this restriction MAY only occur after future
   research on the interaction between existing layer 3 multicast
   routing protocols and unicast subnet boundaries.)

   The term 'Cluster Member' will be used in this document to refer to
   an endpoint that is currently using a MARS for multicast support.
   Thus the potential scope of a cluster may be the entire membership
   of a LIS, while the actual scope of a cluster depends on which
   endpoints are actually cluster members at any given time.

1.3 Document overview.

   This document assumes an understanding of concepts explained in
   greater detail in RFC 1112, RFC 1577, UNI 3.0/3.1, and RFC 1755
   [6].

   Section 2 provides an overview of IP multicast and what RFC 1112
   required from Ethernet.
   Section 3 describes in more detail the multicast support services
   offered by UNI 3.0/3.1, and outlines the differences between VC
   meshes and multicast servers (MCSs) as mechanisms for distributing
   packets to multiple destinations.

   Section 4 provides an overview of the MARS and its relationship to
   ATM endpoints. This section also discusses the encapsulation and
   structure of MARS control messages.

   Section 5 substantially defines the entire cluster member endpoint
   behaviour, on both receive and transmit sides. This includes both
   normal operation and error recovery.

   Section 6 summarises the required behaviour of a MARS.

   Section 7 looks at how a multicast server (MCS) interacts with a
   MARS.

   Section 8 discusses how IP multicast routers may make novel use of
   promiscuous and semi-promiscuous group joins. Also discussed is a
   mechanism designed to reduce the amount of IGMP traffic issued by
   routers.

   Section 9 discusses how this document applies in the more general
   (non-IP) case.

   Section 11 summarises the key proposals, and identifies areas for
   future research that are generated by this MARS architecture.

   The appendices provide discussion on issues that arise out of the
   implementation of this document. Appendix A discusses MARS and
   endpoint algorithms for parsing MARS messages. Appendix B describes
   the particular problems introduced by the current IGMP paradigms,
   and possible interim work-arounds. Appendix C discusses the
   'cluster' concept in further detail, while Appendix D briefly
   outlines an algorithm for parsing TLV lists.

1.4 Conventions.

   In this document the following coding and packet representation
   rules are used:

      All multi-octet parameters are encoded in big-endian form (i.e.
      the most significant octet comes first).

      In all multi-bit parameters bit numbering begins at 0 for the
      least significant bit when stored in memory (i.e.
      the n'th bit has a weight of 2^n).

      A bit that is 'set', 'on', or 'one' holds the value 1.

      A bit that is 'reset', 'off', 'clear', or 'zero' holds the
      value 0.

2. Summary of the IP multicast service model.

   Under IP version 4 (IPv4), addresses in the range between 224.0.0.0
   and 239.255.255.255 (224.0.0.0/4) are termed 'Class D' or
   'multicast group' addresses. These abstractly represent all the IP
   hosts in the Internet (or some constrained subset of the Internet)
   who have decided to 'join' the specified group.

   RFC 1112 requires that a multicast-capable IP interface must
   support the transmission of IP packets to an IP multicast group
   address, whether or not the node considers itself a 'member' of
   that group. Consequently, group membership is effectively
   irrelevant to the transmit side of the link layer interfaces. When
   Ethernet is used as the link layer (the example used in RFC 1112),
   no address resolution is required to transmit packets. An
   algorithmic mapping from IP multicast address to Ethernet multicast
   address is performed locally before the packet is sent out the
   local interface in the same 'send and forget' manner as a unicast
   IP packet.

   Joining and Leaving an IP multicast group is more explicit on the
   receive side - with the primitives JoinLocalGroup and
   LeaveLocalGroup affecting what groups the local link layer
   interface should accept packets from. When the IP layer wants to
   receive packets from a group, it issues JoinLocalGroup. When it no
   longer wants to receive packets, it issues LeaveLocalGroup. A key
   point to note is that changing state is a local issue; it has no
   effect on other hosts attached to the Ethernet.

   IGMP is defined in RFC 1112 to support IP multicast routers
   attached to a given subnet. Hosts issue IGMP Report messages when
   they perform a JoinLocalGroup, or in response to an IP multicast
   router sending an IGMP Query.
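   The 'algorithmic mapping' referred to above is defined by RFC 1112:
   the low-order 23 bits of the Class D address are copied into the
   low-order 23 bits of the IANA Ethernet multicast block
   01-00-5E-00-00-00. A minimal illustrative sketch (not part of this
   specification):

```python
def ip_mcast_to_ether(ip: str) -> str:
    """Map an IPv4 multicast (Class D) address to its Ethernet
    multicast address per RFC 1112: the low-order 23 bits of the IP
    address overwrite the low-order 23 bits of 01-00-5E-00-00-00."""
    o = [int(x) for x in ip.split('.')]
    if not 224 <= o[0] <= 239:
        raise ValueError("not a Class D address")
    mac = [0x01, 0x00, 0x5E, o[1] & 0x7F, o[2], o[3]]
    return '-'.join('%02X' % b for b in mac)

print(ip_mcast_to_ether('224.0.0.1'))        # 01-00-5E-00-00-01
print(ip_mcast_to_ether('239.255.255.255'))  # 01-00-5E-7F-FF-FF
```

   Note that the high-order 5 bits of the group number do not survive
   the mapping, so 32 distinct IP groups share each Ethernet multicast
   address.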
   By periodically transmitting queries IP multicast routers are able
   to identify what IP multicast groups have non-zero membership on a
   given subnet.

   A specific IP multicast address, 224.0.0.1, is allocated for the
   transmission of IGMP Query messages. Host IP layers issue a
   JoinLocalGroup for 224.0.0.1 when they intend to participate in IP
   multicasting, and issue a LeaveLocalGroup for 224.0.0.1 when
   they've ceased participating in IP multicasting.

   Each host keeps a list of IP multicast groups it has been
   JoinLocalGroup'd to. When a router issues an IGMP Query on
   224.0.0.1 each host begins to send IGMP Reports for each group it
   is a member of. IGMP Reports are sent to the group address, not
   224.0.0.1, "so that other members of the same group on the same
   network can overhear the Report" and not bother sending one of
   their own. IP multicast routers conclude that a group has no
   members on the subnet when IGMP Queries no longer elicit associated
   replies.

3. UNI 3.0/3.1 support for intra-cluster multicasting.

   For the purposes of the MARS protocol, both UNI 3.0 and UNI 3.1
   provide equivalent support for multicasting. Differences between
   UNI 3.0 and UNI 3.1 in required signalling elements are covered in
   RFC 1755.

   This document will describe its operation in terms of 'generic'
   functions that should be available to clients of a UNI 3.0/3.1
   signalling entity in a given ATM endpoint. The ATM model broadly
   describes an 'AAL User' as any entity that establishes and manages
   VCs and underlying AAL services to exchange data. An IP over ATM
   interface is a form of 'AAL User' (although the default LLC/SNAP
   encapsulation mode specified in RFC 1755 really requires that an
   'LLC entity' is the AAL User, which in turn supports the IP/ATM
   interface).
   The most fundamental limitations of UNI 3.0/3.1's multicast support
   are:

      Only point to multipoint, unidirectional VCs may be established.

      Only the root (source) node of a given VC may add or remove leaf
      nodes.

   Leaf nodes are identified by their unicast ATM addresses. UNI
   3.0/3.1 defines two ATM address formats - native E.164 and NSAP
   (although it must be stressed that the NSAP address is so called
   because it uses the NSAP format - an ATM endpoint is NOT a Network
   layer termination point). In UNI 3.0/3.1 an 'ATM Number' is the
   primary identification of an ATM endpoint, and it may use either
   format. Under some circumstances an ATM endpoint must be identified
   by both a native E.164 address (identifying the attachment point of
   a private network to a public network), and an NSAP address ('ATM
   Subaddress') identifying the final endpoint within the private
   network. For the rest of this document the term 'ATM address' will
   be used to mean either a single 'ATM Number' or an 'ATM Number'
   combined with an 'ATM Subaddress'.

3.1 VC meshes.

   The most fundamental approach to intra-cluster multicasting is the
   multicast VC mesh. Each source establishes its own independent
   point to multipoint VC (a single multicast tree) to the set of leaf
   nodes (destinations) that it has been told are members of the group
   it wishes to send packets to.

   Interfaces that are both senders and group members (leaf nodes) to
   a given group will originate one point to multipoint VC, and
   terminate one VC for every other active sender to the group. This
   criss-crossing of VCs across the ATM network gives rise to the name
   'VC mesh'.

3.2 Multicast Servers.

   An alternative model has each source establish a VC to an
   intermediate node - the multicast server (MCS). The multicast
   server itself establishes and manages a point to multipoint VC out
   to the actual desired destinations.
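   This relay model can be sketched schematically. The following is a
   toy model only: a real MCS sits on AAL5/ATM interfaces, and the
   send_on_p2mp_vc hook is a hypothetical stand-in for transmission on
   the outgoing point to multipoint VC.

```python
from queue import Queue

class MulticastServerSketch:
    """Toy MCS data path: each reassembled AAL_SDU arriving on any
    incoming VC is queued, then retransmitted exactly once on the
    single outgoing point to multipoint VC."""

    def __init__(self, send_on_p2mp_vc):
        self._txq = Queue()
        self._send = send_on_p2mp_vc   # hypothetical transmit hook

    def on_incoming_aal_sdu(self, sdu: bytes) -> None:
        # Invoked once per fully reassembled AAL_SDU, from any sender's
        # incoming VC.
        self._txq.put(sdu)

    def pump(self) -> None:
        # Forward queued SDUs on the one outgoing VC, in arrival order.
        while not self._txq.empty():
            self._send(self._txq.get())

sent = []
mcs = MulticastServerSketch(sent.append)
mcs.on_incoming_aal_sdu(b'sdu-from-sender-A')
mcs.on_incoming_aal_sdu(b'sdu-from-sender-B')
mcs.pump()
# sent now holds both SDUs, in arrival order
```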
   The MCS reassembles AAL_SDUs arriving on all the incoming VCs, and
   then queues them for transmission on its single outgoing point to
   multipoint VC. (Reassembly of incoming AAL_SDUs is required at the
   multicast server, as AAL5 does not support cell level multiplexing
   of different AAL_SDUs on a single outgoing VC.)

   The leaf nodes of the multicast server's point to multipoint VC
   must be established prior to packet transmission, and the multicast
   server requires an external mechanism to identify them. A
   side-effect of this method is that ATM interfaces that are both
   sources and group members will receive copies of their own packets
   back from the MCS. (An alternative method is for the multicast
   server to explicitly retransmit packets on individual VCs between
   itself and group members. A benefit of this second approach is that
   the multicast server can ensure that sources do not receive copies
   of their own packets.)

   The simplest MCS pays no attention to the contents of each AAL_SDU.
   It is purely an AAL/ATM level device. More complex MCS
   architectures (where a single endpoint serves multiple layer 3
   groups) are possible, but are beyond the scope of this document.
   More detailed discussion is provided in section 7.

3.3 Tradeoffs.

   Arguments over the relative merits of VC meshes and multicast
   servers have raged for some time. Ultimately the choice depends on
   the relative trade-offs a system administrator must make between
   throughput, latency, congestion, and resource consumption. Even
   criteria such as latency can mean different things to different
   people - is it end to end packet time, or the time it takes for a
   group to settle after a membership change? The final choice depends
   on the characteristics of the applications generating the multicast
   traffic.
   If we focus on the data path we might prefer the VC mesh because it
   lacks the obvious single congestion point of an MCS. Throughput is
   likely to be higher, and end to end latency lower, because the mesh
   lacks the intermediate AAL_SDU reassembly that must occur in MCSs.
   The underlying ATM signalling system also has greater opportunity
   to ensure optimal branching points at ATM switches along the
   multicast trees originating on each source.

   However, resource consumption will be higher. Every group member's
   ATM interface must terminate a VC per sender (consuming on-board
   memory for state information, an instance of an AAL service, and
   buffering in accordance with the vendor's particular architecture).
   In contrast, with a multicast server only 2 VCs (one out, one in)
   are required, independent of the number of senders. The allocation
   of VC related resources is also lower within the ATM cloud when
   using a multicast server. These points may be considered to have
   merit in environments where VCs across the UNI or within the ATM
   cloud are valuable (e.g. the ATM provider charges on a per VC
   basis), or AAL contexts are limited in the ATM interfaces of
   endpoints.

   If we focus on the signalling load then MCSs have the advantage
   when faced with dynamic sets of receivers. Every time the
   membership of a multicast group changes (a leaf node needs to be
   added or dropped), only a single point to multipoint VC needs to be
   modified when using an MCS. This generates a single signalling
   event across the MCS's UNI. However, when membership change occurs
   in a VC mesh, signalling events occur at the UNIs of every traffic
   source - the transient signalling load scales with the number of
   sources. This has obvious ramifications if you define latency as
   the time for a group's connectivity to stabilise after change
   (especially as the number of senders increases).
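   The scaling difference can be made concrete with a back of the
   envelope count (a sketch only, under the simplifying assumption
   that senders are not themselves group members):

```python
def mesh_counts(senders: int) -> dict:
    """VC mesh: one point to multipoint VC per sender, so each group
    member terminates one VC per sender, and a membership change
    triggers a signalling event at every sender's UNI."""
    return {'p2mp_vcs': senders,
            'vcs_terminated_per_member': senders,
            'signalling_events_per_membership_change': senders}

def mcs_counts(senders: int) -> dict:
    """MCS: senders feed one intermediate server, which originates a
    single point to multipoint VC; a membership change modifies only
    that one VC."""
    return {'p2mp_vcs': 1,
            'incoming_vcs_at_mcs': senders,
            'vcs_terminated_per_member': 1,
            'signalling_events_per_membership_change': 1}

for s in (2, 10, 100):
    print(s,
          mesh_counts(s)['signalling_events_per_membership_change'],
          mcs_counts(s)['signalling_events_per_membership_change'])
```

   The mesh column grows linearly with the number of senders, while
   the MCS column stays at one, mirroring the trade-off described
   above.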
   Finally, as noted above, MCSs introduce a 'reflected packet'
   problem, which requires additional per-AAL_SDU information to be
   carried in order for layer 3 sources to detect their own AAL_SDUs
   coming back.

   The MARS architecture allows system administrators to utilize
   either approach on a group by group basis.

3.4 Interaction with local UNI 3.0/3.1 signalling entity.

   The following generic signalling functions are presumed to be
   available to local AAL Users:

   L_CALL_RQ     - Establish a unicast VC to a specific endpoint.
   L_MULTI_RQ    - Establish multicast VC to a specific endpoint.
   L_MULTI_ADD   - Add new leaf node to previously established VC.
   L_MULTI_DROP  - Remove specific leaf node from established VC.
   L_RELEASE     - Release unicast VC, or all Leaves of a multicast
                   VC.

   The signalling exchanges and local information passed between AAL
   User and UNI 3.0/3.1 signalling entity with these functions are
   outside the scope of this document.

   The following indications are assumed to be available to AAL Users,
   generated by the local UNI 3.0/3.1 signalling entity:

   L_ACK          - Successful completion of a local request.
   L_REMOTE_CALL  - A new VC has been established to the AAL User.
   ERR_L_RQFAILED - A remote ATM endpoint rejected an L_CALL_RQ,
                    L_MULTI_RQ, or L_MULTI_ADD.
   ERR_L_DROP     - A remote ATM endpoint dropped off an existing VC.
   ERR_L_RELEASE  - An existing VC was terminated.

   The signalling exchanges and local information passed between AAL
   User and UNI 3.0/3.1 signalling entity with these indications are
   outside the scope of this document.

4. Overview of the MARS.

   The MARS may reside within any ATM endpoint that is directly
   addressable by the endpoints it is serving. Endpoints wishing to
   join a multicast cluster must be configured with the ATM address of
   the node on which the cluster's MARS resides.
   (Section 5.4 describes how backup MARSs may be added to support the
   activities of a cluster. References to 'the MARS' in following
   sections will be assumed to mean the acting MARS for the cluster.)

4.1 Architecture.

   Architecturally the MARS is an evolution of the RFC 1577 ARP
   Server. Whilst the ARP Server keeps a table of {IP,ATM} address
   pairs for all IP endpoints in an LIS, the MARS keeps extended
   tables of {layer 3 address, ATM.1, ATM.2, ..... ATM.n} mappings. It
   can either be configured with certain mappings, or dynamically
   'learn' mappings. The format of the {layer 3 address} field is
   generally not interpreted by the MARS (except for a few special
   cases, described later).

   A single ATM node may support multiple logical MARSs, each of which
   supports a separate cluster. The restriction is that each MARS has
   a unique ATM address (e.g. a different SEL field in the NSAP
   address of the node on which the multiple MARSs reside). By
   definition a single instance of a MARS may not support more than
   one cluster.

   The MARS distributes group membership update information to cluster
   members over a point to multipoint VC known as the
   ClusterControlVC. Additionally, when Multicast Servers (MCSs) are
   being used it also establishes a separate point to multipoint VC
   out to registered MCSs, known as the ServerControlVC. All cluster
   members are leaf nodes of ClusterControlVC. All registered
   multicast servers are leaf nodes of ServerControlVC (described
   further in section 6).

   The MARS does NOT take part in the actual multicasting of layer 3
   data packets.

4.2 Control message format.

   By default all MARS control messages MUST be LLC/SNAP encapsulated
   using the following codepoints:

      [0xAA-AA-03][0x00-00-5E][0x00-03][MARS control message]
         (LLC)        (OUI)      (PID)

   (This is a PID from the IANA OUI.)
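   In code, the mandatory encapsulation amounts to an 8 octet prefix
   on every MARS control message (a sketch; the byte values are taken
   directly from the codepoints above):

```python
LLC = bytes([0xAA, 0xAA, 0x03])  # LLC header: SNAP follows
OUI = bytes([0x00, 0x00, 0x5E])  # IANA OUI
PID = bytes([0x00, 0x03])        # PID assigned to MARS control

def encapsulate_mars(control_message: bytes) -> bytes:
    """Prefix a MARS control message with its LLC/SNAP header."""
    return LLC + OUI + PID + control_message

frame = encapsulate_mars(b'<MARS control message>')
assert frame.startswith(bytes.fromhex('AAAA0300005E0003'))
```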
576 MARS control messages are made up of 4 major components: 578 [Fixed header][Mandatory fields][Addresses][Supplementary TLVs] 580 [Fixed header] contains fields indicating the operation being 581 performed and the layer 3 protocol being referred to (e.g. IPv4, IPv6, 582 AppleTalk, etc.). The fixed header also carries checksum information, 583 and hooks to allow this basic control message structure to be re-used 584 by other query/response protocols. 586 The [Mandatory fields] section carries fixed width parameters that 587 depend on the operation type indicated in [Fixed header]. 589 The following [Addresses] area carries variable length fields for 590 source and target addresses - both hardware (e.g. ATM) and layer 3 591 (e.g. IPv4). These provide the fundamental information that the 592 registrations, queries, and updates use and operate on. For the MARS 593 protocol, fields in [Fixed header] indicate how to interpret the 594 contents of [Addresses]. 596 [Supplementary TLVs] represents an optional list of TLV (type, 597 length, value) encoded information elements that may be appended to 598 provide supplementary information. This feature is described in 599 further detail in section 10. 601 MARS messages contain variable length address fields. In all cases 602 null addresses SHALL be encoded as zero length, and have no space 603 allocated in the message. 605 (Unique LLC/SNAP encapsulation of MARS control messages means MARS 606 and ARP Server functionality may be implemented within a common 607 entity, and share a client-server VC, if the implementor so chooses. 608 Note that the LLC/SNAP codepoint for MARS is different to the 609 codepoint used for ATMARP.) 611 4.3 Fixed header fields in MARS control messages. 613 The [Fixed header] has the following format: 615 Data: 616 ar$hrd 16 bits Hardware type. 617 ar$pro.type 16 bits Protocol type. 618 ar$pro.snap 40 bits Optional SNAP extension to protocol type. 619 ar$hdrrsv 24 bits Reserved.
Unused by MARS control protocol. 620 ar$chksum 16 bits Checksum across entire MARS message. 621 ar$extoff 16 bits Extensions Offset. 622 ar$op 16 bits Operation code. 623 ar$shtl 8 bits Type & length of source ATM number. 625 ar$sstl 8 bits Type & length of source ATM subaddress. 627 ar$shtl and ar$sstl provide information regarding the source's 628 hardware (ATM) address. In the MARS protocol these fields are always 629 present, as every MARS message carries a non-null source ATM address. 630 In all cases the source ATM address is the first variable length 631 field in the [Addresses] section. 633 The other fields in [Fixed header] are described in the following 634 subsections. 636 4.3.1 Hardware type. 638 ar$hrd defines the type of 'hardware' addresses being carried. When 639 ar$hrd = 0x13 the addresses are ATM addresses. Interpretation of the 640 ATM number and subaddress fields when ar$hrd != 0x13 is for future 641 definition. The remainder of this document assumes that ar$hrd = 642 0x13. 644 4.3.2 Protocol type. 646 The ar$pro.type field is a 16 bit unsigned integer representing the 647 following number space: 649 0x0000 to 0x00FF Protocols defined by the equivalent NLPIDs. 650 0x0100 to 0x03FF Reserved for future use by the IETF. 651 0x0400 to 0x04FF Allocated for use by the ATM Forum. 652 0x0500 to 0x05FF Experimental/Local use. 653 0x0600 to 0xFFFF Protocols defined by the equivalent Ethertypes. 655 (based on the observations that valid Ethertypes are never smaller 656 than 0x600, and NLPIDs never larger than 0xFF.) 658 The NLPID value of 0x80 is used to indicate a SNAP encoded extension 659 is being used to encode the protocol type. When ar$pro.type == 0x80 660 the SNAP extension is encoded in the ar$pro.snap field. This is 661 termed the 'long form' protocol ID. 663 If ar$pro.type != 0x80 then the ar$pro.snap field MUST be zero on transmit 664 and ignored on receive. The ar$pro.type field itself identifies the 665 protocol being referred to.
This is termed the 'short form' protocol 666 ID. 668 In all cases, where a protocol has an assigned number in the 669 ar$pro.type space (excluding 0x80) the short form MUST be used when 670 transmitting MARS messages. Additionally, where a protocol has valid 671 short and long forms of identification, receivers MAY choose to 672 recognise the long form. 674 ar$pro.type values other than 0x80 MAY have 'long forms' defined in 675 future documents. 677 For the remainder of this document references to ar$pro SHALL be 678 interpreted to mean ar$pro.type, or ar$pro.type in combination with 679 ar$pro.snap as appropriate. 681 The use of different protocol types is described further in section 682 9. 684 4.3.3 Checksum. 686 The ar$chksum field carries a standard IP checksum calculated across 687 the entire MARS control message (excluding the LLC/SNAP header). The 688 field is set to zero before performing the checksum calculation. 690 As the entire LLC/SNAP encapsulated MARS message is protected by the 691 32 bit CRC of the AAL5 transport, implementors MAY choose to ignore 692 the checksum facility. If no checksum is calculated these bits MUST 693 be reset before transmission. If no checksum is performed on 694 reception, this field MUST be ignored. If a receiver is capable of 695 validating a checksum it MUST only perform the validation when the 696 received ar$chksum field is non-zero. Messages arriving with 697 ar$chksum of 0 are always considered valid. 699 4.3.4 Extensions Offset. 701 The ar$extoff field identifies the existence and location of an 702 optional supplementary parameters list. Its use is described in 703 section 10. 705 4.3.5 Operation code. 707 The ar$op field is further subdivided into two 8 bit fields - 708 ar$op.version (leading octet) and ar$op.type (trailing octet). 709 Together they indicate the nature of the control message, and the 710 context within which its [Mandatory fields], [Addresses], and 711 [Supplementary TLVs] should be interpreted. 
713 ar$op.version 714 0 MARS protocol defined in this document. 715 0x01 - 0xEF Reserved for future use by the IETF. 716 0xF0 - 0xFE Allocated for use by the ATM Forum. 717 0xFF Experimental/Local use. 719 ar$op.type 720 Value indicates operation being performed, within context of 721 the control protocol version indicated by ar$op.version. 723 For the rest of this document references to the ar$op value SHALL be 724 taken to mean ar$op.type, with ar$op.version = 0x00. The values used 725 in this document are summarised in section 11. 727 (Note this number space is independent of the ATMARP operation code 728 number space.) 730 4.3.6 Reserved. 732 ar$hdrrsv may be subdivided and assigned specific meanings for other 733 control protocols indicated by ar$op.version != 0. 735 5. Endpoint (MARS client) interface behaviour. 737 An endpoint is best thought of as a 'shim' or 'convergence' layer, 738 sitting between a layer 3 protocol's link layer interface and the 739 underlying UNI 3.0/3.1 service. An endpoint in this context can exist 740 in a host or a router - any entity that requires a generic 'layer 3 741 over ATM' interface to support layer 3 multicast. It is broken into 742 two key subsections - one for the transmit side, and one for the 743 receive side. 745 Multiple logical ATM interfaces may be supported by a single physical 746 ATM interface (for example, using different SEL values in the NSAP 747 formatted address assigned to the physical ATM interface). Therefore 748 implementors MUST allow for multiple independent 'layer 3 over ATM' 749 interfaces too, each with its own configured MARS (or table of MARSs, 750 as discussed in section 5.4), and ability to be attached to the same 751 or different clusters. 753 The initial signalling path between a MARS client (managing an 754 endpoint) and its associated MARS is a transient point to point, 755 bidirectional VC. 
This VC is established by the MARS client, and is 756 used to send queries to, and receive replies from, the MARS. It has 757 an associated idle timer, and is dismantled if not used for a 758 configurable period of time. The minimum suggested value for this 759 time is 1 minute, and the RECOMMENDED default is 20 minutes. (Where 760 the MARS and ARP Server are co-resident, this VC may be used for both 761 ATM ARP traffic and MARS control traffic.) 763 The remaining signalling path is ClusterControlVC, to which the MARS 764 client is added as a leaf node when it registers (described in 765 section 5.2.3). 767 The majority of this document covers the distribution of information 768 allowing endpoints to establish and manage outgoing point to 769 multipoint VCs - the forwarding paths for multicast traffic to 770 particular multicast groups. The actual format of the AAL_SDUs sent 771 on these VCs is almost completely outside the scope of this 772 specification. However, endpoints are not expected to know whether 773 their forwarding path leads directly to a multicast group's members 774 or to an MCS (described in section 3). This requires additional per- 775 packet encapsulation (described in section 5.5) to aid in the 776 detection of reflected AAL_SDUs. 778 5.1 Transmit side behaviour. 780 The following description will often be in terms of an IPv4/ATM 781 interface that is capable of transmitting packets to a Class D 782 address at any time, without prior warning. It should be trivial for 783 an implementor to generalise this behaviour to the requirements of 784 another layer 3 data protocol. 786 When a local Layer 3 entity passes down a packet for transmission, 787 the endpoint first ascertains whether an outbound path to the 788 destination multicast group already exists. If it does not, the MARS 789 is queried for a set of ATM endpoints that represent an appropriate 790 forwarding path.
(The ATM endpoints may represent the actual group 791 members within the cluster, or a set of one or more MCSs. The 792 endpoint does not distinguish between either case. Section 6.2 793 describes the MARS behaviour that leads to MCSs being supplied as the 794 forwarding path for a multicast group.) 796 The query is executed by issuing a MARS_REQUEST. The reply from the 797 MARS may take one of two forms: 799 MARS_MULTI - Sequence of MARS_MULTI messages returning the set of 800 ATM endpoints that are to be leaf nodes of an 801 outgoing point to multipoint VC (the forwarding 802 path). 804 MARS_NAK - No mapping found, group is empty. 806 The formats of these messages are described in section 5.1.2. 808 Outgoing VCs are established with a request for Unspecified Bit Rate 809 (UBR) service, as typified by the IETF's use of VCs for unicast IP, 810 described in RFC 1755 [6]. Future documents may vary this approach 811 and allow the specification of different ATM traffic parameters from 812 locally configured information or parameters obtained through some 813 external means. 815 5.1.1 Retrieving Group Membership from the MARS. 817 If the MARS had no mapping for the desired Class D address a MARS_NAK 818 will be returned. In this case the IP packet MUST be discarded 819 silently. If a match is found in the MARS's tables it proceeds to 820 return addresses ATM.1 through ATM.n in a sequence of one or more 821 MARS_MULTIs. A simple mechanism is used to detect and recover from 822 loss of MARS_MULTI messages. 824 (If the client learns that there is no other group member in the 825 cluster - the MARS returns a MARS_NAK or returns a MARS_MULTI with 826 the client as the only member - it MUST delay sending out a new 827 MARS_REQUEST for that group for a period no less than 5 seconds and 828 no more than 10 seconds.) 830 Each MARS_MULTI carries a boolean field x, and a 15 bit integer field 831 y - expressed as MARS_MULTI(x,y). 
Field y acts as a sequence number, 832 starting at 1 and incrementing for each MARS_MULTI sent. Field x 833 acts as an 'end of reply' marker. When x == 1 the MARS response is 834 considered complete. 836 In addition, each MARS_MULTI may carry multiple ATM addresses from 837 the set {ATM.1, ATM.2, .... ATM.n}. A MARS MUST minimise the number 838 of MARS_MULTIs transmitted by placing as many group members' 839 addresses in a single MARS_MULTI as possible. The limit on the length 840 of an individual MARS_MULTI message MUST be the MTU of the underlying 841 VC. 843 For example, assume n ATM addresses must be returned, each MARS_MULTI 844 is limited to only p ATM addresses, and p << n. This would require a 845 sequence of k MARS_MULTI messages (where k is n/p rounded up to the 846 nearest integer), transmitted as follows: 848 MARS_MULTI(0,1) carries back {ATM.1 ... ATM.p} 849 MARS_MULTI(0,2) carries back {ATM.(p+1) ... ATM.(2p)} 850 [.......] 851 MARS_MULTI(1,k) carries back { ... ATM.n} 853 If k == 1 then only MARS_MULTI(1,1) is sent. 855 The typical failure mode is the loss of one or more of MARS_MULTI(0,1) 856 through MARS_MULTI(0,k-1). This is detected when y jumps by more than 857 one between consecutive MARS_MULTIs. An alternative failure mode is 858 losing MARS_MULTI(1,k). A timer MUST be implemented to flag the 859 failure of the last MARS_MULTI to arrive. A default value of 10 860 seconds is RECOMMENDED. 862 If a 'sequence jump' is detected, the host MUST wait for the 863 MARS_MULTI(1,k), discard all results, and repeat the MARS_REQUEST. 865 If a timeout occurs, the host MUST discard all results, and repeat 866 the MARS_REQUEST. 868 A final failure mode involves the MARS Sequence Number (described in 869 section 5.1.4.2 and carried in each part of a multi-part MARS_MULTI). 870 If its value changes during the reception of a multi-part MARS_MULTI 871 the host MUST wait for the MARS_MULTI(1,k), discard all results, and 872 repeat the MARS_REQUEST.
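The reassembly and failure rules above can be sketched as follows. This is an illustrative simplification (the Part type and function name are hypothetical): in particular it rejects a damaged reply as soon as a fault is seen, whereas a conforming host keeps listening for MARS_MULTI(1,k) or a timeout before repeating the MARS_REQUEST.

```python
from dataclasses import dataclass

@dataclass
class Part:
    x: int         # 1 on the final part of the reply
    y: int         # sequence number, starting at 1
    msn: int       # MARS Sequence Number carried in ar$msn
    members: list  # ATM addresses carried in this part

def collect_multi(parts):
    """Assemble a multi-part MARS_MULTI reply.

    Returns the combined member list, or None when the reply must be
    discarded and the MARS_REQUEST repeated (sequence jump, MSN change,
    or the final part never arriving).
    """
    members, expected_y, first_msn = [], 1, None
    for p in parts:
        if p.y != expected_y:      # y jumped by more than one: a part was lost
            return None
        if first_msn is None:
            first_msn = p.msn
        elif p.msn != first_msn:   # MSN changed mid-reply: MARS traffic missed
            return None
        members.extend(p.members)
        if p.x == 1:               # end-of-reply marker
            return members
        expected_y += 1
    return None                    # MARS_MULTI(1,k) never arrived (timeout)
```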
874 (Corruption of cell contents will lead to loss of a MARS_MULTI 875 through AAL5 CPCS_PDU reassembly failure, which will be detected 876 through the mechanisms described above.) 878 If the MARS is managing a cluster of endpoints spread across 879 different but directly accessible ATM networks it will not be able to 880 return all the group members in a single MARS_MULTI. The MARS_MULTI 881 message format allows for either E.164, ISO NSAP, or (E.164 + NSAP) 882 to be returned as ATM addresses. However, each MARS_MULTI message may 883 only return ATM addresses of the same type and length. The returned 884 addresses MUST be grouped according to type (E.164, ISO NSAP, or 885 both) and returned in a sequence of separate MARS_MULTI parts. 887 5.1.2 MARS_REQUEST, MARS_MULTI, and MARS_NAK messages. 889 MARS_REQUEST is shown below. It is indicated by an 'operation type 890 value' (ar$op) of 1. 892 The multicast address being resolved is placed into the target 893 protocol address field (ar$tpa), and the target hardware address is 894 set to null (ar$thtl and ar$tstl both zero). 896 In IPv4 environments the protocol type (ar$pro) is 0x800 and the 897 target protocol address length (ar$tpln) MUST be set to 4. The source 898 fields MUST contain the ATM number and subaddress of the client 899 issuing the MARS_REQUEST (the subaddress MAY be null). 901 Data: 902 ar$hrd 16 bits Hardware type. 903 ar$pro.type 16 bits Protocol type. 904 ar$pro.snap 40 bits Optional SNAP extension to protocol type. 905 ar$hdrrsv 24 bits Reserved. Unused by MARS control protocol. 906 ar$chksum 16 bits Checksum across entire MARS message. 907 ar$extoff 16 bits Extensions Offset. 908 ar$op 16 bits Operation code (MARS_REQUEST = 1) 909 ar$shtl 8 bits Type & length of source ATM number. 910 ar$sstl 8 bits Type & length of source ATM subaddress.
912 ar$spln 8 bits Length of source protocol address (s) 913 ar$thtl 8 bits Type & length of target ATM number (x) 914 ar$tstl 8 bits Type & length of target ATM subaddress (y) 915 ar$tpln 8 bits Length of target multicast group address (z) 916 ar$pad 64 bits Padding (aligns ar$sha with MARS_MULTI). 917 ar$sha q octets source ATM number 918 ar$ssa r octets source ATM subaddress 919 ar$spa s octets source protocol address 920 ar$tha x octets target ATM number 921 ar$tsa y octets target ATM subaddress 922 ar$tpa z octets target multicast group address 924 Following the RFC1577 approach, the ar$shtl, ar$sstl, ar$thtl and 925 ar$tstl fields are coded as follows: 927 7 6 5 4 3 2 1 0 928 +-+-+-+-+-+-+-+-+ 929 |0|x| length | 930 +-+-+-+-+-+-+-+-+ 932 The most significant bit is reserved and MUST be set to zero. The 933 second most significant bit (x) is a flag indicating whether the ATM 934 address being referred to is in: 936 - ATM Forum NSAPA format (x = 0). 937 - Native E.164 format (x = 1). 939 The bottom 6 bits is an unsigned integer value indicating the length 940 of the associated ATM address in octets. If this value is zero the 941 flag x is ignored. 943 The ar$spln and ar$tpln fields are unsigned 8 bit integers, giving 944 the length in octets of the source and target protocol address fields 945 respectively. 947 MARS packets use true variable length fields. A null (non-existent) 948 address MUST be coded as zero length, and no space allocated for it 949 in the message body. 951 MARS_NAK is the MARS_REQUEST returned with an operation type value of 6. 952 All other fields are left unchanged from the MARS_REQUEST (e.g. do 953 not transpose the source and target information. In all cases MARS 954 clients use the source address fields to identify their own messages 955 coming back). 957 The MARS_MULTI message is identified by an ar$op value of 2. The 958 message format is: 960 Data: 961 ar$hrd 16 bits Hardware type. 962 ar$pro.type 16 bits Protocol type.
963 ar$pro.snap 40 bits Optional SNAP extension to protocol type. 964 ar$hdrrsv 24 bits Reserved. Unused by MARS control protocol. 965 ar$chksum 16 bits Checksum across entire MARS message. 966 ar$extoff 16 bits Extensions Offset. 967 ar$op 16 bits Operation code (MARS_MULTI = 2). 968 ar$shtl 8 bits Type & length of source ATM number. 969 ar$sstl 8 bits Type & length of source ATM subaddress. 970 ar$spln 8 bits Length of source protocol address (s) 971 ar$thtl 8 bits Type & length of target ATM number (x) 972 ar$tstl 8 bits Type & length of target ATM subaddress (y) 973 ar$tpln 8 bits Length of target multicast group address (z) 974 ar$tnum 16 bits Number of target ATM addresses returned (N). 975 ar$seqxy 16 bits Boolean flag x and sequence number y. 976 ar$msn 32 bits MARS Sequence Number. 977 ar$sha q octets source ATM number 978 ar$ssa r octets source ATM subaddress 979 ar$spa s octets source protocol address 980 ar$tpa z octets target multicast group address 981 ar$tha.1 x octets target ATM number 1 982 ar$tsa.1 y octets target ATM subaddress 1 983 ar$tha.2 x octets target ATM number 2 984 ar$tsa.2 y octets target ATM subaddress 2 985 [.......] 986 ar$tha.N x octets target ATM number N 987 ar$tsa.N y octets target ATM subaddress N 989 The source protocol and ATM address fields are copied directly from 990 the MARS_REQUEST that this MARS_MULTI is in response to (not the MARS 991 itself). 993 ar$seqxy is coded with flag x in the leading bit, and sequence number 994 y coded as an unsigned integer in the remaining 15 bits. 996 | 1st octet | 2nd octet | 997 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 998 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 999 |x| y | 1000 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1002 ar$tnum is an unsigned integer indicating how many pairs of 1003 {ar$tha,ar$tsa} (i.e. how many group members' ATM addresses) are 1004 present in the message. ar$msn is an unsigned 32 bit number filled in 1005 by the MARS before transmitting each MARS_MULTI.
Its use is described 1006 further in section 5.1.4. 1008 As an example, assume we have a multicast cluster using 4 byte 1009 protocol addresses, 20 byte ATM numbers, and 0 byte ATM subaddresses. 1010 For n group members in a single MARS_MULTI we require a (48 + 20n) 1011 byte message. If we assume the default MTU of 9180 bytes, we can 1012 return a maximum of 456 group members' addresses in a single 1013 MARS_MULTI. 1015 5.1.3 Establishing the outgoing multipoint VC. 1017 Following the completion of the MARS_MULTI reply the endpoint may 1018 establish a new point to multipoint VC, or reuse an existing one. 1020 If establishing a new VC, an L_MULTI_RQ is issued for ATM.1, followed 1021 by an L_MULTI_ADD for every member of the set {ATM.2, ....ATM.n} 1022 (assuming the set is non-null). The packet is then transmitted over 1023 the newly created VC just as it would be for a unicast VC. 1025 After transmitting the packet, the local interface holds the VC open 1026 and marks it as the active path out of the host for any subsequent IP 1027 packets being sent to that Class D address. 1029 When establishing a new multicast VC it is possible that one or more 1030 L_MULTI_RQ or L_MULTI_ADD may fail. The UNI 3.0/3.1 failure cause 1031 must be returned in the ERR_L_RQFAILED signal from the local 1032 signalling entity to the AAL User. If the failure cause is not 49 1033 (Quality of Service unavailable) or 51 (user cell rate not 1034 available), the endpoint's ATM address is dropped from the set 1035 {ATM.1, ATM.2, ..., ATM.n} returned by the MARS. Otherwise, the 1036 L_MULTI_RQ or L_MULTI_ADD should be reissued after a random delay of 1037 5 to 10 seconds. If the request fails again, another request should 1038 be issued after twice the previous delay has elapsed. This process 1039 should be continued until the call succeeds or the multipoint VC gets 1040 released. 1042 If the initial L_MULTI_RQ fails for ATM.1, and n is greater than 1 1043 (i.e.
the returned set of ATM addresses contains 2 or more addresses) 1044 a new L_MULTI_RQ should be immediately issued for the next ATM 1045 address in the set. This procedure is repeated until an L_MULTI_RQ 1046 succeeds, as no L_MULTI_ADDs may be issued until an initial outgoing 1047 VC is established. 1049 Each ATM address for which an L_MULTI_RQ failed with cause 49 or 51 1050 MUST be tagged rather than deleted. An L_MULTI_ADD is issued for 1051 these tagged addresses using the random delay procedure outlined 1052 above. 1054 The VC MAY be considered 'up' before failed L_MULTI_ADDs have been 1055 successfully re-issued. An endpoint MAY implement a concurrent 1056 mechanism that allows data to start flowing out the new VC even while 1057 failed L_MULTI_ADDs are being re-tried. (The alternative of waiting 1058 for each leaf node to accept the connection could lead to significant 1059 delays in transmitting the first packet.) 1061 Each VC MUST have a configurable inactivity timer associated with it. 1062 If the timer expires, an L_RELEASE is issued for that VC, and the 1063 Class D address is no longer considered to have an active path out of 1064 the local host. The timer SHOULD be no less than 1 minute, and a 1065 default of 20 minutes is RECOMMENDED. Choice of specific timer 1066 periods is beyond the scope of this document. 1068 VC consumption may also be reduced by endpoints noting when a new 1069 group's set of {ATM.1, ....ATM.n} matches that of a pre-existing VC 1070 out to another group. With careful local management, and assuming the 1071 QoS of the existing VC is sufficient for both groups, a new pt to mpt 1072 VC may not be necessary. 
Under certain circumstances endpoints may 1073 decide that it is sufficient to re-use an existing VC whose set of 1074 leaf nodes is a superset of the new group's membership (in which case 1075 some endpoints will receive multicast traffic for a layer 3 group 1076 they haven't joined, and must filter them above the ATM interface). 1077 Algorithms for performing this type of optimization are not discussed 1078 here, and are not required for conformance with this document. 1080 5.1.4 Tracking subsequent group updates. 1082 Once a new VC has been established, the transmit side of the cluster 1083 member's interface needs to monitor subsequent group changes - adding 1084 or dropping leaf nodes as appropriate. This is achieved by watching 1085 for MARS_JOIN and MARS_LEAVE messages from the MARS itself. These 1086 messages are described in detail in section 5.2 - at this point it is 1087 sufficient to note that they carry: 1089 - The ATM address of a node joining or leaving a group. 1090 - The layer 3 address of the group(s) being joined or left. 1091 - A Cluster Sequence Number (CSN) from the MARS. 1093 MARS_JOIN and MARS_LEAVE messages arrive at each cluster member 1094 across ClusterControlVC. MARS_JOIN or MARS_LEAVE messages that simply 1095 confirm information already held by the cluster member are used to 1096 track the Cluster Sequence Number, but are otherwise ignored. 1098 5.1.4.1 Updating the active VCs. 1100 If a MARS_JOIN is seen that refers to (or encompasses) a group for 1101 which the transmit side already has a VC open, the new member's ATM 1102 address is extracted and an L_MULTI_ADD issued locally. This ensures 1103 that endpoints already sending to a given group will immediately add 1104 the new member to their list of recipients. 1106 If a MARS_LEAVE is seen that refers to (or encompasses) a group for 1107 which the transmit side already has a VC open, the old member's ATM 1108 address is extracted and an L_MULTI_DROP issued locally. 
This ensures 1109 that endpoints already sending to a given group will immediately drop 1110 the old member from their list of recipients. When the last leaf of a 1111 VC is dropped, the VC is closed completely and the affected group no 1112 longer has a path out of the local endpoint (the next outbound packet 1113 to that group's address will trigger the creation of a new VC, as 1114 described in sections 5.1.1 to 5.1.3). 1116 In an IPv4 environment any endpoint leaving 224.0.0.1 is assumed to 1117 be ceasing support for IP multicast operation. If a MARS_LEAVE is 1118 seen that refers to group 224.0.0.1 then the ATM address of the 1119 endpoint specified in the message MUST be removed from every 1120 multipoint VC on which it is listed as a leaf node. 1122 The transmit side of the interface MUST NOT shut down an active VC to 1123 a group for which the receive side has just executed a 1124 LeaveLocalGroup. (This behaviour is consistent with the model of 1125 hosts transmitting to groups regardless of their own membership 1126 status.) 1128 If a MARS_JOIN or MARS_LEAVE arrives with ar$pnum == 0 it carries no 1129 pairs, and is only used for tracking the CSN. 1131 5.1.4.2 Tracking the Cluster Sequence Number. 1133 It is important that endpoints do not miss group membership updates 1134 issued by the MARS over ClusterControlVC. However, this will happen 1135 from time to time. The Cluster Sequence Number is carried as an 1136 unsigned 32 bit value in the ar$msn field of many MARS messages 1137 (except for MARS_REQUEST and MARS_NAK). It increments once for every 1138 transmission the MARS makes on ClusterControlVC, regardless of 1139 whether the transmission represents a change in the MARS database or 1140 not. By tracking this counter, cluster members can determine whether 1141 they have missed a previous message on ClusterControlVC, and possibly 1142 a membership change. This is then used to trigger revalidation 1143 (described in section 5.1.5). 
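The counter tracking just described can be sketched as follows. This is an illustrative fragment with hypothetical names; comparisons must wrap modulo 2^32 because ar$msn is an unsigned 32 bit value.

```python
MOD = 1 << 32  # ar$msn, CSN, and HSN are unsigned 32 bit counters

def update_hsn(hsn, msn):
    """Process the ar$msn field of a received MARS message.

    Returns (new HSN, revalidate): revalidate is True when the gap
    between the received CSN and the locally held HSN shows that a
    transmission on ClusterControlVC was missed, so group membership
    information must be revalidated.
    """
    diff = (msn - hsn) % MOD          # unsigned 32 bit subtraction
    return msn, diff not in (0, 1)    # any other gap means missed traffic
```

The wraparound case matters: a cluster member whose HSN sits at 0xFFFFFFFF must treat a received ar$msn of 0 as the expected next value, not as a jump.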
1145 The current CSN is copied into the ar$msn field of MARS messages 1146 being sent to cluster members, whether out ClusterControlVC or on a 1147 point to point VC. 1149 Calculations on the sequence numbers MUST be performed as unsigned 32 1150 bit arithmetic. 1152 Every cluster member keeps its own 32 bit Host Sequence Number (HSN) 1153 to track the MARS's sequence number. Whenever a message is received 1154 that carries an ar$msn field the following processing is performed: 1156 Seq.diff = ar$msn - HSN 1158 ar$msn -> HSN 1159 {...process MARS message as appropriate...} 1161 if ((Seq.diff != 1) && (Seq.diff != 0)) 1162 then {...revalidate group membership information...} 1164 The basic result is that the cluster member attempts to keep locked 1165 in step with membership changes noted by the MARS. If it ever detects 1166 that a membership change occurred (in any group) without it noticing, 1167 it re-validates the membership of all groups it currently has 1168 multicast VCs open to. 1170 The ar$msn value in an individual MARS_MULTI is not used to update 1171 the HSN until all parts of the MARS_MULTI (if more than 1) have 1172 arrived. (If the ar$msn changes the MARS_MULTI is discarded, as 1173 described in section 5.1.1.) 1175 The MARS is free to choose an initial value of CSN. When a new 1176 cluster member starts up it should initialise HSN to zero. When the 1177 cluster member sends the MARS_JOIN to register (described later), the 1178 HSN will be correctly updated to the current CSN value when the 1179 endpoint receives the copy of its MARS_JOIN back from the MARS. 1181 5.1.5 Revalidating a VC's leaf nodes. 1183 Certain events may inform a cluster member that it has incorrect 1184 information about the sets of leaf nodes it should be sending to. If 1185 an error occurs on a VC associated with a particular group, the 1186 cluster member initiates revalidation procedures for that specific 1187 group.
If a jump is detected in the Cluster Sequence Number, this 1188 initiates revalidation of all groups to which the cluster member 1189 currently has open point to multipoint VCs. 1191 Each open and active multipoint VC has a flag associated with it 1192 called 'VC_revalidate'. This flag is checked every time a packet is 1193 queued for transmission on that VC. If the flag is false, the packet 1194 is transmitted and no further action is required. 1196 However, if the VC_revalidate flag is true then the packet is 1197 transmitted and a new sequence of events is started locally. 1199 Revalidation begins with re-issuing a MARS_REQUEST for the group 1200 being revalidated. The returned set of members {NewATM.1, NewATM.2, 1201 .... NewATM.n} is compared with the set already held locally. 1202 L_MULTI_DROPs are issued on the group's VC for each node that appears 1203 in the original set of members but not in the revalidated set of 1204 members. L_MULTI_ADDs are issued on the group's VC for each node that 1205 appears in the revalidated set of members but not in the original set 1206 of members. The VC_revalidate flag is reset when revalidation 1207 concludes for the given group. Implementation specific mechanisms 1208 will be needed to flag the 'revalidation in progress' state. 1210 The key difference between constructing a VC (section 5.1.3) and 1211 revalidating a VC is that packet transmission continues on the open 1212 VC while it is being revalidated. This minimises the disruption to 1213 existing traffic. 1215 The algorithm for initiating revalidation is: 1217 - When a packet arrives for transmission on a given group, 1218 the group's membership is revalidated if VC_revalidate == TRUE. 1219 Revalidation resets VC_revalidate. 1220 - When an event occurs that demands revalidation, every 1221 group has its VC_revalidate flag set TRUE at a random time 1222 between 1 and 10 seconds.
1224 Benefit: Revalidation of active groups occurs quickly, and 1225 essentially idle groups are revalidated as needed. Randomly 1226 distributed setting of VC_revalidate flag improves chances of 1227 staggered revalidation requests from senders when a sequence number 1228 jump is detected. 1230 5.1.5.1 When leaf node drops itself. 1232 During the life of a multipoint VC an ERR_L_DROP may be received 1233 indicating that a leaf node has terminated its participation at the 1234 ATM level. The ATM endpoint associated with the ERR_L_DROP MUST be 1235 removed from the locally held set {ATM.1, ATM.2, .... ATM.n} 1236 associated with the VC. 1238 After a random period of time between 1 and 10 seconds the 1239 VC_revalidate flag associated with that VC MUST be set true. 1241 If an ERR_L_RELEASE is received then the entire set {ATM.1, ATM.2, 1242 .... ATM.n} is cleared and the VC is considered to be completely shut 1243 down. Further packet transmission to the group served by this VC will 1244 result in a new VC being established as described in section 5.1.3. 1246 5.1.5.2 When a jump is detected in the CSN. 1248 Section 5.1.4.2 describes how a CSN jump is detected. If a CSN jump 1249 is detected upon receipt of a MARS_JOIN or a MARS_LEAVE then every 1250 outgoing multicast VC MUST have its VC_revalidate flag set true at 1251 some random interval between 1 and 10 seconds from when the CSN jump 1252 was detected. 1254 The only exception to this rule is if a sequence number jump is 1255 detected during the establishment of a new group's VC (i.e. a 1256 MARS_MULTI reply was correctly received, but its ar$msn indicated 1257 that some previous MARS traffic had been missed on ClusterControlVC). 1258 In this case every open VC, EXCEPT the one just established, MUST 1259 have its VC_revalidate flag set true at some random interval between 1260 1 and 10 seconds from when the CSN jump was detected. (The VC being 1261 established at the time is considered already validated.) 
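The revalidation step of section 5.1.5 reduces to a set difference between the leaf nodes currently held for a VC and the members returned by the fresh MARS_REQUEST. A minimal sketch (hypothetical function name):

```python
def revalidate_leaves(current, refreshed):
    """Compare held leaf nodes against a freshly returned member set.

    Returns (to_drop, to_add): leaves needing L_MULTI_DROP because they
    are no longer group members, and leaves needing L_MULTI_ADD because
    they joined while updates were being missed.
    """
    current, refreshed = set(current), set(refreshed)
    return current - refreshed, refreshed - current
```

Note that packet transmission continues on the open VC while the resulting drops and adds are issued; only the leaf set changes.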
5.2. Receive side behaviour.

A cluster member is a 'group member' (in the sense that it receives
packets directed at a given multicast group) when its ATM address
appears in the MARS's table entry for the group's multicast address.
A key function within each cluster is the distribution of group
membership information from the MARS to cluster members.

An endpoint may wish to 'join a group' in response to a local, higher
level request for membership of a group, or because the endpoint
supports a layer 3 multicast forwarding engine that requires the
ability to 'see' intra-cluster traffic in order to forward it.

Two messages support these requirements - MARS_JOIN and MARS_LEAVE.
These are sent to the MARS by endpoints when the local layer 3/ATM
interface is requested to join or leave a multicast group. The MARS
propagates these messages back out over ClusterControlVC, to ensure
that knowledge of the group's membership change is distributed in a
timely fashion to other cluster members.

Certain models of layer 3 endpoints (e.g. IP multicast routers)
expect to be able to receive packet traffic 'promiscuously' across
all groups. This functionality may be emulated by allowing routers
to request that the MARS return them as 'wild card' members of all
Class D addresses. However, a problem inherent in the current ATM
model is that a completely promiscuous router may exhaust the local
reassembly resources in its ATM interface. MARS_JOIN supports a
generalisation of the notion of 'wild card' entries, enabling routers
to limit themselves to 'blocks' of the Class D address space. Use of
this facility is described in greater detail in Section 8.

A block can be as small as 1 (a single group) or as large as the
entire multicast address space (e.g. default IPv4 'promiscuous'
behaviour).
A block is defined as all addresses between, and inclusive of, a
<min, max> address pair. A MARS_JOIN or MARS_LEAVE may carry multiple
<min, max> pairs.

Cluster members MUST provide ONLY a single <min, max> pair in each
JOIN/LEAVE message they issue. However, they MUST be able to process
multiple <min, max> pairs in JOIN/LEAVE messages when performing VC
management as described in section 5.1.4 (the interpretation being
that the join/leave operation applies to all addresses in the range
<min> to <max> inclusive, for every pair).

In RFC1112 environments a MARS_JOIN for a single group is triggered
by a JoinLocalGroup signal from the IP layer. A MARS_LEAVE for a
single group is triggered by a LeaveLocalGroup signal from the IP
layer.

Cluster members with special requirements (e.g. multicast routers)
may issue MARS_JOINs and MARS_LEAVEs specifying a block of multicast
group addresses.

An endpoint MUST register with a MARS in order to become a member of
a cluster and be added as a leaf to ClusterControlVC. Registration
is covered in section 5.2.3.

Finally, the endpoint MUST be capable of terminating unidirectional
VCs (i.e. act as a leaf node of a UNI 3.0/3.1 point to multipoint VC,
with zero bandwidth assigned on the return path). RFC 1755 describes
the signalling information required to terminate VCs carrying
LLC/SNAP encapsulated traffic (discussed further in section 5.5).

5.2.1 Format of the MARS_JOIN and MARS_LEAVE Messages.

The MARS_JOIN message is indicated by an operation type value of 4.
MARS_LEAVE has the same format and an operation type value of 5. The
message format is:

Data:
   ar$hrd      16 bits  Hardware type.
   ar$pro.type 16 bits  Protocol type.
   ar$pro.snap 40 bits  Optional SNAP extension to protocol type.
   ar$hdrrsv   24 bits  Reserved. Unused by MARS control protocol.
   ar$chksum   16 bits  Checksum across entire MARS message.
   ar$extoff   16 bits  Extensions Offset.
   ar$op       16 bits  Operation code (MARS_JOIN or MARS_LEAVE).
   ar$shtl      8 bits  Type & length of source ATM number.
   ar$sstl      8 bits  Type & length of source ATM subaddress.
   ar$spln      8 bits  Length of source protocol address (s)
   ar$tpln      8 bits  Length of multicast group address (z)
   ar$pnum     16 bits  Number of multicast group address pairs (N)
   ar$flags    16 bits  layer3grp, copy, and register flags.
   ar$cmi      16 bits  Cluster Member ID
   ar$msn      32 bits  MARS Sequence Number.
   ar$sha      qoctets  source ATM number.
   ar$ssa      roctets  source ATM subaddress.
   ar$spa      soctets  source protocol address
   ar$min.1    zoctets  Minimum multicast group address - pair.1
   ar$max.1    zoctets  Maximum multicast group address - pair.1
   [.......]
   ar$min.N    zoctets  Minimum multicast group address - pair.N
   ar$max.N    zoctets  Maximum multicast group address - pair.N

ar$spln indicates the number of bytes in the source endpoint's
protocol address, and is interpreted in the context of the protocol
indicated by the ar$pro field. (e.g. in IPv4 environments ar$pro will
be 0x800, ar$spln is 4, and ar$tpln is 4.)

The ar$flags field contains the following flags:

   Bit 15   - ar$flags.layer3grp.
   Bit 14   - ar$flags.copy.
   Bit 13   - ar$flags.register.
   Bit 12   - ar$flags.punched.
   Bits 0-7 - ar$flags.sequence.

Bits 8 to 11 are reserved and MUST be zero.

ar$flags.sequence is set by cluster members, and MUST always be
passed on unmodified by the MARS when retransmitting MARS_JOIN or
MARS_LEAVE messages. It is source specific, and MUST be ignored by
other cluster members. Its use is described in section 5.2.2.

ar$flags.punched MUST be zero when the MARS_JOIN or MARS_LEAVE is
transmitted to the MARS. Its use is described in section 5.2.2 and
section 6.2.4.
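The ar$flags bit layout above can be captured in a few lines. This is an illustrative sketch (the helper names are not part of the protocol); it enforces the rule that bits 8 to 11 are zero:

```python
def pack_flags(layer3grp=0, copy=0, register=0, punched=0, sequence=0):
    """Build the 16 bit ar$flags field: bit 15 layer3grp, bit 14 copy,
    bit 13 register, bit 12 punched, bits 0-7 sequence."""
    assert 0 <= sequence <= 0xFF
    return (layer3grp << 15) | (copy << 14) | (register << 13) \
           | (punched << 12) | sequence

def unpack_flags(flags):
    """Split ar$flags back into its named fields."""
    assert flags & 0x0F00 == 0, "bits 8-11 are reserved and MUST be zero"
    return {"layer3grp": (flags >> 15) & 1, "copy": (flags >> 14) & 1,
            "register": (flags >> 13) & 1, "punched": (flags >> 12) & 1,
            "sequence": flags & 0xFF}
```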
ar$flags.copy MUST be set to 0 when the message is being sent from a
MARS client, and MUST be set to 1 when the message is being sent from
a MARS. (This flag is intended to support integrating the MARS
function with one of the MARS clients in your cluster. The
destination of an incoming MARS_JOIN can be determined from its
value.)

ar$flags.layer3grp allows the MARS to provide the group membership
information described further in section 5.3. The rules for its use
are:

   ar$flags.layer3grp MUST be set when the cluster member is issuing
   the MARS_JOIN as the result of a layer 3 multicast group being
   explicitly joined (e.g. as a result of a JoinHostGroup operation
   in an RFC1112 compliant host).

   ar$flags.layer3grp MUST be reset in each MARS_JOIN if the
   MARS_JOIN is simply the local ip/atm interface registering to
   receive traffic on that group for its own reasons.

   ar$flags.layer3grp is ignored and MUST be treated as reset by the
   MARS for any MARS_JOIN that specifies a block covering more than a
   single group (e.g. a block join from a router ensuring its
   forwarding engine 'sees' all traffic).

ar$flags.register indicates whether the MARS_JOIN or MARS_LEAVE is
being used to register or deregister a cluster member (described in
section 5.2.3). When used to join or leave specific groups the
ar$flags.register flag MUST be zero.

ar$pnum indicates how many <min, max> pairs are included in the
message. This field MUST be 1 when the message is sent from a cluster
member. A MARS MAY return a MARS_JOIN or MARS_LEAVE with any ar$pnum
value, including zero. This will be explained further in section
6.2.4.

The ar$cmi field MUST be zeroed by cluster members, and is used by
the MARS during cluster member registration, described in section
5.2.3.

ar$msn MUST be zero when transmitted by an endpoint.
It is set to the current value of the Cluster Sequence Number by the
MARS when the MARS_JOIN or MARS_LEAVE is retransmitted. Its use has
been described in section 5.1.4.

To simplify construction and parsing of MARS_JOIN and MARS_LEAVE
messages, the following restrictions are imposed on the <min, max>
pairs:

   Assume max(N) is the <max> field from the Nth pair.
   Assume min(N) is the <min> field from the Nth pair.
   Assume a join/leave message arrives with K <min, max> pairs.
   The following must hold:

      max(N) < min(N+1)  for 1 <= N < K
      max(N) >= min(N)   for 1 <= N <= K

In plain language, the set must specify an ascending sequence of
address blocks. The definition of 'greater than' or 'less than' may
be protocol specific. In IPv4 environments the addresses are treated
as 32 bit, unsigned binary values (most significant byte first).

5.2.1.1 Important IPv4 default values.

The JoinLocalGroup and LeaveLocalGroup operations are only valid for
a single group. For any arbitrary group address X the associated
MARS_JOIN or MARS_LEAVE MUST specify a single pair <X, X>.
ar$flags.layer3grp MUST be set under these circumstances.

A router choosing to behave strictly in accordance with RFC1112 MUST
specify the entire Class D space. The associated MARS_JOIN or
MARS_LEAVE MUST specify a single pair <224.0.0.0, 239.255.255.255>.
Whenever a router issues a MARS_JOIN only in order to forward IP
traffic it MUST reset ar$flags.layer3grp.

The use of alternative <min, max> values by multicast routers is
discussed in Section 8.

5.2.2 Retransmission of MARS_JOIN and MARS_LEAVE messages.

Transient problems may result in the loss of messages between the
MARS and cluster members. A simple algorithm is used to solve this
problem.
Cluster members retransmit each MARS_JOIN and MARS_LEAVE message at
regular intervals until they receive a copy back again, either on
ClusterControlVC or the VC on which they are sending the message. At
this point the local endpoint can be certain that the MARS received
and processed it.

The interval should be no shorter than 5 seconds, and a default value
of 10 seconds is recommended. After 5 retransmissions the attempt
should be flagged locally as a failure. This MUST be considered a
MARS failure, and triggers the MARS reconnection described in section
5.4.

A 'copy' is defined as a received message with the following fields
matching a previously transmitted MARS_JOIN/LEAVE:

   - ar$op
   - ar$flags.register
   - ar$flags.sequence
   - ar$pnum
   - Source ATM address
   - First <min, max> pair

In addition, a valid copy MUST have the following field values:

   - ar$flags.punched = 0
   - ar$flags.copy = 1

The ar$flags.sequence field is never modified or checked by a MARS.
Implementors MAY choose to utilize locally significant sequence
number schemes, which MAY differ from one cluster member to the next.
In the absence of such schemes the default value for
ar$flags.sequence MUST be zero.

Careful implementations MAY have more than one outstanding
(unacknowledged) MARS_JOIN/LEAVE at a time.

5.2.3 Cluster member registration and deregistration.

To become a cluster member an endpoint must register with the MARS.
This achieves two things - the endpoint is added as a leaf node of
ClusterControlVC, and the endpoint is assigned a 16 bit Cluster
Member Identifier (CMI). The CMI uniquely identifies each endpoint
that is attached to the cluster.

Registration with the MARS occurs when an endpoint issues a MARS_JOIN
with the ar$flags.register flag set to one (bit 13 of the ar$flags
field).
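The 'copy' test of section 5.2.2 is a straight field comparison. A minimal sketch, representing messages as dictionaries whose key names are illustrative rather than normative:

```python
# Fields that must match the previously transmitted MARS_JOIN/LEAVE:
# ar$op, ar$flags.register, ar$flags.sequence, ar$pnum, the source
# ATM address, and the first <min, max> pair.
MATCH_KEYS = ("op", "register", "sequence", "pnum", "src_atm", "pair1")

def is_copy(sent, received):
    """True when 'received' acknowledges the outstanding JOIN/LEAVE:
    ar$flags.punched must be 0 and ar$flags.copy must be 1, and every
    field in MATCH_KEYS must be identical to the transmitted value."""
    if received.get("punched") != 0 or received.get("copy") != 1:
        return False
    return all(sent[k] == received[k] for k in MATCH_KEYS)
```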
The cluster member MUST include its source ATM address, and MAY
choose to specify a null source protocol address when registering.

No protocol specific group addresses are included in a registration
MARS_JOIN.

The cluster member retransmits this MARS_JOIN in accordance with
section 5.2.2 until it confirms that the MARS has received it.

When the registration MARS_JOIN is returned it contains a non-zero
value in ar$cmi. This value MUST be noted by the cluster member, and
used whenever circumstances require the cluster member's CMI.

An endpoint may also choose to de-register, using a MARS_LEAVE with
ar$flags.register set. This would result in the MARS dropping the
endpoint from ClusterControlVC, removing all references to the member
in the mapping database, and freeing up its CMI.

As for registration, a deregistration request MUST include the
correct source ATM address for the cluster member, but MAY choose to
specify a null source protocol address.

The cluster member retransmits this MARS_LEAVE in accordance with
section 5.2.2 until it confirms that the MARS has received it.

5.3 Support for Layer 3 group management.

Whilst the intention of this specification is to be independent of
layer 3 issues, an attempt is being made to assist the operation of
layer 3 multicast routing protocols that need to ascertain if any
groups have members within a cluster.

One example is IP, where IGMP is used (as described in section 2)
simply to determine whether any other cluster members are listening
to a group because they have higher layer applications that want to
receive a group's traffic.

Routers may choose to query the MARS for this information, rather
than multicasting IGMP queries to 224.0.0.1 and incurring the
associated cost of setting up a VC to all systems in the cluster.
The query is issued by sending a MARS_GROUPLIST_REQUEST to the MARS.
MARS_GROUPLIST_REQUEST is built from a MARS_JOIN, but it has an
operation code of 10. The first <min, max> pair will be used by the
MARS to identify the range of groups in which the querying cluster
member is interested. Any additional pairs will be ignored. A request
with ar$pnum = 0 will be ignored.

The response from the MARS is a MARS_GROUPLIST_REPLY, carrying a list
of the multicast groups within the specified <min, max> block that
have Layer 3 members. A group is noted in this list if one or more
of the MARS_JOINs that generated its mapping entry in the MARS
contained a set ar$flags.layer3grp flag.

MARS_GROUPLIST_REPLYs are transmitted back to the querying cluster
member on the VC used to send the MARS_GROUPLIST_REQUEST.

MARS_GROUPLIST_REPLY is derived from the MARS_MULTI but with ar$op =
11. It may have multiple parts if needed, and is received in a
similar manner to a MARS_MULTI.

Data:
   ar$hrd      16 bits  Hardware type.
   ar$pro.type 16 bits  Protocol type.
   ar$pro.snap 40 bits  Optional SNAP extension to protocol type.
   ar$hdrrsv   24 bits  Reserved. Unused by MARS control protocol.
   ar$chksum   16 bits  Checksum across entire MARS message.
   ar$extoff   16 bits  Extensions Offset.
   ar$op       16 bits  Operation code (MARS_GROUPLIST_REPLY).
   ar$shtl      8 bits  Type & length of source ATM number.
   ar$sstl      8 bits  Type & length of source ATM subaddress.
   ar$spln      8 bits  Length of source protocol address (s)
   ar$thtl      8 bits  Unused - set to zero.
   ar$tstl      8 bits  Unused - set to zero.
   ar$tpln      8 bits  Length of target multicast group address (z)
   ar$tnum     16 bits  Number of group addresses returned (N).
   ar$seqxy    16 bits  Boolean flag x and sequence number y.
   ar$msn      32 bits  MARS Sequence Number.
   ar$sha      qoctets  source ATM number.
   ar$ssa      roctets  source ATM subaddress.
   ar$spa      soctets  source protocol address
   ar$mgrp.1   zoctets  Group address 1
   [.......]
   ar$mgrp.N   zoctets  Group address N

ar$seqxy is coded as for the MARS_MULTI - multiple
MARS_GROUPLIST_REPLY components are transmitted and received using
the same algorithm as described in section 5.1.1 for MARS_MULTI. The
only difference is that protocol addresses are being returned rather
than ATM addresses.

As for MARS_MULTIs, if an error occurs in the reception of a multi-
part MARS_GROUPLIST_REPLY the whole reply MUST be discarded and the
MARS_GROUPLIST_REQUEST re-issued. (This includes the requirement that
the ar$msn value be constant across the parts.)

Note that the ability to generate MARS_GROUPLIST_REQUEST messages,
and receive MARS_GROUPLIST_REPLY messages, is not required for
general host interface implementations. It is optional for interfaces
being implemented to support layer 3 multicast forwarding engines.
However, this functionality MUST be supported by the MARS.

5.4 Support for redundant/backup MARS entities.

Endpoints are assumed to have been configured with the ATM address of
at least one MARS. Endpoints MAY choose to maintain a table of ATM
addresses, representing alternative MARSs that will be contacted in
the event that normal operation with the original MARS is deemed to
have failed. It is assumed that this table orders the ATM addresses
in descending order of preference.

An endpoint will typically decide there are problems with the MARS
when:

   - It fails to establish a point to point VC to the MARS.
   - MARS_REQUESTs fail (section 5.1.1).
   - MARS_JOIN/MARS_LEAVEs fail (section 5.2.2).
   - It has not received a MARS_REDIRECT_MAP in the last 4 minutes
     (section 5.4.3).

(If it is able to discern which connection represents
ClusterControlVC, it may also use connection failures on this VC to
indicate problems with the MARS.)
5.4.1 First response to MARS problems.

The first response is to assume a transient problem with the MARS
being used at the time. The cluster member should wait a random
period of time between 1 and 10 seconds before attempting to
re-connect and re-register with the MARS. If the registration
MARS_JOIN is successful then:

   The cluster member MUST then proceed to rejoin every group that
   its local higher layer protocol(s) have joined. It is recommended
   that a random delay between 1 and 10 seconds be inserted before
   attempting each MARS_JOIN.

   The cluster member MUST initiate the revalidation of every
   multicast group it was sending to (as though a sequence number
   jump had been detected, section 5.1.5).

   The rejoin and revalidation procedure must not disrupt the cluster
   member's use of multipoint VCs that were already open at the time
   of the MARS failure.

If re-registration with the current MARS fails, and there are no
backup MARS addresses configured, the cluster member MUST wait for at
least 1 minute before repeating the re-registration procedure. It is
RECOMMENDED that the cluster member signal an error condition in
some locally significant fashion.

This procedure may repeat until network administrators manually
intervene or the current MARS returns to normal operation.

5.4.2 Connecting to a backup MARS.

If re-registration with the current MARS fails, and other MARS
addresses have been configured, the next MARS address on the list is
chosen to be the current MARS, and the cluster member immediately
restarts the re-registration procedure described in section 5.4.1. If
this is successful the cluster member will resume normal operation
using the new MARS. It is RECOMMENDED that the cluster member signal
a warning of this condition in some locally significant fashion.
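The address selection in sections 5.4.1 and 5.4.2 amounts to walking the preference-ordered table of MARS addresses, wrapping back to the top when the end is reached. A small sketch under those assumptions (the generator name is illustrative; the mandatory 1 minute wait between attempts is omitted):

```python
def failover_order(table, start=0):
    """Yield MARS ATM addresses in the order a cluster member would
    try them: down the preference-ordered table, wrapping from the
    last entry back to entry 0 (normally the original MARS)."""
    i = start
    while True:
        yield table[i]
        i = (i + 1) % len(table)
```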
If the attempt at re-registration with the new MARS fails, the
cluster member MUST wait for at least 1 minute before choosing the
next MARS address in the table and repeating the procedure. If the
end of the table has been reached, the cluster member starts again at
the top of the table (which should be the original MARS that the
cluster member started with).

In the worst case scenario this will result in cluster members
looping through their table of possible MARS addresses until network
administrators manually intervene.

5.4.3 Dynamic backup lists, and soft redirects.

To support some level of autoconfiguration, a MARS message is defined
that allows the current MARS to broadcast on ClusterControlVC a table
of backup MARS addresses. When this message is received, cluster
members that maintain a list of backup MARS addresses MUST insert
this information at the top of their locally held list (i.e. the
information provided by the MARS has a higher preference than
addresses that may have been manually configured into the cluster
member).

The message is MARS_REDIRECT_MAP. It is based on the MARS_MULTI
message, with the following changes:

   - ar$tpln field replaced by ar$redirf.
   - ar$spln field reserved.
   - ar$tpa and ar$spa eliminated.

MARS_REDIRECT_MAP has an operation type code of 12 decimal.

Data:
   ar$hrd      16 bits  Hardware type.
   ar$pro.type 16 bits  Protocol type.
   ar$pro.snap 40 bits  Optional SNAP extension to protocol type.
   ar$hdrrsv   24 bits  Reserved. Unused by MARS control protocol.
   ar$chksum   16 bits  Checksum across entire MARS message.
   ar$extoff   16 bits  Extensions Offset.
   ar$op       16 bits  Operation code (MARS_REDIRECT_MAP).
   ar$shtl      8 bits  Type & length of source ATM number.
   ar$sstl      8 bits  Type & length of source ATM subaddress.
   ar$spln      8 bits  Length of source protocol address (s)
   ar$thtl      8 bits  Type & length of target ATM number (x)
   ar$tstl      8 bits  Type & length of target ATM subaddress (y)
   ar$redirf    8 bits  Flag controlling client redirect behaviour.
   ar$tnum     16 bits  Number of MARS addresses returned (N).
   ar$seqxy    16 bits  Boolean flag x and sequence number y.
   ar$msn      32 bits  MARS Sequence Number.
   ar$sha      qoctets  source ATM number
   ar$ssa      roctets  source ATM subaddress
   ar$tha.1    xoctets  ATM number for MARS 1
   ar$tsa.1    yoctets  ATM subaddress for MARS 1
   ar$tha.2    xoctets  ATM number for MARS 2
   ar$tsa.2    yoctets  ATM subaddress for MARS 2
   [.......]
   ar$tha.N    xoctets  ATM number for MARS N
   ar$tsa.N    yoctets  ATM subaddress for MARS N

The source ATM address field(s) MUST identify the originating MARS.
A multi-part MARS_REDIRECT_MAP may be transmitted and reassembled
using the ar$seqxy field in the same manner as a multi-part
MARS_MULTI (section 5.1.1). If a failure occurs during the reassembly
of a multi-part MARS_REDIRECT_MAP (a part lost, reassembly timeout,
or illegal MARS Sequence Number jump) the entire message MUST be
discarded.

This message is transmitted regularly by the MARS (it MUST be
transmitted at least every 2 minutes; it is RECOMMENDED that it be
transmitted every 1 minute).

The MARS_REDIRECT_MAP is also used to force cluster members to shift
from one MARS to another. If the ATM address of the first MARS
contained in a MARS_REDIRECT_MAP table is not the address of the
cluster member's current MARS the client MUST 'redirect' to the new
MARS. The ar$redirf field controls how the redirection occurs.

ar$redirf has the following format:

    7 6 5 4 3 2 1 0
   +-+-+-+-+-+-+-+-+
   |x|             |
   +-+-+-+-+-+-+-+-+

If Bit 7 (the most significant bit) of ar$redirf is 1
then the cluster member MUST perform a 'hard' redirect.
Having installed the new table of MARS addresses carried
by the MARS_REDIRECT_MAP, the cluster member re-registers
with the MARS now at the top of the table using the
mechanism described in sections 5.4.1 and 5.4.2.

If Bit 7 of ar$redirf is 0 then the cluster member MUST
perform a 'soft' redirect, beginning with the following
actions:

   - open a point to point VC to the first ATM address.
   - attempt a registration (section 5.2.3).

If the registration succeeds, the cluster member shuts down its point
to point VC to the current MARS (if it had one open), and then
proceeds to use the newly opened point to point VC as its connection
to the 'current MARS'. The cluster member does NOT attempt to rejoin
the groups it is a member of, or revalidate groups it is currently
sending to.

This is termed a 'soft redirect' because it avoids the extra
rejoining and revalidation processing that occurs when a MARS failure
is being recovered from. It assumes some external synchronisation
mechanisms exist between the old and new MARS - mechanisms that are
outside the scope of this specification.

Some level of trust is required before initiating a soft redirect. A
cluster member MUST check that the calling party at the other end of
the VC on which the MARS_REDIRECT_MAP arrived (supposedly
ClusterControlVC) is in fact the node it trusts as the current MARS.

Additional applications of this function are for further study.

5.5 Data path LLC/SNAP encapsulations.

An extended encapsulation scheme is required to support the filtering
of possible reflected packets (section 3.3).

Two LLC/SNAP codepoints are allocated from the IANA OUI space. These
support two different mechanisms for detecting reflected packets.
They are called Type #1 and Type #2 multicast encapsulations.
Type #1

   [0xAA-AA-03][0x00-00-5E][0x00-01][Type #1 Extended Layer 3 packet]
      LLC          OUI        PID

Type #2

   [0xAA-AA-03][0x00-00-5E][0x00-04][Type #2 Extended Layer 3 packet]
      LLC          OUI        PID

For conformance with this document MARS clients:

   MUST transmit data using Type #1 encapsulation.

   MUST be able to correctly receive traffic using Type #1 OR Type #2
   encapsulation.

   MUST NOT transmit using Type #2 encapsulation.

5.5.1 Type #1 encapsulation.

The Type #1 Extended layer 3 packet carries within it a copy of the
source's Cluster Member ID (CMI) and either the 'short form' or 'long
form' of the protocol type as appropriate (section 4.3).

When carrying packets belonging to protocols with valid short form
representations the [Type #1 Extended Layer 3 packet] is encoded as:

   [pkt$cmi][pkt$pro][Original Layer 3 packet]
    2octets  2octets       N octets

The first 2 octets (pkt$cmi) carry the CMI assigned when an endpoint
registers with the MARS (section 5.2.3). The second 2 octets
(pkt$pro) indicate the protocol type of the packet carried in the
remainder of the payload. This is copied from the ar$pro field used
in the MARS control messages.

When carrying packets belonging to protocols that only have a long
form representation (pkt$pro = 0x80) the overhead SHALL be further
extended to carry the 5 byte ar$pro.snap field (with padding for 32
bit alignment). The encoded form SHALL be:

   [pkt$cmi][0x00-80][ar$pro.snap][padding][Original Layer 3 packet]
    2octets  2octets    5 octets  3 octets       N octets

The CMI is copied into the pkt$cmi field of every outgoing Type #1
packet. When an endpoint interface receives an AAL_SDU with the
LLC/SNAP codepoint indicating Type #1 encapsulation it compares the
CMI field with its own Cluster Member ID for the indicated protocol.
The packet is discarded silently if they match.
Otherwise the packet is accepted for processing by the local protocol
entity identified by the pkt$pro (and possibly SNAP) field(s).

Where a protocol has valid short and long forms of identification,
receivers MAY choose to additionally recognise the long form.

5.5.2 Type #2 encapsulation.

Future developments may enable direct multicasting of AAL_SDUs beyond
cluster boundaries. Expanding the set of possible sources in this way
may cause the CMI to become an inadequate parameter with which to
detect reflected packets. A larger source identification field may
be required.

The Type #2 Extended layer 3 packet carries within it an 8 octet
source ID field and either the 'short form' or 'long form' of the
protocol type as appropriate (section 4.3). The form and content of
the source ID field is currently unspecified, and is not relevant to
any MARS client built in conformance with this document. Received
Type #2 encapsulated packets MUST always be accepted and passed up to
the higher layer indicated by the protocol identifier.

When carrying packets belonging to protocols with valid short form
representations the [Type #2 Extended Layer 3 packet] is encoded as:

   [8 octet sourceID][ar$pro.type][Null pad][Original Layer 3 packet]
                       2 octets    2 octets

When carrying packets belonging to protocols that only have a long
form representation (pkt$pro = 0x80) the overhead SHALL be further
extended to carry the 5 byte ar$pro.snap field (with padding for 32
bit alignment). The encoded form SHALL be:

   [8 octet sourceID][ar$pro.type][ar$pro.snap][Null pad][Layer 3
                                                          packet]
                       2 octets     5 octets    1 octet

(Note that in this case the padding after the SNAP field is 1 octet
rather than the 3 octets used in Type #1.)

Where a protocol has valid short and long forms of identification,
receivers MAY choose to additionally recognise the long form.
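The short form Type #1 framing and its receive-side reflected-packet filter (section 5.5.1) can be sketched as follows. This is an illustrative model only; the function names are hypothetical, and only the short form path is shown:

```python
import struct

# LLC/SNAP header for Type #1: [0xAA-AA-03][0x00-00-5E][0x00-01]
LLC_SNAP_TYPE1 = bytes([0xAA, 0xAA, 0x03, 0x00, 0x00, 0x5E, 0x00, 0x01])

def encap_type1_short(cmi, proto, payload):
    """Frame a packet for a protocol with a valid short form id
    (e.g. 0x800 for IPv4): LLC/SNAP, then pkt$cmi and pkt$pro as
    big-endian 16 bit fields, then the original layer 3 packet."""
    return LLC_SNAP_TYPE1 + struct.pack(">HH", cmi, proto) + payload

def accept_type1(frame, own_cmi):
    """Receive-side filter: if pkt$cmi matches our own CMI the packet
    is a reflection and is silently discarded (returns None);
    otherwise returns (protocol, payload) for the local entity."""
    body = frame[len(LLC_SNAP_TYPE1):]
    cmi, proto = struct.unpack(">HH", body[:4])
    if cmi == own_cmi:
        return None
    return proto, body[4:]
```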
(Future documents may specify the contents of the source ID field.
This will only be relevant to implementations sending Type #2
encapsulated packets, as they are the only entities that need to be
concerned about detecting reflected Type #2 packets.)

5.5.3 A Type #1 example.

An IPv4 packet (fully identified by an Ethertype of 0x800, therefore
requiring 'short form' protocol type encoding) would be transmitted
as:

   [0xAA-AA-03][0x00-00-5E][0x00-01][pkt$cmi][0x800][IPv4 packet]

The use of different LLC/SNAP codepoints for unicast and multicast
packet transmission allows a single IPv4/ATM interface to support
both by demuxing on the LLC/SNAP header.

6. The MARS in greater detail.

Section 5 implies a lot about the MARS's basic behaviour as observed
by cluster members. This section summarises the behaviour of the MARS
for groups that are VC mesh based, and describes how the MARS's
behaviour changes when an MCS is registered to support a group.

The MARS is intended to be a multiprotocol entity - all its mapping
tables, CMIs, and control VCs MUST be managed within the context of
the ar$pro field in incoming MARS messages. For example, a MARS
supports completely separate ClusterControlVCs for each layer 3
protocol for which it is registering members. If a MARS receives
messages with an ar$pro that it does not support, the message is
dropped.

In general the MARS treats protocol addresses as arbitrary byte
strings. For example, the MARS will not apply IPv4 specific 'class'
checks to addresses supplied under ar$pro = 0x800. It is sufficient
for the MARS to simply assume that endpoints know how to interpret
the protocol addresses that they are establishing and releasing
mappings for.

The MARS requires control messages to carry the originator's identity
in the source ATM address field(s).
Messages that arrive with an empty ATM Number field are silently
discarded prior to any other processing by the MARS. (Only the ATM
Number field needs to be checked. An empty ATM Number field combined
with a non-empty ATM Subaddress field does not represent a valid ATM
address.)

(Some example pseudo-code for a MARS can be found in Appendix F.)

6.1 Basic interface to Cluster members.

The following MARS messages are used or required by cluster members:

    1  MARS_REQUEST
    2  MARS_MULTI
    4  MARS_JOIN
    5  MARS_LEAVE
    6  MARS_NAK
   10  MARS_GROUPLIST_REQUEST
   11  MARS_GROUPLIST_REPLY
   12  MARS_REDIRECT_MAP

6.1.1 Response to MARS_REQUEST.

Except as described in section 6.2, if a MARS_REQUEST arrives whose
source ATM address does not match that of any registered Cluster
member the message MUST be dropped and ignored.

6.1.2 Response to MARS_JOIN and MARS_LEAVE.

When a registration MARS_JOIN arrives (described in section 5.2.3)
the MARS performs the following actions:

   - Adds the node to ClusterControlVC.
   - Allocates a new Cluster Member ID (CMI).
   - Inserts the new CMI into the ar$cmi field of the MARS_JOIN.
   - Retransmits the MARS_JOIN back privately.

If the node is already a registered member of the cluster associated
with the specified protocol type then its existing CMI is simply
copied into the MARS_JOIN, and the MARS_JOIN retransmitted back to
the node. A single node may register multiple times if it supports
multiple layer 3 protocols. The CMIs allocated by the MARS for each
such registration may or may not be the same.

The retransmitted registration MARS_JOIN must NOT be sent on
ClusterControlVC. If a cluster member issues a deregistration
MARS_LEAVE it too is retransmitted privately.
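The MARS-side CMI bookkeeping just described - allocation keyed per protocol, reuse on re-registration, release on deregistration - might be sketched as below. The class and its internal structures are illustrative, not mandated by this specification, and ClusterControlVC leaf management is omitted:

```python
class MarsRegistry:
    """Minimal model of CMI allocation on the MARS."""

    def __init__(self):
        self._cmi = {}    # (protocol, atm_addr) -> allocated CMI
        self._next = 1

    def register(self, protocol, atm_addr):
        """Return the node's CMI, allocating a fresh one only if this
        (protocol, address) pair is not already registered."""
        key = (protocol, atm_addr)
        if key not in self._cmi:
            self._cmi[key] = self._next
            self._next += 1
        return self._cmi[key]

    def deregister(self, protocol, atm_addr):
        """Release and return the node's CMI (None if unregistered)."""
        return self._cmi.pop((protocol, atm_addr), None)
```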
Non-registration MARS_JOIN and MARS_LEAVE messages are ignored if they arrive from a node that is not registered as a cluster member.

Except as described in section 6.2.4, after performing any required database updates non-registration MARS_JOIN and MARS_LEAVE messages are retransmitted on ClusterControlVC. The following fields are modified just prior to retransmission:

   ar$flags.copy is set to 1.

   ar$msn is set to the current Cluster Sequence Number for ClusterControlVC (Section 5.1.4.2).

The MARS retransmits MARS_JOIN and MARS_LEAVE messages even if they resulted in no change to the database.

MARS_JOIN or MARS_LEAVE messages MUST arrive at the MARS with ar$flags.copy set to 0, otherwise the message is silently ignored. All outgoing MARS_JOIN or MARS_LEAVE messages have ar$flags.copy set to 1.

ar$flags.layer3grp (section 5.3) MUST be ignored (and treated as reset) for MARS_JOINs specifying more than a single group. If a MARS_JOIN is received that contains more than one <min, max> pair, the MARS MUST ignore the second and subsequent <min, max> pairs.

An additional IPv4 specific behaviour exists - if a node issues a MARS_LEAVE for address "224.0.0.1" (the 'all systems' group) it is assumed to have ceased multicast support completely. All references to this node MUST be eliminated from any other IPv4 groups it is a member of in the database. However, the endpoint is NOT released as a leaf node from ClusterControlVC (this only occurs upon receipt of a deregistration MARS_LEAVE).

If the MARS receives a deregistration MARS_LEAVE (described in section 5.2.3) that member's ATM address MUST be removed from all groups it may have joined, dropped from ClusterControlVC, and the CMI released.
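The IPv4-specific "all systems" rule above can be sketched as follows. The data structures here are assumptions for illustration: groups maps a group address to the set of member ATM addresses.

```python
ALL_SYSTEMS = "224.0.0.1"

def ipv4_leave(node, group, groups, cluster_control_vc):
    """Sketch of the IPv4-specific MARS_LEAVE rule in section 6.1.2.

    A MARS_LEAVE for 224.0.0.1 removes the node from every IPv4 group,
    but the node stays a leaf on ClusterControlVC; only a deregistration
    MARS_LEAVE (not modelled here) drops the leaf and frees the CMI."""
    if group == ALL_SYSTEMS:
        targets = groups.values()             # purge every group
    else:
        targets = [groups.get(group, set())]  # just the named group
    for members in targets:
        members.discard(node)
    # Deliberately no change to cluster_control_vc here.
```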
If the MARS receives an ERR_L_RELEASE on ClusterControlVC indicating that a cluster member has disconnected, that member's ATM address MUST be removed from all groups it may have joined, and the CMI released.

6.1.3 Generating MARS_REDIRECT_MAP.

A MARS_REDIRECT_MAP message (described in section 5.4.3) MUST be regularly transmitted on ClusterControlVC. It is RECOMMENDED that this occur every 1 minute, and it MUST occur at least every 2 minutes. If the MARS has no knowledge of other backup MARSs serving the cluster, it MUST include its own address as the only entry in the MARS_REDIRECT_MAP message (in addition to filling in the source address fields).

The design and use of backup MARS entities is beyond the scope of this document, and will be covered in future work.

6.1.4 Cluster Sequence Numbers.

The Cluster Sequence Number (CSN) is described in section 5.1.4, and is carried in the ar$msn field of MARS messages being sent to cluster members (either out ClusterControlVC or on an individual VC). The MARS increments the CSN every time a message is sent on ClusterControlVC. The current CSN is copied into the ar$msn field of MARS messages being sent to cluster members, whether out ClusterControlVC or on a private VC.

A MARS should be carefully designed to minimise the possibility of the CSN jumping unnecessarily. Under normal operation only cluster members affected by transient link problems will miss CSN updates and be forced to revalidate. If the MARS itself glitches, it will be inundated with requests for a period as every cluster member attempts to revalidate.

Calculations on the CSN MUST be performed as unsigned 32 bit arithmetic.

One implication of this mechanism is that the MARS should serialize its processing of 'simultaneous' MARS_REQUEST, MARS_JOIN and MARS_LEAVE messages.
Join and Leave operations should be queued within the MARS along with MARS_REQUESTs, and not processed until all the reply packets of a preceding MARS_REQUEST have been transmitted. The transmission of MARS_REDIRECT_MAP should also be similarly queued.

(The regular transmission of MARS_REDIRECT_MAP serves a secondary purpose of allowing cluster members to track the CSN, even if they miss an earlier MARS_JOIN or MARS_LEAVE.)

6.2 MARS interface to Multicast Servers (MCS).

When the MARS returns the actual addresses of group members, the endpoint behaviour described in section 5 results in all groups being supported by meshes of point to multipoint VCs. However, when MCSs register to support particular layer 3 multicast groups the MARS modifies its use of various MARS messages to fool endpoints into using the MCS instead.

The following MARS messages are associated with interaction between the MARS and MCSs:

   3 MARS_MSERV
   7 MARS_UNSERV
   8 MARS_SJOIN
   9 MARS_SLEAVE

The following MARS messages are treated in a slightly different manner when MCSs have registered to support certain group addresses:

   1 MARS_REQUEST
   4 MARS_JOIN
   5 MARS_LEAVE

A MARS must keep two sets of mappings for each layer 3 group using MCS support. The original {layer 3 address, ATM.1, ATM.2, ... ATM.n} mapping (now termed the 'host map', although it includes routers) is augmented by a parallel {layer 3 address, server.1, server.2, ... server.K} mapping (the 'server map'). It is assumed that no ATM addresses appear in both the server and host maps for the same multicast group. Typically K will be 1, but it will be larger if multiple MCSs are configured to support a given group.

The MARS also maintains a point to multipoint VC out to any MCSs registered with it, called ServerControlVC (section 6.2.3).
This serves an analogous role to ClusterControlVC, allowing the MARS to update the MCSs with group membership changes as they occur. A MARS MUST also send its regular MARS_REDIRECT_MAP transmissions on both ServerControlVC and ClusterControlVC.

6.2.1 Response to a MARS_REQUEST if an MCS is registered.

When the MARS receives a MARS_REQUEST for an address that has both host and server maps it generates a response based on the identity of the request's source. If the requestor is a member of the server map for the requested group then the MARS returns the contents of the host map in a sequence of one or more MARS_MULTIs. Otherwise, if the source is a valid cluster member, the MARS returns the contents of the server map in a sequence of one or more MARS_MULTIs. If the source is neither a cluster member nor a member of the server map for the group, the request is dropped and ignored.

Servers use the host map to establish a basic distribution VC for the group. Cluster members will establish outgoing multipoint VCs to members of the group's server map, without being aware that their packets will not be going directly to the multicast group's members.

6.2.2 MARS_MSERV and MARS_UNSERV messages.

MARS_MSERV and MARS_UNSERV are identical to the MARS_JOIN message. An MCS uses a MARS_MSERV with a <min, max> pair of <X, X> to specify the multicast group X that it is willing to support. A single group MARS_UNSERV indicates the group that the MCS is no longer willing to support. The operation code for MARS_MSERV is 3 (decimal), and for MARS_UNSERV is 7 (decimal).

When operating on specific groups ar$flags.register MUST be zero.

When an MCS issues a MARS_MSERV for a specific group the message MUST be dropped and ignored if the source has not already registered with the MARS as a multicast server (section 6.2.3).
Otherwise, the MARS adds the new ATM address to the server map for the specified group, possibly constructing a new server map if this is the first MCS for the group.

When an MCS issues a MARS_UNSERV the MARS removes its ATM address from the server maps for each specified group, deleting any server maps that end up being null after the operation.

Both of these messages are sent to the MARS over a point to point VC (between MCS and MARS). After processing, they are retransmitted on ServerControlVC to allow other MCSs to note the new node.

The operation code is then changed to MARS_JOIN or MARS_LEAVE respectively, and another copy of the message is also transmitted on ClusterControlVC. This fools the cluster members into thinking a new leaf node has been added to (or dropped from) the group specified. In the retransmitted MARS_JOIN/LEAVE ar$flags.layer3grp MUST be zero, ar$flags.copy MUST be one, and ar$flags.register MUST be zero.

The MARS retransmits redundant MARS_MSERV and MARS_UNSERV messages onto ServerControlVC, generates the appropriate MARS_JOIN or MARS_LEAVE messages on ClusterControlVC, but takes no further action.

It is assumed that at least one MCS will have MARS_MSERV'ed a group before the first cluster member joins it. If a MARS_MSERV arrives for a group that has a non-null host map but no server map the default response of the MARS will be to silently drop the MARS_MSERV without any further action. The MCS attempting to support the group will eventually flag an error after repeated MARS_MSERVs fail.

The last or only MCS for a group MAY choose to issue a MARS_UNSERV while the group still has members. When the MARS_UNSERV is processed by the MARS the 'server map' will be deleted.
When the associated MARS_LEAVE is issued on ClusterControlVC, all cluster members with a VC open to the MCS for that group will close down the VC (in accordance with section 5.1.4, since the MCS was their only leaf node). When cluster members subsequently find they need to transmit packets to the group, they will begin again with the MARS_REQUEST/MARS_MULTI sequence to establish a new VC. Since the MARS will have deleted the server map, this will result in the host map being returned, and the group reverts to being supported by a VC mesh.

A clean mechanism for the reverse process - transitioning a group from a VC mesh to MCS support while the group is active - is a subject for further study.

6.2.3 Registering a Multicast Server (MCS).

Section 5.2.3 describes how endpoints register as cluster members, and hence get added as leaf nodes to ClusterControlVC. The same approach is used to register endpoints that intend to provide MCS support.

Registration with the MARS occurs when an endpoint issues a MARS_MSERV with ar$flags.register set to one. Upon registration the endpoint is added as a leaf node to ServerControlVC, and the MARS_MSERV is returned to the MCS privately.

The MCS retransmits this MARS_MSERV until it confirms that the MARS has received it (by receiving a copy back, in an analogous way to the mechanism described in section 5.2.2 for reliably transmitting MARS_JOINs).

The ar$cmi field in MARS_MSERVs MUST be set to zero by both MCS and MARS.

An MCS may also choose to deregister, using a MARS_UNSERV with ar$flags.register set to one. When this occurs the MARS MUST remove all references to that MCS in all server maps associated with the protocol (ar$pro) specified in the MARS_UNSERV.
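The server map maintenance of section 6.2.2, and the map selection of section 6.2.1, can be sketched together as follows. The function names and the use of plain dicts and sets are illustrative assumptions; returning None stands in for "drop and ignore".

```python
def mserv(server_map, group, mcs_addr):
    """MARS_MSERV for a specific group (section 6.2.2): add the MCS,
    constructing a new server map for the group's first MCS."""
    server_map.setdefault(group, set()).add(mcs_addr)

def unserv(server_map, group, mcs_addr):
    """MARS_UNSERV: remove the MCS, deleting a server map that ends up
    null after the operation."""
    members = server_map.get(group)
    if members:
        members.discard(mcs_addr)
        if not members:
            del server_map[group]

def answer_request(source, group, host_map, server_map, cluster_members):
    """Map selection from section 6.2.1, for a group with both maps:
    an MCS is told the real membership; a cluster member is steered to
    the MCS(s); anyone else is ignored."""
    if source in server_map.get(group, set()):
        return sorted(host_map.get(group, set()))
    if source in cluster_members:
        return sorted(server_map.get(group, set()))
    return None
```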
Note that multiple logical MCSs may share the same physical ATM interface, provided that each MCS uses a separate ATM address (e.g. a different SEL field in the NSAP format address). In fact, an MCS may share the ATM interface of a node that is also a cluster member (either host or router), provided each logical entity has a different ATM address.

A MARS MUST be capable of handling a multi-entry server map. However, the possible use of multiple MCSs registering to support the same group is a subject for further study. In the absence of an MCS synchronisation protocol a system administrator MUST NOT allow more than one logical MCS to register for a given group.

6.2.4 Modified response to MARS_JOIN and MARS_LEAVE.

The existence of MCSs supporting some groups but not others requires the MARS to modify its distribution of single and block join/leave updates to cluster members. The MARS also adds two new messages - MARS_SJOIN and MARS_SLEAVE - for communicating group changes to MCSs over ServerControlVC.

The MARS_SJOIN and MARS_SLEAVE messages are identical to MARS_JOIN, with operation codes 8 and 9 (decimal) respectively.

When a cluster member issues MARS_JOIN or MARS_LEAVE for a single group, the MARS checks to see if the group has an associated server map. If the specified group does not have a server map the MARS simply retransmits the MARS_JOIN or MARS_LEAVE on ClusterControlVC.

However, if a server map exists for the group a new set of actions is taken:

   A copy of the MARS_JOIN/LEAVE is made with type MARS_SJOIN or MARS_SLEAVE as appropriate, and transmitted on ServerControlVC. This allows the MCS(s) supporting the group to note the new member and update their data VCs.

   The original message is transmitted back to the source cluster member unchanged, using the VC it arrived on rather than ClusterControlVC.
   The ar$flags.punched field MUST be reset to 0 in this message.

(Section 5.2.2 requires cluster members to have a mechanism to confirm the reception of their message by the MARS. For mesh supported groups, using ClusterControlVC serves the dual purpose of providing this confirmation and distributing group update information. When a group is MCS supported, there is no reason for all cluster members to process null join/leave messages on ClusterControlVC, so they are sent back on the private VC between cluster member and MARS.)

Receipt of a block MARS_JOIN (e.g. from a router coming on-line) or MARS_LEAVE requires a more complex response. The single <min, max> block may simultaneously cover mesh supported and MCS supported groups. However, cluster members only need to be informed of the mesh supported groups that the endpoint has joined. Only the MCSs need to know if the endpoint is joining any MCS supported groups.

The solution is to modify the MARS_JOIN or MARS_LEAVE that is retransmitted on ClusterControlVC. The following action is taken:

   A copy of the MARS_JOIN/LEAVE is made with type MARS_SJOIN or MARS_SLEAVE as appropriate, and transmitted on ServerControlVC. This allows the MCS(s) supporting the group to note the membership change and update their outgoing point to multipoint VCs.

   Before transmission on ClusterControlVC, the original MARS_JOIN/LEAVE then has its <min, max> block replaced with a 'hole punched' set of zero or more <min, max> pairs. The 'hole punched' set of <min, max> pairs covers the entire address range specified by the original <min, max> pair, but excludes those addresses/groups supported by MCSs.

If no 'holes' were punched in the specified block, the original MARS_JOIN/LEAVE is re-transmitted on ClusterControlVC unchanged.
Otherwise the following occurs:

   The original MARS_JOIN/LEAVE is transmitted back to the source cluster member unchanged, using the VC it arrived on. The ar$flags.punched field MUST be reset to 0 in this message.

   If the hole-punched set contains 1 or more <min, max> pairs, a copy of the original MARS_JOIN/LEAVE is transmitted on ClusterControlVC, carrying the new <min, max> list. The ar$flags.punched field MUST be set to 1 in this message.

   The ar$flags.punched field is set to ensure the hole-punched copy is ignored by the message's source when trying to match received MARS_JOIN/LEAVE messages with ones previously sent (section 5.2.2).

(Appendix A discusses some algorithms for 'hole punching'.)

It is assumed that MCSs use the MARS_SJOINs and MARS_SLEAVEs to update their own VCs out to the actual group's members.

ar$flags.layer3grp is copied over into the messages transmitted by the MARS. ar$flags.copy MUST be set to one.

6.2.5 Sequence numbers for ServerControlVC traffic.

In an analogous fashion to the Cluster Sequence Number, the MARS keeps a Server Sequence Number (SSN) that is incremented for every transmission on ServerControlVC. The current value of the SSN is inserted into the ar$msn field of every message the MARS issues that it believes is destined for an MCS. This includes MARS_MULTIs that are being returned in response to a MARS_REQUEST from an MCS, and MARS_REDIRECT_MAP being sent on ServerControlVC. The MARS must check the MARS_REQUEST's source, and if it is a registered MCS the SSN is copied into the ar$msn field; otherwise the CSN is copied into the ar$msn field.

MCSs are expected to track and use the SSNs in an analogous manner to the way endpoints use the CSN in section 5.1 (to trigger revalidation of group membership information).
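The 'hole punching' described in section 6.2.4 amounts to interval subtraction. A minimal sketch, modelling group addresses as unsigned integers (Appendix A discusses algorithms in more detail):

```python
def punch_holes(block_min, block_max, mcs_groups):
    """Replace a single <min,max> block with the set of <min,max> pairs
    covering the same range but excluding MCS-supported groups.

    Returns a list of (min, max) tuples; an empty list means every
    address in the block is MCS supported."""
    out, lo = [], block_min
    for g in sorted(x for x in mcs_groups if block_min <= x <= block_max):
        if lo < g:
            out.append((lo, g - 1))   # the mesh-supported run before g
        lo = g + 1                    # resume just past the hole
    if lo <= block_max:
        out.append((lo, block_max))   # trailing mesh-supported run
    return out
```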
A MARS should be carefully designed to minimise the possibility of the SSN jumping unnecessarily. Under normal operation only MCSs that are affected by transient link problems will miss ar$msn updates and be forced to revalidate. If the MARS itself glitches it will be inundated with requests for a period as every MCS attempts to revalidate.

6.3 Why global sequence numbers?

The CSN and SSN are global within the context of a given protocol (e.g. IP). They count ClusterControlVC and ServerControlVC activity without reference to the multicast group(s) involved. This may be perceived as a limitation, because there is no way for cluster members or multicast servers to isolate exactly which multicast group they may have missed an update for. An alternative would have been to provide a per-group sequence number.

Unfortunately per-group sequence numbers are not practical. The current mechanism allows sequence information to be piggy-backed onto MARS messages already in transit for other reasons. The ability to specify blocks of multicast addresses with a single MARS_JOIN or MARS_LEAVE means that a single message can refer to membership changes for multiple groups simultaneously. A single ar$msn field cannot provide meaningful information about each group's sequence. Multiple ar$msn fields would have been unwieldy.

Any MARS or cluster member that supports different protocols MUST keep separate mapping tables and sequence numbers for each protocol.

6.4 Redundant/Backup MARS Architectures.

If backup MARSs exist for a given cluster then mechanisms are needed to ensure consistency between their mapping tables and those of the active, current MARS.
(Cluster members will consider backup MARSs to exist if they have been configured with a table of MARS addresses, or if the regular MARS_REDIRECT_MAP messages contain a list of 2 or more addresses.)

The definition of a MARS synchronisation protocol is beyond the current scope of this document, and is expected to be the subject of further research work. However, the following observations may be made:

   The MARS_REDIRECT_MAP message exists to enable one MARS to force endpoints to move to another MARS (e.g. in the aftermath of a MARS failure, the chosen backup MARS will eventually wish to hand control of the cluster over to the main MARS when it is functioning properly again).

   Cluster members and MCSs do not need to start up with knowledge of more than one MARS, provided that the MARS correctly issues MARS_REDIRECT_MAP messages with the full list of MARSs for that cluster.

   Any mechanism for synchronising backup MARSs (and coping with the aftermath of MARS failures) should be compatible with the cluster member behaviour described in this document.

7. How an MCS utilises a MARS.

When an MCS supports a multicast group it acts as a proxy cluster endpoint for the senders to the group. It also behaves in an analogous manner to a sender, managing a single outgoing point to multipoint VC to the real group members.

Detailed descriptions of possible MCS architectures are beyond the scope of this document. This section will outline the main issues.

7.1 Association with a particular Layer 3 group.

When an MCS issues a MARS_MSERV it forces all senders to the specified layer 3 group to terminate their VCs on the supplied source ATM address.

The simplest MCS architecture involves taking incoming AAL_SDUs and simply flipping them back out a single point to multipoint VC.
Such an MCS cannot support more than one group at once, as it has no way to differentiate between traffic destined for different groups. Using this architecture, a physical node would provide MCS support for multiple groups by creating multiple logical instances of the MCS, each with a different ATM address (e.g. a different SEL value in the node's NSAPA).

A slightly more complex approach would be to add minimal layer 3 specific processing into the MCS. This would look inside the received AAL_SDUs and determine which layer 3 group they are destined for. A single instance of such an MCS might register its ATM address with the MARS for multiple layer 3 groups, and manage multiple independent outgoing point to multipoint VCs (one for each group).

When an MCS starts up it MUST register with the MARS as described in section 6.2.3, identifying the protocol it supports with the ar$pro field of the MARS_MSERV. This also applies to logical MCSs, even if they share the same physical ATM interface. This is important so that the MARS can react to the loss of an MCS when it drops off ServerControlVC. (One consequence is that 'simple' MCS architectures end up with one ServerControlVC member per group. MCSs with layer 3 specific processing may support multiple groups while still only registering as one member of ServerControlVC.)

An MCS MUST NOT share the same ATM address as a cluster member, although it may share the same physical ATM interface.

7.2 Termination of incoming VCs.

An MCS MUST terminate unidirectional VCs in the same manner as a cluster member (e.g. terminate on an LLC entity when LLC/SNAP encapsulation is used, as described in RFC 1755 for unicast endpoints).

7.3 Management of outgoing VC.

An MCS MUST establish and manage its outgoing point to multipoint VC as a cluster member does (section 5.1).
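The leaf-node and sequence-number tracking that sections 7.3 and 5.1 require of an MCS can be sketched as follows. This is an illustrative model only; the class and method names are assumptions, and the actual leaf changes would be signalled with L_MULTI_ADD/L_MULTI_RM.

```python
class McsOutgoingVc:
    """An MCS's single outgoing point to multipoint VC for one group."""

    def __init__(self, initial_members, ssn):
        # Initial membership comes from a MARS_REQUEST/MARS_MULTI exchange.
        self.leaves = set(initial_members)
        self.ssn = ssn

    def on_sjoin(self, atm_addr):
        self.leaves.add(atm_addr)       # new group member: add a leaf

    def on_sleave(self, atm_addr):
        self.leaves.discard(atm_addr)   # departed member: drop the leaf

    def on_msn(self, ar_msn):
        """Track the Server Sequence Number (unsigned 32 bit). A gap of
        more than one means an update was missed, so the membership
        must be revalidated. Returns True when revalidation is needed."""
        missed = ((ar_msn - self.ssn) & 0xFFFFFFFF) > 1
        self.ssn = ar_msn
        return missed
```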
MARS_REQUEST is used by the MCS to establish the initial leaf nodes for the MCS's outgoing point to multipoint VC. After the VC is established, the MCS reacts to MARS_SJOINs and MARS_SLEAVEs in the same way a cluster member reacts to MARS_JOINs and MARS_LEAVEs.

The MCS tracks the Server Sequence Number from the ar$msn fields of messages from the MARS, and revalidates its outgoing point to multipoint VC(s) when a sequence number jump occurs.

7.4 Use of a backup MARS.

The MCS uses the same approach to backup MARSs as a cluster member (section 5.4), tracking MARS_REDIRECT_MAP messages on ServerControlVC.

8. Support for IP multicast routers.

Multicast routers are required for the propagation of multicast traffic beyond the constraints of a single cluster (inter-cluster traffic). (There is a sense in which they are multicast servers acting at the next higher layer, with clusters, rather than individual endpoints, as their abstract sources and destinations.)

Multicast routers typically participate in higher layer multicast routing algorithms and policies that are beyond the scope of this memo (e.g. DVMRP [5] in the IPv4 environment).

It is assumed that multicast routers will be implemented over the same sort of IP/ATM interface that a multicast host would use. Their IP/ATM interfaces will register with the MARS as cluster members, joining and leaving multicast groups as necessary. As noted in section 5, multiple logical 'endpoints' may be implemented over a single physical ATM interface. Routers use this approach to provide interfaces into each of the clusters they will be routing between.

The rest of this section will assume a simple IPv4 scenario where the scope of a cluster has been limited to a particular LIS that is part of an overlaid IP network.
Not all members of the LIS are necessarily registered cluster members (you may have unicast-only hosts in the LIS).

8.1 Forwarding into a Cluster.

If the multicast router needs to transmit a packet to a group within the cluster its IP/ATM interface opens a VC in the same manner as a normal host would. Once a VC is open, the router watches for MARS_JOIN and MARS_LEAVE messages and responds to them as a normal host would.

The multicast router's transmit side MUST implement inactivity timers to shut down idle outgoing VCs, as for normal hosts.

As with a normal host, the multicast router does not need to be a member of a group it is sending to.

8.2 Joining in 'promiscuous' mode.

Once registered and initialised, the simplest model of IPv4 multicast router operation is for it to issue a MARS_JOIN encompassing the entire Class D address space. In effect it becomes 'promiscuous', as it will be a leaf node to all present and future multipoint VCs established to IPv4 groups on the cluster.

How a router chooses which groups to propagate outside the cluster is beyond the scope of this document.

Consistent with RFC 1112, IP multicast routers may retain the use of IGMP Query and IGMP Report messages to ascertain group membership. However, certain optimisations are possible, and are described in section 8.5.

8.3 Forwarding across the cluster.

Under some circumstances the cluster may simply be another hop between IP subnets that have participants in a multicast group.

   [LAN.1] ----- IPmcR.1 -- [cluster/LIS] -- IPmcR.2 ----- [LAN.2]

LAN.1 and LAN.2 are subnets (such as Ethernet) with attached hosts that are members of group X.

IPmcR.1 and IPmcR.2 are multicast routers with interfaces to the LIS.

A traditional solution would be to treat the LIS as a unicast subnet, and use tunneling routers.
However, this would not allow hosts on the LIS to participate in the cross-LIS traffic.

Assume IPmcR.1 is receiving packets promiscuously on its LAN.1 interface. Assume further it is configured to propagate multicast traffic to all attached interfaces. In this case that means the LIS.

When a packet for group X arrives on its LAN.1 interface, IPmcR.1 simply sends the packet to group X on the LIS interface as a normal host would (issuing a MARS_REQUEST for group X, creating the VC, sending the packet).

Assuming IPmcR.2 initialised itself with the MARS as a member of the entire Class D space, it will have been returned as a member of X even if no other nodes on the LIS were members. All packets for group X received on IPmcR.2's LIS interface may be retransmitted on LAN.2.

If IPmcR.1 is similarly initialised the reverse process will apply for multicast traffic from LAN.2 to LAN.1, for any multicast group. The benefit of this scenario is that cluster members within the LIS may also join and leave group X at any time.

8.4 Joining in 'semi-promiscuous' mode.

Both unicast and multicast IP routers have a common problem - limitations on the number of AAL contexts available at their ATM interfaces. Being 'promiscuous' in the RFC 1112 sense means that for every M hosts sending to N groups, a multicast router's ATM interface will have M*N incoming reassembly engines tied up.

It is not hard to envisage situations where a number of multicast groups are active within the LIS but are not required to be propagated beyond the LIS itself. An example might be a distributed simulation system specifically designed to use the high speed IP/ATM environment.
There may be no practical way its traffic could be utilised on 'the other side' of the multicast router, yet under the conventional scheme the router would have to be a leaf to each participating host anyway.

As this problem occurs below the IP layer, it is worth noting that 'scoping' mechanisms at the IP multicast routing level do not provide a solution. An IP level scope would still result in the router's ATM interface receiving traffic on the scoped groups, only to drop it.

In this situation the network administrator might configure their multicast routers to exclude sections of the Class D address space when issuing MARS_JOIN(s). Multicast groups that will never be propagated beyond the cluster will not have the router listed as a member, and the router will never have to receive (and simply ignore) traffic from those groups.

Another scenario involves the product M*N exceeding the capacity of a single router's interface (especially if the same interface must also support a unicast IP router service).

A network administrator may choose to add a second node, to function as a parallel IP multicast router. Each router would be configured to be 'promiscuous' over separate parts of the Class D address space, thus exposing itself to only part of the VC load. This sharing would be completely transparent to IP hosts within the LIS.

Restricted promiscuous mode does not break RFC 1112's use of IGMP Report messages. If the router is configured to serve a given block of Class D addresses, it will receive the IGMP Report. If the router is not configured to support a given block, then the existence of an IGMP Report for a group in that block is irrelevant to the router. All routers are able to track membership changes through the MARS_JOIN and MARS_LEAVE traffic anyway.
(Section 8.5 discusses a better alternative to IGMP within a cluster.)

Mechanisms and reasons for establishing these modes of operation are beyond the scope of this document.

8.5 An alternative to IGMP Queries.

An unfortunate aspect of IGMP is that it assumes multicasting of IP packets is a cheap and trivial event at the link layer. As a consequence, regular IGMP Queries are multicasted by routers to group 224.0.0.1. These queries are intended to trigger IGMP Replies by cluster members that have layer 3 members of particular groups.

The MARS_GROUPLIST_REQUEST and MARS_GROUPLIST_REPLY messages were designed to allow routers to avoid actually transmitting IGMP Queries out into a cluster.

Whenever the router's forwarding engine wishes to transmit an IGMP Query, a MARS_GROUPLIST_REQUEST can be sent to the MARS instead. The resulting MARS_GROUPLIST_REPLY(s) (described in section 5.3) from the MARS carry all the information that the router would have ascertained from IGMP Replies.

It is RECOMMENDED that multicast routers utilise this MARS service to minimise IGMP traffic within the cluster.

By default a MARS_GROUPLIST_REQUEST SHOULD specify the entire address space (e.g. <224.0.0.0, 239.255.255.255> in an IPv4 environment). However, routers serving part of the address space (as described in section 8.4) MAY choose to issue MARS_GROUPLIST_REQUESTs that specify only the subset of the address space they are serving.

(On the surface it would also seem useful for multicast routers to track MARS_JOINs and MARS_LEAVEs that arrive with ar$flags.layer3grp set. These might be used in lieu of IGMP Reports, to provide the router with timely indication that a new layer 3 group member exists within the cluster. However, this only works on VC mesh supported groups, and is therefore NOT recommended.)
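The <min,max> block a router places in its MARS_GROUPLIST_REQUEST can be expressed numerically. A sketch using Python's standard ipaddress module; the function names are illustrative:

```python
import ipaddress

def addr(s):
    """IPv4 dotted-quad string to an unsigned 32 bit integer."""
    return int(ipaddress.IPv4Address(s))

# The default block: the entire IPv4 Class D (multicast) space.
DEFAULT_BLOCK = (addr("224.0.0.0"), addr("239.255.255.255"))

def grouplist_block(served_min=None, served_max=None):
    """A router serving only part of the space (section 8.4) may narrow
    the block it queries; by default it asks about everything."""
    if served_min is None:
        return DEFAULT_BLOCK
    return (addr(served_min), addr(served_max))
```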
Appendix B discusses less elegant mechanisms for reducing the impact of IGMP traffic within a cluster, on the assumption that the IP/ATM interfaces to the cluster are being used by un-optimised IP multicasting code.

8.6 CMIs across multiple interfaces.

The Cluster Member ID is only unique within the Cluster managed by a given MARS. On the surface this might appear to leave us with a problem when a multicast router is routing between two or more Clusters using a single physical ATM interface. The router will register with two or more MARSs, and thereby acquire two or more independent CMIs. Given that each MARS has no reason to synchronise their CMI allocations, it is possible for a host in one cluster to have the same CMI as the router's interface to another Cluster. How does the router distinguish between its own reflected packets, and packets from that other host?

The answer lies in the fact that routers (and hosts) actually implement logical IP/ATM interfaces over a single physical ATM interface. Each logical interface will have a unique ATM Address (e.g. an NSAP with different SELector fields, one for each logical interface).

Each logical IP/ATM interface is configured with the address of a single MARS, attaches to only one cluster, and so has only one CMI to worry about. Each of the MARSs that the router is registered with will have been given a different ATM Address (corresponding to the different logical IP/ATM interfaces) in each registration MARS_JOIN.

When hosts in a cluster add the router as a leaf node, they'll specify the ATM Address of the appropriate logical IP/ATM interface on the router in the L_MULTI_ADD message. Thus, each logical IP/ATM interface will only have to check and filter on CMIs assigned by its own MARS.
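The per-interface filter can be sketched as follows. This is a non-normative illustration: it builds the Type #1 data path header of section 5.5 (LLC/SNAP, OUI 0x00-00-5E, PID 0x00-01, then the 2 octet pkt$cmi and 2 octet protocol type), and shows that each logical interface needs to compare pkt$cmi only against the single CMI its own MARS assigned. The helper names are assumptions, not terms from this specification.

```python
# Sketch (not normative) of per-logical-interface CMI filtering.

LLC_SNAP_TYPE1 = bytes([0xAA, 0xAA, 0x03, 0x00, 0x00, 0x5E, 0x00, 0x01])

def encap_type1(cmi: int, ethertype: int, payload: bytes) -> bytes:
    """Build a Type #1 data packet: LLC/SNAP header, 2-octet pkt$cmi,
    2-octet protocol type, then the layer 3 packet (section 5.5)."""
    return (LLC_SNAP_TYPE1
            + cmi.to_bytes(2, "big")
            + ethertype.to_bytes(2, "big")
            + payload)

class LogicalInterface:
    def __init__(self, own_cmi: int):
        self.own_cmi = own_cmi  # assigned by this interface's own MARS

    def accept(self, pkt: bytes) -> bool:
        """Discard our own reflected packets. Traffic from other clusters
        never arrives on this logical interface, so a clashing CMI in
        another cluster is harmless."""
        pkt_cmi = int.from_bytes(pkt[8:10], "big")
        return pkt_cmi != self.own_cmi
```

A router with two logical interfaces simply instantiates two such filters, one per MARS registration, each holding a different `own_cmi`.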
In essence the cluster differentiation is achieved by ensuring that logical IP/ATM interfaces are assigned different ATM Addresses.

9. Multiprotocol applications of the MARS and MARS clients.

A deliberate attempt has been made to describe the MARS and associated mechanisms in a manner independent of a specific higher layer protocol being run over the ATM cloud. The immediate application of this document will be in an IPv4 environment, and this is reflected by the focus of key examples. However, the ar$pro.type and ar$pro.snap fields in every MARS control message allow any higher layer protocol that has a 'short form' or 'long form' of protocol identification (section 4.3) to be supported by a MARS.

Every MARS MUST implement entirely separate logical mapping tables and support. Every cluster member must interpret messages from the MARS in the context of the protocol type that the MARS message refers to.

Every MARS and MARS client MUST treat Cluster Member IDs in the context of the protocol type carried in the MARS message or data packet containing the CMI.

For example, IPv6 has been allocated an Ethertype of 0x86DD. This means the 'short form' of protocol identification must be used in the MARS control messages and the data path encapsulation (section 5.5). An IPv6 multicasting client sets the ar$pro.type field of every MARS message to 0x86DD. When carrying IPv6 addresses the ar$spln and ar$tpln fields are either 0 (for null or non-existent information) or 16 (for the full IPv6 address).

Following the rules in section 5.5, an IPv6 data packet is encapsulated as:

   [0xAA-AA-03][0x00-00-5E][0x00-01][pkt$cmi][0x86DD][IPv6 packet]

A host or endpoint interface that is using the same MARS to support the multicasting needs of multiple protocols MUST NOT assume their CMI will be the same for each protocol.

10. Supplementary parameter processing.

The ar$extoff field in the [Fixed header] indicates whether supplementary parameters are being carried by a MARS control message. This mechanism is intended to enable the addition of new functionality to the MARS protocol in later documents.

Supplementary parameters are conveyed as a list of TLV (type, length, value) encoded information elements. The TLV(s) begin on the first 32 bit boundary following the [Addresses] field in the MARS control message (e.g. after ar$tsa.N in a MARS_MULTI, after ar$max.N in a MARS_JOIN, etc).

10.1 Interpreting the ar$extoff field.

If the ar$extoff field is non-zero it indicates that a list of one or more TLVs has been appended to the MARS message. The first TLV is found by treating ar$extoff as an unsigned integer representing an offset (in octets) from the beginning of the MARS message (the MSB of the ar$hrd field).

As TLVs are 32 bit aligned, the bottom 2 bits of ar$extoff are also reserved. A receiver MUST mask off these two bits before calculating the octet offset to the TLV list. A sender MUST set these two bits to zero.

If ar$extoff is zero no TLVs have been appended.

10.2 The format of TLVs.

When they exist, TLVs begin on 32 bit boundaries, are multiples of 32 bits in length, and form a sequential list terminated by a NULL TLV.

The TLV structure is:

   [Type - 2 octets][Length - 2 octets][Value - n*4 octets]

The Type subfield indicates how the contents of the Value subfield are to be interpreted.

The Length subfield indicates the number of VALID octets in the Value subfield. Valid octets in the Value subfield start immediately after the Length subfield.
The offset (in octets) from the start of this TLV to the start of the next TLV in the list is given by the following formula:

   offset = (length + 4 + ((4 - (length & 3)) % 4))

   (where % is the modulus operator)

The Value subfield is padded with 0, 1, 2, or 3 octets to ensure the next TLV is 32 bit aligned. The padded locations MUST be set to zero.

(For example, a TLV that needed only 5 valid octets of information would be 12 octets long. The Length subfield would hold the value 5, and the Value subfield would be padded out to 8 octets. The 5 valid octets of information begin at the first octet of the Value subfield.)

The Type subfield is formatted in the following way:

    | 1st octet     |  2nd octet    |
     7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |x x|             y             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The most significant 2 bits (Type.x) determine how a recipient should behave when it doesn't recognise the TLV type indicated by the lower 14 bits (Type.y). The required behaviours are:

   Type.x = 0  Skip the TLV, continue processing the list.
   Type.x = 1  Stop processing, silently drop the MARS message.
   Type.x = 2  Stop processing, drop message, give error indication.
   Type.x = 3  Reserved. (currently treat as x = 0)

(The error indication generated when Type.x = 2 SHOULD be logged in some locally significant fashion. Consequential MARS message activity in response to such an error condition will be defined in future documents.)

The TLV type space (Type.y) is further subdivided to encourage use outside the IETF.

   0                Null TLV.
   0x0001 - 0x0FFF  Reserved for the IETF.
   0x1000 - 0x11FF  Allocated to the ATM Forum.
   0x1200 - 0x37FF  Reserved for the IETF.
   0x3800 - 0x3FFF  Experimental use.

10.3 Processing MARS messages with TLVs.
Supplementary parameters act as modifiers to the basic behaviour specified by the ar$op field of any given MARS message.

If a MARS message arrives with a non-zero ar$extoff field its TLV list MUST be parsed before handling the MARS message in accordance with the ar$op value. Unrecognised TLVs MUST be handled as required by their Type.x value.

How TLVs modify basic MARS operations will be ar$op and TLV specific.

10.4 Initial set of TLV elements.

Conformance with this document only REQUIRES the recognition of one TLV, the Null TLV. This terminates a list of TLVs, and MUST be present if ar$extoff is non-zero in a MARS message. It MAY be the only TLV present.

The Null TLV is coded as:

   [0x00-00][0x00-00]

Future documents will describe the formats, contents, and interpretations of additional TLVs. The minimal parsing requirements imposed by this document are intended to allow conformant MARS and MARS client implementations to deal gracefully and predictably with future TLV developments.

11. Key Decisions and open issues.

The key decisions this document proposes:

   A Multicast Address Resolution Server (MARS) is proposed to
   co-ordinate and distribute mappings of ATM endpoint addresses to
   arbitrary higher layer 'multicast group addresses'. The specific
   case of IPv4 multicast is used as the example.

   The concept of 'clusters' is introduced to define the scope of a
   MARS's responsibility, and the set of ATM endpoints willing to
   participate in link level multicasting.

   A MARS is described with the functionality required to support
   intra-cluster multicasting using either VC meshes or ATM level
   multicast servers (MCSs).

   LLC/SNAP encapsulation of MARS control messages allows MARS and
   ATMARP traffic to share VCs, and allows partially co-resident MARS
   and ATMARP entities.
   New message types:

      MARS_JOIN, MARS_LEAVE, MARS_REQUEST. Allow endpoints to join,
      leave, and request the current membership list of multicast
      groups.

      MARS_MULTI. Allows multiple ATM addresses to be returned by
      the MARS in response to a MARS_REQUEST.

      MARS_MSERV, MARS_UNSERV. Allow multicast servers to register
      and deregister themselves with the MARS.

      MARS_SJOIN, MARS_SLEAVE. Allow the MARS to pass on group
      membership changes to multicast servers.

      MARS_GROUPLIST_REQUEST, MARS_GROUPLIST_REPLY. Allow the MARS
      to indicate which groups have actual layer 3 members. May be
      used to support IGMP in IPv4 environments, and similar
      functions in other environments.

      MARS_REDIRECT_MAP. Allows the MARS to specify a set of backup
      MARS addresses.

   'Wild card' MARS mapping table entries are possible, where a
   single ATM address is simultaneously associated with blocks of
   multicast group addresses.

For the MARS protocol ar$op.version = 0. The complete set of MARS control messages and ar$op.type values is:

    1  MARS_REQUEST
    2  MARS_MULTI
    3  MARS_MSERV
    4  MARS_JOIN
    5  MARS_LEAVE
    6  MARS_NAK
    7  MARS_UNSERV
    8  MARS_SJOIN
    9  MARS_SLEAVE
   10  MARS_GROUPLIST_REQUEST
   11  MARS_GROUPLIST_REPLY
   12  MARS_REDIRECT_MAP

A number of issues are left open at this stage, and are likely to be the subject of on-going research and additional documents that build upon this one.

   The specified endpoint behaviour allows the use of
   redundant/backup MARSs within a cluster. However, no
   specifications yet exist on how these MARSs co-ordinate amongst
   themselves. (The default is to only have one MARS per cluster.)

   The specified endpoint behaviour and MARS service allows the use
   of multiple MCSs per group. However, no specifications yet exist
   on how this may be used, or how these MCSs co-ordinate amongst
   themselves.
   Until further work is done on MCS co-ordination protocols the
   default is to only have one MCS per group.

   The MARS relies on the cluster member dropping off
   ClusterControlVC if the cluster member dies. It is not clear if
   additional mechanisms are needed to detect and delete 'dead'
   cluster members.

   If a multicast server attempts to MARS_MSERV for an existing VC
   mesh supported group, it would be nice to have current senders to
   the group migrate their outgoing VCs from the actual cluster
   member leaf nodes to the newly registered multicast server(s).
   How this might be achieved, the load this would place on the
   MARS, and its scalability, is for further study.

   Supporting layer 3 'broadcast' as a special case of multicasting
   (where the 'group' encompasses all cluster members) has not been
   explicitly discussed.

   Supporting layer 3 'unicast' as a special case of multicasting
   (where the 'group' is a single cluster member, identified by the
   cluster member's unicast protocol address) has not been
   explicitly discussed.

   The future development of ATM Group Addresses and Leaf Initiated
   Join in the ATM Forum's UNI specification has not been addressed.
   (However, the problems identified in this document with respect
   to VC scarcity and impact on AAL contexts will not be fixed by
   such developments in the signalling protocol.)

Security Considerations

Security considerations are not addressed in this document.

Acknowledgments

The discussions within the IP over ATM Working Group have helped clarify the ideas expressed in this document. John Moy (Cascade Communications Corp.) initially suggested the idea of wild-card entries in the ARP Server. Drew Perkins (Fore Systems) provided rigorous and useful critique of early proposed mechanisms for distributing and validating group membership information.
Susan Symington (and co-workers at MITRE Corp., Don Chirieleison, Rich Verjinski, and Bill Barns) clearly articulated the need for multicast server support, proposed a solution, and challenged earlier block join/leave mechanisms. John Shirron (Fore Systems) provided useful improvements on my original revalidation procedures.

Susan Symington and Bryan Gleeson (Adaptec) independently championed the need for the service provided by MARS_GROUPLIST_REQUEST/REPLY. The new encapsulation scheme arose from WG discussions, captured by Bryan Gleeson in an interim Internet Draft (with Keith McCloghrie (Cisco), Andy Malis (Ascom Nexion), and Andrew Smith (Bay Networks) as key contributors). James Watt (Newbridge) and Joel Halpern (Newbridge) motivated the development of a more multiprotocol MARS control message format, evolving it away from its original ATMARP roots. They also motivated the development of the Type #1 and Type #2 data path encapsulations.

Finally, Jim Rubas (IBM) supplied the MARS pseudo-code in Appendix F.

Author's Address

   Grenville Armitage
   Bellcore, 445 South Street
   Morristown, NJ, 07960
   USA

   Email: gja@thumper.bellcore.com
   Ph. +1 201 829 2635

References

[1] S. Deering, "Host Extensions for IP Multicasting", RFC 1112, Stanford University, August 1989.

[2] Heinanen, J., "Multiprotocol Encapsulation over ATM Adaptation Layer 5", RFC 1483, July 1993.

[3] Laubach, M., "Classical IP and ARP over ATM", RFC 1577, Hewlett-Packard Laboratories, December 1993.

[4] ATM Forum, "ATM User Network Interface (UNI) Specification Version 3.1", ISBN 0-13-393828-X, Prentice Hall, Englewood Cliffs, NJ, June 1995.

[5] D. Waitzman, C. Partridge, S. Deering, "Distance Vector Multicast Routing Protocol", RFC 1075, November 1988.

[6] M. Perez, F. Liaw, D. Grossman, A. Mankin, E.
Hoffman, A. Malis, "ATM Signaling Support for IP over ATM", RFC 1755, February 1995.

[7] M. Borden, E. Crawley, B. Davie, S. Batsell, "Integration of Real-time Services in an IP-ATM Network Architecture", RFC 1821, August 1995.

[8] ATM Forum, "ATM User-Network Interface Specification Version 3.0", Englewood Cliffs, NJ: Prentice Hall, September 1993.

Appendix A. Hole punching algorithms.

Implementations are entirely free to comply with the body of this memo in any way they see fit. This appendix is purely for clarification.

A MARS implementation might pre-construct a set of <min, max> pairs (P) that reflects the entire Class D space, excluding any addresses currently supported by multicast servers. The <min> field of the first pair MUST be 224.0.0.0, and the <max> field of the last pair MUST be 239.255.255.255. The first and last pair may be the same. This set is updated whenever a multicast server registers or deregisters.

When the MARS must perform 'hole punching' it might consider the following algorithm:

   Assume the MARS_JOIN/LEAVE received by the MARS from the cluster
   member specified the block <Emin, Emax>.

   Assume Pmin(N) and Pmax(N) are the <min> and <max> fields from
   the Nth pair in the MARS's current set P.

   Assume set P has K pairs. Pmin(1) MUST equal 224.0.0.0, and
   Pmax(K) MUST equal 239.255.255.255. (If K == 1 then no hole
   punching is required.)

   Execute pseudo-code:

      create copy of set P, call it set C.

      index1 = 1;
      while (Pmax(index1) <= Emin)
         index1++;

      index2 = K;
      while (Pmin(index2) >= Emax)
         index2--;

      if (index1 > index2)
         Exit, as the hole-punched set is null.

      if (Pmin(index1) < Emin)
         Cmin(index1) = Emin;

      if (Pmax(index2) > Emax)
         Cmax(index2) = Emax;

      Set C is the required 'hole punched' set of address blocks.
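The pseudo-code above amounts to clipping the pre-built block set against <Emin, Emax>. A compact sketch of the same operation, written as straightforward interval intersection (group addresses as 32 bit integers; illustrative only, not normative, and `hole_punch` is not a name from this specification):

```python
def hole_punch(P, Emin, Emax):
    """Return set C: the <min, max> blocks of P pruned to <Emin, Emax>.
    Holes already present in P (blocks excluded because an MCS supports
    those groups) are preserved; blocks wholly outside the requested
    range are dropped."""
    C = []
    for pmin, pmax in P:
        if pmax < Emin or pmin > Emax:   # block entirely outside the range
            continue
        C.append((max(pmin, Emin), min(pmax, Emax)))
    return C
```

If the requested block falls entirely inside one of P's holes, the result is the null set, matching the "Exit, as the hole-punched set is null" branch above.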
The resulting set C retains all the MARS's pre-constructed 'holes' covering the multicast servers, but will have been pruned to cover the section of the Class D space specified by the originating host's <Emin, Emax> values.

The host end should keep a table, H, of open VCs in ascending order of Class D address.

   Assume H(x).addr is the Class D address associated with VC.x.
   Assume H(x).addr < H(x+1).addr.

The pseudo-code for updating VCs based on an incoming JOIN/LEAVE might be:

   x = 1;
   N = 1;

   while (x < no. of VCs open)
   {
      while (H(x).addr > max(N))
      {
         N++;
         if (N > no. of pairs in JOIN/LEAVE)
            return(0);
      }

      if ((H(x).addr <= max(N)) &&
          (H(x).addr >= min(N)))
         perform_VC_update();
      x++;
   }

Appendix B. Minimising the impact of IGMP in IPv4 environments.

Implementing any part of this appendix is not required for conformance with this document. It is provided solely to document issues that have been identified.

The intent of section 5.1 is for cluster members to only have outgoing point to multipoint VCs when they are actually sending data to a particular multicast group. However, in most IPv4 environments the multicast routers attached to a cluster will periodically issue IGMP Queries to ascertain if particular groups have members. The current IGMP specification attempts to avoid having every group member respond by insisting that each group member wait a random period, and respond only if no other member has responded before them. The IGMP reply is sent to the multicast address of the group being queried.

Unfortunately, as it stands the IGMP algorithm will be a nuisance for cluster members that are essentially passive receivers within a given multicast group.
It is just as likely that a passive member, with no outgoing VC already established to the group, will decide to send an IGMP reply - causing a VC to be established where there was no need for one. This is not a fatal problem for small clusters, but will seriously impact the ability of a cluster to scale.

The most obvious solution is for routers to use the MARS_GROUPLIST_REQUEST and MARS_GROUPLIST_REPLY messages, as described in section 8.5. This would remove the regular IGMP Queries, resulting in cluster members only sending an IGMP Report when they first join a group.

Alternative solutions do exist. One would be to modify the IGMP reply algorithm, for example:

   If the group member has a VC open to the group proceed as per
   RFC 1112 (picking a random reply delay between 0 and 10 seconds).

   If the group member does not have a VC already open to the group,
   pick a random reply delay between 10 and 20 seconds instead, and
   then proceed as per RFC 1112.

If even one group member is sending to the group at the time the IGMP Query is issued then all the passive receivers will find the IGMP Reply has been transmitted before their delay expires, so no new VC is required. If all group members are passive at the time of the IGMP Query then a response will eventually arrive, but 10 seconds later than under conventional circumstances.

The preceding solution requires re-writing existing IGMP code, and implies the ability of the IGMP entity to ascertain the status of VCs on the underlying ATM interface. This is not likely to be available in the short term.

One short term solution is to provide something like the preceding functionality with a 'hack' at the IP/ATM driver level within cluster members. Arrange for the IP/ATM driver to snoop inside IP packets looking for IGMP traffic.
If an IGMP packet is accepted for transmission, the IP/ATM driver can buffer it locally if there is no VC already active to that group. A 10 second timer is started, and if an IGMP Reply for that group is received from elsewhere on the cluster the timer is reset. If the timer expires, the IP/ATM driver then establishes a VC to the group as it would for a normal IP multicast packet.

Some network implementors may find it advantageous to configure a multicast server to support the group 224.0.0.1, rather than rely on a mesh. Given that IP multicast routers regularly send IGMP queries to this address, a mesh will mean that each router will permanently consume an AAL context within each cluster member. In clusters served by multiple routers the VC load within switches in the underlying ATM network will become a scaling problem.

Finally, if a multicast server is used to support 224.0.0.1, another ATM driver level hack becomes a possible solution to IGMP Reply traffic. The ATM driver may choose to grab all outgoing IGMP packets and send them out on the VC established for sending to 224.0.0.1, regardless of the Class D address the IGMP message was actually for. Given that all hosts and routers must be members of 224.0.0.1, the intended recipients will still receive the IGMP Replies. The negative impact is that all cluster members will receive the IGMP Replies.

Appendix C. Further comments on 'Clusters'.

The cluster concept was introduced in section 1 for two reasons. The more well known term of Logical IP Subnet is both very IP specific, and constrained to unicast routing boundaries. As the architecture described in this document may be re-used in non-IP environments a more neutral term was needed.
As the needs of multicasting are not always bound by the same scopes as unicasting, it was not immediately obvious that a priori limiting ourselves to LISs was beneficial in the long term.

It must be stressed that Clusters are purely an administrative entity. You choose their size (i.e. the number of endpoints that register with the same MARS) based on your multicasting needs, and the resource consumption you are willing to put up with. The larger the number of ATM attached hosts you require multicast support for, the more individual clusters you might choose to establish (along with multicast routers to provide inter-cluster traffic paths).

Given that not all the hosts in any given LIS may require multicast support, it becomes conceivable that you might assign a single MARS to support hosts from across multiple LISs. In effect you have a cluster covering multiple LISs, and have achieved 'cut through' routing for multicast traffic. Under these circumstances increasing the geographical size of a cluster might be considered a good thing.

However, practical considerations limit the size of clusters. Having a cluster span multiple LISs may not always be a particular 'win' situation. As the number of multicast capable hosts in your LISs increases it becomes more likely that you'll want to constrain a cluster's size and force multicast traffic to aggregate at multicast routers scattered across your ATM cloud.

Finally, multi-LIS clusters require a degree of care when deploying IP multicast routers. Under the Classical IP model you need unicast routers on the edges of LISs. Under the MARS architecture you only need multicast routers at the edges of clusters. If your cluster spans multiple LISs, then the multicast routers will perceive themselves to have a single interface that is simultaneously attached to multiple unicast subnets.
Whether this situation will work depends on the inter-domain multicast routing protocols you use, and your multicast router's ability to understand the new relationship between unicast and multicast topologies.

In the absence of further research in this area, networks deployed in conformance to this document MUST make their IP cluster and IP LIS coincide, so as to avoid these complications.

Appendix D. TLV list parsing algorithm.

The following pseudo-code represents how the TLV list format described in section 10 could be handled by a MARS or MARS client.

   list = (ar$extoff & 0xFFFC);

   if (list == 0) exit;

   list = list + message_base;

   while (list->Type.y != 0)
   {
      switch (list->Type.y)
      {
         default:
         {
            if (list->Type.x == 0) break;

            if (list->Type.x == 1) exit;

            if (list->Type.x == 2) log-error-and-exit;
         }

         [...other handling goes here..]

      }

      list += (list->Length + 4 + ((4 - (list->Length & 3)) % 4));

   }

   return;

Appendix E. Summary of timer values.

This appendix summarises the various timers or limits mentioned in the main body of the document. Values are specified in the following format: [x, y, z] indicating a minimum value of x, a recommended value of y, and a maximum value of z. A '-' indicates that a category has no value specified. Values in minutes are followed by 'min', values in seconds are followed by 'sec'.

   Idle time for MARS - MARS client pt to pt VC:
      [1 min, 20 min, -]

   Idle time for multipoint VCs from client:
      [1 min, 20 min, -]

   Allowed time between MARS_MULTI components:
      [-, -, 10 sec]

   Initial random L_MULTI_RQ/ADD retransmit timer range:
      [5 sec, -, 10 sec]

   Random time to set VC_revalidate flag:
      [1 sec, -, 10 sec]

   MARS_JOIN/LEAVE retransmit interval:
      [5 sec, 10 sec, -]

   MARS_JOIN/LEAVE retransmit limit:
      [-, -, 5]

   Random time to re-register with MARS:
      [1 sec, -, 10 sec]

   Force wait if MARS re-registration is looping:
      [1 min, -, -]

   Transmission interval for MARS_REDIRECT_MAP:
      [1 min, 1 min, 2 min]

   Limit for client to miss MARS_REDIRECT_MAPs:
      [-, -, 4 min]

Appendix F. Pseudo code for MARS operation.

Implementations are entirely free to comply with the body of this memo in any way they see fit. This appendix is purely for possible clarification.

A MARS implementation might be built along the lines suggested in this pseudo-code.

1. Main

1.1 Initialization

   Define a server list as the list of leaf nodes
    on ServerControlVC.
   Define a cluster list as the list of leaf nodes
    on ClusterControlVC.
   Define a host map as the list of hosts that are
    members of a group.
   Define a server map as the list of hosts (MCSs)
    that are serving a group.
   Read config file.
   Allocate message queues.
   Allocate internal tables.
   Set up passive open VC connection.
   Set up redirect_map timer.
   Establish logging.

1.2 Message Processing

   Forever {
     If the message has a TLV then {
       If TLV is unsupported then {
         process as defined in TLV type field.
       } /* unknown TLV */
     } /* TLV present */
     Place incoming message in the queue.
     For (all messages in the queue) {
       If the message is not a JOIN/LEAVE/MSERV/UNSERV with
        ar$flags.register == 1 then {
         If the message source is (not a member of server list) &&
          (not a member of cluster list) then {
           Drop the message silently.
         }
       }
       If (ar$pro.type is not supported) or
        (the ATM source address is missing) then {
         Continue.
       }
       Determine type of message.
       If an ERR_L_RELEASE arrives on ClusterControlVC then {
         Remove the endpoint's ATM address from all groups
          for which it has joined.
         Release the CMI.
         Continue.
       } /* error on CCVC */
       Call specific message handling routine.
       If redirect_map timer pops {
         Call MARS_REDIRECT_MAP message handling routine.
       } /* redirect timer pop */
     } /* all msgs in the queue */
   } /* forever loop */

2. Message Handler

2.1 Messages:

- MARS_REQUEST

   Indicate no MARS_MULTI support of TLV.
   If the supported TLV is not NULL then {
     Indicate MARS_MULTI support of TLV.
     Process as required.
   } else { /* TLV NULL */
     Indicate message to be sent on Private VC.
     If the message source is a member of server list then {
       If the group has a non-null host map then {
         Call MARS_MULTI with the host map for the group.
       } else { /* no group */
         Call MARS_NAK message routine.
       } /* no group */
     } else { /* source is cluster list */
       If the group has a non-null server map then {
         Call MARS_MULTI with the server map for the group.
       } else { /* cluster member but no server map */
         If the group has a non-null host map then {
           Call MARS_MULTI with the host map for the group.
         } else { /* no group */
           Call MARS_NAK message routine.
         } /* no group */
       } /* cluster member but no server map */
     } /* source is a cluster list */
   } /* TLV NULL */
   If a message exists then {
     Send message as indicated.
   }
   Return.

- MARS_MULTI

   Construct a MARS_MULTI for the specified map.
   If the param indicates TLV support then {
     Process the TLV as required.
   }
   Return.

- MARS_JOIN

   If (ar$flags.copy != 0) silently ignore the message.
   If more than a single <min, max> pair is specified then
    ignore the 2nd and subsequent pairs.
   Indicate message to be sent on private VC.
   If (ar$flags.register == 1) then {
     If the node is already a registered member of the cluster
      associated with protocol type then { /* previous register */
       Copy the existing CMI into the MARS_JOIN.
     } else { /* new register */
       Add the node to ClusterControlVC.
       Add the node to cluster list.
       ar$cmi = obtain CMI.
     } /* new register */
   } else { /* not a register */
     If the message source is in server map then {
       Drop the message silently.
       Indicate no message to be sent.
     } else {
       If the first <min, max> pair encompasses any group with
        a server map then {
         Call the Modified JOIN/LEAVE Processing routine.
         Indicate no message to be sent.
       } else { /* server map does not exist */
         Update internal tables.
         Indicate message to be sent on ClusterControlVC.
       } /* server map does not exist */
     }
   } /* not a register */
   ar$flags.copy = 1.
   Send message as indicated.
   Return.

- MARS_LEAVE

   If (ar$flags.copy != 0) silently ignore the message.
   Indicate message to be sent on ClusterControlVC.

   If group address is 224.0.0.1 then {
     All references to this node must be eliminated from
      any other groups for which it is a member.
   } /* group is 224.0.0.1 */
   If (ar$flags.register == 1) then { /* deregistration */
     Update internal tables to remove the member's ATM addr
      from all groups it has joined.
     Drop the endpoint from ClusterControlVC.
     Drop the endpoint from cluster list.
     Release the CMI.
     Indicate message to be sent on Private VC.
   } else { /* not a deregistration */
     If the first <min, max> pair encompasses any group with
      a server map then {
       Call the Modified JOIN/LEAVE Processing routine.
       Indicate no message to be sent.
     } else { /* server map does not exist */
       Update internal tables.
       Indicate message to be sent on ClusterControlVC.
     }
   } /* not a deregistration */
   If a message exists then {
     ar$flags.copy = 1.
     Send message as indicated.
   }
   Return.
- MARS_MSERV

    If (ar$flags.register == 1) then { /* server register */
        Add the endpoint as a leaf node to ServerControlVC.
        Add the endpoint to the server list.
        Indicate the message to be sent on Private VC.
        ar$cmi = 0.
    } else { /* not a register */
        If (group has non-null host map) && (no server map) then {
            Silently drop the message.
            Indicate no message to be sent.
        } else {
            If the source has not registered then {
                Drop and ignore the message.
                Indicate no message to be sent.
            } else { /* source is registered */
                Add the server ATM addr to the server map for the group.
                Indicate the message to be sent on ServerControlVC.
                Send message as indicated.
                Make a copy of the message.
                Change the op code to MARS_JOIN.
                Indicate the message to be sent on ClusterControlVC.
                ar$flags.layer3grp = 0.
                ar$flags.copy = 1.
            } /* source is registered */
        }
    } /* not a register */
    If a message exists then {
        Send message as indicated.
    }
    Return.

- MARS_UNSERV

    If (ar$flags.register == 1) then { /* deregister */
        Remove the ATM addr of the MCS from all server maps.
        If a server map becomes null then delete it.
        Remove the endpoint as a leaf of ServerControlVC.
        Remove the endpoint from server list.
        Indicate the message to be sent on Private VC.
    } else { /* not a deregister */
        If the source is not a member of server list then {
            Drop and ignore the message.
            Indicate no message to be sent.
        } else { /* source is registered */
            Remove the ATM addr of the MCS from each server map indicated.
            If a server map is null then delete it.
            Indicate the message to be sent on ServerControlVC.
            Send message as indicated.
            Make a copy of the message.
            Change the op code to MARS_LEAVE.
            Indicate the message (copy) to be sent on ClusterControlVC.
            ar$flags.layer3grp = 0.
            ar$flags.copy = 1.
        } /* source is registered */
    } /* not a deregister */
    If a message exists then {
        Send message as indicated.
    }
    Return.

- MARS_NAK

    Build command.
    Return.

- MARS_GROUPLIST_REQUEST

    If (ar$pnum != 1) then Return.

    Call MARS_GROUPLIST_REPLY with the range and output VC.
    Return.

- MARS_GROUPLIST_REPLY

    Build command for the specified range.
    Indicate message to be sent on the specified VC.
    Send message as indicated.
    Return.

- MARS_REDIRECT_MAP

    Include the MARS's own address in the message.
    If there are backup MARSs then include their addresses.
    Indicate MARS_REDIRECT_MAP is to be sent on ClusterControlVC.
    Send message as indicated.
    Return.

3. Send Message Handler

    If the message is going out ClusterControlVC then {
        ar$msn = obtain a CSN.
    }
    If the message is going out ServerControlVC then {
        ar$msn = obtain an SSN.
    }
    Return.

4. Number Generator

4.1 Cluster Sequence Number

    Generate the next sequence number.
    Return.

4.2 Server Sequence Number

    Generate the next sequence number.
    Return.

4.3 CMI

    CMIs are allocated uniquely per registered cluster member
    within the context of a particular layer 3 protocol type.
    A single node may register multiple times if it supports
    multiple layer 3 protocols.
    The CMIs allocated for each such registration may or may
    not be the same.

    Generate a CMI for this protocol.
    Return.

5. Modified JOIN/LEAVE Processing

    This routine processes JOIN/LEAVE when a server map exists.

    Make a copy of the message.
    Change the type of the copy to MARS_SJOIN.
    If the message is a MARS_LEAVE then {
        Change the type of the copy to MARS_SLEAVE.
    }
    ar$flags.copy = 1 (copy).
    Indicate the message to be sent on ServerControlVC.
    Send message (copy) as indicated.
    ar$flags.punched = 0 in the original message.
    Indicate the message to be sent on Private VC.
    Send message (original) as indicated.
    Hole punch the group by excluding from the range
    those groups that are served by MCSs.
    Indicate the (original) message to be sent on ClusterControlVC.
    If (number of holes punched > 0) then { /* punched holes */
        In the original message do {
            ar$flags.punched = 1.
            old punched list <- new punched list.
        }
    } /* punched holes */
    ar$flags.copy = 1.
    Send message as indicated.
    Return.
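The hole-punching step above can be sketched as a simple range-splitting routine: given a block of group addresses, remove every group that is served by an MCS, leaving a list of sub-ranges (the punched list). In the Python sketch below, the function name `punch_holes` and the use of plain integers in place of IPv4 group addresses are illustrative assumptions, not part of the specification.

```python
def punch_holes(lo, hi, served):
    """Split the group block [lo, hi] into sub-ranges that exclude
    every address in `served` (groups covered by a server map, i.e.
    handled by an MCS). Returns the punched list of (min, max) pairs."""
    punched = []
    start = lo
    for g in sorted(a for a in served if lo <= a <= hi):
        if g > start:
            # Everything up to the served group is still host-handled.
            punched.append((start, g - 1))
        start = g + 1
    if start <= hi:
        punched.append((start, hi))
    return punched
```

If the result differs from the original single pair, at least one hole was punched, and the handler would set ar$flags.punched and carry the new punched list in the copy sent on ClusterControlVC.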