2 Internet-Draft Grenville Armitage 3 Bellcore 4 February 9th, 1996 6 Support for Multicast over UNI 3.0/3.1 based ATM Networks. 7 9 Status of this Memo 11 This document was submitted to the IETF IP over ATM WG. Publication 12 of this document does not imply acceptance by the IP over ATM WG of 13 any ideas expressed within. Comments should be submitted to the ip- 14 atm@matmos.hpl.hp.com mailing list. 16 Distribution of this memo is unlimited. 18 This memo is an internet draft.
Internet Drafts are working documents 19 of the Internet Engineering Task Force (IETF), its Areas, and its 20 Working Groups. Note that other groups may also distribute working 21 documents as Internet Drafts. 23 Internet Drafts are draft documents valid for a maximum of six 24 months. Internet Drafts may be updated, replaced, or obsoleted by 25 other documents at any time. It is not appropriate to use Internet 26 Drafts as reference material or to cite them other than as a "working 27 draft" or "work in progress". 29 Please check the lid-abstracts.txt listing contained in the 30 internet-drafts shadow directories on ds.internic.net (US East 31 Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or 32 munnari.oz.au (Pacific Rim) to learn the current status of any 33 Internet Draft. 35 Abstract 37 Mapping the connectionless IP multicast service over the connection 38 oriented ATM services provided by UNI 3.0/3.1 is a non-trivial task. 39 This memo describes a mechanism to support the multicast needs of 40 Layer 3 protocols in general, and describes its application to IP 41 multicasting in particular. 43 ATM based IP hosts and routers use a Multicast Address Resolution 44 Server (MARS) to support RFC 1112 style Level 2 IP multicast over the 45 ATM Forum's UNI 3.0/3.1 point to multipoint connection service. 46 Clusters of endpoints share a MARS and use it to track and 47 disseminate information identifying the nodes listed as receivers for 48 given multicast groups. This allows endpoints to establish and manage 49 point to multipoint VCs when transmitting to the group. 51 The MARS behaviour allows Layer 3 multicasting to be supported using 52 either meshes of VCs or ATM level multicast servers. This choice may 53 be made on a per-group basis, and is transparent to the endpoints. 55 Contents. 57 1. Introduction. 58 1.1 The Multicast Address Resolution Server (MARS). 59 1.2 The ATM level multicast Cluster. 60 1.3 Document overview. 61 1.4 Conventions. 62 2. The IP multicast service model. 63 3. UNI 3.0/3.1 support for intra-cluster multicasting. 64 3.1 VC meshes. 65 3.2 Multicast Servers. 66 3.3 Tradeoffs. 67 3.4 Interaction with local UNI 3.0/3.1 signalling entity. 68 4. Overview of the MARS. 69 4.1 Architecture. 70 4.2 Control message format. 71 4.3 Fixed header fields in MARS control messages. 72 4.3.1 Hardware type. 73 4.3.2 Protocol type. 74 4.3.3 Checksum. 75 4.3.4 Extensions Offset. 76 4.3.5 Operation code. 77 4.3.6 Reserved. 78 5. Endpoint (MARS client) interface behaviour. 79 5.1 Transmit side behaviour. 80 5.1.1 Retrieving Group Membership from the MARS. 81 5.1.2 MARS_REQUEST, MARS_MULTI, and MARS_NAK messages. 82 5.1.3 Establishing the outgoing multipoint VC. 83 5.1.4 Monitoring updates on ClusterControlVC. 84 5.1.4.1 Updating the active VCs. 85 5.1.4.2 Tracking the Cluster Sequence Number. 86 5.1.5 Revalidating a VC's leaf nodes. 87 5.1.5.1 When leaf node drops itself. 88 5.1.5.2 When a jump is detected in the CSN. 89 5.1.6 'Migrating' the outgoing multipoint VC. 90 5.2. Receive side behaviour. 91 5.2.1 Format of the MARS_JOIN and MARS_LEAVE Messages. 92 5.2.1.1 Important IPv4 default values. 93 5.2.2 Retransmission of MARS_JOIN and MARS_LEAVE messages. 94 5.2.3 Cluster member registration and deregistration. 95 5.3 Support for Layer 3 group management. 96 5.4 Support for redundant/backup MARS entities. 97 5.4.1 First response to MARS problems. 98 5.4.2 Connecting to a backup MARS. 99 5.4.3 Dynamic backup lists, and soft redirects. 100 5.5 Data path LLC/SNAP encapsulations. 
101 5.5.1 Type #1 encapsulation. 102 5.5.2 Type #2 encapsulation. 104 5.5.3 A Type #1 example. 105 6. The MARS in greater detail. 106 6.1 Basic interface to Cluster members. 107 6.1.1 Response to MARS_REQUEST. 108 6.1.2 Response to MARS_JOIN and MARS_LEAVE. 109 6.1.3 Generating MARS_REDIRECT_MAP. 110 6.1.4 Cluster Sequence Numbers. 111 6.2 MARS interface to Multicast Servers (MCSs). 112 6.2.1 MARS_REQUESTs for MCS supported groups. 113 6.2.2 MARS_MSERV and MARS_UNSERV messages. 114 6.2.3 Registering a Multicast Server (MCS). 115 6.2.4 Modified response to MARS_JOIN and MARS_LEAVE. 116 6.2.5 Sequence numbers for ServerControlVC traffic. 117 6.3 Why global sequence numbers? 118 6.4 Redundant/Backup MARS Architectures. 119 7. How an MCS utilises a MARS. 120 7.1 Association with a particular Layer 3 group. 121 7.2 Termination of incoming VCs. 122 7.3 Management of outgoing VC. 123 7.4 Use of a backup MARS. 124 8. Support for IP multicast routers. 125 8.1 Forwarding into a Cluster. 126 8.2 Joining in 'promiscuous' mode. 127 8.3 Forwarding across the cluster. 128 8.4 Joining in 'semi-promiscuous' mode. 129 8.5 An alternative to IGMP Queries. 130 8.6 CMIs across multiple interfaces. 131 9. Multiprotocol applications of the MARS and MARS clients. 132 10. Supplementary parameter processing. 133 10.1 Interpreting the mar$extoff field. 134 10.2 The format of TLVs. 135 10.3 Processing MARS messages with TLVs. 136 10.4 Initial set of TLV elements. 137 11. Key Decisions and open issues. 138 Acknowledgments 139 References 140 Appendix A. Hole punching algorithms. 141 Appendix B. Minimising the impact of IGMP in IPv4 environments. 142 Appendix C. Further comments on 'Clusters'. 143 Appendix D. TLV list parsing algorithm. 144 Appendix E. Summary of timer values. 145 Appendix F. Pseudo code for MARS operation. 147 1. Introduction. 149 Multicasting is the process whereby a source host or protocol entity 150 sends a packet to multiple destinations simultaneously using a 151 single, local 'transmit' operation. The more familiar cases of 152 Unicasting and Broadcasting may be considered to be special cases of 153 Multicasting (with the packet delivered to one destination, or 'all' 154 destinations, respectively). 156 Most network layer models, like the one described in RFC 1112 [1] for 157 IP multicasting, assume sources may send their packets to abstract 158 'multicast group addresses'. Link layer support for such an 159 abstraction is assumed to exist, and is provided by technologies such 160 as Ethernet. 162 ATM is being utilized as a new link layer technology to support a 163 variety of protocols, including IP. With RFC 1483 [2] the IETF 164 defined a multiprotocol mechanism for encapsulating and transmitting 165 packets using AAL5 over ATM Virtual Channels (VCs). However, the ATM 166 Forum's currently published signalling specifications (UNI 3.0 [8] 167 and UNI 3.1 [4]) do not provide the multicast address abstraction. 168 Unicast connections are supported by point to point, bidirectional 169 VCs. Multicasting is supported through point to multipoint 170 unidirectional VCs. The key limitation is that the sender must have 171 prior knowledge of each intended recipient, and explicitly establish 172 a VC with itself as the root node and the recipients as the leaf 173 nodes. 175 This document has two broad goals: 177 Define a group address registration and membership distribution 178 mechanism that allows UNI 3.0/3.1 based networks to support the 179 multicast service of protocols such as IP.
181 Define specific endpoint behaviours for managing point to 182 multipoint VCs to achieve multicasting of layer 3 packets. 184 As the IETF is currently in the forefront of using wide area 185 multicasting, this document's descriptions will often focus on the IP 186 service model of RFC 1112. A final chapter will note the 187 multiprotocol application of the architecture. 189 This document avoids discussion of one highly non-trivial aspect of 190 using ATM - the specification of QoS for VCs being established in 191 response to higher layer needs. Research in this area is still very 192 formative [7], and so it is assumed that future documents will 193 clarify the mapping of QoS requirements to VC establishment. The 194 default at this time is that VCs are established with a request for 195 Unspecified Bit Rate (UBR) service, as typified by the IETF's use of 196 VCs for unicast IP, described in RFC 1755 [6]. 198 1.1 The Multicast Address Resolution Server (MARS). 200 The Multicast Address Resolution Server (MARS) is an extended analog 201 of the ATM ARP Server introduced in RFC 1577 [3]. It acts as a 202 registry, associating layer 3 multicast group identifiers with the 203 ATM interfaces representing the group's members. MARS messages 204 support the distribution of multicast group membership information 205 between the MARS and endpoints (hosts or routers). Endpoint address 206 resolution entities query the MARS when a layer 3 address needs to be 207 resolved to the set of ATM endpoints making up the group at any one 208 time. Endpoints keep the MARS informed when they need to join or 209 leave particular layer 3 groups. To provide for asynchronous 210 notification of group membership changes the MARS manages a point to 211 multipoint VC out to all endpoints desiring multicast support. 213 Valid arguments can be made for two different approaches to ATM level 214 multicasting of layer 3 packets - through meshes of point to 215 multipoint VCs, or ATM level multicast servers (MCSs). The MARS 216 architecture allows either VC meshes or MCSs to be used on a per- 217 group basis. 219 1.2 The ATM level multicast Cluster. 221 Each MARS manages a 'cluster' of ATM-attached endpoints. A Cluster is 222 defined as 224 The set of ATM interfaces choosing to participate in direct ATM 225 connections to achieve multicasting of AAL_SDUs between 226 themselves. 228 In practice, a Cluster is the set of endpoints that choose to use the 229 same MARS to register their memberships with and receive their 230 updates from. 232 By implication of this definition, traffic between interfaces 233 belonging to different Clusters passes through an inter-cluster 234 device. (In the IP world an inter-cluster device would be an IP 235 multicast router with logical interfaces into each Cluster.) This 236 document explicitly avoids specifying the nature of inter-cluster 237 (layer 3) routing protocols. 239 The mapping of clusters to other constrained sets of endpoints (such 240 as unicast Logical IP Subnets) is left to each network administrator. 241 However, for the purposes of conformance with this document network 242 administrators MUST ensure that each Logical IP Subnet (LIS) is 243 served by a separate MARS, creating a one-to-one mapping between 244 cluster and unicast LIS. IP multicast routers then interconnect each 245 LIS as they do with conventional subnets.
(Relaxation of this 246 restriction MAY only occur after future research on the interaction 247 between existing layer 3 multicast routing protocols and unicast 248 subnet boundaries.) 250 The term 'Cluster Member' will be used in this document to refer to 251 an endpoint that is currently using a MARS for multicast support. 252 Thus the potential scope of a cluster may be the entire membership of a 253 LIS, while the actual scope of a cluster depends on which endpoints 254 are actually cluster members at any given time. 256 1.3 Document overview. 258 This document assumes an understanding of concepts explained in 259 greater detail in RFC 1112, RFC 1577, UNI 3.0/3.1, and RFC 1755 [6]. 261 Section 2 provides an overview of IP multicast and what RFC 1112 262 required from Ethernet. 264 Section 3 describes in more detail the multicast support services 265 offered by UNI 3.0/3.1, and outlines the differences between VC 266 meshes and multicast servers (MCSs) as mechanisms for distributing 267 packets to multiple destinations. 269 Section 4 provides an overview of the MARS and its relationship to 270 ATM endpoints. This section also discusses the encapsulation and 271 structure of MARS control messages. 273 Section 5 substantially defines the entire cluster member endpoint 274 behaviour, on both receive and transmit sides. This includes both 275 normal operation and error recovery. 277 Section 6 summarises the required behaviour of a MARS. 279 Section 7 looks at how a multicast server (MCS) interacts with a 280 MARS. 282 Section 8 discusses how IP multicast routers may make novel use of 283 promiscuous and semi-promiscuous group joins. Also discussed is a 284 mechanism designed to reduce the amount of IGMP traffic issued by 285 routers. 287 Section 9 discusses how this document applies in the more general 288 (non-IP) case. 290 Section 10 describes the optional supplementary parameter (TLV) processing, while Section 11 summarises the key proposals and identifies areas for 291 future research that are generated by this MARS architecture. 293 The appendices provide discussion on issues that arise out of the 294 implementation of this document. Appendix A discusses MARS and 295 endpoint algorithms for parsing MARS messages. Appendix B describes 296 the particular problems introduced by the current IGMP paradigms, and 297 possible interim work-arounds. Appendix C discusses the 'cluster' 298 concept in further detail, while Appendix D briefly outlines an 299 algorithm for parsing TLV lists. Appendix E summarises various timer 300 values used in this document, and Appendix F provides example 301 pseudo-code for a MARS entity. 303 1.4 Conventions. 305 In this document the following coding and packet representation rules 306 are used: 308 All multi-octet parameters are encoded in big-endian form (i.e. 309 the most significant octet comes first). 311 In all multi-bit parameters bit numbering begins at 0 for the 312 least significant bit when stored in memory (i.e. the n'th bit has a 313 weight of 2^n). 315 A bit that is 'set', 'on', or 'one' holds the value 1. 317 A bit that is 'reset', 'off', 'clear', or 'zero' holds the value 318 0. 320 2. Summary of the IP multicast service model. 322 Under IP version 4 (IPv4), addresses in the range between 224.0.0.0 323 and 239.255.255.255 (224.0.0.0/4) are termed 'Class D' or 'multicast 324 group' addresses. These abstractly represent all the IP hosts in the 325 Internet (or some constrained subset of the Internet) who have 326 decided to 'join' the specified group.
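As an aside, the Class D test implied by the 224.0.0.0/4 prefix above is easily expressed in code. The following is a minimal sketch (the helper name is illustrative and not taken from this specification):

    #include <stdint.h>

    /* Returns non-zero if 'addr' (an IPv4 address in host byte order)
     * falls in the Class D range 224.0.0.0 through 239.255.255.255,
     * i.e. 224.0.0.0/4.  Illustrative helper only. */
    static int is_ipv4_class_d(uint32_t addr)
    {
        return (addr & 0xF0000000UL) == 0xE0000000UL;
    }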
328 RFC1112 requires that a multicast-capable IP interface support 329 the transmission of IP packets to an IP multicast group address, 330 whether or not the node considers itself a 'member' of that group. 331 Consequently, group membership is effectively irrelevant to the 332 transmit side of the link layer interfaces. When Ethernet is used as 333 the link layer (the example used in RFC1112), no address resolution 334 is required to transmit packets. An algorithmic mapping from IP 335 multicast address to Ethernet multicast address is performed locally 336 before the packet is sent out the local interface in the same 'send 337 and forget' manner as a unicast IP packet. 339 Joining and Leaving an IP multicast group is more explicit on the 340 receive side - with the primitives JoinLocalGroup and LeaveLocalGroup 341 affecting what groups the local link layer interface should accept 342 packets from. When the IP layer wants to receive packets from a 343 group, it issues JoinLocalGroup. When it no longer wants to receive 344 packets, it issues LeaveLocalGroup. A key point to note is that 345 changing state is a local issue; it has no effect on other hosts 346 attached to the Ethernet. 348 IGMP is defined in RFC 1112 to support IP multicast routers attached 349 to a given subnet. Hosts issue IGMP Report messages when they perform 350 a JoinLocalGroup, or in response to an IP multicast router sending an 351 IGMP Query. By periodically transmitting queries, IP multicast routers 352 are able to identify what IP multicast groups have non-zero 353 membership on a given subnet. 355 A specific IP multicast address, 224.0.0.1, is allocated for the 356 transmission of IGMP Query messages. Host IP layers issue a 357 JoinLocalGroup for 224.0.0.1 when they intend to participate in IP 358 multicasting, and issue a LeaveLocalGroup for 224.0.0.1 when they've 359 ceased participating in IP multicasting. 361 Each host keeps a list of IP multicast groups it has been 362 JoinLocalGroup'd to. When a router issues an IGMP Query on 224.0.0.1 363 each host begins to send IGMP Reports for each group it is a member 364 of. IGMP Reports are sent to the group address, not 224.0.0.1, "so 365 that other members of the same group on the same network can overhear 366 the Report" and not bother sending one of their own. IP multicast 367 routers conclude that a group has no members on the subnet when IGMP 368 Queries no longer elicit associated replies. 370 3. UNI 3.0/3.1 support for intra-cluster multicasting. 372 For the purposes of the MARS protocol, both UNI 3.0 and UNI 3.1 373 provide equivalent support for multicasting. Differences between UNI 374 3.0 and UNI 3.1 in required signalling elements are covered in RFC 375 1755. 377 This document will describe its operation in terms of 'generic' 378 functions that should be available to clients of a UNI 3.0/3.1 379 signalling entity in a given ATM endpoint. The ATM model broadly 380 describes an 'AAL User' as any entity that establishes and manages 381 VCs and underlying AAL services to exchange data. An IP over ATM 382 interface is a form of 'AAL User' (although the default LLC/SNAP 383 encapsulation mode specified in RFC1755 really requires that an 'LLC 384 entity' is the AAL User, which in turn supports the IP/ATM 385 interface). 387 The most fundamental limitations of UNI 3.0/3.1's multicast support 388 are: 390 Only point to multipoint, unidirectional VCs may be established. 392 Only the root (source) node of a given VC may add or remove leaf 393 nodes.
395 Leaf nodes are identified by their unicast ATM addresses. UNI 396 3.0/3.1 defines two ATM address formats - native E.164 and NSAP 397 (although it must be stressed that the NSAP address is so called 398 because it uses the NSAP format - an ATM endpoint is NOT a Network 399 layer termination point). In UNI 3.0/3.1 an 'ATM Number' is the 400 primary identification of an ATM endpoint, and it may use either 401 format. Under some circumstances an ATM endpoint must be identified 402 by both a native E.164 address (identifying the attachment point of a 403 private network to a public network), and an NSAP address ('ATM 404 Subaddress') identifying the final endpoint within the private 405 network. For the rest of this document the term 'ATM address' will be used to mean 406 either a single 'ATM Number' or an 'ATM Number' combined with an 'ATM 407 Subaddress'. 409 3.1 VC meshes. 411 The most fundamental approach to intra-cluster multicasting is the 412 multicast VC mesh. Each source establishes its own independent point 413 to multipoint VC (a single multicast tree) to the set of leaf nodes 414 (destinations) that it has been told are members of the group it 415 wishes to send packets to. 417 Interfaces that are both senders to and group members (leaf nodes) of a 418 given group will originate one point to multipoint VC, and terminate 419 one VC for every other active sender to the group. This criss- 420 crossing of VCs across the ATM network gives rise to the name 'VC 421 mesh'. 423 3.2 Multicast Servers. 425 An alternative model has each source establish a VC to an 426 intermediate node - the multicast server (MCS). The multicast server 427 itself establishes and manages a point to multipoint VC out to the 428 actual desired destinations. 430 The MCS reassembles AAL_SDUs arriving on all the incoming VCs, and 431 then queues them for transmission on its single outgoing point to 432 multipoint VC. (Reassembly of incoming AAL_SDUs is required at the 433 multicast server as AAL5 does not support cell level multiplexing of 434 different AAL_SDUs on a single outgoing VC.) 435 The leaf nodes of the multicast server's point to multipoint VC must 436 be established prior to packet transmission, and the multicast server 437 requires an external mechanism to identify them. A side-effect of 438 this method is that ATM interfaces that are both sources and group 439 members will receive copies of their own packets back from the MCS. 440 (An alternative method is for the multicast server to explicitly 441 retransmit packets on individual VCs between itself and group 442 members. A benefit of this second approach is that the multicast 443 server can ensure that sources do not receive copies of their own 444 packets.) 446 The simplest MCS pays no attention to the contents of each AAL_SDU. 447 It is purely an AAL/ATM level device. More complex MCS architectures 448 (where a single endpoint serves multiple layer 3 groups) are 449 possible, but are beyond the scope of this document. More detailed 450 discussion is provided in section 7. 452 3.3 Tradeoffs. 454 Arguments over the relative merits of VC meshes and multicast servers 455 have raged for some time. Ultimately the choice depends on the 456 relative trade-offs a system administrator must make between 457 throughput, latency, congestion, and resource consumption. Even 458 criteria such as latency can mean different things to different 459 people - is it end to end packet time, or the time it takes for a 460 group to settle after a membership change?
The final choice depends 461 on the characteristics of the applications generating the multicast 462 traffic. 464 If we focus on the data path we might prefer the VC mesh because 465 it lacks the obvious single congestion point of an MCS. Throughput 466 is likely to be higher, and end to end latency lower, because the 467 mesh lacks the intermediate AAL_SDU reassembly that must occur in 468 MCSs. The underlying ATM signalling system also has greater 469 opportunity to ensure optimal branching points at ATM switches along 470 the multicast trees originating on each source. 472 However, resource consumption will be higher. Every group member's 473 ATM interface must terminate a VC per sender (consuming on-board 474 memory for state information, an instance of an AAL service, and 475 buffering in accordance with the vendor's particular architecture). In 476 contrast, with a multicast server only 2 VCs (one out, one in) 477 are required, independent of the number of senders. The allocation of 478 VC related resources is also lower within the ATM cloud when using a 479 multicast server. These points may be considered to have merit in 480 environments where VCs across the UNI or within the ATM cloud are 481 valuable (e.g. the ATM provider charges on a per VC basis), or AAL 482 contexts are limited in the ATM interfaces of endpoints. 484 If we focus on the signalling load then MCSs have the advantage when 485 faced with dynamic sets of receivers. Every time the membership of a 486 multicast group changes (a leaf node needs to be added or dropped), 487 only a single point to multipoint VC needs to be modified when using 488 an MCS. This generates a single signalling event across the MCS's 489 UNI. However, when membership change occurs in a VC mesh, signalling 490 events occur at the UNIs of every traffic source - the transient 491 signalling load scales with the number of sources. This has obvious 492 ramifications if you define latency as the time for a group's 493 connectivity to stabilise after change (especially as the number of 494 senders increases). 496 Finally, as noted above, MCSs introduce a 'reflected packet' problem, 497 which requires additional per-AAL_SDU information to be carried in 498 order for layer 3 sources to detect their own AAL_SDUs coming back. 500 The MARS architecture allows system administrators to utilize either 501 approach on a group by group basis. 503 3.4 Interaction with local UNI 3.0/3.1 signalling entity. 505 The following generic signalling functions are presumed to be 506 available to local AAL Users: 508 L_CALL_RQ - Establish a unicast VC to a specific endpoint. 509 L_MULTI_RQ - Establish multicast VC to a specific endpoint. 510 L_MULTI_ADD - Add new leaf node to previously established VC. 511 L_MULTI_DROP - Remove specific leaf node from established VC. 512 L_RELEASE - Release unicast VC, or all Leaves of a multicast VC. 514 The signalling exchanges and local information passed between AAL 515 User and UNI 3.0/3.1 signalling entity with these functions are 516 outside the scope of this document. 518 The following indications are assumed to be available to AAL Users, 519 generated by the local UNI 3.0/3.1 signalling entity: 521 L_ACK - Successful completion of a local request. 522 L_REMOTE_CALL - A new VC has been established to the AAL User. 523 ERR_L_RQFAILED - A remote ATM endpoint rejected an L_CALL_RQ, 524 L_MULTI_RQ, or L_MULTI_ADD. 525 ERR_L_DROP - A remote ATM endpoint dropped off an existing VC.
526 ERR_L_RELEASE - An existing VC was terminated. 528 The signalling exchanges and local information passed between AAL 529 User and UNI 3.0/3.1 signalling entity with these functions are 530 outside the scope of this document. 532 4. Overview of the MARS. 534 The MARS may reside within any ATM endpoint that is directly 535 addressable by the endpoints it is serving. Endpoints wishing to join 536 a multicast cluster must be configured with the ATM address of the 537 node on which the cluster's MARS resides. (Section 5.4 describes how 538 backup MARSs may be added to support the activities of a cluster. 539 References to 'the MARS' in following sections will be assumed to 540 mean the acting MARS for the cluster.) 542 4.1 Architecture. 544 Architecturally the MARS is an evolution of the RFC 1577 ARP Server. 545 Whilst the ARP Server keeps a table of {IP,ATM} address pairs for all 546 IP endpoints in an LIS, the MARS keeps extended tables of {layer 3 547 address, ATM.1, ATM.2, ..... ATM.n} mappings. It can either be 548 configured with certain mappings, or dynamically 'learn' mappings. 549 The format of the {layer 3 address} field is generally not 550 interpreted by the MARS. 552 A single ATM node may support multiple logical MARSs, each of which 553 support a separate cluster. The restriction is that each MARS has a 554 unique ATM address (e.g. a different SEL field in the NSAP address of 555 the node on which the multiple MARSs reside). By definition a single 556 instance of a MARS may not support more than one cluster. 558 The MARS distributes group membership update information to cluster 559 members over a point to multipoint VC known as the ClusterControlVC. 560 Additionally, when Multicast Servers (MCSs) are being used it also 561 establishes a separate point to multipoint VC out to registered MCSs, 562 known as the ServerControlVC. All cluster members are leaf nodes of 563 ClusterControlVC. All registered multicast servers are leaf nodes of 564 ServerControlVC (described further in section 6). 566 The MARS does NOT take part in the actual multicasting of layer 3 567 data packets. 569 4.2 Control message format. 571 By default all MARS control messages MUST be LLC/SNAP encapsulated 572 using the following codepoints: 574 [0xAA-AA-03][0x00-00-5E][0x00-03][MARS control message] 575 (LLC) (OUI) (PID) 577 (This is a PID from the IANA OUI.) 579 MARS control messages are made up of 4 major components: 581 [Fixed header][Mandatory fields][Addresses][Supplementary TLVs] 583 [Fixed header] contains fields indicating the operation being 584 performed and the layer 3 protocol being referred to (e.g IPv4, IPv6, 585 AppleTalk, etc). The fixed header also carries checksum information, 586 and hooks to allow this basic control message structure to be re-used 587 by other query/response protocols. 589 The [Mandatory fields] section carries fixed width parameters that 590 depend on the operation type indicated in [Fixed header]. 592 The following [Addresses] area carries variable length fields for 593 source and target addresses - both hardware (e.g. ATM) and layer 3 594 (e.g. IPv4). These provide the fundamental information that the 595 registrations, queries, and updates use and operate on. For the MARS 596 protocol fields in [Fixed header] indicate how to interpret the 597 contents of [Addresses]. 599 [Supplementary TLVs] represents an optional list of TLV (type, 600 length, value) encoded information elements that may be appended to 601 provide supplementary information. 
This feature is described in 602 further detail in section 10. 604 MARS messages contain variable length address fields. In all cases 605 null addresses SHALL be encoded as zero length, and have no space 606 allocated in the message. 608 (Unique LLC/SNAP encapsulation of MARS control messages means MARS 609 and ARP Server functionality may be implemented within a common 610 entity, and share a client-server VC, if the implementor so chooses. 611 Note that the LLC/SNAP codepoint for MARS is different to the 612 codepoint used for ATMARP.) 614 4.3 Fixed header fields in MARS control messages. 616 The [Fixed header] has the following format: 618 Data: 619 mar$afn 16 bits Address Family (0x000F). 620 mar$pro 56 bits Protocol Identification. 621 mar$hdrrsv 24 bits Reserved. Unused by MARS control protocol. 622 mar$chksum 16 bits Checksum across entire MARS message. 623 mar$extoff 16 bits Extensions Offset. 624 mar$op 16 bits Operation code. 625 mar$shtl 8 bits Type & length of source ATM number. (r) 626 mar$sstl 8 bits Type & length of source ATM subaddress. (q) 628 mar$shtl and mar$sstl provide information regarding the source's 629 hardware (ATM) address. In the MARS protocol these fields are always 630 present, as every MARS message carries a non-null source ATM address. 631 In all cases the source ATM address is the first variable length 632 field in the [Addresses] section. 634 The other fields in [Fixed header] are described in the following 635 subsections. 637 4.3.1 Hardware type. 639 mar$afn defines the type of link layer addresses being carried. The 640 value of 0x000F SHALL be used by MARS messages generated in 641 accordance with this document. The encoding of ATM addresses and 642 subaddresses when mar$afn = 0x000F is described in section 5.1.2. 643 Encodings when mar$afn != 0x000F are outside the scope of this 644 document. 646 4.3.2 Protocol type. 648 The mar$pro field is made up of two subfields: 650 mar$pro.type 16 bits Protocol type. 651 mar$pro.snap 40 bits Optional SNAP extension to protocol type. 653 The mar$pro.type field is a 16 bit unsigned integer representing the 654 following number space: 656 0x0000 to 0x00FF Protocols defined by the equivalent NLPIDs. 657 0x0100 to 0x03FF Reserved for future use by the IETF. 658 0x0400 to 0x04FF Allocated for use by the ATM Forum. 659 0x0500 to 0x05FF Experimental/Local use. 660 0x0600 to 0xFFFF Protocols defined by the equivalent Ethertypes. 662 (based on the observations that valid Ethertypes are never smaller 663 than 0x600, and NLPIDs never larger than 0xFF.) 665 The NLPID value of 0x80 is used to indicate a SNAP encoded extension 666 is being used to encode the protocol type. When mar$pro.type == 0x80 667 the SNAP extension is encoded in the mar$pro.snap field. This is 668 termed the 'long form' protocol ID. 670 If mar$pro.type != 0x80 then the mar$pro.snap field MUST be zero on 671 transmit and ignored on receive. The mar$pro.type field itself 672 identifies the protocol being referred to. This is termed the 'short 673 form' protocol ID. 675 In all cases, where a protocol has an assigned number in the 676 mar$pro.type space (excluding 0x80) the short form MUST be used when 677 transmitting MARS messages. Additionally, where a protocol has valid 678 short and long forms of identification, receivers MAY choose to 679 recognise the long form. 681 mar$pro.type values other than 0x80 MAY have 'long forms' defined in 682 future documents. 
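For illustration only, the [Fixed header] described above might be represented in C roughly as follows. This sketch is not part of the specification; since all multi-octet fields are big-endian on the wire (section 1.4), they are shown as raw octet arrays and would be marshalled explicitly by a real implementation:

    #include <stdint.h>

    /* Sketch of the 20 octet MARS [Fixed header] from section 4.3. */
    struct mars_fixed_hdr {
        uint8_t afn[2];      /* mar$afn      - Address Family (0x000F)            */
        uint8_t pro_type[2]; /* mar$pro.type - protocol type                      */
        uint8_t pro_snap[5]; /* mar$pro.snap - optional SNAP extension            */
        uint8_t hdrrsv[3];   /* mar$hdrrsv   - reserved                           */
        uint8_t chksum[2];   /* mar$chksum   - checksum across entire message     */
        uint8_t extoff[2];   /* mar$extoff   - extensions offset                  */
        uint8_t op_version;  /* mar$op.version - 0x00 for this protocol           */
        uint8_t op_type;     /* mar$op.type    - operation code                   */
        uint8_t shtl;        /* mar$shtl - type & length of source ATM number     */
        uint8_t sstl;        /* mar$sstl - type & length of source ATM subaddress */
    };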
684 For the remainder of this document references to mar$pro SHALL be 685 interpreted to mean mar$pro.type, or mar$pro.type in combination with 686 mar$pro.snap as appropriate. 688 The use of different protocol types is described further in section 689 9. 691 4.3.3 Checksum. 693 The mar$chksum field carries a standard IP checksum calculated across 694 the entire MARS control message (excluding the LLC/SNAP header). The 695 field is set to zero before performing the checksum calculation. 697 As the entire LLC/SNAP encapsulated MARS message is protected by the 698 32 bit CRC of the AAL5 transport, implementors MAY choose to ignore 699 the checksum facility. If no checksum is calculated these bits MUST 700 be reset before transmission. If no checksum is performed on 701 reception, this field MUST be ignored. If a receiver is capable of 702 validating a checksum it MUST only perform the validation when the 703 received mar$chksum field is non-zero. Messages arriving with 704 mar$chksum of 0 are always considered valid. 706 4.3.4 Extensions Offset. 708 The mar$extoff field identifies the existence and location of an 709 optional supplementary parameters list. Its use is described in 710 section 10. 712 4.3.5 Operation code. 714 The mar$op field is further subdivided into two 8 bit fields - 715 mar$op.version (leading octet) and mar$op.type (trailing octet). 716 Together they indicate the nature of the control message, and the 717 context within which its [Mandatory fields], [Addresses], and 718 [Supplementary TLVs] should be interpreted. 720 mar$op.version 721 0 MARS protocol defined in this document. 722 0x01 - 0xEF Reserved for future use by the IETF. 723 0xF0 - 0xFE Allocated for use by the ATM Forum. 725 0xFF Experimental/Local use. 727 mar$op.type 728 Value indicates operation being performed, within context of 729 the control protocol version indicated by mar$op.version. 731 For the rest of this document references to the mar$op value SHALL be 732 taken to mean mar$op.type, with mar$op.version = 0x00. The values 733 used in this document are summarised in section 11. 735 (Note this number space is independent of the ATMARP operation code 736 number space.) 738 4.3.6 Reserved. 740 mar$hdrrsv may be subdivided and assigned specific meanings for other 741 control protocols indicated by mar$op.version != 0. 743 5. Endpoint (MARS client) interface behaviour. 745 An endpoint is best thought of as a 'shim' or 'convergence' layer, 746 sitting between a layer 3 protocol's link layer interface and the 747 underlying UNI 3.0/3.1 service. An endpoint in this context can exist 748 in a host or a router - any entity that requires a generic 'layer 3 749 over ATM' interface to support layer 3 multicast. It is broken into 750 two key subsections - one for the transmit side, and one for the 751 receive side. 753 Multiple logical ATM interfaces may be supported by a single physical 754 ATM interface (for example, using different SEL values in the NSAP 755 formatted address assigned to the physical ATM interface). Therefore 756 implementors MUST allow for multiple independent 'layer 3 over ATM' 757 interfaces too, each with its own configured MARS (or table of MARSs, 758 as discussed in section 5.4), and ability to be attached to the same 759 or different clusters. 761 The initial signalling path between a MARS client (managing an 762 endpoint) and its associated MARS is a transient point to point, 763 bidirectional VC. 
This VC is established by the MARS client, and is 764 used to send queries to, and receive replies from, the MARS. It has 765 an associated idle timer, and is dismantled if not used for a 766 configurable period of time. The minimum suggested value for this 767 time is 1 minute, and the RECOMMENDED default is 20 minutes. (Where 768 the MARS and ARP Server are co-resident, this VC may be used for both 769 ATM ARP traffic and MARS control traffic.) 771 The remaining signalling path is ClusterControlVC, to which the MARS 772 client is added as a leaf node when it registers (described in 773 section 5.2.3). 775 The majority of this document covers the distribution of information 776 allowing endpoints to establish and manage outgoing point to 777 multipoint VCs - the forwarding paths for multicast traffic to 778 particular multicast groups. The actual format of the AAL_SDUs sent 779 on these VCs is almost completely outside the scope of this 780 specification. However, endpoints are not expected to know whether 781 their forwarding path leads directly to a multicast group's members 782 or to an MCS (described in section 3). This requires additional per- 783 packet encapsulation (described in section 5.5) to aid in the 784 detection of reflected AAL_SDUs. 786 5.1 Transmit side behaviour. 788 The following description will often be in terms of an IPv4/ATM 789 interface that is capable of transmitting packets to a Class D 790 address at any time, without prior warning. It should be trivial for 791 an implementor to generalise this behaviour to the requirements of 792 another layer 3 data protocol. 794 When a local Layer 3 entity passes down a packet for transmission, 795 the endpoint first ascertains whether an outbound path to the 796 destination multicast group already exists. If it does not, the MARS 797 is queried for a set of ATM endpoints that represent an appropriate 798 forwarding path. (The ATM endpoints may represent the actual group 799 members within the cluster, or a set of one or more MCSs. The 800 endpoint does not distinguish between the two cases. Section 6.2 801 describes the MARS behaviour that leads to MCSs being supplied as the 802 forwarding path for a multicast group.) 804 The query is executed by issuing a MARS_REQUEST. The reply from the 805 MARS may take one of two forms: 807 MARS_MULTI - Sequence of MARS_MULTI messages returning the set of 808 ATM endpoints that are to be leaf nodes of an 809 outgoing point to multipoint VC (the forwarding 810 path). 812 MARS_NAK - No mapping found, group is empty. 814 The formats of these messages are described in section 5.1.2. 816 Outgoing VCs are established with a request for Unspecified Bit Rate 817 (UBR) service, as typified by the IETF's use of VCs for unicast IP, 818 described in RFC 1755 [6]. Future documents may vary this approach 819 and allow the specification of different ATM traffic parameters from 820 locally configured information or parameters obtained through some 821 external means. 823 5.1.1 Retrieving Group Membership from the MARS. 825 If the MARS has no mapping for the desired Class D address a MARS_NAK 826 will be returned. In this case the IP packet MUST be discarded 827 silently. If a match is found in the MARS's tables it proceeds to 828 return addresses ATM.1 through ATM.n in a sequence of one or more 829 MARS_MULTIs. A simple mechanism is used to detect and recover from 830 loss of MARS_MULTI messages.
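The transmit side decision process described above can be summarised with a short sketch. All helper routines and types here (find_group_vc(), mars_request(), open_outgoing_vc(), and so on) are hypothetical and are not defined by this specification; error handling is elided:

    #include <stdint.h>

    /* Sketch of the section 5.1 transmit path for one outbound packet. */
    void send_to_group(uint32_t group, struct packet *pkt)
    {
        struct group_vc *vc = find_group_vc(group);   /* existing outbound path? */

        if (vc == NULL) {
            struct atm_set members;

            if (mars_request(group, &members) == MARS_NAK) {
                discard_packet(pkt);     /* no mapping: discard silently (5.1.1) */
                return;
            }
            /* L_MULTI_RQ to ATM.1, then L_MULTI_ADDs for ATM.2..ATM.n (5.1.3) */
            vc = open_outgoing_vc(group, &members);
        }
        transmit_on_vc(vc, pkt);         /* VC is held open for later packets   */
    }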
832 (If the client learns that there is no other group member in the 833 cluster - the MARS returns a MARS_NAK or returns a MARS_MULTI with 834 the client as the only member - it MUST delay sending out a new 835 MARS_REQUEST for that group for a period no less than 5 seconds and 836 no more than 10 seconds.) 838 Each MARS_MULTI carries a boolean field x, and a 15 bit integer field 839 y - expressed as MARS_MULTI(x,y). Field y acts as a sequence number, 840 starting at 1 and incrementing for each MARS_MULTI sent. Field x 841 acts as an 'end of reply' marker. When x == 1 the MARS response is 842 considered complete. 844 In addition, each MARS_MULTI may carry multiple ATM addresses from 845 the set {ATM.1, ATM.2, .... ATM.n}. A MARS MUST minimise the number 846 of MARS_MULTIs transmitted by placing as many group members' 847 addresses in a single MARS_MULTI as possible. The limit on the length 848 of an individual MARS_MULTI message MUST be the MTU of the underlying 849 VC. 851 For example, assume n ATM addresses must be returned, each MARS_MULTI 852 is limited to only p ATM addresses, and p << n. This would require a 853 sequence of k MARS_MULTI messages (where k = (n/p)+1, using integer 854 arithmetic), transmitted as follows: 856 MARS_MULTI(0,1) carries back {ATM.1 ... ATM.p} 857 MARS_MULTI(0,2) carries back {ATM.(p+1) ... ATM.(2p)} 858 [.......] 859 MARS_MULTI(1,k) carries back { ... ATM.n} 861 If k == 1 then only MARS_MULTI(1,1) is sent. 863 The typical failure mode will be losing one or more of MARS_MULTI(0,1) 864 through MARS_MULTI(0,k-1). This is detected when y jumps by more than 865 one between consecutive MARS_MULTIs. An alternative failure mode is 866 losing MARS_MULTI(1,k). A timer MUST be implemented to flag the 867 failure of the last MARS_MULTI to arrive. A default value of 10 868 seconds is RECOMMENDED. 870 If a 'sequence jump' is detected, the host MUST wait for the 871 MARS_MULTI(1,k), discard all results, and repeat the MARS_REQUEST. 873 If a timeout occurs, the host MUST discard all results, and repeat 874 the MARS_REQUEST. 876 A final failure mode involves the MARS Sequence Number (described in 877 section 5.1.4.2 and carried in each part of a multi-part MARS_MULTI). 878 If its value changes during the reception of a multi-part MARS_MULTI 879 the host MUST wait for the MARS_MULTI(1,k), discard all results, and 880 repeat the MARS_REQUEST. 882 (Corruption of cell contents will lead to loss of a MARS_MULTI 883 through AAL5 CPCS_PDU reassembly failure, which will be detected 884 through the mechanisms described above.) 886 If the MARS is managing a cluster of endpoints spread across 887 different but directly accessible ATM networks it will not be able to 888 return all the group members in a single MARS_MULTI. The MARS_MULTI 889 message format allows for either E.164, ISO NSAP, or (E.164 + NSAP) 890 to be returned as ATM addresses. However, each MARS_MULTI message may 891 only return ATM addresses of the same type and length. The returned 892 addresses MUST be grouped according to type (E.164, ISO NSAP, or 893 both) and returned in a sequence of separate MARS_MULTI parts. 895 5.1.2 MARS_REQUEST, MARS_MULTI, and MARS_NAK messages. 897 MARS_REQUEST is shown below. It is indicated by an 'operation type 898 value' (mar$op) of 1. 900 The multicast address being resolved is placed into the target 901 protocol address field (mar$tpa), and the target hardware address is 902 set to null (mar$thtl and mar$tstl both zero).
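Before moving on to the message layouts, the multi-part collection rules of section 5.1.1 can be pulled together in a short sketch. Names are illustrative only, the caller is assumed to initialise expected_y to 1, and the 10 second timer guarding the final part is assumed to be handled elsewhere:

    #include <stdbool.h>
    #include <stdint.h>

    /* State held while collecting one (possibly multi-part) MARS_MULTI reply. */
    struct multi_reply {
        uint16_t expected_y;   /* next expected sequence number (starts at 1) */
        uint32_t msn;          /* mar$msn seen in the first part              */
        bool     failed;       /* reply must be discarded and re-requested    */
    };

    /* Handle one MARS_MULTI part carrying flag x, sequence number y and
     * mar$msn.  Returns true when x == 1 (the final part); if 'failed' was
     * set, the caller discards all results and repeats the MARS_REQUEST. */
    bool mars_multi_part(struct multi_reply *r, bool x, uint16_t y, uint32_t msn)
    {
        if (y != r->expected_y)
            r->failed = true;              /* sequence jump: a part was lost */
        if (r->expected_y > 1 && msn != r->msn)
            r->failed = true;              /* MARS Sequence Number changed   */

        r->msn = msn;
        r->expected_y = y + 1;

        /* ...record the ATM addresses carried in this part here... */

        return x;                          /* x == 1 marks the end of reply  */
    }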
904 In IPv4 environments the protocol type (mar$pro) is 0x800 and the 905 target protocol address length (mar$tpln) MUST be set to 4. The 906 source fields MUST contain the ATM number and subaddress of the 907 client issuing the MARS_REQUEST (the subaddress MAY be null). 909 Data: 910 mar$afn 16 bits Address Family (0x000F). 911 mar$pro 56 bits Protocol Identification. 912 mar$hdrrsv 24 bits Reserved. Unused by MARS control protocol. 913 mar$chksum 16 bits Checksum across entire MARS message. 914 mar$extoff 16 bits Extensions Offset. 915 mar$op 16 bits Operation code (MARS_REQUEST = 1) 916 mar$shtl 8 bits Type & length of source ATM number. (r) 917 mar$sstl 8 bits Type & length of source ATM subaddress. (q) 918 mar$spln 8 bits Length of source protocol address (s) 919 mar$thtl 8 bits Type & length of target ATM number (x) 920 mar$tstl 8 bits Type & length of target ATM subaddress (y) 921 mar$tpln 8 bits Length of target group address (z) 922 mar$pad 64 bits Padding (aligns mar$sha with MARS_MULTI). 923 mar$sha roctets source ATM number 924 mar$ssa qoctets source ATM subaddress 925 mar$spa soctets source protocol address 926 mar$tpa zoctets target multicast group address 927 mar$tha xoctets target ATM number 928 mar$tsa yoctets target ATM subaddress 930 Following the RFC1577 approach, the mar$shtl, mar$sstl, mar$thtl and 931 mar$tstl fields are coded as follows: 933 7 6 5 4 3 2 1 0 934 +-+-+-+-+-+-+-+-+ 935 |0|x| length | 936 +-+-+-+-+-+-+-+-+ 938 The most significant bit is reserved and MUST be set to zero. The 939 second most significant bit (x) is a flag indicating whether the ATM 940 address being referred to is in: 942 - ATM Forum NSAPA format (x = 0). 943 - Native E.164 format (x = 1). 945 The bottom 6 bits form an unsigned integer value indicating the length 946 of the associated ATM address in octets. If this value is zero the 947 flag x is ignored. 949 The mar$spln and mar$tpln fields are unsigned 8 bit integers, giving 950 the length in octets of the source and target protocol address fields 951 respectively. 953 MARS packets use true variable length fields. A null (non-existent) 954 address MUST be coded as zero length, and no space allocated for it 955 in the message body. 957 MARS_NAK is the MARS_REQUEST returned with operation type value of 6. 958 All other fields are left unchanged from the MARS_REQUEST (e.g. do 959 not transpose the source and target information. In all cases MARS 960 clients use the source address fields to identify their own messages 961 coming back). 963 The MARS_MULTI message is identified by an mar$op value of 2. The 964 message format is: 966 Data: 967 mar$afn 16 bits Address Family (0x000F). 968 mar$pro 56 bits Protocol Identification. 969 mar$hdrrsv 24 bits Reserved. Unused by MARS control protocol. 970 mar$chksum 16 bits Checksum across entire MARS message. 971 mar$extoff 16 bits Extensions Offset. 972 mar$op 16 bits Operation code (MARS_MULTI = 2). 973 mar$shtl 8 bits Type & length of source ATM number. (r) 974 mar$sstl 8 bits Type & length of source ATM subaddress. (q) 975 mar$spln 8 bits Length of source protocol address (s) 976 mar$thtl 8 bits Type & length of target ATM number (x) 977 mar$tstl 8 bits Type & length of target ATM subaddress (y) 978 mar$tpln 8 bits Length of target group address (z) 979 mar$tnum 16 bits Number of target ATM addresses returned (N) 980 mar$seqxy 16 bits Boolean flag x and sequence number y. 981 mar$msn 32 bits MARS Sequence Number.
982 mar$sha roctets source ATM number 983 mar$ssa qoctets source ATM subaddress 984 mar$spa soctets source protocol address 985 mar$tpa zoctets target multicast group address 986 mar$tha.1 xoctets target ATM number 1 987 mar$tsa.1 yoctets target ATM subaddress 1 988 mar$tha.2 xoctets target ATM number 2 989 mar$tsa.2 yoctets target ATM subaddress 2 990 [.......] 991 mar$tha.N xoctets target ATM number N 992 mar$tsa.N yoctets target ATM subaddress N 994 The source protocol and ATM address fields are copied directly from 995 the MARS_REQUEST that this MARS_MULTI is in response to (not the MARS 996 itself). 998 mar$seqxy is coded with flag x in the leading bit, and sequence 999 number y coded as an unsigned integer in the remaining 15 bits. 1001 | 1st octet | 2nd octet | 1002 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 1003 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1004 |x| y | 1005 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1007 mar$tnum is an unsigned integer indicating how many pairs of 1008 {mar$tha,mar$tsa} (i.e. how many group members' ATM addresses) are 1009 present in the message. mar$msn is an unsigned 32 bit number filled 1010 in by the MARS before transmitting each MARS_MULTI. Its use is 1011 described further in section 5.1.4. 1013 As an example, assume we have a multicast cluster using 4 byte 1014 protocol addresses, 20 byte ATM numbers, and 0 byte ATM subaddresses. 1015 For n group members in a single MARS_MULTI we require a (48 + 20n) 1016 byte message. If we assume the default MTU of 9180 bytes, we can 1017 return a maximum of 456 group members' addresses in a single 1018 MARS_MULTI. 1020 5.1.3 Establishing the outgoing multipoint VC. 1022 Following the completion of the MARS_MULTI reply the endpoint may 1023 establish a new point to multipoint VC, or reuse an existing one. 1025 If establishing a new VC, an L_MULTI_RQ is issued for ATM.1, followed 1026 by an L_MULTI_ADD for every member of the set {ATM.2, ....ATM.n} 1027 (assuming the set is non-null). The packet is then transmitted over 1028 the newly created VC just as it would be for a unicast VC. 1030 After transmitting the packet, the local interface holds the VC open 1031 and marks it as the active path out of the host for any subsequent IP 1032 packets being sent to that Class D address. 1034 When establishing a new multicast VC it is possible that one or more 1035 L_MULTI_RQ or L_MULTI_ADD may fail. The UNI 3.0/3.1 failure cause 1036 must be returned in the ERR_L_RQFAILED signal from the local 1037 signalling entity to the AAL User. If the failure cause is not 49 1038 (Quality of Service unavailable), 51 (user cell rate not available - 1039 UNI 3.0), 37 (user cell rate not available - UNI 3.1), or 41 1040 (Temporary failure), the endpoint's ATM address is dropped from the 1041 set {ATM.1, ATM.2, ..., ATM.n} returned by the MARS. Otherwise, the 1042 L_MULTI_RQ or L_MULTI_ADD should be reissued after a random delay of 1043 5 to 10 seconds. If the request fails again, another request should 1044 be issued after twice the previous delay has elapsed. This process 1045 should be continued until the call succeeds or the multipoint VC is 1046 released. 1048 If the initial L_MULTI_RQ fails for ATM.1, and n is greater than 1 1049 (i.e. the returned set of ATM addresses contains 2 or more addresses) 1050 a new L_MULTI_RQ should be immediately issued for the next ATM 1051 address in the set. This procedure is repeated until an L_MULTI_RQ 1052 succeeds, as no L_MULTI_ADDs may be issued until an initial outgoing 1053 VC is established.
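The retry timing described above (an initial random delay of 5 to 10 seconds, doubling after each further failure) might be captured as follows. This is a sketch only; the choice of random number source is left to the implementor:

    #include <stdlib.h>

    /* Returns the delay, in seconds, to wait before re-issuing a failed
     * L_MULTI_RQ or L_MULTI_ADD (section 5.1.3).  Pass 0 for the first
     * retry; thereafter pass the previous delay so it is doubled. */
    static unsigned int next_retry_delay(unsigned int prev_delay)
    {
        if (prev_delay == 0)
            return 5 + (unsigned int)(rand() % 6);   /* random 5..10 seconds */
        return prev_delay * 2;
    }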
1055 Each ATM address for which an L_MULTI_RQ failed with cause 49, 51, 1056 37, or 41 MUST be tagged rather than deleted. An L_MULTI_ADD is 1057 issued for these tagged addresses using the random delay procedure 1058 outlined above. 1060 The VC MAY be considered 'up' before failed L_MULTI_ADDs have been 1061 successfully re-issued. An endpoint MAY implement a concurrent 1062 mechanism that allows data to start flowing out the new VC even while 1063 failed L_MULTI_ADDs are being re-tried. (The alternative of waiting 1064 for each leaf node to accept the connection could lead to significant 1065 delays in transmitting the first packet.) 1067 Each VC MUST have a configurable inactivity timer associated with it. 1068 If the timer expires, an L_RELEASE is issued for that VC, and the 1069 Class D address is no longer considered to have an active path out of 1070 the local host. The timer SHOULD be no less than 1 minute, and a 1071 default of 20 minutes is RECOMMENDED. Choice of specific timer 1072 periods is beyond the scope of this document. 1074 VC consumption may also be reduced by endpoints noting when a new 1075 group's set of {ATM.1, ....ATM.n} matches that of a pre-existing VC 1076 out to another group. With careful local management, and assuming the 1077 QoS of the existing VC is sufficient for both groups, a new pt to mpt 1078 VC may not be necessary. Under certain circumstances endpoints may 1079 decide that it is sufficient to re-use an existing VC whose set of 1080 leaf nodes is a superset of the new group's membership (in which case 1081 some endpoints will receive multicast traffic for a layer 3 group 1082 they haven't joined, and must filter them above the ATM interface). 1083 Algorithms for performing this type of optimization are not discussed 1084 here, and are not required for conformance with this document. 1086 5.1.4 Tracking subsequent group updates. 1088 Once a new VC has been established, the transmit side of the cluster 1089 member's interface needs to monitor subsequent group changes - adding 1090 or dropping leaf nodes as appropriate. This is achieved by watching 1091 for MARS_JOIN and MARS_LEAVE messages from the MARS itself. These 1092 messages are described in detail in section 5.2 - at this point it is 1093 sufficient to note that they carry: 1095 - The ATM address of a node joining or leaving a group. 1096 - The layer 3 address of the group(s) being joined or left. 1097 - A Cluster Sequence Number (CSN) from the MARS. 1099 MARS_JOIN and MARS_LEAVE messages arrive at each cluster member 1100 across ClusterControlVC. MARS_JOIN or MARS_LEAVE messages that simply 1101 confirm information already held by the cluster member are used to 1102 track the Cluster Sequence Number, but are otherwise ignored. 1104 5.1.4.1 Updating the active VCs. 1106 If a MARS_JOIN is seen that refers to (or encompasses) a group for 1107 which the transmit side already has a VC open, the new member's ATM 1108 address is extracted and an L_MULTI_ADD issued locally. This ensures 1109 that endpoints already sending to a given group will immediately add 1110 the new member to their list of recipients. 1112 If a MARS_LEAVE is seen that refers to (or encompasses) a group for 1113 which the transmit side already has a VC open, the old member's ATM 1114 address is extracted and an L_MULTI_DROP issued locally. This ensures 1115 that endpoints already sending to a given group will immediately drop 1116 the old member from their list of recipients. 
When the last leaf of a 1117 VC is dropped, the VC is closed completely and the affected group no 1118 longer has a path out of the local endpoint (the next outbound packet 1119 to that group's address will trigger the creation of a new VC, as 1120 described in sections 5.1.1 to 5.1.3). 1122 The transmit side of the interface MUST NOT shut down an active VC to 1123 a group for which the receive side has just executed a 1124 LeaveLocalGroup. (This behaviour is consistent with the model of 1125 hosts transmitting to groups regardless of their own membership 1126 status.) 1128 If a MARS_JOIN or MARS_LEAVE arrives with mar$pnum == 0 it carries no 1129 pairs, and is only used for tracking the CSN. 1131 5.1.4.2 Tracking the Cluster Sequence Number. 1133 It is important that endpoints do not miss group membership updates 1134 issued by the MARS over ClusterControlVC. However, this will happen 1135 from time to time. The Cluster Sequence Number is carried as an 1136 unsigned 32 bit value in the mar$msn field of many MARS messages 1137 (except for MARS_REQUEST and MARS_NAK). It increments once for every 1138 transmission the MARS makes on ClusterControlVC, regardless of 1139 whether the transmission represents a change in the MARS database or 1140 not. By tracking this counter, cluster members can determine whether 1141 they have missed a previous message on ClusterControlVC, and possibly 1142 a membership change. This is then used to trigger revalidation 1143 (described in section 5.1.5). 1145 The current CSN is copied into the mar$msn field of MARS messages 1146 being sent to cluster members, whether out ClusterControlVC or on an 1147 point to point VC. 1149 Calculations on the sequence numbers MUST be performed as unsigned 32 1150 bit arithmetic. 1152 Every cluster member keeps its own 32 bit Host Sequence Number (HSN) 1153 to track the MARS's sequence number. Whenever a message is received 1154 that carries an mar$msn field the following processing is performed: 1156 Seq.diff = mar$msn - HSN 1157 mar$msn -> HSN 1158 {...process MARS message as appropriate...} 1160 if ((Seq.diff != 1) && (Seq.diff != 0)) 1161 then {...revalidate group membership information...} 1163 The basic result is that the cluster member attempts to keep locked 1164 in step with membership changes noted by the MARS. If it ever detects 1165 that a membership change occurred (in any group) without it noticing, 1166 it re-validates the membership of all groups it currently has 1167 multicast VCs open to. 1169 The mar$msn value in an individual MARS_MULTI is not used to update 1170 the HSN until all parts of the MARS_MULTI (if more than 1) have 1171 arrived. (If the mar$msn changes the MARS_MULTI is discarded, as 1172 described in section 5.1.1.) 1174 The MARS is free to choose an initial value of CSN. When a new 1175 cluster member starts up it should initialise HSN to zero. When the 1176 cluster member sends the MARS_JOIN to register (described later), the 1177 HSN will be correctly updated to the current CSN value when the 1178 endpoint receives the copy of its MARS_JOIN back from the MARS. 1180 5.1.5 Revalidating a VC's leaf nodes. 1182 Certain events may inform a cluster member that it has incorrect 1183 information about the sets of leaf nodes it should be sending to. If 1184 an error occurs on a VC associated with a particular group, the 1185 cluster member initiates revalidation procedures for that specific 1186 group. 
If a jump is detected in the Cluster Sequence Number, this 1187 initiates revalidation of all groups to which the cluster member 1188 currently has open point to multipoint VCs. 1190 Each open and active multipoint VC has a flag associated with it 1191 called 'VC_revalidate'. This flag is checked every time a packet is 1192 queued for transmission on that VC. If the flag is false, the packet 1193 is transmitted and no further action is required. 1195 However, if the VC_revalidate flag is true then the packet is 1196 transmitted and a new sequence of events is started locally. 1198 Revalidation begins with re-issuing a MARS_REQUEST for the group 1199 being revalidated. The returned set of members {NewATM.1, NewATM.2, 1200 .... NewATM.n} is compared with the set already held locally. 1201 L_MULTI_DROPs are issued on the group's VC for each node that appears 1202 in the original set of members but not in the revalidated set of 1203 members. L_MULTI_ADDs are issued on the group's VC for each node that 1204 appears in the revalidated set of members but not in the original set 1205 of members. The VC_revalidate flag is reset when revalidation 1206 concludes for the given group. Implementation specific mechanisms 1207 will be needed to flag the 'revalidation in progress' state. 1209 The key difference between constructing a VC (section 5.1.3) and 1210 revalidating a VC is that packet transmission continues on the open 1211 VC while it is being revalidated. This minimises the disruption to 1212 existing traffic. 1214 The algorithm for initiating revalidation is: 1216 - When a packet arrives for transmission on a given group, 1217 the group's membership is revalidated if VC_revalidate == TRUE. 1218 Revalidation resets VC_revalidate. 1219 - When an event occurs that demands revalidation, every 1220 group has its VC_revalidate flag set TRUE at a random time 1221 between 1 and 10 seconds. 1223 Benefit: Revalidation of active groups occurs quickly, and 1224 essentially idle groups are revalidated as needed. Randomly 1225 distributed setting of the VC_revalidate flag improves the chances of 1226 staggered revalidation requests from senders when a sequence number 1227 jump is detected. 1229 5.1.5.1 When a leaf node drops itself. 1231 During the life of a multipoint VC an ERR_L_DROP may be received 1232 indicating that a leaf node has terminated its participation at the 1233 ATM level. The ATM endpoint associated with the ERR_L_DROP MUST be 1234 removed from the locally held set {ATM.1, ATM.2, .... ATM.n} 1235 associated with the VC. 1237 After a random period of time between 1 and 10 seconds the 1238 VC_revalidate flag associated with that VC MUST be set true. 1240 If an ERR_L_RELEASE is received then the entire set {ATM.1, ATM.2, 1241 .... ATM.n} is cleared and the VC is considered to be completely shut 1242 down. Further packet transmission to the group served by this VC will 1243 result in a new VC being established as described in section 5.1.3. 1245 5.1.5.2 When a jump is detected in the CSN. 1247 Section 5.1.4.2 describes how a CSN jump is detected. If a CSN jump 1248 is detected upon receipt of a MARS_JOIN or a MARS_LEAVE then every 1249 outgoing multicast VC MUST have its VC_revalidate flag set true at 1250 some random interval between 1 and 10 seconds from when the CSN jump 1251 was detected. 1253 The only exception to this rule is if a sequence number jump is 1254 detected during the establishment of a new group's VC (i.e.
a 1255 MARS_MULTI reply was correctly received, but its mar$msn indicated 1256 that some previous MARS traffic had been missed on ClusterControlVC). 1257 In this case every open VC, EXCEPT the one just established, MUST 1258 have its VC_revalidate flag set true at some random interval between 1259 1 and 10 seconds from when the CSN jump was detected. (The VC being 1260 established at the time is considered already validated.) 1262 5.1.6 'Migrating' the outgoing multipoint VC 1264 In addition to the group tracking described in section 5.1.4, the 1265 transmit side of a cluster member must respond to 'migration' 1266 requests by the MARS. This is triggered by the reception of a 1267 MARS_MIGRATE message from ClusterControlVC. The MARS_MIGRATE message 1268 is shown below, with an mar$op code of 13. 1270 Data: 1271 mar$afn 16 bits Address Family (0x000F). 1272 mar$pro 56 bits Protocol Identification. 1273 mar$hdrrsv 24 bits Reserved. Unused by MARS control protocol. 1274 mar$chksum 16 bits Checksum across entire MARS message. 1275 mar$extoff 16 bits Extensions Offset. 1276 mar$op 16 bits Operation code (MARS_MIGRATE = 13). 1277 mar$shtl 8 bits Type & length of source ATM number. (r) 1278 mar$sstl 8 bits Type & length of source ATM subaddress. (q) 1279 mar$spln 8 bits Length of source protocol address (s) 1280 mar$thtl 8 bits Type & length of target ATM number (x) 1281 mar$tstl 8 bits Type & length of target ATM subaddress (y) 1282 mar$tpln 8 bits Length of target group address (z) 1283 mar$tnum 16 bits Number of target ATM addresses returned (N) 1284 mar$resv 16 bits Reserved. 1285 mar$msn 32 bits MARS Sequence Number. 1286 mar$sha qoctets source ATM number 1287 mar$ssa roctets source ATM subaddress 1288 mar$spa soctets source protocol address 1289 mar$tpa zoctets target multicast group address 1290 mar$tha.1 xoctets target ATM number 1 1291 mar$tsa.1 yoctets target ATM subaddress 1 1292 mar$tha.2 xoctets target ATM number 2 1293 mar$tsa.2 yoctets target ATM subaddress 2 1294 [.......] 1295 mar$tha.N xoctets target ATM number N 1296 mar$tsa.N yoctets target ATM subaddress N 1298 A migration is requested when the MARS determines that it no longer 1299 wants cluster members forwarding their packets directly to the ATM 1300 addresses it had previously specified (through MARS_REQUESTs or 1301 MARS_JOINs). When a MARS_MIGRATE is received each cluster member MUST 1302 perform the following steps: 1304 Close down any existing outgoing VC associated with the group 1305 carried in the mar$tpa field (L_RELEASE), or dissociate the group 1306 from any outgoing VC it may have been sharing (as described in 1307 section 5.1.3). 1309 Establish a new outgoing VC for the specified group, using the 1310 algorithm described in section 5.1.3 and taking the set of ATM 1311 addresses supplied in the MARS_MIGRATE as the group's new set of 1312 members {ATM.1, .... ATM.n}. 1314 The MARS_MIGRATE carries the new set of members {ATM.1, .... ATM.n} 1315 in a single message, in similar manner to a single part MARS_MULTI. 1316 As with other messages from the MARS, the Cluster Sequence Number 1317 carried in mar$msn is checked as described in section 5.1.4.2. 1319 5.2. Receive side behaviour. 1321 A cluster member is a 'group member' (in the sense that it receives 1322 packets directed at a given multicast group) when its ATM address 1323 appears in the MARS's table entry for the group's multicast address. 
1324 A key function within each cluster is the distribution of group 1325 membership information from the MARS to cluster members. 1327 An endpoint may wish to 'join a group' in response to a local, higher 1328 level request for membership of a group, or because the endpoint 1329 supports a layer 3 multicast forwarding engine that requires the 1330 ability to 'see' intra-cluster traffic in order to forward it. 1332 Two messages support these requirements - MARS_JOIN and MARS_LEAVE. 1333 These are sent to the MARS by endpoints when the local layer 3/ATM 1334 interface is requested to join or leave a multicast group. The MARS 1335 propagates these messages back out over ClusterControlVC, to ensure 1336 the knowledge of the group's membership change is distributed in a 1337 timely fashion to other cluster members. 1339 Certain models of layer 3 endpoints (e.g. IP multicast routers) 1340 expect to be able to receive packet traffic 'promiscuously' across 1341 all groups. This functionality may be emulated by allowing routers 1342 to request that the MARS returns them as 'wild card' members of all 1343 Class D addresses. However, a problem inherent in the current ATM 1344 model is that a completely promiscuous router may exhaust the local 1345 reassembly resources in its ATM interface. MARS_JOIN supports a 1346 generalisation to the notion of 'wild card' entries, enabling routers 1347 to limit themselves to 'blocks' of the Class D address space. Use of 1348 this facility is described in greater detail in Section 8. 1350 A block can be as small as 1 (a single group) or as large as the 1351 entire multicast address space (e.g. default IPv4 'promiscuous' 1352 behaviour). A block is defined as all addresses between, and 1353 inclusive of, a <min,max> address pair. A MARS_JOIN or MARS_LEAVE may 1354 carry multiple <min,max> pairs. 1356 Cluster members MUST provide ONLY a single <min,max> pair in each 1357 JOIN/LEAVE message they issue. However, they MUST be able to process 1358 multiple <min,max> pairs in JOIN/LEAVE messages when performing VC 1359 management as described in section 5.1.4 (the interpretation being 1360 that the join/leave operation applies to all addresses in the range from 1361 <min> to <max> inclusive, for every <min,max> pair). 1363 In RFC1112 environments a MARS_JOIN for a single group is triggered 1364 by a JoinLocalGroup signal from the IP layer. A MARS_LEAVE for a 1365 single group is triggered by a LeaveLocalGroup signal from the IP 1366 layer. 1368 Cluster members with special requirements (e.g. multicast routers) 1369 may issue MARS_JOINs and MARS_LEAVEs specifying a single block of 2 1370 or more multicast group addresses. However, a cluster member SHALL 1371 NOT issue such a multi-group block join for an address range fully or 1372 partially overlapped by multi-group block join(s) that the cluster 1373 member has previously issued and not yet retracted. A cluster member 1374 MAY issue combinations of single group MARS_JOINs that overlap with a 1375 multi-group block MARS_JOIN. 1377 An endpoint MUST register with a MARS in order to become a member of 1378 a cluster and be added as a leaf to ClusterControlVC. Registration 1379 is covered in section 5.2.3. 1381 Finally, the endpoint MUST be capable of terminating unidirectional 1382 VCs (i.e. act as a leaf node of a UNI 3.0/3.1 point to multipoint VC, 1383 with zero bandwidth assigned on the return path). RFC 1755 describes 1384 the signalling information required to terminate VCs carrying 1385 LLC/SNAP encapsulated traffic (discussed further in section 5.5).
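(When performing the VC management described in section 5.1.4, a transmit side must decide whether a MARS_JOIN or MARS_LEAVE seen on ClusterControlVC covers a group for which it currently has a VC open. The following C fragment is a minimal, purely illustrative sketch of that test, assuming IPv4 group addresses handled as 32 bit unsigned values as described in section 5.2.1; the structure and function names are not part of this specification.)

     #include <stdint.h>

     /* One <min,max> block carried by a MARS_JOIN/MARS_LEAVE, with the
      * IPv4 group addresses held as 32 bit unsigned values.           */
     struct addr_block {
         uint32_t min;
         uint32_t max;
     };

     /* Returns 1 if 'group' lies inside any of the n blocks carried by
      * the message, 0 otherwise.  Blocks are inclusive at both ends.  */
     int group_in_blocks(uint32_t group, const struct addr_block *blk, int n)
     {
         int i;
         for (i = 0; i < n; i++)
             if (group >= blk[i].min && group <= blk[i].max)
                 return 1;
         return 0;
     }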
1387 5.2.1 Format of the MARS_JOIN and MARS_LEAVE Messages. 1389 The MARS_JOIN message is indicated by an operation type value of 4. 1390 MARS_LEAVE has the same format and operation type value of 5. The 1391 message format is: 1393 Data: 1394 mar$afn 16 bits Address Family (0x000F). 1395 mar$pro 56 bits Protocol Identification. 1397 mar$hdrrsv 24 bits Reserved. Unused by MARS control protocol. 1398 mar$chksum 16 bits Checksum across entire MARS message. 1399 mar$extoff 16 bits Extensions Offset. 1400 mar$op 16 bits Operation code (MARS_JOIN or MARS_LEAVE). 1401 mar$shtl 8 bits Type & length of source ATM number. (r) 1402 mar$sstl 8 bits Type & length of source ATM subaddress. (q) 1403 mar$spln 8 bits Length of source protocol address (s) 1404 mar$tpln 8 bits Length of group address (z) 1405 mar$pnum 16 bits Number of group address pairs (N) 1406 mar$flags 16 bits layer3grp, copy, and register flags. 1407 mar$cmi 16 bits Cluster Member ID 1408 mar$msn 32 bits MARS Sequence Number. 1409 mar$sha qoctets source ATM number. 1410 mar$ssa roctets source ATM subaddress. 1411 mar$spa soctets source protocol address 1412 mar$min.1 zoctets Minimum multicast group address - pair.1 1413 mar$max.1 zoctets Maximum multicast group address - pair.1 1414 [.......] 1415 mar$min.N zoctets Minimum multicast group address - pair.N 1416 mar$max.N zoctets Maximum multicast group address - pair.N 1418 mar$spln indicates the number of bytes in the source endpoint's 1419 protocol address, and is interpreted in the context of the protocol 1420 indicated by the mar$pro field. (e.g. in IPv4 environments mar$pro 1421 will be 0x800, mar$spln is 4, and mar$tpln is 4.) 1423 The mar$flags field contains the following flags: 1425 Bit 15 - mar$flags.layer3grp. 1426 Bit 14 - mar$flags.copy. 1427 Bit 13 - mar$flags.register. 1428 Bit 12 - mar$flags.punched. 1429 Bits 0-7 - mar$flags.sequence. 1431 Bits 8 to 11 are reserved and MUST be zero. 1433 mar$flags.sequence is set by cluster members, and MUST always be 1434 passed on unmodified by the MARS when retransmitting MARS_JOIN or 1435 MARS_LEAVE messages. It is source specific, and MUST be ignored by 1436 other cluster members. Its use is described in section 5.2.2. 1438 mar$flags.punched MUST be zero when the MARS_JOIN or MARS_LEAVE is 1439 transmitted to the MARS. Its use is described in section 5.2.2 and 1440 section 6.2.4. 1442 mar$flags.copy MUST be set to 0 when the message is being sent from a 1443 MARS client, and MUST be set to 1 when the message is being sent from 1444 a MARS. (This flag is intended to support integrating the MARS 1445 function with one of the MARS clients in a cluster. The 1446 destination of an incoming MARS_JOIN can be determined from its 1447 value.) 1449 mar$flags.layer3grp allows the MARS to provide the group membership 1450 information described further in section 5.3. The rules for its use 1451 are: 1453 mar$flags.layer3grp MUST be set when the cluster member is issuing 1454 the MARS_JOIN as the result of a layer 3 multicast group being 1455 explicitly joined. (e.g. as a result of a JoinHostGroup operation 1456 in an RFC1112 compliant host). 1458 mar$flags.layer3grp MUST be reset in each MARS_JOIN if the 1459 MARS_JOIN is simply the local ip/atm interface registering to 1460 receive traffic on that group for its own reasons. 1462 mar$flags.layer3grp is ignored and MUST be treated as reset by the 1463 MARS for any MARS_JOIN that specifies a block covering more than a 1464 single group (e.g.
a block join from a router ensuring their 1465 forwarding engines 'see' all traffic). 1467 mar$flags.register indicates whether the MARS_JOIN or MARS_LEAVE is 1468 being used to register or deregister a cluster member (described in 1469 section 5.2.3). When used to join or leave specific groups the 1470 mar$register flag MUST be zero. 1472 mar$pnum indicates how many <min,max> pairs are included in the 1473 message. This field MUST be 1 when the message is sent from a cluster 1474 member. A MARS MAY return a MARS_JOIN or MARS_LEAVE with any mar$pnum 1475 value, including zero. This will be explained further in section 1476 6.2.4. 1478 The mar$cmi field MUST be zeroed by cluster members, and is used by 1479 the MARS during cluster member registration, described in section 1480 5.2.3. 1482 mar$msn MUST be zero when transmitted by an endpoint. It is set to 1483 the current value of the Cluster Sequence Number by the MARS when the 1484 MARS_JOIN or MARS_LEAVE is retransmitted. Its use has been described 1485 in section 5.1.4. 1487 To simplify construction and parsing of MARS_JOIN and MARS_LEAVE 1488 messages, the following restrictions are imposed on the <min,max> 1489 pairs: 1491 Assume max(N) is the <max> field from the Nth pair. 1492 Assume min(N) is the <min> field from the Nth pair. 1494 Assume a join/leave message arrives with K <min,max> pairs. 1495 The following must hold: 1496 max(N) < min(N+1) for 1 <= N < K 1497 max(N) >= min(N) for 1 <= N <= K 1499 In plain language, the set must specify an ascending sequence of 1500 address blocks. The definition of "greater" or "less than" may be 1501 protocol specific. In IPv4 environments the addresses are treated as 1502 32 bit, unsigned binary values (most significant byte first). 1504 5.2.1.1 Important IPv4 default values. 1506 The JoinLocalGroup and LeaveLocalGroup operations are only valid for 1507 a single group. For any arbitrary group address X the associated 1508 MARS_JOIN or MARS_LEAVE MUST specify a single <min,max> pair <X,X>. 1509 mar$flags.layer3grp MUST be set under these circumstances. 1511 A router choosing to behave strictly in accordance with RFC1112 MUST 1512 specify the entire Class D space. The associated MARS_JOIN or 1513 MARS_LEAVE MUST specify a single pair <224.0.0.0, 239.255.255.255>. 1514 Whenever a router issues a MARS_JOIN only in order to forward IP 1515 traffic it MUST reset mar$flags.layer3grp. 1517 The use of alternative <min,max> values by multicast routers is 1518 discussed in Section 8. 1520 5.2.2 Retransmission of MARS_JOIN and MARS_LEAVE messages. 1522 Transient problems may result in the loss of messages between the 1523 MARS and cluster members. 1525 A simple algorithm is used to solve this problem. Cluster members 1526 retransmit each MARS_JOIN and MARS_LEAVE message at regular intervals 1527 until they receive a copy back again, either on ClusterControlVC or 1528 the VC on which they are sending the message. At this point the 1529 local endpoint can be certain that the MARS received and processed 1530 it. 1532 The interval should be no shorter than 5 seconds, and a default value 1533 of 10 seconds is recommended. After 5 retransmissions the attempt 1534 should be flagged locally as a failure. This MUST be considered as a 1535 MARS failure, and triggers the MARS reconnection described in section 1536 5.4.
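(The retransmission rule above amounts to a simple per-message timer and counter. The following C fragment is an informal sketch only; the names and the one second tick granularity are assumptions of this sketch, not part of this specification.)

     /* Retransmission state for one outstanding MARS_JOIN or MARS_LEAVE
      * (section 5.2.2).  The caller initialises timer to
      * JOIN_RETRY_INTERVAL and retries_left to JOIN_RETRY_LIMIT when the
      * message is first transmitted.                                   */
     #define JOIN_RETRY_INTERVAL  10   /* default; no shorter than 5 seconds */
     #define JOIN_RETRY_LIMIT      5

     struct pending_join {
         int retries_left;     /* retransmissions remaining              */
         int timer;            /* seconds until the next retransmission  */
         int acknowledged;     /* set when a matching copy is received   */
     };

     /* Called once per second for each unacknowledged message.  Returns
      * -1 when the attempt must be treated as a MARS failure (section
      * 5.4), 1 when the message should be retransmitted now, 0 otherwise. */
     int join_timer_tick(struct pending_join *p)
     {
         if (p->acknowledged || --p->timer > 0)
             return 0;
         if (p->retries_left-- == 0)
             return -1;                    /* trigger MARS reconnection */
         p->timer = JOIN_RETRY_INTERVAL;
         return 1;                         /* caller resends the message */
     }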
1538 A 'copy' is defined as a received message with the following fields 1539 matching a previously transmitted MARS_JOIN/LEAVE: 1541 - mar$op 1542 - mar$flags.register 1543 - mar$flags.sequence 1544 - mar$pnum 1545 - Source ATM address 1546 - First <min,max> pair 1548 In addition, a valid copy MUST have the following field values: 1550 - mar$flags.punched = 0 1551 - mar$flags.copy = 1 1553 The mar$flags.sequence field is never modified or checked by a MARS. 1554 Implementors MAY choose to utilize locally significant sequence 1555 number schemes, which MAY differ from one cluster member to the next. 1556 In the absence of such schemes the default value for 1557 mar$flags.sequence MUST be zero. 1559 Careful implementations MAY have more than one outstanding 1560 (unacknowledged) MARS_JOIN/LEAVE at a time. 1562 5.2.3 Cluster member registration and deregistration. 1564 To become a cluster member an endpoint must register with the MARS. 1565 This achieves two things - the endpoint is added as a leaf node of 1566 ClusterControlVC, and the endpoint is assigned a 16 bit Cluster 1567 Member Identifier (CMI). The CMI uniquely identifies each endpoint 1568 that is attached to the cluster. 1570 Registration with the MARS occurs when an endpoint issues a MARS_JOIN 1571 with the mar$flags.register flag set to one (bit 13 of the mar$flags 1572 field). 1574 The cluster member MUST include its source ATM address, and MAY 1575 choose to specify a null source protocol address when registering. 1577 No protocol specific group addresses are included in a registration 1578 MARS_JOIN. 1580 The cluster member retransmits this MARS_JOIN in accordance with 1581 section 5.2.2 until it confirms that the MARS has received it. 1583 When the registration MARS_JOIN is returned it contains a non-zero 1584 value in mar$cmi. This value MUST be noted by the cluster member, and 1585 used whenever circumstances require the cluster member's CMI. 1587 An endpoint may also choose to de-register, using a MARS_LEAVE with 1588 mar$flags.register set. This would result in the MARS dropping the 1589 endpoint from ClusterControlVC, removing all references to the member 1590 in the mapping database, and freeing up its CMI. 1592 As for registration, a deregistration request MUST include the 1593 correct source ATM address for the cluster member, but MAY 1594 specify a null source protocol address. 1596 The cluster member retransmits this MARS_LEAVE in accordance with 1597 section 5.2.2 until it confirms that the MARS has received it. 1599 5.3 Support for Layer 3 group management. 1601 Whilst the intention of this specification is to be independent of 1602 layer 3 issues, an attempt is being made to assist the operation of 1603 layer 3 multicast routing protocols that need to ascertain if any 1604 groups have members within a cluster. 1606 One example is IP, where IGMP is used (as described in section 2) 1607 simply to determine whether any other cluster members are listening 1608 to a group because they have higher layer applications that want to 1609 receive a group's traffic. 1611 Routers may choose to query the MARS for this information, rather 1612 than multicasting IGMP queries to 224.0.0.1 and incurring the 1613 associated cost of setting up a VC to all systems in the cluster. 1615 The query is issued by sending a MARS_GROUPLIST_REQUEST to the MARS. 1616 MARS_GROUPLIST_REQUEST is built from a MARS_JOIN, but it has an 1617 operation code of 10.
The first <min,max> pair will be used by the 1618 MARS to identify the range of groups in which the querying cluster 1619 member is interested. Any additional <min,max> pairs will be ignored. 1620 A request with mar$pnum = 0 will be ignored. 1622 The response from the MARS is a MARS_GROUPLIST_REPLY, carrying a list 1623 of the multicast groups within the specified <min,max> block that 1624 have Layer 3 members. A group is noted in this list if one or more 1625 of the MARS_JOINs that generated its mapping entry in the MARS 1626 contained a set mar$flags.layer3grp flag. 1628 MARS_GROUPLIST_REPLYs are transmitted back to the querying cluster 1629 member on the VC used to send the MARS_GROUPLIST_REQUEST. 1631 MARS_GROUPLIST_REPLY is derived from the MARS_MULTI but with mar$op = 1632 11. It may have multiple parts if needed, and is received in a 1633 similar manner to a MARS_MULTI. 1635 Data: 1636 mar$afn 16 bits Address Family (0x000F). 1638 mar$pro 56 bits Protocol Identification. 1639 mar$hdrrsv 24 bits Reserved. Unused by MARS control protocol. 1640 mar$chksum 16 bits Checksum across entire MARS message. 1641 mar$extoff 16 bits Extensions Offset. 1642 mar$op 16 bits Operation code (MARS_GROUPLIST_REPLY). 1643 mar$shtl 8 bits Type & length of source ATM number. (r) 1644 mar$sstl 8 bits Type & length of source ATM subaddress. (q) 1645 mar$spln 8 bits Length of source protocol address (s) 1646 mar$thtl 8 bits Unused - set to zero. 1647 mar$tstl 8 bits Unused - set to zero. 1648 mar$tpln 8 bits Length of target group address (z) 1649 mar$tnum 16 bits Number of group addresses returned (N). 1650 mar$seqxy 16 bits Boolean flag x and sequence number y. 1651 mar$msn 32 bits MARS Sequence Number. 1652 mar$sha qoctets source ATM number. 1653 mar$ssa roctets source ATM subaddress. 1654 mar$spa soctets source protocol address 1655 mar$mgrp.1 zoctets Group address 1 1656 [.......] 1657 mar$mgrp.N zoctets Group address N 1659 mar$seqxy is coded as for the MARS_MULTI - multiple 1660 MARS_GROUPLIST_REPLY components are transmitted and received using 1661 the same algorithm as described in section 5.1.1 for MARS_MULTI. The 1662 only difference is that protocol addresses are being returned rather 1663 than ATM addresses. 1665 As for MARS_MULTIs, if an error occurs in the reception of a multi 1666 part MARS_GROUPLIST_REPLY the whole reply MUST be discarded and the 1667 MARS_GROUPLIST_REQUEST re-issued. (This includes the requirement that 1668 the mar$msn value remain constant across all the parts.) 1670 Note that the ability to generate MARS_GROUPLIST_REQUEST messages, 1671 and receive MARS_GROUPLIST_REPLY messages, is not required for 1672 general host interface implementations. It is optional for interfaces 1673 being implemented to support layer 3 multicast forwarding engines. 1674 However, this functionality MUST be supported by the MARS. 1676 5.4 Support for redundant/backup MARS entities. 1678 Endpoints are assumed to have been configured with the ATM address of 1679 at least one MARS. Endpoints MAY choose to maintain a table of ATM 1680 addresses, representing alternative MARSs that will be contacted in 1681 the event that normal operation with the original MARS is deemed to 1682 have failed. It is assumed that this table orders the ATM addresses 1683 in descending order of preference. 1685 An endpoint will typically decide there are problems with the MARS 1686 when: 1688 - It fails to establish a point to point VC to the MARS. 1689 - MARS_REQUESTs fail (section 5.1.1). 1690 - MARS_JOIN/MARS_LEAVEs fail (section 5.2.2).
1691 - It has not received a MARS_REDIRECT_MAP in the last 4 minutes 1692 (section 5.4.3). 1694 (If it is able to discern which connection represents 1695 ClusterControlVC, it may also use connection failures on this VC to 1696 indicate problems with the MARS). 1698 5.4.1 First response to MARS problems. 1700 The first response is to assume a transient problem with the MARS 1701 being used at the time. The cluster member should wait a random 1702 period of time between 1 and 10 seconds before attempting to re- 1703 connect and re-register with the MARS. If the registration MARS_JOIN 1704 is successful then: 1706 The cluster member MUST then proceed to rejoin every group that 1707 its local higher layer protocol(s) have joined. It is recommended 1708 that a random delay between 1 and 10 seconds be inserted before 1709 attempting each MARS_JOIN. 1711 The cluster member MUST initiate the revalidation of every 1712 multicast group it was sending to (as though a sequence number 1713 jump had been detected, section 5.1.5). 1715 The rejoin and revalidation procedure must not disrupt the cluster 1716 member's use of multipoint VCs that were already open at the time 1717 of the MARS failure. 1719 If re-registration with the current MARS fails, and there are no 1720 backup MARS addresses configured, the cluster member MUST wait for at 1721 least 1 minute before repeating the re-registration procedure. It is 1722 RECOMMENDED that the cluster member signal an error condition in 1723 some locally significant fashion. 1725 This procedure may repeat until network administrators manually 1726 intervene or the current MARS returns to normal operation. 1728 5.4.2 Connecting to a backup MARS. 1730 If the re-registration with the current MARS fails, and other MARS 1731 addresses have been configured, the next MARS address on the list is 1732 chosen to be the current MARS, and the cluster member immediately 1733 restarts the re-registration procedure described in section 5.4.1. If 1734 this is successful the cluster member will resume normal operation 1735 using the new MARS. It is RECOMMENDED that the cluster member signal 1736 a warning of this condition in some locally significant fashion. 1738 If the attempt at re-registration with the new MARS fails, the 1739 cluster member MUST wait for at least 1 minute before choosing the 1740 next MARS address in the table and repeating the procedure. If the 1741 end of the table has been reached, the cluster member starts again at 1742 the top of the table (which should be the original MARS that the 1743 cluster member started with). 1745 In the worst case scenario this will result in cluster members 1746 looping through their table of possible MARS addresses until network 1747 administrators manually intervene. 1749 5.4.3 Dynamic backup lists, and soft redirects. 1751 To support some level of autoconfiguration, a MARS message is defined 1752 that allows the current MARS to broadcast on ClusterControlVC a table 1753 of backup MARS addresses. When this message is received, cluster 1754 members that maintain a list of backup MARS addresses MUST insert 1755 this information at the top of their locally held list (i.e. the 1756 information provided by the MARS has a higher preference than 1757 addresses that may have been manually configured into the cluster 1758 member). 1760 The message is MARS_REDIRECT_MAP. It is based on the MARS_MULTI 1761 message, with the following changes: 1763 - mar$tpln field replaced by mar$redirf. 1764 - mar$spln field reserved.
1765 - mar$tpa and mar$spa eliminated. 1767 MARS_REDIRECT_MAP has an operation type code of 12 decimal. 1769 Data: 1770 mar$afn 16 bits Address Family (0x000F). 1771 mar$pro 56 bits Protocol Identification. 1772 mar$hdrrsv 24 bits Reserved. Unused by MARS control protocol. 1773 mar$chksum 16 bits Checksum across entire MARS message. 1774 mar$extoff 16 bits Extensions Offset. 1775 mar$op 16 bits Operation code (MARS_REDIRECT_MAP). 1776 mar$shtl 8 bits Type & length of source ATM number. (r) 1777 mar$sstl 8 bits Type & length of source ATM subaddress. (q) 1778 mar$spln 8 bits Length of source protocol address (s) 1779 mar$thtl 8 bits Type & length of target ATM number (x) 1780 mar$tstl 8 bits Type & length of target ATM subaddress (y) 1781 mar$redirf 8 bits Flag controlling client redirect behaviour. 1782 mar$tnum 16 bits Number of MARS addresses returned (N). 1783 mar$seqxy 16 bits Boolean flag x and sequence number y. 1784 mar$msn 32 bits MARS Sequence Number. 1785 mar$sha qoctets source ATM number 1786 mar$ssa roctets source ATM subaddress 1787 mar$tha.1 xoctets ATM number for MARS 1 1788 mar$tsa.1 yoctets ATM subaddress for MARS 1 1789 mar$tha.2 xoctets ATM number for MARS 2 1790 mar$tsa.2 yoctets ATM subaddress for MARS 2 1791 [.......] 1792 mar$tha.N xoctets ATM number for MARS N 1793 mar$tsa.N yoctets ATM subaddress for MARS N 1795 The source ATM address field(s) MUST identify the originating MARS. 1796 A multi-part MARS_REDIRECT_MAP may be transmitted and reassembled 1797 using the mar$seqxy field in the same manner as a multi-part 1798 MARS_MULTI (section 5.1.1). If a failure occurs during the reassembly 1799 of a multi-part MARS_REDIRECT_MAP (a part lost, reassembly timeout, 1800 or illegal MARS Sequence Number jump) the entire message MUST be 1801 discarded. 1803 This message is transmitted regularly by the MARS (it MUST be 1804 transmitted at least every 2 minutes, it is RECOMMENDED that it is 1805 transmitted every 1 minute). 1807 The MARS_REDIRECT_MAP is also used to force cluster members to shift 1808 from one MARS to another. If the ATM address of the first MARS 1809 contained in a MARS_REDIRECT_MAP table is not the address of cluster 1810 member's current MARS the client MUST 'redirect' to the new MARS. The 1811 mar$redirf field controls how the redirection occurs. 1813 mar$redirf has the following format: 1815 7 6 5 4 3 2 1 0 1816 +-+-+-+-+-+-+-+-+ 1817 |x| | 1818 +-+-+-+-+-+-+-+-+ 1820 If Bit 7 (the most significant bit) of mar$redirf is 1 then the 1821 cluster member MUST perform a 'hard' redirect. Having installed the 1822 new table of MARS addresses carried by the MARS_REDIRECT_MAP, the 1823 cluster member re-registers with the MARS now at the top of the table 1824 using the mechanism described in sections 5.4.1 and 5.4.2. 1826 If Bit 7 of mar$redirf is 0 then the cluster member MUST perform a 1827 'soft' redirect, beginning with the following actions: 1829 - open a point to point VC to the first ATM address. 1830 - attempt a registration (section 5.2.3). 1832 If the registration succeeds, the cluster member shuts down its point 1833 to point VC to the current MARS (if it had one open), and then 1834 proceeds to use the newly opened point to point VC as its connection 1835 to the 'current MARS'. The cluster member does NOT attempt to rejoin 1836 the groups it is a member of, or revalidate groups it is currently 1837 sending to. 
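(The redirect decision itself reduces to one address comparison and one flag bit. The following C fragment is a purely illustrative sketch of that decision; the type, constant, and function names are assumptions of this sketch and not part of this specification.)

     #include <string.h>

     typedef struct { unsigned char octets[20]; } atm_addr_t;  /* NSAP format */

     enum redirect_action { NO_REDIRECT, HARD_REDIRECT, SOFT_REDIRECT };

     /* Decide how to react to a received MARS_REDIRECT_MAP, given the
      * first ATM address listed in the message and the mar$redirf flag. */
     enum redirect_action classify_redirect(const atm_addr_t *first_listed,
                                            const atm_addr_t *current_mars,
                                            unsigned char mar_redirf)
     {
         /* Nothing to do if the first listed MARS is already current.  */
         if (memcmp(first_listed, current_mars, sizeof(atm_addr_t)) == 0)
             return NO_REDIRECT;

         /* Bit 7 of mar$redirf selects a 'hard' redirect (re-register
          * with the new MARS as in sections 5.4.1/5.4.2) or a 'soft'
          * redirect (register with the new MARS and simply switch point
          * to point VCs, with no rejoin or revalidation).              */
         return (mar_redirf & 0x80) ? HARD_REDIRECT : SOFT_REDIRECT;
     }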
1839 This is termed a 'soft redirect' because it avoids the extra 1840 rejoining and revalidation processing that occurs when a MARS failure 1841 is being recovered from. It assumes some external synchronisation 1842 mechanisms exist between the old and new MARS - mechanisms that are 1843 outside the scope of this specification. 1845 Some level of trust is required before initiating a soft redirect. A 1846 cluster member MUST check that the calling party at the other end of 1847 the VC on which the MARS_REDIRECT_MAP arrived (supposedly 1848 ClusterControlVC) is in fact the node it trusts as the current MARS. 1850 Additional applications of this function are for further study. 1852 5.5 Data path LLC/SNAP encapsulations. 1854 An extended encapsulation scheme is required to support the filtering 1855 of possible reflected packets (section 3.3). 1857 Two LLC/SNAP codepoints are allocated from the IANA OUI space. These 1858 support two different mechanisms for detecting reflected packets. 1859 They are called Type #1 and Type #2 multicast encapsulations. 1861 Type #1 1863 [0xAA-AA-03][0x00-00-5E][0x00-01][Type #1 Extended Layer 3 packet] 1864 LLC OUI PID 1866 Type #2 1868 [0xAA-AA-03][0x00-00-5E][0x00-04][Type #2 Extended Layer 3 packet] 1869 LLC OUI PID 1871 For conformance with this document MARS clients: 1873 MUST transmit data using Type #1 encapsulation. 1875 MUST be able to correctly receive traffic using Type #1 OR Type #2 1876 encapsulation. 1878 MUST NOT transmit using Type #2 encapsulation. 1880 5.5.1 Type #1 encapsulation. 1882 The Type #1 Extended layer 3 packet carries within it a copy of the 1883 source's Cluster Member ID (CMI) and either the 'short form' or 'long 1884 form' of the protocol type as appropriate (section 4.3). 1886 When carrying packets belonging to protocols with valid short form 1887 representations the [Type #1 Extended Layer 3 packet] is encoded as: 1889 [pkt$cmi][pkt$pro][Original Layer 3 packet] 1890 2octet 2octet N octet 1892 The first 2 octets (pkt$cmi) carry the CMI assigned when an endpoint 1893 registers with the MARS (section 5.2.3). The second 2 octets 1894 (pkt$pro) indicate the protocol type of the packet carried in the 1895 remainder of the payload. This is copied from the mar$pro field used 1896 in the MARS control messages. 1898 When carrying packets belonging to protocols that only have a long 1899 form representation (pkt$pro = 0x80) the overhead SHALL be further 1900 extended to carry the 5 byte mar$pro.snap field (with padding for 32 1901 bit alignment). The encoded form SHALL be: 1903 [pkt$cmi][0x00-80][mar$pro.snap][padding][Original Layer 3 packet] 1904 2octet 2octet 5 octets 3 octets N octet 1906 The CMI is copied into the pkt$cmi field of every outgoing Type #1 1907 packet. When an endpoint interface receives an AAL_SDU with the 1908 LLC/SNAP codepoint indicating Type #1 encapsulation it compares the 1909 CMI field with its own Cluster Member ID for the indicated protocol. 1910 The packet is discarded silently if they match. Otherwise the packet 1911 is accepted for processing by the local protocol entity identified by 1912 the pkt$pro (and possibly SNAP) field(s). 1914 Where a protocol has valid short and long forms of identification, 1915 receivers MAY choose to additionally recognise the long form. 1917 5.5.2 Type #2 encapsulation. 1919 Future developments may enable direct multicasting of AAL_SDUs beyond 1920 cluster boundaries. 
Expanding the set of possible sources in this way 1921 may cause the CMI to become an inadequate parameter with which to 1922 detect reflected packets. A larger source identification field may 1923 be required. 1925 The Type #2 Extended layer 3 packet carries within it an 8 octet 1926 source ID field and either the 'short form' or 'long form' of the 1927 protocol type as appropriate (section 4.3). The form and content of 1928 the source ID field is currently unspecified, and is not relevant to 1929 any MARS client built in conformance with this document. Received 1930 Type #2 encapsulated packets MUST always be accepted and passed up to 1931 the higher layer indicated by the protocol identifier. 1933 When carrying packets belonging to protocols with valid short form 1934 representations the [Type #2 Extended Layer 3 packet] is encoded as: 1936 [8 octet sourceID][mar$pro.type][Null pad][Original Layer 3 1937 packet] 1938 2octets 2octets 1940 When carrying packets belonging to protocols that only have a long 1941 form representation (pkt$pro = 0x80) the overhead SHALL be further 1942 extended to carry the 5 byte mar$pro.snap field (with padding for 32 1943 bit alignment). The encoded form SHALL be: 1945 [8 octet sourceID][mar$pro.type][mar$pro.snap][Null pad][Layer 3 1946 packet] 1947 2octets 5octets 1octet 1949 (Note that in this case the padding after the SNAP field is 1 octet 1950 rather than the 3 octets used in Type #1.) 1952 Where a protocol has valid short and long forms of identification, 1953 receivers MAY choose to additionally recognise the long form. 1955 (Future documents may specify the contents of the source ID field. 1956 This will only be relevant to implementations sending Type #2 1957 encapsulated packets, as they are the only entities that need to be 1958 concerned about detecting reflected Type #2 packets.) 1960 5.5.3 A Type #1 example. 1962 An IPv4 packet (fully identified by an Ethertype of 0x800, therefore 1963 requiring 'short form' protocol type encoding) would be transmitted 1964 as: 1966 [0xAA-AA-03][0x00-00-5E][0x00-01][pkt$cmi][0x800][IPv4 packet] 1968 The different LLC/SNAP codepoints for unicast and multicast packet 1969 transmission allows a single IPv4/ATM interface to support both by 1970 demuxing on the LLC/SNAP header. 1972 6. The MARS in greater detail. 1974 Section 5 implies a lot about the MARS's basic behaviour as observed 1975 by cluster members. This section summarises the behaviour of the MARS 1976 for groups that are VC mesh based, and describes how a MARSs 1977 behaviour changes when an MCS is registered to support a group. 1979 The MARS is intended to be a multiprotocol entity - all its mapping 1980 tables, CMIs, and control VCs MUST be managed within the context of 1981 the mar$pro field in incoming MARS messages. For example, a MARS 1982 supports completely separate ClusterControlVCs for each layer 3 1983 protocol that it is registering members for. If a MARS receives 1984 messages with an mar$pro that it does not support, the message is 1985 dropped. 1987 In general the MARS treats protocol addresses as arbitrary byte 1988 strings. For example, the MARS will not apply IPv4 specific 'class' 1989 checks to addresses supplied under mar$pro = 0x800. It is sufficient 1990 for the MARS to simply assume that endpoints know how to interpret 1991 the protocol addresses that they are establishing and releasing 1992 mappings for. 1994 The MARS requires control messages to carry the originator's identity 1995 in the source ATM address field(s). 
Messages that arrive with an 1996 empty ATM Number field are silently discarded prior to any other 1997 processing by the MARS. (Only the ATM Number field needs to be 1998 checked. An empty ATM Number field combined with a non-empty ATM 1999 Subaddress field does not represent a valid ATM address.) 2001 (Some example pseudo-code for a MARS can be found in Appendix F.) 2003 6.1 Basic interface to Cluster members. 2005 The following MARS messages are used or required by cluster members: 2007 1 MARS_REQUEST 2008 2 MARS_MULTI 2009 4 MARS_JOIN 2010 5 MARS_LEAVE 2011 6 MARS_NAK 2012 10 MARS_GROUPLIST_REQUEST 2013 11 MARS_GROUPLIST_REPLY 2014 12 MARS_REDIRECT_MAP 2016 6.1.1 Response to MARS_REQUEST. 2018 Except as described in section 6.2, if a MARS_REQUEST arrives whose 2019 source ATM address does not match that of any registered Cluster 2020 member the message MUST be dropped and ignored. 2022 6.1.2 Response to MARS_JOIN and MARS_LEAVE. 2024 When a registration MARS_JOIN arrives (described in section 5.2.3) 2025 the MARS performs the following actions: 2027 - Adds the node to ClusterControlVC. 2028 - Allocates a new Cluster Member ID (CMI). 2029 - Inserts the new CMI into the mar$cmi field of the MARS_JOIN. 2030 - Retransmits the MARS_JOIN back privately. 2032 If the node is already a registered member of the cluster associated 2033 with the specified protocol type then its existing CMI is simply 2034 copied into the MARS_JOIN, and the MARS_JOIN retransmitted back to 2035 the node. A single node may register multiple times if it supports 2036 multiple layer 3 protocols. The CMIs allocated by the MARS for each 2037 such registration may or may not be the same. 2039 The retransmitted registration MARS_JOIN MUST NOT be sent on 2040 ClusterControlVC. If a cluster member issues a deregistration 2041 MARS_LEAVE it too is retransmitted privately. 2043 Non-registration MARS_JOIN and MARS_LEAVE messages are ignored if 2044 they arrive from a node that is not registered as a cluster member. 2046 MARS_JOIN or MARS_LEAVE messages MUST arrive at the MARS with 2047 mar$flags.copy set to 0, otherwise the message is silently ignored. 2049 All outgoing MARS_JOIN or MARS_LEAVE messages SHALL have 2050 mar$flags.copy set to 1, and mar$msn set to the current Cluster 2051 Sequence Number for ClusterControlVC (Section 5.1.4.2). 2053 mar$flags.layer3grp (section 5.3) MUST be ignored (and treated as 2054 reset) for MARS_JOINs specifying more than a single group. If a 2055 MARS_JOIN/LEAVE is received that contains more than one <min,max> 2056 pair, the MARS MUST silently drop the message. 2058 If one or more MCSs have registered with the MARS, message processing 2059 continues as described in section 6.2.4. 2061 The MARS database is updated to add the node to any indicated 2062 group(s) that it was not already considered a member of, and message 2063 processing continues as follows: 2065 If a single group was being joined or left: 2067 mar$flags.punched is set to 0. 2069 If the joining (leaving) node was already (is still) considered a 2070 member of the specified group, the message is retransmitted 2071 privately back to the cluster member. Otherwise the message is 2072 retransmitted on ClusterControlVC. 2074 If a single block covering 2 or more groups was being joined or left: 2076 A copy of the original MARS_JOIN/LEAVE is made. This copy then has 2077 its <min,max> block replaced with a 'hole punched' set of zero or 2078 more <min,max> pairs.
The 'hole punched' set of <min,max> pairs 2079 covers the entire address range specified by the original 2080 <min,max> pair, but excludes those addresses/groups which the 2081 joining (leaving) node is already (still) a member of due to a 2082 previous single group join. 2084 If no 'holes' were punched in the specified block, the original 2085 MARS_JOIN/LEAVE is retransmitted out on ClusterControlVC. 2086 Otherwise the following occurs: 2088 The original MARS_JOIN/LEAVE is transmitted back to the source 2089 cluster member unchanged, using the VC it arrived on. The 2090 mar$flags.punched field MUST be reset to 0 in this message. 2092 If the hole-punched set contains 1 or more <min,max> pairs, the 2093 copy of the original MARS_JOIN/LEAVE is transmitted on 2094 ClusterControlVC, carrying the new <min,max> list. The 2095 mar$flags.punched field MUST be set to 1 in this message. 2097 (The mar$flags.punched field is set to ensure the hole-punched 2098 copy is ignored by the message's source when trying to match 2099 received MARS_JOIN/LEAVE messages with ones previously sent 2100 (section 5.2.2)). 2102 If the MARS receives a deregistration MARS_LEAVE (described in 2103 section 5.2.3) that member's ATM address MUST be removed from all 2104 groups it may have joined, dropped from ClusterControlVC, 2105 and the CMI released. 2107 If the MARS receives an ERR_L_RELEASE on ClusterControlVC indicating 2108 that a cluster member has disconnected, that member's ATM address 2109 MUST be removed from all groups it may have joined, and the 2110 CMI released. 2112 6.1.3 Generating MARS_REDIRECT_MAP. 2114 A MARS_REDIRECT_MAP message (described in section 5.4.3) MUST be 2115 regularly transmitted on ClusterControlVC. It is RECOMMENDED that 2116 this occur every 1 minute, and it MUST occur at least every 2 2117 minutes. If the MARS has no knowledge of other backup MARSs serving 2118 the cluster, it MUST include its own address as the only entry in the 2119 MARS_REDIRECT_MAP message (in addition to filling in the source 2120 address fields). 2122 The design and use of backup MARS entities is beyond the scope of 2123 this document, and will be covered in future work. 2125 6.1.4 Cluster Sequence Numbers. 2127 The Cluster Sequence Number (CSN) is described in section 5.1.4, and 2128 is carried in the mar$msn field of MARS messages being sent to 2129 cluster members (either out ClusterControlVC or on an individual VC). 2130 The MARS increments the CSN after every transmission of a message on 2131 ClusterControlVC. The current CSN is copied into the mar$msn field 2132 of MARS messages being sent to cluster members, whether out 2133 ClusterControlVC or on a private VC. 2135 A MARS should be carefully designed to minimise the possibility of 2136 the CSN jumping unnecessarily. Under normal operation only cluster 2137 members affected by transient link problems will miss CSN updates and 2138 be forced to revalidate. If the MARS itself glitches, it will be 2139 inundated with requests for a period as every cluster member 2140 attempts to revalidate. 2142 Calculations on the CSN MUST be performed as unsigned 32 bit 2143 arithmetic. 2145 One implication of this mechanism is that the MARS should serialize 2146 its processing of 'simultaneous' MARS_REQUEST, MARS_JOIN and 2147 MARS_LEAVE messages. Join and Leave operations should be queued 2148 within the MARS along with MARS_REQUESTs, and not processed until all 2149 the reply packets of a preceding MARS_REQUEST have been transmitted.
2150 The transmission of MARS_REDIRECT_MAP should also be similarly 2151 queued. 2153 (The regular transmission of MARS_REDIRECT_MAP serves a secondary 2154 purpose of allowing cluster members to track the CSN, even if they 2155 miss an earlier MARS_JOIN or MARS_LEAVE.) 2157 6.2 MARS interface to Multicast Servers (MCS). 2159 When the MARS returns the actual addresses of group members, the 2160 endpoint behaviour described in section 5 results in all groups being 2161 supported by meshes of point to multipoint VCs. However, when MCSs 2162 register to support particular layer 3 multicast groups the MARS 2163 modifies its use of various MARS messages to fool endpoints into 2164 using the MCS instead. 2166 The following MARS messages are associated with interaction between 2167 the MARS and MCSs. 2169 3 MARS_MSERV 2170 7 MARS_UNSERV 2171 8 MARS_SJOIN 2172 9 MARS_SLEAVE 2174 The following MARS messages are treated in a slightly different 2175 manner when MCSs have registered to support certain group addresses: 2177 1 MARS_REQUEST 2178 4 MARS_JOIN 2179 5 MARS_LEAVE 2181 A MARS must keep two sets of mappings for each layer 3 group using 2182 MCS support. The original {layer 3 address, ATM.1, ATM.2, ... ATM.n} 2183 mapping (now termed the 'host map', although it includes routers) is 2184 augmented by a parallel {layer 3 address, server.1, server.2, .... 2185 server.K} mapping (the 'server map'). It is assumed that no ATM 2186 addresses appear in both the server and host maps for the same 2187 multicast group. Typically K will be 1, but it will be larger if 2188 multiple MCSs are configured to support a given group. 2190 The MARS also maintains a point to multipoint VC out to any MCSs 2191 registered with it, called ServerControlVC (section 6.2.3). This 2192 serves an analogous role to ClusterControlVC, allowing the MARS to 2193 update the MCSs with group membership changes as they occur. A MARS 2194 MUST also send its regular MARS_REDIRECT_MAP transmissions on both 2195 ServerControlVC and ClusterControlVC. 2197 6.2.1 Response to a MARS_REQUEST if MCS is registered. 2199 When the MARS receives a MARS_REQUEST for an address that has both 2200 host and server maps it generates a response based on the identity of 2201 the request's source. If the requestor is a member of the server map 2202 for the requested group then the MARS returns the contents of the 2203 host map in a sequence of one or more MARS_MULTIs. Otherwise, if the 2204 source is a valid cluster member, the MARS returns the contents of 2205 the server map in a sequence of one or more MARS_MULTIs. If the 2206 source is neither a cluster member, nor a member of the server map 2207 for the group, the request is dropped and ignored. 2209 Servers use the host map to establish a basic distribution VC for the 2210 group. Cluster members will establish outgoing multipoint VCs to 2211 members of the group's server map, without being aware that their 2212 packets will not be going directly to the multicast group's members. 2214 6.2.2 MARS_MSERV and MARS_UNSERV messages. 2216 MARS_MSERV and MARS_UNSERV are identical to the MARS_JOIN message. 2217 An MCS uses a MARS_MSERV with a pair of <X,X> to specify 2218 the multicast group X that it is willing to support. A single group 2219 MARS_UNSERV indicates the group that the MCS is no longer willing to 2220 support. The operation code for MARS_MSERV is 3 (decimal), and 2221 MARS_UNSERV is 7 (decimal). 2223 Both of these messages are sent to the MARS over a point to point VC 2224 (between MCS and MARS).
After processing, they are retransmitted on 2225 ServerControlVC to allow other MCSs to note the new node. 2227 When registering or deregistering support for specific groups the 2228 mar$flags.register flag MUST be zero. (This flag is only set to one when the 2229 MCS is registering as a member of ServerControlVC, as described in 2230 section 6.2.3.) 2232 When an MCS issues a MARS_MSERV for a specific group the message MUST 2233 be dropped and ignored if the source has not already registered with 2234 the MARS as a multicast server (section 6.2.3). Otherwise, the MARS 2235 adds the new ATM address to the server map for the specified group, 2236 possibly constructing a new server map if this is the first MCS for 2237 the group. 2239 If a MARS_MSERV represents the first MCS to register for a particular 2240 group, and there exists a non null host map serving that particular 2241 group, the MARS issues a MARS_MIGRATE (section 5.1.6) on 2242 ClusterControlVC. The MARS's own identity is placed in the source 2243 protocol and hardware address fields of the MARS_MIGRATE. The ATM 2244 address of the MCS is placed as the first and only target ATM 2245 address. The address of the affected group is placed in the target 2246 multicast group address field. 2248 If a MARS_MSERV does not represent the first MCS to register for a particular 2249 group the MARS simply changes its operation code to MARS_JOIN, and 2250 sends a copy of the message on ClusterControlVC. This fools the 2251 cluster members into thinking a new leaf node has been added to the 2252 group specified. In the retransmitted MARS_JOIN mar$flags.layer3grp 2253 MUST be zero, mar$flags.copy MUST be one, and mar$flags.register MUST 2254 be zero. 2256 When an MCS issues a MARS_UNSERV the MARS removes its ATM address 2257 from the server maps for each specified group, deleting any server 2258 maps that end up being null after the operation. 2260 The operation code is then changed to MARS_LEAVE and a copy of 2261 the message is sent on ClusterControlVC. This fools the cluster members into 2262 thinking a leaf node has been dropped from the group specified. In the 2263 retransmitted MARS_LEAVE mar$flags.layer3grp MUST be zero, 2264 mar$flags.copy MUST be one, and mar$flags.register MUST be zero. 2266 The MARS retransmits redundant MARS_MSERV and MARS_UNSERV messages 2267 directly back to the MCS generating them. MARS_MIGRATE messages are 2268 never repeated in response to redundant MARS_MSERVs. 2270 The last or only MCS for a group MAY choose to issue a MARS_UNSERV 2271 while the group still has members. When the MARS_UNSERV is processed 2272 by the MARS the 'server map' will be deleted. When the associated 2273 MARS_LEAVE is issued on ClusterControlVC, all cluster members with a 2274 VC open to the MCS for that group will close down the VC (in 2275 accordance with section 5.1.4, since the MCS was their only leaf 2276 node). When cluster members subsequently find they need to transmit 2277 packets to the group, they will begin again with the 2278 MARS_REQUEST/MARS_MULTI sequence to establish a new VC. Since the 2279 MARS will have deleted the server map, this will result in the host 2280 map being returned, and the group reverts to being supported by a VC 2281 mesh. 2283 The reverse process is achieved through the MARS_MIGRATE message when 2284 the first MCS registers to support a group.
This ensures that 2285 cluster members explicitly dismantle any VC mesh they may have had 2286 up, and re-establish their multicast forwarding path with the MCS as 2287 its termination point. 2289 6.2.3 Registering a Multicast Server (MCS). 2291 Section 5.2.3 describes how endpoints register as cluster members, 2292 and hence get added as leaf nodes to ClusterControlVC. The same 2293 approach is used to register endpoints that intend to provide MCS 2294 support. 2296 Registration with the MARS occurs when an endpoint issues a 2297 MARS_MSERV with mar$flags.register set to one. Upon registration the 2298 endpoint is added as a leaf node to ServerControlVC, and the 2299 MARS_MSERV is returned to the MCS privately. 2301 The MCS retransmits this MARS_MSERV until it confirms that the MARS 2302 has received it (by receiving a copy back, in an analogous way to the 2303 mechanism described in section 5.2.2 for reliably transmitting 2304 MARS_JOINs). 2306 The mar$cmi field in MARS_MSERVs MUST be set to zero by both MCS and 2307 MARS. 2309 An MCS may also choose to de-register, using a MARS_UNSERV with 2310 mar$flags.register set to one. When this occurs the MARS MUST remove 2311 all references to that MCS in all server maps associated with the 2312 protocol (mar$pro) specified in the MARS_UNSERV, and drop the MCS 2313 from ServerControlVC. 2315 Note that multiple logical MCSs may share the same physical ATM 2316 interface, provided that each MCS uses a separate ATM address (e.g. a 2317 different SEL field in the NSAP format address). In fact, an MCS may 2318 share the ATM interface of a node that is also a cluster member 2319 (either host or router), provided each logical entity has a different 2320 ATM address. 2322 A MARS MUST be capable of handling a multi-entry server map. However, 2323 the possible use of multiple MCSs registering to support the same 2324 group is a subject for further study. In the absence of an MCS 2325 synchronisation protocol a system administrator MUST NOT allow more 2326 than one logical MCS to register for a given group. 2328 6.2.4 Modified response to MARS_JOIN and MARS_LEAVE. 2330 The existence of MCSs supporting some groups but not others requires 2331 the MARS to modify its distribution of single and block join/leave 2332 updates to cluster members. The MARS also adds two new messages - 2333 MARS_SJOIN and MARS_SLEAVE - for communicating group changes to MCSs 2334 over ServerControlVC. 2336 The MARS_SJOIN and MARS_SLEAVE messages are identical to MARS_JOIN, 2337 with operation codes 8 and 9 (decimal) respectively. 2339 When a cluster member issues MARS_JOIN or MARS_LEAVE for a single 2340 group, the MARS checks to see if the group has an associated server 2341 map. If the specified group does not have a server map the MARS 2342 simply retransmits the MARS_JOIN or MARS_LEAVE on ClusterControlVC. 2344 However, if a server map exists for the group a new set of actions 2345 is taken. 2347 A copy of the MARS_JOIN/LEAVE is made with type MARS_SJOIN or 2348 MARS_SLEAVE as appropriate, and transmitted on ServerControlVC. 2349 This allows the MCS(s) supporting the group to note the new member 2350 and update their data VCs. 2352 The original message is transmitted back to the source cluster 2353 member unchanged, using the VC it arrived on rather than 2354 ClusterControlVC. The mar$flags.punched field MUST be reset to 0 2355 in this message. 2357 (Section 5.2.2 requires cluster members have a mechanism to confirm 2358 the reception of their message by the MARS.
2365 Receipt of a block MARS_JOIN (e.g. from a router coming on-line) or 2366 MARS_LEAVE requires a more complex response. The single <min,max> 2367 block may simultaneously cover mesh supported and MCS supported 2368 groups. However, cluster members only need to be informed of the 2369 mesh supported groups that the endpoint has joined. Only the MCSs 2370 need to know if the endpoint is joining any MCS supported groups.

2372 The solution is to modify the MARS_JOIN or MARS_LEAVE that is 2373 retransmitted on ClusterControlVC. The following action is taken:

2375 A copy of the MARS_JOIN/LEAVE is made with type MARS_SJOIN or 2376 MARS_SLEAVE as appropriate, and transmitted on ServerControlVC. 2377 This allows the MCS(s) supporting the group to note the membership 2378 change and update their outgoing point to multipoint VCs.

2380 Before transmission on the ClusterControlVC, the original 2381 MARS_JOIN/LEAVE then has its <min,max> block replaced with a 'hole 2382 punched' set of zero or more <min,max> pairs. The 'hole punched' 2383 set of <min,max> pairs covers the entire address range specified 2384 by the original <min,max> pair, but excludes those 2385 addresses/groups supported by MCSs or which the joining (leaving) 2386 node is already (still) a member of due to a previous single group 2387 join.

2389 If no 'holes' were punched in the specified block, the original 2390 MARS_JOIN/LEAVE is re-transmitted out on ClusterControlVC 2391 unchanged. Otherwise the following occurs:

2393 The original MARS_JOIN/LEAVE is transmitted back to the source 2394 cluster member unchanged, using the VC it arrived on. The 2395 mar$flags.punched field MUST be reset to 0 in this message.

2397 If the hole-punched set contains 1 or more <min,max> pairs, a 2398 copy of the original MARS_JOIN/LEAVE is transmitted on 2399 ClusterControlVC, carrying the new <min,max> list. The 2400 mar$flags.punched field MUST be set to 1 in this message.

2402 The mar$flags.punched field is set to ensure the hole-punched copy 2403 is ignored by the message's source when trying to match received 2404 MARS_JOIN/LEAVE messages with ones previously sent (section 2405 5.2.2).

2407 (Appendix A discusses some algorithms for 'hole punching'.)

2409 It is assumed that MCSs use the MARS_SJOINs and MARS_SLEAVEs to 2410 update their own VCs out to the actual group's members.

2412 mar$flags.layer3grp is copied over into the messages transmitted by 2413 the MARS. mar$flags.copy MUST be set to one.

2415 6.2.5 Sequence numbers for ServerControlVC traffic.

2417 In an analogous fashion to the Cluster Sequence Number, the MARS 2418 keeps a Server Sequence Number (SSN) that is incremented after every 2419 transmission on ServerControlVC. The current value of the SSN is 2420 inserted into the mar$msn field of every message the MARS issues that 2421 it believes is destined for an MCS. This includes MARS_MULTIs that 2422 are being returned in response to a MARS_REQUEST from an MCS, and 2423 MARS_REDIRECT_MAPs being sent on ServerControlVC. The MARS must check 2424 the MARS_REQUEST's source, and if it is a registered MCS the SSN is 2425 copied into the mar$msn field, otherwise the CSN is copied into the 2426 mar$msn field.
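The following illustrative C fragment sketches this choice of sequence number. The variable and routine names are hypothetical and are not defined by this memo.

   /* Illustrative sketch only; helpers and message layout hypothetical. */

   struct atm_addr;
   struct mars_msg { unsigned int msn; /* mar$msn */ };

   extern unsigned int csn;                       /* Cluster Sequence Number */
   extern unsigned int ssn;                       /* Server Sequence Number  */
   extern int is_registered_mcs(const struct atm_addr *src);
   extern void scvc_transmit(struct mars_msg *m); /* raw ServerControlVC send */

   /* mar$msn for a MARS_MULTI generated in response to a MARS_REQUEST. */
   unsigned int msn_for_multi(const struct atm_addr *requester)
   {
       return is_registered_mcs(requester) ? ssn : csn;
   }

   /* Every transmission on ServerControlVC carries, then increments,
      the SSN. */
   void send_on_scvc(struct mars_msg *m)
   {
       m->msn = ssn;
       scvc_transmit(m);
       ssn++;
   }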
2428 MCSs are expected to track and use the SSNs in an analogous manner to 2429 the way endpoints use the CSN in section 5.1 (to trigger revalidation 2430 of group membership information).

2432 A MARS should be carefully designed to minimise the possibility of 2433 the SSN jumping unnecessarily. Under normal operation only MCSs that 2434 are affected by transient link problems will miss mar$msn updates and 2435 be forced to revalidate. If the MARS itself glitches it will be 2436 inundated with requests for a period as every MCS attempts to 2437 revalidate.

2439 6.3 Why global sequence numbers?

2441 The CSN and SSN are global within the context of a given protocol 2442 (e.g. IPv4, mar$pro = 0x800). They count ClusterControlVC and 2443 ServerControlVC activity without reference to the multicast group(s) 2444 involved. This may be perceived as a limitation, because there is no 2445 way for cluster members or multicast servers to isolate exactly which 2446 multicast group they may have missed an update for. An alternative 2447 would have been to provide a per-group sequence number.

2449 Unfortunately per-group sequence numbers are not practical. The 2450 current mechanism allows sequence information to be piggy-backed onto 2451 MARS messages already in transit for other reasons. The ability to 2452 specify blocks of multicast addresses with a single MARS_JOIN or 2453 MARS_LEAVE means that a single message can refer to membership change 2454 for multiple groups simultaneously. A single mar$msn field cannot 2455 provide meaningful information about each group's sequence. Multiple 2456 mar$msn fields would have been unwieldy.

2458 Any MARS or cluster member that supports different protocols MUST 2459 keep separate mapping tables and sequence numbers for each protocol.

2461 6.4 Redundant/Backup MARS Architectures.

2463 If backup MARSs exist for a given cluster then mechanisms are needed 2464 to ensure consistency between their mapping tables and those of the 2465 active, current MARS.

2467 (Cluster members will consider backup MARSs to exist if they have 2468 been configured with a table of MARS addresses, or the regular 2469 MARS_REDIRECT_MAP messages contain a list of 2 or more addresses.)

2471 The definition of a MARS synchronisation protocol is beyond the 2472 current scope of this document, and is expected to be the subject of 2473 further research work. However, the following observations may be 2474 made:

2476 MARS_REDIRECT_MAP messages exist, enabling one MARS to force 2477 endpoints to move to another MARS (e.g. in the aftermath of a MARS 2478 failure, the chosen backup MARS will eventually wish to hand 2479 control of the cluster over to the main MARS when it is 2480 functioning properly again).

2482 Cluster members and MCSs do not need to start up with knowledge of 2483 more than one MARS, provided that MARS correctly issues 2484 MARS_REDIRECT_MAP messages with the full list of MARSs for that 2485 cluster.

2487 Any mechanism for synchronising backup MARSs (and coping with the 2488 aftermath of MARS failures) should be compatible with the cluster 2489 member behaviour described in this document.

2491 7. How an MCS utilises a MARS.

2493 When an MCS supports a multicast group it acts as a proxy cluster 2494 endpoint for the senders to the group. It also behaves in an 2495 analogous manner to a sender, managing a single outgoing point to 2496 multipoint VC to the real group members.

2498 Detailed descriptions of possible MCS architectures are beyond the 2499 scope of this document.
This section will outline the main issues.

2501 7.1 Association with a particular Layer 3 group.

2503 When an MCS issues a MARS_MSERV it forces all senders to the 2504 specified layer 3 group to terminate their VCs on the supplied source 2505 ATM address.

2507 The simplest MCS architecture involves taking incoming AAL_SDUs and 2508 simply flipping them back out on a single point to multipoint VC. Such 2509 an MCS cannot support more than one group at once, as it has no way 2510 to differentiate between traffic destined for different groups. 2511 Using this architecture, a physical node would provide MCS support 2512 for multiple groups by creating multiple logical instances of the 2513 MCS, each with different ATM Addresses (e.g. a different SEL value in 2514 the node's NSAPA).

2516 A slightly more complex approach would be to add minimal layer 3 2517 specific processing into the MCS. This would look inside the received 2518 AAL_SDUs and determine which layer 3 group they are destined for. A 2519 single instance of such an MCS might register its ATM Address with 2520 the MARS for multiple layer 3 groups, and manage multiple independent 2521 outgoing point to multipoint VCs (one for each group).

2523 When an MCS starts up it MUST register with the MARS as described in 2524 section 6.2.3, identifying the protocol it supports with the mar$pro 2525 field of the MARS_MSERV. This also applies to logical MCSs, even if 2526 they share the same physical ATM interface. This is important so that 2527 the MARS can react to the loss of an MCS when it drops off 2528 ServerControlVC. (One consequence is that 'simple' MCS architectures 2529 end up with one ServerControlVC member per group. MCSs with layer 3 2530 specific processing may support multiple groups while still only 2531 registering as one member of ServerControlVC.)

2533 An MCS MUST NOT share the same ATM address as a cluster member, 2534 although it may share the same physical ATM interface.

2536 7.2 Termination of incoming VCs.

2538 An MCS MUST terminate unidirectional VCs in the same manner as a 2539 cluster member. (e.g. terminate on an LLC entity when LLC/SNAP 2540 encapsulation is used, as described in RFC 1755 for unicast 2541 endpoints.)

2543 7.3 Management of outgoing VC.

2545 An MCS MUST establish and manage its outgoing point to multipoint VC 2546 as a cluster member does (section 5.1).

2548 MARS_REQUEST is used by the MCS to establish the initial leaf nodes 2549 for the MCS's outgoing point to multipoint VC. After the VC is 2550 established, the MCS reacts to MARS_SJOINs and MARS_SLEAVEs in the 2551 same way a cluster member reacts to MARS_JOINs and MARS_LEAVEs.

2553 The MCS tracks the Server Sequence Number from the mar$msn fields of 2554 messages from the MARS, and revalidates its outgoing point to 2555 multipoint VC(s) when a sequence number jump occurs.

2557 7.4 Use of a backup MARS.

2559 The MCS uses the same approach to backup MARSs as a cluster member 2560 (section 5.4), tracking MARS_REDIRECT_MAP messages on 2561 ServerControlVC.
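The preceding points may be pulled together into the skeleton of an MCS control path, sketched below in illustrative C. All types and helper routines are hypothetical, the SSN tracking is simplified, and no particular MCS architecture is implied.

   /* Illustrative sketch only; not a complete MCS. */

   #define MARS_SJOIN         8
   #define MARS_SLEAVE        9
   #define MARS_REDIRECT_MAP 12

   struct mcs_state { unsigned int last_ssn; };
   struct mars_msg  { int op; unsigned int msn; };

   extern void register_with_mars(struct mcs_state *s);  /* MARS_MSERV, register = 1 */
   extern void mserv_for_group(struct mcs_state *s);     /* MARS_MSERV per group     */
   extern void build_outgoing_vc(struct mcs_state *s);   /* MARS_REQUEST/MARS_MULTI  */
   extern void revalidate_outgoing_vc(struct mcs_state *s);
   extern void add_leaf(struct mcs_state *s, struct mars_msg *m);   /* L_MULTI_ADD  */
   extern void drop_leaf(struct mcs_state *s, struct mars_msg *m);  /* L_MULTI_DROP */
   extern void note_mars_list(struct mcs_state *s, struct mars_msg *m);
   extern struct mars_msg *next_mars_message(struct mcs_state *s);

   void mcs_control_loop(struct mcs_state *mcs)
   {
       register_with_mars(mcs);
       mserv_for_group(mcs);
       build_outgoing_vc(mcs);

       for (;;) {
           struct mars_msg *m = next_mars_message(mcs);

           /* Simplified SSN tracking: a jump in mar$msn suggests some
              ServerControlVC traffic was missed, so revalidate. */
           if (m->msn != mcs->last_ssn + 1 && m->msn != mcs->last_ssn)
               revalidate_outgoing_vc(mcs);
           mcs->last_ssn = m->msn;

           switch (m->op) {
           case MARS_SJOIN:        add_leaf(mcs, m);       break;
           case MARS_SLEAVE:       drop_leaf(mcs, m);      break;
           case MARS_REDIRECT_MAP: note_mars_list(mcs, m); break;
           default:                                        break;
           }
       }
   }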
2563 8. Support for IP multicast routers.

2565 Multicast routers are required for the propagation of multicast 2566 traffic beyond the constraints of a single cluster (inter-cluster 2567 traffic). (There is a sense in which they are multicast servers 2568 acting at the next higher layer, with clusters, rather than 2569 individual endpoints, as their abstract sources and destinations.)

2571 Multicast routers typically participate in higher layer multicast 2572 routing algorithms and policies that are beyond the scope of this 2573 memo (e.g. DVMRP [5] in the IPv4 environment).

2575 It is assumed that the multicast routers will be implemented over the 2576 same sort of IP/ATM interface that a multicast host would use. Their 2577 IP/ATM interfaces will register with the MARS as cluster 2578 members, joining and leaving multicast groups as necessary. As noted 2579 in section 5, multiple logical 'endpoints' may be implemented over a 2580 single physical ATM interface. Routers use this approach to provide 2581 interfaces into each cluster they will be routing between.

2583 The rest of this section will assume a simple IPv4 scenario where the 2584 scope of a cluster has been limited to a particular LIS that is part 2585 of an overlaid IP network. Not all members of the LIS are necessarily 2586 registered cluster members (you may have unicast-only hosts in the 2587 LIS).

2589 8.1 Forwarding into a Cluster.

2591 If the multicast router needs to transmit a packet to a group within 2592 the cluster, its IP/ATM interface opens a VC in the same manner as a 2593 normal host would. Once a VC is open, the router watches for 2594 MARS_JOIN and MARS_LEAVE messages and responds to them as a normal 2595 host would.

2597 The multicast router's transmit side MUST implement inactivity timers 2598 to shut down idle outgoing VCs, as for normal hosts.

2600 As with a normal host, the multicast router does not need to be a 2601 member of a group it is sending to.

2603 8.2 Joining in 'promiscuous' mode.

2605 Once registered and initialised, the simplest model of IPv4 multicast 2606 router operation is for it to issue a MARS_JOIN encompassing the 2607 entire Class D address space. In effect it becomes 'promiscuous', as 2608 it will be a leaf node to all present and future multipoint VCs 2609 established to IPv4 groups on the cluster.

2611 How a router chooses which groups to propagate outside the cluster is 2612 beyond the scope of this document.

2614 Consistent with RFC 1112, IP multicast routers may retain the use of 2615 IGMP Query and IGMP Report messages to ascertain group membership. 2616 However, certain optimisations are possible, and are described in 2617 section 8.5.

2619 8.3 Forwarding across the cluster.

2621 Under some circumstances the cluster may simply be another hop 2622 between IP subnets that have participants in a multicast group.

2624 [LAN.1] ----- IPmcR.1 -- [cluster/LIS] -- IPmcR.2 ----- [LAN.2]

2626 LAN.1 and LAN.2 are subnets (such as Ethernet) with attached hosts 2627 that are members of group X.

2629 IPmcR.1 and IPmcR.2 are multicast routers with interfaces to the LIS.

2631 A traditional solution would be to treat the LIS as a unicast subnet, 2632 and use tunneling routers. However, this would not allow hosts on the 2633 LIS to participate in the cross-LIS traffic.

2635 Assume IPmcR.1 is receiving packets promiscuously on its LAN.1 2636 interface. Assume further it is configured to propagate multicast 2637 traffic to all attached interfaces. In this case that means the LIS.

2639 When a packet for group X arrives on its LAN.1 interface, IPmcR.1 2640 simply sends the packet to group X on the LIS interface as a normal 2641 host would (issuing a MARS_REQUEST for group X, creating the VC, and 2642 sending the packet).
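The transmit side behaviour just described might be sketched as follows. This is illustrative only; the interface, VC, and helper names are hypothetical.

   /* Illustrative sketch only; all names are hypothetical. */

   #include <stdint.h>

   typedef uint32_t ip_addr_t;             /* IPv4 group address         */

   struct ipatm_if;                        /* logical IP/ATM interface   */
   struct out_vc;                          /* outgoing pt to mpt VC      */
   struct ip_pkt;
   struct mars_multi;

   extern struct out_vc     *lookup_group_vc(struct ipatm_if *i, ip_addr_t g);
   extern struct mars_multi *mars_request(struct ipatm_if *i, ip_addr_t g);
   extern struct out_vc     *open_pt_to_mpt_vc(struct ipatm_if *i,
                                               const struct mars_multi *m);
   extern void send_llc_snap_encap(struct out_vc *vc, const struct ip_pkt *p);
   extern void restart_idle_timer(struct out_vc *vc);

   void forward_into_cluster(struct ipatm_if *ifc, const struct ip_pkt *pkt,
                             ip_addr_t group)
   {
       struct out_vc *vc = lookup_group_vc(ifc, group);

       if (vc == NULL) {
           /* Resolve the group, then open a point to multipoint VC to
              the returned leaf nodes (host map or MCS). */
           struct mars_multi *multi = mars_request(ifc, group);
           if (multi == NULL)
               return;                     /* MARS_NAK: no members       */
           vc = open_pt_to_mpt_vc(ifc, multi);
       }

       send_llc_snap_encap(vc, pkt);       /* data path encap, section 5.5 */
       restart_idle_timer(vc);             /* idle VCs are shut down       */
   }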
2644 Assuming IPmcR.2 initialised itself with the MARS as a member of the 2645 entire Class D space, it will have been returned as a member of X 2646 even if no other nodes on the LIS were members. All packets for group 2647 X received on IPmcR.2's LIS interface may be retransmitted on LAN.2. 2649 If IPmcR.1 is similarly initialised the reverse process will apply 2650 for multicast traffic from LAN.2 to LAN.1, for any multicast group. 2651 The benefit of this scenario is that cluster members within the LIS 2652 may also join and leave group X at anytime. 2654 8.4 Joining in 'semi-promiscuous' mode. 2656 Both unicast and multicast IP routers have a common problem - 2657 limitations on the number of AAL contexts available at their ATM 2658 interfaces. Being 'promiscuous' in the RFC 1112 sense means that for 2659 every M hosts sending to N groups, a multicast router's ATM interface 2660 will have M*N incoming reassembly engines tied up. 2662 It is not hard to envisage situations where a number of multicast 2663 groups are active within the LIS but are not required to be 2664 propagated beyond the LIS itself. An example might be a distributed 2665 simulation system specifically designed to use the high speed IP/ATM 2666 environment. There may be no practical way its traffic could be 2667 utilised on 'the other side' of the multicast router, yet under the 2668 conventional scheme the router would have to be a leaf to each 2669 participating host anyway. 2671 As this problem occurs below the IP layer, it is worth noting that 2672 'scoping' mechanisms at the IP multicast routing level do not provide 2673 a solution. An IP level scope would still result in the router's ATM 2674 interface receiving traffic on the scoped groups, only to drop it. 2676 In this situation the network administrator might configure their 2677 multicast routers to exclude sections of the Class D address space 2678 when issuing MARS_JOIN(s). Multicast groups that will never be 2679 propagated beyond the cluster will not have the router listed as a 2680 member, and the router will never have to receive (and simply ignore) 2681 traffic from those groups. 2683 Another scenario involves the product M*N exceeding the capacity of a 2684 single router's interface (especially if the same interface must also 2685 support a unicast IP router service). 2687 A network administrator may choose to add a second node, to function 2688 as a parallel IP multicast router. Each router would be configured to 2689 be 'promiscuous' over separate parts of the Class D address space, 2690 thus exposing themselves to only part of the VC load. This sharing 2691 would be completely transparent to IP hosts within the LIS. 2693 Restricted promiscuous mode does not break RFC 1112's use of IGMP 2694 Report messages. If the router is configured to serve a given block 2695 of Class D addresses, it will receive the IGMP Report. If the router 2696 is not configured to support a given block, then the existence of an 2697 IGMP Report for a group in that block is irrelevant to the router. 2698 All routers are able to track membership changes through the 2699 MARS_JOIN and MARS_LEAVE traffic anyway. (Section 8.5 discusses a 2700 better alternative to IGMP within a cluster.) 2702 Mechanisms and reasons for establishing these modes of operation are 2703 beyond the scope of this document. 2705 8.5 An alternative to IGMP Queries. 2707 An unfortunate aspect of IGMP is that it assumes multicasting of IP 2708 packets is a cheap and trivial event at the link layer. 
As a 2709 consequence, regular IGMP Queries are multicast by routers to group 2710 224.0.0.1. These queries are intended to trigger IGMP Replies by 2711 cluster members that have layer 3 members of particular groups.

2713 The MARS_GROUPLIST_REQUEST and MARS_GROUPLIST_REPLY messages were 2714 designed to allow routers to avoid actually transmitting IGMP Queries 2715 out into a cluster.

2717 Whenever the router's forwarding engine wishes to transmit an IGMP 2718 query, a MARS_GROUPLIST_REQUEST can be sent to the MARS instead. The 2719 resulting MARS_GROUPLIST_REPLY(s) (described in section 5.3) from the 2720 MARS carry all the information that the router would have ascertained 2721 from IGMP replies.

2723 It is RECOMMENDED that multicast routers utilise this MARS service to 2724 minimise IGMP traffic within the cluster.

2726 By default a MARS_GROUPLIST_REQUEST SHOULD specify the entire address 2727 space (e.g. <224.0.0.0, 239.255.255.255> in an IPv4 environment). 2728 However, routers serving part of the address space (as described in 2729 section 8.4) MAY choose to issue MARS_GROUPLIST_REQUESTs that specify 2730 only the subset of the address space they are serving.

2732 (On the surface it would also seem useful for multicast routers to 2733 track MARS_JOINs and MARS_LEAVEs that arrive with mar$flags.layer3grp 2734 set. These might be used in lieu of IGMP Reports, to provide the 2735 router with timely indication that a new layer 3 group member exists 2736 within the cluster. However, this only works on VC mesh supported 2737 groups, and is therefore NOT recommended).

2739 Appendix B discusses less elegant mechanisms for reducing the impact 2740 of IGMP traffic within a cluster, on the assumption that the IP/ATM 2741 interfaces to the cluster are being used by un-optimised IP 2742 multicasting code.

2744 8.6 CMIs across multiple interfaces.

2746 The Cluster Member ID is only unique within the Cluster managed by a 2747 given MARS. On the surface this might appear to leave us with a 2748 problem when a multicast router is routing between two or more 2749 Clusters using a single physical ATM interface. The router will 2750 register with two or more MARSs, and thereby acquire two or more 2751 independent CMIs. Given that each MARS has no reason to synchronise 2752 their CMI allocations, it is possible for a host in one cluster to 2753 have the same CMI as the router's interface to another Cluster. How 2754 does the router distinguish between its own reflected packets, and 2755 packets from that other host?

2757 The answer lies in the fact that routers (and hosts) actually 2758 implement logical IP/ATM interfaces over a single physical ATM 2759 interface. Each logical interface will have a unique ATM Address (e.g. 2760 an NSAP with different SELector fields, one for each logical 2761 interface).

2763 Each logical IP/ATM interface is configured with the address of a 2764 single MARS, attaches to only one cluster, and so has only one CMI to 2765 worry about. Each of the MARSs that the router is registered with 2766 will have been given a different ATM Address (corresponding to the 2767 different logical IP/ATM interfaces) in each registration MARS_JOIN.

2769 When hosts in a cluster add the router as a leaf node, they'll 2770 specify the ATM Address of the appropriate logical IP/ATM interface 2771 on the router in the L_MULTI_ADD message. Thus, each logical IP/ATM 2772 interface will only have to check and filter on CMIs assigned by its 2773 own MARS.

2775 In essence, the cluster differentiation is achieved by ensuring that 2776 logical IP/ATM interfaces are assigned different ATM Addresses.
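The resulting filtering rule is simple enough to show directly. The following sketch is illustrative only, and the structure names are hypothetical.

   /* Illustrative sketch only; structures are hypothetical. The pkt$cmi
      field is carried in the data path encapsulation of section 5.5. */

   #include <stdint.h>

   struct ipatm_if {
       uint16_t own_cmi;        /* CMI assigned by this interface's MARS */
   };

   struct mcast_encap {
       uint16_t cmi;            /* pkt$cmi stamped by the sender         */
   };

   /* Returns non-zero if the AAL_SDU should be passed up to layer 3. */
   int accept_incoming_sdu(const struct ipatm_if *ifc,
                           const struct mcast_encap *hdr)
   {
       /* Each logical IP/ATM interface compares pkt$cmi only against the
          CMI handed out by its own MARS; a match means the packet is one
          of its own reflected transmissions and is discarded. */
       return hdr->cmi != ifc->own_cmi;
   }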
2778 9. Multiprotocol applications of the MARS and MARS clients.

2780 A deliberate attempt has been made to describe the MARS and 2781 associated mechanisms in a manner independent of a specific higher 2782 layer protocol being run over the ATM cloud. The immediate 2783 application of this document will be in an IPv4 environment, and this 2784 is reflected by the focus of key examples. However, the mar$pro.type 2785 and mar$pro.snap fields in every MARS control message allow any 2786 higher layer protocol that has a 'short form' or 'long form' of 2787 protocol identification (section 4.3) to be supported by a MARS.

2789 Every MARS MUST implement entirely separate logical mapping tables 2790 and support for each protocol it serves. Every cluster member must 2791 interpret messages from the MARS in the context of the protocol type 2792 that the MARS message refers to.

2794 Every MARS and MARS client MUST treat Cluster Member IDs in the 2795 context of the protocol type carried in the MARS message or data 2796 packet containing the CMI.

2798 For example, IPv6 has been allocated an Ethertype of 0x86DD. This 2799 means the 'short form' of protocol identification must be used in the 2800 MARS control messages and the data path encapsulation (section 5.5). 2801 An IPv6 multicasting client sets the mar$pro.type field of every MARS 2802 message to 0x86DD. When carrying IPv6 addresses the mar$spln and 2803 mar$tpln fields are either 0 (for null or non-existent information) 2804 or 16 (for the full IPv6 address).

2806 Following the rules in section 5.5, an IPv6 data packet is 2807 encapsulated as:

2809 [0xAA-AA-03][0x00-00-5E][0x00-01][pkt$cmi][0x86DD][IPv6 packet]

2811 A host or endpoint interface that is using the same MARS to support 2812 the multicasting needs of multiple protocols MUST NOT assume their CMI 2813 will be the same for each protocol.

2815 10. Supplementary parameter processing.

2817 The mar$extoff field in the [Fixed header] indicates whether 2818 supplementary parameters are being carried by a MARS control message. 2819 This mechanism is intended to enable the addition of new 2820 functionality to the MARS protocol in later documents.

2822 Supplementary parameters are conveyed as a list of TLV (type, length, 2823 value) encoded information elements. The TLV(s) begin on the first 2824 32 bit boundary following the [Addresses] field in the MARS control 2825 message (e.g. after mar$tsa.N in a MARS_MULTI, after mar$max.N in a 2826 MARS_JOIN, etc).

2828 10.1 Interpreting the mar$extoff field.

2830 If the mar$extoff field is non-zero it indicates that a list of one 2831 or more TLVs has been appended to the MARS message. The first TLV 2832 is found by treating mar$extoff as an unsigned integer representing 2833 an offset (in octets) from the beginning of the MARS message (the MSB 2834 of the mar$afn field).

2836 As TLVs are 32 bit aligned the bottom 2 bits of mar$extoff are also 2837 reserved. A receiver MUST mask off these two bits before calculating 2838 the octet offset to the TLV list. A sender MUST set these two bits 2839 to zero.

2841 If mar$extoff is zero no TLVs have been appended.

2843 10.2 The format of TLVs.

2845 When they exist, TLVs begin on 32 bit boundaries, are multiples of 32 2846 bits in length, and form a sequential list terminated by a NULL TLV.
2848 The TLV structure is: 2850 [Type - 2 octets][Length - 2 octets][Value - n*4 octets] 2852 The Type subfield indicates how the contents of the Value subfield 2853 are to be interpreted. 2855 The Length subfield indicates the number of VALID octets in the Value 2856 subfield. Valid octets in the Value subfield start immediately after 2857 the Length subfield. The offset (in octets) from the start of this 2858 TLV to the start of the next TLV in the list is given by the 2859 following formula: 2861 offset = (length + 4 + ((4-(length & 3)) % 4)) 2863 (where % is the modulus operator) 2865 The Value subfield is padded with 0, 1, 2, or 3 octets to ensure the 2866 next TLV is 32 bit aligned. The padded locations MUST be set to zero. 2868 (For example, a TLV that needed only 5 valid octets of information 2869 would be 12 octets long. The Length subfield would hold the value 5, 2870 and the Value subfield would be padded out to 8 bytes. The 5 valid 2871 octets of information begin at the first octet of the Value 2872 subfield.) 2874 The Type subfield is formatted in the following way: 2876 | 1st octet | 2nd octet | 2877 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 2878 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2879 | x | y | 2880 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2882 The most significant 2 bits (Type.x) determine how a recipient should 2883 behave when it doesn't recognise the TLV type indicated by the lower 2884 14 bits (Type.y). The required behaviours are: 2886 Type.x = 0 Skip the TLV, continue processing the list. 2887 Type.x = 1 Stop processing, silently drop the MARS message. 2888 Type.x = 2 Stop processing, drop message, give error indication. 2889 Type.x = 3 Reserved. (currently treat as x = 0) 2891 (The error indication generated when Type.x = 2 SHOULD be logged in 2892 some locally significant fashion. Consequential MARS message activity 2893 in response to such an error condition will be defined in future 2894 documents.) 2896 The TLV type space (Type.y) is further subdivided to encourage use 2897 outside the IETF. 2899 0 Null TLV. 2900 0x0001 - 0x0FFF Reserved for the IETF. 2901 0x1000 - 0x11FF Allocated to the ATM Forum. 2902 0x1200 - 0x37FF Reserved for the IETF. 2903 0x3800 - 0x3FFF Experimental use. 2905 10.3 Processing MARS messages with TLVs. 2907 Supplementary parameters act as modifiers to the basic behaviour 2908 specified by the mar$op field of any given MARS message. 2910 If a MARS message arrives with a non-zero mar$extoff field its TLV 2911 list MUST be parsed before handling the MARS message in accordance 2912 with the mar$op value. Unrecognised TLVs MUST be handled as required 2913 by their Type.x value. 2915 How TLVs modify basic MARS operations will be mar$op and TLV 2916 specific. 2918 10.4 Initial set of TLV elements. 2920 Conformance with this document only REQUIRES the recognition of one 2921 TLV, the Null TLV. This terminates a list of TLVs, and MUST be 2922 present if mar$extoff is non-zero in a MARS message. It MAY be the 2923 only TLV present. 2925 The Null TLV is coded as: 2927 [0x00-00][0x00-00] 2929 Future documents will describe the formats, contents, and 2930 interpretations of additional TLVs. The minimal parsing requirements 2931 imposed by this document are intended to allow conformant MARS and 2932 MARS client implementations to deal gracefully and predictably with 2933 future TLV developments. 2935 11. Key Decisions and open issues. 
2937 The key decisions this document proposes:

2939 A Multicast Address Resolution Server (MARS) is proposed to 2940 co-ordinate and distribute mappings of ATM endpoint addresses to 2941 arbitrary higher layer 'multicast group addresses'. The specific 2942 case of IPv4 multicast is used as the example.

2944 The concept of 'clusters' is introduced to define the scope of a 2945 MARS's responsibility, and the set of ATM endpoints willing to 2946 participate in link level multicasting.

2948 A MARS is described with the functionality required to support 2949 intra-cluster multicasting using either VC meshes or ATM level 2950 multicast servers (MCSs).

2952 LLC/SNAP encapsulation of MARS control messages allows MARS and 2953 ATMARP traffic to share VCs, and allows partially co-resident MARS 2954 and ATMARP entities.

2956 New message types:

2958 MARS_JOIN, MARS_LEAVE, MARS_REQUEST. Allow endpoints to join, 2959 leave, and request the current membership list of multicast 2960 groups.

2962 MARS_MULTI. Allows multiple ATM addresses to be returned by the 2963 MARS in response to a MARS_REQUEST.

2965 MARS_MSERV, MARS_UNSERV. Allow multicast servers to register 2966 and deregister themselves with the MARS.

2968 MARS_SJOIN, MARS_SLEAVE. Allow MARS to pass on group membership 2969 changes to multicast servers.

2971 MARS_GROUPLIST_REQUEST, MARS_GROUPLIST_REPLY. Allow MARS to 2972 indicate which groups have actual layer 3 members. May be used 2973 to support IGMP in IPv4 environments, and similar functions in 2974 other environments.

2976 MARS_REDIRECT_MAP. Allows MARS to specify a set of backup MARS 2977 addresses.

2979 MARS_MIGRATE. Allows MARS to force cluster members to shift 2980 from VC mesh to MCS based forwarding tree in a single operation.

2982 'Wild card' MARS mapping table entries are possible, where a 2983 single ATM address is simultaneously associated with blocks of 2984 multicast group addresses.

2986 For the MARS protocol mar$op.version = 0. The complete set of MARS 2987 control messages and mar$op.type values is:

2989  1 MARS_REQUEST
2990  2 MARS_MULTI
2991  3 MARS_MSERV
2992  4 MARS_JOIN
2993  5 MARS_LEAVE
2994  6 MARS_NAK
2995  7 MARS_UNSERV
2996  8 MARS_SJOIN
2997  9 MARS_SLEAVE
2998 10 MARS_GROUPLIST_REQUEST
2999 11 MARS_GROUPLIST_REPLY
3000 12 MARS_REDIRECT_MAP
3001 13 MARS_MIGRATE

3003 A number of issues are left open at this stage, and are likely to be 3004 the subject of on-going research and additional documents that build 3005 upon this one.

3007 The specified endpoint behaviour allows the use of 3008 redundant/backup MARSs within a cluster. However, no 3009 specifications yet exist on how these MARSs co-ordinate amongst 3010 themselves. (The default is to only have one MARS per cluster.)

3012 The specified endpoint behaviour and MARS service allows the use 3013 of multiple MCSs per group. However, no specifications yet exist 3014 on how this may be used, or how these MCSs co-ordinate amongst 3015 themselves. Until further work is done on MCS co-ordination 3016 protocols the default is to only have one MCS per group.

3018 The MARS relies on the cluster member dropping off 3019 ClusterControlVC if the cluster member dies. It is not clear if 3020 additional mechanisms are needed to detect and delete 'dead' 3021 cluster members.

3023 Supporting layer 3 'broadcast' as a special case of multicasting 3024 (where the 'group' encompasses all cluster members) has not been 3025 explicitly discussed.
3027 Supporting layer 3 'unicast' as a special case of multicasting 3028 (where the 'group' is a single cluster member, identified by the 3029 cluster member's unicast protocol address) has not been explicitly 3030 discussed.

3032 The future addition of ATM Group Addresses and Leaf Initiated 3033 Join to the ATM Forum's UNI specification has not been addressed. 3034 (However, the problems identified in this document with respect to 3035 VC scarcity and impact on AAL contexts will not be fixed by such 3036 developments in the signalling protocol.)

3038 Possible modifications to the interpretation of the mar$hrdrsv and 3039 mar$afn fields in the Fixed header, based on different values for 3040 mar$op.version, are for further study.

3042 Security Considerations

3044 Security considerations are not addressed in this document.

3046 Acknowledgments

3048 The discussions within the IP over ATM Working Group have helped 3049 clarify the ideas expressed in this document. John Moy (Cascade 3050 Communications Corp.) initially suggested the idea of wild-card 3051 entries in the ARP Server. Drew Perkins (Fore Systems) provided 3052 rigorous and useful critique of early proposed mechanisms for 3053 distributing and validating group membership information. Susan 3054 Symington (and co-workers at MITRE Corp., Don Chirieleison, and Bill 3055 Barns) clearly articulated the need for multicast server support, 3056 proposed a solution, and challenged earlier block join/leave 3057 mechanisms. John Shirron (Fore Systems) provided useful improvements 3058 on my original revalidation procedures.

3060 Susan Symington and Bryan Gleeson (Adaptec) independently championed 3061 the need for the service provided by MARS_GROUPLIST_REQUEST/REPLY. 3062 The new encapsulation scheme arose from WG discussions, captured by 3063 Bryan Gleeson in an interim Internet Draft (with Keith McCloghrie 3064 (Cisco), Andy Malis (Ascom Nexion), and Andrew Smith (Bay Networks) 3065 as key contributors). James Watt (Newbridge) and Joel Halpern 3066 (Newbridge) motivated the development of a more multiprotocol MARS 3067 control message format, evolving it away from its original ATMARP 3068 roots. They also motivated the development of Type #1 and Type #2 3069 data path encapsulations. Rajesh Talpade (Georgia Tech) helped 3070 clarify the need for the MARS_MIGRATE function.

3072 Maryann Maher (ISI) provided valuable sanity and implementation 3073 checking during the latter stages of the document's development.

3075 Finally, Jim Rubas (IBM) supplied the MARS pseudo-code in Appendix F 3076 and also provided detailed proof-reading in the latter stages of the 3077 document's development.

3079 Author's Address

3081 Grenville Armitage 3082 Bellcore, 445 South Street 3083 Morristown, NJ, 07960 3084 USA

3086 Email: gja@thumper.bellcore.com 3087 Ph. +1 201 829 2635

3089 References

3090 [1] S. Deering, "Host Extensions for IP Multicasting", RFC 1112, 3091 Stanford University, August 1989.

3093 [2] Heinanen, J., "Multiprotocol Encapsulation over ATM Adaptation 3094 Layer 5", RFC 1483, USC/Information Science Institute, July 1993.

3096 [3] Laubach, M., "Classical IP and ARP over ATM", RFC 1577, 3097 Hewlett-Packard Laboratories, December 1993.

3099 [4] ATM Forum, "ATM User Network Interface (UNI) Specification 3100 Version 3.1", ISBN 0-13-393828-X, Prentice Hall, Englewood Cliffs, 3101 NJ, June 1995.

3103 [5] D. Waitzman, C. Partridge, S. Deering, "Distance Vector Multicast 3104 Routing Protocol", RFC 1075, November 1988.

3106 [6] M. Perez, F. Liaw, D.
Grossman, A. Mankin, E. Hoffman, A. Malis, 3107 "ATM Signaling Support for IP over ATM", RFC 1755, February 1995.

3109 [7] M. Borden, E. Crawley, B. Davie, S. Batsell, "Integration of 3110 Real-time Services in an IP-ATM Network Architecture.", RFC 1821, 3111 August 1995.

3113 [8] ATM Forum, "ATM User-Network Interface Specification Version 3114 3.0", Englewood Cliffs, NJ: Prentice Hall, September 1993.

3116 Appendix A. Hole punching algorithms.

3118 Implementations are entirely free to comply with the body of this 3119 memo in any way they see fit. This appendix is purely for 3120 clarification.

3122 A MARS implementation might pre-construct a set of <min, max> pairs 3123 (P) that reflects the entire Class D space, excluding any addresses 3124 currently supported by multicast servers. The <min> field of the 3125 first pair MUST be 224.0.0.0, and the <max> field of the last pair 3126 MUST be 239.255.255.255. The first and last pair may be the same. 3127 This set is updated whenever a multicast server registers or 3128 deregisters.

3130 When the MARS must perform 'hole punching' it might consider the 3131 following algorithm:

3133 Assume the MARS_JOIN/LEAVE received by the MARS from the cluster 3134 member specified the block <Emin, Emax>.

3136 Assume Pmin(N) and Pmax(N) are the <min> and <max> fields from the 3137 Nth pair in the MARS's current set P.

3139 Assume set P has K pairs. Pmin(1) MUST equal 224.0.0.0, and 3140 Pmax(K) MUST equal 239.255.255.255. (If K == 1 then no hole 3141 punching is required).

3143 Execute pseudo-code:

3145 create copy of set P, call it set C.

3147 index1 = 1; 3148 while (Pmax(index1) <= Emin) 3149 index1++;

3151 index2 = K; 3152 while (Pmin(index2) >= Emax) 3153 index2--;

3155 if (index1 > index2) 3156 Exit, as the hole-punched set is null.

3158 if (Pmin(index1) < Emin) 3159 Cmin(index1) = Emin;

3161 if (Pmax(index2) > Emax) 3162 Cmax(index2) = Emax;

3164 The pairs C(index1) through C(index2) are the required 'hole punched' set of address blocks.

3166 The resulting set C retains all the MARS's pre-constructed 'holes' 3167 covering the multicast servers, but will have been pruned to cover 3168 the section of the Class D space specified by the originating host's 3169 <Emin, Emax> values.

3171 The host end should keep a table, H, of open VCs in ascending order 3172 of Class D address.

3174 Assume H(x).addr is the Class D address associated with VC.x. 3175 Assume H(x).addr < H(x+1).addr.

3177 The pseudo code for updating VCs based on an incoming JOIN/LEAVE 3178 might be:

3180 x = 1; 3181 N = 1;

3183 while (x < no. of VCs open) 3184 { 3185 while (H(x).addr > max(N)) 3186 { 3187 N++; 3188 if (N > no. of pairs in JOIN/LEAVE) 3189 return(0); 3190 }

3192 if ((H(x).addr <= max(N)) && 3193 (H(x).addr >= min(N))) 3194 perform_VC_update(); 3195 x++; 3196 }

3198 Appendix B. Minimising the impact of IGMP in IPv4 environments.

3200 Implementing any part of this appendix is not required for 3201 conformance with this document. It is provided solely to document 3202 issues that have been identified.

3204 The intent of section 5.1 is for cluster members to only have 3205 outgoing point to multipoint VCs when they are actually sending data 3206 to a particular multicast group. However, in most IPv4 environments 3207 the multicast routers attached to a cluster will periodically issue 3208 IGMP Queries to ascertain if particular groups have members. The 3209 current IGMP specification attempts to avoid having every group 3210 member respond by insisting that each group member wait a random 3211 period, and responding if no other member has responded before them.
3212 The IGMP reply is sent to the multicast address of the group being 3213 queried.

3215 Unfortunately, as it stands the IGMP algorithm will be a nuisance for 3216 cluster members that are essentially passive receivers within a given 3217 multicast group. It is just as likely that a passive member, with no 3218 outgoing VC already established to the group, will decide to send an 3219 IGMP reply - causing a VC to be established where there was no need 3220 for one. This is not a fatal problem for small clusters, but will 3221 seriously impact the ability of a cluster to scale.

3223 The most obvious solution is for routers to use the 3224 MARS_GROUPLIST_REQUEST and MARS_GROUPLIST_REPLY messages, as 3225 described in section 8.5. This would remove the regular IGMP Queries, 3226 resulting in cluster members only sending an IGMP Report when they 3227 first join a group.

3229 Alternative solutions do exist. One would be to modify the IGMP reply 3230 algorithm, for example:

3232 If the group member has a VC open to the group, proceed as per RFC 3233 1112 (picking a random reply delay between 0 and 10 seconds).

3235 If the group member does not have a VC already open to the group, 3236 pick a random reply delay between 10 and 20 seconds instead, and 3237 then proceed as per RFC 1112.

3239 If even one group member is sending to the group at the time the IGMP 3240 Query is issued then all the passive receivers will find the IGMP 3241 Reply has been transmitted before their delay expires, so no new VC 3242 is required. If all group members are passive at the time of the IGMP 3243 Query then a response will eventually arrive, but 10 seconds later 3244 than under conventional circumstances.

3246 The preceding solution requires re-writing existing IGMP code, and 3247 implies the ability of the IGMP entity to ascertain the status of VCs 3248 on the underlying ATM interface. This is not likely to be available 3249 in the short term.

3251 One short term solution is to provide something like the preceding 3252 functionality with a 'hack' at the IP/ATM driver level within cluster 3253 members. Arrange for the IP/ATM driver to snoop inside IP packets 3254 looking for IGMP traffic. If an IGMP packet is accepted for 3255 transmission, the IP/ATM driver can buffer it locally if there is no 3256 VC already active to that group. A 10 second timer is started, and if 3257 an IGMP Reply for that group is received from elsewhere on the 3258 cluster the timer is reset. If the timer expires, the IP/ATM driver 3259 then establishes a VC to the group as it would for a normal IP 3260 multicast packet.

3262 Some network implementors may find it advantageous to configure a 3263 multicast server to support the group 224.0.0.1, rather than rely on 3264 a mesh. Given that IP multicast routers regularly send IGMP queries 3265 to this address, a mesh will mean that each router will permanently 3266 consume an AAL context within each cluster member. In clusters served 3267 by multiple routers, the VC load within switches in the underlying ATM 3268 network will become a scaling problem.

3270 Finally, if a multicast server is used to support 224.0.0.1, another 3271 ATM driver level hack becomes a possible solution to IGMP Reply 3272 traffic. The ATM driver may choose to grab all outgoing IGMP packets 3273 and send them out on the VC established for sending to 224.0.0.1, 3274 regardless of the Class D address the IGMP message was actually for. 3275 Given that all hosts and routers must be members of 224.0.0.1, the 3276 intended recipients will still receive the IGMP Replies. The negative 3277 impact is that all cluster members will receive the IGMP Replies.
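The driver level 'hack' described above might be sketched as follows. Every type and helper routine is hypothetical, and this is only one of many possible arrangements.

   /* Illustrative sketch only; all names are hypothetical. */

   #include <stdint.h>

   typedef uint32_t ip_addr_t;
   struct ipatm_if;
   struct ip_pkt;
   struct out_vc;

   extern int  is_igmp_report(const struct ip_pkt *p);
   extern ip_addr_t igmp_group(const struct ip_pkt *p);
   extern struct out_vc *lookup_group_vc(struct ipatm_if *i, ip_addr_t g);
   extern void buffer_report(struct ipatm_if *i, ip_addr_t g, struct ip_pkt *p);
   extern int  report_pending(struct ipatm_if *i, ip_addr_t g);
   extern struct ip_pkt *dequeue_report(struct ipatm_if *i, ip_addr_t g);
   extern void start_timer(struct ipatm_if *i, ip_addr_t g, int seconds);
   extern void normal_transmit(struct ipatm_if *i, struct ip_pkt *p);

   /* Outbound path: hold back IGMP Reports for groups with no open VC. */
   void driver_output(struct ipatm_if *ifc, struct ip_pkt *pkt)
   {
       if (is_igmp_report(pkt)) {
           ip_addr_t group = igmp_group(pkt);
           if (lookup_group_vc(ifc, group) == NULL) {
               buffer_report(ifc, group, pkt);
               start_timer(ifc, group, 10);
               return;
           }
       }
       normal_transmit(ifc, pkt);
   }

   /* Inbound path: another member answered, so keep deferring ours. */
   void saw_igmp_reply(struct ipatm_if *ifc, ip_addr_t group)
   {
       if (report_pending(ifc, group))
           start_timer(ifc, group, 10);    /* reset the 10 second timer */
   }

   /* Timer expiry: nobody else replied; transmit the buffered Report as
      a normal multicast packet (which will open the VC). */
   void hold_timer_expired(struct ipatm_if *ifc, ip_addr_t group)
   {
       normal_transmit(ifc, dequeue_report(ifc, group));
   }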
3279 Appendix C. Further comments on 'Clusters'.

3281 The cluster concept was introduced in section 1 for two reasons. The 3282 better known term of Logical IP Subnet is both very IP specific, 3283 and constrained to unicast routing boundaries. As the architecture 3284 described in this document may be re-used in non-IP environments a 3285 more neutral term was needed. As the needs of multicasting are not 3286 always bound by the same scopes as unicasting, it was not immediately 3287 obvious that a priori limiting ourselves to LISs was beneficial in the 3288 long term.

3290 It must be stressed that Clusters are purely administrative entities. 3291 You choose their size (i.e. the number of endpoints that register 3292 with the same MARS) based on your multicasting needs, and the 3293 resource consumption you are willing to put up with. The larger the 3294 number of ATM attached hosts you require multicast support for, the 3295 more individual clusters you might choose to establish (along with 3296 multicast routers to provide inter-cluster traffic paths).

3298 Given that not all the hosts in any given LIS may require multicast 3299 support, it becomes conceivable that you might assign a single MARS 3300 to support hosts from across multiple LISs. In effect you have a 3301 cluster covering multiple LISs, and have achieved 'cut through' 3302 routing for multicast traffic. Under these circumstances increasing 3303 the geographical size of a cluster might be considered a good thing.

3305 However, practical considerations limit the size of clusters. Having 3306 a cluster span multiple LISs may not always be a particular 'win' 3307 situation. As the number of multicast capable hosts in your LISs 3308 increases it becomes more likely that you'll want to constrain a 3309 cluster's size and force multicast traffic to aggregate at multicast 3310 routers scattered across your ATM cloud.

3312 Finally, multi-LIS clusters require a degree of care when deploying 3313 IP multicast routers. Under the Classical IP model you need unicast 3314 routers on the edges of LISs. Under the MARS architecture you only 3315 need multicast routers at the edges of clusters. If your cluster 3316 spans multiple LISs, then the multicast routers will perceive 3317 themselves to have a single interface that is simultaneously attached 3318 to multiple unicast subnets. Whether this situation will work depends 3319 on the inter-domain multicast routing protocols you use, and your 3320 multicast router's ability to understand the new relationship between 3321 unicast and multicast topologies.

3323 In the absence of further research in this area, networks deployed in 3324 conformance with this document MUST make their IP cluster and IP LIS 3325 coincide, so as to avoid these complications.

3327 Appendix D. TLV list parsing algorithm.

3329 The following pseudo-code represents how the TLV list format 3330 described in section 10 could be handled by a MARS or MARS client.

3332 list = (mar$extoff & 0xFFFC);

3334 if (list == 0) exit;

3336 list = list + message_base;

3338 while (list->Type.y != 0) 3339 { 3340 switch (list->Type.y) 3341 { 3342 default: 3343 { 3344 if (list->Type.x == 0) break;

3346 if (list->Type.x == 1) exit;

3348 if (list->Type.x == 2) log-error-and-exit; 3349 }

3351 [...other handling goes here..]

3353 } 3355 list += (list->Length + 4 + ((4-(list->Length & 3)) % 3356 4));

3358 }

3360 return;
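For further clarification, the following is one possible concrete C rendering of the same TLV walk. It is illustrative only; a real implementation would also bound the offset by the total message length.

   /* Illustrative sketch only. 'msg_base' points at the start of the
      MARS message (the MSB of the mar$afn field) in a contiguous
      buffer; TLV fields are read directly in network byte order. */

   #include <stdint.h>
   #include <stddef.h>

   #define TLV_X(t)  (((t) >> 14) & 0x3)        /* Type.x, top two bits  */
   #define TLV_Y(t)  ((t) & 0x3FFF)             /* Type.y, lower 14 bits */

   /* Returns 0 when the list was parsed, -1 when the message must be
      dropped (Type.x = 1 or 2 on an unrecognised TLV). */
   int walk_tlvs(const uint8_t *msg_base, uint16_t extoff)
   {
       size_t off = extoff & 0xFFFCu;           /* bottom 2 bits reserved */

       if (off == 0)
           return 0;                            /* no TLVs appended       */

       for (;;) {
           uint16_t type = (uint16_t)((msg_base[off] << 8) | msg_base[off + 1]);
           uint16_t len  = (uint16_t)((msg_base[off + 2] << 8) | msg_base[off + 3]);

           if (TLV_Y(type) == 0)
               return 0;                        /* Null TLV ends the list */

           switch (TLV_Y(type)) {
           /* ...recognised TLV types would be handled here... */
           default:
               if (TLV_X(type) == 1 || TLV_X(type) == 2)
                   return -1;                   /* drop (and log if x = 2) */
               break;                           /* x = 0 or 3: skip        */
           }

           /* Advance to the next 32 bit aligned TLV (section 10.2). */
           off += (size_t)len + 4 + ((4 - (len & 3)) % 4);
       }
   }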
3362 Appendix E. Summary of timer values.

3364 This appendix summarises the various timers and limits mentioned in the 3365 main body of the document. Values are specified in the following 3366 format: [x, y, z] indicating a minimum value of x, a recommended 3367 value of y, and a maximum value of z. A '-' will indicate that a 3368 category has no value specified. Values in minutes are followed by 3369 'min', values in seconds are followed by 'sec'.

3371 Idle time for MARS - MARS client pt to pt VC: 3372 [1 min, 20 min, -]

3374 Idle time for multipoint VCs from client. 3375 [1 min, 20 min, -]

3377 Allowed time between MARS_MULTI components. 3378 [-, -, 10 sec]

3380 Initial random L_MULTI_RQ/ADD retransmit timer range. 3381 [5 sec, -, 10 sec]

3383 Random time to set VC_revalidate flag. 3384 [1 sec, -, 10 sec]

3386 MARS_JOIN/LEAVE retransmit interval. 3387 [5 sec, 10 sec, -]

3389 MARS_JOIN/LEAVE retransmit limit. 3390 [-, -, 5]

3392 Random time to re-register with MARS. 3393 [1 sec, -, 10 sec]

3395 Force wait if MARS re-registration is looping. 3396 [1 min, -, -]

3398 Transmission interval for MARS_REDIRECT_MAP. 3399 [1 min, 1 min, 2 min]

3401 Limit for client to miss MARS_REDIRECT_MAPs. 3402 [-, -, 4 min]

3404 Appendix F. Pseudo code for MARS operation.

3406 Implementations are entirely free to comply with the body of this 3407 memo in any way they see fit. This appendix is purely for possible 3408 clarification.

3410 A MARS implementation might be built along the lines suggested in 3411 this pseudo-code.

3413 1. Main

3415 1.1 Initialisation

3417 Define a server list as the list of leaf nodes 3418 on ServerControlVC. 3419 Define a cluster list as the list of leaf nodes 3420 on ClusterControlVC. 3421 Define a host map as the list of hosts that are 3422 members of a group. 3423 Define a server map as the list of hosts (MCSs) 3424 that are serving a group. 3425 Read config file. 3426 Allocate message queues. 3427 Allocate internal tables. 3428 Set up passive open VC connection. 3429 Set up redirect_map timer. 3430 Establish logging.

3432 1.2 Message Processing

3434 Forever { 3435 If the message has a TLV then { 3436 If TLV is unsupported then { 3437 process as defined in TLV type field. 3438 } /* unknown TLV */ 3439 } /* TLV present */ 3440 Place incoming message in the queue. 3441 For (all messages in the queue) { 3442 If the message is not a JOIN/LEAVE/MSERV/UNSERV with 3443 mar$flags.register == 1 then { 3444 If the message source is (not a member of server list) && 3445 (not a member of cluster list) then { 3446 Drop the message silently. 3447 } 3448 } 3449 If (mar$pro.type is not supported) or 3450 (the ATM source address is missing) then { 3451 Continue.

3453 } 3454 Determine type of message. 3455 If an ERR_L_RELEASE arrives on ClusterControlVC then { 3456 Remove the endpoint's ATM address from all groups 3457 which it has joined. 3458 Release the CMI. 3459 Continue. 3460 } /* error on CCVC */ 3461 Call specific message handling routine. 3462 If redirect_map timer pops { 3463 Call MARS_REDIRECT_MAP message handling routine. 3464 } /* redirect timer pop */ 3465 } /* all msgs in the queue */ 3466 } /* forever loop */

3468 2. Message Handler

3470 2.1 Messages:

3472 - MARS_REQUEST

3474 Indicate no MARS_MULTI support of TLV. 3475 If the supported TLV is not NULL then { 3476 Indicate MARS_MULTI support of TLV. 3477 Process as required.
3478 } else { /* TLV NULL */ 3479 Indicate message to be sent on Private VC. 3480 If the message source is a member of server list then { 3481 If the group has a non-null host map then { 3482 Call MARS_MULTI with the host map for the group. 3483 } else { /* no group */ 3484 Call MARS_NAK message routine. 3485 } /* no group */ 3486 } else { /* source is cluster list */ 3487 If the group has a non-null server map then { 3488 Call MARS_MULTI with the server map for the group. 3489 } else { /* cluster member but no server map */ 3490 If the group has a non-null host map then { 3491 Call MARS_MULTI with the host map for the group. 3492 } else { /* no group */ 3493 Call MARS_NAK message routine. 3494 } /* no group */ 3495 } /* cluster member but no server map */ 3496 } /* source is a cluster list */ 3497 } /* TLV NULL */ 3498 If a message exists then { 3499 Send message as indicated. 3500 } 3501 Return.

3503 - MARS_MULTI

3505 Construct a MARS_MULTI for the specified map. 3506 If the param indicates TLV support then { 3507 Process the TLV as required. 3508 } 3509 Return.

3511 - MARS_JOIN

3513 If (mar$flags.copy != 0) silently ignore the message. 3514 If more than a single <min,max> pair is specified then 3515 silently ignore the message. 3516 Indicate message to be sent on private VC. 3517 If (mar$flags.register == 1) then { 3518 If the node is already a registered member of the cluster 3519 associated with protocol type then { /*previous register*/ 3520 Copy the existing CMI into the MARS_JOIN. 3521 } else { /* new register */ 3522 Add the node to ClusterControlVC. 3523 Add the node to cluster list. 3524 mar$cmi = obtain CMI. 3525 } /* new register */ 3526 } else { /* not a register */ 3527 If the group is a duplicate of a previous MARS_JOIN then { 3528 mar$msn = current csn. 3529 Indicate message to be sent on Private VC. 3530 } else { 3531 Indicate no message to be sent. 3532 If the message source is in server map then { 3533 Drop the message silently. 3534 } else { 3535 If the first <min,max> encompasses any group with 3536 a server map then { 3537 Call the Modified JOIN/LEAVE Processing routine. 3538 } else { 3539 If the MARS_JOIN is for a multi group then { 3540 Call the MultiGroup JOIN/LEAVE Processing Routine. 3541 } else { 3542 Indicate message to be sent on ClusterControlVC. 3543 } /* not for a multi group */ 3544 } /* group not handled by server */ 3545 } /* msg src not in server map */ 3546 Update internal tables. 3547 } /* not a duplicate */ 3548 } /* not a register */

3550 If a message exists then { 3551 mar$flags.copy = 1. 3552 Send message as indicated. 3553 } 3554 Return.

3556 - MARS_LEAVE

3558 If (mar$flags.copy != 0) silently ignore the message. 3559 If more than a single <min,max> pair is specified then 3560 silently ignore the message. 3561 Indicate message to be sent on ClusterControlVC. 3562 If (mar$flags.register == 1) then { /* deregistration */ 3563 Update internal tables to remove the member's ATM addr 3564 from all groups it has joined. 3565 Drop the endpoint from ClusterControlVC. 3566 Drop the endpoint from cluster list. 3567 Release the CMI. 3568 Indicate message to be sent on Private VC. 3569 } else { /* not a deregistration */ 3570 If the group is a duplicate of a previous MARS_LEAVE then { 3571 mar$msn = current csn. 3572 Indicate message to be sent on Private VC. 3573 } else { 3574 Indicate no message to be sent. 3575 If the first <min,max> encompasses any group with 3576 a server map then { 3577 Call the Modified JOIN/LEAVE Processing routine.
3578 } else { 3579 If the MARS_LEAVE is for a multi group then { 3580 Call the MultiGroup JOIN/LEAVE Processing Routine. 3581 } else { 3582 Indicate message to be sent on ClusterControlVC. 3583 } 3584 } 3585 Update internal tables. 3586 } /* not a duplicate */ 3587 } /* not a deregistration */ 3588 If a message exists then { 3589 mar$flags.copy = 1. 3590 Send message as indicated. 3591 } 3592 Return. 3594 - MARS_MSERV 3596 If (mar$flags.register == 1) then { /* server register */ 3597 Add the endpoint as a leaf node to ServerControlVC. 3599 Add the endpoint to the server list. 3600 Indicate the message to be sent on Private VC. 3601 mar$cmi = 0. 3602 } else { /* not a register */ 3603 If the source has not registered then { 3604 Drop and ignore the message. 3605 Indicate no message to be sent. 3606 } else { /* source is registered */ 3607 If MCS is already member of indicated server map { 3608 Indicate message to be sent on Private VC. 3609 mar$flags.layer3grp = 0; 3610 mar$flags.copy = 1. 3611 } else { /* New MCS to add. */ 3612 Add the server ATM addr to server map for group. 3613 Indicate message to be sent on ServerControlVC. 3614 Send message as indicated. 3615 Make a copy of the message. 3616 Indicate message to be sent on ClusterControlVC. 3617 If new server map was just created { 3618 Construct MARS_MIGRATE, with MCS as target. 3619 } else { 3620 Change the op code to MARS_JOIN. 3621 mar$flags.layer3grp = 0. 3622 mar$flags.copy = 1. 3623 } /* new server map */ 3624 } /* New MCS to add. */ 3625 } /* source is registered */ 3626 } /* not a register */ 3628 If a message exists then { 3629 Send message as indicated. 3630 } 3631 Return. 3633 - MARS_UNSERV 3635 If (mar$flags.register == 1) then { /* deregister */ 3636 Remove the ATM addr of the MCS from all server maps. 3637 If a server map becomes null then delete it. 3638 Remove the endpoint as a leaf of ServerControlVC. 3639 Remove the endpoint from server list. 3640 Indicate the message to be sent on Private VC. 3641 } else { /* not a deregister */ 3642 If the source is not a member of server list then { 3643 Drop and ignore the message. 3644 Indicate no message to be sent. 3645 } else { /* source is registered */ 3646 If MCS is not member of indicated server map { 3647 Indicate message to be sent on Private VC. 3648 mar$flags.layer3grp = 0; 3649 mar$flags.copy = 1. 3650 } else { /* MCS existed, must be removed. */ 3651 Remove ATM addr of the MCS from indicated server map. 3652 If a server map is null then delete it. 3653 Indicate the message to be sent on ServerControlVC. 3654 Send message as indicated. 3655 Make a copy of the message. 3656 Change the op code to MARS_LEAVE. 3657 Indicate message (copy) to be sent on ClusterControlVC. 3658 mar$flags.layer3grp = 0; 3659 mar$flags.copy = 1. 3660 } /* MCS existed, must be removed. */ 3661 } /* source is registered */ 3662 } /* not a deregister */ 3663 If a message exists then { 3664 Send message as indicated. 3665 } 3666 Return. 3668 - MARS_NAK 3670 Build command. 3671 Return. 3673 - MARS_GROUPLIST_REQUEST 3675 If (mar$pnum != 1) then Return. 3676 Call MARS_GROUPLIST_REPLY with the range and output VC. 3677 Return. 3679 - MARS_GROUPLIST_REPLY 3681 Build command for specified range. 3682 Indicate message to be sent on specified VC. 3683 Send message as indicated. 3684 Return. 3686 - MARS_REDIRECT_MAP 3688 Include the MARSs own address in the message. 3689 If there are backup MARSs then include their addresses. 3690 Indicate MARS_REDIRECT_MAP is to be sent on ClusterControlVC. 
3691 Send message back as indicated. 3692 Return.

3694 3. Send Message Handler

3696 If (the message is going out ClusterControlVC) && 3697 (a new csn is required) then { 3698 mar$msn = obtain a CSN 3699 } 3700 If (the message is going out ServerControlVC) && 3701 (a new ssn is required) then { 3702 mar$msn = obtain a SSN 3703 } 3704 Return.

3706 4. Number Generator

3708 4.1 Cluster Sequence Number

3710 Generate the next sequence number. 3711 Return.

3713 4.2 Server Sequence Number

3715 Generate the next sequence number. 3716 Return.

3718 4.3 CMI

3720 CMIs are allocated uniquely per registered cluster member 3721 within the context of a particular layer 3 protocol type. 3722 A single node may register multiple times if it supports 3723 multiple layer 3 protocols. 3724 The CMIs allocated for each such registration may or may 3725 not be the same. 3726 Generate a CMI for this protocol. 3727 Return.

3729 5. Modified JOIN/LEAVE Processing

3731 This routine processes JOIN/LEAVE when a server map exists.

3733 Make a copy of the message. 3734 Change the type of the copy to MARS_SJOIN. 3735 If the message is a MARS_LEAVE then { 3736 Change the type of the copy to MARS_SLEAVE. 3737 } 3738 mar$flags.copy = 1 (copy). 3739 Indicate the message to be sent on ServerControlVC. 3740 Send message (copy) as indicated. 3741 mar$flags.punched = 0 in the original message.

3743 Indicate the message to be sent on Private VC. 3744 Send message (original) as indicated. 3745 Hole punch the <min,max> group by excluding 3746 from the range those groups that are served by MCSs 3747 or which the joining (leaving) node is already 3748 (still) a member of due to it having previously 3749 issued a single group join. 3750 Indicate the (original) message to be sent on ClusterControlVC. 3751 If (number of holes punched > 0) then { /* punched holes */ 3752 In original message do { 3753 mar$flags.punched = 1. 3754 old punched list <- new punched list. 3755 } 3756 } /* punched holes */ 3757 mar$flags.copy = 1. 3758 Send message as indicated. 3759 Return.

3761 5.1 MultiGroup JOIN/LEAVE Processing

3763 This routine processes JOIN/LEAVE when a multi group exists.

3765 If (mar$flags.layer3grp) { 3766 Ignore this setting, consider it reset. 3767 } 3768 mar$flags.copy = 1. 3769 Make a copy of the message. 3770 From the copy hole punch the <min,max> group by 3771 excluding from the range those groups that this 3772 node has already joined or left. 3773 If (number of holes punched > 0) then { 3774 mar$flags.punched = 0 in original message. 3775 Indicate original message to be sent on Private VC. 3776 Send original message as indicated. 3777 mar$flags.punched = 1 in copy message. 3778 old group range <- new punched list. 3779 Indicate message to be sent on ClusterControlVC. 3780 Send copy of message as indicated. 3781 } else { 3782 Indicate message to be sent on ClusterControlVC. 3783 Send original message as indicated. 3784 } /* no holes punched */ 3785 Return.