INTERNET-DRAFT                                                 T. Anker
                                                           D. Breitgand
File: draft-anker-congress-01.txt                              D. Dolev
                                                                Z. Levy
                                         The Hebrew Univ. of Jerusalem
Expiration: 18 July 1998

                  IMSS: IP Multicast Shortcut Service

Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress".

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This memo describes an IP Multicast Shortcut Service (IMSS) over a large ATM cloud. The service enables cut-through routing between routers serving different Logical IP Subnets (LISs). The presented solution is complementary to MARS [2], which has been adopted as the IETF standard solution for IP multicast over ATM.

IMSS consists of two orthogonal components: a CONnection-oriented Group address RESolution Service (CONGRESS) and an IP multicast SErvice for Non-broadcast Access Networking TEchnology (IP-SENATE). An IP class D address is resolved into a set of addresses of the multicast routers that should receive the multicast traffic targeted to this class D address. This task is accomplished using CONGRESS. The cut-through routing decisions and the actual data transmission are performed by IP-SENATE.

IMSS preserves the classical LIS model [8]. The scope of IMSS is to facilitate inter-LIS cut-through routing, while MARS provides tools for intra-LIS IP multicast.

Table of Contents

1. Introduction
1.1 Background
1.2 CONGRESS
1.3 IP-SENATE
2. Discussion
3. IMSS Overview
3.1 Network Model
3.2 CONGRESS
3.2.1 CONGRESS' API
3.3 IP-SENATE
4. Architecture
4.1 CONGRESS Architecture
4.2 IP-SENATE Architecture
4.3 IMSS Architecture
5. CONGRESS Protocol
5.1 Data Structures
5.2 IMSS Router Joining/Leaving a D-group
5.3 Reception of Incremental Membership Notifications
5.4 Resolution of D-Group Address
5.5 Handling of Failures
5.5.1 IMSS Router Failure
5.5.2 Domain Failure
5.5.3 Domain Recovery
6. IP-SENATE Protocol
6.1 Main Data Structures
6.2 Maintenance of D-groups
6.2.1 Joining D-Groups
6.2.2 Leaving D-Groups
6.2.3 Client and Server Operational Roles
6.2.4 Regular and Sender-Only Modes
6.3 Forwarding Decisions
6.3.1 A Server Receives a Datagram from a Client
6.3.2 A Server Receives a Datagram from another Server
6.3.3 A Client Receives a Datagram from an IDMR Interface
6.3.4 A Server Receives a Datagram from an IDMR Interface
6.3.5 Pruning Mechanism
7. Fault Tolerance
8. Security Considerations
9. Message Formats
9.1 CONGRESS Messages
9.2 IP-SENATE Messages
10. References
11. Acknowledgments
12. List of Abbreviations

1. Introduction

As was noted in VENUS [3]: "The development of NHRP [21], a protocol for discovering and managing unicast forwarding paths that bypass IP routers, has led to some calls for an IP multicast equivalent. Unfortunately, the IP multicast service is a rather different beast to the IP unicast service." The problems correctly identified by VENUS can be divided into two broad categories: 1) problems associated with multicast group membership maintenance and resolution, and 2) problems concerned with multicast routing. Although VENUS "...focuses exclusively on the problems associated with extending the MARS model to cover multiple clusters or clusters spanning more than one subnet", most of the discussed problems are, in fact, intrinsic to any cut-through routing solution. The main conclusion that one can draw from VENUS is that these problems cannot be solved simply by a straightforward extension of MARS to cover multiple LISs. This memo presents a solution that relies on MARS for intra-LIS multicast communication, and uses an alternative methodology to provide an inter-LIS multicast shortcut service that scales to large ATM clouds. It is assumed that the reader is familiar with the classical LIS model [8], MARS [2] and the basics of the Inter-Domain Multicast Routing (IDMR) protocols [4,5,9,10,11].

This document has two goals:

o To provide a generic protocol for the dynamic mapping of any IP class D address onto a set of the multicast routers that have ATM (or any other SVC-based Data Link subnetwork) connectivity and have either directly attached hosts, or downstream routers (w.r.t. a specific multicast tree), that need to receive the corresponding multicast traffic. The resolved addresses are used to establish the shortcut ATM connections among the multicast routers. The mapping protocol should be independent of any underlying IP multicast protocol.
It should be specifically noted that this document proposes usage of the shortcut multicast connections on a per-source basis. This is motivated by the fact that the shortcut connections will be used mainly by multicast applications that need guaranteed QoS. For all other multicast applications the current IP over ATM paradigm would probably suffice. Multicast applications that require QoS, such as video-conferencing, transmission of high-quality video streams, interactive games, etc., will usually involve a small number of sources and will require source-specific multicast trees in order to achieve the required QoS.

o To provide a solution for the generic interoperability and routing problems that arise when any cut-through routing protocol is deployed in conjunction with the existing IDMR protocols.

This document proposes an architectural separation between the two problem domains above, so that each one of them can be tackled with the most appropriate methodology and in the most generic manner.

1.1 Background

The classical IP network over an ATM cloud consists of multiple Logical IP Subnets (LISs) interconnected by IP routers [8]. The standardized solution for IP multicast over ATM, the Multicast Address Resolution Service (MARS [2]), follows the classical model. In the MARS approach, each LIS is served by a single MARS server and is termed a "MARS cluster". MARS can be viewed "as an extended analog of the ATM ARP server [8]".

From the IP multicast perspective, MARS is functionally equivalent to IGMP [1]. Similarly to IGMP, a MARS server registers the hosts that are directly attached to a multicast router and are interested in receiving multicast traffic targeted to a specific IP class D address. The important difference, however, is that MARS is aware of the connection-oriented nature of the underlying network. For each relevant IP class D address, the MARS server maintains a set (membership) of the hosts that belong to the same LIS and have registered to receive IP datagrams sent to this address.

The process of mapping an IP class D address onto a set of ATM end-point addresses is termed "multicast address resolution". Each such set is used to establish native ATM connections between an IP multicast router and the local members of the IP multicast group. The IP multicast datagrams targeted to a specific class D address are propagated over these connections. The ATM connections' layout within a MARS cluster may be based either on a mesh of point-to-multipoint (ptmpt) Virtual Circuits (VCs) [6,7], or on a Multicast Server (MCS).

There is work in progress to distribute the MARS server in order to provide load balancing and fault tolerance [17]. A group of redundant MARS servers will constitute a single logical entity that provides the same functionality as a non-distributed MARS server.

There is other work in progress, EARTH [12], that intends to extend the scope of the services provided by MARS to multiple LISs. EARTH defines a Multicast LIS (MLIS) that is composed of a number of LISs and is served by a single EARTH server. Due to the centralized approach taken by EARTH, very large MLISs would ultimately look like very large MARS clusters. Thus the discussion and the conclusions provided in VENUS are equally applicable to EARTH.
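As a purely illustrative aid (not part of MARS or IMSS), the following Python sketch shows the shape of the mapping just described: a class D address is resolved into the set of ATM end-point addresses registered for it within one LIS. The class name, method names and address values are all hypothetical.

class MarsLikeResolver:
    """Per-cluster registry: class D address -> ATM end-point addresses."""

    def __init__(self):
        self.membership = {}               # str -> set of NSAP address strings

    def register(self, class_d, atm_addr):
        """A host in the LIS registers to receive traffic for class_d."""
        self.membership.setdefault(class_d, set()).add(atm_addr)

    def unregister(self, class_d, atm_addr):
        members = self.membership.get(class_d)
        if members:
            members.discard(atm_addr)
            if not members:
                del self.membership[class_d]

    def resolve(self, class_d):
        """The set used to build ptmpt VCs to (or an MCS for) the group."""
        return frozenset(self.membership.get(class_d, ()))

mars = MarsLikeResolver()
mars.register("224.5.6.7", "47.0005.80ff.e100.0000.f215.0001")
mars.register("224.5.6.7", "47.0005.80ff.e100.0000.f215.0002")
print(sorted(mars.resolve("224.5.6.7")))

Within a MARS cluster such a resolved set drives either a mesh of ptmpt VCs or an MCS; IMSS applies the same idea at the inter-LIS level, with multicast routers rather than hosts as the members.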
In the classical LIS model, a LIS has the following properties:

o All members of a LIS have the same IP network/subnet number and address mask;

o All members of a LIS are directly connected to the same NBMA subnetwork;

o All hosts and routers outside the LIS are accessed via a router;

o All members of a LIS access each other directly (without routers).

In the MARS model, which retains the LIS model, it is assumed that all the multicast communication outside the LISs is performed via multicast routers that run some IDMR protocol. As explained in [13], the classical LIS model may be too restrictive for networks based on switched virtual circuit technology, e.g., ATM. Obviously, if LISs share the same physical ATM network (ATM cloud), the LIS internetworking model may introduce extra routing hops. This mismatch between the IP and ATM topologies complicates full utilization of the capabilities provided by the ATM network (e.g., QoS).

In addition, the extra routing hops impose an unnecessary segmentation and reassembly overhead, because every IP datagram must be reassembled at every router so that the router can perform routing decisions. The "short-cut" (or "cut-through") paradigm seeks to eliminate the mismatch between the topology of IP and that of the underlying ATM network. Unfortunately, as was already stated above, bypassing the extra routing hops is not a trivial task.

1.2 CONGRESS

The purpose of cut-through routing is to establish direct communication links among the multicast group members. The discovery of the multicast group members' addresses is performed by a multicast group address resolution and maintenance service. Generally, this service maps some application-defined character string, a multicast group address, onto a set of identifiers of the group members.

Since a multicast group address resolution and maintenance service is crucial to any multicast routing short-cut solution over NBMA networks, it is appropriate to ask whether it should be implemented once, as a generic stand-alone service, or tailored specifically for each and every multicast short-cut service. The tradeoff here is between generality and efficiency w.r.t. a specific multicast routing protocol. In the IMSS approach a general multicast address resolution service, CONGRESS, is used.

CONGRESS is a multicast address resolution and maintenance service for NBMA networks that is independent of any underlying multicast protocol. It is a generic stand-alone service. Although CONGRESS may be exploited by native ATM applications, as well as by the network layer (IP), this document focuses only on the aspects of CONGRESS related to IP. In fact, a reduced version of CONGRESS having the minimal set of features is presented in this memo. The interested reader is encouraged to refer to [14] for more information.

CONGRESS operates in the native ATM environment. Its purpose is to provide a multicast address resolution and maintenance service scalable to a large ATM WAN. The CONGRESS design is based on the following principles:

o No flooding: CONGRESS does not flood the WAN on every multicast group membership change.

o Hierarchical design: CONGRESS services are provided to applications by multiple hierarchically organized servers.
o Robustness: Due to network failures and/or network reconfiguration and re-planning, some CONGRESS servers may temporarily disconnect and later reconnect. CONGRESS withstands such transient failures by providing a best-effort service to applications.

It is important to stress that CONGRESS is not concerned with the actual data transfer. Its functionality is limited to the resolution of multicast group addresses upon requests from the applications.

An overview of CONGRESS is provided in Section 3.2.

1.3 IP-SENATE

IP-SENATE is the second component of IMSS. It is concerned with the actual IP datagram transmission over the short-cut communication links, the establishment of these links, routing decisions and the interoperability with the existing IDMR protocols. IP-SENATE provides a solution for the problems arising from the bypassing of the multicast routers. Most of these problems are general and independent of the underlying IDMR protocols. The design philosophy of IP-SENATE is based on the following principles:

o IP-SENATE is a best-effort service. IP-SENATE does not guarantee that a short-cut is always possible, but it attempts to perform the short-cut wherever possible.

o Short-cut is performed only among the multicast routers and not directly among hosts.

o IP-SENATE facilitates (a) communication based on a full mesh of ptmpt connections, (b) communication based on multicast servers and (c) a hybrid form of communication based on the previous two.

o IP-SENATE facilitates migration from a mesh of ptmpt connections to multicast server-based connections, and load-balancing among the multicast servers, without a need for global reconfiguration.

o IP-SENATE uses CONGRESS services for the resolution and maintenance of multicast addresses into sets of addresses of the relevant multicast routers. IP-SENATE may use any other service providing the same functionality as CONGRESS.

o IP-SENATE is an inter-LIS protocol. It extends only the IDMR routers. The host interface to IP multicast services [19] is not changed.

o IP-SENATE relies on MARS to facilitate all the intra-LIS IP multicast traffic.

o IP-SENATE does not assume a single multicast routing domain. IP-SENATE is designed to operate in a heterogeneous network that consists of multiple interconnected multicast routing domains. Consequently, IP-SENATE is not tailored to any specific multicast routing protocol, but can be dynamically configured to inter-work with different multicast protocols.

o IP-SENATE is to be implemented as an extension to the existing multicast routing software.

2. Discussion

A designer of a short-cut multicast routing solution is faced with multiple non-trivial problems. The more prominent problems are discussed below.

o If hosts are allowed to communicate directly with other hosts (as in [3]), bypassing the multicast routers, then each host must maintain membership information about all other hosts scattered all over the internet and belonging to the same IP multicast group. This scheme does not scale well because:

- The hosts must maintain large amounts of data that must be kept consistent and up to date.
- A considerable traffic and signalling overhead is introduced when the membership changes, e.g., join or leave events are flooded over the network.
- As was noted in RFC 2121 [18], an ATM Network Interface Card (NIC) is capable of supporting only a limited number of connections (i.e., VCs originating from or terminating at the NIC). If a full mesh of ptmpt VCs is used for cut-through communication within a multicast group, the NICs might not be capable of supporting all the simultaneous connections.

o To solve the NIC limitations problem, the current IETF IP multicast over ATM solution, MARS, supports a migration functionality that allows switching from a mesh of ptmpt connections to multicast server based communication within a single MARS cluster. It is not clear how to extend this functionality to a large ATM cloud. Such switching obsoletes membership information kept at the hosts that are scattered throughout the internet. As a result, some currently active connections may become stale or terminate abruptly.

The IMSS solution presented in this memo performs cut-through only among the multicast routers, reducing the problems above to a certain extent. The NIC limitation problem is not completely eliminated, however. Hence, IMSS facilitates the deployment of "multicast servers" for other routers, which are termed "clients". In IMSS some of the multicast routers may also function as multicast servers.

Cut-through mechanisms may have a negative impact on the conventional IDMR protocols. For the sake of discussion of the interoperability issues with the IDMR protocols, we divide the IDMR protocols into two large families: "broadcast & prune"-based [10] and "explicit join"-based [4,5,9,11]. In the first model, periodic flooding of the network and the subsequent pruning of irrelevant branches of the multicast propagation trees is employed. In the second model, some explicit information about the topology of the IP multicast groups is exchanged among the multicast routers.

As we see it, a cut-through solution will have to co-exist with a regular Inter-Domain Multicast Routing protocol in the same routing domain. One of the reasons for deploying an IDMR protocol in addition to the cut-through mechanism, in the same ATM cloud, is that it is not guaranteed that cut-through connections can reach all the relevant targets in the ATM cloud.

===============================================================

 [Figure: a source S in a DVMRP-routed IP cloud sends to a DVMRP
 router R, whose DVMRP branch reaches a cut-through router (CTR)
 in the IP/ATM cloud; that CTR forwards over a cut-through
 connection to a CTR serving another DVMRP-routed IP cloud, while
 ordinary DVMRP branches reach the destination D in a further
 DVMRP-routed IP cloud through another DVMRP router R.]

 S   - source
 D   - Destination
 R   - DVMRP router
 CTR - Cut-through router
 x   - Cut-through connection
 #   - DVMRP branch

                           Figure 1.
===============================================================

Another important reason is that if a "broadcast & prune" IDMR protocol is used in some non-ATM-based IP subnetworks connected to the ATM cloud, the border routers that connect these subnetworks to the ATM cloud do not receive explicit notifications that some downstream routers could be part of an IDMR multicast propagation tree (as depicted in Figure 1).
Thus, a broadcast & prune mechanism of the IDMR protocol should be exploited periodically by the cut-through multicast routers in order to learn about the downstream routers that depend on them. The discovery process is based on the analysis of the prune messages that the multicast router will receive from the neighbouring routers.

On the other hand, the co-existence of IDMR protocols with the cut-through solution raises several problems:

o Routing decisions are normally made at the multicast routers. If hosts can bypass a multicast router, the latter should be aware of all the hosts in its own LIS (and in all of the downstream LISs) that participate in the cut-through connections. Otherwise the IDMR protocols would not be able to construct the multicast propagation trees correctly, and multicast datagrams may be lost.

o If a multicast cut-through mechanism is deployed in conjunction with some IDMR protocol, then conflicts with Reverse Path Forwarding (RPF) [20] may occur. The RPF mechanisms prevent routing loops and are crucial for the correct operation of IDMR protocols. Thus, the cut-through traffic should be treated carefully in order not to confuse the IDMR protocol.

o A multicast distribution tree of an IDMR protocol may span non-ATM-based IP subnetworks and contain more than one border router connecting these subnetworks to the ATM cloud, as shown in Figure 2. If these border routers maintain cut-through ATM connections to all other relevant border routers, undesired datagram duplication may result.

o Another scenario that may lead to routing loops and undesired datagram duplication may arise when both a cut-through mechanism and some conventional IDMR protocol are deployed in the same ATM cloud. This means that an IDMR tree spans some routers within the ATM cloud and not only the border routers.

===============================================================
          S
          |
         CTR xxxxxxxxxxx CTR(a) ##############R
          xx           xx  #
          x x         x x  #   #  #  #
          x  x       x  x  #   #  #  #
          x   x     x   x  #   #  #  #
          x    x   x    x  #   R  R  R
          x     x x     x  #
          x      x      x  #
          x     x x     x  #      ....
          x    x   x    x  #
          x   x     x   x  #
          x  x       x  x  #
          x x         x x  #
          xx           xx  #
         CTR xxxxxxxxxxx CTR(b)

   IP/ATM + Shortcut Domain            DVMRP Domain

          S   - the source
          R   - IP router
          CTR - cut-through router
          x   - cut-through connection
          #   - DVMRP branch

                         Figure 2.
===============================================================

3. IMSS Overview

IMSS organizes IP multicast routers into logical groups, where each group corresponds to some class D IP address and contains the routers that have members of this IP multicast group, or senders to it, in their domains. These groups are termed "D-groups" and are further discussed in Section 4.2. The resolution and management of these multicast router groups is performed through the CONGRESS services described later in Section 3.2.

3.1 Network Model

In this memo, the physical layer is assumed to be comprised of different interconnected Data Link subnetworks: ATM, Ethernet, Switched Ethernet, Token Ring, etc. IMSS facilitates IP multicast data transfer over a large-scale Non-Broadcast Multiple Access (NBMA) network. We assume that ATM is the underlying NBMA network. We call a single ATM Data Link subnetwork an ATM cloud.
For administrative and policy reasons a single ATM cloud may be partitioned into several disjoint logical ATM clouds, so that direct connectivity is allowed only within the same logical cloud. Hereafter, unless otherwise specified, we use the term ATM cloud to mean a logical ATM cloud.

We assume that the network layer is IP. The topology of the IP network consists of hosts (that may be either ATM-based or non-ATM-based) and IP routers. IP multicast traffic (which is our focus) is routed using IP multicast routers running some (potentially different) IDMR protocols.

The internals of the IP implementation may vary from one IP subnetwork to another. The differences are due to the usage of different Data Link layers. If the underlying network is ATM, then the IP subnetwork's implementation can be based either on the LAN Emulation or on the Classical IP and ARP over ATM (RFC 1577) [8] standards.

We differentiate between two types of IP multicast routers: a) routers that run an IDMR protocol and b) those that run both an IDMR protocol and the IP-SENATE protocol. We refer to the latter routers as "border routers". A border router connects either an ATM-based LIS or a conventional IP subnetwork to an ATM cloud.

An important assumption is that only one IDMR protocol is allowed _inside_ (including the border routers) the same logical ATM cloud. Having multiple IDMR protocols in the same logical ATM cloud considerably complicates the task of avoiding the datagram duplication that may happen as explained in Section 2. If multiple IDMR protocols need to be deployed in an ATM cloud, then each of the respective multicast routing domains will constitute a distinct logical ATM cloud.

It should be noted that we use the term border router in a slightly different manner than this term is usually used. Namely, if upon receiving an IP multicast datagram via an IDMR protocol, a border router for some reason cannot forward it using a cut-through connection, it may use an IDMR protocol for the next-hop forwarding. As one may note, the border router behaves just as a regular router in this case. For this reason, we will sometimes refer to a border router simply as an "IP-SENATE router", to stress the fact that it may take either IDMR routing decisions or IP-SENATE routing decisions at any given time w.r.t. the same network interface.

Depending on the direction of the IP multicast traffic, a border router may be called an "ingress router" (if the traffic is directed into the IP subnetwork) or an "egress router" (if the traffic is directed outside the IP subnetwork).

All IDMR protocols make use of multicast distribution trees over which IP multicast datagrams are propagated. Multicast routers that comprise a specific tree receive datagrams from the upstream routers and forward them to the downstream routers.

For the sake of simplicity, we assume that each border router has only one ATM interface that participates in the IP-SENATE protocol.

3.2 CONGRESS

CONGRESS is a native ATM protocol that provides multicast group address (name) resolution and dynamic membership monitoring services to higher-level applications. Multicast group names are application-defined character strings. CONGRESS does not deal with actual data transmission.
Address resolution services provided by CONGRESS are used by applications in order to open and maintain native ATM connections for data transmission. Although CONGRESS is much more than just an auxiliary service for IP-SENATE, in this document we concentrate only on those CONGRESS features that are relevant to IP-SENATE (the interested reader is advised to read the full version of the CONGRESS protocol presented in [14]). From the CONGRESS perspective, IP-SENATE is one of the applications that utilize its services.

3.2.1 CONGRESS' API

We refer to a client that uses CONGRESS services by the generic term "end-point" (in the context of this document, an end-point is always an IP-SENATE router). An end-point may become a group member by joining a group or cease its membership by leaving a group. Each join or leave request of an end-point leads to the generation of an incremental membership notification w.r.t. a specific group. Incremental membership notifications reflect only the difference between the new membership and the previously reported one. The full membership of a group may be constructed by resolving a group name once upon joining and then by applying the incremental membership notifications as they arrive. Incremental membership notifications may also be triggered by various asynchronous network events, e.g., a host or communication link crash/recovery.

The CONGRESS services are provided by a library that includes the following basic functions:

o join(G, id, id_len): Makes the invoking end-point a registered member of a multicast group G. id is the identifier of the new member (a pointer to some application-specific structure). id_len is the size of this application-specific structure.

o leave(G, id, id_len): Unregisters the invoking end-point from G.

o resolve(G): A multicast group name G is resolved into a set of ATM end-point identifiers. This set includes all the end-points that joined G and have not disconnected due to a network failure or a host crash.

o set_flag(G, imn_flag): Enables or disables the reception of incremental membership notifications w.r.t. G by the invoking end-point.

In the context of this memo, a multicast group is always a D-group.

3.3 IP-SENATE

An IP-SENATE extension at a multicast router uses the group membership information that it receives from CONGRESS in order to open ATM connections that bypass the IP routing mechanism. Since the number of multicast routers is considerably lower than the overall number of ATM-based destinations (both hosts and multicast routers), IP-SENATE reduces the number of potential short-cut connections compared to a straightforward host-to-host cut-through routing. It may still be the case, however, that the number of multicast routers participating in a mesh of ptmpt connections is very large. Using the address resolution services of CONGRESS, IP-SENATE can support both hierarchies of multicast servers and meshes of ptmpt connections, and can switch back and forth between these two layouts as required. This will be described in Subsection 6.2.3.

In order to avoid stable routing loops, an IP-SENATE router never routes IP multicast datagrams over cut-through connections if they were received from another IP-SENATE router.
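The rule just stated can be rendered as the following illustrative fragment; this is not text from the protocol, and the datagram representation is hypothetical.

def may_shortcut_forward(datagram: dict) -> bool:
    """Eligibility of a datagram for forwarding over a cut-through VC."""
    # A datagram already received from another IP-SENATE router (i.e.,
    # over a short-cut connection) must continue via ordinary IDMR hops.
    return not datagram.get("via_shortcut", False)

print(may_shortcut_forward({"via_shortcut": True}))    # False
print(may_shortcut_forward({"via_shortcut": False}))   # True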
In addition, an RPF-like mechanism is deployed by IP-SENATE in order to prevent the extensive duplication of IP multicast datagrams. Such duplication may result from multiple IP-SENATE routers setting up multiple cut-through connections to the same destinations (see Figure 2).

We assume that IP-SENATE will be used along with conventional IDMR protocols and that not all of the multicast routers within an ATM cloud will run IP-SENATE. As was explained in Section 2, this deployment mode may lead to unnecessary datagram duplication when a datagram is propagated over some multicast distribution tree and, simultaneously, over a cut-through connection.

IP-SENATE provides a pruning mechanism that cuts the branches of an IDMR multicast distribution tree so that an IP-SENATE multicast router that receives datagrams via a cut-through connection does not receive duplicates via IDMR.

4. Architecture

4.1 CONGRESS Architecture

CONGRESS services are provided by a set of servers. There are two kinds of CONGRESS servers: Local Membership Servers (LMSs) and Global Membership Servers (GMSs). An LMS resides at the same host as a multicast router and constitutes this router's interface to the CONGRESS services. GMSs are organized in a hierarchical structure throughout the network, and may run either on dedicated machines or in switches. Logically, an LMS's location is independent of the router's host.

CONGRESS views the network as a hierarchy of domains, where each domain is serviced by a CONGRESS server (the CONGRESS hierarchy can be readily mapped onto the peer group hierarchy provided by the native ATM network layer, PNNI). Note that there is no relationship between a CONGRESS domain and a LIS. At the lowest level, a domain consists of a single multicast router. Such a domain is called a "host domain" and is serviced by the LMS of the router's host. The LMS is called the "representative" of a host domain. Higher-level domains consist of a set of the lower-level domain representatives. Thus, a single GMS may serve a domain that consists of either several LMSs, or several GMSs that are representatives of their respective lower-level domains.

A CONGRESS `domain identifier' is the longest common address prefix of the domains it is built of. The domain identifier of a host domain is the ATM address of the host itself. Figure 3 illustrates the CONGRESS domain layout.

Note that there is no relation whatsoever between the addresses in the figure below and IP addresses. The IP-like addresses were chosen to illustrate the hierarchy idea in the simplest way.

========================================================================

           GMS --------------------------------- GMS
           1.1                                   1.7
          /   \                                 /   \
         /     \                               /     \
        /       \                             /       \
       /         \                           /         \
     GMS -------- GMS                      GMS --------- GMS
     1.1.1        1.1.2                    1.7.4         1.7.2
     /   \        /   \                    /   \         /   \
   LMS    LMS   LMS    LMS               LMS    LMS    LMS    LMS
 1.1.1.2 1.1.1.5 1.1.2.1 1.1.2.3       1.7.4.8 1.7.4.9 1.7.2.1 1.7.2.6

                               Figure 3.

========================================================================

In order to avoid flooding of the whole network upon every membership change occurring in every D-group, membership notifications pertaining to a D-group are propagated using a distributed spanning tree for this group.
This spanning tree is a sub-tree of the CONGRESS server hierarchy. The CONGRESS servers comprising the sub-tree corresponding to a D-group are the servers that have multicast routers from this group in their domains. Each server in the CONGRESS hierarchy maintains only the part of the spanning tree that consists of its immediate neighbours. The spanning tree is constructed and maintained according to the multicast routers' join/leave requests issued through their LMSs. In addition, asynchronous network events such as crashes/recoveries of end-points (multicast routers) or CONGRESS servers and/or failures of communication links change the topology of the spanning tree (such events are detected by a best-effort fault detector module). Obviously, since CONGRESS operates in an asynchronous environment, the spanning tree of a group can only be a best-effort approximation.

4.2 IP-SENATE Architecture

In Figure 4, the architecture of an IP-SENATE router is presented. An IP-SENATE router is, by definition, a border router that connects a cut-through routing domain to some IDMR routing domain(s). As shown in the figure, IP-SENATE extends a multicast router's software. D-groups of IP-SENATE are managed through CONGRESS. We employ an LMS at each IP-SENATE router in order to provide the interface to the CONGRESS services. In order to make routing decisions and to open cut-through connections, IP-SENATE communicates with CONGRESS, which supplies group address resolution and maintenance services.

==================================================================

 [Figure: the protocol stacks of two IP-SENATE routers attached to
 the same ATM network. In each router, the IP-SENATE module sits
 between IP and the IDMR module and drives two interfaces, a
 CONGRESS interface (CGS if) and an RFC 1577 + MARS interface,
 both running above ATM signalling; IDMR also runs over a
 conventional MAC and physical layer.]

 CGS if  - CONGRESS interface
 MARS if - MARS interface

                            Figure 4.

==================================================================

In the classical IP multicast model [19], a host does not have to become a registered member of a multicast group in order to send datagrams to this group. A sender does not see any difference between sending a datagram to a multicast IP address and sending one to a unicast IP address. The difference is in the multicast router, which has to participate in some IDMR protocol that builds a multicast propagation tree. In this model, a multicast router usually needs to know only about its immediate neighbours that belong to the propagation tree, and not about the whole tree (an example of an exception to this is MOSPF [11]).

IP-SENATE provides the hosts with the same interface for IP multicast service as in the classical model.
A border IP-SENATE router that forwards IP multicast datagrams from a particular source residing in a non-ATM cloud into the ATM cloud, or from an ATM-based host residing in the router's LIS, is termed an injector for the corresponding <source, class D address> pair. (Note that the same router may function as an injector for multiple pairs.)

Injectors for any specific class D address must know the identifiers of all other IP-SENATE routers that must receive the traffic targeted to this class D address. For any <source, class D address> pair, shortcut connections should be opened by the corresponding injectors to these IP-SENATE routers. Ideally, only a single injector should be active w.r.t. any source in order to avoid datagram duplication. The set of IP-SENATE routers' identifiers that has to be maintained per IP class D address includes the identifiers of the IP-SENATE routers that have either

o directly connected hosts that have registered (e.g., using IGMP) to receive IP multicast traffic pertaining to the specific class D address, or

o some downstream multicast routers (w.r.t. some source) that have receivers in their LISs.

This set of IP-SENATE routers is termed a D-group. In order to obtain the membership of a D-group, an IP-SENATE router joins this group via CONGRESS. The name associated with this multicast group is just the class D address interpreted as a character string. The details of how D-groups are formed and managed are provided in Subsection 6.2.

It may seem that an IP-SENATE router that does not have any downstream receivers (neither routers nor hosts) w.r.t. any source does not need to be a member of a D-group, because it does not need to receive any traffic. Such a router could use the CONGRESS resolve operation each time it needs to learn the membership of the corresponding D-group (for example, when it needs to send a datagram originated by a sender in its domain). In this scheme, however, CONGRESS would be heavily used and unnecessary overhead would be imposed on the network. In our approach, an IP-SENATE router joins the relevant D-group even if it does not have to receive the multicast traffic. In this case, it will receive incremental membership notifications concerning the D-group. This scheme is less costly. In order to prevent such a router from being added as a leaf to the cut-through connections within the D-group, special sub-identifiers are added to the IP-SENATE router's identifier. This is explained in Subsection 6.1.

In order to overcome the previously mentioned NIC limitations on the number of simultaneously open connections, some IP-SENATE routers may act as multicast servers, serving other IP-SENATE routers that are termed clients.

It is important to stress that an IP-SENATE router acting as a server in one D-group may act as a client in another one. Moreover, as will be explained in Subsection 6.2.3, the operational roles of the IP-SENATE routers may change dynamically within the same D-group.

It is important to understand that maintaining a distinct multicast group simultaneously for every possible IP class D address is technically infeasible. Fortunately, there is no real need to do this, because only a part of these addresses is actually in use at any given time. It is also unlikely that the same multicast router would belong to ALL the D-groups.
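For concreteness, the naming rule given earlier in this section (a class D address interpreted as a character string) could be rendered as the following hypothetical helper; the validation step is illustrative only.

import ipaddress

def d_group_name(class_d: str) -> str:
    """Render a class D address as the character-string CONGRESS group name."""
    addr = ipaddress.IPv4Address(class_d)
    if not addr.is_multicast:              # class D means 224.0.0.0/4
        raise ValueError("not an IP class D address: " + class_d)
    return str(addr)

print(d_group_name("239.1.2.3"))           # -> "239.1.2.3"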
In IP-SENATE's approach, the membership of D-groups is formed on demand using CONGRESS, as will be explained in Subsection 6.2.

Another very important property of the IP-SENATE solution is that IP-SENATE can tear down the cut-through connections among the members of a D-group when no multicast data has been transmitted over these connections for a sufficiently long period of time. The cut-through connections may be resumed later on demand, using CONGRESS to obtain updated membership information. Note that when an IP-SENATE router terminates the inactive connections within a D-group, this does not affect CONGRESS, which may continue to monitor the membership of the group "in the background". Thus, when the cut-through connections need to be resumed, the membership information is instantly available.

For a variety of reasons that were explained in Section 2, IP-SENATE may have to co-exist with some IDMR protocol in the same ATM cloud. This implies that an IP-SENATE router may receive IP multicast datagrams both via an IDMR protocol and via the cut-through connections on the same network interface. For the correct operation of the IP-SENATE protocol, it is necessary to differentiate between these two cases. One way to do this is to use the protocol field of the IP datagram header. The IP-SENATE protocol should be assigned a special unique number. Each time an IP-SENATE router forwards a datagram over a cut-through connection, the original protocol number is extracted and appended to the end of the datagram. The IP-SENATE protocol number is inserted into the protocol field, and all other relevant fields of the IP datagram header (total length, header checksum, etc.) are updated appropriately. Obviously, the reverse operations are performed by the IP-SENATE routers on the other side of the cut-through connections. A more detailed description of this encapsulation technique is to be provided.

4.3 IMSS Architecture

In Figure 5 the architecture of IMSS is summarized. IMSS does not change a MARS server's functionality. An IP-SENATE router interacts with the MARS server in order to carry out IP multicast transmission within the LIS. An LMS serves as a CONGRESS front-end to the IP-SENATE router. An IP-SENATE router communicates with an LMS in order to handle the membership of the relevant D-groups. An LMS communicates with a GMS as was explained in Section 4.1. In the figure, the LMS is shown running on the same machine as the IP-SENATE router. This layout is the most reliable, since the LMS can monitor the IP-SENATE router's liveness using IPC tools. It is possible, however, to run an LMS on a different machine.

===================================================================

                 --------------------------------
                 |                              |
                 |                              |
   ---------     |  -------------     -------   |       -------
   | MARS  | <---|->| IP-SENATE | <-->| LMS |   | <-->  | GMS |
   | Server|     |  |  Router   |     -------   |       -------
   ---------     |  -------------               |
                 |                              |
        LIS      --------------------------------
       border

                            Figure 5.

===================================================================

5. CONGRESS Protocol

5.1 Data Structures

In this subsection we summarize the main data structures used by both the LMS and the GMS types of CONGRESS servers.

Each LMS maintains a Local Membership List.
This list contains the D-group addresses that the multicast router local to the LMS has joined through CONGRESS.

In order to avoid constant flooding of the network with excess messages, the GMSs maintain for each D-group G a distributed CONGRESS "group control tree", T(G), that is a sub-tree of the CONGRESS hierarchy tree. Vertices of T(G) are the LMSs and GMSs (where LMSs are the leaves of T(G)) that have members of G in their respective domains. All CONGRESS protocol messages concerning G are confined to T(G).

Each GMS maintains only a local part of T(G) for each D-group G in a vector GT(G). GT(G) holds an entry for each neighbour (i.e., parent, sibling or child) of the GMS in T(G). The value of an entry in this vector can be either `resolve' or `all'. In the case of `resolve', only `resolve' requests are forwarded to the corresponding neighbour (because no member of G in its domain has set the imn_flag). A value of `all' means that all CONGRESS protocol messages concerning G should be forwarded to that neighbour.

When a GMS first creates a vector for a group, the entries for all of the GMS's neighbours are initialized to `all'.

Each GMS also keeps track of the liveness of its neighbours through updates supplied by its fault-detector module.

5.2 IMSS Router Joining/Leaving a D-group

When an IMSS router wishes to join a D-group G, it issues a `join' request to its LMS, L, using some local IPC mechanism. Next, L informs its GMS about the new member of G by forwarding it a `join' message.

The `join' message must be propagated to all members of G that have requested incremental membership notifications. As will be explained later, a multicast router that acts as a client of a multicast server does not require constant reception of incremental membership notifications.

When a `join' message travels through the CONGRESS hierarchy, GMSs learn about the new member of G and update their GT(G) accordingly, in order to ensure the correct operation of future `resolve' operations.

If a GMS receives the `join' notification from one of its children C, and GT(G) does not exist (i.e., the new member of G is the first one in the GMS's domain), then the GMS initializes it and forwards this message to all its live siblings and its parent.

If GT(G) exists, the GMS sets GT(G,C) to `all' and forwards the notification to all of its live siblings and its parent that have `all' in their corresponding entries of GT(G). As a special case, upon the reception of a join notification directly from an LMS, a GMS also forwards it to all of its children (i.e., LMSs) that are alive and have `all' in their corresponding entries of GT(G).

If a `join' notification w.r.t. G was received by a GMS from its parent or a sibling, X, and GT(G) does not exist, the notification is ignored. Otherwise, the entry GT(G,X) is set to `all' and the GMS forwards the notification to all its live children that have `all' in the corresponding entries of GT(G).

Upon the reception of the notification about a new router joining G from its GMS, an LMS delivers a corresponding incremental membership notification to the local IMSS router.

In order to maintain T(G) accurately, GMSs should prune from their GT(G) entries all the neighbours that do not have members of G in their respective domains.
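The GT(G) bookkeeping and the `join' fan-out rules above can be summarized in the following non-normative Python sketch; neighbour identifiers, liveness tracking and the send() stub are all illustrative.

ALL, RESOLVE = "all", "resolve"

class Gms:
    """Toy GMS holding GT(G) vectors; neighbours are domain-id strings."""

    def __init__(self, parent, siblings, children):
        self.parent, self.siblings, self.children = parent, siblings, children
        self.alive = set(siblings) | set(children) | ({parent} if parent else set())
        self.gt = {}                        # D-group -> {neighbour: ALL|RESOLVE}

    def send(self, neighbour, message):     # stand-in for real ATM signalling
        print("->", neighbour, message)

    def on_join(self, group, sender, from_child, from_lms=False):
        vec = self.gt.get(group)
        if from_child:
            if vec is None:                 # first member of G in our domain
                vec = self.gt[group] = {n: ALL for n in self.alive}
            vec[sender] = ALL
            targets = [n for n in self.siblings + [self.parent]
                       if n in self.alive and vec.get(n) == ALL]
            if from_lms:                    # also fan out to our other live LMSs
                targets += [c for c in self.children if c != sender
                            and c in self.alive and vec.get(c) == ALL]
            for n in targets:
                self.send(n, ("join", group))
        else:                               # from our parent or a sibling
            if vec is None:                 # no members of G here: ignore
                return
            vec[sender] = ALL
            for c in self.children:
                if c in self.alive and vec.get(c) == ALL:
                    self.send(c, ("join", group))

# A GMS serving domain 1.1.1 (cf. Figure 3) learns of a join from an LMS:
gms = Gms(parent="1.1", siblings=["1.1.2"], children=["1.1.1.2", "1.1.1.5"])
gms.on_join("224.9.9.9", sender="1.1.1.2", from_child=True, from_lms=True)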
This pruning keeps the message overhead linear in the size of G. Immediately after a new router registers in a D-group G, the local LMS issues a `resolve' request w.r.t. G on its behalf. This request is handled as described in Subsection 5.4. The CONGRESS servers that reply with an empty list of members (routers) are removed from GT(G) by the GMSs throughout the hierarchy. Note that if a parent GMS replies with an empty list to its child (in the CONGRESS hierarchy), the child does not remove the corresponding entry of the parent from its GT(G).

An IMSS router leaves a D-group by issuing a `leave' request to its LMS. The propagation of the `leave' notification corresponding to this request is exactly the same as that of the `join' notification described above. In addition, if a GMS S discovers that there are no more members of a group G in its domain, it deletes the GT(G) vector from its GT. After that, S informs all the neighbours to which it forwarded the corresponding `leave' notification that they should remove the GT(G, S) entry from their GTs (the set of these neighbours does not include neighbouring LMSs). Note that an LMS knows that a group should be deleted by directly monitoring the membership of its local IMSS router. A GMS knows that a group should be deleted when all of its children have reported that there are no more members of the group G in their domains.

5.3 Reception of Incremental Membership Notifications

Whenever an IMSS router wishes to start or stop receiving incremental membership notifications w.r.t. a D-group G of which it is a member, all the GMSs that have members of G in their domains must know this. This is necessary for the accurate propagation of future membership changes of G occurring in their domains. However, a notification of this request need not be received by the GMSs if the requesting router is not the first inside (or outside) the GMS's domain to request incremental membership notifications. The same is true if the router is not the last inside (or outside) the domain to request to stop receiving incremental membership notifications. An IMSS router may wish to stop the reception of incremental membership notifications if it decides to operate in a `client' role, as will be explained in Subsection 6.2.3.

Let G' be the set of members of G that have requested to receive incremental membership notifications. When an IMSS router R desires to receive incremental membership notifications w.r.t. a D-group G, it issues a `set_flag' request with the `imn_flag' parameter set to TRUE to its LMS. The LMS forwards the `set_flag' request message m to its GMS. Similarly, when R desires to stop receiving incremental membership notifications, it issues a `set_flag' request with the `imn_flag' parameter set to FALSE to its LMS. When a GMS receives m from a neighbour, it sets the entry of this neighbour in GT(G) to `all' if `imn_flag' is TRUE, and to `resolve' otherwise. If R is the first member of G' in the GMS's domain, or G' has no more members in this domain, m is forwarded to all the siblings that are listed in GT(G), and to the parent. If R is the first member of G' outside the GMS's domain, or G' has no more members outside this domain, then m is forwarded to all the children of the GMS that are listed in GT(G).
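On the router side, the behaviour described in Sections 3.2.1 and 5.3 amounts to resolving a D-group once upon joining and then applying incremental membership notifications as they arrive. The sketch below is illustrative only; the CONGRESS stub and all identifiers are hypothetical.

class FakeCongress:
    """Minimal stand-in for the CONGRESS library of Section 3.2.1."""
    def join(self, group, member_id, id_len): pass
    def set_flag(self, group, imn_flag): pass
    def resolve(self, group): return {"rtr-a", "rtr-b"}

class DGroupView:
    """A router's local view of one D-group's membership."""

    def __init__(self, congress, group, my_id):
        congress.join(group, my_id, len(my_id))
        self.members = set(congress.resolve(group))   # resolve once on join
        congress.set_flag(group, imn_flag=True)       # then track increments
        self.group = group

    def on_incremental_notification(self, joined=(), left=()):
        # Apply only the reported difference; `left' also covers members
        # dropped by CONGRESS after a network failure or host crash.
        self.members |= set(joined)
        self.members -= set(left)

view = DGroupView(FakeCongress(), "224.1.2.3", my_id="rtr-c")
view.on_incremental_notification(joined={"rtr-d"}, left={"rtr-a"})
print(sorted(view.members))                           # ['rtr-b', 'rtr-d']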
1035 5.4 Resolution of D-Group Address

1037 An IMSS router that is a member of a D-group G can resolve G's name
1038 into a list of the live registered members by issuing an appropriate
1039 `resolve' request to its LMS. The LMS then generates an appropriate
1040 message m from it, and forwards m to its GMS.

1042 When a GMS receives m from one of its children, it forwards m to all
1043 the live siblings and the parent that are listed in GT. As a special
1044 case, if m was received from an LMS, the GMS also forwards m to the
1045 live LMSs that are listed in GT(G). If m was received by the GMS from
1046 either its parent or a sibling, it forwards it to all the live
1047 children that are listed in GT(G). The GMS then collects the
1048 responses to m until all the neighbours have responded or become
1049 disconnected. Then the GMS sends the aggregated response to the
1050 neighbour from which the request was received.

1052 When an LMS receives a `resolve reply' message m w.r.t. G, it
1053 responds with the address of the local router. If the local router
1054 is not a member of G, the LMS responds with an `empty' message.

1056 This way, the `resolve' request is propagated to the relevant LMSs,
1057 which are the leaves of T(G). The responses of these LMSs are
1058 aggregated by the GMSs, the intermediate nodes of T(G). The final
1059 response will be received from its GMS by the LMS that originated
1060 the `resolve' request, and will be delivered to the requesting IMSS
1061 router.
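The aggregation of `resolve' replies can be sketched as follows
(illustrative Python with invented names; replies are collected
synchronously here, whereas a real GMS collects them asynchronously
until every neighbour has responded or is reported disconnected):

--------------------------------------------------------

def resolve(node, group, came_from_child, came_from_lms, ask):
    """Aggregate the live members of `group' visible from `node'.

    `node' carries gt, parent, siblings, children, lms_children and
    is_alive as in the earlier sketch; `ask(neighbour, group)' queries
    one neighbour and returns its (possibly empty) member list.
    """
    gt = node.gt.get(group, {})
    if came_from_child:
        targets = [n for n in node.siblings + [node.parent] if n in gt]
        if came_from_lms:
            # Special case: a request received directly from an LMS
            # also goes to the live LMS children listed in GT(G).
            targets += [c for c in node.lms_children if c in gt]
    else:       # the request arrived from the parent or from a sibling
        targets = [c for c in node.children if c in gt]
    members = []
    for n in targets:
        if node.is_alive(n):
            members.extend(ask(n, group))  # an empty reply prunes n later
    return members

--------------------------------------------------------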
1063 5.5 Handling of Failures

1065 The CONGRESS handling of failures focuses on asynchronous host
1066 crashes/recoveries and communication link failures/recoveries. In
1067 order to handle these failures, each CONGRESS server interacts with a
1068 local "fault detector" module that monitors the liveliness of this
1069 CONGRESS server's neighbours. All the messages that are sent or
1070 received by a CONGRESS server first pass through the fault detector.
1071 Thus, a message received from a CONGRESS server is interpreted
1072 by the fault detector as evidence of the sender's liveliness. If
1073 a server's neighbour was suspected by this server's fault detector,
1074 and later a message from the presumably failed neighbour is
1075 received, the fault detector delivers the notification about the
1076 neighbour's liveliness before delivering its message.

1078 5.5.1 IMSS Router Failure

1080 When an IMSS router fails, the local LMS discovers this using internal
1081 IPC mechanisms. This event is handled by the LMS as if the failed
1082 router had issued a `leave' request w.r.t. all the D-groups that
1083 it was a member of.

1085 5.5.2 Domain Failure

1087 When a CONGRESS server disconnects from the rest of the hierarchy due
1088 to a communication link failure or a host crash, this event is
1089 interpreted by its neighbours as if all the IMSS routers that reside
1090 in its domain had left their respective D-groups. Instead of sending
1091 multiple `leave' notifications, each GMS that detects a failure of a
1092 neighbouring CONGRESS server generates a `domain leave' notification
1093 message that contains the domain identifier of the failed domain and
1094 a list of all the D-groups that had members in this domain. The
1095 latter is obtained from the local GT table. The `domain leave'
1096 notification is propagated and processed throughout the CONGRESS
1097 hierarchy in the same way as a `join'/`leave' notification. An IMSS
1098 router outside the failed domain can compute the new membership of a
1099 D-group from the `domain leave' notification by discarding all the
1100 IMSS routers that have the same address prefix as the failed domain
1101 identifier. Similarly, an IMSS router within the failed domain
1102 discards all the IMSS routers whose address prefix differs from
1103 that of the failed domain.

1105 5.5.3 Domain Recovery

1107 A GMS and its respective domain are considered recovered whenever the
1108 GMS re-connects or re-starts execution. For each D-group G in the
1109 recovered domain, group membership information must be updated
1110 throughout the re-merged T(G). A recovered GMS initializes its data
1111 structures from scratch, as described in Section 5.1.

1113 When a GMS detects (through the fault detector) a recovery of one of
1114 its siblings in the CONGRESS hierarchy, it resolves all the D-groups
1115 that are present in its GT by issuing `resolve' requests to its
1116 children. The aggregated replies to these `resolve' requests are sent
1117 as ordinary `join' notifications to the recovered sibling.

1119 When a GMS detects a recovery of one of its children, it does not
1120 perform any actions except marking this server as alive. The
1121 necessary actions will be initiated by the recovered child, as
1122 described below.

1124 When a CONGRESS server detects a recovery of its parent, it generates
1125 `resolve' requests to its children w.r.t. all the D-groups known to
1126 this CONGRESS server. The aggregated results are sent to the parent
1127 as special `join' notifications. These notifications are forwarded as
1128 ordinary `join' notifications, but are also marked with a special
1129 flag. When such a message w.r.t. a D-group G is received by some GMS
1130 from its sibling, the GMS resolves the membership of G within its
1131 domain and sends back the aggregated result as an ordinary `join'
1132 notification.

1134 6. IP-SENATE Protocol

1136 In this section we provide a detailed description of the IP-SENATE
1137 protocol. For the sake of simplicity we divide the protocol into two
1138 parts: a) D-group formation and maintenance, and b) datagram
1139 forwarding decisions. The IP-SENATE routers are event-driven: at
1140 the core of the program there is a main event-dispatching loop, and
1141 when a certain event occurs, an appropriate event handler function is
1142 invoked. After an event has been processed, control is returned
1143 to the main loop. It is important to stress that event handling
1144 is atomic, i.e., a pending event is not handled until the current
1145 event has been fully processed. For the sake of simplicity, we
1146 provide all the explanations for a single IP multicast group (i.e., a
1147 single class D address).

1149 6.1 Main Data Structures

1151 This subsection depicts the main data structures used by the
1152 IP-SENATE routers.

1154 o RAV[G]: Each IP-SENATE router R maintains a Redundancy Avoidance
1155   Vector (RAV) for each D-group G with which R is involved. RAV[G]
1156   has an entry for each source (originator) of the IP multicast
1157   datagrams that were forwarded to R by other IP-SENATE routers
1158   (i.e., via short-cut connections).

1160   We define the "remoteness" of an injector to be an estimate of the
1161   distance of a router from a datagram source.
1162   This estimate can be based, for instance, on the TTL value of the
1163   packet received from the source (the higher the value of the TTL
1164   field, the closer the injector is to the source). Another method of
1165   measuring remoteness is piggybacking the routing metrics derived
1166   from the routing tables of the injector onto the packets forwarded
1167   over the short-cut connections. It should be noted that using TTL
1168   as a measure of remoteness may cause some problems, as will be
1169   explained later. We will use a function denoted remoteness(m, R),
1170   where m is a datagram received by an IP-SENATE router R, in order
1171   to calculate the remoteness of R from the source of m. We use
1172   regular mathematical notation to compare two remoteness values.
1173   The meaning of remoteness(m, R) < remoteness(m, R') is that R is
1174   closer to m's source than R'.

1176   The entry RAV[G][S] holds the name of the IP-SENATE router that
1177   has the minimal remoteness value w.r.t. the source S and is
1178   forwarding datagrams from S to R through short-cut connections.
1179   The value of remoteness is kept in the same entry as the router's
1180   identifier. The information kept in RAV[G] is temporary and is
1181   refreshed regularly, as will be explained later.

1183 o eif: expected network interface variable. This variable is
1184   concerned with the RPF techniques that are used by the IDMR
1185   protocols in order to break routing loops that may occur in
1186   multicast distribution trees. When a multicast IP datagram
1187   arrives at a multicast router, the router checks whether it
1188   received it from the "expected" network interface. The expected
1189   network interface for a multicast datagram originated at some
1190   source S is the interface that this multicast router would use to
1191   forward unicast datagrams to S. If a multicast datagram
1192   arrives from an unexpected interface, it is silently discarded,
1193   because it was not propagated over the optimal branch. Obviously,
1194   for each IP multicast datagram originated at some source S, the
1195   value of this variable depends on the IDMR routing tables. It is
1196   important to understand that an actual implementation is not
1197   required to support eif explicitly. We use this variable only to
1198   simplify the presentation of the algorithms.

1200 o id: identifier of an IP-SENATE router. This is a structure
1201   containing the following fields:

1203   - physical address: an ATM address of the IP-SENATE router;

1205   - operational role: `client' or `server';

1207   - mode: `sender-only' or `regular'.

1209 o Membership[G]: group membership table. For each D-group
1210   of which an IP-SENATE router is a member, there is a row in this
1211   table. Each item in the row is an id structure, as explained
1212   above. These memberships are maintained through CONGRESS'
1213   incremental membership notifications.
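Rendered as Python types (names invented, for illustration only),
these data structures might look as follows:

--------------------------------------------------------

from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class RouterId:               # id: identifier of an IP-SENATE router
    physical_address: str     # ATM address of the router
    role: str                 # operational role: 'client' or 'server'
    mode: str                 # 'sender-only' or 'regular'

@dataclass
class RavEntry:               # one entry RAV[G][S]
    injector: RouterId        # router with minimal remoteness w.r.t. S
    remoteness: int           # lower value = closer to the source

class IpSenateRouter:
    def __init__(self, my_id: RouterId):
        self.id = my_id
        # RAV[G][S]: designated injector per (group, source); soft
        # state that is refreshed by traffic and discarded by timers.
        self.rav: Dict[str, Dict[str, RavEntry]] = {}
        # Membership[G]: one row per D-group, each item an id
        # structure, maintained via CONGRESS incremental membership
        # notifications.
        self.membership: Dict[str, List[RouterId]] = {}
        # eif: expected network interface; follows the IDMR routing
        # tables and exists only to simplify the presented algorithms.
        self.eif: Optional[str] = None

--------------------------------------------------------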
1215 6.2 Maintenance of D-groups

1217 In this subsection we explain in more detail how IP-SENATE
1218 routers build and manage D-groups.

1220 6.2.1 Joining D-Groups

1222 The code below deals with the handling of four kinds of events that
1223 cause an IP-SENATE router to join a D-group.

1225 C1. explicitly requested join:

1227 C1.1

1229 An IP-SENATE router R finds out (e.g., through processing of an
1230 IGMP "join_group" request or a "MARS_JOIN" request) that there
1231 exists some destination within its LIS that needs to receive
1232 IP multicast datagrams that are sent to some IP class D
1233 address.

1235 C1.2

1237 An IP-SENATE router R learns via some mechanism (e.g., via some
1238 control messages) that there exist downstream multicast routers
1239 that depend on it for receiving multicast datagrams for some
1240 group.

1242 C2. traffic-driven join:

1244 C2.1

1246 An IP-SENATE router R receives an IP multicast datagram via
1247 some IDMR propagation tree from some neighbouring multicast
1248 router.

1250 C2.2

1252 An IP-SENATE router R receives an IP multicast datagram from
1253 some directly attached host.

1255 In cases C2.1 and C2.2 an IP-SENATE router should decide whether it
1256 will forward a multicast datagram further. Moreover, if it decides to
1257 forward, it should also decide which protocol it will use, i.e.,
1258 IP-SENATE cut-through connections or some IDMR multicast
1259 distribution tree. The IP-SENATE approach is to use cut-through
1260 wherever possible. In order to open the cut-through connections to
1261 all other relevant IP-SENATE routers, an IP-SENATE router joins an
1262 appropriate D-group.

1264 As was explained in Section 4.2, an IP-SENATE router may join a
1265 D-group assuming either a server or a client operational role. The
1266 operational role of an IP-SENATE router is indicated by its
1267 identifier. Further explanations about the operational roles are
1268 provided in Subsection 6.2.3.

1270 If an IP-SENATE router joins a D-group as a sender-only, it schedules
1271 a timer-related event handler that will terminate the membership of
1272 this router in the D-group if no directly attached host emits
1273 multicast datagrams for a sufficiently long time. This timer will be
1274 referred to later as the D-timer.

1276 --------------------------------------------------------

1278 if R is a member of G /* go to forwarding decisions */
1279     go to the table of forwarding decisions (Figure 6);

1281 else
1282     if case C1.1 or case C1.2 or case C2.1
1283         decide on the operational role according to local conditions;
1284         id = {R, role, regular};
1285         join(G, id, ...); /* Join via CONGRESS */
1286     else /* case C2.2 */
1287         decide on role according to local conditions;
1288         id = {R, role, sender-only};
1289         join(G, id, ...); /* Join via CONGRESS */
1290         Reset D-timer;

1292     go to the table of forwarding decisions (Figure 6);

1294 --------------------------------------------------------

1296 Note that if downstream routers participate in a "broadcast &
1297 prune"-based IDMR protocol, case C1.2 is problematic, since no
1298 explicit information about these routers is available. This is a
1299 generic problem that does not pertain to cut-through routing only.
1300 The same problem arises whenever a "broadcast & prune"-based routing
1301 protocol works in conjunction with a protocol based on "explicit
1302 join" messages. As an example, consider the PIM [4,5] and DVMRP [10]
1303 interoperability issues [15]. Another work in progress, which
1304 attempts to classify the inter-operability issues that arise from
1305 the deployment of various IDMR protocols, is [16].

1307 In the IP-SENATE approach we solve this problem as follows.

1309 Since we allow IP-SENATE to coexist with other IDMR protocols
1310 (see Section 4.2) on the same NIC, an IP-SENATE router may
1311 periodically propagate datagrams using both an IDMR protocol and
1312 cut-through connections.
1313 This way the multicast propagation tree of an IDMR protocol will be
1314 preserved, and all IP-SENATE routers that are also nodes in some
1315 IDMR propagation tree (see case C2.1) will join the relevant
1316 D-group. As will be explained in the following subsection, an
1317 IP-SENATE router leaves this D-group when it receives "prune"
1318 messages from all of its neighbouring downstream multicast routers
1319 and no directly attached hosts desire to receive multicast traffic
1320 for this class D address.

1321 6.2.2 Leaving D-Groups

1323 This subsection depicts the part of an IP-SENATE router's algorithm
1324 that deals with leaving D-groups.

1326 Generally, an IP-SENATE router may leave the D-group corresponding
1327 to some class D IP address when this router has neither directly
1328 attached hosts nor downstream routers that need to receive the IP
1329 multicast traffic pertaining to the multicast IP address, or need
1330 to send datagrams to it. This happens when

1332 o all directly attached hosts have performed an IGMP/MARS leave, and

1334 o all neighbouring multicast routers (of attached networks)
1335   running some IDMR protocol have sent `prune' or `leave'
1336   messages (depending on the IDMR protocol) for this group, or

1338 o the router is a `sender-only' member, and its D-timer for this
1339   group has expired.

1341 6.2.3 Client and Server Operational Roles

1343 An IP-SENATE router locally decides whether it will assume a client
1344 or a server role upon joining the relevant D-group. The decision
1345 depends on the number of connections already supported by the
1346 IP-SENATE router's NIC and the number of additional connections
1347 that would need to be supported if the router assumed a specific
1348 operational role.

1350 When an IP-SENATE router joins a D-group assuming the client
1351 operational role, it expects that some server will take care of it.
1352 If no server takes care of this client for a certain period of time,
1353 the client starts using an IDMR protocol for the forwarding of IP
1354 multicast traffic. The IP-SENATE routers that act as servers learn
1355 about the new client through CONGRESS' incremental membership
1356 notifications. Based on the load of the server's NICs and CPU,
1357 physical distance, administrative policies, etc., each server
1358 locally decides whether to take care of the new client. If a server
1359 decides to serve a client, it tries to open a native ATM VC to this
1360 client (or to add this client as a leaf to an already opened ptmpt
1361 connection). If the client has already accepted some other server's
1362 connection set-up request, it may either refuse to accept the new
1363 connection, or tear down the previous connection and switch to the
1364 new one. In both cases this is a local decision of the client.

1366 In case of some server's failure, all its clients should re-join the
1367 relevant D-group. This will once again trigger the procedure
1368 described above.

1370 It should be noted that the operational roles are not fixed "once
1371 and for all". Depending on the size of a D-group and the local NIC
1372 and CPU load, an IP-SENATE router may desire to change its
1373 operational role. In order to do this, an IP-SENATE router should
1374 simply leave the D-group and then re-join it with an appropriately
1375 updated identifier that indicates its new operational role (see
Section 6.1).
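Since a role change is expressed purely through CONGRESS membership,
it can be sketched in a few lines (illustrative Python;
congress_leave and congress_join stand in for the CONGRESS client
interface, and router is an IpSenateRouter as sketched earlier):

--------------------------------------------------------

def change_role(router, group, new_role, congress_leave, congress_join):
    # Leaving and re-joining with an updated identifier is the whole
    # mechanism; the other members learn the new role from CONGRESS'
    # incremental membership notifications.
    congress_leave(group, router.id)
    router.id.role = new_role          # 'client' or 'server'
    congress_join(group, router.id)

--------------------------------------------------------

The same leave-and-rejoin pattern also serves the mode transition
described in the next subsection.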
1377 6.2.4 Regular and Sender-Only Modes

1379 An IP-SENATE router may operate in either `regular' or `sender-only'
1380 mode, as was explained in Section 4.2. An IP-SENATE router may wish
1381 to change its mode from sender-only to regular if it learns about
1382 some downstream host or router that needs to receive the multicast
1383 traffic pertaining to a specific class D address. In order to
1384 perform this transition, an IP-SENATE router should leave the
1385 relevant D-group and re-join it with an updated identifier
1386 indicating that it is acting in the regular mode.

1388 Note that there is actually no need for the transition in the
1389 opposite direction, i.e., from regular to sender-only mode.
1390 Indeed, if an IP-SENATE router does not have any downstream hosts or
1391 routers that desire to receive multicast traffic, this IP-SENATE
1392 router will simply leave the relevant D-group (see Subsection 6.2.2).
1393 If there exist some downstream senders, this IP-SENATE router will
1394 re-join the group on demand later, as was explained in Subsection
1395 6.2.1.

1397 6.3 Forwarding Decisions

1399 This subsection depicts the forwarding algorithm executed by the
1400 IP-SENATE routers. Due to the assumed heterogeneous network model,
1401 there are multiple cases that should be handled carefully. By using
1402 the CONGRESS membership services and the encapsulation/decapsulation
1403 technique described in Section 4.2, an IP-SENATE router can
1404 differentiate between the multicast traffic that it receives from
1405 other IP-SENATE routers via the cut-through connections and traffic
1406 received via an IDMR propagation tree. An IP-SENATE server decides
1407 how to forward an incoming multicast packet according to the identity
1408 and operational role of the sending router and according to its own
1409 operational role. For each possible pair of sender and receiver, the
1410 table in Figure 6 provides a pointer to the subsection that describes
1411 the relevant part of the pseudo-code. The short parts of the
1412 pseudo-code are shown directly in the table.

1414 ============================================================

1416      -------------------------------------------------------
1417      | \  Sender| Multicast      | IP-SENATE | IP-SENATE  |
1418      |  \       | Router (via    |           |            |
1419      |   \      | IDMR protocol) |  CLIENT   |  SERVER    |
1420      |    \     | or a directly  |           |            |
1421      |     \    | attached host  |           |            |
1422      |      \   |                |           |            |
1423      |Receiver \|                |           |            |
1424      |-----------------------------------------------------|
1425      |          |                |           | Forward m  |
1426      |          |                |           | using      |
1427      |IP-SENATE |     6.3.3      |     X     | IDMR       |
1428      |          |                |           | protocol.  |
1429      | Client   |                |           |            |
1430      |          |                |           |            |
1431      |-----------------------------------------------------|
1432      |IP-SENATE |                |           |            |
1433      |          |     6.3.4      |   6.3.1   |   6.3.2    |
1434      | Server   |                |           |            |
1435      |          |                |           |            |
1436      -------------------------------------------------------

1438      Figure 6.
1439 ============================================================

1441 For the sake of simplicity and a shorter presentation, we assume that
1442 the involved IP-SENATE routers have already joined the relevant
1443 D-groups, according to the algorithm explained in Subsection 6.2.1.

1445 In all of the following cases we depict the steps taken by an
1446 IP-SENATE router R upon reception of an IP multicast datagram m
1447 originated at some source S and targeted to some multicast group G.
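The dispatch of Figure 6 can be summarized as a lookup table
(illustrative Python; the strings name the subsections below, and the
impossible client-to-client case, the X cell, is deliberately absent):

--------------------------------------------------------

def forwarding_case(receiver_role, sender_kind):
    # sender_kind 'idmr' covers both a multicast router reached via an
    # IDMR protocol and a directly attached host.
    table = {
        ("client", "idmr"):   "6.3.3",
        ("client", "server"): "forward m using IDMR protocol",
        ("server", "idmr"):   "6.3.4",
        ("server", "client"): "6.3.1",
        ("server", "server"): "6.3.2",
    }
    return table[(receiver_role, sender_kind)]

--------------------------------------------------------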
1449 6.3.1 A Server Receives a Datagram from a Client

1451 An IP-SENATE router acting as a server is responsible for the
1452 propagation of the multicast traffic that it receives from its
1453 clients to all the relevant multicast routers and directly attached
1454 hosts.

1456 In order to avoid undesired duplication of IP multicast datagrams, an
1457 IP-SENATE router should check whether some other IP-SENATE router(s)
1458 might propagate the IP multicast datagrams originating at the same
1459 source S. This may happen when a multicast distribution tree of some
1460 IDMR protocol contains more than one egress router that connects the
1461 branches of the propagation tree to the ATM cloud. Figure 2 provides
1462 a graphical representation of this scenario. In such a case, it is
1463 obviously preferable that only the egress router closest to the
1464 source transmit the datagrams.

1466 In cases such as described above, IP-SENATE routers belonging to the
1467 same D-group can deterministically choose a router that will perform
1468 the forwarding of IP multicast datagrams by using the CONGRESS
1469 membership services. This is done by inspecting RAV[G][S]. Initially,
1470 RAV[G][S] is set to this server's identifier, and the remoteness
1471 value is derived from either the server's routing table or the TTL
1472 field of the datagram seen by this router. With the passage of time,
1473 however, the server may find out that other servers are forwarding
1474 datagrams to it (over shortcut connections) originated at the same
1475 source, and that these routers are located closer to the source than
1476 itself (as seen from the piggybacked remoteness value). In this case
1477 RAV[G][S] is set to the name and remoteness of the router that is
1478 closest to the source S. This server (router) will be the designated
1479 injector for the datagrams originated at S and targeted to G.
1480 Obviously, when a router receives a datagram from source S over a
1481 non-shortcut connection, it may update its RAV[G][S] if its own
1482 remoteness value is better than that of the current injector. If
1483 this is the case, the router becomes the new designated injector.

1485 Since we assume an asynchronous network model, it is possible that at
1486 some point multiple IP-SENATE routers belonging to the same D-group
1487 will consider themselves the ones that must forward datagrams. As
1488 time passes, however, the IP-SENATE routers will learn about this
1489 redundancy, because it will be reflected by RAV[G]. In the following
1490 subsection more details about RAV maintenance are provided.

1492 In Section 6.1, two examples of measuring remoteness were provided.
1493 It should be noted that TTL is not always a reliable measure, since a
1494 source may change its value arbitrarily. In this case, due to the
1495 asynchronous nature of the network, oscillations between multiple
1496 injectors may occur. Since source-initiated changes of TTL may occur
1497 considerably more often than changes of the network topology, these
1498 oscillations may present a serious problem.

1500 The information kept in RAV[G] is temporary. Each time an IP-SENATE
1501 router enters information into a row S of RAV[G], it resets a timer
1502 associated with the source S. We refer to this timer as the S-timer.
1503 If no traffic from S is encountered during the time window defined by
1504 the S-timer, the IP-SENATE router discards the row in RAV[G]
1505 associated with S.
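The soft-state behaviour of RAV[G] can be sketched as follows
(illustrative Python with an invented S_TIMER value; instead of
literal timers, each row carries a deadline and stale rows are
discarded lazily on lookup):

--------------------------------------------------------

import time

S_TIMER = 30.0                   # seconds; the value is illustrative

class Rav:
    def __init__(self):
        self.rows = {}           # S -> (injector, remoteness, deadline)

    def update(self, source, injector, remoteness):
        # (Re)installing a row also resets its S-timer.
        self.rows[source] = (injector, remoteness,
                             time.monotonic() + S_TIMER)

    def lookup(self, source):
        row = self.rows.get(source)
        if row is None:
            return None
        injector, remoteness, deadline = row
        if time.monotonic() > deadline:
            # No traffic from S within the S-timer window: discard.
            del self.rows[source]
            return None
        return injector, remoteness

--------------------------------------------------------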
1507 When RAV[G] becomes empty, the IP-SENATE router starts another timer,
1508 called the G-timer. In case no multicast traffic is encountered
1509 within G before the G-timer expires, the IP-SENATE router tears down
1510 the cut-through connections within the corresponding D-group. These
1511 cut-through connections may be resumed on demand later.

1513 --------------------------------------------------------

1515 if exists an entry RAV[G][m.S]
1516     if remoteness(m, R) <= RAV[G][m.S]
1517         /* The closest router to S is responsible for the
1518          * cut-through propagation, so that R is the injector
1519          */

1521         update RAV[G][m.S] to hold R and remoteness(m, R);
1522         update eif to be the correct one;
1523         forward m using IDMR protocol;
1524         forward m to all other servers that are members of G
1525             that act in regular mode (directly);
1526         forward m to all clients that are members of G that act
1527             in regular mode excluding the sender (directly);
1528     else
1529         /* m will be sent by the router nearest to source. */
1530         discard m;

1532 else /* The source of the datagram is not in the RAV[G] yet */
1533     Create a new entry for m.S in RAV[G];
1534     update RAV[G][m.S] to hold R and remoteness(m, R);

1536     forward m using IDMR protocol;
1537     forward m to all other servers that are members of G
1538         that act in regular mode (directly);
1539     forward m to all clients that are members of G that act in
1540         regular mode excluding the sender (directly);

1542 --------------------------------------------------------

1544 6.3.2 A Server R Receives a Datagram from another Server R'

1546 If a server receives multicast traffic from another server belonging
1547 to the same D-group, the sending server believes that it is the one
1548 closest to the source (i.e., it receives packets from the source with
1549 a lower remoteness value than all the other IP-SENATE routers).
1550 Otherwise it would not have been sending the datagrams. If the entry
1551 RAV[G][S] for the sending server is empty (e.g., because RAV
1552 was refreshed), the receiving server should insert the remoteness
1553 value of the received packet of the sending server into the
1554 corresponding entry in RAV[G][S]. Note that this operation may change
1555 the local notion of the IP-SENATE router with the lowest remoteness
1556 value at the receiving IP-SENATE router.

1558 An IP-SENATE router acting as a server is responsible for the
1559 propagation of the IP multicast traffic to all its clients belonging
1560 to the same D-group and to all the relevant IDMR interfaces. The
1561 latter case should be treated especially carefully, because IDMR
1562 routers use RPF mechanisms in order to break stable routing loops.
1563 When a multicast IP datagram arrives at an IDMR router, the router
1564 checks whether it received it from the "expected" network interface.
1565 An IDMR router expects to receive multicast datagrams originated at
1566 some source S from the same network interface that this router would
1567 use in order to forward unicast datagrams to S. If a multicast
1568 datagram arrives from an unexpected interface, it is silently
1569 discarded, because it was not propagated over the optimal branch of
1570 the IDMR multicast propagation tree.

1572 As seen from the code below, an IP-SENATE router updates the variable
1573 eif to be as expected by the IDMR interface. Otherwise, the RPF
1574 mechanism might erroneously discard datagrams that should not be
1575 discarded.
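The RPF test that motivates this update can be sketched as follows
(illustrative Python; unicast_route stands in for a lookup in the
router's unicast routing table):

--------------------------------------------------------

def rpf_accept(arrival_interface, source, unicast_route):
    # The expected interface (eif) for datagrams from S is the one
    # this router would use to forward unicast traffic back to S.
    eif = unicast_route(source)
    # A datagram arriving on any other interface did not travel the
    # optimal branch of the tree and is silently discarded.
    return arrival_interface == eif

--------------------------------------------------------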
1577 Obviously, there is no need to forward an IP multicast datagram that
1578 came from an IP-SENATE router acting as a server to other servers
1579 belonging to the same D-group. These servers are supposed to be
1580 leaves of the same ptmpt connection as the receiving server.

1582 --------------------------------------------------------

1584 update eif to be the expected interface;
1585 forward m using IDMR protocol;
1586 forward m to all clients that are members of G
1587     that act in regular mode;

1589 /* There is no need to forward to other servers, since
1590  * they are supposed to be handled by the same IP-SENATE
1591  * server that sent m.
1592  */

1594 If the entry RAV[G][m.S] does not exist
1595     Create a new entry for m.S in RAV[G];
1596     /* Since this is the first datagram originated at S that this
1597      * router (R) sees, it is assumed that the forwarder is the
1598      * designated injector for (S,G)
1599      */
1600     update RAV[G][m.S] to hold the identifier of the
1601         forwarder R' and remoteness(m, R');
1602     return;

1604 if (remoteness(m, R') < RAV[G][m.S])
1605     update RAV[G][m.S] to hold the identifier of the
1606         datagram forwarder R' and remoteness(m, R');

1608 --------------------------------------------------------

1610 6.3.3 A Client Receives a Datagram from an IDMR Interface

1612 When an IP-SENATE router acting as a client receives an IP multicast
1613 datagram from an IDMR interface, it should forward it to all other
1614 involved IDMR interfaces. In order to propagate the datagram to all
1615 the relevant IP-SENATE routers using short-cut, a client should
1616 forward the datagram to its server. The latter will forward it
1617 further according to the algorithm described in Subsection 6.3.1.

1619 As will be explained in Subsection 6.3.5, IP-SENATE routers that also
1620 participate in some "broadcast & prune"-based IDMR protocol prune
1621 the redundant branches of an IDMR multicast propagation tree.

1623 --------------------------------------------------------

1625 forward m using IDMR protocol;

1627 forward m to Multicast_Server over a point-to-point SVC;

1629 --------------------------------------------------------

1631 6.3.4 A Server Receives a Datagram from an IDMR Interface

1633 If an IP-SENATE router acting as a server receives an IP multicast
1634 datagram via an IDMR multicast propagation tree, it is responsible
1635 for forwarding it to all the relevant non-IP-SENATE multicast routers
1636 and to the relevant clients. In case this IP-SENATE router is the
1637 designated injector for (S,G), it should also forward the multicast
1638 datagram to all the IP-SENATE routers acting as servers (over
1639 short-cut connections).

1641 --------------------------------------------------------

1643 if exists an entry RAV[G][m.S]
1644     if remoteness(m, R) <= RAV[G][m.S]
1645         /* The closest router to S is responsible for the
1646          * cut-through propagation, so that R is the injector
1647          */

1649         update RAV[G][m.S] to hold R and remoteness(m, R);
1650         forward m using IDMR protocol;
1651         forward m to all other servers that are members of G
1652             that act in regular mode (directly);
1653         forward m to all clients that are members of G
1654             that act in regular mode (directly);
1655     else
1656         /* m was received or will be received from
1657          * the IP-SENATE router nearest to the source.
1658          */
1659         discard m;
1660 else
1661     Create a new entry for m.S in RAV[G];
1662     /* Since this is the first datagram originated at S that
1663      * this router (R) sees, it is assumed that R is the
1664      * designated injector for (S,G).
1665      */
1666     update RAV[G][m.S] to hold R and remoteness(m, R);
1667     forward m using IDMR protocol;
1668     forward m to all other servers that are members of G
1669         that act in regular mode (directly);
1670     forward m to all clients that are members of G
1671         that act in regular mode (directly);

1673 --------------------------------------------------------

1675 6.3.5 Pruning Mechanism

1677 As mentioned earlier, IP-SENATE uses an IDMR mechanism along with
1678 short-cutting. An IP-SENATE router that must forward multicast
1679 traffic of a group G to directly attached hosts or to multicast
1680 routers joins the relevant D-group upon reception of datagrams (or
1681 of an explicit join) from an IDMR interface. Consequently, shortcut
1682 connections will be formed between the members of the D-group. At
1683 this point the router may receive traffic both from shortcut
1684 connections and from the existing IDMR interface. In order to avoid
1685 this redundancy, the router prunes the upstream IDMR interface,
1686 hereafter accepting upstream traffic only from the shortcut
1687 connection.

1689 ==================================================================

1691                            S
1692                           x \
1693                          x   \
1694   On the left -        <      R1        On the right
1695   the cut-through     x         \       side - the IDMR
1696   connection from    x          ...     propagation tree
1697   S to R'           x             \     branch
1698                    x               R
1699   Here, R' should x               /
1700   send prune to  <               /
1701   R.              R'<_______<___<
1702                  /
1703        _________R2________________
1704       /                           \
1705      |   A DVMRP routing domain    |
1706      |                             |
1707      |                             |
1708      |                             |
1709       \_______R''________________ /
1710                |
1711                |
1712                |
1713        ------------------
1714        |     ....       |
1715        |                |
1716        H                H

1718   H - directly attached hosts that want to receive
1719       datagrams targeted to G

1721   "\" - an IDMR propagation tree branch
1722   "x" - the shortcut link

1724   Figure 7.

1726 ==================================================================

1728 Figure 7 depicts a scenario in which a downstream multicast router
1729 requests a prune in spite of having downstream routers and directly
1730 attached hosts that depend on it. Since the IP-SENATE router R'
1731 receives the IP multicast traffic targeted to a group G both via a
1732 cut-through connection and via an IDMR propagation tree, R' sends a
1733 prune message to R. Note, however, that the rest of the IDMR
1734 multicast propagation tree located beneath the multicast router R'
1735 continues to function as usual. If all the downstream IDMR
1736 interfaces of an IP-SENATE router R have been pruned, and the router
1737 has no directly attached hosts that are registered in G or are
1738 senders in G (no D-timer is set), R leaves the relevant D-group
1739 through CONGRESS.

1740 It should be noted that if the IDMR protocol that runs inside the ATM
1741 cloud is based on the broadcast-and-prune model, e.g. DVMRP, then an
1742 extensive signalling overhead may be introduced by shortcutting. This
1743 is because a multicast propagation tree of DVMRP is reconstructed
1744 periodically by flooding of multicast traffic to all the routers
1745 residing inside the ATM cloud. At the beginning, all routers will
1746 join the relevant D-group in order to make themselves available for
1747 shortcut connections.
1748 Later, a considerable part of these routers will leave this D-group,
1749 since their respective downstream routers will send them prune
1750 messages. This way, shortcut connections may be opened to routers
1751 that, in fact, do not need to receive multicast traffic at all.
1752 These connections will later be torn down. Obviously, it is possible
1753 to introduce some optimizations that will try to minimize the
1754 signalling overhead, but, generally speaking, we believe that
broadcast-and-prune IDMR protocols do not go well with shortcutting.

1756 In some cases, it may happen that a short-cut connection is
1757 mistakenly established from a downstream multicast router to an
1758 upstream multicast router. Such a short-cut connection would
1759 contradict the orientation of the IDMR propagation tree. If the
1760 upstream router blindly pruned its upstream IDMR branches just
1761 because it has a short-cut connection, it might destroy the
1762 connectivity of the IDMR propagation tree. In order to avoid such
1763 situations, an IP-SENATE router requests pruning of its upstream IDMR
1764 interfaces only if the remoteness value of a datagram received over
1765 the short-cut connection is lower than that of the datagram received
1766 over an IDMR tree.

1768 As was explained in Sections 6.3.1 and 6.3.4, a downstream router's
1769 cut-through connection would be suppressed by some other IP-SENATE
1770 router that is located closer to the source in terms of remoteness.

1772 7. Fault Tolerance

1774 Currently, each GMS is a single point of failure in its domain, i.e.,
1775 when a GMS fails, its domain is disconnected from the rest of the
1776 CONGRESS hierarchy. Note that this situation resembles a single DNS
1777 failure in its domain. The use of a distributed GMS server composed
1778 of a primary server and backup servers acting as a single logical
1779 entity can make the CONGRESS protocol more robust. Another way to
1780 increase the robustness is to elect a new GMS from the lower level
1781 in the CONGRESS hierarchy to take over the failed server's
1782 responsibilities.

1783 This subject is for further study.

1785 8. Security Considerations

1787 Security issues are not discussed in this document.

1789 9. Message Formats

1791 9.1 CONGRESS Messages

1793 To be supplied.

1795 9.2 IP-SENATE Messages

1797 To be supplied.

1799 10. References

1801 [1] Fenner, W., "Internet Group Management Protocol,
1802     Version 2", Internet Draft, September 1995.

1804 [2] G. Armitage, "Support for Multicast over UNI
1805     3.0/3.1 based ATM Networks", RFC 2022, November
1806     1996.
1807 [3] G. Armitage, "VENUS - Very Extensive Non-Unicast Service",
1808     Internet Draft, June 1997,
1809     draft-armitage-ion-venus-03.txt.

1811 [4] Estrin, D., et al., "Protocol Independent Multicast
1812     Sparse Mode (PIM-SM): Protocol Specification", Internet Draft,
1813     draft-ietf-idmr-PIM-SM-spec-09.ps, October 1996.

1815 [5] Estrin, D., et al., "Protocol Independent Multicast
1816     Dense Mode (PIM-DM): Protocol Specification", Internet Draft,
1817     draft-ietf-idmr-PIM-DM-spec-04.ps, September 1996.

1819 [6] ATM Forum, "ATM User-Network Interface Specification Version
1820     3.1", 1994.

1822 [7] ATM Forum, "ATM User-Network Interface Specification Version
1823     4.0", 1996.

1825 [8] Laubach, M., "Classical IP and ARP over ATM", RFC 1577,
1826     Hewlett-Packard Laboratories, December 1993.

1828 [9] A. Ballardie, "Core Based Tree (CBT) Multicast Architecture",
1829     Internet Draft, 1997,
1830     draft-ietf-idmr-cbt-spec-10.txt.
1832 [10] T. Pusateri, "Distance Vector Multicast Routing Protocol",
1833      Internet Draft, September 1996,
1834      draft-ietf-idmr-dvmrp-v3-03.[txt,ps].

1836 [11] J. Moy, "Multicast Extensions to OSPF",
1837      RFC 1584, July 1993.

1839 [12] M. Smirnov, "EARTH - EAsy IP multicast Routing
1840      THrough ATM clouds", Internet Draft, 1997,
1841      draft-smirnov-ion-earth-02.txt.

1843 [13] Yakov Rekhter and Dilip Kandlur,
1844      ""Local/Remote" Forwarding Decision in Switched Data Link
1845      Subnetworks", RFC 1937.

1847 [14] T. Anker, D. Breitgand, D. Dolev and Z. Levy,
1848      "CONGRESS: CONnection-oriented Group-address RESolution
1849      Service", The Hebrew University, Jerusalem, Israel,
1850      Technical Report CS96-23, December 1996.
1851      http://www.cs.huji.ac.il/labs/transis/transis.html

1853 [15] Deborah Estrin, Ahmed Helmy and David Thaler,
1854      "PIM Multicast Border Router (PMBR) specification
1855      for connecting PIM-SM domains to a DVMRP Backbone",
1856      Internet Draft, February 1997,
1857      draft-ietf-mboned-pmbr-spec-00.txt.

1859 [16] D. Thaler, "Interoperability Rules for Multicast
1860      Routing Protocols", Internet Draft, May 1996,
1861      draft-ietf-mboned-imrp-some-issues-02.txt.

1863 [17] G. Armitage, "A Distributed MARS Protocol",
1864      Internet Draft, January 1997,
1865      draft-armitage-ion-distmars-spec-00.txt.

1867 [18] G. Armitage, "Issues affecting MARS Cluster Size",
1868      RFC 2121, March 1997.

1870 [19] S. Deering, "Host Extensions for IP Multicasting",
1871      RFC 1112, August 1989.

1873 [20] C. Semeria, "Introduction to IP Multicast Routing".

1875 [21] J. Luciani, et al., "NBMA Next Hop Resolution Protocol (NHRP)",
1876      Internet Draft, February 1997,
1877      draft-ietf-rolc-nhrp-11.txt.

1879 11. Acknowledgments

1881 We would like to thank Prof. Israel Cidon from the Technion
1882 Institute, Israel. We also thank Yoav Kluger and Benny Rodrig from
1883 Madge Networks (Israel) for their helpful comments and their
1884 precious time.

1886 12. List of Abbreviations

1888 o IMSS - IP Multicast Shortcut Service
1889 o CONGRESS - CONnection-oriented Group address RESolution Service
1890 o IP-SENATE - IP multicast SErvice for Non-broadcast Access
1891   Networking TEchnology
1892 o LMS - Local Membership Server
1893 o GMS - Global Membership Server
1894 o MCS - Multicast Server

1896 Authors' Addresses

1898 Tal Anker
1899 The Hebrew University of Jerusalem
1900 Computer Science Dept.
1901 Givat-Ram, Jerusalem
1902 Israel, 91904

1904 Phone: (972) 6585706

1906 EMail: anker@cs.huji.ac.il

1908 David Breitgand
1909 The Hebrew University of Jerusalem
1910 Computer Science Dept.
1911 Givat-Ram, Jerusalem
1912 Israel, 91904

1914 Phone: (972) 6585706

1916 EMail: davb@cs.huji.ac.il

1918 Danny Dolev
1919 The Hebrew University of Jerusalem
1920 Computer Science Dept.
1921 Givat-Ram, Jerusalem
1922 Israel, 91904

1924 Phone: (972) 6584116

1926 EMail: dolev@cs.huji.ac.il

1928 Zohar Levy
1929 The Hebrew University of Jerusalem
1930 Computer Science Dept.
1931 Givat-Ram, Jerusalem
1932 Israel, 91904

1934 Phone: (972) 6585706

1936 EMail: zohar@cs.huji.ac.il