Network Working Group                                          T. Maufer
Internet-Draft                                                C. Semeria
Category: Informational                                 3Com Corporation
Expires in six months                                          July 1997

                 Introduction to IP Multicast Routing

Status of this Memo

This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute working
documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months.
Internet Drafts may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet Drafts as
reference material or to cite them other than as a "working draft" or
"work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the internet-drafts Shadow
Directories on:

   ftp.is.co.za (Africa)
   nic.nordu.net (Europe)
   ds.internic.net (US East Coast)
   ftp.isi.edu (US West Coast)
   munnari.oz.au (Pacific Rim)

ABSTRACT

The first part of this paper describes the benefits of multicasting,
the MBone, Class D addressing, and the operation of the Internet Group
Management Protocol (IGMP). The second section explores a number of
different techniques that may potentially be employed by multicast
routing protocols:

   o Flooding
   o Spanning Trees
   o Reverse Path Broadcasting (RPB)
   o Truncated Reverse Path Broadcasting (TRPB)
   o Reverse Path Multicasting (RPM)
   o "Shared-Tree" Techniques

The third part contains the main body of the paper. It describes how
the preceding techniques are implemented in the multicast routing
protocols available today (or under development):

   o Distance Vector Multicast Routing Protocol (DVMRP)
   o Multicast Extensions to OSPF (MOSPF)
   o Protocol-Independent Multicast - Dense Mode (PIM-DM)
   o Protocol-Independent Multicast - Sparse Mode (PIM-SM)
   o Core-Based Trees (CBT)

FOREWORD

This document is introductory in nature. We have not attempted to
describe every detail of each protocol, but rather to give a concise
overview in all cases, with enough specifics to allow a reader to grasp
the essential details and operation of protocols related to multicast
IP.
Every effort has been made to ensure the accurate representation of 65 any cited works, especially any works-in-progress. For the complete 66 details, we refer you to the relevant specification(s). 68 If internet-drafts were cited in this document, it is only because they 69 were the only sources of certain technical information at the time of 70 this writing. We expect that many of the internet-drafts which we have 71 cited will eventually become RFCs. See the IETF's internet-drafts 72 shadow directories for the status of any of these drafts, their follow- 73 on drafts, or possibly the resulting RFCs. 75 TABLE OF CONTENTS 76 Section 78 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . INTRODUCTION 79 1.1 . . . . . . . . . . . . . . . . . . . . . . . . . Multicast Groups 80 1.2 . . . . . . . . . . . . . . . . . . . . . Group Membership Protocol 81 1.3 . . . . . . . . . . . . . . . . . . . . Multicast Routing Protocols 82 1.3.1 . . . . . . . . . . . Multicast Routing vs. Multicast Forwarding 83 2 . . . . . . . . MULTICAST SUPPORT FOR EMERGING INTERNET APPLICATIONS 84 2.1 . . . . . . . . . . . . . . . . . . . . . . . Reducing Network Load 85 2.2 . . . . . . . . . . . . . . . . . . . . . . . . Resource Discovery 86 2.3 . . . . . . . . . . . . . . . Support for Datacasting Applications 87 3 . . . . . . . . . . . . . . THE INTERNET'S MULTICAST BACKBONE (MBone) 88 4 . . . . . . . . . . . . . . . . . . . . . . . . MULTICAST ADDRESSING 89 4.1 . . . . . . . . . . . . . . . . . . . . . . . . Class D Addresses 90 4.2 . . . . . . . Mapping a Class D Address to an IEEE-802 MAC Address 91 4.3 . . . . . . . . . Transmission and Delivery of Multicast Datagrams 92 5 . . . . . . . . . . . . . . INTERNET GROUP MANAGEMENT PROTOCOL (IGMP) 93 5.1 . . . . . . . . . . . . . . . . . . . . . . . . . . IGMP Version 1 94 5.2 . . . . . . . . . . . . . . . . . . . . . . . . . . IGMP Version 2 95 5.3 . . . . . . . . . . . . . . . . . . . . . . IGMP Version 3 (Future) 96 6 . . . . . . . . . . . . . . . . . . . MULTICAST FORWARDING TECHNIQUES 97 6.1 . . . . . . . . . . . . . . . . . . . . . "Simpleminded" Techniques 98 6.1.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . Flooding 99 6.1.2 . . . . . . . . Multicast Extensions to MAC-layer Spanning Trees 100 6.2 . . . . . . . . . . . . . . . . . . . Source-Based Tree Techniques 101 6.2.1 . . . . . . . . . . . . . . . . . Reverse Path Broadcasting (RPB) 102 6.2.1.1 . . . . . . . . . . . . . Reverse Path Broadcasting: Operation 103 6.2.1.2 . . . . . . . . . . . . . . . . . RPB: Benefits and Limitations 104 6.2.2 . . . . . . . . . . . Truncated Reverse Path Broadcasting (TRPB) 105 6.2.3 . . . . . . . . . . . . . . . . . Reverse Path Multicasting (RPM) 106 6.2.3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . Operation 107 6.2.3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . Limitations 108 6.3 . . . . . . . . . . . . . . . . . . . . . . Shared Tree Techniques 109 6.3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . Operation 110 6.3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . Benefits 111 6.3.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . Limitations 112 7 . . . . . . . . . . . . . . . . . . . "DENSE MODE" ROUTING PROTOCOLS 113 7.1 . . . . . . . . Distance Vector Multicast Routing Protocol (DVMRP) 114 7.1.1 . . . . . . . . . . . . . . . . . Physical and Tunnel Interfaces 115 7.1.2 . . . . . . . . . . . . . . . . . . . . . . . . . Basic Operation 116 7.1.3 . . . . . . . . . . . . . . . . . . . . . 
DVMRP Router Functions 117 7.1.4 . . . . . . . . . . . . . . . . . . . . . . . DVMRP Routing Table 118 7.1.5 . . . . . . . . . . . . . . . . . . . . . DVMRP Forwarding Table 119 7.1.6 . . . . . . . . . . . DVMRP Tree Building and Forwarding Summary 120 7.2 . . . . . . . . . . . . . . . Multicast Extensions to OSPF (MOSPF) 121 7.2.1 . . . . . . . . . . . . . . . . . . Intra-Area Routing with MOSPF 122 7.2.1.1 . . . . . . . . . . . . . . . . . . . . . Local Group Database 123 7.2.1.2 . . . . . . . . . . . . . . . . . Datagram's Shortest Path Tree 124 7.2.1.3 . . . . . . . . . . . . . . . . . . . . . . . Forwarding Cache 125 7.2.2 . . . . . . . . . . . . . . . . . . Mixing MOSPF and OSPF Routers 126 7.2.3 . . . . . . . . . . . . . . . . . . Inter-Area Routing with MOSPF 127 7.2.3.1 . . . . . . . . . . . . . . . . Inter-Area Multicast Forwarders 128 7.2.3.2 . . . . . . . . . . . Inter-Area Datagram's Shortest Path Tree 129 7.2.4 . . . . . . . . . Inter-Autonomous System Multicasting with MOSPF 130 7.2.5 . . . . . . . . . . . MOSPF Tree Building and Forwarding Summary 131 7.3 . . . . . . . . . . . . . . . Protocol-Independent Multicast (PIM) 132 7.3.1 . . . . . . . . . . . . . . . . . . . . PIM - Dense Mode (PIM-DM) 133 7.3.1.1 . . . . . . . . . . PIM-DM Tree Building and Forwarding Summary 134 8 . . . . . . . . . . . . . . . . . . . "SPARSE MODE" ROUTING PROTOCOLS 135 8.1 . . . . . . . Protocol-Independent Multicast - Sparse Mode (PIM-SM) 136 8.1.1 . . . . . . . . . . . . . . Directly Attached Host Joins a Group 137 8.1.2 . . . . . . . . . . . . Directly Attached Source Sends to a Group 138 8.1.3 . . . . . . . Shared Tree (RP-Tree) or Shortest Path Tree (SPT)? 139 8.1.4 . . . . . . . . . . . PIM-SM Tree Building and Forwarding Summary 140 8.2 . . . . . . . . . . . . . . . . . . . . . . Core Based Trees (CBT) 141 8.2.1 . . . . . . . . . . . . . . . . . . Joining a Group's Shared Tree 142 8.2.2 . . . . . . . . . . . . . . . . . . . . . Data Packet Forwarding 143 8.2.3 . . . . . . . . . . . . . . . . . . . . . . . Non-Member Sending 144 8.2.4 . . . . . . . . . . . . CBT Tree Building and Forwarding Summary 145 8.2.5 . . . . . . . . . . . . . . . . . CBT Multicast Interoperability 146 9 . . . . . . . . . . . . . . . . MULTICAST IP ROUTING: RELATED TOPICS 147 9.1 . . . . . . Interoperability Framework For Multicast Border Routers 148 9.1.1 . . . . . . . . . . . . Requirements for Multicast Border Routers 149 9.2 . . . . . . . . . . . . . . . . Issues with Expanding-Ring Searches 150 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . REFERENCES 151 10.1 . . . . . . . . . . . . . . . . . . . Requests for Comments (RFCs) 152 10.2 . . . . . . . . . . . . . . . . . . . . . . . . . Internet-Drafts 153 10.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . Textbooks 154 10.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Other 155 11 . . . . . . . . . . . . . . . . . . . . . . SECURITY CONSIDERATIONS 156 12 . . . . . . . . . . . . . . . . . . . . . . . . . . ACKNOWLEDGEMENTS 157 13 . . . . . . . . . . . . . . . . . . . . . . . . . AUTHORS' ADDRESSES 159 [This space was intentionally left blank.] 160 1. INTRODUCTION 162 There are three fundamental types of IPv4 addresses: unicast, 163 broadcast, and multicast. A unicast address is used to transmit a 164 packet to a single destination. A broadcast address is used to send a 165 datagram to an entire subnetwork. 
A multicast address is designed to 166 enable the delivery of datagrams to a set of hosts that have been 167 configured as members of a multicast group across various 168 subnetworks. 170 Multicasting is not connection-oriented. A multicast datagram is 171 delivered to destination group members with the same "best-effort" 172 reliability as a standard unicast IP datagram. This means that 173 multicast datagrams are not guaranteed to reach all members of a group, 174 nor to arrive in the same order in which they were transmitted. 176 The only difference between a multicast IP packet and a unicast IP 177 packet is the presence of a 'group address' in the Destination Address 178 field of the IP header. Instead of a Class A, B, or C IP destination 179 address, multicasting employs a Class D address format, which ranges 180 from 224.0.0.0 to 239.255.255.255. 182 1.1 Multicast Groups 184 Individual hosts are free to join or leave a multicast group at any 185 time. There are no restrictions on the physical location or the number 186 of members in a multicast group. A host may be a member of more than 187 one multicast group at any given time and does not have to belong to a 188 group to send packets to members of a group. 190 1.2 Group Membership Protocol 192 A group membership protocol is employed by routers to learn about the 193 presence of group members on their directly attached subnetworks. When 194 a host joins a multicast group, it transmits a group membership protocol 195 message for the group(s) that it wishes to receive, and sets its IP 196 process and network interface card to receive frames addressed to the 197 multicast group. 199 1.3 Multicast Routing Protocols 201 Multicast routers execute a multicast routing protocol to define 202 delivery paths that enable the forwarding of multicast datagrams 203 across an internetwork. 205 1.3.1 Multicast Routing vs. Multicast Forwarding 207 Multicast routing protocols establish or help establish the distribution 208 tree for a given group, which enables multicast forwarding of packets 209 addressed to the group. In the case of unicast, routing protocols are 210 also used to build a forwarding table (commonly called a routing table). 212 Unicast destinations are entered in the routing table, and associated 213 with a metric and a next-hop router toward the destination. The key 214 difference between unicast forwarding and multicast forwarding is that 216 ======================================================================== 217 _ _ _ _ 218 |_| |_| |_| |_| 219 '-' '-' '-' '-' 220 | | | | 221 <- - - - - - - - - -> 222 | 223 | 224 v 225 Router 226 ^ ^ 227 / \ 228 + + 229 _ | / \ | _ 230 |_|-| + + |-|_| 231 '_' | / \ | '_' 232 _ | v v | _ 233 |_|-| - - ->|Router| <-+-+-+-> |Router|<- - - |-|_| 234 '_' | | '_' 235 _ | | _ 236 |_|-| |-|_| 237 '_' | | '_' 238 | | 240 LEGEND 242 <- - - -> Group Membership Protocol 243 <-+-+-+-> Multicast Routing Protocol 245 Figure 1: Multicast IP Delivery Service 246 ======================================================================= 248 multicast packets must be forwarded away from their source. If a packet 249 is ever forwarded back toward its source, a forwarding loop could have 250 formed, possibly leading to a multicast "storm." 252 Each routing protocol constructs a forwarding table in its own way. 
The forwarding table may tell each router how to forward packets from a
certain source, from a given source sending to a certain group (a
(source, group) pair), or from any source sending to some group. This
may take the form of rules naming the "inbound" or "upstream" interface
on which such packets are expected to arrive, and the (set of)
"downstream" or "outbound" interface(s) required to reach all known
subnetworks with group members. Not all multicast routing protocols use
the same style of forwarding state, and some use techniques not
mentioned here.

2. MULTICAST SUPPORT FOR EMERGING INTERNET APPLICATIONS

Today, the majority of Internet applications rely on point-to-point
transmission. The utilization of point-to-multipoint transmission has
traditionally been limited to local area network applications. Over the
past few years, the Internet has seen a rise in the number of new
applications that rely on multicast transmission. Multicast IP
conserves bandwidth by having the network replicate packets only where
necessary, and offers an attractive alternative to unicast transmission
for the delivery of network ticker tapes, live stock quotes, multiparty
videoconferencing, and shared whiteboard applications (among others).
It is important to note that the applications for IP multicast are not
limited to the Internet. Multicast IP can also play an important role
in large commercial internetworks.

2.1 Reducing Network Load

Assume that a stock ticker application is required to transmit packets
to 100 stations within an organization's network. Unicast transmission
to this set of stations requires the periodic transmission of 100
packets, many of which may in fact traverse the same link(s).
Multicast transmission is the ideal solution for this type of
application, since it requires the source to transmit only a single
packet stream, which is replicated at forks in the multicast delivery
tree.

Broadcast transmission is not an effective solution for this type of
application, since it burdens the CPU of each and every station that
sees the packet, and it wastes bandwidth.

2.2 Resource Discovery

Some applications utilize multicast instead of broadcast transmission
to transmit packets to group members residing on the same subnetwork.
However, there is no reason to limit the extent of a multicast
transmission to a single LAN. The time-to-live (TTL) field in the IP
header can be used to limit the range (or "scope") of a multicast
transmission. "Expanding ring searches" are one often-cited generic
multicast application; they will be discussed in detail once the
mechanisms used by each multicast routing protocol have been described.

2.3 Support for Datacasting Applications

Since 1992, the IETF has conducted a series of "audiocast" experiments
in which live audio and video were multicast from the IETF meeting site
to destinations around the world. In this case, "datacasting" takes
compressed audio and video signals from the source station and
transmits them as a sequence of UDP packets to a group address.
Multicast delivery today is not limited to audio and video. Stock
quote systems are one example of a (connectionless) data-oriented
multicast application.
Someday reliable multicast transport protocols may 313 facilitate efficient inter-computer communication. Reliable multicast 314 transport protocols are currently an active area of research and 315 development. 317 3. THE INTERNET'S MULTICAST BACKBONE (MBone) 319 The Internet Multicast Backbone (MBone) is an interconnected set of 320 subnetworks and routers that support the delivery of IP multicast 321 traffic. The goal of the MBone is to construct a semipermanent IP 322 multicast testbed to enable the deployment of multicast applications 323 without waiting for the ubiquitous deployment of multicast-capable 324 routers in the Internet. 326 The MBone has grown from 40 subnets in four different countries in 1992, 327 to more than 4300 subnets worldwide by July 1997. With new multicast 328 applications and multicast-based services appearing, it seems likely 329 that the use of multicast technology in the Internet will keep growing 330 at an ever-increasing rate. 332 The MBone is a virtual network that is layered on top of sections of the 333 physical Internet. It is composed of islands of multicast routing 334 capability connected to other islands, or "regions," by virtual point- 335 to-point links called "tunnels." The tunnels allow multicast traffic to 336 pass through the non-multicast-capable parts of the Internet. Tunneled 337 IP multicast packets are encapsulated as IP-over-IP (i.e., the protocol 338 number is set to 4) so they are seen as regular unicast packets to the 339 intervening routers. The encapsulating IP header is added on entry to a 340 tunnel and stripped off on exit. This set of multicast routers, their 341 directly-connected subnetworks, and the interconnecting tunnels comprise 342 the MBone. 344 Since the MBone and the Internet have different topologies, multicast 345 routers execute a separate routing protocol to decide how to forward 346 multicast packets. In some cases, this means that they include their 347 own internal unicast routing protocol, but in other cases the multicast 348 routing protocol relies on the routing table provided by the underlying 349 unicast routing protocols. 351 The majority of the MBone regions are currently interconnected by the 352 Distance Vector Multicast Routing Protocol (DVMRP). Internally, the 353 regions may execute any routing protocol they choose, i.e., Multicast 354 extensions to OSPF (MOSPF), or the Protocol-Independent Multicast (PIM) 355 routing protocol(s), or the DVMRP. 357 As multicast routing software features become more widely available on 358 the routers of the Internet, providers may gradually decide to use 359 "native" multicast as an alternative to using lots of tunnels. "Native" 360 multicast happens when a region of routers (or set of regions) operates 361 ======================================================================== 363 +++++++ 364 /|Region | \ 365 /T/| A | \T\ 366 /U/ +++++++ \U\ 367 /N/ | \N\ 368 /N/ | \N\ 369 /E/ | \E\ 370 /L/ | \L\ 371 ++++++++ ++++++++ ++++++++ 372 | Region | | Region | ---------| Region | 373 | B | | C | Tunnel | D | 374 ++++++++\ ++++++++ --------- ++++++++ 375 \T\ | 376 \U\ | 377 \N\ | 378 \N\ | 379 \E\ ++++++++ 380 \L\| Region | 381 \| E | 382 ++++++++ 384 Figure 2: Internet Multicast Backbone (MBone) 386 ======================================================================== 388 without tunnels. All subnetworks are connected by at least one router 389 which is capable of forwarding their multicast packets as required. 
In this case, tunnels are not required: multicast packets simply flow
where they need to.

The MBone carries audio and video multicasts of Internet Engineering
Task Force (IETF) meetings, NASA Space Shuttle missions, US House and
Senate sessions, and other content. There are public and private
sessions on the MBone. Sessions that are meant for public consumption
are announced via the session directory (SDR) tool. Users of this tool
see a list of current and future public sessions, provided they are
within the sender's "scope."

4. MULTICAST ADDRESSING

A multicast address is assigned to a set of Internet hosts comprising a
multicast group. Senders use the multicast address as the destination
IP address of a packet that is to be transmitted to all group members.

4.1 Class D Addresses

An IP multicast group is identified by a Class D address. Class D
addresses have their high-order four bits set to "1110" followed by
a 28-bit multicast group ID. Expressed in standard "dotted-decimal"
notation, multicast group addresses range from 224.0.0.0 to
239.255.255.255 (shorthand: 224.0.0.0/4).

Figure 3 shows the format of a 32-bit Class D address:

========================================================================

    0 1 2 3                                                          31
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|1|1|0|                 Multicast Group ID                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |------------------------28 bits------------------------|

              Figure 3: Class D Multicast Address Format
========================================================================

The Internet Assigned Numbers Authority (IANA) maintains a list of
registered IP multicast groups. The base address 224.0.0.0 is reserved
and cannot be assigned to any group. The block of multicast addresses
ranging from 224.0.0.1 to 224.0.0.255 is reserved for permanent
assignment to various uses, including routing protocols and other
protocols that require a well-known permanent address. Multicast
routers should not forward any multicast datagram with a destination
address in this range, regardless of the packet's TTL.

Some of the well-known groups include:

   "all systems on this subnet"     224.0.0.1
   "all routers on this subnet"     224.0.0.2
   "all DVMRP routers"              224.0.0.4
   "all OSPF routers"               224.0.0.5
   "all OSPF designated routers"    224.0.0.6
   "all RIP2 routers"               224.0.0.9
   "all PIM routers"                224.0.0.13
   "all CBT routers"                224.0.0.15

The remaining addresses, ranging from 224.0.1.0 to 239.255.255.255, are
either permanently assigned to various multicast applications or are
available for dynamic assignment (via SDR or other methods). From this
range, the addresses from 239.0.0.0 to 239.255.255.255 are reserved for
various "administratively scoped" applications within private networks,
not necessarily Internet-wide applications.

The complete list may be found in the Assigned Numbers RFC (RFC 1700 or
its successor) or at the assignments page of the IANA Web site.

4.2 Mapping a Class D Address to an IEEE-802 MAC Address

The IANA has been allocated a reserved portion of the IEEE-802
MAC-layer multicast address space. All of the addresses in IANA's
reserved block begin with 01-00-5E (hex); of this block, the range from
01-00-5E-00-00-00 to 01-00-5E-7F-FF-FF is used for IP multicast groups.
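The mapping procedure, described in the paragraphs that follow, simply
copies the low-order 23 bits of the Class D address into this block.
As a non-normative illustration, the whole procedure fits in a short
Python sketch (the function name is ours):

   import ipaddress

   def class_d_to_mac(group: str) -> str:
       """Map an IPv4 Class D address into IANA's 01-00-5E block.

       Non-normative sketch: the low-order 23 bits of the group
       address become the low-order 23 bits of 01-00-5E-00-00-00.
       """
       addr = int(ipaddress.IPv4Address(group))
       if (addr >> 28) != 0b1110:
           raise ValueError("not a Class D (multicast) address")
       mac = 0x01005E000000 | (addr & 0x7FFFFF)
       return "-".join(f"{(mac >> s) & 0xFF:02X}"
                       for s in range(40, -8, -8))

   # The three example addresses of Figure 4 collide on one MAC address:
   for g in ("234.138.8.5", "224.10.8.5", "225.138.8.5"):
       print(g, "->", class_d_to_mac(g))   # all print 01-00-5E-0A-08-05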
A simple procedure was developed to map Class D addresses to this
reserved MAC-layer multicast address block. This allows IP multicasting
to easily take advantage of the hardware-level multicasting supported
by network interface cards.

The mapping between a Class D IP address and an IEEE-802 (e.g., FDDI,
Ethernet) MAC-layer multicast address is obtained by placing the low-
order 23 bits of the Class D address into the low-order 23 bits of
IANA's reserved MAC-layer multicast address block. This simple
procedure removes the need for an explicit multicast address resolution
protocol on LANs (akin to ARP for unicast). All LAN stations know this
simple transformation, and can easily send any IP multicast over any
IEEE-802-based LAN.

Figure 4 illustrates how the multicast group address 234.138.8.5
(or EA-8A-08-05 expressed in hex) is mapped into an IEEE-802 multicast
address. Note that the high-order nine bits of the IP address are not
mapped into the MAC-layer multicast address.

The mapping in Figure 4 places the low-order 23 bits of the IP multicast
group ID into the low-order 23 bits of the IEEE-802 multicast address.
Because the Class D space contains more groups (2^28) than the mapped
portion of IANA's OUI block can represent at the MAC layer (2^23),
multiple group addresses map to each IEEE-802 address. To be precise,
32 Class D addresses map to each MAC-layer multicast address. If two or
more groups on a LAN have, by coincidence, chosen Class D addresses
which map to the same MAC-layer multicast address, the member hosts
will receive traffic for all of those groups. It is then up to their
IP-layer software to determine which packets are for the group(s) to
which they really belong. The Class D addresses 224.10.8.5
(E0-0A-08-05) and 225.138.8.5 (E1-8A-08-05) are shown mapping to the
same IEEE-802 MAC-layer multicast address (01-00-5E-0A-08-05).

4.3 Transmission and Delivery of Multicast Datagrams

When the sender and receivers are members of the same (LAN) subnetwork,
the transmission and reception of multicast frames is a straightforward
process. The source station simply addresses the IP packet to the
multicast group, the network interface card maps the Class D address to

========================================================================

Class D Addresses:

  234.138.8.5 (EA-8A-08-05)   1110 1010 1000 1010 0000 1000 0000 0101
  224.10.8.5  (E0-0A-08-05)   1110 0000 0000 1010 0000 1000 0000 0101
  225.138.8.5 (E1-8A-08-05)   1110 0001 1000 1010 0000 1000 0000 0101
                              **** '''' '^^^ ^^^^ ^^^^ ^^^^ ^^^^ ^^^^

              |       E A     |   8
  Class-D IP  |_______ _______|__ _ _ _
  Address     |-+-+-+-+-+-+-+-|-+ - - -
              |1 1 1 0 1 0 1 0|1
              |-+-+-+-+-+-+-+-|-+ - - -
                ...................
  IEEE-802      ....not.........
  MAC-Layer     ..............
  Multicast     ....mapped..
  Address       ...........
              |-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+ - - -
              |0 0 0 0 0 0 0 1|0 0 0 0 0 0 0 0|0 1 0 1 1 1 1 0|0
              |-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+ - - -
              |_______ _______|_______ _______|_______ _______|_______
              |      0 1      |      0 0      |      5 E      |   0

  [Address mapping below continued from half above]

             |       8 A     |      0 8      |      0 5      |
             |_______ _______|_______ _______|_______ _______| Class-D IP
      - - - -|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-| Address
             |  0 0 0 1 0 1 0|0 0 0 0 1 0 0 0|0 0 0 0 0 1 0 1|
      - - - -|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|
              \____________ ____________________/
                           \___ ___/
                               \ /
                                |
                    23 low-order bits mapped
                                |
                                v
      - - - -|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-| IEEE-802
             |  0 0 0 1 0 1 0|0 0 0 0 1 0 0 0|0 0 0 0 0 1 0 1| MAC-Layer
      - - - -|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-| Multicast
             |_______ _______|_______ _______|_______ _______| Address
             |      0 A      |      0 8      |      0 5      |

  Figure 4: Mapping between Class D and IEEE-802 Multicast Addresses
========================================================================

the corresponding IEEE-802 multicast address, and the frame is sent.
Internet hosts that need to receive selected multicast frames, whether
because a user has executed a multicast application, or because the
host's IP stack is required to receive certain groups (e.g., 224.0.0.1,
the "all-hosts" group), notify their driver software of which group
addresses to accept.

Things become somewhat more complex when the sender is attached to one
subnetwork and the receivers reside on different subnetworks. In this
case, the routers must implement a multicast routing protocol that
permits the construction of multicast delivery trees and supports
multicast packet forwarding. In addition, each router needs to
implement a group membership protocol that allows it to learn about the
existence of group members on its directly attached subnetworks.

5. INTERNET GROUP MANAGEMENT PROTOCOL (IGMP)

The Internet Group Management Protocol (IGMP) runs between hosts and
their immediately-neighboring multicast routers. The mechanisms of the
protocol allow a host to inform its local router that it wishes to
receive transmissions addressed to a specific multicast group. Also,
routers periodically query the LAN to determine whether any group
members are still active. If there is more than one IP multicast router
on the LAN, one of the routers is elected "querier" and assumes the
responsibility of querying the LAN for the presence of any group
members.

Based on the group membership information learned from IGMP, a router
is able to determine which (if any) multicast traffic needs to be
forwarded to each of its "leaf" subnetworks. "Leaf" subnetworks are
those that have no further downstream routers; they either contain
receivers for some set of groups, or they do not. Multicast routers use
the information derived from IGMP, along with a multicast routing
protocol, to support IP multicasting across the MBone.

5.1 IGMP Version 1

IGMP version 1 was specified in RFC-1112. According to the
specification, multicast routers periodically transmit Host Membership
Query messages to determine which host groups have members on their
directly-attached networks. IGMP Query messages are addressed to the
all-hosts group (224.0.0.1) and have an IP TTL = 1.
This means that Query messages sourced from a router are transmitted
onto the directly-attached subnetwork, but are not forwarded by any
other multicast routers.

When a host receives an IGMP Query message, it responds with a Host
Membership Report for each group to which it belongs. (This is an
important point: IGMP Queries are sent to the "all hosts on this
subnet" Class D address (224.0.0.1), but each IGMP Report is sent to
the address of the group being reported. IGMP Reports, like Queries,
are sent with the IP TTL = 1, and thus are not forwarded beyond the
local subnetwork.)

========================================================================

  Group 1                          ____________________
   ____            ____           |     multicast      |
  |    |          |    |          |       router       |
  |_H2_|          |_H4_|          |____________________|
   ----            ----     +-----+       |
    |                | <----|Query|       |
    |                |      +-----+       |
    |                |                    |
  |-----+--+---------+-----+----------+--------------------+----|
        |                  |                      |
        |                  |                      |
       ____               ____                   ____
      |    |             |    |                 |    |
      |_H1_|             |_H3_|                 |_H5_|
       ----               ----                   ----
      Group 2            Group 1                Group 1
                         Group 2

    Figure 5: Internet Group Management Protocol - Query Message
========================================================================

In order to avoid a flurry of Reports, each host starts a randomly-
chosen Report delay timer for each of its group memberships. If,
during the delay period, another host's Report is heard for the same
group, every other member of that group stops its timer and suppresses
its own Report. This procedure spreads Reports out over a period of
time and minimizes Report traffic for each group that has at least one
member on a given subnetwork.

It should be noted that multicast routers do not need to be directly
addressed, since their interfaces are required to promiscuously receive
all multicast IP traffic. Also, a router does not need to maintain a
detailed list of which hosts belong to each multicast group; the router
only needs to know that at least one group member is present on a given
network interface.

Multicast routers periodically transmit IGMP Queries to update their
knowledge of the group members present on each network interface. If
the router does not receive a Report from any members of a particular
group after a number of Queries, the router assumes that group members
are no longer present on that interface. Assuming this is a leaf subnet
(i.e., a subnet with group members but no multicast routers connecting
to additional group members further downstream), the interface is
removed from the delivery tree(s) for this group. Multicasts will
continue to be sent on this interface only if the router can tell (via
multicast routing protocols) that there are additional group members
further downstream reachable via this interface.

When a host first joins a group, it immediately transmits an IGMP
Report for the group rather than waiting for a router's IGMP Query.
This reduces the "join latency" for the first host to join a given
group on a particular subnetwork. "Join latency" is measured from the
time when a host's first IGMP Report is sent until the first packet for
that group arrives on that host's subnetwork. Of course, if the group
is already active, the join latency is negligible.
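To make the report-suppression mechanism concrete, the following
non-normative Python sketch models one Query round for a single group
(the host names and timer constant are illustrative, not protocol
values):

   import random

   MAX_RESPONSE_DELAY = 10.0            # seconds; illustrative only

   class Membership:
       """Report-suppression state for one (host, group) membership."""

       def __init__(self, host: str):
           self.host = host
           self.delay = None             # pending Report delay timer

       def on_query(self) -> None:
           # A Query starts a randomly-chosen delay timer.
           self.delay = random.uniform(0.0, MAX_RESPONSE_DELAY)

       def on_report_heard(self) -> None:
           # Another member answered first: stop the timer, stay silent.
           self.delay = None

   # One Query round for a group with three members:
   members = [Membership(h) for h in ("H1", "H3", "H5")]
   for m in members:
       m.on_query()

   first = min(members, key=lambda m: m.delay)
   print(f"{first.host} sends the Report (to the group address, TTL = 1)")
   for m in members:
       if m is not first:
           m.on_report_heard()           # everyone else suppresses

Ignoring propagation delay, exactly one Report per group answers each
Query, regardless of how many members are present.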
5.2 IGMP Version 2

IGMP version 2 was distributed as part of the Distance Vector Multicast
Routing Protocol (DVMRP) implementation ("mrouted") source code, from
version 3.3 through 3.8. Initially, there was no detailed specification
for IGMP version 2 other than this source code. However, the complete
specification has recently been published as an Internet-Draft (a
work-in-progress) that will update the specification contained in the
first appendix of RFC-1112. IGMP version 2 extends IGMP version 1 while
maintaining backward compatibility with version 1 hosts.

IGMP version 2 defines a procedure for the election of the multicast
querier for each LAN. In IGMP version 2, the multicast router with the
lowest IP address on the LAN is elected the multicast querier. In IGMP
version 1, the querier election was determined by the multicast routing
protocol.

IGMP version 2 defines a new type of Query message: the Group-Specific
Query. Group-Specific Query messages allow a router to transmit a Query
to a specific multicast group rather than to all groups residing on a
directly attached subnetwork.

Finally, IGMP version 2 defines a Leave Group message to lower IGMP's
"leave latency." When a host wishes to leave a group, it transmits an
IGMPv2 Leave Group message to the all-routers group (224.0.0.2), with
the group field set to the group being left. After receiving a Leave
Group message, the IGMPv2-elected querier must find out whether this
was the last member of this group on this subnetwork. To do this, the
router begins transmitting Group-Specific Query messages on the
interface that received the Leave Group message. If it hears no
Reports in response to the Group-Specific Query messages, then (if this
is a leaf subnet) this interface is removed from the delivery tree(s)
for this group (as was the case with IGMP version 1). Even if there are
no group members on this subnetwork, multicasts must continue to be
sent onto it if the router can tell that additional group members,
further away from the source (i.e., "downstream"), are reachable via
other routers attached to this subnetwork.

"Leave latency" is measured from a router's perspective. In version 1
of IGMP, leave latency was the time from a router's hearing the last
Report for a given group until the router aged that interface out of
the delivery tree for that group (assuming this is a leaf subnet, of
course). Note that the only way for the router to tell that it has
heard from the LAST group member is that no Reports are heard in some
multiple of the Query Interval (which is on the order of minutes).
IGMP version 2, with the addition of the Leave Group message, allows a
group member to more quickly inform the router that it is done
receiving traffic for a group. The router then must determine whether
this host was the last member of this group on this subnetwork. To do
this, the router quickly queries the subnetwork for other group members
via the Group-Specific Query message. If no members send Reports after
several of these Group-Specific Queries, the router can infer that the
last member of that group has, indeed, left the subnetwork.
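The querier's reaction to a Leave Group message can be sketched as
follows (a non-normative Python sketch; the retry count is illustrative
and the `members_remaining` argument merely stands in for listening on
the wire):

   LAST_MEMBER_QUERY_COUNT = 2   # illustrative; the Group-Specific
                                 # Queries are spaced roughly a second
                                 # apart in practice

   def handle_leave(group: str, members_remaining: int) -> str:
       """Toy model of the elected querier reacting to a Leave Group."""
       for attempt in range(1, LAST_MEMBER_QUERY_COUNT + 1):
           print(f"Group-Specific Query #{attempt} for {group}")
           if members_remaining > 0:
               return "Report heard: keep forwarding onto this subnet"
       return "no Reports: age this leaf interface out of the tree"

   print(handle_leave("233.252.0.7", members_remaining=0))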
The benefit of lowering the leave latency is that the router can
quickly use its multicast routing protocol to inform its "upstream"
neighbors ("upstream" is the direction that a router considers to be
toward a source), allowing the delivery tree for this group to quickly
adapt to the new shape of the group (without this subnetwork and the
branch that led to it). The alternative is having to wait for several
rounds of unanswered Queries to pass (on the order of minutes). If a
group is experiencing high traffic levels, it can be very beneficial to
stop transmitting data for this group as soon as possible.

5.3 IGMP Version 3 (Future)

IGMP version 3 is a preliminary work-in-progress. IGMP version 3 as it
is currently defined will introduce support for Group-Source Report
messages, so that a host may elect to receive traffic only from
specific sources within a group. An Inclusion Group-Source Report
message will allow a host to specify the IP address(es) of the specific
sources it wants to receive, and an Exclusion Group-Source Report
message will allow a host to explicitly ask that traffic from some
(list of) sources be blocked. With IGMP versions 1 and 2, if a host
wants to receive any traffic for a group, then traffic from all the
group's sources will be forwarded onto the host's subnetwork.

IGMP version 3 will help conserve bandwidth by allowing a host to
select only the specific sources from which it wants to receive
traffic. Also, multicast routing protocols will be able to use this
information to help conserve bandwidth when constructing the branches
of their multicast delivery trees.

Finally, support for Leave Group messages, first introduced in IGMPv2,
has been enhanced to support Group-Source Leave messages. This feature
will allow a host to leave an entire group, or to specify the specific
IP address(es) of the (source, group) pair(s) that it wishes to leave.
Note that at this time, not all existing multicast routing protocols
can support such requests. This is one issue that needs to be addressed
during the development of IGMP version 3.

6. MULTICAST FORWARDING TECHNIQUES

IGMP provides the final step in a multicast packet delivery service,
since it is only concerned with the forwarding of multicast traffic
from a router to group members on its directly-attached subnetworks.
IGMP is not concerned with the delivery of multicast packets between
neighboring routers or across an internetwork.

To provide an internetwork delivery service, it is necessary to define
multicast routing protocols. A multicast routing protocol is
responsible for the construction of multicast delivery trees and
enabling multicast packet forwarding.
This section explores a number of different techniques that may
potentially be employed by multicast routing protocols:

   o "Simpleminded" Techniques
     - Flooding
     - Multicast Extensions to MAC-layer Spanning Trees

   o Source-Based Tree (SBT) Techniques
     - Reverse Path Broadcasting (RPB)
     - Truncated Reverse Path Broadcasting (TRPB)
     - Reverse Path Multicasting (RPM)

   o "Shared-Tree" Techniques

Later sections will describe how these algorithms are implemented in
the most prevalent multicast routing protocols in the Internet today
(e.g., the Distance Vector Multicast Routing Protocol (DVMRP),
Multicast Extensions to OSPF (MOSPF), Protocol-Independent Multicast
(PIM), and Core-Based Trees (CBT)).

6.1 "Simpleminded" Techniques

Flooding and Multicast Extensions to MAC-layer Spanning Trees are two
algorithms that could be used to build primitive multicast routing
protocols. These techniques are primitive because they tend to waste
bandwidth or require a large amount of computational resources within
the participating multicast routers. Also, protocols built on these
techniques may work for small networks with few senders, groups, and
routers, but they do not scale well to larger numbers of senders,
groups, or routers. Finally, the ability to handle arbitrary topologies
may be absent, or may only be present in limited ways.

6.1.1 Flooding

The simplest technique for delivering multicast datagrams to all
routers in an internetwork is to implement a flooding algorithm. The
flooding procedure begins when a router receives a packet that is
addressed to a multicast group. The router employs a protocol mechanism
to determine whether or not it has seen this particular packet before.
If this is the first reception of the packet, the packet is forwarded
on all interfaces (except the one on which it arrived), guaranteeing
that the multicast packet reaches all routers in the internetwork. If
the router has seen the packet before, the packet is discarded.

A flooding algorithm is very simple to implement, since a router does
not have to maintain a routing table and only needs to keep track of
the most recently seen packets. However, flooding does not scale for
Internet-wide applications, since it generates a large number of
duplicate packets and uses all available paths across the internetwork
instead of just a limited number. Also, the flooding algorithm makes
inefficient use of router memory resources, since each router is
required to maintain a distinct table entry for each recently seen
packet.

6.1.2 Multicast Extensions to MAC-layer Spanning Trees

A more effective solution than flooding would be to select a subset of
the internetwork topology which forms a spanning tree. The spanning
tree defines a structure in which only one active path connects any two
routers of the internetwork. Figure 6 shows an internetwork and a
spanning tree rooted at router RR.

Once the spanning tree has been built, a multicast router simply
forwards each multicast packet on all interfaces that are part of the
spanning tree except the one on which the packet originally arrived.
Forwarding along the branches of a spanning tree guarantees that the
multicast packet will not loop and that it will eventually reach all
routers in the internetwork.
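This forwarding rule is compact enough to sketch directly (a
non-normative Python sketch; the interface names are invented):

   def spanning_tree_forward(arrival_iface: str,
                             tree_ifaces: set) -> set:
       """Forward on every spanning-tree interface except the arrival
       one.  Packets arriving on a non-tree interface are discarded,
       which is what keeps multicast packets from looping.
       """
       if arrival_iface not in tree_ifaces:
           return set()                        # not on the tree: discard
       return tree_ifaces - {arrival_iface}    # flood along the tree

   # A router whose spanning-tree interfaces are if0, if2, and if3:
   print(sorted(spanning_tree_forward("if2", {"if0", "if2", "if3"})))
   # -> ['if0', 'if3']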
839 A spanning tree solution is powerful and would be relatively easy to 840 implement since there is a great deal of experience with spanning tree 841 protocols in the Internet community. However, a spanning tree solution 842 can centralize traffic on a small number of links, and may not provide 843 the most efficient path between the source subnetwork and group members. 844 Also, it is computationally difficult to compute a spanning tree in 845 large, complex topologies. 847 [This space was intentionally left blank.] 848 ======================================================================== 850 A Sample Internetwork: 852 #----------------# 853 / |\ / \ 854 | | \ / \ 855 | | \ / \ 856 | | \ / \ 857 | | \ / \ 858 | | #------# \ 859 | | / | \ \ 860 | | / | \ \ 861 | \ / | \-------# 862 | \ / | -----/| 863 | #-----------#----/ | 864 | /|\--- --/| \ | 865 | / | \ / \ \ | 866 | / \ /\ | \ / 867 | / \ / \ | \ / 868 #---------#-- \ | ----# 869 \ \ | / 870 \--- #-/ 872 One Possible Spanning Tree for this Sample Internetwork: 874 # # 875 \ / 876 \ / 877 \ / 878 \ / 879 \ / 880 #------RR 881 | \ 882 | \ 883 | \-------# 884 | 885 #-----------#---- 886 /| | \ 887 / | \ \ 888 / \ | \ 889 / \ | \ 890 # # | # 891 | 892 # 893 LEGEND 895 # Router 896 RR Root Router 898 Figure 6: Spanning Tree 899 ======================================================================== 900 6.2 Source-Based Tree Techniques 902 The following techniques all generate a source-based tree by various 903 means. The techniques differ in the efficiency of the tree building 904 process, and the bandwidth and router resources (i.e., state tables) 905 used to build a source-based tree. 907 6.2.1 Reverse Path Broadcasting (RPB) 909 A more efficient solution than building a single spanning tree for the 910 entire internetwork would be to build a spanning tree for each potential 911 source [subnetwork]. These spanning trees would result in source-based 912 delivery trees emanating from the subnetworks directly connected to the 913 source stations. Since there are many potential sources for a group, a 914 different delivery tree is constructed rooted at each active source. 916 6.2.1.1 Reverse Path Broadcasting: Operation 918 The fundamental algorithm to construct these source-based trees is 919 referred to as Reverse Path Broadcasting (RPB). The RPB algorithm is 920 actually quite simple. For each source, if a packet arrives on a link 921 that the local router believes to be on the shortest path back toward 922 the packet's source, then the router forwards the packet on all 923 interfaces except the incoming interface. If the packet does not 924 arrive on the interface that is on the shortest path back toward the 925 source, then the packet is discarded. The interface over which the 926 router expects to receive multicast packets from a particular source is 927 referred to as the "parent" link. The outbound links over which the 928 router forwards the multicast packet are called "child" links for this 929 source. 931 This basic algorithm can be enhanced to reduce unnecessary packet 932 duplication. If the local router making the forwarding decision can 933 determine whether a neighboring router on a child link is "downstream," 934 then the packet is multicast toward the neighbor. (A "downstream" 935 neighbor is a neighboring router which considers the local router to be 936 on the shortest path back toward a given source.) 
Otherwise, the packet is not forwarded on the potential child link,
since the local router knows that the neighboring router would simply
discard the packet (it would arrive on a non-parent link for the
source, relative to that downstream router).

The information needed to make this "downstream" decision is relatively
easy to derive from a link-state routing protocol, since each router
maintains a topological database for the entire routing domain. If a
distance-vector routing protocol is employed, a neighbor can either
advertise its previous hop for the source as part of its routing update
messages or "poison reverse" the route toward a source if it is not on
the delivery tree for that source. Either of these techniques allows an
upstream router to determine whether a downstream neighboring router is
on an active branch of the distribution tree for a certain source.

========================================================================

                 Source
                   |  ^
                   |  :  shortest path back to the
                   |  :  source for THIS router
                   |  :
              "parent link"
                    _
            ______|!2|_____
           |               |
 --"child -|!1|         |!3|- "child --
    link"  |    ROUTER     |   link"
           |_______________|

   Figure 7: Reverse Path Broadcasting - Forwarding Algorithm
========================================================================

Figure 8 (below) illustrates the enhanced RPB algorithm in operation.
Note that the source station (S) is attached to a leaf subnetwork
directly connected to Router A. For this example, we will look at the
RPB algorithm from Router B's perspective. Router B receives the
multicast packet from Router A on link 1. Since Router B considers link
1 to be the parent link for the (source, group) pair, it forwards the
packet on link 4, link 5, and the local leaf subnetworks if they
contain group members. Router B does not forward the packet on link 3,
because it knows from routing protocol exchanges that Router C
considers link 2 to be its parent link for the source. Router B knows
that if it were to forward the packet on link 3, the packet would be
discarded by Router C, since it would not arrive on Router C's parent
link for this source.

6.2.1.2 RPB: Benefits and Limitations

The key benefit of reverse path broadcasting is that it is reasonably
efficient and easy to implement. It does not require that the router
know about the entire spanning tree, nor does it require a special
mechanism to stop the forwarding process (as flooding does). In
addition, it guarantees efficient delivery, since multicast packets
always follow the "shortest" path from the source station to the
destination group. Finally, the packets are distributed over multiple
links, resulting in better network utilization, since a different tree
is computed for each source.

One of the major limitations of the RPB algorithm is that it does not
take into account multicast group membership when building the delivery
tree for a source. As a result, extraneous datagrams may be forwarded
onto subnetworks that have no group members.
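The heart of the RPB forwarding decision can be expressed in a few
lines (a non-normative Python sketch; the link names follow the Figure
8 example below):

   def rpb_forward(source: str, arrival_iface: str,
                   parent_iface: dict, child_ifaces: dict) -> list:
       """Reverse Path Broadcasting forwarding decision (sketch).

       parent_iface maps each source to the interface on the shortest
       path back toward it; child_ifaces maps each source to the links
       on which packets from that source should be sent onward.
       """
       if arrival_iface != parent_iface[source]:
           return []                  # failed the reverse-path check
       return child_ifaces[source]    # forward away from the source

   # Router B's state for source S in Figure 8 (link 3 is omitted from
   # the child set because Router C's parent link for S is link 2):
   parent = {"S": "link1"}
   children = {"S": ["link4", "link5"]}
   print(rpb_forward("S", "link1", parent, children))  # ['link4', 'link5']
   print(rpb_forward("S", "link3", parent, children))  # [] -- discarded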
1003 ======================================================================== 1005 Source Station------>O 1006 A # 1007 +|+ 1008 + | + 1009 + O + 1010 + + 1011 1 2 1012 + + 1013 + + 1014 + + 1015 B + + C 1016 O-#- - - - -3- - - - -#-O 1017 +|+ -|+ 1018 + | + - | + 1019 + O + - O + 1020 + + - + 1021 + + - + 1022 4 5 6 7 1023 + + - + 1024 + + E - + 1025 + + - + 1026 D #- - - - -8- - - - -#- - - - -9- - - - -# F 1027 | | | 1028 O O O 1029 LEGEND 1031 O Leaf 1032 + + Shortest-path 1033 - - Branch 1034 # Router 1036 Figure 8: Reverse Path Broadcasting - Example 1037 ======================================================================== 1039 6.2.2 Truncated Reverse Path Broadcasting (TRPB) 1041 Truncated Reverse Path Broadcasting (TRPB) was developed to overcome the 1042 limitations of Reverse Path Broadcasting. With information provided by 1043 IGMP, multicast routers determine the group memberships on each leaf 1044 subnetwork and avoid forwarding datagrams onto a leaf subnetwork if it 1045 does not contain at least one member of a given destination group. Thus, 1046 the delivery tree is "truncated" by the router if a leaf subnetwork has 1047 no group members. 1049 Figure 9 illustrates the operation of TRPB algorithm. In this example 1050 the router receives a multicast packet on its parent link for the 1051 Source. The router forwards the datagram on interface 1 since that 1052 interface has at least one member of G1. The router does not forward 1053 the datagram to interface 3 since this interface has no members in the 1054 destination group. The datagram is forwarded on interface 4 if and only 1055 if a downstream router considers this subnetwork to be part of its 1056 "parent link" for the Source. 1058 ====================================================================== 1060 Source 1061 | : 1062 : 1063 | : (Source, G1) 1064 v 1065 | 1066 "parent link" 1067 | 1068 "child link" ___ 1069 G1 _______|2|_____ 1070 \ | | 1071 G3\\ _____ ___ ROUTER ___ ______ / G2 1072 \| hub |--|1| |3|-----|switch|/ 1073 /|_____| ^-- ___ --^ |______|\ 1074 / ^ |______|4|_____| ^ \ 1075 G1 ^ .^--- ^ G3 1076 ^ .^ | ^ 1077 ^ .^ "child link" ^ 1078 Forward | Truncate 1080 Figure 9: Truncated Reverse Path Broadcasting - (TRPB) 1081 ====================================================================== 1083 TRPB removes some limitations of RPB but it solves only part of the 1084 problem. It eliminates extraneous traffic on leaf subnetworks, but it 1085 does not consider group memberships when building the branches of the 1086 delivery tree. 1088 6.2.3 Reverse Path Multicasting (RPM) 1090 Reverse Path Multicasting (RPM) is an enhancement to Reverse Path 1091 Broadcasting and Truncated Reverse Path Broadcasting. 1093 RPM creates a delivery tree that spans only 1) subnetworks with group 1094 members, and 2) routers and subnetworks along the shortest path to those 1095 subnetworks. RPM allows the source-based "shortest-path" tree to be 1096 "pruned" so that datagrams are only forwarded along branches that lead 1097 to active members of the destination group. 1099 6.2.3.1 Operation 1101 When a multicast router receives a packet for a (source, group) pair, 1102 the first packet is forwarded following the TRPB algorithm across all 1103 routers in the internetwork. Routers on the edge of the network (which 1104 have only leaf subnetworks) are called leaf routers. The TRPB algorithm 1105 guarantees that each leaf router will receive at least the first 1106 multicast packet. 
If there is a group member on one of its leaf 1107 subnetworks, a leaf router forwards the packet based on this group 1108 membership information. 1110 ======================================================================== 1112 Source 1113 | : 1114 | : (Source, G) 1115 | : 1116 | v 1117 | 1118 o-#-G 1119 , |********** 1120 ^ | * 1121 , | * 1122 ^ | * o 1123 , | * / 1124 o-#-o ,#*********** 1125 ^ |\ ^ |\ * 1126 , | o , | G * 1127 ^ | ^ | * 1128 , | , | * 1129 ^,| ^,| * 1130 # # # 1131 /|\ /|\ /|\ 1132 o o o o o o G o G 1133 LEGEND 1135 # Router 1136 o Leaf without group member 1137 G Leaf with group member 1138 *** Active Branch 1139 --- Pruned Branch 1140 ,>, Prune Message (direction of flow --> 1142 Figure 10: Reverse Path Multicasting (RPM) 1143 ======================================================================== 1145 If none of the subnetworks connected to the leaf router contain group 1146 members, the leaf router may transmit a "prune" message on its parent 1147 link, informing the upstream router that it should not forward packets 1148 for this particular (source, group) pair on the child interface on which 1149 it received the prune message. Prune messages are sent just one hop 1150 back toward the source. 1152 An upstream router receiving a prune message is required to store the 1153 prune information in memory. If the upstream router has no recipients 1154 on local leaf subnetworks and has received prune messages from each 1155 downstream neighbor on each of the child interfaces for this (source, 1156 group) pair, then the upstream router does not need to receive any more 1157 packets for this (source, group) pair. Therefore, the upstream router 1158 can also generate a prune message of its own, one hop further back 1159 toward the source. This cascade of prune messages results in an active 1160 multicast delivery tree, consisting exclusively of "live" branches 1161 (i.e., branches that lead to active receivers). 1163 Since both the group membership and internetwork topology can change 1164 dynamically, the pruned state of the multicast delivery tree must be 1165 refreshed periodically. At regular intervals, the prune information 1166 expires from the memory of all routers and the next packet for the 1167 (source, group) pair is forwarded toward all downstream routers. This 1168 allows "stale state" (prune state for groups that are no longer active) 1169 to be reclaimed by the multicast routers. 1171 6.2.3.2 Limitations 1173 Despite the improvements offered by the RPM algorithm, there are still 1174 several scaling issues that need to be addressed when attempting to 1175 develop an Internet-wide delivery service. The first limitation is that 1176 multicast packets must be periodically broadcast across every router in 1177 the internetwork, onto every leaf subnetwork. This "broadcasting" is 1178 wasteful of bandwidth (until the updated prune state is constructed). 1180 This "broadcast and prune" paradigm is very powerful, but it wastes 1181 bandwidth and does not scale well, especially if there are receivers at 1182 the edge of the delivery tree which are connected via low-speed 1183 technologies (e.g., ISDN or modem). Also, note that every router 1184 participating in the RPM algorithm must either have a forwarding table 1185 entry for a (source, group) pair, or have prune state information for 1186 that (source, group) pair. 
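To make the preceding point about per-(source, group) state concrete,
here is a minimal Python sketch (a hypothetical state layout of our
own devising, not DVMRP's actual data structures) of the rule an
upstream router applies before generating its own prune:

   # Hypothetical per-(source, group) state kept by an RPM router.
   class SGState:
       def __init__(self, child_links):
           self.local_members = False      # receivers learned via IGMP
           self.pruned = set()             # child links already pruned
           self.child_links = set(child_links)

       def on_prune(self, child_link):
           self.pruned.add(child_link)
           return self.may_prune_upstream()

       def may_prune_upstream(self):
           # A router may send its own prune one hop back toward the
           # source only when every child link has been pruned and no
           # directly-attached group members remain.
           return (not self.local_members
                   and self.pruned == self.child_links)

   st = SGState(child_links=["link4", "link5"])
   print(st.on_prune("link4"))   # False: link5 is still unpruned
   print(st.on_prune("link5"))   # True: cascade a prune upstream

Note that state of roughly this shape must exist for every (source,
group) pair a router has seen, whether or not the router lies on any
active delivery tree.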
1188 It is clearly wasteful (especially as the number of active sources and 1189 groups increase) to place such a burden on routers that are not on every 1190 (or perhaps any) active delivery tree. Shared tree techniques are an 1191 attempt to address these scaling issues, which become quite acute when 1192 most groups' senders and receivers are sparsely distributed across the 1193 internetwork. 1195 6.3 Shared Tree Techniques 1197 The most recent additions to the set of multicast forwarding techniques 1198 are based on a shared delivery tree. Unlike shortest-path tree 1199 algorithms which build a source-based tree for each source, or each 1200 (source, group) pair, shared tree algorithms construct a single delivery 1201 tree that is shared by all members of a group. The shared tree approach 1202 is quite similar to the spanning tree algorithm except it allows the 1203 definition of a different shared tree for each group. Stations wishing 1204 to receive traffic for a multicast group must explicitly join the shared 1205 delivery tree. Multicast traffic for each group is sent and received 1206 over the same delivery tree, regardless of the source. 1208 6.3.1 Operation 1210 A shared tree may involve a single router, or set of routers, which 1211 comprise(s) the "core" of a multicast delivery tree. Figure 11 1212 illustrates how a single multicast delivery tree is shared by all 1213 sources and receivers for a multicast group. 1215 ======================================================================== 1217 Source Source Source 1218 | | | 1219 | | | 1220 v v v 1222 [#] * * * * * [#] * * * * * [#] 1223 * 1224 ^ * ^ 1225 | * | 1226 join | * | join 1227 | [#] | 1228 [x] [x] 1229 : : 1230 member member 1231 host host 1233 LEGEND 1235 [#] Shared Tree "Core" Routers 1236 * * Shared Tree Backbone 1237 [x] Member-hosts' directly-attached routers 1239 Figure 11: Shared Multicast Delivery Tree 1241 ======================================================================== 1243 Similar to other multicast forwarding algorithms, shared tree algorithms 1244 do not require that the source of a multicast packet be a member of a 1245 destination group in order to send to a group. 1247 6.3.2 Benefits 1249 In terms of scalability, shared tree techniques have several advantages 1250 over source-based trees. Shared tree algorithms make efficient use of 1251 router resources since they only require a router to maintain state 1252 information for each group, not for each source, or for each (source, 1253 group) pair. (Remember that source-based tree techniques required all 1254 routers in an internetwork to either a) be on the delivery tree for a 1255 given source or (source, group) pair, or b) to have prune state for 1256 that source or (source, group) pair: So the entire internetwork must 1257 participate in the source-based tree protocol.) This improves the 1258 scalability of applications with many active senders since the number of 1259 source stations is no longer a scaling issue. Also, shared tree 1260 algorithms conserve network bandwidth since they do not require that 1261 multicast packets be periodically flooded across all multicast routers 1262 in the internetwork onto every leaf subnetwork. This can offer 1263 significant bandwidth savings, especially across low-bandwidth WAN 1264 links, and when receivers sparsely populate the domain of operation. 
1265 Finally, since receivers are required to explicitly join the shared 1266 delivery tree, data only ever flows over those links that lead to active 1267 receivers. 1269 6.3.3 Limitations 1271 Despite these benefits, there are still several limitations to protocols 1272 that are based on a shared tree algorithm. Shared trees may result in 1273 traffic concentration and bottlenecks near core routers since traffic 1274 from all sources traverses the same set of links as it approaches the 1275 core. In addition, a single shared delivery tree may create suboptimal 1276 routes (a shortest path between the source and the shared tree, a 1277 suboptimal path across the shared tree, a shortest path between the 1278 group's core router and the receiver's directly-attached router), 1279 resulting in increased delay which may be a critical issue for some 1280 multimedia applications. (Simulations indicate that latency over a 1281 shared tree may be approximately 10% larger than source-based trees in 1282 many cases, but by the same token, this may be negligible for many 1283 applications.) 1285 Finally, expanding-ring searches will probably not work as expected 1286 inside shared-tree domains. The searching host's increasing TTL will 1287 cause the packets to work their way up the shared tree, and while a 1288 desired resource may still be found, it may not be as topologically 1289 close as one would expect. 1291 7. "DENSE MODE" ROUTING PROTOCOLS 1293 Certain multicast routing protocols are designed to work well in 1294 environments that have plentiful bandwidth and where it is reasonable 1295 to assume that receivers are rather densely distributed. In such 1296 scenarios, it is very reasonable to use periodic flooding, or other 1297 bandwidth-intensive techniques that would not necessarily be very 1298 scalable over a wide-area network. In section 8, we will examine 1299 different protocols that are specifically geared toward efficient WAN 1300 operation, especially for groups that have widely dispersed (i.e., 1301 sparse) membership. 1303 So-called "dense"-mode routing protocols include: 1305 o the Distance Vector Multicast Routing Protocol (DVMRP), 1307 o Multicast Extensions to Open Shortest Path First (MOSPF), and 1309 o Protocol Independent Multicast - Dense Mode (PIM-DM). 1311 These protocols' underlying designs assume that the amount of protocol 1312 overhead (in terms of the amount of state that must be maintained by 1313 each router, the number of router CPU cycles required, and the amount of 1314 bandwidth consumed by protocol operation) is appropriate since receivers 1315 densely populate the area of operation. 1317 7.1. Distance Vector Multicast Routing Protocol (DVMRP) 1319 The Distance Vector Multicast Routing Protocol (DVMRP) is a distance- 1320 vector routing protocol designed to support the forwarding of multicast 1321 datagrams through an internetwork. DVMRP constructs source-based 1322 multicast delivery trees using the Reverse Path Multicasting (RPM) 1323 algorithm. Originally, the entire MBone ran only DVMRP. Today, over 1324 half of the MBone routers still run some version of DVMRP. 1326 DVMRP was first defined in RFC-1075. The original specification was 1327 derived from the Routing Information Protocol (RIP) and employed the 1328 Truncated Reverse Path Broadcasting (TRPB) technique. The major 1329 difference between RIP and DVMRP is that RIP calculates the next-hop 1330 toward a destination, while DVMRP computes the previous-hop back toward 1331 a source. 
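The distinction can be made concrete with a short sketch (ours; the
table layout is hypothetical, and the addresses are simply this
document's examples). The same prefix-to-gateway data answers two
different questions:

   # Hypothetical table: prefix -> (gateway, metric).
   routes = {"128.1.0.0/16": ("128.7.5.2", 3)}

   def next_hop(dest_prefix):           # unicast (RIP-like) question:
       return routes[dest_prefix][0]    # where do I send TO this prefix?

   def from_gateway(source_prefix):     # multicast (DVMRP) question:
       return routes[source_prefix][0]  # who leads BACK to this source?

   def reverse_path_check(source_prefix, arrived_from):
       # Multicast packets from a source should arrive via the gateway
       # on the shortest path back toward that source.
       return arrived_from == from_gateway(source_prefix)

   print(reverse_path_check("128.1.0.0/16", "128.7.5.2"))   # True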
Since mrouted 3.0, DVMRP has employed the Reverse Path
1332 Multicasting (RPM) algorithm. Thus, the latest implementations of DVMRP
1333 are quite different from the original RFC specification in many regards.
1334 There is an active effort within the IETF Inter-Domain Multicast Routing
1335 (IDMR) working group to specify DVMRP version 3 in a well-documented
1336 form.

1338 The current DVMRP v3 Internet-Draft, representing a snapshot of this
1339 work in progress, may be found in the Internet-Drafts directories.

1343 7.1.1 Physical and Tunnel Interfaces

1345 The interfaces of a DVMRP router may be either a physical interface to
1346 a directly-attached subnetwork or a tunnel interface (a.k.a. virtual
1347 interface) to another multicast-capable region. All interfaces are
1348 configured with a metric specifying their individual costs, and a TTL
1349 threshold which a multicast packet's TTL must exceed in order to cross
1350 the interface. In addition, each tunnel interface must be explicitly
1351 configured with two additional parameters: The IP address of the local
1352 router's tunnel interface and the IP address of the remote router's
1353 tunnel interface.

1355 ========================================================================

1357      TTL            Scope
1358   Threshold
1359  _____________________________________________________________________
1360       0      Restricted to the same host
1361       1      Restricted to the same subnetwork
1362      15      Restricted to the same site
1363      63      Restricted to the same region
1364     127      Worldwide
1365     191      Worldwide; limited bandwidth
1366     255      Unrestricted in scope

1368                  Table 1: TTL Scope Control Values
1369 ========================================================================

1371 A multicast router will only forward a multicast datagram across an
1372 interface if the TTL field in the IP header is greater than the TTL
1373 threshold assigned to the interface. Table 1 lists the conventional
1374 TTL values that are used to restrict the scope of an IP multicast. For
1375 example, a multicast datagram with a TTL of less than 16 is restricted
1376 to the same site and should not be forwarded across an interface to
1377 other sites in the same region.

1379 TTL-based scoping is not always sufficient for all applications.
1380 Conflicts arise when trying to simultaneously enforce limits on
1381 topology, geography, and bandwidth. In particular, TTL-based scoping
1382 cannot handle overlapping regions, which is a necessary characteristic
1383 of administrative regions. In light of these issues, "administrative"
1384 scoping was created in 1994 to provide a way to do scoping based on
1385 multicast address. Certain addresses would be usable within a given
1386 administrative scope (e.g., a corporate internetwork) but would not be
1387 forwarded onto the global MBone. This allows for privacy, and address
1388 reuse within the class D address space. The range from 239.0.0.0 to
1389 239.255.255.255 has been reserved for administrative scoping. While
1390 administrative scoping has been in limited use since 1994 or so, it has
1391 yet to be widely deployed. The IETF MBoneD working group is working on
1392 the deployment of administrative scoping. For additional information,
1393 please see the work-in-progress entitled "Administratively Scoped IP
1394 Multicast."
1395 7.1.2 Basic Operation

1397 DVMRP implements the Reverse Path Multicasting (RPM) algorithm.
1398 According to RPM, the first datagram for any (source, group) pair is
1399 forwarded across the entire internetwork (provided the packet's TTL and
1400 router interface thresholds permit this). Upon receiving this traffic,
1401 leaf routers may transmit prune messages back toward the source if there
1402 are no group members on their directly-attached leaf subnetworks. The
1403 prune messages remove all branches that do not lead to group members
1404 from the tree, leaving a source-based shortest path tree.

1406 After a period of time, the prune state for each (source, group) pair
1407 expires to reclaim router memory that is being used to store prune
1408 state pertaining to groups that are no longer active. If those groups
1409 happen to still be in use, a subsequent datagram for the (source, group)
1410 pair will be broadcast across all downstream routers. This will result
1411 in a new set of prune messages, serving to regenerate the (source,
1412 group) pair's source-based shortest-path tree. Current implementations
1413 of RPM (notably DVMRP) do not transmit prune messages reliably, so the
1414 prune lifetime must be kept short to compensate for potentially lost
1415 prune messages. (The Internet-Draft of DVMRP 3.0 incorporates a
1416 mechanism to make prunes reliable.)

1418 DVMRP also implements a mechanism to quickly "graft" back a previously
1419 pruned branch of a group's delivery tree. If a router that had sent a
1420 prune message for a (source, group) pair discovers new group members on
1421 a leaf network, it sends a graft message to the previous-hop router for
1422 this source. When an upstream router receives a graft message, it
1423 cancels out the previously-received prune message. Graft messages
1424 cascade (reliably) hop-by-hop back toward the source until they reach
1425 the nearest "live" branch point on the delivery tree. In this way,
1426 previously-pruned branches are quickly restored to a given delivery
1427 tree.

1429 7.1.3 DVMRP Router Functions

1431 In Figure 12, Router C is downstream and may potentially receive
1432 datagrams from the source subnetwork via Router A or Router B. If
1433 Router A's metric to the source subnetwork is less than Router B's
1434 metric, then Router A is dominant over Router B for this source.

1436 This means that Router A will forward any traffic from the source
1437 subnetwork and Router B will discard traffic received from that source.
1438 However, if Router A's metric is equal to Router B's metric, then the
1439 router with the lower IP address on its downstream interface (child
1440 link) becomes the Dominant Router for this source. Note that on a
1441 subnetwork with multiple routers forwarding to groups with multiple
1442 sources, different routers may be dominant for each source.

1444 7.1.4 DVMRP Routing Table

1446 The DVMRP process periodically exchanges routing table updates with its
1447 DVMRP neighbors. These updates are independent of those generated by
1448 any unicast Interior Gateway Protocol.

1450 Since the DVMRP was developed to route multicast and not unicast
1451 traffic, a router will probably run multiple routing processes in
1452 practice: One to support the forwarding of unicast traffic and another
1453 to support the forwarding of multicast traffic. (This can be convenient:
1454 A router can be configured to only route multicast IP, with no unicast

1456 ========================================================================

1458                                To
1459     .-<-<-<-<-<-<-Source Subnetwork->->->->->->->->--.
1460     v                                                 v
1461     |                                                 |
1462 parent link                                      parent link
1463     |                                                 |
1464  _____________                                   _____________
1465 |  Router A   |                                 |  Router B   |
1466 |             |                                 |             |
1467  -------------                                   -------------
1468     |                                                 |
1469 child link                                       child link
1470     |                                                 |
1471 ---------------------------------------------------------------------
1472                            |
1473                       parent link
1474                            |
1475                       _____________
1476                      |  Router C   |
1477                      |             |
1478                       -------------
1479                            |
1480                       child link
1481                            |

1483       Figure 12. DVMRP Dominant Router in a Redundant Topology
1484 ========================================================================

1486 IP routing. This may be a useful capability in firewalled
1487 environments.)

1489 Again, consider Figure 12, which contains two types of routers:
1490 Dominant and Subordinate. Assume in this example that Router B is
1491 dominant, Router A is subordinate, and Router C is part of the
1492 downstream distribution tree. In general, which routers are dominant
1493 or subordinate may be different for each source! A subordinate router
1494 is one that is NOT on the shortest path tree back toward a source. The
1495 dominant router can tell this because the subordinate router will
1496 'poison-reverse' the route for this source in its routing updates which
1497 are sent on the common LAN (i.e., Router A sets the metric for this
1498 source to 'infinity'). The dominant router keeps track of subordinate
1499 routers on a per-source basis...it never needs or expects to receive a
1500 prune message from a subordinate router. Only routers that are truly on
1501 the downstream distribution tree will ever need to send prunes to the
1502 dominant router. If a dominant router on a LAN has received either a
1503 poison-reversed route for a source, or prunes for all groups emanating
1504 from that source subnetwork, then it may itself send a prune upstream
1505 toward the source (assuming also that IGMP has told it that there are no
1506 local receivers for any group from this source).

1508 ========================================================================

1510   Source         Subnet          From        Metric   Status   TTL
1511   Prefix          Mask         Gateway

1513   128.1.0.0    255.255.0.0    128.7.5.2       3        Up      200
1514   128.2.0.0    255.255.0.0    128.7.5.2       5        Up      150
1515   128.3.0.0    255.255.0.0    128.6.3.1       2        Up      150
1516   128.3.0.0    255.255.0.0    128.6.3.1       4        Up      200

1518                    Figure 13: DVMRP Routing Table
1519 ========================================================================

1521 A sample routing table for a DVMRP router is shown in Figure 13. Unlike
1522 the table that would be created by a unicast routing protocol such as
1523 RIP, OSPF, or BGP, the DVMRP routing table contains Source
1524 Prefixes and From-Gateways instead of Destination Prefixes and Next-Hop
1525 Gateways.

1527 The routing table represents the shortest path (source-based) spanning
1528 tree to every possible source prefix in the internetwork--the Reverse
1529 Path Broadcasting (RPB) tree. The DVMRP routing table does not
1530 represent group membership or received prune messages.

1532 The key elements in the DVMRP routing table include the following items:

1534   Source Prefix    A subnetwork which is a potential or actual
1535                    source of multicast datagrams.

1537   Subnet Mask      The subnet mask associated with the Source
1538                    Prefix. Note that the DVMRP provides the subnet
1539                    mask for each source subnetwork (in other words,
1540                    the DVMRP is classless).

1542   From-Gateway     The previous-hop router leading back toward a
1543                    particular Source Prefix.
1545   TTL              The time-to-live is used for table management
1546                    and indicates the number of seconds before an
1547                    entry is removed from the routing table. This
1548                    TTL has nothing at all to do with the TTL used
1549                    in TTL-based scoping.

1551 7.1.5 DVMRP Forwarding Table

1553 Since the DVMRP routing table is not aware of group membership, the
1554 DVMRP process builds a forwarding table based on a combination of the
1555 information contained in the multicast routing table, known groups, and
1556 received prune messages. The forwarding table represents the local
1557 router's understanding of the shortest path source-based delivery tree
1558 for each (source, group) pair--the Reverse Path Multicasting (RPM) tree.

1560 ========================================================================

1562   Source        Multicast     TTL    InIntf    OutIntf(s)
1563   Prefix        Group

1565   128.1.0.0     224.1.1.1     200    1 Pr      2p3p
1566                 224.2.2.2     100    1         2p3
1567                 224.3.3.3     250    1         2
1568   128.2.0.0     224.1.1.1     150    2         2p3

1570                   Figure 14: DVMRP Forwarding Table
1571 ========================================================================

1573 The forwarding table for a sample DVMRP router is shown in Figure 14.
1574 The elements in this display include the following items:

1576   Source Prefix    The subnetwork sending multicast datagrams
1577                    to the specified groups (one group per row).

1579   Multicast Group  The Class D IP address to which multicast
1580                    datagrams are addressed. Note that a given
1581                    Source Prefix may contain sources for several
1582                    Multicast Groups.

1584   InIntf           The parent interface for the (source, group)
1585                    pair. A 'Pr' in this column indicates that a
1586                    prune message has been sent to the upstream
1587                    router (the From-Gateway for this Source Prefix
1588                    in the DVMRP routing table).

1590   OutIntf(s)       The child interfaces over which multicast
1591                    datagrams for this (source, group) pair are
1592                    forwarded. A 'p' in this column indicates
1593                    that the router has received prune messages
1594                    from all downstream router(s) on this port.

1596 7.1.6 DVMRP Tree Building and Forwarding Summary

1598 As we have seen, DVMRP enables packets to be forwarded away from a
1599 multicast source along the Reverse Path Multicasting (RPM) tree. The
1600 general name for this technique is Reverse Path Forwarding, and it is
1601 used in some other multicast routing protocols, as we shall see later.

1603 Reverse Path Forwarding was described in Steve Deering's Ph.D. thesis,
1604 and was a refinement of early work on Reverse Path Broadcasting done by
1605 Dalal and Metcalfe in 1978. In our discussion of RPB, we saw that it
1606 was wasteful in its excessive usage of network resources, because all
1607 nodes received a (single) copy of each packet. No effort was made in
1608 RPB to prune the tree to only include branches leading to active
1609 receivers. After all, RPB was a broadcast technique, not a multicast
1610 technique.

1612 Truncated RPB added a small optimization so that IGMP was used to tell
1613 if there were listeners on the LANs at the edge of the RPB tree; if
1614 there were no listeners, the packets were stopped at the leaf routers
1615 and not forwarded onto the edge LANs. Despite the savings of LAN
1616 bandwidth, TRPB still sends a copy of each packet to every router in the
1617 topology.

1619 Reverse Path Multicasting, used in DVMRP, takes the group membership
1620 information derived from IGMP and sends control messages upstream (i.e.,
1621 toward the source subnetwork) in order to prune the distribution tree.
1622 This technique trades off some memory in the participating routers (to
1623 store the prune information) in order to gain back the wasted bandwidth
1624 on branches that do not lead to interested receivers.

1626 In DVMRP, the RPM distribution tree is created on demand to describe the
1627 forwarding table for a given source sending to a group. As described in
1628 section 7.1.5, the forwarding table indicates the expected inbound
1629 interface for packets from this source, and the expected outbound
1630 interface(s) for distribution to the rest of the group. Forwarding table
1631 entries are created when packets to a "new" (source, group) pair arrive
1632 at a DVMRP router. As each packet is received, its source and group are
1633 matched against the appropriate row of the forwarding table. If the
1634 packet was received on the correct inbound interface, it is forwarded
1635 downstream on the appropriate outbound interfaces for this group.

1637 DVMRP's tree-building protocol is often called "broadcast-and-prune"
1638 because the first time a packet for a new (source, group) pair arrives,
1639 it is transmitted towards all routers in the internetwork. Then the
1640 edge routers initiate prunes. Unnecessary delivery branches are pruned
1641 within the round-trip time from the top of a branch to the furthest leaf
1642 router, typically on the order of tens of milliseconds or less; thus,
1643 the distribution tree for this new (source, group) pair is quickly
1644 trimmed to serve only active receivers.

1646 The check DVMRP routers do when a packet arrives at the router is called
1647 the "reverse-path check." The first thing a router must do upon receipt
1648 of a multicast packet is determine that it arrived on the "correct"
1649 inbound interface. For packets matching (source, group) pairs that this
1650 router has already seen, there will already be a forwarding table entry
1651 indicating the expected incoming interface. For "new" packets, DVMRP's
1652 routing table is used to compare the actual receiving interface with the
1653 one that is considered by the router to be on the shortest path back to
1654 the source. If the reverse-path check succeeds, the packet is forwarded
1655 only on those interfaces that the router considers "child" interfaces
1656 (with respect to the source of the packet); there may be some interfaces
1657 that, for a given source, are neither child nor incoming interfaces.
1658 Child interfaces attach to subnetworks which use this router as their
1659 previous-hop on the shortest path toward the source (router addresses on
1660 this subnetwork are used as a tie-breaker when there is more than one
1661 such router).

1663 Once the incoming interface and valid downstream child interfaces for
1664 this (source, group) are determined, a forwarding table entry is created
1665 to enable quick forwarding of future packets for this (source, group)
1666 pair. A multicast packet must never be forwarded back toward its source:
1667 This would result in a forwarding loop.

1669 7.2. Multicast Extensions to OSPF (MOSPF)

1671 Version 2 of the Open Shortest Path First (OSPF) routing protocol is
1672 defined in RFC-1583. OSPF is an Interior Gateway Protocol (IGP) that
1673 distributes unicast topology information among routers belonging to a
1674 single OSPF "Autonomous System." OSPF is based on link-state algorithms
1675 which permit rapid route calculation with a minimum of routing protocol
1676 traffic.
In addition to efficient route calculation, OSPF is an open 1677 standard that supports hierarchical routing, load balancing, and the 1678 import of external routing information. 1680 The Multicast Extensions to OSPF (MOSPF) are defined in RFC-1584. MOSPF 1681 routers maintain a current image of the network topology through the 1682 unicast OSPF link-state routing protocol. The multicast extensions to 1683 OSPF are built on top of OSPF Version 2 so that a multicast routing 1684 capability can be incrementally introduced into an OSPF Version 2 1685 routing domain. Routers running MOSPF will interoperate with non-MOSPF 1686 routers when forwarding unicast IP data traffic. MOSPF does not support 1687 tunnels. 1689 7.2.1 Intra-Area Routing with MOSPF 1691 Intra-Area Routing describes the basic routing algorithm employed by 1692 MOSPF. This elementary algorithm runs inside a single OSPF area and 1693 supports multicast forwarding when a source and all destination group 1694 members reside in the same OSPF area, or when the entire OSPF Autonomous 1695 System is a single area (and the source is inside that area...). The 1696 following discussion assumes that the reader is familiar with OSPF. 1698 7.2.1.1 Local Group Database 1700 Similar to all other multicast routing protocols, MOSPF routers use the 1701 Internet Group Management Protocol (IGMP) to monitor multicast group 1702 membership on directly-attached subnetworks. MOSPF routers maintain a 1703 "local group database" which lists directly-attached groups and 1704 determines the local router's responsibility for delivering multicast 1705 datagrams to these groups. 1707 On any given subnetwork, the transmission of IGMP Host Membership 1708 Queries is performed solely by the Designated Router (DR). However, 1709 the responsibility of listening to IGMP Host Membership Reports is 1710 performed by not only the Designated Router (DR) but also the Backup 1711 Designated Router (BDR). Therefore, in a mixed LAN containing both 1712 MOSPF and OSPF routers, an MOSPF router must be elected the DR for the 1713 subnetwork. This can be achieved by setting the OSPF RouterPriority to 1714 zero in each non-MOSPF router to prevent them from becoming the (B)DR. 1716 The DR is responsible for communicating group membership information to 1717 all other routers in the OSPF area by flooding Group-Membership LSAs. 1718 Similar to Router-LSAs and Network-LSAs, Group-Membership LSAs are only 1719 flooded within a single area. 1721 7.2.1.2 Datagram's Shortest Path Tree 1723 The datagram's shortest path tree describes the path taken by a 1724 multicast datagram as it travels through the area from the source 1725 subnetwork to each of the group members' subnetworks. The shortest 1726 path tree for each (source, group) pair is built "on demand" when a 1727 router receives the first multicast datagram for a particular (source, 1728 group) pair. 1730 When the initial datagram arrives, the source subnetwork is located in 1731 the MOSPF link state database. The MOSPF link state database is simply 1732 the standard OSPF link state database with the addition of Group- 1733 Membership LSAs. Based on the Router- and Network-LSAs in the OSPF 1734 link state database, a source-based shortest-path tree is constructed 1735 using Dijkstra's algorithm. After the tree is built, Group-Membership 1736 LSAs are used to prune the tree such that the only remaining branches 1737 lead to subnetworks containing members of this group. 
The output of 1738 these algorithms is a pruned source-based tree rooted at the datagram's 1739 source (Figure 15). 1741 To forward multicast datagrams to downstream members of a group, each 1742 router must determine its position in the datagram's shortest path tree. 1744 ======================================================================== 1746 S 1747 | 1748 A # 1749 / \ 1750 1 2 1751 / \ 1752 B # # C 1753 / \ \ 1754 3 4 5 1755 / \ \ 1756 D # # E # F LEGEND 1757 / \ \ 1758 6 7 8 # Router 1759 / \ \ 1760 G # # H # I 1762 Figure 15. Shortest Path Tree for a (S, G) pair 1763 ======================================================================== 1765 Assume that Figure 15 illustrates the shortest path tree for a given 1766 (source, group) pair. Router E's upstream node is Router B and there 1767 are two downstream interfaces: one connecting to Subnetwork 6 and 1768 another connecting to Subnetwork 7. 1770 Note the following properties of the basic MOSPF routing algorithm: 1772 o For a given multicast datagram, all routers within an OSPF 1773 area calculate the same source-based shortest path delivery 1774 tree. Tie-breakers have been defined to guarantee that if 1775 several equal-cost paths exist, all routers agree on a single 1776 path through the area. Unlike unicast OSPF, MOSPF does not 1777 support the concept of equal-cost multipath routing. 1779 o Synchronized link state databases containing Group-Membership 1780 LSAs allow an MOSPF router to build a source-based shortest- 1781 path tree in memory, working forward from the source to the 1782 group member(s). Unlike the DVMRP, this means that the first 1783 datagram of a new transmission does not have to be flooded to 1784 all routers in an area. 1786 o The "on demand" construction of the source-based delivery tree 1787 has the benefit of spreading calculations over time, resulting 1788 in a lesser impact for participating routers. Of course, this 1789 may strain the CPU(s) in a router if many new (source, group) 1790 pairs appear at about the same time, or if there are a lot of 1791 events which force the MOSPF process to flush and rebuild its 1792 forwarding cache. In a stable topology with long-lived 1793 multicast sessions, these effects should be minimal. 1795 7.2.1.3 Forwarding Cache 1797 Each MOSPF router makes its forwarding decision based on the contents of 1798 its forwarding cache. Contrary to DVMRP, MOSPF forwarding is not RPF- 1799 based. The forwarding cache is built from the source-based shortest- 1800 path tree for each (source, group) pair, and the router's local group 1801 database. After the router discovers its position in the shortest path 1802 tree, a forwarding cache entry is created containing the (source, group) 1803 pair, its expected "upstream" incoming interface, and the necessary 1804 "downstream" outgoing interface(s). The forwarding cache entry is then 1805 used to quickly forward all subsequent datagrams from this source to 1806 this group. If a new source begins sending to a new group, MOSPF must 1807 first calculate the distribution tree so that it may create a cache 1808 entry that can be used to forward the packet. 1810 Figure 16 displays the forwarding cache for an example MOSPF router. 1811 The elements in the display include the following items: 1813 Dest. 
Group      A known destination group address to which
1814                    datagrams are currently being forwarded, or to
1815                    which traffic was sent "recently" (i.e., since
1816                    the last topology or group membership or other
1817                    event which (re-)initialized MOSPF's forwarding
1818                    cache).

1820   Source           The datagram's source host address. Each (Dest.
1821                    Group, Source) pair uniquely identifies a
1822                    separate forwarding cache entry.

1824 ========================================================================

1826   Dest. Group     Source      Upstream   Downstream    TTL

1828   224.1.1.1       128.1.0.2      11        12 13        5
1829   224.1.1.1       128.4.1.2      11        12 13        2
1830   224.1.1.1       128.5.2.2      11        12 13        3
1831   224.2.2.2       128.2.0.3      12        11           7

1833                   Figure 16: MOSPF Forwarding Cache
1834 ========================================================================

1836   Upstream         Datagrams matching this row's Dest. Group and
1837                    Source must be received on this interface.

1839   Downstream       If a datagram matching this row's Dest. Group
1840                    and Source is received on the correct Upstream
1841                    interface, then it is forwarded across the listed
1842                    Downstream interfaces.

1844   TTL              The minimum number of hops a datagram must cross
1845                    to reach any of the Dest. Group's members. An
1846                    MOSPF router may discard a datagram if it can see
1847                    that the datagram has insufficient TTL to reach
1848                    even the closest group member.

1850 The information in the forwarding cache is not aged or periodically
1851 refreshed: It is maintained as long as there are system resources
1852 available (e.g., memory) or until the next topology change. The
1853 contents of the forwarding cache will change when:

1855   o  The topology of the OSPF internetwork changes, forcing all of
1856      the shortest path trees to be recalculated. (Once the cache
1857      has been flushed, entries are not rebuilt. If another packet
1858      for one of the previous (Dest. Group, Source) pairs is
1859      received, then a "new" cache entry for that pair will be
1860      created at that time.)

1862   o  There is a change in the Group-Membership LSAs indicating that
1863      the distribution of individual group members has changed.

1865 7.2.2 Mixing MOSPF and OSPF Routers

1867 MOSPF routers can be combined with non-multicast OSPF routers. This
1868 permits the gradual deployment of MOSPF and allows experimentation with
1869 multicast routing on a limited scale.

1871 It is important to note that an MOSPF router is required to eliminate
1872 all non-multicast OSPF routers when it builds its source-based shortest-
1873 path delivery tree. (An MOSPF router can determine the multicast
1874 capability of any other router based on the setting of the multicast-
1875 capable bit (MC-bit) in the Options field of each router's link state
1876 advertisements.) The omission of non-multicast routers may create a
1877 number of potential problems when forwarding multicast traffic:

1879   o  The Designated Router for a multi-access network must be an
1880      MOSPF router. If a non-multicast OSPF router is elected the
1881      DR, the subnetwork will not be selected to forward multicast
1882      datagrams since a non-multicast DR cannot generate Group-
1883      Membership LSAs for its subnetwork (because it is not running
1884      IGMP, it won't hear IGMP Host Membership Reports).

1886   o  Even though there may be unicast connectivity to a destination,
1887      there may not be multicast connectivity. For example, the only
1888      path between two points could require traversal of a non-
1889      multicast-capable OSPF router.
1891   o  The forwarding of multicast and unicast datagrams between two
1892      points may follow different paths, making some routing problems
1893      a bit more challenging to solve.

1895 7.2.3 Inter-Area Routing with MOSPF

1897 Inter-area routing involves the case where a datagram's source and some
1898 of its destination group members reside in different OSPF areas. It
1899 should be noted that the forwarding of multicast datagrams continues to
1900 be determined by the contents of the forwarding cache which is still
1901 built from the local group database and the datagram source-based trees.
1902 The major differences are related to the way that group membership
1903 information is propagated and the way that the inter-area source-based
1904 tree is constructed.

1906 7.2.3.1 Inter-Area Multicast Forwarders

1908 In MOSPF, a subset of an area's Area Border Routers (ABRs) function as
1909 "inter-area multicast forwarders." An inter-area multicast forwarder is
1910 responsible for the forwarding of group membership information and
1911 multicast datagrams between areas. Configuration parameters determine
1912 whether or not a particular ABR also functions as an inter-area
1913 multicast forwarder.

1915 Inter-area multicast forwarders summarize their attached areas' group
1916 membership information to the backbone by originating new Group-
1917 Membership LSAs into the backbone area. Note that the summarization of
1918 group membership in MOSPF is asymmetric. This means that while group
1919 membership information from non-backbone areas is flooded into the
1920 backbone, group membership from the backbone (or from any other
1921 non-backbone areas) is not flooded into any non-backbone area(s).

1923 To permit the forwarding of multicast traffic between areas, MOSPF
1924 introduces the concept of a "wild-card multicast receiver." A wild-card
1925 multicast receiver is a router that receives all multicast traffic
1926 generated in an area. In non-backbone areas, all inter-area multicast
1927 forwarders operate as wild-card multicast receivers. This guarantees
1928 that all multicast traffic originating in any non-backbone area is
1929 delivered to its inter-area multicast forwarder, and then if necessary
1930 into the backbone area. Since the backbone knows group membership for
1931 all areas, the datagram can be forwarded to the appropriate location(s)
1932 in the OSPF autonomous system, provided it is forwarded into the
1933 backbone by the source area's multicast ABR.

1935 7.2.3.2 Inter-Area Datagram's Shortest-Path Tree

1937 In the case of inter-area multicast routing, it is usually impossible to
1938 build a complete shortest-path delivery tree. Incomplete trees are a
1939 fact of life because each OSPF area's complete topological and group
1940 membership information is not distributed between OSPF areas.

1942 Topological estimates are made through the use of wild-card receivers
1943 and OSPF Summary-Links LSAs.

1945 If the source of a multicast datagram resides in the same area as the
1946 router performing the calculation, the pruning process must be careful
1947 to ensure that branches leading to other areas are not removed from the
1948 tree. Only those branches having neither group members nor wild-card
1949 multicast receivers are pruned. Branches containing wild-card multicast
1950 receivers must be retained since the local routers do not know whether
1951 there are any group members residing in other areas.
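In code form, this pruning rule reads roughly as follows (a sketch with
assumed data structures, not MOSPF's internals):

   # 'children' maps each node of the candidate tree to its children;
   # 'keep' marks nodes that are group members or wild-card multicast
   # receivers (e.g., inter-area multicast forwarders).
   children = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
   keep = {"D"}        # D is, say, an inter-area multicast forwarder

   def prunable(node):
       # A branch may be pruned only if neither it nor any descendant
       # is a group member or a wild-card multicast receiver.
       return (node not in keep
               and all(prunable(c) for c in children[node]))

   print(prunable("B"))    # False: D below it must keep receiving
   print(prunable("C"))    # True: no members, no wild-card receivers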
1953 If the source of a multicast datagram resides in a different area than
1954 the router performing the calculation, the details describing the local
1955 topology surrounding the source station are not known. However, this
1956 information can be estimated using information provided by Summary-Links
1957 LSAs for the source subnetwork. In this case, the base of the tree
1958 begins with branches directly connecting the source subnetwork to each
1959 of the local area's inter-area multicast forwarders. Datagrams sourced
1960 from outside the local area will enter the area via one of its inter-
1961 area multicast forwarders, so they all must be part of the candidate
1962 distribution tree.

1964 Since each inter-area multicast forwarder is also an ABR, it must
1965 maintain a separate link state database for each attached area. Thus,
1966 each inter-area multicast forwarder is required to calculate a separate
1967 forwarding tree for each of its attached areas.

1969 7.2.4 Inter-Autonomous System Multicasting with MOSPF

1971 Inter-Autonomous System multicasting involves the situation where a
1972 datagram's source or some of its destination group members are in
1973 different OSPF Autonomous Systems. In OSPF terminology, "inter-AS"
1974 communication also refers to connectivity between an OSPF domain and
1975 another routing domain which could be within the same Autonomous System
1976 from the perspective of an Exterior Gateway Protocol.

1978 To facilitate inter-AS multicast routing, selected Autonomous System
1979 Boundary Routers (ASBRs) are configured as "inter-AS multicast
1980 forwarders." MOSPF makes the assumption that each inter-AS multicast
1981 forwarder executes an inter-AS multicast routing protocol which forwards
1982 multicast datagrams in a reverse path forwarding (RPF) manner. Since
1983 the publication of the MOSPF RFC, a term has been defined for such a
1984 router: Multicast Border Router. See section 9 for an overview of the
1985 MBR concepts. Each inter-AS multicast forwarder is a wildcard multicast
1986 receiver in each of its attached areas. This guarantees that each
1987 inter-AS multicast forwarder remains on all pruned shortest-path trees
1988 and receives all multicast datagrams.

1990 The details of inter-AS forwarding are very similar to inter-area
1991 forwarding. On the "inside" of the OSPF domain, the multicast ASBR
1992 must conform to all the requirements of intra-area and inter-area
1993 forwarding. Within the OSPF domain, group members are reached by the
1994 usual forward path computations, and paths to external sources are
1995 approximated by a reverse-path source-based tree, with the multicast
1996 ASBR standing in for the actual source. When the source is within the
1997 OSPF AS, and there are external group members, it falls to the inter-
1998 AS multicast forwarders, in their role as wildcard receivers, to make
1999 sure that the data gets out of the OSPF domain and is sent off in the
2000 correct direction.

2002 7.2.5 MOSPF Tree Building and Forwarding Summary

2004 MOSPF builds a separate tree for each (source, group) combination. The
2005 tree is rooted at the source, and includes each of the group members as
2006 leaves. If the (M)OSPF domain is divided into multiple areas, the tree
2007 is built in pieces, one area at a time. These pieces are then glued
2008 together at the area border routers which connect the various areas.

2010 Sometimes group membership of certain areas (or other ASes) is unknown.
2011 MOSPF forces the tree to extend to these areas and/or ASes by adding
2012 their area border routers/AS boundary routers to the tree as "wildcard
2013 multicast receivers".

2015 Construction of the tree within a given area depends on the location of
2016 the source. If the source is within the area, "forward" costs are used,
2017 and the path within the area follows the "forward path", that is, the
2018 same route that unicast packets would take from source to group member.
2019 If the source belongs to a different area, or a different AS, "reverse
2020 costs" are used, resulting in reverse path forwarding through the area.
2021 (Reverse path forwarding is less preferred, but is forced because OSPF
2022 Summary LSAs and AS-External LSAs only advertise costs in the reverse
2023 direction).

2025 MOSPF's tree-building process is data-driven. Despite the presence of
2026 Group LSAs in each area's link state database, no trees are built unless
2027 multicast data is seen from a source to a group (of course, if the Link
2028 State Database indicates no Group LSAs for this group, then no tree is
2029 built since there must be no group members present in this area). So,
2030 despite MOSPF being a "Dense-mode" routing protocol, it is not based on
2031 broadcast-and-prune, but rather it is a so-called "explicit-join"
2032 protocol.

2034 If a packet arrives for a (source, group) pair which has not been seen
2035 by this router before, the router executes the Dijkstra algorithm over
2036 the relevant links in the Link State Database. The Dijkstra algorithm
2037 outputs the "source-rooted" shortest-path tree for this (source, group),
2038 as described earlier. The router examines its position in the tree,
2039 caching the expected inbound interface for packets from this source, and
2040 listing the outbound interface(s) that lead to active downstream
2041 receivers. Subsequent traffic is examined against this cached data.
2042 The traffic from the source must arrive on the correct interface to be
2043 processed further.

2045 7.3 Protocol-Independent Multicast (PIM)

2047 The Protocol Independent Multicast (PIM) routing protocols have been
2048 developed by the Inter-Domain Multicast Routing (IDMR) working group of
2049 the IETF. The objective of the IDMR working group is to develop one--or
2050 possibly more than one--standards-track multicast routing protocol(s)
2051 that can provide scalable multicast routing across the Internet.

2053 PIM is actually two protocols: PIM - Dense Mode (PIM-DM) and PIM -
2054 Sparse Mode (PIM-SM). In the remainder of this introduction, any
2055 references to "PIM" apply equally well to either of the two protocols...
2056 there is no intention to imply that there is only one PIM protocol.
2057 While PIM-DM and PIM-SM share part of their names, and they do have
2058 related control messages, they are really two independent protocols.

2060 PIM receives its name because it is not dependent on the mechanisms
2061 provided by any particular unicast routing protocol. However, any
2062 implementation supporting PIM requires the presence of a unicast routing
2063 protocol to provide routing table information and to adapt to topology
2064 changes.

2066 PIM makes a clear distinction between a multicast routing protocol that
2067 is designed for dense environments and one that is designed for sparse
2068 environments.
Dense-mode refers to a protocol that is designed to 2069 operate in an environment where group members are relatively densely 2070 packed and bandwidth is plentiful. Sparse-mode refers to a protocol 2071 that is optimized for environments where group members are distributed 2072 across many regions of the Internet and bandwidth is not necessarily 2073 widely available. It is important to note that sparse-mode does not 2074 imply that the group has a few members, just that they are widely 2075 dispersed across the Internet. 2077 The designers of PIM-SM argue that DVMRP and MOSPF were developed for 2078 environments where group members are densely distributed, and bandwidth 2079 is relatively plentiful. They emphasize that when group members and 2080 senders are sparsely distributed across a wide area, DVMRP and MOSPF 2081 exhibit inefficiencies (but both are efficient in delivering packets). 2082 The DVMRP periodically sends multicast packets over many links that do 2083 not lead to group members, while MOSPF propagates group membership 2084 information across links that may not lead to senders or receivers. 2086 7.3.1 PIM - Dense Mode (PIM-DM) 2088 While the PIM architecture was driven by the need to provide scalable 2089 sparse-mode delivery trees, PIM also defines a new dense-mode protocol 2090 instead of relying on existing dense-mode protocols such as DVMRP and 2091 MOSPF. It is envisioned that PIM-DM would be deployed in resource rich 2092 environments, such as a campus LAN where group membership is relatively 2093 dense and bandwidth is likely to be readily available. PIM-DM's control 2094 messages are similar to PIM-SM's by design. 2096 PIM - Dense Mode (PIM-DM) is similar to DVMRP in that it employs the 2097 Reverse Path Multicasting (RPM) algorithm. However, there are several 2098 important differences between PIM-DM and the DVMRP: 2100 o To find routes back to sources, PIM-DM relies on the presence 2101 of an existing unicast routing table. PIM-DM is independent of 2102 the mechanisms of any specific unicast routing protocol. In 2103 contrast, DVMRP contains an integrated routing protocol that 2104 makes use of its own RIP-like exchanges to build its own unicast 2105 routing table (so a router may orient itself with respect to 2106 active source(s)). 2108 o Unlike the DVMRP, which calculates a set of child interfaces for 2109 each (source, group) pair, PIM-DM simply forwards multicast 2110 traffic on all downstream interfaces until explicit prune 2111 messages are received. PIM-DM is willing to accept packet 2112 duplication to eliminate routing protocol dependencies and 2113 to avoid the overhead inherent in determining the parent/child 2114 relationships. 2116 For those cases where group members suddenly appear on a pruned branch 2117 of the delivery tree, PIM-DM--like DVMRP--employs graft messages to re- 2118 attach the previously pruned branch to the delivery tree. 2120 7.3.1.1 PIM-DM Tree Building and Forwarding Summary 2122 Dense-mode PIM builds source-based trees by default. It uses the RPM 2123 algorithm, which is what DVMRP uses: Thus, PIM-DM is a data-driven 2124 protocol that floods packets to the edges of the PIM-DM domain, and 2125 expects prunes to be returned on inactive branches. A minor difference 2126 from DVMRP is that PIM-DM floods packets for new (source, group) pairs 2127 on all non-incoming interfaces. PIM-DM trades off a bit of extra 2128 flooding traffic for a simpler protocol design. 
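The difference in initial flooding behavior can be sketched as follows
(interface names and tables are illustrative, not either protocol's
real data structures):

   interfaces = ["if1", "if2", "if3", "if4"]
   child = {"if2", "if4"}     # DVMRP's computed child interfaces
                              # for this source

   def dvmrp_out(incoming):
       # DVMRP floods a new (source, group) only on child interfaces.
       return [i for i in interfaces if i in child and i != incoming]

   def pim_dm_out(incoming):
       # PIM-DM floods on every non-incoming interface and lets
       # prune messages clean up afterward.
       return [i for i in interfaces if i != incoming]

   print(dvmrp_out("if1"))    # ['if2', 'if4']
   print(pim_dm_out("if1"))   # ['if2', 'if3', 'if4']: may duplicate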
2130 Pruning in PIM-DM only happens via explicit Prune messages, which are
2131 multicast on broadcast links (if there are other routers present which
2132 hear a prune, and they still wish to receive traffic for this group to
2133 support active receivers that are downstream of them, then these other
2134 routers must multicast PIM-Join packets to ensure they remain attached
2135 to the distribution tree). Finally, PIM-DM uses a reliable graft
2136 mechanism to enable previously-sent prunes to be "erased" when new
2137 downstream group members appear after a prune had been sent.

2139 Since PIM-DM uses RPM, it implements a reverse-path check on all packets
2140 which it receives. Again, this check verifies that received packets
2141 arrive on the interface that the router would use if it needed to
2142 send a packet toward the source's prefix. Since PIM-DM does not have
2143 its own routing protocol (as DVMRP does), it uses the existing unicast
2144 routing protocol to locate itself with respect to the source(s) of
2145 multicast packets it has seen.

2147 8. "SPARSE MODE" ROUTING PROTOCOLS

2149 The most recent additions to the set of multicast routing protocols are
2150 called "sparse mode" protocols. They are designed from a different
2151 perspective than the "dense mode" protocols that we have already
2152 examined. Often, they are not data-driven, in the sense that forwarding
2153 state is set up in advance, and they trade off using bandwidth liberally
2154 (which is a valid thing to do in a campus LAN environment) for other
2155 techniques that are much more suited to scaling over large WANs, where
2156 bandwidth is scarce and expensive.

2158 These emerging routing protocols include:

2160   o  Protocol Independent Multicast - Sparse Mode (PIM-SM), and

2162   o  Core-Based Trees (CBT).

2164 While these routing protocols are designed to operate efficiently over a
2165 wide area network where bandwidth is scarce and group members may be
2166 quite sparsely distributed, this is not to imply that they are only
2167 suitable for small groups. Sparse doesn't mean small; rather, it is
2168 meant to convey that the groups are widely dispersed, and thus it is
2169 wasteful to (for instance) flood their data periodically across the
2170 entire internetwork.

2172 CBT and PIM-Sparse Mode (PIM-SM) have been designed to provide highly
2173 efficient communication between members of sparsely distributed groups--
2174 the type of groups that are likely to be prevalent in wide-area
2175 internetworks. The designers of these sparse-mode protocols have
2176 observed that the hosts participating in a sparsely distributed
2177 multicast session do not require that their traffic be periodically
2178 broadcast across the entire internetwork.

2180 Noting today's existing MBone scaling problems, and extrapolating to a
2181 future of ubiquitous multicast (overlaid with perhaps thousands of
2182 widely scattered groups), it is not hard to imagine that existing
2183 multicast routing protocols will experience scaling problems. To
2184 mitigate these potential scaling issues, PIM-SM and CBT are designed
2185 to ensure that multicast traffic only crosses routers that have
2186 explicitly joined a shared tree on behalf of group members.

2188 8.1 Protocol-Independent Multicast - Sparse Mode (PIM-SM)

2190 As described previously, PIM also defines a "dense-mode" or source-based
2191 tree variant. Again, the two protocols are quite distinct, and other
2192 than control messages, they have very little in common.
Note that although 2193 PIM integrates control message processing and data packet forwarding 2194 among PIM-Sparse and -Dense Modes, PIM-SM and PIM-DM must never run in 2195 the same region at the same time. Essentially, a region is a set of 2196 routers which are executing a common multicast routing protocol. 2198 PIM-SM differs from existing dense-mode protocols in two key ways: 2200 o Routers with adjacent or downstream members are required to 2201 explicitly join a sparse mode delivery tree by transmitting 2202 join messages. If a router does not join the pre-defined 2203 delivery tree, it will not receive multicast traffic addressed 2204 to the group. 2206 In contrast, dense-mode protocols assume downstream group 2207 membership and forward multicast traffic on downstream links 2208 until explicit prune messages are received. Thus, the default 2209 forwarding action of dense-mode routing protocols is to forward 2210 all traffic, while the default action of a sparse-mode protocol 2211 is to block traffic unless it has been explicitly requested. 2213 o PIM-SM evolved from the Core-Based Trees (CBT) approach in that 2214 it employs the concept of a "core" (or rendezvous point (RP) in 2215 PIM-SM terminology) where receivers "meet" sources. 2217 ======================================================================== 2219 S1 S2 2220 ___|___ ___|___ 2221 | | 2222 | | 2223 # # 2224 \ / 2225 \_____________RP______________/ 2226 ./|\. 2227 ________________// | \\_______________ 2228 / _______/ | \______ \ 2229 # # # # # 2230 ___|___ ___|___ ___|___ ___|___ ___|___ 2231 | | | | | | 2232 R R R R R R 2233 LEGEND 2235 # PIM Router 2236 R Multicast Receiver 2238 Figure 17: Rendezvous Point 2239 ======================================================================== 2241 When joining a group, each receiver uses IGMP to notify its directly- 2242 attached router, which in turn joins the multicast delivery tree by 2243 sending an explicit PIM-Join message hop-by-hop toward the group's RP. 2244 A source uses the RP to announce its presence, and act as a conduit to 2245 members that have joined the group. This model requires sparse-mode 2246 routers to maintain a small amount of state (the RP-set for the sparse- 2247 mode region) prior to the arrival of data. 2249 There is only one RP-set per sparse-mode domain. By using a hash 2250 function, each PIM-SM router can map a group address uniquely to one of 2251 the members of the RP-set (to determine the group's RP). At any given 2252 time, each group has precisely one RP. In the event of the failure of 2253 an RP, a new RP-set is distributed which does not include the failed RP. 2255 8.1.1 Directly Attached Host Joins a Group 2257 When there is more than one PIM router connected to a multi-access LAN, 2258 the router with the highest IP address is selected to function as the 2259 Designated Router (DR) for the LAN. The DR sends Join/Prune messages 2260 toward the RP. 2262 When the DR receives an IGMP Report message for a new group, it performs 2263 a deterministic hash function over the sparse-mode region's RP-set to 2264 uniquely determine the RP for the group. 
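The essential property of this hash is that it is a pure function of
the group address and the RP-set, so every router in the domain
independently arrives at the same RP with no extra protocol exchange.
A sketch (using an arbitrary stand-in hash, not the function defined
by the PIM-SM specification, and example addresses of our own):

   import zlib

   rp_set = ["10.1.1.1", "10.2.2.2", "10.3.3.3"]   # example RP-set

   def rp_for_group(group):
       # Any deterministic hash agreed on by all routers would do for
       # this illustration; PIM-SM defines its own hash function.
       return rp_set[zlib.crc32(group.encode()) % len(rp_set)]

   print(rp_for_group("224.2.2.2"))   # same answer on every router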
========================================================================

                              Source (S)
                              _|_____
                               |
                               |
                               #
                              / \
                             /   \
                            /     \
                           #       #
               |          /         \
               |         /           \
    Host       |        /             \
    -----------|- DR - - - - - -#- - - - - - - -RPg
    (receiver) |   ----Join-->    ----Join-->

                              LEGEND

      #  PIM Router           RPg  Rendezvous Point for group g

               Figure 18: Host Joins a Multicast Group
========================================================================

After performing the lookup, the DR creates a multicast forwarding
entry for the (*, group) pair and transmits a PIM-Join message
hop-by-hop toward the RP for this specific group. The (*, group)
notation indicates an (any source, group) pair. The intermediate
routers forward the PIM-Join message, creating a forwarding entry for
the (*, group) pair only if such a forwarding entry does not yet
exist. Intermediate routers must create a forwarding entry so that
they will be able to forward future traffic downstream toward the DR
which originated the PIM-Join message.

8.1.2 Directly Attached Source Sends to a Group

When a source first transmits a multicast packet to a group, its DR
forwards the datagram to the RP for subsequent distribution along the
group's delivery tree. The DR encapsulates the initial multicast
packets in PIM-SM-Register packets and unicasts them toward the
group's RP. The PIM-SM-Register packets inform the RP of a new
source. The RP may then elect to transmit PIM-Join messages back
toward the source's DR to join this source's shortest-path tree,
which will allow future unencapsulated packets to flow from this
source's DR to the group's RP.

========================================================================

                              Source (S)
                              _|____
                               |
                               |
                               DR            v
                              / ^\           v
                             /  ^ \          v
                            /   ^  \         v
                           #    ^   #        v
                          /     ^    \       v
                         /      ^     \      v
    Host      |         /       ^      \     v           | Host
    ----------|- # - - - - - -#- - - - - - - - RP - - - # - -|-----
    (receiver)| < ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ >   | (receiver)

                              LEGEND

      #   PIM Router            v v v  PIM-SM-Register
      RP  Rendezvous Point      ^ ^ ^  PIM-Join
                                ~ ~ ~  Resend to group members

              Figure 19: Source Sends to a Multicast Group
========================================================================

Unless the RP decides to join the source's SPT, rooted at the
source's DR, the (source, group) state is not created in the routers
between the source's DR and the RP, and the DR must continue to send
the source's multicast IP packets to the RP as unicast packets
encapsulated within PIM-SM-Register packets. The DR may stop
forwarding multicast packets encapsulated in this manner once it has
received a PIM-Register-Stop message from the group's RP. The RP may
send PIM-Register-Stop messages if there are no downstream receivers
for a group, or if the RP has successfully joined the source's
shortest-path tree (SPT).
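The DR's register behavior amounts to a small piece of per-(source,
group) state. The Python sketch below is an illustrative assumption;
the class and helper names (encapsulate_in_register(), unicast_to())
are hypothetical, not actual implementation interfaces.

   # Illustrative sketch of a DR's register logic for one
   # (source, group) pair; helper functions are hypothetical.

   class RegisterState:
       def __init__(self, rp_addr):
           self.rp_addr = rp_addr
           self.register_stop_received = False

       def on_multicast_packet(self, pkt, encapsulate_in_register,
                               unicast_to):
           # Until the RP says otherwise, tunnel data packets to the
           # RP inside unicast PIM-SM-Register messages.
           if not self.register_stop_received:
               unicast_to(self.rp_addr, encapsulate_in_register(pkt))

       def on_register_stop(self):
           # Sent by the RP when it has joined the source's SPT, or
           # when the group has no downstream receivers.
           self.register_stop_received = True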
8.1.3 Shared Tree (RP-Tree) or Shortest Path Tree (SPT)?

The RP-tree provides connectivity for group members but does not
optimize the delivery path through the internetwork. PIM-SM allows
routers to either a) continue to receive multicast traffic over the
shared RP-tree, or b) subsequently create a source-based
shortest-path tree on behalf of their attached receiver(s). Besides
reducing the delay between this router and the source (beneficial to
its attached receivers), switching to a source-based tree also
reduces traffic concentration effects on the shared RP-tree.

A PIM-SM router with local receivers has the option of switching to
the source's shortest-path tree (i.e., source-based tree) once it
starts receiving data packets from the source. The change-over may
be triggered if the data rate from the source exceeds a predefined
threshold. The local receivers' last-hop router does this by sending
a PIM-Join message toward the active source. After the source-based
SPT is active, protocol mechanisms allow a Prune message for that
source to be transmitted to the group's RP, thus removing this router
from this source's shared RP-tree. Alternatively, the DR may be
configured to never switch over to the source-based SPT, or some
other metric might be used to control when to switch to a
source-based SPT.

========================================================================

    LEGEND                                      Source
                                                 (S)
      #    PIM Router                           _|____
      RP   Rendezvous Point                      |
      ` `  RP-Tree (Shared)                      |
      % %  Shortest-Path Tree             %      #     `
           (Source-based)                %      / \      `
                                        %      /   \       `
                                       %      /     \        `
                 Designated           %      #       #         `
                   Router            %      /         \          `
                                    %      /           \           `
    Host       | <-% % % % % %     %     /              \            `
    -----------|-#-------------#---------------RP
    (receiver) | <-` ` ` ` ` ` ` ` ` ` ` ` ` ` `
               |

        Figure 20: Shared RP-Tree and Shortest Path Tree (SPT)
========================================================================

Besides a last-hop router being able to switch to a source-based
tree, the RP for a group may also transition to a source's
shortest-path tree. Similar controls (bandwidth threshold,
administrative metrics, etc.) may be used at an RP to influence these
decisions. The RP only joins the source's DR's SPT if its local
policy controls permit this.

8.1.4 PIM-SM Tree Building and Forwarding Summary

PIM-SM can use both source-based and shared trees. In fact, a given
group may have some routers on its shared tree, and others on
source-based trees, simultaneously. By default, PIM-SM uses shared
trees rooted at the Rendezvous Point, but regardless of which tree
type is in use, there is no broadcasting of any traffic. Interested
receivers use IGMP to inform their local PIM-SM router(s); the
subnetwork's PIM-SM Designated Router then issues PIM-Join messages
(on behalf of the receivers) toward the group's RP. These join
messages establish forwarding state in the intermediate routers which
is used in the future to make forwarding decisions. If a packet is
received which has no pre-established forwarding state, it is
dropped.

Each received packet must match a pre-existing forwarding cache
entry; the matching entry indicates the interface on which the
reverse-path check is performed. This is consistent with the
forwarding technique used by PIM-DM, and similar to that used by
DVMRP. The unicast routing table provides the necessary information
to determine the best route toward the group's RP; the packet must
have arrived on the interface this router would use to send traffic
toward the group's RP. Note that the forwarding state which is
created by PIM-SM is uni-directional (it allows traffic to flow from
the source's DR toward the RP, not away from it).
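A minimal sketch of this forwarding rule follows, assuming a
simplified cache keyed only by group (real PIM-SM state also includes
(source, group) entries and considerably more bookkeeping):

   # Illustrative PIM-SM forwarding decision for shared-tree state.
   # forwarding_cache maps group -> (expected_iif, oiflist); a
   # simplified stand-in for real PIM-SM forwarding state.

   def forward(group, arrival_iif, forwarding_cache, send):
       entry = forwarding_cache.get(group)
       if entry is None:
           return        # no pre-established join state: drop
       expected_iif, oiflist = entry
       if arrival_iif != expected_iif:
           return        # fails the reverse-path check toward the RP
       for oif in oiflist:
           send(oif)     # replicate only onto explicitly joined branches

The "drop unless state exists" default is exactly the inverse of the
dense-mode behavior described earlier.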
8.2 Core Based Trees (CBT)

Core-Based Trees (CBT) is another multicast architecture based on a
shared delivery tree. It is specifically intended to address the
important issue of scalability when supporting multicast applications
across the public Internet, and is also suitable for use within
private intranetworks.

Similar to PIM-SM, CBT is protocol-independent. CBT employs the
information contained in the unicast routing table to build its
shared delivery tree. It does not care how the unicast routing table
is derived, only that a unicast routing table is present. This
feature allows CBT to be deployed without requiring the presence of
any specific unicast routing protocol.

"Protocol independence" does not necessarily mean that multicast
paths use the same set of routers and links used by unicast routing,
though it is easy to make this assumption (and it may indeed hold
true in some cases). An underlying routing protocol could collect
both unicast- and multicast-related information, so that unicast
routes could be calculated based on the unicast information, and
multicast routes based on the multicast information. If the path
(set of intervening routers and links) used between any two network
nodes is the same for multicast and for unicast, then it can be said
that the unicast and multicast topologies overlap (for that set of
paths).

Where multicast and unicast topologies do happen to overlap,
multicast and unicast paths could be calculated from the one set of
information (i.e., unicast). However, if the set of routers and
links between two network nodes differs for multicast and unicast
traffic, then the unicast and multicast topologies are incongruent
for that network path. It is a matter of policy whether unicast and
multicast topologies are aligned for any set of network links.

The current version of the CBT specification has adopted a
"bootstrap" mechanism similar to the one defined in the PIM-SM
specification. It is an implementation choice whether a dynamic or
static mechanism is used for discovering how groups map to cores.
This process of discovering which core serves which group(s) is
referred to as bootstrapping.

Each group has only one core, but one core might serve multiple
groups. Use of the dynamic bootstrap mechanism is only applicable
within a multicast region, not between regions. The advantage of the
dynamic approach is that a region's CBT routers need less
configuration. The disadvantage is that core placement could be
particularly sub-optimal for some set of receivers. Manual placement
means that each group's core can be "better" positioned relative to a
group's members. CBT's modular design allows other core discovery
mechanisms to be used if such mechanisms better suit CBT's
requirements. For inter-domain RP/Core discovery, efforts are
underway to standardize (or at least separately specify) a common
mechanism, the intent being that any shared tree protocol could
implement this common interdomain discovery architecture using its
own protocol message types.
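Where the static mechanism is chosen, the group-to-core mapping can
be as simple as a configured table. The following Python sketch is
purely hypothetical; the group ranges and core addresses are
arbitrary example values, not part of any CBT specification.

   # Hypothetical static group-to-core configuration for a CBT
   # region. Each group maps to exactly one core; one core may
   # serve many groups.
   import ipaddress

   CORE_MAP = [
       (ipaddress.ip_network("233.252.0.0/28"),  "192.0.2.10"),
       (ipaddress.ip_network("233.252.0.16/28"), "192.0.2.20"),
   ]

   def core_for_group(group):
       g = ipaddress.ip_address(group)
       for prefix, core in CORE_MAP:
           if g in prefix:
               return core
       raise LookupError("no core configured for group " + group)

   print(core_for_group("233.252.0.5"))   # -> 192.0.2.10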
In a significant departure from PIM-SM, CBT maintains its scaling
characteristics by not offering the option of optimizing delay by
shifting from a shared tree (e.g., PIM-SM's RP-Tree) to a
shortest-path tree (SPT). The designers of CBT believe that this is
a critical decision: once multicasting becomes widely deployed, the
need for routers to maintain large amounts of state information will
become the overpowering scaling factor. Thus CBT's state information
(i.e., its forwarding cache) consists of: (group, incoming interface,
{outgoing interface list}). Forwarding decisions are made by using
the group's destination address as the search key into the forwarding
cache.

Finally, unlike PIM-SM's shared tree state, CBT state is
bi-directional. Data may therefore flow in either direction along a
branch. Thus, data from a source which is directly attached to an
existing tree branch need not be encapsulated.

8.2.1 Joining a Group's Shared Tree

A host that wants to join a multicast group issues an IGMP host
membership report. This message informs its local CBT-aware
router(s) that it wishes to receive traffic addressed to the
multicast group. Upon receipt of an IGMP host membership report for
a new group, the local CBT router issues a JOIN_REQUEST which is
processed hop-by-hop, creating transient join state (incoming
interface, outgoing interface) in each router traversed.

If the JOIN_REQUEST encounters a router that is already on the
group's shared tree before it reaches the core router, then that
router issues a JOIN_ACK hop-by-hop back toward the sending router.
If the JOIN_REQUEST does not encounter an on-tree CBT router along
its path toward the core, then the core router is responsible for
responding with a JOIN_ACK. In either case, each intermediate router
that forwards the JOIN_REQUEST toward the core is required to create
transient "join state." This transient "join state" includes the
multicast group, and the JOIN_REQUEST's incoming and outgoing
interfaces. This information allows an intermediate router to
forward returning JOIN_ACKs along the exact reverse path to the CBT
router which initiated the JOIN_REQUEST. A sketch of this state
transition appears below.

As the JOIN_ACK travels toward the CBT router that issued the
JOIN_REQUEST, each intermediate router creates new "active state" for
this group. New branches are established by having the intermediate
routers remember which interface is upstream, and which interface(s)
is (are) downstream. Once a new branch is created, each child router
monitors the status of its parent router with a keepalive mechanism,
the CBT "Echo" protocol. A child router periodically unicasts an
ECHO_REQUEST to its parent router, which is then required to respond
with a unicast ECHO_REPLY message.

If, for any reason, the link between an on-tree router and its parent
should fail, or if the parent router is otherwise unreachable, the
on-tree router transmits a FLUSH_TREE message on its child
interface(s) which begins tearing down all the downstream branches
for this group. Each leaf router is then responsible for
re-attaching itself to the group's core, thus rebuilding the shared
delivery tree. Leaf routers only re-join the shared tree if they
have at least one directly-attached group member.
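The transient-to-active transition described above might be sketched
as follows; the data structures and handler names are illustrative
assumptions, not taken from the CBT specification.

   # Illustrative CBT join handling in an intermediate router.
   # Transient state remembers how to route the returning JOIN_ACK;
   # the ACK converts it into active, bi-directional state.

   transient = {}  # group -> (iface join arrived on, iface toward core)
   active = {}     # group -> {"parent": iface, "children": set of ifaces}

   def on_join_request(group, arrived_on, toward_core):
       transient[group] = (arrived_on, toward_core)

   def on_join_ack(group, arrived_on):
       came_in, went_out = transient.pop(group)
       # The ACK must return along the reverse of the join's path.
       assert arrived_on == went_out
       state = active.setdefault(group,
                                 {"parent": arrived_on,
                                  "children": set()})
       state["children"].add(came_in)   # new downstream branch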
The Designated Router (DR) for a given broadcast-capable subnetwork
is elected by CBT's "Hello" protocol. It functions as the only
upstream router for all groups using that link. The DR is not
necessarily the best next-hop router to every core for every
multicast group. The implication is that it is possible for the DR
to receive a JOIN_REQUEST over a LAN, but the DR may need to redirect
that JOIN_REQUEST back across the same link to the best next-hop
router toward a given group's core, as shown in Figure 21. Data
traffic is never duplicated across a link; only JOIN_REQUESTs are,
and they should represent an inconsequential volume of traffic.

========================================================================

                           #- - - -#- - - - -#
                           |                  \
                           |                   #
                           |
                           # - - - - #
    member     |           |
    host ------|           |
               |  --Join-->     --Join-->     --Join-->
               |- [DR] - - - [:] - - - - [:] - - - - [@]
               |  <--ACK--      <--ACK--      <--ACK--

                              LEGEND

      [DR]  Local CBT Designated Router
      [:]   CBT Router
      [@]   Core Router
      #     CBT Router that is already on the shared tree

                 Figure 21: CBT Tree Joining Process
========================================================================

8.2.2 Data Packet Forwarding

When a JOIN_ACK is received by an intermediate router, it either adds
the interface over which the JOIN_ACK was received to an existing
group's forwarding cache entry, or creates a new entry for the
multicast group if one does not already exist. When a CBT router
receives a data packet addressed to any multicast group, it simply
forwards the packet over the outgoing interfaces specified by the
group's forwarding cache entry.

8.2.3 Non-Member Sending

Similar to other multicast routing protocols, CBT does not require
that the source of a multicast packet be a member of the multicast
group. However, for a multicast data packet to reach the active core
for the group, at least one CBT-capable router must be present on the
non-member sender's subnetwork. The local CBT-capable router employs
IP-in-IP encapsulation and unicasts the data packet to the active
core for delivery to the rest of the multicast group. Thus, every
CBT-capable router in each CBT region needs a list of the
corresponding group-to-core mappings.

8.2.4 CBT Tree Building and Forwarding Summary

CBT trees are constructed of forward paths toward a group's core;
recall that any group has exactly one active core. As CBT is an
explicit-join protocol, no routers forward traffic to a group unless
there are pre-established active receivers for that group within the
CBT domain.

Trees are built using each CBT router's unicast routing table, by
forwarding JOIN_REQUESTs toward a group's core. The resulting
non-source-specific state is bi-directional, a feature unique to CBT
trees. If any group member needs to send to the group, no extra
forwarding state needs to be established: every possible receiver is
already also a potential sender, with no penalty of extra (source,
group) state in the routers.
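Because CBT state is per-group and bi-directional, the forwarding
rule itself is very small. The helper below is an illustrative
sketch under those assumptions, not code from any CBT implementation.

   # Illustrative CBT data forwarding. tree_ifaces is this router's
   # set of on-tree interfaces for the group: the parent interface
   # plus all child interfaces.

   def cbt_forward(group, arrival_iface, tree_ifaces, send):
       if arrival_iface not in tree_ifaces:
           return                      # not on this group's tree: drop
       for iface in tree_ifaces - {arrival_iface}:
           send(iface)                 # flows toward the core and away

Note that the same rule serves packets heading up toward the core and
packets heading down toward members, which is why a directly-attached
source needs no (source, group) state.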
8.2.5 CBT Multicast Interoperability

Multicast interoperability is currently being defined. Work is
underway in the IDMR working group to describe the attachment of
stub-CBT and stub-PIM domains to a DVMRP backbone. Future work will
focus on developing methods of connecting non-DVMRP transit domains
to a DVMRP backbone.

CBT interoperability will be achieved through the deployment of
domain border routers (BRs) which enable the forwarding of multicast
traffic between the CBT and DVMRP domains. The BR implements DVMRP
and CBT on different interfaces and is responsible for forwarding
data across the domain boundary.

The BR is also responsible for exporting selected routes out of the
CBT domain into the DVMRP domain. While the CBT stub domain never
needs to import routes, the DVMRP backbone needs to import routes to
any sources of traffic which are inside the CBT domain. The routes
must be imported so that DVMRP can perform its reverse-path check.

========================================================================

        / - - - - - - - \                /----------------\
       |                 |              |                  |
       |  DVMRP          |   +----+     |  CBT             |
       |                 |---| BR |-----|                  |
       |  Backbone       |   +----+     |  Domain          |
       |                 |              |                  |
        \- - - - - - - -/                \----------------/

                  Figure 22: Domain Border Router
========================================================================

9. MULTICAST IP ROUTING: RELATED TOPICS

To close this overview of multicast IP routing technology, we first
examine multicast routing protocol interoperability, then turn to
expanding-ring searches, which may not be equally effective depending
on which multicast routing protocol is being used.

9.1 Interoperability Framework for Multicast Border Routers

In late 1996, the IETF IDMR working group began discussing a formal
structure that would describe the way different multicast routing
protocols should interact inside a multicast border router (MBR).
The work-in-progress can be found in the Internet-Draft
"Interoperability Rules for Multicast Routing Protocols" (see section
10.2), or its successor. The draft covers explicit rules for the
major multicast routing protocols that existed at the end of 1996:
DVMRP, MOSPF, PIM-DM, PIM-SM, and CBT, but applies to any potential
future multicast routing protocol(s) as well.

The IDMR standards work will focus on this generic inter-protocol MBR
scheme, rather than having to write 25 documents: 20 detailing how
each of those 5 protocols must interwork with the 4 others, plus 5
detailing how two disjoint regions running the same protocol must
interwork.

9.1.1 Requirements for Multicast Border Routers

In order to ensure reliable multicast delivery across a network with
an arbitrary mixture of multicast routing protocols, some constraints
are imposed to limit the scope of the problem space.

Each multicast routing domain, or region, may be connected in a "tree
of regions" topology. If more arbitrary inter-regional topologies
are desired, a hierarchical multicast routing protocol (such as
H-DVMRP) must be employed, because it carries topology information
about how the regions are interconnected. Until this information is
available, we must consider the case of a tree of regions with one
centrally-placed "backbone" region. Each pair of regions is
interconnected by one or more MBRs.

An MBR is responsible for injecting a default route into its "child
regions," and also injecting subnetwork reachability information into
its "parent region," optionally using aggregation techniques to
reduce the volume of the information while preserving its meaning.
MBRs which comply with the interoperability rules specification have
other characteristics and duties, including:

   o An MBR consists of at least two active routing components, each
     an instance of some multicast routing protocol. No assumption
     is made about the type of routing protocol (e.g., broadcast-
     and-prune or explicit-join; distance-vector or link-state; etc.)
     any component runs, or the nature of a "component". Multiple
     components running the same protocol are allowed.

   o An MBR forwards packets between two or more independent regions,
     with one or more active interfaces per region, but only one
     component per region.

   o Each interface for which multicast is enabled is "owned" by
     exactly one of the components at a time.

   o All components share a common forwarding cache of (S,G) entries,
     which are created when data packets are received, and can be
     deleted at any time. The component owning an interface is the
     only component that may change forwarding cache entries
     pertaining to that interface. Each forwarding cache entry has a
     single incoming interface (iif) and a list of outgoing
     interfaces (oiflist).

9.2 ISSUES WITH EXPANDING-RING SEARCHES

Expanding-ring searches may be used when an end-station wishes to
find the closest example of a certain kind of server. One key
assumption is that each of these servers is equivalent in the service
provided: ask any of them a question pertinent to that service and
you should get the same answer. The DNS is one example of such a
service. The searching client sends a query packet with the IP
header's TTL field set to 1. This packet will only reach its local
subnetwork. If a response is not heard within some timeout interval,
a second query is emitted, this time with the TTL set to 2. This
process of sending and waiting for a response, incrementing the TTL
each time, is fundamentally what an expanding-ring search is (a
client-side sketch appears below). Expanding-ring searches can
facilitate resource-discovery or auto-configuration protocols.

Another key assumption is that the multicast infrastructure provides
the ability to broadcast from a source equally well in all
directions. This usually implies that, at a minimum, all routers in
an internetwork support a multicast routing protocol.
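Below is a minimal client-side sketch of such a search using Python's
standard multicast socket options. The group address, port, query
format, TTL ceiling, and timeout are all arbitrary example values.

   import socket

   def expanding_ring_search(group="233.252.0.7", port=5000,
                             query=b"where-is-service?",
                             max_ttl=8, timeout=2.0):
       """Send the query with TTL 1, 2, 3, ... until a server
       answers."""
       sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
       sock.settimeout(timeout)
       for ttl in range(1, max_ttl + 1):
           sock.setsockopt(socket.IPPROTO_IP,
                           socket.IP_MULTICAST_TTL, ttl)
           sock.sendto(query, (group, port))
           try:
               reply, server = sock.recvfrom(2048)
               return server            # the nearest responder found
           except socket.timeout:
               continue                 # widen the ring and try again
       return None                      # nothing within max_ttl hops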
There are two classes of routing protocols to consider when
discussing expanding-ring searches: DVMRP, MOSPF, and PIM-DM make up
one class, while PIM-SM and CBT comprise the other.

DVMRP supports expanding-ring searches fairly well for a reasonable
number of sources. The downside of using DVMRP is that
source-specific state is kept across the entire DVMRP internetwork.
As the number of sources increases, the amount of state increases
linearly. Due to DVMRP's broadcast-and-prune nature, the tree for
each source will quickly converge to reach all receivers for a given
group (limited to receivers within the packet's TTL). If we assume
that all routers in the internetwork speak DVMRP, these TTL-scoped
searches will have the desired result: as the TTL is incremented,
the packets will reach successively more distant routers, radiating
away from the source subnetwork.

MOSPF supports expanding-ring searches particularly well. It also
keeps source-specific state, so it has the same scaling issue as
DVMRP with respect to state (forwarding cache) increasing linearly
with the number of sources. MOSPF's unique capability is that it
knows the topology of its local area. One by-product of this
knowledge is that for a given (source, group) pair, each MOSPF router
knows the minimum TTL needed to reach the closest group member. If a
packet's TTL is not at least this large, it need not be forwarded.
This conserves bandwidth that would otherwise be wasted by several
iterations of successively larger TTLs. However, since MOSPF is an
explicit-join protocol, any servers wishing to be found must join the
search group in advance; otherwise, MOSPF's trees will not include
these servers' subnetworks.

PIM-DM is very similar to DVMRP in the context of expanding-ring
searches. If we continue to assume that all routers in a given
internetwork support the multicast routing protocol, then RPM (used
by both PIM-DM and DVMRP) will ensure that sources' traffic is
broadcast across the entire internetwork (limited only by the
packet's initial TTL), with no forwarding loops.

Shared-tree protocols do not have properties that necessarily lend
themselves to supporting expanding-ring searches. PIM-SM, by
default, does not build source-based trees. Consider the case of a
sender on a leaf subnet in a PIM-SM domain. Multicast packets sent
with TTL=1 will only reach end-stations on the local subnetwork, but
TTL=2 packets will be tunneled inside PIM-SM-Register packets
destined for the RP. Once at the RP, the PIM-SM-Register wrapper is
removed, exposing the multicast packet, which should now have TTL=1
(the DR should have decremented the TTL before forwarding the
packet). The RP can now forward the original packet over its
attached downstream interfaces for this group. Since the TTL is now
1, this is as far as the packet will go. Future packets, with
incrementally higher TTLs, will radiate outward from the RP along any
downstream branches for this group. Thus, the search will locate
resources that are closest to the RP, not the source (unless the RP
and the source happen to be very close together).

CBT does not really have this problem, at least in the latest
version, since it does not need to encapsulate packets to get them to
the group's core. Also, CBT's state is bi-directional, so any
receiver can also be a sender with no tree-setup penalty. Because it
is a shared-tree protocol, CBT is not as good as DVMRP, MOSPF, or
PIM-DM for expanding-ring searches, but it is better than PIM-SM:
CBT tends to have more symmetric distances, and "closer" on the
core-based tree is more correlated with "closer" in terms of network
hops. However, CBT is not perfect for these searches. The
effectiveness of a CBT search depends on the density of branch points
of the group's distribution tree in the immediate vicinity of the
source. If we assume that all routers are also CBT routers, then the
search can be quite effective.

10. REFERENCES

10.1 Requests for Comments (RFCs)

   1075  "Distance Vector Multicast Routing Protocol," D. Waitzman,
         C. Partridge, and S. Deering, November 1988.

   1112  "Host Extensions for IP Multicasting," Steve Deering,
         August 1989.

   1583  "OSPF Version 2," John Moy, March 1994.

   1584  "Multicast Extensions to OSPF," John Moy, March 1994.

   1585  "MOSPF: Analysis and Experience," John Moy, March 1994.

   1700  "Assigned Numbers," J. Reynolds and J. Postel, October 1994.
         (STD 2)
   1812  "Requirements for IP version 4 Routers," Fred Baker, Editor,
         June 1995.

   2117  "Protocol Independent Multicast-Sparse Mode (PIM-SM):
         Protocol Specification," D. Estrin, D. Farinacci, A. Helmy,
         D. Thaler, S. Deering, M. Handley, V. Jacobson, C. Liu,
         P. Sharma, and L. Wei, July 1997.

   2189  "Core Based Trees (CBT version 2) Multicast Routing,"
         A. Ballardie, September 1997.

   2200  "Internet Official Protocol Standards," Jon Postel, Editor,
         June 1997. (STD 1)

   2201  "Core Based Trees (CBT) Multicast Routing Architecture,"
         A. Ballardie, September 1997.

10.2 Internet-Drafts

   "Core Based Tree (CBT) Multicast Border Router Specification for
   Connecting a CBT Stub Region to a DVMRP Backbone," A. J.
   Ballardie, March 1997 (work in progress).

   "Distance Vector Multicast Routing Protocol," T. Pusateri,
   February 19, 1997 (work in progress).

   "Internet Group Management Protocol, Version 2," William Fenner,
   January 22, 1997 (work in progress).

   "Internet Group Management Protocol, Version 3," Brad Cain,
   Ajit Thyagarajan, and Steve Deering (expired).

   "Protocol Independent Multicast Version 2, Dense Mode
   Specification," S. Deering, D. Estrin, D. Farinacci, V. Jacobson,
   A. Helmy, and L. Wei, May 21, 1997 (work in progress).

   "Protocol Independent Multicast-Sparse Mode (PIM-SM): Motivation
   and Architecture," S. Deering, D. Estrin, D. Farinacci,
   V. Jacobson, C. Liu, and L. Wei, November 19, 1996 (work in
   progress).

   "PIM Multicast Border Router (PMBR) specification for connecting
   PIM-SM domains to a DVMRP Backbone," D. Estrin, A. Helmy, and
   D. Thaler, February 3, 1997 (work in progress).

   "Administratively Scoped IP Multicast," D. Meyer, June 10, 1997
   (work in progress).

   "Interoperability Rules for Multicast Routing Protocols,"
   D. Thaler, November 7, 1996 (work in progress).

   See the IDMR home pages for an archive of these specifications.

10.3 Textbooks

   Comer, Douglas E., Internetworking with TCP/IP, Volume 1:
   Principles, Protocols, and Architecture, Second Edition, Prentice
   Hall, Inc., Englewood Cliffs, New Jersey, 1991.

   Huitema, Christian, Routing in the Internet, Prentice Hall, Inc.,
   Englewood Cliffs, New Jersey, 1995.

   Stevens, W. Richard, TCP/IP Illustrated: Volume 1, The Protocols,
   Addison Wesley Publishing Company, Reading, Massachusetts, 1994.

   Wright, Gary and W. Richard Stevens, TCP/IP Illustrated: Volume 2,
   The Implementation, Addison Wesley Publishing Company, Reading,
   Massachusetts, 1995.

10.4 Other

   Dalal, Y. K., and Metcalfe, R. M., "Reverse Path Forwarding of
   Broadcast Packets," Communications of the ACM, 21(12):1040-1048,
   December 1978.

   Deering, Steven E., "Multicast Routing in a Datagram
   Internetwork," Ph.D. Thesis, Stanford University, December 1991.

   Ballardie, Anthony J., "A New Approach to Multicast Communication
   in a Datagram Internetwork," Ph.D. Thesis, University of London,
   May 1995.

   Thyagarajan, Ajit and Steve Deering, "Hierarchical Distance Vector
   Multicast Routing for the MBone," Proceedings of the ACM SIGCOMM,
   pages 60-66, October 1995.

11. SECURITY CONSIDERATIONS

As with unicast routing, the integrity and accuracy of the multicast
routing information is important to the correct operation of the
multicast segments of the Internet. Lack of authentication of
routing protocol updates can permit an adversary to inject incorrect
routing data and cause multicast routing to break or flow in
unintended directions.
Some existing multicast routing protocols (e.g., MOSPF) do support
cryptographic authentication of their protocol exchanges. More
detailed discussion of multicast routing protocol security is left to
the specifications of those routing protocols.

Lack of authentication of IGMP can permit an adversary to inject
false IGMP messages on a directly attached subnet. Such messages
could cause unnecessary traffic to be transmitted to that subnet
(e.g., via a forged Membership Report) or could cause desired traffic
to not be transmitted to that subnet (e.g., via a forged Leave Group
message). If this is considered to be an issue, one could use the IP
Authentication Header [RFC-1825, RFC-1826] to provide cryptographic
authentication of the IGMP messages. The reader should consult the
IGMPv2 specification for additional information on this topic.

Security issues in multicast data traffic are beyond the scope of
this document.

12. ACKNOWLEDGEMENTS

This memo would not have been possible without the encouragement of
Mike O'Dell and the support of David Meyer and Joel Halpern. Also
invaluable was the feedback and comments from the IETF MBoneD and
IDMR working groups. A number of people spent considerable time
commenting on and discussing this paper with the authors, and deserve
to be mentioned by name: Kevin Almeroth, Ran Atkinson, Tony
Ballardie, Steve Casner, Jon Crowcroft, Steve Deering, Bill Fenner,
Hugh Holbrook, Cyndi Jung, John Moy, Shuching Shieh, Dave Thaler, and
Nair Venugopal. If we neglected to mention anyone here, please
accept our sincerest apologies.

13. AUTHORS' ADDRESSES

   Tom Maufer                         Chuck Semeria

   3Com Corporation                   3Com Corporation
   5400 Bayfront Plaza                5400 Bayfront Plaza
   Mailstop 5247                      Mailstop 2218
   P.O. Box 58145                     P.O. Box 58145
   Santa Clara, CA 95052-8145         Santa Clara, CA 95052-8145

   Phone: +1 408 764-8814             Phone: +1 408 764-7201
   Email:                             Email: