Inter-Domain Multicast Routing (IDMR)                    A. J. Ballardie
INTERNET-DRAFT                                 University College London
                                                                 N. Jain
                                                      Bay Networks, Inc.
                                                                S. Reeve
                                                      Bay Networks, Inc.

                                                         June 20th, 1995


                    Core Based Trees (CBT) Multicast

                      -- Protocol Specification --


Status of this Memo

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. (Note that other groups may also distribute
   working documents as Internet Drafts.)

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress."

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.

Abstract

   This document describes the Core Based Tree (CBT) multicast protocol
   specification. CBT is a next-generation multicast protocol that makes
   use of a shared delivery tree rather than the separate per-sender
   trees utilized by most other multicast schemes [1, 2, 3].

   The specification includes a description of an optimization whereby
   native IP-style multicasts are forwarded over tree branches as well
   as subnetworks with group member presence.
   This mode of operation will be called CBT "native mode" and obviates
   the need to insert a CBT header into data packets before forwarding
   over CBT interfaces. Native mode is only relevant to CBT-only domains
   or "clouds".

   The CBT architecture is described in an accompanying document:
   draft-ietf-idmr-arch-00.txt. Other related documents include [4, 5].

1. Document Layout

   We describe the protocol details by means of example using the
   topology shown in figure 1. Examples show how a host joins a group
   and leaves a group, and we also show various tree maintenance
   scenarios.

   In this figure member hosts are shown as capital letters, routers are
   prefixed with R, and subnets are prefixed with S.

            A                                         B
            |     S1                          S4      |
    ------------------- -----------------------------------------------
             |                |                 |                 |
          ------            ------            ------            ------
          | R1 |            | R2 |            | R5 |            | R6 |
          ------            ------            ------            ------
     C     |  |               |                 |                 |
     |     |  |               |       S2        |         S8      |
   ---------- ------------------------------------------  -------------
       S3                     |
                            ------
                            | R3 |
     |                      ------                      D
     |     S9      |          |               S5        |
     |             |      ---------------------------------------------
     |  |----|     |                    |
     ---| R7 |-----|                  ------
     |  |----|     |------------------| R4 |
     |     S7      |                  ------                  F
     |             |                    |         S6          |
     |-E           |              ---------------------------------
     |                                     |
     |                                   ------
     |---|         |---------------------| R8 |
     |R12 -----|                         ------         G
     |---|     |                           |            |    S10
     |        S14                   ----------------------------
     |                                            |
 I --|                                          ------
     |                                          | R9 |
                                                ------
                                                  |     S12
     |                    ----------------------------
 S15 |                        |
     |                      ------
     |----------------------|R10 |
J ---|                      ------          H
     |                        |             |
     |                  ----------------------------
     |                              S13

                 Figure 1. Example Network Topology

2. Protocol Specification

2.1. CBT Group Initiation

   As with the other multicast schemes, one user, the group initiator,
   initiates a CBT multicast group.
   Group initiation could be carried out by a network management
   centre, or by some other external means, rather than have a user act
   as group initiator. However, in the authors' implementation, this
   flexibility has been afforded the user, and a CBT group is invoked by
   means of a graphical user interface (GUI), known as the CBT User
   Group Management Interface.

   NOTE: Work is currently in progress to address the issue of core
   placement.

2.2. Tree Joining Process

   The following steps are involved in a host establishing itself as
   part of a CBT multicast tree:

   o  the joining host must inform all routers on its subnet that it
      requires a Designated Router (DR) for the group it wishes to join
      (it is a requirement that only one router, the DR, forward to and
      from upstream to avoid loops).

   o  the establishment of a DR for the group.

   o  once established, the DR must proceed to join the distribution
      tree.

   The following CBT control messages come into play during the host
   joining process:

   NOTE: all CBT message types are described in section 8 irrespective
   of some of the comments included with certain message types below.

   o  CORE_NOTIFICATION (sent only by a group initiating host to inform
      each core for the group that it has been elected as a core for
      the group).

   o  CORE_NOTIFICATION_ACK

   o  DR_SOLICITATION

   o  DR_ADVERTISEMENT_NOTIFICATION (sent only by a local CBT-capable
      router when that router is unaware of a DR for the group on the
      same subnet, and believes it is a candidate for the best next-hop
      router off the LAN to the core address as specified in the
      DR_SOLICITATION. This message acts as a tie-breaker in the case
      where there are two or more such routers on a subnet).

   o  DR_ADVERTISEMENT

   o  TAG_REPORT (sent by a joining host to the DR subsequent to
      receiving a DR_ADVERTISEMENT.
      This message serves to invoke the DR to become part of the
      distribution tree, if not already, by sending a JOIN_REQUEST).

   o  JOIN_REQUEST (sent only by the group's DR iff it is not yet part
      of, or in the process of joining, the corresponding CBT tree).

   o  JOIN_ACK

   o  HOST_JOIN_ACK (multicast across the subnet by the local DR as an
      indication that the DR is part of the distribution tree. This
      message may be sent in immediate response to receiving a
      TAG_REPORT, depending on whether the DR is already part of the CBT
      tree or not. If not, it is sent subsequent to the DR receiving a
      JOIN_ACK).

   A group-initiating host sends a CORE_NOTIFICATION message to each of
   the elected cores for the group. This message is acknowledged
   (CORE_NOTIFICATION_ACK) by each core individually. Provided at least
   one ACK is received, a host will not be prevented from joining the
   tree.

   The purpose of the CORE_NOTIFICATION is twofold: firstly, to
   communicate the identities of all of the cores, together with their
   rankings, to each of them individually; secondly, to invoke the
   building of the core backbone or core tree. These two procedures
   follow on one to the other in the order just described. New receivers
   attempting to join whilst the building of the core backbone is still
   in progress have their explicit JOIN_REQUEST messages stored by
   whichever CBT-capable router involved in the core joining process is
   encountered first.

   Taking our example topology in figure 1, host A is the group
   initiator. The elected cores are router R4 (primary core) and R9
   (secondary core). Host A first sends a CORE_NOTIFICATION to each of
   R4 and R9, and each responds positively with a
   CORE_NOTIFICATION_ACK. CORE_NOTIFICATION messages are always unicast.
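
   The initiation step described above can be sketched in a few lines.
   This is only an illustration under stated assumptions: the names
   notify_cores, may_join and the send callback are hypothetical and do
   not come from the specification; the only rule taken from the text is
   that a single CORE_NOTIFICATION_ACK is enough for the host to go on
   and join.

```python
from typing import Callable, Dict, List


def notify_cores(cores: List[str], send: Callable[[str, dict], bool]) -> List[str]:
    """Unicast a CORE_NOTIFICATION to every core; return the cores that ACKed.

    `cores` is ordered by rank (primary first). `send` is a hypothetical
    transport hook that returns True when a CORE_NOTIFICATION_ACK came back.
    """
    # Every core is told the full ranked core list, as the text requires.
    message = {"type": "CORE_NOTIFICATION", "cores": list(cores)}
    return [core for core in cores if send(core, message)]


def may_join(acked: List[str]) -> bool:
    # At least one ACK is sufficient for the host to proceed with joining.
    return len(acked) >= 1


# Example from figure 1: R4 is the primary core, R9 the secondary.
# Assumption for the demo: R9 is temporarily unreachable.
reachable: Dict[str, bool] = {"R4": True, "R9": False}
acked = notify_cores(["R4", "R9"], lambda core, msg: reachable[core])
print(acked, may_join(acked))
```

   In a real host the send hook would unicast the message and wait for
   the acknowledgement with a timeout; those details are outside what
   the text above specifies.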
   Subsequent to sending a CORE_NOTIFICATION_ACK, each secondary core
   router (in this case there is only one secondary, R9) proceeds to
   join the primary core, and thus forms the core tree, or backbone; R9
   unicasts a JOIN_REQUEST (subcode CORE_JOIN) to R8, its best next-hop
   to the primary core, R4. JOIN_REQUESTs (and corresponding ACKs) are
   processed by all intervening CBT-capable routers, and forwarded if
   necessary. R8 forwards the JOIN_REQUEST to R4, remembering the
   incoming and outgoing interfaces of the JOIN_REQUEST.

   R4 receives the JOIN_REQUEST (subcode CORE_JOIN), realises it is the
   target of the join, and therefore sends a JOIN_ACK back out of the
   receiving interface to the previous-hop sender of the join. R8
   receives the JOIN_ACK and forwards it to R9 over the interface on
   which the join was received from R9. On receipt of the JOIN_ACK, R9
   need take no further action. Core tree set up is complete.

   For the period between any CBT-capable router forwarding (or
   originating) a JOIN_REQUEST and receiving a JOIN_ACK, the
   corresponding router is not permitted to acknowledge any subsequent
   joins received for the same group; rather, the router caches such
   joins until it has itself received a JOIN_ACK for the original join,
   at which time it can acknowledge any cached joins. A router is said
   to be in a pending-join state if it is awaiting a JOIN_ACK itself.

   Returning to host A, which has just received both
   CORE_NOTIFICATION_ACKs, it must now establish which local CBT router
   is DR for the group. Since A is the group initiator it is highly
   unlikely that a DR for the group will already exist. If A were
   joining an existing group a DR may already be present.

   Host A sends a DR_SOLICITATION (IP TTL 1) to the "all-CBT-routers"
   address (224.0.0.7). The solicitation contains one of the core
   addresses as elected by the host, to which it wishes a join to be
   sent.
   Any routers on the same subnet receiving the solicitation establish
   whether they are the best next-hop to the specified core or not. If a
   router does consider itself a candidate and has no record of a DR for
   the group, it multicasts a DR_ADV_NOTIFICATION to the
   "all-CBT-routers" group (224.0.0.7). This message acts as a
   tie-breaker in the case where there is more than one CBT router on
   the subnet which thinks it is the best next-hop to the core. The
   lowest-addressed source of a DR_ADV_NOTIFICATION wins the election
   and subsequently advertises itself as DR by means of a
   DR_ADVERTISEMENT, multicast to the "all-systems" group (224.0.0.1).
   As R1 is the only router on A's subnet, it responds with a
   DR_ADV_NOTIFICATION followed by a DR_ADVERTISEMENT.

   The time between sending a DR_ADV_NOTIFICATION and a DR_ADVERTISEMENT
   should be configurable and ideally less than one second so as to keep
   join latency to a minimum.

   The DR election for subnet S4 is more complex. When host B sends a
   DR_SOLICITATION, routers R2, R5 and R6 receive it. Assuming R2 and R5
   both believe they are the best next-hop to R4 (the specified core),
   both send a DR_ADV_NOTIFICATION. R2 (the lower addressed) wins the
   tie-breaker and subsequently multicasts a DR_ADVERTISEMENT to S4. All
   subnets with joining hosts proceed similarly.

   A DR candidate is a router whose outgoing interface, as specified in
   its routing table entry for the destination, is different from the
   interface over which the DR_SOLICITATION arrived.

   On receiving a DR_ADVERTISEMENT host A sends a TAG_REPORT to the DR,
   R1. R1 responds by unicasting a JOIN_REQUEST (subcode ACTIVE_JOIN) to
   R3 -- the best next-hop to R4, the desired target of the join. R3
   forwards (unicast) the received join to R4, remembering incoming and
   outgoing interfaces.
   R4, now already established on the tree for the group, responds to
   the JOIN_REQUEST with a JOIN_ACK, and sends it to R3, which in turn
   sends it to R1. The branch R1-R3-R4 is now complete and part of the
   distribution tree.

   On receipt of the JOIN_ACK, R1 multicasts to the "all-systems"
   address (224.0.0.1) a HOST_JOIN_ACK, which is a notification to the
   joining end-system that the DR has been successful in joining the
   tree. The multicast application running on host A can now send data.

   Host B proceeds to join the group in a similar fashion, but there are
   some subtle differences. Host B is not the group initiator and it
   need not send CORE_NOTIFICATIONs. Host B's first step is to elect a
   DR, as described above. On receipt of a DR_ADVERTISEMENT, from router
   R2 in this case, B unicasts a TAG_REPORT to R2. The core specified in
   the TAG_REPORT is R4. In response to the TAG_REPORT, R2 unicasts a
   JOIN_REQUEST (subcode ACTIVE_JOIN) to R3, the best next-hop to R4.
   R3, however, has just joined the tree and so can acknowledge the
   received join, i.e. it need not travel all the way to R4. R3 unicasts
   a JOIN_ACK to R2, which results in R2 multicasting a HOST_JOIN_ACK
   across subnet S4.

3. Data Packet Forwarding (CBT mode)

   "CBT mode", as opposed to "native mode", describes the
   forwarding/sending of data packets over CBT tree interfaces
   containing a CBT header encapsulation. For efficiency, this
   encapsulation is as follows:

      ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      | encaps IP hdr | CBT hdr | original IP hdr | data ....|
      ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

                 Figure 2. Encapsulation for CBT mode

   By using the encapsulation above there is virtually no necessity to
   modify a packet's original IP header, and decapsulation is relatively
   efficient.
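
   The encapsulation of figure 2 can be sketched as plain byte
   concatenation, with decapsulation as a slice that leaves the original
   packet untouched. The sketch below is illustrative only and rests on
   several assumptions that are not in the text: the field widths (4-bit
   vers, 4 unused bits, 8-bit type, 8-bit hdr length, 8-bit protocol,
   16-bit checksum, 8-bit IP TTL, 4 on-tree bits, 4 unused bits, then
   four 32-bit words) are read off the CBT header diagram in section
   8.1, the values TYPE_DATA and ON_TREE are hypothetical, the security
   fields (T.B.D.) are left out, and the checksum is computed with the
   checksum field set to zero.

```python
import socket
import struct

CBT_VERSION = 1      # "this release specifies version 1"
TYPE_DATA = 0        # hypothetical: the text does not list type codes
ON_TREE = 0xF        # hypothetical encoding of "on-tree bits set"
CBT_HDR_LEN = 24     # octets, with the security fields (T.B.D.) omitted


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_cbt_header(group, core, origin, ip_ttl, protocol,
                     on_tree=False, flow_id=0) -> bytes:
    """Pack a CBT data header using the assumed field widths."""
    def pack(checksum: int) -> bytes:
        return struct.pack(
            "!BBBBHBB4s4s4sI",
            CBT_VERSION << 4, TYPE_DATA, CBT_HDR_LEN, protocol,
            checksum, ip_ttl, (ON_TREE if on_tree else 0) << 4,
            socket.inet_aton(group), socket.inet_aton(core),
            socket.inet_aton(origin), flow_id,
        )
    # Assumption: checksum is computed over the header with the field zeroed.
    return pack(internet_checksum(pack(0)))


def encapsulate(outer_ip_hdr: bytes, cbt_hdr: bytes, original: bytes) -> bytes:
    # | encaps IP hdr | CBT hdr | original IP hdr | data ... |
    return outer_ip_hdr + cbt_hdr + original


def decapsulate(packet: bytes) -> bytes:
    outer_len = (packet[0] & 0x0F) * 4      # IPv4 IHL, in 32-bit words
    cbt_len = packet[outer_len + 2]         # "hdr length" octet of CBT header
    return packet[outer_len + cbt_len:]     # original packet, unmodified


# Demo with placeholder data; the addresses are arbitrary examples and the
# outer header is a 20-octet stand-in, not a valid IP header.
original = b"<original IP hdr><data>"
outer = bytes([0x45]) + bytes(19)
cbt = build_cbt_header("224.1.2.3", "192.0.2.4", "192.0.2.77",
                       ip_ttl=32, protocol=17)
packet = encapsulate(outer, cbt, original)
print(len(cbt), decapsulate(packet) == original)
```

   Because the original IP header travels as opaque payload, a router
   removing the encapsulation only needs the two header lengths, which
   is the efficiency argument made above.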
   It is worth pointing out at this point the distinction between
   subnetworks and tree branches, although they can be one and the same.
   For example, a multi-access subnetwork containing routers and
   end-systems could potentially be both a CBT tree branch and a
   subnetwork with group member presence. A tree branch which is not
   simultaneously a subnetwork is a "tunnel" or a point-to-point link.

   In CBT forwarding mode there are three forwarding methods used by CBT
   routers:

   o  IP multicasting. This method is used to send a data packet across
      a directly-connected subnetwork with group member presence. Thus,
      no changes to host systems are required for CBT. Similarly,
      end-systems originating multicast data do so in traditional
      IP-style.

   o  CBT unicasting. This method is used for sending data packets
      encapsulated (as illustrated above) across a tunnel or
      point-to-point link.

   o  CBT multicasting. This method sends data packets encapsulated (as
      illustrated above) but the outer encapsulating IP header contains
      a multicast address. This method is used when a parent or multiple
      children are reachable over a single physical interface, as could
      be the case on a multi-access Ethernet. The IP module of
      end-systems subscribed to the same group will discard these
      multicasts since the CBT payload type will not be recognized.

   CBT routers create Forwarding Information Base (FIB) entries whenever
   they send or receive a JOIN_ACK. The FIB describes the parent-child
   relationships on a per-group basis. A FIB entry dictates over which
   tree interfaces, and how (unicast or multicast), a data packet is to
   be sent. Additionally, a data packet is IP multicast over any
   directly-connected subnetworks with group member presence. Such
   interfaces are kept in a separate table relating to IGMP.
   A FIB entry is shown below:

   32-bits          4             4           4         4     |    4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  group-id  | parent addr | parent vif | No. of   |                    |
|            |    index    |   index    | children |      children      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                   |chld addr |chld vif |
                                                   |  index   |  index  |
                                                   |+-+-+-+-+-+-+-+-+-+-+
                                                   |chld addr |chld vif |
                                                   |  index   |  index  |
                                                   |+-+-+-+-+-+-+-+-+-+-+
                                                   |chld addr |chld vif |
                                                   |  index   |  index  |
                                                   |+-+-+-+-+-+-+-+-+-+-+
                                                   |                    |
                                                   |        etc.        |
                                                   |+-+-+-+-+-+-+-+-+-+-+

                        Figure 3. CBT FIB entry

   The field lengths shown above assume a maximum of 16 directly
   connected neighbouring routers.

   When a data packet arrives at a CBT router, the following rules
   apply:

   o  if the packet is an IP-style multicast, it is checked to see if it
      originated locally (i.e. if the arrival interface subnetmask ANDed
      with the packet's source IP address equals the arrival interface's
      subnet number, the packet was sourced locally). If it did not, the
      packet is discarded.

   o  the packet is IP multicast to all directly connected subnets with
      group member presence. The packet is sent with an IP TTL value of
      1 in this case.

   o  the packet is encapsulated for CBT forwarding (see figure 2) and
      unicast to parent and children. However, if more than one child is
      reachable over the same interface the packet will be CBT
      multicast. Therefore, it is possible that an IP-style multicast
      and a CBT multicast will be forwarded over a particular
      subnetwork.

   Using our example topology in figure 1, let's assume member G
   originates an IP multicast packet. R8 is the DR for subnet S10 (R4 is
   DR for all its attached subnets). R8 CBT unicasts the packet to each
   of its children, R9 and R12. These children are not reachable over
   the same interface.
   R8, being the DR for subnets S14 and S10, also IP multicasts the
   packet to S14 (S10 received the IP-style packet already from the
   originator). R9, the DR for S12, need not IP multicast onto S12 since
   there are no members present there. R9 CBT unicasts the packet to
   R10, which is the DR for S13 and S15. It IP multicasts to both S13
   and S15.

   Going upstream from R8, R8 CBT unicasts to R4. R4 is DR for all its
   directly connected subnets and therefore IP multicasts the data
   packet onto S5, S6 and S7, all of which have member presence. R4
   unicasts the packet to all outgoing children, R3 and R7 (NOTE: R4
   does not have a parent since it is the primary core router for the
   group). R7 IP multicasts onto S9. R3 CBT unicasts to R1 and R2, its
   children. Finally, R1 IP multicasts onto S1 and S3, and R2 IP
   multicasts onto S4.

3.1. Non-Member Sending

   For a multicast data packet to span beyond the scope of the
   originating subnetwork at least one CBT-capable router must be
   present on that subnetwork. The DR for the group on the subnetwork
   must encapsulate the IP-style packet and unicast it to a core for the
   group. This requires CBT routers to have access to a mapping
   mechanism between group addresses and core routers. This mechanism is
   currently beyond the scope of this document.

4. Data Packet Forwarding (native mode)

   In CBT "native mode" only one forwarding method is used, namely all
   data packets are forwarded over CBT tree interfaces as native IP
   multicasts, i.e. there are no encapsulations required. This assumes
   that CBT is the multicast routing protocol in operation within the
   domain (or "cloud") in question. It also assumes that all routers
   within the domain of operation are CBT-capable, i.e. there are no
   "tunnels".
   If this latter constraint cannot be satisfied it is necessary to
   encapsulate IP-over-IP before forwarding to a child or parent
   reachable via non-CBT-capable router(s).

   Besides the structural characteristics of "native mode" data packets,
   described above, the data packet forwarding rules are identical to
   those described in section 3.

4.1. Non-Member Sending (native mode)

   For a multicast data packet to span beyond the scope of the
   originating subnetwork at least one CBT-capable router must be
   present on that subnetwork. The DR for the group on the subnetwork
   must encapsulate (IP-over-IP) the IP-style packet and unicast it to a
   core for the group. This requires CBT routers to have access to a
   mapping mechanism between group addresses and core routers. This
   mechanism is currently beyond the scope of this document.

5. Tree Maintenance

   Once a tree branch has been created, i.e. a CBT router has received a
   JOIN_ACK for a JOIN_REQUEST previously sent (forwarded), a child
   router is required to monitor the status of its parent/parent link at
   fixed intervals by means of a "keepalive" mechanism operating between
   them. The "keepalive" mechanism is implemented by means of two CBT
   control messages: CBT_ECHO_REQUEST and CBT_ECHO_REPLY.

   For any non-core router, if its parent router, or path to the parent,
   fails, that non-core router is initially responsible for re-attaching
   itself, and therefore all routers subordinate to it on the same
   branch, to the tree.

5.1. Router Failure

   A non-core router can detect a failure from the following two cases:

   o  if a child stops receiving CBT_ECHO_REPLY messages. In this case
      the child realises that its parent has become unreachable and must
      therefore try and re-connect to the tree.
      It does so by arbitrarily choosing an alternate core from its list
      of cores for this group. It establishes a chosen core's
      reachability by unicasting a CBT_CORE_PING message to it, to which
      the core responds with a CBT_PING_REPLY. On receipt of the latter,
      the re-joining router sends a JOIN_REQUEST (subcode ACTIVE_REJOIN)
      to the best next-hop router on the path to the core. A router will
      continue arbitrarily choosing an alternate core until a
      CBT_PING_REPLY is received.

   o  if a parent stops receiving CBT_ECHO_REQUESTs from a child. In
      this case the parent simply removes the child interface from its
      FIB entry for the particular group.

5.2. Router Re-Starts

   There are two cases to consider here:

   o  Core re-start. In this case, the core router relies on receiving a
      CBT_CORE_PING message, which contains the list of cores for the
      specified group. Obviously, one of the core addresses will be its
      own. If a core realises its core status for a group in this way
      and it is not the primary, it sends a JOIN_REQUEST (subcode
      ACTIVE_JOIN) to the primary core. If the router in question is the
      primary it need not send a join, but rather awaits joins and
      considers itself part of the tree again.

   o  Non-core re-start. In this case, the router can only join the tree
      again if a downstream router sends a JOIN_REQUEST through it, or
      it is elected DR for one of its directly attached subnets.

5.3. Route Loops

   Routing loops are only a concern when a router with at least one
   child is attempting to re-join a CBT tree. In this case the
   re-joining router sends a JOIN_REQUEST (subcode ACTIVE_REJOIN) to the
   best next-hop on the path to the core. This join is forwarded as
   normal until it reaches either the core or a non-core router that is
   already part of the tree.
   If the join reaches the specified core, the join terminates there and
   is ACKd as normal. If, however, the join is terminated by a non-core
   router, the ACTIVE_REJOIN is converted to a NON_ACTIVE_REJOIN and
   forwarded upstream. A JOIN_ACK is also sent downstream to acknowledge
   the received join. The NON_ACTIVE_REJOIN is a loop detection packet.
   All routers receiving this must forward it over their parent
   interface. If the originator of the corresponding ACTIVE_REJOIN
   should receive the NON_ACTIVE_REJOIN it immediately sends a
   QUIT_REQUEST to its recently established parent and the loop is
   broken.

   o  Using figure 4 to demonstrate this, if R3 is attempting to re-join
      the tree (R1 is the core in figure 4) and R3 believes its best
      next-hop to R1 is R6, and R6 believes R5 is its best next-hop to
      R1, which sees R4 as its best next-hop to R1 -- a loop is formed.
      R3 begins by sending a JOIN_REQUEST (subcode ACTIVE_REJOIN, since
      R4 is its child) to R6. R6 forwards the join to R5. R5 is on-tree
      for the group, so changes the join subcode to NON_ACTIVE_REJOIN,
      and forwards this to its parent, R4. R4 forwards the
      NON_ACTIVE_REJOIN to R3, its parent. R3 originated the
      corresponding ACTIVE_REJOIN, and so it immediately sends a
      QUIT_REQUEST to R6, which in turn sends a quit if it has not
      received an ACK from R5 already AND has itself a child or subnets
      with member presence. If so it need not send a quit -- the loop
      has been broken by R3 sending the first quit.

   QUIT_REQUESTs are typically acknowledged by means of a QUIT_ACK, but
   there might be cases where, due to failure, the parent cannot
   respond. In this case the child nevertheless removes the parent
   information after some small number of re-tries.
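
   The loop check described in this section can be traced with a small
   simulation. It is a sketch under stated assumptions, not part of the
   protocol: the function name rejoin_forms_loop and the dictionaries
   next_hop, parent and on_tree are hypothetical stand-ins for each
   router's unicast routing table and FIB, and message exchange is
   reduced to following those tables.

```python
def rejoin_forms_loop(origin, core, next_hop, parent, on_tree):
    """Trace an ACTIVE_REJOIN sent by `origin`; True if a loop is detected.

    next_hop[r] : r's best next-hop towards the core (unicast routing)
    parent[r]   : r's current parent on the tree (from its FIB)
    on_tree     : set of non-core routers already on the tree
    """
    # Phase 1: the ACTIVE_REJOIN is forwarded hop by hop until it reaches
    # the core or a router that is already on-tree.
    node, seen = next_hop[origin], {origin}
    while node != core and node not in on_tree:
        if node in seen:
            raise ValueError("unicast routing loop before reaching the tree")
        seen.add(node)
        node = next_hop[node]
    if node == core:
        return False  # terminated and ACKed by the core: no loop possible

    # Phase 2: the on-tree router converts the join to NON_ACTIVE_REJOIN,
    # which every receiver forwards over its parent interface.
    visited = set()
    while node is not None and node != core and node not in visited:
        visited.add(node)
        node = parent.get(node)
        if node == origin:
            return True  # originator sees its own rejoin: send QUIT_REQUEST
    return False


# The scenario of figure 4: R1 is the core, R3 is re-joining.
next_hop = {"R3": "R6", "R6": "R5", "R5": "R4"}
parent = {"R5": "R4", "R4": "R3"}
on_tree = {"R4", "R5"}
print(rejoin_forms_loop("R3", "R1", next_hop, parent, on_tree))
```

   When the function reports a loop, the originator (R3 in the example)
   would send a QUIT_REQUEST to its newly established parent, R6, as the
   text describes.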

                 ------
                 | R1 |
                 ------
                   |
           ---------------------------
                   |
                 ------
                 | R2 |
                 ------
                   |
           ---------------------------
                   |                             |
                 ------                          |
                 | R3 |--------------------------|
                 ------                          |
                   |                             |
           ---------------------------           |
                   |                             |       ------
                 ------                          |       |    |
                 | R4 |                          |-------| R6 |
                 ------                          |       |----|
                   |                             |
           ---------------------------           |
                   |                             |
                 ------                          |
                 | R5 |--------------------------|
                 ------                          |
                   |

                  Figure 4: Example Loop Topology

6. Data Packet Loops

   NOTE: this is only applicable when CBT header encapsulation is in
   use.

   When a data packet hits its first on-tree router, that router is
   responsible for setting the on-tree bits in the CBT header. This
   indicates to all subsequent routers on the tree that the packet is in
   the process of spanning the tree for the group. However, it might be
   that a misbehaving router forwards an on-tree packet over a non-tree
   interface, and such a packet might work its way back onto the tree,
   potentially forming a data packet loop. Therefore, the on-tree bits
   in the CBT header serve to identify such packets -- should a router
   receive a data packet with its on-tree bits set over a non-tree
   interface the packet is immediately discarded.

7. Tree Teardown

   There are two scenarios whereby a tree branch may be torn down:

   o  During a re-configuration, if a router's best next-hop to the
      specified core is one of its existing children then before sending
      the re-join it must tear down that particular downstream branch.
      It does so by sending a FLUSH_TREE message which is processed
      hop-by-hop down the branch. All routers receiving this message
      must process it and forward it to all their children. Routers that
      have received a flush message will re-establish themselves on the
      delivery tree if they have directly connected subnets with group
      presence.
      Subsequent to sending a FLUSH_TREE, the router can send the
      re-join to its child.

   o  If a CBT router has no children it periodically checks all its
      directly connected subnets for group member presence. If no member
      presence is ascertained on any of its subnets it sends a
      QUIT_REQUEST upstream to remove itself from the tree.

   With regard to the latter scenario, let's see, using the example
   topology of figure 1, how a tree branch is torn down.

   Assume member E leaves the group (if IGMPv2 is in use an explicit
   IGMP_LEAVE message will be sent by E). If R7 registers no further
   group presence (by means of IGMP) then R7 sends a QUIT_REQUEST to R4.
   R4 responds with a QUIT_ACK to R7. R4 has children AND subnets with
   group presence, and so does not itself attempt to quit the tree. The
   branch R4-R7 has been torn down.

8. CBT Packet Formats and Message Types

   CBT packets travel in IP datagrams. We distinguish between two types
   of CBT packet: CBT data packets, and CBT control packets.

   CBT data packets carry a CBT header when these packets are traversing
   CBT tree branches. The encapsulation (for "CBT mode") is shown below:

      ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      | encaps IP hdr | CBT hdr | original IP hdr | data ....|
      ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

                 Figure 5. Encapsulation for CBT mode

   CBT control packets carry a CBT control header. All CBT control
   messages are implemented over UDP. This makes sense for several
   reasons: firstly, all the information required to build a CBT
   delivery tree is kept in user space. Secondly, implementation is made
   considerably easier.
CBT control messages fall into two categories: primary maintenance
messages, which are concerned with tree-building, re-configuration,
and teardown, and auxiliary maintenance messages, which are mainly
concerned with general tree maintenance.

8.1. CBT Header Format

See over....

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | vers  |unused |     type      |  hdr length   |   protocol    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           checksum            |    IP TTL     |on-tree|unused |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       group identifier                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         core address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         packet origin                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        flow identifier                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        security fields                        |
   |                            (T.B.D)                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 6. CBT Header

Each of the fields is described below:

   o Vers: Version number -- this release specifies version 1.

   o type: indicates whether the payload is data or control
     information.

   o hdr length: length of the header, for purpose of checksum
     calculation.

   o protocol: upper-layer protocol number.

   o checksum: the 16-bit one's complement of the one's complement
     sum of the CBT header, calculated across all fields.

   o IP TTL: TTL value gleaned from the IP header where the packet
     originated. It is decremented each time it traverses a CBT
     router.

   o on-tree: indicates whether the packet is on- or off-tree.
     Once this field is set (i.e. on-tree), it is non-changing.

   o group identifier: multicast group address.

   o core address: the unicast address of a core for the group. A
     core address is always inserted into the CBT header by an
     originating host, since at any instant, it does not know if
     the local DR for the group is on-tree. If it is not, the
     local DR must unicast the packet to the specified core.

   o packet origin: source address of the originating end-system.

   o flow-identifier: value uniquely identifying a previously set
     up data stream.

   o security fields: these fields (T.B.D.) will ensure the
     authenticity and integrity of the received packet.

8.2. Control Packet Header Format

The individual fields are described below. It should be noted that
the contents of the fields beyond ``group identifier'' are empty in
some control messages:

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | vers  |unused |     type      |     code      |    unused     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          hdr length           |           checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       group identifier                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         packet origin                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         core address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Core #1                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Core #2                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Core #3                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Core #4                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Core #5                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Resource Reservation fields                  |
   |                            (T.B.D)                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        security fields                        |
   |                            (T.B.D)                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 7. CBT Control Packet Header

   o Vers: Version number -- this release specifies version 1.

   o type: indicates control message type (see sections 8.3, 8.4).

   o code: indicates sub-code of control message type.

   o header length: length of the header, for purpose of checksum
     calculation.

   o checksum: the 16-bit one's complement of the one's complement
     sum of the CBT control header, calculated across all fields.

   o group identifier: multicast group address.

   o packet origin: source address of the originating end-system.

   o core address: desired/actual core affiliation of control
     message.

   o Core #Z: Maximum of 5 core addresses may be specified for any
     one group. An implementation is not expected to utilize more
     than, say, 3.

     NOTE: It was an engineering design decision to have a fixed
     maximum number of core addresses, to avoid a variable-sized
     packet.

   o Resource Reservation fields: these fields (T.B.D.) are used
     to reserve resources as part of the CBT tree set up
     procedure.

   o Security fields: these fields (T.B.D.) ensure the
     authenticity and integrity of the received packet.

8.3. Primary Maintenance Message Types

There are six types of CBT primary maintenance message, namely:

   o JOIN-REQUEST: invoked by an end-system, generated and sent
     (unicast) by a CBT router to the specified core address. It
     is processed hop-by-hop on its way to the specified core. Its
     purpose is to establish the sending CBT router, and all
     intermediate CBT routers, as part of the corresponding
     delivery tree.

   o JOIN-ACK: an acknowledgement to the above.
     The full list of core addresses is carried in a JOIN-ACK,
     together with the actual core affiliation (the join may have
     been terminated by an on-tree router on its journey to the
     specified core, and the terminating router may or may not be
     affiliated to the core specified in the original join). A
     JOIN-ACK traverses the same path as the corresponding
     JOIN-REQUEST, and it is the receipt of a JOIN-ACK that
     actually creates a tree branch.

   o JOIN-NACK: a negative acknowledgement, indicating that the
     tree join process has not been successful.

   o QUIT-REQUEST: a request, sent from a child to a parent, to be
     removed as a child to that parent.

   o QUIT-ACK: acknowledgement to the above. If the parent, or the
     path to it is down, no acknowledgement will be received
     within the timeout period. This results in the child
     nevertheless removing its parent information.

   o FLUSH-TREE: a message sent from parent to all children, which
     traverses a complete branch. This message results in all tree
     interface information being removed from each router on the
     branch, possibly because of a re-configuration scenario.

The JOIN-REQUEST has three valid sub-codes, namely JOIN-ACTIVE,
RE-JOIN-ACTIVE, and RE-JOIN-NACTIVE.

A JOIN-ACTIVE is sent from a CBT router that has no children for the
specified group.

A RE-JOIN-ACTIVE is sent from a CBT router that has at least one
child for the specified group.

A RE-JOIN-NACTIVE originally started out as an active re-join, but
has reached an on-tree router for the corresponding group. At this
point, the router changes the join status to non-active re-join and
forwards it on its parent branch, as does each CBT router that
receives it. Should the router that originated the active re-join
subsequently receive the non-active re-join, it must immediately send
a QUIT-REQUEST to its parent router. It then attempts to re-join
again.
In this way the re-join acts as a loop-detection packet.

8.4. Auxiliary Maintenance Message Types

There are eleven CBT auxiliary maintenance message types:

   o CBT-DR-SOLICITATION: a request sent from a host to the CBT
     ``all-routers'' multicast address, for the address of the
     best next-hop CBT router on the LAN to the core as specified
     in the solicitation.

   o CBT-DR-ADVERTISEMENT: a reply to the above. Advertisements
     are addressed to the ``all-systems'' multicast group.

   o CBT-CORE-NOTIFICATION: unicast from a group initiating host
     to each core selected for the group, this message notifies
     each core of the identities of each of the other core(s) for
     the group, together with their core ranking. The receipt of
     this message invokes the building of the core tree by all
     cores other than the highest-ranked (primary core).

   o CBT-CORE-NOTIFICATION-ACK: a notification of acceptance to
     becoming a core for a group, to the corresponding end-system.

   o CBT-ECHO-REQUEST: once a tree branch is established, this
     message acts as a ``keepalive'', and is unicast from child
     to parent.

   o CBT-ECHO-REPLY: positive reply to the above.

   o CBT-CORE-PING: unicast from a CBT router to a core when a
     tree router's parent has failed. The purpose of this message
     is to establish core reachability before sending a
     JOIN-REQUEST to it.

   o CBT-PING-REPLY: positive reply to the above.

   o CBT-TAG-REPORT: unicast from an end-system to the designated
     router for the corresponding group, subsequent to the
     end-system receiving a designated router advertisement (as
     well as a core notification reply if group-initiating host).
     This message invokes the sending of a JOIN-REQUEST if the
     receiving router is not already part of the corresponding
     tree.

   o CBT-HOST_JOIN_ACK: group-specific multicast by a CBT router
     that originated a JOIN-REQUEST on behalf of some end-system
     on the same LAN (subnet). The purpose of this message is to
     notify end-systems on the LAN belonging to the specified
     group of such things as: success in joining the delivery
     tree; actual core affiliation.

   o CBT-DR-ADV-NOTIFICATION: multicast to the CBT ``all-routers''
     address, this message is sent subsequent to receiving a
     CBT-DR-SOLICITATION, but prior to any CBT-DR-ADVERTISEMENT
     being sent. It acts as a tie-breaking mechanism should more
     than one router on the subnet think itself the best next-hop
     to the addressed core. It also prompts an already established
     DR to announce itself as such if it has not already done so in
     response to a CBT-DR-SOLICITATION.

9. Interoperability Issues

One of the design goals of CBT is for it to fully interwork with
other IP multicast schemes. We have already described how CBT-style
packets are transformed into IP-style multicasts, and vice-versa.

In order for CBT to fully interwork with other schemes, it is
necessary to define the interface(s) between a ``CBT cloud'' and the
cloud of another scheme. The CBT authors are currently working out
the details of the ``CBT-other'' interface, and therefore we omit
further discussion of this topic at the present time.

10. CBT Security Architecture

See current I-D: draft-ietf-idmr-mkd-02.txt

Acknowledgements

Special thanks go to Paul Francis, NTT Japan, for the original
brainstorming sessions that brought about this work.

Thanks also to the team at Bay Networks for their comments and
suggestions, in particular Steve Ostrowski for his suggestion of
using "native mode" as a router optimization, Eric Crawley, Scott
Reeve, and Nitin Jain.

I would also like to thank the participants of the IETF IDMR working
group meetings for their general constructive comments and
suggestions since the inception of CBT.

Author's Address:

   Tony Ballardie,
   Department of Computer Science,
   University College London,
   Gower Street,
   London, WC1E 6BT,
   ENGLAND, U.K.

   Tel: ++44 (0)71 419 3462
   e-mail: A.Ballardie@cs.ucl.ac.uk

   Nitin Jain,
   Bay Networks, Inc.
   3 Federal Street,
   Billerica, MA 01821,
   USA.

   Tel: ++1 508 670 8888
   e-mail: njain@BayNetworks.com

   Scott Reeve,
   Bay Networks, Inc.
   3 Federal Street,
   Billerica, MA 01821,
   USA.

   Tel: ++1 508 670 8888
   e-mail: sreeve@BayNetworks.com

References

[1] DVMRP. Described in "Multicast Routing in a Datagram
    Internetwork", S. Deering, PhD Thesis, 1990. Available via
    anonymous ftp from: gregorio.stanford.edu:vmtp/sd-thesis.ps.

[2] J. Moy. Multicast Routing Extensions to OSPF. Communications of
    the ACM, 37(8): 61-66, August 1994.

[3] D. Farinacci, S. Deering, D. Estrin, and V. Jacobson. Protocol
    Independent Multicast (PIM) Dense-Mode Specification
    (draft-ietf-idmr-pim-spec-01.ps). Working draft, 1994.

[4] A. J. Ballardie. Scalable Multicast Key Distribution
    (draft-ietf-idmr-mkd-02.txt). Working draft, 1995.

[5] A. J. Ballardie. "A New Approach to Multicast Communication in a
    Datagram Internetwork", PhD Thesis, 1995. Available via anonymous
    ftp from: cs.ucl.ac.uk:darpa/IDMR/ballardie-thesis.ps.Z.