draft-ietf-idmr-cbt-spec-03.txt

Inter-Domain Multicast Routing (IDMR)                    A. J. Ballardie
INTERNET-DRAFT                                 University College London

                                                     November 21st, 1995

                    Core Based Trees (CBT) Multicast

                      -- Protocol Specification --

Status of this Memo

This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute working
documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months.
Internet Drafts may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet Drafts as
reference material or to cite them other than as a "working draft" or
"work in progress."

Please check the I-D abstract listing contained in each Internet Draft
directory to learn the current status of this or any other Internet
Draft.

Abstract

This document specifies the Core Based Tree (CBT) multicast protocol.
CBT is a next-generation multicast protocol that makes use of a shared
delivery tree rather than the separate per-sender trees used by most
other multicast schemes [1, 2, 3].

This specification includes a description of an optimization whereby
native IP-style multicasts are forwarded over tree branches as well as
subnetworks with group member presence. This mode of operation is
called CBT "native mode" and obviates the need to encapsulate data
packets before forwarding over CBT interfaces. Native mode is only
relevant to CBT-only domains or "clouds". Also included are some new
"data-driven" features.

A special authors' note is included explaining the primary differences
between this latest specification and the previous release (June
1995).

The CBT architecture is described in an accompanying document:
draft-ietf-idmr-arch-00.txt. Other related documents include [4, 5].
For all IDMR-related documents, see http://www.cs.ucl.ac.uk/ietf/idmr.

1. Authors' Note

The purpose of this note is to explain how the CBT protocol has evolved
since the previous version (June 1995).

The CBT designers have continually sought to streamline the protocol
and to find new mechanisms that simplify the group initiation
procedure.
In particular, it has been a high priority to ensure that the group
joining process is as transparent as possible for new receivers;
ideally, from a user perspective, only a minimum of information should
be required in order to join a CBT group -- the knowledge/input of two
group parameters, group address and TTL value, is a reasonable
expectation. At the same time, we strive to keep join latency to an
absolute minimum.

The factor most affecting join latency in CBT is the mechanism by which
each group on a LAN elects a so-called designated router (DR). This
mechanism has been redesigned; it is now simpler and keeps join latency
to a minimum. The new DR election process is explained in section 2.3.

Core selection, placement, and management have prevented a simple group
initiation/joining process of the kind inherent in data-driven schemes
(like DVMRP); some network entity needs to elect a group's cores, and a
mechanism is needed to distribute this information throughout the
network so it is available to potential new receivers.

CBT separates out most aspects of core management from the protocol
itself. This has been made easier because core management is not a
problem unique to CBT; it is shared by PIM-Sparse Mode. Separate,
protocol-independent core management mechanisms are currently being
proposed/developed [8, 9]. In the absence of a core
management/distribution protocol, the task could be handled manually by
network management facilities.

In CBT, the core routers for a particular group are categorised into
PRIMARY CORE, and NON-PRIMARY (secondary) CORES.

The core tree, the part of a tree linking all core routers together, is
built on-demand. That is, the core tree is only built subsequent to a
non-primary core receiving a join-request (non-primary core routers
join the primary core router -- the primary need do nothing).
Join-requests carry an ordered list of core routers, making it possible
for the non-primary cores to know where to join.

CBT now supports the aggregation of certain types of control message on
distribution trees, provided aggregation is at all possible. This
depends on coordinated multicast address assignment.

Also catalytic in the simplification of the CBT protocol are the
"multi-protocol support" aspects of the latest proposal of IGMP (IGMPv3
[6]), in particular, the introduction of the RP/Core-Report message
(see Appendix and [6]).

The end result of these developments is that the CBT protocol is
further simplified and more efficient; six message types have been
eliminated from the previous version of the protocol, thereby reducing
protocol overhead. Furthermore, the new DR election mechanism ensures
group join latency is kept to a minimum.

Throughout this draft, we assume IGMPv3 is operating between hosts and
routers on a LAN.

2. Protocol Specification

2.1. CBT Group Initiation

A group's initiator elects a small number of candidate cores (which may
be advertised by "some means"). Subsequently, the core distribution
engine (if available) is notified of the new group now associated with
the elected cores. Subsequent network advertisements provide the
mapping information for potential new senders and/or receivers.

2.2. Tree Joining Process -- Overview

It is assumed that hosts receive mapping advertisements via some
protocol external to CBT. Given this assumption, the following steps
are involved in a host joining a CBT tree:

   o  the joining host learns of the candidate cores for the group.

   o  subsequently, an IGMP RP/Core-Report is issued on the subnetwork,
      addressed to the corresponding multicast group.

      All IGMP messages are received by all operational CBT multicast
      routers on the subnetwork. One CBT-capable router per subnetwork
      is initially elected as the default LAN CBT DR (DEFAULT DR) for
      all groups. This election happens automatically when CBT routers
      are initialised. If the subnetwork has multiple CBT routers
      present, a (possibly different) group-specific DR (GROUP DR) may
      subsequently be elected. This is fully explained in section 2.3.

   o  on receiving an IGMP RP/Core-Report, the local DR takes care of
      establishing the subnet as part of the corresponding CBT delivery
      tree.

The following CBT control messages come into play during the host
joining process:

   o  JOIN_REQUEST

   o  JOIN_ACK

A join-request is generated by a locally-elected DR (see next section)
in response to receiving an IGMP group membership report from a
directly connected host. The join is sent to the next-hop on the path
to the target core, as specified in the join packet. The join is
processed by each such hop on the path to the core, until either the
join reaches the target core itself, or hits a router that is already
part of the corresponding distribution tree (as identified by the group
address). In both cases, the router concerned terminates the join, and
responds with a join-ack, which traverses the reverse-path of the
corresponding join. This is possible due to the transient path state
created by a join traversing a CBT router. The ack simply fixes that
state.

2.3. DR Election

Multiple CBT routers may be connected to a multi-access subnetwork. In
such cases it is necessary to elect a (sub)network designated router
(DR) that is responsible for sending IGMP host membership queries, and
for generating join-requests in response to receiving IGMP group
membership reports. Such joins are forwarded upstream by the DR.
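
As a rough illustration of the election rule this section goes on to
describe (the IGMP querier doubles as the D-DR; if the querier is not
CBT-capable, the lowest-addressed CBT router on the link takes the
role), the decision can be sketched as follows. This is only a sketch:
the function name and arguments are hypothetical and not part of the
protocol, and the set of CBT neighbours is assumed to be known already.

```python
from ipaddress import IPv4Address


def elect_default_dr(querier, cbt_routers):
    """Pick the default DR (D-DR) for one subnet.

    querier     -- address of the subnet's elected IGMP querier
    cbt_routers -- addresses of all CBT-capable routers on the link,
                   including the local router (hypothetical input)
    Returns the D-DR address as a string, or None if no CBT router
    is present.
    """
    routers = {IPv4Address(a) for a in cbt_routers}
    if not routers:
        return None
    q = IPv4Address(querier)
    if q in routers:
        # Normal case: querier and D-DR roles go hand-in-hand.
        return str(q)
    # Querier is a non-CBT router: lowest-addressed CBT router wins.
    return str(min(routers))
```

Addresses are compared numerically rather than as strings, so that
10.0.0.9 ranks lower than 10.0.0.10.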

At start-up, a CBT router assumes it is the only CBT-capable router on
its subnetwork. It therefore sends two or three
IGMP-HOST-MEMBERSHIP-QUERYs in short succession (for robustness) in
order to quickly learn about any group memberships on the subnet. If
other CBT routers are present on the same subnet, they will receive
these IGMP queries, and the router that was already the elected querier
yields querier duty to the new router iff the new router is
lower-addressed. If it is not, then the newly-started CBT router will
yield when it hears a query from the already established querier.

The CBT DEFAULT DR (D-DR) is always the subnet's IGMP-querier (with the
exception described in the next paragraph); in CBT these two roles go
hand-in-hand. As a result, there is no protocol overhead whatsoever
associated with electing the CBT D-DR.

On multi-access LANs where different routers may be running different
multicast routing protocols, there may be times when a LAN's (subnet's)
elected querier is a non-CBT router. CBT routers keep track of their
immediate CBT neighbouring routers, and can therefore easily establish
if the source of an IGMP query is CBT-capable or not. If an elected
querier is not CBT-capable, the DR is (implicitly) elected to be the
lowest-addressed neighbour on the same link; if a CBT router on such a
link knows of a lower-addressed neighbour on the same link, it either
does not attempt to claim DR status, or relinquishes its DR status if
it was previously elected DR.

2.4. Backwards Compatibility with IGMPv1 & v2 Hosts

To comply with this specification, CBT routers are expected to run IGMP
version 3 [7]. However, it cannot be assumed that all hosts on a
subnetwork will be running IGMPv3; there may be instances of IGMP
versions 1 and/or 2.

IGMPv1 & v2 hosts will not be able to issue RP/Core Reports, available
with IGMPv3. The primary implication is that such hosts must inform a
D-DR of mappings by means of network management. Alternatively, hosts
may implement minimal user-level code to emulate IGMPv3-specific
messages, and send them as CBT auxiliary control messages to the
specified group address.

   NOTE: one recent core distribution proposal [8] does not require
   hosts to participate in core election at all. Rather, a local DR is
   configured to know a set of core addresses in the lowest level of a
   core hierarchy, and a function is used to map a group address onto a
   particular core in the hierarchy.

2.5. Tree Joining Process -- Details

The receipt of an IGMP group membership report by a CBT D-DR for a CBT
group not previously heard from triggers the tree joining process.

Immediately subsequent to receiving an IGMP group membership report for
a CBT group not previously heard from, the D-DR unicasts a JOIN-REQUEST
to the first hop on the (unicast) path to the specified core. Core
information is gleaned either by means of an IGMP RP/Core Report, also
sent in response to an IGMP host membership query, but prior to an IGMP
host membership report, or by some other means.

Each CBT-capable router traversed on the path between the sending DR
and the core processes the join. However, if a join hits a CBT router
that is already on-tree, the join is not propagated further, but ACK'd
from that point.

JOIN-REQUESTs carry the identity of all cores for the group.
Assuming there are no on-tree routers in between, once the join
(subcode ACTIVE_JOIN) reaches the target core, if the target core is
not the primary core (the first listed in the core listing, contained
within the join) it first acknowledges the received join by means of a
JOIN-ACK, then sends a JOIN-REQUEST, subcode REJOIN-ACTIVE, to the
primary core router. Either the primary core, or the first on-tree
router encountered, acknowledges the received rejoin by means of a
JOIN-ACK. Any such router other than the primary core proceeds by
transforming the rejoin into a REJOIN-NACTIVE for loop detection. This
is described in section 6.3.

To facilitate detailed protocol description, we use a sample topology,
illustrated in Figure 1. Member hosts are shown as individual capital
letters, routers are prefixed with R, and subnets are prefixed with S.

   [Figure 1: ASCII diagram of the example topology; the layout is not
   recoverable here. It shows member hosts A through J, routers R1
   through R10 and R12, and subnets S1 through S10 and S12 through
   S15.]

                 Figure 1. Example Network Topology

Taking the example topology in figure 1, host A is the group initiator,
and has elected core routers R4 (primary core) and R9 (secondary core)
by some external protocol. The mapping is subsequently advertised by
some (possibly same) protocol.

Host A generates an IGMP RP/Core-Report and an IGMP group membership
report when the multicast application is invoked on host A. Both
reports are multicast to the corresponding group address. All multicast
routers receive all multicast-addressed messages by default. The only
CBT router on A's subnet (S1) is R1, which is, by default, the D-DR.

Router R1 receives the RP/Core-Report and the group membership report,
and proceeds to unicast a JOIN-REQUEST, subcode ACTIVE-JOIN, to the
next-hop on the path to R4 (R3), the target core in the RP/Core Report.
R3 receives the join, caches the necessary group information, and
forwards it to R4 -- the target of the join.

R4, being the target of the join, sends a JOIN_ACK back out of the
receiving interface to the previous-hop sender of the join, R3. A
JOIN-ACK, like a JOIN-REQUEST, is processed hop-by-hop by each router
on the reverse-path of the corresponding join. The receipt of a
join-ack establishes the receiving router on the corresponding CBT
tree, i.e. the router becomes part of a branch on the delivery tree. R3
sends a join-ack to R2, which sends a join-ack to R1. A new CBT branch
has been created, attaching subnet S1 to the CBT delivery tree for the
corresponding group.

At this point, it is proposed that IGMP (v3) group multicasts a
notification across the subnet indicating to member hosts that the
delivery tree has been joined successfully. Such a message would
greatly benefit multicast protocols requiring explicit joins [5, 10].
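
The hop-by-hop behaviour described above (a join leaves transient state
in each router it traverses and terminates at the target core or at the
first on-tree router; the ack retraces that state in reverse and fixes
it) can be sketched as a small simulation. This is a simplified model
under stated assumptions: the function name, the message tuples, and
the router names in the usage note are hypothetical, and subcodes,
timers, and pending-join caching are left out.

```python
def send_join(path, on_tree, core):
    """Simulate one join and its acknowledgement.

    path    -- router names from the originating DR toward the core
    on_tree -- set of routers already on the tree (updated in place)
    core    -- name of the target core
    Returns the list of (message, sender, receiver) tuples exchanged.
    """
    log = []
    transient = []      # routers holding transient join state, in order
    terminator = None
    for here, nxt in zip(path, path[1:]):
        transient.append(here)
        log.append(("JOIN_REQUEST", here, nxt))
        if nxt == core or nxt in on_tree:
            terminator = nxt    # this router terminates the join
            break
    if terminator is None:
        return log              # no core or on-tree router was reached
    sender = terminator
    for receiver in reversed(transient):
        # The ack follows the transient state, not the routing table.
        log.append(("JOIN_ACK", sender, receiver))
        on_tree.add(receiver)   # receipt of the ack fixes the state
        sender = receiver
    on_tree.add(terminator)
    return log
```

For example, with an initially empty tree, a join along the assumed
path DR, RA, CORE reaches the core and is acknowledged back through RA;
a later join from a second DR whose first hop is RA is then terminated
and acknowledged by RA itself.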

Between forwarding (or originating) a JOIN_REQUEST and receiving the
corresponding JOIN_ACK, a CBT-capable router is not permitted to
acknowledge any subsequent joins received for the same group; rather,
the router caches such joins until it has itself received a JOIN_ACK
for the original join. Only then can it acknowledge any cached joins. A
router is said to be in a pending-join state if it is awaiting a
JOIN_ACK itself.

2.6. D-DRs, G-DRs, and Proxy-acks

The DR election mechanism does not guarantee that the DR will be the
router that actually forwards a join off a multi-access network; the
first hop on the path to a particular core might be via another router
on the same (sub)network, which actually forwards off-LAN. It is not
necessary or desirable to have a tree branch rooted anywhere other than
at a router that is the interface to and from the LAN; only this router
need keep group state information, the join originator (D-DR) need not
since the first hop is on the same LAN. Because of this, CBT
incorporates a simple mechanism that prevents the D-DR in such
scenarios from keeping group state.

If a join-ack has returned to the originating subnet of the
corresponding join, but has not yet reached the originating router of
the corresponding join, the join-request's first hop is evidently on
the same subnet as the originating router (the D-DR). A router knows
when it is in this situation by extracting the origin router's subnet
address using its own subnet mask, then comparing the result with its
own address (using address and mask of the subnet that is about to be
forwarded over). If one further hop is required for the join-ack to
reach the originator of the corresponding join-request, the router does
not send a normal join-ack, but rather sends a JOIN-ACK with subcode
PROXY-ACK.
Proxy-acks, like normal join-acks, are unicast.

A router receiving a proxy-ack cancels any transient state it has
created for the corresponding group. The sender of a proxy-ack becomes
the group-specific DR (G-DR) for the group - a token (implicit)
identity. In the normal case where there is no LAN extra hop, the
receipt of a JOIN-ACK means that the D-DR becomes the G-DR for the
specified group.

Control packets may continue to incur an extra hop if they are
generated by the D-DR, but data packets will not; since only the sender
of the proxy-ack keeps a FIB entry for the group, it is the only router
on the LAN that has an upstream forwarding entry.

To illustrate, consider a host joining a CBT group (the first to do so
on its subnet) when more than one router is present on that subnet. B's
subnet, S4, has 3 CBT routers attached. Assume also that R6 has been
elected IGMP-querier and CBT D-DR.

The invoking of a multicast application on B causes an IGMP
RP/Core-Report and an IGMP group membership report to be multicast to
the corresponding group. The target core and ordered core list are
contained within the RP/Core report. R6 generates a join-request for
target core R4, subcode ACTIVE_JOIN. R6's routing table says the
next-hop on the path to R4 is R2, which is on the same subnet as R6.
This is irrelevant to R6, which unicasts the join to R2. R2 unicasts it
to R3, which happens to be already on-tree for the specified group
(from R1's join). R3 can therefore acknowledge the arrived join, and
unicasts a join-ack back to R2. R2 realises it is not the origin of the
corresponding join-request, but sees that the origin (R6) is on the
same subnet as itself, namely the subnet over which the join-ack would
be forwarded to reach R6. R2 unicasts the join-ack on its final hop,
but sets the ack subcode to PROXY-ACK.
This results in the D-DR (R6) removing its pending join information for
the specified group. Another consequence of receiving a proxy-ack is
that the D-DR need not create a FIB entry for the specified group.

If an IGMP RP/Core-Report is received by a D-DR with a join for the
same group already pending, it takes no action.

Note that the presence of underlying transient asymmetric routes is
irrelevant to the tree-building process; CBT tree branches are
symmetric by the nature in which they are built. Joins set up transient
state (incoming and outgoing interface state) in all routers along a
path to a particular core. The corresponding join-ack traverses the
reverse-path of the join as dictated by the transient state, and not
the path that underlying routing would dictate. Whilst permanent
asymmetric routes could pose a problem for CBT, transient asymmetricity
is detected by the CBT protocol.

2.7. Tree Teardown

There are two scenarios whereby a tree branch may be torn down:

   o  During a re-configuration. If a router's best next-hop to the
      specified core is one of its existing children, then before
      sending the join it must tear down that particular downstream
      branch. It does so by sending a FLUSH_TREE message which is
      processed hop-by-hop down the branch. All routers receiving this
      message must process it and forward it to all their children.
      Routers that have received a flush message will re-establish
      themselves on the delivery tree if they have directly connected
      subnets with group presence.

   o  If a CBT router has no children it periodically checks all its
      directly connected subnets for group member presence. If no
      member presence is ascertained on any of its subnets it sends a
      QUIT_REQUEST upstream to remove itself from the tree.
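
The second scenario amounts to a simple test that is repeated at each
router going upstream: a router may quit only if it has neither
children nor member presence on any directly connected subnet. A
minimal sketch follows, assuming the tree is held in two hypothetical
dictionaries (child to parent, and parent to set of children);
QUIT_ACKs and timers are not modelled.

```python
def quit_upstream(router, parent, children, member_subnets):
    """Send QUIT_REQUESTs upstream for as long as routers may leave.

    parent         -- dict: router -> its parent on the group's tree
    children       -- dict: router -> set of its children on the tree
    member_subnets -- dict: router -> set of directly connected
                      subnets with group member presence
    Returns the list of (sender, receiver) QUIT_REQUESTs; 'parent'
    and 'children' are updated in place.
    """
    quits = []
    while (router in parent
           and not children.get(router)
           and not member_subnets.get(router)):
        up = parent.pop(router)             # leave the tree
        quits.append((router, up))
        children.setdefault(up, set()).discard(router)
        router = up                         # the parent re-checks in turn
    return quits
```

With R1 and R2 both children of R3, R3 a child of R4, and member
presence only behind R1, a quit started at R2 removes the branch R3-R2
and stops there, because R3 still has R1 as a child.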

Let's see, using the example topology of figure 1, how a tree branch is
gracefully torn down using a QUIT_REQUEST.

Assume group member B leaves group G on subnet S4. B issues an IGMP
HOST-MEMBERSHIP-LEAVE message which is multicast to the "all-routers"
group (224.0.0.2). R6, the subnet's D-DR and IGMP-querier, responds
with a group-specific-QUERY. No hosts respond within the required
response interval, so the D-DR assumes group G traffic is no longer
wanted on subnet S4.

Since R2 has no CBT children, and no other directly attached subnets
with group G presence, it immediately follows on by sending a
QUIT_REQUEST to R3, its parent on the tree for group G. R3 responds by
unicasting a QUIT_ACK to R2. R3 subsequently checks whether it in turn
can send a quit by checking group G presence on its directly attached
subnets, and any group G children. It has the latter (R1 is its child
on the group G tree), and so R3 cannot itself send a quit. However, the
branch R3-R2 has been removed from the tree.

3. CBT Protocol Ports

CBT routers implement user-level code for tree building, maintenance,
and teardown. This results in a group-specific forwarding information
base (FIB) being built in user-space. This FIB is downloaded into
kernel-space for fast and efficient data packet forwarding. Any changes
in FIB entries are communicated to the kernel as they occur, so that
the kernel FIB always reflects the current state of any particular
group's tree.

CBT primary and auxiliary control packets then travel inside UDP
datagrams, as the following diagram illustrates:

   ++++++++++++++++++++++++++++++++++++++++++++
   | IP header | UDP header | CBT control pkt |
   ++++++++++++++++++++++++++++++++++++++++++++

          Figure 2. Encapsulation for CBT control messages

The following UDP port numbers are currently being used (their use at
this stage is unofficial, and pending official approval):

   o  CBT Primary control messages - UDP port 7777

   o  CBT Auxiliary control messages - UDP port 7778

4. Data Packet Forwarding (native mode)

In CBT "native mode" only one forwarding method is used, namely all
data packets are forwarded over CBT tree interfaces as native IP
multicasts, i.e. there are no encapsulations required. This assumes
that CBT is the multicast routing protocol in operation within the
domain (or "cloud") in question, and that all routers within the domain
of operation are CBT-capable, i.e. there are no "tunnels". If this
latter constraint cannot be satisfied it is necessary to encapsulate
IP-over-IP before forwarding to a child or parent reachable via
non-CBT-capable router(s).

The rules for native mode forwarding are altogether simpler than those
for CBT-mode forwarding (see next section); data packets are sent over
child/parent interfaces as specified in the corresponding FIB entry, as
native IP multicasts. This applies to point-to-point links as well as
broadcast-type subnetworks such as Ethernets.

5. Data Packet Forwarding (CBT mode)

"CBT mode" as opposed to "native mode" describes the forwarding of data
packets over CBT tree interfaces containing a CBT header encapsulation.
For efficiency, this encapsulation is as follows:

   ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | encaps IP hdr | CBT hdr | original IP hdr | data ....|
   ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

                Figure 3. Encapsulation for CBT mode

By using the encapsulations above there is no necessity to modify a
packet's original IP header until it is forwarded over subnets with
group member presence in native mode.
When this happens, the TTL 514 value of the original IP header is set to one before forwarding. 516 The TTL value of the CBT header is set by the encapsulating CBT 517 router directly attached to the origin of a data packet. This value 518 is decremented each time it is processed by a CBT router. An encap- 519 sulated data packet is discarded when the CBT header TTL value 520 reaches zero. 522 The purpose of the (outer) encapsulating IP header is to "tunnel" 523 data packets between CBT-capable routers (or "islands"). The outer IP 524 header's TTL value is set to the "length" of the corresponding tun- 525 nel, or MAX_TTL if this is not known, or subject to change. 527 For native mode IP multicasts, i.e. those without any extra encapsu- 528 lation, the TTL value of the IP header is decremented each time the 529 packet is received by a multicast router. 531 It is worth pointing out at this point the distinction between sub- 532 networks and tree branches, although they can be one and the same. 533 For example, a multi-access subnetwork containing routers and end- 534 systems could potentially be both a CBT tree branch and a subnetwork 535 with group member presence. A tree branch which is not simultaneously 536 a subnetwork is either a "tunnel" or a point-to-point link. 538 In CBT forwarding mode there are three forwarding methods used by CBT 539 routers: 541 o+ IP multicasting. This method is used to send a data packet 542 across a directly-connected subnetwork with group member pres- 543 ence. System host changes are not required for CBT. Similarly, 544 end-systems originating multicast data do so in traditional IP- 545 style. 547 o+ CBT unicasting. This method is used for sending data packets 548 encapsulated (as illustrated above) across a tunnel or point- 549 to-point link. En/de-capsulation takes place in CBT routers. 551 o+ CBT multicasting. 
   This method sends data packets encapsulated
   (as illustrated above) but the outer encapsulating IP header
   contains a multicast address. This method is used when a parent
   or multiple children are reachable over a single physical
   interface, as could be the case on a multi-access Ethernet. The
   IP module of end-systems subscribed to the same group will
   discard these multicasts since the CBT payload type (protocol id)
   of the outer IP header is not recognizable by hosts.

CBT routers create Forwarding Information Base (FIB) entries whenever
they send or receive a JOIN_ACK (with the exception of a proxy-ack,
as explained in section 2.5). The FIB describes the parent-child
relationships on a per-group basis. A FIB entry dictates over which
tree interfaces, and how (unicast or multicast) a data packet is to
be sent. Additionally, a data packet is IP multicast over any
directly-connected subnetworks with group member presence. Such
interfaces are kept in a separate table relating to IGMP. A FIB entry
is shown below:

     32-bits           4            4           4         4     |    4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   group-id    | parent addr | parent vif | No. of  |                  |
|               |    index    |   index    |children |     children     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                     |chld addr |chld vif |
                                                     |  index   |  index  |
                                                     |+-+-+-+-+-+-+-+-+-+-+
                                                     |chld addr |chld vif |
                                                     |  index   |  index  |
                                                     |+-+-+-+-+-+-+-+-+-+-+
                                                     |chld addr |chld vif |
                                                     |  index   |  index  |
                                                     |+-+-+-+-+-+-+-+-+-+-+
                                                     |                    |
                                                     |        etc.        |
                                                     |+-+-+-+-+-+-+-+-+-+-+

                    Figure 4. CBT FIB entry

Note that a CBT FIB is required for both CBT-mode and native-mode
multicasting.

The field lengths shown above assume a maximum of 16 directly
connected neighbouring routers.
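
As an illustration of the FIB entry described above, the following
Python sketch models one entry and derives, for each tree interface,
whether a CBT-mode data packet would be CBT unicast or CBT multicast.
It is a minimal sketch and not part of the specification: the names
FibEntry, add_child and plan_tree_sends are hypothetical, and the
sketch assumes that a vif is CBT multicast whenever more than one
tree neighbour (parent or child) is reachable over it, which is only
one possible reading of the CBT multicasting rule.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

MAX_INDEX = 16  # 4-bit index fields allow the values 0..15

# A tree neighbour is stored as (address index, vif index), as in Figure 4.
Neighbour = Tuple[int, int]


def check_indices(neighbour: Neighbour) -> Neighbour:
    """Reject indices that would not fit in the 4-bit FIB fields."""
    addr_index, vif_index = neighbour
    if not (0 <= addr_index < MAX_INDEX and 0 <= vif_index < MAX_INDEX):
        raise ValueError("index does not fit in a 4-bit field")
    return neighbour


@dataclass
class FibEntry:
    """Hypothetical in-memory form of one per-group FIB entry."""
    group_id: str
    parent: Optional[Neighbour] = None  # None at the primary core
    children: List[Neighbour] = field(default_factory=list)

    def add_child(self, addr_index: int, vif_index: int) -> None:
        self.children.append(check_indices((addr_index, vif_index)))


def plan_tree_sends(entry: FibEntry) -> Dict[int, str]:
    """Map each tree vif to the CBT-mode sending method assumed here."""
    neighbours = list(entry.children)
    if entry.parent is not None:
        neighbours.append(check_indices(entry.parent))
    per_vif = Counter(vif for _addr, vif in neighbours)
    # Assumption: several tree neighbours on one vif -> CBT multicast.
    return {
        vif: "cbt-multicast" if count > 1 else "cbt-unicast"
        for vif, count in per_vif.items()
    }
```

For an entry whose parent is reached over vif 1 and whose children
are reached over vifs 2, 2 and 3, plan_tree_sends reports CBT unicast
for vifs 1 and 3 and CBT multicast for vif 2.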

When a data packet arrives at a CBT router, the following rules
apply:

o  if the packet is an IP-style multicast, it is checked to see if
   it originated locally (i.e. if the arrival interface subnetmask
   bitwise ANDed with the packet's source IP address equals the
   arrival interface's subnet number, the packet was sourced
   locally). If the packet is not of local origin, it is discarded.

o  the packet is IP multicast to all directly connected subnets
   with group member presence. The packet is sent with an IP TTL
   value of 1 in this case.

o  the packet is encapsulated for CBT forwarding (see figure 3) and
   unicast to parent and children. However, if more than one child
   is reachable over the same interface the packet will be CBT
   multicast. Therefore, it is possible that an IP-style multicast
   and a CBT multicast will be forwarded over a particular
   subnetwork.

NOTE: the TTL value of encapsulated data packets is manipulated as
described at the beginning of this section.

Using our example topology in figure 1, let's assume member G
originates an IP multicast packet. R8 is the DR for subnet S10. R8
CBT unicasts the packet to each of its children, R9 and R12. These
children are not reachable over the same interface. R8, being the DR
for subnets S14 and S10, also IP multicasts the packet to S14 (S10
received the IP style packet already from the originator). R9, the DR
for S12, need not IP multicast onto S12 since there are no members
present there. R9 CBT unicasts the packet to R10, which is the DR for
S13 and S15. It IP multicasts to both S13 and S15.

Going upstream from R8, R8 CBT unicasts to R4. It is DR for all
directly connected subnets and therefore IP multicasts the data
packet onto S5, S6 and S7, all of which have member presence.
R4 unicasts the packet to all outgoing children, R3 and R7 (NOTE: R4
does not have a parent since it is the primary core router for the
group). R7 IP multicasts onto S9. R3 CBT unicasts to R1 and R2, its
children. Finally, R1 IP multicasts onto S1 and S3, and R2 IP
multicasts onto S4.

5.1.  Non-Member Sending (CBT mode)

For a multicast data packet to span beyond the scope of the
originating subnetwork at least one CBT-capable router must be
present on that subnetwork. The default DR (D-DR) for the group on
the subnetwork must encapsulate the IP-style packet and unicast it to
a core for the group. This requires CBT routers to have access to a
mapping mechanism between group addresses and core routers. This
mechanism is currently beyond the scope of this document.

Alternatively, hosts could perform the CBT encapsulation themselves,
but this would require hosts to run a core discovery protocol. Host
modifications required for such a protocol, and the subsequent data
packet encapsulation, are considered extremely undesirable, and are
therefore not considered further.

5.2.  Eliminating the Topology-Discovery Protocol in the Presence of
      Tunnels

Traditionally, multicast protocols operating within a virtual
topology, i.e. an overlay of the physical topology, have required the
assistance of a multicast topology discovery protocol, such as that
present in DVMRP. However, it is possible to have a multicast
protocol operate within a virtual topology without the need for a
multicast topology discovery protocol. One way to achieve this is by
having a router configure all its tunnels to its virtual neighbours
in advance. A tunnel is identified by a local interface address and a
remote interface address.
Routing is replaced by "ranking" each such tunnel interface
associated with a particular core address; if the highest-ranked
route is unavailable (tunnel end-points are required to run an
Hello-like protocol between themselves) then the next-highest ranked
available route is selected, and so on.

CBT trees are built using the same join/join-ack mechanisms as
before, only now some branches of a delivery tree run in native mode,
whilst others (tunnels) run in CBT mode. Underlying unicast routing
dictates which interface a packet should be forwarded over. Each
interface is configured as either native mode or CBT mode, so a
packet can be encapsulated (decapsulated) accordingly.

As an example, router R's configuration would be as follows:

intf    type     mode      remote addr
---------------------------------------
 #1     phys     native    -
 #2     tunnel   cbt       128.16.8.117
 #3     phys     native    -
 #4     tunnel   cbt       128.16.6.8
 #5     tunnel   cbt       128.96.41.1

core    backup-intfs
--------------------
 A      #5, #2
 B      #3, #5
 C      #2, #4

The CBT FIB needs to be slightly modified to accommodate an extra
field, "backup-intfs" (backup interfaces). The entry in this field
specifies a backup interface whenever a tunnel interface specified in
the FIB is down. Additional backups (should the first-listed backup
be down) are specified for each core in the core backup table. For
example, if interface (tunnel) #2 were down, and the target core of a
CBT control packet were core A, the core backup table suggests using
interface #5 as a replacement. If interface #5 happened to be down
also, then the same table recommends interface #2 as a backup for
core A.

5.3.  Non-Member Sending (native mode)

For a multicast data packet to span beyond the scope of the
originating subnetwork at least one CBT-capable router must be
present on that subnetwork.
The default DR (D-DR) on the subnetwork must encapsulate (IP-over-IP)
the IP-style packet and unicast it to a core for the group. This
requires CBT routers to have access to a mapping mechanism between
group addresses and core routers. This mechanism is currently beyond
the scope of this document.

Again, host changes could obviate the need for a local router to
perform a mapping and an encapsulation, but this is not considered a
desirable option.

6.  Tree Maintenance

Once a tree branch has been created, i.e. a CBT router has received a
JOIN_ACK for a JOIN_REQUEST previously sent (forwarded), a child
router is required to monitor the status of its parent, and of the
link to that parent, at fixed intervals by means of a "keepalive"
mechanism operating between them. The "keepalive" mechanism is
implemented by means of two CBT control messages: CBT_ECHO_REQUEST
and CBT_ECHO_REPLY. Immediately after a parent/child relationship is
established, a child unicasts a CBT-ECHO-REQUEST to its parent, which
unicasts a CBT-ECHO-REPLY in response.

CBT echo requests and replies may be aggregated to conserve bandwidth
on links over which tree branches overlap. However, this is only
possible if group address assignment has been coordinated to
facilitate aggregation (see section 8.4).

For any CBT router, if its parent router, or path to the parent,
fails, the child is initially responsible for re-attaching itself,
and therefore all routers subordinate to it on the same branch, to
the tree.

6.1.  Router Failure

An on-tree router can detect a failure from the following two cases:

o  if a child stops receiving CBT_ECHO_REPLY messages. In this case
   the child realises that its parent has become unreachable and
   must therefore try to re-connect to the tree.
   The router on the tree immediately subordinate to the failed
   router arbitrarily elects a core from its list of cores for this
   group. The rejoining router then sends a JOIN_REQUEST (subcode
   ACTIVE_JOIN if it has no children attached, and subcode
   ACTIVE_REJOIN if at least one child is attached) to the best
   next-hop router on the path to the elected core. If no JOIN-ACK
   is received after the specified number of retransmissions, an
   alternate core is arbitrarily elected from the core list. The
   process is repeated until a JOIN-ACK is received, for at most
   RECONNECT-TIMEOUT seconds (90 secs is the recommended default).

o  if a parent stops receiving CBT_ECHO_REQUESTs from a child. In
   this case the parent simply removes the child interface from its
   FIB entry for the particular group.

6.2.  Router Re-Starts

There are two cases to consider here:

o  Core re-start. All JOIN-REQUESTs (all types) carry the
   identities (i.e. addresses) of each of the cores for a group. If
   a router is a core for a group, but has only recently re-started,
   it will not be aware that it is a core for any group(s). In such
   circumstances, a core only becomes aware that it is such by
   receiving a JOIN-REQUEST. Subsequent to a core learning its
   status in this way, if it is not the primary core it acknowledges
   the received join, then sends a JOIN_REQUEST (subcode
   ACTIVE_REJOIN) to the primary core. If the re-started router is
   the primary core, it need take no action, i.e. in all
   circumstances, the primary core simply waits to be joined by
   other routers.

o  Non-core re-start. In this case, the router can only join the
   tree again if a downstream router sends a JOIN_REQUEST through
   it, or it is elected DR for one of its directly attached subnets,
   and subsequently receives an IGMP RP/Core Report.

6.3.
Route Loops

Routing loops are only a concern when a router with at least one
child is attempting to re-join a CBT tree. In this case the
re-joining router sends a JOIN_REQUEST (subcode ACTIVE_REJOIN) to the
best next-hop on the path to the core. This join is forwarded as
normal until it reaches either the core, or a non-core router that is
already part of the tree. If the join reaches the specified core, the
join terminates there and is ACKd as normal. If, however, the join is
terminated by a non-core router, the ACTIVE_REJOIN is converted to a
NON_ACTIVE_REJOIN, keeping the origin as that specified in the
ACTIVE_REJOIN, and forwarded upstream. A JOIN_ACK is also sent
downstream to acknowledge the received join.

The NON_ACTIVE_REJOIN is a loop detection packet. All routers
receiving this must forward it over their parent interface. This
process continues until the NON_ACTIVE_REJOIN is received by the
primary core for the group, or the NON_ACTIVE_REJOIN is received by
the originator of the corresponding ACTIVE_REJOIN. A router will know
this since the "origin" field remains unchanged when a join is
converted from an ACTIVE_REJOIN to a NON_ACTIVE_REJOIN. In the former
case, the primary core acknowledges the NON_ACTIVE_REJOIN with
JOIN-ACK, subcode NACTIVE_REJOIN. This message is unicast directly to
the REJOIN_ACTIVE originator. In the latter case, the ACTIVE_REJOIN
originator immediately sends a QUIT_REQUEST to its newly-established
parent and the loop is broken.

o  Using figure 5 to demonstrate this, if R3 is attempting to
   re-join the tree (R1 is the core in figure 5) and R3 believes its
   best next-hop to R1 is R6, and R6 believes R5 is its best
   next-hop to R1, which sees R4 as its best next-hop to R1 -- a
   loop is formed. R3 begins by sending a JOIN_REQUEST (subcode
   ACTIVE_REJOIN, since R4 is its child) to R6.
   R6 forwards the join to R5. R5 is on-tree for the group, so
   changes the join subcode to NON_ACTIVE_REJOIN, and forwards this
   to its parent, R4. R4 forwards the NON_ACTIVE_REJOIN to R3, its
   parent. R3 originated the corresponding ACTIVE_REJOIN, and so it
   immediately sends a QUIT_REQUEST to R6, which in turn sends a
   quit if it has not received an ACK from R5 already AND has itself
   a child or subnets with member presence. If so it does not send a
   quit -- the loop has been broken by R3 sending the first quit.

QUIT_REQUESTs are typically acknowledged by means of a QUIT_ACK, but
there might be cases where, due to failure, the parent cannot
respond. In this case the child nevertheless removes the parent
information after some small number (typically 3) of re-tries.

          ------
          | R1 |
          ------
            |
---------------------------
            |
          ------
          | R2 |
          ------
            |
---------------------------
            |                             |
          ------                          |
          | R3 |--------------------------|
          ------                          |
            |                             |
---------------------------               |
            |                             |       ------
          ------                          |       |    |
          | R4 |                          |-------| R6 |
          ------                          |       |----|
            |                             |
---------------------------               |
            |                             |
          ------                          |
          | R5 |--------------------------|
          ------                          |
                                          |

              Figure 5: Example Loop Topology

In the other scenario where no loop is actually formed, router R3
sends a join, subcode REJOIN_ACTIVE to R2, the next-hop on the path
to core R1. R2 forwards the re-join to R1, the primary core, which
unicasts a JOIN-ACK to the originator of the REJOIN_ACTIVE, i.e. the
join-ack remains invisible to R2.

7.  Data Packet Loops

The CBT protocol builds a loop-free distribution tree. If all routers
that comprise a particular tree function correctly, data packets
should never traverse a tree branch more than once.

CBT routers will only forward native-style data packets if they are
received over a valid on-tree interface.
A native-style data packet that is not received over such an
interface is discarded.

Encapsulated CBT data packets from a non-member sender can arrive via
an "off-tree" interface (this is how CBT-mode sends data across
tunnels, and how data from non-member senders in native-mode or
CBT-mode reaches a tree). The encapsulating CBT data packet header
includes an "on-tree" field, which contains the value 0x00 until the
data packet reaches an on-tree router. At this point, the router must
convert this value to 0xff to indicate the data packet is now
on-tree. This value remains unchanged, and from here on the packet
should traverse only on-tree interfaces. If an encapsulated packet
happens to "wander" off-tree and back on again, the latter on-tree
router will receive the CBT encapsulated packet via an off-tree
interface. However, this router will recognise that the "on-tree"
field of the encapsulating CBT header is set to 0xff, and so
immediately discards the packet.

8.  CBT Packet Formats and Message Types

CBT packets travel in IP datagrams. We distinguish between two types
of CBT packet: CBT data packets, and CBT control packets.

CBT data packets carry a CBT header when these packets are traversing
CBT tree branches. The encapsulation (for "CBT mode") is shown
below:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| encaps IP hdr | CBT hdr | original IP hdr | data ....|
++++++++++++++++++++++++++++++++++++++++++++++++++++++++

        Figure 6. Encapsulation for CBT mode

CBT control packets carry a CBT control header. All CBT control
messages are implemented over UDP. This makes sense for several
reasons: firstly, all the information required to build a CBT
delivery tree is kept in user space. Secondly, implementation is made
considerably easier.
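
The loop-prevention rules of section 7 can be sketched as follows.
This is a hedged illustration in Python rather than a definitive
implementation: the function names and the boolean parameters
arrived_on_tree_interface and router_is_on_tree are hypothetical, and
only the field values 0x00 and 0xff are taken from the text.

```python
from typing import Tuple

OFF_TREE = 0x00  # "on-tree" field value before the packet reaches the tree
ON_TREE = 0xFF   # value written by the first on-tree router


def check_native(arrived_on_tree_interface: bool) -> bool:
    """Native-style packets are kept only if received over a tree interface."""
    return arrived_on_tree_interface


def check_encapsulated(
    on_tree_field: int,
    arrived_on_tree_interface: bool,
    router_is_on_tree: bool,
) -> Tuple[bool, int]:
    """Return (keep, on-tree field value to forward with)."""
    if on_tree_field == ON_TREE and not arrived_on_tree_interface:
        # The packet was already on-tree and has wandered off and back.
        return False, on_tree_field
    if router_is_on_tree:
        # First on-tree router marks the packet; later ones leave 0xff as is.
        return True, ON_TREE
    # Off-tree router on the way to the core: field stays untouched.
    return True, on_tree_field
```

A packet from a non-member sender therefore carries 0x00 until its
first on-tree router, is forwarded with 0xff from then on, and is
dropped if it later arrives over an off-tree interface.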

CBT control messages fall into two categories: primary maintenance
messages, which are concerned with tree-building, re-configuration,
and teardown, and auxiliary maintenance messages, which are mainly
concerned with general tree maintenance.

8.1.  CBT Header Format

 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| vers  |unused |     type      |  hdr length   | on-tree|unused|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           checksum            |    IP TTL     |    unused     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       group identifier                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         core address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         packet origin                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        flow identifier                        |
|                            (T.B.D)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        security fields                        |
|                            (T.B.D)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 7. CBT Header

Each of the fields is described below:

o  Vers: Version number -- this release specifies version 1.

o  type: indicates whether the payload is data or control
   information.

o  hdr length: length of the header, for purpose of checksum
   calculation.

o  on-tree: indicates whether the packet is on-tree (0xff) or
   off-tree (0x00). Once this field is set (i.e. on-tree), it
   is non-changing.

o  checksum: the 16-bit one's complement of the one's complement
   sum of the CBT header, calculated across all fields.

o  IP TTL: TTL value gleaned from the IP header where the packet
   originated. It is decremented each time it traverses a CBT
   router.

o  group identifier: multicast group address.

o  core address: the unicast address of a core for the group.
   A core address is always inserted into the CBT header by an
   originating host, since at any instant, it does not know if
   the local DR for the group is on-tree. If it is not, the
   local DR must unicast the packet to the specified core.

o  packet origin: source address of the originating end-system.

o  flow-identifier: (T.B.D) value uniquely identifying a
   previously set up data stream.

o  security fields: these fields (T.B.D.) will ensure the
   authenticity and integrity of the received packet.

8.2.  Control Packet Header Format

The individual fields are described below. It should be noted that
only certain fields beyond "group identifier" are processed for the
different control messages.

 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| vers  |unused |     type      |     code      |    # cores    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          hdr length           |           checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       group identifier                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         packet origin                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      target core address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Core #1                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Core #2                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Core #3                            |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Resource Reservation fields                  |
|                            (T.B.D)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        security fields                        |
|                            (T.B.D)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 8.
CBT Control Packet Header

o  Vers: Version number -- this release specifies version 1.

o  type: indicates control message type (see sections 1.3, 1.4).

o  code: indicates subcode of control message type.

o  # cores: number of core addresses carried by this control
   packet.

o  header length: length of the header, for purpose of checksum
   calculation.

o  checksum: the 16-bit one's complement of the one's complement
   sum of the CBT control header, calculated across all fields.

o  group identifier: multicast group address.

o  packet origin: source address of the originating end-system.

o  target core address: desired/actual core affiliation of
   control message.

o  Core #Z: IP address of core #Z.

o  Resource Reservation fields: these fields (T.B.D.) are used
   to reserve resources as part of the CBT tree set up
   procedure.

o  Security fields: these fields (T.B.D.) ensure the
   authenticity and integrity of the received packet.

8.3.  Primary Maintenance Message Types

There are six types of CBT primary maintenance message. Primary
message subcodes are described in the next section.

o  JOIN-REQUEST (type 1): generated by a router and unicast to
   the specified core address. It is processed hop-by-hop on its
   way to the specified core. Its purpose is to establish the
   sending CBT router, and all intermediate CBT routers, as part
   of the corresponding delivery tree.

o  JOIN-ACK (type 2): an acknowledgement to the above. The full
   list of core addresses is carried in a JOIN-ACK, together
   with the actual core affiliation (the join may have been
   terminated by an on-tree router on its journey to the
   specified core, and the terminating router may or may not be
   affiliated to the core specified in the original join).
   A JOIN-ACK traverses the same path as the corresponding
   JOIN-REQUEST, with each CBT router on the path processing the
   ack. It is the receipt of a JOIN-ACK that actually creates a
   tree branch.

o  JOIN-NACK (type 3): a negative acknowledgement, indicating
   that the tree join process has not been successful.

o  QUIT-REQUEST (type 4): a request, sent from a child to a
   parent, to be removed as a child to that parent.

o  QUIT-ACK (type 5): acknowledgement to the above. If the
   parent, or the path to it is down, no acknowledgement will be
   received within the timeout period. This results in the
   child nevertheless removing its parent information.

o  FLUSH-TREE (type 6): a message sent from parent to all
   children, which traverses a complete branch. This message
   results in all tree interface information being removed from
   each router on the branch, possibly because of a
   re-configuration scenario.

8.3.1.  Primary Maintenance Message Subcodes

The JOIN-REQUEST has three valid subcodes:

o  ACTIVE-JOIN (code 0) - sent from a CBT router that has no
   children for the specified group.

o  REJOIN-ACTIVE (code 1) - sent from a CBT router that has at
   least one child for the specified group.

o  REJOIN-NACTIVE (code 2) - converted from a REJOIN-ACTIVE by
   the first on-tree router receiving a REJOIN-ACTIVE. This
   message is forwarded over a router's parent interface until it
   either reaches the primary core, or is received by the
   originator of the corresponding REJOIN-ACTIVE.

A JOIN-ACK has three valid subcodes:

o  NORMAL (code 0) - sent by a core router, or on-tree non-core
   router acknowledging joins with subcodes REJOIN-ACTIVE and
   ACTIVE-JOIN.

o  PROXY-ACK (code 1) - acknowledgement of a join-request by a
   router connected to the same subnet as the originator (subnet
   D-DR) of the corresponding join.

o  REJOIN-NACTIVE (code 2) - sent by a primary core to
   acknowledge the receipt of a join-request received with subcode
   REJOIN-NACTIVE. This ack is unicast directly to the router
   that converted the corresponding REJOIN-ACTIVE to
   REJOIN-NACTIVE. The CBT control packet "origin" field contains
   the IP address of the originator of the REJOIN-ACTIVE, so in
   order for the primary core to directly reach the source of
   the REJOIN-NACTIVE, the converting router inserts its IP
   address in the "core address" field of the control packet
   header. The primary core uses the address in this field to
   determine the target of the join-ack, subcode REJOIN-NACTIVE.

8.4.  Auxiliary Maintenance Message Types

There are two CBT auxiliary maintenance message types. CBT auxiliary
messages are encoded in a CBT control packet header, and the fields
of the control packet are interpreted as illustrated below. The
interpretation of certain fields further depends on whether
aggregation and security are implemented.

 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| vers  |unused |     type      |     code      |   aggregate   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          hdr length           |           checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            group identifier (or low end of range)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     group id mask or NULL                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                NULL (if security implemented)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            security fields if implemented or NULL             |
|                            (T.B.D)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 9. CBT Echo Request/Reply

o  CBT-ECHO-REQUEST (type 7): once a tree branch is established,
   this message acts as a "keepalive", and is unicast from
   child to parent.

o  CBT-ECHO-REPLY (type 8): positive reply to the above.

CBT Echo Requests/Replies can be sent as aggregates, or individually
for each group if multicast address assignment is such that
aggregation is not possible. The "aggregate" field replaces the
"# cores" field of the standard control packet header; no cores are
assumed present in the message. If aggregation is implemented, the
"aggregate" field will contain the value 0xff, otherwise 0x00.

If aggregation is not implemented, the "group id mask" field is set
to NULL, or is not present, depending on whether security is
implemented or not. Masks are used according to their standard
networking usage.

The "flow-id" field (to be done) of the standard control packet
header is NULL if security is implemented, not present otherwise.

The security fields (to be done) are only present if security is
implemented.

9.
Default Timer Values

There are several CBT control messages which are transmitted at fixed
intervals. These values, retransmission times, and timeout values,
are given below. Note these are recommended default values only, and
are configurable with each implementation (all times are in seconds):

o  CBT-ECHO-INTERVAL 30 (time between sending successive
   CBT-ECHO-REQUESTs to parent).

o  PEND-JOIN-INTERVAL 10 (retransmission time for join-request if
   no ack rec'd)

o  PEND-JOIN-TIMEOUT 30 (time to try joining a different core, or
   give up)

o  EXPIRE-PENDING-JOIN 90 (remove transient state for join that has
   not been ack'd)

o  CBT-ECHO-TIMEOUT 90 (time to consider parent unreachable)

o  CHILD-ASSERT-INTERVAL 90 (check last time we rec'd an ECHO from
   each child)

o  CHILD-ASSERT-EXPIRE-TIME 180 (remove child information if no
   ECHO received)

o  IFF-SCAN-INTERVAL 300 (scan all interfaces for group presence.
   If none, send QUIT)

10.  Interoperability Issues

One of the design goals of CBT is for it to fully interwork with
other IP multicast schemes. We have already described how CBT-style
packets are transformed into IP-style multicasts, and vice-versa.

In order for CBT to fully interwork with other schemes, it is
necessary to define the interface(s) between a "CBT cloud" and the
cloud of another scheme. The CBT authors are currently working out
the details of the "CBT-other" interface, and therefore we omit
further discussion of this topic at the present time.

11.  CBT Security Architecture

See current I-D: draft-ietf-idmr-mkd-01.{ps,txt}

Acknowledgements

Special thanks go to Paul Francis, NTT Japan, for the original
brainstorming sessions that brought about this work.

Thanks also to the networking team at Bay Networks for their comments
and suggestions, in particular Steve Ostrowski for his suggestion of
using "native mode" as a router optimization, Eric Crawley, Scott
Reeve, and Nitin Jain. Thanks also to Ken Carlberg (SAIC) for
reviewing the text, and generally providing constructive comments
throughout.

I would also like to thank the participants of the IETF IDMR working
group meetings for their general constructive comments and
suggestions since the inception of CBT.

APPENDIX

IGMP version 3 has recently been proposed [7]. The authors have the
following recommendations for amendments (all minor) to IGMPv3:

o  The IGMPv3 draft [7] introduces a new IGMP message type, the PIM
   RP-REPORT message. Its message format is shown below:

 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Group Address                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Version    |   Reserved    |         # of RP's (N)         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        RP Address [1]                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       RP Address [...]                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        RP Address [N]                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 10. PIM RP-REPORT.

The CBT authors propose the following minor amendments to the IGMP
PIM RP-REPORT:

o  the report to be re-named RP/CORE-REPORT

o  RP fields re-named RP/Core fields

o  the reserved field to be re-named the "target core" field, to
   contain the numeric value of the position of the target core in
   the RP/Core list

o  The introduction of a new code value to distinguish PIM RP
   reports from CBT Core reports.

These minor amendments to IGMPv3 would satisfy CBT's operational
requirements.

Author's Address:

Tony Ballardie,
Department of Computer Science,
University College London,
Gower Street,
London, WC1E 6BT,
ENGLAND, U.K.

Tel: ++44 (0)71 419 3462
e-mail: A.Ballardie@cs.ucl.ac.uk

References

[1] DVMRP. Described in "Multicast Routing in a Datagram
Internetwork", S. Deering, PhD Thesis, 1990. Available via anonymous
ftp from: gregorio.stanford.edu:vmtp/sd-thesis.ps.

[2] J. Moy. Multicast Routing Extensions to OSPF. Communications of
the ACM, 37(8): 61-66, August 1994.

[3] D. Farinacci, S. Deering, D. Estrin, and V. Jacobson. Protocol
Independent Multicast (PIM) Dense-Mode Specification
(draft-ietf-idmr-pim-spec-01.ps). Working draft, 1994.

[4] A. J. Ballardie. Scalable Multicast Key Distribution
(draft-ietf-idmr-mkd-01.txt). Working draft, 1995.

[5] A. J. Ballardie. "A New Approach to Multicast Communication in a
Datagram Internetwork", PhD Thesis, 1995. Available via anonymous ftp
from: cs.ucl.ac.uk:darpa/IDMR/ballardie-thesis.ps.Z.

[6] W. Fenner. Internet Group Management Protocol, version 2
(IGMPv2), (draft-idmr-igmp-v2-01.txt).

[7] B. Cain, S. Deering, A. Thyagarajan. Internet Group Management
Protocol Version 3 (IGMPv3) (draft-cain-igmp-00.txt).

[8] M. Handley, J. Crowcroft, I. Wakeman. Hierarchical Rendezvous
Point proposal, work in progress.
(http://www.cs.ucl.ac.uk/staff/M.Handley/hpim.ps).

[9] D. Estrin et al. USC/ISI, Work in progress. (document not yet
available).

[10] D. Estrin et al. PIM Sparse Mode Specification.
(draft-ietf-idmr-pim-sparse-spec-00.txt).