idnits 2.17.1

draft-ietf-idmr-cbt-spec-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.
     Expected boilerplate is as follows today (2024-04-19) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

        This Internet-Draft is submitted in full conformance with the
        provisions of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

        Copyright (c) 2024 IETF Trust and the persons identified as the
        document authors. All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

        This document is subject to BCP 78 and the IETF Trust's Legal
        Provisions Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document. Please review these documents
        carefully, as they describe your rights and restrictions with respect
        to this document. Code Components extracted from this document must
        include Simplified BSD License text as described in Section 4.e of
        the Trust Legal Provisions and are provided without warranty as
        described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date. The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 32 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 3 instances of too long lines in the document, the longest one
     being 8 characters in excess of 72.

  ** There are 753 instances of lines with control characters in the document.

  ** The abstract seems to contain references (3], [4,5], [1,2,), which it
     shouldn't. Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 209 has weird spacing: '...e where some...'

  == Line 211 has weird spacing: '...CBT. In such ...'

  == Line 256 has weird spacing: '...ter has a FI...'

  == Line 345 has weird spacing: '...ticasts a not...'

  == Line 645 has weird spacing: '... tunnel cbt ...'

  == (2 more instances...)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '1' on line 1210 looks like a reference

  -- Missing reference section? '2' on line 1214 looks like a reference

  -- Missing reference section? '3' on line 1217 looks like a reference

  -- Missing reference section? '4' on line 1221 looks like a reference

  -- Missing reference section? '5' on line 1225 looks like a reference

  -- Missing reference section? '8' on line 1235 looks like a reference

  -- Missing reference section? '9' on line 1240 looks like a reference

  -- Missing reference section? '7' on line 1232 looks like a reference

  -- Missing reference section? '10' on line 1243 looks like a reference

  -- Missing reference section? '6' on line 1229 looks like a reference


     Summary: 13 errors (**), 0 flaws (~~), 8 warnings (==), 13 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    Inter-Domain Multicast Routing (IDMR)                 A. J. Ballardie
2    INTERNET-DRAFT                              University College London
3                                                                S. Reeve
4                                                      Bay Networks, Inc.
5                                                                 N. Jain
6                                                      Bay Networks, Inc.

8    February 9th, 1996

10   Core Based Trees (CBT) Multicast

12   -- Protocol Specification --

14

16   Status of this Memo

18   This document is an Internet Draft. Internet Drafts are working do-
19   cuments of the Internet Engineering Task Force (IETF), its Areas, and
20   its Working Groups. (Note that other groups may also distribute work-
21   ing documents as Internet Drafts).

23   Internet Drafts are draft documents valid for a maximum of six
24   months. Internet Drafts may be updated, replaced, or obsoleted by
25   other documents at any time. It is not appropriate to use Internet
26   Drafts as reference material or to cite them other than as a "working
27   draft" or "work in progress."

29   Please check the I-D abstract listing contained in each Internet
30   Draft directory to learn the current status of this or any other
31   Internet Draft.

33   Abstract

35   This document describes the Core Based Tree (CBT) network layer mul-
36   ticast protocol specification. CBT is a next-generation multicast
37   protocol that makes use of a shared delivery tree rather than
38   separate per-sender trees utilized by most other multicast schemes
39   [1, 2, 3].

41   This specification includes a description of an optimization whereby
42   native IP-style multicasts are forwarded over tree branches as well
43   as subnetworks with group member presence. This mode of operation
44   will be called CBT "native mode" and obviates the need to encapsulate
45   data packets before forwarding over CBT tree interfaces. Native mode
46   is only relevant to CBT-only domains or ``clouds''. Also included are
47   some new "data-driven" features.

49   A special authors' note is included explaining the latest updates to
50   the CBT specification, together with some nomenclature, and miscel-
51   laneous items.

53   This document is progressing through the IDMR working group of the
54   IETF. The CBT architecture is described in an accompanying document:
55   ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-arch-00.txt. Other
56   related documents include [4, 5]. For all IDMR-related documents, see
57   http://www.cs.ucl.ac.uk/ietf/idmr.

59   1. Authors' Note

61   The purpose of this note is to explain how the CBT protocol has
62   evolved since the previous version (November 1995).

64   Since the previous release, CBT has been assigned official IP proto-
65   col and UDP port numbers (section 8).

67   The CBT designers have constantly been seeking to streamline the pro-
68   tocol and seek new mechanisms to simplify the group initiation pro-
69   cedure. Especially, it has been a high priority to ensure that join
70   latency be kept to an absolute minimum. The November '95 draft intro-
71   duced the re-invented subnet designated router (DR) election pro-
72   cedure, described here in section 2.3.

74   The concept of proxy-ACKs was introduced in the November '95 draft,
75   but these have been removed since the extra message overhead does not
76   warrant the negligible gain they provide.

78   The CBT loop detection mechanism (comprising rejoin-active and
79   rejoin-nactive) has been slightly modified, and is now simpler and
80   more straightforward. The revised mechanism incorporates a new join
81   ack subcode, and is explained in section 5.3.

83   Core selection, placement, and management, which have prevented sim-
84   ple group initiation/joining, apparent in data-driven schemes (like
85   DVMRP), have been separated out from the protocol itself. Core
86   management is a problem not unique to CBT; PIM-Sparse Mode shares it.
87   Separate, protocol-independent core management mechanisms are
88   currently being proposed/developed [8, 9]. In the absence of a core
89   management/distribution protocol, the task could be manually handled
90   by network management facilities.

92   In CBT, the core routers for a particular group are categorised into
93   PRIMARY CORE, and NON-PRIMARY (secondary) CORES.

95   The core tree, the part of a tree linking all core routers together,
96   is built on-demand (section 2.4). That is, the core tree is only
97   built subsequent to a non-primary core receiving a join-request
98   (non-primary core routers join the primary core router -- the primary
99   need do nothing). Join-requests carry an ordered list of core routers
100  (and the identity of the primary core in its own separate field),
101  making it possible for the non-primary cores to know where to join.
102  On-demand core tree building is explained as part of section 2.4.

104  CBT now supports the aggregation of neighbour keepalives, which pre-
105  viously were sent on a per group basis. Any two adjacent CBT routers
106  need only send a single keepalive between each other, rather than one
107  per group. Additional aggregation strategies are currently being
108  worked on, and we present some ideas on aggregated rejoins in Appen-
109  dix A. An updated draft fully specifying CBT aggregation strategy
110  should appear soon.

112  The end result of these developments is that the CBT protocol is much
113  simplified and more efficient.

115  2. Protocol Specification

117  2.1. CBT Group Initiation

119  The requirement of hosts to discover the identity of candidate core
120  routers (or RPs) differentiates the role of hosts in shared tree mul-
121  ticast protocols and shortest-path tree multicast protocols; the
122  latter need only announce their desire to join a group by means of an
123  IGMP membership report. It is highly desirable that hosts wishing to
124  join a shared tree need only do the same, leaving local multicast
125  routers to discover <core, group> mappings, or have local routers
126  configured with the identity of core(s) in the next level of a
127  hierarchy, as suggested by Hierarchical PIM [8].

129  If the latter approach is eventually adopted by the IETF, then host
130  operations need not differ due to the type of multicast tree being
131  joined, and indeed, the type of tree being joined for a particular
132  group can remain transparent to the host.

134  If the latter approach is not adopted, then hosts need to inform
135  their local multicast router of a <core, group> mapping for each
136  group joined. This requires hosts to discover <core, group> mappings,
137  which in turn requires the existence of a (global) core advertisement
138  protocol. Hosts subsequently need a means of advertising <core, group>
139  mappings to the local multicast router so it can initiate a
140  join.
     This requires an extension to IGMP, for example, the presence
141  of IGMP RP/Core Reports, as suggested in IGMP version 3 [7], or the
142  protocol itself must provide a means (message) for advertising cores
143  to the local router. In the absence of H-PIM, some similar mechanism,
144  or IGMPv3, CBT implementors may wish to extend CBT to include a core
145  reporting message for group initiators/joiners (for example, whenever
146  a group is initiated/joined, a configuration file is read which holds
147  <core, group> mappings).

149  Alternatively, <core, group> mappings can be downloaded to local mul-
150  ticast routers by means of network management tools.

152  2.2. Tree Joining Process -- Overview

154  A local CBT router is notified, by IGMP, of a host's desire to join a
155  group. If more than one CBT router is present on the subnetwork, each
156  will receive the IGMP membership report. However, only one, the
157  default subnet designated router (DEFAULT DR) will act upon the
158  receipt of a report by initiating a CBT join. Note, a CBT join is
159  only initiated if the subnetwork is not yet part of the delivery
160  tree. Also, we assume that the local CBT default DR discovers <core,
161  group> mappings by one of the mechanisms described in the previous
162  section. DR election is described in section 2.3.

164  The following CBT control messages come into play subsequent to the
165  host sending an IGMP join (host membership report):

167    + JOIN_REQUEST

169    + JOIN_ACK

171  A join-request is generated by a locally-elected DR (see next
172  section) in response to receiving an IGMP group membership report
173  from a directly connected host. The join is sent to the next-hop on
174  the path to the target core, as specified in the join packet. The
175  join is processed by each such hop on the path to the core, until
176  either the join reaches the target core itself, or hits a router that
177  is already part of the corresponding distribution tree (as identified
178  by the group address).
     In both cases, the router concerned terminates
179  the join, and responds with a join-ack, which traverses the reverse-
180  path of the corresponding join. This is possible due to the transient
181  path state created by a join traversing a CBT router. The ack fixes
182  that state.

184  2.3. DR Election

186  Multiple CBT routers may be connected to a multi-access subnetwork.
187  In such cases it is necessary to elect a (sub)network designated
188  router (DR) that is responsible for sending IGMP host membership
189  queries, and generating join-requests in response to receiving IGMP
190  group membership reports. Such joins are forwarded upstream by the
191  DR.

193  The IGMP querier election is as follows (note, here we talk about
194  "CBT routers", but the described mechanism also applies to the gen-
195  eral case). At start-up, a CBT router assumes it is the only CBT-
196  capable router on its subnetwork. It therefore sends two IGMP-HOST-
197  MEMBERSHIP-QUERYs in short succession (within 5 secs) (for robust-
198  ness) in order to quickly learn about any group memberships on the
199  subnet. If other CBT routers are present on the same subnet, they
200  will receive these IGMP queries, and depending on which router was
201  already the elected querier, yield querier duty to the new router iff
202  the new router is lower-addressed. If it is not, then the newly-
203  started CBT router will yield when it hears a query from the already
204  established querier.

206  The CBT DEFAULT DR (D-DR) is always (footnote 1) the subnet's IGMP-
207  _________________________

209  1 This document does not address the case where some
210  routers on a multi-access subnet may be running multi-
211  cast routing protocols other than CBT. In such cases,
212  IGMP querier may be a non-CBT router, in which case the
213  CBT DR election breaks. This will be discussed in a CBT
214  interoperability document, to appear shortly.

216  querier; in CBT these two roles go hand-in-hand.
     As a result, there
217  is no protocol overhead whatsoever associated with electing the CBT
218  D-DR.

220  2.4. Tree Joining Process -- Details

222  The receipt of an IGMP group membership report by a CBT D-DR for a
223  CBT group not previously heard from triggers the tree joining pro-
224  cess.

226  Immediately subsequent to receiving an IGMP group membership report
227  for a CBT group not previously heard from, the D-DR unicasts a JOIN-
228  REQUEST to the first hop on the (unicast) path to the target core
229  specified in the CBT join packet.

231  Each CBT-capable router traversed on the path between the sending DR
232  and the core processes the join. However, if a join hits a CBT router
233  that is already on-tree (footnote), the join is not propagated
234  further, but ACK'd downstream from that point.

236  JOIN-REQUESTs carry the identity of all cores for the group. Assuming
237  there are no on-tree routers in between, once the join (subcode
238  ACTIVE_JOIN) reaches the target core, if the target core is not the
239  primary core (as indicated in a separate field of the join packet) it
240  first acknowledges the received join by means of a JOIN-ACK, then
241  sends a JOIN-REQUEST, subcode REJOIN-ACTIVE, to the primary core
242  router. Either the primary core, or the first on-tree router encoun-
243  tered, acknowledges the received rejoin by means of a JOIN-ACK. In
244  the former case, the primary core responds by sending a join-ack,
245  subcode PRIMARY-REJOIN-ACK, which traverses the reverse-path of the
246  join. In the latter case, the join-ack is returned with subcode NOR-
247  MAL; the receiving router responds to this with a rejoin-Nactive, for
248  loop detection. Note that loop detection is not necessary subsequent
249  to receiving a join-ack with subcode PRIMARY-REJOIN-ACK. Loop detec-
250  tion is described further in section 5.3.

252  To facilitate detailed protocol description, we use a sample topol-
253  ogy, illustrated in Figure 1 (shown over).
     Member hosts are shown as
254  individual capital letters, routers are prefixed with R, and subnets
255  _________________________
256  "on-tree" describes whether a router has a FIB entry
257  for the corresponding group.

259  are prefixed with S.

261  A B
262  | S1 S4 |
263  ------------------- -----------------------------------------------
264  | | | |
265  ------ ------ ------ ------
266  | R1 | | R2 | | R5 | | R6 |
267  ------ ------ ------ ------
268  C | | | | |
269  | | | | S2 | S8 |
270  ---------- ------------------------------------------ -------------
271  S3 |
272  ------
273  | R3 |
274  | ------ D
275  | S9 | | S5 |
276  | | ---------------------------------------------
277  | |----| | |
278  ---| R7 |-----| ------
279  | |----| |------------------| R4 |
280  | S7 | ------ F
281  | | | S6 |
282  |-E | ---------------------------------
283  | |
284  | ------
285  |---| |---------------------| R8 |
286  |R12 -----| ------ G
287  |---| | | | S10
288  | S14 ----------------------------
289  | |
290  I --| ------
291  | | R9 |
292  ------
293  | S12
294  | ----------------------------
295  S15 | |
296  | ------
297  |----------------------|R10 |
298  J ---| ------ H
299  | | |
300  | ----------------------------
301  | S13

303  Figure 1. Example Network Topology
304  Taking the example topology in figure 1, host A is the group initia-
305  tor, and has elected core routers R4 (primary core) and R9 (secondary
306  core) by some external protocol. We assume the local CBT DR discovers
307  <core, group> mappings by "some means", possibly one of the mechanisms
308  described in section 2.1.

310  Router R1 receives an IGMP host membership report, and proceeds to
311  unicast a JOIN-REQUEST, subcode ACTIVE-JOIN to the next-hop on the
312  path to R4 (R3), the target core. R3 receives the join, caches the
313  necessary group information, and forwards it to R4 -- the target of
314  the join.

316  R4, being the target of the join, sends a JOIN_ACK back out of the
317  receiving interface to the previous-hop sender of the join, R3.
     A
318  JOIN-ACK, like a JOIN-REQUEST, is processed hop-by-hop by each router
319  on the reverse-path of the corresponding join. The receipt of a
320  join-ack establishes the receiving router on the corresponding CBT
321  tree, i.e. the router becomes part of a branch on the delivery tree.
322  Finally, R3 sends a join-ack to R1. A new CBT branch has been
323  created, attaching subnet S1 to the CBT delivery tree for the
324  corresponding group (footnote 2).

326  For the period between any CBT-capable router forwarding (or ori-
327  ginating) a JOIN_REQUEST and receiving a JOIN_ACK the corresponding
328  router is not permitted to acknowledge any subsequent joins received
329  for the same group; rather, the router caches such joins till such
330  time as it has itself received a JOIN_ACK for the original join. Only
331  then can it acknowledge any cached joins. A router is said to be in a
332  pending-join state if it is awaiting a JOIN_ACK itself.

334  Note that the presence of underlying transient asymmetric routes is
335  irrelevant to the tree-building process; CBT tree branches are sym-
336  metric by the nature in which they are built. Joins set up transient
337  state (incoming and outgoing interface state) in all routers along a
338  path to a particular core. The corresponding join-ack traverses the
339  reverse-path of the join as dictated by the transient state, and not
340  the path that underlying routing would dictate. Whilst permanent
341  asymmetric routes could pose a problem for CBT, transient
342  _________________________

344  2 At this point, it is proposed that IGMP (v3) group
345  multicasts a notification across the subnet indicating
346  to member hosts that the delivery tree has been joined
347  successfully. Such a message would greatly benefit mul-
348  ticast protocols requiring explicit joins [5, 10].

350  asymmetricity is detected by the CBT protocol.

352  2.5. Default DRs and Group DRs

354  The DR election mechanism does not guarantee that the DR will be the
355  router that actually forwards a join off a multi-access network; the
356  first hop on the path to a particular core might be via another
357  router on the same (sub)network, which actually forwards off-subnet.

359  The CBT router that becomes the interface between the subnet and the
360  rest of the CBT tree, i.e. the CBT router at which a join-ack arrives
361  on the subnet, becomes the CBT GROUP DR. This group-specific DR (G-
362  DR) is a token (implicit) identity. In the normal case where there is
363  no subnet extra hop, the receipt of a JOIN-ACK means that the D-DR
364  becomes the G-DR for the specified group.

366  Although very much the same, let's see another example using our
367  example topology of figure 1 of a host joining a CBT tree for the
368  case where more than one CBT router exists on the host subnetwork.

370  B's subnet, S4, has 3 CBT routers attached. Assume also that R6 has
371  been elected IGMP-querier and CBT D-DR.

373  R6 (S4's D-DR) receives an IGMP group membership report. By some
374  means, R6 discovers the <core, group> mapping for the group specified
375  in the report; R4 is the target core for the group. R6 generates a
376  join-request for target core R4, subcode ACTIVE_JOIN. R6's routing
377  table says the next-hop on the path to R4 is R2, which is on the same
378  subnet as R6. This is irrelevant to R6, which unicasts it to R2. R2
379  unicasts it to R3, which happens to be already on-tree for the speci-
380  fied group (from R1's join). R3 therefore can acknowledge the arrived
381  join and unicast it back to R2. R2 realises it is not the origin of
382  the corresponding join-request, but sees that the origin (R6) is on
383  the same subnet as itself, and that over which the join-ack should be
384  forwarded to the origin, R6. R2 unicasts the join-ack on its final
385  hop. R2 has thus become the group's G-DR, with R6 remaining the D-DR
386  for all groups.
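
The R6 -> R2 -> R3 walk-through above can be sketched as a small
simulation. This is only an illustration under stated assumptions, not
part of the protocol: the next-hop table, the subnet attachments (R5 is
assumed here to be the third CBT router on S4), and every function name
are hypothetical and exist only to mirror the example.

```python
# Illustrative sketch only: all tables and names below are assumptions.

# Unicast next hop towards target core R4, per router (from the examples).
NEXT_HOP_TO_R4 = {"R6": "R2", "R2": "R3", "R3": "R4", "R1": "R3"}

# CBT routers attached to each subnet. R5 on S4 is an assumption; other
# routers on S1, if any, do not affect this example.
SUBNET_ROUTERS = {"S4": {"R2", "R5", "R6"}, "S1": {"R1"}}


def forward_join(origin, core, on_tree, next_hop):
    """Return the hop-by-hop join path, origin first.

    The join stops at the target core or at the first on-tree router,
    whichever is reached first. The origin is assumed not to be on-tree
    (otherwise the D-DR would take no action).
    """
    path = [origin]
    current = origin
    while True:
        current = next_hop[current]
        path.append(current)
        if current == core or current in on_tree:
            return path


def ack_path(join_path):
    """The join-ack retraces the join's transient state in reverse."""
    return list(reversed(join_path))


def group_dr(join_path, origin_subnet, subnet_routers):
    """Return the G-DR: the router at which the join-ack arrives on the
    origin subnet, i.e. the last join-path router attached to it."""
    attached = subnet_routers[origin_subnet]
    return [r for r in join_path if r in attached][-1]


path = forward_join("R6", "R4", on_tree={"R3", "R4"}, next_hop=NEXT_HOP_TO_R4)
print(path)                                  # ['R6', 'R2', 'R3']
print(ack_path(path))                        # ['R3', 'R2', 'R6']
print(group_dr(path, "S4", SUBNET_ROUTERS))  # R2
```

In the normal case with no extra subnet hop (for example R1's join from
S1), the same function returns the originating D-DR itself.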

388  If an IGMP membership report is received by a D-DR with a join for
389  the same group already pending, or if the D-DR is already on-tree for
390  the group, it takes no action.

392  2.6. Tree Teardown

394  There are two scenarios whereby a tree branch may be torn down:

396    + During a re-configuration. If a router's best next-hop to the
397      specified core is one of its existing children, then before
398      sending the join it must tear down that particular downstream
399      branch. It does so by sending a FLUSH_TREE message which is pro-
400      cessed hop-by-hop down the branch. All routers receiving this
401      message must process it and forward it to all their children.
402      Routers that have received a flush message will re-establish
403      themselves on the delivery tree if they have directly connected
404      subnets with group presence.

406    + If a CBT router has no children it periodically checks all its
407      directly connected subnets for group member presence. If no
408      member presence is ascertained on any of its subnets it sends a
409      QUIT_REQUEST upstream to remove itself from the tree.

411  The following example, using the example topology of figure 1, shows
412  how a tree branch is gracefully torn down using a QUIT_REQUEST.

414  Assume group member B leaves group G on subnet S4. B issues an IGMP
415  HOST-MEMBERSHIP-LEAVE (relevant only to IGMPv2 and later versions)
416  message which is multicast to the "all-routers" group (224.0.0.2).
417  R6, the subnet's D-DR and IGMP-querier, responds with a group-
418  specific-QUERY. No hosts respond within the required response inter-
419  val, so D-DR assumes group G traffic is no longer wanted on subnet
420  S4.

422  Since R6 has no CBT children, and no other directly attached subnets
423  with group G presence, it immediately follows on by sending a
424  QUIT_REQUEST to R2, its parent on the tree for group G. R2 responds
425  with a QUIT-ACK, unicast to R6; R2 removes the corresponding child
426  information.
     R2 in turn sends a QUIT upstream to R3 (since it has no
427  other children or subnet(s) with group presence).

429  NOTE: immediately subsequent to sending a QUIT-REQUEST, the sender
430  removes the corresponding parent information, i.e. it does not
431  wait for the receipt of a QUIT-ACK.

433  R3 responds to the QUIT by unicasting a QUIT-ACK to R2. R3 subse-
434  quently checks whether it in turn can send a quit by checking group G
435  presence on its directly attached subnets, and any group G children.
436  It has the latter (R1 is its child on the group G tree), and so R3
437  cannot itself send a quit. However, the branch R3-R2-R6 has been
438  removed from the tree.

440  3. Data Packet Forwarding Rules

442  When a router receives (non-locally originated) data packets for for-
443  warding over directly attached member subnets, it only does so over
444  the set of outgoing member subnets (interfaces) for which that router
445  is DR, irrespective of whether group membership is registered on
446  other local interfaces. In addition, in native mode, packets are for-
447  warded over any remaining interfaces specified by the FIB entry for
448  the group that are not in the above set (excluding the incoming
449  interface). In CBT mode, encapsulated data packets are forwarded over
450  the full set of interfaces specified by the FIB entry, except the
451  incoming interface.

453  A router only forwards data packets originated by directly attached
454  hosts iff the router is the DR on the interface over which those
455  packets were received.

457  4. Data Packet Forwarding -- Encapsulation Details

459  In "native mode" all data packets are forwarded over CBT tree inter-
460  faces as native IP multicasts, i.e. there are no encapsulations
461  required. This assumes that CBT is the multicast routing protocol in
462  operation within the domain (or "cloud") in question, and that all
463  routers within the domain of operation are CBT-capable, i.e. there
464  are no "tunnels".
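
The forwarding rules of section 3, combined with the native-mode
assumption just stated, can be sketched as a single selection function.
This is a minimal sketch, not the specified algorithm: the function name,
the interface labels, and the set-based representation are hypothetical,
and excluding the arrival interface from member-subnet delivery in both
modes is a simplifying assumption.

```python
def select_outgoing(mode, incoming, fib_ifaces, member_ifaces, dr_ifaces):
    """Return (native_out, encapsulated_out) interface sets for one packet.

    mode          -- "native" or "cbt" (hypothetical labels)
    incoming      -- interface the packet arrived on
    fib_ifaces    -- tree interfaces (parent and children) in the FIB entry
    member_ifaces -- interfaces whose subnets have group member presence
    dr_ifaces     -- interfaces on which this router is the DR
    """
    if mode not in ("native", "cbt"):
        raise ValueError("mode must be 'native' or 'cbt'")

    # Member subnets are served only where this router is DR, regardless
    # of membership registered on other local interfaces.
    member_out = (member_ifaces & dr_ifaces) - {incoming}

    # Tree interfaces from the FIB entry, never the arrival interface.
    tree_out = fib_ifaces - {incoming}

    if mode == "native":
        # Everything leaves as a native IP multicast; nothing is encapsulated.
        return member_out | tree_out, set()

    # CBT mode: native multicast to member subnets, encapsulated on the tree.
    return member_out, tree_out


native, encap = select_outgoing(
    "cbt",
    incoming="if0",
    fib_ifaces={"if0", "if1", "if2"},
    member_ifaces={"if2", "if3", "if4"},
    dr_ifaces={"if3"},
)
print(sorted(native), sorted(encap))  # ['if3'] ['if1', 'if2']
```

Note that "if4" is never used in the example: it has members, but this
router is not the DR there.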

466  In a multi-protocol environment, whose infrastructure may include
467  non-multicast-capable routers, it is necessary to tunnel data packets
468  between CBT-capable routers. This is called "CBT mode". Data packets
469  are de-capsulated by CBT routers (such that they become native mode
470  data packets) before being forwarded over subnets with member hosts.
471  When multicasting (native mode) to member hosts, the TTL value of the
472  original IP header is set to one. CBT mode encapsulation is as fol-
473  lows:

475  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
476  | encaps IP hdr | CBT hdr | original IP hdr | data ....|
477  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

479  Figure 2. Encapsulation for CBT mode

481  The TTL value of the CBT header is set by the encapsulating CBT
482  router directly attached to the origin of a data packet. This value
483  is decremented each time it is processed by a CBT router. An encap-
484  sulated data packet is discarded when the CBT header TTL value
485  reaches zero.

487  The purpose of the (outer) encapsulating IP header is to "tunnel"
488  data packets between CBT-capable routers (or "islands"). The outer IP
489  header's TTL value is set to the "length" of the corresponding tun-
490  nel, or MAX_TTL (255) if this is not known, or subject to change.

492  For native mode IP multicasts, i.e. those without any extra encapsu-
493  lation, the TTL value of the IP header is decremented each time the
494  packet is received by a multicast router.

496  It is worth pointing out here the distinction between subnetworks and
497  tree branches, although they can be one and the same. For example, a
498  multi-access subnetwork containing routers and end-systems could
499  potentially be both a CBT tree branch and a subnetwork with group
500  member presence. A tree branch which is not simultaneously a subnet-
501  work is either a "tunnel" or a point-to-point link.
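
The CBT-mode TTL handling described above can be sketched as follows.
This is an illustration under assumptions, not the specified packet
format: the class and function names are hypothetical, the initial CBT
header TTL is left as a caller-supplied parameter because the text does
not fix it here, and re-setting the outer TTL at every CBT hop is an
interpretation of "set to the length of the corresponding tunnel".

```python
from dataclasses import dataclass
from typing import Optional

MAX_TTL = 255          # outer IP TTL when the tunnel length is unknown
MEMBER_SUBNET_TTL = 1  # TTL used when multicasting natively to member hosts


@dataclass
class CbtPacket:
    """Hypothetical in-memory view of a CBT-mode encapsulated packet."""
    outer_ip_ttl: int
    cbt_ttl: int
    payload: bytes


def encapsulate(payload: bytes, cbt_ttl: int,
                tunnel_length: Optional[int] = None) -> CbtPacket:
    """Wrap a packet for a tunnel; outer TTL is the tunnel length if known."""
    outer = tunnel_length if tunnel_length is not None else MAX_TTL
    return CbtPacket(outer_ip_ttl=outer, cbt_ttl=cbt_ttl, payload=payload)


def process_at_cbt_router(pkt: CbtPacket,
                          next_tunnel_length: Optional[int] = None
                          ) -> Optional[CbtPacket]:
    """Decrement the CBT header TTL; return None if the packet is discarded."""
    remaining = pkt.cbt_ttl - 1
    if remaining <= 0:
        return None  # CBT header TTL reached zero: discard
    # Assumption: the outer header is rebuilt for the next tunnel.
    return encapsulate(pkt.payload, remaining, next_tunnel_length)


pkt = encapsulate(b"data", cbt_ttl=2, tunnel_length=3)
hop1 = process_at_cbt_router(pkt)
hop2 = process_at_cbt_router(hop1)
print(pkt.outer_ip_ttl, hop1.cbt_ttl, hop1.outer_ip_ttl, hop2)  # 3 1 255 None
```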

503  In CBT mode there are three forwarding methods used by CBT routers:

505    + IP multicasting. This method is used to send a data packet
506      across a directly-connected subnetwork with group member pres-
507      ence. System host changes are not required for CBT. Similarly,
508      end-systems originating multicast data do so in traditional IP-
509      style.

511    + CBT unicasting. This method is used for sending data packets
512      encapsulated (as illustrated above) across a tunnel or point-
513      to-point link. En/de-capsulation takes place in CBT routers.

515    + CBT multicasting. Routers on multi-access links use this method
516      to send data packets encapsulated (as illustrated above) but the
517      outer encapsulating IP header contains a multicast address. This
518      method is used when a parent or multiple children are reachable
519      over a single physical interface, as could be the case on a
520      multi-access Ethernet. The IP module of end-systems subscribed
521      to the same group will discard these multicasts since the CBT
522      payload type (protocol id) of the outer IP header is not recog-
523      nizable by hosts.

525  CBT routers create Forwarding Information Base (FIB) entries whenever
526  they send or receive a JOIN_ACK. The FIB describes the parent-child
527  relationships on a per-group basis. A FIB entry dictates over which
528  tree interfaces, and how (unicast or multicast) a data packet is to
529  be sent. Additionally, a data packet is IP multicast over any
530  directly-connected subnetworks with group member presence. Such
531  interfaces are kept in a separate table relating to IGMP. A FIB entry
532  is shown below:

534  32-bits 4 4 4 8
535  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
536  | group-id | parent addr | parent vif | No. of | |
537  | | index | index |children | children |
538  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+-+-+-+-+-++-+-+-+-+-+-+-+-+-+
539  |chld addr |chld vif |
540  | index | index |
541  |+-+-+-+-+-+-+-+-+-+-+
542  |chld addr |chld vif |
543  | index | index |
544  |+-+-+-+-+-+-+-+-+-+-+
545  |chld addr |chld vif |
546  | index | index |
547  |+-+-+-+-+-+-+-+-+-+-+
548  | |
549  | etc. |
550  |+-+-+-+-+-+-+-+-+-+-|

552  Figure 3. CBT FIB entry

554  Note that a CBT FIB is required for both CBT-mode and native-mode
555  multicasting.

557  The field lengths shown above assume a maximum of 16 directly con-
558  nected neighbouring routers.

560  When a data packet arrives at a CBT router, the following rules
561  apply:

563    + if the packet is an IP-style multicast, it is checked to see if
564      it originated locally (i.e. if the arrival interface subnetmask
565      bitwise ANDed with the packet's source IP address equals the
566      arrival interface's subnet number, then the packet was sourced
567      locally). If the packet is not of local origin, it is discarded.

569    + the packet is IP multicast to all directly connected subnets
570      with group member presence. The packet is sent with an IP TTL
571      value of 1 in this case.

573    + the packet is encapsulated for CBT forwarding (see figure 2) and
574      unicast to parent and children. However, if more than one child
575      is reachable over the same interface the packet will be CBT mul-
576      ticast. Therefore, it is possible that an IP-style multicast and
577      a CBT multicast will be forwarded over a particular subnetwork.

579  NOTE: the TTL value of encapsulated data packets is manipulated as
580  described at the beginning of this section.

582  Using our example topology in figure 1, let's assume member G ori-
583  ginates an IP multicast packet. R8 is the DR for subnet S10. R8 CBT
584  unicasts the packet to each of its children, R9 and R12. These chil-
585  dren are not reachable over the same interface.
R8, being the DR for subnets S14 and S10, also IP multicasts the
packet to S14 (S10 already received the IP-style packet from the
originator). R9, the DR for S12, need not IP multicast onto S12 since
there are no members present there. R9 CBT unicasts the packet to R10,
which is the DR for S13 and S15. R10 IP multicasts to both S13 and
S15.

Going upstream from R8, R8 CBT unicasts to R4. R4 is the DR for all
its directly connected subnets and therefore IP multicasts the data
packet onto S5, S6 and S7, all of which have member presence. R4
unicasts the packet to all outgoing children, R3 and R7 (NOTE: R4 does
not have a parent since it is the primary core router for the group).
R7 IP multicasts onto S9. R3 CBT unicasts to R1 and R2, its children.
Finally, R1 IP multicasts onto S1 and S3, and R2 IP multicasts onto
S4.

4.1. Non-Member Sending

For a multicast data packet to span beyond the scope of the
originating subnetwork, at least one CBT-capable router must be
present on that subnetwork. The default DR (D-DR) for the group on the
subnetwork must encapsulate the (native) IP-style packet and unicast
it to a core for the group. In native mode this encapsulation
constitutes IP-in-IP. In CBT mode, the encapsulation required is shown
in figure 2. In both cases, CBT routers are required to know
group-to-core mappings. The alternatives for discovering these are
discussed in section 2.1. Further detail on this topic is beyond the
scope of this document.

5. Eliminating the Topology-Discovery Protocol in the Presence of
Tunnels

Traditionally, multicast protocols operating within a virtual
topology, i.e. an overlay of the physical topology, have required the
assistance of a multicast topology discovery protocol, such as that
present in DVMRP.
However, it is possible to have a multicast protocol operate within a
virtual topology without the need for a multicast topology discovery
protocol. One way to achieve this is by having a router configure all
its tunnels to its virtual neighbours in advance. A tunnel is
identified by a local interface address and a remote interface
address. Routing is replaced by "ranking" each such tunnel interface
associated with a particular core address; if the highest-ranked route
is unavailable (tunnel end-points are required to run a Hello-like
protocol between themselves) then the next-highest ranked available
route is selected, and so on. The exact specification of the Hello
protocol is outside the scope of this document.

CBT trees are built using the same join/join-ack mechanisms as before,
only now some branches of a delivery tree run in native mode, whilst
others (tunnels) run in CBT mode. Underlying unicast routing dictates
which interface a packet should be forwarded over. Each interface is
configured as either native mode or CBT mode, so a packet can be
encapsulated (decapsulated) accordingly.

As an example, router R's configuration would be as follows:

intf    type      mode      remote addr
---------------------------------------
 #1     phys      native    -
 #2     tunnel    cbt       128.16.8.117
 #3     phys      native    -
 #4     tunnel    cbt       128.16.6.8
 #5     tunnel    cbt       128.96.41.1

core    backup-intfs
--------------------
 A      #5, #2
 B      #3, #5
 C      #2, #4

The CBT FIB needs to be slightly modified to accommodate an extra
field, "backup-intfs" (backup interfaces). The entry in this field
specifies a backup interface whenever a tunnel interface specified in
the FIB is down. Additional backups (should the first-listed backup be
down) are specified for each core in the core backup table.
For example, if interface (tunnel) #2 were down, and the target core
of a CBT control packet were core A, the core backup table suggests
using interface #5 as a replacement. If interface #5 happened to be
down also, then the same table recommends interface #2 as a backup for
core A.

6. Tree Maintenance

Once a tree branch has been created, i.e. a CBT router has received a
JOIN-ACK for a JOIN-REQUEST previously sent (forwarded), a child
router is required to monitor the status of its parent, and of the
link to its parent, at fixed intervals by means of a "keepalive"
mechanism operating between them. The "keepalive" mechanism is
implemented by means of two CBT control messages: CBT-ECHO-REQUEST and
CBT-ECHO-REPLY. Adjacent CBT routers only need to send one keepalive
per link, regardless of how many groups are present on that link. This
aggregation strategy is expected to conserve considerable bandwidth on
"busy" links, such as those nearer the "centre" of the network.

The keepalive protocol is simple: a child unicasts a CBT-ECHO-REQUEST
to its parent, which unicasts a CBT-ECHO-REPLY in response.

For any CBT router, if its parent router, or path to the parent,
fails, the child is initially responsible for re-attaching itself, and
therefore all routers subordinate to it on the same branch, to the
tree.

6.1. Router Failure

An on-tree router can detect a failure in the following two cases:

+ if the child responsible for sending keepalives across a particular
  link stops receiving CBT-ECHO-REPLY messages. In this case the
  child realises that its parent has become unreachable and must
  therefore try to re-connect to the tree for all groups represented
  on the parent/child link. Until an aggregation strategy is fully
  worked out, a (re)join must be sent for each group individually.
  (We present some ideas on rejoin aggregation in Appendix A).

  The rejoining router (that which is immediately subordinate to the
  failure) sends a JOIN-REQUEST (subcode ACTIVE-JOIN if it has no
  children attached, and subcode REJOIN-ACTIVE if at least one child
  is attached) to the best next-hop router on the path to the elected
  core. If no JOIN-ACK is received after three retransmissions, each
  transmission being at PEND-JOIN-INTERVAL (10 secs), an alternate
  core is elected from the core list, and the process is repeated. If
  all cores have been tried unsuccessfully, the D-DR has no option
  but to give up.

+ if a parent stops receiving CBT-ECHO-REQUESTs from a child. In this
  case the parent simply removes the child interface from FIB entries
  that are represented by that parent/child link.

6.2. Router Re-Starts

There are two cases to consider here:

+ Core re-start. All JOIN-REQUESTs (all types) carry the identities
  (i.e. addresses) of each of the cores for a group. If a router is a
  core for a group, but has only recently re-started, it will not be
  aware that it is a core for any group(s). In such circumstances, a
  core only becomes aware that it is such by receiving a
  JOIN-REQUEST. Subsequent to a core learning its status in this way,
  if it is not the primary core it acknowledges the received join,
  then sends a JOIN-REQUEST (subcode REJOIN-ACTIVE) to the primary
  core. If the re-started router is the primary core, it need take no
  action, i.e. in all circumstances, the primary core simply waits to
  be joined by other routers.

+ Non-core re-start. In this case, the router can only join the tree
  again if a downstream router sends a JOIN-REQUEST through it, or it
  is elected DR for one of its directly attached subnets, and
  subsequently receives an IGMP membership report.

6.3. Route Loops

Routing loops are only a concern when a router with at least one child
is attempting to re-join a CBT tree.
In this case the re-joining router sends a JOIN-REQUEST (subcode
REJOIN-ACTIVE) to the best next-hop on the path to an elected core.
This join is forwarded as normal until it reaches either the specified
core, another core, or a non-core router that is already part of the
tree. If the rejoin reaches the primary core, loop detection is not
necessary. The primary core acks an active rejoin by means of a
JOIN-ACK, subcode PRIMARY-REJOIN-ACK. This ack must be processed by
each router on the reverse-path of the active rejoin. If an active
rejoin is terminated by any router on the tree other than the primary
core, loop detection must take place, as we now describe.

If, in response to an active rejoin, a JOIN-ACK is returned, subcode
NORMAL (as opposed to an ack with subcode PRIMARY-REJOIN-ACK), the
router receiving the ack subsequently generates a JOIN-REQUEST,
subcode REJOIN-NACTIVE (non-active rejoin). This packet serves only to
detect loops; it does not create any transient state in the routers it
traverses, other than the originating router. Any on-tree router
receiving a non-active rejoin is required to forward it over its
parent interface for the specified group. In this way, it will either
reach the primary core, which returns, directly to the sender, a join
ack with subcode PRIMARY-NACTIVE-ACK (so the sender knows no loop is
present), or the sender receives the non-active rejoin it sent, via
one of its child interfaces, in which case the rejoin obviously formed
a loop.

If a loop is present, the non-active rejoin originator immediately
sends a QUIT-REQUEST to its newly-established parent and the loop is
broken.
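
The loop-detection walk described above can be illustrated with a
small simulation. This is only a sketch under stated assumptions: the
function name, the parent map, and the "UNRESOLVED" outcome are
hypothetical and not part of the protocol; a real router forwards the
non-active rejoin hop by hop rather than consulting a global table.

```python
def detect_rejoin_loop(originator, parent_of, primary_core):
    """Trace a non-active rejoin along parent interfaces.

    parent_of is a hypothetical map {router: parent router} for one group.
    Returns "PRIMARY-NACTIVE-ACK" if the primary core is reached, "LOOP"
    if the packet arrives back at the originator, and "UNRESOLVED"
    (an illustrative outcome only) if the walk ends any other way.
    """
    current = parent_of.get(originator)
    hops = 0
    # The hop bound stops the walk if it circles without reaching
    # either the primary core or the originator.
    while current is not None and hops <= len(parent_of):
        if current == primary_core:
            return "PRIMARY-NACTIVE-ACK"   # no loop present
        if current == originator:
            return "LOOP"                  # originator must send a quit
        current = parent_of.get(current)
        hops += 1
    return "UNRESOLVED"


# A four-router parent chain that closes on itself.
looped = {"R3": "R6", "R6": "R5", "R5": "R4", "R4": "R3"}
print(detect_rejoin_loop("R3", looped, "R1"))

# A loop-free chain that ends at the primary core.
clean = {"R3": "R2", "R2": "R1"}
print(detect_rejoin_loop("R3", clean, "R1"))
```

On a "LOOP" result the originator would send a QUIT-REQUEST to its
newly-established parent, as stated above.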

Using figure 4 (below) to demonstrate this, suppose R3 is attempting
to re-join the tree (R1 is the core in figure 4), R3 believes its best
next-hop to R1 is R6, R6 believes R5 is its best next-hop to R1, and
R5 sees R4 as its best next-hop to R1: a loop is formed. R3 begins by
sending a JOIN-REQUEST (subcode REJOIN-ACTIVE, since R4 is its child)
to R6. R6 forwards the join to R5. R5 is on-tree for the group, so it
responds to the active rejoin with a JOIN-ACK, subcode NORMAL (the ack
traverses R6 on its way to R3). R3 now generates a JOIN-REQUEST,
subcode REJOIN-NACTIVE, and forwards this to its parent, R6. R6
forwards the non-active rejoin to R5, its parent. R5 does similarly,
as does R4. Now the non-active rejoin has reached R3, which originated
it, so R3 concludes a loop is present on the parent interface for the
specified group. It immediately sends a QUIT-REQUEST to R6, which in
turn sends a quit to R5 unless it has already received an ack from R5
and itself has a child or subnets with member presence. In that case
R6 does not send a quit, since the loop has been broken by R3 sending
the first quit.

QUIT-REQUESTs are typically acknowledged by means of a QUIT-ACK. A
child removes its parent information immediately after sending its
first QUIT-REQUEST. The ack serves to notify the (old) child that the
parent has in fact removed its child information. However, there might
be cases where, due to failure, the parent cannot respond. The child
sends a QUIT-REQUEST a maximum of three times, at PEND-QUIT-INTERVAL
(10 sec) intervals.
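
The quit retransmission rule can be sketched as follows. This is a
minimal illustration, not a definitive implementation: the function
name, the dictionary standing in for a FIB entry, and the two callbacks
are hypothetical; only the limit of three transmissions and the
10-second interval come from the text.

```python
PEND_QUIT_INTERVAL = 10       # seconds, default value from the text
MAX_QUIT_TRANSMISSIONS = 3    # "a maximum of three times"


def quit_parent(fib_entry, send_quit_request, wait_for_quit_ack):
    """Send QUIT-REQUESTs to the current parent until acked or exhausted.

    fib_entry: hypothetical dict with a "parent" key.
    send_quit_request(parent): hypothetical callback that transmits a quit.
    wait_for_quit_ack(parent, timeout): hypothetical callback returning
    True if a QUIT-ACK arrived within `timeout` seconds.
    Returns True if the quit was acknowledged, False otherwise.
    """
    parent = fib_entry["parent"]
    for attempt in range(MAX_QUIT_TRANSMISSIONS):
        send_quit_request(parent)
        if attempt == 0:
            # Parent information is removed right after the first quit,
            # whether or not an ack ever arrives.
            fib_entry["parent"] = None
        if wait_for_quit_ack(parent, PEND_QUIT_INTERVAL):
            return True
    return False


# Usage with a parent that never answers.
sent = []
entry = {"parent": "R6"}
print(quit_parent(entry, sent.append, lambda parent, timeout: False))
print(sent, entry)
```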

            ------
            | R1 |
            ------
              |
  ---------------------------
              |
            ------
            | R2 |
            ------
              |
  ---------------------------
              |                        |
            ------                     |
            | R3 |---------------------|
            ------                     |
              |                        |
  ---------------------------          |
              |                        |       ------
            ------                     |       |    |
            | R4 |                     |-------| R6 |
            ------                     |       ------
              |                        |
  ---------------------------          |
              |                        |
            ------                     |
            | R5 |---------------------|
            ------                     |
              |

Figure 4: Example Loop Topology

In another scenario the rejoin travels over a loop-free path, and the
first on-tree router encountered is the primary core, R1. In figure 4,
R3 sends a join, subcode REJOIN-ACTIVE, to R2, the next-hop on the
path to core R1. R2 forwards the re-join to R1, the primary core,
which returns a JOIN-ACK, subcode PRIMARY-REJOIN-ACK, over the
reverse-path of the active rejoin. Whenever a router receives a
PRIMARY-REJOIN-ACK no loop detection is necessary.

If we assume R2 is on-tree for the corresponding group, R3 sends a
join, subcode REJOIN-ACTIVE, to R2, which replies with a join ack,
subcode NORMAL. R3 must then generate a loop detection packet (join
request, subcode REJOIN-NACTIVE) which is forwarded to its parent, R2,
which does similarly. On receipt of the non-active rejoin, the primary
core unicasts a join ack back directly to R3, with subcode
PRIMARY-NACTIVE-ACK. This confirms to R3 that its rejoin does not form
a loop.

7. Data Packet Loops

The CBT protocol builds a loop-free distribution tree. If all routers
that comprise a particular tree function correctly, data packets
should never traverse a tree branch more than once.

CBT routers will only forward native-style data packets if they are
received over a valid on-tree interface. A native-style data packet
that is not received over such an interface is discarded.

Encapsulated CBT data packets from a non-member sender can arrive via
an "off-tree" interface (this is how CBT-mode sends data across
tunnels, and how data from non-member senders in native-mode or
CBT-mode reaches a tree). The encapsulating CBT data packet header
includes an "on-tree" field, which contains the value 0x00 until the
data packet reaches an on-tree router. At this point, the router must
convert this value to 0xff to indicate the data packet is now on-tree.
This value remains unchanged, and from here on the packet should
traverse only on-tree interfaces. If an encapsulated packet happens to
"wander" off-tree and back on again, the latter on-tree router will
receive the CBT encapsulated packet via an off-tree interface.
However, this router will recognise that the "on-tree" field of the
encapsulating CBT header is set to 0xff, and so immediately discards
the packet.

8. CBT Packet Formats and Message Types

CBT packets travel in IP datagrams. We distinguish between two types
of CBT packet: CBT data packets, and CBT control packets. CBT control
packets carry a CBT control header. All CBT control messages are
implemented over UDP. CBT mode data (figure 2) requires a CBT data
packet header.

8.1. CBT Header Format (for CBT Mode data)

 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| vers  |unused |     type      |  hdr length   |on-tree|unused |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           checksum            |    IP TTL     |    unused     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       group identifier                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   reserved    |   reserved    |     Type      |    Length     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        .....VALUE....                         |
|             (for flow-id and/or security options)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 5. CBT Header

Each of the fields is described below:

+ vers: Version number -- this release specifies version 1.

+ type: indicates CBT payload is data. The only value defined for
  this field is 255 (0xff).

+ hdr length: length of the header, for purpose of checksum
  calculation.

+ on-tree: indicates whether the packet is on-tree (0xff) or off-tree
  (0x00). Once this field is set (i.e. on-tree), it does not change.
  This field can only be set by a router that has a FIB entry for the
  corresponding group, i.e. a router that has received a join-ack for
  a join-request previously sent/forwarded.

+ checksum: the 16-bit one's complement of the one's complement sum
  of the CBT header, calculated across all fields.

+ IP TTL: TTL value taken from the IP header of the packet as
  originated. It is decremented each time the packet traverses a CBT
  router.

+ group identifier: multicast group address.

+ The TLV fields at the end of the header are for a flow-identifier,
  and/or security options, if and when implemented. A "type" value of
  zero implies a "length" of zero, implying there is no "value"
  field.

8.2. Control Packet Header Format

The individual fields are described below. It should be noted that
only certain fields beyond "group identifier" are processed for the
different control messages.

 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| vers  |unused |     type      |     code      |    # cores    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          hdr length           |           checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       group identifier                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         packet origin                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     primary core address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 target core address (core #1)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Core #2                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Core #3                            |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   reserved    |   reserved    |     Type      |    Length     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        .....VALUE....                         |
|             (for flow-id and/or security options)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 6. CBT Control Packet Header

+ vers: Version number -- this release specifies version 1.

+ type: indicates control message type (see sections 8.3, 8.3.1).

+ code: indicates subcode of control message type.

+ # cores: number of core addresses carried by this control packet
  (does not include "primary core address" field).

+ hdr length: length of the header, for purpose of checksum
  calculation.

+ checksum: the 16-bit one's complement of the one's complement sum
  of the CBT control header, calculated across all fields.

+ group identifier: multicast group address.

+ packet origin: address of the CBT router that originated the
  control packet.

+ primary core address: the address of the primary core for the
  group.

+ target core address: desired core affiliation of control message.

+ Core #Z: Z refers to some arbitrary IP address representing a core.

+ The TLV fields at the end of the header are for a flow-identifier,
  and/or security options, if implemented. A "type" value of zero
  implies a "length" of zero, implying there is no "value" field.

8.3. CBT Control Message Types

There are eight types of CBT message. All are encoded in the CBT
control header, shown in figure 6.

+ JOIN-REQUEST (type 1): generated by a router and unicast to the
  specified core address. It is processed hop-by-hop on its way to
  the specified core. Its purpose is to establish the sending CBT
  router, and all intermediate CBT routers, as part of the
  corresponding delivery tree. Note that all cores are carried in
  join-requests.

+ JOIN-ACK (type 2): an acknowledgement to the above. The full list
  of core addresses is carried in a JOIN-ACK, together with the
  actual core affiliation (the join may have been terminated by an
  on-tree router on its journey to the specified core, and the
  terminating router may or may not be affiliated to the core
  specified in the original join). A JOIN-ACK traverses the same path
  as the corresponding JOIN-REQUEST, with each CBT router on the path
  processing the ack. It is the receipt of a JOIN-ACK that actually
  creates a tree branch.

+ JOIN-NACK (type 3): a negative acknowledgement, indicating that the
  tree join process has not been successful.

+ QUIT-REQUEST (type 4): a request, sent from a child to a parent, to
  be removed as a child to that parent.

+ QUIT-ACK (type 5): acknowledgement to the above. If the parent, or
  the path to it, is down, no acknowledgement will be received within
  the timeout period. This results in the child nevertheless removing
  its parent information.

+ FLUSH-TREE (type 6): a message sent from parent to all children,
  which traverses a complete branch. This message results in all tree
  interface information being removed from each router on the branch,
  possibly because of a re-configuration scenario.

+ CBT-ECHO-REQUEST (type 7): once a tree branch is established, this
  message acts as a "keepalive", and is unicast from child to parent
  (one per link, NOT one per group).

+ CBT-ECHO-REPLY (type 8): positive reply to the above.

8.3.1. CBT Control Message Subcodes

The JOIN-REQUEST has three valid subcodes:

+ ACTIVE-JOIN (code 0) - sent from a CBT router that has no children
  for the specified group.

+ REJOIN-ACTIVE (code 1) - sent from a CBT router that has at least
  one child for the specified group.

+ REJOIN-NACTIVE (code 2) - generated by a router subsequent to
  receiving a join ack, subcode NORMAL, in response to an active
  rejoin.

A JOIN-ACK has three valid subcodes:

+ NORMAL (code 0) - sent by a core router, or on-tree non-core
  router, acknowledging joins with subcodes ACTIVE-JOIN and
  REJOIN-ACTIVE.

+ PRIMARY-REJOIN-ACK (code 1) - sent by a primary core to acknowledge
  the receipt of a join-request received with subcode REJOIN-ACTIVE.
  This message traverses the reverse-path of the corresponding
  re-join, and is processed by each router on that path.

+ PRIMARY-NACTIVE-ACK (code 2) - sent by a primary core to
  acknowledge the receipt of a join-request received with subcode
  REJOIN-NACTIVE. This ack is unicast directly to the router that
  generated the non-active rejoin, i.e. the ack is not processed
  hop-by-hop.

9. CBT Protocol and Port Numbers

CBT mode (data) encapsulation (figure 2) requires an IP protocol
number assignment for CBT. An official protocol number has recently
been approved by the IANA; CBT has IP protocol number 7.
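
Looking back at the control header of figure 6 and the type codes of
section 8.3, the following sketch packs a control header and fills in
its checksum. Several points are assumptions rather than statements of
the specification: the field widths are read off figure 6 (4-bit vers
and unused, 8-bit type, code and # cores, 16-bit hdr length and
checksum), hdr length is taken to be in bytes, addresses are IPv4, the
TLV word is left all zero, and the checksum is computed with the
checksum field set to zero, as for the usual Internet checksum. The
names MsgType, internet_checksum and pack_control_header are
hypothetical.

```python
import socket
import struct
from enum import IntEnum


class MsgType(IntEnum):
    """Control message types as numbered in section 8.3."""
    JOIN_REQUEST = 1
    JOIN_ACK = 2
    JOIN_NACK = 3
    QUIT_REQUEST = 4
    QUIT_ACK = 5
    FLUSH_TREE = 6
    CBT_ECHO_REQUEST = 7
    CBT_ECHO_REPLY = 8


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pack_control_header(msg_type, code, group, origin, primary_core, cores):
    """Build a control header; cores[0] is the target core (core #1)."""
    addrs = [group, origin, primary_core] + list(cores)
    body = b"".join(socket.inet_aton(a) for a in addrs)
    tlv = struct.pack("!BBBB", 0, 0, 0, 0)  # reserved, reserved, Type 0, Length 0
    hdr_len = 8 + len(body) + len(tlv)      # assumed to be counted in bytes

    def first_words(checksum):
        # vers=1 in the high nibble, low nibble unused
        return struct.pack("!BBBBHH", 1 << 4, int(msg_type), code,
                           len(cores), hdr_len, checksum)

    unsummed = first_words(0) + body + tlv
    return first_words(internet_checksum(unsummed)) + body + tlv


pkt = pack_control_header(MsgType.JOIN_REQUEST, 0, "224.1.2.3",
                          "10.0.0.1", "10.0.0.9", ["10.0.0.9", "10.0.0.7"])
print(len(pkt), internet_checksum(pkt))
```

A receiver using the same convention can verify a header by running
internet_checksum over it as received; a correct header yields zero.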

CBT control packets travel inside UDP datagrams, as the following
diagram illustrates:

++++++++++++++++++++++++++++++++++++++++++++
| IP header | UDP header | CBT control pkt |
++++++++++++++++++++++++++++++++++++++++++++

Figure 7. Encapsulation for CBT control messages

CBT therefore requires a UDP port assignment for control messages. An
official UDP port number has recently been approved by the IANA; CBT
control messages are received on UDP port 7777.

10. Default Timer Values

There are several CBT control messages which are transmitted at fixed
intervals. These values, retransmission times, and timeout values, are
given below. Note these are recommended default values only, and are
configurable with each implementation (all times are in seconds):

+ CBT-ECHO-INTERVAL 30 (time between sending successive
  CBT-ECHO-REQUESTs to parent)

+ PEND-JOIN-INTERVAL 10 (retransmission time for join-request if no
  ack rec'd)

+ PEND-JOIN-TIMEOUT 30 (time to try joining a different core, or give
  up)

+ EXPIRE-PENDING-JOIN 90 (remove transient state for join that has
  not been ack'd)

+ PEND-QUIT-INTERVAL 10 (retransmission time for quit-request if no
  ack rec'd)

+ CBT-ECHO-TIMEOUT 90 (time to consider parent unreachable)

+ CHILD-ASSERT-INTERVAL 90 (increment child timeout if no ECHO rec'd
  from a child)

+ CHILD-ASSERT-EXPIRE-TIME 180 (time to consider child gone)

+ IFF-SCAN-INTERVAL 300 (scan all interfaces for group presence; if
  none, send QUIT)

11. Interoperability Issues

One of the design goals of CBT is for it to fully interwork with other
IP multicast schemes. We have already described how CBT-style packets
are transformed into IP-style multicasts, and vice-versa.

In order for CBT to fully interwork with other schemes, it is
necessary to define the interface(s) between a "CBT cloud" and the
cloud of another scheme. The CBT authors are currently working out the
details of interoperability, and we expect an interoperability
document to be available shortly.

12. CBT Security Architecture

See current I-D:
ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-mkd-01.{ps,txt}

Acknowledgements

Special thanks go to Paul Francis, NTT Japan, for the original
brainstorming sessions that brought about this work.

Thanks too to Sue Thompson (Bellcore). Her detailed reviews led to the
identification of some subtle protocol flaws, and she suggested
several simplifications.

Thanks also to the networking team at Bay Networks for their comments
and suggestions, in particular Steve Ostrowski for his suggestion of
using "native mode" as a router optimization, and Eric Crawley.

Thanks also to Ken Carlberg (SAIC) for reviewing the text, and
generally providing constructive comments throughout.

I would also like to thank the participants of the IETF IDMR working
group meetings for their general constructive comments and suggestions
since the inception of CBT.

APPENDIX A

A single rejoin could be sent for all the groups the keepalive
represents. This constitutes an aggregated rejoin strategy; a single
rejoin message can serve to rejoin multiple groups to their respective
trees, provided those groups share a common core (that which is being
rejoined). Therefore, it may be that several rejoins need to be sent
to re-connect all groups traversing the router after a failure.
Similarly, the corresponding join-ack would represent an aggregate.
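
The grouping step implied by this idea can be sketched as below. It is
an illustration only: the function name and the group-to-core map are
hypothetical, and the appendix itself leaves the message format for an
aggregated rejoin undefined.

```python
from collections import defaultdict


def plan_aggregated_rejoins(group_to_core):
    """Group the affected multicast groups by the core being rejoined.

    group_to_core: hypothetical map {group address: core identifier} for
    all groups carried on the failed parent/child link.
    Returns {core: [groups]}; one aggregated rejoin would be sent per key.
    """
    by_core = defaultdict(list)
    for group, core in sorted(group_to_core.items()):
        by_core[core].append(group)
    return dict(by_core)


plan = plan_aggregated_rejoins({
    "224.1.1.1": "A",
    "224.1.1.2": "B",
    "224.1.1.3": "A",
})
print(plan)   # two rejoins in this example: one for core A, one for core B
```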

NOTE: it remains to be worked out how the new parent establishes from
the aggregated rejoin all those groups which the rejoin represents (so
the new parent can create/modify the necessary FIB entries). A "group
aggregate" field may be necessary in the control packet.
Alternatively, when the ack is received in response to the rejoin, the
rejoining router sends a group-specific echo for each group
represented by the rejoin, until an ack is received for each.

Authors' Addresses:

Tony Ballardie,
Department of Computer Science,
University College London,
Gower Street,
London, WC1E 6BT,
ENGLAND, U.K.

Tel: ++44 (0)71 419 3462
e-mail: A.Ballardie@cs.ucl.ac.uk

Scott Reeve,
Bay Networks, Inc.
3, Federal Street,
Billerica, MA 01821,
USA.

Tel: ++1 508 670 8888
e-mail: sreeve@BayNetworks.com

Nitin Jain,
Bay Networks, Inc.
3, Federal Street,
Billerica, MA 01821,
USA.

Tel: ++1 508 670 8888
e-mail: njain@BayNetworks.com

References

[1] DVMRP. Described in "Multicast Routing in a Datagram
Internetwork", S. Deering, PhD Thesis, 1990. Available via anonymous
ftp from: gregorio.stanford.edu:vmtp/sd-thesis.ps.

[2] J. Moy. Multicast Routing Extensions to OSPF. Communications of
the ACM, 37(8): 61-66, August 1994.

[3] D. Farinacci, S. Deering, D. Estrin, and V. Jacobson. Protocol
Independent Multicast (PIM) Dense-Mode Specification
(draft-ietf-idmr-pim-spec-01.ps). Working draft, 1994.

[4] A. J. Ballardie. Scalable Multicast Key Distribution
(ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-mkd-01.{ps,txt}).
Working draft, 1995.

[5] A. J. Ballardie. "A New Approach to Multicast Communication in a
Datagram Internetwork", PhD Thesis, 1995. Available via anonymous ftp
from: cs.ucl.ac.uk:darpa/IDMR/ballardie-thesis.ps.Z.

[6] W. Fenner. Internet Group Management Protocol, version 2 (IGMPv2),
(draft-idmr-igmp-v2-01.txt).

[7] B. Cain, S. Deering, A. Thyagarajan. Internet Group Management
Protocol Version 3 (IGMPv3) (draft-cain-igmp-00.txt).

[8] M. Handley, J. Crowcroft, I. Wakeman. Hierarchical Rendezvous
Point proposal, work in progress.
(http://www.cs.ucl.ac.uk/staff/M.Handley/hpim.ps) and
(ftp://cs.ucl.ac.uk/darpa/IDMR/IETF-DEC95/hpim-slides.ps).

[9] D. Estrin et al. USC/ISI, Work in progress.
(http://netweb.usc.edu/pim/).

[10] D. Estrin et al. PIM Sparse Mode Specification.
(draft-ietf-idmr-pim-sparse-spec-00.txt).