idnits 2.17.1 draft-ietf-idmr-cbt-spec-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-24) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity. ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. 
** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Introduction section. ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There are 31 instances of too long lines in the document, the longest one being 7 characters in excess of 72. ** There are 6 instances of lines with control characters in the document. ** The abstract seems to contain references ([RFC1704], [RFC1546]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 113 has weird spacing: '...on) of vario...' == Line 114 has weird spacing: '...his may not ...' == Line 368 has weird spacing: '...r which is d...' == Line 369 has weird spacing: '...mented each ...' == Line 370 has weird spacing: '... router will...' == (8 more instances...) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? 'RFC 1704' on line 44 looks like a reference -- Missing reference section? 'RFC 1546' on line 348 looks like a reference Summary: 13 errors (**), 0 flaws (~~), 7 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Inter-Domain Multicast Routing (IDMR) A. J. Ballardie 3 INTERNET-DRAFT University College London 4 April 18th, 1995 6 Core Based Trees (CBT) Multicast 8 -- Architectural Overview and Specification -- 9 11 Status of this Memo 13 This document is an Internet Draft. Internet Drafts are working do- 14 cuments of the Internet Engineering Task Force (IETF), its Areas, and 15 its Working Groups. (Note that other groups may also distribute work- 16 ing documents as Internet Drafts). 18 Internet Drafts are draft documents valid for a maximum of six 19 months. Internet Drafts may be updated, replaced, or obsoleted by 20 other documents at any time. It is not appropriate to use Internet 21 Drafts as reference material or to cite them other than as a "working 22 draft" or "work in progress." 24 Please check the I-D abstract listing contained in each Internet 25 Draft directory to learn the current status of this or any other In- 26 ternet Draft. 28 Abstract 30 CBT is a new architecture for local- and wide-area IP multicasting, 31 being unique in its utilization of just one shared delivery tree, as 32 opposed to the source-based delivery trees of traditional IP multi- 33 cast schemes. 35 The primary advantage of the CBT approach is that it typically 36 offers more favourable scaling characteristics than do existing mul- 37 ticast algorithms. 
The definition of a new network layer multicast 38 protocol has also meant that it has been possible to integrate en- 39 riched functionality into multicast that is not possible under other 40 IP multicast schemes, for example, the incorporation of security 41 features. Besides this functionality providing the ability to authen- 42 ticate tree-joining hosts and routers, optional in-built protocol 43 mechanisms provide a scalable solution to the multicast key distribu- 44 tion problem [RFC 1704]. 46 CBT is backwards compatible with traditional IP-style multicast. Host 47 changes are not required, but a local CBT-capable router is mandatory 48 if CBT-style multicasts are to be forwarded beyond the local subnet- 49 work. 51 1. Background 53 Centre based forwarding was first described in the early 1980s by 54 Wall in his PhD thesis on broadcast and selective broadcast. At that 55 time, multicast was in its very earliest stages of development, and 56 researchers were only just beginning to realise the benefits that 57 could be gained from it, and some of the uses it could be put to. It 58 was only later that the class-D multicast address space was defined, 59 and later again that intrinsic multicast support was taken advantage 60 of for broadcast media, such as Ethernet. 62 Now that we have several years' practical experience with multicast, a 63 diversity of multicast applications, and an internetwork infrastruc- 64 ture that wants to support it to an ever-increasing degree, we re- 65 visit the centre-based forwarding paradigm introduced by Wall, and 66 mould and adapt it specifically for today's multicast environment. 68 2. Introduction 70 Multicast group communication is an increasingly important capability 71 in many of today's data networks. Most LANs and more recent wide-area 72 network technologies such as SMDS and ATM specify multicast as part 73 of their service. 
75 Since the wide-area introduction of multicasting there has been a 76 large increase in the number and diversity of multicast applications, 77 examples of which include audio and video conferencing, replicated 78 database updating and querying, software update distribution, stock 79 market information services, and more recently, resource discovery. 80 Multimedia is another fast expanding area for which multicast offers 81 an invaluable service. It has therefore been necessary of late to 82 address the topic of scalability with regard to multicast algo- 83 rithms, since, if they do not scale to an internetwork size that is 84 expected (given the growth rate of the last several years), they can- 85 not be of long-lasting benefit. This motivates the need for new multi- 86 casting techniques to be investigated. 88 This draft describes a new multicast routing architecture and proto- 89 col which is applicable to a datagram network. The CBT architecture 90 has attractive scaling characteristics. We measure scalability in 91 terms of network state maintenance, bandwidth, and processing costs. 93 3. Document Layout 95 The remainder of this document is divided into four parts: Part A 96 offers a general architectural overview and discussion of the CBT 97 architecture. This part also includes a description of CBT ``any- 98 casting'' [see RFC 1546]. 100 Parts B and C comprise the protocol specification. Part B describes 101 protocol engineering design features, such as CBT group initiation, 102 the tree joining process, tree maintenance issues, the tree leaving 103 process, LAN issues, data packet forwarding, and data packet encapsu- 104 lation and translation (see footnote 1). 106 Part C illustrates and describes in detail individual CBT packet 107 formats and message types. 109 Part D looks briefly at some other related issues. 
111 _________________________ 112 1 We will refer to the copying (and sometimes altera- 113 tion) of various fields of the IP header to a CBT 114 header as translation throughout. This may not be in 115 total agreement with how the term is used elsewhere. 117 Part A 119 1. CBT - The New Architecture 121 2. Architectural Overview 123 A core-based tree involves having a single node, in our case a router 124 (with additional routers for robustness), known as the core of the 125 tree, from which branches emanate. These branches are made up of 126 other routers, so-called non-core routers, which form a shortest for- 127 ward path between a member-host's directly attached router, and the 128 core. A router at the end of a branch shall be known as a leaf router 129 on the tree. 131 The CBT protocol builds a delivery tree reflecting the architecture 132 just described. This architecture allows for the enhancement of the 133 scalability of the multicast algorithm with regard to group-specific 134 state maintained in the network, particularly for the case where 135 there are many active senders in a particular group. The CBT archi- 136 tecture offers an improvement in scalability over existing techniques 137 by a factor of the number of active sources (where a source is a sub- 138 network aggregate). Hence, a core-based architecture allows us to 139 significantly improve the overall scaling factor of S * N we have in 140 the source-based tree architecture, to just N. This is the result of 141 having just one multicast tree per group as opposed to one tree per 142 (source, group) pair. 144 It is also interesting to note that routers between a non-member 145 sender and the CBT delivery tree need no knowledge of the multicast 146 tree/group whatsoever in order to forward CBT multicasts, since these 147 are unicast towards the core. This two-phase routing approach is 148 unique to the CBT architecture.
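The two-phase routing just described can be illustrated with a minimal Python sketch. It is not part of the CBT specification: the router names, the `tree_links` adjacency map, and both function names are hypothetical, and flooding over tree links is modelled as a simple breadth-first walk.

```python
from collections import deque

def first_on_tree_router(unicast_path, on_tree):
    """Phase 1: follow the unicast path towards the core until the tree is hit.

    Routers before the returned one hold no group state; they only forward
    a unicast packet addressed to the core.
    """
    for router in unicast_path:
        if router in on_tree:
            return router
    return None  # the path never reached the tree (e.g. core unreachable)

def disseminate(entry, tree_links):
    """Phase 2: span the tree from the entry router; returns the visit order."""
    seen = [entry]
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(tree_links.get(node, ())):
            if neighbour not in seen:
                seen.append(neighbour)
                queue.append(neighbour)
    return seen

# Hypothetical topology: R1 and R2 are off-tree, R3..R5 and CORE are on-tree.
tree_links = {
    "CORE": {"R3", "R4"},
    "R3": {"CORE", "R5"},
    "R4": {"CORE"},
    "R5": {"R3"},
}
path = ["R1", "R2", "R3", "CORE"]  # unicast path from a non-member sender
entry = first_on_tree_router(path, set(tree_links))
reached = disseminate(entry, tree_links)
```

In this made-up topology the packet meets the tree at R3, before it ever reaches the addressed core, and only from there is group state consulted.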
One application that can take 149 advantage of this two-phase routing is resource discovery, whereby a 150 resource, for example, a replicated database, is distributed in dif- 151 ferent locations throughout the Internet. The databases in the dif- 152 ferent locations make up a single multicast group, linked by a CBT 153 tree. A client need only know the address of (one of) the core(s) for 154 the group in order to send (unicast) a request to it. Such a request 155 would not span the tree in this case, but would be answered by the 156 first tree router encountered, making it quite likely that the 157 request is answered by the ``nearest'' server. Effectively, this 158 corresponds to an ``anycast'' service [RFC 1546] (see section X). 160 A single-core CBT tree is shown in the figure 161 below. Only one core is shown to demonstrate the principle.

163         b      b     b-----b
164          \     |     |
165           \    |     |
166            b---b     b------b
167           /     \  /              KEY....
168          /       \/
169         b         X---b-----b     X = Core
170                  / \              b = non-core router
171                 /   \
172                /     \
173               b       b------b
174              / \             |
175             /   \            |
176            b     b           b

178 Figure 1: Single-Core CBT Tree 180 2.1. Architectural Justification 182 First of all, exactly what is a core-based tree (CBT) architecture? 183 Core-based, or centre-based, forwarding trees were first described by 184 Wall in his investigation into low-delay approaches to broadcast and 185 selective broadcast. Wall concluded that delay will not be minimal, 186 as it is with shortest-path trees, but the delay can be kept within bounds 187 that may be acceptable. Simulations have recently been carried out 188 to compare the maximum and average delays of centre-based and 189 shortest-path trees. A summary of these simulations can be found in 191 In the context of multicast, the extent to which the delay charac- 192 teristics of a shared tree are less optimal than those of shortest-path trees (SPTs) is question- 193 able. 
The simulation results state that CBTs incur, on average, a 10% 194 increase in delay over SPTs. Slight discrepancies in delay may not 195 be a critical factor for many multicast applications, such as 196 resource discovery or database updating/querying. Even for real-time 197 applications such as voice and video conferencing, a core-based tree 198 may indeed be acceptable, especially if the majority of branches of 199 that tree span high-bandwidth links, such as optical fibre. In 200 several years' time it is easy to envisage the Internet being host to 201 thousands of active multicast groups, and similarly, the bandwidth 202 capacity on many of the Internet links may well far exceed that of 203 today. 205 An important question raised in the SPT vs. CBT debate is: how effec- 206 tively can load sharing be achieved by the different schemes? It 207 would seem that SPT schemes cannot achieve load balancing because of 208 the nature of their forwarding: nodes on an SPT do not have the option 209 to forward incoming packets over different links (i.e. load balance) 210 because of the danger of loops forming in the multicast tree topol- 211 ogy. 213 With shared tree schemes, however, each receiver can choose which of 214 the small selection of cores it wishes to join. Cores and on-tree 215 nodes can be configured to accept only a certain number of joins, 216 forcing a receiver to join via a different path. This flexibility 217 gives shared tree schemes the ability to achieve load balancing. 219 In general, spread over all groups, CBT has the ability to randomize 220 the group set over different trees (spanning different links around 221 the centre of the network), something that would not seem possible 222 under SPT schemes. 224 Finally, the CBT protocol requires each receiver to explicitly join 225 the delivery tree, resulting in a tree spanning only a group's 226 receivers. 
As a result, data flows only over those links that lead to 227 receivers, and thus there is no requirement for off-tree routers to 228 maintain the prune state that source-based schemes use to prevent data flow where it is not 229 needed. 231 2.2. The Implications of Shared Trees 233 The trade-offs introduced by the CBT architecture lie primarily 234 between a reduction in the overall state the network must maintain 235 (given that a group has a significant proportion of active senders), 236 and the potential increased delay imposed by a shared delivery tree. 238 We have emphasized CBT's much improved scalability over existing 239 schemes for the case where there are active group senders. How- 240 ever, because CBT takes a ``hard-state'' approach to tree building, i.e. 241 group tree link information does not time out after a period of inac- 242 tivity as it does in most source-based architectures, 243 source-based architectures scale best when there are no senders to a 244 multicast group. This is because multicast routers in the network 245 eventually time out all information pertaining to an inactive group. 246 Source-based trees are said to be built ``on-demand'', and are 247 ``data-driven''. 249 A consequence of the ``hard-state'' approach is that multicast tree 250 branches do not automatically adapt to underlying multicast route 251 changes (footnote: if multicast were part of the global internetwork 252 infrastructure, multicast routes would be gleaned exclusively from 253 unicast routes). This is in contrast to the ``soft-state'', data- 254 driven approach -- data always follows the path as specified in the 255 routing table. 
Provided reachability is not lost, it is advantageous, 256 from the perspective of uninterrupted packet flow, that a multicast 257 route is kept constant, but there are two disadvantages: a route may 258 not be optimal for its entire duration, and ``hard-state'' requires 259 the incorporation of control messages that monitor reachability 260 between adjacent routers on the multicast tree. This control message 261 overhead can be quite considerable unless some form of message aggre- 262 gation is employed. 264 In terms of the effectiveness of the CBT approach to multicasting, 265 the increased delay factor imposed by a shared delivery tree may not 266 always be acceptable, particularly if a portion of the delivery tree 267 spans low bandwidth links. This is especially relevant for real-time 268 applications, such as voice conferencing. 270 Another consequence of one shared delivery tree is that the cores for 271 a particular group, especially large, widespread groups with numerous 272 active senders, can potentially become traffic ``hot-spots'' or 273 ``bottlenecks''. This has been referred to as the traffic concen- 274 tration effect in 276 A CBT tree is made up of a collection of branches, 277 each rooted at the tree node that originated a join-request, and terminat- 278 ing at the tree node that acknowledged the same join. This has impli- 279 cations where asymmetric routes are concerned (similar to source- 280 based schemes based on RPF) -- whilst the same CBT branch is used for 281 data packet flow in both directions, only the child-to-parent direc- 282 tion constitutes a valid route reflecting the underlying unicast 283 route (at least at the time the branch was created). In the 284 parent-to-child direction, the path does not necessarily reflect 285 underlying unicast routing at any instant, and therefore, in a 286 policy-oriented environment, this might have disadvantageous 287 side-effects. 
289 Finally, there are questions concerning the cores of a group 290 tree: how are they selected, where are they placed, how are they 291 managed, and how do new group members get to know about them? We have 292 attempted to implement some very simple heuristics to address some of 293 these questions in section X, but these may not be appropriate for 294 large-scale implementation of CBT. Work is currently underway in the 295 development of a core placement/location protocol. 297 We conclude in section X that most aspects of core management are 298 topics of further research. 300 3. CBT and ``Anycasting'' 302 3.1. Overview of ``Anycasting'' 304 Anycasting [RFC 1546] is a proposed best-effort, stateless, datagram 305 delivery service which is used by hosts primarily to locate particular 306 services on an internetwork. The goal of anycast is for a client to 307 transmit one request to a resource ``anycast address'', and for a sin- 308 gle, preferably nearest, server to receive the request and respond to 309 it. 311 The motivation for anycasting is that it simplifies the task of finding 312 the appropriate server in a network, and obviates the need to configure 313 applications with particular server address(es), for example, as in DNS 314 resolvers. 316 Questions that, as yet, remain unanswered regarding anycasting include: 317 how best can anycasting be achieved, and should anycast addresses be a 318 special class of IP address? 320 As for how best to achieve anycast, there are two possible approaches: 321 use existing IP multicast, or, answering our second question, define a 322 special class of IP anycast address within the IP address space, and 323 have servers additionally bind an anycast address on which they listen 324 for client requests. 326 Using existing IP multicast has problems associated with it. 
Firstly, 327 using expanding ring search to locate a network resource is inefficient: 328 it requires potentially many re-transmissions of the 329 request from the client, each iteration requiring a larger TTL (see 330 footnote 11) value. This continues until a response is received. 332 The other problem with using IP multicast is that, for any multicast 333 transmission, potentially more than one response may be received. To 334 summarize, using existing IP multicast for anycast is inefficient in its 335 use of network resources, and does not necessarily achieve the desired 336 goal of anycast, namely that only one server respond to a client 337 request. Also, anycasting should not require managing the IP TTL value 338 of client request packets -- the goal of anycast is to send a single 339 packet, which follows a single path, in order to locate a single, 340 preferably nearest, server. 342 Defining a special class of ``anycast'' addresses has several problems 343 associated with it. For example, routing must be adapted to support yet 344 another class of IP address, and routing tables would be required to 345 support anycast routes. Furthermore, segmenting the IP address space 346 yet further not only involves significant administrative burden, but 347 also assumes that existing applications will recognise particular 348 addresses as being anycast [RFC 1546]. 350 3.2. The CBT ``Anycast'' Solution 352 It so happens that the CBT multicast architecture provides an effective 353 solution to the anycasting problem, without requiring the definition of 354 special anycast addresses. 356 The CBT architecture was explained in section 2. CBT is especially 357 attractive for resource discovery applications, where it is assumed that 358 different network resources form distinct CBT groups. 
The reason CBT is 359 particularly suited to resource discovery, as described, is that it 360 typically involves many senders that are not group members. 361 As we have already explained, CBT multicast, unlike other IP multicast 362 schemes, involves maintaining group-specific state in the network that 363 is independent of the number of active sources. Moreover, this state is 364 constrained to the tree links that span only a group's receivers. 366 In CBT multicast, non-member senders actually utilize unicast to route 367 _________________________ 368 11 This is a field of the IP header which is decre- 369 mented each time the corresponding packet traverses a 370 router. If the TTL field reaches zero, a router will 371 discard the packet. 373 multicast data to the CBT delivery tree. This is known as CBT's 2-phase 374 routing. These packets are unicast addressed to a single core router (of 375 which there may be several), and will first encounter the delivery tree 376 either at the addressed core, or at an on-tree (non-core) router that is 377 on the unicast path between the sender and the addressed core. 379 For typical multicast applications, the receiving on-tree router dissem- 380 inates the received packet(s) to adjacent outgoing on-tree neighbours, 381 and neighbours proceed similarly on receipt of a packet. This is how 382 multicast data packets span a CBT tree. 384 For anycast (and resource discovery applications) however, the first 385 on-tree node encountered does not disseminate the packet further, but 386 responds to the received request. 388 Thus, we believe that CBT offers an effective solution to ``anycasting'' 389 and resource discovery in general. However, some questions remain: what 390 level of fault tolerance does the CBT solution offer, by what means does 391 a sender establish the unicast address of a CBT core router, and 392 finally, is there a guarantee that a client request will hit the CBT 393 tree, i.e. 
reach a server, at the nearest point to the sender? 395 The question of fault tolerance is indirectly related to the question of 396 establishing a core address. A CBT tree should never comprise only one 397 core router for reasons of robustness. We envisage there should be at 398 least two cores for local groups, and possibly up to five for wide-area 399 groups. By whatever means a client establishes the identity of a core, 400 it will always simultaneously establish the identities of all cores for 401 a particular tree. 403 So, how might core addresses be discovered? One obvious solution 404 would be to advertise core addresses, together with their associated net- 405 work resource, in an application such as, or very much like, ``sd''. 407 With regard to our final question, the choice of core will determine if 408 a packet reaches a nearest server. Since users cannot be expected to 409 know about network topology, it is assumed that the choice of core will 410 be fairly random. Hence, our scheme makes no guarantees that a client 411 request will reach the nearest server. 413 Part B 415 1. Protocol Overview 417 1.1. CBT Group Initiation 419 As in other multicast schemes, one user, the group initia- 420 tor, initiates a CBT multicast group. The procedures involved in ini- 421 tiating and joining a CBT group involve a little more user interac- 422 tion than current IP multicast schemes; for example, it is necessary 423 to supply information such as desired group scope, as well as select 424 the primary core from a selection of pre-configured core routers. 425 Explicit core rankings help prevent loops when the core tree is ini- 426 tially set up. They also assist in the tree maintenance process should 427 the tree become partitioned. 429 Group initiation could be carried out by a network management centre, 430 or by some other external means, rather than have a user act as group 431 initiator. 
However, in the author's implementation, this flexibility 432 has been afforded the user, and a CBT group is invoked by means of a 433 graphical user interface (GUI), known as the CBT User Group Manage- 434 ment Interface. 436 NOTE: Work is currently in progress to address the issue of core 437 placement. 439 1.2. Tree Joining Process 441 Once the cores have been enumerated by a group's initiator, and the 442 application, port number etc. have been selected, the group- 443 initiating host sends a special CORE-NOTIFICATION message to each of 444 them, which is acknowledged. The purpose of this message is twofold: 445 firstly, to communicate the identities of all of the cores, together 446 with their rankings, to each of them individually; secondly, to 447 invoke the building of the core backbone. These two procedures follow 448 one another in the order just described. New receivers 449 attempting to join whilst the building of the core backbone is still 450 in progress have their explicit JOIN-REQUEST messages stored by 451 whichever CBT-capable router, involved in the core joining process, 452 is encountered first. Routers on the core backbone will usually 453 include not only the cores themselves, but intervening CBT-capable 454 routers on the unicast path between them. Once this setup is com- 455 plete, any pending joins for the same group can be acknowledged. 457 All the CBT-capable routers traversed by a JOIN-ACKnowledgement change 458 their status to CBT-non-core routers for the group identified by 459 group-id. It is the JOIN-ACK that actually creates a tree branch. 461 The JOIN-ACK carries the complete core list for the group, which is 462 stored by each of the routers it traverses. Between sending a JOIN- 463 REQUEST and receiving a JOIN-ACK, a router is in a state of pending 464 membership. 
A router that is in the join pending state cannot send 465 join acknowledgements in response to other join requests received for 466 the same group, but rather caches them for acknowledgement subsequent 467 to its own join being acknowledged. 469 Non-member senders, and new group receivers, are expected to know the 470 address of at least one of the corresponding group's cores in order 471 to send to/join a group. The current specification does not state how 472 this information is gleaned, but it might be obtainable from a direc- 473 tory such as ``sd'' (the multicast session directory) (see footnote 474 2) or from the Domain Name System (DNS) (see footnote 3). 476 In accordance with existing IP multicast schemes, if the scope of 477 multicasts is to extend beyond the local area, at least one CBT- 478 capable router must be present on the local subnetwork for hosts on 479 that subnetwork to utilize CBT multicast delivery. Only one local 480 router, the designated router, is allowed to send to/receive from 481 uptree (i.e. the branch leading to/from the core) for a particular 482 group. We therefore make a clear distinction between a group member- 483 ship interrogator -- the router responsible for sending IGMP host- 484 membership queries onto the local subnet, and the designated router. 485 However, they may or may not be one and the same. LAN specifics are 486 discussed in sections 1.6, 1.7 and 1.8. 488 Once the designated router (DR) has been established, i.e. the router 489 _________________________ 490 2 By Van Jacobson et al., LBL. 491 3 We considered disseminating core identities by in- 492 cluding them in link-state routing updates. However, 493 this does not provide scalability since it involves 494 global group information distribution. 
Further, it in- 495 volves a dependency on link-state routing 496 that is on the shortest-path to the corresponding core, the new 497 receiver (host) sends a special CBT report to it, requesting that it 498 join the corresponding delivery tree if it has not already. If the DR 499 has already joined the corresponding tree, then the DR multicasts to 500 the group a notification to that effect back across the subnet. 501 Information included in this notification include whether the DR was 502 successful in joining the corresponding tree, and actual core affili- 503 ation. 505 NOTE: the actual core affiliation of a tree router may differ from 506 the core specified in the join request, if that join is terminated 507 by an on-tree router whose affiliation is to a different core. 509 If the local DR has not joined the tree, then it proceeds to send a 510 JOIN-REQUEST and awaits an acknowledgement, at which time the notifi- 511 cation, as described above, is multicast across the subnetwork. 513 _1._3. _T_r_e_e _L_e_a_v_i_n_g _P_r_o_c_e_s_s 515 A QUIT-REQUEST is a request by a CBT router to leave a group. A 516 QUIT-REQUEST may be sent by a router to detach itself from a tree if 517 and only if it has no members for that group on any directly attached 518 subnets, AND it has received a QUIT-REQUEST on each of its child 519 interfaces for that group (if it has any). The QUIT-REQUEST can only 520 be sent to the parent router. The parent immediately acknowledges 521 the QUIT-REQUEST with a QUIT-ACK and removes that child interface 522 from the tree. Any CBT router that sends a QUIT-ACK in response to 523 receiving a QUIT-REQUEST should itself send a QUIT-REQUEST upstream 524 if the criteria described above are satisfied. 526 Failure to receive a QUIT-ACK despite several re-transmissions gives 527 the sending router the right to remove the relevant parent interface 528 information, and by doing so, removes itself from the CBT tree for 529 that group. 531 _1._4. 
_T_r_e_e _M_a_i_n_t_e_n_a_n_c_e _I_s_s_u_e_s 533 Robustness features/mechanisms have been built into the CBT protocol 534 as has been deemed appropriate to ensure timely tree re-configuration 535 in the event of a node or core failure. These mechanisms are imple- 536 mented in the form of request-response messages. Their frequency is 537 configurable, with the trade-off being between protocol overhead and 538 timeliness in detecting a node failure, and recovering from that 539 failure. 541 _1._4._1. _N_o_d_e _F_a_i_l_u_r_e 543 The CBT protocol treats core- and non-core failure in the same way, 544 using the same mechanisms to re-establish tree connectivity. 546 Each child node on a CBT tree monitors the status of its 547 parent/parent link at fixed intervals by means of a ``keepalive'' 548 mechanism operating between them. The ``keepalive'' mechanism is 549 implemented by means of two CBT control messages: CBT-ECHO-REQUEST 550 and CBT-ECHO-REPLY. 552 For any non-core router, if its parent router, or path to the parent, 553 fails, that non-core router is initially responsible for re-attaching 554 itself, and therefore all routers subordinate to it on the same 555 branch, to the tree (Note: re-joining is not necessary just because 556 unicast calculates a new next-hop to the core). 558 Subsequent to sending a QUIT-REQUEST on the parent link, a non-core 559 router initially attempts to re-join the tree by sending a RE-JOIN- 560 REQUEST (see section 1.4.4) on an alternate path (the alternate path 561 is derived from unicast routing) to an arbitrary alternate core 562 selected from the core list. The corresponding core is tested for 563 reachability before the re-join is sent, by means of the control mes- 564 sage: CBT-CORE-PING. Failure to receive a response from the selected 565 core will result in another being selected, and the process continues 566 to repeat itself until a reachable core is found. 
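
The core selection step described above can be sketched as follows.
This is only an illustration under stated assumptions, not part of the
protocol definition: the function name, the `core_ping` callback
(standing in for sending a CBT-CORE-PING and waiting for a
CBT-PING-REPLY) and the `max_rounds` bound are all hypothetical. The
text above says the process repeats until a reachable core is found;
the sketch bounds the number of passes so that it always terminates.

```python
import random
from typing import Callable, Optional, Sequence


def select_reachable_core(
    core_list: Sequence[str],
    core_ping: Callable[[str], bool],
    max_rounds: int = 3,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return a core that answered a ping, or None if none did.

    core_ping(core) is a hypothetical callback that returns True when
    a CBT-PING-REPLY is received for a CBT-CORE-PING sent to `core`.
    """
    rng = rng or random.Random()
    for _ in range(max_rounds):
        candidates = list(core_list)
        rng.shuffle(candidates)      # "arbitrary" selection order
        for core in candidates:
            if core_ping(core):      # core is reachable
                return core          # the RE-JOIN-REQUEST may now be sent
    return None                      # no core reachable in max_rounds passes


if __name__ == "__main__":
    cores = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    up = {"192.0.2.3"}
    print(select_reachable_core(cores, lambda c: c in up))
```

A router using such a routine would send its RE-JOIN-REQUEST towards
the returned core, and would retry later if nothing was returned.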

A RE-JOIN-REQUEST (as opposed to a JOIN-REQUEST) is sent because of
the presence of subordinate routers, i.e. there exists a downstream
branch connected to the re-joining router. Care must be taken in this
case to avoid loops forming on the tree. If the joining router did not
have downstream routers connected to it, it would not be necessary to
take precautions to avoid loops since they could not occur (this is
explained in more detail in section 1.4.3).

   NOTE: It was an engineering design decision not to flush the
   complete (downstream) branch when some (upstream) router detects a
   failure. Whilst each router would join via its shortest-path to
   the corresponding core, it would result in an overall longer
   re-connectivity latency.

A FLUSH-TREE control message is however sent if the best next-hop of
the re-join is a child on the same tree.

1.4.2. Core Failure

Once the core tree has been established as the initial step of group
initiation, core router failure thereafter is handled no differently
than non-core router failure, with a core attempting to re-connect
itself to the corresponding tree by means of either a join or re-join.

When a core router re-starts subsequent to failure, it will have no
knowledge of the tree for which it is supposed to be currently a core.
The only means by which it can find out, and therefore re-establish
itself on the corresponding tree, is if some other on-tree router
sends it a CBT-CORE-PING message. This message, by default, always
contains the identities of all the cores for a group, together with
the group-id.

On receipt of a CBT-CORE-PING, a recently re-started core will re-join
the tree by means of a JOIN-REQUEST.

1.4.3. Unicast Transient Loops

Routers rely on underlying unicast routing to carry JOIN-REQUESTs
towards the core of a core-based tree. However, subsequent to a
topology change, transient routing loops, so called because of their
short-lived nature, can form in routing tables whilst the routing
algorithm is in the process of converging or stabilizing.

There are two cases to consider with respect to CBT and unicast
transient loops, namely:

o  a join is sent over a transient loop, but no part of the
   corresponding CBT tree forms part of that loop. In this case, the
   join will never get acknowledged and will therefore time out.
   Subsequent re-tries will succeed after the transient loop has
   disappeared.

o  a join is sent over a transient loop, and the loop consists either
   partly or entirely of routers on the corresponding CBT tree. If
   the loop consists only partly of routers on the tree and the join
   originated at a router that is not attempting to re-join the tree,
   then the JOIN-REQUEST will be acknowledged. No further action is
   necessary since a loop-free path exists from the originating
   router to the tree.

   If the loop consists entirely of routers on the tree, then the
   router originating the join is attempting to re-join the tree. In
   this case also, the join could be acknowledged, which would result
   in a loop forming on the tree, so we have designed a
   loop-detection mechanism which is described below.

1.4.4. Loop Detection

The CBT protocol incorporates an explicit loop-detection mechanism.
Loop detection is only necessary when a router, with at least one
child, is attempting to re-connect itself to the corresponding tree.

We distinguish between three types of JOIN-REQUEST: active; active
re-join; and non-active re-join (see Part C, section 1.3).

An active JOIN-REQUEST for group A is one which originates from a
router which has no children belonging to group A.

An active re-join for group A is one which originates from a router
that has children belonging to group A.

A non-active re-join is one that originally started out as an active
re-join, but has reached an on-tree router for the corresponding
group. At this point, the router changes the join status to non-active
re-join and forwards it on its parent branch, as does each CBT router
that receives it. Should the router that originated the active re-join
subsequently receive the non-active re-join, a loop is present in the
tree. The router must therefore immediately send a QUIT-REQUEST to its
parent router, and attempt to re-join again. In this way the re-join
acts as a loop-detection packet.

Another scenario that requires consideration is when there is a break
in the path (tunnel) between a child and its parent. Although the
parent is active, the child believes that the parent is down -- the
child cannot distinguish between the parent being down and the path to
it being down. If the path failure is short-lived, the child will
have chosen a new route to the core, but the parent will be unaware of
this and will continue forwarding over its child interfaces, which
creates the potential for packet looping.

We guard against this using a child assert mechanism, which is
implicit, i.e. no control message overhead is incurred for this
mechanism. If no CBT-ECHO-REQUEST is heard from a child, the parent
removes the corresponding child interface after a certain interval.
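
The re-join loop check can be illustrated with a small simulation. It
is a sketch under assumptions rather than a description of any
implementation: the tree is modelled as a hypothetical `parent_of`
table mapping each on-tree router to its parent (None for a router
with no parent), and the helper names are invented for the example.
Only the three sub-code names come from this document.

```python
from typing import Dict, Optional

JOIN_ACTIVE = "JOIN-ACTIVE"
RE_JOIN_ACTIVE = "RE-JOIN-ACTIVE"
RE_JOIN_NACTIVE = "RE-JOIN-NACTIVE"


def join_subcode(has_children: bool) -> str:
    """Sub-code used by the router that originates a join."""
    return RE_JOIN_ACTIVE if has_children else JOIN_ACTIVE


def rejoin_reveals_loop(
    originator: str,
    first_on_tree: str,
    parent_of: Dict[str, Optional[str]],
) -> bool:
    """Trace a re-join once it has reached an on-tree router.

    From `first_on_tree` onwards the join travels as a non-active
    re-join, parent by parent. If it arrives back at `originator`,
    the originator has detected a loop and must send a QUIT-REQUEST.
    """
    router: Optional[str] = first_on_tree
    seen = set()                       # guards against malformed tables
    while router is not None and router not in seen:
        if router == originator:
            return True                # originator sees its own re-join
        seen.add(router)
        router = parent_of.get(router)
    return False                       # reached a router with no parent


if __name__ == "__main__":
    # R lost its parent; S is R's child and T is S's child.
    tree = {"T": "S", "S": "R", "R": None, "A": "C", "C": None}
    print(join_subcode(has_children=True))        # R has children
    print(rejoin_reveals_loop("R", "T", tree))    # join landed below R
    print(rejoin_reveals_loop("R", "A", tree))    # join landed elsewhere
```

In the first trace the re-join enters the tree on R's own downstream
branch and climbs back to R, which is the looping case; in the second
it climbs to a router with no parent and no loop is reported.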

As an additional precaution against packet looping, multicast data
packets that are in the process of spanning a CBT's delivery tree
branches (remember, we distinguish between actual tree branches and
attached subnetworks, although there are cases when they are one and
the same) carry an on-tree indicator in the CBT header of the packet.
Provided a data packet arrives via a valid tree interface, all routers
are obliged to check that the on-tree indicator is set accordingly. A
data packet arriving at the tree for the first time from a non-member
sender will have the on-tree indicator bits set by the receiving
router. These bits should never subsequently be modified by any
router. Should a packet be erroneously forwarded by an on-tree router
over an off-tree interface, and should that packet somehow work its
way back on tree, it can be immediately recognised and discarded.

1.5. Core Placement

As it stands, the current implementation of CBT uses trivial
heuristics for core placement.

Careful placement of core(s) no doubt assists in optimizing the routes
between any sender and group members on the tree. Depending on
particular group dynamics, such as sender/receiver population and
traffic patterns, it may well be counter-productive to place a core(s)
near or at the centre of a group. In any event, there exists no
polynomial time algorithm that can find the centre of a dynamic
multicast spanning tree.

One suggestion might be that cores be statically configured throughout
the Internet -- there need only be some relatively small number of
cores per backbone network (see footnote 4), and the addresses of
these cores would be ``well-known''.

_________________________
4 The storage and switching overhead incurred by these core routers
  increases linearly with the number of groups traversing them. A
  threshold value could be introduced indicating the maximum number
  of groups permitted to traverse a core router. Once exceeded,
  additional core routers would need to be assigned to the backbone.

Work is currently in progress to develop a core location/placement
mechanism.

1.6. LAN Designated Router

As we have said, there must only ever exist one DR for any particular
group that is responsible for uptree forwarding/reception of data
packets.

A group's DR is elected by means of an explicit mechanism. Whenever a
host initiates/joins a group, part of the process is for it to send a
CBT-DR-SOLICITATION message, addressed to the CBT ``all-routers''
address, which is a request for the best next-hop router to a
specified core.

If the group is being initiated, a DR will almost certainly not be
present on the local subnet for the group, whereas if a group is being
joined, the DR may or may not be present, depending on whether there
exist other group members on the LAN (subnet).

If a DR is present for the specified group, it responds to the
solicitation with a CBT-DR-ADVERTISEMENT, which is addressed to the
group.

If no DR is present, each CBT router inspects its unicast routing
table to establish whether it is the best next-hop to the specified
core.

A router which considers itself the best next-hop does not respond
immediately with an advertisement, but rather sends a
CBT-DR-ADV-NOTIFICATION to the CBT ``all-routers'' address. This is a
precautionary measure to prevent more than one router advertising
itself as the DR for the group (it is conceivable that more than one
router might think itself the best next-hop to the core). If this
scenario does indeed occur, the advertisement notification acts as a
tie-breaker, the router with the lowest address winning the election.
The lowest addressed router subsequently advertises itself as DR for
the group.

1.7. Non-Member Sending

For non-member senders wishing to send multicasts beyond the scope of
the local subnetwork, the presence of a local CBT-capable router is
mandatory. The sending of multicast packets from a non-member host to
a particular group is two-phase: the first phase involves unicasting
the packet from the originating host to one of the group's cores (the
destination field of the IP header carries the unicast address of the
core). The second phase is the dissemination of the packet by the
receiving router to neighbouring (adjacent) routers on the
corresponding tree. Similarly, when an on-tree neighbour receives the
packet, it distributes it in the same fashion.

Before the multicast leaves the originating subnetwork, it is
necessary for the local CBT DR to append a CBT header to the packet
(behind the IP header), and change the IP destination address field
from a multicast address to the unicast address of a core for the
group. How does the CBT DR know that this multicast address is
associated with a CBT group? The answer is that there must be some
form of mapping mechanism, which has information about which group
addresses correspond to CBT multicast groups. This mechanism maps an
IP multicast address to a unicast core address.

Packets sent from a non-member sender will first encounter the
corresponding delivery tree either at the addressed core, or at an
on-tree router that is on the shortest-path between the sender and the
core. What happens when a CBT packet hits the corresponding delivery
tree is dealt with under ``Data Packet Forwarding'' in section 1.8
below.
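
The address rewriting performed by the local DR for a non-member
sender can be sketched as below. This is a minimal illustration under
assumptions: the document only states that some mapping mechanism must
exist, so the `GROUP_TO_CORES` table, the function name, the choice of
the first listed core and the dictionary keys are all hypothetical.
The addresses are example values.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical mapping from CBT group address to its core addresses.
GROUP_TO_CORES: Dict[str, List[str]] = {
    "224.2.0.1": ["192.0.2.10", "192.0.2.20"],
}


def redirect_to_core(
    group: str,
    origin: str,
    mapping: Optional[Dict[str, List[str]]] = None,
) -> Optional[Dict[str, str]]:
    """Return the rewritten addressing for a locally originated packet.

    Returns None when `group` is not known to be a CBT group, in
    which case the DR has no core to which to unicast the packet.
    """
    if mapping is None:
        mapping = GROUP_TO_CORES
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError("not a multicast group address: " + group)
    cores = mapping.get(group)
    if not cores:
        return None
    core = cores[0]                 # assumption: use the first listed core
    return {
        "ip_dst": core,             # IP destination becomes the core
        "cbt_group_id": group,      # group-id field of the CBT header
        "cbt_core": core,           # core address field of the CBT header
        "cbt_origin": origin,       # origin field: the sending host
    }


if __name__ == "__main__":
    print(redirect_to_core("224.2.0.1", "198.51.100.7"))
```

Keeping the group address in the CBT header is what later allows an
on-tree router to rebuild an IP-style multicast from the packet.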

   NOTE: No host changes are required for CBT. CBT hosts are simply
   required to run the CBT application-level software that provides
   the CBT user group management interface.

1.8. Data Packet Forwarding

In this section we describe how multicast data packets span a CBT
tree.

It is important to note that CBT uses the Internet Group Management
Protocol (IGMP) in much the same way as traditional IP schemes, namely
to establish group presence on directly-connected subnets, and to
exchange CBT routing information. A new IGMP message type has been
created for exchanging CBT routing messages.

We must again bring to the reader's attention the distinction between
tree branches and subnets, although there are cases where they are one
and the same.

It has been an important engineering design goal for CBT to be
backwards compatible with IP-style multicasts. Until the interface
with other multicast protocols is clearly defined, CBT routing
information is not exchanged with that of any other schemes.

IP-style multicast data packets arriving at a CBT router are checked
to see if they originated locally. If not, they are discarded.
Otherwise, the local CBT DR for the group first sends a copy of the
IP-style packet over any directly-connected subnetworks with group
member presence (provided the TTL allows), then appends a CBT header
to the packet for forwarding over outgoing tree interfaces.

CBT-style packets arriving at a CBT router are forwarded over tree
interfaces for the group, and sent IP-style over any
directly-connected subnetworks with group member presence. The
conversion from a CBT-style packet to an IP-style packet requires the
copying of various fields of the CBT header to the IP header.

The child(ren) or parent of a CBT router may be reachable over a
multi-access LAN. This is the case where a subnetwork and a tree
branch are one and the same. In this case, the forwarding of the
CBT-style packets is achieved with multicast as opposed to unicast.
End-systems subscribed to the same group may receive these packets,
but they will not be processed, since end-systems will not recognise
the upper-layer protocol identifier, i.e. CBT.

   NOTE: it was an engineering design decision to multicast data
   packets with a CBT header on multi-access links -- the case of
   unicasting separately from parent to n children is clearly more
   costly. Multicasting also reduces traffic -- when a parent
   receives a packet, it does not need to re-send the packet to any
   of its other children that may be present on the multi-access
   link, since they will have received a copy from the child's
   multicast.

Data arriving at a CBT router is always multicast first IP-style onto
any directly-connected subnets with group member presence, and only
subsequently unicast (multicast on multi-access links) to
parent/children with a CBT header.

A CBT router will not forward IP-style multicast data packets unless
that router has a forwarding information base (FIB) entry for the
specified group. The exception to this is if a multicast originates on
a local subnetwork. In this case, the local CBT DR for the group needs
to insert a CBT header in the packet (behind the IP header) and
unicast it to one of the cores for the group.

A CBT FIB entry is shown below:

   32-bits         8            8           4          8    |    8
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| group-id  | parent addr | parent vif | No. of   |                     |
|           |    index    |   index    | children |      children       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                  |chld addr |chld vif  |
                                                  |  index   |  index   |
                                                  |+-+-+-+-+-+-+-+-+-+-+|
                                                  |chld addr |chld vif  |
                                                  |  index   |  index   |
                                                  |+-+-+-+-+-+-+-+-+-+-+|
                                                  |chld addr |chld vif  |
                                                  |  index   |  index   |
                                                  |+-+-+-+-+-+-+-+-+-+-+|
                                                  |                     |
                                                  |        etc.         |
                                                  |+-+-+-+-+-+-+-+-+-+-+|

                      Figure 2. CBT FIB entry

The CBT DR for the specified group fills in the CBT and IP headers as
follows (the CBT header is shown in Figure 3):

o  the multicast group address (group-id) is inserted into the
   group-id field of the CBT header.

o  the unicast address of a core router for the corresponding group
   is placed in the core address field of the CBT header.

o  the IP address of the originating host is inserted into the origin
   field of the CBT header.

o  the proto field of the CBT header is set to identify the
   upper-layer (transport) protocol.

o  the ttl field of the CBT header is either decremented (if a
   CBT-style packet was received) or set to the value reflected in
   the packet's IP header (if the packet originated locally).

o  the on-tree field of the CBT header is set (provided this CBT
   router is on-tree for the specified group). It is left unset
   otherwise.

o  the source address field of the IP header is set to the unicast
   address of the originating host (the IP source address changes as
   the CBT-style packet is passed router-to-router on a CBT tree).

o  the destination field of the IP header is set to the unicast
   address of the on-tree neighbour (set to the group address if more
   than one neighbour is reachable over the same interface).

o  the protocol field of the IP header is set to the CBT protocol
   value.

o  the TTL value of the IP header is set to MAX_TTL.

The packet is now ready for sending. Once this packet arrives at a CBT
router, the packet is ``reverse-engineered'' (using the information
carried in the CBT header) to produce an IP-style multicast for
sending on directly-connected subnets with group presence.

Part C

1. CBT Packet Formats and Message Types

CBT packets travel in IP datagrams. We distinguish between two types
of CBT packet: CBT data packets, and CBT control packets.

CBT data packets carry a CBT header when these packets are traversing
CBT tree branches. The CBT header is positioned immediately behind the
IP header.

CBT control packets carry a CBT control header. All CBT control
messages are implemented over UDP. This makes sense for several
reasons: firstly, all the information required to build a CBT delivery
tree is kept in user space. Secondly, implementation is made
considerably easier.

CBT control messages fall into two categories: primary maintenance
messages, which are concerned with tree-building, re-configuration,
and teardown, and auxiliary maintenance messages, which are mainly
concerned with general tree maintenance.

1.1. CBT Header Format

The CBT header is shown in Figure 3.

 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| vers  |unused |     type      |  hdr length   |   protocol    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           checksum            |    IP TTL     |on-tree|unused |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       group identifier                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         core address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         packet origin                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        flow identifier                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        security fields                        |
|                            (T.B.D)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 3. CBT Header

Each of the fields is described below:

o  vers: version number -- this release specifies version 1.

o  type: indicates whether the payload is data or control
   information.

o  hdr length: length of the header, for purpose of checksum
   calculation.

o  protocol: upper-layer protocol number.

o  checksum: the 16-bit one's complement of the one's complement sum
   of the CBT header, calculated across all fields.

o  IP TTL: TTL value gleaned from the IP header where the packet
   originated. It is decremented each time it traverses a CBT router.

o  on-tree: indicates whether the packet is on- or off-tree. Once
   this field is set (i.e. on-tree), it is non-changing.

o  group identifier: multicast group address.

o  core address: the unicast address of a core for the group. A core
   address is always inserted into the CBT header by an originating
   host, since at any instant, it does not know if the local DR for
   the group is on-tree. If it is not, the local DR must unicast the
   packet to the specified core.

o  packet origin: source address of the originating end-system.

o  flow identifier: value uniquely identifying a previously set up
   data stream.

o  security fields: these fields (T.B.D.) will ensure the
   authenticity and integrity of the received packet.

1.2. Control Packet Header Format

The individual fields are described below. It should be noted that the
contents of the fields beyond ``group identifier'' are empty in some
control messages:

 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| vers  |unused |     type      |     code      |    unused     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          hdr length           |           checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       group identifier                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         packet origin                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         core address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Core #1                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Core #2                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Core #3                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Core #4                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Core #5                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Resource Reservation fields                  |
|                            (T.B.D)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        security fields                        |
|                            (T.B.D)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 4. CBT Control Packet Header

o  vers: version number -- this release specifies version 1.

o  type: indicates control message type (see sections 1.3, 1.4).

o  code: indicates sub-code of control message type.

o  hdr length: length of the header, for purpose of checksum
   calculation.

o  checksum: the 16-bit one's complement of the one's complement sum
   of the CBT control header, calculated across all fields.

o  group identifier: multicast group address.

o  packet origin: source address of the originating end-system.

o  core address: desired/actual core affiliation of control message.

o  Core #Z: a maximum of 5 core addresses may be specified for any
   one group. An implementation is not expected to utilize more than,
   say, 3.

   NOTE: It was an engineering design decision to have a fixed
   maximum number of core addresses, to avoid a variable-sized
   packet.

o  Resource Reservation fields: these fields (T.B.D.) are used to
   reserve resources as part of the CBT tree set up procedure.

o  security fields: these fields (T.B.D.) ensure the authenticity and
   integrity of the received packet.

1.3. Primary Maintenance Message Types

There are six types of CBT primary maintenance message, namely:

o  JOIN-REQUEST: invoked by an end-system, generated and sent
   (unicast) by a CBT router to the specified core address. Its
   purpose is to establish the sending CBT router as part of the
   corresponding delivery tree.

o  JOIN-ACK: an acknowledgement to the above. The full list of core
   addresses is carried in a JOIN-ACK, together with the actual core
   affiliation (the join may have been terminated by an on-tree
   router on its journey to the specified core, and the terminating
   router may or may not be affiliated to the core specified in the
   original join). A JOIN-ACK traverses the same path as the
   corresponding JOIN-REQUEST, and it is the receipt of a JOIN-ACK
   that actually creates a tree branch.

o  JOIN-NACK: a negative acknowledgement, indicating that the tree
   join process has not been successful.

o  QUIT-REQUEST: a request, sent from a child to a parent, to be
   removed as a child of that parent.

o  QUIT-ACK: acknowledgement to the above. If the parent, or the path
   to it, is down, no acknowledgement will be received within the
   timeout period. This results in the child nevertheless removing
   its parent information.

o  FLUSH-TREE: a message sent from parent to all children, which
   traverses a complete branch. This message results in all tree
   interface information being removed from each router on the
   branch, possibly because of a re-configuration scenario.

The JOIN-REQUEST has three valid sub-codes, namely JOIN-ACTIVE,
RE-JOIN-ACTIVE, and RE-JOIN-NACTIVE.

A JOIN-ACTIVE is sent from a CBT router that has no children for the
specified group.

A RE-JOIN-ACTIVE is sent from a CBT router that has at least one child
for the specified group.

A RE-JOIN-NACTIVE originally started out as an active re-join, but has
reached an on-tree router for the corresponding group. At this point,
the router changes the join status to non-active re-join and forwards
it on its parent branch, as does each CBT router that receives it.
Should the router that originated the active re-join subsequently
receive the non-active re-join, it must immediately send a
QUIT-REQUEST to its parent router. It then attempts to re-join again.
In this way the re-join acts as a loop-detection packet.

1.4. Auxiliary Maintenance Message Types

There are eleven CBT auxiliary maintenance message types:

o  CBT-DR-SOLICITATION: a request sent from a host to the CBT
   ``all-routers'' multicast address, for the address of the best
   next-hop CBT router on the LAN to the core as specified in the
   solicitation.

o  CBT-DR-ADVERTISEMENT: a reply to the above. Advertisements are
   addressed to the ``all-systems'' multicast group.

o  CBT-CORE-NOTIFICATION: unicast from a group initiating host to
   each core selected for the group, this message notifies each core
   of the identities of each of the other core(s) for the group,
   together with their core ranking. The receipt of this message
   invokes the building of the core tree by all cores other than the
   highest-ranked (primary core).

o  CBT-CORE-NOTIFICATION-REPLY: a notification of acceptance of
   becoming a core for a group, sent to the corresponding end-system.

o  CBT-ECHO-REQUEST: once a tree branch is established, this message
   acts as a ``keepalive'', and is unicast from child to parent.

o  CBT-ECHO-REPLY: positive reply to the above.

o  CBT-CORE-PING: unicast from a CBT router to a core when a tree
   router's parent has failed. The purpose of this message is to
   establish core reachability before sending a JOIN-REQUEST to it.

o  CBT-PING-REPLY: positive reply to the above.

o  CBT-TAG-REPORT: unicast from an end-system to the designated
   router for the corresponding group, subsequent to the end-system
   receiving a designated router advertisement (as well as a core
   notification reply if it is the group-initiating host). This
   message invokes the sending of a JOIN-REQUEST if the receiving
   router is not already part of the corresponding tree.

o  CBT-CORE-CHANGE: group-specific multicast by a CBT router that
   originated a JOIN-REQUEST on behalf of some end-system on the same
   LAN (subnet). The purpose of this message is to notify end-systems
   on the LAN belonging to the specified group of such things as:
   success in joining the delivery tree; actual core affiliation.

o  CBT-DR-ADV-NOTIFICATION: multicast to the CBT ``all-routers''
   address, this message is sent subsequent to receiving a
   CBT-DR-SOLICITATION, but prior to any CBT-DR-ADVERTISEMENT being
   sent. It acts as a tie-breaking mechanism should more than one
   router on the subnet think itself the best next-hop to the
   addressed core. It also prompts an already established DR to
   announce itself as such if it has not already done so in response
   to a CBT-DR-SOLICITATION.

Part D

1. Interoperability Issues

One of the design goals of CBT is for it to fully interwork with other
IP multicast schemes. We have already described how CBT-style packets
are transformed into IP-style multicasts, and vice-versa.

In order for CBT to fully interwork with other schemes, it is
necessary to define the interface(s) between a ``CBT cloud'' and the
cloud of another scheme. The CBT authors are currently working out the
details of the ``CBT-other'' interface, and therefore we omit further
discussion of this topic at the present time.

2. A Router Optimization

In a CBT-only environment it is possible to optimize the performance
of CBT with respect to data packet forwarding in CBT-capable routers.
In such an environment the presence of a CBT header is not necessary,
and its absence is likely to improve switching times by around 50 per
cent. However, the downside is that the functionality the CBT header
provides, such as CBT security, is lost.

3. CBT Security Architecture

See current I-D: draft-ballardie-mkd-00.{ps,txt}

4. Acknowledgements

Special thanks go to Paul Francis, NTT Japan, for the original
brainstorming sessions that brought about this work.

Thanks also go to Steve Ostrowitz (Bay Networks Inc.) for his
suggestions and comments on making a CBT router implementation as
optimal as possible.

I would also like to thank the participants of the IETF IDMR working
group meetings for their general constructive comments and suggestions
since the inception of CBT.

Author's Address:

   Tony Ballardie,
   Department of Computer Science,
   University College London,
   Gower Street,
   London, WC1E 6BT,
   ENGLAND, U.K.

   Tel: ++44 (0)71 387 7050 x. 3462
   e-mail: A.Ballardie@cs.ucl.ac.uk

NOTE: For a version of this draft containing all diagrams and
references, you are recommended to retrieve the .ps version.