idnits 2.17.1 

draft-ietf-idmr-cbt-spec-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-24) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 31 instances of too long lines in the document, the longest
     one being 7 characters in excess of 72.

  ** There are 6 instances of lines with control characters in the document.

  ** The abstract seems to contain references ([RFC1704], [RFC1546]), which
     it shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 113 has weird spacing: '...on)  of  vario...'

  == Line 114 has weird spacing: '...his may  not  ...'

  == Line 368 has weird spacing: '...r which  is  d...'

  == Line 369 has weird spacing: '...mented  each  ...'

  == Line 370 has weird spacing: '... router  will...'

  == (8 more instances...)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'RFC 1704' on line 44 looks like a reference

  -- Missing reference section? 'RFC 1546' on line 348 looks like a reference


     Summary: 13 errors (**), 0 flaws (~~), 7 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Inter-Domain Multicast Routing (IDMR)                    A. J. Ballardie
3	INTERNET-DRAFT                                 University College London
4	                                                        April 18th, 1995

6	                    Core Based Trees (CBT) Multicast

8	             -- Architectural Overview and Specification --
9	                    <draft-ietf-idmr-cbt-spec-01.txt>

11	Status of this Memo

13	   This document is an Internet Draft.  Internet Drafts are working do-
14	   cuments of the Internet Engineering Task Force (IETF), its Areas, and
15	   its Working Groups. Note that other groups may also distribute
16	   working documents as Internet Drafts.

18	   Internet Drafts are draft documents valid for a maximum of six
19	   months. Internet Drafts may be updated, replaced, or obsoleted by
20	   other documents at any time.  It is not appropriate to use Internet
21	   Drafts as reference material or to cite them other than as a "working
22	   draft" or "work in progress."

24	   Please check the I-D abstract listing contained in each Internet
25	   Draft directory to learn the current status of this or any other In-
26	   ternet Draft.

28	Abstract

30	   CBT is a new architecture for local- and wide-area IP multicasting,
31	   being unique in its utilization of just one shared delivery tree, as
32	   opposed to the source-based delivery trees of traditional IP multi-
33	   cast schemes.

35	   The primary advantage of the CBT approach is that it typically
36	   offers more favourable scaling characteristics than do existing mul-
37	   ticast algorithms. The definition of a new network layer multicast
38	   protocol has also meant that it has been possible to integrate an en-
39	   riched functionality into multicast that is not possible under other
40	   IP multicast schemes, for example, the incorporation of security
41	   features. Besides providing the ability to authenticate
42	   tree-joining hosts and routers, optional in-built protocol
43	   mechanisms provide a scalable solution to the multicast key distribu-
44	   tion problem [RFC 1704].

46	   CBT is backwards compatible with traditional IP-style multicast. Host
47	   changes are not required, and a local CBT-capable router is mandatory
48	   if CBT-style multicasts are to be forwarded beyond the local subnet-
49	   work.

51	1.  Background

53	   Centre based forwarding was first described in the early 1980s by
54	   Wall in his PhD thesis on broadcast and selective broadcast.  At this
55	   time, multicast was in its very earliest stages of development, and
56	   researchers were only just beginning to realise the benefits that
57	   could be gained from it, and some of the uses it could be put to. It
58	   was only later that the class-D multicast address space was defined,
59	   and later again that intrinsic multicast support was taken advantage
60	   of for broadcast media, such as Ethernet.

62	   Now that we have several years' practical experience with multicast, a
63	   diversity of multicast applications, and an internetwork infrastruc-
64	   ture that wants to support it to an ever-increasing degree, we re-
65	   visit the centre-based forwarding paradigm introduced by Wall, and
66	   mould and adapt it specifically for today's multicast environment.

68	2.  Introduction

70	   Multicast group communication is an increasingly important capability
71	   in many of today's data networks. Most LANs and more recent wide-area
72	   network technologies such as SMDS and ATM specify multicast as part
73	   of their service.

75	   Since the wide-area introduction of multicasting there has been a
76	   large increase in the number and diversity of multicast applications,
77	   examples of which include audio and video conferencing, replicated
78	   database updating and querying, software update distribution, stock
79	   market information services, and more recently, resource discovery.
80	   Multimedia is another fast expanding area for which multicast offers
81	   an invaluable service. It has therefore been necessary of late to
82	   address the topic of scalability with regards to multicast algo-
83	   rithms, since, if they do not scale to an internetwork size that is
84	   expected (given the growth rate of the last several years), they
85	   cannot be of long-lasting benefit. This motivates the need for new
86	   multicasting techniques to be investigated.

88	   This draft describes a new multicast routing architecture and proto-
89	   col which is applicable to a datagram network. The CBT architecture
90	   has attractive scaling characteristics. We measure scalability in
91	   terms of network state maintenance, bandwidth- and processing costs.

93	3.  Document Layout

95	   The remainder of this document is divided into four parts: Part A
96	   offers a general architectural overview and discussion on the CBT
97	   architecture. This part also includes a description of CBT
98	   ``anycasting'' [see RFC 1546].

100	   Parts B and C comprise the protocol specification. Part B describes
101	   protocol engineering design features, such as CBT group initiation,
102	   the tree joining process, tree maintenance issues, the tree leaving
103	   process, LAN issues, data packet forwarding, and data packet
104	   encapsulation and translation (see footnote 1).

106	   Part C illustrates and describes in detail the individual CBT packet
107	   formats and message types.

109	   Part D looks briefly at some other related issues.

111	_________________________
112	   1 We will refer to the copying (and sometimes alteration) of
113	   various fields of the IP header to a CBT header as translation
114	   throughout. This may not be in total agreement with how the term
115	   is used elsewhere.

117	Part A

119	1.  CBT - The New Architecture

121	2.  Architectural Overview

123	   A core-based tree involves having a single node, in our case a router
124	   (with additional routers for robustness), known as the core of the
125	   tree, from which branches emanate. These branches are made up of
126	   other routers, so-called non-core routers, which form a shortest for-
127	   ward path between a member-host's directly attached router, and the
128	   core. A router at the end of a branch shall be known as a leaf router
129	   on the tree.

131	   The CBT protocol builds a delivery tree reflecting the architecture
132	   just described.  This architecture allows for the enhancement of the
133	   scalability of the multicast algorithm with regards to group-specific
134	   state maintained in the network, particularly for the case where
135	   there are many active senders in a particular group. The CBT archi-
136	   tecture offers an improvement in scalability over existing techniques
137	   by a factor of the number of active sources (where a source is a sub-
138	   network aggregate).  Hence, a core-based architecture allows us to
139	   significantly improve the overall scaling factor of S * N we have in
140	   the source-based tree architecture, to just N. This is the result of
141	   having just one multicast tree per group as opposed to one tree per
142	   (source, group) pair.
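
   The scaling argument above can be illustrated with a small Python
   sketch. It assumes that S is the number of active sources, that N is
   the number of groups, and that each tree costs one state entry; the
   function names are hypothetical and are not part of the CBT
   specification.

```python
def source_based_entries(active_sources: int, groups: int) -> int:
    """State entries when there is one tree per (source, group) pair."""
    return active_sources * groups


def shared_tree_entries(groups: int) -> int:
    """State entries when there is one shared tree per group."""
    return groups


# Assumed workload: 20 active sources in each of 100 groups.
S, N = 20, 100
print(source_based_entries(S, N))  # S * N entries
print(shared_tree_entries(N))      # N entries
```

   Under these assumptions the two counts differ by exactly the factor
   S, which is the improvement claimed for the shared-tree case.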

144	   It is also interesting to note that routers between a non-member
145	   sender and the CBT delivery tree need no knowledge of the multicast
146	   tree/group whatsoever in order to forward CBT multicasts, since these
147	   are unicast towards the core. This two-phase routing approach is
148	   unique to the CBT architecture. One such application that can take
149	   advantage of this two-phase routing is resource discovery, whereby a
150	   resource, for example, a replicated database, is distributed in dif-
151	   ferent locations throughout the Internet. The databases in the dif-
152	   ferent locations make up a single multicast group, linked by a CBT
153	   tree. A client need only know the address of (one of) the core(s) for
154	   the group in order to send (unicast) a request to it. Such a request
155	   would not span the tree in this case, but would be answered by the
156	   first tree router encountered, making it quite likely that the
157	   request is answered by the ``nearest'' server. Effectively, this
158	   corresponds to an ``anycast'' service [RFC 1546] (see section X).

160	   A single-core CBT tree is shown in the figure below. Only one core
161	   is shown to demonstrate the principle.

163	           b      b     b-----b
164	            \     |     |
165	             \    |     |
166	              b---b     b------b
167	             /     \  /                   KEY....
168	            /       \/
169	           b         X---b-----b          X = Core
170	                    / \                   b = non-core router
171	                   /   \
172	                  /     \
173	                  b      b------b
174	                 / \     |
175	                /   \    |
176	               b     b   b

178	                      Figure 1: Single-Core CBT Tree

180	2.1.  Architectural Justification

182	   First of all, exactly what is a core-based tree (CBT) architecture?
183	   Core-based, or centre-based forwarding trees, were first described by
184	   Wall in his investigation into low-delay approaches to broadcast and
185	   selective broadcast. Wall concluded that delay will not be minimal,
186	   as with shortest-path trees, but the delay can be kept within bounds
187	   that may be acceptable.  Simulations have recently been carried out
188	   to compare the maximum and average delays of centre-based and
189	   shortest-path trees. A summary of these simulations can be found in

191	   In the context of multicast, the extent to which the delay charac-
192	   teristics of a shared tree are less optimal than SPTs, is question-
193	   able. The simulation results state that CBTs incur, on average, a 10%
194	   increase in delay over SPTs.  Slight discrepancies in delay may not
195	   be a critical factor for many multicast applications, such as
196	   resource discovery or database updating/querying. Even for real-time
197	   applications such as voice and video conferencing, a core based tree
198	   may indeed be acceptable, especially if the majority of branches of
199	   that tree span high-bandwidth links, such as optical fibre. In
200	   several years' time it is easy to envisage the Internet being host to
201	   thousands of active multicast groups, and similarly, the bandwidth
202	   capacity on many of the Internet links may well far exceed that of
203	   today.

205	   An important question raised in the SPT vs. CBT debate is: how effec-
206	   tively can load sharing be achieved by the different schemes? It
207	   would seem that SPT schemes cannot achieve load balancing because of
208	   the nature of their forwarding: nodes on an SPT do not have the
209	   option to forward incoming packets over different links (i.e. load
210	   balance) because of the danger of loops forming in the multicast
211	   tree topology.

213	   With shared tree schemes however, each receiver can choose which of
214	   the small selection of cores it wishes to join. Cores and on-tree
215	   nodes can be configured to accept only a certain number of joins,
216	   forcing a receiver to join via a different path. This flexibility
217	   gives shared tree schemes the ability to achieve load balancing.
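
   The join-limit idea above can be sketched as follows. This is a
   minimal illustration only: the per-core limit, the order in which a
   receiver tries cores, and the name choose_core are all assumptions
   made for the example and are not defined by the CBT protocol.

```python
def choose_core(candidate_cores, join_counts, max_joins):
    """Return the first candidate core still accepting joins, or None.

    join_counts maps a core to the number of joins it has accepted and
    is updated in place when a join succeeds.
    """
    for core in candidate_cores:
        if join_counts.get(core, 0) < max_joins:
            join_counts[core] = join_counts.get(core, 0) + 1
            return core
    return None  # every candidate core has reached its join limit


counts = {}
joined = [choose_core(["core-1", "core-2"], counts, 2) for _ in range(5)]
print(joined)
```

   Once the first core reaches its limit, later receivers are pushed
   to the second core, and so onto a different path.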

219	   In general, spread over all groups, CBT has the ability to randomize
220	   the group set over different trees (spanning different links around
221	   the centre of the network), something that would not seem possible
222	   under SPT schemes.

224	   Finally, the CBT protocol requires each receiver to explicitly join
225	   the delivery tree, resulting in a tree spanning only a group's
226	   receivers. As a result, data flows only over those links that lead to
227	   receivers, and thus there is no requirement for off-tree routers to
228	   maintain prune state, which prevents data flow where it is not
229	   needed.

231	2.2.  The Implications of Shared Trees

233	   The trade-offs introduced by the CBT architecture lie primarily
234	   between a reduction in the overall state the network must maintain
235	   (given that a group has a significant proportion of active senders),
236	   and the potential increased delay imposed by a shared delivery tree.

238	   We have emphasized CBT's much improved scalability over existing
239	   schemes for the case where there are active group senders. However,
240	   because of CBT's ``hard-state'' approach to tree building, i.e.
241	   group tree link information does not time out after a period of
242	   inactivity, as is the case with most source-based architectures,
243	   source-based architectures scale best when there are no senders to a
244	   multicast group. This is because multicast routers in the network
245	   eventually time out all information pertaining to an inactive group.
246	   Source-based trees are said to be built ``on-demand'', and are
247	   ``data-driven''.

249	   A consequence of the ``hard-state'' approach is that multicast tree
250	   branches do not automatically adapt to underlying multicast route
251	   changes (if multicast were part of the global internetwork
252	   infrastructure, multicast routes would be gleaned exclusively from
253	   unicast routes).  This is in contrast to the ``soft-state'',
254	   data-driven approach -- data always follows the path specified in the
255	   routing table. Provided reachability is not lost, it is advantageous,
256	   from the perspective of uninterrupted packet flow, that a multicast
257	   route is kept constant, but the two disadvantages are: a route may
258	   not be optimal for its entire duration, and, ``hard-state'' requires
259	   the incorporation of control messages that monitor reachability
260	   between adjacent routers on the multicast tree. This control message
261	   overhead can be quite considerable unless some form of message aggre-
262	   gation is employed.

264	   In terms of the effectiveness of the CBT approach to multicasting,
265	   the increased delay factor imposed by a shared delivery tree may not
266	   always be acceptable, particularly if a portion of the delivery tree
267	   spans low bandwidth links. This is especially relevant for real-time
268	   applications, such as voice conferencing.

270	   Another consequence of one shared delivery tree is that the cores for
271	   a particular group, especially large, widespread groups with numerous
272	   active senders, can potentially become traffic ``hot-spots'' or
273	   ``bottlenecks''. This has been referred to as the traffic
274	   concentration effect in

276	   A CBT tree is made up of a collection of branches, each rooted at
277	   the tree node that originated a join-request and terminating at the
278	   tree node that acknowledged the same join. This has implications
279	   where asymmetric routes are concerned (similar to source-based
280	   schemes based on RPF) -- whilst the same CBT branch is used for data
281	   packet flow in both directions, the child-to-parent direction
282	   constitutes a valid route reflecting the underlying unicast route
283	   (at least at the time the branch was created). However, in the
284	   parent-to-child direction, the path does not necessarily reflect
285	   underlying unicast routing at any instant, and therefore, in a
286	   policy-oriented environment, this might have disadvantageous
287	   side-effects.

289	   Finally, there are questions concerning the cores of a group
290	   tree: how are they selected, where are they placed, how are they
291	   managed, and how do new group members get to know about them? We have
292	   attempted to implement some very simple heuristics to address some of
293	   these questions in section X, but these may not be appropriate for
294	   large-scale implementation of CBT.  Work is currently underway in the
295	   development of a core placement/location protocol.

297	   We conclude in section X that most aspects of core management are
298	   topics of further research.

300	3.  CBT and ``Anycasting''

302	3.1.  Overview of ``Anycasting''

304	Anycasting [RFC 1546] is a proposed best-effort, stateless, datagram
305	delivery service which is used by hosts primarily to locate particular
306	services on an internetwork.  The goal of anycast is for a client to
307	transmit one request to a resource ``anycast address'', and for a sin-
308	gle, preferably nearest, server to receive the request and respond to
309	it.

311	The motivation for anycasting is that it simplifies the task of finding
312	the appropriate server in a network, and obviates the need to configure
313	applications with particular server address(es), for example, as in DNS
314	resolvers.

316	Questions that, as yet, remain unanswered regarding anycasting, include:
317	how best can anycasting be achieved, and should anycast addresses be a
318	special class of IP address?

320	As for how best to achieve anycast, there are two possible approaches:
321	use existing IP multicast, or, answering our second question, define a
322	special class of IP anycast address within the IP address space, and
323	have servers additionally bind an anycast address on which they listen
324	for client requests.

326	Using existing IP multicast has problems associated with it. Firstly,
327	using expanding ring search to locate a network resource is inefficient
328	for two reasons: it requires potentially many re-transmissions of the
329	request from the client, each iteration requiring a larger TTL (see
330	footnote 11) value. This continues until a response is received.
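
The cost of expanding ring search can be illustrated with a small
sketch. It assumes a simplified model in which a request sent with a
given TTL reaches any server at most that many hops away, and in which
the client doubles the TTL after each unanswered attempt; the doubling
policy and the function name are assumptions made for the example.

```python
def expanding_ring_search(hops_to_server, initial_ttl=1, max_ttl=255):
    """Return (ttls_tried, found) for a simplified expanding ring search.

    Assumes a request sent with a given TTL reaches a server that is at
    most that many hops away, and that the TTL doubles after each
    unanswered attempt (both are assumptions of this sketch).
    """
    ttls_tried = []
    ttl = initial_ttl
    while ttl <= max_ttl:
        ttls_tried.append(ttl)       # one more transmission by the client
        if ttl >= hops_to_server:
            return ttls_tried, True  # the request reached a server
        ttl *= 2
    return ttls_tried, False         # gave up without reaching a server


print(expanding_ring_search(5))
```

Each entry in the returned list is a separate multicast transmission,
which is the source of the inefficiency described above.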

332	The other problem with using IP multicast is that, for any multicast
333	transmission, potentially more than one response may be received. To
334	summarize, using existing IP multicast for anycast is inefficient in its
335	use of network resources, and does not necessarily achieve the desired
336	goal of anycast, namely that only one server respond to a client
337	request. Also, anycasting should not require managing the IP TTL value
338	of client request packets -- the goal of anycast is to send a single
339	packet, which follows a single path, in order to locate a single,
340	preferably nearest, server.

342	Defining a special class of ``anycast'' addresses has several problems
343	associated with it. For example, routing must be adapted to support yet
344	another class of IP address, and routing tables would be required to
345	support anycast routes.  Furthermore, segmenting the IP address space
346	yet further not only involves significant administrative burden, but
347	also assumes that existing applications will recognise particular
348	addresses as being anycast [RFC 1546].

350	3.2.  The CBT ``Anycast'' Solution

352	It so happens that the CBT multicast architecture provides an effective
353	solution to the anycasting problem, without requiring the definition of
354	special anycast addresses.

356	The CBT architecture was explained in section 2. CBT is especially
357	attractive for resource discovery applications, where it is assumed that
358	different network resources form distinct CBT groups. The reason CBT is
359	particularly suited to resource discovery, as described, is because it
360	typically involves many senders, whereby a sender is not a group member.
361	As we have already explained, CBT multicast, unlike other IP multicast
362	schemes, involves maintaining group-specific state in the network that
363	is independent of the number of active sources. Moreover, this state is
364	constrained to the tree links that span only a group's receivers.

366	_________________________
367	   11 This is a field of the IP header which is decremented each time
368	   the corresponding packet traverses a router. If the TTL field
369	   reaches zero, a router will discard the packet.

370	In CBT multicast, non-member senders actually utilize unicast to route
371	multicast data to the CBT delivery tree. This is known as CBT's 2-phase
372	routing. These packets are unicast addressed to a single core router (of
373	which there may be several), and will first encounter the delivery tree
374	either at the addressed core, or at an on-tree (non-core) router that is
375	on the unicast path between the sender and the addressed core.

379	For typical multicast applications, the receiving on-tree router
380	disseminates the received packet(s) to adjacent outgoing on-tree
381	neighbours, and neighbours proceed similarly on receipt of a packet.
382	This is how multicast data packets span a CBT tree.

384	For anycast (and resource discovery applications) however, the first
385	on-tree node encountered does not disseminate the packet further, but
386	responds to the received request.
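
The difference between the multicast and anycast cases can be sketched
as follows. This is an illustration under stated assumptions: the tree
is modelled as a dictionary of on-tree neighbours, the unicast path is
given as a list of routers, and the router names and the function
routers_reached are hypothetical.

```python
from collections import deque


def routers_reached(unicast_path, tree_links, anycast):
    """Return the on-tree routers that handle a non-member's packet.

    unicast_path lists the routers from the sender towards the addressed
    core; tree_links maps each on-tree router to its on-tree neighbours.
    """
    entry = next((r for r in unicast_path if r in tree_links), None)
    if entry is None:
        return []          # the path never met the tree
    if anycast:
        return [entry]     # first on-tree router answers the request
    reached, queue = [entry], deque([entry])
    while queue:           # multicast: forward over every tree branch
        for neighbour in tree_links[queue.popleft()]:
            if neighbour not in reached:
                reached.append(neighbour)
                queue.append(neighbour)
    return reached


tree = {"X": ["b1", "b2"], "b1": ["X", "b3"], "b2": ["X"], "b3": ["b1"]}
path = ["r1", "b3", "b1", "X"]   # r1 is off-tree; X is the addressed core
print(routers_reached(path, tree, anycast=True))
print(routers_reached(path, tree, anycast=False))
```

In both cases the off-tree router r1 simply forwards a unicast packet
and holds no group state.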

388	Thus, we believe that CBT offers an effective solution to ``anycasting''
389	and resource discovery in general. However, some questions remain: what
390	level of fault tolerance does the CBT solution offer, by what means does
391	a sender establish the unicast address of a CBT core router, and
392	finally, is there a guarantee that a client request will hit the CBT
393	tree, i.e. reach a server, at the nearest point to the sender?

395	The question of fault tolerance is indirectly related to the question of
396	establishing a core address. A CBT tree should never comprise only one
397	core router for reasons of robustness. We envisage there should be at
398	least two cores for local groups, and possibly up to five for wide-area
399	groups. By whatever means a client establishes the identity of a core,
400	it will always simultaneously establish the identities of all cores for
401	a particular tree.

403	So, how might core addresses be discovered? One obvious solution would
404	be to advertise core addresses, together with their associated network
405	resource, in an application such as, or very much like, ``sd''.

407	With regards to our final question, the choice of core will determine if
408	a packet reaches the nearest server. Since users cannot be expected to
409	know about network topology, it is assumed that the choice of core will
410	be fairly random. Hence, our scheme makes no guarantees that a client
411	request will reach the nearest server.

413	Part B

415	1.  Protocol Overview

417	1.1.  CBT Group Initiation

419	   As in the other multicast schemes, one user, the group initiator,
420	   initiates a CBT multicast group. The procedures involved in
421	   initiating and joining a CBT group involve a little more user
422	   interaction than current IP multicast schemes; for example, it is
423	   necessary to supply information such as desired group scope, as
424	   well as to select the primary core from a selection of
425	   pre-configured core routers. Explicit core rankings help prevent
426	   loops when the core tree is initially set up. They also assist in
427	   the tree maintenance process should the tree become partitioned.

429	   Group initiation could be carried out by a network management centre,
430	   or by some other external means, rather than have a user act as group
431	   initiator.  However, in the author's implementation, this flexibility
432	   has been afforded the user, and a CBT group is invoked by means of a
433	   graphical user interface (GUI), known as the CBT User Group Manage-
434	   ment Interface.

436	   NOTE: Work is currently in progress to address the issue of core
437	   placement.

439	1.2.  Tree Joining Process

441	   Once the cores have been enumerated by a group's initiator, and the
442	   application, port number etc. have been selected, the group-
443	   initiating host sends a special CORE-NOTIFICATION message to each of
444	   them, which is acknowledged. The purpose of this message is twofold:
445	   firstly, to communicate the identities of all of the cores, together
446	   with their rankings, to each of them individually; secondly, to
447	   invoke the building of the core backbone. These two procedures follow
448	   one after the other in the order just described. New receivers
449	   attempting to join whilst the building of the core backbone is still
450	   in progress have their explicit JOIN-REQUEST messages stored by
451	   whichever CBT-capable router, involved in the core joining process,
452	   is encountered first. Routers on the core backbone will usually
453	   include not only the cores themselves, but intervening CBT-capable
454	   routers on the unicast path between them. Once this setup is
455	   complete, any pending joins for the same group can be acknowledged.

457	   All the CBT-capable routers traversed by a JOIN-ACKnowledgement
458	   change their status to CBT-non-core routers for the group identified
459	   by group-id. It is the JOIN-ACK that actually creates a tree branch.

461	   The JOIN-ACK carries the complete core list for the group, which is
462	   stored by each of the routers it traverses. Between sending a JOIN-
463	   REQUEST and receiving a JOIN-ACK, a router is in a state of pending
464	   membership. A router that is in the join pending state cannot send
465	   join acknowledgements in response to other join requests received for
466	   the same group, but rather caches them for acknowledgement subsequent
467	   to its own join being acknowledged.
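
   The pending-membership behaviour described above can be sketched as
   a small per-group state machine. The class, its method names and
   the state labels are hypothetical; in particular, the handling of a
   join request by an off-tree router (forward it and become pending)
   is an assumption of this sketch rather than something stated above.

```python
class CbtRouter:
    """Per-group join state of one router (hypothetical model)."""

    def __init__(self):
        self.state = "off-tree"      # "off-tree", "pending" or "on-tree"
        self.core_list = []
        self.cached_joins = []

    def send_join_request(self):
        """Enter pending membership until a JOIN-ACK arrives."""
        self.state = "pending"

    def receive_join_request(self, requester):
        """Return the requesters that can be acknowledged right away."""
        if self.state == "on-tree":
            return [requester]
        if self.state == "off-tree":
            # Assumption: forward the join towards the core and wait.
            self.send_join_request()
        self.cached_joins.append(requester)   # pending: cache, no ack yet
        return []

    def receive_join_ack(self, core_list):
        """Store the core list, join the tree, release cached joins."""
        self.state = "on-tree"
        self.core_list = list(core_list)
        released, self.cached_joins = self.cached_joins, []
        return released


router = CbtRouter()
router.send_join_request()
print(router.receive_join_request("host-A"))      # cached while pending
print(router.receive_join_ack(["core-1", "core-2"]))
```

   Joins cached during the pending period are acknowledged only after
   the router's own JOIN-ACK has arrived, matching the order described
   in the text.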

469	   Non-member senders, and new group receivers, are expected to know the
470	   address of at least one of the corresponding group's cores in order
471	   to send to/join a group. The current specification does not state how
472	   this information is gleaned, but it might be obtainable from a
473	   directory such as ``sd'' (the multicast session directory; see
474	   footnote 2) or from the Domain Name System (DNS) (see footnote 3).

476	   In accordance with existing IP multicast schemes, if the scope of
477	   multicasts is to extend beyond the local area, at least one CBT-
478	   capable router must be present on the local subnetwork for hosts on
479	   that subnetwork to utilize CBT multicast delivery.  Only one local
480	   router, the designated router, is allowed to send to/receive from
481	   uptree (i.e. the branch leading to/from the core) for a particular
482	   group. We therefore make a clear distinction between a group member-
483	   ship interrogator -- the router responsible for sending IGMP host-
484	   membership queries onto the local subnet, and the designated router.
485	   However, they may or may not be one and the same. LAN specifics are
486	   discussed in sections 1.6, 1.7 and 1.8.

488	_________________________
489	   2 By Van Jacobson et al., LBL.
490	   3 We considered disseminating core identities by including them
491	   in link-state routing updates. However, this does not provide
492	   scalability since it involves global group information
493	   distribution. Further, it involves a dependency on link-state
494	   routing.

495	   Once the designated router (DR) has been established, i.e. the router
496	   that is on the shortest-path to the corresponding core, the new
497	   receiver (host) sends a special CBT report to it, requesting that it
498	   join the corresponding delivery tree if it has not already. If the DR
499	   has already joined the corresponding tree, then the DR multicasts to
500	   the group a notification to that effect back across the subnet.
501	   Information included in this notification includes whether the DR was
502	   successful in joining the corresponding tree, and the actual core
503	   affiliation.

505	      NOTE: the actual core affiliation of a tree router may differ from
506	     the core specified in the join request, if that join is terminated
507	     by an on-tree router whose affiliation is to a different core.

509	   If the local DR has not joined the tree, then it proceeds to send a
510	   JOIN-REQUEST and awaits an acknowledgement, at which time the notifi-
511	   cation, as described above, is multicast across the subnetwork.

513	1.3.  Tree Leaving Process

515	   A QUIT-REQUEST is a request by a CBT router to leave a group.  A
516	   QUIT-REQUEST may be sent by a router to detach itself from a tree if
517	   and only if it has no members for that group on any directly attached
518	   subnets, AND it has received a QUIT-REQUEST on each of its child
519	   interfaces for that group (if it has any). The QUIT-REQUEST can only
520	   be sent to the parent router.  The parent immediately acknowledges
521	   the QUIT-REQUEST with a QUIT-ACK and removes that child interface
522	   from the tree. Any CBT router that sends a QUIT-ACK in response to
523	   receiving a QUIT-REQUEST should itself send a QUIT-REQUEST upstream
524	   if the criteria described above are satisfied.
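
   The "if and only if" condition for sending a QUIT-REQUEST can be
   written as a single predicate. The sketch below is illustrative; the
   function and parameter names are hypothetical and interfaces are
   represented by arbitrary identifiers.

```python
def may_send_quit(has_local_members, child_interfaces, quit_received_on):
    """Return True if a router may send a QUIT-REQUEST to its parent.

    has_local_members: True if any directly attached subnet has group members.
    child_interfaces:  child interfaces for the group (may be empty).
    quit_received_on:  child interfaces on which a QUIT-REQUEST has arrived.
    """
    if has_local_members:
        return False
    # Every child interface (if any) must already have sent a QUIT-REQUEST.
    return set(child_interfaces) <= set(quit_received_on)
```

   A router with no children and no local members satisfies the
   condition trivially, since the set of child interfaces is empty.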

526	   Failure to receive a QUIT-ACK despite several re-transmissions gives
527	   the sending router the right to remove the relevant parent interface
528	   information, and by doing so, removes itself from the CBT tree for
529	   that group.

531	1.4.  Tree Maintenance Issues

533	   Robustness mechanisms have been built into the CBT protocol to
534	   ensure timely tree re-configuration in the event of a node or core
535	   failure. These mechanisms are implemented in the form of
536	   request-response messages. Their frequency is configurable, the
537	   trade-off being between protocol overhead and timeliness in
538	   detecting, and recovering from, a node failure.

541	1.4.1.  Node Failure

543	   The CBT protocol treats core- and non-core failure in the same way,
544	   using the same mechanisms to re-establish tree connectivity.

546	   Each child node on a CBT tree monitors the status of its
547	   parent/parent link at fixed intervals by means of a ``keepalive''
548	   mechanism operating between them.  The ``keepalive'' mechanism is
549	   implemented by means of two CBT control messages: CBT-ECHO-REQUEST
550	   and CBT-ECHO-REPLY.

552	   For any non-core router, if its parent router, or path to the parent,
553	   fails, that non-core router is initially responsible for re-attaching
554	   itself, and therefore all routers subordinate to it on the same
555	   branch, to the tree (Note: re-joining is not necessary just because
556	   unicast calculates a new next-hop to the core).

558	   Subsequent to sending a QUIT-REQUEST on the parent link, a non-core
559	   router initially attempts to re-join the tree by sending a RE-JOIN-
560	   REQUEST (see section 1.4.4) on an alternate path (the alternate path
561	   is derived from unicast routing) to an arbitrary alternate core
562	   selected from the core list. The corresponding core is tested for
563	   reachability before the re-join is sent, by means of the control mes-
564	   sage: CBT-CORE-PING. Failure to receive a response from the selected
565	   core will result in another being selected, and the process continues
566	   to repeat itself until a reachable core is found.
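
   The core selection step can be sketched as below. This is a minimal
   illustration under stated assumptions: `core_ping` stands in for the
   CBT-CORE-PING exchange and simply returns True or False, the
   `exclude` parameter models "alternate" as "not the previous core",
   and the sketch makes a single pass over the core list rather than
   repeating indefinitely.

```python
import random


def select_reachable_core(core_list, core_ping, exclude=None, rng=random):
    """Pick an arbitrary core that answers a ping, or None if none does.

    core_list: core addresses learned from the JOIN-ACK.
    core_ping: callable(core) -> bool, a stand-in for CBT-CORE-PING.
    exclude:   core to skip (e.g. the one previously affiliated to).
    """
    candidates = [c for c in core_list if c != exclude]
    rng.shuffle(candidates)          # "arbitrary" selection order
    for core in candidates:
        if core_ping(core):
            return core              # the RE-JOIN-REQUEST is sent towards this core
    return None                      # caller may retry later
```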

568	   A RE-JOIN-REQUEST (as opposed to a JOIN-REQUEST) is sent because
569	   the re-joining router has subordinate routers, i.e. there exists a
570	   downstream branch connected to it.
571	   Care must be taken in this case to avoid loops forming on the tree.
572	   If the joining router did not have downstream routers connected to
573	   it, it would not be necessary to take precautions to avoid loops
574	   since they could not occur (this is explained in more detail in sec-
575	   tion 1.4.3).

577	     NOTE: It was an engineering design decision not to flush the com-
578	     plete (downstream) branch when some (upstream) router detects a
579	     failure.  Whilst each router would join via its shortest-path to
580	     the corresponding core, it would result in an overall longer re-
581	     connectivity latency.

583	   A FLUSH-TREE control message is however sent if the best next-hop of
584	   the re-join is a child on the same tree.

586	1.4.2.  Core Failure

588	   Once the core tree has been established as the initial step of group
589	   initiation, core router failure thereafter is handled no differently
590	   than non-core router failure, with a core attempting to re-connect
591	   itself to the corresponding tree by means of either a join or re-
592	   join.

594	   When a core router re-starts subsequent to failure, it will have no
595	   knowledge of the tree for which it is supposed to be currently a
596	   core.  The only means by which it can find out, and therefore re-
597	   establish itself on the corresponding tree is if some other on-tree
598	   router sends it a CBT-CORE-PING message. This message, by default,
599	   always contains the identities of all the cores for a group, together
600	   with the group-id.

602	   On receipt of a CBT-CORE-PING, a recently re-started core will re-
603	   join the tree by means of a JOIN-REQUEST.

605	1.4.3.  Unicast Transient Loops

607	   Routers rely on underlying unicast routing to carry JOIN-REQUESTs
608	   towards the core of a core-based tree. However, subsequent to a
609	   topology change, transient routing loops, so called because of their
610	   short-lived nature, can form in routing tables whilst the routing
611	   algorithm is in the process of converging or stabilizing.

613	   There are two cases to consider with respect to CBT and unicast tran-
614	   sient loops, namely:

616	   o    a join is sent over a transient loop, but no part of the
617	        corresponding CBT tree forms part of that loop. In this case,
618	        the join will never get acknowledged and will therefore time
619	        out. Subsequent re-tries will succeed after the transient loop
620	        has disappeared.

622	   o    a join is sent over a transient loop, and the loop consists
623	        either partly or entirely of routers on the corresponding CBT
624	        tree. If the loop consists only partly of routers on the tree
625	        and the join originated at a router that is not attempting to
626	        re-join the tree, then the JOIN-REQUEST will be acknowledged. No
627	        further action is necessary since a loop-free path exists from
628	        the originating router to the tree.

630	        If the loop consists entirely of routers on the tree, then the
631	        router originating the join is attempting to re-join the tree.
632	        In this case also, the join could be acknowledged which would
633	        result in a loop forming on the tree, so we have designed a
634	        loop-detection mechanism which is described in section 1.4.4.

636	1.4.4.  Loop Detection

638	   The CBT protocol incorporates an explicit loop-detection mechanism.
639	   Loop detection is only necessary when a router, with at least one
640	   child, is attempting to re-connect itself to the corresponding tree.

642	   We distinguish between three types of JOIN-REQUEST: active; active
643	   re-join; and non-active re-join (see Part C, section 1.3).

645	   An active JOIN-REQUEST for group A is one which originates from a
646	   router which has no children belonging to group A.

648	   An active re-join for group A is one which originates from a router
649	   that has children belonging to group A.

651	   A non-active re-join is one that originally started out as an active
652	   re-join, but has reached an on-tree router for the corresponding
653	   group. At this point, the router changes the join status to non-
654	   active re-join and forwards it on its parent branch, as does each CBT
655	   router that receives it. Should the router that originated the active
656	   re-join subsequently receive the non-active re-join, a loop is obvi-
657	   ously present in the tree. The router must therefore immediately send
658	   a QUIT-REQUEST to its parent router, and attempt to re-join again. In
659	   this way the re-join acts as a loop-detection packet.
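
   The way a non-active re-join exposes a loop can be illustrated by
   following parent pointers from the on-tree router that converted the
   re-join. The sketch below is a simplified model, not protocol code:
   the tree is a hypothetical dictionary mapping each router to its
   parent, and `max_hops` is an assumed safety bound.

```python
def rejoin_detects_loop(originator, first_on_tree_router, parent_of, max_hops=64):
    """Return True if the non-active re-join comes back to its originator.

    originator:           router that sent the active re-join.
    first_on_tree_router: on-tree router that converted it to non-active.
    parent_of:            dict mapping router -> parent (cores are absent).
    """
    node = first_on_tree_router
    for _ in range(max_hops):
        if node == originator:
            return True          # originator must send QUIT-REQUEST and retry
        node = parent_of.get(node)
        if node is None:
            return False         # reached a router with no parent: no loop
    return False
```

   For example, if router R re-joins via X, and X's parent chain is
   X -> C -> R, the re-join returns to R and the loop is detected.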

661	   Another scenario that requires consideration is when there is a break
662	   in the path (tunnel) between a child and its parent. Although the
663	   parent is active, the child believes that the parent is down -- the
664	   child cannot distinguish between the parent being down and the path
665	   to it being down.  If the path failure is short-lived, whilst the
666	   child will have chosen a new route to the core, the parent will be
667	   unaware of this, and will continue forwarding over its child inter-
668	   faces, the potential risk being apparent.

670	   We guard against this using a child assert mechanism, which is impli-
671	   cit, i.e. no control message overhead is incurred for this mechanism.
672	   If no CBT-ECHO-REQUEST is heard, after a certain interval the
673	   corresponding child interface is removed by the parent.
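
   A parent could implement the child assert check roughly as shown
   here. This is a sketch under assumptions: the names are hypothetical,
   times are plain numbers in seconds, and the value of the interval is
   left to the caller since the text does not fix it.

```python
def expire_children(last_echo_request, now, interval):
    """Split child interfaces into (kept, removed).

    last_echo_request: dict child interface -> time the last
                       CBT-ECHO-REQUEST was heard on it.
    now, interval:     current time and allowed silence, same units.
    """
    kept, removed = {}, []
    for child, heard_at in last_echo_request.items():
        if now - heard_at > interval:
            removed.append(child)        # silent for too long: drop from tree
        else:
            kept[child] = heard_at
    return kept, sorted(removed)
```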

675	   As an additional precaution against packet looping, multicast data
676	   packets that are in the process of spanning a CBT's delivery tree
677	   branches (remember, we distinguish between actual tree branches and
678	   attached subnetworks, although there are cases when they are one and
679	   the same) carry an on-tree indicator in the CBT header of the packet.
680	   Provided a data packet arrives via a valid tree interface, all
681	   routers are obliged to check that the on-tree indicator is set
682	   accordingly. A data packet arriving at the tree for the first time
683	   from a non-member sender will have the on-tree indicator bits set by
684	   the receiving router. These bits should never subsequently be modified
685	   by any router.  If a packet is erroneously forwarded by an on-tree
686	   router over an off-tree interface and somehow works its way back on
687	   tree, it can be immediately recognised and discarded.

690	1.5.  Core Placement

692	   As it stands, the current implementation of CBT uses trivial heuris-
693	   tics for core placement.

695	   Careful placement of core(s) no doubt assists in optimizing the
696	   routes between any sender and group members on the tree.  Depending
697	   on particular group dynamics, such as sender/receiver population, and
698	   traffic patterns, it may well be counter-productive to place a
699	   core(s) near or at the centre of a group. In any event, there exists
700	   no polynomial time algorithm that can find the centre of a dynamic
701	   multicast spanning tree.

703	   One suggestion might be that cores be statically configured
704	   throughout the Internet - there need only be some relatively small
705	   number of cores per backbone network (see footnote 4),
706	   and the addresses of these cores would be ``well-known''.

709	   Work is currently in progress to develop a core location/placement
710	   mechanism.

712	1.6.  LAN Designated Router

714	   As we have said, there must only ever exist one DR for any particular
715	   group that is responsible for uptree forwarding/reception of data
716	   packets.

718	   A group's DR is elected by means of an explicit mechanism. Whenever a
719	   host initiates/joins a group, part of the process is for it to send a
720	   CBT-DR-SOLICITATION message, addressed to the CBT ``all-routers''
721	   address, which is a request for the best next-hop router to a speci-
722	   fied core.

724	   If the group is being initiated, a DR will almost certainly not be
725	   present on the local subnet for the group, whereas if a group is
726	   being joined, the DR may or may not be present, depending on whether
727	   there exist other group members on the LAN (subnet).

729	   If a DR is present for the specified group, it responds to the soli-
730	   citation with a CBT-DR-ADVERTISEMENT, which is addressed to the
731	   group.

733	   If no DR is present, each CBT router inspects its unicast routing
734	   table to establish whether it is the best next-hop to the specified
735	   core.

737	   A router which considers itself the best next-hop does not respond
738	   immediately with an advertisement, but rather sends a CBT-DR-ADV-
739	   NOTIFICATION to the CBT ``all-routers'' address. This is a precau-
740	   tionary measure to prevent more than one router advertising itself as
741	   the DR for the group (it is conceivable that more than one router
742	   might consider itself the best next-hop to the core). If this
743	   scenario does indeed occur, the advertisement notification acts as a
744	   tie-breaker, the router with the lowest address winning the election.
745	   The lowest addressed router subsequently advertises itself as DR for
746	   the group.
747	_________________________
748	  4 The storage and switching overhead incurred by
749	these core routers increases linearly with the number
750	of groups traversing them. A threshold value could be
751	introduced indicating the maximum number of groups per-
752	mitted to traverse a core router. Once exceeded, addi-
753	tional core routers would need to be assigned to the
754	backbone.
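
   The DR tie-break of section 1.6 (lowest address wins) can be sketched
   as follows. The sketch assumes IPv4 addresses in dotted-quad form and
   a hypothetical function name; addresses are compared numerically, not
   as strings.

```python
import ipaddress


def elect_dr(notifying_routers):
    """Return the address of the elected DR, or None if nobody volunteered.

    notifying_routers: addresses of the routers that sent a
    CBT-DR-ADV-NOTIFICATION for the group.
    """
    if not notifying_routers:
        return None
    lowest = min(ipaddress.IPv4Address(a) for a in notifying_routers)
    return str(lowest)
```

   Numeric comparison matters: as strings, "10.0.0.12" sorts before
   "10.0.0.3", which would elect the wrong router.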

757	1.7.  Non-Member Sending

759	   For non-member senders wishing to send multicasts beyond the scope of
760	   the local subnetwork, the presence of a local CBT-capable router is
761	   mandatory. The sending of multicast packets from a non-member host to
762	   a particular group is two-phase: the first phase involves unicasting
763	   the packet from the originating host to one of the group's
764	   cores (the destination field of the IP header carries the unicast
765	   address of the core).  The second phase is the dissemination of the
766	   packet by the receiving router to neighbouring (adjacent) routers
767	   on the corresponding tree. Similarly, when an on-tree neighbour
768	   receives the packet, it distributes it in the same fashion.

770	   Before the multicast leaves the originating subnetwork, it is neces-
771	   sary for the local CBT DR to append a CBT header to the packet
772	   (behind the IP header), and change the IP destination address field
773	   from a multicast address to the unicast address of a core for the
774	   group. How does the CBT DR know that this multicast address is asso-
775	   ciated with a CBT group?  The answer is that there must be some form
776	   of mapping mechanism, which has information about which group addresses
777	   correspond to CBT multicast groups.  This mechanism maps an IP multi-
778	   cast address to a unicast core address.

780	   Packets sent from a non-member sender will first encounter the
781	   corresponding delivery tree either at the addressed core, or hit an
782	   on-tree router that is on the shortest-path between the sender and
783	   the core. What happens when a CBT packet hits the corresponding
784	   delivery tree is dealt with under ``Data Packet Forwarding'' in sec-
785	   tion 1.8 below.

787	   NOTE: No host changes are required for CBT. CBT hosts are simply
788	   required to run the CBT application-level software that provides the
789	   CBT user group management interface.

791	1.8.  Data Packet Forwarding

793	   In this section we describe how multicast data packets span a CBT
794	   tree.

796	     It is important to note that CBT uses the Internet Group Management
797	     Protocol (IGMP) in much the same way as traditional IP schemes,
798	     namely to establish group presence on directly-connected subnets,
799	     and to exchange CBT routing information. A new IGMP message type
800	     has been created for exchanging CBT routing messages.

802	   We must again bring to the reader's attention the distinction between
803	   tree branches and subnets, although there are cases where they are
804	   one and the same.

806	   It has been an important engineering design goal for CBT to be back-
807	   wards compatible with IP-style multicasts. Until the interface with
808	   other multicast protocols is clearly defined, CBT routing information
809	   is not exchanged with that of any other schemes.

811	   IP-style multicast data packets arriving at a CBT router are checked
812	   to see if they originated locally. If not, they are discarded. Other-
813	   wise, the local CBT DR for the group first sends a copy of the IP-
814	   style packet over any directly-connected subnetworks with group
815	   member presence (provided the TTL allows), then appends a CBT header
816	   to the packet for forwarding over outgoing tree interfaces.

818	   CBT-style packets arriving at a CBT router are forwarded over tree
819	   interfaces for the group, and sent IP-style over any directly-
820	   connected subnetworks with group member presence. The conversion from
821	   a CBT-style packet to an IP-style packet requires the copying of
822	   various fields of the CBT header to the IP header.

824	   The child(ren) or parent of a CBT router may be reachable over a
825	   multi-access LAN. This is the case where a subnetwork and a tree
826	   branch are one and the same. In this case, the forwarding of the
827	   CBT-style packets is achieved with multicast as opposed to unicast.
828	   End-systems subscribed to the same group may receive these packets,
829	   but they will not be processed, since end-systems will not recognise
830	   the upper-layer protocol identifier, i.e. CBT.

832	     NOTE: it was an engineering design decision to multicast data pack-
833	     ets with a CBT header on multi-access links -- the case of unicast-
834	     ing separately from parent to n children is clearly more costly.
835	     Multicasting also reduces traffic -- when a parent receives a
836	     packet, it does not need to re-send the packet to any of its other
837	     children that may be present on the multi-access link, since they
838	     will have received a copy from the child's multicast.

840	   Data arriving at a CBT router is always multicast first IP-style onto
841	   any directly-connected subnets with group member presence, and only
842	   subsequently unicast (multicast on multi-access links) to
843	   parent/children with a CBT header.

845	   A CBT router will not forward IP-style multicast data packets unless
846	   that router has a forwarding information base (FIB) entry for the
847	   specified group. The exception to this is if a multicast originates
848	   on a local subnetwork.  In this case, the local CBT DR for the group
849	   needs to insert a CBT header in the packet (behind the IP header) and
850	   unicast it to one of the cores for the group.

852	   A CBT FIB entry is shown below:

854	         32-bits          8            8           4         8     |    8
855	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
856	      |   group-id  | parent addr | parent vif | No. of  |                    |
857	      |             |    index    |   index    |children |     children       |
858	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
859	                                                         |chld addr |chld vif |
860	                                                         | index    |  index  |
861	                                                         |+-+-+-+-+-+-+-+-+-+-+
862	                                                         |chld addr |chld vif |
863	                                                         | index    |  index  |
864	                                                         |+-+-+-+-+-+-+-+-+-+-+
865	                                                         |chld addr |chld vif |
866	                                                         | index    |  index  |
867	                                                         |+-+-+-+-+-+-+-+-+-+-+
868	                                                         |                    |
869	                                                         |         etc.       |
870	                                                         |+-+-+-+-+-+-+-+-+-+-+

872	                         Figure 2. CBT FIB entry
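
   One possible in-memory representation of the FIB entry in Figure 2 is
   sketched below. It is illustrative only: the class and method names
   are hypothetical, the field widths of the figure are not modelled,
   and skipping the arrival interface when forwarding is an assumption
   about how the entry would be used.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Interface = Tuple[int, int]          # (address index, vif index)


@dataclass
class FibEntry:
    group_id: str
    parent: Optional[Interface] = None            # None if there is no parent
    children: List[Interface] = field(default_factory=list)

    @property
    def num_children(self) -> int:
        """The 'No. of children' field, derived from the child list."""
        return len(self.children)

    def tree_interfaces(self) -> List[Interface]:
        ifaces = [] if self.parent is None else [self.parent]
        return ifaces + self.children

    def outgoing(self, arrival: Optional[Interface]) -> List[Interface]:
        """Tree interfaces to forward on, skipping the arrival interface."""
        return [i for i in self.tree_interfaces() if i != arrival]
```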

874	   The CBT DR for the specified group fills in the CBT and IP headers as
875	   follows (the CBT header is shown in Figure 3):

877	   o    the multicast group address (group-id) is inserted into the
878	        group-id field of the CBT header.

880	   o    the unicast address of a core router for the corresponding group
881	        is placed in the core address field of the CBT header.

883	   o    the IP address of the originating host is inserted into the ori-
884	        gin field of the CBT header.

886	   o    the proto field of the CBT header is set to identify the upper-
887	        layer (transport) protocol.

889	   o    the ttl field of the CBT header is either decremented (if a CBT-
890	        style packet was received) or set to the value carried in the
891	        packet's IP header (if the packet originated locally).

893	   o    the on-tree field of the CBT header is set (provided this CBT
894	        router is on-tree for the specified group). It is left unset
895	        otherwise.

897	   o    the source address field of the IP header is set to the unicast
898	        address of the originating host (the IP source address changes
899	        as the CBT-style packet is passed router-to-router on a CBT tree).

901	   o    the destination field of the IP header is set to the unicast
902	        address of the on-tree neighbour (set to group address if more
903	        than one neighbour is reachable over the same interface).

905	   o    the protocol field of the IP header is set to the CBT protocol
906	        value.

908	   o    the TTL value of the IP header is set to MAX_TTL.

910	   The packet is now ready for sending. Once this packet arrives at a
911	   CBT router, the packet is ``reverse-engineered'' (using the informa-
912	   tion carried in the CBT header) to produce an IP-style multicast for
913	   sending on directly-connected subnets with group presence.
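
   The header filling and the reverse conversion can be illustrated with
   plain dictionaries. This is a sketch, not a wire-format
   implementation: the key names are hypothetical, the value 255 for
   MAX_TTL is an assumption, and "CBT" is a placeholder for the CBT
   protocol value.

```python
MAX_TTL = 255          # assumed value; the text only names the constant
CBT_PROTO = "CBT"      # placeholder for the CBT protocol value


def encapsulate(ip_pkt, core, next_hop, router_on_tree):
    """Build (IP header, CBT header) for a locally originated multicast."""
    cbt = {
        "group_id": ip_pkt["dst"],      # multicast group address
        "core": core,                   # unicast address of a core
        "origin": ip_pkt["src"],        # originating host
        "proto": ip_pkt["proto"],       # upper-layer (transport) protocol
        "ttl": ip_pkt["ttl"],           # taken from the original IP header
        "on_tree": router_on_tree,
    }
    ip = {"src": ip_pkt["src"], "dst": next_hop,
          "proto": CBT_PROTO, "ttl": MAX_TTL}
    return ip, cbt


def to_ip_style(cbt):
    """Rebuild the IP-style multicast header from the CBT header fields."""
    return {"src": cbt["origin"], "dst": cbt["group_id"],
            "proto": cbt["proto"], "ttl": cbt["ttl"]}
```

   Because the CBT header preserves the group address, origin, protocol
   and TTL, the original IP-style header can be recovered at any CBT
   router that has local group members.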

915	Part C

917	1.  CBT Packet Formats and Message Types

919	   CBT packets travel in IP datagrams. We distinguish between two types
920	   of CBT packet: CBT data packets, and CBT control packets.

922	   CBT data packets carry a CBT header when these packets are traversing
923	   CBT tree branches. The CBT header is positioned immediately behind
924	   the IP header.

926	   CBT control packets carry a CBT control header. All CBT control mes-
927	   sages are implemented over UDP. This makes sense for several reasons:
928	   firstly, all the information required to build a CBT delivery tree is
929	   kept in user space. Secondly, implementation is made considerably
930	   easier.

932	   CBT control messages fall into two categories: primary maintenance
933	   messages, which are concerned with tree-building, re-configuration,
934	   and teardown, and auxiliary maintenance messages, which are mainly
935	   concerned with general tree maintenance.

937	1.1.  CBT Header Format

939	   The CBT header format is shown in Figure 3.

941	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
942	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
943	   |  vers |unused |      type     |   hdr length  |   protocol    |
944	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
945	   |          checksum             |      IP TTL   | on-tree|unused|
946	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
947	   |                        group identifier                       |
948	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
949	   |                          core address                         |
950	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
951	   |                          packet origin                        |
952	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
953	   |                         flow identifier                       |
954	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
955	   |                         security fields                       |
956	   |                             (T.B.D)                           |
957	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

959	                          Figure 3. CBT Header

961	   Each of the fields is described below:

963	      o    Vers: Version number -- this release specifies version 1.

965	      o    type: indicates whether the payload is data or control infor-
966	           mation.

968	      o    hdr length: length of the header, for purpose of checksum
969	           calculation.

971	      o    protocol: upper-layer protocol number.

973	      o    checksum: the 16-bit one's complement of the one's complement
974	           sum of the CBT header, calculated across all fields.

976	      o    IP TTL: TTL value gleaned from the IP header where the packet
977	           originated. It is decremented each time it traverses a CBT
978	           router.

980	      o    on-tree: indicates whether the packet is on- or off-tree.
981	           Once this field is set (i.e. on-tree), it is non-changing.

983	      o    group identifier: multicast group address.

985	      o    core address: the unicast address of a core for the group. A
986	           core address is always inserted into the CBT header by an
987	           originating host, since at any instant, it does not know if
988	           the local DR for the group is on-tree. If it is not, the
989	           local DR must unicast the packet to the specified core.

991	      o    packet origin: source address of the originating end-system.

993	      o    flow-identifier: value uniquely identifying a previously set
994	           up data stream.

996	      o    security fields: these fields (T.B.D.) will ensure the
997	           authenticity and integrity of the received packet.
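
   The fixed part of the header in Figure 3 can be packed as sketched
   below. Several details are assumptions and not taken from the text:
   the on-tree/unused split is taken as 4 bits each with all on-tree
   bits set when on-tree, hdr length is expressed in bytes, the type
   value passed in is arbitrary, the checksum is computed as the usual
   Internet checksum with the checksum field zeroed, and the T.B.D.
   security fields are omitted.

```python
import socket
import struct

CBT_VERSION = 1
# vers/unused, type, hdr length, protocol, checksum, IP TTL, on-tree/unused,
# group identifier, core address, packet origin, flow identifier
CBT_HDR = struct.Struct("!BBBBHBB4s4s4sI")      # 24 bytes


def inet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)    # fold carries back in
    return ~total & 0xFFFF


def pack_cbt_header(msg_type, protocol, ip_ttl, on_tree,
                    group, core, origin, flow_id=0):
    """Return the 24-byte fixed CBT header with its checksum filled in."""
    fields = [CBT_VERSION << 4, msg_type, CBT_HDR.size, protocol,
              0,                                   # checksum placeholder
              ip_ttl, 0xF0 if on_tree else 0x00,
              socket.inet_aton(group), socket.inet_aton(core),
              socket.inet_aton(origin), flow_id]
    fields[4] = inet_checksum(CBT_HDR.pack(*fields))
    return CBT_HDR.pack(*fields)
```

   A receiver can verify the header by running the same checksum over
   the received bytes; a correct header yields zero.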

999	1.2.  Control Packet Header Format

1001	The individual fields are described below. It should be noted that the
1002	contents of the fields beyond ``group identifier'' are empty in some
1003	control messages:

1005	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1006	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1007	   |  vers |unused |      type     |      code     |   unused      |
1008	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1009	   |         hdr length            |            checksum           |
1010	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1011	   |                        group identifier                       |
1012	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1013	   |                          packet origin                        |
1014	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1015	   |                          core address                         |
1016	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1017	   |                             Core #1                           |
1018	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1019	   |                             Core #2                           |
1020	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1021	   |                             Core #3                           |
1022	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1023	   |                             Core #4                           |
1024	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1025	   |                             Core #5                           |
1026	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1027	   |                   Resource Reservation fields                 |
1028	   |                             (T.B.D)                           |
1029	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1030	   |                         security fields                       |
1031	   |                             (T.B.D)                           |
1032	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1034	                  Figure 4. CBT Control Packet Header

1036	      o    Vers: Version number -- this release specifies version 1.

1038	      o    type: indicates control message type (see sections 1.3, 1.4).

1040	      o    code: indicates sub-code of control message type.

1042	      o    header length: length of the header, for purpose of checksum
1043	           calculation.

1045	      o    checksum: the 16-bit one's complement of the one's complement
1046	           sum of the CBT control header, calculated across all fields.

1048	      o    group identifier: multicast group address.

1050	      o    packet origin: source address of the originating end-system.

1052	      o    core address: desired/actual core affiliation of control mes-
1053	           sage.

1055	      o    Core #1 to Core #5: a maximum of 5 core addresses may be
1056	           specified for any one group. An implementation is not
1057	           expected to utilize more than, say, 3.

1059	        NOTE: It was an engineering design decision to have a fixed max-
1060	        imum number of core addresses, to avoid a variable-sized packet.

1062	      o    Resource Reservation fields: these fields (T.B.D.) are used
1063	           to reserve resources as part of the CBT tree set up pro-
1064	           cedure.

1066	      o    Security fields: these fields (T.B.D.) ensure the authenti-
1067	           city and integrity of the received packet.
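
   The fixed-size core list of Figure 4 can be sketched as below. The
   following are assumptions rather than statements of the
   specification: unused core slots are filled with 0.0.0.0, header
   length is in bytes, the type and code values passed in are
   arbitrary, the checksum is the usual Internet checksum computed with
   the checksum field zeroed, and the T.B.D. resource reservation and
   security fields are omitted.

```python
import socket
import struct

MAX_CORES = 5
# vers/unused, type, code, unused, hdr length, checksum,
# group identifier, packet origin, core address, Core #1..#5
CTRL_HDR = struct.Struct("!BBBBHH4s4s4s" + "4s" * MAX_CORES)   # 40 bytes


def inet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pack_control_header(msg_type, code, group, origin, core, cores):
    """Return the 40-byte fixed control header; `cores` holds up to 5 addresses."""
    if len(cores) > MAX_CORES:
        raise ValueError("at most 5 core addresses may be specified")
    padded = list(cores) + ["0.0.0.0"] * (MAX_CORES - len(cores))
    fields = [1 << 4, msg_type, code, 0, CTRL_HDR.size, 0,
              socket.inet_aton(group), socket.inet_aton(origin),
              socket.inet_aton(core)]
    fields += [socket.inet_aton(c) for c in padded]
    fields[5] = inet_checksum(CTRL_HDR.pack(*fields))   # checksum slot
    return CTRL_HDR.pack(*fields)
```

   Padding the list to five slots keeps every control header the same
   size, which is the design decision recorded in the NOTE above.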

1069	1.3.  Primary Maintenance Message Types

1071	   There are six types of CBT primary maintenance message, namely:

1073	      o+    JOIN-REQUEST: invoked by an end-system, generated and sent
1074	           (unicast) by a CBT router to the specified core address. Its
1075	           purpose is to establish the sending CBT router as part of the
1076	           corresponding delivery tree.

1078	      o+    JOIN-ACK: an acknowledgement to the above. The full list of
1079	           core addresses is carried in a JOIN-ACK, together with the
1080	           actual core affiliation (the join may have been terminated by
1081	           an on-tree router on its journey to the specified core, and
1082	           the terminating router may or may not be affiliated to the
1083	           core specified in the original join). A JOIN-ACK traverses
1084	           the same path as the corresponding JOIN-REQUEST, and it is
1085	           the receipt of a JOIN-ACK that actually creates a tree
1086	           branch.

1088	      o+    JOIN-NACK: a negative acknowledgement, indicating that the
1089	           tree join process has not been successful.

1091	      o+    QUIT-REQUEST: a request, sent from a child to a parent, to be
1092	           removed as a child of that parent.

1094	      o+    QUIT-ACK: acknowledgement to the above. If the parent, or the
1095	           path to it, is down, no acknowledgement will be received
1096	           within the timeout period; the child nevertheless removes
1097	           its parent information.

1099	      o+    FLUSH-TREE: a message sent from parent to all children, which
1100	           traverses a complete branch. This message results in all tree
1101	           interface information being removed from each router on the
1102	           branch, possibly because of a re-configuration scenario.

1104	   The JOIN-REQUEST has three valid sub-codes, namely JOIN-ACTIVE, RE-
1105	   JOIN-ACTIVE, and RE-JOIN-NACTIVE.

1107	   A JOIN-ACTIVE is sent from a CBT router that has no children for the
1108	   specified group.

1110	   A RE-JOIN-ACTIVE is sent from a CBT router that has at least one
1111	   child for the specified group.

1113	   A RE-JOIN-NACTIVE originally started out as an active re-join, but
1114	   has reached an on-tree router for the corresponding group. At this
1115	   point, the router changes the join status to non-active re-join and
1116	   forwards it on its parent branch, as does each CBT router that
1117	   receives it. Should the router that originated the active re-join
1118	   subsequently receive the non-active re-join, it must immediately send
1119	   a QUIT-REQUEST to its parent router. It then attempts to re-join.
1120	   In this way the re-join acts as a loop-detection packet.
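
   The sub-code rules and the loop-detection behaviour described above
   can be sketched as follows. This is an illustrative Python model under
   stated assumptions, not a definitive implementation. The `originator`
   field, the Action names and both function names are hypothetical; in
   particular, how a router recognises its own re-join is not stated in
   this section. Handling of a plain JOIN-ACTIVE at an on-tree router is
   deliberately left out.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional, Tuple


class JoinSubcode(Enum):
    JOIN_ACTIVE = "JOIN-ACTIVE"
    RE_JOIN_ACTIVE = "RE-JOIN-ACTIVE"
    RE_JOIN_NACTIVE = "RE-JOIN-NACTIVE"


class Action(Enum):
    FORWARD_TO_PARENT = "forward the join on the parent branch"
    QUIT_AND_REJOIN = "send QUIT-REQUEST to parent, then re-join"
    NOT_MODELLED = "not covered by this sketch"


@dataclass(frozen=True)
class JoinRequest:
    group: str
    originator: str  # hypothetical: router that originated the join
    subcode: JoinSubcode


def originate_join(router: str, group: str, has_children: bool) -> JoinRequest:
    """No children -> JOIN-ACTIVE; at least one child -> RE-JOIN-ACTIVE."""
    subcode = (JoinSubcode.RE_JOIN_ACTIVE if has_children
               else JoinSubcode.JOIN_ACTIVE)
    return JoinRequest(group, router, subcode)


def on_tree_receive(router: str,
                    join: JoinRequest) -> Tuple[Action, Optional[JoinRequest]]:
    """What an on-tree router does with a received re-join."""
    if join.subcode is JoinSubcode.JOIN_ACTIVE:
        return Action.NOT_MODELLED, None
    if (join.subcode is JoinSubcode.RE_JOIN_NACTIVE
            and join.originator == router):
        # Our own re-join came back to us: a loop exists.
        return Action.QUIT_AND_REJOIN, None
    # An active re-join is converted to non-active on reaching the tree;
    # a non-active re-join is passed on unchanged.
    forwarded = replace(join, subcode=JoinSubcode.RE_JOIN_NACTIVE)
    return Action.FORWARD_TO_PARENT, forwarded
```

   For example, if router A originates a RE-JOIN-ACTIVE that reaches
   on-tree router B, B forwards it as RE-JOIN-NACTIVE towards its parent.
   Should that message arrive back at A, on_tree_receive() returns
   QUIT_AND_REJOIN, which is the loop-detection case.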

1122	1.4.  Auxiliary Maintenance Message Types

1124	   There are eleven CBT auxiliary maintenance message types:

1126	      o+    CBT-DR-SOLICITATION: a request sent from a host to the CBT
1127	           ``all-routers'' multicast address, for the address of the
1128	           best next-hop CBT router on the LAN to the core as specified
1129	           in the solicitation.

1131	      o+    CBT-DR-ADVERTISEMENT: a reply to the above. Advertisements
1132	           are addressed to the ``all-systems'' multicast group.

1134	      o+    CBT-CORE-NOTIFICATION: unicast from a group-initiating host
1135	           to each core selected for the group, this message notifies
1136	           each core of the identities of each of the other core(s) for
1137	           the group, together with their core ranking. The receipt of
1138	           this message invokes the building of the core tree by all
1139	           cores other than the highest-ranked (primary core).

1141	      o+    CBT-CORE-NOTIFICATION-REPLY: a core's notification to the
1142	           corresponding end-system that it accepts becoming a core for a group.

1144	      o+    CBT-ECHO-REQUEST: once a tree branch is established, this
1145	           message acts as a ``keepalive'', and is unicast from child
1146	           to parent.

1148	      o+    CBT-ECHO-REPLY: positive reply to the above.

1150	      o+    CBT-CORE-PING: unicast from a CBT router to a core when a
1151	           tree router's parent has failed. The purpose of this message
1152	           is to establish core reachability before sending a JOIN-
1153	           REQUEST to it.

1155	      o+    CBT-PING-REPLY: positive reply to the above.

1157	      o+    CBT-TAG-REPORT: unicast from an end-system to the designated
1158	           router for the corresponding group, subsequent to the end-
1159	           system receiving a designated router advertisement (and, if it
1160	           is the group-initiating host, a core notification reply). This
1161	           message invokes the sending of a JOIN-REQUEST if the receiv-
1162	           ing router is not already part of the corresponding tree.

1164	      o+    CBT-CORE-CHANGE: group-specific multicast by a CBT router
1165	           that originated a JOIN-REQUEST on behalf of some end-system
1166	           on the same LAN (subnet). The purpose of this message is to
1167	           notify end-systems on the LAN belonging to the specified
1168	           group of such things as: success in joining the delivery
1169	           tree; actual core affiliation.

1171	      o+    CBT-DR-ADV-NOTIFICATION: multicast to the CBT ``all-routers''
1172	           address, this message is sent subsequent to receiving a CBT-
1173	           DR-SOLICITATION, but prior to any CBT-DR-ADVERTISEMENT being
1174	           sent. It acts as a tie-breaking mechanism should more than
1175	           one router on the subnet think itself the best next-hop to
1176	           the addressed core. It also prompts an already established DR
1177	           to announce itself as such if it has not already done so in
1178	           response to a CBT-DR-SOLICITATION.
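
   The use of CBT-CORE-PING described in the list above (establish core
   reachability before sending a JOIN-REQUEST) can be sketched as
   follows. This is a hedged Python illustration only. The function name
   and the `core_ping` callback are hypothetical, and trying the cores in
   the order given (for example, by core ranking) is an assumption that
   this section does not state. The limit of 5 comes from the fixed
   maximum number of core addresses per group.

```python
from typing import Callable, Optional, Sequence

MAX_CORES = 5  # fixed maximum number of core addresses for any one group


def select_reachable_core(cores: Sequence[str],
                          core_ping: Callable[[str], bool]) -> Optional[str]:
    """Return the first core that answers a CBT-CORE-PING, else None.

    `core_ping(core)` is a hypothetical callback that sends a
    CBT-CORE-PING to `core` and returns True if a CBT-PING-REPLY
    arrives within whatever timeout the caller has chosen.
    """
    if len(cores) > MAX_CORES:
        raise ValueError("at most %d cores per group" % MAX_CORES)
    for core in cores:  # assumption: caller supplies cores in preferred order
        if core_ping(core):
            return core  # reachable: a JOIN-REQUEST may now be sent to it
    return None  # no core answered; no JOIN-REQUEST is sent
```

   A router whose parent has failed would call select_reachable_core()
   with the group's core list and send its JOIN-REQUEST only to the core
   returned, if any.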

1180	Part D

1182	1.  Interoperability Issues

1184	   One of the design goals of CBT is for it to fully interwork with
1185	   other IP multicast schemes. We have already described how CBT-style
1186	   packets are transformed into IP-style multicasts, and vice-versa.

1188	   In order for CBT to fully interwork with other schemes, it is neces-
1189	   sary to define the interface(s) between a ``CBT cloud'' and the cloud
1190	   of another scheme. The CBT authors are currently working out the
1191	   details of the ``CBT-other'' interface, and therefore we omit further
1192	   discussion of this topic at the present time.

1194	2.  A Router Optimization

1196	   In a CBT-only environment it is possible to optimize the performance
1197	   of CBT with respect to data packet forwarding in CBT-capable routers.
1198	   In such an environment the presence of a CBT header is not necessary,
1199	   and its absence is likely to improve switching times by around 50 per
1200	   cent.  However, the downside is that the functionality the CBT header
1201	   provides, such as CBT security, is lost.

1203	3.  CBT Security Architecture

1205	   See the current I-D: draft-ballardie-mkd-00.{ps,txt}

1207	4.  Acknowledgements

1209	   Special thanks go to Paul Francis, NTT Japan, for the original
1210	   brainstorming sessions that brought about this work.

1212	   Thanks to Steve Ostrowitz (Bay Networks Inc.) for his suggestions and
1213	   comments on making a CBT router implementation as optimal as possible.

1215	   I would also like to thank the participants of the IETF IDMR working
1216	   group meetings for their general constructive comments and sugges-
1217	   tions since the inception of CBT.

1219	Author's Address:

1221	   Tony Ballardie,
1222	   Department of Computer Science,
1223	   University College London,
1224	   Gower Street,
1225	   London, WC1E 6BT,
1226	   ENGLAND, U.K.

1228	   Tel: ++44 (0)71 387 7050 x. 3462
1229	   e-mail: A.Ballardie@cs.ucl.ac.uk

1231	   NOTE: For a version of this draft containing all diagrams and refer-
1232	   ences, you are recommended to retrieve the .ps version.