idnits 2.17.1 

draft-ietf-idmr-pim-arch-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-19) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** Expected the document's filename to be given on the first page, but
     didn't find any

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 24
     longer pages, the longest (page 2) being 68 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.
     (A line matching the expected section header was found, but with an
    unexpected indentation:
     '     1.2 Background' )

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** There are 543 instances of weird spacing in the document.  Is it really
     formatted ragged-right, rather than justified?

  ** There are 8 instances of too long lines in the document, the longest one
     being 4 characters in excess of 72.

  ** There is 1 instance of lines with control characters in the document.

  ** The abstract seems to contain references ([15], [2], [3], [4], [18],
     [5], [19], [6], [7], [8], [9], [10], [11], [12], [13], [14], [1]), which
     it shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 12 has weird spacing: '...ivation  and...'

  == Line 17 has weird spacing: '... Drafts  are  ...'

  == Line 18 has weird spacing: '...cuments  of  t...'

  == Line 19 has weird spacing: '...ups may  also ...'

  == Line 23 has weird spacing: '... Drafts  may  ...'

  == (538 more instances...)

  == Couldn't figure out when the document was first submitted -- there may
     comments or warnings related to the use of a disclaimer for pre-RFC5378
     work that could not be issued because of this.  Please check the Legal
     Provisions document at https://trustee.ietf.org/license-info to determine
     if you need the pre-RFC5378 disclaimer.

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '1' on line 93 looks like a reference

  -- Missing reference section? '2' on line 306 looks like a reference

  -- Missing reference section? '3' on line 871 looks like a reference

  -- Missing reference section? '4' on line 76 looks like a reference

  -- Missing reference section? '5' on line 93 looks like a reference

  -- Missing reference section? '6' on line 306 looks like a reference

  -- Missing reference section? '7' on line 266 looks like a reference

  -- Missing reference section? '8' on line 266 looks like a reference

  -- Missing reference section? '9' on line 282 looks like a reference

  -- Missing reference section? '10' on line 294 looks like a reference

  -- Missing reference section? '11' on line 312 looks like a reference

  -- Missing reference section? '12' on line 335 looks like a reference

  -- Missing reference section? '13' on line 386 looks like a reference

  -- Missing reference section? '14' on line 409 looks like a reference

  -- Missing reference section? '15' on line 452 looks like a reference

  -- Missing reference section? '19' on line 871 looks like a reference

  -- Missing reference section? '18' on line 932 looks like a reference


     Summary: 16 errors (**), 0 flaws (~~), 9 warnings (==), 18 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                             Steven Deering (XEROX)
3	Internet Draft                                      Deborah Estrin (USC)
4	                                                  Dino Farinacci (CISCO)
5	                                                      Mark Handley (UCL)
6	                                                       Ahmed Helmy (USC)
7	                                                      Van Jacobson (LBL)
8	                                                     Chinggung Liu (USC)
9	                                                     Puneet Sharma (USC)
10	                                                    David Thaler (UMICH)
11	                                                      Liming Wei (CISCO)
12	   Protocol Independent Multicast-Sparse Mode (PIM-SM):  Motivation  and
13	   Architecture

15	   Status of This Memo

17	   This document is an Internet  Draft.   Internet  Drafts  are  working
18	   documents  of  the Internet Engineering Task Force (IETF), its Areas,
19	   and its Working Groups. (Note that other groups may  also  distribute
20	   working documents as Internet Drafts).

22	   Internet Drafts are draft  documents  valid  for  a  maximum  of  six
23	   months.  Internet  Drafts  may  be updated, replaced, or obsoleted by
24	   other documents at any time.  It is not appropriate to  use  Internet
25	   Drafts  as  reference  material  or  to  cite  them  other  than as a
26	   ``working'' draft'' or ``work in progress.''

28	   Please check the I-D abstract  listing  contained  in  each  Internet
29	   Draft  directory  to  learn  the  current status of this or any other
30	   Internet Draft.

32	                              Abstract

34	   Traditional multicast  routing  mechanisms  (e.g.  DVMRP  and
35	   MOSPF  [1][2])  were intended for use within regions where groups are
36	   widely represented or bandwidth is universally plentiful. When  group
37	   members, and senders to those group members, are distributed
38	   sparsely across a wide area, these schemes are not  efficient;  data
39	   packets  or  membership report information are periodically sent over
40	   many links that do  not lead to receivers or  senders,  respectively.
41	   This  characteristic  lead  the  Internet  community  to  investigate
42	   multicast   routing   architectures   that   efficiently    establish
43	   distribution  trees across wide-area internets, where many groups are
44	   sparsely represented and where bandwidth is not  uniformly  plentiful
45	   due   to   the  distances  and  multiple  administrations  traversed.
46	   Efficiency is evaluated  in  terms  of  the  state,  control  message
47	   processing,  and  data  packet  processing required across the entire
48	   network in order to deliver data packets to the members of the group.

50	   The   Protocol   Independent    Multicast--Sparse    Mode    (PIM-SM)
51	   architecture:

53	        (a)   maintains the traditional IP multicast  service  model  of
54	             receiver-initiated membership;

56	        (b)    uses  explicit  joins  that  propagate  hop-by-hop   from
57	             members' directly connected routers toward the distribution
58	             tree.

60	        (c)   builds a shared multicast distribution tree centered at  a
61	             Rendezvous Point, and then builds source-specific trees for
62	             those sources whose data traffic warrants it.

64	        (d)   is not dependent on a specific unicast  routing  protocol;
65	             and

67	        (e)   uses soft-state mechanisms to adapt to underlying  network
68	             conditions and group dynamics.

70	        The robustness, flexibility,  and  scaling  properties  of  this
71	        architecture  make  it well suited to large heterogeneous inter-
72	        networks.

74	        This document motivates and describes the  PIM-SM  architecture.
75	        Companion  documents  describe  the detailed protocol mechanisms
76	        for PIM-SM and PIM-DM, respectively [3][4].

78	     1 Introduction

80	        This document describes an architecture for efficiently  routing
81	        to  multicast  groups that may span wide-area (and inter-domain)
82	        internets. We refer to  the  approach  as  Protocol  Independent
83	        Multicast-Sparse  Mode  (PIM-SM)  because it is not dependent on
84	        any  particular  unicast  routing  protocol.   Throughout   this
85	        document  we will use the shorter term PIM, to mean PIM-SM. When
86	        we are referring to the PIM Dense  Mode  protocol  we  will  say
87	        PIM-DM explicitly.

89	        The most significant innovation  in  this  architecture  is  the
90	        efficient support of sparse, wide area groups. This  sparse mode
91	        (SM) of operation  complements  the  traditional   dense-mode
92	        approach  to multicast routing for campus networks, as developed
93	        by Deering [5][6] and implemented in  MOSPF  and  DVMRP  [1][2].
94	        These traditional dense mode multicast schemes were intended for
95	        use within regions  where  a  group  is  widely  represented  or
96	        bandwidth is universally plentiful. However, when group members,
97	        and senders to those groups, are distributed sparsely  across
98	        a  wide  area, these schemes are not efficient; data packets (in
99	        the case of DVMRP) or membership report information (in the case
100	        of  MOSPF)  are occasionally sent over many links that do not
101	        lead to receivers or senders, respectively. The purpose of  this
102	        work  is  to  develop  a  multicast  routing  architecture  that
103	        efficiently establishes distribution trees even when members are
104	        sparsely  distributed.  Efficiency  is evaluated in terms of the
105	        state, control message, and data packet overhead required across
106	        the  entire  network  in  order  to  deliver data packets to the
107	        members of the group.

109	     1.1 Definition of Terms (Glossary):
110	        Following  is  a list  of  terms  and  definitions
111	        used throughout this document, in alphabetical order.
112	        This is a subset of  the  glossary  list  that  appears  in  the
113	        protocol specification.

115	        *    Asserts. The  process  of  choosing  a  single  router  to
116	             forward  multicast  packets from a particular source onto a
117	             particular LAN segment. The need for  Asserts  arises  when
118	             a LAN segment has multiple directly-connected routers with
119	             routes to the source.

121	        *    Bootstrap router (BSR). A BSR is a  dynamically  elected
122	             router   within   a  PIM  domain.  It  is  responsible  for
123	             constructing the RP-Set and originating Bootstrap messages.

125	        *    Candidate-BSR (C-BSR). A C-BSR is a router configured to
126	             participate in the BSR election and act as BSRs if elected.

128	        *    Dense-mode (DM). A generic term referring to a multicast
129	             routing protocol that is optimized for dense groups. DVMRP,
130	             MOSPF, and Dense-mode PIM are examples.

132	        *    Designated  Router  (DR).  The  DR  is  the  highest  IP
133	             addressed  PIM  router on a multi-access LAN. Normally, the
134	             DR sets up multicast route entries and sends  corresponding
135	             Join/Prune  and  Register  messages  on behalf of directly-
136	             connected receivers and sources, respectively. The  DR  may
137	             or  may  not be the same router as the IGMP Querier. The DR
138	             may or may not be the long-term, last-hop  router  for  the
139	             group, or a particular source that is sending to the group;
140	             a router on the LAN that has a lower metric  route  to  the
141	             data source, or to the group's RP, may take over that role.

143	        *    Incoming interface (iif). The iif of a  multicast  route
144	             entry  indicates  the  interface  from which multicast data
145	             packets are accepted for forwarding. The iif is initialized
146	             when the entry is created.

148	        *    Join list. The Join list is one of two lists of IP unicast
149	             addresses  that  is  included in a Join/Prune message; each
150	             address refers to  a  source  or  RP.  It  indicates  those
151	             sources  or  RPs  to  which  downstream receiver(s) wish to
152	             join.

154	        *    Last-hop router. The last-hop router is the router which
155	             forwards   multicast  data  packets  to  directly-connected
156	             member hosts. In general the last-hop router is the DR  for
157	             the  LAN.  However,  under  various conditions described in
158	             this document a parallel router connected to the  same  LAN
159	             may take over as the last-hop router in place of the DR.

161	        *    Member. A host that desires to receive multicast datagrams
162	             for a group. This host need not be a sender to the group. A
163	             Member is synonymously called a  Receiver.

165	        *    Outgoing interface  (oif)  list.  Each  multicast  route
166	             entry

168	             has an oif list containing the outgoing interfaces to which
169	             multicast packets matching that entry should be forwarded.

171	        *    Prune List. The Prune  list  is  the  second  list  of  IP
172	             unicast addresses that is included in a Join/Prune message.
173	             It indicates those sources or  RPs  from  which  downstream
174	             receiver(s) wish to prune.

176	        *    PIM Multicast Border Router (PMBR). A  PMBR  connects  a
177	             PIM domain to other multicast routing domain(s).

179	        *    Rendezvous  Point  (RP).  Each  multicast  group  has  a
180	             shared-tree  via which receivers hear of sources. The RP is
181	             the root of this per-group shared tree, called the RP-Tree.
182	             Candidate-RPs  are routers configured to participate as RPs
183	             for some (or all) groups.

185	        *    RP-Set. The BSR for a PIM region constructs a set of  RP
186	             IP addresses based on Candidate-RP advertisements received.
187	             The RP-Set information is distributed to all PIM routers in
188	             a domain in a Bootstrap message.

190	        *    Reverse Path Forwarding (RPF). RPF is used to select the
191	             appropriate  incoming interface for a multicast route entry
192	             . The RPF neighbor for an IP address X is the the  next-hop
193	             router  used to forward packets toward X. The RPF interface
194	             is the interface to that RPF neighbor. In the  common  case
195	             this  is  the next hop used by the unicast routing protocol
196	             for sending unicast packets toward X. For example, in cases
197	             where  unicast  and  multicast routes are not congruent, it
198	             can be different.

200	        *    Route entry. A multicast route entry is state maintained
201	             in a router along the distribution tree and is created, and
202	             updated based on incoming control  messages,  and  in  some
203	             cases  data  packets. The route entry may be different from
204	             the forwarding entry; the latter is used  to  forward  data
205	             packets  in  real time. Typically a forwarding entry is not
206	             created until data packets arrive, the  forwarding  entry's
207	             iif  and  oif list are copied from the route entry, and the
208	             forwarding entry may be flushed and recreated at will.

210	        *    Shared Tree (RP tree). The set of paths  connecting  all
211	             receivers  of  a group to its RP is the RP tree. A receiver
212	             on the RP tree receives packets from  all  sources  of  the
213	             group,  except  those  sources  that were pruned off the RP
214	             tree.

216	        *    Shortest path tree (SPT).  The  SPT  is  the  multicast
217	             distribution  tree  created  by  the  merger  of all of the
218	             shortest paths that connect receivers  to  the  source  (as
219	             determined by unicast routing).

221	        *    Source. A host that sends multicast datagrams to a  group.
222	             A  Source  is  not  required  to  be  a member. A Source is
223	             synonymously called a Sender.

225	        *    Sparse  Mode  (SM).  Sparse  mode  PIM  uses   explicit
226	             Join/Prune messages and Rendezvous points in place of Dense
227	             Mode PIM's and DVMRP's broadcast and prune mechanism.

229	        *    Wildcard (WC) multicast route entry. Wildcard  multicast
230	             route entries are those entries that may be used to forward
231	             packets for any source  sending  to  the  specified  group.
232	             Wildcard  bits  in  the  join  list of a Join/Prune message
233	             represent either a (*,G) or (*,*,RP)  join;  in  the  prune
234	             list they represent a (*,G) prune.

236	        *    (S,G) route entry.  (S,G)  is  a  source-specific  route
237	             entry.  It  may  be  created  in  response to data packets,
238	             Join/Prune messages, or Asserts. The (S,G) state in routers
239	             creates a source-rooted, shortest path (or reverse shortest
240	             path) distribution tree. (S,G)RPT bit entries  are  source-
241	             specific  entries  on the shared RP-Tree; these entries are
242	             used to prune particular sources off of the shared tree.

244	        *    (*,G) route entry. Group members join the shared RP-Tree
245	             for  a  particular group. This tree is represented by (*,G)
246	             multicast route entries along the  shortest  path  branches
247	             between the RP and the group members.

249	        *    (*,*,RP)  route  entry.  PMBRs  join  toward  all   RPs
250	             supporting  non-local  groups,  within  their PIM domain in
251	             order to pull packets generated within the  region  out  to
252	             the  borders  of the region. The routers along the shortest
253	             path branches between the RP(s) and the PMBRs keep (*,*,RP)
254	             state and use it to determine how to deliver packets toward
255	             the PMBRs if data packets arrive for which there is  not  a
256	             longer match.

258	     1.2 Background

260	        In the traditional dense-mode IP multicast model, established by
261	        Deering  [6], a  multicast address is assigned to the collection
262	        of receivers for a multicast  group.  Senders  simply  use  that
263	        address  as  the  destination  address  of a packet to reach all
264	        members of the group. The separation of  senders  and  receivers
265	        allows  any  host,  member  or non-member, to send to a group. A
266	        group membership protocol (IGMP) [7][8] is used for  routers  to
267	        learn the existence of members on their directly attached
268	        subnetworks.
269	        This receiver-initiated join procedure  has  very  good  scaling
270	        properties;  as  the  group grows, it becomes more likely that a
271	        new receiver will be able to splice onto a nearby branch of  the
272	        distribution  tree. A multicast routing protocol, in the form of
273	        an extension to  existing  unicast  protocols  (e.g.  DVMRP,  an
274	        extension  to  a  RIP-like  distance-vector unicast protocol; or
275	        MOSPF, an extension to the link-state unicast protocol OSPF), is
276	        executed on routers to construct multicast packet delivery paths
277	        and to accomplish multicast data packet forwarding.

279	        In the case of link-state protocols, changes of group membership
280	        on  a  subnetwork  are  detected  by one of the routers directly
281	        attached to that subnetwork,  and  that  router  broadcasts  the
282	        information to all other routers in the same routing domain [9].
283	        Each router  maintains  an  up-to-date  image  of  the  domain's
284	        topology  through  the unicast link-state routing protocol. Upon
285	        receiving a multicast data packet, the router uses the  topology
286	        information  and  the  group membership information to determine
287	        the shortest-path tree (SPT) from the packet's source subnetwork
288	        to  its  destination  group  members. Broadcasting of membership
289	        information is one major factor preventing link-state  multicast
290	        from  scaling  to  larger,  wide-area, networks --- every router
291	        must receive and store membership information for every group in
292	        the domain. The other major factor is the processing cost of the
293	        Dijkstra shortest-path-tree calculations  performed  to  compute
294	        the delivery trees for all active multicast sources [10] for all
295	        groups, thus limiting  its  applicability  on  an  internet-wide
296	        basis.

298	        Distance-vector multicast routing protocols construct  multicast
299	        distribution  trees  using  variants  of Reverse Path Forwarding
300	        (RPF) [11]. When the first data packet is sent to a group from a
301	        particular source subnetwork, and a router receiving this packet
302	        has no knowledge  about  the  group,  the  router  forwards  the
303	        incoming   packet   out   all  interfaces  except  the  incoming
304	        interface.  (Some  schemes  reduce  the   number   of   outgoing
305	        interfaces further by using unicast routing protocol information
306	        to keep track of child-parent  information  [6][2].)  A  special
307	        mechanism  is  used  to avoid forwarding of data packets to leaf
308	        subnetworks with  no  members  in  that  group  (also  known  as
309	        truncated  broadcasting).  Also if the arriving data packet does
310	        not come through the interface that  the  router  uses  to  send
311	        packets  to  the  source  of the data packet, the data packet is
312	        silently dropped; thus the term Reverse  Path  Forwarding  [11].
313	        When  a  router  attached  to a leaf subnetwork, receives a data
314	        packet addressed to a new group, if it finds no members  present
315	        on  its  attached subnetworks, it sends a prune message upstream
316	        towards the source of the data packet. The prune messages  prune
317	        the  tree  branches not leading to group members, thus resulting
318	        in a source-specific shortest-path tree with all  leaves  having
319	        members.  Pruned  branches  will  ``grow  back'' after a time-out
320	        period; these branches will again be pruned if there  are  still
321	        no  multicast  members  and data packets are still being sent to
322	        the group.

324	        Compared with  the  total  number  of  destinations  within  the
325	        greater  internet,  the  number  of  destinations  having  group
326	        members of any particular wide-area group  is  likely  to  be
327	        small.  More  importantly,  bandwidth limitations, and therefore
328	        data and control message overhead, should not be  ignored  in  a
329	        wide  area  context.  In  the  case of distance-vector multicast
330	        schemes, routers that are not on  the  multicast  delivery  tree
331	        still have to carry the periodic truncated-broadcast of packets,
332	        and process the subsequent pruning of branches  for  all  active
333	        groups.   One  particular  distance-vector  multicast  protocol,
334	        DVMRP, has been deployed in hundreds of regions connected by the
335	        MBONE   [12].  However,  its  occasional  broadcasting  behavior
336	        severely limits its  capability  to  scale  to  larger  networks
337	        supporting  much  larger  numbers  of  groups, many of which are
338	        sparse.

340	     1.3 Extending multicast to the wide area: scaling issues

342	        The scalability of a multicast  protocol  can  be  evaluated  in
343	        terms  of  its  overhead  growth  with the size of the internet,
344	        numbers of receivers or sources per group, number of groups, and
345	        distribution   of  group  receivers  and  senders.  Overhead  is
346	        evaluated in terms of resources consumed in routers  and  links,
347	        i.e., state, processing, and bandwidth.

349	        Existing dense-mode  link-state  and  distance-vector  multicast
350	        routing schemes have good scaling properties only when multicast
351	        groups densely populate the network of  interest,  or  when  the
352	        overhead  of  dense-mode operation is negligible relative to the
353	        network resources. When most of the  subnets  or  links  in  the
354	        (inter)network  have  group members, then the bandwidth, storage
355	        and  processing  overhead  of  broadcasting  membership  reports
356	        (link-state),  or  data  packets (distance-vector) is warranted,
357	        since the information or data packets are needed in  most  parts
358	        of  the  network  anyway. The emphasis of our work is to develop
359	        multicast protocols  that  will  also  efficiently  support  the
360	        sparsely distributed groups that are likely to be most prevalent
361	        in   wide-area,   multi-administration,   inter-networks   where
362	        resources must be used more conservatively.

364	     1.4 Overhead and tree types

366	                [Figures are present only in the postscript version]
367	                       Fig. 1  Example of Multicast Trees

369	        The examples in Figure 1 illustrate the inadequacies  of  dense-
370	        mode  mechanisms when supporting sparse, wide area groups. There
371	        are three domains that communicate via an internet. There  is  a
372	        member of a particular group, G, located in each of the domains.
373	        There are no other members of this group currently active in the
374	        internet.  If  a traditional IP multicast routing mechanism such
375	        as DVMRP is used, then when a source in domain
376	         A starts to send  to  the  group,  its  data  packets  will  be
377	        broadcast throughout the entire internet. Subsequently all those
378	        sites that do not have local members will  send  prune  messages
379	        and  the  distribution  tree  will stabilize to that illustrated
380	        with bold lines  in  Figure  1(b).  However,  periodically,  the
381	        source's   packets  will  be  broadcast  throughout  the  entire
382	        internet when the pruned-off branches times out.

384	        Thus far we have motivated our design by contrasting it  to  the
385	        traditional  dense-mode IP multicast routing protocols. The Core
386	        Based Tree (CBT) protocol [13] was proposed to  address  similar
387	        scaling problems in support of sparse-mode multicast. CBT uses a
388	        single delivery tree for each group, rooted at one  of  a  small
389	        set  of  ``core'' routers and shared by all senders to the group.
390	        CBT does not exhibit the  occasional  broadcasting  or  flooding
391	        behavior  of earlier protocols. However, CBT does so at the cost
392	        of imposing a single shared tree for each multicast group.

394	        If CBT were used to support the example group, then a core might
395	        be defined in domain A, and the distribution tree illustrated in
396	        Figure 1(c) would be established. This distribution  tree  would
397	        also be used by sources sending from domains B and C. This would
398	        result in concentration of all  sources'  traffic  on  the  path
399	        indicated  with  bold  lines.  We  refer  to  this  as   traffic
400	        concentration. This is a potentially significant issue with
401	        any protocol that imposes a single shared tree per group. In
402	        addition, the packets traveling from  Y to  Z  will  not  travel
403	        via the shortest path used by unicast packets between  Y and  Z.

405	        [Figures are present only in the postscript version]
406	        Fig. 2  Comparison of shortest-path trees and center-based tree

408	        We need to know the kind of degradations a core-based  tree  can
409	        incur in average networks. David Wall [14] proved that the bound
410	        on maximum delay of an optimal core-based tree (which he  called
411	        a  center-based tree) is 2 times the shortest-path delay. To
412	        get a better understanding of how well optimal core-based  trees
413	        perform  in  average  cases,  we simulated an optimal core-based
414	        tree algorithm over large number of different random graphs.  We
415	        measured  the  maximum delay within each group, and experimented
416	        with graphs of different node degrees. We show the ratio of  the
417	        CBT  maximum  delay  versus  shortest-path tree maximum delay in
418	        Figure 2(a). For each node degree, we tried  500  different  50-
419	        node  graphs  with  10-member  groups chosen randomly. It can be
420	        seen that the maximum delays of core-based  trees  with  optimal
421	        core  placement,  are up to 1.4 times greater than shortest-path
422	        trees. Note that although some error bars  in  the  delay  graph
423	        extend  below  1,  there are no real data points below 1 --- the
424	        distribution is not symmetric, for more details  see  [15].  For
425	        interactive  applications  where  low latency is critical, it is
426	        desirable to use the shortest-path trees  to  avoid  the  longer
427	        delays of an optimal core-based tree.

429	        With respect to the potential traffic concentration problem,  we
430	        also   conducted   simulations  in  randomly  generated  50-node
431	        networks. In each network, there  were  300  active  groups  all
432	        having  40  members,  of  which 32 members were also senders. We
433	        measured the number  of  traffic  flows  on  each  link  of  the
434	        network,  then  recorded  the maximum number within the network.
435	        For each  node  degree  between  three  and  eight,  500  random
436	        networks  were  generated,  and  the  measured maximum number of
437	        traffic flows were averaged. Figure 2(b) shows  a  plot  of  the
438	        measurements  in  networks  with  different  node  degrees. This
439	        experiment demonstrates situations  in  which  CBT  may  exhibit
440	        significantly greater traffic concentrations.

442	        It is evident to us that both tree types have  their  advantages
443	        and  disadvantages. One type of tree may perform very well under
444	        one class of conditions, while the other type may be  better  in
445	        other  situations.  For  example,  shared trees may perform very
446	        well for large numbers of low data rate sources (e.g.,  resource
447	        discovery  applications),  while SPT(s) may be better suited for
448	        high data rate sources (e.g., real  time  teleconferencing).  It
449	        would  be  ideal  to flexibly support both types of trees within
450	        one multicast architecture, so that the selection of tree  types
451	        becomes  a configuration decision within a multicast protocol. A
452	        more complete analysis of these tradeoffs can be found in  [15].
453	        PIM  is  designed  to address the two issues addressed above: to
454	        avoid the overhead of broadcasting packets  when  group  members
455	        sparsely  populate  the  internet,  and  to  do so in a way that
456	        supports  good-quality  distribution  trees  for   heterogeneous
457	        applications.

459	        In PIM, a multicast router can choose to use shortest-path trees
460	        or  a  group-shared  tree. The last-hop routers of the receivers
461	        can make this decision  independently.  A  receiver  could  even
462	        choose  different  types  of  trees  for  different  sources. In
463	        general, we recommend that routers be  configured  to  join  the
464	        shortest  path  tree  for  a  source when the source's data rate
465	        exceeds a configured threshold.

467	        The  capability  to  support  different  tree   types   is   the
468	        fundamental  difference  between  PIM  and  CBT. There are other
469	        significant protocol engineering differences as well,  the  most
470	        significant  of  which  is  PIM's  use of soft state reliability
471	        mechanisms. CBT uses explicit hop-by-hop mechanisms  to  achieve
472	        reliable  delivery of control messages. As described in the next
473	        section, PIM uses periodic refreshes as  its  primary  means  of
474	        reliability.   This  approach  reduces  the  complexity  of  the
475	        protocol and  covers  a  wide  range  of  protocol  and  network
476	        failures  in  a  single  simple  mechanism.  Although soft-state
477	        refreshing can introduce additional message  protocol  overhead,
478	        we  introduce  the  notion  of  scalable  timers to address such
479	        concerns.

481	     1.5 Document organization

483	        In the remainder of this  document  we  enumerate  the  specific
484	        design requirements for wide-area multicast routing (section
485	         2),  summarize  the  architectural  components  and   functions
486	        (section   3),  enumerate  several  protocol engineering choices
487	        made in the design of PIM protocols (section  4),  and  consider
488	        the  use  of  aggregation  to  address  the  scalability problem
489	        (section  5). Protocol details can be found in [3].

491	     2 Requirements

493	        We had several design objectives in  mind  when  designing  this
494	        architecture:

496	        *    Sparse-Mode Regions

498			We define a sparse  mode  region  as one in which

500	             (a)   the number of  networks/domains  with  group  members
501	                  present   is  significantly  smaller  than  number  of
502	                  networks/domains in the region as a whole;

504	             (b)   group members span an area that is too large/wide  to
505	                  rely on scope control; and

507	             (c)   the region spanned by the group is  not  sufficiently
508	                  resource  rich  to  ignore  the  overhead  of  traditional
509	                  schemes.

511	             Groups in sparse-mode regions are not necessarily ``small'';
512	             therefore we must support dynamic groups with large numbers
513	             of participants (i.e. receivers and senders).

515	        *    High-Quality Data Distribution

517	             We wish to support low-delay data distribution when  needed
518	             by  the  application.  In  particular, we avoid  imposing a
519	             single shared tree in which data packets are  forwarded  to
520	             receivers along a common tree, independent of their source.
521	             Source-specific trees are superior when

523	             (a)   multiple sources send data simultaneously  and  would
524	                  experience  poor  service  when  the  traffic  is  all
525	                  concentrated on a single shared tree, or

527	             (b)   the path lengths between sources and destinations  in
528	                  the   shortest-path   tree  (SPTs)  are  significantly
529	                  shorter than in the shared tree.

531	        *     Routing Protocol Independence

533	             The protocol should make use of  existing  unicast  routing
534	             functionality to adapt to topology changes, but at the same
535	             time be independent of the  particular  protocol  employed.
536	             This  independence has another advantage that the multicast
537	             domain  boundaries  may  extend   beyond   unicast   domain
538	             boundaries.  This  allows  network  designers  to take into
539	             consideration the multicast  requirements  and  not  to  be
540	             burdened  with unicast topology restrictions. We accomplish
541	             this by letting the multicast  protocol  make  use  of  the
542	             unicast routing tables, independent of how those tables are
543	             computed.

545	        *     Interoperability with dense mode protocols

547	             We require interoperability with traditional RPF and  link-
548	             state  multicast  routing,  both  intra-domain  and  inter-
549	             domain.  For  example,  the  intra-domain  portion   of   a
550	             distribution  tree  may  be  established  by  some other IP
551	             multicast protocol, and the inter-domain portion by PIM; or
552	             vice  versa.  In  some cases it will be necessary to impose
553	             some additional protocol or configuration overhead in order
554	             to interoperate with some intra-domain routing protocols.

556	        *     Robustness

558	             The protocol should be able to gracefully adapt to  routing
559	             changes. We achieve this by

561	             (a)   using  soft state refreshment mechanisms,

563	             (b)   avoiding a single point of failure by  using  an  RP-
564	                  Set, and

566	             (c)   adapting along with (and based  on)  unicast  routing
567	                  changes  to  deliver  multicast  service  so  long  as
568	                  unicast packets are being serviced.

570	        *     Scalability

572	             We provide mechanisms for scaling with  group  and  network
573	             size.  These  mechanisms  address  the  forms  of overhead:
574	             control messages and  state.  Bandwidth  consumed  by  data
575	             packets  is  already minimized through the use of explicit-
576	             join sparse mode. Control  message  overhead  can  also  be
577	             limited  to  a  fixed  percentage  of the link bandwidth by
578	             adjusting the frequency of periodic messages on a  link  by
579	             link   basis.  This  method  of  controlling  overhead  was
580	             proposed by Van Jacobson.

582	             State overhead can be managed  in  such  a  way  that  each
583	             router  can  unilaterally  choose  its  own  tradeoff point
584	             between the amount of state maintained and  the  amount  of
585	             bandwidth   consumed  by  unneeded  flooding  of  multicast
586	             packets.

588	     3 PIM Components and Functions: Overview

590	        In this section we describe the architectural components of PIM.
591	        The  detailed  protocol  mechanisms  are  described  in  [3]. As
592	        described,  traditional   multicast   routing   protocols   were
593	        optimized   for   densely   distributed   groups   or  uniformly
594	        bandwidth-rich regions, and rely on data driven actions  in  all
595	        network  routers  to  establish efficient distribution trees. In
596	        contrast, sparse-mode multicast constrains data distribution  so
597	        that  packets  reach  only routers that are on the path to group
598	        members. PIM differs from existing IP multicast schemes  in  two
599	        fundamental ways:

601	        *    Routers with local (or downstream) members join  a  sparse-
602	             mode  PIM  distribution tree by sending explicit Join/Prune
603	             messages; in dense-mode IP multicast membership is  assumed
604	             and  multicast  data packets are sent until routers without
605	             local (or downstream) members send explicit prune  messages
606	             to remove themselves from the distribution tree.

608	        *    Whereas dense-mode IP multicast tree construction  is  data
609	             driven,  sparse-mode  PIM  must  use  per-group  Rendezvous
610	             Point for receivers  to  ``meet''  new  sources.  Rendezvous
611	             Points (RP) are used by senders to announce their existence
612	             and by receivers to learn about new senders of a group.  In
613	             SM, the shared-tree join state is stored in anticipation of
614	             data packets, whereas DM does not create state until a data
615	             packet  arrives.  The  source-specific  trees and associate
616	             state are data-driven in PIM, as in PIM-DM.

618	        The shortest-path-tree state maintained in  routers  is  roughly
619	        the  same  type  as  the  multicast  routing information that is
620	        currently maintained by routers running  existing  IP  multicast
621	        protocols  such  as  MOSPF,  i.e., source (S), multicast address
622	        (G), outgoing interface set (oif),  incoming  interface  (iif).
623	        We  refer  to this information as the multicast routing
624	        entry for (S,G). For all routers containing a (S,G) entry, their
625	        oif's and iif together form a shortest-path tree rooted
626	        at S.

628	        An entry for a shared tree can match packets from any source for
629	        its  associated  group  if  the  packets  come through the right
630	        incoming interface, we denote such an entry (*,G). A (*,G) entry
631	        keeps  the  same information a (S,G) entry keeps, except that it
632	        saves the RP address in place of the source address. There is  a
633	        wildcard  flag  (WC-bit)  indicating  that  this  is a wild card
634	        entry, and an RPT-bit indicating that  this  is  a  shared  tree
635	        entry.

637	                 [Figures are present only in the postscript version]
638	                 Fig. 3  How senders rendezvous with receivers
639	        Figure 3 shows a simple scenario of  a  sender  and  a  receiver
640	        joining  a multicast group via an RP. When the receiver wants to
641	        join a multicast group, its last-hop PIM router ( A  in  fig  3)
642	        sends  a Join/Prune message towards the RP for the group. If the
643	        last-hop router does not have RP information, it  is  considered
644	        an  error.  Processing  of  this message by intermediate routers
645	        sets up the multicast tree branch from the RP to  the  receiver.
646	        When   sources   start  sending  to  the  multicast  group,  the
647	        designated router ( D in fig 3) sends  a  PIM-Register  message,
648	        encapsulating  the data packet, to the RP for that group. If the
649	        source's data rate  warrants  a  source-specific  tree,  the  RP
650	        responds  by  sending  a  Join/Prune message towards the source.
651	        Processing of these messages by intermediate routers (there  are
652	        no  intermediate routers between the RP and the source in fig 3)
653	        sets up a packet delivery path from the source to the RP.

655	        If source-specific distribution trees are desired (based on  the
656	        source's  data  rate or some other configuration parameter), the
657	        last-hop  PIM  router  for  each  member  eventually  joins  the
658	        source-rooted  distribution  tree  for  each source by sending a
659	        Join/Prune message towards the source, including the  source  in
660	        the  Join list. After data packets are received on the new path,
661	        router  B in fig 3 sends a PIM-prune  message  towards  the  RP,
662	        including the source S in the prune list.
663	         B knows, by checking  the  incoming  interface  in  it  routing
664	        table,  that  it  is at a point where the shortest-path tree and
665	        the RP  tree  branches  diverge.  A  flag,  called  SPT-bit,  is
666	        included  in  (S,G)  entries  to indicate whether the transition
667	        from shared tree  to  shortest-path  tree  has  completed.  This
668	        minimizes   the   chance  of  losing  data  packets  during  the
669	        transition.

671	        Each PIM router must be able to map a multicast group address to
672	        that  group's  RP  (an  IP  address).  To  do  so,  an RP-Set is
673	        distributed to all PIM routers within a region, and each  router
674	        runs  the  same  hash  function  to  map from group address to a
675	        particular RP in the RP-Set. In this way all  routers  within  a
676	        PIM  region  map  a particular group address to the same RP. The
677	        RP-Set is constructed and distributed by  a  dynamically-elected
678	        bootstrap  router  (BSR)  within the region. Only a single RP is
679	        active for a group at any one point in  time,  and  the  BSR  is
680	        responsible  for  keeping  the RP-Set up to date. Therefore, all
681	        candidate RPs within the  region  send  periodic  advertisements
682	        (liveness indication) to the BSR.

684	        PIM avoids explicit enumeration of  receivers.  In  general,  in
685	        many  existing  and  anticipated  applications,  the  number  of
686	        receivers is much larger than the number of  sources,  and  when
687	        the number of sources is very large, the average data rate tends
688	        to be lower (e.g. resource discovery). In  any  finite  capacity
689	        network  there  is  an  upper  bound  on  the data rate that any
690	        individual  host  can  send  or  receive.  Therefore  there  are
691	        fundamental  bounds on the number of high data rate sources that
692	        can simultaneously send to the same group. However, there are no
693	        such  bounds  on  the  number  of  low datarate sources that can
694	        simultaneously send to the same group. If there are  very  large
695	        numbers  of sources sending to a group, but the sources' average
696	        data rates are low, then it may be more efficient to support the
697	        group  with  a  shared  tree  instead  which has less per-source
698	        overhead; therefore we suggest  triggering  Shortest  Path  Tree
699	        (SPT)  Join/Prune  messages  only  after the last hop router has
700	        received a threshold datarate from  the  particular  source.  If
701	        sources  are  low  data  rate,  these  Join/Prunes  will  not be
702	        triggered and receivers will receive packets via the shared tree
703	        instead  and  no source specific tree state will be constructed.
704	        Issues  of  group-specific   state   proliferation   and   state
705	        aggregation  are  discussed  further  in section  5. In summary,
706	        data packets from the source will travel to the RP  in  Register
707	        messages,  and  from  the  RP  will  travel to receivers via the
708	        distribution paths established by the Join/Prune  messages  sent
709	        upstream  from receivers towards the RP. If the RP and receivers
710	        initiate  shortest  path  tree  Join/Prunes,  the  sources  data
711	        packets  will  longest  match on the source specific (S,G) state
712	        instead of traveling via the RP  distribution  tree.  Some  data
713	        packets  will  continue  to travel from the sources to the RP in
714	        order to reach new receivers. Similarly, receivers will continue
715	        to receive some data packets via the RP tree in order to pick up
716	        new senders. However, when source-specific tree distribution  is
717	        used,  most  data  packets  will  arrive  at  receivers  over  a
718	        shortest-path   distribution   tree.   At   times   when   group
719	        participation is not changing, and all receivers have joined the
720	        shortest path tree(s), the  RP  can  inform  source(s)  to  stop
721	        sending data-encapsulating Register messages.

723	     4 Protocol Engineering Design Features

725	        In this section we describe engineering features embodied in the
726	        PIM  protocols:  robustness,  interaction  with  other multicast
727	        protocols, and multicast service interfaces.

729	     4.1 Robustness features

731	        There are several areas in which PIM is designed for robustness.

733	     4.1.1 Lost PIM messages

735	        The protocol is fairly robust to lost  control  messages.  If  a
736	        PIM-Register  message  gets lost then data packets will continue
737	        to be encapsulated in subsequent PIM-Register messages until the
738	        first  hop  router receives a Register-stop message message from
739	        the RP. If a new Join/Prune message (carrying join  information)
740	        is  lost  over an off-tree link (i.e. a link that is not already
741	        part of the mutlicast distribution tree), then for the remainder
742	        of  the refresh period, packets will not be forwarded on the new
743	        path, causing join latency; or in the case of prune information,
744	        packets will continue to be forwarded until the refresh is sent,
745	        causing leave latency.

747	        All outgoing-interface state that is cached is timed out after a
748	        period equal to `3.5' times the refresh period (e.g., default of
749	        210 seconds for the default 60 second refresh interval).  As  in
750	        other  multicast routing protocols, this longer timeout interval
751	        allows individual packets to be lost without adversely affecting
752	        the  routing function. When a routing entry has no more outgoing
753	        interfaces it is scheduled to be deleted some time later  and  a
754	        prune  can  be  sent  upstream (if no prune is sent upstream the
755	        upstream  state  will  eventually  time  out  anyway  since   no
756	        Join/Prunes  will  be  received  to  refresh  the  join  state.)
757	        Initially PIM messages are configured to be refreshed  every  60
758	        seconds.  However, in the future a scalable timer mechanism will
759	        be deployed in which the rate is a function  of  the  amount  of
760	        state  in  a  router  and  link bandwidth (i.e., for lower speed
761	        links the rate will be slower and for higher speed links it  may
762	        be higher).

764	     4.1.2 Multiple Rendezvous Points and RP failure scenarios

766	        If only a single RP were available to be used  for  a  multicast
767	        group,  group  communication would be disrupted if the RP became
768	        unreachable. Assigning a set of available RPs greatly  increases
769	        the  robustness of the system. A small set of PIM routers within
770	        a domain are configured to act as  Candidate  RPs  (C-RPs),  and
771	        periodically send C-RP Advertisements to the elected BSR. At any
772	        point in time only a single RP is active for a  group.  However,
773	        when  the  BSR  detects  that  a  particular  RP  is  no  longer
774	        reachable, the BSR deletes the unreachable RP(s) from the RP-Set
775	        next  distributed within the periodic Bootstrap message, and all
776	        PIM routers within the  region  rehash  affected  groups  (i.e.,
777	        those that were previously hashed to the now-unreachable RP).

779	     4.2 Interaction with other multicast protocols

781	        The basic difference between traditional  IP  multicast  routing
782	        and  PIM  is  that the former is completely data driven; we will
783	        refer to traditional IP multicast routing as  ``dense  mode''  for
784	        the  purposes  of  this  discussion.  Four  important behavioral
785	        differences result:

787	        *    Dense  mode  sends  and  stores  explicit  prune  state  in
788	             response  to  unwanted  data  packets. Sparse mode requires
789	             explicit joining; the default action is to  not  send  data
790	             packets where they have not been requested.

792	        *    Sparse mode stores shared-tree join state  in  anticipation
793	             of  data packets; Dense-mode routers do not store any state
794	             until data packets are sent (i.e. for active data sources).
795	             The  difference  is  not very significant for active groups
796	             (i.e., PIM would have one additional tree active);  however
797	             for  idle  groups dense mode has the advantage of having no
798	             state at all, whereas PIM would  have  state  for  the  one
799	             shared-tree.

801	        *    Sparse mode relies on the concept of an RP for data  to  be
802	             delivered  to  receivers  who  request  to  join the group.
803	             Dense-mode groups do not require an RP; broadcast  is  used
804	             as the rendezvous mechanism.

806	        *    Sparse mode  relies  on  periodic  refreshing  of  explicit
807	             Join/Prune messages. Dense mode does not need to send prune
808	             messages periodically because of its data driven nature.

810	        In simplified terms, the cost  of  dense  mode  is  the  default
811	        broadcast  behavior  and maintenance of prune state, whereas the
812	        cost of sparse mode is the need for RPs and  RP-tree  state  for
813	        idle  groups.  If  all  members  of a group are located within a
814	        bandwidth-rich region, the group may be supported in a  strictly
815	        dense  mode  using  scope  control.  However, such groups cannot
816	        include any members beyond the indicated scope, without imposing
817	        broadcast and prune overhead on the larger scope needed to reach
818	        the remote receiver. PIM is designed to address the more general
819	        problem  of groups that are not a priori limited to intra-domain
820	        membership and may therefore span domains.

822	        In the case of multi-access LANs, some interesting issues  arise
823	        because  of possibility of parallel routers forwarding duplicate
824	        packets onto the LAN. In SM we must be particularly careful with
825	        the  operation of the RPtree because the RPF check that prevents
826	        routing loops is dependent on information stored in the  router,
827	        and  not based on the source address found in the packet header.
828	        As a result it is conceivable that a packet could be  routed  in
829	        elaborate  loops  because  different routers are using different
830	        criteria for accepting the packet. To solve  this  problem  each
831	        router  on  a multi-access LAN sends Assert messages when a data
832	        packet from a source arrives on the outgoing interface  for  the
833	        associated  (S,G)  or  the  (*,G)  entry.  All routers listen to
834	        Assert messages, compare the metrics included therein, and  only
835	        one router remains the forwarder for that source to that LAN.

837	        We also wish to interoperate with  networks  that  do  not  have
838	        routers  modified  to  generate  and  interpret  PIM  Join/Prune
839	        messages. We have to address two functions: pulling data out  to
840	        the  dense-mode  cloud,  and  importing data into the PIM region
841	        from a dense mode region:

843	        *    In PIM, joining a distribution tree is not passive, routers
844	             with  local  members  must  take  explicit  join  action to
845	             receive data packets. This creates problems when  a  dense-
846	             mode region, wishes to interoperate with PIM. To do so, one
847	             of two things must happen:

849	             1    Either, PMBR's on the border  between  PIM  and  dense
850	                  mode  regions  join  to all of the PIM region's RPs to
851	                  pull out all packets generated within the PIM  region.
852	                  Or,

854	             2    The PMBR on the border of a  dense  mode  region  must
855	                  receive some indication of membership within the dense
856	                  mode cloud, and must generate PIM explicit  Join/Prune
857	                  messages  to  pull  the  data  down  to the dense mode
858	                  cloud.

860	             The first of these two approaches is appropriate  when  the
861	             PIM  region  is  a stub or multihomed and is connected to a
862	             dense mode backbone. The second of these two approaches  is
863	             appropriate  when  the dense mode region is connecting to a
864	             PIM backbone.

866	        *    The PMBRs at the border between PIM and dense mode  regions
867	             must  act  as  DRs  for  the sources external to the PIM-SM
868	             domain. In other words the PMBR  sets  up  source  specific
869	             state and sends Registers on behalf of external sources.

871	        The  details  of  these  mechanisms  are  described  in  [3][19].

873	     4.3 Multicast service interface

875	        The multicast interface for hosts is unchanged. Hosts need  only
876	        learn  about  and  communicate  their  interest  in  joining  to
877	        multicast addresses.

879	     5 Scaling and Aggregation

881	        There   are   several   motivations   for   aggregating   source
882	        information;  the  most  important  are PIM message size and the
883	        amount of memory used for multicast routing entries.

885	        One might consider using the highest level  aggregate  available
886	        for an address when setting up the multicast routing entry. This
887	        is optimal with respect to  routing  entry  space.  It  is  also
888	        optimal  with respect to PIM message size. However, PIM messages
889	        will carry very coarse information and when the messages  arrive
890	        at  routers  closer  to the source(s) where more specific routes
891	        exist there will be a large fanout and PIM messages will  travel
892	        towards  all members of the aggregate which would be inefficient
893	        in most/many cases.

895	        Traditional IP multicast routing (dense mode) does not have this
896	        problem   since   prune  messages  can  carry  most  fine  grain
897	        information which are triggered based on data  packets.  If  the
898	        prune  messages are lost, subsequent data triggers the prune. On
899	        the other hand, graft messages may be  subject  to  the  fan-out
900	        problem.  In  this  case,  they  are  sent as far as the message
901	        information takes it. The penalty is increased join latency.

903	        If PIM is being used for inter-domain routing, and routers  were
904	        able  to  map  from  IP  address  to domain identifier, then one
905	        possibility would be to use the domain  level  aggregate  for  a
906	        source  in  PIM  messages  (Autonomous  System  (AS)  numbers or
907	        Routing Domain Identifiers (RDIs)). Then the PIM  message  would
908	        travel  to  the  PMBRs  of  the domain and the PMBRs can use the
909	        internal multicast protocol's mechanism for propagating the join
910	        within    the   domain   (e.g.   send   appropriate   link-state
911	        advertisement in MOSPF or register a ``local member'' and do  not
912	        prune  in  the case of RPF). However this approach requires that
913	        it is both possible and efficient  to  map  from  IP  to  domain
914	        address  when  processing  data  packets,  as  well  as  control
915	        packets.

917	        We address the issues  of  control  traffic  and  state  scaling
918	        separately  below.  The  detailed  mechanisms  have not yet been
919	        incorporated into the protocol specification as they  are  still
920	        being designed.

922	     5.1 Containing control traffic overhead

924	        To control the bandwidth consumed by periodic control  messages,
925	        we  adopt a technique proposed by one of the authors (Jacobson),
926	        called scalable  timers.  The  timers  controlling  periodic
927	        refreshing  of  control  messages  are  set  such that the total
928	        overhead is a small fixed percentage of the link bandwidth.

930	        Eventually, PIM should use the  scalable  timer  approach;  this
931	        approach  was  initially proposed by Van Jacobson and a detailed
932	        design and analysis was reported in [18]. In this  approach  the
933	        refresh interval is determined by the sender of the information.
934	        The sender can adjust the frequency  of  control  messages  (and
935	        therefore  the  timeout  period at the control message receiver)
936	        depending upon the amount of state that it has  to  communicate,
937	        or  refresh,  over  a  particular  link. It can thereby keep the
938	        amount of control traffic to some small percentage of  the  link
939	        bandwidth. In this case the receiver of the control messages may
940	        infer the appropriate refresh interval based on  measurement  of
941	        arriving   control   traffic,   and   set   its  timeout  values
942	        accordingly.

944	        In the absence  of  more  experimentation  with  scalable  timer
945	        mechanisms,  the  current PIM protocol specifies that the sender
946	        of control messages communication hold-time  values  explicitly.
947	        Therefore,  a  router  tells  its  neighbors how long to keep it
948	        reachable by advertising the  holdtime  in  PIM-Hello  messages.
949	        Likewise,  Join/Prune messages indicate how long state should be
950	        kept. This allows the sender to change its frequency without the
951	        receivers requiring any special configuration information.

953	     5.2 Containing state overhead

955	        PIM-SM maintains less source-specific state than do  dense  mode
956	        protocols.  The  more  important  issue  faced  by  all existing
957	        multicast routing schemes is how to reduce the amount of  group-
958	        specific state. This remains an open area of investigation.

960	     6 Conclusions

962	        We have presented a solution to the problem of routing multicast
963	        packets in large, wide-area internets. Our approach

965	        (a)     uses   constrained,    receiver-initiated,    membership
966	             advertisement for sparsely distributed multicast groups;

968	        (b)   supports both shared and shortest path tree types  in  one
969	             protocol;

971	        (c)   does not depend on a particular unicast protocol; and

973	        (d)   uses soft state mechanisms to  reliably  and  responsively
974	             maintain multicast trees.

976	        The architecture accommodates graceful and efficient  adaptation
977	        to  varying  types of multicast groups, and to different network
978	        conditions.

980	     7 Acknowledgments

982	        Tony Ballardie, Scott Brim, Jon Crowcroft, Paul  Francis,  Lixia
983	        Zhang  and  John  Zwiebel provided detailed comments on previous
984	        drafts. The authors  of  CBT  and  membership  of  the  IDMR  WG
985	        provided  many  of the motivating ideas for this work and useful
986	        feedback on design details.

988	        References
989	   1.   J.Moy. Multicast extension to ospf.
990	         Internet Draft, September 1992.

992	   2.   D.Waitzman S.Deering,  C.Partridge.  Distance  vector  multicast
993	        routing protocol, nov 1988. RFC1075.

995	   3.   D.Estrin, D.Farinacci, A.Helmy, D.Thaler, S.Deering,  M.Handley,
996	        V.Jacobson,  C.Liu,  P.Sharma,  and  L.Wei. Protocol independent
997	        multicast  -  sparse  mode  (pim-sm):  Protocol   specification.
998	        Proposed Experimental RFC, September 1996.

1000	   4.   D.Estrin, D.Farinacci, A.Helmy, V.Jacobson, and L.Wei.  Protocol
1001	        independent   multicast   -   dense   mode   (pim-dm):  Protocol
1002	        specification.  Proposed Experimental RFC, September 1996.

1004	   5.   S.Deering  and  D.Cheriton.  Multicast   routing   in   datagram
1005	        internetworks and extended lans.
1006	         ACM Transactions on Computer Systems, pages 85--111, May 1990.

1008	   6.   S.Deering.  Multicast Routing in a  Datagram  Internetwork.  PhD
1009	        thesis, Stanford University, 1991.

1011	   7.   S.Deering.  Host  extensions  for  ip  multicasting,  aug  1989.
1012	        RFC1112.

1014	   8.   W.Fenner. Internet group management protocol, version 2.
1015	         Internet Draft, May 1996.

1017	   9.   J.Moy. Ospf version 2, oct 1991. RFC1247.

1019	   10.  J.Moy. Mospf: Analysis and experience.
1020	         Internet Draft, July 1993.

1022	   11.  Y.K.  Dalal  and  R.M.  Metcalfe.  Reverse  path  forwarding  of
1023	        broadcast packets.
1024	         Communications of the ACM, 21(12):1040--1048, 1978.

1026	   12.  Ron Frederick. Ietf audio  videocast.
1027	         Internet Society News, 1(4):19, 1993.

1029	   13.  A.J. Ballardie, P.F. Francis, and J.Crowcroft. Core based trees.
1030	        In  Proceedings of the ACM SIGCOMM, San Francisco, 1993.

1032	   14.  David Wall.  Mechanisms for Broadcast and  Selective  Broadcast.
1033	        PhD thesis, Stanford University, June 1980. Technical Report N0.
1034	        190.

1036	   15.  L.Wei and  D.Estrin.  The  trade-offs  of  multicast  trees  and
1037	        algorithms.   In   Proceedings  of  the  1994  international
1038	        conference  on  computer  communications  and   networks,   San
1039	        Francisco, September 1994.

1041	   16.  S.Deering,   D.Estrin,   D.Farinacci,   B.Fenner,    V.Jacobson,
1042	        M.Handley,   D.Thaler,   L.Wei,  and  A.Helmy.  Interoperability
1043	        mechanisms for pim-sm and dvmrp.
1044	         Internet Draft, January 1996.

1046	   17.  D.Estrin, D.Farinacci, A.Helmy, D.Thaler, S.Deering,  M.Handley,
1047	        V.Jacobson,  C.Liu,  P.Sharma,  and  L.Wei. Pim multicast border
1048	        router (pmbr) specification for connecting pim-sm domains  to  a
1049	        dvmrp backbone.  Internet Draft, September 1996.

1051	   18.  P.Sharma, D.Estrin, S.Floyd, and V.Jacobson. Scalable timers for
1052	        soft state protocols.
1053	         Infocom 97, June 1996.

1055	Addresses of Authors:

1057	Deborah Estrin
1058	Computer Science Dept/ISI
1059	University of Southern Calif.
1060	Los Angeles, CA 90089
1061	estrin@usc.edu

1063	Dino Farinacci
1064	Cisco Systems Inc.
1065	170 West Tasman Drive,
1066	San Jose, CA 95134
1067	dino@cisco.com

1069	Ahmed Helmy
1070	Computer Science Dept.
1071	University of Southern Calif.
1072	Los Angeles, CA 90089
1073	ahelmy@catarina.usc.edu

1075	David Thaler
1076	EECS Department
1077	University of Michigan
1078	Ann Arbor, MI 48109
1079	thalerd@eecs.umich.edu

1081	Stephen Deering
1082	Xerox PARC
1083	3333 Coyote Hill Road
1084	Palo Alto, CA 94304
1085	deering@parc.xerox.com

1087	Mark Handley
1088	Department of Computer Science
1089	University College London
1090	Gower Street
1091	London, WC1E 6BT
1092	UK
1093	m.handley@cs.ucl.ac.uk

1095	Van Jacobson
1096	Lawrence Berkeley Laboratory
1097	1 Cyclotron Road
1098	Berkeley, CA 94720
1099	van@ee.lbl.gov

1101	Ching-gung  Liu
1102	Computer Science Dept.
1103	University of Southern Calif.
1104	Los Angeles, CA 90089
1105	charley@catarina.usc.edu

1107	Puneet Sharma
1108	Computer Science Dept.
1109	University of Southern Calif.
1110	Los Angeles, CA 90089
1111	puneet@catarina.usc.edu

1113	Liming Wei
1114	Cisco Systems Inc.
1115	170 West Tasman Drive,
1116	San Jose, CA 95134
1117	lwei@cisco.com