idnits 2.17.1 

draft-morin-l3vpn-mvpn-considerations-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 24.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 1415.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1426.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1433.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1439.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 3 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == Line 394 has weird spacing: '...   or   the us...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 10, 2008) is 5767 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'RFC4364' is mentioned on line 746, but not defined

  == Outdated reference: A later version (-10) exists of
     draft-ietf-l3vpn-2547bis-mcast-06

  == Outdated reference: A later version (-08) exists of
     draft-ietf-l3vpn-2547bis-mcast-bgp-05

  == Outdated reference: A later version (-15) exists of
     draft-rosen-vpn-mcast-08

  == Outdated reference: A later version (-10) exists of
     draft-ietf-pim-sm-linklocal-02


     Summary: 1 error (**), 0 flaws (~~), 8 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                      T. Morin, Ed.
3	Internet-Draft                                        France Telecom R&D
4	Intended status: Informational                     B. Niven-Jenkins, Ed.
5	Expires: January 11, 2009                                             BT
6	                                                               Y. Kamite
7	                                                      NTT Communications
8	                                                                R. Zhang
9	                                                                      BT
10	                                                              N. Leymann
11	                                                        Deutsche Telekom
12	                                                                N. Bitar
13	                                                                 Verizon
14	                                                           July 10, 2008

16	    Considerations about Multicast for BGP/MPLS VPN Standardization
17	                draft-morin-l3vpn-mvpn-considerations-03

19	Status of this Memo

21	   By submitting this Internet-Draft, each author represents that any
22	   applicable patent or other IPR claims of which he or she is aware
23	   have been or will be disclosed, and any of which he or she becomes
24	   aware will be disclosed, in accordance with Section 6 of BCP 79.

26	   Internet-Drafts are working documents of the Internet Engineering
27	   Task Force (IETF), its areas, and its working groups.  Note that
28	   other groups may also distribute working documents as Internet-
29	   Drafts.

31	   Internet-Drafts are draft documents valid for a maximum of six months
32	   and may be updated, replaced, or obsoleted by other documents at any
33	   time.  It is inappropriate to use Internet-Drafts as reference
34	   material or to cite them other than as "work in progress."

36	   The list of current Internet-Drafts can be accessed at
37	   http://www.ietf.org/ietf/1id-abstracts.txt.

39	   The list of Internet-Draft Shadow Directories can be accessed at
40	   http://www.ietf.org/shadow.html.

42	   This Internet-Draft will expire on January 11, 2009.

44	Abstract

46	   The current proposal for multicast in BGP/MPLS includes multiple
47	   alternative mechanisms for some of the required building blocks of
48	   the solution.  The aim of this document is to leverage previously
49	   documented requirements to identify the key elements and help move
50	   forward solution design, toward the definition of a standard having a
51	   well defined set of mandatory procedures.  The different proposed
52	   alternative mechanisms are examined in the light of requirements
53	   identified for multicast in L3VPNs, and suggestions are made about
54	   which of these mechanisms standardization should favor.  Issues
55	   related to existing deployments of early implementations are also
56	   addressed.

58	Requirements Language

60	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
61	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
62	   document are to be interpreted as described in [RFC2119].

64	Table of Contents

66	   1.  Introduction
67	   2.  Terminology
68	   3.  Examining alternative mechanisms for MVPN functions
69	     3.1.  MVPN auto-discovery
70	     3.2.  S-PMSI Signaling
71	     3.3.  PE-PE Transmission of C-Multicast Routing
72	       3.3.1.  PE-PE signaling scalability
73	       3.3.2.  P-routers scalability
74	       3.3.3.  Impact of C-multicast routing on Inter-AS
75	               deployments
76	       3.3.4.  Security and robustness
77	       3.3.5.  C-multicast VPN join latency
78	       3.3.6.  Extranet
79	       3.3.7.  Conclusion on C-multicast routing
80	     3.4.  Encapsulation techniques for P-multicast trees
81	     3.5.  Inter-AS deployments options
82	   4.  Co-located RPs
83	   5.  Existing deployments
84	   6.  Summary of recommendations
85	   7.  IANA Considerations
86	   8.  Security Considerations
87	   9.  Acknowledgements
88	   10. Informative References
89	   Appendix A.  Scalability of C-multicast routing processing load
90	     A.1.  PIM LAN procedures, by default
91	     A.2.  PIM LAN procedures, with explicit tracking
92	     A.3.  BGP-based
93	     A.4.  Side by side orders of magnitude comparison
94	   Appendix B.  Switching to S-PMSI
95	   Authors' Addresses
96	   Intellectual Property and Copyright Statements

98	1.  Introduction

100	   The current proposal for multicast in BGP/MPLS
101	   [I-D.ietf-l3vpn-2547bis-mcast] includes multiple alternative
102	   mechanisms for some of the required building blocks of the solution.
103	   However, it does not identify the core set of mechanisms which must
104	   be implemented in order to ensure interoperability.  This may lead to
105	   a situation where implementations may support different subsets of
106	   the available optional mechanisms leading to implementations that do
107	   not interoperate, which is a problem for the numerous operators
108	   having multi-vendor backbones.

110	   The aim of this document is to leverage the already expressed
111	   requirements [RFC4834] and study the properties of each approach, to
112	   identify mechanisms that are good candidates for being part of a core
113	   set of mandatory mechanisms which can be used to provide a base for
114	   interoperable solutions.

116	   This document will go through the different building blocks of the
117	   solution and provide recommendations as to which mechanisms should be
118	   favored for each building block, while considering the requirements
119	   already defined and the goal of a fully-interoperable standard.

121	   Considering the history of the multicast VPN proposals and
122	   implementations, the authors also consider it useful to discuss how
123	   existing deployments of early implementations
124	   [I-D.rosen-vpn-mcast][I-D.raggarwa-l3vpn-2547-mvpn] can fit in the
125	   picture, and provide suggestions in this respect.

127	   [This document will evolve to follow key changes in multicast in BGP/
128	   MPLS [I-D.ietf-l3vpn-2547bis-mcast] and
129	   [I-D.ietf-l3vpn-2547bis-mcast-bgp].  Such changes are for instance,
130	   clear statements about compatibility between the different approaches
131	   and other optional features, or completed description of procedures
132	   that are not currently detailed.]

134	2.  Terminology

136	   Please refer to [I-D.ietf-l3vpn-2547bis-mcast] and [RFC4834].

138	3.  Examining alternative mechanisms for MVPN functions

140	3.1.  MVPN auto-discovery

142	   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
143	   two different mechanisms for MVPN auto-discovery:

145	   1.  BGP-based auto-discovery

147	   2.  "PIM/shared tree" : discovery done through the exchange of PIM
148	       Hellos by C-PIM instances, across an MI-PMSI implemented with
149	       one shared tree per VPN (using multicast ASM, or MP2MP LDP)

151	   Both solutions address Section 5.2.10 of [RFC4834] which states that
152	   "the operation of a multicast VPN solution SHALL be as light as
153	   possible and providing automatic configuration and discovery SHOULD
154	   be a priority when designing a multicast VPN solution.  Particularly
155	   the operational burden of setting up multicast on a PE or for a VR/
156	   VRF SHOULD be as low as possible".

158	   The key consideration is that PIM-based discovery is only applicable
159	   to deployments using a shared tree to instantiate an MI-PMSI (it
160	   is not applicable if only P2P or SSM trees are used, because
161	   contrary to ASM and MP2MP, building these P2P or SSM trees cannot
162	   happen before the autodiscovery has been done), whereas the BGP-based
163	   auto-discovery does not place any constraint on the type of multicast
164	   trees that would have to be used.  BGP-based auto-discovery is
165	   independent of the type of P-multicast tree used thus satisfying the
166	   requirement in section 5.2.4.1 of [RFC4834] that "a multicast VPN
167	   solution SHOULD be designed so that control and forwarding planes are
168	   not interdependent".
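
   The applicability constraint described above can be sketched as a
   small lookup.  This is only an illustration: the function name and
   the tree-type labels are hypothetical and are not taken from
   [I-D.ietf-l3vpn-2547bis-mcast].

```python
from typing import List

# Tree types that can be joined before any remote PE is known (shared
# trees).  The labels are illustrative assumptions, not protocol constants.
SHARED_TREE_TYPES = {"PIM-ASM", "MP2MP-LDP"}


def applicable_autodiscovery(tree_type: str) -> List[str]:
    """Return the auto-discovery mechanisms usable with a P-tree type."""
    # BGP-based auto-discovery places no constraint on the tree type.
    mechanisms = ["BGP"]
    # PIM-based discovery needs an MI-PMSI built on a shared tree.
    if tree_type in SHARED_TREE_TYPES:
        mechanisms.append("PIM/shared tree")
    return mechanisms
```

   With this sketch, an SSM-based deployment is left with BGP only,
   while an MP2MP LDP deployment could use either mechanism.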

170	   Additionally, it is to be noted that a number of service providers
171	   have chosen to use SSM-based trees for the default MDTs within their
172	   current deployments, therefore relying already on some BGP-based
173	   auto-discovery.

175	   Moreover, when shared P-tunnels are used, the use of BGP auto-
176	   discovery would allow inconsistencies in the addresses/identifiers
177	   used for the shared trees to be detected (e.g. the same shared tree
178	   identifier being used for different VPNs with distinct BGP route
179	   targets).  This is particularly attractive in the context of inter-AS
180	   VPNs where the impact of any misconfiguration could be magnified and
181	   where a single service provider may not operate all the ASs.  Note
182	   that this technique to detect some misconfiguration cases may not be
183	   usable during a transition period from a shared-tree autodiscovery to
184	   a BGP-based autodiscovery.

186	   Thus, the recommendation is that implementation of the BGP-based
187	   auto-discovery is mandated and should be supported by all mVPN
188	   implementations (while PIM/shared-tree based auto-discovery should be
189	   optionally considered for migration purpose only).

191	3.2.  S-PMSI Signaling

193	   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
194	   two mechanisms for signaling that multicast flows will be switched to
195	   an S-PMSI :

197	   1.  a UDP-based TLV protocol specifically for S-PMSI signaling
198	       (described in section 7.2.1).

200	   2.  a BGP-based mechanism for S-PMSI signaling (described in section
201	       7.2.2).

203	   Section 5.2.10 of [RFC4834] states that "as far as possible, the
204	   design of a solution SHOULD carefully consider the number of
205	   protocols within the core network: if any additional protocols are
206	   introduced compared with the unicast VPN service, the balance between
207	   their advantage and operational burden SHOULD be examined
208	   thoroughly".  The UDP-based mechanism would be an additional protocol
209	   in the mvpn stack, which isn't the case for the BGP-based S-PMSI
210	   switching signaling, since (a) BGP is identified as a requirement for
211	   autodiscovery, and (b) the BGP-based S-PMSI switching signaling
212	   procedures are very similar to the autodiscovery procedures.

214	   Furthermore, the BGP-based S-PMSI switching signaling mechanism can
215	   be used within MVPNs using either a UI-PMSI or a MI-PMSI while the
216	   UDP-based protocol is restricted to use within MVPNs using an
217	   MI-PMSI.  In practice, with the UDP-based protocol and unless shared
218	   trees are used, a PE will have to join all trees of all PEs in a VPN,
219	   whereas when BGP-based S-PMSI switching signaling is used, it
220	   could delay joining a tree from a PE until traffic from that PE is
221	   needed, thus reducing the amount of state maintained on P routers.

223	   S-PMSI switching signaling approaches can also be compared in an
224	   inter-AS context (see Section 3.5).  The proposed BGP-based approach
225	   for S-PMSI switching signaling provides a good fit with both the
226	   segmented and non-segmented inter-AS approaches (see Section 3.5).  By
227	   contrast the UDP-based approach for S-PMSI switching signaling
228	   appears to be usable with segmented inter-AS tunnels, but in that
229	   case key advantages of the segmented approach are lost :

231	   o  ASes are no longer independent in choosing when S-PMSI tunnels
232	      will be triggered in their AS (and thus in controlling the amount
233	      of state created on their P routers), and with which tunneling
234	      technique they will be built

236	   o  in an inter-AS option B context, an isolation of ASes is obtained
237	      as PEs don't have visibility of, nor exchange with, PEs of other
238	      ASes.  This property can be preserved if the segmented inter-AS
239	      approach and BGP-based S-PMSI switching signaling are used, but it
240	      is not preserved if UDP-based switching signaling is used.

242	   Given all the above, it is the recommendation of the authors that BGP
243	   is the preferred solution for S-PMSI switching signaling and should
244	   be supported by all implementations.

246	   It is identified that, if nothing prevents a fast-paced creation of
247	   S-PMSI, then S-PMSI switching signaling with BGP would possibly
248	   impact the Route Reflectors used for mVPN routes.  However, it is also
249	   identified that such a fast-paced behavior would have an impact on P
250	   and PE routers resulting from S-PMSI tunnels signaling, which will be
251	   the same independently of the S-PMSI signaling approach that is used,
252	   and which it is certainly best to avoid by setting up proper
253	   mechanisms.

255	   The UDP-based S-PMSI switching signaling protocol can also be
256	   considered, as an option, given that this protocol has been in
257	   deployment for some time.  Implementations supporting both protocols
258	   would be expected to provide a per-VRF configuration knob to allow an
259	   implementation to use the UDP-based TLV protocol for S-PMSI switching
260	   signaling for specific VRFs in order to support the coexistence of
261	   both protocols (for example during migration scenarios).  Apart from
262	   such migration-facilitating mechanisms, the authors specifically do
263	   not recommend extending the already proposed UDP-based TLV protocol
264	   to new types of P-multicast trees.

266	3.3.  PE-PE Transmission of C-Multicast Routing

268	   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
269	   multiple mechanisms for PE-PE transmission of customer multicast
270	   routing information:

272	   1.  Full per-MVPN PIM peering across an MI-PMSI (described in section
273	       3.4.1.1).

275	   2.  Lightweight PIM peering across an MI-PMSI (described in section
276	       3.4.1.2)

278	   3.  The unicasting of PIM C-Join/Prune messages (described in section
279	       3.4.1.3)

281	   4.  The use of BGP for carrying C-Multicast routing (described in
282	       section 3.4.2).

284	3.3.1.  PE-PE signaling scalability

286	   Scalability being one of the core requirements for multicast VPN, it
287	   is useful to compare the proposed C-multicast routing mechanisms from
288	   this perspective : Section 4.2.4 of [RFC4834] recommends that "a
289	   multicast VPN solution SHOULD support several hundreds of PEs per
290	   multicast VPN, and MAY usefully scale up to thousands" and section
291	   4.2.5 states that "a solution SHOULD scale up to thousands of PEs
292	   having multicast service enabled".

294	   Scalability with an increased number of VPNs per PE, or with an
295	   increased amount of multicast state per VPN, is also important, but
296	   is not the focus of this section since we did not identify
297	   differences between the approaches in these respects: all
298	   other things being equal, the load on a PE due to C-multicast routing
299	   increases roughly linearly with the number of VPNs per PE, and with
300	   the amount of multicast state per VPN.

302	   This section thus presents conclusions related to PE-PE signaling
303	   scalability, while Appendix A contains more detailed explanations on
304	   the differences in ways of handling the C-multicast routing load,
305	   between the PIM-based approaches and the BGP-based approach, along
306	   with quantified evaluations of the amount of state and messages with
307	   the different approaches.

309	   At high scales of multicast deployment, the first and third
310	   mechanisms require the PEs to maintain a large number of PIM
311	   adjacencies with other PEs of the same multicast VPN (which implies
312	   the regular exchange of PIM Hellos with each other) and to refresh
313	   C-Join/Prune states, thus limiting the scalability of these
314	   approaches.

316	   The third mechanism would reduce the amount of C-Join/Prune
317	   processing for a given multicast flow for PEs that are not the
318	   upstream neighbor for this flow, but would require "explicit
319	   tracking" state to be maintained by the upstream PE.  It also isn't
320	   compatible with the "Join suppression" mechanism.  A possible way to
321	   reduce the amount of signaling with this approach would be the use of
322	   a PIM refresh-reduction mechanism.  Such a mechanism, based on TCP,
323	   is being considered by the PIM WG ([I-D.farinacci-pim-port]) ; its
324	   use in a multicast VPN context hasn't yet been described in
325	   [I-D.ietf-l3vpn-2547bis-mcast], but it is expected that this approach
326	   would provide scalability similar to that of the BGP-based approach
327	   used without route reflectors processing the PE-PE C-multicast routing.
328	   [TBC, when/if, this is further described in
329	   [I-D.ietf-l3vpn-2547bis-mcast]].

331	   The second mechanism would operate in a similar manner to full per-
332	   MVPN PIM peering except that PIM Hello messages are not transmitted
333	   and PIM C-Join/Prune refresh-reduction would be used, thereby
334	   improving scalability, but this approach has yet to be fully
335	   described.  In any case, it appears to address only one of the
336	   factors that limit scalability as the number of PEs
337	   increases.

339	   The first and second mechanisms can leverage the "Join suppression"
340	   behavior and thus improve the processing burden of an upstream PE,
341	   sparing the processing of a Join refresh message for each remote PE
342	   joined to a multicast stream.  This improvement requires all PEs of a
343	   multicast VPN to process all PIM Join and Prune messages sent by any
344	   other PE participating in the same multicast VPN whether they are the
345	   upstream PE or not.

347	   The fourth mechanism (the use of BGP for carrying C-Multicast
348	   routing) would have a comparable drawback of requiring all PEs to
349	   process a BGP C-multicast route of interest only to a specific upstream
350	   PE.  For this reason, the BGP-based C-multicast routing approach can
351	   leverage the Route-Target constraint mechanisms, which specifically allow
352	   only the interested upstream PE to receive a BGP C-multicast route.
353	   When RT-constraints are used the fourth mechanism reduces the total
354	   amount of message processing load put on the PEs for customer
355	   multicast routing to the minimum (by avoiding any processing by
356	   "unrelated" PEs, that are not the joining PE nor the upstream PE, and
357	   by avoiding the use of refreshes), and inherits BGP features that are
358	   expected to improve scalability (for instance, providing a means to
359	   offload some of the processing burden associated with client
360	   multicast routing onto one or many BGP route-reflectors).  This
361	   advantage has a cost (the maintenance of an amount of state linear
362	   with the number of PEs joined to a stream), but when route reflectors
363	   are used, this cost is spread among the route reflectors.
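
   The difference in per-Join processing described above can be
   sketched as follows.  This is a deliberately simplified model, not a
   measurement: the function name and the mechanism labels are
   hypothetical, refreshes are ignored, and the load placed on route
   reflectors is not counted.

```python
def pes_processing_one_join(pes_in_vpn: int, mechanism: str) -> int:
    """Number of PEs that must process one C-multicast Join.

    Illustrative model only: refreshes and route reflector load are
    left out, and the mechanism labels are assumptions of this sketch.
    """
    if pes_in_vpn < 2:
        raise ValueError("a multicast VPN needs at least two PEs")
    if mechanism in ("pim-over-mi-pmsi", "bgp"):
        # Every other PE of the VPN sees the Join (or the BGP route).
        return pes_in_vpn - 1
    if mechanism == "bgp-with-rt-constraint":
        # Only the upstream PE receives the C-multicast route.
        return 1
    raise ValueError(f"unknown mechanism: {mechanism}")
```

   For a VPN with 100 PEs, the sketch gives 99 PEs processing each Join
   for PIM over an MI-PMSI, and a single PE when BGP is used with RT
   constraints.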

365	   However, the fourth mechanism is specific in that it offers the
366	   possibility of offloading customer multicast routing processing onto
367	   one or more BGP Route Reflector(s).  When this is used, there is a
368	   drawback of increasing the processing load placed on the route
369	   reflector infrastructure.  In the higher scale scenarios, it may be
370	   required to adapt the route reflector infrastructure to the mVPN
371	   routing load by using, for example:

373	   o  a separation of resources for unicast and multicast VPN routing :
374	      using dedicated mVPN Route Reflector(s) (or using dedicated mVPN
375	      BGP sessions or dedicated mVPN BGP instances) ;

377	   o  the deployment of additional route reflector resources, for
378	      example increasing the processing resources on existing route
379	      reflectors or deployment of additional route reflectors.

381	   Among the above, the most straightforward approach is to consider the
382	   introduction of route reflectors dedicated to the mVPN service and
383	   dimension them according to the needs of that service (but doing so
384	   is not required and is left as an operator engineering decision).

386	3.3.2.  P-routers scalability

388	   Mechanisms (1) and (2) are restricted to use within multicast VPNs
389	   that use an MI-PMSI, thereby necessitating:

391	      the use of a P-multicast tree technique that allows shared trees
392	      (for example PIM-SM in ASM mode or MP2MP LDP)

394	   or   the use of one P-multicast tree per PE per VPN, even for PEs
395	      that do not have sources in their directly attached sites for that
396	      VPN.

398	   By comparison, the fourth mechanism doesn't impose either of these
399	   restrictions, and when P2MP trees are used only necessitates the use
400	   of one tree per VPN per PE attached to a site with a multicast source
401	   or RP (or with a candidate BSR, if BSR is used).

403	   In cases where fewer PEs are connected to sources than the
404	   total number of PEs, this reduces the amount of state maintained by
405	   P-routers compared to the amount required to build an MI-PMSI with
406	   P2MP trees.  Such cases are expected to be typical for multicast VPN
407	   deployments (see section 4.2.4.1 of [RFC4834]).
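
   The tree counts compared in this section can be written down as a
   short sketch.  It assumes P2MP trees, counts trees per VPN only, and
   uses hypothetical names; "source_pes" stands for the PEs attached to
   a site with a multicast source or RP (or a candidate BSR).

```python
def p2mp_trees_per_vpn(pes_in_vpn: int, source_pes: int,
                       needs_mi_pmsi: bool) -> int:
    """P2MP trees needed for one VPN (illustrative model only)."""
    if not 0 <= source_pes <= pes_in_vpn:
        raise ValueError("source_pes must be between 0 and pes_in_vpn")
    # An MI-PMSI built with P2MP trees needs one tree rooted at every PE.
    # Otherwise only PEs attached to a source or RP root a tree.
    return pes_in_vpn if needs_mi_pmsi else source_pes
```

   For example, with 100 PEs of which 5 are attached to sites with
   sources, the sketch gives 100 trees when an MI-PMSI is required and
   5 trees otherwise.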

409	3.3.3.  Impact of C-multicast routing on Inter-AS deployments

411	   Furthermore, co-existence with unicast inter-AS VPN options, and an
412	   equal level of security for multicast and unicast including in an
413	   inter-AS context, are specifically mentioned in sections 5.2.6, 5.2.8
414	   and 5.2.12 of [RFC4834].

416	   In an inter-AS option B context, an isolation of ASes is obtained as
417	   PEs don't have visibility of, nor exchange with, PEs of other ASes.
418	   This property can be preserved if the segmented inter-AS approach and
419	   BGP-based C-multicast routing are used, but it is not preserved if
420	   PIM-based signaling is used.

422	   By comparison, the fourth option (the use of BGP for carrying
423	   C-Multicast routing) does not have any of the above limitations
424	   related to inter-AS deployments.

426	   Additionally, the authors note that the proposed BGP-based approach
427	   for C-multicast routing provides a good fit with both the segmented
428	   and non-segmented inter-AS approaches.  By contrast, though the PIM-
429	   based C-multicast routing is usable with segmented inter-AS trees,
430	   the inter-AS scalability advantage of the approach is lost, since PEs
431	   in an AS will see the C-multicast routing activity of all other PEs
432	   of all other ASes.

434	3.3.4.  Security and robustness

436	   BGP supports MD5 authentication of its peers for additional security,
437	   which may directly benefit multicast VPN customer multicast
438	   routing, whether for intra-AS or inter-AS communications.  By
439	   contrast, with a PIM-based approach, no mechanism providing a
440	   comparable level of security to authenticate communications between
441	   remote PEs has yet been fully described
442	   [I-D.ietf-pim-sm-linklocal], and in any case it would require
443	   significant additional operations for the provider to be usable in a
444	   multicast VPN context.

446	   The robustness of the infrastructure, especially the existing
447	   infrastructure providing unicast VPN connectivity, is key.  The
448	   C-multicast routing function, especially under load, will compete
449	   with the unicast routing infrastructure.  With the PIM-based
450	   approaches, the unicast and multicast VPN routing functions are
451	   expected to only compete in the PE, for control plane processing
452	   resources.  In the case of the BGP-based approach, they will compete
453	   on the PE for processing resources, and in the route reflectors
454	   (supposing they are used for mVPN routing).  It is identified that in
455	   both cases, mechanisms will be required to arbitrate resources (e.g.
456	   processing priorities).  In the case of PIM-based procedures, the
457	   arbitration is between the different control plane routing instances
458	   in the PE.  In the case of the BGP-based approach, this is likely to require using
459	   distinct BGP sessions for multicast and unicast (e.g. through the use
460	   of dedicated mVPN BGP route reflectors, or through the use of a distinct
461	   session with an existing route reflector).

463	   Multicast routing is dynamic by nature, and multicast VPN routing has
464	   to follow the VPN customers' multicast routing events.  The different
465	   approaches can be compared on how they are expected to behave in
466	   scenarios where multicast routing in the VPNs is subject to
467	   intense activity.  Scalability of each approach under such a load is
468	   detailed in Appendix A; the fourth approach (BGP-based) is the
469	   only one that has an O(1) cost for join/leave operations and with which
470	   state maintenance is not concentrated on the upstream PE.

472	   On the other hand, while the BGP-based approach is likely to suffer a
473	   slowdown under a load that is greater than the available processing
474	   resources (because of possibly congested TCP sockets), the PIM-based
475	   approaches would react to such a load by dropping messages, with
476	   failure-recovery obtained through message refreshes.  Thus, the BGP-
477	   based approach could result in a degradation of join/leave latency
478	   performance typically spread evenly across all multicast streams
479	   being joined in that period, while the PIM-based approach could
480	   result in increased join/leave latency, for some random streams, by a
481	   multiple of the time between refreshes (e.g. tens of seconds), and
482	   possibly in some states the adjacency may time out, resulting in
483	   disruption of multicast streams.
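
   The "multiple of the time between refreshes" effect can be given a
   rough order of magnitude with a simple model.  The sketch below
   assumes that each transmission of a Join is lost independently with
   the same probability, which is a simplification, and uses the 60
   second PIM-SM default Join/Prune period as the refresh interval; the
   function name is hypothetical.

```python
def expected_extra_join_latency(loss_probability: float,
                                refresh_period_s: float = 60.0) -> float:
    """Expected extra join latency, in seconds, caused by lost Joins.

    Assumes independent losses: the number of lost attempts before the
    first success is geometric with mean p / (1 - p), and each lost
    attempt costs one refresh period.
    """
    if not 0.0 <= loss_probability < 1.0:
        raise ValueError("loss_probability must be in [0, 1)")
    return refresh_period_s * loss_probability / (1.0 - loss_probability)
```

   Under these assumptions the average penalty stays small at low loss
   rates, but an individual stream whose Join is lost still waits at
   least one full refresh period, which is the uneven degradation
   described above.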

485	   The behavior of the PIM-based approach under such a load is also
486	   harder to predict, given that the performance of the "Join
487	   suppression" mechanism (an important mechanism for this approach to
488	   scale) will itself be impeded by delays in Join processing.  For
489	   these reasons, the BGP-based approach would be able to provide a
490	   smoother degradation and more predictable behavior under a highly
491	   dynamic load.

493	   In fact, both an "evenly spread degradation" and an "unevenly spread
494	   larger degradation" can be problematic, and what seems important is
495	   the ability for the VPN backbone operator to (a) limit the amount of
496	   multicast routing activity that can be triggered by a multicast VPN
497	   customer, and to (b) provide the best possible independence between
498	   distinct VPNs.  It seems that both of these can be addressed through
499	   local implementation improvements, and that both the BGP-based and
500	   PIM-based approaches could be engineered to provide (a) and (b).  It
501	   can be noted though that the BGP approach proposes ways to dampen
502	   C-multicast route withdrawals and/or advertisements, and thus already
503	   describes a way to provide (a), while nothing comparable has yet been
504	   described for the PIM-based approaches (even though it doesn't appear
505	   difficult).  The PIM-based approaches rely on a per VPN dataplane to
506	   carry the mVPN control plane, and thus may benefit from this first
507	   level of separation to solve (b).

509	3.3.5.  C-multicast VPN join latency

511	   Section 5.1.3 of [RFC4834] states that "the group join delay [...] is
512	   also considered one important QoS parameter.  It is thus RECOMMENDED
513	   that a multicast VPN solution be designed appropriately in this
514	   regard".  In a multicast VPN context, the "group join delay" of
515	   interest is the time between a CE sending a PIM Join to its PE and
516	   the first packet of the corresponding multicast stream being received
517	   by the CE.

519	   Note that the C-multicast routing procedures only impact the
520	   group join latency of a given multicast stream for the first
521	   receiver that is located across the provider backbone from the
522	   multicast source-connected PE (or for the first <n> receivers, in
523	   the specific case where a UMH selection algorithm is used that
524	   allows <n> distinct UMHs to be selected by distinct downstream PEs).

526	   The different approaches proposed seem to have different
527	   characteristics in how they are expected to impact join latency:

529	   o  the PIM-based approaches minimize the number of control plane
530	      processing hops between a new receiver-connected PE and the
531	      source-connected PE and, being datagram-based, introduce minimal
532	      delay; depending on implementation efficiency, they can thereby
533	      reach the lowest possible join latency

535	   o  under degraded conditions (packet loss, congestion, high control
536	      plane load) the PIM-based approach may impact the latency for a
537	      given multicast stream in an all-or-nothing manner : if a
538	      C-multicast routing PIM Join packet is lost, latency can grow to
539	      a multiple of the periodicity of PIM Join refreshes

541	   o  the BGP-based approach uses TCP exchanges, which may introduce
542	      additional delay depending on the BGP and TCP implementations,
543	      but which would typically result, under degraded conditions (such
544	      as packet loss, congestion, high control plane load), in a smaller
545	      increase of latency, spread more evenly across the streams

547	   o  as shown in Appendix A, the BGP-based approach is particular in
548	      that it removes load from all the PEs (without putting this load
549	      on the upstream PE for a stream); this improvement of background
550	      load can bring improved performance when a PE acts as the upstream
551	      PE for a stream, and thus benefit join latency

553	   This qualitative comparison of approaches shows that the BGP-based
554	   approach is designed for a smoother degradation of latency under
555	   degraded conditions such as packet loss, congestion, or high control
556	   plane load.  On the other hand, the PIM-based approaches seem to
557	   be structurally able to reach the shortest "best-case" group join
558	   latency (especially compared to deployments of the BGP-based
559	   approach where route-reflectors are used).

561	   Doing a quantitative comparison of latencies is not possible without
562	   referring to specific implementations and benchmarking procedures,
563	   and might lead to different conclusions, especially for best-case
564	   group join latency, for which performance is expected to vary with
565	   PIM and BGP implementations.  We can also note that improving a BGP
566	   implementation for reduced latency of route processing would not only
567	   benefit multicast VPN group join latency, but BGP-based routing as a
568	   whole, which means that the need for good BGP/RR performance is not
569	   specific to multicast VPN routing.

571	   Last, C-multicast join latency will be impacted by the overall load
572	   put on the control plane, and the scalability of the C-multicast
573	   routing approach is thus to be taken into account.  As explained in
574	   Section 3.3.1 and Appendix A, the BGP-based approach will
575	   provide the best scalability with an increased number of PEs per VPN,
576	   thereby benefiting group join latency in such higher scale scenarios.

578	3.3.6.  Extranet

580	   An illustrative example of the benefit brought by using a C-multicast
581	   routing approach close to the technique for unicast VPN routing is
582	   how the "extranet" feature can be implemented : when BGP-based
583	   mechanisms are used, the already defined and well understood BGP
584	   route target import/export semantics are just reused and applied to
585	   BGP mVPN routes.  By contrast, it is not specified how the same
586	   feature would be implemented in the context of other C-multicast
587	   routing mechanisms; it is thus unclear whether they would bring a
588	   comparable consistency benefit, or whether this is possible without
589	   significant engineering trade-offs given that their control plane is
590	   tied to a specific MI-PMSI tunnel. [to be updated when Extranet is
591	   described for approaches other than the BGP-based approaches]

593	   Note that the support for the Extranet feature is stated as a MUST in
594	   section 5.1.6 of [RFC4834].
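
	   As an illustration of the route target semantics mentioned above,
	   the following Python sketch shows the import decision that a VRF
	   would apply to a BGP mVPN route, which is the same decision as
	   for a unicast VPN route.  The class names, the reduced route
	   representation and the route target values are assumptions made
	   for this example only; they are not taken from the referenced
	   specifications.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MvpnRoute:
    """A BGP mVPN route, reduced to the fields this sketch needs."""
    origin_vpn: str
    nlri: str
    route_targets: frozenset  # Route Target extended communities


@dataclass
class Vrf:
    """A VRF with its configured import Route Targets (hypothetical model)."""
    name: str
    import_rts: set = field(default_factory=set)

    def imports(self, route: MvpnRoute) -> bool:
        # Import when at least one Route Target carried by the route
        # is present in the import list of the VRF.
        return bool(self.import_rts & route.route_targets)


# Illustrative values only (documentation AS number 64500).
route = MvpnRoute("vpn-a", "(C-S,C-G)", frozenset({"64500:100"}))
vrf_a = Vrf("vpn-a", {"64500:100"})
vrf_b = Vrf("vpn-b", {"64500:200"})
vrf_b_extranet = Vrf("vpn-b", {"64500:200", "64500:100"})
```

	   In this sketch, turning "vpn-b" into an extranet receiver of
	   "vpn-a" only requires adding one import route target to its VRF;
	   no multicast-specific mechanism is involved.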

596	3.3.7.  Conclusion on C-multicast routing

598	   The fourth approach (BGP-based) for customer multicast routing
599	   clearly presents some advantages over the PIM-based alternatives.
600	   However it has yet to be deployed within an operational mVPN, and
601	   only limited experience exists with its implementations.  By
602	   contrast, PIM-based mechanisms lack many of these benefits and have
603	   identified limitations in how they can handle customer multicast
604	   routing load in higher-scale scenarios.  Despite these, experience in
605	   multiple deployments shows that the "Full PIM peering" approach is
606	   operationally viable.

608	   Consequently, at the present time and until there is experience with
609	   all of the proposed mechanisms it is not clear which of the above
610	   mechanisms should be recommended as the preferred solution to
611	   implementers.  It would appear prudent for implementations to
612	   consider supporting both the fourth (BGP-based) and first (full per-
613	   MVPN PIM peering) mechanisms.  Further experience with both
614	   implementations is likely to be required before some best practice
615	   can be defined.

617	   The first mechanism (full per-MVPN PIM peering across an MI-PMSI) is
618	   the mechanism used by [I-D.rosen-vpn-mcast] and therefore it is
619	   deployed and operating in MVPNs today.  The authors recognize that
620	   because full per-MVPN PIM peering has been in deployment for some
621	   time, the support for this mechanism may be helpful for backwards
622	   compatibility and in order to facilitate migration towards the BGP-
623	   based approach.

625	   Moreover, to improve the clarity of the proposed specifications,
626	   unless the hello suppression and refresh-reduction procedures are
627	   fully specified and the benefit they can bring is well identified,
628	   the authors would recommend that the proposals for lightweight PIM
629	   peering across an MI-PMSI (the second mechanism) and for the
630	   unicasting of PIM C-Join/Prune messages (the third mechanism) be
631	   removed from the final revision of [I-D.ietf-l3vpn-2547bis-mcast].

633	3.4.  Encapsulation techniques for P-multicast trees

635	   In this section the authors will not make any restrictive
636	   recommendations since the appropriateness of a specific provider core
637	   data plane technology will depend on a large number of factors (for
638	   example the service provider's currently deployed unicast data
639	   plane), many of which are service provider specific.

641	   However, implementations should not unreasonably restrict the data
642	   plane technology that can be used, and should not force the use of
643	   the same technology for different VPNs attached to a single PE.
644	   Initial implementations may only support a reduced set of
645	   encapsulation techniques and data plane technologies but this should
646	   not be a limiting factor that hinders future support for other
647	   encapsulation techniques, data plane technologies or
648	   interoperability.

650	   Section 5.2.4.1 of [RFC4834] states "In a multicast VPN solution
651	   extending a unicast L3 PPVPN solution, consistency in the tunneling
652	   technology has to be favored: such a solution SHOULD allow the use of
653	   the same tunneling technology for multicast as for unicast.
654	   Deployment consistency, ease of operation and potential migrations
655	   are the main motivations behind this requirement."

657	   Current unicast VPN deployments use a variety of LDP, RSVP-TE and
658	   GRE/IP-Multicast for encapsulating customer packets for transport
659	   across the provider core of VPN services.  In order to allow the same
660	   encapsulations to be used for unicast and multicast VPN traffic, it
661	   is recommended that multicast VPN standards recommend that
662	   implementations support, for multicast VPNs, all the P2MP variants
663	   of the encapsulations and signaling protocols that they support for
664	   unicast and for which some multipoint extension is defined, such as
665	   mLDP, P2MP RSVP-TE and GRE/IP-multicast.

667	   All three of the above encapsulation techniques support the building
668	   of P2MP multicast trees.  In addition mLDP and GRE/IP-ASM-Multicast
669	   implementations may also support the building of MP2MP multicast
670	   trees.  The use of MP2MP trees may provide some scaling benefits to
671	   the service provider as only a single MP2MP tree need be deployed per
672	   VPN, thus reducing by an order of magnitude the amount of multicast
673	   state that needs to be maintained by P routers.  This gain in state
674	   is at the expense of bandwidth optimization, since sites that do not
675	   have multicast receivers for multicast streams sourced behind a said
676	   PE group will still receive packets of such streams, leading to non-
677	   optimal bandwidth utilization across the VPN core.  One thing to
678	   consider is that the use of MP2MP multicast trees will require
679	   additional configuration to define the same tree identifier or
680	   multicast ASM group address in all PEs (it has been noted that some
681	   auto-configuration could be possible for MP2MP trees, but this is
682	   not currently supported by the auto-discovery procedures). [ It has
683	   been noted that C-multicast routing schemes not covered in
684	   [I-D.ietf-l3vpn-2547bis-mcast] could expose different advantages of
685	   MP2MP multicast trees - this is out of scope of this document ]

687	   MVPN services can also be supported over a unicast VPN core through
688	   the use of ingress PE replication whereby the ingress PE replicates
689	   any multicast traffic over the P2P tunnels used to support unicast
690	   traffic.  While this option does not require the service provider to
691	   modify their existing P routers (in terms of protocol support) and
692	   does not require maintaining multicast-specific state on the P
693	   routers in order for the service provider to be able to deploy a
694	   multicast VPN service, the use of ingress PE replication obviously
695	   leads to non-optimal bandwidth utilization and it is therefore
696	   unlikely to be the long term solution chosen by service providers.
697	   However ingress PE replication may be useful during some migration
698	   scenarios or where a service provider considers the level of
699	   multicast traffic on their network to be too low to justify deploying
700	   multicast specific support within their VPN core.

702	   All proposed approaches for control plane and dataplane can be used
703	   to provide aggregation amongst multicast groups within a VPN and
704	   amongst different multicast VPNs, and potentially reduce the amount
705	   of state to be maintained by P routers.  However the latter (the
706	   aggregation amongst different multicast VPNs) will require support
707	   for upstream-assigned labels on the PEs.  Support for
708	   upstream-assigned labels may require changes to the data plane
709	   processing of the PEs, and this should be taken into consideration
710	   by service providers considering the use of aggregate S-PMSI tunnels
711	   for the specific platforms that the service provider has deployed.

713	3.5.  Inter-AS deployment options

715	   There are a number of scenarios that lead to the requirement for
716	   inter-AS multicast VPNs, including:

718	   1.  a service provider may have a large network that they have
719	       segmented into a number of ASs.

721	   2.  a service provider's multicast VPN may consist of a number of ASs
722	       due to acquisitions and mergers with other service providers.

724	   3.  a service provider may wish to interconnect their multicast VPN
725	       platform with that of another service provider.

727	   The first scenario can be considered the "simplest" because the
728	   network is wholly managed by a single service provider under a single
729	   strategy and is therefore likely to use a consistent set of
730	   technologies across each AS.

732	   The second scenario may be more complex than the first because the
733	   strategy and technology choices made for each AS may have been
734	   different due to their differing history and the service provider may
735	   not have unified (or may be unwilling to unify) the strategy and
736	   technology choices for each AS.

738	   The third scenario is the most complex because in addition to the
739	   complexity of the second scenario, the ASs are managed by different
740	   service providers and therefore may be subject to a different trust
741	   model than the other scenarios.

743	   Section 5.2.6 of [RFC4834] states that "a solution MUST support
744	   inter-AS multicast VPNs, and SHOULD support inter-provider multicast
745	   VPNs", "considerations about coexistence with unicast inter-AS VPN
746	   Options A, B and C (as described in section 10 of [RFC4364]) are
747	   strongly encouraged" and "a multicast VPN solution SHOULD provide
748	   inter-AS mechanisms requiring the least possible coordination between
749	   providers, and keep the need for detailed knowledge of providers'
750	   networks to a minimum - all this being in comparison with
751	   corresponding unicast VPN options".

753	   Section 8 of [I-D.ietf-l3vpn-2547bis-mcast] addresses these
754	   requirements by proposing two approaches for mVPN inter-AS
755	   deployments:

757	   1.  Non-segmented inter-AS tunnels where the multicast tunnels are
758	       end-to-end across ASs, so even though the PEs belonging to a
759	       given MVPN may be in different ASs the ASBRs play no special role
760	       and function merely as P routers (described in section 8.1).

762	   2.  Segmented inter-AS tunnels where each AS constructs its own
763	       separate multicast tunnels which are then 'stitched' together by
764	       the ASBRs (described in section 8.2).

766	   Section 5.2.6 of [RFC4834] also states "Within each service provider
767	   the service provider SHOULD be able on its own to pick the most
768	   appropriate tunneling mechanism to carry (multicast) traffic among
769	   PEs (just like what is done today for unicast)".  The segmented
770	   approach is the only one capable of meeting this requirement.

772	   The segmented inter-AS solution would appear to offer the largest
773	   degree of deployment flexibility to operators.  However, the non-
774	   segmented inter-AS solution can simplify deployment in a restricted
775	   number of scenarios.  Moreover, [I-D.rosen-vpn-mcast] only supports
776	   the non-segmented inter-AS solution, which is therefore likely to
777	   be useful to some operators for backward compatibility and during
778	   migration from [I-D.rosen-vpn-mcast] to
779	   [I-D.ietf-l3vpn-2547bis-mcast].

781	   The applicability of segmented or non-segmented inter-AS tunnels to a
782	   given deployment or inter-provider interconnect will depend on a
783	   number of factors specific to each service provider.  However, due to
784	   the additional deployment flexibility offered by segmented inter-AS
785	   tunnels, it is the recommendation of the authors that all
786	   implementations should support the segmented inter-AS model.
787	   Additionally, the authors recommend that implementations should
788	   consider supporting the non-segmented inter-AS model in order to
789	   facilitate co-existence with existing deployments, and as a feature
790	   allowing lighter engineering in a restricted set of scenarios,
791	   although it is recognized that initial implementations may only
792	   support one or the other.

794	4.  Co-located RPs

796	   Section 5.1.10.1 of [RFC4834] states "In the case of PIM-SM in ASM
797	   mode, engineering of the RP function requires the deployment of
798	   specific protocols and associated configurations.  A service provider
799	   may offer to manage customers' multicast protocol operation on their
800	   behalf.  This implies that it is necessary to consider cases where a
801	   customer's RPs are outsourced (e.g., on PEs).  Consequently, a VPN
802	   solution MAY support the hosting of the RP function in a VR or VRF."

804	   However, customers who have already deployed multicast within their
805	   networks and have therefore already deployed their own internal RPs
806	   are often reluctant to hand over the control of their RPs to their
807	   service provider and make use of a co-located RP model, and providing
808	   RP co-location on a PE will require the activation of MSDP or the
809	   processing of PIM Registers on the PE.  Securing the PE routers for
810	   such activity requires special care, additional work, and will likely
811	   rely on specific features to be provided by the routers themselves.

813	   The applicability of the co-located RP model to a given MVPN will
814	   thus depend on a number of factors specific to each customer and
815	   service provider.

817	   It is therefore the recommendation that implementations should
818	   support a co-located RP model, but that support for a co-located RP
819	   model within an implementation should not restrict deployments to
820	   using a co-located RP model : implementations MUST support
821	   deployments when activation of a PIM RP function (PIM Register
822	   processing and RP-specific PIM procedures) or VRF MSDP instance is
823	   not required on any PE router and where all the RPs are deployed
824	   within the customers' networks or CEs.

826	5.  Existing deployments

828	   Some suggestions provided in this document can be used to
829	   incrementally modify currently deployed implementations without
830	   hindering these deployments, and without hindering the consistency of
831	   the standardized solution by providing optional per-VRF configuration
832	   knobs to support modes of operation compatible with currently
833	   deployed implementations, while at the same time using the
834	   recommended approach on implementations supporting the standard.

836	   In cases where this may not be easily achieved, a recommended
837	   approach would be to provide a per-VRF configuration knob that allows
838	   incremental per-VPN migration of the mechanisms used by a PE device,
839	   which would allow migration with some per-VPN interruption of service
840	   (e.g. during a maintenance window).

842	   Mechanisms allowing "live" migration by providing concurrent use of
843	   multiple alternatives for a given PE and a given VPN are not seen as
844	   a priority considering the expected implementation complexity
845	   associated with such mechanisms.  However, if there happen to be
846	   cases where they could be viably implemented relatively simply, such
847	   mechanisms may help improve migration management.

849	6.  Summary of recommendations

851	   The following list summarizes the authors' recommendations.  These
852	   recommendations are not intended to prevent the implementation of
853	   alternative solutions, rather they are the authors' recommendations
854	   for the mechanisms that should be made mandatory in
855	   [I-D.ietf-l3vpn-2547bis-mcast] and therefore be supported by all
856	   implementations.

858	   It is the authors' recommendation:

860	   o  that BGP-based auto-discovery be the mandated solution for auto-
861	      discovery ;

863	   o  that BGP be the mandated solution for S-PMSI switching signaling ;

865	   o  that implementations support both the BGP-based and the full per-
866	      MVPN PIM peering solutions for PE-PE transmission of customer
867	      multicast routing until further operational experience is gained
868	      with both solutions ;

870	   o  that implementations implement the P2MP variants of the P2P
871	      protocols that they already implement, such as mLDP, P2MP RSVP-TE
872	      and GRE/IP-Multicast ;

874	   o  that implementations support segmented inter-AS tunnels and
875	      consider supporting non-segmented inter-AS tunnels (in order to
876	      maintain backwards compatibility and for migration) ;

878	   o  that implementations MUST support deployments when activation of
879	      a PIM RP function (PIM Register processing and RP-specific PIM
880	      procedures) or VRF MSDP instance is not required on any PE router.

882	7.  IANA Considerations

884	   This document makes no request to IANA.

886	   [ Note to RFC Editor: this section may be removed on publication as
887	   an RFC. ]

889	8.  Security Considerations

891	   This document does not by itself raise any particular security
892	   considerations.

894	9.  Acknowledgements

896	   We would like to thank Adrian Farrel, Eric Rosen, Yakov Rekhter, and
897	   Maria Napierala for their feedback that helped shape this document.

899	10.  Informative References

901	   [RFC4834]  Morin, T., "Requirements for Multicast in L3 Provider-
902	              Provisioned Virtual Private Networks (PPVPNs)", RFC 4834,
903	              April 2007.

905	   [I-D.ietf-l3vpn-2547bis-mcast]
906	              Rosen, E. and R. Aggarwal, "Multicast in MPLS/BGP IP
907	              VPNs", draft-ietf-l3vpn-2547bis-mcast-06 (work in
908	              progress), October 2006.

910	   [I-D.ietf-l3vpn-2547bis-mcast-bgp]
911	              Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
912	              Encodings and Procedures for Multicast in MPLS/BGP IP
913	              VPNs", draft-ietf-l3vpn-2547bis-mcast-bgp-05 (work in
914	              progress), June 2008.

916	   [I-D.rosen-vpn-mcast]
917	              Rosen, E., "Multicast in MPLS/BGP VPNs",
918	              draft-rosen-vpn-mcast-08 (work in progress),
919	              December 2004.

921	   [I-D.raggarwa-l3vpn-2547-mvpn]
922	              Aggarwal, R., "Base Specification for Multicast in BGP/
923	              MPLS VPNs", draft-raggarwa-l3vpn-2547-mvpn-00 (work in
924	              progress), June 2004.

926	   [I-D.ietf-pim-sm-linklocal]
927	              Atwood, J., "Authentication and Confidentiality in PIM-SM
928	              Link-local Messages", draft-ietf-pim-sm-linklocal-02 (work
929	              in progress), November 2007.

931	   [I-D.farinacci-pim-port]
932	              Farinacci, D., Wijnands, I., Karan, A., Boers, A., and M.
933	              Napierala, "A Reliable Transport Mechanism for PIM",
934	              draft-farinacci-pim-port-01 (work in progress), May 2008.

936	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
937	              Requirement Levels", BCP 14, RFC 2119, March 1997.

939	Appendix A.  Scalability of C-multicast routing processing load

941	   The main role of multicast routing is to let routers determine that
942	   they should start or stop forwarding a given multicast stream on a
943	   given link.  In the multicast VPN context, this has to be done for
944	   each VPN, and the associated function is thus named "customer-
945	   multicast routing" or "C-multicast routing"; its role is to let PE
946	   routers determine that they should start or stop forwarding the
947	   traffic of a given multicast stream toward the remote PEs, on some
948	   S-PMSI tunnel.

950	   When some "join" message is received by a PE, this PE knows that it
951	   should be sending traffic for the corresponding multicast group of
952	   the corresponding VPN.  But the reception of a "prune" message from a
953	   remote PE is not enough by itself for a PE to know that it should
954	   stop forwarding the corresponding multicast traffic : it has to make
955	   sure that there aren't any other PEs that still have receivers for
956	   this traffic.

958	   There are many ways that the "C-multicast routing" building block can
959	   be designed so that a PE can determine when it can stop forwarding a
960	   given multicast stream toward other PEs:

962	   PIM LAN Procedures, by default
963	      By default when PIM LAN procedures are used, when a PE Prunes
964	      itself from a multicast tree, all other PEs check their own state
965	      to know if they are on the tree, in which case they send a PIM
966	      Join message to override the Prune.  The "did the last receiver
967	      leave?" question is thus implicitly answered by all PE routers,
968	      for each PIM Prune message.

970	   PIM LAN Procedures, with explicit tracking
971	      PIM LAN procedures can use an "explicit tracking" approach, where
972	      a PE which is the upstream router for a multicast stream maintains
973	      an updated list of all neighbors who are joined to the tree.
974	      Thus, when it receives a Leave message from a PIM neighbor, it
975	      instantly knows the answer to the "did the last receiver leave?"
976	      question.
977	      In this case, the question is answered by the upstream router
978	      alone.  The side effect of this "explicit tracking" is that "Join
979	      suppression" is not used : the downstream PEs will always send
980	      Joins toward the upstream PE, which will have to process them all.

982	   BGP-based C-multicast routing
983	      When BGP-based procedures are used for C-multicast routing, if no
984	      BGP route reflector is used, the "did the last receiver leave?"
985	      question is answered like in the PIM "explicit tracking" approach.
986	      But, when a BGP route reflector is used (which is expected to be
987	      the recommended approach), the role of maintaining an updated list
988	      of the PEs on a given multicast tree is taken care of by the
989	      route reflector(s).  Using plain BGP route selection procedures,
990	      the route reflector will withdraw a C-multicast Source Tree Join
991	      for a given (C-S,C-G) when there is no PE advertising one anymore.
992	      In this context, the "did the last receiver leave?" question can
993	      be said to be answered by the route-reflector alone.
994	      Furthermore, the BGP route distribution can leverage more than one
995	      route reflector : if a hierarchy of route reflectors is used, the
996	      "did the last receiver leave?" question is partly answered by each
997	      route reflector in the hierarchy.
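
	   The bookkeeping behind the "explicit tracking" approach can be
	   sketched as follows in Python.  This is a minimal illustration
	   under stated assumptions: the class and method names are
	   hypothetical, the state covers a single multicast stream, and
	   message parsing, timers and refreshes are left out.  With
	   BGP-based C-multicast routing and route reflectors, comparable
	   state is held by the route reflector(s) rather than by the
	   upstream PE.

```python
class ExplicitTracker:
    """Upstream-side state for one multicast stream (hypothetical model)."""

    def __init__(self):
        # Downstream PEs currently joined to the tree.
        self.joined = set()

    def on_join(self, pe):
        """Record a Join; return True if forwarding has to start."""
        first_receiver = not self.joined
        self.joined.add(pe)
        return first_receiver

    def on_prune(self, pe):
        """Record a Prune; return True if the last receiver left."""
        self.joined.discard(pe)
        return not self.joined


tracker = ExplicitTracker()
events = [
    tracker.on_join("PE1"),   # first receiver: start forwarding
    tracker.on_join("PE2"),   # already forwarding
    tracker.on_prune("PE1"),  # PE2 is still joined
    tracker.on_prune("PE2"),  # last receiver left: stop forwarding
]
```

	   The "did the last receiver leave?" question is answered locally
	   by checking whether the set is empty, at the cost of processing
	   every Join and Prune from every downstream PE.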

999	   We can see that answering the "last receiver leaves" question is a
1000	   significant proportion of the work that the C-multicast routing
1001	   building block has to do, and is where approaches differ most.  The
1002	   different approaches for handling C-multicast routing can differ in
1003	   the amount of processing and in how this processing is spread
1004	   among the different functions.  These differences can be better
1005	   estimated by quantifying the amount of message processing and state
1006	   maintenance.

1008	   Though the type of processing, messages and states, may vary with the
1009	   different approaches, we propose here a rough estimation of the load
1010	   of PEs, in terms of number of messages processed and number of
1011	   control plane states maintained : a "message processed" being a
1012	   message being parsed, a lookup being done, and some action being
1013	   taken (such as updating a control plane or data plane state), and a
1014	   "state maintained" being a multicast state kept in the control plane
1015	   memory of a PE, related to an interface or a PE being subscribed to a
1016	   multicast stream (we don't compare the data plane states on PE
1017	   routers, which wouldn't vary between the different options chosen).

1019	   The following subsections do such an estimation for each proposed
1020	   approach for C-multicast routing, for different phases of the
1021	   following scenario:

1023	   o  one SSM multicast stream is considered - scalability extrapolation
1024	      to more than one stream is linear

1026	   o  only the intra-AS case is considered (with the segmented inter-AS
1027	      trees and BGP-based C-multicast routing, #mVPN_PEs and #R_PEs
1028	      should refer to the PEs of the mVPN in the AS, not to all PEs of
1029	      the mVPN)

1031	   o  the scenario is as follows:

1033	      *  one PE Joins the multicast stream (because a new receiver-
1034	         connected site has sent a Join on the PE-CE link), followed by
1035	         additional PEs that also join the multicast stream, one after
1036	         the other ; we evaluate the processing required for the
1037	         addition of each PE

1039	      *  some period of time T passes, without any PE joining or leaving
1040	         (baseline)

1042	      *  all PEs leave, one after the other, until the last one leaves ;
1043	         we evaluate the processing required for the leave of each PE

1045	   o  the parameters used are:

1047	      *  #mVPN_PEs : the number of PEs in the mVPN

1049	      *  #R_PEs : the number of PEs joining the multicast stream

1051	      *  #RRs : the number of route reflectors

1053	      *  T_PIM_r : the time between two refreshes of a PIM Join (default
1054	         is 60s)

1056	   The estimation unit used is the "message.equipment" or "m.e", one
1057	   "message.equipment" being "one equipment processing one message" (10
1058	   m.e being "10 equipments each processing one message", or "5 messages
1059	   each processed by 2 equipments", or "1 message processed by 10
1060	   equipments", etc.).  Similarly for the amount of control plane state,
1061	   we count in "state.equipment" or "s.e".

1063	   We distinguish three different types of equipments : the upstream PE
1064	   for the multicast stream, the RR (if any), and the other PEs (which
1065	   are not the upstream PE).  The estimation is a total number of
1066	   "message.equipment", for each type of equipment.

1068	   Additional clarifications:

1070	   o  for PIM, only Join and Prune messages are counted ; the PIM Hellos
1071	      are not counted since these are not messages that trigger specific
1072	      action in a typical scenario; message processing related to the
1073	      PIM Assert mechanism is also not taken into account, because it is
1074	      only active in transient state

1076	   o  for BGP, only UPDATE messages for mVPN routes carrying C-multicast
1077	      routing information are considered

1079	A.1.  PIM LAN procedures, by default

1081	   +------------+-----------+----------------+----------+--------------+
1082	   |            | upstream  | other PEs      | RR       | total        |
1083	   |            | PE (1)    | (#mvpn_PEs -1) | (none)   |              |
1084	   +------------+-----------+----------------+----------+--------------+
1085	   | first PE   | 1 m.e     | #mVPN_PEs-1    | /        | #mVPN_PEs    |
1086	   | joins      |           | m.e            |          | m.e          |
1087	   +------------+-----------+----------------+----------+--------------+
1088	   | for *each* | 1 m.e     | #mvpn_PEs-1    | /        | #mvpn_PEs    |
1089	   | additional |           | m.e            |          | m.e          |
1090	   | PE joining |           |                |          |              |
1091	   +------------+-----------+----------------+----------+--------------+
1092	   +------------+-----------+----------------+----------+--------------+
1093	   | baseline   | T/T_PIMr  | (T/T_PIMr) x   | /        | (T/T_PIMr) x |
1094	   | processing | m.e       | (#mvpn_PEs -1) |          | #mvpn_PEs    |
1095	   | over a     |           | m.e            |          | m.e          |
1096	   | period T   |           |                |          |              |
1097	   +------------+-----------+----------------+----------+--------------+
1098	   | for *each* | 2 m.e     | 2(#mvpn_PEs-1) | /        | 2 x          |
1099	   | PE leaving |           | m.e            |          | #mvpn_PEs    |
1100	   |            |           |                |          | m.e          |
1101	   +------------+-----------+----------------+----------+--------------+
1102	   | the last   | 1 m.e     | #mvpn_PEs-1    | /        | #mvpn_PEs    |
1103	   | PE leaves  |           | m.e            |          | m.e          |
1104	   +------------+-----------+----------------+----------+--------------+
1105	   | total for  | #R_PEs x  | (#mvpn_PEs-1)  | 0        | #mvpn_PEs x  |
1106	   | #R_PEs PEs | 2 +       | x (#R_PEs) x 2 |          | ( 3 x        |
1107	   |            | T/T_PIMr  | + (T/T_PIMr) x |          | #R_PEs       |
1108	   |            | m.e       | (#mvpn_PEs -1) |          | + T/T_PIMr ) |
1109	   |            |           | m.e            |          | m.e          |
1110	   +------------+-----------+----------------+----------+--------------+
1111	   | total      | 1 s.e     | #R_PEs s.e     | 0        | #R_PEs+1 s.e |
1112	   | state      |           |                |          |              |
1113	   | maintained |           |                |          |              |
1114	   +------------+-----------+----------------+----------+--------------+

1116	   Amount of messages processed for one multicast tree of one VPN - PIM
1117	                        LAN procedures, by default

1119	   We assume here that the Join suppression and PIM Override mechanisms
1120	   are fully effective, i.e., that a Join sent by a PE is instantly seen
1121	   by other PEs.  Strictly speaking, this is not true, and depending on
1122	   network delays and timing, there could be cases where more messages
1123	   are exchanged.

1125	A.2.  PIM LAN procedures, with explicit tracking

1127	   +--------------+-------------+--------------+--------+--------------+
1128	   |              | upstream PE | other PEs    | RRs    | total        |
1129	   |              | (1)         | (#mvpn_PEs   | (none) |              |
1130	   |              |             | -1)          |        |              |
1131	   +--------------+-------------+--------------+--------+--------------+
1132	   | first PE     | 1 m.e       | 1 m.e (see   | /      | 2 m.e        |
1133	   | joins        |             | note below)  |        |              |
1134	   +--------------+-------------+--------------+--------+--------------+
1135	   | for *each*   | 1 m.e       | 1 m.e (see   | /      | 2 m.e        |
1136	   | additional   |             | note below)  |        |              |
1137	   | PE joining   |             |              |        |              |
1138	   +--------------+-------------+--------------+--------+--------------+
1139	   +--------------+-------------+--------------+--------+--------------+
1140	   | baseline     | (T/T_PIMr)  | (T/T_PIMr)   | /      | (T/T_PIMr)   |
1141	   | processing   | x #R_PEs    | m.e (see     |        | x #R_PEs m.e |
1142	   | over a       | m.e         | note below)  |        |              |
1143	   | period T     |             |              |        |              |
1144	   +--------------+-------------+--------------+--------+--------------+
1145	   | for *each*   | 1 m.e       | 1 m.e (see   | /      | 2 m.e        |
1146	   | PE leaving   |             | note below)  |        |              |
1147	   +--------------+-------------+--------------+--------+--------------+
1148	   | the last PE  | 1 m.e       | 1 m.e (see   | /      | 2 m.e        |
1149	   | leaves       |             | note below)  |        |              |
1150	   +--------------+-------------+--------------+--------+--------------+
1151	   | total for    | #R_PEs (2 + | #R_PEs x ( 2 | 0      | #R_PEs x ( 4 |
1152	   | #R_PEs PEs   | T/T_PIMr)   | + T/T_PIMr)  |        | + T/T_PIMr)  |
1153	   |              | m.e         | m.e          |        | m.e          |
1154	   +--------------+-------------+--------------+--------+--------------+
1155	   | total state  | #R_PEs s.e  | #R_PEs s.e   | 0      | 2 x #R_PEs   |
1156	   | maintained   |             |              |        | s.e          |
1157	   +--------------+-------------+--------------+--------+--------------+

1159	   Amount of messages processed for one multicast tree of one VPN - PIM
1160	                  LAN procedures, with explicit tracking

1162	   Note: in this explicit tracking mode, a given Join or Prune message
1163	   requires processing only by the upstream PE and the PE sending the
1164	   message; the other PEs have no action to take.  Note, though, that
1165	   these other PEs will still have to parse the PIM message, which is
1166	   non-zero processing.  We make here the assumption that this is not
1167	   significant.

1169	A.3.  BGP-based

1171	   Regarding RRs: we assume that a message has to be processed by r BGP
1172	   route reflectors to go from a receiver-connected PE to the source-
1173	   connected PE.  In practice, r depends on how RRs are meshed and would
1174	   typically be small (1, 2 or 3), and r tends quickly toward 1 (as
1175	   soon as there is a receiver-connected PE in each RR cluster).

1177	   We assume that RT constraint is used; if not, the amount of state
1178	   and message processing with this approach is similar to that of the
1179	   PIM with explicit tracking approach, without the Join refreshes.

1181	   +--------------+----------+------------+-------------+--------------+
1182	   |              | upstream | other PEs  | RRs (#RRs)  | total        |
1183	   |              | PE (1)   | (#mvpn_PEs |             |              |
1184	   |              |          | -1)        |             |              |
1185	   +--------------+----------+------------+-------------+--------------+
1186	   | first PE     | 1 m.e    | 1 m.e      | r m.e       | (r+2) m.e    |
1187	   | joins        |          |            |             |              |
1188	   +--------------+----------+------------+-------------+--------------+
1189	   | for *each*   | 0        | 1 m.e      | between 1   | between 2    |
1190	   | additional   |          |            | and r m.e   | and (r+1)    |
1191	   | PE joining   |          |            |             | m.e          |
1192	   +--------------+----------+------------+-------------+--------------+
1193	   | baseline     | 0        | 0          | 0           | 0            |
1194	   | processing   |          |            |             |              |
1195	   | over a       |          |            |             |              |
1196	   | period T     |          |            |             |              |
1197	   +--------------+----------+------------+-------------+--------------+
1198	   | for *each*   | 0        | 1 m.e      | between 1   | between 2    |
1199	   | PE leaving   |          |            | and r m.e   | and (r+1)    |
1200	   |              |          |            |             | m.e          |
1201	   +--------------+----------+------------+-------------+--------------+
1202	   | the last PE  | 1 m.e    | 1 m.e      | r m.e       | (r+2) m.e    |
1203	   | leaves       |          |            |             |              |
1204	   +--------------+----------+------------+-------------+--------------+
1205	   | total for    | 2 m.e    | #R_PEs x 2 | 2           | 2 (2 x       |
1206	   | #R_PEs PEs   |          | m.e        | (r+#R_PEs)  | #R_PEs + r + |
1207	   |              |          |            | m.e         | 1) m.e       |
1208	   +--------------+----------+------------+-------------+--------------+
1209	   | total state  | 1 s.e    | #R_PEs s.e | approx.     | approx.      |
1210	   | maintained   |          |            | #R_PEs x    | (#R_PEs x    |
1211	   |              |          |            | #RRs s.e    | (#RRs+1))    |
1212	   |              |          |            |             | s.e          |
1213	   +--------------+----------+------------+-------------+--------------+

1215	   Amount of messages processed for one multicast tree of one VPN - BGP-
1216	                             based procedures
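
As a rough illustration of how the "total for #R_PEs PEs" rows of the three tables compare, the following Python sketch evaluates the formulas given in the "total" column of A.1, A.2 and A.3.  The function and parameter names are hypothetical (they are not defined by this document), `refreshes` stands for T/T_PIMr, #joined_PEs in A.1 is read as #R_PEs, and the example values are arbitrary assumptions rather than measurements.

```python
def total_messages(mvpn_pes, r_pes, refreshes, r):
    """Total m.e for one multicast tree, per the "total" column of each table.

    mvpn_pes  -- #mvpn_PEs, number of PEs in the mVPN
    r_pes     -- #R_PEs, number of receiver-connected PEs that join then leave
    refreshes -- T/T_PIMr, number of PIM Join refresh periods during T
    r         -- number of RRs a BGP message traverses (assumed worst case)
    """
    return {
        # A.1: #mvpn_PEs x (3 x #R_PEs + T/T_PIMr)
        "pim_default": mvpn_pes * (3 * r_pes + refreshes),
        # A.2: #R_PEs x (4 + T/T_PIMr)
        "pim_explicit_tracking": r_pes * (4 + refreshes),
        # A.3: 2 x (2 x #R_PEs + r + 1); there is no periodic refresh term
        "bgp": 2 * (2 * r_pes + r + 1),
    }


# Arbitrary example: 100 PEs, 20 receiver PEs, 60 refresh periods, r = 2.
totals = total_messages(mvpn_pes=100, r_pes=20, refreshes=60, r=2)
for approach, count in totals.items():
    print(f"{approach}: {count} m.e")
```

With these assumed values the sketch yields 12000 m.e for the PIM default procedures, 1280 m.e for PIM with explicit tracking, and 86 m.e for the BGP-based procedures.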

1218	A.4.  Side by side orders of magnitude comparison

1220	   This section builds on the previous sections by considering the
1221	   orders of magnitude when the number of PEs in a VPN increases.

1223	   +------------+-----------------------+--------------+---------------+
1224	   |            | PIM LAN Procedures,   | PIM LAN      | BGP-based     |
1225	   |            | default               | Procedures,  |               |
1226	   |            |                       | explicit     |               |
1227	   |            |                       | tracking     |               |
1228	   +------------+-----------------------+--------------+---------------+
1229	   | first PE   | O(#mvpn_PEs)          | O(1)         | O(1)          |
1230	   | joins      |                       |              |               |
1231	   +------------+-----------------------+--------------+---------------+
1232	   | for *each* | O(#mvpn_PEs)          | O(1)         | O(1)          |
1233	   | additional |                       |              |               |
1234	   | PE joining |                       |              |               |
1235	   +------------+-----------------------+--------------+---------------+
1236	   | baseline   | (T/T_PIMr) x          | (T/T_PIMr) x | 0             |
1237	   | processing | O(#mvpn_PEs)          | O(#R_PEs)    |               |
1238	   | over a     |                       |              |               |
1239	   | period T   |                       |              |               |
1240	   +------------+-----------------------+--------------+---------------+
1241	   | for *each* | O(#mvpn_PEs)          | O(1)         | O(1)          |
1242	   | PE leaving |                       |              |               |
1243	   +------------+-----------------------+--------------+---------------+
1244	   | the last   | O(#mvpn_PEs)          | O(1)         | O(1)          |
1245	   | PE leaves  |                       |              |               |
1246	   +------------+-----------------------+--------------+---------------+
1247	   | total for  | O(#mvpn_PEs x #R_PEs) | O(#R_PEs) x  | O(#R_PEs)     |
1248	   | #R_PEs PEs | + O(#mvpn_PEs x       | (T/T_PIMr)   |               |
1249	   |            | T/T_PIMr)             |              |               |
1250	   +------------+-----------------------+--------------+---------------+
1251	   | states     | O(#R_PEs)             | O(#R_PEs)    | O(#R_PEs x    |
1252	   |            |                       |              | #RRs)         |
1253	   +------------+-----------------------+--------------+---------------+
1254	   | notes      | (processing and state | (processing  | (processing   |
1255	   |            | maintenance are       | and state    | and state     |
1256	   |            | essentially done by,  | maintenance  | maintenance   |
1257	   |            | and spread amongst,   | are          | are           |
1258	   |            | the PEs of the mvpn ; | essentially  | essentially   |
1259	   |            | non-upstream PEs have | done on the  | done by, and  |
1260	   |            | processing to do)     | upstream PE) | spread        |
1261	   |            |                       |              | amongst, the  |
1262	   |            |                       |              | RRs)          |
1263	   +------------+-----------------------+--------------+---------------+

1265	   Amount of messages processed for one multicast tree of one VPN -
1266	             side by side orders of magnitude comparison

1268	   The conclusions that can be drawn from the above are that:

1270	   o  the PIM LAN Procedures default approach is particular in that all
1271	      PEs, including those that are neither upstream nor downstream for
1272	      a given message, have processing to do, which results in a total
1273	      amount of messages to process which is in O(#mvpn_PEs x #R_PEs),
1274	      i.e., O(#mvpn_PEs ^ 2) if the proportion of R_PEs is considered
1275	      constant when the number of PEs increases

1277	   o  the two PIM-based approaches refresh Join messages; this is a
1278	      linear factor that does not change the order of magnitude, but
1279	      which can be significant for long-lived streams

1281	   o  the BGP-based approach requires an amount of message processing in
1282	      O(#R_PEs), lower than the two other approaches, and which is
1283	      independent of the duration of streams

1285	   o  state maintenance is in the same order of magnitude for all
1286	      approaches: O(#R_PEs), but the distribution is different:

1288	      *  the PIM LAN Procedure default approach fully spreads, and
1289	         minimizes, the amount of state (one state per PE)

1291	      *  the PIM LAN procedure with explicit tracking concentrates all
1292	         state on the upstream PE

1294	      *  the BGP-based procedures spread all the state across the set of
1295	         route reflectors

1297	   This quantification of message processing is based on a use case
1298	   where each PE with a receiver joins and leaves once.  Drawing
1299	   scalability-related conclusions for other patterns or frequencies of
1300	   changes of the set of receiver-connected PEs requires considering
1301	   the cost of each approach for "a new PE joining" and "a (non-last) PE
1302	   leaving".  From this perspective, the "PIM LAN Procedure default
1303	   approach" is the most costly one (processing in O(#mvpn_PEs)),
1304	   whereas the other approaches are in O(1); the "PIM LAN Procedures
1305	   with explicit tracking" reduce the processing to the minimum in that
1306	   case, the BGP-based approach having a cost increased by a linear
1307	   factor depending on the number of RRs that will have to parse the
1308	   message.
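
The per-event reasoning above can be sketched in Python as follows.  This is only an illustration under stated assumptions: the function name and parameters are hypothetical, the per-event costs are taken from the "total" column of the "for *each* additional PE joining" and "for *each* PE leaving" rows of A.1, A.2 and A.3, and the example numbers are arbitrary.

```python
def churn_cost(mvpn_pes, r, joins, leaves):
    """Return (min, max) m.e per approach for a churn pattern.

    joins  -- number of "additional PE joining" events
    leaves -- number of "(non-last) PE leaving" events
    r      -- number of RRs a BGP message may have to traverse
    """
    events = joins + leaves
    # A.1: a join costs #mvpn_PEs m.e, a non-last leave 2 x #mvpn_PEs m.e
    pim_default = mvpn_pes * (joins + 2 * leaves)
    return {
        "pim_default": (pim_default, pim_default),
        # A.2: 2 m.e per event (upstream PE and sending PE only)
        "pim_explicit_tracking": (2 * events, 2 * events),
        # A.3: between 2 and (r+1) m.e per event
        "bgp": (2 * events, (r + 1) * events),
    }


# Arbitrary example: 100 PEs, r = 3, 10 joins and 10 non-last leaves.
costs = churn_cost(mvpn_pes=100, r=3, joins=10, leaves=10)
for approach, (low, high) in costs.items():
    print(f"{approach}: between {low} and {high} m.e")
```

In this example the PIM default procedures cost 3000 m.e, explicit tracking costs 40 m.e, and the BGP-based procedures cost between 40 and 80 m.e, the spread depending on how many RRs process each message.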

1310	Appendix B.  Switching to S-PMSI

1312	   [ the following point was fixed in -07, and is here for reference
1313	   only ]

1315	   Section 7.2.2.3 of [I-D.ietf-l3vpn-2547bis-mcast] proposes two
1316	   approaches for how a source PE can decide when to start transmitting
1317	   customer multicast traffic on a S-PMSI:

1319	   1.  The source PE sends multicast packets for the <C-S, C-G> on both
1320	       the I-PMSI P-multicast tree and the S-PMSI P-multicast tree
1321	       simultaneously for a pre-configured period of time, letting the
1322	       receiver PEs select the new tree for reception, before switching
1323	       to only the S-PMSI.

1325	   2.  The source PE waits for a pre-configured period of time after
1326	       advertising the <C-S, C-G> entry bound to the S-PMSI before fully
1327	       switching the traffic onto the S-PMSI-bound P-multicast tree.
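
The difference between the two alternatives can be sketched as a small Python function that returns the trees on which the source PE transmits at a given time.  This is a simplified model and not the procedure text of [I-D.ietf-l3vpn-2547bis-mcast]: the function name, the time parameters, and the assumption that the pre-configured period starts when the S-PMSI binding is advertised are all hypothetical.

```python
def trees_carrying_traffic(alternative, now, advertised_at, delay):
    """Return the set of P-multicast trees used for <C-S, C-G> at time `now`.

    alternative   -- 1 (send on both trees during the period) or 2 (wait)
    advertised_at -- time at which the S-PMSI binding is advertised (assumed
                     to be the start of the pre-configured period)
    delay         -- the pre-configured period of time
    """
    if alternative not in (1, 2):
        raise ValueError("alternative must be 1 or 2")
    if now < advertised_at:
        # Before the switch is announced, only the I-PMSI is used.
        return {"I-PMSI"}
    if now >= advertised_at + delay:
        # After the period, both alternatives use only the S-PMSI.
        return {"S-PMSI"}
    if alternative == 1:
        # Alternative 1: traffic is duplicated on both trees.
        return {"I-PMSI", "S-PMSI"}
    # Alternative 2: keep sending on the I-PMSI until the period ends.
    return {"I-PMSI"}


during_alt1 = trees_carrying_traffic(1, now=5, advertised_at=0, delay=10)
during_alt2 = trees_carrying_traffic(2, now=5, advertised_at=0, delay=10)
after_alt2 = trees_carrying_traffic(2, now=10, advertised_at=0, delay=10)
```

In this model, only the first alternative ever returns two trees, which is the period during which the <C-S, C-G> traffic is sent twice.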

1329	   The first alternative has essentially two drawbacks:

1331	   o  <C-S,C-G> traffic is sent twice for some period of time, which
1332	      would appear to be at odds with the motivation for switching to an
1333	      S-PMSI in order to optimize the bandwidth used by the multicast
1334	      tree for that stream.

1336	   o  It is unlikely that the switchover can occur without packet loss
1337	      or duplication if the transit delays of the I-PMSI P-multicast
1338	      tree and the S-PMSI P-multicast tree differ.

1340	   By contrast, the second alternative has none of these drawbacks, and
1341	   satisfies the requirement in section 5.1.3 of [RFC4834], which states
1342	   that "[...] a multicast VPN solution SHOULD as much as possible
1343	   ensure that client multicast traffic packets are neither lost nor
1344	   duplicated, even when changes occur in the way a client multicast
1345	   data stream is carried over the provider network".  The second
1346	   alternative also happens to be the one used in existing deployments.

1348	   For these reasons, it is the authors' recommendation to mandate the
1349	   implementation of the second alternative for switching to S-PMSI.

1351	Authors' Addresses

1353	   Thomas Morin (editor)
1354	   France Telecom R&D
1355	   2 rue Pierre Marzin
1356	   Lannion  22307
1357	   France

1359	   Email: thomas.morin@orange-ftgroup.com
1360	   Ben Niven-Jenkins (editor)
1361	   BT
1362	   208 Callisto House, Adastral Park
1363	   Ipswich, Suffolk  IP5 3RE
1364	   UK

1366	   Email: benjamin.niven-jenkins@bt.com

1368	   Yuji Kamite
1369	   NTT Communications Corporation
1370	   Tokyo Opera City Tower
1371	   3-20-2 Nishi Shinjuku, Shinjuku-ku
1372	   Tokyo  163-1421
1373	   Japan

1375	   Email: y.kamite@ntt.com

1377	   Raymond Zhang
1378	   BT
1379	   2160 E. Grand Ave.
1380	   El Segundo  CA 90025
1381	   USA

1383	   Email: raymond.zhang@bt.com

1385	   Nicolai Leymann
1386	   Deutsche Telekom
1387	   Goslarer Ufer 35
1388	   10589 Berlin
1389	   Germany

1391	   Email: nicolai.leymann@t-systems.com

1393	   Nabil Bitar
1394	   Verizon
1395	   40 Sylvan Road
1396	   Waltham, MA  02451
1397	   USA

1399	   Email: nabil.n.bitar@verizon.com

1401	Full Copyright Statement

1403	   Copyright (C) The IETF Trust (2008).

1405	   This document is subject to the rights, licenses and restrictions
1406	   contained in BCP 78, and except as set forth therein, the authors
1407	   retain all their rights.

1409	   This document and the information contained herein are provided on an
1410	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1411	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
1412	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
1413	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
1414	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1415	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1417	Intellectual Property

1419	   The IETF takes no position regarding the validity or scope of any
1420	   Intellectual Property Rights or other rights that might be claimed to
1421	   pertain to the implementation or use of the technology described in
1422	   this document or the extent to which any license under such rights
1423	   might or might not be available; nor does it represent that it has
1424	   made any independent effort to identify any such rights.  Information
1425	   on the procedures with respect to rights in RFC documents can be
1426	   found in BCP 78 and BCP 79.

1428	   Copies of IPR disclosures made to the IETF Secretariat and any
1429	   assurances of licenses to be made available, or the result of an
1430	   attempt made to obtain a general license or permission for the use of
1431	   such proprietary rights by implementers or users of this
1432	   specification can be obtained from the IETF on-line IPR repository at
1433	   http://www.ietf.org/ipr.

1435	   The IETF invites any interested party to bring to its attention any
1436	   copyrights, patents or patent applications, or other proprietary
1437	   rights that may cover technology that may be required to implement
1438	   this standard.  Please address the information to the IETF at
1439	   ietf-ipr@ietf.org.