Network Working Group                                      T. Morin, Ed.
Internet-Draft                                     France Telecom Orange
Expires: August 6, 2010                            B. Niven-Jenkins, Ed.
                                                                      BT
                                                               Y. Kamite
                                                      NTT Communications
                                                                R. Zhang
                                                                      BT
                                                              N. Leymann
                                                        Deutsche Telekom
                                                                N. Bitar
                                                                 Verizon
                                                        February 2, 2010

    Mandatory Features in a Layer 3 Multicast BGP/MPLS VPN Solution
                draft-ietf-l3vpn-mvpn-considerations-06

Abstract

   More than one set of mechanisms to support multicast in a layer 3
   BGP/MPLS VPN has been defined.  These are presented in the documents
   that define them as optional building blocks.

   To enable interoperability between implementations, this document
   defines a subset of features that is considered mandatory for a
   multicast BGP/MPLS VPN implementation.  This will help implementers
   and deployers understand which L3VPN multicast requirements are best
   satisfied by each option.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 6, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Examining alternative mechanisms for MVPN functions
     3.1.  MVPN auto-discovery
     3.2.  S-PMSI Signaling
     3.3.  PE-PE Exchange of C-Multicast Routing
       3.3.1.  PE-PE C-multicast routing scalability
       3.3.2.  PE-CE multicast routing exchange scalability
       3.3.3.  P-routers scalability
       3.3.4.  Impact of C-multicast routing on Inter-AS deployments
       3.3.5.  Security and robustness
       3.3.6.  C-multicast VPN join latency
       3.3.7.  Conclusion on C-multicast routing
     3.4.  Encapsulation techniques for P-multicast trees
     3.5.  Inter-AS deployments options
     3.6.  Bidir-PIM support
   4.  Co-located RPs
   5.  Avoiding duplicates
   6.  Existing deployments
   7.  Summary of recommendations
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1. Normative References
     11.2. Informative References
   Appendix A.  Scalability of C-multicast routing processing load
     A.1.  Scalability with an increased number of PEs
       A.1.1.  SSM Scalability
       A.1.2.  ASM Scalability
     A.2.  Cost of PEs leaving and joining
   Appendix B.  Switching to S-PMSI
   Authors' Addresses

1.  Introduction

   Specifications for multicast in BGP/MPLS VPNs
   [I-D.ietf-l3vpn-2547bis-mcast] include multiple alternative
   mechanisms for some of the required building blocks of the solution.
   However, they do not identify which of these mechanisms are mandatory
   to implement in order to ensure interoperability.  Without a defined
   set of mandatory-to-implement mechanisms, implementations may support
   different subsets of the available optional mechanisms and fail to
   interoperate.  This is a problem for the many operators that run
   multi-vendor backbones.

   The aim of this document is to build on the requirements already
   expressed in [RFC4834], to study the properties of each approach, and
   to identify mechanisms that are good candidates for a core set of
   mandatory mechanisms that can provide a base for interoperable
   solutions.

   This document goes through the different building blocks of the
   solution and states which mechanisms an implementation is required
   to implement.  Section 7 summarizes these requirements.

   Considering the history of the multicast VPN proposals and
   implementations, it is also useful to discuss how existing
   deployments of early implementations
   [I-D.rosen-vpn-mcast] [I-D.raggarwa-l3vpn-2547-mvpn] can be
   accommodated, and to provide suggestions in this respect.

2.  Terminology

   Please refer to [I-D.ietf-l3vpn-2547bis-mcast] and [RFC4834].

3.  Examining alternative mechanisms for MVPN functions

3.1.  MVPN auto-discovery

   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
   two different mechanisms for MVPN auto-discovery:

   1.  BGP-based auto-discovery

   2.  "PIM/shared P-tunnel": discovery done through the exchange of PIM
       Hellos by C-PIM instances, across an MI-PMSI implemented with one
       shared P-tunnel per VPN (using multicast ASM or MP2MP LDP)

   Both solutions address Section 5.2.10 of [RFC4834], which states that
   "the operation of a multicast VPN solution SHALL be as light as
   possible and providing automatic configuration and discovery SHOULD
   be a priority when designing a multicast VPN solution.  Particularly
   the operational burden of setting up multicast on a PE or for a VR/
   VRF SHOULD be as low as possible".

   The key consideration is that PIM-based discovery is only applicable
   to deployments that use a shared P-tunnel to instantiate an MI-PMSI.
   It is not applicable if only P2P, PIM-SSM, or P2MP mLDP/RSVP-TE
   P-tunnels are used.  Unlike ASM and MP2MP P-tunnels, these types of
   P-tunnels cannot be built until auto-discovery has been done.
   BGP-based auto-discovery, by contrast, places no constraint on the
   type of P-tunnel that has to be used.  Because it is independent of
   the P-tunnel type, BGP-based auto-discovery satisfies the requirement
   in Section 5.2.4.1 of [RFC4834] that "a multicast VPN solution SHOULD
   be designed so that control and forwarding planes are not
   interdependent".
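   The applicability rule above can be sketched as follows.  This is a
   minimal Python illustration and not part of any specification.  The
   enumeration names are hypothetical labels for the P-tunnel types
   listed in the text.  The sketch encodes one rule only: PIM-based
   discovery needs a shared P-tunnel for the MI-PMSI, and BGP-based
   auto-discovery accepts any P-tunnel type.

```python
from enum import Enum


class PTunnelType(Enum):
    """Hypothetical labels for the P-tunnel types discussed in the text."""
    PIM_ASM = "PIM-SM (ASM)"
    MP2MP_LDP = "MP2MP LDP"
    PIM_SSM = "PIM-SSM"
    P2MP_MLDP = "P2MP mLDP"
    P2MP_RSVP_TE = "P2MP RSVP-TE"
    P2P = "P2P"


# Shared P-tunnels can be set up before any remote PE is known: each PE
# joins the per-VPN ASM group or MP2MP LSP it is configured with.  The other
# types need the root or the leaves (i.e. the remote PEs) to be known first.
SHARED_P_TUNNELS = frozenset({PTunnelType.PIM_ASM, PTunnelType.MP2MP_LDP})


def pim_discovery_applicable(mi_pmsi_tunnel: PTunnelType) -> bool:
    """PIM Hello based discovery runs across an MI-PMSI on a shared P-tunnel."""
    return mi_pmsi_tunnel in SHARED_P_TUNNELS


def bgp_discovery_applicable(mi_pmsi_tunnel: PTunnelType) -> bool:
    """BGP-based auto-discovery places no constraint on the P-tunnel type."""
    return True


if __name__ == "__main__":
    for tunnel in PTunnelType:
        print(f"{tunnel.value:14} PIM: {pim_discovery_applicable(tunnel)!s:5} "
              f"BGP: {bgp_discovery_applicable(tunnel)}")
```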

   Additionally, a number of service providers have chosen to use
   SSM-based P-tunnels for the default MDTs in their current
   deployments, and therefore already rely on some form of BGP-based
   auto-discovery.

   Moreover, when shared P-tunnels are used, BGP auto-discovery would
   allow inconsistencies in the addresses/identifiers used for the
   shared P-tunnel to be detected (e.g., the same shared P-tunnel
   identifier being used for different VPNs with distinct BGP route
   targets).  This is particularly attractive for inter-AS VPNs, where
   the impact of any misconfiguration could be magnified and where a
   single service provider may not operate all the ASes.  Note that
   this technique for detecting some misconfiguration cases may not be
   usable during a transition period from shared-P-tunnel
   auto-discovery to BGP-based auto-discovery.
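   A PE or route reflector that receives BGP auto-discovery routes
   could detect the misconfiguration described above by grouping the
   advertised shared P-tunnel identifiers by the route targets attached
   to them.  The Python sketch below assumes a hypothetical flattened
   input of (P-tunnel identifier, route targets) pairs extracted from
   received auto-discovery routes.  Real routes carry this information
   in the PMSI Tunnel attribute and in extended communities.  The group
   addresses are taken from a documentation range.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

AdRoute = Tuple[str, Iterable[str]]  # (shared P-tunnel id, route targets)


def find_conflicting_tunnels(
        ad_routes: Iterable[AdRoute]) -> Dict[str, Set[FrozenSet[str]]]:
    """Return the P-tunnel ids advertised with more than one RT set."""
    rt_sets_by_tunnel: Dict[str, Set[FrozenSet[str]]] = defaultdict(set)
    for tunnel_id, route_targets in ad_routes:
        rt_sets_by_tunnel[tunnel_id].add(frozenset(route_targets))
    # The same identifier attached to distinct RT sets suggests that two
    # VPNs have been configured with the same shared P-tunnel.
    return {tunnel_id: rt_sets
            for tunnel_id, rt_sets in rt_sets_by_tunnel.items()
            if len(rt_sets) > 1}


routes = [
    ("233.252.0.1", {"target:64500:1"}),  # PE1, VPN A
    ("233.252.0.1", {"target:64500:1"}),  # PE2, VPN A
    ("233.252.0.1", {"target:64500:2"}),  # PE3, VPN B: misconfigured
    ("233.252.0.2", {"target:64500:3"}),  # PE4, VPN C
]
conflicts = find_conflicting_tunnels(routes)
```

   In this example, only 233.252.0.1 is reported, together with the two
   distinct route target sets it was advertised with.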

   Thus, the recommendation is that implementation of BGP-based
   auto-discovery be mandatory, and that it be supported by all MVPN
   implementations.

3.2.  S-PMSI Signaling

   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
   two mechanisms for signaling that multicast flows will be switched to
   an S-PMSI:

   1.  a UDP-based TLV protocol specifically for S-PMSI signaling
       (described in section 7.4.2).

   2.  a BGP-based mechanism for S-PMSI signaling (described in section
       7.4.1).

   Section 5.2.10 of [RFC4834] states that "as far as possible, the
   design of a solution SHOULD carefully consider the number of
   protocols within the core network: if any additional protocols are
   introduced compared with the unicast VPN service, the balance between
   their advantage and operational burden SHOULD be examined
   thoroughly".  The UDP-based mechanism would add a protocol to the
   MVPN stack.  BGP-based S-PMSI switching signaling would not, because
   (a) BGP is already identified as a requirement for auto-discovery,
   and (b) the BGP-based S-PMSI switching signaling procedures are very
   similar to the auto-discovery procedures.

   Furthermore, the UDP-based S-PMSI switching signaling mechanism
   requires an MI-PMSI, while the BGP-based protocol does not.  In
   practice, this means that with the UDP-based protocol a PE has to
   join all P-tunnels of all PEs in an MVPN.  With BGP-based S-PMSI
   switching signaling, a PE could delay joining a P-tunnel rooted at
   another PE until traffic from that PE is needed, which reduces the
   amount of state maintained on P routers.

   S-PMSI switching signaling approaches can also be compared in an
   inter-AS context (see Section 3.5).  The proposed BGP-based approach
   for S-PMSI switching signaling fits well with both the segmented and
   the non-segmented inter-AS approaches.  The UDP-based approach for
   S-PMSI switching signaling appears to be usable with segmented
   inter-AS tunnels, but in that case key advantages of the segmented
   approach are lost:

   o  ASes are no longer independent in choosing when S-PMSI tunnels
      will be triggered in their AS (and thus in controlling the amount
      of state created on their P routers),

   o  ASes are no longer independent in choosing the tunneling
      technique for the P-tunnels used for an S-PMSI,

   o  in an inter-AS option B context, ASes are isolated from each
      other, because PEs in one AS do not (directly) exchange routing
      information with PEs of other ASes.  This property is not
      preserved if UDP-based S-PMSI switching signaling is used.  By
      contrast, BGP-based S-PMSI switching signaling does preserve this
      property.

   Given all the above, the authors recommend BGP as the preferred
   solution for S-PMSI switching signaling, and it should be supported
   by all implementations.

   If nothing prevents S-PMSIs from being created at a fast pace,
   S-PMSI switching signaling with BGP could impact the Route
   Reflectors used for MVPN routes.  However, such fast-paced behavior
   would also affect P and PE routers through S-PMSI tunnel signaling.
   That impact is the same whichever S-PMSI signaling approach is used,
   and it is best avoided by putting proper mechanisms in place.

   The UDP-based S-PMSI switching signaling protocol can also be
   considered as an option, given that this protocol has been deployed
   for some time.  Implementations that support both protocols would be
   expected to provide a per-VRF configuration knob that allows the
   UDP-based TLV protocol to be used for S-PMSI switching signaling in
   specific VRFs.  This supports the coexistence of both protocols (for
   example, during migration).  Apart from such migration-facilitating
   mechanisms, the authors specifically do not recommend extending the
   already proposed UDP-based TLV protocol to new types of P-tunnels.
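   The per-VRF configuration knob mentioned above could be modeled as
   in the following Python sketch.  The class and field names are
   hypothetical.  The sketch assumes only two behaviors: BGP-based
   signaling is the default, and the UDP-based TLV protocol must be
   selected explicitly, per VRF, for coexistence or migration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class SpmsiSignaling(Enum):
    BGP = "bgp"          # recommended S-PMSI switching signaling
    UDP_TLV = "udp-tlv"  # legacy protocol, kept for coexistence/migration


@dataclass
class VrfMvpnConfig:
    name: str
    spmsi_signaling: SpmsiSignaling = SpmsiSignaling.BGP


@dataclass
class PeMvpnConfig:
    vrfs: Dict[str, VrfMvpnConfig] = field(default_factory=dict)

    def set_vrf(self, cfg: VrfMvpnConfig) -> None:
        self.vrfs[cfg.name] = cfg

    def spmsi_signaling_for(self, vrf_name: str) -> SpmsiSignaling:
        # VRFs without an explicit setting use BGP-based signaling.
        cfg = self.vrfs.get(vrf_name)
        return cfg.spmsi_signaling if cfg is not None else SpmsiSignaling.BGP


pe = PeMvpnConfig()
pe.set_vrf(VrfMvpnConfig("legacy-vpn", SpmsiSignaling.UDP_TLV))  # migrating
pe.set_vrf(VrfMvpnConfig("new-vpn"))
```

   Defaulting to BGP means the UDP-based protocol is only used where an
   operator has explicitly opted in for a given VRF.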

3.3.  PE-PE Exchange of C-Multicast Routing

   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
   multiple mechanisms for PE-PE exchange of customer multicast routing
   information (C-multicast routing):

   1.  Full per-MVPN PIM peering across an MI-PMSI (described in section
       3.4.1.1).

   2.  Lightweight PIM peering across an MI-PMSI (described in section
       3.4.1.2).

   3.  The unicasting of PIM C-Join/Prune messages (described in section
       3.4.1.3).

   4.  The use of BGP for carrying C-Multicast routing (described in
       section 3.4.2).

3.3.1.  PE-PE C-multicast routing scalability

   Scalability is one of the core requirements for multicast VPN, so it
   is useful to compare the proposed C-multicast routing mechanisms from
   this perspective.  Section 4.2.4 of [RFC4834] recommends that "a
   multicast VPN solution SHOULD support several hundreds of PEs per
   multicast VPN, and MAY usefully scale up to thousands", and Section
   4.2.5 states that "a solution SHOULD scale up to thousands of PEs
   having multicast service enabled".

   Scalability with an increased number of VPNs per PE, or with an
   increased amount of multicast state per VPN, is also important.  It
   is not the focus of this section because no differences between the
   approaches were identified in these respects: all other things being
   equal, the load on a PE due to C-multicast routing increases roughly
   linearly with the number of VPNs per PE and with the amount of
   multicast state per VPN.

   This section presents conclusions related to PE-PE C-multicast
   routing scalability.  Appendix A explains in more detail how the
   PIM-based approaches and the BGP-based approach differ in handling
   the C-multicast routing load, and provides a quantified evaluation of
   the amount of state and messages for each approach.  Many points
   made in this section are detailed in Appendix A.1.

   At high scales of multicast deployment, the first and third
   mechanisms require the PEs to maintain a large number of PIM
   adjacencies with other PEs of the same multicast VPN (which implies
   the regular exchange of PIM Hellos with each other) and to
   periodically refresh C-Join/Prune states.  This results in a
   processing cost that increases with the number of PEs (as detailed
   in Appendix A.1).  The second approach is less subject to this cost,
   and the fourth approach is not subject to it.
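   The growth of this cost with the number of PEs can be illustrated
   with a rough Python model of the steady-state messages that one PE
   receives in one MVPN under the first mechanism.  This model is ours
   and is not the one used in Appendix A.  It uses the default PIM-SM
   Hello period (30 s) and Join/Prune period (60 s).  It assumes one
   refresh message per C-multicast state, no message aggregation, and
   no Join suppression.  With BGP, C-multicast routes are not
   periodically refreshed, so the corresponding steady-state term is
   zero (BGP keepalives are per session, not per MVPN).

```python
HELLO_PERIOD_S = 30.0       # default PIM Hello_Period
JOIN_PRUNE_PERIOD_S = 60.0  # default PIM t_periodic (Join/Prune refresh)


def pim_full_peering_rx_rate(num_pes: int, states_per_pe: int) -> float:
    """Messages/s received by one PE from the other PEs of one MVPN.

    Every PE sends Hellos and multicasts its C-Join/Prune refreshes on the
    MI-PMSI, so each PE receives them from all its num_pes - 1 peers.
    """
    peers = num_pes - 1
    hellos = peers / HELLO_PERIOD_S
    join_refreshes = peers * states_per_pe / JOIN_PRUNE_PERIOD_S
    return hellos + join_refreshes


def bgp_rx_rate(num_pes: int, states_per_pe: int) -> float:
    """BGP C-multicast routes are sent once, not periodically refreshed."""
    return 0.0


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(pim_full_peering_rx_rate(n, 20), 1), bgp_rx_rate(n, 20))
```

   Under these assumptions, the PIM figure grows linearly with the
   number of PEs in the MVPN.  Join suppression and message aggregation
   would lower the constants but not remove that dependency.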

   The third mechanism would reduce the amount of C-Join/Prune
   processing for a given multicast flow on PEs that are not the
   upstream neighbor for this flow, but would require "explicit
   tracking" state to be maintained by the upstream PE.  It is also not
   compatible with the "Join suppression" mechanism.  One way to reduce
   the amount of signaling with this approach would be to use a PIM
   refresh-reduction mechanism.  Such a mechanism, based on TCP, is
   being specified by the PIM IETF Working Group ([I-D.ietf-pim-port]).
   Its use in a multicast VPN context has not been described in
   [I-D.ietf-l3vpn-2547bis-mcast], but this approach is expected to
   provide scalability similar to that of the BGP-based approach
   without RRs.

   The second mechanism would operate in a similar manner to full
   per-MVPN PIM peering, except that PIM Hello messages are not
   transmitted and PIM C-Join/Prune refresh-reduction would be used.
   This improves scalability, but the approach has yet to be fully
   described.  In any case, it appears to improve only one of the
   factors that affect scalability as the number of PEs increases.

   The first and second mechanisms can leverage the "Join suppression"
   behavior and thus reduce the processing burden on an upstream PE,
   which is spared the processing of a Join refresh message from each
   remote PE joined to a multicast stream.  This improvement requires
   all PEs of a multicast VPN to process all PIM Join and Prune
   messages sent by any other PE participating in the same multicast
   VPN, whether or not they are the upstream PE.
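   The "Join suppression" behavior referred to above can be sketched in
   Python as follows.  This is a simplified model of the PIM-SM
   upstream (S,G) Join Timer, assuming Join suppression is enabled.
   When a PE overhears another PE's Join for the same (C-S,C-G) sent to
   the same upstream PE, it postpones its own periodic refresh by
   t_joinsuppress.  That value is the minimum of a random value in
   [1.1, 1.4] x t_periodic and the HoldTime of the overheard message.
   The class name is hypothetical and other timers are omitted.  As
   stated above, every PE has to process the overheard Join to do this.

```python
import random

T_PERIODIC = 60.0  # default PIM Join/Prune refresh period, in seconds


class UpstreamJoinTimer:
    """Join Timer of one downstream PE for one (C-S, C-G) (simplified)."""

    def __init__(self, now: float) -> None:
        self.expires_at = now + T_PERIODIC

    def on_overheard_join(self, now: float, holdtime: float,
                          rng: random.Random) -> None:
        # Another PE just refreshed the same (C-S, C-G) towards the same
        # upstream PE: the upstream state is fresh, so delay our own refresh.
        t_suppressed = rng.uniform(1.1 * T_PERIODIC, 1.4 * T_PERIODIC)
        t_joinsuppress = min(t_suppressed, holdtime)
        if self.expires_at - now < t_joinsuppress:
            self.expires_at = now + t_joinsuppress

    def refresh_due(self, now: float) -> bool:
        return now >= self.expires_at

    def on_refresh_sent(self, now: float) -> None:
        self.expires_at = now + T_PERIODIC


timer = UpstreamJoinTimer(now=0.0)
timer.on_overheard_join(now=50.0, holdtime=210.0, rng=random.Random(7))
```

   Here, the refresh that would have been due at t=60 s is pushed to
   somewhere between t=116 s and t=134 s.  The upstream PE therefore
   receives one refresh per period for the flow, instead of one from
   each joined PE.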

   The fourth mechanism (the use of BGP for carrying C-Multicast
   routing) would have a comparable drawback: it requires all PEs to
   process a BGP C-multicast route that is only of interest to a
   specific upstream PE.  For this reason, Section 16 of
   [I-D.ietf-l3vpn-2547bis-mcast-bgp] recommends using the Route-Target
   constrained BGP distribution [RFC4684] mechanisms.  These eliminate
   the drawback by ensuring that only the interested upstream PE
   receives a BGP C-multicast route.  Specifically, when Route-Target
   constrained BGP distribution is used, the fourth mechanism reduces
   the total C-multicast routing processing load on the PEs.  It does
   so by avoiding any processing of customer multicast routing
   information on "unrelated" PEs, that is, PEs that are neither the
   joining PE nor the upstream PE.
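   The effect of Route-Target constrained distribution on C-multicast
   routes can be illustrated with a toy route reflector in Python.  As a
   simplification, the sketch assumes that each C-multicast route
   carries a route target identifying its upstream PE.  It also assumes
   that each PE has advertised RT membership (per [RFC4684]) for that
   per-PE route target as well as for its VPN route targets.  The route
   target strings and the class name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class RtConstrainedReflector:
    """Toy route reflector that applies Route Target Constraint filtering."""

    def __init__(self) -> None:
        # route target -> PEs that advertised membership for it
        self.members: Dict[str, Set[str]] = defaultdict(set)

    def add_rt_membership(self, pe: str, route_target: str) -> None:
        self.members[route_target].add(pe)

    def reflect(self, sender: str, route_targets: Iterable[str]) -> Set[str]:
        """Return the PEs a route from sender is propagated to."""
        receivers: Set[str] = set()
        for rt in route_targets:
            receivers |= self.members.get(rt, set())
        receivers.discard(sender)  # do not send the route back to its sender
        return receivers


rr = RtConstrainedReflector()
for pe in ("PE1", "PE2", "PE3", "PE4"):
    rr.add_rt_membership(pe, "target:64500:1")      # VPN route target
    rr.add_rt_membership(pe, f"upstream-pe:{pe}")   # per-PE route target

# PE3 joins a C-source behind PE1: only PE1 receives the C-multicast route,
# whereas a route carrying the VPN route target reaches every other PE.
c_multicast_receivers = rr.reflect("PE3", ["upstream-pe:PE1"])
vpn_route_receivers = rr.reflect("PE3", ["target:64500:1"])
```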

   Moreover, the fourth mechanism further reduces the total message
   processing load in two ways.  It avoids periodic refreshes, and it
   inherits BGP features that are expected to improve scalability (for
   instance, the ability to offload some of the processing burden
   associated with customer multicast routing onto one or more BGP
   route reflectors).  These advantages come at the cost of maintaining
   an amount of state that is linear in the number of PEs joined to a
   stream.  However, route reflectors allow this cost to be spread
   among multiple route reflectors, so that no single route reflector
   has to maintain all this state.

   The fourth mechanism is also specific in that it offers the
   possibility of offloading customer multicast routing processing onto
   one or more BGP Route Reflector(s).  The drawback of doing so is an
   increased processing load on the route reflector infrastructure.  In
   higher scale scenarios, it may be necessary to adapt the route
   reflector infrastructure to the MVPN routing load by using, for
   example:

   o  a separation of resources for unicast and multicast VPN routing:
      using dedicated MVPN Route Reflector(s) (or using dedicated MVPN
      BGP sessions or dedicated MVPN BGP instances);

   o  the deployment of additional route reflector resources, for
      example increasing the processing resources on existing route
      reflectors or deploying additional route reflectors.

   Among the above, the most straightforward approach is to consider
   introducing route reflectors dedicated to the MVPN service and to
   dimension them according to the needs of that service (but doing so
   is not required and is left as an operator engineering decision).

3.3.2.  PE-CE multicast routing exchange scalability

   The overhead associated with the PE-CE exchange of C-multicast
   routing is independent of the mechanism used for PE-PE C-multicast
   routing.  Its impact on overall system scalability is therefore not
   relevant when comparing the different approaches proposed for PE-PE
   C-multicast routing.  This is true even if, in some operational
   contexts, the PE-CE C-multicast routing overhead is a significant
   factor in the overall system overhead.

3.3.3.  P-routers scalability

   Mechanisms (1) and (2) are restricted to use within multicast VPNs
   that use an MI-PMSI, and therefore require either:

   o  the use of a P-tunnel technique that allows shared P-tunnels (for
      example PIM-SM in ASM mode or MP2MP LDP), or

   o  the use of one P-tunnel per PE per VPN, even for PEs that do not
      have sources in their directly attached sites for that VPN.

   By comparison, the fourth mechanism does not impose either of these
   restrictions.  When P2MP P-tunnels are used, it only requires one
   P-tunnel per VPN per PE attached to a site with a multicast source
   or RP (or with a candidate BSR, if BSR is used).

   When fewer PEs are connected to sources than there are PEs in total,
   this reduces the amount of state maintained by P-routers compared to
   the amount required to build an MI-PMSI with P2MP P-tunnels.  Such
   cases are expected to be frequent in multicast VPN deployments (see
   Section 4.2.4.1 of [RFC4834]).
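   The P-router state comparison above can be made concrete with a
   back-of-the-envelope Python count of P2MP P-tunnels.  The count
   assumes that every VPN has the same number of PEs and the same
   number of PEs attached to sites with sources or RPs.  It ignores how
   much state each tunnel places on a given P-router, which depends on
   topology.  The function name and the figures in the example are
   illustrative only and are not measurements.

```python
def p2mp_tunnel_count(num_vpns: int, pes_per_vpn: int,
                      source_pes_per_vpn: int, mi_pmsi: bool) -> int:
    """Number of P2MP P-tunnels the P-routers have to support.

    mi_pmsi=True: mechanisms (1) and (2) with P2MP P-tunnels, one
    P-tunnel rooted at every PE of each VPN.
    mi_pmsi=False: the fourth (BGP-based) mechanism, one P-tunnel per VPN
    per PE attached to a site with a multicast source or RP.
    """
    roots_per_vpn = pes_per_vpn if mi_pmsi else source_pes_per_vpn
    return num_vpns * roots_per_vpn


# Hypothetical example: 1000 VPNs of 100 PEs, 5 source-connected PEs each.
with_mi_pmsi = p2mp_tunnel_count(1000, 100, 5, mi_pmsi=True)      # 100000
without_mi_pmsi = p2mp_tunnel_count(1000, 100, 5, mi_pmsi=False)  # 5000
```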

3.3.4.  Impact of C-multicast routing on Inter-AS deployments

   Co-existence with unicast inter-AS VPN options, and an equal level of
   security for multicast and unicast (including in an inter-AS
   context), are specifically mentioned in Sections 5.2.6, 5.2.8, and
   5.2.12 of [RFC4834].

   In an inter-AS option B context, ASes are isolated from each other,
   because PEs in one AS do not (directly) exchange routing information
   with PEs of other ASes.  This property is not preserved if PIM-based
   PE-PE C-multicast routing is used.  By contrast, the fourth option
   (BGP-based C-Multicast routing) does preserve this property.

   Additionally, the authors note that the proposed BGP-based approach
   for C-multicast routing fits well with both the segmented and the
   non-segmented inter-AS approaches.  PIM-based C-multicast routing is
   usable with segmented inter-AS tunnels, but it then loses the
   inter-AS scalability advantage of that approach, because PEs in an
   AS will see the C-multicast routing activity of all PEs in all other
   ASes.

3.3.5.  Security and robustness

   BGP supports MD5 authentication of its peers for additional security.
   This can directly benefit multicast VPN customer multicast routing,
   for both intra-AS and inter-AS communications.  With a PIM-based
   approach, by contrast, no mechanism that provides a comparable level
   of security for authenticating communications between remote PEs
   has yet been fully described [I-D.ietf-pim-sm-linklocal].  In any
   case, such a mechanism would require significant additional
   operations for the provider to be usable in a multicast VPN context.

   The robustness of the infrastructure, especially the existing
   infrastructure providing unicast VPN connectivity, is key.  The
   C-multicast routing function, especially under load, will compete
   with the unicast routing infrastructure.  With the PIM-based
   approaches, the unicast and multicast VPN routing functions are
   expected to compete for control plane processing resources only in
   the PE.  With the BGP-based approach, they will compete for
   processing resources both on the PE and on the route reflectors
   (assuming route reflectors are used for MVPN routing).  In both
   cases, mechanisms will be required to arbitrate resources (e.g.,
   processing priorities).  With the PIM-based procedures, this
   arbitration takes place between the different control plane routing
   instances in the PE.  With the BGP-based approach, it is likely to
   require distinct BGP sessions for multicast and unicast (e.g., by
   using dedicated MVPN BGP route reflectors, or a distinct session
   with an existing route reflector).

486	   Multicast routing is dynamic by nature, and multicast VPN routing has
487	   to follow the VPN customers multicast routing events.  The different
488	   approaches can be compared on how they are expected to behave in
489	   scenarios where multicast routing in the VPNs is subject to an
490	   intense activity.  Scalability of each approach under such a load is
491	   detailed in Appendix A.2, and the fourth approach (BGP-based) used in
492	   conjunction with the RT Constraint mechanisms [RFC4684], is the only
493	   one having a cost for join/leave operations independent of the number
494	   of PEs in the VPN (with one exception detailed in Appendix A.2) and
495	   state maintenance not concentrated on the upstream PE.

497	   On the other hand, while the BGP-based approach is likely to suffer a
498	   slowdown under a load that is greater than the available processing
499	   resources (because of possibly congested TCP sockets), the PIM-based
500	   approaches would react to such a load by dropping messages, with
501	   failure-recovery obtained through message refreshes.  Thus, the BGP-
502	   based approach could result in a degradation of join/leave latency
503	   performance typically spread evenly across all multicast streams
504	   being joined in that period, while the PIM-based approach could
505	   result in increased join/leave latency, for some random streams, by a
506	   multiple of the time between refreshes (e.g. tens of seconds), and
507	   possibly in some states the adjacency may time-out resulting in
508	   disruption of multicast streams.

510	   The behavior of the PIM-based approach under such a load is also
511	   harder to predict, given that the performance of the "Join
512	   suppression" mechanism (an important mechanism for this approach to
513	   scale) will itself be impeded by delays in Join processing.  For
514	   these reasons, the BGP-based approach would be able to provide a
515	   smoother degradation and more predictable behavior under a highly
516	   dynamic load.

518	   In fact, both an "evenly spread degradation" and an "unevenly spread
519	   larger degradation" can be problematic, and what seems important is
520	   the ability for the VPN backbone operator to (a) limit the amount of
521	   multicast routing activity that can be triggered by a multicast VPN
522	   customer, and to (b) provide the best possible independence between
523	   distinct VPNs.  It seems that both of these can be addressed through
524	   local implementation improvements, and that both the BGP-based and
525	   PIM-based approaches could be engineered to provide (a) and (b).  It
526	   can be noted, though, that the BGP-based approach proposes ways to
527	   dampen C-multicast route withdrawals and/or advertisements, and thus
528	   already describes a way to provide (a), while nothing comparable has
529	   yet been described for the PIM-based approaches (even though doing so
530	   does not appear difficult).  The PIM-based approaches rely on a
531	   per-VPN data plane to carry the MVPN control plane, and thus may
532	   benefit from this first level of separation to address (b).
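As an illustration of how an operator might provide (a) with a local implementation improvement, the following Python sketch rate-limits C-multicast routing updates per VPN with a token bucket. This is a hypothetical local policy, not the dampening procedure specified for BGP-based C-multicast routing. The class name and the rate and burst values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PerVpnLimiter:
    """Hypothetical per-VPN token bucket for C-multicast routing updates."""
    rate: float   # tokens (updates) refilled per second in each VPN's bucket
    burst: float  # maximum bucket depth, i.e. tolerated burst of updates
    _tokens: dict = field(default_factory=dict)  # VPN -> current tokens
    _last: dict = field(default_factory=dict)    # VPN -> last refill time

    def allow(self, vpn: str, now: float) -> bool:
        # Refill this VPN's bucket according to the time elapsed since its
        # last update; a VPN seen for the first time starts with a full bucket.
        tokens = self._tokens.get(vpn, self.burst)
        last = self._last.get(vpn, now)
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        self._last[vpn] = now
        if tokens >= 1.0:
            self._tokens[vpn] = tokens - 1.0
            return True   # process the update now
        self._tokens[vpn] = tokens
        return False      # defer (queue or dampen) the update


limiter = PerVpnLimiter(rate=2.0, burst=3.0)
# A burst of 5 updates from VPN "red" at t=0: only the first 3 pass.
red = [limiter.allow("red", 0.0) for _ in range(5)]
# VPN "blue" is not affected by the burst from "red".
blue = limiter.allow("blue", 0.0)
print(red, blue)
```

Keeping one bucket per VPN also gives some isolation toward (b), because one customer exhausting its budget does not consume the budget of another VPN.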

534	3.3.6.  C-multicast VPN join latency

536	   Section 5.1.3 of [RFC4834] states that "the group join delay [...] is
537	   also considered one important QoS parameter.  It is thus RECOMMENDED
538	   that a multicast VPN solution be designed appropriately in this
539	   regard".  In a multicast VPN context, the "group join delay" of
540	   interest is the time between a CE sending a PIM Join to its PE and
541	   the first packet of the corresponding multicast stream being received
542	   by the CE.

544	   It is to be noted that the C-multicast routing procedures will only
545	   impact the group join latency of a given multicast stream for the
546	   first receiver that is located across the provider backbone from the
547	   multicast source-connected PE (or the first <n> receivers in the
548	   specific case where a UMH selection algorithm is used that allows
549	   <n> distinct UMHs to be selected by distinct downstream PEs).

551	   The different approaches proposed seem to have different
552	   characteristics in how they are expected to impact join latency:

554	   o  the PIM-based approaches minimize the number of control plane
555	      processing hops between a new receiver-connected PE and the
556	      source-connected PE and, being datagram-based, introduce minimal
557	      delay, thereby possibly achieving the best possible join latency,
558	      depending on implementation efficiency

560	   o  under degraded conditions (packet loss, congestion, high control
561	      plane load) the PIM-based approach may impact the latency for a
562	      given multicast stream in an all or nothing manner: if a
563	      C-multicast routing PIM Join packet is lost, latency can become
564	      large (a multiple of the periodicity of PIM Join refreshes)

566	   o  the BGP-based approach uses TCP exchanges, which may introduce an
567	      additional delay depending on BGP and TCP implementations, but
568	      which would typically result, under degraded conditions (such as
569	      packet loss, congestion, high control plane load), in a smaller
570	      increase in latency, spread more evenly across the streams

572	   o  as shown in Appendix A, the BGP-based approach is particular in
573	      that it removes load from all the PEs (without putting this load
574	      on the upstream PE for a stream); this reduction in background
575	      load can improve performance when a PE acts as the upstream PE
576	      for a stream, and thus benefit join latency
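The "all or nothing" behaviour of the PIM-based approaches under loss can be quantified with a simple model. The model assumes the following, none of which comes from the specifications:

- Each PIM Join transmission is lost independently with probability p.
- A lost Join is recovered only by a refresh sent T seconds later. For PIM-SM (RFC 4601), the default Join/Prune period is 60 seconds.

Under these assumptions, the number K of lost attempts before the first success is geometrically distributed, and the extra join delay is:

```latex
\Pr[K = k] = p^{k}(1-p), \qquad k = 0, 1, 2, \ldots
\qquad \Delta = K\,T
\mathbb{E}[\Delta] = T \sum_{k \ge 0} k\,p^{k}(1-p) = T\,\frac{p}{1-p},
\qquad \Pr[\Delta \ge T] = p
```

For example, with p = 0.05 and T = 60 s, the mean extra delay is only about 3.2 s. However, 5% of such joins still wait at least a full minute. This skew is what distinguishes the PIM-based behaviour from the more evenly spread slowdown expected with TCP-based BGP.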

578	   This qualitative comparison of approaches shows that the BGP-based
579	   approach is designed for a smoother degradation of latency under
580	   degraded conditions such as packet loss, congestion, or high control
581	   plane load.  On the other hand, the PIM-based approaches seem to
582	   structurally be able to reach the shorter "best-case" group join
583	   latency (especially compared to deployment of the BGP-based approach
584	   where route-reflectors are used).

586	   Doing a quantitative comparison of latencies is not possible without
587	   referring to specific implementations and benchmarking procedures,
588	   and could lead to different conclusions, especially for best-case
589	   group join latency, for which performance is expected to vary with
590	   PIM and BGP implementations.  We can also note that improving a BGP
591	   implementation for reduced latency of route processing would not only
592	   benefit multicast VPN group join latency, but the whole BGP-based
593	   routing, which means that the need for good BGP/RR performance is not
594	   specific to multicast VPN routing.

596	   Last, C-multicast join latency will be impacted by the overall load
597	   put on the control plane, and the scalability of the C-multicast
598	   routing approach must thus be taken into account.  As explained in
599	   Section 3.3.1 and Appendix A, the BGP-based approach will
600	   provide the best scalability with an increased number of PEs per VPN,
601	   thereby benefiting group join latency in such higher scale scenarios.

603	3.3.7.  Conclusion on C-multicast routing

605	   The first and fourth approaches are relevant contenders for
606	   C-multicast routing.  Theoretical comparisons identify some
607	   advantages, as well as possible drawbacks, of the fourth approach.
608	   Comparisons from a practical standpoint are harder to make: since
609	   only limited deployment and implementation information is
610	   available for the fourth approach, advantages would be seen in the
611	   first approach, which has been applied through multiple deployments
612	   and shown to be operationally viable.

614	   Moreover, the first mechanism (full per-MVPN PIM peering across an
615	   MI-PMSI) is the mechanism used by [I-D.rosen-vpn-mcast] and therefore
616	   it is deployed and operating in MVPNs today.  The fourth approach may
617	   or may not end up being preferred for a given deployment, but because
618	   the first approach has been in deployment for some time, support
619	   for this mechanism will in any case help facilitate an eventual
620	   migration from deployments using mechanisms close to the first
621	   approach.

623	   Consequently, at the present time, implementations are recommended to
624	   support both the fourth (BGP-based) and first (Full per-MVPN PIM
625	   peering) mechanisms.  Further experience on deployments of the fourth
626	   approach is needed before some best practice can be defined.  In the
627	   meantime, this recommendation would enable a service provider to
628	   choose between the first and the fourth mechanism, without this
629	   choice being constrained by vendors' implementation choices, and
630	   taking into account the peculiarities of its own deployment context
631	   by weighing the different factors.

633	3.4.  Encapsulation techniques for P-multicast trees

635	   In this section, the authors will not make any restrictive
636	   recommendations, since the appropriateness of a specific provider
637	   core data plane technology will depend on a large number of factors,
638	   many of which are service provider specific (for example, the service
639	   provider's currently deployed unicast data plane).

641	   However, implementations should not unreasonably restrict the data
642	   plane technology that can be used, and should not force the use of
643	   the same technology for different VPNs attached to a single PE.
644	   Initial implementations may only support a reduced set of
645	   encapsulation techniques and data plane technologies, but this should
646	   not be a limiting factor that hinders future support for other
647	   encapsulation techniques, data plane technologies or
648	   interoperability.

650	   Section 5.2.4.1 of [RFC4834] states "In a multicast VPN solution
651	   extending a unicast L3 PPVPN solution, consistency in the tunneling
652	   technology has to be favored: such a solution SHOULD allow the use of
653	   the same tunneling technology for multicast as for unicast.
654	   Deployment consistency, ease of operation and potential migrations
655	   are the main motivations behind this requirement."

657	   Current unicast VPN deployments use a variety of LDP, RSVP-TE and
658	   GRE/IP-Multicast for encapsulating customer packets for transport
659	   across the provider core of VPN services.  In order to allow the same
660	   encapsulations to be used for unicast and multicast VPN traffic, it
661	   is recommended that multicast VPN standards recommend that
662	   implementations support, for multicast VPNs, the P2MP variants of all
663	   the encapsulations and signaling protocols that they support for
664	   unicast and for which some multipoint extension is defined, such as
665	   mLDP, P2MP RSVP-TE and GRE/IP-multicast.

667	   All three of the above encapsulation techniques support the building
668	   of P2MP multicast P-tunnels.  In addition, mLDP and GRE/
669	   IP-ASM-Multicast implementations may also support the building of
670	   MP2MP multicast P-tunnels.  The use of MP2MP P-tunnels may provide
671	   some scaling benefits to the service provider, as only a single MP2MP
672	   P-tunnel needs to be deployed per VPN, thus potentially reducing by
673	   an order of magnitude the amount of multicast state that needs to be
674	   maintained by P routers.  This gain in state comes at the expense of
675	   bandwidth optimization, since sites that do not have receivers for
676	   multicast streams sourced behind a given PE will still receive
677	   packets of such streams, leading to non-optimal bandwidth utilization
678	   across the VPN core.  One thing to consider is that the use of MP2MP
679	   multicast P-tunnels will require additional configuration to define
680	   the same P-tunnel identifier or multicast ASM group address in all
681	   PEs (it has been noted that some auto-configuration could be possible
682	   for MP2MP P-tunnels, but this is not currently supported by the
683	   auto-discovery procedures). [ It has been noted that C-multicast
684	   routing schemes not covered in [I-D.ietf-l3vpn-2547bis-mcast] could
685	   expose different advantages of MP2MP multicast P-tunnels - this is
686	   out of scope of this document ]
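The following back-of-the-envelope Python sketch makes the state trade-off concrete. It relies on three simplifying assumptions, none of which comes from the text:

- Every PE of a VPN is a multicast sender.
- Each sender PE roots one P2MP P-tunnel.
- The P router considered lies on every P-tunnel of every VPN.

```python
def p_router_tunnel_state(num_vpns: int, pes_per_vpn: int) -> dict:
    """Count P-tunnels a P router holds state for (simplified model).

    Assumes every PE in each VPN sources multicast traffic and that the
    P router lies on the path of every tunnel of every VPN.
    """
    p2mp = num_vpns * pes_per_vpn  # one P2MP tree rooted at each sender PE
    mp2mp = num_vpns               # one shared MP2MP tree per VPN
    return {"p2mp": p2mp, "mp2mp": mp2mp, "reduction": p2mp // mp2mp}


print(p_router_tunnel_state(num_vpns=100, pes_per_vpn=10))
# {'p2mp': 1000, 'mp2mp': 100, 'reduction': 10}
```

In this model, the reduction factor is simply the number of sender PEs per VPN. With around ten sender PEs, this is where an order-of-magnitude gain comes from. The bandwidth cost of the shared tree is not captured here.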

688	   MVPN services can also be supported over a unicast VPN core through
689	   the use of ingress PE replication whereby the ingress PE replicates
690	   any multicast traffic over the P2P tunnels used to support unicast
691	   traffic.  While this option does not require the service provider to
692	   modify their existing P routers (in terms of protocol support) and
693	   does not require maintaining multicast-specific state on the P
694	   routers in order for the service provider to be able to deploy a
695	   multicast VPN service, the use of ingress PE replication obviously
696	   leads to non-optimal bandwidth utilization and it is therefore
697	   unlikely to be the long-term solution chosen by service providers.
698	   However, ingress PE replication may be useful during some migration
699	   scenarios or where a service provider considers the level of
700	   multicast traffic on their network to be too low to justify deploying
701	   multicast specific support within their VPN core.
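The bandwidth cost of ingress PE replication can be illustrated by counting packet copies per link. The Python sketch below compares ingress replication with a P2MP P-tunnel on a small tree. It assumes, hypothetically, that the P2P tunnels from the ingress PE follow the same paths as the P2MP tree. The topology and PE names are invented for the example.

```python
from collections import Counter


def links_to_root(node: str, parent: dict) -> list:
    """Return the links (parent, child) from `node` up to the tree root."""
    links = []
    while node in parent:
        links.append((parent[node], node))
        node = parent[node]
    return links


def copies_per_link(parent: dict, egress_pes: list):
    """Packet copies per link: ingress replication vs. a P2MP P-tunnel."""
    ingress_replication = Counter()
    for pe in egress_pes:
        # One copy per egress PE travels the whole path to that PE.
        for link in links_to_root(pe, parent):
            ingress_replication[link] += 1
    # A P2MP P-tunnel carries exactly one copy on every link it uses.
    p2mp = {link: 1 for link in ingress_replication}
    return ingress_replication, p2mp


# Hypothetical topology: ingress PE1 -> P1 -> {PE2, PE3, PE4}
parent = {"P1": "PE1", "PE2": "P1", "PE3": "P1", "PE4": "P1"}
ir, p2mp = copies_per_link(parent, ["PE2", "PE3", "PE4"])
print(ir[("PE1", "P1")], sum(ir.values()), sum(p2mp.values()))  # 3 6 4
```

In this example, the link leaving the ingress PE carries three copies instead of one. More generally, that link carries one copy per egress PE.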

703	   All proposed approaches for control plane and dataplane can be used
704	   to provide aggregation amongst multicast groups within a VPN and
705	   amongst different multicast VPNs, and potentially reduce the amount
706	   of state to be maintained by P routers.  However, the latter (the
707	   aggregation amongst different multicast VPNs) will require support for
708	   upstream-assigned labels on the PEs.  Support for upstream-assigned
709	   labels may require changes to the data plane processing of the PEs
710	   and this should be taken into consideration by service providers
711	   considering the use of aggregate PMSI tunnels for the specific
712	   platforms that the service provider has deployed.
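The role of upstream-assigned labels in aggregation can be sketched as a two-level lookup, in the spirit of the context-specific label spaces of [RFC5331]:

- The aggregate P-tunnel on which a packet arrives identifies the upstream PE's label space.
- The inner label, allocated by that upstream PE, identifies the VPN.

The tunnel names, label values and VRF names below are hypothetical.

```python
from typing import Optional

# One context label table per aggregate P-tunnel (i.e. per upstream PE),
# mapping upstream-assigned labels to the local VRF.
CONTEXT_TABLES = {
    "ptunnel-from-PE1": {100: "vrf-red", 101: "vrf-blue"},
    "ptunnel-from-PE2": {100: "vrf-blue"},  # PE2 chose 100 for another VPN
}


def demux(ptunnel: str, upstream_label: int) -> Optional[str]:
    """Return the VRF for a packet received on an aggregate P-tunnel."""
    table = CONTEXT_TABLES.get(ptunnel)
    if table is None:
        return None  # unknown tunnel: no context label space
    return table.get(upstream_label)  # None means unknown label: drop


print(demux("ptunnel-from-PE1", 100))  # vrf-red
print(demux("ptunnel-from-PE2", 100))  # vrf-blue
```

The same label value maps to different VRFs because it is interpreted in the context of the P-tunnel. A PE whose forwarding plane only supports a single, platform-wide label lookup would need the data plane changes mentioned above.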

714	3.5.  Inter-AS deployment options

716	   There are a number of scenarios that lead to the requirement for
717	   inter-AS multicast VPNs, including:

719	   1.  a service provider may have a large network that they have
720	       segmented into a number of ASs.

722	   2.  a service provider's multicast VPN may consist of a number of ASs
723	       due to acquisitions and mergers with other service providers.

725	   3.  a service provider may wish to interconnect their multicast VPN
726	       platform with that of another service provider.

728	   The first scenario can be considered the "simplest" because the
729	   network is wholly managed by a single service provider under a single
730	   strategy and is therefore likely to use a consistent set of
731	   technologies across each AS.

733	   The second scenario may be more complex than the first because the
734	   strategy and technology choices made for each AS may have been
735	   different due to their differing history and the service provider may
736	   not have unified (or may be unwilling to unify) the strategy and
737	   technology choices for each AS.

739	   The third scenario is the most complex because in addition to the
740	   complexity of the second scenario, the ASs are managed by different
741	   service providers and therefore may be subject to a different trust
742	   model than the other scenarios.

744	   Section 5.2.6 of [RFC4834] states that "a solution MUST support
745	   inter-AS multicast VPNs, and SHOULD support inter-provider multicast
746	   VPNs", "considerations about coexistence with unicast inter-AS VPN
747	   Options A, B and C (as described in section 10 of [RFC4364]) are
748	   strongly encouraged" and "a multicast VPN solution SHOULD provide
749	   inter-AS mechanisms requiring the least possible coordination between
750	   providers, and keep the need for detailed knowledge of providers'
751	   networks to a minimum - all this being in comparison with
752	   corresponding unicast VPN options".

754	   Section 8 of [I-D.ietf-l3vpn-2547bis-mcast] addresses these
755	   requirements by proposing two approaches for MVPN inter-AS
756	   deployments:

758	   1.  Non-segmented inter-AS tunnels where the multicast tunnels are
759	       end-to-end across ASes, so even though the PEs belonging to a
760	       given MVPN may be in different ASs the ASBRs play no special role
761	       and function merely as P routers (described in section 8.1).

763	   2.  Segmented inter-AS tunnels where each AS constructs its own
764	       separate multicast tunnels which are then 'stitched' together by
765	       the ASBRs (described in section 8.2).

767	   (Note that an inter-AS deployment can alternatively rely on Option A
768	   (so-called "back-to-back" VRFs); that option is not considered in
769	   this section, given that it can be used without any inter-AS-specific
770	   mechanism.)

772	   Section 5.2.6 of [RFC4834] also states "Within each service provider
773	   the service provider SHOULD be able on its own to pick the most
774	   appropriate tunneling mechanism to carry (multicast) traffic among
775	   PEs (just like what is done today for unicast)".  The segmented
776	   approach is the only one capable of meeting this requirement.

778	   The segmented inter-AS solution would appear to offer the largest
779	   degree of deployment flexibility to operators.  However, the
780	   non-segmented inter-AS solution can simplify deployment in a
781	   restricted number of scenarios; moreover, [I-D.rosen-vpn-mcast] only
782	   supports the non-segmented inter-AS solution, and therefore the
783	   latter is likely to be useful to some operators for backward
784	   compatibility and during migration from [I-D.rosen-vpn-mcast] to
785	   [I-D.ietf-l3vpn-2547bis-mcast].

787	   The following is a comparison matrix between the "segmented inter-AS
788	   P-tunnels" and "non-segmented inter-AS P-tunnels" approaches:

790	   o  Scalability for I-PMSIs: the "segmented inter-AS P-tunnels"
791	      approach is more scalable, because an ASBR can aggregate
792	      multiple intra-AS P-tunnels used for I-PMSI within its own AS into
793	      one inter-AS P-tunnel to be used by other ASes.  Note that the
794	      I-PMSI scalability improvement brought by the "segmented inter-AS
795	      P-tunnels" approach is higher when segmented P-tunnels have a
796	      granularity of source AS (see item below).

798	   o  Scalability for S-PMSIs: the "segmented inter-AS P-tunnels", when
799	      used with the BGP-based C-multicast routing approach, provides
800	      flexibility in how the bandwidth/state trade-off is handled, to
801	      help with scalability.  Indeed, in that case, the trade-off made
802	      for a given (C-S,C-G) in a downstream AS can be made more in favor
803	      of scalability than the trade-off made by the neighbor upstream
804	      AS, thanks to the ability to aggregate one or more S-PMSIs of the
805	      upstream AS in one I-PMSI tunnel in a downstream AS.

807	   o  Configuration at ASBRs: depending on whether segmented P-tunnels
808	      have a granularity of source ASBR or source AS, the "segmented
809	      inter-AS P-tunnels" approach would require respectively the same
810	      or additional configuration on ASBRs as the "non-segmented
811	      inter-AS P-tunnels" approach.

813	   o  Independence of tunneling technology from one AS to another: the
814	      "segmented inter-AS P-tunnels" approach provides this, the "non-
815	      segmented inter-AS P-tunnels" approach does not.

817	   o  Facilitated co-existence with, and migration from, existing
818	      deployments, and lighter engineering in some scenarios: the
819	      "non-segmented inter-AS P-tunnels" approach provides this, the
820	      "segmented inter-AS P-tunnels" approach does not.
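The I-PMSI scalability item can be illustrated with a simple count. The sketch rests on two hypothetical assumptions:

- Each PE roots one I-PMSI P-tunnel.
- With segmented P-tunnels at source-AS granularity, each remote AS is represented by a single inter-AS P-tunnel in the downstream AS.

The AS names and PE counts are invented.

```python
def ipmsi_tunnels_seen(pes_per_as: dict, local_as: str) -> dict:
    """Count I-PMSI P-tunnels from remote ASes instantiated in `local_as`
    (simplified model)."""
    remote = {asn: n for asn, n in pes_per_as.items() if asn != local_as}
    return {
        # Non-segmented: every remote PE's tunnel extends end-to-end.
        "non_segmented": sum(remote.values()),
        # Segmented, source-AS granularity: one stitched tunnel per remote AS.
        "segmented_per_as": len(remote),
    }


print(ipmsi_tunnels_seen({"AS1": 10, "AS2": 20, "AS3": 5}, "AS3"))
# {'non_segmented': 30, 'segmented_per_as': 2}
```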

822	   The applicability of segmented or non-segmented inter-AS tunnels to a
823	   given deployment or inter-provider interconnect will depend on a
824	   number of factors specific to each service provider.  However, given
825	   the elements discussed above, it is the recommendation of
826	   the authors that all implementations should support the segmented
827	   inter-AS model.  Additionally, the authors recommend that
828	   implementations should consider supporting the non-segmented inter-AS
829	   model in order to facilitate co-existence with, and migration from,
830	   existing deployments, and as a feature to provide a lighter
831	   engineering in a restricted set of scenarios, although it is
832	   recognized that initial implementations may only support one or the
833	   other.

835	3.6.  Bidir-PIM support

837	   In Bidir-PIM, the packet forwarding rules have been improved over
838	   PIM-SM, allowing traffic to be passed up the shared tree toward the
839	   RP Address (RPA).  To avoid multicast packet looping, Bidir-PIM uses
840	   a mechanism called the designated forwarder (DF) election, which
841	   establishes a loop-free tree rooted at the RPA.  Use of this method
842	   ensures that only one copy of every packet will be sent to an RPA,
843	   even if there are parallel equal cost paths to the RPA.  To avoid
844	   loops, the DF election process enforces a consistent view of the DF
845	   on all routers on a network segment, and during periods of ambiguity
846	   or routing convergence, traffic forwarding is suspended.
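The outcome of a DF election can be sketched using the comparison rules of the Bidir-PIM specification (RFC 5015):

- The router with the best route to the RPA wins.
- Routes are compared first by metric preference and then by metric, and lower is better in both cases.
- Remaining ties are broken in favour of the highest IP address.

The sketch computes only the winner. It omits the Offer/Winner/Backoff/Pass message exchange that makes all routers on the segment converge on this result. Addresses and metrics are hypothetical.

```python
import ipaddress
from typing import NamedTuple


class Candidate(NamedTuple):
    address: str            # router's address on the segment
    metric_preference: int  # preference of its unicast route to the RPA
    metric: int             # metric of its unicast route to the RPA


def elect_df(candidates: list) -> Candidate:
    """Return the DF: best (lowest) metric to the RPA, highest IP on ties."""
    return min(
        candidates,
        key=lambda c: (
            c.metric_preference,
            c.metric,
            -int(ipaddress.ip_address(c.address)),  # higher IP sorts first
        ),
    )


segment = [
    Candidate("192.0.2.1", 110, 20),
    Candidate("192.0.2.2", 110, 10),
    Candidate("192.0.2.3", 110, 10),
]
print(elect_df(segment).address)  # 192.0.2.3
```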

848	   In the context of a multicast VPN solution, a solution for Bidir-PIM
849	   support must preserve this property of similarly avoiding packet
850	   loops, including in the case where mVRFs in a given MVPN don't have
851	   a consistent view of the routing to C-RPL/C-RPA.

853	   Section 11 of the current MVPN specification
854	   [I-D.ietf-l3vpn-2547bis-mcast] defines three methods to support
855	   Bidir-PIM, as RECOMMENDED in [RFC4834]:

857	   1.  Standard DF election procedure over an MI-PMSI

859	   2.  VPN Backbone as the RPL (section 11.1)

861	   3.  Partitioned Sets of PEs (section 11.2)

863	   Method (1) is naturally applied to deployments using "Full per-MVPN
864	   PIM peering across an MI-PMSI" for C-multicast routing, but as
865	   indicated in [I-D.ietf-l3vpn-2547bis-mcast] in section 11, the DF
866	   Election may not work well in an MVPN environment and an alternative
867	   to DF election would be desirable.

869	   The advantage of methods (2) and (3) is that they do not require
870	   running the DF election procedure among PEs.

872	   Method (2) leverages the fact that in Bidir-PIM, running the DF
873	   election procedure is not needed on the RPL.  This approach thus has
874	   the benefit of simplicity of implementation, especially in a context
875	   where BGP-based C-multicast routing is used.  However, it has the
876	   drawback of putting constraints on how Bidir-PIM is deployed, which
877	   may not always match MVPN customers' requirements.

879	   Method (3) treats an MVPN as a collection of sets of multicast VRFs,
880	   all PEs in a set having the same reachability information towards
881	   C-RPA, but distinct from PEs in other sets.  Hence, with this method,
882	   C-Bidir packet loops in MVPN are resolved by the ability to partition
883	   a VPN into disjoint sets of VRFs, each having a distinct view of the
884	   converged network.  The partitioning approach to Bidir-PIM requires
885	   either upstream-assigned MPLS labels (to denote the partition) or a
886	   unique MP2MP LSP per partition.  The former is based on PE
887	   Distinguisher Labels that have to be distributed using auto-discovery
888	   BGP routes, and their handling requires support for upstream-assigned
889	   labels and context label lookups [RFC5331].  The latter,
890	   using MP2MP LSP per partition, does not have these constraints but is
891	   restricted to P-tunnel types supporting MP2MP connectivity (such as
892	   mLDP [I-D.ietf-mpls-ldp-p2mp]).
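The partitioning idea can be sketched in Python under simplifying assumptions. Each PE is assumed to select a single upstream PE toward the C-RPA. The PEs that select the same upstream PE form one partition, rooted at that upstream PE. A packet tagged with a partition's identifier is accepted only by members of that partition; in the actual procedures, this identifier is a PE Distinguisher Label or a per-partition MP2MP LSP. The keying and the PE names are assumptions of this sketch, not the encoding defined in [I-D.ietf-l3vpn-2547bis-mcast].

```python
from collections import defaultdict


def partition_pes(upstream_toward_rpa: dict) -> dict:
    """Group PEs by the upstream PE they select toward the C-RPA."""
    partitions = defaultdict(set)
    for pe, upstream_pe in upstream_toward_rpa.items():
        partitions[upstream_pe].add(upstream_pe)  # root of the partition
        partitions[upstream_pe].add(pe)
    return dict(partitions)


def accept(receiving_pe: str, packet_partition: str, partitions: dict) -> bool:
    """Accept a C-Bidir packet only if it belongs to the receiver's partition."""
    return receiving_pe in partitions.get(packet_partition, set())


# C-RPA site dual-homed to PE1 and PE2; PEs disagree on which one to use.
view = {"PE3": "PE1", "PE4": "PE1", "PE5": "PE2"}
parts = partition_pes(view)
print(sorted(parts["PE1"]), sorted(parts["PE2"]))
# ['PE1', 'PE3', 'PE4'] ['PE2', 'PE5']
print(accept("PE5", "PE1", parts))  # False: PE5 ignores PE1's partition
```

Because PE5 ignores traffic tagged for the partition rooted at PE1, the inconsistent views of the two sets cannot combine into a loop or into duplicate delivery.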

894	   This approach to C-Bidir can work with PIM-based or BGP-based
895	   C-multicast routing procedures, and is also generic in the sense that
896	   it does not impose any requirements on the Bidir-PIM service
897	   offering.

899	   Given the above considerations, method (3) "Partitioned Sets of PEs"
900	   is the RECOMMENDED approach.

902	   In the event that method (3) is not applicable (lack of support for
903	   upstream-assigned labels or for a P-tunnel type providing MP2MP
904	   connectivity), then method (1) "Standard DF election procedure over
905	   an MI-PMSI" and (2) "VPN Backbone as the RPL" are RECOMMENDED as
906	   interim solutions, (1) having the advantage over (2) of not putting
907	   constraints on how Bidir-PIM is deployed and the drawbacks of only
908	   being applicable when PIM-based C-multicast routing is used and of
909	   possibly not working well in an MVPN environment.

911	4.  Co-located RPs

913	   Section 5.1.10.1 of [RFC4834] states "In the case of PIM-SM in ASM
914	   mode, engineering of the RP function requires the deployment of
915	   specific protocols and associated configurations.  A service provider
916	   may offer to manage customers' multicast protocol operation on their
917	   behalf.  This implies that it is necessary to consider cases where a
918	   customer's RPs are out-sourced (e.g. on PEs).  Consequently, a VPN
919	   solution MAY support the hosting of the RP function in a VR or VRF."

921	   However, customers who have already deployed multicast within their
922	   networks, and have therefore already deployed their own internal RPs,
923	   are often reluctant to hand over control of their RPs to their
924	   service provider and use a co-located RP model.  Moreover, providing
925	   RP co-location on a PE requires activating MSDP or processing PIM
926	   Registers on the PE.  Securing the PE routers for such activity
927	   requires special care and additional work, and will likely rely on
928	   specific features to be provided by the routers themselves.

930	   The applicability of the co-located RP model to a given MVPN will
931	   thus depend on a number of factors specific to each customer and
932	   service provider.

934	   It is therefore recommended that implementations support a
935	   co-located RP model, but support for a co-located RP model within
936	   an implementation should not restrict deployments to using a
937	   co-located RP model: implementations MUST support deployments
938	   where activation of a PIM RP function (PIM Register processing and
939	   RP-specific PIM procedures) or of a VRF MSDP instance is not required
940	   on any PE router and where all the RPs are deployed within the
941	   customers' networks or CEs.

943	5.  Avoiding duplicates

945	   It is recommended that implementations support the procedures
946	   described in section 9.1.1 of [I-D.ietf-l3vpn-2547bis-mcast]
947	   "Discarding Packets from Wrong PE", which make it possible to fully
948	   avoid duplicates.
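The idea behind these procedures can be sketched as a check on the receiving PE. A packet for a given (C-S, C-G) is accepted only if it arrived from the PE that this PE selected as upstream (UMH) for that stream. How the sending PE is identified, for example from the P-tunnel or an upstream-assigned label, is abstracted away here. The function and table names are hypothetical.

```python
from typing import Dict, Tuple

CFlow = Tuple[str, str]  # (C-S, C-G)


def accept_packet(selected_umh: Dict[CFlow, str], flow: CFlow,
                  from_pe: str) -> bool:
    """Accept a packet only if it came from the PE selected as UMH for flow."""
    return selected_umh.get(flow) == from_pe


umh = {("198.51.100.10", "232.1.1.1"): "PE1"}
flow = ("198.51.100.10", "232.1.1.1")
print(accept_packet(umh, flow, "PE1"))  # True: forward to local receivers
print(accept_packet(umh, flow, "PE2"))  # False: from the wrong PE, discard
```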

950	6.  Existing deployments

952	   Some suggestions provided in this document can be used to
953	   incrementally modify currently deployed implementations without
954	   hindering these deployments or the consistency of the standardized
955	   solution, by providing optional per-VRF configuration knobs that
956	   support modes of operation compatible with currently deployed
957	   implementations, while using the recommended approach on
958	   implementations supporting the standard.

960	   In cases where this may not be easily achieved, a recommended
961	   approach would be to provide a per-VRF configuration knob that allows
962	   incremental per-VPN migration of the mechanisms used by a PE device,
963	   which would allow migration with some per-VPN interruption of service
964	   (e.g. during a maintenance window).

966	   Mechanisms allowing "live" migration by providing concurrent use of
967	   multiple alternatives for a given PE and a given VPN are not seen as
968	   a priority considering the expected implementation complexity
969	   associated with such mechanisms.  However, if there happen to be
970	   cases where they could be viably implemented relatively simply, such
971	   mechanisms may help improve migration management.

973	7.  Summary of recommendations

975	   The following list summarizes the conclusions on the set of
976	   mandatory-to-implement mechanisms in the context of
977	   [I-D.ietf-l3vpn-2547bis-mcast].

979	   Note well that the implementation of the non-mandatory alternative
980	   mechanisms is not precluded.

982	   Recommendations are:

984	   o  that BGP-based auto-discovery be the mandated solution for
985	      auto-discovery;

987	   o  that BGP be the mandated solution for S-PMSI switching signaling;

989	   o  that implementations support both the BGP-based and the full
990	      per-MVPN PIM peering solutions for PE-PE exchange of customer
991	      multicast routing until further operational experience is gained
992	      with both solutions;

994	   o  that implementations use the "Partitioned Sets of PEs" approach
995	      for Bidir-PIM support;

997	   o  that implementations implement the P2MP variants of the P2P
998	      protocols that they already implement, such as mLDP, P2MP RSVP-TE
999	      and GRE/IP-Multicast;

1001	   o  that implementations support segmented inter-AS tunnels and
1002	      consider supporting non-segmented inter-AS tunnels (in order to
1003	      maintain backwards compatibility and for migration);

1005	   o  that implementations MUST support deployments where activation
1006	      of a PIM RP function (PIM Register processing and RP-specific PIM
1007	      procedures) or VRF MSDP instance is not required on any PE router;

1009	   o  that implementations support the procedures described in section
1010	      9.1.1 of [I-D.ietf-l3vpn-2547bis-mcast].

1012	8.  IANA Considerations

1014	   This document makes no request to IANA.

1016	   [ Note to RFC Editor: this section may be removed on publication as
1017	   an RFC. ]

1019	9.  Security Considerations

1021	   This document does not by itself raise any particular security
1022	   considerations.

1024	10.  Acknowledgements

1026	   We would like to thank Adrian Farrel, Eric Rosen, Yakov Rekhter, and
1027	   Maria Napierala for their feedback that helped shape this document.

1029	   Additional credit is due to Maria Napierala for co-authoring
1030	   Section 3.6 on Bidir-PIM support.

1032	11.  References

1034	11.1.  Normative References

1036	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1037	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1039	   [I-D.ietf-l3vpn-2547bis-mcast]
1040	              Aggarwal, R., Bandi, S., Cai, Y., Morin, T., Rekhter, Y.,
1041	              Rosen, E., Wijnands, I., and S. Yasukawa, "Multicast in
1042	              MPLS/BGP IP VPNs", draft-ietf-l3vpn-2547bis-mcast-09 (work
1043	              in progress), November 2009.

1045	   [I-D.ietf-l3vpn-2547bis-mcast-bgp]
1046	              Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
1047	              Encodings and Procedures for Multicast in MPLS/BGP IP
1048	              VPNs", draft-ietf-l3vpn-2547bis-mcast-bgp-08 (work in
1049	              progress), September 2009.

1051	11.2.  Informative References

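	   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
	              Networks (VPNs)", RFC 4364, February 2006.

1053	   [RFC4834]  Morin, T., "Requirements for Multicast in L3 Provider-
1054	              Provisioned Virtual Private Networks (PPVPNs)", RFC 4834,
1055	              April 2007.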

1057	   [I-D.rosen-vpn-mcast]
1058	              Cai, Y., Rosen, E., and I. Wijnands, "Multicast in MPLS/
1059	              BGP IP VPNs", draft-rosen-vpn-mcast-12 (work in progress),
1060	              August 2009.

1062	   [I-D.raggarwa-l3vpn-2547-mvpn]
1063	              Aggarwal, R., "Base Specification for Multicast in BGP/
1064	              MPLS VPNs", draft-raggarwa-l3vpn-2547-mvpn-00 (work in
1065	              progress), June 2004.

1067	   [I-D.ietf-pim-sm-linklocal]
1068	              Atwood, J., "Authentication and Confidentiality in PIM-SM
1069	              Link-local Messages", draft-ietf-pim-sm-linklocal-08 (work
1070	              in progress), November 2007.

1072	   [I-D.ietf-pim-port]
1073	              Farinacci, D., Wijnands, I., Venaas, S., and M. Napierala,
1074	              "A Reliable Transport Mechanism for PIM",
1075	              draft-ietf-pim-port-02 (work in progress), October 2009.

1077	   [RFC4684]  Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,
1078	              R., Patel, K., and J. Guichard, "Constrained Route
1079	              Distribution for Border Gateway Protocol/MultiProtocol
1080	              Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual
1081	              Private Networks (VPNs)", RFC 4684, November 2006.

1083	   [I-D.ietf-mpls-ldp-p2mp]
1084	              Minei, I., Kompella, K., Wijnands, I., and B. Thomas,
1085	              "Label Distribution Protocol Extensions for Point-to-
1086	              Multipoint and Multipoint-to-Multipoint Label Switched
1087	              Paths", draft-ietf-mpls-ldp-p2mp-08 (work in progress),
1088	              October 2009.

1090	   [RFC5331]  Aggarwal, R., Rekhter, Y., and E. Rosen, "MPLS Upstream
1091	              Label Assignment and Context-Specific Label Space",
1092	              RFC 5331, August 2008.

1094	Appendix A.  Scalability of C-multicast routing processing load

1096	   The main role of multicast routing is to let routers determine that
1097	   they should start or stop forwarding a given multicast stream on a
1098	   given link.  In an MVPN context, this has to be done for each MVPN,
1099	   and the associated function is thus named "customer-multicast
1100	   routing" or "C-multicast routing" and its role is to let PE routers
1101	   determine that they should start or stop forwarding the traffic of a
1102	   given multicast stream toward the remote PEs, on some PMSI tunnel.

1104	   When some "join" message is received by a PE, this PE knows that it
1105	   should be sending traffic for the corresponding multicast group of
1106	   the corresponding MVPN.  But the reception of a "prune" message from
1107	   a remote PE is not enough by itself for a PE to know that it should
1108	   stop forwarding the corresponding multicast traffic: it has to make
1109	   sure that there aren't any other PEs that still have receivers for
1110	   this traffic.

1112	   There are many ways that the "C-multicast routing" building block can
1113	   be designed, and they differ, among other things, in how a PE
1114	   determines when it can stop forwarding a given multicast stream toward
1115	   other PEs:

1117	   PIM LAN Procedures, by default
1118	      By default, when PIM LAN procedures are used and a PE on a LAN
1119	      prunes itself from a multicast tree, all other PEs on that LAN
1120	      check their own state to know whether they are on the tree, in
1121	      which case they send a PIM Join message on that LAN to override the
1122	      Prune.  Thus, for each PIM Prune message, all PE routers on the
1123	      LAN work to let the upstream PE determine the answer to the "did
1124	      the last receiver leave?" question.

1126	   BGP-based C-multicast routing
1127	      When BGP-based procedures are used for C-multicast routing, if no
1128	      BGP Route Reflector is used, the "did the last receiver leave?"
1129	      question is answered by having the upstream PE maintain an up-to-
1130	      date list of the PEs which are joined to the tree, thus making it
1131	      possible to instantly know the answer to the "did the last
1132	      receiver leave?" question whenever a PE leaves the given tree.
1133	      But, when a BGP Route Reflector is used (which is expected to be
1134	      the recommended approach), the role of maintaining an updated list
1135	      of the PEs that are part of a given multicast tree is taken care of
1136	      by the Route Reflector(s).  Using BGP procedures, a route reflector
1137	      that had previously advertised a C-multicast Source Tree Join route
1138	      for a given (C-S, C-G) to other route reflectors will withdraw
1139	      this route when none of its client PEs advertises this route
1140	      anymore.  Similarly, a route reflector that had previously
1141	      advertised this route to its client PEs will withdraw this route
1142	      when none of its (other) client PEs and none of its route
1143	      reflector peers advertises this route anymore.  In this context,
1144	      the "did the last receiver leave?" question can be said to be
1145	      answered by the route reflector(s).
1146	      Furthermore, the BGP route distribution can leverage more than one
1147	      route reflector: if multiple route reflectors are used, with PEs
1148	      being distributed (as clients) among these route reflectors, the
1149	      "did the last receiver leave?" question is partly answered by each
1150	      of these route reflectors.
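   The route reflector behavior described above can be sketched as a toy
   model.  This is an illustration under stated assumptions, not the
   procedure of [I-D.ietf-l3vpn-2547bis-mcast-bgp]: one RR tracks, for
   each (C-S, C-G), only the set of BGP neighbors currently advertising
   the C-multicast Source Tree Join route; attributes, best-path
   selection and RT Constraint are ignored, and the class and method
   names are hypothetical.

```python
from collections import defaultdict


class RrCMulticastView:
    """Toy view of one route reflector for C-multicast Source Tree Join
    routes, keyed by a (C-S, C-G) tuple (hypothetical model)."""

    def __init__(self, client_pes):
        self.client_pes = set(client_pes)    # PEs that are clients of this RR
        self.advertisers = defaultdict(set)  # (C-S, C-G) -> advertising neighbors

    def advertise(self, route, neighbor):
        self.advertisers[route].add(neighbor)

    def withdraw(self, route, neighbor):
        self.advertisers[route].discard(neighbor)
        if not self.advertisers[route]:
            del self.advertisers[route]

    def should_advertise_to(self, route, neighbor):
        # A neighbor never counts as a reason to send the route back to itself.
        others = self.advertisers.get(route, set()) - {neighbor}
        if neighbor in self.client_pes:
            # Toward a client PE: any other client PE or any RR peer counts.
            return bool(others)
        # Toward an RR peer: only client PEs count, since routes learned from
        # one RR peer are not reflected to another RR peer.
        return bool(others & self.client_pes)
```

   For instance, with client PEs "PE1" and "PE2" and an RR peer "RR2",
   once "PE1" withdraws its route while "RR2" still advertises it,
   should_advertise_to() becomes false toward "RR2" (the RR withdraws
   the route there) but stays true toward "PE2".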

1152	   We can see that answering the "did the last receiver leave?"
1153	   question is a part of the work that the C-multicast routing building
1154	   block has to do, and it is where the different approaches differ
1155	   significantly.  The different approaches for handling C-multicast
1156	   routing can indeed result in different amounts of processing and in
1157	   a different spreading of this processing among the different
1158	   functions.  These differences can be better estimated by quantifying
1159	   the amount of message processing and state maintenance.

1161	   Though the type of processing, messages, and states may vary with the
1162	   different approaches, we propose here a rough estimation of the load
1163	   of PEs, in terms of number of messages processed and number of
1164	   control plane states maintained.  A "message processed" is a message
1165	   that is parsed, triggers a lookup, and leads to some action being
1166	   taken (such as, for instance, updating a control plane or data plane
1167	   state, or discarding the information in the message).  A "state
1168	   maintained" is a multicast state kept in the control plane memory
1169	   of a PE, related to an interface or a PE being subscribed to a
1170	   multicast stream (note that a state will be counted on an equipment
1171	   as many times as the number of protocols in which it is present,
1172	   e.g., two times when present both as a PIM state and a BGP route).
1173	   Note that we don't compare here the data plane states on PE routers,
1174	   which wouldn't vary between the different options chosen.

1176	A.1.  Scalability with an increased number of PEs

1178	   The following sections aim at evaluating the processing and state
1179	   maintenance load for an increasingly high number of PEs in a VPN.

1181	A.1.1.  SSM Scalability

1183	   The following subsections do such an estimation for each proposed
1184	   approach for C-multicast routing, for different phases of the
1185	   following scenario:

1187	   o  one SSM multicast stream is considered

1189	   o  only the intra-AS case is concerned (with the segmented inter-AS
1190	      tunnels and BGP-based C-multicast routing, #mvpn_PE and #R_PE
1191	      should refer to the PEs of the MVPN in the AS, not to all PEs of
1192	      the MVPN)

1194	   o  the scenario is as follows:

1196	      *  one PE joins the multicast stream (because a new receiver-
1197	         connected site has sent a Join on the PE-CE link), followed by
1198	         a number of additional PEs that also join the same multicast
1199	         stream, one after the other; we evaluate the processing
1200	         required for the addition of each PE

1202	      *  some period of time T passes, without any PE joining or leaving
1203	         (baseline)

1205	      *  all PEs leave, one after the other, until the last one leaves;
1206	         we evaluate the processing required for the leave of each PE

1208	   o  the parameters used are:

1210	      *  #mvpn_PE: the number of PEs in the MVPN

1212	      *  #R_PE: the number of PEs joining the multicast stream

1214	      *  #RR: the number of route reflectors

1216	      *  T_PIM_r: the time between two refreshes of a PIM Join (default
1217	         is 60s)

1219	   The estimation unit used is the "message.equipment" (or "m.e"): one
1220	   "message.equipment" corresponds to "one equipment processing one
1221	   message" (10 m.e being "10 equipments each processing one message",
1222	   or "5 messages each processed by 2 equipments", or "1 message
1223	   processed by 10 equipments", etc.).  Similarly, for the amount of
1224	   control plane state, the unit used is "state.equipment" or "s.e".
1225	   These units take into account the fact that a message (or a state)
1226	   can be processed (or maintained) by more than one node.

1228	   We distinguish three different types of equipments: the upstream PE
1229	   for the considered multicast stream, the RR (if any), and the other
1230	   PEs (which are not the upstream PE).

1232	   The numbers or orders of magnitude given in the tables in the
1233	   following subsections are totals across all equipments of the same
1234	   type, for each type of equipment, in the "m.e" and "s.e" units
1235	   defined above.

1237	   Additionally:

1239	   o  for PIM, only Join and Prune messages are counted:

1241	      *  the load due to PIM Hellos can be easily computed separately
1242	         and only depends on the number of PEs in the VPN;

1244	      *  message processing related to the PIM Assert mechanism is also
1245	         not taken into account, for the sake of simplicity;

1247	   o  for BGP, all advertisements and withdrawals of C-multicast Source
1248	      Tree Join routes are considered (Source-Active autodiscovery
1249	      routes are not used in an SSM context); and, following the
1250	      recommendation in Section 16 of [I-D.ietf-l3vpn-2547bis-mcast-bgp],
1251	      the case where the RT-Constraint mechanism [RFC4684] is not used
1252	      is not covered.

1254	   (Note that for all options provided for C-multicast routing, the
1255	   procedures to set up and maintain a shortest path tree toward the
1256	   source of an SSM group are the same as the procedures used to set up
1257	   and maintain a shortest path tree toward an RP or a non-SSM source;
1258	   the results of this section are thus re-used in
1259	   Appendix A.1.2.)

1261	A.1.1.1.  PIM LAN procedures, by default

1263	   +------------+------------+---------------+----------+--------------+
1264	   |            | upstream   | other PEs     | RR       | total across |
1265	   |            | PE (1)     | (total across | (none)   | all          |
1266	   |            |            | (#mvpn_PE-1)  |          | equipments   |
1267	   |            |            | PEs)          |          |              |
1268	   +------------+------------+---------------+----------+--------------+
1269	   | first PE   | 1 m.e      | #mvpn_PE-1    | /        | #mvpn_PE m.e |
1270	   | joins      |            | m.e           |          |              |
1271	   +------------+------------+---------------+----------+--------------+
1272	   | for *each* | 1 m.e      | #mvpn_PE-1    | /        | #mvpn_PE m.e |
1273	   | additional |            | m.e           |          |              |
1274	   | PE joining |            |               |          |              |
1275	   +------------+------------+---------------+----------+--------------+
1276	   | baseline   | T/T_PIM_r  | (T/T_PIM_r) x | /        | (T/T_PIM_r)  |
1277	   | processing | m.e        | (#mvpn_PE-1)  |          | x #mvpn_PE   |
1278	   | over a     |            | m.e           |          | m.e          |
1279	   | period T   |            |               |          |              |
1280	   +------------+------------+---------------+----------+--------------+
1281	   | for *each* | 2 m.e      | 2(#mvpn_PE-1) | /        | 2 x #mvpn_PE |
1282	   | PE leaving |            | m.e           |          | m.e          |
1283	   +------------+------------+---------------+----------+--------------+
1284	   | the last   | 1 m.e      | #mvpn_PE-1    | /        | #mvpn_PE m.e |
1285	   | PE leaves  |            | m.e           |          |              |
1286	   +------------+------------+---------------+----------+--------------+
1287	   | total for  | 3 x #R_PE  | (#mvpn_PE-1)  | 0        | #mvpn_PE x   |
1288	   | #R_PE PEs  | - 1 +      | x (3 x        |          | (3 x #R_PE - |
1289	   |            | T/T_PIM_r  | #R_PE - 1     |          | 1 +          |
1290	   |            | m.e        | +             |          | T/T_PIM_r)   |
1291	   |            |            | T/T_PIM_r)    |          | m.e          |
1292	   |            |            | m.e           |          |              |
1293	   +------------+------------+---------------+----------+--------------+
1294	   | total      | 1 s.e      | #R_PE s.e     | 0        | #R_PE+1 s.e  |
1295	   | state      |            |               |          |              |
1296	   | maintained |            |               |          |              |
1297	   +------------+------------+---------------+----------+--------------+

1299	    Message processing and state maintenance - PIM LAN procedures, by
1300	                                  default

1302	   We suppose here that the PIM Join suppression and Prune Override
1303	   mechanisms are fully effective, i.e., that a Join or Prune message
1304	   sent by a PE is instantly seen by other PEs.  Strictly speaking, this
1305	   is not true: depending on network delays and timing, there could be
1306	   cases where more messages are exchanged, and the numbers given in
1307	   this table are lower bounds on the number of PIM messages exchanged.
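   As a cross-check, the sketch below adds up the per-event rows of the
   table above for given parameter values.  It is illustrative only: it
   assumes fully effective Join suppression and Prune override, as
   stated above, and the function name is hypothetical.  Summed row by
   row, the total is #mvpn_PE x (3 x #R_PE - 1 + T/T_PIM_r) m.e; the
   "- 1" comes from the last leave, which needs no overriding Join.

```python
from fractions import Fraction


def pim_lan_message_load(n_mvpn_pe, n_r_pe, t, t_pim_r=60):
    """Sum the per-event rows of the PIM LAN table, in m.e.

    Returns (upstream PE, other PEs, total across all equipments).
    Every Join or Prune is assumed to be processed once by every PE.
    """
    if n_r_pe < 1:
        raise ValueError("at least one PE must join")
    refreshes = Fraction(t, t_pim_r)   # T / T_PIM_r Join refreshes
    per_pe = (
        1                              # first PE joins
        + (n_r_pe - 1) * 1             # each additional PE joining
        + refreshes                    # baseline refreshes over period T
        + (n_r_pe - 1) * 2             # each non-last leave: Prune + overriding Join
        + 1                            # last PE leaves: Prune only
    )
    upstream = per_pe
    others = (n_mvpn_pe - 1) * per_pe
    return upstream, others, upstream + others
```

   For example, pim_lan_message_load(100, 10, 3600) gives 89 m.e on the
   upstream PE, 8811 m.e on the other PEs, and 8900 m.e in total.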

1309	A.1.1.2.  BGP-based C-multicast routing

1311	   The following analysis assumes that BGP Route Reflectors (RRs) are
1312	   used, with no hierarchy of RRs (recall that the analysis also assumes
1313	   that the Route Target Constraint mechanism is used).

1315	   Given these assumptions, a message carrying a C-multicast route from
1316	   a downstream PE would need to be processed by the RRs that have that
1317	   PE as their client.  Due to the use of RT Constraint, these RRs would
1318	   then send this message only to the RRs that have the upstream PE as
1319	   a client.  None of the other RRs, and none of the other PEs, will
1320	   receive this message.  Thus, for a message associated with a given
1321	   MVPN, the total number of RRs that would need to process this message
1322	   only depends on the number of RRs that maintain C-multicast routes
1323	   for that MVPN and that have either the receiver-connected PE or the
1324	   source-connected PE as their clients, and is independent of the total
1325	   number of RRs or the total number of PEs.

1327	   In practice, for a given MVPN, a PE would be a client of just 2 RRs
1328	   (for redundancy, an RR cluster would typically have 2 RRs).
1329	   Therefore, in practice the message would need to be processed by at
1330	   most 4 RRs (2 RRs if both the downstream PE and the upstream PE are
1331	   clients of the same RRs).  Since RRs in different RR clusters have a
1332	   full iBGP mesh among themselves, each RR in the RR cluster that
1333	   contains the upstream PE would receive the message from each of the
1334	   RRs in the RR cluster that contains the downstream PE.  Given 2 RRs
1335	   per cluster, the total number of messages processed by all the RRs
1336	   is 6 (one on each RR of the downstream PE's cluster, plus two on
1337	   each RR of the upstream PE's cluster).
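   The count above can be written as a small function.  This is a
   sketch that encodes only the assumptions stated in this section (RT
   Constraint in use, no RR hierarchy, a full iBGP mesh between RRs of
   different clusters, each PE a client of all RRs of its cluster); the
   function and parameter names are hypothetical.

```python
def rr_messages_per_c_multicast_route(rrs_per_cluster=2, same_cluster=False):
    """m.e on route reflectors for one C-multicast route sent by a
    downstream PE toward the upstream PE."""
    if same_cluster:
        # Each RR of the shared cluster processes the PE's message once.
        return rrs_per_cluster
    # Each RR of the downstream PE's cluster processes the PE's message once...
    downstream_cluster = rrs_per_cluster
    # ...then each RR of the upstream PE's cluster receives it from every
    # RR of the downstream PE's cluster (full iBGP mesh between clusters).
    upstream_cluster = rrs_per_cluster * rrs_per_cluster
    return downstream_cluster + upstream_cluster
```

   With the default of 2 RRs per cluster this gives 6 m.e, or 2 m.e
   when both PEs are clients of the same RRs.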

1339	   Additionally, as soon as there is a receiver-connected PE in each RR
1340	   cluster, the number of RRs processing a C-multicast route tends
1341	   quickly toward 2 (taking into account that a PE peering to RRs will
1342	   be made redundant).

1344	   +------------+----------+--------------+-----------+----------------+
1345	   |            | upstream | other PEs    | RRs (#RR) | total across   |
1346	   |            | PE (1)   | (total       |           | all equipments |
1347	   |            |          | across       |           |                |
1348	   |            |          | (#mvpn_PE-1) |           |                |
1349	   |            |          | PEs)         |           |                |
1350	   +------------+----------+--------------+-----------+----------------+
1351	   | first PE   | 2 m.e    | 2 m.e        | 6 m.e     | 10 m.e         |
1352	   | joins      |          |              |           |                |
1353	   +------------+----------+--------------+-----------+----------------+
1354	   | for *each* | between  | 2 m.e        | (at most) | (at most) 10   |
1355	   | additional | 0 and 2  |              | 6 m.e     | m.e tending    |
1356	   | PE joining | m.e      |              | tending   | toward 4 m.e   |
1357	   |            |          |              | toward 2  |                |
1358	   |            |          |              | m.e       |                |
1359	   +------------+----------+--------------+-----------+----------------+
1360	   | baseline   | 0        | 0            | 0         | 0              |
1361	   | processing |          |              |           |                |
1362	   | over a     |          |              |           |                |
1363	   | period T   |          |              |           |                |
1364	   +------------+----------+--------------+-----------+----------------+
1365	   | for *each* | between  | 2 m.e        | (at most) | (at most) 10   |
1366	   | PE leaving | 0 and 2  |              | 6 m.e     | m.e tending    |
1367	   |            | m.e      |              | tending   | toward 4 m.e   |
1368	   |            |          |              | to 2 m.e  |                |
1369	   +------------+----------+--------------+-----------+----------------+
1370	   | the last   | 2 m.e    | 2 m.e        | 6 m.e     | 10 m.e         |
1371	   | PE leaves  |          |              |           |                |
1372	   +------------+----------+--------------+-----------+----------------+
1373	   | total for  | at most  | #R_PE x 4    | (at most) | at most 16 x   |
1374	   | #R_PE PEs  | 2 x #RRs | m.e          | 12 x      | #R_PE + 2 x    |
1375	   |            | m.e (see |              | #R_PE m.e | #RRs m.e       |
1376	   |            | note     |              | (tending  | (tending       |
1377	   |            | below)   |              | toward 4  | toward 8 x     |
1378	   |            |          |              | x #R_PE   | #R_PE + 2 x    |
1379	   |            |          |              | m.e)      | #RRs m.e)      |
1380	   +------------+----------+--------------+-----------+----------------+
1381	   | total      | 4 s.e    | 2 x #R_PE    | approx. 2 | approx. 4      |
1382	   | state      |          | s.e          | #R_PE +   | #R_PE + #RR x  |
1383	   | maintained |          |              | #RR x     | #clusters + 4  |
1384	   |            |          |              | #clusters | s.e            |
1385	   |            |          |              | s.e       |                |
1386	   +------------+----------+--------------+-----------+----------------+

1388	      Message processing and state maintenance - BGP-based procedures

1390	   Note on the total of m.e on the upstream PE:

1392	   o  there are as many "message.equipment" on the upstream PE as the
1393	      number of times the RRs of the cluster of the upstream PE need to
1394	      re-advertise the C-multicast (C-S,C-G) route; such a re-
1395	      advertisement is not useful for the upstream PE, because the
1396	      behavior of the upstream PE for a given (VPN associated with the
1397	      RT, C-S, C-G) does not depend on the precise attributes carried by
1398	      the route (other than the RT, of course), but it will happen in
1399	      some cases due to how BGP processes these routes; indeed, a BGP
1400	      speaker may re-advertise a route when its current best path for
1401	      the given NLRI changes, if the set of attributes to advertise
1402	      also changes

1404	   o  let's look at the different relevant attributes, and at the
1405	      circumstances in which they can trigger a re-advertisement of a
1406	      C-multicast route:

1408	      *  next-hop and originator-id: a new PE joining will not by itself
1409	         result in a need to re-advertise a C-multicast route, because
1410	         the RR, which aggregates C-multicast routes with the same NLRI
1411	         received from PEs in its own cluster (Section 11.4 of
1412	         [I-D.ietf-l3vpn-2547bis-mcast-bgp]), rewrites the values of
1413	         these attributes; however, the advertisements made by
1414	         different RRs peering with the RRs in the cluster of the
1415	         upstream PE may lead to updates of the values of these
1416	         attributes

1418	      *  cluster-list: the value of this attribute only varies between
1419	         clusters; changes to the value of this attribute do not
1420	         "follow" PE advertisements, and only advertisements made by
1421	         different RRs may possibly lead to updates of the value of this
1422	         attribute

1424	      *  local-pref: the value of this attribute is determined locally;
1425	         this is true both for the routes advertised by each PE (which
1426	         could all be configured to use the same value) and for a route
1427	         that results from the aggregation by an RR of the routes with
1428	         the same NLRI advertised by the PEs of its cluster (the RRs
1429	         could also be configured to use a local-pref independent from
1430	         the local-pref of the routes advertised to them); thus, this
1431	         attribute can be considered not to result in a need to
1432	         re-advertise a C-multicast route

1434	      *  other BGP attributes have no particular reason to be set for
1435	         C-multicast routes in intra-AS, and if they were, an RR (or,
1436	         for attributes relevant for inter-AS, an ASBR) would also
1437	         overwrite these values when aggregating these routes

1439	   o  Given the above, for a given C-multicast Source Tree Join
1440	      (C-S,C-G) NLRI, what may force an RR to re-advertise the route with
1441	      different attributes to the upstream PE would be the case of an RR
1442	      of another cluster advertising a route better than its current
1443	      best route, because of the values of attributes specific to that
1444	      RR (next-hop, originator-id, cluster-list), but not because of
1445	      anything specific to the PEs behind that RR.  If we consider our
1446	      (#R_PE - 1) PEs joining a given (C-S,C-G), one after the other
1447	      after the first PE joining, some of these events may thus lead to
1448	      a re-advertisement to the upstream PE, but the number of times
1449	      this can happen is at worst the number of RRs in clusters having
1450	      receivers (plus one because of the possible advertisement of the
1451	      same route by a PE of the local cluster).

1453	   o  Given that in this section we look at scalability with an
1454	      increasing number of PEs, we need to consider the possibility
1455	      that all clusters may have a client PE with a receiver.  We also
1456	      need to consider that the two RRs of the cluster of the upstream
1457	      PE may need to re-advertise the route.  With this in mind, we know
1458	      that 2 x #RRs is an upper bound to the number of updates made by
1459	      RRs to the upstream PE, for the considered C-multicast route.
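   The bound above can be illustrated with a small simulation, seen from
   one RR of the upstream PE's cluster.  This is a sketch under strong
   simplifying assumptions: each remote cluster is modeled as a single
   RR, the local-cluster PE case and the second RR of the upstream
   cluster are ignored, and BGP best-path selection is replaced by a
   hypothetical tie-break (shortest cluster-list, then lowest
   originator-id).  All names are hypothetical.  It shows that the
   number of updates toward the upstream PE is bounded by the number of
   advertising RRs, whatever the number of joining PEs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RrAdvert:
    """RR-specific attributes of the aggregated C-multicast route, as seen
    by an RR in the upstream PE's cluster (hypothetical model)."""
    next_hop: str
    originator_id: str
    cluster_list: tuple


def best_path(adverts):
    # Hypothetical stand-in for BGP best-path selection: shortest
    # cluster-list first, then lowest originator-id.
    return min(adverts, key=lambda a: (len(a.cluster_list), a.originator_id))


def updates_to_upstream_pe(join_order, rr_of_pe):
    """Count the updates sent to the upstream PE for one (C-S,C-G) while
    the PEs in join_order join one after the other."""
    received = {}      # advertising RR -> aggregated route it sent
    last_sent = None   # attributes last advertised to the upstream PE
    updates = 0
    for pe in join_order:
        rr = rr_of_pe[pe]
        if rr in received:
            # That RR already advertises the aggregated route and rewrites
            # PE-specific attributes, so nothing changes upstream.
            continue
        received[rr] = RrAdvert(next_hop=rr, originator_id=rr, cluster_list=(rr,))
        best = best_path(received.values())
        if best != last_sent:
            # The best path, hence the attributes to advertise, changed.
            last_sent = best
            updates += 1
    return updates
```

   With 100 joining PEs spread over 4 RRs, this gives 1 update when the
   preferred RR's clients join first, and 4 updates in the least
   favorable order.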

1461	A.1.1.3.  Side by side orders of magnitude comparison

1463	   This section summarizes the previous sections by considering the
1464	   orders of magnitude when the number of PEs in a VPN increases.

1466	   +------------+--------------------------------+---------------------+
1467	   |            | PIM LAN Procedures             | BGP-based           |
1468	   +------------+--------------------------------+---------------------+
1469	   | first PE   | O(#mvpn_PE)                    | O(1)                |
1470	   | joins (in  |                                |                     |
1471	   | m.e)       |                                |                     |
1472	   +------------+--------------------------------+---------------------+
1473	   | for *each* | O(#mvpn_PE)                    | O(1)                |
1474	   | additional |                                |                     |
1475	   | PE joining |                                |                     |
1476	   | (in m.e)   |                                |                     |
1477	   +------------+--------------------------------+---------------------+
1478	   | baseline   | (T/T_PIM_r) x O(#mvpn_PE)      | 0                   |
1479	   | processing |                                |                     |
1480	   | over a     |                                |                     |
1481	   | period T   |                                |                     |
1482	   | (in m.e)   |                                |                     |
1483	   +------------+--------------------------------+---------------------+
1484	   | for *each* | O(#mvpn_PE)                    | O(1)                |
1485	   | PE leaving |                                |                     |
1486	   | (in m.e)   |                                |                     |
1487	   | the last   | O(#mvpn_PE)                    | O(1)                |
1488	   | PE leaves  |                                |                     |
1489	   | (in m.e)   |                                |                     |
1490	   +------------+--------------------------------+---------------------+
1491	   | total for  | O(#mvpn_PE x #R_PE) +          | O(#R_PE)            |
1492	   | #R_PE PEs  | O(#mvpn_PE x T/T_PIM_r)        |                     |
1493	   | (in m.e)   |                                |                     |
1494	   +------------+--------------------------------+---------------------+
1495	   | states (in | O(#R_PE)                       | O(#R_PE)            |
1496	   | s.e)       |                                |                     |
1497	   +------------+--------------------------------+---------------------+
1498	   | notes      | (processing and state          | (processing and     |
1499	   |            | maintenance are essentially    | state maintenance   |
1500	   |            | done by, and spread amongst,   | is essentially done |
1501	   |            | the PEs of the MVPN ;          | by, and spread      |
1502	   |            | non-upstream PEs have          | amongst, the RRs)   |
1503	   |            | processing to do)              |                     |
1504	   +------------+--------------------------------+---------------------+

1506	    Comparison of orders of magnitude for message processing and state
1507	                maintenance (totals across all equipments)

1509	   The conclusions that can be drawn from the above are that:

1511	   o  in the PIM-based approach, any message will be processed by all
1512	      PEs, including those that are neither upstream nor downstream for
1513	      the message, which results in a total amount of messages to
1514	      process that is in O(#mvpn_PE x #R_PE), i.e., O(#mvpn_PE ^ 2) if
1515	      the proportion of receiver PEs is considered constant when the
1516	      number of PEs increases; the refreshes of Join messages
1517	      introduce a linear factor not changing the order of magnitude,
1518	      but which can be significant for long-lived streams;

1520	   o  the BGP-based approach requires an amount of message processing in
1521	      O(#R_PE), lower than the PIM-based approach, and which is
1522	      independent of the duration of streams;

1524	   o  state maintenance is of the same order of magnitude for all
1525	      approaches: O(#R_PE), but the distribution is different:

1527	      *  the PIM-based approach fully spreads, and minimizes, the amount
1528	         of state (one state per PE)

1530	      *  the BGP-based procedures spread all the state on the set of
1531	         route reflectors
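   The difference in growth can be made concrete with the per-event
   figures of the two tables.  The sketch below is illustrative only:
   the receiver proportion (one PE in ten), the number of RRs (4) and
   the number of Join refreshes (T = 1 hour with T_PIM_r = 60s) are
   assumptions, and the function names are hypothetical.  The BGP
   figure is an upper bound built from the table rows (at most 2 m.e on
   other PEs and 6 m.e on RRs per join or leave) plus the 2 x #RRs bound
   on updates to the upstream PE.

```python
def pim_total_me(n_mvpn_pe, n_r_pe, refreshes):
    # Row sum of the PIM LAN table: #mvpn_PE x (3 x #R_PE - 1 + T/T_PIM_r).
    return n_mvpn_pe * (3 * n_r_pe - 1 + refreshes)


def bgp_total_me_bound(n_r_pe, n_rrs):
    # Per join and per leave: at most 2 m.e on other PEs and 6 m.e on RRs;
    # plus at most 2 x #RRs updates toward the upstream PE.
    return (2 + 6) * 2 * n_r_pe + 2 * n_rrs


N_RRS = 4          # assumption: 2 RR clusters of 2 RRs each
REFRESHES = 60     # assumption: T = 1 hour, T_PIM_r = 60 s

for n in (100, 1000, 10000):
    r = n // 10    # assumption: one PE in ten has receivers
    print(f"#mvpn_PE={n:>6}  PIM={pim_total_me(n, r, REFRESHES):>10}  "
          f"BGP<={bgp_total_me_bound(r, N_RRS):>6}")
```

   Under these assumptions, going from 100 to 1000 PEs multiplies the
   PIM total by about 40 (quadratic growth), while the BGP bound grows
   by less than a factor of 10 (linear growth).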

1533	A.1.2.  ASM Scalability

1535	   The conclusions in Appendix A.1.1 are reused in this section, for the
1536	   parts that are common to the setup and maintenance of states related
1537	   to a source tree or a shared tree.

1539	   When PIM-SM is used in a VPN and an ASM multicast group is joined by
1540	   some PEs (#R_PEs) with some sources sending toward this multicast
1541	   group address, we can note the following:

1543	   PEs will generally have to maintain one shared tree, plus one source
1544	   tree for each source sending toward G; each tree resulting in an
1545	   amount of processing and state maintenance similar to what is
1546	   described in the scenario in Appendix A.1.1, with the same
1547	   differences in order of magnitude between the different approaches
1548	   when the number of PEs is high.

1550	   An exception to this is when, for a given group in a VPN, none of the
1551	   PIM instances in the customer routers and VRFs would switch to
1552	   the SPT (SwitchToSptDesired always false): in that case the
1553	   processing and state maintenance load is the one required for
1554	   maintenance of the shared tree only.  It has to be noted that this
1555	   scenario is dependent on customer policy.  To compare the resulting
1556	   load in that case, between PIM-based approaches and the BGP-based
1557	   approach configured to use inter-site shared trees, the scenario
1558	   in Appendix A.1.1 can be used with #R_PEs joining a (C-*,C-G) ASM
1559	   group instead of an SSM group, and the same differences in order of
1560	   magnitude remain true.  In the case of the BGP-based approach used
1561	   without inter-site shared trees, we must take into account the load
1562	   resulting from the fact that to build the C-PIM shared tree, each PE
1563	   has to join the Source Tree to each source; using the notations of
1564	   Appendix A.1.1, this adds an amount of load (total load across all
1565	   equipments) that is proportional to #R_PEs and the number of
1566	   sources; the order of magnitude with an increasing number of PEs is
1567	   thus unchanged, and the differences in order of magnitude also remain
1568	   the same.
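   The tree accounting described above can be summarized as follows.
   This is a rough sketch: it assumes that each tree costs about the
   same as one SSM-like scenario of Appendix A.1.1 (ssm_load_per_tree),
   ignores the per-source processing discussed next, and uses
   hypothetical function names and approach labels ("pim", "bgp-shared"
   for BGP with inter-site shared trees, "bgp-no-shared" without them).

```python
def asm_tree_count(n_sources, approach, spt_switch=True):
    """Number of C-multicast trees set up for one ASM group."""
    if approach == "bgp-no-shared":
        # The C-PIM shared tree is built by having each PE join the source
        # tree to each source; no inter-site shared tree is set up.
        return n_sources
    if approach not in ("pim", "bgp-shared"):
        raise ValueError(f"unknown approach: {approach}")
    # One shared tree, plus one source tree per source if PEs switch to
    # the SPT (SwitchToSptDesired true).
    return 1 + (n_sources if spt_switch else 0)


def asm_message_load(ssm_load_per_tree, n_sources, approach, spt_switch=True):
    # Rough estimate: (cost of one SSM-like tree) x (number of trees).
    return ssm_load_per_tree * asm_tree_count(n_sources, approach, spt_switch)
```

   Since the number of trees does not depend on the number of PEs, the
   order of magnitude of ssm_load_per_tree with an increasing number of
   PEs carries over unchanged, which is the point made above.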

1570	   In addition to the maintenance of trees, PEs have to perform some
1571	   processing and state maintenance related to individual sources
1572	   sending to a multicast group; the related procedures and behaviors
1573	   may differ largely depending on which C-multicast routing protocol
1574	   is used, how it is configured, which multicast source discovery
1575	   mechanisms are used in the customer VPN, and which SwitchToSptDesired
1576	   policy is used.  However, the following can be observed:

1578	   o  when BGP-based C-multicast routing is used:

1580	      *  each PE will possibly have to process and maintain a BGP
1581	         Source-Active autodiscovery route for (some or all) sources of
1582	         an ASM group.  The number of Source-Active autodiscovery routes
1583	         will typically be one but may be related to the number of
1584	         upstream PEs in the following cases: when inter-site shared
1585	         trees are used and simultaneously more than one PE is used as
1586	         the upstream PE for SPT (C-S,C-G) trees, and when inter-site
1587	         shared trees are used and there are multiple PEs that are
1588	         possible upstream PEs for this (S,G).

1590	      *  this results in a message processing and state maintenance
1591	         (total across all the equipments) linearly dependent on the
1592	         number of PEs in the VPN (#mvpn_PE) for each source,
1593	         independently of the number of PEs joined to the group.

1595	      *  Depending on whether or not inter-site shared trees are used,
1596	         and depending on the SwitchToSptDesired policy in the PIM
1597	         instances in the customer routers and VRFs, and depending on
1598	         the relative locations of sources and RPs, this will happen for
1599	         all (S,G) of an ASM group or only for some of them, and will be
1600	         done in parallel to the maintenance of shared and/or source
1601	         trees or at the first join of a PE on a source tree.

1603	   o  when PIM-based C-multicast routing is used, depending on the
1604	      SwitchToSptDesired policy in the PIM instances in the customer
1605	      routers and VRFs, and depending on the relative locations of
1606	      sources and RPs, there are:

1608	      *  possible control plane state transitions triggered by the
1609	         reception of (S,G) packets; such events would induce
1610	         processing on all PEs joined to G

1612	      *  possible PIM Assert messages specific to (S,G); this would
1613	         induce message processing on each PE of the VPN for each PIM
1614	         Assert message

1616	   Given the above, the additional processing that may happen for each
1617	   individual source sending to the group, beyond the maintenance of
1618	   source and shared trees, does not change the orders of magnitude
1619	   identified above.

1621	A.2.  Cost of PEs leaving and joining

1623	   The quantification of message processing in Appendix A.1.1 is done
1624	   based on a use case where each PE with receivers has joined and left
1625	   once.  Drawing scalability-related conclusions for other patterns of
1626	   changes of the set of receiver-connected PEs can be done by
1627	   considering the cost of each approach for "a new PE joining" and "a
1628	   PE leaving".

1630	   For the "PIM LAN Procedures" approach, in the case of a single SSM or
1631	   SPT tree, the total amount of message processing across all nodes
1632	   depends linearly on the number of PEs in the VPN, when a PE joins
1633	   such a tree.

1635	   For the "BGP-based" approach:

1637	   o  In the case of a single SSM tree, the total amount of message
1638	      processing across all nodes is independent of the number of PEs,
1639	      for "a new PE" joining and "a PE leaving"; it also depends on how
1640	      Route Reflectors are meshed, but not with linear dependency.

1642	   o  In the case of an SPT tree for an ASM group, BGP has additional
1643	      processing due to possible Source-Active autodiscovery routes:

1645	      *  when BGP-based C-multicast routing is used with inter-site
1646	         shared trees, for the first PE joining (and the last PE leaving)
1647	         a given SPT, the processing of the corresponding Source-Active
1648	         autodiscovery routes results in a processing cost linearly
1649	         dependent on the number of PEs in the VPN; for subsequent PEs
1650	         joining (and PEs leaving other than the last) there is no
1651	         processing due to advertisement or withdrawal of Source-Active
1652	         autodiscovery routes

1654	      *  when BGP-based C-multicast routing is used without inter-site
1655	         shared trees, the processing of Source-Active autodiscovery
1656	         routes for an (S,G) happens independently of PEs joining and
1657	         leaving the SPT for (S,G).
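The contrast between the two approaches for a single tree can be illustrated with a toy count of message-processing events (one event per node handling a message) when one PE joins.  This is a minimal sketch, not a measurement: the function names and the rr_hops parameter are hypothetical, and it assumes that under the PIM LAN Procedure every PE of the VPN processes the Join, that under the BGP-based approach only the joining PE, the Route Reflectors on the path and the upstream PE process the C-multicast route, and that a Source-Active autodiscovery route adds one event per PE.

```python
# Toy model (hypothetical): message-processing events when one PE joins a
# single tree.  Not a measurement; see the assumptions stated above.


def pim_lan_join_cost(n_pes: int) -> int:
    # PIM LAN Procedure: assumed processed by the joining PE and by every
    # other PE of the VPN, i.e. one event per PE.
    return n_pes


def bgp_join_cost(n_pes: int, rr_hops: int,
                  sa_route_needed: bool = False) -> int:
    # BGP-based: joining PE + each Route Reflector on the path + upstream PE.
    # n_pes does not enter this term.
    cost = 2 + rr_hops
    if sa_route_needed:
        # First PE joining (or last leaving) an SPT with inter-site shared
        # trees: a Source-Active A-D route, assumed processed once per PE.
        cost += n_pes
    return cost


for n in (10, 100, 1000):
    print(n, pim_lan_join_cost(n), bgp_join_cost(n, rr_hops=2))
```

The PIM column grows with n while the BGP column stays constant; passing sa_route_needed=True reintroduces a linear term, which, per the text above, only applies to the first PE joining (or last PE leaving) an SPT with inter-site shared trees.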

1659	   In the case of a new PE having to join a shared tree for an ASM group
1660	   G, we see the following:

1662	   o  the processing due to the PE joining the shared tree itself is the
1663	      same as the processing required to set up an SSM tree, as described
1664	      before (note that this does not happen when BGP-based C-multicast
1665	      routing is used without inter-site shared trees)

1667	   o  for each source for which the PE joins the SPT, the resulting
1668	      processing cost is the same as for one SPT, as described before:

1670	      *  the conditions under which a PE will join the SPT for a given
1671	         (C-S, C-G) are the same between the BGP-based with inter-site
1672	         shared tree approach and the PIM-based approach, and depend
1673	         solely on the SwitchToSptDesired policy in the PIM instances in
1674	         the customer routers in the sites connected to the PE and/or in
1675	         the VRF

1677	      *  the conditions under which a PE will join the SPT for a given
1678	         (C-S, C-G) differ between the BGP-based without inter-site
1679	         shared trees approach and the PIM-based approach

1681	      *  the SPT for a given (S,G) can be joined by the PE in the
1682	         following cases:

1684	         +  as soon as SwitchToSptDesired(S,G) is true on one router in a
1685	            site connected to the PE, or on the VPN VRF on the PE

1687	         +  when BGP-based routing is used, and configured to not use
1688	            inter-site shared trees

1690	      *  in other words, the only case where the PE will not join the
1691	         SPT for (S,G) is when SwitchToSptDesired(S,G) will never be
1692	         true on any router in the sites of the VPN connected to the
1693	         PE, nor on the VPN VRF itself, with the additional condition,
1694	         when BGP-based C-multicast routing is used, that inter-site
1695	         shared trees are used
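The conditions listed above for a PE joining the SPT for a given (C-S, C-G) can be condensed into a single predicate.  This is a minimal sketch: the function and parameter names are hypothetical, and the SwitchToSptDesired policy of all routers in the connected sites and of the VRF is abstracted into one boolean.

```python
def pe_joins_spt(switch_to_spt_desired: bool, bgp_based: bool,
                 inter_site_shared_trees: bool) -> bool:
    """Whether the PE joins the SPT for a given (C-S, C-G).

    switch_to_spt_desired: True if SwitchToSptDesired(S,G) can be true on
    some router in the sites connected to the PE, or on the VPN VRF.
    inter_site_shared_trees: only meaningful when bgp_based is True.
    """
    if bgp_based and not inter_site_shared_trees:
        # BGP-based without inter-site shared trees: the SPT is joined
        # regardless of the SwitchToSptDesired policy.
        return True
    # PIM-based, or BGP-based with inter-site shared trees: depends solely
    # on the SwitchToSptDesired policy.
    return switch_to_spt_desired
```

For example, pe_joins_spt(False, True, False) is True, while pe_joins_spt(False, True, True) and pe_joins_spt(False, False, False) are both False, which is the "only case" described in the last bullet above.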

1697	   Thus, when one PE joins a group G to which n sources are sending
1698	   traffic, we note the following regarding how the cost (in total
1699	   amount of processing across all equipment) depends on the number of
1700	   PEs and on the number of sources:

1702	   o  in the general case (where any router in the sites of the VPN
1703	      connected to the PE, or the VRF itself, may have
1704	      SwitchToSptDesired(S,G) true):

1706	      *  for the "PIM LAN Procedure" approach, the cost is linearly
1707	         dependent on the number of PEs in the VPN, and linearly
1708	         dependent on the number of sources

1710	      *  for the "BGP-based" approach, the cost is linearly dependent on
1711	         the number of sources and, in the sub-case of the BGP-based
1712	         approach used with inter-site shared trees, also depends on the
1713	         number of PEs in the VPN, but only if the PE is the first to
1714	         join the group or the SPT for some source sending to the group

1716	   o  otherwise, under the assumption that routers in the sites of the
1717	      VPN connected to the PE, and the VPN VRF itself, will never have
1718	      the policy function SwitchToSptDesired(S,G) evaluate to true:

1720	      *  in the case of the PIM-based approach, the cost is linearly
1721	         dependent on the number of PEs in the VPN, and there is no
1722	         dependency on the number of sources

1724	      *  in the case of the BGP-based approach with inter-site shared
1725	         trees, the cost is linearly dependent on the number of RRs, and
1726	         there is no dependency on the number of sources

1728	      *  in the case of the BGP-based approach without inter-site shared
1729	         trees, the cost is linearly dependent on the number of RRs and
1730	         on the number of sources
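The case analysis above can be summarised as a lookup of which quantities the cost of one PE joining an ASM group depends on.  This is a minimal sketch under stated assumptions: the approach labels and function name are hypothetical, only the dependency on the number of PEs and on the number of sources is modelled (the dependency on the number of Route Reflectors is left out), and "first_joiner" means the PE is the first to join the group or the SPT for some source.

```python
# Approach labels (hypothetical):
#   "pim"            PIM LAN Procedure
#   "bgp-shared"     BGP-based, with inter-site shared trees
#   "bgp-no-shared"  BGP-based, without inter-site shared trees


def join_cost_dependencies(approach: str, spt_possible: bool,
                           first_joiner: bool = False) -> tuple[bool, bool]:
    """Return (depends_on_n_pes, depends_on_n_sources).

    spt_possible: the general case, where SwitchToSptDesired(S,G) may be
    true on some router in the connected sites or on the VRF.
    """
    if approach == "pim":
        # Always linear in PEs; linear in sources only if SPTs may be joined.
        return True, spt_possible
    if approach == "bgp-shared":
        # Linear in sources only in the general case; dependency on PEs only
        # for the first joiner in the general case.
        return spt_possible and first_joiner, spt_possible
    if approach == "bgp-no-shared":
        # SPTs are always joined: linear in sources, never in PEs.
        return False, True
    raise ValueError(f"unknown approach: {approach!r}")
```

For instance, join_cost_dependencies("pim", False) yields (True, False), matching the first bullet of the "otherwise" case, and join_cost_dependencies("bgp-shared", True, first_joiner=True) yields (True, True).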

1732	   Hence, with the PIM-based approach the overall cost across all
1733	   equipment of any PE joining an ASM group G always depends on the
1734	   number of PEs (the same holds for a PE that leaves), while the
1735	   BGP-based approach has a cost independent of the number of PEs (with
1736	   the exception of the first PE joining the ASM group, for the
1737	   BGP-based approach used with inter-site shared trees; in that case
1738	   there is a dependency on the number of PEs).

1740	   Regarding the dependency on the number of sources: without making
1741	   any assumption on the SwitchToSptDesired policy on PIM routers and
1742	   VRFs of a VPN, we see that a PE joining an ASM group may induce a
1743	   processing cost linearly dependent on the number of sources.
1744	   Outside this general case, if SwitchToSptDesired is always false
1745	   on all PIM routers and VRFs of the VPN, then with the PIM-based
1746	   approach, and with the BGP-based approach used with inter-site
1747	   shared trees, the number of messages processed will be independent
1748	   of the number of sources (note that this condition depends on
1749	   customer policy).

1751	Appendix B.  Switching to S-PMSI

1753	   [ the following point was fixed in version 07 of
1754	   [I-D.ietf-l3vpn-2547bis-mcast], and is here for reference only ]

1756	   Section 7.2.2.3 of [I-D.ietf-l3vpn-2547bis-mcast] proposes two
1757	   approaches for how a source PE can decide when to start transmitting
1758	   customer multicast traffic on an S-PMSI:

1760	   1.  The source PE sends multicast packets for the <C-S, C-G> on both
1761	       the I-PMSI P-multicast tree and the S-PMSI P-multicast tree
1762	       simultaneously for a pre-configured period of time, letting the
1763	       receiver PEs select the new tree for reception, before switching
1764	       to only the S-PMSI.

1766	   2.  The source PE waits for a pre-configured period of time after
1767	       advertising the <C-S, C-G> entry bound to the S-PMSI before fully
1768	       switching the traffic onto the S-PMSI-bound P-multicast tree.
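The two alternatives differ only in which P-multicast tree(s) the source PE uses during the pre-configured period.  The sketch below makes that explicit; it is a hypothetical illustration, not taken from [I-D.ietf-l3vpn-2547bis-mcast]: the function name and parameters are assumptions, and the period of alternative 1 is assumed to start, like that of alternative 2, when the <C-S, C-G> to S-PMSI binding is advertised.

```python
I_PMSI, S_PMSI = "I-PMSI", "S-PMSI"


def trees_for_packet(alternative: int, t: float, t_advertise: float,
                     period: float) -> set[str]:
    """Trees on which the source PE sends a <C-S, C-G> packet at time t.

    t_advertise: when the <C-S, C-G> entry bound to the S-PMSI is advertised
    period:      the pre-configured period of time (hypothetical value)
    """
    if t < t_advertise:
        return {I_PMSI}                      # S-PMSI not yet in use
    switched = t >= t_advertise + period     # pre-configured period elapsed
    if alternative == 1:
        # Send on both trees during the period, then on the S-PMSI only.
        return {S_PMSI} if switched else {I_PMSI, S_PMSI}
    if alternative == 2:
        # Keep using the I-PMSI during the period, then switch fully.
        return {S_PMSI} if switched else {I_PMSI}
    raise ValueError(f"unknown alternative: {alternative}")
```

With alternative 2, every packet is sent on exactly one tree at any time t, whereas alternative 1 sends each packet twice throughout the period.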

1770	   The first alternative has essentially two drawbacks:

1772	   o  <C-S,C-G> traffic is sent twice for some period of time, which
1773	      would appear to be at odds with the motivation for switching to an
1774	      S-PMSI in order to optimize the bandwidth used by the multicast
1775	      tree for that stream.

1777	   o  It is unlikely that the switchover can occur without packet loss
1778	      or duplication if the transit delays of the I-PMSI P-multicast
1779	      tree and the S-PMSI P-multicast tree differ.
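The second drawback can be illustrated with a toy simulation of a receiver PE under the first alternative.  This is a minimal sketch under hypothetical assumptions: one packet is sent per millisecond on both trees (the whole stream is taken to fall within the dual-send period), each tree has a fixed transit delay in milliseconds, and the receiver PE stops accepting I-PMSI copies and starts accepting S-PMSI copies at a single instant t_switch well inside the stream.

```python
def dual_send_switchover(n_packets: int, d_ipmsi: int, d_spmsi: int,
                         t_switch: int) -> tuple[int, int]:
    """Return (lost, duplicated) packet counts at the receiver PE."""
    lost = duplicated = 0
    for seq in range(n_packets):
        # Packet `seq` leaves the source PE at time `seq` on both trees.
        via_ipmsi = seq + d_ipmsi < t_switch    # accepted before the switch
        via_spmsi = seq + d_spmsi >= t_switch   # accepted after the switch
        if via_ipmsi and via_spmsi:
            duplicated += 1
        elif not via_ipmsi and not via_spmsi:
            lost += 1
    return lost, duplicated


print(dual_send_switchover(100, 10, 4, 50))  # (6, 0): S-PMSI faster, loss
print(dual_send_switchover(100, 4, 10, 50))  # (0, 6): S-PMSI slower, duplicates
print(dual_send_switchover(100, 7, 7, 50))   # (0, 0): equal delays
```

In this model the number of lost or duplicated packets equals the difference between the two transit delays expressed in packet intervals, so the switchover is clean only when the delays are equal.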

1781	   By contrast, the second alternative has none of these drawbacks, and
1782	   satisfies the requirement in section 5.1.3 of [RFC4834], which states
1783	   that "[...] a multicast VPN solution SHOULD as much as possible
1784	   ensure that client multicast traffic packets are neither lost nor
1785	   duplicated, even when changes occur in the way a client multicast
1786	   data stream is carried over the provider network".  The second
1787	   alternative also happens to be the one used in existing deployments.

1789	   For these reasons, it is the authors' recommendation to mandate the
1790	   implementation of the second alternative for switching to S-PMSI.

1792	Authors' Addresses

1794	   Thomas Morin (editor)
1795	   France Telecom - Orange Labs
1796	   2 rue Pierre Marzin
1797	   Lannion  22307
1798	   France

1800	   Email: thomas.morin@orange-ftgroup.com

1802	   Ben Niven-Jenkins (editor)
1803	   BT
1804	   208 Callisto House, Adastral Park
1805	   Ipswich, Suffolk  IP5 3RE
1806	   UK

1808	   Email: benjamin.niven-jenkins@bt.com

1809	   Yuji Kamite
1810	   NTT Communications Corporation
1811	   Tokyo Opera City Tower
1812	   3-20-2 Nishi Shinjuku, Shinjuku-ku
1813	   Tokyo  163-1421
1814	   Japan

1816	   Email: y.kamite@ntt.com

1818	   Raymond Zhang
1819	   BT
1820	   2160 E. Grand Ave.
1821	   El Segundo  CA 90025
1822	   USA

1824	   Email: raymond.zhang@bt.com

1826	   Nicolai Leymann
1827	   Deutsche Telekom
1828	   Goslarer Ufer 35
1829	   10589 Berlin
1830	   Germany

1832	   Email: n.leymann@telekom.de

1834	   Nabil Bitar
1835	   Verizon
1836	   40 Sylvan Road
1837	   Waltham, MA  02451
1838	   USA

1840	   Email: nabil.n.bitar@verizon.com