idnits 2.17.1 

draft-ietf-l3vpn-mvpn-considerations-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 40 instances of too long lines in the document, the longest
     one being 1 character in excess of 72.

  == There are 3 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 417 has weird spacing: '...   or   the us...'

  -- The document date (October 26, 2009) is 5295 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC4364' is mentioned on line 740, but not defined

  == Outdated reference: A later version (-10) exists of
     draft-ietf-l3vpn-2547bis-mcast-08

  == Outdated reference: A later version (-15) exists of
     draft-rosen-vpn-mcast-12

  == Outdated reference: A later version (-10) exists of
     draft-ietf-pim-sm-linklocal-08

  == Outdated reference: A later version (-09) exists of
     draft-ietf-pim-port-01

  == Outdated reference: A later version (-15) exists of
     draft-ietf-mpls-ldp-p2mp-08


     Summary: 2 errors (**), 0 flaws (~~), 10 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                      T. Morin, Ed.
3	Internet-Draft                                     France Telecom Orange
4	Expires: April 29, 2010                            B. Niven-Jenkins, Ed.
5	                                                                      BT
6	                                                               Y. Kamite
7	                                                      NTT Communications
8	                                                                R. Zhang
9	                                                                      BT
10	                                                              N. Leymann
11	                                                        Deutsche Telekom
12	                                                                N. Bitar
13	                                                                 Verizon
14	                                                        October 26, 2009

16	    Mandatory Features in a Layer 3 Multicast BGP/MPLS VPN Solution
17	                draft-ietf-l3vpn-mvpn-considerations-05

19	Status of this Memo

21	   This Internet-Draft is submitted to IETF in full conformance with the
22	   provisions of BCP 78 and BCP 79.

24	   Internet-Drafts are working documents of the Internet Engineering
25	   Task Force (IETF), its areas, and its working groups.  Note that
26	   other groups may also distribute working documents as Internet-
27	   Drafts.

29	   Internet-Drafts are draft documents valid for a maximum of six months
30	   and may be updated, replaced, or obsoleted by other documents at any
31	   time.  It is inappropriate to use Internet-Drafts as reference
32	   material or to cite them other than as "work in progress."

34	   The list of current Internet-Drafts can be accessed at
35	   http://www.ietf.org/ietf/1id-abstracts.txt.

37	   The list of Internet-Draft Shadow Directories can be accessed at
38	   http://www.ietf.org/shadow.html.

40	   This Internet-Draft will expire on April 29, 2010.

42	Copyright Notice

44	   Copyright (c) 2009 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents in effect on the date of
49	   publication of this document (http://trustee.ietf.org/license-info).
50	   Please review these documents carefully, as they describe your rights
51	   and restrictions with respect to this document.

53	Abstract

55	   More than one set of mechanisms to support multicast in a layer 3
56	   BGP/MPLS VPN has been defined.  These are presented in the documents
57	   that define them as optional building blocks.

59	   To enable interoperability between implementations, this document
60	   defines a subset of features that is considered mandatory for a
61	   multicast BGP/MPLS VPN implementation.  This will help implementers
62	   and deployers understand which L3VPN multicast requirements are best
63	   satisfied by each option.

65	Requirements Language

67	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
68	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
69	   document are to be interpreted as described in [RFC2119].

71	Table of Contents

73	   1.  Introduction
74	   2.  Terminology
75	   3.  Examining alternative mechanisms for MVPN functions
76	     3.1.  MVPN auto-discovery
77	     3.2.  S-PMSI Signaling
78	     3.3.  PE-PE Exchange of C-Multicast Routing
79	       3.3.1.  PE-PE C-multicast routing scalability
80	       3.3.2.  PE-CE multicast routing exchange scalability
81	       3.3.3.  P-routers scalability
82	       3.3.4.  Impact of C-multicast routing on Inter-AS
83	               deployments
84	       3.3.5.  Security and robustness
85	       3.3.6.  C-multicast VPN join latency
86	       3.3.7.  Conclusion on C-multicast routing
87	     3.4.  Encapsulation techniques for P-multicast trees
88	     3.5.  Inter-AS deployments options
89	     3.6.  Bidir-PIM support
90	   4.  Co-located RPs
91	   5.  Existing deployments
92	   6.  Summary of recommendations
93	   7.  IANA Considerations
94	   8.  Security Considerations
95	   9.  Acknowledgements
96	   10. References
97	     10.1. Normative References
98	     10.2. Informative References
99	   Appendix A.  Scalability of C-multicast routing processing load
100	     A.1.  Scalability with an increased number of PEs
101	       A.1.1.  SSM Scenario
102	       A.1.2.  ASM Scalability
103	     A.2.  Cost of PEs leaving and joining
104	   Appendix B.  Switching to S-PMSI
105	   Authors' Addresses

107	1.  Introduction

109	   Specifications for multicast in BGP/MPLS
110	   [I-D.ietf-l3vpn-2547bis-mcast] include multiple alternative
111	   mechanisms for some of the required building blocks of the solution.
112	   However, they do not identify which of these mechanisms are mandatory
113	   to implement in order to ensure interoperability.  Not defining a set
114	   of mandatory-to-implement mechanisms leads to a situation where
115	   implementations may support different subsets of the available
116	   optional mechanisms that do not interoperate, which is a problem for
117	   the many operators that run multi-vendor backbones.

119	   The aim of this document is to leverage the already expressed
120	   requirements [RFC4834] and study the properties of each approach, to
121	   identify mechanisms that are good candidates for being part of a core
122	   set of mandatory mechanisms which can be used to provide a base for
123	   interoperable solutions.

125	   This document goes through the different building blocks of the
126	   solution and concludes on which mechanisms an implementation is
127	   required to implement.  Section 6 summarizes these requirements.

129	   Considering the history of the multicast VPN proposals and
130	   implementations, it is also useful to discuss how existing
131	   deployments of early implementations
132	   [I-D.rosen-vpn-mcast][I-D.raggarwa-l3vpn-2547-mvpn] can be
133	   accommodated, and provide suggestions in this respect.

135	2.  Terminology

137	   Please refer to [I-D.ietf-l3vpn-2547bis-mcast] and [RFC4834].

139	3.  Examining alternative mechanisms for MVPN functions

141	3.1.  MVPN auto-discovery

143	   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
144	   two different mechanisms for MVPN auto-discovery:

146	   1.  BGP-based auto-discovery

148	   2.  "PIM/shared P-tunnel": discovery done through the exchange of PIM
149	       Hellos by C-PIM instances, across an MI-PMSI implemented with one
150	       shared P-tunnel per VPN (using multicast ASM, or MP2MP LDP)

152	   Both solutions address Section 5.2.10 of [RFC4834] which states that
153	   "the operation of a multicast VPN solution SHALL be as light as
154	   possible and providing automatic configuration and discovery SHOULD
155	   be a priority when designing a multicast VPN solution.  Particularly
156	   the operational burden of setting up multicast on a PE or for a VR/
157	   VRF SHOULD be as low as possible".

159	   The key consideration is that PIM-based discovery is only applicable
160	   to deployments using a shared P-tunnel to instantiate an MI-PMSI (it
161	   is not applicable if only P2P, PIM-SSM, or P2MP mLDP/RSVP-TE
162	   P-tunnels are used because, contrary to ASM and MP2MP, these types of
163	   P-tunnels cannot be built before auto-discovery has been done),
164	   whereas the BGP-based auto-discovery does not place any constraint on
165	   the type of P-tunnel that would have to be used.  BGP-based auto-
166	   discovery is independent of the type of P-tunnel used, thus
167	   satisfying the requirement in section 5.2.4.1 of [RFC4834] that "a
168	   multicast VPN solution SHOULD be designed so that control and
169	   forwarding planes are not interdependent".
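
   The applicability constraint above can be summarized in a short
   sketch.  This is only an illustration: the function name and the
   string labels are hypothetical, and the set of shared P-tunnel
   types is simply the one listed in this section.

```python
# Illustrative only: labels and names are assumptions of this sketch.
# P-tunnel types that can be built before auto-discovery has been done.
SHARED_PTUNNEL_TYPES = {"PIM-ASM", "MP2MP-LDP"}


def discovery_applicable(mechanism: str, ptunnel_type: str) -> bool:
    """Tell whether an auto-discovery mechanism can be used with a
    given P-tunnel type, following the reasoning of this section."""
    if mechanism == "bgp":
        # BGP-based auto-discovery places no constraint on the P-tunnel.
        return True
    if mechanism == "pim-shared-ptunnel":
        # PIM Hellos need an MI-PMSI built on a shared P-tunnel.
        return ptunnel_type in SHARED_PTUNNEL_TYPES
    raise ValueError(f"unknown mechanism: {mechanism}")


for ptunnel in ("PIM-ASM", "PIM-SSM", "P2MP-RSVP-TE"):
    print(ptunnel,
          discovery_applicable("pim-shared-ptunnel", ptunnel),
          discovery_applicable("bgp", ptunnel))
```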

171	   Additionally, it is to be noted that a number of service providers
172	   have chosen to use SSM-based P-tunnels for the default MDTs within
173	   their current deployments, therefore relying already on some BGP-
174	   based auto-discovery.

176	   Moreover, when shared P-tunnels are used, the use of BGP auto-
177	   discovery would allow inconsistencies in the addresses/identifiers
178	   used for the shared P-tunnel to be detected (e.g. the same shared
179	   P-tunnel identifier being used for different VPNs with distinct BGP
180	   route targets).  This is particularly attractive in the context of
181	   inter-AS VPNs where the impact of any misconfiguration could be
182	   magnified and where a single service provider may not operate all the
183	   ASs.  Note that this technique to detect some misconfiguration cases
184	   may not be usable during a transition period from a shared-P-tunnel
185	   autodiscovery to a BGP-based autodiscovery.

187	   Thus, the recommendation is that implementation of the BGP-based
188	   auto-discovery is mandated and should be supported by all MVPN
189	   implementations.

191	3.2.  S-PMSI Signaling

193	   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
194	   two mechanisms for signaling that multicast flows will be switched to
195	   an S-PMSI:

197	   1.  a UDP-based TLV protocol specifically for S-PMSI signaling
198	       (described in section 7.4.2).

200	   2.  a BGP-based mechanism for S-PMSI signaling (described in section
201	       7.4.1).

203	   Section 5.2.10 of [RFC4834] states that "as far as possible, the
204	   design of a solution SHOULD carefully consider the number of
205	   protocols within the core network: if any additional protocols are
206	   introduced compared with the unicast VPN service, the balance between
207	   their advantage and operational burden SHOULD be examined
208	   thoroughly".  The UDP-based mechanism would be an additional protocol
209	   in the MVPN stack, which isn't the case for the BGP-based S-PMSI
210	   switching signaling, since (a) BGP is identified as a requirement for
211	   autodiscovery, and (b) the BGP-based S-PMSI switching signaling
212	   procedures are very similar to the autodiscovery procedures.

214	   Furthermore, the UDP-based S-PMSI switching signaling mechanism
215	   requires an MI-PMSI, while the BGP-based protocol does not.  In
216	   practice, this means that with the UDP-based protocol a PE will have
217	   to join all P-tunnels of all PEs in an MVPN, while in the
218	   alternative where BGP-based S-PMSI switching signaling is used, it
219	   could delay joining a P-tunnel rooted at a PE until traffic from that
220	   PE is needed, thus reducing the amount of state maintained on P
221	   routers.
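
   As a rough illustration of this difference, the sketch below counts
   the P-tunnels a PE would join under each signaling approach.  The
   function names and the example PE set are hypothetical, and the
   model assumes one P-tunnel rooted at each PE of the MVPN.

```python
def ptunnels_joined_udp(pes, local_pe):
    """UDP-based signaling needs an MI-PMSI: the local PE joins the
    P-tunnel rooted at every other PE of the MVPN."""
    return {pe for pe in pes if pe != local_pe}


def ptunnels_joined_bgp(pes, local_pe, needed_sources):
    """BGP-based signaling: the local PE may join only the P-tunnels
    rooted at PEs from which it currently needs traffic."""
    return {pe for pe in pes if pe != local_pe and pe in needed_sources}


# Hypothetical MVPN with five PEs; PE1 only needs traffic from PE3.
pes = {"PE1", "PE2", "PE3", "PE4", "PE5"}
udp = ptunnels_joined_udp(pes, "PE1")
bgp = ptunnels_joined_bgp(pes, "PE1", needed_sources={"PE3"})
print(len(udp), len(bgp))  # prints "4 1"
```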

223	   S-PMSI switching signaling approaches can also be compared in an
224	   inter-AS context (see Section 3.5).  The proposed BGP-based approach
225	   for S-PMSI switching signaling provides a good fit with both the
226	   segmented and non-segmented inter-AS approaches.  By
227	   contrast, while the UDP-based approach for S-PMSI switching signaling
228	   appears to be usable with segmented inter-AS tunnels, in that case
229	   key advantages of the segmented approach are lost:

231	   o  ASes can no longer independently choose when S-PMSI tunnels
232	      will be triggered in their AS (and thus control the amount of
233	      state created on their P routers),

235	   o  ASes can no longer independently choose the tunneling
236	      technique for the P-tunnels used for an S-PMSI,

238	   o  In an inter-AS option B context, an isolation of ASes is obtained
239	      as PEs in one AS don't have (direct) exchange of routing
240	      information with PEs of other ASes.  This property is not
241	      preserved if UDP-based S-PMSI switching signaling is used.  By
242	      contrast, BGP-based S-PMSI switching signaling does preserve
243	      this property.

245	   Given all the above, it is the recommendation of the authors that BGP
246	   is the preferred solution for S-PMSI switching signaling and should
247	   be supported by all implementations.

249	   It is identified that, if nothing prevents a fast-paced creation of
250	   S-PMSIs, then S-PMSI switching signaling with BGP would possibly
251	   impact the Route Reflectors used for MVPN routes.  However, it is
252	   also identified that such fast-paced behavior would have an impact on
253	   P and PE routers resulting from S-PMSI tunnel signaling, which will
254	   be the same regardless of the S-PMSI signaling approach that is used,
255	   and which it is certainly best to avoid by setting up proper
256	   mechanisms.

258	   The UDP-based S-PMSI switching signaling protocol can also be
259	   considered, as an option, given that this protocol has been in
260	   deployment for some time.  Implementations supporting both protocols
261	   would be expected to provide a per-VRF configuration knob to allow an
262	   implementation to use the UDP-based TLV protocol for S-PMSI switching
263	   signaling for specific VRFs in order to support the coexistence of
264	   both protocols (for example during migration scenarios).  Apart from
265	   such migration-facilitating mechanisms, the authors specifically do
266	   not recommend extending the already proposed UDP-based TLV protocol
267	   to new types of P-tunnels.

269	3.3.  PE-PE Exchange of C-Multicast Routing

271	   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
272	   multiple mechanisms for PE-PE exchange of customer multicast routing
273	   information (C-multicast routing):

275	   1.  Full per-MVPN PIM peering across an MI-PMSI (described in section
276	       3.4.1.1).

278	   2.  Lightweight PIM peering across an MI-PMSI (described in section
279	       3.4.1.2)

281	   3.  The unicasting of PIM C-Join/Prune messages (described in section
282	       3.4.1.3)

284	   4.  The use of BGP for carrying C-Multicast routing (described in
285	       section 3.4.2).

287	3.3.1.  PE-PE C-multicast routing scalability

289	   Scalability being one of the core requirements for multicast VPN, it
290	   is useful to compare the proposed C-multicast routing mechanisms from
291	   this perspective: Section 4.2.4 of [RFC4834] recommends that "a
292	   multicast VPN solution SHOULD support several hundreds of PEs per
293	   multicast VPN, and MAY usefully scale up to thousands" and section
294	   4.2.5 states that "a solution SHOULD scale up to thousands of PEs
295	   having multicast service enabled".

297	   Scalability with an increased number of VPNs per PE, or with an
298	   increased amount of multicast state per VPN, are also important, but
299	   are not focused on in this section since we didn't identify
300	   differences between the different approaches for these matters: all
301	   other things being equal, the load on a PE due to C-multicast routing
302	   increases roughly linearly with the number of VPNs per PE, and with
303	   the amount of multicast state per VPN.

305	   This section presents conclusions related to PE-PE C-multicast
306	   routing scalability.  Appendix A provides more detailed explanations
307	   of how the PIM-based approaches and the BGP-based approach differ in
308	   handling the C-multicast routing load, along with a quantified
309	   evaluation of the amount of state and messages with the different
310	   approaches; many points made in this section are detailed in
311	   Appendix A.1.

313	   At high scales of multicast deployment, the first and third
314	   mechanisms require the PEs to maintain a large number of PIM
315	   adjacencies with other PEs of the same multicast VPN (which implies
316	   the regular exchange of PIM Hellos with each other) and to
317	   periodically refresh C-Join/Prune states, resulting in a processing
318	   cost that increases with the number of PEs (as detailed in
319	   Appendix A.1).  The second approach is less subject to this cost, and
320	   the fourth approach is not subject to it.

322	   The third mechanism would reduce the amount of C-Join/Prune
323	   processing for a given multicast flow for PEs that are not the
324	   upstream neighbor for this flow, but would require "explicit
325	   tracking" state to be maintained by the upstream PE.  It also isn't
326	   compatible with the "Join suppression" mechanism.  A possible way to
327	   reduce the amount of signaling with this approach would be the use of
328	   a PIM refresh-reduction mechanism.  Such a mechanism, based on TCP,
329	   is being specified by the PIM IETF Working Group
330	   ([I-D.ietf-pim-port]); its use in a multicast VPN context has not
331	   been described in [I-D.ietf-l3vpn-2547bis-mcast], but it is expected
332	   that this approach would provide scalability similar to the BGP-
333	   based approach without route reflectors.

335	   The second mechanism would operate in a similar manner to full per-
336	   MVPN PIM peering except that PIM Hello messages are not transmitted
337	   and PIM C-Join/Prune refresh-reduction would be used, thereby
338	   improving scalability, but this approach has yet to be fully
339	   described.  In any case, it appears to address only one of the
340	   factors that affect scalability when the number of PEs
341	   increases.

343	   The first and second mechanisms can leverage the "Join suppression"
344	   behavior and thus reduce the processing burden of an upstream PE,
345	   sparing the processing of a Join refresh message for each remote PE
346	   joined to a multicast stream.  This improvement requires all PEs of a
347	   multicast VPN to process all PIM Join and Prune messages sent by any
348	   other PE participating in the same multicast VPN whether they are the
349	   upstream PE or not.

351	   The fourth mechanism (the use of BGP for carrying C-Multicast
352	   routing) would have a comparable drawback of requiring all PEs to
353	   process a BGP C-multicast route relevant only to a specific upstream
354	   PE.  Hence, section 16 of [I-D.ietf-l3vpn-2547bis-mcast-bgp]
355	   recommends the use of the Route-Target constrained BGP distribution
356	   [RFC4684] mechanisms, which eliminate this drawback by ensuring that
357	   only the interested upstream PE receives a BGP C-multicast route.
358	   Specifically, when Route-Target constrained BGP distribution is used,
359	   the fourth mechanism reduces the total amount of C-multicast routing
360	   processing load put on the PEs by avoiding any processing of customer
361	   multicast routing information on the "unrelated" PEs, which are
362	   neither the joining PE nor the upstream PE.
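
   The processing behavior described in the preceding paragraphs can
   be sketched as follows.  This is a deliberately simplified model:
   the mechanism labels and the function name are hypothetical, route
   reflectors are not counted, and only the number of PEs that process
   a single C-Join is modeled.

```python
def pes_processing_join(mechanism: str, num_pes: int,
                        rt_constraint: bool = True) -> int:
    """Return how many PEs of one MVPN process a single C-Join.

    Simplified model (an assumption of this sketch): the joining PE is
    always counted, and every other PE either processes the message or
    does not.
    """
    if num_pes < 2:
        raise ValueError("an MVPN needs at least two PEs")
    if mechanism in ("full-pim", "lightweight-pim"):
        # Sent across the MI-PMSI: every PE of the MVPN processes it.
        return num_pes
    if mechanism == "unicast-pim":
        # Unicast to the upstream PE only.
        return 2
    if mechanism == "bgp":
        # With Route-Target constrained distribution [RFC4684], only
        # the upstream PE receives the route; without it, all PEs do.
        return 2 if rt_constraint else num_pes
    raise ValueError(f"unknown mechanism: {mechanism}")


for name in ("full-pim", "lightweight-pim", "unicast-pim", "bgp"):
    print(name, pes_processing_join(name, 100))
```

   With 100 PEs, the model gives 100 for the two MI-PMSI-based PIM
   mechanisms and 2 for unicast PIM and for BGP with Route-Target
   constrained distribution.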

364	   Moreover, the fourth mechanism further reduces the total amount of
365	   message processing load by avoiding the use of periodic refreshes,
366	   and by inheriting BGP features that are expected to improve
367	   scalability (for instance, providing a means to offload some of the
368	   processing burden associated with customer multicast routing onto one
369	   or more BGP route reflectors).  The advantages of the fourth
370	   mechanism come at the cost of maintaining an amount of state linear
371	   in the number of PEs joined to a stream.  However, the use of route
372	   reflectors allows this cost to be spread among multiple route
373	   reflectors, thus eliminating the need for a single route reflector to
374	   maintain all this state.

376	   However, the fourth mechanism is distinctive in that it offers the
377	   possibility of offloading customer multicast routing processing onto
378	   one or more BGP Route Reflector(s).  When this is done, the drawback
379	   is an increased processing load on the route
380	   reflector infrastructure.  In higher-scale scenarios, it may be
381	   required to adapt the route reflector infrastructure to the MVPN
382	   routing load by using, for example:

384	   o  a separation of resources for unicast and multicast VPN routing:
385	      using dedicated MVPN Route Reflector(s) (or using dedicated MVPN
386	      BGP sessions or dedicated MVPN BGP instances);

388	   o  the deployment of additional route reflector resources, for
389	      example increasing the processing resources on existing route
390	      reflectors or deploying additional route reflectors.

392	   Among the above, the most straightforward approach is to consider the
393	   introduction of route reflectors dedicated to the MVPN service and
394	   dimension them according to the needs of that service (but doing so
395	   is not required and is left as an operator engineering decision).

397	3.3.2.  PE-CE multicast routing exchange scalability

399	   The overhead associated with the PE-CE exchange of C-multicast
400	   routing is independent of the choice of the mechanism used for the
401	   PE-PE C-multicast routing.  Therefore, the impact of the PE-CE
402	   C-multicast routing overhead on the overall system scalability is
403	   independent of the protocol used for PE-PE signaling, and therefore
404	   is not relevant when comparing the different approaches proposed for
405	   the PE-PE C-multicast routing.  This is true even if in some
406	   operational contexts the PE-CE C-multicast routing overhead is a
407	   significant factor in the overall system overhead.

409	3.3.3.  P-routers scalability

411	   Mechanisms (1) and (2) are restricted to use within multicast VPNs
412	   that use an MI-PMSI, thereby necessitating:

414	      the use of a P-tunnel technique that allows shared P-tunnels (for
415	      example PIM-SM in ASM mode or MP2MP LDP)

417	   or   the use of one P-tunnel per PE per VPN, even for PEs that do not
418	      have sources in their directly attached sites for that VPN.

420	   By comparison, the fourth mechanism doesn't impose either of these
421	   restrictions, and when P2MP P-tunnels are used only necessitates the
422	   use of one P-tunnel per VPN per PE attached to a site with a
423	   multicast source or RP (or with a candidate BSR, if BSR is used).

425	   In cases where fewer PEs are connected to sources than the total
426	   number of PEs, this reduces the amount of state maintained by
427	   P-routers compared to the amount required to build an MI-PMSI with
428	   P2MP P-tunnels.  Such cases are expected to be frequent for multicast
429	   VPN deployments (see section 4.2.4.1 of [RFC4834]).
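
   A minimal sketch of this comparison, assuming P2MP P-tunnels and
   counting only the P-tunnels needed for one VPN, is given below.
   The function name and the figures in the example are hypothetical.

```python
def p2mp_ptunnels_per_vpn(num_pes: int, num_source_pes: int,
                          uses_mi_pmsi: bool) -> int:
    """Number of P2MP P-tunnels needed for one VPN.

    With an MI-PMSI, one P-tunnel is rooted at every PE of the VPN.
    Without it, only PEs attached to a site with a multicast source
    or RP (num_source_pes) need to root a P-tunnel.
    """
    if not 0 <= num_source_pes <= num_pes:
        raise ValueError("num_source_pes must be between 0 and num_pes")
    return num_pes if uses_mi_pmsi else num_source_pes


# Hypothetical VPN: 100 PEs, 5 of which are attached to source sites.
print(p2mp_ptunnels_per_vpn(100, 5, uses_mi_pmsi=True),
      p2mp_ptunnels_per_vpn(100, 5, uses_mi_pmsi=False))  # prints "100 5"
```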

431	3.3.4.  Impact of C-multicast routing on Inter-AS deployments

433	   Co-existence with unicast inter-AS VPN options, and an equal level of
434	   security for multicast and unicast including in an inter-AS context,
435	   are specifically mentioned in sections 5.2.6, 5.2.8 and 5.2.12 of
436	   [RFC4834].

438	   In an inter-AS option B context, an isolation of ASes is obtained as
439	   PEs in one AS don't have (direct) exchange of routing information
440	   with PEs of other ASes.  This property is not preserved if PIM-based
441	   PE-PE C-multicast routing is used.  By contrast, the fourth option
442	   (BGP-based C-Multicast routing) does preserve this property.

444	   Additionally, the authors note that the proposed BGP-based approach
445	   for C-multicast routing provides a good fit with both the segmented
446	   and non-segmented inter-AS approaches.  By contrast, though the PIM-
447	   based C-multicast routing is usable with segmented inter-AS tunnels,
448	   the inter-AS scalability advantage of the approach is lost, since PEs
449	   in an AS will see the C-multicast routing activity of all other PEs
450	   of all other ASes.

452	3.3.5.  Security and robustness

454	   BGP supports MD5 authentication of its peers for additional security,
455	   which can directly benefit multicast VPN customer multicast
456	   routing, whether for intra-AS or inter-AS communications.  By
457	   contrast, with a PIM-based approach, no mechanism providing a
458	   comparable level of security to authenticate communications between
459	   remote PEs has yet been fully described
460	   [I-D.ietf-pim-sm-linklocal][], and in any case would require
461	   significant additional operations for the provider to be usable in a
462	   multicast VPN context.

464	   The robustness of the infrastructure, especially the existing
465	   infrastructure providing unicast VPN connectivity, is key.  The
466	   C-multicast routing function, especially under load, will compete
467	   with the unicast routing infrastructure.  With the PIM-based
468	   approaches, the unicast and multicast VPN routing functions are
469	   expected to only compete in the PE, for control plane processing
470	   resources.  In the case of the BGP-based approach, they will compete
471	   on the PE for processing resources, and in the route reflectors
472	   (supposing they are used for MVPN routing).  It is identified that in
473	   both cases, mechanisms will be required to arbitrate resources (e.g.
474	   processing priorities).  In the case of PIM-based procedures, this
475	   arbitration takes place between the different control plane routing
476	   instances in the PE.  In the case of the BGP-based approach, this is
477	   likely to require using distinct BGP sessions for multicast and
478	   unicast (e.g. through the use of dedicated MVPN BGP route reflectors,
479	   or of a distinct session with an existing route reflector).

481	   Multicast routing is dynamic by nature, and multicast VPN routing has
482	   to follow the VPN customers' multicast routing events.  The different
483	   approaches can be compared on how they are expected to behave in
484	   scenarios where multicast routing in the VPNs is subject to an
485	   intense activity.  Scalability of each approach under such a load is
486	   detailed in Appendix A.2, and the fourth approach (BGP-based) used in
487	   conjunction with the RT Constraint mechanisms [RFC4684], is the only
488	   one having a cost for join/leave operations independent of the number
489	   of PEs in the VPN (with one exception detailed in Appendix A.2) and
490	   state maintenance not concentrated on the upstream PE.

492	   On the other hand, while the BGP-based approach is likely to suffer a
493	   slowdown under a load that is greater than the available processing
494	   resources (because of possibly congested TCP sockets), the PIM-based
495	   approaches would react to such a load by dropping messages, with
496	   failure-recovery obtained through message refreshes.  Thus, the BGP-
497	   based approach could result in a degradation of join/leave latency
498	   performance typically spread evenly across all multicast streams
499	   being joined in that period, while the PIM-based approach could
500	   result in increased join/leave latency, for some random streams, by a
501	   multiple of the time between refreshes (e.g. tens of seconds), and
502	   possibly in some states the adjacency may time out, resulting in
503	   disruption of multicast streams.
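
   The PIM-side effect described above can be illustrated with a small
   model.  Everything in it is an assumption of this sketch rather
   than something this document specifies: Joins are taken to be
   dropped independently with a fixed probability, a dropped Join is
   only recovered by the next periodic refresh, and the 60-second
   refresh period is an example value.

```python
def pim_extra_join_latency(dropped_joins: int,
                           refresh_interval_s: float) -> float:
    """Extra join latency when the first `dropped_joins` Joins are
    lost, so that state is only installed by a later refresh."""
    if dropped_joins < 0:
        raise ValueError("dropped_joins must be non-negative")
    return dropped_joins * refresh_interval_s


def expected_extra_latency(drop_probability: float,
                           refresh_interval_s: float) -> float:
    """Expected extra latency if each Join is dropped independently
    with the same probability (geometric number of losses)."""
    if not 0.0 <= drop_probability < 1.0:
        raise ValueError("drop_probability must be in [0, 1)")
    return refresh_interval_s * drop_probability / (1.0 - drop_probability)


REFRESH_S = 60.0  # assumed refresh period, not a value from this document
print(pim_extra_join_latency(2, REFRESH_S))    # prints 120.0
print(expected_extra_latency(0.5, REFRESH_S))  # prints 60.0
```

   The model only captures the "multiple of the time between
   refreshes" effect for an affected stream; it does not model
   adjacency time-outs or the BGP-side slowdown.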

505	   The behavior of the PIM-based approach under such a load is also
506	   harder to predict, given that the performance of the "Join
507	   suppression" mechanism (an important mechanism for this approach to
508	   scale) will itself be impeded by delays in Join processing.  For
509	   these reasons, the BGP-based approach would be able to provide a
510	   smoother degradation and more predictable behavior under a highly
511	   dynamic load.

513	   In fact, both an "evenly spread degradation" and an "unevenly spread
514	   larger degradation" can be problematic, and what seems important is
515	   the ability for the VPN backbone operator to (a) limit the amount of
516	   multicast routing activity that can be triggered by a multicast VPN
517	   customer, and to (b) provide the best possible independence between
518	   distinct VPNs.  It seems that both of these can be addressed through
519	   local implementation improvements, and that both the BGP-based and
520	   PIM-based approaches could be engineered to provide (a) and (b).  It
521	   can be noted though that the BGP approach proposes ways to dampen
522	   C-multicast route withdrawals and/or advertisements, and thus already
523	   describes a way to provide (a), while nothing comparable has yet been
524	   described for the PIM-based approaches (even though it doesn't appear
525	   difficult).  The PIM-based approaches rely on a per VPN dataplane to
526	   carry the MVPN control plane, and thus may benefit from this first
527	   level of separation to solve (b).
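
   One possible form of the "local implementation improvements"
   mentioned above is a per-VPN token bucket applied to incoming
   C-multicast routing events.  The Python sketch below is a generic
   technique chosen for illustration; it is not the dampening
   procedure described for the BGP-based approach, and the class and
   parameter names are hypothetical.  Keeping one bucket per VPN
   addresses (a), and the fact that buckets are not shared addresses
   (b).

```python
class PerVpnRateLimiter:
    """Token bucket per VPN for C-multicast routing events."""

    def __init__(self, rate_per_s, burst):
        self.rate = rate_per_s    # tokens added per second
        self.burst = burst        # maximum bucket depth
        self._buckets = {}        # vpn -> (tokens, last_update_time)

    def allow(self, vpn, now):
        """Return True if an event for this VPN may be processed now."""
        # A VPN seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(vpn, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[vpn] = (tokens - 1, now)
            return True
        # Out of tokens: the event is deferred or dropped by the caller.
        self._buckets[vpn] = (tokens, now)
        return False
```

   In this sketch, a customer that exhausts its own bucket does not
   consume the budget of any other VPN attached to the same PE.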

529	3.3.6.  C-multicast VPN join latency

531	   Section 5.1.3 of [RFC4834] states that "the group join delay [...] is
532	   also considered one important QoS parameter.  It is thus RECOMMENDED
533	   that a multicast VPN solution be designed appropriately in this
534	   regard".  In a multicast VPN context, the "group join delay" of
535	   interest is the time between a CE sending a PIM Join to its PE and
536	   the first packet of the corresponding multicast stream being received
537	   by the CE.

539	   It is to be noted that the C-multicast routing procedures will only
540	   impact the group join latency of a given multicast stream for the
541	   first receiver that is located across the provider backbone from the
542	   multicast source-connected PE (or the first <n> receivers in the
543	   specific case where a UMH selection algorithm is used that allows
544	   <n> distinct UMHs to be selected by distinct downstream PEs).

546	   The different approaches proposed seem to have different
547	   characteristics in how they are expected to impact join latency:

549	   o  the PIM-based approaches minimize the number of control plane
550	      processing hops between a new receiver-connected PE and the
551	      source-connected PE and, being datagram-based, introduce minimal
552	      delay, thereby possibly achieving the best join latency that
553	      implementation efficiency allows

555	   o  under degraded conditions (packet loss, congestion, high control
556	      plane load) the PIM-based approach may impact the latency for a
557	      given multicast stream in an all or nothing manner: if a
558	      C-multicast routing PIM Join packet is lost, latency can reach a
559	      high time (a multiple of the periodicity of PIM Join refreshes)

561	   o  the BGP-based approach uses TCP exchanges, which may introduce an
562	      additional delay depending on BGP and TCP implementation, but
563	      which would typically result, under degraded conditions (such as
564	      packet loss, congestion, high control plane load), in a comparably
565	      lower increase of latency spread more evenly across the streams

567	   o  as shown in Appendix A, the BGP-based approach is particular in
568	      that it removes load from all the PEs (without putting this load
569	      on the upstream PE for a stream); this improvement of background
570	      load can bring improved performance when a PE acts as the upstream
571	      PE for a stream, and thus benefit join latency

573	   This qualitative comparison of approaches shows that the BGP-based
574	   approach is designed for a smoother degradation of latency under
575	   degraded conditions such as packet loss, congestion, or high control
576	   plane load.  On the other hand, the PIM-based approaches seem to
577	   structurally be able to reach the shorter "best-case" group join
578	   latency (especially compared to deployment of the BGP-based approach
579	   where route-reflectors are used).

581	   Doing a quantitative comparison of latencies is not possible without
582	   referring to specific implementations and benchmarking procedures,
583	   and would possibly expose different conclusions, especially for best-
584	   case group join latency, which is expected to vary with
585	   PIM and BGP implementations.  We can also note that improving a BGP
586	   implementation for reduced latency of route processing would not only
587	   benefit multicast VPN group join latency, but the whole BGP-based
588	   routing, which means that the need for good BGP/RR performance is not
589	   specific to multicast VPN routing.

591	   Last, C-multicast join latency will be impacted by the overall load
592	   put on the control plane, and the scalability of the C-multicast
593	   routing approach is thus to be taken into account.  As explained in
594	   Section 3.3.1 and Appendix A, the BGP-based approach will
595	   provide the best scalability with an increased number of PEs per VPN,
596	   thereby benefiting group join latency in such higher scale scenarios.

598	3.3.7.  Conclusion on C-multicast routing

600	   The first and fourth approaches are relevant contenders for
601	   C-multicast routing.  Comparisons from a theoretical standpoint
602	   identify some advantages as well as possible drawbacks in the
603	   fourth approach.  Comparisons from a practical standpoint are harder
604	   to make: since only reduced deployment and implementation information
605	   is available for the fourth approach, advantages would be seen in the
606	   first approach that has been applied through multiple deployments and
607	   shown to be operationally viable.

609	   Moreover, the first mechanism (full per-MVPN PIM peering across an
610	   MI-PMSI) is the mechanism used by [I-D.rosen-vpn-mcast] and therefore
611	   it is deployed and operating in MVPNs today.  The fourth approach may
612	   or may not end up being preferred for a given deployment, but because
613	   the first approach has been in deployment for some time, the support
614	   for this mechanism will in any case be helpful to facilitate an
615	   eventual migration from a deployment using a mechanism close to the
616	   first approach.

618	   Consequently, at the present time, implementations are recommended to
619	   support both the fourth (BGP-based) and first (Full per-MVPN PIM
620	   peering) mechanisms.  Further experience on deployments of the fourth
621	   approach is needed before some best practice can be defined.  In the
622	   meantime, this recommendation would enable service providers to
623	   choose between the first and the fourth mechanism, without this
624	   choice being constrained by vendors' implementation choices.

626	3.4.  Encapsulation techniques for P-multicast trees

628	   In this section the authors will not make any restricting
629	   recommendations since the appropriateness of a specific provider core
630	   data plane technology will depend on a large number of factors, for
631	   example the service provider's currently deployed unicast data plane,
632	   many of which are service provider specific.

634	   However, implementations should not unreasonably restrict the data
635	   plane technology that can be used, and should not force the use of
636	   the same technology for different VPNs attached to a single PE.
637	   Initial implementations may only support a reduced set of
638	   encapsulation techniques and data plane technologies but this should
639	   not be a limiting factor that hinders future support for other
640	   encapsulation techniques, data plane technologies or
641	   interoperability.

643	   Section 5.2.4.1 of [RFC4834] states "In a multicast VPN solution
644	   extending a unicast L3 PPVPN solution, consistency in the tunneling
645	   technology has to be favored: such a solution SHOULD allow the use of
646	   the same tunneling technology for multicast as for unicast.
647	   Deployment consistency, ease of operation and potential migrations
648	   are the main motivations behind this requirement."

650	   Current unicast VPN deployments use a variety of LDP, RSVP-TE and
651	   GRE/IP-Multicast for encapsulating customer packets for transport
652	   across the provider core of VPN services.  In order to allow the same
653	   encapsulations to be used for unicast and multicast VPN traffic, it
654	   is recommended that multicast VPN standards recommend that
655	   implementations support, for multicast VPNs, all the P2MP variants
656	   of the encapsulations and signaling protocols that they support for
657	   unicast and for which some multipoint extension is defined, such as
658	   mLDP, P2MP RSVP-TE and GRE/IP-multicast.

660	   All three of the above encapsulation techniques support the building
661	   of P2MP multicast P-tunnels.  In addition mLDP and GRE/
662	   IP-ASM-Multicast implementations may also support the building of
663	   MP2MP multicast P-tunnels.  The use of MP2MP P-tunnels may provide
664	   some scaling benefits to the service provider as only a single MP2MP
665	   P-tunnel need be deployed per VPN, thus reducing by an order of
666	   magnitude the amount of multicast state that needs to be maintained
667	   by P routers.  This gain in state is at the expense of bandwidth
668	   optimization, since sites that do not have multicast receivers for
669	   multicast streams sourced behind a said PE group will still receive
670	   packets of such streams, leading to non-optimal bandwidth utilization
671	   across the VPN core.  One thing to consider is that the use of an
672	   MP2MP multicast P-tunnel will require additional configuration to
673	   define the same P-tunnel identifier or multicast ASM group address in
674	   all PEs (it has been noted that some auto-configuration could be
675	   possible for MP2MP P-tunnels, but this is not currently supported by
676	   the auto-discovery procedures). [ It has been noted that C-multicast
677	   routing schemes not covered in [I-D.ietf-l3vpn-2547bis-mcast] could
678	   expose different advantages of MP2MP multicast P-tunnels - this is
679	   out of scope of this document ]
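
   The state comparison above can be made concrete with a small count.
   The Python sketch below assumes, as a baseline, that an MI-PMSI
   built from P2MP P-tunnels uses one P-tunnel rooted at each PE of
   the VPN, and that a P router on the path of all of them holds one
   state entry per P-tunnel.  The function name is illustrative.

```python
def p_tunnels_for_mi_pmsi(n_pes, use_mp2mp):
    """P-tunnels needed to give one VPN any-to-any connectivity.

    Assumption: with P2MP P-tunnels, each PE roots its own tunnel;
    with an MP2MP P-tunnel, a single tunnel serves all PEs.
    """
    if n_pes < 1:
        raise ValueError("a VPN needs at least one PE")
    return 1 if use_mp2mp else n_pes
```

   For a VPN attached to 50 PEs, this gives 50 P2MP P-tunnels against
   a single MP2MP P-tunnel, which is the kind of reduction referred to
   above.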

681	   MVPN services can also be supported over a unicast VPN core through
682	   the use of ingress PE replication whereby the ingress PE replicates
683	   any multicast traffic over the P2P tunnels used to support unicast
684	   traffic.  While this option does not require the service provider to
685	   modify their existing P routers (in terms of protocol support) and
686	   does not require maintaining multicast-specific state on the P
687	   routers in order for the service provider to be able to deploy a
688	   multicast VPN service, the use of ingress PE replication obviously
689	   leads to non-optimal bandwidth utilization and it is therefore
690	   unlikely to be the long term solution chosen by service providers.
691	   However ingress PE replication may be useful during some migration
692	   scenarios or where a service provider considers the level of
693	   multicast traffic on their network to be too low to justify deploying
694	   multicast specific support within their VPN core.
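
   The bandwidth cost of ingress PE replication can be illustrated
   with simple arithmetic.  The Python sketch below assumes that all
   copies leave the ingress PE over the same core-facing link, and
   that a P2MP P-tunnel sends a single copy on that link; the function
   and parameter names are illustrative.

```python
def ingress_uplink_load(stream_mbps, receiver_pes, ingress_replication):
    """Load (Mb/s) that one stream puts on the ingress PE uplink."""
    if receiver_pes < 0:
        raise ValueError("receiver_pes must be non-negative")
    if receiver_pes == 0:
        return 0.0  # nothing is sent when no remote PE has receivers
    # Ingress replication sends one unicast-tunneled copy per remote PE;
    # a P2MP P-tunnel sends one copy that is replicated in the core.
    copies = receiver_pes if ingress_replication else 1
    return float(stream_mbps * copies)
```

   Under these assumptions, a 5 Mb/s stream with receivers behind 20
   remote PEs loads the ingress PE uplink with 100 Mb/s when ingress
   replication is used, against 5 Mb/s with a P2MP P-tunnel.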

696	   All proposed approaches for control plane and dataplane can be used
697	   to provide aggregation amongst multicast groups within a VPN and
698	   amongst different multicast VPNs, and potentially reduce the amount
699	   of state to be maintained by P routers.  However, the latter
700	   (aggregation amongst different multicast VPNs) requires support for
701	   upstream-assigned labels on the PEs.  Support for upstream-assigned
702	   labels may require changes to the data plane processing of the PEs
703	   and this should be taken into consideration by service providers
704	   considering the use of aggregate PMSI tunnels for the specific
705	   platforms that the service provider has deployed.

707	3.5.  Inter-AS deployment options

709	   There are a number of scenarios that lead to the requirement for
710	   inter-AS multicast VPNs, including:

712	   1.  a service provider may have a large network that they have
713	       segmented into a number of ASs.

715	   2.  a service provider's multicast VPN may consist of a number of ASs
716	       due to acquisitions and mergers with other service providers.

718	   3.  a service provider may wish to interconnect their multicast VPN
719	       platform with that of another service provider.

721	   The first scenario can be considered the "simplest" because the
722	   network is wholly managed by a single service provider under a single
723	   strategy and is therefore likely to use a consistent set of
724	   technologies across each AS.

726	   The second scenario may be more complex than the first because the
727	   strategy and technology choices made for each AS may have been
728	   different due to their differing history and the service provider may
729	   not have unified (or may be unwilling to unify) the strategy and
730	   technology choices for each AS.

732	   The third scenario is the most complex because in addition to the
733	   complexity of the second scenario, the ASs are managed by different
734	   service providers and therefore may be subject to a different trust
735	   model than the other scenarios.

737	   Section 5.2.6 of [RFC4834] states that "a solution MUST support
738	   inter-AS multicast VPNs, and SHOULD support inter-provider multicast
739	   VPNs", "considerations about coexistence with unicast inter-AS VPN
740	   Options A, B and C (as described in section 10 of [RFC4364]) are
741	   strongly encouraged" and "a multicast VPN solution SHOULD provide
742	   inter-AS mechanisms requiring the least possible coordination between
743	   providers, and keep the need for detailed knowledge of providers'
744	   networks to a minimum - all this being in comparison with
745	   corresponding unicast VPN options".

747	   Section 8 of [I-D.ietf-l3vpn-2547bis-mcast] addresses these
748	   requirements by proposing two approaches for MVPN inter-AS
749	   deployments:

751	   1.  Non-segmented inter-AS tunnels where the multicast tunnels are
752	       end-to-end across ASes, so even though the PEs belonging to a
753	       given MVPN may be in different ASs the ASBRs play no special role
754	       and function merely as P routers (described in section 8.1).

756	   2.  Segmented inter-AS tunnels where each AS constructs its own
757	       separate multicast tunnels which are then 'stitched' together by
758	       the ASBRs (described in section 8.2).

760	   Section 5.2.6 of [RFC4834] also states "Within each service provider
761	   the service provider SHOULD be able on its own to pick the most
762	   appropriate tunneling mechanism to carry (multicast) traffic among
763	   PEs (just like what is done today for unicast)".  The segmented
764	   approach is the only one capable of meeting this requirement.

766	   The segmented inter-AS solution would appear to offer the largest
767	   degree of deployment flexibility to operators.  However the non-
768	   segmented inter-AS solution can simplify deployment in a restricted
769	   number of scenarios and [I-D.rosen-vpn-mcast] only supports the non-
770	   segmented inter-AS solution and therefore the non-segmented inter-AS
771	   solution is likely to be useful to some operators for backward
772	   compatibility and during migration from [I-D.rosen-vpn-mcast] to
773	   [I-D.ietf-l3vpn-2547bis-mcast].

775	   The applicability of segmented or non-segmented inter-AS tunnels to a
776	   given deployment or inter-provider interconnect will depend on a
777	   number of factors specific to each service provider.  However, due to
778	   the additional deployment flexibility offered by segmented inter-AS
779	   tunnels, it is the recommendation of the authors that all
780	   implementations should support the segmented inter-AS model.
781	   Additionally, the authors recommend that implementations should
782	   consider supporting the non-segmented inter-AS model in order to
783	   facilitate co-existence with existing deployments, and as a feature
784	   to provide a lighter engineering in a restricted set of scenarios,
785	   although it is recognized that initial implementations may only
786	   support one or the other.

788	3.6.  Bidir-PIM support

790	   In Bidir-PIM, the packet forwarding rules have been improved over
791	   PIM-SM, allowing traffic to be passed up the shared tree toward the
792	   RP Address (RPA).  To avoid multicast packet looping, Bidir-PIM uses
793	   a mechanism called the designated forwarder (DF) election, which
794	   establishes a loop-free tree rooted at the RPA.  Use of this method
795	   ensures that only one copy of every packet will be sent to an RPA,
796	   even if there are parallel equal cost paths to the RPA.  To avoid
797	   loops the DF election process enforces a consistent view of the DF on
798	   all routers on a network segment, and during periods of ambiguity or
799	   routing convergence the traffic forwarding is suspended.

801	   In the context of a multicast VPN solution, a solution for Bidir-PIM
802	   support must preserve this property of similarly avoiding packet
803	   loops, including in the case where mVRFs in a given MVPN don't have
804	   a consistent view of the routing to C-RPL/C-RPA.

806	   The current MVPN specifications [I-D.ietf-l3vpn-2547bis-mcast] in
807	   section 11, define three methods to support Bidir-PIM, as RECOMMENDED
808	   in [RFC4834]:

810	   1.  Standard DF election procedure over an MI-PMSI

812	   2.  VPN Backbone as the RPL (section 11.1)

814	   3.  Partitioned Sets of PEs (section 11.2)

816	   Method (1) is naturally applied to deployments using "Full per-MVPN
817	   PIM peering across an MI-PMSI" for C-multicast routing, but as
818	   indicated in [I-D.ietf-l3vpn-2547bis-mcast] in section 11, the DF
819	   Election may not work well in an MVPN environment and an alternative
820	   to DF election would be desirable.

822	   The advantage of methods (2) and (3) is that they do not require
823	   running the DF election procedure among PEs.

825	   Method (2) leverages the fact that in Bidir-PIM, running the DF
826	   election procedure is not needed on the RPL.  This approach thus has
827	   the benefit of simplicity of implementation, especially in a context
828	   where BGP-based C-multicast routing is used.  However it has the
829	   drawback of putting constraints on how Bidir-PIM is deployed which
830	   may not always match MVPN customers' requirements.

832	   Method (3) treats an MVPN as a collection of sets of multicast VRFs,
833	   all PEs in a set having the same reachability information towards
834	   C-RPA, but distinct from PEs in other sets.  Hence, with this method,
835	   C-Bidir packet loops in MVPN are resolved by the ability to partition
836	   a VPN into disjoint sets of VRFs, each having a distinct view of the
837	   converged network.  The partitioning approach to Bidir-PIM requires
838	   either upstream-assigned MPLS labels (to denote the partition) or a
839	   unique MP2MP LSP per partition.  The former is based on PE
840	   Distinguisher Labels that have to be distributed using auto-discovery
841	   BGP routes and their handling requires the support for upstream
842	   assigned labels and context label lookups [RFC5331].  The latter,
843	   using MP2MP LSP per partition, does not have these constraints but is
844	   restricted to P-tunnel types supporting MP2MP connectivity (such as
845	   mLDP [I-D.ietf-mpls-ldp-p2mp]).

847	   This approach to C-Bidir can work with PIM-based or BGP-based
848	   C-multicast routing procedures, and is also generic in the sense that
849	   it does not impose any requirements on the Bidir-PIM service
850	   offering.
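
   The partitioning idea of method (3) can be sketched as follows.
   The Python code below assumes that each PE's "view" of the routing
   toward C-RPA can be summarized by a single value (for example the
   PE it selects as upstream toward C-RPA), and that each C-Bidir
   packet carries the identity of its partition, whether through a PE
   Distinguisher Label or through the MP2MP LSP it arrives on.  The
   function names and the data model are illustrative only.

```python
from collections import defaultdict


def partition_pes(view_toward_rpa):
    """Group PEs into disjoint sets sharing the same view of C-RPA.

    view_toward_rpa maps each PE name to its view (assumed here to be
    the PE it selects as upstream toward C-RPA).
    """
    partitions = defaultdict(set)
    for pe, view in view_toward_rpa.items():
        partitions[view].add(pe)
    return dict(partitions)


def accepts(pe, packet_partition, view_toward_rpa):
    """A PE only accepts C-Bidir packets from its own partition."""
    return view_toward_rpa[pe] == packet_partition
```

   Since a PE discards packets tagged with another partition, two sets
   of PEs with inconsistent views of C-RPA cannot forward the same
   packet to each other, which is how loops are avoided without a DF
   election among PEs.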

852	   Given the above considerations, method (3) "Partitioned Sets of PEs"
853	   is the RECOMMENDED approach.

855	   In the event where method (3) is not applicable (lack of support for
856	   upstream assigned labels or for a P-tunnel type providing MP2MP
857	   connectivity), then method (1) "Standard DF election procedure over
858	   an MI-PMSI" and (2) "VPN Backbone as the RPL" are RECOMMENDED as
859	   interim solutions, (1) having the advantage over (2) of not putting
860	   constraints on how Bidir-PIM is deployed and the drawbacks of only
861	   being applicable when PIM-based C-multicast is used and of possibly
862	   not working well in an MVPN environment.

864	4.  Co-located RPs

866	   Section 5.1.10.1 of [RFC4834] states "In the case of PIM-SM in ASM
867	   mode, engineering of the RP function requires the deployment of
868	   specific protocols and associated configurations.  A service provider
869	   may offer to manage customers' multicast protocol operation on their
870	   behalf.  This implies that it is necessary to consider cases where a
871	   customer's RPs are out-sourced (e.g. on PEs).  Consequently, a VPN
872	   solution MAY support the hosting of the RP function in a VR or VRF."
873	   However, customers who have already deployed multicast within their
874	   networks and have therefore already deployed their own internal RPs
875	   are often reluctant to hand over the control of their RPs to their
876	   service provider and make use of a co-located RP model, and providing
877	   RP co-location on a PE will require the activation of MSDP or the
878	   processing of PIM Registers on the PE.  Securing the PE routers for
879	   such activity requires special care, additional work, and will likely
880	   rely on specific features to be provided by the routers themselves.

882	   The applicability of the co-located RP model to a given MVPN will
883	   thus depend on a number of factors specific to each customer and
884	   service provider.

886	   It is therefore the recommendation that implementations should
887	   support a co-located RP model, but that support for a co-located RP
888	   model within an implementation should not restrict deployments to
889	   using a co-located RP model: implementations MUST support deployments
890	   when activation of a PIM RP function (PIM Register processing and RP-
891	   specific PIM procedures) or VRF MSDP instance is not required on any
892	   PE router and where all the RPs are deployed within the customers'
893	   networks or CEs.

895	5.  Existing deployments

897	   Some suggestions provided in this document can be used to
898	   incrementally modify currently deployed implementations without
899	   hindering these deployments, and without hindering the consistency of
900	   the standardized solution by providing optional per-VRF configuration
901	   knobs to support modes of operation compatible with currently
902	   deployed implementations, while at the same time using the
903	   recommended approach on implementations supporting the standard.

905	   In cases where this may not be easily achieved, a recommended
906	   approach would be to provide a per-VRF configuration knob that allows
907	   incremental per-VPN migration of the mechanisms used by a PE device,
908	   which would allow migration with some per-VPN interruption of service
909	   (e.g. during a maintenance window).

911	   Mechanisms allowing "live" migration by providing concurrent use of
912	   multiple alternatives for a given PE and a given VPN are not seen as
913	   a priority considering the expected implementation complexity
914	   associated with such mechanisms.  However, if there happen to be
915	   cases where they could be viably implemented relatively simply, such
916	   mechanisms may help improve migration management.

918	6.  Summary of recommendations

920	   The following list summarizes conclusions on the mechanisms that
921	   define the set of mandatory to implement mechanisms in the context of
922	   [I-D.ietf-l3vpn-2547bis-mcast].

924	   Note well that the implementation of the non-mandatory alternative
925	   mechanisms is not precluded.

927	   Recommendations are:

929	   o  that BGP-based auto-discovery be the mandated solution for auto-
930	      discovery;

932	   o  that BGP be the mandated solution for S-PMSI switching signaling;

934	   o  that implementations support both the BGP-based and the full per-
935	      MVPN PIM peering solutions for PE-PE exchange of customer
936	      multicast routing until further operational experience is gained
937	      with both solutions;

939	   o  that implementations use the "Partitioned Sets of PEs" approach
940	      for Bidir-PIM support;

942	   o  that implementations implement the P2MP variants of the P2P
943	      protocols that they already implement, such as mLDP, P2MP RSVP-TE
944	      and GRE/IP-Multicast;

946	   o  that implementations support segmented inter-AS tunnels and
947	      consider supporting non-segmented inter-AS tunnels (in order to
948	      maintain backwards compatibility and for migration);

950	   o  implementations MUST support deployments when activation of a PIM
951	      RP function (PIM Register processing and RP-specific PIM
952	      procedures) or VRF MSDP instance is not required on any PE router.

954	7.  IANA Considerations

956	   This document makes no request to IANA.

958	   [ Note to RFC Editor: this section may be removed on publication as
959	   an RFC. ]

961	8.  Security Considerations

963	   This document does not by itself raise any particular security
964	   considerations.

966	9.  Acknowledgements

968	   We would like to thank Adrian Farrel, Eric Rosen, Yakov Rekhter, and
969	   Maria Napierala for their feedback that helped shape this document.

971	   Additional credit is due to Maria Napierala for co-authoring
972	   Section 3.6 on Bidir-PIM support.

974	10.  References

976	10.1.  Normative References

978	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
979	              Requirement Levels", BCP 14, RFC 2119, March 1997.

981	   [I-D.ietf-l3vpn-2547bis-mcast]
982	              Aggarwal, R., Bandi, S., Cai, Y., Morin, T., Rekhter, Y.,
983	              Rosen, E., Wijnands, I., and S. Yasukawa, "Multicast in
984	              MPLS/BGP IP VPNs", draft-ietf-l3vpn-2547bis-mcast-08 (work
985	              in progress), March 2009.

987	   [I-D.ietf-l3vpn-2547bis-mcast-bgp]
988	              Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
989	              Encodings and Procedures for Multicast in MPLS/BGP IP
990	              VPNs", draft-ietf-l3vpn-2547bis-mcast-bgp-08 (work in
991	              progress), September 2009.

993	10.2.  Informative References

995	   [RFC4834]  Morin, T., "Requirements for Multicast in L3 Provider-
996	              Provisioned Virtual Private Networks (PPVPNs)", RFC 4834,
997	              April 2007.

999	   [I-D.rosen-vpn-mcast]
1000	              Cai, Y., Rosen, E., and I. Wijnands, "Multicast in MPLS/
1001	              BGP IP VPNs", draft-rosen-vpn-mcast-12 (work in progress),
1002	              August 2009.

1004	   [I-D.raggarwa-l3vpn-2547-mvpn]
1005	              Aggarwal, R., "Base Specification for Multicast in BGP/
1006	              MPLS VPNs", draft-raggarwa-l3vpn-2547-mvpn-00 (work in
1007	              progress), June 2004.

1009	   [I-D.ietf-pim-sm-linklocal]
1010	              Atwood, J., "Authentication and Confidentiality in PIM-SM
1011	              Link-local Messages", draft-ietf-pim-sm-linklocal-08 (work
1012	              in progress), November 2007.

1014	   [I-D.ietf-pim-port]
1015	              Farinacci, D., Wijnands, I., Venaas, S., and M. Napierala,
1016	              "A Reliable Transport Mechanism for PIM",
1017	              draft-ietf-pim-port-01 (work in progress), July 2009.

1019	   [RFC4684]  Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,
1020	              R., Patel, K., and J. Guichard, "Constrained Route
1021	              Distribution for Border Gateway Protocol/MultiProtocol
1022	              Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual
1023	              Private Networks (VPNs)", RFC 4684, November 2006.

1025	   [I-D.ietf-mpls-ldp-p2mp]
1026	              Minei, I., Kompella, K., Wijnands, I., and B. Thomas,
1027	              "Label Distribution Protocol Extensions for Point-to-
1028	              Multipoint and Multipoint-to-Multipoint Label Switched
1029	              Paths", draft-ietf-mpls-ldp-p2mp-08 (work in progress),
1030	              October 2009.

1032	   [RFC5331]  Aggarwal, R., Rekhter, Y., and E. Rosen, "MPLS Upstream
1033	              Label Assignment and Context-Specific Label Space",
1034	              RFC 5331, August 2008.

1036	Appendix A.  Scalability of C-multicast routing processing load

1038	   The main role of multicast routing is to let routers determine that
1039	   they should start or stop forwarding a given multicast stream on a
1040	   given link.  In an MVPN context, this has to be done for each MVPN,
1041	   and the associated function is thus named "customer-multicast
1042	   routing" or "C-multicast routing" and its role is to let PE routers
1043	   determine that they should start or stop forwarding the traffic of a
1044	   given multicast stream toward the remote PEs, on some PMSI tunnel.

1046	   When some "join" message is received by a PE, this PE knows that it
1047	   should be sending traffic for the corresponding multicast group of
1048	   the corresponding MVPN.  But the reception of a "prune" message from
1049	   a remote PE is not enough by itself for a PE to know that it should
1050	   stop forwarding the corresponding multicast traffic: it has to make
1051	   sure that there aren't any other PEs that still have receivers for
1052	   this traffic.

1054	   There are many ways that the "C-multicast routing" building block can
1055	   be designed, and they differ, among other things, in how a PE
1056	   determines when it can stop forwarding a given multicast stream
1057	   toward other PEs:

1059	   PIM LAN Procedures, by default
1060	      By default when PIM LAN procedures are used, when a PE on a LAN
1061	      Prunes itself from a multicast tree, all other PEs on that LAN
1062	      check their own state to know if they are on the tree, in which
1063	      case they send a PIM Join message on that LAN to override the
1064	      Prune.  Thus, for each PIM Prune message, all PE routers on the
1065	      LAN work to let the upstream PE determine the answer to the "did
1066	      the last receiver leave?" question.

1068	   PIM LAN Procedures, with explicit tracking
1069	      On a LAN, PIM LAN procedures can use an "explicit tracking"
1070	      approach, where a PE which is the upstream router for a multicast
1071	      stream maintains an updated list of all neighbors on the LAN who
1072	      are joined to the tree.  Thus, when it receives a Leave message
1073	      from a PIM neighbor, it instantly knows the answer to the "did the
1074	      last receiver leave?" question.
1075	      In this case, the question is answered by the upstream router
1076	      alone.  The side effect of this "explicit tracking" is that "Join
1077	      suppression" is not used: the downstream PEs will always send
1078	      Joins toward the upstream PE, which will have to process them all.

1080	   BGP-based C-multicast routing
1081	      When BGP-based procedures are used for C-multicast routing, if no
1082	      BGP route reflector is used, the "did the last receiver leave?"
1083	      question is answered like in the PIM "explicit tracking" approach.
1084	      But, when a BGP route reflector is used (which is expected to be
1085	      the recommended approach), the role of maintaining an updated list
1086	      of the PEs that are part of a given multicast tree is taken care
1087	      of by the Route Reflector(s).  Using BGP procedures, the route
1088	      reflector that had advertised a C-multicast Source Tree Join
1089	      route for a given (C-S, C-G) to other route reflectors before will
1090	      withdraw this route when none of its client PEs is advertising
1091	      this route anymore.  Similarly, a route reflector that had
1092	      advertised this route to its client PEs before will withdraw
1093	      this route when none of its (other) client PEs and none of its
1094	      route reflector peers is advertising this route
1095	      anymore.  In this context, the "did the last receiver leave?"
1096	      question can be said to be answered by the route-reflector(s).
1097	      Furthermore, the BGP route distribution can leverage more than one
1098	      route reflector: if multiple route reflectors are used with PEs
1099	      being distributed (as clients) among these route reflectors, the
1100	      "did the last receiver leave?" question is partly answered by each
1101	      of these route reflectors.
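
As an illustration only, the withdrawal rule described above for route
reflectors can be sketched as follows.  The function names and the
representation of advertisers as Python sets are assumptions made for
this sketch; they are not taken from the BGP procedures themselves.

```python
def advertise_to_rr_peers(client_advertisers):
    """An RR keeps the route advertised to other RRs while at least
    one of its client PEs advertises it."""
    return len(client_advertisers) > 0


def advertise_to_client(client, client_advertisers, rr_peer_advertisers):
    """An RR keeps the route advertised to a client PE while another
    client PE, or an RR peer, still advertises it."""
    other_clients = set(client_advertisers) - {client}
    return bool(other_clients) or bool(rr_peer_advertisers)


# Hypothetical PE and RR names, for illustration only.
print(advertise_to_rr_peers({"PE1", "PE2"}))          # True
print(advertise_to_rr_peers(set()))                   # False
print(advertise_to_client("PE1", {"PE1"}, set()))     # False
print(advertise_to_client("PE1", {"PE1"}, {"RR2"}))   # True
```

In this sketch, the "did the last receiver leave?" question is
answered when the first function returns False on the RR(s) serving the
receiver-connected PEs.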

1103	   We can see that answering the "last receiver leaves" question is a
1104	   significant proportion of the work that the C-multicast routing
1105	   building block has to do, and is where the approaches differ most.
1106	   The different approaches for handling C-multicast routing can result
1107	   in different amounts of processing and in different spreads of it
1108	   among the different functions.  These differences can be better
1109	   estimated by quantifying the amount of message processing and state
1110	   maintenance.

1112	   Though the types of processing, messages, and states may vary with the
1113	   different approaches, we propose here a rough estimation of the load
1114	   of PEs, in terms of number of messages processed and number of
1115	   control plane states maintained.  A "message processed" is a
1116	   message that is parsed, with a lookup done and some action
1117	   taken (such as updating a control plane or data plane state).  A
1118	   "state maintained" is a multicast state kept in the control plane
1119	   memory of a PE, related to an interface or a PE being subscribed to a
1120	   multicast stream.  Note that here we don't compare the data plane
1121	   states on PE routers, which wouldn't vary between the different
1122	   options chosen.

1124	A.1.  Scalability with an increased number of PEs

1126	   The following sections aim at evaluating the processing and state
1127	   maintenance load for an increasingly high number of PEs in a VPN.

1129	A.1.1.  SSM Scenario

1131	   The following subsections do such an estimation for each proposed
1132	   approach for C-multicast routing, for different phases of the
1133	   following scenario:

1135	   o  one SSM multicast stream is considered

1137	   o  only the intra-AS case is considered (with the segmented inter-AS
1138	      tunnels and BGP-based C-multicast routing, #MVPN_PE and #R_PE
1139	      should refer to the PEs of the MVPN in the AS, not to all PEs of
1140	      the MVPN)

1142	   o  the scenario is as follows:

1144	      *  one PE joins the multicast stream (because a new receiver-
1145	         connected site has sent a Join on the PE-CE link), followed by
1146	         a number of additional PEs that also join the same multicast
1147	         stream, one after the other; we evaluate the processing
1148	         required for the addition of each PE

1150	      *  some period of time T passes, without any PE joining or leaving
1151	         (baseline)

1153	      *  all PEs leave, one after the other, until the last one leaves;
1154	         we evaluate the processing required for the leave of each PE

1156	   o  the parameters used are:

1158	      *  #MVPN_PE: the number of PEs in the MVPN

1160	      *  #R_PE: the number of PEs joining the multicast stream

1162	      *  #RR: the number of route reflectors

1164	      *  T_PIM_r: the time between two refreshes of a PIM Join (default
1165	         is 60s)

1167	   The estimation unit used is the "message.equipment" (or "m.e"): one
1168	   "message.equipment" corresponds to "one equipment processing one
1169	   message" (10 m.e being "10 equipments each processing one message",
1170	   or "5 messages each processed by 2 equipments", or "1 message
1171	   processed by 10 equipments", etc.).  Similarly, for the amount of
1172	   control plane state, the unit used is "state.equipment" or "s.e".
1173	   This accounts for the fact that a message (or a state) can be
1174	   processed (or maintained) by more than one node.

1176	   We distinguish three different types of equipments: the upstream PE
1177	   for the considered multicast stream, the RR (if any), and the other
1178	   PEs (which are not the upstream PE).

1180	   The numbers or orders of magnitude given in the tables in the
1181	   following subsections are totals across all equipments of the same
1182	   type, for each type of equipment, in the "m.e" and "s.e" units
1183	   defined above.

1185	   Additionally:

1187	   o  for PIM, only Join and Prune messages are counted:

1189	      *  the load due to PIM Hellos can be easily computed separately
1190	         and only depends on the number of PEs in the VPN;

1192	      *  message processing related to the PIM Assert mechanism is also
1193	         not taken into account, for the sake of simplicity;

1195	   o  for BGP, all advertisements and withdrawals of C-multicast Source
1196	      Tree Join routes are considered (Source-Active autodiscovery
1197	      routes are not used in an SSM context); and, following the
1198	      recommendation of [I-D.ietf-l3vpn-2547bis-mcast-bgp], the case
1199	      where the RT-Constraint mechanism [RFC4684] is not used is not
1200	      covered.

1202	A.1.1.1.  PIM LAN procedures, by default

1204	   +------------+------------+---------------+----------+--------------+
1205	   |            | upstream   | other PEs     | RR       | total across |
1206	   |            | PE (1)     | (total across | (none)   | all          |
1207	   |            |            | (#mvpn_PE-1)  |          | equipments   |
1208	   |            |            | PEs)          |          |              |
1209	   +------------+------------+---------------+----------+--------------+
1210	   | first PE   | 1 m.e      | #MVPN_PE-1    | /        | #MVPN_PE m.e |
1211	   | joins      |            | m.e           |          |              |
1212	   +------------+------------+---------------+----------+--------------+
1213	   | for *each* | 1 m.e      | #mvpn_PE-1    | /        | #mvpn_PE m.e |
1214	   | additional |            | m.e           |          |              |
1215	   | PE joining |            |               |          |              |
1216	   +------------+------------+---------------+----------+--------------+
1217	   | baseline   | T/T_PIM_r  | (T/T_PIM_r) . | /        | (T/T_PIM_r)  |
1218	   | processing | m.e        | (#mvpn_PE-1)  |          | x #mvpn_PE   |
1219	   | over a     |            | m.e           |          | m.e          |
1220	   | period T   |            |               |          |              |
1221	   +------------+------------+---------------+----------+--------------+
1222	   | for *each* | 2 m.e      | 2(#mvpn_PE-1) | /        | 2 x #mvpn_PE |
1223	   | PE leaving |            | m.e           |          | m.e          |
1224	   +------------+------------+---------------+----------+--------------+
1225	   | the last   | 1 m.e      | #mvpn_PE-1    | /        | #mvpn_PE m.e |
1226	   | PE leaves  |            | m.e           |          |              |
1227	   +------------+------------+---------------+----------+--------------+
1228	   | total for  | #R_PE x 2  | (#mvpn_PE-1)  | 0        | #mvpn_PE x ( |
1229	   | #R_PE PEs  | +          | x (#R_PE) x 2 |          | 3 x #R_PE +  |
1230	   |            | T/T_PIM_r  | + T/T_PIM_r)  |          | T/T_PIM_r )  |
1231	   |            | m.e        | .             |          | m.e          |
1232	   |            |            | (#mvpn_PE-1)  |          |              |
1233	   |            |            | m.e           |          |              |
1234	   +------------+------------+---------------+----------+--------------+
1235	   | total      | 1 s.e      | #R_PE s.e     | 0        | #R_PE+1 s.e  |
1236	   | state      |            |               |          |              |
1237	   | maintained |            |               |          |              |
1238	   +------------+------------+---------------+----------+--------------+

1240	    Message processing and state maintenance - PIM LAN procedures, by
1241	                                  default

1243	   We suppose here that the PIM Join suppression and Prune Override
1244	   mechanisms are fully effective, i.e. that a Join or Prune message
1245	   sent by a PE is instantly seen by other PEs.  Strictly speaking, this
1246	   is not true: depending on network delays and timing, there could
1247	   be cases where more messages are exchanged, so the numbers given in
1248	   this table are a lower bound on the number of PIM messages exchanged.
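
The per-event rows of the table above can be summed with a short sketch
such as the one below, under the same assumption of fully effective
Join suppression and Prune Override.  The function and parameter names
are illustrative.  Summing the per-event rows gives #MVPN_PE x (3 x
#R_PE - 1 + T/T_PIM_r), which is close to the #MVPN_PE x (3 x #R_PE +
T/T_PIM_r) given in the "total" row.

```python
def pim_default_total(mvpn_pe, r_pe, t, t_pim_r=60):
    """Total load in m.e across all PEs, PIM LAN procedures by default."""
    joins = r_pe * mvpn_pe                # every PE processes each Join
    baseline = (t / t_pim_r) * mvpn_pe    # one Join refresh per T_PIM_r
    leaves = (r_pe - 1) * 2 * mvpn_pe     # 2 messages per non-last leave
    last_leave = mvpn_pe                  # 1 message for the last leave
    return joins + baseline + leaves + last_leave


# Illustrative values: 100 PEs, 10 receiver PEs, T = 600s.
print(pim_default_total(mvpn_pe=100, r_pe=10, t=600))   # 3900.0
```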

1250	A.1.1.2.  PIM LAN procedures, with explicit tracking

1252	   +-------------+-------------+----------------+--------+-------------+
1253	   |             | upstream PE | other PEs      | RRs    | total       |
1254	   |             | (1)         | (total across  | (none) | across all  |
1255	   |             |             | (#mvpn_PE-1)   |        | equipments  |
1256	   |             |             | PEs)           |        |             |
1257	   +-------------+-------------+----------------+--------+-------------+
1258	   | first PE    | 1 m.e       | 1 m.e (see     | /      | 2 m.e       |
1259	   | joins       |             | note below)    |        |             |
1260	   +-------------+-------------+----------------+--------+-------------+
1261	   | for *each*  | 1 m.e       | 1 m.e (see     | /      | 2 m.e       |
1262	   | additional  |             | note below)    |        |             |
1263	   | PE joining  |             |                |        |             |
1264	   +-------------+-------------+----------------+--------+-------------+
1265	   | baseline    | (T/T_PIM_r) | (T/T_PIM_r)    | /      | (T/T_PIM_r) |
1266	   | processing  | x #R_PE m.e | m.e (see note  |        | x #R_PE m.e |
1267	   | over a      |             | below)         |        |             |
1268	   | period T    |             |                |        |             |
1269	   +-------------+-------------+----------------+--------+-------------+
1270	   | for *each*  | 1 m.e       | 1 m.e (see     | /      | 2 m.e       |
1271	   | PE leaving  |             | note below)    |        |             |
1272	   +-------------+-------------+----------------+--------+-------------+
1273	   | the last PE | 1 m.e       | 1 m.e (see     | /      | 2 m.e       |
1274	   | leaves      |             | note below)    |        |             |
1275	   +-------------+-------------+----------------+--------+-------------+
1276	   | total for   | #R_PE (2 +  | #R_PE x ( 2 +  | 0      | #R_PE x ( 4 |
1277	   | #R_PE PEs   | T/T_PIM_r)  | T/T_PIM_r) m.e |        | +           |
1278	   |             | m.e         |                |        | T/T_PIM_r)  |
1279	   |             |             |                |        | m.e         |
1280	   +-------------+-------------+----------------+--------+-------------+
1281	   | total state | #R_PE s.e   | #R_PE s.e      | 0      | 2 x #R_PE   |
1282	   | maintained  |             |                |        | s.e         |
1283	   +-------------+-------------+----------------+--------+-------------+

1285	   Message processing and state maintenance - PIM LAN procedures, with
1286	                             explicit tracking

1288	   Note: in this explicit tracking mode, a given Join or Leave message
1289	   requires processing only by the upstream PE and the PE sending the
1290	   message; indeed, other PEs don't have any action to take.  It is to
1291	   be noted though that these other PEs will still have to parse the PIM
1292	   message, which is not zero processing.  We make here the assumption
1293	   that this is not significant.

1295	A.1.1.3.  BGP-based C-multicast routing

1297	   The following analysis assumes that BGP Route Reflectors (RRs) are
1298	   used, and no hierarchy of RRs (recall that the analysis also assumes
1299	   that Route Target Constrain mechanisms are used).

1301	   Given these assumptions, a message carrying a C-multicast route from
1302	   a downstream PE would need to be processed by the RRs that have that
1303	   PE as their client.  Due to the use of RT Constrain, these RRs would
1304	   then send this message to only the RRs that have the upstream PE as
1305	   client.  None of the other RRs, and none of the other PEs will
1306	   receive this message.  Thus, for a message associated with a given
1307	   MVPN the total number of RRs that would need to process this message
1308	   only depends on the number of RRs that maintain C-multicast routes
1309	   for that MVPN and that have either the receiver-connected PE, or the
1310	   source-connected PE as their clients, and is independent of the total
1311	   number of RRs or the total number of PEs.

1313	   In practice for a given MVPN a PE would be a client of just 2 RRs
1314	   (for redundancy, an RR cluster would typically have 2 RRs).
1315	   Therefore, in practice the message would need to be processed by at
1316	   most 4 RRs (2 RRs if both the downstream PE and the upstream PE are
1317	   the clients of the same RRs).  Thus the number of RRs that have to
1318	   process a given message is at most 4.  Since RRs in different RR
1319	   clusters have a full IBGP mesh among themselves, each RR in the RR
1320	   cluster that contains the upstream PE would receive the message from
1321	   each of the RRs in the RR cluster that contains the downstream PE.
1322	   Given 2 RRs per cluster, the total number of messages processed by
1323	   all the RRs is 6.
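
The count of 6 can be reproduced with the following sketch.  It assumes,
as in the text, a full IBGP mesh between RRs of different clusters and
RT Constrain limiting propagation to the RRs serving the upstream PE;
the function name and parameters are illustrative, and any exchange
between RRs of the same cluster is ignored.

```python
def rr_message_count(rrs_per_cluster=2, same_cluster=False):
    """Messages processed by RRs for one C-multicast route update."""
    # Each RR of the downstream PE's cluster receives the route from the PE.
    from_pe = rrs_per_cluster
    if same_cluster:
        # Upstream and downstream PEs are clients of the same RRs.
        return from_pe
    # Each RR of the upstream PE's cluster receives the route from each
    # RR of the downstream PE's cluster.
    between_clusters = rrs_per_cluster * rrs_per_cluster
    return from_pe + between_clusters


print(rr_message_count())                    # 6
print(rr_message_count(same_cluster=True))   # 2
```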

1325	   Additionally, as soon as there is a receiver-connected PE in each RR
1326	   cluster, the number of RRs processing a C-multicast route tends
1327	   quickly toward 2 (taking into account that a PE peering to RRs will
1328	   be made redundant).

1330	   +------------+----------+--------------+-------------+---------------+
1331	   |            | upstream | other PEs    | RRs (#RR)   | total across  |
1332	   |            | PE (1)   | (total       |             | all           |
1333	   |            |          | across       |             | equipments    |
1334	   |            |          | (#mvpn_PE-1) |             |               |
1335	   |            |          | PEs)         |             |               |
1336	   +------------+----------+--------------+-------------+---------------+
1337	   | first PE   | 2 m.e    | 2 m.e        | 6 m.e       | 10 m.e        |
1338	   | joins      |          |              |             |               |
1339	   +------------+----------+--------------+-------------+---------------+
1341	   | for *each* | 0        | 2 m.e        | (at most) 6 | (at most) 8   |
1342	   | additional |          |              | m.e tending | m.e tending   |
1343	   | PE joining |          |              | toward 2    | toward 4 m.e  |
1344	   |            |          |              | m.e         |               |
1345	   +------------+----------+--------------+-------------+---------------+
1346	   | baseline   | 0        | 0            | 0           | 0             |
1347	   | processing |          |              |             |               |
1348	   | over a     |          |              |             |               |
1349	   | period T   |          |              |             |               |
1350	   +------------+----------+--------------+-------------+---------------+
1351	   | for *each* | 0        | 2 m.e        | (at most) 6 | (at most) 8   |
1352	   | PE leaving |          |              | m.e tending | m.e tending   |
1353	   |            |          |              | toward 2    | toward 4 m.e  |
1354	   +------------+----------+--------------+-------------+---------------+
1355	   | the last   | 2 m.e    | 2 m.e        | 6 m.e       | 10 m.e        |
1356	   | PE leaves  |          |              |             |               |
1357	   +------------+----------+--------------+-------------+---------------+
1358	   | total for  | 4 m.e    | #R_PE x 4    | (at most) 6 | at most 2 (5  |
1359	   | #R_PE PEs  |          | m.e          | x #R_PE     | x #R_PE + 2)  |
1360	   |            |          |              | m.e         | m.e (tending  |
1361	   |            |          |              | (tending    | toward 2 (3   |
1362	   |            |          |              | toward 2 x  | #R_PE + 2)    |
1363	   |            |          |              | #R_PE m.e)  | m.e )         |
1364	   +------------+----------+--------------+-------------+---------------+
1365	   | total      | 2 s.e    | #R_PE s.e    | approx. 2   | approx. 3     |
1366	   | state      |          |              | #R_PE + #RR | #R_PE + #RRx  |
1367	   | maintained |          |              | x #clusters | #clusters + 2 |
1368	   |            |          |              | s.e         | s.e           |
1369	   +------------+----------+--------------+-------------+---------------+

1371	      Message processing and state maintenance - BGP-based procedures

1373	A.1.1.4.  Side by side orders of magnitude comparison

1375	   This section concludes the previous subsections by considering the
1376	   orders of magnitude when the number of PEs in a VPN increases.

1378	   +------------+----------------------+--------------+----------------+
1379	   |            | PIM LAN Procedures,  | PIM LAN      | BGP-based      |
1380	   |            | default              | Procedures,  |                |
1381	   |            |                      | explicit     |                |
1382	   |            |                      | tracking     |                |
1383	   +------------+----------------------+--------------+----------------+
1384	   | first PE   | O(#MVPN_PE)          | O(1)         | O(1)           |
1385	   | joins (in  |                      |              |                |
1386	   | m.e)       |                      |              |                |
1387	   +------------+----------------------+--------------+----------------+
1388	   | for *each* | O(#MVPN_PE)          | O(1)         | O(1)           |
1389	   | additional |                      |              |                |
1390	   | PE joining |                      |              |                |
1391	   | (in m.e)   |                      |              |                |
1392	   +------------+----------------------+--------------+----------------+
1393	   | baseline   | (T/T_PIM_r) x        | (T/T_PIM_r)  | 0              |
1394	   | processing | O(#mvpn_PE)          | x O(#R_PE)   |                |
1395	   | over a     |                      |              |                |
1396	   | period T   |                      |              |                |
1397	   | (in m.e)   |                      |              |                |
1398	   +------------+----------------------+--------------+----------------+
1399	   | for *each* | O(#MVPN_PE)          | O(1)         | O(1)           |
1400	   | PE leaving |                      |              |                |
1401	   | (in m.e)   |                      |              |                |
1402	   +------------+----------------------+--------------+----------------+
1403	   | the last   | O(#MVPN_PE)          | O(1)         | O(1)           |
1404	   | PE leaves  |                      |              |                |
1405	   | (in m.e)   |                      |              |                |
1406	   +------------+----------------------+--------------+----------------+
1407	   | total for  | O(#MVPN_PE x #R_PE)  | O(#R_PE) x   | O(#R_PE)       |
1408	   | #R_PE PEs  | + O(#MVPN_PE x       | (T/T_PIM_r)  |                |
1409	   | (in m.e)   | T/T_PIM_r)           |              |                |
1410	   +------------+----------------------+--------------+----------------+
1411	   | states (in | O(#R_PE)             | O(#R_PE)     | O(#R_PE)       |
1412	   | s.e)       |                      |              |                |
1413	   +------------+----------------------+--------------+----------------+
1414	   | notes      | (processing and      | (processing  | (processing    |
1415	   |            | state maintenance    | and state    | and state      |
1416	   |            | are essentially done | maintenance  | maintenance is |
1417	   |            | by, and spread       | is           | essentially    |
1418	   |            | amongst, the PEs of  | essentially  | done by, and   |
1419	   |            | the MVPN ;           | done on the  | spread         |
1420	   |            | non-upstream PEs     | upstream PE) | amongst, the   |
1421	   |            | have processing to   |              | RRs)           |
1422	   |            | do)                  |              |                |
1423	   +------------+----------------------+--------------+----------------+

1425	    Comparison of orders of magnitude for messages processing and state
1426	                maintenance (totals across all equipments)

1428	   The conclusions that can be drawn from the above are that:

1430	   o  the PIM LAN Procedures default approach is particular in that any
1431	      message will be processed by all PEs, including those that are
1432	      neither upstream nor downstream for the message, which results in
1433	      a total amount of messages to process which is in O(#MVPN_PE x
1434	      #R_PE), i.e., O(#MVPN_PE ^ 2) if the proportion of receiver PEs
1435	      is considered constant when the number of PEs increases;

1437	   o  the two PIM-based approaches refresh Join messages; this is
1438	      a linear factor not changing the order of magnitude, but which can
1439	      be significant for long-lived streams;

1441	   o  the BGP-based approach requires an amount of message processing in
1442	      O(#R_PE), lower than the two other approaches, and which is
1443	      independent of the duration of streams;

1445	   o  state maintenance is of the same order of magnitude for all
1446	      approaches: O(#R_PE), but the distribution is different:

1448	      *  the PIM LAN Procedure default approach fully spreads, and
1449	         minimizes, the amount of state (one state per PE)

1451	      *  the PIM LAN procedure with explicit tracking concentrates all
1452	         state on the upstream PE

1454	      *  the BGP-based procedures spread all the state on the set of
1455	         route reflectors
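
To make these orders of magnitude concrete, the sketch below evaluates
the "total across all equipments" expressions taken from the three
tables of the previous subsections.  The function name, the dictionary
keys, and the sample values are assumptions for illustration; the BGP
figure is the "at most" bound.

```python
def total_load(mvpn_pe, r_pe, t, t_pim_r=60):
    """Total m.e across all equipments, per approach (table totals)."""
    refreshes = t / t_pim_r
    return {
        "pim_default": mvpn_pe * (3 * r_pe + refreshes),
        "pim_explicit_tracking": r_pe * (4 + refreshes),
        "bgp_at_most": 2 * (5 * r_pe + 2),
    }


# Doubling the number of PEs with a constant 10% of receiver PEs.
small = total_load(mvpn_pe=100, r_pe=10, t=600)
large = total_load(mvpn_pe=200, r_pe=20, t=600)
for approach in small:
    print(approach, small[approach], large[approach])
```

With these sample values, the PIM default total grows from 4000 to
14000 m.e, while the two other totals roughly double.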

1457	A.1.2.  ASM Scalability

1459	   When PIM-SM is used in a VPN and an ASM multicast group is joined by
1460	   some PEs (#R_PEs) with some sources sending toward this multicast
1461	   group address, we can note the following:

1463	   PEs will generally have to maintain one shared tree, plus one source
1464	   tree for each source sending toward G; each tree resulting in an
1465	   amount of processing and state maintenance similar to what is
1466	   described in the scenario in Appendix A.1.1, with the same
1467	   differences in order of magnitudes between the different approaches
1468	   when the number of PEs is high.

1470	   An exception to this is when, for a given group in a VPN, among the
1471	   PIM instances in the customer routers and VRFs, none would switch to
1472	   the SPT (SwitchToSptDesired always false): in that case the
1473	   processing and state maintenance load is the one required for
1474	   maintenance of the shared tree only.  It has to be noted that this
1475	   scenario is dependent on customer policy.  To compare the resulting
1476	   load in that case, between PIM-based approaches and the BGP-based
1477	   approach configured to use inter-site shared trees, the scenario
1478	   in Appendix A.1.1 can be used with #R_PEs joining a (C-*,C-G) ASM
1479	   group instead of an SSM group, and the same differences in order of
1480	   magnitude remain true.  In the case of the BGP-based approach used
1481	   without inter-site shared trees, we must take into account the load
1482	   resulting from the fact that to build the C-PIM shared tree, each PE
1483	   has to join the Source Tree to each source; using the notations of
1484	   Appendix A.1.1 this adds an amount of load (total load across all
1485	   equipments) which is proportional to #R_PEs and the number of
1486	   sources; the order of magnitude with an increasing number of PEs is
1487	   thus unchanged, and the differences in order of magnitude also remain
1488	   the same.

1490	   In addition to the maintenance of trees, PEs have to ensure some
1491	   processing and state maintenance related to individual sources
1492	   sending to a multicast group; the related procedures and behaviors
1493	   may differ largely depending on which C-multicast routing protocol
1494	   is used, how it is configured, how multicast source discovery
1495	   mechanisms are used in the customer VPN, and which SwitchToSptDesired
1496	   policy is used.  However, the following can be observed:

1498	   o  when BGP-based C-multicast routing is used, each PE will possibly
1499	      have to process and maintain one BGP Source-Active autodiscovery
1500	      route for (some or all) sources of an ASM group, which results in
1501	      message processing and state maintenance (total across all the
1502	      equipments) linearly dependent on the number of PEs in the VPN
1503	      (#MVPN_PE) for each source, independently of the number of PEs
1504	      joined to the group.  Depending on whether or not inter-site
1505	      shared trees are used, and depending on the SwitchToSptDesired
1506	      policy in the PIM instances in the customer routers and VRFs, and
1507	      depending on the relative locations of sources and RPs, this will
1508	      happen for all (S,G) of an ASM group or only for some of them, and
1509	      will be done in parallel to the maintenance of shared and/or
1510	      source trees or at the first join of a PE on a source tree

1512	   o  when PIM-based C-multicast routing is used, depending on the
1513	      SwitchToSptDesired policy in the PIM instances in the customer
1514	      routers and VRFs, and depending on the relative locations of
1515	      sources and RPs, there are:

1517	      *  possible control plane state transitions triggered by the
1518	         reception of (S,G) packets ; such events would induce
1519	         processing on all PEs joined to G

1521	      *  possible control plane state transitions triggered by the
1522	         reception of (S,G) packets, and possible PIM Assert messages
1523	         specific to (S,G); this would induce message processing on
1524	         each PE of the VPN for each PIM Assert message

1526	   Given the above, the additional processing that may happen for each
1527	   individual source sending to the group, beyond the maintenance of
1528	   source and shared trees, does not change the orders of magnitude
1529	   identified above.

1531	A.2.  Cost of PEs leaving and joining

1533	   The quantification of message processing in Appendix A.1.1 is done
1534	   based on a use case where each PE with receivers has joined and left
1535	   once.  Drawing scalability-related conclusions for other patterns of
1536	   changes of the set of receiver-connected PEs can be done by
1537	   considering the cost of each approach for "a new PE joining" and "a
1538	   PE leaving".

1540	   For the "PIM LAN Procedure default" approach, in the case of a single
1541	   SSM or SPT tree, the total amount of message processing across all
1542	   nodes depends linearly on the number of PEs in the VPN, when a PE
1543	   joins such a tree.  When "PIM LAN Procedures with explicit tracking"
1544	   are used, the amount of processing is independent of the number of
1545	   PEs.

1547	   For the "BGP-based" approach:

1549	   o  In the case of a single SSM tree, the total amount of message
1550	      processing across all nodes is independent of the number of PEs,
1551	      for "a new PE" joining and "a PE leaving"; it also depends on how
1552	      Route Reflectors are meshed, but not with linear dependency.

1554	   o  In the case of an SPT tree for an ASM group, BGP has additional
1555	      processing due to possible Source-Active autodiscovery routes:

1557	      *  when BGP-based C-multicast routing is used with inter-site
1558	         shared trees, for the first PE joining (and last PE leaving) a
1559	         given SPT, the processing of the corresponding Source-Active
1560	         autodiscovery routes results in a processing cost linearly
1561	         dependent on the number of PEs in the VPN; for subsequent PEs
1562	         joining (and non-last PE leaving) there is no processing due to
1563	         advertisement or withdrawal of Source-Active autodiscovery
1564	         routes

1566	      *  when BGP-based C-multicast routing is used without inter-site
1567	         shared trees, the processing of Source-Active autodiscovery
1568	         routes for an (S,G), happens independently of PEs joining and
1569	         leaving the SPT for (S,G).

1571	   In the case of a new PE having to join a shared tree for an
1572	   ASM group G, we see the following:

1574	   o  the processing due to the PE joining the shared tree itself is the
1575	      same as the processing required to setup an SSM tree, as described
1576	      before (note that this does not happen when BGP-based C-multicast
1577	      routing is used without inter-site shared trees)

1579	   o  for each source for which the PE joins the SPT, the resulting
1580	      processing cost is the same as one SPT tree, as described before ;

1582	      *  the conditions under which a PE will join the SPT for a given
1583	         (C-S, C-G) are the same between the BGP-based with inter-
1584	         site shared tree approach and the PIM-based approach, and
1585	         depend solely on the SwitchToSptDesired policy in the PIM
1586	         instances in the customer routers in the sites connected to the
1587	         PE and/or in the VRF

1589	      *  the conditions under which a PE will join the SPT for a said
1590	         (C-S, C-G) differ between the BGP-based without inter-site
1591	         shared trees approach and the PIM-based approach

1593	      *  the SPT for a said (S,G) can be joined by the PE in the
1594	         following cases:

1596	         +  as soon as one router, or the VPN VRF on the PE, has
1597	            SwitchToSptDesired(S,G) being true

1599	         +  when BGP-based routing is used, and configured to not use
1600	            inter-site shared trees

1602	      *  said differently, the only case where the PE will not join the
1603	         SPT for (S,G) is when all routers in the sites of the VPN
1604	         connected to the PE, or the VPN VRF itself, will never have
1605	         SwitchToSptDesired(S,G) being true, with the additional
1606	         condition when BGP-based C-multicast routing is used, that
1607	         inter-site shared trees are used
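
The conditions listed above can be condensed into a small predicate.
This is only a sketch: the function and argument names are hypothetical,
and "switch_to_spt_desired" stands for "at least one router in the
sites connected to the PE, or the VRF itself, has
SwitchToSptDesired(S,G) true".

```python
def pe_joins_spt(switch_to_spt_desired, bgp_based, inter_site_shared_trees):
    """Return True if the PE can join the SPT for (S,G)."""
    if switch_to_spt_desired:
        # Applies to the PIM-based and the BGP-based approaches alike.
        return True
    # Without inter-site shared trees, BGP-based routing joins the
    # source tree regardless of the SwitchToSptDesired policy.
    return bgp_based and not inter_site_shared_trees


print(pe_joins_spt(False, bgp_based=False, inter_site_shared_trees=False))  # False
print(pe_joins_spt(False, bgp_based=True, inter_site_shared_trees=True))    # False
print(pe_joins_spt(False, bgp_based=True, inter_site_shared_trees=False))   # True
```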

1609	   Thus, when one PE joins a group G to which n sources are sending
1610	   traffic, we note the following with regard to the dependency of the
1611	   cost (in total amount of processing across all equipments) on the
1612	   number of PEs:

1614	   o  in the general case (where any router in the site of the VPN
1615	      connected to the PE, or the VRF itself, may have
1616	      SwitchToSptDesired(S,G) being true):

1618	      *  for the "PIM LAN Procedure default" approach, the cost is
1619	         linearly dependent on the number of PEs in the VPN, and
1620	         linearly dependent on the number of sources

1622	      *  for the "PIM LAN Procedures with explicit tracking" approach,
1623	         the cost is linearly dependent on the number of sources and
1624	         independent of the number of PEs in the VPN

1626	      *  for the "BGP-based" approach, the cost is linearly dependent on
1627	         the number of sources and, in the sub-case of the BGP-based
1628	         approach used with inter-site shared trees, is also dependent
1629	         on the number of PEs in the VPN only if the PE is the first to
1630	         join the group or the SPT for some source sending to the group

1632	   o  otherwise, assuming that the policy function
1633	      SwitchToSptDesired(S,G) can never be true on the routers in the
1634	      sites of the VPN connected to the PE, or on the VPN VRF itself:

1636	      *  in the case of the PIM-based approaches, the cost is linearly
1637	         dependent on the number of PEs in the VPN, and there is no
1638	         dependency on the number of sources

1640	      *  in the case of the BGP-based approach with inter-site shared
1641	         trees, the cost is linearly dependent on the number of RRs, and
1642	         there is no dependency on the number of sources

1644	      *  in the case of the BGP-based approach without inter-site shared
1645	         trees, the cost is linearly dependent on the number of RRs and
1646	         on the number of sources
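
   The dependencies enumerated in the list above can be summarized as a
   lookup.  The sketch below is illustrative only: the function name,
   the approach labels and the returned strings are hypothetical, and
   the function merely transcribes the bullets above without
   quantifying any cost.

```python
PIM_DEFAULT = "pim-default"
PIM_EXPLICIT = "pim-explicit-tracking"
BGP_SHARED = "bgp-with-inter-site-shared-trees"
BGP_NO_SHARED = "bgp-without-inter-site-shared-trees"


def cost_dependencies(approach: str,
                      spt_switch_possible: bool,
                      first_to_join: bool = False) -> set:
    """Return what the cost of one PE joining group G depends on.

    Hypothetical transcription of the list above; the result is a set
    drawn from {"PEs", "sources", "RRs"}.
    """
    if approach not in (PIM_DEFAULT, PIM_EXPLICIT, BGP_SHARED, BGP_NO_SHARED):
        raise ValueError("unknown approach: %r" % (approach,))

    if spt_switch_possible:
        # General case: SwitchToSptDesired(S,G) may be true somewhere.
        if approach == PIM_DEFAULT:
            return {"PEs", "sources"}
        if approach == PIM_EXPLICIT:
            return {"sources"}
        deps = {"sources"}
        # Sub-case stated in the list: first PE to join the group or SPT.
        if approach == BGP_SHARED and first_to_join:
            deps.add("PEs")
        return deps

    # SwitchToSptDesired(S,G) can never be true.
    if approach in (PIM_DEFAULT, PIM_EXPLICIT):
        return {"PEs"}
    if approach == BGP_SHARED:
        return {"RRs"}
    return {"RRs", "sources"}
```

   For instance, cost_dependencies(PIM_DEFAULT, True) yields both "PEs"
   and "sources", whereas cost_dependencies(PIM_EXPLICIT, True) yields
   "sources" only.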

1648	   Hence, with the PIM default approach the overall cost across all
1649	   equipment of any PE joining an ASM group G is always dependent on
1650	   the number of PEs (same for a PE that leaves), while the BGP-based
1651	   and PIM explicit tracking approaches have a cost independent of the
1652	   number of PEs (with the exception of the first PE joining the ASM
1653	   group, for the BGP-based approach used without inter-site shared
1654	   trees; in that case there is a dependency on the number of PEs).

1656	   Regarding the dependency on the number of sources: without making any
1657	   assumption on the SwitchToSptDesired policy on PIM routers and VRFs
1658	   of a VPN, we see that a PE joining an ASM group may induce a
1659	   processing cost linearly dependent on the number of sources.  Apart
1660	   from this general case, under the condition where
1661	   SwitchToSptDesired is always false on all PIM routers and VRFs of the
1662	   VPN, then with the PIM-based approach, and with the BGP-based
1663	   approach used with inter-site shared trees, the cost in amount of
1664	   messages processed will be independent of the number of sources
1665	   (note that this condition depends on customer policy).

1667	Appendix B.  Switching to S-PMSI

1669	   [ the following point was fixed in version 07 of
1670	   [I-D.ietf-l3vpn-2547bis-mcast], and is here for reference only ]

1672	   Section 7.2.2.3 of [I-D.ietf-l3vpn-2547bis-mcast] proposes two
1673	   approaches for how a source PE can decide when to start transmitting
1674	   customer multicast traffic on an S-PMSI:

1676	   1.  The source PE sends multicast packets for the <C-S, C-G> on both
1677	       the I-PMSI P-multicast tree and the S-PMSI P-multicast tree
1678	       simultaneously for a pre-configured period of time, letting the
1679	       receiver PEs select the new tree for reception, before switching
1680	       to only the S-PMSI.

1682	   2.  The source PE waits for a pre-configured period of time after
1683	       advertising the <C-S, C-G> entry bound to the S-PMSI before fully
1684	       switching the traffic onto the S-PMSI-bound P-multicast tree.
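
   The difference between the two alternatives can be pictured as a
   timeline of which P-multicast tree carries the <C-S, C-G> traffic.
   The sketch below is illustrative only: the function and parameter
   names are hypothetical, time is an abstract number, and it assumes
   that in the first alternative the dual-transmission period starts
   when the S-PMSI binding is advertised.

```python
def trees_in_use(alternative: int, t: float,
                 t_advertise: float, period: float) -> set:
    """Return the trees carrying <C-S, C-G> traffic at time t.

    alternative 1: send on both trees for `period`, then S-PMSI only.
    alternative 2: keep sending on the I-PMSI for `period` after the
                   binding is advertised, then switch to the S-PMSI.
    Assumption (not from the source): in both cases the pre-configured
    period is counted from t_advertise.
    """
    if alternative not in (1, 2):
        raise ValueError("alternative must be 1 or 2")
    if t < t_advertise:
        return {"I-PMSI"}
    if t < t_advertise + period:
        if alternative == 1:
            return {"I-PMSI", "S-PMSI"}  # traffic is sent twice
        return {"I-PMSI"}
    return {"S-PMSI"}
```

   With t_advertise=10 and period=3, the first alternative reports both
   trees at t=11, while the second reports only the I-PMSI until t=13
   and only the S-PMSI from then on.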

1686	   The first alternative has essentially two drawbacks:

1688	   o  <C-S,C-G> traffic is sent twice for some period of time, which
1689	      would appear to be at odds with the motivation for switching to an
1690	      S-PMSI in order to optimize the bandwidth used by the multicast
1691	      tree for that stream.

1693	   o  It is unlikely that the switchover can occur without packet loss
1694	      or duplication if the transit delays of the I-PMSI P-multicast
1695	      tree and the S-PMSI P-multicast tree differ.

1697	   By contrast, the second alternative has none of these drawbacks, and
1698	   satisfies the requirement in section 5.1.3 of [RFC4834], which states
1699	   that "[...] a multicast VPN solution SHOULD as much as possible
1700	   ensure that client multicast traffic packets are neither lost nor
1701	   duplicated, even when changes occur in the way a client multicast
1702	   data stream is carried over the provider network".  The second
1703	   alternative also happens to be the one used in existing deployments.

1705	   For these reasons, it is the authors' recommendation to mandate the
1706	   implementation of the second alternative for switching to S-PMSI.

1708	Authors' Addresses

1710	   Thomas Morin (editor)
1711	   France Telecom - Orange Labs
1712	   2 rue Pierre Marzin
1713	   Lannion  22307
1714	   France

1716	   Email: thomas.morin@orange-ftgroup.com

1718	   Ben Niven-Jenkins (editor)
1719	   BT
1720	   208 Callisto House, Adastral Park
1721	   Ipswich, Suffolk  IP5 3RE
1722	   UK

1724	   Email: benjamin.niven-jenkins@bt.com

1726	   Yuji Kamite
1727	   NTT Communications Corporation
1728	   Tokyo Opera City Tower
1729	   3-20-2 Nishi Shinjuku, Shinjuku-ku
1730	   Tokyo  163-1421
1731	   Japan

1733	   Email: y.kamite@ntt.com

1735	   Raymond Zhang
1736	   BT
1737	   2160 E. Grand Ave.
1738	   El Segundo  CA 90025
1739	   USA

1741	   Email: raymond.zhang@bt.com

1743	   Nicolai Leymann
1744	   Deutsche Telekom
1745	   Goslarer Ufer 35
1746	   10589 Berlin
1747	   Germany

1749	   Email: n.leymann@telekom.de

1750	   Nabil Bitar
1751	   Verizon
1752	   40 Sylvan Road
1753	   Waltham, MA  02451
1754	   USA

1756	   Email: nabil.n.bitar@verizon.com