idnits 2.17.1 

draft-ietf-l3vpn-mvpn-considerations-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 3 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 392 has weird spacing: '...   or   the us...'

  -- The document date (July 10, 2009) is 5394 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'RFC4364' is mentioned on line 715, but not defined

  == Unused Reference: 'I-D.ietf-l3vpn-2547bis-mcast-bgp' is defined on line
     962, but no explicit reference was found in the text

  == Outdated reference: A later version (-10) exists of
     draft-ietf-l3vpn-2547bis-mcast-08

  == Outdated reference: A later version (-08) exists of
     draft-ietf-l3vpn-2547bis-mcast-bgp-07

  == Outdated reference: A later version (-15) exists of
     draft-rosen-vpn-mcast-11

  == Outdated reference: A later version (-10) exists of
     draft-ietf-pim-sm-linklocal-02

  == Outdated reference: A later version (-09) exists of
     draft-ietf-pim-port-01


     Summary: 1 error (**), 0 flaws (~~), 10 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                      T. Morin, Ed.
3	Internet-Draft                                     France Telecom Orange
4	Intended status: Informational                     B. Niven-Jenkins, Ed.
5	Expires: January 11, 2010                                             BT
6	                                                               Y. Kamite
7	                                                      NTT Communications
8	                                                                R. Zhang
9	                                                                      BT
10	                                                              N. Leymann
11	                                                        Deutsche Telekom
12	                                                                N. Bitar
13	                                                                 Verizon
14	                                                           July 10, 2009

16	    Mandatory Features in a Layer 3 Multicast BGP/MPLS VPN Solution
17	                draft-ietf-l3vpn-mvpn-considerations-04

19	Status of this Memo

21	   This Internet-Draft is submitted to IETF in full conformance with the
22	   provisions of BCP 78 and BCP 79.

24	   Internet-Drafts are working documents of the Internet Engineering
25	   Task Force (IETF), its areas, and its working groups.  Note that
26	   other groups may also distribute working documents as Internet-
27	   Drafts.

29	   Internet-Drafts are draft documents valid for a maximum of six months
30	   and may be updated, replaced, or obsoleted by other documents at any
31	   time.  It is inappropriate to use Internet-Drafts as reference
32	   material or to cite them other than as "work in progress."

34	   The list of current Internet-Drafts can be accessed at
35	   http://www.ietf.org/ietf/1id-abstracts.txt.

37	   The list of Internet-Draft Shadow Directories can be accessed at
38	   http://www.ietf.org/shadow.html.

40	   This Internet-Draft will expire on January 11, 2010.

42	Copyright Notice

44	   Copyright (c) 2009 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents in effect on the date of
49	   publication of this document (http://trustee.ietf.org/license-info).
50	   Please review these documents carefully, as they describe your rights
51	   and restrictions with respect to this document.

53	Abstract

55	   More than one set of mechanisms to support multicast in a layer 3
56	   BGP/MPLS VPN has been defined.  These are presented in the documents
57	   that define them as optional building blocks.

59	   To enable interoperability between implementations, this document
60	   defines a subset of features that is considered mandatory for a
61	   multicast BGP/MPLS VPN implementation.  This will help implementers
62	   and deployers understand which L3VPN multicast requirements are best
63	   satisfied by each option.

65	Requirements Language

67	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
68	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
69	   document are to be interpreted as described in [RFC2119].

71	Table of Contents

73	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
74	   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  4
75	   3.  Examining alternative mechanisms for MVPN functions  . . . . .  4
76	     3.1.  MVPN auto-discovery  . . . . . . . . . . . . . . . . . . .  4
77	     3.2.  S-PMSI Signaling . . . . . . . . . . . . . . . . . . . . .  5
78	     3.3.  PE-PE Transmission of C-Multicast Routing  . . . . . . . .  7
79	       3.3.1.  PE-PE signaling scalability  . . . . . . . . . . . . .  7
80	       3.3.2.  P-routers scalability  . . . . . . . . . . . . . . . .  9
81	       3.3.3.  Impact of C-multicast routing on Inter-AS
82	               deployments  . . . . . . . . . . . . . . . . . . . . . 10
83	       3.3.4.  Security and robustness  . . . . . . . . . . . . . . . 10
84	       3.3.5.  C-multicast VPN join latency . . . . . . . . . . . . . 12
85	       3.3.6.  Conclusion on C-multicast routing  . . . . . . . . . . 13
86	     3.4.  Encapsulation techniques for P-multicast trees . . . . . . 14
87	     3.5.  Inter-AS deployments options . . . . . . . . . . . . . . . 16
88	     3.6.  Bidir-PIM support  . . . . . . . . . . . . . . . . . . . . 17
89	   4.  Co-located RPs . . . . . . . . . . . . . . . . . . . . . . . . 19
90	   5.  Existing deployments . . . . . . . . . . . . . . . . . . . . . 20
91	   6.  Summary of recommendations . . . . . . . . . . . . . . . . . . 20
92	   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 21
93	   8.  Security Considerations  . . . . . . . . . . . . . . . . . . . 21
94	   9.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 21
95	   10. Informative References . . . . . . . . . . . . . . . . . . . . 21
96	   Appendix A.  Scalability of C-multicast routing processing load  . 22
97	     A.1.  PIM LAN procedures, by default . . . . . . . . . . . . . . 26
98	     A.2.  PIM LAN procedures, with explicit tracking . . . . . . . . 27
99	     A.3.  BGP-based  . . . . . . . . . . . . . . . . . . . . . . . . 28
100	     A.4.  Side by side orders of magnitude comparison  . . . . . . . 29
101	   Appendix B.  Switching to S-PMSI . . . . . . . . . . . . . . . . . 31
102	   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 32

104	1.  Introduction

106	   Specifications for multicast in BGP/MPLS
107	   [I-D.ietf-l3vpn-2547bis-mcast] include multiple alternative
108	   mechanisms for some of the required building blocks of the solution.
109	   However, they do not identify which of these mechanisms are mandatory
110	   to implement in order to ensure interoperability.  Not defining a set
111	   of mandatory-to-implement mechanisms leads to a situation where
112	   implementations may support different subsets of the available
113	   optional mechanisms that do not interoperate; this is a problem for
114	   the numerous operators with multi-vendor backbones.

116	   The aim of this document is to leverage the already expressed
117	   requirements [RFC4834] and study the properties of each approach, to
118	   identify mechanisms that are good candidates for being part of a core
119	   set of mandatory mechanisms which can be used to provide a base for
120	   interoperable solutions.

122	   This document goes through the different building blocks of the
123	   solution and concludes on which mechanisms an implementation is
124	   required to implement.  Section 6 summarizes these requirements.

126	   Considering the history of the multicast VPN proposals and
127	   implementations, it is also useful to discuss how existing
128	   deployments of early implementations
129	   [I-D.rosen-vpn-mcast][I-D.raggarwa-l3vpn-2547-mvpn] can be
130	   accommodated, and provide suggestions in this respect.

132	2.  Terminology

134	   Please refer to [I-D.ietf-l3vpn-2547bis-mcast] and [RFC4834].

136	3.  Examining alternative mechanisms for MVPN functions

138	3.1.  MVPN auto-discovery

140	   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
141	   two different mechanisms for MVPN auto-discovery:

143	   1.  BGP-based auto-discovery

145	   2.  "PIM/shared tree" : discovery done through the exchange of PIM
146	       Hellos by C-PIM instances, across an MI-PMSI implemented with one
147	       shared tree per VPN (using multicast ASM, or MP2MP LDP)

149	   Both solutions address Section 5.2.10 of [RFC4834] which states that
150	   "the operation of a multicast VPN solution SHALL be as light as
151	   possible and providing automatic configuration and discovery SHOULD
152	   be a priority when designing a multicast VPN solution.  Particularly
153	   the operational burden of setting up multicast on a PE or for a VR/
154	   VRF SHOULD be as low as possible".

156	   The key consideration is that PIM-based discovery is only applicable
157	   to deployments using a shared tree to instantiate an MI-PMSI (it
158	   is not applicable if only P2P or SSM trees are used because,
159	   contrary to ASM and MP2MP, these P2P or SSM trees cannot be built
160	   before the auto-discovery has been done), whereas the BGP-based
161	   auto-discovery does not place any constraint on the type of multicast
162	   trees that would have to be used.  BGP-based auto-discovery is
163	   independent of the type of P-multicast tree used thus satisfying the
164	   requirement in section 5.2.4.1 of [RFC4834] that "a multicast VPN
165	   solution SHOULD be designed so that control and forwarding planes are
166	   not interdependent".

168	   Additionally, it is to be noted that a number of service providers
169	   have chosen to use SSM-based trees for the default MDTs within their
170	   current deployments, therefore relying already on some BGP-based
171	   auto-discovery.

173	   Moreover, when shared P-tunnels are used, the use of BGP auto-
174	   discovery would allow inconsistencies in the addresses/identifiers
175	   used for the shared trees to be detected (e.g. the same shared tree
176	   identifier being used for different VPNs with distinct BGP route
177	   targets).  This is particularly attractive in the context of inter-AS
178	   VPNs where the impact of any misconfiguration could be magnified and
179	   where a single service provider may not operate all the ASs.  Note
180	   that this technique to detect some misconfiguration cases may not be
181	   usable during a transition period from a shared-tree autodiscovery to
182	   a BGP-based autodiscovery.
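
   The consistency check described above can be illustrated with a
   minimal sketch.  It assumes a simplified, hypothetical view of the
   received auto-discovery routes as (PE, shared tree identifier, route
   targets) tuples; the function name and data layout are illustrative
   only and are not taken from [I-D.ietf-l3vpn-2547bis-mcast].

```python
from collections import defaultdict


def find_shared_tree_conflicts(ad_routes):
    """Return shared-tree identifiers seen with more than one route-target set.

    ad_routes is an iterable of (pe, tree_id, route_targets) tuples.
    This tuple layout is a hypothetical simplification of what a PE
    learns through BGP auto-discovery, used here for illustration.
    """
    rt_sets_by_tree = defaultdict(set)
    for _pe, tree_id, route_targets in ad_routes:
        # frozenset makes each route-target set hashable and order-independent
        rt_sets_by_tree[tree_id].add(frozenset(route_targets))
    # A tree identifier is suspicious when VPNs with distinct
    # route targets advertise it.
    return {
        tree_id: rt_sets
        for tree_id, rt_sets in rt_sets_by_tree.items()
        if len(rt_sets) > 1
    }


routes = [
    ("PE1", "tree-A", {"rt-blue"}),
    ("PE2", "tree-A", {"rt-blue"}),
    ("PE3", "tree-A", {"rt-red"}),   # same tree, different VPN
    ("PE4", "tree-B", {"rt-red"}),
]
conflicts = find_shared_tree_conflicts(routes)
```

   In this example only "tree-A" is reported, since it is advertised
   with two distinct sets of route targets.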

184	   Thus, the recommendation is that implementation of the BGP-based
185	   auto-discovery is mandated and should be supported by all mVPN
186	   implementations.

188	3.2.  S-PMSI Signaling

190	   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
191	   two mechanisms for signaling that multicast flows will be switched to
192	   an S-PMSI :

194	   1.  a UDP-based TLV protocol specifically for S-PMSI signaling
195	       (described in section 7.4.2).

197	   2.  a BGP-based mechanism for S-PMSI signaling (described in section
198	       7.4.1).

200	   Section 5.2.10 of [RFC4834] states that "as far as possible, the
201	   design of a solution SHOULD carefully consider the number of
202	   protocols within the core network: if any additional protocols are
203	   introduced compared with the unicast VPN service, the balance between
204	   their advantage and operational burden SHOULD be examined
205	   thoroughly".  The UDP-based mechanism would be an additional protocol
206	   in the MVPN stack, which isn't the case for the BGP-based S-PMSI
207	   switching signaling, since (a) BGP is identified as a requirement for
208	   autodiscovery, and (b) the BGP-based S-PMSI switching signaling
209	   procedures are very similar to the autodiscovery procedures.

211	   Furthermore, the BGP-based S-PMSI switching signaling mechanism can
212	   be used within MVPNs using either a UI-PMSI or a MI-PMSI while the
213	   UDP-based protocol is restricted to use within MVPNs using an MI-
214	   PMSI.  In practice, this means that, except if shared trees are used,
215	   a PE will have to join to all trees of all PEs in a VPN, while in the
216	   alternative where BGP-based S-PMSI switching signaling is used, it
217	   could delay joining a tree from a PE until traffic from that PE is
218	   needed, thus reducing the amount of state maintained on P routers.

220	   S-PMSI switching signaling approaches can also be compared in an
221	   inter-AS context (see Section 3.5).  The proposed BGP-based approach
222	   for S-PMSI switching signaling provides a good fit with both the
223	   segmented and non-segmented inter-AS approaches (see Section 3.5).
224	   By contrast, the UDP-based approach for S-PMSI switching signaling
225	   appears to be usable with segmented inter-AS tunnels, but in that
226	   case key advantages of the segmented approach are lost:

228	   o  ASes are no longer independent in choosing when S-PMSI tunnels
229	      will be triggered in their AS (and thus in controlling the amount
230	      of state created on their P routers), and with which tunneling
231	      technique they will be built

233	   o  in an inter-AS option B context, an isolation of ASes is obtained
234	      as PEs don't have visibility of, nor exchange with, PEs of other
235	      ASes.  This property can be preserved if the segmented inter-AS
236	      approach and BGP-based S-PMSI switching signaling are used, but it
237	      is not preserved if UDP-based switching signaling is used.

239	   Given all the above, it is the recommendation of the authors that BGP
240	   is the preferred solution for S-PMSI switching signaling and should
241	   be supported by all implementations.

243	   It is identified that, if nothing prevents a fast-paced creation of
244	   S-PMSIs, then S-PMSI switching signaling with BGP could impact the
245	   Route Reflectors used for mVPN routes.  However, it is also
246	   identified that such fast-paced behavior would have an impact on P
247	   and PE routers resulting from S-PMSI tunnel signaling, which will be
248	   the same independently of the S-PMSI signaling approach that is used,
249	   and which is certainly best avoided by setting up proper
250	   mechanisms.

252	   The UDP-based S-PMSI switching signaling protocol can also be
253	   considered, as an option, given that this protocol has been in
254	   deployment for some time.  Implementations supporting both protocols
255	   would be expected to provide a per-VRF configuration knob to allow an
256	   implementation to use the UDP-based TLV protocol for S-PMSI switching
257	   signaling for specific VRFs in order to support the coexistence of
258	   both protocols (for example during migration scenarios).  Apart from
259	   such migration-facilitating mechanisms, the authors specifically do
260	   not recommend extending the already proposed UDP-based TLV protocol
261	   to new types of P-multicast trees.

263	3.3.  PE-PE Transmission of C-Multicast Routing

265	   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
266	   multiple mechanisms for PE-PE transmission of customer multicast
267	   routing information:

269	   1.  Full per-MVPN PIM peering across an MI-PMSI (described in section
270	       3.4.1.1).

272	   2.  Lightweight PIM peering across an MI-PMSI (described in section
273	       3.4.1.2)

275	   3.  The unicasting of PIM C-Join/Prune messages (described in section
276	       3.4.1.3)

278	   4.  The use of BGP for carrying C-Multicast routing (described in
279	       section 3.4.2).

281	3.3.1.  PE-PE signaling scalability

283	   Scalability being one of the core requirements for multicast VPN, it
284	   is useful to compare the proposed C-multicast routing mechanisms from
285	   this perspective : Section 4.2.4 of [RFC4834] recommends that "a
286	   multicast VPN solution SHOULD support several hundreds of PEs per
287	   multicast VPN, and MAY usefully scale up to thousands" and section
288	   4.2.5 states that "a solution SHOULD scale up to thousands of PEs
289	   having multicast service enabled".

291	   Scalability with an increased number of VPNs per PE, or with an
292	   increased amount of multicast state per VPN, is also important, but
293	   is not the focus of this section since we did not identify
294	   differences between the approaches in these respects: all other
295	   things being equal, the load on a PE due to C-multicast routing
296	   increases roughly linearly with the number of VPNs per PE, and with
297	   the amount of multicast state per VPN.

299	   This section presents conclusions related to PE-PE signaling
300	   scalability, based on Appendix A, which provides more detailed
301	   explanations of the differences in how the PIM-based approaches and
302	   the BGP-based approach handle the C-multicast routing load, along
303	   with a quantified evaluation of the amount of state and messages
304	   with the different approaches.  Many points made in this section
305	   are based on the conclusions in Appendix A.4.

307	   At high scales of multicast deployment, the first and third
308	   mechanisms require the PEs to maintain a large number of PIM
309	   adjacencies with other PEs of the same multicast VPN (which implies
310	   the regular exchange of PIM Hellos with each other) and to refresh
311	   C-Join/Prune states, resulting in a processing cost that grows as
312	   the number of PEs increases (as detailed in Appendix A).  The
313	   second approach is less subject to this cost, and the fourth
314	   approach is not subject to it.

316	   The third mechanism would reduce the amount of C-Join/Prune
317	   processing for a given multicast flow for PEs that are not the
318	   upstream neighbor for this flow, but would require "explicit
319	   tracking" state to be maintained by the upstream PE.  It also isn't
320	   compatible with the "Join suppression" mechanism.  A possible way to
321	   reduce the amount of signaling with this approach would be the use of
322	   a PIM refresh-reduction mechanism.  Such a mechanism, based on TCP,
323	   is being specified by the PIM IETF Working Group
324	   ([I-D.ietf-pim-port]); its use in a multicast VPN context has not
325	   been described in [I-D.ietf-l3vpn-2547bis-mcast], but it is expected
326	   that this approach would provide scalability similar to that of the
327	   BGP-based approach without RR.

329	   The second mechanism would operate in a similar manner to full per-
330	   MVPN PIM peering except that PIM Hello messages are not transmitted
331	   and PIM C-Join/Prune refresh-reduction would be used, thereby
332	   improving scalability, but this approach has yet to be fully
333	   described.  In any case, it appears to improve only one of the
334	   factors that affect scalability as the number of PEs
335	   increases.

337	   The first and second mechanisms can leverage the "Join suppression"
338	   behavior and thus improve the processing burden of an upstream PE,
339	   sparing the processing of a Join refresh message for each remote PE
340	   joined to a multicast stream.  This improvement requires all PEs of a
341	   multicast VPN to process all PIM Join and Prune messages sent by any
342	   other PE participating in the same multicast VPN whether they are the
343	   upstream PE or not.

345	   The fourth mechanism (the use of BGP for carrying C-Multicast
346	   routing) would have a comparable drawback of requiring all PEs to
347	   process a BGP C-multicast route of interest only to a specific
348	   upstream PE.  For this reason, the approach can leverage the
349	   Route-Target constraint mechanisms, which specifically allow
350	   only the interested upstream PE to receive a BGP C-multicast route.
351	   When RT-constraints are used the fourth mechanism reduces the total
352	   amount of message processing load put on the PEs for customer
353	   multicast routing to the minimum (by avoiding any processing by
354	   "unrelated" PEs, that are not the joining PE nor the upstream PE, and
355	   by avoiding the use of refreshes), and inherits BGP features that are
356	   expected to improve scalability (for instance, providing a means to
357	   offload some of the processing burden associated with client
358	   multicast routing onto one or many BGP route-reflectors).  This
359	   advantage has a cost (the maintenance of an amount of state linear
360	   with the number of PEs joined to a stream), but when route reflectors
361	   are used, this cost is spread among the route reflectors.
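
   The difference in per-Join processing described above can be
   sketched with a deliberately simplified model.  The function and the
   approach labels below are hypothetical; the model counts only the
   PEs that receive and process one C-multicast Join, ignores
   refreshes and route reflectors, and is not a substitute for the
   evaluation in Appendix A.

```python
def pes_processing_one_join(num_pes_in_vpn, approach):
    """Count the PEs that must process a single C-multicast Join.

    Simplified illustrative model (not from the solution document):
    - "pim-mi-pmsi": the Join is sent over the MI-PMSI, so every other
      PE of the multicast VPN receives and processes it.
    - "bgp-rt-constraint": with Route-Target constraint, only the
      upstream PE receives the BGP C-multicast route.
    Route reflector load is intentionally left out of this count.
    """
    if num_pes_in_vpn < 2:
        raise ValueError("need at least a joining PE and an upstream PE")
    if approach == "pim-mi-pmsi":
        return num_pes_in_vpn - 1
    if approach == "bgp-rt-constraint":
        return 1
    raise ValueError(f"unknown approach: {approach}")


pim_count = pes_processing_one_join(100, "pim-mi-pmsi")
bgp_count = pes_processing_one_join(100, "bgp-rt-constraint")
```

   With 100 PEs in the VPN, this model gives 99 processing PEs for a
   Join sent over the MI-PMSI and 1 for a BGP C-multicast route with
   Route-Target constraint.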

363	   However, the fourth mechanism is specific in that it offers the
364	   possibility of offloading customer multicast routing processing onto
365	   one or more BGP Route Reflector(s).  When this is used, there is a
366	   drawback of increasing the processing load placed on the route
367	   reflector infrastructure.  In the higher scale scenarios, it may be
368	   required to adapt the route reflector infrastructure to the mVPN
369	   routing load by using, for example:

371	   o  a separation of resources for unicast and multicast VPN routing :
372	      using dedicated mVPN Route Reflector(s) (or using dedicated mVPN
373	      BGP sessions or dedicated mVPN BGP instances) ;

375	   o  the deployment of additional route reflector resources, for
376	      example increasing the processing resources on existing route
377	      reflectors or deployment of additional route reflectors.

379	   Among the above, the most straightforward approach is to consider the
380	   introduction of route reflectors dedicated to the mVPN service and
381	   dimension them according to the needs of that service (but doing so
382	   is not required and is left as an operator engineering decision).

384	3.3.2.  P-routers scalability

386	   Mechanisms (1) and (2) are restricted to use within multicast VPNs
387	   that use an MI-PMSI, thereby necessitating:

389	      the use of a P-multicast tree technique that allows shared trees
390	      (for example PIM-SM in ASM mode or MP2MP LDP)

392	   or   the use of one P-multicast tree per PE per VPN, even for PEs
393	      that do not have sources in their directly attached sites for that
394	      VPN.

396	   By comparison, the fourth mechanism doesn't impose either of these
397	   restrictions, and when P2MP trees are used only necessitates the use
398	   of one tree per VPN per PE attached to a site with a multicast source
399	   or RP (or with a candidate BSR, if BSR is used).

401	   In cases where fewer PEs are connected to sources than the total
402	   number of PEs, it reduces the amount of state maintained by
403	   P-routers compared to the amount required to build an MI-PMSI with
404	   P2MP trees.  Such cases are expected to be frequent for multicast VPN
405	   deployments (see section 4.2.4.1 of [RFC4834]).
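
   As a rough illustration of this comparison, the sketch below counts
   the P2MP trees needed for one VPN in each case.  The function name
   and parameters are hypothetical, and "source PEs" stands for PEs
   attached to a site with a multicast source or RP (or a candidate
   BSR, if BSR is used).

```python
def p2mp_trees_for_vpn(total_pes, source_pes, mi_pmsi_required):
    """Number of P2MP P-multicast trees needed for one VPN.

    Illustrative arithmetic only:
    - when an MI-PMSI must be built from P2MP trees, every PE of the
      VPN roots one tree, whether or not it has sources behind it;
    - otherwise only the PEs attached to a source/RP site root a tree.
    """
    if not 0 <= source_pes <= total_pes:
        raise ValueError("source_pes must be between 0 and total_pes")
    return total_pes if mi_pmsi_required else source_pes


with_mi_pmsi = p2mp_trees_for_vpn(50, 3, mi_pmsi_required=True)
without_mi_pmsi = p2mp_trees_for_vpn(50, 3, mi_pmsi_required=False)
```

   For a VPN with 50 PEs of which 3 are connected to sources, this
   gives 50 trees when an MI-PMSI is required and 3 trees otherwise.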

407	3.3.3.  Impact of C-multicast routing on Inter-AS deployments

409	   Furthermore, co-existence with unicast inter-AS VPN options, and an
410	   equal level of security for multicast and unicast including in an
411	   inter-AS context, are specifically mentioned in sections 5.2.6, 5.2.8
412	   and 5.2.12 of [RFC4834].

414	   In an inter-AS option B context, an isolation of ASes is obtained as
415	   PEs don't have visibility of, nor exchange with, PEs of other ASes.
416	   This property can be preserved if the segmented inter-AS approach and
417	   BGP-based C-multicast routing are used, but it is not preserved if
418	   PIM-based signaling is used.

420	   By comparison, the fourth option (the use of BGP for carrying
421	   C-Multicast routing) does not have any of the above limitations
422	   related to inter-AS deployments.

424	   Additionally, the authors note that the proposed BGP-based approach
425	   for C-multicast routing provides a good fit with both the segmented
426	   and non-segmented inter-AS approaches.  By contrast, though the PIM-
427	   based C-multicast routing is usable with segmented inter-AS trees,
428	   the inter-AS scalability advantage of the approach is lost, since PEs
429	   in an AS will see the C-multicast routing activity of all other PEs
430	   of all other ASes.

432	3.3.4.  Security and robustness

434	   BGP supports MD5 authentication of its peers for additional security,
435	   which can directly benefit multicast VPN customer multicast
436	   routing, whether for intra-AS or inter-AS communications.  By
437	   contrast, with a PIM-based approach, no mechanism providing a
438	   comparable level of security to authenticate communications between
439	   remote PEs has yet been fully described
440	   [I-D.ietf-pim-sm-linklocal], and in any case would require
441	   significant additional operations for the provider to be usable in a
442	   multicast VPN context.

444	   The robustness of the infrastructure, especially the existing
445	   infrastructure providing unicast VPN connectivity, is key.  The
446	   C-multicast routing function, especially under load, will compete
447	   with the unicast routing infrastructure.  With the PIM-based
448	   approaches, the unicast and multicast VPN routing functions are
449	   expected to only compete in the PE, for control plane processing
450	   resources.  In the case of the BGP-based approach, they will compete
451	   on the PE for processing resources, and in the route reflectors
452	   (supposing they are used for mVPN routing).  It is identified that in
453	   both cases, mechanisms will be required to arbitrate resources (e.g.
454	   processing priorities).  With PIM-based procedures, arbitration is
455	   needed between the control plane routing instances in the PE.  With
456	   the BGP-based approach, this is likely to require distinct BGP
457	   sessions for multicast and unicast (e.g. through the use of
458	   dedicated mVPN BGP route reflectors, or of a distinct session with
459	   an existing route reflector).

461	   Multicast routing is dynamic by nature, and multicast VPN routing has
462	   to follow the VPN customers multicast routing events.  The different
463	   approaches can be compared on how they are expected to behave in
464	   scenarios where multicast routing in the VPNs is subject to an
465	   intense activity.  Scalability of each approach under such a load is
466	   detailed in Appendix A, and the fourth approach (BGP-based) is the
467	   only one with an O(1) cost for join/leave operations, and with which
468	   state maintenance is not concentrated on the upstream PE.

470	   On the other hand, while the BGP-based approach is likely to suffer a
471	   slowdown under a load that is greater than the available processing
472	   resources (because of possibly congested TCP sockets), the PIM-based
473	   approaches would react to such a load by dropping messages, with
474	   failure-recovery obtained through message refreshes.  Thus, the BGP-
475	   based approach could result in a degradation of join/leave latency
476	   performance typically spread evenly across all multicast streams
477	   being joined in that period, while the PIM-based approach could
478	   result in increased join/leave latency, for some random streams, by a
479	   multiple of the time between refreshes (e.g. tens of seconds), and
480	   possibly in some states the adjacency may time out, resulting in
481	   disruption of multicast streams.
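
   The "multiple of the time between refreshes" mentioned above can be
   made concrete with a small calculation.  The function name is
   hypothetical and the refresh period is a parameter: the value used
   in the example is an illustrative assumption, not a recommendation.

```python
def extra_join_latency_s(consecutive_lost_joins, refresh_period_s):
    """Extra join latency when PIM Joins for a stream are dropped.

    Simplified model: each dropped Join is only repaired by the next
    periodic refresh, so the added delay is a whole multiple of the
    refresh period.
    """
    if consecutive_lost_joins < 0 or refresh_period_s <= 0:
        raise ValueError("invalid loss count or refresh period")
    return consecutive_lost_joins * refresh_period_s


# Illustrative values: two Joins dropped in a row, 60-second refreshes.
delay = extra_join_latency_s(2, 60.0)
```

   With these example values the affected stream sees 120 seconds of
   additional join latency, while streams whose Joins were not dropped
   see none.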

483	   The behavior of the PIM-based approach under such a load is also
484	   harder to predict, given that the performance of the "Join
485	   suppression" mechanism (an important mechanism for this approach to
486	   scale) will itself be impeded by delays in Join processing.  For
487	   these reasons, the BGP-based approach would be able to provide a
488	   smoother degradation and more predictable behavior under a highly
489	   dynamic load.

491	   In fact, both an "evenly spread degradation" and an "unevenly spread
492	   larger degradation" can be problematic, and what seems important is
493	   the ability for the VPN backbone operator to (a) limit the amount of
494	   multicast routing activity that can be triggered by a multicast VPN
495	   customer, and to (b) provide the best possible independence between
496	   distinct VPNs.  It seems that both of these can be addressed through
497	   local implementation improvements, and that both the BGP-based and
498	   PIM-based approaches could be engineered to provide (a) and (b).  It
499	   can be noted though that the BGP approach proposes ways to dampen
500	   C-multicast route withdrawals and/or advertisements, and thus already
501	   describes a way to provide (a), while nothing comparable has yet been
502	   described for the PIM-based approaches (even though it doesn't appear
503	   difficult).  The PIM-based approaches rely on a per VPN dataplane to
504	   carry the mVPN control plane, and thus may benefit from this first
505	   level of separation to solve (b).

507	3.3.5.  C-multicast VPN join latency

509	   Section 5.1.3 of [RFC4834] states that "the group join delay [...] is
510	   also considered one important QoS parameter.  It is thus RECOMMENDED
511	   that a multicast VPN solution be designed appropriately in this
512	   regard".  In a multicast VPN context, the "group join delay" of
513	   interest is the time between a CE sending a PIM Join to its PE and
514	   the first packet of the corresponding multicast stream being received
515	   by the CE.

517	   It is to be noted that the C-multicast routing procedures will only
518	   impact the group join latency of a given multicast stream for the
519	   first receiver that is located across the provider backbone from the
520	   multicast source-connected PE (or the first <n> receivers in the
521	   specific case where a specific UMH selection algorithm is used, that
522	   allows <n> distinct UMH to be selected by distinct downstream PEs).

524	   The different approaches proposed seem to have different
525	   characteristics in how they are expected to impact join latency:

527	   o  the PIM-based approaches minimize the number of control plane
528	      processing hops between a new receiver-connected PE and the
529	      source-connected PE and, being datagram-based, introduce minimal
530	      delay, thereby possibly achieving the best possible join latency,
531	      depending on implementation efficiency

533	   o  under degraded conditions (packet loss, congestion, high control
534	      plane load) the PIM-based approach may impact the latency for a
535	      given multicast stream in an all-or-nothing manner: if a
536	      C-multicast routing PIM Join packet is lost, latency can become
537	      high (a multiple of the periodicity of PIM Join refreshes)

539	   o  the BGP-based approach uses TCP exchanges, which may introduce an
540	      additional delay depending on BGP and TCP implementation, but
541	      which would typically result, under degraded conditions (such as
542	      packet loss, congestion, high control plane load), in a
543	      comparatively lower increase of latency, spread more evenly across
543	      the streams

545	   o  as shown in Appendix A, the BGP-based approach is particular in
546	      that it removes load from all the PEs (without putting this load
547	      on the upstream PE for a stream); this improvement of background
548	      load can bring improved performance when a PE acts as the upstream
549	      PE for a stream, and thus benefit join latency
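
	   The all-or-nothing behaviour described above for the PIM-based
	   approach can be illustrated with a small sketch.  It assumes
	   (this is not stated in the text above) that each Join
	   transmission is lost independently with the same probability,
	   and that a lost Join is only repaired by the next periodic
	   refresh.  The function names are hypothetical, and the 60s
	   default is the PIM Join refresh default quoted in Appendix A.

```python
def extra_delay_after_losses(lost_joins: int, refresh_period_s: float = 60.0) -> float:
    """Extra join delay (seconds) when `lost_joins` consecutive Joins are lost.

    Assumption: a lost Join is only retried at the next periodic refresh,
    so every loss adds one full refresh period.
    """
    if lost_joins < 0:
        raise ValueError("lost_joins must be non-negative")
    return lost_joins * refresh_period_s


def expected_extra_join_delay(loss_probability: float, refresh_period_s: float = 60.0) -> float:
    """Mean extra join delay (seconds) under independent Join losses.

    Assumption: each transmission is lost with probability p, independently.
    The number of losses before the first success is geometric with mean
    p / (1 - p), and each loss costs one refresh period.
    """
    if not 0.0 <= loss_probability < 1.0:
        raise ValueError("loss_probability must be in [0, 1)")
    return refresh_period_s * loss_probability / (1.0 - loss_probability)
```

	   Under these assumptions a single lost Join already adds a full
	   refresh period to the join latency of the affected stream,
	   while streams whose Joins are not lost see no added delay.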

551	   This qualitative comparison of approaches shows that the BGP-based
552	   approach is designed for a smoother degradation of latency under
553	   degraded conditions such as packet loss, congestion, or high control
554	   plane load.  On the other hand, the PIM-based approaches seem to
555	   structurally be able to reach the shorter "best-case" group join
556	   latency (especially compared to deployment of the BGP-based approach
557	   where route-reflectors are used).

559	   Doing a quantitative comparison of latencies is not possible without
560	   referring to specific implementations and benchmarking procedures,
561	   and would possibly expose different conclusions, especially for best-
562	   case group join latency for which performance is expected to vary with
563	   PIM and BGP implementations.  We can also note that improving a BGP
564	   implementation for reduced latency of route processing would not only
565	   benefit multicast VPN group join latency, but the whole BGP-based
566	   routing, which means that the need for good BGP/RR performance is not
567	   specific to multicast VPN routing.

569	   Last, C-multicast join latency will be impacted by the overall load
570	   put on the control plane, and the scalability of the C-multicast
571	   routing approach is thus to be taken into account.  As explained in
572	   Section 3.3.1 and Appendix A, the BGP-based approach will
573	   provide the best scalability with an increased number of PEs per VPN,
574	   thereby benefiting group join latency in such higher scale scenarios.

576	3.3.6.  Conclusion on C-multicast routing

578	   The first and fourth approaches are relevant contenders for
579	   C-multicast routing.  Comparisons from a theoretical standpoint
580	   identify some advantages in the fourth approach, but possible
581	   drawbacks are also identified for this approach.  Comparisons from a
582	   practical standpoint are harder to make, since only limited
583	   deployment and implementation information is available for the fourth
584	   approach, but by default advantages would be seen in the first
585	   approach, which has been applied through multiple deployments and
586	   shown to be operationally viable.

588	   Moreover, the first mechanism (full per-MVPN PIM peering across an
589	   MI-PMSI) is the mechanism used by [I-D.rosen-vpn-mcast] and therefore
590	   it is deployed and operating in MVPNs today.  The fourth approach may
591	   or may not end up being preferred for a given deployment, but because
592	   the first approach has been in deployment for some time, the support
593	   for this mechanism will in any case be helpful to facilitate an
594	   eventual migration from a deployment using a mechanism close to the
595	   first approach.

597	   Consequently, at the present time, implementations are recommended to
598	   support both the fourth (BGP-based) and first (Full per-MVPN PIM
599	   peering) mechanisms.  Further experience on deployments of the fourth
600	   approach is needed before some best practice can be defined.

602	3.4.  Encapsulation techniques for P-multicast trees

604	   In this section the authors will not make any restricting
605	   recommendations since the appropriateness of a specific provider core
606	   data plane technology will depend on a large number of factors, for
607	   example the service provider's currently deployed unicast data plane,
608	   many of which are service provider specific.

610	   However, implementations should not unreasonably restrict the data
611	   plane technology that can be used, and should not force the use of
612	   the same technology for different VPNs attached to a single PE.
613	   Initial implementations may only support a reduced set of
614	   encapsulation techniques and data plane technologies but this should
615	   not be a limiting factor that hinders future support for other
616	   encapsulation techniques, data plane technologies or
617	   interoperability.

619	   Section 5.2.4.1 of [RFC4834] states "In a multicast VPN solution
620	   extending a unicast L3 PPVPN solution, consistency in the tunneling
621	   technology has to be favored: such a solution SHOULD allow the use of
622	   the same tunneling technology for multicast as for unicast.
623	   Deployment consistency, ease of operation and potential migrations
624	   are the main motivations behind this requirement."

626	   Current unicast VPN deployments use a variety of LDP, RSVP-TE and
627	   GRE/IP-Multicast for encapsulating customer packets for transport
628	   across the provider core of VPN services.  In order to allow the same
629	   encapsulations to be used for unicast and multicast VPN traffic, it
630	   is recommended that multicast VPN standards recommend that
631	   implementations support, for multicast VPNs, all the P2MP variants
632	   of the encapsulations and signaling protocols that they support for
633	   unicast and for which some multipoint extension is defined, such as
634	   mLDP, P2MP RSVP-TE and GRE/IP-multicast.

636	   All three of the above encapsulation techniques support the building
637	   of P2MP multicast trees.  In addition mLDP and GRE/IP-ASM-Multicast
638	   implementations may also support the building of MP2MP multicast
639	   trees.  The use of MP2MP trees may provide some scaling benefits to
640	   the service provider as only a single MP2MP tree need be deployed per
641	   VPN, thus reducing by an order of magnitude the amount of multicast
642	   state that needs to be maintained by P routers.  This gain in state
643	   is at the expense of bandwidth optimization, since sites that do not
644	   have multicast receivers for multicast streams sourced behind a given
645	   PE group will still receive packets of such streams, leading to non-
646	   optimal bandwidth utilization across the VPN core.  One thing to
647	   consider is that the use of MP2MP multicast trees will require
648	   additional configuration to define the same tree identifier or
649	   multicast ASM group address in all PEs (it has been noted that some
650	   auto-configuration could be possible for MP2MP trees, but this is
651	   not currently supported by the auto-discovery procedures). [ It has
652	   been noted that C-multicast routing schemes not covered in
653	   [I-D.ietf-l3vpn-2547bis-mcast] could expose different advantages of
654	   MP2MP multicast trees - this is out of scope of this document. ]

656	   MVPN services can also be supported over a unicast VPN core through
657	   the use of ingress PE replication whereby the ingress PE replicates
658	   any multicast traffic over the P2P tunnels used to support unicast
659	   traffic.  While this option does not require the service provider to
660	   modify their existing P routers (in terms of protocol support) and
661	   does not require maintaining multicast-specific state on the P
662	   routers in order for the service provider to be able to deploy a
663	   multicast VPN service, the use of ingress PE replication obviously
664	   leads to non-optimal bandwidth utilization and it is therefore
665	   unlikely to be the long term solution chosen by service providers.
666	   However ingress PE replication may be useful during some migration
667	   scenarios or where a service provider considers the level of
668	   multicast traffic on their network to be too low to justify deploying
669	   multicast specific support within their VPN core.
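
	   The bandwidth cost of ingress PE replication can be sketched as
	   follows.  This is only an illustration: the function name is
	   hypothetical, and the tree case assumes that the ingress PE
	   sends a single copy of each packet into a P-multicast tree,
	   which is replicated further downstream.

```python
def ingress_uplink_mbps(stream_mbps: float, receiver_pes: int,
                        ingress_replication: bool) -> float:
    """Bandwidth the ingress PE sends into the core for one stream.

    With ingress replication, one copy is sent per remote receiver PE
    over the unicast P2P tunnels. With a P-multicast tree, the sketch
    assumes a single copy leaves the ingress PE.
    """
    if receiver_pes < 0:
        raise ValueError("receiver_pes must be non-negative")
    if receiver_pes == 0:
        return 0.0  # nobody to send to
    copies = receiver_pes if ingress_replication else 1
    return stream_mbps * copies
```

	   For example, under these assumptions a 5 Mb/s stream with 20
	   receiver PEs costs 100 Mb/s at the ingress PE with ingress
	   replication, against 5 Mb/s with a tree.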

671	   All proposed approaches for control plane and dataplane can be used
672	   to provide aggregation amongst multicast groups within a VPN and
673	   amongst different multicast VPNs, and potentially reduce the amount
674	   of state to be maintained by P routers.  However, the latter (the
675	   aggregation amongst different multicast VPNs) will require support for
676	   upstream-assigned labels on the PEs.  Support for upstream-assigned
677	   labels may require changes to the data plane processing of the PEs
678	   and this should be taken into consideration by service providers
679	   considering the use of aggregate S-PMSI tunnels for the specific
680	   platforms that the service provider has deployed.

682	3.5.  Inter-AS deployments options

684	   There are a number of scenarios that lead to the requirement for
685	   inter-AS multicast VPNs, including:

687	   1.  a service provider may have a large network that they have
688	       segmented into a number of ASs.

690	   2.  a service provider's multicast VPN may consist of a number of ASs
691	       due to acquisitions and mergers with other service providers.

693	   3.  a service provider may wish to interconnect their multicast VPN
694	       platform with that of another service provider.

696	   The first scenario can be considered the "simplest" because the
697	   network is wholly managed by a single service provider under a single
698	   strategy and is therefore likely to use a consistent set of
699	   technologies across each AS.

701	   The second scenario may be more complex than the first because the
702	   strategy and technology choices made for each AS may have been
703	   different due to their differing history and the service provider may
704	   not have unified (or may be unwilling to unify) the strategy and
705	   technology choices for each AS.

707	   The third scenario is the most complex because in addition to the
708	   complexity of the second scenario, the ASs are managed by different
709	   service providers and therefore may be subject to a different trust
710	   model than the other scenarios.

712	   Section 5.2.6 of [RFC4834] states that "a solution MUST support
713	   inter-AS multicast VPNs, and SHOULD support inter-provider multicast
714	   VPNs", "considerations about coexistence with unicast inter-AS VPN
715	   Options A, B and C (as described in section 10 of [RFC4364]) are
716	   strongly encouraged" and "a multicast VPN solution SHOULD provide
717	   inter-AS mechanisms requiring the least possible coordination between
718	   providers, and keep the need for detailed knowledge of providers'
719	   networks to a minimum - all this being in comparison with
720	   corresponding unicast VPN options".

722	   Section 8 of [I-D.ietf-l3vpn-2547bis-mcast] addresses these
723	   requirements by proposing two approaches for mVPN inter-AS
724	   deployments:

726	   1.  Non-segmented inter-AS tunnels where the multicast tunnels are
727	       end-to-end across ASes, so even though the PEs belonging to a
728	       given MVPN may be in different ASs the ASBRs play no special role
729	       and function merely as P routers (described in section 8.1).

731	   2.  Segmented inter-AS tunnels where each AS constructs its own
732	       separate multicast tunnels which are then 'stitched' together by
733	       the ASBRs (described in section 8.2).

735	   Section 5.2.6 of [RFC4834] also states "Within each service provider
736	   the service provider SHOULD be able on its own to pick the most
737	   appropriate tunneling mechanism to carry (multicast) traffic among
738	   PEs (just like what is done today for unicast)".  The segmented
739	   approach is the only one capable of meeting this requirement.

741	   The segmented inter-AS solution would appear to offer the largest
742	   degree of deployment flexibility to operators.  However, the non-
743	   segmented inter-AS solution can simplify deployment in a restricted
744	   number of scenarios.  Moreover, [I-D.rosen-vpn-mcast] only supports
745	   the non-segmented inter-AS solution, which is therefore likely to be
746	   useful to some operators for backward compatibility and during
747	   migration from [I-D.rosen-vpn-mcast] to
748	   [I-D.ietf-l3vpn-2547bis-mcast].

750	   The applicability of segmented or non-segmented inter-AS tunnels to a
751	   given deployment or inter-provider interconnect will depend on a
752	   number of factors specific to each service provider.  However, due to
753	   the additional deployment flexibility offered by segmented inter-AS
754	   tunnels, it is the recommendation of the authors that all
755	   implementations should support the segmented inter-AS model.
756	   Additionally, the authors recommend that implementations should
757	   consider supporting the non-segmented inter-AS model in order to
758	   facilitate co-existence with existing deployments, and as a feature
759	   to provide lighter engineering in a restricted set of scenarios,
760	   although it is recognized that initial implementations may only
761	   support one or the other.

763	3.6.  Bidir-PIM support

765	   In Bidir-PIM, the packet forwarding rules have been improved over
766	   PIM-SM, allowing traffic to be passed up the shared tree toward the
767	   RP Address (RPA).  To avoid multicast packet looping, Bidir-PIM uses
768	   a mechanism called the designated forwarder (DF) election, which
769	   establishes a loop-free tree rooted at the RPA.  Use of this method
770	   ensures that only one copy of every packet will be sent to an RPA,
771	   even if there are parallel equal cost paths to the RPA.  To avoid
772	   loops the DF election process enforces a consistent view of the DF on
773	   all routers on a network segment, and during periods of ambiguity or
774	   routing convergence the traffic forwarding is suspended.

776	   In the context of a multicast VPN solution, a solution for Bidir-PIM
777	   support must preserve this property of similarly avoiding packet
778	   loops, including in the case where mVRFs in a given MVPN don't have
779	   a consistent view of the routing to C-RPL/C-RPA.

781	   The current MVPN specifications [I-D.ietf-l3vpn-2547bis-mcast] in
782	   section 11, define three methods to support Bidir-PIM, as RECOMMENDED
783	   in [RFC4834] :

785	   1.  Standard DF election procedure over an MI-PMSI

787	   2.  VPN Backbone as the RPL (section 11.1)

789	   3.  Partitioned Sets of PEs (section 11.2)

791	   Method (1) is naturally applied to deployments using "Full per-MVPN
792	   PIM peering across an MI-PMSI" for C-multicast routing, but as
793	   indicated in [I-D.ietf-l3vpn-2547bis-mcast] in section 11, the DF
794	   Election may not work well in an mVPN environment and an alternative
795	   to DF election would be desirable.

797	   The advantage of methods (2) and (3) is that they do not require
798	   running the DF election procedure among PEs.

800	   Method (2) leverages the fact that in Bidir-PIM, running the DF
801	   election procedure is not needed on the RPL.  This approach thus has
802	   the benefit of simplicity of implementation, especially in a context
803	   where BGP-based C-multicast routing is used.  However it has the
804	   drawback of putting constraints on how Bidir-PIM is deployed which
805	   may not always match mVPN customers' requirements.

807	   Method (3) treats an mVPN as a collection of sets of multicast VRFs,
808	   all PEs in a set having the same reachability information towards
809	   C-RPA, but distinct from PEs in other sets.  Hence, with this method,
810	   C-Bidir packet loops in MVPN are resolved by the ability to partition
811	   a VPN into disjoint sets of VRFs, each having a distinct view of the
812	   converged network.  The partitioning approach to Bidir-PIM requires
813	   either upstream-assigned MPLS labels (to denote the partition) or a
814	   unique MP2MP LSP per partition.  The former is based on PE
815	   Distinguisher Labels that have to be distributed using auto-discovery
816	   BGP routes and their handling requires the support for upstream
817	   assigned labels and context label lookups [ref].  The latter, using
818	   MP2MP LSP per partition, does not have these constraints but is
819	   restricted to P-tunnel types supporting MP2MP connectivity (such as
820	   mLDP[ref]).

822	   This approach to C-Bidir can work with PIM-based or BGP-based
823	   C-multicast routing procedures, and is also generic in the sense that
824	   it does not impose any requirements on the Bidir-PIM service
825	   offering.

827	   Given the above considerations, method (3) "Partitioned Sets of PEs"
828	   is the RECOMMENDED approach.

830	   In the event that method (3) is not applicable (lack of support for
831	   upstream assigned labels or for a P-tunnel type providing MP2MP
832	   connectivity), then methods (1) "Standard DF election procedure over
833	   an MI-PMSI" and (2) "VPN Backbone as the RPL" are RECOMMENDED as
834	   interim solutions, (1) having the advantage over (2) of not putting
835	   constraints on how Bidir-PIM is deployed and the drawbacks of only
836	   being applicable when PIM-based C-multicast is used and of possibly
837	   not working well in an mVPN environment.

839	4.  Co-located RPs

841	   Section 5.1.10.1 of [RFC4834] states "In the case of PIM-SM in ASM
842	   mode, engineering of the RP function requires the deployment of
843	   specific protocols and associated configurations.  A service provider
844	   may offer to manage customers' multicast protocol operation on their
845	   behalf.  This implies that it is necessary to consider cases where a
846	   customer's RPs are out-sourced (e.g. on PEs).  Consequently, a VPN
847	   solution MAY support the hosting of the RP function in a VR or VRF."

849	   However, customers who have already deployed multicast within their
850	   networks and have therefore already deployed their own internal RPs
851	   are often reluctant to hand over the control of their RPs to their
852	   service provider and make use of a co-located RP model, and providing
853	   RP co-location on a PE will require the activation of MSDP or the
854	   processing of PIM Registers on the PE.  Securing the PE routers for
855	   such activity requires special care, additional work, and will likely
856	   rely on specific features to be provided by the routers themselves.

858	   The applicability of the co-located RP model to a given MVPN will
859	   thus depend on a number of factors specific to each customer and
860	   service provider.

862	   It is therefore the recommendation that implementations should
863	   support a co-located RP model, but that support for a co-located RP
864	   model within an implementation should not restrict deployments to
865	   using a co-located RP model : implementations MUST support
866	   deployments when activation of a PIM RP function (PIM Register
867	   processing and RP-specific PIM procedures) or VRF MSDP instance is
868	   not required on any PE router and where all the RPs are deployed
869	   within the customers' networks or CEs.

871	5.  Existing deployments

873	   Some suggestions provided in this document can be used to
874	   incrementally modify currently deployed implementations without
875	   hindering these deployments, and without hindering the consistency of
876	   the standardized solution by providing optional per-VRF configuration
877	   knobs to support modes of operation compatible with currently
878	   deployed implementations, while at the same time using the
879	   recommended approach on implementations supporting the standard.

881	   In cases where this may not be easily achieved, a recommended
882	   approach would be to provide a per-VRF configuration knob that allows
883	   incremental per-VPN migration of the mechanisms used by a PE device,
884	   which would allow migration with some per-VPN interruption of service
885	   (e.g. during a maintenance window).

887	   Mechanisms allowing "live" migration by providing concurrent use of
888	   multiple alternatives for a given PE and a given VPN are not seen as
889	   a priority considering the expected implementation complexity
890	   associated with such mechanisms.  However, if there happen to be
891	   cases where they could be viably implemented relatively simply, such
892	   mechanisms may help improve migration management.

894	6.  Summary of recommendations

896	   The following list summarizes conclusions on the mechanisms that
897	   define the set of mandatory-to-implement mechanisms in the context of
898	   [I-D.ietf-l3vpn-2547bis-mcast].

900	   Note well that the implementation of the non-mandatory alternative
901	   mechanisms is not precluded.

903	   Recommendations are:

905	   o  that BGP-based auto-discovery be the mandated solution for auto-
906	      discovery ;

908	   o  that BGP be the mandated solution for S-PMSI switching signaling ;

910	   o  that implementations support both the BGP-based and the full per-
911	      MVPN PIM peering solutions for PE-PE transmission of customer
912	      multicast routing until further operational experience is gained
913	      with both solutions ;

915	   o  that implementations use the "Partitioned Sets of PEs" approach
916	      for Bidir-PIM support ;

918	   o  that implementations implement the P2MP variants of the P2P
919	      protocols that they already implement, such as mLDP, P2MP RSVP-TE
920	      and GRE/IP-Multicast ;

922	   o  that implementations support segmented inter-AS tunnels and
923	      consider supporting non-segmented inter-AS tunnels (in order to
924	      maintain backwards compatibility and for migration) ;

926	   o  implementations MUST support deployments when activation of a PIM
927	      RP function (PIM Register processing and RP-specific PIM
928	      procedures) or VRF MSDP instance is not required on any PE router.

930	7.  IANA Considerations

932	   This document makes no request to IANA.

934	   [ Note to RFC Editor: this section may be removed on publication as
935	   an RFC. ]

937	8.  Security Considerations

939	   This document does not by itself raise any particular security
940	   considerations.

942	9.  Acknowledgements

944	   We would like to thank Adrian Farrel, Eric Rosen, Yakov Rekhter, and
945	   Maria Napierala for their feedback that helped shape this document.

947	   Additional credit is due to Maria Napierala for co-authoring
948	   Section 3.6 on Bidir-PIM support.

950	10.  Informative References

952	   [RFC4834]  Morin, T., "Requirements for Multicast in L3 Provider-
953	              Provisioned Virtual Private Networks (PPVPNs)", RFC 4834,
954	              April 2007.

956	   [I-D.ietf-l3vpn-2547bis-mcast]
957	              Aggarwal, R., Bandi, S., Cai, Y., Morin, T., Rekhter, Y.,
958	              Rosen, E., Wijnands, I., and S. Yasukawa, "Multicast in
959	              MPLS/BGP IP VPNs", draft-ietf-l3vpn-2547bis-mcast-08 (work
960	              in progress), March 2009.

962	   [I-D.ietf-l3vpn-2547bis-mcast-bgp]
963	              Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
964	              Encodings and Procedures for Multicast in MPLS/BGP IP
965	              VPNs", draft-ietf-l3vpn-2547bis-mcast-bgp-07 (work in
966	              progress), April 2009.

968	   [I-D.rosen-vpn-mcast]
969	              Cai, Y., Rosen, E., and I. Wijnands, "Multicast in MPLS/
970	              BGP IP VPNs", draft-rosen-vpn-mcast-11 (work in progress),
971	              June 2009.

973	   [I-D.raggarwa-l3vpn-2547-mvpn]
974	              Aggarwal, R., "Base Specification for Multicast in BGP/
975	              MPLS VPNs", draft-raggarwa-l3vpn-2547-mvpn-00 (work in
976	              progress), June 2004.

978	   [I-D.ietf-pim-sm-linklocal]
979	              Atwood, J., "Authentication and Confidentiality in PIM-SM
980	              Link-local Messages", draft-ietf-pim-sm-linklocal-02 (work
981	              in progress), November 2007.

983	   [I-D.ietf-pim-port]
984	              Farinacci, D., Wijnands, I., Venaas, S., and M. Napierala,
985	              "A Reliable Transport Mechanism for PIM",
986	              draft-ietf-pim-port-01 (work in progress), July 2009.

988	   [RFC4684]  Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,
989	              R., Patel, K., and J. Guichard, "Constrained Route
990	              Distribution for Border Gateway Protocol/MultiProtocol
991	              Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual
992	              Private Networks (VPNs)", RFC 4684, November 2006.

994	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
995	              Requirement Levels", BCP 14, RFC 2119, March 1997.

997	Appendix A.  Scalability of C-multicast routing processing load

999	   The main role of multicast routing is to let routers determine that
1000	   they should start or stop forwarding a given multicast stream on a
1001	   given link.  In the multicast VPN context, this has to be done for
1002	   each VPN, and the associated function is thus named "customer-
1003	   multicast routing" or "C-multicast routing" and its role is to let PE
1004	   routers determine that they should start or stop forwarding the
1005	   traffic of a given multicast stream toward the remote PEs, on some
1006	   PMSI tunnel.

1008	   When some "join" message is received by a PE, this PE knows that it
1009	   should be sending traffic for the corresponding multicast group of
1010	   the corresponding VPN.  But the reception of a "prune" message from a
1011	   remote PE is not enough by itself for a PE to know that it should
1012	   stop forwarding the corresponding multicast traffic : it has to make
1013	   sure that there aren't any other PEs that still have receivers for
1014	   this traffic.

1016	   There are many ways that the "C-multicast routing" building block can
1017	   be designed, and they differ, among other things, in how a PE
1018	   determines when it can stop forwarding a given multicast stream toward
1019	   other PEs:

1021	   PIM LAN Procedures, by default
1022	      By default when PIM LAN procedures are used, when a PE Prunes
1023	      itself from a multicast tree, all other PEs check their own state
1024	      to know if they are on the tree, in which case they send a PIM
1025	      Join message to override the Prune.  Thus, for each PIM Prune
1026	      message, all PE routers work to let the upstream PE determine the
1027	      answer to the "did the last receiver leave?" question.

1029	   PIM LAN Procedures, with explicit tracking :
1030	      PIM LAN procedures can use an "explicit tracking" approach, where
1031	      a PE which is the upstream router for a multicast stream maintains
1032	      an updated list of all neighbors who are joined to the tree.
1033	      Thus, when it receives a Leave message from a PIM neighbor, it
1034	      instantly knows the answer to the "did the last receiver leave?"
1035	      question.
1036	      In this case, the question is answered by the upstream router
1037	      alone.  The side effect of this "explicit tracking" is that "Join
1038	      suppression" is not used : the downstream PEs will always send
1039	      Joins toward the upstream PE, which will have to process them all.

1041	   BGP-based C-multicast routing
1042	      When BGP-based procedures are used for C-multicast routing, if no
1043	      BGP route reflector is used, the "did the last receiver leave?"
1044	      question is answered like in the PIM "explicit tracking" approach.
1045	      But, when a BGP route reflector is used (which is expected to be
1046	      the recommended approach), the role of maintaining an updated list
1047	      of the PEs that are part of a given multicast tree is taken care
1048	      of by the route reflector(s).  Using plain BGP route selection
1049	      procedures, the route reflector will withdraw a C-multicast Source
1050	      Tree Join for a given (C-S,C-G) when no PE advertises one anymore.
1051	      In this context, the "did the last receiver leave?" question can
1052	      be said to be answered by the route-reflector(s).
1053	      Furthermore, the BGP route distribution can leverage more than one
1054	      route reflector : if a hierarchy of route reflectors is used, the
1055	      "did the last receiver leave?" question is partly answered by each
1056	      route reflector in the hierarchy.
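
	   The "explicit tracking" idea described above, where a single
	   node answers the "did the last receiver leave?" question from
	   its own list of joined PEs, can be sketched as follows.  This
	   is a minimal illustration, not a description of any
	   implementation: the class and method names are hypothetical,
	   and streams and PEs are represented by plain hashable values.

```python
class ExplicitTrackingState:
    """Per-stream membership list kept by the node that answers the question.

    Depending on the approach, that node would be the upstream PE (PIM
    with explicit tracking, or BGP without route reflector) or a route
    reflector. This sketch only models the bookkeeping itself.
    """

    def __init__(self) -> None:
        # stream identifier, e.g. (C-S, C-G) -> set of joined PE identifiers
        self._joined: dict = {}

    def join(self, stream, pe) -> bool:
        """Record a join; return True if this is the first receiver PE."""
        members = self._joined.setdefault(stream, set())
        first = not members
        members.add(pe)
        return first

    def prune(self, stream, pe) -> bool:
        """Record a prune; return True if the last receiver PE just left."""
        members = self._joined.get(stream)
        if not members or pe not in members:
            return False  # unknown stream or PE: nothing changes
        members.discard(pe)
        if not members:
            del self._joined[stream]
            return True
        return False
```

	   With such a list, no other PE has to react to a prune, which is
	   the contrast with the default PIM LAN procedures where all PEs
	   check their own state for each Prune.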

1058	   We can see that answering the "last receiver leaves" question is a
1059	   significant proportion of the work that the C-multicast routing
1060	   building block has to do, and where the approaches differ most.
1061	   The different approaches for handling C-multicast routing can result
1062	   in a different amount of processing and how this processing is spread
1063	   among the different functions.  These differences can be better
1064	   estimated by quantifying the amount of message processing and state
1065	   maintenance.

1067	   Though the type of processing, messages and states, may vary with the
1068	   different approaches, we propose here a rough estimation of the load
1069	   of PEs, in terms of number of messages processed and number of
1070	   control plane states maintained : a "message processed" being a
1071	   message being parsed, a lookup being done, and some action being
1072	   taken (such as updating a control plane or data plane state), and a
1073	   "state maintained" being a multicast state kept in the control plane
1074	   memory of a PE, related to an interface or a PE being subscribed to a
1075	   multicast stream (we don't compare the data plane states on PE
1076	   routers, which wouldn't vary between the different options chosen).

1078	   The following subsections do such an estimation for each proposed
1079	   approach for C-multicast routing, for different phases of the
1080	   following scenario:

1082	   o  one SSM multicast stream is considered (extrapolating to a higher
1083	      number of streams is linear)

1085	   o  only the intra-AS case is concerned (with the segmented inter-AS
1086	      trees and BGP-based C-multicast routing, #mVPN_PE and #R_PE should
1087	      refer to the PEs of the mVPN in the AS, not to all PEs of the
1088	      mVPN)

1090	   o  the scenario is as follows:

1092	      *  one PE Joins the multicast stream (because a new receiver-
1093	         connected site has sent a Join on the PE-CE link), followed by
1094	         a number of additional PEs that also join the multicast stream,
1095	         one after the other ; we evaluate the processing required for
1096	         the addition of each PE

1098	      *  some period of time T passes, without any PE joining or leaving
1099	         (baseline)

1101	      *  all PEs leave, one after the other, until the last one leaves ;
1102	         we evaluate the processing required for the leave of each PE

1104	   o  the parameters used are:

1106	      *  #mVPN_PE : the number of PEs in the mVPN

1108	      *  #R_PE : the number of PEs joining the multicast stream

1110	      *  #RR : the number of route reflectors

1112	      *  T_PIM_r : the time between two refreshes of a PIM Join (default
1113	         is 60s)

1115	   The estimation unit used is the "message.equipment" (or "m.e"): one
1116	   "message.equipment" corresponds to "one equipment processing one
1117	   message" (10 m.e being "10 equipments each processing one message",
1118	   or "5 messages each processed by 2 equipments", or "1 message
1119	   processed by 10 equipments", etc.).  Similarly, for the amount of
1120	   control plane state, the unit used is "state.equipment" or "s.e".
1121	   This allows taking into account the fact that a message (or a state)
1122	   may have to be processed (or maintained) by more than one node.
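
   As an illustration of the unit, the following minimal Python sketch
   computes the m.e value for the three readings of "10 m.e" given
   above.  The helper name message_equipment is hypothetical and is only
   used here for illustration.

```python
def message_equipment(messages: int, equipments_per_message: int) -> int:
    """Return a load in m.e: each message counts once per equipment
    that has to process it (hypothetical helper, for illustration)."""
    if messages < 0 or equipments_per_message < 0:
        raise ValueError("counts must be non-negative")
    return messages * equipments_per_message


# The three readings of "10 m.e" given in the text:
ten_by_one = message_equipment(10, 1)   # 10 messages, 1 equipment each
five_by_two = message_equipment(5, 2)   # 5 messages, 2 equipments each
one_by_ten = message_equipment(1, 10)   # 1 message, 10 equipments
```

   The same product applies to the "s.e" unit, with states in place of
   messages.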

1124	   We distinguish three different types of equipment: the upstream PE
1125	   for the considered multicast stream, the RR (if any), and the other
1126	   PEs (which are not the upstream PE).

1128	   The numbers or orders of magnitude given in the tables in the
1129	   following subsections are totals across all equipments of the same
1130	   type, for each type of equipment, in the "m.e" and "s.e" units
1131	   defined above.

1133	   Additional clarifications:

1135	   o  for PIM, only Join and Prune messages are counted; the PIM Hellos
1136	      are not counted since these are not messages that trigger specific
1137	      action in a typical scenario; message processing related to the
1138	      PIM Assert mechanism is also not taken into account, because it is
1139	      only active in transient state

1141	   o  for BGP, only UPDATE messages for mVPN routes carrying C-multicast
1142	      routing information are considered

1144	A.1.  PIM LAN procedures, by default

1146	   +------------+------------+---------------+----------+--------------+
1147	   |            | upstream   | other PEs     | RR       | total across |
1148	   |            | PE (1)     | (total across | (none)   | all          |
1149	   |            |            | (#mvpn_PE-1)  |          | equipments   |
1150	   |            |            | PEs)          |          |              |
1151	   +------------+------------+---------------+----------+--------------+
1152	   | first PE   | 1 m.e      | #mVPN_PE-1    | /        | #mVPN_PE m.e |
1153	   | joins      |            | m.e           |          |              |
1154	   +------------+------------+---------------+----------+--------------+
1155	   | for *each* | 1 m.e      | #mvpn_PE-1    | /        | #mvpn_PE m.e |
1156	   | additional |            | m.e           |          |              |
1157	   | PE joining |            |               |          |              |
1158	   +------------+------------+---------------+----------+--------------+
1159	   | baseline   | T/T_PIM_r  | (T/T_PIM_r) . | /        | (T/T_PIM_r)  |
1160	   | processing | m.e        | (#mvpn_PE-1)  |          | x #mvpn_PE   |
1161	   | over a     |            | m.e           |          | m.e          |
1162	   | period T   |            |               |          |              |
1163	   +------------+------------+---------------+----------+--------------+
1164	   | for *each* | 2 m.e      | 2(#mvpn_PE-1) | /        | 2 x #mvpn_PE |
1165	   | PE leaving |            | m.e           |          | m.e          |
1166	   +------------+------------+---------------+----------+--------------+
1167	   | the last   | 1 m.e      | #mvpn_PE-1    | /        | #mvpn_PE m.e |
1168	   | PE leaves  |            | m.e           |          |              |
1169	   +------------+------------+---------------+----------+--------------+
1170	   | total for  | #R_PE x 3  | (#mvpn_PE-1)  | 0        | #mvpn_PE x ( |
1171	   | #R_PE PEs  | +          | x (3 x #R_PE  |          | 3 x #R_PE +  |
1172	   |            | T/T_PIM_r  | + T/T_PIM_r)  |          | T/T_PIM_r )  |
1173	   |            | m.e        | m.e           |          | m.e          |
1174	   |            |            |               |          |              |
1175	   |            |            |               |          |              |
1176	   +------------+------------+---------------+----------+--------------+
1177	   | total      | 1 s.e      | #R_PE s.e     | 0        | #R_PE+1 s.e  |
1178	   | state      |            |               |          |              |
1179	   | maintained |            |               |          |              |
1180	   +------------+------------+---------------+----------+--------------+

1182	    Message processing and state maintenance - PIM LAN procedures, by
1183	                                  default

1185	   We assume here that the PIM Join suppression and Prune Override
1186	   mechanisms are fully effective, i.e., that a Join or Prune message
1187	   sent by a PE is instantly seen by other PEs.  Strictly speaking, this
1188	   is not true: depending on network delays and timing, more messages
1189	   may be exchanged, so the numbers given in this table are a lower
1190	   bound on the number of PIM messages exchanged.
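
   The rows of the table above can be summed mechanically.  The
   following Python sketch adds up the "total across all equipments"
   column, row by row, for one complete scenario.  The function name and
   the argument names are hypothetical; period_t stands for T and
   t_pim_r for T_PIM_r.  Summing the rows exactly gives #mVPN_PE x
   (3 x #R_PE - 1 + T/T_PIM_r), which is within #mVPN_PE of the rounded
   total shown in the table.

```python
def pim_lan_default_total(mvpn_pe: int, r_pe: int,
                          period_t: float, t_pim_r: float = 60.0) -> float:
    """Sum of the 'total across all equipments' rows, in m.e
    (illustrative sketch, assumes fully effective Join suppression)."""
    if mvpn_pe < 1 or r_pe < 1:
        raise ValueError("need at least one PE and one receiver PE")
    # First join and each additional join: #mVPN_PE m.e per event.
    joins = r_pe * mvpn_pe
    # Periodic Join refreshes seen by every PE of the mVPN.
    baseline = (period_t / t_pim_r) * mvpn_pe
    # Each non-last leave: Prune plus overriding Join, seen by every PE.
    non_last_leaves = (r_pe - 1) * 2 * mvpn_pe
    # Last leave: a single Prune, seen by every PE.
    last_leave = mvpn_pe
    return joins + baseline + non_last_leaves + last_leave


# Example: 100 PEs in the mVPN, 10 receiver PEs, one hour of baseline.
example_total = pim_lan_default_total(mvpn_pe=100, r_pe=10, period_t=3600)
```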

1192	A.2.  PIM LAN procedures, with explicit tracking

1194	   +-------------+-------------+----------------+--------+-------------+
1195	   |             | upstream PE | other PEs      | RRs    | total       |
1196	   |             | (1)         | (total across  | (none) | across all  |
1197	   |             |             | (#mvpn_PE-1)   |        | equipments  |
1198	   |             |             | PEs)           |        |             |
1199	   +-------------+-------------+----------------+--------+-------------+
1200	   | first PE    | 1 m.e       | 1 m.e (see     | /      | 2 m.e       |
1201	   | joins       |             | note below)    |        |             |
1202	   +-------------+-------------+----------------+--------+-------------+
1203	   | for *each*  | 1 m.e       | 1 m.e (see     | /      | 2 m.e       |
1204	   | additional  |             | note below)    |        |             |
1205	   | PE joining  |             |                |        |             |
1206	   +-------------+-------------+----------------+--------+-------------+
1207	   | baseline    | (T/T_PIM_r) | (T/T_PIM_r)    | /      | (T/T_PIM_r) |
1208	   | processing  | x #R_PE m.e | m.e (see note  |        | x #R_PE m.e |
1209	   | over a      |             | below)         |        |             |
1210	   | period T    |             |                |        |             |
1211	   +-------------+-------------+----------------+--------+-------------+
1212	   | for *each*  | 1 m.e       | 1 m.e (see     | /      | 2 m.e       |
1213	   | PE leaving  |             | note below)    |        |             |
1214	   +-------------+-------------+----------------+--------+-------------+
1215	   | the last PE | 1 m.e       | 1 m.e (see     | /      | 2 m.e       |
1216	   | leaves      |             | note below)    |        |             |
1217	   +-------------+-------------+----------------+--------+-------------+
1218	   | total for   | #R_PE (2 +  | #R_PE x ( 2 +  | 0      | #R_PE x ( 4 |
1219	   | #R_PE PEs   | T/T_PIM_r)  | T/T_PIM_r) m.e |        | +           |
1220	   |             | m.e         |                |        | T/T_PIM_r)  |
1221	   |             |             |                |        | m.e         |
1222	   +-------------+-------------+----------------+--------+-------------+
1223	   | total state | #R_PE s.e   | #R_PE s.e      | 0      | 2 x #R_PE   |
1224	   | maintained  |             |                |        | s.e         |
1225	   +-------------+-------------+----------------+--------+-------------+

1227	    Message processing and state maintenance - PIM LAN procedures, with
1228	                             explicit tracking

1230	   Note: in this explicit tracking mode, a given Join or Prune message
1231	   requires processing only by the upstream PE and the PE sending the
1232	   message; other PEs do not have any action to take.  It should be
1233	   noted, though, that these other PEs still have to parse the PIM
1234	   message, which is not zero processing.  We make here the assumption
1235	   that this is not significant.
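
   As a cross-check of the table above, the following Python sketch sums
   its "total across all equipments" column row by row.  The function
   and argument names are hypothetical; period_t stands for T and
   t_pim_r for T_PIM_r.  Under the assumption stated in the note
   (parsing by uninvolved PEs is ignored), the sum reduces to
   #R_PE x (4 + T/T_PIM_r), as in the table.

```python
def pim_explicit_tracking_total(r_pe: int, period_t: float,
                                t_pim_r: float = 60.0) -> float:
    """Sum of the 'total across all equipments' rows, in m.e
    (illustrative sketch for PIM LAN procedures with explicit tracking)."""
    if r_pe < 1:
        raise ValueError("need at least one receiver PE")
    # Each join (first or additional): sender plus upstream PE, 2 m.e.
    joins = r_pe * 2
    # Baseline: the table gives (T/T_PIM_r) x #R_PE m.e in total.
    baseline = (period_t / t_pim_r) * r_pe
    # Each leave (non-last or last): sender plus upstream PE, 2 m.e.
    leaves = r_pe * 2
    return joins + baseline + leaves


# Example: 10 receiver PEs, one hour of baseline.
example_total = pim_explicit_tracking_total(r_pe=10, period_t=3600)
```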

1237	A.3.  BGP-based

1239	   About RRs: we assume that a message has to be processed by r BGP
1240	   route reflectors to go from a receiver-connected PE to the
1241	   source-connected PE.  In practice, r depends on how RRs are meshed
1242	   and would typically be small (1, 2, or 3); r tends quickly toward 1
1243	   (as soon as there is a receiver-connected PE in each RR cluster).

1245	   We make the assumption that BGP constrained VPN route distribution
1246	   [RFC4684] is used; if not, the amount of state and message processing
1247	   with this approach is similar to the PIM with explicit tracking
1248	   approach (Appendix A.2), without the Join refreshes.

1250	   +-------------+----------+---------------+------------+-------------+
1251	   |             | upstream | other PEs     | RRs (#RR)  | total       |
1252	   |             | PE (1)   | (total across |            | across all  |
1253	   |             |          | (#mvpn_PE-1)  |            | equipments  |
1254	   |             |          | PEs)          |            |             |
1255	   +-------------+----------+---------------+------------+-------------+
1256	   | first PE    | 1 m.e    | 1 m.e         | r m.e      | (r+2) m.e   |
1257	   | joins       |          |               |            |             |
1258	   +-------------+----------+---------------+------------+-------------+
1259	   | for *each*  | 0        | 1 m.e         | between 1  | between 2   |
1260	   | additional  |          |               | and r m.e  | and (r+1)   |
1261	   | PE joining  |          |               |            | m.e         |
1262	   +-------------+----------+---------------+------------+-------------+
1263	   | baseline    | 0        | 0             | 0          | 0           |
1264	   | processing  |          |               |            |             |
1265	   | over a      |          |               |            |             |
1266	   | period T    |          |               |            |             |
1267	   +-------------+----------+---------------+------------+-------------+
1268	   | for *each*  | 0        | 1 m.e         | between 1  | between 2   |
1269	   | PE leaving  |          |               | and r m.e  | and (r+1)   |
1270	   |             |          |               |            | m.e         |
1271	   +-------------+----------+---------------+------------+-------------+
1272	   | the last PE | 1 m.e    | 1 m.e         | r m.e      | (r+2) m.e   |
1273	   | leaves      |          |               |            |             |
1274	   +-------------+----------+---------------+------------+-------------+
1275	   | total for   | 2 m.e    | #R_PE x 2 m.e | 2          | 2 (2 x      |
1276	   | #R_PE PEs   |          |               | (r+#R_PE)  | #R_PE + r + |
1277	   |             |          |               | m.e        | 1) m.e      |
1278	   +-------------+----------+---------------+------------+-------------+
1279	   | total state | 2 s.e    | 2 x #R_PE s.e | approx.    | approx.     |
1280	   | maintained  |          |               | (#R_PE x   | (#R_PE x    |
1281	   |             |          |               | #RR) s.e   | (#RR+2))    |
1282	   |             |          |               |            | s.e         |
1283	   +-------------+----------+---------------+------------+-------------+
1284	      Message processing and state maintenance - BGP-based procedures
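
   Because the RR column of the table above is given as a range, the
   total for a complete scenario is also a range.  The following Python
   sketch sums the "total across all equipments" rows to obtain a lower
   and an upper bound.  The function name is hypothetical; r is the
   number of route reflectors traversed, as defined at the start of this
   section.  The closed-form total in the table is a rounded
   approximation and is not reproduced here.

```python
from typing import Tuple


def bgp_total_bounds(r_pe: int, r: int) -> Tuple[int, int]:
    """Lower and upper bounds, in m.e, on message processing for #R_PE
    PEs joining then leaving (illustrative sketch, BGP-based case)."""
    if r_pe < 1 or r < 1:
        raise ValueError("need at least one receiver PE and one RR hop")
    # First join and last leave: (r + 2) m.e each.
    first_and_last = 2 * (r + 2)
    # Additional joins and non-last leaves: (#R_PE - 1) events of each.
    middle_events = 2 * (r_pe - 1)
    low = first_and_last + middle_events * 2          # 2 m.e per event
    high = first_and_last + middle_events * (r + 1)   # (r + 1) m.e per event
    return low, high


# Example: 10 receiver PEs, messages crossing up to 2 route reflectors.
example_bounds = bgp_total_bounds(r_pe=10, r=2)
```

   There is no baseline term, since no periodic refresh is counted for
   BGP in the table.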

1286	A.4.  Side by side orders of magnitude comparison

1288	   This section summarizes the previous sections by considering the
1289	   orders of magnitude when the number of PEs in a VPN increases.

1291	   +------------+----------------------+--------------+----------------+
1292	   |            | PIM LAN Procedures,  | PIM LAN      | BGP-based      |
1293	   |            | default              | Procedures,  |                |
1294	   |            |                      | explicit     |                |
1295	   |            |                      | tracking     |                |
1296	   +------------+----------------------+--------------+----------------+
1297	   | first PE   | O(#mVPN_PE)          | O(1)         | O(1)           |
1298	   | joins (in  |                      |              |                |
1299	   | m.e)       |                      |              |                |
1300	   +------------+----------------------+--------------+----------------+
1301	   | for *each* | O(#mVPN_PE)          | O(1)         | O(1)           |
1302	   | additional |                      |              |                |
1303	   | PE joining |                      |              |                |
1304	   | (in m.e)   |                      |              |                |
1305	   +------------+----------------------+--------------+----------------+
1306	   | baseline   | (T/T_PIM_r) x        | (T/T_PIM_r)  | 0              |
1307	   | processing | O(#mvpn_PE)          | x O(#R_PE)   |                |
1308	   | over a     |                      |              |                |
1309	   | period T   |                      |              |                |
1310	   | (in m.e)   |                      |              |                |
1311	   +------------+----------------------+--------------+----------------+
1312	   | for *each* | O(#mVPN_PE)          | O(1)         | O(1)           |
1313	   | PE leaving |                      |              |                |
1314	   | (in m.e)   |                      |              |                |
1315	   +------------+----------------------+--------------+----------------+
1316	   | the last   | O(#mVPN_PE)          | O(1)         | O(1)           |
1317	   | PE leaves  |                      |              |                |
1318	   | (in m.e)   |                      |              |                |
1319	   +------------+----------------------+--------------+----------------+
1320	   | total for  | O(#mVPN_PE x #R_PE)  | O(#R_PE) x   | O(#R_PE)       |
1321	   | #R_PE PEs  | + O(#mVPN_PE x       | (T/T_PIM_r)  |                |
1322	   | (in m.e)   | T/T_PIM_r)           |              |                |
1323	   +------------+----------------------+--------------+----------------+
1324	   | states (in | O(#R_PE)             | O(#R_PE)     | O(#R_PE x #RR) |
1325	   | s.e)       |                      |              |                |
1326	   +------------+----------------------+--------------+----------------+
1327	   +------------+----------------------+--------------+----------------+
1328	   | notes      | (processing and      | (processing  | (processing    |
1329	   |            | state maintenance    | and state    | and state      |
1330	   |            | are essentially done | maintenance  | maintenance is |
1331	   |            | by, and spread       | is           | essentially    |
1332	   |            | amongst, the PEs of  | essentially  | done by, and   |
1333	   |            | the MVPN ;           | done on the  | spread         |
1334	   |            | non-upstream PEs     | upstream PE) | amongst, the   |
1335	   |            | have processing to   |              | RRs)           |
1336	   |            | do)                  |              |                |
1337	   +------------+----------------------+--------------+----------------+

1339	    Comparison of orders of magnitude for message processing and state
1340	                maintenance (totals across all equipments)

1342	   The conclusions that can be drawn from the above are that:

1344	   o  the PIM LAN Procedures default approach is particular in that all
1345	      PEs, including those that are neither upstream nor downstream for
1346	      a given message, have processing to do, which results in a total
1347	      amount of messages to process which is in O(#mVPN_PE x #R_PE),
1348	      i.e., O(#mVPN_PE ^ 2) if the proportion of receiver PEs is
1349	      considered constant when the number of PEs increases;

1351	   o  the two PIM-based approaches refresh Join messages; this is a
1352	      linear factor not changing the order of magnitude, but which can
1353	      be significant for long-lived streams;

1355	   o  the BGP-based approach requires an amount of message processing in
1356	      O(#R_PE), lower than the two other approaches, and which is
1357	      independent of the duration of streams;

1359	   o  state maintenance is in the same order of magnitude for all
1360	      approaches, O(#R_PE), but the distribution is different:

1362	      *  the PIM LAN Procedure default approach fully spreads, and
1363	         minimizes, the amount of state (one state per PE)

1365	      *  the PIM LAN procedure with explicit tracking concentrates all
1366	         state on the upstream PE

1368	      *  the BGP-based procedures spread all the state across the set of
1369	         route reflectors

1371	   This quantification of message processing is based on a use case
1372	   where each PE with receivers has joined and left once.  Drawing
1373	   scalability-related conclusions for other patterns of changes of the
1374	   set of receiver-connected PEs requires considering the cost of each
1375	   approach for "a new PE joining" and "a (non-last) PE leaving".  From
1376	   this perspective, the "PIM LAN Procedure default approach" is the one
1377	   with the highest total amount of message processing across all nodes
1378	   (in O(#mVPN_PE)), whereas the other approaches are in O(1); the "PIM
1379	   LAN Procedures with explicit tracking" reduce the processing to the
1380	   minimum in that case, the BGP-based approach having a cost increasing
1381	   by a linear factor depending on the number of RRs that will have to
1382	   parse the message.
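
   To make the difference in growth rates concrete, the following Python
   sketch evaluates the join and leave totals of the three approaches
   for two mVPN sizes.  It uses the closed-form "total across all
   equipments" cells of the tables in Appendices A.1 to A.3 with T = 0,
   so that Join refreshes are left out.  The function name, the 10%
   receiver proportion, and the value r = 1 are assumptions chosen only
   for illustration.

```python
def join_leave_cost(mvpn_pe: int, r_pe: int, r: int = 1) -> dict:
    """Join/leave message processing in m.e for each approach, taken
    from the closed-form totals of the tables with T = 0 (sketch)."""
    return {
        "pim_lan_default": mvpn_pe * 3 * r_pe,
        "pim_explicit_tracking": 4 * r_pe,
        "bgp_based": 2 * (2 * r_pe + r + 1),
    }


# Double the mVPN size while keeping 10% of the PEs as receivers.
small = join_leave_cost(mvpn_pe=100, r_pe=10)
large = join_leave_cost(mvpn_pe=200, r_pe=20)

# Growth factor of each approach when the number of PEs doubles.
growth = {name: large[name] / small[name] for name in small}
```

   With these inputs, the PIM LAN default cost is multiplied by four
   when the number of PEs doubles, while the two other approaches grow
   by a factor of at most two.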

1384	Appendix B.  Switching to S-PMSI

1386	   [ the following point was fixed in version 07 of
1387	   [I-D.ietf-l3vpn-2547bis-mcast], and is here for reference only ]

1389	   Section 7.2.2.3 of [I-D.ietf-l3vpn-2547bis-mcast] proposes two
1390	   approaches for how a source PE can decide when to start transmitting
1391	   customer multicast traffic on a S-PMSI:

1393	   1.  The source PE sends multicast packets for the <C-S, C-G> on both
1394	       the I-PMSI P-multicast tree and the S-PMSI P-multicast tree
1395	       simultaneously for a pre-configured period of time, letting the
1396	       receiver PEs select the new tree for reception, before switching
1397	       to only the S-PMSI.

1399	   2.  The source PE waits for a pre-configured period of time after
1400	       advertising the <C-S, C-G> entry bound to the S-PMSI before fully
1401	       switching the traffic onto the S-PMSI-bound P-multicast tree.
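
   The two alternatives above can be contrasted with a small Python
   sketch that returns the P-multicast trees on which the source PE
   sends a <C-S, C-G> packet at a given time.  All names are
   hypothetical, times are in seconds, and the sketch assumes that in
   the first alternative the dual-sending period starts when the S-PMSI
   binding is advertised.  It is a simplification, not the procedure
   text of [I-D.ietf-l3vpn-2547bis-mcast].

```python
from typing import List, Optional


def trees_alternative_1(now: float, advertised_at: Optional[float],
                        overlap: float) -> List[str]:
    """Alternative 1: send on both trees during a pre-configured period,
    then on the S-PMSI only (assumed to start at advertisement time)."""
    if advertised_at is None or now < advertised_at:
        return ["I-PMSI"]
    if now < advertised_at + overlap:
        return ["I-PMSI", "S-PMSI"]   # traffic is sent twice
    return ["S-PMSI"]


def trees_alternative_2(now: float, advertised_at: Optional[float],
                        delay: float) -> List[str]:
    """Alternative 2: keep using the I-PMSI until a pre-configured delay
    has elapsed after the advertisement, then switch to the S-PMSI."""
    if advertised_at is None or now < advertised_at + delay:
        return ["I-PMSI"]
    return ["S-PMSI"]


# Example: binding advertised at t=10s, 3s period, packet sent at t=11s.
during_alt_1 = trees_alternative_1(now=11.0, advertised_at=10.0, overlap=3.0)
during_alt_2 = trees_alternative_2(now=11.0, advertised_at=10.0, delay=3.0)
```

   In this sketch, a packet is never sent on more than one tree with the
   second alternative.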

1403	   The first alternative has essentially two drawbacks:

1405	   o  <C-S,C-G> traffic is sent twice for some period of time, which
1406	      would appear to be at odds with the motivation for switching to an
1407	      S-PMSI in order to optimize the bandwidth used by the multicast
1408	      tree for that stream.

1410	   o  It is unlikely that the switchover can occur without packet loss
1411	      or duplication if the transit delays of the I-PMSI P-multicast
1412	      tree and the S-PMSI P-multicast tree differ.

1414	   By contrast, the second alternative has none of these drawbacks, and
1415	   satisfies the requirement in section 5.1.3 of [RFC4834], which states
1416	   that "[...] a multicast VPN solution SHOULD as much as possible
1417	   ensure that client multicast traffic packets are neither lost nor
1418	   duplicated, even when changes occur in the way a client multicast
1419	   data stream is carried over the provider network".  The second
1420	   alternative also happens to be the one used in existing deployments.

1422	   For these reasons, it is the authors' recommendation to mandate the
1423	   implementation of the second alternative for switching to S-PMSI.

1425	Authors' Addresses

1427	   Thomas Morin (editor)
1428	   France Telecom - Orange Labs
1429	   2 rue Pierre Marzin
1430	   Lannion  22307
1431	   France

1433	   Email: thomas.morin@orange-ftgroup.com

1435	   Ben Niven-Jenkins (editor)
1436	   BT
1437	   208 Callisto House, Adastral Park
1438	   Ipswich, Suffolk  IP5 3RE
1439	   UK

1441	   Email: benjamin.niven-jenkins@bt.com

1443	   Yuji Kamite
1444	   NTT Communications Corporation
1445	   Tokyo Opera City Tower
1446	   3-20-2 Nishi Shinjuku, Shinjuku-ku
1447	   Tokyo  163-1421
1448	   Japan

1450	   Email: y.kamite@ntt.com

1452	   Raymond Zhang
1453	   BT
1454	   2160 E. Grand Ave.
1455	   El Segundo  CA 90025
1456	   USA

1458	   Email: raymond.zhang@bt.com

1459	   Nicolai Leymann
1460	   Deutsche Telekom
1461	   Goslarer Ufer 35
1462	   10589 Berlin
1463	   Germany

1465	   Email: n.leymann@telekom.de

1467	   Nabil Bitar
1468	   Verizon
1469	   40 Sylvan Road
1470	   Waltham, MA  02451
1471	   USA

1473	   Email: nabil.n.bitar@verizon.com