idnits 2.17.1 draft-ietf-l3vpn-mvpn-considerations-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** The document seems to lack a License Notice according IETF Trust Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009 Section 6.b -- however, there's a paragraph with a matching beginning. Boilerplate error? (You're using the IETF Trust Provisions' Section 6.b License Notice from 12 Feb 2009 rather than one of the newer Notices. See https://trustee.ietf.org/license-info/.) Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 40 instances of too long lines in the document, the longest one being 1 character in excess of 72. == There are 3 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 417 has weird spacing: '... or the us...' -- The document date (October 26, 2009) is 5295 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC4364' is mentioned on line 740, but not defined == Outdated reference: A later version (-10) exists of draft-ietf-l3vpn-2547bis-mcast-08 == Outdated reference: A later version (-15) exists of draft-rosen-vpn-mcast-12 == Outdated reference: A later version (-10) exists of draft-ietf-pim-sm-linklocal-08 == Outdated reference: A later version (-09) exists of draft-ietf-pim-port-01 == Outdated reference: A later version (-15) exists of draft-ietf-mpls-ldp-p2mp-08 Summary: 2 errors (**), 0 flaws (~~), 10 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group T. Morin, Ed. 3 Internet-Draft France Telecom Orange 4 Expires: April 29, 2010 B. Niven-Jenkins, Ed. 5 BT 6 Y. Kamite 7 NTT Communications 8 R. Zhang 9 BT 10 N. Leymann 11 Deutsche Telekom 12 N. Bitar 13 Verizon 14 October 26, 2009 16 Mandatory Features in a Layer 3 Multicast BGP/MPLS VPN Solution 17 draft-ietf-l3vpn-mvpn-considerations-05 19 Status of this Memo 21 This Internet-Draft is submitted to IETF in full conformance with the 22 provisions of BCP 78 and BCP 79. 24 Internet-Drafts are working documents of the Internet Engineering 25 Task Force (IETF), its areas, and its working groups. Note that 26 other groups may also distribute working documents as Internet- 27 Drafts. 29 Internet-Drafts are draft documents valid for a maximum of six months 30 and may be updated, replaced, or obsoleted by other documents at any 31 time. It is inappropriate to use Internet-Drafts as reference 32 material or to cite them other than as "work in progress." 
34 The list of current Internet-Drafts can be accessed at 35 http://www.ietf.org/ietf/1id-abstracts.txt. 37 The list of Internet-Draft Shadow Directories can be accessed at 38 http://www.ietf.org/shadow.html. 40 This Internet-Draft will expire on April 29, 2010. 42 Copyright Notice 44 Copyright (c) 2009 IETF Trust and the persons identified as the 45 document authors. All rights reserved. 47 This document is subject to BCP 78 and the IETF Trust's Legal 48 Provisions Relating to IETF Documents in effect on the date of 49 publication of this document (http://trustee.ietf.org/license-info). 50 Please review these documents carefully, as they describe your rights 51 and restrictions with respect to this document. 53 Abstract 55 More than one set of mechanisms to support multicast in a layer 3 56 BGP/MPLS VPN has been defined. These are presented in the documents 57 that define them as optional building blocks. 59 To enable interoperability between implementations, this document 60 defines a subset of features that is considered mandatory for a 61 multicast BGP/MPLS VPN implementation. This will help implementers 62 and deployers understand which L3VPN multicast requirements are best 63 satisfied by each option. 65 Requirements Language 67 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 68 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 69 document are to be interpreted as described in [RFC2119]. 71 Table of Contents 73 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 74 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 75 3. Examining alternatives mechanisms for MVPN functions . . . . . 4 76 3.1. MVPN auto-discovery . . . . . . . . . . . . . . . . . . . 4 77 3.2. S-PMSI Signaling . . . . . . . . . . . . . . . . . . . . . 5 78 3.3. PE-PE Exchange of C-Multicast Routing . . . . . . . . . . 7 79 3.3.1. PE-PE C-multicast routing scalability . . . . . . . . 7 80 3.3.2.
PE-CE multicast routing exchange scalability . . . . . 10 81 3.3.3. P-routers scalability . . . . . . . . . . . . . . . . 10 82 3.3.4. Impact of C-multicast routing on Inter-AS 83 deployments . . . . . . . . . . . . . . . . . . . . . 10 84 3.3.5. Security and robustness . . . . . . . . . . . . . . . 11 85 3.3.6. C-multicast VPN join latency . . . . . . . . . . . . . 12 86 3.3.7. Conclusion on C-multicast routing . . . . . . . . . . 14 87 3.4. Encapsulation techniques for P-multicast trees . . . . . . 14 88 3.5. Inter-AS deployments options . . . . . . . . . . . . . . . 16 89 3.6. Bidir-PIM support . . . . . . . . . . . . . . . . . . . . 18 90 4. Co-located RPs . . . . . . . . . . . . . . . . . . . . . . . . 19 91 5. Existing deployments . . . . . . . . . . . . . . . . . . . . . 20 92 6. Summary of recommendations . . . . . . . . . . . . . . . . . . 21 93 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 94 8. Security Considerations . . . . . . . . . . . . . . . . . . . 21 95 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22 96 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22 97 10.1. Normative References . . . . . . . . . . . . . . . . . . . 22 98 10.2. Informative References . . . . . . . . . . . . . . . . . . 22 99 Appendix A. Scalability of C-multicast routing processing load . 23 100 A.1. Scalability with an increased number of PEs . . . . . . . 25 101 A.1.1. SSM Scenario . . . . . . . . . . . . . . . . . . . . . 25 102 A.1.2. ASM Scalability . . . . . . . . . . . . . . . . . . . 32 103 A.2. Cost of PEs leaving and joining . . . . . . . . . . . . . 34 104 Appendix B. Switching to S-PMSI . . . . . . . . . . . . . . . . . 37 105 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 37 107 1. Introduction 109 Specifications for multicast in BGP/MPLS 110 [I-D.ietf-l3vpn-2547bis-mcast] include multiple alternative 111 mechanisms for some of the required building blocks of the solution. 
112 However, they do not identify which of these mechanisms are mandatory 113 to implement in order to ensure interoperability. Not defining a set 114 of mandatory to implement mechanisms leads to a situation where 115 implementations may support different subsets of the available 116 optional mechanisms which do not interoperate, which is a problem for 117 the numerous operators having multi-vendor backbones. 119 The aim of this document is to leverage the already expressed 120 requirements [RFC4834] and study the properties of each approach, to 121 identify mechanisms that are good candidates for being part of a core 122 set of mandatory mechanisms which can be used to provide a base for 123 interoperable solutions. 125 This document goes through the different building blocks of the 126 solution and concludes on which mechanisms an implementation is 127 required to implement. Section 6 summarizes these requirements. 129 Considering the history of the multicast VPN proposals and 130 implementations, it is also useful to discuss how existing 131 deployments of early implementations 132 [I-D.rosen-vpn-mcast][I-D.raggarwa-l3vpn-2547-mvpn] can be 133 accommodated, and provide suggestions in this respect. 135 2. Terminology 137 Please refer to [I-D.ietf-l3vpn-2547bis-mcast] and [RFC4834]. 139 3. Examining alternatives mechanisms for MVPN functions 141 3.1. MVPN auto-discovery 143 The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes 144 two different mechanisms for MVPN auto-discovery: 146 1. BGP-based auto-discovery 148 2. 
"PIM/shared P-tunnel": discovery done through the exchange of PIM 149 Hellos by C-PIM instances, across an MI-PMSI implemented with one 150 shared P-tunnel per VPN (using multicast ASM, or MP2MP LDP) 152 Both solutions address Section 5.2.10 of [RFC4834] which states that 153 "the operation of a multicast VPN solution SHALL be as light as 154 possible and providing automatic configuration and discovery SHOULD 155 be a priority when designing a multicast VPN solution. Particularly 156 the operational burden of setting up multicast on a PE or for a VR/ 157 VRF SHOULD be as low as possible". 159 The key consideration is that PIM-based discovery is only applicable 160 to deployments using a shared P-tunnel to instantiate an MI-PMSI (it 161 is not applicable if only P2P, PIM-SSM, P2MP mLDP/RSVP-TE P-tunnels 162 are used, because contrary to ASM and MP2MP, building these types of 163 P-tunnels cannot happen before the autodiscovery has been done), 164 whereas the BGP-based auto-discovery does not place any constraint on 165 the type of P-tunnel that would have to be used. BGP-based auto- 166 discovery is independent of the type of P-tunnel used thus satisfying 167 the requirement in section 5.2.4.1 of [RFC4834] that "a multicast VPN 168 solution SHOULD be designed so that control and forwarding planes are 169 not interdependent". 171 Additionally, it is to be noted that a number of service providers 172 have chosen to use SSM-based P-tunnels for the default MDTs within 173 their current deployments, therefore relying already on some BGP- 174 based auto-discovery. 176 Moreover, when shared P-tunnels are used, the use of BGP auto- 177 discovery would allow inconsistencies in the addresses/identifiers 178 used for the shared P-tunnel to be detected (e.g. the same shared 179 P-tunnel identifier being used for different VPNs with distinct BGP 180 route targets). 
This is particularly attractive in the context of 181 inter-AS VPNs where the impact of any misconfiguration could be 182 magnified and where a single service provider may not operate all the 183 ASs. Note that this technique to detect some misconfiguration cases 184 may not be usable during a transition period from a shared-P-tunnel 185 autodiscovery to a BGP-based autodiscovery. 187 Thus, the recommendation is that implementation of the BGP-based 188 auto-discovery is mandated and should be supported by all MVPN 189 implementations. 191 3.2. S-PMSI Signaling 193 The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes 194 two mechanisms for signaling that multicast flows will be switched to 195 an S-PMSI: 197 1. a UDP-based TLV protocol specifically for S-PMSI signaling 198 (described in section 7.4.2). 200 2. a BGP-based mechanism for S-PMSI signaling (described in section 201 7.4.1). 203 Section 5.2.10 of [RFC4834] states that "as far as possible, the 204 design of a solution SHOULD carefully consider the number of 205 protocols within the core network: if any additional protocols are 206 introduced compared with the unicast VPN service, the balance between 207 their advantage and operational burden SHOULD be examined 208 thoroughly". The UDP-based mechanism would be an additional protocol 209 in the MVPN stack, which isn't the case for the BGP-based S-PMSI 210 switching signaling, since (a) BGP is identified as a requirement for 211 autodiscovery, and (b) the BGP-based S-PMSI switching signaling 212 procedures are very similar to the autodiscovery procedures. 214 Furthermore, the UDP-based S-PMSI switching signaling mechanism 215 requires an MI-PMSI, while the BGP-based protocol does not. 
In 216 practice, this means that with the UDP-based protocol a PE will have 217 to join all P-tunnels of all PEs in an MVPN, while in the 218 alternative where BGP-based S-PMSI switching signaling is used, it 219 could delay joining a P-tunnel rooted at a PE until traffic from that 220 PE is needed, thus reducing the amount of state maintained on P 221 routers. 223 S-PMSI switching signaling approaches can also be compared in an 224 inter-AS context (see Section 3.5). The proposed BGP-based approach 225 for S-PMSI switching signaling provides a good fit with both the 226 segmented and non-segmented inter-AS approaches (see Section 3.5). By 227 contrast, while the UDP-based approach for S-PMSI switching signaling 228 appears to be usable with segmented inter-AS tunnels, in that case 229 key advantages of the segmented approach are lost: 231 o ASes can no longer independently choose when S-PMSI 232 tunnels will be triggered in their AS (and thus control the amount 233 of state created on their P routers), 235 o ASes can no longer independently choose the tunneling 236 technique for the P-tunnels used for an S-PMSI, 238 o in an inter-AS option B context, ASes are isolated from each other 239 because PEs in one AS do not (directly) exchange routing 240 information with PEs of other ASes. This property is not 241 preserved if UDP-based S-PMSI switching signaling is used. By 242 contrast, BGP-based S-PMSI switching signaling does preserve 243 this property. 245 Given all the above, it is the recommendation of the authors that BGP 246 is the preferred solution for S-PMSI switching signaling and should 247 be supported by all implementations. 249 It is identified that, if nothing prevents fast-paced creation of 250 S-PMSIs, then S-PMSI switching signaling with BGP would possibly 251 impact the Route Reflectors used for MVPN routes.
However, it is also 252 identified that such fast-paced behavior would have an impact on P 253 and PE routers resulting from S-PMSI tunnel signaling, which would be 254 the same regardless of the S-PMSI signaling approach that is used, 255 and which is certainly best avoided by setting up proper 256 mechanisms. 258 The UDP-based S-PMSI switching signaling protocol can also be 259 considered, as an option, given that this protocol has been 260 deployed for some time. Implementations supporting both protocols 261 would be expected to provide a per-VRF configuration knob that allows 262 the UDP-based TLV protocol to be used for S-PMSI switching 263 signaling for specific VRFs, in order to support the coexistence of 264 both protocols (for example during migration scenarios). Apart from 265 such migration-facilitating mechanisms, the authors specifically do 266 not recommend extending the already proposed UDP-based TLV protocol 267 to new types of P-tunnels. 269 3.3. PE-PE Exchange of C-Multicast Routing 271 The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes 272 multiple mechanisms for PE-PE exchange of customer multicast routing 273 information (C-multicast routing): 275 1. Full per-MVPN PIM peering across an MI-PMSI (described in section 276 3.4.1.1). 278 2. Lightweight PIM peering across an MI-PMSI (described in section 279 3.4.1.2). 281 3. The unicasting of PIM C-Join/Prune messages (described in section 282 3.4.1.3). 284 4. The use of BGP for carrying C-Multicast routing (described in 285 section 3.4.2). 287 3.3.1.
PE-PE C-multicast routing scalability 289 Scalability being one of the core requirements for multicast VPN, it 290 is useful to compare the proposed C-multicast routing mechanisms from 291 this perspective: Section 4.2.4 of [RFC4834] recommends that "a 292 multicast VPN solution SHOULD support several hundreds of PEs per 293 multicast VPN, and MAY usefully scale up to thousands" and section 294 4.2.5 states that "a solution SHOULD scale up to thousands of PEs 295 having multicast service enabled". 297 Scalability with an increased number of VPNs per PE, or with an 298 increased amount of multicast state per VPN, is also important, but 299 is not the focus of this section since we didn't identify 300 differences between the different approaches for these matters: all 301 other things being equal, the load on a PE due to C-multicast routing 302 increases roughly linearly with the number of VPNs per PE, and with 303 the amount of multicast state per VPN. 305 This section presents conclusions related to PE-PE C-multicast 306 routing scalability. Appendix A provides more detailed explanations 307 of how the PIM-based approaches and the BGP-based approach differ in 308 the way they handle the C-multicast routing load, along 309 with a quantified evaluation of the amount of state and messages 310 with the different approaches; many points made in this section 311 are detailed in Appendix A.1. 313 At high scales of multicast deployment, the first and third 314 mechanisms require the PEs to maintain a large number of PIM 315 adjacencies with other PEs of the same multicast VPN (which implies 316 the regular exchange of PIM Hellos with each other) and to periodically 317 refresh C-Join/Prune states, resulting in a processing 318 cost that increases with the number of PEs (as detailed in Appendix A.1). 319 The second approach is less subject to this cost, and the fourth 320 approach is not subject to it.
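As a rough illustration of the adjacency and Hello load described above, the following sketch counts PIM adjacencies and received Hellos for one MVPN with full per-MVPN PIM peering. It is not taken from Appendix A: the function names are hypothetical, and the only protocol value assumed is the PIM-SM default Hello period of 30 seconds.

```python
# Illustrative model only: names and structure are assumptions, not taken
# from the draft or from any PIM implementation.

PIM_HELLO_PERIOD_S = 30.0  # PIM-SM default Hello period, in seconds


def adjacencies_per_pe(n_pes: int) -> int:
    """PIM adjacencies one PE keeps in one MVPN with full per-MVPN peering."""
    return max(n_pes - 1, 0)


def hellos_received_per_second(n_pes: int,
                               hello_period_s: float = PIM_HELLO_PERIOD_S) -> float:
    """Average rate of Hellos one PE receives from the other PEs of the MVPN."""
    return adjacencies_per_pe(n_pes) / hello_period_s


def total_adjacencies(n_pes: int) -> int:
    """Adjacency state summed over all PEs of the MVPN (grows as N squared)."""
    return n_pes * adjacencies_per_pe(n_pes)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, adjacencies_per_pe(n), total_adjacencies(n),
              round(hellos_received_per_second(n), 2))
```

The per-PE figures have to be multiplied by the number of MVPNs configured on the PE, and the periodic C-Join/Prune refreshes add to this load; neither is modeled here.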
322 The third mechanism would reduce the amount of C-Join/Prune 323 processing for a given multicast flow for PEs that are not the 324 upstream neighbor for this flow, but would require "explicit 325 tracking" state to be maintained by the upstream PE. It also isn't 326 compatible with the "Join suppression" mechanism. A possible way to 327 reduce the amount of signaling with this approach would be the use of 328 a PIM refresh-reduction mechanism. Such a mechanism, based on TCP, 329 is being specified by the PIM IETF Working Group 330 ([I-D.ietf-pim-port]); its use in a multicast VPN context has not 331 been described in [I-D.ietf-l3vpn-2547bis-mcast], but it is expected 332 that this approach would provide scalability similar to that of the 333 BGP-based approach without RR. 335 The second mechanism would operate in a similar manner to full 336 per-MVPN PIM peering except that PIM Hello messages are not transmitted 337 and PIM C-Join/Prune refresh-reduction would be used, thereby 338 improving scalability, but this approach has yet to be fully 339 described. In any case, it seems to improve only one of the 340 factors that affect scalability when the number of PEs 341 increases. 343 The first and second mechanisms can leverage the "Join suppression" 344 behavior and thus reduce the processing burden of an upstream PE, 345 sparing it the processing of a Join refresh message for each remote PE 346 joined to a multicast stream. This improvement requires all PEs of a 347 multicast VPN to process all PIM Join and Prune messages sent by any 348 other PE participating in the same multicast VPN, whether they are the 349 upstream PE or not. 351 The fourth mechanism (the use of BGP for carrying C-Multicast 352 routing) would have a comparable drawback of requiring all PEs to 353 process a BGP C-multicast route that is of interest only to a specific upstream 354 PE.
For this reason, section 16 of [I-D.ietf-l3vpn-2547bis-mcast-bgp] 355 recommends the use of the Route-Target constrained BGP distribution 356 [RFC4684] mechanisms, which eliminate this drawback by ensuring that only 357 the interested upstream PE receives a BGP C-multicast route. 358 Specifically, when Route-Target constrained BGP distribution is used, 359 the fourth mechanism reduces the total amount of C-multicast routing 360 processing load put on the PEs by avoiding any processing of customer 361 multicast routing information on the "unrelated" PEs, that is, those that are 362 neither the joining PE nor the upstream PE. 364 Moreover, the fourth mechanism further reduces the total amount of 365 message processing load by avoiding the use of periodic refreshes, 366 and by inheriting BGP features that are expected to improve 367 scalability (for instance, providing a means to offload some of the 368 processing burden associated with customer multicast routing onto one 369 or many BGP route-reflectors). The advantages of the fourth 370 mechanism come at the cost of maintaining an amount of state that is linear 371 in the number of PEs joined to a stream. However, the use of route 372 reflectors allows this cost to be spread among multiple route 373 reflectors, thus eliminating the need for a single route reflector to 374 maintain all this state. 376 However, the fourth mechanism is distinctive in that it offers the 377 possibility of offloading customer multicast routing processing onto 378 one or more BGP Route Reflector(s). When this possibility is used, the 379 drawback is an increased processing load placed on the route 380 reflector infrastructure.
In higher-scale scenarios, it may be 381 necessary to adapt the route reflector infrastructure to the MVPN 382 routing load by using, for example: 384 o a separation of resources for unicast and multicast VPN routing: 385 using dedicated MVPN Route Reflector(s) (or using dedicated MVPN 386 BGP sessions or dedicated MVPN BGP instances); 388 o the deployment of additional route reflector resources, for 389 example increasing the processing resources on existing route 390 reflectors or deploying additional route reflectors. 392 Among the above, the most straightforward approach is to 393 introduce route reflectors dedicated to the MVPN service and to 394 dimension them according to the needs of that service (but doing so 395 is not required and is left as an operator engineering decision). 397 3.3.2. PE-CE multicast routing exchange scalability 399 The overhead associated with the PE-CE exchange of C-multicast 400 routing is independent of the choice of the mechanism used for the 401 PE-PE C-multicast routing. Therefore, the impact of the PE-CE 402 C-multicast routing overhead on the overall system scalability is 403 independent of the protocol used for PE-PE signaling, and 404 is not relevant when comparing the different approaches proposed for 405 the PE-PE C-multicast routing. This is true even if in some 406 operational contexts the PE-CE C-multicast routing overhead is a 407 significant factor in the overall system overhead. 409 3.3.3. P-routers scalability 411 Mechanisms (1) and (2) are restricted to use within multicast VPNs 412 that use an MI-PMSI, thereby necessitating: 414 the use of a P-tunnel technique that allows shared P-tunnels (for 415 example PIM-SM in ASM mode or MP2MP LDP) 417 or the use of one P-tunnel per PE per VPN, even for PEs that do not 418 have sources in their directly attached sites for that VPN.
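The cost of the second alternative (one P-tunnel per PE per VPN) can be sketched as follows. This is a minimal illustration that assumes one P2MP P-tunnel rooted at every PE of the VPN; the function names and PE identifiers are hypothetical.

```python
# Illustrative only: function names and PE identifiers are hypothetical.

def mi_pmsi_p2mp_tunnels(pes_in_vpn: set) -> int:
    """P2MP P-tunnels needed to build an MI-PMSI for one VPN: one per PE."""
    return len(pes_in_vpn)


def tunnels_rooted_at_sourceless_pes(pes_in_vpn: set,
                                     pes_with_sources: set) -> int:
    """P-tunnels rooted at PEs that have no source in their attached sites."""
    return len(pes_in_vpn - pes_with_sources)


if __name__ == "__main__":
    pes = {"PE1", "PE2", "PE3", "PE4", "PE5"}
    sources = {"PE1"}
    print(mi_pmsi_p2mp_tunnels(pes))
    print(tunnels_rooted_at_sourceless_pes(pes, sources))
```

In this example, four of the five P-tunnels exist only to complete the MI-PMSI, and each of them still creates state on the P routers it crosses.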
420 By comparison, the fourth mechanism doesn't impose either of the 421 two restrictions that the use of an MI-PMSI entails, and when P2MP P-tunnels are used only necessitates the 422 use of one P-tunnel per VPN per PE attached to a site with a 423 multicast source or RP (or with a candidate BSR, if BSR is used). 425 In cases where there are fewer PEs connected with sources than the 426 total number of PEs, this reduces the amount of state maintained by 427 P-routers compared to the amount required to build an MI-PMSI with 428 P2MP P-tunnels. Such cases are expected to be frequent for multicast 429 VPN deployments (see section 4.2.4.1 of [RFC4834]). 431 3.3.4. Impact of C-multicast routing on Inter-AS deployments 433 Co-existence with unicast inter-AS VPN options, and an equal level of 434 security for multicast and unicast including in an inter-AS context, 435 are specifically mentioned in sections 5.2.6, 5.2.8 and 5.2.12 of 436 [RFC4834]. 438 In an inter-AS option B context, ASes are isolated from each other because 439 PEs in one AS do not (directly) exchange routing information 440 with PEs of other ASes. This property is not preserved if PIM-based 441 PE-PE C-multicast routing is used. By contrast, the fourth option 442 (BGP-based C-Multicast routing) does preserve this property. 444 Additionally, the authors note that the proposed BGP-based approach 445 for C-multicast routing provides a good fit with both the segmented 446 and non-segmented inter-AS approaches. By contrast, though 447 PIM-based C-multicast routing is usable with segmented inter-AS tunnels, 448 the inter-AS scalability advantage of the approach is lost, since PEs 449 in an AS will see the C-multicast routing activity of all other PEs 450 of all other ASes. 452 3.3.5. Security and robustness 454 BGP supports MD5 authentication of its peers for additional security, 455 which can directly benefit multicast VPN customer multicast 456 routing, whether for intra-AS or inter-AS communications.
By 457 contrast, with a PIM-based approach, no mechanism providing a 458 comparable level of security to authenticate communications between 459 remote PEs has yet been fully described 460 [I-D.ietf-pim-sm-linklocal], and in any case such a mechanism would require 461 significant additional operations for the provider to be usable in a 462 multicast VPN context. 464 The robustness of the infrastructure, especially the existing 465 infrastructure providing unicast VPN connectivity, is key. The 466 C-multicast routing function, especially under load, will compete 467 with the unicast routing infrastructure. With the PIM-based 468 approaches, the unicast and multicast VPN routing functions are 469 expected to compete only in the PE, for control plane processing 470 resources. In the case of the BGP-based approach, they will compete 471 on the PE for processing resources, and in the route reflectors 472 (supposing they are used for MVPN routing). It is identified that in 473 both cases, mechanisms will be required to arbitrate resources (e.g. 474 processing priorities). In the case of PIM-based procedures, this arbitration takes place between 475 the different control plane routing instances in the PE. In the 476 case of the BGP-based approach, this is likely to require using 477 distinct BGP sessions for multicast and unicast (e.g. through the use 478 of dedicated MVPN BGP route reflectors, or through the use of a distinct 479 session with an existing route reflector). 481 Multicast routing is dynamic by nature, and multicast VPN routing has 482 to follow the VPN customers' multicast routing events. The different 483 approaches can be compared on how they are expected to behave in 484 scenarios where multicast routing in the VPNs is subject to 485 intense activity.
Scalability of each approach under such a load is 486 detailed in Appendix A.2, and the fourth approach (BGP-based) used in 487 conjunction with the RT Constraint mechanisms [RFC4684], is the only 488 one having a cost for join/leave operations independent of the number 489 of PEs in the VPN (with one exception detailed in Appendix A.2) and 490 state maintenance not concentrated on the upstream PE. 492 On the other hand, while the BGP-based approach is likely to suffer a 493 slowdown under a load that is greater than the available processing 494 resources (because of possibly congested TCP sockets), the PIM-based 495 approaches would react to such a load by dropping messages, with 496 failure-recovery obtained through message refreshes. Thus, the BGP- 497 based approach could result in a degradation of join/leave latency 498 performance typically spread evenly across all multicast streams 499 being joined in that period, while the PIM-based approach could 500 result in increased join/leave latency, for some random streams, by a 501 multiple of the time between refreshes (e.g. tens of seconds), and 502 possibly in some states the adjacency may time-out resulting in 503 disruption of multicast streams. 505 The behavior of the PIM-based approach under such a load is also 506 harder to predict, given that the performance of the "Join 507 suppression" mechanism (an important mechanism for this approach to 508 scale) will itself be impeded by delays in Join processing. For 509 these reasons, the BGP-based approach would be able to provide a 510 smoother degradation and more predictable behavior under a highly 511 dynamic load. 
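The difference between the two degradation modes can be illustrated with a deliberately simple model. All of it is an assumption made for illustration: a burst of Joins arrives at once at an upstream PE that processes them at a fixed rate; in the datagram case, Joins that overflow a fixed-size input buffer are dropped and retried at the next periodic refresh, while in the stream case TCP holds back the excess and every Join is eventually processed in order. The function and parameter names are hypothetical.

```python
# Illustrative model only; parameters and function names are assumptions.

def datagram_latencies(n_joins, rate_per_s, buffer_size, refresh_period_s):
    """PIM-like behavior: Joins that overflow the input buffer are dropped
    and only retried at the next periodic refresh."""
    if buffer_size < 1 or rate_per_s <= 0:
        raise ValueError("buffer_size must be >= 1 and rate_per_s > 0")
    latencies = []
    pending = n_joins
    round_start = 0.0
    while pending > 0:
        accepted = min(pending, buffer_size)
        # Accepted Joins are processed one after another at rate_per_s.
        latencies.extend(round_start + (i + 1) / rate_per_s
                         for i in range(accepted))
        pending -= accepted
        round_start += refresh_period_s
    return latencies


def stream_latencies(n_joins, rate_per_s):
    """BGP-like behavior: nothing is dropped; each Join waits for the
    Joins queued ahead of it."""
    if rate_per_s <= 0:
        raise ValueError("rate_per_s must be > 0")
    return [(i + 1) / rate_per_s for i in range(n_joins)]


if __name__ == "__main__":
    pim = datagram_latencies(150, 1000.0, 100, 60.0)
    bgp = stream_latencies(150, 1000.0)
    print(max(pim), max(bgp))
```

With a burst of 150 Joins, a rate of 1000 Joins per second, a buffer of 100 and a 60-second refresh, the datagram model serves 100 Joins almost immediately and delays the other 50 by about one refresh period, whereas the stream model delays every Join by a fraction of a second. The model assumes the buffer drains well within one refresh period and ignores Join suppression and adjacency timeouts.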
513 In fact, both an "evenly spread degradation" and an "unevenly spread 514 larger degradation" can be problematic, and what seems important is 515 the ability for the VPN backbone operator to (a) limit the amount of 516 multicast routing activity that can be triggered by a multicast VPN 517 customer, and to (b) provide the best possible independence between 518 distinct VPNs. It seems that both of these can be addressed through 519 local implementation improvements, and that both the BGP-based and 520 PIM-based approaches could be engineered to provide (a) and (b). It 521 can be noted though that the BGP approach proposes ways to dampen 522 C-multicast route withdrawals and/or advertisements, and thus already 523 describes a way to provide (a), while nothing comparable has yet been 524 described for the PIM-based approaches (even though it doesn't appear 525 difficult). The PIM-based approaches rely on a per-VPN data plane to 526 carry the MVPN control plane, and thus may benefit from this first 527 level of separation to solve (b). 529 3.3.6. C-multicast VPN join latency 531 Section 5.1.3 of [RFC4834] states that "the group join delay [...] is 532 also considered one important QoS parameter. It is thus RECOMMENDED 533 that a multicast VPN solution be designed appropriately in this 534 regard". In a multicast VPN context, the "group join delay" of 535 interest is the time between a CE sending a PIM Join to its PE and 536 the first packet of the corresponding multicast stream being received 537 by the CE. 539 It is to be noted that the C-multicast routing procedures will only 540 impact the group join latency of a given multicast stream for the 541 first receiver that is located across the provider backbone from the 542 multicast source-connected PE (or the first receivers, in the 543 specific case where the UMH selection algorithm in use 544 allows distinct UMHs to be selected by distinct downstream PEs).
546 The different approaches proposed seem to have different 547 characteristics in how they are expected to impact join latency: 549 o the PIM-based approaches minimize the number of control plane 550 processing hops between a new receiver-connected PE and the 551 source-connected PE and, being datagram-based, introduce minimal 552 delay, so that join latency can be as low as 553 implementation efficiency allows 555 o under degraded conditions (packet loss, congestion, high control 556 plane load) the PIM-based approach may impact the latency for a 557 given multicast stream in an all-or-nothing manner: if a 558 C-multicast routing PIM Join packet is lost, latency can become 559 high (a multiple of the periodicity of PIM Join refreshes) 561 o the BGP-based approach uses TCP exchanges, which may introduce an 562 additional delay depending on BGP and TCP implementation, but 563 which would typically result, under degraded conditions (such as 564 packet loss, congestion, high control plane load), in a comparatively 565 lower increase of latency spread more evenly across the streams 567 o as shown in Appendix A, the BGP-based approach is particular in 568 that it removes load from all the PEs (without putting this load 569 on the upstream PE for a stream); this reduction in background 570 load can bring improved performance when a PE acts as the upstream 571 PE for a stream, and thus benefit join latency 573 This qualitative comparison of approaches shows that the BGP-based 574 approach is designed for a smoother degradation of latency under 575 degraded conditions such as packet loss, congestion, or high control 576 plane load. On the other hand, the PIM-based approaches seem 577 structurally able to reach the shortest "best-case" group join 578 latency (especially compared to deployments of the BGP-based approach 579 where route-reflectors are used).
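The "all-or-nothing" effect of a lost PIM Join can be made concrete with a small worked model. It is a sketch under stated assumptions, not a benchmark: each Join (and each refresh of it) is assumed to be lost independently with the same probability, a lost Join is only recovered by the next periodic refresh, and the PIM-SM default Join/Prune period of 60 seconds is used. Under these assumptions the number of lost attempts before the first success is geometrically distributed. The function names are hypothetical.

```python
# Illustrative model only; function names are hypothetical, and losses are
# assumed independent, which real congestion usually is not.

PIM_JOIN_REFRESH_PERIOD_S = 60.0  # PIM-SM default Join/Prune period


def expected_extra_latency_s(loss_probability: float,
                             refresh_period_s: float = PIM_JOIN_REFRESH_PERIOD_S) -> float:
    """Mean latency added by losses: refresh period times p / (1 - p)."""
    if not 0.0 <= loss_probability < 1.0:
        raise ValueError("loss_probability must be in [0, 1)")
    return refresh_period_s * loss_probability / (1.0 - loss_probability)


def probability_of_at_least(k_refreshes: int, loss_probability: float) -> float:
    """Probability that a Join is delayed by at least k refresh periods."""
    return loss_probability ** k_refreshes


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.1):
        print(p, expected_extra_latency_s(p), probability_of_at_least(1, p))
```

The mean hides the shape of the distribution: most Joins see no added delay at all, while the fraction that is lost waits for at least one full refresh period.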
581 Doing a quantitative comparison of latencies is not possible without 582 referring to specific implementations and benchmarking procedures, 583 and would possibly expose different conclusions, especially for best- 584 case group join latency for which performance is expected to vary with 585 PIM and BGP implementations. We can also note that improving a BGP 586 implementation for reduced latency of route processing would not only 587 benefit multicast VPN group join latency, but the whole BGP-based 588 routing, which means that the need for good BGP/RR performance is not 589 specific to multicast VPN routing. 591 Last, C-multicast join latency will be impacted by the overall load 592 put on the control plane, and the scalability of the C-multicast 593 routing approach is thus to be taken into account. As explained in 594 Section 3.3.1 and Appendix A, the BGP-based approach will 595 provide the best scalability with an increased number of PEs per VPN, 596 thereby benefiting group join latency in such higher scale scenarios. 598 3.3.7. Conclusion on C-multicast routing 600 The first and fourth approaches are relevant contenders for 601 C-multicast routing. Comparisons from a theoretical standpoint lead 602 to the identification of some advantages as well as possible drawbacks in the 603 fourth approach. Comparisons from a practical standpoint are harder 604 to make: since only limited deployment and implementation information 605 is available for the fourth approach, advantages would be seen in the 606 first approach that has been applied through multiple deployments and 607 shown to be operationally viable. 609 Moreover, the first mechanism (full per-MVPN PIM peering across an 610 MI-PMSI) is the mechanism used by [I-D.rosen-vpn-mcast] and therefore 611 it is deployed and operating in MVPNs today.
The fourth approach may 612 or may not end up being preferred for a given deployment, but because 613 the first approach has been in deployment for some time, the support 614 for this mechanism will in any case be helpful to facilitate an 615 eventual migration from a deployment using a mechanism close to the 616 first approach. 618 Consequently, at the present time, implementations are recommended to 619 support both the fourth (BGP-based) and first (Full per-MVPN PIM 620 peering) mechanisms. Further experience on deployments of the fourth 621 approach is needed before some best practice can be defined. In the 622 meantime, this recommendation would enable service providers to 623 choose between the first and the fourth mechanism, without this 624 choice being constrained by vendors' implementation choices. 626 3.4. Encapsulation techniques for P-multicast trees 628 In this section the authors will not make any restrictive 629 recommendations since the appropriateness of a specific provider core 630 data plane technology will depend on a large number of factors, for 631 example the service provider's currently deployed unicast data plane, 632 many of which are service provider specific. 634 However, implementations should not unreasonably restrict the data 635 plane technology that can be used, and should not force the use of 636 the same technology for different VPNs attached to a single PE. 637 Initial implementations may only support a reduced set of 638 encapsulation techniques and data plane technologies but this should 639 not be a limiting factor that hinders future support for other 640 encapsulation techniques, data plane technologies or 641 interoperability. 643 Section 5.2.4.1 of [RFC4834] states "In a multicast VPN solution 644 extending a unicast L3 PPVPN solution, consistency in the tunneling 645 technology has to be favored: such a solution SHOULD allow the use of 646 the same tunneling technology for multicast as for unicast.
647 Deployment consistency, ease of operation and potential migrations 648 are the main motivations behind this requirement." 650 Current unicast VPN deployments use a variety of LDP, RSVP-TE and 651 GRE/IP-Multicast for encapsulating customer packets for transport 652 across the provider core of VPN services. In order to allow the same 653 encapsulations to be used for unicast and multicast VPN traffic, it 654 is recommended that multicast VPN standards recommend that 655 implementations support, for multicast VPNs, all the P2MP variants 656 of the encapsulations and signaling protocols that they support for 657 unicast and for which some multipoint extension is defined, such as 658 mLDP, P2MP RSVP-TE and GRE/IP-multicast. 660 All three of the above encapsulation techniques support the building 661 of P2MP multicast P-tunnels. In addition mLDP and GRE/ 662 IP-ASM-Multicast implementations may also support the building of 663 MP2MP multicast P-tunnels. The use of MP2MP P-tunnels may provide 664 some scaling benefits to the service provider as only a single MP2MP 665 P-tunnel need be deployed per VPN, thus reducing by an order of 666 magnitude the amount of multicast state that needs to be maintained 667 by P routers. This gain in state is at the expense of bandwidth 668 optimization, since sites that do not have multicast receivers for 669 multicast streams sourced behind a given PE group will still receive 670 packets of such streams, leading to non-optimal bandwidth utilization 671 across the VPN core. One thing to consider is that the use of MP2MP 672 multicast P-tunnels will require additional configuration to define 673 the same P-tunnel identifier or multicast ASM group address in all 674 PEs (it has been noted that some auto-configuration could be possible 675 for MP2MP P-tunnels, but this is not currently supported by the 676 auto-discovery procedures).
[ It has been noted that C-multicast 677 routing schemes not covered in [I-D.ietf-l3vpn-2547bis-mcast] could 678 expose different advantages of MP2MP multicast P-tunnels - this is 679 out of scope of this document ] 681 MVPN services can also be supported over a unicast VPN core through 682 the use of ingress PE replication whereby the ingress PE replicates 683 any multicast traffic over the P2P tunnels used to support unicast 684 traffic. While this option does not require the service provider to 685 modify their existing P routers (in terms of protocol support) and 686 does not require maintaining multicast-specific state on the P 687 routers in order for the service provider to be able to deploy a 688 multicast VPN service, the use of ingress PE replication obviously 689 leads to non-optimal bandwidth utilization and it is therefore 690 unlikely to be the long-term solution chosen by service providers. 691 However, ingress PE replication may be useful during some migration 692 scenarios or where a service provider considers the level of 693 multicast traffic on their network to be too low to justify deploying 694 multicast-specific support within their VPN core. 696 All proposed approaches for control plane and dataplane can be used 697 to provide aggregation amongst multicast groups within a VPN and 698 amongst different multicast VPNs, and potentially reduce the amount 699 of state to be maintained by P routers. However, the latter (the 700 aggregation amongst different multicast VPNs) will require support for 701 upstream-assigned labels on the PEs. Support for upstream-assigned 702 labels may require changes to the data plane processing of the PEs 703 and this should be taken into consideration by service providers 704 considering the use of aggregate PMSI tunnels for the specific 705 platforms that the service provider has deployed. 707 3.5.
Inter-AS deployment options 709 There are a number of scenarios that lead to the requirement for 710 inter-AS multicast VPNs, including: 712 1. a service provider may have a large network that they have 713 segmented into a number of ASs. 715 2. a service provider's multicast VPN may consist of a number of ASs 716 due to acquisitions and mergers with other service providers. 718 3. a service provider may wish to interconnect their multicast VPN 719 platform with that of another service provider. 721 The first scenario can be considered the "simplest" because the 722 network is wholly managed by a single service provider under a single 723 strategy and is therefore likely to use a consistent set of 724 technologies across each AS. 726 The second scenario may be more complex than the first because the 727 strategy and technology choices made for each AS may have been 728 different due to their differing history and the service provider may 729 not have unified (or may be unwilling to unify) the strategy and technology 730 choices for each AS. 732 The third scenario is the most complex because in addition to the 733 complexity of the second scenario, the ASs are managed by different 734 service providers and therefore may be subject to a different trust 735 model than the other scenarios. 737 Section 5.2.6 of [RFC4834] states that "a solution MUST support 738 inter-AS multicast VPNs, and SHOULD support inter-provider multicast 739 VPNs", "considerations about coexistence with unicast inter-AS VPN 740 Options A, B and C (as described in section 10 of [RFC4364]) are 741 strongly encouraged" and "a multicast VPN solution SHOULD provide 742 inter-AS mechanisms requiring the least possible coordination between 743 providers, and keep the need for detailed knowledge of providers' 744 networks to a minimum - all this being in comparison with 745 corresponding unicast VPN options".
747 Section 8 of [I-D.ietf-l3vpn-2547bis-mcast] addresses these 748 requirements by proposing two approaches for MVPN inter-AS 749 deployments: 751 1. Non-segmented inter-AS tunnels where the multicast tunnels are 752 end-to-end across ASes, so even though the PEs belonging to a 753 given MVPN may be in different ASs the ASBRs play no special role 754 and function merely as P routers (described in section 8.1). 756 2. Segmented inter-AS tunnels where each AS constructs its own 757 separate multicast tunnels which are then 'stitched' together by 758 the ASBRs (described in section 8.2). 760 Section 5.2.6 of [RFC4834] also states "Within each service provider 761 the service provider SHOULD be able on its own to pick the most 762 appropriate tunneling mechanism to carry (multicast) traffic among 763 PEs (just like what is done today for unicast)". The segmented 764 approach is the only one capable of meeting this requirement. 766 The segmented inter-AS solution would appear to offer the largest 767 degree of deployment flexibility to operators. However the non- 768 segmented inter-AS solution can simplify deployment in a restricted 769 number of scenarios and [I-D.rosen-vpn-mcast] only supports the non- 770 segmented inter-AS solution and therefore the non-segmented inter-AS 771 solution is likely to be useful to some operators for backward 772 compatibility and during migration from [I-D.rosen-vpn-mcast] to 773 [I-D.ietf-l3vpn-2547bis-mcast]. 775 The applicability of segmented or non-segmented inter-AS tunnels to a 776 given deployment or inter-provider interconnect will depend on a 777 number of factors specific to each service provider. However, due to 778 the additional deployment flexibility offered by segmented inter-AS 779 tunnels, it is the recommendation of the authors that all 780 implementations should support the segmented inter-AS model. 
781 Additionally, the authors recommend that implementations should 782 consider supporting the non-segmented inter-AS model in order to 783 facilitate co-existence with existing deployments, and as a feature 784 to allow lighter engineering in a restricted set of scenarios, 785 although it is recognized that initial implementations may only 786 support one or the other. 788 3.6. Bidir-PIM support 790 In Bidir-PIM, the packet forwarding rules have been improved over 791 PIM-SM, allowing traffic to be passed up the shared tree toward the 792 RP Address (RPA). To avoid multicast packet looping, Bidir-PIM uses 793 a mechanism called the designated forwarder (DF) election, which 794 establishes a loop-free tree rooted at the RPA. Use of this method 795 ensures that only one copy of every packet will be sent to an RPA, 796 even if there are parallel equal-cost paths to the RPA. To avoid 797 loops, the DF election process enforces a consistent view of the DF on 798 all routers on a network segment, and during periods of ambiguity or 799 routing convergence traffic forwarding is suspended. 801 In the context of a multicast VPN solution, a solution for Bidir-PIM 802 support must preserve this property of similarly avoiding packet 803 loops, including in the case where mVRFs in a given MVPN don't have 804 a consistent view of the routing to C-RPL/C-RPA. 806 The current MVPN specifications [I-D.ietf-l3vpn-2547bis-mcast], in 807 section 11, define three methods to support Bidir-PIM, as RECOMMENDED 808 in [RFC4834]: 810 1. Standard DF election procedure over an MI-PMSI 812 2. VPN Backbone as the RPL (section 11.1) 814 3. Partitioned Sets of PEs (section 11.2) 816 Method (1) is naturally applied to deployments using "Full per-MVPN 817 PIM peering across an MI-PMSI" for C-multicast routing, but as 818 indicated in [I-D.ietf-l3vpn-2547bis-mcast] in section 11, the DF 819 Election may not work well in an MVPN environment and an alternative 820 to DF election would be desirable.
822 The advantage of methods (2) and (3) is that they do not require 823 running the DF election procedure among PEs. 825 Method (2) leverages the fact that in Bidir-PIM, running the DF 826 election procedure is not needed on the RPL. This approach thus has 827 the benefit of simplicity of implementation, especially in a context 828 where BGP-based C-multicast routing is used. However it has the 829 drawback of putting constraints on how Bidir-PIM is deployed which 830 may not always match MVPN customers' requirements. 832 Method (3) treats an MVPN as a collection of sets of multicast VRFs, 833 all PEs in a set having the same reachability information towards 834 C-RPA, but distinct from PEs in other sets. Hence, with this method, 835 C-Bidir packet loops in MVPN are resolved by the ability to partition 836 a VPN into disjoint sets of VRFs, each having a distinct view of 837 the converged network. The partitioning approach to Bidir-PIM requires 838 either upstream-assigned MPLS labels (to denote the partition) or a 839 unique MP2MP LSP per partition. The former is based on PE 840 Distinguisher Labels that have to be distributed using auto-discovery 841 BGP routes and their handling requires the support for upstream- 842 assigned labels and context label lookups [RFC5331]. The latter, 843 using an MP2MP LSP per partition, does not have these constraints but is 844 restricted to P-tunnel types supporting MP2MP connectivity (such as 845 mLDP [I-D.ietf-mpls-ldp-p2mp]). 847 This approach to C-Bidir can work with PIM-based or BGP-based 848 C-multicast routing procedures, and is also generic in the sense that 849 it does not impose any requirements on the Bidir-PIM service 850 offering. 852 Given the above considerations, method (3) "Partitioned Sets of PEs" 853 is the RECOMMENDED approach.
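The partitioning idea behind method (3) can be sketched as follows. This is only an illustration under stated assumptions: each PE's reachability information towards the C-RPA is summarized by a single opaque key, and PEs sharing the same key fall into the same set. The function name, the key format and the sample PE names are hypothetical and are not taken from the MVPN specifications.

```python
from collections import defaultdict


def partition_pes(view_of_c_rpa):
    """Group PEs into disjoint sets by their view of the route to the C-RPA.

    `view_of_c_rpa` maps a PE name to an opaque, hashable summary of its
    reachability information towards the C-RPA (an assumption made for
    this sketch). PEs with identical summaries land in the same set.
    """
    partitions = defaultdict(set)
    for pe, view in view_of_c_rpa.items():
        partitions[view].add(pe)
    return dict(partitions)


# Hypothetical example: PE1 and PE2 agree on the route, PE4 does not.
views = {"PE1": "via-PE3", "PE2": "via-PE3", "PE4": "via-PE5"}
sets_of_pes = partition_pes(views)
```

Each resulting set would then be identified on the backbone either by an upstream-assigned label or by its own MP2MP LSP, as described above; the sketch only covers the grouping step.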
855 In the event where method (3) is not applicable (lack of support for 856 upstream assigned labels or for a P-tunnel type providing MP2MP 857 connectivity), then method (1) "Standard DF election procedure over 858 an MI-PMSI" and (2) "VPN Backbone as the RPL" are RECOMMENDED as 859 interim solutions, (1) having the advantage over (2) of not putting 860 constraints on how Bidir-PIM is deployed and the drawbacks of only 861 being applicable when PIM-based C-multicast is used and of possibly 862 not working well in an MVPN environment. 864 4. Co-located RPs 866 Section 5.1.10.1 of [RFC4834] states "In the case of PIM-SM in ASM 867 mode, engineering of the RP function requires the deployment of 868 specific protocols and associated configurations. A service provider 869 may offer to manage customers' multicast protocol operation on their 870 behalf. This implies that it is necessary to consider cases where a 871 customer's RPs are out-sourced (e.g. on PEs). Consequently, a VPN 872 solution MAY support the hosting of the RP function in a VR or VRF." 873 However, customers who have already deployed multicast within their 874 networks and have therefore already deployed their own internal RPs 875 are often reluctant to hand over the control of their RPs to their 876 service provider and make use of a co-located RP model, and providing 877 RP-collocation on a PE will require the activation of MSDP or the 878 processing of PIM Registers on the PE. Securing the PE routers for 879 such activity requires special care, additional work, and will likely 880 rely on specific features to be provided by the routers themselves. 882 The applicability of the co-located RP model to a given MVPN will 883 thus depend on a number of factors specific to each customer and 884 service provider. 
886 It is therefore recommended that implementations should 887 support a co-located RP model, but that support for a co-located RP 888 model within an implementation should not restrict deployments to 889 using a co-located RP model: implementations MUST support deployments 890 when activation of a PIM RP function (PIM Register processing and RP- 891 specific PIM procedures) or VRF MSDP instance is not required on any 892 PE router and where all the RPs are deployed within the customers' 893 networks or CEs. 895 5. Existing deployments 897 Some suggestions provided in this document can be used to 898 incrementally modify currently deployed implementations without 899 hindering these deployments, and without hindering the consistency of 900 the standardized solution, by providing optional per-VRF configuration 901 knobs to support modes of operation compatible with currently 902 deployed implementations, while at the same time using the 903 recommended approach on implementations supporting the standard. 905 In cases where this may not be easily achieved, a recommended 906 approach would be to provide a per-VRF configuration knob that allows 907 incremental per-VPN migration of the mechanisms used by a PE device, 908 which would allow migration with some per-VPN interruption of service 909 (e.g. during a maintenance window). 911 Mechanisms allowing "live" migration by providing concurrent use of 912 multiple alternatives for a given PE and a given VPN are not seen as 913 a priority considering the expected implementation complexity 914 associated with such mechanisms. However, if there happen to be 915 cases where they could be viably implemented relatively simply, such 916 mechanisms may help improve migration management. 918 6. Summary of recommendations 920 The following list summarizes conclusions on the mechanisms that 921 define the set of mandatory-to-implement mechanisms in the context of 922 [I-D.ietf-l3vpn-2547bis-mcast].
924 Note well that the implementation of the non-mandatory alternative 925 mechanisms is not precluded. 927 Recommendations are: 929 o that BGP-based auto-discovery be the mandated solution for auto- 930 discovery; 932 o that BGP be the mandated solution for S-PMSI switching signaling; 934 o that implementations support both the BGP-based and the full per- 935 MVPN PIM peering solutions for PE-PE exchange of customer 936 multicast routing until further operational experience is gained 937 with both solutions; 939 o that implementations use the "Partitioned Sets of PEs" approach 940 for Bidir-PIM support; 942 o that implementations implement the P2MP variants of the P2P 943 protocols that they already implement, such as mLDP, P2MP RSVP-TE 944 and GRE/IP-Multicast; 946 o that implementations support segmented inter-AS tunnels and 947 consider supporting non-segmented inter-AS tunnels (in order to 948 maintain backwards compatibility and for migration); 950 o implementations MUST support deployments when activation of a PIM 951 RP function (PIM Register processing and RP-specific PIM 952 procedures) or VRF MSDP instance is not required on any PE router. 954 7. IANA Considerations 956 This document makes no request to IANA. 958 [ Note to RFC Editor: this section may be removed on publication as 959 an RFC. ] 961 8. Security Considerations 963 This document does not by itself raise any particular security 964 considerations. 966 9. Acknowledgements 968 We would like to thank Adrian Farrel, Eric Rosen, Yakov Rekhter, and 969 Maria Napierala for their feedback that helped shape this document. 971 Additional credit is due to Maria Napierala for co-authoring 972 Section 3.6 on Bidir-PIM support. 974 10. References 976 10.1. Normative References 978 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 979 Requirement Levels", BCP 14, RFC 2119, March 1997.
981 [I-D.ietf-l3vpn-2547bis-mcast] 982 Aggarwal, R., Bandi, S., Cai, Y., Morin, T., Rekhter, Y., 983 Rosen, E., Wijnands, I., and S. Yasukawa, "Multicast in 984 MPLS/BGP IP VPNs", draft-ietf-l3vpn-2547bis-mcast-08 (work 985 in progress), March 2009. 987 [I-D.ietf-l3vpn-2547bis-mcast-bgp] 988 Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP 989 Encodings and Procedures for Multicast in MPLS/BGP IP 990 VPNs", draft-ietf-l3vpn-2547bis-mcast-bgp-08 (work in 991 progress), September 2009. 993 10.2. Informative References 995 [RFC4834] Morin, T., "Requirements for Multicast in L3 Provider- 996 Provisioned Virtual Private Networks (PPVPNs)", RFC 4834, 997 April 2007. 999 [I-D.rosen-vpn-mcast] 1000 Cai, Y., Rosen, E., and I. Wijnands, "Multicast in MPLS/ 1001 BGP IP VPNs", draft-rosen-vpn-mcast-12 (work in progress), 1002 August 2009. 1004 [I-D.raggarwa-l3vpn-2547-mvpn] 1005 Aggarwal, R., "Base Specification for Multicast in BGP/ 1006 MPLS VPNs", draft-raggarwa-l3vpn-2547-mvpn-00 (work in 1007 progress), June 2004. 1009 [I-D.ietf-pim-sm-linklocal] 1010 Atwood, J., "Authentication and Confidentiality in PIM-SM 1011 Link-local Messages", draft-ietf-pim-sm-linklocal-08 (work 1012 in progress), November 2007. 1014 [I-D.ietf-pim-port] 1015 Farinacci, D., Wijnands, I., Venaas, S., and M. Napierala, 1016 "A Reliable Transport Mechanism for PIM", 1017 draft-ietf-pim-port-01 (work in progress), July 2009. 1019 [RFC4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, 1020 R., Patel, K., and J. Guichard, "Constrained Route 1021 Distribution for Border Gateway Protocol/MultiProtocol 1022 Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual 1023 Private Networks (VPNs)", RFC 4684, November 2006. 1025 [I-D.ietf-mpls-ldp-p2mp] 1026 Minei, I., Kompella, K., Wijnands, I., and B. 
Thomas, 1027 "Label Distribution Protocol Extensions for Point-to- 1028 Multipoint and Multipoint-to-Multipoint Label Switched 1029 Paths", draft-ietf-mpls-ldp-p2mp-08 (work in progress), 1030 October 2009. 1032 [RFC5331] Aggarwal, R., Rekhter, Y., and E. Rosen, "MPLS Upstream 1033 Label Assignment and Context-Specific Label Space", 1034 RFC 5331, August 2008. 1036 Appendix A. Scalability of C-multicast routing processing load 1038 The main role of multicast routing is to let routers determine that 1039 they should start or stop forwarding a given multicast stream on a 1040 given link. In an MVPN context, this has to be done for each MVPN, 1041 and the associated function is thus named "customer-multicast 1042 routing" or "C-multicast routing" and its role is to let PE routers 1043 determine that they should start or stop forwarding the traffic of a 1044 given multicast stream toward the remote PEs, on some PMSI tunnel. 1046 When some "join" message is received by a PE, this PE knows that it 1047 should be sending traffic for the corresponding multicast group of 1048 the corresponding MVPN. But the reception of a "prune" message from 1049 a remote PE is not enough by itself for a PE to know that it should 1050 stop forwarding the corresponding multicast traffic: it has to make 1051 sure that there aren't any other PEs that still have receivers for 1052 this traffic. 1054 There are many ways that the "C-multicast routing" building block can 1055 be designed, and they differ, among other things, in how a PE 1056 determines when it can stop forwarding a given multicast stream toward 1057 other PEs: 1059 PIM LAN Procedures, by default 1060 By default when PIM LAN procedures are used, when a PE on a LAN 1061 Prunes itself from a multicast tree, all other PEs on that LAN 1062 check their own state to know if they are on the tree, in which 1063 case they send a PIM Join message on that LAN to override the 1064 Prune.
Thus, for each PIM Prune message, all PE routers on the 1065 LAN work to let the upstream PE determine the answer to the "did 1066 the last receiver leave?" question. 1068 PIM LAN Procedures, with explicit tracking: 1069 On a LAN, PIM LAN procedures can use an "explicit tracking" 1070 approach, where a PE which is the upstream router for a multicast 1071 stream maintains an updated list of all neighbors on the LAN who 1072 are joined to the tree. Thus, when it receives a Leave message 1073 from a PIM neighbor, it instantly knows the answer to the "did the 1074 last receiver leave?" question. 1075 In this case, the question is answered by the upstream router 1076 alone. The side effect of this "explicit tracking" is that "Join 1077 suppression" is not used: the downstream PEs will always send 1078 Joins toward the upstream PE, which will have to process them all. 1080 BGP-based C-multicast routing 1081 When BGP-based procedures are used for C-multicast routing, if no 1082 BGP route reflector is used, the "did the last receiver leave?" 1083 question is answered like in the PIM "explicit tracking" approach. 1084 But, when a BGP route reflector is used (which is expected to be 1085 the recommended approach), the role of maintaining an updated list 1086 of the PEs that are part of a given multicast tree is taken care of 1087 by the Route Reflector(s). Using BGP procedures, a route 1088 reflector that had previously advertised a C-multicast Source Tree Join 1089 route for a given (C-S, C-G) to other route reflectors will 1090 withdraw this route when none of its client PEs is 1091 advertising this route anymore. Similarly, a route reflector 1092 that had previously advertised this route to its client PEs will 1093 withdraw this route when none of its (other) client PEs 1094 and none of its route reflector peers is advertising this route 1095 anymore. In this context, the "did the last receiver leave?"
1096 question can be said to be answered by the route-reflector(s). 1097 Furthermore, the BGP route distribution can leverage more than one 1098 route reflector: if multiple route reflectors are used with PEs 1099 being distributed (as clients) among these route reflectors, the 1100 "did the last receiver leave?" question is partly answered by each 1101 of these route reflectors. 1103 We can see that answering the "last receiver leaves" question is a 1104 significant proportion of the work that the C-multicast routing 1105 building block has to do, and where the approaches differ most. 1106 The different approaches for handling C-multicast routing can result 1107 in different amounts of processing and in different ways of spreading 1108 this processing among the different functions. These differences can be better 1109 estimated by quantifying the amount of message processing and state 1110 maintenance. 1112 Though the type of processing, messages and states may vary with the 1113 different approaches, we propose here a rough estimation of the load 1114 of PEs, in terms of number of messages processed and number of 1115 control plane states maintained. A "message processed" is a 1116 message that is parsed, with a lookup being done and some action being 1117 taken (such as updating a control plane or data plane state). A 1118 "state maintained" is a multicast state kept in the control plane 1119 memory of a PE, related to an interface or a PE being subscribed to a 1120 multicast stream. Note that here we don't compare the data plane 1121 states on PE routers, which wouldn't vary between the different 1122 options chosen. 1124 A.1. Scalability with an increased number of PEs 1126 The following sections aim at evaluating the processing and state 1127 maintenance load for an increasingly high number of PEs in a VPN. 1129 A.1.1.
SSM Scenario 1131 The following subsections do such an estimation for each proposed 1132 approach for C-multicast routing, for different phases of the 1133 following scenario: 1135 o one SSM multicast stream is considered 1137 o only the intra-AS case is concerned (with the segmented inter-AS 1138 tunnels and BGP-based C-multicast routing, #MVPN_PE and #R_PE 1139 should refer to the PEs of the MVPN in the AS, not to all PEs of 1140 the MVPN) 1142 o the scenario is as follows: 1144 * one PE joins the multicast stream (because a new receiver- 1145 connected site has sent a Join on the PE-CE link), followed by 1146 a number of additional PEs that also join the same multicast 1147 stream, one after the other; we evaluate the processing 1148 required for the addition of each PE 1150 * some period of time T passes, without any PE joining or leaving 1151 (baseline) 1153 * all PEs leave, one after the other, until the last one leaves; 1154 we evaluate the processing required for the leave of each PE 1156 o the parameters used are: 1158 * #MVPN_PE: the number of PEs in the MVPN 1160 * #R_PE: the number of PEs joining the multicast stream 1162 * #RR: the number of route reflectors 1164 * T_PIM_r: the time between two refreshes of a PIM Join (default 1165 is 60s) 1167 The estimation unit used is the "message.equipment" (or "m.e"): one 1168 "message.equipment" corresponding to "one equipment processing one 1169 message" (10 m.e being "10 equipments each processing one message", 1170 or "5 messages each processed by 2 equipments", or "1 message 1171 processed by 10 equipments", etc.). Similarly, for the amount of 1172 control plane state, the unit used is "state.equipment" or "s.e". 1173 This makes it possible to take into account the fact that a message (or a state) 1174 can be processed (or maintained) by more than one node.
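As a small illustration of the "m.e" unit defined above, the load can be computed as the sum, over all messages, of the number of equipments that process each message. This is a minimal sketch; the function name and the list-based input format are hypothetical choices made for the example.

```python
def message_equipment(equipments_per_message):
    """Return the load in "m.e" for a batch of messages.

    `equipments_per_message` holds, for each message, the number of
    equipments that have to process it (an illustrative input format).
    """
    return sum(equipments_per_message)


# The three readings of "10 m.e" given in the text:
ten_distinct = message_equipment([1] * 10)  # 10 equipments, one message each
five_by_two = message_equipment([2] * 5)    # 5 messages, 2 equipments each
one_by_ten = message_equipment([10])        # 1 message seen by 10 equipments
```

The same counting applies to "s.e", replacing messages by states and counting the equipments that maintain each state.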
1176 We distinguish three different types of equipments: the upstream PE 1177 for the considered multicast stream, the RR (if any), and the other 1178 PEs (which are not the upstream PE). 1180 The numbers or orders of magnitude given in the tables in the 1181 following subsections are totals across all equipments of the same 1182 type, for each type of equipment, in the "m.e" and "s.e" units 1183 defined above. 1185 Additionally: 1187 o for PIM, only Join and Prune messages are counted: 1189 * the load due to PIM Hellos can be easily computed separately 1190 and only depends on the number of PEs in the VPN; 1192 * message processing related to the PIM Assert mechanism is also 1193 not taken into account, for the sake of simplicity; 1195 o for BGP, all advertisements and withdrawals of C-multicast Source 1196 Tree Join routes are considered (Source-Active autodiscovery 1197 routes are not used in an SSM context); and, following the 1198 recommendation of [I-D.ietf-l3vpn-2547bis-mcast-bgp], the case 1199 where the RT-Constraint mechanism [RFC4684] is not used is not 1200 covered. 1202 A.1.1.1. PIM LAN procedures, by default

1204 +------------+------------+---------------+----------+--------------+
1205 |            | upstream   | other PEs     | RR       | total across |
1206 |            | PE (1)     | (total across | (none)   | all          |
1207 |            |            | (#MVPN_PE-1)  |          | equipments   |
1208 |            |            | PEs)          |          |              |
1209 +------------+------------+---------------+----------+--------------+
1210 | first PE   | 1 m.e      | #MVPN_PE-1    | /        | #MVPN_PE m.e |
1211 | joins      |            | m.e           |          |              |
1212 +------------+------------+---------------+----------+--------------+
1213 | for *each* | 1 m.e      | #MVPN_PE-1    | /        | #MVPN_PE m.e |
1214 | additional |            | m.e           |          |              |
1215 | PE joining |            |               |          |              |
1216 +------------+------------+---------------+----------+--------------+
1217 | baseline   | T/T_PIM_r  | (T/T_PIM_r) . | /        | (T/T_PIM_r)  |
1218 | processing | m.e        | (#MVPN_PE-1)  |          | x #MVPN_PE   |
1219 | over a     |            | m.e           |          | m.e          |
1220 | period T   |            |               |          |              |
1221 +------------+------------+---------------+----------+--------------+
1222 | for *each* | 2 m.e      | 2(#MVPN_PE-1) | /        | 2 x #MVPN_PE |
1223 | PE leaving |            | m.e           |          | m.e          |
1224 +------------+------------+---------------+----------+--------------+
1225 | the last   | 1 m.e      | #MVPN_PE-1    | /        | #MVPN_PE m.e |
1226 | PE leaves  |            | m.e           |          |              |
1227 +------------+------------+---------------+----------+--------------+
1228 | total for  | #R_PE x 2  | (#MVPN_PE-1)  | 0        | #MVPN_PE x ( |
1229 | #R_PE PEs  | +          | x (#R_PE) x 2 |          | 3 x #R_PE +  |
1230 |            | T/T_PIM_r  | + T/T_PIM_r)  |          | T/T_PIM_r )  |
1231 |            | m.e        | .             |          | m.e          |
1232 |            |            | (#MVPN_PE-1)  |          |              |
1233 |            |            | m.e           |          |              |
1234 +------------+------------+---------------+----------+--------------+
1235 | total      | 1 s.e      | #R_PE s.e     | 0        | #R_PE+1 s.e  |
1236 | state      |            |               |          |              |
1237 | maintained |            |               |          |              |
1238 +------------+------------+---------------+----------+--------------+

1240 Message processing and state maintenance - PIM LAN procedures, by 1241 default 1243 We suppose here that the PIM Join suppression and Prune Override 1244 mechanisms are fully effective, i.e. that a Join or Prune message 1245 sent by a PE is instantly seen by other PEs. Strictly speaking, this 1246 is not true, and depending on network delays and timing, there could 1247 be cases where more messages are exchanged, so the numbers given in 1248 this table are a lower bound on the number of PIM messages exchanged. 1250 A.1.1.2.
PIM LAN procedures, with explicit tracking 1252 +-------------+-------------+----------------+--------+-------------+ 1253 | | upstream PE | other PEs | RRs | total | 1254 | | (1) | (total across | (none) | across all | 1255 | | | (#mvpn_PE-1) | | equipments | 1256 | | | PEs) | | | 1257 +-------------+-------------+----------------+--------+-------------+ 1258 | first PE | 1 m.e | 1 m.e (see | / | 2 m.e | 1259 | joins | | note below) | | | 1260 +-------------+-------------+----------------+--------+-------------+ 1261 | for *each* | 1 m.e | 1 m.e (see | / | 2 m.e | 1262 | additional | | note below) | | | 1263 | PE joining | | | | | 1264 +-------------+-------------+----------------+--------+-------------+ 1265 | baseline | (T/T_PIM_r) | (T/T_PIM_r) | / | (T/T_PIM_r) | 1266 | processing | m.e x #R_PE | m.e (see note | | x #R_PE m.e | 1267 | over a | m.e | below) | | | 1268 | period T | | | | | 1269 +-------------+-------------+----------------+--------+-------------+ 1270 | for *each* | 1 m.e | 1 m.e (see | / | 2 m.e | 1271 | PE leaving | | note below) | | | 1272 +-------------+-------------+----------------+--------+-------------+ 1273 | the last PE | 1 m.e | 1 m.e (see | / | 2 m.e | 1274 | leaves | | note below) | | | 1275 +-------------+-------------+----------------+--------+-------------+ 1276 | total for | #R_PE (2 + | #R_PE x ( 2 + | 0 | #R_PE x ( 4 | 1277 | #R_PE PEs | T/T_PIM_r) | T/T_PIM_r) m.e | | + | 1278 | | m.e | | | T/T_PIM_r) | 1279 | | | | | m.e | 1280 +-------------+-------------+----------------+--------+-------------+ 1281 | total state | #R_PE s.e | #R_PE s.e | 0 | 2 x #R_PE | 1282 | maintained | | | | s.e | 1283 +-------------+-------------+----------------+--------+-------------+ 1285 Messages processing and state maintenance - PIM LAN procedures, with 1286 explicit tracking 1288 Note: in this explicit tracking mode, a said Join or Leave message 1289 requires processing only by the upstream PE and the PE sending the 1290 message ; indeed, 
other PEs don't have any action to take ; it is to 1291 be noted though that these other PEs will still have to parse the PIM 1292 message, which is not zero processing. We make here the assumption 1293 that this is not significant. 1295 A.1.1.3. BGP-based C-multicast routing 1297 The following analysis assumes that BGP Route Reflectors (RRs) are 1298 used, and no hierarchy of RRs (remind that the analysis also assumes 1299 that Route Target Constrain mechanisms are is used). 1301 Given these assumptions, a message carrying a C-multicast route from 1302 a downstream PE would need to be processed by the RRs that have that 1303 PE as their client. Due to the use of RT Constrain, these RRs would 1304 then send this message to only the RRs that have the upstream PE as 1305 client. None of the other RRs, and none of the other PEs will 1306 receive this message. Thus, for a message associated with a given 1307 MVPN the total number of RRs that would need to process this message 1308 only depends on the number of RRs that maintain C-multicast routes 1309 for that MVPN and that have either the receiver-connected PE, or the 1310 source-connected PE as their clients, and is independent of the total 1311 number of RRs or the total number of PEs. 1313 In practice for a given MVPN a PE would be a client of just 2 RRs 1314 (for redundancy, an RR cluster would typically have 2 RRs). 1315 Therefore, in practice the message would need to be processed by at 1316 most 4 RRs (2 RRs if both the downstream PE and the upstream PE are 1317 the clients of the same RRs). Thus the number of RRs that have to 1318 process a given message is at most 4. Since RRs in different RR 1319 clusters have a full IBGP mesh among themselves, each RR in the RR 1320 cluster that contains the upstream PE would receive the message from 1321 each of the RR in the RR cluster that contains the downstream PE. 1322 Given 2 RRs per cluster, the total number of messages processed by 1323 all the RRs is 6. 
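   The figure of 6 can be retraced with a short sketch.  It assumes
   the following reading of the paragraph above: the downstream PE
   sends the route to every RR of its own cluster, and each of these
   RRs sends it to every RR of the cluster of the upstream PE.  The
   function name and its parameters are hypothetical, and the
   same-cluster case additionally assumes that the RRs of one cluster
   do not process each other's copies.

```python
def rr_message_equipment(rrs_per_cluster=2, same_cluster=False):
    """m.e spent on RRs for one C-multicast route advertisement.

    Assumptions (not stated as such in the text): no RR hierarchy,
    RT Constrain in use, one cluster for the downstream PE and one for
    the upstream PE unless same_cluster is True.
    """
    # Copies received by the RRs of the downstream PE's cluster.
    from_downstream_pe = rrs_per_cluster
    if same_cluster:
        # Both PEs are clients of the same RRs: no inter-cluster copies.
        return from_downstream_pe
    # Each RR of the upstream cluster receives one copy from each RR
    # of the downstream cluster (full IBGP mesh between clusters).
    between_clusters = rrs_per_cluster * rrs_per_cluster
    return from_downstream_pe + between_clusters


different_clusters = rr_message_equipment()                 # 2 + 2 x 2
shared_cluster = rr_message_equipment(same_cluster=True)    # 2
```

   With 2 RRs per cluster this gives 6 m.e when the two PEs sit in
   different clusters and 2 m.e when they share their RRs.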
   Additionally, as soon as there is a receiver-connected PE in each RR
   cluster, the number of RRs processing a C-multicast route tends
   quickly toward 2 (taking into account that the peering of a PE to
   RRs is made redundant).

   +------------+----------+--------------+-------------+--------------+
   |            | upstream | other PEs    | RRs (#RR)   | total across |
   |            | PE (1)   | (total       |             | all          |
   |            |          | across       |             | equipments   |
   |            |          | (#MVPN_PE-1) |             |              |
   |            |          | PEs)         |             |              |
   +------------+----------+--------------+-------------+--------------+
   | first PE   | 2 m.e    | 2 m.e        | 6 m.e       | 10 m.e       |
   | joins      |          |              |             |              |
   +------------+----------+--------------+-------------+--------------+
   | for *each* | 0        | 2 m.e        | (at most) 6 | (at most) 8  |
   | additional |          |              | m.e tending | m.e tending  |
   | PE joining |          |              | toward 2    | toward 4 m.e |
   |            |          |              | m.e         |              |
   +------------+----------+--------------+-------------+--------------+
   | baseline   | 0        | 0            | 0           | 0            |
   | processing |          |              |             |              |
   | over a     |          |              |             |              |
   | period T   |          |              |             |              |
   +------------+----------+--------------+-------------+--------------+
   | for *each* | 0        | 2 m.e        | (at most) 6 | (at most) 8  |
   | PE leaving |          |              | m.e tending | m.e tending  |
   |            |          |              | toward 2    | toward 4 m.e |
   |            |          |              | m.e         |              |
   +------------+----------+--------------+-------------+--------------+
   | the last   | 2 m.e    | 2 m.e        | 6 m.e       | 6 m.e        |
   | PE leaves  |          |              |             |              |
   +------------+----------+--------------+-------------+--------------+
   | total for  | 4 m.e    | #R_PE x 4    | (at most) 6 | at most 2 (5 |
   | #R_PE PEs  |          | m.e          | x #R_PE m.e | x #R_PE + 2) |
   |            |          |              | (tending    | m.e (tending |
   |            |          |              | toward 2 x  | toward 2 (3  |
   |            |          |              | #R_PE m.e)  | x #R_PE + 2) |
   |            |          |              |             | m.e)         |
   +------------+----------+--------------+-------------+--------------+
   | total      | 2 s.e    | #R_PE s.e    | approx. 2   | approx. 3    |
   | state      |          |              | #R_PE + #RR | #R_PE + #RR  |
   | maintained |          |              | x #clusters | x #clusters  |
   |            |          |              | s.e         | + 2 s.e      |
   +------------+----------+--------------+-------------+--------------+

   Message processing and state maintenance - BGP-based procedures

A.1.1.4.  Side by side orders of magnitude comparison

   This section summarizes the previous sections by considering the
   orders of magnitude when the number of PEs in a VPN increases.

   +------------+----------------------+--------------+----------------+
   |            | PIM LAN Procedures,  | PIM LAN      | BGP-based      |
   |            | default              | Procedures,  |                |
   |            |                      | explicit     |                |
   |            |                      | tracking     |                |
   +------------+----------------------+--------------+----------------+
   | first PE   | O(#MVPN_PE)          | O(1)         | O(1)           |
   | joins (in  |                      |              |                |
   | m.e)       |                      |              |                |
   +------------+----------------------+--------------+----------------+
   | for *each* | O(#MVPN_PE)          | O(1)         | O(1)           |
   | additional |                      |              |                |
   | PE joining |                      |              |                |
   | (in m.e)   |                      |              |                |
   +------------+----------------------+--------------+----------------+
   | baseline   | (T/T_PIM_r) x        | (T/T_PIM_r)  | 0              |
   | processing | O(#MVPN_PE)          | x O(#R_PE)   |                |
   | over a     |                      |              |                |
   | period T   |                      |              |                |
   | (in m.e)   |                      |              |                |
   +------------+----------------------+--------------+----------------+
   | for *each* | O(#MVPN_PE)          | O(1)         | O(1)           |
   | PE leaving |                      |              |                |
   | (in m.e)   |                      |              |                |
   +------------+----------------------+--------------+----------------+
   | the last   | O(#MVPN_PE)          | O(1)         | O(1)           |
   | PE leaves  |                      |              |                |
   | (in m.e)   |                      |              |                |
   +------------+----------------------+--------------+----------------+
   | total for  | O(#MVPN_PE x #R_PE)  | O(#R_PE) x   | O(#R_PE)       |
   | #R_PE PEs  | + O(#MVPN_PE x       | (T/T_PIM_r)  |                |
   | (in m.e)   | T/T_PIM_r)           |              |                |
   +------------+----------------------+--------------+----------------+
   | states (in | O(#R_PE)             | O(#R_PE)     | O(#R_PE)       |
   | s.e)       |                      |              |                |
   +------------+----------------------+--------------+----------------+
   | notes      | (processing and      | (processing  | (processing    |
   |            | state maintenance    | and state    | and state      |
   |            | are essentially done | maintenance  | maintenance    |
   |            | by, and spread       | are          | are            |
   |            | amongst, the PEs of  | essentially  | essentially    |
   |            | the MVPN;            | done on the  | done by, and   |
   |            | non-upstream PEs     | upstream PE) | spread         |
   |            | have processing to   |              | amongst, the   |
   |            | do)                  |              | RRs)           |
   +------------+----------------------+--------------+----------------+

   Comparison of orders of magnitude for message processing and state
   maintenance (totals across all equipments)

   The conclusions that can be drawn from the above are that:

   o  the PIM LAN Procedures default approach is particular in that any
      message is processed by all PEs, including those that are neither
      upstream nor downstream for the message; this results in a total
      amount of messages to process that is in O(#MVPN_PE x #R_PE),
      i.e., O(#MVPN_PE ^ 2) if the proportion of receiver PEs is
      considered constant when the number of PEs increases;

   o  the two PIM-based approaches refresh Join messages; this is a
      linear factor that does not change the order of magnitude, but
      it can be significant for long-lived streams;

   o  the BGP-based approach requires an amount of message processing
      in O(#R_PE), lower than the two other approaches, and
      independent of the duration of streams;

   o  state maintenance is of the same order of magnitude for all
      approaches, O(#R_PE), but its distribution is different:

      *  the PIM LAN Procedures default approach fully spreads, and
         minimizes, the amount of state (one state per PE)

      *  the PIM LAN Procedures with explicit tracking concentrate all
         state on the upstream PE

      *  the BGP-based procedures spread all the state over the set of
         route reflectors

A.1.2.  ASM Scalability

   When PIM-SM is used in a VPN and an ASM multicast group is joined by
   some PEs (#R_PE) with some sources sending toward this multicast
   group address, we can note the following:

   PEs will generally have to maintain one shared tree, plus one source
   tree for each source sending toward G; each tree results in an
   amount of processing and state maintenance similar to what is
   described in the scenario in Appendix A.1.1, with the same
   differences in orders of magnitude between the different approaches
   when the number of PEs is high.

   An exception to this is when, for a given group in a VPN, none of
   the PIM instances in the customer routers and VRFs would switch to
   the SPT (SwitchToSptDesired always false): in that case, the
   processing and state maintenance load is the one required for
   maintenance of the shared tree only.  It has to be noted that this
   scenario is dependent on customer policy.
   To compare the resulting load in that case between the PIM-based
   approaches and the BGP-based approach configured to use inter-site
   shared trees, the scenario in Appendix A.1.1 can be used with #R_PE
   PEs joining a (C-*,C-G) ASM group instead of an SSM group; the same
   differences in order of magnitude remain true.  In the case of the
   BGP-based approach used without inter-site shared trees, we must
   take into account the load resulting from the fact that, to build
   the C-PIM shared tree, each PE has to join the Source Tree of each
   source; using the notations of Appendix A.1.1, this adds an amount
   of load (total load across all equipments) that is proportional to
   #R_PE and to the number of sources.  The order of magnitude with an
   increasing number of PEs is thus unchanged, and the differences in
   order of magnitude also remain the same.

   In addition to the maintenance of trees, PEs have to perform some
   processing and state maintenance related to individual sources
   sending to a multicast group; the related procedures and behaviors
   may differ considerably depending on which C-multicast routing
   protocol is used, how it is configured, how multicast source
   discovery mechanisms are used in the customer VPN, and which
   SwitchToSptDesired policy is used.  However, the following can be
   observed:

   o  when BGP-based C-multicast routing is used, each PE will possibly
      have to process and maintain one BGP Source-Active autodiscovery
      route for (some or all) sources of an ASM group, which results in
      message processing and state maintenance (total across all the
      equipments) linearly dependent on the number of PEs in the VPN
      (#MVPN_PE) for each source, independently of the number of PEs
      joined to the group.  Depending on whether or not inter-site
      shared trees are used, on the SwitchToSptDesired policy in the
      PIM instances in the customer routers and VRFs, and on the
      relative locations of sources and RPs, this will happen for all
      (S,G) of an ASM group or only for some of them, and will be done
      in parallel to the maintenance of shared and/or source trees or
      at the first join of a PE on a source tree

   o  when PIM-based C-multicast routing is used, depending on the
      SwitchToSptDesired policy in the PIM instances in the customer
      routers and VRFs, and depending on the relative locations of
      sources and RPs, there are:

      *  possible control plane state transitions triggered by the
         reception of (S,G) packets; such events would induce
         processing on all PEs joined to G

      *  possible control plane state transitions triggered by the
         reception of (S,G) packets, and possible PIM Assert messages
         specific to (S,G); this would induce message processing on
         each PE of the VPN for each PIM Assert message

   Given the above, the additional processing that may happen for each
   individual source sending to the group, beyond the maintenance of
   source and shared trees, does not change the orders of magnitude
   identified above.

A.2.  Cost of PEs leaving and joining

   The quantification of message processing in Appendix A.1.1 is based
   on a use case where each PE with receivers has joined and left once.
   Scalability-related conclusions for other patterns of changes in the
   set of receiver-connected PEs can be drawn by considering the cost
   of each approach for "a new PE joining" and "a PE leaving".

   For the "PIM LAN Procedure default" approach, in the case of a
   single SSM or SPT tree, the total amount of message processing
   across all nodes depends linearly on the number of PEs in the VPN
   when a PE joins such a tree.  When "PIM LAN Procedures with explicit
   tracking" are used, the amount of processing is independent of the
   number of PEs.

   For the "BGP-based" approach:

   o  In the case of a single SSM tree, the total amount of message
      processing across all nodes is independent of the number of PEs,
      for "a new PE joining" and "a PE leaving"; it also depends on how
      Route Reflectors are meshed, but not with linear dependency.

   o  In the case of an SPT tree for an ASM group, BGP has additional
      processing due to possible Source-Active autodiscovery routes:

      *  when BGP-based C-multicast routing is used with inter-site
         shared trees, for the first PE joining (and the last PE
         leaving) a given SPT, the processing of the corresponding
         Source-Active autodiscovery routes results in a processing
         cost linearly dependent on the number of PEs in the VPN; for
         subsequent PEs joining (and non-last PEs leaving) there is no
         processing due to advertisement or withdrawal of Source-Active
         autodiscovery routes

      *  when BGP-based C-multicast routing is used without inter-site
         shared trees, the processing of Source-Active autodiscovery
         routes for an (S,G) happens independently of PEs joining and
         leaving the SPT for (S,G).
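   For the single SSM tree case, the per-event view used in this
   section can be sketched numerically.  The function below is only an
   illustration: its name, argument names, and approach labels are
   hypothetical, and the figures are the "for each additional PE
   joining" and "for each PE leaving" totals of the tables in
   Appendix A.1.1 (upper bound for BGP).  The Source-Active processing
   discussed for ASM groups is not modelled.

```python
def per_event_cost(approach, event, n_mvpn_pe):
    """Total m.e across all equipments for one PE joining or leaving.

    Figures come from the per-event rows of the Appendix A.1.1 tables;
    the first-join and last-leave special cases are ignored.
    """
    if event not in ("join", "leave"):
        raise ValueError("event must be 'join' or 'leave'")
    if approach == "pim_default":
        # Every PE of the MVPN processes the messages: #MVPN_PE m.e
        # for a join, 2 x #MVPN_PE m.e for a leave.
        return n_mvpn_pe if event == "join" else 2 * n_mvpn_pe
    if approach == "pim_explicit_tracking":
        # Only the upstream PE and the sending PE are counted.
        return 2
    if approach == "bgp":
        # At most 2 m.e on the PE side plus 6 m.e on the RRs.
        return 8
    raise ValueError("unknown approach: " + approach)


# Cost of one additional join as the MVPN grows from 10 to 1000 PEs.
growth = {n: per_event_cost("pim_default", "join", n) for n in (10, 1000)}
```

   Only the "pim_default" branch uses n_mvpn_pe, which is the linear
   dependency noted above; the other two branches return a constant.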
   In the case of a new PE having to join a shared tree for an ASM
   group G, we see the following:

   o  the processing due to the PE joining the shared tree itself is
      the same as the processing required to set up an SSM tree, as
      described before (note that this does not happen when BGP-based
      C-multicast routing is used without inter-site shared trees)

   o  for each source for which the PE joins the SPT, the resulting
      processing cost is the same as for one SPT tree, as described
      before:

      *  the conditions under which a PE will join the SPT for a given
         (C-S, C-G) are the same for the BGP-based approach with
         inter-site shared trees and for the PIM-based approach, and
         depend solely on the SwitchToSptDesired policy in the PIM
         instances in the customer routers in the sites connected to
         the PE and/or in the VRF

      *  the conditions under which a PE will join the SPT for a given
         (C-S, C-G) differ between the BGP-based approach without
         inter-site shared trees and the PIM-based approach

      *  the SPT for a given (S,G) can be joined by the PE in the
         following cases:

         +  as soon as one router, or the VPN VRF on the PE, has
            SwitchToSptDesired(S,G) being true

         +  when BGP-based routing is used, and configured not to use
            inter-site shared trees

      *  said differently, the only case where the PE will not join
         the SPT for (S,G) is when all routers in the sites of the VPN
         connected to the PE, and the VPN VRF itself, will never have
         SwitchToSptDesired(S,G) being true, with the additional
         condition, when BGP-based C-multicast routing is used, that
         inter-site shared trees are used

   Thus, when one PE joins a group G to which n sources are sending
   traffic, we note the following with regard to the dependency of the
   cost (in total amount of processing across all equipments) on the
   number of PEs:

   o  in the general case (where any router in the sites of the VPN
      connected to the PE, or the VRF itself, may have
      SwitchToSptDesired(S,G) being true):

      *  for the "PIM LAN Procedure default" approach, the cost is
         linearly dependent on the number of PEs in the VPN, and
         linearly dependent on the number of sources

      *  for the "PIM LAN Procedures with explicit tracking" approach,
         the cost is linearly dependent on the number of sources and
         independent of the number of PEs in the VPN

      *  for the "BGP-based" approach, the cost is linearly dependent
         on the number of sources and, in the sub-case of the BGP-based
         approach used with inter-site shared trees, is also dependent
         on the number of PEs in the VPN, but only if the PE is the
         first to join the group or the SPT for some source sending to
         the group

   o  else, under the assumption that routers in the sites of the VPN
      connected to the PE, and the VPN VRF itself, will never have the
      policy function SwitchToSptDesired(S,G) being true:

      *  in the case of the PIM-based approaches, the cost is linearly
         dependent on the number of PEs in the VPN, and there is no
         dependency on the number of sources

      *  in the case of the BGP-based approach with inter-site shared
         trees, the cost is linearly dependent on the number of RRs,
         and there is no dependency on the number of sources

      *  in the case of the BGP-based approach without inter-site
         shared trees, the cost is linearly dependent on the number of
         RRs and on the number of sources

   Hence, with the PIM default approach, the overall cost across all
   equipments of any PE joining an ASM group G always depends on the
   number of PEs (the same holds for a PE that leaves), while the
   BGP-based and PIM explicit tracking approaches have a cost
   independent of the number of PEs (with the exception of the first PE
   joining the ASM group, for the BGP-based approach used without
   inter-site shared trees; in that case there is a dependency on the
   number of PEs).

   On the dependency on the number of sources: without making any
   assumption on the SwitchToSptDesired policy on the PIM routers and
   VRFs of a VPN, we see that a PE joining an ASM group may induce a
   processing cost linearly dependent on the number of sources.  Apart
   from this general case, under the condition where SwitchToSptDesired
   is always false on all PIM routers and VRFs of the VPN, then with
   the PIM-based approach, and with the BGP-based approach used with
   inter-site shared trees, the cost in number of messages processed is
   independent of the number of sources (it has to be noted that this
   condition depends on customer policy).

Appendix B.  Switching to S-PMSI

   [ the following point was fixed in version 07 of
   [I-D.ietf-l3vpn-2547bis-mcast], and is here for reference only ]

   Section 7.2.2.3 of [I-D.ietf-l3vpn-2547bis-mcast] proposes two
   approaches for how a source PE can decide when to start transmitting
   customer multicast traffic on an S-PMSI:

   1.  The source PE sends multicast packets for the (C-S, C-G) on both
       the I-PMSI P-multicast tree and the S-PMSI P-multicast tree
       simultaneously for a pre-configured period of time, letting the
       receiver PEs select the new tree for reception, before switching
       to only the S-PMSI.

   2.  The source PE waits for a pre-configured period of time after
       advertising the (C-S, C-G) entry bound to the S-PMSI before
       fully switching the traffic onto the S-PMSI-bound P-multicast
       tree.

   The first alternative has essentially two drawbacks:

   o  (C-S, C-G) traffic is sent twice for some period of time, which
      would appear to be at odds with the motivation for switching to
      an S-PMSI in order to optimize the bandwidth used by the
      multicast tree for that stream.
   o  It is unlikely that the switchover can occur without packet loss
      or duplication if the transit delays of the I-PMSI P-multicast
      tree and the S-PMSI P-multicast tree differ.

   By contrast, the second alternative has none of these drawbacks, and
   satisfies the requirement in Section 5.1.3 of [RFC4834], which
   states that "[...] a multicast VPN solution SHOULD as much as
   possible ensure that client multicast traffic packets are neither
   lost nor duplicated, even when changes occur in the way a client
   multicast data stream is carried over the provider network".  The
   second alternative also happens to be the one used in existing
   deployments.

   For these reasons, it is the authors' recommendation to mandate the
   implementation of the second alternative for switching to S-PMSI.

Authors' Addresses

   Thomas Morin (editor)
   France Telecom - Orange Labs
   2 rue Pierre Marzin
   Lannion  22307
   France

   Email: thomas.morin@orange-ftgroup.com


   Ben Niven-Jenkins (editor)
   BT
   208 Callisto House, Adastral Park
   Ipswich, Suffolk  IP5 3RE
   UK

   Email: benjamin.niven-jenkins@bt.com


   Yuji Kamite
   NTT Communications Corporation
   Tokyo Opera City Tower
   3-20-2 Nishi Shinjuku, Shinjuku-ku
   Tokyo  163-1421
   Japan

   Email: y.kamite@ntt.com


   Raymond Zhang
   BT
   2160 E. Grand Ave.
   El Segundo  CA 90025
   USA

   Email: raymond.zhang@bt.com


   Nicolai Leymann
   Deutsche Telekom
   Goslarer Ufer 35
   10589 Berlin
   Germany

   Email: n.leymann@telekom.de


   Nabil Bitar
   Verizon
   40 Sylvan Road
   Waltham, MA  02451
   USA

   Email: nabil.n.bitar@verizon.com