Network Working Group                                      T. Morin, Ed.
Internet-Draft                                     France Telecom Orange
Expires: August 6, 2010                            B. Niven-Jenkins, Ed.
                                                                      BT
                                                               Y. Kamite
                                                      NTT Communications
                                                                R. Zhang
                                                                      BT
                                                              N. Leymann
                                                        Deutsche Telekom
                                                                N. Bitar
                                                                 Verizon
                                                        February 2, 2010

    Mandatory Features in a Layer 3 Multicast BGP/MPLS VPN Solution
                 draft-ietf-l3vpn-mvpn-considerations-06

Abstract

   More than one set of mechanisms to support multicast in a layer 3
   BGP/MPLS VPN has been defined. These are presented in the documents
   that define them as optional building blocks.

   To enable interoperability between implementations, this document
   defines a subset of features that is considered mandatory for a
   multicast BGP/MPLS VPN implementation. This will help implementers
   and deployers understand which L3VPN multicast requirements are best
   satisfied by each option.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 6, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Examining alternative mechanisms for MVPN functions
     3.1.  MVPN auto-discovery
     3.2.  S-PMSI Signaling
     3.3.  PE-PE Exchange of C-Multicast Routing
       3.3.1.  PE-PE C-multicast routing scalability
       3.3.2.  PE-CE multicast routing exchange scalability
       3.3.3.  P-routers scalability
       3.3.4.  Impact of C-multicast routing on Inter-AS deployments
       3.3.5.  Security and robustness
       3.3.6.  C-multicast VPN join latency
       3.3.7.  Conclusion on C-multicast routing
     3.4.  Encapsulation techniques for P-multicast trees
     3.5.  Inter-AS deployments options
     3.6.  Bidir-PIM support
   4.  Co-located RPs
   5.  Avoiding duplicates
   6.  Existing deployments
   7.  Summary of recommendations
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Scalability of C-multicast routing processing load
     A.1.  Scalability with an increased number of PEs
       A.1.1.  SSM Scalability
       A.1.2.  ASM Scalability
     A.2.  Cost of PEs leaving and joining
   Appendix B.  Switching to S-PMSI
   Authors' Addresses

1.  Introduction

   Specifications for multicast in BGP/MPLS VPNs
   [I-D.ietf-l3vpn-2547bis-mcast] include multiple alternative
   mechanisms for some of the required building blocks of the solution.
   However, they do not identify which of these mechanisms are mandatory
   to implement in order to ensure interoperability. Not defining a set
   of mandatory-to-implement mechanisms leads to a situation where
   implementations may support different subsets of the available
   optional mechanisms which do not interoperate, which is a problem for
   the numerous operators having multi-vendor backbones.

   The aim of this document is to leverage the already expressed
   requirements [RFC4834] and study the properties of each approach, to
   identify mechanisms that are good candidates for being part of a core
   set of mandatory mechanisms which can be used to provide a base for
   interoperable solutions.

   This document goes through the different building blocks of the
   solution and concludes on which mechanisms an implementation is
   required to implement. Section 7 summarizes these requirements.

   Considering the history of the multicast VPN proposals and
   implementations, it is also useful to discuss how existing
   deployments of early implementations
   [I-D.rosen-vpn-mcast][I-D.raggarwa-l3vpn-2547-mvpn] can be
   accommodated, and provide suggestions in this respect.

2.  Terminology

   Please refer to [I-D.ietf-l3vpn-2547bis-mcast] and [RFC4834].

3.  Examining alternative mechanisms for MVPN functions

3.1.  MVPN auto-discovery

   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
   two different mechanisms for MVPN auto-discovery:

   1.  BGP-based auto-discovery

   2.  "PIM/shared P-tunnel": discovery done through the exchange of PIM
       Hellos by C-PIM instances, across an MI-PMSI implemented with one
       shared P-tunnel per VPN (using multicast ASM, or MP2MP LDP)

   Both solutions address Section 5.2.10 of [RFC4834] which states that
   "the operation of a multicast VPN solution SHALL be as light as
   possible and providing automatic configuration and discovery SHOULD
   be a priority when designing a multicast VPN solution. Particularly
   the operational burden of setting up multicast on a PE or for a VR/
   VRF SHOULD be as low as possible".

   The key consideration is that PIM-based discovery is only applicable
   to deployments using a shared P-tunnel to instantiate an MI-PMSI (it
   is not applicable if only P2P, PIM-SSM, or P2MP mLDP/RSVP-TE
   P-tunnels are used, because contrary to ASM and MP2MP, building these
   types of P-tunnels cannot happen before the autodiscovery has been
   done), whereas the BGP-based auto-discovery does not place any
   constraint on the type of P-tunnel that would have to be used.
   BGP-based auto-discovery is independent of the type of P-tunnel used,
   thus satisfying the requirement in section 5.2.4.1 of [RFC4834] that
   "a multicast VPN solution SHOULD be designed so that control and
   forwarding planes are not interdependent".

   Additionally, it is to be noted that a number of service providers
   have chosen to use SSM-based P-tunnels for the default MDTs within
   their current deployments, therefore relying already on some BGP-
   based auto-discovery.

   Moreover, when shared P-tunnels are used, the use of BGP auto-
   discovery would allow inconsistencies in the addresses/identifiers
   used for the shared P-tunnel to be detected (e.g. the same shared
   P-tunnel identifier being used for different VPNs with distinct BGP
   route targets).
   This is particularly attractive in the context of
   inter-AS VPNs where the impact of any misconfiguration could be
   magnified and where a single service provider may not operate all the
   ASes. Note that this technique to detect some misconfiguration cases
   may not be usable during a transition period from a shared-P-tunnel
   autodiscovery to a BGP-based autodiscovery.

   Thus, the recommendation is that implementation of the BGP-based
   auto-discovery is mandated and should be supported by all MVPN
   implementations.

3.2.  S-PMSI Signaling

   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
   two mechanisms for signaling that multicast flows will be switched to
   an S-PMSI:

   1.  a UDP-based TLV protocol specifically for S-PMSI signaling
       (described in section 7.4.2).

   2.  a BGP-based mechanism for S-PMSI signaling (described in section
       7.4.1).

   Section 5.2.10 of [RFC4834] states that "as far as possible, the
   design of a solution SHOULD carefully consider the number of
   protocols within the core network: if any additional protocols are
   introduced compared with the unicast VPN service, the balance between
   their advantage and operational burden SHOULD be examined
   thoroughly". The UDP-based mechanism would be an additional protocol
   in the MVPN stack, which isn't the case for the BGP-based S-PMSI
   switching signaling, since (a) BGP is identified as a requirement for
   autodiscovery, and (b) the BGP-based S-PMSI switching signaling
   procedures are very similar to the autodiscovery procedures.

   Furthermore, the UDP-based S-PMSI switching signaling mechanism
   requires an MI-PMSI, while the BGP-based protocol does not.
   In practice, this means that with the UDP-based protocol a PE will
   have to join all P-tunnels of all PEs in an MVPN, while in the
   alternative where BGP-based S-PMSI switching signaling is used, it
   could delay joining a P-tunnel rooted at a PE until traffic from that
   PE is needed, thus reducing the amount of state maintained on P
   routers.

   S-PMSI switching signaling approaches can also be compared in an
   inter-AS context (see Section 3.5). The proposed BGP-based approach
   for S-PMSI switching signaling provides a good fit with both the
   segmented and non-segmented inter-AS approaches. By contrast, while
   the UDP-based approach for S-PMSI switching signaling appears to be
   usable with segmented inter-AS tunnels, in that case key advantages
   of the segmented approach are lost:

   o  ASes can no longer independently choose when S-PMSI tunnels will
      be triggered in their AS (and thus control the amount of state
      created on their P routers),

   o  ASes can no longer independently choose the tunneling technique
      for the P-tunnels used for an S-PMSI,

   o  in an inter-AS option B context, an isolation of ASes is obtained
      as PEs in one AS don't have (direct) exchange of routing
      information with PEs of other ASes. This property is not
      preserved if UDP-based S-PMSI switching signaling is used. By
      contrast, BGP-based S-PMSI switching signaling does preserve
      this property.

   Given all the above, it is the recommendation of the authors that BGP
   is the preferred solution for S-PMSI switching signaling and should
   be supported by all implementations.

   It is identified that, if nothing prevents a fast-paced creation of
   S-PMSIs, then S-PMSI switching signaling with BGP would possibly
   impact the Route Reflectors used for MVPN routes.
   However, it is also identified that such a fast-paced behavior would
   have an impact on P and PE routers resulting from S-PMSI tunnel
   signaling, which will be the same regardless of the S-PMSI signaling
   approach that is used, and which it is certainly best to avoid by
   setting up proper mechanisms.

   The UDP-based S-PMSI switching signaling protocol can also be
   considered, as an option, given that this protocol has been in
   deployment for some time. Implementations supporting both protocols
   would be expected to provide a per-VRF configuration knob to allow an
   implementation to use the UDP-based TLV protocol for S-PMSI switching
   signaling for specific VRFs in order to support the coexistence of
   both protocols (for example during migration scenarios). Apart from
   such migration-facilitating mechanisms, the authors specifically do
   not recommend extending the already proposed UDP-based TLV protocol
   to new types of P-tunnels.

3.3.  PE-PE Exchange of C-Multicast Routing

   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
   multiple mechanisms for PE-PE exchange of customer multicast routing
   information (C-multicast routing):

   1.  Full per-MVPN PIM peering across an MI-PMSI (described in section
       3.4.1.1).

   2.  Lightweight PIM peering across an MI-PMSI (described in section
       3.4.1.2)

   3.  The unicasting of PIM C-Join/Prune messages (described in section
       3.4.1.3)

   4.  The use of BGP for carrying C-Multicast routing (described in
       section 3.4.2).

3.3.1.  PE-PE C-multicast routing scalability

   Scalability being one of the core requirements for multicast VPN, it
   is useful to compare the proposed C-multicast routing mechanisms from
   this perspective: Section 4.2.4 of [RFC4834] recommends that "a
   multicast VPN solution SHOULD support several hundreds of PEs per
   multicast VPN, and MAY usefully scale up to thousands" and section
   4.2.5 states that "a solution SHOULD scale up to thousands of PEs
   having multicast service enabled".

   Scalability with an increased number of VPNs per PE, or with an
   increased amount of multicast state per VPN, is also important, but
   is not the focus of this section since we didn't identify
   differences between the different approaches for these matters: all
   other things being equal, the load on a PE due to C-multicast routing
   increases roughly linearly with the number of VPNs per PE, and with
   the amount of multicast state per VPN.

   This section presents conclusions related to PE-PE C-multicast
   routing scalability. Appendix A provides more detailed explanations
   on the differences in ways of handling the C-multicast routing load,
   between the PIM-based approaches and the BGP-based approach, along
   with quantified evaluations of the amount of state and messages
   with the different approaches; many points made in this section
   are detailed in Appendix A.1.

   At high scales of multicast deployment, the first and third
   mechanisms require the PEs to maintain a large number of PIM
   adjacencies with other PEs of the same multicast VPN (which implies
   the regular exchange of PIM Hellos with each other) and to
   periodically refresh C-Join/Prune states, resulting in an increased
   processing cost when the number of PEs increases (as detailed in
   Appendix A.1), to which the second approach is less subject, and to
   which the fourth approach is not subject.
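
   As a rough illustration of the adjacency and Hello cost described
   above, the following Python sketch counts, for one MVPN, the PIM
   neighbors each PE maintains and the Hellos it receives per second.
   This is a simplified model and not the evaluation of Appendix A: the
   function name is hypothetical, the 30-second Hello period is an
   assumed parameter (the usual PIM-SM default), and C-Join/Prune
   refreshes are ignored.

```python
from typing import Tuple


def pim_hello_load(num_pes: int, hello_period_s: float = 30.0) -> Tuple[int, float]:
    """Return (PIM adjacencies per PE, Hellos received per second per PE).

    Assumption: every PE keeps a PIM adjacency with every other PE of
    the same multicast VPN, and each neighbor sends one Hello per
    period. hello_period_s is an illustrative parameter.
    """
    if num_pes < 1:
        raise ValueError("num_pes must be at least 1")
    if hello_period_s <= 0:
        raise ValueError("hello_period_s must be positive")
    adjacencies = num_pes - 1
    return adjacencies, adjacencies / hello_period_s


if __name__ == "__main__":
    # Scales taken from the RFC 4834 figures quoted above:
    # hundreds of PEs per VPN, possibly thousands.
    for n in (10, 100, 1000):
        adj, rate = pim_hello_load(n)
        print(f"{n} PEs: {adj} adjacencies per PE, {rate:.1f} Hellos/s received")
```

   In this model both figures grow linearly with the number of PEs in
   the VPN, and they apply once per VPN configured on the PE.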

   The third mechanism would reduce the amount of C-Join/Prune
   processing for a given multicast flow for PEs that are not the
   upstream neighbor for this flow, but would require "explicit
   tracking" state to be maintained by the upstream PE. It also isn't
   compatible with the "Join suppression" mechanism. A possible way to
   reduce the amount of signaling with this approach would be the use of
   a PIM refresh-reduction mechanism. Such a mechanism, based on TCP,
   is being specified by the PIM IETF Working Group
   ([I-D.ietf-pim-port]); its use in a multicast VPN context has not
   been described in [I-D.ietf-l3vpn-2547bis-mcast], but it is expected
   that this approach would provide scalability similar to that of the
   BGP-based approach without RR.

   The second mechanism would operate in a similar manner to full per-
   MVPN PIM peering except that PIM Hello messages are not transmitted
   and PIM C-Join/Prune refresh-reduction would be used, thereby
   improving scalability, but this approach has yet to be fully
   described. In any case, it seems that it only improves one of the
   factors that impact scalability when the number of PEs increases.

   The first and second mechanisms can leverage the "Join suppression"
   behavior and thus reduce the processing burden of an upstream PE,
   sparing the processing of a Join refresh message for each remote PE
   joined to a multicast stream. This improvement requires all PEs of a
   multicast VPN to process all PIM Join and Prune messages sent by any
   other PE participating in the same multicast VPN whether they are the
   upstream PE or not.

   The fourth mechanism (the use of BGP for carrying C-Multicast
   routing) would have a comparable drawback of requiring all PEs to
   process a BGP C-multicast route that is only of interest to a
   specific upstream PE.
   For this reason, section 16 of [I-D.ietf-l3vpn-2547bis-mcast-bgp]
   recommends the use of the Route-Target constrained BGP distribution
   [RFC4684] mechanisms, which eliminate this drawback by ensuring that
   only the interested upstream PE receives a BGP C-multicast route.
   Specifically, when Route-Target constrained BGP distribution is used,
   the fourth mechanism reduces the total amount of C-multicast routing
   processing load put on the PEs by avoiding any processing of customer
   multicast routing information on the "unrelated" PEs, that are
   neither the joining PE nor the upstream PE.

   Moreover, the fourth mechanism further reduces the total amount of
   message processing load by avoiding the use of periodic refreshes,
   and by inheriting BGP features that are expected to improve
   scalability (for instance, providing a means to offload some of the
   processing burden associated with customer multicast routing onto one
   or many BGP route-reflectors). The advantages of the fourth
   mechanism come at a cost of maintaining an amount of state linear
   with the number of PEs joined to a stream. However, the use of route
   reflectors allows this cost to be spread among multiple route
   reflectors, thus eliminating the need for a single route reflector to
   maintain all this state.

   However, the fourth mechanism is specific in that it offers the
   possibility of offloading customer multicast routing processing onto
   one or more BGP Route Reflector(s). When this is used, there is a
   drawback of increasing the processing load placed on the route
   reflector infrastructure.
   In the higher scale scenarios, it may be
   required to adapt the route reflector infrastructure to the MVPN
   routing load by using, for example:

   o  a separation of resources for unicast and multicast VPN routing:
      using dedicated MVPN Route Reflector(s) (or using dedicated MVPN
      BGP sessions or dedicated MVPN BGP instances);

   o  the deployment of additional route reflector resources, for
      example increasing the processing resources on existing route
      reflectors or deployment of additional route reflectors.

   Among the above, the most straightforward approach is to consider the
   introduction of route reflectors dedicated to the MVPN service and
   dimension them according to the needs of that service (but doing so
   is not required and is left as an operator engineering decision).

3.3.2.  PE-CE multicast routing exchange scalability

   The overhead associated with the PE-CE exchange of C-multicast
   routing is independent of the choice of the mechanism used for the
   PE-PE C-multicast routing. Therefore, the impact of the PE-CE
   C-multicast routing overhead on the overall system scalability is
   independent of the protocol used for PE-PE signaling, and therefore
   is not relevant when comparing the different approaches proposed for
   the PE-PE C-multicast routing. This is true even if in some
   operational contexts the PE-CE C-multicast routing overhead is a
   significant factor in the overall system overhead.

3.3.3.  P-routers scalability

   Mechanisms (1) and (2) are restricted to use within multicast VPNs
   that use an MI-PMSI, thereby necessitating:

   o  the use of a P-tunnel technique that allows shared P-tunnels (for
      example PIM-SM in ASM mode or MP2MP LDP)

   o  or the use of one P-tunnel per PE per VPN, even for PEs that do
      not have sources in their directly attached sites for that VPN.
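
   The second option above can be made concrete with a small Python
   sketch that counts, for one VPN, the P-tunnels needed to build an
   MI-PMSI from per-PE P-tunnels, and how many of them are rooted at
   PEs that have no sources. The function and parameter names are
   hypothetical, and the figures are purely illustrative.

```python
def mi_pmsi_p_tunnels(num_pes: int, num_source_pes: int) -> dict:
    """Count per-PE P-tunnels for one VPN using an MI-PMSI.

    num_pes:        PEs attached to the VPN (each roots one P-tunnel).
    num_source_pes: PEs that actually have sources in their directly
                    attached sites (an assumed input).
    """
    if not 0 <= num_source_pes <= num_pes:
        raise ValueError("num_source_pes must be between 0 and num_pes")
    return {
        # One P-tunnel per PE per VPN, whether or not the PE has sources.
        "p_tunnels": num_pes,
        # P-tunnels that exist only to complete the MI-PMSI.
        "rooted_at_pes_without_sources": num_pes - num_source_pes,
    }


if __name__ == "__main__":
    print(mi_pmsi_p_tunnels(num_pes=100, num_source_pes=5))
```

   With 100 PEs of which 5 have sources, 95 of the 100 P-tunnels are
   rooted at PEs that never send traffic into the VPN, while each still
   creates state on the P-routers along its tree.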

   By comparison, the fourth mechanism doesn't impose either of these
   restrictions, and when P2MP P-tunnels are used only necessitates the
   use of one P-tunnel per VPN per PE attached to a site with a
   multicast source or RP (or with a candidate BSR, if BSR is used).

   In cases where there are fewer PEs connected with sources than the
   total number of PEs, it reduces the amount of state maintained by
   P-routers compared to the amount required to build an MI-PMSI with
   P2MP P-tunnels. Such cases are expected to be frequent for multicast
   VPN deployments (see section 4.2.4.1 of [RFC4834]).

3.3.4.  Impact of C-multicast routing on Inter-AS deployments

   Co-existence with unicast inter-AS VPN options, and an equal level of
   security for multicast and unicast including in an inter-AS context,
   are specifically mentioned in sections 5.2.6, 5.2.8 and 5.2.12 of
   [RFC4834].

   In an inter-AS option B context, an isolation of ASes is obtained as
   PEs in one AS don't have (direct) exchange of routing information
   with PEs of other ASes. This property is not preserved if PIM-based
   PE-PE C-multicast routing is used. By contrast, the fourth option
   (BGP-based C-Multicast routing) does preserve this property.

   Additionally, the authors note that the proposed BGP-based approach
   for C-multicast routing provides a good fit with both the segmented
   and non-segmented inter-AS approaches. By contrast, though the PIM-
   based C-multicast routing is usable with segmented inter-AS tunnels,
   the inter-AS scalability advantage of the approach is lost, since PEs
   in an AS will see the C-multicast routing activity of all other PEs
   of all other ASes.

3.3.5.  Security and robustness

   BGP supports MD5 authentication of its peers for additional security,
   which may directly benefit multicast VPN customer multicast
   routing, whether for intra-AS or inter-AS communications.
   By contrast, with a PIM-based approach, no mechanism providing a
   comparable level of security to authenticate communications between
   remote PEs has yet been fully described
   [I-D.ietf-pim-sm-linklocal], and in any case would require
   significant additional operations for the provider to be usable in a
   multicast VPN context.

   The robustness of the infrastructure, especially the existing
   infrastructure providing unicast VPN connectivity, is key. The
   C-multicast routing function, especially under load, will compete
   with the unicast routing infrastructure. With the PIM-based
   approaches, the unicast and multicast VPN routing functions are
   expected to only compete in the PE, for control plane processing
   resources. In the case of the BGP-based approach, they will compete
   on the PE for processing resources, and in the route reflectors
   (supposing they are used for MVPN routing). It is identified that in
   both cases, mechanisms will be required to arbitrate resources (e.g.
   processing priorities). In the case of PIM-based procedures, this
   arbitration takes place between the different control plane routing
   instances in the PE. In the case of the BGP-based approach, this is
   likely to require using distinct BGP sessions for multicast and
   unicast (e.g. through the use of dedicated MVPN BGP route reflectors,
   or through the use of a distinct session with an existing route
   reflector).

   Multicast routing is dynamic by nature, and multicast VPN routing has
   to follow the VPN customers' multicast routing events. The different
   approaches can be compared on how they are expected to behave in
   scenarios where multicast routing in the VPNs is subject to an
   intense activity.
   Scalability of each approach under such a load is
   detailed in Appendix A.2, and the fourth approach (BGP-based), used
   in conjunction with the RT Constraint mechanisms [RFC4684], is the
   only one having a cost for join/leave operations independent of the
   number of PEs in the VPN (with one exception detailed in Appendix
   A.2) and state maintenance not concentrated on the upstream PE.

   On the other hand, while the BGP-based approach is likely to suffer a
   slowdown under a load that is greater than the available processing
   resources (because of possibly congested TCP sockets), the PIM-based
   approaches would react to such a load by dropping messages, with
   failure-recovery obtained through message refreshes. Thus, the BGP-
   based approach could result in a degradation of join/leave latency
   performance typically spread evenly across all multicast streams
   being joined in that period, while the PIM-based approach could
   result in increased join/leave latency, for some random streams, by a
   multiple of the time between refreshes (e.g. tens of seconds), and
   possibly in some cases the adjacency may time out, resulting in
   disruption of multicast streams.

   The behavior of the PIM-based approach under such a load is also
   harder to predict, given that the performance of the "Join
   suppression" mechanism (an important mechanism for this approach to
   scale) will itself be impeded by delays in Join processing. For
   these reasons, the BGP-based approach would be able to provide a
   smoother degradation and more predictable behavior under a highly
   dynamic load.
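
   The "multiple of the time between refreshes" effect described above
   can be illustrated with a toy Python model. It rests on assumptions
   that are not part of the specifications: each C-Join is dropped
   independently with a fixed probability, a dropped Join is only
   recovered by the next periodic refresh, and the refresh period is 60
   seconds (the usual PIM-SM Join/Prune default, used here as an
   illustrative parameter). The function names are hypothetical.

```python
def extra_join_delay(lost_joins: int, refresh_period_s: float = 60.0) -> float:
    """Extra join latency (seconds) when `lost_joins` consecutive Joins
    for a stream are dropped and each loss costs one refresh period."""
    if lost_joins < 0:
        raise ValueError("lost_joins must be non-negative")
    return lost_joins * refresh_period_s


def expected_extra_join_delay(loss_prob: float,
                              refresh_period_s: float = 60.0) -> float:
    """Mean extra join latency under independent loss.

    The number of consecutive losses k follows P(k) = p**k * (1 - p),
    whose mean is p / (1 - p); each loss adds one refresh period.
    """
    if not 0.0 <= loss_prob < 1.0:
        raise ValueError("loss_prob must be in [0, 1)")
    return refresh_period_s * loss_prob / (1.0 - loss_prob)


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.5):
        print(f"loss={p}: mean extra delay "
              f"{expected_extra_join_delay(p):.1f} s")
```

   The model only captures the all-or-nothing nature of the delay for an
   affected stream: most streams see no extra delay, while an affected
   one waits at least a full refresh period.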

   In fact, both an "evenly spread degradation" and an "unevenly spread
   larger degradation" can be problematic, and what seems important is
   the ability for the VPN backbone operator to (a) limit the amount of
   multicast routing activity that can be triggered by a multicast VPN
   customer, and to (b) provide the best possible independence between
   distinct VPNs. It seems that both of these can be addressed through
   local implementation improvements, and that both the BGP-based and
   PIM-based approaches could be engineered to provide (a) and (b). It
   can be noted though that the BGP approach proposes ways to dampen
   C-multicast route withdrawals and/or advertisements, and thus already
   describes a way to provide (a), while nothing comparable has yet been
   described for the PIM-based approaches (even though it doesn't appear
   difficult). The PIM-based approaches rely on a per-VPN dataplane to
   carry the MVPN control plane, and thus may benefit from this first
   level of separation to solve (b).

3.3.6.  C-multicast VPN join latency

   Section 5.1.3 of [RFC4834] states that "the group join delay [...] is
   also considered one important QoS parameter. It is thus RECOMMENDED
   that a multicast VPN solution be designed appropriately in this
   regard". In a multicast VPN context, the "group join delay" of
   interest is the time between a CE sending a PIM Join to its PE and
   the first packet of the corresponding multicast stream being received
   by the CE.

   It is to be noted that the C-multicast routing procedures will only
   impact the group join latency of a given multicast stream for the
   first receiver that is located across the provider backbone from the
   multicast source-connected PE (or the first receivers in the
   specific case where a specific UMH selection algorithm is used, that
   allows distinct UMHs to be selected by distinct downstream PEs).

   The different approaches proposed seem to have different
   characteristics in how they are expected to impact join latency:

   o  the PIM-based approaches minimize the number of control plane
      processing hops between a new receiver-connected PE and the
      source-connected PE, and being datagram-based introduce minimal
      delay, thereby possibly having a join latency as good as possible
      depending on implementation efficiency

   o  under degraded conditions (packet loss, congestion, high control
      plane load) the PIM-based approach may impact the latency for a
      given multicast stream in an all-or-nothing manner: if a
      C-multicast routing PIM Join packet is lost, latency can become
      very high (a multiple of the periodicity of PIM Join refreshes)

   o  the BGP-based approach uses TCP exchanges, that may introduce an
      additional delay depending on BGP and TCP implementation, but
      which would typically result, under degraded conditions (such as
      packet loss, congestion, high control plane load), in a comparably
      lower increase of latency spread more evenly across the streams

   o  as shown in Appendix A, the BGP-based approach is particular in
      that it removes load from all the PEs (without putting this load
      on the upstream PE for a stream); this improvement of background
      load can bring improved performance when a PE acts as the upstream
      PE for a stream, and thus benefit join latency

   This qualitative comparison of approaches shows that the BGP-based
   approach is designed for a smoother degradation of latency under
   degraded conditions such as packet loss, congestion, or high control
   plane load. On the other hand, the PIM-based approaches seem to
   structurally be able to reach a shorter "best-case" group join
   latency (especially compared to deployment of the BGP-based approach
   where route-reflectors are used).
586 Doing a quantitative comparison of latencies is not possible without 587 referring to specific implementations and benchmarking procedures, 588 and might lead to different conclusions, especially for best- 589 case group join latency for which performance is expected to vary with 590 PIM and BGP implementations. We can also note that improving a BGP 591 implementation for reduced latency of route processing would not only 592 benefit multicast VPN group join latency, but the whole BGP-based 593 routing, which means that the need for good BGP/RR performance is not 594 specific to multicast VPN routing. 596 Last, C-multicast join latency will be impacted by the overall load 597 put on the control plane, and the scalability of the C-multicast 598 routing approach is thus to be taken into account. As explained in 599 Section 3.3.1 and Appendix A, the BGP-based approach will 600 provide the best scalability with an increased number of PEs per VPN, 601 thereby benefiting group join latency in such higher scale scenarios. 603 3.3.7. Conclusion on C-multicast routing 605 The first and fourth approaches are relevant contenders for 606 C-multicast routing. Comparisons from a theoretical standpoint 607 identify some advantages as well as possible drawbacks in the 608 fourth approach. Comparisons from a practical standpoint are harder 609 to make: since only reduced deployment and implementation information 610 is available for the fourth approach, advantages would be seen in the 611 first approach that has been applied through multiple deployments and 612 shown to be operationally viable. 614 Moreover, the first mechanism (full per-MVPN PIM peering across an 615 MI-PMSI) is the mechanism used by [I-D.rosen-vpn-mcast] and therefore 616 it is deployed and operating in MVPNs today.
The fourth approach may 617 or may not end up being preferred for a given deployment, but because 618 the first approach has been in deployment for some time, the support 619 for this mechanism will in any case be helpful to facilitate an 620 eventual migration from a deployment using a mechanism close to the 621 first approach. 623 Consequently, at the present time, implementations are recommended to 624 support both the fourth (BGP-based) and first (Full per-MVPN PIM 625 peering) mechanisms. Further experience on deployments of the fourth 626 approach is needed before some best practice can be defined. In the 627 meantime, this recommendation would enable a service provider to 628 choose between the first and the fourth mechanism, without this 629 choice being constrained by vendors' implementation choices, and 630 taking into account the peculiarities of its own deployment context 631 by weighing the different factors. 633 3.4. Encapsulation techniques for P-multicast trees 635 In this section the authors will not make any restricting 636 recommendations since the appropriateness of a specific provider core 637 data plane technology will depend on a large number of factors, for 638 example the service provider's currently deployed unicast data plane, 639 many of which are service provider specific. 641 However, implementations should not unreasonably restrict the data 642 plane technology that can be used, and should not force the use of 643 the same technology for different VPNs attached to a single PE. 644 Initial implementations may only support a reduced set of 645 encapsulation techniques and data plane technologies but this should 646 not be a limiting factor that hinders future support for other 647 encapsulation techniques, data plane technologies or 648 interoperability.
650 Section 5.2.4.1 of [RFC4834] states "In a multicast VPN solution 651 extending a unicast L3 PPVPN solution, consistency in the tunneling 652 technology has to be favored: such a solution SHOULD allow the use of 653 the same tunneling technology for multicast as for unicast. 654 Deployment consistency, ease of operation and potential migrations 655 are the main motivations behind this requirement." 657 Current unicast VPN deployments use a variety of LDP, RSVP-TE and 658 GRE/IP-Multicast for encapsulating customer packets for transport 659 across the provider core of VPN services. In order to allow the same 660 encapsulations to be used for unicast and multicast VPN traffic, it 661 is recommended that multicast VPN standards should recommend 662 implementations to support for multicast VPNs, all the P2MP variants 663 of the encapsulations and signaling protocols that they support for 664 unicast and for which some multipoint extension is defined, such as 665 mLDP, P2MP RSVP-TE and GRE/IP-multicast. 667 All three of the above encapsulation techniques support the building 668 of P2MP multicast P-tunnels. In addition mLDP and GRE/ 669 IP-ASM-Multicast implementations may also support the building of 670 MP2MP multicast P-tunnels. The use of MP2MP P-tunnels may provide 671 some scaling benefits to the service provider as only a single MP2MP 672 P-tunnel need be deployed per VPN, thus reducing by an order of 673 magnitude the amount of multicast state that needs to be maintained 674 by P routers. This gain in state is at the expense of bandwidth 675 optimization, since sites that do not have multicast receivers for 676 multicast streams sourced behind a said PE group will still receive 677 packets of such streams, leading to non-optimal bandwidth utilization 678 across the VPN core. 
One thing to consider is that the use of MP2MP 679 multicast P-tunnels will require additional configuration to define 680 the same P-tunnel identifier or multicast ASM group address in all 681 PEs (it has been noted that some auto-configuration could be possible 682 for MP2MP P-tunnels, but this is not currently supported by the 683 auto-discovery procedures). [ It has been noted that C-multicast 684 routing schemes not covered in [I-D.ietf-l3vpn-2547bis-mcast] could 685 expose different advantages of MP2MP multicast P-tunnels - this is 686 out of scope of this document ] 688 MVPN services can also be supported over a unicast VPN core through 689 the use of ingress PE replication whereby the ingress PE replicates 690 any multicast traffic over the P2P tunnels used to support unicast 691 traffic. While this option does not require the service provider to 692 modify their existing P routers (in terms of protocol support) and 693 does not require maintaining multicast-specific state on the P 694 routers in order for the service provider to be able to deploy a 695 multicast VPN service, the use of ingress PE replication obviously 696 leads to non-optimal bandwidth utilization and it is therefore 697 unlikely to be the long term solution chosen by service providers. 698 However ingress PE replication may be useful during some migration 699 scenarios or where a service provider considers the level of 700 multicast traffic on their network to be too low to justify deploying 701 multicast specific support within their VPN core. 703 All proposed approaches for control plane and dataplane can be used 704 to provide aggregation amongst multicast groups within a VPN and 705 amongst different multicast VPNs, and potentially reduce the amount 706 of state to be maintained by P routers. However the latter -- the 707 aggregation amongst different multicast VPNs -- will require support for 708 upstream-assigned labels on the PEs.
Support for upstream-assigned 709 labels may require changes to the data plane processing of the PEs 710 and this should be taken into consideration by service providers 711 considering the use of aggregate PMSI tunnels for the specific 712 platforms that the service provider has deployed. 714 3.5. Inter-AS deployments options 716 There are a number of scenarios that lead to the requirement for 717 inter-AS multicast VPNs, including: 719 1. a service provider may have a large network that they have 720 segmented into a number of ASs. 722 2. a service provider's multicast VPN may consist of a number of ASs 723 due to acquisitions and mergers with other service providers. 725 3. a service provider may wish to interconnect their multicast VPN 726 platform with that of another service provider. 728 The first scenario can be considered the "simplest" because the 729 network is wholly managed by a single service provider under a single 730 strategy and is therefore likely to use a consistent set of 731 technologies across each AS. 733 The second scenario may be more complex than the first because the 734 strategy and technology choices made for each AS may have been 735 different due to their differing history and the service provider may 736 not have unified (or may be unwilling to unify) the strategy and technology 737 choices for each AS. 739 The third scenario is the most complex because in addition to the 740 complexity of the second scenario, the ASs are managed by different 741 service providers and therefore may be subject to a different trust 742 model than the other scenarios.
744 Section 5.2.6 of [RFC4834] states that "a solution MUST support 745 inter-AS multicast VPNs, and SHOULD support inter-provider multicast 746 VPNs", "considerations about coexistence with unicast inter-AS VPN 747 Options A, B and C (as described in section 10 of [RFC4364]) are 748 strongly encouraged" and "a multicast VPN solution SHOULD provide 749 inter-AS mechanisms requiring the least possible coordination between 750 providers, and keep the need for detailed knowledge of providers' 751 networks to a minimum - all this being in comparison with 752 corresponding unicast VPN options". 754 Section 8 of [I-D.ietf-l3vpn-2547bis-mcast] addresses these 755 requirements by proposing two approaches for MVPN inter-AS 756 deployments: 758 1. Non-segmented inter-AS tunnels where the multicast tunnels are 759 end-to-end across ASes, so even though the PEs belonging to a 760 given MVPN may be in different ASs the ASBRs play no special role 761 and function merely as P routers (described in section 8.1). 763 2. Segmented inter-AS tunnels where each AS constructs its own 764 separate multicast tunnels which are then 'stitched' together by 765 the ASBRs (described in section 8.2). 767 (Note that an inter-AS deployment can alternatively rely on Option A 768 -- so-called "back-to-back" VRFs -- that option is not considered in 769 this section given that it can be used without any inter-AS specific 770 mechanism) 772 Section 5.2.6 of [RFC4834] also states "Within each service provider 773 the service provider SHOULD be able on its own to pick the most 774 appropriate tunneling mechanism to carry (multicast) traffic among 775 PEs (just like what is done today for unicast)". The segmented 776 approach is the only one capable of meeting this requirement. 778 The segmented inter-AS solution would appear to offer the largest 779 degree of deployment flexibility to operators. 
However the non- 780 segmented inter-AS solution can simplify deployment in a restricted 781 number of scenarios and [I-D.rosen-vpn-mcast] only supports the non- 782 segmented inter-AS solution and therefore the non-segmented inter-AS 783 solution is likely to be useful to some operators for backward 784 compatibility and during migration from [I-D.rosen-vpn-mcast] to 785 [I-D.ietf-l3vpn-2547bis-mcast]. 787 The following is a comparison matrix between the "segmented inter-AS 788 P-tunnels" and "non-segmented inter-AS P-tunnels" approaches: 790 o Scalability for I-PMSIs: the "segmented inter-AS P-tunnels" is 791 more scalable, because of the ability of an ASBR to aggregate 792 multiple intra-AS P-tunnels used for I-PMSI within its own AS into 793 one inter-AS P-tunnel to be used by other ASes. Note that the 794 I-PMSI scalability improvement brought by the "segmented inter-AS 795 P-tunnels" approach is higher when segmented P-tunnels have a 796 granularity of source AS (see item below). 798 o Scalability for S-PMSIs: the "segmented inter-AS P-tunnels", when 799 used with the BGP-based C-multicast routing approach, provides 800 flexibility in how the bandwidth/state trade-off is handled, to 801 help with scalability. Indeed in that case, the trade-off made 802 for a said (C-S,C-G) in a downstream AS can be made more in favor 803 of scalability than the trade-off made by the neighbor upstream 804 AS, thanks to the ability to aggregate one or more S-PMSIs of the 805 upstream AS in one I-PMSI tunnel in a downstream AS. 807 o Configuration at ASBRs: depending on whether segmented P-tunnels 808 have a granularity of source ASBR or source AS, the "segmented 809 inter-AS P-tunnels" approach would require respectively the same 810 or additional configuration on ASBRs as the "non-segmented 811 inter-AS P-tunnels" approach. 
813 o Independence of tunneling technology from one AS to another: the 814 "segmented inter-AS P-tunnels" approach provides this, the "non- 815 segmented inter-AS P-tunnels" approach does not. 817 o Facilitated co-existence with, and migration from, existing 818 deployments, and lighter engineering in some scenarios: the "non- 819 segmented inter-AS P-tunnels" approach provides this, the 820 "segmented inter-AS P-tunnels" approach does not. 822 The applicability of segmented or non-segmented inter-AS tunnels to a 823 given deployment or inter-provider interconnect will depend on a 824 number of factors specific to each service provider. However, given 825 the different elements recalled above, it is the recommendation of 826 the authors that all implementations should support the segmented 827 inter-AS model. Additionally, the authors recommend that 828 implementations should consider supporting the non-segmented inter-AS 829 model in order to facilitate co-existence with, and migration from, 830 existing deployments, and as a feature to provide lighter 831 engineering in a restricted set of scenarios, although it is 832 recognized that initial implementations may only support one or the 833 other. 835 3.6. Bidir-PIM support 837 In Bidir-PIM, the packet forwarding rules have been improved over 838 PIM-SM, allowing traffic to be passed up the shared tree toward the 839 RP Address (RPA). To avoid multicast packet looping, Bidir-PIM uses 840 a mechanism called the designated forwarder (DF) election, which 841 establishes a loop-free tree rooted at the RPA. Use of this method 842 ensures that only one copy of every packet will be sent to an RPA, 843 even if there are parallel equal cost paths to the RPA. To avoid 844 loops the DF election process enforces a consistent view of the DF on 845 all routers on a network segment, and during periods of ambiguity or 846 routing convergence the traffic forwarding is suspended.
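
The outcome of a DF election on one segment can be sketched as a simple comparison. This is a simplified illustration, assuming the usual PIM route comparison (lower metric preference first, then lower metric, then higher address as tie-breaker); the actual Bidir-PIM election is a message exchange between the routers on the link and also suspends forwarding during periods of ambiguity, neither of which is modeled here. The `Candidate` type and `elect_df` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Candidate:
    """A router on the segment and its unicast route toward the RPA."""
    address: IPv4Address
    metric_preference: int
    metric: int


def elect_df(candidates):
    """Return the single candidate that every router should agree on as DF."""
    if not candidates:
        raise ValueError("no candidates on this segment")
    # Lower preference wins, then lower metric, then the higher address.
    return min(
        candidates,
        key=lambda c: (c.metric_preference, c.metric, -int(c.address)),
    )
```

Because every router applies the same ordering to the same information, all routers with a consistent view of routing toward the RPA reach the same result. The MVPN concern discussed in this section is precisely the case where that view is not consistent across mVRFs.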
848 In the context of a multicast VPN solution, a solution for Bidir-PIM 849 support must preserve this property of similarly avoiding packet 850 loops, including in the case where mVRFs in a given MVPN don't have 851 a consistent view of the routing to C-RPL/C-RPA. 853 The current MVPN specifications [I-D.ietf-l3vpn-2547bis-mcast] in 854 section 11, define three methods to support Bidir-PIM, as RECOMMENDED 855 in [RFC4834]: 857 1. Standard DF election procedure over an MI-PMSI 859 2. VPN Backbone as the RPL (section 11.1) 861 3. Partitioned Sets of PEs (section 11.2) 863 Method (1) is naturally applied to deployments using "Full per-MVPN 864 PIM peering across an MI-PMSI" for C-multicast routing, but as 865 indicated in [I-D.ietf-l3vpn-2547bis-mcast] in section 11, the DF 866 Election may not work well in an MVPN environment and an alternative 867 to DF election would be desirable. 869 The advantage of methods (2) and (3) is that they do not require 870 running the DF election procedure among PEs. 872 Method (2) leverages the fact that in Bidir-PIM, running the DF 873 election procedure is not needed on the RPL. This approach thus has 874 the benefit of simplicity of implementation, especially in a context 875 where BGP-based C-multicast routing is used. However it has the 876 drawback of putting constraints on how Bidir-PIM is deployed which 877 may not always match MVPN customers' requirements. 879 Method (3) treats an MVPN as a collection of sets of multicast VRFs, 880 all PEs in a set having the same reachability information towards 881 C-RPA, but distinct from PEs in other sets. Hence, with this method, 882 C-Bidir packet loops in MVPN are resolved by the ability to partition 883 a VPN into disjoint sets of VRFs, each having a distinct view of 884 the converged network. The partitioning approach to Bidir-PIM requires 885 either upstream-assigned MPLS labels (to denote the partition) or a 886 unique MP2MP LSP per partition.
The former is based on PE 887 Distinguisher Labels that have to be distributed using auto-discovery 888 BGP routes and their handling requires the support for upstream 889 assigned labels and context label lookups [RFC5331]. The latter, 890 using MP2MP LSP per partition, does not have these constraints but is 891 restricted to P-tunnel types supporting MP2MP connectivity (such as 892 mLDP [I-D.ietf-mpls-ldp-p2mp]). 894 This approach to C-Bidir can work with PIM-based or BGP-based 895 C-multicast routing procedures, and is also generic in the sense that 896 it does not impose any requirements on the Bidir-PIM service 897 offering. 899 Given the above considerations, method (3) "Partitioned Sets of PEs" 900 is the RECOMMENDED approach. 902 In the event where method (3) is not applicable (lack of support for 903 upstream assigned labels or for a P-tunnel type providing MP2MP 904 connectivity), then method (1) "Standard DF election procedure over 905 an MI-PMSI" and (2) "VPN Backbone as the RPL" are RECOMMENDED as 906 interim solutions, (1) having the advantage over (2) of not putting 907 constraints on how Bidir-PIM is deployed and the drawbacks of only 908 being applicable when PIM-based C-multicast is used and of possibly 909 not working well in an MVPN environment. 911 4. Co-located RPs 913 Section 5.1.10.1 of [RFC4834] states "In the case of PIM-SM in ASM 914 mode, engineering of the RP function requires the deployment of 915 specific protocols and associated configurations. A service provider 916 may offer to manage customers' multicast protocol operation on their 917 behalf. This implies that it is necessary to consider cases where a 918 customer's RPs are out-sourced (e.g. on PEs). Consequently, a VPN 919 solution MAY support the hosting of the RP function in a VR or VRF." 
921 However, customers who have already deployed multicast within their 922 networks and have therefore already deployed their own internal RPs 923 are often reluctant to hand over the control of their RPs to their 924 service provider and make use of a co-located RP model, and providing 925 RP co-location on a PE will require the activation of MSDP or the 926 processing of PIM Registers on the PE. Securing the PE routers for 927 such activity requires special care, additional work, and will likely 928 rely on specific features to be provided by the routers themselves. 930 The applicability of the co-located RP model to a given MVPN will 931 thus depend on a number of factors specific to each customer and 932 service provider. 934 It is therefore the recommendation that implementations should 935 support a co-located RP model, but that support for a co-located RP 936 model within an implementation should not restrict deployments to 937 using a co-located RP model: implementations MUST support deployments 938 when activation of a PIM RP function (PIM Register processing and RP- 939 specific PIM procedures) or VRF MSDP instance is not required on any 940 PE router and where all the RPs are deployed within the customers' 941 networks or CEs. 943 5. Avoiding duplicates 945 It is recommended that implementations support the procedures 946 described in section 9.1.1 of [I-D.ietf-l3vpn-2547bis-mcast] 947 "Discarding Packets from Wrong PE", which allow duplicates to be fully 948 avoided. 950 6.
Existing deployments 952 Some suggestions provided in this document can be used to 953 incrementally modify currently deployed implementations without 954 hindering these deployments, and without hindering the consistency of 955 the standardized solution by providing optional per-VRF configuration 956 knobs to support modes of operation compatible with currently 957 deployed implementations, while at the same time using the 958 recommended approach on implementations supporting the standard. 960 In cases where this may not be easily achieved, a recommended 961 approach would be to provide a per-VRF configuration knob that allows 962 incremental per-VPN migration of the mechanisms used by a PE device, 963 which would allow migration with some per-VPN interruption of service 964 (e.g. during a maintenance window). 966 Mechanisms allowing "live" migration by providing concurrent use of 967 multiple alternatives for a given PE and a given VPN are not seen as 968 a priority considering the expected implementation complexity 969 associated with such mechanisms. However, if there happen to be 970 cases where they could be viably implemented relatively simply, such 971 mechanisms may help improve migration management. 973 7. Summary of recommendations 975 The following list summarizes conclusions on the mechanisms that 976 define the set of mandatory-to-implement mechanisms in the context of 977 [I-D.ietf-l3vpn-2547bis-mcast]. 979 Note well that the implementation of the non-mandatory alternative 980 mechanisms is not precluded.
982 Recommendations are: 984 o that BGP-based auto-discovery be the mandated solution for auto- 985 discovery ; 987 o that BGP be the mandated solution for S-PMSI switching signaling ; 989 o that implementations support both the BGP-based and the full per- 990 MVPN PIM peering solutions for PE-PE exchange of customer 991 multicast routing until further operational experience is gained 992 with both solutions ; 994 o that implementations use the "Partitioned Sets of PEs" approach 995 for Bidir-PIM support ; 997 o that implementations implement the P2MP variants of the P2P 998 protocols that they already implement, such as mLDP, P2MP RSVP-TE 999 and GRE/IP-Multicast ; 1001 o that implementations support segmented inter-AS tunnels and 1002 consider supporting non-segmented inter-AS tunnels (in order to 1003 maintain backwards compatibility and for migration) ; 1005 o that implementations MUST support deployments when activation of a PIM 1006 RP function (PIM Register processing and RP-specific PIM 1007 procedures) or VRF MSDP instance is not required on any PE router ; 1009 o that implementations support the procedures described in section 1010 9.1.1 of [I-D.ietf-l3vpn-2547bis-mcast]. 1012 8. IANA Considerations 1014 This document makes no request to IANA. 1016 [ Note to RFC Editor: this section may be removed on publication as 1017 an RFC. ] 1019 9. Security Considerations 1021 This document does not by itself raise any particular security 1022 considerations. 1024 10. Acknowledgements 1026 We would like to thank Adrian Farrel, Eric Rosen, Yakov Rekhter, and 1027 Maria Napierala for their feedback that helped shape this document. 1029 Additional credit is due to Maria Napierala for co-authoring 1030 Section 3.6 on Bidir-PIM support. 1032 11. References 1034 11.1. Normative References 1036 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1037 Requirement Levels", BCP 14, RFC 2119, March 1997.
1039 [I-D.ietf-l3vpn-2547bis-mcast] 1040 Aggarwal, R., Bandi, S., Cai, Y., Morin, T., Rekhter, Y., 1041 Rosen, E., Wijnands, I., and S. Yasukawa, "Multicast in 1042 MPLS/BGP IP VPNs", draft-ietf-l3vpn-2547bis-mcast-09 (work 1043 in progress), November 2009. 1045 [I-D.ietf-l3vpn-2547bis-mcast-bgp] 1046 Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP 1047 Encodings and Procedures for Multicast in MPLS/BGP IP 1048 VPNs", draft-ietf-l3vpn-2547bis-mcast-bgp-08 (work in 1049 progress), September 2009. 1051 11.2. Informative References 1053 [RFC4834] Morin, T., "Requirements for Multicast in L3 Provider- 1054 Provisioned Virtual Private Networks (PPVPNs)", RFC 4834, 1055 April 2007. 1057 [I-D.rosen-vpn-mcast] 1058 Cai, Y., Rosen, E., and I. Wijnands, "Multicast in MPLS/ 1059 BGP IP VPNs", draft-rosen-vpn-mcast-12 (work in progress), 1060 August 2009. 1062 [I-D.raggarwa-l3vpn-2547-mvpn] 1063 Aggarwal, R., "Base Specification for Multicast in BGP/ 1064 MPLS VPNs", draft-raggarwa-l3vpn-2547-mvpn-00 (work in 1065 progress), June 2004. 1067 [I-D.ietf-pim-sm-linklocal] 1068 Atwood, J., "Authentication and Confidentiality in PIM-SM 1069 Link-local Messages", draft-ietf-pim-sm-linklocal-08 (work 1070 in progress), November 2007. 1072 [I-D.ietf-pim-port] 1073 Farinacci, D., Wijnands, I., Venaas, S., and M. Napierala, 1074 "A Reliable Transport Mechanism for PIM", 1075 draft-ietf-pim-port-02 (work in progress), October 2009. 1077 [RFC4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, 1078 R., Patel, K., and J. Guichard, "Constrained Route 1079 Distribution for Border Gateway Protocol/MultiProtocol 1080 Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual 1081 Private Networks (VPNs)", RFC 4684, November 2006. 1083 [I-D.ietf-mpls-ldp-p2mp] 1084 Minei, I., Kompella, K., Wijnands, I., and B. 
Thomas, 1085 "Label Distribution Protocol Extensions for Point-to- 1086 Multipoint and Multipoint-to-Multipoint Label Switched 1087 Paths", draft-ietf-mpls-ldp-p2mp-08 (work in progress), 1088 October 2009. 1090 [RFC5331] Aggarwal, R., Rekhter, Y., and E. Rosen, "MPLS Upstream 1091 Label Assignment and Context-Specific Label Space", 1092 RFC 5331, August 2008. 1094 Appendix A. Scalability of C-multicast routing processing load 1096 The main role of multicast routing is to let routers determine that 1097 they should start or stop forwarding a given multicast stream on a 1098 given link. In an MVPN context, this has to be done for each MVPN, 1099 and the associated function is thus named "customer-multicast 1100 routing" or "C-multicast routing" and its role is to let PE routers 1101 determine that they should start or stop forwarding the traffic of a 1102 given multicast stream toward the remote PEs, on some PMSI tunnel. 1104 When some "join" message is received by a PE, this PE knows that it 1105 should be sending traffic for the corresponding multicast group of 1106 the corresponding MVPN. But the reception of a "prune" message from 1107 a remote PE is not enough by itself for a PE to know that it should 1108 stop forwarding the corresponding multicast traffic: it has to make 1109 sure that there aren't any other PEs that still have receivers for 1110 this traffic. 1112 There are many ways that the "C-multicast routing" building block can 1113 be designed, and they differ, among other things, in how a PE 1114 determines when it can stop forwarding a given multicast stream toward 1115 other PEs: 1117 PIM LAN Procedures, by default 1118 By default when PIM LAN procedures are used, when a PE on a LAN 1119 Prunes itself from a multicast tree, all other PEs on that LAN 1120 check their own state to know if they are on the tree, in which 1121 case they send a PIM Join message on that LAN to override the 1122 Prune.
Thus, for each PIM Prune message, all PE routers on the 1123 LAN work to let the upstream PE determine the answer to the "did 1124 the last receiver leave?" question. 1126 BGP-based C-multicast routing 1127 When BGP-based procedures are used for C-multicast routing, if no 1128 BGP Route Reflector is used, the "did the last receiver leave?" 1129 question is answered by having the upstream PE maintain an up-to- 1130 date list of the PEs which are joined to the tree, thus making it 1131 possible to instantly know the answer to the "did the last 1132 receiver leave?" question whenever a PE leaves the given multicast tree. 1133 But, when a BGP Route Reflector is used (which is expected to be 1134 the recommended approach), the role of maintaining an updated list 1135 of the PEs that are part of a given multicast tree is taken care of 1136 by the Route Reflector(s). Using BGP procedures, a route reflector 1137 that had previously advertised a C-multicast Source Tree Join route for 1138 a given (C-S, C-G) to other route reflectors will withdraw 1139 this route when none of its client PEs is advertising this 1140 route anymore. Similarly, a route reflector that had previously advertised 1141 this route to its client PEs will withdraw this route when 1142 none of its (other) client PEs and none of its route 1143 reflector peers is advertising this route anymore. In this context, 1144 the "did the last receiver leave?" question can be said to be 1145 answered by the route-reflector(s). 1146 Furthermore, the BGP route distribution can leverage more than one 1147 route reflector: if multiple route reflectors are used with PEs 1148 being distributed (as clients) among these route reflectors, the 1149 "did the last receiver leave?" question is partly answered by each 1150 of these route reflectors.
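
The membership bookkeeping behind the "did the last receiver leave?" question in the BGP-based case can be sketched as follows. This is a minimal illustration, not an implementation of BGP or of the MVPN procedures: the `JoinTracker` class is a hypothetical helper, and the same structure is assumed to serve either an upstream PE (tracking the downstream PEs, or the route reflectors it hears the route from) or a route reflector (tracking the client PEs that advertise a given C-multicast route).

```python
from collections import defaultdict


class JoinTracker:
    """Tracks which members currently advertise a join for each stream."""

    def __init__(self):
        self._members = defaultdict(set)

    def join(self, stream, member):
        """Record a join; return True if it is the first one for the stream."""
        first = not self._members[stream]
        self._members[stream].add(member)
        return first

    def leave(self, stream, member):
        """Record a leave; return True only if the last member just left."""
        members = self._members.get(stream)
        if not members or member not in members:
            return False  # unknown member or stream: nothing changes
        members.discard(member)
        if members:
            return False  # other members still need the stream
        del self._members[stream]
        return True
```

In this sketch, a route reflector would advertise the route upstream when `join` returns True and withdraw it when `leave` returns True, so the upstream PE only sees one join and one leave per route reflector rather than one per receiver-connected PE.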
1152 We can see that answering the "last receiver leaves" question is a 1153 part of the work that the C-multicast routing building block has to 1154 do, where the different approaches significantly differ. The 1155 different approaches for handling C-multicast routing can indeed 1156 result in a different amount of processing and in how this processing is 1157 spread among the different functions. These differences can be 1158 better estimated by quantifying the amount of message processing and 1159 state maintenance. 1161 Though the type of processing, messages and states may vary with the 1162 different approaches, we propose here a rough estimation of the load 1163 of PEs, in terms of number of messages processed and number of 1164 control plane states maintained. A "message processed" is a 1165 message that is parsed, with a lookup being done and some action being 1166 taken (such as, for instance, updating a control plane or data plane 1167 state, or discarding the information in the message). A "state 1168 maintained" is a multicast state kept in the control plane memory 1169 of a PE, related to an interface or a PE being subscribed to a 1170 multicast stream (note that a state will be counted on an equipment 1171 as many times as the number of protocols in which it is present; e.g. 1172 two times when present both as a PIM state and a BGP route). Note 1173 that here we don't compare the data plane states on PE routers, which 1174 wouldn't vary between the different options chosen. 1176 A.1. Scalability with an increased number of PEs 1178 The following sections aim at evaluating the processing and state 1179 maintenance load for an increasingly high number of PEs in a VPN. 1181 A.1.1.
SSM Scalability 1183 The following subsections provide such an estimate for each proposed 1184 approach for C-multicast routing, for different phases of the 1185 following scenario: 1187 o one SSM multicast stream is considered 1189 o only the intra-AS case is considered (with the segmented inter-AS 1190 tunnels and BGP-based C-multicast routing, #mvpn_PE and #R_PE 1191 should refer to the PEs of the MVPN in the AS, not to all PEs of 1192 the MVPN) 1194 o the scenario is as follows: 1196 * one PE Joins the multicast stream (because a new receiver- 1197 connected site has sent a Join on the PE-CE link), followed by 1198 a number of additional PEs that also join the same multicast 1199 stream, one after the other ; we evaluate the processing 1200 required for the addition of each PE 1202 * some period of time T passes, without any PE joining or leaving 1203 (baseline) 1205 * all PEs leave, one after the other, until the last one leaves ; 1206 we evaluate the processing required for the leave of each PE 1208 o the parameters used are: 1210 * #mvpn_PE: the number of PEs in the MVPN 1212 * #R_PE: the number of PEs joining the multicast stream 1214 * #RR: the number of route reflectors 1216 * T_PIM_r: the time between two refreshes of a PIM Join (default 1217 is 60s) 1219 The estimation unit used is the "message.equipment" (or "m.e"): one 1220 "message.equipment" corresponding to "one equipment processing one 1221 message" (10 m.e being "10 equipments processing each one message", 1222 or "5 messages each processed by 2 equipments", or "1 message 1223 processed by 10 equipments", etc.). Similarly, for the amount of 1224 control plane state, the unit used is "state.equipment" or "s.e". 1225 This makes it possible to take into account the fact that a message (or a state) 1226 can be processed (or maintained) by more than one node.
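
The "message.equipment" unit defined above is a simple product summed over events. The short sketch below only illustrates that arithmetic; the function name and the `(messages, equipments)` pair representation are assumptions made for this example, and the same computation would apply to "state.equipment".

```python
def total_message_equipment(events):
    """Sum m.e over (messages, equipments_processing_each_message) pairs."""
    total = 0
    for messages, equipments in events:
        if messages < 0 or equipments < 0:
            raise ValueError("counts must be non-negative")
        # One message processed by N equipments counts as N m.e.
        total += messages * equipments
    return total
```

For instance, the three readings of "10 m.e" given in the text (10 equipments each processing one message, 5 messages each processed by 2 equipments, 1 message processed by 10 equipments) all yield the same total with this function.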
   We distinguish three different types of equipments: the upstream PE
   for the considered multicast stream, the RR (if any), and the other
   PEs (which are not the upstream PE).

   The numbers or orders of magnitude given in the tables in the
   following subsections are totals across all equipments of the same
   type, for each type of equipment, in the "m.e" and "s.e" units
   defined above.

   Additionally:

   o  for PIM, only Join and Prune messages are counted:

      *  the load due to PIM Hellos can easily be computed separately
         and only depends on the number of PEs in the VPN;

      *  message processing related to the PIM Assert mechanism is also
         not taken into account, for the sake of simplicity;

   o  for BGP, all advertisements and withdrawals of C-multicast Source
      Tree Join routes are considered (Source-Active autodiscovery
      routes are not used in an SSM context); and, following the
      recommendation in Section 16 of
      [I-D.ietf-l3vpn-2547bis-mcast-bgp], the case where the RT
      Constrain mechanism [RFC4684] is not used is not covered.

   (Note that for all options provided for C-multicast routing, the
   procedures to set up and maintain a shortest path tree toward the
   source of an SSM group are the same as the procedures used to set up
   and maintain a shortest path tree toward an RP or a non-SSM source;
   the results of this section are thus reused in Appendix A.1.2.)

A.1.1.1.  PIM LAN procedures, by default

   +------------+------------+---------------+----------+--------------+
   |            | upstream   | other PEs     | RR       | total across |
   |            | PE (1)     | (total across | (none)   | all          |
   |            |            | (#mvpn_PE-1)  |          | equipments   |
   |            |            | PEs)          |          |              |
   +------------+------------+---------------+----------+--------------+
   | first PE   | 1 m.e      | #mvpn_PE-1    | /        | #mvpn_PE m.e |
   | joins      |            | m.e           |          |              |
   +------------+------------+---------------+----------+--------------+
   | for *each* | 1 m.e      | #mvpn_PE-1    | /        | #mvpn_PE m.e |
   | additional |            | m.e           |          |              |
   | PE joining |            |               |          |              |
   +------------+------------+---------------+----------+--------------+
   | baseline   | T/T_PIM_r  | (T/T_PIM_r) x | /        | (T/T_PIM_r)  |
   | processing | m.e        | (#mvpn_PE-1)  |          | x #mvpn_PE   |
   | over a     |            | m.e           |          | m.e          |
   | period T   |            |               |          |              |
   +------------+------------+---------------+----------+--------------+
   | for *each* | 2 m.e      | 2(#mvpn_PE-1) | /        | 2 x #mvpn_PE |
   | PE leaving |            | m.e           |          | m.e          |
   +------------+------------+---------------+----------+--------------+
   | the last   | 1 m.e      | #mvpn_PE-1    | /        | #mvpn_PE m.e |
   | PE leaves  |            | m.e           |          |              |
   +------------+------------+---------------+----------+--------------+
   | total for  | #R_PE x 2  | (#mvpn_PE-1)  | 0        | #mvpn_PE x ( |
   | #R_PE PEs  | +          | x (#R_PE) x 2 |          | 3 x #R_PE +  |
   |            | T/T_PIM_r  | + (T/T_PIM_r) |          | T/T_PIM_r )  |
   |            | m.e        | x             |          | m.e          |
   |            |            | (#mvpn_PE-1)  |          |              |
   |            |            | m.e           |          |              |
   +------------+------------+---------------+----------+--------------+
   | total      | 1 s.e      | #R_PE s.e     | 0        | #R_PE+1 s.e  |
   | state      |            |               |          |              |
   | maintained |            |               |          |              |
   +------------+------------+---------------+----------+--------------+

    Message processing and state maintenance - PIM LAN procedures, by
                                 default

   We suppose here that the PIM Join suppression and Prune Override
   mechanisms are fully effective, i.e., that a Join or Prune message
   sent by a PE is instantly seen by other PEs.  Strictly speaking,
   this is not true: depending on network delays and timing, there
   could be cases where more messages are exchanged.  The numbers given
   in this table are thus a lower bound on the number of PIM messages
   exchanged.

A.1.1.2.  BGP-based C-multicast routing

   The following analysis assumes that BGP Route Reflectors (RRs) are
   used, with no hierarchy of RRs (recall that the analysis also
   assumes that the RT Constrain mechanism is used).

   Given these assumptions, a message carrying a C-multicast route from
   a downstream PE would need to be processed by the RRs that have that
   PE as their client.  Due to the use of RT Constrain, these RRs would
   then send this message only to the RRs that have the upstream PE as
   a client.  None of the other RRs, and none of the other PEs, will
   receive this message.  Thus, for a message associated with a given
   MVPN, the total number of RRs that would need to process this
   message only depends on the number of RRs that maintain C-multicast
   routes for that MVPN and that have either the receiver-connected PE
   or the source-connected PE as their clients, and is independent of
   the total number of RRs or the total number of PEs.

   In practice, for a given MVPN, a PE would be a client of just 2 RRs
   (for redundancy, an RR cluster would typically have 2 RRs).
   Therefore, in practice the message would need to be processed by at
   most 4 RRs (2 RRs if both the downstream PE and the upstream PE are
   clients of the same RRs).  Thus the number of RRs that have to
   process a given message is at most 4.  Since RRs in different RR
   clusters have a full iBGP mesh among themselves, each RR in the RR
   cluster that contains the upstream PE would receive the message from
   each of the RRs in the RR cluster that contains the downstream PE.
   Given 2 RRs per cluster, the total number of messages processed by
   all the RRs is 6.

   Additionally, as soon as there is a receiver-connected PE in each RR
   cluster, the number of RRs processing a C-multicast route tends
   quickly toward 2 (taking into account that a PE peering to RRs will
   be made redundant).

   +------------+----------+--------------+-----------+----------------+
   |            | upstream | other PEs    | RRs (#RR) | total across   |
   |            | PE (1)   | (total       |           | all equipments |
   |            |          | across       |           |                |
   |            |          | (#mvpn_PE-1) |           |                |
   |            |          | PEs)         |           |                |
   +------------+----------+--------------+-----------+----------------+
   | first PE   | 2 m.e    | 2 m.e        | 6 m.e     | 10 m.e         |
   | joins      |          |              |           |                |
   +------------+----------+--------------+-----------+----------------+
   | for *each* | between  | 2 m.e        | (at most) | (at most) 10   |
   | additional | 0 and 2  |              | 6 m.e     | m.e tending    |
   | PE joining | m.e      |              | tending   | toward 4 m.e   |
   |            |          |              | toward 2  |                |
   |            |          |              | m.e       |                |
   +------------+----------+--------------+-----------+----------------+
   | baseline   | 0        | 0            | 0         | 0              |
   | processing |          |              |           |                |
   | over a     |          |              |           |                |
   | period T   |          |              |           |                |
   +------------+----------+--------------+-----------+----------------+
   | for *each* | between  | 2 m.e        | (at most) | (at most) 10   |
   | PE leaving | 0 and 2  |              | 6 m.e     | m.e tending    |
   |            | m.e      |              | tending   | toward 4 m.e   |
   |            |          |              | toward 2  |                |
   |            |          |              | m.e       |                |
   +------------+----------+--------------+-----------+----------------+
   | the last   | 2 m.e    | 2 m.e        | 6 m.e     | 10 m.e         |
   | PE leaves  |          |              |           |                |
   +------------+----------+--------------+-----------+----------------+
   | total for  | at most  | #R_PE x 4    | (at most) | at most 10 x   |
   | #R_PE PEs  | 2 x #RRs | m.e          | 6 x #R_PE | #R_PE + 2 x    |
   |            | m.e (see |              | m.e       | #RRs m.e       |
   |            | note     |              | (tending  | (tending       |
   |            | below)   |              | toward 2  | toward 6 x     |
   |            |          |              | x #R_PE   | #R_PE + #RRs   |
   |            |          |              | m.e)      | m.e)           |
   +------------+----------+--------------+-----------+----------------+
   | total      | 4 s.e    | 2 x #R_PE    | approx. 2 | approx. 4 x    |
   | state      |          | s.e          | x #R_PE + | #R_PE + #RR x  |
   | maintained |          |              | #RR x     | #clusters + 4  |
   |            |          |              | #clusters | s.e            |
   |            |          |              | s.e       |                |
   +------------+----------+--------------+-----------+----------------+

     Message processing and state maintenance - BGP-based procedures

   Note on the total of m.e on the upstream PE:

   o  there are as many "message.equipment" on the upstream PE as the
      number of times the RRs of the cluster of the upstream PE need to
      re-advertise the C-multicast (C-S,C-G) route; such a
      re-advertisement is not useful for the upstream PE, because the
      behavior of the upstream PE for a given (VPN associated to the
      RT, C-S,C-G) does not depend on the precise attributes carried by
      the route (other than the RT, of course), but it will happen in
      some cases due to how BGP processes these routes; indeed, a BGP
      peer will possibly re-advertise a route when its current best
      path changes for the given NLRI, if the set of attributes to
      advertise also changes

   o  let's look at the different relevant attributes, and at when they
      can cause a re-advertisement of a C-multicast route:

      *  next-hop and originator-id: a new PE joining will not
         mechanically result in a need to re-advertise a C-multicast
         route, because as the RR aggregates C-multicast routes with
         the same NLRI received from PEs in its own cluster (section
         11.4 of [I-D.ietf-l3vpn-2547bis-mcast-bgp]), the RR rewrites
         the values of these attributes; however, the advertisements
         made by different RRs peering with the RRs in the cluster of
         the upstream PE may lead to updates of the value of these
         attributes

      *  cluster-list: the value of this attribute only varies between
         clusters; changes of the value of this attribute do not
         "follow" PE advertisements, and only advertisements made by
         different RRs may possibly lead to updates of the value of
         this attribute

      *  local-pref: the value of this attribute is determined locally;
         this is true both for the routes advertised by each PE (which
         could all be configured to use the same value) and for a route
         that results from the aggregation by an RR of the routes with
         the same NLRI advertised by the PEs of its cluster (the RRs
         could also be configured to use a local-pref independent from
         the local-pref of the routes advertised to them); thus, this
         attribute can be considered not to result in a need to
         re-advertise a C-multicast route

      *  other BGP attributes do not have a particular reason to be set
         for C-multicast routes in intra-AS, and if they were, an RR
         (or, for attributes relevant for inter-AS, an ASBR) would also
         overwrite these values when aggregating these routes

   o  Given the above, for a given C-multicast Source Tree Join (S,G)
      NLRI, what may force an RR to re-advertise the route with
      different attributes to the upstream PE is the case of an RR of
      another cluster advertising a route better than its current best
      route, because of the values of attributes specific to that RR
      (next-hop, originator-id, cluster-list) but not because of
      anything specific to the PEs behind that RR.  If we consider our
      (#R_PE - 1) PEs joining a given (C-S,C-G), one after the other
      after the first PE joining, some of these events may thus lead to
      a re-advertisement to the upstream PE, but the number of times
      this can happen is at worst the number of RRs in clusters having
      receivers (plus one because of the possible advertisement of the
      same route by a PE of the local cluster).
   o  Given that in this section we look at scalability with an
      increased number of PEs, we need to consider the possibility
      that all clusters may have a client PE with a receiver.  We also
      need to consider that the two RRs of the cluster of the upstream
      PE may need to re-advertise the route.  With this in mind, we
      know that 2 x #RRs is an upper bound on the number of updates
      made by RRs to the upstream PE, for the considered C-multicast
      route.

A.1.1.3.  Side by side orders of magnitude comparison

   This section draws conclusions from the previous sections by
   considering the orders of magnitude when the number of PEs in a VPN
   increases.

   +------------+--------------------------------+---------------------+
   |            | PIM LAN Procedures             | BGP-based           |
   +------------+--------------------------------+---------------------+
   | first PE   | O(#mvpn_PE)                    | O(1)                |
   | joins (in  |                                |                     |
   | m.e)       |                                |                     |
   +------------+--------------------------------+---------------------+
   | for *each* | O(#mvpn_PE)                    | O(1)                |
   | additional |                                |                     |
   | PE joining |                                |                     |
   | (in m.e)   |                                |                     |
   +------------+--------------------------------+---------------------+
   | baseline   | (T/T_PIM_r) x O(#mvpn_PE)      | 0                   |
   | processing |                                |                     |
   | over a     |                                |                     |
   | period T   |                                |                     |
   | (in m.e)   |                                |                     |
   +------------+--------------------------------+---------------------+
   | for *each* | O(#mvpn_PE)                    | O(1)                |
   | PE leaving |                                |                     |
   | (in m.e)   |                                |                     |
   +------------+--------------------------------+---------------------+
   | the last   | O(#mvpn_PE)                    | O(1)                |
   | PE leaves  |                                |                     |
   | (in m.e)   |                                |                     |
   +------------+--------------------------------+---------------------+
   | total for  | O(#mvpn_PE x #R_PE) +          | O(#R_PE)            |
   | #R_PE PEs  | O(#mvpn_PE x T/T_PIM_r)        |                     |
   | (in m.e)   |                                |                     |
   +------------+--------------------------------+---------------------+
   | states (in | O(#R_PE)                       | O(#R_PE)            |
   | s.e)       |                                |                     |
   +------------+--------------------------------+---------------------+
   | notes      | (processing and state          | (processing and     |
   |            | maintenance are essentially    | state maintenance   |
   |            | done by, and spread amongst,   | is essentially done |
   |            | the PEs of the MVPN;           | by, and spread      |
   |            | non-upstream PEs have          | amongst, the RRs)   |
   |            | processing to do)              |                     |
   +------------+--------------------------------+---------------------+

   Comparison of orders of magnitude for message processing and state
              maintenance (totals across all equipments)

   The conclusions that can be drawn from the above are that:

   o  in the PIM-based approach, any message will be processed by all
      PEs, including those that are neither upstream nor downstream for
      the message, which results in a total amount of messages to
      process that is in O(#mvpn_PE x #R_PE), i.e., O(#mvpn_PE ^ 2) if
      the proportion of receiver PEs is considered constant when the
      number of PEs increases; the refreshes of Join messages introduce
      a linear factor that does not change the order of magnitude, but
      that can be significant for long-lived streams;

   o  the BGP-based approach requires an amount of message processing
      in O(#R_PE), lower than the PIM-based approach, and which is
      independent of the duration of streams;

   o  state maintenance is of the same order of magnitude for all
      approaches, O(#R_PE), but its distribution is different:

      *  the PIM-based approach fully spreads, and minimizes, the
         amount of state (one state per PE)

      *  the BGP-based procedures spread all the state on the set of
         route reflectors

A.1.2.  ASM Scalability

   The conclusions in Appendix A.1.1 are reused in this section, for
   the parts that are common to the setup and maintenance of states
   related to a source tree or a shared tree.
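   As a rough numerical illustration of the conclusions of
   Appendix A.1.1 that are reused here, the Python sketch below
   evaluates the "total for #R_PE PEs" cells of the "total across all
   equipments" column of the two tables, exactly as they are written
   there.  The function names and the sample values (VPN sizes, half of
   the PEs having receivers, 4 RRs, a one-hour period T) are
   assumptions chosen for illustration only.

```python
def pim_lan_total_me(mvpn_pe, r_pe, period_t, t_pim_r=60.0):
    """PIM LAN procedures: #mvpn_PE x (3 x #R_PE + T/T_PIM_r), in m.e."""
    return mvpn_pe * (3 * r_pe + period_t / t_pim_r)


def bgp_total_me_upper_bound(r_pe, rr):
    """BGP-based procedures: at most 10 x #R_PE + 2 x #RRs, in m.e."""
    return 10 * r_pe + 2 * rr


if __name__ == "__main__":
    # Hypothetical scenario: half of the PEs have receivers, 4 RRs,
    # and the stream lasts one hour (T = 3600 s).
    for mvpn_pe in (10, 100, 1000):
        r_pe = mvpn_pe // 2
        pim = pim_lan_total_me(mvpn_pe, r_pe, period_t=3600)
        bgp = bgp_total_me_upper_bound(r_pe, rr=4)
        print(f"#mvpn_PE={mvpn_pe}: PIM LAN {pim:.0f} m.e, BGP <= {bgp} m.e")
```

   With a constant proportion of receiver PEs, the first figure grows
   quadratically with #mvpn_PE while the second grows linearly, which
   is the difference in orders of magnitude noted above.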
   When PIM-SM is used in a VPN and an ASM multicast group G is joined
   by some PEs (#R_PE PEs), with some sources sending toward this
   multicast group address, we can note the following:

   PEs will generally have to maintain one shared tree, plus one source
   tree for each source sending toward G, each tree resulting in an
   amount of processing and state maintenance similar to what is
   described in the scenario in Appendix A.1.1, with the same
   differences in orders of magnitude between the different approaches
   when the number of PEs is high.

   An exception to this is when, for a given group in a VPN, none of
   the PIM instances in the customer routers and VRFs would switch to
   the SPT (SwitchToSptDesired always false): in that case, the
   processing and state maintenance load is the one required for the
   maintenance of the shared tree only.  It has to be noted that this
   scenario is dependent on customer policy.  To compare the resulting
   load in that case, between PIM-based approaches and the BGP-based
   approach configured to use inter-site shared trees, the scenario in
   Appendix A.1.1 can be used with #R_PE PEs joining a (C-*,C-G) ASM
   group instead of an SSM group, and the same differences in order of
   magnitude remain true.  In the case of the BGP-based approach used
   without inter-site shared trees, we must take into account the load
   resulting from the fact that, to build the C-PIM shared tree, each
   PE has to join the Source Tree to each source; using the notations
   of Appendix A.1.1, this adds an amount of load (total load across
   all equipments) that is proportional to #R_PE and to the number of
   sources.  The order of magnitude with an increasing number of PEs is
   thus unchanged, and the differences in order of magnitude also
   remain the same.
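   The cases described above can be summarized by counting the
   C-multicast trees that a receiver PE takes part in for one ASM
   group, each tree costing roughly what Appendix A.1.1 describes for
   one SSM tree.  The Python sketch below is only a rough model under
   that assumption; the function and parameter names are hypothetical
   and do not correspond to any protocol field.

```python
def trees_per_receiver_pe(n_sources, spt_switch_possible,
                          bgp_without_shared_trees=False):
    """Count the trees a receiver PE maintains for one ASM group."""
    if bgp_without_shared_trees:
        # BGP-based approach without inter-site shared trees: the PE
        # joins the source tree toward each source.
        return n_sources
    if not spt_switch_possible:
        # SwitchToSptDesired always false: shared tree only.
        return 1
    # General case: one shared tree plus one source tree per source.
    return 1 + n_sources
```

   Multiplying this count by the per-tree figures of Appendix A.1.1
   gives a load proportional to the number of sources, without changing
   the dependency on the number of PEs.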
   In addition to the maintenance of trees, PEs have to perform some
   processing and state maintenance related to individual sources
   sending to a multicast group; the related procedures and behaviors
   may differ largely depending on which C-multicast routing protocol
   is used, how it is configured, how multicast source discovery
   mechanisms are used in the customer VPN, and which
   SwitchToSptDesired policy is used.  However, the following can be
   observed:

   o  when BGP-based C-multicast routing is used:

      *  each PE will possibly have to process and maintain a BGP
         Source-Active autodiscovery route for (some or all) sources of
         an ASM group.  The number of Source-Active autodiscovery
         routes will typically be one, but may be related to the number
         of upstream PEs in the following cases: when inter-site shared
         trees are used and simultaneously more than one PE is used as
         the upstream PE for SPT (C-S,C-G) trees, and when inter-site
         shared trees are used and there are multiple PEs that are
         possible upstream for this (S,G).

      *  this results in message processing and state maintenance
         (total across all the equipments) linearly dependent on the
         number of PEs in the VPN (#mvpn_PE) for each source,
         independently of the number of PEs joined to the group.

      *  Depending on whether or not inter-site shared trees are used,
         on the SwitchToSptDesired policy in the PIM instances in the
         customer routers and VRFs, and on the relative locations of
         sources and RPs, this will happen for all (S,G) of an ASM
         group or only for some of them, and will be done in parallel
         to the maintenance of shared and/or source trees or at the
         first join of a PE on a source tree.
   o  when PIM-based C-multicast routing is used, depending on the
      SwitchToSptDesired policy in the PIM instances in the customer
      routers and VRFs, and depending on the relative locations of
      sources and RPs, there are:

      *  possible control plane state transitions triggered by the
         reception of (S,G) packets; such events would induce
         processing on all PEs joined to G

      *  possible PIM Assert messages specific to (S,G); this would
         induce message processing on each PE of the VPN for each PIM
         Assert message

   Given the above, the additional processing that may happen for each
   individual source sending to the group, beyond the maintenance of
   source and shared trees, does not change the orders of magnitude
   identified above.

A.2.  Cost of PEs leaving and joining

   The quantification of message processing in Appendix A.1.1 is based
   on a use case where each PE with receivers has joined and left once.
   Scalability-related conclusions for other patterns of changes of the
   set of receiver-connected PEs can be drawn by considering the cost
   of each approach for "a new PE joining" and "a PE leaving".

   For the "PIM LAN Procedure" approach, in the case of a single SSM or
   SPT tree, the total amount of message processing across all nodes
   depends linearly on the number of PEs in the VPN, when a PE joins
   such a tree.

   For the "BGP-based" approach:

   o  In the case of a single SSM tree, the total amount of message
      processing across all nodes is independent of the number of PEs,
      for "a new PE joining" and "a PE leaving"; it also depends on how
      Route Reflectors are meshed, but not with linear dependency.
   o  In the case of an SPT tree for an ASM group, BGP has additional
      processing due to possible Source-Active autodiscovery routes:

      *  when BGP-based C-multicast routing is used with inter-site
         shared trees, for the first PE joining (and last PE leaving) a
         given SPT, the processing of the corresponding Source-Active
         autodiscovery routes results in a processing cost linearly
         dependent on the number of PEs in the VPN; for subsequent PEs
         joining (and non-last PEs leaving) there is no processing due
         to advertisement or withdrawal of Source-Active autodiscovery
         routes

      *  when BGP-based C-multicast routing is used without inter-site
         shared trees, the processing of Source-Active autodiscovery
         routes for an (S,G) happens independently of PEs joining and
         leaving the SPT for (S,G).

   In the case of a new PE having to join a shared tree for an ASM
   group G, we see the following:

   o  the processing due to the PE joining the shared tree itself is
      the same as the processing required to set up an SSM tree, as
      described before (note that this does not happen when BGP-based
      C-multicast routing is used without inter-site shared trees)

   o  for each source for which the PE joins the SPT, the resulting
      processing cost is the same as for one SPT tree, as described
      before;

      *  the conditions under which a PE will join the SPT for a given
         (C-S,C-G) are the same between the BGP-based with inter-site
         shared tree approach and the PIM-based approach, and depend
         solely on the SwitchToSptDesired policy in the PIM instances
         in the customer routers in the sites connected to the PE
         and/or in the VRF

      *  the conditions under which a PE will join the SPT for a given
         (C-S,C-G) differ between the BGP-based without inter-site
         shared trees approach and the PIM-based approach

      *  the SPT for a given (S,G) can be joined by the PE in the
         following cases:

         +  as soon as one router, or the VPN VRF on the PE, has
            SwitchToSptDesired(S,G) being true

         +  when BGP-based routing is used, and configured to not use
            inter-site shared trees

      *  said differently, the only case where the PE will not join the
         SPT for (S,G) is when all routers in the sites of the VPN
         connected to the PE, or the VPN VRF itself, will never have
         SwitchToSptDesired(S,G) being true, with the additional
         condition, when BGP-based C-multicast routing is used, that
         inter-site shared trees are used

   Thus, when one PE joins a group G to which n sources are sending
   traffic, we note the following with regard to the dependency of the
   cost (in total amount of processing across all equipments) on the
   number of PEs:

   o  in the general case (where any router in the site of the VPN
      connected to the PE, or the VRF itself, may have
      SwitchToSptDesired(S,G) being true):

      *  for the "PIM LAN Procedure" approach, the cost is linearly
         dependent on the number of PEs in the VPN, and linearly
         dependent on the number of sources

      *  for the "BGP-based" approach, the cost is linearly dependent
         on the number of sources, and, in the sub-case of the
         BGP-based approach used with inter-site shared trees, is also
         dependent on the number of PEs in the VPN only if the PE is
         the first to join the group or the SPT for some source sending
         to the group

   o  else, under the assumption that routers in the sites of the VPN
      connected to the PE, and the VPN VRF itself, will never have the
      policy function SwitchToSptDesired(S,G) being possibly true,
      then:

      *  in the case of the PIM-based approach, the cost is linearly
         dependent on the number of PEs in the VPN, and there is no
         dependency on the number of sources

      *  in the case of the BGP-based approach with inter-site shared
         trees, the cost is linearly dependent on the number of RRs,
         and there is no dependency on the number of sources

      *  in the case of the BGP-based approach without inter-site
         shared trees, the cost is linearly dependent on the number of
         RRs and on the number of sources

   Hence, with the PIM-based approach, the overall cost across all
   equipments of any PE joining an ASM group G is always dependent on
   the number of PEs (same for a PE that leaves), while the BGP-based
   approach has a cost independent of the number of PEs (with the
   exception of the first PE joining the ASM group, for the BGP-based
   approach used without inter-site shared trees; in that case there is
   a dependency on the number of PEs).

   On the dependency on the number of sources: without making any
   assumption on the SwitchToSptDesired policy on PIM routers and VRFs
   of a VPN, we see that a PE joining an ASM group may induce a
   processing cost linearly dependent on the number of sources.  Apart
   from this general case, under the condition where
   SwitchToSptDesired is always false on all PIM routers and VRFs of
   the VPN, then with the PIM-based approach, and with the BGP-based
   approach used with inter-site shared trees, the cost in amount of
   messages processed will be independent of the number of sources (it
   has to be noted that this condition depends on customer policy).

Appendix B.  Switching to S-PMSI

   [ the following point was fixed in version 07 of
   [I-D.ietf-l3vpn-2547bis-mcast], and is here for reference only ]

   Section 7.2.2.3 of [I-D.ietf-l3vpn-2547bis-mcast] proposes two
   approaches for how a source PE can decide when to start transmitting
   customer multicast traffic on an S-PMSI:

   1.  The source PE sends the multicast packets on both the I-PMSI
       P-multicast tree and the S-PMSI P-multicast tree simultaneously
       for a pre-configured period of time, letting the receiver PEs
       select the new tree for reception, before switching to only the
       S-PMSI.
   2.  The source PE waits for a pre-configured period of time after
       advertising the entry bound to the S-PMSI before fully switching
       the traffic onto the S-PMSI-bound P-multicast tree.

   The first alternative has essentially two drawbacks:

   o  traffic is sent twice for some period of time, which would appear
      to be at odds with the motivation for switching to an S-PMSI,
      namely optimizing the bandwidth used by the multicast tree for
      that stream.

   o  It is unlikely that the switchover can occur without packet loss
      or duplication if the transit delays of the I-PMSI P-multicast
      tree and the S-PMSI P-multicast tree differ.

   By contrast, the second alternative has none of these drawbacks, and
   satisfies the requirement in section 5.1.3 of [RFC4834], which
   states that "[...] a multicast VPN solution SHOULD as much as
   possible ensure that client multicast traffic packets are neither
   lost nor duplicated, even when changes occur in the way a client
   multicast data stream is carried over the provider network".  The
   second alternative also happens to be the one used in existing
   deployments.

   For these reasons, it is the authors' recommendation to mandate the
   implementation of the second alternative for switching to S-PMSI.

Authors' Addresses

   Thomas Morin (editor)
   France Telecom - Orange Labs
   2 rue Pierre Marzin
   Lannion  22307
   France

   Email: thomas.morin@orange-ftgroup.com


   Ben Niven-Jenkins (editor)
   BT
   208 Callisto House, Adastral Park
   Ipswich, Suffolk  IP5 3RE
   UK

   Email: benjamin.niven-jenkins@bt.com


   Yuji Kamite
   NTT Communications Corporation
   Tokyo Opera City Tower
   3-20-2 Nishi Shinjuku, Shinjuku-ku
   Tokyo  163-1421
   Japan

   Email: y.kamite@ntt.com


   Raymond Zhang
   BT
   2160 E. Grand Ave.
   El Segundo  CA 90025
   USA

   Email: raymond.zhang@bt.com


   Nicolai Leymann
   Deutsche Telekom
   Goslarer Ufer 35
   10589 Berlin
   Germany

   Email: n.leymann@telekom.de


   Nabil Bitar
   Verizon
   40 Sylvan Road
   Waltham, MA  02451
   USA

   Email: nabil.n.bitar@verizon.com