Network Working Group                                      T. Morin, Ed.
Internet-Draft                                     France Telecom Orange
Intended status: Informational                     B. Niven-Jenkins, Ed.
Expires: January 11, 2010                                             BT
                                                               Y. Kamite
                                                      NTT Communications
                                                                R. Zhang
                                                                      BT
                                                              N. Leymann
                                                        Deutsche Telekom
                                                                N. Bitar
                                                                 Verizon
                                                           July 10, 2009


    Mandatory Features in a Layer 3 Multicast BGP/MPLS VPN Solution
                draft-ietf-l3vpn-mvpn-considerations-04

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.  Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.
   It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 11, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info).  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

   More than one set of mechanisms to support multicast in a layer 3 BGP/MPLS VPN has been defined.  These are presented in the documents that define them as optional building blocks.

   To enable interoperability between implementations, this document defines a subset of features that is considered mandatory for a multicast BGP/MPLS VPN implementation.  This will help implementers and deployers understand which L3VPN multicast requirements are best satisfied by each option.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Examining alternative mechanisms for MVPN functions
     3.1.  MVPN auto-discovery
     3.2.  S-PMSI Signaling
     3.3.  PE-PE Transmission of C-Multicast Routing
       3.3.1.  PE-PE signaling scalability
       3.3.2.  P-routers scalability
       3.3.3.  Impact of C-multicast routing on Inter-AS deployments
       3.3.4.  Security and robustness
       3.3.5.  C-multicast VPN join latency
       3.3.6.  Conclusion on C-multicast routing
     3.4.  Encapsulation techniques for P-multicast trees
     3.5.  Inter-AS deployments options
     3.6.  Bidir-PIM support
   4.  Co-located RPs
   5.  Existing deployments
   6.  Summary of recommendations
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. Informative References
   Appendix A.  Scalability of C-multicast routing processing load
     A.1.  PIM LAN procedures, by default
     A.2.  PIM LAN procedures, with explicit tracking
     A.3.  BGP-based
     A.4.  Side by side orders of magnitude comparison
   Appendix B.  Switching to S-PMSI
   Authors' Addresses

1.  Introduction

   Specifications for multicast in BGP/MPLS VPNs [I-D.ietf-l3vpn-2547bis-mcast] include multiple alternative mechanisms for some of the required building blocks of the solution.  However, they do not identify which of these mechanisms are mandatory to implement in order to ensure interoperability.
   Not defining a set of mandatory-to-implement mechanisms leads to a situation where implementations may support different subsets of the available optional mechanisms that do not interoperate, which is a problem for the numerous operators with multi-vendor backbones.

   The aim of this document is to build on the requirements already expressed in [RFC4834] and to study the properties of each approach, in order to identify mechanisms that are good candidates for being part of a core set of mandatory mechanisms that can provide a base for interoperable solutions.

   This document goes through the different building blocks of the solution and concludes which mechanisms an implementation is required to implement.  Section 6 summarizes these requirements.

   Considering the history of the multicast VPN proposals and implementations, it is also useful to discuss how existing deployments of early implementations [I-D.rosen-vpn-mcast] [I-D.raggarwa-l3vpn-2547-mvpn] can be accommodated, and to provide suggestions in this respect.

2.  Terminology

   Please refer to [I-D.ietf-l3vpn-2547bis-mcast] and [RFC4834].

3.  Examining alternative mechanisms for MVPN functions

3.1.  MVPN auto-discovery

   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes two different mechanisms for MVPN auto-discovery:

   1.  BGP-based auto-discovery

   2.  "PIM/shared tree": discovery done through the exchange of PIM Hellos by C-PIM instances, across an MI-PMSI implemented with one shared tree per VPN (using multicast ASM, or MP2MP LDP)

   Both solutions address Section 5.2.10 of [RFC4834], which states that "the operation of a multicast VPN solution SHALL be as light as possible and providing automatic configuration and discovery SHOULD be a priority when designing a multicast VPN solution.  Particularly
Particularly 153 the operational burden of setting up multicast on a PE or for a VR/ 154 VRF SHOULD be as low as possible". 156 The key consideration is that PIM-based discovery is only applicable 157 to deployments using a shared tree to instantiate an MI-PMSI (it 158 cannot be applicable to if only P2P or SSM trees are used, because 159 contrary to ASM and MP2MP, building these P2P or SSM trees cannot 160 happen before the autodiscovery has been done), whereas the BGP-based 161 auto-discovery does not place any constraint on the type of multicast 162 trees that would have to be used. BGP-based auto-discovery is 163 independent of the type of P-multicast tree used thus satisfying the 164 requirement in section 5.2.4.1 of [RFC4834] that "a multicast VPN 165 solution SHOULD be designed so that control and forwarding planes are 166 not interdependent". 168 Additionally, it is to be noted that a number of service providers 169 have chosen to use SSM-based trees for the default MDTs within their 170 current deployments, therefore relying already on some BGP-based 171 auto-discovery. 173 Moreover, when shared P-tunnels are used, the use of BGP auto- 174 discovery would allow inconsistencies in the addresses/identifiers 175 used for the shared trees to be detected (e.g. the same shared tree 176 identifier being used for different VPNs with distinct BGP route 177 targets). This is particularly attractive in the context of inter-AS 178 VPNs where the impact of any misconfiguration could be magnified and 179 where a single service provider may not operate all the ASs. Note 180 that this technique to detect some misconfiguration cases may not be 181 usable during a transition period from a shared-tree autodiscovery to 182 a BGP-based autodiscovery. 184 Thus, the recommendation is that implementation of the BGP-based 185 auto-discovery is mandated and should be supported by all mVPN 186 implementations. 188 3.2. 
S-PMSI Signaling 190 The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes 191 two mechanisms for signaling that multicast flows will be switched to 192 an S-PMSI : 194 1. a UDP-based TLV protocol specifically for S-PMSI signaling 195 (described in section 7.4.2). 197 2. a BGP-based mechanism for S-PMSI signaling (described in section 198 7.4.1). 200 Section 5.2.10 of [RFC4834] states that "as far as possible, the 201 design of a solution SHOULD carefully consider the number of 202 protocols within the core network: if any additional protocols are 203 introduced compared with the unicast VPN service, the balance between 204 their advantage and operational burden SHOULD be examined 205 thoroughly". The UDP-based mechanism would be an additional protocol 206 in the MVPN stack, which isn't the case for the BGP-based S-PMSI 207 switching signaling, since (a) BGP is identified as a requirement for 208 autodiscovery, and (b) the BGP-based S-PMSI switching signaling 209 procedures are very similar to the autodiscovery procedures. 211 Furthermore, the BGP-based S-PMSI switching signaling mechanism can 212 be used within MVPNs using either a UI-PMSI or a MI-PMSI while the 213 UDP-based protocol is restricted to use within MVPNs using an MI- 214 PMSI. In practice, this means that, except if shared trees are used, 215 a PE will have to join to all trees of all PEs in a VPN, while in the 216 alternative where BGP-based S-PMSI switching signaling is used, it 217 could delay joining a tree from a PE until traffic from that PE is 218 needed, thus reducing the amount of state maintained on P routers. 220 S-PMSI switching signaling approaches can also be compared in an 221 inter-AS context (see Section 3.5). The proposed BGP-based approach 222 for S-PMSI switching signaling provides a good fit with both the 223 segmented and non-segmented inter-AS approaches (seeSection 3.5). 
By 224 contrast the UDP-based approach for S-PMSI switching signaling 225 appears to be usable with segmented inter-AS tunnels, but in that 226 case key advantages of the segmented approach are lost : 228 o there is no more an independence of ASes to choose when S-PMSIs 229 tunnels will be triggered in their AS (and thus control the amount 230 of state created on their P routers), and with which tunneling 231 technique they will be built 233 o in an inter-AS option B context, an isolation of ASes is obtained 234 as PEs don't have visibility of, nor exchange with, PEs of other 235 ASes. This property can be preserved if the segmented inter-AS 236 approach and BGP-based S-PMSI switching signaling are used, but it 237 is not preserved if UDP-based switching signaling is used. 239 Given all the above, it is the recommendation of the authors that BGP 240 is the preferred solution for S-PMSI switching signaling and should 241 be supported by all implementations. 243 It is identified that, if nothing prevents a fast-paced creation of 244 S-PMSI, then S-PMSI switching signaling with BGP would possibly 245 impact the Route Reflectors used for mVPN routes. However is it also 246 identified that such a fast-paced behavior would have an impact on P 247 and PE routers resulting from S-PMSI tunnels signaling, which will be 248 the same independently of the S-PMSI signaling approach that is used, 249 and which it is certainly best to avoid by setting up proper 250 mechanisms. 252 The UDP-based S-PMSI switching signaling protocol can also be 253 considered, as an option, given that this protocol has been in 254 deployment for some time. Implementations supporting both protocols 255 would be expected to provide a per-VRF configuration knob to allow an 256 implementation to use the UDP-based TLV protocol for S-PMSI switching 257 signaling for specific VRFs in order to support the coexistence of 258 both protocols (for example during migration scenarios). 
Apart from 259 such migration-facilitating mechanisms, the authors specifically do 260 not recommend extending the already proposed UDP-based TLV protocol 261 to new types of P-multicast trees. 263 3.3. PE-PE Transmission of C-Multicast Routing 265 The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes 266 multiple mechanisms for PE-PE transmission of customer multicast 267 routing information: 269 1. Full per-MVPN PIM peering across an MI-PMSI (described in section 270 3.4.1.1). 272 2. Lightweight PIM peering across an MI-PMSI (described in section 273 3.4.1.2) 275 3. The unicasting of PIM C-Join/Prune messages (described in section 276 3.4.1.3) 278 4. The use of BGP for carrying C-Multicast routing (described in 279 section 3.4.2). 281 3.3.1. PE-PE signaling scalability 283 Scalability being one of the core requirements for multicast VPN, it 284 is useful to compare the proposed C-multicast routing mechanisms from 285 this perspective : Section 4.2.4 of [RFC4834] recommends that "a 286 multicast VPN solution SHOULD support several hundreds of PEs per 287 multicast VPN, and MAY usefully scale up to thousands" and section 288 4.2.5 states that "a solution SHOULD scale up to thousands of PEs 289 having multicast service enabled". 291 Scalability with an increased number of VPNs per PE, or with an 292 increased number of multicast state per VPN, are also important, but 293 are not focused on in this section since we didn't identify 294 differences between the different approaches for these matters : all 295 others things equal, the load on PE due to C-multicast routing 296 increases roughly linearly with the number of VPNs per PE, and with 297 the number of multicast state per VPN. 
   This section presents conclusions related to PE-PE signaling scalability, based on Appendix A, which explains in more detail how the PIM-based approaches and the BGP-based approach differ in handling the C-multicast routing load, and provides a quantified evaluation of the amount of state and messages with the different approaches.  Many points made in this section are based on the conclusions in Appendix A.4.

   At high scales of multicast deployment, the first and third mechanisms require the PEs to maintain a large number of PIM adjacencies with other PEs of the same multicast VPN (which implies the regular exchange of PIM Hellos between them) and to refresh C-Join/Prune states, resulting in a processing cost that increases with the number of PEs (as detailed in Appendix A).  The second approach is less subject to this cost, and the fourth approach is not subject to it.

   The third mechanism would reduce the amount of C-Join/Prune processing for a given multicast flow for PEs that are not the upstream neighbor for this flow, but would require "explicit tracking" state to be maintained by the upstream PE.  It is also not compatible with the "Join suppression" mechanism.  A possible way to reduce the amount of signaling with this approach would be the use of a PIM refresh-reduction mechanism.  Such a mechanism, based on TCP, is being specified by the PIM IETF Working Group [I-D.ietf-pim-port]; its use in a multicast VPN context has not been described in [I-D.ietf-l3vpn-2547bis-mcast], but it is expected that this approach would provide scalability similar to that of the BGP-based approach without RRs.
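   To give a rough feel for why adjacency and refresh maintenance grow with the number of PEs, the sketch below counts the periodic PIM messages one PE has to process per minute in a single MVPN using full per-MVPN PIM peering.  It is a toy model, not the model of Appendix A.  It assumes a full mesh of PIM adjacencies over the MI-PMSI, that every PE receives every Hello and Join/Prune sent by the other PEs, that each PE refreshes the same number of joined streams, and that Join suppression is not in effect.  The timer values (30-second Hello period, 60-second Join/Prune period) are assumed to be the usual PIM-SM defaults, and all names are hypothetical.

```python
from dataclasses import dataclass

# Assumed timer values in seconds: the usual PIM-SM defaults for the
# Hello period and the periodic Join/Prune interval.  Deployments may
# tune both.
HELLO_PERIOD_S = 30.0
JOIN_PRUNE_PERIOD_S = 60.0


@dataclass
class PeriodicLoad:
    """Periodic PIM messages processed by one PE, per minute, in one MVPN."""
    hellos_per_min: float
    join_entries_per_min: float

    @property
    def total_per_min(self) -> float:
        return self.hellos_per_min + self.join_entries_per_min


def full_pim_peering_load(n_pes: int, streams_per_pe: int,
                          hello_period_s: float = HELLO_PERIOD_S,
                          jp_period_s: float = JOIN_PRUNE_PERIOD_S) -> PeriodicLoad:
    """Toy model of the periodic load of full per-MVPN PIM peering.

    Illustrative assumptions: every PE has a PIM adjacency with every other
    PE of the MVPN, receives all Hellos and Join/Prune messages they send,
    each PE refreshes streams_per_pe joined streams, and Join suppression
    is not in effect.
    """
    if n_pes < 1 or streams_per_pe < 0:
        raise ValueError("need n_pes >= 1 and streams_per_pe >= 0")
    peers = n_pes - 1
    hellos = peers * 60.0 / hello_period_s
    join_entries = peers * streams_per_pe * 60.0 / jp_period_s
    return PeriodicLoad(hellos, join_entries)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        load = full_pim_peering_load(n, streams_per_pe=5)
        print(f"{n:5d} PEs: {load.hellos_per_min:7.0f} Hellos/min, "
              f"{load.join_entries_per_min:7.0f} Join entries/min")
```

   Under these assumptions both terms grow linearly with the number of PEs in the MVPN, which is the growth described above for the first mechanism; real figures depend on timer tuning, message packing, and Join suppression.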
   The second mechanism would operate in a similar manner to full per-MVPN PIM peering, except that PIM Hello messages are not transmitted and PIM C-Join/Prune refresh-reduction would be used, thereby improving scalability; however, this approach has yet to be fully described.  In any case, it seems to improve only one of the factors that affect scalability as the number of PEs increases.

   The first and second mechanisms can leverage the "Join suppression" behavior and thus reduce the processing burden of an upstream PE, sparing it the processing of a Join refresh message for each remote PE joined to a multicast stream.  This improvement requires all PEs of a multicast VPN to process all PIM Join and Prune messages sent by any other PE participating in the same multicast VPN, whether or not they are the upstream PE.

   The fourth mechanism (the use of BGP for carrying C-Multicast routing) would have a comparable drawback of requiring all PEs to process a BGP C-multicast route that is of interest only to a specific upstream PE.  To avoid this, the BGP-based C-multicast routing approach can leverage the Route Target constraint mechanism, which specifically allows only the interested upstream PE to receive a BGP C-multicast route.  When RT constraint is used, the fourth mechanism reduces the total message processing load placed on the PEs for customer multicast routing to the minimum (by avoiding any processing by "unrelated" PEs, i.e., PEs that are neither the joining PE nor the upstream PE, and by avoiding the use of refreshes), and it inherits BGP features that are expected to improve scalability (for instance, a means to offload some of the processing burden associated with customer multicast routing onto one or more BGP route reflectors).  This
This 359 advantage has a cost (the maintenance of a amount of state linear 360 with the number of PEs joined to a stream), but when route reflectors 361 are used, this cost is spread among the route reflectors. 363 However, the fourth mechanism is specific in that it offers the 364 possibility of offloading customer multicast routing processing onto 365 one or more BGP Route Reflector(s). When this is used, there is a 366 drawback of increasing the processing load placed on the route 367 reflector infrastructure. In the higher scale scenarios, it may be 368 required to adapt the route reflector infrastructure to the mVPN 369 routing load by using, for example: 371 o a separation of resources for unicast and multicast VPN routing : 372 using dedicated mVPN Route Reflector(s) (or using dedicated mVPN 373 BGP sessions or dedicated mVPN BGP instances) ; 375 o the deployment of additional route reflector resources, for 376 example increasing the processing resources on existing route 377 reflectors or deployment of additional route reflectors. 379 Among the above, the most straightforward approach is to consider the 380 introduction of route reflectors dedicated to the mVPN service and 381 dimension them accordingly to the need of that service (but doing so 382 is not required and is left as an operator engineering decision). 384 3.3.2. P-routers scalability 386 Mechanisms (1) and (2) are restricted to use within multicast VPNs 387 that use an MI-PMSI, thereby necessitating: 389 the use of a P-multicast tree technique that allows shared trees 390 (for example PIM-SM in ASM mode or MP2MP LDP) 392 or the use of one P-multicast tree per PE per VPN, even for PEs 393 that do not have sources in their directly attached sites for that 394 VPN. 
   By comparison, the fourth mechanism imposes neither of these restrictions, and when P2MP trees are used it only requires one tree per VPN per PE attached to a site with a multicast source or RP (or with a candidate BSR, if BSR is used).

   In cases where fewer PEs are connected to sources than the total number of PEs, this reduces the amount of state maintained by P-routers compared to the amount required to build an MI-PMSI with P2MP trees.  Such cases are expected to be frequent in multicast VPN deployments (see Section 4.2.4.1 of [RFC4834]).

3.3.3.  Impact of C-multicast routing on Inter-AS deployments

   Furthermore, co-existence with unicast inter-AS VPN options, and an equal level of security for multicast and unicast, including in an inter-AS context, are specifically mentioned in Sections 5.2.6, 5.2.8, and 5.2.12 of [RFC4834].

   In an inter-AS option B context, isolation of ASes is obtained as PEs have no visibility of, and do not exchange messages with, PEs of other ASes.  This property can be preserved if the segmented inter-AS approach and BGP-based C-multicast routing are used, but it is not preserved if PIM-based signaling is used.

   By comparison, the fourth option (the use of BGP for carrying C-Multicast routing) does not have any of the above limitations related to inter-AS deployments.

   Additionally, the authors note that the proposed BGP-based approach for C-multicast routing provides a good fit with both the segmented and non-segmented inter-AS approaches.  By contrast, although PIM-based C-multicast routing is usable with segmented inter-AS trees, the inter-AS scalability advantage of the segmented approach is lost, since PEs in an AS will see the C-multicast routing activity of all PEs of all other ASes.

3.3.4.  Security and robustness

   BGP supports MD5 authentication of its peers for additional security, which can directly benefit multicast VPN customer multicast routing, whether for intra-AS or inter-AS communications.  By contrast, with a PIM-based approach, no mechanism providing a comparable level of security to authenticate communications between remote PEs has yet been fully described [I-D.ietf-pim-sm-linklocal], and in any case such a mechanism would require significant additional operational effort from the provider to be usable in a multicast VPN context.

   The robustness of the infrastructure, especially the existing infrastructure providing unicast VPN connectivity, is key.  The C-multicast routing function, especially under load, will compete with the unicast routing infrastructure.  With the PIM-based approaches, the unicast and multicast VPN routing functions are expected to compete only in the PE, for control plane processing resources.  With the BGP-based approach, they will compete for processing resources both on the PE and in the route reflectors (assuming route reflectors are used for mVPN routing).  In both cases, mechanisms will be required to arbitrate resources (e.g., processing priorities): with the PIM-based procedures, between the different control plane routing instances in the PE; with the BGP-based approach, this is likely to require distinct BGP sessions for multicast and unicast (e.g., through the use of dedicated mVPN BGP route reflectors, or of a distinct session with an existing route reflector).

   Multicast routing is dynamic by nature, and multicast VPN routing has to follow the multicast routing events of VPN customers.  The different approaches can be compared on how they are expected to behave in scenarios where multicast routing in the VPNs is subject to intense activity.
   Scalability of each approach under such a load is detailed in Appendix A: the fourth approach (BGP-based) is the only one with an O(1) cost for join/leave operations, and the only one with which state maintenance is not concentrated on the upstream PE.

   On the other hand, under a load greater than the available processing resources, the BGP-based approach is likely to slow down (because of possibly congested TCP sockets), whereas the PIM-based approaches would react by dropping messages, with failure recovery obtained through message refreshes.  Thus, the BGP-based approach could result in a degradation of join/leave latency typically spread evenly across all multicast streams being joined in that period, while the PIM-based approaches could increase join/leave latency for some random streams by a multiple of the time between refreshes (e.g., tens of seconds), and in some cases adjacencies may time out, resulting in the disruption of multicast streams.

   The behavior of the PIM-based approaches under such a load is also harder to predict, given that the performance of the "Join suppression" mechanism (an important mechanism for these approaches to scale) will itself be impeded by delays in Join processing.  For these reasons, the BGP-based approach would be able to provide a smoother degradation and more predictable behavior under a highly dynamic load.

   In fact, both an "evenly spread degradation" and an "unevenly spread larger degradation" can be problematic; what seems important is the ability of the VPN backbone operator to (a) limit the amount of multicast routing activity that can be triggered by a multicast VPN customer, and (b) provide the best possible independence between distinct VPNs.
   It seems that both of these can be addressed through local implementation improvements, and that both the BGP-based and PIM-based approaches could be engineered to provide (a) and (b).  Note, however, that the BGP-based approach already proposes ways to dampen C-multicast route withdrawals and/or advertisements, and thus already describes a way to provide (a), while nothing comparable has yet been described for the PIM-based approaches (even though it does not appear difficult).  The PIM-based approaches rely on a per-VPN data plane to carry the mVPN control plane, and thus may benefit from this first level of separation to address (b).

3.3.5.  C-multicast VPN join latency

   Section 5.1.3 of [RFC4834] states that "the group join delay [...] is also considered one important QoS parameter.  It is thus RECOMMENDED that a multicast VPN solution be designed appropriately in this regard".  In a multicast VPN context, the "group join delay" of interest is the time between a CE sending a PIM Join to its PE and the first packet of the corresponding multicast stream being received by the CE.

   Note that the C-multicast routing procedures will only impact the group join latency of a given multicast stream for the first receiver located across the provider backbone from the PE connected to the multicast source (or for the first receivers, in the specific case where a UMH selection algorithm is used that allows distinct UMHs to be selected by distinct downstream PEs).
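   As a measurement aid, the following sketch computes the group join delay, as defined above, from timestamped events observed at one CE: the time from the first PIM Join the CE sends for a stream to the first packet of that stream it receives afterwards.  The event record format (a stream identifier, an event kind, a time in seconds) and all names are hypothetical; a real benchmark would take its timestamps from its own capture tooling.  When the goal is to assess the C-multicast routing procedures, only joins from the first receivers described in the previous paragraph are relevant samples.

```python
from typing import Iterable, NamedTuple

Stream = tuple[str, str]  # hypothetical (C-S, C-G) identifier


class CeEvent(NamedTuple):
    stream: Stream
    kind: str        # "join_sent" (CE -> PE) or "first_packet" (PE -> CE)
    time_s: float


def group_join_delays(events: Iterable[CeEvent]) -> dict[Stream, float]:
    """Group join delay per stream, as observed at one CE.

    Packets seen before any Join for their stream are ignored, and streams
    without a matching first packet are left out of the result.
    """
    join_time: dict[Stream, float] = {}
    delays: dict[Stream, float] = {}
    for ev in sorted(events, key=lambda e: e.time_s):
        if ev.kind == "join_sent":
            join_time.setdefault(ev.stream, ev.time_s)
        elif ev.kind == "first_packet" and ev.stream in join_time:
            delays.setdefault(ev.stream, ev.time_s - join_time[ev.stream])
    return delays


if __name__ == "__main__":
    events = [
        CeEvent(("C-S1", "C-G1"), "join_sent", 10.0),
        CeEvent(("C-S1", "C-G1"), "first_packet", 10.25),
        CeEvent(("C-S2", "C-G2"), "join_sent", 11.0),  # no traffic yet
    ]
    print(group_join_delays(events))  # {('C-S1', 'C-G1'): 0.25}
```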
   The different approaches proposed seem to have different characteristics in how they are expected to impact join latency:

   o  the PIM-based approaches minimize the number of control plane processing hops between a new receiver-connected PE and the source-connected PE and, being datagram-based, introduce minimal delay, thereby possibly achieving the best possible join latency, depending on implementation efficiency;

   o  under degraded conditions (packet loss, congestion, high control plane load), the PIM-based approaches may impact the latency for a given multicast stream in an all-or-nothing manner: if a C-multicast routing PIM Join packet is lost, latency can become high (a multiple of the PIM Join refresh period);

   o  the BGP-based approach uses TCP exchanges, which may introduce an additional delay depending on the BGP and TCP implementations, but which would typically result, under degraded conditions (such as packet loss, congestion, or high control plane load), in a comparatively smaller increase in latency, spread more evenly across the streams;

   o  as shown in Appendix A, the BGP-based approach is particular in that it removes load from all the PEs (without putting this load on the upstream PE for a stream); this reduction of the background load can improve performance when a PE acts as the upstream PE for a stream, and thus benefit join latency.

   This qualitative comparison shows that the BGP-based approach is designed for a smoother degradation of latency under degraded conditions such as packet loss, congestion, or high control plane load.  On the other hand, the PIM-based approaches seem structurally able to reach a shorter "best-case" group join latency (especially compared to deployments of the BGP-based approach where route reflectors are used).
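   The all-or-nothing behavior attributed above to the PIM-based approaches can be illustrated with a simple probabilistic model.  This sketch assumes, for illustration only, that each C-multicast PIM Join attempt is lost independently with probability p, that each loss delays the join by exactly one refresh period T (a pessimistic simplification, since the next periodic Join may come sooner), and that a delivered Join yields a fixed best-case latency.  The number of lost attempts is then geometrically distributed, so the expected extra latency is T * p / (1 - p).  The model says nothing about the BGP-based approach, whose TCP retransmissions follow different dynamics, and the function names and figures are hypothetical.

```python
def pim_join_latency_distribution(base_s: float, refresh_s: float,
                                  loss_p: float, max_losses: int = 10
                                  ) -> list[tuple[float, float]]:
    """(latency_s, probability) pairs for 0..max_losses lost PIM Joins.

    The tail beyond max_losses is omitted, so the probabilities sum to
    1 - loss_p ** (max_losses + 1).
    """
    if not 0.0 <= loss_p < 1.0:
        raise ValueError("loss_p must be in [0, 1)")
    return [(base_s + k * refresh_s, loss_p ** k * (1.0 - loss_p))
            for k in range(max_losses + 1)]


def expected_pim_join_latency(base_s: float, refresh_s: float,
                              loss_p: float) -> float:
    # The number of lost attempts is geometric with mean p / (1 - p).
    return base_s + refresh_s * loss_p / (1.0 - loss_p)


if __name__ == "__main__":
    # Hypothetical figures: 50 ms best case, 60 s refresh period, 1% loss.
    for latency, prob in pim_join_latency_distribution(0.05, 60.0, 0.01, 2):
        print(f"{latency:7.2f} s with probability {prob:.4%}")
    print(f"mean: {expected_pim_join_latency(0.05, 60.0, 0.01):.3f} s")
```

   Even with a small loss rate, the distribution has a long tail measured in refresh periods, which is the "uneven, larger degradation" contrasted above with the more evenly spread slowdown expected from the BGP-based approach.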
   A quantitative comparison of latencies is not possible without referring to specific implementations and benchmarking procedures, and could lead to different conclusions, especially for best-case group join latency, whose performance is expected to vary with the PIM and BGP implementations.  Note also that improving a BGP implementation to reduce route processing latency would benefit not only multicast VPN group join latency but all BGP-based routing, which means that the need for good BGP/RR performance is not specific to multicast VPN routing.

   Last, C-multicast join latency will be impacted by the overall load placed on the control plane, so the scalability of the C-multicast routing approach also has to be taken into account.  As explained in Section 3.3.1 and Appendix A, the BGP-based approach will provide the best scalability with an increased number of PEs per VPN, thereby benefiting group join latency in such higher-scale scenarios.

3.3.6.  Conclusion on C-multicast routing

   The first and fourth approaches are relevant contenders for C-multicast routing.  Comparisons from a theoretical standpoint identify some advantages of the fourth approach, but possible drawbacks are also identified for this approach.  Comparisons from a practical standpoint are harder to make, since only limited deployment and implementation information is available for the fourth approach; by default, advantages would be seen in the first approach, which has been applied in multiple deployments and shown to be operationally viable.

   Moreover, the first mechanism (full per-MVPN PIM peering across an MI-PMSI) is the mechanism used by [I-D.rosen-vpn-mcast], and it is therefore deployed and operating in MVPNs today.
The fourth approach may or may not end up being preferred for a given deployment, but because the first approach has been deployed for some time, supporting this mechanism will in any case help facilitate an eventual migration from a deployment using a mechanism close to the first approach.

Consequently, at the present time, implementations are recommended to support both the fourth (BGP-based) and first (full per-MVPN PIM peering) mechanisms. Further experience with deployments of the fourth approach is needed before best practices can be defined.

3.4.  Encapsulation techniques for P-multicast trees

In this section the authors do not make any restrictive recommendations, since the appropriateness of a specific provider core data plane technology will depend on a large number of factors, many of which are service provider specific (for example, the service provider's currently deployed unicast data plane).

However, implementations should not unreasonably restrict the data plane technology that can be used, and should not force the use of the same technology for different VPNs attached to a single PE. Initial implementations may only support a reduced set of encapsulation techniques and data plane technologies, but this should not be a limiting factor that hinders future support for other encapsulation techniques, data plane technologies, or interoperability.

Section 5.2.4.1 of [RFC4834] states "In a multicast VPN solution extending a unicast L3 PPVPN solution, consistency in the tunneling technology has to be favored: such a solution SHOULD allow the use of the same tunneling technology for multicast as for unicast. Deployment consistency, ease of operation and potential migrations are the main motivations behind this requirement."
Current unicast VPN deployments use LDP, RSVP-TE, or GRE/IP-Multicast to encapsulate customer packets for transport across the provider core. In order to allow the same encapsulations to be used for unicast and multicast VPN traffic, multicast VPN standards should recommend that implementations support, for multicast VPNs, all the P2MP variants of the encapsulations and signaling protocols that they support for unicast and for which a multipoint extension is defined, such as mLDP, P2MP RSVP-TE, and GRE/IP-multicast.

All three of the above encapsulation techniques support the building of P2MP multicast trees. In addition, mLDP and GRE/IP-ASM-Multicast implementations may also support the building of MP2MP multicast trees. The use of MP2MP trees may provide some scaling benefits to the service provider, as only a single MP2MP tree needs to be deployed per VPN, thus reducing by an order of magnitude the amount of multicast state that needs to be maintained by P routers. This gain in state comes at the expense of bandwidth optimization, since sites that have no receivers for multicast streams sourced behind a given PE will still receive packets of such streams, leading to non-optimal bandwidth utilization across the VPN core. Also note that the use of MP2MP multicast trees requires additional configuration to define the same tree identifier or multicast ASM group address on all PEs (it has been noted that some auto-configuration could be possible for MP2MP trees, but this is not currently supported by the auto-discovery procedures).
[ It has been noted that C-multicast routing schemes not covered in [I-D.ietf-l3vpn-2547bis-mcast] could exhibit different advantages of MP2MP multicast trees; this is out of the scope of this document. ]

MVPN services can also be supported over a unicast VPN core through ingress PE replication, whereby the ingress PE replicates any multicast traffic over the P2P tunnels used to support unicast traffic. This option requires neither modifying the existing P routers (in terms of protocol support) nor maintaining multicast-specific state on the P routers in order to deploy a multicast VPN service. However, ingress PE replication obviously leads to non-optimal bandwidth utilization, and it is therefore unlikely to be the long-term solution chosen by service providers. Ingress PE replication may nevertheless be useful during some migration scenarios, or where a service provider considers the level of multicast traffic on its network too low to justify deploying multicast-specific support within its VPN core.

All proposed approaches for the control plane and data plane can be used to provide aggregation amongst multicast groups within a VPN and amongst different multicast VPNs, and thus potentially reduce the amount of state to be maintained by P routers. However, the latter (aggregation amongst different multicast VPNs) requires support for upstream-assigned labels on the PEs. Support for upstream-assigned labels may require changes to the data plane processing of the PEs, and service providers considering the use of aggregate S-PMSI tunnels should take this into consideration for the specific platforms they have deployed.

3.5.  Inter-AS deployment options

There are a number of scenarios that lead to the requirement for inter-AS multicast VPNs, including:

1.  a service provider may have a large network that it has segmented into a number of ASs;

2.  a service provider's multicast VPN may consist of a number of ASs due to acquisitions and mergers with other service providers;

3.  a service provider may wish to interconnect its multicast VPN platform with that of another service provider.

The first scenario can be considered the "simplest", because the network is wholly managed by a single service provider under a single strategy and is therefore likely to use a consistent set of technologies across all ASs.

The second scenario may be more complex than the first, because the strategy and technology choices made for each AS may have differed due to their different histories, and the service provider may not have unified (or may be unwilling to unify) the strategy and technology choices across the ASs.

The third scenario is the most complex because, in addition to the complexity of the second scenario, the ASs are managed by different service providers and may therefore be subject to a different trust model than in the other scenarios.

Section 5.2.6 of [RFC4834] states that "a solution MUST support inter-AS multicast VPNs, and SHOULD support inter-provider multicast VPNs", "considerations about coexistence with unicast inter-AS VPN Options A, B and C (as described in section 10 of [RFC4364]) are strongly encouraged" and "a multicast VPN solution SHOULD provide inter-AS mechanisms requiring the least possible coordination between providers, and keep the need for detailed knowledge of providers' networks to a minimum - all this being in comparison with corresponding unicast VPN options".
Section 8 of [I-D.ietf-l3vpn-2547bis-mcast] addresses these requirements by proposing two approaches for MVPN inter-AS deployments:

1.  Non-segmented inter-AS tunnels, where the multicast tunnels are end-to-end across ASs: even though the PEs belonging to a given MVPN may be in different ASs, the ASBRs play no special role and function merely as P routers (described in Section 8.1).

2.  Segmented inter-AS tunnels, where each AS constructs its own separate multicast tunnels, which are then "stitched" together by the ASBRs (described in Section 8.2).

Section 5.2.6 of [RFC4834] also states "Within each service provider the service provider SHOULD be able on its own to pick the most appropriate tunneling mechanism to carry (multicast) traffic among PEs (just like what is done today for unicast)". The segmented approach is the only one capable of meeting this requirement.

The segmented inter-AS solution would appear to offer the largest degree of deployment flexibility to operators. However, the non-segmented inter-AS solution can simplify deployment in a restricted number of scenarios, and [I-D.rosen-vpn-mcast] only supports the non-segmented inter-AS solution; the non-segmented inter-AS solution is therefore likely to be useful to some operators for backward compatibility and during migration from [I-D.rosen-vpn-mcast] to [I-D.ietf-l3vpn-2547bis-mcast].

The applicability of segmented or non-segmented inter-AS tunnels to a given deployment or inter-provider interconnect will depend on a number of factors specific to each service provider. However, due to the additional deployment flexibility offered by segmented inter-AS tunnels, the authors recommend that all implementations support the segmented inter-AS model.
Additionally, the authors recommend that implementations consider supporting the non-segmented inter-AS model, in order to facilitate co-existence with existing deployments and as a feature providing lighter engineering in a restricted set of scenarios, although it is recognized that initial implementations may only support one or the other.

3.6.  Bidir-PIM support

In Bidir-PIM, the packet forwarding rules have been improved over PIM-SM, allowing traffic to be passed up the shared tree toward the RP Address (RPA). To avoid multicast packet looping, Bidir-PIM uses a mechanism called designated forwarder (DF) election, which establishes a loop-free tree rooted at the RPA. This method ensures that only one copy of every packet will be sent to an RPA, even if there are parallel equal-cost paths to the RPA. To avoid loops, the DF election process enforces a consistent view of the DF on all routers on a network segment, and traffic forwarding is suspended during periods of ambiguity or routing convergence.

In the context of a multicast VPN solution, Bidir-PIM support must preserve this loop-avoidance property, including in the case where the mVRFs of a given MVPN do not have a consistent view of the routing to the C-RPL/C-RPA.

The current MVPN specification [I-D.ietf-l3vpn-2547bis-mcast], in Section 11, defines three methods to support Bidir-PIM, as RECOMMENDED in [RFC4834]:

1.  Standard DF election procedure over an MI-PMSI

2.  VPN Backbone as the RPL (Section 11.1)

3.  Partitioned Sets of PEs (Section 11.2)

Method (1) is naturally applied to deployments using "Full per-MVPN PIM peering across an MI-PMSI" for C-multicast routing but, as indicated in Section 11 of [I-D.ietf-l3vpn-2547bis-mcast], DF election may not work well in an MVPN environment, and an alternative to DF election would be desirable.

The advantage of methods (2) and (3) is that they do not require running the DF election procedure among PEs.

Method (2) leverages the fact that, in Bidir-PIM, running the DF election procedure is not needed on the RPL. This approach thus has the benefit of simplicity of implementation, especially in a context where BGP-based C-multicast routing is used. However, it has the drawback of putting constraints on how Bidir-PIM is deployed, which may not always match MVPN customers' requirements.

Method (3) treats an MVPN as a collection of sets of multicast VRFs: all PEs in a set have the same reachability information towards the C-RPA, distinct from that of PEs in other sets. Hence, with this method, C-Bidir packet loops in an MVPN are resolved by partitioning the VPN into disjoint sets of VRFs, each having a distinct view of the converged network. The partitioning approach to Bidir-PIM requires either upstream-assigned MPLS labels (to denote the partition) or a unique MP2MP LSP per partition. The former is based on PE Distinguisher Labels, which have to be distributed using auto-discovery BGP routes and whose handling requires support for upstream-assigned labels and context label lookups [ref]. The latter, using an MP2MP LSP per partition, does not have these constraints but is restricted to P-tunnel types supporting MP2MP connectivity (such as mLDP [ref]).
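The partitioning idea behind method (3) can be sketched as follows. This is a simplification for illustration, not the procedure of [I-D.ietf-l3vpn-2547bis-mcast]: it assumes that a partition is identified by the upstream PE that its members have selected toward the C-RPA (standing in for the PE Distinguisher Label or the per-partition MP2MP LSP), and the PE names and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def partition_pes(upstream_toward_c_rpa: Dict[str, str]) -> Dict[str, List[str]]:
    """Group PEs into partitions keyed by the upstream PE each one selected
    toward the C-RPA, i.e. by its current view of the routing to the C-RPA."""
    partitions: Dict[str, List[str]] = defaultdict(list)
    for pe, upstream in sorted(upstream_toward_c_rpa.items()):
        partitions[upstream].append(pe)
    return dict(partitions)


def accepting_pes(packet_partition: str,
                  upstream_toward_c_rpa: Dict[str, str]) -> List[str]:
    """PEs that accept a C-Bidir packet tagged with `packet_partition`.
    PEs in other partitions discard it, so a packet cannot loop between
    sets of PEs that hold inconsistent views of the C-RPA."""
    return sorted(pe for pe, up in upstream_toward_c_rpa.items()
                  if up == packet_partition)


if __name__ == "__main__":
    # During convergence, PE3 has picked a different upstream PE than PE1/PE2.
    view = {"PE1": "PE9", "PE2": "PE9", "PE3": "PE8"}
    print(partition_pes(view))         # {'PE9': ['PE1', 'PE2'], 'PE8': ['PE3']}
    print(accepting_pes("PE9", view))  # ['PE1', 'PE2']
```

The design point illustrated is that no DF election among PEs is needed: disagreement about the C-RPA simply places PEs in different partitions rather than creating a loop.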
This approach to C-Bidir can work with PIM-based or BGP-based C-multicast routing procedures, and is also generic in the sense that it does not impose any requirements on the Bidir-PIM service offering.

Given the above considerations, method (3) "Partitioned Sets of PEs" is the RECOMMENDED approach.

In the event that method (3) is not applicable (lack of support for upstream-assigned labels or for a P-tunnel type providing MP2MP connectivity), methods (1) "Standard DF election procedure over an MI-PMSI" and (2) "VPN Backbone as the RPL" are RECOMMENDED as interim solutions. Method (1) has the advantage over (2) of not putting constraints on how Bidir-PIM is deployed, and the drawbacks of only being applicable when PIM-based C-multicast routing is used and of possibly not working well in an MVPN environment.

4.  Co-located RPs

Section 5.1.10.1 of [RFC4834] states "In the case of PIM-SM in ASM mode, engineering of the RP function requires the deployment of specific protocols and associated configurations. A service provider may offer to manage customers' multicast protocol operation on their behalf. This implies that it is necessary to consider cases where a customer's RPs are out-sourced (e.g. on PEs). Consequently, a VPN solution MAY support the hosting of the RP function in a VR or VRF."

However, customers who have already deployed multicast within their networks, and have therefore already deployed their own internal RPs, are often reluctant to hand over control of their RPs to their service provider and use a co-located RP model. Moreover, providing RP co-location on a PE requires the activation of MSDP or the processing of PIM Registers on the PE. Securing the PE routers for such activity requires special care and additional work, and will likely rely on specific features provided by the routers themselves.
The applicability of the co-located RP model to a given MVPN will thus depend on a number of factors specific to each customer and service provider.

It is therefore recommended that implementations support a co-located RP model, but that support for a co-located RP model within an implementation not restrict deployments to using a co-located RP model: implementations MUST support deployments where activation of a PIM RP function (PIM Register processing and RP-specific PIM procedures) or of a VRF MSDP instance is not required on any PE router, and where all the RPs are deployed within the customers' networks or CEs.

5.  Existing deployments

Some suggestions provided in this document can be used to incrementally modify currently deployed implementations without hindering these deployments, and without hindering the consistency of the standardized solution, by providing optional per-VRF configuration knobs that support modes of operation compatible with currently deployed implementations, while at the same time using the recommended approach on implementations supporting the standard.

In cases where this cannot easily be achieved, a recommended approach would be to provide a per-VRF configuration knob that allows incremental, per-VPN migration of the mechanisms used by a PE device; this would allow migration with some per-VPN interruption of service (e.g. during a maintenance window).

Mechanisms allowing "live" migration, by providing concurrent use of multiple alternatives for a given PE and a given VPN, are not seen as a priority, considering the implementation complexity expected to be associated with such mechanisms. However, if there happen to be cases where they could viably be implemented relatively simply, such mechanisms may help improve migration management.

6.  Summary of recommendations

The following list summarizes the conclusions on the mechanisms that define the set of mandatory-to-implement mechanisms in the context of [I-D.ietf-l3vpn-2547bis-mcast].

Note well that the implementation of the non-mandatory alternative mechanisms is not precluded.

Recommendations are:

o  that BGP-based auto-discovery be the mandated solution for auto-discovery;

o  that BGP be the mandated solution for S-PMSI switching signaling;

o  that implementations support both the BGP-based and the full per-MVPN PIM peering solutions for PE-PE transmission of customer multicast routing, until further operational experience is gained with both solutions;

o  that implementations use the "Partitioned Sets of PEs" approach for Bidir-PIM support;

o  that implementations implement the P2MP variants of the P2P protocols that they already implement, such as mLDP, P2MP RSVP-TE, and GRE/IP-Multicast;

o  that implementations support segmented inter-AS tunnels and consider supporting non-segmented inter-AS tunnels (in order to maintain backward compatibility and for migration);

o  that implementations MUST support deployments where activation of a PIM RP function (PIM Register processing and RP-specific PIM procedures) or of a VRF MSDP instance is not required on any PE router.

7.  IANA Considerations

This document makes no request to IANA.

[ Note to RFC Editor: this section may be removed on publication as an RFC. ]

8.  Security Considerations

This document does not by itself raise any particular security considerations.

9.  Acknowledgements

We would like to thank Adrian Farrel, Eric Rosen, Yakov Rekhter, and Maria Napierala for their feedback that helped shape this document.

Additional credit is due to Maria Napierala for co-authoring Section 3.6 on Bidir-PIM support.

10.  Informative References

[RFC4834]  Morin, T., "Requirements for Multicast in L3 Provider-Provisioned Virtual Private Networks (PPVPNs)", RFC 4834, April 2007.

[I-D.ietf-l3vpn-2547bis-mcast]  Aggarwal, R., Bandi, S., Cai, Y., Morin, T., Rekhter, Y., Rosen, E., Wijnands, I., and S. Yasukawa, "Multicast in MPLS/BGP IP VPNs", draft-ietf-l3vpn-2547bis-mcast-08 (work in progress), March 2009.

[I-D.ietf-l3vpn-2547bis-mcast-bgp]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP Encodings and Procedures for Multicast in MPLS/BGP IP VPNs", draft-ietf-l3vpn-2547bis-mcast-bgp-07 (work in progress), April 2009.

[I-D.rosen-vpn-mcast]  Cai, Y., Rosen, E., and I. Wijnands, "Multicast in MPLS/BGP IP VPNs", draft-rosen-vpn-mcast-11 (work in progress), June 2009.

[I-D.raggarwa-l3vpn-2547-mvpn]  Aggarwal, R., "Base Specification for Multicast in BGP/MPLS VPNs", draft-raggarwa-l3vpn-2547-mvpn-00 (work in progress), June 2004.

[I-D.ietf-pim-sm-linklocal]  Atwood, J., "Authentication and Confidentiality in PIM-SM Link-local Messages", draft-ietf-pim-sm-linklocal-02 (work in progress), November 2007.

[I-D.ietf-pim-port]  Farinacci, D., Wijnands, I., Venaas, S., and M. Napierala, "A Reliable Transport Mechanism for PIM", draft-ietf-pim-port-01 (work in progress), July 2009.

[RFC4684]  Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, R., Patel, K., and J. Guichard, "Constrained Route Distribution for Border Gateway Protocol/MultiProtocol Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual Private Networks (VPNs)", RFC 4684, November 2006.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

Appendix A.  Scalability of C-multicast routing processing load

The main role of multicast routing is to let routers determine whether they should start or stop forwarding a given multicast stream on a given link. In the multicast VPN context, this has to be done for each VPN; the associated function is thus named "customer-multicast routing" or "C-multicast routing", and its role is to let PE routers determine whether they should start or stop forwarding the traffic of a given multicast stream toward the remote PEs, on some PMSI tunnel.

When a "join" message is received by a PE, this PE knows that it should be sending traffic for the corresponding multicast group of the corresponding VPN. But the reception of a "prune" message from a remote PE is not by itself enough for a PE to know that it should stop forwarding the corresponding multicast traffic: it has to make sure that there aren't any other PEs that still have receivers for this traffic.

There are many ways in which the "C-multicast routing" building block can be designed, and they differ, among other things, in how a PE determines when it can stop forwarding a given multicast stream toward other PEs:

PIM LAN Procedures, by default:
   By default, when PIM LAN procedures are used and a PE prunes itself from a multicast tree, all other PEs check their own state to know whether they are on the tree, in which case they send a PIM Join message to override the Prune. Thus, for each PIM Prune message, all PE routers work to let the upstream PE determine the answer to the "did the last receiver leave?" question.

PIM LAN Procedures, with explicit tracking:
   PIM LAN procedures can use an "explicit tracking" approach, where a PE which is the upstream router for a multicast stream maintains an updated list of all neighbors who are joined to the tree.
   Thus, when it receives a Prune message from a PIM neighbor, it instantly knows the answer to the "did the last receiver leave?" question. In this case, the question is answered by the upstream router alone. The side effect of this "explicit tracking" is that Join suppression is not used: the downstream PEs will always send Joins toward the upstream PE, which will have to process them all.

BGP-based C-multicast routing:
   When BGP-based procedures are used for C-multicast routing and no BGP route reflector is used, the "did the last receiver leave?" question is answered as in the PIM "explicit tracking" approach. But when a BGP route reflector is used (which is expected to be the recommended approach), the role of maintaining an updated list of the PEs that are part of a given multicast tree is taken care of by the route reflector(s). Using plain BGP route selection procedures, the route reflector will withdraw a C-multicast Source Tree Join for a given (C-S,C-G) when no PE advertises one anymore. In this context, the "did the last receiver leave?" question can be said to be answered by the route reflector(s). Furthermore, BGP route distribution can leverage more than one route reflector: if a hierarchy of route reflectors is used, the "did the last receiver leave?" question is partly answered by each route reflector in the hierarchy.

Answering the "did the last receiver leave?" question is thus a significant proportion of the work that the C-multicast routing building block has to do, and it is where the approaches differ most. The different approaches for handling C-multicast routing can result in different amounts of processing, spread differently among the different functions. These differences can be better estimated by quantifying the amount of message processing and state maintenance.
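The two PIM variants described above can be contrasted with a minimal Python sketch that answers the "did the last receiver leave?" question and counts the processing cost of one Prune in m.e (the unit defined below). It is a sketch under assumptions: Join suppression and Prune override are taken to be fully effective, the parsing cost at PEs that take no action is ignored in the explicit-tracking case, and the function names and PE names are hypothetical. The counts match the per-event rows of the tables in Appendix A.1 and A.2.

```python
from typing import Set, Tuple


def default_lan_prune(leaving_pe: str, joined: Set[str],
                      num_mvpn_pes: int) -> Tuple[bool, int]:
    """PIM LAN procedures, by default (Join suppression and Prune override).

    The Prune sent on the MI-PMSI is processed by every PE of the MVPN
    (upstream PE included). If any other PE is still joined, one of them
    sends an overriding Join, which every PE processes as well.
    Returns (upstream PE can stop forwarding, messages processed in m.e).
    """
    remaining = joined - {leaving_pe}
    processed = num_mvpn_pes          # the Prune, seen by all PEs
    if remaining:
        processed += num_mvpn_pes     # the overriding Join, seen by all PEs
    return (not remaining, processed)


def explicit_tracking_prune(leaving_pe: str, joined: Set[str]) -> Tuple[bool, int]:
    """PIM LAN procedures with explicit tracking: the upstream PE keeps the
    set of joined PEs and answers the question alone. Only the sending PE
    and the upstream PE are counted.
    Returns (upstream PE can stop forwarding, messages processed in m.e).
    """
    remaining = joined - {leaving_pe}
    return (not remaining, 2)


if __name__ == "__main__":
    joined = {"PE2", "PE3"}
    print(default_lan_prune("PE2", joined, 10))    # (False, 20)
    print(default_lan_prune("PE3", {"PE3"}, 10))   # (True, 10)
    print(explicit_tracking_prune("PE2", joined))  # (False, 2)
```

The default behaviour costs work proportional to the number of PEs in the MVPN for every Prune, whereas explicit tracking keeps the cost constant but concentrates it on the upstream PE.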
Though the type of processing, messages, and states may vary with the different approaches, we propose here a rough estimation of the load of PEs, in terms of the number of messages processed and the number of control plane states maintained. A "message processed" is a message being parsed, a lookup being done, and some action being taken (such as updating a control plane or data plane state); a "state maintained" is a multicast state kept in the control plane memory of a PE, related to an interface or a PE being subscribed to a multicast stream (we don't compare the data plane states on PE routers, which wouldn't vary between the different options chosen).

The following subsections make such an estimation for each proposed approach for C-multicast routing, for different phases of the following scenario:

o  one SSM multicast stream is considered (extrapolating to a higher number of streams is linear);

o  only the intra-AS case is considered (with segmented inter-AS trees and BGP-based C-multicast routing, #mVPN_PE and #R_PE should refer to the PEs of the MVPN in the AS, not to all PEs of the MVPN);

o  the scenario is as follows:

   *  one PE joins the multicast stream (because a new receiver-connected site has sent a Join on the PE-CE link), followed by a number of additional PEs that also join the multicast stream, one after the other; we evaluate the processing required for the addition of each PE;

   *  some period of time T passes, without any PE joining or leaving (baseline);

   *  all PEs leave, one after the other, until the last one leaves; we evaluate the processing required for the leave of each PE;

o  the parameters used are:

   *  #mVPN_PE: the number of PEs in the MVPN;

   *  #R_PE: the number of PEs joining the multicast stream;

   *  #RR: the number of route reflectors;

   *  T_PIM_r: the time between two refreshes of a PIM Join (default is 60 s).

The estimation unit used is the "message.equipment" (or "m.e"): one "message.equipment" corresponds to "one equipment processing one message" (10 m.e being "10 equipments each processing one message", or "5 messages each processed by 2 equipments", or "1 message processed by 10 equipments", etc.). Similarly, for the amount of control plane state, the unit used is the "state.equipment" (or "s.e"). This allows taking into account the fact that a message (or a state) can be processed (or maintained) by more than one node.

We distinguish three different types of equipments: the upstream PE for the considered multicast stream, the RR (if any), and the other PEs (which are not the upstream PE).

The numbers or orders of magnitude given in the tables in the following subsections are totals across all equipments of a same type, for each type of equipment, in the "m.e" and "s.e" units defined above.

Additional precisions:

o  for PIM, only Join and Prune messages are counted; PIM Hellos are not counted, since these are not messages that trigger specific action in a typical scenario; message processing related to the PIM Assert mechanism is also not taken into account, because it is only active in transient states;

o  for BGP, only UPDATE messages for MVPN routes carrying C-multicast routing information are considered.

A.1.  PIM LAN procedures, by default

+------------+------------+---------------+----------+--------------+
|            | upstream   | other PEs     | RR       | total across |
|            | PE (1)     | (total across | (none)   | all          |
|            |            | (#mVPN_PE-1)  |          | equipments   |
|            |            | PEs)          |          |              |
+------------+------------+---------------+----------+--------------+
| first PE   | 1 m.e      | #mVPN_PE-1    | /        | #mVPN_PE m.e |
| joins      |            | m.e           |          |              |
+------------+------------+---------------+----------+--------------+
| for *each* | 1 m.e      | #mVPN_PE-1    | /        | #mVPN_PE m.e |
| additional |            | m.e           |          |              |
| PE joining |            |               |          |              |
+------------+------------+---------------+----------+--------------+
| baseline   | T/T_PIM_r  | (T/T_PIM_r) x | /        | (T/T_PIM_r)  |
| processing | m.e        | (#mVPN_PE-1)  |          | x #mVPN_PE   |
| over a     |            | m.e           |          | m.e          |
| period T   |            |               |          |              |
+------------+------------+---------------+----------+--------------+
| for *each* | 2 m.e      | 2(#mVPN_PE-1) | /        | 2 x #mVPN_PE |
| PE leaving |            | m.e           |          | m.e          |
+------------+------------+---------------+----------+--------------+
| the last   | 1 m.e      | #mVPN_PE-1    | /        | #mVPN_PE m.e |
| PE leaves  |            | m.e           |          |              |
+------------+------------+---------------+----------+--------------+
| total for  | 3 x #R_PE  | (#mVPN_PE-1)  | 0        | #mVPN_PE x ( |
| #R_PE PEs  | - 1 +      | x (3 x #R_PE  |          | 3 x #R_PE -  |
|            | T/T_PIM_r  | - 1 +         |          | 1 +          |
|            | m.e        | T/T_PIM_r)    |          | T/T_PIM_r )  |
|            |            | m.e           |          | m.e          |
+------------+------------+---------------+----------+--------------+
| total      | 1 s.e      | #R_PE s.e     | 0        | #R_PE+1 s.e  |
| state      |            |               |          |              |
| maintained |            |               |          |              |
+------------+------------+---------------+----------+--------------+

Message processing and state maintenance - PIM LAN procedures, by default

We suppose here that the PIM Join suppression and Prune override mechanisms are fully effective, i.e.
that a Join or Prune message sent by a PE is instantly seen by the other PEs. Strictly speaking, this is not true: depending on network delays and timing, there could be cases where more messages are exchanged, and the numbers given in this table are therefore a lower bound on the number of PIM messages exchanged.

A.2.  PIM LAN procedures, with explicit tracking

+-------------+-------------+----------------+--------+-------------+
|             | upstream PE | other PEs      | RRs    | total       |
|             | (1)         | (total across  | (none) | across all  |
|             |             | (#mVPN_PE-1)   |        | equipments  |
|             |             | PEs)           |        |             |
+-------------+-------------+----------------+--------+-------------+
| first PE    | 1 m.e       | 1 m.e (see     | /      | 2 m.e       |
| joins       |             | note below)    |        |             |
+-------------+-------------+----------------+--------+-------------+
| for *each*  | 1 m.e       | 1 m.e (see     | /      | 2 m.e       |
| additional  |             | note below)    |        |             |
| PE joining  |             |                |        |             |
+-------------+-------------+----------------+--------+-------------+
| baseline    | (T/T_PIM_r) | (T/T_PIM_r)    | /      | 2 x         |
| processing  | x #R_PE m.e | x #R_PE m.e    |        | (T/T_PIM_r) |
| over a      |             | (see note      |        | x #R_PE m.e |
| period T    |             | below)         |        |             |
+-------------+-------------+----------------+--------+-------------+
| for *each*  | 1 m.e       | 1 m.e (see     | /      | 2 m.e       |
| PE leaving  |             | note below)    |        |             |
+-------------+-------------+----------------+--------+-------------+
| the last PE | 1 m.e       | 1 m.e (see     | /      | 2 m.e       |
| leaves      |             | note below)    |        |             |
+-------------+-------------+----------------+--------+-------------+
| total for   | #R_PE x     | #R_PE x (2 +   | 0      | #R_PE x     |
| #R_PE PEs   | (2 +        | T/T_PIM_r) m.e |        | (4 + 2 x    |
|             | T/T_PIM_r)  |                |        | T/T_PIM_r)  |
|             | m.e         |                |        | m.e         |
+-------------+-------------+----------------+--------+-------------+
| total state | #R_PE s.e   | #R_PE s.e      | 0      | 2 x #R_PE   |
| maintained  |             |                |        | s.e         |
+-------------+-------------+----------------+--------+-------------+

Message processing and state maintenance - PIM LAN procedures, with explicit tracking

Note: in this explicit tracking mode, a given Join or Prune message requires processing only by the upstream PE and by the PE sending the message; the other PEs don't have any action to take. These other PEs still have to parse the PIM message, which is not zero processing; we assume here that this is not significant.

A.3.  BGP-based

About RRs: we suppose that a message has to be processed by r BGP route reflectors to go from a receiver-connected PE to the source-connected PE. In practice, r depends on how the RRs are meshed and would typically be small (1, 2, or 3); moreover, r quickly tends toward 1 (as soon as there is a receiver-connected PE in each RR cluster).

We assume that BGP constrained VPN route distribution [RFC4684] is used; if it is not, the amount of state and message processing with this approach is similar to that of the PIM explicit tracking approach (Appendix A.2), without the Join refreshes.
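Why the number of route reflectors processing a given update lies "between 1 and r" can be illustrated with the following Python sketch. The class name, the topology, and the propagation rule are assumptions made for illustration: an RR is assumed to propagate an advertisement only when it learns its first downstream peer for the C-multicast route, and a withdrawal only when it loses its last one, which ignores BGP best-path changes.

```python
from typing import Dict, List, Set


class RRHierarchySketch:
    """Counts how many route reflectors process each C-multicast route
    update on its way from a receiver-connected PE to the upstream PE.

    paths[pe] lists, in order, the RRs crossed between `pe` and the
    upstream PE (a hypothetical topology). Simplifying assumption: an RR
    only propagates an advertisement when it goes from zero to one
    downstream peer for the route, and a withdrawal when it goes from one
    to zero; otherwise the update stops at that RR.
    """

    def __init__(self, paths: Dict[str, List[str]]) -> None:
        self.paths = paths
        self.learned_from: Dict[str, Set[str]] = {}

    def join(self, pe: str) -> int:
        processed, previous = 0, pe
        for rr in self.paths[pe]:
            processed += 1
            peers = self.learned_from.setdefault(rr, set())
            had_route = bool(peers)
            peers.add(previous)
            if had_route:
                break              # route already known: nothing re-advertised
            previous = rr
        return processed

    def leave(self, pe: str) -> int:
        processed, previous = 0, pe
        for rr in self.paths[pe]:
            processed += 1
            peers = self.learned_from.setdefault(rr, set())
            peers.discard(previous)
            if peers:
                break              # other downstream peers still joined
            previous = rr
        return processed


if __name__ == "__main__":
    # r = 2: each receiver PE reaches the upstream PE through two RRs.
    paths = {"PE1": ["RR-a", "RR-top"], "PE2": ["RR-a", "RR-top"],
             "PE3": ["RR-b", "RR-top"]}
    rrs = RRHierarchySketch(paths)
    print([rrs.join(pe) for pe in ("PE1", "PE2", "PE3")])   # [2, 1, 2]
    print([rrs.leave(pe) for pe in ("PE1", "PE2", "PE3")])  # [1, 2, 2]
```

Only the first join and the last leave are guaranteed to cross all r route reflectors; once an RR cluster already holds the route, later updates stop at the first RR, which is why the effective value of r quickly tends toward 1.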
   +-------------+----------+---------------+------------+-------------+
   |             | upstream | other PEs     | RRs (#RR)  | total       |
   |             | PE (1)   | (total across |            | across all  |
   |             |          | (#mvpn_PE-1)  |            | equipment   |
   |             |          | PEs)          |            |             |
   +-------------+----------+---------------+------------+-------------+
   | first PE    | 1 m.e    | 1 m.e         | r m.e      | (r+2) m.e   |
   | joins       |          |               |            |             |
   +-------------+----------+---------------+------------+-------------+
   | for *each*  | 0        | 1 m.e         | between 1  | between 2   |
   | additional  |          |               | and r m.e  | and (r+1)   |
   | PE joining  |          |               |            | m.e         |
   +-------------+----------+---------------+------------+-------------+
   | baseline    | 0        | 0             | 0          | 0           |
   | processing  |          |               |            |             |
   | over a      |          |               |            |             |
   | period T    |          |               |            |             |
   +-------------+----------+---------------+------------+-------------+
   | for *each*  | 0        | 1 m.e         | between 1  | between 2   |
   | PE leaving  |          |               | and r m.e  | and (r+1)   |
   |             |          |               |            | m.e         |
   +-------------+----------+---------------+------------+-------------+
   | the last PE | 1 m.e    | 1 m.e         | r m.e      | (r+2) m.e   |
   | leaves      |          |               |            |             |
   +-------------+----------+---------------+------------+-------------+
   | total for   | 2 m.e    | #R_PE x 2 m.e | 2 x        | 2 x (2 x    |
   | #R_PE PEs   |          |               | (r+#R_PE)  | #R_PE + r + |
   |             |          |               | m.e        | 1) m.e      |
   +-------------+----------+---------------+------------+-------------+
   | total state | 2 s.e    | 2 x #R_PE s.e | approx.    | approx.     |
   | maintained  |          |               | (#R_PE x   | (#R_PE x    |
   |             |          |               | #RR) s.e   | (#RR+2))    |
   |             |          |               |            | s.e         |
   +-------------+----------+---------------+------------+-------------+

   Message processing and state maintenance - BGP-based procedures

A.4.  Side-by-side orders of magnitude comparison

   This section draws conclusions from the previous sections by
   comparing orders of magnitude as the number of PEs in a VPN
   increases.
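   As an illustration only, the closed-form totals from the "total for
   #R_PE PEs" rows of the tables in Appendix A.2 and Appendix A.3 can be
   evaluated for growing numbers of receiver-connected PEs.  The Python
   sketch below does this.  The function names and the parameter values
   (r = 2, a stream lasting one hour, a Join refresh every 60 seconds)
   are assumptions made for the example, and the default PIM LAN
   procedures are left out because only their orders of magnitude are
   given in this section.

```python
# Illustrative sketch (not part of any specification): evaluates the
# closed-form totals copied from the "total for #R_PE PEs" rows of the
# Appendix A.2 and A.3 tables. Units are message events (m.e).
# Function names and demo parameter values are hypothetical.


def pim_explicit_tracking_total(r_pe: int, t_over_t_pim_r: float) -> float:
    """PIM LAN procedures with explicit tracking: #R_PE x (4 + T/T_PIM_r)."""
    return r_pe * (4 + t_over_t_pim_r)


def bgp_based_total(r_pe: int, r: int) -> int:
    """BGP-based procedures: 2 x (2 x #R_PE + r + 1)."""
    return 2 * (2 * r_pe + r + 1)


if __name__ == "__main__":
    r = 2                # assumed number of RRs between a receiver PE and the source PE
    t = 3600             # assumed stream duration T, in seconds (one hour)
    t_pim_r = 60         # assumed Join refresh period T_PIM_r, in seconds
    ratio = t / t_pim_r  # T/T_PIM_r
    for r_pe in (10, 100, 1000):
        print(f"#R_PE={r_pe}: "
              f"explicit tracking={pim_explicit_tracking_total(r_pe, ratio)} m.e, "
              f"BGP-based={bgp_based_total(r_pe, r)} m.e")
```

   Under these assumptions, the sketch reports 640.0 versus 46 message
   events for 10 receiver PEs and 64000.0 versus 4006 for 1000: both
   totals grow linearly with #R_PE, but the explicit-tracking total is
   also proportional to the stream duration through T/T_PIM_r.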
   +------------+----------------------+--------------+----------------+
   |            | PIM LAN Procedures,  | PIM LAN      | BGP-based      |
   |            | default              | Procedures,  |                |
   |            |                      | explicit     |                |
   |            |                      | tracking     |                |
   +------------+----------------------+--------------+----------------+
   | first PE   | O(#mVPN_PE)          | O(1)         | O(1)           |
   | joins (in  |                      |              |                |
   | m.e)       |                      |              |                |
   +------------+----------------------+--------------+----------------+
   | for *each* | O(#mVPN_PE)          | O(1)         | O(1)           |
   | additional |                      |              |                |
   | PE joining |                      |              |                |
   | (in m.e)   |                      |              |                |
   +------------+----------------------+--------------+----------------+
   | baseline   | (T/T_PIM_r) x        | (T/T_PIM_r)  | 0              |
   | processing | O(#mVPN_PE)          | x O(#R_PE)   |                |
   | over a     |                      |              |                |
   | period T   |                      |              |                |
   | (in m.e)   |                      |              |                |
   +------------+----------------------+--------------+----------------+
   | for *each* | O(#mVPN_PE)          | O(1)         | O(1)           |
   | PE leaving |                      |              |                |
   | (in m.e)   |                      |              |                |
   +------------+----------------------+--------------+----------------+
   | the last   | O(#mVPN_PE)          | O(1)         | O(1)           |
   | PE leaves  |                      |              |                |
   | (in m.e)   |                      |              |                |
   +------------+----------------------+--------------+----------------+
   | total for  | O(#mVPN_PE x #R_PE)  | O(#R_PE) x   | O(#R_PE)       |
   | #R_PE PEs  | + O(#mVPN_PE x       | (T/T_PIM_r)  |                |
   | (in m.e)   | T/T_PIM_r)           |              |                |
   +------------+----------------------+--------------+----------------+
   | states (in | O(#R_PE)             | O(#R_PE)     | O(#R_PE x #RR) |
   | s.e)       |                      |              |                |
   +------------+----------------------+--------------+----------------+
   | notes      | (processing and      | (processing  | (processing    |
   |            | state maintenance    | and state    | and state      |
   |            | are essentially done | maintenance  | maintenance is |
   |            | by, and spread       | is           | essentially    |
   |            | amongst, the PEs of  | essentially  | done by, and   |
   |            | the MVPN;            | done on the  | spread         |
   |            | non-upstream PEs     | upstream PE) | amongst, the   |
   |            | have processing to   |              | RRs)           |
   |            | do)                  |              |                |
   +------------+----------------------+--------------+----------------+

   Comparison of orders of magnitude for message processing and state
   maintenance (totals across all equipment)

   The conclusions that can be drawn from the above are that:

   o  the default PIM LAN Procedures approach is particular in that all
      PEs, including those that are neither upstream nor downstream for
      a given message, have processing to do; this results in a total
      amount of message processing in O(#mVPN_PE x #R_PE), i.e.,
      O(#mVPN_PE ^ 2) if the proportion of receiver PEs is considered
      constant as the number of PEs increases;

   o  the two PIM-based approaches periodically refresh Join messages;
      this adds a factor proportional to the stream duration
      (T/T_PIM_r), which does not change the order of magnitude with
      respect to the number of PEs but can be significant for
      long-lived streams;

   o  the BGP-based approach requires an amount of message processing
      in O(#R_PE), which is lower than for the two other approaches and
      independent of the duration of streams;

   o  state maintenance is of the same order of magnitude for all
      approaches, O(#R_PE), but it is distributed differently:

      *  the default PIM LAN Procedures approach fully spreads, and
         minimizes, the amount of state (one state per PE);

      *  the PIM LAN Procedures with explicit tracking concentrate all
         state on the upstream PE;

      *  the BGP-based procedures spread all the state across the set
         of route reflectors.

   This quantification of message processing is based on a use case
   where each PE with receivers has joined and left once.  Drawing
   scalability-related conclusions for other patterns of change in the
   set of receiver-connected PEs requires considering the cost of each
   approach for "a new PE joining" and "a (non-last) PE leaving".
   From this perspective, the default PIM LAN Procedures approach is the
   one with the highest total amount of message processing across all
   nodes (in O(#mVPN_PE)), whereas the other approaches are in O(1).  In
   that case, the PIM LAN Procedures with explicit tracking reduce
   processing to a minimum, while the cost of the BGP-based approach
   increases by a linear factor that depends on the number of RRs that
   have to parse the message.

Appendix B.  Switching to S-PMSI

   [The following point was fixed in version 07 of
   [I-D.ietf-l3vpn-2547bis-mcast] and is kept here for reference only.]

   Section 7.2.2.3 of [I-D.ietf-l3vpn-2547bis-mcast] proposes two
   approaches for how a source PE can decide when to start transmitting
   customer multicast traffic on an S-PMSI:

   1.  The source PE sends multicast packets for the on both the I-PMSI
       P-multicast tree and the S-PMSI P-multicast tree simultaneously
       for a pre-configured period of time, letting the receiver PEs
       select the new tree for reception, before switching to only the
       S-PMSI.

   2.  The source PE waits for a pre-configured period of time after
       advertising the entry bound to the S-PMSI before fully switching
       the traffic onto the S-PMSI-bound P-multicast tree.

   The first alternative has essentially two drawbacks:

   o  traffic is sent twice for some period of time, which would appear
      to be at odds with the motivation for switching to an S-PMSI in
      order to optimize the bandwidth used by the multicast tree for
      that stream;

   o  it is unlikely that the switchover can occur without packet loss
      or duplication if the transit delays of the I-PMSI P-multicast
      tree and the S-PMSI P-multicast tree differ.

   By contrast, the second alternative has neither of these drawbacks
   and satisfies the requirement in Section 5.1.3 of [RFC4834], which
   states that "[...]
   a multicast VPN solution SHOULD as much as possible ensure that
   client multicast traffic packets are neither lost nor duplicated,
   even when changes occur in the way a client multicast data stream is
   carried over the provider network".  The second alternative also
   happens to be the one used in existing deployments.

   For these reasons, the authors recommend mandating the
   implementation of the second alternative for switching to an S-PMSI.

Authors' Addresses

   Thomas Morin (editor)
   France Telecom - Orange Labs
   2 rue Pierre Marzin
   Lannion 22307
   France

   Email: thomas.morin@orange-ftgroup.com


   Ben Niven-Jenkins (editor)
   BT
   208 Callisto House, Adastral Park
   Ipswich, Suffolk IP5 3RE
   UK

   Email: benjamin.niven-jenkins@bt.com


   Yuji Kamite
   NTT Communications Corporation
   Tokyo Opera City Tower
   3-20-2 Nishi Shinjuku, Shinjuku-ku
   Tokyo 163-1421
   Japan

   Email: y.kamite@ntt.com


   Raymond Zhang
   BT
   2160 E. Grand Ave.
   El Segundo CA 90025
   USA

   Email: raymond.zhang@bt.com


   Nicolai Leymann
   Deutsche Telekom
   Goslarer Ufer 35
   10589 Berlin
   Germany

   Email: n.leymann@telekom.de


   Nabil Bitar
   Verizon
   40 Sylvan Road
   Waltham, MA 02451
   USA

   Email: nabil.n.bitar@verizon.com