Network Working Group                                      T. Morin, Ed.
Internet-Draft                                        France Telecom R&D
Intended status: Informational                     B. Niven-Jenkins, Ed.
Expires: January 11, 2009                                             BT
                                                               Y. Kamite
                                                      NTT Communications
                                                                R. Zhang
                                                                      BT
                                                              N. Leymann
                                                        Deutsche Telekom
                                                                N. Bitar
                                                                 Verizon
                                                           July 10, 2008

    Considerations about Multicast for BGP/MPLS VPN Standardization
                draft-morin-l3vpn-mvpn-considerations-03

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 11, 2009.

Abstract

   The current proposal for multicast in BGP/MPLS includes multiple
   alternative mechanisms for some of the required building blocks of
   the solution. The aim of this document is to leverage previously
   documented requirements to identify the key elements and help move
   forward solution design, toward the definition of a standard having a
   well defined set of mandatory procedures. The different proposed
   alternative mechanisms are examined in the light of requirements
   identified for multicast in L3VPNs, and suggestions are made about
   which of these mechanisms standardization should favor. Issues
   related to existing deployments of early implementations are also
   addressed.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Examining alternative mechanisms for MVPN functions
     3.1.  MVPN auto-discovery
     3.2.  S-PMSI Signaling
     3.3.  PE-PE Transmission of C-Multicast Routing
       3.3.1.  PE-PE signaling scalability
       3.3.2.  P-routers scalability
       3.3.3.  Impact of C-multicast routing on Inter-AS deployments
       3.3.4.  Security and robustness
       3.3.5.  C-multicast VPN join latency
       3.3.6.  Extranet
       3.3.7.  Conclusion on C-multicast routing
     3.4.  Encapsulation techniques for P-multicast trees
     3.5.  Inter-AS deployments options
   4.  Co-located RPs
   5.  Existing deployments
   6.  Summary of recommendations
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. Informative References
   Appendix A.  Scalability of C-multicast routing processing load
     A.1.  PIM LAN procedures, by default
     A.2.  PIM LAN procedures, with explicit tracking
     A.3.  BGP-based
     A.4.  Side by side orders of magnitude comparison
   Appendix B.  Switching to S-PMSI
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   The current proposal for multicast in BGP/MPLS
   [I-D.ietf-l3vpn-2547bis-mcast] includes multiple alternative
   mechanisms for some of the required building blocks of the solution.
   However, it does not identify the core set of mechanisms which must
   be implemented in order to ensure interoperability.
   This may lead to a situation where implementations support different
   subsets of the available optional mechanisms and therefore do not
   interoperate, which is a problem for the numerous operators having
   multi-vendor backbones.

   The aim of this document is to leverage the already expressed
   requirements [RFC4834] and study the properties of each approach, to
   identify mechanisms that are good candidates for being part of a core
   set of mandatory mechanisms which can be used to provide a base for
   interoperable solutions.

   This document will go through the different building blocks of the
   solution and provide recommendations as to which mechanisms should be
   favored for each building block, while considering the requirements
   already defined and the goal of a fully-interoperable standard.

   Considering the history of the multicast VPN proposals and
   implementations, the authors also consider it useful to discuss how
   existing deployments of early implementations
   [I-D.rosen-vpn-mcast][I-D.raggarwa-l3vpn-2547-mvpn] can fit in the
   picture, and provide suggestions in this respect.

   [This document will evolve to follow key changes in multicast in BGP/
   MPLS [I-D.ietf-l3vpn-2547bis-mcast] and
   [I-D.ietf-l3vpn-2547bis-mcast-bgp]. Such changes are, for instance,
   clear statements about compatibility between the different approaches
   and other optional features, or completed descriptions of procedures
   that are not currently detailed.]

2.  Terminology

   Please refer to [I-D.ietf-l3vpn-2547bis-mcast] and [RFC4834].

3.  Examining alternative mechanisms for MVPN functions

3.1.  MVPN auto-discovery

   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
   two different mechanisms for MVPN auto-discovery:

   1.  BGP-based auto-discovery

   2.  "PIM/shared tree": discovery done through the exchange of PIM
       Hellos by C-PIM instances, across an MI-PMSI implemented with
       one shared tree per VPN (using multicast ASM, or MP2MP LDP)

   Both solutions address Section 5.2.10 of [RFC4834] which states that
   "the operation of a multicast VPN solution SHALL be as light as
   possible and providing automatic configuration and discovery SHOULD
   be a priority when designing a multicast VPN solution. Particularly
   the operational burden of setting up multicast on a PE or for a VR/
   VRF SHOULD be as low as possible".

   The key consideration is that PIM-based discovery is only applicable
   to deployments using a shared tree to instantiate an MI-PMSI. It is
   not applicable if only P2P or SSM trees are used because, contrary
   to ASM and MP2MP trees, these P2P or SSM trees cannot be built
   before the autodiscovery has been done. By contrast, the BGP-based
   auto-discovery does not place any constraint on the type of multicast
   trees that would have to be used. BGP-based auto-discovery is
   independent of the type of P-multicast tree used, thus satisfying the
   requirement in section 5.2.4.1 of [RFC4834] that "a multicast VPN
   solution SHOULD be designed so that control and forwarding planes are
   not interdependent".

   Additionally, it is to be noted that a number of service providers
   have chosen to use SSM-based trees for the default MDTs within their
   current deployments, therefore relying already on some BGP-based
   auto-discovery.

   Moreover, when shared P-tunnels are used, the use of BGP auto-
   discovery would allow inconsistencies in the addresses/identifiers
   used for the shared trees to be detected (e.g. the same shared tree
   identifier being used for different VPNs with distinct BGP route
   targets).
   This is particularly attractive in the context of inter-AS
   VPNs where the impact of any misconfiguration could be magnified and
   where a single service provider may not operate all the ASes. Note
   that this technique to detect some misconfiguration cases may not be
   usable during a transition period from a shared-tree autodiscovery to
   a BGP-based autodiscovery.

   Thus, the recommendation is that implementation of the BGP-based
   auto-discovery is mandated and should be supported by all mVPN
   implementations (while PIM/shared-tree based auto-discovery should be
   optionally considered for migration purposes only).

3.2.  S-PMSI Signaling

   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
   two mechanisms for signaling that multicast flows will be switched to
   an S-PMSI:

   1.  a UDP-based TLV protocol specifically for S-PMSI signaling
       (described in section 7.2.1).

   2.  a BGP-based mechanism for S-PMSI signaling (described in section
       7.2.2).

   Section 5.2.10 of [RFC4834] states that "as far as possible, the
   design of a solution SHOULD carefully consider the number of
   protocols within the core network: if any additional protocols are
   introduced compared with the unicast VPN service, the balance between
   their advantage and operational burden SHOULD be examined
   thoroughly". The UDP-based mechanism would be an additional protocol
   in the mVPN stack, which isn't the case for the BGP-based S-PMSI
   switching signaling, since (a) BGP is identified as a requirement for
   autodiscovery, and (b) the BGP-based S-PMSI switching signaling
   procedures are very similar to the autodiscovery procedures.

   Furthermore, the BGP-based S-PMSI switching signaling mechanism can
   be used within MVPNs using either a UI-PMSI or an MI-PMSI while the
   UDP-based protocol is restricted to use within MVPNs using an MI-
   PMSI.
   In practice, this means that with the UDP-based protocol, unless
   shared trees are used, a PE has to join all trees of all PEs in a
   VPN, whereas with BGP-based S-PMSI switching signaling it
   could delay joining a tree from a PE until traffic from that PE is
   needed, thus reducing the amount of state maintained on P routers.

   S-PMSI switching signaling approaches can also be compared in an
   inter-AS context (see Section 3.5). The proposed BGP-based approach
   for S-PMSI switching signaling provides a good fit with both the
   segmented and non-segmented inter-AS approaches (see Section 3.5). By
   contrast the UDP-based approach for S-PMSI switching signaling
   appears to be usable with segmented inter-AS tunnels, but in that
   case key advantages of the segmented approach are lost:

   o  ASes are no longer independent in choosing when S-PMSI tunnels
      are triggered in their AS (and thus in controlling the amount
      of state created on their P routers), and with which tunneling
      technique they are built

   o  in an inter-AS option B context, an isolation of ASes is obtained
      as PEs don't have visibility of, nor exchange with, PEs of other
      ASes. This property can be preserved if the segmented inter-AS
      approach and BGP-based S-PMSI switching signaling are used, but it
      is not preserved if UDP-based switching signaling is used.

   Given all the above, it is the recommendation of the authors that BGP
   is the preferred solution for S-PMSI switching signaling and should
   be supported by all implementations.

   It is identified that, if nothing prevents a fast-paced creation of
   S-PMSIs, then S-PMSI switching signaling with BGP would possibly
   impact the Route Reflectors used for mVPN routes.
   However, it is also
   identified that such a fast-paced behavior would have an impact on P
   and PE routers resulting from S-PMSI tunnel signaling, which will be
   the same independently of the S-PMSI signaling approach that is used,
   and which it is certainly best to avoid by setting up proper
   mechanisms.

   The UDP-based S-PMSI switching signaling protocol can also be
   considered, as an option, given that this protocol has been in
   deployment for some time. Implementations supporting both protocols
   would be expected to provide a per-VRF configuration knob to allow an
   implementation to use the UDP-based TLV protocol for S-PMSI switching
   signaling for specific VRFs in order to support the coexistence of
   both protocols (for example during migration scenarios). Apart from
   such migration-facilitating mechanisms, the authors specifically do
   not recommend extending the already proposed UDP-based TLV protocol
   to new types of P-multicast trees.

3.3.  PE-PE Transmission of C-Multicast Routing

   The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
   multiple mechanisms for PE-PE transmission of customer multicast
   routing information:

   1.  Full per-MVPN PIM peering across an MI-PMSI (described in section
       3.4.1.1).

   2.  Lightweight PIM peering across an MI-PMSI (described in section
       3.4.1.2)

   3.  The unicasting of PIM C-Join/Prune messages (described in section
       3.4.1.3)

   4.  The use of BGP for carrying C-Multicast routing (described in
       section 3.4.2).

3.3.1.  PE-PE signaling scalability

   Scalability being one of the core requirements for multicast VPN, it
   is useful to compare the proposed C-multicast routing mechanisms from
   this perspective: Section 4.2.4 of [RFC4834] recommends that "a
   multicast VPN solution SHOULD support several hundreds of PEs per
   multicast VPN, and MAY usefully scale up to thousands" and section
   4.2.5 states that "a solution SHOULD scale up to thousands of PEs
   having multicast service enabled".

   Scalability with an increased number of VPNs per PE, or with an
   increased amount of multicast state per VPN, is also important, but
   is not the focus of this section since we didn't identify
   differences between the different approaches in these respects: all
   other things being equal, the load on a PE due to C-multicast routing
   increases roughly linearly with the number of VPNs per PE, and with
   the amount of multicast state per VPN.

   This section thus presents conclusions related to PE-PE signaling
   scalability, while Appendix A contains more detailed explanations on
   the differences in ways of handling the C-multicast routing load,
   between the PIM-based approaches and the BGP-based approach, along
   with quantified evaluations of the amount of state and messages with
   the different approaches.

   At high scales of multicast deployment, the first and third
   mechanisms require the PEs to maintain a large number of PIM
   adjacencies with other PEs of the same multicast VPN (which implies
   the regular exchange of PIM Hellos with each other) and to refresh
   C-Join/Prune states, thus limiting the scalability of these
   approaches.

   The third mechanism would reduce the amount of C-Join/Prune
   processing for a given multicast flow for PEs that are not the
   upstream neighbor for this flow, but would require "explicit
   tracking" state to be maintained by the upstream PE.
   It also isn't
   compatible with the "Join suppression" mechanism. A possible way to
   reduce the amount of signaling with this approach would be the use of
   a PIM refresh-reduction mechanism. Such a mechanism, based on TCP,
   is being considered by the PIM WG ([I-D.farinacci-pim-port]); its
   use in a multicast VPN context hasn't yet been described in
   [I-D.ietf-l3vpn-2547bis-mcast], but it is expected that this approach
   would provide scalability similar to that of the BGP-based approach
   used without leveraging RRs to process the PE-PE C-multicast routing.
   [TBC, when/if, this is further described in
   [I-D.ietf-l3vpn-2547bis-mcast]].

   The second mechanism would operate in a similar manner to full per-
   MVPN PIM peering except that PIM Hello messages are not transmitted
   and PIM C-Join/Prune refresh-reduction would be used, thereby
   improving scalability, but this approach has yet to be fully
   described. In any case, it seems to improve only one of the factors
   that impact scalability with an increased number of PEs.

   The first and second mechanisms can leverage the "Join suppression"
   behavior and thus reduce the processing burden of an upstream PE,
   sparing the processing of a Join refresh message for each remote PE
   joined to a multicast stream. This improvement requires all PEs of a
   multicast VPN to process all PIM Join and Prune messages sent by any
   other PE participating in the same multicast VPN whether they are the
   upstream PE or not.

   The fourth mechanism (the use of BGP for carrying C-Multicast
   routing) would have a comparable drawback of requiring all PEs to
   process a BGP C-multicast route that is of interest only to a
   specific upstream PE. For this reason the C-multicast routing
   approach can leverage the Route-Target constraint mechanisms, which
   specifically allow only the interested upstream PE to receive a BGP
   C-multicast route.
   When RT-constraints are used the fourth mechanism reduces the total
   amount of message processing load put on the PEs for customer
   multicast routing to the minimum (by avoiding any processing by
   "unrelated" PEs, that are not the joining PE nor the upstream PE, and
   by avoiding the use of refreshes), and inherits BGP features that are
   expected to improve scalability (for instance, providing a means to
   offload some of the processing burden associated with client
   multicast routing onto one or many BGP route-reflectors). This
   advantage has a cost (the maintenance of an amount of state linear
   with the number of PEs joined to a stream), but when route reflectors
   are used, this cost is spread among the route reflectors.

   However, the fourth mechanism is specific in that it offers the
   possibility of offloading customer multicast routing processing onto
   one or more BGP Route Reflector(s). When this is used, there is a
   drawback of increasing the processing load placed on the route
   reflector infrastructure. In the higher scale scenarios, it may be
   required to adapt the route reflector infrastructure to the mVPN
   routing load by using, for example:

   o  a separation of resources for unicast and multicast VPN routing:
      using dedicated mVPN Route Reflector(s) (or using dedicated mVPN
      BGP sessions or dedicated mVPN BGP instances);

   o  the deployment of additional route reflector resources, for
      example increasing the processing resources on existing route
      reflectors or deployment of additional route reflectors.

   Among the above, the most straightforward approach is to consider the
   introduction of route reflectors dedicated to the mVPN service and
   dimension them according to the needs of that service (but doing so
   is not required and is left as an operator engineering decision).

3.3.2.  P-routers scalability

   Mechanisms (1) and (2) are restricted to use within multicast VPNs
   that use an MI-PMSI, thereby necessitating:

      the use of a P-multicast tree technique that allows shared trees
      (for example PIM-SM in ASM mode or MP2MP LDP)

      or the use of one P-multicast tree per PE per VPN, even for PEs
      that do not have sources in their directly attached sites for that
      VPN.

   By comparison, the fourth mechanism doesn't impose either of these
   restrictions, and when P2MP trees are used only necessitates the use
   of one tree per VPN per PE attached to a site with a multicast source
   or RP (or with a candidate BSR, if BSR is used).

   In cases where there are fewer PEs connected with sources than the
   total number of PEs, it reduces the amount of state maintained by
   P-routers compared to the amount required to build an MI-PMSI with
   P2MP trees. Such cases are expected to be typical for multicast VPN
   deployments (see section 4.2.4.1 of [RFC4834]).

3.3.3.  Impact of C-multicast routing on Inter-AS deployments

   Furthermore, co-existence with unicast inter-AS VPN options, and an
   equal level of security for multicast and unicast including in an
   inter-AS context, are specifically mentioned in sections 5.2.6, 5.2.8
   and 5.2.12 of [RFC4834].

   In an inter-AS option B context, an isolation of ASes is obtained as
   PEs don't have visibility of, nor exchange with, PEs of other ASes.
   This property can be preserved if the segmented inter-AS approach and
   BGP-based C-multicast routing are used, but it is not preserved if
   PIM-based signaling is used.

   By comparison, the fourth option (the use of BGP for carrying
   C-Multicast routing) does not have any of the above limitations
   related to inter-AS deployments.

   Additionally, the authors note that the proposed BGP-based approach
   for C-multicast routing provides a good fit with both the segmented
   and non-segmented inter-AS approaches. By contrast, though the PIM-
   based C-multicast routing is usable with segmented inter-AS trees,
   the inter-AS scalability advantage of the approach is lost, since PEs
   in an AS will see the C-multicast routing activity of all other PEs
   of all other ASes.

3.3.4.  Security and robustness

   BGP supports MD5 authentication of its peers for additional security,
   which can directly benefit multicast VPN customer multicast
   routing, whether for intra-AS or inter-AS communications. By
   contrast, with a PIM-based approach, no mechanism providing a
   comparable level of security to authenticate communications between
   remote PEs has yet been fully described
   [I-D.ietf-pim-sm-linklocal], and in any case would require
   significant additional operations for the provider to be usable in a
   multicast VPN context.

   The robustness of the infrastructure, especially the existing
   infrastructure providing unicast VPN connectivity, is key. The
   C-multicast routing function, especially under load, will compete
   with the unicast routing infrastructure. With the PIM-based
   approaches, the unicast and multicast VPN routing functions are
   expected to only compete in the PE, for control plane processing
   resources. In the case of the BGP-based approach, they will compete
   on the PE for processing resources, and in the route reflectors
   (supposing they are used for mVPN routing). It is identified that in
   both cases, mechanisms will be required to arbitrate resources (e.g.
   processing priorities). In the case of PIM-based procedures, this
   arbitration is between the different control plane routing instances
   in the PE.
   In the
   case of the BGP-based approach, this is likely to require using
   distinct BGP sessions for multicast and unicast (e.g. through the use
   of dedicated mVPN BGP route reflectors, or through the use of a
   distinct session with an existing route reflector).

   Multicast routing is dynamic by nature, and multicast VPN routing has
   to follow the VPN customers' multicast routing events. The different
   approaches can be compared on how they are expected to behave in
   scenarios where multicast routing in the VPNs is subject to an
   intense activity. Scalability of each approach under such a load is
   detailed in Appendix A, and the fourth approach (BGP-based) is the
   only one having an O(1) cost for join/leave operations, and with
   which state maintenance is not concentrated on the upstream PE.

   On the other hand, while the BGP-based approach is likely to suffer a
   slowdown under a load that is greater than the available processing
   resources (because of possibly congested TCP sockets), the PIM-based
   approaches would react to such a load by dropping messages, with
   failure-recovery obtained through message refreshes. Thus, the BGP-
   based approach could result in a degradation of join/leave latency
   performance typically spread evenly across all multicast streams
   being joined in that period, while the PIM-based approach could
   result in increased join/leave latency, for some random streams, by a
   multiple of the time between refreshes (e.g. tens of seconds), and
   possibly in some states the adjacency may time out, resulting in
   disruption of multicast streams.

   The behavior of the PIM-based approach under such a load is also
   harder to predict, given that the performance of the "Join
   suppression" mechanism (an important mechanism for this approach to
   scale) will itself be impeded by delays in Join processing.
   For
   these reasons, the BGP-based approach would be able to provide a
   smoother degradation and more predictable behavior under a highly
   dynamic load.

   In fact, both an "evenly spread degradation" and an "unevenly spread
   larger degradation" can be problematic, and what seems important is
   the ability for the VPN backbone operator to (a) limit the amount of
   multicast routing activity that can be triggered by a multicast VPN
   customer, and to (b) provide the best possible independence between
   distinct VPNs. It seems that both of these can be addressed through
   local implementation improvements, and that both the BGP-based and
   PIM-based approaches could be engineered to provide (a) and (b). It
   can be noted though that the BGP approach proposes ways to dampen
   C-multicast route withdrawals and/or advertisements, and thus already
   describes a way to provide (a), while nothing comparable has yet been
   described for the PIM-based approaches (even though it doesn't appear
   difficult). The PIM-based approaches rely on a per-VPN data plane to
   carry the mVPN control plane, and thus may benefit from this first
   level of separation to solve (b).

3.3.5.  C-multicast VPN join latency

   Section 5.1.3 of [RFC4834] states that "the group join delay [...] is
   also considered one important QoS parameter. It is thus RECOMMENDED
   that a multicast VPN solution be designed appropriately in this
   regard". In a multicast VPN context, the "group join delay" of
   interest is the time between a CE sending a PIM Join to its PE and
   the first packet of the corresponding multicast stream being received
   by the CE.
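
   As a rough illustration of the effect noted in Section 3.3.4, where
   a lost PIM Join is only recovered by a later periodic refresh, the
   sketch below estimates the group join delay as a function of the
   number of consecutive lost Joins. The function name, its parameters
   and the example values are assumptions made for illustration only.
   The 60-second default is the PIM-SM default Join/Prune period, and a
   deployment may use a different value.

```python
def pim_join_delay(base_delay_s: float, lost_joins: int,
                   refresh_period_s: float = 60.0) -> float:
    """Estimate the group join delay, in seconds, for a PIM-based approach.

    Hypothetical helper (not part of any specification): each lost
    C-Join is assumed to be recovered only by the next periodic refresh,
    so every loss adds one full refresh period to the loss-free delay.
    """
    if lost_joins < 0:
        raise ValueError("lost_joins must be non-negative")
    return base_delay_s + lost_joins * refresh_period_s


# Loss-free case: only the assumed base delay applies.
no_loss = pim_join_delay(0.5, 0)
# Two consecutive lost Joins: two refresh periods are added.
two_losses = pim_join_delay(0.5, 2)
```

   With the assumed 0.5 s base delay, two lost Joins give a delay of
   120.5 s. This is the "multiple of the time between refreshes" effect,
   as opposed to a delay increase spread across all streams.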

   It is to be noted that the C-multicast routing procedures will only
   impact the group join latency of a given multicast stream for the
   first receiver that is located across the provider backbone from the
   multicast source-connected PE (or the first receivers, in the
   specific case where a specific UMH selection algorithm is used that
   allows distinct UMHs to be selected by distinct downstream PEs).

   The different approaches proposed seem to have different
   characteristics in how they are expected to impact join latency:

   o  the PIM-based approaches minimize the number of control plane
      processing hops between a new receiver-connected PE and the
      source-connected PE and, being datagram-based, introduce minimal
      delay, thereby possibly having a join latency as good as possible
      depending on implementation efficiency

   o  under degraded conditions (packet loss, congestion, high control
      plane load) the PIM-based approach may impact the latency for a
      given multicast stream in an all or nothing manner: if a
      C-multicast routing PIM Join packet is lost, latency can reach a
      high value (a multiple of the periodicity of PIM Join refreshes)

   o  the BGP-based approach uses TCP exchanges, which may introduce an
      additional delay depending on BGP and TCP implementation, but
      which would typically result, under degraded conditions (such as
      packet loss, congestion, high control plane load), in a comparably
      lower increase of latency spread more evenly across the streams

   o  as shown in Appendix A, the BGP-based approach is particular in
      that it removes load from all the PEs (without putting this load
      on the upstream PE for a stream); this improvement of background
      load can bring improved performance when a PE acts as the upstream
      PE for a stream, and thus benefit join latency

   This qualitative comparison of approaches shows that the BGP-based
   approach is designed for a smoother degradation of
   latency under
   degraded conditions such as packet loss, congestion, or high control
   plane load. On the other hand, the PIM-based approaches seem to
   structurally be able to reach the shorter "best-case" group join
   latency (especially compared to deployment of the BGP-based approach
   where route-reflectors are used).

   Doing a quantitative comparison of latencies is not possible without
   referring to specific implementations and benchmarking procedures,
   and would possibly expose different conclusions, especially for best-
   case group join latency for which performance is expected to vary
   with PIM and BGP implementations. We can also note that improving a
   BGP implementation for reduced latency of route processing would not
   only benefit multicast VPN group join latency, but the whole BGP-
   based routing, which means that the need for good BGP/RR performance
   is not specific to multicast VPN routing.

   Last, C-multicast join latency will be impacted by the overall load
   put on the control plane, and the scalability of the C-multicast
   routing approach is thus to be taken into account. As explained in
   Section 3.3.1 and Appendix A, the BGP-based approach will
   provide the best scalability with an increased number of PEs per VPN,
   thereby benefiting group join latency in such higher scale scenarios.

3.3.6.  Extranet

   An illustrative example of the benefit brought by using a C-multicast
   routing approach close to the technique for unicast VPN routing is
   how the "extranet" feature can be implemented: when BGP-based
   mechanisms are used, the already defined and well understood BGP
   route target import/export semantics are just reused and applied to
   BGP mVPN routes.
By contrast, it is not specified how the same feature 586 would be implemented in the context of other C-multicast 587 routing mechanisms, and it is thus unclear how this would bring a 588 comparable consistency benefit, or if it is possible without 589 significant engineering trade-offs given that their control plane is 590 tied to a specific MI-PMSI tunnel. [to be updated when Extranet is 591 described for approaches other than the BGP-based approaches] 593 Note that the support for the Extranet feature is stated as a MUST in 594 Section 5.1.6 of [RFC4834]. 596 3.3.7. Conclusion on C-multicast routing 598 The fourth approach (BGP-based) for customer multicast routing 599 clearly presents some advantages over the PIM-based alternatives. 600 However, it has yet to be deployed within an operational mVPN, and 601 only limited experience exists with its implementations. By 602 contrast, PIM-based mechanisms lack many of these benefits and have 603 identified limitations in how they can handle customer multicast 604 routing load in higher-scale scenarios. Despite these limitations, experience in 605 multiple deployments shows that the "Full PIM peering" approach is 606 operationally viable. 608 Consequently, at the present time and until there is experience with 609 all of the proposed mechanisms, it is not clear which of the above 610 mechanisms should be recommended as the preferred solution to 611 implementers. It would appear prudent for implementations to 612 consider supporting both the fourth (BGP-based) and first (full per- 613 MVPN PIM peering) mechanisms. Further experience with both 614 implementations is likely to be required before some best practice 615 can be defined. 617 The first mechanism (full per-MVPN PIM peering across an MI-PMSI) is 618 the mechanism used by [I-D.rosen-vpn-mcast] and therefore it is 619 deployed and operating in MVPNs today.
The authors recognize that 620 because full per-MVPN PIM peering has been in deployment for some 621 time, the support for this mechanism may be helpful for backwards 622 compatibility and in order to facilitate migration towards the BGP- 623 based approach. 625 Moreover to improve the clarity of the proposed specifications, if 626 the hello suppression and refresh-reduction procedures are not fully 627 specified and the benefit they can bring well identified, the authors 628 would recommend that the proposals for lightweight PIM peering across 629 an MI-PMSI (the second mechanism) and for the unicasting of PIM 630 C-Join/Prune messages (the third mechanism) be removed from the final 631 revision of [I-D.ietf-l3vpn-2547bis-mcast]. 633 3.4. Encapsulation techniques for P-multicast trees 635 In this section the authors will not make any restricting 636 recommendations since the appropriateness of a specific provider core 637 data plane technology will depend on a large number of factors, for 638 example the service provider's currently deployed unicast data plane, 639 many of which are service provider specific. 641 However, implementations should not unreasonably restrict the data 642 plane technology that can be used, and should not force the use of 643 the same technology for different VPNs attached to a single PE. 644 Initial implementations may only support a reduced set of 645 encapsulation techniques and data plane technologies but this should 646 not be a limiting factor that hinders future support for other 647 encapsulation techniques, data plane technologies or 648 interoperability. 650 Section 5.2.4.1 of [RFC4834] states "In a multicast VPN solution 651 extending a unicast L3 PPVPN solution, consistency in the tunneling 652 technology has to be favored: such a solution SHOULD allow the use of 653 the same tunneling technology for multicast as for unicast. 
654 Deployment consistency, ease of operation and potential migrations 655 are the main motivations behind this requirement." 657 Current unicast VPN deployments use a variety of LDP, RSVP-TE and 658 GRE/IP-Multicast for encapsulating customer packets for transport 659 across the provider core of VPN services. In order to allow the same 660 encapsulations to be used for unicast and multicast VPN traffic, it 661 is recommended that multicast VPN standards recommend that 662 implementations support, for multicast VPNs, all the P2MP variants 663 of the encapsulations and signaling protocols that they support for 664 unicast and for which some multipoint extension is defined, such as 665 mLDP, P2MP RSVP-TE and GRE/IP-multicast. 667 All three of the above encapsulation techniques support the building 668 of P2MP multicast trees. In addition, mLDP and GRE/IP-ASM-Multicast 669 implementations may also support the building of MP2MP multicast 670 trees. The use of MP2MP trees may provide some scaling benefits to 671 the service provider as only a single MP2MP tree need be deployed per 672 VPN, thus reducing by an order of magnitude the amount of multicast 673 state that needs to be maintained by P routers. This gain in state 674 is at the expense of bandwidth optimization, since sites that do not 675 have multicast receivers for multicast streams sourced behind a given 676 PE group will still receive packets of such streams, leading to non- 677 optimal bandwidth utilization across the VPN core. One thing to 678 consider is that the use of MP2MP multicast trees will require 679 additional configuration to define the same tree identifier or 680 multicast ASM group address in all PEs (it has been noted that some 681 auto-configuration could be possible for MP2MP trees, but this is 682 not currently supported by the auto-discovery procedures).
[ It has 683 been noted that C-multicast routing schemes not covered in 684 [I-D.ietf-l3vpn-2547bis-mcast] could expose different advantages of 685 MP2MP multicast trees - this is out of scope of this document ] 687 MVPN services can also be supported over a unicast VPN core through 688 the use of ingress PE replication whereby the ingress PE replicates 689 any multicast traffic over the P2P tunnels used to support unicast 690 traffic. While this option does not require the service provider to 691 modify their existing P routers (in terms of protocol support) and 692 does not require maintaining multicast-specific state on the P 693 routers in order for the service provider to be able to deploy a 694 multicast VPN service, the use of ingress PE replication obviously 695 leads to non-optimal bandwidth utilization and it is therefore 696 unlikely to be the long-term solution chosen by service providers. 697 However, ingress PE replication may be useful during some migration 698 scenarios or where a service provider considers the level of 699 multicast traffic on their network to be too low to justify deploying 700 multicast-specific support within their VPN core. 702 All proposed approaches for control plane and data plane can be used 703 to provide aggregation amongst multicast groups within a VPN and 704 amongst different multicast VPNs, and potentially reduce the amount 705 of state to be maintained by P routers. However, the latter (aggregation 706 amongst different multicast VPNs) will require support for 707 upstream-assigned labels on the PEs. Support for upstream-assigned 708 labels may require changes to the data plane processing of the PEs 709 and this should be taken into consideration by service providers 710 considering the use of aggregate S-PMSI tunnels for the specific 711 platforms that the service provider has deployed. 713 3.5.
Inter-AS deployment options 715 There are a number of scenarios that lead to the requirement for 716 inter-AS multicast VPNs, including: 718 1. a service provider may have a large network that they have 719 segmented into a number of ASs. 721 2. a service provider's multicast VPN may consist of a number of ASs 722 due to acquisitions and mergers with other service providers. 724 3. a service provider may wish to interconnect their multicast VPN 725 platform with that of another service provider. 727 The first scenario can be considered the "simplest" because the 728 network is wholly managed by a single service provider under a single 729 strategy and is therefore likely to use a consistent set of 730 technologies across all ASs. 732 The second scenario may be more complex than the first because the 733 strategy and technology choices made for each AS may have been 734 different due to their differing history and the service provider may 735 not have unified (or may be unwilling to unify) the strategy and technology 736 choices for each AS. 738 The third scenario is the most complex because, in addition to the 739 complexity of the second scenario, the ASs are managed by different 740 service providers and therefore may be subject to a different trust 741 model than the other scenarios. 743 Section 5.2.6 of [RFC4834] states that "a solution MUST support 744 inter-AS multicast VPNs, and SHOULD support inter-provider multicast 745 VPNs", "considerations about coexistence with unicast inter-AS VPN 746 Options A, B and C (as described in section 10 of [RFC4364]) are 747 strongly encouraged" and "a multicast VPN solution SHOULD provide 748 inter-AS mechanisms requiring the least possible coordination between 749 providers, and keep the need for detailed knowledge of providers' 750 networks to a minimum - all this being in comparison with 751 corresponding unicast VPN options".
753 Section 8 of [I-D.ietf-l3vpn-2547bis-mcast] addresses these 754 requirements by proposing two approaches for mVPN inter-AS 755 deployments: 757 1. Non-segmented inter-AS tunnels where the multicast tunnels are 758 end-to-end across ASes, so even though the PEs belonging to a 759 given MVPN may be in different ASs the ASBRs play no special role 760 and function merely as P routers (described in section 8.1). 762 2. Segmented inter-AS tunnels where each AS constructs its own 763 separate multicast tunnels which are then 'stitched' together by 764 the ASBRs (described in section 8.2). 766 Section 5.2.6 of [RFC4834] also states "Within each service provider 767 the service provider SHOULD be able on its own to pick the most 768 appropriate tunneling mechanism to carry (multicast) traffic among 769 PEs (just like what is done today for unicast)". The segmented 770 approach is the only one capable of meeting this requirement. 772 The segmented inter-AS solution would appear to offer the largest 773 degree of deployment flexibility to operators. However the non- 774 segmented inter-AS solution can simplify deployment in a restricted 775 number of scenarios and [I-D.rosen-vpn-mcast] only supports the non- 776 segmented inter-AS solution and therefore the non-segmented inter-AS 777 solution is likely to be useful to some operators for backward 778 compatibility and during migration from [I-D.rosen-vpn-mcast] to 779 [I-D.ietf-l3vpn-2547bis-mcast]. 781 The applicability of segmented or non-segmented inter-AS tunnels to a 782 given deployment or inter-provider interconnect will depend on a 783 number of factors specific to each service provider. However, due to 784 the additional deployment flexibility offered by segmented inter-AS 785 tunnels, it is the recommendation of the authors that all 786 implementations should support the segmented inter-AS model. 
787 Additionally, the authors recommend that implementations should 788 consider supporting the non-segmented inter-AS model in order to 789 facilitate co-existence with existing deployments, and as a feature 790 that allows simpler engineering in a restricted set of scenarios, 791 although it is recognized that initial implementations may only 792 support one or the other. 794 4. Co-located RPs 796 Section 5.1.10.1 of [RFC4834] states "In the case of PIM-SM in ASM 797 mode, engineering of the RP function requires the deployment of 798 specific protocols and associated configurations. A service provider 799 may offer to manage customers' multicast protocol operation on their 800 behalf. This implies that it is necessary to consider cases where a 801 customer's RPs are outsourced (e.g., on PEs). Consequently, a VPN 802 solution MAY support the hosting of the RP function in a VR or VRF." 804 However, customers who have already deployed multicast within their 805 networks and have therefore already deployed their own internal RPs 806 are often reluctant to hand over the control of their RPs to their 807 service provider and make use of a co-located RP model, and providing 808 RP co-location on a PE will require the activation of MSDP or the 809 processing of PIM Registers on the PE. Securing the PE routers for 810 such activity requires special care and additional work, and will likely 811 rely on specific features to be provided by the routers themselves. 813 The applicability of the co-located RP model to a given MVPN will 814 thus depend on a number of factors specific to each customer and 815 service provider.
817 It is therefore the authors' recommendation that implementations should 818 support a co-located RP model, but that support for a co-located RP 819 model within an implementation should not restrict deployments to 820 using a co-located RP model: implementations MUST support 821 deployments in which activation of a PIM RP function (PIM Register 822 processing and RP-specific PIM procedures) or VRF MSDP instance is 823 not required on any PE router and where all the RPs are deployed 824 within the customers' networks or CEs. 826 5. Existing deployments 828 Some suggestions provided in this document can be used to 829 incrementally modify currently deployed implementations without 830 hindering these deployments, and without hindering the consistency of 831 the standardized solution, by providing optional per-VRF configuration 832 knobs to support modes of operation compatible with currently 833 deployed implementations, while at the same time using the 834 recommended approach on implementations supporting the standard. 836 In cases where this may not be easily achieved, a recommended 837 approach would be to provide a per-VRF configuration knob that allows 838 incremental per-VPN migration of the mechanisms used by a PE device, 839 which would allow migration with some per-VPN interruption of service 840 (e.g., during a maintenance window). 842 Mechanisms allowing "live" migration by providing concurrent use of 843 multiple alternatives for a given PE and a given VPN are not seen as 844 a priority considering the expected implementation complexity 845 associated with such mechanisms. However, if there happen to be 846 cases where they could be viably implemented relatively simply, such 847 mechanisms may help improve migration management. 849 6. Summary of recommendations 851 The following list summarizes the authors' recommendations.
These 852 recommendations are not intended to prevent the implementation of 853 alternative solutions; rather, they are the authors' recommendations 854 for the mechanisms that should be made mandatory in 855 [I-D.ietf-l3vpn-2547bis-mcast] and therefore be supported by all 856 implementations. 858 It is the authors' recommendation: 860 o that BGP-based auto-discovery be the mandated solution for auto- 861 discovery; 863 o that BGP be the mandated solution for S-PMSI switching signaling; 865 o that implementations support both the BGP-based and the full per- 866 MVPN PIM peering solutions for PE-PE transmission of customer 867 multicast routing until further operational experience is gained 868 with both solutions; 870 o that implementations implement the P2MP variants of the P2P 871 protocols that they already implement, such as mLDP, P2MP RSVP-TE 872 and GRE/IP-Multicast; 874 o that implementations support segmented inter-AS tunnels and 875 consider supporting non-segmented inter-AS tunnels (in order to 876 maintain backwards compatibility and for migration); 878 o that implementations MUST support deployments in which activation of a PIM 879 RP function (PIM Register processing and RP-specific PIM 880 procedures) or VRF MSDP instance is not required on any PE router. 882 7. IANA Considerations 884 This document makes no request to IANA. 886 [ Note to RFC Editor: this section may be removed on publication as 887 an RFC. ] 889 8. Security Considerations 891 This document does not by itself raise any particular security 892 considerations. 894 9. Acknowledgements 896 We would like to thank Adrian Farrel, Eric Rosen, Yakov Rekhter, and 897 Maria Napierala for their feedback that helped shape this document. 899 10. Informative References 901 [RFC4834] Morin, T., "Requirements for Multicast in L3 Provider- 902 Provisioned Virtual Private Networks (PPVPNs)", RFC 4834, 903 April 2007. 905 [I-D.ietf-l3vpn-2547bis-mcast] 906 Rosen, E. and R.
Aggarwal, "Multicast in MPLS/BGP IP 907 VPNs", draft-ietf-l3vpn-2547bis-mcast-06 (work in 908 progress), October 2006. 910 [I-D.ietf-l3vpn-2547bis-mcast-bgp] 911 Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP 912 Encodings and Procedures for Multicast in MPLS/BGP IP 913 VPNs", draft-ietf-l3vpn-2547bis-mcast-bgp-05 (work in 914 progress), June 2008. 916 [I-D.rosen-vpn-mcast] 917 Rosen, E., "Multicast in MPLS/BGP VPNs", 918 draft-rosen-vpn-mcast-08 (work in progress), 919 December 2004. 921 [I-D.raggarwa-l3vpn-2547-mvpn] 922 Aggarwal, R., "Base Specification for Multicast in BGP/ 923 MPLS VPNs", draft-raggarwa-l3vpn-2547-mvpn-00 (work in 924 progress), June 2004. 926 [I-D.ietf-pim-sm-linklocal] 927 Atwood, J., "Authentication and Confidentiality in PIM-SM 928 Link-local Messages", draft-ietf-pim-sm-linklocal-02 (work 929 in progress), November 2007. 931 [I-D.farinacci-pim-port] 932 Farinacci, D., Wijnands, I., Karan, A., Boers, A., and M. 933 Napierala, "A Reliable Transport Mechanism for PIM", 934 draft-farinacci-pim-port-01 (work in progress), May 2008. 936 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 937 Requirement Levels", BCP 14, RFC 2119, March 1997. 939 Appendix A. Scalability of C-multicast routing processing load 941 The main role of multicast routing is to let routers determine that 942 they should start or stop forwarding a given multicast stream on a 943 given link. In the multicast VPN context, this has to be done for 944 each VPN, and the associated function is thus named "customer- 945 multicast routing" or "C-multicast routing"; its role is to let PE 946 routers determine that they should start or stop forwarding the 947 traffic of a given multicast stream toward the remote PEs, on some 948 S-PMSI tunnel. 950 When some "join" message is received by a PE, this PE knows that it 951 should be sending traffic for the corresponding multicast group of 952 the corresponding VPN.
But the reception of a "prune" message from a 953 remote PE is not enough by itself for a PE to know that it should 954 stop forwarding the corresponding multicast traffic: it has to make 955 sure that there aren't any other PEs that still have receivers for 956 this traffic. 958 There are many ways that the "C-multicast routing" building block can 959 be designed so that a PE can determine when it can stop forwarding a 960 given multicast stream toward other PEs: 962 PIM LAN Procedures, by default 963 By default, when PIM LAN procedures are used and a PE Prunes 964 itself from a multicast tree, all other PEs check their own state 965 to know if they are on the tree, in which case they send a PIM 966 Join message to override the Prune. The "did the last receiver 967 leave?" question is thus implicitly answered by all PE routers, 968 for each PIM Prune message. 970 PIM LAN Procedures, with explicit tracking 971 PIM LAN procedures can use an "explicit tracking" approach, where 972 a PE which is the upstream router for a multicast stream maintains 973 an updated list of all neighbors who are joined to the tree. 974 Thus, when it receives a Leave message from a PIM neighbor, it 975 instantly knows the answer to the "did the last receiver leave?" 976 question. 977 In this case, the question is answered by the upstream router 978 alone. The side effect of this "explicit tracking" is that "Join 979 suppression" is not used: the downstream PEs will always send 980 Joins toward the upstream PE, which will have to process them all. 982 BGP-based C-multicast routing 983 When BGP-based procedures are used for C-multicast routing, if no 984 BGP route reflector is used, the "did the last receiver leave?" 985 question is answered as in the PIM "explicit tracking" approach.
986 But, when a BGP route reflector is used (which is expected to be 987 the recommended approach), the role of maintaining an updated list 988 of the PEs that are part of a given multicast tree is taken care of by the 989 route reflector(s). Using plain BGP route selection procedures, 990 the route reflector will withdraw a C-multicast Source Tree Join 991 for a given (C-S,C-G) when no PE advertises one anymore. 992 In this context, the "did the last receiver leave?" question can 993 be said to be answered by the route reflector alone. 994 Furthermore, the BGP route distribution can leverage more than one 995 route reflector: if a hierarchy of route reflectors is used, the 996 "did the last receiver leave?" question is partly answered by each 997 route reflector in the hierarchy. 999 We can see that answering the "last receiver leaves" question is a 1000 significant proportion of the work that the C-multicast routing 1001 building block has to do, and is where the approaches differ most. The 1002 different approaches for handling C-multicast routing can differ in the 1003 amount of processing they require and in how this processing is spread 1004 among the different functions. These differences can be better 1005 estimated by quantifying the amount of message processing and state 1006 maintenance.
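As an illustration of the difference described above, the following Python sketch (not part of any specification) contrasts who answers the "did the last receiver leave?" question under PIM explicit tracking and under BGP with a single route reflector. All class and function names are hypothetical. The model is deliberately simplified: one (C-S,C-G) stream, no PIM Join refreshes, a single route reflector, and one message processed per call.

```python
class ExplicitTrackingUpstream:
    """Upstream PE that itself tracks every joined downstream PE (assumed model)."""

    def __init__(self):
        self.joined = set()
        self.messages_processed = 0

    def join(self, pe):
        self.messages_processed += 1
        self.joined.add(pe)

    def prune(self, pe):
        self.messages_processed += 1
        self.joined.discard(pe)

    @property
    def forwarding(self):
        # The upstream PE alone answers "did the last receiver leave?".
        return bool(self.joined)


class BgpUpstream:
    """Upstream PE that only sees the route as aggregated by the RR."""

    def __init__(self):
        self.forwarding = False
        self.messages_processed = 0

    def advertise(self):
        self.messages_processed += 1
        self.forwarding = True

    def withdraw(self):
        self.messages_processed += 1
        self.forwarding = False


class RouteReflector:
    """Single RR holding the set of PEs advertising a Source Tree Join."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.advertisers = set()
        self.messages_processed = 0

    def advertise(self, pe):
        self.messages_processed += 1
        first = not self.advertisers
        self.advertisers.add(pe)
        if first:
            # Only the first Join is propagated toward the upstream PE.
            self.upstream.advertise()

    def withdraw(self, pe):
        self.messages_processed += 1
        if pe not in self.advertisers:
            return
        self.advertisers.remove(pe)
        if not self.advertisers:
            # The RR answers "did the last receiver leave?" and withdraws.
            self.upstream.withdraw()


def run(pes):
    """Every PE joins, then every PE leaves; returns the three actors."""
    tracker = ExplicitTrackingUpstream()
    bgp_upstream = BgpUpstream()
    rr = RouteReflector(bgp_upstream)
    for pe in pes:
        tracker.join(pe)
        rr.advertise(pe)
    for pe in pes:
        tracker.prune(pe)
        rr.withdraw(pe)
    return tracker, bgp_upstream, rr
```

Under these assumptions, with five receiver-connected PEs joining and then leaving, the explicit tracking upstream PE processes ten messages, whereas the BGP upstream PE processes only the first advertisement and the last withdrawal; the remaining messages are absorbed by the route reflector.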
1008 Though the type of processing, messages and states may vary with the 1009 different approaches, we propose here a rough estimation of the load 1010 of PEs, in terms of number of messages processed and number of 1011 control plane states maintained: a "message processed" being a 1012 message being parsed, a lookup being done, and some action being 1013 taken (such as updating a control plane or data plane state), and a 1014 "state maintained" being a multicast state kept in the control plane 1015 memory of a PE, related to an interface or a PE being subscribed to a 1016 multicast stream (we don't compare the data plane states on PE 1017 routers, which wouldn't vary between the different options chosen). 1019 The following subsections provide such an estimation for each proposed 1020 approach for C-multicast routing, for different phases of the 1021 following scenario: 1023 o one SSM multicast stream is considered - scalability extrapolation 1024 to more than one stream is linear 1026 o only the intra-AS case is considered (with the segmented inter-AS 1027 trees and BGP-based C-multicast routing, #mVPN_PEs and #joined_PEs 1028 should refer to the PEs of the mVPN in the AS, not to all PEs of 1029 the mVPN) 1031 o the scenario is as follows: 1033 * one PE Joins the multicast stream (because a new receiver- 1034 connected site has sent a Join on the PE-CE link), followed by 1035 additional PEs that also join the multicast stream, one after 1036 the other; we evaluate the processing required for the 1037 addition of each PE 1039 * some period of time T passes, without any PE joining or leaving 1040 (baseline) 1042 * all PEs leave, one after the other, until the last one leaves; 1043 we evaluate the processing required for the departure of each PE 1045 o the parameters used are: 1047 * #mVPN_PEs: the number of PEs in the mVPN 1049 * #R_PEs: the number of PEs joining the multicast stream 1051 * #RRs: the number of route reflectors 1053 * T_PIM_r: the time between two
refreshes of a PIM Join (default 1054 is 60s) 1056 The estimation unit used is the "message.equipment" or "m.e", one 1057 "message.equipment" being "one equipment processing one message" (10 1058 m.e being "10 equipments processing each one message", or "5 messages 1059 each processed by 2 equipments", or "1 message processed by 10 1060 equipment", etc.). Similarly for the amount of control plane state, 1061 we count in "state.equipment" or "s.e". 1063 We distinguish three different types of equipments : the upstream PE 1064 for the multicast stream, the RR (if any), and the other PEs (which 1065 are not the upstream PE). The estimation is a total number of 1066 "message.equipment", for each type of equipment. 1068 Additional precisions: 1070 o for PIM, only Join and Prune messages are counted ; the PIM Hellos 1071 are not counted since these are not messages that trigger specific 1072 action in a typical scenario; message processing related to the 1073 PIM Assert mechanism is also not taken into account, because it is 1074 only active in transient state 1076 o for BGP, only UPDATE message for mVPN route carrying C-multicast 1077 routing information are considered 1079 A.1. PIM LAN procedures, by default 1081 +------------+-----------+----------------+----------+--------------+ 1082 | | upstream | other PEs | RR | total | 1083 | | PE (1) | (#mvpn_PEs -1) | (none) | | 1084 +------------+-----------+----------------+----------+--------------+ 1085 | first PE | 1 m.e | #mVPN_PEs-1 | / | #mVPN_PEs | 1086 | joins | | m.e | | m.e | 1087 +------------+-----------+----------------+----------+--------------+ 1088 | for *each* | 1 m.e | #mvpn_PEs-1 | / | #mvpn_PEs | 1089 | additional | | m.e | | m.e | 1090 | PE joining | | | | | 1091 +------------+-----------+----------------+----------+--------------+ 1092 +------------+-----------+----------------+----------+--------------+ 1093 | baseline | T/T_PIMr | (T/T_PIMr) . 
| / | (T/T_PIMr) x | 1094 | processing | m.e | (#mvpn_PEs -1) | | #mvpn_PEs | 1095 | over a | | m.e | | m.e | 1096 | period T | | | | | 1097 +------------+-----------+----------------+----------+--------------+ 1098 | for *each* | 2 m.e | 2(#mvpn_PEs-1) | / | 2 x | 1099 | PE leaving | | m.e | | #mvpn_PEs | 1100 | | | | | m.e | 1101 +------------+-----------+----------------+----------+--------------+ 1102 | the last | 1 m.e | #mvpn_PEs-1 | / | #mvpn_PEs | 1103 | PE leaves | | m.e | | m.e | 1104 +------------+-----------+----------------+----------+--------------+ 1105 | total for | #R_PEs x | (#mvpn_PEs-1) | 0 | #mvpn_PEs x | 1106 | #R_PEs PEs | 2 + | x (#R_PEs) x 2 | | ( 3 x | 1107 | | T/T_PIMr | + T/T_PIMr) . | | #joined_PEs | 1108 | | m.e | (#mvpn_PEs -1) | | + T/T_PIMr ) | 1109 | | | m.e | | m.e | 1110 +------------+-----------+----------------+----------+--------------+ 1111 | total | 1 s.e | #joined_PE s.e | 0 | #R_PEs+1 s.e | 1112 | state | | | | | 1113 | maintained | | | | | 1114 +------------+-----------+----------------+----------+--------------+ 1116 Amount of messages processed for one multicast tree of one VPN - PIM 1117 LAN procedures, by default 1119 We suppose here that the Join suppression and PIM Override mechanisms 1120 are fully effective, ie. that a Join sent by a PE is instantly seen 1121 by other PEs. Strictly speaking, this is not true, and depending on 1122 network delays and timing, there could be cases where more messages 1123 are exchanged. 1125 A.2. 
PIM LAN procedures, with explicit tracking 1127 +--------------+-------------+--------------+--------+--------------+ 1128 | | upstream PE | other PEs | RRs | total | 1129 | | (1) | (#mvpn_PEs | (none) | | 1130 | | | -1) | | | 1131 +--------------+-------------+--------------+--------+--------------+ 1132 | first PE | 1 m.e | 1 m.e (see | / | 2 m.e | 1133 | joins | | note below) | | | 1134 +--------------+-------------+--------------+--------+--------------+ 1135 | for *each* | 1 m.e | 1 m.e (see | / | 2 m.e | 1136 | additional | | note below) | | | 1137 | PE joining | | | | | 1138 +--------------+-------------+--------------+--------+--------------+ 1139 +--------------+-------------+--------------+--------+--------------+ 1140 | baseline | (T/T_PIMr) | (T/T_PIMr) | / | (T/T_PIMr) | 1141 | processing | x #R_PEs | m.e (see | | x #R_PEs m.e | 1142 | over a | m.e | note below) | | | 1143 | period T | | | | | 1144 +--------------+-------------+--------------+--------+--------------+ 1145 | for *each* | 1 m.e | 1 m.e (see | / | 2 m.e | 1146 | PE leaving | | note below) | | | 1147 +--------------+-------------+--------------+--------+--------------+ 1148 | the last PE | 1 m.e | 1 m.e (see | / | 2 m.e | 1149 | leaves | | note below) | | | 1150 +--------------+-------------+--------------+--------+--------------+ 1151 | total for | #R_PEs (2 + | #R_PEs x ( 2 | 0 | #R_PEs x ( 4 | 1152 | #R_PEs PEs | T/T_PIMr) | + T/T_PIMr) | | + T/T_PIMr) | 1153 | | m.e | m.e | | m.e | 1154 +--------------+-------------+--------------+--------+--------------+ 1155 | total state | #R_PEs s.e | #R_PEs s.e | 0 | 2 x #R_PEs | 1156 | maintained | | | | s.e | 1157 +--------------+-------------+--------------+--------+--------------+ 1159 Amount of messages processed for one multicast tree of one VPN - PIM 1160 LAN procedures, with explicit tracking 1162 Note: in this explicit tracking mode, a given Join or Leave message 1163 requires processing only by the upstream PE and the PE sending the
1164 message; indeed, other PEs don't have any action to take; it is to 1165 be noted though that these other PEs will still have to parse the PIM 1166 message, which requires non-zero processing. We make here the 1167 assumption that this is not significant. 1169 A.3. BGP-based 1171 About RR: we suppose that a message has to be processed by r BGP 1172 route reflectors to go from a receiver-connected PE to the source- 1173 connected PE. In practice, r depends on how RRs are meshed, and would 1174 typically be small (typically 1, 2 or 3), and r tends quickly toward 1 (as 1175 soon as there is a receiver-connected PE in each RR cluster). 1177 We make the assumption that RT constraint is used; if not, the amount 1178 of state and message processing with this approach is similar to the 1179 PIM with explicit tracking approach, without the Join refreshes. 1181 +--------------+----------+------------+-------------+--------------+ 1182 | | upstream | other PEs | RRs (#RRs) | total | 1183 | | PE (1) | (#mvpn_PEs | | | 1184 | | | -1) | | | 1185 +--------------+----------+------------+-------------+--------------+ 1186 | first PE | 1 m.e | 1 m.e | r m.e | (r+2) m.e | 1187 | joins | | | | | 1188 +--------------+----------+------------+-------------+--------------+ 1189 | for *each* | 0 | 1 m.e | between 1 | between 2 | 1190 | additional | | | and r m.e | and (r+1) | 1191 | PE joining | | | | m.e | 1192 +--------------+----------+------------+-------------+--------------+ 1193 | baseline | 0 | 0 | 0 | 0 | 1194 | processing | | | | | 1195 | over a | | | | | 1196 | period T | | | | | 1197 +--------------+----------+------------+-------------+--------------+ 1198 | for *each* | 0 | 1 m.e | between 1 | between 2 | 1199 | PE leaving | | | and r m.e | and (r+1) | 1200 | | | | | m.e | 1201 +--------------+----------+------------+-------------+--------------+ 1202 | the last PE | 1 m.e | 1 m.e | r m.e | (r+2) m.e | 1203 | leaves | | | | | 1204
+--------------+----------+------------+-------------+--------------+ 1205 | total for | 2 m.e | #R_PEs x 2 | 2 | 2 (2 x | 1206 | #R_PEs PEs | | m.e | (r+#R_PEs) | #R_PEs + r + | 1207 | | | | m.e | 1) m.e | 1208 +--------------+----------+------------+-------------+--------------+ 1209 | total state | 1 s.e | #R_PEs s.e | approx. | approx. | 1210 | maintained | | | #R_PEs x | (#R_PEs x | 1211 | | | | #RRs s.e | (#RRs+1)) | 1212 | | | | | s.e | 1213 +--------------+----------+------------+-------------+--------------+ 1215 Amount of messages processed for one multicast tree of one VPN - BGP- 1216 based procedures 1218 A.4. Side by side orders of magnitude comparison 1220 This section builds on the previous sections by considering the 1221 orders of magnitude when the number of PEs in a VPN increases. 1223 +------------+-----------------------+--------------+---------------+ 1224 | | PIM LAN Procedures, | PIM LAN | BGP-based | 1225 | | default | Procedures, | | 1226 | | | explicit | | 1227 | | | tracking | | 1228 +------------+-----------------------+--------------+---------------+ 1229 | first PE | O(#mVPN_PEs) | O(1) | O(1) | 1230 | joins | | | | 1231 +------------+-----------------------+--------------+---------------+ 1232 | for *each* | O(#mVPN_PEs) | O(1) | O(1) | 1233 | additional | | | | 1234 | PE joining | | | | 1235 +------------+-----------------------+--------------+---------------+ 1236 | baseline | (T/T_PIMr) x | (T/T_PIMr) x | 0 | 1237 | processing | O(#mvpn_PEs) | O(#R_PEs) | | 1238 | over a | | | | 1239 | period T | | | | 1240 +------------+-----------------------+--------------+---------------+ 1241 | for *each* | O(#mVPN_PEs) | O(1) | O(1) | 1242 | PE leaving | | | | 1243 +------------+-----------------------+--------------+---------------+ 1244 | the last | O(#mVPN_PEs) | O(1) | O(1) | 1245 | PE leaves | | | | 1246 +------------+-----------------------+--------------+---------------+ 1247 | total for | O(#mVPN_PEs x #R_PEs) | O(#R_PEs) x |
O(#R_PEs) | 1248 | #R_PEs PEs | + O(#mVPN_PEs x | (T/T_PIMr) | | 1249 | | T/T_PIMr) | | | 1250 +------------+-----------------------+--------------+---------------+ 1251 | states | O(#R_PEs) | O(#R_PEs) | O(#R_PEs x | 1252 | | | | #RRs) | 1253 +------------+-----------------------+--------------+---------------+ 1254 | notes | (processing and state | (processing | (processing | 1255 | | maintenance are | and state | and state | 1256 | | essentially done by, | maintenance | maintenance | 1257 | | and spread amongst, | is | is | 1258 | | the PEs of the mvpn ; | essentially | essentially | 1259 | | non-upstream PEs have | done on the | done by, and | 1260 | | processing to do) | upstream PE) | spread | 1261 | | | | amongst, the | 1262 | | | | RRs) | 1263 +------------+-----------------------+--------------+---------------+ 1265 Amount of messages processed for one multicast tree of one VPN - PIM 1266 LAN procedures, with explicit tracking 1268 The conclusions that can be drawn from the above are that: 1270 o the PIM LAN Procedures default approach is particular in that all 1271 PEs, including those that are neither upstream nor downstream for 1272 a given message have processing to do, which results in a total 1273 amount of messages to process which is in O(#mVPN_PEs x #R_PEs), 1274 i.e. 
      O(#mVPN_PEs ^ 2) if the proportion of R_PEs is considered
      constant when the number of PEs increases

   o  the two PIM-based approaches refresh Join messages; this is a
      linear factor that does not change the order of magnitude, but
      that can be significant for long-lived streams

   o  the BGP-based approach requires an amount of message processing in
      O(#R_PEs), which is lower than for the two other approaches and is
      independent of the duration of the streams

   o  state maintenance is in the same order of magnitude for all
      approaches, O(#R_PEs), but its distribution is different:

      *  the PIM LAN Procedures default approach fully spreads, and
         minimizes, the amount of state (one state per PE)

      *  the PIM LAN Procedures with explicit tracking concentrate all
         state on the upstream PE

      *  the BGP-based procedures spread all the state across the set
         of route reflectors

   This quantification of message processing is based on a use case
   where each PE with a receiver joins and leaves once.  Drawing
   scalability-related conclusions for other patterns or frequencies of
   change of the set of receiver-connected PEs requires considering the
   cost of each approach for "a new PE joining" and "a (non-last) PE
   leaving".  From this perspective, the "PIM LAN Procedures default
   approach" is the most costly one (processing in O(#mVPN_PEs)),
   whereas the other approaches are in O(1); the "PIM LAN Procedures
   with explicit tracking" reduce the processing to the minimum in that
   case, while the BGP-based approach has a cost increased by a linear
   factor that depends on the number of RRs that have to parse the
   message.

Appendix B.  Switching to S-PMSI

   [ the following point was fixed in -07, and is here for reference
   only ]

   Section 7.2.2.3 of [I-D.ietf-l3vpn-2547bis-mcast] proposes two
   approaches for how a source PE can decide when to start transmitting
   customer multicast traffic on an S-PMSI:

   1.  The source PE sends multicast packets for the (C-S,C-G) on both
       the I-PMSI P-multicast tree and the S-PMSI P-multicast tree
       simultaneously for a pre-configured period of time, letting the
       receiver PEs select the new tree for reception, before switching
       to only the S-PMSI.

   2.  The source PE waits for a pre-configured period of time after
       advertising the (C-S,C-G) entry bound to the S-PMSI before fully
       switching the traffic onto the S-PMSI-bound P-multicast tree.

   The first alternative has essentially two drawbacks:

   o  (C-S,C-G) traffic is sent twice for some period of time, which
      would appear to be at odds with the motivation for switching to an
      S-PMSI in order to optimize the bandwidth used by the multicast
      tree for that stream.

   o  It is unlikely that the switchover can occur without packet loss
      or duplication if the transit delays of the I-PMSI P-multicast
      tree and the S-PMSI P-multicast tree differ.

   By contrast, the second alternative has none of these drawbacks, and
   satisfies the requirement in section 5.1.3 of [RFC4834], which states
   that "[...] a multicast VPN solution SHOULD as much as possible
   ensure that client multicast traffic packets are neither lost nor
   duplicated, even when changes occur in the way a client multicast
   data stream is carried over the provider network".  The second
   alternative also happens to be the one used in existing deployments.

   For these reasons, it is the authors' recommendation to mandate the
   implementation of the second alternative for switching to S-PMSI.
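
   The bandwidth drawback of the first alternative can be illustrated
   with a small sketch.  It is not taken from
   [I-D.ietf-l3vpn-2547bis-mcast]: the function names, the abstract
   time units and the one-packet-per-tick traffic model are assumptions
   made here purely for illustration.  The sketch only counts how many
   copies of each packet the source PE would send under each
   alternative; it does not model the difference in transit delay
   between the two P-multicast trees.

```python
from typing import Callable, Iterable, Optional, Tuple

I_PMSI = "I-PMSI"
S_PMSI = "S-PMSI"

# Hypothetical signature: (now, advertised_at, period) -> trees used at 'now'.
Selector = Callable[[float, Optional[float], float], Tuple[str, ...]]


def trees_alternative_1(now: float, advertised_at: Optional[float],
                        period: float) -> Tuple[str, ...]:
    """Alternative 1: send on both trees during the pre-configured period."""
    if advertised_at is None or now < advertised_at:
        return (I_PMSI,)
    if now < advertised_at + period:
        return (I_PMSI, S_PMSI)
    return (S_PMSI,)


def trees_alternative_2(now: float, advertised_at: Optional[float],
                        period: float) -> Tuple[str, ...]:
    """Alternative 2: stay on the I-PMSI until the period has elapsed."""
    if advertised_at is None or now < advertised_at + period:
        return (I_PMSI,)
    return (S_PMSI,)


def copies_sent(select: Selector, packet_times: Iterable[float],
                advertised_at: Optional[float], period: float) -> int:
    """Total number of packet copies the source PE transmits."""
    return sum(len(select(t, advertised_at, period)) for t in packet_times)


if __name__ == "__main__":
    # Assumed scenario: one packet per tick for 10 ticks, S-PMSI binding
    # advertised at tick 2, pre-configured period of 3 ticks.
    ticks = range(10)
    print(copies_sent(trees_alternative_1, ticks, 2, 3))  # 13
    print(copies_sent(trees_alternative_2, ticks, 2, 3))  # 10
```

   In this assumed scenario the first alternative transmits three extra
   copies (one per tick of the overlap period), whereas the second
   alternative never sends a packet on more than one tree.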
Authors' Addresses

   Thomas Morin (editor)
   France Telecom R&D
   2 rue Pierre Marzin
   Lannion  22307
   France

   Email: thomas.morin@orange-ftgroup.com


   Ben Niven-Jenkins (editor)
   BT
   208 Callisto House, Adastral Park
   Ipswich, Suffolk  IP5 3RE
   UK

   Email: benjamin.niven-jenkins@bt.com


   Yuji Kamite
   NTT Communications Corporation
   Tokyo Opera City Tower
   3-20-2 Nishi Shinjuku, Shinjuku-ku
   Tokyo  163-1421
   Japan

   Email: y.kamite@ntt.com


   Raymond Zhang
   BT
   2160 E. Grand Ave.
   El Segundo  CA 90025
   USA

   Email: raymond.zhang@bt.com


   Nicolai Leymann
   Deutsche Telekom
   Goslarer Ufer 35
   10589 Berlin
   Germany

   Email: nicolai.leymann@t-systems.com


   Nabil Bitar
   Verizon
   40 Sylvan Road
   Waltham, MA  02451
   USA

   Email: nabil.n.bitar@verizon.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.