2 Network Working Group                                Eric C. Rosen (Editor)
3 Internet Draft                                          Cisco Systems, Inc.
4 Intended Status: Standards Track
5 Expires: July 28, 2010                              Rahul Aggarwal (Editor)
6                                                            Juniper Networks

8                                                             January 28, 2010

10                        Multicast in MPLS/BGP IP VPNs

12                   draft-ietf-l3vpn-2547bis-mcast-10.txt

14 Status of this Memo

16 This Internet-Draft is submitted to IETF in full conformance with the
17 provisions of BCP 78 and BCP 79.

19 Internet-Drafts are working documents of the Internet Engineering
20 Task Force (IETF), its areas, and its working groups. Note that
21 other groups may also distribute working documents as
22 Internet-Drafts.

24 Internet-Drafts are draft documents valid for a maximum of six months
25 and may be updated, replaced, or obsoleted by other documents at any
26 time. It is inappropriate to use Internet-Drafts as reference
27 material or to cite them other than as "work in progress."

29 The list of current Internet-Drafts can be accessed at
30 http://www.ietf.org/ietf/1id-abstracts.txt.

32 The list of Internet-Draft Shadow Directories can be accessed at
33 http://www.ietf.org/shadow.html.
35 Copyright and License Notice 37 Copyright (c) 2010 IETF Trust and the persons identified as the 38 document authors. All rights reserved. 40 This document is subject to BCP 78 and the IETF Trust's Legal 41 Provisions Relating to IETF Documents 42 (http://trustee.ietf.org/license-info) in effect on the date of 43 publication of this document. Please review these documents 44 carefully, as they describe your rights and restrictions with respect 45 to this document. Code Components extracted from this document must 46 include Simplified BSD License text as described in Section 4.e of 47 the Trust Legal Provisions and are provided without warranty as 48 described in the Simplified BSD License. 50 This document may contain material from IETF Documents or IETF 51 Contributions published or made publicly available before November 52 10, 2008. The person(s) controlling the copyright in some of this 53 material may not have granted the IETF Trust the right to allow 54 modifications of such material outside the IETF Standards Process. 55 Without obtaining an adequate license from the person(s) controlling 56 the copyright in such materials, this document may not be modified 57 outside the IETF Standards Process, and derivative works of it may 58 not be created outside the IETF Standards Process, except to format 59 it for publication as an RFC or to translate it into languages other 60 than English. 62 Abstract 64 In order for IP multicast traffic within a BGP/MPLS IP VPN (Virtual 65 Private Network) to travel from one VPN site to another, special 66 protocols and procedures must be implemented by the VPN Service 67 Provider. These protocols and procedures are specified in this 68 document. 70 Table of Contents 72 1 Specification of requirements ......................... 6 73 2 Introduction .......................................... 6 74 2.1 Optimality vs Scalability ............................. 6 75 2.1.1 Multicast Distribution Trees .......................... 8 76 2.1.2 Ingress Replication through Unicast Tunnels ........... 9 77 2.2 Overview .............................................. 9 78 2.2.1 Multicast Routing Adjacencies ......................... 9 79 2.2.2 MVPN Definition ....................................... 10 80 2.2.3 Auto-Discovery ........................................ 11 81 2.2.4 PE-PE Multicast Routing Information ................... 12 82 2.2.5 PE-PE Multicast Data Transmission ..................... 12 83 2.2.6 Inter-AS MVPNs ........................................ 13 84 2.2.7 Optionally Eliminating Shared Tree State .............. 14 85 3 Concepts and Framework ................................ 14 86 3.1 PE-CE Multicast Routing ............................... 14 87 3.2 P-Multicast Service Interfaces (PMSIs) ................ 15 88 3.2.1 Inclusive and Selective PMSIs ......................... 16 89 3.2.2 P-Tunnels Instantiating PMSIs ......................... 17 90 3.3 Use of PMSIs for Carrying Multicast Data .............. 19 91 3.4 PE-PE Transmission of C-Multicast Routing ............. 21 92 3.4.1 PIM Peering ........................................... 21 93 3.4.1.1 Full Per-MVPN PIM Peering Across a MI-PMSI ............ 21 94 3.4.1.2 Lightweight PIM Peering Across a MI-PMSI .............. 21 95 3.4.1.3 Unicasting of PIM C-Join/Prune Messages ............... 22 96 3.4.2 Using BGP to Carry C-Multicast Routing ................ 23 97 4 BGP-Based Autodiscovery of MVPN Membership ............ 23 98 5 PE-PE Transmission of C-Multicast Routing ............. 
26 99 5.1 Selecting the Upstream Multicast Hop (UMH) ............ 26 100 5.1.1 Eligible Routes for UMH Selection ..................... 27 101 5.1.2 Information Carried by Eligible UMH Routes ............ 27 102 5.1.3 Selecting the Upstream PE ............................. 28 103 5.1.4 Selecting the Upstream Multicast Hop .................. 30 104 5.2 Details of Per-MVPN Full PIM Peering over MI-PMSI ..... 30 105 5.2.1 PIM C-Instance Control Packets ........................ 31 106 5.2.2 PIM C-instance RPF Determination ...................... 31 107 5.3 Use of BGP for Carrying C-Multicast Routing ........... 32 108 5.3.1 Sending BGP Updates ................................... 32 109 5.3.2 Explicit Tracking ..................................... 33 110 5.3.3 Withdrawing BGP Updates ............................... 34 111 5.3.4 BSR ................................................... 34 112 6 PMSI Instantiation .................................... 35 113 6.1 Use of the Intra-AS I-PMSI A-D Route .................. 35 114 6.1.1 Sending Intra-AS I-PMSI A-D Routes .................... 35 115 6.1.2 Receiving Intra-AS I-PMSI A-D Routes .................. 36 116 6.2 When C-flows are Specifically Bound to P-Tunnels ...... 36 117 6.3 Aggregating Multiple MVPNs on a Single P-tunnel ....... 36 118 6.3.1 Aggregate Tree Leaf Discovery ......................... 37 119 6.3.2 Aggregation Methodology ............................... 37 120 6.3.3 Demultiplexing C-multicast traffic .................... 38 121 6.4 Considerations for Specific Tunnel Technologies ....... 40 122 6.4.1 RSVP-TE P2MP LSPs ..................................... 40 123 6.4.2 PIM Trees ............................................. 42 124 6.4.3 mLDP P2MP LSPs ........................................ 43 125 6.4.4 mLDP MP2MP LSPs ....................................... 43 126 6.4.5 Ingress Replication ................................... 43 127 7 Binding Specific C-flows to Specific P-Tunnels ........ 45 128 7.1 General Considerations ................................ 46 129 7.1.1 At the PE Transmitting the C-flow on the P-Tunnel ..... 46 130 7.1.2 At the PE Receiving the C-flow from the P-Tunnel ...... 47 131 7.2 Optimizing Multicast Distribution via S-PMSIs ......... 49 132 7.3 Announcing the Presence of Unsolicited Flooded Data ... 50 133 7.4 Protocols for Binding C-flows to P-tunnels ............ 51 134 7.4.1 Using BGP S-PMSI A-D Routes ........................... 51 135 7.4.1.1 Advertising C-flow Binding to P-Tunnel ................ 51 136 7.4.1.2 Explicit Tracking ..................................... 53 137 7.4.2 UDP-based Protocol .................................... 53 138 7.4.2.1 Advertising C-flow Binding to P-tunnel ................ 53 139 7.4.2.2 Packet Formats and Constants .......................... 54 140 7.4.3 Aggregation ........................................... 56 141 8 Inter-AS Procedures ................................... 56 142 8.1 Non-Segmented Inter-AS P-Tunnels ...................... 57 143 8.1.1 Inter-AS MVPN Auto-Discovery .......................... 57 144 8.1.2 Inter-AS MVPN Routing Information Exchange ............ 57 145 8.1.3 Inter-AS P-Tunnels .................................... 58 146 8.1.3.1 PIM-Based Inter-AS P-Multicast Trees .................. 58 147 8.1.3.2 The PIM MVPN Join Attribute ........................... 60 148 8.1.3.2.1 Definition ............................................ 60 149 8.1.3.2.2 Usage ................................................. 
60 150 8.2 Segmented Inter-AS P-Tunnels .......................... 61 151 9 Preventing Duplication of Multicast Data Packets ...... 62 152 9.1 Methods for Ensuring Non-Duplication .................. 63 153 9.1.1 Discarding Packets from Wrong PE ...................... 63 154 9.1.2 Single Forwarder Selection ............................ 64 155 9.1.3 Native PIM Methods .................................... 65 156 9.2 Multihomed C-S or C-RP ................................ 65 157 9.3 Switching from the C-RP tree to C-S tree .............. 65 158 9.3.1 How Duplicates Can Occur .............................. 65 159 9.3.2 Solution using Source Active A-D Routes ............... 67 160 10 Eliminating PE-PE Distribution of (C-*,C-G) State ..... 69 161 10.1 Co-locating C-RPs on a PE ............................. 70 162 10.1.1 Initial Configuration ................................. 70 163 10.1.2 Anycast RP Based on Propagating Active Sources ........ 70 164 10.1.2.1 Receiver(s) Within a Site ............................. 70 165 10.1.2.2 Source Within a Site .................................. 71 166 10.1.2.3 Receiver Switching from Shared to Source Tree ......... 71 167 10.2 Using MSDP between a PE and a Local C-RP .............. 71 168 11 Support for PIM-BIDIR C-Groups ........................ 72 169 11.1 The VPN Backbone Becomes the RPL ...................... 74 170 11.1.1 Control Plane ......................................... 74 171 11.1.2 Data Plane ............................................ 75 172 11.2 Partitioned Sets of PEs ............................... 75 173 11.2.1 Partitions ............................................ 75 174 11.2.2 Using PE Distinguisher Labels ......................... 76 175 11.2.3 Partial Mesh of MP2MP P-Tunnels ....................... 77 176 12 Encapsulations ........................................ 77 177 12.1 Encapsulations for Single PMSI per P-Tunnel ........... 77 178 12.1.1 Encapsulation in GRE .................................. 77 179 12.1.2 Encapsulation in IP ................................... 79 180 12.1.3 Encapsulation in MPLS ................................. 79 181 12.2 Encapsulations for Multiple PMSIs per P-Tunnel ........ 80 182 12.2.1 Encapsulation in GRE .................................. 80 183 12.2.2 Encapsulation in IP ................................... 80 184 12.3 Encapsulations Identifying a Distinguished PE ......... 81 185 12.3.1 For MP2MP LSP P-tunnels ............................... 81 186 12.3.2 For Support of PIM-BIDIR C-Groups ..................... 81 187 12.4 General Considerations for IP and GRE Encaps .......... 82 188 12.4.1 MTU (Maximum Transmission Unit) ....................... 82 189 12.4.2 TTL (Time to Live) .................................... 83 190 12.4.3 Avoiding Conflict with Internet Multicast ............. 83 191 12.5 Differentiated Services ............................... 83 192 13 Security Considerations ............................... 83 193 14 IANA Considerations ................................... 85 194 15 Other Authors ......................................... 86 195 16 Other Contributors .................................... 86 196 17 Authors' Addresses .................................... 86 197 18 Normative References .................................. 87 198 19 Informative References ................................ 88 199 1. 
Specification of requirements 201 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 202 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 203 document are to be interpreted as described in [RFC2119]. 205 2. Introduction 207 [RFC4364] specifies the set of procedures that a Service Provider 208 (SP) must implement in order to provide a particular kind of VPN 209 service ("BGP/MPLS IP VPN") for its customers. The service described 210 therein allows IP unicast packets to travel from one customer site to 211 another, but it does not provide a way for IP multicast traffic to 212 travel from one customer site to another. 214 This document extends the service defined in [RFC4364] so that it 215 also includes the capability of handling IP multicast traffic. This 216 requires a number of different protocols to work together. The 217 document provides a framework describing how the various protocols 218 fit together, and also provides detailed specification of some of the 219 protocols. The detailed specification of some of the other protocols 220 is found in pre-existing documents or in companion documents. 222 A BGP/MPLS IP VPN service that supports multicast is known as a 223 "Multicast VPN" or "MVPN". 225 This document and the companion document [MVPN-BGP] both discuss the 226 use of various BGP messages and procedures to provide MVPN support. 227 While every effort has been made to ensure that the two documents are 228 consistent with each other, it is possible that discrepancies have 229 crept in. In the event of any conflict or other discrepancy with 230 respect to the use of BGP in support of MVPN service, [MVPN-BGP] is 231 to be considered to be the authoritative document. 233 Throughout this draft we will use the term "VPN-IP route" to mean a 234 route that is either in the VPN-IPv4 address family [RFC4364] or in 235 the VPN-IPv6 address family [RFC4659]. 237 2.1. Optimality vs Scalability 239 In a "BGP/MPLS IP VPN" [RFC4364], unicast routing of VPN packets is 240 achieved without the need to keep any per-VPN state in the core of 241 the SP's network (the "P routers"). Routing information from a 242 particular VPN is maintained only by the Provider Edge routers (the 243 "PE routers", or "PEs") that attach directly to sites of that VPN. 244 Customer data travels through the P routers in tunnels from one PE to 245 another (usually MPLS Label Switched Paths, LSPs), so to support the 246 VPN service the P routers only need to have routes to the PE routers. 247 The PE-to-PE routing is optimal, but the amount of associated state 248 in the P routers depends only on the number of PEs, not on the number 249 of VPNs. 251 However, in order to provide optimal multicast routing for a 252 particular multicast flow, the P routers through which that flow 253 travels have to hold state that is specific to that flow. A 254 multicast flow is identified by the (source, group) tuple where the 255 source is the IP address of the sender and the group is the IP 256 multicast group address of the destination. Scalability would be 257 poor if the amount of state in the P routers were proportional to the 258 number of multicast flows in the VPNs. Therefore, when supporting 259 multicast service for a BGP/MPLS IP VPN, the optimality of the 260 multicast routing must be traded off against the scalability of the P 261 routers. We explain this below in more detail. 263 If a particular VPN is transmitting "native" multicast traffic over 264 the backbone, we refer to it as an "MVPN". 
By "native" multicast 265 traffic, we mean packets that a CE sends to a PE, such that the IP 266 destination address of the packets is a multicast group address, or 267 the packets are multicast control packets addressed to the PE router 268 itself, or the packets are IP multicast data packets encapsulated in 269 MPLS. 271 We say that the backbone multicast routing for a particular multicast 272 group in a particular VPN is "optimal" if and only if all of the 273 following conditions hold: 275 - When a PE router receives a multicast data packet of that group 276 from a CE router, it transmits the packet in such a way that the 277 packet is received by every other PE router that is on the path 278 to a receiver of that group; 280 - The packet is not received by any other PEs; 282 - While in the backbone, no more than one copy of the packet ever 283 traverses any link. 285 - While in the backbone, if bandwidth usage is to be optimized, the 286 packet traverses minimum cost trees rather than shortest path 287 trees. 289 Optimal routing for a particular multicast group requires that the 290 backbone maintain one or more source-trees that are specific to that 291 flow. Each such tree requires that state be maintained in all the P 292 routers that are in the tree. 294 This would potentially require an unbounded amount of state in the P 295 routers, since the SP has no control of the number of multicast 296 groups in the VPNs that it supports. Nor does the SP have any control 297 over the number of transmitters in each group, nor of the 298 distribution of the receivers. 300 The procedures defined in this document allow an SP to provide 301 multicast VPN service without requiring the amount of state 302 maintained by the P routers to be proportional to the number of 303 multicast data flows in the VPNs. The amount of state is traded off 304 against the optimality of the multicast routing. Enough flexibility 305 is provided so that a given SP can make his own tradeoffs between 306 scalability and optimality. An SP can even allow some multicast 307 groups in some VPNs to receive optimal routing, while others do not. 308 Of course, the cost of this flexibility is an increase in the number 309 of options provided by the protocols. 311 The basic technique for providing scalability is to aggregate a 312 number of customer multicast flows onto a single multicast 313 distribution tree through the P routers. A number of aggregation 314 methods are supported. 316 The procedures defined in this document also accommodate the SP that 317 does not want to build multicast distribution trees in his backbone 318 at all; the ingress PE can replicate each multicast data packet and 319 then unicast each replica through a tunnel to each egress PE that 320 needs to receive the data. 322 2.1.1. Multicast Distribution Trees 324 This document supports the use of a single multicast distribution 325 tree in the backbone to carry all the multicast traffic from a 326 specified set of one or more MVPNs. Such a tree is referred to as an 327 "Inclusive Tree". An Inclusive Tree that carries the traffic of more 328 than one MVPN is an "Aggregate Inclusive Tree". An Inclusive Tree 329 contains, as its members, all the PEs that attach to any of the MVPNs 330 using the tree. 332 With this option, even if each tree supports only one MVPN, the upper 333 bound on the amount of state maintained by the P routers is 334 proportional to the number of VPNs supported, rather than to the 335 number of multicast flows in those VPNs. 
If the trees are 336 unidirectional, it would be more accurate to say that the state is 337 proportional to the product of the number of VPNs and the average 338 number of PEs per VPN. The amount of state maintained by the P 339 routers can be further reduced by aggregating more MVPNs onto a 340 single tree. If each such tree supports a set of MVPNs, (call it an 341 "MVPN aggregation set"), the state maintained by the P routers is 342 proportional to the product of the number of MVPN aggregation sets 343 and the average number of PEs per MVPN. Thus the state does not grow 344 linearly with the number of MVPNs. 346 However, as data from many multicast groups is aggregated together 347 onto a single "Inclusive Tree", it is likely that some PEs will 348 receive multicast data for which they have no need, i.e., some degree 349 of optimality has been sacrificed. 351 This document also provides procedures that enable a single multicast 352 distribution tree in the backbone to be used to carry traffic 353 belonging only to a specified set of one or more multicast groups, 354 from one or more MVPNs. Such a tree is referred to as a "Selective 355 Tree" and more specifically as an "Aggregate Selective Tree" when the 356 multicast groups belong to different MVPNs. By default, traffic from 357 most multicast groups could be carried by an Inclusive Tree, while 358 traffic from, e.g., high bandwidth groups could be carried in one of 359 the "Selective Trees". When setting up the Selective Trees, one 360 should include only those PEs that need to receive multicast data 361 from one or more of the groups assigned to the tree. This provides 362 more optimal routing than can be obtained by using only Inclusive 363 Trees, though it requires additional state in the P routers. 365 2.1.2. Ingress Replication through Unicast Tunnels 367 This document also provides procedures for carrying MVPN data traffic 368 through unicast tunnels from the ingress PE to each of the egress 369 PEs. The ingress PE replicates the multicast data packet received 370 from a CE and sends it to each of the egress PEs using the unicast 371 tunnels. This requires no multicast routing state in the P routers 372 at all, but it puts the entire replication load on the ingress PE 373 router, and makes no attempt to optimize the multicast routing. 375 2.2. Overview 377 2.2.1. Multicast Routing Adjacencies 379 In BGP/MPLS IP VPNs [RFC4364], each CE ("Customer Edge") router is a 380 unicast routing adjacency of a PE router, but CE routers at different 381 sites do not become unicast routing adjacencies of each other. This 382 important characteristic is retained for multicast routing -- a CE 383 router becomes a multicast routing adjacency of a PE router, but CE 384 routers at different sites do not become multicast routing 385 adjacencies of each other. 387 We will use the term "C-tree" to refer to a multicast distribution 388 tree whose nodes include CE routers. (See section 3.1 for further 389 explication of this terminology.) 391 The multicast routing protocol on the PE-CE link is presumed to be 392 PIM ("Protocol Independent Multicast") [PIM-SM]. Both the ASM ("Any 393 Source Multicast") and the SSM ("Source-Specific Multicast") service 394 models are supported. Thus both shared C-trees and source-specific 395 C-trees are supported. Shared C-trees may be unidirectional or 396 bidirectional; in the latter case the multicast routing protocol is 397 presumed to be the BIDIR-PIM [BIDIR-PIM] "variant" of PIM-SM. 
A CE 398 router exchanges "ordinary" PIM control messages with the PE router 399 to which it is attached. 401 Support for PIM-DM ("Dense Mode") is outside the scope of this 402 document. 404 The PEs attaching to a particular MVPN then have to exchange the 405 multicast routing information with each other. Two basic methods for 406 doing this are defined: (1) PE-PE PIM, and (2) BGP. In the former 407 case, the PEs need to be multicast routing adjacencies of each other. 408 In the latter case, they do not. For example, each PE may be a BGP 409 adjacency of a Route Reflector (RR), and not of any other PEs. 411 In order to support the "Carrier's Carrier" model of [RFC4364], mLDP 412 (Label Distribution Protocol Extensions for Multipoint Label Switched 413 Paths) [MLDP] may also be supported on the PE-CE interface. The use 414 of mLDP on the PE-CE interface is described in [MVPN-BGP]. The use 415 of BGP on the PE-CE interface is not within the scope of this 416 document. 418 2.2.2. MVPN Definition 420 An MVPN is defined by two sets of sites, Sender Sites set and 421 Receiver Sites set, with the following properties: 423 - Hosts within the Sender Sites set could originate multicast 424 traffic for receivers in the Receiver Sites set. 426 - Receivers not in the Receiver Sites set should not be able to 427 receive this traffic. 429 - Hosts within the Receiver Sites set could receive multicast 430 traffic originated by any host in the Sender Sites set. 432 - Hosts within the Receiver Sites set should not be able to receive 433 multicast traffic originated by any host that is not in the 434 Sender Sites set. 436 A site could be both in the Sender Sites set and Receiver Sites set, 437 which implies that hosts within such a site could both originate and 438 receive multicast traffic. An extreme case is when the Sender Sites 439 set is the same as the Receiver Sites set, in which case all sites 440 could originate and receive multicast traffic from each other. 442 Sites within a given MVPN may be either within the same, or in 443 different organizations, which implies that an MVPN can be either an 444 Intranet or an Extranet. 446 A given site may be in more than one MVPN, which implies that MVPNs 447 may overlap. 449 Not all sites of a given MVPN have to be connected to the same 450 service provider, which implies that an MVPN can span multiple 451 service providers. 453 Another way to look at MVPN is to say that an MVPN is defined by a 454 set of administrative policies. Such policies determine both Sender 455 Sites set and Receiver Sites set. Such policies are established by 456 MVPN customers, but implemented/realized by MVPN Service Providers 457 using the existing BGP/MPLS VPN mechanisms, such as Route Targets, 458 with extensions, as necessary. 460 2.2.3. Auto-Discovery 462 In order for the PE routers attaching to a given MVPN to exchange 463 MVPN control information with each other, each one needs to discover 464 all the other PEs that attach to the same MVPN. (Strictly speaking, 465 a PE in the Receiver Sites set need only discover the other PEs in 466 the Sender Sites set and a PE in the Sender Sites set need only 467 discover the other PEs in the Receiver Sites set.) This is referred 468 to as "MVPN Auto-Discovery". 470 This document discusses two ways of providing MVPN autodiscovery: 472 - BGP can be used for discovering and maintaining MVPN membership. 473 The PE routers advertise their MVPN membership to other PE 474 routers using BGP. 
A PE is considered to be a "member" of a 475 particular MVPN if it contains a VRF (Virtual Routing and 476 Forwarding table, see [RFC4364]) that is configured to contain 477 the multicast routing information of that MVPN. This 478 auto-discovery option does not make any assumptions about the 479 methods used for transmitting MVPN multicast data packets through 480 the backbone. 482 - If it is known that the PE-PE multicast control packets (i.e., 483 PIM packets) of a particular MVPN are to be transmitted through a 484 non-aggregated Inclusive Tree supporting the ASM service model 485 (e.g., through a Tree that is created by non-SSM PIM-SM or by 486 BIDIR-PIM), and if the PEs attaching to that MVPN are configured 487 with the group address corresponding to that tree, then the PEs 488 can auto-discover each other simply by joining the tree and then 489 multicasting PIM Hellos over the tree. 491 2.2.4. PE-PE Multicast Routing Information 493 The BGP/MPLS IP VPN [RFC4364] specification requires a PE to maintain 494 at most one BGP peering with every other PE in the network. This 495 peering is used to exchange VPN routing information. The use of Route 496 Reflectors further reduces the number of BGP adjacencies maintained 497 by a PE to exchange VPN routing information with other PEs. This 498 document describes various options for exchanging MVPN control 499 information between PE routers based on the use of PIM or BGP. These 500 options have different overheads with respect to the number of 501 routing adjacencies that a PE router needs to maintain to exchange 502 MVPN control information with other PE routers. Some of these options 503 allow the retention of the unicast BGP/MPLS VPN model letting a PE 504 maintain at most one BGP routing adjacency with other PE routers to 505 exchange MVPN control information. BGP also provides reliable 506 transport and uses incremental updates. Another option is the use of 507 the currently existing, "soft state" PIM standard [PIM-SM] that uses 508 periodic complete updates. 510 2.2.5. PE-PE Multicast Data Transmission 512 Like [RFC4364], this document decouples the procedures for exchanging 513 routing information from the procedures for transmitting data 514 traffic. Hence a variety of transport technologies may be used in the 515 backbone. For inclusive trees, these transport technologies include 516 unicast PE-PE tunnels, using encapsulation in MPLS, IP, or GRE 517 ("Generic Routing Encapsulation"), multicast distribution trees 518 created by PIM (either unidirectional in the SSM or ASM service 519 models, or bidirectional) using IP/GRE encapsulation, 520 point-to-multipoint LSPs created by RSVP-TE or mLDP, and 521 multipoint-to-multipoint LSPs created by mLDP. 523 In order to aggregate traffic from multiple MVPNs onto a single 524 multicast distribution tree, it is necessary to have a mechanism to 525 enable the egresses of the tree to demultiplex the multicast traffic 526 received over the tree and to associate each received packet with a 527 particular MVPN. This document specifies a mechanism whereby 528 upstream label assignment [MPLS-UPSTREAM-LABEL] is used by the root 529 of the tree to assign a label to each flow. This label is used by 530 the receivers to perform the demultiplexing. This document also 531 describes procedures based on BGP that are used by the root of an 532 Aggregate Tree to advertise the Inclusive and/or Selective binding 533 and the demultiplexing information to the leaves of the tree. 
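
The following non-normative sketch (Python, with purely illustrative names; nothing here is part of this specification) shows how an egress PE of an Aggregate Tree might organize the demultiplexing state described above, assuming the root has advertised one upstream-assigned label per MVPN:

   class EgressDemux:
       """Table mapping (tree root, upstream-assigned label) to a VRF."""

       def __init__(self):
           self.bindings = {}

       def learn_binding(self, tree_root, upstream_label, vrf_name):
           # Populated from the BGP advertisement sent by the root of the
           # Aggregate Tree; upstream-assigned labels are unique only in
           # the context of the root's address [MPLS-UPSTREAM-LABEL].
           self.bindings[(tree_root, upstream_label)] = vrf_name

       def classify(self, tree_root, upstream_label):
           # Return the VRF (i.e., the MVPN) to which a packet received on
           # the Aggregate Tree belongs, or None if no binding is known.
           return self.bindings.get((tree_root, upstream_label))

   demux = EgressDemux()
   demux.learn_binding("192.0.2.1", 1001, "vrf-mvpn-A")   # example values
   assert demux.classify("192.0.2.1", 1001) == "vrf-mvpn-A"
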
535 This document also describes the data plane encapsulations for
536 supporting the various SP multicast transport options.

538 The specification for aggregating traffic of multiple MVPNs onto a
539 single multipoint-to-multipoint LSP or onto a single bidirectional
540 multicast distribution tree is outside the scope of this document.

542 The specification for using, as selective trees, multicast
543 distribution trees that support the ASM service model is outside the
544 scope of this document. The specification for using
545 multipoint-to-multipoint LSPs as selective trees is outside the scope
546 of this document.

548 This document assumes that when SP multicast trees are used, traffic
549 for a particular multicast group is transmitted by a particular PE on
550 only one SP multicast tree. The use of multiple SP multicast trees
551 for transmitting traffic belonging to a particular multicast group is
552 outside the scope of this document.

554 2.2.6. Inter-AS MVPNs

556 [RFC4364] describes different options for supporting BGP/MPLS IP
557 unicast VPNs whose provider backbones contain more than one
558 Autonomous System (AS). These are known as Inter-AS VPNs. In an
559 Inter-AS VPN, the ASes may belong to the same provider or to
560 different providers. This document describes how Inter-AS MVPNs can
561 be supported for each of the unicast BGP/MPLS VPN Inter-AS options.
562 This document also specifies a model where Inter-AS MVPN service can
563 be offered without requiring a single SP multicast tree to span
564 multiple ASes. In this model, an inter-AS multicast tree consists of
565 a number of "segments", one per AS, which are stitched together at AS
566 boundary points. These are known as "segmented inter-AS trees". Each
567 segment of a segmented inter-AS tree may use a different multicast
568 transport technology.

570 It is also possible to support Inter-AS MVPNs with non-segmented
571 source trees that extend across AS boundaries.

573 2.2.7. Optionally Eliminating Shared Tree State

575 This document also discusses some options and protocol extensions that
576 can be used to eliminate the need for the PE routers to distribute to
577 each other the (*,G) and (*,G,RPT-bit) states that occur when the
578 VPNs are creating unidirectional C-trees to support the ASM service
579 model.

581 3. Concepts and Framework

583 3.1. PE-CE Multicast Routing

585 Support of multicast in BGP/MPLS IP VPNs is modeled closely after
586 support of unicast in BGP/MPLS IP VPNs. That is, a multicast routing
587 protocol will be run on the PE-CE interfaces, such that PE and CE are
588 multicast routing adjacencies on that interface. CEs at different
589 sites do not become multicast routing adjacencies of each other.

591 If a PE attaches to n VPNs for which multicast support is provided
592 (i.e., to n "MVPNs"), the PE will run n independent instances of a
593 multicast routing protocol. We will refer to these multicast routing
594 instances as "VPN-specific multicast routing instances", or more
595 briefly as "multicast C-instances". The notion of a "VRF" ("Virtual
596 Routing and Forwarding Table"), defined in [RFC4364], is extended to
597 include multicast routing entries as well as unicast routing entries.
598 Each multicast routing entry is thus associated with a particular
599 VRF.

601 Whether a particular VRF belongs to an MVPN or not is determined by
602 configuration.
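
As an informal illustration of the preceding paragraphs (a Python sketch with hypothetical names, not part of this specification), a VRF extended with multicast routing entries, and the per-MVPN multicast C-instances run by a PE, might be modeled as follows:

   from dataclasses import dataclass, field

   @dataclass
   class Vrf:
       name: str
       # Whether the VRF belongs to an MVPN is determined by configuration.
       mvpn_enabled: bool = False
       unicast_routes: dict = field(default_factory=dict)   # prefix -> route
       mcast_entries: dict = field(default_factory=dict)    # (C-S, C-G) -> state;
                                                            # "*" as C-S for (C-*, C-G)

   # A PE attached to n MVPNs runs n independent multicast C-instances,
   # one per MVPN-enabled VRF:
   pe_vrfs = [Vrf("vrf-blue", mvpn_enabled=True),
              Vrf("vrf-red", mvpn_enabled=True),
              Vrf("vrf-green")]                   # unicast-only VPN
   c_instances = [v.name for v in pe_vrfs if v.mvpn_enabled]
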
604 In this document, we will not attempt to provide support for every 605 possible multicast routing protocol that could possibly run on the 606 PE-CE link. Rather, we consider multicast C-instances only for the 607 following multicast routing protocols: 609 - PIM Sparse Mode (PIM-SM), supporting the ASM service model 611 - PIM Sparse Mode, supporting the SSM service model 613 - PIM Bidirectional Mode (BIDIR-PIM), which uses bidirectional 614 C-trees to support the ASM service model. 616 In order to support the "Carrier's Carrier" model of [RFC4364], mLDP 617 may also be supported on the PE-CE interface. The use of mLDP on the 618 PE-CE interface is described in [MVPN-BGP]. 620 The use of BGP on the PE-CE interface is not within the scope of this 621 document. 623 As the only multicast C-instances discussed by this document are 624 PIM-based C-instances, we will generally use the term "PIM 625 C-instances" to refer to the multicast C-instances. 627 A PE router may also be running a "provider-wide" instance of PIM, (a 628 "PIM P-instance"), in which it has a PIM adjacency with, e.g., each 629 of its IGP neighbors (i.e., with P routers), but NOT with any CE 630 routers, and not with other PE routers (unless another PE router 631 happens to be an IGP adjacency). In this case, P routers would also 632 run the P-instance of PIM, but NOT a C-instance. If there is a PIM 633 P-instance, it may or may not have a role to play in support of VPN 634 multicast; this is discussed in later sections. However, in no case 635 will the PIM P-instance contain VPN-specific multicast routing 636 information. 638 In order to help clarify when we are speaking of the PIM P-instance 639 and when we are speaking of a PIM C-instance, we will also apply the 640 prefixes "P-" and "C-" respectively to control messages, addresses, 641 etc. Thus a P-Join would be a PIM Join that is processed by the PIM 642 P-instance, and a C-Join would be a PIM Join that is processed by a 643 C-instance. A P-group address would be a group address in the SP's 644 address space, and a C-group address would be a group address in a 645 VPN's address space. A C-Tree is a multicast distribution tree 646 constructed and maintained by the PIM C-instances. A C-flow is a 647 stream of multicast packets with a common C-source address and a 648 common C-group address. We will use the notation "(C-S,C-G)" to 649 identify specific C-flows. If a particular C-tree is a shared tree 650 (whether unidirectional or bidirectional) rather than a 651 source-specific tree, we will sometimes speak of the entire set of 652 flows traveling that tree, identifying the set as "(C-*,C-G)". 654 3.2. P-Multicast Service Interfaces (PMSIs) 656 A PE must have the ability to forward multicast data packets received 657 from a CE to one or more of the other PEs in the same MVPN for 658 delivery to one or more other CEs. 660 We define the notion of a "P-Multicast Service Interface" (PMSI). If 661 a particular MVPN is supported by a particular set of PE routers, 662 then there will be one or more PMSIs connecting those PE routers 663 and/or subsets thereof. A PMSI is a conceptual "overlay" on the P 664 network with the following property: a PE in a given MVPN can give a 665 packet to the PMSI, and the packet will be delivered to some or all 666 of the other PEs in the MVPN, such that any PE receiving the packet 667 will be able to determine the MVPN to which the packet belongs. 
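
To make the abstraction concrete, the following non-normative Python sketch (class names are invented for illustration) captures the defining property of a PMSI: a member PE gives it a packet, some or all of the other member PEs receive the packet, and each receiver can determine the MVPN to which the packet belongs:

   class Pe:
       def __init__(self, name):
           self.name = name

       def receive(self, mvpn, packet):
           # The receiving PE can always determine the MVPN of the packet.
           print(f"{self.name}: received packet of MVPN {mvpn}")

   class Pmsi:
       """A single PMSI belongs to a single MVPN."""

       def __init__(self, mvpn, member_pes):
           self.mvpn = mvpn
           self.member_pes = list(member_pes)

       def send(self, sender, packet):
           # An inclusive PMSI delivers to every other member PE; a
           # selective PMSI would deliver only to a subset.
           for pe in self.member_pes:
               if pe is not sender:
                   pe.receive(self.mvpn, packet)

   pe1, pe2, pe3 = Pe("PE1"), Pe("PE2"), Pe("PE3")
   mi_pmsi = Pmsi("mvpn-A", [pe1, pe2, pe3])
   mi_pmsi.send(pe1, b"multicast data")
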
669 As we discuss below, a PMSI may be instantiated by a number of 670 different transport mechanisms, depending on the particular 671 requirements of the MVPN and of the SP. We will refer to these 672 transport mechanisms as "P-tunnels". 674 For each MVPN, there are one or more PMSIs that are used for 675 transmitting the MVPN's multicast data from one PE to others. We 676 will use the term "PMSI" such that a single PMSI belongs to a single 677 MVPN. However, the transport mechanism that is used to instantiate a 678 PMSI may allow a single P-tunnel to carry the data of multiple PMSIs. 680 In this document we make a clear distinction between the multicast 681 service (the PMSI) and its instantiation. This allows us to separate 682 the discussion of different services from the discussion of different 683 instantiations of each service. The term "P-tunnel" is used to refer 684 to the transport mechanism that instantiates a service. 686 PMSIs are used to carry C-multicast data traffic. The C-multicast 687 data traffic travels along a C-tree, but in the SP backbone all 688 C-trees are tunneled through P-tunnels. Thus we will sometimes talk 689 of a P-tunnel carrying one or more C-trees. 691 Some of the options for passing multicast control traffic among the 692 PEs do so by sending the control traffic through a PMSI; other 693 options do not send control traffic through a PMSI. 695 3.2.1. Inclusive and Selective PMSIs 697 We will distinguish between three different kinds of PMSI: 699 - "Multidirectional Inclusive" PMSI (MI-PMSI) 701 A Multidirectional Inclusive PMSI is one that enables ANY PE 702 attaching to a particular MVPN to transmit a message such that it 703 will be received by EVERY other PE attaching to that MVPN. 705 There is at most one MI-PMSI per MVPN. (Though the P-tunnel or 706 P-tunnels that instantiate an MI-PMSI may actually carry the data 707 of more than one PMSI.) 709 An MI-PMSI can be thought of as an overlay broadcast network 710 connecting the set of PEs supporting a particular MVPN. 712 - "Unidirectional Inclusive" PMSI (UI-PMSI) 714 A Unidirectional Inclusive PMSI is one that enables a particular 715 PE, attached to a particular MVPN, to transmit a message such 716 that it will be received by all the other PEs attaching to that 717 MVPN. There is at most one UI-PMSI per PE per MVPN, though the 718 P-tunnel that instantiates a UI-PMSI may in fact carry the data 719 of more than one PMSI. 721 - "Selective" PMSI (S-PMSI). 723 A Selective PMSI is one that provides a mechanism wherein a 724 particular PE in an MVPN can multicast messages so that they will 725 be received by a subset of the other PEs of that MVPN. There may 726 be an arbitrary number of S-PMSIs per PE per MVPN. The P-tunnel 727 that instantiates a given S-PMSI may carry data from multiple 728 S-PMSIs. 730 We will see in later sections the role played by these different 731 kinds of PMSI. We will use the term "I-PMSI" when we are not 732 distinguishing between "MI-PMSIs" and "UI-PMSIs". 734 3.2.2. P-Tunnels Instantiating PMSIs 736 The P-tunnels that are used to instantiate PMSIs will be referred to 737 as "P-tunnels". A number of different tunnel setup techniques can be 738 used to create the P-tunnels that instantiate the PMSIs. Among these 739 are: 741 - PIM 743 A PMSI can be instantiated as (a set of) Multicast Distribution 744 Trees created by the PIM P-instance ("P-trees"). 746 The multicast distribution trees that instantiate I-PMSIs may be 747 either shared trees or source-specific trees. 
749 This document (along with [MVPN-BGP]) specifies procedures for 750 identifying a particular (C-S,C-G) flow and assigning it to a 751 particular S-PMSI. Such an S-PMSI is most naturally instantiated 752 as a source-specific tree. 754 The use of shared trees (including bidirectional trees) to 755 instantiate S-PMSIs is outside the scope of this document. 757 The use of PIM-DM to create P-tunnels is not supported. 759 P-tunnels may be shared by multiple MVPNs (i.e., a given P-tunnel 760 may be the instantiation of multiple PMSIs), as long as the 761 tunnel encapsulation provides some means of demultiplexing the 762 data traffic by MVPN. 764 - MLDP 766 MLDP Point-to-Multipoint (P2MP) LSPs or Multipoint-to-Multipoint 767 (MP2MP) LSPs can be used to instantiate I-PMSIs. 769 An S-PMSI or a UI-PMSI could be instantiated as a single mLDP 770 P2MP LSP, whereas an MI-PMSI would have to be instantiated as a 771 set of such LSPs (each PE in the MVPN being the root of one such 772 LSP), or as a single MP2MP LSP. 774 Procedures for sharing MP2MP LSPs across multiple MVPNs are 775 outside the scope of this document. 777 The use of MP2MP LSPs to instantiate S-PMSIs is outside the scope 778 of this document. 780 Section 11.2.3 discusses a way of using a partial mesh of MP2MP 781 LSPs to instantiate a PMSI. However, a full specification of the 782 necessary procedures is outside the scope of this document. 784 - RSVP-TE 786 A PMSI may be instantiated as one or more RSVP-TE 787 Point-to-Multipoint (P2MP) LSPs. An S-PMSI or a UI-PMSI would be 788 instantiated as a single RSVP-TE P2MP LSP, whereas a 789 Multidirectional Inclusive PMSI would be instantiated as a set of 790 such LSPs, one for each PE in the MVPN. RSVP-TE P2MP LSPs can be 791 shared across multiple MVPNs. 793 - A Mesh of Unicast P-Tunnels. 795 If a PMSI is implemented as a mesh of unicast P-tunnels, a PE 796 wishing to transmit a packet through the PMSI would replicate the 797 packet, and send a copy to each of the other PEs. 799 An MI-PMSI for a given MVPN can be instantiated as a full mesh of 800 unicast P-tunnels among that MVPN's PEs. A UI-PMSI or an S-PMSI 801 can be instantiated as a partial mesh. 803 It can be seen that each method of implementing PMSIs has its own 804 area of applicability. This specification therefore allows for the 805 use of any of these methods. At first glance, this may seem like an 806 overabundance of options. However, the history of multicast 807 development and deployment should make it clear that there is no one 808 option which is always acceptable. The use of segmented inter-AS 809 trees does allow each SP to select the option which it finds most 810 applicable in its own environment, without causing any other SP to 811 choose that same option. 813 SPECIFYING THE CONDITIONS UNDER WHICH A PARTICULAR TREE BUILDING 814 METHOD IS APPLICABLE IS OUTSIDE THE SCOPE OF THIS DOCUMENT. 816 The choice of the tunnel technique belongs to the sender router and 817 is a local policy decision of that router. The procedures defined 818 throughout this document do not mandate that the same tunnel 819 technique be used for all P-tunnels going through a given provider 820 backbone. It is however expected that any tunnel technique that can 821 be used by a PE for a particular MVPN is also supported by all the 822 other PEs having VRFs for the MVPN. Moreover, the use of ingress 823 replication by any PE for an MVPN, implies that all other PEs MUST 824 use ingress replication for this MVPN. 826 3.3. 
Use of PMSIs for Carrying Multicast Data 828 Each PE supporting a particular MVPN must have a way of discovering 829 the following information: 831 - The set of other PEs in its AS that are attached to sites of that 832 MVPN, and the set of other ASes that have PEs attached to sites 833 of that MVPN. However, if non-segmented inter-AS trees are used 834 (see section 8.1), then each PE needs to know the entire set of 835 PEs attached to sites of that MVPN. 837 - If segmented inter-AS trees are to be used, the set of border 838 routers in its AS that support inter-AS connectivity for that 839 MVPN 841 - If the MVPN is configured to use an MI-PMSI, the information 842 needed to set up and to use the P-tunnels instantiating the 843 MI-PMSI, 845 - For each other PE, whether the PE supports Aggregate Trees for 846 the MVPN, and if so, the demultiplexing information that must be 847 provided so that the other PE can determine whether a packet that 848 it received on an aggregate tree belongs to this MVPN. 850 In some cases the information above is provided by means of the 851 BGP-based auto-discovery procedures discussed in sections 4 and 6.1. 852 In other cases, this information is provided after discovery is 853 complete, by means of procedures discussed in section 7.4. In either 854 case, the information that is provided must be sufficient to enable 855 the PMSI to be bound to the identified P-tunnel, to enable the 856 P-tunnel to be created if it does not already exist, and to enable 857 the different PMSIs that may travel on the same P-tunnel to be 858 properly demultiplexed. 860 If an MVPN uses an MI-PMSI, then the information needed to identify 861 the P-tunnels that instantiate the MI-PMSI has to be known to the PEs 862 attached to the MVPN before any data can be transmitted on the 863 MI-PMSI. This information is either statically configured or 864 auto-discovered (see section 4). The actual process of constructing 865 the P-tunnels (e.g., via PIM, RSVP-TE, or mLDP) SHOULD occur as soon 866 as this information is known. 868 When MI-PMSIs are used, they may serve as the default method of 869 carrying C-multicast data traffic. When we say that an MI-PMSI is 870 the "default" method of carrying C-multicast data traffic for a 871 particular MVPN, we mean that it is not necessary to use any special 872 control procedures to bind a particular C-flow to the MI-PMSI; any 873 C-flows that have not been bound to other PMSIs will be assumed to 874 travel through the MI-PMSI. 876 There is no requirement to use MI-PMSIs as the default method of 877 carrying C-flows. It is possible to adopt a policy in which all 878 C-flows are carried on UI-PMSIs or S-PMSIs. In this case, if an 879 MI-PMSI is not used for carrying routing information it is not needed 880 at all. 882 Even when an MI-PMSI is used as the default method of carrying an 883 MVPN's C-flows, if a particular C-flow has certain characteristics, 884 it may be desirable to migrate it from the MI-PMSI to an S-PMSI. 885 These characteristics, as well as the procedures for migrating a 886 C-flow from an MI-PMSI to an S-PMSI, are discussed in section 7. 888 Sometimes a set of C-flows are traveling the same, shared, C-tree 889 (e.g., either unidirectional or bidirectional), and it may be 890 desirable to move the whole set of C-flows as a unit to an S-PMSI. 891 Procedures for doing this are outside the scope of this 892 specification. 
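
The "default PMSI" behavior described above can be summarized by the following non-normative sketch (Python, illustrative names only): a transmitting PE forwards a C-flow on the S-PMSI to which it has been bound, if any, and otherwise on the MVPN's MI-PMSI, when one is used as the default:

   def select_pmsi(c_source, c_group, spmsi_bindings, mi_pmsi=None):
       """spmsi_bindings maps (C-S, C-G) to an S-PMSI; mi_pmsi is the MVPN's
       MI-PMSI, or None if no MI-PMSI is used as the default data PMSI."""
       bound = spmsi_bindings.get((c_source, c_group))
       if bound is not None:
           return bound          # C-flow explicitly bound to an S-PMSI
       return mi_pmsi            # unbound C-flows default to the MI-PMSI
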
894 Some of the procedures for transmitting C-multicast routing 895 information among the PEs require that the routing information be 896 sent over an MI-PMSI. Other procedures do not use an MI-PMSI to 897 transmit the C-multicast routing information. 899 For a given MVPN, whether an MI-PMSI is used to carry C-multicast 900 routing information is independent from whether an MI-PMSI is used as 901 the default method of carrying the C-multicast data traffic. 903 As previously stated, it is possible to send all C-flows on a set of 904 S-PMSIs, omitting any usage of I-PMSIs. This prevents PEs from 905 receiving data that they don't need, at the cost of requiring 906 additional P-tunnels, and additional signaling to bind the C-flows to 907 P-tunnels. Cost-effective instantiation of S-PMSIs is likely to 908 require Aggregate P-trees, which in turn makes it necessary for the 909 transmitting PE to know which PEs need to receive which multicast 910 streams. This is known as "explicit tracking", and the procedures to 911 enable explicit tracking may themselves impose a cost. This is 912 further discussed in section 7.4.1.2. 914 3.4. PE-PE Transmission of C-Multicast Routing 916 As a PE attached to a given MVPN receives C-Join/Prune messages from 917 its CEs in that MVPN, it must convey the information contained in 918 those messages to other PEs that are attached to the same MVPN. 920 There are several different methods for doing this. As these methods 921 are not interoperable, the method to be used for a particular MVPN 922 must either be configured, or discovered as part of the 923 auto-discovery process. 925 3.4.1. PIM Peering 927 3.4.1.1. Full Per-MVPN PIM Peering Across a MI-PMSI 929 If the set of PEs attached to a given MVPN are connected via a 930 MI-PMSI, the PEs can form "normal" PIM adjacencies with each other. 931 Since the MI-PMSI functions as a broadcast network, the standard PIM 932 procedures for forming and maintaining adjacencies over a LAN can be 933 applied. 935 As a result, the C-Join/Prune messages that a PE receives from a CE 936 can be multicast to all the other PEs of the MVPN. PIM "join 937 suppression" can be enabled and the PEs can send Asserts as needed. 939 This procedure is fully specified in section 5.2. 941 3.4.1.2. Lightweight PIM Peering Across a MI-PMSI 943 The procedure of the previous section has the following 944 disadvantages: 946 - Periodic Hello messages must be sent by all PEs. 948 Standard PIM procedures require that each PE in a particular MVPN 949 periodically multicast a Hello to all the other PEs in that MVPN. 950 If the number of MVPNs becomes very large, sending and receiving 951 these Hellos can become a substantial overhead for the PE 952 routers. 954 - Periodic retransmission of C-Join/Prune messages. 956 PIM is a "soft-state" protocol, in which reliability is assured 957 through frequent retransmissions (refresh) of control messages. 958 This too can begin to impose a large overhead on the PE routers 959 as the number of MVPNs grows. 961 The first of these disadvantages is easily remedied. The reason for 962 the periodic PIM Hellos is to ensure that each PIM speaker on a LAN 963 knows who all the other PIM speakers on the LAN are. However, in the 964 context of MVPN, PEs in a given MVPN can learn the identities of all 965 the other PEs in the MVPN by means of the BGP-based auto-discovery 966 procedure of section 4. In that case, the periodic Hellos would 967 serve no function, and could simply be eliminated. 
(Of course, this
968 does imply a change to the standard PIM procedures.)

970 When Hellos are suppressed, we may speak of "lightweight PIM
971 peering".

973 The periodic refresh of the C-Join/Prunes is not as simple to
974 eliminate. If and when "refresh reduction" procedures are specified
975 for PIM, it may be useful to incorporate them, so as to make the
976 lightweight PIM peering procedures even more lightweight.

978 Lightweight PIM peering is not specified in this document.

980 3.4.1.3. Unicasting of PIM C-Join/Prune Messages

982 PIM does not require that the C-Join/Prune messages that a PE
983 receives from a CE be multicast to all the other PEs; it allows
984 them to be unicast to a single PE, the one that is upstream on the
985 path to the root of the multicast tree mentioned in the Join/Prune
986 message. Note that when the C-Join/Prune messages are unicast, there
987 is no such thing as "join suppression". Therefore, PIM Refresh
988 Reduction may be considered to be a prerequisite for the procedure
989 of unicasting the C-Join/Prune messages.

991 When the C-Join/Prunes are unicast, they are not transmitted on a
992 PMSI at all. Note that the procedure of unicasting the C-Join/Prunes
993 is different from the procedure of transmitting the C-Join/Prunes on
994 an MI-PMSI that is instantiated as a mesh of unicast P-tunnels.

996 If there are multiple PEs that can be used to reach a given C-source,
997 procedures described in sections 5.1 and 9 MUST be used to ensure
998 that duplicate packets do not get delivered.

1000 Procedures for unicasting the PIM control messages are not further
1001 specified in this document.

1003 3.4.2. Using BGP to Carry C-Multicast Routing

1005 It is possible to use BGP to carry C-multicast routing information
1006 from PE to PE, dispensing entirely with the transmission of
1007 C-Join/Prune messages from PE to PE. This is discussed in section 5.3
1008 and fully specified in [MVPN-BGP].

1010 4. BGP-Based Autodiscovery of MVPN Membership

1012 BGP-based autodiscovery is done by means of a new address family, the
1013 MCAST-VPN address family. (This address family also has other uses,
1014 as will be seen later.) Any PE that attaches to an MVPN must issue a
1015 BGP update message containing an NLRI ("Network Layer Reachability
1016 Information" element) in this address family, along with a specific
1017 set of attributes. In this document, we specify the information that
1018 must be contained in these BGP updates in order to provide
1019 auto-discovery. The encoding details, along with the complete set of
1020 detailed procedures, are specified in a separate document [MVPN-BGP].

1022 This section specifies the intra-AS BGP-based autodiscovery
1023 procedures. When segmented inter-AS trees are used, additional
1024 procedures are needed, as specified in [MVPN-BGP]. (When segmented
1025 inter-AS trees are not used, the inter-AS procedures are almost
1026 identical to the intra-AS procedures.)

1028 BGP-based autodiscovery uses a particular kind of MCAST-VPN route
1029 known as an "auto-discovery route", or "A-D route". In particular,
1030 it uses two kinds of "A-D routes", the "Intra-AS I-PMSI A-D Route"
1031 and the "Inter-AS I-PMSI A-D Route". (There are also additional
1032 kinds of A-D routes, such as the Source Active A-D routes, which are
1033 used for purposes that go beyond auto-discovery. These are discussed
1034 in subsequent sections.)
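
As a non-normative illustration of the membership discovery that these A-D routes provide (Python sketch; field and function names are invented, and the actual NLRI encoding is specified in [MVPN-BGP]):

   from dataclasses import dataclass

   @dataclass(frozen=True)
   class IntraAsIpmsiAdRoute:
       originating_pe: str        # IP address of the originating PE
       rd: str                    # RD configured locally for the MVPN
       route_targets: frozenset   # Route Target Extended Communities

   def discovered_pes(received_ad_routes, vrf_import_rts):
       # A PE that imports one of the advertised Route Targets into a VRF
       # treats the advertising PE as a member of that VRF's MVPN.
       return {r.originating_pe
               for r in received_ad_routes
               if r.route_targets & set(vrf_import_rts)}

   ad = IntraAsIpmsiAdRoute("192.0.2.1", "65000:1", frozenset({"RT:65000:1"}))
   assert discovered_pes([ad], {"RT:65000:1"}) == {"192.0.2.1"}
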
1036 The Inter-AS I-PMSI A-D Route is used only when segmented inter-AS
1037 P-tunnels are used, as specified in [MVPN-BGP].

1039 The "Intra-AS I-PMSI A-D route" is originated by the PEs that are
1040 (directly) connected to the site(s) of an MVPN. It is distributed to
1041 other PEs that attach to sites of the MVPN. If segmented Inter-AS
1042 P-Tunnels are used, then the Intra-AS I-PMSI A-D routes are not
1043 distributed outside the AS where they originate; if segmented
1044 Inter-AS P-Tunnels are not used, then the Intra-AS I-PMSI A-D routes
1045 are, despite their name, distributed to all PEs attached to the VPN,
1046 no matter what AS the PEs are in.

1048 The NLRI of an Intra-AS I-PMSI A-D route must contain the following
1049 information:

1051 - The route type (i.e., Intra-AS I-PMSI A-D route)

1053 - The IP address of the originating PE

1055 - An RD ("Route Distinguisher", [RFC4364]) configured locally for
1056 the MVPN. This is an RD that can be prepended to that IP address
1057 to form a globally unique VPN-IP address of the PE.

1059 Intra-AS I-PMSI A-D routes carry the following attributes:

1061 - Route Target Extended Communities attribute.

1063 One or more of these MUST be carried by each Intra-AS I-PMSI A-D
1064 route. If any other PE has one of these Route Targets configured
1065 for import into a VRF, it treats the advertising PE as a member
1066 of the MVPN to which the VRF belongs. This allows each PE to
1067 discover the PEs that belong to a given MVPN. More specifically,
1068 it allows a PE in the Receiver Sites set to discover the PEs in
1069 the Sender Sites set of the MVPN and the PEs in the Sender Sites
1070 set of the MVPN to discover the PEs in the Receiver Sites set of
1071 the MVPN. The PEs in the Receiver Sites set would be configured
1072 to import the Route Targets advertised in the BGP A-D routes by
1073 PEs in the Sender Sites set. The PEs in the Sender Sites set
1074 would be configured to import the Route Targets advertised in the
1075 BGP A-D routes by PEs in the Receiver Sites set.

1077 - PMSI tunnel attribute.

1079 This attribute is present whenever the MVPN uses an MI-PMSI, or
1080 when it uses a UI-PMSI rooted at the originating router. It
1081 contains the following information:

1083 * tunnel technology, which may be one of the following:

1085 + Bidirectional multicast tree created by BIDIR-PIM,

1087 + Source-specific multicast tree created by PIM-SM,
1088 supporting the SSM service model,

1090 + A set of trees (one shared tree and a set of source
1091 trees) created by PIM-SM, supporting the ASM service model,

1093 + Point-to-multipoint LSP created by RSVP-TE,

1095 + Point-to-multipoint LSP created by mLDP,

1097 + multipoint-to-multipoint LSP created by mLDP

1099 + unicast tunnel

1101 * P-tunnel identifier

1103 Before a P-tunnel can be constructed to instantiate the
1104 I-PMSI, the PE must be able to create a unique identifier for
1105 the tunnel. The syntax of this identifier depends on the
1106 tunnel technology used.

1108 Each PE attaching to a given MVPN must be configured with
1109 information specifying the allowable encapsulations to use
1110 for that MVPN, as well as the particular one of those
1111 encapsulations that the PE is to identify in the PMSI Tunnel
1112 Attribute of the I-PMSI Intra-AS A-D routes that it
1113 originates.

1115 * Multi-VPN aggregation capability and demultiplexor value.

1117 This specifies whether the P-tunnel is capable of aggregating
1118 I-PMSIs from multiple MVPNs. This will affect the
1119 encapsulation used.
If aggregation is to be used, a 1120 demultiplexor value to be carried by packets for this 1121 particular MVPN must also be specified. The demultiplexing 1122 mechanism and signaling procedures are described in section 1123 6. 1125 - PE Distinguisher Labels Attribute 1127 Sometimes it is necessary for one PE to advertise an 1128 upstream-assigned MPLS label that identifies another PE. Under 1129 certain circumstances to be discussed later, a PE that is the 1130 root of a multicast P-tunnel will bind an MPLS label value to one 1131 or more of the PEs that belong to the P-tunnel, and will 1132 distribute these label bindings using Intra-AS I-PMSI A-D routes. 1134 We refer to these labels as "PE Distinguisher Labels". 1135 Specification of when this must be done is provided in sections 1136 6.4.4 and 11.2.2. 1138 Note that, as specified in [MPLS-UPSTREAM-LABEL], PE 1139 Distinguisher Label values are unique only in the context of the 1140 IP address identifying the root of the P-tunnel; they are not 1141 necessarily unique per tunnel. 1143 5. PE-PE Transmission of C-Multicast Routing 1145 As a PE attached to a given MVPN receives C-Join/Prune messages from 1146 its CEs in that MVPN, it must convey the information contained in 1147 those messages to other PEs that are attached to the same MVPN. This 1148 is known as the "PE-PE transmission of C-multicast routing 1149 information". 1151 This section specifies the procedures used for PE-PE transmission of 1152 C-multicast routing information. Not every procedure mentioned in 1153 section 3.4 is specified here. Rather, this section focuses on two 1154 particular procedures: 1156 - Full PIM Peering. 1158 This procedure is fully specified herein. 1160 - Use of BGP to distribute C-multicast routing. 1162 This procedure is described herein, but the full specification 1163 appears in [MVPN-BGP]. 1165 Those aspects of the procedures that apply to both of the above are 1166 also specified fully herein. 1168 Specification of other procedures is outside the scope of this 1169 document. 1171 5.1. Selecting the Upstream Multicast Hop (UMH) 1173 When a PE receives a C-Join/Prune message from a CE, the message 1174 identifies a particular multicast flow as belonging either to a 1175 source-specific tree (S,G) or to a shared tree (*,G). Throughout 1176 this section, we use the term C-root to refer to S, in the case of a 1177 source-specific tree, or to the Rendezvous Point (RP) for G, in the 1178 case of (*,G). If the route to the C-root is across the VPN 1179 backbone, then the PE needs to find the "upstream multicast hop" 1180 (UMH) for the (S,G) or (*,G) flow. The "upstream multicast hop" is 1181 either the PE at which (S,G) or (*,G) data packets enter the VPN 1182 backbone, or else the Autonomous System Border Router (ASBR) at 1183 which those data packets enter the local AS when traveling through 1184 the VPN backbone. The process of finding the upstream multicast hop 1185 for a given C-root is known as "upstream multicast hop selection". 1187 5.1.1. Eligible Routes for UMH Selection 1189 In the simplest case, the PE does the upstream multicast hop selection by 1190 looking up the C-root in the unicast VRF associated with the PE-CE 1191 interface over which the C-Join/Prune was received. The route that 1192 matches the C-root will contain the information needed to select the 1193 upstream multicast hop.
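As an illustration only (the VRF data structures below are hypothetical placeholders, not structures defined by this document), the simplest-case lookup is an ordinary longest-prefix match of the C-root against the routes in the VRF:

   import ipaddress

   def lookup_umh_route(vrf_routes, c_root):
       """Longest-prefix-match lookup of the C-root (C-S or C-RP) in the
       unicast VRF associated with the receiving PE-CE interface.

       vrf_routes: dict mapping prefix strings such as "10.1.0.0/16"
                   to opaque route objects carrying next hop, RD, etc.
       """
       addr = ipaddress.ip_address(c_root)
       best_route, best_len = None, -1
       for prefix, route in vrf_routes.items():
           net = ipaddress.ip_network(prefix)
           if addr in net and net.prefixlen > best_len:
               best_route, best_len = route, net.prefixlen
       return best_route   # None if the C-root is not reachable via this VRF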
1195 However, in some cases, the CEs may be distributing to the PEs a 1196 special set of routes that are to be used exclusively for the purpose 1197 of upstream multicast hop selection, and not used for unicast routing 1198 at all. For example, when BGP is the CE-PE unicast routing protocol, 1199 the CEs may be using SAFI 2 to distribute a special set of routes 1200 that are to be used for, and only for, upstream multicast hop 1201 selection. When OSPF [OSPF] is the CE-PE routing protocol, the CE 1202 may use an MT-ID ("Multi-Topology Identifier") [OSPF-MT] of 1 to 1203 distribute a special set of routes that are to be used for, and only 1204 for, upstream multicast hop selection. When a CE uses one of these 1205 mechanisms to distribute to a PE a special set of routes to be used 1206 exclusively for upstream multicast hop selection, these routes are 1207 distributed among the PEs using SAFI 129, as described in [MVPN-BGP]. 1209 Whether the routes used for upstream multicast hop selection are (a) 1210 the "ordinary" unicast routes or (b) a special set of routes that are 1211 used exclusively for upstream multicast hop selection is a matter of 1212 policy. How that policy is chosen, deployed, or implemented is 1213 outside the scope of this document. In the following, we will simply 1214 refer to the set of routes that are used for upstream multicast hop 1215 selection as the "Eligible UMH routes", with no presumptions about the 1216 policy by which this set of routes was chosen. 1218 5.1.2. Information Carried by Eligible UMH Routes 1220 Every route that is eligible for UMH selection SHOULD carry a VRF 1221 Route Import Extended Community [MVPN-BGP]. However, if BGP is used 1222 to distribute C-multicast routing information, or if the route is 1223 from a VRF that belongs to a multi-AS VPN as described in option b of 1224 section 10 of [RFC4364], then the route MUST carry a VRF Route Import 1225 Extended Community. This attribute identifies the PE that originated 1226 the route. 1228 If BGP is used for carrying C-multicast routes, OR if "Segmented 1229 Inter-AS Tunnels" are used, then every UMH route MUST also carry a 1230 Source AS Extended Community [MVPN-BGP]. 1232 These two attributes are used in the upstream multicast hop selection 1233 procedures described below. 1235 5.1.3. Selecting the Upstream PE 1237 The first step in selecting the upstream multicast hop for a given 1238 C-root is to select the upstream PE router for that C-root. 1240 The PE that received the C-Join message from a CE looks in the VRF 1241 corresponding to the interface over which the C-Join was received. 1242 It finds the Eligible UMH route that is the best match for the C-root 1243 specified in that C-Join. Call this the "Installed UMH Route". 1245 Note that the outgoing interface of the Installed UMH Route may be 1246 one of the interfaces associated with the VRF, in which case the 1247 upstream multicast hop is a CE and the route to the C-root is not 1248 across the VPN backbone. 1250 Consider the set of all VPN-IP routes that are: (a) eligible to be 1251 imported into the VRF (as determined by their Route Targets), (b) 1252 eligible to be used for upstream multicast hop selection, and (c) 1253 have exactly the same IP prefix (not necessarily the same RD) as the 1254 Installed UMH Route. 1256 For each route in this set, determine the corresponding upstream PE 1257 and upstream RD. If a route has a VRF Route Import Extended 1258 Community, the route's upstream PE is determined from it.
If a route 1259 does not have a VRF Route Import Extended Community, the route's 1260 upstream PE is determined from the route's BGP next hop attribute. 1261 In either case, the upstream RD is taken from the route's NLRI. 1263 This results in a set of triples of <route, upstream PE, upstream RD>. 1266 Call this the "UMH Route Candidate Set." Then the PE MUST select a 1267 single route from the set to be the "Selected UMH Route". The 1268 corresponding upstream PE is known as the "Selected Upstream PE", and 1269 the corresponding upstream RD is known as the "Selected Upstream RD". 1271 There are several possible procedures that can be used by a PE to 1272 select a single route from the candidate set. 1274 The default procedure, which MUST be implemented, is to select the 1275 route whose corresponding upstream PE address is numerically highest, 1276 where a 32-bit IP address is treated as a 32-bit unsigned integer. 1277 Call this the "default upstream PE selection". For a given C-root, 1278 provided that the routing information used to create the candidate 1279 set is stable, all PEs will have the same default upstream PE 1280 selection. (Though different default upstream PE selections may be 1281 chosen during a routing transient.) 1283 An alternative procedure that MUST be implemented, but which is 1284 disabled by default, is the following. This procedure ensures that, 1285 except during a routing transient, each PE chooses the same upstream 1286 PE for a given combination of C-root and C-G. 1288 1. The PEs in the candidate set are numbered from lowest to highest 1289 IP address, starting from 0. 1291 2. The following hash is performed: 1293 - A bytewise exclusive-or of all the bytes in the C-root 1294 address and the C-G address is performed. 1296 - The result is taken modulo n, where n is the number of PEs 1297 in the candidate set. Call this result N. 1299 The selected upstream PE is then the one that appears in position N 1300 in the list of step 1. 1302 Other hashing algorithms are allowed as well, but not required. 1304 The alternative procedure allows a form of "equal cost load 1305 balancing". Suppose, for example, that from egress PEs PE3 and PE4, 1306 source C-S can be reached, at equal cost, via ingress PE PE1 or 1307 ingress PE PE2. The load balancing procedure makes it possible for 1308 PE1 to be the ingress PE for (C-S,C-G1) data traffic while PE2 is the 1309 ingress PE for (C-S,C-G2) data traffic. 1311 Another procedure, which SHOULD be implemented, is to use the 1312 Installed UMH Route as the Selected UMH Route. If this procedure is 1313 used, the result is likely to be that a given PE will choose the 1314 upstream PE that is closest to it, according to the routing in the SP 1315 backbone. As a result, for a given C-root, different PEs may choose 1316 different upstream PEs. This is useful if the C-root is an anycast 1317 address, and can also be useful if the C-root is in a multihomed site 1318 (i.e., a site that is attached to multiple PEs). However, this 1319 procedure is more likely to lead to steady-state duplication of 1320 traffic unless (a) PEs discard data traffic that arrives from the 1321 "wrong" upstream PE, or (b) data traffic is carried only in 1322 non-aggregated S-PMSIs. This issue is discussed at length in 1323 section 9. 1325 General policy-based procedures for selecting the UMH route are 1326 allowed, but not required, and are not further discussed in this 1327 specification.
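The default and alternative selections above can be summarized with a short Python sketch (illustrative only; the representation of the candidate triples as dictionaries with an "upstream_pe" key is a hypothetical convenience, not an encoding defined by this document):

   import ipaddress

   def default_upstream_pe(candidates):
       """Default procedure: choose the candidate whose upstream PE
       address is numerically highest, treating the address as an
       unsigned integer."""
       return max(candidates,
                  key=lambda c: int(ipaddress.ip_address(c["upstream_pe"])))

   def hash_based_upstream_pe(candidates, c_root, c_g):
       """Alternative procedure: number the candidate PEs in ascending
       address order, XOR all the bytes of the C-root and C-G addresses,
       and take the result modulo the number of candidates."""
       ordered = sorted(candidates,
                        key=lambda c: int(ipaddress.ip_address(c["upstream_pe"])))
       xor = 0
       for byte in (ipaddress.ip_address(c_root).packed +
                    ipaddress.ip_address(c_g).packed):
           xor ^= byte
       return ordered[xor % len(ordered)]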
1329 5.1.4. Selecting the Upstream Multicast Hop 1331 In certain cases, the selected upstream multicast hop is the same as 1332 the selected upstream PE. In other cases, the selected upstream 1333 multicast hop is the ASBR that is the "BGP next hop" of the Selected 1334 UMH Route. 1336 If the selected upstream PE is in the local AS, then the selected 1337 upstream PE is also the selected upstream multicast hop. This is the 1338 case if any of the following conditions holds: 1340 - The selected UMH route has a Source AS Extended Community, and 1341 the Source AS is the same as the local AS, or 1343 - The selected UMH route does not have a Source AS Extended 1344 Community, but the route's BGP next hop is the same as the 1345 upstream PE. 1347 Otherwise, the selected upstream multicast hop is an ASBR. The 1348 method of determining just which ASBR it is depends on the particular 1349 inter-AS signaling method being used (PIM or BGP), and on whether 1350 segmented or non-segmented inter-AS tunnels are used. These details 1351 are presented in later sections. 1353 5.2. Details of Per-MVPN Full PIM Peering over MI-PMSI 1355 When an MVPN uses an MI-PMSI, the C-instances of that MVPN can treat 1356 the MI-PMSI as a LAN interface, and form full PIM adjacencies with 1357 each other over that "LAN interface". 1359 The use of PIM when an MI-PMSI is not in use is outside the scope of 1360 this document. 1362 To form full PIM adjacencies, the PEs execute the standard PIM 1363 procedures on the "LAN interface", including the generation and 1364 processing of PIM Hello, Join/Prune, Assert, DF (Designated 1365 Forwarder) election, and other PIM control packets. These are 1366 executed independently for each C-instance. PIM "join suppression" 1367 SHOULD be enabled. 1369 5.2.1. PIM C-Instance Control Packets 1371 All IPv4 PIM C-Instance control packets of a particular MVPN are 1372 addressed to the ALL-PIM-ROUTERS (224.0.0.13) IP destination address, 1373 and transmitted over the MI-PMSI of that MVPN. While in transit in 1374 the P-network, the packets are encapsulated as required for the 1375 particular kind of P-tunnel that is being used to instantiate the 1376 MI-PMSI. Thus the C-instance control packets are not processed by 1377 the P routers, and MVPN-specific PIM routes can be extended from site 1378 to site without appearing in the P routers. 1380 The handling of IPv6 PIM C-Instance control packets will be specified 1381 in a follow-on document. 1383 As specified in section 5.1.2, when a PE distributes VPN-IP routes 1384 that are eligible for use as UMH routes, the PE MUST include a VRF 1385 Route Import Extended Community, containing an IP address of the PE, with each route. For a given MVPN, a 1386 single such IP address MUST be used, and that same IP address MUST be 1387 used as the source address in all PIM control packets for that MVPN. 1389 Note that BSR ("Bootstrap Router Mechanism for PIM") [BSR] messages 1390 are treated the same as PIM C-instance control packets, and BSR 1391 processing is regarded as an integral part of the PIM C-instance 1392 processing. 1394 5.2.2. PIM C-instance RPF Determination 1396 Although the MI-PMSI is treated by PIM as a LAN interface, unicast 1397 routing is NOT run over it, and there are no unicast routing 1398 adjacencies over it. It is therefore necessary to specify special 1399 procedures for determining when the MI-PMSI is to be regarded as the 1400 "RPF Interface" for a particular C-address. 1402 The PE follows the procedures of section 5.1 to determine the 1403 selected UMH route.
If that route is NOT a VPN-IP route learned from 1404 BGP as described in [RFC4364], or if that route's outgoing interface 1405 is one of the interfaces associated with the VRF, then ordinary PIM 1406 procedures for determining the RPF interface apply. 1408 However, if the selected UMH route is a VPN-IP route whose outgoing 1409 interface is not one of the interfaces associated with the VRF, then 1410 PIM will consider the RPF interface to be the MI-PMSI associated with 1411 the VPN-specific PIM instance. 1413 Once PIM has determined that the RPF interface for a particular 1414 C-root is the MI-PMSI, it is necessary for PIM to determine the "RPF 1415 neighbor" for that C-root. This will be one of the other PEs with which it has 1416 a PIM adjacency over the MI-PMSI. In particular, it will be the 1417 "selected upstream PE" as defined in section 5.1. 1419 5.3. Use of BGP for Carrying C-Multicast Routing 1421 It is possible to use BGP to carry C-multicast routing information 1422 from PE to PE, dispensing entirely with the transmission of 1423 C-Join/Prune messages from PE to PE. This section describes the 1424 procedures for carrying intra-AS multicast routing information. 1425 Inter-AS procedures are described in section 8. The complete 1426 specification of both sets of procedures and of the encodings can be 1427 found in [MVPN-BGP]. 1429 5.3.1. Sending BGP Updates 1431 The MCAST-VPN address family is used for this purpose. MCAST-VPN 1432 routes used for the purpose of carrying C-multicast routing 1433 information are distinguished from those used for the purpose of 1434 carrying auto-discovery information by means of a "route type" field 1435 that is encoded in the NLRI. The following information is 1436 required in BGP to advertise the MVPN routing information. The NLRI 1437 contains: 1439 - The type of C-multicast route. 1441 There are two types: 1443 * source tree join 1445 * shared tree join 1447 - The C-Group address. 1449 - The C-Source address. (In the case of a shared tree join, this is 1450 the address of the C-RP.) 1452 - The Selected Upstream RD corresponding to the C-root address 1453 (determined by the procedures of section 5.1). 1455 Whenever a C-multicast route is sent, it must also carry the Selected 1456 Upstream Multicast Hop corresponding to the C-root address 1457 (determined by the procedures of section 5.1). The selected upstream 1458 multicast hop must be encoded as part of a Route Target Extended 1459 Community, to facilitate the optional use of filters that can 1460 prevent the distribution of the update to BGP speakers other than the 1461 upstream multicast hop. See section 10.1.3 of [MVPN-BGP] for the 1462 details. 1464 There is no C-multicast route corresponding to the PIM function of 1465 pruning a source off the shared tree when a PE switches from a 1466 (C-*,C-G) tree to a (C-S,C-G) tree. Section 9 of this document 1467 specifies a mandatory procedure that ensures that if any PE joins a 1468 (C-S,C-G) source tree, all other PEs that have joined or will join 1469 the (C-*,C-G) shared tree will also join the (C-S,C-G) source tree. 1470 This eliminates the need for a C-multicast route that prunes C-S off 1471 the (C-*,C-G) shared tree when switching from a (C-*,C-G) tree 1472 to a (C-S,C-G) tree. 1474 5.3.2. Explicit Tracking 1476 Note that the upstream multicast hop is NOT part of the NLRI in the 1477 C-multicast BGP routes.
This means that if several PEs join the same 1478 C-tree, the BGP routes they distribute to do so are regarded by BGP 1479 as comparable routes, and only one will be installed. If a route 1480 reflector is being used, this further means that the PE that is used 1481 to reach the C-source will know only that one or more of the other 1482 PEs have joined the tree, but it won't know which ones. That is, this 1483 BGP update mechanism does not provide "explicit tracking". Explicit 1484 tracking is not provided by default because it increases the amount 1485 of state needed and thus decreases scalability. Also, as 1486 constructing the C-PIM messages to send "upstream" for a given tree 1487 does not depend on knowing all the PEs that are downstream on that 1488 tree, there is no reason for the C-multicast route type updates to 1489 provide explicit tracking. 1491 There are some cases in which explicit tracking is necessary in order 1492 for the PEs to set up certain kinds of P-trees. There are other 1493 cases in which explicit tracking is desirable in order to determine 1494 how to optimally aggregate multicast flows onto a given aggregate 1495 tree. As these functions have to do with the setting up of 1496 infrastructure in the P-network, rather than with the dissemination 1497 of C-multicast routing information, any explicit tracking that is 1498 necessary is handled by sending a particular type of A-D route known 1499 as a "Leaf A-D route". 1501 Whenever a PE sends an A-D route with a PMSI Tunnel attribute, it can 1502 set a bit in the PMSI Tunnel attribute indicating "Leaf Information 1503 Required". A PE that installs such an A-D route MUST respond by 1504 generating a Leaf A-D route, indicating that it needs to join (or 1505 be joined to) the specified PMSI tunnel. Details can be found in 1506 [MVPN-BGP]. 1508 5.3.3. Withdrawing BGP Updates 1510 A PE removes itself from a C-multicast tree (shared or source) by 1511 withdrawing the corresponding BGP update. 1513 If a PE has pruned a C-source from a shared C-multicast tree, and it 1514 needs to "unprune" that source from that tree, it does so by 1515 withdrawing the route that pruned the source from the tree. 1517 5.3.4. BSR 1519 BGP does not provide a method for carrying the control information of 1520 BSR packets received by a PE from a CE. BSR is supported by 1521 transmitting the BSR control messages from one PE in an MVPN to all 1522 the other PEs in that MVPN. 1524 When a PE needs to transmit a BSR message for a particular MVPN to 1525 other PEs, it must put its own IP address into the BSR message as the 1526 IP source address. As specified in section 5.1.2, when a PE 1527 distributes VPN-IP routes that are eligible for use as UMH routes, 1528 the PE MUST include a VRF Route Import Extended Community, containing 1529 an IP address of the PE, with each route. For a given MVPN, a single 1530 such IP address MUST be used, and that same IP address MUST be used as 1531 the source address in all BSR packets that the PE transmits to other PEs. 1533 The BSR message may be transmitted over any PMSI that will deliver 1534 the message to all the other PEs in the MVPN. If no such PMSI has 1535 been instantiated yet, then an appropriate P-tunnel must be 1536 advertised, and the C-flow whose C-source address is the address of 1537 the PE itself, and whose multicast group is ALL-PIM-ROUTERS 1538 (224.0.0.13), must be bound to it. This can be done using the 1539 procedures described in sections 7.3 and 7.4.
Note that this is NOT 1540 meant to imply that the other PIM control packets from the PIM 1541 C-instance are to be transmitted to the other PEs. 1543 When a PE receives a BSR message for a particular MVPN from some 1544 other PE, the PE accepts the message only if the IP source address in 1545 that message is the address of the selected upstream PE (see section 5.1.3) for the 1546 IP address of the Bootstrap router. Otherwise, the PE simply discards 1547 the packet. If the PE accepts the packet, it does normal BSR 1548 processing on it, and may forward a BSR message to one or more CEs as 1549 a result. 1551 6. PMSI Instantiation 1553 This section provides the procedures for using P-tunnels to 1554 instantiate a PMSI. It describes the procedures for setting up and 1555 maintaining the P-tunnels, as well as for sending and receiving 1556 C-data and/or C-control messages on the P-tunnels. Procedures for 1557 binding particular C-flows to particular P-tunnels are, however, 1558 discussed in section 7. 1560 PMSIs can be instantiated either by P-multicast trees or by PE-PE 1561 unicast tunnels. In the latter case, the PMSI is said to be 1562 instantiated by "ingress replication." 1564 This specification supports a number of different methods for setting 1565 up P-multicast trees, and these are detailed below. A P-tunnel may 1566 support a single VPN (a non-aggregated P-multicast tree), or multiple 1567 VPNs (an aggregated P-multicast tree). 1569 6.1. Use of the Intra-AS I-PMSI A-D Route 1571 6.1.1. Sending Intra-AS I-PMSI A-D Routes 1573 When a PE is provisioned to have one or more VRFs that provide MVPN 1574 support, the PE announces its MVPN membership information using 1575 Intra-AS I-PMSI A-D routes, as discussed in section 4 and detailed in 1576 section 9.1.1 of [MVPN-BGP]. (Under certain conditions, detailed in 1577 [MVPN-BGP], the Intra-AS I-PMSI A-D route may be omitted.) 1579 Generally, the Intra-AS I-PMSI A-D route will have a PMSI Tunnel 1580 Attribute that identifies a P-tunnel that is being used to 1581 instantiate the I-PMSI. Section 9.1.1 of [MVPN-BGP] details certain 1582 conditions under which the PMSI Tunnel Attribute may be omitted (or 1583 in which a PMSI Tunnel Attribute with the "no tunnel information 1584 present" bit may be sent). 1586 As a special case, when (a) C-PIM control messages are to be sent 1587 through an MI-PMSI, and (b) the MI-PMSI is instantiated by a P-tunnel 1588 technique for which each PE needs to know only a single P-tunnel 1589 identifier per VPN, then the use of the Intra-AS I-PMSI A-D Routes 1590 MAY be omitted, and static configuration of the tunnel identifier 1591 used instead. However, this is not recommended for long-term use, 1592 and in all other cases, the Intra-AS I-PMSI A-D routes MUST be used. 1594 The PMSI tunnel attribute MAY contain an upstream-assigned MPLS 1595 label, assigned by the PE originating the Intra-AS I-PMSI A-D route. 1596 If this label is present, the P-tunnel can be carrying data from 1597 several MVPNs. The label is used on the data packets traveling 1598 through the tunnel to identify the MVPN to which those data packets 1599 belong. (The specified label identifies the packet as belonging to 1600 the MVPN that is identified by the RTs of the Intra-AS I-PMSI A-D 1601 route.) 1603 See section 12.2 for details on how to place the label in the 1604 packet's label stack. 1606 The Intra-AS I-PMSI A-D Route may contain a "PE Distinguisher Labels" 1607 Attribute. This contains a set of bindings between upstream-assigned 1608 labels and PE addresses.
The PE that originated the route may use 1609 this to bind an upstream-assigned label to one or more of the other 1610 PEs that belong to the same MVPN. The way in which PE Distinguisher 1611 Labels are used is discussed in sections 6.4.1, 6.4.2, 6.4.4, 11.2.2, 1612 and 12.3. 1614 6.1.2. Receiving Intra-AS I-PMSI A-D Routes 1616 What a PE does when it receives an Intra-AS I-PMSI A-D route for a particular MVPN 1617 depends on the particular P-tunnel technology that is being used by 1618 that MVPN. If the P-tunnel technology requires tunnels to be built 1619 by means of receiver-initiated joins, the PE SHOULD join the tunnel 1620 immediately. 1622 6.2. When C-flows are Specifically Bound to P-Tunnels 1624 This situation is discussed in section 7. 1626 6.3. Aggregating Multiple MVPNs on a Single P-tunnel 1628 When a P-multicast tree is shared across multiple MVPNs, it is termed 1629 an "Aggregate Tree". The procedures described in this document allow 1630 a single SP multicast tree to be shared across multiple MVPNs. Unless 1631 otherwise specified, a P-multicast tree technology supports 1632 aggregation. 1634 All procedures that are specific to multi-MVPN aggregation are 1635 OPTIONAL and are explicitly pointed out. 1637 Aggregate Trees allow a single P-multicast tree to be used across 1638 multiple MVPNs, so that state in the SP core grows per-set-of-MVPNs 1639 and not per MVPN. Depending on the congruence of the aggregated 1640 MVPNs, this may result in trading off optimality of multicast 1641 routing. 1643 An Aggregate Tree can be used by a PE to provide a UI-PMSI or 1644 MI-PMSI service for more than one MVPN. When this is the case, the 1645 Aggregate Tree is said to have an inclusive mapping. 1647 6.3.1. Aggregate Tree Leaf Discovery 1649 BGP MVPN membership discovery (section 4) allows a PE to determine 1650 the different Aggregate Trees that it should create and the MVPNs 1651 that should be mapped onto each such tree. The leaves of an Aggregate 1652 Tree are determined by the PEs, supporting aggregation, that belong 1653 to any of the MVPNs that are mapped onto the tree. 1655 If an Aggregate Tree is used to instantiate one or more S-PMSIs, then 1656 it may be desirable for the PE at the root of the tree to know which 1657 PEs (in its MVPN) are receivers on that tree. This enables the PE to 1658 decide when to aggregate two S-PMSIs, based on congruence (as 1659 discussed in the next section). Thus explicit tracking may be 1660 required. Since the procedures for disseminating C-multicast routes 1661 do not provide explicit tracking, a type of A-D route known as a 1662 "Leaf A-D Route" is used. The PE that wants to assign a particular 1663 C-multicast flow to a particular Aggregate Tree can send an A-D route 1664 that elicits Leaf A-D routes from the PEs that need to receive that 1665 C-multicast flow. This provides the explicit tracking information 1666 needed to support the aggregation methodology discussed in the next 1667 section. For more details on Leaf A-D routes, please refer to 1668 [MVPN-BGP]. 1670 6.3.2. Aggregation Methodology 1672 This document does not specify the mandatory implementation of any 1673 particular set of rules for determining whether or not the PMSIs of 1674 two particular MVPNs are to be instantiated by the same Aggregate 1675 Tree. This determination can be made by implementation-specific 1676 heuristics, by configuration, or even perhaps by the use of offline 1677 tools.
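As one illustration of the kind of implementation-specific heuristic referred to above (it is not a procedure defined by this document), an implementation might aggregate two MVPNs onto one P-multicast tree only when the overlap of their member PE sets exceeds a configured threshold, along the lines of the "percentage of sites in common" knob discussed below:

   def may_share_aggregate_tree(pes_mvpn_a, pes_mvpn_b, overlap_threshold=0.8):
       """Hypothetical congruence check: allow the two MVPNs to share an
       Aggregate Tree only if the overlap of their member PE sets is at
       least 'overlap_threshold' of the larger membership."""
       a, b = set(pes_mvpn_a), set(pes_mvpn_b)
       if not a or not b:
           return False
       overlap = len(a & b) / max(len(a), len(b))
       return overlap >= overlap_threshold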
1679 It is the intention of this document that the control procedures will 1680 always result in all the PEs of an MVPN agreeing on the PMSIs that 1681 are to be used and on the tunnels used to instantiate those PMSIs. 1683 This section discusses potential methodologies with respect to 1684 aggregation. 1686 The "congruence" of aggregation is defined by the amount of overlap 1687 in the leaves of the customer trees that are aggregated on an SP tree. 1688 For Aggregate Trees with an inclusive mapping, the congruence depends 1689 on the overlap in the membership of the MVPNs that are aggregated on 1690 the tree. If there is complete overlap, i.e., all MVPNs have exactly 1691 the same sites, aggregation is perfectly congruent. As the overlap 1692 between the MVPNs that are aggregated decreases, i.e., the number of 1693 sites that are common across all the MVPNs decreases, the congruence 1694 decreases. 1696 If aggregation is done such that it is not perfectly congruent, a PE 1697 may receive traffic for MVPNs to which it doesn't belong. As the 1698 amount of multicast traffic in these unwanted MVPNs increases, 1699 aggregation becomes less optimal with respect to delivered traffic. 1700 Hence, there is a tradeoff between reducing state and delivering 1701 unwanted traffic. 1703 An implementation should provide knobs to control the congruence of 1704 aggregation. These knobs are implementation dependent. Configuring 1705 the percentage of sites that MVPNs must have in common to be 1706 aggregated is an example of such a knob. This will allow an SP to 1707 deploy aggregation depending on the MVPN membership and traffic 1708 profiles in its network. If different PEs or servers are setting up 1709 Aggregate Trees, this will also allow a service provider to engineer 1710 the maximum number of unwanted MVPNs that a particular PE may receive 1711 traffic for. 1713 6.3.3. Demultiplexing C-multicast traffic 1715 If a P-multicast tree is associated with only one MVPN, determining 1716 the P-multicast tree on which a packet was received is sufficient to 1717 determine the packet's MVPN. All that the egress PE needs to know is 1718 the MVPN the P-multicast tree is associated with. 1720 When multiple MVPNs are aggregated onto one P-Multicast tree, 1721 determining the tree over which the packet is received is not 1722 sufficient to determine the MVPN to which the packet belongs. The 1723 packet must also carry some demultiplexing information to allow the 1724 egress PEs to determine the MVPN to which the packet belongs. Since 1725 the packet has been multicast through the P network, any given 1726 demultiplexing value must have the same meaning to all the egress 1727 PEs. The demultiplexing value is an MPLS label that corresponds to 1728 the multicast VRF to which the packet belongs. This label is placed 1729 by the ingress PE immediately beneath the P-Multicast tree header. 1730 Each of the egress PEs must be able to associate this MPLS label with 1731 the same MVPN. If downstream label assignment were used, this would 1732 require all the egress PEs in the MVPN to agree on a common label for 1733 the MVPN. Instead, the MPLS label is upstream-assigned 1734 [MPLS-UPSTREAM-LABEL]. The label bindings are advertised via BGP 1735 updates originated by the ingress PEs. 1737 This procedure requires each egress PE to support a separate label 1738 space for every other PE. The egress PEs create a forwarding entry 1739 for the upstream-assigned MPLS label, allocated by the ingress PE, in 1740 this label space.
Hence, when the egress PE receives a packet over an 1741 Aggregate Tree, it first determines the tree that the packet was 1742 received over. The tree identifier determines the label space in 1743 which the upstream-assigned MPLS label lookup has to be performed. 1744 The same label space may be used for all P-multicast trees rooted at 1745 the same ingress PE, or an implementation may decide to use a 1746 separate label space for every P-multicast tree. 1748 A full specification of the procedures to support aggregation on 1749 shared trees or on MP2MP LSPs is outside the scope of this document. 1751 The encapsulation format is either MPLS or MPLS-in-something (e.g., 1752 MPLS-in-GRE [MPLS-IP]). When MPLS is used, this label will appear 1753 immediately below the label that identifies the P-multicast tree. 1754 When MPLS-in-GRE is used, this label will be the top MPLS label that 1755 appears when the GRE header is stripped off. 1757 When IP encapsulation is used for the P-multicast Tree, whatever 1758 information that particular encapsulation format uses for identifying 1759 a particular tunnel is used to determine the label space in which the 1760 MPLS label is looked up. 1762 If the P-multicast tree uses MPLS encapsulation, the P-multicast tree 1763 is itself identified by an MPLS label. The egress PE MUST NOT 1764 advertise IMPLICIT NULL or EXPLICIT NULL for that tree. Once the 1765 label representing the tree is popped off the MPLS label stack, the 1766 next label is the demultiplexing information that allows the proper 1767 MVPN to be determined. 1769 This specification requires that, to support this sort of 1770 aggregation, there be at least one upstream-assigned label per MVPN. 1771 It does not require that there be only one. For example, an ingress 1772 PE could assign a unique label to each (C-S,C-G). (This could be 1773 done using the same technique that is used to assign a particular 1774 (C-S,C-G) to an S-PMSI; see section 7.4.) 1776 When an egress PE receives a C-multicast data packet over a 1777 P-multicast tree, it needs to forward the packet to the CEs that have 1778 receivers in the packet's C-multicast group. In order to do this, the 1779 egress PE needs to determine the P-tunnel on which the packet was 1780 received. The PE can then determine the MVPN that the packet belongs 1781 to and, if needed, do any further lookups needed to forward 1782 the packet. 1784 6.4. Considerations for Specific Tunnel Technologies 1786 While it is believed that the architecture specified in this document 1787 places no limitations on the protocols used for setting up and 1788 maintaining P-tunnels, the only protocols that have been explicitly 1789 considered are PIM-SM (both the SSM and ASM service models are 1790 considered, as are bidirectional trees), RSVP-TE, mLDP, and BGP. 1791 (BGP's role in the setup and maintenance of P-tunnels is to "stitch" 1792 together the Intra-AS segments of a segmented Inter-AS P-tunnel.) 1794 6.4.1. RSVP-TE P2MP LSPs 1796 If an I-PMSI is to be instantiated as one or more non-segmented 1797 P-tunnels, where the P-tunnels are RSVP-TE P2MP LSPs, then only the 1798 PEs that are at the head ends of those LSPs will ever include the 1799 PMSI Tunnel attribute in their Intra-AS I-PMSI A-D routes. (These 1800 will be the PEs in the "Sender Sites set".)
1802 If an I-PMSI is to be instantiated as one or more segmented 1803 P-tunnels, where some of the Intra-AS segments of these tunnels are 1804 RSVP-TE P2MP LSPs, then only a PE or ASBR which is at the head end of 1805 one of these LSPs will ever include the PMSI Tunnel attribute in its 1806 Inter-AS I-PMSI A-D route. 1808 Other PEs send Intra-AS I-PMSI A-D routes without PMSI Tunnel 1809 attributes. (These will be the PEs that are in the "Receiver Sites 1810 set" but not in the "Sender Sites set".) As each "Sender Site" PE 1811 receives an Intra-AS I-PMSI A-D route from a PE in the Receiver Sites 1812 set, it adds the PE originating that Intra-AS I-PMSI A-D route to the 1813 set of receiving PEs for the P2MP LSP. The PE at the head end MUST 1814 then use RSVP-TE [RSVP-P2MP] signaling to add the receiver PEs to the 1815 P-tunnel. 1817 When RSVP-TE P2MP LSPs are used to instantiate S-PMSIs, and a 1818 particular C-flow is to be bound to the LSP, it is necessary to use 1819 explicit tracking so that the head end of the LSP knows which PEs 1820 need to receive data from the specified C-flow. If the binding is 1821 done using S-PMSI A-D routes (see section 7.4.1), the "Leaf 1822 Information Required" bit MUST be set in the PMSI Tunnel attribute. 1824 RSVP-TE P2MP LSPs can optionally support aggregation of multiple 1825 MVPNs. 1827 If an RSVP-TE P2MP LSP is used for only a single MVPN, the 1828 mapping between the LSP and the MVPN can either be configured, or can 1829 be deduced from the procedures used to announce the LSP (e.g., from 1830 the RTs in the A-D route that announced the LSP). If the LSP is used 1831 for multiple MVPNs, the set of MVPNs using it (and the corresponding 1832 MPLS labels) are inferred from the PMSI tunnel attributes that 1833 specify the LSP. 1835 If an RSVP-TE P2MP LSP is being used to carry a set of C-flows 1836 traveling along a bidirectional C-Tree, using the procedures of 1837 section 11.2, the head end MUST include the PE Distinguisher Labels 1838 attribute in its Intra-AS I-PMSI A-D route or S-PMSI A-D route, and 1839 MUST provide an upstream-assigned label for each PE that it has 1840 selected as the upstream PE for the C-tree's RPA ("Rendezvous Point 1841 Address"). See section 11.2 for details. 1843 A PMSI Tunnel attribute specifying an RSVP-TE P2MP LSP contains the 1844 following information: 1845 - The type of the tunnel, which is set to RSVP-TE P2MP Tunnel 1847 - The RSVP-TE P2MP Tunnel's SESSION Object 1849 - Optionally, the RSVP-TE P2MP LSP's SENDER_TEMPLATE Object. This object 1850 is included when it is desired to identify a particular P2MP TE 1851 LSP. 1853 Demultiplexing the C-multicast data packets at the egress PE follows 1854 procedures described in section 6.3.3. As specified in section 6.3.3, 1855 an egress PE MUST NOT advertise IMPLICIT NULL or EXPLICIT NULL for an 1856 RSVP-TE P2MP LSP that is carrying traffic for one or more MVPNs. 1858 If (and only if) a particular RSVP-TE P2MP LSP is possibly carrying 1859 data from multiple MVPNs, the following special procedures apply: 1861 - A packet in a particular MVPN, when transmitted into the LSP, 1862 must carry the MPLS label specified in the PMSI tunnel attribute 1863 that announced that LSP as a P-tunnel for that MVPN. 1865 - Demultiplexing the C-multicast data packets at the egress PE is 1866 done by means of the MPLS label that rises to the top of the 1867 stack after the label corresponding to the P2MP LSP is popped off.
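The egress-PE demultiplexing just described amounts to two table lookups, sketched below in Python for illustration only (the table names and layouts are hypothetical, not data structures defined by this document):

   def demux_mvpn(p2mp_lsp_labels, upstream_label_spaces, label_stack):
       """Hypothetical egress-PE lookup for an aggregated RSVP-TE P2MP LSP.

       p2mp_lsp_labels:       outer label -> P-tunnel (tree) identifier;
                              never IMPLICIT NULL or EXPLICIT NULL.
       upstream_label_spaces: tree identifier -> {upstream-assigned
                              label -> MVPN/VRF}, populated from the
                              ingress PE's label advertisements.
       label_stack:           received MPLS label stack, outermost first.
       """
       outer_label, inner_label = label_stack[0], label_stack[1]
       tree_id = p2mp_lsp_labels[outer_label]
       return upstream_label_spaces[tree_id][inner_label]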
1869 It is possible that at the time a PE learns, via an A-D route with a 1870 PMSI Tunnel attribute, that it needs to receive traffic on a 1871 particular RSVP-TE P2MP LSP, the signaling to set up the LSP will not 1872 have been completed. In this case, the PE needs to wait for the 1873 RSVP-TE signaling to take place before it can modify its forwarding 1874 tables as directed by the A-D route. 1876 It is also possible that the signaling to set up an RSVP-TE P2MP LSP 1877 will be completed before a given PE learns, via a PMSI Tunnel 1878 attribute, of the use to which that LSP will be put. The PE MUST 1879 discard any traffic received on that LSP until it learns of that use. 1881 In order for the egress PE to be able to discard such traffic, it 1882 needs to know that the LSP is associated with an MVPN and that the 1883 A-D route that binds the LSP to an MVPN or to a particular C-flow 1884 has not yet been received. This is provided by extending [RSVP-P2MP] 1885 with [RSVP-OOB]. 1887 6.4.2. PIM Trees 1889 When the P-tunnels are PIM trees, the PMSI Tunnel attribute contains 1890 enough information to allow each other PE in the same MVPN to use 1891 P-PIM signaling to join the P-tunnel. 1893 If an I-PMSI is to be instantiated as one or more PIM trees, then the 1894 PE that is at the root of a given PIM tree sends an Intra-AS I-PMSI 1895 A-D route containing a PMSI Tunnel attribute that contains all the 1896 information needed for other PEs to join the tree. 1898 If PIM trees are to be used to instantiate an MI-PMSI, each PE in the 1899 MVPN must send an Intra-AS I-PMSI A-D route containing such a PMSI 1900 Tunnel attribute. 1902 If a PMSI is to be instantiated via a shared tree, the PMSI Tunnel 1903 attribute identifies a P-group address. The RP or RPA 1904 corresponding to the P-group address is not specified. It must, of 1905 course, be known to all the PEs. It is presupposed that the PEs use 1906 one of the methods for automatically learning the RP-to-group 1907 correspondences (e.g., Bootstrap Router Protocol [BSR]), or else that 1908 the correspondences are configured. 1910 If a PMSI is to be instantiated via a source-specific tree, the PMSI 1911 Tunnel attribute identifies the PE router that is the root of the 1912 tree, as well as a P-group address. The PMSI Tunnel attribute always 1913 specifies whether the PIM tree is to be a unidirectional shared tree, 1914 a bidirectional shared tree, or a source-specific tree. 1916 If PIM trees are being used to instantiate S-PMSIs, the above 1917 procedures assume that each PE router has a set of group P-addresses 1918 that it can use for setting up the PIM trees. Each PE must be 1919 configured with this set of P-addresses. If the P-tunnels are 1920 source-specific trees, then the PEs may be configured with 1921 overlapping sets of group P-addresses. If the trees are not 1922 source-specific, then each PE must be configured with a unique set of 1923 group P-addresses (i.e., having no overlap with the set configured at 1924 any other PE router). The management of this set of addresses is 1925 thus greatly simplified when source-specific trees are used, so the 1926 use of source-specific trees is strongly recommended whenever 1927 unidirectional trees are desired. 1929 Specification of the full set of procedures for using bidirectional 1930 PIM trees to instantiate S-PMSIs is outside the scope of this 1931 document. 1933 Details for constructing the PMSI Tunnel Attribute identifying a PIM 1934 tree can be found in [MVPN-BGP]. 1936 6.4.3.
mLDP P2MP LSPs 1938 When the P-tunnels are mLDP P2MP trees, each Intra-AS I-PMSI A-D 1939 route has a PMSI Tunnel attribute containing enough information to 1940 allow each other PE in the same MVPN to use mLDP signaling to join 1941 the P-tunnel. The tunnel identifier consists of the root node 1942 address, along with the Generic LSP Identifier value. 1944 An mLDP P2MP LSP may be used to carry traffic of multiple VPNs, if 1945 the PMSI Tunnel Attribute specifying it contains a non-zero MPLS 1946 label. 1948 If an mLDP P2MP LSP is being used to carry the set of flows traveling 1949 along a particular bidirectional C-tree, using the procedures of 1950 section 11.2, the root of the LSP MUST include the PE Distinguisher 1951 Labels attribute in its Intra-AS I-PMSI A-D route or S-PMSI A-D 1952 route, and MUST provide an upstream-assigned label for the PE that it 1953 has selected as the upstream PE for the C-tree's RPA. See section 11.2 1954 for details. 1956 6.4.4. mLDP MP2MP LSPs 1958 Specification of the procedures for assigning C-flows to mLDP MP2MP 1959 LSPs that serve as P-tunnels is outside the scope of this document. 1961 6.4.5. Ingress Replication 1963 As described in section 3, a PMSI can be instantiated using Unicast 1964 Tunnels between the PEs that are participating in the MVPN. In this 1965 mechanism, the ingress PE replicates a C-multicast data packet 1966 belonging to a particular MVPN and sends a copy to all or a subset of 1967 the PEs that belong to the MVPN. A copy of the packet is tunneled to 1968 each such remote PE over a Unicast Tunnel. IP/GRE Tunnels or 1969 MPLS LSPs are examples of unicast tunnels that may be used. The same 1970 Unicast Tunnel can be used to transport packets belonging to 1971 different MVPNs. 1973 In order for a PE to use Unicast P-Tunnels to send a C-multicast data 1974 packet for a particular MVPN to a set of remote PEs, the remote PEs 1975 must be able to correctly decapsulate such packets and to assign each 1976 one to the proper MVPN. This requires that the encapsulation used for 1977 sending packets through the P-tunnel have demultiplexing information 1978 that the receiver can associate with a particular MVPN. 1980 If ingress replication is being used to instantiate the PMSIs for an 1981 MVPN, the PEs announce this as part of the BGP-based MVPN membership 1982 auto-discovery process, described in section 4. The PMSI tunnel 1983 attribute specifies ingress replication, and also specifies a 1984 downstream-assigned MPLS label. This label will be used to identify 1985 a particular packet as belonging to the MVPN that the Intra-AS I-PMSI 1986 A-D route belongs to (as inferred from its RTs). If PE1 specifies a 1987 particular label value for a particular MVPN, then any other PE 1988 sending PE1 a packet for that MVPN through a unicast P-tunnel must 1989 put that label on the packet's label stack. PE1 then treats that 1990 label as the demultiplexor value identifying the MVPN in question. 1992 Ingress replication may be used to instantiate any kind of PMSI. 1993 When ingress replication is done, it is RECOMMENDED, except in the 1994 one particular case discussed below, that explicit 1995 tracking be done, and that the data packets of a particular C-flow 1996 only get sent to those PEs that need to see the packets of that 1997 C-flow. There is never any need to use the procedures of section 7.4 1998 for binding particular C-flows to particular P-tunnels.
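As an illustration of the forwarding behavior just described (the 'send' callable and the tuple layout are hypothetical placeholders, not interfaces defined by this document):

   def ingress_replicate(packet, remote_pes, send):
       """Hypothetical ingress-replication forwarding (section 6.4.5).

       remote_pes: iterable of (pe_address, mvpn_label) pairs, where
                   mvpn_label is the downstream-assigned label that the
                   remote PE advertised for this MVPN in its PMSI
                   Tunnel attribute.
       send:       callable that transmits a labeled copy over the
                   unicast P-tunnel (IP/GRE tunnel, MPLS LSP, ...) to
                   the given PE.
       """
       for pe_address, mvpn_label in remote_pes:
           # Each copy carries the label chosen by the receiving PE, so
           # that PE can assign the packet to the correct MVPN.
           send(pe_address, label_stack=[mvpn_label], payload=packet)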
2000 The particular case in which there is no need for explicit tracking 2001 is the case where ingress replication is being used to create a 2002 one-hop ASBR-ASBR Inter-AS segment of a segmented Inter-AS P-tunnel. 2004 Section 9.1 specifies three different methods that can be used to 2005 prevent duplication of multicast data packets. Any given deployment 2006 must use at least one of those methods. Note that the method 2007 described in section 9.1.1 ("Discarding Packets from the Wrong PE") 2008 presupposes that the egress PE of a P-tunnel can, upon receiving a 2009 packet from the P-tunnel, determine the identity of the PE that 2010 transmitted the packet into the P-tunnel. SPs that use ingress 2011 replication to instantiate their PMSIs are cautioned against using, 2012 for this purpose, unicast P-tunnel technologies that do not allow 2013 the egress PE to identify the ingress PE (e.g., MP2P LSPs for which 2014 penultimate-hop-popping is done). Deployment of ingress replication 2015 with such a P-tunnel technology MUST NOT be done unless it is known 2016 that the deployment relies entirely on the procedures of section 2017 9.1.2 or 9.1.3 for duplicate prevention. 2019 7. Binding Specific C-flows to Specific P-Tunnels 2021 As discussed previously, Intra-AS I-PMSI A-D routes may (or may not) 2022 have PMSI tunnel attributes, identifying P-tunnels that can be used 2023 as the default P-tunnels for carrying C-multicast traffic, i.e., for 2024 carrying C-multicast traffic that has not been specifically bound to 2025 another P-tunnel. 2027 If none of the Intra-AS I-PMSI A-D routes originated by a particular 2028 PE for a particular MVPN carry PMSI tunnel attributes at all (or if 2029 the only PMSI tunnel attributes they carry have type "No tunnel 2030 information present"), then there are no default P-tunnels for that 2031 PE to use when transmitting C-multicast traffic in that MVPN to other 2032 PEs. In that case, all such C-flows must be assigned to specific 2033 P-tunnels using one of the mechanisms specified in section 7.4. That 2034 is, all such C-flows are carried on P-tunnels that instantiate 2035 S-PMSIs. 2037 There are other cases where it may be either necessary or desirable 2038 to use the mechanisms of section 7.4 to identify specific C-flows and 2039 bind them to or unbind them from specific P-tunnels. Some possible 2040 cases are: 2042 - The policy for a particular MVPN is to send all C-data on 2043 S-PMSIs, even if the Intra-AS I-PMSI A-D routes carry PMSI tunnel 2044 attributes. (This is another case where all C-data is carried on 2045 S-PMSIs; presumably the I-PMSIs are used for control 2046 information.) 2048 - It is desired to optimize the routing of a particular C-flow, 2049 which may already be traveling on an I-PMSI, by sending it 2050 instead on an S-PMSI. 2052 - If a particular C-flow is traveling on an S-PMSI, it may be 2053 considered desirable to move it to an I-PMSI (i.e., optimization 2054 of the routing for that flow may no longer be considered 2055 desirable). 2057 - It is desired to change the encapsulation used to carry the 2058 C-flow, e.g., because one now wants to aggregate it on a P-tunnel 2059 with flows from other MVPNs. 2061 Note that if Full PIM Peering over an MI-PMSI (section 5.2) is being 2062 used, then from the perspective of the PIM state machine, the 2063 "interface" connecting the PEs to each other is the MI-PMSI, even if 2064 some or all of the C-flows are being sent on S-PMSIs.
That is, from 2065 the perspective of the C-PIM state machine, when a C-flow is being 2066 sent or received on an S-PMSI, the output or input interface 2067 (respectively) is considered to be the MI-PMSI. 2069 Section 7.1 discusses certain general considerations that apply 2070 whenever a specified C-flow is bound to a specified P-tunnel using 2071 the mechanisms of section 7.4. This includes the case where the 2072 C-flow is moved from one P-tunnel to another, as well as the case 2073 where the C-flow is initially bound to an S-PMSI P-tunnel. 2075 Section 7.2 discusses the specific case of using the mechanisms of 2076 section 7.4 as a way of optimizing multicast routing by switching 2077 specific flows from one P-tunnel to another. 2079 Section 7.3 discusses the case where the mechanisms of section 7.4 2080 are used to announce the presence of "unsolicited flooded data" and 2081 to assign such data to a particular P-tunnel. 2083 Section 7.4 specifies the protocols for assigning specific C-flows to 2084 specific P-tunnels. These protocols may be used to assign a C-flow 2085 to a P-tunnel initially, or to switch a flow from one P-tunnel to 2086 another. 2088 Procedures for binding to a specified P-tunnel the set of C-flows 2089 traveling along a specified C-tree (or for so binding a set of 2090 C-flows that share some relevant characteristic), without identifying 2091 each flow individually, are outside the scope of this document. 2093 7.1. General Considerations 2095 7.1.1. At the PE Transmitting the C-flow on the P-Tunnel 2097 The decision to bind a particular C-flow (designated as (C-S,C-G)) to 2098 a particular P-tunnel, or to switch a particular C-flow to a 2099 particular P-tunnel, is always made by the PE that is to transmit the 2100 C-flow onto the P-tunnel. 2102 Whenever a PE moves a particular C-flow from one P-tunnel, say P1, to 2103 another, say P2, care must be taken to ensure that there is no steady 2104 state duplication of traffic. At any given time, the PE transmits 2105 the C-flow either on P1 or on P2, but not on both. 2107 When a particular PE, say PE1, decides to bind a particular C-flow to 2108 a particular P-tunnel, say P2, the following procedures MUST be 2109 applied: 2111 - PE1 must issue the required control plane information to signal 2112 that the specified C-flow is now bound to P-tunnel P2 (see 2113 section 7.4). 2115 - If P-tunnel P2 needs to be constructed from the root downwards, 2116 PE1 must initiate the signaling to construct P2. This is only 2117 required if P2 is an RSVP-TE P2MP LSP. 2119 - If the specified C-flow is currently bound to a different 2120 P-tunnel, say P1, then: 2122 * PE1 MUST wait for a "switch-over" delay before sending 2123 traffic of the C-flow on P-tunnel P2. It is RECOMMENDED to 2124 allow this delay to be configurable. 2126 * Once the "switch-over" delay has elapsed, PE1 MUST send 2127 traffic for the C-flow on P2, and MUST NOT send it on P1. In 2128 no case is any C-flow packet sent on both P-tunnels. 2130 When a C-flow is switched from one P-tunnel to another, the purpose 2131 of running a switch-over timer is to minimize packet loss without 2132 introducing packet duplication. However, jitter may be introduced 2133 due to the difference in transit delays between the old and new 2134 P-tunnels. 
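A minimal sketch of the transmit-side switch-over described above (the timer value, the 'advertise_binding' hook, and the tunnel object methods are illustrative assumptions, not interfaces defined by this document):

   import threading

   def rebind_c_flow(c_flow, old_tunnel, new_tunnel, advertise_binding,
                     switchover_delay=3.0):
       """Move a C-flow to P-tunnel 'new_tunnel' at the transmitting PE:
       advertise the new binding first, then, after the switch-over
       delay, send on the new tunnel only (never on both)."""
       advertise_binding(c_flow, new_tunnel)   # e.g., originate an S-PMSI A-D route

       def switch():
           c_flow.transmit_tunnel = new_tunnel
           if old_tunnel is not None and old_tunnel.no_longer_needed():
               old_tunnel.tear_down()          # technology-specific teardown

       threading.Timer(switchover_delay, switch).start()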
2136 For best effect, the switch-over timer should be configured to a 2137 value that is "just long enough" (a) to allow all the PEs to learn 2138 about the new binding of C-flow to P-tunnel, and (b) to allow the PEs 2139 to construct the P-tunnel, if it doesn't already exist. 2141 If, after such a switch, the "old" P-tunnel P1 is no longer needed, 2142 it SHOULD be torn down and the resources supporting it freed. The 2143 procedures for "tearing down" a P-tunnel are specific to the P-tunnel 2144 technology. 2146 Procedures for binding sets of C-flows traveling along specified 2147 C-trees (or sets of C-flows sharing any other characteristic) to a 2148 specified P-tunnel (or for moving them from one P-tunnel to another) 2149 are outside the scope of this document. 2151 7.1.2. At the PE Receiving the C-flow from the P-Tunnel 2153 Suppose that a particular PE, say PE1, learns, via the procedures of 2154 section 7.4, that some other PE, say PE2, has bound a particular 2155 C-flow, designated as (C-S,C-G), to a particular P-tunnel, say P2. 2156 Then PE1 must determine whether it needs to receive (C-S,C-G) traffic 2157 from PE2. 2159 If BGP is being used to distribute C-multicast routing information 2160 from PE to PE, the conditions under which PE1 needs to receive 2161 (C-S,C-G) traffic from PE2 are specified in section 12.3 of 2162 [MVPN-BGP]. 2164 If PIM over an MI-PMSI is being used to distribute C-multicast 2165 routing from PE to PE, PE1 needs to receive (C-S,C-G) traffic from 2166 PE2 if one or more of the following conditions holds: 2168 - PE1 has (C-S,C-G) state such that PE2 is PE1's Upstream PE for 2169 (C-S,C-G), and PE1 has downstream neighbors ("non-null olist") 2170 for the (C-S,C-G) state. 2172 - PE1 has (C-*,C-G) state with an upstream PE (not necessarily PE2) 2173 and with downstream neighbors ( "non-null olist"), but PE1 does 2174 not have (C-S,C-G) state. 2176 - Native PIM methods are being used to prevent steady-state packet 2177 duplication, and PE1 has either (C-*,C-G) or (C-S,C-G) state such 2178 that the MI-PMSI is one of the downstream interfaces. Note that 2179 this includes the case where PE1 is itself sending (C-S,C-G) 2180 traffic on an S-PMSI. (In this case, PE1 needs to receive the 2181 (C-S,C-G) traffic from PE2 in order to allow the PIM Assert 2182 mechanism to function properly.) 2184 Irrespective of whether BGP or PIM is being used to distribute 2185 C-multicast routing information, once PE1 determines that it needs to 2186 receive (C-S,C-G) traffic from PE2, the following procedures MUST be 2187 applied: 2189 - PE1 MUST take all necessary steps to be able to receive the 2190 (C-S,C-G) traffic on P2. 2192 * If P2 is a PIM tunnel or an mLDP LSP, PE1 will need to use 2193 PIM or mLDP (respectively) to join P2 (unless it is already 2194 joined to P2). 2196 * PE1 may need to modify the forwarding state for (C-S,C-G) to 2197 indicate that (C-S,C-G) traffic is to be accepted on P2. If 2198 P2 is an Aggregate Tree, this also implies setting up the 2199 demultiplexing forwarding entries based on the inner label as 2200 described in section 6.3.3 2202 - If PE1 was previously receiving the (C-S,C-G) C-flow on another 2203 P-tunnel, say P1, then: 2205 * PE1 MAY run a switch-over timer, and until it expires, SHOULD 2206 accept traffic for the given C-flow on both P1 and P2; 2208 * If, after such a switch, the "old" P-tunnel P1 is no longer 2209 needed, it SHOULD be torn down and the resources supporting 2210 it freed. 
The procedures for "tearing down" a P-tunnel are 2211 specific to the P-tunnel technology. 2213 - If PE1 later determines that it no longer needs to receive any of 2214 the C-multicast data that is being sent on a particular P-tunnel, 2215 it may initiate signaling (specific to the P-tunnel technology) to 2216 remove itself from that tunnel. 2218 7.2. Optimizing Multicast Distribution via S-PMSIs 2220 Whenever a particular multicast stream is being sent on an I-PMSI, it 2221 is likely that the data of that stream is being sent to PEs that do 2222 not require it. If a particular stream has a significant amount of 2223 traffic, it may be beneficial to move it to an S-PMSI that includes 2224 only those PEs that are transmitters and/or receivers (or at least 2225 includes fewer PEs that are neither). 2227 If explicit tracking is being done, S-PMSI creation can also be 2228 triggered on other criteria. For instance, there could be a "pseudo 2229 wasted bandwidth" criterion: switching to an S-PMSI would be done if 2230 the bandwidth multiplied by the number of uninterested PEs (PEs that 2231 are receiving the stream but have no receivers) is above a specified 2232 threshold. The motivation is that (a) the total bandwidth wasted by 2233 many sparsely subscribed low-bandwidth groups may be large, and (b) 2234 there is no point in moving a high-bandwidth group to an S-PMSI if all 2235 the PEs have receivers for it. 2237 Switching a (C-S,C-G) stream to an S-PMSI may require the root of the 2238 S-PMSI to determine the egress PEs that need to receive the (C-S,C-G) 2239 traffic. This is true in the following cases: 2241 - If the P-tunnel is a source-initiated tree, such as an RSVP-TE 2242 P2MP Tunnel, the PE needs to know the leaves of the tree before 2243 it can instantiate the S-PMSI. 2245 - If a PE instantiates multiple S-PMSIs, belonging to different 2246 MVPNs, using one P-multicast tree, such a tree is termed an 2247 Aggregate Tree with a selective mapping. The setting up of such 2248 an Aggregate Tree requires the ingress PE to know all the other 2249 PEs that have receivers for multicast groups that are mapped onto 2250 the tree. 2252 The above two cases require that explicit tracking be done for the 2253 (C-S,C-G) stream. The root of the S-PMSI MAY decide to do explicit 2254 tracking of this stream only after it has determined to move the 2255 stream to an S-PMSI, or it MAY have been doing explicit tracking all 2256 along. 2258 If the S-PMSI is instantiated by a P-multicast tree, the PE at the 2259 root of the tree must signal the leaves of the tree that the 2260 (C-S,C-G) stream is now bound to the S-PMSI. Note that the PE could 2261 create the identity of the P-multicast tree prior to the actual 2262 instantiation of the P-tunnel. 2264 If the S-PMSI is instantiated by a source-initiated P-multicast tree 2265 (e.g., an RSVP-TE P2MP tunnel), the PE at the root of the tree must 2266 establish the source-initiated P-multicast tree to the leaves. This 2267 tree MAY have been established before the leaves receive the S-PMSI 2268 binding, or MAY be established after the leaves receive the binding. 2269 The leaves MUST NOT switch to the S-PMSI until they receive both the 2270 binding and the tree signaling message. 2272 7.3. Announcing the Presence of Unsolicited Flooded Data 2274 A PE may receive "unsolicited" data from a CE, where the data is 2275 intended to be flooded to the other PEs of the same MVPN and then on 2276 to other CEs.
By "unsolicited", we mean that the data is to be 2277 delivered to all the other PEs of the MVPN, even though those PEs may 2278 not have sent any control information indicating that they need to 2279 receive that data. 2281 For example, if the BSR [BSR] is being used within the MVPN, BSR 2282 control messages may be received by a PE from a CE. These need to be 2283 forwarded to other PEs, even though no PE ever issues any kind of 2284 explicit signal saying that it wants to receive BSR messages. 2286 If a PE receives a BSR message from a CE, and if the CE's MVPN has an 2287 MI-PMSI, then the PE can just send BSR messages on the appropriate 2288 P-tunnel. Otherwise, the PE MUST announce the binding of a 2289 particular C-flow to a particular P-tunnel, using the procedures of 2290 section 7.4. The particular C-flow in this case would be 2291 (C-IPaddress_of_PE, ALL-PIM-ROUTERS). The P-tunnel identified by the 2292 procedures of section 7.4 may or may not be one that was previously 2293 identified in the PMSI tunnel attribute of an I-PMSI A-D route. 2294 Further procedures for handling BSR may be found in sections 5.2.1 2295 and 5.3.4. 2297 Analogous procedures may be used for announcing the presence of other 2298 sorts of unsolicited flooded data, e.g., dense mode data or data from 2299 proprietary protocols that presume messages can be flooded. However, 2300 a full specification of the procedures for traffic other than BSR 2301 traffic is outside the scope of this document. 2303 7.4. Protocols for Binding C-flows to P-tunnels 2305 We describe two protocols for binding C-flows to P-tunnels. 2307 These protocols can be used for moving C-flows from I-PMSIs to 2308 S-PMSIs, as long as the S-PMSI is instantiated by a P-multicast tree. 2309 (If the S-PMSI is instantiated by means of ingress replication, the 2310 procedures of section 6.4.5 suffice.) 2312 These protocols can also be used for other cases in which it is 2313 necessary to bind specific C-flows to specific P-tunnels. 2315 7.4.1. Using BGP S-PMSI A-D Routes 2317 Not withstanding the name of the mechanism "S-PMSI A-D Routes", the 2318 mechanism to be specified in this section may be used any time it is 2319 necessary to advertise a binding of a C-flow to a particular 2320 P-tunnel. 2322 7.4.1.1. Advertising C-flow Binding to P-Tunnel 2324 The ingress PE informs all the PEs that are on the path to receivers 2325 of the (C-S,C-G) of the binding of the P-tunnel to the (C-S,C-G). The 2326 BGP announcement is done by sending update for the MCAST-VPN address 2327 family. An S-PMSI A-D route is used, containing the following 2328 information: 2330 1. IP address of the originating PE 2332 2. The RD configured locally for the MVPN. This is required to 2333 uniquely identify the (C-S,C-G) as the addresses could overlap 2334 between different MVPNs. This is the same RD value used in the 2335 auto-discovery process. 2337 3. The C-S address. 2339 4. The C-G address. 2341 5. A PE MAY use a single P-tunnel to aggregate two or more 2342 S-PMSIs. If the PE already advertised unaggregated S-PMSI 2343 auto-discovery routes for these S-PMSIs, then a decision to 2344 aggregate them requires the PE to re-advertise these routes. 2345 The re-advertised routes MUST be the same as the original ones, 2346 except for the PMSI tunnel attribute. If the PE has not 2347 previously advertised S-PMSI auto-discovery routes for these 2348 S-PMSIs, then the aggregation requires the PE to advertise 2349 (new) S-PMSI auto-discovery routes for these S-PMSIs. 
The PMSI 2350 Tunnel attribute in the newly advertised/re-advertised routes 2351 MUST carry the identity of the P-tunnel that aggregates the 2352 S-PMSIs. 2354 If all these aggregated S-PMSIs belong to the same MVPN, and 2355 this MVPN uses PIM as its C-multicast routing protocol, then 2356 the corresponding S-PMSI A-D routes MAY carry an MPLS upstream 2357 assigned label [MPLS-UPSTREAM-LABEL]. Moreover, in this case 2358 the labels MUST be distinct on a per MVPN basis, and MAY be 2359 distinct on a per route basis. 2361 If all these aggregated S-PMSIs belong to the MVPN(s) that use 2362 mLDP as its C-multicast routing protocol, then the 2363 corresponding S-PMSI A-D routes MUST carry an MPLS upstream 2364 assigned label [MPLS-UPSTREAM-LABEL], and these labels MUST be 2365 distinct on a per route (per mLDP FEC) basis, irrespective of 2366 whether the aggregated S-PMSIs belong to the same or different 2367 MVPNs. 2369 When a PE distributes this information via BGP, it must include the 2370 following: 2372 1. An identifier for the particular P-tunnel to which the stream 2373 is to be bound. This identifier is a structured field that 2374 includes the following information: 2376 * The type of tunnel 2378 * An identifier for the tunnel. The form of the identifier 2379 will depend upon the tunnel type. The combination of 2380 tunnel identifier and tunnel type should contain enough 2381 information to enable all the PEs to "join" the tunnel and 2382 receive messages from it. 2384 2. Route Target Extended Communities attribute. This is used as 2385 described in section 4. 2387 7.4.1.2. Explicit Tracking 2389 If the PE wants to enable explicit tracking for the specified flow, 2390 it also indicates this in the A-D route it uses to bind the flow to a 2391 particular P-tunnel. Then any PE that receives the A-D route will 2392 respond with a "Leaf A-D Route" in which it identifies itself as a 2393 receiver of the specified flow. The Leaf A-D route will be withdrawn 2394 when the PE is no longer a receiver for the flow. 2396 If the PE needs to enable explicit tracking for a flow without at the 2397 same time binding the flow to a specific P-tunnel, it can do so by 2398 sending an S-PMSI A-D route whose NLRI identifies the flow and whose 2399 PMSI Tunnel attribute has its tunnel type value set to "no tunnel 2400 information present" and its "leaf information required" bit set to 2401 1. This will elicit the Leaf A-D Routes. This is useful when the PE 2402 needs to know the receivers before selecting a P-tunnel. 2404 7.4.2. UDP-based Protocol 2406 This procedure carries its control messages in UDP, and requires that 2407 the MVPN has an MI-PMSI that can be used to carry the control 2408 messages. 2410 7.4.2.1. Advertising C-flow Binding to P-tunnel 2412 In order for a given PE to move a particular C-flow to a particular 2413 P-tunnel, an "S-PMSI Join message" is sent periodically on the 2414 MI-PMSI. (Notwithstanding the name of the mechanism, the mechanism 2415 may be used to bind a flow to any P-tunnel.) The S-PMSI Join is a 2416 UDP-encapsulated message whose destination address is ALL-PIM-ROUTERS 2417 (224.0.0.13), and whose destination port is 3232. 2419 The S-PMSI Join Message contains the following information: 2421 - An identifier for the particular multicast stream that is to be 2422 bound to the P-tunnel. This can be represented as an (S,G) pair. 2424 - An identifier for the particular P-tunnel to which the stream is 2425 to be bound. 
This identifier is a structured field that includes 2426 the following information: 2428 * The type of tunnel used to instantiate the S-PMSI 2429 * An identifier for the tunnel. The form of the identifier 2430 will depend upon the tunnel type. The combination of tunnel 2431 identifier and tunnel type should contain enough information 2432 to enable all the PEs to "join" the tunnel and receive 2433 messages from it. 2435 * If (and only if) the identified P-tunnel is aggregating 2436 several S-PMSIs, any demultiplexing information needed by the 2437 tunnel encapsulation protocol to identify a particular 2438 S-PMSI. 2440 If the policy for the MVPN is that traffic is sent/received by 2441 default over an MI-PMSI, then traffic for a particular C-flow can be 2442 switched back to the MI-PMSI simply by ceasing to send S-PMSI Joins 2443 for that C-flow. 2445 Note that an S-PMSI Join that is not received over a PMSI (e.g., one 2446 that is received directly from a CE) is an illegal packet that MUST 2447 be discarded. 2449 7.4.2.2. Packet Formats and Constants 2451 The S-PMSI Join message is encapsulated within UDP, and has the 2452 following type/length/value (TLV) encoding: 2454 0 1 2 3 2455 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2456 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2457 | Type | Length | Value | 2458 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2459 | . | 2460 | . | 2461 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2463 Type (8 bits) 2465 Length (16 bits): the total number of octets in the Type, Length, and 2466 Value fields combined 2468 Value (variable length) 2470 In this specification, only one type of S-PMSI Join is defined. A 2471 "type 1" S-PMSI Join is used when the S-PMSI tunnel is a PIM tunnel 2472 that is used to carry a single multicast stream, where the packets of 2473 that stream have IPv4 source and destination IP addresses. 2475 The S-PMSI Join format to use when the C-source and C-group are IPv6 2476 addresses will be defined in a follow-on document. 2478 0 1 2 3 2479 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2480 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2481 | Type | Length | Reserved | 2482 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2483 | C-source | 2484 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2485 | C-group | 2486 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2487 | P-group | 2488 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2490 Type (8 bits): 1 2492 Length (16 bits): 16 2494 Reserved (8 bits): This field SHOULD be zero when transmitted, and 2495 MUST be ignored when received. 2497 C-Source (32 bits): the IPv4 address of the traffic source in the 2498 VPN. 2500 C-Group (32 bits): the IPv4 address of the multicast traffic 2501 destination address in the VPN. 2503 P-Group (32 bits): the IPv4 group address that the PE router is going 2504 to use to encapsulate the flow (C-Source, C-Group). 2506 The P-group identifies the S-PMSI P-tunnel, and the (C-S,C-G) 2507 identifies the multicast flow that is carried in the P-tunnel. 2509 The protocol uses the following constants. 2511 [S-PMSI_DELAY]: 2513 once an S-PMSI Join message has been sent, the PE router that is 2514 to transmit onto the S-PMSI will delay this amount of time before 2515 it begins using the S-PMSI. The default value is 3 seconds. 
2517 [S-PMSI_TIMEOUT]: 2519 if a PE (other than the transmitter) does not receive any packets 2520 over the S-PMSI P-tunnel for this amount of time, the PE will 2521 prune itself from the S-PMSI P-tunnel, and will expect (C-S,C-G) 2522 packets to arrive on an I-PMSI. The default value is 3 minutes. 2524 This value must be consistent among PE routers. 2526 [S-PMSI_HOLDOWN]: 2528 if the PE that transmits onto the S-PMSI does not see any 2529 (C-S,C-G) packets for this amount of time, it will resume sending 2530 (C-S,C-G) packets on an I-PMSI. 2532 This is used to avoid oscillation when traffic is bursty. The 2533 default value is 1 minute. 2535 [S-PMSI_INTERVAL] 2536 the interval the transmitting PE router uses to periodically send 2537 the S-PMSI Join message. The default value is 60 seconds. 2539 7.4.3. Aggregation 2541 S-PMSIs can be aggregated on a P-multicast tree. The S-PMSI to 2542 (C-S,C-G) binding advertisement supports aggregation. Furthermore the 2543 aggregation procedures of section 6.3 apply. It is also possible to 2544 aggregate both S-PMSIs and I-PMSIs on the same P-multicast tree. 2546 8. Inter-AS Procedures 2548 If an MVPN has sites in more than one AS, it requires one or more 2549 PMSIs to be instantiated by inter-AS P-tunnels. This document 2550 describes two different types of inter-AS P-tunnel: 2552 1. "Segmented Inter-AS P-tunnels" 2554 A segmented inter-AS P-tunnel consists of a number of 2555 independent segments that are stitched together at the ASBRs. 2556 There are two types of segment, inter-AS segments and intra-AS 2557 segments. The segmented inter-AS P-tunnel consists of 2558 alternating intra-AS and inter-AS segments. 2560 Inter-AS segments connect adjacent ASBRs of different ASes; 2561 these "one-hop" segments are instantiated as unicast P-tunnels. 2563 Intra-AS segments connect ASBRs and PEs that are in the same 2564 AS. An intra-AS segment may be of whatever technology is 2565 desired by the SP that administers the that AS. Different 2566 intra-AS segments may be of different technologies. 2568 Note that the intra-AS segments of inter-AS P-tunnels form a 2569 category of P-tunnels that is distinct from simple intra-AS 2570 P-tunnels; we will rely on this distinction later (see Section 2571 9). 2573 A segmented inter-AS P-tunnel can be thought of as a tree that 2574 is rooted at a particular AS, and that has as its leaves the 2575 other ASes that need to receive multicast data from the root 2576 AS. 2578 2. "Non-segmented Inter-AS P-tunnels" 2580 A non-segmented inter-AS P-tunnel is a single P-tunnel that 2581 spans AS boundaries. The tunnel technology cannot change from 2582 one point in the tunnel to the next, so all ASes through which 2583 the P-tunnel passes must support that technology. In essence, 2584 AS boundaries are of no significance to a non-segmented 2585 inter-AS P-tunnel. 2587 Section 10 of [RFC4364] describes three different options for 2588 supporting unicast Inter-AS BGP/MPLS IP VPNs, known as options A, B, 2589 and C. We describe below how both segmented and non-segmented 2590 inter-AS trees can be supported when option B or option C is used. 2591 (Option A does not pass any routing information through an ASBR at 2592 all, so no special inter-AS procedures are needed.) 2594 8.1. Non-Segmented Inter-AS P-Tunnels 2596 In this model, the previously described discovery and tunnel setup 2597 mechanisms are used, even though the PEs belonging to a given MVPN 2598 may be in different ASes. 2600 8.1.1. 
Inter-AS MVPN Auto-Discovery 2602 The previously described BGP-based auto-discovery mechanisms work "as 2603 is" when an MVPN contains PEs that are in different Autonomous 2604 Systems. However, please note that, if non-segmented Inter-AS 2605 P-Tunnels are to be used, then the "Intra-AS" I-PMSI A-D routes MUST 2606 be distributed across AS boundaries! 2608 8.1.2. Inter-AS MVPN Routing Information Exchange 2610 When non-segmented inter-AS P-tunnels are used, MVPN C-multicast 2611 routing information may be exchanged by means of PIM peering across 2612 an MI-PMSI, or by means of BGP carrying C-multicast routes. 2614 When PIM peering is used to distribute the C-multicast routing 2615 information, a PE that sends C-PIM Join/Prune messages for a 2616 particular (C-S,C-G) must be able to identify the PE that is its PIM 2617 adjacency on the path to S. This is the "selected upstream PE" 2618 described in section 5.1. 2620 If BGP (rather than PIM) is used to distribute the C-multicast 2621 routing information, and if option B of section 10 of [RFC4364] is in 2622 use, then the C-multicast routes will be installed in the ASBRs along 2623 the path from each multicast source in the MVPN to each multicast 2624 receiver in the MVPN. If option B is not in use, the C-multicast 2625 routes are not installed in the ASBRs. The handling of the 2626 C-multicast routes in either case is thus exactly analogous to the 2627 handling of unicast VPN-IP routes in the corresponding case. 2629 8.1.3. Inter-AS P-Tunnels 2631 The procedures described earlier in this document can be used to 2632 instantiate either an I-PMSI or an S-PMSI with inter-AS P-tunnels. 2633 Specific tunneling techniques require some explanation. 2635 If ingress replication is used, the inter-AS PE-PE P-tunnels will use 2636 the inter-AS tunneling procedures for the tunneling technology used. 2638 Procedures in [RSVP-P2MP] are used for inter-AS RSVP-TE P2MP 2639 P-Tunnels. 2641 Procedures for using PIM to set up the P-tunnels are discussed in 2642 the next section. 2644 8.1.3.1. PIM-Based Inter-AS P-Multicast Trees 2646 When PIM is used to set up an inter-AS P-multicast tree, the PIM 2647 Join/Prune messages used to join the tree contain the IP address of 2648 the upstream PE. However, there are two special considerations that 2649 must be taken into account: 2651 - It is possible that the P routers within one or more of the ASes 2652 will not have routes to the upstream PE. For example, if an AS 2653 has a "BGP-free core", the P routers in an AS will not have 2654 routes to addresses outside the AS. 2656 - If the PIM Join/Prune message must travel through several ASes, 2657 it is possible that the ASBRs will not have routes to the PE 2658 routers. For example, in an inter-AS VPN constructed according 2659 to "option B" of section 10 of [RFC4364], the ASBRs do not 2660 necessarily have routes to the PE routers. 2662 If either of these two conditions obtains, then "ordinary" PIM 2663 Join/Prune messages cannot be routed to the upstream PE. Therefore, 2664 in that case the PIM Join/Prune messages MUST contain the "PIM MVPN 2665 Join Attribute". This allows the multicast distribution tree to be 2666 properly constructed even if routes to PEs in other ASes do not exist 2667 in the given AS's IGP, and even if the routes to those PEs do not 2668 exist in BGP. The use of a PIM MVPN Join Attribute in the PIM 2669 messages allows the inter-AS trees to be built.
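The following Python fragment is a non-normative sketch of the proxy-based routing of C-PIM Join/Prune messages that is specified in sections 8.1.3.2.1 and 8.1.3.2.2 below. The data structures and helper functions (rpf_lookup, find_intra_as_ad_route, bgp_next_hop) are hypothetical and exist only to illustrate how the proxy address is consulted and rewritten hop by hop; they are not part of this specification.

      # Illustrative only; the normative procedures are in section 8.1.3.2.
      def process_join_with_mvpn_attribute(join, local_addresses,
                                           rpf_lookup,
                                           find_intra_as_ad_route,
                                           bgp_next_hop):
          # 'join' is assumed to carry the MVPN Join Attribute fields:
          # join.proxy, join.rd, and join.upstream_pe.
          if join.proxy not in local_addresses:
              # Not addressed to this node: determine the RPF interface and
              # neighbor from the proxy address and pass the attribute on
              # unchanged.
              iface, neighbor = rpf_lookup(join.proxy)
              return ("forward", iface, neighbor, join)

          # The proxy is one of this router's own addresses: look for an
          # Intra-AS A-D route whose NLRI is the upstream PE address
          # prepended with the RD carried in the attribute.
          ad_route = find_intra_as_ad_route(rd=join.rd,
                                            originator=join.upstream_pe)
          if ad_route is None:
              return ("discard", None, None, None)

          # Use the BGP next hop of the matching route for the RPF lookup,
          # and rewrite the proxy; the RD and upstream PE are left unchanged.
          next_hop = bgp_next_hop(ad_route)
          iface, neighbor = rpf_lookup(next_hop)
          join.proxy = next_hop
          return ("forward", iface, neighbor, join)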
2671 The use of the PIM MVPN Join Attribute allows the following 2672 information to be added to the PIM Join/Prune messages: a 2673 "Proxy Address", which contains the address of the next ASBR on the 2674 path to the upstream PE. When the PIM Join/Prune arrives at the ASBR 2675 that is identified by the "proxy address", that ASBR must change the 2676 proxy address to identify the next hop ASBR. 2678 This information allows the PIM Join/Prune to be routed through an AS 2679 even if the P routers of that AS do not have routes to the upstream 2680 PE. However, this information is not sufficient to enable the ASBRs 2681 to route the Join/Prune if the ASBRs themselves do not have routes to 2682 the upstream PE. 2684 However, even if the ASBRs do not have routes to the upstream PE, the 2685 procedures of this draft ensure that they will have Inter-AS I-PMSI 2686 A-D routes that lead to the upstream PE. If non-segmented inter-AS 2687 P-tunnels are being used, the ASBRs (and PEs) will have Intra-AS 2688 I-PMSI A-D routes that have been distributed inter-AS. 2690 So rather than having the PIM Join/Prune messages routed by the ASBRs 2691 along a route to the upstream PE, the PIM Join/Prune messages MUST be 2692 routed along the path determined by the Intra-AS I-PMSI A-D routes. 2694 If the only Intra-AS A-D route for a given MVPN is the "Intra-AS 2695 I-PMSI Route", the PIM Join/Prunes will be routed along that route. 2696 However, if the PIM Join/Prune message is for a particular P-group 2697 address, and there is an "Intra-AS S-PMSI Route" specifying that 2698 particular P-group address as the P-tunnel for a particular S-PMSI, 2699 then the PIM Join/Prunes MUST be routed along the path determined by 2700 those Intra-AS A-D routes. 2702 The basic format of a PIM Join Attribute is specified in 2703 [PIM-ATTRIB]. The details of the PIM MVPN Join Attribute are 2704 specified in the next section. 2706 8.1.3.2. The PIM MVPN Join Attribute 2708 8.1.3.2.1. Definition 2710 In [PIM-ATTRIB], the notion of a "join attribute" is defined, and a 2711 format for including join attributes in PIM Join/Prune messages is 2712 specified. We now define a new join attribute, which we call the 2713 "MVPN Join Attribute". 2715 0 1 2 3 2716 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2717 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2718 |F|E| Attr_Type | Length | Proxy IP address 2719 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2720 | RD 2721 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-....... 2723 The Attr_Type field of the MVPN Join Attribute is set to 1. 2725 The F bit is set to 0. 2727 Two information fields are carried in the MVPN Join attribute: 2729 - Proxy: The IP address of the node towards which the PIM 2730 Join/Prune message is to be forwarded. This will either be an 2731 IPv4 or an IPv6 address, depending on whether the PIM Join/Prune 2732 message itself is IPv4 or IPv6. 2734 - RD: An eight-byte RD. This immediately follows the proxy IP 2735 address. 2737 The PIM message also carries the address of the upstream PE. 2739 In the case of an intra-AS MVPN, the proxy and the upstream PE are 2740 the same. In the case of an inter-AS MVPN, the proxy will be the ASBR 2741 that is the exit point from the local AS on the path to the upstream 2742 PE. 2744 8.1.3.2.2. Usage 2746 When a PE router creates a PIM Join/Prune message in order to set up 2747 an inter-AS I-PMSI, it does so as a result of having received a 2748 particular Intra-AS A-D route.
It includes an MVPN Join attribute 2749 whose fields are set as follows: 2751 - If the upstream PE is in the same AS as the local PE, then the 2752 proxy field contains the address of the upstream PE. Otherwise, 2753 it contains the address of the BGP next hop on the route to the 2754 upstream PE. 2756 - The RD field contains the RD from the NLRI of the Intra-AS A-D 2757 route. 2759 - The upstream PE field contains the address of the PE that 2760 originated the Intra-AS A-D route (obtained from the NLRI of that 2761 route). 2763 When a PIM router processes a PIM Join/Prune message with an MVPN 2764 Join Attribute, it first checks to see if the proxy field contains 2765 one of its own addresses. 2767 If not, the router uses the proxy IP address in order to determine 2768 the RPF interface and neighbor. The MVPN Join Attribute must be 2769 passed upstream, unchanged. 2771 If the proxy address is one of the router's own IP addresses, then 2772 the router looks in its BGP routing table for an Intra-AS A-D route 2773 whose NLRI consists of the upstream PE address prepended with the RD 2774 from the Join attribute. If there is no match, the PIM message is 2775 discarded. If there is a match the IP address from the BGP next hop 2776 field of the matching route is used in order to determine the RPF 2777 interface and neighbor. When the PIM Join/Prune is forwarded 2778 upstream, the proxy field is replaced with the address of the BGP 2779 next hop, and the RD and upstream PE fields are left unchanged. 2781 The use of non-segmented inter-AS trees constructed via BIDIR-PIM is 2782 outside the scope of this document. 2784 8.2. Segmented Inter-AS P-Tunnels 2786 The procedures for setting up and maintaining Segmented Inter-AS 2787 Inclusive and Selective P-Tunnels may be found in [MVPN-BGP]. 2789 9. Preventing Duplication of Multicast Data Packets 2791 Consider the case of an egress PE that receives packets of a 2792 particular C-flow,(C-S,C-G), over a non-aggregated S-PMSI. The 2793 procedures described so far will never cause the PE to receive 2794 duplicate copies of any packet in that stream. It is possible that 2795 the (C-S,C-G) stream is carried in more than one S-PMSI; this may 2796 happen when the site that contains C-S is multihomed to more than one 2797 PE. However, a PE that needs to receive (C-S,C-G) packets only joins 2798 one of these S-PMSIs, and so only receives one copy of each packet. 2800 However, if the data packets of stream (C-S,C-G) are carried in 2801 either an I-PMSI or in an aggregated S-PMSI, then the procedures 2802 specified so far make it possible for an egress PE to receive more 2803 than one copy of each data packet. Additional procedures are needed 2804 to either make this impossible, or to ensure that the egress PE does 2805 not forward duplicates to the CE routers. 2807 This section covers only the situation where the C-trees are 2808 unidirectional, in either the ASM or SSM service models. The case 2809 where the C-trees are bidirectional is considered separately in 2810 section 11. 2812 There are two cases where the procedures specified so far make it 2813 possible for an egress PE to receive duplicate copies of a multicast 2814 data packet. These are: 2816 1. The first case occurs when both of the following conditions 2817 hold: 2819 a. an MVPN site that contains C-S or C-RP is multihomed to 2820 more than one PE, and 2822 b. either an I-PMSI or an aggregated S-PMSI is used for 2823 carrying the packets originated by C-S. 
2825 In this case, an egress PE may receive one copy of the packet 2826 from each PE to which the site is homed. This case is 2827 discussed further in section 9.2. 2829 2. The second case occurs when all of the following conditions 2830 hold: 2832 a. the IP destination address of the customer packet, C-G, 2833 identifies a multicast group that is operating in ASM 2834 mode, and whose C-multicast tree is set up using PIM-SM 2836 b. an MI-PMSI is used for carrying the data packets, and 2838 c. a router or a CE in a site connected to the egress PE 2839 switches from the C-RP tree to the C-S tree. 2841 In this case, it is possible to get one copy of a given packet 2842 from the ingress PE attached to the C-RP's site, and one from 2843 the ingress PE attached to the C-S's site. This case is 2844 discussed further in section 9.3. 2846 Additional procedures are therefore needed to ensure that no MVPN 2847 customer sees steady state multicast data packet duplication. There 2848 are three procedures that may be used: 2850 1. Discarding data packets received from the "wrong" PE 2852 2. Single Forwarder Selection 2854 3. Native PIM methods 2856 These methods are described in section 9.1. Their applicability to 2857 the two scenarios where duplication is possible is discussed in 2858 sections 9.2 and 9.3. 2860 9.1. Methods for Ensuring Non-Duplication 2862 Every MVPN MUST use at least one of the three methods for ensuring 2863 non-duplication. 2865 9.1.1. Discarding Packets from Wrong PE 2867 Per section 5.1.3, an egress PE, say PE1, chooses a specific upstream 2868 PE for a given (C-S,C-G). When PE1 receives a (C-S,C-G) packet from a 2869 PMSI, it may be able to identify the PE that transmitted the packet 2870 onto the PMSI. If that transmitter is other than the PE selected by 2871 PE1 as the upstream PE, then PE1 can drop the packet. This means 2872 that the PE will see a duplicate, but the duplicate will not get 2873 forwarded. 2875 The method used by an egress PE to determine the ingress PE for a 2876 particular packet, received over a particular PMSI, depends on the 2877 P-tunnel technology that is used to instantiate the PMSI. If the 2878 P-tunnel is a P2MP LSP, a PIM-SM or PIM-SSM tree, or a unicast 2879 P-tunnel that uses IP encapsulation, then the tunnel encapsulation 2880 contains information that can be used (possibly along with other 2881 state information in the PE) to determine the ingress PE, as long as 2882 the P-tunnel is instantiating an intra-AS PMSI, or an inter-AS PMSI 2883 which is supported by a non-segmented inter-AS tunnel. 2885 Even when inter-AS segmented P-tunnels are used, if an aggregated 2886 S-PMSI is used for carrying the packets, the tunnel encapsulation 2887 must have some information that can be used to identify the PMSI, and 2888 that in turn implicitly identifies the ingress PE. 2890 Consider the case of an I-PMSI that spans multiple ASes and that is 2891 instantiated by segmented Inter-AS P-tunnels. Suppose it is carrying 2892 data that is traveling along a particular C-tree. Suppose also that 2893 the C-root of that C-tree is multi-homed to two or more PEs, and that 2894 each such PE is in a different AS than the others. Then if there is 2895 any duplicate traffic, the duplicates will arrive on a different 2896 P-tunnel. Specifically, if the PE was expecting the traffic on a 2897 particular inter-AS P-tunnel, duplicate traffic will arrive either on 2898 an intra-AS P-tunnel (not an intra-AS segment of an inter-AS 2899 P-tunnel), or on some other inter-AS P-tunnel.
To detect duplicates 2900 the PE has to keep track of which inter-AS A-D route the PE uses for 2901 sending MVPN multicast routing information towards C-S/C-RP. The PE 2902 MUST process received (multicast) traffic originated by C-S/C-RP only 2903 from the Inter-AS P-tunnel that was carried in the best Inter-AS A-D 2904 route for the MVPN and that was originated by the AS that contains 2905 C-S/C-RP (where "the best" is determined by the PE). The PE MUST 2906 discard, as duplicates, all other multicast traffic originated by 2907 C-S/C-RP, but received on any other P-tunnel. 2909 If, for a given MVPN, (a) MI-PMSI is used for carrying multicast data 2910 packets, (b) the MI-PMSI is instantiated by a segmented Inter-AS 2911 P-tunnel, (c) C-S or C-RP is multi-homed to different PEs, and (d) at 2912 least two of such PEs are in the same AS, then depending on the 2913 tunneling technology used to instantiate the MI-PMSI, it may not 2914 always be possible for the egress PE to determine the upstream PE. 2915 In that case the procedure of section 9.1.2 or 9.1.3 must be used. 2917 N.B.: Section 10 describes an exception case where PE1 has to accept 2918 a packet even if it is not from the selected upstream PE. 2920 9.1.2. Single Forwarder Selection 2922 Section 5.1 specifies a procedure for choosing a "default upstream PE 2923 selection", such that (except during routing transients) all PEs will 2924 choose the same default upstream PE. To ensure that duplicate 2925 packets are not sent through the backbone (except during routing 2926 transients), an ingress PE does not forward to the backbone any 2927 (C-S,C-G) multicast data packet it receives from a CE, unless the PE 2928 is the default upstream PE selection. 2930 One difference in effect between this procedure and the procedure of 2931 section 9.1.1 is that this procedure sends only one copy of each 2932 packet to each egress PE, rather than sending multiple copies and 2933 forcing the egress PE to discard all but one. 2935 9.1.3. Native PIM Methods 2937 If PE-PE multicast routing information for a given MVPN is being 2938 disseminated by running PIM over an MI-PMSI, then native PIM methods 2939 will prevent steady state data packet duplication. The PIM Assert 2940 mechanism prevents steady state duplication in the scenario of 2941 section 9.2, even if Single Forwarder Selection is not done. The PIM 2942 Prune(S,G,rpt) mechanism addresses the scenario of section 9.3. 2944 9.2. Multihomed C-S or C-RP 2946 Any of the three methods of section 9.1 will prevent steady state 2947 duplicates in the case of a multihomed C-S or C-RP. 2949 9.3. Switching from the C-RP tree to C-S tree 2951 9.3.1. How Duplicates Can Occur 2953 If some PEs are on the C-S tree and some on the C-RP tree then a PE 2954 may also receive duplicate data traffic after a (C-*,C-G) to 2955 (C-S,C-G) switch. 2957 If PIM is being used on an MI-PMSI to disseminate multicast routing 2958 information, native PIM methods (in particular, the use of the 2959 Prune(S,G,rpt) message) prevent steady state data duplication in this 2960 case. 2962 If BGP C-multicast routing is being used, then the procedure of 2963 section 9.1.1, if applicable, can be used to prevent duplication. 2964 However, if that procedure is not applicable, then the procedure of 2965 section 9.1.2 is not sufficient to prevent steady state data 2966 duplication in all scenarios. 
2968 In the scenario where (a) BGP C-multicast routing is being used, (b) 2969 there are inter-site shared C-trees, and (c) there are inter-site 2970 source C-trees, additional procedures are needed. To see this, 2971 consider the following topology: 2973 CE1---C-RP 2974 | 2975 | 2976 CE2---PE1-- ... --PE2---CE5---C-S 2977 ... 2978 C-R1---CE3---PE3-- ... --PE4---CE4---C-R2 2980 Suppose that C-R1 and C-R2 use PIM to join the (C-*,C-G) tree, where 2981 C-RP is the RP corresponding to C-G. As a result, CE3 and CE4 will send 2982 PIM Join(*,G) messages to PE3 and PE4 respectively. This will cause PE3 2983 and PE4 to originate C-multicast Shared Tree Join Routes, specifying 2984 (C-*,C-G). These routes will identify PE1 as the upstream PE. 2986 Now suppose that C-S is a transmitter for multicast group C-G, and that 2987 C-S sends its multicast data packets to C-RP in PIM register messages. 2988 Then PE1 will receive (C-S,C-G) data packets from CE1, and will forward 2989 them over an I-PMSI to PE3 and PE4, who will forward them in turn to CE3 2990 and CE4 respectively. 2992 When C-R1 receives (C-S,C-G) data packets, it may decide to join the 2993 (C-S,C-G) source tree, by sending a PIM Join(S,G) to CE3. This will in 2994 turn cause CE3 to send a PIM Join(S,G) to PE3, which will in turn cause 2995 PE3 to originate a C-multicast Source Tree Join Route, specifying 2996 (C-S,C-G), and identifying PE2 as the upstream PE. As a result, when 2997 PE2 receives (C-S,C-G) data packets from CE5, it will forward them on a 2998 PMSI to PE3. 3000 At this point, the following situation obtains: 3002 - If PE1 receives (C-S,C-G) packets from CE1, PE1 must forward them on 3003 the I-PMSI, because PE4 is still expecting to receive the (C-S,C-G) 3004 packets from PE1. 3006 - PE3 must continue to receive packets from the I-PMSI, since there 3007 may be other sources transmitting C-G traffic, and PE3 currently has 3008 no other way to receive that traffic. 3010 - PE3 must also receive (C-S,C-G) traffic from PE2. 3012 As a result, PE3 may receive two copies of each (C-S,C-G) packet. The 3013 procedure of section 9.1.2 (single forwarder selection) does not prevent 3014 PE3 from receiving two copies, because it does not prevent one PE from 3015 forwarding (C-S,C-G) traffic along the shared C-tree while another 3016 forwards (C-S,C-G) traffic along a source-specific C-tree. 3018 So if PE3 cannot apply the method of section 9.1.1 (discard packet from 3019 wrong PE), perhaps because the tunneling technology does not allow the 3020 egress PE to identify the ingress PE, then additional procedures are 3021 needed. 3023 9.3.2. Solution using Source Active A-D Routes 3025 The issue described in section 9.3.1 is resolved through the use of 3026 Source Active A-D Routes. In the remainder of this section, we provide 3027 an example of how this works, along with an informal description of 3028 the procedures. 3030 A full and precise specification of the relevant procedures can be 3031 found in section 13 of [MVPN-BGP]. In the event of any conflicts or 3032 other discrepancies between the description below and the description 3033 in [MVPN-BGP], [MVPN-BGP] is to be considered to be the authoritative 3034 document. 3036 Please note that the material in this section only applies when 3037 inter-site shared trees are being used.
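Before turning to the solution, the following toy Python model (purely illustrative; the sets and function below are invented for this example and have no protocol significance) reproduces the situation described in section 9.3.1: PE3 has joined both the shared C-tree through PE1 and the source C-tree through PE2, so it receives two copies of each (C-S,C-G) packet, while PE4 receives only one.

      # Toy model of the example topology of section 9.3.1; illustrative only.
      shared_tree_receivers = {"PE3", "PE4"}  # originated Shared Tree Join Routes (upstream PE1)
      source_tree_receivers = {"PE3"}         # originated a Source Tree Join Route (upstream PE2)

      def copies_received_per_pe():
          copies = {"PE3": 0, "PE4": 0}
          # PE1 forwards (C-S,C-G) received from CE1 on the I-PMSI, because
          # PE4 still expects to receive (C-S,C-G) from PE1.
          for pe in shared_tree_receivers:
              copies[pe] += 1
          # PE2 forwards (C-S,C-G) received from CE5 on a PMSI to the PEs
          # that joined the (C-S,C-G) source tree.
          for pe in source_tree_receivers:
              copies[pe] += 1
          return copies

      print(copies_received_per_pe())  # {'PE3': 2, 'PE4': 1}: PE3 sees duplicates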
3039 Whenever a PE creates an (C-S,C-G) state as a result of receiving a 3040 C-multicast route for (C-S,C-G) from some other PE, and the C-G group 3041 is an ASM group, the PE that creates the state MUST originate a 3042 Source Active A-D route (see [MVPN-BGP] section 4.5). The NLRI of 3043 the route includes C-S and C-G. By default, the route carries the 3044 same set of Route Targets as the Intra-AS I-PMSI A-D route of the 3045 MVPN originated by the PE. Using the normal BGP procedures, the 3046 route is propagated to all the PEs of the MVPN. For more details see 3047 Section 13.1 ("Source Within a Site - Source Active Advertisement") 3048 of [MVPN-BGP]. 3050 When as a result of receiving a new Source Active A-D route a PE 3051 updates its VRF with the route, the PE MUST check if the newly 3052 received route matches any (C-*,C-G) entries. If (a) there is a 3053 matching entry, (b) the PE does not have (C-S,C-G) state in its 3054 MVPN-TIB for (C-S,C-G) carried in the route, and (c) the received 3055 route is selected as the best (using the BGP route selection 3056 procedures), then the PE takes the following action: 3058 - If the PE's (C-*,C-G) state has a PMSI as a downstream interface, 3059 the PE acts as if all the other PEs had pruned C-S off the 3060 (C-*,C-G) tree. That is, 3062 * If the PE receives (C-S,C-G) traffic from a CE, it does not 3063 transmit it to other PEs. 3065 * Depending on the PIM state of the PE's PE-CE interfaces, the 3066 PE may or may not need to invoke PIM procedures to prune C-S 3067 off the (C-*,C-G) tree by sending a PIM Prune(S,G,rpt) to one 3068 or more of the CEs. This is determined by ordinary PIM 3069 procedures. If this does need to be done, the PE SHOULD delay 3070 sending the Prune until it first runs a timer; this helps 3071 ensure that the source is not pruned from the shared tree 3072 until all PEs have had time to receive the Source Active A-D 3073 route. 3075 - If the PE's (C-*,C-G) state does not have a PMSI as a downstream 3076 interface, the PE sets up its forwarding path to receive 3077 (C-S,C-G) traffic from the originator of the selected Source 3078 Active A-D route. 3080 Whenever a PE deletes the (C-S,C-G) state that was previously created 3081 as a result of receiving a C-multicast route for (C-S,C-G) from some 3082 other PE, the PE that deletes the state also withdraws the Source 3083 Active A-D route (if there is one) that was advertised when the state 3084 was created. 3086 In the example topology of section 9.3.1, this procedure will cause 3087 PE2 to generate a Source Active A-D route for (C-S,C-G). When this 3088 route is received, PE4 will set up its forwarding state to expect 3089 (C-S,C-G) packets from PE2. PE1 will change its forwarding state so 3090 that (C-S,C-G) packets that it receives from CE1 are not forwarded to 3091 any other PEs. (Note though that PE1 may still forward (C-S,C-G) 3092 packets received from CE1 to CE2, if CE2 has receivers for C-G and 3093 those receivers did not switch from the (C-*,C-G) tree to the 3094 (C-S,C-G) tree.) As a result, PE3 and PE4 do not receive duplicate 3095 packets of the (C-S,C-G) C-flow. 3097 With this procedure in place, there is no need to have any kind of 3098 C-multicast route that has the semantics of a PIM Prune(S,G,rpt) 3099 message. 
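The following Python sketch summarizes, in a non-normative way, the two rules just described: origination of a Source Active A-D route when (C-S,C-G) state is created for an ASM group as a result of a C-multicast route received from another PE, and the receiver-side matching of such a route against existing (C-*,C-G) state. All class methods and names used here are hypothetical; section 13 of [MVPN-BGP] remains the authoritative specification.

      # Illustrative sketch only; see [MVPN-BGP], section 13, for the
      # normative procedures.

      def on_sg_state_created_from_c_multicast_route(pe, c_s, c_g):
          # (C-S,C-G) state was created because a C-multicast route for
          # (C-S,C-G) was received from some other PE; for an ASM group,
          # originate a Source Active A-D route carrying the same Route
          # Targets as the PE's Intra-AS I-PMSI A-D route for this MVPN.
          if pe.is_asm_group(c_g):
              pe.originate_source_active_ad_route(c_s, c_g,
                                                  rts=pe.intra_as_ipmsi_rts())

      def on_best_source_active_ad_route(pe, route):
          # Called when a newly received Source Active A-D route is installed
          # in the VRF and selected as best.
          wildcard = pe.find_wildcard_state(route.c_g)     # (C-*,C-G) entry
          if wildcard is None or pe.has_sg_state(route.c_s, route.c_g):
              return
          if wildcard.has_pmsi_downstream_interface():
              # Act as if all other PEs had pruned C-S off the (C-*,C-G)
              # tree: do not transmit (C-S,C-G) received from a CE to other
              # PEs, and, if ordinary PIM procedures require it, send a
              # Prune(S,G,rpt) toward the CEs only after a delay timer.
              wildcard.stop_forwarding_to_other_pes(route.c_s)
              pe.schedule_prune_s_g_rpt_if_needed(route.c_s, route.c_g)
          else:
              # Receive (C-S,C-G) traffic from the originator of the selected
              # Source Active A-D route.
              pe.set_expected_ingress(route.c_s, route.c_g, route.originator)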
3101 It is worth noting that if, as a result of this procedure, a PE sets 3102 up its forwarding state to receive (C-S,C-G) traffic from the source 3103 tree, the UMH is not necessarily the same as it would be if the PE 3104 had joined the source tree as a result of receiving a PIM Join for 3105 the same source tree from a directly attached CE. 3107 Note that the mechanism described in section 7.4.1 can be leveraged 3108 to advertise an S-PMSI binding along with the source active messages. 3109 This is accomplished by using the same BGP Update message to carry 3110 both the NLRI of the S-PMSI A-D route and the NLRI of the Source 3111 Active A-D route. (Though an implementation processing the received 3112 routes cannot assume that this will always be the case.) 3114 10. Eliminating PE-PE Distribution of (C-*,C-G) State 3116 In the ASM service model, a node that wants to become a receiver for 3117 a particular multicast group G first joins a shared tree, rooted at a 3118 rendezvous point. When the receiver detects traffic from a 3119 particular source it has the option of joining a source tree, rooted 3120 at that source. If it does so, it has to prune that source from the 3121 shared tree, to ensure that it receives packets from that source on 3122 only one tree. 3124 Maintaining the shared tree can require considerable state, as it is 3125 necessary not only to know who the upstream and downstream nodes are, 3126 but to know which sources have been pruned off which branches of the 3127 share tree. 3129 The BGP-based signaling procedures defined in this document and in 3130 [MVPN-BGP] eliminate the need for PEs to distribute to each other any 3131 state having to do with which sources have been pruned off a shared 3132 C-tree. Those procedures do still allow multicast data traffic to 3133 travel on a shared C-tree, but they do not allow a situation in which 3134 some CEs receive (S,G) traffic on a shared tree and some on a source 3135 tree. This results in a considerable simplification of the PE-PE 3136 procedures with minimal change to the multicast service seen within 3137 the VPN. However, shared C-trees are still supported across the VPN 3138 backbone. That is, (C-*,C-G) state is distributed PE-PE, but (C-*, 3139 C-G, RPT-bit) state is not. 3141 In this section, we specify a number of optional procedures which go 3142 further, and which completely eliminate the support for shared 3143 C-trees across the VPN backbone. In these procedures, the PEs keep 3144 track of the active sources for each C-G. As soon as a CE tries to 3145 join the (*,G) tree, the PEs instead join the (S,G) trees for all the 3146 active sources. Thus all distribution of (C-*,C-G) state is 3147 eliminated. These procedures are optional because they require some 3148 additional support on the part of the VPN customer, and because they 3149 are not always appropriate. (E.g., a VPN customer may have his own 3150 policy of always using shared trees for certain multicast groups.) 3151 There are several different options, described in the following 3152 sub-sections. 3154 10.1. Co-locating C-RPs on a PE 3156 [MVPN-REQ] describes C-RP engineering as an issue when PIM-SM (or 3157 BIDIR-PIM) is used in "Any Source Multicast (ASM) mode" [RFC4607] on 3158 the VPN customer site. 
To quote from [MVPN-REQ]: 3160 "In some cases this engineering problem is not trivial: for instance, 3161 if sources and receivers are located in VPN sites that are different 3162 than that of the RP, then traffic may flow twice through the SP 3163 network and the CE-PE link of the RP (from source to RP, and then 3164 from RP to receivers) ; this is obviously not ideal. A multicast VPN 3165 solution SHOULD propose a way to help on solving this RP engineering 3166 issue." 3168 One of the C-RP deployment models is for the customer to outsource 3169 the RP to the provider. In this case the provider may co-locate the 3170 RP on the PE that is connected to the customer site [MVPN-REQ]. This 3171 section describes how anycast-RP can be used for achieving this. This 3172 is described below. 3174 10.1.1. Initial Configuration 3176 For a particular MVPN, at least one or more PEs that have sites in 3177 that MVPN, act as an RP for the sites of that MVPN connected to these 3178 PEs. Within each MVPN all these RPs use the same (anycast) address. 3179 All these RPs use the Anycast RP technique. 3181 10.1.2. Anycast RP Based on Propagating Active Sources 3183 This mechanism is based on propagating active sources between RPs. 3185 10.1.2.1. Receiver(s) Within a Site 3187 The PE that receives C-Join for (*,G) does not send the information 3188 that it has receiver(s) for G until it receives information about 3189 active sources for G from an upstream PE. 3191 On receiving this (described in the next section), the downstream PE 3192 will respond with Join for (C-S,C-G). Sending this information could 3193 be done using any of the procedures described in section 5. Only the 3194 upstream PE will process this information. 3196 10.1.2.2. Source Within a Site 3198 When a PE receives PIM-Register from a site that belongs to a given 3199 VPN, PE follows the normal PIM anycast RP procedures. It then 3200 advertises the source and group of the multicast data packet carried 3201 in PIM-Register message to other PEs in BGP using the following 3202 information elements: 3204 - Active source address 3206 - Active group address 3208 - Route target of the MVPN. 3210 This advertisement goes to all the PEs that belong to that MVPN. When 3211 a PE receives this advertisement, it checks whether there are any 3212 receivers in the sites attached to the PE for the group carried in 3213 the source active advertisement. If yes, then it generates an 3214 advertisement for (C-S,C-G) as specified in the previous section. 3216 10.1.2.3. Receiver Switching from Shared to Source Tree 3218 No additional procedures are required when multicast receivers in 3219 customer's site shift from shared tree to source tree. 3221 10.2. Using MSDP between a PE and a Local C-RP 3223 Section 10.1 describes the case where each PE is a C-RP. This 3224 enables the PEs to know the active multicast sources for each MVPN, 3225 and they can then use BGP to distribute this information to each 3226 other. As a result, the PEs do not have to join any shared C-trees, 3227 and this results in a simplification of the PE operation. 3229 In another deployment scenario, the PEs are not themselves C-RPs, but 3230 use MSDP [RFC3618] to talk to the C-RPs. In particular, a PE that 3231 attaches to a site that contains a C-RP becomes an MSDP peer of that 3232 C-RP. That PE then uses BGP to distribute the information about the 3233 active sources to the other PEs. 
When the PE determines, by MSDP, 3234 that a particular source is no longer active, it withdraws the 3235 corresponding BGP update. Thus the PEs do not have to join any 3236 shared C-trees, but they do not have to be C-RPs either. 3238 MSDP provides the capability for a Source Active (SA) message to 3239 carry an encapsulated data packet. This capability can be used to 3240 allow an MSDP speaker to receive the first (or first several) 3241 packet(s) of an (S,G) flow, even though the MSDP speaker hasn't yet 3242 joined the (S,G) tree. (Presumably it will join that tree as a 3243 result of receiving the SA message that carries the encapsulated data 3244 packet.) If this capability is not used, the first several data 3245 packets of an (S,G) stream may be lost. 3247 A PE that is talking MSDP to an RP may receive such an encapsulated 3248 data packet from the RP. The data packet should be decapsulated and 3249 transmitted to the other PEs in the MVPN. If the packet belongs to a 3250 particular (S,G) flow, and if the PE is a transmitter for some S-PMSI 3251 to which (S,G) has already been bound, the decapsulated data packet 3252 should be transmitted on that S-PMSI. Otherwise, if an I-PMSI exists 3253 for that MVPN, the decapsulated data packet should be transmitted on 3254 it. (If an MI-PMSI exists, this would typically be used.) If neither 3255 of these conditions holds, the decapsulated data packet is not 3256 transmitted to the other PEs in the MVPN. The decision as to whether 3257 and how to transmit the decapsulated data packet does not affect the 3258 processing of the SA control message itself. 3260 Suppose that PE1 transmits a multicast data packet on a PMSI, where 3261 that data packet is part of an (S,G) flow, and PE2 receives that 3262 packet from that PMSI. According to section 9, if PE1 is not the PE 3263 that PE2 expects to be transmitting (S,G) packets, then PE2 must 3264 discard the packet. If an MSDP-encapsulated data packet is 3265 transmitted on a PMSI as specified above, this rule from section 9 3266 would likely result in the packet's getting discarded. Therefore, if 3267 MSDP-encapsulated data packets are being decapsulated and transmitted on 3268 a PMSI, we need to modify the rules of section 9 as follows: 3270 1. If the receiving PE, PE2, has already joined the (S,G) tree, 3271 and has chosen PE1 as the upstream PE for the (S,G) tree, but 3272 this packet does not come from PE1, PE2 must discard the 3273 packet. 3275 2. If the receiving PE, PE2, has not already joined the (S,G) 3276 tree, but is a PIM adjacency to a CE that is downstream on the 3277 (*,G) tree, the packet should be forwarded to the CE. 3279 11. Support for PIM-BIDIR C-Groups 3281 In BIDIR-PIM, each multicast group is associated with an RPA 3282 (Rendezvous Point Address). The Rendezvous Point Link (RPL) is the 3283 link that attaches to the RPA. Usually it's a LAN where the RPA is 3284 in the IP subnet assigned to the LAN. The root node of a BIDIR-PIM 3285 tree is a node that has an interface on the RPL. 3287 On any LAN (other than the RPL) that is a link in a PIM-bidir tree, 3288 there must be a single node that has been chosen to be the DF (Designated Forwarder). (More 3289 precisely, for each RPA there is a single node that is the DF for 3290 that RPA.) A node that receives traffic from an upstream interface 3291 may forward it on a particular downstream interface only if the node 3292 is the DF for that downstream interface.
A node that receives 3293 traffic from a downstream interface may forward it on an upstream 3294 interface only if that node is the DF for the downstream interface. 3296 If, for any period of time, there is a link on which each of two 3297 different nodes believes itself to be the DF, data forwarding loops 3298 can form. Loops in a bidirectional multicast tree can be very 3299 harmful. However, any election procedure will have a convergence 3300 period. The BIDIR-PIM DF election procedure is very complicated, 3301 because it goes to great pains to ensure that if convergence is not 3302 extremely fast, then there is no forwarding at all until convergence 3303 has taken place. 3305 Other variants of PIM also have a DF election procedure for LANs. 3306 However, as long as the multicast tree is unidirectional, 3307 disagreement about who the DF is can result only in duplication of 3308 packets, not in loops. Therefore the time taken to converge on a 3309 single DF is of much less concern for unidirectional trees than it is 3310 for bidirectional trees. 3312 In the MVPN environment, if PIM signaling is used among the PEs, then 3313 the standard LAN-based DF election procedure can be used. However, 3314 election procedures that are optimized for a LAN may not work as well 3315 in the MVPN environment. So an alternative to DF election would be 3316 desirable. 3318 If BGP signaling is used among the PEs, an alternative to DF election 3319 is necessary. One might think that the "single forwarder selection" 3320 procedures described in sections 5 and 9 could be used to choose a 3321 single PE "DF" for the backbone (for a given RPA in a given MVPN). 3322 However, that is still likely to leave a convergence period of at 3323 least several seconds during which loops could form, and there could 3324 be a much longer convergence period if there is anything disrupting 3325 the smooth flow of BGP updates. So a simple procedure like that is 3326 not sufficient. 3328 The remainder of this section describes two different methods that 3329 can be used to support BIDIR-PIM while eliminating the DF election. 3331 11.1. The VPN Backbone Becomes the RPL 3333 On a per MVPN basis, this method treats the whole service provider(s) 3334 infrastructure as a single RPL (RP Link). We refer to such an RPL as 3335 an "MVPN-RPL". This eliminates the need for the PEs to engage in any 3336 "DF election" procedure, because PIM-bidir does not have a DF on the 3337 RPL. 3339 However, this method can only be used if the customer is 3340 "outsourcing" the RPL/RPA functionality to the SP. 3342 An MVPN-RPL could be realized either via an I-PMSI (this I-PMSI is on 3343 a per MVPN basis and spans all the PEs that have sites of a given 3344 MVPN), or via a collection of S-PMSIs, or even via a combination of 3345 an I-PMSI and one or more S-PMSIs. 3347 11.1.1. Control Plane 3349 Associated with each MVPN-RPL is an address prefix that is 3350 unambiguous within the context of the MVPN associated with the 3351 MVPN-RPL. 3353 For a given MVPN, each VRF connected to an MVPN-RPL of that MVPN is 3354 configured to advertise to all of its connected CEs the address 3355 prefix of the MVPN-RPL.
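As an informal summary of the control- and data-plane rules given in the remainder of this section and in section 11.1.2 below, the following Python sketch captures when a PE may send onto, and how it handles traffic received from, an MVPN-RPL. The method names are invented for illustration and carry no normative weight.

      # Non-normative sketch of the MVPN-RPL rules of sections 11.1.1/11.1.2.

      def on_ce_wildcard_join(pe, c_g):
          # 11.1.1: a PE with downstream (C-*,C-G) state originates a
          # C-multicast route for (C-*,C-G), with the VRF's RD and with RTs
          # that reach all the PEs of the MVPN.
          pe.originate_wildcard_c_multicast_route(c_g, rd=pe.vrf_rd(),
                                                  rts=pe.mvpn_rts())

      def on_packet_from_ce(pe, pkt):
          # 11.1.2: forward onto the MVPN-RPL only if at least one
          # C-multicast route for (C-*,C-G) has been received.
          if pe.received_wildcard_c_multicast_route(pkt.c_g):
              pe.send_on_mvpn_rpl(pkt)

      def on_packet_from_mvpn_rpl(pe, pkt):
          # 11.1.2: deliver only to directly connected CEs that sent a
          # (C-*,C-G) Join; never forward the packet back onto the MVPN-RPL.
          if pe.has_downstream_wildcard_state(pkt.c_g):
              for ce in pe.ces_with_wildcard_join(pkt.c_g):
                  pe.send_to_ce(ce, pkt)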
3357 Since in PIM Bidir there is no Designated Forwarder on an RPL, in the 3358 context of MVPN-RPL there is no need to perform the Designated 3359 Forwarder election among the PEs (note that it is still necessary to 3360 perform the Designated Forwarder election between a PE and its 3361 directly attached CEs, but that is done using plain PIM Bidir 3362 procedures). 3364 For a given MVPN, a PE connected to an MVPN-RPL of that MVPN should 3365 send multicast data (C-S,C-G) on the MVPN-RPL only if at least one 3366 other PE connected to the MVPN-RPL has a downstream multicast state 3367 for C-G. In the context of MVPN this is accomplished by requiring a 3368 PE that has a downstream state for a particular C-G of a particular 3369 VRF present on the PE to originate a C-multicast route for (C-*, 3370 C-G). The RD of this route should be the same as the RD associated 3371 with the VRF. The RTs carried by the route should be such as to 3372 ensure that the route gets distributed to all the PEs of the MVPN. 3374 11.1.2. Data Plane 3376 A PE that receives (C-S,C-G) multicast data from a CE should forward 3377 this data on the MVPN-RPL of the MVPN the CE belongs to only if the 3378 PE receives at least one C-multicast route for (C-*, C-G). 3379 Otherwise, the PE should not forward the data on the RPL/I-PMSI. 3381 When a PE receives a multicast packet with (C-S,C-G) on an MVPN-RPL 3382 associated with a given MVPN, the PE forwards this packet to every 3383 directly connected CE of that MVPN, provided that the CE sends Join 3384 (C-*,C-G) to the PE (i.e., provided that the PE has the downstream 3385 (C-*,C-G) state). The PE does not forward this packet back on the 3386 MVPN-RPL. If a PE has no downstream (C-*,C-G) state, the PE does not 3387 forward the packet. 3389 11.2. Partitioned Sets of PEs 3391 This method does not require the use of the MVPN-RPL, and does not 3392 require the customer to outsource the RPA/RPL functionality to the 3393 SP. 3395 11.2.1. Partitions 3397 Consider a particular C-RPA, call it C-R, in a particular MVPN. 3398 Consider the set of PEs that attach to sites that have senders or 3399 receivers for a BIDIR-PIM group C-G, where C-R is the RPA for C-G. 3400 (As always we use the "C-" prefix to indicate that we are referring 3401 to an address in the VPN's address space rather than in the 3402 provider's address space.) 3404 Following the procedures of section 5.1, each PE in the set 3405 independently chooses some other PE in the set to be its "upstream 3406 PE" for those BIDIR-PIM groups with RPA C-R. Optionally, they can 3407 all choose the "default selection" (described in section 5.1), to 3408 ensure that each PE chooses the same upstream PE. Note that if a 3409 PE has a route to C-R via a VRF interface, then the PE may choose 3410 itself as the upstream PE. 3412 The set of PEs can now be partitioned into a number of subsets. 3413 We'll say that PE1 and PE2 are in the same partition if and only if 3414 there is some PE3 such that PE1 and PE2 have each chosen PE3 as the 3415 upstream PE for C-R. Note that each partition has exactly one 3416 upstream PE. So it is possible to identify the partition by 3417 identifying its upstream PE. 3419 Consider packet P, and let PE1 be its ingress PE. PE1 will send the 3420 packet on a PMSI so that it reaches the other PEs that need to 3421 receive it. This is done by encapsulating the packet and sending it 3422 on a P-tunnel.
If the original packet is part of a PIM-BIDIR group 3423 (its ingress PE determines this from the packet's destination address 3424 C-G), and if the VPN backbone is not the RPL, then the encapsulation 3425 MUST carry information that can be used to identify the partition to 3426 which the ingress PE belongs. 3428 When PE2 receives a packet from the PMSI, PE2 must determine, by 3429 examining the encapsulation, whether the packet's ingress PE belongs 3430 to the same partition (relative to the C-RPA of the packet's C-G) 3431 that PE2 itself belongs to. If not, PE2 discards the packet. 3432 Otherwise PE2 performs the normal BIDIR-PIM data packet processing. 3433 With this rule in place, harmful loops cannot be introduced by the 3434 PEs into the customer's bidirectional tree. 3436 Note that if there is more than one partition, the VPN backbone will 3437 not carry a packet from one partition to another. The only way for a 3438 packet to get from one partition to another is for it to go up 3439 towards the RPA and then to go down another path to the backbone. If 3440 this is not considered desirable, then all PEs should choose the same 3441 upstream PE for a given C-RPA. Then multiple partitions will only 3442 exist during routing transients. 3444 11.2.2. Using PE Distinguisher Labels 3446 If a given P-tunnel is to be used to carry packets traveling along a 3447 bidirectional C-tree, then, EXCEPT for the case described in sections 3448 11.1 and 11.2.3, the packets that travel on that P-tunnel MUST carry 3449 a PE Distinguisher Label (defined in section 4), using the 3450 encapsulation discussed in section 12.3. 3452 When a given PE transmits a given packet of a bidirectional C-group 3453 to the P-tunnel, the packet will carry the PE Distinguisher Label 3454 corresponding to the partition, for the C-group's C-RPA, that 3455 contains the transmitting PE. This is the PE Distinguisher Label 3456 that has been bound to the upstream PE of that partition; it is not 3457 necessarily the label that has been bound to the transmitting PE. 3459 Recall that the PE Distinguisher Labels are upstream-assigned labels 3460 that are assigned and advertised by the node that is at the root of 3461 the P-tunnel. The information about PE Distinguisher labels is 3462 distributed with Intra-AS I-PMSI A-D routes and/or S-PMSI A-D routes 3463 by encoding it into the PE Distinguisher Label attribute carried by 3464 these routes. 3466 When a PE receives a packet with a PE label that does not identify 3467 the partition of the receiving PE, then the receiving PE discards the 3468 packet. 3470 Note that this procedure does not necessarily require the root of a 3471 P-tunnel to assign a PE Distinguisher Label for every PE that belongs 3472 to the tunnel. If the root of the P-tunnel is the only PE that can 3473 transmit packets to the P-tunnel, then the root needs to assign PE 3474 Distinguisher Labels only for those PEs that the root has selected to 3475 be the UMHs for the particular C-RPAs known to the root. 3477 11.2.3. Partial Mesh of MP2MP P-Tunnels 3479 There is one case in which support for BIDIR-PIM C-groups does not 3480 require the use of a PE Distinguisher Label. For a given C-RPA, 3481 suppose a distinct MP2MP LSP is used as the P-tunnel serving that 3482 partition. Then, for a given packet, a PE receiving the packet from a 3483 P-tunnel can infer the partition from the tunnel. So PE 3484 Distinguisher Labels are not needed in this case. 3486 12.
3486 12. Encapsulations

3488 The BGP-based auto-discovery procedures will ensure that the PEs in a
3489 single MVPN only use tunnels that they can all support, and, for a
3490 given kind of tunnel, that they only use encapsulations that they can
3491 all support.

3493 12.1. Encapsulations for Single PMSI per P-Tunnel

3495 12.1.1. Encapsulation in GRE

3497 GRE encapsulation can be used for any PMSI that is instantiated by a
3498 mesh of unicast P-tunnels, as well as for any PMSI that is
3499 instantiated by one or more PIM P-tunnels of any sort.

3501    Packets received       Packets in transit      Packets forwarded
3502    at ingress PE          in the service          by egress PEs
3503                           provider network

3505                           +---------------+
3506                           |  P-IP Header  |
3507                           +---------------+
3508                           |      GRE      |
3509    ++=============++      ++=============++      ++=============++
3510    || C-IP Header ||      || C-IP Header ||      || C-IP Header ||
3511    ++=============++ >>>>> ++=============++ >>>>> ++=============++
3512    ||  C-Payload  ||      ||  C-Payload  ||      ||  C-Payload  ||
3513    ++=============++      ++=============++      ++=============++

3515 The IP Protocol Number field in the P-IP Header MUST be set to 47.
3516 The Protocol Type field of the GRE Header is set to either 0x800 or
3517 0x86dd, depending on whether the C-IP Header is IPv4 or IPv6,
3518 respectively.

3520 When an encapsulated packet is transmitted by a particular PE, the
3521 source IP address in the P-IP header must be the same address that
3522 the PE uses to identify itself in the VRF Route Import Extended
3523 Communities that it attaches to any of the VPN-IP routes eligible for
3524 UMH determination that it advertises via BGP (see section 5.1).

3526 If the PMSI is instantiated by a PIM tree, the destination IP address
3527 in the P-IP header is the group P-address associated with that tree.
3528 The GRE Key field is omitted.

3530 If the PMSI is instantiated by unicast P-tunnels, the destination IP
3531 address is the address of the destination PE, and the optional GRE
3532 Key field is used to identify a particular MVPN.  In this case, each
3533 PE would have to advertise a Key field value for each MVPN; each PE
3534 would assign the Key field value that it expects to receive.

3536 [RFC2784] specifies an optional GRE checksum field, and [RFC2890]
3537 specifies an optional GRE sequence number field.

3539 The GRE sequence number field is not needed, because the transport
3540 layer services for the original application will be provided by the
3541 C-IP Header.

3543 The use of the GRE checksum field must follow [RFC2784].

3545 To facilitate high-speed implementation, this document recommends
3546 that the ingress PE routers encapsulate VPN packets without setting
3547 the checksum or sequence fields.
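As a non-normative illustration of the encapsulation above, the following
sketch composes an IPv4 P-IP header and a GRE header around an IPv4
C-packet.  The helper name, addresses, and default values are hypothetical;
a real PE performs this in the forwarding plane and computes the IPv4
header checksum, which is omitted here for brevity.

   # Non-normative sketch of the section 12.1.1 encapsulation (IPv4 case).
   import socket
   import struct

   def gre_encapsulate(c_packet, p_src, p_dst, key=None):
       # GRE header [RFC2784]: flags/version = 0; the K bit (0x2000) is
       # set only if a Key is present [RFC2890].  Protocol Type 0x800 for
       # an IPv4 C-IP header (0x86dd would be used for IPv6).
       flags = 0x2000 if key is not None else 0x0000
       gre = struct.pack("!HH", flags, 0x0800)
       if key is not None:
           gre += struct.pack("!I", key)   # Key identifies the MVPN when
                                           # unicast P-tunnels are used
       total_len = 20 + len(gre) + len(c_packet)
       # Minimal IPv4 P-IP header: protocol 47 (GRE), DF bit set (see
       # section 12.4.1); header checksum left at 0 for brevity.
       p_ip = struct.pack("!BBHHHBBH4s4s",
                          0x45, 0, total_len, 0, 0x4000, 64, 47, 0,
                          socket.inet_aton(p_src), socket.inet_aton(p_dst))
       return p_ip + gre + c_packet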
3549 12.1.2. Encapsulation in IP

3551 IP-in-IP [RFC2003] is also a viable option.  The following diagram
3552 shows the progression of the packet as it enters and leaves the
3553 service provider network.

3555    Packets received       Packets in transit      Packets forwarded
3556    at ingress PE          in the service          by egress PEs
3557                           provider network

3559                           +---------------+
3560                           |  P-IP Header  |
3561    ++=============++      ++=============++      ++=============++
3562    || C-IP Header ||      || C-IP Header ||      || C-IP Header ||
3563    ++=============++ >>>>> ++=============++ >>>>> ++=============++
3564    ||  C-Payload  ||      ||  C-Payload  ||      ||  C-Payload  ||
3565    ++=============++      ++=============++      ++=============++

3567 When the P-IP Header is an IPv4 header, its Protocol Number field is
3568 set to either 4 or 41, depending on whether the C-IP header is an
3569 IPv4 header or an IPv6 header, respectively.

3571 When the P-IP Header is an IPv6 header, its Next Header field is set
3572 to either 4 or 41, depending on whether the C-IP header is an IPv4
3573 header or an IPv6 header, respectively.

3575 When an encapsulated packet is transmitted by a particular PE, the
3576 source IP address in the P-IP header must be the same address that
3577 the PE uses to identify itself in the VRF Route Import Extended
3578 Communities that it attaches to any of the VPN-IP routes eligible for
3579 UMH determination that it advertises via BGP (see section 5.1).

3581 12.1.3. Encapsulation in MPLS

3583 If the PMSI is instantiated as a P2MP MPLS LSP or an MP2MP LSP, MPLS
3584 encapsulation is used.  Penultimate hop popping MUST be disabled for
3585 the LSP.

3587 If other methods of assigning MPLS labels to multicast distribution
3588 trees are in use, these multicast distribution trees may be used as
3589 appropriate to instantiate PMSIs, and appropriate additional MPLS
3590 encapsulation procedures may be used.

3592    Packets received       Packets in transit      Packets forwarded
3593    at ingress PE          in the service          by egress PEs
3594                           provider network

3596                           +---------------+
3597                           | P-MPLS Header |
3598    ++=============++      ++=============++      ++=============++
3599    || C-IP Header ||      || C-IP Header ||      || C-IP Header ||
3600    ++=============++ >>>>> ++=============++ >>>>> ++=============++
3601    ||  C-Payload  ||      ||  C-Payload  ||      ||  C-Payload  ||
3602    ++=============++      ++=============++      ++=============++

3604 12.2. Encapsulations for Multiple PMSIs per P-Tunnel

3606 The encapsulations for transmitting multicast data messages when
3607 there are multiple PMSIs per P-tunnel are based on the encapsulation
3608 for a single PMSI per P-tunnel, but with an MPLS label used for
3609 demultiplexing.

3611 The label is upstream-assigned and distributed via BGP as specified
3612 in section 4.  The label must enable the receiver to select the
3613 proper VRF, and may enable the receiver to select a particular
3614 multicast routing entry within that VRF.

3616 12.2.1. Encapsulation in GRE

3618 Rather than the IP-in-GRE encapsulation discussed in section 12.1.1,
3619 we use the MPLS-in-GRE encapsulation.  This is specified in
3620 [MPLS-IP].  The GRE protocol type MUST be set to 0x8847.  (The reason
3621 for using the unicast rather than the multicast value is specified in
3622 [MPLS-MCAST-ENCAPS].)

3624 12.2.2. Encapsulation in IP

3626 Rather than the IP-in-IP encapsulation discussed in section 12.1.2,
3627 we use the MPLS-in-IP encapsulation.  This is specified in [MPLS-IP].
3628 The IP protocol number MUST be set to the value identifying the
3629 payload as an MPLS unicast packet.  (There is no "MPLS multicast
3630 packet" protocol number.)
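The following non-normative sketch shows the demultiplexing label being
pushed, together with the values that identify the payload as MPLS for the
MPLS-in-GRE and MPLS-in-IP encapsulations; the helper name is hypothetical.

   # Non-normative sketch of the section 12.2 demultiplexing label.
   import struct

   GRE_PROTOCOL_TYPE_MPLS = 0x8847   # MPLS unicast, section 12.2.1
   IP_PROTOCOL_MPLS_IN_IP = 137      # MPLS-in-IP protocol number [MPLS-IP]

   def push_label(label, packet, ttl=64, bottom_of_stack=True, tc=0):
       # One 32-bit MPLS label stack entry [MPLS-HDR]:
       #   label (20 bits) | TC (3 bits) | S (1 bit) | TTL (8 bits).
       # Here 'label' would be the upstream-assigned label that lets the
       # receiving PE select the proper VRF (and possibly a particular
       # multicast routing entry within that VRF).
       entry = (label << 12) | (tc << 9) | (int(bottom_of_stack) << 8) | ttl
       return struct.pack("!I", entry) + packet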
3632 12.3. Encapsulations Identifying a Distinguished PE

3634 12.3.1. For MP2MP LSP P-tunnels

3636 As discussed in section 9, if a multicast data packet is traveling on
3637 a unidirectional C-tree, it is highly desirable for the PE that
3638 receives the packet from a PMSI to be able to determine the identity
3639 of the PE that transmitted the data packet onto the PMSI.  The
3640 encapsulations of the previous sections all provide this information,
3641 except in one case.  If a PMSI is being instantiated by an MP2MP LSP,
3642 then the encapsulations discussed so far do not allow one to
3643 determine the identity of the PE that transmitted the packet onto the
3644 PMSI.

3646 Therefore, when a packet traveling on a unidirectional C-tree is
3647 traveling on an MP2MP LSP P-tunnel, it MUST carry, as its second
3648 label, a label that has been bound to the packet's ingress PE.  This
3649 label is an upstream-assigned label that the LSP's root node has
3650 bound to the ingress PE and has distributed via the PE Distinguisher
3651 Labels attribute of a PMSI A-D Route (see section 4).  This label
3652 will appear immediately beneath the labels that are discussed in
3653 sections 12.1.3 and 12.2.

3655 A full specification of the procedures for advertising and for using
3656 the PE Distinguisher Labels in this case is outside the scope of this
3657 document.

3659 12.3.2. For Support of PIM-BIDIR C-Groups

3661 As was discussed in section 11, when a packet belongs to a PIM-BIDIR
3662 multicast group, the set of PEs of that packet's VPN can be
3663 partitioned into a number of subsets, where exactly one PE in each
3664 partition is the upstream PE for that partition.  When such packets
3665 are transmitted on a PMSI, then unless the procedures of section
3666 11.2.3 are being used, it is necessary for the packet to carry
3667 information identifying a particular partition.  This is done by
3668 having the packet carry the PE Distinguisher Label corresponding to
3669 the upstream PE of the partition that contains the ingress PE.  For a
3670 particular P-tunnel, this label will have been advertised by the node
3671 that is the root of that P-tunnel.  (A full specification of the
3672 procedures for advertising PE Distinguisher Labels is outside the scope of this document.)

3674 This label needs to be used whenever a packet belongs to a PIM-BIDIR
3675 C-group, no matter what encapsulation is used by the P-tunnel.  Hence
3676 the encapsulations of section 12.2 MUST be used.  If the P-tunnel
3677 contains only one PMSI, the PE label replaces the label discussed in
3678 section 12.2.  If the P-tunnel contains multiple PMSIs, the PE label
3679 follows the label discussed in section 12.2.

3681 In general, PE Distinguisher Labels can be carried if the
3682 encapsulation is MPLS, MPLS-in-IP, or MPLS-in-GRE.  However,
3683 procedures for advertising and using PE Distinguisher Labels when the
3684 encapsulation is LDP-based MP2P MPLS are outside the scope of this
3685 specification.
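The placement rules above can be summarized with a non-normative sketch
that reuses the hypothetical push_label() helper sketched after section
12.2.2; the function and argument names are illustrative only.

   # Non-normative sketch of sections 12.3.1/12.3.2 label placement.
   # 'pe_dist_label' is the upstream-assigned PE Distinguisher Label bound
   # by the P-tunnel's root to the relevant PE (the ingress PE in the
   # section 12.3.1 case, or a partition's upstream PE in the section
   # 12.3.2 case).

   def add_pe_distinguisher_label(packet, pe_dist_label, demux_label=None):
       if demux_label is None:
           # P-tunnel carrying a single PMSI: the PE label takes the place
           # of the section 12.2 demultiplexing label.
           return push_label(pe_dist_label, packet)
       # P-tunnel carrying multiple PMSIs: the PE label appears immediately
       # beneath the section 12.2 demultiplexing label.
       inner = push_label(pe_dist_label, packet)
       return push_label(demux_label, inner, bottom_of_stack=False)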
3687 12.4. General Considerations for IP and GRE Encaps

3689 These considerations also apply to the MPLS-in-IP and MPLS-in-GRE
encapsulations.

3691 12.4.1. MTU (Maximum Transmission Unit)

3693 It is the responsibility of the originator of a C-packet to ensure
3694 that the packet is small enough to reach all of its destinations,
3695 even when it is encapsulated within IP or GRE.

3697 When a packet is encapsulated in IP or GRE, the router that does the
3698 encapsulation MUST set the DF bit in the outer header.  This ensures
3699 that the decapsulating router will not need to reassemble the
3700 encapsulating packets before performing decapsulation.

3702 In some cases the encapsulating router may know that a particular
3703 C-packet is too large to reach its destinations.  Procedures by which
3704 it may know this are outside the scope of the current document.
3705 However, if this is known, then:

3707 - If the DF bit is set in the IP header of a C-packet that is known
3708   to be too large, the router will discard the C-packet as being
3709   "too large", and follow normal IP procedures (which may require
3710   the return of an ICMP message to the source).

3712 - If the DF bit is not set in the IP header of a C-packet that is
3713   known to be too large, the router MAY fragment the packet before
3714   encapsulating it, and then encapsulate each fragment separately.
3715   Alternatively, the router MAY discard the packet.

3717 If the router discards a packet as too large, it should maintain OAM
3718 information related to this behavior, allowing the operator to
3719 properly troubleshoot the issue.

3721 Note that if the entire path of the P-tunnel does not support an MTU
3722 that is large enough to carry a particular encapsulated C-packet,
3723 and if the encapsulating router does not perform fragmentation, then
3724 the customer will not receive the expected connectivity.

3726 12.4.2. TTL (Time to Live)

3728 The ingress PE should not copy the TTL field from the payload IP
3729 header received from a CE router to the delivery IP or MPLS header.
3730 The setting of the TTL of the delivery header is determined by the
3731 local policy of the ingress PE router.

3733 12.4.3. Avoiding Conflict with Internet Multicast

3735 If the SP is providing Internet multicast, distinct from its VPN
3736 multicast services, and using PIM-based P-multicast trees, it must
3737 ensure that the group P-addresses that it uses in support of MVPN
3738 services are distinct from any of the group addresses of the Internet
3739 multicasts it supports.  This is best done by using administratively
3740 scoped addresses [ADMIN-ADDR].

3742 The group C-addresses need not be distinct from either the group
3743 P-addresses or the Internet multicast addresses.

3745 12.5. Differentiated Services

3747 The setting of the DS (Differentiated Services) field in the delivery
3748 IP header should follow the guidelines outlined in [RFC2983].
3749 Setting the EXP field in the delivery MPLS header should follow the
3750 guidelines in [RFC3270].  An SP may also choose to deploy any
3751 additional Differentiated Services mechanisms that the PE routers
3752 support for the encapsulation in use.  Note that the type of
3753 encapsulation determines the set of Differentiated Services
3754 mechanisms that may be deployed.

3756 13. Security Considerations

3758 This document describes an extension to the procedures of [RFC4364],
3759 and hence shares the security considerations described in [RFC4364]
3760 and [RFC4365].

3762 When GRE encapsulation is used, the security considerations of
3763 [MPLS-IP] are also relevant.  The security considerations of
3764 [RFC4797] are also relevant, as that document discusses the
3765 implications of packet spoofing in the context of BGP/MPLS IP VPNs.

3767 The security considerations of [MPLS-HDR] apply when MPLS
3768 encapsulation is used.

3770 This document makes use of a number of control protocols: PIM
3772 [PIM-SM], BGP [MVPN-BGP], mLDP [MLDP], and RSVP-TE [RSVP-P2MP].
3773 Security considerations relevant to each protocol are discussed in
3774 the respective protocol specifications.
3776 If one uses the UDP-based protocol for switching to S-PMSI (as
3777 specified in Section 7.4.2), then an S-PMSI Join message (i.e., a UDP
3778 packet with destination port 3232 and destination address
3779 ALL-PIM-ROUTERS) that is not received over a PMSI (e.g., one received
3780 directly from a CE router) is an illegal packet and MUST be dropped.

3782 The various procedures for P-tunnel construction have security issues
3783 that are specific to the way that the P-tunnels are used in this
3784 document.  When P-tunnels are constructed via such techniques as PIM,
3785 mLDP, or RSVP-TE, each P or PE router receiving a
3786 control message MUST ensure that the control message comes from
3787 another P or PE router, not from a CE router.  (Interpreting an mLDP
3788 or PIM or RSVP-TE control message from a CE router as referring to a
3789 P-tunnel would be a bug.)

3791 A PE MUST NOT accept BGP routes of the MCAST-VPN address family from
3792 a CE.

3794 If BGP is used as a CE-PE routing protocol, then when a PE receives
3795 an IP route from a CE, if this route carries the VRF Route Import
3796 extended community, the PE MUST remove this community from the route
3797 before turning it into a VPN-IP route.  Routes that a PE advertises to
3798 a CE MUST NOT carry the VRF Route Import extended community.

3800 An ASBR may receive, from one SP's domain, an mLDP, PIM, or RSVP-TE
3801 control message that attempts to extend a P-tunnel from one SP's
3802 domain into another SP's domain.  This is perfectly valid if there is
3803 an agreement between the SPs to jointly provide an MVPN service.  In
3804 the absence of such an agreement, however, this could be an
3805 illegitimate attempt to intercept data packets.  By default, an ASBR
3806 MUST NOT allow P-tunnels to extend beyond AS boundaries.  However, it
3807 MUST be possible to configure an ASBR to allow this on a specified
3808 set of interfaces.

3810 Many of the procedures in this document cause the SP network to
3811 create and maintain an amount of state that is proportional to
3812 customer multicast activity.  If the amount of customer multicast
3813 activity exceeds expectations, this can potentially cause P and PE
3814 routers to maintain an unexpectedly large amount of state, which may
3815 cause control and/or data plane overload.  To protect against this
3816 situation, an implementation should provide ways for the SP to bound
3817 the amount of state it devotes to the handling of customer multicast
3818 activity.

3820 In particular, an implementation SHOULD provide mechanisms that allow
3821 an SP to place limitations on the following (see the non-normative
sketch below):

3823 - total number of (C-*,C-G) and/or (C-S,C-G) states per VRF

3825 - total number of P-tunnels per VRF used for S-PMSIs

3827 - total number of P-tunnels traversing a given P router

3829 A PE implementation MAY also provide mechanisms that allow an SP to
3830 limit the rate of change of various MVPN-related states on PEs, as
3831 well as the rate at which MVPN-related control messages may be
3832 received by a PE from the CEs and/or sent from the PE to other PEs.

3834 An implementation that provides the procedures specified in Sections
3835 10.1 or 10.2 MUST provide the capability to impose an upper bound on
3836 the number of Source Active A-D routes generated, and on how
3837 frequently they may be originated.  This MUST be provided at a per-PE,
3838 per-MVPN granularity.
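As a non-normative illustration only, the following sketch shows one
possible shape for per-VRF limits of the kind listed above; the names and
default values are purely hypothetical examples, not recommendations.

   # Non-normative sketch: a hypothetical per-VRF guard for the kinds of
   # limits listed above.  Values and names are examples only.

   class VrfMulticastStateLimits:
       def __init__(self, max_c_states=5000, max_spmsi_tunnels=256):
           self.max_c_states = max_c_states          # (C-*,C-G) + (C-S,C-G)
           self.max_spmsi_tunnels = max_spmsi_tunnels
           self.c_states = set()
           self.spmsi_tunnels = set()

       def add_c_state(self, state):
           # Refuse to create new multicast state beyond the configured
           # bound; an implementation would also log or raise an alarm.
           if state not in self.c_states and \
              len(self.c_states) >= self.max_c_states:
               return False
           self.c_states.add(state)
           return True

       def add_spmsi_tunnel(self, tunnel):
           if tunnel not in self.spmsi_tunnels and \
              len(self.spmsi_tunnels) >= self.max_spmsi_tunnels:
               return False
           self.spmsi_tunnels.add(tunnel)
           return True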
3840 Lack of mechanisms that allow an SP to limit the rate of change of
3841 various MVPN-related states on PEs, as well as the rate at which
3842 MVPN-related control messages may be received by a PE from the CEs
3843 and/or sent from the PE to other PEs, may result in control plane
3844 overload on the PE, which in turn would adversely impact all the
3845 customers connected to that PE, as well as those connected to other PEs.

3847 See also the security considerations of [MVPN-BGP].

3849 14. IANA Considerations

3851 Section 7.4.2 defines the "S-PMSI Join Message", which is carried in
3852 a UDP datagram whose port number is 3232.  This port number is
3853 already assigned by IANA to "MDT port".  IANA should update that
3854 assignment to reference this document.

3856 IANA should create a registry for the "S-PMSI Join Message Type
3857 Field".  Assignments are to be made according to the policy "IETF
3858 Review", as defined in [RFC5226].  The value 1 should be registered
3859 with a reference to this document.  The description should read "PIM
3860 IPv4 S-PMSI (unaggregated)".

3862 [PIM-ATTRIB] establishes a registry for "PIM Join Attribute Types".
3863 IANA should assign the value 1 to the "MVPN Join Attribute", and
3864 should reference this document.

3866 15. Other Authors

3868 Sarveshwar Bandi, Yiqun Cai, Thomas Morin, Yakov Rekhter, IJsbrand
3869 Wijnands, Seisho Yasukawa

3871 16. Other Contributors

3873 Significant contributions were made by Arjen Boers, Toerless Eckert,
3874 Adrian Farrel, Luyuan Fang, Dino Farinacci, Lenny Giuliano, Shankar
3875 Karuna, Anil Lohiya, Tom Pusateri, Ted Qian, Robert Raszuk, Tony
3876 Speakman, and Dan Tappan.

3878 17. Authors' Addresses

3880 Rahul Aggarwal (Editor)
3881 Juniper Networks
3882 1194 North Mathilda Ave.
3883 Sunnyvale, CA 94089
3884 Email: rahul@juniper.net

3886 Sarveshwar Bandi
3887 Motorola
3888 Vanenburg IT park, Madhapur,
3889 Hyderabad, India
3890 Email: sarvesh@motorola.com

3892 Yiqun Cai
3893 Cisco Systems, Inc.
3894 170 Tasman Drive
3895 San Jose, CA, 95134
3896 E-mail: ycai@cisco.com

3898 Thomas Morin
3899 France Telecom R & D
3900 2, avenue Pierre-Marzin
3901 22307 Lannion Cedex
3902 France
3903 Email: thomas.morin@francetelecom.com

3904 Yakov Rekhter
3905 Juniper Networks
3906 1194 North Mathilda Ave.
3907 Sunnyvale, CA 94089
3908 Email: yakov@juniper.net

3910 Eric C. Rosen (Editor)
3911 Cisco Systems, Inc.
3912 1414 Massachusetts Avenue
3913 Boxborough, MA, 01719
3914 E-mail: erosen@cisco.com

3916 IJsbrand Wijnands
3917 Cisco Systems, Inc.
3918 170 Tasman Drive
3919 San Jose, CA, 95134
3920 E-mail: ice@cisco.com

3922 Seisho Yasukawa
3923 NTT Corporation
3924 9-11, Midori-Cho 3-Chome
3925 Musashino-Shi, Tokyo 180-8585,
3926 Japan
3927 Phone: +81 422 59 4769
3928 Email: yasukawa.seisho@lab.ntt.co.jp

3930 18. Normative References

3932 [MLDP] I. Minei, K. Kompella, I. Wijnands, B. Thomas, "Label
3933 Distribution Protocol Extensions for Point-to-Multipoint and
3934 Multipoint-to-Multipoint Label Switched Paths",
3935 draft-ietf-mpls-ldp-p2mp-08.txt, October 2009

3937 [MPLS-HDR] E. Rosen, et. al., "MPLS Label Stack Encoding", RFC 3032,
3938 January 2001

3940 [MPLS-IP] T. Worster, Y. Rekhter, E. Rosen, "Encapsulating MPLS in IP
3941 or Generic Routing Encapsulation (GRE)", RFC 4023, March 2005

3943 [MPLS-MCAST-ENCAPS] T. Eckert, E. Rosen, R. Aggarwal, Y. Rekhter,
3944 "MPLS Multicast Encapsulations", RFC 5332, August 2008
Rosen, "MPLS 3947 Upstream Label Assignment and Context-Specific Label Space", RFC 3948 5331, August 2008 3950 [MVPN-BGP], R. Aggarwal, E. Rosen, T. Morin, Y. Rekhter, C. 3951 Kodeboniya, "BGP Encodings for Multicast in MPLS/BGP IP VPNs", 3952 draft-ietf-l3vpn-2547bis-mcast-bgp-08.txt, September 2009 3954 [OSPF] J. Moy, "OSPF Version 2", RFC 2328, April 1998 3956 [OSPF-MT} P. Psenak, S. Mirtorabi, A. Roy, L. Nguyen, P. 3957 Pillay-Esnault, "Multi-Topology (MT) Routing in OSPF", RFC 4915, June 3958 2007 3960 [PIM-ATTRIB], A. Boers, IJ. Wijnands, E. Rosen, "The PIM Join 3961 Attribute Format", RFC 5384, November 2008 3963 [PIM-SM] "Protocol Independent Multicast - Sparse Mode (PIM-SM)", 3964 Fenner, Handley, Holbrook, Kouvelas, August 2006, RFC 4601 3966 [RFC2119] "Key words for use in RFCs to Indicate Requirement 3967 Levels.", Bradner, March 1997 3969 [RFC4364] "BGP/MPLS IP VPNs", Rosen, Rekhter, et. al., February 2006 3971 [RFC4659] "BGP-MPLS IP Virtual Private Network (VPN) Extension for 3972 IPv6 VPN", De Clercq, et. al., RFC 4659, September 2006 3974 [RSVP-OOB] Z. Ali, G. Swallow, R. Aggarwal, "Non PHP behavior and 3975 Out-of-Band Mapping for RSVP-TE LSPs", 3976 draft-ietf-mpls-rsvp-te-no-php-oob-mapping-03.txt, October 2009 3978 [RSVP-P2MP] R. Aggarwal, D. Papadimitriou, S. Yasukawa, et. al., 3979 "Extensions to RSVP-TE for Point-to-Multipoint TE LSPs", RFC 4875, 3980 May 2007 3982 19. Informative References 3984 [ADMIN-ADDR] D. Meyer, "Administratively Scoped IP Multicast", RFC 3985 2365, July 1998 3987 [BIDIR-PIM] "Bidirectional Protocol Independent Multicast 3988 (BIDIR-PIM)" M. Handley, I. Kouvelas, T. Speakman, L. Vicisano, RFC 3989 5015, October 2007 3991 [BSR] "Bootstrap Router (BSR) Mechanism for PIM", N. Bhaskar, et. 3992 al., RFC 5059, January 2008 3994 [MVPN-REQ] T. Morin, Ed., "Requirements for Multicast in L3 3995 Provider-Provisioned VPNs", RFC 4834, April 2007 3997 [RFC2003] C. Perkins, "IP Encapsulation within IP", RFC 2003, October 3998 1996 4000 [RFC2784] D. Farinacci, et. al., "Generic Routing Encapsulation", 4001 March 2000 4003 [RFC2890] G. Dommety, "Key and Sequence Number Extensions to GRE", 4004 September 2000 4006 [RFC2983] D. Black, "Differentiated Services and Tunnels", October 4007 2000 4009 [RFC3270] F. Le Faucheur, et. al., "MPLS Support of Differentiated 4010 Services", May 2002 4012 [RFC3618] B. Fenner D. Meyer, "Multicast Source Discovery Protocol", 4013 October 2003 4015 [RFC4365], E. Rosen, " Applicability Statement for BGP/MPLS IP 4016 Virtual Private Networks (VPNs)", February 2006 4018 [RFC4607] H. Holbrook, B. Cain, "Source-Specific Multicast for IP", 4019 August 2006 4021 [RFC4797] Y. Rekhter, R. Bonica, E. Rosen, "Use of Provider Edge to 4022 Provider Edge (PE-PE) Generic Routing Encapsulation (GRE) or IP in 4023 BGP/MPLS IP Virtual Private Networks", January 2007 4025 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an 4026 IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.