Network Working Group                                 Eric C. Rosen (Editor)
Internet Draft                                           Yiqun Cai (Editor)
Intended Status: Informational                            IJsbrand Wijnands
Expires: December 29, 2009                              Cisco Systems, Inc.

                               June 29, 2009

                      Multicast in MPLS/BGP IP VPNs

                       draft-rosen-vpn-mcast-11.txt

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright and License Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors.
All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

This draft describes the deployed MVPN (Multicast in BGP/MPLS IP VPNs) solution of Cisco Systems.

Table of Contents

   1        Specification of requirements
   2        Introduction
   2.1      Scaling Multicast State Info. in the Network Core
   2.2      Overview
   3        Multicast VRFs
   4        Multicast Domains
   4.1      Model of Operation
   5        Multicast Tunnels
   5.1      Ingress PEs
   5.2      Egress PEs
   5.3      Tunnel Destination Address(es)
   5.4      Auto-Discovery
   5.4.1    MDT-SAFI
   5.5      Which PIM Variant to Use
   5.6      Inter-AS MDT Construction
   5.6.1    The PIM MVPN Join Attribute
   5.6.1.1  Definition
   5.6.1.2  Usage
   5.7      Encapsulation
   5.7.1    Encapsulation in GRE
   5.7.2    Encapsulation in IP
   5.7.3    Interoperability
   5.8      MTU
   5.9      TTL
   5.10     Differentiated Services
   5.11     Avoiding Conflict with Internet Multicast
   6        The PIM C-Instance and the MT
   6.1      PIM C-Instance Control Packets
   6.2      PIM C-instance RPF Determination
   6.2.1    Connector Attribute
   7        Data MDT: Optimizing Flooding
   7.1      Limitation of Multicast Domain
   7.2      Signaling Data MDT Trees
   7.3      Use of SSM for Data MDTs
   8        Packet Formats and Constants
   8.1      MDT TLV
   8.2      MDT Join TLV
   8.3      Multiple MDT Join TLVs per Datagram
   8.4      Constants
   9        IANA Considerations
   10       Security Considerations
   11       Acknowledgments
   12       Normative References
   13       Informative References
   14       Authors' Addresses
1. Specification of requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Introduction

This draft describes the deployed MVPN (Multicast in BGP/MPLS IP VPNs) solution of Cisco Systems. This is sometimes known as the "PIM+GRE" MVPN profile (see [MVPN-PROFILES], section 2, which recasts the contents of this document into the terminology of a more generalized MVPN framework defined by the L3VPN WG). This document is being made available as it is often used as a reference for interoperating with deployed implementations.

The procedures specified in this draft differ in a few minor respects from the fully standards-compliant PIM+GRE profile. These differences are pointed out where they occur.

The base specification for BGP/MPLS IP VPNs [RFC4364] does not provide a way for IP multicast data or control traffic to travel from one VPN site to another. This document extends that specification by specifying the necessary protocols and procedures for support of IP multicast.

This specification presupposes that:

1. PIM [PIMv2] is the multicast routing protocol used within the VPN,

2. PIM is also the multicast routing protocol used within the SP network, and

3. the SP network supports native IPv4 multicast forwarding.

Familiarity with the terminology and procedures of [RFC4364] is presupposed. Familiarity with [PIMv2] is also presupposed.

2.1. Scaling Multicast State Info. in the Network Core

The BGP/MPLS IP VPN service of [RFC4364] provides a VPN with "optimal" unicast routing through the SP backbone, in that a packet follows the "shortest path" across the backbone, as determined by the backbone's own routing algorithm. This optimal routing is provided without requiring the P routers to maintain any routing information which is specific to a VPN; indeed, the P routers do not maintain any per-VPN state at all.

Unfortunately, optimal MULTICAST routing cannot be provided without requiring the P routers to maintain some VPN-specific state information. Optimal multicast routing would require that one or more multicast distribution trees be created in the backbone for each multicast group that is in use. If a particular multicast group from within a VPN is using source-based distribution trees, optimal routing requires that there be one distribution tree for each transmitter of that group. If shared trees are being used, one tree for each group is still required. Each such tree requires state in some set of the P routers, with the amount of state being proportional to the number of multicast transmitters. The reason there needs to be at least one distribution tree per multicast group is that each group may have a different set of receivers; multicast routing algorithms generally go to great lengths to ensure that a multicast packet will not be sent to a node which is not on the path to a receiver.

Given that an SP generally supports many VPNs, where each VPN may have many multicast groups, and each multicast group may have many transmitters, it is not scalable to have one or more distribution trees for each multicast group. The SP has no control whatsoever over the number of multicast groups and transmitters that exist in the VPNs, and it is difficult to place any bound on these numbers.
In order to have a scalable multicast solution for MPLS/BGP IP VPNs, the amount of state maintained by the P routers needs to be proportional to something which IS under the control of the SP. This specification describes such a solution. In this solution, the amount of state maintained in the P routers is proportional only to the number of VPNs which run over the backbone; the amount of state in the P routers is NOT sensitive to the number of multicast groups or to the number of multicast transmitters within the VPNs. To achieve this scalability, the optimality of the multicast routes is reduced. A PE which is not on the path to any receiver of a particular multicast group may still receive multicast packets for that group, and if so, will have to discard them. The SP does however have control over the tradeoff between optimal routing and scalability.

2.2. Overview

An SP determines whether a particular VPN is multicast-enabled. If it is, it corresponds to a "Multicast Domain". A PE which attaches to a particular multicast-enabled VPN is said to belong to the corresponding Multicast Domain. For each Multicast Domain, there is a default "Multicast Distribution Tree (MDT)" through the backbone, connecting ALL of the PEs that belong to that Multicast Domain. A given PE may be in as many Multicast Domains as there are VPNs attached to that PE. However, each Multicast Domain has its own MDT. The MDTs are created by running PIM in the backbone, and in general an MDT also includes P routers on the paths between the PE routers.

In a departure from the usual multicast tree distribution procedures, the Default MDT for a Multicast Domain is constructed automatically as the PEs in the domain come up. Construction of the Default MDT does not depend on the existence of multicast traffic in the domain; it will exist before any such multicast traffic is seen. Default MDTs correspond to the "MI-PMSIs" of [MVPN-ARCH].

In BGP/IP MPLS VPNs, each CE router is a unicast routing adjacency of a PE router, but CE routers at different sites do NOT become unicast routing adjacencies of each other. This important characteristic is retained for multicast routing -- a CE router becomes a PIM adjacency of a PE router, but CE routers at different sites do NOT become PIM adjacencies of each other. Multicast packets from within a VPN are received from a CE router by an ingress PE router. The ingress PE encapsulates the multicast packets and (initially) forwards them along the Default MDT tree to all the PE routers connected to sites of the given VPN. Every PE router attached to a site of the given VPN thus receives all multicast packets from within that VPN. If a particular PE router is not on the path to any receiver of that multicast group, the PE simply discards that packet.

If a large amount of traffic is being sent to a particular multicast group, but that group does not have receivers at all the VPN sites, it can be wasteful to forward that group's traffic along the Default MDT. Therefore, we also specify a method for establishing individual MDTs for specific multicast groups. We call these "Data MDTs". A Data MDT delivers VPN data traffic for a particular multicast group only to those PE routers which are on the path to receivers of that multicast group. Using a Data MDT has the benefit of reducing the amount of multicast traffic on the backbone, as well as reducing the load on some of the PEs; it has the disadvantage of increasing the amount of state that must be maintained by the P routers. The SP has complete control over this tradeoff. Data MDTs correspond to the S-PMSIs of [MVPN-ARCH].
This solution requires the SP to deploy appropriate protocols and procedures, but is transparent to the SP's customers. An enterprise which uses PIM-based multicasting in its network can migrate from a private network to a BGP/MPLS IP VPN service, while continuing to use whatever multicast router configurations it was previously using; no changes need be made to CE routers or to other routers at customer sites. For instance, any dynamic RP-discovery procedures that are already in use may be left in place.

3. Multicast VRFs

The notion of a "VRF", defined in [RFC4364], is extended to include multicast routing entries as well as unicast routing entries.

Each VRF has its own multicast routing table. When a multicast data or control packet is received from a particular CE device, multicast routing is done in the associated VRF.

Each PE router runs a number of instances of PIM-SM, as many as one per VRF. In each instance of PIM-SM, the PE maintains a PIM adjacency with each of the PIM-capable CE routers associated with that VRF. The multicast routing table created by each instance is specific to the corresponding VRF. We will refer to these PIM instances as "VPN-specific PIM instances", or "PIM C-instances".

Each PE router also runs a "provider-wide" instance of PIM-SM (a "PIM P-instance"), in which it has a PIM adjacency with each of its IGP neighbors (i.e., with P routers), but NOT with any CE routers, and not with other PE routers (unless they happen to be adjacent in the SP's network). The P routers also run the P-instance of PIM, but do NOT run a C-instance.

In order to help clarify when we are speaking of the PIM P-instance and when we are speaking of a PIM C-instance, we will also apply the prefixes "P-" and "C-" respectively to control messages, addresses, etc. Thus a P-Join would be a PIM Join which is processed by the PIM P-instance, and a C-Join would be a PIM Join which is processed by a C-instance. A P-group address would be a group address in the SP's address space, and a C-group address would be a group address in a VPN's address space.

4. Multicast Domains

4.1. Model of Operation

A "Multicast Domain (MD)" is essentially a set of VRFs associated with interfaces that can send multicast traffic to each other. From the standpoint of a PIM C-instance, a multicast domain is equivalent to a multi-access interface. The PE routers in a given MD become PIM adjacencies of each other in the PIM C-instance.

Each multicast VRF is assigned to one MD. Each MD is configured with a distinct multicast P-group address, called the "Default MDT group address". This address is used to build the Default MDT for the MD.
When a PE router needs to send PIM C-instance control traffic to the other PE routers in the MD, it encapsulates the control traffic, with its own address as source IP address and the Default MDT group address as destination IP address. Note that the Default MDT is part of the P-instance of PIM, whereas the PEs that communicate over the Default MDT are PIM adjacencies in a C-instance. Within the C-instance, the Default MDT appears to be a multi-access network to which all the PEs are attached. This is discussed in more detail in section 5.

The Default MDT does not only carry the PIM control traffic of the MD's PIM C-instance. It also, by default, carries the multicast data traffic of the C-instance. In some cases though, multicast data traffic in a particular MD will be sent on a Data MDT rather than on the Default MDT. The use of Data MDTs is described in section 7.

Note that, if an MDT (Default or Data) is set up using the ASM service model, it must have a P-group address which is "globally unique" (more precisely, unique over the set of SP networks carrying the multicast traffic of the corresponding MD). If the MDT is set up using the SSM model, the P-group address of an MDT only needs to be unique relative to the source of the MDT (though see section 5.4). However, some implementations require the same SSM group address to be assigned to all the PEs. Interoperability with those implementations requires conformance to this restriction.

5. Multicast Tunnels

An MD can be thought of as a set of PE routers connected by a "multicast tunnel (MT)". From the perspective of a VPN-specific PIM instance, an MT is a single multi-access interface. In the SP network, a single MT is realized as a Default MDT combined with zero or more Data MDTs.

5.1. Ingress PEs

An ingress PE is a PE router that is connected to the multicast sender in the VPN, either directly or via a CE router. When the multicast sender starts transmitting, and if there are receivers (or a PIM RP) behind other PE routers in the common MD, the ingress PE becomes the transmitter of either the Default MDT group or a Data MDT group in the SP network.

5.2. Egress PEs

A PE router with a VRF configured in an MD becomes a receiver of the Default MDT group for that MD. A PE router may also join a Data MDT group if it has a VPN-specific PIM instance in which it is forwarding to one of its attached sites traffic for a particular C-group, and that particular C-group has been associated with that particular Data MDT. When a PE router joins any P-group used for encapsulating VPN multicast traffic, the PE router becomes one of the endpoints of the corresponding MT.

When a packet is received from an MT, the receiving PE derives the MD from the destination address of the received packet, which is a P-group address. The packet is then passed to the corresponding Multicast VRF and VPN-specific PIM instance for further processing.

5.3. Tunnel Destination Address(es)

An MT is an IP tunnel for which the destination address is a P-group address. However, an MT is not limited to using only one P-group address for encapsulation. Based on the VPN multicast payload traffic, the ingress PE can choose to use the Default MDT group address, or one of the Data MDT group addresses (as described in section 7 of this document), allowing the MT to reach a different set of PE routers in the common MD.
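The demultiplexing rule of sections 5.2 and 5.3 can be illustrated with a short, non-normative Python sketch. The table name "mdt_group_to_mvrf" and the VRF entry point "process_c_packet" are hypothetical stand-ins for a router's internal structures; the encapsulations tested for are those of section 5.7.

   import socket

   # Hypothetical table mapping each P-group address of this PE's MDs
   # (Default and Data MDT groups) to the owning multicast VRF.
   mdt_group_to_mvrf = {}

   def receive_from_mt(p_packet: bytes):
       """Demultiplex a packet received from an MT: the outer destination
       P-group address identifies the MD, and hence the MVRF and PIM
       C-instance that process the decapsulated C-packet."""
       ihl = (p_packet[0] & 0x0F) * 4        # outer IPv4 header length
       proto = p_packet[9]                   # outer protocol number
       p_dest = socket.inet_ntoa(p_packet[16:20])
       mvrf = mdt_group_to_mvrf.get(p_dest)
       if mvrf is None:
           return                            # not a P-group of any of our MDs
       if proto == 47:                       # GRE (section 5.7.1)
           c_packet = p_packet[ihl + 4:]     # skip the 4-octet base GRE header
       elif proto == 4:                      # IP-in-IP (section 5.7.2)
           c_packet = p_packet[ihl:]
       else:
           return
       mvrf.process_c_packet(c_packet)       # hand off to the PIM C-instance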
5.4. Auto-Discovery

Any of the variants of PIM may be used to set up the Default MDT: PIM-SM, Bidirectional PIM [BIDIR], or PIM-SSM [SSM]. Except in the case of PIM-SSM, the PEs need only know the proper P-group address in order to begin setting up the Default MDTs. The PEs will then discover each other's addresses by virtue of receiving PIM control traffic, e.g., PIM Hellos, sourced (and encapsulated) by each other.

However, in the case of PIM-SSM, the necessary MDTs for an MD cannot be set up until each PE in the MD knows the source address of each of the other PEs in that same MD. This information needs to be auto-discovered.

A new BGP address family, MDT-SAFI, is defined. The NLRI for this address family consists of an RD, an IPv4 unicast address, and a multicast group address. A given PE router in a given MD constructs an NLRI in this family from:

- Its own IPv4 address. If it has several, it uses the one which it will be placing in the IP source address field of multicast packets that it will be sending over the MDT.

- An RD which has been assigned to the MD.

- The P-group address which is to be used as the IP destination address field of multicast packets that will be sent over the MDT.

When a PE distributes this NLRI via BGP, it may include a Route Target Extended Communities attribute. This RT must be an "Import RT" [RFC4364] of each VRF in the MD. The ordinary BGP distribution procedures used by [RFC4364] will then ensure that each PE learns the MDT-SAFI "address" of each of the other PEs in the MD, and that the learned MDT-SAFI addresses get associated with the right VRFs.

If a PE receives an MDT-SAFI NLRI which does not have an RT attribute, the P-group address from the NLRI has to be used to associate the NLRI with a particular VRF. In this case, each multicast domain must be associated with a unique P-address, even if PIM-SSM is used. However, finding a unique P-address for a multi-provider multicast group may be difficult.

In order to facilitate the deployment of multi-provider multicast domains, this specification REQUIRES the use of the MDT-SAFI NLRI (even if PIM-SSM is not used to set up the default MDT). This specification also REQUIRES that an implementation be capable of using PIM-SSM to set up the default MDT.

In the standard PIM+GRE profile, the MDT-SAFI is replaced by the "Intra-AS I-PMSI A-D Route". The latter is a generalized version of the MDT-SAFI, which allows the "default MDTs" and "data MDTs" to be implemented as MPLS P2MP or MP2MP LSPs, as well as by PIM-created multicast distribution trees. In the latter case, the Intra-AS A-D routes carry the same information that the MDT-SAFI does, though with a different encoding.

The Intra-AS A-D Routes also carry Route Targets, and so may be distributed inter-AS in the same manner as unicast routes. (Inter-AS distribution of "Intra-AS I-PMSI A-D routes" is necessary in some cases, see below.)

The encoding of the MDT-SAFI is specified in the following subsection.

5.4.1. MDT-SAFI

BGP messages in which AFI=1 and SAFI=66 are "MDT-SAFI" messages.

The NLRI format is 8-byte-RD:IPv4-address followed by the MDT group address, i.e., the MP_REACH attribute for this SAFI will contain one or more tuples of the following form:

              +-------------------------------+
              |                               |
              |  RD:IPv4-address (12 octets)  |
              |                               |
              +-------------------------------+
              |  Group Address (4 octets)     |
              +-------------------------------+

The IPv4 address identifies the PE that originated this route, and the RD identifies a VRF in that PE. The group address must be a multicast group address, and is used to build the P-tunnels. All PEs attached to a given MVPN must specify the same group address, even if the group is an SSM group. MDT-SAFI routes do not carry RTs, and the group address is used to associate a received MDT-SAFI route with a VRF.
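The 16-octet NLRI is straightforward to construct. The following non-normative Python sketch encodes and decodes it; the RD is treated as an opaque 8-octet string, and the surrounding MP_REACH_NLRI attribute machinery (path attributes, NLRI length prefixing) is deliberately omitted.

   import socket

   def encode_mdt_safi_nlri(rd: bytes, pe_ipv4: str, mdt_group: str) -> bytes:
       """Build the 16-octet MDT-SAFI NLRI (AFI=1, SAFI=66): an 8-octet
       RD, the 4-octet originating PE address, and the 4-octet MDT
       group address."""
       if len(rd) != 8:
           raise ValueError("RD must be 8 octets")
       return rd + socket.inet_aton(pe_ipv4) + socket.inet_aton(mdt_group)

   def decode_mdt_safi_nlri(nlri: bytes):
       """Split a received MDT-SAFI NLRI into (RD, PE address, group)."""
       if len(nlri) != 16:
           raise ValueError("MDT-SAFI NLRI must be 16 octets")
       return (nlri[0:8],
               socket.inet_ntoa(nlri[8:12]),
               socket.inet_ntoa(nlri[12:16]))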
5.5. Which PIM Variant to Use

To minimize the amount of multicast routing state maintained by the P routers, the Default MDTs should be realized as shared trees, such as PIM Bidirectional trees. However, the operational procedures for assigning P-group addresses may be greatly simplified, especially in the case of multi-provider MDs, if PIM-SSM is used.

Data MDTs are best realized as source trees, constructed via PIM-SSM.

5.6. Inter-AS MDT Construction

Standard PIM techniques for the construction of source trees presuppose that every router has a route to the source of the tree. However, if the source of the tree is in a different AS than a particular P router, it is possible that the P router will not have a route to the source. For example, the remote AS may be using BGP to distribute a route to the source, but a particular P router may be part of a "BGP-free core", in which the P routers are not aware of BGP-distributed routes.

What is needed in this case is a way for a PE to tell PIM to construct the tree through a particular BGP speaker, the "BGP next hop" for the tree source. This can be accomplished with a PIM extension.

If the PE has selected the source of the tree from the MDT SAFI address family, then it may be desirable to build the tree along the route to the MDT SAFI address, rather than along the route to the corresponding IPv4 address. This enables the inter-AS portion of the tree to follow a path which is specifically chosen for multicast (i.e., it allows the inter-AS multicast topology to be "non-congruent" to the inter-AS unicast topology). This too requires a PIM extension.

The necessary PIM extension is the PIM MVPN Join Attribute described in the following sub-section.

5.6.1. The PIM MVPN Join Attribute

5.6.1.1. Definition

In [PIM-ATTRIB], the notion of a "join attribute" is defined, and a format for including join attributes in PIM Join/Prune messages is specified. We now define a new join attribute, which we call the "MVPN Join Attribute".

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |         Proxy IP address
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               RD
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-.......

The Type field of the MVPN Join Attribute is set to 1.

The F bit, part of the generic join attribute header defined in [PIM-ATTRIB], is set to 0.

Two information fields are carried in the MVPN Join Attribute:

- Proxy: The IP address of the node towards which the PIM Join/Prune message is to be forwarded. This will either be an IPv4 or an IPv6 address, depending on whether the PIM Join/Prune message itself is IPv4 or IPv6.

- RD: An eight-byte RD. This immediately follows the proxy IP address.

The PIM message also carries the address of the upstream PE.

In the case of an intra-AS MVPN, the proxy and the upstream PE are the same. In the case of an inter-AS MVPN, the proxy will be the ASBR that is the exit point from the local AS on the path to the upstream PE.
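As a non-normative illustration, the Python sketch below packs the attribute for an IPv4 Join/Prune. It assumes the generic attribute header layout of [PIM-ATTRIB] (an F bit, an E "end of attributes" bit, and a 6-bit type, followed by an 8-bit length); the function and constant names are illustrative only.

   import socket
   import struct

   MVPN_JOIN_ATTR_TYPE = 1

   def encode_mvpn_join_attribute(proxy_ipv4: str, rd: bytes,
                                  end_of_attributes: bool) -> bytes:
       """Encode an MVPN Join Attribute for an IPv4 Join/Prune.

       The first octet carries F (0 here), E (set on the last attribute
       in the list, per [PIM-ATTRIB]), and the 6-bit type. The value is
       the 4-octet proxy address followed by the 8-octet RD."""
       if len(rd) != 8:
           raise ValueError("RD must be 8 octets")
       value = socket.inet_aton(proxy_ipv4) + rd
       first_octet = ((1 if end_of_attributes else 0) << 6) | MVPN_JOIN_ATTR_TYPE
       return struct.pack("!BB", first_octet, len(value)) + value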
5.6.1.2. Usage

When a PE router creates a PIM Join/Prune message in order to set up an inter-AS default MDT, it does so as a result of having received a particular MDT-SAFI route. It includes an MVPN Join Attribute whose fields are set as follows:

- If the upstream PE is in the same AS as the local PE, then the proxy field contains the address of the upstream PE. Otherwise, it contains the address of the BGP next hop on the route to the upstream PE.

- The RD field contains the RD from the NLRI of the MDT-SAFI route.

- The upstream PE field contains the address of the PE that originated the MDT-SAFI route (obtained from the NLRI of that route).

When a PIM router processes a PIM Join/Prune message with an MVPN Join Attribute, it first checks to see if the proxy field contains one of its own addresses.

If not, the router uses the proxy IP address in order to determine the RPF interface and neighbor. The MVPN Join Attribute must be passed upstream, unchanged.

If the proxy address is one of the router's own IP addresses, then the router looks in its BGP routing table for an MDT-SAFI route whose NLRI consists of the upstream PE address prepended with the RD from the Join Attribute. If there is no match, the PIM message is discarded. If there is a match, the IP address from the BGP next hop field of the matching route is used in order to determine the RPF interface and neighbor. When the PIM Join/Prune is forwarded upstream, the proxy field is replaced with the address of the BGP next hop, and the RD and upstream PE fields are left unchanged.
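The decision procedure above can be summarized in a short, non-normative sketch. Here "join", "mdt_safi_rib", and "rpf_lookup" are hypothetical stand-ins for a router's message object, its MDT-SAFI table keyed by (RD, originating PE), and its RPF resolution routine.

   def rpf_lookup(address):
       """Stand-in for the router's RPF resolution toward `address`."""
       raise NotImplementedError

   def process_join_with_mvpn_attribute(join, my_addresses, mdt_safi_rib):
       """Forwarding decision for a received Join/Prune carrying an
       MVPN Join Attribute (section 5.6.1.2)."""
       if join.proxy not in my_addresses:
           # Not our proxy: RPF toward the proxy; the attribute is
           # passed upstream unchanged.
           return rpf_lookup(join.proxy)
       # We are the proxy (e.g., an ASBR): look for the MDT-SAFI route
       # whose NLRI is the upstream PE address prepended with the RD.
       route = mdt_safi_rib.get((join.rd, join.upstream_pe))
       if route is None:
           return None                        # no match: discard the message
       join.proxy = route.bgp_next_hop        # rewrite the proxy on the way up;
       return rpf_lookup(route.bgp_next_hop)  # RD and upstream PE unchanged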
5.7. Encapsulation

5.7.1. Encapsulation in GRE

GRE [GRE1701] encapsulation is recommended when sending multicast traffic through an MDT. The following diagram shows the progression of the packet as it enters and leaves the service provider network.

   Packets received       Packets in transit       Packets forwarded
   at ingress PE          in the service           by egress PEs
                          provider network

                          +---------------+
                          |  P-IP Header  |
                          +---------------+
                          |      GRE      |
   ++=============++      ++=============++      ++=============++
   || C-IP Header ||      || C-IP Header ||      || C-IP Header ||
   ++=============++ >>>> ++=============++ >>>> ++=============++
   ||  C-Payload  ||      ||  C-Payload  ||      ||  C-Payload  ||
   ++=============++      ++=============++      ++=============++

The IPv4 Protocol Number field in the P-IP header must be set to 47. The Protocol Type field of the GRE header must be set to 0x800 if the C-IP header is an IPv4 header; it must be set to 0x86dd if the C-IP header is an IPv6 header.

[GRE2784] specifies an optional GRE checksum, and [GRE2890] specifies optional GRE key and sequence number fields.

The GRE key field is not needed because the P-group address in the delivery IP header already identifies the MD, and thus the VRF context in which the payload packet is to be further processed.

The GRE sequence number field is also not needed because the transport layer services for the original application will be provided by the C-IP header.

The use of the GRE checksum field must follow [GRE2784].

To facilitate high speed implementation, this document recommends that the ingress PE routers encapsulate VPN packets without setting the checksum, key, or sequence number fields.
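As a non-normative illustration of sections 5.7.1, 5.8, and 5.9 taken together, the following Python sketch builds the P-IP and base GRE headers around a C-packet. The function name is illustrative; the TTL value is an arbitrary example of the local policy mentioned in section 5.9.

   import socket
   import struct

   IPPROTO_GRE = 47
   GRE_PROTO_IPV4 = 0x0800
   GRE_PROTO_IPV6 = 0x86DD

   def ipv4_header_checksum(header: bytes) -> int:
       s = sum(struct.unpack("!%dH" % (len(header) // 2), header))
       while s > 0xFFFF:
           s = (s & 0xFFFF) + (s >> 16)
       return ~s & 0xFFFF

   def gre_encapsulate(c_packet: bytes, src_pe: str, mdt_group: str,
                       ttl: int = 32) -> bytes:
       """Encapsulate a C-packet for transmission on an MDT.

       Per section 5.7.1, the checksum/key/sequence fields are not set,
       so the GRE header is the 4-octet base header. Per section 5.8,
       the DF bit of the outer header must not be set (flags/fragment
       offset stay zero). Per section 5.9, the outer TTL is set by
       local policy rather than copied from the C-IP header."""
       proto = GRE_PROTO_IPV4 if (c_packet[0] >> 4) == 4 else GRE_PROTO_IPV6
       gre = struct.pack("!HH", 0, proto)        # no optional fields
       total_len = 20 + len(gre) + len(c_packet)
       fields = [0x45, 0, total_len, 0, 0, ttl, IPPROTO_GRE, 0,
                 socket.inet_aton(src_pe), socket.inet_aton(mdt_group)]
       hdr = struct.pack("!BBHHHBBH4s4s", *fields)
       fields[7] = ipv4_header_checksum(hdr)     # fill in the header checksum
       hdr = struct.pack("!BBHHHBBH4s4s", *fields)
       return hdr + gre + c_packet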
5.7.2. Encapsulation in IP

IP-in-IP [IPIP1853] is also a viable option. When it is used, the IPv4 Protocol Number field is set to 4. The following diagram shows the progression of the packet as it enters and leaves the service provider network.

   Packets received       Packets in transit       Packets forwarded
   at ingress PE          in the service           by egress PEs
                          provider network

                          +---------------+
                          |  P-IP Header  |
   ++=============++      ++=============++      ++=============++
   || C-IP Header ||      || C-IP Header ||      || C-IP Header ||
   ++=============++ >>>> ++=============++ >>>> ++=============++
   ||  C-Payload  ||      ||  C-Payload  ||      ||  C-Payload  ||
   ++=============++      ++=============++      ++=============++

5.7.3. Interoperability

PE routers in a common MD must agree on the method of encapsulation. This can be achieved either via configuration or by means of some discovery protocol. To help reduce configuration overhead and improve multi-vendor interoperability, it is strongly recommended that GRE encapsulation be supported and enabled by default.

5.8. MTU

Because multicast group addresses are used as tunnel destination addresses, existing Path MTU discovery mechanisms cannot be used. This requires that:

1. The ingress PE router (the one that does the encapsulation) must not set the DF bit in the outer header, and

2. If the "DF" bit is cleared in the IP header of the C-packet, the ingress PE must fragment the C-packet before encapsulation if appropriate. This is very important in practice, due to the fact that the performance of the reassembly function is significantly lower than that of decapsulating and forwarding packets on today's router implementations.

5.9. TTL

The ingress PE should not copy the TTL field from the payload IP header received from a CE router to the delivery IP header. The setting of the TTL of the delivery IP header is determined by the local policy of the ingress PE router.

5.10. Differentiated Services

By default, the setting of the DS field in the delivery IP header should follow the guidelines outlined in [DIFF2983]. An SP may also choose to deploy any of the additional mechanisms the PE routers support.

5.11. Avoiding Conflict with Internet Multicast

If the SP is providing Internet multicast, distinct from its VPN multicast services, it must ensure that the P-group addresses which correspond to its MDs are distinct from any of the group addresses of the Internet multicasts it supports. This is best done by using administratively scoped addresses [ADMIN-ADDR].

The C-group addresses need not be distinct from either the P-group addresses or the Internet multicast addresses.

6. The PIM C-Instance and the MT

If a particular VRF is in a particular MD, the corresponding MT is treated by that VRF's VPN-specific PIM instance as a LAN interface. The PEs which are adjacent on the MT must execute the PIM LAN procedures, including the generation and processing of PIM Hello, Join/Prune, Assert, DF election and other PIM control packets.

6.1. PIM C-Instance Control Packets

The PIM protocol packets are sent to ALL-PIM-ROUTERS (224.0.0.13) in the context of that VRF, but when in transit in the provider network, they are encapsulated using the Default MDT group configured for that MD. This allows VPN-specific PIM routes to be extended from site to site without appearing in the P routers.

6.2. PIM C-instance RPF Determination

Although the MT is treated as a PIM-enabled interface, unicast routing is NOT run over it, and there are no unicast routing adjacencies over it. It is therefore necessary to specify special procedures for determining when the MT is to be regarded as the "RPF Interface" for a particular C-address.

When a PE needs to determine the RPF interface of a particular C-address, it looks up the C-address in the VRF. If the route matching it is not a VPN-IP route learned from MP-BGP as described in [RFC4364], or if that route's outgoing interface is one of the interfaces associated with the VRF, then ordinary PIM procedures for determining the RPF interface apply.

However, if the route matching the C-address is a VPN-IP route whose outgoing interface is not one of the interfaces associated with the VRF, then PIM will consider the outgoing interface to be the MT associated with the VPN-specific PIM instance.

Once PIM has determined that the RPF interface for a particular C-address is the MT, it is necessary for PIM to determine the RPF neighbor for that C-address. This will be one of the other PEs that is a PIM adjacency over the MT.

The BGP "Connector" attribute is defined. Whenever a PE router distributes a VPN-IP address from a VRF that is part of an MD, it SHOULD distribute a Connector attribute along with it. The Connector attribute should specify the MDT address family, and its value should be the IP address which the PE router is using as its source IP address for the multicast packets which are encapsulated and sent over the MT. Then, when a PE has determined that the RPF interface for a particular C-address is the MT, it must look up the Connector attribute that was distributed along with the VPN-IP address corresponding to that C-address. The value of this Connector attribute will be considered to be the RPF adjacency for the C-address.

There are older implementations in which the Connector attribute is not present. In this case, as long as the "BGP Next Hop" for the C-address is one of the PEs that is a PIM adjacency, then that PE should be treated as the RPF adjacency for that C-address.

However, if the MD spans multiple Autonomous Systems, and an "option b" interconnect is used, the BGP Next Hop might not be a PIM adjacency, and the RPF check will not succeed unless the Connector attribute is used.
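The RPF determination of this section can be summarized in a short, non-normative sketch. The "vrf" object, its route fields, and "ordinary_pim_rpf" are hypothetical accessors standing in for a router's internal state.

   def ordinary_pim_rpf(c_address):
       """Stand-in for the normal PIM RPF procedures."""
       raise NotImplementedError

   def c_instance_rpf(c_address, vrf):
       """RPF interface and neighbor for a C-address, per section 6.2."""
       route = vrf.lookup(c_address)
       if (route is None
               or not route.is_vpn_ip_from_mp_bgp
               or route.out_interface in vrf.local_interfaces):
           return ordinary_pim_rpf(c_address)
       # The RPF interface is the MT; the RPF neighbor comes from the
       # Connector attribute, or, for older implementations that do not
       # send it, from the BGP next hop if that next hop is a PIM
       # adjacency on the MT.
       if route.connector is not None:
           return vrf.mt_interface, route.connector.pe_address
       if route.bgp_next_hop in vrf.mt_pim_adjacencies:
           return vrf.mt_interface, route.bgp_next_hop
       return None   # RPF fails, e.g., inter-AS "option b" without Connector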
In the standard PIM+GRE profile, the Connector attribute is replaced by the "VRF Route Import Extended Community" attribute. The latter is a generalized version, but carries the same information as the Connector attribute does; the encoding, however, is different.

The Connector attribute is defined in the following sub-section.

6.2.1. Connector Attribute

The Connector Attribute is an optional transitive attribute. Its value field is formatted as follows:

     0                   1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               |
    |      IPv4 Address of PE       |
    |                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

7. Data MDT: Optimizing Flooding

7.1. Limitation of Multicast Domain

While the procedure specified in the previous sections requires the P routers to maintain multicast state, the amount of state is bounded by the number of supported VPNs. The P routers do NOT run any VPN-specific PIM instances.

In particular, the use of a single bidirectional tree per VPN scales well as the number of transmitters and receivers increases, but not so well as the amount of multicast traffic per VPN increases.

The multicast routing provided by this scheme is not optimal, in that a packet of a particular multicast group may be forwarded to PE routers which have no downstream receivers for that group, and hence which may need to discard the packet.

In the simplest configuration model, only the Default MDT group is configured for each MD. The result of this configuration is that all VPN multicast traffic, control or data, will be encapsulated and forwarded to all PE routers that are part of the MD. While this limits the amount of multicast routing state the provider network has to maintain, it also requires PE routers to discard multicast C-packets if there are no receivers for those packets in the corresponding sites. In some cases, especially when the content involves high bandwidth but only a limited set of receivers, it is desirable that certain C-packets only travel to PE routers that do have receivers in the VPN, to save bandwidth in the network and reduce load on the PE routers.

7.2. Signaling Data MDT Trees

A simple protocol is proposed to signal additional P-group addresses to encapsulate VPN traffic. These P-group addresses are called Data MDT groups. The ingress PE router advertises a different P-group address (as opposed to always using the Default MDT group) to encapsulate VPN multicast traffic. Only the PE routers on the path to eventual receivers join the P-group, and therefore form an optimal multicast distribution tree in the service provider network for the VPN multicast traffic. These multicast distribution trees are called Data MDT trees because they do not carry PIM control packets exchanged by PE routers.

The following documents the procedures of the initiation and teardown of the Data MDT trees; a sketch of the sender-side behavior appears after this list. The definition of the constants and timers can be found in section 8.

- The PE router connected to the source of the content initially uses the Default MDT group when forwarding the content to the MD.

- When one or more pre-configured conditions are met, it starts to periodically announce an MDT Join TLV, at the interval [MDT_INTERVAL]. The MDT Join TLV is forwarded to all the PE routers in the MD.
  If a PE in a particular MD transmits a C-multicast data packet to the backbone by transmitting it through the MT, every other PE in that MD will receive it. Any of those PEs which are not on a C-multicast distribution tree for the packet's C-multicast destination address (as determined by applying ordinary PIM procedures to the corresponding multicast VRF) will have to discard the packet.

  A commonly used condition is bandwidth: when the VPN traffic exceeds a certain threshold, it is more desirable to deliver the flow only to the PE routers connected to receivers, in order to optimize the performance of the PE routers and the resources of the provider network. However, other conditions can also be devised, and they are purely implementation specific.

- The MDT Join TLV is encapsulated in UDP, and the packet is addressed to ALL-PIM-ROUTERS (224.0.0.13) in the context of the VRF and encapsulated using the Default MDT group when sent to the MD. This allows all PE routers to receive the information.

- Upon receiving an MDT Join TLV, PE routers connected to receivers will join the Data MDT group announced by the MDT Join TLV in the global table. When the Data MDT group is in PIM-SM or bidirectional PIM mode, the PE routers build a shared tree toward the RP. When the Data MDT group is set up using PIM-SSM, the PE routers build a source tree toward the PE router that is advertising the MDT Join TLV; the source address of the tree is the same as the source IP address used in the IP packet advertising the MDT Join TLV.

  PE routers which are not connected to receivers may wish to cache the state in order to reduce the delay when a receiver comes up in the future.

- After [MDT_DATA_DELAY], the PE router connected to the source starts encapsulating traffic using the Data MDT group.

- When the pre-configured conditions are no longer met, e.g., the traffic stops, the PE router connected to the source stops announcing the MDT Join TLV.

- If no MDT Join TLV is received for an interval of [MDT_DATA_TIMEOUT], PE routers connected to the receivers just leave the Data MDT group in the global instance.
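The sender-side behavior can be sketched, non-normatively, as a small timer-driven state machine using the constants of section 8.4. The class and callback names are illustrative; a simplified reading of the timer semantics is assumed, and the receiver-side [MDT_DATA_TIMEOUT] behavior is not modeled.

   import time

   MDT_INTERVAL = 60       # seconds (section 8.4)
   MDT_DATA_DELAY = 3      # seconds
   MDT_DATA_HOLDOWN = 60   # seconds

   class DataMdtSender:
       """Source-PE behavior for one (C-source, C-group) flow."""

       def __init__(self, default_group, data_group, send_join_tlv):
           self.default_group = default_group
           self.data_group = data_group          # Data MDT group to advertise
           self.send_join_tlv = send_join_tlv    # transmits an MDT Join TLV
           self.first_announce = None
           self.last_announce = None
           self.switched = False

       def tick(self, condition_met, now=None):
           """Call periodically; returns the P-group to encapsulate with."""
           now = time.monotonic() if now is None else now
           if condition_met:
               # Announce immediately on activation, then every MDT_INTERVAL.
               if (self.first_announce is None
                       or now - self.last_announce >= MDT_INTERVAL):
                   self.send_join_tlv(self.data_group)
                   self.last_announce = now
                   if self.first_announce is None:
                       self.first_announce = now
               # Switch onto the Data MDT MDT_DATA_DELAY after the first
               # announcement, giving receivers time to join the group.
               if not self.switched and now - self.first_announce >= MDT_DATA_DELAY:
                   self.switched = True
           elif self.switched and now - self.last_announce >= MDT_DATA_HOLDOWN:
               # Condition no longer met and no announcement for a holdown
               # period: fall back to the Default MDT (avoids oscillation
               # when traffic is bursty).
               self.switched = False
               self.first_announce = None
           return self.data_group if self.switched else self.default_group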
7.3. Use of SSM for Data MDTs

The use of Data MDTs requires that a set of multicast P-addresses be pre-allocated and dedicated for use as the destination addresses for the Data MDTs.

If SSM is used to set up the Data MDTs, then each MD needs to be assigned a set of these multicast P-addresses. Each VRF in the MD needs to be configured with this set (i.e., all VRFs in the MD are configured with the same set). If there are n addresses in this set, then each PE in the MD can be the source of n Data MDTs in that MD.

If SSM is not used for setting up Data MDTs, then each VRF needs to be configured with a unique set of multicast P-addresses; two VRFs in the same MD cannot be configured with the same set of addresses. This requires the pre-allocation of many more multicast P-addresses, and the need to configure a different set for each VRF greatly complicates the operations and management. Therefore the use of SSM for Data MDTs is very strongly recommended.

8. Packet Formats and Constants

8.1. MDT TLV

The MDT TLV has the following format; it uses UDP port 3232.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |            Length             |     Value     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type (8 bits):

   the type of the MDT TLV. Currently, only type 1, the MDT Join TLV, is defined.

Length (16 bits):

   the total number of octets in the TLV for this type, including both the Type and Length fields.

Value (variable length):

   the content of the TLV.

8.2. MDT Join TLV

The MDT Join TLV has the following format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |            Length             |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            C-source                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            C-group                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            P-group                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type (8 bits):

   as defined above. For the MDT Join TLV, the value of the field is 1.

Length (16 bits):

   as defined above. For the MDT Join TLV, the value of the field is 16, including 1 byte of padding (the Reserved field).

Reserved (8 bits):

   for future use.

C-source (32 bits):

   the IPv4 address of the traffic source in the VPN.

C-group (32 bits):

   the IPv4 address of the multicast traffic destination address in the VPN.

P-group (32 bits):

   the IPv4 group address that the PE router is going to use to encapsulate the flow (C-source, C-group).

Extensions to the MDT Join TLV format to allow the assignment of IPv6 multicast streams to Data MDTs can be found in [MSPMSI].

8.3. Multiple MDT Join TLVs per Datagram

A single UDP datagram MAY carry multiple MDT Join TLVs, as many as can fit entirely within it. If there are multiple MDT Join TLVs in a UDP datagram, they MUST be of the same type. The end of the last MDT Join TLV (as determined by the MDT Join TLV Length field) MUST coincide with the end of the UDP datagram, as determined by the UDP length field. When processing a received UDP datagram that contains one or more MDT Join TLVs, a router MUST be able to process all the MDT Join TLVs that fit into the datagram.
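The following non-normative Python sketch packs one MDT Join TLV and parses a UDP payload that may carry several, enforcing the rules of sections 8.2 and 8.3 (same type throughout, and the last TLV ending exactly at the end of the datagram).

   import socket
   import struct

   MDT_JOIN_TLV_TYPE = 1
   MDT_JOIN_TLV_LEN = 16    # Type + Length + Reserved + three IPv4 addresses
   MDT_UDP_PORT = 3232

   def build_mdt_join_tlv(c_source, c_group, p_group):
       """Pack one MDT Join TLV (section 8.2)."""
       return (struct.pack("!BHB", MDT_JOIN_TLV_TYPE, MDT_JOIN_TLV_LEN, 0)
               + socket.inet_aton(c_source)
               + socket.inet_aton(c_group)
               + socket.inet_aton(p_group))

   def parse_mdt_join_datagram(payload):
       """Parse a UDP payload holding one or more MDT Join TLVs
       (section 8.3); returns a list of (C-source, C-group, P-group)."""
       tlvs, offset = [], 0
       while offset < len(payload):
           if len(payload) - offset < 4:
               raise ValueError("truncated TLV header")
           tlv_type, length = struct.unpack_from("!BH", payload, offset)
           if tlv_type != MDT_JOIN_TLV_TYPE or length != MDT_JOIN_TLV_LEN:
               raise ValueError("unknown TLV type or bad length")
           if offset + length > len(payload):
               raise ValueError("TLV overruns the datagram")
           c_source, c_group, p_group = (
               socket.inet_ntoa(payload[offset + i:offset + i + 4])
               for i in (4, 8, 12))
           tlvs.append((c_source, c_group, p_group))
           offset += length
       return tlvs   # offset == len(payload) is guaranteed by the checks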
8.4. Constants

[MDT_DATA_DELAY]:

   the interval that the PE router connected to the source waits before switching to the Data MDT group. The default value is 3 seconds.

[MDT_DATA_TIMEOUT]:

   the interval after which the PE routers connected to the receivers time out the received MDT Join TLVs and leave the Data MDT group. The default value is 3 minutes. This value must be consistent among PE routers.

[MDT_DATA_HOLDOWN]:

   the interval for which the PE router will not switch back to the Default MDT tree after it has started encapsulating packets using the Data MDT group. This is used to avoid oscillation when traffic is bursty. The default value is 1 minute.

[MDT_INTERVAL]:

   the interval at which the source PE router periodically sends the MDT Join TLV message. The default value is 60 seconds.

9. IANA Considerations

The codepoint for the Connector attribute is defined in IANA's registry of BGP attributes. The reference should be changed to refer to this document.

The codepoint for MDT-SAFI is defined in IANA's registry of BGP SAFI assignments. The reference should be changed to refer to this document.

10. Security Considerations

[RFC4364] discusses in general the security considerations that pertain when VPNs of the [RFC4364] type are deployed.

[PIMv2] discusses the security considerations that pertain to the use of PIM.

The security considerations of [RFC4023] and [RFC4797] apply whenever VPN traffic is carried through IP or GRE tunnels.

Each PE router MUST install packet filters that result in the discarding of all UDP packets with destination port 3232 that the PE router receives from the CE routers connected to it.

11. Acknowledgments

Major contributions to this work have been made by Dan Tappan and Tony Speakman.

The authors also wish to thank Arjen Boers, Robert Raszuk, Toerless Eckert and Ted Qian for their help and their ideas.

12. Normative References

[GRE2784] "Generic Routing Encapsulation (GRE)", Farinacci, Li, Hanks, Meyer, Traina, March 2000, RFC 2784

[PIMv2] "Protocol Independent Multicast - Sparse Mode (PIM-SM)", Fenner, Handley, Holbrook, Kouvelas, August 2006, RFC 4601

[PIM-ATTRIB] "The PIM Join Attribute Format", A. Boers, IJ. Wijnands, E. Rosen, November 2008, RFC 5384

[RFC2119] "Key words for use in RFCs to Indicate Requirement Levels", Bradner, March 1997, RFC 2119

[RFC4364] "BGP/MPLS IP VPNs", Rosen, Rekhter, February 2006, RFC 4364

13. Informative References

[ADMIN-ADDR] "Administratively Scoped IP Multicast", Meyer, July 1998, RFC 2365

[BIDIR] "Bidirectional Protocol Independent Multicast", Handley, Kouvelas, Speakman, Vicisano, October 2007, RFC 5015

[DIFF2983] "Differentiated Services and Tunnels", Black, October 2000, RFC 2983

[GRE1701] "Generic Routing Encapsulation (GRE)", Farinacci, Li, Hanks, Traina, October 1994, RFC 1701

[GRE2890] "Key and Sequence Number Extensions to GRE", Dommety, September 2000, RFC 2890

[IPIP1853] "IP in IP Tunneling", Simpson, October 1995, RFC 1853

[MSPMSI] "MVPN: Optimized use of PIM, Wild Card Selectors, S-PMSI Join Extensions, Bidirectional Tunnels, Extranets", Rosen, Boers, Cai, Wijnands, draft-rosen-l3vpn-mvpn-mspmsi-04.txt, June 2009

[MVPN-ARCH] "Multicast in MPLS/BGP IP VPNs", Rosen, Aggarwal, draft-ietf-l3vpn-2547bis-mcast-08.txt, March 2009

[MVPN-PROFILES] "MVPN Profiles Using PIM Control Plane", Rosen, Boers, Cai, Wijnands, June 2009, draft-rosen-l3vpn-mvpn-profiles-03.txt

[SSM] "Source-Specific Multicast for IP", Holbrook, Cain, August 2006, RFC 4607

[RFC4023] "Encapsulating MPLS in IP or Generic Routing Encapsulation (GRE)", T. Worster, Y. Rekhter, E. Rosen, Ed., March 2005, RFC 4023

[RFC4797] "Use of Provider Edge to Provider Edge (PE-PE) Generic Routing Encapsulation (GRE) or IP in BGP/MPLS IP Virtual Private Networks", Y. Rekhter, R. Bonica, E. Rosen, January 2007, RFC 4797

14. Authors' Addresses

Yiqun Cai (Editor)
Cisco Systems, Inc.
170 Tasman Drive
San Jose, CA, 95134
E-mail: ycai@cisco.com

Eric C. Rosen (Editor)
Cisco Systems, Inc.
1414 Massachusetts Avenue
Boxborough, MA, 01719
E-mail: erosen@cisco.com

IJsbrand Wijnands
Cisco Systems, Inc.
170 Tasman Drive
San Jose, CA, 95134
E-mail: ice@cisco.com