Network Working Group                               Eric C. Rosen (Editor)
Internet Draft                                          Yiqun Cai (Editor)
Intended Status: Informational                           IJsbrand Wijnands
Expires: June 30, 2009                                 Cisco Systems, Inc.

                                                         December 31, 2008

                      Multicast in MPLS/BGP IP VPNs

                       draft-rosen-vpn-mcast-10.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright and License Notice

   Copyright (c) 2008 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Abstract

   This draft describes the deployed MVPN (Multicast in BGP/MPLS IP
   VPNs) solution of Cisco Systems.

Table of Contents

   1         Specification of requirements
   2         Introduction
   2.1       Scaling Multicast State Info. in the Network Core
   2.2       Overview
   3         Multicast VRFs
   4         Multicast Domains
   4.1       Model of Operation
   5         Multicast Tunnels
   5.1       Ingress PEs
   5.2       Egress PEs
   5.3       Tunnel Destination Address(es)
   5.4       Auto-Discovery
   5.4.1     MDT-SAFI
   5.5       Which PIM Variant to Use
   5.6       Inter-AS MDT Construction
   5.6.1     The PIM MVPN Join Attribute
   5.6.1.1   Definition
   5.6.1.2   Usage
   5.7       Encapsulation
   5.7.1     Encapsulation in GRE
   5.7.2     Encapsulation in IP
   5.7.3     Interoperability
   5.8       MTU
   5.9       TTL
   5.10      Differentiated Services
   5.11      Avoiding Conflict with Internet Multicast
   6         The PIM C-Instance and the MT
   6.1       PIM C-Instance Control Packets
   6.2       PIM C-instance RPF Determination
   6.2.1     Connector Attribute
   7         Data MDT: Optimizing flooding
   7.1       Limitation of Multicast Domain
   7.2       Signaling Data MDT Trees
   7.3       Use of SSM for Data MDTs
   8         Packet Formats and Constants
   8.1       MDT TLV
   8.2       MDT Join TLV
   8.3       Constants
   9         IANA Considerations
   10        Security Considerations
   11        Acknowledgments
   12        Normative References
   13        Informative References
   14        Authors' Addresses

1. Specification of requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Introduction

   This draft describes the deployed MVPN (Multicast in BGP/MPLS IP
   VPNs) solution of Cisco Systems.  This is sometimes known as the
   "PIM+GRE" MVPN profile (see [MVPN-PROFILES], section 2, which recasts
   the contents of this document into the terminology of a more
   generalized MVPN framework defined by the L3VPN WG).  This document
   is being made available as it is often used as a reference for
   interoperating with deployed implementations.

   The procedures specified in this draft differ in a few minor respects
   from the fully standards-compliant PIM+GRE profile.  These
   differences are pointed out where they occur.

   The base specification for BGP/MPLS IP VPNs [RFC4364] does not
   provide a way for IP multicast data or control traffic to travel from
   one VPN site to another.  This document extends that specification by
   specifying the necessary protocols and procedures for support of IP
   multicast.  Only IPv4 multicast is considered in this specification.
   This specification presupposes that:

      1. PIM [PIMv2] is the multicast routing protocol used within the
         VPN,

      2. PIM is also the multicast routing protocol used within the SP
         network, and

      3. the SP network supports native IP multicast forwarding.

   Familiarity with the terminology and procedures of [RFC4364] is
   presupposed.  Familiarity with [PIMv2] is also presupposed.

2.1. Scaling Multicast State Info. in the Network Core

   The BGP/MPLS IP VPN service of [RFC4364] provides a VPN with
   "optimal" unicast routing through the SP backbone, in that a packet
   follows the "shortest path" across the backbone, as determined by the
   backbone's own routing algorithm.  This optimal routing is provided
   without requiring the P routers to maintain any routing information
   which is specific to a VPN; indeed, the P routers do not maintain any
   per-VPN state at all.

   Unfortunately, optimal MULTICAST routing cannot be provided without
   requiring the P routers to maintain some VPN-specific state
   information.  Optimal multicast routing would require that one or
   more multicast distribution trees be created in the backbone for each
   multicast group that is in use.  If a particular multicast group from
   within a VPN is using source-based distribution trees, optimal
   routing requires that there be one distribution tree for each
   transmitter of that group.  If shared trees are being used, one tree
   for each group is still required.  Each such tree requires state in
   some set of the P routers, with the amount of state being
   proportional to the number of multicast transmitters.  The reason
   there needs to be at least one distribution tree per multicast group
   is that each group may have a different set of receivers; multicast
   routing algorithms generally go to great lengths to ensure that a
   multicast packet will not be sent to a node which is not on the path
   to a receiver.
   Given that an SP generally supports many VPNs, where each VPN may
   have many multicast groups, and each multicast group may have many
   transmitters, it is not scalable to have one or more distribution
   trees for each multicast group.  The SP has no control whatsoever
   over the number of multicast groups and transmitters that exist in
   the VPNs, and it is difficult to place any bound on these numbers.

   In order to have a scalable multicast solution for MPLS/BGP IP VPNs,
   the amount of state maintained by the P routers needs to be
   proportional to something which IS under the control of the SP.  This
   specification describes such a solution.  In this solution, the
   amount of state maintained in the P routers is proportional only to
   the number of VPNs which run over the backbone; the amount of state
   in the P routers is NOT sensitive to the number of multicast groups
   or to the number of multicast transmitters within the VPNs.  To
   achieve this scalability, the optimality of the multicast routes is
   reduced.  A PE which is not on the path to any receiver of a
   particular multicast group may still receive multicast packets for
   that group, and if so, will have to discard them.  The SP does,
   however, have control over the tradeoff between optimal routing and
   scalability.

2.2. Overview

   An SP determines whether a particular VPN is multicast-enabled.  If
   it is, it corresponds to a "Multicast Domain".  A PE which attaches
   to a particular multicast-enabled VPN is said to belong to the
   corresponding Multicast Domain.  For each Multicast Domain, there is
   a default "Multicast Distribution Tree (MDT)" through the backbone,
   connecting ALL of the PEs that belong to that Multicast Domain.  A
   given PE may be in as many Multicast Domains as there are VPNs
   attached to that PE.  However, each Multicast Domain has its own MDT.
   The MDTs are created by running PIM in the backbone, and in general
   an MDT also includes P routers on the paths between the PE routers.

   In a departure from the usual multicast tree distribution procedures,
   the Default MDT for a Multicast Domain is constructed automatically
   as the PEs in the domain come up.  Construction of the Default MDT
   does not depend on the existence of multicast traffic in the domain;
   it will exist before any such multicast traffic is seen.  Default
   MDTs correspond to the "MI-PMSIs" of [MVPN-ARCH].

   In BGP/IP MPLS VPNs, each CE router is a unicast routing adjacency of
   a PE router, but CE routers at different sites do NOT become unicast
   routing adjacencies of each other.  This important characteristic is
   retained for multicast routing -- a CE router becomes a PIM adjacency
   of a PE router, but CE routers at different sites do NOT become PIM
   adjacencies of each other.  Multicast packets from within a VPN are
   received from a CE router by an ingress PE router.  The ingress PE
   encapsulates the multicast packets and (initially) forwards them
   along the Default MDT tree to all the PE routers connected to sites
   of the given VPN.  Every PE router attached to a site of the given
   VPN thus receives all multicast packets from within that VPN.  If a
   particular PE router is not on the path to any receiver of that
   multicast group, the PE simply discards that packet.

   If a large amount of traffic is being sent to a particular multicast
   group, but that group does not have receivers at all the VPN sites,
   it can be wasteful to forward that group's traffic along the Default
   MDT.  Therefore, we also specify a method for establishing individual
   MDTs for specific multicast groups.  We call these "Data MDTs".  A
   Data MDT delivers VPN data traffic for a particular multicast group
   only to those PE routers which are on the path to receivers of that
   multicast group.
   Using a Data MDT has the benefit of reducing the
   amount of multicast traffic on the backbone, as well as reducing the
   load on some of the PEs; it has the disadvantage of increasing the
   amount of state that must be maintained by the P routers.  The SP has
   complete control over this tradeoff.  Data MDTs correspond to the
   S-PMSIs of [MVPN-ARCH].

   This solution requires the SP to deploy appropriate protocols and
   procedures, but is transparent to the SP's customers.  An enterprise
   which uses PIM-based multicasting in its network can migrate from a
   private network to a BGP/MPLS IP VPN service, while continuing to use
   whatever multicast router configurations it was previously using; no
   changes need be made to CE routers or to other routers at customer
   sites.  For instance, any dynamic RP-discovery procedures that are
   already in use may be left in place.

3. Multicast VRFs

   The notion of a "VRF", defined in [RFC4364], is extended to include
   multicast routing entries as well as unicast routing entries.

   Each VRF has its own multicast routing table.  When a multicast data
   or control packet is received from a particular CE device, multicast
   routing is done in the associated VRF.

   Each PE router runs a number of instances of PIM-SM, as many as one
   per VRF.  In each instance of PIM-SM, the PE maintains a PIM
   adjacency with each of the PIM-capable CE routers associated with
   that VRF.  The multicast routing table created by each instance is
   specific to the corresponding VRF.  We will refer to these PIM
   instances as "VPN-specific PIM instances", or "PIM C-instances".

   Each PE router also runs a "provider-wide" instance of PIM-SM (a "PIM
   P-instance"), in which it has a PIM adjacency with each of its IGP
   neighbors (i.e., with P routers), but NOT with any CE routers, and
   not with other PE routers (unless they happen to be adjacent in the
   SP's network).
   The P routers also run the P-instance of PIM, but do
   NOT run a C-instance.

   In order to help clarify when we are speaking of the PIM P-instance
   and when we are speaking of a PIM C-instance, we will also apply
   the prefixes "P-" and "C-" respectively to control messages,
   addresses, etc.  Thus a P-Join would be a PIM Join which is processed
   by the PIM P-instance, and a C-Join would be a PIM Join which is
   processed by a C-instance.  A P-group address would be a group
   address in the SP's address space, and a C-group address would be a
   group address in a VPN's address space.

4. Multicast Domains

4.1. Model of Operation

   A "Multicast Domain (MD)" is essentially a set of VRFs associated
   with interfaces that can send multicast traffic to each other.  From
   the standpoint of a PIM C-instance, a multicast domain is equivalent
   to a multi-access interface.  The PE routers in a given MD become PIM
   adjacencies of each other in the PIM C-instance.

   Each multicast VRF is assigned to one MD.  Each MD is configured with
   a distinct multicast P-group address, called the "Default MDT group
   address".  This address is used to build the Default MDT for the MD.

   When a PE router needs to send PIM C-instance control traffic to the
   other PE routers in the MD, it encapsulates the control traffic, with
   its own address as source IP address and the Default MDT group
   address as destination IP address.  Note that the Default MDT is part
   of the P-instance of PIM, whereas the PEs that communicate over the
   Default MDT are PIM adjacencies in a C-instance.  Within the
   C-instance, the Default MDT appears to be a multi-access network to
   which all the PEs are attached.  This is discussed in more detail in
   section 5.

   The Default MDT does not only carry the PIM control traffic of the
   MD's PIM C-instance.  It also, by default, carries the multicast data
   traffic of the C-instance.
   In some cases, though, multicast data
   traffic in a particular MD will be sent on a Data MDT rather than on
   the Default MDT.  The use of Data MDTs is described in section 7.

   Note that, if an MDT (Default or Data) is set up using the ASM
   service model, the MDT must have a P-group address
   which is "globally unique" (more precisely, unique over the set of SP
   networks carrying the multicast traffic of the corresponding MD).  If
   the MDT is set up using the SSM model, the P-group address of an
   MDT only needs to be unique relative to the source of the MDT (though
   see section 5.4).  However, some implementations require the same SSM
   group address to be assigned to all the PEs.  Interoperability with
   those implementations requires conformance to this restriction.

5. Multicast Tunnels

   An MD can be thought of as a set of PE routers connected by a
   "multicast tunnel (MT)".  From the perspective of a VPN-specific PIM
   instance, an MT is a single multi-access interface.  In the SP
   network, a single MT is realized as a Default MDT combined with zero
   or more Data MDTs.

5.1. Ingress PEs

   An ingress PE is a PE router that is connected to the multicast
   sender in the VPN, either directly or via a CE router.  When the
   multicast sender starts transmitting, and if there are receivers (or
   a PIM RP) behind other PE routers in the common MD, the ingress PE
   becomes the transmitter of either the Default MDT group or a Data MDT
   group in the SP network.

5.2. Egress PEs

   A PE router with a VRF configured in an MD becomes a receiver of the
   Default MDT group for that MD.  A PE router may also join a Data MDT
   group if it has a VPN-specific PIM instance in which it is
   forwarding to one of its attached sites traffic for a particular
   C-group, and that particular C-group has been associated with that
   particular Data MDT.
   When a PE router joins any P-group used for
   encapsulating VPN multicast traffic, the PE router becomes one of the
   endpoints of the corresponding MT.

   When a packet is received from an MT, the receiving PE derives the MD
   from the destination address (a P-group address) of the received
   packet.  The packet is then passed to the corresponding Multicast VRF
   and VPN-specific PIM instance for further processing.

5.3. Tunnel Destination Address(es)

   An MT is an IP tunnel for which the destination address is a P-group
   address.  However, an MT is not limited to using only one P-group
   address for encapsulation.  Based on the payload VPN multicast
   traffic, it can choose to use the Default MDT group address, or one
   of the Data MDT group addresses (as described in section 7 of this
   document), allowing the MT to reach a different set of PE routers in
   the common MD.

5.4. Auto-Discovery

   Any of the variants of PIM may be used to set up the Default MDT:
   PIM-SM, Bidirectional PIM [BIDIR], or PIM-SSM [SSM].  Except in the
   case of PIM-SSM, the PEs need only know the proper P-group address in
   order to begin setting up the Default MDTs.  The PEs will then
   discover each other's addresses by virtue of receiving PIM control
   traffic, e.g., PIM Hellos, sourced (and encapsulated) by each other.

   However, in the case of PIM-SSM, the necessary MDTs for an MD cannot
   be set up until each PE in the MD knows the source address of each of
   the other PEs in that same MD.  This information needs to be auto-
   discovered.

   A new BGP Address Family, MDT-SAFI, is defined.  The NLRI for this
   address family consists of an RD, an IPv4 unicast address, and a
   multicast group address.  A given PE router in a given MD constructs
   an NLRI in this family from:

      - Its own IPv4 address.
        If it has several, it uses the one which
        it will be placing in the IP source address field of multicast
        packets that it will be sending over the MDT.

      - An RD which has been assigned to the MD.

      - The P-group address which is to be used as the IP destination
        address field of multicast packets that will be sent over the
        MDT.

   When a PE distributes this NLRI via BGP, it may include a Route
   Target Extended Communities attribute.  This RT must be an "Import
   RT" [RFC4364] of each VRF in the MD.  The ordinary BGP distribution
   procedures used by [RFC4364] will then ensure that each PE learns the
   MDT-SAFI "address" of each of the other PEs in the MD, and that the
   learned MDT-SAFI addresses get associated with the right VRFs.

   If a PE receives an MDT-SAFI NLRI which does not have an RT
   attribute, the P-group address from the NLRI has to be used to
   associate the NLRI with a particular VRF.  In this case, each
   multicast domain must be associated with a unique P-address, even if
   PIM-SSM is used.  However, finding a unique P-address for a multi-
   provider multicast group may be difficult.

   In order to facilitate the deployment of multi-provider multicast
   domains, this specification REQUIRES the use of the MDT-SAFI NLRI
   (even if PIM-SSM is not used to set up the default MDT).  This
   specification also REQUIRES that an implementation be capable of
   using PIM-SSM to set up the default MDT.

   In the standard PIM+GRE profile, the MDT-SAFI is replaced by the
   "Intra-AS I-PMSI A-D Route."  The latter is a generalized version of
   the MDT-SAFI, which allows the "default MDTs" and "data MDTs" to be
   implemented as MPLS P2MP or MP2MP LSPs, as well as by PIM-created
   multicast distribution trees.  In the latter case, the Intra-AS A-D
   routes carry the same information that the MDT-SAFI does, though with
   a different encoding.
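   The three components listed above (RD, PE IPv4 address, P-group
   address) can be illustrated with a short, non-normative Python
   sketch.  The RD value, addresses, and helper names here are invented
   for illustration only:

```python
import ipaddress
import struct

def encode_mdt_safi_nlri(rd: bytes, pe_ipv4: str, p_group: str) -> bytes:
    """Pack an MDT-SAFI NLRI: an 8-octet RD, the PE's IPv4 source
    address, and the P-group address (16 octets in total)."""
    assert len(rd) == 8
    return (rd
            + ipaddress.IPv4Address(pe_ipv4).packed
            + ipaddress.IPv4Address(p_group).packed)

def decode_mdt_safi_nlri(nlri: bytes):
    """Split a 16-octet MDT-SAFI NLRI back into its three fields."""
    rd = nlri[0:8]
    pe = str(ipaddress.IPv4Address(nlri[8:12]))
    group = str(ipaddress.IPv4Address(nlri[12:16]))
    return rd, pe, group

# A Type 0 RD (2-octet ASN : 4-octet assigned number), e.g. 65000:1.
rd = struct.pack("!HHI", 0, 65000, 1)
nlri = encode_mdt_safi_nlri(rd, "192.0.2.1", "239.1.1.1")
assert len(nlri) == 16
assert decode_mdt_safi_nlri(nlri) == (rd, "192.0.2.1", "239.1.1.1")
```

   Note that the group address occupies the last four octets, matching
   the NLRI layout given in the MDT-SAFI subsection.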
   The Intra-AS A-D Routes also carry Route Targets, and so may be
   distributed inter-AS in the same manner as unicast routes.  (Inter-AS
   distribution of "Intra-AS I-PMSI A-D routes" is necessary in some
   cases; see below.)

   The encoding of the MDT-SAFI is specified in the following
   subsection.

5.4.1. MDT-SAFI

   BGP messages in which AFI=1 and SAFI=66 are "MDT-SAFI" messages.

   The NLRI format is 8-byte-RD:IPv4-address followed by the MDT group
   address; i.e., the MP_REACH attribute for this SAFI will contain one
   or more tuples of the following form:

              +-------------------------------+
              |                               |
              |  RD:IPv4-address (12 octets)  |
              |                               |
              +-------------------------------+
              |   Group Address (4 octets)    |
              +-------------------------------+

   The IPv4 address identifies the PE that originated this route, and
   the RD identifies a VRF in that PE.  The group address must be a
   multicast group address, and is used to build the P-tunnels.  All PEs
   attached to a given MVPN must specify the same group address, even if
   the group is an SSM group.  MDT-SAFI routes do not carry RTs, and the
   group address is used to associate a received MDT-SAFI route with a
   VRF.

5.5. Which PIM Variant to Use

   To minimize the amount of multicast routing state maintained by the P
   routers, the Default MDTs should be realized as shared trees, such as
   PIM Bidirectional trees.  However, the operational procedures for
   assigning P-group addresses may be greatly simplified, especially in
   the case of multi-provider MDs, if PIM-SSM is used.

   Data MDTs are best realized as source trees, constructed via PIM-SSM.

5.6. Inter-AS MDT Construction

   Standard PIM techniques for the construction of source trees
   presuppose that every router has a route to the source of the tree.
   However, if the source of the tree is in a different AS than a
   particular P router, it is possible that the P router will not have a
   route to the source.  For example, the remote AS may be using BGP to
   distribute a route to the source, but a particular P router may be
   part of a "BGP-free core", in which the P routers are not aware of
   BGP-distributed routes.

   What is needed in this case is a way for a PE to tell PIM to
   construct the tree through a particular BGP speaker, the "BGP next
   hop" for the tree source.  This can be accomplished with a PIM
   extension.

   If the PE has selected the source of the tree from the MDT-SAFI
   address family, then it may be desirable to build the tree along the
   route to the MDT-SAFI address, rather than along the route to the
   corresponding IPv4 address.  This enables the inter-AS portion of the
   tree to follow a path which is specifically chosen for multicast
   (i.e., it allows the inter-AS multicast topology to be "non-
   congruent" to the inter-AS unicast topology).  This too requires a
   PIM extension.

   The necessary PIM extension is the PIM MVPN Join Attribute described
   in the following sub-section.

5.6.1. The PIM MVPN Join Attribute

5.6.1.1. Definition

   In [PIM-ATTRIB], the notion of a "join attribute" is defined, and a
   format for including join attributes in PIM Join/Prune messages is
   specified.  We now define a new join attribute, which we call the
   "MVPN Join Attribute".

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|E| Attr Type |     Length    | Proxy IP address
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               RD
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-.......

   The Type field of the MVPN Join Attribute is set to 1.

   The F bit is set to 0.
   Two information fields are carried in the MVPN Join Attribute:

      - Proxy: The IP address of the node towards which the PIM
        Join/Prune message is to be forwarded.  This will be either an
        IPv4 or an IPv6 address, depending on whether the PIM Join/Prune
        message itself is IPv4 or IPv6.

      - RD: An eight-byte RD.  This immediately follows the proxy IP
        address.

   The PIM message also carries the address of the upstream PE.

   In the case of an intra-AS MVPN, the proxy and the upstream PE are
   the same.  In the case of an inter-AS MVPN, the proxy will be the
   ASBR which is the exit point from the local AS on the path to the
   upstream PE.

5.6.1.2. Usage

   When a PE router creates a PIM Join/Prune message in order to set up
   an inter-AS default MDT, it does so as a result of having received a
   particular MDT-SAFI route.  It includes an MVPN Join Attribute whose
   fields are set as follows:

      - If the upstream PE is in the same AS as the local PE, then the
        proxy field contains the address of the upstream PE.  Otherwise,
        it contains the address of the BGP next hop on the route to the
        upstream PE.

      - The RD field contains the RD from the NLRI of the MDT-SAFI
        route.

      - The upstream PE field contains the address of the PE that
        originated the MDT-SAFI route (obtained from the NLRI of that
        route).

   When a PIM router processes a PIM Join/Prune message with an MVPN
   Join Attribute, it first checks to see if the proxy field contains
   one of its own addresses.

   If not, the router uses the proxy IP address in order to determine
   the RPF interface and neighbor.  The MVPN Join Attribute must be
   passed upstream, unchanged.

   If the proxy address is one of the router's own IP addresses, then
   the router looks in its BGP routing table for an MDT-SAFI route whose
   NLRI consists of the upstream PE address prepended with the RD from
   the Join Attribute.
   If there is no match, the PIM message is
   discarded.  If there is a match, the IP address from the BGP next hop
   field of the matching route is used in order to determine the RPF
   interface and neighbor.  When the PIM Join/Prune is forwarded
   upstream, the proxy field is replaced with the address of the BGP
   next hop, and the RD and upstream PE fields are left unchanged.

5.7. Encapsulation

5.7.1. Encapsulation in GRE

   GRE [GRE1701] encapsulation is recommended when sending multicast
   traffic through an MDT.  The following diagram shows the progression
   of the packet as it enters and leaves the service provider network.

    Packets received       Packets in transit      Packets forwarded
    at ingress PE          in the service          by egress PEs
                           provider network

                           +---------------+
                           |  P-IP Header  |
                           +---------------+
                           |      GRE      |
    ++=============++      ++=============++      ++=============++
    || C-IP Header ||      || C-IP Header ||      || C-IP Header ||
    ++=============++ >>>> ++=============++ >>>> ++=============++
    ||  C-Payload  ||      ||  C-Payload  ||      ||  C-Payload  ||
    ++=============++      ++=============++      ++=============++

   The IPv4 Protocol Number field in the P-IP Header must be set to 47.
   The Protocol Type field of the GRE Header must be set to 0x0800.

   [GRE2784] specifies an optional GRE checksum, and [GRE2890] specifies
   optional GRE key and sequence number fields.

   The GRE key field is not needed because the P-group address in the
   delivery IP header already identifies the MD, and thus the VRF
   context in which the payload packet is to be further processed.

   The GRE sequence number field is also not needed because the
   transport-layer services for the original application will be
   provided by the C-IP Header.

   The use of the GRE checksum field must follow [GRE2784].
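   The encapsulation described above can be sketched in a few lines of
   non-normative Python.  The function name, the fixed TTL default, and
   the zeroed outer-header checksum are illustrative assumptions, not
   part of this specification:

```python
import ipaddress
import struct

def gre_encapsulate(c_packet: bytes, p_source: str, p_group: str,
                    ttl: int = 64) -> bytes:
    """Wrap a customer (C-) packet in a minimal GRE header and an
    outer P-IP header.  No GRE checksum, key, or sequence number is
    used, so the GRE header is just 4 octets: flags/version = 0,
    Protocol Type = 0x0800 (IPv4)."""
    gre = struct.pack("!HH", 0x0000, 0x0800)
    total_len = 20 + len(gre) + len(c_packet)
    # Outer IPv4 header: version/IHL, DSCP/ECN, total length, id,
    # flags/fragment offset (DF not set, per section 5.8), TTL,
    # protocol 47 (GRE), checksum left at 0 in this sketch, then the
    # PE's source address and the P-group destination address.
    outer = struct.pack("!BBHHHBBH4s4s",
                        0x45, 0, total_len, 0, 0, ttl, 47, 0,
                        ipaddress.IPv4Address(p_source).packed,
                        ipaddress.IPv4Address(p_group).packed)
    return outer + gre + c_packet
```

   A real implementation would also compute the outer IPv4 header
   checksum; it is omitted here to keep the sketch short.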
596 To facilitate high speed implementation, this document recommends 597 that the ingress PE routers encapsulate VPN packets without setting 598 the checksum, key or sequence field. 600 5.7.2. Encapsulation in IP 602 IP-in-IP [IPIP1853] is also a viable option. When it is used, the 603 IPv4 Protocol Number field is set to 4. The following diagram shows 604 the progression of the packet as it enters and leaves the service 605 provider network. 607 Packets received Packets in transit Packets forwarded 608 at ingress PE in the service by egress PEs 609 provider network 611 +---------------+ 612 | P-IP Header | 613 ++=============++ ++=============++ ++=============++ 614 || C-IP Header || || C-IP Header || || C-IP Header || 615 ++=============++ >>>>> ++=============++ >>>>> ++=============++ 616 || C-Payload || || C-Payload || || C-Payload || 617 ++=============++ ++=============++ ++=============++ 619 5.7.3. Interoperability 621 PE routers in a common MD must agree on the method of encapsulation. 622 This can be achieved either via configuration or means of some 623 discovery protocols. To help reduce configuration overhead and 624 improve multi-vendor interoperability, it is strongly recommended 625 that GRE encapsulation must be supported and enabled by default. 627 5.8. MTU 629 Because multicast group addresses are used as tunnel destination 630 addresses, existing Path MTU discovery mechanisms can not be used. 631 This requires that: 633 1. The ingress PE router (one that does the encapsulation) must 634 not set the DF bit in the outer header, and 636 2. If the "DF" bit is cleared in the IP header of the C-Packet, 637 fragment the C-Packet before encapsulation if appropriate. 638 This is very important in practice due to the fact that the 639 performance of reassembly function is significantly lower than 640 that of decapsulating and forwarding packets on today's router 641 implementations. 643 5.9. 
TTL 645 The ingress PE should not copy the TTL field from the payload IP 646 header received from a CE router to the delivery IP header. The 647 TTL of the delivery IP header is determined by the local 648 policy of the ingress PE router. 650 5.10. Differentiated Services 652 By default, the setting of the DS field in the delivery IP header 653 should follow the guidelines outlined in [DIFF2983]. An SP may also 654 choose to deploy any of the additional mechanisms the PE routers 655 support. 657 5.11. Avoiding Conflict with Internet Multicast 659 If the SP is providing Internet multicast, distinct from its VPN 660 multicast services, it must ensure that the P-group addresses which 661 correspond to its MDs are distinct from any of the group addresses of 662 the Internet multicasts it supports. This is best done by using 663 administratively scoped addresses [ADMIN-ADDR]. 665 The C-group addresses need not be distinct from either the P-group 666 addresses or the Internet multicast addresses. 668 6. The PIM C-Instance and the MT 670 If a particular VRF is in a particular MD, the corresponding MT is 671 treated by that VRF's VPN-specific PIM instance as a LAN interface. 672 The PEs which are adjacent on the MT must execute the PIM LAN 673 procedures, including the generation and processing of PIM Hello, 674 Join/Prune, Assert, DF election, and other PIM control packets. 676 6.1. PIM C-Instance Control Packets 678 The PIM protocol packets are sent to ALL-PIM-ROUTERS (224.0.0.13) in 679 the context of that VRF, but when in transit in the provider network, 680 they are encapsulated using the Default MDT group configured for that 681 MD. This allows VPN-specific PIM routes to be extended from site to 682 site without appearing in the P routers. 684 6.2. PIM C-Instance RPF Determination 686 Although the MT is treated as a PIM-enabled interface, unicast 687 routing is NOT run over it, and there are no unicast routing 688 adjacencies over it.
It is therefore necessary to specify special 689 procedures for determining when the MT is to be regarded as the "RPF 690 Interface" for a particular C-address. 692 When a PE needs to determine the RPF interface of a particular C- 693 address, it looks up the C-address in the VRF. If the route matching 694 it is not a VPN-IP route learned from MP-BGP as described in 695 [RFC4364], or if that route's outgoing interface is one of the 696 interfaces associated with the VRF, then ordinary PIM procedures for 697 determining the RPF interface apply. 699 However, if the route matching the C-address is a VPN-IP route whose 700 outgoing interface is not one of the interfaces associated with the 701 VRF, then PIM will consider the outgoing interface to be the MT 702 associated with the VPN-specific PIM instance. 704 Once PIM has determined that the RPF interface for a particular C- 705 address is the MT, it is necessary for PIM to determine the RPF 706 neighbor for that C-address. This will be one of the other PEs that 707 is a PIM adjacency over the MT. 709 For this purpose, a BGP attribute, the "Connector" attribute, is defined. Whenever a PE router 710 distributes a VPN-IPv4 address from a VRF that is part of an MD, it 711 SHOULD distribute a Connector attribute along with it. The Connector 712 attribute should specify the MDT address family, and its value should 713 be the IP address that the PE router uses as the source IP 714 address for the multicast packets that it encapsulates and sends over the 715 MT. Then, when a PE has determined that the RPF interface for a 716 particular C-address is the MT, it must look up the Connector 717 attribute that was distributed along with the VPN-IPv4 address 718 corresponding to that C-address. The value of this Connector 719 attribute will be considered to be the RPF adjacency for the C- 720 address. 722 There are older implementations in which the Connector attribute is 723 not present.
In this case, as long as the "BGP Next Hop" for the C- 724 address is one of the PEs that is a PIM adjacency, that PE 725 should be treated as the RPF adjacency for that C-address. 727 However, if the MD spans multiple Autonomous Systems, and an "option 728 b" interconnect is used, the BGP Next Hop might not be a PIM 729 adjacency, and the RPF check will not succeed unless the Connector 730 attribute is used. 732 In the standard PIM+GRE profile, the Connector attribute is replaced 733 by the "VRF Route Import Extended Community" attribute. The latter 734 is a generalized version, but carries the same information as the 735 Connector attribute does; the encoding, however, is different. 737 The Connector attribute is defined in the following sub-section. 739 6.2.1. Connector Attribute 741 The Connector Attribute is an optional transitive attribute. It is 742 formatted as follows: 744 0 1 745 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 746 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 747 |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1| 748 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 749 | | 750 | IPv4 Address of PE | 751 | | 752 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 754 7. Data MDT: Optimizing Flooding 756 7.1. Limitation of Multicast Domain 758 While the procedure specified in the previous section requires the P 759 routers to maintain multicast state, the amount of state is bounded 760 by the number of supported VPNs. The P routers do NOT run any VPN- 761 specific PIM instances. 763 In particular, the use of a single bidirectional tree per VPN scales 764 well as the number of transmitters and receivers increases, but not 765 so well as the amount of multicast traffic per VPN increases. 767 The multicast routing provided by this scheme is not optimal, in that 768 a packet of a particular multicast group may be forwarded to PE 769 routers which have no downstream receivers for that group, and hence 770 which may need to discard the packet.
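Returning briefly to the Connector attribute value defined in Section 6.2.1 (a 16-bit field fixed to 1, followed by the IPv4 address of the PE), it can be encoded and decoded as in the following sketch. The helper names are hypothetical, and the surrounding BGP path attribute flags/type/length octets are omitted; only the 6-octet value portion is shown.

```python
import socket
import struct

def encode_connector(pe_address: str) -> bytes:
    """Build the 6-octet Connector attribute value of Section 6.2.1:
    a 16-bit field set to 1, then the PE's IPv4 address."""
    return struct.pack("!H", 1) + socket.inet_aton(pe_address)

def decode_connector(value: bytes) -> str:
    """Return the PE's IPv4 address carried in a Connector value."""
    kind, = struct.unpack("!H", value[:2])
    if kind != 1:
        raise ValueError("unrecognized Connector attribute format")
    return socket.inet_ntoa(value[2:6])
```

A receiving PE would use the decoded address as the RPF adjacency for C-addresses whose VPN-IPv4 routes carried this attribute, as described in Section 6.2.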
772 In the simplest configuration model, only the Default MDT group is 773 configured for each MD. The result of this configuration is that all 774 VPN multicast traffic, control or data, will be encapsulated and 775 forwarded to all PE routers that are part of the MD. While this 776 limits the amount of multicast routing state the provider network 777 has to maintain, it also requires PE routers to discard multicast C- 778 packets if there are no receivers for those packets in the 779 corresponding sites. In some cases, especially when the content 780 involves high bandwidth but only a limited set of receivers, it is 781 desirable that certain C-packets only travel to PE routers that do 782 have receivers in the VPN, to save bandwidth in the network and reduce 783 load on the PE routers. 785 7.2. Signaling Data MDT Trees 787 A simple protocol is proposed to signal additional P-group addresses 788 used to encapsulate VPN traffic. These P-group addresses are called Data 789 MDT groups. The ingress PE router advertises a different P-group 790 address (as opposed to always using the Default MDT group) to 791 encapsulate VPN multicast traffic. Only the PE routers on the path 792 to eventual receivers join the P-group, and they therefore form an optimal 793 multicast distribution tree in the service provider network for the 794 VPN multicast traffic. These multicast distribution trees are called 795 Data MDT trees because they do not carry the PIM control packets 796 exchanged by PE routers. 798 The following documents the procedures for the initiation and teardown 799 of Data MDT trees. The definitions of the constants and timers 800 can be found in Section 8. 802 - The PE router connected to the source of the content initially 803 uses the Default MDT group when forwarding the content to the MD. 805 - When one or more pre-configured conditions are met, it starts to 806 periodically announce an MDT Join TLV at intervals of 807 [MDT_INTERVAL].
The MDT Join TLV is forwarded to all the PE 808 routers in the MD. 810 If a PE in a particular MD transmits a C-multicast data packet to 811 the backbone, by transmitting it through an MD, every other PE in 812 that MD will receive it. Any of those PEs which are not on a C- 813 multicast distribution tree for the packet's C-multicast 814 destination address (as determined by applying ordinary PIM 815 procedures to the corresponding multicast VRF) will have to 816 discard the packet. 818 A commonly used condition is bandwidth. When the VPN traffic 819 exceeds a certain threshold, it is more desirable to deliver the 820 flow only to the PE routers connected to receivers, in order to 821 optimize the performance of the PE routers and conserve the resources of the 822 provider network. However, other conditions can also be devised, 823 and they are purely implementation specific. 825 - The MDT Join TLV is encapsulated in UDP, and the packet is 826 addressed to ALL-PIM-ROUTERS (224.0.0.13) in the context of the 827 VRF and encapsulated using the Default MDT group when sent to the 828 MD. This allows all PE routers to receive the information. 830 - Upon receiving an MDT Join TLV, PE routers connected to receivers 831 will join, in the global table, the Data MDT group announced by the MDT Join TLV. 832 When the Data MDT group is in PIM-SM or 833 bidirectional PIM mode, the PE routers build a shared tree toward 834 the RP. When the Data MDT group is set up using PIM-SSM, the PE 835 routers build a source tree toward the PE router that is 836 advertising the MDT Join TLV. The source IP address of this tree 837 is the same as the source IP address used in the IP 838 packet advertising the MDT Join TLV. 840 PE routers which are not connected to receivers may wish to cache 841 the state in order to reduce the delay when a receiver comes up 842 in the future. 844 - After [MDT_DATA_DELAY], the PE router connected to the source 845 starts encapsulating traffic using the Data MDT group.
847 - When the pre-configured conditions are no longer met, e.g., the 848 traffic stops, the PE router connected to the source stops 849 announcing the MDT Join TLV. 851 - If the MDT Join TLV is not received within [MDT_DATA_TIMEOUT], PE 852 routers connected to the receivers leave the Data MDT group 853 in the global instance. 855 7.3. Use of SSM for Data MDTs 857 The use of Data MDTs requires that a set of multicast P-addresses be 858 pre-allocated and dedicated for use as the destination addresses for 859 the Data MDTs. 861 If SSM is used to set up the Data MDTs, then each MD needs to be 862 assigned a set of these multicast P-addresses. Each VRF in the MD 863 needs to be configured with this set (i.e., all VRFs in the MD are 864 configured with the same set). If there are n addresses in this set, 865 then each PE in the MD can be the source of n Data MDTs in that MD. 867 If SSM is not used for setting up Data MDTs, then each VRF needs to 868 be configured with a unique set of multicast P-addresses; two VRFs in 869 the same MD cannot be configured with the same set of addresses. 870 This requires the pre-allocation of many more multicast P-addresses, 871 and the need to configure a different set for each VRF greatly 872 complicates operations and management. Therefore, the use of SSM 873 for Data MDTs is very strongly recommended. 875 8. Packet Formats and Constants 877 8.1. MDT TLV 879 The "MDT TLV" has the following format; it is carried over UDP port 3232. 881 0 1 2 3 882 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 883 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 884 | Type | Length | Value | 885 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 886 | . | 887 | . | 888 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 890 Type (8 bits): 892 the type of the MDT TLV. Currently, only type 1, the MDT Join TLV, is 893 defined.
895 Length (16 bits): 897 the total number of octets in the TLV for this type, including 898 both the Type and Length fields. 900 Value (variable length): 902 the content of the TLV. 904 8.2. MDT Join TLV 906 The "MDT Join TLV" has the following format. 908 0 1 2 3 909 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 910 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 911 | Type | Length | Reserved | 912 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 913 | C-source | 914 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 915 | C-group | 916 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 917 | P-group | 918 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 920 Type (8 bits): 922 as defined above. For the MDT Join TLV, the value of this field is 1. 924 Length (16 bits): 926 as defined above. For the MDT Join TLV, the value of this field is 927 16, including 1 byte of padding. 929 Reserved (8 bits): 931 reserved for future use. 933 C-Source (32 bits): 935 the IPv4 address of the traffic source in the VPN. 937 C-Group (32 bits): 939 the IPv4 multicast destination address of the traffic in 940 the VPN. 942 P-Group (32 bits): 944 the IPv4 group address that the PE router is going to use to 945 encapsulate the flow (C-Source, C-Group). 947 8.3. Constants 949 [MDT_DATA_DELAY]: 951 the interval before the PE router connected to the source 952 switches to the Data MDT group. The default value is 3 seconds. 954 [MDT_DATA_TIMEOUT]: 956 the interval after which a PE router connected to the 957 receivers times out a previously received MDT Join TLV and leaves the Data 958 MDT group. The default value is 3 minutes. This value must be 959 consistent among PE routers. 961 [MDT_DATA_HOLDOWN]: 963 the interval for which the PE router will not switch back to the 964 Default MDT tree after it has started encapsulating packets using the 965 Data MDT group.
This is used to avoid oscillation when traffic 966 is bursty. The default value is 1 minute. 968 [MDT_INTERVAL]: 969 the interval at which the source PE router periodically sends 970 MDT Join TLV messages. The default value is 60 seconds. 972 9. IANA Considerations 974 The codepoint for the Connector attribute is defined in IANA's 975 registry of BGP attributes. The reference should be changed to refer 976 to this document. 978 The codepoint for MDT-SAFI is defined in IANA's registry of BGP SAFI 979 assignments. The reference should be changed to refer to this 980 document. 982 10. Security Considerations 984 [RFC4364] discusses the general security considerations that 985 pertain when VPNs of the type described in [RFC4364] are deployed. 987 [PIMv2] discusses the security considerations that pertain to the use 988 of PIM. 990 The security considerations of [RFC4023] and [RFC4797] apply whenever 991 VPN traffic is carried through IP or GRE tunnels. 993 Each PE router MUST install packet filters that discard 994 all UDP packets with destination port 3232 that the PE 995 router receives from the CE routers connected to it. 997 11. Acknowledgments 999 Major contributions to this work have been made by Dan Tappan and 1000 Tony Speakman. 1002 The authors also wish to thank Arjen Boers, Robert Raszuk, Toerless 1003 Eckert, and Ted Qian for their help and their ideas. 1005 12. Normative References 1007 [GRE2784] "Generic Routing Encapsulation (GRE)", Farinacci, Li, 1008 Hanks, Meyer, Traina, March 2000, RFC 2784 1010 [PIMv2] "Protocol Independent Multicast - Sparse Mode (PIM-SM)", 1011 Fenner, Handley, Holbrook, Kouvelas, August 2006, RFC 4601 1013 [PIM-ATTRIB] "The PIM Join Attribute Format", A. Boers, IJ. Wijnands, 1014 E. Rosen, November 2008, RFC 5384 1016 [RFC2119] "Key words for use in RFCs to Indicate Requirement 1017 Levels", Bradner, March 1997, RFC 2119 1019 [RFC4364] "BGP/MPLS IP VPNs", Rosen, Rekhter, February 2006, RFC 1020 4364 1022 13.
Informative References 1024 [ADMIN-ADDR] "Administratively Scoped IP Multicast", Meyer, July 1025 1998, RFC 2365 1027 [BIDIR] "Bidirectional Protocol Independent Multicast", Handley, 1028 Kouvelas, Speakman, Vicisano, October 2007, RFC 5015 1030 [DIFF2983] "Differentiated Services and Tunnels", Black, October 1031 2000, RFC 2983 1033 [GRE1701] "Generic Routing Encapsulation (GRE)", Farinacci, Li, 1034 Hanks, Traina, October 1994, RFC 1701 1036 [GRE2890] "Key and Sequence Number Extensions to GRE", Dommety, 1037 September 2000, RFC 2890 1039 [IPIP1853] "IP in IP Tunneling", Simpson, October 1995, RFC 1853 1041 [MVPN-ARCH] "Multicast in MPLS/BGP IP VPNs", draft-ietf- 1042 l3vpn-2547bis-mcast-07.txt, June 2008 1044 [MVPN-PROFILES] "MVPN Profiles Using PIM Control Plane", Rosen, 1045 Boers, Cai, Wijnands, June 2008, draft-rosen-l3vpn-mvpn- 1046 profiles-01.txt 1048 [SSM] "Source-Specific Multicast for IP", Holbrook, Cain, August 1049 2006, RFC 4607 1051 [RFC4023] "Encapsulating MPLS in IP or Generic Routing Encapsulation 1052 (GRE)", T. Worster, Y. Rekhter, E. Rosen, Ed., March 2005, RFC 4023 1054 [RFC4797] "Use of Provider Edge to Provider Edge (PE-PE) Generic 1055 Routing Encapsulation (GRE) or IP in BGP/MPLS IP Virtual Private 1056 Networks", Y. Rekhter, R. Bonica, E. Rosen, January 2007, RFC 4797 1058 14. Authors' Addresses 1060 Yiqun Cai (Editor) 1061 Cisco Systems, Inc. 1062 170 Tasman Drive 1063 San Jose, CA, 95134 1064 E-mail: ycai@cisco.com 1066 Eric C. Rosen (Editor) 1067 Cisco Systems, Inc. 1068 1414 Massachusetts Avenue 1069 Boxborough, MA, 01719 1070 E-mail: erosen@cisco.com 1072 IJsbrand Wijnands 1073 Cisco Systems, Inc. 1074 170 Tasman Drive 1075 San Jose, CA, 95134 1076 E-mail: ice@cisco.com