Network Working Group                                Eric C. Rosen (Editor)
Internet Draft                                           Yiqun Cai (Editor)
Expiration Date: November 2004                            IJsbrand Wijnands
                                                        Cisco Systems, Inc.

                                  May 2004

                       Multicast in MPLS/BGP IP VPNs

                        draft-rosen-vpn-mcast-07.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   In order for IP multicast traffic within a BGP/MPLS IP VPN (Virtual
   Private Network) to travel from one VPN site to another, special
   protocols and procedures must be implemented by the VPN Service
   Provider.  These protocols and procedures are specified in this
   document.

Table of Contents

   1      Specification of requirements
   2      Introduction
   2.1    Scaling Multicast State Info. in the Network Core
   2.2    Overview
   3      Multicast VRFs
   4      Multicast Domains
   4.1    Model of Operation
   5      Multicast Tunnels
   5.1    Ingress PEs
   5.2    Egress PEs
   5.3    Tunnel Destination Address(es)
   5.4    Auto-Discovery
   5.5    Which PIM Variant to Use
   5.6    Inter-AS MDT Construction
   5.7    Encapsulation
   5.7.1  Encapsulation in GRE
   5.7.2  Encapsulation in IP
   5.7.3  Encapsulation in MPLS
   5.7.4  Interoperability
   5.8    MTU
   5.9    TTL
   5.10   Differentiated Services
   5.11   Avoiding Conflict with Internet Multicast
   6      The PIM C-Instance and the MT
   6.1    PIM C-Instance Control Packets
   6.2    PIM C-instance RPF Determination
   7      Data MDT: Optimizing flooding
   7.1    Limitation of Multicast Domain
   7.2    Signaling Data MDT Trees
   7.3    Use of SSM for Data MDTs
   8      Packet Formats and Constants
   8.1    MDT TLV
   8.2    MDT Join TLV
   8.3    Constants
   9      Acknowledgments
   10     Normative References
   11     Informative References
   12     Authors' Addresses
   13     Intellectual Property Statement
   14     Full Copyright Statement

1. Specification of requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Introduction

   The base specification for BGP/MPLS IP VPNs [RFC2547bis] does not
   provide a way for IP multicast data or control traffic to travel
   from one VPN site to another.  This document extends that
   specification by specifying the necessary protocols and procedures
   for support of IP multicast.  Only IPv4 multicast is considered in
   this specification.

   This specification presupposes that:

   1. PIM [PIMv2] is the multicast routing protocol used within the
      VPN,

   2. PIM is also the multicast routing protocol used within the SP
      network, and

   3. the SP network supports native IP multicast forwarding.

   Familiarity with the terminology and procedures of [RFC2547bis] is
   presupposed.  Familiarity with [PIMv2] is also presupposed.

2.1. Scaling Multicast State Info. in the Network Core

   The BGP/MPLS IP VPN service of [RFC2547bis] provides a VPN with
   "optimal" unicast routing through the SP backbone, in that a packet
   follows the "shortest path" across the backbone, as determined by
   the backbone's own routing algorithm.  This optimal routing is
   provided without requiring the P routers to maintain any routing
   information which is specific to a VPN; indeed, the P routers do not
   maintain any per-VPN state at all.
   Unfortunately, optimal MULTICAST routing cannot be provided without
   requiring the P routers to maintain some VPN-specific state
   information.  Optimal multicast routing would require that one or
   more multicast distribution trees be created in the backbone for
   each multicast group that is in use.  If a particular multicast
   group from within a VPN is using source-based distribution trees,
   optimal routing requires that there be one distribution tree for
   each transmitter of that group.  If shared trees are being used, one
   tree for each group is still required.  Each such tree requires
   state in some set of the P routers, with the amount of state being
   proportional to the number of multicast transmitters.  The reason
   there needs to be at least one distribution tree per multicast group
   is that each group may have a different set of receivers; multicast
   routing algorithms generally go to great lengths to ensure that a
   multicast packet will not be sent to a node which is not on the path
   to a receiver.

   Given that an SP generally supports many VPNs, where each VPN may
   have many multicast groups, and each multicast group may have many
   transmitters, it is not scalable to have one or more distribution
   trees for each multicast group.  The SP has no control whatsoever
   over the number of multicast groups and transmitters that exist in
   the VPNs, and it is difficult to place any bound on these numbers.

   In order to have a scalable multicast solution for MPLS/BGP IP VPNs,
   the amount of state maintained by the P routers needs to be
   proportional to something which IS under the control of the SP.
   This specification describes such a solution.  In this solution, the
   amount of state maintained in the P routers is proportional only to
   the number of VPNs which run over the backbone; the amount of state
   in the P routers is NOT sensitive to the number of multicast groups
   or to the number of multicast transmitters within the VPNs.  To
   achieve this scalability, the optimality of the multicast routes is
   reduced.  A PE which is not on the path to any receiver of a
   particular multicast group may still receive multicast packets for
   that group, and if so, will have to discard them.  The SP does,
   however, have control over the tradeoff between optimal routing and
   scalability.

2.2. Overview

   An SP determines whether a particular VPN is multicast-enabled.  If
   it is, it corresponds to a "Multicast Domain".  A PE which attaches
   to a particular multicast-enabled VPN is said to belong to the
   corresponding Multicast Domain.  For each Multicast Domain, there is
   a default "Multicast Distribution Tree (MDT)" through the backbone,
   connecting ALL of the PEs that belong to that Multicast Domain.  A
   given PE may be in as many Multicast Domains as there are VPNs
   attached to that PE.  However, each Multicast Domain has its own
   MDT.  The MDTs are created by running PIM in the backbone, and in
   general an MDT also includes P routers on the paths between the PE
   routers.

   In a departure from the usual multicast tree distribution
   procedures, the Default MDT for a Multicast Domain is constructed
   automatically as the PEs in the domain come up.  Construction of the
   Default MDT does not depend on the existence of multicast traffic in
   the domain; it will exist before any such multicast traffic is seen.
   In BGP/MPLS IP VPNs, each CE router is a unicast routing adjacency
   of a PE router, but CE routers at different sites do NOT become
   unicast routing adjacencies of each other.  This important
   characteristic is retained for multicast routing -- a CE router
   becomes a PIM adjacency of a PE router, but CE routers at different
   sites do NOT become PIM adjacencies of each other.  Multicast
   packets from within a VPN are received from a CE router by an
   ingress PE router.  The ingress PE encapsulates the multicast
   packets and (initially) forwards them along the Default MDT to all
   the PE routers connected to sites of the given VPN.  Every PE router
   attached to a site of the given VPN thus receives all multicast
   packets from within that VPN.  If a particular PE router is not on
   the path to any receiver of that multicast group, the PE simply
   discards that packet.

   If a large amount of traffic is being sent to a particular multicast
   group, but that group does not have receivers at all the VPN sites,
   it can be wasteful to forward that group's traffic along the Default
   MDT.  Therefore, we also specify a method for establishing
   individual MDTs for specific multicast groups.  We call these "Data
   MDTs".  A Data MDT delivers VPN data traffic for a particular
   multicast group only to those PE routers which are on the path to
   receivers of that multicast group.  Using a Data MDT has the benefit
   of reducing the amount of multicast traffic on the backbone, as well
   as reducing the load on some of the PEs; it has the disadvantage of
   increasing the amount of state that must be maintained by the P
   routers.  The SP has complete control over this tradeoff.

   This solution requires the SP to deploy appropriate protocols and
   procedures, but is transparent to the SP's customers.  An enterprise
   which uses PIM-based multicasting in its network can migrate from a
   private network to a BGP/MPLS IP VPN service, while continuing to
   use whatever multicast router configurations it was previously
   using; no changes need be made to CE routers or to other routers at
   customer sites.  For instance, any dynamic RP-discovery procedures
   that are already in use may be left in place.

3. Multicast VRFs

   The notion of a "VRF", defined in [RFC2547bis], is extended to
   include multicast routing entries as well as unicast routing
   entries.

   Each VRF has its own multicast routing table.  When a multicast data
   or control packet is received from a particular CE device, multicast
   routing is done in the associated VRF.

   Each PE router runs a number of instances of PIM-SM, as many as one
   per VRF.  In each instance of PIM-SM, the PE maintains a PIM
   adjacency with each of the PIM-capable CE routers associated with
   that VRF.  The multicast routing table created by each instance is
   specific to the corresponding VRF.  We will refer to these PIM
   instances as "VPN-specific PIM instances", or "PIM C-instances".

   Each PE router also runs a "provider-wide" instance of PIM-SM (a
   "PIM P-instance"), in which it has a PIM adjacency with each of its
   IGP neighbors (i.e., with P routers), but NOT with any CE routers,
   and not with other PE routers (unless they happen to be adjacent in
   the SP's network).  The P routers also run the P-instance of PIM,
   but do NOT run a C-instance.
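   The following non-normative Python sketch illustrates the separation
   just described: one provider-wide P-instance per PE, plus one
   C-instance per multicast VRF, each with its own adjacencies and its
   own multicast routing table.  All class and field names are
   hypothetical and are used only for illustration.

      # Illustrative only; not part of the protocol specification.
      from dataclasses import dataclass, field

      @dataclass
      class PimInstance:
          name: str
          adjacencies: set = field(default_factory=set)   # PIM neighbors of this instance
          mroutes: dict = field(default_factory=dict)     # (S,G) or (*,G) -> outgoing interfaces

      @dataclass
      class MulticastVrf:
          name: str
          interfaces: set = field(default_factory=set)    # CE-facing interfaces bound to this VRF
          c_instance: PimInstance = None                  # the VPN-specific PIM instance

      @dataclass
      class PERouter:
          p_instance: PimInstance                         # adjacencies with P routers only
          vrfs: dict = field(default_factory=dict)        # VRF name -> MulticastVrf

          def instance_for_ce_packet(self, vrf_name: str) -> PimInstance:
              # A C-multicast packet received from a CE is processed in the
              # PIM C-instance of the VRF bound to the incoming interface.
              return self.vrfs[vrf_name].c_instance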
   In order to help clarify when we are speaking of the PIM P-instance
   and when we are speaking of a PIM C-instance, we will also apply the
   prefixes "P-" and "C-" respectively to control messages, addresses,
   etc.  Thus a P-Join would be a PIM Join which is processed by the
   PIM P-instance, and a C-Join would be a PIM Join which is processed
   by a C-instance.  A P-group address would be a group address in the
   SP's address space, and a C-group address would be a group address
   in a VPN's address space.

4. Multicast Domains

4.1. Model of Operation

   A "Multicast Domain (MD)" is essentially a set of VRFs associated
   with interfaces that can send multicast traffic to each other.  From
   the standpoint of a PIM C-instance, a multicast domain is equivalent
   to a multi-access interface.  The PE routers in a given MD become
   PIM adjacencies of each other in the PIM C-instance.

   Each multicast VRF is assigned to one MD.  Each MD is configured
   with a distinct multicast P-group address, called the "Default MDT
   group address".  This address is used to build the Default MDT for
   the MD.

   When a PE router needs to send PIM C-instance control traffic to the
   other PE routers in the MD, it encapsulates the control traffic,
   with its own address as source IP address and the Default MDT group
   address as destination IP address.  Note that the Default MDT is
   part of the P-instance of PIM, whereas the PEs that communicate over
   the Default MDT are PIM adjacencies in a C-instance.  Within the
   C-instance, the Default MDT appears to be a multi-access network to
   which all the PEs are attached.  This is discussed in more detail in
   section 5.

   The Default MDT does not only carry the PIM control traffic of the
   MD's PIM C-instance.  It also, by default, carries the multicast
   data traffic of the C-instance.  In some cases, though, multicast
   data traffic in a particular MD will be sent on a Data MDT rather
   than on the Default MDT.  The use of Data MDTs is described in
   section 7.

   Note that, if an MDT (Default or Data) is set up using PIM-SM or
   Bidirectional PIM, it must have a P-group address which is "globally
   unique" (more precisely, unique over the set of SP networks carrying
   the multicast traffic of the corresponding MD).  If PIM-SSM is used,
   the P-group address of an MDT only needs to be unique relative to
   the source of the MDT (though see section 5.4).

5. Multicast Tunnels

   An MD can be thought of as a set of PE routers connected by a
   "multicast tunnel (MT)".  From the perspective of a VPN-specific PIM
   instance, an MT is a single multi-access interface.  In the SP
   network, a single MT is realized as a Default MDT combined with zero
   or more Data MDTs.

5.1. Ingress PEs

   An ingress PE is a PE router that is either directly connected to
   the multicast sender in the VPN, or connected to it via a CE router.
   When the multicast sender starts transmitting, and if there are
   receivers (or a PIM RP) behind other PE routers in the common MD,
   the ingress PE becomes the transmitter of either the Default MDT
   group or a Data MDT group in the SP network.

5.2. Egress PEs

   A PE router with a VRF configured in an MD becomes a receiver of the
   Default MDT group for that MD.
   A PE router may also join a Data MDT group if it has a VPN-specific
   PIM instance in which it is forwarding traffic for a particular
   C-group to one of its attached sites, and that particular C-group
   has been associated with that particular Data MDT.  When a PE router
   joins any P-group used for encapsulating VPN multicast traffic, the
   PE router becomes one of the endpoints of the corresponding MT.

   When a packet is received from an MT, the receiving PE derives the
   MD from the destination address of the received packet, which is a
   P-group address.  The packet is then passed to the corresponding
   Multicast VRF and VPN-specific PIM instance for further processing.

5.3. Tunnel Destination Address(es)

   An MT is an IP tunnel for which the destination address is a P-group
   address.  However, an MT is not limited to using only one P-group
   address for encapsulation.  Based on the payload VPN multicast
   traffic, the ingress PE can choose to use the Default MDT group
   address, or one of the Data MDT group addresses (as described in
   section 7 of this document), allowing the MT to reach a different
   set of PE routers in the common MD.

5.4. Auto-Discovery

   Any of the variants of PIM may be used to set up the Default MDT:
   PIM-SM, Bidirectional PIM, or PIM-SSM.  Except in the case of
   PIM-SSM, the PEs need only know the proper P-group address in order
   to begin setting up the Default MDTs.  The PEs will then discover
   each other's addresses by virtue of receiving PIM control traffic,
   e.g., PIM Hellos, sourced (and encapsulated) by each other.

   However, in the case of PIM-SSM, the necessary MDTs for an MD cannot
   be set up until each PE in the MD knows the source address of each
   of the other PEs in that same MD.  This information needs to be
   auto-discovered.

   In [MDT-SAFI], a new BGP Address Family is defined.  The NLRI for
   this address family consists of an RD, an IPv4 unicast address, and
   a multicast group address.  A given PE router in a given MD
   constructs an NLRI in this family from:

   - Its own IPv4 address.  If it has several, it uses the one which it
     will be placing in the IP source address field of multicast
     packets that it will be sending over the MDT.

   - An RD which has been assigned to the MD.

   - The P-group address which is to be used as the IP destination
     address field of multicast packets that will be sent over the MDT.

   When a PE distributes this NLRI via BGP, it may include a Route
   Target Extended Communities attribute.  This RT must be an "Import
   RT" [RFC2547bis] of each VRF in the MD.  The ordinary BGP
   distribution procedures used by [RFC2547bis] will then ensure that
   each PE learns the MDT-SAFI "address" of each of the other PEs in
   the MD, and that the learned MDT-SAFI addresses get associated with
   the right VRFs.

   If a PE receives an MDT-SAFI NLRI which does not have an RT
   attribute, the P-group address from the NLRI has to be used to
   associate the NLRI with a particular VRF.  In this case, each
   multicast domain must be associated with a unique P-address, even if
   PIM-SSM is used.  However, finding a unique P-address for a
   multi-provider multicast group may be difficult.

   In order to facilitate the deployment of multi-provider multicast
   domains, this specification REQUIRES the use of the MDT-SAFI NLRI
   (even if PIM-SSM is not used to set up the default MDT).  This
   specification also REQUIRES that an implementation be capable of
   using PIM-SSM to set up the default MDT.
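   As a non-normative illustration of the NLRI construction just
   described, the following Python sketch packs the three components
   (the RD, the PE's own IPv4 address, and the P-group address) into a
   16-octet string.  The authoritative wire encoding is defined in
   [MDT-SAFI]; the layout assumed here (an 8-octet RD followed by two
   4-octet IPv4 fields), and all names and addresses shown, are
   illustrative only.

      # Illustrative only; see [MDT-SAFI] for the normative encoding.
      import socket
      import struct

      def build_mdt_safi_nlri(rd: bytes, pe_ipv4: str, p_group: str) -> bytes:
          """RD (8 octets) + originating PE address + Default MDT group."""
          if len(rd) != 8:
              raise ValueError("an RD is 8 octets")
          return rd + socket.inet_aton(pe_ipv4) + socket.inet_aton(p_group)

      # Example: a type-0 RD (AS 65000, assigned number 100), the address
      # the PE will use as the source of MDT packets, and the MD's
      # Default MDT group address.
      rd = struct.pack("!HHI", 0, 65000, 100)
      nlri = build_mdt_safi_nlri(rd, "192.0.2.1", "239.1.1.1")
      assert len(nlri) == 16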
5.5. Which PIM Variant to Use

   To minimize the amount of multicast routing state maintained by the
   P routers, the Default MDTs should be realized as shared trees, such
   as PIM Bidirectional trees.  However, the operational procedures for
   assigning P-group addresses may be greatly simplified, especially in
   the case of multi-provider MDs, if PIM-SSM is used.

   Data MDTs are best realized as source trees, constructed via
   PIM-SSM.

5.6. Inter-AS MDT Construction

   Standard PIM techniques for the construction of source trees
   presuppose that every router has a route to the source of the tree.
   However, if the source of the tree is in a different AS than a
   particular P router, it is possible that the P router will not have
   a route to the source.  For example, the remote AS may be using BGP
   to distribute a route to the source, but a particular P router may
   be part of a "BGP-free core", in which the P routers are not aware
   of BGP-distributed routes.

   What is needed in this case is a way for a PE to tell PIM to
   construct the tree through a particular BGP speaker, the "BGP next
   hop" for the tree source.  This can be accomplished with a PIM
   extension.

   If the PE has selected the source of the tree from the MDT SAFI
   address family, then it may be desirable to build the tree along the
   route to the MDT SAFI address, rather than along the route to the
   corresponding IPv4 address.  This enables the inter-AS portion of
   the tree to follow a path which is specifically chosen for multicast
   (i.e., it allows the inter-AS multicast topology to be
   "non-congruent" to the inter-AS unicast topology).  This too
   requires a PIM extension.

   The necessary PIM extension is described in [PIM-RPF-PROXY].

5.7. Encapsulation

5.7.1. Encapsulation in GRE

   GRE encapsulation is recommended when sending multicast traffic
   through an MDT.  The following diagram shows the progression of the
   packet as it enters and leaves the service provider network.

   Packets received        Packets in transit      Packets forwarded
   at ingress PE           in the service          by egress PEs
                           provider network

                           +---------------+
                           |  P-IP Header  |
                           +---------------+
                           |      GRE      |
   ++=============++       ++=============++       ++=============++
   || C-IP Header ||       || C-IP Header ||       || C-IP Header ||
   ++=============++ >>>>> ++=============++ >>>>> ++=============++
   ||  C-Payload  ||       ||  C-Payload  ||       ||  C-Payload  ||
   ++=============++       ++=============++       ++=============++

   The IPv4 Protocol Number field in the P-IP Header must be set to 47.
   The Protocol Type field of the GRE Header must be set to 0x0800.

   [GRE2784] specifies an optional GRE checksum, and [GRE2890]
   specifies optional GRE key and sequence number fields.

   The GRE key field is not needed because the P-group address in the
   delivery IP header already identifies the MD, and thus the VRF
   context in which the payload packet is to be further processed.

   The GRE sequence number field is also not needed because the
   transport layer services for the original application will be
   provided by the C-IP Header.

   The use of the GRE checksum field must follow [GRE2784].

   To facilitate high-speed implementations, this document recommends
   that the ingress PE routers encapsulate VPN packets without setting
   the checksum, key, or sequence number fields.
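   The following non-normative sketch shows, in Python, the composition
   described above: an outer P-IP header carrying protocol number 47, a
   basic 4-octet GRE header with Protocol Type 0x0800 and none of the
   optional checksum/key/sequence fields, and the unmodified C-packet.
   The addresses and the outer TTL value are illustrative; per sections
   5.8 and 5.9, the DF bit is not set and the outer TTL is a matter of
   local policy.

      # Illustrative only; not a normative encapsulation procedure.
      import socket
      import struct

      def ipv4_checksum(header: bytes) -> int:
          total = 0
          for i in range(0, len(header), 2):
              total += (header[i] << 8) | header[i + 1]
          while total > 0xFFFF:
              total = (total & 0xFFFF) + (total >> 16)
          return ~total & 0xFFFF

      def gre_encapsulate(c_packet: bytes, pe_src: str, p_group: str,
                          ttl: int = 64) -> bytes:
          gre = struct.pack("!HH", 0x0000, 0x0800)   # no C/K/S bits; Protocol Type = IPv4
          total_len = 20 + len(gre) + len(c_packet)
          p_ip = struct.pack(
              "!BBHHHBBH4s4s",
              0x45, 0x00, total_len,                 # version/IHL, DS field, total length
              0x0000, 0x0000,                        # identification; flags = 0, so DF not set
              ttl, 47,                               # outer TTL (local policy); protocol 47 = GRE
              0,                                     # checksum, filled in below
              socket.inet_aton(pe_src),              # PE's MDT source address
              socket.inet_aton(p_group))             # Default or Data MDT group address
          p_ip = p_ip[:10] + struct.pack("!H", ipv4_checksum(p_ip)) + p_ip[12:]
          return p_ip + gre + c_packet

      # Usage: packet = gre_encapsulate(c_packet, "192.0.2.1", "239.1.1.1")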
5.7.2. Encapsulation in IP

   IP-in-IP [IPIP1853] is also a viable option.  When it is used, the
   IPv4 Protocol Number field is set to 4.  The following diagram shows
   the progression of the packet as it enters and leaves the service
   provider network.

   Packets received        Packets in transit      Packets forwarded
   at ingress PE           in the service          by egress PEs
                           provider network

                           +---------------+
                           |  P-IP Header  |
   ++=============++       ++=============++       ++=============++
   || C-IP Header ||       || C-IP Header ||       || C-IP Header ||
   ++=============++ >>>>> ++=============++ >>>>> ++=============++
   ||  C-Payload  ||       ||  C-Payload  ||       ||  C-Payload  ||
   ++=============++       ++=============++       ++=============++

5.7.3. Encapsulation in MPLS

   An SP may choose MPLS encapsulation if a method described in
   [PIM-MPLS] is deployed.  The specification of the encapsulation, as
   well as the forwarding behavior of the PE routers, is out of scope
   for this document.

5.7.4. Interoperability

   PE routers in a common MD must agree on the method of encapsulation.
   This can be achieved either via configuration or by means of some
   discovery protocol.  To help reduce configuration overhead and
   improve multi-vendor interoperability, it is strongly recommended
   that GRE encapsulation be supported and enabled by default.

5.8. MTU

   Because multicast group addresses are used as tunnel destination
   addresses, existing Path MTU discovery mechanisms cannot be used.
   This requires that:

   1. The ingress PE router (the one that does the encapsulation) must
      not set the DF bit in the outer header, and

   2. If the "DF" bit is cleared in the IP header of the C-packet, the
      ingress PE fragments the C-packet before encapsulation if
      appropriate.  This is very important in practice, because on
      today's router implementations the performance of the reassembly
      function is significantly lower than that of decapsulating and
      forwarding packets.

5.9. TTL

   The ingress PE should not copy the TTL field from the payload IP
   header received from a CE router to the delivery IP header.  The
   setting of the TTL of the delivery IP header is determined by the
   local policy of the ingress PE router.

5.10. Differentiated Services

   By default, the setting of the DS field in the delivery IP header
   should follow the guidelines outlined in [DIFF2983].  An SP may also
   choose to deploy any of the additional mechanisms the PE routers
   support.

5.11. Avoiding Conflict with Internet Multicast

   If the SP is providing Internet multicast, distinct from its VPN
   multicast services, it must ensure that the P-group addresses which
   correspond to its MDs are distinct from any of the group addresses
   of the Internet multicasts it supports.  This is best done by using
   administratively scoped addresses [ADMIN-ADDR].

   The C-group addresses need not be distinct from either the P-group
   addresses or the Internet multicast addresses.
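   As a small, non-normative illustration of the address-separation
   rule above, the following Python check verifies that a configured
   Default or Data MDT P-group address lies within the IPv4
   administratively scoped multicast range 239.0.0.0/8 defined in
   [ADMIN-ADDR], and therefore cannot collide with group addresses used
   for the SP's Internet multicast service.  The example addresses are
   illustrative only.

      # Illustrative only; a deployment may apply narrower scoping policies.
      import ipaddress

      ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")

      def p_group_is_admin_scoped(p_group: str) -> bool:
          addr = ipaddress.ip_address(p_group)
          return addr.is_multicast and addr in ADMIN_SCOPED

      assert p_group_is_admin_scoped("239.1.1.1")          # usable as an MDT group
      assert not p_group_is_admin_scoped("233.252.0.1")    # outside the admin-scoped range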
6. The PIM C-Instance and the MT

   If a particular VRF is in a particular MD, the corresponding MT is
   treated by that VRF's VPN-specific PIM instance as a LAN interface.
   The PEs which are adjacent on the MT must execute the PIM LAN
   procedures, including the generation and processing of PIM Hello,
   Join/Prune, Assert, DF election and other PIM control packets.

6.1. PIM C-Instance Control Packets

   The C-instance PIM control packets are sent to ALL-PIM-ROUTERS
   (224.0.0.13) in the context of the VRF, but when in transit in the
   provider network, they are encapsulated using the Default MDT group
   configured for that MD.  This allows VPN-specific PIM routes to be
   extended from site to site without appearing in the P routers.

6.2. PIM C-instance RPF Determination

   Although the MT is treated as a PIM-enabled interface, unicast
   routing is NOT run over it, and there are no unicast routing
   adjacencies over it.  It is therefore necessary to specify special
   procedures for determining when the MT is to be regarded as the "RPF
   Interface" for a particular C-address.

   When a PE needs to determine the RPF interface of a particular
   C-address, it looks up the C-address in the VRF.  If the route
   matching it is not a VPN-IP route learned from MP-BGP as described
   in [RFC2547bis], or if that route's outgoing interface is one of the
   interfaces associated with the VRF, then ordinary PIM procedures for
   determining the RPF interface apply.

   However, if the route matching the C-address is a VPN-IP route whose
   outgoing interface is not one of the interfaces associated with the
   VRF, then PIM will consider the outgoing interface to be the MT
   associated with the VPN-specific PIM instance.

   Once PIM has determined that the RPF interface for a particular
   C-address is the MT, it is necessary for PIM to determine the RPF
   neighbor for that C-address.  This will be one of the other PEs that
   is a PIM adjacency over the MT.

   In [MDT-SAFI], the BGP "Connector" attribute is defined.  Whenever a
   PE router distributes a VPN-IPv4 address from a VRF that is part of
   an MD, it SHOULD distribute a Connector attribute along with it.
   The Connector attribute should specify the MDT address family, and
   its value should be the IP address which the PE router is using as
   its source IP address for multicast packets which are encapsulated
   and sent over the MT.  Then, when a PE has determined that the RPF
   interface for a particular C-address is the MT, it must look up the
   Connector attribute that was distributed along with the VPN-IPv4
   address corresponding to that C-address.  The value of this
   Connector attribute will be considered to be the RPF adjacency for
   the C-address.

   If a Connector attribute is not present, but the "BGP Next Hop" for
   the C-address is one of the PEs that is a PIM adjacency, then that
   PE should be treated as the RPF adjacency for that C-address.
   However, if the MD spans multiple Autonomous Systems, the BGP Next
   Hop might not be a PIM adjacency, and the RPF check will not succeed
   unless the Connector attribute is used.
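   The RPF decision described in this section can be summarized by the
   following non-normative Python sketch.  The data structures and
   names are hypothetical; the sketch only restates the rules above:
   ordinary PIM RPF procedures apply unless the best route is a VPN-IP
   route learned via MP-BGP whose outgoing interface is not a VRF
   interface, in which case the RPF interface is the MT and the RPF
   neighbor is taken from the Connector attribute, falling back to the
   BGP next hop when that next hop is a PIM adjacency over the MT.

      # Illustrative only; not a normative procedure.
      from dataclasses import dataclass
      from typing import Optional, Tuple

      @dataclass
      class VrfRoute:
          outgoing_interface: str
          learned_from_mp_bgp: bool
          connector: Optional[str] = None      # Connector attribute value, if any
          bgp_next_hop: Optional[str] = None

      def rpf_for_c_address(route: VrfRoute, vrf_interfaces: set,
                            mt_pim_adjacencies: set) -> Tuple[str, Optional[str]]:
          """Return (RPF interface, RPF neighbor) for a C-address."""
          if (not route.learned_from_mp_bgp
                  or route.outgoing_interface in vrf_interfaces):
              # Ordinary PIM RPF procedures apply.
              return route.outgoing_interface, None
          # Otherwise the RPF interface is the MT of the VPN-specific instance.
          if route.connector is not None:
              return "MT", route.connector
          if route.bgp_next_hop in mt_pim_adjacencies:
              return "MT", route.bgp_next_hop
          return "MT", None                     # RPF check cannot succeed (see above)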
7. Data MDT: Optimizing flooding

7.1. Limitation of Multicast Domain

   While the procedure specified in the previous sections requires the
   P routers to maintain multicast state, the amount of state is
   bounded by the number of supported VPNs.  The P routers do NOT run
   any VPN-specific PIM instances.

   In particular, the use of a single bidirectional tree per VPN scales
   well as the number of transmitters and receivers increases, but not
   so well as the amount of multicast traffic per VPN increases.

   The multicast routing provided by this scheme is not optimal, in
   that a packet of a particular multicast group may be forwarded to PE
   routers which have no downstream receivers for that group, and hence
   which may need to discard the packet.

   In the simplest configuration model, only the Default MDT group is
   configured for each MD.  The result of this configuration is that
   all VPN multicast traffic, control or data, will be encapsulated and
   forwarded to all PE routers that are part of the MD.  While this
   limits the amount of multicast routing state the provider network
   has to maintain, it also requires PE routers to discard multicast
   C-packets if there are no receivers for those packets in the
   corresponding sites.  In some cases, especially when the content
   involves high bandwidth but only a limited set of receivers, it is
   desirable that certain C-packets only travel to PE routers that do
   have receivers in the VPN, in order to save bandwidth in the network
   and reduce load on the PE routers.

7.2. Signaling Data MDT Trees

   A simple protocol is proposed to signal additional P-group addresses
   to encapsulate VPN traffic.  These P-group addresses are called Data
   MDT groups.  The ingress PE router advertises a different P-group
   address (as opposed to always using the Default MDT group) to
   encapsulate VPN multicast traffic.  Only the PE routers on the path
   to eventual receivers join the P-group, and therefore form an
   optimal multicast distribution tree in the service provider network
   for the VPN multicast traffic.  These multicast distribution trees
   are called Data MDT trees because they carry only data, not the PIM
   control packets exchanged by PE routers.

   The following documents the procedures for the initiation and
   teardown of Data MDT trees.  The definitions of the constants and
   timers can be found in section 8.

   - The PE router connected to the source of the content initially
     uses the Default MDT group when forwarding the content to the MD.

   - When one or more pre-configured conditions are met, it starts to
     periodically announce an MDT Join TLV at the interval
     [MDT_INTERVAL].  The MDT Join TLV is forwarded to all the PE
     routers in the MD.

     If a PE in a particular MD transmits a C-multicast data packet to
     the backbone by transmitting it onto the Default MDT, every other
     PE in that MD will receive it.  Any of those PEs which are not on
     a C-multicast distribution tree for the packet's C-multicast
     destination address (as determined by applying ordinary PIM
     procedures to the corresponding multicast VRF) will have to
     discard the packet.

     A commonly used condition is bandwidth.  When the VPN traffic
     exceeds a certain threshold, it is more desirable to deliver the
     flow only to the PE routers connected to receivers, in order to
     optimize the performance of the PE routers and the resource usage
     of the provider network.  However, other conditions can also be
     devised; they are purely implementation specific.

   - The MDT Join TLV is encapsulated in UDP.  The packet is addressed
     to ALL-PIM-ROUTERS (224.0.0.13) in the context of the VRF, and is
     encapsulated using the Default MDT group when sent to the MD.
     This allows all PE routers to receive the information.

   - Upon receiving an MDT Join TLV, PE routers connected to receivers
     will join the Data MDT group announced by the MDT Join TLV in the
     global table.
     When the Data MDT group is set up using PIM-SM or Bidirectional
     PIM, the PE routers build a shared tree toward the RP.  When the
     Data MDT group is set up using PIM-SSM, the PE routers build a
     source tree toward the PE router that is advertising the MDT Join
     TLV; the source address of that tree is the same as the source IP
     address used in the IP packet advertising the MDT Join TLV.

     PE routers which are not connected to receivers may wish to cache
     the state in order to reduce the delay when a receiver comes up in
     the future.

   - After [MDT_DATA_DELAY], the PE router connected to the source
     starts encapsulating traffic using the Data MDT group.

   - When the pre-configured conditions are no longer met, e.g., the
     traffic stops, the PE router connected to the source stops
     announcing the MDT Join TLV.

   - If the MDT Join TLV is not received for [MDT_DATA_TIMEOUT], PE
     routers connected to the receivers leave the Data MDT group in the
     global instance.

7.3. Use of SSM for Data MDTs

   The use of Data MDTs requires that a set of multicast P-addresses be
   pre-allocated and dedicated for use as the destination addresses for
   the Data MDTs.

   If SSM is used to set up the Data MDTs, then each MD needs to be
   assigned a set of these multicast P-addresses.  Each VRF in the MD
   needs to be configured with this set (i.e., all VRFs in the MD are
   configured with the same set).  If there are n addresses in this
   set, then each PE in the MD can be the source of n Data MDTs in that
   MD.

   If SSM is not used for setting up Data MDTs, then each VRF needs to
   be configured with a unique set of multicast P-addresses; two VRFs
   in the same MD cannot be configured with the same set of addresses.
   This requires the pre-allocation of many more multicast P-addresses,
   and the need to configure a different set for each VRF greatly
   complicates operations and management.  Therefore the use of SSM for
   Data MDTs is very strongly recommended.

8. Packet Formats and Constants

8.1. MDT TLV

   An "MDT TLV" has the following format.  It is carried in UDP, using
   port 3232.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |            Length             |     Value     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type (8 bits):

      the type of the MDT TLV.  Currently only one type, 1 (MDT Join
      TLV), is defined.

   Length (16 bits):

      the total number of octets in the TLV for this type, including
      both the Type and Length fields.

   Value (variable length):

      the content of the TLV.

8.2. MDT Join TLV

   An "MDT Join TLV" has the following format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |            Length             |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            C-source                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            C-group                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            P-group                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type (8 bits):

      as defined above.  For the MDT Join TLV, the value of the field
      is 1.

   Length (16 bits):

      as defined above.  For the MDT Join TLV, the value of the field
      is 16, including 1 byte of padding.

   Reserved (8 bits):

      for future use.

   C-Source (32 bits):

      the IPv4 address of the traffic source in the VPN.

   C-Group (32 bits):

      the IPv4 multicast destination address of the traffic in the VPN.

   P-Group (32 bits):

      the IPv4 group address that the PE router is going to use to
      encapsulate the flow (C-Source, C-Group).
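   As a non-normative aid to implementers, the following Python sketch
   packs and parses the 16-octet MDT Join TLV exactly as laid out
   above: Type (1 octet) = 1, Length (2 octets) = 16, Reserved
   (1 octet), and then the C-Source, C-Group, and P-Group IPv4
   addresses.  Per sections 7.2 and 8.1, the TLV travels in a UDP
   packet (port 3232) addressed to ALL-PIM-ROUTERS and sent over the
   Default MDT; the addresses in the example are illustrative only.

      # Illustrative only; the normative format is the one diagrammed above.
      import socket
      import struct

      MDT_JOIN_TLV_TYPE = 1
      MDT_JOIN_TLV_LENGTH = 16

      def encode_mdt_join_tlv(c_source: str, c_group: str, p_group: str) -> bytes:
          return struct.pack("!BHB4s4s4s",
                             MDT_JOIN_TLV_TYPE,
                             MDT_JOIN_TLV_LENGTH,
                             0,                         # Reserved
                             socket.inet_aton(c_source),
                             socket.inet_aton(c_group),
                             socket.inet_aton(p_group))

      def decode_mdt_join_tlv(data: bytes):
          t, length, _rsvd, c_src, c_grp, p_grp = struct.unpack("!BHB4s4s4s", data[:16])
          if t != MDT_JOIN_TLV_TYPE or length != MDT_JOIN_TLV_LENGTH:
              raise ValueError("not an MDT Join TLV")
          return (socket.inet_ntoa(c_src), socket.inet_ntoa(c_grp),
                  socket.inet_ntoa(p_grp))

      # Example: the ingress PE announces that flow (C-Source, C-Group) will
      # be encapsulated in P-group 239.2.2.2 after [MDT_DATA_DELAY].
      tlv = encode_mdt_join_tlv("10.1.1.1", "232.1.1.1", "239.2.2.2")
      assert decode_mdt_join_tlv(tlv) == ("10.1.1.1", "232.1.1.1", "239.2.2.2")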
8.3. Constants

   [MDT_DATA_DELAY]:

      the interval that the PE router connected to the source waits
      before switching to the Data MDT group.  The default value is 3
      seconds.

   [MDT_DATA_TIMEOUT]:

      the interval after which a PE router connected to receivers,
      having received no further MDT Join TLVs, times out the
      previously received MDT Join TLV and leaves the Data MDT group.
      The default value is 3 minutes.  This value must be consistent
      among PE routers.

   [MDT_DATA_HOLDOWN]:

      the interval during which the PE router, after it has started
      encapsulating packets using the Data MDT group, will not switch
      back to the Default MDT tree.  This is used to avoid oscillation
      when traffic is bursty.  The default value is 1 minute.

   [MDT_INTERVAL]:

      the interval at which the source PE router periodically sends the
      MDT Join TLV message.  The default value is 60 seconds.

9. Acknowledgments

   Major contributions to this work have been made by Dan Tappan and
   Tony Speakman.

   This document is based on a previous version which included
   additional material not covered here.  Yakov Rekhter and Dino
   Farinacci were co-authors of the previous version, and the current
   authors thank them for their contribution.

   The authors also wish to thank Arjen Boers, Robert Raszuk, Toerless
   Eckert, and Ted Qian for their help and their ideas.

10. Normative References

   [GRE2784] "Generic Routing Encapsulation (GRE)", Farinacci, Li,
   Hanks, Meyer, Traina, March 2000, RFC 2784.

   [MDT-SAFI] "MDT SAFI", Nalawade, Sreekantiah, February 2004,
   draft-nalawade-mdt-safi-00.txt.

   [MT-DISC] "MT Tunnel Discovery and RPF check", Wijnands, Nalawade,
   August 2004.

   [PIMv2] "Protocol Independent Multicast - Sparse Mode (PIM-SM)",
   Fenner, Handley, Holbrook, Kouvelas, October 2003.

   [PIM-RPF-PROXY] "PIM RPF Proxy", Wijnands, Boers, Rosen,
   forthcoming.

   [RFC2119] "Key words for use in RFCs to Indicate Requirement
   Levels", Bradner, March 1997, RFC 2119.

   [RFC2547bis] "BGP/MPLS VPNs", Rosen, et al., September 2003.

11. Informative References

   [ADMIN-ADDR] "Administratively Scoped IP Multicast", Meyer, July
   1998, RFC 2365.

   [BIDIR] "Bi-directional Protocol Independent Multicast", Handley,
   Kouvelas, Speakman, Vicisano, June 2003.

   [DIFF2983] "Differentiated Services and Tunnels", Black, October
   2000, RFC 2983.

   [GRE1701] "Generic Routing Encapsulation (GRE)", Farinacci, Li,
   Hanks, Traina, October 1994, RFC 1701.

   [GRE2890] "Key and Sequence Number Extensions to GRE", Dommety,
   September 2000, RFC 2890.

   [IPIP1853] "IP in IP Tunneling", Simpson, October 1995, RFC 1853.

   [PIM-MPLS] "Using PIM to Distribute MPLS Labels for Multicast
   Routes", Farinacci, Rekhter, Rosen, Qian, November 2000.

   [SSM] "Source-Specific Multicast for IP", Holbrook, Cain, October
   2003, draft-ietf-ssm-arch-04.txt.

12. Authors' Addresses

   Yiqun Cai (Editor)
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   E-mail: ycai@cisco.com

   Eric C. Rosen (Editor)
   Cisco Systems, Inc.
   1414 Massachusetts Avenue
   Boxborough, MA, 01719
   E-mail: erosen@cisco.com

   IJsbrand Wijnands
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   E-mail: ice@cisco.com

13. Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

14. Full Copyright Statement

   Copyright (C) The Internet Society (2004).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78 and
   except as set forth therein, the authors retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.