Network Working Group                                     Rahul Aggarwal
Internet Draft                                               Anil Lohiya
Expiration Date: December 2004                              Tom Pusateri
                                                           Yakov Rekhter
                                                        Juniper Networks

          Base Specification for Multicast in BGP/MPLS VPNs

                draft-raggarwa-l3vpn-2547-mvpn-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This document describes the minimal set of procedures required to
build multi-vendor inter-operable implementations of multicast for
BGP/MPLS VPNs. It is based on prior specifications of multicast for
BGP/MPLS VPNs that have been implemented and deployed. The procedures
described herein require PIM-SM as the multicast routing protocol in
the SP network.

Internet Draft draft-raggarwa-l3vpn-2547-mvpn-00.txt June 2004

Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [RFC2119].

Table of Contents

 1.      Motivation
 2.      Terminology
 3.      Introduction
 3.1.    Efficient vs. Scalable Solution
 4.      Basic Concepts
 4.1.    Multicast Domains
 4.2.    Provider and VPN PIM Instances
 4.3.    Multicast Tunnels
 5.      Procedures
 5.1.    Multicast VPN PIM-SM Join/Prune/Assert Propagation
 5.1.1.  C-Join/Prune/Assert RPF Interface
 5.1.2.  C-Join/Prune/Assert PIM Neighbor Address
 5.1.3.  Switching from Shared to Source Specific MD Trees in
         the SP Network
 5.2.    Multicast VPN Data Forwarding
 5.3.    Operation
 5.3.1.  PIM Neighbor Discovery in a MD
 5.3.2.  Handling a PIM-Join/Prune received from a CE
 6.      Inter-AS Considerations
 7.      Security Considerations
 8.      Acknowledgments
 9.      Normative References
 10.     Informative References

1. Motivation

This document describes the minimal set of procedures required to
build inter-operable implementations of multicast support for
BGP/MPLS VPNs (MVPNs). It is based on prior specifications of
multicast in BGP/MPLS VPNs [MVPN-6] that have been implemented and
deployed. The procedures presented herein are not new. However, the
intent of this document is to clearly define the base set of
procedures required to build inter-operable implementations of
multicast support for BGP/MPLS VPNs.

This document requires PIM-SM as the multicast routing protocol in
the SP network.

This document does not preclude various optional optimizations of
multicast support for BGP/MPLS VPNs; it assumes that procedures for
such optimizations will be specified in separate documents.

2. Terminology

In addition to the terminology used in [2547] and [PIM-SM], this
document introduces the following terms:

   Multicast Domain (MD): A set of VRFs on different PEs, belonging
   to a given VPN, associated with interfaces that can send multicast
   traffic to each other.

   Provider PIM Instance: The PIM instance in the SP network.

   VPN PIM Instance: The PIM instance in the VPN.

   P-Join: A PIM Join message in the Provider PIM instance.

   C-Join: A PIM Join message in the VPN PIM instance.

   Multicast Tunnel (MT): A tunnel created for each MD, in the
   provider PIM instance. The MT is used to carry multicast customer
   packets, both data and control, among the PE routers in a common
   MD.

3. Introduction

[2547] specifies a set of procedures which must be implemented for a
SP to provide a unicast VPN service. [MVPN-6] describes various
methods that can be used to extend [2547] to enable a SP to provide
multicast service in a VPN. However, it does not specify the minimal
and exact set of procedures required for inter-operability. This has
led to implementations that do not inter-operate.

This document specifies the minimal set of procedures required for an
inter-operable solution that enables a SP to provide multicast
service in a VPN. The procedures specified herein require a SP to use
PIM-SM as the multicast routing protocol in the SP network. Use of
other multicast routing protocols (PIM-SSM, PIM-BIDIR, PIM-DM) in the
SP network for the purpose of providing multicast service in a VPN is
optional and is not part of the minimal set of required procedures
discussed here.

Within a VPN, any of PIM-SM, PIM-DM, PIM-SSM, or PIM-BIDIR can be
used as the multicast routing protocol.

3.1. Efficient vs. Scalable Solution

In the context of this document we define "efficient multicast
routing" as follows. When a PE router receives a multicast data
packet of a particular multicast group from a CE router, the packet
must reach every other PE router which is on the path to a receiver
of that group. It should not reach any PEs that aren't on the path to
a receiver. It should not be unnecessarily replicated.

Efficient multicast routing requires a source tree for the multicast
group, which would mean that the P routers would have to maintain
state for each transmitter of each multicast group in each VPN.

Note that efficient multicast routing, as defined above, requires a
potentially unbounded amount of state in the SP routers, since the SP
has no control over the number of multicast groups in the VPNs that
it supports. Nor does the SP have any control over the number of
transmitters in each group, or over the distribution of the
receivers.

Moreover, even if maintaining that amount of state were feasible, the
same multicast group address can be used in multiple VPNs to carry
different traffic. This traffic cannot be mixed or delivered to the
wrong VPN. This dictates the need for a tunneling mechanism that
keeps traffic with the same destination IP address separated for each
VPN.

One option is to set up unicast tunnels from the ingress PE to each
of the egress PEs. The ingress PE replicates the multicast data
packet received from a CE and sends it to each of the egress PEs
using the unicast tunnels. Hence this solution uses ingress
replication but requires minimal state in the SP network.

This document specifies a solution that aims at a compromise between
the amount of multicast state that must be maintained in the SP
network and the efficiency of multicast routing. It uses a PIM-SM
shared tree multicast tunnel for each VPN.
That allows it to bound the total amount of multicast state in the SP
network solely by the number of VPNs. PIM-SM provides a way to
improve the efficiency of multicast routing (albeit at the cost of
additional multicast state in the SP network) by switching from the
shared tree to source trees, each rooted at a PE. The PIM-SM source
tree is shared by all the multicast sources within a VPN that are
behind that PE.

4. Basic Concepts

This section describes terminology used in the remainder of this
document.

4.1. Multicast Domains

A "Multicast Domain (MD)" is a set of VRFs on different PEs,
belonging to the same VPN, associated with interfaces that can send
multicast traffic to each other. Each MD is assigned a MD P-Group
address. It is required that each MD be assigned a unique address
across all the domains that are part of the MVPN service.

Each VRF has its own multicast routing table. When a multicast
control packet is received from a particular CE device, the PIM RPF
lookup and PIM Join propagation are done in the associated VRF.
Similarly, when a multicast data packet is received from a particular
CE device, multicast data forwarding is done in the associated VRF.
The goal is to send the multicast control or data packet to all other
VRFs in that MD. This is achieved by building one or more multicast
distribution trees for a given MD in the SP network.

4.2. Provider and VPN PIM Instances

Each PE router runs an instance of PIM per VRF. In each VRF instance
of PIM, the PE maintains a PIM adjacency with each of the PIM-capable
CE routers associated with that VRF. The multicast routing table
created by each instance is specific to the corresponding VRF. These
PIM instances are referred to as "VPN-specific PIM instances". They
can support any flavor of PIM, for instance PIM-SM or PIM-DM.
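
As an informal illustration of the per-VRF state described in sections 4.1 and 4.2, the following Python sketch models a PE that keeps one multicast routing table and one VPN-specific PIM instance flavor per VRF, and that handles a C-Join in the VRF associated with the receiving CE interface. The class names, field names, interface names and addresses are all hypothetical and are not taken from this specification; the check that no two VRFs on one PE share a MD P-Group address is one possible local reading of the uniqueness requirement in section 4.1.

```python
from dataclasses import dataclass, field


@dataclass
class Vrf:
    """One VRF on a PE, with the state of its VPN-specific PIM instance."""
    name: str
    md_p_group: str                 # P-Group address of the MD this VRF is in
    pim_flavor: str = "PIM-SM"      # flavor run toward the CEs (assumption)
    ce_interfaces: set = field(default_factory=set)
    # Per-VRF multicast routing table:
    # (C-source, C-group) -> set of outgoing interfaces
    mroutes: dict = field(default_factory=dict)


class Pe:
    """Hypothetical PE model: one Vrf (and PIM instance) per attached VPN."""

    def __init__(self, vrfs):
        self.vrfs = list(vrfs)
        groups = [v.md_p_group for v in self.vrfs]
        # Each MD has its own P-Group, so two VRFs on one PE must differ.
        if len(groups) != len(set(groups)):
            raise ValueError("MD P-Group addresses must be unique per MD")

    def vrf_for_interface(self, ifname):
        """Return the VRF whose PIM instance handles this CE interface."""
        for vrf in self.vrfs:
            if ifname in vrf.ce_interfaces:
                return vrf
        raise KeyError(f"{ifname} is not a CE-facing interface of any VRF")

    def add_c_join(self, ifname, c_source, c_group):
        """Record a C-Join from a CE in the associated VRF's table only."""
        vrf = self.vrf_for_interface(ifname)
        vrf.mroutes.setdefault((c_source, c_group), set()).add(ifname)
        return vrf
```

With two VRFs that use the same C-group address, a C-Join arriving on a CE interface of one VRF leaves the multicast routing table of the other VRF untouched, which is the separation the per-VRF tables are meant to provide.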

Each PE router also runs a "provider-wide" instance of PIM-SM, in
which it has a PIM adjacency with each of its IGP neighbors (i.e.,
with P and directly connected PE routers), but NOT with any CE
routers. The provider PIM instance MUST support PIM-SM.

To distinguish the provider-wide PIM instance from the VPN-specific
PIM instances, the prefixes "P-" and "C-" are used respectively. Thus
a P-Join is a PIM Join which is processed by the provider-wide PIM-SM
instance, and a C-Join is a PIM Join which is processed by a
VPN-specific PIM instance. A P-group address is a group address in
the SP's address space.

4.3. Multicast Tunnels

Each MD is assigned a multicast P-group address that is unique across
the provider network. As part of normal PIM-SM procedures, the
provider-wide PIM-SM instance has to know the RP for each P-group.
Each PE sends the traffic for an MD encapsulated to the P-group for
that MD. This is called a Multicast Tunnel (MT). The MT is treated
like an interface and normal PIM Hellos are sent through the tunnel.
This leads to all PEs discovering each other as PIM neighbors over
that MT interface in the given MD. The details are described in
section 5.3.1.

The MT is used to carry multicast C-packets, both data and control
packets, among the PE routers in a common MD. Data forwarding is
described in section 5.2.

When a packet is received by a PE from another router in the SP
network, the receiving PE can determine the MT (and hence the MD)
from which the packet was received, because the destination address
of the packet is the MD P-group address. The decapsulated packet is
then passed to the corresponding Multicast VRF and VPN-specific PIM
instance for further processing.

5. Procedures

5.1. Multicast VPN PIM-SM Join/Prune/Assert Propagation

For a VRF in a particular MD, the corresponding MT is treated by that
VRF's VPN-specific PIM instance as an interface. The PEs which are
adjacent on the MT must execute the PIM interface procedures,
including the generation and processing of Assert packets.

The VPN PIM instances can send C-Join messages through the MT. These
messages are received by all PEs in the MD. This allows VPN-specific
PIM Join/Prune messages to be extended from site to site, without
appearing in the P routers. Note that a C-Join message carries the
address of the neighbor for which the C-Join message is meant. This
message is processed by the corresponding PIM neighbor on the MT
interface.

5.1.1. C-Join/Prune/Assert RPF Interface

Although the MT is treated as a PIM-enabled interface, unicast
routing is NOT run over it, and there are no unicast routing
adjacencies over it. It is therefore necessary to specify special
procedures for determining when the MT is to be regarded as the "RPF
Interface" for a particular C-address.

When a PE needs to determine the RPF interface of a particular
C-address, it looks up the C-address in the VRF. If the route
matching it is not a VPN-IP route learned from MP-BGP as described in
[2547], or if that route's outgoing interface is one of the
interfaces associated with the VRF, then ordinary PIM procedures for
determining the RPF interface apply. However, if the route matching
the C-address is a VPN-IP route whose outgoing interface is not one
of the interfaces associated with the VRF, then PIM will consider the
outgoing interface to be the MT associated with the VPN-specific PIM
instance.

5.1.2. C-Join/Prune/Assert PIM Neighbor Address

Determination of the C-Join PIM neighbor address, i.e., the RPF
neighbor address, needs further explanation.
This depends on the procedure used to assign an address to the MT
interface. The address of this interface MUST be the BGP next-hop
address of the unicast VPN routes advertised by the MD VRF. This will
typically be a PE loopback address in the provider address space.

To determine the C-Join neighbor address, the PE does a route lookup
on the C-Source address. The matching route is a VPN unicast route
learnt from the PE sitting in front of the multicast source. The
route lookup yields the BGP next-hop of the C-source VPN unicast
route. This BGP next-hop is the neighbor address to use while sending
the PIM-Join.

5.1.3. Switching from Shared to Source Specific MD Trees in the SP
       Network

By default, when a PE generates VPN instance PIM control messages on
a MT, all the other PEs in that MD switch from the shared MD tree in
the SP network to a source specific MD tree rooted at the PE that is
generating the control messages. This is the case even though there
may not be any multicast sources transmitting in that given VRF on
that PE. This results in a different source specific tree for a given
MD for each PE that belongs to that MD.

To reduce the number of source specific trees in the SP network an
implementation SHOULD provide the following knobs to control
switching from the shared MD tree in the SP network:

   a) A knob on the RP so that it sends the source specific MD
      P-Group Join to the source PE (after receiving Register
      messages) only after the multicast traffic being received for
      that MD from the source PE exceeds a certain threshold.

   b) A knob on a PE so that it sends the source specific MD P-Group
      Join to the source PE only after the multicast traffic being
      received for that MD from the source PE exceeds a certain
      threshold.

Note that this is a local implementation choice and does not impact
inter-operability.

5.2. Multicast VPN Data Forwarding

A PE in a particular MD transmits a C-multicast data packet through
the SP network by transmitting it through the MT corresponding to the
MD. The MT is installed as the outgoing interface for the C-multicast
data packets when C-Join messages corresponding to the data packet's
source and group are received on the MT interface.

An implementation MUST support GRE encapsulation [GRE2784]. The
following diagram shows the progression of the packet using GRE
encapsulation as it enters and leaves the service provider network.

Packets received        Packets in transit      Packets forwarded
at ingress PE           in the service          by egress PEs
                        provider network

                        +---------------+
                        |  P-IP Header  |
                        +---------------+
                        |      GRE      |
++=============++       ++=============++       ++=============++
|| C-IP Header ||       || C-IP Header ||       || C-IP Header ||
++=============++ >>>>> ++=============++ >>>>> ++=============++
|| C-Payload   ||       || C-Payload   ||       || C-Payload   ||
++=============++       ++=============++       ++=============++

The destination address in the P-IP header is the MD address
corresponding to the MT. This enables the P routers to forward this
packet along the multicast distribution tree corresponding to the MD.
The IPv4 Protocol Number field in the P-IP Header must be set to GRE
(47).

If a PE in a particular MD transmits a C-multicast data packet to the
backbone by transmitting it through the MT for that MD, every other
PE in that MD will receive it. Any of those PEs which are not on a
C-multicast distribution tree for the packet's C-multicast
destination address (as determined by applying ordinary PIM
procedures to the corresponding multicast VRF) will have to discard
the packet.

5.3. Operation

5.3.1. PIM Neighbor Discovery in a MD

MTs are described in section 4.3. A MT is a pseudo interface unique
to a VPN that can be created when PIM-SM initializes. Unicast routing
is not run over this interface.
This interface is used to carry PIM control and multicast data
traffic. PIM sends Hello messages on the MT interfaces using the PE
loopback address in the provider address space as the source address.

Each PE router needs to join the MD P-Groups associated with all the
MDs it belongs to. Discovery of remote PEs in the same VPN is done by
sending PIM Hello messages over MTs as follows:

   o  When PIM-SM initializes in a MD, the PE originates a PIM Join
      message for the MD P-Group address towards the RP in the SP
      space. This is done for each MD that is configured on the PE.

   o  Since an MT interface belongs to a VPN, sending a Hello message
      on this interface does the following:

      o  The PIM Hello message has the PE's loopback interface in the
         SP address space as its source address and the
         ALL-PIM-ROUTERS group as its destination.

      o  This PIM Hello gets encapsulated in a GRE header with the
         PE's loopback interface as the source address and the MD
         P-Group address as the destination. After the encapsulation,
         the original PIM-SM Hello travels as the data packet in a
         PIM-SM Register towards the SP RP.

      o  The RP in the SP network knows about all the receivers (the
         PEs) because of the earlier PIM Joins for the MD P-Group
         address that it received from all the PEs when they
         initialized. So, when the RP receives the above PIM-SM
         Register, it decapsulates it and forwards it down to all the
         PEs. All the PEs (including the one that sent the packet)
         therefore receive this data packet, which has the source
         address of the originating PE.

      o  This PIM Hello packet originated within the VRF travels as a
         data packet (due to encapsulation) in the SP network towards
         the RP.

   o  The above procedure is repeated on all the PEs. Hence, all the
      PEs receive each other's data packets which contain PIM Hello
      messages and discover one another. PEs can decide to send the
      source Join directly to the remote PEs at this point.
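
The encapsulation steps in the list above can be sketched informally in Python. This is a minimal illustration under stated assumptions, not a normative encoding: the function names, the TTL values, the Holdtime value of 105 seconds and the example addresses are choices made for the sketch, and the Hello carries only a Holdtime option. The sketch builds the GRE-encapsulated packet as it would be sent natively to the MD P-Group; the PIM-SM Register encapsulation towards the RP is not modelled. The decapsulation helper shows how a receiving PE can identify the MD from the destination address of the P-IP header, as described in section 4.3.

```python
import ipaddress
import struct

GRE_PROTO = 47             # IPv4 protocol number for GRE
PIM_PROTO = 103            # IPv4 protocol number for PIM
ETHERTYPE_IPV4 = 0x0800    # GRE protocol type for an IPv4 payload
ALL_PIM_ROUTERS = "224.0.0.13"


def checksum(data: bytes) -> int:
    """Standard 16-bit one's-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int,
                ttl: int) -> bytes:
    """Minimal 20-byte IPv4 header (no options) with a valid checksum."""
    fields = [0x45, 0, 20 + payload_len, 0, 0, ttl, proto, 0,
              ipaddress.IPv4Address(src).packed,
              ipaddress.IPv4Address(dst).packed]
    fields[7] = checksum(struct.pack("!BBHHHBBH4s4s", *fields))
    return struct.pack("!BBHHHBBH4s4s", *fields)


def pim_hello(holdtime: int = 105) -> bytes:
    """PIMv2 Hello carrying only a Holdtime option (type 1, length 2)."""
    option = struct.pack("!HHH", 1, 2, holdtime)
    # First byte 0x20: PIM version 2, message type 0 (Hello).
    csum = checksum(struct.pack("!BBH", 0x20, 0, 0) + option)
    return struct.pack("!BBH", 0x20, 0, csum) + option


def encapsulate_on_mt(c_packet: bytes, pe_loopback: str,
                      md_p_group: str) -> bytes:
    """Wrap a C-packet in GRE plus a P-IP header addressed to the MD."""
    gre = struct.pack("!HH", 0, ETHERTYPE_IPV4)  # no optional GRE fields
    outer = ipv4_header(pe_loopback, md_p_group, GRE_PROTO,
                        len(gre) + len(c_packet), ttl=64)  # TTL is assumed
    return outer + gre + c_packet


def decapsulate(p_packet: bytes, md_by_group: dict):
    """Return (MD name, C-packet); the MD is found from the P-IP dst."""
    if p_packet[9] != GRE_PROTO:
        raise ValueError("P-IP protocol field is not GRE (47)")
    dst = str(ipaddress.IPv4Address(p_packet[16:20]))
    return md_by_group[dst], p_packet[24:]  # skip 20-byte IP + 4-byte GRE
```

A caller would build the C-packet as `ipv4_header(loopback, ALL_PIM_ROUTERS, PIM_PROTO, len(hello), ttl=1) + hello` and pass it to `encapsulate_on_mt` together with the PE loopback and the MD P-Group address. The same loopback address appears as the source of both the inner Hello and the outer P-IP header, which is what lets remote PEs learn the sender as a PIM neighbor on the MT.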

5.3.2. Handling a PIM-Join/Prune received from a CE

When a PE receives a PIM Join/Prune for a group in the VPN space in
its VRF, it processes this message exactly as per PIM procedures.
Then it forwards this message to the upstream PIM neighbor on the
path to the VPN-RP or the VPN source. The neighbor address in the PIM
message is set as described in section 5.1.2.

The PIM message is encapsulated in a GRE header with the PE's
loopback interface as the source address and the MD P-Group address
as the destination. The original PIM control message in the VPN
instance PIM now becomes a data packet within the SP space and gets
sent either as a PIM-SM Register to the SP-RP or natively through the
SP network. It is sent to all PEs that had earlier sent a PIM-SM Join
for the MD P-Group address.

The packet finally reaches all the PEs in the MD. The PE whose
address matches the "upstream neighbor address" forwards the original
PIM control message towards the RP or source behind the CE.

6. Inter-AS Considerations

[2547] describes three methods for creating inter-AS VPNs:

   Option A: VRF-to-VRF connections at the AS border routers.

   Option B: EBGP redistribution of labeled VPN-IP routes from AS to
   neighboring AS.

   Option C: Multihop EBGP distribution of labeled VPN-IP routes
   between source and destination ASes, with EBGP redistribution of
   labeled IP routes from AS to neighboring AS.

The mechanisms described in this draft support multi-AS VPN multicast
when either Option A or C is used. However, they are not sufficient
when Option B is used. This is because the BGP next-hop of the VPN
routes is rewritten at the ASBRs in Option B. As a result, the PIM
neighbor and the BGP next-hop do not match, and the procedures
described in section 5.1.2 cannot be used for determining the RPF
neighbor. A solution to this issue is outside the scope of this
document.
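
The mismatch can be made concrete with a small Python sketch of the neighbor determination in section 5.1.2. It is an illustration only: the type and function names, the prefixes and the addresses are hypothetical, and the sketch assumes that the set of MT neighbors holds the source addresses of the PIM Hellos received on the MT (the remote PE loopbacks).

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class VpnRoute:
    prefix: str          # C-prefix installed in the VRF
    bgp_next_hop: str    # BGP next-hop as received by this PE


def lookup(routes: Iterable[VpnRoute], c_source: str) -> Optional[VpnRoute]:
    """Longest-prefix match of the C-source against the VRF's VPN routes."""
    addr = ipaddress.ip_address(c_source)
    matches = [r for r in routes if addr in ipaddress.ip_network(r.prefix)]
    return max(matches,
               key=lambda r: ipaddress.ip_network(r.prefix).prefixlen,
               default=None)


def rpf_neighbor(routes: Iterable[VpnRoute], c_source: str,
                 mt_neighbors: Set[str]) -> Optional[str]:
    """Return the MT PIM neighbor to address the C-Join to, or None.

    The candidate is the BGP next-hop of the C-source route; it is only
    usable if a PIM neighbor with that address was discovered on the MT.
    """
    route = lookup(routes, c_source)
    if route is None or route.bgp_next_hop not in mt_neighbors:
        return None
    return route.bgp_next_hop
```

With MT neighbors `{"192.0.2.1", "192.0.2.2"}`, a route whose next-hop is still the originating PE loopback `192.0.2.1` resolves to that neighbor. If an ASBR has rewritten the next-hop to its own address, for example `198.51.100.1`, no MT neighbor matches and the function returns `None`, which is the Option B failure described above.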

It is possible that Option C is used with a 'BGP free SP network'. In
this case the P routers in one AS do not know how to route to the PE
addresses in another AS. As a result, they will not be able to
forward the P-Join messages towards the egress PE. A solution to this
issue is outside the scope of this document.

For inter-AS VPNs that require multicast service, if the involved
ASs are all under a single provider, these ASs can share RPs, and
MSDP is not required.

Even if the ASs are under the control of multiple service providers,
the level of cooperation required to offer even plain unicast 2547
VPN service is already high, so one more issue (ownership of the RP)
may not add significantly to what is already required. In that case,
the providers can share RPs, and MSDP is not required.

If each provider insists on having its own local RP, MSDP can be used
between the RPs that belong to the different providers. However, in
many cases, this will not be necessary.

If there are inter-AS VPNs that span multiple SPs and require
multicast service, then MDs (and MTs) for these VPNs will cross
provider boundaries. The assignment of the multicast group addresses
associated with the MDs for such VPNs must then be coordinated by the
providers.

7. Security Considerations

Security considerations discussed in [2547] and [PIM-SM] apply to
this document.

8. Acknowledgments

As mentioned earlier, this draft is based on [MVPN-6]. The authors of
[MVPN-6] are Eric Rosen, Yiqun Cai, Dan Tappan, IJsbrand Wijnands,
Yakov Rekhter and Dino Farinacci. We would like to thank them for
their tremendous contribution to this technology.

We would also like to thank Paras Trivedi for his detailed review of
this document.

9. Normative References

[PIM-SM]   "Protocol Independent Multicast - Sparse Mode (PIM-SM)",
           Fenner, Handley, Holbrook, Kouvelas, October 2003,
           draft-ietf-pim-sm-v2-new-08.txt

[2547]     "BGP/MPLS VPNs", Rosen, Rekhter, et al., September 2003,
           draft-ietf-l3vpn-rfc2547bis-01.txt

[GRE2784]  "Generic Routing Encapsulation (GRE)", Farinacci, Li,
           Hanks, Meyer, Traina, March 2000, RFC 2784

[RFC2119]  "Key words for use in RFCs to Indicate Requirement
           Levels", Bradner, March 1997, RFC 2119

10. Informative References

[MVPN-6]   "Multicast in MPLS/BGP VPNs", Rosen, et al.,
           draft-rosen-vpn-mcast-06.txt

Author Information

Rahul Aggarwal
Juniper Networks
1194 North Mathilda Ave.
Sunnyvale, CA 94089
Email: rahul@juniper.net

Anil Lohiya
Juniper Networks
1194 North Mathilda Ave.
Sunnyvale, CA 94089
Email: alohiya@juniper.net

Tom Pusateri
Juniper Networks
1194 North Mathilda Ave.
Sunnyvale, CA 94089
Email: pusateri@juniper.net

Yakov Rekhter
Juniper Networks
1194 North Mathilda Ave.
Sunnyvale, CA 94089
Email: yakov@juniper.net

IPR Notice

The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it
has made any effort to identify any such rights. Information on the
IETF's procedures with respect to rights in standards-track and
standards-related documentation can be found in BCP-11.
Copies of claims of rights made available for publication and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementors or users of this
specification can be obtained from the IETF Secretariat.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice
this standard. Please address the information to the IETF Executive
Director.

Full Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

Funding for the RFC Editor function is currently provided by the
Internet Society.