idnits 2.17.1 draft-zzhang-bess-bgp-multicast-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) == There are 8 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: A network currently running PIM can be incrementally transitioned to BGP based multicast. At any time, a router supporting BGP based multicast can use PIM with some neighbors (upstream or downstream) and BGP with some other neighbors. PIM and BGP MUST not be used simultaneously between two neighbors for multicast purpose, and routers connected to the same LAN MUST be transitioned during the same maintenance window. -- The document date (October 29, 2019) is 1633 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC7438' is mentioned on line 262, but not defined == Missing Reference: 'RFC5492' is mentioned on line 312, but not defined == Missing Reference: 'RFC4760' is mentioned on line 432, but not defined == Missing Reference: 'RFC4271' is mentioned on line 432, but not defined == Missing Reference: 'RFC7524' is mentioned on line 545, but not defined == Missing Reference: 'RFC6388' is mentioned on line 687, but not defined == Unused Reference: 'RFC2119' is defined on line 837, but no explicit reference was found in the text == Unused Reference: 'RFC4601' is defined on line 842, but no explicit reference was found in the text == Unused Reference: 'RFC5015' is defined on line 855, but no explicit reference was found in the text == Unused Reference: 'I-D.ietf-bess-mvpn-pe-ce' is defined on line 873, but no explicit reference was found in the text == Outdated reference: A later version (-22) exists of draft-ietf-idr-tunnel-encaps-14 ** Obsolete normative reference: RFC 4601 (Obsoleted by RFC 7761) == Outdated reference: A later version (-02) exists of draft-wijnands-bier-mld-lan-election-01 Summary: 2 errors (**), 0 flaws (~~), 15 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 BESS Z. Zhang 3 Internet-Draft L. Giuliano 4 Intended status: Standards Track Juniper Networks 5 Expires: May 1, 2020 K. Patel 6 Arrcus 7 I. Wijnands 8 M. Mishra 9 Cisco Systems 10 A. 
Gulko 11 Refinitiv 12 October 29, 2019 14 BGP Based Multicast 15 draft-zzhang-bess-bgp-multicast-03 17 Abstract 19 This document specifies a BGP address family and related procedures 20 that allow BGP to be used for setting up multicast distribution 21 trees. This document also specifies procedures that enable BGP to be 22 used for multicast source discovery, and for showing interest in 23 receiving particular multicast flows. Taken together, these 24 procedures allow BGP to be used as a replacement for other multicast 25 routing protocols, such as PIM or mLDP. The BGP procedures specified 26 here are based on the BGP multicast procedures that were originally 27 designed for use by providers of Multicast Virtual Private Network 28 service. 30 Requirements Language 32 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 33 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 34 document are to be interpreted as described in RFC2119. 36 Status of This Memo 38 This Internet-Draft is submitted in full conformance with the 39 provisions of BCP 78 and BCP 79. 41 Internet-Drafts are working documents of the Internet Engineering 42 Task Force (IETF). Note that other groups may also distribute 43 working documents as Internet-Drafts. The list of current Internet- 44 Drafts is at https://datatracker.ietf.org/drafts/current/. 46 Internet-Drafts are draft documents valid for a maximum of six months 47 and may be updated, replaced, or obsoleted by other documents at any 48 time. It is inappropriate to use Internet-Drafts as reference 49 material or to cite them other than as "work in progress." 51 This Internet-Draft will expire on May 1, 2020. 53 Copyright Notice 55 Copyright (c) 2019 IETF Trust and the persons identified as the 56 document authors. All rights reserved. 
58     This document is subject to BCP 78 and the IETF Trust's Legal
59     Provisions Relating to IETF Documents
60     (https://trustee.ietf.org/license-info) in effect on the date of
61     publication of this document.  Please review these documents
62     carefully, as they describe your rights and restrictions with respect
63     to this document.  Code Components extracted from this document must
64     include Simplified BSD License text as described in Section 4.e of
65     the Trust Legal Provisions and are provided without warranty as
66     described in the Simplified BSD License.

68   Table of Contents

70     1.  Introduction
71       1.1.  Motivation
72         1.1.1.  Native/unlabeled Multicast
73         1.1.2.  Labeled Multicast
74       1.2.  Overview
75         1.2.1.  (x,g) Multicast
76           1.2.1.1.  Source Discovery for ASM
77           1.2.1.2.  ASM Shared-tree-only Mode
78           1.2.1.3.  Integration with BGP-MVPN
79         1.2.2.  BGP Inband Signaling for mLDP Tunnel
80         1.2.3.  BGP Sessions
81         1.2.4.  LAN and Parallel Links
82         1.2.5.  Transition
83     2.  Specification
84       2.1.  BGP NLRIs and Attributes
85         2.1.1.  S-PMSI A-D Route
86         2.1.2.  Leaf A-D Route
87         2.1.3.  Source Active A-D Route
88         2.1.4.  S-PMSI A-D Route for C-multicast mLDP
89         2.1.5.  Session Address Extended Community
90       2.2.  Procedures
91         2.2.1.  Source Discovery for ASM
92         2.2.2.  Originating Tree Join Routes
93           2.2.2.1.  (x,g) Multicast Tree
94           2.2.2.2.  BGP Inband Signaling for mLDP Tunnel
95         2.2.3.  Receiving Tree Join Routes
96         2.2.4.  Withdrawal of Tree Join Routes
97         2.2.5.  LAN procedures for (x,g) Unidirectional Tree
98           2.2.5.1.  Originating S-PMSI A-D Routes
99           2.2.5.2.  Receiving S-PMSI A-D Routes
100        2.2.6.  Distributing Label for Upstream Traffic for
101                Bidirectional Tree/Tunnel
102    3.  Security Considerations
103    4.  Acknowledgements
104    5.  References
105      5.1.  Normative References
106      5.2.  Informative References
107    Authors' Addresses

109  1.  Introduction

111  1.1.  Motivation

113     This section provides some motivation for BGP signaling for native
114     and labeled multicast.  One target deployment would be a Data Center
115     that requires multicast but uses BGP as its only routing protocol
116     [RFC7938].  In such a deployment, it would be desirable to support
117     multicast by extending the deployed routing protocol, without
118     requiring the deployment of tree building protocols such as PIM,
119     mLDP, RSVP-TE P2MP, and without requiring an IGP.

121     Additionally, compared to PIM, BGP based signaling has several
122     advantages as described in the following section, and may be desired
123     in non-DC deployment scenarios as well.

125  1.1.1.  Native/unlabeled Multicast

127     Protocol Independent Multicast (PIM) has been the prevailing
128     multicast protocol for many years.
        Despite its success, it has two
129     drawbacks:

131     o  The ASM model, which is prevalent, introduces complexity in the
132        following areas: source discovery procedures, need for Rendezvous
133        Points (RPs) and group-to-RP mappings, need to switch between RP-
134        rooted trees and source-rooted trees, etc.

136     o  Periodic protocol state refreshes due to its soft state nature.

138     PIM-SSM removes much of the complexity of PIM-ASM by moving source
139     discovery to the application layer.  However, for various reasons,
140     many legacy applications and devices still rely upon network-based
141     source discovery.  PIM-Port (PIM over Reliable Transport) solves the
142     soft state issue, though its deployment has also been limited for two
143     reasons:

145     o  It does not remove the ASM complexities.

147     o  In many of the scenarios where reliable transport is deemed
148        important, BGP-based multicast (e.g. BGP-MVPN) has been used
149        instead of PIM-Port.

151     Partly because of the above-mentioned problems, some Data Center
152     operators have been avoiding deploying multicast in their networks.

154     BGP-MVPN [RFC6514] uses BGP to signal VPN customer multicast state
155     over provider networks.  It removes the above-mentioned problems from
156     the SP environment, and the deployment experiences have been
157     encouraging.  While RFC 6514 makes it possible for an SP to provide
158     MVPN service without running PIM on its backbone, that RFC still
159     assumes that PIM (or mLDP) runs on the PE-CE links.  [draft-ietf-bess-
160     mvpn-pe-ce] adapts the concept of BGP-MVPN to PE-CE links so that the
161     use of PIM on the PE-CE links can be eliminated (though the PIM-ASM
162     complexities still remain in the customer network), and this
163     document extends it further to general topologies, so that the
164     procedures can be run on any router, as a replacement for PIM or mLDP.

166     With that, PIM can be completely eliminated from the network.  PIM
167     soft state is replaced by BGP hard state.
        For ASM, source specific
168     trees are set up directly after simpler source discovery (data driven
169     on FHRs and control driven elsewhere), all based on BGP.  All the
170     complexities related to source discovery and shared/source tree
171     switch are also eliminated.  Additionally, the trees can be set up
172     with MPLS labels, with just minor enhancements in the signaling.

174  1.1.2.  Labeled Multicast

176     There could be two forms of labeled multicast signaled by BGP.  The
177     first one is labeled (x,g) multicast where 'x' stands for either 's'
178     or '*'.  Basically, it is for BGP-signaled multicast trees as
179     described in the previous section but with labels.  The second one is
180     for mLDP tunnels with BGP signaling in part or whole through a BGP
181     domain.

183     For both cases, BGP is used because other label distribution
184     mechanisms like mLDP may not be desired by some operators.  For
185     example, a DC operator may prefer to have a BGP-only deployment.

187  1.2.  Overview
188  1.2.1.  (x,g) Multicast

190     PIM-like functionality is provided, using BGP-based join/prune
191     signaling and BGP-based source discovery for ASM.  The BGP-based join
192     signaling supports both labeled multicast and IP multicast.

194     The same RPF procedures as in PIM are used for each router to
195     determine the RPF neighbor for a particular source or RPA (in case of
196     Bidirectional Tree).  Except in the Bidirectional Tree case and a
197     special case described in Section 1.2.1.2, no (*,G) join is used -
198     LHR routers discover the sources for ASM and then join towards the
199     sources directly.  Data driven mechanisms like PIM Assert are replaced
200     by control driven mechanisms (Section 1.2.4).

202     The joins are carried in BGP Updates with MCAST-TREE SAFI and S-PMSI/
203     Leaf A-D routes defined in this document.  The updates are targeted
204     at the upstream neighbor by use of Route Targets.  [Note - earlier
205     versions of this draft used C-multicast routes to send joins.
We're 206 now switching to S-PMSI/Leaf routes for three reasons. a) when the 207 routes go through RRs, we have to distinguish different routes based 208 on upstream router and downstream router. This leads to Leaf routes. 209 b) for labeled bidirectional trees, we need to signal "upstream fec". 210 S-PMSI suits this very well. c) we may want to allow the option of 211 setting up trees from the roots instead of from the leaves. S-PMSI 212 suits that very well.] 214 If the BGP updates carry labels (via Tunnel Encapsulation Attribute 215 [I-D.ietf-idr-tunnel-encaps]), then (s,g) multicast traffic can use 216 the labels. This is very similar to mLDP Inband Signaling [RFC6826], 217 except that there are no corresponding "mLDP tunnels" for the PIM 218 trees. Similar to mLDP, labeled traffic on transit LANs are point to 219 point. Of course, traffic sent to receivers on a LAN by a LHR is 220 native multicast. 222 For labeled bidirectional (*,g) trees, downstream traffic (away from 223 the RPA) can be forwarded as in the (s,g) case. For upstream traffic 224 (towards RPA), the upstream neighbor needs to advertise a label for 225 its downstream neighbors. The same label that the upstream neighbor 226 advertises to its upstream is the same one that it advertises to its 227 downstreams, using an S-PMSI A-D route. 229 1.2.1.1. Source Discovery for ASM 231 This document does not support ASM via shared trees (aka RP Tree, or 232 RPT) with one exception discussed in the next section. Instead, 233 FHRs, LHRs, and optionally RRs work together to propagate/discover 234 source information via control plane and LHRs join source specific 235 Shortest Path Trees (SPT) directly. 237 A FHR originates Source Active A-D routes upon discovering sources 238 for particular flows and advertise them to its peers. It is desired 239 that the SA routes only reach LHRs that are interested in receiving 240 the traffic. 
        To achieve that, the SA routes carry an IPv4 or IPv6
241     address specific Route Target.  The Global Administrator field is set
242     to the group address of the flow, and the Local Administrator field is
243     set to 0.  An LHR advertises Route Target Membership routes, with the
244     Route Target field in the NLRI set according to the groups it wants
245     to receive traffic for, in the same way that an FHR encodes the Route
246     Target in its Source Active routes.  The propagation of the SA routes
247     is subject to cooperative export filtering as specified in [RFC4684]
248     and referred to as the RTC mechanism in this document.  That way, the
249     LHR only receives Source Active routes for groups that it is
        interested in.

251     Typically, a set of RRs is used and they maintain all Source Active
252     routes but only distribute them to interested LHRs on demand (upon
253     receiving corresponding Route Target Membership routes, which are
254     triggered on LHRs when they receive IGMP/MLD membership reports).  The
255     rest of the document assumes that RRs are used, even though that is
256     not required.

258  1.2.1.2.  ASM Shared-tree-only Mode

260     It may be desired that only a shared tree is used to distribute all
261     traffic for a particular ASM group from its RP to all LHRs, as
262     described in Section 4.1 "PIM Shared Tree Forwarding" of [RFC7438].
263     This will significantly cut down the number of trees and works out
264     very well in certain deployment scenarios.  For example, all the
265     sources could be connected to the RP, or clustered close to the RP.
266     In the latter case, either the paths from FHRs to the RP do not
267     intersect the shared tree so native forwarding can be used between the
268     FHRs and the RP, or other means outside of this document could be used
269     to forward traffic from FHRs to the RP.
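As an illustration of the group-specific Route Target that Section 1.2.1.1 attaches to Source Active routes, the following is a minimal sketch for the IPv4 case only. It assumes the usual 8-octet IPv4 Address Specific Extended Community layout (type 0x01, Route Target sub-type 0x02); the function name is hypothetical and not part of this document.

```python
import ipaddress
import struct

# Assumed code points: IPv4 Address Specific Extended Community (type 0x01)
# carrying a Route Target (sub-type 0x02).
EC_TYPE_IPV4_ADDRESS_SPECIFIC = 0x01
EC_SUBTYPE_ROUTE_TARGET = 0x02


def group_route_target(group: str) -> bytes:
    """Build the 8-octet RT for an ASM group (hypothetical helper).

    Global Administrator = group address, Local Administrator = 0,
    as described for Source Active routes.
    """
    address = ipaddress.IPv4Address(group)
    if not address.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast group address")
    return struct.pack(
        "!BB4sH",
        EC_TYPE_IPV4_ADDRESS_SPECIFIC,
        EC_SUBTYPE_ROUTE_TARGET,
        address.packed,  # Global Administrator field
        0,               # Local Administrator field
    )
```

An LHR interested in the same group would place an identical Route Target in its Route Target Membership route, which is what lets the RTC filtering match the two.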
271     For native forwarding from FHRs to the RP, SA routes may be used to
272     announce the sources so that the RP can join source specific trees to
273     pull traffic, but the group specific Route Target is not needed.  The
274     LHRs do not advertise the group specific Route Target Membership
275     routes as they do not need the SA routes.

277     To establish the shared tree, (*,g) Leaf A-D routes are used as in
278     the bidirectional tree case, though no forwarding state is
279     established to forward traffic from downstream neighbors.

281  1.2.1.3.  Integration with BGP-MVPN

283     For each VPN, the distribution of Source Active routes in that VPN
284     does not have to involve PEs at all unless there are sources/receivers
285     directly connected to some PEs, and the routes are independent of MVPN
286     SA routes.  For example, FHRs and LHRs establish BGP sessions with RRs
287     of that particular VPN for the purpose of SA distribution.

289     After source discovery, BGP multicast signaling is done from LHRs
290     towards the sources.  When the signaling reaches an egress PE, BGP-
291     MVPN signaling takes over, as if a PIM (s,g) join/prune was received
292     on the PE-CE interface.  When the BGP-MVPN signaling reaches the
293     ingress PE, BGP multicast signaling as specified in this document
294     takes over, similar to how BGP-MVPN triggers PIM (s,g) join/prune on
295     PE-CE interfaces.

297  1.2.2.  BGP Inband Signaling for mLDP Tunnel

299     Part of an (or the whole) mLDP tunnel can also be signaled via BGP
300     and seamlessly integrated with the rest of the mLDP tunnel signaled
301     natively via mLDP.  All the procedures are similar to mLDP except
302     that the signaling is done via BGP.  The mLDP FEC is encoded as the
303     BGP NLRI, with MCAST-TREE SAFI and S-PMSI/Leaf A-D Routes for
304     C-multicast mLDP defined in this document.  The Leaf A-D routes
305     correspond to mLDP Label Mapping messages, and the S-PMSI A-D routes
306     are used to signal upstream FEC for MP2MP mLDP tunnels, similar to
307     the bidirectional (*,g) case.
309 1.2.3. BGP Sessions 311 In order for two BGP speakers to exchange MCAST-TREE NLRI, they must 312 use BGP Capabilities Advertisement [RFC5492] to ensure that they both 313 are capable of properly processing the MCAST-TREE NLRI. This is done 314 as specified in [RFC4760], by using a capability code 1 315 (multiprotocol BGP) with an AFI of IPv4 (1) or IPv6 (2) and a SAFI of 316 MCAST-TREE with a value to be assigned by IANA. 318 How the BGP peer sessions are provisioned, whether EBGP or IBGP, 319 whether statically, automatically (e.g., based on IGP neighbor 320 discovery), or programmably via an external controller, is outside 321 the scope of this document. 323 In case of IBGP, it could be that every router peering with Route 324 Reflectors, or hop by hop IBGP sessions could be used to exchange 325 MCAST-TREE NLRIs for joins. In the latter case, unless desired 326 otherwise for reasons outside of the scope of this document, the hop 327 by hop IBGP sessions SHOULD only be used to exchange MCAST-TREE 328 NLRIs. 330 When multihop BGP is used, a router advertises its local interface 331 addresses, for the same purposes that the Address List TLV in LDP 332 serves. This is achieved by advertising the interface address as 333 host prefixes with IPv4/v6 Address Specific ECs corresponding to the 334 router's local addresses used for its BGP sessions (Section 2.1.5). 336 Because the BGP Capability Advertisement is only between two peers, 337 when the sessions are only via RRs, a router needs another way to 338 determine if its neighbor is capable of signaling multicast via BGP. 339 The interface address advertisement can be used for that purpose - 340 the inclusion of a Session Address EC indicates that the BGP speaker 341 identified in the EC supports the C-Multicast NLRI. 343 FHRs and LHRs may also establish BGP sessions to some Route 344 Reflectors for source discovery purpose (Section 1.2.1.1). 
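The capability exchange described at the start of this section can be sketched as follows. This is only an illustration: it assumes the standard multiprotocol capability layout (code 1, length 4, then AFI, a reserved octet, and SAFI), and because the MCAST-TREE SAFI value has not yet been assigned, `PLACEHOLDER_MCAST_TREE_SAFI` is a made-up stand-in rather than a real code point.

```python
import struct

MP_BGP_CAPABILITY_CODE = 1  # multiprotocol BGP capability
AFI_IPV4 = 1
AFI_IPV6 = 2

# Hypothetical placeholder: the real MCAST-TREE SAFI is "to be assigned
# by IANA", so this value is an assumption for illustration only.
PLACEHOLDER_MCAST_TREE_SAFI = 254


def encode_mp_capability(afi: int, safi: int) -> bytes:
    """Encode one multiprotocol capability as code, length, value."""
    value = struct.pack("!HBB", afi, 0, safi)  # AFI, reserved octet, SAFI
    return struct.pack("!BB", MP_BGP_CAPABILITY_CODE, len(value)) + value


def peer_supports(capabilities: list, afi: int, safi: int) -> bool:
    """Check whether a peer advertised the given AFI/SAFI pair."""
    return encode_mp_capability(afi, safi) in capabilities
```

A speaker would only send MCAST-TREE NLRI on a session where both sides advertised the pair, e.g. `peer_supports(received, AFI_IPV4, PLACEHOLDER_MCAST_TREE_SAFI)`.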
346     With traditional PIM, the FHRs and LHRs refer to the PIM DRs on
347     the source or receiver networks.  With BGP based multicast, PIM may
348     not be running at all, and the FHRs and LHRs refer to the IGMP/MLD
349     queriers, or the DF elected per [I-D.wijnands-bier-mld-lan-election].
350     Alternatively, if it is known that a network only has senders then no
351     IGMP/MLD or DF election is needed - any router may generate SA
352     routes.  That will not cause any issue other than redundant SA routes
353     being originated.

355  1.2.4.  LAN and Parallel Links

357     There could be parallel links between two BGP peers.  A single multi-
358     hop session, whether IBGP or EBGP, between loopback addresses may be
359     used.  Except for LAN interfaces in case of unlabeled (x,g)
360     unidirectional trees (note that transit LAN interface is not
361     supported for BGP signaled (*,g) bidirectional tree and for mLDP
362     tunnels, traffic on transit LAN is point to point between neighbors),
363     any link between the two peers can be automatically used by a
364     downstream peer to receive traffic from the upstream peer, and it is
365     up to the upstream peer to decide which link to use.  If one of the
366     links goes down, the upstream peer switches to a different link and
367     there is no change needed on the downstream peer.

369     For unlabeled (x,g) unidirectional trees, the upstream peer MAY
370     prefer LAN interfaces to send traffic, since multiple downstream
371     peers may be reached simultaneously, or it may make a decision based
372     on local policy, e.g., for load balancing purposes.  Because different
373     downstream peers might choose different upstream peers for RPF, when
374     an upstream peer decides to use a LAN interface to send traffic, it
375     originates an S-PMSI A-D route indicating that one or more LAN
376     interfaces will be used.  The route carries Route Targets specific to
377     the LANs so that all the peers on the LANs import the route.
        If more
378     than one router originates the route specifying the same LAN for the
379     same (s,g) or (*,g) flow, then an assert procedure based on the S-PMSI
380     A-D routes takes place and assert losers will stop sending traffic to
381     the LAN.

383  1.2.5.  Transition

385     A network currently running PIM can be incrementally transitioned to
386     BGP based multicast.  At any time, a router supporting BGP based
387     multicast can use PIM with some neighbors (upstream or downstream)
388     and BGP with some other neighbors.  PIM and BGP MUST NOT be used
389     simultaneously between two neighbors for multicast purposes, and
390     routers connected to the same LAN MUST be transitioned during the
391     same maintenance window.

393     In case of PIM-SSM, any router can be transitioned at any time
394     (except on a LAN all routers must be transitioned together).  It may
395     receive source tree joins from a mixed set of BGP and PIM downstream
396     neighbors and send source tree joins to its upstream neighbor using
397     either PIM or BGP signaling.

399     In case of PIM-ASM, the RPs are first upgraded to support BGP based
400     multicast.  They learn sources either via PIM procedures from PIM
401     FHRs, or via Source Active A-D routes from BGP FHRs.  In the former
402     case, the RPs can originate proxy Source Active A-D routes.  There
403     may be a mixed set of RPs/RRs - some capable of both traditional PIM
404     RP functionalities while some only redistribute SA routes.

406     Then any router can be transitioned incrementally.  A transitioned
407     LHR router will pull Source Active A-D routes from the RPs/RRs when
408     it receives IGMP/MLD (*,G) joins for ASM groups, and may send either
409     PIM (s,g) joins or BGP Source Tree Join routes.  A transitioned
410     transit router may receive (*,g) PIM joins but only send source tree
411     joins after pulling Source Active A-D routes from RPs/RRs.

413     Similarly, a network currently running mLDP can be incrementally
414     transitioned to BGP signaling.
        Without the complication of ASM, any
415     router can be transitioned at any time, even without the restriction
416     of coordinated transition on a LAN.  It may receive mixed mLDP label
417     mapping or BGP updates from different downstream neighbors, and may
418     exchange either mLDP label mapping or BGP updates with its upstream
419     neighbors, depending on whether the neighbor is using BGP based
420     signaling or not.

422  2.  Specification

424  2.1.  BGP NLRIs and Attributes

426     The BGP Multiprotocol Extensions [RFC4760] allow BGP to carry routes
427     from multiple different "AFI/SAFIs".  This document defines a new
428     SAFI known as the MCAST-TREE SAFI with a value to be assigned by
429     IANA.  This SAFI is used along with the AFI of IPv4 (1) or IPv6 (2).

431     The MCAST-TREE NLRI defined below is carried in the BGP UPDATE
432     messages [RFC4271] using the BGP multiprotocol extensions [RFC4760]
433     with an AFI of IPv4 (1) or IPv6 (2) as assigned by IANA and the
434     MCAST-TREE SAFI with a value to be assigned by IANA.

436     The Next Hop field of the MP_REACH_NLRI attribute SHALL be interpreted
437     as an IPv4 address whenever the length of the Next Hop address is 4
438     octets, and as an IPv6 address whenever the length of the Next Hop
439     address is 16 octets.

441     The NLRI field in the MP_REACH_NLRI and MP_UNREACH_NLRI is a prefix
442     with a maximum length of 12 octets for IPv4 AFI and 36 octets for
443     IPv6 AFI.  The following is the format of the MCAST-TREE NLRI:

445                  +-----------------------------------+
446                  |    Route Type (1 octet)           |
447                  +-----------------------------------+
448                  |     Length (1 octet)              |
449                  +-----------------------------------+
450                  |  Route Type specific (variable)   |
451                  +-----------------------------------+

453     The Route Type field defines the encoding of the rest of the
454     MCAST-TREE NLRI (the Route Type specific MCAST-TREE NLRI).

456     The Length field indicates the length in octets of the Route Type
457     specific field of the MCAST-TREE NLRI.
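The three-field layout above (Route Type, Length, Route Type specific) can be sketched as a simple type-length-value codec. This is a minimal illustration rather than a complete BGP parser; the function names are hypothetical.

```python
import struct
from typing import Tuple


def encode_mcast_tree_nlri(route_type: int, route_specific: bytes) -> bytes:
    """Prefix the route-type-specific octets with Route Type and Length."""
    if not 0 <= route_type <= 0xFF:
        raise ValueError("Route Type must fit in one octet")
    if len(route_specific) > 0xFF:
        raise ValueError("Route Type specific field exceeds 255 octets")
    return struct.pack("!BB", route_type, len(route_specific)) + route_specific


def decode_mcast_tree_nlri(data: bytes) -> Tuple[int, bytes, bytes]:
    """Return (route_type, route_specific, remaining_octets)."""
    if len(data) < 2:
        raise ValueError("truncated NLRI header")
    route_type, length = struct.unpack("!BB", data[:2])
    end = 2 + length
    if len(data) < end:
        raise ValueError("truncated Route Type specific field")
    return route_type, data[2:end], data[end:]
```

Returning the remaining octets lets a caller walk several NLRIs packed back to back in one MP_REACH_NLRI or MP_UNREACH_NLRI attribute.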
459     The following new route types are defined:

461          3 - S-PMSI A-D Route for (x,g)
462          4 - Leaf A-D Route
463          5 - Source Active A-D Route
464          0x43 - S-PMSI A-D Route for C-multicast mLDP

466     Except for the Source Active A-D routes, the routes are to be
467     consumed by targeted upstream/downstream neighbors, and are not
468     propagated further.  This can be achieved by outbound filtering based
469     on the RTs that lead to the importation of the routes.

471     The Type-3/4 routes MAY carry a Tunnel Encapsulation Attribute (TEA)
472     [I-D.ietf-idr-tunnel-encaps].  The Type-0x43 route MUST carry a TEA.
473     When used for mLDP, the Type-4 route MUST carry a TEA.  Only the MPLS
474     tunnel type for the TEA is considered.  Others are outside the scope
475     of this document.

477  2.1.1.  S-PMSI A-D Route

479     Similar to what is defined in RFC 6514, an S-PMSI A-D Route Type
480     specific MCAST-TREE NLRI consists of the following, though it does not
481     have an RD:

483                  +-----------------------------------+
484                  | Multicast Source Length (1 octet) |
485                  +-----------------------------------+
486                  |  Multicast Source (variable)      |
487                  +-----------------------------------+
488                  |  Multicast Group Length (1 octet) |
489                  +-----------------------------------+
490                  |  Multicast Group   (variable)     |
491                  +-----------------------------------+
492                  |  Upstream Router's IP Address     |
493                  +-----------------------------------+

495     If the Multicast Source (or Group) field contains an IPv4 address,
496     then the value of the Multicast Source (or Group) Length field is 32.
497     If the Multicast Source (or Group) field contains an IPv6 address,
498     then the value of the Multicast Source (or Group) Length field is
499     128.

501     Usage of other values of the Multicast Source Length and Multicast
502     Group Length fields is outside the scope of this document.

504     There are two usages for the S-PMSI A-D route.  They are described in
505     Section 2.2.5 and Section 2.2.6 respectively.

507  2.1.2.
Leaf A-D Route

509     Similar to the Leaf A-D route in [RFC6514], a MCAST-TREE Leaf A-D
510     route's route key includes the corresponding S-PMSI NLRI, plus the
511     Originating Router's IP Address.  The difference is that there is no
        RD.

513                  +-----------------------------------+
514                  |            S-PMSI NLRI            |
515                  +-----------------------------------+
516                  |  Originating Router's IP Address  |
517                  +-----------------------------------+

519     For example, the entire NLRI of a Leaf A-D route for (x,g) tree is as
520     follows:

522        +-       +-----------------------------------+
523        |        |     Route Type - 4 (Leaf A-D)     |
524        |        +-----------------------------------+
525        |        |         Length (1 octet)          |
526        |     +- +-----------------------------------+ --+
527        |     |  |    Route Type - 3 (S-PMSI A-D)    |   |
528     L  |  L  |  +-----------------------------------+   | S
529     E  |  E  |  |         Length (1 octet)          |   | |
530     A  |  A  |  +-----------------------------------+   | P
531     F  |  F  |  | Multicast Source Length (1 octet) |   | M
532        |     |  +-----------------------------------+   | S
533     N  |  R  |  |    Multicast Source (variable)    |   | I
534     L  |  O  |  +-----------------------------------+   |
535     R  |  U  |  | Multicast Group Length (1 octet)  |   | N
536     I  |  T  |  +-----------------------------------+   | L
537        |  E  |  |    Multicast Group (variable)     |   | R
538        |     |  +-----------------------------------+   | I
539        |  K  |  |   Upstream Router's IP Address    |   |
540        |  E  |  +-----------------------------------+ --+
541        |  Y  |  |  Originating Router's IP Address  |
542        +-    +- +-----------------------------------+

544     Even though the MCAST-TREE Leaf A-D route is unsolicited, unlike the
545     Leaf A-D route for GTM in [RFC7524], it is encoded as if a
546     corresponding S-PMSI A-D route had been received.

548     When used for signaling mLDP tunnels, even though the Leaf A-D route
549     is unsolicited, unlike the "Route-type 0x44 Leaf A-D route for
550     C-multicast mLDP" as in [RFC7441], it is Route-type 4 and encoded as
551     if a corresponding S-PMSI A-D route had been received.

553  2.1.3.
Source Active A-D Route

555     Similar to what is defined in RFC 6514, a Source Active A-D Route Type
556     specific MCAST NLRI consists of the following:

558                  +-----------------------------------+
559                  | Multicast Source Length (1 octet) |
560                  +-----------------------------------+
561                  |   Multicast Source (variable)     |
562                  +-----------------------------------+
563                  |  Multicast Group Length (1 octet) |
564                  +-----------------------------------+
565                  |  Multicast Group   (variable)     |
566                  +-----------------------------------+

568     The definitions of the source/length and group/length fields are the
569     same as in the S-PMSI A-D routes.

571     Usage of Source Active A-D routes is described in Section 1.2.1.1.

573  2.1.4.  S-PMSI A-D Route for C-multicast mLDP

575     The route is used to signal upstream FEC for an MP2MP mLDP tunnel.
576     The route key includes the mLDP FEC and the Upstream Router's IP
577     Address field.  The encoding is similar to the same route in
578     [RFC7441], though there is no RD.

580  2.1.5.  Session Address Extended Community

582     For two BGP speakers to determine if they are directly connected,
583     each will advertise its local interface addresses, with a Session
584     Address Extended Community.  This is an Address Specific EC, with the
585     Global Admin Field set to the local address used for its multihop
586     sessions and the Local Admin Field set to the prefix length
587     corresponding to the interface's network mask.

589     For example, if a router has two interfaces with addresses
590     10.10.10.1/24 and 10.12.0.1/16 respectively (notice the different
591     network masks), and a loopback address 11.11.11.1/32 that is used for
592     BGP sessions, then it will advertise prefix 10.10.10.1/32 with a
593     Session Address EC 11.11.11.1:24 and 10.12.0.1/32 with a Session
594     Address EC 11.11.11.1:16.  If it also uses another loopback address
595     11.11.11.11/32 for other BGP sessions, then the routes will
596     additionally carry Session Address EC 11.11.11.11:24 and
597     11.11.11.11:16 respectively.
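The worked example above can be reproduced with a short sketch. It models each Session Address EC as the text form `session-address:prefix-length` used in the example rather than as wire octets; the function name and the dictionary representation are assumptions made for illustration.

```python
import ipaddress
from typing import Dict, Iterable, List


def session_address_advertisements(
    interfaces: Iterable[str], session_addresses: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each advertised host prefix to its Session Address ECs.

    interfaces: interface addresses with masks, e.g. "10.10.10.1/24".
    session_addresses: local addresses used for BGP sessions (loopbacks).
    """
    sessions = [ipaddress.ip_address(s) for s in session_addresses]
    advertisements: Dict[str, List[str]] = {}
    for text in interfaces:
        interface = ipaddress.ip_interface(text)
        # The interface address is advertised as a host prefix (/32 or /128).
        host_prefix = f"{interface.ip}/{interface.ip.max_prefixlen}"
        # Global Admin = session address, Local Admin = interface prefix length.
        advertisements[host_prefix] = [
            f"{session}:{interface.network.prefixlen}" for session in sessions
        ]
    return advertisements
```

Calling it with the two interfaces and both loopbacks from the example yields the four Session Address ECs listed in the text.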
   This achieves what the Address List TLV in LDP Address Messages
   achieves, and can also be used to indicate that a router supports the
   BGP multicast signaling procedures specified in this document.

   Only those interface addresses that will be used as resolved nexthops
   in the RIB need to be advertised with the Session Address EC.  For
   example, if the RPF lookup yields a resolved nexthop address A1, the
   router needs to find the BGP speaker that owns A1 through the
   (interface address, session address) mapping built from the interface
   address NLRI and the attached Session Address EC.  For comparison,
   LDP builds the same (interface address, session address) mapping from
   LDP Address Messages.

2.2.  Procedures

2.2.1.  Source Discovery for ASM

   When an FHR first receives a multicast packet addressed to an ASM
   group, it originates a Source Active route.  The route carries an
   IP/IPv6 Address Specific RT, with the Global Admin Field set to the
   group address and the Local Admin Field set to 0.  The route is
   advertised to its peers, who re-advertise it further based on the RTC
   mechanisms.  Note that typically the route is advertised only to the
   RRs.

   The FHR withdraws the Source Active route after a certain amount of
   time has passed since it last received a packet of the (s,g) flow.
   The amount of time to wait is a local matter.

2.2.2.  Originating Tree Join Routes

   Note that in this document, tree join routes are S-PMSI/Leaf A-D
   routes.

2.2.2.1.  (x,g) Multicast Tree

   When a router learns from IGMP/MLD or a downstream PIM/BGP peer that
   it needs to join a particular (s,g) tree, it determines the RPF
   nexthop address with respect to the source, following the same RPF
   procedures as defined for PIM.  It then finds the BGP router that
   advertised the nexthop address as one of its local addresses.
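
   The lookup from an RPF nexthop address to the BGP speaker that owns
   it can be sketched as follows.  This is a minimal illustration, not
   a definitive implementation: the function names and the
   representation of received routes as (host prefix, list of
   "session-address:prefix-length" strings) pairs are assumptions made
   for this example.

```python
import ipaddress


def build_address_map(advertisements):
    """Build {interface address: [session address, ...]}.

    advertisements -- iterable of (host_prefix, [ec, ...]) pairs, where
    each ec is an illustrative "session-address:prefix-length" string.
    """
    mapping = {}
    for host_prefix, ecs in advertisements:
        iface_addr = ipaddress.ip_interface(host_prefix).ip
        for ec in ecs:
            # The prefix length follows the last ':'; everything before
            # it is the session address (this also holds for IPv6 text).
            session, _, _prefix_len = ec.rpartition(":")
            mapping.setdefault(iface_addr, []).append(
                ipaddress.ip_address(session))
    return mapping


def find_upstream_speaker(mapping, rpf_nexthop):
    """Return the session address(es) of the speaker owning rpf_nexthop.

    An empty list means that no speaker advertised this address with a
    Session Address EC.
    """
    return mapping.get(ipaddress.ip_address(rpf_nexthop), [])


received = [
    ("10.10.10.1/32", ["11.11.11.1:24"]),
    ("10.12.0.1/32", ["11.11.11.1:16"]),
]
address_map = build_address_map(received)
speakers = find_upstream_speaker(address_map, "10.12.0.1")
unknown = find_upstream_speaker(address_map, "10.99.0.1")
```

   In this sketch an empty result stands for a nexthop whose owner did
   not advertise a Session Address EC; how a router reacts in that case
   is outside the scope of the example.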
   If the RPF neighbor supports MCAST-TREE SAFI, this router originates
   a Leaf A-D route.  Although it is unsolicited, it is constructed as
   if there was a corresponding S-PMSI A-D route.  The Upstream Router's
   IP Address field is set to the RPF neighbor's session address (learnt
   via the EC attached to the host route for the RPF nexthop address).
   An Address Specific RT corresponding to the session address is
   attached to the route, with the Global Administrative Field set to
   the session address and the Local Administrative Field set to 0.

   Similarly, when a router learns that it needs to join a
   bidirectional tree for a particular group, it determines the RPF
   neighbor with respect to the RPA.  If the neighbor supports
   MCAST-TREE SAFI, it originates a Leaf A-D route and advertises the
   route to the RPF neighbor (in case of EBGP or hop-by-hop IBGP), or to
   one or more RRs.

   When a router first learns that it needs to receive traffic for an
   ASM group, either because of a local (*,g) IGMP/MLD report or a
   downstream PIM (*,g) join, it originates an RTC route with the NLRI's
   AS field set to its AS number and the Route Target field set to an
   address based RT, with the Global Administrator field set to the
   group address and the Local Administrator field set to 0.  The route
   is advertised to its peers (most practically some RRs), so that the
   router can receive matching Source Active A-D routes.  Upon receiving
   the Source Active A-D routes, the router originates Leaf A-D routes
   as described above, as long as it still needs to receive traffic for
   the flows (i.e., the corresponding IGMP/MLD membership exists or a
   join from a downstream PIM/BGP neighbor exists).

   When a Leaf A-D route is originated by this router, it sets up
   corresponding forwarding state such that the expected incoming
   interface list includes all non-LAN interfaces directly connecting to
   the upstream neighbor.
   LAN interfaces are added upon receiving the corresponding S-PMSI A-D
   route (Section 2.2.5.2).  If the upstream neighbor is not directly
   connected, tunnels may be used; details are to be included in future
   revisions.

   When the upstream neighbor changes, the previously advertised Leaf
   A-D route is withdrawn.  If there is a new upstream neighbor, a new
   Leaf A-D route is originated, corresponding to the new neighbor.
   Because the NLRIs of the old and new Leaf A-D routes are different,
   make-before-break as well as MoFRR [RFC7431] can be achieved.

2.2.2.2.  BGP Inband Signaling for mLDP Tunnel

   The same mLDP procedures as defined in [RFC6388] are followed, except
   that where a label mapping message is sent in [RFC6388], a Leaf A-D
   route is sent if the upstream neighbor supports BGP based signaling.

2.2.3.  Receiving Tree Join Routes

   A router (auto-)configures Import RTs matching itself so that it can
   import tree join routes from its peers.  Note that in this document,
   tree join routes are S-PMSI/Leaf A-D routes.

   When a router receives a tree join route and imports it, it
   determines whether it needs to originate its own corresponding route
   and advertise it further upstream with respect to the source/RPA or
   mLDP tunnel root.  If this router is the FHR, is on the RPL, or is
   the tunnel root, then it does not need to.  Otherwise the procedures
   in Section 2.2.2 are followed.

   Additionally, the router sets up its corresponding forwarding state
   such that traffic will be sent to the downstream neighbor, and
   received from the downstream neighbor in case of a bidirectional
   tree/tunnel.  If the downstream neighbor is not directly connected,
   tunnels may be used; details are to be included in future revisions.

2.2.4.  Withdrawal of Tree Join Routes

   For a particular tree or tunnel, if a downstream neighbor withdraws
   its Leaf A-D route, the neighbor is removed from the corresponding
   forwarding state.
   If all downstream neighbors withdraw their tree join routes and this
   router no longer has local receivers, it withdraws the tree join
   routes that it previously originated.

   As mentioned earlier, when the upstream neighbor changes, the
   previously advertised Leaf A-D route is also withdrawn.  The
   corresponding incoming interfaces are also removed from the
   corresponding forwarding state.

2.2.5.  LAN Procedures for (x,g) Unidirectional Tree

   For a unidirectional (x,g) multicast tree, if there is a LAN
   interface connecting to the downstream neighbor, it MAY be preferred
   over non-LAN interfaces, but an S-PMSI A-D route MUST be originated
   to facilitate the analog of the Assert process (Section 2.2.5.1).

2.2.5.1.  Originating S-PMSI A-D Routes

   If this router chooses to use a LAN interface to send traffic to its
   neighbors for a particular (s,g) or (*,g) flow, it MUST announce that
   by originating a corresponding S-PMSI A-D route.  The Tunnel Type in
   the PMSI Tunnel Attribute (PTA) is set to 0 (No tunnel information
   present).  The LAN interface is identified by an IP address specific
   RT, with the Global Administrative Field set to the LAN interface's
   address prefix and the Local Administrative Field set to the prefix
   length.  The RT also serves the purpose of restricting the importing
   of the route to the routers on the LAN.  An operator MUST ensure that
   RTs encoded as above are not used for other purposes.  In practice,
   that should not be an unreasonable requirement.

   If multiple LAN interfaces are to be used (to reach different sets of
   neighbors), then the route will include multiple RTs, one for each
   used LAN interface as described above.

   The S-PMSI A-D routes may also be used to announce tunnels that could
   be used to send traffic to downstream neighbors that are not directly
   connected.  Details may be added in future revisions.

2.2.5.2.  Receiving S-PMSI A-D Routes

   A router (auto-)configures an Import RT for each of its LAN
   interfaces over which BGP is used for multicast signaling.  The
   construction of the RT is described in the previous section.

   When a router R1 imports an S-PMSI A-D route for flow (x,g) from
   router R2, R1 checks to see if it is also originating an S-PMSI A-D
   route with the same NLRI except the Upstream Router's IP Address
   field.  When a router R1 originates an S-PMSI A-D route, it checks to
   see if it also has installed an S-PMSI A-D route, from some other
   router R2, with the same NLRI except the Upstream Router's IP Address
   field.  In either case, R1 checks to see if the two routes have an RT
   in common and the RT is encoded as in Section 2.2.5.1.  If so, then
   there is a LAN attached to both R1 and R2, and both routers are
   prepared to send (x,g) traffic onto that LAN.  This kicks off the
   assert procedure to elect a winner: the one with the highest
   Upstream Router's IP Address in the NLRI wins.  An assert loser will
   not include the corresponding LAN interface in its outgoing interface
   list, but it keeps the S-PMSI A-D route that it originates.

   If this router does not have a matching S-PMSI route of its own with
   some common RTs, and the originator of the received S-PMSI route is a
   chosen upstream neighbor for the corresponding flow, then this router
   updates its forwarding state to include the LAN interface in the
   incoming interface list.  When the last S-PMSI route with an RT
   matching the LAN is withdrawn later, the LAN interface is removed
   from the incoming interface list.

   Note that a downstream router on the LAN does not participate in the
   assert procedure.  It adds/keeps the LAN interface in the expected
   incoming interfaces as long as its chosen upstream peer originates
   the S-PMSI A-D route.  It does not switch to the assert winner as its
   upstream.
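
   The RT matching and winner election described in this section can be
   sketched in Python as follows.  This is a hedged illustration under
   stated assumptions: the function names, the "prefix:length" string
   form of the RT, and the sample addresses are hypothetical and are
   not taken from the specification.

```python
import ipaddress


def lan_route_target(interface_cidr):
    """Illustrative "prefix:length" form of the RT identifying a LAN."""
    network = ipaddress.ip_interface(interface_cidr).network
    return f"{network.network_address}:{network.prefixlen}"


def shares_lan(rts_a, rts_b):
    """True if two S-PMSI A-D routes have a LAN RT in common."""
    return bool(set(rts_a) & set(rts_b))


def assert_winner(upstream_addresses):
    """Return the highest Upstream Router's IP Address (the winner)."""
    # Compare numerically, not as strings: 11.11.11.12 > 11.11.11.2.
    return max(upstream_addresses, key=ipaddress.ip_address)


# R1 and R2 attach to the same LAN with different interface addresses.
r1_rts = [lan_route_target("10.10.10.2/24")]
r2_rts = [lan_route_target("10.10.10.12/24")]

# Hypothetical Upstream Router's IP Address values carried in the NLRIs.
candidates = ["11.11.11.2", "11.11.11.12"]
winner = assert_winner(candidates) if shares_lan(r1_rts, r2_rts) else None
```

   Only routers that originate an S-PMSI A-D route for the flow take
   part in this comparison; a downstream router simply follows its
   chosen upstream peer.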
   An assert loser MAY keep sending joins upstream based on local policy
   even if it has no other downstream neighbors (this could be used for
   fast switchover in case the assert winner fails).

2.2.6.  Distributing Label for Upstream Traffic for Bidirectional Tree/
        Tunnel

   For MP2MP mLDP tunnels or labeled (*,g) bidirectional trees, an
   upstream router needs to advertise a label to all its downstream
   neighbors so that the downstream neighbors can send traffic to it.

   For MP2MP mLDP tunnels, the same procedures for mLDP are followed
   except that instead of MP2MP-U Label Mapping messages, S-PMSI A-D
   Routes for C-multicast mLDP are used.

   For labeled (*,g) bidirectional trees, for a Leaf A-D route received
   from a downstream neighbor, a corresponding S-PMSI A-D route is sent
   back to the downstream router.

   In both cases, a single S-PMSI A-D route is originated for each tree
   from this router, but with multiple RTs (one for each downstream
   neighbor on the tree).  A TEA specifies a label allocated by the
   upstream router for its downstream neighbors to send traffic with.
   Note that this is still a "downstream allocated" label (the upstream
   router is "downstream" from the traffic direction point of view).

   The S-PMSI routes do not carry a PTA, unless a P2MP tunnel is used to
   reach downstream neighbors.  Such a use case is out of scope for this
   document for now and may be specified in the future.

3.  Security Considerations

   This document does not introduce new security risks.

4.  Acknowledgements

   The authors thank Marco Rodrigues for his initial idea/ask of using
   BGP for multicast signaling beyond MVPN.  We thank Eric Rosen for his
   questions, suggestions, and help finding solutions to some issues.
   We also thank Luay Jalil and James Uttaro for their comments and
   support for the work.

5.  References

5.1.  Normative References

   [I-D.ietf-idr-tunnel-encaps]
              Patel, K., Velde, G., and S. Ramachandra, "The BGP Tunnel
              Encapsulation Attribute", draft-ietf-idr-tunnel-encaps-14
              (work in progress), September 2019.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
              "Protocol Independent Multicast - Sparse Mode (PIM-SM):
              Protocol Specification (Revised)", RFC 4601,
              DOI 10.17487/RFC4601, August 2006.

   [RFC4684]  Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,
              R., Patel, K., and J. Guichard, "Constrained Route
              Distribution for Border Gateway Protocol/MultiProtocol
              Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual
              Private Networks (VPNs)", RFC 4684, DOI 10.17487/RFC4684,
              November 2006.

   [RFC5015]  Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano,
              "Bidirectional Protocol Independent Multicast (BIDIR-
              PIM)", RFC 5015, DOI 10.17487/RFC5015, October 2007.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

   [RFC7441]  Wijnands, IJ., Rosen, E., and U. Joorde, "Encoding
              Multipoint LDP (mLDP) Forwarding Equivalence Classes
              (FECs) in the NLRI of BGP MCAST-VPN Routes", RFC 7441,
              DOI 10.17487/RFC7441, January 2015.

5.2.  Informative References

   [I-D.ietf-bess-mvpn-pe-ce]
              Patel, K., Rosen, E., and Y. Rekhter, "BGP as an MVPN PE-
              CE Protocol", draft-ietf-bess-mvpn-pe-ce-01 (work in
              progress), October 2015.

   [I-D.wijnands-bier-mld-lan-election]
              Wijnands, I., Pfister, P., and Z. Zhang, "Generic
              Multicast Router Election on LAN's", draft-wijnands-bier-
              mld-lan-election-01 (work in progress), July 2016.
   [RFC6826]  Wijnands, IJ., Ed., Eckert, T., Leymann, N., and M.
              Napierala, "Multipoint LDP In-Band Signaling for Point-to-
              Multipoint and Multipoint-to-Multipoint Label Switched
              Paths", RFC 6826, DOI 10.17487/RFC6826, January 2013.

   [RFC7431]  Karan, A., Filsfils, C., Wijnands, IJ., Ed., and B.
              Decraene, "Multicast-Only Fast Reroute", RFC 7431,
              DOI 10.17487/RFC7431, August 2015.

   [RFC7938]  Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of
              BGP for Routing in Large-Scale Data Centers", RFC 7938,
              DOI 10.17487/RFC7938, August 2016.

Authors' Addresses

   Zhaohui Zhang
   Juniper Networks

   EMail: zzhang@juniper.net


   Lenny Giuliano
   Juniper Networks

   EMail: lenny@juniper.net


   Keyur Patel
   Arrcus

   EMail: keyur@arrcus.com


   IJsbrand Wijnands
   Cisco Systems

   EMail: ice@cisco.com


   Mankamana Mishra
   Cisco Systems

   EMail: mankamis@cisco.com


   Arkadiy Gulko
   Refinitiv

   EMail: arkadiy.gulko@refinitiv.com