idnits 2.17.1 draft-zzhang-bess-bgp-multicast-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) == There are 8 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: A network currently running PIM can be incrementally transitioned to BGP based multicast. At any time, a router supporting BGP based multicast can use PIM with some neighbors (upstream or downstream) and BGP with some other neighbors. PIM and BGP MUST not be used simultaneously between two neighbors for multicast purpose, and routers connected to the same LAN MUST be transitioned during the same maintenance window. -- The document date (March 13, 2017) is 2598 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC7438' is mentioned on line 260, but not defined == Missing Reference: 'RFC5492' is mentioned on line 311, but not defined == Missing Reference: 'RFC4760' is mentioned on line 313, but not defined == Missing Reference: 'RFC7524' is mentioned on line 511, but not defined == Missing Reference: 'RFC6388' is mentioned on line 654, but not defined == Unused Reference: 'RFC2119' is defined on line 808, but no explicit reference was found in the text == Unused Reference: 'RFC4601' is defined on line 813, but no explicit reference was found in the text == Unused Reference: 'RFC5015' is defined on line 826, but no explicit reference was found in the text == Outdated reference: A later version (-22) exists of draft-ietf-idr-tunnel-encaps-03 ** Obsolete normative reference: RFC 4601 (Obsoleted by RFC 7761) == Outdated reference: A later version (-03) exists of draft-wijnands-bier-mld-lan-election-01 Summary: 2 errors (**), 0 flaws (~~), 13 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 BESS Z. Zhang 3 Internet-Draft Juniper Networks 4 Intended status: Standards Track K. Patel 5 Expires: September 14, 2017 Arrcus 6 I. Wijnands 7 Cisco Systems 8 A. Gulko 9 Thomson Reuters 10 March 13, 2017 12 BGP Based Multicast 13 draft-zzhang-bess-bgp-multicast-01 15 Abstract 17 This document specifies a BGP address family and related procedures 18 that allow BGP to be used for setting up multicast distribution 19 trees. 
This document also specifies procedures that enable BGP to be 20 used for multicast source discovery, and for showing interest in 21 receiving particular multicast flows. Taken together, these 22 procedures allow BGP to be used as a replacement for other multicast 23 routing protocols, such as PIM or mLDP. The BGP procedures specified 24 here are based on the BGP multicast procedures that were originally 25 designed for use by providers of Multicast Virtual Private Network 26 service. 28 Requirements Language 30 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 31 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 32 document are to be interpreted as described in RFC2119. 34 Status of This Memo 36 This Internet-Draft is submitted in full conformance with the 37 provisions of BCP 78 and BCP 79. 39 Internet-Drafts are working documents of the Internet Engineering 40 Task Force (IETF). Note that other groups may also distribute 41 working documents as Internet-Drafts. The list of current Internet- 42 Drafts is at http://datatracker.ietf.org/drafts/current/. 44 Internet-Drafts are draft documents valid for a maximum of six months 45 and may be updated, replaced, or obsoleted by other documents at any 46 time. It is inappropriate to use Internet-Drafts as reference 47 material or to cite them other than as "work in progress." 48 This Internet-Draft will expire on September 14, 2017. 50 Copyright Notice 52 Copyright (c) 2017 IETF Trust and the persons identified as the 53 document authors. All rights reserved. 55 This document is subject to BCP 78 and the IETF Trust's Legal 56 Provisions Relating to IETF Documents 57 (http://trustee.ietf.org/license-info) in effect on the date of 58 publication of this document. Please review these documents 59 carefully, as they describe your rights and restrictions with respect 60 to this document. 
Code Components extracted from this document must 61 include Simplified BSD License text as described in Section 4.e of 62 the Trust Legal Provisions and are provided without warranty as 63 described in the Simplified BSD License. 65 Table of Contents 67 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 68 1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . 3 69 1.1.1. Native/unlabeled Multicast . . . . . . . . . . . . . 3 70 1.1.2. Labeled Multicast . . . . . . . . . . . . . . . . . . 4 71 1.2. Overview . . . . . . . . . . . . . . . . . . . . . . . . 4 72 1.2.1. (x,g) Multicast . . . . . . . . . . . . . . . . . . . 4 73 1.2.1.1. Source Discovery for ASM . . . . . . . . . . . . 5 74 1.2.1.2. ASM Shared-tree-only Mode . . . . . . . . . . . . 6 75 1.2.1.3. Integration with BGP-MVPN . . . . . . . . . . . . 6 76 1.2.2. BGP Inband Signaling for mLDP Tunnel . . . . . . . . 7 77 1.2.3. BGP Sessions . . . . . . . . . . . . . . . . . . . . 7 78 1.2.4. LAN and Parallel Links . . . . . . . . . . . . . . . 8 79 1.2.5. Transition . . . . . . . . . . . . . . . . . . . . . 9 80 2. Specification . . . . . . . . . . . . . . . . . . . . . . . . 9 81 2.1. BGP NLRIs and Attributes . . . . . . . . . . . . . . . . 9 82 2.1.1. S-PMSI A-D Route . . . . . . . . . . . . . . . . . . 10 83 2.1.2. Leaf A-D Route . . . . . . . . . . . . . . . . . . . 11 84 2.1.3. Source Active A-D Route . . . . . . . . . . . . . . . 12 85 2.1.4. S-PMSI A-D Route for C-multicast mLDP . . . . . . . . 12 86 2.1.5. Session Address Extended Community . . . . . . . . . 12 87 2.2. Procedures . . . . . . . . . . . . . . . . . . . . . . . 13 88 2.2.1. Source Discovery for ASM . . . . . . . . . . . . . . 13 89 2.2.2. Originating Tree Join Routes . . . . . . . . . . . . 13 90 2.2.2.1. (x,g) Multicast Tree . . . . . . . . . . . . . . 13 91 2.2.2.2. BGP Inband Signaling for mLDP Tunnel . . . . . . 14 92 2.2.3. Receiving Tree Join Routes . . . . . . . . . . . . . 15 93 2.2.4. 
Withdrawal of Tree Join Routes . . . . . . . . . . . . 15 94 2.2.5. LAN procedures for (x,g) Unidirectional Tree . . . . 15 95 2.2.5.1. Originating S-PMSI A-D Routes . . . . . . . . . . 15 96 2.2.5.2. Receiving S-PMSI A-D Routes . . . . . . . . . . . 16 97 2.2.6. Distributing Label for Upstream Traffic for 98 Bidirectional Tree/Tunnel . . . . . . . . . . . . . . 17 99 3. Security Considerations . . . . . . . . . . . . . . . . . . . 17 100 4. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 17 101 5. References . . . . . . . . . . . . . . . . . . . . . . . . . 17 102 5.1. Normative References . . . . . . . . . . . . . . . . . . 17 103 5.2. Informative References . . . . . . . . . . . . . . . . . 19 104 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19 106 1. Introduction 108 1.1. Motivation 110 This section provides some motivation for BGP signaling for native 111 and labeled multicast. One target deployment would be a Data Center 112 that requires multicast but uses BGP as its only routing protocol 113 [RFC7938]. In such a deployment, it would be desirable to support 114 multicast by extending the deployed routing protocol, without 115 requiring the deployment of tree building protocols such as PIM, 116 mLDP, RSVP-TE P2MP, and without requiring an IGP. 118 Additionally, compared to PIM, BGP based signaling has several 119 advantages as described in the following section, and may be desired 120 in non-DC deployment scenarios as well. 122 1.1.1. Native/unlabeled Multicast 124 Protocol Independent Multicast (PIM) has been the prevailing 125 multicast protocol for many years. Despite its success, it has two 126 drawbacks: 128 o The ASM model, which is prevalent, introduces complexity in the 129 following areas: source discovery procedures, need for Rendezvous 130 Points (RPs) and group-to-RP mappings, need to switch between RP- 131 rooted trees and source-rooted trees, etc. 133 o Periodic protocol state refreshes due to its soft-state nature.
 135 While PIM-SSM removes the complexity of PIM-ASM, it requires that 136 multicast sources are known a priori. There has not been a good way 137 of discovering sources, so its deployment has been limited. PIM-PORT 138 (PIM over Reliable Transport) solves the soft state issue, though its 139 deployment has also been limited for two reasons: 141 o It does not remove the ASM complexities. 143 o In many of the scenarios where reliable transport is deemed 144 important, BGP-based multicast (e.g. BGP-MVPN) has been used 145 instead of PORT. 147 Partly because of the above mentioned problems, some Data Center 148 operators have been avoiding deploying multicast in their networks. 150 BGP-MVPN [RFC6514] uses BGP to signal VPN customer multicast state 151 over provider networks. It removes the above mentioned problems from 152 the SP environment, and the deployment experiences have been 153 encouraging. While RFC 6514 makes it possible for an SP to provide 154 MVPN service without running PIM on its backbone, that RFC still 155 assumes that PIM (or mLDP) runs on the PE-CE links. [draft-ietf-bess- 156 mvpn-pe-ce] adapts the concept of BGP-MVPN to PE-CE links so that the 157 use of PIM on the PE-CE links can be eliminated (though the PIM-ASM 158 complexities still remain in the customer network), and this 159 document extends it further to general topologies, so that the BGP 160 procedures can be run on any router, as a replacement for PIM or mLDP. 162 With that, PIM can be completely eliminated from the network. PIM 163 soft state is replaced by BGP hard state. For ASM, source specific 164 trees are set up directly after simpler source discovery (data driven 165 on FHRs and control driven elsewhere), all based on BGP. All the 166 complexities related to source discovery and shared/source tree 167 switch are also eliminated. Additionally, the trees can be set up 168 with MPLS labels, with just minor enhancements in the signaling. 170 1.1.2. 
Labeled Multicast 172 There could be two forms of labeled multicast signaled by BGP. The 173 first one is labeled (x,g) multicast where 'x' stands for either 's' 174 or '*'. Basically, it is a BGP-signaled multicast tree as 175 described in the previous section, but with labels. The second one is for 176 mLDP tunnels with BGP signaling in part or whole through a BGP 177 domain. 179 For both cases, BGP is used because other label distribution 180 mechanisms like mLDP may not be desired by some operators. For 181 example, a DC operator may prefer to have a BGP-only deployment. 183 1.2. Overview 185 1.2.1. (x,g) Multicast 187 PIM-like functionality is provided, using BGP-based join/prune 188 signaling and BGP-based source discovery for ASM. The BGP-based join 189 signaling supports both labeled multicast and IP multicast. 191 The same RPF procedures as in PIM are used for each router to 192 determine the RPF neighbor for a particular source or RPA (in case of 193 Bidirectional Tree). Except in the Bidirectional Tree case and a 194 special case described in Section 1.2.1.2, no (*,G) join is used - 195 LHR routers discover the sources for ASM and then join towards the 196 sources directly. Data driven mechanisms like PIM Assert are replaced 197 by control driven mechanisms (Section 1.2.4). 199 The joins are carried in BGP Updates with C-MCAST SAFI defined in 200 [draft-ietf-bess-mvpn-pe-ce] and S-PMSI/Leaf A-D routes defined in 201 this document. The updates are targeted at the upstream neighbor by 202 use of Route Targets. [Note - an earlier version of this draft used 203 C-multicast routes to send joins. We're now switching to S-PMSI/Leaf 204 routes for three reasons. a) when the routes go through RRs, we have 205 to distinguish different routes based on upstream router and 206 downstream router. This leads to Leaf routes. b) for labeled 207 bidirectional trees, we need to signal "upstream fec". S-PMSI suits 208 this very well. 
c) we may want to allow the option of setting up 209 trees from the roots instead of from the leaves. S-PMSI suits that 210 very well.] 212 If the BGP updates carry labels (via Tunnel Encapsulation Attribute 213 [I-D.ietf-idr-tunnel-encaps]), then (s,g) multicast traffic can use 214 the labels. This is very similar to mLDP Inband Signaling [RFC6826], 215 except that there are no corresponding "mLDP tunnels" for the PIM 216 trees. Similar to mLDP, labeled traffic on transit LANs is point to 217 point. Of course, traffic sent to receivers on a LAN by a LHR is 218 native multicast. 220 For labeled bidirectional (*,g) trees, downstream traffic (away from 221 the RPA) can be forwarded as in the (s,g) case. For upstream traffic 222 (towards RPA), the upstream neighbor needs to advertise a label for 223 its downstream neighbors. The label that the upstream neighbor 224 advertises to its own upstream is the same one that it advertises to its 225 downstreams, using an S-PMSI A-D route. 227 1.2.1.1. Source Discovery for ASM 229 This document does not support ASM via shared trees (aka RP Tree, or 230 RPT) with one exception discussed in the next section. Instead, 231 FHRs, LHRs, and optionally RRs work together to propagate/discover 232 source information via control plane and LHRs join source specific 233 Shortest Path Trees (SPT) directly. 235 A FHR originates Source Active A-D routes upon discovering sources 236 for particular flows and advertises them to its peers. It is desired 237 that the SA routes only reach LHRs that are interested in receiving 238 the traffic. To achieve that, the SA routes carry an IPv4 or IPv6 239 address specific Route Target. The Global Administrator field is set 240 to the group address of the flow, and the Local Administrator field is 241 set to 0. 
An LHR advertises Route Target Membership routes, with the 242 Route Target field in the NLRI set according to the groups it wants 243 to receive traffic for, encoded in the same way that a FHR encodes the Route Target in its 244 Source Active routes. The propagation of the SA routes is subject to 245 cooperative export filtering as specified in [RFC4684] and referred 246 to as the RTC mechanism in this document. That way, the LHR only 247 receives Source Active routes for groups that it is interested in. 249 Typically, a set of RRs is used; they maintain all Source Active 250 routes but only distribute them to interested LHRs on demand (upon 251 receiving corresponding Route Target Membership routes, which are 252 triggered on LHRs when they receive IGMP/MLD membership reports). The 253 rest of the document assumes that RRs are used, even though that is 254 not required. 256 1.2.1.2. ASM Shared-tree-only Mode 258 It may be desired that only a shared tree is used to distribute all 259 traffic for a particular ASM group from its RP to all LHRs, as 260 described in Section 4.1 "PIM Shared Tree Forwarding" of [RFC7438]. 261 This will significantly cut down the number of trees and works out 262 very well in certain deployment scenarios. For example, all the 263 sources could be connected to the RP, or clustered close to the RP. 264 In the latter case, either the paths from FHRs to the RP do not 265 intersect the shared tree so native forwarding can be used between 266 the FHRs and the RP, or other means outside of this document could be 267 used to forward traffic from FHRs to the RP. 269 For native forwarding from FHRs to the RP, SA routes may be used to 270 announce the sources so that the RP can join source specific trees to 271 pull traffic, but the group specific Route Target is not needed. The 272 LHRs do not advertise the group specific Route Target Membership 273 routes as they do not need the SA routes.
 275 To establish the shared tree, (*,g) Leaf A-D routes are used as in 276 the bidirectional tree case, though no forwarding state is 277 established to forward traffic from downstream neighbors. 279 1.2.1.3. Integration with BGP-MVPN 281 For each VPN, Source Active route distribution in that VPN does 282 not have to involve PEs at all, unless there are sources/receivers 283 directly connected to some PEs, and these routes are independent of MVPN SA 284 routes. For example, FHRs and LHRs establish BGP sessions with RRs 285 of that particular VPN for the purpose of SA distribution. 287 After source discovery, BGP multicast signaling is done from LHRs 288 towards the sources. When the signaling reaches an egress PE, BGP- 289 MVPN signaling takes over, as if a PIM (s,g) join/prune was received 290 on the PE-CE interface. When the BGP-MVPN signaling reaches the 291 ingress PE, BGP multicast signaling as specified in this document 292 takes over, similar to how BGP-MVPN triggers PIM (s,g) join/prune on 293 PE-CE interfaces. 295 1.2.2. BGP Inband Signaling for mLDP Tunnel 297 Part of an (or the whole) mLDP tunnel can also be signaled via BGP 298 and seamlessly integrated with the rest of the mLDP tunnel signaled 299 natively via mLDP. All the procedures are similar to mLDP except 300 that the signaling is done via BGP. The mLDP FEC is encoded as the 301 BGP NLRI, with C-MCAST SAFI and S-PMSI/Leaf A-D Routes for 302 C-multicast mLDP defined in this document. The Leaf A-D routes 303 correspond to mLDP Label Mapping messages, and the S-PMSI A-D routes 304 are used to signal upstream FEC for MP2MP mLDP tunnels, similar to 305 the bidirectional (*,g) case. 307 1.2.3. BGP Sessions 309 As specified in [draft-ietf-bess-mvpn-pe-ce-00], in order for two BGP 310 speakers to exchange C-MCAST NLRI, they must use BGP Capabilities 311 Advertisement [RFC5492] to ensure that they both are capable of 312 properly processing the C-MCAST NLRI. 
This is done as specified in 313 [RFC4760], by using capability code 1 (multiprotocol BGP) with an 314 AFI of IPv4 (1) or IPv6 (2) and a SAFI of C-MCAST with a value to be 315 assigned by IANA. 317 How the BGP peer sessions are provisioned, whether EBGP or IBGP, 318 whether statically, automatically (e.g., based on IGP neighbor 319 discovery), or programmably via an external controller, is outside 320 the scope of this document. 322 In case of IBGP, every router could peer with Route 323 Reflectors, or hop by hop IBGP sessions could be used to exchange 324 C-MCAST NLRIs for joins. In the latter case, unless desired 325 otherwise for reasons outside of the scope of this document, the hop 326 by hop IBGP sessions SHOULD only be used to exchange C-MCAST NLRIs. 328 When multihop BGP is used, a router advertises its local interface 329 addresses, for the same purposes that the Address List TLV in LDP 330 serves. This is achieved by advertising the interface addresses as 331 host prefixes with IPv4/v6 Address Specific ECs corresponding to the 332 router's local addresses used for its BGP sessions (Section 2.1.5). 334 Because the BGP Capability Advertisement is only between two peers, 335 when the sessions are only via RRs, a router needs another way to 336 determine if its neighbor is capable of signaling multicast via BGP. 337 The interface address advertisement can be used for that purpose - 338 the inclusion of a Session Address EC indicates that the BGP speaker 339 identified in the EC supports the C-MCAST NLRI. 341 FHRs and LHRs may also establish BGP sessions to some Route 342 Reflectors for source discovery purposes (Section 1.2.1.1). 344 With the traditional PIM, the FHRs and LHRs refer to the PIM DRs on 345 the source or receiver networks. With BGP based multicast, PIM may 346 not be running at all, and the FHRs and LHRs refer to the IGMP/MLD 347 queriers, or the DF elected per [I-D.wijnands-bier-mld-lan-election].
 348 Alternatively, if it is known that a network only has senders then no 349 IGMP/MLD or DF election is needed - any router may generate SA 350 routes. That will not cause any issue other than redundant SA 351 routes being originated. 353 1.2.4. LAN and Parallel Links 355 There could be parallel links between two BGP peers. A single multi- 356 hop session, whether IBGP or EBGP, between loopback addresses may be 357 used. Except for LAN interfaces in case of unlabeled (x,g) 358 unidirectional trees (note that transit LAN interfaces are not 359 supported for BGP signaled (*,g) bidirectional trees and that, for mLDP 360 tunnels, traffic on a transit LAN is point to point between neighbors), 361 any link between the two peers can be automatically used by a 362 downstream peer to receive traffic from the upstream peer, and it is 363 for the upstream peer to decide which link to use. If one of the 364 links goes down, the upstream peer switches to a different link and 365 there is no change needed on the downstream peer. 367 For unlabeled (x,g) unidirectional trees, the upstream peer MAY 368 prefer LAN interfaces to send traffic, since multiple downstream 369 peers may be reached simultaneously, or it may make a decision based 370 on local policy, e.g., for load balancing purposes. Because different 371 downstream peers might choose different upstream peers for RPF, when 372 an upstream peer decides to use a LAN interface to send traffic, it 373 originates an S-PMSI A-D route indicating that one or more LAN 374 interfaces will be used. The route carries Route Targets specific to 375 the LANs so that all the peers on the LANs import the route. If more 376 than one router originates the route specifying the same LAN for the 377 same (s,g) or (*,g) flow, then an assert procedure based on the S-PMSI 378 A-D routes takes place and the assert losers stop sending traffic to the 379 LAN. 381 1.2.5. 
Transition 383 A network currently running PIM can be incrementally transitioned to 384 BGP based multicast. At any time, a router supporting BGP based 385 multicast can use PIM with some neighbors (upstream or downstream) 386 and BGP with some other neighbors. PIM and BGP MUST NOT be used 387 simultaneously between two neighbors for multicast purposes, and 388 routers connected to the same LAN MUST be transitioned during the 389 same maintenance window. 391 In case of PIM-SSM, any router can be transitioned at any time 392 (except that all routers on a LAN must be transitioned together). It may 393 receive source tree joins from a mixed set of BGP and PIM downstream 394 neighbors and send source tree joins to its upstream neighbor using 395 either PIM or BGP signaling. 397 In case of PIM-ASM, the RPs are first upgraded to support BGP based 398 multicast. They learn sources either via PIM procedures from PIM 399 FHRs, or via Source Active A-D routes from BGP FHRs. In the former 400 case, the RPs can originate proxy Source Active A-D routes. There 401 may be a mixed set of RPs/RRs - some capable of traditional PIM 402 RP functionality in addition to redistributing SA routes, while others only redistribute SA routes. 404 Then any router can be transitioned incrementally. A transitioned 405 LHR router will pull Source Active A-D routes from the RPs/RRs when 406 it receives IGMP/MLD (*,G) joins for ASM groups, and may send either 407 PIM (s,g) joins or BGP Source Tree Join routes. A transitioned 408 transit router may receive (*,g) PIM joins but only send source tree 409 joins after pulling Source Active A-D routes from RPs/RRs. 411 Similarly, a network currently running mLDP can be incrementally 412 transitioned to BGP signaling. Without the complication of ASM, any 413 router can be transitioned at any time, even without the restriction 414 of coordinated transition on a LAN. 
It may receive mixed mLDP label 415 mapping or BGP updates from different downstream neighbors, and may 416 exchange either mLDP label mapping or BGP updates with its upstream 417 neighbors, depending on whether the neighbor is using BGP based signaling 418 or not. 420 2. Specification 422 2.1. BGP NLRIs and Attributes 424 The C-MCAST SAFI defined in [I-D.ietf-bess-mvpn-pe-ce] is used, but 425 new route types are used as defined in this document. 427 3 - S-PMSI A-D Route for (x,g) 428 4 - Leaf A-D Route 429 5 - Source Active A-D Route 430 0x43 - S-PMSI A-D Route for C-multicast mLDP 432 Except for the Source Active A-D routes, the routes are to be 433 consumed by targeted upstream/downstream neighbors, and are not 434 propagated further. This can be achieved by outbound filtering based 435 on the RTs that lead to the importation of the routes. 437 The Type-3/4 routes MAY carry a Tunnel Encapsulation Attribute (TEA) 438 [I-D.ietf-idr-tunnel-encaps]. The Type-0x43 route MUST carry a TEA. 439 When used for mLDP, the Type-4 route MUST carry a TEA. Only the MPLS 440 tunnel type for the TEA is considered. Others are outside the scope 441 of this document. 443 2.1.1. S-PMSI A-D Route 445 Similar to what is defined in RFC 6514, an S-PMSI A-D Route Type specific 446 C-MCAST NLRI consists of the following, though it does not have an 447 RD: 449 +-----------------------------------+ 450 | Multicast Source Length (1 octet) | 451 +-----------------------------------+ 452 | Multicast Source (variable) | 453 +-----------------------------------+ 454 | Multicast Group Length (1 octet) | 455 +-----------------------------------+ 456 | Multicast Group (variable) | 457 +-----------------------------------+ 458 | Upstream Router's IP Address | 459 +-----------------------------------+ 461 If the Multicast Source (or Group) field contains an IPv4 address, 462 then the value of the Multicast Source (or Group) Length field is 32. 
 463 If the Multicast Source (or Group) field contains an IPv6 address, 464 then the value of the Multicast Source (or Group) Length field is 465 128. 467 Usage of other values of the Multicast Source Length and Multicast 468 Group Length fields is outside the scope of this document. 470 There are two usages for the S-PMSI A-D route. They're described in 471 Section 2.2.5 and Section 2.2.6 respectively. 473 2.1.2. Leaf A-D Route 475 Similar to the Leaf A-D route in [RFC6514], a C-MCAST Leaf A-D 476 route's route key includes the corresponding S-PMSI NLRI, plus the 477 Originating Router's IP Addr. The difference is that there is no RD. 479 +-----------------------------------+ 480 | S-PMSI NLRI | 481 +-----------------------------------+ 482 | Originating Router's IP Address | 483 +-----------------------------------+ 485 For example, the entire NLRI of a Leaf A-D route for an (x,g) tree is as 486 follows: 488 +- +-----------------------------------+ 489 | | Route Type - 4 (Leaf A-D) | 490 | +-----------------------------------+ 491 | | Length (1 octet) | 492 | +- +-----------------------------------+ --+ 493 | | | Route Type - 3 (S-PMSI A-D) | | 494 L | L | +-----------------------------------+ | S 495 E | E | | Length (1 octet) | | | 496 A | A | +-----------------------------------+ | P 497 F | F | | Multicast Source Length (1 octet) | | M 498 | | +-----------------------------------+ | S 499 N | R | | Multicast Source (variable) | | I 500 L | O | +-----------------------------------+ | 501 R | U | | Multicast Group Length (1 octet) | | N 502 I | T | +-----------------------------------+ | L 503 | E | | Multicast Group (variable) | | R 504 | | +-----------------------------------+ | I 505 | K | | Upstream Router's IP Address | | 506 | E | +-----------------------------------+ --+ 507 | Y | | Originating Router's IP Address | 508 +- +- +-----------------------------------+ 510 Even though the C-MCAST Leaf A-D route is unsolicited, unlike the 511 Leaf A-D route for GTM 
in [RFC7524], it is encoded as if a 512 corresponding S-PMSI A-D route had been received. 514 When used for signaling mLDP tunnels, even though the Leaf A-D route 515 is unsolicited, unlike the "Route-type 0x44 Leaf A-D route for 516 C-multicast mLDP" as in [RFC7441], it is Route-type 4 and encoded as 517 if a corresponding S-PMSI A-D route had been received. 519 2.1.3. Source Active A-D Route 521 Similar to what is defined in RFC 6514, a Source Active A-D Route Type 522 specific MCAST NLRI consists of the following: 524 +-----------------------------------+ 525 | Multicast Source Length (1 octet) | 526 +-----------------------------------+ 527 | Multicast Source (variable) | 528 +-----------------------------------+ 529 | Multicast Group Length (1 octet) | 530 +-----------------------------------+ 531 | Multicast Group (variable) | 532 +-----------------------------------+ 534 The definitions of the source/length and group/length fields are the 535 same as in the S-PMSI A-D routes. 537 Usage of Source Active A-D routes is described in Section 1.2.1.1. 539 2.1.4. S-PMSI A-D Route for C-multicast mLDP 541 The route is used to signal upstream FEC for an MP2MP mLDP tunnel. 542 The route key includes the mLDP FEC and the Upstream Router's IP 543 Address field. The encoding is similar to the same route in 544 [RFC7441], though there is no RD. 546 2.1.5. Session Address Extended Community 548 For two BGP speakers to determine if they are directly connected, 549 each advertises its local interface addresses, with a Session 550 Address Extended Community. This is an Address Specific EC, with the 551 Global Admin Field set to the local address used for its multihop 552 sessions and the Local Admin Field set to the prefix length 553 corresponding to the interface's network mask. 
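The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `session_address_ecs` and the textual `address:length` rendering are hypothetical conveniences, not part of this specification, and no wire encoding (type or sub-type octets) is attempted. The addresses are the ones used in the example that follows.

```python
import ipaddress


def session_address_ecs(interfaces, session_addresses):
    """Return {host prefix: [Session Address EC, ...]} for a router.

    interfaces        -- interface addresses in CIDR form, e.g. "10.10.10.1/24"
    session_addresses -- local addresses used for multihop BGP sessions

    Hypothetical helper: each EC is rendered as "<session address>:<prefix
    length>", i.e. Global Admin Field = session address and Local Admin
    Field = prefix length of the interface's network mask.
    """
    adverts = {}
    for cidr in interfaces:
        iface = ipaddress.ip_interface(cidr)
        # The interface address itself is advertised as a host prefix.
        host_prefix = f"{iface.ip}/{iface.max_prefixlen}"
        adverts[host_prefix] = [
            f"{ipaddress.ip_address(session)}:{iface.network.prefixlen}"
            for session in session_addresses
        ]
    return adverts


adverts = session_address_ecs(
    ["10.10.10.1/24", "10.12.0.1/16"],
    ["11.11.11.1", "11.11.11.11"],
)
for prefix, ecs in adverts.items():
    print(prefix, ecs)
```

A receiver could invert this mapping to go from a resolved nexthop (an interface address) to the session address of the BGP speaker that owns it.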
 555 For example, if a router has two interfaces with addresses 556 10.10.10.1/24 and 10.12.0.1/16 respectively (notice the different 557 network masks), and a loopback address 11.11.11.1/32 that is used for 558 BGP sessions, then it will advertise prefix 10.10.10.1/32 with a 559 Session Address EC 11.11.11.1:24 and 10.12.0.1/32 with a Session 560 Address EC 11.11.11.1:16. If it also uses another loopback address 561 11.11.11.11/32 for other BGP sessions, then the routes will 562 additionally carry Session Address EC 11.11.11.11:24 and 563 11.11.11.11:16 respectively. 565 This achieves what the Address List TLV in LDP Address Messages 566 achieves, and can also be used to indicate that a router supports the 567 BGP multicast signaling procedures specified in this document. 569 Only those interface addresses that will be used as resolved nexthops 570 in the RIB need to be advertised with the Session Address EC. For 571 example, the RPF lookup may say that the resolved nexthop address is 572 A1, so the router needs to find out the corresponding BGP speaker 573 with address A1 through the (interface address, session address) 574 mapping built according to the interface address NLRI with the 575 Session Address EC. By comparison, in LDP this is done via the 576 (interface address, session address) mapping that is built from the LDP 577 Address Messages. 579 2.2. Procedures 581 2.2.1. Source Discovery for ASM 583 When a FHR first receives a multicast packet addressed to an ASM 584 group, it originates a Source Active route. The route carries an IPv4/IPv6 585 Address Specific RT, with the Global Admin Field set to the group 586 address and the Local Admin Field set to 0. The route is advertised 587 to its peers, who will re-advertise further based on the RTC 588 mechanisms. Note that typically the route is advertised only to the 589 RRs. 591 The FHR withdraws the Source Active route after a certain amount of 592 time has elapsed since it last received a packet of an (s,g) flow. 
   The amount of time to wait is a local matter.

2.2.2.  Originating Tree Join Routes

   Note that in this document, tree join routes are S-PMSI/Leaf A-D
   routes.

2.2.2.1.  (x,g) Multicast Tree

   When a router learns from IGMP/MLD or a downstream PIM/BGP peer
   that it needs to join a particular (s,g) tree, it determines the
   RPF nexthop address with respect to the source, following the same
   RPF procedures as defined for PIM.  It then finds the BGP router
   that advertised the nexthop address as one of its local addresses.

   If the RPF neighbor supports C-MCAST SAFI, this router originates a
   Leaf A-D route.  Although the route is unsolicited, it is
   constructed as if there were a corresponding S-PMSI A-D route.  The
   Upstream Router's IP Address field is set to the RPF neighbor's
   session address (learnt via the EC attached to the host route for
   the RPF nexthop address).

   An Address Specific RT corresponding to the session address is
   attached to the route, with the Global Administrative Field set to
   the session address and the Local Administrative Field set to 0.

   Similarly, when a router learns that it needs to join a
   bidirectional tree for a particular group, it determines the RPF
   neighbor with respect to the RPA.  If the neighbor supports C-MCAST
   SAFI, the router originates a Leaf A-D route and advertises it to
   the RPF neighbor (in the case of EBGP or hop-by-hop IBGP) or to one
   or more RRs.

   When a router first learns that it needs to receive traffic for an
   ASM group, either because of a local (*,g) IGMP/MLD report or a
   downstream PIM (*,g) join, it originates an RTC route with the
   NLRI's AS field set to its AS number and the Route Target field set
   to an address based RT, with the Global Administrator field set to
   the group address and the Local Administrator field set to 0.  The
   route is advertised to its peers (most practically some RRs), so
   that the router can receive matching Source Active A-D routes.
   Upon receiving the Source Active A-D routes, the router originates
   Leaf A-D routes as described above, as long as it still needs to
   receive traffic for the flows (i.e., the corresponding IGMP/MLD
   membership exists or a join from a downstream PIM/BGP neighbor
   exists).

   When a Leaf A-D route is originated by this router, it sets up the
   corresponding forwarding state such that the expected incoming
   interface list includes all non-LAN interfaces directly connecting
   to the upstream neighbor.  LAN interfaces are added upon receiving
   the corresponding S-PMSI A-D route (Section 2.2.5.2).  If the
   upstream neighbor is not directly connected, tunnels may be used;
   details are to be included in future revisions.

   When the upstream neighbor changes, the previously advertised Leaf
   A-D route is withdrawn.  If there is a new upstream neighbor, a new
   Leaf A-D route is originated, corresponding to the new neighbor.
   Because the NLRIs of the old and new Leaf A-D routes are different,
   make-before-break can be achieved, as can MoFRR [RFC7431].

2.2.2.2.  BGP Inband Signaling for mLDP Tunnel

   The same mLDP procedures as defined in [RFC6388] are followed,
   except that where a label mapping message is sent in [RFC6388], a
   Leaf A-D route is sent if the upstream neighbor supports BGP based
   signaling.

2.2.3.  Receiving Tree Join Routes

   A router (auto-)configures Import RTs matching itself so that it
   can import tree join routes from its peers.  Note that in this
   document, tree join routes are S-PMSI/Leaf A-D routes.

   When a router receives a tree join route and imports it, it
   determines whether it needs to originate its own corresponding
   route and advertise it further upstream with respect to the
   source/RPA or mLDP tunnel root.  If the router itself is the FHR,
   is on the RPL, or is the tunnel root, then it does not need to.
   Otherwise the procedures in Section 2.2.2 are followed.
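
   The propagation decision above can be sketched as follows.  This is
   a minimal Python illustration, not a normative procedure: TreeRole,
   must_join_upstream and on_tree_join_imported are hypothetical
   names, and the role flags are assumed to describe this router's
   position for the one tree or tunnel that the imported route refers
   to.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TreeRole:
    """This router's role for one tree or tunnel (hypothetical model)."""
    is_fhr: bool = False          # first-hop router for the source
    on_rpl: bool = False          # on the Rendezvous Point Link
    is_tunnel_root: bool = False  # root of the mLDP tunnel


def must_join_upstream(role: TreeRole) -> bool:
    """True unless the router is the FHR, on the RPL, or the tunnel root."""
    return not (role.is_fhr or role.on_rpl or role.is_tunnel_root)


def on_tree_join_imported(role: TreeRole,
                          originate_join: Callable[[], None]) -> bool:
    """Handle an imported tree join route.

    originate_join stands in for the Section 2.2.2 procedures; it is
    called only when the join has to be propagated further upstream.
    Returns True if a join was originated.
    """
    if must_join_upstream(role):
        originate_join()
        return True
    return False
```

   Forwarding state toward the downstream neighbor is set up in either
   case; only the upstream origination is conditional.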

   Additionally, the router sets up its corresponding forwarding state
   such that traffic will be sent to the downstream neighbor, and also
   received from the downstream neighbor in the case of a
   bidirectional tree/tunnel.  If the downstream neighbor is not
   directly connected, tunnels may be used; details are to be included
   in future revisions.

2.2.4.  Withdrawal of Tree Join Routes

   For a particular tree or tunnel, if a downstream neighbor withdraws
   its Leaf A-D route, the neighbor is removed from the corresponding
   forwarding state.  If all downstream neighbors withdraw their tree
   join routes and this router no longer has local receivers, it
   withdraws the tree join routes that it previously originated.

   As mentioned earlier, when the upstream neighbor changes, the
   previously advertised Leaf A-D route is also withdrawn, and the
   corresponding incoming interfaces are removed from the forwarding
   state.

2.2.5.  LAN Procedures for (x,g) Unidirectional Tree

   For a unidirectional (x,g) multicast tree, if there is a LAN
   interface connecting to the downstream neighbor, it MAY be
   preferred over non-LAN interfaces, but an S-PMSI A-D route MUST be
   originated to facilitate the analog of the Assert process
   (Section 2.2.5.1).

2.2.5.1.  Originating S-PMSI A-D Routes

   If this router chooses to use a LAN interface to send traffic to
   its neighbors for a particular (s,g) or (*,g) flow, it MUST
   announce that by originating a corresponding S-PMSI A-D route.  The
   Tunnel Type in the PMSI Tunnel Attribute (PTA) is set to 0 (No
   tunnel information present).  The LAN interface is identified by an
   IP address specific RT, with the Global Administrative Field set to
   the LAN interface's address prefix and the Local Administrative
   Field set to the prefix length.  The RT also serves to restrict
   import of the route to the routers on the LAN.
   An operator MUST ensure that RTs encoded as above are not used for
   other purposes.  In practice, this should not be an unreasonable
   restriction.

   If multiple LAN interfaces are to be used (to reach different sets
   of neighbors), then the route will include multiple RTs, one for
   each LAN interface used, as described above.

   The S-PMSI A-D routes may also be used to announce tunnels that
   could be used to send traffic to downstream neighbors that are not
   directly connected.  Details may be added in future revisions.

2.2.5.2.  Receiving S-PMSI A-D Routes

   A router (auto-)configures an Import RT for each of its LAN
   interfaces over which BGP is used for multicast signaling.  The
   construction of the RT is described in the previous section.

   When a router R1 imports an S-PMSI A-D route for flow (x,g) from
   router R2, R1 checks whether it is also originating an S-PMSI A-D
   route with the same NLRI except for the Upstream Router's IP
   Address field.  Conversely, when R1 originates an S-PMSI A-D route,
   it checks whether it has also installed an S-PMSI A-D route, from
   some other router R2, with the same NLRI except for the Upstream
   Router's IP Address field.  In either case, R1 checks whether the
   two routes have an RT in common and whether that RT is encoded as
   in Section 2.2.5.1.  If so, then there is a LAN attached to both R1
   and R2, and both routers are prepared to send (x,g) traffic onto
   that LAN.  This kicks off the assert procedure to elect a winner:
   the router with the highest Upstream Router's IP Address in the
   NLRI wins.  An assert loser will not include the corresponding LAN
   interface in its outgoing interface list, but it keeps the S-PMSI
   A-D route that it originated.
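
   The assert check described above can be sketched in Python.  This
   is an illustration only: SPmsiRoute and assert_winner are
   hypothetical names, RTs are modeled as "prefix:length" strings, and
   the sketch assumes that both routes use the same address family and
   that lan_rts already contains only RTs encoded as in
   Section 2.2.5.1.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class SPmsiRoute:
    """Hypothetical model of an S-PMSI A-D route used on a LAN."""
    flow: Tuple[str, str]    # (x, g): the NLRI minus the upstream field
    upstream_router: str     # Upstream Router's IP Address field
    lan_rts: FrozenSet[str]  # LAN RTs, e.g. "10.10.10.0:24"


def assert_winner(own: SPmsiRoute,
                  received: SPmsiRoute) -> Optional[SPmsiRoute]:
    """Return the winning route, or None if no assert is triggered.

    An assert is triggered only when both routes describe the same
    flow and share at least one LAN RT.  The route with the
    numerically highest Upstream Router's IP Address wins.
    """
    if own.flow != received.flow:
        return None
    if not (own.lan_rts & received.lan_rts):
        return None
    return max(
        (own, received),
        # Compare addresses numerically, not as strings.
        key=lambda route: ipaddress.ip_address(route.upstream_router),
    )
```

   A router whose own route is not the one returned is the assert
   loser: it removes the shared LAN interface from its outgoing
   interface list but keeps its S-PMSI A-D route.  For simplicity the
   sketch treats all shared RTs together, whereas a router attached to
   several common LANs would apply the result to each of them.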

   If this router does not have a matching S-PMSI route of its own
   with some common RTs, and the originator of the received S-PMSI
   route is a chosen upstream neighbor for the corresponding flow,
   then this router updates its forwarding state to include the LAN
   interface in the incoming interface list.  When the last S-PMSI
   route with an RT matching the LAN is later withdrawn, the LAN
   interface is removed from the incoming interface list.

   Note that a downstream router on the LAN does not participate in
   the assert procedure.  It adds/keeps the LAN interface in the
   expected incoming interfaces as long as its chosen upstream peer
   originates the S-PMSI A-D route.  It does not switch to the assert
   winner as its upstream.  An assert loser MAY keep sending joins
   upstream based on local policy even if it has no other downstream
   neighbors (this could be used for fast switchover if the assert
   winner fails).

2.2.6.  Distributing Label for Upstream Traffic for Bidirectional Tree/
        Tunnel

   For MP2MP mLDP tunnels or labeled (*,g) bidirectional trees, an
   upstream router needs to advertise a label to all its downstream
   neighbors so that the downstream neighbors can send traffic to it.

   For MP2MP mLDP tunnels, the same procedures as for mLDP are
   followed, except that instead of MP2MP-U Label Mapping messages,
   S-PMSI A-D Routes for C-Multicast mLDP are used.

   For labeled (*,g) bidirectional trees, for each Leaf A-D route
   received from a downstream neighbor, a corresponding S-PMSI A-D
   route is sent back to the downstream router.

   In both cases, a single S-PMSI A-D route is originated for each
   tree from this router, but with multiple RTs (one for each
   downstream neighbor on the tree).  A TEA specifies a label
   allocated by the upstream router for its downstream neighbors to
   send traffic with.
   Note that this is still a "downstream allocated" label (the
   upstream router is "downstream" from the point of view of the
   traffic direction).

   The S-PMSI routes do not carry a PTA, unless a P2MP tunnel is used
   to reach downstream neighbors.  Such a use case is out of scope for
   this document for now and may be specified in the future.

3.  Security Considerations

   This document does not introduce new security risks?

4.  Acknowledgements

   The authors thank Marco Rodrigues and Lenny Giuliano for their
   initial idea/ask of using BGP for multicast signaling beyond MVPN.
   We also thank Eric Rosen for his questions, suggestions, and help
   finding solutions to some issues.

5.  References

5.1.  Normative References

   [I-D.ietf-bess-mvpn-pe-ce]
              Patel, K., Rosen, E., and Y. Rekhter, "BGP as an MVPN
              PE-CE Protocol", draft-ietf-bess-mvpn-pe-ce-01 (work in
              progress), October 2015.

   [I-D.ietf-idr-tunnel-encaps]
              Rosen, E., Patel, K., and G. Velde, "The BGP Tunnel
              Encapsulation Attribute", draft-ietf-idr-tunnel-encaps-03
              (work in progress), November 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
              "Protocol Independent Multicast - Sparse Mode (PIM-SM):
              Protocol Specification (Revised)", RFC 4601,
              DOI 10.17487/RFC4601, August 2006.

   [RFC4684]  Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,
              R., Patel, K., and J. Guichard, "Constrained Route
              Distribution for Border Gateway Protocol/MultiProtocol
              Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual
              Private Networks (VPNs)", RFC 4684, DOI 10.17487/RFC4684,
              November 2006.

   [RFC5015]  Handley, M., Kouvelas, I., Speakman, T., and L.
              Vicisano, "Bidirectional Protocol Independent Multicast
              (BIDIR-PIM)", RFC 5015, DOI 10.17487/RFC5015, October
              2007.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

   [RFC7441]  Wijnands, IJ., Rosen, E., and U. Joorde, "Encoding
              Multipoint LDP (mLDP) Forwarding Equivalence Classes
              (FECs) in the NLRI of BGP MCAST-VPN Routes", RFC 7441,
              DOI 10.17487/RFC7441, January 2015.

5.2.  Informative References

   [I-D.wijnands-bier-mld-lan-election]
              Wijnands, I., Pfister, P., and Z. Zhang, "Generic
              Multicast Router Election on LAN's", draft-wijnands-bier-
              mld-lan-election-01 (work in progress), July 2016.

   [RFC6826]  Wijnands, IJ., Ed., Eckert, T., Leymann, N., and M.
              Napierala, "Multipoint LDP In-Band Signaling for Point-
              to-Multipoint and Multipoint-to-Multipoint Label Switched
              Paths", RFC 6826, DOI 10.17487/RFC6826, January 2013.

   [RFC7431]  Karan, A., Filsfils, C., Wijnands, IJ., Ed., and B.
              Decraene, "Multicast-Only Fast Reroute", RFC 7431,
              DOI 10.17487/RFC7431, August 2015.

   [RFC7938]  Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of
              BGP for Routing in Large-Scale Data Centers", RFC 7938,
              DOI 10.17487/RFC7938, August 2016.

Authors' Addresses

   Zhaohui Zhang
   Juniper Networks

   EMail: zzhang@juniper.net


   Keyur Patel
   Arrcus

   EMail: keyur@arrcus.com


   IJsbrand Wijnands
   Cisco Systems

   EMail: ice@cisco.com


   Arkadiy Gulko
   Thomson Reuters

   EMail: arkadiy.gulko@thomsonreuters.com