Network Working Group                                      T. Morin, Ed.
Internet-Draft                                                    Orange
Intended status: Standards Track                          R. Kebler, Ed.
Expires: May 1, 2021                                    Juniper Networks
                                                          G. Mirsky, Ed.
                                                               ZTE Corp.
                                                        October 28, 2020


                  Multicast VPN Fast Upstream Failover
                 draft-ietf-bess-mvpn-fast-failover-12

Abstract

   This document defines multicast VPN extensions and procedures that
   allow fast failover for upstream failures by allowing downstream PEs
   to consider the status of Provider-Tunnels (P-tunnels) when selecting
   the upstream PE for a VPN multicast flow.
   The fast failover is enabled by using RFC 8562 BFD for Multipoint
   Networks and the new BGP Attribute - BFD Discriminator.  Also, the
   document introduces a new BGP Community, Standby PE, extending BGP
   MVPN routing so that a C-multicast route can be advertised toward a
   Standby Upstream PE.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 1, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions used in this document
     2.1.  Requirements Language
     2.2.  Terminology
     2.3.  Acronyms
   3.  UMH Selection Based on Tunnel Status
     3.1.  Determining the Status of a Tunnel
       3.1.1.  mVPN Tunnel Root Tracking
       3.1.2.  PE-P Upstream Link Status
       3.1.3.  P2MP RSVP-TE Tunnels
       3.1.4.  Leaf-initiated P-tunnels
       3.1.5.  (C-S, C-G) Counter Information
       3.1.6.  BFD Discriminator Attribute
       3.1.7.  Per PE-CE Link BFD Discriminator
   4.  Standby C-multicast Route
     4.1.  Downstream PE Behavior
     4.2.  Upstream PE Behavior
     4.3.  Reachability Determination
     4.4.  Inter-AS
       4.4.1.  Inter-AS Procedures for downstream PEs, ASBR Fast
               Failover
       4.4.2.  Inter-AS Procedures for ASBRs
   5.  Hot Root Standby
   6.  Duplicate Packets
   7.  IANA Considerations
     7.1.  Standby PE Community
     7.2.  BFD Discriminator
     7.3.  BFD Discriminator Optional Sub-TLV Type
   8.  Security Considerations
   9.  Acknowledgments
   10. Contributor Addresses
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   It is assumed that the reader is familiar with the workings of
   multicast MPLS/BGP IP VPNs as described in [RFC6513] and [RFC6514].

   In the context of multicast in BGP/MPLS VPNs [RFC6513], it is
   desirable to provide mechanisms allowing fast recovery of
   connectivity on different types of failures.  This document addresses
   failures of elements in the provider network that are upstream of PEs
   connected to VPN sites with receivers.

   Section 3 describes local procedures allowing an egress PE (a PE
   connected to a receiver site) to take into account the status of
   P-tunnels to determine the Upstream Multicast Hop (UMH) for a given
   (C-S, C-G).  One of the optional methods uses [RFC8562] and the new
   BGP Attribute - BFD Discriminator.  None of these methods provides a
   "fast failover" solution when used alone, but each can be used
   together with the mechanism described in Section 4 for a "fast
   failover" solution.

   Section 4 describes an optional BGP extension, a new Standby PE
   Community, that can speed up failover by not requiring any multicast
   VPN routing message exchange at recovery time.

   Section 5 describes a "hot leaf standby" mechanism that can be used
   to improve failover time in MVPN.  The approach combines mechanisms
   defined in Section 3 and Section 4 and has similarities with the
   solution described in [RFC7431] to improve failover times when PIM
   routing is used in a network given some topology and metric
   constraints.

   The procedures described in this document are optional and enable an
   operator to provide protection for multicast services in BGP/MPLS IP
   VPNs.
   An operator would enable these mechanisms using a method discussed in
   Section 3 in combination with the redundancy provided by a standby PE
   connected to the source of the multicast flow, and it is assumed that
   all PEs in the network would support these mechanisms for the
   procedures to work.  In the case that a BGP implementation does not
   recognize or is configured to not support the extensions defined in
   this document, it will continue to provide the multicast service, as
   described in [RFC6513].

2.  Conventions used in this document

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.2.  Terminology

   The terminology used in this document is the terminology defined in
   [RFC6513] and [RFC6514].

   The term 'upstream' (lower case) throughout this document refers to
   links and nodes that are upstream to a PE connected to VPN sites with
   receivers of a multicast flow.

   The term 'Upstream' (capitalized) throughout this document refers to
   a PE or an Autonomous System Border Router (ASBR) at which (S,G) or
   (*,G) data packets enter the VPN backbone or the local AS when
   traveling through the VPN backbone.

2.3.  Acronyms

   PMSI:  P-Multicast Service Interface

   I-PMSI:  Inclusive PMSI

   S-PMSI:  Selective PMSI

   x-PMSI:  Either an I-PMSI or an S-PMSI

   P-tunnel:  Provider-Tunnels

   UMH:  Upstream Multicast Hop

   VPN:  Virtual Private Network

   MVPN:  Multicast VPN

   RD:  Route Distinguisher

   RP:  Rendezvous Point

   NLRI:  Network Layer Reachability Information

   VRF:  VPN Routing and Forwarding Table

   MED:  Multi-Exit Discriminator

   P2MP:  Point-to-Multipoint

3.
UMH Selection Based on Tunnel Status

   Section 5.1 of [RFC6513] describes procedures used by a multicast VPN
   downstream PE to determine the Upstream Multicast Hop (UMH) for a
   given (C-S, C-G).

   For a given downstream PE and a given VRF, the P-tunnel corresponding
   to a given Upstream PE for a given (C-S, C-G) state is the S-PMSI
   tunnel advertised by that Upstream PE for this (C-S, C-G) and
   imported into that VRF, or if there isn't any such S-PMSI, the I-PMSI
   tunnel advertised by that PE and imported into that VRF.

   The procedure described here is an OPTIONAL procedure that is based
   on a downstream PE taking into account the status of P-tunnels rooted
   at each possible Upstream PE, for including or not including each
   given PE in the list of candidate UMHs for a given (C-S, C-G) state.
   If it is not possible to determine whether a P-tunnel's current
   status is Up, the state shall be considered "not known to be Down",
   and it may be treated as if it is Up so that attempts to use the
   tunnel are acceptable.  The result is that, if a P-tunnel is Down
   (see Section 3.1), the PE that is the root of the P-tunnel will not
   be considered for UMH selection.  This will result in the downstream
   PE failing over to use the next Upstream PE in the list of
   candidates.  Some downstream PEs could arrive at a different
   conclusion regarding the tunnel's state because the failure impacts
   only a subset of branches.  Because of that, the procedures of
   Section 9.1.1 of [RFC6513] are applicable when using I-PMSI
   P-tunnels.  That document is a foundation for this document, and its
   processes all apply here.  Section 9.1.1 mandates the use of specific
   procedures for sending intra-AS I-PMSI A-D Routes.

   There are three options specified in Section 5.1 of [RFC6513] for a
   downstream PE to select an Upstream PE.

   o  The first two options select the Upstream PE from a candidate PE
      set either based on an IP address or a hashing algorithm.  When
      used together with the optional procedure of considering the
      P-tunnel status as in this document, a candidate Upstream PE is
      included in the set if it either:

      A.  advertises an x-PMSI bound to a tunnel, where the specified
          tunnel's state is not known to be Down, or,

      B.  does not advertise any x-PMSI applicable to the given (C-S,
          C-G) but has associated a VRF Route Import BGP attribute to
          the unicast VPN route for S.  That is necessary to avoid
          incorrectly invalidating a UMH PE that would use a policy
          where no I-PMSI is advertised for a given VRF and where only
          S-PMSI are used.  The S-PMSI can be advertised only after the
          Upstream PE receives a C-multicast route for (C-S, C-G)/(C-*,
          C-G) to be carried over the advertised S-PMSI.

      If the resulting candidate set is empty, then the procedure is
      repeated without considering the P-tunnel status.

   o  The third option uses the installed UMH Route (i.e., the "best"
      route towards the C-root) as the Selected UMH Route, and its
      originating PE is the selected Upstream PE.  With the optional
      procedure of considering P-tunnel status as in this document, the
      Selected UMH Route is the best one among those whose originating
      PE's P-tunnel is not "down".  If that does not exist, the
      installed UMH Route is selected regardless of the P-tunnel status.

3.1.  Determining the Status of a Tunnel

   Different factors can be considered to determine the "status" of a
   P-tunnel and are described in the following sub-sections.  The
   optional procedures described in this section also handle the case
   where the downstream PEs do not all apply the same rules to define
   what the status of a P-tunnel is (please see Section 6), and some of
   them will produce a result that may be different for different
   downstream PEs.
   Thus, the "status" of a P-tunnel in this section is not a
   characteristic of the tunnel in itself, but is the tunnel status, as
   seen from a particular downstream PE.  Additionally, some of the
   following methods determine the ability of a downstream PE to receive
   traffic on the P-tunnel and not specifically on the status of the
   P-tunnel itself.  That could be referred to as "P-tunnel reception
   status", but for simplicity, we will use the terminology of P-tunnel
   "status" for all of these methods.

   Depending on the criteria used to determine the status of a P-tunnel,
   there may be an interaction with another resiliency mechanism used
   for the P-tunnel itself, and the UMH update may happen immediately or
   may need to be delayed.  Each particular case is covered in each
   separate sub-section below.

   An implementation may support any combination of the methods
   described in this section and provide a network operator with control
   to choose which one to use in the particular deployment.

3.1.1.  mVPN Tunnel Root Tracking

   A condition to consider that the status of a P-tunnel is Up is that
   the root of the tunnel, as determined in the x-PMSI Tunnel attribute,
   is reachable through unicast routing tables.  In this case, the
   downstream PE can immediately update its UMH when the reachability
   condition changes.

   That is similar to BGP next-hop tracking for VPN routes, except that
   the address considered is not the BGP next-hop address, but the root
   address in the x-PMSI Tunnel attribute.

   If BGP next-hop tracking is done for VPN routes and the root address
   of a given tunnel happens to be the same as the next-hop address in
   the BGP A-D Route advertising the tunnel, then checking, in unicast
   routing tables, whether the tunnel root is reachable, will be
   unnecessary duplication and thus will not bring any specific benefit.

3.1.2.
PE-P Upstream Link Status

   A condition to consider a tunnel status as Up can be that the
   last-hop link of the P-tunnel is Up.  Conversely, if the last-hop
   link of the P-tunnel is Down then this can be taken as an indication
   that the P-tunnel is Down.

   Using this method when a fast restoration mechanism (such as MPLS FRR
   [RFC4090]) is in place for the link requires careful consideration
   and coordination of defect detection intervals for the link and the
   tunnel.  In many cases, it is not practical to use both protection
   methods at the same time because uncorrelated timers might cause
   unnecessary switchovers and destabilize the network.

3.1.3.  P2MP RSVP-TE Tunnels

   For P-tunnels of type P2MP MPLS-TE, the status of the P-tunnel is
   considered Up if the sub-LSP to this downstream PE is in the Up
   state.  The determination of whether a P2MP RSVP-TE LSP is in the Up
   state requires Path and Resv state for the LSP and is based on
   procedures specified in [RFC4875].  As a result, the downstream PE
   can immediately update its UMH when the reachability condition
   changes.

   When using this method and if the signaling state for a P2MP TE LSP
   is removed (e.g., if the ingress of the P2MP TE LSP sends a PathTear
   message) or the P2MP TE LSP changes state from Up to Down as
   determined by procedures in [RFC4875], the status of the
   corresponding P-tunnel MUST be re-evaluated.  If the P-tunnel
   transitions from Up to Down state, the Upstream PE that is the
   ingress of the P-tunnel MUST NOT be considered a valid UMH.

3.1.4.  Leaf-initiated P-tunnels

   An Upstream PE SHOULD be removed from the UMH candidate list for a
   given (C-S, C-G) if the P-tunnel (I-PMSI or S-PMSI) for this (S, G)
   is leaf-triggered (PIM, mLDP), but for some reason, internal to the
   protocol, the upstream one-hop branch of the tunnel from P to PE
   cannot be built.
   As a result, the downstream PE can immediately update its UMH when
   the reachability condition changes.

3.1.5.  (C-S, C-G) Counter Information

   In cases where the downstream node can be configured so that the
   maximum inter-packet time is known for all the multicast flows mapped
   on a P-tunnel, the local per-(C-S, C-G) traffic counter information
   for traffic received on this P-tunnel can be used to determine the
   status of the P-tunnel.

   When such a procedure is used, in the context where fast restoration
   mechanisms are used for the P-tunnels, a configurable timer MUST be
   set on the downstream PE to wait before updating the UMH, to let the
   P-tunnel restoration mechanism execute its actions.  An
   implementation SHOULD use three seconds as the default value for this
   timer.

   In cases where this mechanism is used in conjunction with the method
   described in Section 5, no prior knowledge of the rate of the
   multicast streams is required; downstream PEs can compare reception
   on the two P-tunnels to determine when one of them is down.

3.1.6.  BFD Discriminator Attribute

   P-tunnel status may be derived from the status of a multipoint BFD
   session [RFC8562] whose discriminator is advertised along with an
   x-PMSI A-D Route.

   This document defines the format and ways of using a new BGP
   attribute called the "BFD Discriminator".  It is an optional
   transitive BGP attribute.  An implementation that does not recognize
   or is configured not to support this attribute MUST follow procedures
   defined for optional transitive path attributes in Section 5 of
   [RFC4271].  In Section 7.2, IANA is requested to allocate the
   codepoint value (TBA2).  The format of this attribute is shown in
   Figure 1.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    BFD Mode   |                   Reserved                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       BFD Discriminator                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                         Optional TLVs                         ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure 1: Format of the BFD Discriminator Attribute

   Where:

      BFD Mode field is one octet long.  This specification defines the
      P2MP BFD Session as value 1 (Section 7.2).

      Reserved field is three octets long, and the value MUST be zeroed
      on transmission and ignored on receipt.

      BFD Discriminator field is four octets long.

      Optional TLVs is the optional variable-length field that MAY be
      used in the BFD Discriminator attribute for future extensions.
      TLVs MAY be included in a sequential or nested manner.  To allow
      for TLV nesting, it is advised to define a new TLV as a
      variable-length object.  Figure 2 presents the Optional TLV
      format, which consists of:

      *  one-octet-long field of the TLV's Type value (Section 7.3)

      *  one-octet-long field of the length of the Value field in octets

      *  variable-length Value field.

      The length of a TLV MUST be a multiple of four octets.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type     |     Length    |           Value ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 2: Format of the Optional TLV

   The BFD Discriminator attribute MUST be considered malformed if its
   length is not a non-zero multiple of four.  If the attribute is
   considered malformed, the UPDATE message SHALL be handled using the
   approach of Attribute Discard per [RFC7606].

3.1.6.1.
Upstream PE Procedures

   To enable downstream PEs to track the P-tunnel status using a
   point-to-multipoint (P2MP) BFD session the Upstream PE:

   o  MUST initiate the BFD session and set bfd.SessionType =
      MultipointHead as described in [RFC8562];

   o  MUST set the IP destination address of the inner IP header to one
      of the internal loopback addresses from 127/8 range for IPv4 or
      one of IPv4-mapped IPv6 addresses from ::ffff:127.0.0.0/104 range
      for IPv6 when transmitting BFD Control packets;

   o  MUST use its IP address as the source IP address when transmitting
      BFD Control packets;

   o  MUST include the BFD Discriminator attribute in the x-PMSI A-D
      Route with the value set to My Discriminator value;

   o  MUST periodically transmit BFD Control packets over the x-PMSI
      P-tunnel after the P-tunnel is considered established.  Note that
      the methods to declare a P-tunnel has been established are outside
      the scope of this specification.

   If the tracking of the P-tunnel by using a P2MP BFD session is
   enabled after the x-PMSI A-D Route has been already advertised, the
   x-PMSI A-D Route MUST be re-sent with precisely the same attributes
   as before and the BFD Discriminator attribute included.

   If the x-PMSI A-D Route is advertised with P-tunnel status tracked
   using the P2MP BFD session and it is desired to stop tracking
   P-tunnel status using BFD, then:

   o  x-PMSI A-D Route MUST be re-sent with precisely the same
      attributes as before, but the BFD Discriminator attribute MUST be
      excluded;

   o  the P2MP BFD session SHOULD be deleted.

3.1.6.2.
Downstream PE Procedures

   Upon receiving the BFD Discriminator attribute in the x-PMSI A-D
   Route, the downstream PE:

   o  MUST associate the received BFD Discriminator value with the
      P-tunnel originating from the Upstream PE and the IP address of
      the Upstream PE;

   o  MUST create a P2MP BFD session and set bfd.SessionType =
      MultipointTail as described in [RFC8562];

   o  MUST use the source IP address of the BFD Control packet, the
      value of the BFD Discriminator field, and the x-PMSI Tunnel
      Identifier [RFC6514] on which the BFD Control packet was received
      to properly demultiplex BFD sessions.

   After the state of the P2MP BFD session is up, i.e., bfd.SessionState
   == Up, the session state will then be used to track the health of the
   P-tunnel.

   According to [RFC8562], if the downstream PE receives Down or
   AdminDown in the State field of the BFD Control packet, or if the
   Detection Timer associated with the BFD session expires, the BFD
   session is down, i.e., bfd.SessionState == Down.  When the BFD
   session state is Down, then the P-tunnel associated with the BFD
   session MUST be considered down.  If the site that contains C-S is
   connected to two or more PEs, a downstream PE will select one as its
   Primary Upstream PE, while others are considered as Standby Upstream
   PEs.  In such a scenario, when the P-tunnel is considered down, the
   downstream PE MAY initiate a switchover of the traffic from the
   Primary Upstream PE to the Standby Upstream PE only if the Standby
   Upstream PE is deemed available.
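
   The attribute layout of Figure 1 and the downstream PE rules above
   can be illustrated with a minimal, non-normative Python sketch.  All
   names in it (parse_bfd_discriminator_attr, DownstreamPE, TailSession,
   and the string state values) are hypothetical and chosen only for
   illustration.  Two readings are assumptions of the sketch rather than
   statements of this document: an attribute shorter than the 8-octet
   fixed part is rejected, and the P-tunnel is only reported as down
   once the session has been Up at least once.

```python
import struct
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

P2MP_BFD_SESSION = 1  # BFD Mode value for the P2MP BFD Session

# Demultiplexing key: (source IP, BFD Discriminator, x-PMSI Tunnel Identifier)
SessionKey = Tuple[str, int, bytes]


def parse_bfd_discriminator_attr(value: bytes) -> Optional[Tuple[int, int, bytes]]:
    """Return (mode, discriminator, optional_tlvs), or None if malformed."""
    # Malformed unless the length is a non-zero multiple of four.  Requiring
    # the 8-octet fixed part of Figure 1 is an assumption of this sketch.
    if len(value) < 8 or len(value) % 4 != 0:
        return None
    mode = value[0]  # value[1:4] is Reserved and ignored on receipt
    (discriminator,) = struct.unpack("!I", value[4:8])
    return mode, discriminator, value[8:]


@dataclass
class TailSession:
    state: str = "Down"     # models bfd.SessionState
    tracking: bool = False  # True once the session has been Up (assumption)


class DownstreamPE:
    def __init__(self) -> None:
        self.sessions: Dict[SessionKey, TailSession] = {}

    def on_xpmsi_route(self, upstream_ip: str, tunnel_id: bytes,
                       attr_value: bytes) -> Optional[SessionKey]:
        """Associate the advertised discriminator with the P-tunnel."""
        parsed = parse_bfd_discriminator_attr(attr_value)
        # Ignoring unknown BFD Mode values is an assumption of this sketch.
        if parsed is None or parsed[0] != P2MP_BFD_SESSION:
            return None
        key = (upstream_ip, parsed[1], tunnel_id)
        self.sessions.setdefault(key, TailSession())
        return key

    def on_bfd_packet(self, src_ip: str, discriminator: int,
                      tunnel_id: bytes, state: str) -> None:
        """Demultiplex a received BFD Control packet and update the session."""
        session = self.sessions.get((src_ip, discriminator, tunnel_id))
        if session is None:
            return  # no matching MultipointTail session
        if state == "Up":
            session.state, session.tracking = "Up", True
        elif state in ("Down", "AdminDown"):
            session.state = "Down"

    def on_detection_timeout(self, key: SessionKey) -> None:
        """Detection Timer expiry brings the session down."""
        self.sessions[key].state = "Down"

    def tunnel_is_down(self, key: SessionKey) -> bool:
        session = self.sessions[key]
        return session.tracking and session.state == "Down"

    def may_switch_to_standby(self, key: SessionKey,
                              standby_available: bool) -> bool:
        """Switchover is allowed only if the Standby Upstream PE is available."""
        return self.tunnel_is_down(key) and standby_available
```

   In this sketch, a route carrying a well-formed attribute creates one
   session keyed by the triple used for demultiplexing; a later Down or
   AdminDown packet, or a detection timeout, marks the associated
   P-tunnel as down for UMH selection purposes.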

   If the downstream PE's P-tunnel is already established when the
   downstream PE receives the new x-PMSI A-D Route with BFD
   Discriminator attribute, the downstream PE MUST associate the value
   of BFD Discriminator field with the P-tunnel and follow procedures
   listed above in this section if and only if the x-PMSI A-D Route was
   properly processed as per [RFC6514], and the BFD Discriminator
   attribute was validated.

   If the downstream PE's P-tunnel is already established, its state
   being monitored by the P2MP BFD session, and the downstream PE
   receives the new x-PMSI A-D Route without the BFD Discriminator
   attribute, and the x-PMSI A-D Route was processed without any error
   as per the relevant specifications, the downstream PE:

   o  MUST stop processing BFD Control packets for this P2MP BFD
      session;

   o  SHOULD delete the P2MP BFD session associated with the P-tunnel;

   o  SHOULD NOT switch the traffic to the Standby Upstream PE.

3.1.7.  Per PE-CE Link BFD Discriminator

   The following approach is defined in response to the detection by the
   Upstream PE of a PE-CE link failure.  Even though the provider tunnel
   is still up, it is desired for the downstream PEs to switch to a
   backup Upstream PE.  To achieve that, if the Upstream PE detects that
   its PE-CE link fails, it SHOULD set the bfd.LocalDiag of the P2MP BFD
   session to Concatenated Path Down and/or Reverse Concatenated Path
   Down (per Section 6.8.17 of [RFC5880]), unless it switches to a new
   PE-CE link within the time of bfd.DesiredMinTxInterval for the P2MP
   BFD session (in that case, the Upstream PE will start tracking the
   status of the new PE-CE link).  When a downstream PE receives that
   bfd.LocalDiag code, it treats it as if the tunnel itself failed and
   tries to switch to a backup PE.

4.
Standby C-multicast Route

   The procedures described below are limited to the case where the site
   that contains C-S is connected to two or more PEs, though, to
   simplify the description, the case of dual-homing is described.  The
   procedures require all the PEs of that MVPN to follow the same UMH
   selection procedure, as specified in [RFC6513], whether the PE is
   selected based on its IP address, the hashing algorithm described in
   Section 5.1.3 of [RFC6513], or the Installed UMH Route.  The
   procedures assume that if a site of a given MVPN that contains C-S is
   dual-homed to two PEs, then all the other sites of that MVPN would
   have two unicast VPN routes (VPN-IPv4 or VPN-IPv6) to C-S, each with
   its RD.

   As long as C-S is reachable via both PEs, a given downstream PE will
   select one of the PEs connected to C-S as its Upstream PE for C-S.
   We will refer to the other PE connected to C-S as the "Standby
   Upstream PE".  Note that if the connectivity to C-S through the
   Primary Upstream PE becomes unavailable, then the PE will select the
   Standby Upstream PE as its Upstream PE for C-S.  When the Primary PE
   later becomes available, then the PE will select the Primary Upstream
   PE again as its Upstream PE.  Such behavior is referred to as
   "revertive" behavior and MUST be supported.  Non-revertive behavior
   refers to the behavior of continuing to select the backup PE as the
   UMH even after the Primary has come up.  This non-revertive behavior
   MAY also be supported by an implementation and would be enabled
   through some configuration.

   For readability, in the following sub-sections, the procedures are
   described for BGP C-multicast Source Tree Join routes, but they apply
   equally to BGP C-multicast Shared Tree Join routes for the case where
   the customer RP is dual-homed (substitute "C-RP" for "C-S").

4.1.
Downstream PE Behavior

   When a (downstream) PE connected to some site of an MVPN needs to
   send a C-multicast route (C-S, C-G), then following the procedures
   specified in Section 11.1 of [RFC6514], the PE sends the C-multicast
   route with an RT that identifies the Upstream PE selected by the PE
   originating the route.  As long as C-S is reachable via the Primary
   Upstream PE, the Upstream PE is the Primary Upstream PE.  If C-S is
   reachable only via the Standby Upstream PE, then the Upstream PE is
   the Standby Upstream PE.

   If C-S is reachable via both the Primary and the Standby Upstream PE,
   then in addition to sending the C-multicast route with an RT that
   identifies the Primary Upstream PE, the downstream PE also originates
   and sends a C-multicast route with an RT that identifies the Standby
   Upstream PE.  The route that has the semantics of being a "standby"
   C-multicast route is further called a "Standby BGP C-multicast
   route", and is constructed as follows:

   o  the NLRI is constructed as the C-multicast route with an RT that
      identifies the Primary Upstream PE, except that the RD is the same
      as if the C-multicast route was built using the Standby Upstream
      PE as the UMH (it will carry the RD associated to the unicast VPN
      route advertised by the Standby Upstream PE for S and a Route
      Target derived from the Standby Upstream PE's UMH route's VRF RT
      Import EC);

   o  MUST carry the "Standby PE" BGP Community (this is a new BGP
      Community; Section 7.1 requests IANA to allocate value TBA1).

   The Local Preference attribute of the normal and the standby
   C-multicast route needs to be adjusted so that, if a BGP peer
   receives two C-multicast routes with the same NLRI, one carrying the
   "Standby PE" community and the other one not carrying the "Standby
   PE" community, then preference is given to the one not carrying the
   "Standby PE" community.
   Such a situation can happen when, for instance, due to transient
   unicast routing inconsistencies or lack of support of the Standby PE
   community, two different downstream PEs consider different Upstream
   PEs to be the primary one.  In that case, without any precaution
   taken, both Upstream PEs would process a standby C-multicast route
   and possibly stop forwarding at the same time.  For this purpose,
   routes that carry the "Standby PE" BGP Community MUST have the
   LOCAL_PREF attribute set to zero.

   Note that, when a PE advertises such a Standby C-multicast join for a
   (C-S, C-G) it MUST join the corresponding P-tunnel.

   If at some later point, the PE determines that C-S is no longer
   reachable through the Primary Upstream PE, the Standby Upstream PE
   becomes the Upstream PE, and the PE re-sends the C-multicast route
   with RT that identifies the Standby Upstream PE, except that now the
   route does not carry the Standby PE BGP Community (which results in
   replacing the old route with a new route, with the only difference
   between these routes being the presence/absence of the Standby PE BGP
   Community).  The LOCAL_PREF attribute MUST be set to zero.

4.2.
Upstream PE Behavior

   When a PE receives a C-multicast route for a particular (C-S, C-G),
   and the RT carried in the route results in importing the route into a
   particular VRF on the PE, if the route carries the Standby PE BGP
   Community, then the PE performs as follows:

      when the PE determines (the use of the particular method to detect
      the failure is outside the scope of this document) that C-S is not
      reachable through some other PE, the PE SHOULD install VRF PIM
      state corresponding to this Standby BGP C-multicast route (the
      result will be that a PIM Join message will be sent to the CE
      towards C-S, and that the PE will receive (C-S, C-G) traffic), and
      the PE SHOULD forward (C-S, C-G) traffic received by the PE to
      other PEs through a P-tunnel rooted at the PE.

   Furthermore, irrespective of whether C-S carried in that route is
   reachable through some other PE:

   a)  based on local policy, as soon as the PE receives this Standby
       BGP C-multicast route, the PE MAY install VRF PIM state
       corresponding to this BGP Source Tree Join route (the result will
       be that Join messages will be sent to the CE toward C-S, and that
       the PE will receive (C-S, C-G) traffic)

   b)  based on local policy, as soon as the PE receives this Standby
       BGP C-multicast route, the PE MAY forward (C-S, C-G) traffic to
       other PEs through a P-tunnel independently of the reachability of
       C-S through some other PE.  [note that this implies also doing
       a)]

   Doing neither a) nor b) for a given (C-S, C-G) is called "cold root
   standby".

   Doing a) but not b) for a given (C-S, C-G) is called "warm root
   standby".

   Doing b) (which implies also doing a)) for a given (C-S, C-G) is
   called "hot root standby".
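
   The three standby flavors can be summarized with a small,
   non-normative Python sketch.  The names RootStandbyMode,
   root_standby_mode, and standby_actions are hypothetical; the sketch
   only restates the definitions above and assumes that the Upstream PE
   follows the SHOULD-level behavior once C-S is no longer reachable
   through some other PE.

```python
from enum import Enum
from typing import Tuple


class RootStandbyMode(Enum):
    COLD = "cold root standby"
    WARM = "warm root standby"
    HOT = "hot root standby"


def root_standby_mode(does_a: bool, does_b: bool) -> RootStandbyMode:
    """Map local policy choices a) and b) to the named standby mode."""
    if does_b:
        return RootStandbyMode.HOT   # b) implies also doing a)
    if does_a:
        return RootStandbyMode.WARM  # a) but not b)
    return RootStandbyMode.COLD      # neither a) nor b)


def standby_actions(mode: RootStandbyMode,
                    reachable_via_other_pe: bool) -> Tuple[bool, bool]:
    """Return (install_vrf_pim_state, forward_on_p_tunnel).

    Applies to a PE that imported a Standby BGP C-multicast route.
    """
    if not reachable_via_other_pe:
        # C-S is not reachable through some other PE: install PIM state
        # and forward (assumes the SHOULD-level behavior is followed).
        return True, True
    install = mode in (RootStandbyMode.WARM, RootStandbyMode.HOT)
    forward = mode is RootStandbyMode.HOT
    return install, forward
```

   For example, a PE operating in warm root standby already has VRF PIM
   state installed while C-S is still reachable through the other PE,
   but starts forwarding on its P-tunnel only after that reachability is
   lost.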

   Note that, if an Upstream PE uses an S-PMSI only policy, it shall
   advertise an S-PMSI for a (C-S, C-G) as soon as it receives a
   C-multicast route for (C-S, C-G), normal or Standby; i.e., it shall
   not wait for receiving a non-Standby C-multicast route before
   advertising the corresponding S-PMSI.

   Section 9.3.2 of [RFC6514] describes the procedures of sending a
   Source-Active A-D Route as a result of receiving the C-multicast
   route.  These procedures MUST be followed for both the normal and
   Standby C-multicast routes.

4.3.  Reachability Determination

   The Standby Upstream PE can use the following information to
   determine that C-S can or cannot be reached through the Primary
   Upstream PE:

   o  presence/absence of a unicast VPN route toward C-S

   o  supposing that the Standby Upstream PE is the egress of the tunnel
      rooted at the Primary Upstream PE, the Standby Upstream PE can
      determine the reachability of C-S through the Primary Upstream PE
      based on the status of this tunnel, determined using the same
      criteria as the ones described in Section 3.1 (without using the
      UMH selection procedures of Section 3);

   o  other mechanisms MAY be used.

4.4.  Inter-AS

   If the non-segmented inter-AS approach is used, the procedures
   described in Section 4.1 through Section 4.3 can be applied.

   When multicast VPNs are used in an inter-AS context with the
   segmented inter-AS approach described in Section 9.2 of [RFC6514],
   the procedures in this section can be applied.

   The prerequisites for the procedures described below to be applied
   for a source of a given MVPN are:

   o  that any PE of this MVPN receives two or more Inter-AS I-PMSI A-D
      Routes advertised by the AS of the source

   o  that these Inter-AS I-PMSI A-D Routes have distinct Route
      Distinguishers (as described in item "(2)" of Section 9.2 of
      [RFC6514]).
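
   The two prerequisites listed above amount to a simple check on the
   set of received routes, sketched below.  The sketch is illustrative
   only: the InterAsIPmsiADRoute record and the
   standby_prerequisite_met function are hypothetical names, the route
   is reduced to the two fields the check needs, and the RD strings and
   AS numbers are documentation values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterAsIPmsiADRoute:
    """Hypothetical, minimal view of an Inter-AS I-PMSI A-D Route."""
    rd: str         # Route Distinguisher, e.g. "192.0.2.1:1" (illustrative)
    source_as: int  # AS that advertised the route


def standby_prerequisite_met(routes, source_as: int) -> bool:
    """True if two or more routes from the AS of the source were
    received and they carry distinct Route Distinguishers."""
    distinct_rds = {r.rd for r in routes if r.source_as == source_as}
    # Two distinct RDs imply at least two routes from that AS.
    return len(distinct_rds) >= 2
```

   A PE that has received only one such route, or several routes that
   share the same RD, would not meet the prerequisites in this sketch
   and would therefore have no second route from which to build a
   Standby C-multicast route.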

   As an example, these conditions will be satisfied when the source is
   dual-homed to an AS that connects to the receiver AS through two
   ASBRs using auto-configured RDs.

4.4.1.  Inter-AS Procedures for downstream PEs, ASBR Fast Failover

   The following procedure is applied by downstream PEs of an AS, for a
   source S in a remote AS.

   In addition to choosing an Inter-AS I-PMSI A-D Route advertised from
   the AS of the source to construct a C-multicast route, as described
   in Section 11.1.3 of [RFC6514], a downstream PE will choose a second
   Inter-AS I-PMSI A-D Route advertised from the AS of the source and
   use this route to construct and advertise a Standby C-multicast route
   (C-multicast route carrying the Standby PE BGP Community), as
   described in Section 4.1.

4.4.2.  Inter-AS Procedures for ASBRs

   When an Upstream ASBR receives a C-multicast route, and at least one
   of the RTs of the route matches one of the ASBR Import RTs, the ASBR
   that supports this specification MUST try to locate an Inter-AS
   I-PMSI A-D Route whose RD and Source AS respectively match the RD and
   Source AS carried in the C-multicast route.  If the match is found,
   and the C-multicast route carries the Standby PE BGP Community, then
   the ASBR MUST perform as follows:

   o  if the route was received over iBGP and its LOCAL_PREF attribute
      is set to zero, then it MUST be re-advertised in eBGP with a MED
      attribute (MULTI_EXIT_DISC) set to the highest possible value
      (0xffffffff)

   o  if the route was received over eBGP and its MED attribute is set
      to 0xffffffff, then it MUST be re-advertised in iBGP with a
      LOCAL_PREF attribute set to zero

   Other ASBR procedures are applied without modification.

5.  Hot Root Standby

   The mechanisms defined in Section 4 and Section 3 can be used
   together as follows.

   The principle is that, for a given VRF (or possibly only for a given
   (C-S, C-G)):

   o  downstream PEs advertise a Standby BGP C-multicast route (based on
      Section 4)

   o  Upstream PEs use the "hot root standby" optional behavior and thus
      will forward traffic for a given multicast state as soon as they
      have either a (primary) BGP C-multicast route or a Standby BGP
      C-multicast route for that state (or both)

   o  downstream PEs accept traffic from the primary or standby tunnel,
      based on the status of the tunnel (based on Section 3)

   Other combinations of the mechanisms proposed in Section 4 and
   Section 3 are for further study.

   Note that the same level of protection would be achievable with a
   simple C-multicast Source Tree Join route advertised to both the
   primary and secondary Upstream PEs (carrying as Route Target extended
   communities, the values of the VRF Route Import attribute of each VPN
   route from each Upstream PE).  The advantage of using the Standby
   semantic is that, supposing that downstream PEs always advertise a
   Standby C-multicast route to the secondary Upstream PE, it allows
   choosing the protection level through a change of configuration on
   the secondary Upstream PE, without requiring any reconfiguration of
   all the downstream PEs.

6.  Duplicate Packets

   Multicast VPN specifications [RFC6513] impose that a PE only forwards
   to CEs the packets coming from the expected Upstream PE (Section 9.1
   of [RFC6513]).

   We draw the reader's attention to the fact that compliance with this
   part of multicast VPN specifications is especially important when two
   distinct Upstream PEs are liable to forward the same traffic on
   P-tunnels at the same time in the steady state.  That will be the
   case when "hot root standby" mode is used (Section 4), and can also
   be the case if procedures of Section 3 are used and a) the rules
   determining the status of a tree are not the same on two distinct
   downstream PEs or b) the rule determining the status of a tree
   depends on conditions local to a PE (e.g., the PE-P upstream link
   being up).

7.  IANA Considerations

7.1.  Standby PE Community

   IANA is requested to allocate the BGP "Standby PE" community value
   (TBA1) from the Border Gateway Protocol (BGP) Well-known Communities
   registry using the First Come First Served registration policy.

7.2.  BFD Discriminator

   This document defines a new BGP optional transitive attribute, called
   "BFD Discriminator".  IANA is requested to allocate a codepoint
   (TBA2) in the "BGP Path Attributes" registry to the BFD Discriminator
   attribute.

   IANA is requested to create a new BFD Mode sub-registry in the Border
   Gateway Protocol (BGP) Parameters registry.  The registration
   policies, per [RFC8126], for this sub-registry are according to
   Table 1.

                 +-----------+-------------------------+
                 | Value     | Policy                  |
                 +-----------+-------------------------+
                 | 0 - 175   | IETF Review             |
                 | 176 - 249 | First Come First Served |
                 | 250 - 254 | Experimental Use        |
                 | 255       | IETF Review             |
                 +-----------+-------------------------+

           Table 1: BFD Mode Sub-registry Registration Policies

   IANA is requested to make initial assignments according to Table 2.

            +-----------+------------------+---------------+
            | Value     | Description      | Reference     |
            +-----------+------------------+---------------+
            | 0         | Reserved         | This document |
            | 1         | P2MP BFD Session | This document |
            | 2 - 175   | Unassigned       | This document |
            | 176 - 249 | Unassigned       | This document |
            | 250 - 254 | Experimental Use | This document |
            | 255       | Reserved         | This document |
            +-----------+------------------+---------------+

                      Table 2: BFD Mode Sub-registry

7.3.  BFD Discriminator Optional Sub-TLV Type

   IANA is requested to create a new BFD Discriminator Optional sub-TLV
   Type sub-registry in the Border Gateway Protocol (BGP) Parameters
   registry.  The registration policies, per [RFC8126], for this
   sub-registry are according to Table 3.

                 +-----------+-------------------------+
                 | Value     | Policy                  |
                 +-----------+-------------------------+
                 | 0 - 175   | IETF Review             |
                 | 176 - 249 | First Come First Served |
                 | 250 - 254 | Experimental Use        |
                 | 255       | IETF Review             |
                 +-----------+-------------------------+

       Table 3: BFD Discriminator Optional Sub-TLV Type Sub-registry
                           Registration Policies

   IANA is requested to make initial assignments according to Table 4.

            +-----------+------------------+---------------+
            | Value     | Description      | Reference     |
            +-----------+------------------+---------------+
            | 0         | Reserved         | This document |
            | 1 - 175   | Unassigned       | This document |
            | 176 - 249 | Unassigned       | This document |
            | 250 - 254 | Experimental Use | This document |
            | 255       | Reserved         | This document |
            +-----------+------------------+---------------+

       Table 4: BFD Discriminator Optional Sub-TLV Type Sub-registry

8.  Security Considerations

   This document describes procedures based on [RFC6513] and [RFC6514]
   and hence shares the security considerations respectively represented
   in these specifications.

   This document uses P2MP BFD, as defined in [RFC8562], which, in turn,
   is based on [RFC5880].  Security considerations relevant to each
   protocol are discussed in the respective protocol specifications.  An
   implementation that supports this specification MUST use a mechanism
   to control the maximum number of P2MP BFD sessions that can be active
   at the same time.

9.  Acknowledgments

   The authors want to thank Greg Reaume, Eric Rosen, Jeffrey Zhang,
   Martin Vigoureux, Adrian Farrel, and Zheng (Sandy) Zhang for their
   reviews, useful comments, and helpful suggestions.

10.  Contributor Addresses

   Below is a list of other contributing authors in alphabetical order:

   Rahul Aggarwal
   Arktan

   Email: raggarwa_1@yahoo.com

   Nehal Bhau
   Cisco

   Email: NBhau@cisco.com

   Clayton Hassen
   Bell Canada
   2955 Virtual Way
   Vancouver
   CANADA

   Email: Clayton.Hassen@bell.ca

   Wim Henderickx
   Nokia
   Copernicuslaan 50
   Antwerp 2018
   Belgium

   Email: wim.henderickx@nokia.com

   Pradeep Jain
   Nokia
   701 E Middlefield Rd
   Mountain View, CA 94043
   USA

   Email: pradeep.jain@nokia.com

   Jayant Kotalwar
   Nokia
   701 E Middlefield Rd
   Mountain View, CA 94043
   USA

   Email: Jayant.Kotalwar@nokia.com

   Praveen Muley
   Nokia
   701 East Middlefield Rd
   Mountain View, CA 94043
   U.S.A.

   Email: praveen.muley@nokia.com

   Ray (Lei) Qiu
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   U.S.A.

   Email: rqiu@juniper.net

   Yakov Rekhter
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   U.S.A.

   Email: yakov@juniper.net

   Kanwar Singh
   Nokia
   701 E Middlefield Rd
   Mountain View, CA 94043
   USA

   Email: kanwar.singh@nokia.com

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC4875]  Aggarwal, R., Ed., Papadimitriou, D., Ed., and S.
              Yasukawa, Ed., "Extensions to Resource Reservation
              Protocol - Traffic Engineering (RSVP-TE) for Point-to-
              Multipoint TE Label Switched Paths (LSPs)", RFC 4875,
              DOI 10.17487/RFC4875, May 2007.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/
              BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February
              2012.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

   [RFC7606]  Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
              Patel, "Revised Error Handling for BGP UPDATE Messages",
              RFC 7606, DOI 10.17487/RFC7606, August 2015.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8562]  Katz, D., Ward, D., Pallagatti, S., Ed., and G. Mirsky,
              Ed., "Bidirectional Forwarding Detection (BFD) for
              Multipoint Networks", RFC 8562, DOI 10.17487/RFC8562,
              April 2019.

11.2.  Informative References

   [RFC4090]  Pan, P., Ed., Swallow, G., Ed., and A. Atlas, Ed., "Fast
              Reroute Extensions to RSVP-TE for LSP Tunnels", RFC 4090,
              DOI 10.17487/RFC4090, May 2005.

   [RFC7431]  Karan, A., Filsfils, C., Wijnands, IJ., Ed., and B.
              Decraene, "Multicast-Only Fast Reroute", RFC 7431,
              DOI 10.17487/RFC7431, August 2015.

Authors' Addresses

   Thomas Morin (editor)
   Orange
   2, avenue Pierre Marzin
   Lannion 22307
   France

   Email: thomas.morin@orange-ftgroup.com

   Robert Kebler (editor)
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   U.S.A.

   Email: rkebler@juniper.net

   Greg Mirsky (editor)
   ZTE Corp.

   Email: gregimirsky@gmail.com