Network Working Group                                      T. Morin, Ed.
Internet-Draft                                                    Orange
Intended status: Standards Track                          R. Kebler, Ed.
Expires: June 24, 2021                                  Juniper Networks
                                                          G. Mirsky, Ed.
                                                               ZTE Corp.
                                                       December 21, 2020


                  Multicast VPN Fast Upstream Failover
                 draft-ietf-bess-mvpn-fast-failover-14

Abstract

   This document defines Multicast Virtual Private Network (VPN)
   extensions and procedures that allow fast failover for upstream
   failures by allowing downstream Provider Edges (PEs) to consider the
   status of Provider-Tunnels (P-tunnels) when selecting the upstream PE
   for a VPN multicast flow.  The fast failover is enabled by using RFC
   8562 Bidirectional Forwarding Detection (BFD) for Multipoint Networks
   and the new BGP Attribute - BFD Discriminator.  Also, the document
   introduces a new BGP Community, Standby PE, extending BGP Multicast
   VPN routing so that a C-multicast route can be advertised toward a
   Standby Upstream PE.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 24, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions used in this document
     2.1.  Requirements Language
     2.2.  Terminology
     2.3.  Acronyms
   3.  UMH Selection Based on Tunnel Status
     3.1.  Determining the Status of a Tunnel
       3.1.1.  MVPN Tunnel Root Tracking
       3.1.2.  PE-P Upstream Link Status
       3.1.3.  P2MP RSVP-TE Tunnels
       3.1.4.  Leaf-initiated P-tunnels
       3.1.5.  (C-S, C-G) Counter Information
       3.1.6.  BFD Discriminator Attribute
       3.1.7.  Per PE-CE Link BFD Discriminator
   4.  Standby C-multicast Route
     4.1.  Downstream PE Behavior
     4.2.  Upstream PE Behavior
     4.3.  Reachability Determination
     4.4.  Inter-AS
       4.4.1.  Inter-AS Procedures for downstream PEs, ASBR Fast
               Failover
       4.4.2.  Inter-AS Procedures for ASBRs
   5.  Hot Root Standby
   6.  Duplicate Packets
   7.  IANA Considerations
     7.1.  Standby PE Community
     7.2.  BFD Discriminator
     7.3.  BFD Discriminator Optional Sub-TLV Type
   8.  Security Considerations
   9.  Acknowledgments
   10. Contributor Addresses
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   It is assumed that the reader is familiar with the workings of
   multicast MPLS/BGP IP VPNs as described in [RFC6513] and [RFC6514].

   In the context of multicast in BGP/MPLS VPNs [RFC6513], it is
   desirable to provide mechanisms allowing fast recovery of
   connectivity on different types of failures.  This document addresses
   failures of elements in the provider network that are upstream of PEs
   connected to VPN sites with receivers.

   Section 3 describes local procedures allowing an egress PE (a PE
   connected to a receiver site) to take into account the status of
   P-tunnels to determine the Upstream Multicast Hop (UMH) for a given
   (C-S, C-G).  One of the optional methods uses [RFC8562] and the new
   BGP Attribute - BFD Discriminator.  None of these methods provide a
   "fast failover" solution when used alone, but can be used together
   with the mechanism described in Section 4 for a "fast failover"
   solution.

   Section 4 describes an optional BGP extension, a new Standby PE
   Community, that can speed up failover by not requiring any multicast
   VPN (MVPN) routing message exchange at recovery time.

   Section 5 describes a "hot leaf standby" mechanism that can be used
   to improve failover time in MVPN.
   The approach combines mechanisms
   defined in Section 3 and Section 4, and has similarities with the
   solution described in [RFC7431] to improve failover times when PIM
   routing is used in a network given some topology and metric
   constraints.

   The procedures described in this document are optional and allow an
   operator to provide protection for multicast services in BGP/MPLS IP
   VPNs.  An operator would enable these mechanisms using a method
   discussed in Section 3 combined with the redundancy provided by a
   standby PE connected to the multicast flow source.  PEs that support
   these mechanisms would converge faster and thus provide a more stable
   multicast service.  In the case that a BGP implementation does not
   recognize or is configured not to support the extensions defined in
   this document, the implementation will continue to provide the
   multicast service, as described in [RFC6513].

2.  Conventions used in this document

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.2.  Terminology

   The terminology used in this document is the terminology defined in
   [RFC6513] and [RFC6514].

   The term 'upstream' (lower case) throughout this document refers to
   links and nodes that are upstream to a PE connected to VPN sites with
   receivers of a multicast flow.

   The term 'Upstream' (capitalized) throughout this document refers to
   a PE or an Autonomous System Border Router (ASBR) at which (S,G) or
   (*,G) data packets enter the VPN backbone or the local AS when
   traveling through the VPN backbone.

2.3.  Acronyms

   PMSI:  P-Multicast Service Interface

   I-PMSI:  Inclusive PMSI

   S-PMSI:  Selective PMSI

   x-PMSI:  Either an I-PMSI or an S-PMSI

   P-tunnel:  Provider-Tunnels

   UMH:  Upstream Multicast Hop

   VPN:  Virtual Private Network

   MVPN:  Multicast VPN

   RD:  Route Distinguisher

   RP:  Rendezvous Point

   NLRI:  Network Layer Reachability Information

   VRF:  VPN Routing and Forwarding Table

   MED:  Multi-Exit Discriminator

   P2MP:  Point-to-Multipoint

3.  UMH Selection Based on Tunnel Status

   Section 5.1 of [RFC6513] describes procedures used by a multicast VPN
   downstream PE to determine the Upstream Multicast Hop (UMH) for a
   given (C-S, C-G).

   For a given downstream PE and a given VRF, the P-tunnel corresponding
   to a given Upstream PE for a given (C-S, C-G) state is the S-PMSI
   tunnel advertised by that Upstream PE for this (C-S, C-G) and
   imported into that VRF, or if there isn't any such S-PMSI, the I-PMSI
   tunnel advertised by that PE and imported into that VRF.

   The procedure described here is an OPTIONAL procedure that is based
   on a downstream PE taking into account the status of P-tunnels rooted
   at each possible Upstream PE, for including or not including each
   given PE in the list of candidate UMHs for a given (C-S, C-G) state.
   If it is not possible to determine whether a P-tunnel's current
   status is Up, the state shall be considered "not known to be Down",
   and it may be treated as if it is Up so that attempts to use the
   tunnel are acceptable.  The result is that, if a P-tunnel is Down
   (see Section 3.1), the PE that is the root of the P-tunnel will not
   be considered for UMH selection.  This will result in the downstream
   PE failing over to use the next Upstream PE in the list of
   candidates.  Some downstream PEs could arrive at a different
   conclusion regarding the tunnel's state because the failure impacts
   only a subset of branches.
   Because of that, the procedures of
   Section 9.1.1 of [RFC6513] are applicable when using I-PMSI
   P-tunnels.  That document is a foundation for this document, and its
   processes all apply here.

   There are three options specified in Section 5.1 of [RFC6513] for a
   downstream PE to select an Upstream PE.

   o  The first two options select the Upstream PE from a candidate PE
      set either based on an IP address or a hashing algorithm.  When
      used together with the optional procedure of considering the
      P-tunnel status as in this document, a candidate Upstream PE is
      included in the set if it either:

      A.  advertises an x-PMSI bound to a tunnel, where the specified
          tunnel's state is not known to be Down, or,

      B.  does not advertise any x-PMSI applicable to the given (C-S,
          C-G) but has associated a VRF Route Import BGP attribute to
          the unicast VPN route for S.  That is necessary to avoid
          incorrectly invalidating a UMH PE that would use a policy
          where no I-PMSI is advertised for a given VRF and where only
          S-PMSI are used.  The S-PMSI can be advertised only after the
          Upstream PE receives a C-multicast route for (C-S, C-G)/(C-*,
          C-G) to be carried over the advertised S-PMSI.

      If the resulting candidate set is empty, then the procedure is
      repeated without considering the P-tunnel status.

   o  The third option uses the installed UMH Route (i.e., the "best"
      route towards the C-root) as the Selected UMH Route, and its
      originating PE is the selected Upstream PE.  With the optional
      procedure of considering P-tunnel status as in this document, the
      Selected UMH Route is the best one among those whose originating
      PE's P-tunnel is not "down".  If that does not exist, the
      installed UMH Route is selected regardless of the P-tunnel status.

3.1.  Determining the Status of a Tunnel

   Different factors can be considered to determine the "status" of a
   P-tunnel and are described in the following sub-sections.  The
   optional procedures described in this section also handle the case
   when the downstream PEs do not all apply the same rules to define
   what the status of a P-tunnel is (please see Section 6), and some of
   them will produce a result that may be different for different
   downstream PEs.  Thus, the "status" of a P-tunnel in this section is
   not a characteristic of the tunnel in itself, but is the tunnel
   status, as seen from a particular downstream PE.  Additionally, some
   of the following methods determine the ability of a downstream PE to
   receive traffic on the P-tunnel rather than the status of the
   P-tunnel itself.  That could be referred to as "P-tunnel
   reception status", but for simplicity, we will use the terminology of
   P-tunnel "status" for all of these methods.

   Depending on the criteria used to determine the status of a P-tunnel,
   there may be an interaction with another resiliency mechanism used
   for the P-tunnel itself, and the UMH update may happen immediately or
   may need to be delayed.  Each particular case is covered in each
   separate sub-section below.

   An implementation may support any combination of the methods
   described in this section and provide a network operator with control
   to choose which one to use in the particular deployment.

3.1.1.  MVPN Tunnel Root Tracking

   When determining if the status of a P-tunnel is Up, a condition to
   consider is whether the root of the tunnel, as specified in the
   x-PMSI Tunnel attribute, is reachable through unicast routing tables.
   In this case, the downstream PE can immediately update its UMH when
   the reachability condition changes.
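
   As an illustration only, the following Python sketch combines tunnel
   root tracking with the candidate-set rules A and B of Section 3.  The
   names UpstreamPE, root_tracking_status, candidate_set, and
   select_upstream_pe are hypothetical, the set of reachable roots stands
   in for a lookup in the unicast routing tables, and the final choice
   assumes the highest-IP-address rule of [RFC6513]; none of this is
   mandated by this document.

```python
import ipaddress
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Set


class TunnelStatus(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"  # "not known to be Down"; treated like UP


@dataclass(frozen=True)
class UpstreamPE:
    """Hypothetical view of one candidate Upstream PE."""
    address: str                 # PE address used by the selection rule
    tunnel_root: Optional[str]   # root from the x-PMSI Tunnel attribute;
                                 # None if no applicable x-PMSI is advertised
    has_vrf_route_import: bool = True


def root_tracking_status(pe: UpstreamPE, reachable_roots: Set[str]) -> TunnelStatus:
    """Section 3.1.1: the tunnel is Down if its root is unreachable."""
    if pe.tunnel_root is None:
        return TunnelStatus.UNKNOWN
    if pe.tunnel_root in reachable_roots:
        return TunnelStatus.UP
    return TunnelStatus.DOWN


def candidate_set(pes: Iterable[UpstreamPE], reachable_roots: Set[str]) -> List[UpstreamPE]:
    """Apply rules A and B; fall back to ignoring tunnel status if empty."""
    all_pes = list(pes)
    kept = []
    for pe in all_pes:
        if pe.tunnel_root is None:
            if pe.has_vrf_route_import:          # rule B
                kept.append(pe)
        elif root_tracking_status(pe, reachable_roots) is not TunnelStatus.DOWN:
            kept.append(pe)                      # rule A
    return kept or all_pes


def select_upstream_pe(pes: Iterable[UpstreamPE], reachable_roots: Set[str]) -> Optional[UpstreamPE]:
    """Assumed tie-break: numerically highest PE address."""
    candidates = candidate_set(pes, reachable_roots)
    if not candidates:
        return None
    return max(candidates, key=lambda pe: ipaddress.ip_address(pe.address))
```

   With two PEs at 192.0.2.1 and 192.0.2.2, the sketch selects 192.0.2.2
   while both roots are reachable, moves to 192.0.2.1 when the root of
   192.0.2.2 becomes unreachable, and returns to plain address-based
   selection when no root is reachable at all.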

   That is similar to BGP next-hop tracking for VPN routes, except that
   the address considered is not the BGP next-hop address, but the root
   address in the x-PMSI Tunnel attribute.

   If BGP next-hop tracking is done for VPN routes and the root address
   of a given tunnel happens to be the same as the next-hop address in
   the BGP A-D Route advertising the tunnel, then checking, in unicast
   routing tables, whether the tunnel root is reachable, will be
   unnecessary duplication and thus will not bring any specific benefit.

3.1.2.  PE-P Upstream Link Status

   When determining if the status of a P-tunnel is Up, a condition to
   consider is whether the last-hop link of the P-tunnel is Up.
   Conversely, if the last-hop link of the P-tunnel is Down then this
   can be taken as an indication that the P-tunnel is Down.

   Using this method when a fast restoration mechanism (such as MPLS FRR
   [RFC4090]) is in place for the link requires careful consideration
   and coordination of defect detection intervals for the link and the
   tunnel.  In many cases, it is not practical to use both protection
   methods at the same time because uncorrelated timers might cause
   unnecessary switchovers and destabilize the network.

3.1.3.  P2MP RSVP-TE Tunnels

   For P-tunnels of type P2MP MPLS-TE, the status of the P-tunnel is
   considered Up if the sub-LSP to this downstream PE is in the Up
   state.  The determination of whether a P2MP RSVP-TE LSP is in the Up
   state requires Path and Resv state for the LSP and is based on
   procedures specified in [RFC4875].  As a result, the downstream PE
   can immediately update its UMH when the reachability condition
   changes.

   When using this method and if the signaling state for a P2MP TE LSP
   is removed (e.g., if the ingress of the P2MP TE LSP sends a PathTear
   message) or the P2MP TE LSP changes state from Up to Down as
   determined by procedures in [RFC4875], the status of the
   corresponding P-tunnel MUST be re-evaluated.  If the P-tunnel
   transitions from Up to Down state, the Upstream PE that is the
   ingress of the P-tunnel MUST NOT be considered as a valid candidate
   UMH.

3.1.4.  Leaf-initiated P-tunnels

   An Upstream PE SHOULD be removed from the UMH candidate list for a
   given (C-S, C-G) if the P-tunnel (I-PMSI or S-PMSI) for this (S, G)
   is leaf-triggered (PIM, mLDP), but for some reason, internal to the
   protocol, the upstream one-hop branch of the tunnel from P to PE
   cannot be built.  As a result, the downstream PE can immediately
   update its UMH when the reachability condition changes.

3.1.5.  (C-S, C-G) Counter Information

   In cases where the downstream node can be configured so that the
   maximum inter-packet time is known for all the multicast flows mapped
   on a P-tunnel, the local per-(C-S, C-G) traffic counter information
   for traffic received on this P-tunnel can be used to determine the
   status of the P-tunnel.

   When such a procedure is used, in the context where fast restoration
   mechanisms are used for the P-tunnels, a configurable timer MUST be
   set on the downstream PE to wait before updating the UMH, to let the
   P-tunnel restoration mechanism execute its actions.  Determining
   that a tunnel is probably down by waiting for enough packets to fail
   to arrive as expected is a heuristic and operational matter that
   depends on the maximum inter-packet time.  A timeout of three seconds
   is a generally suitable default waiting period to ascertain that the
   tunnel is down, though other values would be needed for atypical
   conditions.
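
   The counter-based heuristic above could be sketched as follows.  This
   is a minimal Python illustration under stated assumptions: the class
   name FlowMonitor, its methods, and the intermediate "suspect" state
   are hypothetical, and treating the configurable wait timer as an
   additional delay on top of the maximum inter-packet time is only one
   possible reading of the text.  Timestamps are passed in explicitly so
   that the logic does not depend on a particular clock.

```python
from typing import Optional


class FlowMonitor:
    """Hypothetical per-(C-S, C-G) reception monitor on a downstream PE."""

    def __init__(self, max_inter_packet_time: float, wait_time: float = 3.0):
        if max_inter_packet_time <= 0 or wait_time < 0:
            raise ValueError("timers must be positive")
        self.max_inter_packet_time = max_inter_packet_time
        # Configurable wait before updating the UMH; three seconds is the
        # default suggested in the text.
        self.wait_time = wait_time
        self.last_rx: Optional[float] = None

    def packet_received(self, now: float) -> None:
        """Record the arrival time of a packet counted for this flow."""
        self.last_rx = now

    def state(self, now: float) -> str:
        """Return 'unknown', 'up', 'suspect', or 'down' at time `now`."""
        if self.last_rx is None:
            return "unknown"          # nothing seen yet: not known to be Down
        silence = now - self.last_rx
        if silence <= self.max_inter_packet_time:
            return "up"
        if silence <= self.max_inter_packet_time + self.wait_time:
            # Packets are overdue, but the P-tunnel restoration mechanism
            # is still given time to act; the UMH is left unchanged.
            return "suspect"
        return "down"                 # the UMH may now be updated
```

   In this sketch only the "down" state would trigger a UMH update; an
   implementation could equally fold the two timers into one.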

   In cases where this mechanism is used in conjunction with the method
   described in Section 5, no prior knowledge of the rate or maximum
   inter-packet time on the multicast streams is required; downstream
   PEs can compare actual packet reception statistics on the two
   P-tunnels to determine when one of them is down.  The detailed
   specification of this mechanism is outside the scope of this
   document.

3.1.6.  BFD Discriminator Attribute

   P-tunnel status may be derived from the status of a multipoint BFD
   session [RFC8562] whose discriminator is advertised along with an
   x-PMSI A-D Route.

   This document defines the format and ways of using a new BGP
   attribute called the "BFD Discriminator".  It is an optional
   transitive BGP attribute.  Thus it is expected that an implementation
   that does not recognize or is configured not to support this
   attribute follows procedures defined for optional transitive path
   attributes in Section 5 of [RFC4271].  In Section 7.2, IANA is
   requested to allocate the codepoint value (TBA2).  The format of this
   attribute is shown in Figure 1.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    BFD Mode   |                   Reserved                    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       BFD Discriminator                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~                         Optional TLVs                         ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 1: Format of the BFD Discriminator Attribute

   Where:

      BFD Mode field is one octet long.  This specification defines
      the P2MP BFD Session as value 1 (Section 7.2).

      Reserved field is three octets long, and the value MUST be zeroed
      on transmission and ignored on receipt.

      BFD Discriminator field is four octets long.

      Optional TLVs is the optional variable-length field that MAY be
      used in the BFD Discriminator attribute for future extensions.
      TLVs MAY be included in a sequential or nested manner.  To allow
      for TLV nesting, it is advised to define a new TLV as a variable-
      length object.  Figure 2 presents the Optional TLV format, which
      consists of:

      *  Type - a one-octet-long field that characterizes the
         interpretation of the Value field (Section 7.3)

      *  Length - a one-octet-long field equal to the length of the
         Value field in octets

      *  Value - a variable-length field.

      The length of a TLV as a whole MUST be a multiple of four octets.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      Type     |     Length    |           Value ...
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 2: Format of the Optional TLV

   The BFD Discriminator attribute MUST be considered malformed if its
   length is not a non-zero multiple of four.  If the attribute is
   deemed to be malformed, the UPDATE message SHALL be handled using the
   approach of Attribute Discard per [RFC7606].

3.1.6.1.  Upstream PE Procedures

   To enable downstream PEs to track the P-tunnel status using a point-
   to-multipoint (P2MP) BFD session, the Upstream PE:

   o  MUST initiate the BFD session and set bfd.SessionType =
      MultipointHead as described in [RFC8562];

   o  when transmitting BFD Control packets MUST set the IP destination
      address of the inner IP header to one of the internal loopback
      addresses from the 127/8 range for IPv4.
      For IPv6, it SHOULD use
      the loopback address ::1/128 [RFC4291] or MAY use one of the
      IPv4-mapped IPv6 addresses from the ::ffff:127.0.0.0/104 range;

   o  MUST use its IP address as the source IP address when transmitting
      BFD Control packets;

   o  MUST include the BFD Discriminator attribute in the x-PMSI A-D
      Route with the value set to My Discriminator value;

   o  MUST periodically transmit BFD Control packets over the x-PMSI
      P-tunnel after the P-tunnel is considered established.  Note that
      the methods to declare a P-tunnel has been established are outside
      the scope of this specification.

   If the tracking of the P-tunnel by using a P2MP BFD session is
   enabled after the x-PMSI A-D Route has been already advertised, the
   x-PMSI A-D Route MUST be re-sent, with the only change between the
   previous advertisement and the new advertisement being the inclusion
   of the BFD Discriminator attribute.

   If the x-PMSI A-D Route is advertised with P-tunnel status tracked
   using the P2MP BFD session, and it is desired to stop tracking
   P-tunnel status using BFD, then:

   o  the x-PMSI A-D Route MUST be re-sent, with the only change between
      the previous advertisement and the new advertisement being the
      exclusion of the BFD Discriminator attribute;

   o  the P2MP BFD session MUST be deleted.  The session MAY be deleted
      after some configurable delay, which should have a reasonable
      default.

3.1.6.2.  Downstream PE Procedures

   Upon receiving the BFD Discriminator attribute in the x-PMSI A-D
   Route, the downstream PE:

   o  MUST associate the received BFD Discriminator value with the
      P-tunnel originating from the Upstream PE and the IP address of
      the Upstream PE;

   o  MUST create a P2MP BFD session and set bfd.SessionType =
      MultipointTail as described in [RFC8562];

   o  MUST use the source IP address of the BFD Control packet, the
      value of the BFD Discriminator field, and the x-PMSI Tunnel
      Identifier [RFC6514] the BFD Control packet was received on to
      properly demultiplex BFD sessions.

   After the state of the P2MP BFD session is up, i.e., bfd.SessionState
   == Up, the session state will then be used to track the health of the
   P-tunnel.

   According to [RFC8562], if the downstream PE receives Down or
   AdminDown in the State field of the BFD Control packet, or if the
   Detection Timer associated with the BFD session expires, the BFD
   session is down, i.e., bfd.SessionState == Down.
   When the BFD session state is Down, then the P-tunnel associated with
   the BFD session MUST be considered down.  If the site that contains
   C-S is connected to two or more PEs, a downstream PE will select one
   as its Primary Upstream PE, while others are considered as Standby
   Upstream PEs.  In such a scenario, when the P-tunnel is considered
   down, the downstream PE MAY initiate a switchover of the traffic from
   the Primary Upstream PE to the Standby Upstream PE only if the
   Standby Upstream PE is deemed to be in the Up state.  That MAY be
   determined from the state of a P2MP BFD session with the Standby
   Upstream PE as the MultipointHead.
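
   To make the attribute layout of Figure 1 and the demultiplexing rule
   above concrete, the following Python sketch parses the value of a BFD
   Discriminator attribute and builds a session lookup key.  The names
   BfdDiscriminatorAttr, parse_bfd_discriminator, and demux_key are
   hypothetical.  Two choices are assumptions beyond the text: a value
   shorter than eight octets is rejected because it cannot hold the
   fixed fields, and a TLV that violates the four-octet rule makes the
   whole attribute malformed.  Returning None stands in for Attribute
   Discard [RFC7606].

```python
import struct
from typing import NamedTuple, Optional, Tuple

P2MP_BFD_SESSION = 1  # BFD Mode value defined by this specification


class BfdDiscriminatorAttr(NamedTuple):
    mode: int
    discriminator: int
    tlvs: Tuple[Tuple[int, bytes], ...]  # (Type, Value) pairs


def parse_bfd_discriminator(value: bytes) -> Optional[BfdDiscriminatorAttr]:
    """Parse the attribute value; None means 'treat as malformed'."""
    # Malformed unless the length is a non-zero multiple of four.
    if len(value) == 0 or len(value) % 4 != 0:
        return None
    # Assumption: the fixed part (Mode, Reserved, Discriminator) must fit.
    if len(value) < 8:
        return None
    mode = value[0]
    # value[1:4] is Reserved and is ignored on receipt.
    (discriminator,) = struct.unpack("!I", value[4:8])
    tlvs = []
    offset = 8
    while offset < len(value):
        tlv_type, tlv_len = value[offset], value[offset + 1]
        total = 2 + tlv_len  # Type + Length + Value
        # Assumption: a TLV whose total size is not a multiple of four,
        # or that overruns the attribute, makes the attribute malformed.
        if total % 4 != 0 or offset + total > len(value):
            return None
        tlvs.append((tlv_type, value[offset + 2:offset + total]))
        offset += total
    return BfdDiscriminatorAttr(mode, discriminator, tuple(tlvs))


def demux_key(source_ip: str, discriminator: int, tunnel_id: bytes):
    """Key used by a MultipointTail to look up its P2MP BFD session."""
    return (source_ip, discriminator, tunnel_id)
```

   A downstream PE could store its MultipointTail sessions in a
   dictionary indexed by demux_key(...), using the Upstream PE address
   and x-PMSI Tunnel Identifier learned from the A-D Route.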

   If the downstream PE's P-tunnel is already established when the
   downstream PE receives the new x-PMSI A-D Route with BFD
   Discriminator attribute, the downstream PE MUST associate the value
   of BFD Discriminator field with the P-tunnel and follow procedures
   listed above in this section if and only if the x-PMSI A-D Route was
   properly processed as per [RFC6514], and the BFD Discriminator
   attribute was validated.

   If the downstream PE's P-tunnel is already established, its state
   being monitored by the P2MP BFD session, and the downstream PE
   receives the new x-PMSI A-D Route without the BFD Discriminator
   attribute, and the x-PMSI A-D Route was processed without any error
   as per the relevant specifications, the downstream PE:

   o  MUST stop processing BFD Control packets for this P2MP BFD
      session;

   o  MUST delete the P2MP BFD session associated with the P-tunnel.
      The session MAY be deleted after some configurable delay, which
      should have a reasonable default;

   o  SHOULD NOT switch the traffic to the Standby Upstream PE.

3.1.7.  Per PE-CE Link BFD Discriminator

   The following approach is defined in response to the detection by the
   Upstream PE of a PE-CE link failure.  Even though the provider tunnel
   is still up, it is desired for the downstream PEs to switch to a
   backup Upstream PE.  To achieve that, if the Upstream PE detects that
   its PE-CE link fails, it SHOULD set the bfd.LocalDiag of the P2MP BFD
   session to Concatenated Path Down and/or Reverse Concatenated Path
   Down (per Section 6.8.17 of [RFC5880]), unless it switches to a new
   PE-CE link within the time of bfd.DesiredMinTxInterval for the P2MP
   BFD session (in that case, the Upstream PE will start tracking the
   status of the new PE-CE link).  When a downstream PE receives that
   bfd.LocalDiag code, it treats it as if the tunnel itself failed and
   tries to switch to a backup PE.

4.  Standby C-multicast Route

   The procedures described below are limited to the case where the site
   that contains C-S is connected to two or more PEs, though, to
   simplify the description, the case of dual-homing is described.  In
   the case where more than two PEs are connected to the C-S site,
   selection of the Standby PE can be performed using one of the methods
   of selecting a UMH.  Details of the selection are outside the scope
   of this document.  The procedures require all the PEs of that MVPN to
   follow the same UMH selection procedure, as specified in [RFC6513],
   whether the PE is selected based on its IP address, the hashing
   algorithm described in section 5.1.3 of [RFC6513], or Installed UMH
   Route.  The consistency of the UMH selection method used among all
   PEs is expected to be provided by the management plane.  The
   procedures assume that if a site of a given MVPN that contains C-S is
   dual-homed to two PEs, then all the other sites of that MVPN would
   have two unicast VPN routes (VPN-IPv4 or VPN-IPv6) to C-S, each with
   its own RD.

   As long as C-S is reachable via both PEs, a given downstream PE will
   select one of the PEs connected to C-S as its Upstream PE for C-S.
   We will refer to the other PE connected to C-S as the "Standby
   Upstream PE".  Note that if the connectivity to C-S through the
   Primary Upstream PE becomes unavailable, then the PE will select the
   Standby Upstream PE as its Upstream PE for C-S.  When the Primary PE
   later becomes available, then the PE will select the Primary Upstream
   PE again as its Upstream PE.  Such behavior is referred to as
   "revertive" behavior and MUST be supported.  Non-revertive behavior
   refers to the behavior of continuing to select the backup PE as the
   UMH even after the Primary has come up.  This non-revertive behavior
   MAY also be supported by an implementation and would be enabled
   through some configuration.
   Selection of the behavior, revertive or
   non-revertive, is an operational issue, but it MUST be consistent on
   all PEs in the given MVPN.

   For readability, in the following sub-sections, the procedures are
   described for BGP C-multicast Source Tree Join routes, but they apply
   equally to BGP C-multicast Shared Tree Join routes for the case where
   the customer RP is dual-homed (substitute "C-RP" for "C-S").

4.1.  Downstream PE Behavior

   When a (downstream) PE connected to some site of an MVPN needs to
   send a C-multicast route (C-S, C-G), then following the procedures
   specified in Section 11.1 of [RFC6514], the PE sends the C-multicast
   route with an RT that identifies the Upstream PE selected by the PE
   originating the route.  As long as C-S is reachable via the Primary
   Upstream PE, the Upstream PE is the Primary Upstream PE.  If C-S is
   reachable only via the Standby Upstream PE, then the Upstream PE is
   the Standby Upstream PE.

   If C-S is reachable via both the Primary and the Standby Upstream PE,
   then in addition to sending the C-multicast route with an RT that
   identifies the Primary Upstream PE, the downstream PE also originates
   and sends a C-multicast route with an RT that identifies the Standby
   Upstream PE.  The route that has the semantics of being a "standby"
   C-multicast route is further called a "Standby BGP C-multicast
   route", and is constructed as follows:

   o  the NLRI is constructed as the C-multicast route with an RT that
      identifies the Primary Upstream PE, except that the RD is the same
      as if the C-multicast route was built using the Standby Upstream
      PE as the UMH (it will carry the RD associated to the unicast VPN
      route advertised by the Standby Upstream PE for S and a Route
      Target derived from the Standby Upstream PE's UMH route's VRF RT
      Import EC);

   o  MUST carry the "Standby PE" BGP Community (this is a new BGP
      Community.
Section 7.1 requested IANA to allocate value TBA1). 620 The Local Preference attribute of the normal and the standby 621 C-multicast route needs to be adjusted. so that, if a BGP peer 622 receives two C-multicast routes with the same NLRI, one carrying the 623 "Standby PE" community and the other one not carrying the "Standby 624 PE" community, then preference is given to the one not carrying the 625 "Standby PE" community. Such a situation can happen when, for 626 instance, due to transient unicast routing inconsistencies or lack of 627 support of the Standby PE community, two different downstream PEs 628 consider different Upstream PEs to be the primary one. In that case, 629 without any precaution taken, both Upstream PEs would process a 630 standby C-multicast route and possibly stop forwarding at the same 631 time. For this purpose, routes that carry the "Standby PE" BGP 632 Community MUST have the LOCAL_PREF attribute set to zero. 634 Note that, when a PE advertises such a Standby C-multicast join for a 635 (C-S, C-G) it MUST join the corresponding P-tunnel. 637 If at some later point, the PE determines that C-S is no longer 638 reachable through the Primary Upstream PE, the Standby Upstream PE 639 becomes the Upstream PE, and the PE re-sends the C-multicast route 640 with RT that identifies the Standby Upstream PE, except that now the 641 route does not carry the Standby PE BGP Community (which results in 642 replacing the old route with a new route, with the only difference 643 between these routes being the presence/absence of the Standby PE BGP 644 Community). The LOCAL_PREF attribute MUST be set to zero. 646 4.2. 
Upstream PE Behavior

   When a PE supporting this specification receives a C-multicast route
   for a particular (C-S, C-G) for which all of the following are true:

   o  the RT carried in the route results in importing the route into a
      particular VRF on the PE;

   o  the route carries the Standby PE BGP Community; and

   o  the PE determines (via a method of failure detection that is
      outside the scope of this document) that C-S is not reachable
      through some other PE (more details are in Section 4.3),

   then the PE MAY install VRF PIM state corresponding to this Standby
   BGP C-multicast route (the result will be that a PIM Join message
   will be sent to the CE towards C-S, and that the PE will receive
   (C-S, C-G) traffic), and the PE MAY forward (C-S, C-G) traffic
   received by the PE to other PEs through a P-tunnel rooted at the PE.

   Furthermore, irrespective of whether C-S carried in that route is
   reachable through some other PE:

   a)  based on local policy, as soon as the PE receives this Standby
       BGP C-multicast route, the PE MAY install VRF PIM state
       corresponding to this BGP Source Tree Join route (the result will
       be that Join messages will be sent to the CE toward C-S, and that
       the PE will receive (C-S, C-G) traffic);

   b)  based on local policy, as soon as the PE receives this Standby
       BGP C-multicast route, the PE MAY forward (C-S, C-G) traffic to
       other PEs through a P-tunnel independently of the reachability of
       C-S through some other PE [note that this implies also doing a)].

   Doing neither a) nor b) for a given (C-S, C-G) is called "cold root
   standby".

   Doing a) but not b) for a given (C-S, C-G) is called "warm root
   standby".

   Doing b) (which implies also doing a)) for a given (C-S, C-G) is
   called "hot root standby".
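
   The three standby modes defined above can be illustrated with a
   short, non-normative Python sketch.  The names LocalPolicy,
   standby_mode, and upstream_actions are hypothetical and are not
   defined by this document.  The sketch assumes that the received
   route has already been imported into the VRF and carries the Standby
   PE BGP Community, and it assumes that a PE chooses to take over once
   it determines that C-S is not reachable through some other PE (the
   text above only says MAY).

```python
from dataclasses import dataclass
from enum import Enum


class StandbyMode(Enum):
    COLD = "cold root standby"
    WARM = "warm root standby"
    HOT = "hot root standby"


@dataclass(frozen=True)
class LocalPolicy:
    # Hypothetical policy knobs; the names are illustrative only.
    join_on_standby_route: bool     # option a): install VRF PIM state early
    forward_on_standby_route: bool  # option b): forward on the P-tunnel early


def standby_mode(policy: LocalPolicy) -> StandbyMode:
    """Classify a local policy as cold, warm, or hot root standby."""
    # Option b) implies option a), so forwarding alone is enough for "hot".
    if policy.forward_on_standby_route:
        return StandbyMode.HOT
    if policy.join_on_standby_route:
        return StandbyMode.WARM
    return StandbyMode.COLD


def upstream_actions(policy: LocalPolicy, reachable_via_other_pe: bool):
    """Return (install_pim_state, forward_traffic) for a Standby route.

    Assumes the route was imported into the VRF and carries the
    Standby PE BGP Community.
    """
    if not reachable_via_other_pe:
        # Assumption of this sketch: the PE uses both MAY options and
        # takes over as soon as C-S is unreachable through another PE.
        return (True, True)
    mode = standby_mode(policy)
    install = mode in (StandbyMode.WARM, StandbyMode.HOT)
    forward = mode is StandbyMode.HOT
    return (install, forward)
```

   In this sketch, a "warm" PE keeps PIM state installed while C-S is
   still reachable through another PE but does not forward until that
   reachability is lost, whereas a "hot" PE forwards in both cases.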

   Note that, if an Upstream PE uses an S-PMSI only policy, it shall
   advertise an S-PMSI for a (C-S, C-G) as soon as it receives a
   C-multicast route for (C-S, C-G), normal or Standby; i.e., it shall
   not wait for receiving a non-Standby C-multicast route before
   advertising the corresponding S-PMSI.

   Section 9.3.2 of [RFC6513] describes the procedures of sending a
   Source-Active A-D Route as a result of receiving the C-multicast
   route.  These procedures MUST be followed for both the normal and
   Standby C-multicast routes.

4.3.  Reachability Determination

   The Standby Upstream PE can use the following information to
   determine that C-S can or cannot be reached through the Primary
   Upstream PE:

   o  presence/absence of a unicast VPN route toward C-S;

   o  supposing that the Standby Upstream PE is the egress of the tunnel
      rooted at the Primary Upstream PE, the Standby Upstream PE can
      determine the reachability of C-S through the Primary Upstream PE
      based on the status of this tunnel, determined using the same
      criteria as the ones described in Section 3.1 (without using the
      UMH selection procedures of Section 3);

   o  other mechanisms MAY be used.

4.4.  Inter-AS

   If the non-segmented inter-AS approach is used, the procedures
   described in Section 4.1 through Section 4.3 can be applied.

   When multicast VPNs are used in an inter-AS context with the
   segmented inter-AS approach described in Section 9.2 of [RFC6514],
   the procedures in this section can be applied.

   The prerequisites for applying the procedures described below to a
   source of a given MVPN are:

   o  that any PE of this MVPN receives two or more Inter-AS I-PMSI A-D
      Routes advertised by the AS of the source; and

   o  that these Inter-AS I-PMSI A-D Routes have distinct Route
      Distinguishers (as described in item "(2)" of Section 9.2 of
      [RFC6514]).
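
   The two prerequisites above can be checked with a few lines of
   Python.  This is a non-normative sketch: the InterAsIPmsiADRoute
   record and the asbr_failover_prerequisites_met function are
   hypothetical names, and RDs are represented as plain strings purely
   for illustration.

```python
from typing import Iterable, NamedTuple


class InterAsIPmsiADRoute(NamedTuple):
    # Hypothetical, simplified view of an Inter-AS I-PMSI A-D Route.
    rd: str          # Route Distinguisher, e.g. "65001:1"
    source_as: int   # AS that advertised the route


def asbr_failover_prerequisites_met(
    routes: Iterable[InterAsIPmsiADRoute], source_as: int
) -> bool:
    """True if at least two routes with distinct RDs come from source_as."""
    # Collecting RDs in a set covers both conditions at once: two or
    # more routes from the AS of the source, and distinct RDs.
    distinct_rds = {r.rd for r in routes if r.source_as == source_as}
    return len(distinct_rds) >= 2
```

   For example, a PE that has received routes with RDs "65001:1" and
   "65001:2" from AS 65001 satisfies the prerequisites for sources in
   that AS, while a PE holding a single route from that AS does not.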

   As an example, these conditions will be satisfied when the source is
   dual-homed to an AS that connects to the receiver AS through two
   ASBRs using auto-configured RDs.

4.4.1.  Inter-AS Procedures for downstream PEs, ASBR Fast Failover

   The following procedure is applied by downstream PEs of an AS, for a
   source S in a remote AS.

   In addition to choosing an Inter-AS I-PMSI A-D Route advertised from
   the AS of the source to construct a C-multicast route, as described
   in Section 11.1.3 of [RFC6514], a downstream PE will choose a second
   Inter-AS I-PMSI A-D Route advertised from the AS of the source and
   use this route to construct and advertise a Standby C-multicast route
   (a C-multicast route carrying the Standby PE BGP Community), as
   described in Section 4.1.

4.4.2.  Inter-AS Procedures for ASBRs

   When an Upstream ASBR receives a C-multicast route, and at least one
   of the RTs of the route matches one of the ASBR Import RTs, the ASBR,
   if it supports this specification, MUST try to locate an Inter-AS
   I-PMSI A-D Route whose RD and Source AS respectively match the RD and
   Source AS carried in the C-multicast route.  If the match is found,
   and the C-multicast route carries the Standby PE BGP Community, then
   the ASBR implementation that supports this specification MUST be
   configurable to perform as follows:

   o  if the route was received over iBGP and its LOCAL_PREF attribute
      is set to zero, then it MUST be re-advertised in eBGP with a MED
      attribute (MULTI_EXIT_DISC) set to the highest possible value
      (0xffff);

   o  if the route was received over eBGP and its MED attribute is set
      to 0xffff, then it MUST be re-advertised in iBGP with a LOCAL_PREF
      attribute set to zero.

   Other ASBR procedures are applied without modification and, when
   applied, MAY modify the above-listed behavior.

5.
Hot Root Standby

   The mechanisms defined in Section 4 and Section 3 can be used
   together as follows.

   The principle is that, for a given VRF (or possibly only for a given
   (C-S, C-G)):

   o  downstream PEs advertise a Standby BGP C-multicast route (based on
      Section 4);

   o  Upstream PEs use the "hot root standby" optional behavior and thus
      will start forwarding traffic for a given multicast state after
      they have a (primary) BGP C-multicast route or a Standby BGP
      C-multicast route for that state (or both);

   o  downstream PEs accept traffic from the primary or standby tunnel,
      based on the status of the tunnel (based on Section 3).

   Other combinations of the mechanisms proposed in Section 4 and
   Section 3 are for further study.

   Note that the same level of protection would be achievable with a
   simple C-multicast Source Tree Join route advertised to both the
   primary and secondary Upstream PEs (carrying, as Route Target
   extended communities, the values of the VRF Route Import attribute of
   each VPN route from each Upstream PE).  The advantage of using the
   Standby semantic is that, supposing that downstream PEs always
   advertise a Standby C-multicast route to the secondary Upstream PE,
   the protection level can be chosen through a change of configuration
   on the secondary Upstream PE, without requiring any reconfiguration
   of all the downstream PEs.

6.  Duplicate Packets

   Multicast VPN specifications [RFC6513] require that a PE forward to
   CEs only the packets coming from the expected Upstream PE
   (Section 9.1 of [RFC6513]).

   We draw the reader's attention to the fact that compliance with this
   part of the multicast VPN specifications is especially important when
   two distinct Upstream PEs may forward the same traffic on P-tunnels
   at the same time in the steady state.
   That will be the case when "hot root standby" mode is used
   (Section 4), and it can also be the case if procedures of Section 3
   are used and a) the rules determining the status of a tree are not
   the same on two distinct downstream PEs or b) the rule determining
   the status of a tree depends on conditions local to a PE (e.g., the
   PE-P upstream link being up).

7.  IANA Considerations

7.1.  Standby PE Community

   IANA is requested to allocate the BGP "Standby PE" community value
   (TBA1) from the Border Gateway Protocol (BGP) Well-known Communities
   registry using the First Come First Served registration policy.

7.2.  BFD Discriminator

   This document defines a new BGP optional transitive attribute, called
   "BFD Discriminator".  IANA is requested to allocate a codepoint
   (TBA2) in the "BGP Path Attributes" registry to the BFD Discriminator
   attribute.

   IANA is requested to create a new BFD Mode sub-registry in the Border
   Gateway Protocol (BGP) Parameters registry.  The registration
   policies, per [RFC8126], for this sub-registry are according to
   Table 1.

   +-----------+-------------------------+
   | Value     | Policy                  |
   +-----------+-------------------------+
   | 0 - 175   | IETF Review             |
   | 176 - 249 | First Come First Served |
   | 250 - 254 | Experimental Use        |
   | 255       | IETF Review             |
   +-----------+-------------------------+

   Table 1: BFD Mode Sub-registry Registration Policies

   IANA is requested to make initial assignments according to Table 2.

   +-----------+------------------+---------------+
   | Value     | Description      | Reference     |
   +-----------+------------------+---------------+
   | 0         | Reserved         | This document |
   | 1         | P2MP BFD Session | This document |
   | 2 - 175   | Unassigned       |               |
   | 176 - 249 | Unassigned       |               |
   | 250 - 254 | Experimental Use | This document |
   | 255       | Reserved         | This document |
   +-----------+------------------+---------------+

   Table 2: BFD Mode Sub-registry

7.3.  BFD Discriminator Optional Sub-TLV Type

   IANA is requested to create a new BFD Discriminator Optional sub-TLV
   Type sub-registry in Border Gateway Protocol (BGP).  The registration
   policies, per [RFC8126], for this sub-registry are according to
   Table 3.

   +-----------+-------------------------+
   | Value     | Policy                  |
   +-----------+-------------------------+
   | 0 - 175   | IETF Review             |
   | 176 - 249 | First Come First Served |
   | 250 - 254 | Experimental Use        |
   | 255       | IETF Review             |
   +-----------+-------------------------+

   Table 3: BFD Discriminator Optional Sub-TLV Type Sub-registry
   Registration Policies

   IANA is requested to make initial assignments according to Table 4.

   +-----------+------------------+---------------+
   | Value     | Description      | Reference     |
   +-----------+------------------+---------------+
   | 0         | Reserved         | This document |
   | 1 - 175   | Unassigned       |               |
   | 176 - 249 | Unassigned       |               |
   | 250 - 254 | Experimental Use | This document |
   | 255       | Reserved         | This document |
   +-----------+------------------+---------------+

   Table 4: BFD Discriminator Optional Sub-TLV Type Sub-registry

8.  Security Considerations

   This document describes procedures based on [RFC6513] and [RFC6514]
   and hence shares the security considerations presented in those
   specifications.

   This document uses P2MP BFD, as defined in [RFC8562], which, in turn,
   is based on [RFC5880].
   Security considerations relevant to each protocol are discussed in
   the respective protocol specifications.  An implementation that
   supports this specification MUST provide a mechanism to limit the
   overall amount of capacity used by the BFD traffic (as the
   combination of the number of active P2MP BFD sessions and the rate of
   BFD Control packets to process).

   The methods described in Section 3.1 may produce false-negative state
   changes that can be the trigger for an unnecessary convergence in the
   control plane, ultimately negatively impacting the multicast service
   provided by the VPN.  An operator is expected to consider the network
   environment and use available controls of the mechanism used to
   determine the status of a P-tunnel.

9.  Acknowledgments

   The authors want to thank Greg Reaume, Eric Rosen, Jeffrey Zhang,
   Martin Vigoureux, Adrian Farrel, and Zheng (Sandy) Zhang for their
   reviews, useful comments, and helpful suggestions.

10.  Contributor Addresses

   Below is a list of other contributing authors in alphabetical order:

   Rahul Aggarwal
   Arktan

   Email: raggarwa_1@yahoo.com

   Nehal Bhau
   Cisco

   Email: NBhau@cisco.com

   Clayton Hassen
   Bell Canada
   2955 Virtual Way
   Vancouver
   CANADA

   Email: Clayton.Hassen@bell.ca

   Wim Henderickx
   Nokia
   Copernicuslaan 50
   Antwerp 2018
   Belgium

   Email: wim.henderickx@nokia.com

   Pradeep Jain
   Nokia
   701 E Middlefield Rd
   Mountain View, CA 94043
   USA

   Email: pradeep.jain@nokia.com

   Jayant Kotalwar
   Nokia
   701 E Middlefield Rd
   Mountain View, CA 94043
   USA

   Email: Jayant.Kotalwar@nokia.com

   Praveen Muley
   Nokia
   701 East Middlefield Rd
   Mountain View, CA 94043
   U.S.A.

   Email: praveen.muley@nokia.com

   Ray (Lei) Qiu
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   U.S.A.

   Email: rqiu@juniper.net

   Yakov Rekhter
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   U.S.A.

   Email: yakov@juniper.net

   Kanwar Singh
   Nokia
   701 E Middlefield Rd
   Mountain View, CA 94043
   USA

   Email: kanwar.singh@nokia.com

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC4875]  Aggarwal, R., Ed., Papadimitriou, D., Ed., and S.
              Yasukawa, Ed., "Extensions to Resource Reservation
              Protocol - Traffic Engineering (RSVP-TE) for Point-to-
              Multipoint TE Label Switched Paths (LSPs)", RFC 4875,
              DOI 10.17487/RFC4875, May 2007.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/
              BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February
              2012.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

   [RFC7606]  Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
              Patel, "Revised Error Handling for BGP UPDATE Messages",
              RFC 7606, DOI 10.17487/RFC7606, August 2015.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8562]  Katz, D., Ward, D., Pallagatti, S., Ed., and G. Mirsky,
              Ed., "Bidirectional Forwarding Detection (BFD) for
              Multipoint Networks", RFC 8562, DOI 10.17487/RFC8562,
              April 2019.

11.2.  Informative References

   [RFC4090]  Pan, P., Ed., Swallow, G., Ed., and A. Atlas, Ed., "Fast
              Reroute Extensions to RSVP-TE for LSP Tunnels", RFC 4090,
              DOI 10.17487/RFC4090, May 2005.

   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, DOI 10.17487/RFC4291, February
              2006.

   [RFC7431]  Karan, A., Filsfils, C., Wijnands, IJ., Ed., and B.
              Decraene, "Multicast-Only Fast Reroute", RFC 7431,
              DOI 10.17487/RFC7431, August 2015.

Authors' Addresses

   Thomas Morin (editor)
   Orange
   2, avenue Pierre Marzin
   Lannion 22307
   France

   Email: thomas.morin@orange-ftgroup.com

   Robert Kebler (editor)
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   U.S.A.

   Email: rkebler@juniper.net

   Greg Mirsky (editor)
   ZTE Corp.

   Email: gregimirsky@gmail.com