Network Working Group                                      T. Morin, Ed.
Internet-Draft                                                    Orange
Intended status: Standards Track                          R. Kebler, Ed.
Expires: April 5, 2021                                  Juniper Networks
                                                          G. Mirsky, Ed.
                                                               ZTE Corp.
                                                         October 2, 2020


                  Multicast VPN Fast Upstream Failover
                 draft-ietf-bess-mvpn-fast-failover-11

Abstract

   This document defines multicast VPN extensions and procedures that
   allow fast failover for upstream failures, by allowing downstream PEs
   to take into account the status of Provider-Tunnels (P-tunnels) when
   selecting the Upstream PE for a VPN multicast flow, and extending BGP
   MVPN routing so that a C-multicast route can be advertised toward a
   Standby Upstream PE.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 5, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  UMH Selection Based on Tunnel Status
     3.1.  Determining the Status of a Tunnel
       3.1.1.  mVPN Tunnel Root Tracking
       3.1.2.  PE-P Upstream Link Status
       3.1.3.  P2MP RSVP-TE Tunnels
       3.1.4.  Leaf-initiated P-tunnels
       3.1.5.  (C-S, C-G) Counter Information
       3.1.6.  BFD Discriminator Attribute
       3.1.7.  Per PE-CE Link BFD Discriminator
   4.  Standby C-multicast Route
     4.1.  Downstream PE Behavior
     4.2.  Upstream PE Behavior
     4.3.  Reachability Determination
     4.4.  Inter-AS
       4.4.1.  Inter-AS Procedures for downstream PEs, ASBR Fast
               Failover
       4.4.2.  Inter-AS Procedures for ASBRs
   5.  Hot Root Standby
   6.  Duplicate Packets
   7.  IANA Considerations
     7.1.  Standby PE Community
     7.2.  BFD Discriminator
     7.3.  BFD Discriminator Optional Sub-TLV Type
   8.  Security Considerations
   9.  Acknowledgments
   10. Contributor Addresses
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   In the context of multicast in BGP/MPLS VPNs, it is desirable to
   provide mechanisms allowing fast recovery of connectivity on
   different types of failures.  This document addresses failures of
   elements in the provider network that are upstream of PEs connected
   to VPN sites with receivers.

   Section 3 describes local procedures allowing an egress PE (a PE
   connected to a receiver site) to take into account the status of
   P-tunnels to determine the Upstream Multicast Hop (UMH) for a given
   (C-S, C-G).  This method does not provide a "fast failover" solution
   when used alone, but can be used together with the mechanism
   described in Section 4 for a "fast failover" solution.

   Section 4 describes protocol extensions that can speed up failover by
   not requiring any multicast VPN routing message exchange at recovery
   time.

   Moreover, Section 5 describes a "hot leaf standby" mechanism that
   uses a combination of these two mechanisms.  This approach has
   similarities with the solution described in [RFC7431] to improve
   failover times when PIM routing is used in a network given some
   topology and metric constraints.

2.  Terminology

   The terminology used in this document is the terminology defined in
   [RFC6513] and [RFC6514].

   x-PMSI: I-PMSI or S-PMSI

   The term 'upstream' (lower case) throughout this document refers to
   links and nodes that are upstream to a PE connected to VPN sites with
   receivers of a multicast flow.

   The term 'Upstream' (capitalized) throughout this document refers to
   a PE or an Autonomous System Border Router (ASBR) at which (S,G) or
   (*,G) data packets enter the VPN backbone or the local AS when
   traveling through the VPN backbone.

3.  UMH Selection Based on Tunnel Status

   Section 5.1 of [RFC6513] describes procedures used by a multicast VPN
   downstream PE to determine the Upstream Multicast Hop (UMH) for a
   given (C-S, C-G).

   For a given downstream PE and a given VRF, the P-tunnel corresponding
   to a given Upstream PE for a given (C-S, C-G) state is the S-PMSI
   tunnel advertised by that Upstream PE for this (C-S, C-G) and
   imported into that VRF, or if there isn't any such S-PMSI, the I-PMSI
   tunnel advertised by that PE and imported into that VRF.

   The procedure described here is an OPTIONAL procedure that is based
   on a downstream PE taking into account the status of P-tunnels rooted
   at each possible Upstream PE, for including or not including each
   given PE in the list of candidate UMHs for a given (C-S, C-G) state.
   The result is that, if a P-tunnel is "down" (see Section 3.1), the PE
   that is the root of the P-tunnel will not be considered for UMH
   selection, which will cause the downstream PE to fail over to the
   Upstream PE that is next in the list of candidates.  Some
   downstream PEs could arrive at a different conclusion regarding the
   tunnel's state because the failure impacts only a subset of branches.
   Because of that, procedures described in Section 9.1.1 of [RFC6513]
   MUST be used when using I-PMSI P-tunnels.

   There are three options specified in Section 5.1 of [RFC6513] for a
   downstream PE to select an Upstream PE.

   o  The first two options select the Upstream PE from a candidate PE
      set either based on an IP address or a hashing algorithm.  When
      used together with the optional procedure of considering the
      P-tunnel status as in this document, a candidate Upstream PE is
      included in the set if it either:

      A.  advertises an x-PMSI bound to a tunnel, where the specified
          tunnel's state is not known to be down, or,

      B.  does not advertise any x-PMSI applicable to the given (C-S,
          C-G) but has associated a VRF Route Import BGP attribute to
          the unicast VPN route for S.  That is necessary to avoid
          incorrectly invalidating a UMH PE that would use a policy
          where no I-PMSI is advertised for a given VRF and where only
          S-PMSIs are used.  The S-PMSI can be advertised only after the
          Upstream PE receives a C-multicast route for (C-S, C-G)/(C-*,
          C-G) to be carried over the advertised S-PMSI.

      If the resulting candidate set is empty, then the procedure is
      repeated without considering the P-tunnel status.

   o  The third option uses the installed UMH Route (i.e., the "best"
      route towards the C-root) as the Selected UMH Route, and its
      originating PE is the selected Upstream PE.  With the optional
      procedure of considering P-tunnel status as in this document, the
      Selected UMH Route is the best one among those whose originating
      PE's P-tunnel is not "down".  If that does not exist, the
      installed UMH Route is selected regardless of the P-tunnel status.

3.1.  Determining the Status of a Tunnel

   Different factors can be considered to determine the "status" of a
   P-tunnel and are described in the following sub-sections.  The
   optional procedures proposed in this section also allow downstream
   PEs to apply different rules to define what the status of a P-tunnel
   is (please see Section 6), and some of them will
   produce a result that may be different for different downstream PEs.
   Thus, the "status" of a P-tunnel in this section is not a
   characteristic of the tunnel in itself, but is the tunnel status, as
   seen from a particular downstream PE.  Additionally, some of the
   following methods determine the ability of a downstream PE to receive
   traffic on the P-tunnel rather than the status of the
   P-tunnel itself.  That could be referred to as "P-tunnel reception
   status", but for simplicity, we will use the terminology of P-tunnel
   "status" for all of these methods.

   Depending on the criteria used to determine the status of a P-tunnel,
   there may be an interaction with another resiliency mechanism used
   for the P-tunnel itself, and the UMH update may happen immediately or
   may need to be delayed.  Each particular case is covered in each
   separate sub-section below.

3.1.1.  mVPN Tunnel Root Tracking

   A condition to consider that the status of a P-tunnel is up is that
   the root of the tunnel, as determined in the x-PMSI Tunnel attribute,
   is reachable through unicast routing tables.  In this case, the
   downstream PE can immediately update its UMH when the reachability
   condition changes.

   That is similar to BGP next-hop tracking for VPN routes, except that
   the address considered is not the BGP next-hop address, but the root
   address in the x-PMSI Tunnel attribute.

   If BGP next-hop tracking is done for VPN routes and the root address
   of a given tunnel happens to be the same as the next-hop address in
   the BGP A-D Route advertising the tunnel, then checking, in unicast
   routing tables, whether the tunnel root is reachable, will be
   unnecessary duplication and thus will not bring any specific benefit.

3.1.2.  PE-P Upstream Link Status

   A condition to consider a tunnel status as Up can be that the
   last-hop link of the P-tunnel is Up.

   Using this method when a fast restoration mechanism (such as MPLS FRR
   [RFC4090]) is in place for the link requires careful consideration
   and coordination of defect detection intervals for the link and the
   tunnel.  In many cases, it is not practical to use both protection
   methods at the same time.

3.1.3.  P2MP RSVP-TE Tunnels

   For P-tunnels of type P2MP MPLS-TE, the status of the P-tunnel is
   considered up if the sub-LSP to this downstream PE is in the Up
   state.  The determination of whether a P2MP RSVP-TE LSP is in the Up
   state requires Path and Resv state for the LSP and is based on
   procedures specified in [RFC4875].  As a result, the downstream PE
   can immediately update its UMH when the reachability condition
   changes.

   When signaling state for a P2MP TE LSP is removed (e.g., if the
   ingress of the P2MP TE LSP sends a PathTear message) or the P2MP TE
   LSP changes state from Up to Down as determined by procedures in
   [RFC4875], the status of the corresponding P-tunnel SHOULD be
   re-evaluated.  If the P-tunnel transitions from Up to Down state, the
   Upstream PE that is the ingress of the P-tunnel SHOULD NOT be
   considered a valid UMH.

3.1.4.  Leaf-initiated P-tunnels

   An Upstream PE SHOULD be removed from the UMH candidate list for a
   given (C-S, C-G) if the P-tunnel (I-PMSI or S-PMSI) for this (S, G)
   is leaf-triggered (PIM, mLDP), but for some reason, internal to the
   protocol, the upstream one-hop branch of the tunnel from P to PE
   cannot be built.  As a result, the downstream PE can immediately
   update its UMH when the reachability condition changes.

3.1.5.  (C-S, C-G) Counter Information

   In cases where the downstream node can be configured so that the
   maximum inter-packet time is known for all the multicast flows mapped
   on a P-tunnel, the local per-(C-S, C-G) traffic counter information
   for traffic received on this P-tunnel can be used to determine the
   status of the P-tunnel.

   When such a procedure is used, in the context where fast restoration
   mechanisms are used for the P-tunnels, a configurable timer MUST be
   set on the downstream PE to wait before updating the UMH, to let the
   P-tunnel restoration mechanism execute its actions.  An
   implementation SHOULD use three seconds as the default value for this
   timer.

   In cases where this mechanism is used in conjunction with the method
   described in Section 5, no prior knowledge of the rate of the
   multicast streams is required; downstream PEs can compare reception
   on the two P-tunnels to determine when one of them is down.

3.1.6.  BFD Discriminator Attribute

   P-tunnel status may be derived from the status of a multipoint BFD
   session [RFC8562] whose discriminator is advertised along with an
   x-PMSI A-D Route.

   This document defines the format and ways of using a new BGP
   attribute called the "BFD Discriminator".  It is an optional
   transitive BGP attribute.  In Section 7.2, IANA is requested to
   allocate the codepoint value (TBA2).  The format of this attribute is
   shown in Figure 1.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    BFD Mode   |                   Reserved                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       BFD Discriminator                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                         Optional TLVs                         ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 1: Format of the BFD Discriminator Attribute

   Where:

      The BFD Mode field is one octet long.  This specification defines
      the P2MP BFD Session as value 1 (Section 7.2).

      The Reserved field is three octets long, and the value MUST be
      zeroed on transmission and ignored on receipt.

      The BFD Discriminator field is four octets long.

      Optional TLVs is the optional variable-length field that MAY be
      used in the BFD Discriminator attribute for future extensions.
      TLVs MAY be included in a sequential or nested manner.  To allow
      for TLV nesting, it is advised to define a new TLV as a
      variable-length object.  Figure 2 presents the Optional TLV
      format, which consists of:

      *  a one-octet field holding the TLV's Type value (Section 7.3)

      *  a one-octet field holding the length of the Value field in
         octets

      *  a variable-length Value field.

      The length of a TLV MUST be a multiple of four octets.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type     |     Length    |           Value ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 2: Format of the Optional TLV

   The BFD Discriminator attribute MUST be considered malformed if its
   length is not a non-zero multiple of four.  If malformed, the UPDATE
   message SHALL be handled using the approach of Attribute Discard per
   [RFC7606].

3.1.6.1.  Upstream PE Procedures

   To enable downstream PEs to track the P-tunnel status using a p2mp
   BFD session, the Upstream PE:

   o  MUST initiate the BFD session and set bfd.SessionType =
      MultipointHead as described in [RFC8562];

   o  MUST set the IP destination address of the inner IP header to one
      of the internal loopback addresses from 127/8 range for IPv4 or
      one of IPv4-mapped IPv6 addresses from ::ffff:127.0.0.0/104 range
      for IPv6 when transmitting BFD Control packets;

   o  MUST use its IP address as the source IP address when transmitting
      BFD Control packets;

   o  MUST include the BFD Discriminator attribute in the x-PMSI A-D
      Route with the value set to My Discriminator value;

   o  MUST periodically transmit BFD Control packets over the x-PMSI
      P-tunnel after the P-tunnel is considered established.  Note that
      the methods to declare a P-tunnel has been established are outside
      the scope of this specification.

   If the tracking of the P-tunnel by using a p2mp BFD session is
   enabled after the x-PMSI A-D Route has been already advertised, the
   x-PMSI A-D Route MUST be re-sent with precisely the same attributes
   as before and the BFD Discriminator attribute included.

   If the x-PMSI A-D Route is advertised with P-tunnel status tracked
   using the p2mp BFD session and it is desired to stop tracking
   P-tunnel status using BFD, then:

   o  x-PMSI A-D Route MUST be re-sent with precisely the same
      attributes as before, but the BFD Discriminator attribute MUST be
      excluded;

   o  the p2mp BFD session SHOULD be deleted.

3.1.6.2.  Downstream PE Procedures

   Upon receiving the BFD Discriminator attribute in the x-PMSI A-D
   Route, the downstream PE:

   o  MUST associate the received BFD Discriminator value with the
      P-tunnel originating from the Upstream PE and the IP address of
      the Upstream PE;

   o  MUST create a p2mp BFD session and set bfd.SessionType =
      MultipointTail as described in [RFC8562];

   o  MUST use the source IP address of the BFD Control packet, the
      value of the BFD Discriminator field, and the x-PMSI Tunnel
      Identifier [RFC6514] on which the BFD Control packet was received
      to properly demultiplex BFD sessions.

   After the state of the p2mp BFD session is up, i.e., bfd.SessionState
   == Up, the session state will then be used to track the health of the
   P-tunnel.

   According to [RFC8562], if the downstream PE receives Down or
   AdminDown in the State field of the BFD Control packet, or the
   Detection Timer associated with the BFD session expires, the BFD
   session is down, i.e., bfd.SessionState == Down.  When the BFD
   session state is Down, then the P-tunnel associated with the BFD
   session MUST be considered down.  If the site that contains C-S is
   connected to two or more PEs, a downstream PE will select one as its
   Primary Upstream PE, while others are considered as Standby Upstream
   PEs.  In such a scenario, when the P-tunnel is considered down, the
   downstream PE MAY initiate a switchover of the traffic from the
   Primary Upstream PE to the Standby Upstream PE only if the Standby
   Upstream PE is deemed available.
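
   As an illustration only, the attribute layout of Figure 1 and the
   demultiplexing and state rules above might be sketched as follows.
   The names parse_bfd_discriminator_attr, SessionKey, and TailSessions
   are hypothetical and are not defined by this document.  The sketch
   returns Optional TLVs unparsed, returns None for a malformed
   attribute (loosely mirroring Attribute Discard), assumes that the
   fixed part needs at least eight octets, and reduces the [RFC8562]
   state machine to the Up/Down distinction used here.

```python
import struct
from typing import Dict, NamedTuple, Optional, Tuple

P2MP_BFD_SESSION = 1  # BFD Mode value for a P2MP BFD session (Section 7.2)


def parse_bfd_discriminator_attr(value: bytes) -> Optional[Tuple[int, int, bytes]]:
    """Return (bfd_mode, discriminator, optional_tlvs), or None if malformed."""
    # Malformed unless the length is a non-zero multiple of four.  Requiring
    # at least eight octets for the fixed part is an assumption of this sketch.
    if len(value) < 8 or len(value) % 4 != 0:
        return None
    bfd_mode = value[0]
    # Octets 1..3 are Reserved and are ignored on receipt.
    (discriminator,) = struct.unpack("!I", value[4:8])
    return bfd_mode, discriminator, value[8:]


class SessionKey(NamedTuple):
    source_ip: str      # source IP address of the BFD Control packet
    discriminator: int  # value of the BFD Discriminator field
    tunnel_id: bytes    # x-PMSI Tunnel Identifier the packet arrived on


class TailSessions:
    """Hypothetical table of MultipointTail sessions on a downstream PE."""

    def __init__(self) -> None:
        self._state: Dict[SessionKey, str] = {}
        self._tracking: Dict[SessionKey, bool] = {}

    def create(self, key: SessionKey) -> None:
        self._state[key] = "Down"
        self._tracking[key] = False  # not used for tracking before first Up

    def on_control_packet(self, key: SessionKey, state_field: str) -> None:
        if key not in self._state:
            return  # no session matches this packet; ignore it
        if state_field in ("Down", "AdminDown"):
            self._state[key] = "Down"
        elif state_field == "Up":
            self._state[key] = "Up"
            self._tracking[key] = True

    def on_detection_timer_expired(self, key: SessionKey) -> None:
        if key in self._state:
            self._state[key] = "Down"

    def tunnel_considered_down(self, key: SessionKey) -> bool:
        # Only a session that has been Up is used to judge the P-tunnel.
        return self._tracking.get(key, False) and self._state[key] == "Down"
```

   Keying the table on all three values keeps sessions from different
   Upstream PEs apart even when they happen to advertise the same
   discriminator.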

   If the downstream PE's P-tunnel is already established when the
   downstream PE receives the new x-PMSI A-D Route with BFD
   Discriminator attribute, the downstream PE MUST associate the value
   of BFD Discriminator field with the P-tunnel and follow procedures
   listed above in this section if and only if the x-PMSI A-D Route was
   properly processed as per [RFC6514], and the BFD Discriminator
   attribute was validated.

   If the downstream PE's P-tunnel is already established, its state
   being monitored by the p2mp BFD session, and the downstream PE
   receives the new x-PMSI A-D Route without the BFD Discriminator
   attribute, and the x-PMSI A-D Route was processed without any error
   as per the relevant specifications, the downstream PE:

   o  MUST stop processing BFD Control packets for this p2mp BFD
      session;

   o  SHOULD delete the p2mp BFD session associated with the P-tunnel;

   o  SHOULD NOT switch the traffic to the Standby Upstream PE.

3.1.7.  Per PE-CE Link BFD Discriminator

   The following approach is defined in response to the detection by the
   Upstream PE of a PE-CE link failure.  Even though the provider tunnel
   is still up, it is desired for the downstream PEs to switch to a
   backup Upstream PE.  To achieve that, if the Upstream PE detects that
   its PE-CE link fails, it SHOULD set the bfd.LocalDiag of the p2mp BFD
   session to Concatenated Path Down and/or Reverse Concatenated Path
   Down (per Section 6.8.17 of [RFC5880]), unless it switches to a new
   PE-CE link within the time of bfd.DesiredMinTxInterval for the p2mp
   BFD session (in that case, the Upstream PE will start tracking the
   status of the new PE-CE link).  When a downstream PE receives that
   bfd.LocalDiag code, it treats it as if the tunnel itself failed and
   tries to switch to a backup PE.

4.  Standby C-multicast Route

   The procedures described below are limited to the case where the site
   that contains C-S is connected to two or more PEs though, to simplify
   the description, the case of dual-homing is described.  The
   procedures require all the PEs of that MVPN to follow the same UMH
   selection procedure, as specified in [RFC6513], whether the PE is
   selected based on its IP address, the hashing algorithm described in
   Section 5.1.3 of [RFC6513], or the Installed UMH Route.  The
   procedures assume that if a site of a given MVPN that contains C-S is
   dual-homed to two PEs, then all the other sites of that MVPN would
   have two unicast VPN routes (VPN-IPv4 or VPN-IPv6) to C-S, each with
   its RD.

   As long as C-S is reachable via both PEs, a given downstream PE will
   select one of the PEs connected to C-S as its Upstream PE for C-S.
   We will refer to the other PE connected to C-S as the "Standby
   Upstream PE".  Note that if the connectivity to C-S through the
   Primary Upstream PE becomes unavailable, then the PE will select the
   Standby Upstream PE as its Upstream PE for C-S.  When the Primary PE
   later becomes available, then the PE will select the Primary Upstream
   PE again as its Upstream PE.  Such behavior is referred to as
   "revertive" behavior and MUST be supported.  Non-revertive behavior
   would refer to the behavior of continuing to select the backup PE as
   the UMH even after the Primary has come up.  This non-revertive
   behavior MAY also be supported by an implementation and would be
   enabled through some configuration.

   For readability, in the following sub-sections, the procedures are
   described for BGP C-multicast Source Tree Join routes, but they apply
   equally to BGP C-multicast Shared Tree Join routes for the case where
   the customer RP is dual-homed (substitute "C-RP" for "C-S").

4.1.  Downstream PE Behavior

   When a (downstream) PE connected to some site of an MVPN needs to
   send a C-multicast route (C-S, C-G), then following the procedures
   specified in Section 11.1 of [RFC6514], the PE sends the C-multicast
   route with an RT that identifies the Upstream PE selected by the PE
   originating the route.  As long as C-S is reachable via the Primary
   Upstream PE, the Upstream PE is the Primary Upstream PE.  If C-S
   is reachable only via the Standby Upstream PE, then the Upstream PE
   is the Standby Upstream PE.

   If C-S is reachable via both the Primary and the Standby Upstream PE,
   then in addition to sending the C-multicast route with an RT that
   identifies the Primary Upstream PE, the downstream PE also originates
   and sends a C-multicast route with an RT that identifies the Standby
   Upstream PE.  The route that has the semantics of being a "standby"
   C-multicast route is further called a "Standby BGP C-multicast
   route", and is constructed as follows:

   o  the NLRI is constructed as the C-multicast route with an RT that
      identifies the Primary Upstream PE, except that the RD is the same
      as if the C-multicast route was built using the Standby Upstream
      PE as the UMH (it will carry the RD associated to the unicast VPN
      route advertised by the Standby Upstream PE for S and a Route
      Target derived from the Standby Upstream PE's UMH route's VRF RT
      Import EC);

   o  SHOULD carry the "Standby PE" BGP Community (this is a new BGP
      Community; Section 7.1 requests IANA to allocate value TBA1).

   The normal and the standby C-multicast routes must have their Local
   Preference attribute adjusted so that, if a BGP peer receives two
   C-multicast routes with the same NLRI, one carrying the "Standby PE"
   community and the other one not carrying the "Standby PE" community,
   then preference is given to the one not carrying the "Standby PE"
   community.
   Such a situation can happen when, for instance, due to
   transient unicast routing inconsistencies or lack of support of the
   Standby PE community, two different downstream PEs consider different
   Upstream PEs to be the primary one.  In that case, without any
   precaution taken, both Upstream PEs would process a standby
   C-multicast route and possibly stop forwarding at the same time.  For
   this purpose, routes that carry the "Standby PE" BGP Community MUST
   have the LOCAL_PREF attribute set to zero.

   Note that, when a PE advertises such a Standby C-multicast join for a
   (C-S, C-G) it MUST join the corresponding P-tunnel.

   If at some later point, the PE determines that C-S is no longer
   reachable through the Primary Upstream PE, the Standby Upstream PE
   becomes the Upstream PE, and the PE re-sends the C-multicast route
   with RT that identifies the Standby Upstream PE, except that now the
   route does not carry the Standby PE BGP Community (which results in
   replacing the old route with a new route, with the only difference
   between these routes being the presence/absence of the Standby PE BGP
   Community).  Also, a LOCAL_PREF attribute MUST be set to zero.

4.2.  Upstream PE Behavior

   When a PE receives a C-multicast route for a particular (C-S, C-G),
   and the RT carried in the route results in importing the route into a
   particular VRF on the PE, if the route carries the Standby PE BGP
   Community, then the PE performs as follows:

      when the PE determines (the use of the particular method to detect
      the failure is outside the scope of this document) that C-S is not
      reachable through some other PE, the PE SHOULD install VRF PIM
      state corresponding to this Standby BGP C-multicast route (the
      result will be that a PIM Join message will be sent to the CE
      towards C-S, and that the PE will receive (C-S, C-G) traffic), and
      the PE SHOULD forward (C-S, C-G) traffic received by the PE to
      other PEs through a P-tunnel rooted at the PE.

   Furthermore, irrespective of whether C-S carried in that route is
   reachable through some other PE:

   a)  based on local policy, as soon as the PE receives this Standby
       BGP C-multicast route, the PE MAY install VRF PIM state
       corresponding to this BGP Source Tree Join route (the result will
       be that Join messages will be sent to the CE toward C-S, and that
       the PE will receive (C-S, C-G) traffic)

   b)  based on local policy, as soon as the PE receives this Standby
       BGP C-multicast route, the PE MAY forward (C-S, C-G) traffic to
       other PEs through a P-tunnel independently of the reachability of
       C-S through some other PE.  [note that this implies also doing
       (a)]

   Doing neither (a) nor (b) for a given (C-S, C-G) is called "cold root
   standby".

   Doing (a) but not (b) for a given (C-S, C-G) is called "warm root
   standby".

   Doing (b) (which implies also doing (a)) for a given (C-S, C-G) is
   called "hot root standby".
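
   The three standby modes and the resulting Upstream PE actions can be
   summarized in a small sketch.  This is a non-normative illustration;
   the names RootStandby, root_standby_mode, and standby_actions are
   hypothetical, and the SHOULD/MAY nuances of the text above are
   reduced to booleans.

```python
from enum import Enum
from typing import Tuple


class RootStandby(Enum):
    COLD = "cold root standby"
    WARM = "warm root standby"
    HOT = "hot root standby"


def root_standby_mode(does_a: bool, does_b: bool) -> RootStandby:
    """Map local policy choices (a) and (b) to the mode named in the text."""
    if does_b:
        return RootStandby.HOT   # (b) implies also doing (a)
    if does_a:
        return RootStandby.WARM
    return RootStandby.COLD


def standby_actions(mode: RootStandby,
                    cs_reachable_via_other_pe: bool) -> Tuple[bool, bool]:
    """Return (install_vrf_pim_state, forward_on_p_tunnel) for a PE that
    holds a Standby BGP C-multicast route for some (C-S, C-G)."""
    if not cs_reachable_via_other_pe:
        # C-S is no longer reachable through the other PE: take over.
        return True, True
    install = mode in (RootStandby.WARM, RootStandby.HOT)
    forward = mode is RootStandby.HOT
    return install, forward
```

   For example, a warm standby PE keeps PIM state toward the CE but
   stays silent on its P-tunnel until the other PE loses reachability to
   C-S.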

   Note that, if an Upstream PE uses an S-PMSI only policy, it shall
   advertise an S-PMSI for a (C-S, C-G) as soon as it receives a
   C-multicast route for (C-S, C-G), normal or Standby; i.e., it shall
   not wait for receiving a non-Standby C-multicast route before
   advertising the corresponding S-PMSI.

   Section 9.3.2 of [RFC6514] describes the procedures of sending a
   Source-Active A-D Route as a result of receiving the C-multicast
   route.  These procedures MUST be followed for both the normal and
   Standby C-multicast routes.

4.3.  Reachability Determination

   The Standby Upstream PE can use the following information to
   determine that C-S can or cannot be reached through the Primary
   Upstream PE:

   o  presence/absence of a unicast VPN route toward C-S

   o  supposing that the Standby Upstream PE is the egress of the tunnel
      rooted at the Primary Upstream PE, the Standby Upstream PE can
      determine the reachability of C-S through the Primary Upstream PE
      based on the status of this tunnel, determined using the same
      criteria as those described in Section 3.1 (without using the
      UMH selection procedures of Section 3);

   o  other mechanisms MAY be used.

4.4.  Inter-AS

   If the non-segmented inter-AS approach is used, the procedures
   described in Section 4.1 through Section 4.3 can be applied.

   When multicast VPNs are used in an inter-AS context with the
   segmented inter-AS approach described in Section 9.2 of [RFC6514],
   the procedures in this section can be applied.

   A pre-requisite for the procedures described below to be applied for
   a source of a given MVPN is:

   o  that any PE of this MVPN receives two or more Inter-AS I-PMSI A-D
      Routes advertised by the AS of the source

   o  that these Inter-AS I-PMSI A-D Routes have distinct Route
      Distinguishers (as described in item "(2)" of Section 9.2 of
      [RFC6514]).
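
   The two pre-requisite conditions amount to a simple check on the
   routes a PE has received.  The sketch below is illustrative only: the
   function name and the (source AS, RD) tuple representation of an
   Inter-AS I-PMSI A-D Route are assumptions, not structures defined by
   this document.

```python
from typing import Iterable, Tuple


def inter_as_prerequisite_met(routes: Iterable[Tuple[int, str]],
                              source_as: int) -> bool:
    """routes holds (source AS, RD) pairs, one per received Inter-AS
    I-PMSI A-D Route.  The pre-requisite holds when the AS of the source
    contributed at least two routes with distinct RDs."""
    distinct_rds = {rd for asn, rd in routes if asn == source_as}
    return len(distinct_rds) >= 2
```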
   As an example, these conditions will be satisfied when the source is
   dual-homed to an AS that connects to the receiver AS through two
   ASBRs using auto-configured RDs.

4.4.1.  Inter-AS Procedures for downstream PEs, ASBR Fast Failover

   The following procedure is applied by downstream PEs of an AS, for a
   source S in a remote AS.

   In addition to choosing an Inter-AS I-PMSI A-D Route advertised from
   the AS of the source to construct a C-multicast route, as described
   in Section 11.1.3 of [RFC6514], a downstream PE will choose a second
   Inter-AS I-PMSI A-D Route advertised from the AS of the source and
   use this route to construct and advertise a Standby C-multicast
   route (a C-multicast route carrying the Standby extended community),
   as described in Section 4.1.

4.4.2.  Inter-AS Procedures for ASBRs

   When an Upstream ASBR receives a C-multicast route, and at least one
   of the RTs of the route matches one of the ASBR Import RTs, an ASBR
   that supports this specification MUST locate an Inter-AS I-PMSI A-D
   Route whose RD and Source AS respectively match the RD and Source AS
   carried in the C-multicast route.  If the match is found, and the
   C-multicast route carries the Standby PE BGP Community, then the ASBR
   MUST perform as follows:

   o  if the route was received over iBGP and its LOCAL_PREF attribute
      is set to zero, then it MUST be re-advertised in eBGP with a MED
      attribute (MULTI_EXIT_DISC) set to the highest possible value
      (0xffff)

   o  if the route was received over eBGP and its MED attribute is set
      to 0xffff, then it MUST be re-advertised in iBGP with a LOCAL_PREF
      attribute set to zero

   Other ASBR procedures are applied without modification.

5.  Hot Root Standby

   The mechanisms defined in Section 3 and Section 4 can be used
   together as follows.
   The principle is that, for a given VRF (or possibly only for a given
   (C-S, C-G)):

   o  downstream PEs advertise a Standby BGP C-multicast route (based on
      Section 4)

   o  Upstream PEs use the "hot standby" optional behavior and thus will
      forward traffic for a given multicast state as soon as they have
      either a (primary) BGP C-multicast route or a Standby BGP
      C-multicast route for that state (or both)

   o  downstream PEs accept traffic from the primary or standby tunnel,
      based on the status of the tunnel (based on Section 3)

   Other combinations of the mechanisms proposed in Section 3 and
   Section 4 are for further study.

   Note that the same level of protection would be achievable with a
   simple C-multicast Source Tree Join route advertised to both the
   primary and secondary Upstream PEs (carrying, as Route Target
   extended communities, the values of the VRF Route Import attribute of
   each VPN route from each Upstream PE).  The advantage of using the
   Standby semantic is that, supposing that downstream PEs always
   advertise a Standby C-multicast route to the secondary Upstream PE,
   the protection level can be chosen through a change of configuration
   on the secondary Upstream PE, without requiring any reconfiguration
   of the downstream PEs.

6.  Duplicate Packets

   The multicast VPN specification (Section 9.1 of [RFC6513]) requires
   that a PE forward to CEs only the packets coming from the expected
   Upstream PE.

   Respecting this part of the multicast VPN specification is
   especially important when two distinct Upstream PEs may forward the
   same traffic on P-tunnels at the same time in the steady state.
   That will be the case when "hot root standby" mode is used
   (Section 4), and can also be the case if the procedures of Section 3
   are used and (a) the rules determining the status of a tree are not
   the same on two distinct downstream PEs or (b) the rule determining
   the status of a tree depends on conditions local to a PE (e.g., the
   PE-P upstream link being up).

7.  IANA Considerations

7.1.  Standby PE Community

   IANA is requested to allocate the BGP "Standby PE" community value
   (TBA1) from the Border Gateway Protocol (BGP) Well-known Communities
   registry.

7.2.  BFD Discriminator

   This document defines a new BGP optional transitive attribute, called
   "BFD Discriminator".  IANA is requested to allocate a codepoint
   (TBA2) in the "BGP Path Attributes" registry to the BFD Discriminator
   attribute.

   IANA is requested to create a new BFD Mode sub-registry in the Border
   Gateway Protocol (BGP) Parameters registry.  The registration
   policies, per [RFC8126], for this sub-registry are according to
   Table 1.

                +-----------+-------------------------+
                | Value     | Policy                  |
                +-----------+-------------------------+
                | 0 - 175   | IETF Review             |
                | 176 - 249 | First Come First Served |
                | 250 - 254 | Experimental Use        |
                | 255       | IETF Review             |
                +-----------+-------------------------+

           Table 1: BFD Mode Sub-registry Registration Policies

   IANA is requested to make initial assignments according to Table 2.
            +-----------+------------------+---------------+
            | Value     | Description      | Reference     |
            +-----------+------------------+---------------+
            | 0         | Reserved         | This document |
            | 1         | P2MP BFD Session | This document |
            | 2 - 175   | Unassigned       | This document |
            | 176 - 249 | Unassigned       | This document |
            | 250 - 254 | Experimental Use | This document |
            | 255       | Reserved         | This document |
            +-----------+------------------+---------------+

                      Table 2: BFD Mode Sub-registry

7.3.  BFD Discriminator Optional Sub-TLV Type

   IANA is requested to create a new BFD Discriminator Optional sub-TLV
   Type sub-registry in the Border Gateway Protocol (BGP) Parameters
   registry.  The registration policies, per [RFC8126], for this
   sub-registry are according to Table 3.

                +-----------+-------------------------+
                | Value     | Policy                  |
                +-----------+-------------------------+
                | 0 - 175   | IETF Review             |
                | 176 - 249 | First Come First Served |
                | 250 - 254 | Experimental Use        |
                | 255       | IETF Review             |
                +-----------+-------------------------+

       Table 3: BFD Discriminator Optional Sub-TLV Type Sub-registry
                           Registration Policies

   IANA is requested to make initial assignments according to Table 4.

            +-----------+------------------+---------------+
            | Value     | Description      | Reference     |
            +-----------+------------------+---------------+
            | 0         | Reserved         | This document |
            | 1 - 175   | Unassigned       | This document |
            | 176 - 249 | Unassigned       | This document |
            | 250 - 254 | Experimental Use | This document |
            | 255       | Reserved         | This document |
            +-----------+------------------+---------------+

       Table 4: BFD Discriminator Optional Sub-TLV Type Sub-registry

8.  Security Considerations

   This document describes procedures based on [RFC6513] and [RFC6514]
   and hence shares the security considerations presented in those
   specifications.
   This document uses p2mp BFD, as defined in [RFC8562], which, in turn,
   is based on [RFC5880].  Security considerations relevant to each
   protocol are discussed in the respective protocol specifications.  An
   implementation that supports this specification MUST use a mechanism
   to control the maximum number of p2mp BFD sessions that can be active
   at the same time.

9.  Acknowledgments

   The authors want to thank Greg Reaume, Eric Rosen, Jeffrey Zhang,
   Martin Vigoureux, and Zheng (Sandy) Zhang for their reviews, useful
   comments, and helpful suggestions.

10.  Contributor Addresses

   Below is a list of other contributing authors in alphabetical order:

   Rahul Aggarwal
   Arktan

   Email: raggarwa_1@yahoo.com

   Nehal Bhau
   Cisco

   Email: NBhau@cisco.com

   Clayton Hassen
   Bell Canada
   2955 Virtual Way
   Vancouver
   CANADA

   Email: Clayton.Hassen@bell.ca

   Wim Henderickx
   Nokia
   Copernicuslaan 50
   Antwerp 2018
   Belgium

   Email: wim.henderickx@nokia.com

   Pradeep Jain
   Nokia
   701 E Middlefield Rd
   Mountain View, CA 94043
   USA

   Email: pradeep.jain@nokia.com

   Jayant Kotalwar
   Nokia
   701 E Middlefield Rd
   Mountain View, CA 94043
   USA

   Email: Jayant.Kotalwar@nokia.com

   Praveen Muley
   Nokia
   701 East Middlefield Rd
   Mountain View, CA 94043
   U.S.A.

   Email: praveen.muley@nokia.com

   Ray (Lei) Qiu
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   U.S.A.

   Email: rqiu@juniper.net

   Yakov Rekhter
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   U.S.A.

   Email: yakov@juniper.net

   Kanwar Singh
   Nokia
   701 E Middlefield Rd
   Mountain View, CA 94043
   USA

   Email: kanwar.singh@nokia.com

11.  References

11.1.
Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4875]  Aggarwal, R., Ed., Papadimitriou, D., Ed., and S.
              Yasukawa, Ed., "Extensions to Resource Reservation
              Protocol - Traffic Engineering (RSVP-TE) for Point-to-
              Multipoint TE Label Switched Paths (LSPs)", RFC 4875,
              DOI 10.17487/RFC4875, May 2007.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/
              BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February
              2012.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

   [RFC7606]  Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
              Patel, "Revised Error Handling for BGP UPDATE Messages",
              RFC 7606, DOI 10.17487/RFC7606, August 2015.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8562]  Katz, D., Ward, D., Pallagatti, S., Ed., and G. Mirsky,
              Ed., "Bidirectional Forwarding Detection (BFD) for
              Multipoint Networks", RFC 8562, DOI 10.17487/RFC8562,
              April 2019.

11.2.  Informative References

   [RFC4090]  Pan, P., Ed., Swallow, G., Ed., and A. Atlas, Ed., "Fast
              Reroute Extensions to RSVP-TE for LSP Tunnels", RFC 4090,
              DOI 10.17487/RFC4090, May 2005.

   [RFC7431]  Karan, A., Filsfils, C., Wijnands, IJ., Ed., and B.
              Decraene, "Multicast-Only Fast Reroute", RFC 7431,
              DOI 10.17487/RFC7431, August 2015.

Authors' Addresses

   Thomas Morin (editor)
   Orange
   2, avenue Pierre Marzin
   Lannion 22307
   France

   Email: thomas.morin@orange-ftgroup.com

   Robert Kebler (editor)
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   U.S.A.

   Email: rkebler@juniper.net

   Greg Mirsky (editor)
   ZTE Corp.

   Email: gregimirsky@gmail.com