idnits 2.17.1

draft-wang-lsr-prefix-unreachable-annoucement-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (31 July 2021) is 998 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC2328' is defined on line 345, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5340' is defined on line 359, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5709' is defined on line 363, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC7770' is defined on line 368, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC7981' is defined on line 378, but no explicit
     reference was found in the text

     Summary: 0 errors (**), 0 flaws (~~), 7 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

2    LSR Working Group                                                A. Wang
3    Internet-Draft                                             China Telecom
4    Intended status: Standards Track                               G. Mishra
5    Expires: 1 February 2022                                    Verizon Inc.
6                                                                       Z. Hu
7                                                                     Y. Xiao
8                                                         Huawei Technologies
9                                                                31 July 2021

11                       Prefix Unreachable Announcement
12               draft-wang-lsr-prefix-unreachable-annoucement-07

14   Abstract

16      This document describes a mechanism to solve an existing issue with
17      Longest Prefix Match (LPM) that arises when an operator domain is
18      divided into multiple areas or levels and summarization is
19      utilized.  This draft addresses a fail-over issue in a multi-area
20      or multi-level domain, where a link or node down event
21      causes an LPM component prefix to be omitted from the FIB,
22      resulting in black hole routing and connectivity loss.  This
23      draft introduces a new control plane convergence signaling mechanism
24      using a negative prefix called Prefix Unreachable Announcement (PUA),
25      which is used to detect a link or node down event and to signal the
26      RIB that the event has occurred, forcing immediate control plane
        convergence.

28   Status of This Memo

30      This Internet-Draft is submitted in full conformance with the
31      provisions of BCP 78 and BCP 79.

33      Internet-Drafts are working documents of the Internet Engineering
34      Task Force (IETF).  Note that other groups may also distribute
35      working documents as Internet-Drafts.  The list of current Internet-
36      Drafts is at https://datatracker.ietf.org/drafts/current/.

38      Internet-Drafts are draft documents valid for a maximum of six months
39      and may be updated, replaced, or obsoleted by other documents at any
40      time.  It is inappropriate to use Internet-Drafts as reference
41      material or to cite them other than as "work in progress."

43      This Internet-Draft will expire on 1 February 2022.

45   Copyright Notice

47      Copyright (c) 2021 IETF Trust and the persons identified as the
48      document authors.  All rights reserved.

50      This document is subject to BCP 78 and the IETF Trust's Legal
51      Provisions Relating to IETF Documents (https://trustee.ietf.org/
52      license-info) in effect on the date of publication of this document.
53      Please review these documents carefully, as they describe your rights
54      and restrictions with respect to this document.  Code Components
55      extracted from this document must include Simplified BSD License text
56      as described in Section 4.e of the Trust Legal Provisions and are
57      provided without warranty as described in the Simplified BSD License.

59   Table of Contents

61      1.  Introduction
62      2.  Conventions used in this document
63      3.  Scenario Description
64        3.1.  Inter-Area Node Failure Scenario
65        3.2.  Inter-Area Links Failure Scenario
66      4.  PUA (Prefix Unreachable Advertisement) Procedures
67      5.  MPLS and SRv6 LPM based BGP Next-hop Failure Application
68      6.  Implementation Consideration
69      7.  Deployment Considerations
70      8.  Security Considerations
71      9.  IANA Considerations
72      10. Acknowledgement
73      11. Normative References
74      Authors' Addresses

76   1.  Introduction

78      As part of operators' optimized design criteria, a critical
79      requirement is to limit the Shortest Path First (SPF) churn that occurs
80      within a single OSPF area or ISIS level.  This is accomplished by
81      subdividing the IGP domain into multiple areas to reduce flooding:
82      intra-area prefixes are contained within each discrete
83      area, which avoids domain-wide flooding.

85      OSPF and ISIS have default and summary route mechanisms, which are
86      performed on the OSPF area border router (ABR) or ISIS L1-L2 node.  The
87      OSPF summary route is advertised conditionally, when
88      at least one component prefix exists within the non-zero area.  An ISIS
89      L1-L2 node likewise generates a summary prefix into the Level-2
90      backbone area for Level-1 area prefixes, which is
91      advertised conditionally when at least one component prefix
92      exists within the Level-1 area.  An ISIS L1-L2 node with the attach bit
93      set also generates a default route into each Level-1 area, along with
94      summary prefixes generated for other Level-1 areas.

96      Operators have historically relied on the MPLS architecture, which is
97      based on exact-match host route FEC binding within a single area.
98      The LDP inter-area extension [RFC5283] provides the ability to use LPM,
99      so the RIB match can be a summary match rather than an exact match
100     of a host route of the egress PE for an inter-area LSP to be
101     instantiated.  The SRv6 routing framework utilizes the IPv6 data plane
102     and standard IGP LPM.  When operators migrate from MPLS LSPs
103     with host-route-bootstrapped FEC binding to the SRv6 routing framework,
104     IGP LPM comes into play together with summarization, which
105     influences the forwarding of traffic when a link or node event occurs
106     for a component prefix within the summary range, resulting in black
107     hole routing of traffic.

109     The motivation behind this draft is based on either MPLS LPM FEC
110     binding or an SRv6 BGP service overlay using the traditional unicast
111     routing (uRIB) LPM forwarding plane, where the IGP domain has been
112     carved up into OSPF or ISIS areas and summarization is utilized.
113     In this scenario, failure conditions result in a black hole of
114     traffic when multiple ABRs exist and either the area is partitioned
115     or other link or node failures occur, so that the component
116     prefix host route is missing within the summary range.  Summaries of
117     inter-area routes propagated into the backbone area for flood
118     reduction are made up of component prefixes.  It is these component
119     prefixes that the PUA tracks to ensure traffic is not black-holed
120     due to a PE or ABR failure.  The PUA mechanism ensures
121     immediate control plane convergence with ABR or PE node switchover
122     when an area is partitioned or an ABR has services down, to avoid
123     black holing of traffic.

125  2.  Conventions used in this document

127     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
128     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
129     document are to be interpreted as described in [RFC2119].

131  3.  Scenario Description

133     Figure 1 illustrates the topology scenario when OSPF or ISIS is
134     running in a multi-area or multi-level domain.  R0-R4 are routers in
135     the backbone area; S1-S4 and T1-T4 are internal routers in area 1 and
136     area 2, respectively.  R1 and R3 are area border routers or ISIS Level 1-2
137     border nodes between area 0 and area 1.  R2 and R4 are area border
138     routers between area 0 and area 2.

140     PEs S1/S4 and T2/T4 peer with customer CEs for overlay VPNs.  Ps1/Ps4
141     are the loopback0 addresses of S1/S4, and Pt2/Pt4 are the loopback0
142     addresses of T2/T4.

144     +---------------------+------+--------+-----+--------------+
145     | +--+        +--+   ++-+   ++-+    +-++   ++-+        +--+|
146     | |S1+--------+S2+---+R1+---|R0+----+R2+---+T1+--------+T2||
147     | +-++Ps1     +-++   ++-+   +--+    +-++   ++++    Pt2 +-++|
148     |   |           |     |               |     ||           | |
149     |   |           |     |               |     ||           | |
150     | +-++Ps4     +-++   ++-+           +-++   ++++     Pt4+-++|
151     | |S4+--------+S3+---+R3+-----------+R4+---+T3+--------+T4||
152     | +--+        +--+   ++-+           +-++   ++-+        +--+|
153     |                     |               |                    |
154     |                     |               |                    |
155     |       Area 1        |    Area 0     |       Area 2       |
156     +---------------------+---------------+--------------------+

158     Figure 1: OSPF Inter-Area Prefix Unreachable Announcement Scenario

160  3.1.  Inter-Area Node Failure Scenario

162     If the area border routers R2/R4 perform summarization, then one
163     summary address that covers the prefixes of area 2 will be announced
164     to area 0 and area 1, instead of the detailed addresses.  When node
165     T2 is down, the Pt2 BGP next hop becomes unreachable while the LPM
166     summary prefix continues to be advertised into the backbone area.
167     Except for the border routers R2/R4, the routers within area 0 and
168     area 1 do not know the unreachable status of the Pt2 BGP next-hop
169     prefix.  Traffic toward Pt2 will still be forwarded on the LPM match and
170     will be dropped on the ABR or Level 1-2 border node, resulting in
171     black hole routing and connectivity loss.  For a customer overlay VPN
172     dual-homed to both S1/S4 and T2/T4, traffic will not be able to fail over
173     to the alternate egress PE T4 (BGP next hop Pt4) due to the summarization.

175  3.2.  Inter-Area Links Failure Scenario

177     In a link failure scenario, if the links T1/T2 and T1/T3 are
178     down, R2 will not be able to reach node T2.  But as R2 and R4 both
179     announce the summary, and the summary address covers the BGP next-hop
180     prefix Pt2, other nodes in area 0 and area 1 will still send traffic
181     for T2's BGP next-hop prefix Pt2 via the border router R2, thus black
182     holing the traffic.
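
The black hole described in Sections 3.1 and 3.2 can be illustrated with a
small longest-prefix-match lookup.  The following Python sketch is not part of
the mechanism in this draft; the addresses (taken from the 192.0.2.0/24
documentation range), the next-hop labels, and the function name lpm_lookup
are assumptions chosen only for illustration.  It shows that a summary route
learned via R2 keeps attracting traffic for Pt2, and that a more specific
route advertised by R4 would win the lookup.

```python
import ipaddress


def lpm_lookup(rib, destination):
    """Return the (prefix, next_hop) pair with the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [(prefix, nh) for prefix, nh in rib.items() if dest in prefix]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Hypothetical addressing: one summary covers the area 2 loopbacks.
SUMMARY = ipaddress.ip_network("192.0.2.0/24")
PT2 = "192.0.2.2"  # assumed loopback0 address of T2

# A router outside area 2 holds only the summary, here learned via R2.
rib = {SUMMARY: "R2"}
before = lpm_lookup(rib, PT2)  # still matches the summary although T2 is unreachable via R2

# R4 advertises a specific route to Pt2; the longer prefix now wins.
rib[ipaddress.ip_network("192.0.2.2/32")] = "R4"
after = lpm_lookup(rib, PT2)

print(before[1], after[1])
```

The lookup itself never changes; only the presence of a more specific entry
moves traffic away from the ABR that can no longer reach the prefix.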

184     In such a situation, the border router R2 should notify other routers
185     that it can't reach the prefix Pt2, and let the other ABR (R4) that
186     can reach prefix Pt2 advertise one specific route to Pt2.  The
187     internal routers will then select R4 as the bypass router to reach
188     prefix Pt2.

190  4.  PUA (Prefix Unreachable Advertisement) Procedures

192     [RFC7794] and [I-D.ietf-lsr-ospf-prefix-originator] both define
193     a sub-TLV to announce the originator information of a prefix
194     from a specified node.  This draft utilizes such sub-TLVs for both OSPF
195     and ISIS to signal the negative prefix in the respective PUA when a
196     link or node goes down.

198     The ABR detects a link or node down event and floods a PUA negative
199     prefix advertisement along with the summary advertisement, according to
200     the prefix-originator specification.  The ABR or ISIS L1-L2 border node
201     is responsible for adding the prefix originator information when
202     it receives the Router LSA from other routers in the same area or
203     level.

205     When the ABR or ISIS L1-L2 border node generates the summary
206     advertisement based on component prefixes, it will announce one
207     new summary LSA or LSP that includes information about the down
208     prefix, with the prefix originator set to NULL.  The number of PUAs
209     equals the number of links or nodes that are down.  The LSA or
210     LSP will be propagated with standard flooding procedures.

212     If a node in the area receives the PUA flood from all of its ABRs,
213     it will start the BGP convergence process if a BGP
214     session exists on this PUA prefix.  The PUA creates a forced fail-over
215     action that initiates immediate control plane convergence and
216     switchover to the alternate egress PE.  Without the PUA-forced
217     convergence, the down prefix will yield black hole routing, resulting
218     in loss of connectivity.
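
The receiver behaviour in the preceding paragraph can be sketched as follows.
This is a minimal Python illustration under stated assumptions, not a
normative procedure: the class PuaTracker, its method names, the ABR labels,
and the addresses are hypothetical, and the sketch assumes the node already
knows the set of ABRs it expects to hear from and the BGP next hops it uses.
A next hop is reported for BGP convergence only once every known ABR has
announced a PUA covering it.

```python
import ipaddress


class PuaTracker:
    """Tracks which ABRs have announced a PUA for each prefix (illustrative only)."""

    def __init__(self, abrs, bgp_next_hops):
        self.abrs = set(abrs)
        self.bgp_next_hops = [ipaddress.ip_address(nh) for nh in bgp_next_hops]
        self.announced_by = {}  # prefix -> set of ABRs that sent a PUA for it

    def receive_pua(self, abr, prefix):
        """Record a PUA and return the BGP next hops that should now converge."""
        network = ipaddress.ip_network(prefix)
        senders = self.announced_by.setdefault(network, set())
        senders.add(abr)
        # Act only when all known ABRs have declared the prefix unreachable.
        if not self.abrs <= senders:
            return []
        return [str(nh) for nh in self.bgp_next_hops if nh in network]


# Hypothetical node with two ABRs and two BGP next hops.
tracker = PuaTracker(abrs=["abr-1", "abr-2"],
                     bgp_next_hops=["192.0.2.2", "192.0.2.4"])
first = tracker.receive_pua("abr-1", "192.0.2.2/32")   # only one ABR so far
second = tracker.receive_pua("abr-2", "192.0.2.2/32")  # all ABRs have announced
```

When only some ABRs announce the PUA, the sketch returns an empty list, which
corresponds to the partial-failure case where a specific route from a
still-reachable ABR is expected instead.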

220     When only some of the ABRs can't reach the failed node/link, as
221     described in Section 3.2, the ABR that can reach the PUA prefix
222     should advertise one specific route to this PUA prefix.  The internal
223     routers within another area can then bypass the ABRs that can't reach
224     the PUA prefix in order to reach it.

226  5.  MPLS and SRv6 LPM based BGP Next-hop Failure Application

228     In an MPLS or SR-MPLS service provider core, scalability has been a
229     concern for operators, who have split up the IGP domain into
230     multiple areas to avoid SPF churn.  Normally, MPLS FEC binding for
231     LSP instantiation is based on an exact match of the egress PE's host
232     route (Loopback0).  The LDP inter-area extension [RFC5283] provides the
233     ability to use LPM, so the RIB match can be a summary match rather than
234     an exact match of the host route of the egress PE for an inter-area LSP
235     to be instantiated.  The caveat that has prevented
236     operators from using the [RFC5283] LDP inter-area extension
237     is that the component prefixes are hidden in the summary
238     prefix, and thus the visibility of the BGP next-hop attribute is
239     lost.

241     When a PE is down and the [RFC5283] LDP inter-area
242     extension LPM summary is used to build the inter-area LSP, the LSP
243     remains partially established and black-holes traffic on the ABR
244     performing the summarization.  This major gap in the [RFC5283]
245     inter-area extension forces operators into a workaround of having to
246     flood the BGP next hop domain-wide.  In a small network this is fine;
247     however, with thousands of PEs and many areas, domain-wide flooding can
248     be painful for operators in terms of memory consumption and
249     computational requirements for RIB / FIB / LFIB label binding control
250     plane state.
251     The ramifications of domain-wide flooding of host routes are described
252     in detail in Section 1.2 (Scalability) of [RFC5302].  As SRv6
253     utilizes LPM, this problem also exists with SRv6 when the IGP domain
254     is broken up into areas and summarization is utilized.

256     With PUA, the negative prefix component is flooded
257     across the backbone to the other areas along with the summary prefix
258     and is immediately programmed into the RIB control plane.  An MPLS
259     LSP exact match or SRv6 LPM match over the fail-over path can then be
260     established to the alternate egress PE.  No disruption in traffic or
261     loss of connectivity results from PUA.  Further optimizations such as
262     LFA and BFD can be done to make the data plane convergence hitless.
263     The PUA solution applies to MPLS or SR-MPLS where the LDP inter-area
264     extension is utilized for LPM aggregate FEC, as well as to SRv6 IPv6
265     control plane LPM match summarization of the BGP next hop.

267  6.  Implementation Consideration

269     To balance the announcement of reachable and unreachable
270     information, an implementation of this
271     mechanism should provide a configurable MAX_Address_Announcement (MAA)
272     threshold value.  The ABR should then apply the
273     following rules when announcing prefixes:

275     1.  If the number of unreachable prefixes is less than MAA, the ABR
276     should advertise the summary address and the PUA.

278     2.  Otherwise, if the number of reachable prefixes is less than MAA, the
279     ABR should advertise only the detailed reachable addresses.

281     3.  Otherwise (the numbers of reachable and unreachable prefixes both
282     reach or exceed MAA), advertise the summary address with the MAX metric.

284  7.  Deployment Considerations

286     To support the PUA advertisement, the ABRs should be upgraded
287     according to the procedures described in Section 4.
288     The PEs that need to perform the BGP switchover described in Section 3.1
289     and Section 5 should also be upgraded to act upon receipt of the
290     PUA message.  Other nodes within the network should ignore the PUA
291     message if they do not need or do not support it.  Routers within
292     the IGP domain should not erroneously install a route to the
293     prefixes when they receive a PUA message.

295     As described in Section 4, the ABR will advertise the PUA message
296     once it detects a link or node down event within the summary
297     address.  To reduce unnecessary advertisement of PUA
298     messages, the ABRs should support the configuration of
299     protected prefixes.  Based on this information, the ABR will
300     advertise the PUA message only when a protected prefix (for example,
301     the loopback address of a PE that runs BGP) within the summary
302     address is missing.

304     The advertisement of a PUA message should last only for a configurable
305     period, long enough for the services that run on the failed prefixes to
306     converge or switch over.  If a prefix is already missing before the PUA
307     mechanism takes effect, the ABR will not declare its absence via the
308     PUA mechanism.

310  8.  Security Considerations

312     Advertisement of PUA information follows the same procedure as that of
313     a traditional LSA.  The action based on the PUA is defined in
314     this document for the ABR or Level 1-2 router and for receivers that
315     run BGP.

317     There are no changes to the forwarding behavior of other internal
318     routers.

320  9.  IANA Considerations

322     This document has no IANA actions.

324  10.  Acknowledgement

326     Thanks to Peter Psenak, Les Ginsberg, Acee Lindem, Shraddha Hegde,
327     Robert Raszuk, Tony Li, Jeff Tantsura, Tony Przygienda and Bruno
328     Decraene for their suggestions and comments on this draft.

330  11.  Normative References

332     [I-D.ietf-lsr-ospf-prefix-originator]
333                Wang, A., Lindem, A., Dong, J., Psenak, P., and K.
334                Talaulikar, "OSPF Prefix Originator Extensions", Work in
335                Progress, Internet-Draft, draft-ietf-lsr-ospf-prefix-
336                originator-12, 9 April 2021.

340     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
341                Requirement Levels", BCP 14, RFC 2119,
342                DOI 10.17487/RFC2119, March 1997.

345     [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328,
346                DOI 10.17487/RFC2328, April 1998.

349     [RFC5283]  Decraene, B., Le Roux, JL., and I. Minei, "LDP Extension
350                for Inter-Area Label Switched Paths (LSPs)", RFC 5283,
351                DOI 10.17487/RFC5283, July 2008.

354     [RFC5302]  Li, T., Smit, H., and T. Przygienda, "Domain-Wide Prefix
355                Distribution with Two-Level IS-IS", RFC 5302,
356                DOI 10.17487/RFC5302, October 2008.

359     [RFC5340]  Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF
360                for IPv6", RFC 5340, DOI 10.17487/RFC5340, July 2008.

363     [RFC5709]  Bhatia, M., Manral, V., Fanto, M., White, R., Barnes, M.,
364                Li, T., and R. Atkinson, "OSPFv2 HMAC-SHA Cryptographic
365                Authentication", RFC 5709, DOI 10.17487/RFC5709, October
366                2009.

368     [RFC7770]  Lindem, A., Ed., Shen, N., Vasseur, JP., Aggarwal, R., and
369                S. Shaffer, "Extensions to OSPF for Advertising Optional
370                Router Capabilities", RFC 7770, DOI 10.17487/RFC7770,
371                February 2016.

373     [RFC7794]  Ginsberg, L., Ed., Decraene, B., Previdi, S., Xu, X., and
374                U. Chunduri, "IS-IS Prefix Attributes for Extended IPv4
375                and IPv6 Reachability", RFC 7794, DOI 10.17487/RFC7794,
376                March 2016.

378     [RFC7981]  Ginsberg, L., Previdi, S., and M. Chen, "IS-IS Extensions
379                for Advertising Router Information", RFC 7981,
380                DOI 10.17487/RFC7981, October 2016.

383  Authors' Addresses

385     Aijun Wang
386     China Telecom
387     Beiqijia Town, Changping District
388     Beijing
389     102209
390     China

392     Email: wangaj3@chinatelecom.cn

394     Gyan Mishra
395     Verizon Inc.

397     Email: gyan.s.mishra@verizon.com

399     Zhibo Hu
400     Huawei Technologies
401     Huawei Bld., No.156 Beiqing Rd.
402     Beijing
403     100095
404     China

406     Email: huzhibo@huawei.com

408     Yaqun Xiao
409     Huawei Technologies
410     Huawei Bld., No.156 Beiqing Rd.
411     Beijing
412     100095
413     China

415     Email: xiaoyaqun@huawei.com
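
As an illustration of the threshold rules in Section 6, the following Python
sketch maps the counts of unreachable and reachable prefixes to one of the
three advertisement choices.  The function name choose_advertisement, the
enum, and the example values are assumptions made for this sketch; the draft
does not define an API or a default MAA value.

```python
from enum import Enum


class Advertisement(Enum):
    SUMMARY_AND_PUA = "summary address plus PUA"
    DETAIL_ONLY = "detailed reachable addresses only"
    SUMMARY_MAX_METRIC = "summary address with MAX metric"


def choose_advertisement(unreachable, reachable, maa):
    """Apply the three Section 6 rules in order for one summary range."""
    if unreachable < maa:
        return Advertisement.SUMMARY_AND_PUA
    if reachable < maa:
        return Advertisement.DETAIL_ONLY
    # Both counts have reached the threshold.
    return Advertisement.SUMMARY_MAX_METRIC


# Hypothetical counts with an assumed MAA of 10.
few_failures = choose_advertisement(unreachable=2, reachable=200, maa=10)
few_survivors = choose_advertisement(unreachable=50, reachable=3, maa=10)
both_large = choose_advertisement(unreachable=50, reachable=50, maa=10)
```

Because the rules are evaluated in order, a range where both counts are below
MAA falls under rule 1 and is advertised as a summary with PUAs.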