Network Working Group                                        A. Bashandy
Internet Draft                                              B. Pithawala
Intended status: Standards Track                                K. Patel
Expires: January 2013                                      Cisco Systems
                                                           July 16, 2012

         Scalable BGP FRR Protection against Edge Node Failure
                draft-bashandy-bgp-edge-node-frr-03.txt

Abstract

   Consider a BGP free core scenario. Suppose the edge BGP speakers PE1,
   PE2,..., PEn know about a prefix P/m via the external routers CE1,
   CE2,..., CEm. If the edge router PEi crashes or becomes totally
   disconnected from the core, it is desirable for a core router "P"
   carrying traffic to the failed edge router PEi to immediately restore
   traffic by re-tunneling packets originally tunneled to PEi and
   destined to the prefix P/m to one of the other edge routers that
   advertised P/m, say PEj, until BGP re-converges. In doing so, it is
   highly desirable to keep the core BGP-free while not imposing
   restrictions on external connectivity.
   Thus (1) a core router should
   not be required to learn any BGP prefix, (2) the size of the
   forwarding and routing tables in the core routers should be
   independent of the number of BGP prefixes, (3) provisioning overhead
   should be kept to a minimum, (4) re-routing traffic without waiting
   for re-convergence must not cause loops, and (5) there should be no
   restrictions on which edge routers advertise which prefixes. For
   labeled prefixes, (6) the label stack on the packet must allow the
   repair PEj to correctly forward the packet and (7) there must not be
   any need to perform more than one label lookup on any edge or core
   router during steady state.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s)
   controlling the copyright in such materials, this document may not
   be modified outside the IETF Standards Process, and derivative
   works of it may not be created outside the IETF Standards Process,
   except to format it for publication as an RFC or to translate it
   into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.
   It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 16, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Conventions used in this document
      1.2. Terminology
      1.3. Problem definition
   2. Overview of the solution in an MPLS Core
      2.1. Control Plane operation for Automated pNH Assignment
      2.2. Control Plane operation for Configured pNH
      2.3. Forwarding behavior at Steady State (When pPE is reachable)
      2.4. Forwarding behavior when pPE Fails
   3. Overview of the solution in a Pure IP Core
      3.1. Control Plane operation
      3.2. Forwarding Behavior at Steady State (while pPE is reachable)
      3.3. Forwarding Behavior at Failure (when pPE is not reachable)
   4. Example
      4.1. Control Plane
      4.2. Forwarding Plane at Steady State (When PE0 is reachable)
      4.3. Forwarding Plane at Failure (When PE0 is not reachable)
   5. Inter-operability with Existing IP FRR Mechanisms
   6. Security Considerations
   7. IANA Considerations
   8. Conclusions
   9. References
      9.1. Normative References
      9.2. Informative References
   10. Acknowledgments
   Appendix A. How to protect Against Misconfigured pNH
   Appendix B. Alternative Approach for advertising (pNH,rNH) to iPE
   Appendix C. Modification History
      A.1.1. Changes from Version 02
      A.1.2. Changes from Version 01

1. Introduction

   In a BGP free core, where traffic is tunneled between edge routers,
   BGP speakers advertise reachability information about prefixes to
   other edge routers, not to core routers. For labeled address
   families, namely AFI/SAFI 1/4, 2/4, 1/128, and 2/128, an edge
   router assigns local labels to prefixes and associates the local
   label with each advertised prefix such as L3VPN [10], 6PE [11], and
   Softwire [9]. Suppose that a given edge router is chosen as the
   best next-hop for a prefix P/m. An ingress router that receives a
   packet from an external router and destined to the prefix P/m
   "tunnels" the packet across the core to that egress router.
   If the
   prefix P/m is a labeled prefix, the ingress router pushes the label
   advertised by the egress router before tunneling the packet to the
   egress router. Upon receiving the packet from the core, the egress
   router takes the appropriate forwarding decision based on the
   content of the packet or the label pushed on the packet.

   In modern networks, it is not uncommon to have a prefix reachable
   via multiple edge routers. One example is the best external path
   [8]. Another more common and widely deployed scenario is L3VPN [10]
   with multi-homed VPN sites. As an example, consider the L3VPN
   topology depicted in Figure 1.

                            PE1 .............+
                                             |
                                    +--------+---------------+
                                    |                        |
                                    |     VPN 1 Network      |
                                    |                        |
                                    |       VPN prefix       |
                                    |      (10.0.0.0/8)      |
                                    |                        |
                                    +---+--------------------+
                                        |
                                 /------CE1
                                /
                               /
   BGP-free core   P--------PE0
                               \
                                \
                                 \------CE2
                                        |
                                    +---+--------------------+
                                    |                        |
                                    |     VPN 2 Network      |
                                    |                        |
                                    |       VPN prefix       |
                                    |      (20.0.0.0/8)      |
                                    |                        |
                                    +--------+---------------+
                                             |
                            PE2 .............+

            Figure 1 VPN prefix reachable via multiple PEs

   As illustrated in Figure 1, the edge router PE0 is the primary NH
   for both 10.0.0.0/8 and 20.0.0.0/8. At the same time, both
   10.0.0.0/8 and 20.0.0.0/8 are reachable through the other edge
   routers PE1 and PE2, respectively.

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC-2119 [1].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying RFC-2119 significance.

1.2. Terminology

   This section defines the terms used in this document.
   For ease of
   use, we will use terms similar to those used by L3VPN [10].

   o BGP-Free core: A network where BGP prefixes are only known to
      the edge routers and traffic is tunneled between edge routers.

   o External prefix: It is a prefix P/m (of any AFI/SAFI) that a BGP
      speaker has an external path for. The BGP speaker may learn
      about the prefix from an external peer through BGP, some other
      protocol, or manual configuration. The protected prefix is
      advertised to some or all of the internal peers.

   o Protectable prefix: It is an external prefix P/m (of any
      AFI/SAFI) that a BGP speaker has an external path to and is
      eligible to have a repair path.

   o Primary Egress PE, "ePE": It is an IBGP peer that can reach the
      prefix P/m through an external path and advertised the prefix to
      the other IBGP peers. The primary egress PE was chosen as the
      best path by one or more internal peers. In other words, the
      primary egress PE is an egress PE that will normally be used by
      some ingress PEs when there is no failure. Referring to Figure
      1, PE0 is the primary egress PE.

   o Protected egress PE, "pPE" (Protected PE for simplicity): It is
      an egress PE that has or is eligible to have a repair path for
      some or all of the prefixes to which it has an external path.
      Referring to Figure 1, PE0 is a protected egress PE.

   o Protected edge router: Any protected egress PE.

   o Protected next-hop (pNH): It is an IPv4 or IPv6 host address
      belonging to the protected egress PE. Traffic tunneled to this
      IP address will be protected via the mechanism proposed in this
      document. Note that the protected next-hop MUST be different
      from the next-hop attribute in the BGP update message [2][3].

   o CE: It is an external router through which an egress PE can
      reach a prefix P/m. The routers "CE1" and "CE2" in Figure 1 are
      examples of such CEs.

   o Ingress PE, "iPE": It is a BGP speaker that learns about a
      prefix through another IBGP peer and chooses that IBGP peer as
      the next-hop for the prefix.

   o Repairing P router "rP" (Also "Repairing core router" and
      "repairing router"): A core router that attempts to restore
      traffic when the primary egress PE is no longer reachable
      without waiting for IGP or BGP to re-converge. The repairing P
      router restores the traffic by rerouting the traffic (through a
      tunnel) towards the pre-calculated repair PE when it detects
      that the primary egress PE is no longer reachable. Referring to
      Figure 1, the router "P" is the repairing P router.

   o Repair egress PE "rPE" (Repair PE for simplicity): It is an
      egress PE other than the primary egress PE that can reach the
      protected prefix P/m through an external neighbor. The repair PE
      is pre-calculated prior to any failure. Referring to Figure 1,
      PE1 is the repair PE for 10.0.0.0/8 while PE2 is the repair PE
      for 20.0.0.0/8.

   o Underlying Repair label (rL): The underlying repair label is the
      label that will be pushed so that the repair PE can forward
      repaired traffic correctly. A repair label is defined for
      labeled protected prefixes only.

   o Repair next-hop (rNH): It is an IPv4 or IPv6 host address
      belonging to the repair egress PE. If the protected prefix is
      advertised via BGP, then the repair next-hop SHOULD be the
      next-hop attribute in the BGP update message [2][3].

   o Repair path (Also Repair Egress Path): It is the repair
      next-hop. If an underlying repair label exists, the repair path
      is the repair next-hop together with the underlying repair
      label.

   o Primary tunnel: It is the tunnel from the ingress PE to the
      primary egress PE.

   o Repair tunnel: It is the tunnel from the repairing P router to
      the repair egress PE.

1.3. Problem definition

   The problem that we are trying to solve is as follows:

   o Even though multiple prefixes may share the same egress router,
      they may have different repair edge routers. In Figure 1 above,
      both 10.0.0.0/8 and 20.0.0.0/8 share the same primary next hop
      PE0, but the routing protocol(s) must identify that the repair
      node for 10.0.0.0/8 is PE1 while the repair node for 20.0.0.0/8
      is PE2.

   o On losing connection to the edge router, the core router "P"
      MUST reroute traffic towards the *correct* repair edge router
      without waiting for IGP or BGP to re-converge and update the
      routing tables. On the failure of PE0 illustrated in Figure 1,
      the core router P needs to reroute traffic for 10.0.0.0/8
      towards PE1 and traffic for 20.0.0.0/8 towards PE2.

   o The repairing core router P MUST NOT be forced to learn about
      the BGP prefixes on any of the edge routers. The same applies to
      all core routers.

   o The size of the routing table on any core router MUST be
      independent of the number of BGP prefixes in the network.

   o Rerouting traffic without waiting for IGP and BGP to re-converge
      after a failure MUST NOT cause loops.

   o For labeled prefixes, when a packet gets re-routed to the repair
      PE, the label stack on the packet MUST ensure correct
      forwarding.

   o Provisioning overhead must be kept to a minimum. In addition,
      misconfiguration should be detectable.

   o At steady state, when pPE is reachable, the path taken by a
      traffic flow must not be impacted by enabling the solution
      proposed in this document on some or all routers.

2. Overview of the solution in an MPLS Core

   The solution proposed in this document relies on the collaboration
   of egress PE, ingress PE, penultimate hop routers, and repairing
   router. This section gives an overview of how the solution works
   for labeled and unlabeled protected prefixes in an MPLS core.

2.1. Control Plane operation for Automated pNH Assignment

   This section outlines the solution for the case where the protected
   next hop "pNH" is automatically calculated instead of being assigned
   by an operator.

   1. Each egress router that is capable of handling repaired traffic
      assigns each protectable labeled prefix a repair label: "rL".
      "rL" is advertised as an optional path attribute. "rL" MUST be
      per-CE or per-VRF for good BGP attribute packing and forwarding
      simplicity. For an unlabeled prefix, no repair label is needed.
      A router that is capable of handling repaired traffic is called
      a repair PE "rPE". The semantics of the repair label "rL" are:

      a. Pop *two* labels.

      b. If "rL" is per-CE, then send the packet to the appropriate
         CE.

      c. If "rL" is per-VRF, forward the packet based on the contents
         under the two popped labels.

   2. If an Egress PE knows that a P/m to which it has an external path
      is also reachable via another PE and that other PE advertises a
      repair label "rL" for P/m,

      a. It chooses the other PE as a repair PE. Let's call the chosen
         repair PE "rPE". The ePE chooses an IP address "rNH" local to
         or advertised by rPE.

         i. "rNH" SHOULD be the next-hop attribute advertised by rPE
            when it announces reachability to the protected prefix
            P/m to minimize the number of prefixes advertised into
            IGP.

         ii. If rPE also advertised a protected next-hop (pNH) for any
            BGP prefix that rPE can protect, then rNH MUST NOT be any
            protected next-hop (pNH) advertised by rPE.

      b. It allocates a local IP address corresponding to the chosen
         rPE, say "pNH". "pNH" represents the protected NH, i.e.,
         traffic tunneled to "pNH" will be protected against edge node
         failure via the BGP FRR mechanism proposed in this document.

      c. A separate pNH is needed for every rPE (for a given protected
         PE). Each pNH must be unique within a single BGP-free core.

      d. Now that "ePE" has a repair path for P/m, it becomes a
         protected PE "pPE".

      e. pPE advertises pNH as a prefix into IGP.

      f. pPE re-advertises the protected prefix P/m to other iBGP
         peers with "pNH" as an optional non-transitive attribute.

      g. pPE advertises the mapping (pNH,rNH) separately to all ingress
         PEs. A method analogous to how tunnel information is
         advertised [4] can be used to advertise this mapping (pNH,rNH)
         to ingress PEs.

      h. Once iPE receives the pNH for each prefix and the mapping
         (pNH,rNH), the iPE can retrieve "rL" for P/m from the
         advertisement of rPE for P/m.

      i. "pPE" advertises the pair (pNH,rNH) to candidate repairing
         core routers.

      j. "pPE" advertises the protected next-hop "pNH" to the
         penultimate hops to indicate that traffic flowing through the
         tunnel to the tail end "pNH" is protected against the failure
         of the node "pPE" and requires special processing by the
         penultimate hop as will be described in the next few steps.

      k. pPE advertises an explicit label for pNH instead of the usual
         implicit NULL. This way pPE can carry out the special label
         popping behavior (described in Section 2.3) if the
         penultimate hop cannot perform this task.

   3. Ingress PE "iPE"

      a. iPE receives the protected prefix P/m with "pNH" as an
         optional attribute.

      b. iPE also receives the mapping (pNH,rNH) from pPE.

      c. When iPE receives "rL" with P/m from rPE, then iPE can
         associate "rL" with P/m as described in step 2.h above.

   As a result of the above steps, the following nodes store the
   following information:

   o Ingress PE (iPE)

      o Receives from pPE the NLRI advertisement for the protected
        labeled prefix P/m containing the usual next-hop attribute and
        the optional information "pNH". iPE also receives the mapping
        (pNH, rNH).

      o iPE retrieves "rL" from the advertisement of rPE for the
        protected prefix P/m.

      o Assume that iPE chooses pPE as the primary NH.
        Then the iPE
        will use pNH as the tunnel tail end to pPE instead of the
        usual BGP next-hop.

   o Penultimate Hop

      o Receives the "pNH" from pPE.

      o As such, it knows that traffic destined to pNH needs certain
        special forwarding treatment as described in the next few
        steps.

      o Penultimate hop advertises "pNH" as its own prefix but with
        one of the following conditions:

         . For link-state IGPs, "pNH" MAY be advertised with
           *maximum metric* so as not to affect the path taken by
           the traffic flowing from iPEs to pPEs.

         . For distance vector IGPs, the penultimate hop MAY
           advertise the metric of "pNH" as follows:

              PHP-metric(pNH) =
                 pPE-metric(pNH) + metric-From-PHP-to-pPE

           That is, the metric advertised by the penultimate hop for
           pNH equals the metric advertised by pPE for pNH plus the
           metric from the penultimate hop to pPE.

         . This way the advertisement of pNH by the penultimate hop
           does not impact the path taken by the traffic from iPEs
           to pPEs.

   o Repairing core router "rP" (which may also be the penultimate hop)

      o Receives the pair (pNH,rNH) from pPE.

      o Installs the following forwarding entry for pNH:

         . If pNH is not reachable, re-tunnel traffic to rNH.

2.2. Control Plane operation for Configured pNH

   In Section 2.1, the pPE assigned pNH to a protected prefix P/m
   based on the chosen rPE. The result of this behavior is the need to
   re-advertise the protected prefix P/m with the associated "pNH". In
   this section, we outline the procedure by which the operator can
   pre-assign pNH to protected prefixes and hence avoid the need to
   re-advertise protected prefixes.

   1. Protected PE "pPE"

      a. The operator groups prefixes such that two prefixes belong to
         the same group if the operator knows that the two prefixes are
         protected by the same rPE.

      b. The operator assigns a distinct protected next-hop "pNH" for
         every group of prefixes.
         The assignment occurs even if a repair
         path for P/m is not yet known.

      c. pPE advertises "pNH" as an optional non-transitive attribute
         with the protected prefix P/m *all the time*, even if no other
         PE advertises P/m.

      d. When pPE receives an advertisement for P/m from another PE:

         i. pPE chooses the other PE as rPE.

         ii. pPE advertises the mapping (pNH,rNH) separately to all
            ingress PEs. rNH SHOULD be the next-hop attribute
            advertised by rPE. A method analogous to how tunnel
            information is advertised [4] can be used to advertise
            this mapping (pNH,rNH) to ingress PEs.

      e. The rest of the behavior is identical to what is specified in
         Section 2.1.

   2. How to Protect the network against misconfigured pNH?

      See Appendix A.

   What is left is to outline the forwarding behavior before and after
   "pPE" failure.

2.3. Forwarding behavior at Steady State (When pPE is reachable)

   This section outlines the packet forwarding procedure when pPE is
   still reachable.

   1. Ingress PE (iPE) receives a packet matching P/m and reachable via
      pPE.

   2. The iPE pushes three labels:

      o Bottom label: VPN label advertised by pPE

      o Second label: rL

      o Top label: IGP label towards pNH (not the BGP next-hop
        attribute)

   3. Penultimate Hop

      a. Receives a packet with top label bound to pNH.

      b. Pops *two* labels *all the time*.

      c. Sends packet to pNH.

   4. Protected PE (pPE)

      a. Receives a packet with top label as VPN label.

      b. Forwards the packet as usual.

      c. For unlabeled packets, the iPE only pushes the rL and the IGP
         label of pNH and the pPE uses the IP header for forwarding.

   Thus the packet can be delivered correctly to its destination.

2.4. Forwarding behavior when pPE Fails

   The repairing core router directly connected to a failure detects
   that pNH is no longer reachable. The following steps are applied.

   1. Repairing core router "rP"

      a. Receives packet with top label bound to pNH.

      b. pNH is not reachable.

      c. Swaps the top label with the label of rNH.

      d. Sends packet towards rPE.

      In effect, the repairing router re-tunnels the packet towards
      the repair PE.

   2. Penultimate hop of rPE

      a. rNH is not a protected NH for rPE.

      b. Thus the penultimate hop employs the usual penultimate hop
         popping and then forwards the packet to rPE.

   3. Repair PE (rPE)

      a. Receives packet with top label rL (which rPE advertised) and
         underneath it the regular VPN label advertised by the
         protected PE "pPE".

      b. Makes a lookup on "rL".

      c. rL per CE

         i. Pop *two* labels.

         ii. Send to correct CE.

      d. rL per VRF

         i. Pop *two* labels.

         ii. Make IP lookup in appropriate VRF.

         iii. Send to the CE.

      e. rL is assigned to unlabeled prefix

         i. Pop "rL".

         ii. Send the packet to the correct CE.

3. Overview of the solution in a Pure IP Core

   This section provides an overview of the solution when operating in
   a pure IP core where core routers only understand IPv4 or IPv6
   protocols. Thus traffic between PEs is transported using IP tunnels
   such as [4][6][7].

3.1. Control Plane operation

   The control plane behavior in an IP core is identical to its
   behavior in an MPLS core.

3.2. Forwarding Behavior at Steady State (while pPE is reachable)

   1. Ingress PE (iPE) receives a packet matching P/m and reachable via
      pPE.

   2. Ingress PE:

      o For labeled traffic, pushes two labels:

         . Bottom label: VPN label advertised by pPE

         . Second label: rL

      o For unlabeled traffic, just pushes "rL".

      o Encapsulates the packet into the IP tunnel header towards the
        pNH.

   3. Penultimate Hop

      o No special behavior is needed from the penultimate hop while
        pPE is reachable.

   4. Protected PE

      a. Receives an IP packet encapsulated in an IP tunnel header with
         destination address pNH.

      b. Decapsulates the IP tunnel header and the label right under it
         (which will be the repair label "rL").

      c. For labeled traffic, the VPN label is exposed. So pPE makes a
         lookup using the VPN label. Otherwise the usual IP forwarding
         is applied.

      d. Forwards the packet as usual.

3.3. Forwarding Behavior at Failure (when pPE is not reachable)

   The repairing router directly connected to a failure detects that
   pNH is no longer reachable. The following steps are applied.

   1. Repairing router "rP"

      a. Receives IP packet with a tunnel header destined to pNH.

      b. pNH is not reachable.

      c. Replaces the tunnel header with a tunnel header with
         destination address rNH.

      d. Forwards the packet to rNH.

   2. Repair PE (rPE)

      a. Receives IP packet with a tunnel header destined to rNH.

      b. Decapsulates the tunnel header to expose the repair label
         "rL".

      c. The rest of the behavior is identical to the behavior in an
         MPLS Core.

4. Example

   We will use an LDP core as an example. Consider the diagram
   depicted in Figure 2 below.

   +-----------------------------------+
   |                                   |
   |             LDP Core              |
   |                                   |
   |                                  PE1 Lo = 9.9.9.1
   |                                   |\
   |                                   | \
   |                                   |  \
   |                                   |   \
   |                                   |    CE1.......VRF "Blue"
   |                                   |   /          (10.0.0.0/8)
   |                                   |  /           (11.0.0.0/8)
   |                                   | /
   |                                   |/
  PE11                       P--------PE0 Lo1 = 1.1.1.1/32
   |                                   |\ pNH Range = 2.1.1.0/24
   |                                   | \
   |                                   |  \
   |                                   |   \
   |                                   |    CE2.......VRF "Red"
   |                                   |   /          (20.0.0.0/8)
   |                                   |  /           (21.0.0.0/8)
   |                                   | /
   |                                   |/
   |                                  PE2 Lo = 9.9.9.2
   |                                   |
   |                                   |
   +-----------------------------------+
              Figure 2 : Edge node BGP FRR in LDP core

   o In Figure 2, PE0 is the pPE for VRFs "Blue" and "Red" while PE1
     and PE2 are the rPEs for VRFs "Blue" and "Red", respectively.
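
   The label operations of Sections 2.3 and 2.4 can be sketched for
   the VRF "Blue" part of this topology as follows. This is only an
   illustrative Python sketch under stated assumptions, not part of
   the proposed mechanism: the LDP label numbers 16 and 17 and all
   function and table names are hypothetical, while the VPN label
   4100, the repair label 3100, and the addresses are the ones used
   in the remainder of this section. A label stack is modeled as a
   list with the top label at index 0.

```python
# Hypothetical tables for the VRF "Blue" case of Figure 2.
LDP = {"2.1.1.1": 16, "9.9.9.1": 17}  # assumed LDP labels per FEC
REPAIR_MAP = {"2.1.1.1": "9.9.9.1"}   # (pNH, rNH) pair advertised by pPE
REPAIR_LABELS = {3100: "CE1"}         # per-CE repair label held by rPE (PE1)


def ipe_push(vpn_label, repair_label, pnh):
    """iPE: VPN label at the bottom, rL above it, IGP label of pNH on top."""
    return [LDP[pnh], repair_label, vpn_label]


def rp_forward(stack, reachable):
    """rP: if the pNH behind the top label is unreachable, swap the top
    label for the label of rNH; otherwise leave the stack unchanged."""
    fec = next(f for f, label in LDP.items() if label == stack[0])
    if fec in REPAIR_MAP and fec not in reachable:
        return [LDP[REPAIR_MAP[fec]]] + stack[1:]
    return stack


def ppe_penultimate_hop(stack):
    """Penultimate hop of pPE: pop two labels (IGP label and rL)."""
    return stack[2:]


def rpe_penultimate_hop(stack):
    """Penultimate hop of rPE: usual penultimate hop popping (one label)."""
    return stack[1:]


def rpe_receive(stack):
    """rPE with a per-CE rL: look up rL, pop two labels, select the CE."""
    ce = REPAIR_LABELS[stack[0]]
    return ce, stack[2:]


stack = ipe_push(4100, 3100, "2.1.1.1")

# Steady state: pNH is reachable, so pPE receives only the VPN label.
steady = ppe_penultimate_hop(rp_forward(stack, reachable={"2.1.1.1"}))

# Failure: pNH is unreachable, so rP re-tunnels the packet towards rNH.
repaired = rpe_penultimate_hop(rp_forward(stack, reachable=set()))
ce, rest = rpe_receive(repaired)
```

   In this sketch the steady-state stack arriving at pPE is [4100],
   while after the failure rPE sees [3100, 4100], looks up only the
   top label, and hands the unlabeled packet to CE1. Every node
   therefore makes a single label lookup in both cases.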

   o VRF Blue has 10.0.0.0/8 and 11.0.0.0/8 and VRF Red has 20.0.0.0/8
     and 21.0.0.0/8.

   o Assuming PE0 uses per prefix label allocation, PE0 assigns the VPN
     labels 4100, 4200, 4300, and 4400 to 10.0.0.0/8, 11.0.0.0/8,
     20.0.0.0/8, and 21.0.0.0/8 respectively. PE0 advertises the
     prefixes 10.0.0.0/8, 11.0.0.0/8, 20.0.0.0/8, and 21.0.0.0/8 using
     MP/BGP as usual.

4.1. Control Plane

   1. rPEs allocate and advertise repair labels

      a. Acting as an rPE, PE1 allocates (on per-CE basis) and
         advertises a repair label rL1=3100 with the prefixes
         10.0.0.0/8 and 11.0.0.0/8 to all iBGP peers.

      b. Similarly, PE2 allocates and advertises the repair label
         rL2=3200 with the prefixes 20.0.0.0/8 and 21.0.0.0/8.

   2. pPE calculates and advertises the pNHs

      a. For prefixes belonging to VRF "Blue", PE0 allocates
         pNH1=2.1.1.1 because all of them are protected by PE1.

      b. Similarly, for prefixes belonging to VRF "Red", PE0
         allocates pNH2=2.1.1.2 because VRF "Red" is protected by PE2.

      c. PE0 advertises (pNH1,rNH1)=(2.1.1.1, 9.9.9.1) and
         (pNH2,rNH2)=(2.1.1.2, 9.9.9.2) to the ingress PE PE11 and
         the repairing core router "P".

      d. PE0 re-advertises 10.0.0.0/8 & 11.0.0.0/8 with the optional
         attribute pNH1=2.1.1.1, and 20.0.0.0/8 & 21.0.0.0/8 with
         pNH2=2.1.1.2 to the ingress PE PE11.

   3. The ingress PE "PE11" creates the following forwarding state

      a. For prefixes 10.0.0.0/8 & 11.0.0.0/8: Push the VPN labels
         4100 and 4200, respectively, followed by rL=3100; then tunnel
         the packet to 2.1.1.1.

      b. For prefixes 20.0.0.0/8 & 21.0.0.0/8: Push the VPN labels
         4300 and 4400, respectively, followed by rL=3200; then
         tunnel the packet to 2.1.1.2.

4.2. Forwarding Plane at Steady State (When PE0 is reachable)

   1. Ingress PE PE11

      a. Traffic for VRF "Blue"

         i. PE11 receives a packet for VRF Blue with destination
            address 10.1.1.1 from an external router.

         ii. PE11 pushes the following labels:

            1. The VPN label 4100

            2. The Repair label 3100

            3. The LDP label for the pNH 2.1.1.1

      b. Traffic for VRF "Red"

         i. PE11 receives a packet for VRF Red with destination
            address 20.1.1.1 from an external router.

         ii. PE11 pushes the following labels:

            1. The VPN label 4300

            2. The Repair label 3200

            3. The LDP label for the pNH 2.1.1.2

   2. Penultimate Hop of PE0 (which is also the rP "P")

      a. Receives a packet with top label for the protected next-hop
         2.1.1.1 or 2.1.1.2.

      b. Pops *2* labels.

      c. Forwards the packet to pPE, which is 1.1.1.1.

   3. Protected PE PE0

      a. Traffic for VRF "Blue"

         i. PE0 receives traffic with the top label 4100.

         ii. 4100 is the VPN label for 10.1.1.1 belonging to VRF
            "Blue".

         iii. PE0 pops the label 4100 and forwards the packet to CE1.

      b. Traffic for VRF "Red"

         i. PE0 receives traffic with the top label 4300.

         ii. 4300 is the VPN label for 20.1.1.1 belonging to VRF "Red".

         iii. PE0 pops the label 4300 and forwards the packet to CE2.

4.3. Forwarding Plane at Failure (When PE0 is not reachable)

   1. The ingress PE PE11

      Does not know about the failure yet and hence it does not
      change its behavior.

   2. Repairing router rP

      a. Traffic for VRF "Blue"

         i. Receives a packet with the top label being the LDP label
            for 2.1.1.1.

         ii. 2.1.1.1 is not reachable.

         iii. Swaps the LDP label for 2.1.1.1 with the LDP label of
            9.9.9.1.

         iv. Forwards the packet towards 9.9.9.1.

      b. Traffic for VRF "Red"

         i. Receives a packet with the top label being the LDP label
            for 2.1.1.2.

         ii. 2.1.1.2 is not reachable.

         iii. Swaps the LDP label for 2.1.1.2 with the LDP label of
            9.9.9.2.

         iv. Forwards the packet towards 9.9.9.2.

   3. The repair router "PE1"

      a. The penultimate hop of PE1 performs the usual penultimate hop
         popping.

      b. PE1 receives a packet whose top label equals the repair
         label 3100, which was allocated on per-CE basis and points to
         CE1.

      c. PE1 pops *2* labels and forwards the packet to CE1.

   4. The repair router "PE2"

      a. The penultimate hop of PE2 performs the usual penultimate hop
         popping.

      b. PE2 receives a packet whose top label equals the repair
         label 3200, which was allocated on per-CE basis and points to
         CE2.

      c. PE2 pops *2* labels and forwards the packet to CE2.

5. Inter-operability with Existing IP FRR Mechanisms

   Current existing IP FRR mechanisms can be divided into two
   categories: core protection and edge protection. Core protection
   techniques, such as [12], [13], and [14], provide protection against
   internal node and/or link failure. Thus the technique proposed in
   this document is not related to existing IP FRR mechanisms. If the
   failure of an internal node or link results in completely
   disconnecting a protectable edge node, then an administrator MAY
   configure the repairing router to prefer the technique proposed in
   this document over existing IP FRR mechanisms.

   Edge protection techniques, such as [16] and its practical
   implementation [15], provide protection against the failure of the
   link between PE and CE routers. Thus existing PE-CE link protection
   can co-exist with the techniques proposed in this document because
   the two techniques are independent of each other.

6. Security Considerations

   No additional security risk is introduced by using the mechanisms
   proposed in this document.

7. IANA Considerations

   No requirements for IANA.

8. Conclusions

   This document proposes a method that allows fast re-route
   protection against edge node failure or complete disconnection from
   the core in a BGP-free core. The proposed method has several
   advantages:

   o Easy to apply protection policies. pPE is the router that chooses
     the rPE.
     Hence, if an operator wants to control which prefixes/VRFs are
     protected or which router can be chosen as the repair PE, the
     operator needs to apply the policy on the pPE only.

   o Simple forwarding plane. The only change in the forwarding plane
     is the need to pop/push two labels on the iPE, rP, and rPEs.

   o Single label lookup even during failure. Forwarding decisions are
     based on a single label lookup on all routers at all times, even
     during failure.

   o Immunity to misconfiguration. The only required configuration is
     to choose non-overlapping address ranges on different pPEs. If an
     operator configures overlapping IP address ranges on two different
     pPEs, then one of the pPEs will eventually allocate a pNH that is
     covered by the IP address range of another pPE, and hence the
     misconfiguration can be detected.

   o No need for IP or TE FRR, because the exit point of the repair
     tunnel from rP to rPE is different from the primary tunnel exit
     point.

   o Works in both MPLS core and IP core.

   o Works with per-CE, per-VRF, and per-prefix label allocation.

   o Can be incrementally deployed. There is no flag day. Different
     routers can be upgraded at different times.

   o Zero impact on the paths taken by traffic. Enabling or deploying
     the feature described in this document has no effect on the paths
     taken by traffic at steady state.

9. References

9.1. Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [2]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4
        (BGP-4)", RFC 4271, January 2006.

   [3]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
        "Multiprotocol Extensions for BGP-4", RFC 4760, January 2007.

   [4]  Mohapatra, P.
        and Rosen, E., "The BGP Encapsulation Subsequent Address Family
        Identifier (SAFI) and the BGP Tunnel Encapsulation Attribute",
        RFC 5512, April 2009.

   [5]  Lau, J., Ed., Townsley, M., Ed., and I. Goyret, Ed., "Layer Two
        Tunneling Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005.

   [6]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina,
        "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000.

   [7]  Perkins, C., "IP Encapsulation within IP", RFC 2003, October
        1996.

9.2. Informative References

   [8]  Marques, P., Fernando, R., Chen, E., Mohapatra, P., and H.
        Gredler, "Advertisement of the best external route in BGP",
        draft-ietf-idr-best-external-04.txt, April 2011.

   [9]  Wu, J., Cui, Y., Metz, C., and E. Rosen, "Softwire Mesh
        Framework", RFC 5565, June 2009.

   [10] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks
        (VPNs)", RFC 4364, February 2006.

   [11] De Clercq, J., Ooms, D., Prevost, S., and F. Le Faucheur,
        "Connecting IPv6 Islands over IPv4 MPLS Using IPv6 Provider
        Edge Routers (6PE)", RFC 4798, February 2007.

   [12] Atlas, A. and A. Zinin, "Basic Specification for IP Fast
        Reroute: Loop-Free Alternates", RFC 5286, September 2008.

   [13] Shand, M. and S. Bryant, "IP Fast Reroute Framework", RFC 5714,
        January 2010.

   [14] Shand, M. and S. Bryant, "A Framework for Loop-Free
        Convergence", RFC 5715, January 2010.

   [15] Bashandy, A., Pithawala, B., and J. Heitz, "Scalable, Loop-Free
        BGP FRR using Repair Label",
        draft-bashandy-idr-bgp-repair-label-02.txt, July 2011.

   [16] Bonaventure, O., Filsfils, C., and P. Francois, "Achieving
        Sub-50 Milliseconds Recovery upon BGP Peering Link Failures",
        IEEE/ACM Transactions on Networking, 15(5):1123-1135, 2007.

10. Acknowledgments

   Special thanks to Eric Rosen, Clarence Filsfils, Maciek
   Konstantynowicz, Stewart Bryant, Pradosh Malhotra, Nagendra Kumar,
   George Swallow, Les Ginsberg, and Anton Smirnov for their valuable
   comments.

   This document was prepared using 2-Word-v2.0.template.dot.

Appendix A. How to Protect Against Misconfigured pNH

   Section 2.2 outlines a method by which the operator can configure
   the protected next-hop "pNH". Two misconfigurations are possible:

   o The operator configures the same pNH for two protected prefixes
     P1/m1 and P2/m2, but the two prefixes are protected by different
     rPEs.

   o The operator configures two different pNHs for two protected
     prefixes P1/m1 and P2/m2, but the two prefixes are protected by
     the same rPE.

   The second misconfiguration does not cause much harm. Either way,
   routers implementing the BGP FRR scheme proposed in this document can
   detect both misconfigurations.

   Suppose the operator configures the same pNH for P1/m1 and P2/m2,
   but P1/m1 is protected by rPE1 and P2/m2 is protected by rPE2. In
   that case, the iPE and the misconfigured pPE will detect the
   inconsistency because both will see that P1/m1 and P2/m2 are assigned
   the same pNH but are protected by two different rPEs. The reaction to
   the misconfiguration is beyond the scope of this document.

   Similarly, the iPE and pPE can detect that the operator configured
   different pNHs for P1/m1 and P2/m2 even though they are protected by
   the same rPE, because both the iPE and pPE will receive an
   advertisement for P1/m1 and P2/m2 from the same rPE. The reaction to
   and remedy for the misconfiguration are beyond the scope of this
   document.

Appendix B. Alternative Approach for Advertising (pNH,rNH) to iPE

   In Section 2.1, the pPE re-advertises the protected prefixes with
   (pNH) as an optional non-transitive attribute and advertises the
   mapping (pNH,rNH) separately. Alternatively, the pPE can re-advertise
   the protected prefix P/m to other iBGP peers with the mapping
   (pNH,rNH) as optional non-transitive attributes. Advertising only
   (pNH) with the prefixes has some advantages:

   o Advertising only pNH with the prefixes can easily be used for a
     configured pNH as described in Section 2.2.

   o If the repair PE changes from one PE to another, there is no need
     to re-advertise all the prefixes. Only the mapping (pNH,rNH) needs
     to be re-advertised, plus possibly some of the protected prefixes.

   o Advertising only pNH with the prefix slightly reduces the BGP
     message size.

   Irrespective of whether (pNH,rNH) is advertised with the prefix or
   separately, advertising (pNH,rNH) is better than advertising
   (pNH,rL) because there are many rLs for the same rNH. Hence
   advertising (pNH,rNH) yields better attribute packing.

Appendix C. Modification History

C.1.1. Changes from Version 02

   The whole scheme has been changed to a single next-hop per pPE-rPE
   pair. As a result, unlike versions 00 and 01, behavioral changes are
   needed in the pPE, rP, and iPE. The behavior of the rPE remains
   almost unchanged.

   The second important change is requiring rP to advertise the pNH with
   maximum metric so that traffic does not get disrupted when the pPE
   disappears.

C.1.2. Changes from Version 01

   1. Use the term "underlying repair label" instead of just "repair
      label" to avoid confusion with the term "repair label" used in
      [15].

   2. In version 01, it was assumed in many places that the repairing
      router is the penultimate hop P router. Although this would
      probably be the most common case, it is not always true.
      Hence in this version the repairing router may be any core
      router.

   3. Merged the handling of labeled and unlabeled prefixes into a
      single algorithm.

   4. Allowed sending a repair label for unlabeled prefixes and added
      the "Push" flag. This ensures loop-free repair even for unlabeled
      prefixes in case the repair PE has eiBGP paths, as mentioned
      in Section Error! Reference source not found.

   5. In Section Error! Reference source not found. discussing the rules
      governing the choice of the underlying repair label for a labeled
      prefix, we changed the wording so that the primary egress PE
      "SHOULD" instead of "MAY" use the repair label advertised
      according to [15] as an underlying repair label.

   6. All occurrences of the term "backup" were replaced by "repair", as
      "repair" is the term commonly used in the IP FRR context, such as
      in [12], [13], and [14].

   7. Added the definition of primary and repair tunnels in Section 1.2.

   8. Added a definition of the term "Repair Next-hop" in Section 1.2.

   9. Modified the definition of "repair path" in Section 1.2 to be
      the repair next-hop plus the underlying repair label instead of
      the repair PE plus the underlying repair label.

   10. Outlined inter-operability with existing IP FRR techniques in
       Section 5.

   11. Made a few editorial corrections.

Authors' Addresses

   Ahmed Bashandy
   Cisco Systems
   170 West Tasman Dr, San Jose, CA 95134
   Email: bashandy@cisco.com

   Burjiz Pithawala
   Cisco Systems
   170 West Tasman Dr, San Jose, CA 95134
   Email: bpithaw@cisco.com

   Keyur Patel
   Cisco Systems
   170 West Tasman Dr, San Jose, CA 95134
   Email: keyupate@cisco.com