Network Working Group                                        A. Bashandy
Internet Draft                                        M. Konstantynowicz
Intended status: Standards Track                                N. Kumar
Expires: April 2013                                        Cisco Systems
                                                         October 8, 2012

   BGP FRR Protection against Edge Node Failure Using Table Mirroring
                          with Context Labels
               draft-bashandy-bgp-frr-mirror-table-00.txt

Abstract

   Consider a BGP free core scenario.
   Suppose the edge BGP speakers PE1, PE2,..., PEn know about a prefix
   P/m via the external routers CE1, CE2,..., CEm. If the edge router
   PEi crashes or becomes totally disconnected from the core, it is
   desirable for a core router "P" carrying traffic to the failed edge
   router PEi to immediately restore traffic by re-tunneling packets
   originally tunneled to PEi and destined to the prefix P/m to one of
   the other edge routers that advertised P/m, say PEj, until BGP
   re-converges. This draft proposes a BGP FRR scheme that relies on
   having the repairing edge router mirror the protected edge router
   forwarding table. The repairing edge router uses a locally allocated
   context label to identify the correct mirrored table.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s)
   controlling the copyright in such materials, this document may not
   be modified outside the IETF Standards Process, and derivative
   works of it may not be created outside the IETF Standards Process,
   except to format it for publication as an RFC or to translate it
   into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on April 8, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Conventions used in this document
      1.2. Terminology
      1.3. Problem definition
   2. Overview of BGP FRR using Mirrored Forwarding Table in an MPLS
      Core
      2.1. Control Plane operation
      2.2. Forwarding behavior at Steady State (while pPE is reachable)
      2.3. Forwarding behavior when pPE Fails
   3. Overview of the BGP FRR using Mirrored Forwarding Table in IP Core
      3.1. Control plane modification for IP core
      3.2. Forwarding behavior at Steady State (while pPE is reachable)
      3.3. Forwarding plane at Failure (when pPE is unreachable)
   4. Rules for Choosing and Managing the Repair path
   5. Inter-operability with Existing IP FRR Mechanisms
   6. Example
      6.1. Control Plane
      6.2. Forwarding Plane at Steady State (When PE0 is reachable)
      6.3. Forwarding Plane at Failure (When PE0 is not reachable)
   7. Security Considerations
   8. IANA Considerations
   9. Conclusions
   10. References
      10.1. Normative References
      10.2. Informative References
   11. Acknowledgments
   Appendix A. Auto-determination of Operating Parameters on rPE and pPE
      A.1. How rPE determines the Protected PE
      A.2. How pPE Determines its rPEs and Assigns pNH for each rPE
      A.3. Detecting Mis-configuration
   Appendix B. Ensuring correct forwarding at the edge routers

1. Introduction

   In a BGP free core, where traffic is tunneled between edge routers,
   BGP speakers advertise reachability information about prefixes to
   other edge routers but not to core routers. For labeled address
   families, namely AFI/SAFI 1/4, 2/4, 1/128, and 2/128, an edge
   router assigns local labels to prefixes and associates the local
   label with each advertised prefix, as in L3VPN [11], 6PE [12], and
   Softwire [10].
   Suppose that a given edge router is chosen as the best next-hop for
   a prefix P/m by an ingress router. The ingress router that receives
   a packet from an external router and destined to the prefix P/m
   "tunnels" the packet across the core to that egress router. If the
   prefix P/m is a labeled prefix, the ingress router pushes the label
   advertised by the egress router before tunneling the packet to the
   egress router. Upon receiving the packet from the core, the egress
   router takes the appropriate forwarding decision based on the
   content of the packet or the label pushed on the packet.

   In modern networks, it is not uncommon to have a prefix reachable
   via multiple edge routers. One example is the best external path
   [9]. Another more common and widely deployed scenario is L3VPN [11]
   with multi-homed VPN sites. As an example, consider the L3VPN
   topology depicted in Figure 1.

                              PE1 .............+
                                               |
                                      +--------+---------------+
                                      |                        |
                                      |     VPN 1 Network      |
                                      |                        |
                                      |     VPN prefix         |
                                      |     (10.0.0.0/8)       |
                                      |                        |
                                      +---+--------------------+
                                          |
                                  /------CE1
                                 /
                                /
       BGP-free core   P--------PE0
                                \
                                 \
                                  \------CE2
                                          |
                                      +---+--------------------+
                                      |                        |
                                      |     VPN 2 Network      |
                                      |                        |
                                      |     VPN prefix         |
                                      |     (20.0.0.0/8)       |
                                      |                        |
                                      +--------+---------------+
                                               |
                              PE2 .............+

          Figure 1 VPN prefix reachable via multiple PEs

   As illustrated in Figure 1, the edge router PE0 is the primary NH
   for both 10.0.0.0/8 and 20.0.0.0/8. At the same time, both
   10.0.0.0/8 and 20.0.0.0/8 are reachable through the other edge
   routers PE1 and PE2, respectively. On the failure of the edge
   router PE0, it is highly desirable for the core router P to
   re-route traffic for VPN 1 and VPN 2 to PE1 and PE2, respectively,
   without waiting for IGP or BGP to re-converge.
   This document proposes a scheme by which the egress and core
   routers participate to enable a core router to re-route traffic to
   the correct backup edge router when the primary edge router fails
   while keeping the core BGP-free.

   Note that the behavior specified in this draft requires supporting
   more than one BGP path. Methods, such as [9], [17], and [18], may
   be needed to satisfy the multi-path requirement in certain
   scenarios such as the case where MED [2] or local preference [2] is
   used to determine the best path. The mechanism(s) by which a router
   supports BGP multi-path is beyond the scope of this document.

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC-2119 [1].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying RFC-2119 significance.

1.2. Terminology

   This section defines the terms used in this document. For ease of
   use, we will use terms similar to those used by L3VPN [11].

   o  BGP-Free core: A network where BGP prefixes are only known to
      the edge routers and traffic is tunneled between edge routers.

   o  External prefix: It is a prefix P/m (of any AFI/SAFI) that a BGP
      speaker has an external path for. The BGP speaker may learn
      about the prefix from an external peer through BGP, some other
      protocol, or manual configuration. The external prefix is
      advertised to some or all of the internal peers.

   o  Protectable prefix: It is an external prefix P/m (of any
      AFI/SAFI) that a BGP speaker has an external path to and is
      eligible to have a repair path.

   o  Protected prefix: It is an external prefix P/m (of any AFI/SAFI)
      that a BGP speaker has an external path to and also has a repair
      path to.

   o  Primary Egress PE, "ePE": It is an IBGP peer that can reach the
      prefix P/m through an external path and advertised the prefix to
      the other IBGP peers. The primary egress PE was chosen as the
      best path by one or more internal peers. In other words, the
      primary egress PE is an egress PE that will normally be used by
      some ingress PEs when there is no failure. Referring to Figure
      1, PE0 is an egress PE.

   o  Protected egress PE, "pPE" (Protected PE for simplicity): It is
      an egress PE for which there exists a repair path for some or
      all of the prefixes to which it has an external path. Referring
      to Figure 1, PE0 is a protected egress PE.

   o  Protected edge router: Any protected egress PE.

   o  Protected next-hop (pNH): It is an IPv4 or IPv6 host address
      belonging to the protected egress PE. Traffic tunneled to this
      IP address will be protected via the mechanism proposed in this
      document.

   o  CE: It is an external router through which an egress PE can
      reach a prefix P/m. The routers "CE1" and "CE2" in Figure 1 are
      examples of such CEs.

   o  Ingress PE, "iPE": It is a BGP speaker that learns about a
      prefix through another IBGP peer and chooses that IBGP peer as
      the next-hop for the prefix.

   o  Repairing P router "rP" (Also "Repairing core router" and
      "repairing router"): A core router that attempts to restore
      traffic when the primary egress PE is no longer reachable
      without waiting for IGP or BGP to re-converge. The repairing P
      router restores the traffic by rerouting the traffic (through a
      tunnel) towards the pre-calculated repair PE when it detects
      that the primary egress PE is no longer reachable. Referring to
      Figure 1, the router "P" is the repairing P router.

   o  Repair egress PE "rPE" (Repair PE for simplicity): It is an
      egress PE other than the primary egress PE that can reach the
      protected prefix P/m through an external neighbor. The repair PE
      is pre-calculated via other PEs prior to any failure. Referring
      to Figure 1, PE1 is the repair PE for 10.0.0.0/8 while PE2 is
      the repair PE for 20.0.0.0/8.

   o  Repair next-hop (rNH): It is an IPv4 or IPv6 address belonging
      to the repair egress PE. If the protected prefix is advertised
      via BGP, then the repair next-hop SHOULD be the next-hop
      attribute in the BGP update message [2][3].

   o  BGP nexthop (bgpNH): This is the usual next-hop attribute for
      route advertisements as specified in [2] and [3].

   o  Context Label (cL): It is an MPLS label allocated by the repair
      PE (rPE) to identify the mirrored forwarding table of the
      protected PE (pPE). An rPE must allocate a locally distinct
      context label for each mirrored forwarding table. Context labels
      on different rPEs may overlap.

   o  Repair path (Also Repair Egress Path): It is the repair next-
      hop.

   o  Primary tunnel: It is the tunnel from the ingress PE to the
      primary egress PE.

   o  Repair tunnel: It is the tunnel from the repairing P router to
      the repair egress PE.

1.3. Problem definition

   The problem that we are trying to solve is as follows:

   o  Even though multiple prefixes may share the same egress router,
      they may have different repair edge routers. On losing
      connection to the edge router, a core router "P" detecting the
      loss of connection MUST reroute traffic towards the *correct*
      repair edge router that can reach prefixes that were reachable
      via the failed edge router without waiting for IGP or BGP to
      re-converge and update the routing tables.

   o  The repairing core router P MUST NOT be forced to learn about
      the BGP prefixes on any of the edge routers. The same applies to
      all core routers.

   o  The size of the routing table on any core router MUST be
      independent of the number of BGP prefixes in the network.

   o  Rerouting traffic without waiting for IGP and BGP to re-converge
      after a failure MUST NOT introduce loops.

   o  For labeled prefixes, when a packet gets re-routed to the repair
      PE, the label stack on the packet MUST ensure correct
      forwarding.

   o  At steady state, when pPE is reachable, paths taken by traffic
      before deploying the solution proposed in this document MUST NOT
      be impacted after deploying the solution proposed in this
      document unless desired by the operator.

   o  The solution MUST be incrementally deployable.

   o  The number of nodes that need to be upgraded should be
      minimized. Hence only egress PEs that participate in the
      solution (namely pPEs and rPEs) and protecting core routers
      (namely rPs) need to be upgraded.

   Applying the problem to the topology in Figure 1 above, both
   10.0.0.0/8 and 20.0.0.0/8 share the same primary egress router PE0.
   The routing protocol(s) must identify that the repair node for
   10.0.0.0/8 is PE1 while the repair node for 20.0.0.0/8 is PE2. On
   the failure of PE0, the core router P must reroute traffic for
   10.0.0.0/8 towards PE1 and traffic for 20.0.0.0/8 towards PE2
   without requiring the core router P to know about any BGP prefix.

2. Overview of BGP FRR using Mirrored Forwarding Table in an MPLS Core

   The solution proposed in this document relies on the collaboration
   of egress PEs and the repairing core router. This section gives an
   overview of how the solution works for both labeled (AFI/SAFI 1/4,
   2/4, 1/128, and 2/128) and unlabeled (AFI/SAFI 1/1, 2/1, 1/2, and
   2/2) protected prefixes in a core where the tunnels between edge
   routers are LDP LSPs [7]. Specifications of the solution in an IP
   core are provided in Section 3.

2.1. Control Plane operation

   The control plane requires certain operating parameters to be
   assigned. This section explains how the parameters are assigned
   through configuration. Automatic determination of the operating
   parameters is explained in Appendix A.

   1. Setting the Operating parameters on pPE

      a. Suppose the protectable prefixes on a given pPE are protected
         by the repair edge routers rPE1, rPE2,...

      b. For the set of prefixes protected by a given rPE, assign a
         distinct local next-hop pNH. The pNH is also advertised as
         the bgpNH when the pPE advertises the prefixes to other iBGP
         peers. This section assumes that pNH is assigned via
         configuration. pNH can be automatically calculated as
         described in Appendix A.

      c. pNH MUST be unique within a routing domain.

      d. Because pNH is also used as bgpNH, pNH MUST be advertised
         into IGP as usual.

   2. Setting the Operating parameters on the rPE

      a. Suppose the rPE can protect prefixes whose bgpNH is pNH1,
         pNH2,...

      b. The operator informs rPE about the BGP next-hops that it can
         protect. This task can be carried out through configuration.
         Appendix A outlines how rPE can automatically determine the
         BGP next-hops it can protect.

      c. rPE performs the following tasks for each pNH

         i.   rPE allocates a "locally" distinct context label "cL"
              for each pNH that the rPE can protect.

         ii.  rPE advertises "pNH" as its own prefix into IGP but with
              (maximum metric - 1) so as not to affect the path taken
              by the traffic flowing from iPEs to pPEs.

         iii. rPE advertises "cL" for pNH instead of implicit NULL to
              its neighboring LSRs. As explained in Appendix B, this
              behavior is necessary to ensure correct forwarding
              during the period starting from complete disconnect of
              pPE till all iPEs stop using pPE as an exit point for
              BGP traffic.

         iv.  rPE allocates a separate "mirror" forwarding table for
              each pNH.
              The mirror forwarding table consists of a mirror IP
              table and a corresponding label table. The mirror table
              is identified by the context label "cL".

         v.   rPE assigns a local IP address rNH as the repair
              next-hop. rNH may be any local IP address on the rPE.
              "rNH" SHOULD be a next-hop attribute advertised by rPE
              when it announces reachability to the protected prefix
              P/m, so as to minimize the number of prefixes advertised
              into IGP.

         vi.  rPE advertises the triplet (pNH,rNH,cL) to candidate
              repairing core routers. The syntax is TBD. For example,
              an LDP optional TLV can be used for this purpose.

      d. Remember that pNH1, pNH2,... are advertised as the BGP
         next-hop by pPEs. When rPE receives a prefix advertisement
         from an iBGP peer with bgpNH equal to one of the pNHs it can
         protect AND rPE has at least one "external" path for the
         received prefix:

         i.   If the prefix is labeled (AFI/SAFI 1/4, 2/4, 1/128, and
              2/128), insert the received label into the mirror label
              table corresponding to the pNH.

         ii.  If the prefix is unlabeled (AFI/SAFI 1/1, 2/1, 1/2, and
              2/2), insert the prefix into the mirror IP table
              corresponding to the pNH.

         iii. The forwarding entry of the prefix or the label in the
              mirror table is to either send the packet to (one of)
              the external path(s) or drop the packet.

         iv.  Remember that the external path may or may not be the
              best path. For example, if MED is used to decide the
              best path and the best path happened to be the internal
              path, then techniques such as [9], [17], [18], and [20]
              are needed to calculate and advertise (an) alternative
              external path(s).

   3. Determining the Operating Parameters on Protecting Core router
      "rP"

      a. rP receives the triplet (pNH,rNH,cL) from rPE.

      b. rP installs the following entry for pNH in its forwarding
         table

         i.   If pNH is reachable, forward the packet as usual.

         ii.  If pNH is not reachable

              1. Swap the label bound to pNH with "cL"

              2. Tunnel the traffic towards rNH

   4. Operating parameters on the rest of the routers

      a. Other than pPE, rPE, and rP, the rest of the routers can
         remain totally agnostic to the BGP FRR scheme proposed in
         this document.

      b. Because rPE advertises pNH with (maximum-metric - 1), all the
         routers will prefer pPE when sending traffic to the IP
         address pNH. Hence as long as pPE is reachable, there is no
         change in traffic patterns.

2.2. Forwarding behavior at Steady State (while pPE is reachable)

   When pPE is reachable, there is no change in behavior due to
   deploying the scheme proposed in this document.

2.3. Forwarding behavior when pPE Fails

   The repairing router "rP" directly connected to a failure detects
   that pNH is no longer reachable. The following steps are applied.

   1. Repairing router "rP"

      a. Receives a packet with the top label bound to pNH

      b. Determines that pNH is not reachable

      c. Swaps the label of pNH with the context label cL received in
         the triplet (pNH,rNH,cL) from rPE

      d. Pushes the label corresponding to rNH

      e. Sends the packet towards rNH

   2. Penultimate hop of rPE performs the usual penultimate hop
      popping

   3. Repair PE (rPE)

      a. Because its penultimate hop performed penultimate hop
         popping, rPE receives a packet with the top label being the
         context label "cL"

      b. rPE uses "cL" to identify the correct mirror forwarding table

      c. rPE pops the context label "cL"

      d. If the packet underneath "cL" is labeled, rPE looks up the
         top label in the mirror label table corresponding to cL

      e. If the packet underneath "cL" is unlabeled, rPE looks up the
         destination address of the packet in the mirror IP table
         corresponding to cL

      f. rPE forwards the packet to an external neighbor or drops it
         based on the mirror table lookup

   4. Ingress PEs (iPEs)

      a. An ingress PE that has not yet learnt about the disappearance
         of pPE will continue to send traffic towards pNH. This
         traffic will be re-routed towards rPE by rP and forwarded
         correctly.

      b. An ingress PE that learns about the disappearance of pPE will
         calculate a new best path for traffic previously destined to
         pNH.

   5. The rest of the core routers

      a. A core router that has not yet learnt that pPE is no longer
         reachable will continue to send traffic destined to pNH
         towards pPE. This traffic will be intercepted by rP and
         re-routed towards rPE.

      b. A core router that has learnt that pPE is no longer reachable
         will send traffic towards rPE because rPE advertises pNH with
         (maximum-metric - 1).

         i.   Because rPE advertises the label "cL" for pNH instead of
              the usual implicit NULL, a packet originally destined
              towards pPE that gets re-routed towards rPE will arrive
              at rPE with "cL" at the top.

         ii.  Hence rPE will process it as described in step 3.

      c. Eventually all iPEs learn that pPE is unreachable and hence
         no traffic will be sent to any of the pNHs advertised by the
         pPE that has just disappeared.

   The next section presents the solution in an IP core.

3. Overview of the BGP FRR using Mirrored Forwarding Table in IP Core

   This section describes the BGP FRR using mirrored tables solution
   in an IP core for both labeled (AFI/SAFI 1/4, 2/4, 1/128, and
   2/128) and unlabeled (AFI/SAFI 1/1, 2/1, 1/2, and 2/2) protected
   prefixes.

   The primary difference between an MPLS core and an IP core is that
   the tunnels between edge routers are IP based such as [5][6][7]. We
   assume that rP is capable of handling MPLS labels.

3.1. Control plane modification for IP core

   When using IP tunnels instead of MPLS tunnels between edge routers,
   there is one small modification at the repair edge router rPE. For
   the MPLS core, the correct mirror table at rPE is identified by the
   context label "cL".
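
   The role of the context label in the MPLS core can be illustrated
   with a minimal sketch of the failure-time forwarding of Section
   2.3. The sketch is only an illustration under stated assumptions:
   the class and function names are hypothetical, the LDP label
   values 16, 24, and 25 are made up, only labeled payloads are
   modelled, and the addresses, context label, and VPN labels are
   borrowed from the example in Section 6.

```python
from dataclasses import dataclass, field


@dataclass
class RepairingP:
    """Hypothetical rP state: per next-hop only, no BGP prefixes."""
    in_label_to_nh: dict   # incoming LDP label -> next-hop address
    out_label: dict        # next-hop address -> outgoing LDP label
    repair: dict           # pNH -> (rNH, cL), learned from rPE
    reachable: set = field(default_factory=set)

    def forward(self, stack):
        """Return (tunnel endpoint, new label stack) for a labeled packet."""
        top, rest = stack[0], list(stack[1:])
        nh = self.in_label_to_nh[top]
        if nh in self.reachable:
            # Steady state: ordinary label swap (PHP at rP is not modelled).
            return nh, [self.out_label[nh]] + rest
        # pNH unreachable: swap its label with cL, then push the label of rNH.
        rnh, cl = self.repair[nh]
        return rnh, [self.out_label[rnh], cl] + rest


@dataclass
class RepairPE:
    """Hypothetical rPE state: one mirror label table per context label."""
    mirror_label_tables: dict  # cL -> {label advertised by pPE -> exit}

    def receive(self, stack):
        """Handle a packet that arrives with cL on top (after PHP)."""
        if len(stack) < 2:
            return "drop"  # the mirror IP table is not modelled here
        cl, inner = stack[0], stack[1]
        mirror = self.mirror_label_tables.get(cl, {})
        return mirror.get(inner, "drop")


def penultimate_hop_pop(stack):
    """The LSR in front of rPE removes the LDP label bound to rNH."""
    return list(stack[1:])


# Assumed values: pNH 1.1.1.1 protected by rNH 9.9.9.1 with cL 1100.
rp = RepairingP(
    in_label_to_nh={16: "1.1.1.1"},
    out_label={"1.1.1.1": 24, "9.9.9.1": 25},
    repair={"1.1.1.1": ("9.9.9.1", 1100)},
    reachable={"9.9.9.1"},          # pPE has failed
)
rpe = RepairPE(mirror_label_tables={1100: {4100: "CE1", 4200: "CE1"}})

endpoint, stack = rp.forward([16, 4100])
print(endpoint, stack, rpe.receive(penultimate_hop_pop(stack)))
```

   Note that in this sketch rP holds one entry per protected next-hop
   and never inspects the VPN label, which is consistent with the
   requirement that core routers stay free of BGP prefixes.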

   For the IP core, the correct mirror table must be identified by
   either the context label "cL" or the protected next-hop "pNH". As
   explained in Appendix B, this behavior is necessary to ensure
   correct forwarding during the period starting from complete
   disconnect of pPE till all iPEs stop using pPE as an exit point for
   BGP traffic.

3.2. Forwarding behavior at Steady State (while pPE is reachable)

   When pPE is reachable, there is no change in behavior due to
   deploying the scheme proposed in this document.

3.3. Forwarding plane at Failure (when pPE is unreachable)

   1. iPE is not yet aware of the failure so its behavior remains the
      same

   2. rP

      a. Decapsulates the tunnel header towards pNH

      b. Pushes the context label "cL"

      c. Encapsulates the packet into a tunnel header with destination
         address rNH and forwards the packet towards rPE

   3. rPE

      a. If the tunnel packet arrives with destination address "rNH"

         i.   Decapsulates the tunnel header. This exposes the context
              label "cL"

      b. Otherwise (i.e. the destination address is "pNH")

         i.   Decapsulates the tunnel header and associates the
              exposed packet with the mirror table based on "pNH"

      c. The rest of the behavior is identical to the MPLS core
         outlined in Section 2.3.

4. Rules for Choosing and Managing the Repair path

   This section specifies rules governing how the repair path is
   chosen and installed in the forwarding plane. Other than the rules
   in this section, the method of choosing the repair path is beyond
   the scope of this document.

   1. A repair PE MUST be another edge router that advertises the same
      prefix to the protected edge router pPE via IBGP peering.

   2. If a repairing core router "rP" determines that the path taken
      by the repair tunnel to a repair edge router rPE passes through
      the protected edge router pPE, then the repairing router "rP"
      MUST NOT install this repair path in its forwarding plane.
      Instead, the repairing router "rP" MAY use other paths that do
      not pass through pPE or use existing core FRR mechanisms such as
      [13], [14], and [15].

   3. If the repair PE "rPE" advertises one or more protected
      next-hops, then the repair next-hop "rNH" MUST be different from
      any protected next-hop "pNH" advertised by rPE.

   If rules (1) and (2) are not applied, then the tunnel to the repair
   edge router rPE does not provide protection against the failure of
   the edge node pPE. Rule (3) ensures that there is no ambiguity
   about the primary and repair next-hops.

5. Inter-operability with Existing IP FRR Mechanisms

   Existing IP FRR mechanisms can be divided into two categories: core
   protection and edge protection. Core protection techniques, such as
   [13], [14], and [15], provide protection against internal node
   and/or link failure. The technique proposed in this document
   addresses a different failure, namely that of an edge node, and is
   therefore not related to these mechanisms. If the failure of an
   internal node or link results in completely disconnecting a
   protectable edge node, then an administrator MAY configure the
   repairing router to prefer the technique proposed in this document
   over existing IP FRR mechanisms.

   Edge protection techniques, such as [16], provide protection
   against the failure of the link between PE and CE routers. Thus
   existing PE-CE link protection can co-exist with the techniques
   proposed in this document because the two techniques are
   independent of each other.

6. Example

   We will use an LDP core as an example. Consider the diagram
   depicted in Figure 2 below.
642 +-----------------------------------+ 643 | | 644 | LDP Core | 645 | | 646 | PE1 Lo = 9.9.9.1 647 | |\ 648 | | \ 649 | | \ 650 | | \ 651 | | CE1.......VRF "Blue" 652 | | / (10.0.0.0/8) 653 | | / (11.0.0.0/8) 654 | | / 655 | |/ 656 PE11 P--------PE0 Lo1 = 1.1.1.1/32 657 | |\ Lo2 = 1.1.1.2/32 658 | | \ 659 | | \ 660 | | \ 661 | | CE2.......VRF "Red" 662 | | / (20.0.0.0/8) 663 | | / (21.0.0.0/8) 664 | | / 665 | |/ 666 | PE2 Lo = 9.9.9.2 667 | | 668 | | 669 +-----------------------------------+ 670 Figure 2 : Edge node BGP FRR in LDP core 672 o In Figure 2, PE0 is the pPE for VRFs "Blue" and "Red". PE1 and PE2 673 are the rPEs for VRFs "Blue" and "Red", respectively. VRF Blue has 674 10.0.0.0/8 and 11.0.0.0/8 and VRF Red has 20.0.0.0/8 and 675 21.0.0.0/8 677 o Assuming PE0 uses per prefix label allocation, PE0 assigns the VPN 678 labels 4100, 4200, 4300, and 4400 to 10.0.0.0/8, 11.0.0.0/8, 679 20.0.0.0/8, and 21.0.0.0/8, respectively. PE0 advertises the 680 prefixes 10.0.0.0/8, 11.0.0.0/8, 20.0.0.0/8, and 21.0.0.0/8 using 681 MP/BGP as usual 682 6.1. Control Plane 684 1. Configuring the pNHs on PE0 686 The operator assigns 1.1.1.1 (the IP address of Loopback0) as the 687 bgpNH for prefixes belonging to vrf "Blue" and 1.1.1.2 (The IP 688 address of Loopback1) as the bgpNH for prefixes belonging to vrf 689 "Red" 691 2. Configuring protection parameters on rPEs 693 a. The operator informs PE1 that it can protect all traffic with 694 bgpNH=1.1.1.1. Accordingly 696 i. PE1 advertises 1.1.1.1 with (maximum-metric - 1) into IGP 698 ii. PE1 allocates a distinct mirror table for prefixes with 699 bgpNH=1.1.1.1 701 iii. PE1 allocates the context label cL=1100 for the mirror 702 table of bgpNH=1.1.1.1 704 iv. When advertising the FEC 1.1.1.1 to its neighboring LSRs, 705 PE1 associates the label 1100 707 v. PE2 advertises the mapping (1.1.1.1, 9.9.9.1, 1100) to 708 candidate repair router 710 vi. 
When PE1 receives a prefix advertisement from any peer
711               with bgpNH=1.1.1.1, PE1 inserts the VPN labels in the
712               mirror table identified by cL=1100. Hence PE1 inserts the
713               VPN labels 4100 and 4200 in the mirror table. The
714               forwarding entry for each label is to either pop the
715               label and send the packet to an external neighbor or drop
716               the packet.

718        b. The operator informs PE2 that it can protect all traffic with
719           bgpNH=1.1.1.2. Accordingly:

721           i. PE2 advertises 1.1.1.2 with (maximum-metric - 1) into the IGP.

723           ii. PE2 allocates a distinct mirror table for prefixes with
724               bgpNH=1.1.1.2.

726           iii. PE2 allocates the context label cL=1200 for the mirror
727                table of bgpNH=1.1.1.2.

729           iv. When advertising the FEC 1.1.1.2 to its neighboring LSRs,
730               PE2 associates the label 1200.

732           v. PE2 advertises the mapping (1.1.1.2, 9.9.9.2, 1200) to
733              candidate repair routers.

735           vi. When PE2 receives a prefix advertisement from any peer
736               with bgpNH=1.1.1.2, PE2 inserts the VPN labels into the mirror
737               table identified by cL=1200. Hence PE2 inserts the VPN
738               labels 4300 and 4400 in the mirror table. The forwarding
739               entry for each label is to either pop the label and
740               send the packet to an external neighbor or drop the packet.

742     3. Enabling BGP FRR on the penultimate hop router "P"

744        a. If not enabled by default, the operator enables edge node
745           protection on the router "P".

747        b. Acting as an rP, the core router "P" receives the advertisements
748           (bgpNH,rNH,cL)=(1.1.1.1, 9.9.9.1, 1100) and
749           (bgpNH,rNH,cL)=(1.1.1.2, 9.9.9.2, 1200) from PE1 and PE2,
750           respectively.

752        c. "rP" creates the following forwarding state for 1.1.1.1 and
753           1.1.1.2:

755           i. If 1.1.1.1 is not reachable:

757              1. Push the context label 1100.

759              2. Send the packet through the LSP terminating on
760                 9.9.9.1.

762           ii. If 1.1.1.2 is not reachable:

764              1. Push the context label 1200.

766              2. Send the packet through the LSP terminating on
767                 9.9.9.2.

769  6.2.
Forwarding Plane at Steady State (When PE0 is reachable)

771     There is no change in forwarding behavior when PE0 is reachable.

773  6.3. Forwarding Plane at Failure (When PE0 is not reachable)

775     1. Repairing core router "P"

777        a. Traffic for VRF "Blue"
778           i. Receive a packet whose top label is the LDP label for
779              1.1.1.1.

781           ii. Determine that 1.1.1.1 is not reachable.

783           iii. Pop the LDP label of 1.1.1.1.

785           iv. Push the context label 1100.

787           v. Push the LDP label for 9.9.9.1 and forward the packet
788              towards PE1.

790        b. Traffic for VRF "Red"

792           i. Receive a packet whose top label is the LDP label for
793              1.1.1.2.

795           ii. Determine that 1.1.1.2 is not reachable.

797           iii. Pop the LDP label of 1.1.1.2.

799           iv. Push the context label 1200.

801           v. Push the LDP label for 9.9.9.2 and forward the packet
802              towards PE2.

804     2. The repair router "PE1"

806        a. The penultimate hop of PE1 performs the usual penultimate hop
807           popping.

809        b. PE1 receives a packet whose top label is the context label
810           1100.

812        c. PE1 looks up 1100 in its label table. The lookup yields
813           the mirror table of bgpNH=1.1.1.1.

815        d. PE1 pops cL=1100. This exposes the VPN label 4100 or 4200.

817        e. PE1 looks up the VPN label 4100 or 4200 in the mirror table
818           corresponding to cL=1100. The lookup results in popping the VPN
819           label 4100 or 4200 and forwarding the packet natively to CE1.

821     3. The repair router "PE2"

823        a. The penultimate hop of PE2 performs the usual penultimate hop
824           popping.

826        b. PE2 receives a packet whose top label is the context label
827           1200.

829        c. PE2 looks up 1200 in its label table. The lookup yields
830           the mirror table of bgpNH=1.1.1.2.

832        d. PE2 pops cL=1200. This exposes the VPN label 4300 or 4400.

834        e. PE2 looks up the VPN label 4300 or 4400 in the mirror table
835           corresponding to cL=1200. The lookup results in popping the VPN
836           label 4300 or 4400 and forwarding the packet natively to CE2.

838  7.
Security Considerations

840     No additional security risk is introduced by using the mechanisms
841     proposed in this document.

843  8. IANA Considerations

845     This document has no requirements for IANA.

847  9. Conclusions

849     This document proposes a method that allows fast re-route
850     protection against edge node failure or complete disconnection from
851     the core in a BGP-free core. The method does not require support of
852     LFA FRR [13][14][15], and most of the provisioning effort can be
853     automated at the expense of the possible need to re-advertise
854     prefixes as described in Appendix A.

856  10. References

858  10.1. Normative References

860     [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
861         Levels", BCP 14, RFC 2119, March 1997.

863     [2] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4
864         (BGP-4)", RFC 4271, January 2006.

866     [3] Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
867         "Multiprotocol Extensions for BGP-4", RFC 4760, January 2007.

869     [4] Mohapatra, P. and E. Rosen, "The BGP Encapsulation Subsequent
870         Address Family Identifier (SAFI) and the BGP Tunnel
871         Encapsulation Attribute", RFC 5512, April 2009.

873     [5] Lau, J., Ed., Townsley, M., Ed., and I. Goyret, Ed., "Layer Two
874         Tunneling Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005.

876     [6] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina,
877         "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000.

879     [7] Andersson, L., Minei, I., and B. Thomas, "LDP Specification",
880         RFC 5036, October 2007.

882     [8] Perkins, C., "IP Encapsulation within IP", RFC 2003, October
883         1996.

885  10.2. Informative References

887     [9] Marques, P., Fernando, R., Chen, E., Mohapatra, P., and H. Gredler,
888         "Advertisement of the best external route in BGP",
889         draft-ietf-idr-best-external-04.txt, April 2011.

891     [10] Wu, J., Cui, Y., Metz, C., and E. Rosen, "Softwire Mesh
892          Framework", RFC 5565, June 2009.

894     [11] Rosen, E. and Y.
Rekhter, "BGP/MPLS IP Virtual Private Networks
895          (VPNs)", RFC 4364, February 2006.

897     [12] De Clercq, J., Ooms, D., Prevost, S., and F. Le Faucheur,
898          "Connecting IPv6 Islands over IPv4 MPLS Using IPv6 Provider
899          Edge Routers (6PE)", RFC 4798, February 2007.

901     [13] Atlas, A. and A. Zinin, "Basic Specification for IP Fast
902          Reroute: Loop-Free Alternates", RFC 5286, September 2008.

904     [14] Shand, M. and S. Bryant, "IP Fast Reroute Framework", RFC 5714,
905          January 2010.

907     [15] Shand, M. and S. Bryant, "A Framework for Loop-Free
908          Convergence", RFC 5715, January 2010.

910     [16] Bonaventure, O., Filsfils, C., and P. Francois, "Achieving sub-50
911          milliseconds recovery upon BGP peering link failures",
912          IEEE/ACM Transactions on Networking, 15(5):1123-1135, 2007.

914     [17] Walton, D., Chen, E., Retana, A., and J. Scudder, "Advertisement
915          of Multiple Paths in BGP", draft-ietf-idr-add-paths-07.txt, June
916          2012.

918     [18] Raszuk, R., Fernando, R., Patel, K., McPherson, D., and K. Kumaki,
919          "Distribution of diverse BGP paths",
920          draft-ietf-grow-diverse-bgp-path-dist-08.txt, July 2012.

922     [19] Bates, T., Chen, E., and R. Chandra, "BGP Route Reflection: An
923          Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456, April
924          2006.

926     [20] Mohapatra, P., Fernando, R., Filsfils, C., and R. Raszuk, "Fast
927          Connectivity Restoration Using BGP Add-path",
928          draft-pmohapat-idr-fast-conn-restore-02, October 2011.

930  11. Acknowledgments

932     Special thanks to Clarence Filsfils, Eric Rosen, Stewart Bryant,
933     and Pradosh Mohapatra for their valuable comments.

935     This document was prepared using 2-Word-v2.0.template.dot.

937  Appendix A. Auto-determination of Operating Parameters on rPE and pPE

939     The main provisioning effort as outlined in Section 2 is the
940     assignment of a domain-wide distinct pNH for each rPE-pPE pair and
941     configuring the pNH on the correct pPE and rPE.
This appendix outlines
942     a method by which the assignment of pNH to rPE on a given pPE is
943     automated, thereby eliminating the need for any operator intervention
944     except for configuring the range of IP addresses from which pNHs are
945     taken. The automation comes at the expense of the need to
946     re-advertise BGP prefixes under certain conditions, as outlined below
947     in this appendix.

949     The objective of the automation is to:

951     o Let the rPE determine which pPEs it can protect and hence
952       assign a local context label "cL" for each pPE and mirror the
953       portion of the pPE routing table that rPE can protect. (Remember
954       that rPE can protect a prefix advertised by pPE if rPE has an
955       external path for that prefix.)

957     o Let the pPE determine which PEs can act as rPEs for some or all
958       of its prefixes and hence automatically assign a pNH for each
959       distinct rPE out of a preconfigured range of IP addresses.

961     When PEs peer directly with each other, it is easy to determine the
962     router ID of the advertising router. In the presence of a route
963     reflector [19], it is not possible to directly determine the router
964     ID of the advertising PE. Hence we introduce the "RID-attr" optional
965     non-transitive attribute. The actual format of the "RID-attr"
966     attribute is TBD. It contains the router ID of the advertising PE.

968     Each PE MUST have a distinct router ID within a routing domain.
969     "RID-attr" MUST be advertised with each protectable prefix.
970  A.1. How rPE Determines the Protected PE

972     Assuming that the "RID-attr" is advertised as an optional attribute
973     with all protectable prefixes, the rPE applies the following steps to
974     determine the pPE:
975     1. rPE receives route advertisements from a peer, and each
976        advertisement includes the peer's RID in the optional attribute
977        "RID-attr".

979     2.
If rPE has an external path for some or all of the received route
980        advertisements and rPE advertises some or all of these routes
981        (as best paths or otherwise, such as [9], [17], and
982        [18]), then it considers the peer as a pPE.

984        a. rPE allocates a distinct context label "cL" for the pPE.

986        b. rPE always advertises the mapping cL-->RID to all peers.
987           The syntax is TBD, but a method similar to
988           advertising tunnel information [4] can be used.

990     3. If rPE loses all external paths for all prefixes from the peer
991        identified by "RID", then rPE withdraws the mapping "cL-->RID".

993     4. If rPE cannot protect all routes advertised by the pPE but can
994        protect some of them, then rPE re-advertises the protectable
995        prefixes it previously advertised but attaches the context label
996        "cL" as a non-transitive optional attribute. The syntax of "cL" is
997        TBD. This is one of the cases where prefixes previously advertised
998        need to be re-advertised.

1000    5. rPE creates a mirror table for pPE. If rPE can protect a route
1001       received from pPE, then rPE mirrors that route into the mirror
1002       table for pPE.

1004 A.2. How pPE Determines its rPEs and Assigns pNH for each rPE

1006    1. When pPE receives the mapping cL-->RID where RID is the router ID
1007       of the pPE, pPE assumes the router that advertised the mapping
1008       cL-->RID is an rPE.

1010    2. pPE allocates a distinct pNH for the rPE.
1011    3. The next step is for pPE to re-advertise some or all of its
1012       prefixes but use the pNH assigned to rPE as bgpNH. Let
1013       {P1/m1,...,Pk/mk} be the set of prefixes that rPE advertises to its
1014       peers (as best paths or otherwise, such as [9], [17], and [18])
1015       and, at the same time, pPE advertises as reachable prefixes in
1016       the NLRI field. There are two cases:

1018       a.
Case 1: rPE advertises the mapping cL-->RID but rPE does not
1019          associate the context label "cL" as an optional attribute with
1020          any prefix in {P1/m1,...,Pk/mk}.

1022       b. Case 2: rPE advertises the mapping cL-->RID and rPE associates
1023          "cL" as an optional attribute with a *subset* of the prefixes
1024          {P1/m1,...,Pk/mk}.

1026    4. Case 1: rPE does not associate the context label "cL" with
1027       advertised prefixes. In that case, pPE assumes that rPE can
1028       protect all of the prefixes {P1/m1,...,Pk/mk}. Hence pPE
1029       re-advertises {P1/m1,...,Pk/mk} using the pNH assigned for the rPE
1030       as bgpNH.

1032    5. Case 2: rPE associates "cL" with a *subset* of {P1/m1,...,Pk/mk}.
1033       In that case, pPE assumes the rPE can only protect the subset of
1034       {P1/m1,...,Pk/mk} that has "cL". Hence pPE re-advertises this
1035       subset but uses the pNH assigned for the rPE as bgpNH.

1037    6. Cases 1 and 2 are the second situation in which prefixes previously
1038       advertised are re-advertised without any topology changes.

1040 A.3. Detecting Misconfiguration

1042    The auto assignment of pNH described in this appendix still requires
1043    the operator to configure a range of IP addresses from which a pPE
1044    allocates the protected next-hops "pNH". Because the pNHs allocated by
1045    two different pPEs MUST NOT be identical, the ranges of IP
1046    addresses on two different pPEs MUST NOT overlap. Hence the only
1047    possible misconfiguration is configuring overlapping IP ranges on two
1048    different pPEs. This section describes how such misconfiguration can
1049    be detected. Suppose pPE1 and pPE2 were configured with overlapping
1050    IP ranges. Such misconfiguration can be detected as follows:

1052    1. Because in case of misconfiguration the IP ranges on pPE1 and pPE2
1053       overlap, at some point in time pPE1 will allocate a pNH that
1054       falls within the IP range configured on pPE2.

1056    2.
As described in Section A.2, pPE1 re-advertises some or all of its
1057       prefixes and uses the allocated pNH as the bgpNH attribute.

1059    3. When pPE2 receives an advertisement from another peer containing a
1060       bgpNH within pPE2's configured IP range, pPE2 detects the
1061       misconfiguration.

1063 Appendix B. Ensuring Correct Forwarding at the Edge Routers

1065    As mentioned in Section 2, both rPE and pPE advertise the protected
1066    next-hop "pNH" in the core. To ensure no impact on traffic
1067    engineering, rPE advertises "pNH" with (max-metric - 1). When the
1068    primary edge router pPE becomes totally disconnected from the core,
1069    some core routers may start to forward traffic originally destined to
1070    pPE to rPE. Thus it is possible that traffic originally destined to
1071    pPE arrives at rPE without "cL" appearing at the top of the label
1072    stack. The behavior explained in Section 2 for MPLS core and Section
1073    3 for IP core ensures that traffic is forwarded correctly when
1074    arriving at rPE.

1076    In an MPLS core, the rPE advertises the label "cL" for pNH. Hence
1077    traffic originally destined for pNH and re-routed by a core router
1078    towards rPE will arrive at rPE with "cL" at the top. Hence rPE can
1079    identify the correct mirror table and forward the packet
1080    correctly.

1082    In an IP core, rPE associates the IP address "pNH" with the mirror
1083    table. Hence if a core router re-routes traffic originally tunneled
1084    towards pPE to rPE, the tunnel packets arrive at rPE with the
1085    destination address "pNH".
This allows rPE to identify the correct
1086    mirror table and forward the packet correctly.

1088 Authors' Addresses

1090    Ahmed Bashandy
1091    Cisco Systems
1092    170 West Tasman Dr, San Jose, CA 95134
1093    Email: bashandy@cisco.com

1095    Maciek Konstantynowicz
1096    Cisco Systems
1097    170 West Tasman Dr, San Jose, CA 95134
1098    Email: mkonstan@cisco.com

1100    Nagendra Kumar
1101    Cisco Systems
1102    170 West Tasman Dr, San Jose, CA 95134
1103    Email: naikumar@cisco.com
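
The forwarding behavior walked through in the Section 6 example can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the class and method names (RepairingP, RepairPE, forward) are hypothetical, the LDP transport label is modelled only by the address of the tunnel endpoint because the example gives no LDP label values, and penultimate hop popping is assumed to have removed that label before the packet reaches the rPE. Addresses and label values are the ones used in Section 6.

```python
from dataclasses import dataclass, field


@dataclass
class RepairingP:
    """Hypothetical model of the repairing core router rP ("P" in Figure 2)."""
    # bgpNH -> (rNH, cL), as learned from the rPE advertisements
    repair_state: dict = field(default_factory=dict)
    # bgpNHs that are currently reachable
    reachable: set = field(default_factory=set)

    def forward(self, bgp_nh, vpn_label):
        """Return (tunnel endpoint, label stack carried below the LDP label)."""
        if bgp_nh in self.reachable:
            # Steady state (Section 6.2): forwarding is unchanged.
            return bgp_nh, [vpn_label]
        # Failure (Section 6.3): push cL and tunnel towards rNH.
        r_nh, c_l = self.repair_state[bgp_nh]
        return r_nh, [c_l, vpn_label]


@dataclass
class RepairPE:
    """Hypothetical model of an rPE with one mirror table per context label."""
    # cL -> {VPN label -> outgoing CE}
    mirror_tables: dict = field(default_factory=dict)

    def forward(self, stack):
        """Handle a packet after PHP; stack[0] is expected to be a context label."""
        c_l, vpn_label = stack[0], stack[1]
        mirror = self.mirror_tables.get(c_l, {})
        # Unknown context label or VPN label: drop the packet.
        return mirror.get(vpn_label, "drop")


# State built by the control plane steps of Section 6.1.
p = RepairingP(
    repair_state={"1.1.1.1": ("9.9.9.1", 1100), "1.1.1.2": ("9.9.9.2", 1200)},
    reachable={"1.1.1.1", "1.1.1.2"},
)
pe1 = RepairPE(mirror_tables={1100: {4100: "CE1", 4200: "CE1"}})
pe2 = RepairPE(mirror_tables={1200: {4300: "CE2", 4400: "CE2"}})

steady = p.forward("1.1.1.1", 4100)          # PE0 still reachable
p.reachable.clear()                          # PE0 fails: both pNHs unreachable
blue_endpoint, blue_stack = p.forward("1.1.1.1", 4100)
red_endpoint, red_stack = p.forward("1.1.1.2", 4400)
blue_result = pe1.forward(blue_stack)        # VRF "Blue" traffic repaired via PE1
red_result = pe2.forward(red_stack)          # VRF "Red" traffic repaired via PE2
```

Keying the mirror tables by context label keeps the VPN labels allocated by PE0 in a label space separate from the rPE's own labels, which is why the rPE can look up 4100 or 4300 without any coordination of label values between PEs.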