idnits 2.17.1 

draft-bashandy-bgp-edge-node-frr-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest one
     being 9 characters in excess of 72.

  == There are 32 instances of lines with non-RFC6890-compliant IPv4
     addresses in the document.  If these are example addresses, they should
     be changed.

  == There are 14 instances of lines with private range IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer. 
     Otherwise, the disclaimer is needed and you can ignore this comment. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 16, 2012) is 4300 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '5' is defined on line 943, but no explicit reference
     was found in the text

  ** Obsolete normative reference: RFC 5512 (ref. '4') (Obsoleted by RFC 9012)

  == Outdated reference: A later version (-05) exists of
     draft-ietf-idr-best-external-04

  == Outdated reference: A later version (-04) exists of
     draft-bashandy-idr-bgp-repair-label-02


     Summary: 2 errors (**), 0 flaws (~~), 7 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                       A. Bashandy
2	Internet Draft                                             B. Pithawala
3	Intended status: Standards Track                               K. Patel
4	Expires: January 2013                                     Cisco Systems
5	                                                          July 16, 2012

7	          Scalable BGP FRR Protection against Edge Node Failure
8	                 draft-bashandy-bgp-edge-node-frr-03.txt

10	Abstract

12	Consider a BGP free core scenario. Suppose the edge BGP speakers PE1,
13	PE2,..., PEn know about a prefix P/m via the external routers CE1,
14	CE2,..., CEm.  If the edge router PEi crashes or becomes totally
15	disconnected from the core, it is desirable for a core router "P"
16	carrying traffic to the failed edge router PEi to immediately restore
17	traffic by re-tunneling packets originally tunneled to PEi and
18	destined to the prefix P/m to one of the other edge routers that
19	advertised P/m, say PEj, until BGP re-converges. In doing so, it is
20	highly desirable to keep the core BGP-free while not imposing
21	restrictions on external connectivity. Thus (1) a core router should
22	not be required to learn any BGP prefix, (2) the size of the
23	forwarding and routing tables in the core routers should be
24	independent of the number of BGP prefixes, (3) provisioning overhead
25	should be kept to a minimum, (4) re-routing traffic without waiting
26	for re-convergence must not cause loops, and (5) there should be no
27	restrictions on which edge routers advertise which prefixes. For
28	labeled prefixes, (6) the label stack on the packet must allow the
29	repair PEj to correctly forward the packet and (7) there must not be
30	any need to perform more than one label lookup on any edge or core
31	router during steady state.

33	Status of this Memo

35	   This Internet-Draft is submitted in full conformance with the
36	   provisions of BCP 78 and BCP 79.

38	   This document may contain material from IETF Documents or IETF
39	   Contributions published or made publicly available before November
40	   10, 2008. The person(s) controlling the copyright in some of this
41	   material may not have granted the IETF Trust the right to allow
42	   modifications of such material outside the IETF Standards Process.
43	   Without obtaining an adequate license from the person(s)
44	   controlling the copyright in such materials, this document may not
45	   be modified outside the IETF Standards Process, and derivative
46	   works of it may not be created outside the IETF Standards Process,
47	   except to format it for publication as an RFC or to translate it
48	   into languages other than English.

50	   Internet-Drafts are working documents of the Internet Engineering
51	   Task Force (IETF), its areas, and its working groups.  Note that
52	   other groups may also distribute working documents as Internet-
53	   Drafts.

55	   Internet-Drafts are draft documents valid for a maximum of six
56	   months and may be updated, replaced, or obsoleted by other
57	   documents at any time.  It is inappropriate to use Internet-Drafts
58	   as reference material or to cite them other than as "work in
59	   progress."

61	   The list of current Internet-Drafts can be accessed at
62	   http://www.ietf.org/ietf/1id-abstracts.txt

64	   The list of Internet-Draft Shadow Directories can be accessed at
65	   http://www.ietf.org/shadow.html

67	   This Internet-Draft will expire on January 16, 2013.

69	Copyright Notice

71	   Copyright (c) 2012 IETF Trust and the persons identified as the
72	   document authors. All rights reserved.

74	   This document is subject to BCP 78 and the IETF Trust's Legal
75	   Provisions Relating to IETF Documents
76	   (http://trustee.ietf.org/license-info) in effect on the date of
77	   publication of this document. Please review these documents
78	   carefully, as they describe your rights and restrictions with
79	   respect to this document. Code Components extracted from this
80	   document must include Simplified BSD License text as described in
81	   Section 4.e of the Trust Legal Provisions and are provided without
82	   warranty as described in the Simplified BSD License.

84	Table of Contents

86	   1. Introduction
87	      1.1. Conventions used in this document
88	      1.2. Terminology
89	      1.3. Problem definition
90	   2. Overview of the solution in an MPLS Core
91	      2.1. Control Plane operation for Automated pNH Assignment
92	      2.2. Control Plane operation for Configured pNH
93	      2.3. Forwarding behavior at Steady State (When pPE is reachable)
94	      2.4. Forwarding behavior when pPE Fails
95	   3. Overview of the solution in a Pure IP Core
96	      3.1. Control Plane operation
97	      3.2. Forwarding Behavior at Steady State (while pPE is reachable)
99	      3.3. Forwarding Behavior at Failure (when pPE is not reachable)
100	   4. Example
101	      4.1. Control Plane
102	      4.2. Forwarding Plane at Steady State (When PE0 is reachable)
103	      4.3. Forwarding Plane at Failure (When PE0 is not reachable)
104	   5. Inter-operability with Existing IP FRR Mechanisms
105	   6. Security Considerations
106	   7. IANA Considerations
107	   8. Conclusions
108	   9. References
109	      9.1. Normative References
110	      9.2. Informative References
111	   10. Acknowledgments
112	   Appendix A. How to protect Against Misconfigured pNH
113	   Appendix B. Alternative Approach for advertising (pNH,rNH) to iPE
114	   Appendix C. Modification History
115	         A.1.1. Changes from Version 02
116	         A.1.2. Changes from Version 01

118	1. Introduction

120	   In a BGP free core, where traffic is tunneled between edge routers,
121	   BGP speakers advertise reachability information about prefixes to
122	   other edge routers, not to core routers. For labeled address
123	   families, namely AFI/SAFI 1/4, 2/4, 1/128, and 2/128, an edge
124	   router assigns local labels to prefixes and associates the local
125	   label with each advertised prefix, as in L3VPN [10], 6PE [11], and
126	   Softwire [9]. Suppose that a given edge router is chosen as the
127	   best next-hop for a prefix P/m. An ingress router that receives a
128	   packet from an external router destined to the prefix P/m
129	   "tunnels" the packet across the core to that egress router. If the
130	   prefix P/m is a labeled prefix, the ingress router pushes the label
131	   advertised by the egress router before tunneling the packet to the
132	   egress router. Upon receiving the packet from the core, the egress
133	   router makes the appropriate forwarding decision based on the
134	   content of the packet or the label pushed on the packet.

136	   In modern networks, it is not uncommon to have a prefix reachable
137	   via multiple edge routers. One example is the best external path
138	   [8]. Another more common and widely deployed scenario is L3VPN [10]
139	   with multi-homed VPN sites. As an example, consider the L3VPN
140	   topology depicted in Figure 1.

142	                                PE1 .............+
143	                                                |
144	                                       +--------+---------------+
145	                                       |                        |
146	                                       |   VPN 1 Network        |
147	                                       |                        |
148	                                       |            VPN prefix  |
149	                                       |           (10.0.0.0/8) |
150	                                       |                        |
151	                                       +---+--------------------+
152	                                           |
153	                                   /------CE1
154	                                  /
155	                                 /
156	    BGP-free core      P--------PE0
157	                                 \
158	                                  \
159	                                   \------CE2
160	                                           |
161	                                       +---+--------------------+
162	                                       |                        |
163	                                       |   VPN 2 Network        |
164	                                       |                        |
165	                                       |            VPN prefix  |
166	                                       |           (20.0.0.0/8) |
167	                                       |                        |
168	                                       +--------+---------------+
169	                                                |
170	                               PE2 .............+

172	             Figure 1 VPN prefix reachable via multiple PEs

174	   As illustrated in Figure 1, the edge router PE0 is the primary NH
175	   for both 10.0.0.0/8 and 20.0.0.0/8. At the same time, both
176	   10.0.0.0/8 and 20.0.0.0/8 are reachable through the other edge
177	   routers PE1 and PE2, respectively.

179	   1.1. Conventions used in this document

181	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
182	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
183	   this document are to be interpreted as described in RFC-2119 [1].

185	   In this document, these words will appear with that interpretation
186	   only when in ALL CAPS. Lower case uses of these words are not to be
187	   interpreted as carrying RFC-2119 significance.

189	   1.2. Terminology

191	   This section defines the terms used in this document. For ease of
192	   use, we will use terms similar to those used by L3VPN [10].

194	   o  BGP-Free core: A network where BGP prefixes are only known to
195	      the edge routers and traffic is tunneled between edge routers

197	   o  External prefix: It is a prefix P/m (of any AFI/SAFI) that a BGP
198	      speaker has an external path for. The BGP speaker may learn
199	      about the prefix from an external peer through BGP, some other
200	      protocol, or manual configuration. The protected prefix is
201	      advertised to some or all of the internal peers.

203	   o  Protectable prefix: It is an external prefix P/m (of any
204	      AFI/SAFI) that a BGP speaker has an external path to and that
205	      is eligible to have a repair path.

207	   o  Primary Egress PE, "ePE": It is an IBGP peer that can reach the
208	      prefix P/m through an external path and advertised the prefix to
209	      the other IBGP peers. The primary egress PE was chosen as the
210	      best path by one or more internal peers. In other words, the
211	      primary egress PE is an egress PE that will normally be used by
212	      some ingress PEs when there is no failure. Referring to Figure
213	      1, PE0 is an egress PE.

215	   o  Protected egress PE, "pPE" (Protected PE for simplicity): It is
216	      an egress PE that has or is eligible to have a repair path for
217	      some or all of the prefixes to which it has an external path.
218	      Referring to Figure 1, PE0 is a protected egress PE.

220	   o  Protected edge router: Any protected egress PE.

222	   o  Protected next-hop (pNH): It is an IPv4 or IPv6 host address
223	      belonging to the protected egress PE. Traffic tunneled to this
224	      IP address will be protected via the mechanism proposed in this
225	      document. Note that the protected next-hop MUST be different
226	      from the next-hop attribute in the BGP update message [2][3].

228	   o  CE: It is an external router through which an egress PE can
229	      reach a prefix P/m. The routers "CE1" and "CE2" in Figure 1 are
230	      examples of such CEs.

232	   o  Ingress PE, "iPE": It is a BGP speaker that learns about a
233	      prefix through another IBGP peer and chooses that IBGP peer as
234	      the next-hop for the prefix.

236	   o  Repairing P router "rP" (Also "Repairing core router" and
237	      "repairing router"): A core router that attempts to restore
238	      traffic when the primary egress PE is no longer reachable
239	      without waiting for IGP or BGP to re-converge. The repairing P
240	      router restores the traffic by rerouting the traffic (through a
241	      tunnel) towards the pre-calculated repair PE when it detects
242	      that the primary egress PE is no longer reachable. Referring to
243	      Figure 1, the router "P" is the repairing P router.

245	   o  Repair egress PE "rPE" (Repair PE for simplicity): It is an
246	      egress PE other than the primary egress PE that can reach the
247	      protected prefix P/m through an external neighbor. The repair PE
248	      is pre-calculated prior to any failure. Referring to Figure 1,
249	      PE1 is the repair PE for 10.0.0.0/8 while PE2 is the repair PE
250	      for 20.0.0.0/8.

252	   o  Underlying Repair label (rL): The underlying repair label is the
253	      label that will be pushed so that the repair PE can forward
254	      repaired traffic correctly. A repair label is defined for labeled
255	      protected prefixes only.

257	   o  Repair next-hop (rNH): It is an IPv4 or IPv6 host address
258	      belonging to the repair egress PE. If the protected prefix is
259	      advertised via BGP, then the repair next-hop SHOULD be the next-
260	      hop attribute in the BGP update message [2][3].

262	   o  Repair path (Also Repair Egress Path): It is the repair next-
263	      hop. If an underlying repair label exists, the repair path is
264	      the repair next-hop together with the underlying repair label.

266	   o  Primary tunnel: It is the tunnel from the ingress PE to the
267	      primary egress PE

269	   o  Repair tunnel: It is the tunnel from the repairing P router to
270	      the repair egress PE

272	   1.3. Problem definition

274	   The problem that we are trying to solve is as follows:

276	   o  Even though multiple prefixes may share the same egress router,
277	      they may have different repair edge routers. In Figure 1 above,
278	      both 10.0.0.0/8 and 20.0.0.0/8 share the same primary next hop
279	      PE0, but the routing protocol(s) must identify that the repair
280	      node for 10.0.0.0/8 is PE1 while the repair node for
281	      20.0.0.0/8 is PE2.

283	   o  On losing connection to the edge router, the core router "P"
284	      MUST reroute traffic towards the *correct* repair edge router
285	      without waiting for IGP or BGP to re-converge and update the
286	      routing tables. On the failure of PE0 illustrated in Figure 1,
287	      the core router P needs to reroute traffic for 10.0.0.0/8
288	      towards PE1 and traffic for 20.0.0.0/8 towards PE2.

290	   o  The repairing core router P MUST NOT be forced to learn about
291	      the BGP prefixes on any of the edge routers. The same applies
292	      to all core routers.

294	   o  The size of the routing table on any core router MUST be
295	      independent of the number of BGP prefixes in the network.

297	   o  Rerouting traffic without waiting for IGP and BGP to re-converge
298	      after a failure MUST NOT cause loops.

300	   o  For labeled prefixes, when a packet gets re-routed to the repair
301	      PE, the label stack on the packet MUST ensure correct
302	      forwarding.

304	   o  Provisioning overhead must be kept to a minimum. In addition,
305	      misconfiguration should be detectable.

307	   o  At steady state, when pPE is reachable, the path taken by a
308	      traffic flow must not be impacted by enabling the solution
309	      proposed in this document on some or all routers.

311	2. Overview of the solution in an MPLS Core

313	   The solution proposed in this document relies on the collaboration of
314	   the egress PE, ingress PE, penultimate hop routers, and repairing
315	   router. This section gives an overview of how the solution works for
316	   labeled and unlabeled protected prefixes in an MPLS core.

318	   2.1. Control Plane operation for Automated pNH Assignment

320	   This section outlines the solution for the case where the protected
321	   next hop "pNH" is automatically calculated instead of being assigned
322	   by an operator.

324	   1. Each egress router that is capable of handling repaired traffic
325	      assigns each protectable labeled prefix a repair label: "rL". "rL"
326	      is advertised as an optional path attribute. "rL" MUST be per-CE
327	      or per-VRF for good BGP attribute packing and forwarding
328	      simplicity. For an unlabeled prefix, no repair label is needed. A
329	      router that is capable of handling repaired traffic is called a
330	      repair PE "rPE". The semantics of the repair label "rL" are:

332	       a. Pop *two* labels.

334	       b. If "rL" is per-CE, send the packet to the appropriate CE.

337	       c. If "rL" is per-VRF, forward the packet based on the contents
338	          under the two popped labels.

340	   2. If an Egress PE knows that a P/m to which it has an external path
341	      is also reachable via another PE and that other PE advertises a
342	      repair label "rL" for P/m,

344	       a. It chooses the other PE as a repair PE. Let's call the chosen
345	          repair PE "rPE". The ePE chooses an IP address "rNH" local to
346	          or advertised by rPE.

348	           i. "rNH" SHOULD be the next-hop attribute advertised by rPE
349	               when it announces reachability to the protected prefix
350	               P/m to minimize the number of prefixes advertised into
351	               IGP.

353	          ii. if rPE also advertised a protected next-hop (pNH) for any
354	               BGP prefix that rPE can protect, then rNH MUST NOT be any
355	               protected next-hop (pNH) advertised by rPE.

357	       b. It allocates a local IP address corresponding to the chosen
358	          rPE, say "pNH". "pNH" represents the protected NH; that is,
359	          traffic tunneled to "pNH" will be protected against edge node
360	          failure via the BGP FRR mechanism proposed in this document.

362	       c. A separate pNH is needed for every rPE (for a given protected
363	          PE). Each pNH must be unique within a single BGP-free core.

365	       d. Now that "ePE" has a repair path for P/m, it becomes a
366	          protected PE "pPE".

368	       e. Advertise pNH as a prefix into IGP

370	       f. Re-advertise the protected prefix P/m to other iBGP peers with
371	          "pNH" as an optional non-transitive attribute

373	       g. pPE advertises the mapping (pNH,rNH) separately to all ingress
374	          PEs. A method analogous to how tunnel information is
375	          advertised [4] can be used to advertise this mapping (pNH,rNH)
376	          to ingress PE's.

378	       h. Once iPE receives the pNH for each prefix and the mapping
379	          (pNH,rNH), the iPE can retrieve "rL" for P/m from the
380	          advertisement of rPE for P/m.

382	       i. "pPE" advertises the pair (pNH,rNH) to candidate repairing
383	          core routers.

385	       j. "pPE" advertises the protected next-hop "pNH" to the
386	          penultimate hops to indicate that traffic flowing through the
387	          tunnel to the tail end "pNH" is protected against the failure
388	          of the node "pPE" and requires special processing by the
389	          penultimate hop as will be described in the next few steps

391	       k. pPE advertises an explicit label for pNH instead of the usual
392	          implicit NULL. This way pPE can carry out the special label
393	          popping behavior (described in the next section) if the
394	          penultimate hop cannot perform this task.

396	   3. Ingress PE "iPE"

398	       a. iPE receives the protected prefix P/m with "pNH" as an
399	          optional attribute

401	       b. iPE also receives the mapping (pNH,rNH) from pPE

403	       c. When iPE receives "rL" with P/m from rPE, then iPE can
404	          associate "rL" with P/m as described in Section 2.1.
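
   The pPE bookkeeping in steps 2.a to 2.g can be sketched in Python.
   This is only an illustration under stated assumptions: the class
   ProtectedPE, its method and field names, the use of rNH as the key
   that identifies an rPE, and the documentation-range addresses are
   all hypothetical and are not defined by this document.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv4Network
from typing import Dict, Iterator, Set, Tuple


@dataclass
class ProtectedPE:
    """Hypothetical model of the pPE state built in steps 2.a to 2.g."""
    pnh_pool: Iterator[IPv4Address]  # local addresses usable as pNH
    pnh_by_rnh: Dict[IPv4Address, IPv4Address] = field(default_factory=dict)
    pnh_by_prefix: Dict[str, IPv4Address] = field(default_factory=dict)

    def protect(self, prefix: str, rnh: IPv4Address,
                pnhs_of_rpe: Set[IPv4Address]
                ) -> Tuple[IPv4Address, IPv4Address]:
        """Choose the pNH for `prefix` and return the (pNH, rNH) mapping."""
        # Step 2.a.ii: rNH MUST NOT be a protected next-hop of the rPE.
        if rnh in pnhs_of_rpe:
            raise ValueError(f"{rnh} is a pNH advertised by the repair PE")
        # Steps 2.b and 2.c: one pNH per rPE, shared by every prefix
        # that this pPE protects through that rPE.
        if rnh not in self.pnh_by_rnh:
            self.pnh_by_rnh[rnh] = next(self.pnh_pool)
        pnh = self.pnh_by_rnh[rnh]
        # Step 2.f: the prefix is re-advertised with this pNH attached.
        self.pnh_by_prefix[prefix] = pnh
        return pnh, rnh


# Example values only: a pNH pool on PE0 and the next-hops of PE1 and PE2.
pe0 = ProtectedPE(pnh_pool=iter(IPv4Network("198.51.100.0/29").hosts()))
pe1_nh, pe2_nh = IPv4Address("192.0.2.1"), IPv4Address("192.0.2.2")
map_10 = pe0.protect("10.0.0.0/8", pe1_nh, set())
map_20 = pe0.protect("20.0.0.0/8", pe2_nh, set())
map_11 = pe0.protect("11.0.0.0/8", pe1_nh, set())
```

   In this sketch the number of allocated pNH addresses grows with the
   number of repair PEs, not with the number of protected prefixes:
   10.0.0.0/8 and 11.0.0.0/8 share one pNH because both are repaired
   through the same rPE.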

406	   As a result of the above steps, the following nodes store the
407	   following information

409	   o  Ingress PE (iPE)

411	       o Receives from pPE the NLRI advertisement for the protected
412	          labeled prefix P/m containing the usual next-hop attribute and
413	          the optional information "pNH". iPE also receives the mapping
414	          (pNH, rNH).

416	       o iPE retrieves "rL" from the advertisement of rPE for the
417	          protected prefix P/m.

419	       o Assume that iPE chooses pPE as the primary NH. Then the iPE
420	          will use pNH as the tunnel tail end to pPE instead of the
421	          usual BGP next-hop

423	   o  Penultimate Hop

425	       o Receives the "pNH" from pPE

427	       o As such, it knows that traffic destined to pNH needs certain
428	          special forwarding treatment as described in the next few
429	          steps

431	       o Penultimate hop advertises "pNH" as its own prefix but with
432	          one of the following conditions

434	            . For link-state IGPs, "pNH" MAY be advertised with
435	               *maximum metric* so as not to affect the path taken by
436	               the traffic flowing from iPE's to pPE's

438	            . For distance vector IGPs, the penultimate hop MAY
439	               advertise the metric of "pNH" as follows

441	               PHP-metric(pNH) =

443	                     pPE-metric(pNH) + metric-From-PHP-to-pPE

445	               That is the metric advertised by the penultimate hop for
446	               pNH equals the metric advertised by pPE for pNH plus the
447	               metric from the penultimate hop to pPE

449	            . This way the advertisement of pNH by the penultimate hop
450	               does not impact the path taken by the traffic from iPE's
451	               to pPE's

453	   o  Repairing core router "rP" (which may also be the penultimate hop)

455	       o Receives the pair (pNH,rNH) from pPE

457	       o Installs the following forwarding entry for pNH

459	            . If pNH is not reachable, re-tunnel traffic to rNH
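
   The distance-vector metric rule for the penultimate hop can be
   checked with a small worked example. The sketch below is
   illustrative only: the function names and the link costs are
   assumptions chosen for the example. It compares the metric an
   upstream router computes for pNH when the penultimate hop merely
   relays the route learned from pPE with the metric it computes when
   the penultimate hop originates pNH using the rule above.

```python
def php_metric(ppe_metric: int, php_to_ppe: int) -> int:
    """Metric the penultimate hop advertises for pNH (distance vector)."""
    return ppe_metric + php_to_ppe


def upstream_metric(advertised: int, link_cost: int) -> int:
    """Metric an upstream neighbor of the penultimate hop computes."""
    return advertised + link_cost


# Hypothetical costs: pPE advertises pNH with metric 1, the link from
# the penultimate hop to pPE costs 10, and the upstream router reaches
# the penultimate hop with cost 5.
ppe_metric, php_to_ppe, upstream_to_php = 1, 10, 5

# Case 1: the penultimate hop relays its own distance to pNH.
relayed = upstream_metric(ppe_metric + php_to_ppe, upstream_to_php)

# Case 2: the penultimate hop originates pNH with the rule above.
originated = upstream_metric(php_metric(ppe_metric, php_to_ppe),
                             upstream_to_php)
```

   Both cases give the same metric at the upstream router, which is
   why the extra advertisement does not change the path taken from
   iPE to pPE in this example.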

461	   2.2. Control Plane operation for Configured pNH

463	   In Section 2.1, the pPE assigned pNH to a protected prefix P/m
464	   based on the chosen rPE. The result of this behavior is the need to
465	   re-advertise the protected prefix P/m with the associated "pNH". In
466	   this section, we outline the procedure by which the operator can pre-
467	   assign pNH to protected prefixes and hence avoid the need to re-
468	   advertise protected prefixes.

470	   1. Protected PE "pPE"

472	       a. The operator groups prefixes such that two prefixes belong to
473	          the same group if the operator knows that the two prefixes are
474	          protected by the same rPE

476	       b. The operator assigns a distinct protected next-hop "pNH" for
477	          every group of prefixes. The assignment occurs even a repair
478	          path for P/m is not yet known.

480	       c. pPE advertises "pNH" as an optional non-transitive attribute
481	          with the protected prefix P/m *all the time* even of no other
482	          PE advertises P/m

484	       d. When pPE receives an advertisement for P/m from another PE

486	           i. pPE chooses the other PE as rPE

488	          ii. pPE advertises the mapping (pNH,rNH) separately to all
489	               ingress PEs. rNH SHOULD be the next-hop attribute
490	               advertised by rPE. A method analogous to how tunnel
491	               information is advertised [4] can be used to advertise
492	               this mapping (pNH,rNH) to ingress PE's.

494	       e. The rest of the behavior is identical to what is specified in
495	          Section 2.1.

497	   2. How to Protect the network against misconfigured pNH?

499	   See Appendix A.
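
   The difference from the automated case can be sketched as follows.
   This is a hypothetical illustration: the group names, the table
   layout, the function names, and the addresses are assumptions made
   for the example. The point it shows is that the pNH attribute is
   available from configuration alone, while the mapping (pNH,rNH)
   only exists once another PE has advertised the prefix.

```python
from ipaddress import IPv4Address
from typing import Dict, Optional, Tuple

# Operator configuration on pPE (example values): each group of
# prefixes expected to share one rPE gets its own pNH.
PNH_BY_GROUP: Dict[str, IPv4Address] = {
    "via-PE1": IPv4Address("198.51.100.1"),
    "via-PE2": IPv4Address("198.51.100.2"),
}
GROUP_BY_PREFIX: Dict[str, str] = {
    "10.0.0.0/8": "via-PE1",
    "20.0.0.0/8": "via-PE2",
}


def pnh_attribute(prefix: str) -> IPv4Address:
    """Step 1.c: pNH sent with the prefix, with or without a repair path."""
    return PNH_BY_GROUP[GROUP_BY_PREFIX[prefix]]


def mapping_for(prefix: str, rpe_next_hop: Optional[IPv4Address]
                ) -> Optional[Tuple[IPv4Address, IPv4Address]]:
    """Step 1.d: (pNH, rNH), or None while no other PE advertises P/m."""
    if rpe_next_hop is None:
        return None
    return pnh_attribute(prefix), rpe_next_hop
```

   Because the pNH does not depend on when the rPE appears, the
   prefix does not have to be re-advertised once a repair path
   becomes known; only the mapping is sent.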

501	   What is left is to outline the forwarding behavior before and after
502	   "pPE" failure.

504	   2.3. Forwarding behavior at Steady State (When pPE is reachable)

506	   This section outlines the packet forwarding procedure when pPE is
507	   still reachable.

509	   1. Ingress PE (iPE) receives a packet matching P/m and reachable via
510	      pPE

512	   2. The iPE pushes three labels:

514	       o Bottom label: VPN label advertised by pPE

516	       o Second label: rL

518	       o Top label: IGP label towards pNH (not the BGP next-hop
519	          attribute)

521	   3. Penultimate Hop

523	       a. Receives a packet with top label bound to pNH

525	       b. Pops *two* labels *all the time*.

527	       c. Sends packet to pNH

529	   4. Protected PE (pPE)

531	       a. Receives a packet with top label as VPN label

533	       b. Forwards the packet as usual

535	       c. For unlabeled packets, the iPE only pushes the rL and the IGP
536	          label of pNH and the pPE uses the IP header for forwarding.

538	   Thus the packet can be delivered correctly to its destination.
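
   The steady-state label operations can be traced with a short
   Python sketch. It is a simplified model, not an implementation:
   the function names and the label values are assumptions, and a
   label stack is represented as a list with the top label first.

```python
from typing import List, Optional

Label = int  # label values below are arbitrary examples


def ipe_push(vpn_label: Optional[Label], rl: Label,
             igp_label_pnh: Label) -> List[Label]:
    """Step 2: build the label stack, top of stack first."""
    stack = [igp_label_pnh, rl]
    if vpn_label is not None:  # unlabeled prefixes carry no VPN label
        stack.append(vpn_label)
    return stack


def penultimate_hop(stack: List[Label]) -> List[Label]:
    """Step 3: pop *two* labels (the label bound to pNH, then rL)."""
    return stack[2:]


VPN, RL, IGP_PNH = 3001, 4001, 2001
sent_by_ipe = ipe_push(VPN, RL, IGP_PNH)
at_ppe = penultimate_hop(sent_by_ipe)
at_ppe_unlabeled = penultimate_hop(ipe_push(None, RL, IGP_PNH))
```

   In the labeled case pPE sees only the VPN label it advertised, so
   it performs a single label lookup. In the unlabeled case the stack
   is empty on arrival and pPE forwards on the IP header.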

540	   2.4. Forwarding behavior when pPE Fails

542	   The repairing core router directly connected to a failure detects
543	   that pNH is no longer reachable. The following steps are applied.

545	   1. Repairing core router "rP"

547	       a. Receives packet with top label bound to pNH

549	       b. pNH is not reachable

551	       c. Swap the top label with the label of rNH

553	       d. Send packet towards rPE

555	       In effect, the repairing router re-tunnels the packet towards
556	       the repair PE

558	   2. Penultimate hop of rPE

560	       a. rNH is not a protected NH for rPE

562	       b. Thus the penultimate hop employs the usual penultimate hop
563	          popping and then forwards the packet to rPE

565	   3. Repair PE (rPE)

567	       a. Receives packet with top label rL (which rPE advertised) and
568	          underneath it the regular VPN label advertised by the
569	          protected PE "pPE"

571	       b. Make a lookup on "rL"

573	       c. rL per CE

575	           i. Pop *two* labels.

577	          ii. Send to correct CE

579	       d. rL per VRF

581	           i. Pop *two* labels.

583	          ii. Make IP lookup in appropriate VRF

585	         iii. Send to the CE

587	       e. rL is assigned to unlabeled prefix

589	           i. Pop "rL"

591	          ii. Send the packet to the correct CE
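
   The failure case can be traced in the same way. The sketch below
   is a simplified model under stated assumptions: the function
   names, the table layout at rPE, and all label values are
   hypothetical, and a label stack is a list with the top label
   first.

```python
from typing import Dict, List, Tuple

Label = int  # label values below are arbitrary examples


def repairing_router(stack: List[Label], igp_label_rnh: Label) -> List[Label]:
    """Step 1: pNH is unreachable, so swap the top label for rNH's label."""
    return [igp_label_rnh] + stack[1:]


def php_of_rpe(stack: List[Label]) -> List[Label]:
    """Step 2: ordinary penultimate hop popping (one label)."""
    return stack[1:]


def repair_pe(stack: List[Label],
              rl_table: Dict[Label, Tuple[str, str]]
              ) -> Tuple[str, List[Label]]:
    """Step 3: look up rL; return the action and the remaining stack."""
    kind, target = rl_table[stack[0]]
    if kind == "per-ce":
        return f"send to {target}", stack[2:]           # pop rL and VPN label
    if kind == "per-vrf":
        return f"IP lookup in VRF {target}", stack[2:]  # pop rL and VPN label
    return f"send to {target}", stack[1:]               # unlabeled: pop rL only


RL_TABLE = {4001: ("per-ce", "CE1"),
            4002: ("unlabeled", "CE1"),
            4003: ("per-vrf", "Blue")}

in_flight = [2001, 4001, 3001]            # [label of pNH, rL, VPN label]
repaired = repairing_router(in_flight, 2002)
at_rpe = php_of_rpe(repaired)
action, rest = repair_pe(at_rpe, RL_TABLE)
```

   Note that in this model rPE never looks up the VPN label that pPE
   allocated; it only uses rL, which rPE itself advertised, and then
   discards the label underneath.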

593	3. Overview of the solution in a Pure IP Core

595	   This section provides an overview of the solution when operating in a
596	   pure IP core where core routers only understand IPv4 or IPv6
597	   protocols. Thus traffic between PEs is transported using IP tunnels
598	   such as [4][6][7].

600	   3.1. Control Plane operation

602	   The control plane behavior in an IP core is identical to its behavior
603	   in an MPLS core.

605	   3.2. Forwarding Behavior at Steady State (while pPE is reachable)

607	   1. Ingress PE (iPE) receives a packet matching P/m and reachable via
608	      pPE

610	   2. Ingress PE:

612	       o For labeled traffic, pushes two labels

614	            . Bottom label: VPN label advertised by pPE

616	            . Second label: rL

618	       o For unlabeled traffic, just push "rL"

620	       o Encapsulates the packet into the IP tunnel header towards the
621	          pNH

623	   3. Penultimate Hop
624	       o No special behavior is needed from the penultimate hop while
625	          pPE is reachable

627	   4. Protected PE

629	       a. Receives an IP packet encapsulated in an IP tunnel header with
630	          destination address pNH

632	       b. Removes the IP tunnel header and pops the label right under
633	          it (which will be the repair label "rL")

635	       c. For labeled traffic, the VPN label is exposed. So pPE makes a
636	          lookup using the VPN label. Otherwise the usual IP forwarding
637	          is applied

639	       d. Forwards the packet as usual
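
   The steady-state behavior in an IP core can be modeled as follows.
   This is an illustrative sketch only: the TunneledPacket class, the
   function names, the addresses, and the label values are
   assumptions, and the tunnel header is reduced to its destination
   address.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TunneledPacket:
    tunnel_dst: str    # destination address of the IP tunnel header
    labels: List[int]  # labels under the tunnel header, top first


def ipe_encapsulate(pnh: str, rl: int,
                    vpn_label: Optional[int] = None) -> TunneledPacket:
    """Step 2: push rL (above the VPN label, if any) and tunnel to pNH."""
    labels = [rl] if vpn_label is None else [rl, vpn_label]
    return TunneledPacket(tunnel_dst=pnh, labels=labels)


def ppe_decapsulate(pkt: TunneledPacket, pnh: str) -> List[int]:
    """Steps 4.a and 4.b: strip the tunnel header and the rL under it."""
    if pkt.tunnel_dst != pnh:
        raise ValueError("packet is not addressed to pNH")
    return pkt.labels[1:]


PNH = "198.51.100.1"  # example address
labeled = ppe_decapsulate(ipe_encapsulate(PNH, rl=4001, vpn_label=3001), PNH)
unlabeled = ppe_decapsulate(ipe_encapsulate(PNH, rl=4001), PNH)
```

   For labeled traffic the remaining label is the VPN label that pPE
   looks up; for unlabeled traffic nothing remains and the usual IP
   forwarding applies.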

641	   3.3. Forwarding Behavior at Failure (when pPE is not reachable)

643	   The repairing router directly connected to a failure detects that pNH
644	   is no longer reachable. The following steps are applied.

646	   1. Repairing router "rP"

648	       a. Receives IP packet with a tunnel header destined to pNH

650	       b. pNH is not reachable

652	       c. Replace the tunnel header with a tunnel header with
653	          destination address rNH

655	       d. Forward the packet to rNH

657	   2. Repair PE (rPE)

659	       a. Receives IP packet with a tunnel header destined to rNH

661	       b. Decapsulate the tunnel header to expose the repair label "rL"

663	       c. The rest of the behavior is identical to the behavior in an
664	          MPLS Core.
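
   The repairing router behavior above amounts to rewriting the tunnel
   destination. A minimal Python sketch follows, assuming that rP holds
   the (pNH,rNH) mapping and a set of currently reachable addresses. The
   names TunnelPacket and rp_forward are hypothetical, and the addresses
   in the usage note are arbitrary documentation addresses.

```python
from dataclasses import dataclass, replace
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class TunnelPacket:
    tunnel_dst: str           # destination of the outer IP tunnel header
    labels: Tuple[int, ...]   # labels under the tunnel header, top first
    payload: str


def rp_forward(pkt: TunnelPacket, reachable: Set[str],
               pnh_to_rnh: Dict[str, str]) -> TunnelPacket:
    """rP: if pNH is unreachable, re-tunnel the packet towards rNH."""
    if pkt.tunnel_dst in reachable:
        return pkt                            # steady state: no change
    rnh = pnh_to_rnh.get(pkt.tunnel_dst)
    if rnh is None:
        raise LookupError("destination unreachable and not protected")
    # Only the tunnel header changes; rL and the VPN label are untouched.
    return replace(pkt, tunnel_dst=rnh)
```

   With pnh_to_rnh = {"192.0.2.1": "198.51.100.1"} and 192.0.2.1 absent
   from the reachable set, a packet tunneled to 192.0.2.1 leaves rP
   tunneled to 198.51.100.1 with its labels intact, so that the rPE
   finds "rL" directly under the tunnel header.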

666	4. Example

668	   We will use an LDP core as an example. Consider the diagram
669	   depicted in Figure 2 below.

671	     +-----------------------------------+
672	     |                                   |
673	     |   LDP Core                        |
674	     |                                   |
675	     |                                  PE1  Lo = 9.9.9.1
676	     |                                   |\
677	     |                                   | \
678	     |                                   |  \
679	     |                                   |   \
680	     |                                   |  CE1.......VRF "Blue"
681	     |                                   |   /       (10.0.0.0/8)
682	     |                                   |  /        (11.0.0.0/8)
683	     |                                   | /
684	     |                                   |/
685	   PE11                        P--------PE0    Lo1 = 1.1.1.1/32
686	     |                                   |\    pNH Range = 2.1.1.0/24
687	     |                                   | \
688	     |                                   |  \
689	     |                                   |   \
690	     |                                   |  CE2.......VRF "Red"
691	     |                                   |   /       (20.0.0.0/8)
692	     |                                   |  /        (21.0.0.0/8)
693	     |                                   | /
694	     |                                   |/
695	     |                                  PE2  Lo = 9.9.9.2
696	     |                                   |
697	     |                                   |
698	     +-----------------------------------+
699	                Figure 2 : Edge node BGP FRR in LDP core

701	   o  In Figure 2, PE0 is the pPE for VRFs "Blue" and "Red" while PE1
702	      and PE2 are the rPEs for VRFs "Blue" and "Red", respectively. VRF
703	      Blue has 10.0.0.0/8 and 11.0.0.0/8 and VRF Red has 20.0.0.0/8 and
704	      21.0.0.0/8

706	   o  Assuming PE0 uses per prefix label allocation, PE0 assigns the VPN
707	      labels 4100, 4200, 4300, and 4400 to 10.0.0.0/8, 11.0.0.0/8,
708	      20.0.0.0/8, and 21.0.0.0/8 respectively. PE0 advertises the
709	      prefixes 10.0.0.0/8, 11.0.0.0/8, 20.0.0.0/8, and 21.0.0.0/8 using
710	      MP/BGP as usual

712	   4.1. Control Plane

714	   1. rPEs Allocate and advertise Repair labels

716	       a. Acting as a rPE, PE1 allocates (on per-CE basis) and
717	          advertises a repair label rL1=3100 with the prefixes
718	          10.0.0.0/8 and 11.0.0.0/8 to all iBGP peers

720	       b. Similarly, PE2 allocates and advertises the repair label
721	          rL2=3200 with the prefixes 20.0.0.0/8 and 21.0.0.0/8

723	   2. pPE calculates and advertises the pNHs

725	       a. For prefixes belonging to VRF "Blue", PE0 allocates
726	          pNH1=2.1.1.1 because all of them are protected by PE1

728	       b. Similarly, for prefixes belonging to VRF "Red", PE0
729	          allocates pNH2=2.1.1.2 because VRF "Red" is protected by PE2

731	       c. PE0 advertises (pNH1,rNH1)=(2.1.1.1, 9.9.9.1) and
732	          (pNH2,rNH2)=(2.1.1.2, 9.9.9.2) to the ingress PE PE11 and
733	          the repairing core router "P".

735	       d. PE0 re-advertises 10.0.0.0/8 & 11.0.0.0/8 with the optional
736	          attribute pNH1=2.1.1.1, and 20.0.0.0/8 & 21.0.0.0/8 with
737	          pNH2=2.1.1.2 to the ingress PE PE11

739	   3. The ingress PE "PE11" creates the following forwarding state

741	       a. For prefixes 10.0.0.0/8 & 11.0.0.0/8: Push the VPN labels
742	          4100 and 4200, respectively, followed by rL=3100 then tunnel
743	          the packet to 2.1.1.1

745	       b. For prefixes 20.0.0.0/8 & 21.0.0.0/8: Push the VPN labels
746	          4300 and 4400, respectively, followed by rL=3200; then
747	          tunnel the packet to 2.1.1.2
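
   The forwarding state that PE11 ends up with can be written down as a
   small table. The Python sketch below uses the prefixes, VPN labels,
   repair labels, and pNHs from this example. The names FIB and
   pe11_lookup are hypothetical, and for simplicity the sketch assumes a
   single table rather than one table per VRF, which works here only
   because the four prefixes do not overlap.

```python
import ipaddress

# prefix -> (VPN label, repair label, pNH), as built in step 3 above
FIB = {
    "10.0.0.0/8": (4100, 3100, "2.1.1.1"),
    "11.0.0.0/8": (4200, 3100, "2.1.1.1"),
    "20.0.0.0/8": (4300, 3200, "2.1.1.2"),
    "21.0.0.0/8": (4400, 3200, "2.1.1.2"),
}


def pe11_lookup(dst: str):
    """Longest-prefix match returning (VPN label, repair label, pNH)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, entry in FIB.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, entry)
    if best is None:
        raise LookupError(f"no route for {dst}")
    return best[1]
```

   For a packet destined to 10.1.1.1, the lookup selects VPN label 4100,
   repair label 3100, and the tunnel towards 2.1.1.1.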

749	   4.2. Forwarding Plane at Steady State (When PE0 is reachable)

751	   1. Ingress PE PE11

753	       a. Traffic for VRF "Blue"

755	           i. PE11 receives a packet for VRF Blue with destination
756	               address 10.1.1.1 from an external router.

758	          ii. PE11 pushes the following labels

760	                 1. The VPN label 4100
761	                 2. The Repair label 3100

763	                 3. The LDP label for the pNH 2.1.1.1

765	       b. Traffic for VRF "Red"

767	           i. PE11 receives a packet for VRF Red with destination
768	               address 20.1.1.1 from an external router

770	          ii. PE11 pushes the following labels

772	                 1. The VPN label 4300

774	                 2. The Repair label 3200

776	                 3. The LDP label for the pNH 2.1.1.2

778	   2. Penultimate Hop of PE0 (Which is also the rP "P")

780	       a. Receives a packet with top label for the protected next-hop
781	          2.1.1.1 or 2.1.1.2

783	       b. Pops *2* labels

785	       c. Forwards the packet to pPE which is 1.1.1.1

787	   3. Protected PE PE0

789	       a. Traffic for VRF "Blue"

791	           i. PE0 receives traffic with the top label 4100.

793	          ii. 4100 is the VPN label for 10.1.1.1 belonging to VRF "Blue"

795	         iii. PE0 pops the label 4100 and forwards the packet to CE1

797	       b. Traffic for VRF "Red"

799	           i. PE0 receives traffic with the top label 4300.

801	          ii. 4300 is the VPN label for 20.1.1.1 belonging to VRF "Red"

803	         iii. PE0 pops the label 4300 and forwards the packet to CE2
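
   The steady-state walk-through above can be traced with a short Python
   sketch. The VPN and repair labels are those of this example. The LDP
   label values 24001 and 24002 are invented for illustration, as are
   the function names; the sketch also assumes one LDP label per FEC
   rather than per-hop label swapping.

```python
# Hypothetical LDP labels for the two pNHs (not given in this document)
LDP_LABEL = {"2.1.1.1": 24001, "2.1.1.2": 24002}

# VPN labels allocated by PE0 per prefix, and the CE behind each one
VPN_TO_CE = {4100: "CE1", 4200: "CE1", 4300: "CE2", 4400: "CE2"}


def pe11_push(vpn_label, repair_label, pnh):
    """Ingress PE: stack is [LDP label, repair label, VPN label], top first."""
    return [LDP_LABEL[pnh], repair_label, vpn_label]


def penultimate_hop(stack):
    """Penultimate hop of PE0: pop the LDP label and the repair label."""
    return stack[2:]


def pe0_forward(stack):
    """Protected PE: look up the exposed VPN label, pop it, pick the CE."""
    vpn_label, rest = stack[0], stack[1:]
    return VPN_TO_CE[vpn_label], rest
```

   Tracing the VRF "Blue" packet: PE11 builds [24001, 3100, 4100], the
   penultimate hop leaves [4100], and PE0 forwards to CE1 with an empty
   label stack.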

805	   4.3. Forwarding Plane at Failure (When PE0 is not reachable)

807	   1. The ingress PE PE11
808	          Does not know about the failure yet and hence it does not
809	          change its behavior.

811	   2. Repairing router rP

813	       a. Traffic for VRF "Blue"

815	           i. Receives a packet with the top label being the LDP label
816	               for 2.1.1.1

818	          ii. 2.1.1.1 is not reachable

820	         iii. Swap the LDP label for 2.1.1.1 with the LDP label of
821	               9.9.9.1

823	          iv. Forward the packet towards 9.9.9.1

825	       b. Traffic for VRF "Red"

827	           i. Receives a packet with the top label being the LDP label
828	               for 2.1.1.2

830	          ii. 2.1.1.2 is not reachable

832	         iii. Swap the LDP label for 2.1.1.2 with the LDP label of
833	               9.9.9.2

835	          iv. Forward the packet towards 9.9.9.2

837	   3. The repair PE "PE1"

839	       a. The penultimate hop of PE1 performs the usual penultimate hop
840	          popping

842	       b. PE1 receives a packet whose top label equals the repair
843	          label 3100, which was allocated on a per-CE basis and points
844	          to CE1

846	       c. PE1 pops *2* labels and forwards the packet to CE1

848	   4. The repair PE "PE2"

850	       a. The penultimate hop of PE2 performs the usual penultimate hop
851	          popping

853	       b. PE2 receives a packet whose top label equals the repair
854	          label 3200, which was allocated on a per-CE basis and points
855	          to CE2

857	       c. PE2 pops *2* labels and forwards the packet to CE2
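
   The failure case can be traced in the same way. In the Python sketch
   below, the pNH-to-rNH mapping and the repair labels are those of this
   example. The LDP label values and the function names are invented for
   illustration, and the sketch assumes a single LDP label per FEC
   instead of per-hop label swapping.

```python
# Hypothetical LDP labels per FEC (not given in this document)
LDP_LABEL = {"2.1.1.1": 24001, "2.1.1.2": 24002,
             "9.9.9.1": 24011, "9.9.9.2": 24012}

# (pNH, rNH) mappings advertised by PE0 in Section 4.1
PNH_TO_RNH = {"2.1.1.1": "9.9.9.1", "2.1.1.2": "9.9.9.2"}

# Per-CE repair labels allocated by PE1 and PE2
REPAIR_TO_CE = {3100: "CE1", 3200: "CE2"}


def rp_repair(stack, reachable):
    """rP: swap the LDP label of an unreachable pNH for that of its rNH."""
    label_to_fec = {label: fec for fec, label in LDP_LABEL.items()}
    fec = label_to_fec[stack[0]]
    if fec in reachable or fec not in PNH_TO_RNH:
        return stack                          # nothing to repair
    return [LDP_LABEL[PNH_TO_RNH[fec]]] + stack[1:]


def php(stack):
    """Penultimate hop of the rPE: usual penultimate hop popping."""
    return stack[1:]


def rpe_forward(stack):
    """rPE: the repair label selects the CE; pop it and the VPN label."""
    return REPAIR_TO_CE[stack[0]], stack[2:]
```

   With PE0 down, the VRF "Blue" stack [24001, 3100, 4100] becomes
   [24011, 3100, 4100] at rP and [3100, 4100] after penultimate hop
   popping, and PE1 forwards the packet to CE1 with an empty label
   stack.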

859	5. Inter-operability with Existing IP FRR Mechanisms

861	   Existing IP FRR mechanisms can be divided into two categories: core
862	   protection and edge protection. Core protection techniques, such as
863	   [12], [13], and [14], protect against internal node and/or link
864	   failures. The technique proposed in this document instead protects
865	   against edge node failure and is therefore independent of them. If
866	   the failure of an internal node or link results in completely
867	   disconnecting a protectable edge node, then an administrator MAY
868	   configure the repairing router to prefer the technique proposed in
869	   this document over existing IP FRR mechanisms.

871	   Edge protection techniques, such as [16] and its practical
872	   implementation [15] provide protection against the failure of the
873	   link between PE and CE routers. Thus existing PE-CE link protection
874	   can co-exist with the techniques proposed in this document because
875	   the two techniques are independent of each other.

877	6. Security Considerations

879	   No additional security risk is introduced by using the mechanisms
880	   proposed in this document

882	7. IANA Considerations

884	   No requirements for IANA

886	8. Conclusions

888	   This document proposes a method that allows fast re-route protection
889	   against edge node failure or complete disconnection from the core in
890	   a BGP-free core. The proposed method has a few advantages:

892	   o  Easy to apply protection policies. pPE is the router that chooses
893	      the rPE. Hence if an operator wants to control what prefixes/VRFs
894	      get to be protected or what router can be chosen as repair PE, the
895	      operator needs to apply the policy on the pPE only.

897	   o  Simple forwarding plane. The only change in forwarding plane is
898	      the need to pop/push two labels on the iPE, rP, and rPEs.

900	   o  Single label lookup even during failure. Forwarding decisions are
901	      taken based on a single label lookup on all routers all the time
902	      even during failure

904	   o  Immunity to mis-configuration. The only required configuration is
905	      to choose non-overlapping address ranges on different pPEs. If an
906	      operator configures overlapping IP address ranges on two different
907	      pPEs, then one of the pPE will eventually allocate a pNH that is
908	      covered by the IP address range of another pPE and hence the mis-
909	      configuration can be detected

911	   o  No Need for IP or TE FRR: Because the exit point of the repair
912	      tunnel from rP to rPE is different from the primary tunnel exit
913	      point

915	   o  Works in both MPLS core and IP core

917	   o  Works with per-CE, per-VRF, and per-prefix label allocation

919	   o  Can be incrementally deployed. There is no flag day. Different
920	      routers can be upgraded at different times

922	   o  Zero impact on the paths taken by traffic: Enabling/deploying the
923	      feature described in this document has no effect on the paths
924	      taken by traffic at steady state

926	9. References

928	   9.1. Normative References

930	   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
931	         Levels", BCP 14, RFC 2119, March 1997.

933	   [2]   Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4
934	         (BGP-4)", RFC 4271, January 2006.

936	   [3]   Bates, T., Chandra, R., Katz, D., and Rekhter Y.,
937	         "Multiprotocol Extensions for BGP", RFC 4760, January 2007

939	   [4]   Mohapatra, P. and Rosen, E., "The BGP Encapsulation Subsequent
940	         Address Family Identifier (SAFI) and the BGP Tunnel
941	         Encapsulation Attribute", RFC 5512, April 2009

943	   [5]   Lau, J., Ed., Townsley, M., Ed., and I. Goyret, Ed., "Layer Two
944	         Tunneling Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005.

946	   [6]   Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina,
947	         "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000.

949	   [7]   Perkins, C., "IP Encapsulation within IP", RFC 2003, October
950	         1996.

952	   9.2. Informative References

954	   [8]   Marques,P., Fernando, R., Chen, E, Mohapatra, P., Gredler, H.,
955	         "Advertisement of the best external route in BGP", draft-ietf-
956	         idr-best-external-04.txt, April 2011.

958	   [9]   Wu, J., Cui, Y., Metz, C., and E. Rosen, "Softwire Mesh
959	         Framework", RFC 5565, June 2009.

961	   [10]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks
962	         (VPNs)", RFC 4364, February 2006.

964	   [11]  De Clercq, J. , Ooms, D., Prevost, S., Le Faucheur, F.,
965	         "Connecting IPv6 Islands over IPv4 MPLS Using IPv6 Provider
966	         Edge Routers (6PE)", RFC 4798, February 2007

968	   [12]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
969	         Reroute: Loop-Free Alternates", RFC 5286, September 2008.

971	   [13]  Shand, M. and S. Bryant, "IP Fast Reroute Framework", RFC 5714,
972	         January 2010.

974	   [14]  Shand, M. and S. Bryant, "A Framework for Loop-Free
975	         Convergence", RFC 5715, January 2010.

977	   [15]  Bashandy, A., Pithawala, B., and Heitz, J., "Scalable, Loop-
978	         Free BGP FRR using Repair Label", draft-bashandy-idr-bgp-
979	         repair-label-02.txt, July 2011

981	   [16]  O. Bonaventure, C. Filsfils, and P. Francois. "Achieving sub-50
982	         milliseconds recovery upon bgp peering link failures," IEEE/ACM
983	         Transactions on Networking, 15(5):1123-1135, 2007

985	10. Acknowledgments

987	   Special thanks to Eric Rosen, Clarence Filsfils, Maciek
988	   Konstantynowicz, Stewart Bryant, Pradosh Malhotra, Nagendra Kumar,
989	   George Swallow, Les Ginsberg, and Anton Smirnov for the valuable
990	   comments

992	   This document was prepared using 2-Word-v2.0.template.dot.

994	Appendix A. How to protect Against Misconfigured pNH

996	   Section 2.2 outlines a method by which the operator can configure
997	   the protected next-hop "pNH". There is a possibility of a
998	   misconfiguration as follows

1000	   o  The operator configures the same pNH for two protected prefixes
1001	      P1/m1 and P2/m2 but the two prefixes are protected by different
1002	      rPEs

1004	   o  The operator configures two different pNH's for two protected
1005	      prefixes P1/m1 and P2/m2 but the two prefixes are protected by
1006	      same rPE

1008	   The second configuration does not cause a lot of harm. Either way,
1009	   routers implementing the BGP FRR scheme proposed in this document can
1010	   detect both misconfigurations.

1012	   Suppose the operator configures the same "pNH" for P1/m1 and P2/m2
1013	   but P1/m1 is protected by rPE1 and P2/m2 is protected by rPE2. In
1014	   that case, the iPE and misconfigured pPE will detect this
1015	   inconsistency because both will see that P1/m1 and P2/m2 are assigned
1016	   the same pNH but are protected by two different rPEs. The reaction to
1017	   the misconfiguration is beyond the scope of this document.

1019	   Similarly, iPE and pPE can detect that the operator configured
1020	   different pNH's for P1/m1 and P2/m2 even though they are protected by
1021	   the same rPE because both iPE and pPE will receive an advertisement
1022	   for P1/m1 and P2/m2 from the same rPE. Reactions and remedies to the
1023	   misconfiguration are beyond the scope of this document.
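
   The two consistency checks described in this appendix can be sketched
   in Python as follows. This is an illustrative sketch only: the
   function name and the input format, a list of (prefix, pNH, rPE)
   tuples learned from a single pPE, are assumptions, and the addresses
   in the usage note are arbitrary documentation addresses.

```python
from collections import defaultdict


def find_pnh_misconfigurations(bindings):
    """Check (prefix, pNH, rPE) tuples learned from one pPE.

    Returns two dicts:
      shared: pNH -> rPEs, for a pNH used with more than one rPE
      split:  rPE -> pNHs, for an rPE reached through more than one pNH
    """
    rpes_by_pnh = defaultdict(set)
    pnhs_by_rpe = defaultdict(set)
    for _prefix, pnh, rpe in bindings:
        rpes_by_pnh[pnh].add(rpe)
        pnhs_by_rpe[rpe].add(pnh)
    shared = {p: sorted(r) for p, r in rpes_by_pnh.items() if len(r) > 1}
    split = {r: sorted(p) for r, p in pnhs_by_rpe.items() if len(p) > 1}
    return shared, split
```

   For the first misconfiguration, the bindings ("P1/m1", "192.0.2.1",
   "rPE1") and ("P2/m2", "192.0.2.1", "rPE2") are reported in the first
   dict under the key "192.0.2.1". How a router reacts to either report
   remains out of scope.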

1025	Appendix B. Alternative Approach for advertising (pNH,rNH) to iPE

1027	   In Section 2.1, pPE re-advertises the protected prefixes with (pNH)
1028	   as optional non-transitive attribute and advertises mapping (pNH,rNH)
1029	   separately. Alternatively, pPE can re-advertise the protected prefix
1030	   P/m to other iBGP peers with the mapping (pNH,rNH) as optional non-
1031	   transitive attributes. Advertising (pNH) only with the prefixes has
1032	   some advantages

1034	   o  Advertising pNH only with the prefixes can easily be used for
1035	      configured pNH as described in Section 2.2.

1037	   o  If the repair PE changes from one PE to another, there is no need
1038	      to re-advertise all the prefixes. Only the mapping (pNH,rNH) needs
1039	      to be re-advertised plus possibly some of the protected prefixes

1041	   o  Advertising pNH only with the prefix slightly reduces the BGP
1042	      message size

1044	   Irrespective of whether (pNH,rNH) is advertised with the prefix or
1045	   separately, (pNH,rNH) is better than advertising (pNH,rL) because
1046	   there are many rL's for the same rNH. Hence advertising (pNH,rNH)
1047	   yields better attribute packing

1049	Appendix C. Modification History

1051	C.1.1. Changes from Version 02

1053	   The whole scheme has been changed to a single next-hop per pPE-rPE.
1054	   As a result, unlike version 00 and 01, there will be a need for
1055	   behavioral changes in pPE, rP, iPE. The behavior for rPE remains
1056	   almost unchanged

1058	   The second important change is requiring rP to advertise the pNH with
1059	   maximum metric so that traffic does not get disrupted when the pPE
1060	   disappears

1062	C.1.2. Changes from Version 01

1064	   1. Use the term "underlying repair label" instead of just "repair
1065	      label" to avoid confusion with the term "repair label" used in
1066	      [15].

1068	   2. In version 01, it was assumed in many places that the repairing
1069	      router is the penultimate hop P router. Although this would
1070	      probably be the most common case, it is not always true. Hence in
1071	      this version the repairing router may be any core router

1073	   3. Merged handling labeled and unlabeled prefixes into a single
1074	      algorithm.

1076	   4. Allowed sending a repair label for unlabeled prefixes and added
1077	      the "Push" flag. This ensures loop-free repair even for unlabeled
1078	      prefixes in case that the repair PE has eiBGP paths as mentioned
1079	      in Section Error! Reference source not found.

1081	   5. In Section Error! Reference source not found. discussing the rules
1082	      governing the choice of the underlying repair label for labeled
1083	      prefix, we changed the wording so that the primary egress PE
1084	      "SHOULD" instead of "MAY" use the repair label advertised
1085	      according to [15] as an underlying repair label.

1087	   6. All occurrences of the term "backup" were replaced by "repair"  as
1088	      the term "repair" is the commonly used term in the IP FRR context
1089	      such as [14][13][12]

1091	   7. Added the definition of primary and repair tunnels in Section 1.2.

1093	   8. Added a definition of the term "Repair Next-hop" in Section 1.2.

1095	   9. Modified the definition of "repair path" in Section 1.2. to being
1096	      the repair next-hop plus the underlying repair label instead of
1097	      being the repair PE plus the underlying repair label.

1099	   10.Outlined inter-operability with existing IP FRR techniques in
1100	      Section 5.

1102	   11.There were a few editorial corrections.

1104	Authors' Addresses

1106	   Ahmed Bashandy
1107	   Cisco Systems
1108	   170 West Tasman Dr, San Jose, CA 95134
1109	   Email: bashandy@cisco.com

1111	   Burjiz Pithawala
1112	   Cisco Systems
1113	   170 West Tasman Dr, San Jose, CA 95134
1114	   Email: bpithaw@cisco.com

1116	   Keyur Patel
1117	   Cisco Systems
1118	   170 West Tasman Dr, San Jose, CA 95134
1119	   Email: keyupate@cisco.com