Network Working Group                                  A. Csaszar (Ed.)
Internet Draft                                                 G. Enyedi
Intended status: Standards Track                             J. Tantsura
Expires: August 25, 2013                                         S. Kini
                                                                Ericsson

                                                                J. Sucec
                                                                  S. Das
                                                               Telcordia

                                                       February 25, 2013

                 IP Fast Re-Route with Fast Notification
                   draft-csaszar-rtgwg-ipfrr-fn-01.txt

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

This Internet-Draft will expire on August 25, 2013.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

This document describes the benefits and main applications of sending explicit fast notification (FN) packets to routers in an area.
FN packets are generated and processed in the dataplane, and a single FN service can substitute for existing OAM methods for remote failure detection, such as a full mesh of multi-hop BFD sessions. The FN service, therefore, decreases network overhead considerably. The main application is fast reroute in pure IP and in IP/LDP-MPLS networks, called IPFRR-FN. The detour paths used when IPFRR-FN is active are in most cases identical to those used after Interior Gateway Protocol (IGP) convergence. The proposed mechanism can address all single link, node, and SRLG failures in an area; moreover, it is an efficient solution to protect against BGP ASBR failures as well as VPN PE router failures. IPFRR-FN can be a supplemental tool to provide FRR when LFA cannot repair a failure case, while it can be a replacement for existing ASBR/PE protection mechanisms by overcoming their scalability and complexity issues.

Table of Contents

1. Introduction...................................................3
2. Overview of current IPFRR Proposals based on Local Repair......6
3. Requirements of an Explicit Failure Signaling Mechanism........7
4. Conceptual Operation of IPFRR relying on Fast Notification.....8
   4.1. Preparation Phase.........................................8
   4.2. Failure Reaction Phase....................................9
      4.2.1. Activating Failure Specific Backups.................10
      4.2.2. SRLG Handling.......................................11
   4.3. Example and Timing.......................................12
   4.4. Scoping FN Messages with TTL.............................13
5. Operation Details.............................................14
   5.1. Transport of Fast Notification Messages..................14
   5.2. Message Handling and Encoding............................16
      5.2.1. Failure Identification Message for OSPF and IS-IS...17
   5.3. Protecting External Prefixes.............................18
      5.3.1. Failure on the Intra-Area Path Leading to the ASBR..18
      5.3.2. Protecting ASBR Failures: BGP-FRR...................19
         5.3.2.1. Primary and Backup ASBR in the Same Area.......19
         5.3.2.2. Primary and Backup ASBR in Different Areas.....21
   5.4. Application to LDP.......................................24
   5.5. Application to VPN PE Protection.........................25
   5.6. Bypassing Legacy Nodes...................................25
   5.7. Capability Advertisement.................................27
   5.8. Constraining the Dissemination Scope of Fast Notification
        Packets..................................................27
      5.8.1. Pre-Configured FN TTL Setting.......................27
      5.8.2. Advanced FN Scoping.................................28
6. Protection against Replay Attacks.............................28
   6.1. Calculating LSDB Digest..................................29
7. Security Considerations.......................................30
8. IANA Considerations...........................................30
9. References....................................................30
   9.1. Normative References.....................................30
   9.2. Informative References...................................31
10. Acknowledgments..............................................34
Appendix A. Memory Needs of a Naive Implementation...............35
   A.1.
An Example Implementation................................35 112 A.2. Estimation of Memory Requirements........................36 113 A.3. Estimation of Failover Time..............................37 114 Appendix B. Impact Scope of Fast Notification....................39 116 1. Introduction 118 Convergence of link-state IGPs, such as OSPF or IS-IS, after a link 119 or node failure is known to be relatively slow. While this may be 120 sufficient for many applications, some network SLAs and applications 121 require faster reaction to network failures. 123 IGP convergence time is composed mainly of: 125 1. Failure detection at nodes adjacent to the failure 127 2. Advertisement of the topology change 129 3. Calculation of new routes 131 4. Installing new routes to linecards 132 Traditional Hello-based failure detection methods of link-state IGPs 133 are relatively slow, hence a new, optimized, Hello protocol has been 134 standardized [BFD] which can reduce failure detection times to the 135 range of 10ms even if no lower layer notices the failure quickly 136 (like loss of signal, etc.). 138 Even with fast failure detection, reaction times of IGPs may take 139 several seconds, and even with a tuned configuration it may take at 140 least a couple of hundreds of milliseconds. 142 To decrease fail-over time even further, IPFRR techniques [RFC5714], 143 can be introduced. IPFRR solutions compliant with [RFC5714] are 144 targeting fail-over time reduction of steps 2-4 with the following 145 design principles: 147 IGP IPFRR 149 2. Advertisement of the ==> No explicit advertisement, 150 topology change only local repair 152 3. Calculation of new routes ==> Pre-computation of new 153 routes 155 4. Installing new routes ==> Pre-installation of backup 156 to linecards routes 158 Pre-computing means that the way of bypassing a failed resource is 159 computed before any failure occurs. In order to limit complexity, 160 IPFRR techniques typically prepare for single link, single node and 161 single Shared Risk Link Group (SRLG) failures, which failure types 162 are undoubtedly the most common ones. The pre-calculated backup 163 routes are also downloaded to linecards in preparation for the 164 failure, in this way sparing the lengthy communication between 165 control plane and data plane when a failure happens. 167 The principle of local rerouting requires forwarding a packet along a 168 detour even if only the immediate neighbors of the failed resource 169 know the failure. IPFRR methods observing the local rerouting 170 principle do not explicitly propagate the failure information. 171 Unfortunately, packets on detours must be handled in a different way 172 than normal packets as otherwise they might get returned to the 173 failed resource. Rephrased, a node not having *any* sort of 174 information about the failure may loop the packet back to the node 175 from where it was rerouted - simply because its default 176 routing/forwarding configuration dictates that. As an example, see 177 the following figure. Assuming a link failure between A and Dst, A 178 needs to drop packets heading to Dst. If node A forwarded packets to 179 Src, and if the latter had absolutely no knowledge of the failure, a 180 loop would be formed between Src and A. 
              +---+                 +---+
              | B |-----------------| C |
              +---+                 +---+
             /                           \
            /                             \
           /                               \
      +---+             +---+   failure   +---+
      |Src|-------------| A |------X------|Dst|
      +---+             +---+             +---+
        =========>==============>=========>
                    Primary path

   Figure 1 Forwarding inconsistency in case of local repair: the path
            from Src to Dst leads through A

The basic problem that previous IPFRR solutions struggle to solve is, therefore, to provide consistent routing hop-by-hop without explicit signaling of the failure.

To provide protection for all single failure cases in arbitrary topologies, the information about the failure must be given in *some* way to other nodes. That is, IPFRR solutions targeting full failure coverage need to signal the fact and, to some extent, the identity of the failure within the data packet, as no explicit signaling is allowed. Such solutions have turned out to be considerably complex and hard or impossible to implement practically. The Loop Free Alternates (LFA) solution [RFC5286] does not give the failure information in any way to other routers, and so it cannot repair all failure cases, such as the one in Figure 1.

As discussed in Section 2, solutions that address full failure coverage and rely on local repair, i.e. carrying some failure information within the data packets, present an overly complex and therefore often impractical alternative to LFA. This draft, therefore, suggests that relaxing the local re-routing principle with carefully engineered explicit failure signaling is an effective approach.

The idea of using explicit failure notification for IPFRR has been proposed before for Remote LFA Paths [RLFAP]. RLFAP sends explicit notifications and can limit the radius in which the notification is propagated to enhance scalability. Design, implementation and enhancements for the remote LFAP concept are reported in [Hok2007], [Hok2008] and [Cev2010].

This draft attempts to work out in more detail what kind of failure dissemination mechanism is required to facilitate remote repair efficiently. Requirements for explicit signaling are given in Section 3. This draft does not limit the failure advertisement radius, as opposed to RLFAP. As a result, the detour paths remain stable in most cases, since they are identical to those that the IGP will calculate after IGP convergence. Hence, micro-loops will not occur after IGP convergence.

A key contribution of this memo is to recognize that a Fast Notification service is not only an enabler for a new IPFRR approach but also a replacement for various OAM remote connectivity verification procedures such as multi-hop BFD. These previous methods impose considerable overhead on the network: (i) management of many OAM sessions; (ii) careful configuration of the connectivity verification packet interval so that no false alarm is raised for network-internal failures that are handled by other mechanisms; and (iii) packet processing overhead, since connectivity verification packets have to be transmitted continuously through the network in a mesh, even in fault-free conditions.

2. Overview of current IPFRR Proposals based on Local Repair
The only practically feasible solution, Loop Free Alternates [RFC5286], offers the simplest resolution of the hop-by-hop routing consistency problem: a node performing fail-over may only use a next-hop as backup if it is guaranteed that it does not send the packets back. These neighbors are called Loop-Free Alternates (LFA). LFAs, however, do not always exist, as shown in Figure 1 above, i.e., node A has no LFAs with respect to Dst. While it is true that tweaking the network configuration may boost LFA failure case coverage considerably [Ret2011], LFAs cannot protect all failure cases in arbitrary network topologies.

The exact way of adding extra information to data packets and its usage for forwarding is the most important property that differentiates most existing IPFRR proposals.

Packets can be marked "implicitly", when they are not altered in any way, but some extra information owned by the router helps decide the correct way of forwarding. Such extra information can be, for instance, the direction of the packet, e.g., the incoming interface, as in [FIFR]. Such solutions require what is called interface-based or interface-specific forwarding.

Interface-based forwarding significantly changes the well-established nature of IP's destination-based forwarding principle, where the IP destination address alone determines the next hop. One embodiment would need to download different FIBs for each physical or virtual IP interface - not a very compelling idea. Another embodiment would alter the next-hop selection process by adding the incoming interface id also to the lookup fields, which would impact forwarding performance considerably.

Other solutions mark data packets explicitly. Some proposals suggest using free bits in the IP header [MRC], which unfortunately do not exist in the IPv4 header. Other proposals resort to encapsulating re-routed packets with an additional IP header as in e.g. [NotVia], [Eny2009] or [MRT-ARCH]. Encapsulation raises the problem of fragmentation and reassembly, which could be a performance bottleneck if many packets are sent at MTU size. Another significant problem is the additional management complexity of the encapsulation addresses, which have their own semantics and require cumbersome routing calculations, see e.g. [MRT-ALG]. Encapsulation in the IP header translates to label stacking in LDP-MPLS. The above-mentioned mechanisms either encode the active topology ID in a label on the stack or encode the failure point in a label, and also require an increasing mesh of targeted LDP sessions to acquire a valid label at the detour endpoint, which is another level of complexity.

3. Requirements of an Explicit Failure Signaling Mechanism

All local repair mechanisms touched on above try to avoid explicit notification of the failure via signaling, and instead try to embed some failure-related information into data packets. This is mainly due to the relatively low signaling performance of legacy hardware. Failure notification, therefore, should fulfill the following properties to be practically feasible:

1. The signaling mechanism should be reliable. The mechanism needs to propagate the failure information to all interested nodes even in a network where a single link or a node is down.
2. The mechanism should be fast in the sense that getting the notification packet to remote nodes through possibly multiple hops should not require (considerably) more processing at each hop than plain fast path packet forwarding.

3. The mechanism should involve simple and efficient processing to be feasible for implementation in the dataplane. This goal manifests itself in three ways:

   a. Origination of a notification should be very easy, e.g. creating a simple IP packet, the payload of which can be filled easily.

   b. When receiving the packet, it should be easy to recognize by dataplane linecards so that processing can commence after forwarding.

   c. No complex operations should be required in order to extract the information from the packet needed to activate the correct backup routes.

4. The mechanism should be trustable; that is, it should provide means to verify the authenticity of the notifications without significantly increasing the processing burden in the dataplane.

5. Duplication of notification packets should be either strictly bounded or handled without significant dataplane processing burden.

These requirements present a trade-off. A proper balance needs to be found that offers good enough authentication and reliability while keeping processing complexity sufficiently low to be feasible for data plane implementation. One such solution is proposed in [fn-transport], which is the assumed notification protocol in the following.

4. Conceptual Operation of IPFRR relying on Fast Notification

This section outlines the operation of an IPFRR mechanism relying on Fast Notification.

4.1. Preparation Phase

Like any other IPFRR solution, IPFRR-FN also requires quick failure detection mechanisms in place, such as lower layer upcalls or BFD.

The FN service needs to be activated and configured so that FN disseminates the information identifying the failure to the area once triggered by a local failure detection method.

Based on the detailed topology database obtained from a link state IGP, the node should pre-calculate alternative paths considering *relevant* link or node failures in the area. Failure specific alternative path computation should typically be executed at lower priority than other routing processing. Note that the calculation can be done "offline", while the network is intact and the CP has few things to do.

Also note the word *relevant* above: a node does not need to compute all the shortest paths with respect to each possible failure; only those link failures that are in the shortest path tree rooted at the node need to be taken into consideration.

To provide protection for Autonomous System Border Router (ASBR) failures, the node will need information not only from the IGP but also from BGP. This is described in detail in Section 5.3.

After calculating the failure specific alternative next-hops, only those that represent a change to the primary next-hop should be pre-installed in the linecards, together with the identifier of the failure that triggers the switch-over. In order to preserve scalability, external prefixes are handled through FIB indirection, which is available in most routers already. Due to indirection, backup routes need to be installed only for egress routers. A simple illustration of such pre-installed state is sketched below.
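The pre-installed state described above can be pictured as a small failure-keyed table next to the ordinary FIB. The fragment below is a minimal, purely illustrative sketch in Python (the names follow the example of Figure 1 but are otherwise invented and not mandated by this draft): only next-hops that differ from the primary ones are stored, keyed by the failure identifier that later arrives in the FN message, so that activation is a simple lookup-and-swap.

   # Illustrative only: node A's pre-installed, failure specific backup
   # next-hops. A real linecard would keep equivalent state in its
   # FIB/adjacency tables.

   PRIMARY_NH = {
       "Dst": "link-A-Dst",       # destination -> primary next-hop
       "Src": "link-A-Src",
   }

   # Only entries that differ from the primary next-hop are stored,
   # keyed by the identifier of the failure they prepare for.
   BACKUP_NH = {
       ("link", "A-Dst"): {"Dst": "link-A-Src"},  # detour via Src, B, C
       ("node", "Dst"):   {},                     # Dst itself gone: nothing to repair
   }

   def activate_backups(failure_id, fib):
       """Swap in the pre-installed backups for one failure; no route
       computation is performed at activation time."""
       for dest, backup_nh in BACKUP_NH.get(failure_id, {}).items():
           fib[dest] = backup_nh

   fib = dict(PRIMARY_NH)
   activate_backups(("link", "A-Dst"), fib)
   print(fib["Dst"])   # link-A-Src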
(The resource needs of an example implementation are briefly discussed in Appendix A.)

4.2. Failure Reaction Phase

The main steps to be taken after a failure are the following:

1. Quick dataplane failure detection

2. Send information about the failure using the FN service directly from the dataplane.

3. Forward the received notification as defined by the FN protocol actually used, such as the one in [fn-transport]

4. After learning about a local or remote failure, extract the failure identifier and activate failure specific backups in all the linecards where needed, directly within the dataplane

5. Start forwarding data traffic using the updated FIB

After a node detects the loss of connectivity to another node, it should decide whether the failure can be handled locally. If local repair is not possible or not configured, for example because LFA is not configured or there are destinations for which no LFA exists, a failure should trigger the FN service to disseminate the failure description. For instance, if BFD detects a dataplane failure, it should not only invoke the routines that notify the control plane but should trigger FN first, before notifying the CP.

After receiving the trigger, without any DP-CP communication involved, FN constructs a packet and adds the description of the failure (defined in Section 5.2) to the payload. The notification describes that

o a node X has lost connectivity

o to a node Z

o via a link L.

The proposed encoding of the IPFRR-FN packet is described in Section 5.2.

The packet is then disseminated by the FN service in the routing area. Note the analogy between BFD and IGP Hellos on the one hand, and between FN and IGP link state advertisements on the other: BFD is a dataplane-optimized implementation of the routing protocol's Hello mechanism, while Fast Notification is a dataplane-optimized implementation of the link state advertisement flooding mechanism of IGPs.

At each hop, the recipient node needs to perform a "punt and forward". That is, the FN packet not only needs to be forwarded to the FN neighbors as the specific FN mechanism dictates, but a replica also needs to be detached and, after forwarding and after notifying the other linecards of the same node (the exact way of doing this is out of the scope of this document), processed by the dataplane card.

4.2.1. Activating Failure Specific Backups

After the forwarding element has extracted the contents of the notification packet, it knows that a node X has lost connectivity to a node Z via a link L. The recipient now needs to decide whether the failure was a link or a node failure. Two approaches can be thought of. Both options are based on the property that notifications advance in the network as fast as possible.

In the first option, the router does not immediately make the decision, but instead starts a timer set to fire after a couple of milliseconds. If the failure was a node failure, the node will receive further notifications saying that another node Y has lost connectivity to node Z through another link M. That is, if node Z is common in multiple notifications, the recipient can conclude that it is a node failure and already knows which node it is (Z). If link L is common, then the recipient can conclude that it is a link failure (L).
If 457 further inconclusive notifications arrive, then it means multiple 458 failures which case is not in scope for IPFRR, and is left for 459 regular IGP convergence. 461 After concluding about the exact failure, the data plane element 462 needs to check in its pre-installed IPFRR database whether this 463 particular failure results in any route changes. If yes, the linecard 464 replaces the next-hops impacted by that failure with their failure 465 specific backups which were pre-installed in the preparation phase. 467 In the second option, the first received notification is handled 468 immediately as a link failure, hence the router may start replacing 469 its next-hops. In many cases this is a good decision, as it has been 470 shown before that most network failures are link failures. If, 471 however, another notification arrives a couple of milliseconds later 472 that points to a node failure, the router then needs to start 473 replacing its next-hops again. This may cause a route flap but due to 474 the quick dissemination mechanism the routing inconsistency is very 475 short lived and likely takes only a couple of milliseconds. 477 4.2.2. SRLG Handling 479 The above conceptual solution is easily extensible to support pre- 480 configured SRLGs. Namely, if the failed link is part of an SRLG, then 481 the disseminated link ID should identify the SRLG itself. As a 482 result, further possible notifications describing other link failures 483 of the same SRLG will identify the same resource. 485 Note that an SRLG is just a set of links, thus links sharing their 486 fate with no other link can be considered as SRLG sets having only 487 one element. Thus, we don't need to distinguish link and SRLG IDs, 488 instead only send SRLG ID in the FN packets, and some of these SRLGs 489 may have only a single element. 491 If the control plane knows about SRLGs, it can prepare for failures 492 of these, e.g. by calculating a path that avoids all links in that 493 SRLG. SRLG identifier may have been pre-configured or have been 494 obtained by automated mechanisms such as [RFC4203]. 496 4.3. Example and Timing 498 The main message of this section is that big delay links do not 499 represent a problem for IPFRR-FN. The FN message of course propagates 500 on long-haul links slower but the same delay is incurred by normal 501 data packets as well. Packet loss only takes place as long as a node 502 forwards traffic to an incorrect or inconsistent next-hop. This may 503 happen in two cases: 505 First, as long as the failure is not detected, the node adjacent to 506 the failure only has the failed next-hop installed. 508 Secondly, when a node (A) selects a new next-hop (B) after detecting 509 the failure locally or by receiving an FN, the question is if the 510 routing in the new next-hop (B) is consistent by the time the first 511 data packets get from A to B. 
The following timeline depicts the 512 situation: 514 Legend: X : period with packet loss 515 FN forwarding delay: |--| 517 |--|--------| 518 A:----1XX2XXXXXXXX3-------------------------------------------------- 519 |----| |----| 520 B:------------4--5-----6XX7------------------------------------------ 521 |--|--------| 523 (a) Link delay is |----| FIB update delay is |--------| 525 |--|--------| 526 A:----1XX2XXXXXXXX3-------------------------------------------------- 527 |---------------| 528 |---------------| 529 B:-----------------------4--5-----6XX7------------------------------- 530 |--|--------| 532 (b) Link delay is |---------------| FIB update delay is |--------| 534 Figure 2 Timing of FN and data packet forwarding 536 As can be seen above, the outage time is only influenced by the FN 537 forwarding delay and the FIB update time. The link delay is not a 538 factor. Node A forwards the first re-routed packets from time 539 instance 3 to node B. These reach node B at time instance 6. Node B 540 is doing incorrect/inconsistent forwarding when it tries to forward 541 those packets back to A which have already been put onto a detour by 542 A. This is the interval between time instances 6 and 7. 544 4.4. Scoping FN Messages with TTL 546 In a large routing area it is often the case that a failure (i.e. a 547 topology change) causes next-hop changes only in routers relatively 548 close to the failure. Analysis of certain random topologies and two 549 example ISP topologies revealed that a single link failure event 550 generated routing table changes only in routers not more than 2 hops 551 away from the failure site for the particular topologies under study 552 [Hok2008]. Based on this analysis, it is anticipated that in practice 553 the TTL for failure notification messages can be set to a relatively 554 small radius, perhaps as small as 2 or 3 hops. 556 A chief benefit of TTL scoping is that it reduces the overhead on 557 routers that have no use for the information (i.e. which do not need 558 to re-route). Another benefit (that is particularly important for 559 links with scarce capacity) is that it helps to constrain the control 560 overhead incurred on network links. Determining a suitable TTL value 561 for each locally originated event and controlling failure 562 notification dissemination, in general, is discussed further in 563 Section 5.8. 565 5. Operation Details 567 5.1. Transport of Fast Notification Messages 569 This draft recommends that out of the several FN delivery options 570 defined in [fn-transport], the flooding transport option is 571 preferred, which ensures that any event can reach each node from any 572 source with any failure present in the network area as long as 573 theoretically possible. Flooding also ensures that FN messages reach 574 each node on the shortest (delay) path, and as a side effect failure 575 notifications always reach *each* node *before* re-routed data 576 packets could reach that node. This means that looping is minimized. 578 [fn-transport] describes that the dataplane flooding procedure 579 requires routers to perform duplicate checking before forwarding the 580 notifications to other interfaces to avoid duplicating notifications. 581 [fn-transport] describes that duplicate check can be performed by a 582 simple storage queue, where previously received notification packets 583 or their signatures are stored. 585 IPFRR-FN enables another duplicate check process that is based on the 586 internal state machine. 
Routers, after receiving a notification but before forwarding it to other peers, check the authenticity of the message, if authentication is used. The router may then compare the stored event with the event described by the received notification.

Two variables and a bit describe the known failure state:

o Suspected failed node ID (denoted by N)

o Suspected link/SRLG ID (denoted by S)

o Bit indicating the type of the failure, i.e. link/SRLG failure or node failure (denoted by T)

Recall that the incoming notification describes that a node X has lost connectivity to a node Z via a link L. Now, the state machine can be described with the following pseudo-code:

   //current state:
   //   N: ID of suspected failed node
   //   S: ID of suspected failed link/SRLG
   //   T: bit indicating the type of the failure
   //      T=0 indicates link/SRLG
   //      T=1 indicates node
   //
   Proc notification_received(Node Originator_X, Node Y, SRLG L) {
     if (N == NULL) {
       // this is a new event, store it and forward it
       N=Y;
       S=L;
       T=0; //which is the default anyway
       Forward_notification;
     }
     else if (S == L AND T == 0) {
       // this is the same link or SRLG as before, need not do
       // anything
       Discard_notification;
     }
     else if (N == Y) {
       // This is a node failure
       if (T == 0) {
         // Just now turned out that it is a node failure
         T=1;
         Forward_notification;
       }
       else {
         // Known before that it is a node failure,
         // no need to forward it
         Discard_notification;
       }
     }
     else {
       // multiple failures
     }
   }

   Figure 3 Pseudo-code of state machine for FN forwarding

5.2. Message Handling and Encoding

A failure identifier is needed that unambiguously describes the failed resource consistently among the nodes in the area. The semantics of the identifiers are defined by the IGP used to pre-calculate and pre-install the backup forwarding entries, e.g. OSPF or IS-IS.

This draft defines a Failure Identification message class. Members of this class represent a routing protocol specific Failure Identification message to be carried with the Fast Notification transport protocol. Each message within the Failure Identification message class shall contain the following fields, the lengths of which are routing protocol specific. The exact values shall be aligned with the working group of the routing protocol:

o Originator Router ID: the identifier of the router advertising the failure;

o Neighbour Router ID: the identifier of the neighbour node to which the originator lost connectivity.

o Link ID: the identifier of the link through which connectivity was lost to the neighbour. The routing protocol should assign the same Link ID for bidirectional, broadcast or multi-access links from each access point, consistently.

o Sequence Number: [fn-transport] expects the applications of the FN service that require replay attack protection to create and verify a sequence number in FN messages. Its use is described in Section 6.

Routers forwarding the FN packets should ensure that Failure Identification messages are not lost, e.g. due to congestion. FN packets can be put into a high-precedence traffic class (e.g. the Network Control class). If the network environment is known to be lossy, the FN sender should repeat the same notification a couple of times, like salvo fire.
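For concreteness, the following is a directly executable rendering of the duplicate-check state machine of Figure 3 in Section 5.1 (Python is used purely for illustration; an actual implementation would run in the dataplane, and the variable names simply mirror the pseudo-code):

   # Sketch of the per-area failure state machine: N, S and T as in
   # Figure 3; the return value is what the router does with the FN.

   class FailureState:
       def __init__(self):
           self.N = None   # suspected failed node
           self.S = None   # suspected failed link/SRLG
           self.T = 0      # 0 = link/SRLG failure, 1 = node failure

       def notification_received(self, originator, neighbour, srlg):
           if self.N is None:
               # new event: remember it and flood it further
               self.N, self.S, self.T = neighbour, srlg, 0
               return "forward"
           if self.S == srlg and self.T == 0:
               # same link or SRLG as before, nothing new
               return "discard"
           if self.N == neighbour:
               if self.T == 0:
                   # a second link towards the same node: node failure
                   self.T = 1
                   return "forward"
               return "discard"     # node failure already known
           return "multiple"        # multiple failures: left to the IGP

   # X and Y both report losing node Z -> concluded node failure of Z
   fs = FailureState()
   print(fs.notification_received("X", "Z", "L"))   # forward
   print(fs.notification_received("Y", "Z", "M"))   # forward (node failure)
   print(fs.notification_received("V", "Z", "K"))   # discard (already known)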
682 After the forwarding element processed the FN packet and extracted 683 the Failure Identification message, it should decide what backups 684 need to be activated if at all - as described in Section 4.2.1. 686 5.2.1. Failure Identification Message for OSPF and IS-IS 688 0 1 2 3 689 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 690 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 691 | FN Length | FN App Type | AuType|unused | 692 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 693 | Originator Router ID | 694 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 695 | Neighbour Router ID | 696 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 697 | Link/SRLG ID | 698 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 699 | Sequence Number | 700 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 701 | Sequence Number (cont'd) | 702 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 704 FN Header fields: 706 FN Length 707 The length of the Failure Identification message for OSPF and 708 IS-IS is 16 bytes. 710 FN App Type 711 The exact values are to be assigned by IANA for the Failure 712 Identification message class. For example, FN App Type values 713 between 0x0008 and 0x000F could represent Failure 714 Identification messages, from which 0x0008 could mean OSPF, 715 0x0009 could be ISIS. 717 AuType 718 IPFRR-FN relies on the authentication options offered the FN 719 transport service. Cryptographic authentication is recommended. 721 Originator Router ID 722 The originator's ID is the loopback IP address of the source of 723 the FN packet if such exists. If there are more loopback 724 addresses, the lowest address is used. If there is no loopback 725 address, the lowest interface address is used. 727 Neighbour Router ID 728 The ID of the neighbor to which connectivity was lost. This ID 729 is similar to the originator ID, i.e. this is the lowest 730 loopback IP address if such exists, or the lowest interface 731 address. 733 Link ID/SRLG ID 734 If the link is a single point-to-point link not belonging to 735 any SRLG, the Link ID is the lowest interface IP address 736 belonging to that link. Similarly, LAN interfaces (belonging to 737 the same SRLG) use the lowest interface IP belonging to that 738 LAN. 740 Sequence Number 741 This field stores a digest of the LSDB of the routing protocol, 742 as described in Section 6. 744 5.3. Protecting External Prefixes 746 5.3.1. Failure on the Intra-Area Path Leading to the ASBR 748 Installing failure specific backup next-hops for each external prefix 749 would be a scalability problem as the number of these prefixes may be 750 one or two orders of magnitude higher than intra-area destinations. 751 To avoid this, it is suggested to make use of indirection already 752 offered by router vendors. 754 Indirection means that when a packet needs to be forwarded to an 755 external destination, the IP address lookup in the FIB will not 756 return a direct result but a pointer to another FIB entry, i.e. to 757 the FIB entry of the ASBR. In LDP/MPLS this means that all prefixes 758 reachable through the same ASBR constitute the same FEC. 760 As an example, consider that in an area ASBR1 is the primary BGP 761 route for prefixes P1, P2, P3 and P4 and ASBR2 is the primary route 762 for prefixes P5, P6 and P7. A FIB arrangement for this scenario could 763 be the one shown on the following figure. 
Prefixes using the same ASBR could be resolved to the same pointer that references the next-hop leading to the ASBR. Prefixes resolved to the same pointer are said to be part of the same "prefix group" or FEC.

       FIB lookup           |       FIB lookup
                            |
   ASBR2 ========> NH2      |   ASBR2 ========> NH2 <----+
   ASBR1 ========> NH1      |   ASBR1 ========> NH1 <-+  |
                            |                         |  |
   P1    ========> NH1      |   P1    ========> Ptr1 -+  |
   P2    ========> NH1      |   P2    ========> Ptr1 -+  |
   P3    ========> NH1      |   P3    ========> Ptr1 -+  |
   P4    ========> NH1      |   P4    ========> Ptr1 -+  |
                            |                            |
   P5    ========> NH2      |   P5    ========> Ptr2 ----+
   P6    ========> NH2      |   P6    ========> Ptr2 ----+
   P7    ========> NH2      |   P7    ========> Ptr2 ----+
                            |

   Figure 4 FIB without (left) and with (right) indirection

If the next-hop to an ASBR changes, it is enough to update the next-hop of the ASBR route in the FIB. In the above example, this means that if the next-hop of ASBR1 changes, it is enough to update the route entry for ASBR1, and due to indirection through pointer Ptr1 this updates several prefixes at the same time.

5.3.2. Protecting ASBR Failures: BGP-FRR

IPFRR-FN can make use of alternative BGP routes advertised in an AS by new extensions of BGP such as [BGPAddPaths], [DiverseBGP] or [BGPBestExt]. Using these extensions, for each destination prefix, a node may learn a "backup" ASBR besides the primary ASBR learnt by normal BGP operation.

5.3.2.1. Primary and Backup ASBR in the Same Area

If the failed ASBR is inside the area, all nodes within that area get notified by FN. Grouping prefixes into FECs, however, needs to be done carefully. Prefixes now constitute a common group (i.e. are resolved to the same pointer) if *both* their primary AND their backup ASBRs are the same. This is due to the fact that even if two prefixes use the same ASBR by default, they may use different ASBRs when their common default ASBR fails.

Considering the previous example, let us assume that the backup ASBR of prefixes P1 and P2 is ASBR2 but that the backup ASBR of P3 and P4 is ASBR3. Let us further assume that P5 also has ASBR3 as its backup ASBR but P6 and P7 have ASBR4 as their backup ASBR. The resulting FIB structure is shown in the following figure:

       FIB lookup
   ASBR4 ========> NH4
   ASBR2 ========> NH2
   ASBR3 ========> NH3
   ASBR1 ========> NH1

   P1    ========> Ptr1 -+-> NH1
   P2    ========> Ptr1 -+

   P3    ========> Ptr2 -+-> NH1
   P4    ========> Ptr2 -+

   P5    ========> Ptr3 ---> NH2

   P6    ========> Ptr4 -+-> NH2
   P7    ========> Ptr4 -+

   Figure 5 Indirect FIB for ASBR protection

If, for example, ASBR1 goes down, this affects prefixes P1 through P4. In order to set the correct backup routes, the container referenced by Ptr1 needs to be updated to NH2 (next-hop of ASBR2), but the location referenced by Ptr2 needs to be updated to NH3 (next-hop of ASBR3). This means that P1 and P2 may constitute the same FEC, but P3 and P4 need to form another FEC so that their backups can be set independently.

Note that the routes towards ASBR2 or ASBR3 may have changed, too. For example, if after the failure ASBR3 were reached via a new next-hop NH5, then the container referenced by Ptr2 should be updated to NH5. A resulting detour FIB is shown in the following figure.
847 FIB lookup 848 ASBR4 ========> NH4 849 ASBR2 ========> NH2 850 ASBR3 ========> NH5 851 ASBR1 ========> X 853 P1 ========> Ptr1 -+-> NH2 854 P2 ========> Ptr1 -+ 856 P3 ========> Ptr2 -+-> NH5 857 P4 ========> Ptr2 -+ 859 P5 ========> Ptr3 ---> NH2 861 P6 ========> Ptr4 -+-> NH2 862 P7 ========> Ptr4 -+ 864 Figure 6 Indirect "detour" FIB in case of ASBR1 failure 866 During pre-calculation, the control plane pre-downloaded the failure 867 identifier of ASBR1 and assigned NH5 as the failure specific backup 868 for routes for ASBR3 and pointer Ptr2 and assigned NH2 as the failure 869 specific backup for the route referenced by Ptr1. 871 5.3.2.2. Primary and Backup ASBR in Different Areas 873 By default, the scope of FN messages is limited to a single routing 874 area. 876 The IPFRR-FN application of FN, may, however, need to redistribute 877 some specific notifications across areas in a limited manner. 879 If an ASBR1 in Area1 goes down and some prefixes need to use ASBR2 in 880 another Area2, then, besides Area1, routers in Area2 need to know 881 about this failure. Since communication between non-backbone areas is 882 done through the backbone areas, it may also need the information. 883 Naturally, if ASBR2 resides in the backbone area, then the FN of 884 ASBR1 failure needs to be leaked only to the backbone area. 886 Leaking is facilitated by area border routers (ABR). During failure 887 preparation phase, the routing engine of an ABR can determine that 888 for an intra-area ASBR the backup ASBR is in a different area to 889 which it is the ABR. Therefore, the routing engine installs such 890 intra-area ASBRs in an "FN redistribution list" at the dataplane 891 cards. 893 The ABR, after receiving FN messages, may conclude in its state 894 machine that a node failure happened. If this node failure is in the 895 redistribution list, the ABR will generate an FN with the following 896 data: 898 0 1 2 3 899 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 900 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 901 | 16 | 0x008 | AuType|unused | 902 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 903 | ABR Router ID | 904 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 905 | ASBR Router ID | 906 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 907 | 0x0 | 908 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 909 | Sequence Number | 910 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 911 | Sequence Number (cont'd) | 912 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 914 This message is then distributed to the neighbour area specified in 915 the redistribution list as a regular FN message. A Link ID of 0x0 916 specifically signals in the neighbour area that this failure is a 917 known node failure of the node specified by the "Neighbour Router ID" 918 field (which was set to the failed ASBR's ID). 920 ABRs in a non-backbone area need to prepare to redistribute ASBR 921 failure notifications from within their area to the backbone area. 923 ABRs in the backbone area need to prepare to redistribute an ASBR 924 failure notification from the backbone area to that area where a 925 backup ASBR resides. 927 Consider the previous example, but now let us assume that the current 928 area is Area0, ASBR2 and ASBR3 reside in Area1 (reachable through 929 ABR1) but ASBR 4 resides in Area2 (reachable through ABR2). 
The 930 resulting FIBs are shown in the following figures: in case of ASBR2 931 failure, only Ptr4 needs an update. 933 FIB lookup 934 ABR1 ========> NH6 935 ABR2 ========> NH7 937 (ASBR4 ========> NH7) //may or may not be in the FIB 938 (ASBR2 ========> NH6) //may or may not be in the FIB 939 (ASBR3 ========> NH6) //may or may not be in the FIB 940 (ASBR1 ========> NH1) //may or may not be in the FIB 942 P1 ========> Ptr1 -+-> NH1 943 P2 ========> Ptr1 -+ 945 P3 ========> Ptr2 -+-> NH1 946 P4 ========> Ptr2 -+ 948 P5 ========> Ptr3 ---> NH6 950 P6 ========> Ptr4 -+-> NH6 951 P7 ========> Ptr4 -+ 953 Figure 7 Indirect FIB for inter-area ASBR protection 955 FIB lookup 956 ABR1 ========> NH6 957 ABR2 ========> NH7 959 (ASBR4 =======> NH7) //may or may not be in the FIB 960 (ASBR2 =======> X ) //may or may not be in the FIB 961 (ASBR3 ========> NH6) //may or may not be in the FIB 962 (ASBR1 ========> NH1) //may or may not be in the FIB 964 P1 ========> Ptr1 -+-> NH1 965 P2 ========> Ptr1 -+ 967 P3 ========> Ptr2 -+-> NH1 968 P4 ========> Ptr2 -+ 970 P5 ========> Ptr3 ---> NH6 972 P6 ========> Ptr4 -+-> NH7 973 P7 ========> Ptr4 -+ 975 Figure 8 Indirect "detour" FIB for inter-area ASBR protection, ASBR2 976 failure 978 5.4. Application to LDP 980 It is possible for LDP traffic to follow paths other than those 981 indicated by the IGP. To do so, it is necessary for LDP to have the 982 appropriate labels available for the alternate so that the 983 appropriate out-segments can be installed in the forwarding plane 984 before the failure occurs. 986 This means that a Label Switching Router (LSR) running LDP must 987 distribute its labels for the Forwarding Equivalence Classes (FECs) 988 it can provide to all its neighbours, regardless of whether or not 989 they are upstream. Additionally, LDP must be acting in liberal label 990 retention mode so that the labels that correspond to neighbours that 991 aren't currently the primary neighbour are stored. Similarly, LDP 992 should be in downstream unsolicited mode, so that the labels for the 993 FEC are distributed other than along the SPT. 995 The above criteria are identical to those defined in [RFC5286]. 997 In IP, a received FN message may result in rewriting the next-hop in 998 the FIB. If LDP is applied, the label FIB also needs to be updated in 999 accordance with the new next-hop; in the LFIB, however, not only the 1000 outgoing interface needs to be replaced but also the label that is 1001 valid to this non-default next-hop. The latter is available due to 1002 liberal label retention and unsolicited downstream mode. 1004 5.5. Application to VPN PE Protection 1006 Protecting against (egress) PE router failures in VPN scenarios is 1007 conceptually similar to protecting against ASBR failures for Internet 1008 traffic. The difference is that in case of ASBR protection core 1009 routers are normally aware of external prefixes using iBGP, while in 1010 VPN cases P routers can only route inside the domain. In case of 1011 VPNs, tunnels running between ingress PE and egress PE decrease the 1012 burden for P routers. The task here is to redirect traffic to a 1013 backup egress PE. 1015 Egress PE protection effectively calls out for an explicit failure 1016 notification, yet existing proposals try to avoid it. 
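For illustration, the sketch below shows how an ingress PE that has pre-selected a backup egress PE for each prefix group might react to an FN reporting the primary egress PE as failed. It simply reuses the indirection idea of Section 5.3 with egress PE tunnels in place of ASBR next-hops; all names are invented for the example and no particular data structure is mandated by this draft.

   # Hypothetical ingress-PE state: VPN prefixes grouped by their
   # (primary, backup) egress PE pair, resolved through one pointer.

   tunnel_to = {"PE1": "tunnel-to-PE1", "PE2": "tunnel-to-PE2"}

   prefix_group = {
       "10.1.0.0/16": "grp-A",
       "10.2.0.0/16": "grp-A",
       "10.3.0.0/16": "grp-B",
   }
   group_state = {
       "grp-A": {"primary": "PE1", "backup": "PE2", "active": "PE1"},
       "grp-B": {"primary": "PE2", "backup": "PE1", "active": "PE2"},
   }

   def fn_node_failure(failed_pe):
       """On an FN indicating the node failure of an egress PE, repoint
       the affected prefix groups to their pre-selected backup PE."""
       for grp in group_state.values():
           if grp["active"] == failed_pe:
               grp["active"] = grp["backup"]

   def lookup(prefix):
       grp = group_state[prefix_group[prefix]]
       return tunnel_to[grp["active"]]

   fn_node_failure("PE1")
   print(lookup("10.1.0.0/16"))   # tunnel-to-PE2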
[I-D.bashandy-bgp-edge-node-frr] proposes that the P routers adjacent to the primary PE maintain the necessary routing state and perform the tunnel decaps/re-encaps to the backup PE, thereby introducing considerable complexity in P routers.

[I-D.ietf-pwe3-redundancy] describes a mechanism for pseudowire redundancy, where PE routers need to run multi-hop BFD sessions to detect the loss of a primary egress PE. This leads to a potentially full mesh of multi-hop BFD sessions, which represents tremendous complexity. In addition, in some cases the egress PE of the secondary PW might need to explicitly set the PW state from standby to active.

FN provides the needed mechanism to actively inform all nodes, including PE routers, that a failure happened, and it also identifies when the failure is a node failure. Furthermore, since both the ingress PE and the secondary egress PE are informed, all information is available for a proper switch-over. This is achieved without a full mesh of BFD sessions running all the time between PE routers.

5.6. Bypassing Legacy Nodes

Legacy nodes, while they can neither originate fast notifications nor process them, can be assumed to be able to forward the notifications. As [fn-transport] discusses, FN forwarding is based on multicast. It is safe to assume that legacy routers' multicast configuration can be set up statically so as to be able to propagate fast notifications as needed. However, since such legacy nodes will not perform duplicate checking, it is a good idea either to set the TTL to a low value or not to forward notifications to legacy nodes when there is a high number of them in the network.

When calculating failure specific alternative routes, IPFRR-FN capable nodes must consider legacy nodes as fixed directed links, since legacy nodes do not change packet forwarding in the case of a failure. There are situations when an IPFRR-FN capable node can, exceptionally, bypass a non-IPFRR-FN capable node in order to handle a remote failure.

As an example, consider the topology depicted in Figure 9, where the link between C and D fails. C cannot locally repair the failure.

   +---+    +---+    +---+    +---+
   | E |----| F |----| G |----| H |
   +---+    +---+    +---+    +---+
     |              /           |
     |           /              |
     |        /                 |
   +---+    +---+    +---+    +---+
   | A |----| B |----| C |-X--| D |
   +---+    +---+    +---+    +---+
     >========>==============>
        Traffic from A to D

   Figure 9 Example for bypassing legacy nodes

First, let us assume that each node is IPFRR-FN capable. C would advertise the failure information using FN. Each node learns that the link between C and D has failed, as a result of which C changes its forwarding table to send any traffic destined to D via B. B also makes a change, replacing its default next-hop (C) with G. Note that other nodes do not need to modify their forwarding at all.

Now, let us assume that B is a legacy router that does not support IPFRR-FN but is statically configured to multicast fast notifications as needed. As such, A will receive the notification. A's pre-calculations have been done knowing that B is unable to correct the failure. Node A, therefore, has pre-calculated E as the failure specific next-hop. Traffic entering at A and heading to D can thus be repaired.

5.7. Capability Advertisement
The solution requires nodes to know which other nodes in the area are capable of IPFRR-FN. The most straightforward way to achieve this is to rely on the Router Capability TLVs available both in OSPF [RFC4970] and in IS-IS [RFC4971].

5.8. Constraining the Dissemination Scope of Fast Notification Packets

As discussed earlier in Section 4.4, it is desirable to constrain the dissemination scope of failure notification messages. This section presents three candidate methods for controlling the scope of failure notification: (1) pre-configure the TTL for FN messages in routers based on best current practices and related studies of available ISP and enterprise network topologies; (2) dynamically calculate the minimum TTL value needed to ensure 100% remote LFAP coverage; and (3) dynamically calculate the set of neighbours to which an FN message should be forwarded, given the identity of the link that has failed.

These candidate dissemination options are mechanisms with different levels of optimality and complexity. The intent here is to present some options that will generate further discussion on the trade-offs between different FN message scoping methods.

5.8.1. Pre-Configured FN TTL Setting

As discussed earlier in Section 4.4, studies of various network topologies suggest that a fixed TTL setting of 2 hops may be sufficient to ensure failure notification coverage for typical OSPF area topologies. Therefore, a potentially simple solution for constraining FN message dissemination is for network managers to configure their routers with a fixed TTL setting (e.g., TTL=2 hops) for FN messages. This TTL setting can be adjusted by network managers to account for implementation-specific details of the topology, such as configuring a larger TTL setting for topologies containing, say, large ring sub-graph structures.

In terms of performance trade-offs, pre-configuring the FN TTL, since it is fixed at configuration time, incurs no computational overhead for the router. On the other hand, it represents a configurable router parameter that network administrators must manage. Furthermore, the fixed, pre-configured FN TTL approach is sub-optimal in terms of constraining the FN dissemination, as most single link events will not require FN messages to be sent up to TTL hops away from the failure site.

5.8.2. Advanced FN Scoping

While the static pre-configured setting of the FN TTL will likely work in practice for a wide range of OSPF area topologies, it has at least two weaknesses: (1) there may be certain topologies for which the TTL setting happens to be insufficient to provide the needed failure coverage; and (2) as discussed above, it tends to result in FN being disseminated to a larger radius than needed to facilitate re-routing.

The solution to these drawbacks is for each router to dynamically compute the FN TTL radius needed for each of the local links it monitors. Doing so addresses the two weaknesses of a pre-configured TTL setting by computing a custom TTL setting for each local link that matches exactly the FN message radius for the given topology. The drawback, of course, is the additional computations.
However, given 1150 a quasi-static network topology, it is possible this dynamic FN TTL 1151 computation is performed infrequently and, therefore, on average 1152 incurs relatively small computation overhead. 1154 While a pre-configured TTL eliminates computation overhead at the 1155 expense of FN dissemination overhead and dynamic updates of the TTL 1156 settings achieve better dissemination efficiency by incurring some 1157 computational complexity, directed FN message forwarding attempts to 1158 minimize the FN dissemination scope by leveraging additional 1159 computation power. Here, rather than computing a FN TTL setting for 1160 each local link, a network employing directed forwarding has each 1161 router instance R compute the sets of one-hop neighbors to which a FN 1162 message must be forwarded for every possible failure event in the 1163 routing area. This has the beneficial effect of constraining the FN 1164 scope to the direction where there are nodes that require the FN 1165 update as opposed to disseminating to the entire TTL hop radius about 1166 a failure site. The trade off here, of course, is the additional 1167 computation complexity incurred and the maintenance of forwarding 1168 state for each possible failure case. Reference [Cev2010] gives an 1169 algorithm for finding, for each failure event, the direct neighbors 1170 to which the notification should be forwarded. 1172 6. Protection against Replay Attacks 1174 To defend against replay attacks, recipients should be able to ignore 1175 a re-sent recording of a previously sent FN packet. This suggests 1176 that some sort of sequence number should be included in the FN 1177 packet, the verification of which should not need control plane 1178 involvement. Since the solution should be simple to implement in the 1179 dataplane, maintaining and verifying per-source sequence numbers is 1180 not the best option. 1182 We propose, therefore, that messages should be stamped with the 1183 digest of the actual routing configuration, i.e., a digest of the 1184 link state database of the link state routing protocol. The digest 1185 has to be picked carefully, so that if two LSDBs describe the same 1186 connectivity information, their digest should be identical as well, 1187 and different LSDBs should result in different digest values with 1188 high probability. 1190 The conceptual way of handling these digests could be the following: 1192 o When the LSDB changes, the IGP re-calculates the digest and 1193 downloads the new value to the dataplane element(s), in a secure 1194 way. 1196 o When a FN packet is originated, the digest is put into the FN 1197 message into the Sequence Number field. 1199 o Network nodes distribute (forward) the FN packet. 1201 o When processing, the dataplane element first performs an 1202 authentication check of the FN packet, as described in [fn- 1203 transport]. 1205 o Finally, before processing the failure notification, the dataplane 1206 element should check whether its own known LSDB digest is 1207 identical with the one in the message. 1209 If due to a failure event a node disseminates a failure notification 1210 with FN, an attacker might capture the whole packet and re-send it 1211 later. If it resends the packet after the IGP re-converged on the new 1212 topology, the active LSDB digest is different, so the packet can be 1213 ignored. 
1236 7. Security Considerations

1238 The IPFRR application of Fast Notification does not raise further 1239 known security considerations in addition to those already present in 1240 Fast Notification itself. If an attacker could send false Failure 1241 Identification Messages or could hinder the transmission of legitimate 1242 messages, then the network would produce undesired routing 1243 behavior. These issues, however, should be solved in [fn-transport].

1245 IPFRR-FN relies on the authentication mechanism provided by the Fast 1246 Notification transport protocol [fn-transport]. The specification of 1247 the FN transport protocol requires applications to protect against 1248 replay attacks with application-specific sequence numbers. This 1249 draft, therefore, describes its own proposed sequence number in 1250 Section 6.

1252 8. IANA Considerations

1254 The Failure Identification message types need to be allocated a value 1255 in the FN App Type field.

1257 IPFRR-FN capability needs to be allocated within the Router Capability 1258 TLVs both in OSPF [RFC4970] and in IS-IS [RFC4971].

1260 9. References

1262 9.1. Normative References

1264 [RFC5286] 1265 A. Atlas, A. Zinin, "Basic Specification for IP Fast 1266 Reroute: Loop-Free Alternates", RFC 5286, 2008.

1269 [fn-transport] 1270 W. Lu, S. Kini, A. Csaszar, G. Enyedi, J. Tantsura, A. 1271 Tian, "Transport of Fast Notification Messages", 1272 draft-lu-fn-transport, 2011

1274 [RFC4970] 1275 A. Lindem et al., "Extensions to OSPF for Advertising 1276 Optional Router Capabilities", RFC 4970, 2007

1278 [RFC4971] 1279 JP. Vasseur et al., "Intermediate System to Intermediate 1280 System (IS-IS) Extensions for Advertising Router 1281 Information", RFC 4971, 2007

1283 [RFC4203] 1284 K. Kompella, Y. Rekhter, "OSPF Extensions in Support of 1285 Generalized Multi-Protocol Label Switching (GMPLS)", 1286 RFC 4203, 2005

1288 9.2. Informative References

1290 [BFD] 1291 D. Katz, D. Ward, "Bidirectional Forwarding Detection (BFD)", 1292 RFC 5880, IETF, 2010

1294 [RFC5714] 1295 M. Shand, S. Bryant, "IP Fast Reroute Framework", RFC 5714, 1296 IETF, 2010.

1298 [Cev2010] 1299 S. Cevher, T. Chen, I. Hokelek, J. Kang, V. Kaul, Y.J. Lin, 1300 M. Pang, R. Rodoper, S. Samtani, C. Shah, J. Bowcock, G. B. 1301 Rucker, J. L. Simbol and A. Staikos, "An Integrated Soft 1302 Handoff Approach in IP Fast Reroute in Wireless Mobile 1303 Networks", In Proceedings IEEE COMSNETS, 2011.

1305 [Eny2009] 1306 G. Enyedi, P. Szilagyi, G. Retvari, A. 1307 Csaszar, "IP Fast ReRoute: Lightweight Not-Via without 1308 Additional Addresses", IEEE INFOCOM Mini-Conference, Rio de 1309 Janeiro, Brazil, 2009.
1311 [FIFR] 1312 J. Wang, S. Nelakuditi, "IP Fast Reroute with Failure 1313 Inferencing", In Proceedings of ACM SIGCOMM Workshop on 1314 Internet Network Management - The Five-Nines Workshop, 1315 2007.

1317 [Hok2007] 1318 I. Hokelek, M. A. Fecko, P. Gurung, S. Samtani, J. Sucec, 1319 A. Staikos, J. Bowcock and Z. Zhang, "Seamless Soft Handoff 1320 in Wireless Battlefield Networks Using Local and Remote 1321 LFAPs", In Proceedings IEEE MILCOM, 2007.

1323 [Hok2008] 1324 I. Hokelek, S. Cevher, M. A. Fecko, P. Gurung, S. Samtani, 1325 Z. Zhang, A. Staikos and J. Bowcock, "Testbed 1326 Implementation of Loop-Free Soft Handoff in Wireless 1327 Battlefield Networks", In Proceedings of the 26th Army 1328 Science Conference, December 1-4, 2008.

1330 [MRC] 1331 T. Cicic, A. F. Hansen, A. Kvalbein, M. Hartmann, R. 1332 Martin, M. Menth, S. Gjessing, O. Lysne, "Relaxed Multiple 1333 Routing Configurations: IP Fast Reroute for Single and 1334 Correlated Failures", IEEE Transactions on Network and 1335 Service Management, available online: 1336 http://www3.informatik.uni-wuerzburg.de/staff/menth/Publications/papers/Menth08-Sub-4.pdf, 1338 September 2010.

1340 [NotVia] 1341 S. Bryant, M. Shand, S. Previdi, "IP Fast Reroute Using 1342 Not-via Addresses", Internet Draft, draft-ietf-rtgwg-ipfrr-notvia-addresses, 1343 2010.

1345 [RLFAP] 1346 I. Hokelek, M. Fecko, P. Gurung, S. Samtani, S. Cevher, J. 1347 Sucec, "Loop-Free IP Fast Reroute Using Local and Remote 1348 LFAPs", Internet Draft, draft-hokelek-rlfap-01 (expired), 1349 2008.

1351 [Ret2011] 1352 G. Retvari, J. Tapolcai, G. Enyedi, A. Csaszar, "IP Fast 1353 ReRoute: Loop Free Alternates Revisited", In Proceedings of 1354 IEEE INFOCOM, 2011

1356 [ISISDigest] 1357 J. Chiabaut and D. Fedyk, "IS-IS Multicast Synchronization 1358 Digest", available online: 1359 http://www.ieee802.org/1/files/public/docs2008/aq-fedyk-ISIS-digest-1108-v1.pdf, 1360 Nov 2008.

1362 [BGPAddPaths] 1363 D. Walton, A. Retana, E. Chen, J. Scudder, "Advertisement 1364 of Multiple Paths in BGP", draft-ietf-idr-add-paths, Work 1365 in progress

1367 [DiverseBGP] 1368 R. Raszuk et al., "Distribution of Diverse BGP Paths", 1369 draft-ietf-grow-diverse-bgp-path-dist, Work in progress

1371 [BGPBestExt] 1372 P. Marques, R. Fernando, E. Chen, P. Mohapatra, H. Gredler, 1373 "Advertisement of the Best External Route in BGP", 1374 draft-ietf-idr-best-external, Work in progress

1376 [BRITE] 1377 Oliver Heckmann et al., "How to Use Topology Generators to 1378 Create Realistic Topologies", Technical Report, Dec 2002.

1380 [MRT-ARCH] 1381 A. Atlas et al., "An Architecture for IP/LDP Fast-Reroute 1382 Using Maximally Redundant Trees", Internet Draft, 1383 draft-ietf-rtgwg-mrt-frr-architecture-01, 2012

1385 [MRT-ALG] 1386 A. Atlas, G. Enyedi, A. Csaszar, "Algorithms for Computing 1387 Maximally Redundant Trees for IP/LDP Fast-Reroute", 1388 Internet Draft, draft-enyedi-rtgwg-mrt-frr-algorithm-01, 1389 2012

1391 [I-D.ietf-pwe3-redundancy] 1392 P. Muley, M. Aissaoui, M. Bocci, "Pseudowire Redundancy", 1393 draft-ietf-pwe3-redundancy (Work in progress), May 2012

1395 [I-D.bashandy-bgp-edge-node-frr] 1396 A. Bashandy, B. Pithawala, K. Patel, "Scalable BGP FRR 1397 Protection against Edge Node Failure", 1398 draft-bashandy-bgp-edge-node-frr (Work in progress), March 2012

1400 10.
Acknowledgments

1402 The authors would like to thank Albert Tian, Wenhu Lu, Acee Lindem 1403 and Ibrahim Hokelek for the continuous discussions and comments on 1404 the topic, as well as Joel Halpern for his comments and review.

1406 Appendix A. Memory Needs of a Naive Implementation

1408 Practical experience might suggest that storing and maintaining 1409 backup next-hops for many potential remote failures could overwhelm 1410 the resources of router linecards. This section attempts to provide a 1411 calculation describing the approximate memory needs of a possible 1412 implementation in reasonably sized networks.

1414 A.1. An Example Implementation

1416 Let us suppose that for exterior destinations the forwarding engine 1417 uses recursive lookup or indirection in order to improve updating 1418 time, as described in Section 5.3. We also suppose that the 1419 concept of "prefix groups" is applied, i.e., there is an internal 1420 entity for the prefixes using exactly the same primary and backup 1421 ASBRs, and the next-hop entry for a prefix among them points to 1422 the next hop towards this entity. See e.g. Figure 7.

1424 In the sequel, the term "area" refers to an extended area, made up 1425 of the OSPF or IS-IS area containing the router, with the prefix 1426 groups added to the area as virtual nodes. Naturally, a prefix group 1427 is connected to the egress routers (ABRs) through which it can be 1428 reached. We just need to react to the failure ID of an ASBR for all 1429 the prefix groups connected to that ASBR; technically, we must 1430 suppose that one of the virtual links of each affected prefix 1431 group goes down.

1433 Here we show a simple naive implementation which can easily be 1434 improved upon in real routers. This implementation uses an array for all the nodes 1435 (including real routers and virtual nodes representing prefix groups) 1436 in the area (the node array in the sequel), made up of two pointers and a 1437 length field (an integer) per record. One of the pointers points to 1438 another array (called the alternative array). That second array is 1439 basically an enumeration containing the IDs of those failures 1440 influencing a shortest path towards that node, together with an alternative 1441 neighbor which can be used when such a failure occurs. When a 1442 failure is detected (either locally or by FN), we can easily find 1443 the proper record in all the lists. Moreover, since these arrays can 1444 be sorted by failure ID, we can even use binary search to 1445 find the needed record. The length of this array is stored in the 1446 record of the node array pointing to the alternative list.

1448 Now, we only need to know which records in the FIB should be 1449 updated. Therefore, there is a second pointer in the node array 1450 pointing to that record.

1452        +-------+-------+-------+--   --+-------+
1453        | r1    | r2    | r3    |  ...  | rk    |
1454        +-------+-------+-------+--   --+-------+
1455            |       |       |               |
1456            |       |       |               |
1457           \|/     \|/     \|/             \|/
1458            *       *       *               *
1459        +-------+-------+-------+--   --+-------+
1460        | fail1 | fail2 | fail3 |       | failk |
1461        | alt.1 | alt.2 | alt.3 |  ...  | alt.k |
1462        +-------+-------+-------+--   --+-------+
1463        | fail4 |       | fail5 |
1464        | alt.4 |       | alt.5 |
1465        +-------+       +-------+
1466        | fail6 |
1467        | alt.6 |
1468        +-------+

1470 Figure 10  The way of storing alternatives
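As an illustration of the structures in Figure 10, the following Python sketch shows one hypothetical way the node array and the failure-ID-sorted alternative arrays could be represented and searched; the class, field and variable names are invented for this example and are not mandated by this document.

      from bisect import bisect_left
      from dataclasses import dataclass, field
      from typing import List

      @dataclass
      class Alternative:
          failure_id: int      # ID of a failure influencing the shortest path
          alt_next_hop: str    # neighbour to use while this failure is present

      @dataclass
      class NodeRecord:
          fib_index: int                       # FIB record to rewrite on failover
          alternatives: List[Alternative] = field(default_factory=list)  # sorted

      def on_failure(node_array, fib, failure_id):
          # For each destination node, binary-search its alternative array; if
          # the failure influences the shortest path, rewrite the FIB next hop.
          for rec in node_array:
              ids = [a.failure_id for a in rec.alternatives]
              i = bisect_left(ids, failure_id)
              if i < len(ids) and ids[i] == failure_id:
                  fib[rec.fib_index] = rec.alternatives[i].alt_next_hop

      # Minimal usage example with one destination and one stored alternative.
      fib = {0: "nh-primary"}
      node_array = [NodeRecord(0, [Alternative(7, "nh-backup")])]
      on_failure(node_array, fib, 7)
      assert fib[0] == "nh-backup"

The binary search over the sorted alternative array is what keeps the per-destination update cost logarithmic in the array length, which is the property used in the estimations that follow.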
1472 A.2. Estimation of Memory Requirements

1474 Now, suppose that there are V nodes in the extended area, the network 1475 diameter is D, a neighbor descriptor takes X bytes, a failure ID 1476 takes Y bytes and a pointer takes Z bytes. We suppose that lookups for 1477 external prefixes use indirection, so we only need to deal with 1478 destinations inside the extended area. In this way, if there is no 1479 ECMP, this data structure takes

1481 (2*Z+Y)*(V-1) + 2*(X+Y)*D*(V-1)

1483 bytes altogether. The first part is the memory consumption of the 1484 node array. The second part is the memory needed by the alternative arrays: any path can 1485 contain at most D nodes and D links, each record needs X+Y bytes, and 1486 there are records for all the other nodes in the area (V-1 nodes). 1487 Observe that this is a very rough overestimation, since most of the 1488 possible failures influencing the path will not change the next hop.

1490 For computing memory consumption, suppose that neighbor descriptors, 1491 failure IDs and pointers take 4 bytes each, there are 10,000 nodes in the 1492 extended area (so both real routers and virtual nodes representing 1493 prefix groups are included) and the network diameter is 20 hops. In 1494 this case, we get that the node array needs about 120KB and the 1495 alternative arrays need about 3.2MB, so altogether about 3.4MB if there is 1496 no ECMP. Observe that the number of external prefixes is not 1497 important.

1499 If, however, there are paths with equal costs, the size of the 1500 alternative arrays increases. Suppose that there are 10 equal-cost paths 1501 between ANY two nodes in the network. This would cause the 1502 alternative lists to become 10 times bigger, so they would need a bit less 1503 than 32MB. Observe that the node array still needs only about 120KB, 1504 so 32MB is a good overestimation, which is likely acceptable for 1505 modern linecards with gigabytes of DRAM. Moreover, we need to stress here 1506 again that this is an extremely rough overestimation, so in reality 1507 much less memory will be enough. Furthermore, usually only protecting 1508 external prefixes is needed, so we only need to protect the paths 1509 towards the prefix groups, which further decreases both the size of the 1510 node array and the number of alternative lists.

1512 A.3. Estimation of Failover Time

1514 After a failure is detected, either locally or by using FN, the 1515 nodes need to change the entries in their FIB. Here we give a rough 1516 estimation to show that the previous implementation can do it in at 1517 most a few milliseconds.

1519 We suppose that we have the data structure described in the 1520 previous section. When a failure happens, we need to decide for each 1521 node in the node table whether the shortest path towards that 1522 destination was influenced by the failure. We can sort the elements 1523 in the alternative list, so we can use binary search, which needs 1524 ceil(log(2D)) memory accesses (log here has base 2) in the worst case. We 1525 need one more access to get the node list entry and another to 1526 rewrite the FIB.

1528 We suppose DDR3 SDRAM with a 64-byte cache line, which means that up to 1529 8 entries of the alternative list can be fetched from the RAM at a 1530 time, so the previous formula is modified as we need ceil(log(D/4))+2 1531 transactions per node. In this way, for D=20 and V=10,000 we need 1532 (3+2)*10,000=50,000 transactions. If we suppose 10 ECMP paths as 1533 previously, the alternative lists behave as if D=200, and we need (5+2)*10,000=70,000 transactions.
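To make the arithmetic above easier to check, the short Python sketch below evaluates the memory formula with the example parameters; it is purely illustrative and simply re-uses the rough per-record sizes assumed in the text.

      def memory_estimate(V, D, X=4, Y=4, Z=4, ecmp=1):
          # Node array: two pointers and a length field per destination node.
          node_array = (2 * Z + Y) * (V - 1)
          # Alternative arrays: at most D nodes and D links per path, X+Y
          # bytes per record, one array per destination, scaled by ECMP.
          alt_arrays = 2 * (X + Y) * D * (V - 1) * ecmp
          return node_array, alt_arrays

      print(memory_estimate(V=10000, D=20))           # ~120KB and ~3.2MB
      print(memory_estimate(V=10000, D=20, ecmp=10))  # node array unchanged,
                                                      # alternatives ~32MB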
1535 We can make a very conservative estimate by supposing a recent DDR3 1536 SDRAM module which can perform 5 million transactions per second (5 MT/s) with 1537 completely random access, so performing 50,000 or 70,000 transactions takes 10ms or 14ms. Keep in mind 1538 that we assumed that there is only one memory controller, that we always 1539 get the result of the search with the last read, and that all the 1540 alternative lists were full. Moreover, internal system latencies 1541 (e.g. multiple memory requests) were seriously overestimated, since a 1542 DDR3 SDRAM can reach even 6 times this speed with random access.

1544 Appendix B. Impact Scope of Fast Notification

1546 The memory and fail-over time calculations presented in Appendix A 1547 are based on worst-case estimation. They assume that, in a 1548 network with a diameter of 20 hops, each failure has a route-changing 1549 consequence for all routers across the full diameter.

1551 This section provides experimental results on real-world topologies, 1552 showing that 100% failure coverage can already be achieved within a 1553 2-hop radius around the failure.

1555 We performed the coverage analysis of the fast reroute mechanism 1556 presented here on realistic topologies, which were generated by the 1557 BRITE topology generator in bottom-up mode [BRITE]. The coverage 1558 percentage is defined here as the ratio, expressed as a percentage, of the number of usable 1559 backup paths protecting the primary paths that failed 1560 because of link failures to the number of all failed primary paths.

1562 The realistic topologies include AT&T and DFN, using pre-determined 1563 BRITE parameter values from [BRITE], and various random topologies 1564 with different numbers of nodes and varying network connectivity. For 1565 example, the numbers of nodes for AT&T and DFN are 154 and 30, 1566 respectively, while the number of nodes for the other random topologies 1567 is varied from 20 to 100. The BRITE parameters which are used in our 1568 topology generation process are summarized in Figure 11 (see [BRITE] 1569 for the details of each parameter). In summary, m represents the 1570 average number of edges per node and is set to either 2 or 3. A 1571 uniform bandwidth distribution in the range 100-1024 Mbps is selected, 1572 and the link cost is obtained deterministically from the link 1573 bandwidth (i.e., inversely proportional to the link bandwidth, as used 1574 by many vendors). Since the values for p(add) and beta determine the 1575 number of edges in the generated topologies, their values are varied 1576 to obtain network topologies with varying connectivity (e.g., sparse 1577 and dense).

1579   |----------------------------|-----------------------|
1580   |                            | Bottom up             |
1581   |----------------------------|-----------------------|
1582   | Grouping Model             | Random pick           |
1583   | Model                      | GLP                   |
1584   | Node Placement             | Random                |
1585   | Growth Type                | Incremental           |
1586   | Preferential Connectivity  | On                    |
1587   | BW Distribution            | Uniform               |
1588   | Minimum BW (Mbps)          | 100                   |
1589   | Maximum BW (Mbps)          | 1024                  |
1590   | m                          | 2-3                   |
1591   | Number of Nodes (N)        | 20,30,50,100,154      |
1592   | p(add)                     | 0.01,0.05,0.10,0.42   |
1593   | beta                       | 0.01,0.05,0.15,0.62   |
1594   |----------------------------|-----------------------|

1596 Figure 11  BRITE topology generator parameters

1598 The coverage percentage of our fast reroute method is reported for 1599 different network topologies (e.g., different numbers of nodes and 1600 varying network connectivity) using neighborhood depths of 0, 1, and 1601 2 (i.e., X=0, 1, and 2).
For a particular failure, backup routes 1602 protecting the failed primary paths are calculated only by those 1603 nodes which are within the selected radius of this failure. Note 1604 that these nodes are determined by the parameter X as follows: for 1605 X=0, the two nodes which are directly connected to the failed link; for 1606 X=1, the two nodes which are directly connected to the failed link and 1607 also the neighboring nodes which are adjacent to one of the outgoing 1608 links of these two nodes; and so on.

1610 The coverage percentage for a certain topology is computed by the 1611 following formula: Coverage Percentage = N_backupsexist*100/N_fpp, 1612 where N_backupsexist is the number of source-destination pairs whose 1613 primary paths failed because of link failures and which have backup 1614 paths protecting these failed paths, and N_fpp is the number of 1615 source-destination pairs whose primary paths failed because of 1616 link failures. The source-destination pairs in which the source and 1617 destination nodes do not have any physical connectivity after a 1618 failure are excluded from N_fpp. Note that the coverage percentage 1619 is a network-wide result, which is calculated by averaging all 1620 coverage results obtained by individually failing all edges of a 1621 certain network topology.

1623 Figure 12 shows the coverage percentage results for random topologies 1624 with different numbers of nodes (N) and network connectivity, and 1625 Figure 13 shows these results for the AT&T and DFN topologies. In these 1626 figures, E_mean represents the average number of edges per node for a 1627 certain topology. Note that the average number of edges per node is 1628 determined by the parameters m, p(add), and beta. We observed that 1629 E_mean increases when the p(add) and beta values increase. For each 1630 case, the coverage analysis is repeated for 10 topologies generated 1631 randomly by using the same BRITE parameters. E_mean and the coverage 1632 percentage are obtained by averaging the results of these ten 1633 experiments.

1635   |------------|-----|------|------|------|------|
1636   | Case       |N    |E_mean|X=0   |X=1   |X=2   |
1637   |------------|-----|------|------|------|------|
1638   |p(add)=0.01 |20   |3.64  |82.39 |98.85 |100.0 |
1639   |beta=0.01   |50   |3.86  |82.10 |98.69 |100.0 |
1640   |            |100  |3.98  |83.21 |98.04 |100.0 |
1641   |------------|-----|------|------|------|------|
1642   |p(add)=0.05 |20   |3.70  |85.60 |99.14 |100.0 |
1643   |beta=0.05   |50   |4.01  |84.17 |99.09 |100.0 |
1644   |            |100  |4.08  |83.35 |98.01 |100.0 |
1645   |------------|-----|------|------|------|------|
1646   |p(add)=0.1  |20   |5.52  |93.24 |100.0 |100.0 |
1647   |beta=0.15   |50   |6.21  |91.46 |99.87 |100.0 |
1648   |            |100  |6.39  |91.17 |99.86 |100.0 |
1649   |------------|-----|------|------|------|------|

1651 Figure 12  Coverage percentage results for random topologies

1653   |------------|-----------|------|------|------|------|
1654   | Case       |N          |E_mean|X=0   |X=1   |X=2   |
1655   |------------|-----------|------|------|------|------|
1656   |p(add)=0.42 |154 (AT&T) |6.88  |91.04 |99.81 |100.0 |
1657   |beta=0.62   |30 (DFN)   |8.32  |93.76 |100.0 |100.0 |
1658   |------------|-----------|------|------|------|------|

1660 Figure 13  Coverage percentage results for AT&T and DFN topologies

1662 There are two main observations from these results:

1664 1. As the neighborhood depth (X) increases, the coverage percentage 1665 increases, and complete coverage is obtained using a low 1666 neighborhood depth value (i.e., X=2).
This result is significant 1667 since the failure notification message needs to be sent only to nodes 1668 which are two hops away from the point of failure to achieve complete 1669 coverage. This result supports the claim that our method provides fast 1670 convergence while introducing minimal signaling overhead, confined to the 1671 two-hop neighborhood.

1673 2. The topologies with higher connectivity (i.e., higher E_mean 1674 values) have better coverage compared to the topologies with lower 1675 connectivity (i.e., lower E_mean values). This is an intuitive result, 1676 since the number of possible alternate hops in dense network 1677 topologies is higher than the number of possible alternate hops in 1678 sparse topologies. This increases the likelihood of 1679 finding backup paths, and therefore the coverage percentage.

1681 Authors' Addresses

1683 Andras Csaszar
1684 Ericsson
1685 Irinyi J utca 4-10, Budapest, Hungary, 1117
1686 Email: Andras.Csaszar@ericsson.com

1688 Gabor Sandor Enyedi
1689 Ericsson
1690 Irinyi J utca 4-10, Budapest, Hungary, 1117
1691 Email: Gabor.Sandor.Enyedi@ericsson.com

1693 Jeff Tantsura
1694 Ericsson
1695 300 Holger Way, San Jose, CA 95134
1696 Email: jeff.tantsura@ericsson.com

1698 Sriganesh Kini
1699 Ericsson
1700 300 Holger Way, San Jose, CA 95134
1701 Email: sriganesh.kini@ericsson.com

1703 John Sucec
1704 Telcordia Technologies
1705 One Telcordia Drive, Piscataway, NJ 08854
1706 Email: sucecj@telcordia.com

1708 Subir Das
1709 Telcordia Technologies
1710 One Telcordia Drive, Piscataway, NJ 08854
1711 Email: sdas2@telcordia.com