idnits 2.17.1 

draft-ietf-rtgwg-microloop-analysis-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 732.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 745.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** The document seems to lack an RFC 3978 Section 5.5 (updated by RFC 4748)
     Disclaimer -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

  ** The document seems to lack an RFC 3979 Section 5, para. 2 IPR Disclosure
     Acknowledgement -- however, there's a paragraph with a matching
     beginning. Boilerplate error?


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 16
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 17 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There is 1 instance of too long lines in the document, the longest one
     being 3 characters in excess of 72.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 292: '...       The route SHALL be updated with...'
     RFC 2119 keyword, line 296: '...       The route SHALL be updated with...'
     RFC 2119 keyword, line 298: '...porary next-hops SHALL be used for the...'
     RFC 2119 keyword, line 299: '... After DELAY_TYPEB, the route SHALL be...'
     RFC 2119 keyword, line 303: '...imary) next-hops SHALL continue to be ...'
     (10 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 2005) is 6761 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISIS'

  == Outdated reference: A later version (-12) exists of
     draft-ietf-rtgwg-ipfrr-spec-base-03

  == Outdated reference: A later version (-13) exists of
     draft-ietf-rtgwg-ipfrr-framework-04

  -- Obsolete informational reference (is this intentional?): RFC 3137 (ref.
     'STUB') (Obsoleted by RFC 6987)


     Summary: 10 errors (**), 0 flaws (~~), 6 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         Alex Zinin
3	Internet Draft                                                   Alcatel
4	Expiration Date: June 2006                                  October 2005
5	File name: draft-ietf-rtgwg-microloop-analysis-01.txt

7	               Analysis and Minimization of Microloops in
8	                      Link-state Routing Protocols

10	               draft-ietf-rtgwg-microloop-analysis-01.txt

12	Status of this Memo

14	   By submitting this Internet-Draft, each author represents that any
15	   applicable patent or other IPR claims of which he or she is aware
16	   have been or will be disclosed, and any of which he or she becomes
17	   aware will be disclosed, in accordance with Section 6 of BCP 79.

19	   Internet Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its Areas, and its Working Groups. Note that other
21	   groups may also distribute working documents as Internet Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than a "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	Abstract

36	   Link-state routing protocols (e.g. OSPF or IS-IS) are known to
37	   converge to a loop-free state within a finite period of time after a
38	   change in the topology. It is normal, however, to observe short-term
39	   loops during the period of topology update propagation, route
40	   recalculation, and forwarding table update, due to the asynchronous
41	   nature of link-state protocol operation. This document provides an
42	   analysis of formation of such microloops and suggests simple
43	   mechanisms to minimize them.

45	1 Introduction

47	   Link-state routing protocols, such as [OSPF] and [ISIS] converge to a
48	   loop-free state within a finite period of time after a topology
49	   change. Additional changes postpone the convergence, but do not get
50	   in its way.

52	   During the period of convergence, however, link-state protocols
53	   exhibit short-term routing table inconsistencies caused by the
54	   protocol's asynchronous nature.  These incornsistencies may cause
55	   short-term packet loops, also known as microloops. For example, see a
56	   sample network in Figure 1.

58	                         +--+    1    +--+
59	                         |A |---------|B |
60	                         +--+         +--+
61	                          |  \  10      |
62	                         5|   ------    |1
63	                          |         \   |
64	                         +--+   10   \+--+
65	                         |E |---------|C |
66	                         +--+         +--+
67	                             \_      /
68	                            5  \    /1 (failure)
69	                                +--+
70	                                |D |
71	                                +--+

73	                        Figure 1. Microloop example

75	   We are interested in routers A and B and their best paths towards D.
76	   Before failure, B's best path to D is B-C-D with cost 2, and A's best
77	   path is A-B-C-D with cost 3.  When link C-D fails, both C and D
78	   announce their link state information with link C-D missing. Within a
79	   finite period of time, both A and B shall receive the topology
80	   updates and converge on them, installing new best paths: A-E-D (10)
81	   for A, and B-A-E-D (11) for B.  However, if, due to the timing
82	   differences, B calculates and installs its new best path through A
83	   before A has a chance to switch from B to E, a microloop will form
84	   between A and B for the duration of time required for A to complete
85	   its routing table update.

87	   Similar microloops may form when other topological changes happen in
88	   the network, for example, when a new link or a node is added, a link
89	   cost is changed, etc. In summary, whenever a topological change in
90	   the network results in changes of the shortest path three (SPT) for
91	   more than one node, it is possible for the network to exhibit
92	   temporary loops.

94	   This document provides an analysis of microloop formation.
95	   Specifically, we categorize different types of reconvergence
96	   scenarios, and explore their properties. We then show that in certain
97	   scenarios microloops do not form, in others they can be eliminated
98	   using simple techniques described in this document, and define
99	   scenarios where more sophisticated loop avoidance mechanisms may be
100	   necessary.

102	   It is useful to understand the relationship between [IPFRR] and the
103	   technique described here. The two mechanisms play complimentary roles
104	   to each other: deploying [IPFRR] without micro-loop prevention only
105	   partially addresses the goal of minimizing packet loss during network
106	   reconvergence, since packets will be lost due to microloops. On the
107	   other hand, micro-loop prevention described in this document relies
108	   on [IPFRR] local failure protection, as routers will keep forwarding
109	   traffic down the old path until the new next-hops are known to be
110	   safe.

112	2 Analysis

114	   To analyse the behavior of a network during reconvergence, we look at
115	   a given router and its neighbors before failure and during the
116	   transition to the new routes. More specifically, we analyse whether
117	   switching to the new routing information can result in loop formation
118	   or not.

120	2.1 Terminology

122	   The following terms are used in the draft.

124	     Dopt(X,Y)
125	          Integer function defined as the cumulative cost of the least-
126	          cost path from node X to node Y in a topology graph. Normally
127	          calculated by link-state routing protocols using Dijkstra
128	          algorithm as part of regular route calculation procedures.
129	          This is the same as "Distance_opt(A,B)" defined in [IPFRR-FW]

131	     Downstream neighbor
132	          Neighbor N of router S is considered S's downstream neighbor
133	          for destination D, if Dopt(N, D) < Dopt(S, D)

135	     Primary neighbor
136	          Neighbor N of router S is considered S's primary neighbor for
137	          destination D, if the path via N is such that D(S, D) is mini-
138	          mized, i.e. N provides a shortest path to D according to the
139	          SPF calculation.

141	     Loop-free neighbor
142	          Neighbor N of router S is considered S's loop-free neighbor
143	          for destination D, if Dopt(N, D) < Dopt(N, S) + Dopt(S, D).
144	          Note that a loop-free neighbor may be, for example, router's
145	          primary before and/or after failure.

147	 2.2 Next hop safety condition

149	   We start the analysis of single-hop loops with the following observa-
150	   tion:

152	     After a topology change, there are precisely two situations when a
153	     microloop between routers X and Y can form:

155	     a)   Before failure, X uses Y as its next-hop. After failure, Y
156	          uses X as its next-hop. Y updates its routes based on the new
157	          topology before X.

159	     b)   Exact opposite of the previous case. Before failure, Y uses X
160	          as its next-hop.  After failure, X uses Y as its next-hop. X
161	          updates its routes based on the new topology before Y.

163	   Formulating this for a given calculating router S (either X or Y in
164	   the above example) switching to a new primary Pn, a microloop may
165	   occur between S and Pn only if Pn was forwarding through S before
166	   failure.

168	   Based on the above, we can define a general safety condition for any
169	   neighbor N (whether new primary or not) of router S that has just
170	   learned about a topology change. Note that the condition must satisfy
171	   the topological criteria above, and be non-recursive, i.e. not lead
172	   to loops if both S and N follow it.

174	     Next-hop safety condition:

176	          After a topology change, it is safe for router S to switch to
177	          neigbor N as its next-hop for a specific destination if the
178	          path through N satisfies both of the following criteria:

180	          1.   S considered N as its loop-free neighbor based on the
181	               topology before change AND

183	          2.   S considers N as its downstream neighbor based on the
184	               topology after change.

186	          The first requirement ensures that N has not been forwarding
187	          traffic to S before the change occured and both S and N used
188	          old topology. The second requirement makes sure N does not
189	          forward traffic to S when N learns the new topology. Note
190	          again, that N is S's any neighbor, and may or may not be used
191	          by S as its new primary or a temporary safe neighbor.

193	          The difference in the conditions before and after failure is
194	          there to make sure that S and N do not recursively consider
195	          each other as safe next-hops when they learn about the fail-
196	          ure.

198	 2.3 Transition types

200	   Here, we analyse different types of scenarios that a given router may
201	   find itself in after learning about a topology change.

203	   For each destination affected by a topological change, the network
204	   will have three major types of nodes categorized by the degree of
205	   safety of their old primary, new primary, and other neighbors. (Note
206	   that we do not yet consider ECMP, which will be discussed in section
207	   3.2.)

209	     Type A

211	          Routers whose new primary next-hops after the topology change
212	          are safe and transition to them will not create a microloop.
213	          Two subtypes are recognized:

215	          A1:  Routers whose primaries haven't changed as a result of
216	               the topology change

218	          A2:  Routers whose new primary satisfies the safety condition

220	     Type B

222	          Routers whose new primary next-hops after the topology change
223	          do not satisfy the safety condition, but that have at least
224	          one other neighbor that does. Note that such a neighbor can be
225	          the router's old primary (type B1) or a neighbor that is nei-
226	          ther old nor new primary (type B2).

228	     Type C

230	          Routers that have no neighbor that satisfies the safety condi-
231	          tion.

233	   It is clear that nothing special needs to be done for type-A routers
234	   as they either do not need to modify their routes or can immediately
235	   switch to the new primary next hops.

237	   It can also be shown that if type-B routers do not immediately switch
238	   to their new primaries, but use their safe next-hops for some time,
239	   switching to the new primaries later will not create loops, provided
240	   that their downstream routers have also switched to the safe hops or
241	   have already switched to the new primaries.

243	     NOTE: The above analysis applies to single-hop loops. Multi-hop
244	     loops, possible in networks with asymmetric link costs could be
245	     prevented by using a tighter safety condition.  However, as shown
246	     by simulations on real-life network topologies, doing so would
247	     decrease micro-loop coverage and thus result in increased number of
248	     unprevented single-hop loops.

250	   The following section formally defines the mechanism.

252	3. Loop prevention mechanism

254	 3.1 Basic procedures

256	   The essense of the mechanism defined here, also known as "path lock-
257	   ing via safe neighbors" (PLSN), can be informally summarized as fol-
258	   lows. Upon a topology change, for each destination:

260	     -    Each router in the network assesses safety of its new primary
261	          next-hops.

263	     -    If the new primaries are safe, they are used immediately, oth-
264	          erwise, partial
265	           ordering of updates is introduced:

267	          o    If non-primary safe neighbors are found, they are used
268	               for a period of time, thus locking traffic to a safe path
269	               while the new primaries complete their transition to the
270	               new routes

272	          o    If no safe neighbors are found, the forwarding path is
273	               locked on the old next-hop for some time to give the new
274	               primary enough time to complete route updates.

276	   For a description of several architectural constants used in this
277	   document (named as "DELAY_xxx"), refer to section 3.4.

279	   On receiving a topology update, the router delays its SPF calculation
280	   by DELAY_SPF time in order to collect the remaining updates that
281	   relate to the same topological event (e.g. update from the router
282	   connected to the second end of a point-to-point link in case of a
283	   link failure, or updates from other neighbors of a failed node).

285	   Upon expiration of DELAY_SPF, the router calculates the new SPT, the
286	   new routes, checks the safety status of each neighbor relative to
287	   each affected destination using the conditions in section 3.1, and
288	   applies the following logic for each route depending on the type of
289	   role it finds itself in:

291	     Type A:
292	          The route SHALL be updated with the new primary next-hops
293	          without an additional delay.

295	     Type B:
296	          The route SHALL be updated with one or more temporary next-
297	          hops that satisfy the safety condition without an additional
298	          delay. These temporary next-hops SHALL be used for the dura-
299	          tion of DELAY_TYPEB. After DELAY_TYPEB, the route SHALL be
300	          updated with the new primary next-hops.

302	     Type C:
303	          The route's old (primary) next-hops SHALL continue to be used
304	          for DELAY_TYPEC.  After DELAY_TYPEC, the route SHALL be
305	          updated with the new primary next-hops.

307	   If, after expiration of DELAY_SPF, the router receives a topology
308	   update sooner than DELAY_STABLE after the previous one, the router
309	   MUST fall back to the regular convergence mechanisms by prematurely
310	   expiring DELAY_TYPEB or DELAY_TYPEC timers if they are still running
311	   (thus causing immediate installation of the new primary next-hops),
312	   MUST recalculate its routing table as soon as practical, and MUST
313	   refrain from using the mechanisms described here until it has seen no
314	   topological updates for at least DELAY_STABLE. This is a safeguard
315	   mechanism to ensure that procedures described here are applied only
316	   when a single failure is experienced and that the network converges
317	   in a situation where multiple topological events or network instabil-
318	   ities are experienced.

320	   [ISIS] includes the concept of an Overload bit (OL) that indicates a
321	   node in the network that shouldn't be used as transit. A similar
322	   notion is introduced in OSPF by [STUB] using LSInfinity link costs.
323	   Honoring these conventions, implementations of this document MUST NOT
324	   use neighbors with the OL bit set in IS-IS or announcing links to the
325	   calculating router with LSInfinity cost in OSPF as temporary safe
326	   neighbors.

328	     NOTE: In OSPF, if S's neighbor N is a stub router, the S->N link,
329	     visited first by the SPF algorithm, will normally have a real link
330	     cost, and it is the backwards link N->S announced by N that will
331	     have its cost set to LSInfinity. Implementations have to account
332	     for this details when satisfying the above requirement.

334	3.2 Equal Cost Multipath Considerations

336	   In situations where more than one primary next-hop is available after
337	   the topology change, there are several possible combination of their
338	   safety properties:

340	     1)   All new next-hops satisfy the safery condition (a pure type-A
341	          situation)

343	     2)   Some of the new next-hops satisfy the safety condition, some
344	          of them do not (a combination of type-A and type-B)

346	     3)   None of the new next-hops satisfy the safety condition, how-
347	          ever, there's at least one other neighbor that satisfies it (a
348	          safe non-primary next-hop, causing new primaries to be type-B)

350	     4)   None of the new next-hops satisfy the safety condition, and
351	          there is no other neighbor that satisfies it (a pure type-C
352	          situation).

354	   For situations 1, 3, and 4 above, the implementation merely follows
355	   the basic procedures described in section 3.1

357	   For situation 2 (an A/B combination), the implementation:

359	     1)   SHALL update the route with the new next-hops that satisfy the
360	          safety condition without an additional delay

362	     2)   SHALL add the remaining new next-hops after DELAY_TYPEB.

364	   Note that one could potentially use temporary safe neighbors in situ-
365	   ation 2 above, however this specification does not recommend this to
366	   avoid unnessesary traffic rerouting and hence packet reordering.

368	3.3 Local Failure and IP Fast Reroute Considerations

370	   After detecting a local failure and initiating the local repair
371	   process if IPFRR is supported, the router directly attached to the
372	   point of failure follows the procedures described in this docu-
373	   ment--it delays its SPF calculation to collect updates from other
374	   routers, calculates new routes, and classifies the next-hops.

376	   For routers implementing IPFRR, the difference with routers that
377	   learn about the failure from the routing protocol updates, is that
378	   one or more of the repairing router's old next-hops has become
379	   unavailable, and hence cannot be considered as the temporary safe
380	   next-hops for type-B operation. Also, if the router was able to
381	   locally repair the failure, and the new primary next-hops do not sat-
382	   isfy the safety condition, the router should consider itself in the
383	   middle of type-B operation with the temporary safe neighbor engaged
384	   as part of IP Fast Reroute operation.

386	   Another distinct situation is when the router does not support IPFRR
387	   or could not repair the failure, the new primary next-hops do not
388	   satisfy the safety condition, and there's no other neighbor that
389	   does, i.e. a type-C situation. Unlike other routers in the network,
390	   the router directly connected to the network does not have the old
391	   next-hop any more, and cannot continue using it. Immediately switch-
392	   ing to the new next-hops, on the other hand, may result in a micro-
393	   loop. In this situation, the router MUST discard traffic forwarded
394	   along the affected route for the duration of DELAY_TYPEC, and then
395	   update the routes. Implementations MAY have a configuration option to
396	   allow switching immediately to the new next-hops for situations where
397	   this type of a micro-loop is not a concern. If implemented, this
398	   option MUST be disabled by default.

400	   As a result, there are the following possible scenarios:

402	     1)   If the new primary next-hops satisfy the safery condition, the
403	          router updates the routes without an additional delay.

405	     2)   Otherwise, if the failure could be repaired locally by IP Fast
406	          Reroute, the router continues to use the repair path for
407	          DELAY_TYPEB and updates the routes with the new primary next-
408	          hops after it expires.

410	     3)   Otherwise (new next-hops are not safe, and IPFRR is not sup-
411	          ported or the failure couldn't be repaired), the router dis-
412	          cards traffic for DELAY_TYPEC and updates the routes with the
413	          new primary next-hops after its expiration.

415	3.4 Architectural Constants

417	   The following architectural constants have been used in the descrip-
418	   tion of the algorithm above:

420	     DELAY_SPF
421	          The delay between the moment the router receives the first
422	          topology update after a period of stability and the moment it
423	          starts its routing table recalculation.  This delay is
424	          necessary to collect multiple updates originated by different
425	          routers that relate to the same topological event.

427	     DELAY_STABLE
428	          Period of time, during which the network topology is consid-
429	          ered to be stable if the router receives no topological
430	          updates. When the first update after DELAY_STABLE is received,
431	          all other updates that fit within DELAY_SPF are considered as
432	          related to a single topological event.

434	     DELAY_TYPEB and DELAY_TYPEC
435	          Periods of time used by the router to delay installation of
436	          new primary next-hops after a topology change when the router
437	          has (type-B) or has not (type-C) a safe neighbor to temporary
438	          divert the traffic to in the meantime.

440	   While correctness and effectiveness of the algorithm described here
441	   does not depend on the actual values assigned to the architectural
442	   constants, it does depend on the relationship between them, and the
443	   assumption that all routers in the same network use the same values.

445	   To satisfy these constrains, and yet allow these delays to be
446	   decreased as implementations continue to improve towards faster con-
447	   vergence, this document defines the architectural constants as con-
448	   figurable, specifies the required relationship between the values,
449	   and the default values that should be used by the implementations.

451	   The following equations define the relationship between the constants
452	   that needs to be maintained in order for the mechanism described here
453	   to provide desireable results:

455	    DELAY_SPF > update-propagation-time

457	    DELAY_STABLE > DELAY_TYPEB > DELAY_TYPEC > fault-propagation-time

459	   where:

461	     o    update-propagation-time is the time it is expected to take
462	          routers in the network to detect the failure, and originate
463	          and propagate new link-state information.

465	     o    fault-propagation-time is update-propagation plus the time it
466	          is expected to take routers in the network to calculate the
467	          new SPT, check the safety condition of the neighbors, and
468	          install required FIB entries.

470	   Because fault-propagation-time includes update-propagation-time, and
471	   DELAY_SPF (since every router will delay its SPF according to this
472	   document):

474	    fault-propagation-time > DELAY_SPF + update-propagation-time

476	   and hence the equations above can be converted to one:

478	    DELAY_STABLE > DELAY_TYPEB > DELAY_TYPEC > (DELAY_SPF + update-prop-
479	   agation-time)

481	   The implementations SHOULD use the following default values for the
482	   architectural constants:

484	        Constant                   Default val
485	       ----------------------------------------
486	        DELAY_SPF                   500 msec
487	        DELAY_TYPEC                   2 sec
488	        DELAY_TYPEB                   4 sec
489	        DELAY_STABLE                 10 sec

491	4 Coverage analysis

493	   The above algorithm minimizes the probability of loop formation. More
494	   specifically, loops will only be possible when two neighboring
495	   routers both experience the type C condition after the topology
496	   change. Appendix A shows that transitions between A-A, A-B, A-C, and
497	   B-C routers are loop-free.

499	   While this mechanism does not remove all possible micro-loops, it
500	   addresses the majority of them in topologies with a reasonable level
501	   of physical redundancy.  Topologically, micro-loop coverage provided
502	   by this algorithm is very similar to that provided by [IPFRR]. This
503	   is due to the fact that similar construct are used by both mecha-
504	   nisms.

506	5 Backwards Compatibility Analysis

508	   Effectiveness of the mechanism described here relies on the assump-
509	   tion that all routers in the network support it.

511	   In a situation where some routers do not support the describer mecha-
512	   nism, the network will continue to converge properly fundamentals of
513	   the routing system are not changed.  When a topology change event
514	   occurs in such a network, Type-A and Type-B routers will not substan-
515	   tially change the convergence patterns, as they will switch to
516	   routers that are guaranteed to forward traffic correctly after
517	   DELAY_SPF.  (Note that routers today already implement a delay
518	   similar to DELAY_SPF.)  Type-C routers, when mixed with routers not
519	   supporting this mechanism, may induce longer than usual micro-loops
520	   (up to DELAY_TYPEC), however this delay is in the same order of mag-
521	   nitude as in most deployed networks today.

523	6 Security Considerations

525	   The mechanism described in this document does not modify any routing
526	   protocol messages, and hence no new threats related to packet modifi-
527	   cations or replay attacks are introduced. The mechanism changes cer-
528	   tain delays used in node-local algorithms and introduces partial
529	   event ordering after a topology change has occured. This, however,
530	   does not introduce new security risks. For type-B situations, traffic
531	   to certain destinations can be temporarily routed via next-hop
532	   routers that would not be used with the same topology change if this
533	   mechanism wasn't employed. However, these next-hop routers can be
534	   used anyway when a different topological change occurs, and hence
535	   this can't be viewed as a new security threat.

537	7 Acknowledgements

539	   The author would like to thank Don Fedyk, Chris Martin, Alex Audu,
540	   Olivier Bonaventure, Stefano Previdi, and other members of the IETF
541	   RTGWG for their useful comments. Special thanks go to Alia Atlas,
542	   Mike Shand, and Steward Bryant, who were instrumental in development
543	   of this mechanism, such as fine-tuning the safety condition, simulat-
544	   ing the mechanism, proof-reading the document, and without whom this
545	   work wouldn't be possible.

547	8 References

549	 8.1 Normative References

551	   [OSPF] J. Moy. OSPF version 2. Technical Report RFC 2328, Internet
552	          Engineering Task Force, 1998.

554	   [ISIS] ISO, "Intermediate system to Intermediate system routeing
555	          information exchange protocol for use in conjunction with the
556	          Protocol for providing the Connectionless-mode Network Service
557	          (ISO 8473)," ISO/IEC 10589:1992.

559	   [IPFRR] Atlas, A., "Basic Specification for IP Fast-Reroute:
560	          Loop-free Alternates", Internet Engineering Task Force, Work
561	          in Progress, draft-ietf-rtgwg-ipfrr-spec-base-03.txt

563	 8.2 Informative References

565	   [IPFRR-FW] Shand, M., S. Bryant, "IP Fast Reroute Framework",
566	          Internet Engineering Task Force, Work in Progress, draft-ietf-
567	          rtgwg-ipfrr-framework-04.txt

569	   [STUB] Retana, A., et al, OSPF Stub Router Advertisement, RFC 3137,
570	          Internet Engineering Task Force, 2001.

572	Author's Address

574	    Alex Zinin
575	    Alcatel
576	    701 E Middlefield Rd
577	    Mountain View, CA 94043
578	    E-mail: zinin@psg.com

580	Appendix A. Loop formation analysis

582	   S is the calculating router discovering the failure through a link-state
583	   update. P is the old primary, NP is the new primary.

585	    BF:
586	                        <------
587	            [P]----------------[S]----------------[NP]
588	               ...>?

590	    AF:

592	                                  ------>
593	            [P]----------------[S]----------------[NP]
594	                                              ?<...

596	   To analyze possible loop formation, we need to check the following:

598	     1)   if it is possible for P to start forwarding packets to S
599	          before S switches to NP

601	     2)   if it is possible for NP to be forwarding packets back to S
602	          before or after S starts using it

604	   Assumptions are that type-As switch-over to NP immediately, and type-
605	   Bs and type-Cs wait certain amount of time so that:

607	      DELAY_TYPEB > DELAY_TYPEC > fault-propagation-time

609	   1. S is type A:

611	   BF analysis:

613	     1.1 If P is another type-A, then S cannot be its new primary, since
614	     S has not been P's LFA before (since it's been fwd'ing through P).
615	     Hence, P will not route through S AF, and the will be no loops
616	     between P and S.

618	     1.2 If P is a type-B, then S hasn't been P's LF neighbor BF, and P
619	     will not forward through S at least for DELAY_TYPEB, which gives S
620	     enough time to switch to NP. After DELAY_TYPEB P may start using S
621	     as it's new primary.

623	     1.3 If P is a type-C, then it hasn't been forwarding traffic to S
624	     BF, and will not use S as its new primary at least for DELAY_TYPEC,
625	     which should give S enough time to switch to NP.

627	     1.4 Consequently, no loops will form between a type-A node and it's
628	     old primary before the type-A nodes switches to its new primary.

630	   AF analysis:

632	     1.5 Regardless of its type, NP has not been forwarding packets to S
633	     BF and will not do so AF by definition of type-A.

635	     1.6 Consequently, no loops will form between a type-A node and it's
636	     new primary before or after the type-A nodes switches to it.

638	   2. S is type B:

640	   BF analysis:

642	     2.1 If P is a type-A, then similarly to 1.1 above, there will be no
643	     routes between P and S.

645	     2.2 If P is another type-B, then similarly to 1.2, S will not be
646	     used by P for at least DELAY_TYPEB, and S will have enough time to
647	     switch to its safe hops or NP.

649	     2.3 If P is a type-C, then similarly to 1.3, S hasn't been receiv-
650	     ing traffic from P BF, and will not AF for at least DELAY_TYPEC,
651	     which should give S enough time to switch to its safe hops or NP.

653	     2.4 Consequently, no loops will form between a type-B node and it's
654	     old primary before the type-B nodes switches to its new primary.

656	   AF analysis:

658	     2.5 If NP is a type-A, then because of the DELAY_TYPEB NP must have
659	     had enough time to switch to its new NP, which cannot be S by defi-
660	     nition of SPT considering that NP is S's new nexthop in the SPT AF.

662	     2.6 If NP is another type-B, then because of DELAY_TYPEB, NP must
663	     have had enough time to switch from its old primary and can equally
664	     likely be routing through either its safe hops, or its new primary.
665	     Neither of the two can be S by definition of a downstream node (for
666	     safe hops) and SPT (for new primary).

668	     2.7 If NP is a type-C, then because DELAY_TYPEB > DELAY_TYPEC, NP
669	     must have had enough time to switch to its new primary, which can't
670	     be S by definition of SPT and considering that NP is S's nexthop in
671	     the SPT AF.

673	     2.8 Consequently, no loops will form between a type-B node and it's
674	     new primary before or after the type-A nodes switches to it.

676	   3. S is type C:

678	   BF analysis:

680	     3.1 If P is a type-A, then similarly to 1.1 before, S has not been
681	     P's LF neighbor before and hence won't be its new primary, so no
682	     loops will form between P and S.

684	     3.2 If P is a type-B, then similarly to 1.2, S will not be used by
685	     P for at least DELAY_TYPEB, and because DELAY_TYPEB > DELAY_TYPEC,
686	     S will have enough time to switch to NP.

688	     3.3 If P is another type-C, then it hasn't been using S as its pri-
689	     mary BF, but it is possible for P to consider S as its new primary
690	     AF and to install routes before S after their DELAY_TYPEC expires.
691	     Hence, a microloop is possible between P and S.

693	     3.4 Consequently, a microloop between a type-C node and its old
694	     primary is possible only if the old primary is also a type-C node
695	     and it considers S as its new primary AF. Note that DELAY_TYPEC
696	     only delays probably loop formation, but does not increase its
697	     duration, as both neighboring routers are using the same delay.

699	   AF analysis:

701	     3.5 If NP is a type-A, then because of the DELAY_TYPEC NP must have
702	     had enough time to switch to its new NP, which cannot be S by defi-
703	     nition of SPT considering that NP is S's new nexthop in the SPT AF.

705	     3.6 If NP is a type-B, then because of DELAY_TYPEC, NP must have
706	     had enough time to switch to its safe hops, which can't be S by
707	     definition of a downstream node and considering that NP is S's new
708	     SPT next-hop.

710	     3.7 If NP is another type-C, a loop is possible if S's DELAY_TYPEC
711	     expires before that on NP and NP has been using S as its primary
712	     BF.

714	     3.8 Consequently, a microloop between a type-C node and its new
715	     primary is possible only if the new primary is also a type-C node
716	     and S was NP's primary BF.

718	   4. Given the above analysis, it can be noted that, for a given fail-
719	   ure, presence of single type-C nodes in the network does not create
720	   microloops.
721	    It is the C-C combination that introduces this potential.

723	IPR Disclaimer

725	   The IETF takes no position regarding the validity or scope of any
726	   Intellectual Property Rights or other rights that might be claimed to
727	   pertain to the implementation or use of the technology described in
728	   this document or the extent to which any license under such rights
729	   might or might not be available; nor does it represent that it has
730	   made any independent effort to identify any such rights.  Information
731	   on the procedures with respect to rights in RFC documents can be
732	   found in BCP 78 and BCP 79.

734	   Copies of IPR disclosures made to the IETF Secretariat and any assur-
735	   ances of licenses to be made available, or the result of an attempt
736	   made to obtain a general license or permission for the use of such
737	   proprietary rights by implementers or users of this specification can
738	   be obtained from the IETF on-line IPR repository at
739	   http://www.ietf.org/ipr.

741	   The IETF invites any interested party to bring to its attention any
742	   copyrights, patents or patent applications, or other proprietary
743	   rights that may cover technology that may be required to implement
744	   this standard.  Please address the information to the IETF at ietf-
745	   ipr@ietf.org.

747	Full Copyright Statement

749	   Copyright (C) The Internet Society (2005). This document is subject
750	   to the rights, licenses and restrictions contained in BCP 78, and
751	   except as set forth therein, the authors retain all their rights.

753	   This document and the information contained herein are provided on an
754	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
755	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
756	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
757	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFOR-
758	   MATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES
759	   OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.