idnits 2.17.1 

draft-bryant-shand-lf-conv-frmwk-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 18.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 822.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 794.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 801.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 807.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 17
     longer pages, the longest (page 3) being 63 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (Dec 2005) is 6705 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC2119' is mentioned on line 49, but not defined

  == Missing Reference: 'RFC3036' is mentioned on line 105, but not defined

  ** Obsolete undefined reference: RFC 3036 (Obsoleted by RFC 5036)

  == Missing Reference: 'MPLS-TE' is mentioned on line 120, but not defined

  == Missing Reference: 'IPFRR' is mentioned on line 122, but not defined

  == Missing Reference: 'ZININ' is mentioned on line 265, but not defined

  == Missing Reference: 'Zinin' is mentioned on line 368, but not defined

  == Missing Reference: 'TUNNEL' is mentioned on line 539, but not defined

  == Missing Reference: 'RFC791' is mentioned on line 558, but not defined


     Summary: 4 errors (**), 0 flaws (~~), 12 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	INTERNET DRAFT  A Framework for Loop-free Convergence        Dec 2005

4	Network Working Group                                         S. Bryant
5	Internet Draft                                                 M. Shand
6	Expiration Date: Dec 2005                                 Cisco Systems

8	                                                               Jun 2005

10	                A Framework for Loop-free Convergence
11	              <draft-bryant-shand-lf-conv-frmwk-01.txt>

13	Status of this Memo

15	   By submitting this Internet-Draft, each author represents that any
16	   applicable patent or other IPR claims of which he or she is aware
17	   have been or will be disclosed, and any of which he or she becomes
18	   aware will be disclosed, in accordance with Section 6 of BCP 79.

20	   Internet-Drafts are working documents of the Internet Engineering
21	   Task Force (IETF), its areas, and its working groups.  Note that
22	   other groups may also distribute working documents as
23	   Internet-Drafts.

25	   Internet-Drafts are draft documents valid for a maximum of six
26	   months and may be updated, replaced, or obsoleted by other
27	   documents at any time.  It is inappropriate to use Internet-Drafts
28	   as reference material or to cite them other than as "work in
29	   progress."

31	   The list of current Internet-Drafts can be accessed at
32	   http://www.ietf.org/1id-abstracts.html

34	   The list of Internet-Draft Shadow Directories can be accessed at
35	   http://www.ietf.org/shadow.html

37	Abstract
38	   This draft describes mechanisms that may be used to prevent or to
39	   suppress the formation of micro-loops when an IP or MPLS network
40	   undergoes topology change due to failure, repair or management
41	   action.

43	Conventions used in this document

45	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
46	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
47	   this document are to be interpreted as described in RFC 2119
48	   [RFC2119].

50	Table of Contents
51	1. Introduction........................................................3

53	2. The Nature of Micro-loops...........................................4

55	3. Applicability.......................................................5

57	4. Micro-loop Control Strategies.......................................5

59	5. Loop mitigation.....................................................6

61	6. Micro-loop Prevention...............................................8
62	 6.1. Incremental Cost Advertisement..................................8
63	 6.2. Single Tunnel Per Router........................................9
64	 6.3. Distributed Tunnels............................................11
65	 6.4. Packet Marking.................................................11
66	 6.5. Ordered SPFs...................................................12
67	 6.6. Synchronised FIB Updates.......................................14
68	7. Loop Suppression...................................................14

70	8. Compatibility Issues...............................................15

72	9. Comparison of Loop-free Convergence Methods........................15

74	10. IANA considerations...............................................16

76	11. Security Considerations...........................................16

78	12. Intellectual Property Statement...................................16

80	13. Full copyright statement..........................................17

82	14. Normative References..............................................17

84	15. Informative References............................................17

86	16. Authors' Addresses................................................18
87	1.  Introduction

90	   When there is a change to the network topology (due to the failure
91	   or restoration of a link or router, or as a result of management
92	   action) the routers need to converge on a common view of the new
93	   topology, and the paths to be used for forwarding traffic to each
94	   destination. During this process, referred to as a routing
95	   transition, packet delivery between certain source/destination
96	   pairs may be disrupted. This occurs due to the time it takes for
97	   the topology change to be propagated around the network together
98	   with the time it takes each individual router to determine and then
99	   update the forwarding information base (FIB) for the affected
100	   destinations. During this transition, packets are lost due to the
101	   continuing attempts to use the failed component, and due to
102	   forwarding loops. Forwarding loops arise due to the inconsistent
103	   FIBs that occur as a result of the difference in time taken by
104	   routers to execute the transition process. This is a problem that
105	   occurs in both IP networks and MPLS networks that use LDP [RFC3036]
106	   as the label switched path (LSP) signaling protocol.

108	   The service failures caused by routing transitions are largely
109	   hidden by higher-level protocols that retransmit the lost data.
110	   However new Internet services are emerging which are more sensitive
111	   to the packet disruption that occurs during a transition. To make
112	   the transition transparent to their users, these services require a
113	   short routing transition. Ideally, routing transitions would be
114	   completed in zero time with no packet loss.

116	   Regardless of how optimally the mechanisms involved have been
117	   designed and implemented, it is inevitable that a routing
118	   transition will take some minimum interval that is greater than
119	   zero. This has led to the development of a TE fast-reroute
120	   mechanism for MPLS [MPLS-TE]. Alternative mechanisms that might be
121	   deployed in an MPLS network and mechanisms that may be used in an
122	   IP network are work in progress in the IETF [IPFRR]. Any repair
123	   mechanism may however be disrupted by the formation of micro-loops
124	   during the period between the time when the failure is announced,
125	   and the time when all FIBs have been updated to reflect the new
126	   topology.

128	   There is, however, little point in introducing new mechanisms into
129	   an IP network to provide fast re-route, without also deploying
130	   mechanisms that prevent the disruptive effects of micro-loops which
131	   may starve the repair or cause congestion loss as a result of
132	   looping packets.

134	   The disruptive effect of micro-loops is not confined to periods
135	   when there is a component failure. Micro-loops can, for example,
136	   form when a component is put back into service following repair.

138	   Micro-loops can also form as a result of a network maintenance
139	   action such as adding a new network component, removing a network
140	   component or modifying a link cost.

142	   This framework provides a summary of the mechanisms that have been
143	   proposed to address the micro-loop issue.

145	2.  The Nature of Micro-loops

148	   Micro-loops may form during the periods when a network is re-
149	   converging following ANY topology change, and are caused by
150	   inconsistent FIBs in the routers. During the transition, micro-
151	   loops may occur over a single link between a pair of routers that
152	   temporarily use each other as the next hop for a prefix. Micro-
153	   loops may also form when a cycle of routers have the next router in
154	   the cycle as a next hop for a prefix. Cyclic micro-loops always
155	   include at least one link with an asymmetric cost, and/or at least
156	   two symmetric cost link cost changes within the convergence time.

158	   Micro-loops have two undesirable side-effects, congestion and
159	   repair starvation. A looping packet consumes bandwidth until it
160	   either escapes as a result of the re-synchronization of the FIBs,
161	   or its TTL expires. This transiently increases the traffic over a
162	   link by as much as 128 times, and may cause the link to congest.
163	   This congestion reduces the bandwidth available to other traffic
164	   (which is not otherwise affected by the topology change). As a
165	   result the "innocent" traffic using the link experiences increased
166	   latency, and is liable to congestive packet loss.
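
   As a rough illustration of the scale of this effect, the following
   Python sketch counts how often one looping packet crosses a single
   direction of the link between two routers. It assumes an initial
   TTL of 255, that each router decrements the TTL and discards the
   packet when the TTL reaches zero, and that the loop involves
   exactly two routers. The function name is hypothetical.

```python
def link_traversals(initial_ttl: int = 255) -> int:
    """Count crossings of the A->B link by one packet in an A<->B loop."""
    ttl = initial_ttl
    at_a = True      # the packet enters the loop at router A
    crossings = 0
    while True:
        ttl -= 1     # each router decrements the TTL before forwarding
        if ttl <= 0:
            break    # TTL expired: the packet is discarded
        if at_a:
            crossings += 1   # router A forwards over the A->B link
        at_a = not at_a      # the packet is now at the other router
    return crossings


print(link_traversals(255))
```

   Under these assumptions a packet that enters the loop with a TTL of
   255 crosses the link 127 times in one direction, which is in line
   with the figure quoted above.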

168	   In cases where the link or node failure has been protected by a
169	   fast re-route repair, the inconsistency in the FIBs prevents some
170	   traffic from reaching the failure and hence being repaired. The
171	   repair may thus become starved of traffic and hence become
172	   ineffective. Thus in addition to the congestive damage, the repair
173	   is rendered ineffective by the micro-loop. Similarly, if the
174	   topology change is the result of management action the link could
175	   have been retained in service throughout the transition (i.e. the
176	   link acts as its own repair path). However, if micro-loops form,
177	   they prevent productive forwarding during the transition.

179	   Unless otherwise controlled, micro-loops may form in any part of
180	   the network that forwards (or in the case of a new link, will
181	   forward) packets over a path that includes the affected topology
182	   change. The time taken to propagate the topology change through the
183	   network, and the non-uniform time taken by each router to calculate
184	   the new SPT and update its FIB may significantly extend the
185	   duration of the packet disruption caused by the micro-loops. In
186	   some cases a packet may be subject to disruption from micro-loops
187	   which occur sequentially at links along the path, thus further
188	   extending the period of disruption beyond that required to resolve
189	   a single loop.

191	3.  Applicability

194	   Loop free convergence techniques are applicable [APPL] to any
195	   situation in which micro-loops may form. For example the
196	   convergence of a network following:

198	   1) Component failure.

200	   2) Component repair.

202	   3) Management withdrawal of a component.

204	   4) Management insertion of a component.

206	   5) Management change of link cost (either positive or negative).

208	   6) External cost change, for example change of external gateway as
209	      a result of a BGP change.

211	   7) An SRLG failure.

213	   In each case, a component may be a link or a router.

215	   Loop free convergence techniques are applicable to both IP networks
216	   and MPLS enabled networks that use LDP, including LDP networks that
217	   use the single-hop tunnel fast-reroute mechanism.

219	4.  Micro-loop Control Strategies

222	   Micro-loop control strategies fall into three basic classes:

224	     1. Micro-loop mitigation

226	     2. Micro-loop prevention

228	     3. Micro-loop suppression

230	   A micro-loop mitigation scheme works by re-converging the network
231	   in such a way that it reduces, but does not eliminate, the
232	   formation of micro-loops. Such schemes cannot guarantee the
233	   productive forwarding of packets during the transition.

235	   A micro-loop prevention mechanism controls the re-convergence of
236	   the network in such a way that no micro-loops form. Such a micro-loop
237	   prevention mechanism allows the continued use of any fast repair
238	   method until the network has converged on its new topology, and
239	   prevents the collateral damage that occurs to other traffic for the
240	   duration of each micro-loop. These mechanisms normally extend the
241	   duration of the re-convergence process. In the case of a fast re-
242	   route repair this means that the network requires the repair to
243	   remain in place longer than would otherwise be the case. This
244	   causes extended problems to any traffic which is NOT repaired by an
245	   imperfect repair (as does ANY method which delays re-convergence).

247	   When a component is returned to service, or when a network
248	   management action has taken place, this additional delay does not
249	   cause traffic disruption, because there is no repair involved.
250	   However the extended delay is undesirable, because it increases the
251	   time that the network takes to be ready for another failure, and
252	   hence leaves it vulnerable to multiple failures.

254	   A micro-loop suppression mechanism attempts to eliminate the
255	   collateral damage done by micro-loops to other traffic. This may be
256	   achieved by, for example, using a packet monitoring method, which
257	   detects that a packet is looping and drops it. Such schemes make no
258	   attempt to productively forward the packet throughout the network
259	   transition.

261	5.  Loop mitigation

264	   The only known loop mitigation approach is the safe-neighbors
265	   method described in [ZININ]. In this method, a micro-loop free
266	   next-hop safety condition is defined as follows:

268	   In a symmetric cost network, it is safe for router X to change to
269	   the use of neighbor Y as its next-hop for a specific destination if
270	   the path through Y to that destination satisfies both of the
271	   following criteria:

273	     1.   X considers Y as its loop-free neighbor based on the
274	          topology before the change AND

276	     2.   X considers Y as its downstream neighbor based on the
277	          topology after the change.

279	   In an asymmetric cost network, a stricter safety condition is
280	   needed, and the criterion is that:

282	          X considers Y as its downstream neighbor based on the
283	          topology both before and after the change.

285	   Based on these criteria, destinations are classified by each router
286	   into three classes:

288	    Type A destinations: Destinations unaffected by the change and
289	    also destinations whose next hop after the change satisfies the
290	    safety criteria.

292	    Type B destinations: Destinations that cannot be sent via the new
293	    primary next-hop because the safety criteria are not satisfied,
294	    but which can be sent via another next-hop that does satisfy the
295	    safety criteria.

297	    Type C destinations: All other destinations.
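
   The symmetric-cost safety condition and the resulting classification
   can be sketched in Python as follows. This is only an illustration:
   it assumes that "loop-free neighbor" means that Y's shortest path to
   the destination does not pass back through X, that "downstream
   neighbor" means that Y is strictly closer to the destination than X,
   and that a destination is "unaffected" when its primary next hop
   does not change. All function names are hypothetical.

```python
import heapq

INF = float("inf")


def make_graph(links):
    """Build a symmetric-cost adjacency map from (a, b, cost) tuples."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def spf(graph, src):
    """Dijkstra: shortest-path cost from src to every reachable node."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, cost in graph.get(u, {}).items():
            if d + cost < dist.get(v, INF):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist


def classify(old_graph, new_graph, x, dest):
    """Return (type, next_hop) for destination dest as seen by router x."""
    old = {n: spf(old_graph, n) for n in old_graph}
    new = {n: spf(new_graph, n) for n in new_graph}

    def cost(table, a, b):
        return table.get(a, {}).get(b, INF)

    def primary(graph, table):
        # Lowest-cost neighbor towards dest; ties are broken by name.
        return min(graph[x],
                   key=lambda y: (graph[x][y] + cost(table, y, dest), y))

    def safe(y):
        # Criterion 1: Y is a loop-free neighbor in the old topology.
        loop_free_before = (cost(old, y, dest)
                            < cost(old, y, x) + cost(old, x, dest))
        # Criterion 2: Y is a downstream neighbor in the new topology.
        downstream_after = cost(new, y, dest) < cost(new, x, dest)
        return loop_free_before and downstream_after

    new_nh = primary(new_graph, new)
    if new_nh == primary(old_graph, old) or safe(new_nh):
        return ("A", new_nh)          # unaffected, or new next hop is safe
    for y in sorted(new_graph[x]):
        if safe(y):
            return ("B", y)           # a safe, non-shortest next hop exists
    return ("C", None)                # no safe next hop: update is delayed


links = [("A", "B", 1), ("B", "D", 1), ("A", "C", 1),
         ("C", "D", 4), ("A", "E", 3), ("E", "D", 3)]
old_topology = make_graph(links)
new_topology = make_graph([l for l in links if l[:2] != ("B", "D")])
for router in ("A", "B", "C"):
    print(router, classify(old_topology, new_topology, router, "D"))
```

   In this example topology link B-D fails. Router C can move to its
   new next hop D at once (type A), router A can temporarily use E
   (type B), and router B has no safe next hop (type C).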

299	   Following a topology change, Type A destinations are immediately
300	   changed to go via the new topology. Type B destinations are
301	   immediately changed to go via the next hop that satisfies the
302	   safety criteria, even though this is not the shortest path. Type B
303	   destinations continue to go via this path until all routers have
304	   changed their Type C destinations over to the new next hop. Routers
305	   must not change their Type C destinations until all routers have
306	   changed their Type A and Type B destinations to the new or
307	   intermediate (safe) next hop.

309	   Simulations indicate that this approach produces a significant
310	   reduction in the number of links that are subject to micro-looping.
311	   However, unlike all of the micro-loop prevention methods, it is only
312	   a partial solution. In particular, micro-loops may form on any link
313	   joining a pair of type C routers.

315	   Because routers delay updating their Type C destination FIB
316	   entries, they will continue to route towards the failure during the
317	   time when the routers are changing their Type A and B destinations,
318	   and hence will continue to productively forward packets provided
319	   that viable repair paths exist.

321	   A backwards compatibility issue arises with the safe-neighbors
322	   method. If a router is not capable of micro-loop control, it will
323	   not correctly delay its FIB update. If all such routers had only
324	   type A destinations this loop mitigation mechanism would work as it
325	   was designed. Alternatively, if all such incapable routers had only
326	   type C destinations, the "covert" announcement mechanism used to
327	   trigger the tunnel based schemes could be used to cause the Type A
328	   and Type B destinations to be changed, with the incapable routers
329	   and routers having type C destinations delaying until they received
330	   the "real" announcement. Unfortunately, these two approaches are
331	   mutually incompatible.

333	   To recap, routers classify their destinations into three types A, B
334	   or C. Routers update their FIBs in three phases. A router first
335	   updates destinations that it has classified as type A or type B, it
336	   then updates destinations that it has classified as type C, and
337	   finally it corrects the temporary next hop used for destinations
338	   classified as type B.

340	   Note that simulations indicate that in most topologies treating
341	   type B destinations as type C results in only a small degradation
342	   in loop prevention. Also note that early simulation results appear
343	   to indicate that in production networks where some, but not all,
344	   links have asymmetric costs, using the asymmetric cost criterion
345	   actually REDUCES the number of loop free destinations.

347	   This mechanism operates identically for "bad-news" events,
348	   "good-news" events and SRLG failure.

350	6.  Micro-loop Prevention

353	   Six micro-loop prevention methods have been proposed:

355	     1. Incremental cost advertisement

357	     2. Single Tunnel

359	     3. Distributed Tunnels

361	     4. Packet Marking

363	     5. Ordered SPF

365	     6. Synchronized FIBs

367	   Both of the tunnel methods, packet marking and ordered SPF could be
368	   combined with safe-neighbors [Zinin] to reduce the traffic that
369	   uses the advanced method. Specifically all traffic could use safe
370	   neighbors except traffic between a pair of routers both of which
371	   consider the destination to be type C. The type C to type C traffic
372	   would be protected from micro-looping through the use of a loop
373	   prevention method.

375	   However, determining whether the new next hop router considers a
376	   destination to be type C may be computationally intensive. An
377	   alternative approach would be to use a loop prevention method for
378	   all local type C destinations. This would not require any
379	   additional computation, but would require the additional loop
380	   prevention method to be used in cases which would not have
381	   generated loops (i.e. when the new next-hop router considered this
382	   to be a type A or B destination).

384	   The amount of traffic that would use safe neighbors is highly
385	   dependent on the network topology and the specific change, but
386	   would be expected to be in the region of 70% to 90% in typical
387	   networks.

389	6.1.  Incremental Cost Advertisement

392	   When a link fails, the cost of the link is normally changed from
393	   its assigned metric to "infinity" in one step.  However, it can be
394	   proved that no micro-loops will form if the link cost is increased
395	   in suitable increments, and the network is allowed to stabilize
396	   before the next cost increment is advertised. Once the link cost
397	   has been increased to a value greater than that of the lowest
398	   alternative cost around the link, the link may be disabled without
399	   causing a micro-loop.

401	   The criterion for a link cost change to be safe is that any link
402	   which is subjected to a cost change of x can only cause loops in a
403	   part of the network that has a cyclic cost less than or equal to x.
404	   Because there may exist links which have a cost of one in each
405	   direction, resulting in a cyclic cost of two, this can result in
406	   the link cost having to be raised in increments of one. However the
407	   increment can be larger where the minimum cost permits. Determining
408	   the minimum link cost in the network is trivial, but unfortunately,
409	   calculating the optimum increment is thought to be a costly
410	   calculation.
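
   The sequence of advertised costs can be sketched in Python as
   follows. This is a simplified illustration, not the optimum
   increment calculation: it assumes that the smallest cycle in the
   network has a cost of twice the minimum link cost, and that any
   increment strictly below that cyclic cost is safe. The function and
   parameter names are hypothetical.

```python
def cost_schedule(metric, lowest_alternative, min_link_cost=1):
    """Costs to advertise, in order, before the link can be disabled."""
    # Assumption: the smallest cycle costs 2 * min_link_cost, so an
    # increment strictly below that value cannot create a loop.
    step = 2 * min_link_cost - 1
    costs = []
    cost = metric
    # Stop once the cost exceeds the lowest alternative cost around
    # the link; at that point the link may be disabled.
    while cost <= lowest_alternative:
        cost += step
        costs.append(cost)
    return costs


print(cost_schedule(10, 14))       # minimum link cost 1: step of 1
print(cost_schedule(10, 14, 2))    # minimum link cost 2: step of 3
```

   Each entry in the returned list corresponds to one advertisement
   followed by a wait for the network to stabilize, so the length of
   the list gives a feel for how slow the method is when the gap
   between the metric and the alternative cost is large.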

412	   This approach has the advantage that it requires no change to the
413	   routing protocol. It will work in any network that uses a link-
414	   state IGP because it does not require any co-operation from the
415	   other routers in the network. However the method can be extremely
416	   slow, particularly if large metrics are used. For the duration of
417	   the transition some parts of the network continue to use the old
418	   forwarding path, and hence use any repair mechanism for an extended
419	   period. In the case of a failure that cannot be fully repaired,
420	   some destinations may become unreachable for an extended period.

422	   Where the micro-loop prevention mechanism was being used to support
423	   a fast re-route repair the network may be vulnerable to a second
424	   failure for the duration of the controlled re-convergence.

426	   Where the micro-loop prevention mechanism was being used to support
427	   a reconfiguration of the network the extended time is less of an
428	   issue. In this case, because the real forwarding path is available
429	   throughout the whole transition, there is no conflict between
430	   concurrent change actions throughout the network.

432	   It will be appreciated that when a link is returned to service, its
433	   cost is reduced in small steps from "infinity" to its final cost,
434	   thereby providing similar micro-loop prevention during a
435	   "good-news" event. Note that the link cost may be decreased from
436	   "infinity" to any value greater than that of the lowest alternative
437	   cost around the link in one step without causing a micro-loop.

439	   When the failure is an SRLG the link cost increments must be
440	   coordinated across all members of the SRLG. This may be achieved by
441	   completing the transition of one link before starting the next, or
442	   by interleaving the changes. This can be achieved without the need
443	   for any protocol extensions, by for example, using existing
444	   identifiers to establish the ordering and the arrival of LSP/LSAs
445	   to trigger the generation of the next increment.

447	6.2.  Single Tunnel Per Router

450	   This mechanism works by creating an overlay network using tunnels
451	   whose path is not affected by the topology change and carrying the
452	   traffic affected by the change in that new network. When all the
453	   traffic is in the new, tunnel based, network, the real network is
454	   allowed to converge on the new topology. Because all the traffic
455	   that would be affected by the change is carried in the overlay
456	   network no micro-loops form. When all micro-loop preventing routers
457	   have their tunnels in place, all the routers in the network are
458	   informed of the change in the normal way, at which point micro-
459	   loops may form within isolated islands of non-micro-loop preventing
460	   routers. However, only traffic entering the network via such
461	   routers can micro-loop. All traffic entering the network via a
462	   micro-loop preventing router will be tunneled correctly to the
463	   nearest repairing router, including, if necessary being tunneled
464	   via a non-micro-loop preventing router, and will not micro-loop.
465	   When all the non-micro-loop preventing routers have converged, the
466	   micro-loop preventing routers can change from tunneling the packets
467	   to forwarding normally according to the new topology. This
468	   transition can occur in any order without micro-loops forming.

470	   When a failure is detected (or a link is withdrawn from service),
471	   the router adjacent to the failure issues a new ("covert") routing
472	   message announcing the topology change. This message is propagated
473	   through the network by all routers, but is only understood by
474	   routers capable of using one of the tunnel based micro-loop
475	   prevention mechanisms.

477	   Each of the micro-loop preventing routers builds a tunnel to the
478	   closest router adjacent to the failure. They then determine which
479	   of their traffic would transit the failure and place that traffic
480	   in the tunnel. When all of these tunnels are in place, the failure
481	   is then announced as normal. Because these tunnels will be
482	   unaffected by the transition, and because the routers protecting
483	   the link will continue the repair (or forward across the link being
484	   withdrawn), no traffic will be disrupted by the failure. When the
485	   network has converged these tunnels are withdrawn, allowing traffic
486	   to be forwarded along its new "natural" path. The order of tunnel
487	   insertion and withdrawal is not important, provided that the
488	   tunnels are all in place before the normal announcement is issued.
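
   The step in which a router determines which of its traffic would
   transit the failure, and where to tunnel it, can be sketched in
   Python as follows. The sketch assumes a connected, symmetric-cost
   topology held as an adjacency map, treats a destination as affected
   if any of its shortest paths crosses the failed link, and uses the
   router on the near side of that link as the tunnel endpoint. The
   function names are hypothetical.

```python
import heapq

INF = float("inf")


def spf(graph, src):
    """Dijkstra: shortest-path cost from src to every reachable node."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, cost in graph[u].items():
            if d + cost < dist.get(v, INF):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist


def tunneled_destinations(graph, router, failed):
    """Map each affected destination to its tunnel endpoint."""
    u, v = failed
    dist = {n: spf(graph, n) for n in graph}
    plan = {}
    for dest in graph:
        if dest == router:
            continue
        # Check both directions in which the link may be crossed.
        for near, far in ((u, v), (v, u)):
            via = dist[router][near] + graph[near][far] + dist[far][dest]
            if via == dist[router][dest]:
                plan[dest] = near   # closest router adjacent to failure
    return plan


topology = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "D": 1},
    "C": {"A": 1, "D": 4},
    "D": {"B": 1, "C": 4},
}
print(tunneled_destinations(topology, "C", ("B", "D")))
```

   In this example router C reaches D via A and B in the old topology,
   so its traffic for D is placed in a tunnel to B, while its traffic
   for A and B is left alone.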

490	   This method completes in bounded time, and is much faster than the
491	   incremental cost method. Depending on the exact design it completes
492	   in two or three flood-SPF-FIB update cycles.

494	   Where there is no requirement to prevent the formation of micro-
495	   loops involving non-micro-loop preventing routers, a single,
496	   "normal" announcement may be made, and a local timer used to
497	   determine the time at which transition from tunneled forwarding to
498	   normal forwarding over the new topology may commence.

500	   This technique has the disadvantage that it requires traffic to be
501	   tunneled during the transition. This is an issue in IP networks
502	   because not all router designs are capable of high performance IP
503	   tunneling. It is also an issue in MPLS networks because the
504	   encapsulating router has to know the label set that the
505	   decapsulating router is distributing.

507	   A further disadvantage of this method is that it requires
508	   co-operation from all the routers within the routing domain to
509	   fully protect the network against micro-loops. However it can be
510	   shown that these micro-loops will be confined to contiguous groups
511	   of routers not executing this micro-loop prevention mechanism, and
512	   that it will only affect traffic arriving at the network through
513	   one of those routers.

515	   When a new link is added, the mechanism is run in reverse. When the
516	   "covert" announcement is heard, routers determine which traffic
517	   they will send over the new link, and tunnel that traffic to the
518	   router on the near side of that link. This path will not be
519	   affected by the presence of the new link. When the "normal"
520	   announcement is heard, they then update their FIB to send the
521	   traffic normally according to the new topology. Any traffic
522	   encountering a router that has not yet updated its FIB will be
523	   tunneled to the near side of the link, and will therefore not loop.

525	   When a management change to the topology is required, again exactly
526	   the same mechanism protects against micro-looping of packets by the
527	   micro-loop preventing routers.

529	   When the failure is an SRLG, the required strategy is to classify
530	   traffic according to the first member of the SRLG that it will
531	   traverse on its way to the destination, and to tunnel that traffic
532	   to the router that is closest to that SRLG member. This will
533	   require multiple tunnel destinations, in the limiting case, one per
534	   SRLG member.

536	6.3.      Distributed Tunnels

538	   In the distributed tunnels loop prevention method, each router
539	   calculates its own PQ repair [TUNNEL] for its traffic affected by
540	   the failure. The path to the P router will not be affected by the
541	   convergence process. In a manner similar to the single tunnel case,
542	   traffic is repaired in response to the "covert" announcement and
543	   moved to a "natural" path using the new topology in response to a
544	   "normal" announcement.

546	   This reduces the load on the tunnel endpoints, but the length of
547	   time taken to calculate the repairs increases the convergence time.

549	   This method suffers from the same disadvantages as the single
550	   tunnel method.

552	6.4.     Packet Marking

554	   If packets could be marked in some way, this information could be
555	   used to assign them to the new topology, the old topology, or a
556	   transition topology. They would then be correctly forwarded
557	   during the transition. This could, for example, be achieved by
558	   allocating a Type of Service bit to the task [RFC791]. This
559	   mechanism works identically for both "bad-news" and "good-news"
560	   events. It also works identically for SRLG failure. There are three
561	   problems with this solution:

563	     1. The packet marking bit is generally not available.

565	     2. The mechanism would introduce a non-standard forwarding
566	        procedure.

568	     3. Packet marking using either the old or the new topology would
569	        double the size of the FIB, although the use of a transition
570	        topology, for example always via the failure and its repair,
571	        would have a trivial impact on FIB size.
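
   The effect of marking can be illustrated with a small simulation.
   This is a sketch under stated assumptions, not a forwarding
   specification: the routers R1 and R2, the destination D, the FIBS
   table and the forward function are all hypothetical, and it is
   assumed that the first router marks the packet with its own
   current topology and that later routers honor the mark.

```python
OLD, NEW = "old", "new"

# Hypothetical per-router FIBs for both topologies:
# FIBS[router][topology][destination] -> next hop.
# The R1-D link has failed; the new path from R1 to D is via R2.
FIBS = {
    "R1": {OLD: {"D": "D"}, NEW: {"D": "R2"}},
    "R2": {OLD: {"D": "R1"}, NEW: {"D": "D"}},
}


def forward(src, dest, view, use_marking, max_hops=8):
    """Walk a packet hop by hop.

    view maps each router to the topology it currently uses. With
    marking, the topology chosen at the first router is carried in the
    packet. Returns the path taken, or None if the hop limit is
    exceeded (the packet is looping).
    """
    mark = view[src] if use_marking else None
    path, here = [src], src
    while here != dest:
        if len(path) > max_hops:
            return None
        topology = mark if mark is not None else view[here]
        here = FIBS[here][topology][dest]
        path.append(here)
    return path


# R1 has converged on the new topology, R2 has not.
view = {"R1": NEW, "R2": OLD}
unmarked = forward("R1", "D", view, use_marking=False)
marked = forward("R1", "D", view, use_marking=True)
```

   Without marking the packet bounces between R1 and R2 until the hop
   limit is reached. With marking R2 looks the packet up in its new
   topology FIB and delivers it, at the cost of holding both FIBs.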

573	6.5.     Ordered SPFs

575	   Micro-loops occur following a failure or a cost increase, when a
576	   router closer to the failed component revises its routes to take
577	   account of the failure before a router which is further away. By
578	   analyzing the reverse spanning tree over which traffic is directed
579	   to the failed component in the old topology, it is possible to
580	   determine a strict ordering which ensures that nodes closer to the
581	   root always process the failure after any nodes further away, and
582	   hence micro-loops are prevented.

584	   When the failure has been announced, each router waits a multiple
585	   of some time delay value. The multiple is determined by the node's
586	   position in the reverse spanning tree, and the delay value is
587	   chosen to guarantee that a node can complete its processing within
588	   this time. The convergence time may be reduced by employing a
589	   signaling mechanism to notify the parent when all the children have
590	   completed their processing, and hence when it is safe for the
591	   parent to instantiate its new routes.
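
   One way to derive the multiple is sketched below. It is an
   illustration only: it assumes the reverse spanning tree is already
   available as a map from each router to its next hop toward the
   failure, and it takes the multiple to be the height of the subtree
   below the router, so that leaves update first. The function name
   link_down_delays and the parameter max_update_ms are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def link_down_delays(parent: Dict[str, str], root: str,
                     max_update_ms: int) -> Dict[str, int]:
    """Return the delay before each router may update its FIB.

    parent maps each router that sends traffic via the failure to its
    next hop toward the failure in the old topology; root is the router
    adjacent to the failure. A router's multiple is the height of the
    subtree below it, so every router waits longer than all of the
    routers that forward traffic through it.
    """
    children = defaultdict(list)
    for node, up in parent.items():
        children[up].append(node)

    delays: Dict[str, int] = {}

    def height(node: str) -> int:
        h = max((1 + height(child) for child in children[node]), default=0)
        delays[node] = h * max_update_ms
        return h

    height(root)
    return delays


# Hypothetical tree: D -> C -> A and E -> A, where A is next to the failure.
delays = link_down_delays({"C": "A", "D": "C", "E": "A"}, "A", 100)
```

   Here D and E may update immediately, C after one delay period and
   A after two.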

593	   This approach therefore imposes a delay that is bounded by the
594	   network diameter, although in many cases the delay will be much
595	   less.

597	   When a link is returned to service the convergence process above is
598	   reversed. A router first calculates the reverse spanning tree in
599	   the new topology rooted at the far end of the new link, and
600	   determines its distance from the new link (in hops). It then waits
601	   a time that is proportional to that distance before updating its
602	   FIB.  It will be seen that network management actions can similarly
603	   be undertaken by treating a cost increase in a manner similar to a
604	   failure and a cost decrease similar to a restoration.
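
   The restoration case can be sketched in the same style. This is a
   minimal illustration under assumptions: the reverse spanning tree
   in the new topology is given as a map from each router to its next
   hop toward the new link, and the router attached to the link is
   taken to be at distance zero. The names link_up_delays and
   hop_delay_ms are hypothetical.

```python
from typing import Dict


def link_up_delays(parent: Dict[str, str], near_end: str,
                   hop_delay_ms: int) -> Dict[str, int]:
    """Return the delay before each router installs its new routes.

    parent maps each router that will use the new link to its next hop
    toward that link in the new topology. The delay is proportional to
    the hop distance from the link, so routers nearest the link update
    first (the reverse of the failure case).
    """
    def hops(node: str) -> int:
        count = 0
        while node != near_end:
            node = parent[node]
            count += 1
        return count

    routers = set(parent) | {near_end}
    return {router: hops(router) * hop_delay_ms for router in routers}


# Hypothetical tree: D -> C -> A and E -> A, where A is next to the new link.
up_delays = link_up_delays({"C": "A", "D": "C", "E": "A"}, "A", 100)
```

   A updates first, then C and E, and finally D, so a packet that is
   sent toward the new link always meets routers that have already
   updated.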

606	   The ordered SPF mechanism requires all nodes in the domain to
607	   operate according to these procedures, and the presence of
608	   non-cooperating nodes can give rise to loops for any traffic which
609	   traverses them (not just traffic which is originated through them).

611	   Without additional mechanisms these loops could remain in place for
612	   a significant time.

614	   It should be noted that this method requires per router ordering,
615	   but not per prefix ordering. A router must wait its turn to update
616	   its FIB, but it should then update its entire FIB.

618	   Another way of viewing the operation of this method is to realize
619	   that there is a horizon of routers affected by the failure. Routers
620	   beyond the horizon do not send packets via the failure. Routers at
621	   the horizon have a neighbor that does not send packets via the
622	   failure. It is then obvious that routers on the horizon can use
623	   their neighbor that is over the horizon as a loop free alternate to
624	   the destination and can hence update their FIBs immediately. Once
625	   these routers have updated their FIBs, they move over the horizon
626	   with respect to the failure and their neighbors that are closer to
627	   the failure become the new horizon routers.

629	   Only routers within the horizon need to change their FIBs and hence
630	   only those routers need to delay changing their FIBs.

632	   When an SRLG failure occurs a router must classify traffic into the
633	   classes that pass over each member of the SRLG. Ordered SPF
634	   convergence is then carried out on each SRLG member individually
635	   and the FIB updated for only those prefixes allowed to change at
636	   each epoch. Again, as for the single failure case, signaling may be
637	   used to speed up the convergence process. Note that the special
638	   SRLG case of a full or partial node failure can be dealt with
639	   without using per prefix ordering, by running a single reverse SPF
640	   rooted at the failed node (or common point of the subset of failing
641	   links in the partial case).

643	   There are two classes of signaling optimization that can be applied
644	   to the ordered SPF loop-prevention method:

646	     1. When the router makes NO change, it can signal immediately.
647	        This significantly reduces the time taken by the network to
648	        process long chains of routers that have no change to make to
649	        their FIB.

651	     2. When a router HAS changed, it can signal that it has
652	        completed. This is more problematic since completion may be
653	        difficult to determine, particularly in a distributed
654	        architecture, and the optimization obtained is only the
655	        difference between the actual time taken to make the FIB
656	        change and the worst case timer value.
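
   The gain from these optimizations can be estimated with a small
   model. It is a sketch under assumptions, not a protocol
   definition: signals are taken to be instantaneous and lossless,
   the timer multiple is the height of the subtree below a router,
   and a router starts its update when all of its children have
   signaled or its timer expires, whichever comes first. The name
   completion_times and its parameters are hypothetical.

```python
from typing import Dict, List


def completion_times(children: Dict[str, List[str]],
                     actual_ms: Dict[str, int],
                     root: str, worst_case_ms: int) -> Dict[str, int]:
    """Return the time at which each router finishes its FIB update.

    children describes the reverse spanning tree (router -> routers
    that forward through it). actual_ms is the real update time of each
    router, 0 for a router with no change to make.
    """
    done: Dict[str, int] = {}
    height: Dict[str, int] = {}

    def visit(node: str) -> None:
        kids = children.get(node, [])
        for kid in kids:
            visit(kid)
        height[node] = max((height[k] + 1 for k in kids), default=0)
        timer_expiry = height[node] * worst_case_ms
        all_children_signaled = max((done[k] for k in kids), default=0)
        # The timer remains as a fallback if signaling is slow.
        start = min(timer_expiry, all_children_signaled)
        done[node] = start + actual_ms[node]

    visit(root)
    return done


# Hypothetical tree: A has children C and E; C has child D (no change).
tree = {"A": ["C", "E"], "C": ["D"]}
actual = {"A": 100, "C": 200, "D": 0, "E": 300}
done = completion_times(tree, actual, "A", worst_case_ms=1000)
```

   With timers alone A would not start until 2000 ms. In this model
   it starts at 300 ms, when its slowest child has signaled.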

658	   There is another method of executing ordered SPF which is based on
659	   pure signaling [OB]. Methods that use signaling as an optimization
660	   are safe because eventually they fall back on the established IGP
661	   mechanisms which ensure that networks converge under conditions of
662	   packet loss. However, a mechanism that relies on signaling in order
663	   to converge requires a reliable signaling mechanism, which must be
664	   proven to recover from any failure circumstance.

666	6.6.     Synchronised FIB Updates

668	   Micro-loops form because of the asynchronous nature of the FIB
669	   update process during a network transition. In many router
670	   architectures it is the time taken to update the FIB itself that is
671	   the dominant term. One approach would be to have two FIBs and, in a
672	   synchronized action throughout the network, to switch from the old
673	   to the new. One way to achieve this synchronized change would be to
674	   signal or otherwise determine the wall clock time of the change,
675	   and then execute the change at that time, using NTP to synchronize
676	   the wall clocks in the routers.
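
   The two-FIB changeover can be sketched as follows. This is an
   illustration only: the class DualFib and its methods are
   hypothetical, and the caller is assumed to supply the current
   time from a clock that is disciplined by NTP.

```python
from typing import Dict, Optional


class DualFib:
    """Holds an active FIB and a staged FIB that takes over at a set time."""

    def __init__(self, active: Dict[str, str]) -> None:
        self.active = dict(active)
        self.staged: Optional[Dict[str, str]] = None
        self.switch_at: Optional[float] = None

    def stage(self, new_fib: Dict[str, str], switch_at: float) -> None:
        """Install the new FIB alongside the old one, to be used from switch_at."""
        self.staged = dict(new_fib)
        self.switch_at = switch_at

    def lookup(self, dest: str, now: float) -> str:
        """Return the next hop, switching to the staged FIB once it is due."""
        if self.staged is not None and now >= self.switch_at:
            self.active, self.staged, self.switch_at = self.staged, None, None
        return self.active[dest]


# Hypothetical router whose next hop for D changes from R1 to R2 at t=100.
fib = DualFib({"D": "R1"})
fib.stage({"D": "R2"}, switch_at=100.0)
before = fib.lookup("D", now=99.9)
after = fib.lookup("D", now=100.0)
```

   The changeover is a single reference swap, so the time taken to
   populate the new FIB no longer contributes to the window in which
   routers disagree. Only the residual clock error does.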

678	   This approach has a number of major issues. Firstly, two complete
679	   FIBs are needed, which may create a scaling issue, and secondly a
680	   suitable network-wide synchronization method is needed. However,
681	   neither of these is an insurmountable problem.

683	   Since the FIB change synchronization will not be perfect there may
684	   be some interval during which micro-loops form. Whether this scheme
685	   is classified as a micro-loop prevention mechanism or a micro-loop
686	   avoidance mechanism within this taxonomy is therefore dependent on
687	   the degree of synchronization achieved.

689	   This mechanism works identically for both "bad-news" and "good-
690	   news" events. It also works identically for SRLG failure.

692	   Further consideration needs to be given to interoperating with
693	   routers that do not support this mechanism. Without a suitable
694	   interoperating mechanism, loops may form for the duration of the
695	   synchronization delay.

697	7.    Loop Suppression

699	   A micro-loop suppression mechanism recognizes that a packet is
700	   looping and drops it. One such approach would be for a router to
701	   recognize, by some means, that it had seen the same packet before.
702	   It is difficult to see how sufficiently reliable discrimination
703	   could be achieved without some form of per-router signature such as
704	   route recording. A packet recognizing approach therefore seems
705	   infeasible.

707	   An alternative approach would be to recognize that a packet was
708	   looping by recognizing that it was being sent back to the place
709	   that it had just come from. This would work for the types of loop
710	   that form in symmetric cost networks, but would not suppress the
711	   cyclic loops that form in asymmetric networks.
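
   The check, and its limitation, can be sketched as follows. This is
   a minimal illustration: the functions forward_or_drop and trace
   and the example FIBs are hypothetical, and each router is assumed
   to know the neighbor from which a packet arrived.

```python
from typing import Dict, List, Optional


def forward_or_drop(fib: Dict[str, str], dest: str,
                    arrived_from: Optional[str]) -> Optional[str]:
    """Return the next hop, or None if the packet would go straight back."""
    next_hop = fib[dest]
    if next_hop == arrived_from:
        return None  # suppress: the packet is bouncing between two routers
    return next_hop


def trace(fibs: Dict[str, Dict[str, str]], start: str, dest: str,
          max_hops: int = 6) -> List[str]:
    """Follow a packet until it is delivered, dropped, or max_hops is exceeded."""
    path, prev, here = [start], None, start
    while here != dest and len(path) <= max_hops:
        nxt = forward_or_drop(fibs[here], dest, prev)
        if nxt is None:
            return path + ["DROP"]
        prev, here = here, nxt
        path.append(here)
    return path


# Two-router loop: A and B point at each other for destination D.
two_router = trace({"A": {"D": "B"}, "B": {"D": "A"}}, "A", "D")

# Cyclic loop: A -> B -> C -> A. No router sends the packet straight back.
cyclic = trace({"A": {"D": "B"}, "B": {"D": "C"}, "C": {"D": "A"}}, "A", "D")
```

   The two-router loop is cut at B on the first bounce. The cyclic
   loop is not detected, and the packet keeps circulating until the
   hop limit of the trace is reached.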

713	   This mechanism operates identically for "bad-news" events,
714	   "good-news" events and SRLG failure.

716	   The problem with this class of micro-loop control strategies is
717	   that, whilst they prevent collateral damage, they do nothing to
718	   enhance the productive forwarding of packets during the network
719	   transition.

721	8.    Compatibility Issues

723	   Deployment of any micro-loop control mechanism is a major change to
724	   a network. Full consideration must be given to interoperation
725	   between routers that are capable of micro-loop control, and those
726	   that are not. Additionally there may be a desire to limit the
727	   complexity of micro-loop control by choosing a method based purely
728	   on its simplicity. Any such decision must take into account that if
729	   a more capable scheme is needed in the future, its deployment will
730	   be complicated by interaction with the scheme previously deployed.

732	9.    Comparison of Loop-free Convergence Methods

734	   Safe-neighbors is an efficient mechanism to prevent the formation
735	   of micro-loops, but is only a partial solution. It is a useful
736	   adjunct to one of the complete solutions.

738	   Incremental cost advertisement is impractical because it takes too
739	   long to complete.

741	   Packet Marking is impractical because of the need to find the
742	   marking bit.

744	   Of the remaining methods, distributed tunnels is significantly more
745	   complex than single tunnels. It should only be considered if a
746	   tunnel solution is preferred and, even with the use of loop
747	   mitigation, the tunnel decapsulation load on the router adjacent to
748	   the topology change needs to be reduced.

750	   Synchronised FIBs is a fast method, but has the issue that a
751	   suitable synchronization mechanism needs to be defined. One method
752	   would be to use NTP; however, the coupling of routing convergence
753	   to a protocol that uses the network may be a problem. During the
754	   transition there will be some micro-looping for a short interval
755	   because it is not possible to achieve complete synchronization of
756	   the FIB changeover.

758	   The ordered SPF mechanism has the major advantage that it is a
759	   control plane only solution. However, SRLGs require a per-
760	   destination calculation, and the convergence delay is high, bounded
761	   by the network diameter. When combined with signaling as an
762	   accelerator and safe-neighbors to reduce the number of destinations
763	   that experience the full delay, this method is one of the two best
764	   choices.

766	   The single tunnel method deals relatively easily with SRLGs and
767	   uncorrelated changes. The convergence delay would be small. However,
768	   the method requires the use of tunneled forwarding, which is not
769	   supported on all router hardware, and raises issues of forwarding
770	   performance. When used with safe-neighbors, the amount of traffic
771	   that was tunneled would be significantly reduced, thus reducing the
772	   forwarding performance concerns. This method would be a good choice
773	   in combination with a tunneled IPFRR method. It is the other
774	   promising loop prevention candidate.

776	10.     IANA Considerations

778	   There are no IANA considerations that arise from this draft.

780	11.     Security Considerations

782	   All micro-loop control mechanisms raise significant security issues
783	   which must be addressed in their detailed technical description.

785	12.     Intellectual Property Statement

787	   The IETF takes no position regarding the validity or scope of any
788	   Intellectual Property Rights or other rights that might be claimed
789	   to pertain to the implementation or use of the technology described
790	   in this document or the extent to which any license under such
791	   rights might or might not be available; nor does it represent that
792	   it has made any independent effort to identify any such rights.
793	   Information on the procedures with respect to rights in RFC
794	   documents can be found in BCP 78 and BCP 79.

796	   Copies of IPR disclosures made to the IETF Secretariat and any
797	   assurances of licenses to be made available, or the result of an
798	   attempt made to obtain a general license or permission for the use
799	   of such proprietary rights by implementers or users of this
800	   specification can be obtained from the IETF on-line IPR repository
801	   at http://www.ietf.org/ipr.

803	   The IETF invites any interested party to bring to its attention any
804	   copyrights, patents or patent applications, or other proprietary
805	   rights that may cover technology that may be required to implement
806	   this standard.  Please address the information to the IETF at
807	   ietf-ipr@ietf.org.

809	13.      Full Copyright Statement

811	   Copyright (C) The Internet Society (2005). This document is subject
812	   to the rights, licenses and restrictions contained in BCP 78, and
813	   except as set forth therein, the authors retain all their rights.

815	   This document and the information contained herein are provided on
816	   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
817	   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
818	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
819	   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
820	   THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
821	   ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
822	   PARTICULAR PURPOSE.

824	14.     Normative References

826	   There are no normative references.

828	15.     Informative References

830	   Internet-drafts are works in progress available from
831	   <http://www.ietf.org/internet-drafts/>

833	   [APPL]        Bryant, S., Shand, M., "Applicability of Loop-
834	                 free Convergence", <draft-bryant-shand-lf-
835	                 applicability-00.txt>, Jun 2005, (work in
836	                 progress).

838	   [OB]          Francois, P., Bonaventure, O., "Avoiding
839	                 transient loops during IGP convergence", IEEE
840	                 INFOCOM 2005, March 2005, Miami, Fl., USA.

842	   [IPFRR]       Shand, M., "IP Fast-reroute Framework",
843	                 <draft-ietf-rtgwg-ipfrr-framework-01.txt>, June
844	                 2004, (work in progress).

846	   [LDP]         Andersson, L., Doolan, P., Feldman, N.,
847	                 Fredette, A. and B. Thomas, "LDP
848	                 Specification", RFC 3036,
849	                 January 2001.

851	   [MPLS-TE]     Ping Pan, et al, "Fast Reroute Extensions to
852	                 RSVP-TE for LSP Tunnels",
853	                 <draft-ietf-mpls-rsvp-lsp-fastreroute-07.txt>,
854	                 (work in progress).

856	   [RFC791]      Postel, J., "Internet Protocol", RFC 791,
857	                 September 1981.

859	   [TUNNEL]      Bryant, S., Shand, M., "IP Fast Reroute using
860	                 tunnels", <draft-bryant-ipfrr-tunnels-02.txt>,
861	                 Apr 2005 (work in progress).

863	   [ZININ]       Zinin, A., "Analysis and Minimization of
864	                 Microloops in Link-state Routing Protocols",
865	                 <draft-zinin-microloop-analysis-01.txt>, May
866	                 2005 (work in progress).

868	16.    Authors' Addresses

870	   Mike Shand
871	   Cisco Systems,
872	   250, Longwater,
873	   Green Park,
874	   Reading, RG2 6GB,
875	   United Kingdom.             Email: mshand@cisco.com

877	   Stewart Bryant
878	   Cisco Systems,
879	   250, Longwater,
880	   Green Park,
881	   Reading, RG2 6GB,
882	   United Kingdom.             Email: stbryant@cisco.com