idnits 2.17.1 

draft-bryant-shand-ipfrr-multi-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 15.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 442.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 453.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 460.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 466.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 30, 2008) is 5657 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-11) exists of
     draft-ietf-rtgwg-ipfrr-notvia-addresses-02

  == Outdated reference: A later version (-13) exists of
     draft-ietf-rtgwg-ipfrr-framework-08


     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                          S. Bryant
3	Internet-Draft                                                  M. Shand
4	Intended status: Informational                             Cisco Systems
5	Expires: May 3, 2009                                    October 30, 2008

7	               IPFRR in the Presence of Multiple Failures
8	                   draft-bryant-shand-ipfrr-multi-01

10	Status of this Memo

12	   By submitting this Internet-Draft, each author represents that any
13	   applicable patent or other IPR claims of which he or she is aware
14	   have been or will be disclosed, and any of which he or she becomes
15	   aware will be disclosed, in accordance with Section 6 of BCP 79.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt.

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   This Internet-Draft will expire on May 3, 2009.

35	Abstract

37	   IP Fast Reroute (IPFRR) work in the IETF has focused on the single
38	   failure case, where the failure could be a link, a node or a shared
39	   risk link group.  This draft describes possible extensions to not-via
40	   IPFRR that under some circumstances allow the repair of multiple
41	   simultaneous failures.

43	Requirements Language

45	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
46	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
47	   document are to be interpreted as described in RFC2119 [1].

49	Table of Contents

51	   1.  Introduction
52	   2.  The Problem
53	   3.  Outline Solution
54	   4.  Looping Repairs
55	     4.1.  Dropping Looping Packets
56	     4.2.  Computing non-looping Repairs of Repairs
57	     4.3.  N-level Mutual Loops
58	   5.  Mixing LFAs and Not-via
59	   6.  Security Considerations
60	   7.  IANA Considerations
61	   8.  References
62	     8.1.  Normative References
63	     8.2.  Informative References
64	   Authors' Addresses
65	   Intellectual Property and Copyright Statements

67	1.  Introduction

69	   Work on IP fast reroute (IPFRR) in the IETF (Framework [3], Basic [4]
70	   and not-via [2]) has so far been restricted to the repair of a
71	   single failure.  The single failure cases considered are a single
72	   link, a single node or a shared risk link group (SRLG).  IPFRR repair
73	   of multiple simultaneous failures which are not members of a known
74	   SRLG has not been addressed because of concerns that the use of
75	   multiple concurrent repairs may result in looping repair paths.  In
76	   order to prevent such loops, the current definition of IPFRR using
77	   not-via requires that packets addressed to a not-via address are not
78	   repaired but instead are dropped.

80	   It is possible that a network may experience multiple simultaneous
81	   failures.  This may be due to simple statistical effects, but the
82	   more likely cause is unanticipated SRLGs.  When multiple failures
83	   which are not part of an anticipated group are detected, repairs are
84	   abandoned and the network reverts to normal convergence.  Although
85	   safe, this approach is somewhat draconian, since there are many
86	   circumstances where multiple repairs do not induce loops.

88	   This Internet draft explores the properties of multiple unrelated
89	   failures and proposes some methods that may be used to address this
90	   problem using not-via techniques.

92	2.  The Problem

94	   Let us assume that the repair mechanism is based on not-via repairs.
95	   LFA or downstream routes may be incorporated, and will be dealt with
96	   later.

98	              A------//------B------------D
99	             /                \
100	            /                  \
101	           F                    G
102	            \                  /
103	             \                /
104	              X------//------Y

106	   Figure 1: The General Case of Multiple failures

108	   Note that depending on the repair case under consideration there may
109	   be other paths present in Figure 1, for example between A and B,
110	   and/or between X and Y. These paths are omitted for graphical
111	   clarity.

113	   A------//------B------------X------//------Y------D
114	   |              |            |              |
115	   |              |            |              |
116	   M--------------+            N--------------+

118	   Figure 2: Concatenated Repairs

120	   There are three cases to consider:

122	      1) Consider the general case of a pair of protected links A-B and
123	      X-Y as shown in the network fragment in Figure 1.  If the
124	      repair path for A-B does not traverse X-Y and the repair path for
125	      X-Y does not traverse A-B, this case is completely safe and will
126	      not cause looping or packet loss.

128	      A more common variation of this case is shown in Figure 2, which
129	      shows two failures in different parts of the network in which a
130	      packet from A to D traverses two concatenated repairs.

132	      2) In Figure 1, the repair for A-B traverses X-Y, but the repair
133	      for X-Y does not traverse A-B.  This case occurs when the not-via
134	      path from A to B traverses link X-Y, but the not-via path from X
135	      to Y traverses some path not shown in Figure 1.  In standard not-
136	      via the repaired packet for A-B would be dropped when it reached
137	      X-Y, since the repair of repaired packets is currently forbidden.
138	      However, if this packet were allowed to be repaired, the path to D
139	      would be complete and no harm would be done, although two levels
140	      of encapsulation would be required.

142	      3) The repair for A-B traverses X-Y AND the repair for X-Y
143	      traverses A-B.  In this case unrestricted repair would result in
144	      looping packets and increasing levels of encapsulation.

146	   The challenge in applying IPFRR to a network that is undergoing
147	   multiple failures is, therefore, to identify which of these cases
148	   exist in the network and react accordingly.
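
   The three cases can be told apart mechanically once both repair
   paths are known.  The following Python sketch is illustrative only:
   the names RepairCase, traverses and classify are not defined by this
   draft, paths are assumed to be given as node lists, and a path is
   treated as traversing a link regardless of direction (Section 4.2
   treats the two directions separately).  The Figure 1 paths assume
   that only the links drawn in the figure exist.

```python
from enum import Enum


class RepairCase(Enum):
    INDEPENDENT = 1  # case 1: neither repair crosses the other failure
    ONE_WAY = 2      # case 2: exactly one repair crosses the other failure
    MUTUAL = 3       # case 3: each repair crosses the other failure


def traverses(path, link):
    """True if the node list `path` crosses `link` in either direction."""
    a, b = link
    hops = set(zip(path, path[1:]))
    return (a, b) in hops or (b, a) in hops


def classify(repair_1, link_1, repair_2, link_2):
    """Classify a pair of protected links from their repair paths."""
    one_crosses_two = traverses(repair_1, link_2)
    two_crosses_one = traverses(repair_2, link_1)
    if one_crosses_two and two_crosses_one:
        return RepairCase.MUTUAL
    if one_crosses_two or two_crosses_one:
        return RepairCase.ONE_WAY
    return RepairCase.INDEPENDENT


# Figure 1, assuming no links other than those drawn.
ab_repair = ["A", "F", "X", "Y", "G", "B"]
xy_repair = ["X", "F", "A", "B", "G", "Y"]
case_fig1 = classify(ab_repair, ("A", "B"), xy_repair, ("X", "Y"))

# Figure 2: each repair stays local to its own failure.
case_fig2 = classify(["A", "M", "B"], ("A", "B"),
                     ["X", "N", "Y"], ("X", "Y"))
```

   Under these assumptions case_fig1 is case 3 (mutual) and case_fig2
   is case 1 (independent).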

150	3.  Outline Solution

152	   When A is computing the not-via repair path for A-B (i.e. the path
153	   for packets addressed to Ba, read as "B not-via A") it is aware of
154	   the list of nodes which this path traverses.  This can be recorded by
155	   a simple addition to the SPF process, and the not-via addresses
156	   associated with each forward link can be determined.  If the path
157	   were A, F, X, Y, G, B (Figure 1), the list of not-via addresses would
158	   be: Fa, Xf, Yx, Gy, Bg.  Under standard not-via operation, A would
159	   populate its FIB such that all normal addresses normally reachable
160	   via A-B would be encapsulated to Ba when A-B fails, but traffic
161	   addressed to any not-via address arriving at A would be dropped.  The
162	   new procedure modifies this such that any traffic for a not-via
163	   address normally reachable over A-B is also encapsulated to Ba unless
164	   the not-via address is one of those previously identified as being on
165	   the path to Ba, for example Yx, in which case the packet is dropped.

167	   The above procedure allows cases 1 and 2 above to be repaired, while
168	   preventing the loop which would result from case 3.

170	   Note that this is accomplished by pre-computing the required FIB
171	   entries, and does not require any detailed packet inspection.  The
172	   same result could be achieved by checking for multiple levels of
173	   encapsulation and dropping any attempt to triple encapsulate.
174	   However, this would require more detailed inspection of the packet,
175	   and causes difficulties when more than 2 "simultaneous" failures are
176	   contemplated.
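
   The pre-computation of FIB entries described in this section might
   be sketched as follows.  This is a minimal illustration, not a
   definitive implementation: the function names, the tuple encoding of
   FIB actions and the address "Qp" (standing for some not-via address
   that is not on the repair path) are assumptions made for the
   example.  Not-via addresses are formed in the notation used above,
   e.g. "Ba" for "B not-via A", which assumes single-letter node names.

```python
def notvia_addresses(path):
    """Not-via address of each forward link on `path`: the node name
    followed by the lower-cased name of the previous node."""
    return [node + prev.lower() for prev, node in zip(path, path[1:])]


def build_repair_fib(repair_path, normal_dests, notvia_dests):
    """Pre-compute repair actions for the protected link
    repair_path[0] -> repair_path[-1].

    normal_dests and notvia_dests are the addresses that are normally
    reachable over the protected link.
    """
    tunnel = repair_path[-1] + repair_path[0].lower()  # e.g. "Ba"
    on_repair_path = set(notvia_addresses(repair_path))

    # Normal addresses are always repaired through the tunnel.
    fib = {dest: ("encapsulate", tunnel) for dest in normal_dests}

    # Not-via addresses are repaired too, unless they lie on the path
    # to the tunnel endpoint, in which case repairing them could loop.
    for dest in notvia_dests:
        if dest in on_repair_path:
            fib[dest] = ("drop", None)
        else:
            fib[dest] = ("encapsulate", tunnel)
    return fib


path_to_ba = ["A", "F", "X", "Y", "G", "B"]
fib = build_repair_fib(path_to_ba,
                       normal_dests=["D"],
                       notvia_dests=["Yx", "Qp"])
```

   Here traffic for D and for the hypothetical Qp is encapsulated to
   Ba, while traffic for Yx is dropped because Yx is on the path to Ba.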

178	   So far we have permitted benign repairs to coexist, albeit sometimes
179	   requiring multiple encapsulation.  Note that in many cases there will
180	   be no performance impact since unless both failures are on the same
181	   node, the two encapsulations or two decapsulations will be performed
182	   at different nodes.  There is however the issue of the MTU impact of
183	   multiple encapsulations.

185	   In the following section we consider the various strategies that may
186	   be applied to case 3 - mutual repairs that would loop.

188	4.  Looping Repairs

190	   In case 3, the simplest approach is not to install repairs for
191	   repair paths that might loop.  In this case, although the potentially
192	   looping traffic is dropped, the traffic is not repaired.  If we
193	   assume that a hold-down is applied before reconvergence in case the
194	   link failure was just a short glitch, and if a loop free convergence
195	   mechanism further delays convergence, then the traffic will be
196	   dropped for an extended period.  In these circumstances it would be
197	   better to "abandon all hope" (AAH)
198	   [<draft-bryant-francois-shand-ipfrr-aah-00.txt>] and immediately
199	   invoke normal re-convergence.

201	   Note that it is not sufficient to expedite the issuance of an LSP
202	   reporting the failure, since this may be treated as a permitted
203	   simultaneous failure by the oFIB algorithm.  It is therefore
204	   necessary to explicitly trigger an oFIB AAH.

206	4.1.  Dropping Looping Packets

208	   One approach to case 3 is to allow the repair, and to experimentally
209	   discover the incompatibility of the repairs if and when they occur.
210	   With this method we permit the repair in case 3 and trigger AAH when
211	   a packet drop count on the not-via address has been incremented.
212	   Alternatively, it is possible to wait until the LSP describing the
213	   change is issued normally (i.e. when X announces the failure of X-Y).
214	   When the repairing node A, which has precomputed that X-Y failures
215	   are mutually incompatible with its own repairs receives this LSP it
216	   can then issue the AAH.  This has the disadvantage that it doesn't
217	   overcome the hold-down delay, but it requires no "data-driven"
218	   operation, and it still has the required effect of abandoning the
219	   oFIB which is probably the longer of the delays (although with
220	   signalled oFIB this should be sub-second).

222	   Whilst both of the experimental approaches described above are
223	   feasible, they tend to induce AAH in the presence of otherwise
224	   feasible repairs, and they are contrary to the philosophy of repair
225	   pre-determination that has been applied to existing IPFRR solutions.

227	4.2.  Computing non-looping Repairs of Repairs

229	   An alternative approach to simply dropping the looping packets, or to
230	   detecting the loop after it has occurred, is to use secondary SRLGs.
231	   With a link state routing protocol it is possible to precompute the
232	   incompatibility of the repairs in advance and to compute an
233	   alternative SRLG repair path.  Although this does considerably
234	   increase the computational complexity it may be possible to compute
235	   repair paths that avoid the need to simply drop the offending
236	   packets.

238	   This approach requires us to identify the mutually incompatible
239	   failures, and advertise them as "secondary SRLGs".  When computing
240	   the repair paths for the affected not-via addresses these links are
241	   simultaneously failed.  Note that the assumed simultaneous failure
242	   and resulting repair path only applies to the repair path computed
243	   for the conflicting not-via addresses, and is not used for normal
244	   addresses.  Note that this implies that although there will be a
245	   longer repair path when there is more than one failure, if there is a
246	   single failure the repair path length will be "normal".

248	   Ideally we would wish to only invoke secondary SRLG computation when
249	   we are sure that the repair paths are mutually incompatible.
250	   Consider the case of node A in Figure 1.  A first identifies that the
251	   repair path for A-B is via F-X-Y-G-B.  It then explores this path
252	   determining the repair path for each link in the path.  Thus, for
253	   example, it performs a check at X by running an SPF rooted at X with
254	   the X-Y link removed to determine whether A-B is indeed on X's repair
255	   path for packets addressed to Yx.
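
   The check performed at X can be sketched with an ordinary Dijkstra
   SPF.  The code below is an illustration under stated assumptions:
   all links have unit cost, only the links drawn in Figure 1 exist,
   and the helper names are hypothetical.  The not-via path to Yx is
   modelled as the shortest path from X to Y with link X-Y removed.

```python
import heapq


def shortest_path(links, src, dst, removed=frozenset()):
    """Dijkstra over undirected `links` ({(u, v): cost}), ignoring the
    links in `removed`.  Returns the node list from src to dst, or None
    if dst cannot be reached."""
    adj = {}
    for (u, v), cost in links.items():
        if (u, v) in removed or (v, u) in removed:
            continue
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))

    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break  # destination settled, no need to continue
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in adj.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))

    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def uses_directed_link(path, link):
    """True if `path` crosses `link` in the given direction."""
    return link in set(zip(path, path[1:]))


FIGURE_1 = {("A", "B"): 1, ("B", "D"): 1, ("A", "F"): 1, ("F", "X"): 1,
            ("X", "Y"): 1, ("Y", "G"): 1, ("G", "B"): 1}

# A's repair path for A-B, then the check at X for link X-Y.
ab_repair = shortest_path(FIGURE_1, "A", "B", removed={("A", "B")})
xy_repair = shortest_path(FIGURE_1, "X", "Y", removed={("X", "Y")})
mutual = (uses_directed_link(ab_repair, ("X", "Y"))
          and uses_directed_link(xy_repair, ("A", "B")))
```

   With this topology the two repairs each traverse the other link, so
   the pair A-B and X-Y is a candidate secondary SRLG.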

257	   Some optimizations are possible in this calculation, which appears at
258	   first sight to be order hk (where h is the average hop length of
259	   repair paths and k is the average number of neighbours of a router).
260	   When A is computing its set of repair paths, it does so for all its k
261	   neighbours.  In each case it identifies a list of node pairs
262	   traversed by each repair.  These lists may often have one or more
263	   node pairs in common, so the actual number of link failures which
264	   require investigation is the union of these sets.  It is then
265	   necessary to run an SPF rooted at the first node of each pair (the
266	   first node because the pairings are ordered representing the
267	   direction of the path), with the link to the second node removed.
268	   This SPF, while not incremental, can be terminated as soon as the
269	   not-via address is reached.  For example, when running the SPF rooted
270	   at X, with the link X-Y removed, the SPF can be terminated when Yx is
271	   reached.  Once the path has been found, the path is checked to
272	   determine if it traverses any of A's links in the direction away from
273	   A. Note that, because the node pair XY may exist in the list for more
274	   than one of A's links (i.e. it lies on more than one repair path), it
275	   is necessary to identify the correct list, and hence link which has a
276	   mutually looping repair path.  That link of A is then advertised by A
277	   as a secondary SRLG paired with the link X-Y.  Also note that X will
278	   be running this algorithm as well, and will identify that XY is
279	   paired with A-B and so advertise it.  This could perhaps be used as a
280	   further check.
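
   The bookkeeping behind this optimization can be sketched as an
   index from ordered node pairs to the links of A whose repairs cross
   them.  The sketch is illustrative: the function name, the neighbour
   C and the repair path for A-C are assumptions added to show a shared
   pair, and are not part of Figure 1.

```python
def pairs_to_check(repair_paths):
    """Map each ordered node pair to the set of protected links whose
    repair path traverses it.

    repair_paths: {protected_link: repair path as a node list}.
    One SPF is needed per key of the result, however many repair
    paths share that pair.
    """
    index = {}
    for link, path in repair_paths.items():
        for pair in zip(path, path[1:]):
            index.setdefault(pair, set()).add(link)
    return index


repairs_at_a = {
    ("A", "B"): ["A", "F", "X", "Y", "G", "B"],
    ("A", "C"): ["A", "F", "X", "Y", "C"],  # hypothetical neighbour C
}
index = pairs_to_check(repairs_at_a)

total_hops = sum(len(path) - 1 for path in repairs_at_a.values())
spf_runs = len(index)
```

   In this example the two repair paths contain nine hops in total but
   only six distinct ordered pairs, so six SPFs suffice.  The entry
   for (X, Y) names both A-B and A-C, which is the information needed
   to identify which link of A has the mutually looping repair.  The
   pair (Y, X) would be a separate key.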

282	   The ordering of the pairs in the lists is important, i.e. X-Y and
283	   Y-X are dealt with separately.  If and only if the repairs are
284	   mutually incompatible, we need to advertise the pair of links as a
285	   secondary SRLG, and then ALL nodes compute repair paths around both
286	   failures using an additional not-via address with the semantics not-
287	   via A-B AND not-via X-Y.

289	   A further possibility is that because we are going to the trouble of
290	   advertising these SRLG sets, we could also advertise the new repair
291	   path and only get the nodes on that path to perform the necessary
292	   computation.  Note also that once we have reached Q-space with
293	   respect to the two failures we need no longer continue the
294	   computation, so we only need to notify the nodes on the path that are
295	   not in Q-space.

297	   One cause of mutually looping repair paths is the existence of nodes
298	   with only two links, or sections of the network which are only bi-
299	   connected.  In these cases, repair is clearly impossible - the
300	   failure of both links partitions the network.  It would be
301	   advantageous to be able to identify these cases, and inhibit the
302	   fruitless advertisement of the secondary SRLG information.  This
303	   could be achieved by the node detecting the requirement for a
304	   secondary SRLG, first running the not-via computation with both links
305	   removed.  If this does not result in a path, it is clear that the
306	   network would be partitioned by such a failure, and so no
307	   advertisement is required.
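
   The partition test can be sketched as a reachability check with
   both links removed.  This is a simplification for illustration: a
   breadth-first search stands in for the full not-via computation,
   since only the existence of a path matters here.  The function
   names and the node P (an extra path between A and B of the kind
   omitted from Figure 1) are assumptions.

```python
from collections import deque


def reachable(links, src, dst, removed):
    """Breadth-first search over undirected `links`, ignoring the
    links in `removed`.  True if dst can be reached from src."""
    adj = {}
    for u, v in links:
        if (u, v) in removed or (v, u) in removed:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def should_advertise_secondary_srlg(links, own_link, other_link):
    """Advertise only if the far end of own_link is still reachable
    when both links have failed."""
    removed = {own_link, other_link}
    return reachable(links, own_link[0], own_link[1], removed)


FIGURE_1 = [("A", "B"), ("B", "D"), ("A", "F"), ("F", "X"),
            ("X", "Y"), ("Y", "G"), ("G", "B")]
WITH_EXTRA_PATH = FIGURE_1 + [("A", "P"), ("P", "B")]  # hypothetical

as_drawn = should_advertise_secondary_srlg(
    FIGURE_1, ("A", "B"), ("X", "Y"))
with_p = should_advertise_secondary_srlg(
    WITH_EXTRA_PATH, ("A", "B"), ("X", "Y"))
```

   With only the links drawn in Figure 1, the double failure
   partitions the network and no advertisement is made.  With the
   hypothetical path through P, a repair exists and the secondary SRLG
   is worth advertising.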

309	4.3.  N-level Mutual Loops

311	   It is tempting to conclude that the mechanism described above can be
312	   applied to the general case of N failures.  If we use the approach of
313	   assuming that repairs are not mutual, and catching the loops and
314	   executing AAH when they occur, then we can attempt repairs in the
315	   case of N failures.

317	   If we use the approach of avoiding potentially mutual repairs and
318	   creating secondary SRLG, then we have to explore N levels of repair,
319	   where N is the number of simultaneous failures we wish to repair.

321	5.  Mixing LFAs and Not-via

323	   So far in this draft we have assumed that all repairs use not-via
324	   tunnels.  However, in practice we may wish to use loop free
325	   alternates (LFAs) or downstream routes where available.  This
326	   complicates the issue, because their use results in packets which are
327	   being repaired, but NOT addressed to not-via addresses.  If BOTH
328	   links are using downstream routes there is no possibility of looping,
329	   since it is impossible to have a pair of nodes which are both
330	   downstream of each other (Basic [4]).

332	   Loops can however occur when LFAs are used.  An obvious example is
333	   the well-known node repair problem with LFAs (Basic [4]).  If one link
334	   is using a downstream route, while the other is using a not-via
335	   tunnel, the potential mechanism described above would work provided
336	   it were possible to determine the nodes on the path of the downstream
337	   route.  Some methods of computing downstream routes do not provide
338	   this path information.  If the path information is however available,
339	   the link using a downstream route will have a discard FIB entry for
340	   the not-via address of the other link.  The consequence is that
341	   potentially looping packets will be discarded when they attempt to
342	   cross this link.

344	   In the case where the mutual repairs are both using not-via repairs,
345	   the loop will be broken when the packet arrives at the second
346	   failure.  However packets are unconditionally repaired at downstream
347	   routes, and thus when the mutual pair consists of a downstream route
348	   and a not-via repair, the looping packet will only be dropped when it
349	   gets back to the first failure, i.e. it will execute a single turn of
350	   the loop before being dropped.

352	   There is a further complication with downstream routes, since
353	   although the path may be computed to the far side of the failure, the
354	   packet may "peel off" to its destination before reaching the far side
355	   of the failure.  In this case it may traverse some other link which
356	   has failed and was not accounted for on the computed path.  If the
357	   A-B repair (Figure 1) is a downstream route and the X-Y repair is a
358	   not-via repair, we can have the situation where the X-Y repair
359	   packets encapsulated to Yx follow a path which attempts to traverse
360	   A-B.  If the A-B repair path for "normal" addresses is a downstream
361	   route, it cannot be assumed that the repair path for packets
362	   addressed to Yx can be sent to the same neighbour.  This is because
363	   the validity of a downstream route must be ascertained in the
364	   topology represented by Yx, i.e. that with the link X-Y failed.  This
365	   is not the same topology that was used for the normal downstream
366	   calculation, and use of the normal downstream route for the
367	   encapsulated packets may result in an undetected loop.  If it is
368	   computationally feasible to check the downstream route in this
369	   topology (i.e. for any not-via address Qp which traverses A-B we must
370	   perform the downstream calculation for that not-via address in the
371	   topology with link Q-P failed), then the downstream repair for Yx
372	   can safely be used.  These packets cannot re-visit X-Y, since by
373	   definition they will avoid that link.  Alternatively, the packet
374	   could always be repaired in a not-via tunnel, i.e. even though the
375	   normal repair for traffic traversing A-B would be to use a downstream
376	   route, we could insist that such traffic addressed to a not-via
377	   address MUST use a tunnel to Ba.  Such a tunnel would only be
378	   installed for an address Qp if it were established that it did not
379	   traverse Q-P (using the rules described above).
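
   The dependence of a downstream route on the topology can be
   illustrated as follows.  The sketch assumes unit link costs, only
   the links drawn in Figure 1, and the downstream criterion of
   Basic [4]: a neighbour is downstream if it is strictly closer to
   the destination than the repairing node.  The function names are
   hypothetical.

```python
import heapq


def distances(links, src, removed=frozenset()):
    """Dijkstra distances from src over undirected `links`
    ({(u, v): cost}), ignoring the links in `removed`."""
    adj = {}
    for (u, v), cost in links.items():
        if (u, v) in removed or (v, u) in removed:
            continue
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))

    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in adj.get(u, []):
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist


def is_downstream(links, node, neighbour, dest, removed=frozenset()):
    """True if `neighbour` is strictly closer to `dest` than `node`
    in the topology with the `removed` links failed."""
    # Links are undirected, so distances from dest equal distances to it.
    to_dest = distances(links, dest, removed)
    inf = float("inf")
    return to_dest.get(neighbour, inf) < to_dest.get(node, inf)


FIGURE_1 = {("A", "B"): 1, ("B", "D"): 1, ("A", "F"): 1, ("F", "X"): 1,
            ("X", "Y"): 1, ("Y", "G"): 1, ("G", "B"): 1}

# F as a repair neighbour of A for traffic towards Y.
normal_topology = is_downstream(FIGURE_1, "A", "F", "Y")
yx_topology = is_downstream(FIGURE_1, "A", "F", "Y",
                            removed={("X", "Y")})
```

   In the normal topology F is downstream of A with respect to Y.  In
   the topology represented by Yx (link X-Y failed) it is not, so
   sending packets addressed to Yx to F would return them towards A.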

381	6.  Security Considerations

383	   Security considerations described in Framework [3], Basic [4] and
384	   not-via [2] apply to this work.  Any additional security
385	   considerations will be provided in a future revision of this draft.

387	7.  IANA Considerations

389	   There are no IANA actions required by this draft.

391	8.  References
392	8.1.  Normative References

394	   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
395	        Levels", BCP 14, RFC 2119, March 1997.

397	   [2]  Shand, M., Bryant, S., and S. Previdi, "IP Fast Reroute Using
398	        Not-via Addresses", draft-ietf-rtgwg-ipfrr-notvia-addresses-02
399	        (work in progress), February 2008.

401	8.2.  Informative References

403	   [3]  Shand, M. and S. Bryant, "IP Fast Reroute Framework",
404	        draft-ietf-rtgwg-ipfrr-framework-08 (work in progress),
405	        February 2008.

407	   [4]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
408	        Reroute: Loop-Free Alternates", RFC 5286, September 2008.

410	Authors' Addresses

412	   Stewart Bryant
413	   Cisco Systems
414	   250, Longwater Ave, Green Park,
415	   Reading  RG2 6GB
416	   UK

418	   Email: stbryant@cisco.com

420	   Mike Shand
421	   Cisco Systems
422	   250, Longwater Ave, Green Park,
423	   Reading  RG2 6GB
424	   UK

426	   Email: mshand@cisco.com

428	Full Copyright Statement

430	   Copyright (C) The IETF Trust (2008).

432	   This document is subject to the rights, licenses and restrictions
433	   contained in BCP 78, and except as set forth therein, the authors
434	   retain all their rights.

436	   This document and the information contained herein are provided on an
437	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
438	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
439	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
440	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
441	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
442	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

444	Intellectual Property

446	   The IETF takes no position regarding the validity or scope of any
447	   Intellectual Property Rights or other rights that might be claimed to
448	   pertain to the implementation or use of the technology described in
449	   this document or the extent to which any license under such rights
450	   might or might not be available; nor does it represent that it has
451	   made any independent effort to identify any such rights.  Information
452	   on the procedures with respect to rights in RFC documents can be
453	   found in BCP 78 and BCP 79.

455	   Copies of IPR disclosures made to the IETF Secretariat and any
456	   assurances of licenses to be made available, or the result of an
457	   attempt made to obtain a general license or permission for the use of
458	   such proprietary rights by implementers or users of this
459	   specification can be obtained from the IETF on-line IPR repository at
460	   http://www.ietf.org/ipr.

462	   The IETF invites any interested party to bring to its attention any
463	   copyrights, patents or patent applications, or other proprietary
464	   rights that may cover technology that may be required to implement
465	   this standard.  Please address the information to the IETF at
466	   ietf-ipr@ietf.org.