Network Working Group                                          S. Bryant
Internet-Draft                                                  M. Shand
Intended status: Informational                             Cisco Systems
Expires: May 3, 2009                                    October 30, 2008


               IPFRR in the Presence of Multiple Failures
                   draft-bryant-shand-ipfrr-multi-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 3, 2009.

Abstract

   IP Fast Reroute (IPFRR) work in the IETF has focused on the single
   failure case, where the failure could be a link, a node or a shared
   risk link group. This draft describes possible extensions to not-via
   IPFRR that under some circumstances allow the repair of multiple
   simultaneous failures.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [1].

Table of Contents

   1.  Introduction
   2.  The Problem
   3.  Outline Solution
   4.  Looping Repairs
     4.1.  Dropping Looping Packets
     4.2.  Computing non-looping Repairs of Repairs
     4.3.  N-level Mutual Loops
   5.  Mixing LFAs and Not-via
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   Work on IP fast reroute (IPFRR) in the IETF (Framework [3], Basic [4]
   and not-via [2]) has so far been restricted to the case of repair of
   a single failure. The single failure cases considered are a single
   link, a single node or a shared risk link group (SRLG).
   IPFRR repair of multiple simultaneous failures which are not members
   of a known SRLG has not been addressed because of concerns that the
   use of multiple concurrent repairs may result in looping repair
   paths. In order to prevent such loops, the current definition of
   IPFRR using not-via requires that packets addressed to a not-via
   address are not repaired but instead are dropped.

   It is possible that a network may experience multiple simultaneous
   failures. This may be due to simple statistical effects, but the
   more likely cause is unanticipated SRLGs. When multiple failures
   which are not part of an anticipated group are detected, repairs are
   abandoned and the network reverts to normal convergence. Although
   safe, this approach is somewhat draconian, since there are many
   circumstances where multiple repairs do not induce loops.

   This Internet draft explores the properties of multiple unrelated
   failures and proposes some methods that may be used to address this
   problem using not-via techniques.

2.  The Problem

   Let us assume that the repair mechanism is based on not-via repairs.
   LFA or downstream routes may be incorporated, and will be dealt with
   later.

                 A------//------B------------D
                /                \
               /                  \
              F                    G
               \                  /
                \                /
                 X------//------Y

            Figure 1: The General Case of Multiple Failures

   Note that depending on the repair case under consideration there may
   be other paths present in Figure 1, for example between A and B,
   and/or between X and Y. These paths are omitted for graphical
   clarity.

        A------//------B------------X------//------Y------D
        |              |            |              |
        |              |            |              |
        M--------------+            N--------------+

                     Figure 2: Concatenated Repairs

   There are three cases to consider:

   1) Consider the general case of a pair of protected links A-B and
      X-Y, as shown in the network fragment in Figure 1.
      If the repair path for A-B does not traverse X-Y and the repair
      path for X-Y does not traverse A-B, this case is completely safe
      and will not cause looping or packet loss.

      A more common variation of this case is shown in Figure 2, which
      shows two failures in different parts of the network in which a
      packet from A to D traverses two concatenated repairs.

   2) In Figure 1, the repair for A-B traverses X-Y, but the repair
      for X-Y does not traverse A-B. This case occurs when the not-via
      path from A to B traverses link X-Y, but the not-via path from X
      to Y traverses some path not shown in Figure 1. In standard
      not-via the repaired packet for A-B would be dropped when it
      reached X-Y, since the repair of repaired packets is currently
      forbidden. However, if this packet were allowed to be repaired,
      the path to D would be complete and no harm would be done,
      although two levels of encapsulation would be required.

   3) The repair for A-B traverses X-Y AND the repair for X-Y
      traverses A-B. In this case unrestricted repair would result in
      looping packets and increasing levels of encapsulation.

   The challenge in applying IPFRR to a network that is undergoing
   multiple failures is, therefore, to identify which of these cases
   exist in the network and react accordingly.

3.  Outline Solution

   When A is computing the not-via repair path for A-B (i.e. the path
   for packets addressed to Ba, read as "B not-via A") it is aware of
   the list of nodes which this path traverses. This can be recorded by
   a simple addition to the SPF process, and the not-via addresses
   associated with each forward link can be determined. If the path
   were A, F, X, Y, G, B (Figure 1), the list of not-via addresses would
   be: Fa, Xf, Yx, Gy, Bg.
Under standard not-via operation, A would 159 populate its FIB such that all normal addresses normally reachable 160 via A-B would be encapsulated to Ba when A-B fails, but traffic 161 addressed to any not-via address arriving at A would be dropped. The 162 new procedure modifies this such that any traffic for a not-via 163 address normally reachable over A-B is also encapsulated to Ba unless 164 the not-via address is one of those previously identified as being on 165 the path to Ba, for example Yx, in which case the packet is dropped. 167 The above procedure allows cases 1 and 2 above to be repaired, while 168 preventing the loop which would result from case 3. 170 Note that this is accomplished by pre-computing the required FIB 171 entries, and does not require any detailed packet inspection. The 172 same result could be achieved by checking for multiple levels of 173 encapsulation and dropping any attempt to triple encapsulate. 174 However, this would require more detailed inspection of the packet, 175 and causes difficulties when more than 2 "simultaneous" failures are 176 contemplated. 178 So far we have permitted benign repairs to coexist, albeit sometimes 179 requiring multiple encapsulation. Note that in many cases there will 180 be no performance impact since unless both failures are on the same 181 node, the two encapsulations or two decapsulations will be performed 182 at different nodes. There is however the issue of the MTU impact of 183 multiple encapsulations. 185 In the following section we consider the various strategies that may 186 be applied to case 3 - mutual repairs that would loop. 188 4. Looping Repairs 190 In case 3, the simplest approach is to simply not install repairs for 191 repair paths that might loop. In this case, although the potentially 192 looping traffic is dropped, the traffic is not repaired. 
   If we assume that a hold-down is applied before reconvergence in
   case the link failure was just a short glitch, and if a loop free
   convergence mechanism further delays convergence, then the traffic
   will be dropped for an extended period. In these circumstances it
   would be better to "abandon all hope" (AAH) [] and immediately
   invoke normal re-convergence.

   Note that it is not sufficient to expedite the issuance of an LSP
   reporting the failure, since this may be treated as a permitted
   simultaneous failure by the oFIB algorithm. It is therefore
   necessary to explicitly trigger an oFIB AAH.

4.1.  Dropping Looping Packets

   One approach to case 3 is to allow the repair, and to experimentally
   discover the incompatibility of the repairs if and when they occur.
   With this method we permit the repair in case 3 and trigger AAH when
   a packet drop count on the not-via address has been incremented.
   Alternatively, it is possible to wait until the LSP describing the
   change is issued normally (i.e. when X announces the failure of X-Y).
   When the repairing node A, which has precomputed that X-Y failures
   are mutually incompatible with its own repairs, receives this LSP,
   it can then issue the AAH. This has the disadvantage that it doesn't
   overcome the hold-down delay, but it requires no "data-driven"
   operation, and it still has the required effect of abandoning the
   oFIB, which is probably the longer of the delays (although with
   signalled oFIB this should be sub-second).

   Whilst both of the experimental approaches described above are
   feasible, they tend to induce AAH in the presence of otherwise
   feasible repairs, and they are contrary to the philosophy of repair
   pre-determination that has been applied to existing IPFRR solutions.

4.2.  Computing non-looping Repairs of Repairs

   An alternative approach to simply dropping the looping packets, or to
   detecting the loop after it has occurred, is to use secondary SRLGs.
   With a link state routing protocol it is possible to precompute the
   incompatibility of the repairs in advance and to compute an
   alternative SRLG repair path. Although this does considerably
   increase the computational complexity, it may be possible to compute
   repair paths that avoid the need to simply drop the offending
   packets.

   This approach requires us to identify the mutually incompatible
   failures, and advertise them as "secondary SRLGs". When computing
   the repair paths for the affected not-via addresses these links are
   simultaneously failed. Note that the assumed simultaneous failure
   and resulting repair path only applies to the repair path computed
   for the conflicting not-via addresses, and is not used for normal
   addresses. This implies that although there will be a longer repair
   path when there is more than one failure, if there is a single
   failure the repair path length will be "normal".

   Ideally we would wish to only invoke secondary SRLG computation when
   we are sure that the repair paths are mutually incompatible.
   Consider the case of node A in Figure 1. A first identifies that the
   repair path for A-B is via F-X-Y-G-B. It then explores this path,
   determining the repair path for each link in the path. Thus, for
   example, it performs a check at X by running an SPF rooted at X with
   the X-Y link removed to determine whether A-B is indeed on X's repair
   path for packets addressed to Yx.

   Some optimizations are possible in this calculation, which appears at
   first sight to be order hk (where h is the average hop length of
   repair paths and k is the average number of neighbours of a router).
   When A is computing its set of repair paths, it does so for all its k
   neighbours. In each case it identifies a list of node pairs
   traversed by each repair. These lists may often have one or more
   node pairs in common, so the actual number of link failures which
   require investigation is the union of these sets. It is then
   necessary to run an SPF rooted at the first node of each pair (the
   first node because the pairings are ordered, representing the
   direction of the path), with the link to the second node removed.
   This SPF, while not incremental, can be terminated as soon as the
   not-via address is reached. For example, when running the SPF rooted
   at X, with the link X-Y removed, the SPF can be terminated when Yx is
   reached. Once the path has been found, the path is checked to
   determine if it traverses any of A's links in the direction away from
   A. Note that, because the node pair X-Y may exist in the list for
   more than one of A's links (i.e. it lies on more than one repair
   path), it is necessary to identify the correct list, and hence the
   link which has a mutually looping repair path. That link of A is
   then advertised by A as a secondary SRLG paired with the link X-Y.
   Also note that X will be running this algorithm as well, and will
   identify that X-Y is paired with A-B and so advertise it. This could
   perhaps be used as a further check.

   The ordering of the pairs in the lists is important, i.e. X-Y and
   Y-X are dealt with separately. If and only if the repairs are
   mutually incompatible, we need to advertise the pair of links as a
   secondary SRLG, and then ALL nodes compute repair paths around both
   failures using an additional not-via address with the semantics
   not-via A-B AND not-via X-Y.
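
   The exploration described above can be sketched in Python. This is
   only a minimal illustration under stated assumptions, not the
   procedure specified by this draft: the names spf_path,
   directed_links, mutually_looping_links, make_graph and FIG1_LINKS
   are hypothetical, every link is assumed to have unit cost, a not-via
   repair is modelled simply as a shortest path computed with the
   protected link removed, and only the single link A-B (rather than
   all of A's links) is examined.

```python
import heapq

# Hypothetical unit-cost model of the Figure 1 fragment.
FIG1_LINKS = [("A", "B"), ("B", "D"), ("A", "F"), ("F", "X"),
              ("X", "Y"), ("Y", "G"), ("G", "B")]


def make_graph(links):
    """Build an undirected adjacency map {node: {neighbour: cost}}."""
    graph = {}
    for u, v in links:
        graph.setdefault(u, {})[v] = 1
        graph.setdefault(v, {})[u] = 1
    return graph


def spf_path(graph, src, dst, failed_link):
    """Dijkstra rooted at src with failed_link removed in both directions.

    The search stops as soon as dst is reached. Returns the list of
    nodes on the path, or None if dst is unreachable (partition).
    """
    banned = {failed_link, failed_link[::-1]}
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:  # early termination
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for v, cost in graph[u].items():
            if (u, v) in banned:
                continue
            nd = d + cost
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


def directed_links(path):
    """Ordered node pairs traversed by a path, e.g. (X, Y) but not (Y, X)."""
    return list(zip(path, path[1:]))


def mutually_looping_links(graph, a, b):
    """Links on the repair path for a-b whose own repair traverses a->b."""
    repair = spf_path(graph, a, b, (a, b))
    if repair is None:
        return []
    conflicts = []
    for x, y in directed_links(repair):
        other = spf_path(graph, x, y, (x, y))
        if other is not None and (a, b) in directed_links(other):
            conflicts.append((x, y))
    return conflicts
```

   In this model, the bare Figure 1 fragment reports X-Y as mutually
   incompatible with A-B (case 3). Adding a hypothetical node Z that
   gives X a short alternative path to Y removes X-Y from the result,
   which corresponds to case 2.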

   A further possibility is that because we are going to the trouble of
   advertising these SRLG sets, we could also advertise the new repair
   path and only get the nodes on that path to perform the necessary
   computation. Note also that once we have reached Q-space with
   respect to the two failures we need no longer continue the
   computation, so we only need to notify the nodes on the path that are
   not in Q-space.

   One cause of mutually looping repair paths is the existence of nodes
   with only two links, or sections of the network which are only
   bi-connected. In these cases, repair is clearly impossible - the
   failure of both links partitions the network. It would be
   advantageous to be able to identify these cases, and inhibit the
   fruitless advertisement of the secondary SRLG information. This
   could be achieved by the node detecting the requirement for a
   secondary SRLG first running the not-via computation with both links
   removed. If this does not result in a path, it is clear that the
   network would be partitioned by such a failure, and so no
   advertisement is required.

4.3.  N-level Mutual Loops

   It is tempting to conclude that the mechanism described above can be
   applied to the general case of N failures. If we use the approach of
   assuming that repairs are not mutual, and catching the loops and
   executing AAH when they occur, then we can attempt repairs in the
   case of N failures.

   If we use the approach of avoiding potentially mutual repairs and
   creating secondary SRLGs, then we have to explore N levels of repair,
   where N is the number of simultaneous failures we wish to repair.

5.  Mixing LFAs and Not-via

   So far in this draft we have assumed that all repairs use not-via
   tunnels. However, in practice we may wish to use loop free
   alternates (LFAs) or downstream routes where available.
   This complicates the issue, because their use results in packets
   which are being repaired, but NOT addressed to not-via addresses. If
   BOTH links are using downstream routes there is no possibility of
   looping, since it is impossible to have a pair of nodes which are
   both downstream of each other (Basic [4]).

   Loops can, however, occur when LFAs are used. An obvious example is
   the well-known node repair problem with LFAs (Basic [4]). If one
   link is using a downstream route, while the other is using a not-via
   tunnel, the potential mechanism described above would work provided
   it were possible to determine the nodes on the path of the downstream
   route. Some methods of computing downstream routes do not provide
   this path information. If the path information is available,
   however, the link using a downstream route will have a discard FIB
   entry for the not-via address of the other link. The consequence is
   that potentially looping packets will be discarded when they attempt
   to cross this link.

   In the case where the mutual repairs are both using not-via repairs,
   the loop will be broken when the packet arrives at the second
   failure. However, packets are unconditionally repaired at downstream
   routes, and thus when the mutual pair consists of a downstream route
   and a not-via repair, the looping packet will only be dropped when it
   gets back to the first failure. That is, it will execute a single
   turn of the loop before being dropped.

   There is a further complication with downstream routes, since
   although the path may be computed to the far side of the failure, the
   packet may "peel off" to its destination before reaching the far side
   of the failure. In this case it may traverse some other link which
   has failed and was not accounted for on the computed path.
   If the A-B repair (Figure 1) is a downstream route and the X-Y
   repair is a not-via repair, we can have the situation where the X-Y
   repair packets encapsulated to Yx follow a path which attempts to
   traverse A-B. If the A-B repair path for "normal" addresses is a
   downstream route, it cannot be assumed that the repair path for
   packets addressed to Yx can be sent to the same neighbour. This is
   because the validity of a downstream route must be ascertained in the
   topology represented by Yx, i.e. that with the link X-Y failed. This
   is not the same topology that was used for the normal downstream
   calculation, and use of the normal downstream route for the
   encapsulated packets may result in an undetected loop. If it is
   computationally feasible to check the downstream route in this
   topology (i.e. for any not-via address Qp which traverses A-B we must
   perform the downstream calculation for that not-via address in the
   topology with link Q-P failed), then the downstream repair for Yx
   can safely be used. These packets cannot re-visit X-Y, since by
   definition they will avoid that link. Alternatively, the packet
   could always be repaired in a not-via tunnel, i.e. even though the
   normal repair for traffic traversing A-B would be to use a downstream
   route, we could insist that such traffic addressed to a not-via
   address MUST use a tunnel to Ba. Such a tunnel would only be
   installed for an address Qp if it were established that it did not
   traverse Q-P (using the rules described above).

6.  Security Considerations

   Security considerations described in Framework [3], Basic [4] and
   not-via [2] apply to this work. Any additional security
   considerations will be provided in a future revision of this draft.

7.  IANA Considerations

   There are no IANA actions required by this draft.

8.  References

8.1.  Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [2]  Shand, M., Bryant, S., and S. Previdi, "IP Fast Reroute Using
        Not-via Addresses", draft-ietf-rtgwg-ipfrr-notvia-addresses-02
        (work in progress), February 2008.

8.2.  Informative References

   [3]  Shand, M. and S. Bryant, "IP Fast Reroute Framework",
        draft-ietf-rtgwg-ipfrr-framework-08 (work in progress),
        February 2008.

   [4]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
        Reroute: Loop-Free Alternates", RFC 5286, September 2008.

Authors' Addresses

   Stewart Bryant
   Cisco Systems
   250, Longwater Ave, Green Park,
   Reading  RG2 6GB
   UK

   Email: stbryant@cisco.com

   Mike Shand
   Cisco Systems
   250, Longwater Ave, Green Park,
   Reading  RG2 6GB
   UK

   Email: mshand@cisco.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.