idnits 2.17.1

draft-ietf-rtgwg-ipfrr-framework-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate. You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 372.

  ** Found boilerplate matching RFC 3978, Section 5.4, paragraph 1 (on line
     360), which is fine, but *also* found old RFC 2026, Section 10.4C,
     paragraph 1 text on line 360.

  ** The document claims conformance with section 10 of RFC 2026, but uses
     some RFC 3978/3979 boilerplate. As RFC 3978/3979 replaces section 10 of
     RFC 2026, you should not claim conformance with it if you have changed
     to using RFC 3978/3979 boilerplate.

  ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure
     Acknowledgement.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to
     RFC 4748.

  ** The document seems to lack an RFC 3979 Section 5, para. 1 IPR Disclosure
     Acknowledgement.

  ** The document seems to lack an RFC 3979 Section 5, para. 2 IPR Disclosure
     Acknowledgement.

  ** The document seems to lack an RFC 3979 Section 5, para. 3 IPR Disclosure
     Invitation.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to
     grant the BCP78 rights to the IETF Trust, then this is fine, and you can
     ignore this comment. If not, you may need to add the pre-RFC5378
     disclaimer. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 2004) is 7227 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'MPLSFRR' on line 72 looks like a reference

  -- Missing reference section? 'BFD' on line 169 looks like a reference

     Summary: 10 errors (**), 0 flaws (~~), 2 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    Network Working Group                                          M. Shand
2    Internet Draft
3    Expiration Date: Dec 2004                                 Cisco Systems

5                                                                  June 2004

7                          IP Fast Reroute Framework

9                   draft-ietf-rtgwg-ipfrr-framework-00.txt

11   Status of this Memo

13      This document is an Internet-Draft and is in full conformance with
14      all provisions of Section 10 of RFC 2026.

16      Internet-Drafts are working documents of the Internet Engineering
17      Task Force (IETF), its areas, and its working groups. Note that other
18      groups may also distribute working documents as Internet-Drafts.
19      Internet-Drafts are draft documents valid for a maximum of six months
20      and may be updated, replaced, or obsolete by other documents at any
21      time. It is inappropriate to use Internet-Drafts as reference
22      material or to cite them other than as "work in progress".

24      The list of current Internet-Drafts can be accessed at
25      http://www.ietf.org/ietf/1id-abstracts.txt

27      The list of Internet-Draft Shadow Directories can be accessed at
28      http://www.ietf.org/shadow.html.

30   Abstract

32      This document provides a framework for the development of IP fast
33      re-route mechanisms which provide protection against link or router
34      failure by invoking locally determined repair paths. Unlike MPLS
35      Fast-reroute, the mechanisms are applicable to a network employing
36      conventional IP routing and forwarding. An essential part of such
37      mechanisms is the prevention of packet loss caused by the loops which
38      normally occur during the re-convergence of the network following a
39      failure.

41   Terminology

43      This section defines words, acronyms, and actions used in this draft.

45   1. Introduction

47      When a link or node failure occurs in a routed network, there is
48      inevitably a period of disruption to the delivery of traffic until
49      the network re-converges on the new topology. Packets for
50      destinations which were previously reached by traversing the failed
51      component may be dropped or may suffer looping. Traditionally such
52      disruptions have lasted for periods of at least several seconds, and
53      most applications have been constructed to tolerate such a quality of
54      service.

56      Recent advances in routers have reduced this interval to under a
57      second for carefully configured networks using link state IGPs.
58      However, new Internet services are emerging which may be sensitive to
59      periods of traffic loss which are orders of magnitude shorter than
60      this.

62      Addressing these issues is difficult because the distributed nature
63      of the network imposes an intrinsic limit on the minimum convergence
64      time which can be achieved.

66      However, there is an alternative approach, which is to compute backup
67      routes that allow the failure to be repaired locally by the router(s)
68      detecting the failure without the immediate need to inform other
69      routers of the failure. In this case, the disruption time can be
70      limited to the small time taken to detect the adjacent failure and
71      invoke the backup routes. This is analogous to the technique employed
72      by MPLS Fast Reroute [MPLSFRR], but the mechanisms employed for the
73      backup routes in pure IP networks are necessarily very different.

75      This document provides a framework for the development of this
76      approach.

78   2. Problem Analysis

80      The duration of the packet delivery disruption caused by a
81      conventional routing transition is determined by a number of factors:

83         1. The time taken to detect the failure. This may be of the order
84            of a few ms when it can be detected at the physical layer, up to
85            several tens of seconds when a routing protocol hello is
86            employed. During this period packets will be unavoidably lost.

88         2. The time taken for the local router to react to the failure.
89            This will typically involve generating and flooding new routing
90            updates, and re-computing the router's FIB.

92         3. The time taken to pass the information about the failure to
93            other routers in the network. In the absence of routing protocol
94            packet loss, this is typically between 10 ms and 100 ms per hop
95            in a well-designed router.

97         4. The time taken to re-compute the forwarding tables. This is
98            typically a few ms for a link state protocol using Dijkstra's
99            algorithm.

101        5. The time taken to load the revised forwarding tables into the
102           forwarding hardware. This time is very implementation dependent
103           and also depends on the number of prefixes affected by the
104           failure, but may be several hundred ms.

106     The disruption will last until the routers adjacent to the failure
107     have completed steps 1 and 2, and then all the routers in the network
108     whose paths are affected by the failure have completed the remaining
109     steps.

111     The initial packet loss is caused by the router(s) adjacent to the
112     failure continuing to attempt to transmit packets across the failure
113     until it is detected. This loss is unavoidable, but the detection
114     time can be reduced to a few tens of ms as described in section 3.1.

116     Subsequent packet loss is caused by the "micro-loops" which form
117     because of temporary inconsistencies between routers' forwarding
118     tables. These occur as a result of the different times at which
119     routers update their forwarding tables to reflect the failure. These
120     variable delays are caused by steps 3, 4 and 5 above, and in many
121     routers it is step 5 which is both the largest factor and which has
122     the greatest variance between routers. The large variance arises from
123     implementation differences and from the differing impact that a
124     failure has on each individual router. For example, the number of
125     prefixes affected by the failure may vary dramatically from one
126     router to another.

128     In order to achieve packet disruption times which are commensurate
129     with the failure detection times it is necessary to perform two
130     distinct tasks:

132        1. Provide a mechanism for the router(s) adjacent to the failure to
133           rapidly invoke a repair path, which is unaffected by any
134           subsequent re-convergence.

136        2. Provide a mechanism to prevent the effects of micro-loops during
137           subsequent re-convergence.
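
The micro-loop behaviour described in this section can be illustrated with a minimal Python sketch. It is an illustration only and is not part of the framework: the routers A, S, E and B, the destination D, the table contents, and the helper name trace are all hypothetical. It assumes D is normally reached over the link S-E, that this link fails, and that S loads its revised forwarding table before A does.

```python
def trace(fibs, src, dst, max_hops=16):
    """Follow next hops from src toward dst.

    Returns (path, looped): the routers visited, and whether the packet
    revisited a router (a forwarding loop) before reaching dst.
    """
    path = [src]
    node = src
    while node != dst and len(path) <= max_hops:
        node = fibs[node][dst]      # next hop chosen by the current router
        looped = node in path       # revisiting a router means a loop
        path.append(node)
        if looped:
            return path, True
    return path, False


# Hypothetical forwarding tables before the failure: {router: {dest: next_hop}}.
old_fib = {"A": {"D": "S"}, "S": {"D": "E"}, "E": {"D": "D"}, "B": {"D": "D"}}

# Link S-E fails. S has updated its table (now via A); A has not yet done so.
transient = {**old_fib, "S": {"D": "A"}}

# A has now also updated its table (via B), so the tables are consistent.
converged = {**transient, "A": {"D": "B"}}

print(trace(transient, "S", "D"))   # packet bounces between S and A
print(trace(converged, "S", "D"))   # packet reaches D via A and B
```

In the transient state the packet is returned to S by A, which is the inconsistency that the second task above is intended to prevent. Once A has also updated its table the loop disappears.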

139     Performing the first task without the second will result in the
140     repair path being starved of traffic and hence being redundant.
141     Performing the second without the first will result in traffic being
142     discarded by the router(s) adjacent to the failure. Both tasks are
143     necessary for an effective solution to the problem.

145     However, repair paths can be used in isolation where the failure is
146     short-lived. The repair paths can be kept in place until the failure
147     is repaired and there is no need to advertise the failure to other
148     routers.

150     Similarly, micro-loop avoidance can be used in isolation to prevent
151     loops arising from pre-planned management action.

153     Note that micro-loops can also occur when a link or node is restored
154     to service and thus a micro-loop avoidance mechanism is required for
155     both link up and link down cases.

157  3. Mechanisms for IP Fast-reroute

159     The set of mechanisms required for an effective solution to the
160     problem can be broken down into the following sub-problems.

162  3.1. Mechanisms for fast failure detection

164     It is critical that the failure detection time is minimized. A number
165     of approaches are possible, such as:

167        1. Physical detection, such as loss of light.

169        2. The Bidirectional Forwarding Detection protocol [BFD].

171        3. Other forms of "fast hellos".

173  3.2. Mechanisms for repair paths

175     Once a failure has been detected by one of the above mechanisms,
176     traffic which previously traversed the failure is transmitted over
177     one or more repair paths. The design of the repair paths should be
178     such that they can be pre-calculated in anticipation of each local
179     failure and made available for invocation with minimal delay. There
180     are three basic categories of repair paths:

182        1. Equal cost multiple paths (ECMP). Where such paths exist, and
183           one or more of the alternate paths do not traverse the failure,
184           they may trivially be used as repair paths.

186        2. Downstream paths. (Also known as "loop free feasible
187           alternates".) Such a path exists when a direct neighbor of the
188           router adjacent to the failure has a path to the destination
189           which cannot traverse the failure.

191        3. Multi-hop repair paths. When there is no feasible downstream
192           path it may still be possible to locate a router, which is more
193           than one hop away from the router adjacent to the failure, from
194           which traffic will be forwarded to the destination without
195           traversing the failure.

197     ECMP and downstream paths offer the simplest repair paths and would
198     normally be used when they are available. It is anticipated that
199     around 80% of failures can be repaired using these alone.

201     Multi-hop repair paths are considerably more complex, both in the
202     computations required to determine their existence, and in the
203     mechanisms required to invoke them. They can be further classified
204     as:

206        1. Mechanisms where one or more alternate FIBs are pre-computed in
207           all routers and the repaired packet is instructed to be
208           forwarded using a "repair FIB" by some method of signaling such
209           as detecting a "U-turn" or marking the packet.

211        2. Mechanisms functionally equivalent to a loose source route which
212           is invoked using the normal FIB. These include tunnels and label
213           based mechanisms.

215     In many cases a repair path which reaches two hops away from the
216     router detecting the failure will suffice, and it is anticipated that
217     around 95% of failures can be repaired by this method. However, to
218     effect complete repair coverage some use of longer multi-hop repair
219     paths is generally necessary.

221  3.2.1. Scope of repair paths

223     A particular repair path may be valid for all destinations which
224     require repair or may only be valid for a subset of destinations. If
225     a repair path is valid for a node immediately downstream of the
226     failure, then it will be valid for all destinations previously
227     reachable by traversing the failure. However, in cases where such a
228     repair path is difficult to achieve because it requires a high order
229     multi-hop repair path, it may still be possible to identify lower
230     order repair paths (possibly even downstream paths) which allow the
231     majority of destinations to be repaired. When IPFRR is unable to
232     provide complete repair, it is desirable that the extent of the
233     repair coverage can be determined and reported via network
234     management.

236     There is a tradeoff to be achieved between minimizing the number of
237     repair paths to be computed, and minimizing the overheads incurred in
238     using higher order multi-hop repair paths for destinations for which
239     they are not strictly necessary. However, the computational cost of
240     determining repair paths on an individual destination basis can be
241     very high.

243     The use of repair paths may result in excessive traffic passing over
244     a link, resulting in congestion discard. This reduces the
245     effectiveness of IPFRR. Mechanisms to influence the distribution of
246     repaired traffic to minimize this effect are therefore desirable.

248  3.2.2. Link or node repair

250     A repair path may be computed to protect against failure of an
251     adjacent link, or failure of an adjacent node. In general, link
252     protection is simpler to achieve. A repair which protects against
253     node failure will also protect against link failure for all
254     destinations except those for which the adjacent node is a single
255     point of failure.

257     In some cases it may be necessary to distinguish between a link or
258     node failure in order that the optimal repair strategy is invoked.
259     Methods for link/node failure determination may be based on
260     techniques such as BFD. This determination may be made prior to
261     invoking any repairs, but this will increase the period of packet
262     loss following a failure unless the determination can be performed as
263     part of the failure detection mechanism itself. Alternatively, a
264     subsequent determination can be used to optimise an already invoked
265     default strategy.

267  3.2.3. Multiple failures and Shared Risk Groups

269     Complete protection against multiple unrelated failures is out of
270     scope of this work. However, it is important that the occurrence of a
271     second failure while one failure is undergoing repair should not
272     result in a level of service which is significantly worse than that
273     which would have been achieved in the absence of any repair strategy.

275     Shared Risk Groups are an example of multiple related failures, and
276     their protection is a matter for further study.

278     One specific example of a Shared Risk Link Group (SRLG) which is
279     clearly within the scope of this work is a node failure. This causes
280     the simultaneous failure of multiple links, but their closely defined
281     topological relationship makes the problem more tractable.

283  3.3. Mechanisms for micro-loop prevention

285     Control of micro-loops is important not only because they can cause
286     packet loss in traffic which is affected by the failure, but because
287     they can also cause congestion loss of traffic which would otherwise
288     be unaffected by the failure.

290     A number of solutions to the problem of micro-loop formation have
291     been proposed. The following factors are significant in their
292     classification:

294        1. Partial or complete protection against micro-loops.

296        2. Delay imposed upon convergence.

298        3. Tolerance of multiple failures (from node failures, and in
299           general).

301        4. Computational complexity (pre-computed or real time).

303        5. Applicability to scheduled events.

305        6. Applicability to link/node reinstatement.

307  4. Scope and applicability

309     Link state protocols provide ubiquitous topology information, which
310     facilitates the computation of repair paths. Therefore the initial
311     scope of this work is in the context of link state IGPs.

313     Provision of similar facilities in non-link state IGPs and BGP is a
314     matter for further study, but the correct operation of the repair
315     mechanisms for traffic with a destination outside the IGP domain is
316     an important consideration for solutions based on this framework.

318  5. IANA considerations

320     There are no IANA considerations that arise from this description of
321     IPFRR. However, there may be changes to the IGPs to support IPFRR in
322     which there will be IANA considerations.

324  6. Security Considerations

326     This framework document does not itself introduce any security
327     issues, but attention must be paid to the security implications of
328     any proposed solutions to the problem.

330  Acknowledgments

332  Normative References

334     Internet-drafts are works in progress available from
335     http://www.ietf.org/internet-drafts/

337  Informative References

339     Internet-drafts are works in progress available from
340     http://www.ietf.org/internet-drafts/
341     BFD        Katz, D., and Ward, D., "Bidirectional Forwarding
342                Detection", draft-katz-ward-bfd-01.txt, August
343                2003 (work in progress).

345     MPLSFRR    Pan, P. et al, "Fast Reroute Extensions to RSVP-TE
346                for LSP Tunnels",
347                draft-ietf-mpls-rsvp-lsp-fastreroute-05.txt

349  Author's Address

351     Mike Shand
352     Cisco Systems,
353     250, Longwater Avenue,
354     Green Park,
355     Reading, RG2 6GB,
356     United Kingdom.             Email: mshand@cisco.com

358  Full copyright statement

360     Copyright (C) The Internet Society (2004). All Rights Reserved.

362     This document is subject to the rights, licenses and restrictions
363     contained in BCP 78, and except as set forth therein, the authors
364     retain all their rights.

366     This document and the information contained herein are provided on an
367     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
368     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
369     ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
370     INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
371     INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
372     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.