Routing area                                                    S. Hegde
Internet-Draft                                    Juniper Networks, Inc.
Intended status: Standards Track                               P. Sarkar
Expires: January 4, 2018                                      Individual
                                                            July 3, 2017


                    Micro-loop avoidance using SPRING
          draft-hegde-rtgwg-microloop-avoidance-using-spring-03

Abstract

   When there is a change in network topology, either due to a link
   going down or due to a new link addition, all the nodes in the
   network need to get the complete view of the network and re-compute
   the routes. There will generally be a small time window when the
   forwarding state of each of the nodes is not synchronized. This can
   result in transient loops in the network, leading to dropped traffic
   due to over-subscription of links. Micro-looping is generally more
   harmful than simply dropping traffic on failed links, because it can
   cause control traffic to be dropped on an otherwise healthy link
   involved in a micro-loop. This can lead to cascading adjacency
   failures or network meltdown.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 4, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Procedures for Micro-loop prevention
   3.  Detailed Solution based on SPRING
     3.1.  Link-down event
     3.2.  Link-up event
     3.3.  Computation of nearest PLR
       3.3.1.  Link down event
       3.3.2.  Node down event
     3.4.  Handling multiple network events
       3.4.1.  Handling SRLG failures
     3.5.  Handling ECMP
     3.6.  Recognizing same network event
     3.7.  Partial deployment Considerations
   4.  Protocol Procedures
     4.1.  OSPF
     4.2.  ISIS
     4.3.  Elements of procedure
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgments
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Micro-loops are transient loops that occur during the period of time
   when some nodes have become aware of a topology change and have
   changed their forwarding tables in response, but slow routers have
   not yet modified their forwarding tables. This document provides
   mechanisms to prevent micro-loops in the network in the event of link
   up/down or metric change. The micro-loop prevention mechanism uses
   the basic principles of near-side tunnelling as described in
   Section 6.2 of [RFC5715].

   Micro-loops can involve the Points of Local Repair (PLRs) or nodes
   that are not directly connected to the link or node going down. The
   nodes that are not directly connected to the node or link going down
   or coming up are referred to as remote nodes. The micro-loop
   prevention mechanism described in this document prevents possible
   micro-loops involving the remote nodes. A new sub-TLV is defined in
   the ISIS router capability TLV [RFC4971] and the OSPF router
   capability TLV [RFC4970] for discovering support of this feature.
   The details are described in Section 4. The operational procedures
   for micro-loop prevention are described in Section 3.

2.  Procedures for Micro-loop prevention

   +----+ 10 +----+ 10 +----+   10  +----+ 10 +----+
   | S1 |----| R1 |----| S  |-------| E  |----| D1 |
   +----+    +----+    +----+       +----+    +----+
      \                   \         /
       \ 10                \ 100   / 60
        \                   \     /
         \   +----+         +----+
          +--| R2 |---------| R3 |
             +----+   30    +----+
              /
             / 10
        +----+
        | S2 |
        +----+

                      Figure 1: Sample Network

   The topology shown in Figure 1 illustrates a sample network topology
   where micro-loops can occur. The symmetric link metrics are shown in
   the diagram above.
   The traffic from S1 to D1 takes the path S1->R1->S->E->D1 and
   traffic from S2 takes the path S2->R2->S1->R1->S->E->D1 in normal
   operation. When the S->E link goes down, traffic can loop between S1
   and R2 when the FIB on S1 reflects the shortest path to D1 after the
   failure and the FIB on R2 reflects the shortest path to D1 before the
   failure. The mechanisms described in [I-D.ietf-rtgwg-uloop-delay] do
   not address micro-loops involving nodes that are not directly
   attached to the link that has just gone down or come up. For
   example, when the S->E link goes down, S and E are the Points of
   Local Repair (PLRs), and micro-loops formed between S1 and R2 are
   not handled.

   The basic principle of the solution is to send the traffic on
   tunnelled paths for a certain time period until all the nodes in the
   network process the event and update their forwarding plane. When
   the link S->E goes down, all the nodes in the network tunnel the
   traffic to the nearest PLR. The PLR S needs to maintain the backup
   path created using FRR ([RFC5286]) or other mechanisms until all
   other nodes in the network converge. The PLR S forwards the traffic
   to the affected destinations via the backup path until the
   convergence procedure is complete. This document assumes 100% backup
   coverage for the destinations via various FRR mechanisms. This
   document describes the procedures corresponding to the traffic flow
   from sources (S nodes) to the destination nodes (D nodes). The
   procedures apply equally when the D nodes are the sources and the S
   nodes are the destinations.

   As soon as a node learns of the topology change, it modifies its FIB
   to use loop-free tunnelled paths for the affected traffic, and it
   starts a "convergence delay timer". When the "convergence delay
   timer" expires, the node modifies its FIB to use the SPF path based
   on the changed topology.
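
   The tunnel-then-converge behaviour described above can be sketched
   as follows. This is only an illustration under stated assumptions,
   not part of the specification: the names Node, FibEntry,
   on_topology_change and tick are hypothetical, the clock is supplied
   by the caller, and the delay value is arbitrary.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class FibEntry:
    """Forwarding state for one destination (hypothetical model)."""
    mode: str  # "spf" = native shortest path, "tunnel" = tunnelled to a PLR
    via: str   # SPF next hop, or the PLR that traffic is tunnelled to


class Node:
    """Hypothetical node that delays its switch to the new SPF path."""

    def __init__(self, name: str, convergence_delay: float) -> None:
        self.name = name
        self.convergence_delay = convergence_delay
        self.fib: Dict[str, FibEntry] = {}
        # destination -> (timer expiry, next hop in the new topology)
        self.pending: Dict[str, Tuple[float, str]] = {}

    def on_topology_change(self, now: float, dest: str,
                           nearest_plr: str, new_next_hop: str) -> None:
        # Phase 1: tunnel affected traffic and start the delay timer.
        self.fib[dest] = FibEntry("tunnel", nearest_plr)
        self.pending[dest] = (now + self.convergence_delay, new_next_hop)

    def tick(self, now: float) -> None:
        # Phase 2: on timer expiry, install the post-change SPF next hop.
        for dest, (expiry, next_hop) in list(self.pending.items()):
            if now >= expiry:
                self.fib[dest] = FibEntry("spf", next_hop)
                del self.pending[dest]


# R2 in Figure 1 reacting to the S->E failure (delay value is arbitrary).
r2 = Node("R2", convergence_delay=2.0)
r2.fib["D1"] = FibEntry("spf", "S1")
r2.on_topology_change(now=0.0, dest="D1", nearest_plr="S", new_next_hop="R3")
r2.tick(now=2.0)
```

   In this sketch R2 tunnels D1 traffic towards S while the timer runs
   and switches to its new next hop R3 only when the timer expires.
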
   The use of tunnelled paths during the convergence period ensures
   that (barring other topology changes) all traffic affected by the
   topology change travels on a loop-free path.

   After all the nodes in the network converge to the actual SPF path,
   the PLR converges to the SPF path and updates its FIB. This
   micro-loop prevention mechanism delays the time it takes for routing
   to converge to the optimal paths in the new topology by a factor of
   3, but the convergence time is deterministic and micro-loops are
   completely avoided.

   In principle, near-side tunnelling could be accomplished using labels
   distributed via LDP. However, since the application requires that
   any given router have the potential to create a tunnel to nearly
   every other router in the IGP domain, a large number of targeted LDP
   sessions would be needed to learn the FEC-label bindings distributed
   by the PLRs. SPRING [I-D.ietf-spring-segment-routing] provides a
   more efficient method for distributing shortest path labels for this
   application, since any router can compute the locally significant
   FEC-label bindings for any other router without the need for targeted
   LDP sessions.

   [RFC5715] describes other mechanisms to prevent micro-loops.
   Near-side tunnelling is better suited for deployment as it does not
   need additional computation or additional state maintenance in the
   network nodes. Far-side tunnelling has the disadvantage that it
   requires the use of not-via addresses [RFC6981], which in turn
   require additional address configuration on each node.
   Per-destination non-micro-looping path computation is another
   approach to prevent micro-loops, but it is computationally
   intensive.

3.  Detailed Solution based on SPRING

           +----+
           | R4 | SRGB:1000-2000
           +----+ SID:9
            /  \
         5 /    \ 5
          /      \        SRGB:1000-2000
   SID:1 /        \ SID:2  SID:3        SID:4    SID:5
      +----+ 10 +----+ 10 +----+   10  +----+ 10 +----+
      | S1 |----| R1 |----| S  |-------| E  |----| D1 |
      +----+    +----+    +----+       +----+    +----+
         \                   \         /
       10 \                   \ 100   / 60
           \  SRGB:1000-2000   \     /
            \   +----+         +----+
             +--| R2 |---------| R3 |SID:7
          SID:6 +----+   30    +----+SRGB:1000-2000
                 /
                / 10
           +----+
           | S2 |SID:8
           +----+SRGB:1000-2000

                     Figure 2: Sample SR Network

   The above sample topology is provided with basic SPRING
   configurations of SRGB and the indices corresponding to each node.
   Each node has an SRGB of 1000-2000 configured. The same SRGB is used
   on all nodes to simplify the example; the procedures are equally
   applicable when different SRGBs are configured on different nodes.
   Each node is provisioned with a MAX_CONVERGENCE_DELAY value that
   corresponds to its RIB to FIB convergence time. The information for
   support of the micro-loop prevention feature and the
   MAX_CONVERGENCE_DELAY value are flooded across the IGP domain (ISIS
   level/OSPF area). Each node in the IGP domain sets the
   MAX_CONVERGENCE_DELAY to the maximum of the values received in the
   domain.

3.1.  Link-down event

   When the S->E link goes down, all the nodes in the network receive
   the event via IGP database flooding. Each node supporting the
   micro-loop prevention mechanism specified in this document SHOULD
   perform the steps below.

   1.  The PLRs (S and E) perform FRR local repair for destinations
       affected by the failure of the link. Each computing node
       identifies the destinations affected by the topology change. In
       the example above, the destination D1 is affected by the S->E
       link-down event for nodes S1, R1, R2, and R4. For S2, although
       the path to D1 changes, there is no change in the immediate
       next hop, and hence it is not necessary for S2 to perform any
       specific actions to prevent micro-loops.

   2.  For each affected destination, identify the nearest PLR
       advertising the change. The link-down event is advertised by
       both S and E. S is the nearest PLR for the nodes S1, R1, R2, and
       R4.

   3.  Let the S->E link-down event occur at time T0.

   4.  Start a timer T1 = max (all MAX_CONVERGENCE_DELAY) at all
       non-PLR nodes with affected destinations.

   5.  Start a timer T2 = 2 * T1 at the PLR.

   6.  For IP routes, modify the FIB for the affected destinations so
       that the nearest PLR's node-SID is pushed on the packet's label
       stack. For MPLS ingress and transit routes, modify the FIB for
       the affected destinations with a two-label stack, the inner label
       corresponding to the destination and the outer label
       corresponding to the nearest PLR.

   7.  In the case of ECMP paths to the nearest PLR, both tunnelled
       paths are used. S1 has ECMP paths to the destination D1 and both
       the paths are impacted. Both the paths are modified to carry a
       two-label stack with the nearest PLR's label on top and the
       destination label at the bottom.

   8.  After the expiry of timer T1, all the non-PLR nodes modify their
       FIBs to use the shortest path as computed by the IGP, and they no
       longer push the node-SID of the nearest PLR on the packets.

   9.  After the expiry of T2, the PLR converges and updates the FIB to
       represent the shortest path.

   The ingress MPLS routes at various nodes for destination D1 at the
   specified time intervals are shown below.
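
   Before turning to the per-node routes, the timer and label-stack
   arithmetic of steps 4 to 8 can be illustrated with a short sketch.
   It is an illustration only: the function names are hypothetical,
   and it assumes that every node uses the same SRGB, as in Figure 2.
   With differing SRGBs, each label would have to be derived from the
   SRGB of the next hop that receives the packet.

```python
from typing import Dict, Iterable, List, Tuple

SRGB_BASE = 1000  # Figure 2: every node uses SRGB 1000-2000
SID_INDEX: Dict[str, int] = {
    "S1": 1, "R1": 2, "S": 3, "E": 4, "D1": 5,
    "R2": 6, "R3": 7, "S2": 8, "R4": 9,
}


def node_label(node: str, srgb_base: int = SRGB_BASE) -> int:
    """Label for a node-SID: SRGB base plus the node's SID index."""
    return srgb_base + SID_INDEX[node]


def label_stack(dest: str, nearest_plr: str, tunnelling: bool) -> List[int]:
    """Labels to push, listed bottom-first (the last element is the top)."""
    stack = [node_label(dest)]
    if tunnelling:
        # Steps 6 and 7: the nearest PLR's node-SID goes on top.
        stack.append(node_label(nearest_plr))
    return stack


def timers(advertised_delays: Iterable[float]) -> Tuple[float, float]:
    """Return (T1, T2): T1 is the domain-wide maximum, T2 = 2 * T1."""
    t1 = max(advertised_delays)
    return t1, 2 * t1
```

   For example, label_stack("D1", "S", tunnelling=True) gives
   [1005, 1003], that is, 1005 at the bottom and 1003 on top, which
   corresponds to the "Push 1005, 1003(top)" entries used between T0
   and T1. After T1 the same call with tunnelling=False gives [1005].
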

   +======+=============+=================+=============+==============+
   | Node | Before T0   | T0-T1           | T1-T2       | After T2     |
   +======+=============+=================+=============+==============+
   | S1   | Push 1005,  | Push 1005,      | Push 1005,  | Push 1005,   |
   |      | Fwd to R1   | 1003(top), Fwd  | Fwd to R2   | Fwd to R2    |
   |      |             | to R1           |             |              |
   |      +-------------+-----------------+-------------+--------------+
   |      | Push 1005,  | Push 1005,      |             |              |
   |      | Fwd to R4   | 1003(top), Fwd  |             |              |
   |      |             | to R4           |             |              |
   +======+=============+=================+=============+==============+
   | S2   | Push 1005,  | Push 1005, Fwd  | Push 1005,  | Push 1005,   |
   |      | Fwd to R2   | to R2           | Fwd to R2   | Fwd to R2    |
   +======+=============+=================+=============+==============+
   | R1   | Push 1005,  | Push 1005, Fwd  | Push 1005,  | Push 1005,   |
   |      | Fwd to S    | to S            | Fwd to R4   | Fwd to R4    |
   |      +-------------+-----------------+-------------+--------------+
   |      |             |                 | Push 1005,  | Push 1005,   |
   |      |             |                 | Fwd to S1   | Fwd to S1    |
   +======+=============+=================+=============+==============+
   | R2   | Push 1005,  | Push 1005,      | Push 1005,  | Push 1005,   |
   |      | Fwd to S1   | 1003(top), Fwd  | Fwd to R3   | Fwd to R3    |
   |      |             | to S1           |             |              |
   +======+=============+=================+=============+==============+
   | R3   | Push 1005,  | Push 1005,      | Push 1005,  | Push 1005,   |
   |      | Fwd to E    | 1003(top), Fwd  | Fwd to E    | Fwd to E     |
   |      |             | to E            |             |              |
   +======+=============+=================+=============+==============+
   | R4   | Push 1005,  | Push 1005,      | Push 1005,  | Push 1005,   |
   |      | Fwd to R1   | 1003(top), Fwd  | Fwd to S1   | Fwd to S1    |
   |      |             | to R1           |             |              |
   +======+=============+=================+=============+==============+
   | S    | Push 1005,  | Push 1005, Fwd  | Push 1005,  | Push 1005,   |
   |      | Fwd to E    | to R3 *         | Fwd to R3 * | Fwd to R1    |
   |      +-------------+-----------------+-------------+--------------+
   |      | Push 1005,  |                 |             | Push 1005,   |
   |      | Fwd to R3 * |                 |             | Fwd to R3 *  |
   +======+=============+=================+=============+==============+
   | E    | Pop, Fwd to | Pop, Fwd to D1  | Pop, Fwd to | Pop, Fwd to  |
   |      | D1          |                 | D1          | D1           |
   +======+=============+=================+=============+==============+

   * - Indicates backup path.

                  Figure 3: Sample MPLS ingress RIB

   The corresponding MPLS transit routes at various nodes at the
   specified time intervals are shown below.

   +======+==========+==========+==============+===========+===========+
   | Node | Incoming | Before   | T0-T1        | T1-T2     | After T2  |
   |      | Label    | T0       |              |           |           |
   +======+==========+==========+==============+===========+===========+
   | S1   | 1005     | Push     | Push 1005,   | Push      | Push      |
   |      |          | 1005,    | 1003(top),   | 1005, Fwd | 1005, Fwd |
   |      |          | Fwd to   | Fwd to R1    | to R2     | to R2     |
   |      |          | R1       |              |           |           |
   |      |          +----------+--------------+-----------+-----------+
   |      |          | Push     | Push 1005,   |           |           |
   |      |          | 1005,    | 1003(top),   |           |           |
   |      |          | Fwd to   | Fwd to R4    |           |           |
   |      |          | R4       |              |           |           |
   |      +----------+----------+--------------+-----------+-----------+
   |      | 1003     | Push     | Push 1003,   | Push      | Push      |
   |      |          | 1003,    | Fwd to R1    | 1003, Fwd | 1003, Fwd |
   |      |          | Fwd to   |              | to R2     | to R2     |
   |      |          | R1       |              |           |           |
   +======+==========+==========+==============+===========+===========+
   | S2   | 1005     | Push     | Push 1005,   | Push      | Push      |
   |      |          | 1005,    | Fwd to R2    | 1005, Fwd | 1005, Fwd |
   |      |          | Fwd to   |              | to R2     | to R2     |
   |      |          | R2       |              |           |           |
   |      +----------+----------+--------------+-----------+-----------+
   |      | 1003     | Push     | Push 1003,   | Push      | Push      |
   |      |          | 1003,    | Fwd to R1    | 1003, Fwd | 1003, Fwd |
   |      |          | Fwd to   |              | to R2     | to R2     |
   |      |          | R1       |              |           |           |
   +======+==========+==========+==============+===========+===========+
   | R1   | 1005     | Push     | Push 1005,   | Push      | Push      |
   |      |          | 1005,    | Fwd to S     | 1005, Fwd | 1005, Fwd |
   |      |          | Fwd to S |              | to R4     | to R4     |
   |      |          +----------+--------------+-----------+-----------+
   |      |          |          |              | Push      | Push      |
   |      |          |          |              | 1005, Fwd | 1005, Fwd |
   |      |          |          |              | to S1     | to S1     |
   |      +----------+----------+--------------+-----------+-----------+
   |      | 1003     | Push     | Push 1003,   | Push      | Push      |
   |      |          | 1003,    | Fwd to S     | 1003, Fwd | 1003, Fwd |
   |      |          | Fwd to S |              | to S      | to S      |
   +======+==========+==========+==============+===========+===========+
   | R2   | 1005     | Push     | Push 1005,   | Push      | Push      |
   |      |          | 1005,    | 1003(top),   | 1005, Fwd | 1005, Fwd |
   |      |          | Fwd to   | Fwd to S1    | to R3     | to R3     |
   |      |          | S1       |              |           |           |
   |      +----------+----------+--------------+-----------+-----------+
   |      | 1003     | Push     | Push 1003,   | Push      | Push      |
   |      |          | 1003,    | Fwd to S1    | 1003, Fwd | 1003, Fwd |
   |      |          | Fwd to   |              | to S1     | to S1     |
   |      |          | S1       |              |           |           |
   +======+==========+==========+==============+===========+===========+
   | R3   | 1005     | Push     | Push 1005,   | Push      | Push      |
   |      |          | 1005,    | 1003(top),   | 1005, Fwd | 1005, Fwd |
   |      |          | Fwd to E | Fwd to E     | to E      | to E      |
   |      +----------+----------+--------------+-----------+-----------+
   |      | 1003     | Push     | Push 1003,   | Push      | Push      |
   |      |          | 1003,    | Fwd to R2    | 1003, Fwd | 1003, Fwd |
   |      |          | Fwd to   |              | to R2     | to R2     |
   |      |          | R2       |              |           |           |
   +======+==========+==========+==============+===========+===========+
   | R4   | 1005     | Push     | Push 1005,   | Push      | Push      |
   |      |          | 1005,    | 1003(top),   | 1005, Fwd | 1005, Fwd |
   |      |          | Fwd to   | Fwd to R1    | to S1     | to S1     |
   |      |          | R1       |              |           |           |
   |      +----------+----------+--------------+-----------+-----------+
   |      | 1003     | Push     | Push 1003,   | Push      | Push      |
   |      |          | 1003,    | Fwd to R1    | 1003, Fwd | 1003, Fwd |
   |      |          | Fwd to   |              | to R1     | to R1     |
   |      |          | R1       |              |           |           |
   +======+==========+==========+==============+===========+===========+
   | S    | 1005     | Push     | Push 1005,   | Push      | Push      |
   |      |          | 1005,    | Fwd to R3 *  | 1005, Fwd | 1005, Fwd |
   |      |          | Fwd to E |              | to R3 *   | to R1     |
   |      |          +----------+--------------+-----------+-----------+
   |      |          | Push     |              |           | Push      |
   |      |          | 1005,    |              |           | 1005, Fwd |
   |      |          | Fwd to   |              |           | to R3 *   |
   |      |          | R3 *     |              |           |           |
   |      +----------+----------+--------------+-----------+-----------+
   |      | 1003     | --       | --           | --        | --        |
   +======+==========+==========+==============+===========+===========+
   | E    | 1005     | Pop, Fwd | Pop, Fwd to  | Pop, Fwd  | Pop, Fwd  |
   |      |          | to D1    | D1           | to D1     | to D1     |
   +======+==========+==========+==============+===========+===========+

   * - Indicates backup path.

                  Figure 4: Sample MPLS transit RIB

3.2.  Link-up event

   When a new link is added to the network, the PLR needs to update its
   FIB before it announces the change. The PLR first converges and
   updates its FIB according to the topology that includes the new
   link, and only then announces the new link to the rest of the
   network. The other network nodes SHOULD follow the same procedure
   as described in Section 3.1. They SHOULD update their FIB to tunnel
   the traffic to the closest node corresponding to the change. After
   MAX_CONVERGENCE_DELAY the nodes SHOULD update the FIB with the
   shortest path next hops.

                          SRGB:1000-2000
      SID:1     SID:2     SID:3        SID:4     SID:5
      +----+ 10 +----+ 10 +----+   10  +----+ 10 +----+
      | S1 |----| R1 |----| S  |---X---| E  |----| D1 |
      +----+    +----+    +----+       +----+    +----+
         \                   \         /
       10 \                   \ 10    / 100
           \  SRGB:1000-2000   \     /
            \   +----+         +----+
             +--| R2 |---------| R3 |SID:7
          SID:6 +----+   10    +----+SRGB:1000-2000
                 /
                / 10
           +----+
           | S2 |SID:8
           +----+SRGB:1000-2000

                     Figure 5: Sample SR Network

   In the figure above, when the S->E link is added (or restored):

   1.  PLR S processes the event and programs the FIB with the new path
       for the affected destinations.

   2.  The PLR delays flooding the event for the MAX_CONVERGENCE_DELAY
       interval. This step prevents a possible local micro-loop
       between S and R3.

   3.  Once the PLR floods the event, non-PLR nodes in the network
       identify the destinations affected by the database change. This
       is done by SPF computation and by examining the next-hop change.
       The destination D1 is affected by the S->E link-up event for
       nodes S1, R1, R2 and R3.

   4.  For each affected destination, identify the nearest PLR
       advertising the change. The link-up event is advertised by both
       S and E. S is the nearest PLR for the nodes S1, R1, R2 and R3.
       When there are ECMP paths to the destination and a new ECMP path
       is added, the new ECMP path follows the micro-loop prevention
       mechanisms and tunnels the traffic towards the nearest PLR.

   5.  Start a timer T3 = max (all MAX_CONVERGENCE_DELAY) at all
       non-PLR nodes.

   6.  For IP routes, update the FIB for the affected destinations so
       that the nearest PLR's node-SID is pushed on the packet's label
       stack. For MPLS ingress and transit routes, update the path
       with a two-label stack, the inner label corresponding to the
       destination and the outer label corresponding to the nearest
       PLR. This step prevents the possible remote micro-loop between
       S1 and R2.

   7.  After the expiry of timer T3, all the non-PLR nodes perform
       global convergence and update the FIB to represent the shortest
       path.

   Other management events such as metric changes are handled like the
   link-down case (for a metric increase) or the link-up case (for a
   metric decrease).

3.3.  Computation of nearest PLR

   When a network event is received by a node via the IGP database
   change notification, the node has to compute the nearest PLR
   corresponding to that advertisement. The first database change
   advertisement may be received from any of the PLRs, nearest or
   farthest.

3.3.1.  Link down event

   When a link goes down, IGPs generate a fresh LSP/Router LSA with the
   affected link removed.
   The computing node has to identify the missing link by walking over
   the LSP/LSA and comparing the contents with an older version. Once
   the affected link is identified, the cost to reach both ends of the
   link should be examined. The nearest PLR is chosen based on the
   cost to reach the ends.

3.3.2.  Node down event

   When a node goes down, this is detected by the neighbouring nodes as
   a link-down event. The neighbouring routers generate a fresh LSP/
   Router LSA with the affected link removed. The computing node has
   to identify the missing link by walking over the LSP/LSA and
   comparing the contents with an older version. Once the affected
   link is identified, the cost to reach both ends of the link should
   be examined. The nearest PLR is chosen based on the cost to reach
   the ends.

   When an advertisement from the farthest node is received before the
   one from the nearest node, it is possible that the node that went
   down is chosen as the nearest PLR, as that node might still be
   lingering in the database. In such cases, node protection
   mechanisms for the failed node at the previous hop should prevent
   traffic loss. The details of such a mechanism are outside the scope
   of this document.

3.4.  Handling multiple network events

   It is important to categorize the received events as belonging to one
   network event or multiple network events. The link-down/link-up
   event is advertised by both ends of the link. The node-down/node-up
   event is advertised by all the neighbouring nodes. When an event is
   received, the computing node should analyse the changes in the
   database advertisements and compare them with the previous database.
   The micro-loop prevention procedures SHOULD be started when the
   first notification is received. The node SHOULD record the event
   for which micro-loop prevention procedures are being performed. If
   more database changes are received during this time, each change
   should be mapped to the already on-going micro-loop prevention
   procedures. If the event is the same, the micro-loop prevention
   procedures MUST continue; otherwise the micro-loop prevention
   procedures SHOULD be aborted.

   Section 6.2 of [RFC5715] describes mechanisms to handle SRLG
   failures. If the received failure advertisement is part of an SRLG
   advertised in the IGP TE advertisement, the links on the path
   sharing the same SRLG are identified and the tunnel is built with a
   multi-label stack corresponding to the nearest PLR of each SRLG
   member.

   When a failure is received, and the failure does not belong to the
   same SRLG as the already on-going micro-loop prevention, the
   micro-loop prevention procedures MUST be aborted and the normal
   convergence procedures SHOULD be followed.

3.4.1.  Handling SRLG failures

   Consider the sample network shown in Figure 6, with S->E and S1->R1
   belonging to the same SRLG. The symmetric link metrics are shown in
   the figure and the SRGB is 1000-2000 on all nodes. When the S->E
   link goes down, all the links belonging to the same SRLG are
   considered to be down and the route is modified to carry multiple
   node-SIDs along the path.

                             SRGB:1000-2000
      SID:1        SID:2     SID:3        SID:4     SID:5
      +----+   10  +----+ 10 +----+   10  +----+ 10 +----+
      | S1 |-------| R1 |----| S  |-------| E  |----| D1 |
      +----+ SRLG=5+----+    +----+ SRLG=5+----+    +----+
         \                      \         /
       10 \                      \ 10    / 100
           \  SRGB:1000-2000      \     /
            \   +----+            +----+
             +--| R2 |------------| R3 |SID:7
          SID:6 +----+     10     +----+SRGB:1000-2000
                 /
                / 10
           +----+
           | S2 |SID:8
           +----+SRGB:1000-2000

              Figure 6: Sample Network with SRLG links

   1.  When the S->E link goes down, S and E generate the link-down
       event, update their Router-LSA/LSP and flood the updated
       information across the IGP domain.

   2.  The nodes in the IGP domain process the link-down event for
       affected destinations. If there are any other links with the
       same SRLG on the path to the destination, the nearest PLRs for
       those links are identified. In this example topology S1->R1 and
       S->E belong to the same SRLG. For destination D1, R2 identifies
       two PLRs, S1 and S, for the S->E link-down event.

   3.  The nodes build the tunnelled path with multiple labels, one for
       each of the identified links. For example, R2 builds a stack
       containing the node-SIDs of S1 and S. The tunnelled path at R2
       looks as shown in Figure 7 below.

   +------+--------------------+---------------------------------+
   | Node | Destination Prefix | Label Operation                 |
   +------+--------------------+---------------------------------+
   | R2   | D1                 | Push 1005, 1003, 1001(top),     |
   |      |                    | Fwd to S1                       |
   +------+--------------------+---------------------------------+

        Figure 7: Sample ingress RIB for SRLG failure handling

   4.  The procedures described in Section 3.1 for the link-down event
       are followed to achieve micro-loop free convergence.

3.5.  Handling ECMP

   When a network event is received and the change causes only one of
   the ECMP paths to change, the micro-loop prevention mechanisms
   described in Sections 3.1 and 3.2 are applied to the changed path
   only. As described in Sections 3.1 and 3.2, if there are ECMP paths
   to the nearest PLR, then all ECMP paths are used to tunnel the
   traffic during convergence.

3.6.  Recognizing same network event

   When a link goes down, both ends of the link report the event by
   updating their LSP/LSA and flooding it across the IGP domain. It is
   possible that the same network event being reported by two nodes is
   perceived as two different network events by the nodes in the IGP
   domain.
   The nodes processing the network events SHOULD evaluate whether the
   multiple events received correspond to a single event by comparing
   both ends of the reported link and also by looking at the previous
   event for which micro-loop prevention is being performed. If the
   event is the same, then micro-loop prevention procedures MUST be
   allowed to continue and MUST NOT be aborted.

   Node-down or new-node-addition events are reported by all the
   adjacent nodes removing a link or adding a new link. In addition, a
   node-up event also includes a new LSA advertisement. The criterion
   for recognizing whether the event is the same is to look at both
   ends of the changed link. If one end of the changed link maps to
   previously reported events and the other end of the link (the
   advertising router) changes for each successive event, then the
   event SHOULD be recognized as a new node addition or a node
   deletion. Micro-loop procedures MUST be allowed to continue and
   MUST NOT be aborted.

3.7.  Partial deployment Considerations

   The micro-loop mechanisms described in this document are very
   effective and safe when all the nodes in the network support this
   feature and apply it when a network event happens. However, in some
   topologies, when not all the nodes support the micro-loop prevention
   mechanism, the duration of the loop can increase when only some
   nodes apply the procedures described in this document and some
   nodes do not.

   For example, consider the sample topology described in the figure
   below.

                           +-----+
                           | S3  |
                           +-----+
                            /
                           /
   +----+ 10 +----+ 10 +----+  10   +----+ 10 +----+
   | S1 |----| R1 |----| S  |-------| E  |----| D1 |
   +----+    +----+    +----+       +----+    +----+
      \                    \         /
       \ 10                 \ 100   / 60
        \                    \     /
         \   +----+           +----+
          +--| R2 |-----------| R3 |
             +----+    30     +----+
               /
              / 10
         +----+
         | S2 |
         +----+

         Figure 8: Sample Network with partial deployment

   In this topology, S1, S2, and S3 are traffic sources and D1 is the
   destination.  For each of the sources, Figure 9 shows the path
   before the failure (the original path) and the path after the
   failure (the post-convergence path).

   +-----+------+-------------------------+-----------------------------+
   | Src | Dest | Original Path           | Post-Convergence Path       |
   +-----+------+-------------------------+-----------------------------+
   | S1  | D1   | S1->R1->S->E->D1        | S1->R2->R3->E->D1           |
   +-----+------+-------------------------+-----------------------------+
   | S2  | D1   | S2->R2->S1->R1->S->E->D1| S2->R2->R3->E->D1           |
   +-----+------+-------------------------+-----------------------------+
   | S3  | D1   | S3->S->E->D1            | S3->S->R1->S1->R2->R3->E->D1|
   +-----+------+-------------------------+-----------------------------+

   Figure 9: Traffic flow in normal operation and post-convergence path
                         with S->E link down

   In the above topology, if the PLR S does not support the micro-loop
   prevention mechanism but all other nodes support and apply this
   mechanism, then there is a possibility that the duration of traffic
   looping is higher than when the micro-loop prevention mechanisms
   are not applied at all.  To mitigate this issue, protocol
   extensions to negotiate the support of this feature in the IGP
   domain are needed.

   Section 4 describes the protocol mechanisms to advertise the
   support of this feature in OSPF and ISIS.
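
   The paths listed in Figure 9 can be cross-checked with an ordinary
   shortest-path computation over the Figure 8 topology.  The
   following is a minimal, non-normative sketch in Python and is not
   part of the mechanism defined in this document.  The helper names
   (build_graph, shortest_path, path_table) are hypothetical.  The
   link metrics are read from Figure 8, taking the metric-100 link to
   be S-R3; the S3-S metric is not labelled in the figure and is
   assumed to be 10 here.

```python
import heapq

# Link metrics as read from Figure 8. The S3-S metric is not labelled
# in the figure; the value 10 is an assumption made for this sketch.
LINKS = {
    ("S1", "R1"): 10, ("R1", "S"): 10, ("S", "E"): 10, ("E", "D1"): 10,
    ("S1", "R2"): 10, ("R2", "R3"): 30, ("S", "R3"): 100,
    ("R3", "E"): 60, ("S2", "R2"): 10, ("S3", "S"): 10,
}


def build_graph(links, failed=frozenset()):
    """Return an adjacency map, skipping any link listed in `failed`."""
    graph = {}
    for (a, b), metric in links.items():
        if (a, b) in failed or (b, a) in failed:
            continue
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph


def shortest_path(graph, src, dst):
    """Plain Dijkstra; returns the node list of one lowest-metric path."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(
                    queue, (cost + metric, neighbor, path + [neighbor]))
    return None  # dst is unreachable


def path_table(sources, dst, failed_link):
    """Map each source to (original path, post-convergence path)."""
    before = build_graph(LINKS)
    after = build_graph(LINKS, failed={failed_link})
    return {
        src: ("->".join(shortest_path(before, src, dst)),
              "->".join(shortest_path(after, src, dst)))
        for src in sources
    }


if __name__ == "__main__":
    table = path_table(["S1", "S2", "S3"], "D1", ("S", "E"))
    for src, (original, post) in table.items():
        print(src, original, post)
```

   Under these assumptions the computed paths agree with Figure 9.
   In particular, the post-convergence path from S3 passes back
   through R1 and S1, the nodes whose pre-failure paths to D1 run
   through S.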

   However, in certain deployments and topologies, it MAY be safe to
   apply the micro-loop prevention procedures even when not all the
   nodes in the network support this feature, especially in
   topologies where the post-convergence path from the PLR does not
   traverse the nodes in the P-space of the PLR with respect to the
   node or link being protected.

4.  Protocol Procedures

4.1.  OSPF

   [RFC4970] defines the Router Information (RI) LSA, which may be
   used to advertise properties of the originating router.  The
   payload of the RI LSA consists of one or more nested
   Type/Length/Value (TLV) triplets.  This document defines a new
   TLV, the Micro-loop prevention support TLV, which has the
   following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Type             |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 10: OSPF micro-loop prevention support TLV

   Type:  TBA, suggested value 15

   Length:  0

   The MAX_CONVERGENCE_DELAY described in this document is advertised
   using the Controlled Convergence TLV as described in
   [I-D.ietf-ospf-mrt].

4.2.  ISIS

   [RFC4971] defines the Router Capability TLV, which may be used to
   advertise properties of the originating router.  This document
   defines a new sub-TLV, the Micro-loop prevention support sub-TLV,
   which has the following format:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 11: ISIS micro-loop prevention support sub-TLV

   The Router Capability TLV specifies flags that control its
   advertisement.  The Micro-loop prevention support sub-TLV MUST be
   propagated throughout the level and SHOULD NOT be advertised
   across level boundaries.
   Therefore, the Router Capability TLV distribution flags SHOULD be
   set accordingly, i.e., the S flag in the Router Capability TLV
   [RFC4971] MUST be unset.

   Type:  TBA, suggested value 5

   Length:  0

   The MAX_CONVERGENCE_DELAY described in this document is advertised
   using the Controlled Convergence TLV as described in
   [I-D.ietf-isis-mrt].

4.3.  Elements of procedure

   The micro-loop prevention support sub-TLV MUST be advertised only
   when the feature is enabled.  When all the nodes in the IGP domain
   advertise this sub-TLV, a node supporting this feature MUST
   perform the micro-loop prevention procedures as described in this
   document.  The micro-loop prevention mechanisms are applied within
   the OSPF area or ISIS level.

   When there are one or more nodes in the IGP domain which do not
   support this feature, a node MAY perform micro-loop prevention
   procedures.  The near-side tunnelling mechanism ensures that when
   a group of nodes supports this feature, traffic sourced from this
   set of nodes does not suffer micro-loops.  A manageability
   interface SHOULD be provided to support micro-loop prevention in
   case of partial feature deployment.

5.  Security Considerations

   This document does not introduce any security issues beyond those
   discussed in [RFC2328], [RFC5340], [ISO10589], and [RFC1195].

6.  IANA Considerations

   This specification updates one OSPF registry: OSPF Router
   Information (RI) TLVs Registry

   i) TBD - Micro-loop prevention support TLV

   This specification updates one ISIS registry: ISIS Router
   capability TLVs (TLV 242) Registry

   i) TBD - Micro-loop prevention support sub-TLV

7.  Acknowledgments

   Thanks to Chris Bowers, Hannes Gredler, Eric Rosen and Stephane
   Litkowski for valuable inputs.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4970]  Lindem, A., Ed., Shen, N., Vasseur, JP., Aggarwal, R.,
              and S. Shaffer, "Extensions to OSPF for Advertising
              Optional Router Capabilities", RFC 4970,
              DOI 10.17487/RFC4970, July 2007.

   [RFC4971]  Vasseur, JP., Ed., Shen, N., Ed., and R. Aggarwal, Ed.,
              "Intermediate System to Intermediate System (IS-IS)
              Extensions for Advertising Router Information",
              RFC 4971, DOI 10.17487/RFC4971, July 2007.

8.2.  Informative References

   [I-D.ietf-isis-mrt]
              Li, Z., Wu, N., Zhao, Q., Atlas, A., Bowers, C., and J.
              Tantsura, "Intermediate System to Intermediate System
              (IS-IS) Extensions for Maximally Redundant Trees (MRT)",
              draft-ietf-isis-mrt-03 (work in progress), June 2017.

   [I-D.ietf-ospf-mrt]
              Atlas, A., Hegde, S., Bowers, C., Tantsura, J., and Z.
              Li, "OSPF Extensions to Support Maximally Redundant
              Trees", draft-ietf-ospf-mrt-03 (work in progress), June
              2017.

   [I-D.ietf-rtgwg-uloop-delay]
              Litkowski, S., Decraene, B., Filsfils, C., and P.
              Francois, "Micro-loop prevention by introducing a local
              convergence delay", draft-ietf-rtgwg-uloop-delay-05
              (work in progress), June 2017.

   [I-D.ietf-spring-segment-routing]
              Filsfils, C., Previdi, S., Decraene, B., Litkowski, S.,
              and R. Shakir, "Segment Routing Architecture",
              draft-ietf-spring-segment-routing-12 (work in progress),
              June 2017.

   [ISO10589]
              "Intermediate system to Intermediate system intra-domain
              routeing information exchange protocol for use in
              conjunction with the protocol for providing the
              connectionless-mode Network Service (ISO 8473)",
              ISO/IEC 10589:2002, Second Edition, November 2002.

   [RFC1195]  Callon, R., "Use of OSI IS-IS for routing in TCP/IP and
              dual environments", RFC 1195, DOI 10.17487/RFC1195,
              December 1990.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328,
              DOI 10.17487/RFC2328, April 1998.

   [RFC5286]  Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification
              for IP Fast Reroute: Loop-Free Alternates", RFC 5286,
              DOI 10.17487/RFC5286, September 2008.

   [RFC5340]  Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF
              for IPv6", RFC 5340, DOI 10.17487/RFC5340, July 2008.

   [RFC5715]  Shand, M. and S. Bryant, "A Framework for Loop-Free
              Convergence", RFC 5715, DOI 10.17487/RFC5715, January
              2010.

   [RFC6981]  Bryant, S., Previdi, S., and M. Shand, "A Framework for
              IP and MPLS Fast Reroute Using Not-Via Addresses",
              RFC 6981, DOI 10.17487/RFC6981, August 2013.

Authors' Addresses

   Shraddha Hegde
   Juniper Networks, Inc.
   Exora Business Park
   Bangalore, KA  560037
   India

   Email: shraddha@juniper.net


   Pushpasis Sarkar
   Individual

   Email: pushpasis.ietf@gmail.com