Network Working Group                              Chandra Ramachandran
Internet Draft                                         Juniper Networks
Intended status: Standards Track                              Ina Minei
                                                            Google, Inc
                                                          Dante Pacella
                                                                Verizon
                                                             Tarek Saad
                                                     Cisco Systems Inc.
Expires: November 7, 2016                                   May 7, 2016

          Refresh Interval Independent FRR Facility Protection
                   draft-chandra-mpls-ri-rsvp-frr-04

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on November 7, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   RSVP-TE relies on periodic refresh of RSVP messages to synchronize
   and maintain the LSP related states along the reserved path. In the
   absence of refresh messages, the LSP related states are
   automatically deleted.
   Reliance on periodic refreshes and refresh timeouts is problematic
   from a scalability point of view. The number of RSVP-TE LSPs that a
   router needs to maintain has been growing in service provider
   networks, and implementations should be capable of handling the
   increase in LSP scale.

   RFC 2961 specifies mechanisms to eliminate the reliance on periodic
   refresh and refresh timeout of RSVP messages, and enables a router
   to increase the message refresh interval to values much larger than
   the default 30 seconds defined in RFC 2205. However, the protocol
   extensions defined in RFC 4090 for supporting fast reroute (FRR)
   using bypass tunnels implicitly rely on short refresh timeouts to
   clean up stale states.

   In order to eliminate the reliance on refresh timeouts, the routers
   should unambiguously determine when a particular LSP state should be
   deleted. Coupling LSP state with the corresponding RSVP-TE signaling
   adjacencies as recommended in RSVP-TE Scaling Recommendations
   (draft-ietf-teas-rsvp-te-scaling-rec) will apply in scenarios other
   than RFC 4090 FRR using bypass tunnels. In scenarios involving RFC
   4090 FRR using bypass tunnels, additional explicit tear down
   messages are necessary. The Refresh-interval Independent RSVP FRR
   (RI-RSVP-FRR) extensions specified in this document consist of
   procedures to enable LSP state cleanup that are essential in
   scenarios not covered by the procedures defined in RSVP-TE Scaling
   Recommendations.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [RFC2119].

Table of Contents

   1. Introduction
      1.1. Motivation
   2. Terminology
   3. Problem Description
   4. Solution Aspects
      4.1. Signaling Handshake between PLR and MP
         4.1.1. PLR Behavior
         4.1.2. Remote Signaling Adjacency
         4.1.3. MP Behavior
         4.1.4. "Remote" state on MP
      4.2. Impact of Failures on LSP State
         4.2.1. Non-MP Behavior
         4.2.2. LP-MP Behavior
         4.2.3. NP-MP Behavior
         4.2.4. Behavior of a Router that is both LP-MP and NP-MP
      4.3. Conditional Path Tear
         4.3.1. Sending Conditional Path Tear
         4.3.2. Processing Conditional Path Tear
         4.3.3. CONDITIONS object
      4.4. Remote State Teardown
         4.4.1. PLR Behavior on Local Repair Failure
         4.4.2. PLR Behavior on Resv RRO Change
         4.4.3. LSP Preemption during Local Repair
            4.4.3.1. Preemption on LP-MP after Phop Link failure
            4.4.3.2. Preemption on NP-MP after Phop Link failure
      4.5. Backward Compatibility Procedures
         4.5.1. Detecting Support for Refresh interval Independent FRR
         4.5.2. Procedures for backward compatibility
            4.5.2.1. Lack of support on Downstream Node
            4.5.2.2. Lack of support on Upstream Node
            4.5.2.3. Incremental Deployment
   5. Security Considerations
   6. IANA Considerations
      6.1. New Object - CONDITIONS
   7. Normative References
   8. Informative References
   9. Acknowledgments
   10. Contributors
   11. Authors' Addresses

1. Introduction

   RSVP-TE Fast Reroute [RFC4090] defines two local repair techniques
   to reroute label switched path (LSP) traffic over a pre-established
   backup tunnel. The facility backup method allows one or more LSPs
   traversing a connected link or node to be protected using a bypass
   tunnel. The many-to-one nature of this local repair technique is
   attractive from a scalability point of view. This document
   enumerates the facility backup procedures in RFC 4090 that rely on
   refresh timeout and hence make the facility backup method
   refresh-interval dependent. The RSVP-TE extensions defined in this
   document enhance the facility backup protection mechanism by making
   the corresponding procedures refresh-interval independent.

1.1. Motivation

   Standard RSVP [RFC2205] maintains state via the generation of RSVP
   Path/Resv refresh messages. Refresh messages are used both to
   synchronize state between RSVP neighbors and to recover from lost
   RSVP messages. The use of Refresh messages to cover many possible
   failures has resulted in a number of operational problems.

   - One problem relates to RSVP control plane scaling due to periodic
     refreshes of Path and Resv messages; another relates to the
     reliability and latency of RSVP signaling.

   - An additional problem is the time to clean up the stale state
     after a tear message is lost.
     For more on these problems see Section 1 of RSVP Refresh Overhead
     Reduction Extensions [RFC2961].

   The problems listed above adversely affect RSVP control plane
   scalability, and RSVP-TE [RFC3209] inherited these problems from
   standard RSVP. Procedures specified in [RFC2961] address the
   above-mentioned problems by eliminating dependency on refreshes for
   state synchronization and for recovering from lost RSVP messages,
   and by eliminating dependency on refresh timeout for stale state
   cleanup. Implementing these procedures improves RSVP-TE control
   plane scalability. For more details on eliminating dependency on
   refresh timeout for stale state cleanup, refer to the "Refresh
   Interval Independent RSVP" section in [TE-SCALE-REC].

   However, the procedures specified in [RFC2961] do not fully address
   stale state cleanup for facility backup protection [RFC4090], as
   facility backup protection still depends on refresh timeouts for
   stale state cleanup. Thus [RFC2961] is insufficient to address the
   problem of stale state cleanup when facility backup protection is
   used.

   The procedures specified in this document, in combination with
   [RFC2961], eliminate facility backup protection dependency on
   refresh timeouts for stale state cleanup. These procedures, in
   combination with [RFC2961], fully address the above-mentioned
   problem of RSVP-TE stale state cleanup, including the cleanup for
   facility backup protection.

   The procedures specified in this document assume reliable delivery
   of RSVP messages, as specified in [RFC2961]. Therefore this document
   makes support for [RFC2961] a prerequisite.

2. Terminology

   The reader is assumed to be familiar with the terminology in
   [RFC2205], [RFC3209], [RFC4090] and [RFC4558].

   Phop node: Previous-hop router along the label switched path

   PPhop node: Previous-Previous-hop router along the LSP

   LP-MP node: Merge Point router at the tail of Link-protecting bypass
   tunnel

   NP-MP node: Merge Point router at the tail of Node-protecting
   bypass tunnel

   TED: Traffic Engineering Database

   Conditional PathTear: PathTear message containing a suggestion to a
   receiving downstream router to retain Path state if the receiving
   router is NP-MP

   Remote PathTear: PathTear message sent from Point of Local Repair
   (PLR) to MP to delete state on MP if PLR had not reliably sent
   backup Path state before

3. Problem Description

                      E
                     / \
                    /   \
                   /     \
                  /       \
                 /         \
                /           \
              A ----- B ----- C ----- D
                       \             /
                        \           /
                         \         /
                          \       /
                           \     /
                            \   /
                              F

                   Figure 1: Example Topology

   In the topology in Figure 1, consider a large number of LSPs from A
   to D transiting B and C. Assume that the refresh interval has been
   configured to be large, on the order of minutes, and that refresh
   reduction extensions are enabled on all routers.

   Also assume that node protection has been configured for the LSPs
   and the LSPs are protected by each router in the following way:

   - A has made node protection available using bypass LSP A -> E ->
     C; A is the Point of Local Repair (PLR) and C is Node Protecting
     Merge Point (NP-MP)

   - B has made node protection available using bypass LSP B -> F ->
     D; B is the PLR and D is the NP-MP

   - C has made link protection available using bypass LSP C -> B -> F
     -> D; C is the PLR and D is the Link Protecting Merge Point
     (LP-MP)

   In the above condition, assume that the B-C link fails. The
   following is the sequence of events that is expected to occur for
   all protected LSPs under normal conditions.

   1. B performs local repair and re-directs LSP traffic over the
      bypass LSP B -> F -> D.

   2. B also creates backup state for the LSP and triggers sending of
      backup LSP state to D over the bypass LSP B -> F -> D.

   3. D receives backup LSP states and merges the backups with the
      protected LSPs.

   4. As the link on C over which the LSP states are refreshed has
      failed, C will no longer receive state refreshes. Consequently
      the protected LSP states on C will time out and C will send a
      tear down message for all LSPs. As each router should consider
      itself as a Merge Point, C will time out the state only after
      waiting for an additional duration equal to refresh timeout.

   While the above sequence of events has been described in [RFC4090],
   there are a few problems for which no mechanism has been specified
   explicitly.

   - If the protected LSP on C times out before D receives signaling
     for the backup LSP, then D would receive PathTear from C prior to
     receiving signaling for the backup LSP, thus resulting in deleting
     the LSP state. This would be possible at scale even with the
     default refresh time.

   - If upon the link failure C is to keep state until its timeout,
     then with a long refresh interval this may result in a large
     amount of stale state on C. Alternatively, if upon the link
     failure C is to delete the state and send PathTear to D, this
     would result in deleting the state on D, thus deleting the LSP. D
     needs a reliable mechanism to determine whether it is MP or not
     to overcome this problem.

   - If head-end A attempts to tear down the LSP after step 1 but
     before step 2 of the above sequence, then B may receive the tear
     down message before step 2 and delete the LSP state from its
     state database. If B deletes its state without informing D, with
     a long refresh interval this could cause (large) buildup of stale
     state on D.

   - If B fails to perform local repair in step 1, then B will delete
     the LSP state from its state database without informing D.
     As B deletes its state without informing D, with a long refresh
     interval this could cause (large) buildup of stale state on D.

   The purpose of this document is to provide solutions to the above
   problems, which will then make it practical to scale up to a large
   number of protected LSPs in the network.

4. Solution Aspects

   The solution consists of five parts.

   - Utilize the MP determination mechanism specified in [SUMMARY-FRR]
     that enables the PLR to signal availability of local protection
     to the MP. In addition, introduce PLR and MP procedures to
     establish a Node-ID hello session between the PLR and the MP to
     detect router failures and to determine capability. See section
     4.1 for more details. This part of the solution re-uses some of
     the extensions defined in [SUMMARY-FRR] and [TE-SCALE-REC], and
     the subsequent sub-sections will list the extensions in these
     drafts that are utilized in this document.

   - Handle upstream link or node failures by cleaning up LSP states
     if the node has not found itself to be an MP through the MP
     determination mechanism. See section 4.2 for more details.

     The combination of "path state" maintained as Path State Block
     (PSB) and "reservation state" maintained as Reservation State
     Block (RSB) forms an individual LSP state on an RSVP-TE speaker.

   - Introduce extensions to enable a router to send a tear down
     message to the downstream router that enables the receiving
     router to conditionally delete its local state. See section 4.3
     for more details.

   - Enhance facility protection by allowing a PLR to directly send a
     tear down message to the MP without requiring the PLR to either
     have a working bypass LSP or have already signaled backup LSP
     state. See section 4.4 for more details.

   - Introduce extensions to enable the above procedures to be
     backward compatible with routers along the LSP path running
     implementations that do not support these procedures.
     See section 4.5 for more details.

4.1. Signaling Handshake between PLR and MP

4.1.1. PLR Behavior

   As per the procedures specified in RFC 4090, when a protected LSP
   comes up and if the "local protection desired" flag is set in the
   SESSION_ATTRIBUTE object, each node along the LSP path attempts to
   make local protection available for the LSP.

   - If the "node protection desired" flag is set, then the node
     tries to become a PLR by attempting to create an NP-bypass LSP to
     the NNhop node avoiding the Nhop node on the protected LSP path.
     In case node protection could not be made available after some
     timeout, the node attempts to create an LP-bypass LSP to the Nhop
     node avoiding only the link that the protected LSP takes to reach
     Nhop

   - If the "node protection desired" flag is not set, then the PLR
     attempts to create an LP-bypass LSP to the Nhop node avoiding the
     link that the protected LSP takes to reach Nhop

   With regard to the PLR procedures described above and that are
   specified in RFC 4090, this document specifies the following
   additional procedures.

   - While selecting the destination address of the bypass LSP, the
     PLR SHOULD attempt to select the router ID of the NNhop or Nhop
     node. If the PLR and the MP are in the same area, then the PLR
     may utilize the TED to determine the router ID from the interface
     address in RRO (if NodeID is not included in RRO). If the PLR and
     the MP are in different IGP areas, then the PLR SHOULD use the
     NodeID address of NNhop MP if included in the RRO of RESV. If the
     NP-MP in a different area has not included NodeID in RRO, then
     the PLR SHOULD use NP-MP's interface address present in the RRO.
     The PLR SHOULD use its router ID as the source address of the
     bypass LSP. The PLR SHOULD also include its router ID as the
     NodeID in PATH RRO unless configured explicitly not to include
     NodeID.

   - In parallel to the attempt made to create NP-bypass or LP-bypass,
     the PLR SHOULD initiate a Node-ID based Hello session to the
     NNhop or Nhop node respectively to establish the RSVP-TE
     signaling adjacency. This Hello session is used to detect MP node
     failure as well as to determine the capability of the MP node. If
     the MP sets the I-bit in the CAPABILITY object [TE-SCALE-REC]
     carried in the Hello message corresponding to the NodeID based
     Hello session, then the PLR SHOULD conclude that the MP supports
     the refresh-interval independent FRR procedures defined in this
     document.

   - If the bypass LSP comes up, then the PLR SHOULD include the
     Bypass Summary FRR Association object and trigger PATH to be
     sent. If the Bypass Summary FRR Association object is included in
     the PATH message, then the encoding rules specified in
     [SUMMARY-FRR] MUST be followed.

4.1.2. Remote Signaling Adjacency

   A NodeID based RSVP-TE Hello session is one in which NodeID is used
   in the source and destination address fields in RSVP Hello.
   [RFC4558] formalizes NodeID based Hello messages between two
   routers. This document extends the NodeID based RSVP Hello session
   to track the state of an RSVP-TE neighbor that is not directly
   connected by at least one interface. In order to apply a NodeID
   based RSVP-TE Hello session between any two routers that are not
   immediate neighbors, the router that supports the extensions
   defined in this document SHOULD set TTL to 255 in the NodeID based
   Hello messages exchanged between PLR and MP. The default hello
   interval for this NodeID hello session SHOULD be set to the default
   specified in [TE-SCALE-REC].

   In the rest of the document the term "signaling adjacency", or
   "remote signaling adjacency" refers specifically to the RSVP-TE
   signaling adjacency.

4.1.3. MP Behavior

   When the NNhop or Nhop node receives the triggered PATH with a
   "matching" Bypass Summary FRR Association object, the node should
   consider itself as the MP for the PLR IP address "corresponding" to
   the Bypass Summary FRR Association object. The matching and ordering
   rules of Bypass Summary FRR Association specified in [SUMMARY-FRR]
   SHOULD be followed by implementations supporting this document.

   In addition to the above procedures, the node SHOULD check the
   presence of a remote signaling adjacency with the PLR (this check
   is needed to detect the network being partitioned). If a matching
   Bypass Summary FRR Association object is found in PATH and the
   RSVP-TE signaling adjacency is present, the node concludes that the
   PLR will undertake the refresh-interval independent FRR procedures
   specified in this document. If the PLR has included NodeID in PATH
   RRO, then that NodeID is the remote neighbor address. Otherwise,
   the PLR's interface address in RRO will be the remote neighbor
   address. If a matching Bypass Summary FRR Association object is
   included by the PPhop node, then the node concludes it is NP-MP. If
   a matching Bypass Summary FRR Association object is included by the
   Phop node, it concludes it is LP-MP.

4.1.4. "Remote" state on MP

   Once a router concludes it is MP for a PLR running refresh-interval
   independent FRR procedures, it SHOULD create a remote path state for
   the LSP. The "remote" state is identical to the protected LSP path
   state except for the difference in the RSVP_HOP object. The
   RSVP_HOP object in "remote" Path state contains the address that
   the PLR uses to send NodeID hello messages to MP.
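
   The MP determination in Section 4.1.3 and the derivation of the
   "remote" path state described above can be sketched as follows.
   This is a minimal illustration only and is not part of the
   specification. All names (MpRole, PathState, determine_mp_role,
   make_remote_path_state) are hypothetical, the addresses are
   documentation examples, and the sketch assumes that the matching of
   the Bypass Summary FRR Association object has already been done
   according to [SUMMARY-FRR].

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class MpRole(Enum):
    NONE = "no RI-RSVP-FRR MP role concluded"
    LP_MP = "LP-MP"
    NP_MP = "NP-MP"


@dataclass(frozen=True)
class PathState:
    # Hypothetical, heavily reduced view of a Path State Block.
    session: str    # illustrative LSP identifier
    rsvp_hop: str   # address carried in the RSVP_HOP object


def determine_mp_role(matched_by: Optional[str], adjacency_up: bool) -> MpRole:
    """Role concluded by a node for the procedures in this document.

    matched_by is "phop" or "pphop" depending on which upstream node
    included the matching Bypass Summary FRR Association object, or
    None if no matching object was found (assumed input).
    """
    # Both the matching object and the remote signaling adjacency are
    # needed before the node assumes the PLR runs these procedures.
    if matched_by is None or not adjacency_up:
        return MpRole.NONE
    return MpRole.NP_MP if matched_by == "pphop" else MpRole.LP_MP


def make_remote_path_state(protected: PathState, plr_hello_addr: str) -> PathState:
    # The "remote" state copies the protected path state; only the
    # RSVP_HOP differs and holds the PLR's NodeID hello address.
    return replace(protected, rsvp_hop=plr_hello_addr)
```

   For example, a node that finds a matching object from its PPhop
   and has the remote signaling adjacency up would conclude NP-MP in
   this sketch, and would derive "remote" state whose RSVP_HOP is the
   PLR's hello address.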

   The MP SHOULD consider the "remote" path state automatically deleted
   if:

   - MP later receives a PATH with no matching Bypass Summary FRR
     Association object corresponding to the PLR RRO, or

   - Node signaling adjacency with PLR goes down, or

   - MP receives backup LSP signaling from PLR, or

   - MP receives PathTear, or

   - MP deletes the LSP state on local policy or exception event

   Unlike the normal path state that is either locally generated on
   Ingress or created from a PATH message from the Phop node, the
   "remote" path state is not signaled explicitly from PLR. The
   purpose of "remote" path state is to enable the PLR to explicitly
   tear down path and reservation states corresponding to the LSP by
   sending a tear message for the "remote" path state. Such a message
   tearing down "remote" path state is called "Remote" PathTear.

   The scenarios in which "Remote" PathTear is applied are described in
   Section 4.4 - Remote State Teardown.

4.2. Impact of Failures on LSP State

   This section describes the procedures for routers on the LSP path
   for different kinds of failures. The procedures described on
   detecting RSVP control plane adjacency failures do not impact the
   RSVP-TE graceful restart mechanisms ([RFC3473], [RFC5063]). If the
   router executing these procedures acts as helper for a neighboring
   router, then the control plane adjacency will be declared as having
   failed after taking into account the grace period extended for the
   neighbor by the helper.

   Immediate node failures are detected from the state of NodeID hello
   sessions established with immediate neighbors. [TE-SCALE-REC]
   recommends each router to establish NodeID hello sessions with all
   its immediate neighbors. PLR or MP node failure is detected from the
   state of the remote signaling adjacency established according to
   Section 4.1.2 of this document.

4.2.1. Non-MP Behavior

   When a router detects Phop link or Phop node failure and the router
   is not an MP for the LSP, then it SHOULD send Conditional PathTear
   (refer to Section 4.3) and delete PSB and RSB states corresponding
   to the LSP.

4.2.2. LP-MP Behavior

   When the Phop link for an LSP fails on a router that is LP-MP for
   the LSP, the LP-MP SHOULD retain PSB and RSB states corresponding to
   the LSP till the occurrence of any of the following events.

   - Node-ID signaling adjacency with Phop PLR goes down, or

   - MP receives normal or "Remote" PathTear for PSB, or

   - MP receives ResvTear for RSB.

   When a router that is LP-MP for an LSP detects Phop node failure
   from Node-ID signaling adjacency state, the LP-MP SHOULD send normal
   PathTear and delete PSB and RSB states corresponding to the LSP.

4.2.3. NP-MP Behavior

   When a router that is NP-MP for an LSP detects Phop link failure, or
   Phop node failure from Node-ID signaling adjacency, the router
   SHOULD retain PSB and RSB states corresponding to the LSP till the
   occurrence of any of the following events.

   - Remote Node-ID signaling adjacency with PPhop PLR goes down, or

   - MP receives normal or "Remote" PathTear for PSB, or

   - MP receives ResvTear for RSB.

   When a router that is NP-MP does not detect Phop link or node
   failure, but receives Conditional PathTear from the Phop node, then
   the router SHOULD retain PSB and RSB states corresponding to the LSP
   till the occurrence of any of the following events.

   - Remote Node-ID signaling adjacency with PPhop PLR goes down, or

   - MP receives normal or "Remote" PathTear for PSB, or

   - MP receives ResvTear for RSB.

   Receiving Conditional PathTear from the Phop node will not impact
   the "remote" state from the PLR. Note that the Phop node would send
   Conditional PathTear if it was not an MP.
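
   The immediate reaction to a Phop failure described in Sections
   4.2.1 through 4.2.3 can be summarized by the sketch below. It is an
   illustration under simplifying assumptions, not a normative
   algorithm: the names Role, Failure and on_phop_failure are
   hypothetical, a router is assumed to hold exactly one role for the
   LSP, and any further conditions on when Conditional PathTear is
   actually sent are ignored.

```python
from enum import Enum
from typing import Tuple


class Role(Enum):
    NON_MP = "non-MP"
    LP_MP = "LP-MP"
    NP_MP = "NP-MP"


class Failure(Enum):
    PHOP_LINK = "Phop link failure"
    PHOP_NODE = "Phop node failure"


def on_phop_failure(role: Role, failure: Failure) -> Tuple[str, ...]:
    """Return the actions a router takes for one LSP (illustrative)."""
    if role is Role.NON_MP:
        # Section 4.2.1: not an MP, so clean up and tell downstream.
        return ("send Conditional PathTear", "delete PSB and RSB")
    if role is Role.LP_MP and failure is Failure.PHOP_NODE:
        # Section 4.2.2: the Phop PLR itself is gone, nothing to merge.
        return ("send PathTear", "delete PSB and RSB")
    # LP-MP on Phop link failure (4.2.2) and NP-MP on Phop link or
    # node failure (4.2.3): keep state until a release event occurs.
    return ("retain PSB and RSB",)
```

   The retained state is later released by the events listed in each
   section (adjacency with the PLR going down, PathTear or ResvTear).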

   In the example topology in Figure 1, assume C & D are NP-MP for PLRs
   A & B respectively. Now when A-B link fails, as B is not MP and its
   Phop link signaling adjacency has failed, B will delete LSP state
   (this behavior is required for unprotected LSPs - Section 4.2.1). In
   the data plane, that would require B to delete the label forwarding
   entry corresponding to the LSP. So if B's downstream nodes C and D
   continue to retain state, it would not be correct for D to continue
   to assume itself as NP-MP for PLR B.

   The mechanism that enables D to stop considering itself as NP-MP and
   delete "remote" path state is given below.

   1. When C receives Conditional PathTear from B, it decides to
      retain LSP state as it is NP-MP of PLR A. C also SHOULD check
      whether Phop B had previously signaled availability of node
      protection. As B had previously signaled NP availability in its
      PATH RRO, C SHOULD remove SUMMARY_FRR_BYPASS_ASSOCIATION
      sub-object corresponding to B from the RRO and trigger PATH to D.

   2. When D receives triggered PATH, it realizes that it is no
      longer NP-MP and so deletes the "remote" path state. D does not
      propagate PATH further down because the only change is in PATH
      RRO SUMMARY_FRR_BYPASS_ASSOCIATION sub-object corresponding to
      B.

4.2.4. Behavior of a Router that is both LP-MP and NP-MP

   A router may be both LP-MP as well as NP-MP at the same time for
   Phop and PPhop nodes respectively of an LSP. If Phop link fails on
   such node, the node SHOULD retain PSB and RSB states corresponding
   to the LSP till the occurrence of any of the following events.

   - Both Node-ID signaling adjacencies with Phop and PPhop nodes go
     down, or

   - MP receives normal or "Remote" PathTear for PSB, or

   - MP receives ResvTear for RSB.
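
   The release conditions for state that an MP retains after a Phop
   link failure (Sections 4.2.2 and 4.2.3, and the list above for a
   router holding both roles) can be expressed as one predicate. The
   sketch below is only an illustration; RetainedState and
   should_release are hypothetical names, and the flags are assumed to
   be maintained elsewhere from Hello session state and received
   messages. A router that is not an MP does not retain state and is
   outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass
class RetainedState:
    # Roles held by this router for the LSP.
    is_lp_mp: bool
    is_np_mp: bool
    # Node-ID signaling adjacency state with the Phop and PPhop PLRs.
    phop_adjacency_up: bool = True
    pphop_adjacency_up: bool = True
    # Normal or "Remote" PathTear received for the PSB.
    path_tear_received: bool = False
    # ResvTear received for the RSB.
    resv_tear_received: bool = False


def should_release(state: RetainedState) -> bool:
    """True once PSB and RSB retained after Phop link failure may go."""
    if state.path_tear_received or state.resv_tear_received:
        return True
    # Collect the adjacencies that still justify keeping the state:
    # Phop for the LP-MP role, PPhop for the NP-MP role.
    watched = []
    if state.is_lp_mp:
        watched.append(state.phop_adjacency_up)
    if state.is_np_mp:
        watched.append(state.pphop_adjacency_up)
    # Release only when every relevant adjacency has gone down.
    return bool(watched) and not any(watched)
```

   With both roles held, losing only the Phop adjacency does not
   release the state in this sketch; the PPhop adjacency must also go
   down, or a tear message must arrive.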

   If a router that is both LP-MP and NP-MP detects Phop node failure,
   then the node SHOULD retain PSB and RSB states corresponding to the
   LSP till the occurrence of any of the following events.

   - Remote Node-ID signaling adjacency with PPhop PLR goes down, or

   - MP receives normal or "Remote" PathTear for PSB, or

   - MP receives ResvTear for RSB.

4.3. Conditional Path Tear

   In the example provided in Section 4.2.3 "NP-MP Behavior", B
   deletes PSB and RSB states corresponding to the LSP once B detects
   that its link to Phop went down, as B is not MP. If B were to send
   PathTear normally, then C would delete LSP state immediately. In
   order to avoid this, there should be some mechanism by which B can
   indicate to C that B does not require the receiving node to
   unconditionally delete the LSP state immediately. For this, B
   SHOULD add a new optional object called the CONDITIONS object in
   PathTear. The new optional object is defined in Section 4.3.3. If
   node C also understands the new object, then C SHOULD delete LSP
   state only if it is not an NP-MP; in other words, C SHOULD delete
   LSP state if there is no "remote" PLR state on C.

4.3.1. Sending Conditional Path Tear

   A router that is not an MP for an LSP SHOULD delete PSB and RSB
   states corresponding to the LSP if Phop link or Phop Node-ID
   signaling adjacency goes down (Section 4.2.1). The router SHOULD
   send Conditional PathTear if the following are also true.

   - Ingress has requested node protection for the LSP, and

   - PathTear is not received from upstream node

4.3.2. Processing Conditional Path Tear

   When a router that is not an NP-MP receives Conditional PathTear,
   the node SHOULD delete PSB and RSB states corresponding to the LSP,
   and process Conditional PathTear by considering it as normal
   PathTear.
   Specifically, the node SHOULD NOT propagate Conditional PathTear
   downstream but remove the optional object and send normal PathTear
   downstream.

   When a node that is an NP-MP receives Conditional PathTear, it
   SHOULD NOT delete LSP state. The node SHOULD check whether the Phop
   node had previously included the Bypass Summary FRR Association
   object in PATH. If the object had been included previously by Phop,
   then the node processing Conditional PathTear from Phop SHOULD
   remove the corresponding object and trigger PATH downstream.

   If Conditional PathTear is received from a neighbor that has not
   advertised support (refer to Section 4.5) for the new procedures
   defined in this document, then the node SHOULD consider the message
   as normal PathTear. The node SHOULD propagate normal PathTear
   downstream and delete LSP state.

4.3.3. CONDITIONS object

   As any implementation that does not support Conditional PathTear
   SHOULD ignore the new object but process the message as normal
   PathTear without generating any error, the Class-Num of the new
   object SHOULD be 10bbbbbb where 'b' represents a bit (from Section
   3.10 of [RFC2205]).

   The new object is called the "CONDITIONS" object and specifies the
   conditions under which the default processing rules of the RSVP-TE
   message SHOULD be invoked.

   The object has the following format:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |     Class     |    C-type     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Reserved                           |M|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Length

      This contains the size of the object in bytes and should be set
      to eight.

   Class

      To be assigned

   C-type

      1

   M bit

      If M-bit is set to 1, then the PathTear message SHOULD be
      processed according to whether the receiving router is a Merge
      Point or not.
      If the M-bit is set to 0, then the PathTear message SHOULD be
      processed as a normal PathTear message.

4.4. Remote State Teardown

   If the Ingress wants to tear down the LSP because of a management
   event while the LSP is being locally repaired at a transit PLR, it
   would not be desirable to wait for backup LSP signaling before
   performing state cleanup. To enable LSP state cleanup when the LSP
   is being locally repaired, the PLR SHOULD send a "remote" PathTear
   message instructing the MP to delete PSB and RSB states
   corresponding to the LSP. The TTL in the "remote" PathTear message
   SHOULD be set to 255.

   Consider that node C in the example topology (Figure 1) has gone
   down and B locally repairs the LSP.

   1. Ingress A receives a management event to tear down the LSP.
   2. A sends normal PathTear to B.
   3. To enable LSP state cleanup, B SHOULD send "remote" PathTear with
      destination IP address set to that of D used in Node-ID signaling
      adjacency with D, and RSVP_HOP object containing local address
      used in Node-ID signaling adjacency.
   4. B then deletes PSB and RSB states corresponding to the LSP.
   5. On D there would be a remote signaling adjacency with B and so D
      SHOULD accept the remote PathTear and delete PSB and RSB states
      corresponding to the LSP.

4.4.1. PLR Behavior on Local Repair Failure

   If local repair fails on the PLR after a failure, then this should
   be considered as a case for cleaning up LSP state from the PLR to
   the Egress. The PLR would achieve this using "remote" PathTear to
   clean up state from the MP. If the MP has retained state, then it
   would propagate PathTear downstream, thereby achieving state
   cleanup. Note that in the case of link protection, the PathTear
   would be directed to the LP-MP node IP address rather than the Nhop
   interface address.

4.4.2. PLR Behavior on Resv RRO Change

   When a router that has already made NP available detects a change in
   the RRO carried in the RESV message, and if the RRO change indicates
   that the router's former NP-MP is no longer present in the LSP path,
   then the router SHOULD send "Remote" PathTear directly to its former
   NP-MP.

   In the example topology in Figure 1, assume A has made node
   protection available and C has concluded it is NP-MP. When the B-C
   link fails, C, implementing the procedure specified in Section 4.2.4
   of this document, will retain state until the remote Node-ID control
   plane adjacency with A goes down, or until PathTear or ResvTear is
   received for PSB or RSB respectively. If B also has made node
   protection available, B will eventually complete backup LSP
   signaling with its NP-MP D and trigger RESV to A with RRO changed.
   The new RRO of the LSP carried in RESV will not contain C. When A
   processes the RESV with a new RRO not containing C, its former
   NP-MP, A SHOULD send "Remote" PathTear to C. When C receives a
   "Remote" PathTear for its PSB state, C will send normal PathTear
   downstream to D and delete both PSB and RSB states corresponding to
   the LSP. As D has already received backup LSP signaling from B, D
   will retain control plane and forwarding states corresponding to the
   LSP.

4.4.3. LSP Preemption during Local Repair

   If an LSP is preempted when there is no failure along the path of
   the LSP, the node on which preemption occurs would send PathErr and
   ResvTear upstream and only delete the forwarding state and RSB state
   corresponding to the LSP. But if the LSP is being locally repaired
   upstream of the node on which the LSP is preempted, then the node
   SHOULD delete both PSB and RSB states corresponding to the LSP and
   send normal PathTear downstream.

4.4.3.1. Preemption on LP-MP after Phop Link failure

   If an LSP is preempted on LP-MP after its Phop or incoming link has
   already failed but the backup LSP has not been signaled yet, then
   the node SHOULD send normal PathTear and delete both PSB and RSB
   states corresponding to the LSP. As the LP-MP has retained LSP state
   because the PLR would signal the LSP through backup LSP signaling,
   preemption would bring down the LSP and the node would not be LP-MP
   any more, requiring the node to clean up LSP state.

4.4.3.2. Preemption on NP-MP after Phop Link failure

   If an LSP is preempted on NP-MP after its Phop link has already
   failed but the backup LSP has not been signaled yet, then the node
   SHOULD send normal PathTear and delete PSB and RSB states
   corresponding to the LSP. As the NP-MP has retained LSP state
   because the PLR would signal the LSP through backup LSP signaling,
   preemption would bring down the LSP and the node would not be NP-MP
   any more, requiring the node to clean up LSP state.

   Consider that the B-C link goes down on the same example topology
   (Figure 1). As C is NP-MP for PLR A, C will retain LSP state.

   1. The LSP is preempted on C.
   2. C will delete RSB state corresponding to the LSP. But C cannot
      send PathErr or ResvTear to PLR A because backup LSP has not
      been signaled yet.
   3. As the only reason for C having retained state after Phop node
      failure was that it was NP-MP, C SHOULD send normal PathTear to
      D and delete PSB state also. D would also delete PSB and RSB
      states on receiving PathTear from C.
   4. B starts backup LSP signaling to D. But as D does not have the
      LSP state, it will reject backup LSP PATH and send PathErr to B.
   5. B will delete its reservation and send ResvTear to A.

4.5. Backward Compatibility Procedures

   In this section, "Refresh interval Independent FRR", or RI-RSVP-FRR,
   refers to the changes that have been proposed in the previous
   sections. Any implementation that does not support them is termed a
   "non-RI-RSVP-FRR implementation". The extensions proposed in
   [SUMMARY-FRR] are applicable to implementations that do not support
   RI-RSVP-FRR. On the other hand, the changes proposed relating to LSP
   state cleanup, namely Conditional and remote PathTear, require
   support from one-hop and two-hop neighboring nodes along the LSP
   path. So procedures that fall under the LSP state cleanup category
   SHOULD be turned on only if all nodes involved in the node
   protection FRR, i.e. PLR, MP and intermediate node in the case of
   NP, support the extensions. Note that for LSPs requesting only link
   protection, the PLR and the LP-MP should support the extensions.

4.5.1. Detecting Support for Refresh interval Independent FRR

   An implementation supporting the extensions specified in previous
   sections (called RI-RSVP-FRR hereafter) SHOULD set the flag
   "Refresh interval Independent RSVP" or RI-RSVP in the CAPABILITY
   object in Hello messages. The RI-RSVP flag is specified in
   [TE-SCALE-REC].

   - As nodes supporting the extensions SHOULD initiate Node Hellos
     with adjacent nodes, a node on the path of a protected LSP can
     determine whether its Phop or Nhop neighbor supports RI-RSVP-FRR
     enhancements from the Hello messages sent by the neighbor.

   - If a node attempts to make node protection available, then the
     PLR SHOULD initiate remote Node-ID signaling adjacency with NNhop.
     If the NNhop (a) does not reply to the remote node Hello message
     or (b) does not set the "Enhanced facility protection" flag in the
     CAPABILITY object in the reply, then the PLR can conclude that
     NNhop does not support RI-RSVP-FRR extensions.

   - If node protection is requested for an LSP and if (a) PPhop node
     has not included a matching Bypass Summary FRR Association object
     in PATH or (b) PPhop node has not initiated remote node Hello
     messages, then the node SHOULD conclude that PLR does not support
     RI-RSVP-FRR extensions. The details are described in the
     "Procedures for backward compatibility" section below.

   Any node that sets the I-bit in its CAPABILITY object MUST also set
   the Refresh-Reduction-Capable bit in the common header of all
   RSVP-TE messages.

4.5.2. Procedures for backward compatibility

   The procedures defined hereafter are performed on a subset of LSPs
   that traverse a node, rather than on all LSPs that traverse a node.
   This behavior is required to support backward compatibility for a
   subset of LSPs traversing nodes running non-RI-RSVP-FRR
   implementations.

4.5.2.1. Lack of support on Downstream Node

   - If the Nhop does not support the RI-RSVP-FRR extensions, then the
     node SHOULD reduce the "refresh period" in the TIME_VALUES object
     carried in PATH to a small default refresh value.

   - If node protection is requested and the NNhop node does not
     support the enhancements, then the node SHOULD reduce the "refresh
     period" in the TIME_VALUES object carried in PATH to a small
     default refresh value.

   If the node reduces the refresh time from the above procedures, it
   also SHOULD NOT send remote PathTear or Conditional PathTear
   messages.

   Consider the example topology in Figure 1. If C does not support the
   RI-RSVP-FRR extensions, then:

   - A and B SHOULD reduce the refresh time to the default value of 30
     seconds and trigger PATH.

   - If B is not an MP and if the Phop link of B fails, B cannot send
     Conditional PathTear to C but SHOULD time out PSB state from A
     normally. This is accomplished if A also reduces the refresh time
     to the default value.
     So if C does not support the RI-RSVP-FRR extensions, then Phop B
     and PPhop A SHOULD reduce the refresh time to a small default
     value.

4.5.2.2. Lack of support on Upstream Node

   - If the Phop node does not support the RI-RSVP-FRR extensions, then
     the node SHOULD reduce the "refresh period" in the TIME_VALUES
     object carried in RESV to a small default refresh value.

   - If node protection is requested and the Phop node does not
     support the RI-RSVP-FRR extensions, then the node SHOULD reduce
     the "refresh period" in the TIME_VALUES object carried in PATH to
     the default value.

   - If node protection is requested and the PPhop node does not
     support the RI-RSVP-FRR extensions, then the node SHOULD reduce
     the "refresh period" in the TIME_VALUES object carried in RESV to
     the default value.

   - If the node reduces the refresh time from the above procedures,
     it also SHOULD NOT execute the MP procedures specified in Section
     4.2 of this document.

4.5.2.3. Incremental Deployment

   The backward compatibility procedures described in the previous
   sub-sections imply that a router supporting the RI-RSVP-FRR
   extensions specified in this document can apply the procedures
   specified in the document either in the downstream or upstream
   direction of an LSP, depending on the capability of the routers
   downstream or upstream in the LSP path.
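
   As a non-normative illustration, the direction-specific enablement
   rules of this section can be sketched as below. This is only a
   minimal sketch: the names Protection, NeighborSupport,
   downstream_enabled and upstream_enabled are hypothetical and are
   not defined by this document, and each boolean is assumed to
   reflect what was learned about that neighbor through the detection
   procedures of Section 4.5.1.

```python
from dataclasses import dataclass
from enum import Enum


class Protection(Enum):
    """Protection type requested for the LSP (hypothetical helper type)."""
    LINK = "link"
    NODE = "node"


@dataclass(frozen=True)
class NeighborSupport:
    """Assumed per-LSP view of which neighbors support RI-RSVP-FRR."""
    nhop: bool = False
    nnhop: bool = False
    phop: bool = False
    pphop: bool = False


def downstream_enabled(protection: Protection, s: NeighborSupport) -> bool:
    """Extensions apply to downstream Path, PathTear and ResvErr messages."""
    if protection is Protection.LINK:
        return s.nhop
    # Node protection needs both the Nhop and the NNhop to support them.
    return s.nhop and s.nnhop


def upstream_enabled(protection: Protection, s: NeighborSupport) -> bool:
    """Extensions apply to upstream PathErr, Resv and ResvTear messages."""
    if protection is Protection.LINK:
        return s.phop
    # Node protection needs both the Phop and the PPhop to support them.
    return s.phop and s.pphop
```

   For instance, with node protection requested and an NNhop that does
   not support the extensions, downstream_enabled() is false while
   upstream_enabled() can still be true, which matches the per-
   direction behavior described in this section.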

   - RI-RSVP-FRR extensions and procedures are enabled for downstream
     Path, PathTear and ResvErr messages corresponding to an LSP if
     link protection is requested for the LSP and the Nhop node
     supports the extensions.

   - RI-RSVP-FRR extensions and procedures are enabled for downstream
     Path, PathTear and ResvErr messages corresponding to an LSP if
     node protection is requested for the LSP and both Nhop and NNhop
     nodes support the extensions.

   - RI-RSVP-FRR extensions and procedures are enabled for upstream
     PathErr, Resv and ResvTear messages corresponding to an LSP if
     link protection is requested for the LSP and the Phop node
     supports the extensions.

   - RI-RSVP-FRR extensions and procedures are enabled for upstream
     PathErr, Resv and ResvTear messages corresponding to an LSP if
     node protection is requested for the LSP and both Phop and PPhop
     nodes support the extensions.

   For example, if an implementation supporting the RI-RSVP-FRR
   extensions specified in this document is deployed on all routers in
   a particular region of the network and if all the LSPs in the
   network request node protection, then the FRR extensions will only
   be applied for the LSP segments that traverse the particular region.
   This will aid incremental deployment of these extensions and also
   allow reaping the benefits of the extensions in portions of the
   network where they are supported.

5. Security Considerations

   The security considerations pertaining to [RFC2205], [RFC3209] and
   [RFC5920] remain relevant.

   This document extends the applicability of Node-ID based Hello
   session between immediate neighbors. The Node-ID based Hello session
   between PLR and NP-MP may require the two routers to exchange Hello
   messages with a non-immediate neighbor.
   So, implementations SHOULD provide the option to configure a Node-ID
   neighbor specific or global authentication key to authenticate
   messages received from Node-ID neighbors. The network administrator
   MAY utilize this option to enable RSVP-TE routers to authenticate
   Node-ID Hello messages received with TTL greater than 1.
   Implementations SHOULD also provide the option to specify a limit on
   the number of Node-ID based Hello sessions that can be established
   on a router supporting the extensions defined in this document.

6. IANA Considerations

6.1. New Object - CONDITIONS

   RSVP Change Guidelines [RFC3936] defines the Class-Number name space
   for RSVP objects. The name space is managed by IANA.

   IANA registry: RSVP Parameters
   Subsection: Class Names, Class Numbers, and Class Types

   A new RSVP object using a Class-Number from the 128-183 range,
   called the "CONDITIONS" object, is defined in Section 4.3 of this
   document. The Class-Number from the 128-183 range will be allocated
   by IANA.

7. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3209] Awduche, D., "RSVP-TE: Extensions to RSVP for LSP
             Tunnels", RFC 3209, December 2001.

   [RFC4090] Pan, P., "Fast Reroute Extensions to RSVP-TE for LSP
             Tunnels", RFC 4090, May 2005.

   [RFC2961] Berger, L., "RSVP Refresh Overhead Reduction Extensions",
             RFC 2961, April 2001.

   [RFC2205] Braden, R., "Resource Reservation Protocol (RSVP)", RFC
             2205, September 1997.

   [RFC4558] Ali, Z., "Node-ID Based Resource Reservation (RSVP) Hello:
             A Clarification Statement", RFC 4558, June 2006.

   [RFC3473] Berger, L., "Generalized Multi-Protocol Label Switching
             Signaling Resource Reservation Protocol-Traffic Engineering
             Extensions", RFC 3473, January 2003.

   [RFC5063] Satyanarayana, A., "Extensions to GMPLS Resource
             Reservation Protocol Graceful Restart", RFC 5063, October
             2007.

   [RFC3936] Kompella, K. and J. Lang, "Procedures for Modifying the
             Resource reSerVation Protocol (RSVP)", BCP 96, RFC 3936,
             October 2004.

   [TE-SCALE-REC] Vishnu Pavan Beeram et al., "Implementation
             Recommendations to improve scalability of RSVP-TE
             Deployments", draft-ietf-teas-rsvp-te-scaling-rec (work in
             progress)

   [SUMMARY-FRR] Mike Tallion et al., "RSVP-TE Summary Fast Reroute
             Extensions for LSP Tunnels",
             draft-mtaillon-mpls-summary-frr-rsvpte (work in progress)

8. Informative References

   [RFC5439] Yasukawa, S., "An Analysis of Scaling Issues in MPLS-TE
             Core Networks", RFC 5439, February 2009.

   [RFC5920] Fang, L., "Security Framework for MPLS and GMPLS
             Networks", RFC 5920, July 2010.

9. Acknowledgments

   We are very grateful to Yakov Rekhter for his contributions to the
   development of the idea and thorough review of the content of the
   draft. Thanks to Raveendra Torvi and Yimin Shen for their comments
   and inputs.

10. Contributors

   Markus Jork
   Juniper Networks
   Email: mjork@juniper.net

   Harish Sitaraman
   Juniper Networks
   Email: hsitaraman@juniper.net

   Vishnu Pavan Beeram
   Juniper Networks
   Email: vbeeram@juniper.net

   Ebben Aries
   Juniper Networks
   Email: exa@juniper.net

   Mike Tallion
   Cisco Systems Inc.
   Email: mtallion@cisco.com

11. Authors' Addresses

   Chandra Ramachandran
   Juniper Networks
   Email: csekar@juniper.net

   Ina Minei
   Google, Inc
   Email: inaminei@google.com

   Dante Pacella
   Verizon
   Email: dante.j.pacella@verizon.com

   Tarek Saad
   Cisco Systems Inc.
   Email: tsaad@cisco.com