Network Working Group                               Chandra Ramachandran
Internet Draft                                          Juniper Networks
Intended status: Standards Track                               Ina Minei
                                                             Google, Inc
                                                           Dante Pacella
                                                                 Verizon
                                                              Tarek Saad
                                                      Cisco Systems Inc.
Expires: August 9, 2018                                February 10, 2018

          Refresh Interval Independent FRR Facility Protection
                     draft-ietf-mpls-ri-rsvp-frr-03

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on August 9, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   RSVP-TE relies on periodic refresh of RSVP messages to synchronize
   and maintain the LSP related states along the reserved path. In the
   absence of refresh messages, the LSP related states are
   automatically deleted. Reliance on periodic refreshes and refresh
   timeouts is problematic from the scalability point of view. The
   number of RSVP-TE LSPs that a router needs to maintain has been
   growing in service provider networks, and implementations should
   be capable of handling the increase in LSP scale.

   RFC 2961 specifies mechanisms to eliminate the reliance on periodic
   refresh and refresh timeout of RSVP messages, and enables a router
   to increase the message refresh interval to values much longer than
   the default 30 seconds defined in RFC 2205. However, the protocol
   extensions defined in RFC 4090 for supporting fast reroute (FRR)
   using bypass tunnels implicitly rely on short refresh timeouts to
   clean up stale states.

   In order to eliminate the reliance on refresh timeouts, the routers
   should unambiguously determine when a particular LSP state should be
   deleted. Coupling LSP state with the corresponding RSVP-TE signaling
   adjacencies as recommended in RSVP-TE Scaling Recommendations
   (draft-ietf-teas-rsvp-te-scaling-rec) will apply in scenarios other
   than RFC 4090 FRR using bypass tunnels. In scenarios involving RFC
   4090 FRR using bypass tunnels, additional explicit tear down
   messages are necessary. The Refresh-interval Independent RSVP FRR
   (RI-RSVP-FRR) extensions specified in this document consist of
   procedures to enable LSP state cleanup that are essential in
   scenarios not covered by procedures defined in RSVP-TE Scaling
   Recommendations. Hence, this document updates the semantics of
   Refresh-Interval Independent RSVP (RI-RSVP) capability specified in
   RSVP-TE Scaling Recommendations
   (draft-ietf-teas-rsvp-te-scaling-rec).

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [RFC2119].

Table of Contents

   1. Introduction
      1.1. Motivation
   2. Terminology
   3. Problem Description
   4. Solution Aspects
      4.1. Signaling Handshake between PLR and MP
         4.1.1. PLR Behavior
         4.1.2. Remote Signaling Adjacency
         4.1.3. MP Behavior
         4.1.4. "Remote" state on MP
      4.2. Impact of Failures on LSP State
         4.2.1. Non-MP Behavior
         4.2.2. LP-MP Behavior
         4.2.3. NP-MP Behavior
         4.2.4. Behavior of a Router that is both LP-MP and NP-MP
      4.3. Conditional Path Tear
         4.3.1. Sending Conditional Path Tear
         4.3.2. Processing Conditional Path Tear
         4.3.3. CONDITIONS object
      4.4. Remote State Teardown
         4.4.1. PLR Behavior on Local Repair Failure
         4.4.2. PLR Behavior on Resv RRO Change
         4.4.3. LSP Preemption during Local Repair
            4.4.3.1. Preemption on LP-MP after Phop Link failure
            4.4.3.2. Preemption on NP-MP after Phop Link failure
      4.5. Backward Compatibility Procedures
         4.5.1. Detecting Support for Refresh interval Independent FRR
         4.5.2. Procedures for backward compatibility
            4.5.2.1. Lack of support on Downstream Node
            4.5.2.2. Lack of support on Upstream Node
            4.5.2.3. Incremental Deployment
   5. Security Considerations
   6. IANA Considerations
      6.1. New Object - CONDITIONS
   7. Normative References
   8. Informative References
   9. Acknowledgments
   10. Contributors
   11. Authors' Addresses

1. Introduction

   RSVP-TE Fast Reroute [RFC4090] defines two local repair techniques
   to reroute label switched path (LSP) traffic over pre-established
   backup tunnels. The facility backup method allows one or more LSPs
   traversing a connected link or node to be protected using a bypass
   tunnel. The many-to-one nature of this local repair technique is
   attractive from a scalability point of view. This document
   enumerates the facility backup procedures in RFC 4090 that rely on
   refresh timeout and hence make the facility backup method
   refresh-interval dependent. The RSVP-TE extensions defined in this
   document enhance the facility backup protection mechanism by making
   the corresponding procedures refresh-interval independent.

1.1. Motivation

   Standard RSVP [RFC2205] maintains state via the generation of RSVP
   Path/Resv refresh messages. Refresh messages are used to both
   synchronize state between RSVP neighbors and to recover from lost
   RSVP messages. The use of Refresh messages to cover many possible
   failures has resulted in a number of operational problems.

   - One problem relates to RSVP control plane scaling due to periodic
     refreshes of Path and Resv messages, another relates to the
     reliability and latency of RSVP signaling.

   - An additional problem is the time to clean up the stale state
     after a tear message is lost. For more on these problems see
     Section 1 of RSVP Refresh Overhead Reduction Extensions
     [RFC2961].

   The problems listed above adversely affect RSVP control plane
   scalability, and RSVP-TE [RFC3209] inherited these problems from
   standard RSVP. Procedures specified in [RFC2961] address the above
   mentioned problems by eliminating dependency on refreshes for state
   synchronization and for recovering from lost RSVP messages, and by
   eliminating dependency on refresh timeout for stale state cleanup.
   Implementing these procedures allows implementations to improve
   RSVP-TE control plane scalability. For more details on eliminating
   dependency on refresh timeout for stale state cleanup, refer to
   the "Refresh Interval Independent RSVP" section in [TE-SCALE-REC].

   However, the procedures specified in [TE-SCALE-REC] do not fully
   address stale state cleanup for facility backup protection
   [RFC4090], as facility backup protection still depends on refresh
   timeouts for stale state cleanup.

   The procedures specified in this document, in combination with
   [TE-SCALE-REC], eliminate the dependency on refresh timeouts for
   stale state cleanup, including the cleanup for facility backup
   protection. The document hence updates the semantics of the
   Refresh-Interval Independent RSVP (RI-RSVP) capability specified
   in [TE-SCALE-REC].

   The procedures specified in this document assume reliable delivery
   of RSVP messages, as specified in [RFC2961]. Therefore this document
   makes support for [RFC2961] a pre-requisite.

2. Terminology

   The reader is assumed to be familiar with the terminology in
   [RFC2205], [RFC3209], [RFC4090] and [RFC4558].

   Phop node: Previous-hop router along the label switched path

   PPhop node: Previous-Previous-hop router along the LSP

   LP-MP node: Merge Point router at the tail of Link-protecting bypass
   tunnel

   NP-MP node: Merge Point router at the tail of Node-protecting bypass
   tunnel

   TED: Traffic Engineering Database

   LSP state: The combination of "path state" maintained as Path State
   Block (PSB) and "reservation state" maintained as Reservation State
   Block (RSB) forms an individual LSP state on an RSVP-TE speaker

   Conditional PathTear: PathTear message containing a suggestion to a
   receiving downstream router to retain Path state if the receiving
   router is NP-MP

   Remote PathTear: PathTear message sent from Point of Local Repair
   (PLR) to MP to delete LSP state on MP if PLR had not reliably sent
   backup Path state before

3. Problem Description

           E
          / \
         /   \
        /     \
       /       \
      /         \
     /           \
   A ----- B ----- C ----- D
            \             /
             \           /
              \         /
               \       /
                \     /
                 \   /
                   F

                     Figure 1: Example Topology

   In the topology in Figure 1, consider a large number of LSPs from A
   to D transiting B and C. Assume that the refresh interval has been
   configured to be long, on the order of minutes, and that refresh
   reduction extensions are enabled on all routers.

   Also assume that node protection has been configured for the LSPs
   and the LSPs are protected by each router in the following way:

   - A has made node protection available using bypass LSP A -> E ->
     C; A is the Point of Local Repair (PLR) and C is the Node
     Protecting Merge Point (NP-MP)

   - B has made node protection available using bypass LSP B -> F ->
     D; B is the PLR and D is the NP-MP

   - C has made link protection available using bypass LSP C -> B -> F
     -> D; C is the PLR and D is the Link Protecting Merge Point
     (LP-MP)

   In the above condition, assume that the B-C link fails. The
   following is the sequence of events that is expected to occur for
   all protected LSPs under normal conditions.

   1. B performs local repair and re-directs LSP traffic over the bypass
      LSP B -> F -> D.
   2. B also creates backup state for the LSP and triggers sending of
      backup LSP state to D over the bypass LSP B -> F -> D.
   3. D receives backup LSP states and merges the backups with the
      protected LSPs.
   4. As the link on C over which the LSP states are refreshed has
      failed, C will no longer receive state refreshes. Consequently the
      protected LSP states on C will time out and C will send a tear
      down message for all LSPs. As each router should consider itself
      as a Merge Point, C will time out the state only after waiting for
      an additional duration equal to the refresh timeout.

   While the above sequence of events has been described in [RFC4090],
   there are a few problems for which no mechanism has been specified
   explicitly.

   - If the protected LSP on C times out before D receives signaling
     for the backup LSP, then D would receive PathTear from C prior to
     receiving signaling for the backup LSP, thus resulting in deleting
     the LSP state. This would be possible at scale even with the
     default refresh time.

   - If upon the link failure C is to keep state until its timeout,
     then with a long refresh interval this may result in a large amount
     of stale state on C. Alternatively, if upon the link failure C is
     to delete the state and send PathTear to D, this would result in
     deleting the state on D, thus deleting the LSP. D needs a reliable
     mechanism to determine whether it is an MP or not to overcome this
     problem.

   - If head-end A attempts to tear down the LSP after step 1 but
     before step 2 of the above sequence, then B may receive the tear
     down message before step 2 and delete the LSP state from its state
     database. If B deletes its state without informing D, with a long
     refresh interval this could cause a (large) buildup of stale state
     on D.

   - If B fails to perform local repair in step 1, then B will delete
     the LSP state from its state database without informing D. As B
     deletes its state without informing D, with a long refresh interval
     this could cause a (large) buildup of stale state on D.

   The purpose of this document is to provide solutions to the above
   problems, which will then make it practical to scale up to a large
   number of protected LSPs in the network.

4. Solution Aspects

   The solution consists of five parts.

   - Utilize the MP determination mechanism specified in [SUMMARY-FRR]
     that enables the PLR to signal the availability of local
     protection to the MP. In addition, introduce PLR and MP procedures
     to establish a Node-ID based hello session between the PLR and the
     MP to detect router failures and to determine capability. See
     section 4.1 for more details. This part of the solution re-uses
     some of the extensions defined in [SUMMARY-FRR] and
     [TE-SCALE-REC], and the subsequent sub-sections will list the
     extensions in these drafts that are utilized in this document.

   - Handle upstream link or node failures by cleaning up LSP states
     if the node has not found itself to be an MP through the MP
     determination mechanism. See section 4.2 for more details.

   - Introduce extensions to enable a router to send a tear down message
     to the downstream router that enables the receiving router to
     conditionally delete its local LSP state. See section 4.3 for more
     details.

   - Enhance facility protection by allowing a PLR to directly send a
     tear down message to the MP without requiring the PLR to either
     have a working bypass LSP or have already signaled backup LSP
     state. See section 4.4 for more details.

   - Introduce extensions to enable the above procedures to be
     backward compatible with routers along the LSP path running
     implementations that do not support these procedures. See section
     4.5 for more details.

4.1. Signaling Handshake between PLR and MP

4.1.1. PLR Behavior

   As per the procedures specified in RFC 4090, when a protected LSP
   comes up and if the "local protection desired" flag is set in the
   SESSION_ATTRIBUTE object, each node along the LSP path attempts to
   make local protection available for the LSP.

   - If the "node protection desired" flag is set, then the node
     tries to become a PLR by attempting to create an NP-bypass LSP to
     the NNhop node avoiding the Nhop node on the protected LSP path.
     In case node protection could not be made available, the node
     attempts to create an LP-bypass LSP to the Nhop node avoiding only
     the link that the protected LSP takes to reach the Nhop.

   - If the "node protection desired" flag is not set, then the PLR
     attempts to create an LP-bypass LSP to the Nhop node avoiding the
     link that the protected LSP takes to reach the Nhop.

   With regard to the PLR procedures described above and that are
   specified in RFC 4090, this document specifies the following
   additional procedures.

   - While selecting the destination address of the bypass LSP, the
     PLR SHOULD attempt to select the router ID of the NNhop or Nhop
     node. If the PLR and the MP are in the same area, then the PLR may
     utilize the TED to determine the router ID from the interface
     address in RRO (if NodeID is not included in RRO). If the PLR and
     the MP are in different IGP areas, then the PLR SHOULD use the
     NodeID address of the NNhop MP if included in the RRO of RESV. If
     the NP-MP in a different area has not included NodeID in RRO, then
     the PLR SHOULD use the NP-MP's interface address present in the
     RRO. The PLR SHOULD use its router ID as the source address of the
     bypass LSP.

   - The PLR SHOULD also include its router ID in a NodeID sub-object
     in PATH RRO unless configured explicitly not to include NodeID.
     While including its router ID in the NodeID sub-object carried in
     the outgoing Path message, the PLR MUST include the NodeID
     sub-object after including its IPv4/IPv6 address or unnumbered
     interface ID sub-object.

   - In parallel to the attempt made to create the NP-bypass or
     LP-bypass, the PLR SHOULD initiate a Node-ID based Hello session
     to the NNhop or Nhop node respectively to establish the RSVP-TE
     signaling adjacency. This Hello session is used to detect MP node
     failure as well as to determine the capability of the MP node. If
     the MP sets the I-bit in the CAPABILITY object [TE-SCALE-REC]
     carried in the Hello message corresponding to the NodeID based
     Hello session, then the PLR SHOULD conclude that the MP supports
     the refresh-interval independent FRR procedures defined in this
     document.

   - If the bypass LSP comes up, then the PLR SHOULD include a Bypass
     Summary FRR Extended (B-SFRR) Association object and trigger a
     PATH message to be sent. If a B-SFRR Extended Association object
     is included in the PATH message, then the encoding and ordering
     rules for the B-SFRR Extended Association object specified in
     [SUMMARY-FRR] MUST be followed.

4.1.2. Remote Signaling Adjacency

   A NodeID based RSVP-TE Hello session is one in which NodeID is used
   in the source and destination address fields in RSVP Hello.
   [RFC4558] formalizes NodeID based Hello messages between two
   routers. This document extends the NodeID based RSVP Hello session
   to track the state of any RSVP-TE neighbor that is not directly
   connected by at least one interface. In order to apply a NodeID
   based RSVP-TE Hello session between any two routers that are not
   immediate neighbors, the router that supports the extensions defined
   in this document SHOULD set TTL to 255 in the NodeID based Hello
   messages exchanged between PLR and MP. The default hello interval
   for this NodeID hello session SHOULD be set to the default specified
   in [TE-SCALE-REC].

   In the rest of the document the term "signaling adjacency", or
   "remote signaling adjacency", refers specifically to the RSVP-TE
   signaling adjacency.

4.1.3. MP Behavior

   When the NNhop or the Nhop node receives the triggered PATH with a
   "matching" Bypass Summary FRR Extended Association object, the node
   should consider itself as the MP for the PLR IP address
   "corresponding" to the Bypass Summary FRR Extended Association
   object. The matching and ordering rules for Bypass Summary FRR
   Extended Association specified in [SUMMARY-FRR] MUST be followed by
   implementations supporting this document.

   In addition to the above procedures, the node SHOULD check the
   presence of a remote signaling adjacency with a Refresh-interval
   Independent RSVP (RI-RSVP) capable PLR. RI-RSVP capability is
   specified in [TE-SCALE-REC] and this document updates the semantics
   of RI-RSVP capability for RFC 4090 facility bypass FRR. If a
   matching Bypass Summary FRR Extended Association object is found in
   the PATH and if the RSVP-TE signaling adjacency is also present,
   then the node concludes that the PLR will undertake the
   refresh-interval independent FRR procedures specified in this
   document. If the PLR has included a NodeID sub-object in PATH RRO,
   then that NodeID is the remote neighbor address. Otherwise, the
   PLR's interface address in PATH RRO will be the remote neighbor
   address. To enable the MP to correctly match the bypass source
   address in the B-SFRR Extended Association object with the
   corresponding RSVP-TE Node-ID based signaling adjacency with the
   PLR, the bypass source address in the B-SFRR Extended Association
   object MUST either be equal to, or be tied to the same node on TED
   as, the PLR's address used for sending NodeID based Hello messages
   for maintaining the RSVP-TE signaling adjacency with the MP. It is
   recommended that the PLR and the MP include a NodeID sub-object in
   PATH and RESV RRO respectively, and that the PLR select its NodeID
   address as the source and the NodeID address of the MP as the
   destination address for the bypass LSP.

   - If a matching Bypass Summary FRR Extended Association object is
     included by the PPhop node and if a corresponding Node-ID
     signaling adjacency exists with the PPhop node, then the router
     SHOULD conclude it is NP-MP.

   - If a matching Bypass Summary FRR Extended Association object is
     included by the Phop node and if a corresponding Node-ID
     signaling adjacency exists with the Phop node, then the router
     SHOULD conclude it is LP-MP.

4.1.4. "Remote" state on MP

   Once a router concludes it is the MP for a PLR running
   refresh-interval independent FRR procedures, it SHOULD create a
   remote path state for the LSP. The "remote" state is identical to
   the protected LSP path state except for the difference in the
   RSVP_HOP object. The RSVP_HOP object in "remote" Path state contains
   the address that the PLR uses to send NodeID hello messages to the
   MP.

   The MP SHOULD consider the "remote" path state automatically deleted
   if:

   - MP later receives a PATH with no matching B-SFRR Extended
     Association object corresponding to the PLR's IP address contained
     in PATH RRO, or

   - Node signaling adjacency with PLR goes down, or

   - MP receives backup LSP signaling from PLR, or

   - MP receives PathTear, or

   - MP deletes the LSP state on local policy or exception event

   Unlike the normal path state that is either locally generated on the
   Ingress or created from a PATH message from the Phop node, the
   "remote" path state is not signaled explicitly from the PLR. The
   purpose of "remote" path state is to enable the PLR to explicitly
   tear down path and reservation states corresponding to the LSP by
   sending a tear message for the "remote" path state. Such a message
   tearing down "remote" path state is called "Remote" PathTear.

   The scenarios in which "Remote" PathTear is applied are described in
   Section 4.4 - Remote State Teardown.

4.2. Impact of Failures on LSP State

   This section describes the procedures for routers on the LSP path
   for different kinds of failures. The procedures described for
   detecting RSVP control plane adjacency failures do not impact the
   RSVP-TE graceful restart mechanisms ([RFC3473], [RFC5063]). If the
   router executing these procedures acts as a helper for a neighboring
   router, then the control plane adjacency will be declared as having
   failed after taking into account the grace period extended for the
   neighbor by the helper.

   Immediate node failures are detected from the state of NodeID hello
   sessions established with immediate neighbors. [TE-SCALE-REC]
   recommends that each router establish NodeID hello sessions with all
   its immediate neighbors. PLR or MP node failure is detected from the
   state of the remote signaling adjacency established according to
   Section 4.1.2 of this document.

4.2.1. Non-MP Behavior

   When a router detects Phop link or Phop node failure and the router
   is not an MP for the LSP, then it SHOULD send Conditional PathTear
   (refer to Section 4.3) and delete PSB and RSB states corresponding
   to the LSP.

4.2.2. LP-MP Behavior

   When the Phop link for an LSP fails on a router that is LP-MP for
   the LSP, the LP-MP SHOULD retain PSB and RSB states corresponding to
   the LSP till the occurrence of any of the following events.

   - Node-ID signaling adjacency with Phop PLR goes down, or

   - MP receives normal or "Remote" PathTear for PSB, or

   - MP receives ResvTear for RSB.

   When a router that is LP-MP for an LSP detects Phop node failure
   from Node-ID signaling adjacency state, the LP-MP SHOULD send normal
   PathTear and delete PSB and RSB states corresponding to the LSP.

4.2.3. NP-MP Behavior

   When a router that is NP-MP for an LSP detects Phop link failure, or
   Phop node failure from Node-ID signaling adjacency, the router
   SHOULD retain PSB and RSB states corresponding to the LSP till the
   occurrence of any of the following events.

   - Remote Node-ID signaling adjacency with PPhop PLR goes down, or

   - MP receives normal or "Remote" PathTear for PSB, or

   - MP receives ResvTear for RSB.

   When a router that is NP-MP does not detect Phop link or node
   failure, but receives Conditional PathTear from the Phop node, then
   the router SHOULD retain PSB and RSB states corresponding to the LSP
   till the occurrence of any of the following events.

   - Remote Node-ID signaling adjacency with PPhop PLR goes down, or

   - MP receives normal or "Remote" PathTear for PSB, or

   - MP receives ResvTear for RSB.
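
   The per-role reactions to a Phop failure in Sections 4.2.1 through
   4.2.3 can be summarized as a small decision function. The following
   Python sketch is illustrative only: the names Role, Failure, Action
   and on_phop_failure are hypothetical and do not come from this
   document or from any implementation. It does not model a router
   that is both LP-MP and NP-MP, nor the further conditions on sending
   Conditional PathTear.

```python
from enum import Enum, auto


class Role(Enum):
    """Role of the local router for one LSP (hypothetical names)."""
    NON_MP = auto()
    LP_MP = auto()
    NP_MP = auto()


class Failure(Enum):
    """Kind of upstream failure detected by the local router."""
    PHOP_LINK = auto()
    PHOP_NODE = auto()


class Action(Enum):
    """What the router does with the PSB and RSB of the LSP."""
    DELETE_AND_SEND_CONDITIONAL_PATHTEAR = auto()
    DELETE_AND_SEND_PATHTEAR = auto()
    RETAIN = auto()


def on_phop_failure(role: Role, failure: Failure) -> Action:
    """Map (role, failure) to the reaction described in 4.2.1-4.2.3."""
    if role is Role.NON_MP:
        # 4.2.1: a non-MP deletes state and sends Conditional PathTear.
        return Action.DELETE_AND_SEND_CONDITIONAL_PATHTEAR
    if role is Role.LP_MP and failure is Failure.PHOP_NODE:
        # 4.2.2: the Phop is the PLR itself, so nothing protects the LSP.
        return Action.DELETE_AND_SEND_PATHTEAR
    # 4.2.2 (Phop link failure) and 4.2.3 (either failure): retain state
    # until the adjacency with the PLR goes down, a normal or "Remote"
    # PathTear arrives for the PSB, or a ResvTear arrives for the RSB.
    return Action.RETAIN
```

   In this sketch an NP-MP retains state for both failure kinds because
   its PLR is the PPhop node, which is unaffected by a Phop failure.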
567  Receiving Conditional PathTear from the Phop node will not impact
568  the "remote" state from the PPhop PLR. Note that Phop node would
569  send Conditional PathTear if it was not an MP.

571  In the example topology in Figure 1, assume C & D are NP-MP for PLRs
572  A & B respectively. Now when A-B link fails, as B is not MP and its
573  Phop link has failed, B will delete LSP state (this behavior is
574  required for unprotected LSPs - Section 4.2.1). In the data plane,
575  that would require B to delete the label forwarding entry
576  corresponding to the LSP. So if B's downstream nodes C and D
577  continue to retain state, it would not be correct for D to continue
578  to assume itself as NP-MP for PLR B.

580  The mechanism that enables D to stop considering itself as the NP-MP
581  for B and delete the corresponding "remote" path state is given
582  below.

584  1. When C receives Conditional PathTear from B, it decides to
585     retain LSP state as it is NP-MP of PLR A. C also SHOULD check
586     whether Phop B had previously signaled availability of node
587     protection. As B had previously signaled NP availability by
588     including B-SFRR Extended Association object, C SHOULD remove
589     the B-SFRR Extended Association object containing Association
590     Source set to B from the PATH message and trigger PATH to D.
591  2. When D receives triggered PATH, it realizes that it is no
592     longer the NP-MP for B and so it deletes the corresponding
593     "remote" path state. D does not propagate PATH further down
594     because the only change is that the B-SFRR Extended Association
595     object corresponding to Association Source B is no longer
596     present in the PATH message.

597  4.2.4. Behavior of a Router that is both LP-MP and NP-MP

599  A router may be both LP-MP as well as NP-MP at the same time for
600  Phop and PPhop nodes respectively of an LSP. If Phop link fails on
601  such a node, the node SHOULD retain PSB and RSB states corresponding
602  to the LSP till the occurrence of any of the following events.

604  - Both Node-ID signaling adjacencies with Phop and PPhop nodes go
605    down, or

607  - MP receives normal or "Remote" PathTear for PSB, or

609  - MP receives ResvTear for RSB.

611  If a router that is both LP-MP and NP-MP detects Phop node failure,
612  then the node SHOULD retain PSB and RSB states corresponding to the
613  LSP till the occurrence of any of the following events.

615  - Remote Node-ID signaling adjacency with PPhop PLR goes down, or

617  - MP receives normal or "Remote" PathTear for PSB, or

619  - MP receives ResvTear for RSB.

621  4.3. Conditional Path Tear

623  In the example provided in Section 4.2.5 "NP-MP Behavior on PLR
624  link failure", B deletes PSB and RSB states corresponding to the LSP
625  once B detects its link to Phop went down as B is not MP. If B were
626  to send PathTear normally, then C would delete LSP state
627  immediately. In order to avoid this, there should be some mechanism
628  by which B can indicate to C that B does not require the receiving
629  node to unconditionally delete the LSP state immediately. For this,
630  B SHOULD add a new optional object called the CONDITIONS object in
631  PathTear. The new optional object is defined in Section 4.3.3. If
632  node C also understands the new object, then C SHOULD delete LSP
633  state only if it is not an NP-MP - in other words C SHOULD delete
634  LSP state if there is no "remote" PLR path state on C.

636  4.3.1. Sending Conditional Path Tear

638  A router that is not an MP for an LSP SHOULD delete PSB and RSB
639  states corresponding to the LSP if Phop link or Phop Node-ID
640  signaling adjacency goes down (Section 4.2.1). The router SHOULD
641  send Conditional PathTear if the following are also true.

643  - Ingress has requested node protection for the LSP, and

645  - PathTear is not received from the upstream node

647  4.3.2. Processing Conditional Path Tear

649  When a router that is not an NP-MP receives Conditional PathTear,
650  the node SHOULD delete PSB and RSB states corresponding to the LSP,
651  and process Conditional PathTear by considering it as normal
652  PathTear. Specifically, the node SHOULD NOT propagate Conditional
653  PathTear downstream but remove the optional object and send normal
654  PathTear downstream.

656  When a node that is an NP-MP receives Conditional PathTear, it
657  SHOULD NOT delete LSP state. The node SHOULD check whether the Phop
658  node had previously included B-SFRR Extended Association object in
659  PATH. If the object had been included previously by the Phop, then
660  the node processing Conditional PathTear from the Phop SHOULD remove
661  the corresponding object and trigger PATH downstream.

663  If Conditional PathTear is received from a neighbor that has not
664  advertised support (refer to Section 4.5) for the new procedures
665  defined in this document, then the node SHOULD consider the message
666  as normal PathTear. The node SHOULD propagate normal PathTear
667  downstream and delete the LSP state.

669  4.3.3. CONDITIONS object

671  As any implementation that does not support Conditional PathTear
672  SHOULD ignore the new object but process the message as normal
673  PathTear without generating any error, the Class-Num of the new
674  object SHOULD be 10bbbbbb where 'b' represents a bit (from Section
675  3.10 of [RFC2205]).

677  The new object is called the "CONDITIONS" object; it specifies
678  the conditions under which default processing rules of the RSVP-TE
679  message SHOULD be invoked.
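
The receive-side rules of Section 4.3.2, together with the Class-Num convention cited from Section 3.10 of [RFC2205], can be sketched roughly as follows. This is only an illustrative sketch, not part of the specification: the names `Action`, `Receiver`, `unknown_class_num_handling` and `process_conditional_pathtear` are hypothetical, and the `m_bit` parameter models the M bit of the CONDITIONS object defined in this section.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    """Hypothetical labels for the actions named in Section 4.3.2."""
    DELETE_STATE = auto()                # delete PSB and RSB for the LSP
    SEND_NORMAL_PATHTEAR = auto()        # PathTear without CONDITIONS, downstream
    RETAIN_STATE = auto()                # keep PSB and RSB
    TRIGGER_PATH_WITHOUT_BSFRR = auto()  # drop Phop's B-SFRR object, re-send PATH


@dataclass(frozen=True)
class Receiver:
    """Assumed view of the node that receives the Conditional PathTear."""
    is_np_mp: bool                     # "remote" PLR path state exists
    sender_supports_ri_rsvp_frr: bool  # neighbor advertised support (Section 4.5)
    phop_signaled_bsfrr: bool          # Phop had included B-SFRR in PATH


def unknown_class_num_handling(class_num: int) -> str:
    """How a node treats an object whose Class-Num it does not recognize,
    based on the high-order bits (RFC 2205, Section 3.10)."""
    if not 0 <= class_num <= 255:
        raise ValueError("Class-Num is an 8-bit value")
    if (class_num & 0x80) == 0:
        return "reject"   # 0bbbbbbb: reject the message with an error
    if (class_num & 0xC0) == 0x80:
        return "ignore"   # 10bbbbbb: ignore the object, no error, no forwarding
    return "forward"      # 11bbbbbb: ignore the object but forward it


def process_conditional_pathtear(rx: Receiver, m_bit: bool = True) -> list:
    """Return the actions a receiver takes for a PathTear carrying CONDITIONS."""
    tear_down = [Action.DELETE_STATE, Action.SEND_NORMAL_PATHTEAR]
    # M bit clear, or sender never advertised support: normal PathTear.
    if not m_bit or not rx.sender_supports_ri_rsvp_frr:
        return tear_down
    # A node that is not an NP-MP strips CONDITIONS and tears down normally.
    if not rx.is_np_mp:
        return tear_down
    # An NP-MP retains state and, if needed, withdraws the Phop's B-SFRR object.
    actions = [Action.RETAIN_STATE]
    if rx.phop_signaled_bsfrr:
        actions.append(Action.TRIGGER_PATH_WITHOUT_BSFRR)
    return actions
```

For instance, node C in Figure 1 (NP-MP for PLR A, with Phop B having signaled B-SFRR) would obtain `[Action.RETAIN_STATE, Action.TRIGGER_PATH_WITHOUT_BSFRR]`, while any Class-Num in the 128-183 range requested in Section 6.1 maps to "ignore" on a node that does not implement the object.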
681  The object has the following format:

683  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
684  |            Length             |     Class     |     C-type    |
685  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
686  |                          Reserved                           |M|
687  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

689  Length

691  This contains the size of the object in bytes and should be set to
692  eight.

694  Class

696  To be assigned

698  C-type

700  1

702  M bit

704  If M-bit is set to 1, then the PathTear message SHOULD be processed
705  based on the condition if the receiver router is a Merge Point or
706  not.

708  If M-bit is set to 0, then the PathTear message SHOULD be processed
709  as normal PathTear message.

711  4.4. Remote State Teardown

713  If the Ingress wants to tear down the LSP because of a management
714  event while the LSP is being locally repaired at a transit PLR, it
715  would not be desirable to wait till the completion of backup LSP
716  signaling to perform state cleanup. To enable LSP state cleanup when
717  the LSP is being locally repaired, the PLR SHOULD send "remote"
718  PathTear message instructing the MP to delete PSB and RSB states
719  corresponding to the LSP. The TTL in "remote" PathTear message
720  SHOULD be set to 255.

722  Consider that node C in the example topology (Figure 1) has gone
723  down and B locally repairs the LSP.

725  1. Ingress A receives a management event to tear down the LSP.
726  2. A sends normal PathTear to B.
727  3. Assume B has not initiated backup signaling for the LSP. To enable
728     LSP state cleanup, B SHOULD send "remote" PathTear with
729     destination IP address set to that of D used in Node-ID signaling
730     adjacency with D, and RSVP_HOP object containing local address
731     used in Node-ID signaling adjacency.
732  4. B then deletes PSB and RSB states corresponding to the LSP.
733  5. On D there would be a remote signaling adjacency with B and so D
734     SHOULD accept the remote PathTear and delete PSB and RSB states
735     corresponding to the LSP.

736  4.4.1. PLR Behavior on Local Repair Failure

738  If local repair fails on the PLR after a failure, then this should
739  be considered as a case for cleaning up LSP state from PLR to the
740  Egress. PLR would achieve this using "remote" PathTear to clean up
741  state from MP. If MP has retained state, then it would propagate
742  PathTear downstream thereby achieving state cleanup. Note that in
743  the case of link protection, the PathTear would be directed to LP-MP
744  node IP address rather than the Nhop interface address.

746  4.4.2. PLR Behavior on Resv RRO Change

748  When a router that has already made NP available detects a change in
749  the RRO carried in RESV message, and if the RRO change indicates
750  that the router's former NP-MP is no longer present in the LSP path,
751  then the router SHOULD send "Remote" PathTear directly to its former
752  NP-MP.

754  In the example topology in Figure 1, assume A has made node
755  protection available and C has concluded it is the NP-MP for A. When
756  the B-C link fails then C, implementing the procedure specified in
757  Section 4.2.4 of this document, will retain state till: remote
758  NodeID signaling adjacency with A goes down, or PathTear or ResvTear
759  is received for PSB or RSB respectively. If B also has made node
760  protection available, B will eventually complete backup LSP
761  signaling with its NP-MP D and trigger RESV to A with RRO changed.
762  The new RRO of the LSP carried in RESV will not contain C. When A
763  processes the RESV with a new RRO not containing C - its former NP-
764  MP, A SHOULD send "Remote" PathTear to C. When C receives a "Remote"
765  PathTear for its PSB state, C will send normal PathTear downstream
766  to D and delete both PSB and RSB states corresponding to the LSP. As
767  D has already received backup LSP signaling from B, D will retain
768  control plane and forwarding states corresponding to the LSP.

770  4.4.3. LSP Preemption during Local Repair

772  4.4.3.1. Preemption on LP-MP after Phop Link failure

774  If an LSP is preempted on LP-MP after its Phop or incoming link has
775  already failed but the backup LSP has not been signaled yet, then
776  the node SHOULD send normal PathTear and delete both PSB and RSB
777  states corresponding to the LSP. As the LP-MP has retained LSP state
778  expecting the PLR to perform backup LSP signaling, preemption would
779  bring down the LSP and the node would not be LP-MP any more
780  requiring the node to clean up LSP state.

782  4.4.3.2. Preemption on NP-MP after Phop Link failure

784  If an LSP is preempted on NP-MP after its Phop link has already
785  failed but the backup LSP has not been signaled yet, then the node
786  SHOULD send normal PathTear and delete PSB and RSB states
787  corresponding to the LSP. As the NP-MP has retained LSP state
788  expecting the PLR to perform backup LSP signaling, preemption would
789  bring down the LSP and the node would not be NP-MP any more
790  requiring the node to clean up LSP state.

792  Consider that the B-C link goes down in the same example topology
793  (Figure 1). As C is NP-MP for PLR A, C will retain LSP state.

795  1. The LSP is preempted on C.
796  2. C will delete RSB state corresponding to the LSP. But C cannot
797     send PathErr or ResvTear to PLR A because backup LSP has not
798     been signaled yet.
799  3. As the only reason for C having retained state after Phop node
800     failure was that it was NP-MP, C SHOULD send normal PathTear to
801     D and delete PSB state also. D would also delete PSB and RSB
802     states on receiving PathTear from C.
803  4. B starts backup LSP signaling to D. But as D does not have the
804     LSP state, it will reject backup LSP PATH and send PathErr to B.
805  5. B will delete its reservation and send ResvTear to A.

806  4.5. Backward Compatibility Procedures

808  The "Refresh interval Independent FRR" or RI-RSVP-FRR referred below
809  in this section refers to the changes that have been proposed in
810  previous sections. Any implementation that does not support them has
811  been termed as "non-RI-RSVP-FRR implementation". The extensions
812  proposed in [SUMMARY-FRR] are applicable to implementations that do
813  not support RI-RSVP-FRR. On the other hand, changes proposed
814  relating to LSP state cleanup namely Conditional and remote PathTear
815  require support from one-hop and two-hop neighboring nodes along the
816  LSP path. So procedures that fall under LSP state cleanup category
817  SHOULD be turned on only if all nodes involved in the node
818  protection FRR i.e. PLR, MP and intermediate node in the case of NP,
819  support the extensions. Note that for LSPs requesting only link
820  protection, the PLR and the LP-MP should support the extensions.

822  4.5.1. Detecting Support for Refresh interval Independent FRR

824  An implementation supporting the extensions specified in previous
825  sections (called RI-RSVP-FRR hereafter) SHOULD set the flag
826  "Refresh interval Independent RSVP" or RI-RSVP in CAPABILITY object
827  carried in Hello messages. The RI-RSVP flag is specified in [TE-
828  SCALE-REC].

830  - As nodes supporting the extensions SHOULD initiate Node Hellos
831    with adjacent nodes, a node on the path of protected LSP can
832    determine whether its Phop or Nhop neighbor supports RI-RSVP-FRR
833    enhancements from the Hello messages sent by the neighbor.

835  - If a node attempts to make node protection available, then the
836    PLR SHOULD initiate remote Node-ID signaling adjacency with NNhop.
837    If the NNhop (a) does not reply to remote node Hello message or
838    (b) does not set RI-RSVP flag in CAPABILITY object carried in its
839    Node-ID Hello messages, then the PLR can conclude that NNhop does
840    not support RI-RSVP-FRR extensions.
842  - If node protection is requested for an LSP and if (a) PPhop node
843    has not included a matching B-SFRR Extended Association object in
844    PATH or (b) PPhop node has not initiated remote node Hello
845    messages or (c) PPhop node does not set RI-RSVP flag in CAPABILITY
846    object carried in its Node-ID Hello messages, then the node SHOULD
847    conclude that the PLR does not support RI-RSVP-FRR extensions. The
848    details are described in the "Procedures for backward
849    compatibility" section below.

851  4.5.2. Procedures for backward compatibility

853  The procedures defined hereafter are performed on a subset of LSPs
854  that traverse a node, rather than on all LSPs that traverse a node.
855  This behavior is required to support backward compatibility for a
856  subset of LSPs traversing nodes running non-RI-RSVP-FRR
857  implementations.

859  4.5.2.1. Lack of support on Downstream Node

861  - If the Nhop does not support the RI-RSVP-FRR extensions, then the
862    node SHOULD reduce the "refresh period" in TIME_VALUES object
863    carried in PATH to a short refresh default value.

865  - If node protection is requested and the NNhop node does not
866    support the enhancements, then the node SHOULD reduce the "refresh
867    period" in TIME_VALUES object carried in PATH to a short refresh
868    default value.

870  If the node reduces the refresh time from the above procedures, it
871  SHOULD also not send remote PathTear or Conditional PathTear
872  messages.

874  Consider the example topology in Figure 1. If C does not support the
875  RI-RSVP-FRR extensions, then:

877  - A and B SHOULD reduce the refresh time to the default value of 30
878    seconds and trigger PATH

880  - If B is not an MP and if Phop link of B fails, B cannot send
881    Conditional PathTear to C but SHOULD time out PSB state from A
882    normally. This would be accomplished if A would also reduce the
883    refresh time to the default value. So if C does not support the RI-
884    RSVP-FRR extensions, then Phop B and PPhop A SHOULD reduce refresh
885    time to a small default value.

887  4.5.2.2. Lack of support on Upstream Node

889  - If Phop node does not support the RI-RSVP-FRR extensions, then
890    the node SHOULD reduce the "refresh period" in TIME_VALUES object
891    carried in RESV to a default short refresh time value.

893  - If node protection is requested and the Phop node does not
894    support the RI-RSVP-FRR extensions, then the node SHOULD reduce
895    the "refresh period" in TIME_VALUES object carried in PATH to
896    the default value.

898  - If node protection is requested and PPhop node does not support
899    the RI-RSVP-FRR extensions, then the node SHOULD reduce the
900    "refresh period" in TIME_VALUES object carried in RESV to the
901    default value.

903  - If the node reduces the refresh time from the above procedures,
904    it SHOULD also not execute MP procedures specified in Section 4.2
905    of this document.

907  4.5.2.3. Incremental Deployment

909  The backward compatibility procedures described in the previous sub-
910  sections imply that a router supporting the RI-RSVP-FRR extensions
911  specified in this document can apply the procedures specified in the
912  document either in the downstream or upstream direction of an LSP,
913  depending on the capability of the routers downstream or upstream in
914  the LSP path.
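
The per-direction decision described in this section on incremental deployment can be sketched roughly as follows. It is a minimal illustration under stated assumptions: the function names and the boolean inputs (whether node protection is requested, and whether each neighbor has been found to support the extensions per Section 4.5.1) are hypothetical and not defined by this document.

```python
def ri_rsvp_frr_enabled(node_protection_requested: bool,
                        one_hop_supports: bool,
                        two_hop_supports: bool) -> bool:
    """Decide whether the extensions apply in one direction of an LSP.

    Link protection only needs the adjacent node (Nhop or Phop) to
    support the extensions; node protection also needs the node two
    hops away (NNhop or PPhop).
    """
    if node_protection_requested:
        return one_hop_supports and two_hop_supports
    return one_hop_supports


def enabled_directions(node_protection_requested: bool,
                       nhop: bool, nnhop: bool,
                       phop: bool, pphop: bool) -> dict:
    """Evaluate the downstream and upstream directions independently."""
    return {
        # Governs Path, PathTear and ResvErr messages.
        "downstream": ri_rsvp_frr_enabled(node_protection_requested, nhop, nnhop),
        # Governs PathErr, Resv and ResvTear messages.
        "upstream": ri_rsvp_frr_enabled(node_protection_requested, phop, pphop),
    }
```

A router at the edge of an upgraded region would, under these assumptions, report the extensions as enabled only toward the side on which both its one-hop and two-hop neighbors are upgraded when node protection is requested.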
916  - RI-RSVP-FRR extensions and procedures are enabled for downstream
917    Path, PathTear and ResvErr messages corresponding to an LSP if
918    link protection is requested for the LSP and the Nhop node
919    supports the extensions

921  - RI-RSVP-FRR extensions and procedures are enabled for downstream
922    Path, PathTear and ResvErr messages corresponding to an LSP if
923    node protection is requested for the LSP and both Nhop & NNhop
924    nodes support the extensions

926  - RI-RSVP-FRR extensions and procedures are enabled for upstream
927    PathErr, Resv and ResvTear messages corresponding to an LSP if
928    link protection is requested for the LSP and the Phop node
929    supports the extensions

931  - RI-RSVP-FRR extensions and procedures are enabled for upstream
932    PathErr, Resv and ResvTear messages corresponding to an LSP if
933    node protection is requested for the LSP and both Phop and PPhop
934    nodes support the extensions

936  For example, if an implementation supporting the RI-RSVP-FRR
937  extensions specified in this document is deployed on all routers in
938  a particular region of the network and if all the LSPs in the network
939  request node protection, then the FRR extensions will only be
940  applied for the LSP segments that traverse the particular region.
941  This will aid incremental deployment of these extensions and also
942  allow reaping the benefits of the extensions in portions of the
943  network where they are supported.

945  5. Security Considerations

947  The security considerations pertaining to [RFC2205], [RFC3209] and
948  [RFC5920] remain relevant.

950  This document extends the applicability of Node-ID based Hello
951  session between immediate neighbors. The Node-ID based Hello session
952  between PLR and NP-MP may require the two routers to exchange Hello
953  messages with a non-immediate neighbor. So, the implementations SHOULD
954  provide the option to configure Node-ID neighbor specific or global
955  authentication key to authenticate messages received from Node-ID
956  neighbors. The network administrator MAY utilize this option to
957  enable RSVP-TE routers to authenticate Node-ID Hello messages
958  received with TTL greater than 1. Implementations SHOULD also
959  provide the option to specify a limit on the number of Node-ID based
960  Hello sessions that can be established on a router supporting the
961  extensions defined in this document.

963  6. IANA Considerations

965  6.1. New Object - CONDITIONS

967  RSVP Change Guidelines [RFC3936] defines the Class-Number name space
968  for RSVP objects. The name space is managed by IANA.

970  IANA registry: RSVP Parameters
971  Subsection: Class Names, Class Numbers, and Class Types

973  A new RSVP object using a Class-Number from 128-183 range called the
974  "CONDITIONS" object is defined in Section 4.3 of this document. The
975  Class-Number from 128-183 range will be allocated by IANA.

977  7. Normative References

979  [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
980  Requirement Levels", BCP 14, RFC 2119, March 1997.

982  [RFC3209] Awduche, D., "RSVP-TE: Extensions to RSVP for LSP
983  Tunnels", RFC 3209, December 2001.

985  [RFC4090] Pan, P., "Fast Reroute Extensions to RSVP-TE for LSP
986  Tunnels", RFC 4090, May 2005.

988  [RFC2961] Berger, L., "RSVP Refresh Overhead Reduction Extensions",
989  RFC 2961, April 2001.

991  [RFC2205] Braden, R., "Resource Reservation Protocol (RSVP)", RFC
992  2205, September 1997.

994  [RFC4558] Ali, Z., "Node-ID Based Resource Reservation (RSVP) Hello:
995  A Clarification Statement", RFC 4558, June 2006.

997  [RFC3473] Berger, L., "Generalized Multi-Protocol Label Switching
998  Signaling Resource Reservation Protocol-Traffic Engineering
999  Extensions", RFC 3473, January 2003.
1001  [RFC5063] Satyanarayana, A., "Extensions to GMPLS Resource
1002  Reservation Protocol Graceful Restart", RFC 5063, October
1003  2007.

1005  [RFC3936] Kompella, K. and J. Lang, "Procedures for Modifying the
1006  Resource reSerVation Protocol (RSVP)", BCP 96, RFC 3936,
1007  October 2004.

1009  [TE-SCALE-REC] Vishnu Pavan Beeram et al., "Implementation
1010  Recommendations to improve scalability of RSVP-TE
1011  Deployments", draft-ietf-teas-rsvp-te-scaling-rec (work in
1012  progress)

1014  [SUMMARY-FRR] Mike Tallion et al., "RSVP-TE Summary Fast Reroute
1015  Extensions for LSP Tunnels", draft-mtaillon-mpls-summary-
1016  frr-rsvpte (work in progress)

1018  8. Informative References

1020  [RFC5439] Yasukawa, S., "An Analysis of Scaling Issues in MPLS-TE
1021  Core Networks", RFC 5439, February 2009.

1023  [RFC5920] Fang, L., "Security Framework for MPLS and GMPLS
1024  Networks", RFC 5920, July 2010.

1026  9. Acknowledgments

1028  We are very grateful to Yakov Rekhter for his contributions to the
1029  development of the idea and thorough review of content of the draft.
1030  Thanks to Raveendra Torvi and Yimin Shen for their comments and
1031  inputs.

1033  10. Contributors

1035  Markus Jork
1036  Juniper Networks
1037  Email: mjork@juniper.net

1039  Harish Sitaraman
1040  Juniper Networks
1041  Email: hsitaraman@juniper.net

1043  Vishnu Pavan Beeram
1044  Juniper Networks
1045  Email: vbeeram@juniper.net

1047  Ebben Aries
1048  Juniper Networks
1049  Email: exa@juniper.net

1051  Mike Tallion
1052  Cisco Systems Inc.
1053  Email: mtallion@cisco.com

1055  11. Authors' Addresses

1057  Chandra Ramachandran
1058  Juniper Networks
1059  Email: csekar@juniper.net

1061  Ina Minei
1062  Google, Inc
1063  Email: inaminei@google.com

1065  Dante Pacella
1066  Verizon
1067  Email: dante.j.pacella@verizon.com

1068  Tarek Saad
1069  Cisco Systems Inc.
1070  Email: tsaad@cisco.com