Network Working Group                          Chandra Ramachandran (Ed)
Internet Draft                                             Yakov Rekhter
Intended status: Standards Track                        Juniper Networks
                                                               Ina Minei
                                                             Google, Inc
                                                             Ebben Aries
                                                                Facebook
                                                           Dante Pacella
                                                                 Verizon

Expires: September 06, 2015                               March 06, 2015

          Refresh Interval Independent FRR Facility Protection
               draft-chandra-mpls-enhanced-frr-bypass-01

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on September 06, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include
   Simplified BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   This document defines RSVP-TE extensions to facilitate
   refresh-interval independent FRR facility protection.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [RFC2119].

Table of Contents

   1. Introduction
   2. Motivation
   3. Problem Description
   4. Solution Aspects
      4.1. Signaling Protection availability in Path RRO Flags
         4.1.1. PLR Behavior
         4.1.2. Remote Signaling Adjacency
         4.1.3. PATH RRO flags Propagation
         4.1.4. MP Behavior
         4.1.5. "Remote" state on MP
      4.2. Impact of Failures on LSP State
         4.2.1. Non-MP Behavior on Phop Link/Node Failure
         4.2.2. LP-MP Behavior on Phop Link Failure
         4.2.3. LP-MP Behavior on Phop Node Failure
         4.2.4. NP-MP Behavior on Phop Link/Node Failure
         4.2.5. NP-MP Behavior on PLR Link Failure
         4.2.6. Phop Link Failure on a Node that is LP-MP and NP-MP
         4.2.7. Phop Node Failure on Node that is LP-MP and NP-MP
      4.3. Conditional Path Tear
         4.3.1. Sending Conditional Path Tear
         4.3.2. Processing Conditional Path Tear
         4.3.3. CONDITIONS object
      4.4. Remote State Teardown
         4.4.1. PLR Behavior on Local Repair Failure
         4.4.2. PLR Behavior on Resv RRO Change
         4.4.3. LSP Preemption during Local Repair
            4.4.3.1. Preemption on LP-MP after Phop Link failure
            4.4.3.2. Preemption on NP-MP after Phop Link failure
      4.5. Backward Compatibility Procedures
         4.5.1. Detecting Support for Enhanced FRR Facility Protection
         4.5.2. Procedures for backward compatibility
            4.5.2.1. Lack of support on Downstream Node
            4.5.2.2. Lack of support on Upstream Node
            4.5.2.3. Incremental Deployment
   5. Security Considerations
   6. IANA Considerations
      6.1. New Object - CONDITIONS
      6.2. New CAPABILITY Object value
   7. Normative References
   8. Acknowledgments
   9. Authors' Addresses

1. Introduction

   The facility backup protection mechanism is one of two methods
   discussed in [RFC4090] for enabling the fast reroute of traffic onto
   backup LSP tunnels in tens of milliseconds in the event of a
   failure. This document discusses a few shortcomings of some of the
   refresh-interval reliant procedures proposed for this method in
   [RFC4090]. These shortcomings come to the fore under scaled
   conditions and become even more pronounced when large RSVP-TE
   refresh intervals are used.
   The RSVP-TE extensions defined in this document will enhance the
   facility backup protection mechanism by making the corresponding
   procedures refresh-interval independent.

2. Motivation

   Standard RSVP [RFC2205] maintains state via the generation of RSVP
   Path/Resv refresh messages. Refresh messages are used both to
   synchronize state between RSVP neighbors and to recover from lost
   RSVP messages. The use of Refresh messages to cover many possible
   failures has resulted in a number of operational problems. One
   problem relates to RSVP control plane scaling due to periodic
   refreshes of Path and Resv messages; another relates to the
   reliability and latency of RSVP signaling. An additional problem is
   the time to clean up the stale state after a tear message is lost.
   For more on these problems see Section 1 of [RFC2961]. All these
   problems adversely affect RSVP control plane scalability. RSVP-TE
   inherited all these problems from standard RSVP.

   Procedures specified in [RFC2961] address the above mentioned
   problems by eliminating dependency on refreshes for state
   synchronization and for recovering from lost RSVP messages, and by
   eliminating dependency on refresh timeout for stale state cleanup.
   Implementing these procedures improves RSVP-TE control plane
   scalability.

   However, the procedures specified in [RFC2961] do not fully address
   stale state cleanup for facility backup protection [RFC4090], as
   facility backup protection still depends on refresh timeouts for
   stale state cleanup. Thus [RFC2961] is insufficient to address the
   problem of stale state cleanup when facility backup protection is
   used.

   The procedures specified in this document, in combination with
   [RFC2961], eliminate facility backup protection dependency on
   refresh timeouts for stale state cleanup.
   These procedures, in combination with [RFC2961], fully address the
   above mentioned problem of RSVP-TE stale state cleanup, including
   the cleanup for facility backup protection.

   The procedures specified in this document assume reliable delivery
   of RSVP messages, as specified in [RFC2961]. Therefore this document
   makes support for [RFC2961] a prerequisite.

3. Problem Description

                 [E]
                /   \
               /     \
              /       \
             /         \
            /           \
           /             \
         [A]-----[B]-----[C]-----[D]
                   \             /
                    \           /
                     \         /
                      \       /
                       \     /
                        \   /
                         [F]

                     Figure 1: Example Topology

   In the topology illustrated in Figure 1, consider a large number of
   LSPs from A to D transiting B and C. Assume that the refresh
   interval has been configured to be large, on the order of minutes,
   and that refresh reduction extensions are enabled on all routers.

   Also assume that node protection has been configured for the LSPs
   and the LSPs are protected by each router in the following way:

   - A has made node protection available using bypass LSP A -> E ->
     C; A is the Point of Local Repair (PLR) and C is the Node
     Protecting Merge Point (NP-MP)

   - B has made node protection available using bypass LSP B -> F ->
     D; B is the PLR and D is the NP-MP

   - C has made link protection available using bypass LSP C -> B -> F
     -> D; C is the PLR and D is the Link Protecting Merge Point
     (LP-MP)

   In the above condition, assume that the B-C link fails. The
   following is the sequence of events that is expected to occur for
   all protected LSPs under normal conditions.

   1. B performs local repair and re-directs LSP traffic over the
      bypass LSP B -> F -> D.

   2. B also creates backup state for the LSP and triggers sending of
      backup LSP state to D over the bypass LSP B -> F -> D.

   3. D receives backup LSP states and merges the backups with the
      protected LSPs.

   4. As the link on C over which the LSP states are refreshed has
      failed, C will no longer receive state refreshes. Consequently
      the protected LSP states on C will time out and C will send a
      tear down message for all LSPs.

   While the above sequence of events has been described in [RFC4090],
   there are a few problems for which no mechanism has been specified
   explicitly.

   - If the protected LSP on C times out before D receives signaling
     for the backup LSP, then D would receive PathTear from C prior to
     receiving signaling for the backup LSP, thus resulting in deleting
     the LSP state. This would be possible at scale even with the
     default refresh time.

   - If upon the link failure C is to keep state until its timeout,
     then with a long refresh interval this may result in a large
     amount of stale state on C. Alternatively, if upon the link
     failure C is to delete the state and send PathTear to D, this
     would result in deleting the state on D, thus deleting the LSP. D
     needs a reliable mechanism to determine whether it is MP or not to
     overcome this problem.

   - If head-end A attempts to tear down the LSP after step 1 but
     before step 2 of the above sequence, then B may receive the tear
     down message before step 2 and delete the LSP state from its state
     database. If B deletes its state without informing D, with a long
     refresh interval this could cause a (large) buildup of stale state
     on D.

   - If B fails to perform local repair in step 1, then B will delete
     the LSP state from its state database without informing D. As B
     deletes its state without informing D, with a long refresh
     interval this could cause a (large) buildup of stale state on D.

   The purpose of this document is to provide solutions to the above
   problems, which will then make it practical to scale up to a large
   number of protected LSPs in the network.

4. Solution Aspects

   The solution consists of five parts.
   - Enhance the facility protection method defined in [RFC4090] by
     introducing an MP determination mechanism that enables the PLR to
     signal availability of link or node protection to the MP. See
     section 4.1 for more details.

   - Handle upstream link or node failures by cleaning up LSP states
     if the node has not found itself to be MP through the MP
     determination mechanism. See section 4.2 for more details.

     The combination of "path state" maintained as Path State Block
     (PSB) and "reservation state" maintained as Reservation State
     Block (RSB) forms an individual LSP state on an RSVP-TE speaker.

   - Introduce extensions to enable a router to send a tear down
     message to the downstream router that enables the receiving router
     to conditionally delete its local state. See section 4.3 for more
     details.

   - Enhance facility protection by allowing a PLR to directly send a
     tear down message to the MP without requiring the PLR to either
     have a working bypass LSP or have already signaled backup LSP
     state. See section 4.4 for more details.

   - Introduce extensions to enable the above procedures to be
     backward compatible with routers along the LSP path running
     implementations that do not support these procedures. See section
     4.5 for more details.

4.1. Signaling Protection availability in Path RRO Flags

   This section specifies a mechanism to allow the PLR to inform the MP
   if local protection is available. This mechanism relies on a
   combination of rules around the propagation of RRO flags carried in
   PATH messages (Section 4.1.3) and a targeted Node-ID Hello session
   (Section 4.1.2).

4.1.1. PLR Behavior

   As per the procedures specified in RFC 4090, when a protected LSP
   comes up and if the "local protection desired" flag is set in the
   SESSION_ATTRIBUTE object, each node along the LSP path attempts to
   make local protection available for the LSP.
   - If the "node protection desired" flag is set, then the node tries
     to become a PLR by attempting to create an NP-bypass LSP to the
     NNhop node avoiding the Nhop node on the protected LSP path. If
     node protection cannot be made available after some timeout, the
     node attempts to create an LP-bypass LSP to the Nhop node avoiding
     only the link that the protected LSP takes to reach Nhop

   - If the "node protection desired" flag is not set, then the PLR
     attempts to create an LP-bypass LSP to the Nhop node avoiding the
     link that the protected LSP takes to reach Nhop

   With regard to the PLR procedures described above and specified in
   RFC 4090, this document specifies the following recommendations
   involving address selection, and additional PLR procedures involving
   RRO flags carried in the PATH message as well as the initiation of
   Node-ID based Hello sessions.

   - While selecting the destination address of the bypass LSP, the
     PLR SHOULD attempt to select the router ID of the NNhop or Nhop
     node. If the PLR and the MP are in the same area, then the PLR may
     utilize the TED to determine the router ID from the interface
     address in RRO (if NodeID is not included in RRO). If the PLR and
     the MP are in different IGP areas, then the PLR SHOULD use the
     NodeID address of the NNhop MP if included in the RRO of RESV. If
     the NP-MP in a different area has not included NodeID in RRO, then
     the PLR SHOULD use the NP-MP's interface address present in the
     RRO. The PLR SHOULD use its router ID as the source address of the
     bypass LSP. The PLR SHOULD also include its router ID as the
     NodeID in PATH RRO unless configured explicitly not to include
     NodeID.

   In parallel to the attempt made to create NP-bypass or LP-bypass,
   the PLR SHOULD initiate a Node-ID based Hello session to the NNhop
   or Nhop node respectively to establish the RSVP-TE signaling
   adjacency.
   This Hello session is used to track the state of the adjacency,
   including detection of adjacency failure.

   - If the NP-bypass LSP comes up, then the PLR SHOULD set the "local
     protection available" and "NP available" RRO flags and trigger
     PATH to be sent.

   - If the LP-bypass LSP comes up, then the PLR SHOULD set the "local
     protection available" RRO flag and trigger PATH to be sent.

   - After signaling protection availability, if the PLR finds that
     the protection becomes unavailable then it SHOULD attempt to make
     protection available. The PLR SHOULD wait for a timeout before
     resetting RRO flags relating to protection availability and
     triggering PATH downstream. On the other hand, the PLR need not
     wait for a timeout to set RRO flags relating to protection
     availability and immediately trigger PATH downstream.

4.1.2. Remote Signaling Adjacency

   A NodeID based RSVP-TE Hello session is one in which NodeID is used
   in the source and destination address fields of the RSVP Hello.
   [RFC4558] formalizes NodeID based Hello messages between two
   routers. This document extends the NodeID based RSVP Hello session
   to track the state of an RSVP-TE neighbor that is not directly
   connected by at least one interface. In order to apply a NodeID
   based RSVP-TE Hello session between any two routers that are not
   immediate neighbors, the router that supports the extensions defined
   in this document SHOULD set TTL to 255 in the NodeID based Hello
   messages exchanged between PLR and MP.

   In the rest of the document the term "signaling adjacency", or
   "remote signaling adjacency", refers specifically to the RSVP-TE
   signaling adjacency.

4.1.3. PATH RRO flags Propagation

   As each node along the LSP path can make protection available,
   propagating PATH immediately due to a change in RRO flags on any
   upstream node would increase control plane message load.
   So whenever a node receives PATH, it SHOULD check if the only change
   is in RRO flags. If the change is only in PATH RRO flags, then the
   node SHOULD decide whether to propagate the PATH based on the
   following rule.

   - If the "NP desired" flag is set and the "NP available" flag has
     changed in Phop's RRO flags, then PATH is triggered.

   - In all other cases the change is not propagated.

4.1.4. MP Behavior

   When the NNhop or Nhop node receives the triggered PATH with RRO
   flag(s) set, the node SHOULD check the presence of a remote
   signaling adjacency with the PLR (this check is needed to detect the
   network being partitioned). If the flags are set and the RSVP-TE
   signaling adjacency is present, the node concludes that protection
   has been made available at the PLR. If the PLR has included NodeID
   in PATH RRO, then that NodeID is the remote neighbor address.
   Otherwise, the PLR's interface address in RRO will be the remote
   neighbor address. If the "NP available" flag is set by the PPhop
   node, then the node is NP-MP. Otherwise, it concludes it is LP-MP.

4.1.5. "Remote" state on MP

   Once a router concludes it is MP, it SHOULD create a remote path
   state for the LSP. The "remote" state is identical to the protected
   LSP path state except for the difference in the HOP object. The HOP
   object corresponding to the "remote" path state contains the address
   of the remote node signaling adjacency with the PLR.
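   The MP determination rule of Section 4.1.4 can be illustrated with
   a minimal Python sketch. It is not part of the specification: the
   names MpRole, RroFlags and mp_role are hypothetical, and the sketch
   assumes the caller already knows which upstream node (Phop or PPhop)
   set the flags and whether the signaling adjacency with that node is
   up.

```python
from dataclasses import dataclass
from enum import Enum


class MpRole(Enum):
    NOT_MP = "not-mp"
    LP_MP = "lp-mp"
    NP_MP = "np-mp"


@dataclass(frozen=True)
class RroFlags:
    """Protection flags one upstream node set in its PATH RRO entry (hypothetical model)."""
    local_protection_available: bool = False
    np_available: bool = False


def mp_role(flags: RroFlags, set_by_pphop: bool, adjacency_up: bool) -> MpRole:
    """Return the role this node concludes for one candidate PLR."""
    # Without a signaling adjacency to the PLR (for example, a partitioned
    # network), the node does not conclude that protection is available.
    if not adjacency_up:
        return MpRole.NOT_MP
    # No protection flag set: nothing has been signaled by this PLR.
    if not (flags.local_protection_available or flags.np_available):
        return MpRole.NOT_MP
    # "NP available" set by the PPhop node makes this node NP-MP.
    if flags.np_available and set_by_pphop:
        return MpRole.NP_MP
    # Otherwise the node concludes it is LP-MP.
    return MpRole.LP_MP
```

   In the topology of Figure 1, C would evaluate the flags set by A
   with set_by_pphop=True and, with the adjacency to A up, conclude
   that it is NP-MP.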

   The MP SHOULD consider the "remote" path state automatically deleted
   if:

   - NP-MP later receives a PATH with the "NP available" flag reset in
     PLR's RRO flags, or

   - LP-MP later receives a PATH with the "local protection available"
     flag reset in PLR's RRO flags, or

   - Node signaling adjacency with PLR goes down, or

   - MP receives backup LSP signaling from PLR, or

   - MP receives PathTear, or

   - MP deletes the LSP state on local policy or exception event

   Unlike the normal path state that is either locally generated on
   Ingress or created from a PATH message from the Phop node, the
   "remote" path state is not signaled explicitly from the PLR. The
   purpose of the "remote" path state is to enable the PLR to
   explicitly tear down path and reservation states corresponding to
   the LSP by sending a tear message for the "remote" path state. Such
   a message tearing down "remote" path state is called "Remote"
   PathTear.

   The scenarios in which "Remote" PathTear is applied are described in
   Section 4.4 - Remote State Teardown.

4.2. Impact of Failures on LSP State

   This section describes the procedures for routers on the LSP path
   for different kinds of failures. The procedures described on
   detecting RSVP control plane adjacency failures do not impact the
   RSVP-TE graceful restart mechanisms ([RFC3473], [RFC5063]). If the
   router executing these procedures acts as a helper for a neighboring
   router, then the control plane adjacency will be declared as having
   failed after taking into account the grace period extended for the
   neighbor by the helper.

   It should be noted that even though this section and the subsequent
   sections of the document mention "link failure" and "node failure"
   separately involving upstream or downstream of a protected LSP, a
   router implementing the procedures specified in the document need
   not have a mechanism to distinguish between these two types of
   failures.
   Optionally, a router MAY run Node-ID based RSVP-TE signaling
   adjacency with immediate neighbors to distinguish between these two
   types of failures.

4.2.1. Non-MP Behavior on Phop Link/Node Failure

   When a router detects Phop link or Phop node failure and the router
   is not an MP for the LSP, then it SHOULD send Conditional PathTear
   (refer to Section 4.3, "Conditional Path Tear") and delete PSB and
   RSB states corresponding to the LSP.

4.2.2. LP-MP Behavior on Phop Link Failure

   When the Phop link for an LSP fails on a router that is LP-MP for
   the LSP, the LP-MP SHOULD retain PSB and RSB states corresponding to
   the LSP till the occurrence of any of the following events.

   - Node-ID signaling adjacency with Phop PLR goes down, or

   - MP receives normal or "Remote" PathTear for PSB, or

   - MP receives ResvTear for RSB.

4.2.3. LP-MP Behavior on Phop Node Failure

   When a router that is LP-MP for an LSP detects Phop node failure
   from Node-ID signaling adjacency state, the LP-MP SHOULD send normal
   PathTear and delete PSB and RSB states corresponding to the LSP.

4.2.4. NP-MP Behavior on Phop Link/Node Failure

   When a router that is NP-MP for an LSP detects Phop link failure, or
   Phop node failure from Node-ID signaling adjacency, the router
   SHOULD retain PSB and RSB states corresponding to the LSP till the
   occurrence of any of the following events.

   - Remote Node-ID signaling adjacency with PPhop PLR goes down, or

   - MP receives normal or "Remote" PathTear for PSB, or

   - MP receives ResvTear for RSB.

4.2.5. NP-MP Behavior on PLR Link Failure

   If the PLR link that is not attached to NP-MP fails and if NP-MP
   receives Conditional PathTear from the Phop node, then the MP SHOULD
   retain PSB and RSB states corresponding to the LSP till the
   occurrence of any of the following events.

   - Remote Node-ID signaling adjacency with PPhop PLR goes down, or

   - MP receives normal or "Remote" PathTear for PSB, or

   - MP receives ResvTear for RSB.

   Receiving Conditional PathTear from the Phop node will not impact
   the "remote" state from the PLR. Note that the Phop node would send
   Conditional PathTear if it was not an MP.

   In the example topology in Figure 1, assume C & D are NP-MP for PLRs
   A & B respectively. Now when the A-B link fails, as B is not MP and
   its Phop link signaling adjacency has failed, B will delete LSP
   state (this behavior is required for unprotected LSPs - Section
   4.2.1). In the data plane, that would require B to delete the label
   forwarding entry corresponding to the LSP. So if B's downstream
   nodes C and D continue to retain state, it would not be correct for
   D to continue to assume itself as NP-MP for PLR B.

   The mechanism that enables D to stop considering itself as NP-MP and
   delete "remote" path state is given below.

   1. When C receives Conditional PathTear from B, it decides to
      retain LSP state as it is NP-MP of PLR A. C also SHOULD check
      whether Phop B had previously signaled availability of node
      protection. As B had previously signaled NP availability in its
      PATH RRO flags, C SHOULD reset "local protection available" and
      "NP available" on RRO flags corresponding to B and trigger PATH
      to D.

   2. When D receives the triggered PATH, it realizes that it is no
      longer NP-MP and so deletes the "remote" path state. D does not
      propagate PATH further down because the only change is in PATH
      RRO flags of B.

4.2.6. Phop Link Failure on a Node that is LP-MP and NP-MP

   A router may be both LP-MP as well as NP-MP at the same time for
   Phop and PPhop nodes respectively of an LSP. If the Phop link fails
   on such a node, the node SHOULD retain PSB and RSB states
   corresponding to the LSP till the occurrence of any of the following
   events.

   - Both Node-ID signaling adjacencies with Phop and PPhop nodes go
     down, or

   - MP receives normal or "Remote" PathTear for PSB, or

   - MP receives ResvTear for RSB.

4.2.7. Phop Node Failure on Node that is LP-MP and NP-MP

   If a router that is both LP-MP and NP-MP detects Phop node failure,
   then the node SHOULD retain PSB and RSB states corresponding to the
   LSP till the occurrence of any of the following events.

   - Remote Node-ID signaling adjacency with PPhop PLR goes down, or

   - MP receives normal or "Remote" PathTear for PSB, or

   - MP receives ResvTear for RSB.

4.3. Conditional Path Tear

   In the example provided in Section 4.2.5 "NP-MP Behavior on PLR
   Link Failure", B deletes PSB and RSB states corresponding to the LSP
   once B detects that its link to Phop went down, as B is not MP. If B
   were to send PathTear normally, then C would delete LSP state
   immediately. In order to avoid this, there should be some mechanism
   by which B can indicate to C that B does not require the receiving
   node to unconditionally delete the LSP state immediately. For this,
   B SHOULD add a new optional object called the CONDITIONS object in
   PathTear. The new optional object is defined in Section 4.3.3. If
   node C also understands the new object, then C SHOULD delete LSP
   state only if it is not an NP-MP - in other words C SHOULD delete
   LSP state if there is no "remote" PLR state on C.

4.3.1. Sending Conditional Path Tear

   A router that is not an MP for an LSP SHOULD delete PSB and RSB
   states corresponding to the LSP if the Phop link or Phop Node-ID
   signaling adjacency goes down (Section 4.2.1). The router SHOULD
   send Conditional PathTear if the following are also true.

   - Ingress has requested node protection for the LSP, and

   - PathTear is not received from the upstream node

4.3.2. Processing Conditional Path Tear

   When a router that is not an NP-MP receives Conditional PathTear,
   the node SHOULD delete PSB and RSB states corresponding to the LSP,
   and process Conditional PathTear by considering it as normal
   PathTear. Specifically, the node SHOULD NOT propagate Conditional
   PathTear downstream but remove the optional object and send normal
   PathTear downstream.

   When a node that is an NP-MP receives Conditional PathTear, it
   SHOULD NOT delete LSP state. The node SHOULD check whether the Phop
   node previously set the "NP available" flag in PATH RRO flags. If
   the flag had been set previously by Phop, then the node SHOULD clear
   the "local protection available" and "NP available" flags in Phop's
   RRO flags and trigger PATH downstream.

   If Conditional PathTear is received from a neighbor that has not
   advertised support (refer to Section 4.5) for the new procedures
   defined in this document, then the node SHOULD consider the message
   as normal PathTear. The node SHOULD propagate normal PathTear
   downstream and delete LSP state.

4.3.3. CONDITIONS object

   As any implementation that does not support Conditional PathTear
   SHOULD ignore the new object but process the message as normal
   PathTear without generating any error, the Class-Num of the new
   object SHOULD be 10bbbbbb where 'b' represents a bit (from Section
   3.10 of [RFC2205]).

   The new object is called the "CONDITIONS" object and specifies the
   conditions under which default processing rules of the RSVP-TE
   message SHOULD be invoked.
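   The receive-side rules of Section 4.3.2 can be summarized as a
   small decision function. The following Python sketch is only an
   illustration under stated assumptions: TearActions and
   process_conditional_pathtear are hypothetical names, and the three
   boolean inputs are assumed to be derived elsewhere from the node's
   "remote" path state, the neighbor's capability advertisement
   (Section 4.5) and the previously received PATH RRO flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TearActions:
    """What a node does on receiving a Conditional PathTear (hypothetical model)."""
    delete_lsp_state: bool
    send_normal_pathtear_downstream: bool
    reset_phop_flags_and_trigger_path: bool


def process_conditional_pathtear(
    is_np_mp: bool,
    sender_advertised_support: bool,
    phop_had_np_available: bool,
) -> TearActions:
    # A sender that has not advertised support, or a receiver that is not
    # an NP-MP: treat the message as a normal PathTear. The CONDITIONS
    # object is stripped and a normal PathTear goes downstream.
    if not sender_advertised_support or not is_np_mp:
        return TearActions(
            delete_lsp_state=True,
            send_normal_pathtear_downstream=True,
            reset_phop_flags_and_trigger_path=False,
        )
    # NP-MP: retain LSP state. If Phop had signaled "NP available", clear
    # its protection flags in the RRO and trigger PATH downstream.
    return TearActions(
        delete_lsp_state=False,
        send_normal_pathtear_downstream=False,
        reset_phop_flags_and_trigger_path=phop_had_np_available,
    )
```

   For the example of Section 4.2.5, C (NP-MP of PLR A, with B having
   signaled NP availability) would retain state and trigger PATH to D
   with B's flags reset.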

   The object has the following format:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |     Class     |    C-type     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Reserved                           |M|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Length

      This contains the size of the object in bytes and should be set
      to eight.

   Class

      TBD

   C-type

      1

   M bit

      This bit indicates that the message SHOULD be processed based on
      the condition whether the receiving node is Merge Point or not.

4.4. Remote State Teardown

   If the Ingress wants to tear down the LSP because of a management
   event while the LSP is being locally repaired at a transit PLR, it
   would not be desirable to wait till backup LSP signaling to perform
   state cleanup. To enable LSP state cleanup when the LSP is being
   locally repaired, the PLR SHOULD send a "remote" PathTear message
   instructing the MP to delete PSB and RSB states corresponding to the
   LSP.

   Consider that node C in the example topology (Figure 1) has gone
   down and B locally repairs the LSP.

   1. Ingress A receives a management event to tear down the LSP.

   2. A sends normal PathTear to B.

   3. To enable LSP state cleanup, B SHOULD send "remote" PathTear
      with destination IP address set to that of D used in Node-ID
      signaling adjacency with D, and HOP object containing local
      address used in Node-ID signaling adjacency.

   4. B then deletes PSB and RSB states corresponding to the LSP.

   5. On D there would be a remote signaling adjacency with B and so D
      SHOULD accept the remote PathTear and delete PSB and RSB states
      corresponding to the LSP.

4.4.1. PLR Behavior on Local Repair Failure

   If local repair fails on the PLR after a failure, then this should
   be considered as a case for cleaning up LSP state from PLR to the
   Egress.
   The PLR would achieve this by using a "remote" PathTear to clean up
   state from the MP. If the MP has retained state, then it would
   propagate the PathTear downstream, thereby achieving state cleanup.
   Note that in the case of link protection, the PathTear would be
   directed to the LP-MP node IP address rather than the Nhop interface
   address.

4.4.2. PLR Behavior on Resv RRO Change

   When a router that has already made NP available detects a change in
   the RRO carried in the RESV message, and the RRO change indicates
   that the router's former NP-MP is no longer present in the LSP path,
   then the router SHOULD send a "remote" PathTear directly to its
   former NP-MP.

   In the example topology in Figure 1, assume A has made node
   protection available and C has concluded it is the NP-MP. When the
   B-C link fails, C, following the procedure specified in Section
   4.2.4 of this document, will retain state until the remote Node-ID
   control plane adjacency with A goes down, or until a PathTear or
   ResvTear is received for its PSB or RSB respectively. If B has also
   made node protection available, B will eventually complete backup
   LSP signaling with its NP-MP D and trigger RESV to A with a changed
   RRO. The new RRO of the LSP carried in RESV will not contain C. When
   A processes the RESV with a new RRO that does not contain C, its
   former NP-MP, A SHOULD send a "remote" PathTear to C. When C
   receives the "remote" PathTear for its PSB state, C will send a
   normal PathTear downstream to D and delete both the PSB and RSB
   states corresponding to the LSP. As D has already received backup
   LSP signaling from B, D will retain the control plane and forwarding
   states corresponding to the LSP.

4.4.3. LSP Preemption during Local Repair

   If an LSP is preempted when there is no failure along the path of
   the LSP, the node on which preemption occurs would send PathErr and
   ResvTear upstream and delete only the forwarding state and RSB state
   corresponding to the LSP. But if the LSP is being locally repaired
   upstream of the node on which the LSP is preempted, then the node
   SHOULD delete both the PSB and RSB states corresponding to the LSP
   and send a normal PathTear downstream.

4.4.3.1. Preemption on LP-MP after Phop Link failure

   If an LSP is preempted on the LP-MP after its Phop or incoming link
   has already failed but the backup LSP has not been signaled yet,
   then the node SHOULD send a normal PathTear and delete both the PSB
   and RSB states corresponding to the LSP. The LP-MP had retained LSP
   state because the PLR would signal the LSP through backup LSP
   signaling. Preemption brings down the LSP, so the node is no longer
   an LP-MP and has to clean up the LSP state.

4.4.3.2. Preemption on NP-MP after Phop Link failure

   If an LSP is preempted on the NP-MP after its Phop link has already
   failed but the backup LSP has not been signaled yet, then the node
   SHOULD send a normal PathTear and delete the PSB and RSB states
   corresponding to the LSP. The NP-MP had retained LSP state because
   the PLR would signal the LSP through backup LSP signaling.
   Preemption brings down the LSP, so the node is no longer an NP-MP
   and has to clean up the LSP state.

   Consider the case where the B-C link goes down in the same example
   topology (Figure 1). As C is the NP-MP for PLR A, C will retain LSP
   state.

   1. The LSP is preempted on C.
   2. C will delete the RSB state corresponding to the LSP. But C
      cannot send PathErr or ResvTear to PLR A because the backup LSP
      has not been signaled yet.
   3. As the only reason for C having retained state after Phop node
      failure was that it was the NP-MP, C SHOULD send a normal
      PathTear to D and also delete the PSB state. D would also delete
      the PSB and RSB states on receiving the PathTear from C.
   4. B starts backup LSP signaling to D. But as D does not have the
      LSP state, it will reject the backup LSP PATH and send PathErr
      to B.
   5. B will delete its reservation and send ResvTear to A.

4.5. Backward Compatibility Procedures

   In this section, "Enhanced FRR facility protection" refers to the
   set of changes proposed in the previous sections. Any implementation
   that does not support them is termed an "existing implementation".
   Of the proposed extensions, signaling protection using RRO flags is
   expected to be backward compatible and can work safely irrespective
   of whether the refresh time is small or arbitrarily long. This is
   because existing implementations would not send an error or tear
   down message in response to the flags in the PATH RRO but would
   simply ignore and propagate them. On the other hand, the changes
   relating to LSP state cleanup, namely Conditional and remote
   PathTear, require support from other nodes along the LSP path. So
   procedures that fall under the LSP state cleanup category SHOULD be
   turned on only if all nodes involved in the node protection FRR,
   i.e. the PLR, the MP and, in the case of NP, the intermediate node,
   support the extensions. Note that for LSPs requesting only link
   protection, the PLR and the LP-MP should support the extensions.

4.5.1. Detecting Support for Enhanced FRR Facility Protection

   An implementation supporting the FRR facility protection extensions
   specified in the previous sections SHOULD set a new flag, "Enhanced
   facility protection", in the CAPABILITY object in Hello messages.
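
   As a non-normative illustration, the rule in Section 4.5 for
   turning on the LSP state cleanup procedures (Conditional and remote
   PathTear) might be sketched in Python as follows. The names
   Protection and cleanup_procedures_enabled are hypothetical, and the
   boolean inputs are assumed to have been learned from the neighbors'
   Hello messages; how they are learned is outside the sketch.

```python
from enum import Enum


class Protection(Enum):
    """Protection type requested for the LSP (hypothetical type)."""
    LINK = "link"
    NODE = "node"


def cleanup_procedures_enabled(protection: Protection,
                               plr_supports: bool,
                               mp_supports: bool,
                               intermediate_supports: bool = False) -> bool:
    """Return True if Conditional/remote PathTear may be turned on.

    Link protection: the PLR and the LP-MP must support the extensions.
    Node protection: the PLR, the NP-MP and the intermediate node must
    all support the extensions.
    """
    if protection is Protection.LINK:
        return plr_supports and mp_supports
    return plr_supports and mp_supports and intermediate_supports
```

   For example, with node protection requested and an intermediate
   node running an existing implementation, the function returns
   False and the cleanup procedures stay off for that LSP.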

   - As nodes supporting the extensions SHOULD initiate Node Hellos
     with adjacent nodes, a node on the path of a protected LSP can
     determine whether its Phop or Nhop neighbor supports the FRR
     enhancements from the Hello messages sent by the neighbor.

   - If a node attempts to make node protection available, then the
     PLR SHOULD initiate a remote Node-ID signaling adjacency with the
     NNhop. If the NNhop (a) does not reply to the remote node Hello
     message or (b) does not set the "Enhanced facility protection"
     flag in the CAPABILITY object in the reply, then the PLR can
     conclude that the NNhop does not support the FRR extensions.

   - If node protection is requested for an LSP and (a) the PPhop node
     has not set the "local protection available" and "NP available"
     flags in its RRO flags or (b) the PPhop node has not initiated
     remote node Hello messages, then the node SHOULD conclude that
     the PLR does not support the FRR extensions. The details are
     described in the "Procedures for backward compatibility" section
     below.

   The new flag that will be introduced to the CAPABILITY object is
   specified below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             | Class-Num(134)|  C-Type (1)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Reserved                        |E|T|R|S|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   E bit

      Indicates that the sender supports Enhanced FRR facility
      protection.

   Any node that sets the new E bit in its CAPABILITY object MUST also
   set the Refresh-Reduction-Capable bit in the common header of all
   RSVP-TE messages.

4.5.2. Procedures for backward compatibility

   The procedures defined hereafter are performed on a subset of LSPs
   that traverse a node, rather than on all LSPs that traverse a node.
   This behavior is required to support backward compatibility for a
   subset of LSPs traversing nodes running existing implementations.

4.5.2.1. Lack of support on Downstream Node

   - If the Nhop does not support enhanced facility protection FRR,
     then the node SHOULD reduce the "refresh period" in the
     TIME_VALUES object carried in PATH to the default small refresh
     value.

   - If node protection is requested and the NNhop node does not
     support the enhancements, then the node SHOULD reduce the "refresh
     period" in the TIME_VALUES object carried in PATH to the default
     small refresh value.

   If the node reduces the refresh time as a result of the above
   procedures, it also SHOULD NOT send remote PathTear or Conditional
   PathTear messages.

   Consider the example topology in Figure 1. If C does not support
   the enhancements, then:

   - A and B SHOULD reduce the refresh time to the default value of 30
     seconds and trigger PATH.

   - If B is not an MP and the Phop link of B fails, B cannot send a
     Conditional PathTear to C but SHOULD time out the PSB state from A
     normally. This is accomplished if A also reduces the refresh time
     to the default value. So if C does not support enhanced facility
     protection, then Phop B and PPhop A SHOULD reduce the refresh time
     to the small default value.

4.5.2.2. Lack of support on Upstream Node

   - If the Phop node does not support enhanced facility protection,
     then the node SHOULD reduce the "refresh period" in the
     TIME_VALUES object carried in RESV to the default small refresh
     value.

   - If node protection is requested and the Phop node does not
     support the enhancements, then the node SHOULD reduce the "refresh
     period" in the TIME_VALUES object carried in PATH to the default
     value.

   - If node protection is requested and the PPhop node does not
     support the enhancements, then the node SHOULD reduce the "refresh
     period" in the TIME_VALUES object carried in RESV to the default
     value.

   - If the node reduces the refresh time as a result of the above
     procedures, it also SHOULD NOT execute MP determination
     procedures.

4.5.2.3. Incremental Deployment

   The backward compatibility procedures described in the previous
   sub-sections imply that a router supporting the FRR extensions
   specified in this document can apply the procedures in either the
   downstream or the upstream direction of an LSP, depending on the
   capability of the routers downstream or upstream in the LSP path.

   - FRR extensions and procedures are enabled for downstream Path,
     PathTear and ResvErr messages corresponding to an LSP if link
     protection is requested for the LSP and the Nhop node supports the
     extensions.

   - FRR extensions and procedures are enabled for downstream Path,
     PathTear and ResvErr messages corresponding to an LSP if node
     protection is requested for the LSP and both the Nhop and NNhop
     nodes support the extensions.

   - FRR extensions and procedures are enabled for upstream PathErr,
     Resv and ResvTear messages corresponding to an LSP if link
     protection is requested for the LSP and the Phop node supports the
     extensions.

   - FRR extensions and procedures are enabled for upstream PathErr,
     Resv and ResvTear messages corresponding to an LSP if node
     protection is requested for the LSP and both the Phop and PPhop
     nodes support the extensions.

   For example, if an implementation supporting the FRR extensions
   specified in this document is deployed on all routers in a
   particular region of the network and all the LSPs in the network
   request node protection, then the FRR extensions will be applied
   only to the LSP segments that traverse that region. This aids
   incremental deployment of these extensions and allows the benefits
   of the extensions to be realized in the portions of the network
   where they are supported.

5. Security Considerations

   This document extends the applicability of Node-ID based Hello
   sessions between immediate neighbors. The Node-ID based Hello
   session between a PLR and an NP-MP may require the two routers to
   exchange Hello messages with a non-immediate neighbor. So,
   implementations SHOULD provide the option to configure a Node-ID
   neighbor specific or global authentication key to authenticate
   messages received from Node-ID neighbors. The network administrator
   MAY utilize this option to enable RSVP-TE routers to authenticate
   Node-ID Hello messages received with a TTL greater than 1.

6. IANA Considerations

6.1. New Object - CONDITIONS

   [RFC2205] defines the Class-Number name space for RSVP objects. The
   name space is managed by IANA.

   IANA registry: RSVP Parameters
   Subsection: Class Names, Class Numbers, and Class Types

   A new RSVP object using a Class-Number of the form 10bbbbbb, called
   the "CONDITIONS" object, is defined in Section 4.3 of this document.
   The Class-Number is TBD.

6.2. New CAPABILITY Object value

   [RFC5063] defines the name space for RSVP Capability Object Values.
   The name space is managed by IANA.

   IANA registry: RSVP Parameters
   Subsection: Capability Object Values

   A new Capability flag called "Enhanced FRR facility protection" is
   defined in Section 4.5 of this document. The bit number for this
   flag is TBD.

7. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4090] Pan, P., "Fast Reroute Extensions to RSVP-TE for LSP
             Tunnels", RFC 4090, May 2005.

   [RFC2961] Berger, L., "RSVP Refresh Overhead Reduction
             Extensions", RFC 2961, April 2001.

   [RFC3209] Awduche, D., "RSVP-TE: Extensions to RSVP for LSP
             Tunnels", RFC 3209, December 2001.

   [RFC2205] Braden, R., "Resource Reservation Protocol (RSVP)",
             RFC 2205, September 1997.

   [RFC4558] Ali, Z., "Node-ID Based Resource Reservation Protocol
             (RSVP) Hello: A Clarification Statement", RFC 4558,
             June 2006.

8. Acknowledgments

   Thanks to Raveendra Torvi and Yimin Shen for their comments and
   inputs.

9. Authors' Addresses

   Chandra Ramachandran
   Juniper Networks
   Email: csekar@juniper.net

   Yakov Rekhter
   Juniper Networks
   Email: yakov@juniper.net

   Ina Minei
   Google, Inc
   Email: inaminei@google.com

   Ebben Aries
   Facebook
   Email: exa@fb.com

   Dante Pacella
   Verizon
   Email: dante.j.pacella@verizon.com

   Markus Jork
   Juniper Networks
   Email: mjork@juniper.net

   Harish Sitaraman
   Juniper Networks
   Email: hsitaraman@juniper.net

   Vishnu Pavan Beeram
   Juniper Networks
   Email: vbeeram@juniper.net