Network Working Group                                   Dan Li (Huawei)
Internet Draft                                     Jianhua Gao (Huawei)
                                             Arun Satyanarayana (Cisco)
                                          Snigdho C. Bardalai (Fujitsu)

Intended Status: Informational
Expires: July 21, 2009                                 January 21, 2009


        Description of the RSVP-TE Graceful Restart Procedures

                draft-ietf-ccamp-gr-description-04.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   The Hello message for the Resource Reservation Protocol (RSVP) has
   been defined to establish and maintain basic signaling node
   adjacencies for Label Switching Routers (LSRs) participating in a
   Multiprotocol Label Switching (MPLS) traffic engineered (TE)
   network. The Hello message has been extended for use in Generalized
   MPLS (GMPLS) networks for state recovery of control channel or nodal
   faults.

   GMPLS protocol definitions for RSVP also allow a restarting node to
   learn the label that it previously allocated for use on a Label
   Switched Path (LSP).

   Further RSVP protocol extensions have been defined to enable a
   restarting node to recover full control plane state by exchanging
   RSVP messages with its upstream and downstream neighbors.

   This document provides an informational clarification of the
   control plane procedures for a GMPLS network when there are
   multiple node failures, and describes how full control plane state
   can be recovered in different scenarios where the order in which
   the nodes restart is different.

   This document does not define any new processes or procedures. All
   protocol mechanisms are already defined in the referenced documents.

Table of Contents

   1. Introduction
   2. Existing Procedures for Single Node Restart
      2.1. Procedures Defined in [RFC3473]
      2.2. Procedures Defined in [RFC5063]
   3. Multiple Node Restart Scenarios
   4. RSVP State
   5. Procedures for Multiple Node Restart
      5.1. Procedures for the Normal Node
      5.2. Procedures for the Restarting Node
         5.2.1. Procedures for Scenario 1
         5.2.2. Procedures for Scenario 2
         5.2.3. Procedures for Scenario 3
         5.2.4. Procedures for Scenario 4
         5.2.5. Procedures for Scenario 5
      5.3. Consideration of Re-Use of Data Plane Resources
      5.4. Consideration of Management Plane Intervention
   6. Clarification of Restarting Node Procedure
   7. Security Considerations
   8. IANA Considerations
   9. Acknowledgments
   10. References
      10.1. Normative References
      10.2. Informative References
   11. Authors' Addresses
   12. Full Copyright Statement
   13. Intellectual Property Statement

1. Introduction

   The Hello message for the Resource Reservation Protocol (RSVP) has
   been defined to establish and maintain basic signaling node
   adjacencies for Label Switching Routers (LSRs) participating in a
   Multiprotocol Label Switching (MPLS) traffic engineered (TE)
   network [RFC3209]. The Hello message has been extended for use in
   Generalized MPLS (GMPLS) networks for state recovery of control
   channel or nodal faults through the exchange of the Restart
   Capabilities object [RFC3473].

   GMPLS protocol definitions for RSVP [RFC3473] also allow a
   restarting node to learn the label that it previously allocated for
   use on a Label Switched Path (LSP) through the RECOVERY_LABEL
   object carried on a Path message sent to a restarting node from its
   upstream neighbor.

   Further RSVP protocol extensions have been defined [RFC5063] to
   perform graceful restart and to enable a restarting node to recover
   full control plane state by exchanging RSVP messages with its
   upstream and downstream neighbors. State previously transmitted to
   the upstream neighbor (principally the downstream label) is
   recovered from the upstream neighbor on a Path message (using the
   RECOVERY_LABEL object as described in [RFC3473]). State previously
   transmitted to the downstream neighbor (including the upstream
   label, interface identifiers, and the explicit route) is recovered
   from the downstream neighbor using a RecoveryPath message.

   [RFC5063] also extends the Hello message to exchange information
   about the ability to support the RecoveryPath message.

   The examples and procedures in [RFC3473] and [RFC5063] focus on the
   description of a single node restart when adjacent network nodes
   are operative. Although the procedures are equally applicable to
   multi-node restarts, no detailed explanation is provided.

   This document provides an informational clarification of the
   control plane procedures for a GMPLS network when there are
   multiple node failures, and describes how full control plane state
   can be recovered in different scenarios where the order in which
   the nodes restart is different.

   This document does not define any new processes or procedures. All
   protocol mechanisms already defined in [RFC3473] and [RFC5063] are
   definitive.

2. Existing Procedures for Single Node Restart

   This section documents for information the existing procedures
   defined in [RFC3473] and [RFC5063]. Those documents are definitive,
   and the description here is non-normative. It is provided for
   informational clarification only.

2.1. Procedures Defined in [RFC3473]

   In the case of nodal faults, the procedures for the restarting node
   and the procedures for the neighbor of a restarting node are
   applied to the corresponding nodes. These procedures, described in
   [RFC3473], are summarized as follows:

   For the Restarting Node:

   1) Tells its neighbors that state recovery is supported using the
   Hello message;

   2) Recovers its RSVP state with the help of a Path message received
   from its upstream neighbor carrying the RECOVERY_LABEL object;

   3) For bidirectional LSPs, uses the UPSTREAM_LABEL object on the
   received Path message to recover the corresponding RSVP state;

   4) If the corresponding forwarding state in the data plane does not
   exist, the node treats this as a setup for a new LSP. If the
   forwarding state in the data plane exists, the forwarding state is
   bound to the LSP associated with the message, and related forwarding
   state should be considered as valid and refreshed. In addition, if
   the node is not the tail-end of the LSP, the incoming label on the
   downstream interface is retrieved from the forwarding state on the
   restarting node and set in the UPSTREAM_LABEL object in the Path
   message sent to the downstream neighbor.

   For the Neighbor of a Restarting Node:

   1) Sends a Path message with a RECOVERY_LABEL object containing a
   label value corresponding to the label value received in the most
   recently received corresponding Resv message;

   2) Resumes refreshing Path state with the restarting node;

   3) Resumes refreshing Resv state with the restarting node.

2.2. Procedures Defined in [RFC5063]

   A new message is introduced in [RFC5063] called the RecoveryPath
   message. The message is sent by the downstream neighbor of a
   restarting node to convey the contents of the last received Path
   message back to the restarting node.

   The restarting node will receive the Path message with the
   RECOVERY_LABEL object from its upstream neighbor, and/or the
   RecoveryPath message from its downstream neighbor. The full RSVP
   state of the restarting node can be recovered from these two
   messages.

   The following state can be recovered from the received Path message:

   o Upstream data interface (from RSVP_HOP object)

   o Label on the upstream data interface (from RECOVERY_LABEL object)

   o Upstream label for bidirectional LSP (from UPSTREAM_LABEL object)

   The following state can be recovered from the received RecoveryPath
   message:

   o Downstream data interface (from RSVP_HOP object)

   o Label on the downstream data interface (from RECOVERY_LABEL object)

   o Upstream direction label for bidirectional LSP (from
     UPSTREAM_LABEL object)

   The other objects can also be recovered either from the regular
   Path and Resv messages, or from the RecoveryPath message.

3. Multiple Node Restart Scenarios

   We define the following terms for the different node types:

   Restarting - The node has restarted; communication with its
   neighbor nodes is restored, and its RSVP state is under recovery.

   Delayed Restarting - The node has restarted, but the communication
   with a neighbor node is interrupted (for example, the neighbor node
   needs to restart).

   Normal - The normal node is the fully operational neighbor of a
   restarting or delayed restarting node.

   There are five scenarios for multi-node restart. We will focus on
   the different positions of a restarting node. As shown in Figure 1,
   an LSP starts from Node A, traverses Nodes B and C, and ends at
   Node D.

   +-----+  Path  +-----+  Path  +-----+  Path  +-----+
   | PSB |------->| PSB |------->| PSB |------->| PSB |
   |     |        |     |        |     |        |     |
   | RSB |<-------| RSB |<-------| RSB |<-------| RSB |
   +-----+  Resv  +-----+  Resv  +-----+  Resv  +-----+
    Node A         Node B         Node C         Node D

               Figure 1 Two neighbor nodes restart

   1) A Restarting node with downstream Delayed Restarting node. For
   example, in Figure 1, Nodes A and D are Normal nodes, Node B is a
   Restarting node, and Node C is a Delayed Restarting node.

   2) A Restarting node with upstream Delayed Restarting node. For
   example, in Figure 1, Nodes A and D are Normal nodes, Node B is a
   Delayed Restarting node, and Node C is a Restarting node.

   3) A Restarting node with downstream and upstream Delayed Restarting
   nodes. For example, in Figure 1, Node A is a Normal node, Nodes B and
   D are Delayed Restarting nodes, and Node C is a Restarting node.

   4) A Restarting Ingress node with downstream Delayed Restarting node.
   For example, in Figure 1, Node A is a Restarting node, and Node B is
   a Delayed Restarting node. Nodes C and D are Normal nodes.

   5) A Restarting Egress node with upstream Delayed Restarting node.
   For example, in Figure 1, Nodes A and B are Normal nodes, Node C is a
   Delayed Restarting node, and Node D is a Restarting node.

   If the communication between two nodes is interrupted, the upstream
   node may think the downstream node is a Delayed Restarting node, or
   vice versa.

   Note that if multiple nodes which are not neighbors are restarted,
   the restart procedures could be applied as multiple separated
   restart procedures which are exactly the same as the procedures
   described in [RFC3473] and [RFC5063]. Therefore, these scenarios
   are not described in this document.
   For example, in Figure 1, if Node A and Node C are Normal nodes and
   Node B and Node D are Restarting nodes, Node B can recover its state
   with the help of Node A and Node C, while Node D can separately
   recover its state with the help of Node C.

4. RSVP State

   For each scenario, the RSVP state that needs to be recovered at the
   restarting nodes consists of the Path State Block (PSB) and the
   Resv State Block (RSB), which are created when the node receives
   the corresponding Path message and Resv message.

   According to [RFC2209], how to construct the PSB and RSB is really
   an implementation issue. In fact, there is no requirement to
   maintain separate PSB and RSB data structures. In GMPLS, there is
   a much closer tie between Path and Resv state, so it is possible
   to combine the information into a single state block (the LSP state
   block). On the other hand, if point-to-multipoint is supported, it
   may be convenient to maintain separate upstream and downstream
   state. Note that the PSB and RSB are not upstream and downstream
   state, since the PSB is responsible for receiving a Path from
   upstream and sending a Path downstream.

   Regardless of how the RSVP state is implemented, on recovery there
   are two logical pieces of state to be recovered and these
   correspond to the PSB and RSB.

5. Procedures for Multiple Node Restart

   In this document, all the nodes are assumed to have the graceful
   restart capabilities which are described in [RFC3473] and [RFC5063].

5.1. Procedures for the Normal Node

   When the downstream Normal node detects its neighbor restarting, it
   must send a RecoveryPath message for each LSP associated with the
   restarting node for which it has previously sent a Resv message and
   which has not been torn down.
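
   The selection rule in the paragraph above can be illustrated with a
   minimal, non-normative sketch. The LspState record, its field names,
   the dictionary used to represent a RecoveryPath message, and the
   function name are all hypothetical and chosen only for this
   illustration; [RFC5063] remains the definitive description.

```python
from dataclasses import dataclass


@dataclass
class LspState:
    """Hypothetical per-LSP record kept by a downstream Normal node."""
    lsp_id: str
    upstream_neighbor: str   # neighbor from which the Path was received
    last_path: dict          # contents of the last received Path message
    resv_sent: bool = False  # a Resv was previously sent upstream
    torn_down: bool = False  # the LSP has since been torn down


def recovery_paths_for(restarting_neighbor, lsps):
    """Build one RecoveryPath per LSP that is associated with the
    restarting neighbor, has had a Resv sent, and is not torn down."""
    return [
        {
            "type": "RecoveryPath",
            "to": restarting_neighbor,
            "lsp_id": lsp.lsp_id,
            # Echo back the contents of the last received Path message.
            "path": dict(lsp.last_path),
        }
        for lsp in lsps
        if lsp.upstream_neighbor == restarting_neighbor
        and lsp.resv_sent
        and not lsp.torn_down
    ]
```

   In this sketch an LSP that was torn down, or for which no Resv was
   ever sent, produces no RecoveryPath message.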

   When the upstream Normal node detects its neighbor restarting, it
   must send a Path message with a RECOVERY_LABEL object containing a
   label value corresponding to the label value received in the most
   recently received corresponding Resv message.

   This document does not modify the procedures for the Normal node
   which are described in [RFC3473] and [RFC5063].

5.2. Procedures for the Restarting Node

   This document does not modify the procedures for the Restarting
   node which are described in [RFC3473] and [RFC5063].

5.2.1. Procedures for Scenario 1

   After the Restarting node restarts, it starts a Recovery Timer. Any
   RSVP state that has not been resynchronized when the Recovery Timer
   expires should be cleared.

   At the Restarting node (Node B in the example), full
   resynchronization with the upstream neighbor (Node A) is possible
   because Node A is a Normal node. The upstream Path information is
   recovered from the Path message received from Node A. Node B also
   recovers the upstream Resv information (that it had previously sent
   to Node A) from the RECOVERY_LABEL object carried in the Path
   message received from Node A. However, some information (like the
   Recorded Route Object) will be missing from the new Resv message
   generated by Node B, and cannot be supplied until the downstream
   Delayed Restarting node (Node C) restarts and sends a Resv.

   After the upstream Path information and upstream Resv information
   have been recovered by Node B, the normal refresh procedure with the
   upstream Node A should be started.

   As per [RFC5063], the Restarting node (Node B) would normally
   expect to receive a RecoveryPath message from its downstream
   neighbor (Node C). It would use this to recover the downstream Path
   information, and would subsequently send a Path message to its
   downstream neighbor and receive a Resv message.
   But in this scenario, because the downstream neighbor has not
   restarted yet, Node B detects that communication with Node C is
   interrupted and must wait before resynchronizing with its
   downstream neighbor.

   In this case, the Restarting node (Node B) follows the procedures
   in section 9.3 of [RFC3473] and may run a Restart Timer to wait for
   the downstream neighbor (Node C) to restart. If its downstream
   neighbor (Node C) has not restarted before the timer expires, the
   corresponding LSPs may be torn down according to local policy
   [RFC3473]. Note, however, that the Restart Time value suggested in
   [RFC3473] is based on the previous Hello message exchanged with the
   node that has not restarted yet (Node C). Since this time value is
   unlikely to be available to the restarting node (Node B), a
   configured time value must be used if the timer is operated.

   The RSVP state must be reconciled with the retained data plane
   state if the cross-connect information can be retrieved from the
   data plane. In the event of any mismatches, local policy will
   dictate the action that must be taken, which could include:

   - reprogramming the data plane

   - sending an alert to the management plane

   - tearing down the control plane state for the LSP.

   In the case that the Delayed Restarting node never comes back, and
   where a Restart Timer is not used to automatically tear down LSPs,
   the LSPs can be tidied up through the control plane using a
   PathTear from the upstream node (Node A). Note that if Node C
   restarts after this operation, the RecoveryPath message that it
   sends to Node B will not be matched with any state on Node B and
   will receive a PathTear as its response, resulting in the teardown
   of the LSP at all downstream nodes.

5.2.2. Procedures for Scenario 2

   In this case, the Restarting node (Node C) can recover full
   downstream state from its downstream neighbor (Node D), which is a
   Normal node. The downstream Path state can be recovered from the
   RecoveryPath message which is sent by Node D. This allows Node C to
   send a Path refresh message to Node D, and Node D will respond with
   a Resv message from which Node C can reconstruct the downstream
   Resv state.

   After the downstream Path information and downstream Resv
   information have been recovered in Node C, the normal refresh
   procedure with downstream Node D should be started.

   The Restarting node would normally expect to resynchronize with its
   upstream neighbor to re-learn the upstream Path and Resv state, but
   in this scenario, because the upstream neighbor (Node B) has not
   restarted yet, the Restarting node (Node C) detects that the
   communication with the upstream neighbor (Node B) is interrupted.
   The Restarting node (Node C) follows the procedures in section 9.3
   of [RFC3473] and may run a Restart Timer to wait for the upstream
   neighbor (Node B) to restart. If its upstream neighbor (Node B) has
   not restarted before the Restart Timer expires, the corresponding
   LSPs may be torn down according to local policy [RFC3473]. Note,
   however, that the Restart Time value suggested in [RFC3473] is
   based on the previous Hello message exchanged with the node that
   has not restarted yet (Node B). Since this time value is unlikely
   to be available to the restarting node (Node C), a configured time
   value must be used if the timer is operated.

   Note that no Resv message is sent to the upstream neighbor (Node B)
   because it has not restarted.

   The RSVP state must be reconciled with the retained data plane
   state if the cross-connect information can be retrieved from the
   data plane.
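
   As a non-normative illustration, the reconciliation step can be
   sketched as a comparison between the cross-connects implied by the
   recovered RSVP state and those read back from the data plane. The
   representation used here (a mapping from a hypothetical LSP
   identifier to an (incoming label, outgoing label) pair) and the
   function name are assumptions made only for this sketch.

```python
def reconcile(rsvp_xc, dataplane_xc):
    """Compare recovered control plane state with the data plane.

    Both arguments map an LSP identifier to an (in_label, out_label)
    tuple. The result maps each mismatching LSP identifier to the pair
    (expected_from_rsvp, found_in_data_plane); the second element is
    None when the data plane has no cross-connect for that LSP.
    """
    mismatches = {}
    for lsp_id, expected in rsvp_xc.items():
        actual = dataplane_xc.get(lsp_id)
        if actual != expected:
            mismatches[lsp_id] = (expected, actual)
    return mismatches
```

   The sketch only detects mismatches; what is done about each one is
   left to local policy.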

   In the event of any mismatches, local policy will dictate the
   action that must be taken, which could include:

   - reprogramming the data plane

   - sending an alert to the management plane

   - tearing down the control plane state for the LSP.

   In the case that the Delayed Restarting node never comes back, and
   where a Restart Timer is not used to automatically tear down LSPs,
   the LSPs cannot be tidied up through the control plane using a
   PathTear from the upstream node (Node A), because there is no
   control plane connectivity to Node C from the upstream direction.
   There are two possibilities in [RFC3473]:

   - Management action may be taken at the Restarting node to tear the
     LSP. This will result in the LSP being removed from Node C, and a
     PathTear being sent downstream to Node D.

   - Management action may be taken at any downstream node (for
     example, Node D) resulting in a PathErr message with the
     Path_State_Removed flag set being sent to Node C to tear the LSP
     state.

   Note that if Node B restarts after this operation, the Path message
   that it sends to Node C will not be matched with any state on Node
   C and will be treated as a new Path message resulting in LSP setup.
   Node C should use the labels carried in the Path message (in the
   UPSTREAM_LABEL object and in the RECOVERY_LABEL object) to drive
   its label allocation, but may use other labels according to normal
   LSP setup rules.

5.2.3. Procedures for Scenario 3

   In this example, the Restarting node (Node C) is isolated. Its
   upstream and downstream neighbors have not restarted.

   The Restarting node (Node C) follows the procedures in section 9.3
   of [RFC3473] and may run a Restart Timer for each of its neighbors
   (Nodes B and D). If a neighbor has not restarted before its Restart
   Timer expires, the corresponding LSPs may be torn down according to
   local policy [RFC3473].
   Note, however, that the Restart Time values suggested in [RFC3473]
   are based on the previous Hello message exchanged with the nodes
   that have not restarted yet. Since these time values are unlikely
   to be available to the restarting node (Node C), a configured time
   value must be used if the timer is operated.

   During the Recovery Time, if the upstream Delayed Restarting node
   has restarted, the procedure for scenario 1 can be applied.

   During the Recovery Time, if the downstream Delayed Restarting node
   has restarted, the procedure for scenario 2 can be applied.

   In the case that neither Delayed Restarting node ever comes back,
   and where a Restart Timer is not used to automatically tear down
   LSPs, management intervention is required to tidy up the control
   plane and the data plane on the node that is waiting for the failed
   device to restart.

   If the downstream Delayed Restarting node restarts after the
   cleanup of LSPs at Node C, the RecoveryPath message from Node D
   will be answered with a PathTear message. If the upstream Delayed
   Restarting node restarts after the cleanup of LSPs at Node C, the
   Path message from Node B will be treated as a new LSP setup request,
   but the setup will fail because Node D cannot be reached, and Node C
   will respond with a PathErr message. Since this happens to Node B
   during its restart processing, it should follow the rules of
   [RFC5063] and tear down the LSP.

5.2.4. Procedures for Scenario 4

   When the Ingress node (Node A) restarts, it does not know which
   LSPs it caused to be created. Usually, however, this information is
   retrieved from the management plane or from the configuration
   requests stored in non-volatile form in the node in order to
   recover the LSP state.

   Furthermore, if the downstream node (Node B) is a Normal node,
   according to the procedures in [RFC5063], the ingress will receive
   a RecoveryPath message and will understand that it was the ingress
   of the LSP.

   However, in this scenario, the downstream node is a Delayed
   Restarting node, so Node A must rely on the information from the
   management plane or stored configuration, or it must wait for Node
   B to restart.

   In the event that Node B never restarts, management plane
   intervention is needed at Node A to clean up any LSP control plane
   state restored from the management plane or from local
   configuration, and to release any data plane resources.

5.2.5. Procedures for Scenario 5

   In this scenario the Egress node (Node D) restarts, and its
   upstream neighbor (Node C) has not restarted. In this case, the
   Egress node may have no control plane state relating to the LSPs.
   It has no downstream neighbor to help it, and no management plane
   or configuration information, although there will be data plane
   state for the LSP. The Egress node must simply wait until its
   upstream neighbor restarts and gives it the information as Path
   messages carrying RECOVERY_LABEL objects.

5.3. Consideration of Re-Use of Data Plane Resources

   Fundamental to the processes described above is an understanding
   that data plane resources may remain in use (allocated and cross-
   connected) when control plane state has not been fully
   resynchronized because some control plane nodes have not restarted.

   It is assumed that these data plane resources might be carrying
   traffic and should not be reconfigured except through application
   of operator-configured policy, or as a direct result of operator
   action.

   In particular, new LSP setup requests from the control plane or the
   management plane should not be allowed to use data plane resources
   that are still in use.
   Specific action must first be taken to release the resources.

5.4. Consideration of Management Plane Intervention

   The management plane must always retain the ability to control data
   plane resources and to override the control plane. In this context,
   the management plane must always be able to release data plane
   resources that were previously in place for use by control-plane
   established LSPs. Further, the management plane must always be able
   to instruct any control plane node to tear down any LSP.

   Operators should be aware of the risks of misconnection that could
   be caused by careless manipulation from the management plane of in-
   use data plane resources.

6. Clarification of Restarting Node Procedure

   According to the current graceful restart procedure [RFC3473],
   after a node restarts its control plane, it needs its upstream node
   to send a Path message carrying the RECOVERY_LABEL object so that
   it can resynchronize its RSVP state. If the restarted control plane
   becomes operational quickly, the upstream node may not detect that
   the downstream node has restarted and may therefore send a Path
   message without the RECOVERY_LABEL object, causing errors and
   unwanted connection deletion.

      N1               N2
      |                |
      |                X (Restart start)
      |     HELLO      |
      |--------------->|
      |                |
      |    SRefresh    |
      |--------------->|
      |                |
      |     HELLO      |
      |--------------->|
      |                |
      |                X (Restart complete)
      |    SRefresh    |
      |--------------->|
      |      NACK      |
      |<---------------|
      |  Path without  |
      | recovery label |
      |--------------->|
      |                X (resource allocation failed because the
      |                |  resources are in use)
      |    PathErr     |
      |<---------------|
      |    PathTear    |
      |--------------->|
      X (LSP deletion) X (LSP deletion)
      |                |

         Figure 2 Message flow for accidental LSP deletion

   The sequence diagram above depicts one scenario where the LSP may
   get deleted.
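
   The failure step in Figure 2 can be illustrated with a small,
   non-normative sketch of how a restarted node (N2) might react to a
   Path message with and without a recovery label. The RestartedNode
   class, its attributes, and the use of a single integer to stand for
   the data plane resource are simplifications invented for this
   illustration and are not taken from [RFC3473].

```python
class RestartedNode:
    """Toy model of N2 just after its control plane has restarted."""

    def __init__(self, in_use_resources):
        # Data plane resources that survived the restart and may
        # still be carrying traffic.
        self.in_use = set(in_use_resources)
        # Control plane state is empty after the restart.
        self.lsps = {}

    def handle_path(self, lsp_id, resource, recovery_label=None):
        """Return the name of the message sent in response."""
        if recovery_label is not None and recovery_label in self.in_use:
            # Forwarding state exists: bind it to the LSP instead of
            # allocating anything new.
            self.lsps[lsp_id] = recovery_label
            return "Resv"
        # No recovery label (or no matching forwarding state): the
        # Path is treated as a request to set up a new LSP.
        if resource in self.in_use:
            # Allocation fails because the resource is still in use.
            return "PathErr"
        self.in_use.add(resource)
        self.lsps[lsp_id] = resource
        return "Resv"
```

   With resource 100 retained in the data plane, a Path that lacks the
   recovery label is answered with a PathErr in this model, while the
   same Path carrying recovery label 100 rebinds the existing
   forwarding state.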

   In this sequence, N1 does not detect a Hello failure and continues
   sending SRefreshes, which may get NACK'ed by N2 once the restart
   completes because there is no Path state corresponding to the
   SRefresh message. This NACK causes a Path refresh message to be
   generated, but it carries no RECOVERY_LABEL because N1 has not yet
   detected that N2 has restarted, as Hello exchanges have not yet
   started. The Path message is treated as "new" and fails to allocate
   the resources because they are still in use. This causes a PathErr
   message to be generated, which may lead to the teardown of the LSP.

   To resolve the aforementioned problem, the following procedures,
   which are implicit in [RFC3473] and [RFC5063], should be followed.
   These procedures work together with the recovery procedures
   documented in [RFC3473]. Here, it is assumed that the restarting
   node and the neighboring node(s) support the Hello extension as
   documented in [RFC3209] and the recovery procedures documented in
   [RFC3473].

   After a node restarts its control plane, it should ignore and
   silently drop all RSVP-TE messages, except Hello messages, that it
   receives from any neighbor with which no Hello session has been
   established.

   The restarting node should follow [RFC3209] to establish Hello
   sessions with its neighbors after its control plane becomes
   operational.

   The restarting node resumes processing of RSVP-TE messages sent
   from each neighbor with which the Hello session has been
   established.

7. Security Considerations

   This document clarifies the procedures defined in [RFC3473] and
   [RFC5063] to be performed on RSVP agents that neighbor one or more
   restarting RSVP agents. It does not introduce any new procedures
   and, therefore, does not introduce any new security risks or issues.

   In the case of the control plane in general, and the RSVP agent in
   particular, where one or more nodes carrying one or more LSPs are
   restarted due to external attacks, the procedures defined in
   [RFC5063] and described in this document provide the ability for
   the restarting RSVP agents to recover the RSVP state in each
   restarting node corresponding to the LSPs, with the least possible
   perturbation to the rest of the network. These procedures can be
   considered to provide mechanisms by which the GMPLS network can
   recover from physical attacks or from attacks on remotely
   controlled power supplies.

   The procedures described are such that only the neighboring RSVP
   agents should notice the restart of a node, and hence only they
   need to perform additional processing. This allows for a network
   with active LSPs to recover LSP state gracefully from an external
   attack, without perturbing the data/forwarding plane state, and
   without propagating the error condition in the control or data
   plane. In other words, the effect of the restart (which might be
   the result of an attack) does not spread into the network.

   Note that concern has been expressed about the vulnerability of a
   restarting node to false messages received from its neighbors. For
   example, a restarting node might receive a false Path message with
   a RECOVERY_LABEL object from an upstream neighbor, or a false
   RecoveryPath message from its downstream neighbor. This situation
   might arise in one of four cases:

   - The message is spoofed and does not come from the neighbor at all.

   - The message has been modified as it was traveling from the
     neighbor.

   - The neighbor is defective and has generated a message in error.

   - The neighbor has been subverted and has a "rogue" RSVP agent.

   The first two cases may be handled using standard RSVP
   authentication and integrity procedures [RFC3209], [RFC3473].
If 674 the operator is particularly worried, the control plane may be 675 operated using IPsec [RFC4301], [RFC4302], [RFC4835], [RFC4306], 676 and [RFC2411]. 678 Protection against defective or rogue RSVP implementations is 679 generally hard to impossible. Neighbor-to-neighbor authentication 680 and integrity validation is, by definition, ineffective in these 681 situations. For example, if a neighbor node sends a Resv during 682 normal LSP setup, and if that message carries a GENERALIZED_LABEL 683 object carrying an incorrect label value, then the receiving LSR 684 will use the supplied value and the LSP will be set up incorrectly. 685 Alternatively, if a Path message is modified by an upstream LSR to 686 change the destination and explicit route, there is no way for the 687 downstream LSR to detect this, and the LSP may be set up to the 688 wrong destination. Furthermore, the upstream LSR could disguise 689 this fact by modifying the recorded route reported in the Resv 690 message. Thus, these issues are in no way specific to the restart 691 case, do not cause any greater or different problems from the 692 normal case, and do not warrant specific security measure 693 applicable to restart scenarios. 695 Note that the RSVP POLICY_DATA object [RFC2205] provides a scope by 696 which secure end-to-end checks could be applied. However, very 697 little definition of the use of this object has been made to date. 699 See [MPLS-SEC] for a wider discussion of security in MPLS and GMPLS 700 networks. 702 8. IANA Considerations 704 This document defines no new protocols or extensions and makes no 705 requests to IANA for registry management. 707 9. Acknowledgments 709 We would like to thank Adrian Farrel, Dimitri Papadimitriou, and 710 Lou Berger for their useful comments. 712 10. References 714 10.1. Normative References 716 [RFC2209] R. Braden, L. Zhang, "Resource ReSerVation Protocol (RSVP) 717 -- Version 1 Message Processing Rules", RFC 2209, September 718 1997. 

[RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
          and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
          Tunnels", RFC 3209, December 2001.

[RFC3473] Berger, L., "Generalized Multi-Protocol Label Switching
          (GMPLS) Signaling Resource ReserVation Protocol-Traffic
          Engineering (RSVP-TE) Extensions", RFC 3473, January 2003.

[RFC5063] Satyanarayana, A. and R. Rahman, "Extensions to GMPLS RSVP
          Graceful Restart", RFC 5063, September 2007.

10.2. Informative References

[MPLS-SEC] Fang, L., "Security Framework for MPLS and GMPLS
           Networks",
           draft-ietf-mpls-mpls-and-gmpls-security-framework, Work
           in Progress.

[RFC2205] Braden, R. (Ed.), Zhang, L., Berson, S., Herzog, S., and
          S. Jamin, "Resource ReserVation Protocol -- Version 1
          Functional Specification", RFC 2205, September 1997.

[RFC2411] Thayer, R., Doraswamy, N., and R. Glenn, "IP Security
          Document Roadmap", RFC 2411, November 1998.

[RFC4301] Kent, S. and K. Seo, "Security Architecture for the
          Internet Protocol", RFC 4301, December 2005.

[RFC4302] Kent, S., "IP Authentication Header", RFC 4302, December
          2005.

[RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
          RFC 4306, December 2005.

[RFC4835] Manral, V., "Cryptographic Algorithm Implementation
          Requirements for Encapsulating Security Payload (ESP) and
          Authentication Header (AH)", RFC 4835, April 2007.

11. Authors' Addresses

Dan Li
Huawei Technologies
F3-5-B R&D Center, Huawei Base,
Shenzhen 518129, China

Phone: +86 755 28970230
Email: danli@huawei.com

Jianhua Gao
Huawei Technologies
F3-5-B R&D Center, Huawei Base,
Shenzhen 518129, China

Phone: +86 755 28972902
Email: gjhhit@huawei.com

Arun Satyanarayana
Cisco Systems
170 West Tasman Dr
San Jose, CA 95134, USA

Phone: +1 408 853-3206
Email: asatyana@cisco.com

Snigdho C. Bardalai
Fujitsu Network Communications
2801 Telecom Parkway
Richardson, Texas 75082, USA

Phone: +1 972 479 2951
Email: snigdho.bardalai@us.fujitsu.com

12. Full Copyright Statement

Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.

All IETF Documents and the information contained therein are provided
on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.

13. Intellectual Property Statement

The IETF Trust takes no position regarding the validity or scope of
any Intellectual Property Rights or other rights that might be
claimed to pertain to the implementation or use of the technology
described in any IETF Document or the extent to which any license
under such rights might or might not be available; nor does it
represent that it has made any independent effort to identify any
such rights.

Copies of Intellectual Property disclosures made to the IETF
Secretariat and any assurances of licenses to be made available, or
the result of an attempt made to obtain a general license or
permission for the use of such proprietary rights by implementers or
users of this specification can be obtained from the IETF on-line IPR
repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
any standard or specification contained in an IETF Document. Please
address the information to the IETF at ietf-ipr@ietf.org.

The definitive version of an IETF Document is that published by, or
under the auspices of, the IETF. Versions of IETF Documents that are
published by third parties, including those that are translated into
other languages, should not be considered to be definitive versions
of IETF Documents. The definitive version of these Legal Provisions
is that published by, or under the auspices of, the IETF. Versions of
these Legal Provisions that are published by third parties, including
those that are translated into other languages, should not be
considered to be definitive versions of these Legal Provisions.

For the avoidance of doubt, each Contributor to the IETF Standards
Process licenses each Contribution that he or she makes as part of
the IETF Standards Process to the IETF Trust pursuant to the
provisions of RFC 5378. No language to the contrary, or terms,
conditions or rights that differ from or are inconsistent with the
rights and licenses granted under RFC 5378, shall have any effect and
shall be null and void, whether published or posted by such
Contributor, or included with or in such Contribution.