CCAMP Working Group                                               Yi Lin
Internet Draft                                       Huawei Technologies
Intended status: Standards Track                        November 3, 2019
Expires: May 2020


          RSVP-TE Extensions in Support of Proactive Protection
            draft-lin-ccamp-gmpls-proactive-protection-00.txt

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on May 3, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   This document describes protocol-specific procedures and extensions
   for Generalized Multi-Protocol Label Switching (GMPLS) Resource
   ReSerVation Protocol - Traffic Engineering (RSVP-TE) signaling to
   support Label Switched Path (LSP) Proactive Protection, which
   creates the protection LSP after a failure is predicted and before
   it becomes a real failure.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Overview of Predicted Failure and Related Recovery Methods
      3.1. Predicted Failure
      3.2. Proactive Protection
   4. Modified PROTECTION Object Format
   5. Extension to ERROR_SPEC Object
      5.1. New Error Code / Sub-code
      5.2. New TLV in ERROR_SPEC Object
   6. End-to-end Proactive Protection
      6.1. Creation of the Protected LSP
      6.2. Notification of Predicted Failure Event
      6.3. Tearing Down of the Protection LSP
   7. Proactive Segment Protection
      7.1. Creation of the Protected LSP
      7.2. Notification of Predicted Failure Event
      7.3. Tearing Down of the Segment Recovery LSP
      7.4. Priority and Resource Pre-emption
   8. Consideration of Backward Compatibility
   9. Security Considerations
   10. IANA Considerations
   11. References
      11.1. Normative References
      11.2. Informative References
   12. Authors' Addresses

1. Introduction

   [RFC4872] and [RFC4873] describe protocol-specific procedures and
   extensions for GMPLS RSVP-TE signaling to support end-to-end LSP
   recovery (including protection and restoration) and segment LSP
   recovery, respectively.

   Traditional protection solutions (e.g., 1+1 or 1:1 protection)
   provide a very fast protection switch after a failure happens, but
   consume twice the network resources during the whole lifetime of the
   LSP.  On the other hand, traditional restoration solutions use
   network resources much more efficiently, but recovery of the LSP is
   much slower because of the additional signaling time needed to
   create the restoration LSP.

   In order to reduce the resources used for recovery while keeping the
   very fast protection switch, one approach is to use failure
   prediction technologies and to create 1+1 or 1:1 protection only
   when a potential failure is predicted.  This approach is referred to
   as "Proactive Protection" in this document.

   This document extends the RSVP-TE protocol to support the control of
   Proactive Protection.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3. Overview of Predicted Failure and Related Recovery Methods

3.1. Predicted Failure

   In most cases, there are some indications before a physical failure
   happens in a network, for example, abnormal fluctuation of the noise
   of a lightpath, a rising BER (Bit Error Rate) before error
   correction, or a rising transponder temperature.

   Therefore, by monitoring certain physical parameters and analyzing
   their trends using, for example, Machine Learning (ML) or other
   technologies, a node may be able to predict whether a failure will
   happen in an upcoming period of time.

   Note that a predicted failure is different from a Signal Degrade in
   that:

   - When Signal Degrade happens to a connection, the connection is
     still available but the quality of the signal carried by this
     connection has declined and is lower than the predetermined
     threshold.  For example, the BER of a connection rises and is out
     of tolerance.

   - When a predicted failure of a connection is inferred, neither
     failure nor degradation has happened yet, but there is a trend
     indicating that a failure will probably happen after a period of
     time, which will cause Signal Fail or Signal Degrade.

   The methods to predict failures are outside the scope of this
   document.

3.2. Proactive Protection

   "Proactive Protection" refers to an LSP protection approach that
   creates the protection LSP after a failure is predicted and before
   it becomes a real failure.  Both end-to-end protection (defined in
   [RFC4872]) and segment protection (defined in [RFC4873]) are
   applicable to Proactive Protection.

   The main procedure of Proactive Protection is shown in Figure 1:

      |-> Predicted failure notification received
      |   |-> Proactive Protection path created
      |   |               |-> Real failure happens
      |   |               | |-> Protection switch finished
      |   |               | |
      |   |               | |     Protection path deleted <-|
      |   |               | |     if no failure happened    |
      |   |               | |                               |
      |   |        t3     | |                          t6   |
   ---+---+--------+======x=+==========================+----+---> t
      t1  t2       |     t4 t5                         |    t7
                   |                                   |
                   |<--Predicted failure time period-->|

              Figure 1: Overview of Proactive Protection

   - t1: The protection source node of an LSP is notified that a
     failure will probably happen during t3~t6, so it starts to create
     1+1 or 1:1 protection of the connection.
     Here the protection source node can be the source node of the LSP
     (for the end-to-end protection case), or a branch node located
     between the source node and the predicted failure point of the LSP
     (for the segment protection case).

   - t2: The 1+1 or 1:1 protecting path is created between the
     protection source node and the protection destination node.  Here
     the protection destination node can be the destination node of the
     LSP (for the end-to-end protection case), or a merge node located
     between the predicted failure point and the destination node of
     the LSP (for the segment protection case).

   - t4: If a real failure happens as predicted, the 1+1 or 1:1
     protection switch is triggered.

   - t5: The protection switch is finished and the service carried by
     the connection is recovered.

   - t7: If the predicted failure does not in fact happen, and no
     further predicted failure notification is received, the protection
     source node MAY tear down the protecting path after t6, in order
     to save network resources.

4. Modified PROTECTION Object Format

   This document modifies the PROTECTION object (C-Type=2) by adding
   two new bits, T and A, in reserved fields, as shown in Figure 2
   below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             | Class-Num(37) | C-Type (2)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|P|N|O|T| Res.    | LSP Flags |     Reserved      | Link Flags|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |I|R|A|   Reserved  | Seg.Flags |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 2: The modified PROTECTION object (C-Type=2)

   - T (Triggered End-to-end Proactive Protection): 1 bit.  When set
     (1), it indicates that end-to-end Proactive Protection is
     required.

     Note that if the T bit is set (1), the LSP Flags SHOULD be one of:
        0x04  1:N Protection with Extra-Traffic
        0x08  1+1 Unidirectional Protection
        0x10  1+1 Bidirectional Protection

   - A (proActive Segment Protection): 1 bit.  When set (1), it
     indicates that Proactive Segment Protection is required.

     Note that if the A bit is set (1), the Seg. Flags SHOULD be one
     of:
        0x04  1:N Protection with Extra-Traffic
        0x08  1+1 Unidirectional Protection
        0x10  1+1 Bidirectional Protection

   See [RFC4872] and [RFC4873] for the definition of other fields.

5. Extension to ERROR_SPEC Object

5.1. New Error Code / Sub-code

   A new Error Sub-code under Error Code "25 - Notify Error" is defined
   in this document, which is used to notify the event of a predicted
   failure:

   Error Code = 25: "Notify Error" (see [RFC3209])

   Error Sub-code = TBA: "Notify Error/LSP Local Predicted Failure"

5.2. New TLV in ERROR_SPEC Object

   When a failure is predicted, the time before which the failure may
   happen may also be predicted.  This time information helps the
   source node to know how long it should wait for the predicted
   failure to become a real failure, and to decide when it is safe to
   tear down the protection LSP if the predicted failure does not
   happen.

   A new TLV in the IPv4/IPv6 IF_ID ERROR_SPEC Object is defined in
   this document, which is used to indicate the time before which the
   predicted failure will probably become a real failure.
   The format of this new TLV is shown in Figure 3 below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = TBA           |          Length = 8           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Time                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 3: New TLV (type=TBA) in ERROR_SPEC Object

   - Type: TBA

   - Length: 8

   - Time: A relative time measured in seconds, which indicates within
     how many seconds (from the current time) the predicted failure
     will probably become a real failure.

6. End-to-end Proactive Protection

6.1. Creation of the Protected LSP

   To create an LSP with a recovery type of "End-to-end Proactive
   Protection", the source node of the LSP generates a Path message
   with a PROTECTION object included.  The T bit in the PROTECTION
   object MUST be set to 1 (End-to-end Proactive Protection), so that
   all other nodes along the LSP can start the failure prediction
   function on related links/nodes.

   Note that the N bit in the PROTECTION object is used to indicate
   whether the control plane message exchange is only used for
   notification or for protection-switching purposes after a real
   failure happens; see [RFC4872].  In other words, the N bit has
   nothing to do with the notification of a predicted failure before a
   real failure happens.

   To allow the predicted failure event to be notified to the source
   node by the Notify message, the NOTIFY REQUEST object MUST also be
   included in the Path message (see [RFC3473]), where the "Notify Node
   Address" SHOULD be the address of the source node of the LSP.

6.2. Notification of Predicted Failure Event

   When an intermediate node on an LSP infers that a failure will
   happen and will affect the LSP, it sends a Notify message to the
   source node of the LSP to report the predicted failure event.  A new
   error code/sub-code "Notify Error/LSP Local Predicted Failure" is
   used in the ERROR_SPEC object or IF_ID_ERROR_SPEC object in the
   Notify message.

   The Notify message MAY also include a TLV (type = TBA) in the IPv4
   or IPv6 IF_ID_ERROR_SPEC object, to indicate the time before which
   the predicted failure will probably become a real failure.

   On receiving the Notify message with error code/sub-code "Notify
   Error/LSP Local Predicted Failure", the source node of the LSP
   SHOULD trigger the procedure to create the protection LSP, according
   to the protection type indicated in the "LSP Flags" field of the
   PROTECTION object in the Path message for the protected LSP.  The
   procedures for creating the protection LSP and for protection
   switching after a real failure happens are described in [RFC4872].

6.3. Tearing Down of the Protection LSP

   After the protection LSP is created, the source node MAY start a
   timer T_wait and wait for the predicted failure to become a real
   failure.  If no real failure happens and no further notification of
   a predicted failure is received before T_wait expires, the source
   node MAY trigger the procedure to tear down the protection LSP,
   according to local policy.  See [RFC4872] for the process of tearing
   down a protection LSP.

   Implementations SHOULD allow this policy to be configured to provide
   a default across all LSPs on a node, but SHOULD also allow it to be
   configured per LSP.

   Note that T_wait MUST be longer than the time indicated in the TLV
   (type=TBA) in the ERROR_SPEC object in the Notify message, if the
   TLV exists.

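   As a non-normative illustration, the following Python sketch shows
   one way a source node might decode the TLV of Section 5.2 and pick
   a T_wait that satisfies the rule above.  It assumes that Type and
   Length are 16 bits each and that Length counts the whole 8-octet
   TLV; the type value 0xFFFF, the function names, and the one-second
   margin are placeholders chosen for this example and are not defined
   by this document.

```python
import struct
from typing import Optional

# Hypothetical placeholder: this document leaves the TLV type as TBA.
PREDICTED_FAILURE_TIME_TLV_TYPE = 0xFFFF
TLV_LENGTH = 8  # Assumed to cover Type + Length + Time (2 + 2 + 4 octets).


def encode_time_tlv(seconds: int) -> bytes:
    """Pack Type, Length and Time in network byte order."""
    return struct.pack("!HHI", PREDICTED_FAILURE_TIME_TLV_TYPE,
                       TLV_LENGTH, seconds)


def decode_time_tlv(data: bytes) -> int:
    """Return the Time field in seconds; raise ValueError if malformed."""
    if len(data) != TLV_LENGTH:
        raise ValueError("TLV must be exactly 8 octets")
    tlv_type, length, seconds = struct.unpack("!HHI", data)
    if tlv_type != PREDICTED_FAILURE_TIME_TLV_TYPE or length != TLV_LENGTH:
        raise ValueError("not a predicted-failure time TLV")
    return seconds


def choose_t_wait(local_wait: int, tlv: Optional[bytes],
                  margin: int = 1) -> int:
    """Pick T_wait (seconds) so that it exceeds the TLV time, if present."""
    if margin < 1:
        raise ValueError("margin must be at least 1 second")
    if tlv is None:
        # No time hint in the Notify message: use the local value as is.
        return local_wait
    return max(local_wait, decode_time_tlv(tlv) + margin)
```

   For example, with a locally configured wait of 10 seconds and a TLV
   carrying 30 seconds, this sketch yields a T_wait of 31 seconds; with
   a local wait of 60 seconds, the local value is kept.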
   Note also that the value of T_wait is a local matter of the source
   node, and is outside the scope of this document.

7. Proactive Segment Protection

7.1. Creation of the Protected LSP

   To create an LSP with a recovery type of "Proactive Segment
   Protection", the source node of the LSP generates a Path message,
   where:

   - A PROTECTION object is included, in which the A bit MUST be set
     to 1 (Proactive Segment Protection), so that all nodes along the
     protected LSP can start the failure prediction function on related
     links/nodes, if supported.  The "Seg. Flags" are used to indicate
     the protection type of the Proactive Segment Protection.

   - One or more SERO objects MAY be included (i.e., explicit Proactive
     Segment Protection), indicating the branch node and the merge node
     of each segment recovery LSP.  If no SERO object is included, the
     dynamic Proactive Segment Protection method is used.

   - A NOTIFY REQUEST object is included, where the "Notify Node
     Address" SHOULD be the address of the source node of the LSP.

   For explicit Proactive Segment Protection, when a branch node
   receives a Path message with the A bit set to 1 in the PROTECTION
   object, the branch node follows [RFC4873] to process the Path
   message, except that the Path message for the recovery LSP is not
   generated or sent at this stage.  Also, one more NOTIFY REQUEST
   object SHOULD be added to the Path message of the protected LSP,
   carrying the address of this branch node.

   For dynamic Proactive Segment Protection, when an intermediate node
   receives a Path message with the A bit set to 1 in the PROTECTION
   object, the node determines whether it is able to act as a branch
   node, as described in Section 6.2 of [RFC4873].  If so, it follows
   the same procedure as a branch node does in the case of explicit
   Proactive Segment Protection, as described above.
   If not, the node only follows the standard procedure to create the
   protected LSP.

7.2. Notification of Predicted Failure Event

   When an intermediate node between a pair of branch and merge nodes
   on an LSP infers that a failure will happen and will affect the LSP,
   it sends a Notify message to the nearest branch node in the upstream
   direction of the LSP to report the predicted failure event.  The
   error code/sub-code "Notify Error/LSP Local Predicted Failure" is
   used in the ERROR_SPEC object or IF_ID_ERROR_SPEC object in the
   Notify message.

   Similar to End-to-end Proactive Protection, the time before which
   the predicted failure may occur MAY also be included in the Notify
   message.

   On receiving the Notify message with error code/sub-code "Notify
   Error/LSP Local Predicted Failure", the branch node on the protected
   LSP SHOULD generate a new Path message, and send this new Path
   message along the recovery LSP between the branch and the merge
   nodes.  The procedures for generating the new Path message and
   creating the recovery LSP are the same as described in [RFC4873],
   except that the A bit in the PROTECTION object of this new Path
   message MUST be set to 1.

7.3. Tearing Down of the Segment Recovery LSP

   After the segment recovery LSP is created, the branch node MAY start
   a timer T_wait and wait for the predicted failure to become a real
   failure.  If no real failure happens and no further notification of
   a predicted failure is received before T_wait expires, the branch
   node MAY trigger the procedure to tear down the segment recovery
   LSP, according to local policy.  See [RFC4873] for the process of
   tearing down a segment recovery LSP.

   Implementations SHOULD allow this policy to be configured to provide
   a default across all LSPs on a node, but SHOULD also allow it to be
   configured per LSP.

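   As a non-normative illustration of the configuration model described
   above, the following Python sketch keeps a node-wide default
   tear-down policy together with optional per-LSP overrides.  The
   class names, field names, and LSP identifiers are hypothetical and
   are used only for this example; this document does not define a
   configuration schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TeardownPolicy:
    auto_teardown: bool   # Tear down when T_wait expires with no failure.
    t_wait_seconds: int   # Locally chosen T_wait value.


@dataclass
class NodeConfig:
    default: TeardownPolicy
    per_lsp: Dict[str, TeardownPolicy] = field(default_factory=dict)

    def policy_for(self, lsp_id: str) -> TeardownPolicy:
        """Return the per-LSP policy if configured, else the node default."""
        return self.per_lsp.get(lsp_id, self.default)


def should_tear_down(policy: TeardownPolicy, elapsed_seconds: int,
                     real_failure: bool, new_prediction: bool) -> bool:
    """Decide whether the recovery LSP may be torn down now."""
    if real_failure or new_prediction:
        # The recovery LSP is in use or still needed: keep it.
        return False
    return policy.auto_teardown and elapsed_seconds >= policy.t_wait_seconds
```

   A node configured this way could, for instance, enable automatic
   tear-down by default while disabling it for one specific LSP by
   adding an entry for that LSP to per_lsp.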
   Note that T_wait MUST be longer than the time indicated in the TLV
   (type=TBA) in the ERROR_SPEC object in the Notify message, if the
   TLV exists.

   Note also that the value of T_wait is a local matter of the branch
   node, and is outside the scope of this document.

7.4. Priority and Resource Pre-emption

   It is possible that, after the segment recovery LSP is created and
   before the predicted failure becomes a real failure, another real
   failure happens on the LSP outside the protected segment.  In this
   case, the source node (or an intermediate node upstream of the real
   failure) may start a restoration procedure to recover the LSP.  For
   the same protected LSP, since recovering from a real failure always
   has higher priority than protecting against a predicted failure that
   has not yet happened, the restoration LSP can pre-empt the resources
   of the segment recovery LSP.

   As shown in Figure 4, assume that node B (branch node) was notified
   of a predicted failure event between N-4 and M (merge node), and has
   created the segment recovery LSP along B, N-1, N-2, N-3 and M.  If
   another failure happens between S (source node) and B before the
   predicted failure becomes a real failure, node S will try to create
   the restoration LSP.  Because resources are limited, the restoration
   LSP can pre-empt the resources of the segment recovery LSP between
   N-1 and N-3.

   The nodes along the segment recovery LSP have enough information to
   determine whether pre-emption is allowed.  This is because these
   nodes know that:

   - The current segment recovery LSP is used for Proactive Segment
     Protection, through the A bit in the PROTECTION object;

   - The segment recovery LSP and the restoration LSP are protecting
     the same LSP, through the association relationship.

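   As a non-normative illustration, the two conditions above can be
   sketched as a simple check that a node on the segment recovery LSP
   might apply.  The LspState record and its fields are hypothetical
   stand-ins for the state a node derives from the PROTECTION object
   and the association relationship; they are not defined by this
   document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LspState:
    association_id: int    # Hypothetical key identifying the protected LSP.
    a_bit: bool            # A bit from the PROTECTION object.
    is_restoration: bool   # True if set up to recover from a real failure.


def may_preempt(holder: LspState, requester: LspState) -> bool:
    """Return True if 'requester' may take the resources of 'holder'.

    Pre-emption is allowed only when the holder is a proactive segment
    recovery LSP, the requester is a restoration LSP, and both are
    associated with the same protected LSP.
    """
    return (
        holder.a_bit
        and not holder.is_restoration
        and requester.is_restoration
        and holder.association_id == requester.association_id
    )
```

   In the scenario of Figure 4, the holder would be the segment
   recovery LSP created by node B and the requester would be the
   restoration LSP signaled by node S.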
                      |<------ Pre-emption ------>|
                      |                           |
     ***************************************************************
     *+---+         +---+         +---+         +---+         +---+*
     *|   +---------+N-1+---------+N-2+---------+N-3+---------+   |*
     *+-+-+         +-+-+         +---+         +-+-+         +-+-+*
     *  |             |###########################|             |  *
     *  |             |#                         #|             |  *
     *  |             |#                         #|             |  *
     *+-+-+         +-+-+         +---+         +-+-+         +-+-+*
   ***| S +----X----+ B +---------+N-4+----?----+ M +---------+ D |***
      +---+         +---+         +---+         +---+         +---+
   ===================================================================

   S: Source node       D: Destination node
   B: Branch node       M: Merge node
   X: Real failure      ?: Predicted failure (has not happened yet)

   =====: Protected LSP
   #####: Segment Recovery LSP
   *****: Restoration LSP

          Figure 4: Resource pre-emption by restoration LSP

8. Consideration of Backward Compatibility

   TBD.

   [Editor's note]: A description of interworking with legacy nodes
   that do not support failure prediction and reporting will be added.

9. Security Considerations

   TBD.

10. IANA Considerations

   IANA assigns values to RSVP protocol parameters.  Within the current
   document, a new Error code/sub-code value is defined:

   Error Code = 25: "Notify Error" (see [RFC3209])

   o "Notify Error/LSP Local Predicted Failure" (TBA)

11. References

11.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, DOI
             10.17487/RFC2119, March 1997.

   [RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
             and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
             Tunnels", RFC 3209, December 2001.

   [RFC3473] Berger, L., Ed., "Generalized Multi-Protocol Label
             Switching (GMPLS) Signaling Resource ReserVation Protocol-
             Traffic Engineering (RSVP-TE) Extensions", RFC 3473,
             January 2003.

   [RFC4872] Lang, J., Ed., Rekhter, Y., Ed., and D.
             Papadimitriou, Ed., "RSVP-TE Extensions in Support of
             End-to-End Generalized Multi-Protocol Label Switching
             (GMPLS) Recovery", RFC 4872, May 2007.

   [RFC4873] Berger, L., Bryskin, I., Papadimitriou, D., and A. Farrel,
             "GMPLS Segment Recovery", RFC 4873, May 2007.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
             2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
             May 2017.

11.2. Informative References

   [RFC4426] Lang, J., Ed., Rajagopalan, B., Ed., and D. Papadimitriou,
             Ed., "Generalized Multi-Protocol Label Switching (GMPLS)
             Recovery Functional Specification", RFC 4426, March 2006.

12. Authors' Addresses

   Yi Lin
   Huawei Technologies
   F3 R&D Center, Huawei Industrial Base,
   Bantian, Longgang District,
   Shenzhen 518129 P.R.China
   Email: yi.lin@huawei.com