Network Working Group                                        N. Sprecher
Internet Draft                                    Nokia Siemens Networks
Category: Informational                                        A. Farrel
Created: July 7, 2008                                 Old Dog Consulting
Expires: January 7, 2009                                     V. Kompella
                                                          Alcatel-Lucent

           Multiprotocol Label Switching Transport Profile
                        Survivability Framework

                draft-sprecher-mpls-tp-survive-fwk-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.
Abstract

   Network survivability is the network's ability to restore traffic
   following failure or attack; it is a critical factor in the delivery
   of reliable services in transport networks. Guaranteed services in
   the form of Service Level Agreements (SLAs) require a resilient
   network that very rapidly detects facility or node failures and
   immediately starts to restore network operations in accordance with
   the terms of the SLA.

   The Transport Profile of Multiprotocol Label Switching (MPLS-TP) is
   a packet transport technology that combines the packet experience of
   MPLS with the operational experience of SONET/SDH. It provides
   survivability mechanisms, such as protection and restoration, with
   levels of function similar to those found in established transport
   networks such as SONET/SDH networks. Some of the MPLS-TP protection
   mechanisms are data plane-driven and are based on MPLS-TP OAM fault
   management functions, which are used to trigger protection switching
   in the absence of a control plane. Other protection mechanisms
   utilize the MPLS-TP control plane.

   This document provides a framework for MPLS-TP survivability.

Table of Contents

   1. Introduction
   2. Terminology and References
   3. Requirements for Survivability
   4. Functional Architecture
      4.1. Elements of Control
         4.1.1. Manual Control
         4.1.2. Failure-Triggered Actions
         4.1.3. OAM Signaling
         4.1.4. Control Plane Signaling
      4.2. Elements of Recovery
         4.2.1. Span Recovery
         4.2.2. Segment Recovery
         4.2.3. End-to-End Recovery
      4.3. Levels of Recovery
         4.3.1. Dedicated Protection
         4.3.2. Shared Protection
         4.3.3. Extra Traffic
         4.3.4. Restoration and Repair
         4.3.5. Reversion
      4.4. Mechanisms for Recovery
         4.4.1. Link-Level Protection
         4.4.2. Alternate Paths and Segments
         4.4.3. Bypass Tunnels
      4.5. Protection in Different Topologies
         4.5.1. Mesh Networks
         4.5.2. Ring Networks
         4.5.3.
      4.6. Recovery in Layered Networks
         4.6.1. Inherited Link-Level Protection
         4.6.2. Shared Risk Groups
         4.6.3. Fault Correlation
   5. Mechanisms for Providing Protection in MPLS-TP
      5.1. Management Plane
         5.1.1. Configuration of Protection Operation
         5.1.2. Forced Protection Actions
         5.1.3. Blocked Protection Actions
      5.2. Fault Detection
      5.3. Fault Isolation
      5.4. OAM Signaling
         5.4.1. Fault Detection
         5.4.2. Fault Isolation
         5.4.3. Fault Reporting
         5.4.4. Coordination of Recovery Actions
      5.5. Control Plane Signaling
         5.5.1. Fault Detection
         5.5.2. Fault Isolation
         5.5.3. Fault Reporting
         5.5.4. Coordination of Recovery Actions
   6. Pseudowire Protection Considerations
      6.1. Utilizing Underlying MPLS-TP Protection
      6.2. Protection in the Pseudowire Layer
   7. Manageability Considerations
   8. Security Considerations
   9. IANA Considerations
   10. Acknowledgments
   11. References
      11.1. Normative References
      11.2. Informative References
   12. Editors' Addresses
   13. Intellectual Property Statement

1. Introduction

   Network survivability is the network's ability to restore traffic
   following failure or attack; it is a critical factor in the delivery
   of reliable services in transport networks. Guaranteed services in
   the form of Service Level Agreements (SLAs) require a resilient
   network that very rapidly detects facility or node failures, and
   immediately starts to restore network operations in accordance with
   the terms of the SLA.

   The Transport Profile of Multiprotocol Label Switching (MPLS-TP)
   [MPLS-TP-JWT], [MPLS-TP-REQ] is a packet transport technology that
   combines the packet experience of MPLS with the operational
   experience of SONET/SDH.
   MPLS-TP is designed to be consistent with existing transport network
   operations and management models, and provides survivability
   mechanisms, such as protection and restoration, with levels of
   function similar to those found in established transport networks
   such as SONET/SDH, which have given service providers a high
   benchmark for reliability.

   This document provides a framework for MPLS-TP-based survivability.
   It uses the recovery terminology defined in [RFC4427], which draws
   heavily on [G.808.1], and refers to the requirements specified in
   [MPLS-TP-REQ].

   Various recovery schemes (for protection and restoration) and
   processes have been defined and analyzed in [RFC4427] and [RFC4428].
   These schemes may also be applied in MPLS-TP networks to re-
   establish end-to-end traffic delivery within the agreed service
   level, and so recover from 'failed' or 'degraded' transport entities
   (links or nodes). Such actions are normally initiated by the
   detection of a defect or performance degradation, or by an external
   request (e.g., an operator request for manual control of protection
   switching).

   [RFC4427] makes a distinction between protection switching and
   restoration mechanisms. Protection switching makes use of
   pre-assigned capacity between nodes, where the simplest scheme has
   one dedicated protection entity for each working entity, while the
   most complex scheme has m protection entities shared between n
   working entities (m:n). Protection switching may be either
   unidirectional or bidirectional. Restoration uses any capacity
   available between nodes and usually involves re-routing. The
   resources used for restoration may be pre-planned, and recovery
   priority may be used as a differentiation mechanism to determine
   which services are recovered and which are not recovered, or are
   sacrificed in order to achieve recovery of other services.
   In general, protection actions are completed within time frames of
   tens of milliseconds, while restoration actions are normally
   completed in periods ranging from hundreds of milliseconds to a
   maximum of a few seconds.

   However, the recovery schemes described in [RFC4427] and evaluated
   in [RFC4428] assume some control plane-driven actions that are
   performed in the recovery context. As for other transport
   technologies and associated transport networks, the presence of a
   distributed control plane in support of MPLS-TP network operations
   is optional, and the absence of such a control plane does not affect
   the ability to operate the network and to use MPLS-TP forwarding,
   OAM, and protection capabilities.

   Thus, some of the MPLS-TP recovery mechanisms do not depend on a
   control plane, and rely instead on MPLS-TP OAM capabilities to
   trigger protection switching. These mechanisms are data plane-driven
   and are based on MPLS-TP OAM fault management functions. "Fault
   management" in this context refers to failure detection,
   localization, and notification (where the term "failure" is used to
   represent both signal failure and signal degradation).

   The principles of MPLS-TP protection switching operation are similar
   to those defined in [RFC4427], as the protection mechanism is based
   on the ability to detect certain defects in the transport entities
   within the protected domain. The protection switching controller
   does not care which monitoring method is used, as long as it can be
   given information about the status of the transport entities within
   the recovery domain (e.g., 'OK', signal failure, or signal
   degradation).

   An MPLS-TP OAM Automatic Protection Switching (APS) protocol may be
   used as an in-band (i.e., data plane-based) control protocol to
   align both ends of the protected domain.
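   The alignment role that such an in-band protocol plays can be
   pictured with a deliberately simplified model. The classes, message
   names, and fields below are hypothetical illustrations, not the
   MPLS-TP APS PDU definition (which is outside the scope of this
   framework): on a local signal fail, one end selects the protection
   entity and sends an in-band request so that the far end makes the
   same selection.

```python
# Simplified sketch of in-band coordination aligning both ends of a
# 1:1 protected domain. Class and message names are hypothetical; this
# is not the MPLS-TP APS protocol itself.

WORKING, PROTECTION = "working", "protection"

class ProtectionController:
    """One end of a 1:1 protected domain."""
    def __init__(self, name):
        self.name = name
        self.selected = WORKING   # entity currently carrying traffic
        self.peer = None          # far-end controller (set after creation)

    def local_failure(self):
        """The data plane reports signal fail on the working entity."""
        self.selected = PROTECTION
        # Tell the far end, in-band, so that bridging and selection
        # happen on the same entity at both ends of the domain.
        self.peer.receive("SIGNAL_FAIL")

    def receive(self, request):
        if request == "SIGNAL_FAIL":
            self.selected = PROTECTION

a = ProtectionController("A")
z = ProtectionController("Z")
a.peer, z.peer = z, a

a.local_failure()   # a unidirectional fault observed only at A
assert a.selected == z.selected == PROTECTION  # both ends now aligned
```

   Without the in-band exchange, the end that did not observe the fault
   would keep selecting the working entity, and traffic in one
   direction would be lost; the exchange is what makes bidirectional
   switching possible.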
   The MPLS-TP protection mechanisms may be applied at various levels
   throughout the MPLS-TP network, as is the case with the recovery
   schemes defined in [RFC4427] and [RFC4873]. A Label Switched Path
   (LSP) may be subject to span, segment, and/or end-to-end recovery,
   where:

   - span protection refers to the protection of an individual link
     (and hence all or a subset of the LSPs routed over the link)
     between two neighboring switches;

   - segment protection refers to the recovery of an LSP segment (i.e.,
     a tandem connection in the language of [MPLS-TP-REQ]) between two
     nodes that are the boundary nodes of the segment; and

   - end-to-end protection refers to the protection of an entire LSP
     from the ingress node to the egress node.

   Multiple recovery levels may be used concurrently by a single LSP
   for added resiliency.

   It is a basic requirement of MPLS-TP that both directions of a
   bidirectional LSP should be co-routed (that is, share the same route
   within the network) and be fate-sharing (that is, if one direction
   fails, both directions should cease to operate) [MPLS-TP-REQ]. This
   causes a direct interaction between the protection levels affecting
   the directions of an LSP, such that both directions of the LSP are
   switched to a new span, segment, or end-to-end path together.

   A protection scheme operating at the data plane level can function
   in a multi-domain environment; it should also protect against the
   failure of a boundary node in the case of inter-domain operation.

   The MPLS-TP recovery schemes apply to LSPs and pseudowires (PWE3).
   This document focuses on LSPs and handles both point-to-point (P2P)
   and point-to-multipoint (P2MP) LSPs.
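   The recovery scopes just listed can be pictured as path substitution
   over different extents of an LSP. The sketch below (illustrative
   only, with made-up node names) treats an LSP as a list of nodes and
   a recovery action as the replacement of the sub-path between two
   boundary nodes; end-to-end recovery then falls out as the special
   case where the boundary nodes are the ingress and egress.

```python
def recover(lsp, start, end, alternate):
    """Replace the sub-path of 'lsp' between boundary nodes 'start'
    and 'end' with 'alternate', which must also run from start to
    end. Returns the recovered path."""
    i, j = lsp.index(start), lsp.index(end)
    assert alternate[0] == start and alternate[-1] == end
    return lsp[:i] + alternate + lsp[j + 1:]

lsp = ["A", "B", "C", "D", "E"]

# Segment recovery: route around a failed node C between boundary
# nodes B and D, via a hypothetical alternate node X.
assert recover(lsp, "B", "D", ["B", "X", "D"]) == ["A", "B", "X", "D", "E"]

# End-to-end recovery: the protected segment is the whole LSP, so the
# boundary nodes are the ingress (A) and egress (E).
assert recover(lsp, "A", "E", ["A", "P", "Q", "E"]) == ["A", "P", "Q", "E"]
```

   Note how the segment case leaves node C off the recovered path: this
   is why node failure requires segment (or end-to-end) recovery, while
   a simple link failure can also be handled by span recovery between
   the same pair of neighbors.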
   This framework introduces the architecture of the MPLS-TP recovery
   domain and describes the recovery schemes in MPLS-TP (based on the
   recovery types defined in [RFC4427]), as well as the principles of
   operation, recovery states, recovery triggers, and the information
   exchanged between the different elements that sustain the reference
   model. The reference model is based on the MPLS-TP OAM reference
   model defined in [MPLS-TP-OAM].

   This framework also refers to recovery schemes that are optimized
   for specific topologies, such as linear, ring, and mesh, in order to
   handle protection switching in a cost-efficient manner.

   This document takes into account the timing coordination of
   protection switches at multiple layers. This prevents races and
   allows the protection switching mechanism of the server layer to fix
   a problem before switching is performed at the MPLS-TP layer.

   This framework also specifies the functions that must be supported
   by MPLS-TP OAM (e.g., APS) and by the management and/or control
   plane in order to support the recovery mechanisms.

   MPLS-TP introduces a tool kit to enable recovery in MPLS-TP-based
   transport networks and to ensure that affected traffic is restored
   in the event of a failure.

   Generally, network operators aim to provide the fastest, most
   stable, and best protection mechanism available at a reasonable
   cost. The higher the level of protection, the greater the number of
   resources consumed. It is therefore expected that network operators
   will offer a wide spectrum of service levels. MPLS-TP-based recovery
   offers the flexibility to select the recovery mechanism, choose the
   granularity at which traffic is protected, and also choose the
   specific types of traffic that are to be protected.
   With MPLS-TP-based recovery, it is possible to provide different
   levels of protection for different classes of service, based on
   their service requirements.

2. Terminology and References

   The terminology used in this document is consistent with that
   defined in [RFC4427]. That RFC is, itself, consistent with
   [G.808.1].

   However, certain protection concepts (such as ring protection) are
   not discussed in [RFC4427], and for those concepts, terminology in
   this document is drawn from [G.841].

   Readers should refer to those documents for normative definitions.
   This document supplies brief summaries of some terms for clarity and
   to aid the reader, but does not re-define terms.

   In particular, note the distinction and definitions made in
   [RFC4427] for the following three terms.

   - Protection: re-establishing end-to-end traffic using
     pre-allocated resources.
   - Restoration: re-establishing end-to-end traffic using resources
     allocated at the time of need. Sometimes referred to as "repair".
   - Recovery: a generic term covering both Protection and Restoration.

   Important background information can be found in [RFC3386],
   [RFC3469], [RFC4426], [RFC4427], and [RFC4428].

3. Requirements for Survivability

   MPLS-TP requirements are presented in [MPLS-TP-REQ]. Survivability
   is presented as a critical factor in the delivery of reliable
   services, and the requirements for survivability are set out using
   the recovery terminology defined in [RFC4427].

   These requirements are summarized below. This section may be updated
   if changes are made to [MPLS-TP-REQ], and that document should be
   regarded as normative for the definition of all MPLS-TP requirements
   including those for survivability.

   General:

   - Must support tandem network connection protection.
   - Must support LSP protection.
   - Must support pseudowire protection.
   - Must provide appropriate recovery times.
   - Must scale when many services are affected by a single fault.
   - Should support span protection.
   - Should support tandem connection protection.
   - Should support end-to-end protection.
   - Must support management plane control.
   - Must support control plane control.

   Restoration:

   - May support pre-planning of restoration resources.
   - May support computation of restoration resources after failure.
   - May support shared mesh restoration.
   - Should support soft LSP restoration (make-before-break).
   - May support hard LSP restoration (break-before-make).
   - Must be topology agnostic.
   - May support restoration priority.
   - May utilize preemption during restoration, but only under operator
     configuration.

   Protection:

   - Should be able to apply protection at different levels in the
     network.
   - Should operate in conjunction with protection in underlying
     networks.
   - Must support data plane triggered recovery.
   - Should be equally applicable to LSPs and pseudowires.
   - Must include mechanisms to detect, locate, notify, and remedy
     network faults.
   - May support 1:1 bidirectional protection switching, in which case
     protection switching must be synchronized.
   - May support 1+1 unidirectional protection switching.
   - Must be applicable to P2P LSPs.
   - Should be applicable to P2MP LSPs.
   - Must support a protection ratio of 100%.
   - Must support the operator's QoS objectives on the protection path.
   - May support extra traffic in 1:1 protection modes.
   - Must provide operator control and protection prioritization.
   - Must support revertive and non-revertive behavior.
   - Must provide mechanisms to prevent protection switching thrashing.
   - Must provide coordination between protection mechanisms at
     different layers.
   - May provide different mechanisms optimized for specific
     topologies.

4. Functional Architecture

   This section presents an overview of the elements of the functional
   architecture for survivability within an MPLS-TP network. The
   intention is to break the components out as separate items so that
   it can be seen how they may be combined to provide different levels
   of recovery to meet the requirements set out in the previous
   section.

4.1. Elements of Control

   Survivability is achieved through specific actions taken to repair
   network resources or to redirect traffic onto paths that avoid
   failures in the network. Those actions may be triggered
   automatically by the network devices, may be enhanced by data plane
   (i.e., OAM) or control plane signaling, and may be under the direct
   control of an operator.

   These different options are explored in the next sections.

4.1.1. Manual Control

   Of course, the survivability behavior of the network as a whole, and
   the reaction of each LSP when a fault is reported, may be under
   operator control. That is, the operator may establish network-wide
   or local policies that determine what actions will be taken when
   different failures are reported that affect different LSPs. At the
   same time, when a service request is made to cause the establishment
   of one or more LSPs in the network, the operator (or requesting
   application) may express a required or desired level of service, and
   this will be mapped to particular survivability actions taken before
   and during LSP setup, after the failure of network resources, and
   upon recovery of those resources.

   The operator can also be given manual control of survivability
   actions and events.
   For example, the operator may force a switchover from a working path
   to a recovery path (for network optimization purposes with minimal
   disturbance of services, for example when modifying protected or
   unprotected services, or when replacing network elements), inhibit
   survivability actions, enable or disable survivability functions, or
   induce the simulation of a network fault.

4.1.2. Failure-Triggered Actions

   Survivability actions may be directly triggered by network failures.
   That is, the device that detects the failure (for example, Loss of
   Light on an optical interface) may immediately perform a
   survivability action. Note that the term "failure" is used to
   represent both signal failure and signal degradation.

   This behavior can be subject to management plane or control plane
   control, but does not require any message exchanges in the
   management plane, control plane, or data plane to trigger the
   recovery action - it is directly triggered by data plane stimuli.
   Note, however, that coordination of recovery actions may require
   message exchanges.

4.1.3. OAM Signaling

   OAM signaling refers to message exchanges in-band or closely coupled
   to the data channel. Such messages may be used to detect and isolate
   faults, but in this context we are concerned with the use of these
   messages to control or trigger survivability actions.

   Note that in some cases, it may be the failure to receive an OAM
   signaling message that causes the survivability action to be taken.

   OAM signaling may also be used to coordinate recovery actions within
   the network.

4.1.4. Control Plane Signaling

   Control plane signaling is responsible for the setup and teardown of
   LSPs that are not under management plane control. The control plane
   can also be used to detect, isolate, and communicate network
   failures pertaining to peer relationships (neighbor-to-neighbor, or
   end-to-end).
   Thus, control plane signaling can initiate and coordinate
   survivability actions.

   The control plane can also be used to distribute topology and
   resource-availability information. In this way, "graceful shutdown"
   of resources may be effected by withdrawing them, and this can be
   used as a stimulus to survivability action in a similar way to the
   reporting or discovery of a fault as described in the previous
   sections.

4.2. Elements of Recovery

   This section describes the elements of recovery. These are the
   quantitative aspects of recovery; that is, the pieces of the network
   for which recovery can be provided.

4.2.1. Span Recovery

   A span is a single hop between neighboring nodes in the same network
   layer. A span is sometimes referred to as a link, although this may
   cause some confusion between the concept of a data link and a
   traffic engineering (TE) link. LSPs traverse TE links between
   neighboring label switching routers (LSRs) in the MPLS-TP network;
   however, a TE link may be provided by:

   - a single data link
   - a series of data links in a lower layer established as an LSP and
     presented to the upper layer as a single TE link
   - a set of parallel data links in the same layer presented either as
     a bundle of TE links or as a collection of data links that,
     together, provide a data link layer protection scheme.

   Thus, span recovery may be provided by:

   - moving the TE link to be supported by a different data link
     between the same pair of neighbors
   - re-routing the LSP in the lower layer.

   Moving the protected LSP to another TE link between the same pair of
   neighbors is known as segment recovery and is described in Section
   4.2.2.

4.2.2. Segment Recovery

   An LSP segment is one or more hops on the path of the LSP. (Note
   that recovery of pseudowire segments is discussed in Section 6.)
   Segment recovery involves redirecting traffic from one end of a
   segment of an LSP over an alternate path to the other end of the
   segment. This redirection may be onto a pre-established LSP segment,
   through re-routing of the protected segment, or by tunneling the
   protected LSP through a "bypass" LSP.

   Note that protecting an LSP against the failure of a node requires
   the use of segment recovery, while a link could be protected using
   span or segment recovery.

4.2.3. End-to-End Recovery

   End-to-end recovery is a special case of segment recovery where the
   protected LSP segment is the whole of the LSP. End-to-end recovery
   may be provided as link-diverse or node-diverse recovery, where the
   recovery path shares no links or no nodes with the working path.
   Note that node-diverse paths are necessarily link-diverse, and that
   full, end-to-end node-diversity is required to guarantee recovery.

4.3. Levels of Recovery

   This section describes the qualitative levels of survivability
   function that can be provided. The level of recovery offered has a
   direct effect on the service level provided to the end-user in the
   event of a network fault. This will be observed as the amount of
   data lost when a network fault occurs, and the length of time taken
   to recover connectivity.

   In general, there is a correlation between the service level (i.e.,
   the rapidity of recovery and the reduction of data loss) and the
   cost to the network; better service levels require pre-allocation of
   resources to the recovery paths, and those resources cannot be used
   for other purposes if high-quality recovery is required.

   Sections 6 and 7 of [RFC4427] provide a full breakdown of protection
   and recovery schemes. This section summarizes the qualitative levels
   available.

4.3.1. Dedicated Protection

   In dedicated protection, the resources for the recovery LSP are
   pre-assigned for use only by the protected service. This will
   clearly be the case in 1+1 protection, and may also be the case in
   1:1 protection where extra traffic (see Section 4.3.3) is not
   supported.

   Note that in the bypass tunnel recovery mechanism (see Section
   4.4.3), resources may also be dedicated to protecting a specific
   service. In some cases (one-for-one protection) the whole of the
   bypass tunnel may be dedicated to providing recovery for a specific
   LSP, but in other cases (such as facility backup) a subset of the
   resources of the bypass tunnel may be pre-assigned for use to
   recover a specific service. However, as described in Section 4.4.3,
   the bypass tunnel approach can also be used for shared protection
   (Section 4.3.2), to carry extra traffic (Section 4.3.3), or without
   reserving resources to achieve best-effort recovery.

4.3.2. Shared Protection

   In shared protection, the resources for the recovery LSPs of several
   services are shared. These may be shared as 1:n or m:n, and may be
   shared on individual links, on LSP segments, or on end-to-end LSPs.

   Where a bypass tunnel is used (Section 4.4.3), the tunnel might not
   have sufficient resources to simultaneously protect all of the LSPs
   to which it offers protection, so that if they were all affected by
   network failures at the same time, they would not all be recovered.

   Shared protection is a trade-off between expensive network resources
   being dedicated to protection that is not required most of the time,
   and the risk of unrecoverable services in the event of multiple
   network failures.
   There is also a trade-off between rapid recovery (which can be
   achieved with dedicated protection, but which is delayed by message
   exchanges in the management, control, or data planes for shared
   protection) and the reduction of network cost by sharing protection
   resources. These trade-offs may be somewhat mitigated by using m:n
   for some value of m > 1, and by establishing new protection paths as
   each available protection path is put into use.

4.3.3. Extra Traffic

   One way to utilize network resources that would otherwise be idle,
   awaiting use to protect services, is to use them to carry other
   traffic. Obviously, this is not practical in dedicated protection
   (Section 4.3.1), but it is practical in shared protection (Section
   4.3.2) and bypass tunnel protection (Section 4.4.3).

   When a network resource that is carrying extra traffic is required
   for protection, the extra traffic is disrupted - essentially, it is
   pre-empted by the recovery LSP. This may require some additional
   message exchanges in the management, control, or data planes, with
   the consequence that recovery may be delayed somewhat. This presents
   an obvious trade-off against the cost reduction (or rather, revenue
   increase) achieved by carrying extra traffic.

4.3.4. Restoration and Repair

   If resources are not pre-assigned for use by the recovery LSP, the
   recovery LSP must be established "on demand" when the network
   failure is detected and reported, or upon instruction from the
   management plane.

   Restoration represents the most cost-effective use of network
   resources, as no resources are tied up for specific protection
   usage. However, restoration requires the computation of a new path
   and the activation of a new LSP (through the management or control
   plane). These steps can take much more time than is required for
   recovery using protection techniques.
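   The on-demand path computation at the heart of restoration can be
   sketched as a search over the links that remain usable after the
   failure. The model below is a minimal illustration with made-up
   topology and function names; a real implementation would also check
   resource availability on each link and apply setup priorities, which
   this sketch omits.

```python
from collections import deque

def restore_path(links, src, dst, failed):
    """On-demand restoration sketch: breadth-first search for a path
    from src to dst over the links not affected by the failure.
    Returns a node list, or None if no path remains (restoration
    cannot recover the service)."""
    usable = {l for l in links
              if l not in failed and (l[1], l[0]) not in failed}
    adj = {}
    for a, b in usable:           # build an undirected adjacency map
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

links = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]
# Link B-C fails: a new path is computed around it.
assert restore_path(links, "A", "C", {("B", "C")}) == ["A", "D", "C"]
# Both paths toward C fail: no guarantee of recovery.
assert restore_path(links, "A", "C", {("B", "C"), ("D", "C")}) is None
```

   The None case corresponds to the situation described above where all
   suitable resources are already unavailable and the service cannot be
   restored without pre-empting lower-priority LSPs.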
Furthermore, there is no guarantee that restoration will be able to
recover the service. It may be that all suitable network resources
are already in use for other LSPs so that no new path can be found.
This problem can be partially mitigated by the use of LSP setup
priorities so that recovery LSPs can pre-empt other, lower-priority
LSPs.

Additionally, when a network failure occurs, multiple LSPs may be
disrupted by the same event. These LSPs may have been established by
different Network Management Stations (NMSs) or signaled by different
head-end LSRs, which means that multiple points in the network will
be trying to compute and establish recovery LSPs at the same time.
This can lead to contention within the network, meaning that some
recovery LSPs must be retried, resulting in very slow recovery times
for some services.

4.3.5. Reversion

When a service has been recovered so that traffic is flowing on the
recovery LSP, the faulted network resource may be repaired. A choice
must then be made about whether to redirect the traffic back onto the
original working LSP, or to leave it where it is on the recovery LSP.
These behaviors are known as "revertive" and "non-revertive",
respectively.

In revertive mode, care should be taken to prevent frequent
protection switching caused by an intermittent defect. Therefore,
when the failure condition on the recovered element has been cleared,
a fixed period of time should be allowed to elapse before normal data
traffic is redirected back onto the original working entity.

4.4. Mechanisms for Recovery

The purpose of this section is to describe, in general (non-MPLS-TP-
specific) terms, the mechanisms that can be used to provide
protection.

4.4.1. Link-Level Protection

4.4.2. Alternate Paths and Segments

4.4.3. Bypass Tunnels

4.5. Protection in Different Topologies

As described in the requirements listed in Section 3 and detailed in
[MPLS-TP-REQ], the recovery techniques used may be optimized for
different network topologies. This section describes two different
topologies and explains how recovery may be markedly different in
those scenarios. It also introduces the concept of a recovery domain
and shows how end-to-end survivability may be achieved through a
concatenation of recovery domains, each providing some level of
recovery in part of the network.

4.5.1. Mesh Networks

Linear protection provides a fast and simple protection switching
mechanism, and it fits best in mesh networks. It can protect against
a failure on an entity (an element of recovery that may constitute a
span, an LSP segment, a PW segment, an end-to-end LSP, or an
end-to-end PW).

In order to guarantee protection, two entities are pre-provisioned.
One of the pre-provisioned entities is configured to be the 'working'
entity (primary) and the other is configured as the 'protection'
entity (backup).

Protection switching occurs at the protection controllers, which
reside at the edges of the protected entity. Between these endpoints
lie the working and protection entities.

In linear protection, a protection entity is pre-provisioned to
protect the working entity. In order to guarantee protection
switching in the case of a 'failed' condition, the physical routes of
the working and protection entities should be completely physically
diverse.

[MPLS-TP-REQ] requires that both the 1:1 and 1+1 linear protection
schemes be supported. In 1:1 linear protection, bi-directional
protection switching should be supported; in 1+1 linear protection,
unidirectional protection switching should be supported.
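The essential difference between the two schemes lies in where the
traffic is bridged and selected. The following Python sketch models
the source bridge and sink selector of each scheme; the function
names and the dictionary representation of the two entities are
illustrative only, not taken from any MPLS-TP specification.

```python
def bridge_1_to_1(traffic, use_protection):
    """1:1 source bridge: traffic is sent on exactly one entity, so
    the two ends must agree on which one (coordination is needed)."""
    return {"protection" if use_protection else "working": traffic}

def bridge_1_plus_1(traffic):
    """1+1 source bridge: traffic is permanently duplicated onto both
    entities; the sink alone selects, so no coordination is needed."""
    return {"working": traffic, "protection": traffic}

def sink_select(received, prefer_protection):
    """Sink selector: pick the entity chosen by local criteria
    (e.g., defect indications). Returns None if nothing arrived on
    the selected entity."""
    key = "protection" if prefer_protection else "working"
    return received.get(key)
```

Note that if a 1:1 sink selects an entity the source is not currently
bridging to, `sink_select` returns None (traffic is lost), which is
precisely why the 1:1 scheme requires a coordination protocol between
the protection controllers while 1+1 does not.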
1:1 linear protection:

- Data traffic is transmitted over either the 'working' entity or the
  'protection' entity, but not both. Normal conditions exist when
  there is no failure on the 'working' entity and no administrative
  configuration or request directs traffic onto the 'protection'
  entity; under normal conditions, traffic is carried on the
  'working' entity. Upon a failure condition or a specific
  administrative request, the traffic is switched over to the
  'protection' entity.

- In each transmission direction, the source of the protection domain
  bridges the traffic onto the appropriate entity, and the sink of
  the protected domain selects the traffic from the appropriate
  entity. The source and the sink need to be coordinated to ensure
  that bridging and selection are done to and from the same entity.
  For this purpose, a signaling coordination protocol is needed.

- In bi-directional protection switching, both ends of the protection
  domain switch to the 'protection' entity (even when the failure is
  unidirectional).

- When there is no failure, the resources of the 'idle' entity may be
  used for lower-priority traffic, known as extra traffic. When
  protection switching is performed and the resources carrying the
  extra traffic are required for protection, the extra traffic is
  pre-empted by the protected traffic.

1+1 linear protection:

- The data traffic is copied and fed to both the 'working' and the
  'protection' entities. The traffic on the 'working' and
  'protection' entities is transmitted simultaneously to the sink of
  the protected domain, where a selection between the two is made
  (based on some predetermined criteria). Since only unidirectional
  protection switching is supported in the 1+1 linear protection
  scheme, there is no need for coordination between the protection
  controllers.

4.5.2. Ring Networks

4.5.3. Protection and Restoration Domains

Protection and restoration are performed in the context of a recovery
domain. A recovery domain is defined between two recovery reference
points, which are located at the edges of the recovery domain and are
responsible for performing recovery for a 'working' entity (which may
be one of the elements of recovery defined above) when an appropriate
trigger is received. These reference points function as recovery
controllers.

As described in Section 4.2 above, the recovery element may
constitute a span, a tandem connection (i.e., either an LSP segment
or a PW segment), an end-to-end LSP, or an end-to-end PW.

The method used to monitor the health of the recovery element is
unimportant, provided that the recovery controllers receive
information about its condition. The condition of the recovery
element may be 'OK', 'failed', or 'degraded'.

When the recovery operation is launched by an OAM trigger, the
recovery domain is equivalent to the OAM maintenance entity defined
in [MPLS-TP-OAM], and the recovery reference points are located at
the same positions as the OAM MEPs.

4.6. Recovery in Layered Networks

In multi-layer or multi-region networking, recovery may be performed
at multiple layers or across cascaded recovery domains.

The MPLS-TP recovery mechanism must ensure that the timing of
recovery is coordinated in order to avoid races, and to allow either
the recovery mechanism of the server layer to fix the problem before
recovery takes place at the MPLS-TP layer, or an upstream recovery
domain to perform recovery before a downstream domain. In
inter-connected rings, for example, it may be preferable to allow the
upstream ring to perform recovery before the downstream ring, in
order to ensure that recovery takes place in the ring in which the
failure occurred.
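One common realization of this inter-layer timing coordination is the
hold-off timer discussed below. The following Python sketch shows the
"wait, then re-check" behavior; the class and function names are
illustrative only and do not come from any MPLS-TP specification.

```python
import threading

class RecoveryElement:
    """Minimal stand-in for a monitored recovery element."""
    def __init__(self):
        self.failed = False

    def has_failure(self):
        return self.failed

def start_hold_off(element, hold_off_seconds, trigger_recovery):
    """On fault detection, wait out the hold-off period, then re-check
    the element: recover only if the server layer (or an upstream
    recovery domain) has not already cleared the fault. A zero timer
    means this layer reacts immediately."""
    def expire():
        if element.has_failure():
            trigger_recovery(element)

    if hold_off_seconds == 0:
        expire()
        return None
    timer = threading.Timer(hold_off_seconds, expire)
    timer.start()
    return timer
```

If the server layer repairs the fault while the timer is running, the
re-check on expiry finds no failure and the higher layer takes no
action, which avoids the race in which both layers allocate recovery
resources for the same fault.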
A hold-off timer is required to coordinate the timing of recovery at
multiple layers or across cascaded recovery domains. Setting this
configurable timer involves a trade-off between rapid recovery and
the creation of a race condition in which multiple layers respond to
the same fault, potentially allocating resources in an inefficient
manner. Thus, the detection of a failure condition in the MPLS-TP
layer should not immediately trigger the recovery process if the
hold-off timer is set to a value other than zero. Instead, the
hold-off timer should be started and, on expiry, the recovery element
should be checked to determine whether the failure condition still
exists. If it does, the defect triggers the recovery operation.

In other configurations, where the lower layer does not have a
restoration capability, or where it is not expected to provide
protection, the lower layer needs to trigger the higher layer to
perform recovery immediately.

See [RFC3386] for further discussion of multi-layer survivability.

4.6.1. Inherited Link-Level Protection

4.6.2. Shared Risk Groups

4.6.3. Fault Correlation

5. Mechanisms for Providing Protection in MPLS-TP

This section describes the existing mechanisms available to provide
protection within MPLS-TP networks and highlights areas where new
work is required. It is expected that, as new protocol extensions and
techniques are developed, this section will be updated to convert the
statements of required work into references to those protocol
extensions and techniques.

5.1. Management Plane

As described above, a fundamental requirement of MPLS-TP is that
recovery mechanisms should be capable of functioning in the absence
of a control plane. Recovery may be triggered by MPLS-TP OAM fault
management functions or by external requests (e.g., an operator
request for manual control of protection switching).
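The interaction between OAM fault triggers and operator requests can
be sketched as a simple priority arbitration at a protection
controller. The numeric priorities below are illustrative; the
relative order of the external commands follows the list in Section
5.1.2, and placing a signal-fail condition between the 'force' and
'manual' commands mirrors common practice in [G.808.1]-style schemes.
This ordering is an assumption for the sketch, not a statement of
this framework.

```python
# Illustrative priority values; only the relative order matters.
PRIORITY = {
    "blocked-protection": 4,  # lockout: disables the protection group
    "force-protection": 3,    # operator-forced switch
    "signal-fail": 2,         # OAM fault-management trigger
    "manual-protection": 1,   # operator request, yields to faults
    "no-request": 0,
}

def arbitrate(local_requests):
    """Return the highest-priority request currently active at a
    protection controller, or 'no-request' if none is active."""
    return max(local_requests, key=PRIORITY.__getitem__,
               default="no-request")
```

Under this sketch a manual switch request is overridden by a detected
fault, while a lockout command suppresses everything else, matching
the descending-priority command list given later in Section 5.1.2.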
The management plane may be used to configure the recovery domain by
setting the reference points (recovery controllers), the 'working'
and 'protection' entities, and the recovery type (e.g., 1:1
bi-directional linear protection, ring protection, etc.). Additional
parameters associated with the recovery process (such as a hold-off
timer and revertive/non-revertive operation) may also be configured.

In addition, the management plane may initiate manual control of the
protection switching function. The relative priority of fault
conditions and operator requests should be defined.

Since provisioning the recovery domain involves the selection of a
number of options, mismatches may occur at the different reference
points. The MPLS-TP OAM Automatic Protection Switching (APS) protocol
may be used as an in-band (i.e., data-plane-based) control protocol
to align both ends of the protected domain.

It should also be possible for the management plane to monitor the
recovery status.

5.1.1. Configuration of Protection Operation

In order to implement the protection switching mechanism, the
following entities and information should be provisioned:

- The protection controllers (reference points).

- The protection group, consisting of a 'working' entity (which may
  be one of the recovery elements defined above) and a 'protection'
  entity. To guarantee protection, the paths of the 'working' and
  'protection' entities should be completely physically diverse.

- The protection type that should be applied.

- Revertive/non-revertive behavior.

5.1.2. External Manual Commands

The following external, manual commands may be applied to a
protection group; they are listed in descending order of priority:

- Blocked protection action: a manual command that prevents data
  traffic from switching to the 'protection' entity. This command
  effectively disables the protection group.
- Force protection action: a manual command that forces a switch of
  normal data traffic to the 'protection' entity.

- Manual protection action: a manual command that forces a switch of
  data traffic to the 'protection' entity when there is no failure in
  the 'working' or 'protection' entity.

5.2. Fault Detection

5.3. Fault Isolation

5.4. OAM Signaling

5.4.1. Fault Detection

5.4.2. Fault Isolation

5.4.3. Fault Reporting

5.4.4. Coordination of Recovery Actions

5.5. Control Plane Signaling

5.5.1. Fault Detection

5.5.2. Fault Isolation

5.5.3. Fault Reporting

5.5.4. Coordination of Recovery Actions

6. Pseudowire Protection Considerations

The main application for the MPLS-TP network is currently identified
as the pseudowire. Pseudowires provide end-to-end connectivity over
the MPLS-TP network and may comprise a single pseudowire segment, or
multiple segments "stitched" together to provide end-to-end
connectivity.

The pseudowire service may itself require a level of protection as
part of its SLA. This protection could be provided by the MPLS-TP
LSPs that support the pseudowire, or could be a feature of the
pseudowire layer itself.

6.1. Utilizing Underlying MPLS-TP Protection

6.2. Protection in the Pseudowire Layer

7. Manageability Considerations

8. Security Considerations

9. IANA Considerations

This informational document makes no requests for IANA action.

10. Acknowledgments

11. References

11.1. Normative References

[RFC4427] Mannie, E. and D. Papadimitriou, "Recovery (Protection
          and Restoration) Terminology for Generalized
          Multi-Protocol Label Switching (GMPLS)", RFC 4427,
          March 2006.

[RFC4428] Papadimitriou, D. and E. Mannie, Editors, "Analysis of
          Generalized Multi-Protocol Label Switching (GMPLS)-based
          Recovery Mechanisms (including Protection and
          Restoration)", RFC 4428, March 2006.

[RFC4873] Berger, L., Bryskin, I., Papadimitriou, D., and A. Farrel,
          "GMPLS Segment Recovery", RFC 4873, May 2007.

[G.808.1] ITU-T, "Generic Protection Switching - Linear Trail and
          Subnetwork Protection", Recommendation G.808.1,
          December 2003.

[G.841]   ITU-T, "Types and Characteristics of SDH Network
          Protection Architectures", Recommendation G.841,
          October 1998.

[MPLS-TP-JWT] Bryant, S. and L. Andersson, "JWT Report on MPLS
          Architectural Considerations for a Transport Profile",
          draft-bryant-jwt-mplstp-report, work in progress.

[MPLS-TP-REQ] Niven-Jenkins, B., et al., "Requirements for MPLS-TP",
          draft-jenkins-mpls-mplstp-requirements, work in progress.

[MPLS-TP-OAM] Vigoureux, M., Betts, M., and D. Ward, "MPLS-TP OAM
          Requirements", work in progress.

11.2. Informative References

[RFC3386] Lai, W. and D. McDysan, "Network Hierarchy and Multilayer
          Survivability", RFC 3386, November 2002.

[RFC3469] Sharma, V. and F. Hellstrand, "Framework for
          Multi-Protocol Label Switching (MPLS)-based Recovery",
          RFC 3469, February 2003.

[RFC4426] Lang, J., Rajagopalan, B., and D. Papadimitriou, Editors,
          "Generalized Multiprotocol Label Switching (GMPLS)
          Recovery Functional Specification", RFC 4426, March 2006.

12. Editors' Addresses

Nurit Sprecher
Nokia Siemens Networks
3 Hanagar St. Neve Ne'eman B
45241 Hod Hasharon, Israel
Tel. +972 9 7751229
Email: nurit.sprecher@nsn.com

Adrian Farrel
Old Dog Consulting
Email: adrian@olddog.co.uk

Vach Kompella
Alcatel-Lucent
701 East Middlefield Rd.
Mountain View, CA 94043
Email: vach.kompella@alcatel.com

13. Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided
on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The IETF Trust (2008). This document is subject to the
rights, licenses and restrictions contained in BCP 78, and except as
set forth therein, the authors retain all their rights.