Network Working Group                                        N. Sprecher
Internet Draft                                    Nokia Siemens Networks
Category: Informational                                        A. Farrel
Created: July 7, 2008                                 Old Dog Consulting
Expires: January 7, 2009                                     V. Kompella
                                                          Alcatel-Lucent

           Multiprotocol Label Switching Transport Profile
                        Survivability Framework

                draft-sprecher-mpls-tp-survive-fwk-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.
Abstract

   Network survivability is the network's ability to restore traffic
   following failure or attack; it is a critical factor in the delivery
   of reliable services in transport networks. Guaranteed services in
   the form of Service Level Agreements (SLAs) require a resilient
   network that very rapidly detects facility or node failures and
   immediately starts to restore network operations in accordance with
   the terms of the SLA.

   The Transport Profile of Multiprotocol Label Switching (MPLS-TP) is
   a packet transport technology that combines the packet experience of
   MPLS with the operational experience of SONET/SDH. It provides
   survivability mechanisms, such as protection and restoration, with
   levels of function similar to those found in established transport
   networks such as SONET/SDH networks. Some of the MPLS-TP protection
   mechanisms are data plane-driven and are based on MPLS-TP OAM fault
   management functions, which are used to trigger protection switching
   in the absence of a control plane. Other protection mechanisms
   utilize the MPLS-TP control plane.

   This document provides a framework for MPLS-TP survivability.

Table of Contents

   1. Introduction
   2. Terminology and References
   3. Requirements for Survivability
   4. Functional Architecture
      4.1. Elements of Control
         4.1.1. Manual Control
         4.1.2. Failure-Triggered Actions
         4.1.3. OAM Signaling
         4.1.4. Control Plane Signaling
      4.2. Elements of Recovery
         4.2.1. Span Recovery
         4.2.2. Segment Recovery
         4.2.3. End-to-End Recovery
      4.3. Levels of Recovery
         4.3.1. Dedicated Protection
         4.3.2. Shared Protection
         4.3.3. Extra Traffic
         4.3.4. Restoration and Repair
         4.3.5. Reversion
      4.4. Mechanisms for Recovery
         4.4.1. Link-Level Protection
         4.4.2. Alternate Paths and Segments
         4.4.3. Bypass Tunnels
      4.5. Protection in Different Topologies
         4.5.1. Mesh Networks
         4.5.2. Ring Networks
         4.5.3.
      4.6. Recovery in Layered Networks
         4.6.1. Inherited Link-Level Protection
         4.6.2. Shared Risk Groups
         4.6.3. Fault Correlation
   5. Mechanisms for Providing Protection in MPLS-TP
      5.1. Management Plane
         5.1.1. Configuration of Protection Operation
         5.1.2. Forced Protection Actions
         5.1.3. Blocked Protection Actions
      5.2. Fault Detection
      5.3. Fault Isolation
      5.4. OAM Signaling
         5.4.1. Fault Detection
         5.4.2. Fault Isolation
         5.4.3. Fault Reporting
         5.4.4. Coordination of Recovery Actions
      5.5. Control Plane Signaling
         5.5.1. Fault Detection
         5.5.2. Fault Isolation
         5.5.3. Fault Reporting
         5.5.4. Coordination of Recovery Actions
   6. Pseudowire Protection Considerations
      6.1. Utilizing Underlying MPLS-TP Protection
      6.2. Protection in the Pseudowire Layer
   7. Manageability Considerations
   8. Security Considerations
   9. IANA Considerations
   10. Acknowledgments
   11. References
      11.1. Normative References
      11.2. Informative References
   12. Editors' Addresses
   13. Intellectual Property Statement

1. Introduction

   Network survivability is the network's ability to restore traffic
   following failure or attack; it is a critical factor in the delivery
   of reliable services in transport networks. Guaranteed services in
   the form of Service Level Agreements (SLAs) require a resilient
   network that very rapidly detects facility or node failures, and
   immediately starts to restore network operations in accordance with
   the terms of the SLA.

   The Transport Profile of Multiprotocol Label Switching (MPLS-TP)
   [MPLS-TP-JWT], [MPLS-TP-REQ] is a packet transport technology that
   combines the packet experience of MPLS with the operational
   experience of SONET/SDH.
   MPLS-TP is designed to be consistent with existing transport network
   operations and management models, and provides survivability
   mechanisms, such as protection and restoration, with levels of
   function similar to those found in established transport networks
   such as SONET/SDH, which have given service providers a high
   benchmark for reliability.

   This document provides a framework for MPLS-TP-based survivability.
   It uses the recovery terminology defined in [RFC4427], which draws
   heavily on [G.808.1], and refers to the requirements specified in
   [MPLS-TP-REQ].

   Various recovery schemes (for protection and restoration) and
   processes have been defined and analyzed in [RFC4427] and [RFC4428].
   These schemes may also be applied in MPLS-TP networks to re-
   establish end-to-end traffic delivery within the agreed service
   level, and so recover from 'failed' or 'degraded' transport entities
   (links or nodes). Such actions are normally initiated by the
   detection of a defect or performance degradation, or by an external
   request (e.g., an operator request for manual control of protection
   switching).

   [RFC4427] makes a distinction between protection switching and
   restoration mechanisms. Protection switching makes use of
   pre-assigned capacity between nodes, where the simplest scheme has
   one dedicated protection entity for each working entity, while the
   most complex scheme has m protection entities shared between n
   working entities (m:n). Protection switching may be either
   unidirectional or bidirectional. Restoration uses any capacity
   available between nodes and usually involves re-routing. The
   resources used for restoration may be pre-planned, and recovery
   priority may be used as a differentiation mechanism to determine
   which services are recovered and which are not recovered, or are
   sacrificed in order to achieve recovery of other services.
   In general, protection actions are completed within time frames of
   tens of milliseconds, while restoration actions are normally
   completed in periods ranging from hundreds of milliseconds to a
   maximum of a few seconds.

   However, the recovery schemes described in [RFC4427] and evaluated
   in [RFC4428] assume some control plane-driven actions that are
   performed in the recovery context. As for other transport
   technologies and associated transport networks, the presence of a
   distributed control plane in support of MPLS-TP network operations
   is optional, and the absence of such a control plane does not affect
   the ability to operate the network and to use MPLS-TP forwarding,
   OAM, and protection capabilities.

   Thus, some of the MPLS-TP recovery mechanisms do not depend on a
   control plane, and rely instead on MPLS-TP OAM capabilities to
   trigger protection switching. These mechanisms are data plane-driven
   and are based on MPLS-TP OAM fault management functions. "Fault
   management" in this context refers to failure detection,
   localization, and notification (where the term "failure" is used to
   represent both signal failure and signal degradation).

   The principles of MPLS-TP protection switching operation are similar
   to those defined in [RFC4427], as the protection mechanism is based
   on the ability to detect certain defects in the transport entities
   within the protected domain. The protection switching controller
   does not care which monitoring method is used, as long as it can be
   given information about the status of the transport entities within
   the recovery domain (e.g., 'OK', signal failure, or signal
   degradation).

   An MPLS-TP OAM Automatic Protection Switching (APS) protocol may be
   used as an in-band (i.e., data plane-based) control protocol to
   align both ends of the protected domain.
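   The alignment role that such an in-band protocol plays can be
   pictured with a deliberately simplified model. The classes, message
   names, and fields below are hypothetical illustrations, not the
   MPLS-TP APS PDU definition (which is outside the scope of this
   framework): on a local signal fail, one end selects the protection
   entity and sends an in-band request so that the far end makes the
   same selection.

```python
# Simplified sketch of in-band coordination aligning both ends of a
# 1:1 protected domain. Class and message names are hypothetical; this
# is not the MPLS-TP APS protocol itself.

WORKING, PROTECTION = "working", "protection"

class ProtectionController:
    """One end of a 1:1 protected domain."""
    def __init__(self, name):
        self.name = name
        self.selected = WORKING   # entity currently carrying traffic
        self.peer = None          # far-end controller (set after creation)

    def local_failure(self):
        """The data plane reports signal fail on the working entity."""
        self.selected = PROTECTION
        # Tell the far end, in-band, so that bridging and selection
        # happen on the same entity at both ends of the domain.
        self.peer.receive("SIGNAL_FAIL")

    def receive(self, request):
        if request == "SIGNAL_FAIL":
            self.selected = PROTECTION

a = ProtectionController("A")
z = ProtectionController("Z")
a.peer, z.peer = z, a

a.local_failure()   # a unidirectional fault observed only at A
assert a.selected == z.selected == PROTECTION  # both ends now aligned
```

   Without the in-band exchange, the end that did not observe the fault
   would keep selecting the working entity, and traffic in one
   direction would be lost; the exchange is what makes bidirectional
   switching possible.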
   The MPLS-TP protection mechanisms may be applied at various levels
   throughout the MPLS-TP network, as is the case with the recovery
   schemes defined in [RFC4427] and [RFC4873]. A Label Switched Path
   (LSP) may be subject to span, segment, and/or end-to-end recovery,
   where:

   - span protection refers to the protection of an individual link
     (and hence all or a subset of the LSPs routed over the link)
     between two neighboring switches;

   - segment protection refers to the recovery of an LSP segment (i.e.,
     a tandem connection in the language of [MPLS-TP-REQ]) between two
     nodes that are the boundary nodes of the segment; and

   - end-to-end protection refers to the protection of an entire LSP
     from the ingress node to the egress node.

   Multiple recovery levels may be used concurrently by a single LSP
   for added resiliency.

   It is a basic requirement of MPLS-TP that both directions of a
   bidirectional LSP should be co-routed (that is, share the same route
   within the network) and be fate-sharing (that is, if one direction
   fails, both directions should cease to operate) [MPLS-TP-REQ]. This
   causes a direct interaction between the protection levels affecting
   the directions of an LSP, such that both directions of the LSP are
   switched to a new span, segment, or end-to-end path together.

   A protection scheme operating at the data plane level can function
   in a multi-domain environment; it should also protect against the
   failure of a boundary node in the case of inter-domain operation.

   The MPLS-TP recovery schemes apply to LSPs and pseudowires (PWE3).
   This document focuses on LSPs and handles both point-to-point (P2P)
   and point-to-multipoint (P2MP) LSPs.
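   The recovery scopes just listed can be pictured as path substitution
   over different extents of an LSP. The sketch below (illustrative
   only, with made-up node names) treats an LSP as a list of nodes and
   a recovery action as the replacement of the sub-path between two
   boundary nodes; end-to-end recovery then falls out as the special
   case where the boundary nodes are the ingress and egress.

```python
def recover(lsp, start, end, alternate):
    """Replace the sub-path of 'lsp' between boundary nodes 'start'
    and 'end' with 'alternate', which must also run from start to
    end. Returns the recovered path."""
    i, j = lsp.index(start), lsp.index(end)
    assert alternate[0] == start and alternate[-1] == end
    return lsp[:i] + alternate + lsp[j + 1:]

lsp = ["A", "B", "C", "D", "E"]

# Segment recovery: route around a failed node C between boundary
# nodes B and D, via a hypothetical alternate node X.
assert recover(lsp, "B", "D", ["B", "X", "D"]) == ["A", "B", "X", "D", "E"]

# End-to-end recovery: the protected segment is the whole LSP, so the
# boundary nodes are the ingress (A) and egress (E).
assert recover(lsp, "A", "E", ["A", "P", "Q", "E"]) == ["A", "P", "Q", "E"]
```

   Note how the segment case leaves node C off the recovered path: this
   is why node failure requires segment (or end-to-end) recovery, while
   a simple link failure can also be handled by span recovery between
   the same pair of neighbors.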
   This framework introduces the architecture of the MPLS-TP recovery
   domain and describes the recovery schemes in MPLS-TP (based on the
   recovery types defined in [RFC4427]), as well as the principles of
   operation, recovery states, recovery triggers, and the information
   exchanged between the different elements that sustain the reference
   model. The reference model is based on the MPLS-TP OAM reference
   model defined in [MPLS-TP-OAM].

   This framework also refers to recovery schemes that are optimized
   for specific topologies, such as linear, ring, and mesh, in order to
   handle protection switching in a cost-efficient manner.

   This document takes into account the timing coordination of
   protection switches at multiple layers. This prevents races and
   allows the protection switching mechanism of the server layer to fix
   a problem before switching is performed at the MPLS-TP layer.

   This framework also specifies the functions that must be supported
   by MPLS-TP OAM (e.g., APS) and by the management and/or control
   plane in order to support the recovery mechanisms.

   MPLS-TP introduces a tool kit to enable recovery in MPLS-TP-based
   transport networks and to ensure that affected traffic is restored
   in the event of a failure.

   Generally, network operators aim to provide the fastest, most
   stable, and best protection mechanism available at a reasonable
   cost. The higher the level of protection, the greater the number of
   resources consumed. It is therefore expected that network operators
   will offer a wide spectrum of service levels. MPLS-TP-based recovery
   offers the flexibility to select the recovery mechanism, choose the
   granularity at which traffic is protected, and also choose the
   specific types of traffic that are to be protected.
   With MPLS-TP-based recovery, it is possible to provide different
   levels of protection for different classes of service, based on
   their service requirements.

2. Terminology and References

   The terminology used in this document is consistent with that
   defined in [RFC4427]. That RFC is, itself, consistent with
   [G.808.1].

   However, certain protection concepts (such as ring protection) are
   not discussed in [RFC4427], and for those concepts, terminology in
   this document is drawn from [G.841].

   Readers should refer to those documents for normative definitions.
   This document supplies brief summaries of some terms for clarity and
   to aid the reader, but does not re-define terms.

   In particular, note the distinction and definitions made in
   [RFC4427] for the following three terms.

   - Protection: re-establishing end-to-end traffic using
     pre-allocated resources.
   - Restoration: re-establishing end-to-end traffic using resources
     allocated at the time of need. Sometimes referred to as "repair".
   - Recovery: a generic term covering both Protection and Restoration.

   Important background information can be found in [RFC3386],
   [RFC3469], [RFC4426], [RFC4427], and [RFC4428].

3. Requirements for Survivability

   MPLS-TP requirements are presented in [MPLS-TP-REQ]. Survivability
   is presented as a critical factor in the delivery of reliable
   services, and the requirements for survivability are set out using
   the recovery terminology defined in [RFC4427].

   These requirements are summarized below. This section may be updated
   if changes are made to [MPLS-TP-REQ], and that document should be
   regarded as normative for the definition of all MPLS-TP requirements
   including those for survivability.

   General:

   - Must support tandem network connection protection.
   - Must support LSP protection.
   - Must support pseudowire protection.
   - Must provide appropriate recovery times.
   - Must scale when many services are affected by a single fault.
   - Should support span protection.
   - Should support tandem connection protection.
   - Should support end-to-end protection.
   - Must support management plane control.
   - Must support control plane control.

   Restoration:

   - May support pre-planning of restoration resources.
   - May support computation of restoration resources after failure.
   - May support shared mesh restoration.
   - Should support soft LSP restoration (make-before-break).
   - May support hard LSP restoration (break-before-make).
   - Must be topology agnostic.
   - May support restoration priority.
   - May utilize preemption during restoration, but only under operator
     configuration.

   Protection:

   - Should be able to apply protection at different levels in the
     network.
   - Should operate in conjunction with protection in underlying
     networks.
   - Must support data plane triggered recovery.
   - Should be equally applicable to LSPs and pseudowires.
   - Must include mechanisms to detect, locate, notify, and remedy
     network faults.
   - May support 1:1 bidirectional protection switching, in which case
     protection switching must be synchronized.
   - May support 1+1 unidirectional protection switching.
   - Must be applicable to P2P LSPs.
   - Should be applicable to P2MP LSPs.
   - Must support a protection ratio of 100%.
   - Must support the operator's QoS objectives on the protection path.
   - May support extra traffic in 1:1 protection modes.
   - Must provide operator control and protection prioritization.
   - Must support revertive and non-revertive behavior.
   - Must provide mechanisms to prevent protection switching thrashing.
   - Must provide coordination between protection mechanisms at
     different layers.
   - May provide different mechanisms optimized for specific
     topologies.

4. Functional Architecture

   This section presents an overview of the elements of the functional
   architecture for survivability within an MPLS-TP network. The
   intention is to break the components out as separate items so that
   it can be seen how they may be combined to provide different levels
   of recovery to meet the requirements set out in the previous
   section.

4.1. Elements of Control

   Survivability is achieved through specific actions taken to repair
   network resources or to redirect traffic onto paths that avoid
   failures in the network. Those actions may be triggered
   automatically by the network devices, may be enhanced by data plane
   (i.e., OAM) or control plane signaling, and may be under the direct
   control of an operator.

   These different options are explored in the next sections.

4.1.1. Manual Control

   Of course, the survivability behavior of the network as a whole, and
   the reaction of each LSP when a fault is reported, may be under
   operator control. That is, the operator may establish network-wide
   or local policies that determine what actions will be taken when
   different failures are reported that affect different LSPs. At the
   same time, when a service request is made to cause the establishment
   of one or more LSPs in the network, the operator (or requesting
   application) may express a required or desired level of service, and
   this will be mapped to particular survivability actions taken before
   and during LSP setup, after the failure of network resources, and
   upon recovery of those resources.

   The operator can also be given manual control of survivability
   actions and events.
   For example, the operator may force a switchover from a working path
   to a recovery path (for network optimization purposes with minimal
   disturbance of services, for example when modifying protected or
   unprotected services, or when replacing network elements), inhibit
   survivability actions, enable or disable survivability functions, or
   induce the simulation of a network fault.

4.1.2. Failure-Triggered Actions

   Survivability actions may be directly triggered by network failures.
   That is, the device that detects the failure (for example, Loss of
   Light on an optical interface) may immediately perform a
   survivability action. Note that the term "failure" is used to
   represent both signal failure and signal degradation.

   This behavior can be subject to management plane or control plane
   control, but does not require any message exchanges in the
   management plane, control plane, or data plane to trigger the
   recovery action - it is directly triggered by data plane stimuli.
   Note, however, that coordination of recovery actions may require
   message exchanges.

4.1.3. OAM Signaling

   OAM signaling refers to message exchanges in-band or closely coupled
   to the data channel. Such messages may be used to detect and isolate
   faults, but in this context we are concerned with the use of these
   messages to control or trigger survivability actions.

   Note that in some cases, it may be the failure to receive an OAM
   signaling message that causes the survivability action to be taken.

   OAM signaling may also be used to coordinate recovery actions within
   the network.

4.1.4. Control Plane Signaling

   Control plane signaling is responsible for the setup and teardown of
   LSPs that are not under management plane control. The control plane
   can also be used to detect, isolate, and communicate network
   failures pertaining to peer relationships (neighbor-to-neighbor, or
   end-to-end).
   Thus, control plane signaling can initiate and coordinate
   survivability actions.

   The control plane can also be used to distribute topology and
   resource-availability information. In this way, "graceful shutdown"
   of resources may be effected by withdrawing them, and this can be
   used as a stimulus to survivability action in a similar way to the
   reporting or discovery of a fault as described in the previous
   sections.

4.2. Elements of Recovery

   This section describes the elements of recovery. These are the
   quantitative aspects of recovery; that is, the pieces of the network
   for which recovery can be provided.

4.2.1. Span Recovery

   A span is a single hop between neighboring nodes in the same network
   layer. A span is sometimes referred to as a link, although this may
   cause some confusion between the concept of a data link and a
   traffic engineering (TE) link. LSPs traverse TE links between
   neighboring label switching routers (LSRs) in the MPLS-TP network;
   however, a TE link may be provided by:

   - a single data link
   - a series of data links in a lower layer established as an LSP and
     presented to the upper layer as a single TE link
   - a set of parallel data links in the same layer presented either as
     a bundle of TE links or as a collection of data links that,
     together, provide a data link layer protection scheme.

   Thus, span recovery may be provided by:

   - moving the TE link to be supported by a different data link
     between the same pair of neighbors
   - re-routing the LSP in the lower layer.

   Moving the protected LSP to another TE link between the same pair of
   neighbors is known as segment recovery and is described in Section
   4.2.2.

4.2.2. Segment Recovery

   An LSP segment is one or more hops on the path of the LSP. (Note
   that recovery of pseudowire segments is discussed in Section 6.)
   Segment recovery involves redirecting traffic from one end of a
   segment of an LSP over an alternate path to the other end of the
   segment. This redirection may be onto a pre-established LSP segment,
   through re-routing of the protected segment, or by tunneling the
   protected LSP through a "bypass" LSP.

   Note that protecting an LSP against the failure of a node requires
   the use of segment recovery, while a link could be protected using
   span or segment recovery.

4.2.3. End-to-End Recovery

   End-to-end recovery is a special case of segment recovery where the
   protected LSP segment is the whole of the LSP. End-to-end recovery
   may be provided as link-diverse or node-diverse recovery, where the
   recovery path shares no links or no nodes with the working path.
   Note that node-diverse paths are necessarily link-diverse, and that
   full, end-to-end node-diversity is required to guarantee recovery.

4.3. Levels of Recovery

   This section describes the qualitative levels of survivability
   function that can be provided. The level of recovery offered has a
   direct effect on the service level provided to the end-user in the
   event of a network fault. This will be observed as the amount of
   data lost when a network fault occurs, and the length of time taken
   to recover connectivity.

   In general, there is a correlation between the service level (i.e.,
   the rapidity of recovery and the reduction of data loss) and the
   cost to the network; better service levels require pre-allocation of
   resources to the recovery paths, and those resources cannot be used
   for other purposes if high-quality recovery is required.

   Sections 6 and 7 of [RFC4427] provide a full breakdown of protection
   and recovery schemes. This section summarizes the qualitative levels
   available.

4.3.1. Dedicated Protection

   In dedicated protection, the resources for the recovery LSP are
   pre-assigned for use only by the protected service. This will
   clearly be the case in 1+1 protection, and may also be the case in
   1:1 protection where extra traffic (see Section 4.3.3) is not
   supported.

   Note that in the bypass tunnel recovery mechanism (see Section
   4.4.3), resources may also be dedicated to protecting a specific
   service. In some cases (one-for-one protection) the whole of the
   bypass tunnel may be dedicated to providing recovery for a specific
   LSP, but in other cases (such as facility backup) a subset of the
   resources of the bypass tunnel may be pre-assigned for use to
   recover a specific service. However, as described in Section 4.4.3,
   the bypass tunnel approach can also be used for shared protection
   (Section 4.3.2), to carry extra traffic (Section 4.3.3), or without
   reserving resources to achieve best-effort recovery.

4.3.2. Shared Protection

   In shared protection, the resources for the recovery LSPs of several
   services are shared. These may be shared as 1:n or m:n, and may be
   shared on individual links, on LSP segments, or on end-to-end LSPs.

   Where a bypass tunnel is used (Section 4.4.3), the tunnel might not
   have sufficient resources to simultaneously protect all of the LSPs
   to which it offers protection, so that if they were all affected by
   network failures at the same time, they would not all be recovered.

   Shared protection is a trade-off between expensive network resources
   being dedicated to protection that is not required most of the time,
   and the risk of unrecoverable services in the event of multiple
   network failures.
   There is also a trade-off between rapid recovery (which can be
   achieved with dedicated protection, but which is delayed by message
   exchanges in the management, control, or data planes for shared
   protection) and the reduction of network cost by sharing protection
   resources. These trade-offs may be somewhat mitigated by using m:n
   for some value of m > 1, and by establishing new protection paths as
   each available protection path is put into use.

4.3.3. Extra Traffic

   One way to utilize network resources that would otherwise be idle,
   awaiting use to protect services, is to use them to carry other
   traffic. Obviously, this is not practical in dedicated protection
   (Section 4.3.1), but it is practical in shared protection (Section
   4.3.2) and bypass tunnel protection (Section 4.4.3).

   When a network resource that is carrying extra traffic is required
   for protection, the extra traffic is disrupted - essentially, it is
   pre-empted by the recovery LSP. This may require some additional
   message exchanges in the management, control, or data planes, with
   the consequence that recovery may be delayed somewhat. This presents
   an obvious trade-off against the cost reduction (or rather, revenue
   increase) achieved by carrying extra traffic.

4.3.4. Restoration and Repair

   If resources are not pre-assigned for use by the recovery LSP, the
   recovery LSP must be established "on demand" when the network
   failure is detected and reported, or upon instruction from the
   management plane.

   Restoration represents the most cost-effective use of network
   resources, as no resources are tied up for specific protection
   usage. However, restoration requires the computation of a new path
   and the activation of a new LSP (through the management or control
   plane). These steps can take much more time than is required for
   recovery using protection techniques.
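   The on-demand path computation at the heart of restoration can be
   sketched as a search over the links that remain usable after the
   failure. The model below is a minimal illustration with made-up
   topology and function names; a real implementation would also check
   resource availability on each link and apply setup priorities, which
   this sketch omits.

```python
from collections import deque

def restore_path(links, src, dst, failed):
    """On-demand restoration sketch: breadth-first search for a path
    from src to dst over the links not affected by the failure.
    Returns a node list, or None if no path remains (restoration
    cannot recover the service)."""
    usable = {l for l in links
              if l not in failed and (l[1], l[0]) not in failed}
    adj = {}
    for a, b in usable:           # build an undirected adjacency map
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

links = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]
# Link B-C fails: a new path is computed around it.
assert restore_path(links, "A", "C", {("B", "C")}) == ["A", "D", "C"]
# Both paths toward C fail: no guarantee of recovery.
assert restore_path(links, "A", "C", {("B", "C"), ("D", "C")}) is None
```

   The None case corresponds to the situation described above where all
   suitable resources are already unavailable and the service cannot be
   restored without pre-empting lower-priority LSPs.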
Furthermore, there is no guarantee that restoration will be able to
recover the service. It may be that all suitable network resources
are already in use for other LSPs so that no new path can be found.
This problem can be partially mitigated by the use of LSP setup
priorities so that recovery LSPs can pre-empt other, lower-priority
LSPs.

Additionally, when a network failure occurs, multiple LSPs may be
disrupted by the same event. These LSPs may have been established by
different Network Management Stations (NMSs) or signaled by different
head-end LSRs, which means that multiple points in the network will
be trying to compute and establish recovery LSPs at the same time.
This can lead to contention within the network, meaning that some
recovery LSPs must be retried, resulting in very slow recovery times
for some services.

4.3.5. Reversion

When a service has been recovered so that traffic is flowing on the
recovery LSP, the faulted network resource may be repaired. A choice
must then be made about whether to redirect the traffic back onto the
original working LSP, or to leave it where it is on the recovery LSP.
These behaviors are known as "revertive" and "non-revertive",
respectively.

In revertive mode, care should be taken to prevent frequent
protection switching caused by an intermittent defect. Therefore,
when the failure condition on the recovered element has been cleared,
a fixed period of time should be allowed to elapse before normal data
traffic is redirected back onto the original working entity.

4.4. Mechanisms for Recovery

The purpose of this section is to describe, in general (non-MPLS-TP-
specific) terms, the mechanisms that can be used to provide
protection.

4.4.1. Link-Level Protection

4.4.2. Alternate Paths and Segments

4.4.3. Bypass Tunnels

4.5. Protection in Different Topologies

As described in the requirements listed in Section 3 and detailed in
[MPLS-TP-REQ], the recovery techniques used may be optimized for
different network topologies. This section describes two different
topologies and explains how recovery may be markedly different in
those scenarios. It also introduces the concept of a recovery domain
and shows how end-to-end survivability may be achieved through a
concatenation of recovery domains, each providing some level of
recovery in part of the network.

4.5.1. Mesh Networks

Linear protection provides a fast and simple protection switching
mechanism, and it fits best in mesh networks. It can protect against
a failure on an entity (an element of recovery that may constitute a
span, an LSP segment, a PW segment, an end-to-end LSP, or an
end-to-end PW).

In order to guarantee protection, two entities are pre-provisioned.
One of the pre-provisioned entities is configured to be the 'working'
entity (primary) and the other is configured as the 'protection'
entity (backup).

Protection switching occurs at the protection controllers, which
reside at the edges of the protected entity. Between these endpoints
lie the working and protection entities.

In linear protection, a protection entity is pre-provisioned to
protect the working entity. In order to guarantee protection
switching in the case of a 'failed' condition, the physical routes of
the working and protection entities should be completely physically
diverse.

[MPLS-TP-REQ] requires that both the 1:1 and 1+1 linear protection
schemes be supported. In 1:1 linear protection, bi-directional
protection switching should be supported; in 1+1 linear protection,
unidirectional protection switching should be supported.
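The essential difference between the two schemes lies in where the
traffic is bridged and selected. The following Python sketch models
the source bridge and sink selector of each scheme; the function
names and the dictionary representation of the two entities are
illustrative only, not taken from any MPLS-TP specification.

```python
def bridge_1_to_1(traffic, use_protection):
    """1:1 source bridge: traffic is sent on exactly one entity, so
    the two ends must agree on which one (coordination is needed)."""
    return {"protection" if use_protection else "working": traffic}

def bridge_1_plus_1(traffic):
    """1+1 source bridge: traffic is permanently duplicated onto both
    entities; the sink alone selects, so no coordination is needed."""
    return {"working": traffic, "protection": traffic}

def sink_select(received, prefer_protection):
    """Sink selector: pick the entity chosen by local criteria
    (e.g., defect indications). Returns None if nothing arrived on
    the selected entity."""
    key = "protection" if prefer_protection else "working"
    return received.get(key)
```

Note that if a 1:1 sink selects an entity the source is not currently
bridging to, `sink_select` returns None (traffic is lost), which is
precisely why the 1:1 scheme requires a coordination protocol between
the protection controllers while 1+1 does not.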
1:1 linear protection:

- Data traffic is transmitted over either the 'working' entity or the
  'protection' entity, but not both. Normal conditions exist when
  there is no failure on the 'working' entity and no administrative
  configuration or request directs traffic onto the 'protection'
  entity; under normal conditions, traffic is carried on the
  'working' entity. Upon a failure condition or a specific
  administrative request, the traffic is switched over to the
  'protection' entity.

- In each transmission direction, the source of the protection domain
  bridges the traffic onto the appropriate entity, and the sink of
  the protected domain selects the traffic from the appropriate
  entity. The source and the sink need to be coordinated to ensure
  that bridging and selection are done to and from the same entity.
  For this purpose, a signaling coordination protocol is needed.

- In bi-directional protection switching, both ends of the protection
  domain switch to the 'protection' entity (even when the failure is
  unidirectional).

- When there is no failure, the resources of the 'idle' entity may be
  used for lower-priority traffic, known as extra traffic. When
  protection switching is performed and the resources carrying the
  extra traffic are required for protection, the extra traffic is
  pre-empted by the protected traffic.

1+1 linear protection:

- The data traffic is copied and fed to both the 'working' and the
  'protection' entities. The traffic on the 'working' and
  'protection' entities is transmitted simultaneously to the sink of
  the protected domain, where a selection between the two is made
  (based on some predetermined criteria). Since only unidirectional
  protection switching is supported in the 1+1 linear protection
  scheme, there is no need for coordination between the protection
  controllers.

4.5.2. Ring Networks

4.5.3. Protection and Restoration Domains

Protection and restoration are performed in the context of a recovery
domain. A recovery domain is defined between two recovery reference
points, which are located at the edges of the recovery domain and are
responsible for performing recovery for a 'working' entity (which may
be one of the elements of recovery defined above) when an appropriate
trigger is received. These reference points function as recovery
controllers.

As described in Section 4.2 above, the recovery element may
constitute a span, a tandem connection (i.e., either an LSP segment
or a PW segment), an end-to-end LSP, or an end-to-end PW.

The method used to monitor the health of the recovery element is
unimportant, provided that the recovery controllers receive
information about its condition. The condition of the recovery
element may be 'OK', 'failed', or 'degraded'.

When the recovery operation is launched by an OAM trigger, the
recovery domain is equivalent to the OAM maintenance entity defined
in [MPLS-TP-OAM], and the recovery reference points are located at
the same positions as the OAM MEPs.

4.6. Recovery in Layered Networks

In multi-layer or multi-region networking, recovery may be performed
at multiple layers or across cascaded recovery domains.

The MPLS-TP recovery mechanism must ensure that the timing of
recovery is coordinated in order to avoid races, and to allow either
the recovery mechanism of the server layer to fix the problem before
recovery takes place at the MPLS-TP layer, or an upstream recovery
domain to perform recovery before a downstream domain. In
inter-connected rings, for example, it may be preferable to allow the
upstream ring to perform recovery before the downstream ring, in
order to ensure that recovery takes place in the ring in which the
failure occurred.
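One common realization of this inter-layer timing coordination is the
hold-off timer discussed below. The following Python sketch shows the
"wait, then re-check" behavior; the class and function names are
illustrative only and do not come from any MPLS-TP specification.

```python
import threading

class RecoveryElement:
    """Minimal stand-in for a monitored recovery element."""
    def __init__(self):
        self.failed = False

    def has_failure(self):
        return self.failed

def start_hold_off(element, hold_off_seconds, trigger_recovery):
    """On fault detection, wait out the hold-off period, then re-check
    the element: recover only if the server layer (or an upstream
    recovery domain) has not already cleared the fault. A zero timer
    means this layer reacts immediately."""
    def expire():
        if element.has_failure():
            trigger_recovery(element)

    if hold_off_seconds == 0:
        expire()
        return None
    timer = threading.Timer(hold_off_seconds, expire)
    timer.start()
    return timer
```

If the server layer repairs the fault while the timer is running, the
re-check on expiry finds no failure and the higher layer takes no
action, which avoids the race in which both layers allocate recovery
resources for the same fault.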
A hold-off timer is required to coordinate the timing of recovery at
multiple layers or across cascaded recovery domains. Setting this
configurable timer involves a trade-off between rapid recovery and
the creation of a race condition in which multiple layers respond to
the same fault, potentially allocating resources in an inefficient
manner. Thus, the detection of a failure condition in the MPLS-TP
layer should not immediately trigger the recovery process if the
hold-off timer is set to a value other than zero. Instead, the
hold-off timer should be started and, on expiry, the recovery element
should be checked to determine whether the failure condition still
exists. If it does, the defect triggers the recovery operation.

In other configurations, where the lower layer does not have a
restoration capability, or where it is not expected to provide
protection, the lower layer needs to trigger the higher layer to
perform recovery immediately.

See [RFC3386] for further discussion of multi-layer survivability.

4.6.1. Inherited Link-Level Protection

4.6.2. Shared Risk Groups

4.6.3. Fault Correlation

5. Mechanisms for Providing Protection in MPLS-TP

This section describes the existing mechanisms available to provide
protection within MPLS-TP networks and highlights areas where new
work is required. It is expected that, as new protocol extensions and
techniques are developed, this section will be updated to convert the
statements of required work into references to those protocol
extensions and techniques.

5.1. Management Plane

As described above, a fundamental requirement of MPLS-TP is that
recovery mechanisms should be capable of functioning in the absence
of a control plane. Recovery may be triggered by MPLS-TP OAM fault
management functions or by external requests (e.g., an operator
request for manual control of protection switching).
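The interaction between OAM fault triggers and operator requests can
be sketched as a simple priority arbitration at a protection
controller. The numeric priorities below are illustrative; the
relative order of the external commands follows the list in Section
5.1.2, and placing a signal-fail condition between the 'force' and
'manual' commands mirrors common practice in [G.808.1]-style schemes.
This ordering is an assumption for the sketch, not a statement of
this framework.

```python
# Illustrative priority values; only the relative order matters.
PRIORITY = {
    "blocked-protection": 4,  # lockout: disables the protection group
    "force-protection": 3,    # operator-forced switch
    "signal-fail": 2,         # OAM fault-management trigger
    "manual-protection": 1,   # operator request, yields to faults
    "no-request": 0,
}

def arbitrate(local_requests):
    """Return the highest-priority request currently active at a
    protection controller, or 'no-request' if none is active."""
    return max(local_requests, key=PRIORITY.__getitem__,
               default="no-request")
```

Under this sketch a manual switch request is overridden by a detected
fault, while a lockout command suppresses everything else, matching
the descending-priority command list given later in Section 5.1.2.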
The management plane may be used to configure the recovery domain by
setting the reference points (recovery controllers), the 'working'
and 'protection' entities, and the recovery type (e.g., 1:1
bi-directional linear protection, ring protection, etc.). Additional
parameters associated with the recovery process (such as a hold-off
timer and revertive/non-revertive operation) may also be configured.

In addition, the management plane may initiate manual control of the
protection switching function. The relative priority of fault
conditions and operator requests should be defined.

Since provisioning the recovery domain involves the selection of a
number of options, mismatches may occur at the different reference
points. The MPLS-TP OAM Automatic Protection Switching (APS) protocol
may be used as an in-band (i.e., data-plane-based) control protocol
to align both ends of the protected domain.

It should also be possible for the management plane to monitor the
recovery status.

5.1.1. Configuration of Protection Operation

In order to implement the protection switching mechanism, the
following entities and information should be provisioned:

- The protection controllers (reference points).

- The protection group, consisting of a 'working' entity (which may
  be one of the recovery elements defined above) and a 'protection'
  entity. To guarantee protection, the paths of the 'working' and
  'protection' entities should be completely physically diverse.

- The protection type that should be applied.

- Revertive/non-revertive behavior.

5.1.2. External Manual Commands

The following external, manual commands may be applied to a
protection group; they are listed in descending order of priority:

- Blocked protection action: a manual command that prevents data
  traffic from switching to the 'protection' entity. This command
  effectively disables the protection group.
- Force protection action: a manual command that forces a switch of
  normal data traffic to the 'protection' entity.

- Manual protection action: a manual command that forces a switch of
  data traffic to the 'protection' entity when there is no failure in
  the 'working' or 'protection' entity.

5.2. Fault Detection

5.3. Fault Isolation

5.4. OAM Signaling

5.4.1. Fault Detection

5.4.2. Fault Isolation

5.4.3. Fault Reporting

5.4.4. Coordination of Recovery Actions

5.5. Control Plane Signaling

5.5.1. Fault Detection

5.5.2. Fault Isolation

5.5.3. Fault Reporting

5.5.4. Coordination of Recovery Actions

6. Pseudowire Protection Considerations

The main application for the MPLS-TP network is currently identified
as the pseudowire. Pseudowires provide end-to-end connectivity over
the MPLS-TP network and may comprise a single pseudowire segment, or
multiple segments "stitched" together to provide end-to-end
connectivity.

The pseudowire service may itself require a level of protection as
part of its SLA. This protection could be provided by the MPLS-TP
LSPs that support the pseudowire, or could be a feature of the
pseudowire layer itself.

6.1. Utilizing Underlying MPLS-TP Protection

6.2. Protection in the Pseudowire Layer

7. Manageability Considerations

8. Security Considerations

9. IANA Considerations

This informational document makes no requests for IANA action.

10. Acknowledgments

11. References

11.1. Normative References

[RFC4427] Mannie, E. and D. Papadimitriou, "Recovery (Protection
          and Restoration) Terminology for Generalized
          Multi-Protocol Label Switching (GMPLS)", RFC 4427,
          March 2006.

[RFC4428] Papadimitriou, D. and E. Mannie, Editors, "Analysis of
          Generalized Multi-Protocol Label Switching (GMPLS)-based
          Recovery Mechanisms (including Protection and
          Restoration)", RFC 4428, March 2006.

[RFC4873] Berger, L., Bryskin, I., Papadimitriou, D., and A. Farrel,
          "GMPLS Segment Recovery", RFC 4873, May 2007.

[G.808.1] ITU-T, "Generic Protection Switching - Linear Trail and
          Subnetwork Protection", Recommendation G.808.1,
          December 2003.

[G.841]   ITU-T, "Types and Characteristics of SDH Network
          Protection Architectures", Recommendation G.841,
          October 1998.

[MPLS-TP-JWT] Bryant, S. and L. Andersson, "JWT Report on MPLS
          Architectural Considerations for a Transport Profile",
          draft-bryant-jwt-mplstp-report, work in progress.

[MPLS-TP-REQ] Niven-Jenkins, B., et al., "Requirements for MPLS-TP",
          draft-jenkins-mpls-mplstp-requirements, work in progress.

[MPLS-TP-OAM] Vigoureux, M., Betts, M., and D. Ward, "MPLS-TP OAM
          Requirements", work in progress.

11.2. Informative References

[RFC3386] Lai, W. and D. McDysan, "Network Hierarchy and Multilayer
          Survivability", RFC 3386, November 2002.

[RFC3469] Sharma, V. and F. Hellstrand, "Framework for
          Multi-Protocol Label Switching (MPLS)-based Recovery",
          RFC 3469, February 2003.

[RFC4426] Lang, J., Rajagopalan, B., and D. Papadimitriou, Editors,
          "Generalized Multiprotocol Label Switching (GMPLS)
          Recovery Functional Specification", RFC 4426, March 2006.

12. Editors' Addresses

Nurit Sprecher
Nokia Siemens Networks
3 Hanagar St. Neve Ne'eman B
45241 Hod Hasharon, Israel
Tel. +972 9 7751229
Email: nurit.sprecher@nsn.com

Adrian Farrel
Old Dog Consulting
Email: adrian@olddog.co.uk

Vach Kompella
Alcatel-Lucent
701 East Middlefield Rd.
Mountain View, CA 94043
Email: vach.kompella@alcatel.com

13. Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided
on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The IETF Trust (2008). This document is subject to the
rights, licenses and restrictions contained in BCP 78, and except as
set forth therein, the authors retain all their rights.