CCAMP Working Group                      CCAMP GMPLS P&R Design Team
Internet Draft
Expiration Date: May 2003             Dimitri Papadimitriou (Editor)
                                               Eric Mannie (Editor)

                                            Deborah Brungard (AT&T)
                                       Sudheer Dharanikota (Consult)
                                             Jonathan Lang (Calient)
                                                 Guangzhi Li (AT&T)
                                         Bala Rajagopalan (Tellium)
                                            Yakov Rekhter (Juniper)

                                                      November 2002

        Analysis of Generalized MPLS-based Recovery Mechanisms
              (including Protection and Restoration)

       draft-papadimitriou-ccamp-gmpls-recovery-analysis-03.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."
   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   For potential updates to the above required-text see:
   http://www.ietf.org/ietf/1id-guidelines.txt

1. Abstract

   This document provides an analysis grid that can be used to
   evaluate, compare and contrast the numerous Generalized MPLS
   (GMPLS)-based recovery mechanisms currently proposed in the CCAMP
   Working Group. A detailed analysis of each of the recovery phases
   is provided using the terminology defined in [CCAMP-TERM]. This
   document focuses on transport plane survivability and recovery
   issues, not on control plane resilience and related aspects.

D.Papadimitriou et al. - Internet Draft - Expires May 2003         1

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
   in this document are to be interpreted as described in RFC-2119
   [2].

3. Introduction

   This document provides an analysis grid that can be used to
   evaluate, compare and contrast the numerous Generalized MPLS
   (GMPLS) based recovery mechanisms currently proposed in the CCAMP
   Working Group. Here, the focus will only be on transport plane
   survivability and recovery issues, not on control plane resilience
   related aspects. Although the recovery mechanisms described in
   this document impose different requirements on recovery protocols,
   the protocol specifications themselves are not covered in this
   document. Though the concepts discussed here are technology
   independent, this document implicitly focuses on Sonet/SDH and
   pre-OTN technologies, except when specific details need to be
   considered (for instance, in the case of failure detection).
   Details for applicability to other technologies such as Optical
   Transport Networks (OTN) [ITUT-G709] will be covered in a future
   release of this document.

   In the present release, a detailed analysis is provided for each
   of the recovery phases as identified in [CCAMP-TERM]. These phases
   define the sequence of generic operations that need to be
   performed when an LSP/Span failure (or any other event generating
   such failures) occurs:

   - Phase 1: Failure detection
   - Phase 2: Failure localization and isolation
   - Phase 3: Failure notification
   - Phase 4: Recovery (Protection/Restoration)
   - Phase 5: Reversion (normalization)

   Together, the failure detection, localization and notification
   phases are referred to as fault management. Within a recovery
   domain, the entities involved during the recovery operations are
   defined in [CCAMP-TERM]; these entities include ingress, egress
   and intermediate nodes.

   In this document the term "recovery mechanism" is used to cover
   both protection and restoration mechanisms. Specific terms such as
   protection and restoration are only used when differentiation is
   required. Likewise, the term "failure" is used to represent both
   signal failure and signal degradation. In addition, a clear
   distinction is made between partitioning (horizontal hierarchy)
   and layering (vertical hierarchy) when analyzing hierarchical
   recovery mechanisms, including disjointness related issues. We
   also introduce the dimensions along which each of the recovery
   mechanisms described in this document can be further analyzed, and
   provide an analysis grid with respect to these dimensions. Last,
   we conclude by detailing the applicability of the current GMPLS
   protocol building blocks for recovery purposes.

   Note: any other recovery-related terminology used in this document
   conforms to that defined in [CCAMP-TERM].

4. Fault Management

4.1 Failure Detection

   Transport failure detection is the only phase that cannot be
   achieved by the control plane alone, since the latter needs a hook
   to the transport plane in order to collect the related
   information. It has to be emphasized that even if failure events
   themselves are detected by the transport plane, the latter, upon a
   failure condition, MUST trigger the control plane for subsequent
   actions through the use of GMPLS signalling capabilities (see
   [GMPLS-SIG]) or Link Management Protocol capabilities (see [LMP],
   Section 6).

   Therefore, by definition, transport failure detection is transport
   technology dependent (and so, exceptionally, we keep the
   "transport plane" terminology here). In transport fault
   management, a distinction is made between a defect and a failure.
   Here, the discussion addresses failure detection (persistent fault
   cause). In the technology dependent descriptions, a more precise
   specification will be provided.

   As an example, Sonet/SDH (see [G.707], [G.783] and [G.806])
   provides supervision capabilities covering:

   - Continuity: monitors the integrity of the continuity of a trail
     (i.e. section or path). This operation is performed by
     monitoring the presence/absence of the signal. Examples are Loss
     of Signal (LOS) detection for the physical layer, Unequipped
     (UNEQ) Signal detection for the path layer, and Server Signal
     Fail detection (e.g. AIS) at the client layer.

   - Connectivity: monitors the integrity of the routing of the
     signal between end-points. Connectivity monitoring is needed if
     the layer provides flexible connectivity, either automatically
     (e.g. cross-connects controlled by the TMN) or manually (e.g.
     fiber distribution frame). An example is the Trail (i.e. section
     or path) Trace Identifier used at the different layers and the
     corresponding Trail Trace Identifier Mismatch detection.
   - Alignment: checks that the client and server layer frame start
     can be correctly recovered from the detection of loss of
     alignment. The specific processes depend on the signal/frame
     structure and may include: (multi-)frame alignment, pointer
     processing, and alignment of several independent frames to a
     common frame start in the case of inverse multiplexing. Loss of
     alignment is a generic term. Examples are loss of frame, loss of
     multi-frame, or loss of pointer.

   - Payload type: checks that compatible adaptation functions are
     used at the source and the sink. This is normally done by adding
     a signal type identifier at the source adaptation function and
     comparing it with the expected identifier at the sink. An
     example is the payload signal label and the corresponding
     payload signal mismatch detection.

   - Signal Quality: monitors the performance of a signal. For
     instance, if the performance falls below a certain threshold a
     defect - excessive errors (EXC) or degraded signal (DEG) - is
     detected.

   The most important point to keep in mind is that the supervision
   processes and the corresponding failure detection (used to
   initiate the recovery phase(s)) result in either:

   - Signal Degrade (SD): a signal indicating that the associated
     data has degraded, in the sense that a degraded defect condition
     is active (for instance, a dDEG declared when the Bit Error Rate
     exceeds a preset threshold).

   - Signal Fail (SF): a signal indicating that the associated data
     has failed, in the sense that a signal interrupting near-end
     defect condition is active (as opposed to the degraded defect).

   In Optical Transport Networks (OTN), equivalent supervision
   capabilities are provided at the optical/digital section layers
   (OTS, OMS and OTUk) and at the optical/digital path layers (OCh
   and ODUk).
   Interested readers are referred to the ITU-T Recommendations
   [G.798] and [G.709] for more details.

   The above are examples where the failure detection, reporting and
   recovery responsible entities are co-located.

   On the other hand, in pre-OTN networks, a failure may be masked by
   an intermediate O/E/O based Optical Line System (OLS), preventing
   a Photonic Cross-Connect (PXC) from detecting upstream failures.
   In such cases, failure detection may be assisted by an out-of-band
   communication channel, with failure conditions reported to the PXC
   control plane. This can be provided by using [LMP-WDM] extensions
   that deliver IP message-based communication between the PXC and
   the OLS control plane. Also, since PXCs are framing format
   independent, failure conditions can only be triggered either by
   detecting the absence of the optical signal or by measuring its
   optical quality; both mechanisms are less reliable than electrical
   (digital) ones, and both types of detection mechanism are outside
   the scope of this document. If the intermediate OLS supports
   electrical (digital) mechanisms, these failure conditions are
   reported to the PXC using the LMP communication channel, and
   subsequent recovery actions are performed as described in Section
   5. As such, from the control plane viewpoint, this mechanism makes
   the composed OLS-PXC system appear as a single logical entity, to
   which the same failure management mechanisms can be applied as to
   any other O/E/O capable device.

   This example illustrates the scenario where the failure detection
   and reporting (recovery responsible) entities are not co-located.
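   The two detection outcomes described above (SF and SD) might be
   modelled as in the following sketch. The defect names and the
   threshold value are purely illustrative assumptions, not taken
   from any Recommendation or protocol specification:

```python
# Hypothetical sketch: mapping monitored defect conditions onto the
# two failure signals described above. Defect names and the default
# SD threshold are illustrative assumptions only.

SF_DEFECTS = {"LOS", "LOL", "UNEQ", "AIS"}  # signal-interrupting defects


def classify(defect=None, ber=None, sd_threshold=1e-6):
    """Return 'SF', 'SD', or None for a monitored interface."""
    if defect in SF_DEFECTS:
        return "SF"        # Signal Fail: interrupting defect active
    if ber is not None and ber > sd_threshold:
        return "SD"        # Signal Degrade: BER above preset threshold
    return None            # no failure condition to report
```

   Either outcome would then be used to initiate the subsequent
   recovery phase(s); only the classification itself is sketched
   here.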
   More generally, the following are typical failure conditions in
   Sonet/SDH and pre-OTN networks:

   - Loss of Light (LOL)/Loss of Signal (LOS): a Signal Fail (SF)
     condition where the optical signal is no longer detected on a
     given interface's receiver.

   - Signal Degrade (SD): detection of signal degradation over a
     specific period of time.

   - For Sonet/SDH payloads, all of the above-mentioned supervision
     capabilities can be used, resulting in an SD or SF condition.

   In summary, the following cases are considered to illustrate the
   communication between the detecting and reporting (also recovery
   responsible) entities:

   - Co-located detecting and reporting entities: both the detecting
     and reporting entities are on the same node (e.g., Sonet/SDH
     equipment, opaque cross-connects and, with some limitations,
     transparent cross-connects).

   - Non co-located detecting and reporting entities:
     - with in-band communication between entities: entities are
       separated, but transport plane (in-band) communication is
       provided between them (e.g., Server Signal Failures such as
       AIS);
     - with out-of-band communication between entities: entities are
       separated, but out-of-band communication is provided between
       them (e.g., using [LMP]).

4.2 Failure Localization and Isolation

   Failure localization provides the information required to perform
   the subsequent recovery action(s) at the LSP/span end-points.

   In some cases, accurate failure localization may be less urgent;
   the need is then to identify the failure as occurring within the
   recovery domain. This is particularly the case when edge-to-edge
   LSP recovery (edge referring, for instance, to a sub-network
   end-node) is performed based on a simple failure notification
   (including the identification of the failed working LSPs), so that
   a more accurate localization can be performed after LSP recovery.
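   One way to picture localization along an LSP is to query each
   span's local status and attribute the failure to the most upstream
   span reporting it (failures seen further downstream being
   consequent conditions, e.g. AIS). The query interface below is a
   hypothetical stand-in for, e.g., an LMP-style exchange:

```python
# Illustrative sketch of failure localization along an LSP. The
# span_ok callable is a hypothetical stand-in for a per-span status
# query between adjacent nodes.

def localize(spans, span_ok):
    """spans: ordered span identifiers, ingress to egress.
    span_ok: callable returning the local health of one span.
    Returns the most upstream failed span, or None."""
    for span in spans:
        if not span_ok(span):
            return span    # first (most upstream) failing span
    return None            # no failure within this recovery domain
```

   In the edge-to-edge recovery case described above, this precise
   localization step may deliberately be deferred until after the
   recovery action itself.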
   Failure localization should be triggered immediately after the
   fault detection phase. This operation can be performed at the
   transport/management plane level and/or, if unavailable via the
   transport plane, at the control plane level, where dedicated
   signaling messages can be used.

   When performed at the control plane level, a protocol such as LMP
   (see [LMP], Section 6) can be used for failure localization and
   isolation purposes.

4.3 Failure Notification

   Failure notification is used 1) to inform intermediate nodes that
   an LSP/span failure has occurred and has been detected, and 2) to
   inform the recovery deciding entities (which can correspond to any
   intermediate node or end-point of the failed LSP/span) that the
   corresponding service is not available. In general, these deciding
   entities will be the ones taking the appropriate recovery
   decision. When co-located with the recovering entity, these
   entities will also perform the corresponding recovery action(s).

   Failure notification can be provided either by the transport plane
   or by the control plane. As an example, let us first briefly
   describe the failure notification mechanism defined at the
   Sonet/SDH transport plane level (also referred to as maintenance
   signal supervision):

   - AIS (Alarm Indication Signal) occurs as a result of a failure
     condition, such as Loss of Signal, and is used to notify
     downstream nodes (of the appropriate layer processing) that a
     failure has occurred. AIS performs two functions: 1) inform the
     intermediate nodes (with the appropriate layer monitoring
     capability) that a failure has been detected, and 2) notify the
     connection end-point that the service is no longer available.
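   The two AIS functions just described can be sketched as follows.
   The node model is an illustrative assumption (an ordered list of
   nodes with at least one node downstream of the detecting one), not
   a representation of any particular equipment:

```python
# Sketch of the two AIS functions described above: intermediate
# nodes downstream of the detecting node learn that a failure was
# detected, and the connection end-point learns that the service is
# no longer available. Node model is hypothetical; assumes at least
# one node downstream of the detecting node.

def propagate_ais(nodes, detecting_index):
    """nodes: ordered node names, ingress to egress.
    detecting_index: position of the node that detected e.g. LOS.
    Returns (informed intermediate nodes, notified end-point)."""
    downstream = nodes[detecting_index + 1:]
    intermediates, endpoint = downstream[:-1], downstream[-1]
    return intermediates, endpoint
```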
   For a distributed control plane supporting one (or more) failure
   notification mechanism(s), regardless of the mechanism's actual
   implementation, the same capabilities are needed, with more (or
   less) information provided about the LSPs/spans under failure
   condition, their detailed status, etc.

   The most important difference between these mechanisms relates to
   the fact that transport plane notifications (as defined today)
   would initiate either a protection scheme directly (such as those
   defined in [CCAMP-TERM]) or a restoration scheme via the
   management plane. On the other hand, a failure notification
   mechanism through the control plane provides the possibility to
   trigger either a protection or a restoration action via the
   control plane. This has the advantage that a control plane
   recovery responsible entity does not necessarily have to be
   co-located with a transport maintenance/recovery domain: a control
   plane recovery domain can be defined at entities not supporting
   transport plane recovery.

   Moreover, as specified in [GMPLS-SIG], notification message
   exchanges through a GMPLS control plane may not follow the same
   path as the LSP/spans for which these messages carry the status.
   In turn, this ensures a fast, reliable (through the use of either
   a dedicated control plane network or disjoint control channels)
   and efficient (through the aggregation of several LSP/span
   statuses within the same message) failure notification mechanism.

   The other important properties to be met by the failure
   notification mechanism are mainly the following:

   - Notification messages must provide enough information for the
     most efficient subsequent recovery action to be taken at the
     recovering entities (in most of the recovery schemes this action
     is even deterministic).
     Remember here that these entities can be either intermediate
     nodes or end-points through which normal traffic flows. Based on
     local policy, intermediate nodes may not use this information
     for subsequent recovery actions (see for instance the APS
     protocol phases described in [CCAMP-TERM]). In addition, fast
     notification is a mechanism that runs in collaboration with the
     existing signalling (see for instance [GMPLS-RSVP-TE]), allowing
     intermediate nodes to stay informed about the status of the
     working LSP/spans under failure condition.

     The trade-off here is to define what information the LSP/span
     end-points (more precisely, the deciding entity) need in order
     for the recovering entity to take the best recovery action: if
     not enough information is provided, the decision cannot be
     optimal (note that in this eventuality, the important issue is
     to quantify the level of sub-optimality); if too much
     information is provided, the control plane may be overloaded
     with unnecessary information, and the aggregation/correlation of
     this notification information will be more complex and time
     consuming to achieve. Note that a more detailed quantification
     of the amount of information to be exchanged and processed is
     strongly dependent on the failure notification protocol
     specification.

   - If failure localization and isolation are not performed by one
     of the LSP/span end-points or some intermediate points, these
     should receive enough information from the notification message
     to locate the failure; otherwise they would need to (re-)
     initiate a failure localization and isolation action.

   - Avoiding so-called notification storms implies that 1) the
     failure detection output is correlated (i.e. alarm correlation)
     and aggregated at the node detecting the failure(s), 2) failure
     notifications are directed to a restricted set of destinations
     (in general the end-points), and 3) notification suppression
     (i.e. alarm suppression) is provided in order to limit flooding
     in case of multiple and/or correlated failures appearing at
     several locations in the network.

   - Alarm correlation and aggregation (at the failure detecting
     node) imply a consistent decision based on the conditions for
     which a trade-off between fast convergence (at the detecting
     node) and fast notification (implying that correlation and
     aggregation occur at the receiving end-points) can be found.

4.4 Correlating Failure Conditions

   A single failure event (such as a span failure) can result in
   multiple failure conditions (such as individual LSP failures)
   being reported. These can be grouped (i.e. correlated) to reduce
   the number of failure conditions communicated on the reporting
   channel, for both in-band and out-of-band failure reporting.

   In such a scenario, it can be important to wait for a certain
   period of time, typically called the failure correlation time, and
   gather all the failures to report them as a group of failures (or
   simply a group failure). For instance, this approach can be
   provided using LMP-WDM for pre-OTN networks (see [LMP-WDM]) or
   when using Signal Failure/Degrade Group in the Sonet/SDH context.

   Note that a default average time interval during which the failure
   correlation operation can be performed is difficult to provide,
   since it is strongly dependent on the underlying network topology.
   Therefore, it can be advisable to provide a per-node configurable
   failure correlation time. The detailed selection criteria for this
   time interval are outside the scope of this document.
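   The correlation behaviour described above can be sketched as
   follows: individual failure conditions arriving within one
   (configurable) correlation window are reported as a single group
   failure. Timestamps are supplied explicitly so the sketch stays
   deterministic; all names are illustrative:

```python
# Sketch of per-node failure correlation: failure conditions whose
# timestamps fall within the same correlation window are reported as
# one group failure. Names and the window semantics are illustrative
# assumptions, not a protocol definition.

def correlate(events, correlation_time):
    """events: list of (timestamp, failed_resource), sorted by time.
    Returns a list of group notifications (lists of resources)."""
    groups, current, window_start = [], [], None
    for t, resource in events:
        if window_start is None or t - window_start > correlation_time:
            if current:
                groups.append(current)   # emit the completed group
            current, window_start = [], t
        current.append(resource)
    if current:
        groups.append(current)           # emit the final group
    return groups
```

   A per-node configurable correlation_time, as suggested above,
   simply becomes a parameter of this procedure.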
   When failure correlation is not provided, multiple failure
   notification messages may be sent out in response to a single
   failure (for instance, a fiber cut), each one containing a set of
   information on the failed working resources (for instance, the
   individual lambda LSPs flowing through this fiber). This allows
   for a more prompt response, but can potentially overload the
   control plane due to the large amount of failure notifications.

5. Recovery Mechanisms and Schemes

5.1 Transport vs. Control Plane Responsibilities

   For both protection and restoration, and when applicable, recovery
   resources are provisioned using GMPLS signalling capabilities.
   Thus, these are control plane-driven actions (topological and
   resource-constrained) that are always performed in this context.

   The following table gives an overview of the responsibilities
   taken by the control plane in case of LSP/span recovery:

   1. LSP/span Protection Schemes

   - Phase 1: Failure detection              Transport plane
   - Phase 2: Failure isolation/localization Transport/Control plane
   - Phase 3: Failure notification           Transport/Control plane
   - Phase 4: Protection switching           Transport/Control plane
   - Phase 5: Reversion (normalization)      Transport/Control plane

   Note: in the LSP/span protection context, control plane actions
   can be performed for operational purposes and/or synchronization
   purposes (vertical synchronization between transport and control
   plane) and/or notification purposes (horizontal synchronization
   between nodes at the control plane level).

   2. LSP/span Restoration Schemes

   - Phase 1: Failure detection              Transport plane
   - Phase 2: Failure isolation/localization Transport/Control plane
   - Phase 3: Failure notification           Control plane
   - Phase 4: Recovery switching             Control plane
   - Phase 5: Reversion (normalization)      Control plane

   Therefore, this document is primarily focused on the provisioning
   of recovery resources, failure notification, LSP/span recovery and
   reversion operations. Moreover, some additional considerations are
   dedicated to the mechanisms associated with the failure
   localization/isolation phase.

5.2 Technology-dependent and -independent Mechanisms

   The present recovery mechanism analysis applies in fact to any
   circuit-oriented data plane technology with discrete bandwidth
   increments (like Sonet/SDH, G.709 OTN, etc.) controlled by an
   IP-centric distributed control plane.

   The following sub-sections are not intended to favor one
   technology over another. They simply list the pros and cons of
   each, in order to determine the mechanisms that GMPLS-based
   recovery must deliver to overcome the cons and take advantage of
   the pros in their respective applicability contexts.

5.2.1 OTN Recovery

   OTN recovery specifics are left for further consideration.

5.2.2 Pre-OTN Recovery

   Pre-OTN recovery (also referred to as "lambda switching" recovery)
   presents mainly the following advantages:

   - it benefits from a simpler architecture, making it more suitable
     for mesh-based recovery schemes (on a per-channel basis);

   - when suppression of intermediate node transponders is provided
     (vs. the use of non-standard masking of upstream failures), e.g.
     through squelching, failures (such as LoL) will propagate to
     edge nodes, giving the possibility to initiate upper layer
     driven recovery actions.
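   The second advantage above can be pictured as a simple predicate:
   a Loss of Light reaches the downstream edge node (and can thus
   trigger an upper-layer recovery action) only if no intermediate
   node electrically regenerates, and thereby masks, the signal. The
   network model is an illustrative assumption:

```python
# Sketch of the pre-OTN behaviour described above: LoL propagates to
# the egress edge node iff no node between the failed span and that
# edge regenerates (and so masks) the optical signal. Hypothetical
# model, for illustration only.

def lol_reaches_edge(downstream_nodes, regenerates):
    """downstream_nodes: nodes between the failed span and the egress
    edge. regenerates: predicate, True if a node electrically
    regenerates the signal (masking LoL)."""
    return not any(regenerates(n) for n in downstream_nodes)
```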
The main disadvantage comes from the lack of interworking due to the
large number of failure management protocols (in particular failure
notification protocols) and recovery mechanisms currently available.

Note also that, for all-optical networks, the combination of
recovery with optical physical impairments is left for a future
release of this document, since the corresponding detection
technologies are under specification.

5.2.3 Sonet/SDH Recovery

Some of the advantages of Sonet/SDH, and more generically of any TDM
transport plane, are:

- Protection schemes are standardized (see [G.841]) and can operate
  across protected domains and interwork (see [G.842]).

- It provides failure detection, notification and path/section
  Automatic Protection Switching (APS) mechanisms.

- It provides greater control over the granularity of the TDM
  LSPs/links that can be recovered, with respect to the coarser
  optical channel (or whole fiber content) recovery switching.

Some of the current limitations of Sonet/SDH layer recovery are:

- Limited topological scope: inherently, the use of ring topologies
  (Dedicated SNCP or Shared Protection Rings) has reduced
  flexibility with respect to the somewhat more complex but
  potentially more resource-efficient mesh-based recovery schemes.

- Inefficient use of spare capacity: Sonet/SDH protection is largely
  applied to ring topologies, where spare capacity often remains
  idle, making the efficiency of bandwidth usage an issue.

- Support of meshed recovery requires intensive network management
  development, and the functionality is limited by the capabilities
  of both the network elements and the element management systems.
5.3 Specific Aspects of Control Plane-based Recovery Mechanisms

5.3.1 In-band vs. Out-of-band Signalling

The nodes communicate through the use of (IP-terminating) control
channels defining the control plane (transport) topology. In this
context, two classes of transport mechanisms can be considered:
in-fiber or out-of-fiber (through a dedicated, physically diverse
control network referred to as the Data Communication Network or
DCN). The potential impact of the use of an in-fiber (signalling)
transport mechanism is briefly considered here.

The in-fiber transport mechanism can be further subdivided into
in-band and out-of-band. As such, the distinction between in-fiber
in-band and in-fiber out-of-band signalling reduces to the
consideration of a logically versus physically embedded control
plane topology with respect to the transport plane topology. In the
scope of this document, since we assume that (IP-terminating)
channels between nodes must be continuously available in order to
enable the exchange of recovery-related information and messages,
one considers that in either case (i.e. in-band or out-of-band) at
least one logical or physical channel between nodes is available.

Therefore, the key issue when using in-fiber signalling is whether
one can assume independence between the fault-tolerance capabilities
of the control plane and the failures affecting the transport plane
(including the nodes). Note also that existing specifications such
as the OTN provide a limited form of independence for in-fiber
signalling by dedicating a separate optical supervisory channel
(OSC, see [ITU-T G.709] and [ITU-T G.874]) to transport the overhead
and other control traffic. For OTNs, failure of the OSC does not
result in failing the optical channels.
Similarly, loss of the control channel must not result in failing
the data (transport plane).

5.3.2 Uni- versus Bi-directional Failures

The failure detection, correlation and notification mechanisms
(described in Section 4) can be triggered when either a
uni-directional or a bi-directional LSP/span failure occurs (or a
combination of both). As illustrated in Figures 1 and 2, two
alternatives can be considered here:

1. Uni-directional failure detection: the failure is detected on the
   receiver side, i.e. it is detected only by the node downstream of
   the failure (or by the upstream node, depending on the failure
   propagation direction, respectively).

2. Bi-directional failure detection: the failure is detected on the
   receiver side of both the downstream node AND the upstream node
   to the failure.

Notice that after the failure detection time, if only control
plane-based failure management is provided, the peering node is
unaware of the failure detection status of its neighbor.

    -------            -------          -------            -------
   |       |          |       |Tx    Rx|       |          |       |
   | NodeA |----...---| NodeB |xxxxxxxx| NodeC |----...---| NodeD |
   |       |----...---|       |--------|       |----...---|       |
    -------            -------          -------            -------

                          t0       >>>>>>> F

                          t1  x <---------------x
                                 Notification
   t2 <--------...--------x                 x--------...-------->
       Up Notification                        Down Notification

    -------            -------          -------            -------
   |       |          |       |Tx    Rx|       |          |       |
   | NodeA |----...---| NodeB |xxxxxxxx| NodeC |----...---| NodeD |
   |       |----...---|       |xxxxxxxx|       |----...---|       |
    -------            -------          -------            -------

                          t0  F <<<<<<< >>>>>>> F

                          t1  x <-------------> x
                                 Notification
   t2 <--------...--------x                 x--------...-------->
       Up Notification                        Down Notification

   Fig. 1 & 2. Uni- and Bi-directional Failure Detection/Notification

After failure detection, the following failure management operations
can subsequently be considered:

- Each detecting entity sends a notification message to the
  corresponding transmitting entity. For instance, in Fig. 1 (Fig.
  2), node C sends a notification message to node B (while node B
  sends a notification message to node A). To ensure reliable
  failure notification, a dedicated acknowledgment message can be
  returned to the sender node.

- Next, within a certain (pre-determined) time window, the nodes
  impacted by the failure occurrences perform their correlation. In
  case of a uni-directional failure, node B only receives the
  notification message from node C, and thus the time for this
  operation is negligible. However, in case of a bi-directional
  failure, node B (and node C) must correlate the received
  notification message from node C (node B, respectively) with the
  corresponding locally detected information.

- After some (pre-determined) period of time, referred to as the
  hold-off time, during which local recovery actions were not
  successful, the following occurs. In case of a uni-directional
  failure, and depending on the directionality of the connection,
  node B should send an upstream notification message to the
  ingress node A, or node C should send a downstream notification
  message to the egress node D. However, in such a case only node A
  (node D, respectively) would initiate an edge-to-edge recovery
  action; per [CCAMP-TERM], this node is referred to as the
  "master" and the other end node as the "slave". Note that the
  connection terminating node (i.e. node D or node A) may
  optionally be notified.

  In case of a bi-directional failure, node B may send an upstream
  notification message to the ingress node A, or node C a
  downstream notification message to the egress node D.
However, due to the dependence on the connection directionality,
only the ingress node A or the egress node D would initiate an
edge-to-edge recovery action. Note that the connection terminating
node (i.e. node D or node A) should also be notified of this event
using upstream and downstream fast notification (see [GMPLS-SIG]).
For instance, if a connection directed from D to A is under failure
condition, only the notification sent from node C to node D would
initiate a recovery action. Here as well, per [CCAMP-TERM], the
deciding (and recovering) node D is referred to as the "master"
while node A is referred to as the "slave" (i.e. recovering-only
entity).

Note: the determination of the master and the slave may be based
either on configured information or on a dedicated protocol
capability.

In the above scenarios, the path followed by the notification
messages does not have to be the same as the one followed by the
failed LSP (see [GMPLS-SIG] for more details on the notification
message exchange). The important point concerning this mechanism is
that either the detecting/reporting entities (i.e. nodes B and C)
are also the deciding/recovery entities, or the detecting/reporting
entities are simply intermediate nodes in the subsequent recovery
process. One refers to local recovery in the former case and to
edge-to-edge recovery in the latter.

5.3.3 Partial versus Full Span Recovery

When a given span carries more than one LSP or LSP segment, an
additional aspect must be considered upon failure of that span. The
LSPs it carries can be either individually recovered, recovered as
a group (a.k.a. bulk LSP recovery), or recovered as independent
sub-groups. The selection among these mechanisms can be made
independently of the failure notification granularity, since when
correlation time windows are used, simultaneous recovery of several
LSPs can be performed using a single request. Moreover, the
criteria by which such sub-groups can be formed are outside the
scope of this document.

An additional complexity arises in case of (sub-)group LSP recovery.
Between a given node pair, the LSPs that a given (sub-)group
contains may have been created from different source (i.e.
initiator) nodes toward different destination nodes. Consequently,
the failure notification messages subsequent to a bi-directional
span failure affecting several LSPs (or the whole group of LSPs the
span carries) are not necessarily directed toward the same
initiator nodes. In particular, these messages may be directed to
both the upstream and the downstream node to the failure.
Therefore, such a span failure may trigger recovery actions to be
performed from both sides (i.e. both from the upstream and from the
downstream node to the failure). In order to facilitate the
definition of the corresponding recovery mechanisms (and their
sequence), one assumes here as well that, per [CCAMP-TERM], the
deciding (and recovering) entity, referred to as the "master", is
the only initiator of the recovery of the whole LSP (sub-)group.

5.3.4 Difference between LSP, LSP Segment and Span Recovery

The recovery definitions given in [CCAMP-TERM] are quite generic and
apply to link (or local span) as well as LSP recovery. The major
difference between LSP, LSP segment and span recovery is related to
the number of intermediate nodes that the signalling messages have
to travel.
Since nodes are not necessarily adjacent in case of LSP (or LSP
segment) recovery, signalling message exchanges from the reporting
to the deciding/recovery entity have to cross several intermediate
nodes. In particular, this applies to the notification messages,
due to the number of hops separating the location of the failure
occurrence from their destination. This results in additional
propagation and forwarding delay. Note that the former delay may,
in certain circumstances, be non-negligible; e.g. in case of a
copper out-of-band network, one has to consider approximately 1 ms
per 200 km.

Moreover, the recovery mechanisms applicable to an end-to-end LSP
and to the segments (i.e. edge-to-edge) that may compose an
end-to-end LSP can be exactly the same. However, one expects in the
latter case that the destination of the failure notification
message will be the ingress of each of these segments. Therefore,
taking into account the mechanism described in Section 5.3.2,
failure notification can first be exchanged between the LSP segment
terminating points and, after expiration of the hold-off time,
directed toward the end-to-end LSP terminating points.

5.4 Difference between Recovery Type and Scheme

Section 4.6 of [CCAMP-TERM] defines the basic recovery types. The
purpose of this section is to describe the schemes that can be
built using these recovery types. In brief, a recovery scheme is
defined as the combination, between different ingress-egress node
pairs, of a set of identical recovery types. Several examples are
provided in order to illustrate the difference between a recovery
type such as 1:1 and a recovery scheme such as (1:1)^n.

1. (1:1)^n with recovery resource sharing

The exponent, n, indicates the number of times a 1:1 recovery type
is applied between at most n different ingress-egress node pairs.
Here, at most n pairs of disjoint working and recovery LSPs/spans
share a common resource at most n times. Since the working
LSPs/spans are mutually disjoint, simultaneous requests for use of
the shared (common) resource will only occur in case of
simultaneous failures, which are less likely to happen.

For instance, in the common (1:1)^2 case, if the 2 recovery LSPs in
the group overlap on the same common resource, then the scheme can
handle only single failures; any multiple working LSP failure will
cause at least one working LSP to be denied automatic recovery.
Consider, for instance, the following example, with working LSPs
A-B and E-F and recovery LSPs A-C-D-B and E-C-D-F sharing a common
C-D resource.

   A ----------------- B
    \                 /
     \               /
      C ----------- D
     /               \
    /                 \
   E ----------------- F

2. (M:N)^n with recovery resource sharing

The exponent, n, indicates the number of times an M:N recovery type
is applied between at most n different ingress-egress node pairs.
The interpretation thus follows from the previous case, except that
here disjointness applies to the N working LSPs/spans and to the M
recovery LSPs/spans, while sharing M common resources at most n
times.

In both schemes, one may see the following at the LSP level: we
have a "group" of sum{n=1}^N N{n} working LSPs and a pool of shared
backup resources, not all of which are available to any given
working path. In such conditions, defining a metric that describes
the amount of overlap among the recovery LSPs would give some
indication of the group's ability to handle multiple simultaneous
failures.
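Such an overlap metric can be sketched as follows. This is a hypothetical illustration in Python, not part of the draft or of any GMPLS specification: recovery paths are modeled as sets of link identifiers, and the metric is the maximum number of recovery LSPs contending for any single shared resource.

```python
from collections import Counter

def max_overlap(recovery_lsps):
    # Count, for each resource (link), how many recovery LSPs use it,
    # and return the highest such count (the "amount of overlap").
    use = Counter()
    for path in recovery_lsps:
        for resource in path:
            use[resource] += 1
    return max(use.values()) if use else 0

def guaranteed_recoveries(recovery_lsps, capacity=1):
    # Number of simultaneous working-LSP failures that are always
    # recoverable: bounded by the capacity of the most contended
    # shared resource.
    overlap = max_overlap(recovery_lsps)
    return len(recovery_lsps) if overlap <= capacity else capacity

# (1:1)^2 example above: recovery LSPs A-C-D-B and E-C-D-F share C-D.
lsp_ab = {("A", "C"), ("C", "D"), ("D", "B")}
lsp_ef = {("E", "C"), ("C", "D"), ("D", "F")}
print(max_overlap([lsp_ab, lsp_ef]))               # 2 LSPs contend for C-D
print(guaranteed_recoveries([lsp_ab, lsp_ef]))     # 1: only single failures
print(guaranteed_recoveries([lsp_ab, lsp_ef], 2))  # 2: with two C-D resources
```

With a single C-D resource only one failure is guaranteed recoverable; doubling the shared resource (as in the (2:2)^2 case) allows both.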
For instance, in the simple (1:1)^n case, if the n recovery LSPs in
a (1:1)^n group overlap, then the group can handle only single
failures; any multiple working LSP failure will cause at least one
working LSP to be denied automatic recovery. But if one considers,
for instance, a (2:2)^2 group in which there are two pairs of
overlapping recovery LSPs, then two LSPs (belonging to the same
pair) can be recovered simultaneously. The latter case can be
illustrated as follows: 2 working LSPs A-B and E-F and 2 recovery
LSPs A-C-D-B and E-C-D-F sharing the two common C-D resources.

   A ================ B
    \\              //
     \\            //
      C =========== D
     //            \\
    //              \\
   E ================ F

Moreover, in all these schemes, (working) path disjointness can be
reinforced by exchanging working LSP related information during the
recovery LSP signalling.

Specific issues related to the combination of shared (discrete)
bandwidth and disjointness for recovery schemes are described in
Section 8.4.2.

5.5 LSP Restoration Schemes

5.5.1 Classification

LSP/span recovery time and ratio depend on proper recovery LSP
(soft) provisioning and on the level of recovery resource
overbooking (i.e. over-provisioning). A proper balance of these two
mechanisms will result in the desired LSP/span recovery time and
ratio when single or multiple failure(s) occur(s).

Recovery LSP Provisioning phases:

   (1) Route Computation --> On-demand
            |
            |
            --> Pre-Computed

   (2) Signalling --> On-demand
            |
            |
            --> Pre-Signaled

   (3) Resource Selection --> On-demand
            |
            |
            --> Pre-Selected

Overbooking Levels:

                    +----- Dedicated (for instance: 1+1, 1:1, etc.)
                    |
                    |
                    +----- Shared (for instance: 1:N, M:N, etc.)
                    |
   Level of         |
   Overbooking -----+----- Unprotected (for instance: 0:1, 0:N)

   Fig 3. LSP Provisioning and Overbooking Classification

This figure presents a classification of the different options for
LSP provisioning and overbooking. Although we acknowledge that
these operations are mostly run during planning (using network
planning) and provisioning (using signalling and routing)
activities, we keep them in the analysis of the recovery schemes.

Proper LSP/span provisioning will help in alleviating many of the
failures. As an example, one may compute primary and secondary
paths, either end-to-end or segment-per-segment, to recover an LSP
from multiple failure events affecting link(s), node(s), SRLG(s)
and/or SRG(s). Such primary and secondary LSP/span provisioning can
be categorized, as shown in the above figure, based on:
(1) whether the recovery path (i.e. route) is pre-computed or
    computed on demand;
(2) when the recovery path is pre-computed, whether it is
    pre-signaled (implying recovery resource reservation) or
    signaled on demand;
(3) and, when the recovery resources are reserved, whether they are
    pre-selected or selected on demand.

Note that these different options give rise to different LSP/span
recovery times. The following sub-sections consider all these cases
in analyzing the schemes.

There are many mechanisms available that allow the overbooking of
the recovery resources. This overbooking can be done per LSP (as in
the example mentioned above), per link (such as span protection) or
per domain (such as ring topologies). In all these cases the level
of overbooking, as shown in the above figure, can be classified as
dedicated (such as 1+1 and 1:1), shared (such as 1:N and M:N) or
unprotected (i.e. restorable if enough recovery resources are
available).
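The provisioning classification of Fig. 3 can be captured in a small data model. The following Python sketch (the class and method names are illustrative assumptions, not taken from the draft or from GMPLS specifications) maps the three provisioning decisions onto the restoration mechanisms analyzed in the following sub-sections:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RecoveryProvisioning:
    route_precomputed: bool       # (1) route: pre-computed vs. on-demand
    pre_signaled: bool            # (2) signalling: pre-signaled vs. on-demand
    resources_preselected: bool   # (3) resources: pre-selected vs. on-demand
    overbooking: str              # "dedicated", "shared" or "unprotected"

    def scheme(self):
        # Walk the decision tree of Fig. 3 top to bottom.
        if not self.route_precomputed:
            return "LSP re-routing"
        if not self.pre_signaled:
            return "LSP re-provisioning"
        if not self.resources_preselected:
            return "pre-signaled, reserved (no pre-selection)"
        return "pre-signaled, reserved and pre-selected"

print(RecoveryProvisioning(False, False, False, "unprotected").scheme())
print(RecoveryProvisioning(True, True, True, "shared").scheme())
```

Each combination corresponds to one of the mechanisms described in Sections 5.5.2 and 5.5.3, with the overbooking level recorded orthogonally.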
Under a shared restoration scheme, one may support preemptable
extra-traffic (preempting low-priority connections in case of
resource contention). In this document we keep in mind all the
above-mentioned overbooking mechanisms in analyzing the recovery
schemes.

5.5.2 Dynamic LSP Restoration

We first define the following times in order to provide a
quantitative estimation of the time performance of the different
dynamic and pre-signaled LSP restoration mechanisms:
- Path Computation Time - Tpc
- Path Selection Time - Tps
- End-to-end LSP Resource Reservation Time - Trr (a delta for
  resource selection is also considered; the corresponding total
  time is then referred to as Trrs)
- End-to-end LSP Resource Activation Time - Tra (a delta for
  resource selection is also considered; the corresponding total
  time is then referred to as Tras)

The Path Selection Time (Tps) is considered when a pool of paths
for recovery LSPs between a given source/destination pair is
pre-computed, and after failure occurrence one of these paths is
selected for the recovery of the LSP under failure condition.

Note: failure management operations such as failure detection,
correlation and notification are considered as equally
time-consuming for all the mechanisms described below.

1. With Route Pre-computation (or LSP re-provisioning)

An end-to-end restoration LSP is established after the failure(s)
occur(s), based on a pre-computed path (i.e. route). As such, one
can define this as an "LSP re-provisioning" mechanism. Here, one or
more (disjoint) routes for the restoration path are computed (and
optionally pre-selected) before a failure occurs.

No reservation or selection of resources is performed along the
restoration path before failure.
As a result, there is no guarantee that a restoration connection is
available when a failure occurs.

The expected total restoration time T is thus equal to Tps + Trrs,
or to Trrs when a dedicated computation is performed for each
working LSP.

2. Without Route Pre-computation (or LSP re-routing)

An end-to-end restoration LSP is established after the failure(s)
occur(s). Here, one or more (disjoint) explicit routes for the
restoration path are dynamically computed, and one is selected
after failure. As such, one can define this as an "LSP re-routing"
mechanism.

No reservation or selection of resources is performed along the
restoration path before failure. As a result, there is no guarantee
that a restoration connection is available when a failure occurs.

The expected total restoration time T is thus equal to
Tpc (+ Tps) + Trrs. Therefore, the time performance of these two
approaches differs by the time required for route computation, Tpc
(and its potential selection time, Tps).

5.5.3 Pre-signaled Restoration LSP

1. With resource reservation and without pre-selection

An end-to-end restoration path is pre-selected from a set of one or
more pre-computed (disjoint) explicit routes before failure. The
restoration LSP is signaled along this pre-selected path to reserve
resources at each node, but resources are not selected.

In this case, the resources reserved for each restoration LSP may
be dedicated or shared between different working LSPs that are not
expected to fail simultaneously. Local node policies can be applied
to define the degree to which these resources are shared across
independent failures.

Upon failure detection, signalling is initiated along the
restoration path to select the resources and to perform the
appropriate operation at each node involved in the restoration
connection (e.g. cross-connections).
The expected total restoration time T is thus equal to Tras
(post-failure activation), while the operations performed before
failure occurrence take Tpc + Tps + Trr.

2. With resource reservation and pre-selection

An end-to-end restoration path is pre-selected from a set of one or
more pre-computed (disjoint) explicit routes before failure. The
restoration LSP is signaled along this pre-selected path to reserve
AND select resources at each node, such that the selection of the
recovery resources is fixed at the control plane level. However, no
cross-connections are performed along the restoration path.

In this case, the resources reserved for each restoration LSP may
only be shared between different working LSPs that are not expected
to fail simultaneously. Since one considers restoration schemes
here, the sharing degree should not be limited to working (and
recovery) LSPs starting and ending at the same ingress and egress
nodes. Therefore, one expects to receive some feedback information
on the recovery resource sharing degree at each node participating
in the recovery scheme.

Upon failure detection, signalling is initiated along the
restoration path to activate the reserved and selected resources
and to perform the appropriate operation at each node involved in
the restoration connection (e.g. cross-connections).

The expected total restoration time T is thus equal to Tra
(post-failure activation), while the operations performed before
failure occurrence take Tpc + Tps + Trrs. Therefore, the time
performance of these two approaches differs only by the time
required for resource selection during the activation of the
recovery LSP (i.e. Tras - Tra).
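The post-failure restoration times of Sections 5.5.2 and 5.5.3 can be summarized in a few lines of Python (a sketch: the numerical values are purely illustrative and the function name is an assumption), using the draft's notation where Trrs and Tras are Trr and Tra plus the same resource-selection delta:

```python
def restoration_time(scheme, Tpc, Tps, Trr, Tra, Tsel):
    # Post-failure restoration time T for the four mechanisms;
    # Tsel is the delta for resource selection.
    Trrs = Trr + Tsel
    Tras = Tra + Tsel
    return {
        "re-provisioning":   Tps + Trrs,        # route pre-computed
        "re-routing":        Tpc + Tps + Trrs,  # route computed on demand
        "reserved":          Tras,              # pre-signaled, not selected
        "reserved+selected": Tra,               # pre-signaled and selected
    }[scheme]

# Purely illustrative values (milliseconds):
for s in ("re-routing", "re-provisioning", "reserved", "reserved+selected"):
    print(s, restoration_time(s, Tpc=50, Tps=5, Trr=100, Tra=40, Tsel=10))
```

As expected, the schemes differ by Tpc between re-routing and re-provisioning, and by the selection delta (Tras - Tra) between the two pre-signaled variants.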
5.5.4 LSP Segment Restoration

The above approaches can be applied on a sub-network basis rather
than on an end-to-end basis (in order to reduce the global recovery
time).

It should also be noted that, using the horizontal hierarchical
approach described in Section 7.1, a given end-to-end LSP can be
recovered by multiple recovery mechanisms (e.g. 1:1 protection in a
metro edge network but M:N protection in the core). These
mechanisms are ideally independent and may even use different
failure localization and notification mechanisms.

6. Normalization

Normalization is defined as the mechanism allowing the switching of
normal traffic from the recovery LSP/span back to the working
LSP/span previously under failure condition.

Use of normalization is at the discretion of the recovery domain
policy. Normalization (reversion) may impact the normal traffic (a
second hit) depending on the normalization mechanism used.

If normalization is supported, then 1) the normal traffic must be
returned to the working LSP/span when the failure condition clears,
and 2) the capability to de-activate (turn off) the use of
reversion should be provided. De-activation of reversion should not
impact the normal traffic (regardless of whether it is currently
using the working or the recovery LSP/span).

Note: during the failure, the reuse of any non-failed resources
(e.g. LSP spans) belonging to the working LSP/span is at the
discretion of the recovery domain policy.

6.1 Wait-To-Restore

A specific mechanism (Wait-To-Restore) is used to prevent frequent
recovery switching operations due to an intermittent defect (e.g. a
BER fluctuating around the SD threshold).

First, an LSP/span under failure condition must become fault-free,
e.g. with a BER less than a certain recovery threshold. After the
recovered LSP/span (i.e. the previously working LSP/span) meets
this criterion, a fixed period of time shall elapse before normal
traffic uses the corresponding resources again. This duration,
called the Wait-To-Restore (WTR) period or timer, is generally on
the order of a few minutes (for instance, 5 minutes) and should be
capable of being set. The WTR timer may either be a fixed period or
provide for incrementally longer periods before retrying. An SF or
SD condition on the previously working LSP/span will override the
WTR timer value (i.e. the WTR is cancelled and the WTR timer will
restart).

6.2 Revertive Mode Operation

In the revertive mode of operation, when the recovery LSP/span is
no longer required, i.e. the failed working LSP/span is no longer
in SD or SF condition, a local Wait-To-Restore (WTR) state will be
activated before switching the normal traffic back to the recovered
working LSP/span.

During the reversion operation, since this state becomes the
highest in priority, signalling must maintain the normal traffic on
the recovery LSP/span from the previously failed working LSP/span.
Moreover, during this WTR state, any null traffic or extra traffic
(if applicable) request is rejected.

However, deactivation (cancellation) of the Wait-To-Restore timer
may occur in case of higher-priority request attempts. That is, the
use of the recovery LSP/span by the normal traffic may be preempted
if a higher-priority request for this recovery LSP/span is
attempted.

6.3 Orphans

When a reversion operation is requested, normal traffic must be
switched from the recovery to the recovered working LSP/span. A
particular situation occurs when the previously working LSP/span
cannot be recovered, such that normal traffic cannot be switched
back.
In such a case, the LSP/span under failure condition (also referred
to as an "orphan") must be cleared, i.e. removed from the pool of
resources allocated for normal traffic. Otherwise, a potential
de-synchronization between the control and transport plane resource
usage can appear. Depending on the signalling protocol capabilities
and behavior, different mechanisms are to be expected here.

Therefore, any reserved or allocated resources for the LSP/span
under failure condition must be unreserved/de-allocated. Several
ways can be used for that purpose: wait for the clear-out time
interval to elapse, initiate a deletion from the ingress or the
egress node, or trigger the initiation of a deletion from an entity
(such as an EMS or NMS) capable of reacting upon the reception of
an appropriate notification message.

7. Hierarchies

Recovery mechanisms are being made available at multiple (if not
all) transport layers within so-called "IP-over-optical" networks.
However, each layer has certain recovery features, and one needs to
determine the exact impact of the interaction between the recovery
mechanisms provided by these layers.

Hierarchies are used to build scalable complex systems. Abstraction
is used as a mechanism to build large networks or as a technique
for enforcing technology, topological or administrative boundaries.
The same hierarchical concept can be applied to control network
survivability. In general, it is expected that the recovery action
is taken by the recoverable LSP/span closest to the failure, in
order to avoid the multiplication of recovery actions. Moreover,
recovery hierarchies can also be bound to control plane logical
partitions (e.g. administrative or topological boundaries). Each of
them may apply different recovery mechanisms.
1114 In brief, commonly accepted ideas are generally that the lower 1115 layers can provide coarse but faster recovery while the higher 1116 layers can provide finer but slower recovery. Moreover, it is also 1117 more than desirable to avoid too many layers with functional 1118 overlaps. In this context, this section intends to analyze these 1119 hierarchical aspects including the physical (passive) layer(s). 1121 7.1 Horizontal Hierarchy (Partitioning) 1123 A horizontal hierarchy is defined when partitioning a single layer 1124 network (and its control plane) into several recovery domains. 1125 Within a domain, the recovery scope may extend over a link (or 1126 span), LSP segment or even an end-to-end LSP. Moreover, an 1128 D.Papadimitriou et al. - Internet Draft � May 2003 21 1129 administrative domain may consist of a single recovery domain or can 1130 be partitioned into several smaller recovery domains. The operator 1131 can partition the network into recovery domains based on physical 1132 network topology, control plane capabilities or various traffic 1133 engineering constraints. 1135 An example often addressed in the literature is the metro-core-metro 1136 application (sometimes extended to a metro-metro/core-core) within a 1137 single transport layer (see Section 7.2). For such a case, an end- 1138 to-end LSP is defined between the ingress and egress metro nodes, 1139 while LSP segments may be defined within the metro or core sub- 1140 networks. Each of these topological structures determines a so- 1141 called �recovery domain� since each of the LSPs they carry can have 1142 its own recovery type (or even scheme). The support of multiple 1143 recovery schemes within a sub-network is referred to as a multi- 1144 recovery capable domain or simply multi-recovery domain. 1146 7.2 Vertical Hierarchy (Layers) 1148 It is a very challenging task to combine in a coordinated manner the 1149 different recovery capabilities available across the path (i.e. 
switching capable) and section layers to ensure that certain
network survivability objectives are met for the different services
supported by the network.

As a first analysis step, one can draw the following guidelines for
a vertical coordination of the recovery mechanisms:
- The lower the layer, the faster the notification and switching.
- The higher the layer, the finer the granularity of the
  recoverable entity and therefore the granularity of the recovery
  resource (and subsequently its sharing ratio).

Therefore, in the scope of this analysis, a vertical hierarchy
consists of multiple layered transport planes providing different:
- Discrete bandwidth granularities for non-packet LSPs such as OCh,
  ODUk, STS SPE/HOVC and VT SPE/LOVC LSPs, and continuous bandwidth
  granularities for packet LSPs.
- Potentially, recovery capabilities with different temporal
  granularities, ranging from milliseconds to tens of seconds.

Note: based on the bandwidth granularity, one can determine four
classes of vertical hierarchies: (1) packet over packet, (2) packet
over circuit, (3) circuit over packet, and (4) circuit over
circuit. This section expands on (4); (2) is covered in [TE-RH],
(1) is extensively covered by the MPLS Working Group, and (3) by
the PWE3 Working Group.

In Sonet/SDH environments, one typically considers the VT/LOVC and
STS SPE/HOVC as independent layers, the VT/LOVC LSPs using the
underlying STS SPE/HOVC LSPs as links, for instance. In OTN, the
ODUk path layers lie on the OCh path layer, i.e. the ODUk LSPs use
the underlying OCh LSPs as links. Notice here that server layer
LSPs may simply be provisioned, and not dynamically triggered or
established (control driven approach).
The following figure (including only the path layers) illustrates
the hierarchical layers that can be covered by the recovery
architecture of a transmission network comprising an SDH/Sonet and
an OTN part:

LOVC <------------------------------------------------------> LOVC
 ||                                                            ||
HOVC ---- HOVC <----------------------------------> HOVC ---- HOVC
           ||                                        ||
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
           ||                                        ||
          ODUk ---- ODUk <--------------> ODUk ---- ODUk
                     ||                    ||
                    OTUk <--------------> OTUk
                     ||                    ||
                    OCh -- OCh -..- OCh -- OCh

In this context, the important points are the following:
- These layers are path layers, i.e. the ones controlled by the
  GMPLS (in particular, signalling) protocol suite.
- An LSP at a lower layer, for instance an optical channel (i.e. a
  network connection), appears as a section (i.e. a link) for the
  OTUk layer; such links are typically controlled by link
  management protocols such as LMP.

The first key issue with multi-layer recovery is that achieving
control-plane-driven individual or bulk LSP recovery will only be
as efficient as the underlying link (local span) recovery. In such
a case, the span can be either protected or unprotected, but the
LSP it carries MUST be (at least locally) recoverable. Therefore,
the span recovery process can either be independent when protected
(or restorable), or triggered by the upper LSP recovery process.
The former case requires coordination in order to achieve
subsequent LSP recovery. Therefore, in order to achieve robustness
and fast convergence, multi-layer recovery requires a fine-tuned
coordination mechanism.

Moreover, in the absence of adequate coordination between recovery
mechanisms (pre-determined, for instance, by the hold-off timer), a
failure notification may propagate from one layer to the next
within a recovery hierarchy.
This can cause "collisions" and trigger simultaneous recovery
actions that may lead to race conditions and, in turn, reduce the
optimization of the resource utilization and/or generate global
instabilities in the network (see [MANCHESTER]). Therefore, a
consistent and efficient escalation strategy is needed to
coordinate recovery across several layers.

One can thus expect the definition of the recovery mechanisms and
protocol(s) to be technology independent, such that they can be
consistently implemented at different layers; this would in turn
simplify their global coordination. Moreover, as mentioned in
[TE-RH], some looser form of coordination and communication between
(vertical) layers, such as a consistent hold-off timer
configuration (set up through signalling during the working LSP
establishment), can be considered in this context, allowing
synchronization between the recovery actions performed across these
layers.

Note: Recovery Granularity

In most environments, the design of the network and the vertical
distribution of the LSP bandwidth are such that the recovery
granularity is finer at higher layers. The OTN and SDH/Sonet layers
can only recover whole sections or the individual connections they
transport, whereas the IP/MPLS layer(s) can recover individual
packet LSPs or groups of packet LSPs.

Obviously, recovery granularity at the sub-wavelength (i.e.
Sonet/SDH) level can be provided only when the network includes
devices switching at the same granularity level (and thus not with
optical channel switching capable devices). Therefore, the network
layer can deliver control-plane-driven recovery mechanisms on a
per-LSP basis if and only if the LSP's class has the corresponding
switching capability at the transport plane level.
7.3 Escalation Strategies

There are two types of escalation strategies (see [DEMEESTER]):
bottom-up and top-down.

The bottom-up approach assumes that lower layer recovery schemes
are more expedient and faster than upper layer ones; higher layer
recovery can therefore be inhibited or held off. However, this
assumption is not entirely true. Consider a Sonet/SDH protection
mechanism (with a sub-50 ms protection switching time) lying on top
of an OTN restoration mechanism (with a sub-200 ms restoration
time). The assumption should thus (at least) be refined as follows:
lower layer recovery schemes are faster than upper layer ones, but
only if the same type of recovery mechanism is used at each layer.

Consequently, taking into account the recovery actions at the
different layers in a bottom-up approach: if lower layer recovery
mechanisms are provided and sequentially activated in conjunction
with higher layer ones, the lower layers MUST have an opportunity
to recover normal traffic before the higher layers do. However, if
lower layer recovery is slower than higher layer recovery, the
lower layer MUST either communicate the failure-related information
to the higher layer(s) (and allow them to perform recovery), or use
a hold-off timer in order to temporarily set the higher layer
recovery action in a "standby mode". Note that the a priori
information exchange between layers concerning their efficiency is
not within the current scope of this document. Nevertheless, the
coordination functionality between layers must be configurable and
tunable.

An example of coordination between the optical and packet layer
control planes consists, for instance, of letting the optical layer
perform the failure management operations (in particular, failure
detection and notification) while giving the packet layer control
plane the authority to perform the recovery actions. If the packet
layer recovery action is unsuccessful, fallback to the optical
layer can subsequently be performed.

The top-down approach attempts service recovery at the higher
layers before invoking lower layer recovery. Higher layer recovery
is service selective and permits "per-CoS" or "per-connection"
re-routing. With this approach, the most important aspect is that
the upper layer must provide its own reliable failure detection
mechanism, independent of the lower layer.

The same reference also suggests recovery mechanisms incorporating
a coordinated effort shared by two adjacent layers with periodic
status updates. Moreover, at certain layers, some of these recovery
operations can be pre-assigned; e.g. a particular link will be
handled by the packet layer while another will be handled by the
optical layer.

7.4 Disjointness

Having link- and node-diverse working and recovery LSPs/spans does
not guarantee working and recovery LSP/span disjointness. Due to
the common (passive) physical layer topology, additional
hierarchical concepts, such as the Shared Risk Link Group (SRLG),
and mechanisms, such as SRLG-diverse path computation, must be
developed to provide complete working and recovery LSP/span
disjointness (see [IPO-IMP] and [CCAMP-SRLG]). Otherwise, a failure
affecting the working LSP/span could also potentially affect the
recovery LSP/span resources; one refers to such an event as a
common failure.
7.4.1 SRLG Disjointness

A Shared Risk Link Group (SRLG) is defined as the set of optical
spans (or links, or optical lines) sharing a common physical
resource (for instance, fiber links, fiber trunks or cables), i.e.
sharing a common risk. For instance, a set of links L belongs to
the same SRLG s if they are provisioned over the same fiber link f.

The SRLG properties can be summarized as follows:

1) A link belongs to more than one SRLG if and only if it crosses
   one of the resources covered by each of them.

2) Two links belonging to the same SRLG can individually belong to
   (one or more) other SRLGs.

3) The SRLG set S of an LSP is defined as the union of the
   individual SRLG sets s of the individual links composing this
   LSP.

SRLG disjointness for LSPs:

The LSP SRLG disjointness concept is based on the following
postulate: an LSP (i.e. a sequence of links) covers an SRLG if and
only if it crosses one of the links belonging to that SRLG.

Therefore, SRLG disjointness for LSPs can be defined as follows:
two LSPs are disjoint with respect to an SRLG s if and only if they
do not both cover this SRLG.

LSP SRLG disjointness with respect to a set S of SRLGs is in turn
defined as follows: two LSPs are disjoint with respect to a set of
SRLGs S if and only if the sets of SRLGs they cover are completely
and mutually disjoint.

The impact on recovery is obvious: SRLG disjointness is a necessary
(but not a sufficient) condition to ensure optical network
survivability. With respect to the physical network resources, a
working-recovery LSP/span pair must be SRLG disjoint in the case of
dedicated recovery, while a working-recovery LSP/span group must be
SRLG disjoint in the case of shared recovery.
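The set-based definitions above lend themselves to a direct
sketch. The following illustrative Python fragment (the link names
and SRLG identifiers are hypothetical examples, not taken from this
document) checks SRLG disjointness of two LSPs:

```python
# Illustrative sketch of the SRLG-disjointness rules of Section
# 7.4.1. Link names and SRLG identifiers are hypothetical.

def lsp_srlg_set(lsp_links, link_srlgs):
    """SRLG set S of an LSP: the union of the SRLGs of its links."""
    srlgs = set()
    for link in lsp_links:
        srlgs |= link_srlgs.get(link, set())
    return srlgs

def srlg_disjoint(lsp_a, lsp_b, link_srlgs):
    """Two LSPs are SRLG disjoint iff their SRLG sets do not
    intersect (mutual disjointness over the set S of SRLGs)."""
    return lsp_srlg_set(lsp_a, link_srlgs).isdisjoint(
        lsp_srlg_set(lsp_b, link_srlgs))

# Example: links provisioned over shared fiber trunks.
link_srlgs = {
    "A-C": {1}, "C-D": {1, 2},   # A-C and C-D share fiber trunk 1
    "A-E": {3}, "E-F": {4}, "F-D": {5},
}
working = ["A-C", "C-D"]
recovery = ["A-E", "E-F", "F-D"]
print(srlg_disjoint(working, recovery, link_srlgs))  # True
```

Note that a link-disjoint pair such as A-C and C-D would fail this
check, since both links cover SRLG 1.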
7.4.2 SRG Disjointness

By extending the previous definition from a link to a more generic
structure, referred to as a "risk domain", one arrives at the SRG
(Shared Risk Group) notion (see [CCAMP-SRG]). A risk domain is a
group of arbitrarily connected nodes and spans that together can
provide certain like-capabilities (such as a chain of dedicated/
shared protected links and nodes, a ring of nodes and links, or a
protected hierarchical TE link).

In turn, an SRG represents the risk domain capabilities and other
parameters that assist in computing diverse paths through the
domain (it can also be used in assessing the risk associated with
the risk domain).

Note that the SRLG set of a risk domain constitutes a subset of its
SRGs. SRLGs address only the risks associated with the (physical)
links and passive elements within the risk domain, whereas SRGs may
contain nodes and other topological information in addition to the
links. The key difference between an SRLG and an SRG is that an
SRLG translates to a single link's shared risk with respect to the
server layer topology (even for hierarchical TE links), while an
SRG translates to a sequence of SRLGs over the same layer, from one
source to one or more destinations located within the same area.

As for SRLG disjointness, the impact on recovery is that SRG
disjointness is a necessary (but not a sufficient) condition to
ensure optical network survivability. With respect to the physical
and logical network resources (and topology), a working-recovery
LSP/span pair must be SRG disjoint in the case of dedicated
recovery, while a working-recovery LSP/span group must be SRG
disjoint in the case of shared recovery.

8. Recovery Scheme/Strategy Selection

In order to provide a structured selection and analysis of the
recovery scheme/strategy, the following dimensions can be defined:

1. Fast convergence (performance): provide a mechanism that
   aggregates multiple failures (this implies fast failure
   detection and correlation mechanisms) and a fast recovery
   decision, independently of the number of failures occurring in
   the optical network (implying also fast failure notification).

2. Efficiency (scalability): minimize the switching time required
   for LSP/span recovery, independently of the number of LSPs/spans
   being recovered (this implies efficient failure correlation,
   fast failure notification and timely, efficient recovery
   mechanism(s)).

3. Robustness (availability): minimize the LSP/span downtime,
   independently of the underlying topology of the transport plane
   (this implies a highly responsive recovery mechanism).

4. Resource optimization (optimality): minimize the resource
   capacity, including LSP/span and node (switching) capacity,
   required for recovery purposes; this dimension can also be
   referred to as optimizing the sharing degree of the recovery
   resources.

5. Cost optimization: provide a cost-effective recovery strategy.

However, these dimensions are either outside the scope of this
document (such as cost optimization and recovery path computational
aspects) or pull in opposite directions. For instance, it is
obvious that providing a 1+1 recovery type for each LSP minimizes
the LSP downtime (in case of failure) while being non-scalable and
recovery resource consuming, without enabling any extra traffic.

The following sections attempt to provide a first basis for
selecting a recovery strategy with respect to the dimensions
described above and the recovery schemes proposed in [CCAMP-TERM].
8.1 Fast Convergence (Detection/Correlation and Hold-off Time)

Fast convergence is related to the failure management operations.
It refers to the time that elapses during failure detection/
correlation and hold-off, the point at which the recovery switching
actions are initiated. This point has already been discussed in
Section 4.

8.2 Efficiency (Switching Time)

In general, the more pre-assignment/pre-planning of the recovery
LSP/span, the more rapid the recovery scheme. Since protection
implies pre-assignment (and cross-connection, in the case of LSP
recovery) of the protection resources, protection schemes generally
recover faster than restoration schemes.

Span restoration (since it uses the control plane) is also likely
to be slower than most span protection types; however, this greatly
depends on the efficiency of the span restoration signalling. LSP
restoration with pre-signaled and pre-selected recovery resources
is likely to be faster than fully dynamic LSP restoration,
especially because it eliminates any potential crank-back during
recovery LSP establishment.

If one excludes the crank-back issue, the difference between
dynamic and pre-planned restoration depends on the restoration path
computation and path selection time. Since computational
considerations are outside the scope of this document, it is up to
the vendor to determine the average path computation time in
different scenarios, and to the operator to decide whether or not
dynamic restoration is advantageous over pre-planned schemes,
depending on the network environment.
This difference also depends on the flexibility provided by
pre-planned restoration with respect to dynamic restoration: the
former implies a limited number of failure scenarios (due, for
instance, to local storage limitations), while the latter enables
on-demand path computation based on the information received
through failure notification, and is as such more robust with
respect to the scope of failure scenarios.

Moreover, LSP segment restoration, and in particular dynamic
restoration (i.e. no path pre-computation, so none of the recovery
resources are pre-signaled), will generally be faster than
end-to-end LSP schemes. However, local LSP restoration assumes that
each LSP segment end-point has enough computational capacity to
perform this operation, while end-to-end restoration requires only
that the LSP end-points provide this path computation capability.

Recovery time objectives for Sonet/SDH protection switching (not
including the time to detect failure) are specified in [G.841] at
50 ms, taking into account constraints on distance, the number of
connections involved, and, in the case of ring enhanced protection,
the number of nodes in the ring. Recovery time objectives for
restoration mechanisms have been proposed through a separate effort
[TE-RH].

8.3 Robustness

In general, the less pre-assignment (protection)/pre-planning
(restoration) of the recovery LSP/span, the more robust the
recovery type/scheme is to a variety of (single) failures, provided
that adequate resources are available. Moreover, pre-selection of
the recovery resources gives less flexibility for multiple failure
scenarios than no recovery resource pre-selection. For instance, if
failures occur that affect two LSPs sharing a common link along
their restoration paths, then only one of these LSPs can be
recovered.
This occurs unless the restoration path of at least one of these
LSPs is re-computed, or the local resource assignment is modified
on the fly.

In addition, recovery schemes with pre-planned recovery resources
(in particular, spans for protection and LSPs for restoration
purposes) will not be able to recover from failures that
simultaneously affect both the working and recovery LSP/span.
Thus, the recovery resources should ideally be chosen to be as
disjoint as possible (with respect to link, node and SRLG) from the
working ones, so that no single failure event affects both the
working and the recovery LSP/span. In brief, working and recovery
resources must be fully diverse in order to guarantee that a given
failure will not affect the working and the recovery LSP/span
simultaneously. Also, the risk of simultaneous failure of the
working and restoration LSPs can be reduced by re-computing a
restoration path whenever a failure occurs along the corresponding
recovery LSP, or by re-computing a restoration path and
re-provisioning the corresponding recovery LSP whenever a failure
occurs along a working LSP/span. This method keeps the number of
available recovery paths constant.

The robustness of a recovery scheme is also determined by the
amount of reserved (i.e. signaled) recovery resources within a
given shared resource pool: as the sharing degree of recovery
resources increases, the recovery scheme becomes less robust to
multiple failure occurrences. Recovery schemes, in particular
restoration schemes with pre-signaled resource reservation (with or
without pre-selection), should be capable of reserving an adequate
amount of resources to ensure recovery from any specific set of
failure events, such as any single SRLG failure, any two SRLG
failures, etc.
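The dimensioning rule implied by the last sentence can be sketched
as follows: for single-SRLG-failure coverage, the shared recovery
bandwidth reserved on a link must cover the worst-case single SRLG
failure. The sketch below is illustrative only; the traffic model
and all values are hypothetical assumptions, not taken from this
document:

```python
# Sketch of single-SRLG-failure dimensioning for a shared recovery
# resource pool. The traffic model and values are hypothetical.

def required_shared_capacity(demands_per_srlg):
    """demands_per_srlg maps an SRLG id to the total bandwidth of
    working LSPs that would be recovered over this shared link if
    that SRLG fails. For single-failure coverage, the reservation
    must cover the worst case over all SRLGs."""
    return max(demands_per_srlg.values(), default=0)

# Three SRLGs whose failure would route traffic onto the shared
# link (bandwidths in arbitrary units, illustrative):
demands = {10: 2.5, 11: 5.0, 12: 4.0}
print(required_shared_capacity(demands))  # 5.0
```

Covering any two simultaneous SRLG failures would instead require
the maximum over all pairs of SRLG demand sums, at a correspondingly
higher reservation cost.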
8.4 Resource Optimization

It is commonly admitted that sharing recovery resources provides
network resource optimization. Therefore, from a resource
utilization perspective, protection schemes are often classified
with respect to their degree of sharing recovery resources with the
working entities. Moreover, non-permanent bridging protection types
allow (under normal conditions) extra traffic over the recovery
resources.

From this perspective: 1) 1+1 LSP/span protection is the most
resource-consuming protection type, since it does not allow for any
extra traffic; 2) the 1:1 LSP/span protection type requires a
dedicated recovery LSP/span that allows carrying extra preemptible
traffic; 3) the 1:N and M:N LSP/span recovery types require 1 (or
M, respectively) recovery LSPs/spans (shared between the N working
LSPs/spans) while allowing carrying extra preemptible traffic.
Obviously, 1+1 protection precludes, and the 1:1 recovery type does
not allow for, recovery LSP/span sharing, whereas the 1:N and M:N
recovery types do allow sharing of 1 (M, respectively) recovery
LSPs/spans between N working LSPs/spans.

However, despite the fact that the 1:1 recovery type does not allow
recovery LSP/span sharing, the recovery schemes (see Section 5.4)
that can be built from it (e.g. (1:1)^n) do allow for sharing of
the recovery resources these entities include. In addition, the
flexibility in the usage of shared recovery resources (in
particular, shared links) may be limited by network topology
restrictions, e.g. the fixed ring topology of traditional enhanced
protection schemes.

On the other hand, in restoration with pre-signaled resource
reservation, the amount of reserved restoration capacity is
determined by the local bandwidth reservation policies.
In restoration schemes with re-provisioning, a pool of restoration
resources can be defined, from which all (spare) restoration
resources are selected after failure occurrence for recovery path
computation purposes. The degree to which restoration schemes allow
sharing amongst multiple independent failures is then directly
dictated by the size of the restoration pool. Moreover, in all
restoration schemes, spare resources can be used to carry
preemptible traffic (thus over preemptible LSPs/spans) when the
corresponding resources have not been committed for LSP/span
recovery purposes.

From this, it clearly follows that fewer recovery resources (i.e.
LSPs/spans and switching capacity) have to be allocated to a shared
recovery resource pool if a greater sharing degree is allowed.
Thus, the degree to which the network is survivable is determined
by the policy that defines the amount of reserved (shared) recovery
resources and the maximum sharing degree allowed.

8.4.1 Recovery Resource Sharing

When recovery resources are shared over several LSPs/spans
[GMPLS-RTG], the use of the Maximum LSP Bandwidth, Maximum
Reservable Bandwidth and Unreserved Bandwidth TE link sub-TLVs
provides only part of the information needed to optimize the
network resources allocated for shared recovery purposes.

Here, one has to additionally consider a recovery resource sharing
ratio (or degree) in order to optimize the shared resource usage,
since the distribution of the bandwidth utilization per component
link ID over a given TE link is by definition unknown. For this
purpose, we define the difference between the Maximum Reservable
Bandwidth (for recovery) and the Maximum Capacity per TE link i as
the Maximum Sharable Bandwidth, or max_R[i].
Within this quantity, the amount of bandwidth currently allocated
for shared recovery per TE link i is defined as R[i]. Both
quantities are expressed in terms of component link bandwidth units
(thus, equivalently, the Minimum LSP Bandwidth is of one bandwidth
unit).

From these definitions, it follows that this per-TE-link
information can be used to optimize the usage of the resources
allocated (per TE link) for shared recovery. If one refers to r[i]
as the actual bandwidth per TE link i (in terms of per-component
bandwidth units) committed for shared recovery, then the following
quantity must be maximized over the potential TE link candidates:

   sum {i=1,...,N} [(R[i] + r[i])/(t[i] - b[i])]

or, equivalently:

   sum {i=1,...,N} [(R[i] + r[i])/r[i]]

with R[i] >= 1 and r[i] >= 1 (in terms of per-component bandwidth
units). In this formula, N is the total number of links traversed
by a given LSP, t[i] the Maximum LSP Bandwidth per TE link i, and
b[i] the sum per TE link i of the bandwidth committed for working
LSPs and dedicated recovery. The quantity [(R[i] + r[i])/r[i]] is
defined as the Shared (Recovery) Bandwidth Ratio per TE link i. In
addition, TE links for which R[i] = max_R[i] or for which r[i] = 0
are pruned during recovery path computation. Note also that TE
links for which R[i] = max_R[i] = r[i] cannot be shared more than
twice (their sharing ratio equals 2).
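The ratio form of this objective, together with the pruning rule,
can be sketched as follows. This is a minimal illustration under
the definitions above; the per-link values are hypothetical:

```python
# Sketch of the Shared (Recovery) Bandwidth Ratio of Section 8.4.1.
# R: bandwidth currently allocated for shared recovery on a TE
# link, r: bandwidth actually committed for shared recovery,
# max_R: Maximum Sharable Bandwidth. Values are illustrative.

def path_sharing_metric(links):
    """Sum of per-link Shared Bandwidth Ratios (R + r)/r over a
    candidate recovery path; this sum is to be maximized over
    candidate paths. Links with R == max_R (saturated) or r == 0
    are pruned, i.e. the candidate path is rejected."""
    total = 0.0
    for R, r, max_R in links:
        if R == max_R or r == 0:
            return None   # link pruned during recovery path computation
        total += (R + r) / r
    return total

# Two candidate paths described as (R, r, max_R) tuples per TE link:
path_a = [(4, 2, 8), (2, 2, 4)]
path_b = [(8, 2, 8), (2, 2, 4)]   # first link saturated: pruned
print(path_sharing_metric(path_a))  # 5.0
print(path_sharing_metric(path_b))  # None
```

A path selection process would evaluate this metric for every
candidate recovery path and keep the one with the largest sum.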
More generally, one can draw the following mapping between the
available bandwidth at the transport and control plane levels:

   -  -------- Max Reservable Bandwidth
   |R -----
   -  -----
      -----
      -------- TE Link Capacity        -  -------- TE Link Capacity
      -----                            |r -----
      ----- <------ b ------>          -  -----
      -----                               -----
      -----                               -----
      -----                               ----- <-- Min LSP Bandwidth
      -------- 0                          -------- 0

Note that the above approach does not require the flooding of any
per-LSP information or a detailed distribution of the bandwidth
allocation per component link (or individual ports). Moreover, it
has been demonstrated that this Partial Information Routing
approach can also be extended to resource shareability with respect
to the number of times each SRLG is protected by a recovery
resource, in particular an LSP (see also Section 8.4.2). This
method, also referred to as the stochastic approach, is described
in [BOUILLET]. By flooding this summarized information using a
link-state protocol, recovery path computation and selection for
SRLG-diverse recovery paths can be optimized with respect to
resource sharing, with a performance difference of less than 5%
compared to a Full Information Flooding approach. The latter can be
found in [GLI], for instance. Strictly speaking, both methods rely
on deterministic knowledge of the network topology and resource
(usage) status.

For GMPLS-based recovery purposes, the Partial Information Routing
approach can be further enhanced by extending GMPLS signalling
capabilities to allow the working LSP related information, and in
particular its explicit route, to be exchanged over the recovery
LSP, in order to enable more efficient admission control at the
upstream nodes of shared (link) resources.
8.4.2 Recovery Resource Sharing and SRLG Disjointness

As stated in the previous section, resource shareability should be
maximized with respect to the number of times each SRLG is
protected by a recovery resource.

Methods can be considered for avoiding contention for the shared
recovery resources during a single SRLG/node failure (see Section
5). These allow the sharing of a common reserved recovery resource
between two (or more) recovery LSPs (only) if their respective
working LSPs are mutually disjoint with respect to link, node and
SRLG. A single failure then does not disrupt several (i.e. at least
two) working LSPs simultaneously.

For this purpose, additional extensions to [GMPLS-RTG] in support
of path computation for shared mesh restoration may be considered.
First, information about the recovery resource sharing on a TE link
may be considered, such as the current number of recovery LSPs
sharing the recovery resources reserved on the TE link (see also
Section 8.4.1), and the current number of SRLGs recovered by this
amount of shared recovery resource on the TE link. The latter is
equivalent to the total number of SRLGs that the (recovery) LSPs
sharing the recovery resources shall recover. Then, if SRLG
disjointness has to be considered with a strong recovery guarantee
in the event of a single SRLG failure, the explicit list of SRLGs
recovered by the recovery resources currently shared on the TE link
may be advertised, together with their respective sharable recovery
bandwidth (see also Section 8.4.1). The latter information is
equivalent to the maximum sharable recovery bandwidth per SRLG or
per group of SRLGs (thus one considers a decreasing amount of
sharable bandwidth and a decreasing SRLG list over time).
Note: it has to be emphasized that a per-(group of) SRLG maximum
sharable recovery bandwidth is restricted by the length that the
corresponding (sub-)TLV may take, and thus by the number of SRLGs
that it can include.

Therefore, compared to the case of simple recovery resource sharing
regardless of SRLG disjointness (as described in Section 8.4.1),
the additional TE link information considered here should allow for
better path selection (at distinct ingress nodes) during SRLG-
disjoint LSP provisioning in a shared mesh recovery scheme. The
next section will demonstrate that such extensions are
complementary to the exchange of the explicit route of the working
LSP over the recovery LSP path, in order to achieve shared recovery
resource contention avoidance.

8.4.3 Recovery Resource Sharing, SRLG Disjointness and Admission
      Control

Admission control is a strict requirement to be fulfilled by nodes
giving access to shared links. This can be illustrated using the
following recovery scheme:

   A ------
   |      |
   |      C ====== D
   |      |        |
   |  B ---        |
   |  |            |
   --E-------------F

Node A creates a working LSP to D (through C only); B
simultaneously creates a working LSP to D through C, and a recovery
LSP (through E and F) to the same destination. Then, A decides to
create a recovery LSP to D; but since the C-to-D span carries both
working LSPs, node E must either assign a dedicated resource for
this recovery LSP or, if it has already reached its maximum shared
recovery bandwidth level, reject this request. Otherwise, in the
latter case, a C-D span failure would imply that one of the working
LSPs would not be recoverable.
Consequently, node E must have the required information (implying,
for instance, that the explicit route followed by the primary LSPs
be carried with the corresponding recovery LSP request) in order to
perform admission control for recovery LSP requests.

Moreover, node E may safely accept the recovery LSP request (if its
maximum shared recovery bandwidth ratio has not yet been reached for
this link) and logically assign the same resource to these LSPs, if
and only if it can guarantee that A-C-D and B-C-D are SRLG-disjoint
over the C-D span (within the scope of this example, the node
failure probability is considered negligible). To achieve this, the
explicit route of the primary LSP (transported over the recovery
path) is examined at each shared-link ingress node. The latter uses
the interface identifier as an index to retrieve, from the TE Link
State DataBase (TE LSDB), the SRLG id list associated with the links
of the working LSPs. If these LSPs have one or more SRLG ids in
common (in this example, one or more SRLG ids in common over C-D),
then node E should not assign the same resource to the recovery
LSPs; otherwise, one of these working LSPs would not be recoverable
in case of a C-D span failure.

There are some issues related to this method, the major one being
the number of SRLG ids that a single link can cover (more than 100
in complex environments). Moreover, when using link bundles, this
approach may cause the rejection of some recovery LSP requests,
because the SRLG sub-TLV corresponding to a link bundle includes the
union of the SRLG id lists of all the component links belonging to
this bundle (see [GMPLS-RTG] and [MPLS-BUNDLE]).

In order to overcome this specific issue, an additional mechanism
may consist of querying the nodes where such information would be
available (in this case, node E would query C). The major drawback
of this method, in addition to the dedicated mechanism it requires,
is that it may become very complex when several common nodes are
traversed by the working LSPs. Therefore, when using link bundles, a
potential way of solving this issue, tightly related to the sequence
of the recovery operations (at least in a first step, since per-
component flooding of SRLG ids would impact link state routing
protocol scalability), is to rely on dedicated queries to an on-line
accessible network management system.

8.5 Summary

The selection of a recovery scheme/strategy, using the recovery
types proposed in [CCAMP-TERM] and the above discussion, can be
summarized in the following table.

 --------------------------------------------------------------------
 |            Path Search (computation and selection)
 --------------------------------------------------------------------
 |          Pre-planned              |          Dynamic
 --------------------------------------------------------------------
 |     | faster recovery             | Does not apply
 |     | less flexible               |
 |  1  | less robust                 |
 |     | most resource consuming     |
Path   |                             |
Setup  ---------------------------------------------------------------
 |     | relatively fast recovery    | Does not apply
 |     | relatively flexible         |
 |  2  | relatively robust           |
 |     | resource consumption        |
 |     | depends on sharing degree   |
       ---------------------------------------------------------------
 |     | relatively fast recovery    | slower (computation)
 |     | more flexible               | most flexible
 |  3  | relatively robust           | most robust
 |     | less resource consuming     | least resource consuming
 |     | depends on sharing degree   |
 --------------------------------------------------------------------

1. Path Setup with Resource Reservation (i.e. signalling) and
   Selection
2. Path Setup with Resource Reservation (i.e. signalling) w/o
   Selection
3. Path Setup w/o Resource Reservation (i.e. signalling) w/o
   Selection

As defined in [CCAMP-TERM], the term pre-planned refers to
restoration resource pre-computation, signaling (reservation) and a
priori selection (optional), but not cross-connection.

9. Conclusion

TBD.

10. Security Considerations

This document does not introduce or imply any specific security
consideration.

11. References

[RFC-2026]   S. Bradner, "The Internet Standards Process --
             Revision 3", BCP 9, RFC 2026, October 1996.

[RFC-2119]   S. Bradner, "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

[BOUILLET]   E. Bouillet et al., "Stochastic Approaches to Compute
             Shared Meshed Restored Lightpaths in Optical Network
             Architectures", INFOCOM 2002, New York City, June 2002.

[CCAMP-LI]   G. Li et al., "RSVP-TE Extensions For Shared-Mesh
             Restoration in Transport Networks", Internet Draft,
             Work in progress, draft-li-shared-mesh-restoration-
             01.txt, November 2001.

[CCAMP-LIU]  H. Liu et al., "OSPF-TE Extensions in Support of Shared
             Mesh Restoration", Internet Draft, Work in progress,
             draft-liu-gmpls-ospf-restoration-00.txt, October 2002.

[CCAMP-SRLG] D. Papadimitriou et al., "Shared Risk Link Groups
             Encoding and Processing", Internet Draft, Work in
             progress, draft-papadimitriou-ccamp-srlg-processing-
             01.txt, November 2002.

[CCAMP-SRG]  S. Dharanikota et al., "Inter domain routing with Shared
             Risk Groups", Internet Draft, Work in progress,
             November 2001.
[CCAMP-TERM] E. Mannie and D. Papadimitriou (Editors), "Recovery
             (Protection and Restoration) Terminology for GMPLS",
             Internet Draft, Work in progress, draft-ietf-ccamp-
             gmpls-recovery-terminology-00.txt, June 2002.

[DEMEESTER]  P. Demeester et al., "Resilience in Multilayer
             Networks", IEEE Communications Magazine, Vol. 37, No.
             8, August 1998, pp. 70-76.

[G.707]      ITU-T, "Network Node Interface for the Synchronous
             Digital Hierarchy (SDH)", Recommendation G.707, October
             2000.

[G.709]      ITU-T, "Network Node Interface for the Optical
             Transport Network (OTN)", Recommendation G.709,
             February 2001 (and Amendment no. 1, October 2001).

[G.783]      ITU-T, "Characteristics of Synchronous Digital
             Hierarchy (SDH) Equipment Functional Blocks",
             Recommendation G.783, October 2000.

[G.798]      ITU-T, "Characteristics of Optical Transport Network
             (OTN) Equipment Functional Blocks", Recommendation
             G.798, January 2002.

[G.806]      ITU-T, "Characteristics of Transport Equipment -
             Description Methodology and Generic Functionality",
             Recommendation G.806, October 2000.

[G.826]      ITU-T, "Performance Monitoring", Recommendation G.826,
             February 1999.

[G.841]      ITU-T, "Types and Characteristics of SDH Network
             Protection Architectures", Recommendation G.841,
             October 1998.

[G.842]      ITU-T, "Interworking of SDH network protection
             architectures", Recommendation G.842, October 1998.

[G.GPS]      ITU-T Draft Recommendation G.GPS, Version 2, "Generic
             Protection Switching", Work in progress, May 2002.

[GLI]        G. Li et al., "Efficient Distributed Path Selection for
             Shared Restoration Connections", IEEE INFOCOM, New
             York, June 2002.

[GMPLS-ARCH] E. Mannie (Editor), "Generalized MPLS Architecture",
             Internet Draft, Work in progress, draft-ietf-ccamp-
             gmpls-architecture-03.txt, August 2002.
[GMPLS-RTG]  K. Kompella et al., "Routing Extensions in Support of
             Generalized MPLS", Internet Draft, Work in progress,
             draft-ietf-ccamp-gmpls-routing-05.txt, August 2002.

[GMPLS-SIG]  L. Berger (Editor), "Generalized MPLS - Signaling
             Functional Description", Internet Draft, Work in
             progress, draft-ietf-mpls-generalized-signaling-09.txt,
             October 2002.

[LMP]        J. Lang (Editor), "Link Management Protocol (LMP)
             v1.0", Internet Draft, Work in progress, draft-ietf-
             ccamp-lmp-06.txt, September 2002.

[LMP-WDM]    A. Fredette and J. Lang (Editors), "Link Management
             Protocol (LMP) for DWDM Optical Line Systems", Internet
             Draft, Work in progress, draft-ietf-ccamp-lmp-wdm-
             01.txt, September 2002.

[MANCHESTER] J. Manchester, P. Bonenfant and C. Newton, "The
             Evolution of Transport Network Survivability", IEEE
             Communications Magazine, August 1999.

[MPLS-REC]   V. Sharma and F. Hellstrand (Editors) et al., "A
             Framework for MPLS Recovery", Internet Draft, Work in
             progress, draft-ietf-mpls-recovery-frmwrk-06.txt, July
             2002.

[MPLS-OSU]   S. Seetharaman et al., "IP over Optical Networks: A
             Summary of Issues", Internet Draft, Work in progress,
             draft-osu-ipo-mpls-issues-02.txt, April 2001.

[T1.105]     ANSI, "Synchronous Optical Network (SONET): Basic
             Description Including Multiplex Structure, Rates, and
             Formats", ANSI T1.105, January 2001.

[TE-NS]      K. Owens et al., "Network Survivability Considerations
             for Traffic Engineered IP Networks", Internet Draft,
             Work in progress, draft-owens-te-network-survivability-
             01.txt, July 2001.

[TE-RH]      W. Lai, D. McDysan, J. Boyle, et al., "Network
             Hierarchy and Multi-layer Survivability", Internet
             Draft, Work in progress, draft-ietf-tewg-restore-
             hierarchy-01.txt, June 2002.

12. Acknowledgments

The authors would like to thank Fabrice Poppe (Alcatel) and Bart
Rousseau (Alcatel) for their revision effort, and Richard Rabbat
(Fujitsu), David Griffith (NIST) and Lyndon Ong (Ciena) for their
useful comments.

13. Authors' Addresses

Deborah Brungard (AT&T)
Rm. D1-3C22
200 S. Laurel Ave.
Middletown, NJ 07748, USA
Email: dbrungard@att.com

Sudheer Dharanikota (Nayna)
481 Sycamore Drive
Milpitas, CA 95035, USA
Email: sudheer@nayna.com

Jonathan P. Lang (Calient)
25 Castilian
Goleta, CA 93117, USA
Email: jplang@calient.net

Guangzhi Li (AT&T)
180 Park Avenue
Florham Park, NJ 07932, USA
Email: gli@research.att.com
Phone: +1 973 360-7376

Eric Mannie (Consulting)
Email: eric_mannie@hotmail.com

Dimitri Papadimitriou (Alcatel)
Francis Wellesplein, 1
B-2018 Antwerpen, Belgium
Phone: +32 3 240-8491
Email: dimitri.papadimitriou@alcatel.be

Bala Rajagopalan (Tellium)
2 Crescent Place
P.O. Box 901
Oceanport, NJ 07757-0901, USA
Phone: +1 732 923-4237
Email: braja@tellium.com

Yakov Rekhter (Juniper)
Email: yakov@juniper.net

Full Copyright Statement

"Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such
as by removing the copyright notice or references to the Internet
Society or other Internet organizations, except as needed for the
purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards process
must be followed, or as required to translate it into languages
other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."