Internet Draft                                              G. Li (AT&T)
Expiration Date: May 2002                             C. Kalmanek (AT&T)
                                                         J. Yates (AT&T)
Document: draft-li-shared-mesh-restoration-01.txt   G. Bernstein (Ciena)
                                                       F. Liaw (Zaffire)
                                                    V. Sharma (Matanoia)

                                                               Nov. 2001

  RSVP-TE Extensions For Shared-Mesh Restoration in Transport Networks

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

Efficient techniques for rapid restoration must be addressed within GMPLS. This document describes extensions to RSVP-TE signaling in support of shared mesh restoration. Shared mesh restoration describes restoration plans in which restoration capacity is shared across multiple independent failures. In particular, this document proposes extensions enabling reservation of restoration capacity, LSP restoration, LSP reversion and LSP deletion.

1. Introduction

Rapid recovery (restoration) from network failures is a crucial aspect of current and future transport networks.
Rapid restoration is required by transport network providers to support stringent Service Level Agreements (SLAs) that dictate high reliability and availability for customer connectivity.

The choice of a restoration policy is a tradeoff between network resource utilization (cost) and service interruption time. Clearly, minimized service interruption time is desirable, but schemes achieving this usually do so at the expense of network resource utilization, resulting in increased cost to the provider. Different restoration schemes operate with different tradeoffs, mainly between spare capacity requirements and service interruption time, but also in complexity, robustness, etc.

In light of these tradeoffs, transport providers are expected to support a range of different service offerings, with a strong differentiating factor between these service offerings being service interruption time in the event of network failures. For example, a provider's highest offered service level would generally ensure the most rapid recovery from network failures. However, such schemes (e.g., 1+1, 1:1 protection) generally use a large amount of spare restoration capacity, and are thus not cost effective for most customer applications. Significant reductions in spare capacity can be achieved by instead sharing this capacity across multiple independent failures.

GMPLS signaling proposals have primarily focused on the development of methods for label switched path (LSP) establishment and removal [1,2,3] with some fault recovery capabilities. Recent Internet drafts [4,9] examine how to realize some different restoration schemes for transport networks using GMPLS signaling. Other LSP restoration-related contributions [5,6,7,8,16,17,18] mainly focus on MPLS networks. This contribution motivates the need for path-based shared mesh restoration in transport networks, and defines extensions to support it.
The proposal here primarily focuses on restoration within a single control domain.

Shared mesh restoration for transport networks was proposed in [13]. The basic functionality discussed within [13] is enabled using the GMPLS extensions proposed within this draft. Kini et al. have recently proposed shared mesh restoration for MPLS networks [5,6]. The fundamental difference between transport networks and packet networks (where MPLS applies) is that in packet networks we can establish an LSP without using any bandwidth. However, in transport networks, if an LSP is established, then by definition the full bandwidth requested by the LSP is consumed, independent of whether traffic is transmitted over this LSP or not. In MPLS, an LSP can be established before failure but not used until after failure, whereas this is not possible in transport networks. This contribution addresses the GMPLS-specific extensions required to support shared mesh restoration in transport networks.

The current GMPLS signaling specification is based on extensions to existing protocols, namely RSVP-TE [8] and CR-LDP [15]. The introduction of new signaling protocols for restoration [9] is likely to significantly complicate the standardization process and future implementations. Instead, we propose extending the existing signaling protocols to provide the necessary network failure restoration functionality. We demonstrated a reference implementation of the extensions to RSVP-TE described here for shared-mesh restoration in [10], and showed that rapid end-to-end restoration signaling can be achieved using these extensions. Similar extensions are required for CR-LDP.

2. Restoration methods

We classify restoration techniques into path-based and link-based [16]. Path-based schemes are implemented via an alternate or backup path that may traverse multiple nodes.
Failure recovery is typically provided on a per LSP basis between a pair of nodes. Different LSPs on a failed link, segment or path may use different restoration techniques and traverse different restoration routes. In contrast, link-based techniques are provided on a per link basis. All traffic on the failed link usually follows the same restoration route. Note that by "link" in this document we mean a "logical" link in the network layer of interest (e.g., one or more similarly routed channels between a pair of optical cross-connects).

In general, path-based schemes may protect an end-to-end path, a segment or a single link / node. The extensions proposed here are applicable to all of these cases, although we focus primarily on end-to-end path-based restoration. Depending on the degree to which a service provider wishes to protect LSPs, the service and restoration paths may be link-disjoint, node-disjoint or Shared Risk Link Group (SRLG) disjoint [1,2,13]. SRLG-disjoint routes are important as they cover several common types of failure that must be protected against, including link failures, conduit cuts, etc.

There are a number of possible path-based restoration techniques for transport networks. The interested reader is referred to [16] for a complete taxonomy of MPLS-based restoration schemes. If the network pre-establishes a restoration path for a given service path, then restoration of the service path in the event of service path failure simply involves cross-connecting the add/drop ports at the source and destination from the failed path onto the restoration path. This is referred to as dedicated path protection. Dedicated path protection provides very rapid failure recovery, but is expensive in terms of the spare capacity requirements.
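
The three degrees of disjointness between a service path and a restoration path mentioned above can be illustrated with a small sketch. This is not part of the proposed extensions; it only assumes that a path is a list of node names, that links are undirected, and that a (hypothetical) table maps each link to the SRLGs it belongs to. The node names, the helper functions and the SRLG label "conduit-1" are all invented for illustration.

```python
from typing import Dict, FrozenSet, List, Set

Link = FrozenSet[str]  # an undirected link, e.g. frozenset({"A", "B"})


def links_of(path: List[str]) -> Set[Link]:
    """Return the set of undirected links traversed by a node-list path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def srlgs_of(path: List[str], srlg_map: Dict[Link, Set[str]]) -> Set[str]:
    """Return every SRLG touched by any link of the path (hypothetical table)."""
    result: Set[str] = set()
    for link in links_of(path):
        result |= srlg_map.get(link, set())
    return result


def disjointness(service: List[str], restoration: List[str],
                 srlg_map: Dict[Link, Set[str]]) -> Dict[str, bool]:
    """Classify how disjoint a restoration path is from its service path."""
    link_disjoint = links_of(service).isdisjoint(links_of(restoration))
    # The end nodes are necessarily shared, so only interior nodes are compared.
    node_disjoint = link_disjoint and set(service[1:-1]).isdisjoint(restoration[1:-1])
    srlg_disjoint = srlgs_of(service, srlg_map).isdisjoint(
        srlgs_of(restoration, srlg_map))
    return {"link": link_disjoint, "node": node_disjoint, "srlg": srlg_disjoint}


if __name__ == "__main__":
    # Assumed example: links A-B and A-D leave node A in the same conduit.
    srlg_map = {
        frozenset({"A", "B"}): {"conduit-1"},
        frozenset({"A", "D"}): {"conduit-1"},
        frozenset({"B", "C"}): {"conduit-2"},
    }
    print(disjointness(["A", "B", "C"], ["A", "D", "E", "C"], srlg_map))
```

In this assumed example the two paths share no link and no interior node, yet they are not SRLG-disjoint, because a single conduit cut would take down both; this is the case that SRLG-disjoint routing is meant to exclude.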

Alternatively, if the network searches for restoration capacity and establishes the restoration path only after service path failure, then the restoration scheme is referred to as dynamic restoration. Dynamic restoration may utilize techniques such as crankback [11] to successively try different paths until a path with sufficient resources is found. Dynamic restoration does not require pre-planning on a per LSP basis and as such may be more robust to (unanticipated) failures. The disadvantages of dynamic restoration schemes include long worst-case restoration times, lack of predictability and no guarantee of successful failure recovery. Dynamic restoration may be particularly useful as a backup restoration technique when other pre-established or pre-calculated restoration routes are not available (e.g., for multiple failure events in which insufficient restoration capacity has been established / reserved).

Another path-based restoration technique is instead based on pre-calculating restoration routes, with cross-connection performed after failure [10,12]. This approach allows efficient use of spare restoration capacity by sharing this capacity across multiple independent failures. In this scheme, when the service path for an LSP is established, resources may be reserved along the restoration path without allocating the resources to a specific LSP and without configuring the cross-connects on the restoration path. The resources reserved for a particular restoration path can be shared with other restoration paths if their service paths do not have any (single) failure in common. In other words, if the service paths of two LSPs are failure disjoint (i.e., they fail independently), the resources reserved for restoration can be shared on the common links of their restoration paths. We refer to this technique as shared mesh restoration.
Note that for all-optical networks without wavelength conversion, restoration resources may have to be shared on a per-wavelength basis.

To implement shared mesh restoration, we require new extensions to the existing GMPLS signaling specifications [8,15] for bandwidth reservation, LSP restoration, LSP reversion and LSP deletion. These signaling procedures are discussed in the following section.

3. Shared mesh restoration

3.1 Resource reservation for restoration

A restorable LSP in a transport network supporting shared mesh restoration has both a service (primary) path and a restoration (secondary) path. During normal network operation (without failures), the LSP is established along the service path, with resources (optionally) reserved along the restoration path.

In implementing shared mesh restoration, capacity may be reserved along the restoration path during LSP provisioning [10,13]. The resources reserved on each link along a restoration path may be shared across different service LSPs that are not expected to fail simultaneously. The restoration capacity might be either idle or used for pre-emptable LSPs.

The amount of restoration capacity reserved on the restoration paths determines the robustness of the restoration scheme to failures. For example, a network operator may choose to reserve sufficient capacity to ensure that all shared mesh restorable LSPs can be recovered in the event of any single failure event (e.g., a conduit being cut). A network operator may instead reserve more or less capacity than that required to handle any single failure event, or may alternatively choose to reserve only a fixed pool independent of the number of LSPs requiring this capacity.

The sharing of restoration bandwidth across multiple independent failures can be simply illustrated using the topology depicted in Figure 1.
We consider an LSP established between A and C, and another between F and H. The service and restoration paths for the LSP between A and C are A-B-C and A-D-E-C, respectively, whilst the service and restoration paths for the LSP between F and H are F-G-H and F-D-E-H, respectively. Thus, the link between D and E has capacity reserved for the failure of both the service LSPs. If the service provider wishes to guarantee recovery from any single failure event, and if the links along the two service paths do not share any common failure (e.g., SRLG), then a single unit of capacity may be reserved on the D-E link for the restoration of either of the service LSPs. An example is provided in Section 6 that illustrates the reservation of restoration capacity when guaranteeing recovery from a single SRLG failure.

      A---------------B-------------C
       \                           /
        \                         /
         D-----------------------E
        /                         \
       /                           \
      F--------------G--------------H

      Figure 1. Example network topology.

When the amount of reserved capacity is a function of the number of LSPs that are to be restored on each link, signaling is required to reserve this capacity along the restoration path. Details of resource reservation are described in Section 4.1.

In general, depending on the network operator's desired functionality, channel selection may be performed either during the reservation stage, or after failure. If channels are pre-selected, the channel selection is stored during the resource reservation phase as part of the reservation state along the LSP's restoration path. Importantly, although the channels are pre-selected, the cross-connect is not established until after a failure. If channels are pre-selected during the reservation phase, then restoration message processing during restoration may be faster.
However, if the pre-selected channels are dependent on the failure scenario, channel pre-selection may necessitate that fault isolation be performed before connectivity can be restored.

Alternatively, channel selection may be performed after failure on receipt of a signaling message for restoration. In this case, since restoration capacity along the restoration path is only reserved but not allocated, handling a fault translates into allocating the restoration LSP after failure. This requires efficient mechanisms for triggering and allocating the restoration LSP to meet the tight restoration timing constraints. The LSP restoration time will depend on the time to detect the failure, (possibly) localize the failure, notify the node(s) responsible for restoration, and finally activate the restoration LSP. Internet draft [16] gives a complete specification of the various cycle times involved in different recovery scenarios.

3.2 Interaction with failure detection and localization

Both failure detection and failure localization are technology and implementation dependent. In general, failures are detected by lower layer mechanisms (e.g., SONET/SDH, Loss-of-Light (LOL)). When a node detects a failure, an alarm may be passed up to a GMPLS entity, which will take appropriate action. This section discusses models for how failure detection interacts with and triggers end-to-end path-based restoration.

One model generates alarms upon failure detection and uses IP signaling to propagate a failure notification to the node(s) responsible for initiating restoration. Fault localization is important in this model to avoid having numerous alarms and IP messages generated for each failed LSP.
Where hardware-based (e.g., SONET/SDH) fault localization techniques are not available, fault localization can be performed using IP-based protocols, such as the Link Management Protocol (LMP) [14]. Once the fault has been localized, the node(s) adjacent to the failure send a failure notification message to the node(s) responsible for restoring the failed LSP, which then initiate restoration. In RSVP, the failure notification (NOTIFY) message is sent via normal IP forwarding with optional end-to-end reliable transmission.

Using this approach, restoration may be delayed because failure localization needs to complete first. Additional delays may be incurred when sending failure notifications if normal IP routing has not converged. If the notification message is generated by a node downstream (upstream) of the failure and sent to a node upstream (downstream) of the failure, then normal IP forwarding may result in the message following a route that is broken as a result of the failure. The failure notification will thus not reach the node responsible for initiating restoration until IP routing has converged.

Another option is to trigger restoration based on failure detection at the nodes terminating the LSP. Failure localization is now targeted at the task of repairing the fault and becomes a background task that can be performed on a much slower time scale. However, it is important that valid signaling actions for planned events (e.g., LSP deletion) do not trigger failure notification and restoration actions along the path. For example, if LSPs are deleted in an all-optical network by sending a single deletion message, LOL resulting from disconnection at a node will propagate down the path faster than the LSP deletion message, potentially triggering restoration.
Thus, for planned events that could result in LOL along the path, such as LSP deletion, all nodes must be informed of the upcoming event so that they may turn off alarms corresponding to the desired LSP so as not to initiate restoration.

For uni-directional LSPs, failures will be detected at the destination. For bi-directional LSPs, failures may be detected at either the source, the destination or both, depending on whether there is a uni-directional or bi-directional failure. Restoration should then be initiated by either the source, the destination or both. If restoration is initiated by the source (destination) and only the destination (source) detects the failure, then a failure notification must be propagated to the other end of the LSP. For all-optical networks, this failure notification may be done using IP messages, as above. However, most framing schemes in O-E-O networks will be capable of hardware level notification upstream of the failure, such as using SONET's Path AIS. Alternatively, restoration can be initiated by both the source and the destination, with restoration signaling meeting at an intermediate node along the pre-calculated restoration route.

All of the above are potential implementations, and therefore the extensions proposed herein are intended to work independently of the mechanism used for failure localization and notification.

4. Operations overview

The following discusses how shared-mesh restoration may be supported using extensions to RSVP-TE signaling.

4.1 Restoration path reservation

When an LSP requesting path-based restoration is established, the source node calculates the service and restoration paths for the LSP. To satisfy SLAs, the network may reserve resources along the chosen restoration path.
To achieve this, the source node sends a PATH message along the restoration path with a new "shared reservation" flag (see Section 5.2) requesting a shared reservation along the path. The PATH message sent along the pre-calculated restoration path reserves the required restoration resources and establishes shared reservation state relating to the LSP without cross-connecting the channels (see the example in Section 6). A RESV message with the same flag is returned to acknowledge the resource reservation along the restoration path, but without establishing the restoration LSP.

In general, many carriers will want to protect their network against at least any single failure event, such as a fiber cut, or a conduit cut. If we generalize the SRLG concept, it may be used to represent different failure-prone network components, such as a fiber span, a node, a DWDM system or a conduit. Thus, for simplicity in the following description, we assume that we are protecting against SRLG failures.

The nodes along the restoration path need to know the path taken by the service LSP so that reservations can be shared among SRLG-disjoint failures along the service path. Thus, the PATH message sent along the restoration path includes information about the service path. Two options for service path information are discussed in Section 5.3. The information can contain either a list of the links along the service LSP, or a list of the SRLGs traversed by the service LSP.

4.2 Restoration path setup operation

As described in Section 3.2, restoration path setup can be triggered in several ways. Path-based restoration may be triggered at either the source or destination node, or both [12].
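
The way a node on a restoration path might use the service path SRLG list described in Section 4.1 to share a reservation can be sketched as follows. This is only an illustration under stated assumptions, not the procedure defined by this draft: it assumes the goal is recovery from any single SRLG failure, that each LSP needs a whole number of capacity units, and that the class `SharedLinkReservation`, its methods and the SRLG labels are hypothetical names. The link then needs as many units as the worst single SRLG failure would demand.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


class SharedLinkReservation:
    """Shared restoration bookkeeping for one link (hypothetical sketch)."""

    def __init__(self) -> None:
        # LSP id -> (SRLGs traversed by its service path, units it needs)
        self._lsps: Dict[str, Tuple[Set[str], int]] = {}

    def reserve(self, lsp_id: str, service_srlgs: Iterable[str],
                units: int = 1) -> None:
        """Record a shared reservation carried by a PATH message."""
        srlgs = set(service_srlgs)
        if not srlgs:
            raise ValueError("service path must list at least one SRLG")
        self._lsps[lsp_id] = (srlgs, units)

    def release(self, lsp_id: str) -> None:
        """Remove the shared reservation state for a deleted LSP."""
        self._lsps.pop(lsp_id, None)

    def required_units(self) -> int:
        """Units needed to survive the worst single SRLG failure."""
        demand: Dict[str, int] = defaultdict(int)
        for srlgs, units in self._lsps.values():
            for srlg in srlgs:
                # If this SRLG fails, the LSP moves onto this link.
                demand[srlg] += units
        return max(demand.values(), default=0)


if __name__ == "__main__":
    # Link D-E of Figure 1; one assumed SRLG per service link.
    d_e = SharedLinkReservation()
    d_e.reserve("LSP A-C", {"srlg-AB", "srlg-BC"})
    d_e.reserve("LSP F-H", {"srlg-FG", "srlg-GH"})
    print(d_e.required_units())
```

With the two LSPs of Figure 1, whose service paths share no SRLG, the sketch arrives at a single unit on the D-E link, matching the discussion in Section 3.1. A further LSP whose service path shared an SRLG with one of them would raise the requirement to two units, since both could fail together.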

If the restoration signaling is initiated by the source, the source node sends a PATH message along the restoration path with the "shared reservation" flag not set, indicating that the LSP should now be established. Since nodes along the path retained reservation state for the restoration LSP, this state can be used to ensure that restoration LSPs allocate resources out of the capacity reserved for restoration. Upon receipt of the PATH message, the nodes along the restoration path should check the cross-connect state for this LSP. (This is needed in case restoration triggered from the destination node has already performed the cross-connection.) If the cross-connection has not been performed for this LSP, the node should select channels for the LSP (if not already pre-selected), and perform the required cross-connections. In nodes with potentially slower cross-connect switching times (e.g., MEMS cross-connects) it is important to have the PATH message be forwarded without waiting for the cross-connection to be completed. The destination node sends a RESV message to the source to acknowledge the successful establishment of the restoration path.

If the signaling is initiated by the destination, then a RESV message is sent along the restoration path with the "shared reservation" flag not set. Upon receipt of the RESV message, the nodes along the restoration path should check the cross-connection states for this LSP. If the cross-connection has not been performed for this LSP, the node should select channels for the LSP (if not already pre-selected), and perform the required cross-connections. In nodes with potentially slower cross-connect switching times (e.g., MEMS cross-connects) it is important to have the RESV message forwarded without waiting for the cross-connection to be completed.
The source node sends a RESV_CONF message to the destination to acknowledge the successful establishment of the restoration path.

If both ends initiate restoration, the PATH and RESV messages for the same LSP may meet at an intermediate node. This may result in label contention. For a uni-directional LSP, the contention is resolved using downstream label assignment. For a bi-directional LSP, the contention is resolved based on higher node-ID label assignment, as proposed for GMPLS [1,8]. When signaling messages from the two ends meet at an intermediate node, the node sends a RESV message to the source and RESV_CONF to the destination in response to the establishment of the restoration path.

When restoration is triggered from both source and destination, and PATH/RESV messages are forwarded without waiting for cross-connection as described above, the receipt of the RESV or RESV_CONF does not guarantee the success of restoration path establishment. In this case, a subsequent error message may override the acknowledgment. This behavior must be evaluated further.

One issue in establishing a restoration path using GMPLS LSP setup signaling is the contention resolution method. GMPLS allows an upstream node to suggest a label and resolves contention via a master/slave node relationship. During the restoration process, two LSPs from different clients may be mis-connected when contention occurs. This may occur between two restoration LSPs or between a restoration LSP and a service LSP if they share the same label pool. One possible solution may be to only do label assignment from the master node, but this method may affect the restoration time. The detailed behavior must be evaluated further.

4.3 Error handling

In shared mesh restoration schemes, the reserved restoration resources may be limited.
During restoration path establishment, there may be scenarios in which the restoration path cannot be set up, for example, if the reserved restoration resources are inadequate for any reason or if there is a failure along the restoration path. In this case, PATHERR and RESVERR messages may be used to report the failure of restoration path establishment. It is important that any resources allocated by the incomplete restoration path establishment be immediately released such that these resources can be used for other restoration paths.

In the RSVP-TE extensions proposed for GMPLS, the PATHERR message was extended to carry a "state_remove" flag to release the resources consumed by incomplete LSP establishment. In shared mesh restoration schemes, we may borrow the same idea and define a new flag "allocation_remove", which could be carried in both PATHERR and RESVERR messages. Upon receipt of PATHERR or RESVERR messages with this "allocation_remove" flag, the node does not remove all local state but instead frees the cross-connect resources and releases the channels to the reserved capacity pool.

4.4 LSP reversion operation

After service path repair, most carriers prefer to cause the LSP to revert to its original service path. Often, the routing of the restoration LSP may not be as efficient as the original service LSP. Additionally, once a restoration LSP is established, there is no guarantee that other service paths that were sharing its resources are protected, unless the other restoration routes are re-calculated.

Reverting to the service path after a failure is repaired requires that the service LSP's resources remain allocated during the time that the LSP uses restoration resources.
For RSVP, 476 techniques must be developed that allow service path resources to 477 remain allocated even though refreshes may be affected by failed 478 signaling channels. 480 It is important to have mechanisms that allow LSP reversion to be 481 performed without disrupting service to the customer. This can be 482 achieved if LSP reversion is implemented using a "bridge and roll" 483 approach. The source node commences the process by "bridging" the 484 customer signal onto both the service and restoration paths. Once 485 the bridge process has completed, the source node sends a 486 Notification message to the destination, requesting that the 488 destination "bridge and roll" the service and restoration paths. In 489 this case, the "roll" function causes the destination to select the 490 service path signal. Upon finishing the bridge and roll at the 491 destination, the destination sends a Notification message to the 492 source confirming the completion of the bridge and roll operation. 493 When the source receives this Notification, it stops transmitting 494 traffic along the restoration route, and sends another Notification 495 message to the destination confirming that the LSP has reverted. Once 496 the destination receives this Notification message, it issues a 497 RESVTEAR message along the restoration path and stops transmitting 498 along the restoration route. Additional mechanisms may be required 499 in some cases (e.g., all-optical networks) to ensure that 500 intermediate nodes do not alarm due to LOL during the teardown 501 procedure (see Section 3.2). The RESVTEAR message informs the nodes 502 along the restoration route to release the restoration resources if 503 shared restoration is used for this LSP. This procedure achieves 504 the "make-before-break" feature, that is, minimal service traffic 505 interruption during the reversion process.
Note that the RESVTEAR 506 removes the cross-connection for the restoration path (and frees the 507 resources to be used for restoring other failures), but does not 508 delete the Path state along the restoration path. In this case, the 509 RESVTEAR should not trigger a PATHTEAR message from the source since 510 we want resources to continue to be reserved for this LSP. This 511 allows the termination node to quickly re-establish the restoration 512 path by sending either a RESV or PATH message if the service path 513 fails again in the future. The protection object with "shared 514 reservation" flag is carried in the RESVTEAR message to suppress the 515 PATHTEAR. If the restoration paths are reoptimized periodically, the 516 original restoration reservation state should be cleared and new 517 restoration reservation state must be created. 519 4.5 LSP deletion operation 521 Once an LSP is no longer required, the LSP service path and its 522 restoration resources should be released for future traffic. If the 523 source node initiates the LSP deletion, it should send two PATHTEAR 524 messages to the destination node: one along the service path and the 525 other along the restoration path. The PATHTEAR along the restoration 526 path should include information about the service path. The 527 information can contain either a list of the links along the service 528 LSP, or a list of the SRLGs traversed by the service LSP. If the 529 destination initiates the LSP deletion, it should send two RESVTEAR 530 messages to the source. The RESVTEAR along the restoration path 531 should include the information about the service path. Again, 532 additional mechanisms may be required in some cases (e.g., 533 all-optical networks) to ensure that nodes do not alarm due to LOL 534 during the teardown procedure (see Section 3.2). 536 5.
RSVP-TE restoration extensions 538 5.1 Current GMPLS fault restoration capabilities 540 The GMPLS signaling specifications [1] currently define protection 541 information used in the LSP setup procedure. This protection 542 information is carried in a new object/TLV that includes a bit flag 543 that indicates whether the LSP is a primary (service) or a secondary 544 (restoration) LSP. 546 GMPLS also specifies a Link Flags field in the protection 547 information object. The Link Flags field indicates the link 548 protection type desired by the LSP. If a particular type is 549 requested, a new LSP request is processed only if the desired link 550 protection type can be honored. 552 5.2 Shared reservation/allocation request 554 To implement restoration resource reservation for shared mesh 555 restoration, a new mechanism must be introduced into PATH messages to 556 distinguish between normal LSP establishment, reservation of shared 557 resources, and allocation of shared resources to a particular LSP. 558 The S (secondary) bit in the protection information object may be 559 used to indicate that an LSP is a restoration/secondary path, not a 560 service LSP. 562 The shared resource reservation and shared resource allocation can be 563 explicitly indicated through a new Shared Reservation flag in the 564 protection information object. The protection information object 565 would be used in the PATH/RESV message forwarded along the 566 restoration route during LSP resource reservation and resource 567 allocation. 569 0 1 2 3 570 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 571 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 572 |S|R| Reserved | Link Flags| 573 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 574 Figure 3. Protection information object. 
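As an illustration only, the 32-bit word of Figure 3 could be packed and parsed as in the following Python sketch. It assumes that S and R occupy the two most significant bits (bits 0 and 1 in network bit order), that Link Flags is a 6-bit field in the low-order bits as in the GMPLS signaling description [1], and that R=1 denotes reservation and R=0 allocation. The function names are hypothetical and are not part of any specification.

```python
import struct
from typing import Tuple

# Assumed bit positions, read from Figure 3 in network bit order.
S_BIT = 1 << 31          # bit 0: secondary (restoration) LSP
R_BIT = 1 << 30          # bit 1: shared reservation flag (1 = reservation)
LINK_FLAGS_MASK = 0x3F   # assumption: 6-bit Link Flags in the low-order bits


def pack_protection(secondary: bool, reservation: bool, link_flags: int) -> bytes:
    """Build the 4-byte protection information word (reserved bits are zero)."""
    if not 0 <= link_flags <= LINK_FLAGS_MASK:
        raise ValueError("link_flags does not fit in the Link Flags field")
    word = link_flags
    if secondary:
        word |= S_BIT
    if reservation:
        word |= R_BIT
    return struct.pack("!I", word)


def unpack_protection(data: bytes) -> Tuple[bool, bool, int]:
    """Return (S, R, link_flags) from a 4-byte protection information word."""
    (word,) = struct.unpack("!I", data)
    return bool(word & S_BIT), bool(word & R_BIT), word & LINK_FLAGS_MASK


def classify(secondary: bool, reservation: bool) -> str:
    """Interpret S and R: S clear means ordinary LSP processing."""
    if not secondary:
        return "normal LSP establishment"
    return "shared reservation" if reservation else "shared allocation"


if __name__ == "__main__":
    s, r, flags = unpack_protection(pack_protection(True, True, 0x02))
    print(classify(s, r), hex(flags))
```

Keeping the three cases in a single classification step mirrors the distinction the text draws between normal establishment, reservation of shared resources, and allocation of shared resources.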
576 The Shared Reservation (R) flag described above may be encoded as 577 follows: 579 0 allocation 580 1 reservation 581 If other flags are needed to support path-based restoration, the 582 shared reservation flag can be included in a Path Flags field. 584 5.3 Service path information 586 To support shared reservations, intermediate nodes must compute the 587 total resources that must be reserved to support service paths that 588 are not subject to simultaneous failures. This requires 589 identification of the specific failure events that are to be 590 protected against. If we wish to protect against link failures, then we must 591 know the set of links used along the service path when reserving 592 capacity on the restoration path. Alternatively, if we wish to 593 protect (more generally) against SRLG failures, when a restoration 594 LSP is reserved, the setup message must convey information about the 595 SRLGs that are associated with the service LSP that it is 596 protecting. This information is needed because a single restoration channel on a link common to 597 multiple restoration paths can be shared by non-simultaneous fiber 598 span failures. 600 This information is communicated by introducing a new object, the 601 service path information object, in the PATH message. We propose 602 two alternatives for information that might be conveyed: 604 (1) LINK_LIST SERVICE_PATH INFORMATION object 606 The LINK_LIST SERVICE_PATH INFORMATION object denotes the set of TE 607 links [2,3] that are used along the service path. This information 608 can be used directly when restoration bandwidth reservation accounts 609 for link failures only. If we account for SRLG failures in our 610 restoration reservations, then the use of the LINK_LIST requires the 611 nodes along the restoration path to map from links to SRLGs.
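The link-to-SRLG mapping that a node would perform on receipt of a LINK_LIST can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the table of SRLG memberships per TE link is taken to be available locally (for example, learned through the routing extensions [2,3]), and the link identifiers, SRLG numbers, and function name are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical local table: TE link identifier -> SRLGs the link belongs to.
LinkSrlgTable = Dict[str, FrozenSet[int]]


def srlgs_for_service_path(links: Iterable[str], table: LinkSrlgTable) -> Set[int]:
    """Return the union of the SRLGs of every link on the service path."""
    srlgs: Set[int] = set()
    for link in links:
        if link not in table:
            # Without SRLG data the node cannot account for this link.
            raise KeyError(f"no SRLG information for link {link!r}")
        srlgs |= table[link]
    return srlgs


if __name__ == "__main__":
    # Illustrative data only; not taken from the draft.
    table: LinkSrlgTable = {
        "A-B": frozenset({10, 11}),
        "B-C": frozenset({11, 20}),
    }
    print(sorted(srlgs_for_service_path(["A-B", "B-C"], table)))
```

Every node along the restoration path would repeat this lookup for the same LINK_LIST.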
613 (2) SRLG_LIST SERVICE_PATH INFORMATION object 615 If we account for SRLG failures in the restoration reservations, 616 then transmitting the list of links along the restoration route 617 would require that every node duplicate the calculation of the 618 associated set of SRLGs for the primary links. This calculation 619 could instead be performed only at the source node, with the set of 620 SRLGs then carried in the PATH message. We thus propose a SRLG_LIST 621 SERVICE_PATH INFORMATION object. 623 The SRLG_LIST carries the list of SRLGs that are used by the service 624 path. Each SRLG is defined as a 32-bit unsigned number [2,3]. In 625 this SRLG list, the order of specific SRLGs is not significant. 627 The information carried in the SRLG_LIST would be: 629 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 630 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 631 | SRLG 1 | 632 |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 633 | SRLG 2 | 634 |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 635 | ...... | 636 |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 637 | SRLG n | 638 |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 639 Figure 4. SRLG list. 641 The use of the SRLG_LIST is more straightforward and requires less 642 processing at each node than the LINK_LIST. However, the LINK_LIST 643 is more generic and, in some realistic topologies, may be 644 significantly shorter. 646 5.4 Path message format 648 The new proposed format for the PATH message is: 649 ::== 650 [] 651 [ | ] 652 [ ] 653 654 655 656 [ ] 657 658 [ ] 659 [ ] 660 [ ] 661 [ ] 662 [ ] 663 [ ] 664 666 Shared restoration resource reservation is done if and only if the 667 PATH message includes the and the 668 objects with S and R (shared reservation) bits set. 669 Otherwise, the is ignored and message 670 processing is performed as usual. 
Shared restoration resource 671 allocation is done if and only if the PATH/RESV message includes the 672 object with S bit set and the R bit not set. 673 5.5 LSP establishment after failure 675 When a service path fails, the restoration LSP should be established 676 along the restoration path using the reserved restoration bandwidth 677 on each link. The LSP establishment along the restoration path may 678 be signaled from the source and/or the destination. A PATH message 679 is sent from the source including the object with S bit 680 set and the R bit not set, and/or a RESV message is sent from the 681 destination including the object with S bit set and the 682 R bit not set. 684 5.6 LSP reversion extension 686 It is proposed that LSP reversion be handled using the RSVP 687 Notification message. The NOTIFICATION message should be extended to 688 include a status field describing each of the different steps in the 689 reversion process. The NOTIFY message includes the 690 object, which has four fields: node address, flags, code, and 691 value. The node address represents the address of the node 692 generating the notification. New codes/values in the 693 object could be reserved to support reversion. Three new 694 codes/values are needed: 696 + Bridging completed 697 + Roll/bridge completed 698 + Roll completed 700 5.7 Deletion extension 702 A PATHTEAR message or RESVTEAR message as defined in the GMPLS 703 signaling specification [8] is used to remove (de-allocate) the 704 service path. Additional mechanisms required to ensure that nodes do 705 not alarm due to LOL during the teardown procedure are being 706 developed for some network applications, such as all-optical 707 networks. Once a restoration LSP is no longer required, we must also 708 release the reserved restoration resources and any allocated 709 resources along the restoration path. To achieve this, the source 710 sends a PATHTEAR message along the restoration path, including the 711 object.
Upon receipt of this message, each 712 node along the restoration path should de-allocate any resources 713 allocated to this LSP (e.g., if the LSP is currently using the 714 restoration path) and decrement the reserved resources accordingly. 716 The new proposed format for the PATHTEAR message is: 717 ::== 718 [] 719 [ | ] 720 [ ] 721 722 723 [ ] 724 [ ] 725 [ ] 726 [ ] 727 729 6. Example 731 We illustrate here how the above RSVP signaling messages can be used 732 to implement resource reservation for shared mesh restoration in a 733 network that aims to guarantee recovery from any single SRLG 734 failure. We also assume here that channels are selected after 735 failure and that full wavelength conversion capabilities exist if we 736 are considering an all-optical network. 738 With the GMPLS routing enhancements [2,3], each node will have a 739 representation of the transport network topology, including the 740 available bandwidth, and the list of SRLGs for each optical link. 742 When a new LSP request arrives in the network, the source node is 743 responsible for computing two SRLG diverse paths. An RSVP PATH 744 message is sent along the calculated service path to establish the 745 service LSP. An RSVP PATH message containing a Protection 746 information object with the S and R (shared reservation) bits set 747 should also be forwarded along the restoration path with information 748 that identifies the SRLGs of the service path. This information may 749 be conveyed using either the LINK_LIST or the SRLG_LIST. Upon 750 receipt of this message, each node should then update the 751 restoration bandwidth reserved on the outgoing links of the 752 restoration path. Assume that each link has a Reservation array 753 R[i], i=1,2,...,K, where K is the maximum SRLG index. There are 754 various techniques on how these arrays for each link can be 755 maintained among the nodes. These methods are not specified here. 
756 R[i] indicates the bandwidth required on the link if the i-th SRLG 757 in the network fails. The total reserved restoration capacity 758 should be calculated as the maximum over all SRLGs (i.e., max R[i], 759 i=1,2,...,K). When a node receives a new reservation message, it 760 saves state relating to the LSP and updates the Reservation array on 761 its link(s) in the following way: R[i] = R[i] + reservation bandwidth 762 if the i-th SRLG is in the SRLGs associated with the 763 object. Once R[i] has been re-calculated 764 for all SRLGs associated with the service path, a new required 765 reserved capacity is calculated (i.e., max R[i], i=1,2,...,K). If 766 insufficient capacity is available to support this new resource 767 reservation, the LSP reservation process may be abandoned, with an 768 error message (PATHERR) being returned to the source. The already 769 reserved resources must then be removed. However, if the reservation 770 is successful and the reserved capacity has changed as a result of 771 this new LSP, then updated link resource information may be flooded 772 to other nodes in the network for the purpose of path computation. 773 For example, the reserved capacity may reduce the available 774 bandwidth information that is flooded. If the GMPLS routing 775 extensions were further extended to explicitly flood the bandwidth 776 reserved on each link, some additional improvement in network 777 utilization may be possible. 779 Similarly, when a node receives a message requesting the removal of 780 reservations for an existing restoration LSP, the restoration 781 capacity is updated for each of the SRLGs along the primary path: 782 R[i] = R[i] - reservation bandwidth if the i-th SRLG is in the set 783 of SRLGs along the service path. Again, this update may result in a 784 change in the link information that is flooded throughout the 785 network. 787 7.
Discussion 789 7.1 Interaction with other restoration schemes 791 An operational transport network is expected to support multiple 792 restoration schemes to satisfy different clients' requirements. For 793 example, a service provider may offer four different services based 794 on dedicated protection (1:1, 1+1), shared mesh restoration, dynamic 795 restoration, and no restoration. Our shared mesh restoration 796 can co-exist with dedicated protection, dynamic restoration, and 797 other restoration schemes. Our RSVP-TE extensions can also be re-used 798 for these schemes. For example, restoration path reversion messages 799 and procedures can be used for 1:1 protection, whilst dynamic 800 restoration can re-use the restoration path creation message for 801 purposes of bandwidth accounting, path reversion, and deletion. 803 7.2 Multi-domain restoration 805 This contribution focuses on shared mesh restoration within a single 806 control domain, area, or sub-network. Realistically, each domain may 807 implement different restoration schemes. If an LSP is routed over 808 multiple domains, domain-by-domain restoration may be applied to 809 recover from failures internal to each domain. External links between 810 domains may be protected via link protection (e.g., 1:1 or 1+1 811 protection). In this way, the shared mesh restoration procedures 812 proposed here are able to interoperate with other protection schemes 813 crossing network-to-network interfaces. 815 Alternatively, the shared mesh restoration procedure proposed here 816 may also be executed across multiple domains. 818 7.3 Restoration priority and pre-emption 820 The shared mesh restoration extensions proposed within this draft can 821 support restoration priority and pre-emption using setup priority and 822 holding priority. Our restoration messages are extended from RSVP-TE 823 provisioning messages and inherit the pre-emption functionalities. 825 8.
Security considerations 827 This draft introduces no new security considerations to [1,8]. 829 9. References 830 [1] P. Ashwood-Smith et al., "Generalized MPLS - Signaling 831 Functional Description," Internet draft, draft-ietf-mpls- 832 generalized-signaling-04.txt, May 2001. 833 [2] K. Kompella et al., "OSPF Extensions in Support of Generalized 834 MPLS," Internet draft, draft-kompella-ospf-gmpls-extensions-01.txt, 835 Feb. 2001. 836 [3] K. Kompella et al., "IS-IS Extensions in Support of Generalized 837 MPLS," Internet draft, draft-ietf-isis-gmpls-extensions-02.txt, Feb. 838 2001. 840 [4] J. Lang et al. "Generalized MPLS Recovery Mechanisms," Internet 841 draft, draft-lang-ccamp-recovery-00.txt, Feb. 2001. 842 [5] S. Kini et al. "Shared backup Label Switched Path restoration," 843 Internet draft, draft-kini-restoration-shared-backup-01.txt, May 844 2001. 845 [6] S. Kini et al. "ReSerVation Protocol with Traffic Engineering 846 extensions: extension for label switched path restoration," Nov. 847 2000. 848 [7] D. Gan et al. "A Method for MPLS LSP Fast-Reroute Using RSVP 849 Detours," Internet draft, draft-gan-fast-reroute-00.txt, Feb. 2001. 850 [8] P. Ashwood-Smith et al., "Generalized MPLS Signaling - RSVP-TE 851 Extensions," Internet draft, draft-ietf-mpls-generalized-rsvp-te- 852 03.txt, May 2001. 853 [9] B. Rajagopalan et al. "Signaling for Fast Restoration in Optical 854 Mesh Networks," Internet draft, draft-bala-restoration-signaling- 855 00.txt, Feb. 2001. 856 [10] G. Li, J. Yates, R. Doverspike and D. Wang, "Experiments in 857 Fast Restoration using GMPLS in Optical / Electronic Mesh Networks," 858 Postdeadline Papers Digest, Optical Fiber Commun. Conf., March 2001. 859 [11] A. Iwata et al., "Crankback Routing Extensions for MPLS 860 Signaling," IETF draft, draft-iwata-mpls-crankback-00.txt, November 861 2000. 862 [12] R. Doverspike, G. Sahin, J. Strand and R. 
Tkach, "Fast 863 Restoration in a Mesh Network of Optical Cross-connects," Optical 864 Fiber Commun. Conf., 1999. 866 [13] S. Chaudhuri, G. Hjálmtýsson and J. Yates, "Control of 867 Lightpaths in an Optical Network," OIF contribution OIF2000.04, Jan. 868 2000. 869 [14] J. Lang et al., "Link Management Protocol (LMP)," Internet 870 draft, draft-lang-mpls-lmp-02.txt, July 2000. 871 [15] P. Ashwood-Smith et al., "Generalized MPLS Signaling - CR-LDP 872 Extensions," Internet draft, 873 draft-ietf-mpls-generalized-cr-ldp-03.txt, May 2001. 874 [16] V. Sharma and F. Hellstrand (Editors), "A Framework for 875 MPLS-based Recovery," Internet draft, 876 draft-ietf-mpls-recovery-frmwrk-02.txt, March 2001. 877 [17] K. Owens, V. Makam, V. Sharma, B. Mack-Crane and C. Huang, 878 "A Path Protection/Restoration Mechanism for MPLS Networks," 879 Internet draft, draft-chang-mpls-path-protection-02.txt, Work in 880 Progress, November 2000. 881 [18] K. Owens et al., "Extensions to RSVP-TE for MPLS Path 882 Protection," Internet draft, 883 draft-chang-mpls-rsvpte-path-protection-ext-01.txt, November 2000. 885 10. Authors' Addresses 887 Guangzhi Li Charles Kalmanek 888 AT&T AT&T 889 180 Park Avenue 180 Park Avenue 890 Florham Park, NJ 07932 Florham Park, NJ 07932 891 973-360-7376 973-360-8720 892 gli@research.att.com crk@research.att.com 893 Jennifer Yates Greg Bernstein 894 AT&T Ciena Corporation 895 180 Park Avenue 10480 Ridgeview Court 896 Florham Park, NJ 07932 Cupertino, CA 94014 897 973-360-7036 Phone: (408) 366-4713 898 jyates@research.att.com greg@ciena.com 900 Fong Liaw 901 Vishal Sharma 902 Metanoia, Inc, Zaffire/Centerpoint Inc. 903 305 Elan Village Lane, Unit 121 2630 Orchard Parkway, 904 San Jose, CA 95134 San Jose, CA 95134 905 Email: V.Sharma@ieee.org 906 fliaw@zaffire.com