Network Working Group          Zubair Ahmad
Internet Draft                 Equant
Category: Informational        E. Oki
Expires: September 2006        NTT
                               Lei Wang
                               Telenor
                               Jaime Miles
                               Level 3 Communications
                               March 2006

Multi-TEchnology Recovery (MTER) Problem Statement

draft-ahmad-mter-problem-statement-00.txt

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

The objective of this document is to begin a discussion that will determine the level of interest at the IETF in documenting how multiple recovery techniques can successfully be combined to protect a set of network resources, and the various interactions between these recovery techniques. A potential outcome of this work could be the definition of new MIBs and/or OAM techniques devoted to such interactions.

Ahmad, Oki, Wang, Miles        MTER Problem Statement        [Page 1]
draft-ahmad-mter-problem-statement-00.txt        January 2006

Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119.

Table of Contents

1. 
Note
2. Terminology
3. Introduction
4. Non objectives
5. Three MTER scenarios
   5.1. IGP Fast Convergence and MPLS FRR
      5.1.1. Description
      5.1.2. Evaluation
      5.1.3. Potential side effects
      5.1.4. Required functions
   5.2. IGP Fast Convergence & Graceful Restart
      5.2.1. Description
      5.2.2. Interactions
   5.3. IGP Fast Convergence and GMPLS Protection and Restoration
      5.3.1. Description
      5.3.2. Evaluation
      5.3.3. Potential Side effects
      5.3.4. Required functions
6. Multi Techno OAM
   6.1. Failure detection and isolation in an MTER context
   6.2. MIBs/OAM
7. Future items
8. Acknowledgments
9. References
10. Authors' Addresses
11. Intellectual Property Statement

1. Note

The first revision of this document aims to start discussions that will be used to determine the level of interest in the IETF in documenting the impacts of combining multiple recovery techniques.
The content in this document is not exhaustive but instead illustrates some of the challenges and opportunities encountered when combining a few of the multiple recovery techniques. Ideally the discussions generated from this document will also highlight which combinations a future working group should look at first. A potential outcome of this work could be the definition of new MIBs and/or OAM techniques devoted to such interactions.

2. Terminology

Recovery Mechanism: A network mechanism that restores network connectivity upon link, node or SRLG failure.

MPLS Fast Reroute: MPLS Recovery Mechanism that relies on pre-established local backup LSPs.

GMPLS P&R: GMPLS Protection and Restoration. Set of GMPLS Recovery Mechanisms under definition within the CCAMP WG. This includes end-to-end recovery and segment recovery.

IGP Fast Convergence: The set of IGP improvements to minimize IGP convergence time upon failure. This includes, but is not limited to, fast LSA/LSP triggering (with back-off) upon routing adjacency changes, fast SPF triggering (with back-off), and incremental SPF.

Graceful Restart: Control plane recovery mechanism allowing a control plane element to restart and recover state information from its neighbors with no impact on the data plane. A set of graceful restart mechanisms has been defined for routing and signaling protocols (e.g. OSPF, ISIS, BGP, LDP, RSVP, etc.).

MTER: Multi-TEchnology Recovery. This refers to the combination of at least two distinct recovery mechanisms to protect against network element failures (link, node, SRLG).

3. Introduction

Over the past few years a plethora of recovery techniques (involving global and local mechanisms for protection and restoration) have been defined in various Working Groups at the IETF, implemented by the vendor community, and deployed by network operators.
Examples of these techniques include IGP Fast Convergence, MPLS Fast Reroute ([FRR]), GMPLS P&R ([E2E-RECOVERY], [SEG-RECOVERY]), and Graceful Restart ([RFC3847], [RFC3623], [RFC3478], [RSVP-RESTART]). The goal of this work has been to improve overall availability of the networks that implement these recovery techniques.

Unfortunately no single recovery technique can protect a network against all possible failure types. Consequently network operators have found that the variety of failure types usually requires that they deploy multiple recovery techniques with different scopes (link, node control plane, node data plane, SRLG...) to improve their overall network availability. This is referred to as Multi-Technology Recovery (MTER).

As network operators have combined several of the recovery technologies in their networks, they have discovered that there can be interactions between the combined recovery techniques. The following sections of this document look at several combinations of recovery techniques and their unique interactions when deployed together.

For each of the combinations studied, this document analyses the potential side effects and looks at various recovery metrics such as scope of recovery, robustness, recovery time, number of states, stability, ability to provide bandwidth protection, transient congestion (micro-loops) that impacts other flows, race conditions, and backup bandwidth overbooking.

Note that recovery times referenced in this document are given as orders of magnitude, as they strongly depend on implementations and network engineering.

Although various techniques already exist to safely deploy a set of recovery mechanisms in combination, there are no defined tools to manage and monitor their combined use.

4. 
Non objectives

This document does not provide a detailed description of the referenced recovery techniques. For a detailed description of each recovery technique see the referenced RFCs or Drafts. Additionally this document does not compare recovery technologies or seek to make any recommendations as to which recovery techniques to combine. Moreover, the objective of this work is not to propose extensions of existing protocols that are defined in existing Working Groups. Finally, there are cases where a recovery technique is mainly available because of a vendor implementation optimization or specific algorithm. This document will not cover such implementation specific aspects.

5. Three MTER scenarios

5.1. IGP Fast Convergence and MPLS FRR

5.1.1. Description

In this MTER scenario, recovery upon link and SRLG failures relies on MPLS FRR link protection with one-hop unconstrained primary tunnels, while recovery upon node failures relies on IGP Fast Convergence.

The motivation for such an MTER scenario may be, for instance, to rely on FRR for link protection only, so as to avoid the deployment of a potentially large number of end-to-end TE LSPs, and to rely on the IGP in case of node failures. Furthermore, in an MPLS/VPN network, MPLS FRR node protection coverage is currently limited to P routers; PE router recovery still relies on BGP/IGP Fast Convergence.

5.1.2. Evaluation

Upon a link failure, the traffic is fast rerouted along the pre-established backup tunnel path and the recovery time is on the order of a few tens of milliseconds, thanks to the local protection nature of MPLS FRR (see [FRR]).

Upon a node failure, the recovery relies on IGP Fast Convergence, which should achieve sub-second convergence in some circumstances.
5.1.3. Potential side effects

With this combination, a link failure is likely to trigger both FRR and IGP recovery, which compete for resources to offer the same protection, with the following potential side effects:

- The IGP is unnecessarily stressed;
- There might be a risk of congestion, due either to inherent IGP micro-loops or to a lack of bandwidth resources.

5.1.4. Required functions

The problem stems from the fact that there is no way to distinguish link failures from node failures. A useful way to avoid double recovery activation would be to rely on a mechanism to rapidly distinguish link and node failures. This would allow activating the appropriate mechanism upon failure.

Another currently available solution, although not entirely optimal, consists of advertising the primary TE-LSP as a link within the IGP so as not to activate IGP convergence upon link failure.

Note that there are currently no defined tools (MIBs, OAM) allowing for the monitoring of such interaction. For example, is there a need for a MIB that would keep track of the number of times each mechanism has triggered a recovery action and the reasons for each activation?

5.2. IGP Fast Convergence & Graceful Restart

5.2.1. Description

In this MTER scenario Graceful Restart (GR) is combined with IGP Fast Convergence. GR is used to ensure recovery upon node control plane failures, and IGP Fast Convergence is used to ensure recovery upon data plane (link and node) failures.

GR mechanisms have been defined at the IETF for various protocols including BGP [BGP-RESTART], ISIS [RFC3847], OSPF [RFC3623], LDP [RFC3478] and RSVP-TE ([RFC3473], [RSVP-RESTART]). Graceful Restart allows networks to protect against Control Plane failures on Edge and Core Routers (e.g. Control Plane processor failure/switchover, planned maintenance operations, etc.).
The key objective of GR is to gracefully recover the control plane following control plane processor failure/switchover and reduce or eliminate the impact on traffic forwarding.

IGP Fast Convergence allows quick recovery from backbone link failures as well as Node Data Plane failures.

5.2.2. Interactions

During GR operation, the neighbors/peers continue to forward traffic through the node which is undergoing a Control Plane Restart, whereas in the case of IGP Fast Convergence the idea is to re-route around the failed link/node as quickly as possible upon failure detection.

Although the respective behaviors of GR and IGP Fast Convergence seem to be orthogonal at first sight, there can be some interactions when both of these functions are used in conjunction. Indeed, the GR protocols rely on the support of their neighbors to provide the routing information during a Control Plane Restart. When a failure/switchover occurs on a GR capable device, its neighbors are supposed to wait a certain amount of time (i.e. Restart timer in case of BGP GR, ISIS Holdtime in case of ISIS GR, etc.) to allow for the recovery of the node control plane after completing the IGP process restart (making the assumption of a recoverable control plane failure).

In order to allow the GR mechanism to work effectively, it is required that the IGP process be restarted and that IGP Hellos be sent to the adjacent neighbors before the IGP Holdtime/Dead-interval expires ([RFC3847], [RFC3623]). If the IGP process restart takes longer, or the IGP Hellos are not sent within the Holdtime/Dead-interval, the IGP adjacency will be declared down by the neighbors, thus triggering routing convergence and causing packet loss. This implies that certain Control Plane failures may actually cause GR to be aborted and IGP convergence to be triggered.
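The neighbor-side behavior described above can be illustrated with a minimal sketch. It is not taken from any GR specification or implementation: the names neighbor_action, hold_time_s and hello_gap_s are hypothetical, and the timer values are arbitrary. The sketch only assumes that a neighbor keeps the adjacency (and keeps forwarding) if the first Hello after the restart arrives before the Holdtime/Dead-interval expires, and declares the adjacency down otherwise.

```python
from enum import Enum


class NeighborAction(Enum):
    """What a neighbor of the restarting router ends up doing (illustrative)."""
    KEEP_FORWARDING = "adjacency kept, GR proceeds"
    CONVERGE = "adjacency down, GR aborted, IGP convergence triggered"


def neighbor_action(hold_time_s: float, hello_gap_s: float) -> NeighborAction:
    """Decide the neighbor's action during a control plane restart.

    hold_time_s: IGP Holdtime/Dead-interval configured on the neighbor
                 (hypothetical value, in seconds).
    hello_gap_s: time between the last Hello received before the restart
                 and the first Hello received after it (in seconds).
    """
    # The Hello must arrive strictly before the hold timer expires.
    if hello_gap_s < hold_time_s:
        return NeighborAction.KEEP_FORWARDING
    return NeighborAction.CONVERGE


if __name__ == "__main__":
    # Arbitrary example: a 30 s hold time, with a fast and a slow restart.
    for gap in (12.0, 45.0):
        print(f"hello gap {gap:>5.1f}s -> {neighbor_action(30.0, gap).value}")
```

With these assumed values, a restart that resumes Hellos after 12 seconds stays within GR, while one that takes 45 seconds falls back to IGP convergence, which is the abort case noted above.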
Another implication of the GR mechanism and the inherent IGP process restart time is that it may require compromising on the IGP adjacency failure detection time, i.e. the IGP Holdtime/Dead-interval must be greater than the process switchover time. This aspect may limit the ability to aggressively tune the IGP adjacency failure detection time (e.g. using IGP Fast Hellos or BFD, etc.) and may not allow for a faster detection of link and node data plane failures, in cases where data plane failure detection relies on IGP adjacency loss. It should be noted however, that fast failure detection is not necessarily incompatible with Graceful Restart. The real issue is that when an IGP adjacency is lost, the peer does not know whether this is due to a Control Plane or a Data Plane failure, and hence does not know whether it has to continue sending traffic to the neighbor or rather trigger IGP convergence.

Note that the GR mechanism only protects against Control Plane failures and is built on the premise that forwarding can be preserved during a Control Plane Restart on the GR capable device. There may be a situation where a Control Plane Restart occurs and forwarding cannot be truly preserved (e.g. due to memory corruption, software bug, etc.). In this scenario it would be desirable to trigger IGP Fast Convergence and avoid blackholing of traffic. However, in this situation the IGP convergence is likely to be delayed because the GR mechanism is activated.

To summarize, the GR mechanism requires the neighbors to wait for some time, mainly due to the inherent control plane processor switchover time. If this time is set too short, it will likely interfere with the GR mechanism by triggering routing convergence at the neighbors. If this time is set higher, it will delay IGP Fast Convergence in case of link and node data plane failures.
This issue arises mainly due to the lack of a mechanism to make a clear distinction between Control Plane failures and Data Plane failures. Such a mechanism may be considered to efficiently handle this MTER scenario.

As in the previous case, there are currently no defined tools (MIBs, OAM) allowing for the monitoring of such interaction. Is there a need to be able to record a case where a GR procedure has failed, leading to an increased IGP convergence time? Is there a need to be able to monitor GR performance in order to be able to tune IGP timers more aggressively?

5.3. IGP Fast Convergence and GMPLS Protection & Restoration

5.3.1. Description

In this third MTER scenario, IP links are supported by a transport network (SDH, WDM, etc.) managed by a GMPLS control plane. Protection against link and node failures is as follows:

- Link recovery relies on an underlying GMPLS protection and restoration (P&R) mechanism. This includes end-to-end recovery and segment recovery mechanisms ([E2E-RECOVERY], [SEG-RECOVERY]).
- Node recovery relies on IGP Fast Convergence.

Note that some data plane recovery mechanisms such as the SONET/SDH protection mechanism may co-exist with GMPLS P&R mechanisms. The co-existence of data plane recovery and GMPLS recovery mechanisms is not addressed in this section.

GMPLS P&R mechanisms are activated at either end node in the case of end-to-end recovery, or at an end node or transit node along the GMPLS LSP route in the case of segment recovery. Recovery may be triggered by one of the following events:

- Some physical interface alarms. A physical interface alarm is raised upon detecting data plane errors such as a loss of light or an error in control bits.
- Receiving GMPLS control plane messages. The messages include the Notify message and Path/Resv Error messages.
- GMPLS message time out. 
An RSVP state expires, as no refresh message is received at the node within the configured refresh interval time-out value.
- Link Management Protocol (LMP) detection. LMP [RFC4204] is able to detect node or link failures.

IGP Fast Convergence may be triggered by one of the following events:

- Some physical interface alarms. This case is the same as for GMPLS P&R mechanisms.
- IGP time out. The IGP neighbor adjacency expires, as no hello message is received at the node within the configured time interval.
- BFD. A BFD timer expires.

Upon transport network failure, both IGP Fast Convergence and GMPLS P&R may be simultaneously activated.

5.3.2. Evaluation

GMPLS recovery times (IP link failure) usually range from several tens of milliseconds to several hundreds of milliseconds (up to a few seconds in case of E2E rerouting with a large number of LSPs).

IGP Fast Convergence times (IP node failures) typically range from a few hundred milliseconds to a few seconds.

5.3.3. Potential Side effects

Upon transport network failure, both IGP Fast Convergence and GMPLS P&R may be simultaneously activated. This has the following side effects:

- This may lead to congestion at the IP layer while backup transport resources are available.
- This may lead to race conditions: IGP rerouting onto an IP link that is going to be preempted by the GMPLS recovery process.
- This may generate useless IGP stress (link-state re-advertisement and SPF re-calculation): a large amount of control-plane packets are transmitted within a short time period, and the processing load at each node is increased.
- This may lead to useless IGP instabilities: congestion due to transient routing loops, etc.

5.3.4. Required functions

Recovery mechanisms should be coordinated between packet and transport layers to satisfy high availability needs while using network resources efficiently. In other words it must be possible to avoid simultaneous recovery activations in both layers.
For the sake of illustration, the inter-layer recovery coordination may rely, for instance, on a hold-off timer approach or on a separate layer recovery approach.

Although not entirely optimal, a hold-off timer approach allows a transport layer recovery to be performed first, before a packet layer recovery is activated. If the transport layer recovery does not succeed within a certain time, a packet layer recovery is activated. A hold-off timer is used so that the packet layer can judge whether the transport layer recovery has succeeded or not. With this approach, network operators should not allow a physical interface alarm to invoke both the GMPLS P&R mechanism and IGP Fast Convergence when both mechanisms co-exist. In addition, network operators should configure the IGP time-out value to be greater than the time required for GMPLS recovery. The main advantage of the hold-off timer approach is its simplicity, and its major drawback is the delay introduced in case of node failure. Indeed, in case of node failure, neighbor routers wait, while they could immediately activate IGP rerouting. This approach may be applied for best effort traffic.

The separate layer recovery approach allows an inter-layer recovery mechanism to make the distinction between IP link and IP node failures, and to control which layer will be in charge of the recovery. If an IP node failure is detected, the packet layer recovery is activated while the transport layer recovery is not. On the other hand, if an IP link failure, caused by an optical link or node failure, is detected, the transport layer recovery is activated while the packet layer recovery is not. The separate layer recovery approach may be applied for non best effort traffic, because recovery time in case of node failure can be reduced compared to the hold-off timer approach.
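The two coordination approaches described above can be contrasted with a minimal sketch. It is purely illustrative and does not reflect any defined protocol behavior: the function names, the Failure type and the timer values are hypothetical, and failure classification is assumed to be already available to the separate layer approach.

```python
from enum import Enum
from typing import Optional, Tuple


class Failure(Enum):
    """Failure types as seen from the IP layer (illustrative)."""
    IP_LINK = "ip_link"   # caused by an optical link or node failure
    IP_NODE = "ip_node"


def hold_off_recovery(transport_recovery_s: Optional[float],
                      hold_off_s: float) -> Tuple[str, float]:
    """Hold-off timer approach.

    transport_recovery_s: time the transport layer needs to recover, or
        None if it cannot recover (e.g. an IP node failure).
    hold_off_s: hold-off timer used by the packet layer (assumed value).

    Returns the layer that takes charge and the time (in seconds) at
    which transport recovery completes or packet recovery is activated.
    """
    if transport_recovery_s is not None and transport_recovery_s <= hold_off_s:
        return ("transport", transport_recovery_s)
    # Transport recovery did not succeed in time: packet layer takes over,
    # but only once the hold-off timer has expired.
    return ("packet", hold_off_s)


def separate_layer_recovery(failure: Failure) -> str:
    """Separate layer approach: the failure type selects the layer."""
    if failure is Failure.IP_NODE:
        return "packet"
    return "transport"


if __name__ == "__main__":
    # Arbitrary values: 1 s hold-off, 200 ms transport recovery.
    print(hold_off_recovery(0.2, 1.0))   # link failure, transport recovers
    print(hold_off_recovery(None, 1.0))  # node failure, packet layer waits
    print(separate_layer_recovery(Failure.IP_NODE))
```

Under these assumptions the node failure case shows the trade-off named above: with the hold-off timer the packet layer only reacts once the timer has expired, whereas the separate layer approach hands the failure to the packet layer directly, at the cost of needing the failure type.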
To be efficient, the separate layer recovery approach requires rapidly distinguishing failure types (link and node), and this may be quite challenging.

The separate layer recovery approach can be combined with the hold-off timer approach. In the separate layer recovery approach, even after an IP link failure is detected, the transport layer recovery may not succeed. In that case, the packet layer recovery will be activated after the hold-off timer expires.

Similarly to the two previous cases, there are currently no defined tools (MIBs, OAM) allowing for the monitoring of such interaction. Is there a need to be able to record a case where a GMPLS P&R procedure has failed, leading to an increased IGP convergence time? Is there a need to be able to monitor GMPLS P&R performance in order to be able to tune IGP timers more aggressively?

6. Multi Techno OAM

6.1. Failure detection and isolation in an MTER context

In an MTER context it appears relevant to study mechanisms that detect and isolate failures (control plane/data plane, link/node, SRLG...). Particular attention would have to be given to the time taken by such detection/isolation, as this may diminish the benefits of such mechanisms.

Various mechanisms have been defined to detect failures: Layer 2 alarms (e.g. SONET AIS), control plane alarms (e.g. GMPLS Notify), control plane hellos (e.g. IGP hello), BFD [BFD], LMP [RFC4204], etc.

The applicability of these technologies to the detection and isolation of failures would have to be documented for each MTER scenario.

6.2. MIBs/OAM

There is a need for an MTER MIB that would contain, non-exhaustively, the following information for each failure occurrence:

- the location of the failure (link, node, SRLG) and the exact reason for the rerouting;
- the set of P&R mechanisms triggered;
- an order of magnitude of the recovery time.

7. 
Future items

Here is a non-exhaustive set of additional items that need to be covered as part of the MTER work:

- Other combinations of recovery techniques (e.g. BGP-IGP interactions, IGP-IPFRR interactions, etc.);
- Superset of elementary combinations (e.g. IGP Fast Convergence for node data plane failures + Graceful Restart for node control plane failures + MPLS FRR link protection for link failures);
- Combination of recovery techniques to protect distinct traffic types (e.g. IGP Fast Convergence and end-to-end MPLS-TE recovery).

8. Acknowledgments

We would like to thank Jean-Louis Le Roux, Jean-Philippe Vasseur and Raymond Zhang for their useful comments and suggestions.

9. References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3667] Bradner, S., "IETF Rights in Contributions", BCP 78, RFC 3667, February 2004.

[BCP79] Bradner, S., "Intellectual Property Rights in IETF Technology", RFC 3979, March 2005.

[RFC3945] Mannie, E., et al., "Generalized Multi-Protocol Label Switching Architecture", RFC 3945, October 2004.

[FRR] P. Pan, G. Swallow, A. Atlas, et al., "Fast Reroute Extensions to RSVP-TE for LSP Tunnels", RFC 4090, May 2005.

[BGP-RESTART] P. Sangli, Y. Rekhter, R. Fernando, J. Scudder, E. Chen, "Graceful Restart Mechanism for BGP", work in progress.

[RFC3847] M. Shand, L. Ginsberg, "Restart Signaling for Intermediate System to Intermediate System (IS-IS)", RFC 3847, July 2004.

[RFC3623] J. Moy, P. Pillay, A. Lindem, "Graceful OSPF Restart", RFC 3623, November 2003.

[RFC3478] M. Leelanivas, Y. Rekhter, R. Aggarwal, "Graceful Restart LDP", RFC 3478, February 2003.

[RFC3473] L. Berger et al., "GMPLS RSVP-TE extensions", RFC 3473, January 2003.

[RSVP-RESTART] A. Satyanarayana, R. Rahman, "Extensions to GMPLS RSVP Graceful Restart", work in progress.

[E2E-RECOVERY] J. Lang, Y. Rekhter, D. 
Papadimitriou, "RSVP-TE Extensions in support of End-to-End Generalized Multi-Protocol Label Switching (GMPLS)-based Recovery", work in progress.

[SEG-RECOVERY] L. Berger, I. Bryskin, D. Papadimitriou, A. Farrel, "GMPLS Based Segment Recovery", work in progress.

[RFC4204] J. Lang et al., "Link Management Protocol", RFC 4204, October 2005.

[BFD] D. Katz, D. Ward, "Bidirectional Forwarding Detection", work in progress.

10. Authors' Addresses

Zubair Ahmad
Equant
13775 McLearen Road, Oak Hill VA 20171
Email: zubair.ahmad@equant.com

Eiji Oki
NTT
3-9-11 Midori-Cho
Musashino, Tokyo 180-8585, Japan
Email: oki.eiji@lab.ntt.co.jp

Lei Wang
Telenor
Snaroyveien 30
Fornebu 1331
NORWAY
Email: lei.wang@telenor.com

Jaime Miles
Level 3 Communications
Email: jaime.miles@level3.com

11. Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard.
Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2006). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.