Internet Engineering Task Force                       Hirokazu Ishimatsu
Internet-Draft                                          Yoshihiro Hayata
                                                           Susumu Yoneda
                                                 Japan Telecom Co., LTD.
Expiration Date: May 2001                                Ramesh Bhandari
                                                          George Newsome
                                                               Eve Varma
                                                     Lucent Technologies
                                                          November, 2000

Carrier Needs Regarding Survivability and Maintenance for Switched Optical Networks

draft-hayata-ipo-carrier-needs-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

1. Abstract

As discussed in [1], the need for survivable optical networks is critical, and introducing capabilities that further enhance network survivability continues to be an essential objective. This is particularly important for operators with stringent requirements for network resilience and service survivability. However, disruption of service can result not only from faults, but also from scheduled maintenance procedures. This draft introduces some additional considerations and carrier needs related to failure recovery and scheduled maintenance work in switched optical networks. These are of critical importance for serving business customers who require super high quality service assurance and pay correspondingly high tariffs in order to guarantee this level of QoS.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119.

3. Introduction

The explosion of data services is imposing increasingly challenging requirements on network infrastructure at the same time that wavelength services are emerging in the marketplace. Next generation optical networking solutions must enable scalable, flexible, and reliable networks as well as increased responsiveness to client network needs.

An optical layer service framework has been discussed in the context of the service considerations that matter to inter-city network operators [2]. As described in that material, some key objectives include service functionality, a workable business model, and evolvability in a heterogeneous network environment.

Key service functionality cited in [2] includes rapid provisioning and restoration. Automated provisioning of optical layer resources in support of scheduled and demand-based customer/client needs offers opportunities for supporting new services as well as handling routine maintenance activities in a non-service disrupting manner (e.g., scheduled or predictable maintenance-related churn).

Assuring support for a workable business model that can adapt to change, e.g., arbitrage, is important. In particular, it has become clear that there is a range of reasonable business models that might be utilized in an operator's network, depending upon the scope and objectives of the enterprise. Moreover, as discussed in [4], such models might be used in various ways, and for various purposes, even by different organizations within the same network operator domain.
Evolvability is an important consideration, as it is essential for service providers to have a smooth network evolution path for addressing the unique problems inherent in simultaneously supporting an existing network while deploying a new multi-service infrastructure. Clearly, it is also necessary to enable emergent service providers to optimally tailor their networks for their targeted market and service offerings; however, emergent providers need to deal with an embedded base as soon as initial deployment of resources has occurred.

Within the remainder of this draft, we will focus upon service functionality and business model objectives in relation to service survivability and maintenance considerations for highly reliable services such as the super high quality services discussed below.

4. Switched Optical Services

The basic requirement of a switched optical service is that a channel is established via an appropriate signaling mechanism before data can be transferred, and that this establishment is achieved in the following manner:

- a real-time client specifies its traffic characteristics and its end-to-end performance requirements to the server

- the most suitable route for a channel that meets these requirements is determined

- the end-to-end parameters are translated into local parameters at each network element (NE), and an attempt is made to reserve resources via signaling.

The service abstraction defines a contractual relationship between client and server. Hence, once the connection is established, the server guarantees that, in the absence of a failure, it will meet its contractual obligations. This contract is agreed before data transfer. Because the server guarantees the contract, several actions have to be taken in case of a failure. This draft addresses those actions in Sec. 4.2.
4.1 Super High Quality Services Characteristics

Super high quality services (also known as private line services) offered by a carrier currently have the following characteristics:

- The exact physical and logical location of a private line user's path in the network (i.e., the optical fiber cable, fiber, optical channel, SDH logical path, port of transmission equipment/router, etc.) is known to the network operator and uniquely identifiable.

- When a logical path or port is switched to an alternate route (i.e., a back-up path) due to an unexpected event, the carrier switches traffic back from the alternate path to the original path after the event or failure is repaired.

- For scheduled maintenance, the carrier always asks customers whose super high quality services may be affected by the maintenance work when they would prefer the work to be carried out. The carrier then carries out the scheduled maintenance work according to the customer's preferred date and time, as it is essential that important customers not be adversely impacted in any way by scheduled maintenance work.

- The carrier provides for guaranteed service survivability in the event of failures. It does so by providing alternate paths for carrying services, with the service and alternate paths being physically and topologically diverse.

4.2 Service Survivability Considerations

As discussed in [3], there is a range of failures that can occur within a network, and high reliability applications will require a variety of failures to be taken into account. Examples that have been considered include office outages, failures arising from diverse circuits traversing shared protection facilities such as rings, and natural disasters. It is essential to fully prepare for natural disasters such as earthquakes, volcanoes and typhoons. Further, for super high quality services, there is extreme sensitivity to service interruptions.
Thus, it is important that the service and alternate paths do not share links belonging to the same Shared Risk Link Group (SRLG) [3], and do not pass through the same "region of failure". Additionally, in order to assure an optimized survivable network architecture, it is desirable that traffic can be switched back from the alternate path to the original service path once the failure is repaired (note that not all carriers may choose to revert).

The following grades of service may be defined, each with the actions to be taken in the event of a failure:

- Standard service, which is provided from a given source to a given destination over a path computed in accordance with normal network capacity constraints. When the customer loses the connection on account of a fault, the customer may request the same connection, which the network will then try to establish on a newly computed path.

- Medium High Quality Service, which, at the customer's request, provides a connection over a path that avoids a certain set of cities or regions that are prone to damage due to natural disasters such as earthquakes, volcanoes, typhoons, etc. These "regions of failure" may each be ascribed a "radius of failure" determined from a study of the past history of the spatial extent and severity of damage in those regions. In the event of a failure of this service, the customer may request reestablishment of a connection, which the network will attempt to provide over a new path.

- High Quality Service, which is provided with a physically disjoint back-up path in case of failure of the primary path. There are no requirements on city avoidance, etc.; as a result, the back-up guarantees continuity of service only in the event of link or equipment failure.

- Super High Quality Service, which is provided with a physically disjoint back-up path, constrained to have no "region of failure" in common with the original path.
  This type of service may be requested by big business customers who essentially want continuity of service at all times. In fact, since the downtime of the primary path may be significant in major catastrophes such as earthquakes, floods, etc., a carrier may offer to provide a back-up for the back-up path onto which the guaranteed services were switched upon failure of the primary path.

The above four types of services may be summarized in the table below:

Service Type    Physically disjoint    Avoid a Region
                protection path        of Failure
------------    -------------------    --------------
Standard        No                     No
Medium          No                     Yes
High            Yes                    No
Super High      Yes                    Yes

The constraints for the above high quality services can sometimes be met only partially; for example, 100% physical diversity may simply not exist between a particular source-destination pair. In that case the customer, instead of being refused the desired service, may be offered service with a correspondingly reduced level of service protection. For example, if the percentage of fiber overlap between the primary and secondary routes is x, then the customer may be offered the service with the service continuity guarantee reduced by x%, and thus also at a correspondingly reduced cost. Furthermore, where the customer does not want to pay the full cost of the above high quality services, even when such service exists, service may still be provided, but with correspondingly reduced quality guarantees within the class of service under consideration.

4.3 Data Bases and Algorithms

Because natural disasters such as earthquakes, typhoons, etc. can damage a large area in one instance, it is important to ascertain the regions within the service provider's network prone to damage by such calamities.
Normally, such areas have a history of damage, and it should be possible to construct a data base on the location, intensity of disaster, its frequency, and the size of the area affected; the area affected may be expressed as a "radius of failure". It may also be possible to use the information on the intensity of disaster and the frequency of occurrence to assign probabilities of failure to the offered services.

For path computation, the following data bases are needed:

- Nodes, links, and their fiber span content, or alternatively, nodes, fiber spans and the links riding the individual spans, also called Shared Risk Link Groups (SRLGs); clearly, if a link or node is not in service, it is not included in path computation.

- Regions of failure, corresponding radii of failure and locations within the service provider's network; these should be taken into account before computing paths for the medium high and super high quality services.

For highly reliable services such as the super high quality services, physically disjoint paths for real-life networks (which involve span-sharing links or SRLGs) are required. Ref. [5] describes algorithms for such real-life networks. The algorithms emphasize optimality to save network costs. Depending upon the span-sharing topologies of a given network, these optimal algorithms can be very fast, and thus suitable for running in a real-time environment. For networks with very complicated span-sharing topologies, exact algorithms do exist [5], but they are slow for large networks, since the problem becomes NP-complete. In such situations, fast heuristics may be developed [5] (see also [2] for a discussion on diversity).
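As an illustration only, the two data bases listed above and the service grades of Sec. 4.2 might be combined as in the following Python sketch. It does not compute paths (see [5] for that); it only checks which grade a given primary/back-up pair would qualify for. The names Link, Region, regions_hit, srlgs_of and grade are hypothetical, and the flat-map coordinates and straight-line links are simplifying assumptions not taken from this draft.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import hypot

Point = tuple  # (x, y) in km on a flat map; a simplifying assumption


@dataclass(frozen=True)
class Link:
    name: str
    a: Point              # one end of the link
    b: Point              # other end of the link
    srlgs: frozenset      # fiber spans (SRLG identifiers) the link rides


@dataclass(frozen=True)
class Region:
    name: str
    center: Point
    radius: float         # "radius of failure" in km


def _distance_to_segment(p, a, b):
    """Shortest distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(
        0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def regions_hit(path, regions):
    """Names of the regions of failure that any link of the path enters."""
    return {r.name for r in regions for link in path
            if _distance_to_segment(r.center, link.a, link.b) <= r.radius}


def srlgs_of(path):
    """All SRLG identifiers used by the links of a path."""
    return set().union(*(link.srlgs for link in path))


def grade(primary, backup, regions):
    """Grade from the Sec. 4.2 table that this path pair qualifies for.

    A back-up that shares an SRLG with the primary is treated as if
    there were no back-up at all (an assumption of this sketch).
    """
    primary_regions = regions_hit(primary, regions)
    if backup is not None and not (srlgs_of(primary) & srlgs_of(backup)):
        shared = primary_regions & regions_hit(backup, regions)
        return "High" if shared else "Super High"
    return "Standard" if primary_regions else "Medium High"


# Hypothetical example: one direct link and a two-link detour.
primary = [Link("A-B", (0, 0), (10, 0), frozenset({"span1"}))]
backup = [Link("A-C", (0, 0), (5, 10), frozenset({"span2"})),
          Link("C-B", (5, 10), (10, 0), frozenset({"span3"}))]
quake = Region("quake-zone", (5, -1), 2.0)
print(grade(primary, backup, [quake]))  # Super High
```

In this example the region touches only the primary path, so the pair has no region of failure in common. Enlarging the radius until the detour is also inside the region would reduce the result to "High".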
4.4 Business Model Considerations

As described in [4], there are several business models that may be applicable for network operators: ISP owning all Layer 1 infrastructure and only delivering IP-based services, ISP owning or leasing Layer 1 infrastructure and only delivering IP-based services, retailer or wholesaler for multi-services, and a carrier's carrier or bandwidth broker. A carrier owns the Layer 1 infrastructure and sells multiple service types to customers, which may include other operator networks.

This bandwidth brokering, or reseller, role takes on a new meaning in the context of service resilience. For many years, operators in Japan have collaborated to handle traffic in the event of natural disasters by borrowing bandwidth from each other. Thus, if an operator doesn't have the capacity, it can borrow capacity from another network. Accommodating the unexpected is a key factor in this case.

Indeed, it seems to be a common pattern in industry that businesses that provide service and operate their own infrastructure tend to separate into two businesses. This makes it likely that even though infrastructure may be wholly owned today, it may well not be tomorrow. It is therefore important to take account of fully separated business models (cases 3 and 4 of [4]) even if they do not seem to represent the majority of today's businesses.

5. Implications for switched optical networks

Considering the discussion in Subsections 4.1 - 4.4, switched optical networks must minimally:

- Support the various grades of high quality services, including the Super High Quality Service described in Sec. 4.1.

- Support survivability considerations related to diverse routing, tailored to the unique characteristics of Japan's geography and routing of fibers.

- Enable "bandwidth borrowing on demand" from other carriers as well as support for multiple service types.
Examples of necessary functionality are provided in more detail below, as well as some related connection setup operations.

5.1 Functions

Referring to Section 4, the following functions need to be supported:

- Ability for the network operator to manually set the date and time at which a path switching function should take place, and have that occur automatically. (The guarantee that the switch occurs as scheduled is closely linked to resource allocation policies; see T1X1.5/2000-194 for further discussion on scheduled connections.)

- Ability to specify switching to a path that is physically/topologically disjoint from the service path.

- Ability to maintain and update the data bases in a timely manner so that a connection request is supported with the most current knowledge of the network.

- Ability for the operator to support a survivability policy that enables switch-back to the original service path.

- Ability to support an operator policy to prioritize service requests so that, in the event of a fault, customers with super high quality services have first priority in being switched to disjoint paths.

- Ability to enable key customers to request constraints on the connection path (e.g., avoid City X because an earthquake has just occurred, or simply because the city is prone to damage from natural disasters such as earthquakes, volcanoes and typhoons). This involves the ability to express geographic constraints, as opposed to just physical (equipment) or topological constraints.

- Ability to prevent new customers from being added to a particular link for a certain amount of time (e.g., because of a failure, natural disaster, or scheduled maintenance). This requires the ability to mark particular resources as out of service.

- Ability for the operator to query the service management function to establish the exact location and characteristics of service paths for key customers.
- Ability for the operator to view information regarding which customer/user is associated with which service path(s).

5.2 Connection Setup Operation

Referring to [4], some relevant connection setup parameters include:

1) Scheduled service - ability to request the connection to be made at some specified time in the future (see T1X1.5/2000-194 for further discussion).

2) Scheduled duration - ability to specify a duration for the connection.

3) Resilience - ability to request resilience against server layer faults, and to specify a particular degree of risk (see Sec. 4.2).

4) Connection Constraints - ability to specify the constraints as in the three levels of high quality service described in Sec. 4.2.

6. References

[1] J. Luciani, B. Rajagopalan, D. Awduche, B. Cain, B. Jamoussi, "IP over Optical Networks - A Framework", March 2000.

[2] John Strand, "Optical Layer Services Framework", T1X1.5/2000-142.

[3] Monica Lazer, John Strand, "Some Routing Constraints", T1X1.5/2000-143.

[4] George Newsome, "ASON - Requirements at the Client API", T1X1.5/2000-158.

[5] Ramesh Bhandari, "Survivable Networks - Algorithms for Diverse Routing", Kluwer Academic Publishers, 1999.

7. Authors' Contact Information

Hirokazu Ishimatsu
Japan Telecom
hirokazu@japan-telecom.co.jp

Yoshihiro Hayata
Japan Telecom
hayata@japan-telecom.co.jp

Susumu Yoneda
Japan Telecom
yone@japan-telecom.co.jp

Ramesh Bhandari
Lucent Technologies
bhandari1@lucent.com

George Newsome
Lucent Technologies
gnewsome@lucent.com

Eve Varma
Lucent Technologies
evarma@lucent.com