CCAMP Working Group           Richard Rabbat (Fujitsu Labs. of America)
Internet Draft                 Ching-Fong Su (Fujitsu Labs. of America)
Expires: December 2004                   Vishal Sharma (Metanoia, Inc.)

                                                              June 2004

   Observations on the Applicability of the Fault Notification Protocol
                   draft-rabbat-fnp-applicability-01.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
        http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
        http://www.ietf.org/shadow.html.

Abstract

   The Fault Notification Protocol (FNP) is a set of procedures designed
   to enable time-bounded failure notification in networks using an IP-
   based control plane. This document discusses the applicability of FNP
   in the context of optical transport networks. It highlights the
   protocol's principles of operation, and then describes the network,
   node, fault, and operational models in optical networks for which the
   protocol is designed. It also discusses the relationship to higher
   layers, and issues of scalability. Some guidelines for deployment are
   also provided.

Table of Contents

   1. Introduction
   2. Terminology
   3. Operational Overview of FNP
   4. FNP Applicability
   4.1 Network Model
   4.2 Node Architecture
   4.3 Fault Model (Types of faults supported)
   4.4 Network Layer at which FNP Applies
   4.5 Relationship to Higher (Packet) Layers
   4.6 Operational Model
   4.7 Framing and Data Plane Considerations
   4.8 Scalability Considerations
   4.9 Guidelines for Deployment
   5. Conclusion
   6. Acknowledgements
   7. Intellectual Property Considerations
   8. References
   9. Authors' Addresses
   10. Full Copyright Statement

1. Introduction

   As carriers move towards offering advanced services on their
   networks, with a tighter integration of the different network layers,
   the ability to provide rapid, scalable, and timely restoration is
   crucial for meeting agreed-upon SLAs, either between providers or
   between the end-customer and a provider. In this context, time-
   bounded fault notification will be a key component of the overall
   carrier restoration strategy.

   The Fault Notification Protocol (FNP) [2] is a protocol developed to
   meet this service provider requirement. It is designed to facilitate
   rapid recovery by enabling time-bounded fault notification in
   networks that use an IP-based control plane.

   The purpose of this memo is to discuss the applicability of FNP in
   the context of optical transport networks.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [3].

3. Operational Overview of FNP

   In this section, we briefly review the basic operation of FNP while
   confining our discussion here to optical transport networks.

   Fundamentally, FNP is a set of procedures designed to provide time-
   bounded fault notification in a network with shared protection. That
   is, a network where either the protection route between two nodes
   carries "extra traffic" from two or more disjoint trails, or the
   provider implements M:N type shared restoration.

   Once a network fault is detected at T-detect, and a set hold-off time
   T2 has expired at time T-start-ntf, the node detecting the fault
   sends out a fault notification message to each of its neighbors on
   the control plane. The message essentially identifies the resource
   (fiber, lambda, or node) at fault; this allows any network node
   receiving a fault notification message to determine whether it lies
   on the path of a backup LSP corresponding to a working LSP affected
   by the fault. The message also carries the time (per the local clock)
   at which the notification process started, T-start-ntf.

   Each network node, upon receipt of a fault notification message,
   first transmits the message on each of its remaining outgoing
   interfaces, and then processes the message to determine whether it
   lies on the path of one or more backup LSPs that need to be activated
   as a result of that fault. If so, the node first drops any extra-traffic
   that was using the resources originally reserved for this backup LSP,
   and reconfigures its cross-connect hardware so that the working
   traffic arriving on the backup LSP can be directed to the appropriate
   outgoing link/interface.
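
   The per-node behavior described above (forward first, then process)
   might be sketched as follows. This is only an illustration under
   stated assumptions, not the procedure specified in [2]: the record
   layouts, the names BackupLsp, Node, and on_fault_notification, the
   keying of extra traffic by outgoing port, and the suppression of
   duplicate notifications are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BackupLsp:
    """A pre-signaled backup LSP crossing this node (hypothetical record)."""
    lsp_id: str
    protects: frozenset   # resources used by the protected working LSP
    in_port: str
    out_port: str


@dataclass
class Node:
    name: str
    backups: list = field(default_factory=list)
    extra_traffic: dict = field(default_factory=dict)    # out_port -> LSP id
    cross_connects: dict = field(default_factory=dict)   # in_port -> out_port
    seen: set = field(default_factory=set)

    def on_fault_notification(self, fault_id, failed_resource,
                              arrival_if, interfaces, send):
        """Forward the message, then activate any affected backup LSPs."""
        if fault_id in self.seen:      # assumption: duplicates are ignored
            return []
        self.seen.add(fault_id)

        # Step 1: retransmit on every interface except the arrival one.
        for interface in interfaces:
            if interface != arrival_if:
                send(interface, fault_id, failed_resource)

        # Step 2: process locally.
        activated = []
        for lsp in self.backups:
            if failed_resource in lsp.protects:
                # Drop extra traffic using the reserved resources.
                self.extra_traffic.pop(lsp.out_port, None)
                # Reconfigure the cross-connect for the backup LSP.
                self.cross_connects[lsp.in_port] = lsp.out_port
                activated.append(lsp.lsp_id)
        return activated
```

   Forwarding before processing keeps the local reconfiguration time
   out of the notification path seen by downstream nodes.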

   The flooding mechanism ensures that information about the fault is
   propagated to each network node in the minimum number of hops on the
   control plane and, provided that the fault notification packet gets
   high priority in the transmission queues at each node, also that
   fault notification is propagated in the shortest possible time.
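
   To illustrate the minimum-hop property of flooding, the following
   sketch computes the hop count at which each node first receives a
   notification. It assumes, for simplicity, that every control-plane
   hop takes the same time, so that a breadth-first traversal models
   the order of first arrival. The function name flood and the
   adjacency-map representation are hypothetical.

```python
from collections import deque


def flood(control_plane, detecting_node):
    """Return {node: hops} for a notification flooded from detecting_node.

    control_plane maps each node to a list of its control-plane neighbors.
    """
    hops = {detecting_node: 0}
    queue = deque([detecting_node])
    while queue:
        node = queue.popleft()
        for neighbor in control_plane[node]:
            # Only the first copy to reach a node is recorded; copies
            # arriving later over longer paths are ignored.
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops
```

   For a four-node ring A-B-C-D with a fault detected at A, nodes B and
   D are reached in one hop and node C in two.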

   An ingress node, upon receiving the notification message, waits
   until the notification time bound (T-ntf), measured from the time at
   which the fault notification started (T-start-ntf), has expired, and
   then switches traffic from the affected working path(s) to the backup
   path(s). Note that this eliminates a phase of signaling that would
   typically be needed in a signaling-based approach to activate the
   nodes along the backup LSP.

   The key is to ensure that by the time a protection-switching node
   performs the switch, all intermediate nodes along the associated
   backup path(s) will have configured themselves. This is assured by
   selecting a backup path in such a way that for any fault on the
   corresponding working path, all of the nodes along the backup path
   will have been informed (and will have reconfigured themselves)
   within a time bound T-ntf following T-start-ntf. Therefore, each node
   on the recovery path performs the switch [T-ntf - (T-current -
   T-start-ntf)] milliseconds after learning of the fault via a fault
   notification message.
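
   The waiting time above can be written as a one-line computation. The
   sketch below assumes that the clocks of the notifying and receiving
   nodes are comparable (for example, synchronized), since T-start-ntf
   is stamped per the sender's local clock. The function name and the
   clamping of negative results to zero are assumptions made for
   illustration.

```python
def switch_delay_ms(t_ntf_ms, t_start_ntf_ms, t_current_ms):
    """Remaining wait before switching: T-ntf - (T-current - T-start-ntf).

    t_ntf_ms is a duration; the other two arguments are timestamps in
    milliseconds on a common time base (an assumption of this sketch).
    """
    elapsed_ms = t_current_ms - t_start_ntf_ms
    # Assumption: a node that learns of the fault after the bound has
    # already passed switches immediately rather than waiting.
    return max(0, t_ntf_ms - elapsed_ms)
```

   For instance, with T-ntf = 50 ms, a node that receives the message
   12 ms after T-start-ntf waits a further 38 ms before switching.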

4. FNP Applicability

   Our objective in this section is to clearly specify how and where FNP
   applies in the context of optical transport networks, by discussing
   its applicability along several dimensions, as outlined below.

4.1 Network Model

   FNP is initially designed to operate within a single IGP area, where
   fine-grained signaling is used.

   In fine-grained signaling, the entire backup resource (link, lambda,
   and hence, label) is selected during the initial signaling phase for
   the backup path. FNP could also apply to coarse-grained signaling,
   where only a link bundle is selected during the signaling of the
   backup path and the specific lambda (and, hence, label) is selected
   upon the occurrence of a failure. However, that requires
   coordination with signaling between adjacent nodes, and is left for
   further study.

   FNP is useful in contexts where either: (a) the provider implements
   1:1 restoration and allows the bandwidth on the backup path to be
   shared by trails that originate and terminate at nodes other than the
   source (s) and destination (d) of the backup path, or (b) the
   provider implements more general shared-mesh restoration, where
   multiple working LSPs with disjoint paths share backup resources.
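
   As an illustration of the disjointness condition in case (b), the
   sketch below checks whether a set of working paths is pairwise
   link-disjoint, so that a single link fault affects at most one of
   them. Interpreting "disjoint" as link-disjoint, representing a path
   as a list of node names, and the function name can_share_backup are
   all assumptions of this example.

```python
def can_share_backup(working_paths):
    """Return True if no two working paths traverse a common link.

    Each path is a list of node names; links are treated as undirected.
    """
    used_links = set()
    for path in working_paths:
        links = {frozenset(pair) for pair in zip(path, path[1:])}
        if used_links & links:
            return False
        used_links |= links
    return True
```

   For example, working paths A-B-C and A-D-C may share backup
   resources under this check, whereas A-B-C and B-A-E may not, since
   both use link A-B.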

4.2 Node Architecture

   FNP is designed to work in networks with OEO nodes. Its applicability
   to networks with OOO nodes (that is, fully transparent all-optical
   networks) depends on the monitoring capabilities of the OOO systems
   deployed, and is for further study.

   For a network with OEO nodes, the fault detection and correlation
   (which happens before FNP is activated, and is outside the scope of
   this document) occurs at the node closest to the fault. Once the
   detection procedure has determined that a bona fide fault has
   occurred, it activates FNP for fault notification.

4.3 Fault Model (Types of faults supported)

   FNP is designed to support three types of faults in an optical
   transport network: fiber cuts, transponder failures, and switch
   failures. These correspond, respectively, to link faults, lightpath
   or LSP faults, and node faults.

4.4 Network Layer at which FNP Applies

   In the case of optical transport networks, FNP is designed to operate
   at the fiber and optical lightpath layers. The protocol works in the
   context of an optical transport layer that is controlled by an IP-
   based control plane.

   The operation of FNP in a multi-layer context is a complex problem
   and is for further study. For example, in a multi-layer situation,
   the goal might be to perform notification both at the layer closest
   to the fault (as FNP currently does) and at the service layer, for
   example at the level of a VT1.5 circuit that may typically be
   embedded inside a larger SONET/SDH circuit on a lightpath.

4.5 Relationship to Higher (Packet) Layers

   A key aspect of using FNP at the optical transport layer to provide
   time-bounded notification (and hence recovery) is to be able to
   provide the higher (packet) layer some guarantees on how long the
   optical transport layer would take to respond to a failure.

   This allows carriers to implement appropriate hold-off timers at the
   higher layers, and to use this information to craft adequate SLAs
   with their customers.

   In the event where the client layers (higher (packet) layers) and the
   server layer (the optical transport layer) are under the control of
   different providers, it is reasonable to expect that the inter-
   provider agreements between the carriers would incorporate protection
   switching timing bounds.  In that case, notification timing bound
   guarantees provided by the carrier owning/operating the server layer
   would be useful to enable the carrier owning/operating the client
   layer to, in turn, incorporate these in the SLAs it signs. This
   notion could be applied recursively between pairs of adjacent
   carriers.

4.6 Operational Model

   FNP is applicable in a hierarchical network layering model, for
   example, packet over SONET/SDH over lambda over fiber, with the
   recognition that the SONET/SDH layer is itself a layered architecture
   (for example, VT1.5 in STS-1 in STS-3 in STS-12/48).

   Note that FNP does not by itself impose any requirements on the
   policy that the provider uses to devise pre-emption schemes in the
   case where shared restoration and extra-traffic are used. As a
   practical matter, however, the carrier (or carriers) involved would
   have to devise pre-emption schemes that are not susceptible to a
   domino effect (where the removal of some extra-traffic LSP causes a
   cascading effect, triggering the pre-emption of a series of LSPs). A
   carrier would be expected to ensure this simply to maintain network
   stability.

4.7 Framing and Data Plane Considerations

   FNP is a control-plane mechanism for disseminating fault information
   throughout a network. As explained in Section 3, in the context of
   transport networks, the flooding mechanism of FNP accomplishes both
   notification and node reconfiguration simultaneously.  That is, it
   informs the intermediate nodes along a backup LSP corresponding to an
   affected working LSP of a fault, thus allowing them to reconfigure
   themselves, while at the same time notifying the edge nodes
   responsible for taking a restoration action to recover the affected
   LSP(s).

   When appropriate digital framing of the optical signal is available
   in the data plane (e.g. G.709 digital wrapper or SONET/SDH framing),
   and the optical transport nodes can process and interpret the framing
   overhead, FNP can interwork with the fault notification mechanisms
   available in the data plane (e.g. the Forward/Backward Defect
   Indication signals embedded in the framing overhead).  In this case,
   even though notification of the end nodes may occur in the data
   plane, the notification of the nodes along the backup paths of the
   affected working paths is still needed so that they can reconfigure
   themselves.  This can be accomplished via FNP.

4.8 Scalability Considerations

   FNP ensures that at most one message is exchanged on every control
   channel link, whereas fault notification using signaling may lead to
   a large number of signaling messages per link, as explained shortly.
   This leads to a scalability advantage for DWDM networks that have a
   large number of wavelengths or when there are numerous LSPs, each
   corresponding to a small granularity SONET/SDH channel.

   Let us define the length of a control channel between two adjacent
   nodes to be the number of hops that a control message takes to go
   from one node to the other.  Thus, an in-band control channel has
   length one.  By extension, the length of a path in the control plane
   is the sum of the lengths of the control channels used in this path.
   In practice, the maximum number of messages using signaling per
   failed LSP is equal to the length of the path that the notification
   message takes from the detecting node to the protection switching
   point "s", plus twice the sum of the lengths of the control channels
   corresponding to each hop of the protection path from s to d (the
   s-d protection path on the control channel).  For the set of
   affected LSPs, that value is multiplied by the number of LSPs
   affected by the fault that have unique sources (the assumption being
   that a bundled notification message can be sent to sources that
   originate multiple LSPs affected by the same fault).  The number of
   messages, in the worst case, is thus directly proportional to the
   number of LSPs affected (assuming each affected LSP originates at a
   unique source).  This compares to a maximum number of messages for
   FNP equal to the sum of the lengths of all control channels in the
   network.
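
   The two bounds can be compared numerically with a short sketch. The
   function names and input layout below are hypothetical. The
   signaling bound is summed per affected LSP with a unique source,
   which reduces to the multiplication described above when all
   affected LSPs have paths of the same length.

```python
def signaling_messages(affected_lsps):
    """Upper bound on messages with signaling-based notification.

    Each entry is a pair (notify_lengths, protect_lengths): the control
    channel lengths on the path from the detecting node to the switching
    point s, and on the protection path from s to d.
    """
    return sum(sum(notify) + 2 * sum(protect)
               for notify, protect in affected_lsps)


def fnp_messages(control_channel_lengths):
    """Upper bound for FNP: the sum of all control channel lengths."""
    return sum(control_channel_lengths)
```

   As a worked example with made-up numbers, consider 40 affected LSPs
   with unique sources, each with a 2-hop notification path and a 3-hop
   protection path over in-band control channels of length one. The
   signaling bound is 40 x (2 + 2 x 3) = 320 messages. In a network
   with 12 such control channels, the FNP bound is 12 messages,
   independent of the number of affected LSPs.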

4.9 Guidelines for Deployment

   While use of FNP can be appropriate in a variety of situations, we
   provide some initial thoughts on deployment considerations here.

   FNP is expected to be very useful in core optical networks where the
   provider deploys a mesh-based topology and has a large number of
   active lambdas (or the possibility of having several lambdas turned
   on as the network grows). As explained earlier, this would save on
   the signaling overhead of individually activating each backup LSP.

   As explained in Section 4.5, FNP is applicable in situations where
   adjacent client and server layers are under the control of different
   providers.  Although FNP does not impose a limit on how many
   providers may be involved in offering service to the end customer,
   practical considerations would dictate that this "recursion" of
   provider client-server relationships not be more than a few levels
   deep.

5. Conclusion

   This document has provided an overview of the domain of applicability
   of the FNP protocol in the context of optical transport networks. By
   outlining the network, node, and fault models to which FNP applies,
   the document has provided guidelines on where FNP is currently
   usable, and outlined areas of further work.

6. Acknowledgements

   We would like to thank the members of the CCAMP WG for on-line and
   off-line discussions that helped shape some of the ideas behind this
   document. In particular, we thank Adrian Farrel, Zafar Ali, Neil
   Harisson, Jonathan Sadler, Jonathan Lang, Fabio Ricciato, and Roberto
   Albanese.

7. Intellectual Property Considerations

   This section is taken from Section 10.4 of RFC2026 [1].

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the IETF's procedures with respect to rights in RFC documents can
   be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

8. References

   [1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP
      9, IETF RFC 2026, October 1996.

   [2] Rabbat, R., and V. Sharma (Eds.), "Fault Notification Protocol
      for GMPLS-Based Recovery", Internet Draft, work in progress,
      draft-rabbat-fault-notification-protocol-05.txt, May 2004.

   [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement
      Levels," BCP 14, IETF RFC 2119, March 1997.

9. Authors' Addresses

   Richard Rabbat                      Ching-Fong Su
   Fujitsu Labs of America, Inc.       Fujitsu Labs of America, Inc.
   1240 E. Arques Ave, MS 345          1240 E. Arques Ave, MS 345
   Sunnyvale, CA 94085                 Sunnyvale, CA 94085
   United States of America            United States of America
   Phone: +1-408-530-4537              Phone: +1-408-530-4572
   Email: rabbat@alum.mit.edu          Email: csu@fla.fujitsu.com

   Vishal Sharma
   Metanoia, Inc.
   888 Villa St, Suite 200B
   Mountain View, CA 94041
   United States of America
   Phone: +1-408-530-8313
   Email: v.sharma@ieee.org

10. Full Copyright Statement

   "Copyright (C) The Internet Society (2003). All Rights Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."