CCAMP Working Group                               Richard Rabbat, Ed. (FLA)
Internet Draft                          Vishal Sharma, Ed. (Metanoia, Inc.)
Expires: April 2004                                Norihiko Shinomiya (FLL)
                                                        Ching-Fong Su (FLA)
                                                               October 2003

           Fault Notification Protocol for GMPLS-Based Recovery
             draft-rabbat-fault-notification-protocol-04.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This draft presents a fault notification protocol for use in a GMPLS-based failure recovery scheme. The protocol guarantees activation of the recovery path(s) within a bounded time in the event of a single resource failure, such as a fiber cut, a transponder failure, or a node failure. Bounded recovery time is achieved by pre-signaling recovery paths whose nodes can be reached within a specific time, based on the physical capabilities of the nodes and the delay characteristics of the control plane. We propose using a flooding protocol for fault notification to allow per-failure notification and to speed up the recovery process. We justify the choices made for the notification method and for the messaging required by the protocol. The draft does not mandate a specific implementation of the Fault Notification Protocol.

Rabbat & Sharma (Eds.)
Expires - April 2004                                           [Page 1]

draft-rabbat-fault-notification-protocol-04.txt            October 2003

Table of Contents

   1. Overview
   2. Terminology
   3. Glossary of Terms Used
   4. Requirements at Recovery Path Setup Time
   5. Steps in Failure Notification and Service Recovery
      5.1 T1: Fault Detection Time
      5.2 T2: Hold-Off Time
      5.3 Tt = T3+T4: Fault Notification Time and Completion of
          Recovery Operation Time
      5.4 T5: Traffic Recovery Time
   6. Fault Notification Protocol (FNP)
      6.1 FNP Flooding Operation
      6.2 Delays Incurred by Messages
      6.3 Notification Message Data
   7. Reversion (Normalization)
   8. Security Considerations
   9. Conclusion
   10. Acknowledgments
   11. Intellectual Property Considerations
   12. References
   13. Authors' Addresses
   Appendix A. Fault Notification Message Delays on a Path
      A.1 Delays Associated with Link Traversal
      A.2 Delays Incurred at the Nodes
   Full Copyright Statement

1. Overview

Recovery (protection and restoration) in optical switching networks under tight time constraints has been recognized as a challenging issue [2], and it is crucial to meeting high-availability and service-level requirements. Several mechanisms have been devised for recovery in mesh and ring topologies. Currently, the CCAMP WG has a collection of drafts that address recovery in networks featuring a Generalized Multi-Protocol Label Switching (GMPLS) control plane. Requirements for recovery in optical networks are presented in [3]. Other drafts have been proposed: a terminology draft for GMPLS-based recovery [2], an analysis draft [4] that examines the differences between protection, restoration, path-based, link-based, and span-based approaches, and a functional specification draft [5] that gives a functional description of some of the protocol extensions needed to support GMPLS-based recovery.

In general, a fault notification protocol for optical transport networks should address recovery requirements falling into three main categories:

- Timing requirements: it must meet the required bounds on recovery time.
- Control plane resources: it must use control plane resources efficiently.
- Design of recovery schemes: it must allow for the design of flexible recovery schemes.

Protection and restoration algorithms can be used for local repair (link-based or node-based), span recovery, and path recovery. This document presents a fault notification protocol and recovery scheme designed to ensure bounded recovery times (e.g., 50 ms), comparable to the recovery times of ring-based SONET/SDH networks that implement 1+1 or 1:1 protection schemes.

Link-based recovery can handle faults such as fiber link failures and transponder failures.
However, in the case of a node failure, the control plane must rely on either node-based or path-based recovery. The advantage of path-based recovery lies in its ability to reduce wavelength redundancy (wavelengths reserved for possible failures); its disadvantage is the potentially lengthy delay incurred in notifying all nodes along the recovery path of the failure of a remote resource. Span-based protection protects independent segments of the working path, thereby decreasing the recovery time, but it requires more protection resources. In addition, the provider must undertake more extensive planning to protect the same resource. In some applications, recovery paths need to be chosen carefully to meet specific recovery time requirements.

This document presents a fault notification protocol, which we call FNP, that applies to intra-domain protection. (We use the term "fault notification protocol" when referring to a generic notification scheme, and the term "FNP" when referring to the specific scheme discussed in this document.) The protocol applies to networks that implement shared recovery, and it deals with both ring-based and mesh-based recovery. Multi-domain recovery is not within the scope of this draft. In addition, this proposal focuses on scalability, an important issue that arises when signaling is used for fault notification. Implementation of the protocol is left for future drafts. For details about the applicability of FNP, please refer to the accompanying draft [6].

We assume unidirectional traffic through Label Switched Paths (LSPs), with bidirectional traffic carried by two unidirectional LSPs. The assumptions made in this draft are also valid for bidirectional LSPs. For the purpose of illustration, we also assume a mesh Wavelength Division Multiplexing (WDM) network; the scheme applies equally to ring-topology networks.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [7].

3. Glossary of Terms Used

In addition to the terminology for GMPLS-based recovery documented in [2], this draft uses the following acronyms:

o AIS: Alarm Indication Signal, a signal at the SONET/SDH transport layer
o BDI: Backward Defect Indication, a signal at the transport layer sent upstream
o LSP: Label Switched Path
o MEMS: Micro-Electro-Mechanical System
o PXC: Photonic Cross-Connect, a cross-connect that switches wavelengths transparently by means of a switching fabric such as MEMS
o WDM: Wavelength Division Multiplexing

4. Requirements at Recovery Path Setup Time

A request for a working path signaled into the network indicates the type of protection or restoration it requires and, optionally, a recovery priority value. The recovery priority is useful if, during the recovery process, a node has to decide which of many working paths to protect. After the recovery route computation algorithm calculates the protection or restoration path, the link resources (wavelengths, wavebands, etc.) along that path are reserved and possibly activated. When the recovery path is not activated, these link resources may be used to carry preemptible best-effort traffic to increase network utilization. This traffic is generally identified as "extra traffic." Alternatively, the same link resource may be reserved by multiple protection paths for different link failures, as long as these protection paths never need to be activated simultaneously (e.g., M:N shared protection).
In either case, the proper link resources need to be activated upon notification of the failure.

When a label for a recovery LSP is set up on a certain node A by RSVP-TE, node A should be aware of the network resource that this LSP is protecting. When RSVP-TE is used, for example, the protection PATH message may notify all nodes on the protection path of this information at path setup time, as proposed in [8]. This allows node A to bundle (group together) the labels (as well as the link resources) that protect a particular network resource. For example, if two labels j and k correspond to two LSPs used to protect working paths from the failure of link (X,Y), then they belong to the bundle L(X,Y). Node A can then process, in its control plane, the joint event of the two LSP failures and possibly activate (cross-connect) both LSPs referenced by labels j and k together when it receives notification of the failure of link (X,Y).

This document proposes a method for per-failure fault notification (as opposed to per-LSP fault notification); hence, such bundled label information is essential. The main difference between per-failure and per-LSP notification lies in the number of notification mechanisms that must run concurrently. Per-failure fault notification engages a single mechanism to notify all relevant nodes of the fault. Per-LSP notification, on the other hand, requires activating as many mechanisms as there are failed LSPs (for example, all LSPs that failed due to a link failure). In an optical network carrying possibly hundreds of wavelengths per fiber, per-LSP notification can be taxing on the hardware and resource-intensive. Using LSP hierarchy, one could gain some efficiency by bundling notification messages. These messages would, however, have to be unbundled later to reach the different sources.
The different sources would then initiate handshake mechanisms on the different recovery paths [5]. This time-consuming process increases the recovery time. As explained later, the flooding approach decreases the recovery time by removing the need for such a mechanism.

A companion draft [9] explains the need for an expedited flooding mechanism to realize FNP. That document outlines how a flooding protocol implementation must balance the need for fast flooding of failure notifications against the need to control the frequency of flooding message transmissions, so that network stability is maintained.

5. Steps in Failure Notification and Service Recovery

The steps described in this section detail the intervals between the occurrence of a network impairment and the completion of all recovery actions. The failure sequence is based on the timing sequence in ITU-T draft Recommendation G.gps [10], adapted for WDM networks.

The critical component in guaranteeing time constraints on service recovery is the fault notification process. The following sequence of events MUST be followed in order to ensure that the recovery process completes within a specific amount of time.

   +-Network Impairment
   |     +-Fault Detected
   |     |     +-Start of Fault Notification
   |     |     |      +-Recovery Operation Complete
   |     |     |      |      +-Traffic Recovered
   |     |     |      |      |
   |     |     |      |      |
   v     v     v      v      v
   ------------------------------------------------>
   | T1  | T2  |  Tt  |  T5  |                  time

                Figure 1. Recovery Temporal Model

5.1 T1: Fault Detection Time

This is the period of time between an impairment in the network and the detection of signals triggered by that impairment. An example of such a network impairment is a fiber cut. In general, if a bidirectional link is cut, both its upstream and downstream nodes will detect the fault. A unidirectional link failure will be detected by the downstream node.
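The temporal model of Figure 1 can be read as an additive timeline, with T2, Tt, and T5 described in Sections 5.2 to 5.4. The short Python sketch below is offered only as an illustration: it computes the absolute time of each event in Figure 1 from the interval lengths and compares the total against an example 50 ms target. The interval values, the target, and the names RecoveryTimeline, event_times, and total are hypothetical; T2 = 0 follows the WDM case of Section 5.2.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTimeline:
    """Intervals of the Figure 1 temporal model, in ms (hypothetical values)."""
    t1_detection: float
    t2_hold_off: float
    tt_transfer: float          # Tt = T3 + T4
    t5_traffic_recovery: float

    def event_times(self):
        """Absolute time of each Figure 1 event, measured from the impairment."""
        events = ["Network Impairment", "Fault Detected",
                  "Start of Fault Notification",
                  "Recovery Operation Complete", "Traffic Recovered"]
        intervals = [0.0, self.t1_detection, self.t2_hold_off,
                     self.tt_transfer, self.t5_traffic_recovery]
        times, now = {}, 0.0
        for name, dt in zip(events, intervals):
            now += dt               # each event follows the previous interval
            times[name] = now
        return times

    def total(self):
        """Time from the impairment until traffic is recovered."""
        return self.event_times()["Traffic Recovered"]


if __name__ == "__main__":
    wdm = RecoveryTimeline(t1_detection=10.0, t2_hold_off=0.0,
                           tt_transfer=30.0, t5_traffic_recovery=5.0)
    for event, t in wdm.event_times().items():
        print(f"{t:6.1f} ms  {event}")
    print("within 50 ms target:", wdm.total() <= 50.0)
```

Raising t2_hold_off to 50 ms, as Section 5.2 allows when SONET recovery must complete first, shifts every later event by the same amount.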
To support the failure detection requirement, nodes MUST implement per-channel monitoring that pinpoints the failure and reports it.

5.2 T2: Hold-Off Time

This is the period of time between the detection of a fault and the start of the fault recovery process. In other words, it is the period of time that the reporting entity waits before starting the fault recovery process. This allows the fault recovery process at a given layer to wait for recovery to occur at a lower layer. In the case of WDM-based recovery, this time should be 0 s, since there is no underlying layer recovery. In networks that also use SONET-based recovery, T2 may be set to 50 ms so that the SONET protection scheme can complete before any IP-based recovery is triggered.

5.3 Tt = T3+T4: Fault Notification Time and Completion of Recovery Operation Time

T3 and T4 are summed as Tt (transfer time), the period of time between the start of the fault notification process and the completion of recovery switching of the traffic onto the protection path. This includes the transmission and processing of the control signals required to effect recovery. In other words, it is the interval between the time when the detecting entity starts sending out a fault notification message and the time when every node, including the ingress nodes and intermediate nodes on the corresponding recovery paths, has been notified of the failure and has finished reconfiguring itself to carry the restored traffic. It also includes the time for the protection switching node to switch traffic onto the backup path.

5.4 T5: Traffic Recovery Time

T5 is the period of time between the completion of the recovery actions and the full restoration of working traffic.
In other words, this is the time between the last recovery action at the protection switching node and the time that the traffic (if present) is completely recovered. This interval accounts for the time required for traffic to once again arrive at the point in the network that experienced disrupted or degraded service due to the fault, i.e., the egress node.

6. Fault Notification Protocol (FNP)

The Fault Notification Protocol is a series of procedures designed to take place during Tt (T3+T4), the fault notification and completion of recovery operation time, and to effect timely notification of network faults. The protocol is used to notify nodes of the resource failure and to activate the recovery lightpaths.

For link-based recovery, the ingress node of the recovery LSP is the upstream detecting node. If the recovery time is strictly constrained, the ingress node SHOULD be as close to the link failure as possible. This reduces the recovery time, since no messages have to be relayed to a remote or centralized authority to initiate recovery. For path-based recovery, the ingress node of the protection LSP is the protection switching node. The ingress and egress nodes of the working LSP are typically not involved in the notification process in the FNP approach.

A fault detecting entity may be at different locations depending on the type of fault. That detecting entity initiates the notification process. The detecting entity MAY use one of several fault notification methods to notify other nodes of the failure, including GMPLS-based signaling and flooding.

In the case of GMPLS-based signaling, there is generally one fault notification message per disrupted Label Switched Path. If LSP hierarchy is used, bundling decreases the number of messages; these messages will, however, need to be unbundled to reach the different sources. Each of these nodes would then have to initiate the handshake process. While some protection paths may be the same and could be signaled together during the handshake phase, this is generally restrictive in a mesh network. Hence, signaling does not scale well with the number of connections; in addition, its message processing delay is less predictable. For details about the notification methods and the choice of flooding for this draft, the reader is encouraged to refer to [11].

In the case of flooding, the message sent from the detecting entity to all nodes on the various protection paths should reach them within the specified recovery time (T-rec) minus the reconfiguration time (T-cfg) needed at each node after it is notified of the fault. We define this time as the fault notification time, T-ntf = T-rec - T-cfg. The method for assigning each node's T-ntf is out of scope for this document.

Nodes on a recovery path (including the ingress node) are aware that they are protecting against the failure of a particular resource. All nodes notified of the failure activate the recovery path by performing any required hardware reconfiguration (e.g., moving mirrors in the case of a MEMS-based switching fabric). The approach outlined in this draft supports node reconfiguration applied either sequentially (e.g., when parallel movement of the mirrors is not available) or in parallel (e.g., with an electronic switching fabric). An algorithm that computes the constrained recovery path SHOULD take the physical capabilities of nodes into account in its path calculation. The ingress node starts sending data on the protection path at the start time S(I), defined below.
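To illustrate how T-ntf = T-rec - T-cfg depends on the node's switching fabric, the Python sketch below models T-cfg under a simplifying assumption that is not part of this draft: every cross-connect change takes the same fixed switching time, so k changes take k times that value when applied sequentially and the value once when applied in parallel. The function names, the 10 ms switching time, and the 50 ms T-rec are hypothetical.

```python
def reconfiguration_time(num_crossconnects, switch_time_ms, parallel):
    """Hypothetical T-cfg model with a fixed per-cross-connect switching time.

    parallel=True  -> all cross-connects change at once (e.g., electronic fabric)
    parallel=False -> one after another (e.g., MEMS mirrors moved sequentially)
    """
    if num_crossconnects <= 0:
        return 0.0
    if parallel:
        return switch_time_ms
    return num_crossconnects * switch_time_ms


def notification_budget(t_rec_ms, t_cfg_ms):
    """T-ntf = T-rec - T-cfg; a negative result means T-rec cannot be met."""
    return t_rec_ms - t_cfg_ms


if __name__ == "__main__":
    T_REC_MS = 50.0     # example recovery time requirement
    SWITCH_MS = 10.0    # assumed per-cross-connect switching time
    for parallel in (False, True):
        t_cfg = reconfiguration_time(3, SWITCH_MS, parallel)
        t_ntf = notification_budget(T_REC_MS, t_cfg)
        print(f"parallel={parallel}: T-cfg = {t_cfg} ms, T-ntf = {t_ntf} ms")
```

Under these assumptions, a node that must change three cross-connects sequentially leaves only 20 ms for notification, versus 40 ms with parallel switching, which is why the path computation should account for the physical capabilities of nodes.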
If one of the detecting entities at the ingress or egress node detects, at the data plane, a failure in the protection lightpath to be activated, it MUST raise an alarm that may be dealt with at the management plane. The management plane will take appropriate remediation action. Alarm handling and remediation are outside the scope of this draft.

The nodes on protection paths receive the fault notification within a deterministic time. This time delay is calculated by each node as explained in Appendix A. To avoid complex clock synchronization, an ingress node, identified as node I, that receives the notification from a detecting node, node J, calculates the start time S(I) at which it must switch traffic to the protection path as follows:

   S(I) = time-of-notification(I) - min-delay-between(J,I) + T-rec

where

- time-of-notification(I) returns the clock time at node I when the notification is received.
- min-delay-between(J,I) returns the minimum time needed for the notification from node J to reach node I; this value depends on the topology and the equipment in the network. It is calculated offline from topology and hardware information and is stored as a static table at every node.

Note that (time-of-notification(I) - min-delay-between(J,I)) gives the time at which the failure was detected at J, expressed on node I's own clock, and T-rec is the recovery time requirement.

Our scheme, therefore, works as follows:

1. Given the topology and the equipment in the network, it is possible to calculate T-rec and T-ntf for a given failure.
2. An offline or online algorithm may calculate the recovery path using this information.
3. Upon the occurrence of a failure, when flooding-based notification is used as described above, a node I on the recovery path is guaranteed that at S(I), all other nodes along the recovery path have been informed of the failure and have taken the appropriate action to move traffic onto the recovery path.

6.1 FNP Flooding Operation

Fault notification is done via flooding as follows. The detecting entity sends a notification packet to its neighbors on all outgoing links. The notification packet is a high-priority packet and contains the unique identifier of the link at fault. Each node that receives such a packet sends an acknowledgement to the sender and transmits duplicates of the notification to all other neighboring nodes. To limit the amount of flooded fault notification traffic, nodes do not re-broadcast packets about a fault they have already seen, and they decrement a time-to-live field in the packets as they are received.

6.2 Delays Incurred by Messages

The above discussion suggests that, for the protection algorithm to meet the T-rec recovery requirement, it needs to adopt one of two methods:

1. Be aware of timing issues in order to select a proper path.
2. Only consider the nodes and links that satisfy the timing constraints.

Due to the complexity of the first method, we believe that the second method will be easier to develop and implement. For example, a pruned topology, from which links and nodes that violate the strict recovery time requirements are excluded, may be considered for protection path computation. A database of link information should hold the fiber physical length and the capacity of each link (or channel), as well as the notification message processing time.
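The following Python sketch is a hypothetical illustration, not part of this draft, of how Sections 6, 6.1, and 6.2 fit together. It simulates flooding of a single fault notification over a small made-up topology as a discrete-event simulation. Per-link delays in milliseconds are assumed to already include transmission, propagation, and node processing time (see Appendix A). Each node forwards the fault only the first time it sees it, and a TTL bounds propagation. With a large enough TTL, each node's first arrival time is therefore the shortest-delay path from the detecting node, the quantity stored offline as min-delay-between(J,I). The sketch then prunes nodes whose delay exceeds an assumed T-ntf and evaluates S(I) on an unsynchronized clock. Acknowledgements and retransmissions are omitted, and all names and values are assumptions.

```python
import heapq

# Hypothetical topology: undirected links annotated with a one-way
# notification delay in ms (assumed to include transmission,
# propagation, and per-node processing time).
LINKS = {
    ("A", "B"): 0.6, ("B", "C"): 0.8, ("A", "D"): 1.5,
    ("D", "C"): 0.4, ("C", "E"): 2.0, ("D", "E"): 3.5,
}


def adjacency(links):
    """Build an undirected adjacency list: node -> [(neighbor, delay_ms)]."""
    adj = {}
    for (u, v), delay in links.items():
        adj.setdefault(u, []).append((v, delay))
        adj.setdefault(v, []).append((u, delay))
    return adj


def flood(links, detecting_node, ttl=8):
    """Simulate flooding one fault notification from detecting_node.

    Returns the first-arrival time (ms after detection) at each reached node.
    """
    adj = adjacency(links)
    first_seen = {detecting_node: 0.0}
    # Pending deliveries: (arrival_time, receiver, sender, ttl_in_packet)
    pending = [(d, nbr, detecting_node, ttl) for nbr, d in adj[detecting_node]]
    heapq.heapify(pending)
    while pending:
        t, node, sender, pkt_ttl = heapq.heappop(pending)
        if node in first_seen:
            continue  # already notified of this fault: do not re-broadcast
        first_seen[node] = t
        pkt_ttl -= 1  # decrement TTL on receipt
        if pkt_ttl <= 0:
            continue
        for nbr, d in adj[node]:
            if nbr != sender:  # duplicate to all other neighbors
                heapq.heappush(pending, (t + d, nbr, node, pkt_ttl))
    return first_seen


def start_time(time_of_notification, min_delay_between, t_rec):
    """S(I) = time-of-notification(I) - min-delay-between(J,I) + T-rec."""
    return time_of_notification - min_delay_between + t_rec


if __name__ == "__main__":
    # Offline: first arrivals give the min-delay-between(A, I) table.
    min_delay = flood(LINKS, "A")
    T_REC, T_NTF = 50.0, 3.0  # assumed requirements (ms)
    usable = sorted(n for n, t in min_delay.items() if t <= T_NTF)
    print("min-delay-between(A, I):", min_delay)
    print("nodes kept in pruned topology:", usable)

    # Online: node C's clock is offset by 1000 ms; no synchronization needed.
    notified_at = 1000.0 + min_delay["C"]
    print("S(C) on C's clock:", start_time(notified_at, min_delay["C"], T_REC))
```

In this toy topology, node E is reached 3.4 ms after detection and falls outside the assumed 3 ms T-ntf, so it would be excluded from the pruned topology used for protection path computation.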
The total time needed by a notification packet to travel from source to destination can be broken into two delay components: the time needed to traverse each link and the time needed to pass through each node. The different delay calculations are discussed in Appendix A; the algorithm for computing the protection paths is out of scope of this draft.

6.3 Notification Message Data

Two types of messages are needed for reliable communication of fault notifications:

- A Fault Notify message, which carries the information about the failure from each node, on each of its outgoing links, to its neighboring node(s).
- A Fault Notify Acknowledge message, which indicates that the notification message was properly received by a neighboring node.

Aside from implementation-dependent constructs, the data to be carried in these messages is presented in Table 1 below.

   Table 1. Required and Optional Data for Fault Notifications

   -----------------------------------------------------------------
   Data Object        Fault   Fault Notify  Description
                      Notify  Acknowledge
   -----------------------------------------------------------------
   Message ID         R       R             Identifies notification
                                            messages
   Fault Link ID      R       -             Identifies the failed link
   Fault ID           R       -             Identifies sequence of failure
   Channel Status     O       -             Indication of link fault status
   Detecting Node ID  O       -             Identifies the original node
                                            that is reporting the failure
   TTL                O       -             Time To Live field
   -----------------------------------------------------------------
   R: required, O: optional, -: not applicable

A node keeps sending Fault Notify messages at intervals until it receives a Fault Notify Acknowledge message in response or the control channel connectivity is declared lost.

7. Reversion (Normalization)

Most of the current literature recommends that, for resource efficiency, traffic be moved back to the original path when the failed link or node is back online. Although reversion is an optional step, it is typically employed. If reversion is not used, the "orphaned" bandwidth on the failed working paths should be reclaimed as these paths are repaired. The signaling of fault repair notifications is similar to that of fault notifications. However, the reversion phase does not have strict time constraints.

8. Security Considerations

This draft makes use of several existing protocols; therefore, it does not introduce any new security issues beyond those that arise in the use of these protocols.

9. Conclusion

This draft presented the Fault Notification Protocol for IP-controlled optical networks that implement shared recovery. It described the steps required in the notification process and how they lead to the recovery of service within specific time bounds. A "per-failure" approach (as opposed to a "per-LSP" approach) to fault notification was proposed in order to improve scalability and strengthen recovery-time guarantees.

10. Acknowledgments

The authors would like to thank Jonathan Lang, Adrian Farrel, Neil Harisson, Jonathan Sadler, Fabio Ricciato, Zafar Ali and Roberto Albanese for feedback and helpful comments on the fault notification protocol, and Takafumi Chujo, Peter Czezowski, and Akira Chugo for valuable inputs to this draft.

11. Intellectual Property Considerations

This section is taken from Section 10.4 of RFC2026 [1].

The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights, which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.

12. References

[1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

[2] Mannie, E., Ed., et al., "Recovery (Protection and Restoration) Terminology for Generalized Multi-Protocol Label Switching (GMPLS)", Internet Draft, work in progress, draft-ietf-ccamp-gmpls-recovery-terminology-02.txt, May 2003.

[3] Rabbat, R. and Soumiya, T. (Eds.), "Optical network failure recovery requirements", Internet Draft, work in progress, draft-rabbat-optical-recovery-reqs-00.txt, June 2003.

[4] Papadimitriou, D., et al., "Analysis of Generalized MPLS-based Recovery Mechanisms (including Protection and Restoration)", Internet Draft, work in progress, draft-ietf-ccamp-gmpls-recovery-analysis-02.txt, September 2003.

[5] Lang, J., and Rajagopalan, B. (Eds.),
    "Generalized MPLS recovery functional specification", Internet Draft, work in progress, draft-ietf-ccamp-gmpls-recovery-functional-01.txt, September 2003.

[6] Rabbat, R., Su, C.-F., and Sharma, V., "Observations on the Applicability of the Fault Notification Protocol", Internet Draft, work in progress, draft-rabbat-fnp-applicability-00.txt, October 2003.

[7] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[8] Li, G., Yates, J., et al., "Experiments in Fast Restoration using GMPLS in Optical/Electronic Mesh Networks", Post-deadline Papers Digest, OFC 2001, Anaheim, CA, March 2001.

[9] Rabbat, R., Sharma, V., and Ali, Z., "Expedited Flooding for Restoration in Shared-Mesh Transport Networks", Internet Draft, work in progress, draft-rabbat-expedited-flooding-00.txt, October 2003.

[10] ITU-T Draft Recommendation G.gps, "Generic Protection Switching", April 2002, available at the ITU web site.

[11] Rabbat, R., et al., "Fault Notification and Service Recovery in WDM Networks", white paper, available at http://perth.mit.edu/~richard/wp-ietf-fault-notification.pdf.

13. Authors' Addresses

Richard Rabbat                          Vishal Sharma
Fujitsu Labs of America, Inc.           Metanoia, Inc.
1240 E. Arques Ave, MS 345              1600 Villa Street, Unit 352
Sunnyvale, CA 94085                     Mtn. View, CA 94041
United States of America                United States of America
Phone: +1-408-530-4537                  Phone: +1-408-530-8313
Email: rabbat@alum.mit.edu              Email: v.sharma@ieee.org

Norihiko Shinomiya                      Ching-Fong Su
Fujitsu Laboratories Ltd.               Fujitsu Labs of America, Inc.
1-1, Kamikodanaka 4-Chome               1240 E. Arques Ave
Nakahara-ku, Kawasaki                   Sunnyvale, CA 94085
211-8588, Japan                         United States of America
Phone: +81-44-754-2635                  Phone: +1-408-530-4572
Email: shinomi@jp.fujitsu.com           Email: csu@fla.fujitsu.com

Appendix A. Fault Notification Message Delays on a Path

This appendix describes the delays incurred on a path.
Two types of delays occur on the path between any two nodes: delays incurred while traversing the links on that path, and delays that occur at the nodes along the path. The following presents the computations and expected values for these delays.

A.1 Delays Associated with Link Traversal

The time needed to traverse each link is the sum of the transmission time and the link propagation delay:

1. The transmission time depends on the link capacity:

      D trans = (packet size) / (link speed)

2. The link propagation delay is due to the physical length of the link:

      D prop = length / (light propagation speed in fiber)

The length of a notification packet is expected to be on the order of a hundred bytes (about 10^3 bits). For example, for a link speed of 1 Gbps, D trans ~= 10^3 / 10^9 = 10^-6 s = 1 microsecond. This value can therefore safely be ignored in calculating delays. The link propagation delay in metropolitan area and long-haul networks, on the other hand, does affect the total delay. For a distance of 100 km, with the speed of light in fiber at about 2/3 of its speed in free space (about 200,000 km/s):

      D prop ~= 10^2 / (2 * 10^5) = 0.5 * 10^-3 s = 500 microseconds

A.2 Delays Incurred at the Nodes

At each node, two delays are important: queuing delay and processing time. The processing time D proc has been reported in the literature [8] to be a few tenths of a millisecond in the case of an RSVP object. This value is smaller in the case of a simpler LMP message requesting the activation of an LSP.

Queuing delay matters at all intermediate nodes. Fault notification messages should be queued at the front of the buffer that holds other control packets in order to avoid queuing delays (these messages do not have to contend with data packets, since no data is sent over the control channel).
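The per-hop arithmetic of A.1, together with the per-node processing time of A.2, can be checked with the short Python sketch below. It assumes a 1000-bit packet, 1 Gbps links, a propagation speed of 200,000 km/s, and, purely for illustration, a processing time of 0.3 ms per node (within the "few tenths of a millisecond" range cited above). Queuing delay defaults to zero, corresponding to the priority-queuing case discussed in this section. The helper names and the three-hop path are hypothetical.

```python
LIGHT_SPEED_FIBER_KM_S = 2.0e5  # about 2/3 of the speed of light in free space


def d_trans(packet_bits, link_bps):
    """Transmission time in seconds: packet size / link speed."""
    return packet_bits / link_bps


def d_prop(length_km):
    """Propagation delay in seconds: fiber length / propagation speed."""
    return length_km / LIGHT_SPEED_FIBER_KM_S


def path_delay(hops, packet_bits=1e3, d_proc_s=0.3e-3, d_queue_s=0.0):
    """Sum link and node delays (seconds) over a path.

    hops: list of (length_km, link_bps) for each link on the path.
    Each traversed link is followed by one node (processing + queuing).
    """
    total = 0.0
    for length_km, link_bps in hops:
        total += d_trans(packet_bits, link_bps) + d_prop(length_km)
        total += d_proc_s + d_queue_s
    return total


if __name__ == "__main__":
    print(f"D trans = {d_trans(1e3, 1e9) * 1e6:.1f} us")   # about 1 us
    print(f"D prop  = {d_prop(100) * 1e6:.1f} us")         # about 500 us
    three_hops = [(100, 1e9), (40, 1e9), (250, 1e9)]
    print(f"3-hop delay = {path_delay(three_hops) * 1e3:.3f} ms")
```

Under these assumptions, propagation and processing dominate the 3-hop total of about 2.85 ms, while transmission contributes only about 3 microseconds, consistent with the conclusion of A.1.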
A queuing discipline such as priority queuing would allow these packets to be admitted at the head of the queue, based on the priority set in the packet. A simple mechanism, such as setting the priority bits in the IP header (the IP precedence bits or the DSCP code points of the TOS (Type of Service) byte), would be appropriate. Using priority queuing for fault notification messages ensures that their queuing delay is bounded.

In the case of flooding for fault notification, D queue(A) = 0 s. If other fault notification messages are in the queue as well, this implies multiple failures, for which the recovery time guarantee does not apply. Otherwise, it may indicate that multiple messages are traveling on different protection paths to notify the same link failure, as is the case when a signaling protocol is used for fault notification. In the case of per-LSP fault notification, as when using a signaling protocol, the maximum queuing delay at node A is:

      D queue max(A) = (number of protection paths) * (packet size) / (link bandwidth)

This provides the mathematical basis for using flooding for fault notification: flooding allows this value to be 0 s.

In the absence of priority queuing, the maximum queuing delay at node A can be calculated as follows, assuming fair queuing at the FIFO buffers of all control channels and assuming input buffers only:

      D queue max(A) = (number of queues) * (queue size) / (link bandwidth)

This value is an upper bound and depends on the hardware buffer implementation.

Full Copyright Statement

"Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."