Network Working Group Seisho Yasukawa Internet Draft NTT Corp Expires: August 2006 Adrian Farrel Old Dog Consulting Daniel King Aria Networks Ltd. Thomas D. Nadeau Cisco Systems, Inc. February 2006 OAM Requirements for Point-to-Multipoint MPLS Networks draft-ietf-mpls-p2mp-oam-reqs-01.txt Status of this Memo By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract Multi-Protocol Label Switching (MPLS) has been extended to encompass point-to-multipoint (P2MP) Label Switched Paths (LSPs). As with point-to-point MPLS LSPs the requirement to detect, handle and diagnose control and data plane defects is critical. For operators deploying services based on P2MP MPLS LSPs the detection and specification of how to handle those defects is important because such defects may not only affect the fundamentals of an MPLS network, but also because they may impact service level specification commitments for customers of their network. Yasukawa et al. [Page 1] Internet-Draft draft-ietf-mpls-p2mp-oam-reqs-01.txt February 2006 This document describes requirements for data plane operations and management for P2MP MPLS LSPs. These requirements apply to all forms of P2MP MPLS LSPs, and include P2MP Traffic Engineered (TE) LSPs and multicast LSPs. Table of Contents 1. Introduction .................................................. 2 2. Terminology ................................................... 3 2.1 Conventions ................................................ 3 2.2 Terminology ................................................ 3 2.3 Acronyms ................................................... 3 3. Motivations ................................................... 3 4. General Requirements .......................................... 4 4.1 Detection of Label Switch Path Defects ..................... 4 4.2 Diagnosis of a Broken Label Switch Path .................... 5 4.3 Path characterization ...................................... 6 4.4 Service Level Agreement Measurement ........................ 6 4.5 Frequency of OAM Execution ................................. 7 4.6 Alarm Suppression, Aggregation and Layer Coordination ...... 7 4.7 Support for OAM Interworking for Fault Notification ........ 7 4.8 Error Detection and Recovery ............................... 8 4.9 Standard Management Interfaces ............................. 8 4.10 Detection of Denial of Service Attacks .................... 9 4.11 Per-LSP Accounting Requirements ........................... 9 5. Security Considerations ....................................... 9 6. IANA Considerations .......................................... 10 7. References ................................................... 10 7.1 Normative References ...................................... 10 7.2 Informative References .................................... 10 8. Acknowledgements ............................................. 11 9. Authors' Addresses ........................................... 11 10. Intellectual Property Statement ............................. 12 11. Full Copyright Statement .................................... 12 1. Introduction This document describes requirements for data plane operations and management (OAM) for point-to-multipoint (P2MP) Multi-Protocol Label Switching (MPLS). This draft specifies OAM requirements for P2MP MPLS, as well as for applications of P2MP MPLS. These requirements apply to all forms of P2MP MPLS LSPs, and include P2MP Traffic Engineered (TE) LSPs [P2MP-SIG-REQ] and [P2MP-RSVP], as well as multicast LDP LSPs [MCAST-LDP]. Note that the requirements for OAM for P2MP MPLS build heavily on the requirements for OAM for point-to-point MPLS. These latter requirements are described in [RFC4377] and are not repeated in this document. Yasukawa et al. [Page 2] Internet-Draft draft-ietf-mpls-p2mp-oam-reqs-01.txt February 2006 For a generic framework for OAM in MPLS networks, refer to [RFC4378]. 2. Terminology 2.1 Conventions used in this document The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 2.2 Terminology Definitions of key terms for MPLS OAM are found in [RFC4377] and the reader is assumed to be familiar with those definitions which are not repeated here. [P2MP-SIG-REQ] includes some important definitions and terms for use within the context of P2MP MPLS. The reader should be familiar with at least the terminology section of that document. 2.3 Acronyms The following acronyms are used in this document. CE: Customer Edge DoS: Denial of service ECMP: Equal Cost Multipath LDP: Label Distribution Protocol LSP: Label Switch Path LSR: Label Switch Router OAM: Operations and Management OA&M: Operations, Administration and Maintenance. RSVP: Resource reSerVation Protocol P2MP: Point-to-Multipoint SP: Service Provider TE: Traffic Engineering 3. Motivations OAM for MPLS networks has been established as a fundamental requirement both through operational experience and through its documentation in numerous Internet drafts. Many such documents (for example, [RFC4379], [RFC3812], [RFC3813], [RFC3814], and [RFC3815]) developed specific solutions to individual issues or problems. Coordination of the full OAM requirements for MPLS was achieved by [RFC4377] in recognition of the fact that the previous piecemeal approach could lead to inconsistent and inefficient applicability of OAM techniques across the MPLS architecture, and might require significant modifications to operational procedures and systems in order to provide consistent and useful OAM functionality. Yasukawa et al. [Page 3] Internet-Draft draft-ietf-mpls-p2mp-oam-reqs-01.txt February 2006 This document builds on these realizations and extends the statements of MPLS OAM requirements to cover the new area of P2MP MPLS. That is, this document captures the requirements for P2MP MPLS OAM in advance of the development of specific solutions. Nevertheless, at the time of writing, some effort had already been expended to extend existing MPLS OAM solutions to cover P2MP MPLS (for example, [P2MP-LSP-PING]). While this approach of extending existing solutions may be reasonable, in order to ensure a consistent OAM framework it is necessary to articulate the full set of requirements in a single document. This will facilitate a uniform set of MPLS OAM solutions spanning multiple MPLS deployments and concurrent applications. 4. General Requirements The general requirements described in this section are similar to those described for point-to-point MPLS in [RFC4377]. The subsections below do not repeat material from [RFC4377], but simply give references to that document. However, where the requirements for P2MP MPLS OAM differ from or are more extensive than those expressed in [RFC4377], additional text is supplied. In general, it should be noted that P2MP LSPs introduce a scalability issue with respect to OAM that is not present in point-to-point MPLS. That is, an individual P2MP LSP will have more than one egress and the path to those egresses will very probably not be linear (for example, it may have a tree structure). Since the number of egresses for a single P2MP LSP is unknown and not bounded by any small number, it follows that all mechanisms defined for OAM support MUST scale well with the number of egresses and the complexity of the path of the LSP. Mechanisms that are able to deal with individual egresses will scale no worse than similar mechanisms for point-to-point LSPs, but it is desirable to develop mechanisms that are able to leverage the fact that multiple egresses are associated with a single LSP, and so achieve better scaling. 4.1 Detection of Label Switch Path Defects The ability to detect defects in a P2MP LSP SHOULD not require manual, hop-by-hop troubleshooting of each LSR used to switch traffic for that LSP, and SHOULD rely on proactive OAM procedures (such as continuous path connectivity and SLA measurement mechanisms). Any solutions SHOULD either extend or work in close conjunction with existing solutions developed for point-to-point MPLS, such as those specified in [RFC4379] where this requirement is not contradicted by the other requirements in this section. This will leverage existing software and hardware deployments. Yasukawa et al. [Page 4] Internet-Draft draft-ietf-mpls-p2mp-oam-reqs-01.txt February 2006 Note that P2MP LSPs may introduce additional scaling concerns for LSP probing by tools such as [RFC4379]. As the number of leaves of a P2MP LSP increases it potentially becomes more expensive to inspect the LSP to detect defects. Any tool developed for this purpose MUST be cognitive of this issue and MUST include techniques to reduce the scaling impact of an increase in the number of leaves. Nevertheless, it should also be noted that the introduction of additional leaves may mean that the use of techniques such as [RFC4379] are less appropriate for defect detection with P2MP LSPs, while the technique may still remain useful for defect diagnosis as described in the next section. Due to the above scaling concerns, LSRs or other network resources MUST NOT be overwhelmed by the operation of normal proactive OAM procedures, and measures taken to protect LSRs and network resources against being overwhelmed MUST NOT degrade the operational value or responsiveness of proactive OAM procedures. By "overwhelmed" we mean that it MUST NOT be possible for an LSR to be so busy handling proactive OAM that it is unable to continue to process control or data plane traffic at its advertised rate. Similarly, a network resource (such as a data link) MUST NOT be carrying so much proactive OAM traffic that it is unable to carry the advertised data rate. At the same time, it is important to configure proactive OAM, if it is in use, to not raise alarms caused by the failure to receive an OAM message if the component responsible for processing the messages is unable to process because other components are consuming too many system resources - such alarms might turn out to be false. In practice, of course, the requirements in the previous paragraph may be overcome by careful specification of the anticipated data throughput of LSRs or data links, but it should be recalled that proactive OAM procedures may be scaled linearly with the number of LSPs, and the number of LSPs is not necessarily a function of the available bandwidth in an LSR or on a data link. 4.2 Diagnosis of a Broken Label Switch Path The ability to diagnose a broken P2MP LSP and to isolate the failed component (i.e., link or node) in the path is REQUIRED. These functions include a path connectivity test that can test all branches and leaves of a P2MP LSP for reachability, as well as a path tracing function. Note that this requirement is distinct from the requirement to detect errors or failures described in the previous section. In practice Detection and Diagnosis/Isolation MAY be performed by separate or the same mechanisms according to the way in which the other requirements are met. It MUST be possible for the operator (or an automated process) to Yasukawa et al. [Page 5] Internet-Draft draft-ietf-mpls-p2mp-oam-reqs-01.txt February 2006 stipulate a timeout after which the failure to see a response shall be flagged as an error. Any mechanism developed to perform these functions is subject to the scalability concerns expressed in section 4. 4.3 Path Characterization The path characterization function [RFC4377] is the ability to reveal details of LSR forwarding operations for P2MP LSPs. These details can then be compared later during subsequent testing relevant to OAM functionality. Therefore, LSRs supporting P2MP LSPs MUST provide mechanisms that allow operators to interrogate and characterize P2MP paths. Since P2MP paths are more complex than the paths of point-to-point LSPs, the scaling concerns expressed in section 4 apply. Note that path characterization SHOULD lead to the operator being able to determine the full tree for a P2MP LSP. That is, it is not sufficient to know the list of LSRs in the tree, but it is important to know their relative order and where the LSP branches. Since, in some cases, the control plane state and data paths may branch at different points from the control plane and data plane topologies (for example, figure 1), it is not sufficient to present the order of LSRs, but it is important that the branching points on that tree are clearly identified. E / A---B---C===D \ F Figure 1. An example P2MP tree where the data path and control plane state branch at C, but the topology branches at D. A diagnostic tool that meets the path characterization requirements SHOULD collect information that is easy to process to determine the P2MP tree for a P2MP LSP, rather than provide information that must be post-processed with some complexity. 4.4 Service Level Agreement Measurement Mechanisms are required to measure the diverse aspects of Service Level Agreements for services that utilize P2MP LSPs. The aspects are listed in [RFC4377]. Service Level Agreements are often measured in terms of the quality Yasukawa et al. [Page 6] Internet-Draft draft-ietf-mpls-p2mp-oam-reqs-01.txt February 2006 and rate of data delivery. In the context of P2MP MPLS, data is delivered to multiple egress nodes. The mechanisms MUST, therefore, be capable of measuring the aspects of Service Level Agreements as they apply to each of the egress points to a P2MP LSP. At the same time, in order to diagnose issues with meeting Service Level Agreements, mechanisms SHOULD be provided to measure the aspects of the agreements at key points within the network such as at branch nodes on the P2MP tree. 4.5 Frequency of OAM Execution As stipulated in [RFC4377], the operator MUST have the flexibility to configure OAM parameters to meet their specific operational requirements. This requirement is potentially more important in P2MP deployments where the effects of the execution of OAM functions can be potentially much greater than in a non-P2MP configuration. For example, a mechanism that causes each egress of a P2MP LSP to respond could result in a large burst of responses for a single OAM request. Therefore, solutions produced SHOULD NOT impose any fixed limitations on the frequency of the execution of any OAM functions. 4.6 Alarm Suppression, Aggregation and Layer Coordination As described in [RFC4377], network elements MUST provide alarm suppression and aggregation mechanisms to prevent the generation of superfluous alarms within or across network layers. The same time constraint issues identified in [RFC4377] also apply to P2MP LSPs. A P2MP LSP also brings the possibility of a single fault causing a larger number of alarms than for a point-to-point LSP. This can happen because there are a larger number of downstream LSRs (for example, a larger number of egresses). The resultant multiplier in the number of alarms could cause swamping of the alarm management systems to which the alarms are reported, and serves as a multiplier to the number of potentially duplicate alarms raised by the network. Alarm aggregation or limitation techniques MUST be applied within any solution, or be available within an implementation, so that this scaling issue can be reduced. Note that this requirement introduces a second dimension to the concept of alarm aggregation. Where previously it applied to the correlation and suppression of alarms generated by different network layers, it now also applies to similar techniques applied to alarms generated by multiple downstream LSRs. 4.7 Support for OAM Interworking for Fault Notification [RFC4377] specifies that an LSR supporting the interworking of one or more networking technologies over MPLS MUST be able to translate an MPLS defect into the native technology's error Yasukawa et al. [Page 7] Internet-Draft draft-ietf-mpls-p2mp-oam-reqs-01.txt February 2006 condition. This also applies to any LSR supporting P2MP LSPs. However, careful attention to the requirements for alarm suppression stipulated therein and in section 4.6 SHOULD be observed. Note that the time constraints for fault notification and alarm propagation impact upon the solutions that might be applied to the scalability problem inherent in certain OAM techniques applied to P2MP LSPs. For example, a solution to the issue of a large number of egresses all responding to some form of probe request at the same time, might be to make the probes less frequent - but this might impact on the ability to detect and/or report faults. Where fault notification to the egress is required, there is the possibility that a single fault will give rise to multiple notifications, one to each egress node of the P2MP that is downstream of the fault. Any mechanisms MUST manage this scaling issue while still continuing to deliver fault notifications in a timely manner. Where fault notification to the ingress is required, the mechanisms MUST ensure that the notification identifies the egress nodes of the P2MP LSP that are impacted (that is, those downstream of the fault) and does not falsely imply that all egress nodes are impacted. 4.8 Error Detection and Recovery Recovery from a fault by a network element can be facilitated by MPLS OAM procedures. As described in [RFC4377], these procedures will detect a broad range of defects, and SHOULD be operable where MPLS P2MP LSPs span multiple routing areas, or multiple Service Provider domains. The same requirements as those expressed in [RFC4377] with respect to automatic repair and operator intervention ahead of customer detection of faults apply to P2MP LSPs. It should be observed that faults in P2MP LSPs MAY be recovered through techniques described in [P2MP-RSVP]. 4.9 Standard Management Interfaces The wide-spread deployment of MPLS requires common information modeling of management and control of OAM functionality. This is reflected in the integration of standard MPLS-related MIBs [RFC3813], [RFC3812], [RFC3814], [RFC3815] for fault, statistics and configuration management. These standard interfaces provide operators with common programmatic interface access to operations and management functions and their status. The standard MPLS-related MIB modules [RFC3812], [RFC3813], [RFC3814], and [RFC3815] SHOULD be extended wherever possible, to Yasukawa et al. [Page 8] Internet-Draft draft-ietf-mpls-p2mp-oam-reqs-01.txt February 2006 support P2MP LSPs, the associated OAM functions on these LSPs, and the applications that utilize P2MP LSPs. Extending them will facilitate the reuse of existing management software both in LSRs and in management systems. In cases where the existing MIB modules cannot be extended, then new MIB modules MUST be created. 4.10 Detection of Denial of Service Attacks The ability to detect denial of service (DoS) attacks against the data or control planes which signal P2MP LSPs MUST be part of any security management related to MPLS OAM tools or techniques. 4.11 Per-LSP Accounting Requirements In an MPLS network where P2MP LSPs are in use, Service Providers can measure traffic from an LSR to the egress of the network using some MPLS-related MIB modules (see section 4.9), for example. Other interfaces MAY exist as well and enable the creation of traffic matrices so that it is possible to know how much traffic is traveling from where to where within the network. Analysis of traffic flows to produce a traffic matrix is more complicated where P2MP LSPs are deployed because there is no simple pairing relationship between an ingress and a single egress. Fundamental to understanding traffic flows within a network that supports P2MP LSPs will be the knowledge of where the traffic is branched for each LSP within the network. That is, where within the network the branch nodes for the LSPs are located and what their relationship is to links and other LSRs. Traffic flow and accounting tools MUST take this fact into account. 5. Security Considerations This document introduces no new security issues compared with [RFC4377]. It is worth highlighting, however, that any tool designed to satisfy the requirements described in this document MUST include provisions to prevent its unauthorized use. Likewise, these tools MUST provide a means by which an operator can prevent denial of service attacks if those tools are used in such an attack. LSP mis-merging is described in [RFC4377] where it is pointed out that it has security implications beyond simply being a network defect. It needs to be stressed that it is in the nature of P2MP traffic flows that any erroneous delivery (such as caused by LSP mis-merging) is likely to have more far reaching consequences since the traffic will be mis-delivered to multiple receivers. As with the OAM functions described in [RFC4377], the performance of diagnostic functions and path characterization may involve the extraction of a significant amount of information about network construction. The network operator MAY consider this Yasukawa et al. [Page 9] Internet-Draft draft-ietf-mpls-p2mp-oam-reqs-01.txt February 2006 information private and wish to take steps to secure it, but further, the volume of this information may be considered as a threat to the integrity of the network if it is extracted in bulk. This issue may be greater in P2MP MPLS because of the potential for a large number of receivers on a single LSP and the consequent extensive path of the LSP. 6. IANA Considerations This document creates no new requirements on IANA namespaces. 7. References 7.1 Normative References [RFC4377] T. Nadeau, M. Morrow, G. Swallow, D. Allan, and S. Matsushima, "Operations and Management (OAM) Requirements for Multi-Protocol Label Switched (MPLS) Networks", RFC 4377, February 2006. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. 7.2 Informative References [RFC4378] Alan, D., Nadeau T., "A Framework for Multi-Protocol Label Switching (MPLS) Operations and Management (OAM)", RFC4378, February 2006. [RFC4379] Kompella, K., and Swallow, G., (Editors), "Detecting MPLS Data Plane Failures", RFC 4379, February 2006. [MCAST-LDP] IJ., Wijnands, I. Minei, et al., "Label Distribution Protocol Extensions for Point-to-Multipoint and Multipoint-to-Multipoint Label Switched Paths", draft-minei-wijnands-mpls-ldp-p2mp, work in progress. [P2MP-LSP-PING] Yasukawa, S., Farrel, A., Ali, Z., and Fenner, B., "Detecting Data Plane Failures in Point-to-Multipoint MPLS Traffic Engineering - Extensions to LSP Ping", draft-yasukawa-mpls-p2mp-lsp-ping, work in progress. [P2MP-RSVP] Aggarwal, R., Papadimitriou, D., and Yasukawa, S., "Extensions to RSVP-TE for Point to Multipoint TE LSPs", draft-ietf-mpls-rsvp-te-p2mp, work in progress. Yasukawa et al. [Page 10] Internet-Draft draft-ietf-mpls-p2mp-oam-reqs-01.txt February 2006 [P2MP-SIG-REQ] Yasukawa, S. (Editor), "Signaling Requirements for Point to Multipoint Traffic Engineered MPLS LSPs", draft-ietf-mpls-p2mp-sig-requirement, work in progress. [RFC3812] Srinivasan, C., Viswanathan, A. and T. Nadeau, "MPLS Traffic Engineering Management Information Base Using SMIv2", RFC3812, June 2004. [RFC3813] Srinivasan, C., Viswanathan, A. and T. Nadeau, "MPLS Label Switch Router Management Information Base Using SMIv2", RFC3813, June 2004. [RFC3814] Nadeau, T., Srinivasan, C., and A. Viswanathan, "Multiprotocol Label Switching (MPLS) FEC-To-NHLFE (FTN) Management Information Base", RFC3814, June 2004. [RFC3815] Cucchiara, J., Sjostrand, H., and Luciani, J., "Definitions of Managed Objects for the Multiprotocol Label Switching (MPLS), Label Distribution Protocol (LDP)", RFC 3815, June 2004. 8. Acknowledgment The authors wish to acknowledge and thank the following individuals for their valuable comments to this document: Rahul Aggarwal, Neil Harrison, Ben Niven-Jenkins and Dimitri Papadimitriou. 9. Authors' Addresses Adrian Farrel Old Dog Consulting Phone: +44 (0) 1978 860944 Email: adrian@olddog.co.uk Daniel King Aria Networks Ltd. Phone: +44 (0)1249 665923 Email: daniel.king@aria-networks.com Thomas D. Nadeau Cisco Systems, Inc. 1414 Massachusetts Ave. Boxborough, MA 01719 Email: tnadeau@cisco.com Yasukawa et al. [Page 11] Internet-Draft draft-ietf-mpls-p2mp-oam-reqs-01.txt February 2006 Seisho Yasukawa NTT Corporation 9-11, Midori-Cho 3-Chome Musashino-Shi, Tokyo 180-8585, Japan Phone: +81 422 59 4769 Email: yasukawa.seisho@lab.ntt.co.jp 10. Intellectual Property Statement The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79. Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org. 11. Full Copyright Statement Copyright (C) The Internet Society (2006). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights. This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Yasukawa et al. [Page 12]