Internet Engineering Task Force                     MBONED Working Group
INTERNET-DRAFT                                      Kevin Almeroth
draft-ietf-mboned-mrm-use-00.txt                    UCSB
Expires August 1999                                 Liming Wei
                                                    cisco Systems, Inc
                                                    February 26, 1999


                  Justification for and use of the
              Multicast Routing Monitor (MRM) Protocol


Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This document motivates the need for the Multicast Routing Monitor (MRM) [MRM] protocol by describing the niche that exists for a router-based multicast management protocol. Using the "sufficient and necessary" argument, we suggest that existing protocols and techniques lack important management functionality. This document briefly describes the methodology used by MRM, justifies the existence of MRM, and describes some of the scenarios in which MRM will be of value.

1. Introduction

The Multicast Routing Monitor (MRM) protocol has been designed to assist in the detection and isolation of network faults related to the delivery of multicast traffic [MRM].
In particular, the management functions offered by MRM are specifically designed to monitor routing operation and to assist in the investigation of routing anomalies and connectivity problems.

MRM has been designed with consideration for the other types of multicast management protocols and tools that are available. As we will show, even though a wide variety of tools is available today, there is a need for a router-based monitoring protocol.

The justification for MRM as a new protocol follows the "necessary and sufficient" premise. MRM is being developed because it is necessary when its functions are compared to those offered by alternatives like the RTP Control Protocol (RTCP) [RTP] and the Simple Network Management Protocol (SNMP) [SNMPV1,SNMPV2]. Furthermore, MRM is being developed because it is sufficient in providing the functions needed by its target class of applications. Using this reasoning, MRM will offer functions and provide multicast traffic management that no other protocol currently offers.

2. Overview of MRM

MRM is a protocol intended to be implemented in both routers and end stations. The operation of MRM is based on communication and coordination among three types of network entities.

* MRM Manager: The MRM manager provides an interface enabling a user to configure and execute tests, and then collect and present results. The MRM manager communicates with MRM testers, which are instructed to source and/or sink multicast traffic. The MRM manager, through beacon messages, also maintains and modifies the set of MRM testers.

* Test Sender (TS): A test sender is responsible for sourcing multicast traffic. TSs receive authenticated requests from the MRM manager and send a specified number of multicast packets to a specified multicast address with a specified inter-transmission time between packets.
* Test Receiver (TR): Based on instructions from an MRM manager, a test receiver is expected to either explicitly join a multicast group or simply monitor traffic on a specified group address. Based on thresholds specified by the MRM manager, a TR will report faults. Additionally, the MRM manager may request TR reports regardless of whether any thresholds were violated.

One of the keys to scalability is ensuring that a large number of TRs do not overwhelm the MRM manager with traffic. Scalability is handled using a combination of techniques including report suppression and aggregation.

As the MRM protocol specification indicates about itself, "it only specifies the types of information a MRM manager can obtain, and the protocol used to acquire such information. How an MRM manager processes or presents the diagnostic information is an implementation issue." These processing and presentation functions are expected to be provided by companion management tools. Furthermore, the MRM protocol specification does not fully describe the scenarios in which MRM is expected to be useful. Such functions and scenarios are described in Section 4 of this document.

3. Justification

MRM provides a set of functions not provided by any of the commonly used MBone debugging protocols and tools. Most of the tools used in the MBone today fall into one of three categories: (1) SNMP-based tools like Mview [MVIEW] or Mstat [MSTAT]; (2) RTCP-based tools like Mhealth [MHEALTH], RTPmon [RTPMON], or MultiMON [MULTIMON]; or (3) multicast route tracing tools like Mtrace [MTRACE1,MTRACE2]. MRM, in addition to being an independent management tool, can be used in conjunction with these other tools to provide a richer set of management functions. Some of the reasons why the above protocols or tools fall short are discussed in the following paragraphs.
* SNMP: SNMP provides a mechanism to poll devices for information or to have alarms generated when certain events occur. The problem with SNMP is that a wide-ranging failure could potentially overwhelm a management station. For example, consider a scenario in which SNMP agents in a particular multicast tree are configured to generate an alarm if the packet loss exceeds a certain level. Then consider the implosion that would occur if a link close to the root became congested and a majority of group members generated alarms. This scenario demonstrates the basic drawback of SNMP: a general lack of scalability, especially considering the large number of devices and hosts that may be involved in a multicast group.

  Scalability arguments do not preclude the use of SNMP, but a manager using SNMP to manage multicast would have to be extremely careful in deciding how to configure the network. In fact, properly configuring network devices to provide sufficient management information while avoiding management-induced congestion or implosion may be prohibitively difficult in most networks.

* RTCP: RTCP has a much more scalable feedback mechanism, but it has its own deficiencies. The scalability of RTCP is based on a random wait time chosen from an interval that each group member calculates from an estimate of the overall group size. The larger the group, the larger the wait interval, and the longer the average inter-packet time between RTCP feedback messages. The goal of the RTCP feedback mechanism is to consume bandwidth equal to 5% of the data traffic rate. While this algorithm seems reasonable, it can be problematic as a tool to manage multicast traffic. Some reasons include:

  o RTCP feedback is multicast to all group members, and given that receivers will have heterogeneous bandwidth capabilities, even scalable feedback has the potential to overwhelm some receivers.
  o In many management applications there is no need for feedback data to be transmitted to all group members. And if privacy is an issue, the group-wide delivery of RTCP is even less desirable.

  o RTCP feedback provides only a single, end-to-end loss and jitter value. More generally, RTCP contains only a very small amount of information useful for debugging purposes. MRM is designed to include a broader range of information, including packet duplication statistics, and also to be extensible.

  While some of the problems with RTCP are being addressed by redefining the standard to allow more flexibility in the use of RTCP [RTPNEW], these efforts do not solve all of the problems. In particular, the most critical deficiency of RTCP is a lack of detailed routing information. When trying to isolate routing faults, the end-to-end style feedback provided by RTCP is unlikely to have sufficient granularity. To address this problem, some RTCP-based tools are used in combination with other tools. For example, RTPmon and mtrace are commonly used together. However, the major drawback of this solution is that it fails to provide the sort of traffic origination and flexible group membership services offered by MRM.

* Mtrace: Mtrace is a tool designed to provide hop-by-hop path information for a specific source and destination. It is a useful tool for determining a multicast path and round-trip information. For a specific group, mtrace will also tell a user the hop-by-hop packet loss. Coupled with RTCP feedback, mtrace can be used to monitor many of the relevant factors for an active source and group including per-receiver loss, hop-by-hop loss, tree topology, jitter, and round-trip time. Several tools in development, including Mhealth, provide a graphical real-time display of group statistics.

  However, mtrace (coupled with other tools) only provides information about active groups.
  Attempting to do fault detection, or more specifically, fault pre-detection, is nearly impossible. The common paradigm today is to gather a set of willing participants who then join a "debugging" session. Further complicating the problem is that starting an MBone tool in a remote location to receive and transmit RTCP reports is sometimes not possible. One solution to this problem is a "dumb", non-GUI tool that simply receives and responds to an RTP stream. While tools like this have been discussed, none are widely available, and even if they were, attempting to rapidly configure and change group membership would be laborious at best. MRM is designed with the specific purpose of facilitating the on-the-fly, ad hoc creation of multicast test senders and receivers to test a variety of multicast group configurations.

4. Scenarios for Use of MRM

MRM is designed to provide automated fault detection and isolation services for multicast traffic. In order to support these services with any kind of automation, MRM must be both flexible and scalable. MRM scalability implies the ability to detect faults without raising so many alarms that the delivery of alarm messages itself causes additional problems. One problem, in particular, is response implosion at the MRM manager. MRM flexibility implies the ability to isolate faults by sourcing traffic from anywhere in the network and collecting statistics from any node or subset of nodes.

In addition to basic fault detection and isolation, MRM is intended to provide more advanced functions. These extended functions include:

* Fault logging and real-time (passive) monitoring functions.

* Proactive testing (fault isolation), including service provisioning and impact analysis.

The remainder of this section is dedicated to the description of scenarios in which MRM functions are expected to be used.
* Pre-Event Testing: One of the best examples of this type of scenario is the MBone delivery of two audio/video channels from the IETF meetings held three times a year all over the world. In the week preceding each IETF meeting, staff members install a terminal room and establish network connectivity including multicast capability. In some cases, setup activities occur weeks, days, or hours before the first meeting Monday morning. Verifying that multicast routing is working both into and out of the IETF meeting rooms can be a challenge.

  Verification is especially challenging because the IETF meetings have a world-wide audience. Ensuring that multicast is working at even a small number of remote sites is difficult. One problem that sometimes occurs is that the MBone equipment, including cameras and workstations, may not be available when the network is first turned on. In these cases, there are no multicast-capable sources or receivers inside the IETF network. MRM would alleviate this problem by allowing testing of multicast in both directions. Furthermore, MRM would also allow someone not yet on site to test multicast connectivity.

  Relatively extensive testing can be performed by choosing a set of Test Receivers representative of the world-wide distribution of actual IETF participants. MRM would allow the IETF staff and the ISP to observe where major network bottlenecks are occurring. In some cases, early discovery of problems could lead to fixes in time for the event.

  These techniques, used for pre-event testing at "nomadic events", would also be appropriate for estimating the quality of transmissions for events in "non-nomadic" networks. Instead of the IETF or an academic conference, an MRM manager might want to estimate the loss, delay, and jitter for a frequently scheduled event like an MBone lecture or company event.
  Instead of waiting until the event starts and using a tool like RTPmon, an MRM manager can set up a test session any time before the session starts, and evaluate the quality to most, if not all, of the critical company locations. In the MBone today, a transmitter who wants to perform this kind of testing has to ask several friends, out-of-band, to join a test session, and then send a multicast stream and monitor RTCP reports. Obviously, this method is not very compelling.

* Classic Fault Isolation: A second scenario in which MRM is designed to assist a network manager is classic fault isolation. Like unicast routing problems, multicast routing problems can be very difficult to debug. And unlike unicast routing, the additional complexities of providing efficient, one-to-many delivery can introduce additional bugs that are difficult to find.

  To date, a significant number of strategies, tools, and techniques have been developed, built, and proposed [MDH]. However, these attempts generally require a significant level of multicast routing expertise and experience, characteristics not always found among NOC personnel. As a result, MRM is designed to offer a layer of abstraction between multicast route management and the intricacies of multicast routing. MRM is not, however, designed to be completely independent of the strategies, tools, and techniques already in use today. MRM and existing tools can work in concert to isolate multicast routing problems.

  MRM's design offers some important flexibility in isolating multicast routing faults. In particular, the ability to specify a transmission rate allows a manager to closely inspect single, infrequently transmitted packets. Also, the ability to easily add and remove members from the group of Test Receivers allows a manager to quickly and efficiently affect the topology of the multicast tree.
* Session Monitoring: The scenarios discussed so far followed logically, from verifying multicast connectivity to isolating any potential faults. The next key scenario is monitoring of existing, active sessions. Such groups will have a well-known multicast address, and might be exchanging group membership information via RTCP reports or some other out-of-band mechanism. If the group is small, and feedback from each receiver is important, the set of test receivers can be configured to send reports to the MRM manager via unicast. If the group is large and complete feedback is not necessary, the set of test receivers can be frequently adjusted to represent some statistical sampling of the group. The ability to send statistical reports via unicast helps to improve the scalability of session monitoring by not overwhelming all receivers with all reports.

  Finally, if the group is using multicast tools that do not use RTCP and use no real-time signaling, a real-time list of group members may be difficult to generate. Other techniques will have to be used. One network-layer approach might be to use SNMP information to find the set of links in the multicast tree. A simpler approach might depend on other available information, like the fact that most users start the multicast tool via a WWW page. In this case, HTTP server logs can be used to estimate group membership.

* Fault Logging: In the case when session monitoring identifies the existence of a fault, a range of fault logging functions may be required. At one extreme, the MRM manager may simply need to be alerted when faults occur so that appropriate investigative measures can be taken. At the other extreme, service contracts may depend on the provision of service with certain guarantees. Any outages will need to be closely tracked. These two extremes again demonstrate the need for MRM to be flexible.
  In particular, when faults need to be closely monitored and logged, a wide-scale outage may itself cause a heavy load on the network. While identifying the exact load capable of being supported by a distressed network is beyond the scope of MRM, MRM does and will support scalability and aggregation functions.

8. Security

Security issues are discussed in the MRM protocol description [MRM].

9. Authors' Addresses

Kevin Almeroth
Department of Computer Science
University of California
Santa Barbara, CA 93106-5110
USA
almeroth@cs.ucsb.edu

Liming Wei
cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134
USA
lwei@cisco.com

10. References

[MRM] L. Wei, and D. Farinacci, "Multicast Routing Monitor (MRM)", IETF Internet-Draft, draft-ietf-mboned-mrm-*.txt, February 1999.

[RTP] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", IETF RFC 1889, January 1996.

[SNMPV1] J. Case, M. Fedor, M. Schoffstall, and J. Davin, "Simple Network Management Protocol", IETF RFC 1157, May 1990.

[SNMPV2] J. Case, K. McCloghrie, M. Rose, and S. Waldbusser, "Protocol Operations for Version 2 of the Simple Network Management Protocol (SNMPv2)", IETF RFC 1905, January 1996.

[MVIEW] D. Thaler, "Mview Tool", http://www.merit.edu/~mbone/mviewdoc/Welcome.html.

[MSTAT] B. Fenner, et al., "Mstat", Available as part of mrouted at ftp://ftp.parc.xerox.com/pub/net-research/ipmulti/.

[MHEALTH] D. Makofske, and K. Almeroth, "Mhealth -- Real-Time Multicast Tree Health Monitoring Tool", http://imj.ucsb.edu/mhealth/, August 1998.

[RTPMON] A. Swan, and D. Bacher, "RTPmon", ftp://mm-ftp.cs.berkeley.edu/pub/rtpmon/, January 1997.

[MULTIMON] J. Robinson, and J. Stewart, "MultiMON 2.0 -- Multicast Network Monitor", http://www.merci.crc.ca/mbone/MultiMON/, August 1998.

[MTRACE1] B.
Fenner, et al., "Multicast Traceroute (mtrace) 5.2", ftp://ftp.parc.xerox.com/pub/net-research/ipmulti/, September 1998.

[MTRACE2] B. Fenner, and S. Casner, "A `traceroute' Facility for IP Multicast", IETF Internet-Draft, draft-ietf-idmr-traceroute-ipm-*.txt, November 1995.

[RTPNEW] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", IETF Internet-Draft, draft-ietf-avt-rtp-new-*.txt, November 1998.

[MDH] D. Thaler, and B. Aboba, "Multicast Debugging Handbook", IETF Internet-Draft, draft-ietf-mboned-mdh-*.txt, October 1998.