2.4.14 Remote Performance Management (rperfman) BOF

Current Meeting Report

The Remote Performance Management BOF session was held on March 30, 2000, at the IETF meeting in Adelaide.

Randy Presuhn opened the session with an overview of the proposed agenda, which required no changes. Andy Bierman agreed to take detailed notes for the minutes.

For purposes of discussion, the larger problem space of remote performance management was divided into seven major areas: instrumentation, instrumentation control, metrics, data reduction, data reduction control, collection coordination, and performance management applications. Randy outlined how these mapped to current work in progress by various working groups.

Carl Kalbfleisch gave a presentation of requirements arising from his experiences at Verio. Key points included the need to support multiple monitoring applications, to eliminate redundant polling, to make the configuration of the tests tractable, and, finally, to have useful applications for managing performance.

Dan Romascanu gave a presentation on the use of active probes for performance monitoring, covering the material which is to appear in draft-cole-appm-00.txt. This presentation covered the roles and usage of active probes. It also contrasted them with classical RMONMIB passive monitoring. Examples of use included trouble-shooting, circuit pre-testing, fault management, end-to-end capacity management, and SLA monitoring.

The discussion then turned to the various tradeoffs involved in using active and passive probe technologies. Some benefits of active probe technologies include control of the sampling and the probe characteristics. On the other hand, the traffic generated by active probes adds to network load, and it is only a simulation rather than a measurement of real user traffic.

Randy asked the group some questions to get a sense of the room:

- Do we want to create new primary metrics? No.

- Is traffic injection being used for performance measurement? Yes, lots.

- Is it automated? Yes.

- Is there a need to standardize its management? Yes.

A question came from the audience regarding QoS performance monitoring and the need for DIFFSERV monitoring. The responses were to use DS-MON to monitor DS flows (work from the rmonmib working group) and to use the DIFFSERV MIB to monitor the DS forwarding points.

The discussion turned to APPM architectural issues, including the need for a framework to cover both network and application layers, transport metrics, configuration control, and the relationship to fault management and data reduction, based on standard instrumentation, or at least instrumentation that can be managed in a standard way. Deployment considerations, e.g., where to put active probes, can impact validity and usefulness of data. Security is also an issue, to control the configuration, traffic rate, type of traffic, and so on since active probes could easily be used for denial-of-service attacks.

A brief survey of related work within the IETF mentioned IPPM, defining metrics to be collected; DISMAN, defining some active probe functions (remops) and data aggregation (expression, script MIB) capabilities; RMON, defining passive monitoring, application monitoring, and reporting; ApplMIB, defining additional application performance information measured at the application (client or server); as well as RTFM, BMWG, and frnetMIB's service work.

Steve Waldbusser asked how the 'big picture' slide relates to traffic generation configuration. Randy sees more interesting work in data collection from many points and in data reduction (like snmpconf). He noted that these are separate problems, but that historically the IETF has not made the distinction.

Andy Bierman questioned the need for a standard to glue all components together, since this has traditionally been done by management station applications. However, he does see a clear need for standard knobs to set up traffic generators. This led to the question of whether this can be modularized so the application does not have to provide all the components and set them up from scratch every time they are needed.

A review of configuration issues for probes included sampling methods (IPPM describes some sampling strategies and implementation details), probe configuration details, such as data configuration and path selection, the choice of statistics, traffic rate, and many others.
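
As a purely illustrative sketch of the kind of configuration items listed above, the Python fragment below bundles a few probe settings and derives a send schedule with exponentially distributed gaps, which is the Poisson sampling strategy described in the IPPM framework. The `ProbeConfig` fields, the `poisson_schedule` helper, and the rate cap are hypothetical names chosen for this example; nothing here was presented at the BOF or taken from any MIB.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeConfig:
    """Hypothetical probe settings; field names are illustrative only."""
    target: str             # destination of the test traffic
    payload_bytes: int      # size of each test packet
    mean_interval_s: float  # average spacing between probes, in seconds
    max_rate_pps: float     # cap on generated traffic, in packets per second


def poisson_schedule(cfg: ProbeConfig, duration_s: float, seed: int = 0) -> list[float]:
    """Return send times in (0, duration_s) with exponentially distributed gaps.

    Gaps shorter than 1 / max_rate_pps are stretched to that minimum so the
    probe never exceeds its configured rate. This clamping is an assumption of
    the sketch and slightly distorts a pure Poisson process.
    """
    rng = random.Random(seed)  # seeded so a schedule can be reproduced
    min_gap = 1.0 / cfg.max_rate_pps
    times: list[float] = []
    t = 0.0
    while True:
        gap = max(rng.expovariate(1.0 / cfg.mean_interval_s), min_gap)
        t += gap
        if t >= duration_s:
            return times
        times.append(t)
```

A caller might build `ProbeConfig("192.0.2.1", 64, 1.0, 10.0)` and request `poisson_schedule(cfg, 60.0)` to get roughly one probe per second for a minute. Keeping the rate cap in the same record as the sampling parameters reflects the concern, raised elsewhere in the session, that probe traffic must be constrained.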

The review of implementation issues for probe control touched on several points. The packet generation needs to be carefully defined. The clock resolution of traffic generators needs to be understood. The error analysis phase needs to identify the errors that may be introduced in measurements.
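
To make the clock resolution point concrete, the following minimal sketch computes a delay from two timestamps and a simple worst-case bound on its uncertainty. It assumes that each timestamp can be off by up to one clock tick and that any synchronization offset between the two clocks adds directly; the function name and parameters are hypothetical, and a real error analysis would cover more sources of error than this.

```python
def delay_with_uncertainty(t_send: float, t_recv: float,
                           res_send: float, res_recv: float,
                           sync_err: float = 0.0) -> tuple:
    """Return (delay, bound) in seconds for one probe packet.

    res_send and res_recv are the clock resolutions of the sending and
    receiving hosts. sync_err is an assumed bound on the offset between the
    two clocks (zero when a single clock takes both timestamps).
    """
    delay = t_recv - t_send
    # Worst case: both timestamps are off by a full tick in opposite
    # directions, and the synchronization offset adds on top.
    bound = res_send + res_recv + abs(sync_err)
    return delay, bound
```

Under these assumptions, a 45 ms delay measured with a 1 ms sender clock, a 10 ms receiver clock, and 2 ms of synchronization error is only known to within 13 ms, which shows why a coarse clock on either side can dominate the result.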

Discussion of potential work items in this area included the development of a framework document on active monitoring within the Internet framework, the development of a MIB for active probe configuration, and the definition of MIBs for access to the transport and network level metrics that have already been defined by various working groups.

Follow-up questions included: How does the CAIDA work relate to this area? How about an IPPM implementation (source & sink)?

A person who works for an ISP voiced the concern that active probes should be developed by the IETF so that they are constrained and well-behaved, in order to reduce the risk of abuse.

Steve Waldbusser argued that since people are already using this technology, and lots of companies are doing this, it is premature for the IETF to standardize, and the work should be done in RMON when we are ready. He observed that synthetic transaction monitoring is not just sending an octet string and waiting for a response: synthetic transactions need lots of configuration details, including state machines for the protocols and knowledge of the network, transport, and application layers. There are many ways to do this work, but he noted that it is too hard for PeopleSoft, that some generators are reduced to pushing buttons on the applications, and that Ganymede does not do the real application but instead takes a very proprietary approach.

Steve concluded that this work should be carried out in the RMON working group. He cited the TR-RMON's active components as precedent, and offered that active network layer probes were only deferred because of RMON-2 work. Furthermore, RMON was just chartered for APM, and this includes reporting of synthetic transactions. RMON will need to address this later; it will be harder if there are overlapping charters and RMON has to redo some work to fit with the PD and APM.

Randy countered that there is a difference between "programming" an active probe, which requires knowing all the details, and standardizing the 'on/off' button. He cited the application MIB work, which abstracts out the common elements of applications, and doesn't even try to represent the peculiarities, leaving those aspects to device, vendor, or application-specific MIBs.

Steve replied that RMON found on/off to be insufficient.

Randy continued that aggregation of data from many points of collection is not covered by RMON. There may be a need to recombine into new tables, not just multiple instances of the RMON MIB.

Dan Romascanu agreed with Steve on the need to increase the priority of those issues in the RMON WG, but noted that there is still a need to correlate data from many different places.

Russell Dietz gave the next presentation, explaining the RMON APM/TPM framework. The top-level elements were the APMCAPS, APM study, and TPM study.

APM and TPM are linked by flow measurements; APM gives the high-level view and TPM gives the microflow view. The TPM serves as a drilldown for APM.

APMCAPs has a hook to point to the control mechanism for a test, if available. It could point to standard MIBs like DISMAN remops or proprietary MIBs.

TPM has a microflow decomposition for the APM user experience.

TPM has statistical reporting. Russell wants feedback from the BOF so the drafts in progress can accommodate requirements from this work.

Randy gave a recap of what had been covered and glossed over during the session, including configuration management issues for active probes and the correlation problems in relating measurements from different sources. Possible deliverables would depend on which working group or groups took up the project. This would also have to be done with some consciousness of the work already in progress. One possibility would be to generate an architecture document addressing the issues of cross-system reporting.

The agenda turned to considering the possible paths forward.

Randy asked for a sense of the room on each of the possibilities:

a) a new WG -- no interest from anybody

b) disman -- if we focus on infrastructure

c) applmib -- not any interest; focused on apps, not network

d) rmon

e) something else

The consensus emerged to let RMON continue to focus on the instrumentation, instrumentation control, and single-system reporting. Disman appeared to be the right place to handle multi-system data reduction and data reduction control. SNMPCONF would appear to be the right place to handle the cross-system collection configuration coordination. IPPM and other working groups would continue to define fundamental metrics. A gap is ensuring that those metrics, once defined, are somehow made visible to management.

The following action items were agreed:

- RMON -- consider requirements for active probe control during the APM/TPM work, and start work right after APM/TPM

- DISMAN -- look at data correlation issues from multiple probes

- SNMPCONF -- allow for this work to use snmpconf

Consequently, it was felt that there was no need for a new mailing list, and that these three lists would be appropriate for further discussion.


Randy Presuhn randy_presuhn@bmc.com http://www.bmc.com/

Voice: +1 408 546-1006 BMC Software, Inc. 1-3141 2141 N. First Street

Fax: +1 408 965-0359 San Jose, California 95131 USA


Any relationship between my opinions and BMC's should be coincidental.



Performance Metrics