2.4.3 Benchmarking Methodology (bmwg)

NOTE: This charter is a snapshot of the 60th IETF Meeting in San Diego, California USA. It may now be out-of-date.

Last Modified: 2004-07-23

Chair(s):
Kevin Dubray <kdubray@juniper.net>
Al Morton <acmorton@att.com>
Operations and Management Area Director(s):
Bert Wijnen <bwijnen@lucent.com>
David Kessens <david.kessens@nokia.com>
Operations and Management Area Advisor:
David Kessens <david.kessens@nokia.com>
Mailing Lists:
General Discussion: bmwg@ietf.org
To Subscribe: bmwg-request@ietf.org
In Body: subscribe your_email_address
Archive: http://www.ietf.org/mail-archive/web/bmwg/index.html
Description of Working Group:
The major goal of the Benchmarking Methodology Working Group is to make a series of recommendations concerning the measurement of the performance characteristics of various internetworking technologies; further, these recommendations may focus on the systems or services that are built from these technologies.

Each recommendation will describe the class of equipment, system, or service being addressed; discuss the performance characteristics that are pertinent to that class; clearly identify a set of metrics that aid in the description of those characteristics; specify the methodologies required to collect said metrics; and lastly, present the requirements for the common, unambiguous reporting of benchmarking results.

To better distinguish the BMWG from other measurement initiatives in the IETF, the scope of the BMWG is limited to technology characterization using simulated stimuli in a laboratory environment. Said differently, the BMWG does not attempt to produce benchmarks for live, operational networks. Moreover, the benchmarks produced by this WG shall strive to be vendor independent or otherwise have universal applicability to a given technology class.

Because the demands of a particular technology may vary from deployment to deployment, a specific non-goal of the Working Group is to define acceptance criteria or performance requirements.

An ongoing task is to provide a forum for discussion regarding the advancement of measurements designed to provide insight into the operation of internetworking technologies.

Goals and Milestones:
Done  Expand the current Ethernet switch benchmarking methodology draft to define the metrics and methodologies particular to the general class of connectionless, LAN switches.
Done  Edit the LAN switch draft to reflect the input from BMWG. Issue a new version of document for comment. If appropriate, ascertain consensus on whether to recommend the draft for consideration as an RFC.
Done  Take controversial components of multicast draft to mailing list for discussion. Incorporate changes to draft and reissue appropriately.
Done  Submit workplan for initiating work on Benchmarking Methodology for LAN Switching Devices.
Done  Submit workplan for continuing work on the Terminology for Cell/Call Benchmarking draft.
Done  Submit initial draft of Benchmarking Methodology for LAN Switches.
Done  Submit Terminology for IP Multicast Benchmarking draft for AD Review.
Done  Submit Benchmarking Terminology for Firewall Performance for AD review
Done  Progress ATM benchmarking terminology draft to AD review.
Done  Submit Benchmarking Methodology for LAN Switching Devices draft for AD review.
Done  Submit first draft of Firewall Benchmarking Methodology.
Done  First Draft of Terminology for FIB related Router Performance Benchmarking.
Done  First Draft of Router Benchmarking Framework
Done  Progress Frame Relay benchmarking terminology draft to AD review.
Done  Methodology for ATM Benchmarking for AD review.
Done  Terminology for ATM ABR Benchmarking for AD review.
Done  Terminology for FIB related Router Performance Benchmarking to AD review.
Done  Firewall Benchmarking Methodology to AD Review
Done  First Draft of Methodology for FIB related Router Performance Benchmarking.
Done  First draft Net Traffic Control Benchmarking Methodology.
Done  Methodology for IP Multicast Benchmarking to AD Review.
Mar 03  Resource Reservation Benchmarking Terminology to AD Review
Done  First I-D on IPsec Device Benchmarking Terminology
Apr 03  Net Traffic Control Benchmarking Terminology to AD Review
Apr 03  Methodology for FIB related Router Performance Benchmarking to AD review.
Done  EGP Convergence Benchmarking Terminology to AD Review
Done  Resource Reservation Benchmarking Methodology to AD Review
Jul 03  Basic BGP Convergence Benchmarking Methodology to AD Review.
Dec 03  Net Traffic Control Benchmarking Methodology to AD Review.
Dec 03  IPsec Device Benchmarking Terminology to AD Review
  • Internet-Drafts:
    draft-ietf-bmwg-mcastm-14.txt
    draft-ietf-bmwg-dsmterm-09.txt
    draft-ietf-bmwg-benchres-term-04.txt
    draft-ietf-bmwg-conterm-06.txt
    draft-ietf-bmwg-ospfconv-term-10.txt
    draft-ietf-bmwg-ospfconv-intraarea-10.txt
    draft-ietf-bmwg-ospfconv-applicability-07.txt
    draft-ietf-bmwg-ipsec-term-04.txt
    draft-ietf-bmwg-igp-dataplane-conv-meth-03.txt
    draft-ietf-bmwg-igp-dataplane-conv-term-03.txt
    draft-ietf-bmwg-igp-dataplane-conv-app-03.txt
    draft-ietf-bmwg-acc-bench-term-03.txt
    draft-ietf-bmwg-acc-bench-meth-00.txt
    draft-ietf-bmwg-hash-stuffing-00.txt
  • Request For Comments:
    RFC1242 I Benchmarking Terminology for Network Interconnection Devices
    RFC1944 I Benchmarking Methodology for Network Interconnect Devices
    RFC2285 I Benchmarking Terminology for LAN Switching Devices
    RFC2432 I Terminology for IP Multicast Benchmarking
    RFC2544 I Benchmarking Methodology for Network Interconnect Devices
    RFC2647 I Benchmarking Terminology for Firewall Performance
    RFC2761 I Terminology for ATM Benchmarking
    RFC2889 I Benchmarking Methodology for LAN Switching Devices
    RFC3116 I Methodology for ATM Benchmarking
    RFC3133 I Terminology for Frame Relay Benchmarking
    RFC3134 I Terminology for ATM ABR Benchmarking
    RFC3222 I Terminology for Forwarding Information Base (FIB) based Router Performance
    RFC3511 I Benchmarking Methodology for Firewall Performance

    Current Meeting Report

    Benchmarking Methodology WG (bmwg)

    Thursday, August 5, 2004, 1300-1500


    CHAIRS: Kevin Dubray <kdubray@juniper.net>

    Al Morton <acmorton@att.com>

    The following meeting minutes were edited by Kevin Dubray from notes taken by Al Morton and Kevin Dubray.

    The BMWG meeting had approximately 21 attendees.

    Al Morton presented the agenda as:

    1. WG Status (Chairs, 10 min)

    2. Milestones (Chairs, 5 min)

    3. Comments re: Active Review Template Experiment (Chairs, 10 min)

    4. IGP Dataplane Convergence I-Ds. (Poretsky, 15 min)

    5. Core Router Accelerated Life Testing (Poretsky, 10 min)

    6. Hash and Stuffing (Newman, 15 min)

    7. Short discussion on "plug-in" Drafts (Chairs, 5 min)

    8. LDP Convergence (Eriksson, 10 min)

    9. Automatic Protection Switching (Poretsky, 15 min)

    - Terminology (comments, 5 min)

    - MPLS Protection Methodology (Poretsky, 5 min)

    - Discussion

    10. Wrap-up/Conclusion (Chairs, 10 min)

    The opportunity to modify the proposed agenda was declined by the attendees.

    1. Working Group Status

    Al Morton's presentation, "Agenda", gives a detailed accounting of the state of current BMWG documents and initiatives. (The presentation, BMWG-0, "Agenda," can be found in the proceedings.)

    Of note, one BMWG I-D is in the RFC Editor's queue; four I-Ds are in AD/IESG review; five I-Ds have been revised or are initial versions; there are a few new work proposals.

    2. WG Milestones

    Al stated that the Working Group's milestones had been revised to reflect reality; particulars can be found in the slides. Al cited difficulties in getting the milestones posted to the BMWG Web page, so what eventually appears on the page may vary depending on the milestones' progress and the timeliness of administrative processing.

    3. IGP Data plane convergence benchmark I-Ds.

    Scott ("Don't call me Jerry") Poretsky presented the history, activity, and state of the current work on these I-Ds. Scott's slides can be found in the Proceedings. (BMWG-1, "IGP Data Plane Convergence Benchmarking")

    There was some discussion on the recommendations that came out of the Last Call for this body of work. One attendee noted that a recommendation for stronger, normalized reporting of test parameters did not seem to have been adopted by the I-D editors. Scott said the I-D team had considered the input but opted not to adopt the suggestion, and stated that the I-Ds do not preclude the use of parameter values other than those recommended. The attendee noted that only with rigorous, standardized reporting of parameters could the test serve as a comparative benchmark of different products from different vendors.

    Brooks Hickman stood up to say that the I-Ds' principals should consider how relying on packet loss might skew the convergence measurement because of traffic persistence (e.g., buffering). Scott thought this a good point that merited additional consideration.
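    To illustrate the concern, here is a minimal sketch assuming convergence time is derived from loss as packets lost divided by offered load (similar to the loss-based calculation used in BMWG convergence work; treat the exact formula as an assumption here). Packets that a device buffers and forwards late are not counted as lost, so the derived time can understate the actual forwarding interruption. The function name and traffic figures are hypothetical.

```python
def loss_derived_convergence_time(packets_lost: int, offered_load_pps: float) -> float:
    """Convergence time in seconds, assuming a constant offered load (packets/s)
    and that every packet not forwarded during the event is counted as lost."""
    if offered_load_pps <= 0:
        raise ValueError("offered load must be positive")
    return packets_lost / offered_load_pps


# Hypothetical trial: 10,000 pps offered; the forwarding path is down for 0.5 s.
OFFERED_PPS = 10_000.0
OUTAGE_S = 0.5
not_forwarded_on_time = int(OFFERED_PPS * OUTAGE_S)  # 5,000 packets

# If the device drops all of them, the loss-derived value matches the outage.
print(loss_derived_convergence_time(not_forwarded_on_time, OFFERED_PPS))  # 0.5

# If (hypothetically) 2,000 of them were buffered and delivered late instead of
# dropped, only 3,000 count as lost and the derived time shrinks accordingly.
buffered = 2_000
print(loss_derived_convergence_time(not_forwarded_on_time - buffered, OFFERED_PPS))  # 0.3
```

    The gap between the two results is the effect of buffering on a purely loss-based measurement.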

    The chairs interjected a discussion regarding the BMWG experiment to solicit reviewers as part of the Last Call process. In the presentation "Agenda" (Proceedings slide set: BMWG-0), the slides on "Reviewers in BMWG Last Call" summarize the motivation and guidelines for reviewers. (The IGP dataplane I-Ds were the first to use reviewers.)

    Heads nodded when the chairs asked if the group thought the experiment useful. Mr. Poretsky indicated that it worked well for the IGP effort.

    4. Benchmarking Core Router Accelerated Life Testing.

    Scott Poretsky's presentation (slides in Proceedings: BMWG-2, "Accelerated Stress Benchmarking") addressed the evolution of the effort. (The original framework document has been deprecated, with much of its content introduced in both the terminology and the new methodology document.)

    Scott explained that much of the revision was aimed at placing greater emphasis on benchmarks.

    He posed the following questions to the group: Does the current work sufficiently address benchmarking? Should additional test cases be added to the methodology? Should specific Denial of Service attacks be defined by explicit (rather than abstract) reference in the methodology I-D?

    There was a discussion regarding whether the documents had the specificity to support comparative benchmarks, as the I-Ds themselves intend. The benchmarks should illuminate differences between tested systems, either through direct system response or by revealing other, secondary factors; that is, the test specification itself must not be the source of the difference. This topic led into a discussion of variability. While it was agreed that the test specification must not be the source of variability in results, it was also agreed that provisions for assessing the variability of a tested system's responses were very desirable. This variability indicator might take the form of a metric or statistic, or might be the target of additional test cases in the I-D. Scott responded that he would consider the points.
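    As one hypothetical form of such a statistic, the sketch below summarizes repeated trials of a single benchmark with the mean, the sample standard deviation, and the coefficient of variation. The choice of statistics, the function name, and the trial values are illustrative assumptions, not something the I-Ds specify.

```python
import statistics


def variability_summary(trial_results: list[float]) -> dict[str, float]:
    """Summarize repeated trials of one benchmark on one system under test."""
    mean = statistics.mean(trial_results)
    stdev = statistics.stdev(trial_results)  # sample std dev; needs >= 2 trials
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean}


# Hypothetical forwarding-rate results (packets/s) from five repeated trials.
trials = [98_000.0, 97_500.0, 98_200.0, 96_900.0, 98_400.0]
summary = variability_summary(trials)
print(f"mean={summary['mean']:.0f} pps, "
      f"stdev={summary['stdev']:.0f} pps, cv={summary['cv']:.2%}")
```

    Reporting the coefficient of variation alongside the mean lets readers compare run-to-run spread across systems with different absolute rates.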

    Scott queried the group about defining Denial of Service (DoS) attacks so they could be included in the test scope. Several people seemed to agree that defining specific DoS attacks was better left to more specialized efforts. They did not, however, preclude emulating these network events within the test scope.

    Another attendee noted that tests should measure latency as well as forwarding. Scott thought this, too, a good suggestion.

    On the topic of additional parameters to consider for the test scope, it was offered that, in addition to routing flap, hooks to apply common policy should be considered - and those hooks should have the ability to be varied. There was agreement on this point, and the thought expanded to consider filters as well as consistency checks.

    Due to time constraints, Al Morton closed the discussion, reiterating the importance of achieving the quality of a cross-platform comparative tool.

    5. New Work Proposal on Hash and Stuffing.

    In his presentation, David Newman cited the need for applying additional rigor in BMWG benchmarks when it comes to the areas of address specification and traffic composition. (Proceedings presentation: BMWG-3, "Hash and Stuffing: Overlooked Factors in Network Device Benchmarking.") The related Internet-Draft offers methods that could be used in the context of existing and new BMWG benchmarking recommendations.
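    Assuming "stuffing" in the title refers to control-character stuffing, a short example shows how traffic composition alone can change what a device must carry. With PPP in HDLC-like framing (RFC 1662), payload octets equal to the flag (0x7E) or control escape (0x7D) are sent as 0x7D followed by the octet XORed with 0x20. The sketch below assumes an all-zero Async-Control-Character-Map, so only those two values are escaped; the function name is hypothetical and the example is illustrative, not taken from the draft.

```python
FLAG = 0x7E    # HDLC-like frame delimiter
ESCAPE = 0x7D  # control escape octet


def byte_stuff(payload: bytes) -> bytes:
    """Escape flag and escape octets (assumes an all-zero ACCM, so no other
    control characters are escaped)."""
    out = bytearray()
    for b in payload:
        if b in (FLAG, ESCAPE):
            out += bytes((ESCAPE, b ^ 0x20))  # e.g. 0x7E -> 0x7D 0x5E
        else:
            out.append(b)
    return bytes(out)


benign = bytes(64)          # 64 zero octets: nothing to escape
worst = bytes([FLAG]) * 64  # 64 flag octets: every one is escaped
print(len(byte_stuff(benign)), len(byte_stuff(worst)))  # 64 128
```

    Two frames with the same nominal length can therefore occupy very different amounts of link capacity, which is why payload content belongs in the test specification.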

    David likened the proposal to an "application note" for work coming out of the BMWG.

    At the end of the presentation, David sought to get an indication as to whether folks thought the proposal was appropriate work for the Working Group.

    Many folks thought the proposal meaningful and relevant to the BMWG. One person suggested that a separate effort could be made for one particular area, MPLS label stacking, where it was speculated that hashing heuristics can significantly affect device response. Another attendee suggested that the effort could be made universal, regardless of device type. Another person questioned the need for "hashing benchmarks," as many BMWG benchmarks already characterize the effects of hashing, even if only implicitly. Standalone benchmarks, it was countered, are useful for raising awareness, especially among Service Providers.

    One attendee suggested that running a variety of traffic models may be beneficial so as not to optimize for one device vendor or another. Newman stated that it was the current proposal's explicit intent NOT to optimize for a given vendor.

    Many remarks called for verification of the distribution of the test stimuli. Other comments called for more detail in specifying the degree of randomness or pseudo-randomness, to keep tests comparable. David agreed that more detail may be beneficial, but expressed concern regarding random vs. pseudo-random stimuli: the more one gravitates to truly random stimuli, the less repeatable the test may be, and repeatability is a targeted quality for BMWG benchmarks. It was offered that declaring what is meant by "pseudo-random" may be good enough.
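    One way to make "pseudo-random" precise enough for repeatable, comparable tests is to declare the generator and its seed. The minimal sketch below, a hypothetical helper rather than the draft's method, draws distinct host addresses from 198.18.0.0/15, the block RFC 2544 sets aside for benchmarking, using a seeded Python `random.Random`. The same seed reproduces the same stimuli; Python only guarantees this within compatible versions, so a written test specification would need to name the algorithm itself.

```python
import ipaddress
import random

# 198.18.0.0/15 is the address block set aside for benchmarking (RFC 2544).
BENCH_NET = ipaddress.IPv4Network("198.18.0.0/15")


def pseudo_random_hosts(seed: int, count: int) -> list[str]:
    """Return `count` distinct host addresses drawn pseudo-randomly from
    BENCH_NET. The same seed yields the same sequence (for a given Python
    version), so a trial can be repeated and its stimuli reported exactly."""
    rng = random.Random(seed)  # private generator; leaves global state alone
    # Skip the network (offset 0) and broadcast (last) addresses.
    offsets = rng.sample(range(1, BENCH_NET.num_addresses - 1), count)
    return [str(BENCH_NET.network_address + off) for off in offsets]


run_a = pseudo_random_hosts(seed=2004, count=5)
run_b = pseudo_random_hosts(seed=2004, count=5)
print(run_a == run_b)  # True: same declared seed, same stimuli
```

    Reporting the seed and generator alongside results lets another lab regenerate the identical address set, while changing the seed across trials still exercises different parts of the address space.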

    A last recommendation to the effort's advocates: make sure you cover the "entire [address] space." Lookup behavior for unicast may differ from that for multicast. Randomness may be a goal, but the patch needs to be able to cover many types of scenarios sufficiently. It was also offered that larger trials and averaging might ameliorate many of the conditions addressed by the proposal.

    Al observed that, if the discussion was representative, there appeared to be interest in the proposal; the chairs would follow up on the list to test the wider WG audience.

    6. Follow-up discussion of drafts that "plug-in" to other WG work.

    Following the "Hash & Stuffing" discussion, the chairs opened a brief discussion on a recent trend to take on additional BMWG work items that don't always fit the notion of a discrete benchmark. One example was White's I-D on "Considerations in Benchmarking Routing Protocol Network Convergence." This work attempts to provide some anecdotal experience with issues related to characterizing OSPF protocol response. Another plug-in example is the recently discussed "hash & stuffing" effort, which attempts to supplement existing BMWG work with addressing and traffic composition recommendations.

    These works don't form a standalone body of benchmarking specifications; rather, they are meant to supplement existing recommendations.

    The chairs wanted to poll the group for their thoughts on how these efforts fit into the BMWG charter and for suggestions on how best to handle plug-ins.

    It was asked why these works should be separate documents: why not revise older WG documents to use these concepts? For example, "Hash & Stuffing" may also apply in other areas, such as IPsec; if so, the concepts could be embedded in the IPsec benchmark documents. Moreover, doesn't embedding a recommendation in an existing work ensure an unquestionably explicit binding?

    One attendee supported the "respin-existing-work" concept, believing that "plug-ins" are difficult to manage and may not have universal application. Another member opposed that view, arguing that a) the charter did not preclude plug-ins, and b) as a standalone document, a plug-in is easier to revise. Moreover, it was thought that a "plug-in" could cite applicable existing work, and newer work could cite the plug-in. Another suggestion was to tailor the handling of each plug-in (revise/replace an RFC vs. a new, standalone RFC) to its scope.

    There was overwhelming agreement that this area needed extreme scrutiny, so that provisions for "addenda" do not become vehicles for work "orphaned" or dismissed by other technical or standards groups.

    7. New Work Proposal on LDP Convergence Benchmarking

    T. Eriksson presented a series of slides that highlighted the new work proposal on LDP convergence benchmarking. The slides outlined the effort's motivation and goals, presented the terminology document, and proposed next steps. These slides, too, can be found in the proceedings as BMWG-4, "LDP Data Plane Convergence Benchmarking."

    The discussion was brief, as it appeared that few had read the corresponding document. It was asked whether, given the structure of the current recommendation, there should be multiple FEC modules. Scott Poretsky replied that the intent would be to reuse the Fast Re-Route methods.

    Al cajoled the group to read the corresponding document so that interest could be better gauged.

    8. Work Proposal on Protection Switching Methodology

    Scott Poretsky's slides (BMWG-5, "Benchmarking Protection Mechanisms") addressed the two-year history of this effort; offered comparative terminology; proposed benchmarks; and suggested a course of action on how best to proceed.

    Scott noted that the intent of the effort is to provide benchmarking procedures that could be used in any MPLS environment.

    Little discussion ensued. Al Morton asked how many had read the draft; about half the participants indicated that they had. T. Eriksson stood up to offer his emphatic verbal support of the effort. Al Morton suggested that the discussion move to the BMWG list to gauge wider interest, but it appeared that the proponents had successfully consolidated the multiple proposals of times past into a single new proposal.


    Slides

    IGP Data Plane Convergence Benchmarking
    Accelerated Stress Benchmarking
    Hash and Stuffing: Overlooked Factors in Network Device Benchmarking
    LDP Data Plane Convergence Benchmarking
    Benchmarking Protection Mechanisms