Benchmarking Methodology (bmwg)

NOTE: This charter is a snapshot of that in effect at the time of the 38th IETF Meeting in Memphis, Tennessee. It may now be out-of-date.


Guy Almes
Kevin Dubray

Operations and Management Area Director(s): 

Scott Bradner
Michael O'Dell
Deirdre Kostick

Mailing Lists: 

To Subscribe:

Description of Working Group: 

The major goal of the Benchmarking Methodology Working Group is to make a series of recommendations concerning the measurement of the performance characteristics of various internetworking technologies; further, these recommendations may focus on the systems or services that are built from these technologies.

Each recommendation will describe the class of equipment, system, or service being addressed; discuss the performance characteristics that are pertinent to that class; clearly identify a set of metrics that aid in the description of those characteristics; specify the methodologies required to collect said metrics; and lastly, present the requirements for the common, unambiguous reporting of benchmarking results.

Because the demands of a class may vary from deployment to deployment, this Working Group will not attempt to define acceptance criteria or performance requirements.

Currently, there are two distinct efforts underway in the BMWG. The first addresses the metrics and methodologies associated with benchmarking network interconnect devices. The second effort (IPPM) focuses on determining the practical benchmarks and procedures needed in gaining insight for users and providers of IP Internet services. 

An ongoing task is to provide a forum for the discussion and advancement of measurements designed to provide insight into the operation of internetworking technologies.

Goals and Milestones:

Jun 96     

Draft a set of path performance metric definitions, including delay, flow capacity, and packet loss for the IPPM meeting in June, 1996.


Jun 96 


Expand the current Ethernet switch benchmarking methodology draft to define the metrics and methodologies particular to the general class of connectionless, LAN switches.


· Benchmarking Terminology for LAN Switching Devices

· Terminology for Cell/Call Benchmarking

· Connectivity

· Empirical Bulk Transfer Capacity

· Framework for IP Provider Metrics

· A One-way Delay Metric for IPPM

· Terminology for IP Multicast Benchmarking

· A Packet Loss Metric for IPPM

Request For Comments:







Benchmarking Terminology for Network Interconnection Devices




Benchmarking Methodology for Network Interconnect Devices

Current Meeting Report


Minutes of the Benchmarking Methodology Workgroup (BMWG)


Reported by: Kevin Dubray 

The BMWG session was held on Friday, April 11, 1997. Twenty-eight people signed the attendance list for this meeting.

Kevin Dubray opened the Friday morning session. The agenda was presented and approved as follows:

I. (05 min) Agenda Bashing.

II. (45 min) Discuss the Multicast Benchmarking Terminology Draft.


III. (30 min) LAN Switch Terminology Draft. "draft-ietf-bmwg-lanswitch-04.txt"

IV. (60 min) Progress Latest Cell/Call Benchmarking Draft. "draft-ietf-bmwg-call-01.txt" 

V. (10 min) Review and Update BMWG Milestones.

II. Multicast Benchmarking Terminology Draft

Kevin Dubray gave a presentation that was an overview of the current draft on multicast benchmarking terminology. (Slides of the presentation have been forwarded to the IETF Secretariat.)

The presentation was targeted to be a basis of discussion for this early multicast draft. As such, discussions followed on a number of items highlighted in the presentation. One such discussion focused on the definition "flow."

Many people thought that it was important to have a logical handle on a categorization of traffic, as opposed to the term "stream," which has a more physical (i.e., port or load) association. (The term "stream" is often used, but it is not formally defined in BMWG work.) However, folks also thought the term "flow" is becoming trite and ambiguous in networking parlance.

Robert Craig suggested that "class" may work well for the term's name. Heads nodded. Dubray said that much of the terminology offered in the draft could have better names, and that he was more than receptive to suggestions.

Another term that spawned debate was the forwarding metric, Scaled Group Throughput. Scott Bradner pointed out that using throughput as a criterion may unfairly favor devices that have a distributed architecture. Scott went on to say that Packet Loss Rate may be a better measure. Dubray countered that architecture was not the driving factor in offering the term; rather, throughput is a standard, more absolute metric. Scott acknowledged that statement, but reiterated that the throughput metric may not be the best choice for scrutinizing group scaling performance. Dubray noted that may be true; he asked the group whether ascertaining DUT forwarding performance as a function of increasing multicast group support is a worthwhile exercise. The group agreed that it is. Bradner and Dubray agreed to continue the discussion of the choice for the basis of the metric on the BMWG mailing list.
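As a rough illustration of the alternative Bradner raised, the sketch below tabulates packet loss rate against the number of multicast groups. It assumes loss rate is the percentage of offered frames that were not forwarded; the function name, the trial figures, and the choice of group counts are all hypothetical and are not taken from the draft.

```python
def packet_loss_rate(offered: int, forwarded: int) -> float:
    """Percentage of offered frames that the DUT did not forward."""
    if offered <= 0:
        raise ValueError("offered must be positive")
    if not 0 <= forwarded <= offered:
        raise ValueError("forwarded must be between 0 and offered")
    return 100.0 * (offered - forwarded) / offered


# Hypothetical trial results: group count -> (frames offered, frames forwarded).
trials = {
    1: (100_000, 100_000),
    10: (100_000, 99_500),
    100: (100_000, 90_000),
}

# Loss rate as a function of the number of multicast groups configured.
loss_by_groups = {
    groups: packet_loss_rate(offered, forwarded)
    for groups, (offered, forwarded) in sorted(trials.items())
}

for groups, loss in loss_by_groups.items():
    print(f"{groups:>4} groups: {loss:.1f}% loss")
```

Reporting loss at a fixed offered load, rather than searching for a zero-loss throughput figure, is only one possible reading of the suggestion; the minutes leave the choice of base metric open.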

On another scaling/performance metric, Aggregated Multicast Throughput (AMT), Scott thought the base metric, throughput, is applicable and useful.

With the last throughput metric presented, transitional throughput, Dubray articulated that this is another example where the behavior being characterized is useful in a multicast environment (e.g., DVMRP), but the benchmark's title is awkward. He asked for better suggestions.

On the topic of fairness, Bradner suggested that it might be useful to define a metric that can communicate "crosstalk," or the ability of one class of traffic to impair the processing of another class. Many agreed.

When the discussion moved to multicast latencies, the group echoed the need to measure multiple latencies and relate them to multiple axes, such as multicast groups, multicast sources, or multicast destinations.

As the presentation on this topic drew to a close, a question came up on whether the draft should consider encrypted multicast. Dubray agreed that this is a cogent topic that could be addressed by the draft. He encouraged folks to offer draft proposals and input to the BMWG mailing list.

III. LAN Switch Terminology Draft

Dubray announced that Bob Mandeville had taken ill while traveling and was forced to return home. In Bob's absence, Kevin led the discussion of the LAN terminology draft. Kevin updated the group on the draft's progress since the San Jose meeting. A discussion ensued on the current draft, <draft-ietf-bmwg-lanswitch-04.txt>.

In general, the group thought the document is nearing completion and should be readied for ascertaining consensus after the following issues are addressed:

1. The group thought it a good idea to call out a "one source port to one destination port" traffic distribution. (This was referred to as "non-mesh" in earlier email on the mailing list.) It was felt that this addresses a very popular traffic pattern used by a variety of test gear. The group thought it very important to differentiate between "traffic orientation" (e.g., unidirectional/bi-directional) and "traffic distribution" (e.g., one-to-one, fully meshed, etc.). The example that "a one-to-one traffic distribution could have a bi-directional orientation" was offered to illustrate that need.

2. Items 3.2.1 through 3.2.4 are thought to be liberal in their use of "stream" in light of the multicast draft discussion on "flow/class." People thought it beneficial to ensure that no usage conflicts between the terms "flow/class" and "stream" occur in BMWG works-in-progress. 

3. A general cleanup to address a variety of minor nits was suggested. Examples cited included aligning the index with the document's sections, making the document RFC 1543 compliant, etc. Dubray indicated that he would pass the input along to Bob. Kevin asked that people send additional feedback to the mailing list. He indicated that the group should strive to test consensus on this work before the next session.

IV. Cell/Call Terminology Draft

Kevin introduced Robert Craig, the editor of the current draft on cell/call benchmarking terminology. Robert stated that he attempted to keep a "black box" style of testing in mind when offering the benchmarks. Robert immediately started a point-by-point discussion of the draft.

On Item 3.1.1, Call Setup Time, a good discussion ensued, mostly with regard to the conditions that delineate the completion of the setup procedure. In the end, folks communicated that the definition needed to say more precisely what is actually being measured.

With Item 3.1.2, Call Setup Rate, Robert asked whether folks thought that distinguishing terms with the qualifiers "sustained" and "peak" is useful. There was not an overwhelming response in the affirmative; Robert stated that he would not pursue the practice.

Section 3.1.3, Call Maintenance Overhead, generated many comments. Some folks thought the concept was provocative but lacked substance as currently worded. Another person stated that the ambiguity of the term would be problematic in comparing different results. Robert explained that what he had in mind was to define a metric that will show the difference in forwarding performance as a function of virtual circuit mechanisms. He agreed that the term, as currently defined, is "too fuzzy." He thought it would be a good idea to hone the metric into a term such as "interference" or "crosstalk."

Call Teardown Time, Section 3.1.4, drew similar comments to its peer, Call Setup Time. Someone thought clarification of what exactly is being measured would help. Another person interjected that identifying the "freeing of resources" might not necessarily conform to black box testing. A comment from the group articulated what may be more important is the time it takes to end a call and the time required to start a new call. Robert indicated that he liked this "call turnaround" approach and would draft the appropriate wording.
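One way to read the "call turnaround" suggestion is as the interval between asking the DUT to end one call and seeing the next call established, using only events visible at the DUT's external interfaces. The sketch below assumes that reading; the `CallEvents` record, its field names, and the timestamps are hypothetical, and the draft's eventual wording may differ.

```python
from dataclasses import dataclass


@dataclass
class CallEvents:
    """Externally observed timestamps (seconds) for one call."""
    setup_requested: float
    setup_confirmed: float
    teardown_requested: float


def call_turnaround(previous: CallEvents, following: CallEvents) -> float:
    """Seconds from the teardown request of one call to the confirmed setup of the next."""
    elapsed = following.setup_confirmed - previous.teardown_requested
    if elapsed < 0:
        raise ValueError("following call was confirmed before the previous teardown request")
    return elapsed


# Hypothetical observations for two back-to-back calls.
first = CallEvents(setup_requested=0.0, setup_confirmed=0.25, teardown_requested=5.0)
second = CallEvents(setup_requested=5.5, setup_confirmed=5.75, teardown_requested=10.0)

turnaround = call_turnaround(first, second)
print(f"call turnaround: {turnaround:.2f} s")
```

Because both endpoints of the interval are observable from outside the device, this formulation avoids having to detect the "freeing of resources" inside the DUT.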

Based on an earlier discussion, Robert indicated that he would remove Item 3.1.5, Call Teardown Rate.

The definition of "Impact of Signaling on Forwarding," Section 3.1.6, was thought to be unclear. Robert indicated that what he intends to convey is the impact on forwarding performance as a function of a variety of parameters taken individually. Such parameters may be the number of outstanding calls, call request rate, etc. He further commented that this metric may be handled by the proposed crosstalk or interference metric alluded to earlier.

On Item 3.2.1, Packet Disassembly/Reassembly Time, a member of the group suggested breaking out the disassembly and reassembly components as separate metrics. The rationale offered was that each metric may have a different methodology and impact on the DUT. A concern was raised that the methodology hinted at in the discussion may not yield values that lend themselves to comparisons across varied systems. Another comment pointed out that integrating some BERT (Bit Error Rate Test) functionality may provide interesting information. Robert thought this was addressed in Item 3.2.3. Still another person suggested that the metric's discussion section needs to clearly declare that this is a black box test conducted on a system level. That is to say, results for this test are "indirectly" derived, as opposed to a clear box analysis requiring internal instrumentation to empirically collect the measurement.

A general discussion followed on the nature of tests addressed by this draft and others. The point was offered that there seems to be some general sets of tests: A) Tests that can be run; B) Tests that can be run AND provide USEFUL information; C) Tests that could provide useful information but do not lend themselves to practical execution.

Robert declared that Item 3.2.3, Full Packet Drop Rate, may fall into set C. While it may be useful to determine how a damaged or lost cell impacts a DUT's overall forwarding ability, it may not be straightforward to collect this metric. Moreover, methodological reliance on DUT internals may depart from the "black box" model.

On "Topology Table Size," Item 3.3.2, Craig thought there needs to be some consolidation concerning capacity with other BMWG works-in-progress. Others agreed. Additionally, there is some question as to what is meant by the word "supported" in the term's definition.

Item 3.3.3, Topology Table Learning Rate, was explained to have been built in a more general fashion than a parallel concept proposed in Mandeville's LAN switch draft.

The metric "blocking probability," Item 3.3.6, was offered in direct response to a suggestion by Mike O'Dell. Robert indicated that while he drafted the preliminary wording, he is concerned with the practical nature of the metric. It is further noted that for "non-meshed" and possibly "partially meshed" traffic distribution patterns, measurement collection may be reasonably straightforward; however, there is concern that a "fully meshed" traffic distribution may be more problematic.

It is recommended that Mr. O'Dell be pinged for input.

A question was raised as to whether Dr. Raj Jain was invited to review the work addressed by this draft. The chair acknowledged that a request was sent to Dr. Jain, but as of yet no reply had been received. The chair indicated that he would contact Jain again.

On the terms "congestion avoidance" and "congestion management," Items 3.5.1 and 3.5.2 respectively, Craig noted the current wording is fuzzy. He added that the intent is to try to ascertain how the DUT behaves in the presence of congestion. There was a solicitation for "bright ideas" on the topic.

On the concept of "Impact of Routing on Forwarding," Item 3.6.1, Robert noted the input of Curtis Villamizar. Robert felt this is reasonably easy to demonstrate. Some in the group had concerns about the methodological control of the metric.

"Impact of Congestion Control," Item 3.6.2, is another metric designed to explore the overhead of congestion control on the forwarding of data. The concept of "stable oscillation" is an important behavior to characterize.

On Item 3.7.1, Traffic Management Policing, Craig thought it would be a good idea to consolidate similar terms from other BMWG work, if appropriate. In addition, Robert thought the terms in Section 3.8, Multicast, could draw from or could be addressed in the multicast benchmarking terminology draft.

Robert concluded his session by stating that he had received outstanding input and hoped that folks would continue that practice.

The chair asked if there was any new business. With no new business introduced, the chair offered the following goals for the Munich session:

1. Edit the LAN switch draft to reflect the input from this session. Issue a new version of the document for comment. If appropriate, ascertain consensus on whether to recommend the draft for consideration as an RFC.

2. Take controversial components of the multicast draft to the mailing list for discussion. Incorporate changes into the draft and reissue as appropriate.

3. Continue to work the Cell/Call Terminology Draft. Reissue draft as appropriate. Try again to contact Dr. Jain. 

The group consented to these goals and the meeting was closed. 


1. Multicast Benchmarking Terminology
Attendees List
