2.4.2 Benchmarking Methodology (bmwg)

NOTE: This charter is a snapshot of the 42nd IETF Meeting in Chicago, Illinois. It may now be out-of-date. Last Modified: 17-Aug-98


Chair(s):

Kevin Dubray <kdubray@ironbridgenetworks.com>

Operations and Management Area Director(s):

Harald Alvestrand <Harald.Alvestrand@maxware.no>
Bert Wijnen <wijnen@vnet.ibm.com>

Operations and Management Area Advisor:

Harald Alvestrand <Harald.Alvestrand@maxware.no>

Mailing Lists:

General Discussion: bmwg@ironbridgenetworks.com
To Subscribe: bmwg-request@ironbridgenetworks.com
Archive: http://www.alvestrand.no/archives/bmwg/

Description of Working Group:

The major goal of the Benchmarking Methodology Working Group is to make a series of recommendations concerning the measurement of the performance characteristics of various internetworking technologies; further, these recommendations may focus on the systems or services that are built from these technologies.

Each recommendation will describe the class of equipment, system, or service being addressed; discuss the performance characteristics that are pertinent to that class; clearly identify a set of metrics that aid in the description of those characteristics; specify the methodologies required to collect said metrics; and lastly, present the requirements for the common, unambiguous reporting of benchmarking results.

Because the demands of a class may vary from deployment to deployment, a specific non-goal of the Working Group is to define acceptance criteria or performance requirements.

An ongoing task is to provide a forum for discussion regarding the advancement of measurements designed to provide insight into the operation of internetworking technologies.

Goals and Milestones:



Expand the current Ethernet switch benchmarking methodology draft to define the metrics and methodologies particular to the general class of connectionless, LAN switches.



Edit the LAN switch draft to reflect the input from BMWG. Issue a new version of document for comment. If appropriate, ascertain consensus on whether to recommend the draft for consideration as an RFC.



Take controversial components of multicast draft to mailing list for discussion. Incorporate changes to draft and reissue appropriately.



Submit workplan for continuing work on the Terminology for Cell/Call Benchmarking draft.



Submit workplan for initiating work on Benchmarking Methodology for LAN Switching Devices.

Aug 98    Submit initial draft of Benchmarking Methodology for LAN Switches.

Aug 98    Submit Terminology for IP Multicast Benchmarking draft for AD Review.

Sep 98    Incorporate BMWG input and continue to progress the Cell/Call Terminology Draft. Reissue draft as appropriate.

Sep 98    Submit first draft of Latency Benchmarking Terminology.

Dec 98    Submit Benchmarking Terminology for Firewall Performance for AD review.

Mar 99    Submit Terminology for Cell/Call Benchmarking draft for AD review.

Mar 99    Submit Benchmarking Methodology for LAN Switching Devices draft for AD review.

Jul 99    Submit Latency Benchmarking Terminology draft for AD review.


Request For Comments:







Benchmarking Terminology for Network Interconnection Devices



Benchmarking Methodology for Network Interconnect Devices



Benchmarking Terminology for LAN Switching Devices

Current Meeting Report

Benchmarking Methodology WG Minutes

WG Chair: Kevin Dubray

Minutes reported by Kevin Dubray

The Benchmarking Methodology Working Group met on Monday, 24 August 98, from
1930h to 2200h.
There were approximately 50 attendees.

Kevin Dubray opened the meeting with the presentation of the session's agenda:

The agenda was left un-bashed.

Dubray announced the BMWG had made good progress since the LA meeting: the
Multicast Benchmarking Terminology draft was undergoing minor editorial changes
following an AD review; the Firewall Performance Terminology draft was firming up
nicely, as the bulk of the "connection"-oriented issues seemed to be resolved on the
mailing list; new editors had resurrected the cell/call benchmarking terminology draft;
and the first methodology draft on LAN switch benchmarking was delivered.

David Newman was then introduced to lead a discussion on the Firewall draft.

David Newman identified the area of change from the last two drafts.
(See presentation in the Proceedings.)

A question was posed: "What sorts of attacks may impact performance?" David
explained that the I-D currently makes no attempt to benchmark firewall
performance while the device is under attack. He thought classifying the types of traffic
used in testing firewalls might help. To that end, David mentioned that the newly added
"illegal traffic" definition covers traffic used in an attack.

Newman identified the topics that he would like to address during the session:

- Bit forwarding rate. (It's hard to come up with the general case.)
- What classes of traffic?
- Benchmarking firewalls under attack.

David put up a slide (slide 12) of some historical categories that were thought to be
helpful in characterizing firewall performance. Examples included the number of email
messages or the maximum number of concurrent telnet sessions.

This led into a pointed discussion over using a bit versus a frame forwarding rate metric.
It was communicated that because firewalls handle things at various "layers," forcing
a frame or bit metric may be counterproductive. Cynthia Martin suggested appendices
to handle specifics related to the context of a particular technology, such as ATM.

Jeff Dunn again brought up the notion of PDUs (protocol data units) as units of
measurement. The measurement unit could then be a bit, an octet, or a frame. So
what characterizes a PDU? That's up to the tester. For example, one can define IP-
over-ATM PDUs, or TCP-over-IP-over-Ethernet PDUs, as counting units; again, the
unit of counting is a test realization left to the tester.

In general, the group seemed to acknowledge the need to move to a more generic,
transaction-based measurement paradigm. Specifically, the PDU-as-a-measurement-unit
approach received support.

Harald Alvestrand asked what one does about input/output mismatch; email relays are
an example where such mismatch occurs. David said that he believes "goodput" provides
for these mismatches.

With respect to firewall forwarding, David thought it was useful to construct a matrix
with regard to traffic classes. (See Slide 11 of the Newman presentation.)

A question was offered regarding provisions in the draft for overhead characterization
when the DUT is under attack. David replied that the matrix helps to provide for that
type of characterization.

David said that he would make the appropriate changes to the draft and attempt to get
the draft in shape for an AD review in December.

Jeff Dunn was introduced and he immediately introduced Cynthia Martin as his co-
conspirator in trying to pull the draft together.

Jeff gave an insightful, yet entertaining, presentation on the history, purpose, and status
of the Cell/Call Benchmarking draft. (Presentation included in the Proceedings.)

Jeff summarized the draft's motivation as providing metrics for NBMA technologies.

Jeff cautioned that while many technologies follow "connection-oriented" paradigms
(e.g. ATM & Frame Relay), these technologies may use clearly different components
to achieve the same end. Moreover, Jeff reinforced the need not to "re-invent" new
terms for known concepts but to leverage existing concepts to build understanding.

Because of the varied contexts with which the subject matter could appear, Jeff
requested the BMWG's help in populating the draft.

Jeff then presented the revised workplan with respect to the draft. The group had no issues.

Jeff then stated one of the unwritten goals of the draft: to characterize the effect of good
and poor throughput on higher layer functions. Jeff also said that the draft may
need to address SONET in the future.

With that, Jeff opened the floor for comments.

Dubray suggested that the "unwritten" be written into the draft (i.e., characterize the
impact of lower layers on higher layer performance).

An attendee asked: given the interest in IP over SONET and its associated
scrambling issues, how much related effort will go into this draft? Jeff replied that he
thought addressing IP over SONET merited investigation, especially as scrambling
may have a specific effect. He further pointed out that such investigation MUST be limited
to characterization, as opposed to a direct determination of whether the IUT were
scrambling correctly or not.

David Newman stated that it is nice and clean to say "security considerations: none,"
but is there traffic that could impair performance? Harald Alvestrand didn't think
normal benchmarking presented a corresponding security issue. Dunn agreed,
reminding folks that the context of the draft's benchmarks was a clinical exercise.
Moreover, he believed, the traffic content was more a methodological detail.

Someone wondered at what level references to "connection" pertain. Is "connection"
the same for Frame Relay as it is for ATM? In general, isn't the term "connection"
very layer-centric? Jeff replied that, yes, the draft is very layer specific, but he thought
it was appropriate to draw the required relationships when and where it made sense.

With no further questions, Jeff thanked the group for its input and asked them to continue
the discussion on the mailing list.

Word was passed to the chair that the editor was sitting in a plane on the tarmac in a
neighboring state. Dubray asked whether people thought that it would be productive to
start a discussion of the draft without one of its authors. The group thought that it would
be beneficial.

Dubray then attempted to lead a discussion on the first methodology draft for LAN
switching devices.

It was mentioned that the first paragraph of the draft's introduction states that the document
defines a set of tests. Shouldn't the document define the test methodologies specific to the
tests defined in RFC 2285? More generally, many thought the procedural descriptions
lacked the specificity required to generate consistent results. To the point: would two
people executing the same stated procedure get similar results on the same tested platform
(i.e., as a function of methodology)?

Another person questioned the prescribed reporting format in general. It was communicated
that a more structured approach might be needed. Another attendee asked: if we
require structured reporting formats, why not just cite ISO 9646?

There were several comments with respect to addressing and frames. One attendee asked
where the method for determining how packets are destined to specific ports was stated.
Another questioned the wisdom of validating frame delivery based on address scrutiny
alone; it would be better to have an independent validity check, such as a tag embedded
in the frame.
Another person noted that several rates are identified throughout the draft. How are these
rates calculated? What source(s) specify the calculations? A similar comment was
received regarding burst size.

One person took issue with priming the device's address tables prior to a test run, as
advocated in section 3, Test Set-up.

Another person identified inconsistencies among spanning tree operation, this
draft, and previous BMWG work. For example, RFCs 1242 and 1944 require the
Spanning Tree protocol to be enabled; this draft's section 3 suggests disabling
Spanning Tree operation; and section 5.9 makes provisions for spanning tree operation.
David Newman offered that this is most likely based on experience that Spanning Tree
cannot be enabled or disabled on all IUTs. Another voice asked how one
could form a basis for fair comparison if spanning tree was on in one DUT but not
in another.

On the topic of address learning rate, the question of consistency of results was raised. It
was thought that the stated methodology was not adequate for determining a known
state, thereby compromising consistency. Deborah Stopp believed that flushing the tables
would go a long way toward getting to a known state. It was generally agreed that getting the
DUT to a known state was beneficial and that the procedures for attaining a known state
were lacking in the current methodological descriptions.

On the monitoring of flooded traffic, a question was raised as to whether flooded traffic
is counted as a good or bad event. Kim Martin thought it useful to report offered load
and forwarding rates with respect to frame size.

It was thought that the draft occasionally presents definitive (and potentially questionable)
conclusions that have no place in a document defining test methodology. It was thought
that the document would be better served by defining input parameters, test procedures,
and test outputs.

A question was raised asking why the draft was released with so many sections marked
"to be done." The chair responded that the authors had acted on his request: the metrics
defined were modular enough to be addressed individually. Moreover, the scope of the
draft was discrete enough (having been defined by RFC 2285) that the approach of
garnering commentary in a piecemeal fashion was not unsound.

The chair also responded to a charge made on the BMWG mailing list that the draft was a
vehicle for a vendor-specific implementation: the chair thought ad hominem attacks were
counterproductive; the presentation of alternatives to questionable metrics and methods
was by far more productive. By having the BMWG choose the best _presented_ solution,
the networking community and its vendors would be best served.

Dubray postponed discussion of Latency issues until a later date.

With that, Dubray summarized the BMWG goals for the next session:


Call/Cell Benchmarking Terminology ID
