Benchmarking Methodology WG (BMWG)
Tuesday, March 12, 2013
0900-1020 Morning Session I Boca 2 OPS bmwg
This report is arranged in 2 parts, a summary and detailed minutes.
The report was prepared by Al Morton, with detailed notes from Barry Constantine as official note taker. Joel Jaeggli and Bill Cerveny monitored jabber.
BMWG met with 25 people in attendance. The meeting began and ended exactly on time.
One RFC was approved and published: the RFC 2544 Applicability Statement (RFC 6815). WGLC is complete for IMIX Genome; the consensus call on the result awaits appointment of our new AD Advisor, Joel.
Two WG charter topics need renewed attention: Content-Aware Device Benchmarking and SIP Device Benchmarking (where there are substantial IETF Last Call comments, and revised drafts are needed).
Four New Work Proposal topics were presented:
Power Benchmarking represents an important but challenging area for BMWG. A key point is to coordinate with other "green" activities and to make sure that the metrics meet the user community's needs. Folks have read this draft and are interested in working on it.
Traffic Management Benchmarking needs some refinement to match the usual BMWG role of vendor comparison, but there is clear interest on the list. Support for both of these items will be tested on the list.
IPv6 Neighbor Discovery discussions yielded clarifications of the metrics and identification of normal, capacity, and overload stages of testing. This is an early draft with reasonable interest, and it will benefit from further development and feedback.
A range of work areas related to Datacenter Benchmarking were proposed (presentation only). There appears to be useful work in this space beyond the existing data center bridge draft, so comments and further proposals are sought.
The New In-Service Software Update draft was not presented, but it was noted that there was some interest and useful discussion on the list prior to the meeting.
The WG thanked Ron Bonica for his efforts as AD Advisor, and successfully made it through another session without anyone proposing to benchmark Facebook.
Action Items:
- Interest call on Power Benchmarking Draft
- Interest call on Traffic Management Benchmarking
- Re-chartering Discussions (Draft text)
- Brief Liaison Reply to MEF
DETAILED MINUTES (provided by Barry Constantine with help from Ram Krishnan while he presented)
Al kicked off the meeting and welcomed 4 new attendees and invited them to join the BMWG.
Al welcomed Joel Jaeggli who is our new OPS Area Director and Advisor.
Working group activity summary:
The draft-imix-genome has finished WGLC and Joel will call consensus. He stated this would occur after IETF 86.
The SIP drafts accumulated a lot of comments during IETF Last Call and new revisions are required to move forward.
New RFC 6815, "Applicability Statement for RFC 2544: Use on Production Networks Considered Harmful", was published.
The standard Security paragraph is in the slide deck, and Al gave the reminder that benchmarking is a lab test, not intended for production networks, and that this paragraph should be included in all BMWG memos.
Al spoke about the liaison from the MEF regarding service activation testing. BMWG has previously clarified that RFC 2544 is not intended for service activation, and Al plans to reply referring the MEF to direct future liaisons on this topic to IPPM.
Working Proposal Summary
Al reviewed the proposed-work matrix summary and the various criteria used to rate the working group's proposed work. DCB is a chartered item but lacks significant support, both at meetings and in terms of reviews.
Review of Various Individual Submissions
1. Power Benchmarking
Vishwas Manral presented Power Benchmarking on behalf of Puneet Sharma, Sujata Banerjee and Yang Ping.
HP has done extensive work in power benchmarking and has used this methodology on its own devices. The problem they are trying to solve: "power is the biggest unmanaged expense in the enterprise today; 50% of servers are left powered on in the evening and not being used; potential savings is $50B per year (ballpark number)".
Maximum rated power (MRP) is the standard metric provided by vendors and is not useful in terms of real-world power consumption. A real way to measure the true power usage of devices is needed, including performance reports and a means to compare performance reports against true power consumption.
This work would define the relationship between network traffic intensity and device processing on one hand, and power consumption on the other. The key metric is NECR (Network Energy Consumption Rate), which correlates Mbps of network traffic with the increase in milliwatts consumed. NEPI was also discussed, which is the ratio of ideal (predicted) to actual measured power.
Vishwas discussed the various other standards groups that have worked in the power consumption domain but explained that IETF BMWG work goes beyond just the measurements and into reporting aspects as well.
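The NECR/NEPI discussion above could be sketched as follows. This is a minimal illustration only: the function names, the units, and the choice of a least-squares slope are assumptions for this sketch, not definitions taken from the draft.

```python
# Hypothetical sketch of the NECR and NEPI metrics as described in the
# presentation; exact definitions belong to the draft.

def necr(loads_mbps, powers_mw):
    """Network Energy Consumption Rate: the incremental power cost of
    traffic, estimated here as the least-squares slope of measured
    power (milliwatts) versus offered load (Mbps)."""
    n = len(loads_mbps)
    mean_x = sum(loads_mbps) / n
    mean_y = sum(powers_mw) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(loads_mbps, powers_mw))
    var = sum((x - mean_x) ** 2 for x in loads_mbps)
    return cov / var  # mW per Mbps

def nepi(ideal_power_mw, measured_power_mw):
    """Ratio of ideal (predicted) power to actual measured power,
    per the discussion in the meeting."""
    return ideal_power_mw / measured_power_mw

# Example: power rises from an idle floor of 50 W by 2 mW per Mbps.
loads = [0, 250, 500, 750, 1000]           # Mbps
powers = [50_000 + 2 * l for l in loads]   # mW
print(necr(loads, powers))                 # -> 2.0 (mW/Mbps)
```

An ideal, fully energy-proportional device would have a NEPI of 1; measured power higher than predicted drives the ratio below 1.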
- Another metric, Typical Power Consumption, was described by Lucien Avramov. The type of cable was also discussed, along with the need to test with various cable types; differences as great as 25% were observed. Load does matter in terms of power consumption; Lucien has seen that packet size does not affect power consumption nearly as much. Lucien emphasized that typical power consumption is a significant missing metric in the industry.
- A question was asked concerning the level of overlap with other standards bodies. Vishwas answered that the authors' study of the other groups did not reveal any metric/benchmarking work, and welcomed others to review and concur with their conclusion. Scott re-emphasized the need to communicate with the other SDOs to verify that there is no overlap and that this work would be useful to the industry.
- Ramki suggested that the scope should be extended to servers / storage in addition to networking devices; Al indicated that is out of scope of BMWG, but noted that the work may be applicable to servers / storage (after the networking aspect is fully covered).
- Shangjin Jeong commented that the operating conditions of line cards (active, idle, sleep, etc.) may affect the power consumption of the device, so it might be useful to consider operating conditions for more accurate power benchmarking. This work is very useful for enterprise customers.
Al asked who had read the draft and whether BMWG should pick up this work. There was sufficient interest to make this a working group draft; the decision will be finalized on the list.
2. Traffic management benchmarking (Barry Constantine presented, Ramki Krishnan took notes)
- Al Morton comment: need scalability test (similar to a real deployment) - multiple queues
- Scott Bradner comment: go higher than policer/shaper rate – many devices fail when they have to go over the specified rate
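Scott's point about exceeding the configured rate can be illustrated with a simple token-bucket policer model. This is a generic sketch, not a definition from the draft: the single-rate policer, the CIR/CBS parameter names, and the traffic pattern are assumptions for illustration.

```python
# Minimal single-rate token-bucket policer model (illustrative only).

class TokenBucketPolicer:
    def __init__(self, cir_bps, cbs_bytes):
        self.cir = cir_bps      # committed information rate, bytes/s
        self.cbs = cbs_bytes    # committed burst size, bytes
        self.tokens = cbs_bytes
        self.last = 0.0

    def offer(self, t, size):
        """Offer a packet of `size` bytes at time t (seconds).
        Returns True if conformant (forwarded), False if dropped."""
        self.tokens = min(self.cbs,
                          self.tokens + (t - self.last) * self.cir)
        self.last = t
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False

# Offer traffic at twice the policed rate: once the initial burst
# allowance is consumed, roughly half the packets should conform.
p = TokenBucketPolicer(cir_bps=125_000, cbs_bytes=1500)  # ~1 Mbit/s
sent = passed = 0
t = 0.0
while t < 10.0:
    passed += p.offer(t, 1500)   # 1500 B every 6 ms ~= 2 Mbit/s offered
    sent += 1
    t += 0.006
print(passed / sent)             # expect roughly 0.5
```

Driving the device above the configured rate, as suggested, checks whether the implementation degrades gracefully to the committed rate or fails outright.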
3. Benchmarking Neighbor Discovery Problems – Bill Cerveny
- Bill explained the background of the existing RFC 6583, "Operational Neighbor Discovery Problems".
- An IPv4 network has 510 addresses, but an IPv6 subnet has 2**64 addresses.
- Scott Bradner brought up the point that this type of scanning may be considered DoS
- Joel Jaeggli asked, "does the device under duress continue to perform NDP for the hosts that it knows about, or not?" Joel emphasized that it is good to know that a device can handle 50,000 NDP messages per second, but the real question is what happens to a device under that type of load. Behavior under normal operation (capacity) and what happens when the device is in an overload state was a key point. Joel indicated that when a switch is procured, we need to know how many NDP messages per second the switch can handle.
- Ron re-emphasized the point that the capacity needed to be benchmarked and the behavior of the device after capacity is exceeded.
- Bill discussed his lab and reviewed its topology. He also reviewed a more comprehensive test network idea, noting that a test system could be used instead of network device(s); this would greatly expand the size of the "network". Scott felt strongly that a test system should be used, as many of the performance metrics are built into the test system.
- Bill reviewed the metrics he proposed in the -00 document. Frequency of ND triggering was easy to test. Scott pointed out that measuring the device's adherence to the ND rate could be considered benchmarking (Bill wondered if this would be considered compliance testing).
- Bill asked whether ND performance should be benchmarked, or just the unusual behavior after capacity is exceeded. Scott emphasized that BMWG does not try to "break" a device, but rather determines whether it performs according to its specifications. Being able to determine the size of the subnet the device can handle before it "dies" would be a useful metric.
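The subnet-size point raised above can be illustrated with back-of-the-envelope arithmetic; the probe rate here is an assumed figure for illustration.

```python
# Why exhaustively scanning an IPv6 /64 (the scenario behind the
# ND-exhaustion concern) is infeasible at any realistic rate.
ipv6_subnet = 2 ** 64            # addresses in a /64
probe_rate = 1_000_000           # probes per second (assumed)
seconds = ipv6_subnet / probe_rate
years = seconds / (365 * 24 * 3600)
print(f"{years:,.0f} years")     # hundreds of thousands of years
```

Even so, an attacker does not need to finish the scan: a sustained stream of probes to nonexistent addresses is enough to exhaust a router's neighbor cache, which is why the overload behavior matters.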
4. Data Center Benchmarking Draft – Jacob Rapp
- Jacob reviewed the fact that RFCs 2544, 2889, and 3918 are the de facto benchmarks in the data center.
- Virtualization, low latency, big-data bursts, etc., are not covered by the current RFCs, and this is a big hole in data center benchmarking.
- Scott re-emphasized that testing above full utilization is not a reliable benchmark; the variation can be as high as 20%, and he does not use over-utilization in his own testing.
- Ramki Krishnan asked how this draft would address the application mixture and how it would be tested. Jacob presented many of the concepts in the proposed draft including refined definitions and methodology that are relevant to the DC. Jacob used the example of FIFO versus LIFO testing and also included the fact that stateful and stateless traffic should be included.
- Goodput, in terms of application throughput, will also be promoted heavily in the draft. "Mice flows" versus "elephant flows" were discussed, along with the importance of understanding goodput for mice flows rather than just elephant flows (most performance studies focus on elephant flows).
- Al recommended that this work should be pared down to focus on the “low hanging fruit”
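The goodput point above could be sketched as follows. This uses one simplified, common definition (unique application payload delivered over the transfer duration, excluding retransmissions); the function and parameter names are assumptions, not terms from the draft.

```python
# Illustrative goodput calculation, distinguishing application-layer
# throughput from raw link throughput.

def goodput_bps(app_bytes_delivered, retransmitted_bytes, duration_s):
    """Goodput counts only unique application payload delivered,
    excluding retransmitted bytes, over the transfer duration."""
    return 8 * (app_bytes_delivered - retransmitted_bytes) / duration_s

# A 100 MB transfer with 5 MB of retransmissions over 10 seconds:
print(goodput_bps(100_000_000, 5_000_000, 10.0))  # -> 76000000.0 bps
```

For short mice flows, connection setup and slow-start dominate, so goodput can fall far below link throughput even with no loss, which is why measuring mice flows separately from elephant flows matters.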
Closure: Al thanked out-going AD Advisor Ron Bonica and closed the meeting.