Benchmarking Methodology WG (bmwg)
Tuesday, Afternoon Session I, 1300-1500
Room Name: Spinnaker (Audio Channel 3)
CHAIR(s): Al Morton <acmorton@att.com>

The report of the BMWG session at IETF-67 is divided into two main sections:
- Summary of the meeting and Action Items
- Detailed Minutes of the meeting (typical of a Jabber log)

This report was prepared by Al Morton, based on the notes/minutes compiled
by Tom Alexander as official note-taker.

35 people signed the blue sheet, and about 5 participated remotely.

Session Audio: http://limestone.uoregon.edu/ftp/pub/videolab/media/ietf67/
Jabber Log: http://www.ietf.org/meetings/ietf-logs/bmwg/2006-11-07.html
Slides: https://datatracker.ietf.org/public/meeting_materials.cgi?meeting_num=67

BMWG Session Summary
--------------------

The next Publication Request will be for the Address Hash and Bit Stuffing
draft. It will be revised one more time, but it is in very good shape
editorially and the last few issues appear to be closing rapidly.

The Accelerated Stress Benchmarking drafts and the IPsec drafts are
scheduled next on the milestone list. Accelerated Stress has really
benefited from WGLC and from continued comments beyond the deadline.
Reviewers for the IPsec drafts have been found, so we should be able to put
that development back on track.

We had good discussions of the new work items: Protection and IPv6. We
continue to see a steady stream of new work proposals, including Multicast
VPN Scalability and WLAN switches at this meeting. The returning proposal
on SIP performance benchmarks has a new methodology draft in addition to
the terminology draft (which must be coordinated with the work being
conducted in SIPPING).

General Action Items:
---------------------
* Everyone's review is needed to find editorial problems and fix them.
* When you review drafts, send your comments to the list, not just to the
  editors.
* Editors need to be responsive to comments, and to post drafts between
  meetings to promote more discussion on the list.

Specific Action Items:
----------------------
* Gerry Wilson volunteered to review the IPsec drafts.
* Editors to consider where to place the "standard paragraph" in their
  memos.
* Co-chair to work with the Area Directors on revised charter wording.

Detailed Minutes of the Meeting:
--------------------------------

0. Agenda bashing

See https://datatracker.ietf.org/public/meeting_materials.cgi?meeting_num=67
for slides.

The meeting started at 1305 hours Pacific time. Tom Alexander was note
taker. Al gave people a few minutes to trickle in from lunch. About 35
people attended, and about 5 participated remotely using Jabber and the
audio stream. Al checked audio quality on Jabber; Jay Karthik responded
that he could hear clearly.

About 10 people indicated (in response to an informal poll) that they were
new to BMWG. As a consequence, Al briefly described the charter and work of
the BMWG. Al went over the IETF IPR notice on the yellow sheet and
encouraged people to note it well. He then reviewed the agenda and
requested bashing, if any. It was noted that Scott Poretsky was not
present, and so Al would present briefly on the Accelerated Stress topic.
New work included SIP performance benchmarking, multicast VPN scalability,
and WLAN switches. There was no feedback on the agenda, and so it was
approved.
1. Working Group Status (Chair)

Topics/Drafts not covered by presentations below:

Hash and Stuffing Draft (WGLC completed)
http://www.ietf.cnri.reston.va.us/internet-drafts/draft-ietf-bmwg-hash-stuffing-06.txt
Change Log for 06:
http://www1.ietf.org/mail-archive/web/bmwg/current/msg01273.html

Terms and Methods for Benchmarking IPsec Devices (WG review and feedback
is needed!)
http://www.ietf.cnri.reston.va.us/internet-drafts/draft-ietf-bmwg-ipsec-term-08.txt
http://www.ietf.cnri.reston.va.us/internet-drafts/draft-ietf-bmwg-ipsec-meth-01.txt
(Recently expired; see http://tools.ietf.org/wg/bmwg/ for these versions)

Methodology for Benchmarking Network-layer Traffic Control Mechanisms
http://tools.ietf.org/wg/bmwg/draft-ietf-bmwg-dsmmeth/draft-ietf-bmwg-dsmmeth-02.txt

Al reviewed the WG status. He noted that there were two topics at AD/IESG
review. The terminology for reservations
(draft-ietf-bmwg-benchres-term-07.txt) had a DISCUSS against it. Al briefly
explained what a DISCUSS was, and said that he'd been working with the
coauthors and they were planning to resolve it by November 10th. Also,
David Kessens had reviewed the IGP dataplane convergence
terminology/methodology drafts, and he would like to have them revised
before he puts them on the IESG agenda.

The drafts maturing toward the WG Last Call process were then discussed.
Al briefly gave the status of the accelerated benchmarking terminology and
methodology; he noted that comments were coming in even after the last call
was over. Al gave a brief description of the hash and stuffing I-D and the
reasons it was important, and noted that two people (Tom Alexander and Al)
had comments, most of which were resolved.
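(As background on the hash and stuffing idea: the draft's concern is that
repetitive test addresses can interact unfairly with vendors'
address-hashing implementations, so test traffic should use pseudorandom
addresses instead. The sketch below illustrates the general idea using one
common convention for synthetic unicast MACs; the function name and the
exact bit rules are assumptions for illustration, not the draft's text.)

# Illustrative sketch only (not from the draft): generate a pseudorandom,
# locally administered unicast MAC address for test traffic, so that a
# DUT's address-hashing function sees no convenient bit pattern.
import random

def random_test_mac() -> str:
    octets = [random.randrange(256) for _ in range(6)]
    octets[0] &= 0xFE  # clear the group bit: keep the address unicast
    octets[0] |= 0x02  # set the locally administered bit
    return ":".join(f"{o:02x}" for o in octets)

print(random_test_mac())  # e.g. "06:3a:91:c4:0e:7b"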
Finally, Al discussed the IPsec drafts. Al noted that the drafts had
expired, but Merike Kaeo (one of the coauthors) was present and they had
discussed how/if to bring them back to life. Merike stated that more people
were starting to be interested, primarily because of IPv6; her plan at
present was to work with one of the coauthors and revise the drafts for WG
Last Call. She would also start a new document on IKEv2 methodology,
beginning as an individual document and continuing to a WG draft. Al asked
for volunteers to read the IPsec drafts; Gerry Wilson volunteered to read
and review them.

Al covered the remaining I-Ds that were in progress, some of which had
expired by plan; he wasn't going to do anything about those. There were no
drafts before the RFC Editor at the time. After this, he mentioned all the
new work proposals: SIP terminology/methodology, MPLS methodology, LDP
convergence, MVPN scaling, and Wireless LAN Switching.

A brief description of an updated charter for BMWG was given. This is a
work in progress, in conjunction with the ADs. Also, a supplementary web
page has been created at http://home.comcast.net/~acmacm/BMWG, which
contains general information about BMWG processes and tools, and new work
I-Ds in progress. Al noted that if you include the letters "bmwg" somewhere
in the filename, the "tools team" page will automatically include links to
the draft on their version of the BMWG web page, and will also keep track
of revisions and changes for the chartered work items. In the last 2-3
years there has been a considerable renaissance in the tools and the
automatic processing functions. Al also discussed the nits checker, and
encouraged people to become familiar with all of the new tools.

A new RFC had been published, on Differentiated Services Traffic Control
Terminology. One of the coauthors, Jerry Perser, was present, and the group
gave a round of applause.

There is now a fairly stable "standard paragraph" covering introduction and
security aspects, to be placed in all BMWG I-Ds and RFCs. Al suggested that
the paragraph might be placed near the beginning (introduction) rather than
in the security section (as he originally proposed). He noted that the IPv6
authors "spoke with their keyboard" - they placed the paragraphs in the
security section. The jury was still out as to exactly where it would be
best positioned, but this text should be placed in the I-D somewhere now.
(Kevin Dubray remarked via Jabber that RFC 3116 has a decent "reference" to
a stock BMWG security section, but his comment was not read out during the
meeting.)

2. Techniques for Benchmarking Accelerated Stress Testing

http://www.ietf.org/internet-drafts/draft-ietf-bmwg-acc-bench-term-10.txt
http://www.ietf.org/internet-drafts/draft-ietf-bmwg-acc-bench-meth-06.txt

Having covered the status and work-in-progress items, Al went on to the
presentations on current I-Ds. The first was the accelerated stress I-D; Al
presented the topic without slides. He noted that this was a fairly
complicated topic, but now was the time to read and review it. Al noted
that we're looking for the editors to be very responsive to comments on
this (and other) reviewed drafts. The milestone for Accelerated Stress
Testing is to reach consensus and request publication by December of 2006.
(Kevin Dubray remarked via Jabber that folks should COMMENT TO THE LIST
rather than responding directly to the principal authors, as it will help
our ADs assess WG support, but his comment was not read out during the
meeting.)

3. IPv6 Benchmarking Methodology

Presenter: Ahmed Hamza
Draft:
http://tools.ietf.org/wg/bmwg/draft-popoviciu-bmwg-ipv6benchmarking-02.txt

The next presentation was on IPv6 benchmarking methodology, Ahmed Hamza
presenting. Ahmed noted that a lot of feedback had been received from BMWG
as well as the v6ops group, and the new version reflected this feedback. He
first covered the action items from the last meeting, all of which had been
done (including the proposal to request an address space allocation from
the IANA). The appendix had been updated with line rates for jumbo frames.
Ahmed thanked the large list of reviewers for their efforts, as well as the
test tool vendors who were planning to implement the methodology. The
current status: it had been accepted as a BMWG chartered item, rev 02 had
been published, much feedback had been received (all positive) and more was
solicited, and no objections had been received from either BMWG or V6OPS.
Al noted that the draft name will be changed to reflect the fact that it is
now a full BMWG working draft, and so it will start over at revision 00.

Ahmed asked if the group would consider the draft ready for last call, as
it had been reviewed by so many people. Al asked if there were any other
groups that would provide feedback; Ahmed noted that to the best of his
knowledge only the IPv6 and BMWG groups would be interested in providing
feedback, but he would encourage any group to comment if they were
interested. Al asked how many people in the room had read the draft; quite
a few responded.

Scott Bradner had two comments. First, he remarked that there was a
requirement to monitor CPU utilization, etc., but then nothing was stated
as to what to do with it. Ahmed responded that there were a lot of
implementation-dependent details involved in exactly what is reported.
Scott suggested that implementers might be informed that they would find it
useful to monitor CPU utilization, but not be required to do so (the
wording could be changed slightly to reflect this). The other comment was
purely political: rather than instructing IANA to do something, the draft
should request that IANA do it. Scott offered to give him further details
offline. Al then thanked Ahmed, Diego Dugatkin, and the other co-authors
for the presentation and their progress.
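(As background on the jumbo-frame line rate appendix mentioned above: the
theoretical maximum Ethernet frame rate follows from the fixed per-frame
overhead on the wire. The sketch below is a rough illustration of that
arithmetic, not the draft's appendix; the 20-byte overhead figure assumes
the standard preamble, start-of-frame delimiter, and inter-frame gap, and
the function name is hypothetical.)

# Sketch: theoretical Ethernet frame rate for a given frame size.
# Per-frame overhead on the wire: 7 B preamble + 1 B SFD + 12 B
# inter-frame gap = 20 bytes beyond the frame itself (which already
# includes header, payload, and FCS).

ETH_OVERHEAD_BYTES = 20

def max_frames_per_second(link_bps: float, frame_bytes: int) -> float:
    """Theoretical maximum frames/s for a frame of `frame_bytes`
    on a link running at `link_bps` bits per second."""
    return link_bps / ((frame_bytes + ETH_OVERHEAD_BYTES) * 8)

# 9000-byte jumbo frames on 10 GbE: ~138,581 frames/s.
print(round(max_frames_per_second(10e9, 9000)))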
4. Protection Mechanisms

Presenter: Jean-Louis Le Roux
Drafts:
http://www.ietf.org/internet-drafts/draft-ietf-bmwg-protection-term-00.txt
http://www.ietf.org/internet-drafts/draft-ietf-bmwg-protection-meth-00.txt

The next presentation was on sub-IP protection benchmarking, by JL Le Roux.
He started with a brief history of the work, which began three years ago.
He then reviewed the updates from the previous versions, and how the
various contributory drafts had been merged in. The next steps were
discussed, and specifically the areas that would benefit from more
extensive WG review.

Quite a few comments were received via Jabber. Rajiv Papneja noted that the
authors would be responding to the comments from Arun Gandhi. Al remarked
that Arun had covered many of the same issues he himself saw as he read the
drafts. JL Le Roux then said that he considered the two drafts quite
stable, and that he was looking forward to WG Last Call. He then thanked
the people who had helped with and contributed to the drafts.

David Kessens said that he had a general comment: there was a need for more
review of the documents. For example, three documents had been sent to the
IESG and were sent back because they had not been reviewed enough. He said
that this was not a comment to the authors alone, but one which the whole
BMWG community must take responsibility to address. Al echoed this comment,
and encouraged people to read and review the drafts to make them better
products.

Jerry Perser said that he wanted to know how to apply this work outside of
MPLS. Al said that Silvija's document on multicast VPNs has some failure
scenarios as well. JL remarked that the terminology is quite general and is
not linked to MPLS; the methodology, however, is linked directly to MPLS.
Jerry noted that he was working on wireless mesh protection schemes -
sub-IP, but not relying on MPLS - that could use this work.

There was considerable discussion of the definitions of failover event, the
stimulus (failure), and protection switching. Al noted that there was no
definition of protection switching in the drafts, despite the word
"protection" appearing in the title. JL said that Arun had made some
comments along these lines. Al suggested that general terms should be used
the same way every time (restoration is sometimes described as failover
recovery). Scott Bradner said that the term "sub-IP" may be loaded and is
not recommended: it meant something different to other WGs, and there used
to be a special sub-IP Area, which is now closed. Scott Poretsky remarked
(via Jabber) that the terminology is general, and asked people to read the
terminology draft, specifically with respect to failover events. JL
explained this as well.

Scott Bradner asked if they were trying to measure the length of time to
detect a failure, or the length of time to switch? The answer was "both."
Scott B. responded that putting two unknowns into one equation is not going
to help get the answer; you may want to split the equation, especially as
the two components (detection and response) are so different that it is
better to keep them apart. This is partly because detection methods have
widely varying responsiveness, depending on whether they use techniques
like carrier detection or heartbeats to detect failure. JL noted that in
the methodology they had tried to list a set of failure events (the causes
of failure) that could initiate protection switching.

The Jabber session suddenly came alive with responses from the coauthors
and others. Al read out the various responses, and remarked that there was
a bit of a backlog and some of the comments could not easily be matched to
the discussion. Al summarized that there was some disagreement from the
coauthors about Scott Bradner's suggestion to split the time into detection
and switching.

Scott B. said that when talking about packet loss, what they really meant
was packet loss at a particular link speed. In his view this was a somewhat
backwards way of expressing the switching delay. Al said that he agreed
that these measurements should be time-based rather than packet-loss-based
as much as possible, and that this was one of the reasons we have both
loss-based and rate-based convergence times in the IGP convergence drafts.
Al further clarified that the switching times expected from protection
technologies are much faster than the routing convergence times measured in
the past.

There was considerable jabber on Jabber, and more discussion took place
between the remote participants, JL, Al, and Scott B. Scott Poretsky took
the position (on Jabber) that IGP convergence times are <1 second, so the
same methods that were applied there could be applied to protection time
assessment. Scott B. noted that if you are sending a series of timestamped
packets through, you can say that we lost packets X through Y, and convert
that to a time; personally, he felt that it would be better to measure it
as a time rather than as a packet loss.

Al then cut off the discussion, as many other topics had to be covered.
Jerry suggested that the discussion be continued on the reflector. Al then
thanked JL for doing a good job on a controversial presentation, and said
that he had some other comments - specifically, that the methodology draft
is in horrible shape editorially and needs to be fixed before people can be
encouraged to review it - but he would be glad to discuss this offline.
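(To illustrate the loss-derived timing being debated: with a constant
offered load, a count of lost packets converts directly to an outage time,
which is the "loss-based" figure; timestamps on the lost packets give the
"time-based" figure directly. The sketch below is a minimal illustration
under those assumptions; the function and variable names are hypothetical
and the code is not text from the protection or IGP convergence drafts.)

# Sketch: derive a failover time from packet loss at a constant offered
# load. Assumes the tester sends sequence-numbered packets at a known,
# constant rate and counts how many went missing during the event.

def loss_derived_failover_time(packets_lost: int,
                               offered_rate_pps: float) -> float:
    """Failover time in seconds inferred from packet loss; valid only
    while the offered load is constant (the 'loss-based' figure)."""
    return packets_lost / offered_rate_pps

def timestamp_derived_failover_time(first_lost_seq: int,
                                    last_lost_seq: int,
                                    send_times: dict[int, float]) -> float:
    """Failover time from the send timestamps of the first and last lost
    packets ("we lost packets X through Y, and convert that to a time")."""
    return send_times[last_lost_seq] - send_times[first_lost_seq]

# Example: 50,000 packets lost at 100,000 pps ~ 0.5 s of outage.
print(loss_derived_failover_time(50_000, 100_000.0))  # 0.5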
5. Milestone Status (Chair)

Al then reviewed the milestones. He noted that we were a little bit behind
on the hash and stuffing draft, but he felt that the comments from the last
call had been dealt with and we could move on. Jerry Perser had a comment
on the hash and stuffing draft: he wasn't implementing it any more, because
random MAC addresses and security didn't mix very well in the wireless
world and led to more support calls. Al asked that folks take that up
offline or on the list. Al reviewed the new work evaluation matrix, and
noted that the SIP work now had two drafts supporting it, both terms and a
methodology.

6. SIP Performance Benchmarking Proposal

TEXT: http://www1.ietf.org/mail-archive/web/bmwg/current/msg01291.html
Presenter: Carol Davids
Related Drafts:
http://www.ietf.org/internet-drafts/draft-poretsky-sip-bench-term-01.txt
http://www.ietf.org/internet-drafts/draft-poretsky-bmwg-sip-bench-meth-00.txt

The next presentation was on SIP performance benchmarking, by Carol Davids.
Carol said that she would not talk too much about motivation, as we are all
here because we are motivated to create benchmarks. She remarked that
service providers could use this work for black-box tests of SIP devices.
She then discussed the scope of the terminology and methodology, as well as
the distinction between a DUT and a SUT vis-a-vis SIP benchmarking. Al
commented that we don't standardize compliance testing, but in one of the
drafts the statement was made that the device must be compliant with RFC
3261. He suggested that there should be an informative pointer to where you
could go to find something to help you assess compliance - an informative
reference or something, certainly nothing normative.

Carol gave a short overview of the terminology draft, and remarked that
work had been done as much as 8 years back on a terminology and a
methodology for testing telephony services, or VoIP services. At this
point, however, SIP is proposed for much more than telephony, and the
terminology has been expanded a bit to encompass these services. Also,
several benchmarks have been defined, which fall into three categories:
signaling plane, media plane, and (within the signaling plane) INVITE vs.
non-INVITE transactions.

Daryl Malas remarked, first, that his draft was specific to SIP and could
be just as useful for presence or anything like that, as well as voice;
second, he asked what is meant by "associated media sessions"? He noted
that RFC 3261 doesn't really define "session" that well, and the SIPPING
draft defines the session in order to define things like session rates. He
felt that the BMWG should not define a session differently from what is
defined in the SIPPING draft. Daryl said that SIP is specifically a control
protocol, and so it is necessary to specify what is being tested. The
SIPPING WG draft discusses setup and disconnect delay as well, and it was
essential that the metrics themselves be clearly different between the two
(in addition to the differentiation between end-to-end vs. device testing).
Carol responded that a session is defined as a media session; a
better-defined term was "call". Her concern was that we should be able to
define how we are stressing our network or our box with the appropriate
benchmark. So, we have to be able to add up all the metrics for all the
individual boxes, and the sum should then equal the metric that the SIPPING
group is measuring. Al then cut the discussion off because they were
running out of time, and suggested that they resolve this issue offline.

Carol quickly reviewed the methodology draft as it stood, and discussed the
next steps to complete the methodology and terminology. Al thanked Carol
and asked if anyone had read the draft. About 3 people besides the
coauthors had read it - not a lot of response, but he felt that we could
make something out of this eventually. The mailing list should be used, and
definitions should be coordinated.
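(Carol's additivity argument can be illustrated simply: if each box's
contribution to setup delay is benchmarked in isolation, the per-device
figures plus path delays should reconstruct the end-to-end delay that a
SIPPING-style measurement reports. The decomposition below is a
hypothetical sketch for illustration only; it is not a definition from
either draft, and the function name and simple summation are assumptions.)

# Sketch: if session setup delay is benchmarked per device, the
# per-device figures (plus inter-device link delays) should reconstruct
# the end-to-end delay that an end-to-end measurement would report.

def end_to_end_setup_delay(device_delays_s: list[float],
                           link_delays_s: list[float]) -> float:
    """End-to-end SIP session setup delay modeled as the sum of
    per-device processing delays and inter-device link delays."""
    return sum(device_delays_s) + sum(link_delays_s)

# Proxy, SBC, and registrar each add processing delay; links add
# propagation delay. Total here: ~0.075 s.
print(end_to_end_setup_delay([0.020, 0.035, 0.010], [0.005, 0.005]))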
7. Multicast VPN Scalability Benchmarking

Brief statement of proposed work: Multicast VPN (MVPN) is a service
deployed by VPN service providers to enable their customers to use IP
multicast applications over VPNs. With its increased popularity, the
scalability of deploying such a service is becoming of great interest. This
document defines standard metrics and test methodology for characterizing
and comparing the control-plane MVPN scalability of Provider Edge (PE)
devices that implement the MVPN service.

Presenter: Silvija Dry
Related Draft:
http://www.ietf.org/internet-drafts/draft-sdry-bmwg-mvpnscale-00.txt

Silvija Dry then presented on multicast VPN scalability benchmarking. She
noted that there was interest from some equipment vendors and tool vendors.
She said that the motivation behind the work was the large number of
providers that had either deployed or started deploying MVPN; scalability
was of great interest, especially characterizing how devices scale when
offering this particular service. There was no way to measure MVPN
scalability or even to describe it, and this would be a good time to cover
the issue. Service providers were also interested in the ability of devices
to recover from common network failures even when dealing with large
systems. The scalability benchmark is multidimensional in nature, and a
metric set is required to describe it. There were two goals for this work:
first, to have a means of comparing different implementations, and second,
to generate results that would be useful to service providers in sizing
their networks. With regard to the second goal, there was work being done
with service providers, and some may share co-authoring duties.

Silvija then discussed the relationship to other drafts already presented,
and went on to present an overview of the document. She noted that there
was no terminology document, but they would be open to creating one if the
group felt there was a need. Al said that there was no reason to
artificially split the document if there was no advantage; if it made
sense, the terminology could be kept in the same document as the
methodology.

The test cases were presented, divided into two categories: one
characterizing a single variable at a time, and the other characterizing
desired deployment scenarios. The test cases covered both steady-state and
failure recovery testing, to ensure that the device can recover from
operational failures. Six different stimuli had been defined for failure
recovery testing. She noted that this was different from the work in the
sub-IP protection drafts; it focused more on stability than on recovery
time. In terms of interest, several service providers and tool vendors had
expressed interest, and the work would also be presented at MBONED.

Al asked if the other equipment vendors had looked at the configurations,
and whether they looked general enough to implement in their world of MVPN
scalability. Silvija responded that they did, but there was a real
challenge in comparing apples to apples across vendors. There was
considerable discussion on this topic.

Thomas Morin said that this was useful work, and there might be some things
that could be improved. One thing that was not highlighted was that the
draft focused on control-plane and data-plane procedures, and its
terminology was based on the terminology of an expired draft; for both of
these reasons, if this became a WG work item then the draft should be
rewritten accordingly. Another comment was that part of the test suite was
not really related to MVPNs, but instead to other aspects such as the
scalability of PIM and so on; it would make more sense to put such tests
into a separate multicast benchmarking suite. Native multicast is not the
only way to carry multicast traffic on provider backbones. Silvija
responded that these options had been considered, but the lack of
operational experience kept them from writing the draft around the other
options.
Al asked if any other people had read the draft, and discovered that 7-8
people had read it, which was a very good indication of interest. Several
people wanted to come to the mike to comment on it, but the line had
already been cut for time. Scott Poretsky wanted to know if the authors had
read the multicast benchmarking RFC; the answer was yes. Silvija said that
they could take the discussion offline with respect to the other options.
Discussion continued outside the meeting room.

8. Extending the Current Methodologies to Cover Wireless LAN Switches

Presenter: Tom Alexander

The WLAN switch benchmarking proposal was presented. The original work
included WLAN meshes, but the scope has been reduced. Another key point was
that the 802.11 committee reviewed the original proposal and responded that
it is not in their current scope of work, effectively giving BMWG the green
light to consider it.

Al noted that this proposal may have to be taken all the way to the IESG to
check whether this work can be made part of the BMWG charter. However, the
process would be the same, with one extra step: one or two drafts are
created, then reviewed by the group and the extent of interest assessed.
The topic would then be discussed with the Area Directors, after which the
IESG would decide whether the BMWG charter could be expanded to take up the
work or not. Scott Bradner pointed out that augmenting the scope of BMWG to
include wireless devices might be as simple as adding a single bullet item,
and Al concurred, saying that was his view as well.

This is a proposal at an early stage, and no draft has been prepared yet.
Tom and his colleague Jerry Perser are looking for volunteers to share in
the glory of writing an Internet-Draft or two. Al asked if there were
people interested in helping with this work; Scott Bradner was interested
in working on the draft.

With that, Al closed the meeting. Meeting ended at 1510 hours PST.