Benchmarking Methodology WG (bmwg)
==================================
TUESDAY, March 20, 2007, 1300-1500 Afternoon Session I, Karlin I
==================================

CHAIR(s): Al Morton <acmorton@att.com>

Meeting Report
--------------

The report of the BMWG session at IETF-68 is divided into two main
sections:

 - Summary of the meeting and Action Items
 - Detailed Minutes of the meeting (typical of a Jabber log; note
   that <name> indicates a jabber contribution to the meeting)

This report was prepared by Al Morton, based on the notes/minutes
compiled by Matt Zekauskas as official note-taker. 30 people signed
the blue sheet, and 2 participated remotely.

Session Audio:
http://limestone.uoregon.edu/ftp/pub/videolab/media/ietf68/

Jabber Log:
http://www.ietf.org/meetings/ietf-logs/bmwg/2007-03-20.html

Slides:
https://datatracker.ietf.org/public/meeting_materials.cgi?meeting_num=68

BMWG Session Summary
--------------------

BMWG has an active Publication Request for the set of 3 drafts on
IGP Convergence as measured in the Dataplane. These drafts have been
revised as requested, and we seek another AD-level review.

The next draft likely to reach publication request will be the IPv6
benchmarking draft, once the WGLC comments are resolved and
consensus is reached.

The Accelerated Stress Benchmarking drafts are overdue, but new
versions addressing the WGLC comments have just been published, and
another WGLC is in order. The IPsec drafts are also overdue, but new
versions addressing comments are promised for April.

The Protection Benchmarking drafts have begun to address the WG
comments to make the measurement of restoration time more accurate;
more work is needed to include the additional impairments of packet
duplication and reordering.

The Resource Reservation terminology draft is now in the RFC
Editor's queue, but it is not known when the methodology draft will
be updated to reflect the many changes in approach since it expired.

The New Work Proposal on BFD was somewhat controversial regarding
which client protocols must be included - more development is
needed, and the discussion must continue on the list. There is a new
methodology draft to support the LDP Convergence proposal, but not
much readership yet.

The proposal for Benchmarking Multicast VPN Scalability was
controversial because the L3VPN architecture specification is still
in draft form. However, the authors take the position that the
subset of architectures within their proposed scope is stable and
widely deployed. There is more work ahead to craft a scope that can
achieve WG consensus.

General Action Items:
---------------------
* Need everyone's review to find editorial problems and fix them.
* Check that all units of measurement are specified, with
  resolutions.
* When you review drafts, send your comments to the list, not just
  to the editors.
* Need editors to be responsive to comments, and to post drafts
  between meetings to promote more discussion on the list.
* Editors to consider where to place the "standard paragraph" in
  their memos.

Specific Action Items:
----------------------
* Editors to update the IPv6 drafts with LC comment resolutions
  ASAP.
* 2nd WGLC on the Accelerated Stress Benchmarking drafts.
* Protection Benchmarking editors to add details on the time-based
  measurement.
* Editors to update the IPsec drafts by end of April 2007.
* New Work Proposal authors to address their feedback and continue
  development.
Detailed Minutes of the Meeting:
--------------------------------

Of the ~30 people who attended, about half said they were attending
for the first time. Al did a quick intro to the WG, including the
charter, the supplemental webpage, and how to "join". If folks
proposing new work add "bmwg" to their draft filenames, the IETF
tools page will find the draft and help bring it to the group's
attention; it also makes nits checking and diffmark versions
available, just like chartered work items. Al highlighted the new
draft on benchmarking MPLS - it was easy to find on the tools page.

1. Working Group Status (Chair)

See http://tools.ietf.org/wg/bmwg/ for the status of all drafts.

BMWG has an active Publication Request for the set of 3 drafts on
IGP Convergence as measured in the Dataplane. These drafts have been
revised as requested, and we seek another AD-level review. The
Accelerated Stress Benchmarking drafts are overdue, but new versions
addressing the WGLC comments have just been published, and another
WGLC is in order. The IPsec drafts are also expired/overdue, but new
versions addressing comments are promised for April. The IGP
Dataplane Convergence drafts went through WGLC and have been cleaned
up based on suggestions.

There are six active work proposals, including a new one on
Bidirectional Forwarding Detection (BFD). LDP Convergence time and
Multicast VPN Scaling will be discussed today. MPLS Benchmarking has
been updated. The WG has let some drafts expire by design, to focus
the work on related areas, such as accelerated stress benchmarking
for EBGP and OPSEC. Recent experience has shown that the standard
paragraph on security and the BMWG scope was a big help in moving
drafts through IESG review.

2. IPv6 Benchmarking Methodology (Chip Popoviciu)
   draft-ietf-bmwg-ipv6-meth-01

The next draft likely to reach publication request will be the IPv6
benchmarking draft, once the WGLC comments are resolved and
consensus is reached. The first WGLC ended on the date of the
meeting.

Chip gave a one-page overview of the draft. It was updated for
SONET; other media types that are still used, but rarely (Token
Ring, FDDI), were left in. There is already one implementation of
the recommendations in a test suite, which is also used for the DoD
mandate for IPv6 testing. The draft was accepted as a WG item at
IETF 66 in Montreal.

LC comments to date were mostly items that would be good to add,
because this draft complements RFC 2544, rather than comments on the
actual recommendations regarding IPv6 itself. The slides list the
various comments and the actions to take.

Frame sizes and SONET: the first proposal was 48 bytes (header plus
overhead), but to follow RFC 2544 it would be 48 + 8 bytes, so 56.
No resolution on this yet. These sizes are probably not "realistic",
but might be interesting for other reasons, like security. What
about the maximum size? The draft specifies 4096; sizes can go
higher. Stop at 4096 here, but recommend a test at the maximum frame
size available for the product. Also considered the maximum frame
size for Ethernet, currently specified as 9216. Dan Romascanu
commented that 2000 would be good, to catch the sizes in 802.3as.
Coverage in the draft is more than sufficient; should we recommend a
2000-byte size too? Need WG feedback on this [deciding on the
minimum SONET and maximum Ethernet sizes]; see the sketch of the
candidate sizes below.

Looking for several more last call templates from folks; only one
seen so far, from Bill Cerveny. (Rajiv Asati sent his in later in
the day.) Al asked for comments from the audience. Only one reader,
no comments. There has been good discussion from the v6ops folks, so
good cross-group review.
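[Editor's illustration: a minimal sketch of the frame-size
arithmetic discussed above. The IPv6 and UDP header sizes are
standard; treating the additional RFC 2544-style test overhead as 8
bytes is an assumption for illustration, since the WG had not yet
resolved the minimum size.]

    # Candidate IPv6-over-SONET frame sizes under discussion.
    IPV6_HEADER = 40   # bytes, fixed IPv6 header size
    UDP_HEADER = 8     # bytes
    TEST_OVERHEAD = 8  # bytes; ASSUMPTION: RFC 2544-style test tag

    minimal = IPV6_HEADER + UDP_HEADER       # 48: first proposal
    with_test_hdr = minimal + TEST_OVERHEAD  # 56: RFC 2544-style
    print(minimal, with_test_hdr)            # 48 56

    # Candidate maximum sizes mentioned in the discussion:
    MAX_DRAFT = 4096      # maximum the draft currently specifies
    MAX_ETHERNET = 9216   # jumbo maximum currently specified
    SIZE_802_3AS = 2000   # envelope size suggested by Dan Romascanu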
Looking for an update shortly, then a short LC to make sure the
comments were incorporated adequately, and then the publication
request. The milestone is March '07, and it looks like we will be
close.

3. Benchmarking Protection Mechanisms (JL Le Roux)
   draft-ietf-bmwg-protection-meth-01
   draft-ietf-bmwg-protection-term-01

There has not been a last call on these drafts yet. The terminology
is supposed to be general, plus there is MPLS-specific methodology.
JL presented the changes to the documents (see slides). A bunch were
made for clarity. Methods for computing failover time were added,
based on the discussion last time. NOTE: no terminology update since
last time.

Al: We need the right terms, or augmented terms, so we can
distinguish loss-based benchmarks for restoration time from the
time-based method. The WG agreed to account for impairments
(duplication, reordering), not just loss, and these should fit well
into a time-based benchmark definition and method.

JL: These had been considered as lost packets to date.

Al: Better to call them what they are; when we write the definition,
measure the time from when impaired packets (any of those) start to
be seen to when they stop being seen. This is very important for
reversion. It is more challenging than what we have done before, for
example in the IGP convergence methodology, so we need to be
explicit.

Open item: the packet sampling interval. This is particularly
important for loss-based benchmarks. The sampling interval used in
the IGP Dataplane Convergence drafts is too large for the
restoration time scales measured here. We probably need something
like a 1 ms packet sampling interval. We expect that test equipment
should support this short interval, but we seek further verification
or strong agreement (beyond what one person said on the list). Any
thoughts?

Rajiv Asati: It depends on the vendors to agree with this, but from
a technical perspective, I strongly support and agree with 1 ms.

<jay_karthik> I agree 1 ms to be the right interval as well unless
we hear from test vendors that this is too aggressive.

Next steps: reach consensus on the packet sampling interval and
encourage more feedback; then the WG can LC after -02. Al would like
the terminology to be done soon; it had a February milestone date.
Al encouraged the editors to get that one out soon. There may be an
informal bar-bof on the terminology definition set during IETF-68;
if arranged, details will be sent to the list.

Rajiv Asati: A question regarding the packet sampling interval.
There are two different metrics, loss-based and time-based. Can you
quickly offer the distinction between them?

JL: Loss-based: the approach is to set up a stream at a given rate,
then trigger the failure and measure the number of packets lost.
Then (#packets lost)/(offered packet rate) gives the failover time
for that particular stream.

Al: There are two methods based on loss in the IGP convergence
drafts. One is to wait until the end of the test, count the number
of losses, and compute the implied outage time according to the
offered packet rate. However, if computing a rate on the fly at each
sampling interval, then what you measure depends on the sampling
interval, and it could have a lot of error if the sampling interval
is long compared with the restoration time.

JL: Time-based: measure the time of the first lost packet and the
time of the last lost packet; the total time difference gives the
failover time.

Al: We should talk about the time resolution for the time-based
measurement, which is different from the sampling rate.

Rajiv: Some may have the perspective of looking at a timestamp in
the packet; I don't think that is what we are attempting to do.
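[Editor's illustration: a minimal sketch of the two failover-time
calculations described above, plus the sampling-interval
quantization issue raised in the next exchange. The function names
and example numbers are illustrative, not taken from the drafts.]

    import math

    # Loss-based: count losses after the test ends and divide by
    # the offered packet rate to get the implied outage time.
    def failover_time_loss_based(packets_lost, offered_rate_pps):
        return packets_lost / offered_rate_pps  # seconds

    # Time-based: time from the first impaired packet to the last.
    def failover_time_time_based(first_impaired_s, last_impaired_s):
        return last_impaired_s - first_impaired_s  # seconds

    # Example: a 45 ms outage on a 10,000 pps stream loses ~450
    # packets; both methods should report ~0.045 s.
    print(failover_time_loss_based(450, 10_000))   # 0.045 s
    print(failover_time_time_based(1.000, 1.045))  # ~0.045 s

    # Quantization: counting whole sampling intervals that contain
    # any loss can overstate the outage, since interval boundaries
    # rarely align with the failure.
    def worst_case_report_ms(true_ms, interval_ms):
        return (math.ceil(true_ms / interval_ms) + 1) * interval_ms

    print(worst_case_report_ms(45, 25))  # 75 ms with 25 ms sampling
    print(worst_case_report_ms(45, 1))   # 46 ms with 1 ms sampling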
Ilya (?): When we talk about the sampling interval, does it make
sense to make the sampling interval 1/2 of the expected convergence
time? If that is 50 ms, it is fair enough to use a 25 ms interval;
it will capture within the expected time. A higher sampling rate
will just put extra load on the equipment, and there is more concern
about what vendors can and can't do.

JL: If we take 25 ms sampling, then the precision is only 25 ms.
Suppose we want to measure a 45 ms convergence; a 25 ms sampling
interval would cause a lot of error.

Rajiv: I agree.

Al: With a 25 ms interval, the boundaries of the sampling interval
do not typically align with the failure, so you know only to within
25 ms when failover began. But Ilya's point is well taken; we can't
place too great a demand on test equipment. However, the terminology
draft is expected to span a number of different technologies, so we
want to have some flexibility.

Ilya: There are two views: average convergence time vs. likely
convergence time (what is the likelihood that convergence completes
within 50 ms?). In the first case, you need more fine-grained
measurements. If you only want to know whether convergence is within
limits, then the sampling interval can be coarser. You could also
randomize the sampling time a bit.

Al: We should still have the test equipment look at 1 ms, and see if
it is implementable. We would like better accuracy going forward.
But we would like to talk more and be sure we understand the
comment.

Al: One other comment on the drafts... there is something wrong with
the template the editors are using, pushing text too far to one side
of the page. This applies to a bunch of drafts here; it needs to be
fixed!

4. Benchmarking Resource Reservation Mechanisms (Krisztian Nemeth)
   draft-ietf-bmwg-benchres-term-08 (RFC Ed Queue)

This work looks at signalling in INTSERV routers. Krisztian gave a
short summary of the long history (see slide!), starting in 2000.
The terminology draft is now approved and in the RFC Editor's queue.
Most recent changes (thanks to Dan Romascanu for comments):
clarified the measurement units.

Al: I want to emphasize this; units must be clear in our drafts. The
chair will attempt to review them, but it is better if you take care
of it first.

The router state space changed, and QoS compliance was added to
"loss-free": there is no point in having something going on in the
control plane if it can't be measured in the data plane. BMWG does
not do pass-fail or good-bad, and having states labelled "good" and
"bad" has a pass-fail connotation. Since the draft is now approved,
this can be fixed in AUTH48, because it is just a clarification.

The methodology draft is being resurrected. There are some new
people, and the goal is to complete it faster. It is going to be a
big revision from the last draft. There is some worry that new
terminology might come up from the methodology revision; maybe the
two should have been done together.

Al: That's the model we are trying to move to. IPsec is being done
this way.

5. Milestone Status (Chair)

Al: Five milestones are indicated in red where we are overdue, but
we know some progress is being made on all of these. There is no
draft yet for basic BGP convergence benchmarking, so that one is
worrying. The IPv6 draft is on track. Looking through the end of the
year, the network traffic control methodology and the specific
accelerated test method drafts are in trouble unless the currently
overdue work is completed.

******* New Work Proposals *********

6. New Proposal Summary (Chair)

Al: The Multicast Scalability draft has the Internet-Draft column
labeled "1+" because the terms and method are combined.

From jabber (Jay thinks the "significant support on list" entry for
BFD should be marked yes):
<jaykarthik> Al ... Slide 8. Sig support on List for BFD should be Y

Al: There has been lots of traffic, but not all messages offer
support. Most messages are trying to sort out what BMWG should be
doing. There are many questions from the BFD working group chairs,
who are still working on the specification.
This topic may dovetail with the accelerated stress benchmarking or
the protection work (because failure detection is a part of
protection).

<jaykarthik> True Al. But we also need to benchmark the protocol BFD
itself
<jaykarthik> And that needs to ride solely as well besides the
inclusion with other protocols.

Al: I thought it didn't make sense to benchmark BFD in isolation,
since BFD is always used with other protocols.

Ron Bonica (new AD): A question about benchmarking BFD in isolation:
generally, the only way to know that BFD has done its thing is if it
informs some client of its activity.

Rajiv A: I agree with Ron; the client must be there, and it is
needed to characterize BFD. But the question stands: is there a need
to characterize BFD in isolation? No, because otherwise you can't
see whether it has detected the failure. But then, how many clients
should be considered? Is one client enough? Do we need multiple
clients to benchmark?

<jay_karthik> Sure ... but we can emulate a dummy client
<jay_karthik> Just to benchmark BFD protocol
<kevin_dubray> Doesn't that better belong in the IRTF?

Al: This is an IETF protocol, but if this topic is beyond
engineering and closer to research, then the IRTF is more
appropriate.

The WG agreed to reorder the agenda topics on the fly, and deal with
the BFD presentation now.

7. Bidirectional Forwarding Detection Benchmarking (Rajiv Asati)
   draft-salahuddin-bmwg-bfd-motivation-00

Rajiv gave a short background on BFD (see slides). Why benchmark it?
BFD is new, but has the potential to become the single link-failure
detection mechanism. Note that there are many parameters; it is not
just the different clients, but also the parameters, that need to be
considered. There are other schools of thought; one was vocal on the
mailing list. This is a motivation draft, to try to understand and
gauge interest.

Question to BMWG: is there a way to pull a BFD benchmark out alone,
rather than rely on the "client"?

Ilya: When you combine BFD and the client into a single test, you
are losing granularity. The process involves two phases: detection
(BFD) and reaction (the client). If you try to do the benchmarking
as a single process, like OSPF convergence, then you don't know the
components of the convergence time. Can you isolate detection from
the IGP action? It seems that you have to separate them. Ilya
believes that it has to be done separately: you can have many
different protocols, but you must evaluate the common part, so BFD
must be evaluated on its own.

Rajiv A: Good point; rephrasing... the differentiation between the
routing protocol implementation itself and the BFD implementation
itself needs to be highlighted, but they work in conjunction, which
is the intent of BFD. Maybe they need to be benchmarked separately,
or together.

Ron B: One can calculate the theoretical limit beyond which BFD can
do no better. If we measure and are near that limit, then it doesn't
really matter which protocol contributed the most time, BFD or the
client. But if the measured result is far from the theoretical
limit, then you would need to measure the implementation with a few
different clients to infer how much delay each introduced.

Rajiv A: We can calculate theoretical times, but experience shows
that the time to create a new forwarding entry in the route
processor and download it into the ASICs sometimes varies,
especially if there is interaction between different clients and BFD
uses an IPC-based implementation. I don't know whether the
theoretical calculation will hold true, but I agree with this
approach to some extent.

From jabber, to Ilya:
<jay_karthik> Good comment. That is precisely why we need to
benchmark BFD separately as well as with the other work items.

To Ron:
<jay_karthik> But we need a methodology in place to ensure we are
near the theoretical limits.
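[Editor's illustration: a minimal sketch of the kind of theoretical
detection-time limit Ron refers to, assuming the asynchronous-mode
rule from the BFD specification then in development (detection time
= the remote detect multiplier times the negotiated transmit
interval). The function name and example values are illustrative.]

    # Theoretical BFD detection-time floor in asynchronous mode:
    # the detect multiplier times the negotiated interval, where
    # the negotiated interval is the larger of the local Required
    # Min RX Interval and the remote Desired Min TX Interval.
    def bfd_detection_time_ms(detect_mult, remote_desired_tx_ms,
                              local_required_rx_ms):
        negotiated_ms = max(remote_desired_tx_ms,
                            local_required_rx_ms)
        return detect_mult * negotiated_ms

    # Example: 3 x 50 ms intervals -> 150 ms theoretical floor.
    limit = bfd_detection_time_ms(3, 50, 50)
    print(limit)  # 150 ms

    # Per Ron's argument: if (measured failover - limit) is small,
    # the client's contribution hardly matters; if it is large,
    # repeat with different clients to apportion the delay.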
Maria Napierala: A comment on the importance of examining the scale
issue (missed it).

Rajiv A: Paraphrasing, scaling should be included; it is very
important. Back to Ron's point: it should be reflected in the draft
as one method, based on theoretical calculation.

Al: I have to cut off the discussion here for time, but folks can
talk more in the halls (Ilya had one other comment). We need further
discussion on the list, and we need to gather consensus. We had good
input today.

8. LDP Convergence Benchmarking Terms and Methodology (Rajiv Asati)
   draft-eriksson-ldp-convergence-term-04
   draft-karthik-bmwg-ldp-convergence-meth-00

Rajiv is the messenger, not an author; Rajiv Papneja is an author.
The LDP data plane convergence benchmarking terminology draft has
been around for quite a while. See the slides discussing the
motivation. This time there is a new methodology draft. Two people
in the room had read it.

<jay_karthik> Author of LDP Spec has commented as well, Bob Thomas.
<kevin_dubray> I've read the LDP I-D
<jay_karthik> Kevin has read as well. That is good Kevin. Thanks.

Rajiv A (speaking as a member of the Working Group): Two comments on
this specific topic/draft. The first slide mentions the motivation
of operational issues tied to the lack of synchronization between
LDP and the IGP. Since 2005, all vendors have introduced techniques
to address the missing synchronization between LDP and the IGP. I am
interested to know from the authors whether they have re-analyzed
their approach, given that this protocol separation issue doesn't
really exist anymore.

Rajiv A: The other comment (a nit) is the name itself, "LDP data
plane convergence". I am not sure I can agree with this; LDP is a
control plane protocol.

Al: What we're really doing is benchmarking LDP convergence as
measured in the IP data plane. We can ask the authors to make the
title clearer.

<jay_karthik> We will address that, Al and Rajiv.

Al: In the methodology, I didn't see a lot of LDP configuration
information. Surely LDP is not optionless?

Rajiv A: I think that should be included as appendices - for a given
test case, for example.

Al: I think this should be an action item for the authors. We also
need more readership and traffic/support on the list.

<jay_karthik> OK noted. We will address it. Taking an action item to
include the Configuration
<kevin_dubray> And it's good to know no messenger was harmed in the
process of Rajiv's presentation.

9. Multicast VPN Scalability Benchmarking (Silvija Dry)
   draft-sdry-bmwg-mvpnscale-01

Silvija Dry speaking. She would like to spend most of the time
getting consensus on the scope of the draft.

Document goals (see slides): "rosen-8" is a version of the MVPN
architecture that has been implemented by some. We want the results
to be useful for network operators when sizing current network
deployments.

Document changes: The co-authors work at operators that have been
using this architecture in their deployments. She has marked places
in the draft where they could extend beyond the "rosen-8"
architecture, but the draft is only applicable to that architecture
today. The draft was made consistent with the current draft in the
l3vpn WG, and with the currently inactive rosen-8 draft. Test cases
were added, along with lots of clarifications/explanations for "ease
of reading".

Background: draft-rosen-vpn-mcast ("rosen-8") [from 2000] is
incorporated into a broader draft (l3vpn-2547bis-mcast), but the
rosen-8 architecture is widely deployed today, with multiple vendors
and multiple operators.

<kevin_dubray> How stable is the defining I-Ds?
My concern w/this work (and BFD) is that we're chasing actively
evolving targets. (Not good if you're trying to come up w/a
universal measurement paradigm.)

Silvija: A fair question. What we're trying to address in this draft
is a subset that isn't changing - one that has been deployed with
multiple implementations. There are 40-50 deployments. This
architecture is pretty fixed; what is not fixed are the other
mechanisms described in the WG draft. That's why we don't address
them at this time; we get the operational experience first.

Al: So in one sentence, you are targeting the stable subset of the
WG draft.

Ron B (speaking as l3vpn chair): I share Kevin's concern that the
l3vpn WG has not arrived at a stable draft. In light of that, would
it be possible to scope this draft so that it measures things
regardless of which solution we are talking about?

Silvija: That's a good question; we have thought about it, and some
slides will touch on it. We have looked at it. We can't just use the
common aspects, because there isn't enough there to valuably
measure.

Toerless Eckert: I think it will be valid for a long time to test
pieces of the l3vpn WG draft. As far as benchmarking on-going
standardization goes, we had the same experience with PIM
standardization: it took from 1998 to 2005 to get another RFC, but
there were a huge number of deployed implementations. Looking at
deployed implementations and solutions is more important than
waiting for standards. It is never possible to guess completely, but
as soon as you see a large number of implementations, the migration
to something else will take a while.

Thomas Morin: I share Ron's concerns on the relevance of the work.
It may be too early to be sure that this will be relevant to the
results of the l3vpn WG. I think there hasn't been that much effort
to make it relevant. I think it could be made relevant, but not as
written now.

Silvija: I think we could come up with the agnostic piece. Would
that be valuable to operators that have deployed one architecture,
if we only know what is happening on the CE side and the number of
PEs deployed? Would that give you, as an operator, the scale limits
of the platform?

Thomas: Yes, I think so. "Agnostic" is probably not the best term,
but making the draft generic would make sense.

Silvija: I would argue that for bidirectional PIM and SSM, the
results in the C-instance case would be different.

<jay_karthik> Ron and Kevin - w.r.t. BFD, what might change with the
parent IDs that are evolving is, how we detect failure rapidly
(white box). However the fact that the failures need to be detected
rapidly does not change and we could do black box measurements.
<kevin_dubray> 1) This is a very nice draft. 2) To counter Thomas'
points, we are a standards group, 3) agnostic/flexible may be the
way to go - like the initial multicast RFCs from this group.

Toerless: [These architectures] exist in deployment.

Al: Toerless, the example in BMWG is the multicast RFCs.

Unidentified speaker: I think that this work will be wasted if you
do it now. It's more efficient to do it in a more generic manner.

Rajiv: It's not the first time that one group looks at another
group's work that is evolving. But here, it is looking at an
implemented, deployed, and stable version. I strongly support the
need for the draft, and the scope of the draft, from the perspective
that data is needed today, using the deployed architecture. I don't
think it is in the best interest of the WG to sit until the other
architectures are fully baked, implemented, and deployed.

<kevin_dubray> Read the charter folks. We are not an operational
measurement group.
Rajiv: If a methodology is available, it will give more data points
for the new architectures being developed. I am very much in
agreement with the scope, at least as in the updated draft.

Silvija: We don't want to have 10-15 years of deployment and only
then write the benchmarking draft. Current implementations are
stable.

Mike McBride: I think the draft is definitely not too early; if
anything, it is on the later side. Implementations have been around
for many years now. I suggest the WG needs to reach consensus on the
title of the draft, to show its focus. I am an author of the VPN
BCP, and we have the same lack of consensus on that one, now a
broader area for discussion. To get consensus, the scope and title
need to be specific, and they need to explain the history, what you
are doing, and what you are not intending to do.

Maria: I thought that it would be useful to have a draft. It should
eventually become an agnostic draft; technology changes. I think the
draft will help find the bottlenecks of this technology in general,
and will help direct new development. The most important aspect is
using PIM for PE-PE exchange of multicast routes: understanding how
that works, and how it scales.

Al: Gotta wrap up.

Silvija: Yes, multiple architectures can mean multiple goals:
 1. compare implementations of the same specific architecture
 2. compare different architectures/protocols
The proposal is to:
 - complete the current draft (this meets the first goal) for the
   stable architecture that has been deployed today;
 - then work on an agnostic document to address item 2.

Al: Who has read the draft? About 1/3 of those in attendance,
including 2 co-authors. From those that have read it, we need to
discuss the scope more before we take it up. We should have that
discussion on the list; I encourage folks to participate.

Silvija: Who is in favor of the proposal? About 6 were in favor, 3
objected (including Ron, presumably as l3vpn chair).

Al: Objections are unusual. We still have more work to do here.

<kevin_dubray> I read the draft.
<kevin_dubray> I would like to see a change in scoping to more
generic.

Al: One final point: today we say thanks to David Kessens, for 3
years as our AD Advisor. During that time we completed 5 RFCs, with
two more RFCs-to-be in the RFC Editor's queue. <David gets a big
hand>

David Kessens: I would like to thank all of you for your
contributions, because the working group does not succeed without
your efforts.