IP Performance Metrics WG (ippm)
Monday, July 10, 2006 - 15:20--17:20
====================================

The meeting was chaired by Henk Uijterwaal and Matt Zekauskas. Al
Morton took notes, which were edited into these minutes by the chairs.

AGENDA:

1. Administrativia
2. Status of drafts and Milestones
3. BarBoF on application identification
4. TWAMP status and implementation
5. Traceroute draft
6. Reporting Metrics:
   a) The Reporting Metrics draft
   b) Different Points of View
7. Composition Framework and Spatial Composition drafts
8. The Capacity Definitions Draft
9. Packet Burst Metric
10. Registry BCP draft
11. AOB

2. Status of drafts and Milestones -- Henk Uijterwaal & Matt Zekauskas

Reordering and OWAMP are with the RFC Editor. The Jitter Applicability
Statement was due in January 2006.

3. BarBoF on application identification -- Mark Allman

IMRG is trying to spin up a couple of activities. First, in the
measurement community there has been a lot of work on identifying
flows going by a monitor (without using port numbers). Is there enough
here to hold a workshop? Second, there is the bandwidth capacity
definitions draft in IPPM. IMRG held a bandwidth estimation workshop a
few years ago, and there has been a lot of work in the area since. Is
it time to hold another workshop? Are there things mature enough to be
handed over to IPPM for standardization? If you have an opinion on
either topic, either post to the IMRG list or talk to Mark.

4. TWAMP status and implementation [draft-ietf-ippm-twamp-01]

a. Draft update -- Kaynam Hedayat

Minor changes were made for this -01 version: a clarification on MSL,
and a clarification of TWAMP Light based on sequence numbers. There
are two public implementations underway, at Brix and Allied Telesyn,
and there are others the author knows about under NDA. Next steps:
update the security section based on OWAMP experience, and get wider
review from the WG before submitting for publication.

b. Looking at implementations of TWAMP in an industrial setting
   -- Roman Krzanowski

The context is in-channel, per-customer performance monitoring.
Canoga Perkins currently has a proprietary "ICMP+" protocol, which
looks at latency, jitter, and frame loss on VLANs among approximately
100 locations. They have a rich implementation of results and
reporting. They are looking at implementing TWAMP as an open protocol,
and will coordinate their implementation with Brix to ensure
interoperability. The added value of TWAMP is having an open protocol
for testing, with a common way of evaluating network performance.
Reporting is to a single manager, from the sending device (the other
device is a reflector).

Emile Stephan asked how many nodes were in service; the answer was in
the high tens, below 100. Emile also asked whether the desired
interoperability was in the probes or to the manager. The answer was
that probe interoperability was most useful; the current target is a
metro Ethernet setting. Emile also asked, if two probes were doing
measurements, where they would report results. Right now the goal is
for the reflector to be interoperable; the active testers will report
back to a single manager. Kaynam noted that TWAMP does not address
provisioning and reporting; there are many (proprietary) ways to do
that. Roman added that in their deployment there were four levels of
software to aggregate and collect data.
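As a minimal illustration of the two-way arithmetic that TWAMP's
reflected timestamps enable (a sketch only, not the TWAMP wire format;
the timestamps and names below are hypothetical), the reflector's
receive and transmit timestamps let the sender remove reflector
processing time from the measured round trip:

    # t1: sender transmit time     t2: reflector receive time
    # t3: reflector transmit time  t4: sender receive time
    def two_way_delays(t1, t2, t3, t4):
        """Return (round-trip delay, reflector processing time)."""
        processing = t3 - t2                  # time spent inside the reflector
        round_trip = (t4 - t1) - processing   # RTT with reflector time removed
        return round_trip, processing

    rtt, proc = two_way_delays(t1=0.000, t2=0.012, t3=0.013, t4=0.026)
    print(f"rtt={rtt:.3f}s processing={proc:.3f}s")   # rtt=0.025s processing=0.001s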
5. Traceroute draft draft-ietf-ippm-storetraceroutes-00
   -- Saverio Niccolini

This version addressed several technical issues, including adding MPLS
and AS information, along with many editorial updates. The draft still
needs to cover conflicts and differences between RFC 2925 and the
draft, and the schema needs to be tested for correctness. The next
version will be available in two months. The main concern is the lack
of GGF review, and the possibility that the GGF will not accept or
adopt the same draft (adoption by both was the goal). Lars Eggert
noted that we could go through the IETF liaison to the GGF and present
an official request for feedback on the document. [After the working
group meeting, the chairs informally talked with the GGF liaison,
David Chadwick. Matt knows one of the co-chairs of the GGF group, so
he will pursue chair-to-chair interactions first.]

6. Reporting Metrics:

a) draft-ietf-ippm-reporting -- Matt Zekauskas, presenting for
   Stanislav Shalunov, who could not attend

Stanislav created this document in response to being asked "what to
report" for on-demand measurement tools whose results would be
interpreted immediately by people. The scope is intended to cover just
on-demand measurements, not long-term measurement studies or ongoing
active measurements. The document specifies a small set of orthogonal
metrics that are robust and easy to understand: delay, loss, jitter,
duplication, and reordering.

Phil Chimento (via jabber) raised a question about the two-second
timeout and the wide statistical confidence intervals resulting from a
small sample; he felt the timeout might be too short, and the sample
too small, to get meaningful results. Emile suggested that the timeout
may instead be too long -- perhaps timeouts should be shorter if a
human is waiting -- and that it is hard to set a single default value.
Emile is also concerned that if we specify a single default value for
this use, people will use it for other things. Al Morton felt that the
jitter definition is not consistent with IPDV in RFC 3393: lost
packets are excluded from processing in RFC 3393. Al also felt that
the paraphrased reordering definition presented in this draft was hard
to understand, and that the one in the working group reordering draft
should be used instead. Matt Mathis said he liked the idea of
user-presentable statistics with standard parameters; however, he
strongly encourages using existing standard statistics and parameters,
and in particular noted the apparent change in the delay/jitter
definition. He would have the document present a restricted subset of
existing metrics to users.

b) Different Points of View draft-morton-ippm-reporting-metrics-00
   -- Al Morton

Or, how to "run with scissors". There was a recent discussion on the
mailing list about means, with one participant advocating "mean delay
considered harmful". Al felt that this was an extreme position, and a
difficult place to start a productive discussion. Al considered the
various audiences of reported results and their different points of
view as an alternate way to approach the topic of reporting. As
background, Al made the observation (in his IETF-65 talk on delay
variation) that how you want to use the metrics controls how you want
to set the parameters of the metrics. There are choices Stanislav has
made in his draft, and his choices differ from the two primary ways to
measure delay variation based on RFC 3393.
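As a minimal sketch of those two instantiations (hypothetical sample
data; the delay differences are commonly taken either between
consecutive packets or against the minimum delay, with lost packets
excluded -- skipping over a loss, as done here, is a simplification):

    delays = [0.021, 0.025, None, 0.030, 0.022]  # one-way delays (s); None = lost
    arrived = [d for d in delays if d is not None]

    # Variant 1: delay variation between consecutive arrived packets.
    inter_packet = [b - a for a, b in zip(arrived, arrived[1:])]

    # Variant 2: delay variation referenced to the minimum delay seen.
    d_min = min(arrived)
    min_referenced = [d - d_min for d in arrived]

    print(inter_packet)     # approx. [0.004, 0.005, -0.008]
    print(min_referenced)   # approx. [0.0, 0.004, 0.009, 0.001]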
Al wrote this draft based on how different audiences might use the
metrics, and thinks that it might become an applicability statement.
Al saw Stanislav's draft on the list, and we have just talked about it
in great detail; we know it is about short-term reporting. Al would
have made different choices, and he thought about why: different
circumstances, such as longer-term measurement or different audiences
for results, tend to favor alternate parameter choices. This draft
also addresses comments on the composition framework raised at the
last meeting, mostly having to do with truncating the delay
distribution, which makes classical statistical manipulations (such as
the addition of expected values) invalid. Al tried to address that in
the context of this talk as well. All of the IPPM metrics have
multiple options, and these make the registry that we standardized
less effective, as there is no simple way to report which options were
used in a measurement. This issue turned up in the course of
finalizing the reordering draft. Emile noted that there was no way to
distinguish the ways the measurements were made.

Al felt that there are two key points of view when asking how the
results will be used: (1) network characterization: "How am I doing
with respect to a network SLA?" This is verification of data delivery.
(2) The designer of an application (or something else above the
transport layer): "What happened in my stream?" "How is the network
going to affect the thing I am responsible for above the transport
layer?"

For example, look at the loss threshold parameter. It must be selected
to differentiate a long but finite delay from true loss. If you are
doing real-time reporting, the threshold must be small. In other
venues, however, you can make it very long. You can use a long waiting
time and still post-process the results to represent shorter waiting
times if desired. You avoid truncated distributions by setting the
waiting time long enough to avoid declaring a packet lost while it is
still in transit; this is possible if you combine the TTL with
worst-case assumptions about link delays and queuing delays at each
hop. You don't want to use something like a 200-millisecond timeout in
practice, because you don't want to throw data away.

Al further noted that we (IPPM) designate errored packets as lost:
even if a packet arrives corrupted in some way, but with enough
information to tell that it is part of a test stream, we designate it
as lost. Al would rather include such packets when determining delay
statistics, but he is willing to call them undefined. Lars noted that
if an errored packet has errors in its headers, then you can't tell
what stream it belongs to. Al said that was an important detail, but
he wanted a distinction between packets that arrive sufficiently
intact to identify and those that don't arrive prior to the loss
timeout.

Next, he considered calling a lost packet "undefined" versus giving it
"infinite delay". Al thought it should be undefined; in fact, in the
delay variation RFC, lost packets are excluded. If you are considering
application performance, you just care about what arrived within a
timeout, and want to know the delay of those packets that arrived. He
said that he preferred counting only the delay of packets that
arrived; counting packets that did not arrive as having infinite delay
is in some sense double-counting network impairments (as both loss and
delay). For the spatial composition draft, Al defined
Type-P-Finite-One-way-Delay, and asserted that it is "consistent with
the one-way delay RFC". The future of the composition work is
influenced by what we'd like to do here.
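A minimal sketch of the conditional ("finite") delay statistics
described above, with hypothetical data and a hypothetical timeout:
delay statistics are computed only over packets that arrived before
the loss timeout, and loss is reported separately rather than as an
infinite delay.

    from statistics import mean, median

    LOSS_TIMEOUT = 3.0   # s; long enough not to declare in-transit packets lost
    raw = [0.020, 0.024, None, 0.031, 0.022, None]   # None = never arrived

    finite = [d for d in raw if d is not None and d <= LOSS_TIMEOUT]
    loss_ratio = 1 - len(finite) / len(raw)

    print(f"finite mean   = {mean(finite):.4f} s")    # approx. 0.024 s
    print(f"finite median = {median(finite):.4f} s")  # 0.0230 s
    print(f"loss ratio    = {loss_ratio:.0%}")        # 33%; counted once, as loss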
If we cannot use conditional delay distributions, it is not clear that
we can compose delay metrics. In addition, Al noted that he prefers
classical statistics on delay; the "sample mean" is almost ubiquitous
-- it is on everyone's web site. There is some robustness if you use a
long loss timeout, and the crowds consider the mean useful, not
harmful. Robust statistics have their own strengths, but also their
own weaknesses, and reporting both mean and median delay can be
useful. Thus, Al likened using a conditional delay distribution with
classical statistics to running with scissors: understand the
weaknesses and issues, be careful, and you can compensate for the
weaknesses. You can also use the median to describe delay
distributions, with some care; since the median has different
properties, comparing it with the mean can be useful. Henk noted that
on slide 10 the reported delay, and the CDF of delay, are for arrived
packets only, and that is the conditional distribution. (It is labeled
a conditional CDF on the slide.) Al summarized, on slide 14, his
recommendations for metric parameters and options based on his study
of these two audiences for metric reporting.

Matt Mathis stated that for both this and the previous talk, he feared
we are overlooking something else: an assumption about what sample
sizes are appropriate. Users often use sample sizes orders of
magnitude too small. The relevant loss scales are very long for
transport performance, and short-term measurements just aren't
relevant. We need to say more about sample size in general; "large"
just isn't enough information. People in this room have assumptions
about a "normal" sample size, those notions span many orders of
magnitude, and numbers that are appropriate for one community are not
appropriate for another.

Al went back to his presentation. The points he is trying to drive
home are that

* there is a range of reporting problems to be solved, and short-term
  reporting is just one, and
* the settings of various parameters and options should be based on
  the end use for the reported measurements (the use case).

We can go through some use cases and recommend specific parameters per
case in this draft, staying away from what Stanislav has done in his
draft on short-term reporting. Al's biggest concern is that while
Stanislav's document focuses on the short-term reporting of on-demand
measurements, it might be used as a building block for longer-term
measurement and reporting, where some of its assumptions (such as the
two-second timeout or on-the-fly calculation) don't apply. The
document is fine for what it is, but he doesn't want to be shackled
with those assumptions for long-term reporting or other aspects of
measurement.

7. Composition Framework and Spatial Composition drafts
   a) draft-ietf-ippm-framework-compagg-00
   b) draft-ietf-ippm-spatial-composition-00
   -- Al Morton

Al briefly talked about the framework for composition of metrics and
the spatial composition drafts. Stephen Van den Berghe may not be able
to help out any more, as his job has changed; Matt noted that he has
some ties to the GEANT2 JRA1 group working on measurement for the
GEANT2 network, which Stephen was working with, so there is a
possibility of finding another editor from that group. For the
composition draft, we defined a finite one-way delay (which relates to
the "different points of view" presentation earlier); we looked at RFC
3393 (delay variation) in order to do it.
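A small worked sketch of the composition point at stake (hypothetical
per-segment delays, assumed independent): end-to-end delay is the sum
of the segment delays, and the mean of a sum is the sum of the means,
but the median of a sum is, in general, not the sum of the medians.

    from itertools import product
    from statistics import mean, median

    seg_a = [1, 2, 9]   # ms, segment A delays, equally likely
    seg_b = [1, 2, 9]   # ms, segment B delays, independent of A

    end_to_end = [a + b for a, b in product(seg_a, seg_b)]

    print(mean(seg_a) + mean(seg_b), mean(end_to_end))        # means agree: 8 == 8
    print(median(seg_a) + median(seg_b), median(end_to_end))  # medians differ: 4 != 10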
Al noted the closed issues now reflected in the document, and that
decomposition (inferring the internals of the network from end-to-end
measurements) is not going to be included, based on the previous
discussion. There is still a lot of redundancy in this draft; that
will be tackled in the next revision, along with adding composed
metrics that are more than just averages, as requested at the last
meeting. However, Al feels the work is in jeopardy: if we can't agree
on working with conditional distributions and finite delays, then we
won't have means (and you cannot add medians). Henk asked people to
re-read the drafts and post comments to the mailing list.

8. Bandwidth Draft -- Joe Ishac

Joe gave a brief report on the bandwidth definitions draft. The
current version has been available since late June. It has been almost
a year since the initial draft, and we have had good discussion on the
list. There is one open issue: whether to incorporate the notion of
"Type-P" into the draft. Joe and Al Morton will be meeting to discuss
the issue. If that does not result in any major changes, the draft is
basically stable and ready to go to WGLC.

9. Packet Burst Metric -- Roman Krzanowski

Roman gave a presentation on the idea of a packet burst metric; he
would like to gauge community interest in whether the work should go
forward. A packet burst metric would be good to have when dealing with
networks used for voice over IP or video. Right now, there is no
common definition of "burst". This is a high-level proposal to develop
metrics to measure packet bursts. It is not very complex, but it would
be good for the community to agree on common definitions. In
particular, Roman is not interested in developing a new definition,
but rather in providing an agreed-upon standard definition. There is
depth to the problem, and opportunities for extension and application.

Lars Eggert asked whether the inter-arrival time is varying or fixed
in this proposal. Roman stated that he has seen definitions with
varying inter-arrival time, but believes we should start with fixed
times, based on the application that would be affected by the burst.
Mark Allman asked whether this is per-flow or aggregate. The initial
thinking is per-flow, which is Roman's background. He reiterated that
burstiness is not a new thing; there is a lot of literature about
bursts. He has a two-dimensional view of bursts in which arrival time
is plotted against inter-arrival time, and a threshold is set to
distinguish between long and short inter-loss times. Further research
includes more dimensions and second/third-order statistics, including
fractals. Matt Mathis noted that the bursts seen in TCP are caused by
cross traffic, and if you measure with a different TCP, you will see
different burst characteristics; if the metric is application
oriented, it may not fit with network measurement. Roman said he has
seen a paper on this. However, the fundamental issue here is defining
the burst -- what we mean by "burst". Roman will create a draft based
on these ideas and send it to the list for comment.
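A minimal sketch of one plausible reading of the fixed
inter-arrival-time approach (hypothetical arrival times and
threshold): consecutive packets whose inter-arrival gap stays below
the threshold are grouped into a single burst.

    BURST_GAP = 0.005   # s; would be chosen from the affected application
    arrivals = [0.000, 0.001, 0.002, 0.050, 0.051, 0.120]  # arrival times (s)

    bursts, current = [], [arrivals[0]]
    for prev, t in zip(arrivals, arrivals[1:]):
        if t - prev <= BURST_GAP:
            current.append(t)    # gap is small: same burst
        else:
            bursts.append(current)
            current = [t]        # gap is large: start a new burst
    bursts.append(current)

    print([len(b) for b in bursts])   # burst sizes: [3, 2, 1]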
10. Registry BCP draft draft-stephan-ippm-reporting-registry-00
    -- Emile Stephan

Emile posted a draft that just missed the -00 cutoff, but sent email
to the list with a pointer. He noted that the recent composition
drafts have prompted another look at the registry. He felt a new
version must capture all the parameters and options that are defined
in our RFCs; it could also store use cases for reporting. Emile has
some ideas of how to do this, and some examples. The WG is asked to
read the draft and comment.

11. AOB

There was no time for "any other business", although there was a
request by Jerome Durand to make a short presentation on what is being
done for network measurement in GEANT2 (the European research
network). Interested people can go look at http://www.geant2.net/
(select Research from the top, then Performance and Measurement on the
left), http://www.perfsonar.net/ , and
http://wiki.perfsonar.net/jra1-wiki/index.php/JRA1_Main .