IP Performance Metrics WG (ippm)
Tuesday, March 21, 2006 - 17:40--19:50
======================================

The meeting was chaired by Henk Uijterwaal and Matt Zekauskas. Phil Chimento and Al Morton took notes, which were edited into these minutes by the chairs.

AGENDA:

1. Status of drafts and milestones
2. Composition of metrics: Framework
3. Composition of metrics: Spatial Composition
4. Composition of metrics: Multimetrics
5. Traceroute Storage
6. Jitter Metric Comparison
7. Packet Burst Metric
8. AOB

1. Status of drafts and milestones -- Henk Uijterwaal for the Chairs

Henk opened the meeting and reviewed the agenda. Al Morton noted that there was a new standards body, the IPTV Interoperability Forum, and said he would say a few words about it at the end.

Next, Henk reviewed the drafts and milestones. The reordering metrics draft passed WGLC in January and was sent to the IESG the day before the meeting. The OWAMP issues raised in the security area review have finally been resolved, and the draft has been approved by the IESG. TWAMP is stable, but needs at least a security section, and a review now that OWAMP is complete. The implementation reports have been discussed with the ADs; we can use them as part of a WGLC for advancing metrics.

Joe Ishac sent Henk a note about the capacity draft just before the meeting -- there was movement; they just didn't finish the draft before the cutoff. There should be a new version immediately after the IETF meeting. Henk advised that he had read the draft and thinks it needs more review, so please read and comment.

2. Composition of metrics: Framework -- Al Morton

Al Morton presented the Metric Composition Framework draft, proposed and accepted at the last IETF. This is the first version that tries to incorporate all the existing drafts. Requirements for composed metrics came from the IPPM Framework RFC. Section 5 has completely new material, including a "Ground Truth" concept: what you are trying to estimate with the composed metrics.
Subpath metrics go through a function that produces a composed metric. A spatial metric is measured at intermediate nodes along the path. For the multicast case, the one-to-group metric is the "ground truth" for the multicast sub-path measurements. Deviations from ground truth (errors) include the inaccuracies of the underlying measurements and differences in scope between the ground truth and the component metrics. The plan is to expand coverage of the temporal and spatial aggregation classes. We need people to read and comment.

Emile Stephan, referring to slide 6, pointed out that this draft introduces the possibility of passive measurement for some metrics -- a concept that has not been introduced elsewhere. [Chair note: all our metrics have been developed considering active measurement; passive measurement isn't prohibited, but it hasn't been rigorously developed yet.] Henk noted that for three one-way delay measurements 'm1', 'm2', and 'm3', you could have m1 and m3 be active, with m2 passive... but then what exactly are you measuring passively with m2? Can you add the results to the active results obtained for m1 and m3? Emile noted that then you were talking about "spatial metrics", even if you do measure an active test stream.

Al noted that there are some general issues with passive measurements: you don't control the test stream, and you are at the mercy of whatever the stream looks like in terms of sampling and performance. He thought it was a sizeable effort to talk about comparing active and passive measurements. Emile noted that you could be passively measuring a controlled stream. Al thought that if you controlled the source stream, it was very similar to our current active measurements. Matt Mathis noted that both passive and active techniques have problems: with active measurements you have "Heisenberg" problems; with passive measurements you don't have control over sampling. Suppose a particular metric, say loss rate, was the same no matter how you measured it.
Then you could compose them; if not, the idea of composition falls apart. Composition was a concept envisioned in the Framework RFC (2330) that we were not sure how to implement, and Matt encouraged work on the subject. However, there are deep traps here for some metrics. For example, jitter in segment m1 might affect bulk capacity in m3. Matt felt that loss rate, simple jitter, and simple delay were the right metrics to start with. Al noted that during the discussion in Paris, we decided to avoid reordering. Matt thought a better strategy might be to observe that some metrics have special challenges; reordering is one, BTC is another.

3. Composition of metrics: Spatial Composition -- Al Morton

Al Morton then presented the spatial composition draft, which has been revised since the individual submission last time and crafted to fit into the framework. This work was originally proposed at IETF-63 as composed finite one-way delay. A second draft added composed loss, and then composed loss and delay. This new working group draft has new terminology (although the term definitions themselves may move to the framework). The delay variation section is new, and there is an exact specification of an RFC 3393 selection function to produce the ITU Y.1540 metric. Multiple composition relationships are waiting in the wings to be specified, but there is a potential IPR issue with one of them; Al is waiting for clarification before bringing them to the working group. In addition, there is work to be done to rationalize this draft with the new framework document -- in particular, to reduce some redundancy.

Al then went on to list some open issues where group feedback would be particularly useful. First, he would like the group to focus on the loss and delay combination metric -- is it worthwhile? Are there other combination metrics to define, and do they deserve a separate draft?

Emile stated that he was shocked the first time he read this composition draft.
If you receive a stream of one-way delay measurements with 80% infinite delay, you actually receive only 20% of the packets. You may compute a very good delay from a minority of the packets. You need to have the combination in order to understand this. Roman Krzanowski noted that when you combine loss and delay you get a sort of index, but how is it interpreted? You have to be very careful with these so that they are not misinterpreted. Stanislav Shalunov noted that he had not read the draft, but was curious in what units loss and delay are measured. Al said that you get a loss and delay indication for each packet. Stanislav said that we define loss as a special case of long delay, so you should not need an extra bit. Emile noted that, the way it is defined in this draft, that information is discarded during composition, which could lead to misleading results.

Phil Chimento noted that when computing the average delay over all packets that arrive, you are reducing the sample space by conditioning on arrival. You no longer have the average over the whole space; you have a conditional average. Once you understand that, you understand that you are only getting part of the picture. The determination made in the original definition of delay was to make loss infinite delay, because you can't pick a finite number and get a sensible average. Phil thought that still made sense. Al stated that you only compute delay on packets that actually do arrive; if you report the metric, you must report all the circumstances of the metric -- which one should do anyway as a scientist -- and then you get a proper understanding. If you just put a single number on a web page, it is misleading. Stanislav added that when you cut off the tail of a distribution (which you do when you consider only received packets), the property that the expectation of a sum of independent variables is the sum of their expectations no longer holds. Al thought we could argue about whether a packet is infinitely delayed or not.
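The conditioning effect Phil and Stanislav are debating can be illustrated numerically. This is a sketch with made-up exponential delays and an arbitrary 100 ms timeout, not numbers from the draft: unconditional means add across subpaths, but means conditioned on arrival do not, because conditioning each subpath on its own timeout cuts a different part of the tail than conditioning the end-to-end result.

```python
import random

random.seed(42)

TIMEOUT = 100.0  # ms -- packets slower than this are declared "lost" (arbitrary)
N = 100_000

# Made-up exponential one-way delays (mean 40 ms) for two subpaths.
d1 = [random.expovariate(1 / 40.0) for _ in range(N)]
d2 = [random.expovariate(1 / 40.0) for _ in range(N)]
e2e = [a + b for a, b in zip(d1, d2)]

def conditional_mean(delays, timeout):
    """Average delay over packets that 'arrive' (delay <= timeout)."""
    arrived = [d for d in delays if d <= timeout]
    return sum(arrived) / len(arrived)

# Unconditional means add exactly: E[d1 + d2] = E[d1] + E[d2].
uncond = sum(d1) / N + sum(d2) / N

# Conditional (truncated) means do not add: the per-subpath conditional
# means sum to a noticeably larger value than the end-to-end one.
cond_by_parts = conditional_mean(d1, TIMEOUT) + conditional_mean(d2, TIMEOUT)
cond_e2e = conditional_mean(e2e, TIMEOUT)

print(round(uncond, 1), round(cond_by_parts, 1), round(cond_e2e, 1))
```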
Stanislav maintained that if you cut off the tail of a distribution, it is no longer possible to add averages. Being able to add averages is a nice property; classical statistics is nice in that way. But there are some disadvantages: if you drop one item from the sample, the property no longer holds, and the result can be off very significantly, because classical statistics are not robust. Stanislav felt that we should use robust statistics, or drop nothing. Al thought that this issue required more thought and would like to take the discussion off-line; he thought he could argue that loss is not necessarily infinite delay. Stanislav contended that loss means that the packet has not arrived, and that the delay is at least greater than the timeout. That does not mean it is necessarily infinite, but there remains a degree of indeterminacy -- maybe the packet will show up in 50 years. You do know that the delay is at least the timeout, and maybe infinity. Al thought that means that when we defined a timeout, we defined a portion of the tail of the distribution that we are not really interested in. Stanislav reiterated that once you have done that, you can no longer add two independent variables and expect that the mean of the sum is the sum of the means, or you risk huge error. Al agreed that this needed further discussion and that we should not take up any more meeting time today. Henk thought this was an important issue to take to the list. Stanislav was asked to post a pointer to a reference. Emile again suggested that maybe we can have a subsection that specifies all the information that must be reported.

Al returned to the open issues: Should we do multicast metrics? Al felt we should defer multicast for now. What about decomposition, and what is the relationship between composition and decomposition -- trying to infer subpath measurements from combined measurements? Should we mention or focus on this? There were no other comments at this time.

4.
Composition of metrics: Multimetrics -- Emile Stephan

Since the multimetrics draft has not changed since the last meeting, Emile preferred to present some thoughts with respect to composition, and a framework to exchange results among providers. Originally, IPPM was conceived of as IP Provider Metrics. Passive metrics were not excluded, and spatial metrics are a part of that. End-to-end point-to-multipoint metrics are included in the multimetrics draft, and composition of end-to-end metrics is what is in Al's draft. In order to satisfy provider metrics, one has to report what is measured in detail and integrate spatial metric decomposition. You can use passive metrics in the framework to do composition of results.

Regarding the framework for composition: take measurements per segment, check to see if they are acceptable, then do temporal aggregation, and finally do the composition. The one-way delay of one segment may have many definitions (say, 20 -- see the one-way delay metrics draft for the currently defined set). To define composition, you have to clearly state which metric variant is considered by the composition algorithm. In addition, once you compose delay, you have the delay of a path; one might like to take that value and use it in a further composition. This is important because provider networks are made up of several domains, and each domain may have its own measurement system. If a provider wishes to provide information about its own network, it may provide a delay computed using a composition. It is important that the information from one provider clearly state whether the delay given is measured directly or is a composition.

As an example, look at the framework applied to a small case study: three different ISPs, who all use different methods to measure delay -- one per subpath, the second end-to-end only, and the third using composition. Each ISP has to send the others the delay computed in its own domain.
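A minimal sketch of the exchange Emile describes: each delay value travels with the metric definition it was computed under and with an indication of whether it was measured directly or composed. The record fields and metric-ID strings below are illustrative assumptions, not from the draft or the RFC 4148 registry.

```python
from dataclasses import dataclass

@dataclass
class DelayReport:
    provider: str
    metric_id: str   # which registry definition was used (illustrative string)
    method: str      # "direct" or "composed" -- must travel with the value
    delay_ms: float

# Each ISP reports the delay across its own domain, tagged with how it
# was obtained, so a downstream consumer can judge the result.
reports = [
    DelayReport("ISP-A", "OWD-Example-Variant", "direct",   12.0),
    DelayReport("ISP-B", "OWD-Example-Variant", "direct",   30.5),
    DelayReport("ISP-C", "OWD-Example-Variant", "composed",  8.2),
]

def compose_delay(reports):
    """Additive composition of per-domain delays; refuses to mix
    reports that were computed under different metric definitions."""
    ids = {r.metric_id for r in reports}
    if len(ids) != 1:
        raise ValueError("cannot compose across metric definitions: %s" % ids)
    return sum(r.delay_ms for r in reports)

print(compose_delay(reports))
```

The refusal to mix metric IDs is the point of Emile's first request: composition is only meaningful when all inputs use the same definition.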
You need to send not only the measurement, but all the information related to it. You have to know which definition was used to compute each individual measurement. We can use the existing IPPM metrics registry (RFC 4148), and we can reuse the IPFIX and PSAMP data models.

Al stated that it appears that what Emile is asking for is to add a whole section on reporting to the framework. Emile said he had two points: 1) clearly define which metric definition is being used. Al said that folks should use the metric ID when reporting; Emile said that, or point to where the metric is defined. 2) How do you get this information from another provider? Al said that is a data format question. Emile said that you have a registry, and so you have a clear definition; if you want to exchange information, you can use the definition and the tag. Al wanted to know what Emile wanted to be reflected in the framework and composition drafts. First, it appears we should use the metric IDs explicitly in the composition draft. Emile thought that the composition draft should contain a table defining the allowed compositions. Al reflected that we will probably need more IDs assigned; Emile noted that this was not a problem -- you can just add IDs to the registry. Al thought we have a two-step problem: first define the compositions, ask IANA for IDs for them, and then make the table of compositions.

5. Traceroute Storage -- Juergen Quittek

Next, Juergen Quittek presented the (currently personal) draft on traceroute storage and exchange. Since the last meeting, the authors discovered that the Global Grid Forum is doing similar work. Juergen met with the editor, and they decided to merge both pieces of work. The idea is to coordinate to avoid duplication, but there is no formal cooperation. The same format will be proposed at both bodies, although the GGF has put its work on hold, and it may be some time before the GGF gets back to it. The risk is that there could be minor inconsistencies because of late changes.
We agreed to use the same XML schema for both bodies: use the IPPM information elements (adopted from the DISMAN WG MIB on traceroute measurements) and the GGF data organization (in particular, the separation of data from metadata). An -03 version of the draft was just posted, the first version with the merged schema, and the authors feel it is almost done except for minor editorial and technical issues. The current open issues: the metadata/data separation isn't fully complete. There is a small technical issue about timer resolution: DISMAN stated that an RTT of 0 implies no result was received, but one could legitimately get 0 with a coarse timer and a fast network. The GGF was considering adding an MPLS label or AS number to label probe results. In addition, RFC 2925 (the DISMAN traceroute MIB) is currently being revised, and we want to ensure we remain compliant. None of these issues is very difficult; they are largely editorial, and there is no problem addressing them. The authors expect to produce the next version in April or May, and expect it to be ready for WGLC.

The document is still currently individual; we did get a work item in the charter, and the authors would like the next version to be an official WG document. Henk (as participant) thought that we should use AS numbers -- they are often more important than IP addresses. Carter Bullard asked if there was a solution to the time resolution issue -- increase the timer resolution in the schema? Juergen said that was one option; another is a different distinguished value to indicate no response. Carter asked what issues were blocking the solution; Juergen said just that there needed to be a discussion, and there is currently a lot of delay with the GGF group. Henk asked the group whether we should pick this up as a WG document. There was no dissent. The next version will be a WG document.

6.
Jitter Metric Comparison -- Al Morton

Next, Al Morton presented some preliminary work on comparing "jitter" metrics -- looking at the IPPM and ITU definitions, and in particular at two definitions that Roman Krzanowski (Verizon) had identified as being in common use in the provider community (difference in successive packet delays, and deviation from a minimum value). Al is not talking about revising the delay variation RFC; both metrics are compliant. There are two metrics in widespread use:

1. Inter-packet delay variation (IPDV): the reference is the previous packet, giving a continuous measure of delay variation. If all packets had the same delay, all the IPDVs would be 0. Because of the alternation in the example, there is a delay variation for every packet.

2. Packet delay variation (PDV): emphasizes one-way delay. Find the packet with the minimum delay and subtract that delay from all the delays, normalizing the delay distribution to the minimum.

The first metric gives a dynamic reference for delay variation -- essentially the adjustments of an adaptive de-jitter buffer IF it adjusted its length on every packet arrival (practical buffers don't adjust that often). Minimal destination clock stability is required. Path changes with loss are effectively ignored; a path change without loss affects two readings, and then you go back to stable delay variation. The second metric normalizes the delay distribution, but there is no clear relationship to RFC 3550. Only the stability of the clock over the measurement time is important. A path change causes a bi-modal distribution, with or without loss.

First questions: Are you using either one of these metrics? How do you want to use the results? Where do the requirements come from, and what does it mean to your customers?

Alan Clark stated that there are other jitter metrics. MAPDV (keep a short-term average and measure differences from that short-term average) emulates how a de-jitter buffer works. We found that MAPDV correlates very well with discard rates at de-jitter buffers.
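The two definitions Al presented can be sketched in a few lines. The delay values here are made up to mimic the alternating-delay example from the talk, not numbers from the slides:

```python
# One-way delays (ms) for a sequence of packets; made-up alternating values.
delays = [10.0, 14.0, 10.0, 14.0, 10.0]

# 1. Inter-packet delay variation: reference is the previous packet,
#    so every packet after the first yields a reading.
ipdv = [b - a for a, b in zip(delays, delays[1:])]

# 2. Packet delay variation: reference is the minimum observed delay,
#    normalizing the whole distribution to that minimum.
pdv = [d - min(delays) for d in delays]

print(ipdv)  # [4.0, -4.0, 4.0, -4.0]
print(pdv)   # [0.0, 4.0, 0.0, 4.0, 0.0]
```

Note how the alternation makes every IPDV reading nonzero, as Al's example noted, while PDV simply shifts the delay distribution down to its minimum.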
There is a philosophical point: what are you trying to measure with jitter measurements anyway? What happens if you have on-time, on-time, late, late? Metrics that look at absolute delay variation always report peak-to-peak. Regarding Alan's example, Al noted that if you look at the simple range of variation with the PDV metric, you will see a large jump in variation for the late packets. Al observed that Alan was pointing out a pitfall in reporting any of these metrics: how to summarize them usefully. Alan pondered what you should report if the changes are periodic.

Stanislav stated that it is important to note that both transforms remove one degree of freedom (you have N numbers and you wind up with N-1 numbers). You don't lose very much information; you can reconstruct all the data that you had to begin with. The value will lie in further reducing the number of degrees of freedom. More specifically, regardless of the purpose, the metrics have certain properties. One might find it desirable to have a metric that is invariant with respect to frequency -- you want to produce the same numbers regardless of the sampling frequency. Al noted that the furthest you can go in the negative direction is directly related to the original spacing. Stanislav also noted that the problem of estimating variances of different quantities has been approached before, and it makes sense to look at solutions in wide use. Al asked for pointers, or for Stanislav to make some suggestions.

Ron Pashby noted that ATM defined this -- cell delay variation (two-point variation is similar to PDV). Carter Bullard said yes, but it was not measured in the same way. Carter also had a problem with the word "jitter": we should try to eliminate the use of the word and define our terms carefully, since there are many kinds of variation. With ATM, there is no way to measure both source and receive time, so cell delay variation is a single-point measurement -- really cell arrival rate variation, not delay variation.
Even in the industry, there are lots of problems trying to define jitter, so we should be a bit more elaborate in our definitions and use the word "jitter" sparingly.

Al returned to his slides, noting that this was now effectively a separate talk, looking at an experiment on spatial composition. Al reported on experiments that measured a congested T1 interface. The measurements show a uniform distribution at the congested interface. The end-to-end measurement got the expected result. They compared methods to take information from each distribution and predict the 99.9th percentile; some methods did pretty well, others did badly. The next draft from Al will have the details, and he observed that this sort of composition is only possible with PDV -- from this perspective, PDV seems to be the better of the two metrics.

7. Packet Burst Metric -- Roman Krzanowski

Roman Krzanowski presented some preliminary ideas on a packet burst metric. He stated that we have talked about bursts of losses, but never tried to define bursts of packets. This stems from a request by the Verizon operations team, who want to measure burstiness for voice and video. He also did a literature search and didn't find anything that fit precisely, although he mentioned a number of potential metrics (peak-to-average ratio, index of dispersion, Hurst parameter). Carter Bullard suggested perhaps measuring the inverse of burstiness. Roman is going to see if he can find enough people to write an initial draft. Henk thought the discussion should be brought to the mailing list.

8. AOB

Finally, Al Morton mentioned work going on in a new IPTV Interoperability Forum. They are very interested in multicast performance, and some of the multimetrics work might be useful there. They are also looking at comparing the value of active and passive measurements; they have continuous traffic in their live streams. They may be able to re-use our definitions.
However, there are also other metrics they will look to define that are specific to their environment -- IGMP join time, for example. There are plenty of users out there for multicast measurements. Henk asked that a pointer to this work be sent to the mailing list.