IP Performance Metrics WG (ippm)

Tuesday, 23 March 2010, 17:40--19:40

This session was chaired by Henk Uijterwaal and Matt Zekauskas. Barry Constantine scribed the meeting, and his notes were edited into these minutes by the Chairs.


  1. Administrativia (Chairs)
  2. Status of drafts and milestones (Chairs)
  3. Testing Standards Track Metrics (Al Morton for Ruediger Geib)
  4. TCP Throughput Testing Methodology (Barry Constantine)
  5. Long-Term Reporting Metrics (Morton)
  6. Loss Episode Metrics (Morton for Nick Duffield)
  7. AOB

1. Administrativia (Chairs)

2. Status of Drafts and Milestones (Chairs)

(see slides)

The TWAMP session control draft is with the IESG (the slide had this reversed); the reflect octets draft will go to last call shortly after the meeting.

The short-term reporting draft has been held up with the chairs. While writing up the shepherd note for the draft, the chairs discovered an error in the reordering paragraph: some text reflects an older definition, not the one that is cited or used in the example. The text is being revised; since this is a text change (although it does not change the intent of the draft), there will be a short working group last call before sending it on.

Given the current work schedule, we should be finished by the middle of next year. Thus, toward the end of this year we will be looking at charter revisions; if there is new work, please bring it forward.

3. Testing Standards Track Metrics


--Al Morton presenting

Al began by giving a short history of the draft, and then discussed the updates. In particular, we are trying to make sure the Standards Track specifications in the draft are sufficiently clear and unambiguous. The product of the work is clarified RFCs, not implementations; but you do need implementations to see whether the definitions are clear and produce comparable results.

There were two in-depth reviews of the draft, and the current version reflects changes based on those reviews.

Basic strategy: on real networks, use tunnels to force the flows from different implementations to take the same path and be treated similarly. Ensure the setup allows for multiple runs so the results can be analyzed statistically. The authors are currently examining candidates for tunnel creation.

Next steps: more comments and feedback are needed; then make this a working group document.

Yaakov Stein, after skimming the draft, asked whether the implementations would be better tested using a captured packet trace, which would give bit-exact answers. The difficulty is that this is active testing, and we don't operate on traces.

Yaakov also asked about tunnels, in particular Ethernet pseudowires. He didn't think you could guarantee that the packets would follow the same paths. In addition, if you are using a tunnel, aren't you testing the tunnel and not the live network? The tunnel is just a connection between two routers. Al responded that the tunnel is intended to be used on a live network, to ensure that the multiple test packet streams are affected equally by the live traffic, as much as possible.

Dave McDysan suggested that there should be back-to-back testing before going on a live network; equipment might forward the test traffic on a slow path, which could dominate the performance measurements.

Yaakov also mentioned that you might need to control more IP header fields than just the source and destination addresses and ports; these alone may not be enough to ensure fate sharing, depending on the hardware and the ECMP mechanisms in use.

Another participant commented that you could use something other than routers, for example 802.3ah over DWDM instead of tunnels and routers; some networks run 802.3ah continent-wide. Yaakov asked whether that meant 802.3ah as the bonding portion; the answer was yes. This would narrow the scope compared to the general Internet scenario.

Al also noted that some of the tests here are now laboratory tests; there are a few things that can only be done in the lab.

Yaakov also mentioned that an IP pseudowire (mentioned in the PWE control protocol draft) is another possibility to consider; it keeps the IP header in the tunnel, so it might make more sense than an Ethernet pseudowire.

4. TCP Throughput Testing Methodology


--Barry Constantine
(See slides)

Context: there is a need in the network operator space for TCP-based throughput testing. The TCP layer is a finger-pointing area, and it is starting to become an area of shared responsibility. Today, testing is ad hoc, and carriers want a repeatable process. Operators seem to have latched on to RFC 2544-style testing on operational networks, but this doesn't relate to applications.

Matt Zekauskas asked whether operators run 2544 testing on live networks, since it tends to drive networks to saturation. The answer was no: it is done during turnup, and when there are severe problems. So it is done on an operational network, but not on one carrying live traffic. Matt also noted that he was leery of recommending 2544 throughput testing, at least without a lot of context.

Yaakov Stein noted that 2544 covers things other than throughput, such as frame delay variation. Those other tests are run on live networks, and there are companies making money performing them.

Al noted that 2544 determines the throughput level by detecting loss, which is not a reliable measure on a live network with transients. Al believes the misuse of 2544 came from these techniques being added to Ethernet OAM recommendations from ITU SG 13 late in the process, without (much) review. Yaakov thinks the practice predates that, and that SG 13 was given it.

Barry noted that operators want to run TCP testing on "functional" networks; the 2544 testing was intended to make sure the network was functional first. The main point is to verify that L1, L2, and L3 are working before running L4 tests. Al felt that IPPM techniques could be used instead of 2544 for prequalification; perhaps the draft could state this step more generically.

The goal is to measure maximum sustained throughput. Operators often don't understand the bandwidth-delay product (BDP) and RTT, and they want to predict what should be achievable; this methodology is meant to do that. It is a sequential process that builds up to measuring throughput: checking the MTU, finding the BDP, and estimating the maximum throughput from it. (See slides.)
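The throughput-prediction step can be illustrated with the usual bandwidth-delay arithmetic. This is a minimal sketch, not the draft's procedure: it assumes a single TCP stream whose rate is capped by the smaller of the bottleneck rate and one window per RTT, and it ignores loss, slow start, and header overhead. The function names and the example path (100 Mbit/s, 20 ms RTT, 65,535-byte window) are hypothetical.

```python
def bdp_bytes(bottleneck_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    return bottleneck_bps * rtt_s / 8


def max_tcp_throughput_bps(bottleneck_bps: float, rtt_s: float, window_bytes: float) -> float:
    """Upper bound for one TCP stream: at most one window per RTT, never above the bottleneck."""
    window_limited_bps = window_bytes * 8 / rtt_s
    return min(bottleneck_bps, window_limited_bps)


if __name__ == "__main__":
    # Hypothetical path: 100 Mbit/s bottleneck, 20 ms RTT, 65,535-byte window.
    print(f"BDP: {bdp_bytes(100e6, 0.020):.0f} bytes")  # BDP: 250000 bytes
    rate = max_tcp_throughput_bps(100e6, 0.020, 65_535)
    print(f"Max throughput: {rate / 1e6:.1f} Mbit/s")  # Max throughput: 26.2 Mbit/s
```

Here the window is far smaller than the 250,000-byte BDP, so the stream is window-limited well below the 100 Mbit/s bottleneck, which is the kind of expectation an operator would want to have before testing.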

A frequent use case is understanding single-stream TCP performance for transferring large medical images. Part of the draft helps operators understand what tools are available and what affects the TCP transfer rate. The parallel stream tests try to find out whether parallel streams synchronize due to FIFO queuing versus RED. The effects of background traffic on TCP are less well thought out at this point. The authors are getting feedback from network providers to make sure the draft is on track.

Dave McDysan wanted to better understand what an operator should get from this. The response was that the document helps educate operators on TCP and on different queuing effects; many are burned by synchronized streams. Customers are testing the networks and refusing service; this gives operators a way to do further testing and understand whether there is a problem in their network. Dave thought that adding examples of how to interpret the test results (along the lines of the parallel stream example) would be useful.

Al thought what was here was useful. However, SLAs are mentioned in the draft, and that might be a trickier area to cover. Supporting a meaningful SLA requires more factors than would be measured with a standardized TCP sender/receiver, and those factors are difficult to standardize; comparisons quickly become more tenuous. In addition, under the charter, IPPM's metrics cannot determine pass/fail, but helping with diagnostics is OK. Also, with respect to varying the MSS, we want to be careful about hitting packet-rate limitations if the MSS gets too small. Barry stated that the intent was to find the maximum MTU without fragmentation, and that would be the minimum MTU to use later.
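Finding the largest MTU that passes without fragmentation amounts to a search over packet sizes. A minimal sketch, assuming a hypothetical probe callback and a well-behaved path (every size up to the path MTU passes, every larger size fails); the 68-byte floor is the IPv4 minimum MTU, and the 9000-byte ceiling is an assumed jumbo-frame limit. Real tools, including RFC 4821 packetization-layer PMTUD, must also cope with probe loss unrelated to size, which this sketch ignores.

```python
from typing import Callable


def find_max_mtu(probe_ok: Callable[[int], bool], low: int = 68, high: int = 9000) -> int:
    """Binary-search the largest packet size that crosses the path unfragmented.

    probe_ok(size) is a hypothetical callback that sends a probe of `size` bytes
    (e.g. with DF set) and returns True if it arrived. Assumes probe_ok(low) is
    True and that every size below a working size also works.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe_ok(mid):
            low = mid  # mid fits; the answer is at least mid
        else:
            high = mid - 1  # mid is too big; the answer is below it
    return low


if __name__ == "__main__":
    # Simulated path whose MTU is 1492 bytes; no real packets are sent.
    print(find_max_mtu(lambda size: size <= 1492))  # 1492
```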

Al wanted to know what support you would expect from the sender and receiver: any host, or a cooperating one? He thought it would be hard to do at gigabit speeds without a dedicated tester. There was going to be language about verifying the tester, which should get into the draft. Might also want to mention what speeds you think

Al also noted that equilibrium, the state you are operating in, is important to define properly. The response was yes; the authors might work with Matt Mathis on that.

Yaakov noted that the MEF is doing a lot of work on SLAs. This draft is written to be more diagnostic: if a customer notes a problem, what can you do to understand whether it is your problem? SLA tests, by contrast, run automatically in the background all the time to collect information. Is this something you can set up and keep in a MIB? No; it is meant to be a turnup test and diagnostic. Today, technicians are sent out to do testing, and operators want an additional level of assurance. Yaakov also noted that one of the ways 2544 is used on operational networks (where the tests are run "all the time") is to check things like delay over time; if delay creeps up over time, that is meaningful and worth looking into. This isn't for SLA conformance. Barry noted that operators often call the turnup phase "testing the SLA". Yaakov thought that the term "SLA" should be removed from this document. Al noted that if you aren't at the maximum offered load for throughput, you aren't testing latency according to 2544. So this is RFC 2544-like, made up as you go, and not standards work.

Kevin Lahey said he was a fan of RFC 4821 path MTU discovery; however, if doing this at turnup, wouldn't knowing that ICMP-based PMTUD was broken be useful? Barry said that the security folks have their own design for the network, and we can't change it; if that is the network operator's security policy, it is not clear what you can do about it.

A comment from an ex-operator: if you are running over MPLS on an existing network, how do you do 2544 throughput testing when part of the network already carries customers who would be affected? Barry said that was a good point, and he doesn't have an answer now.

Henk said that we would take the question of making this a working group draft to the list. There seems to be sufficient interest here, and a number of WG members are already involved. Al wondered what kind of document this would be; it didn't seem like it should be a standards-track document. Perhaps start as experimental?

Dave McDysan said he was aware of application-level contractual agreements, so he agreed that this was a diagnostic, not an SLA verifier. If it is only diagnostics, does it fit into IPPM? He thought it should perhaps be informational rather than experimental. Yaakov agreed that it should be informational: we have SLAs, and can say whether an SLA is met; it would be great to have an informational document to help diagnose problems when the customer is still unhappy. Al liked the experimental tag, because he felt it encouraged more experimentation to gain experience and make something standards track (or historic).

Lars Eggert said that the track depends on what is being developed. If this is diagnostics, it is not clear how it would advance along the standards track. What is the metric? As others have mentioned, TCP has so many variables that it is hard to compare results without recreating the entire environment. Under the rules being discussed for advancing metrics, this couldn't move forward. There seems to be interest; it is not core to the charter, but the group has done things like this before. Lars' preference would be informational; we could make it experimental if we can show when it might go to standards track.

5. Long-Term Reporting Metrics


--Al Morton
(See slides)

Al gave a short background on this reporting draft and a summary of the updates. The newest material relates to "raw capacity" and utilization, and how to treat their variation over time. Page six of Al's slides has questions for the group. Please read and comment on the draft.

Dave McDysan noted that the current draft doesn't use the term SLA, but the slides did. There is a point in the short-term draft relating to long-term measurements and unavailability, equating unavailability with loss. It would be good to include that thought in this draft and make clear that it relates to SLAs. Availability is another measure that isn't a metric, but it would be helpful to have here, to tie back to practical SLAs. Dave will post some of this to the list. In any event, the short-term and long-term reporting documents should be consistent.

Al noted that this group has a connectivity metric, not an availability metric; it is simple. He also said that the ITU has an availability definition, and it is very forgiving, with 5-minute windows. Industry-wide, definitions seem to be harmonizing on the 10-second window used in transport. The thought was that when Y.1540 (IP Performance Metrics) is revised, it would change to the 10-second mechanism. In the Internet, other things can be going on relative to transport, such as route flapping. Dave just thought availability should be mentioned here, with some reference to (and harmonization with) Y.1540.
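The 10-second mechanism can be sketched as follows. This assumes the convention used in transport networks (for example, ITU-T G.826): an unavailable period begins at the onset of ten consecutive severely errored seconds (SES), which themselves count as unavailable, and ends at the onset of ten consecutive non-SES seconds, which count as available. How a second is judged severely errored is left as an input; the function name is hypothetical, and this is not the Y.1540 text.

```python
def classify_unavailable(ses: list[bool], run: int = 10) -> list[bool]:
    """Return, for each second, True if it falls in an unavailable period.

    ses[i] is True when second i was severely errored (an assumed input).
    A state change needs `run` consecutive seconds of the opposite kind,
    and those seconds are credited to the new state.
    """
    unavailable = [False] * len(ses)
    in_ua = False  # start in the available state
    i = 0
    while i < len(ses):
        flip_to = not in_ua
        window = ses[i:i + run]
        # Available -> unavailable needs `run` SES; unavailable -> available needs `run` non-SES.
        if len(window) == run and all(s == flip_to for s in window):
            in_ua = flip_to
            for j in range(i, i + run):
                unavailable[j] = in_ua
            i += run
        else:
            unavailable[i] = in_ua
            i += 1
    return unavailable


if __name__ == "__main__":
    # Hypothetical trace: 5 good s, 12 SES, 3 good, 1 SES, 10 good.
    trace = [False] * 5 + [True] * 12 + [False] * 3 + [True] + [False] * 10
    print(sum(classify_unavailable(trace)))  # 16
```

The three good seconds after the SES run still count as unavailable, because they do not start a run of ten non-SES seconds; conversely, an isolated SES on its own never makes the path unavailable.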

Ali from the audience mentioned that some applications are looking for millisecond convergence, so 10 seconds may be too long. Al thought that one could count events on a small time scale, but that most service providers would have a single definition of availability, with a longer period.

On "availability", Yaakov noted that very short times are not traditionally called availability. If talking about really short times, call it packet loss, and perhaps characterize it with the parameters of a Gilbert model. Transport providers understand availability to refer to longer durations. George Bullis noted that for video, ten-second outages are long.

Al said that he would post his questions to the list.

6. Loss Episode Metrics


--Al Morton

Al introduced the latest changes, starting with the title: "loss episodes" instead of "burst loss" (as a result of some group comments). There is IPR associated with the draft (https://datatracker.ietf.org/ipr/1126/). Al thinks the draft should be adopted as a working group item, but asked people to read and comment. The current version incorporates revisions based on a couple of reviews by group members. In addition, some of the ideas have been incorporated into ITU-T work on stream repair metrics. See the slides for a summary of the document and changes.

IPR was called out again, and the disclosures have been made; this is similar to some PSAMP work that has progressed.

Yaakov Stein asked about Section 7: can you uniquely retrieve Gilbert model parameters from this? Gilbert-Elliott is meaningful for much of his work, and getting the parameters would be useful. Is this for special cases, or in general? Nick Duffield should respond, but he had just been dropped from Jabber.

Yaakov asked what is wrong with the parameters of Gilbert-Elliott, since people use it all the time. Al thinks that this is a Gilbert model, not Gilbert-Elliott: Gmin is set at 0, and it requires consecutive loss (a two-state versus a four-state model). Yaakov said he would prefer Gilbert-Elliott for timing flows and for quality of experience of real-time traffic. Al thought that for many purposes the two-state model is enough. He will talk with Yaakov later, take it to the list, and clarify as necessary.
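Yaakov's question of whether Gilbert parameters can be retrieved has a simple answer in the most basic two-state case. As a hedged illustration, not the draft's Section 7 method: if the good state never loses packets and the bad state always does, the loss trace directly reveals the state sequence, so the transition probabilities p (good to bad) and r (bad to good) are just conditional frequencies, and 1/r is the mean loss-episode length. The function name is hypothetical. In Gilbert-Elliott, where each state has its own loss probability, the state is hidden and the parameters would need a hidden-Markov fit (for example, expectation-maximization), which this sketch does not attempt.

```python
def estimate_simple_gilbert(losses: list[int]) -> tuple[float, float]:
    """Estimate (p, r) of the simplest two-state Gilbert model from a loss trace.

    losses[i] is 1 if packet i was lost, else 0. Assumes the good state never
    loses and the bad state always does, so the state sequence is the trace:
      p = P(next lost | current received)   (good -> bad)
      r = P(next received | current lost)   (bad -> good)
    """
    good = good_to_bad = bad = bad_to_good = 0
    for prev, cur in zip(losses, losses[1:]):
        if prev == 0:
            good += 1
            good_to_bad += cur
        else:
            bad += 1
            bad_to_good += 1 - cur
    p = good_to_bad / good if good else 0.0
    r = bad_to_good / bad if bad else 0.0
    return p, r


if __name__ == "__main__":
    trace = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]  # hypothetical trace with two loss episodes
    p, r = estimate_simple_gilbert(trace)
    print(f"p={p:.3f} r={r:.3f} mean episode length={1 / r:.1f}")
    # p=0.333 r=0.667 mean episode length=1.5
```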

Henk will confirm adoption of this as a working group document on the list.

7. AOB

There was no other business, and the Chairs closed the meeting.