IP Performance Metrics WG (ippm)
Monday, July 10, 2006 - 15:20--17:20
====================================

The meeting was chaired by Henk Uijterwaal and Matt Zekauskas. Al
Morton took notes, which were edited into these minutes by the chairs.

AGENDA:

1. Administrativia
2. Status of drafts and Milestones
3. BarBoF on application identification
4. TWAMP status and implementation
5. Traceroute draft
6. Reporting Metrics:
   a) The Reporting Metrics draft
   b) Different Points of View
7. Composition Framework and Spatial Composition drafts
8. The Capacity Definitions Draft
9. Packet Burst Metric
10. Registry BCP draft
11. AOB

2. Status of drafts and Milestones -- Henk Uijterwaal & Matt Zekauskas

Reordering and OWAMP are with the RFC Editor. The Jitter Applicability
Statement was due in January 2006.

3. BarBoF on application identification -- Mark Allman

IMRG is trying to spin up a couple of activities. First, in the
measurement community there has been a lot of work on identifying
flows going by a monitor (without using port numbers). Is there enough
here to hold a workshop? Second, there is the bandwidth capacity
definitions draft in IPPM. IMRG held a bandwidth estimation workshop a
few years ago, and there has been a lot of work in the area since. Is
it time to hold another workshop? Are there things mature enough to be
handed over to IPPM for standardization? If you have an opinion on
either topic, either post to the IMRG list or talk to Mark.

4. TWAMP status and implementation [draft-ietf-ippm-twamp-01]

a. Draft update -- Kaynam Hedayat

Minor changes were made for this -01 version: a clarification on MSL,
and a clarification of TWAMP Light based on sequence numbers. There
are two public implementations underway, at Brix and Allied Telesyn,
and there are others the author knows about under NDA. Next steps:
update the security section based on OWAMP experience, and get wider
review from the WG before submitting for publication.

b. Looking at implementations of TWAMP in an industrial setting
   -- Roman Krzanowski

The context is in-channel, per-customer performance monitoring.
Canoga Perkins currently has a proprietary "ICMP+" protocol, which
looks at latency, jitter, and frame loss on VLANs among approximately
100 locations. They have a rich implementation of results and
reporting. They are looking at implementing TWAMP as an open protocol,
and will coordinate their implementation with Brix to ensure
interoperability. The added value of TWAMP is having an open protocol
for testing, with a common way of evaluating network performance.
Reporting is to a single manager, from the sending device (the other
device is a reflector).

Emile Stephan asked how many nodes were in service; the answer was in
the high tens, below 100. Emile also asked whether the desired
interoperability was in the probes or to the manager. The answer was
that probe interoperability was most useful; the current target is a
metro Ethernet setting. Emile also asked, if two probes were doing
measurements, where they would report results. Right now the goal is
for the reflector to be interoperable; the active testers will report
back to a single manager. Kaynam noted that TWAMP does not address
provisioning and reporting; there are many (proprietary) ways to do
that. Roman added that in their deployment there were four levels of
software to aggregate and collect data.
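As a minimal illustration of the two-way arithmetic that TWAMP's
reflected timestamps enable (a sketch only, not the TWAMP wire format;
the timestamps and names below are hypothetical), the reflector's
receive and transmit timestamps let the sender remove reflector
processing time from the measured round trip:

    # t1: sender transmit time     t2: reflector receive time
    # t3: reflector transmit time  t4: sender receive time
    def two_way_delays(t1, t2, t3, t4):
        """Return (round-trip delay, reflector processing time)."""
        processing = t3 - t2                  # time spent inside the reflector
        round_trip = (t4 - t1) - processing   # RTT with reflector time removed
        return round_trip, processing

    rtt, proc = two_way_delays(t1=0.000, t2=0.012, t3=0.013, t4=0.026)
    print(f"rtt={rtt:.3f}s processing={proc:.3f}s")   # rtt=0.025s processing=0.001s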
5. Traceroute draft draft-ietf-ippm-storetraceroutes-00
   -- Saverio Niccolini

This version addressed several technical issues, including adding MPLS
and AS information, along with many editorial updates. The draft still
needs to cover conflicts and differences between RFC 2925 and the
draft, and the schema needs to be tested for correctness. The next
version will be available in two months. The main concern is the lack
of GGF review, and the possibility that the GGF will not accept or
adopt the same draft (adoption by both was the goal). Lars Eggert
noted that we could go through the IETF liaison to the GGF and present
an official request for feedback on the document. [After the working
group meeting, the chairs informally talked with the GGF liaison,
David Chadwick. Matt knows one of the co-chairs of the GGF group, so
he will pursue chair-to-chair interactions first.]

6. Reporting Metrics:

a) draft-ietf-ippm-reporting -- Matt Zekauskas, presenting for
   Stanislav Shalunov, who could not attend

Stanislav created this document in response to being asked "what to
report" for on-demand measurement tools whose results would be
interpreted immediately by people. The scope is intended to cover just
on-demand measurements, not long-term measurement studies or ongoing
active measurements. The document specifies a small set of orthogonal
metrics that are robust and easy to understand: delay, loss, jitter,
duplication, and reordering.

Phil Chimento (via jabber) raised a question about the two-second
timeout and the wide statistical confidence intervals resulting from a
small sample; he felt the timeout might be too short, and the sample
too small, to get meaningful results. Emile suggested that the timeout
may instead be too long -- perhaps timeouts should be shorter if a
human is waiting -- and that it is hard to set a single default value.
Emile is also concerned that if we specify a single default value for
this use, people will use it for other things. Al Morton felt that the
jitter definition is not consistent with IPDV in RFC 3393: lost
packets are excluded from processing in RFC 3393. Al also felt that
the paraphrased reordering definition presented in this draft was hard
to understand, and that the one in the working group reordering draft
should be used instead. Matt Mathis said he liked the idea of
user-presentable statistics with standard parameters; however, he
strongly encourages using existing standard statistics and parameters,
and in particular noted the apparent change in the delay/jitter
definition. He would have the document present a restricted subset of
existing metrics to users.

b) Different Points of View draft-morton-ippm-reporting-metrics-00
   -- Al Morton

Or, how to "run with scissors". There was a recent discussion on the
mailing list about means, with one participant advocating "mean delay
considered harmful". Al felt that this was an extreme position, and a
difficult place to start a productive discussion. Al considered the
various audiences of reported results and their different points of
view as an alternate way to approach the topic of reporting. As
background, Al made the observation (in his IETF-65 talk on delay
variation) that how you want to use the metrics controls how you want
to set the parameters of the metrics. There are choices Stanislav has
made in his draft, and his choices differ from the two primary ways to
measure delay variation based on RFC 3393.
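As a minimal sketch of those two instantiations (hypothetical sample
data; the delay differences are commonly taken either between
consecutive packets or against the minimum delay, with lost packets
excluded -- skipping over a loss, as done here, is a simplification):

    delays = [0.021, 0.025, None, 0.030, 0.022]  # one-way delays (s); None = lost
    arrived = [d for d in delays if d is not None]

    # Variant 1: delay variation between consecutive arrived packets.
    inter_packet = [b - a for a, b in zip(arrived, arrived[1:])]

    # Variant 2: delay variation referenced to the minimum delay seen.
    d_min = min(arrived)
    min_referenced = [d - d_min for d in arrived]

    print(inter_packet)     # approx. [0.004, 0.005, -0.008]
    print(min_referenced)   # approx. [0.0, 0.004, 0.009, 0.001]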
Al wrote this draft based on how different audiences might use the
metrics, and thinks that it might become an applicability statement.
Al saw Stanislav's draft on the list, and we have just talked about it
in great detail; we know it is about short-term reporting. Al would
have made different choices, and he thought about why: different
circumstances, such as longer-term measurement or different audiences
for results, tend to favor alternate parameter choices. This draft
also addresses comments on the composition framework raised at the
last meeting, mostly having to do with truncating the delay
distribution, which makes classical statistical manipulations (such as
the addition of expected values) invalid. Al tried to address that in
the context of this talk as well. All of the IPPM metrics have
multiple options, and these make the registry that we standardized
less effective, as there is no simple way to report which options were
used in a measurement. This issue turned up in the course of
finalizing the reordering draft. Emile noted that there was no way to
distinguish the ways the measurements were made.

Al felt that there are two key points of view when asking how the
results will be used: (1) network characterization: "How am I doing
with respect to a network SLA?" This is verification of data delivery.
(2) The designer of an application (or something else above the
transport layer): "What happened in my stream?" "How is the network
going to affect the thing I am responsible for above the transport
layer?"

For example, look at the loss threshold parameter. It must be selected
to differentiate a long but finite delay from true loss. If you are
doing real-time reporting, the threshold must be small. In other
venues, however, you can make it very long. You can use a long waiting
time and still post-process the results to represent shorter waiting
times if desired. You avoid truncated distributions by setting the
waiting time long enough to avoid declaring a packet lost while it is
still in transit; this is possible if you combine the TTL with
worst-case assumptions about link delays and queuing delays at each
hop. You don't want to use something like a 200-millisecond timeout in
practice, because you don't want to throw data away.

Al further noted that we (IPPM) designate errored packets as lost:
even if a packet arrives corrupted in some way, but with enough
information to tell that it is part of a test stream, we designate it
as lost. Al would rather include such packets when determining delay
statistics, but he is willing to call them undefined. Lars noted that
if an errored packet has errors in its headers, then you can't tell
what stream it belongs to. Al said that was an important detail, but
he wanted a distinction between packets that arrive sufficiently
intact to identify and those that don't arrive prior to the loss
timeout.

Next, he considered calling a lost packet "undefined" versus giving it
"infinite delay". Al thought it should be undefined; in fact, in the
delay variation RFC, lost packets are excluded. If you are considering
application performance, you just care about what arrived within a
timeout, and want to know the delay of those packets that arrived. He
said that he preferred counting only the delay of packets that
arrived; counting packets that did not arrive as having infinite delay
is in some sense double-counting network impairments (as both loss and
delay). For the spatial composition draft, Al defined
Type-P-Finite-One-way-Delay, and asserted that it is "consistent with
the one-way delay RFC". The future of the composition work is
influenced by what we'd like to do here.
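A minimal sketch of the conditional ("finite") delay statistics
described above, with hypothetical data and a hypothetical timeout:
delay statistics are computed only over packets that arrived before
the loss timeout, and loss is reported separately rather than as an
infinite delay.

    from statistics import mean, median

    LOSS_TIMEOUT = 3.0   # s; long enough not to declare in-transit packets lost
    raw = [0.020, 0.024, None, 0.031, 0.022, None]   # None = never arrived

    finite = [d for d in raw if d is not None and d <= LOSS_TIMEOUT]
    loss_ratio = 1 - len(finite) / len(raw)

    print(f"finite mean   = {mean(finite):.4f} s")    # approx. 0.024 s
    print(f"finite median = {median(finite):.4f} s")  # 0.0230 s
    print(f"loss ratio    = {loss_ratio:.0%}")        # 33%; counted once, as loss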
If we cannot use conditional delay distributions, it is not clear that
we can compose delay metrics. In addition, Al noted that he prefers
classical statistics on delay; the "sample mean" is almost ubiquitous
-- it is on everyone's web site. There is some robustness if you use a
long loss timeout, and the crowds consider the mean useful, not
harmful. Robust statistics have their own strengths, but also their
own weaknesses, and reporting both mean and median delay can be
useful. Thus, Al likened using a conditional delay distribution with
classical statistics to running with scissors: understand the
weaknesses and issues, be careful, and you can compensate for the
weaknesses. You can also use the median to describe delay
distributions, with some care; since the median has different
properties, comparing it with the mean can be useful. Henk noted that
on slide 10 the reported delay, and the CDF of delay, are for arrived
packets only, and that is the conditional distribution. (It is labeled
a conditional CDF on the slide.) Al summarized, on slide 14, his
recommendations for metric parameters and options based on his study
of these two audiences for metric reporting.

Matt Mathis stated that for both this and the previous talk, he feared
we are overlooking something else: an assumption about what sample
sizes are appropriate. Users often use sample sizes orders of
magnitude too small. The relevant loss scales are very long for
transport performance, and short-term measurements just aren't
relevant. We need to say more about sample size in general; "large"
just isn't enough information. People in this room have assumptions
about a "normal" sample size, those notions span many orders of
magnitude, and numbers that are appropriate for one community are not
appropriate for another.

Al went back to his presentation. The points he is trying to drive
home are that

* there is a range of reporting problems to be solved, and short-term
  reporting is just one, and
* the settings of various parameters and options should be based on
  the end use for the reported measurements (the use case).

We can go through some use cases and recommend specific parameters per
case in this draft, staying away from what Stanislav has done in his
draft on short-term reporting. Al's biggest concern is that while
Stanislav's document focuses on the short-term reporting of on-demand
measurements, it might be used as a building block for longer-term
measurement and reporting, where some of its assumptions (such as the
two-second timeout or on-the-fly calculation) don't apply. The
document is fine for what it is, but he doesn't want to be shackled
with those assumptions for long-term reporting or other aspects of
measurement.

7. Composition Framework and Spatial Composition drafts
   a) draft-ietf-ippm-framework-compagg-00
   b) draft-ietf-ippm-spatial-composition-00
   -- Al Morton

Al briefly talked about the framework for composition of metrics and
the spatial composition drafts. Stephen Van den Berghe may not be able
to help out any more, as his job has changed; Matt noted that he has
some ties to the GEANT2 JRA1 group working on measurement for the
GEANT2 network, which Stephen was working with, so there is a
possibility of finding another editor from that group. For the
composition draft, we defined a finite one-way delay (which relates to
the "different points of view" presentation earlier); we looked at RFC
3393 (delay variation) in order to do it.
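A small worked sketch of the composition point at stake (hypothetical
per-segment delays, assumed independent): end-to-end delay is the sum
of the segment delays, and the mean of a sum is the sum of the means,
but the median of a sum is, in general, not the sum of the medians.

    from itertools import product
    from statistics import mean, median

    seg_a = [1, 2, 9]   # ms, segment A delays, equally likely
    seg_b = [1, 2, 9]   # ms, segment B delays, independent of A

    end_to_end = [a + b for a, b in product(seg_a, seg_b)]

    print(mean(seg_a) + mean(seg_b), mean(end_to_end))        # means agree: 8 == 8
    print(median(seg_a) + median(seg_b), median(end_to_end))  # medians differ: 4 != 10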
Al noted the closed issues now reflected in the document, and that
decomposition (inferring the internals of the network from end-to-end
measurements) is not going to be included, based on the previous
discussion. There is still a lot of redundancy in this draft; that
will be tackled in the next revision, along with adding composed
metrics that are more than just averages, as requested at the last
meeting. However, Al feels the work is in jeopardy: if we can't agree
on working with conditional distributions and finite delays, then we
won't have means (and you cannot add medians). Henk asked people to
re-read the drafts and post comments to the mailing list.

8. Bandwidth Draft -- Joe Ishac

Joe gave a brief report on the bandwidth definitions draft. The
current version has been available since late June. It has been almost
a year since the initial draft, and we have had good discussion on the
list. There is one open issue: whether to incorporate the notion of
"Type-P" into the draft. Joe and Al Morton will be meeting to discuss
the issue. If that does not result in any major changes, the draft is
basically stable and ready to go to WGLC.

9. Packet Burst Metric -- Roman Krzanowski

Roman gave a presentation on the idea of a packet burst metric; he
would like to gauge community interest in whether the work should go
forward. A packet burst metric would be good to have when dealing with
networks used for voice over IP or video. Right now, there is no
common definition of "burst". This is a high-level proposal to develop
metrics to measure packet bursts. It is not very complex, but it would
be good for the community to agree on common definitions. In
particular, Roman is not interested in developing a new definition,
but rather in providing an agreed-upon standard definition. There is
depth to the problem, and opportunities for extension and application.

Lars Eggert asked whether the inter-arrival time is varying or fixed
in this proposal. Roman stated that he has seen definitions with
varying inter-arrival time, but believes we should start with fixed
times, based on the application that would be affected by the burst.
Mark Allman asked whether this is per-flow or aggregate. The initial
thinking is per-flow, which is Roman's background. He reiterated that
burstiness is not a new thing; there is a lot of literature about
bursts. He has a two-dimensional view of bursts in which arrival time
is plotted against inter-arrival time, and a threshold is set to
distinguish between long and short inter-loss times. Further research
includes more dimensions and second/third-order statistics, including
fractals. Matt Mathis noted that the bursts seen in TCP are caused by
cross traffic, and if you measure with a different TCP, you will see
different burst characteristics; if the metric is application
oriented, it may not fit with network measurement. Roman said he has
seen a paper on this. However, the fundamental issue here is defining
the burst -- what we mean by "burst". Roman will create a draft based
on these ideas and send it to the list for comment.
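A minimal sketch of one plausible reading of the fixed
inter-arrival-time approach (hypothetical arrival times and
threshold): consecutive packets whose inter-arrival gap stays below
the threshold are grouped into a single burst.

    BURST_GAP = 0.005   # s; would be chosen from the affected application
    arrivals = [0.000, 0.001, 0.002, 0.050, 0.051, 0.120]  # arrival times (s)

    bursts, current = [], [arrivals[0]]
    for prev, t in zip(arrivals, arrivals[1:]):
        if t - prev <= BURST_GAP:
            current.append(t)    # gap is small: same burst
        else:
            bursts.append(current)
            current = [t]        # gap is large: start a new burst
    bursts.append(current)

    print([len(b) for b in bursts])   # burst sizes: [3, 2, 1]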
10. Registry BCP draft draft-stephan-ippm-reporting-registry-00
    -- Emile Stephan

Emile posted a draft that just missed the -00 cutoff, but sent email
to the list with a pointer. He noted that the recent composition
drafts have prompted another look at the registry. He felt a new
version must capture all the parameters and options that are defined
in our RFCs; it could also store use cases for reporting. Emile has
some ideas of how to do this, and some examples. The WG is asked to
read the draft and comment.

11. AOB

There was no time for "any other business", although there was a
request by Jerome Durand to make a short presentation on what is being
done for network measurement in GEANT2 (the European research
network). Interested people can go look at http://www.geant2.net/
(select Research from the top, then Performance and Measurement on the
left), http://www.perfsonar.net/ , and
http://wiki.perfsonar.net/jra1-wiki/index.php/JRA1_Main .