DRAFT DRAFT DRAFT DRAFT DRAFT

IP Performance Metrics Futures BOF (ippm++)
Wednesday, July 30, 2008, 09:00--11:30
===========================================

This session was chaired by Henk Uijterwaal and Matt Zekauskas, and
Paul Aitken was the scribe.

AGENDA:

1. Administrative details.
2. What did IPPM do so far (Chairs)
3. Possible future work items.
   a) SLA validation (Joel Sommers/Nick Duffield, Al Morton showing
      the slides).
   b) Passive measurements (Yutaka Kikuchi)
      (draft-kikuchi-passive-measure-02.txt and
      draft-kikuchi-tunnel-measure-req-02.txt)
   c) Comparison of metrics
   d) Other possible work items
4. Discussion.
5. How to proceed from here.

1. Agenda

* Introduction
* Goals and non-goals
* Input from the list

2. What we did so far

* 10 years of work
* Work in progress
* Future of the group - work that never got done

Matt: We are in the transport area, looking at metrics. PMOL is
looking at other metrics. We are talking about transport and about
measuring the IP network that we've got.

* Passive measurements
* SLA monitoring
* Moving metrics along the standards track

Lars Eggert: I thought there was a suggestion from Al about what to do
if results would be equivalent. Someone sent something to the list...

Henk: There were a couple of postings from Ruediger Geib from Deutsche
Telekom.

Lars + Henk: We should just do that.

Al Morton: I always thought that the draft that listed all the
implementations was somewhat useful, and the way I thought it useful
was that people compared their implementations against the
specifications - really comparing the implementation to the wording in
the specs. Advancing the RFCs along the standards track means the
product we're focusing on is the specification itself. There are two
goals: how accurately and well the implementations compare, and as a
by-product, the quality of the specifications themselves. It is more
direct to look at the specs and ensure implementations meet what the
specs say. E.g., the spec says declare loss at 3s, but an
implementation declares loss at 3.1s of delay - does that
implementation meet the spec? Things along those lines feed back into
the spec; if someone has misinterpreted the spec, that feeds back too.
Checking the equivalency of the measurements compares implementations
against each other rather than as embodiments of the specs. There are
very specific things we could check without getting into the
statistical equivalence of the implementations. Comparison of
implementations against the specs is the most direct thing we can do
to move along the standards track.

Matt: Should some test vector be part of...

Al: I think it makes more sense if people are adopting our
standardised test formats. People have to build their own packets to
meet our specs. Test vectors make a lot of sense; store them some
place people can use.
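
[Scribe's note: a minimal illustration of the test-vector idea above.
This is a sketch only - the vector format, field names, and the 3.0s
waiting time are assumptions for illustration, not anything IPPM has
defined - showing a conformance check against a spec-style loss
threshold (in the RFC 2680 sense, where a packet arriving after the
waiting time counts as lost):]

    # Hypothetical test vector: (send_time, recv_time or None,
    # loss decision the implementation under test reported).
    LOSS_THRESHOLD = 3.0  # seconds; the spec's "waiting time" parameter

    test_vector = [
        (0.0, 0.8, False),
        (1.0, None, True),   # never arrived
        (2.0, 5.1, False),   # arrived 3.1s after sending
    ]

    def spec_says_lost(send_time, recv_time):
        # Lost if it never arrives, or arrives after the waiting time.
        return recv_time is None or (recv_time - send_time) > LOSS_THRESHOLD

    for send, recv, impl_lost in test_vector:
        if spec_says_lost(send, recv) != impl_lost:
            print("mismatch at send time %gs: spec says lost=%s, "
                  "implementation says lost=%s"
                  % (send, spec_says_lost(send, recv), impl_lost))
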
David Dyson, Verizon: Performance monitoring is particularly important
to enterprise customers. When we put performance monitoring networks
out there, they only implement a subset of the metrics you define. We
can only get one vendor to work with another. I'd like to see a subset
as a potential future activity: a profile against the specs, with more
precise requirements ("you must support ...") and more reports. That
is what I would be more interested in. Across the Internet,
performance monitoring is becoming more important - multi-vendor and
multi-provider. There are proposals to turn off traceroute and ping in
MPLS networks, which are used to diagnose latency; we need a
replacement for diagnosis in those networks.

Henk: The draft which has those things has expired. When I put it
together I asked a bunch of questions.

Rudiger: I'm not sure whether it's good to have those comparisons. I'm
cooperating with a statistician inside my company on internal tools;
we make measurements with different average sending rates. Looking at
inter-domain measurements with different boxes: if you start something
across different provider networks, you need some understanding of the
error the other system provides and the error your own system
provides.

* How to contribute
* Questions/Discussion

3. SLA Monitoring/Nick Duffield (Al Morton presenting)

NB: There may be IPR associated with this material.

* Motivations
* Leverage recent research advances?
* Metrics under consideration here
* Limitations of Poisson probing of loss
* New probing methods for loss and loss episodes
* One-way delay: mean delay
* One-way delay variation: delay quantiles
* One-way delay: delay variation
* Summary

Rudiger: I have read some of the publications. They are doing quite
good work. I want to be clear about the IPR and underlying issues. If
this is clear, we could decide whether it should become a WG item. I
want to know about the IPR before this could be standardised.

Lars Eggert: The WG needs to understand the terms; if the terms are
acceptable to the group then this can become a WG item.

LE: A clarifying question for Al: are we adding to the metrics or
obsoleting some of them?

Al: The answer is "adding to them".

Kaynam Hedayat: I think this is a very good idea. Our customers tell
us what the metrics are; this brings clarity into the industry. We
also want the IPR clarified.

LE: Sometimes folks wait; they don't want to go through the pain of an
IPR disclosure. This is the one thing blocking the WG from considering
this item. The ball is in their court.

Joel, via jabber: I cannot respond on the IPR as yet.

4. Passive Measurements/Yutaka Kikuchi

* Note
* Contents
* Background
* A case: TSP service
* Problems
* IPPM metrics

YK: Are these only for active metrics?

Matt: You can use them in a passive way; the basic ideas are still
applicable. We've not shown how to do that yet. I'm curious to know
what you think about how to make them less generic, e.g., profile
which pieces of the metric apply to the network you're measuring. It
gives you a real end-to-end view of the performance level along this
path.

YK: Some subset can be used for passive measurement, but it is too
abstract for combined measurement.

* Motivations
* Requirements
* IPPM docs help...
* Need more help with...
* Fermata

YK: That is the end of the first part. The rest of the slides are for
your information.

* Note
* Possible solutions
* Synopsis (in-sequence)
* Definition
* Synopsis (loss, dup, reordering)
* Definition
* Measurement method
* Detecting skipping packets
* Detecting dup-trains
* Detecting astern
* Complex situations
* An implementation example
* A measurement sample
* Discussions
* Acknowledgements

Henk: Comments, questions?

Matt: What you're proposing requires that you can get a sequence
number out of the stream. You've used tunneling protocols where you've
got a sequence number in there. Have you given thought to making it
more general? It seems the applications are limited because of the
requirement for a sequence number.

Henk: Correct, it's something we have to look at.

Matt: Is there other interest outside? We should extend what we've
done so far. I can think of ways to make it apply.

Kaynam: I agree with Matt; the measurements applied here apply to
passive and active alike. In my experience over the past 8 years, we
have had active and passive measurements in our company. You need
sequence numbers or timestamps or something to do these measurements.
It's very costly to do these passively. Yet passive measurement
applies well to applications. The challenge is...
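
[Scribe's note: an illustrative sketch of the sequence-number-based
classification the draft's slides discuss (in-sequence, skipped
packets, dup-trains, astern). This is a toy, not the draft's
algorithm; it simply shows how far a per-tunnel sequence number can
take a passive observer:]

    def classify(observed):
        """Classify each passively observed sequence number.
        Keeps every number seen so far - a sketch, not production code."""
        seen = set()
        next_expected = None
        for seq in observed:
            if seq in seen:
                verdict = "duplicate"
            elif next_expected is None or seq == next_expected:
                verdict = "in-sequence"
                next_expected = seq + 1
            elif seq > next_expected:
                verdict = "skip (%d missing, possible loss)" % (seq - next_expected)
                next_expected = seq + 1
            else:
                verdict = "astern (late arrival: reordering, not loss)"
            seen.add(seq)
            yield seq, verdict

    # e.g. list(classify([1, 2, 4, 3, 3, 5])) classifies 1, 2, 5 as
    # in-sequence, 4 as a skip of one packet, the first 3 as astern,
    # and the second 3 as a duplicate.
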
Yakov Stein: If this is focused at transport and service provider
networks, we do have sequence numbers in the packets. On the first
slide, the problem of accurate clocks: the TICTOC WG provides clocks
inexpensively, so that problem is solved.

Henk: Any further comments? No? Thank you.

Henk: Next, comparison of metrics. I don't think we have to say
anything there. The last item is other work we had.

** Input from the list (BOF slide 7)

Matt: Agenda item 3d. Anything else?

Alan Clark: IPPM is looking at packet delay variation, percentiles,
quantiles. When you're trying to infer the performance of an
application based on those, it can be useful to have a model of packet
delay variation that fits with the application model. I did some work
leading to a recommendation in the ITU for testing VoIP and
applications over RTP. Are there time series models, or others like
that, which are representative? It would be extremely useful for IPPM
to characterize application performance.

Matt: Would that fit with Al and Joel's presentation? Another set of
statistics on the base metrics.

AC: Yes, the input to the model is a sequence of individual delay
elements. You are trying to match a time series model to the
performance of the channel.

Matt: Are there models we could work from that could easily be
standardised now, or is more work required?

AC: Both. G.1050 is a simple model, with no IPR there. The simple
model could be looked at as a candidate; that approach is interesting
work. Another is to measure packet delay variation to measure VoIP
variance: look at the behavior of the jitter buffer. [...] That model
works quite well, though it's very application specific. [...]

Al: A quick followup. You mentioned two standards where those things
already exist. We'd be building on that?

AC: I see it as extending. [...] The time series models were deemed
suitable for that particular application, but not made generic. An
update is needed for IPTV. I don't see any conflict between this WG
and ITU Study Group 12. [...]

Al: That answers my question. The real question is whether we will
expand the charter...

AC: You almost have. Packet loss ratio - why are you measuring that?
If it didn't have any effect on applications, why would you measure
it? [...]

Al: Packet loss is important to all applications.

Jabber question from Saverio: Seems to be saying the same as AC.

AC: [...]
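
[Scribe's note: a toy example of the kind of time-series
delay-variation model Alan describes - first-order autocorrelated
(AR(1)) jitter around a base delay, from which application-facing
quantiles can be read off. The parameters are invented for
illustration; this is not G.1050 or any ITU-defined model:]

    import random

    def ar1_delays(n, base=0.020, sigma=0.002, rho=0.9):
        """One-way delays (seconds) with autocorrelated jitter:
        each jitter value carries over a fraction rho of the last."""
        delays, jitter = [], 0.0
        for _ in range(n):
            jitter = rho * jitter + random.gauss(0.0, sigma)
            delays.append(base + abs(jitter))
        return delays

    samples = sorted(ar1_delays(10000))
    p99 = samples[int(0.99 * len(samples))]  # crude empirical 99th percentile
    print("modeled 99th percentile one-way delay: %.1f ms" % (p99 * 1e3))
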
Bob Briscoe, BT: On general SLA validation: the security
considerations section leads us into a complicated area - how do you
trust measurements? I was involved in a pre-standards group for
interconnected Diffserv domains; the work went into the ITU. We had to
assume operators trusted one another. My work is around that area. If
you're trying to do one-way measurements across other operators'
domains, you end up in difficulty unless you trust that operator. It's
quite a difficult area.

Matt: Good point. We define ways of measuring things, but don't say
what's good or bad. We don't define validation; it's up to the
implementor to set up trust relationships.

BB: I guess another way to say that: people who are expert at
statistics and measurement aren't good at security.

LE: You have a tussle: if your SLA isn't specified in terms of IPPM
metrics, can you use IPPM to verify it? Is it actually telling you
anything? Bob brought up security: can you actually make a statement
about whether this is valid or not?

Henk: I don't think you can. People use IPPM as a start for an SLA.

Al: I've been in front of the firing line many times, being asked
where these metrics came from. Quote the RFCs so they know you haven't
cooked up these metrics at home.

BB: Do you want to go there - a security architecture for measurement?

??: Did I understand correctly that this is related to end-to-end
measurements, or to concatenated ones?

BB: If a bunch of operators want to know which one is to blame, you
end up with a complicated measurement architecture. If you provide a
VPN service and you're not present all over the world, you offer an
SLA to the customer; if there's trouble, you want to break that down.
I am interested in inter-carrier SLAs, but not interested in
measuring.

Henk: Speaking as an individual: we install measurement equipment for
you and expect that you don't do anything special with the traffic.
You can manipulate measurements like that.

Matt: Any more comments?

BB: I can give a reference to a SIGCOMM paper from 2006, by Laskowski.

Henk: Post it to the list.

Matt: The feeling I get from the room is no, we're probably not going
to go after the trust issue. We're defining the metrics which make it
possible, and how to place things so you've got the trust and security
there. You need to make sure the metrics are well defined so you can
build the trust, but I don't see us defining that trust.

David: I think that's the first thing you need to do: metrics defined,
then semantics defined. We're really not quite to that first point;
it's a prerequisite. The second has issues beyond transport, beyond
the IETF. I would support that we need to define those metrics. At
least then the IETF has provided the framework. There are too many
choices.

Matt: You're reminding me... Al's reporting draft defines reporting
for long-running quality control tests, including specific parameters
for our metrics. This may go a long way toward what you want as a
profile against the specifications for internet service providers.

Al: You can find it on the tools page. That draft recognised all the
major metrics. You have to populate those options with the end use in
mind: how are they going to be used? You go a long way toward the
right answer if you choose the right options to begin with. It gets
you some of the way with the existing metrics. Keep that in mind for
the rechartering discussion today.

Benoit Claise: I support the work on SLAs; it should be in a profile.
If they offer an SLA, people want to check it by doing IPPM. The
concern I have, as someone who has to implement this, is that we are
starting to have an awful series of metrics. Are they added to the
previous ones? We're overestimating the knowledge of our customers. A
basic set of metrics would be appropriate. Do we want to have so many
metrics that they're not used in a consistent way?

Al: Excellent point - information overload. Some of the
Nick/Joel/Paul/... work helps us in terms of the confidence interval
around what *is* measured. For this sample the quantile is this; for
another sample it's that. Error bars are always helpful.
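
[Scribe's note: a sketch of the "error bars" point - an empirical
delay quantile reported with a distribution-free confidence interval
derived from the binomial distribution of order statistics. It assumes
roughly independent samples and is illustrative only:]

    import math

    def quantile_with_ci(samples, q=0.95, z=1.96):
        """Return (estimate, lo, hi) for the q-quantile of the samples;
        z=1.96 gives an approximate 95% confidence interval."""
        xs = sorted(samples)
        n = len(xs)
        est = xs[min(int(q * n), n - 1)]
        half = z * math.sqrt(n * q * (1 - q))  # normal approx to Binomial(n, q)
        lo = xs[max(int(q * n - half), 0)]
        hi = xs[min(int(q * n + half), n - 1)]
        return est, lo, hi

    # e.g. for 1000 one-way delay samples, report "the 95th percentile
    # is est, with 95% confidence between lo and hi" rather than a
    # bare point estimate.
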
Alan Clark: Should you be talking about SLAs or about SLM
(measurement)? SLA implies a contractual relationship. Users like SLAs
they can understand, where the metrics relate to their experience.
Maybe the terminology used in relation to a richer set of statistics
is a worthy thing. Keep it simple and understandable, something which
the recipient of the service can relate to.

Henk: Any other comments?

Rudiger: I have experience inside my company; it's quite big. People
can ping to see whether a service is available - but if not, they
can't see where it's disturbed. It is difficult to get into that
business and see. I have had inter-carrier SLAs already; they are
difficult to understand. I agree that work on SLAs is good. It is
better if providers agree among themselves what's best and how to deal
with it.

Henk: If there are no more comments, the next question is how to
proceed.

Matt: Thinking about this multi-network debugging: Internet2, working
with other research networks around the world, is building perfSONAR -
taking base metrics, the IPPM metrics plus some others, with schemas
in XML to report this uniformly. You get stats from network segments
to help you report this stuff and debug using multi-provider stats.
It's some other place to look for mechanisms to help solve that
problem.

Henk: That brings us to the next question: how to proceed from here.
It looks like we have a number of topics on our plate. They fit
reasonably well with our charter. As chairs we need to revise the
charter and approve this work.

Al: I'd say passive measurement, if that's part of it, would be a
significant jump.

Matt: It depends how you define it.

Al: That's true too.

Matt: Trying to summarise what I've heard today:

* Metrics advancement along the standards track: "just do it"
  - look at bench tests for metrics; test vectors
  - involve statisticians for comparison of results
* Ensure we have one or more lined up before progressing?
* Multi-provider
  - profiles against specs?
  - need a replacement for ping/traceroute?
  - troubleshooting performance problems (vs SLA)

Vote: saw about 12 say important, 1 say no.

Emile Stephan: In the current charter we have enough work to achieve.
Regarding new advanced metrics to compare what you want: I would
prefer we work on meters, to be able to compare existing meters,
rather than go further on advanced metrics. If the people who promote
an advanced metric have a way to compare, I think it would be good for
the next charter.

Matt: I'm hearing that you're interested in comparison of metrics and
in how to ensure what we have meets the spec, but you want to ensure
we're not adding new metrics.

ES: We don't need new metrics for now; we have enough work with active
metrics for now.

Al: It would help if you titled that first item "metric advancement
along the standards track".

Matt: Is anyone interested in working on this? I see 2.

** "SLA validation"

* Better probing, new statistics
  - fundamental metrics still used
  - clarify IPR; WG needs to understand terms
* SLA profile
  - what should be measured
  - tension with the above: don't want too many metrics; want simple
    and understandable

David Dyson: "Mine is better than yours" - the response time is not
repeatable and cannot be analysed. Ping or traceroute are useful for
troubleshooting, but pings may take a markedly different route across
the network. We need something on a more qualitative basis: "this is a
source of latency or loss". The SLA side becomes more challenging; it
can influence business decisions, such as selection of a provider.
Separate these two things. Let's try to get the basic ones adopted as
a potential charter. We could continue research for more metrics, but
eventually someone wants to get the benefit of this work.

Matt: How many are interested in working on multi-provider
troubleshooting? Seeing about 10. A little fewer than metrics
advancement.

LE: A question about the replacement for ping/traceroute: are we
talking about a new protocol deployed on every hop, or the same
protocols used differently?

Matt: It's a motivation thing rather than an implementation thing. We
are not looking for a new protocol.

LE: So, something better than ping/traceroute.

??: There are two different cases: multi-provider troubleshooting
after performance degradation occurs, and ensuring the performance
quality of customer traffic. You don't know when performance
degradation happens; you're losing events that occurred in the past.

Matt: Are you asking about interest in that?

Canan(?): Generally, but I am also interested in the latter case. In
this case you can deal with the former case, not the latter.

Matt: Research groups implement end-to-end measurement and monitor
over time.

Matt: Anyone willing to work on that? I see 3.

["SLA validation" slide from above]

Matt: For this piece, are people interested and think this is a good
idea, versus people who think we should focus on what we have now? We
have to wait for the IPR clarification.

Matt: How many think it is better to work on new probing and better
statistics? I see 8-10.

Matt: How many think we should focus on what we do now? I see 3.

LE: To clarify: in the past the WG had strong interest in working on
something but the IPR terms were unacceptable. We had to work around
it, not conflict with the IPR.

Matt: Do people think this (measurements useful for SLAs) is useful to
work on? I see about 7 or 8.

Matt: How many are willing to actually work on this? Seeing 4 or 5.

Bob: What is the relation between this group and the transport metrics
group?

LE: TMRG is about how you compare transport protocols, not really
about the IP path: is this TCP version better than that one?

Matt: I'll get to the passive thing in a second. In my mind it's not
comparing transport protocols; rather, here are the properties of that
path.

Bob: [...]

Matt: To me, TMRG is: here are protocols for end-to-end transport;
what are the criteria to judge between multiple ones?

LE: TMRG metrics are goodput and other metrics on transport-level
connections. IPPM has always been about the path.

Alan Clark: What do you mean by SLA profiles? Do you mean a guidelines
document - "here are the metrics to pick from to provide SLAs" - or
"here are the metrics to provide SLAs"?

Matt: A set of metrics and parameters.

AC: So, a set of SLA guidelines.

Matt: The final thing we talked about today:

* Passive
  - provider tunnels: leverage sequence numbers
  - apply current IPPM metrics
* Streams... or characterising paths
* Statistics that depend on Poisson sampling, etc., must be examined

ES: Working on passive means defining what a passive measure is. Going
further means clearly identifying which IPPM metrics are directly
usable for passive measurements, then defining an acceptable
methodology. One issue is mixed packet sizes in a passively observed
stream; in an IPPM stream we usually take care to have a fixed packet
size. Another is time.

Matt: Why is it important to work on this?

ES: Especially for inter-domain: it is very difficult to have active
measurement between different telcos.

Matt: Any more comments on passive?

AC: A couple of comments. If you're passively measuring, you're
measuring some higher-level protocol, so that overlaps the work of
other groups - but also those groups should look at our work. The
other point: it's a good idea to understand whether measuring
something passively or actively is likely to result in any significant
difference. For an active measurement, if you could also measure it
passively, would the result be any different? I think the group should
consider passive measurements, but not get into mandating how to use,
e.g., the RTP sequence number to make those measurements.

??: If you measure passively, it's not easy to determine the path.

BC: I agree with Alan. At the end of the day, these are the metrics;
it doesn't matter how we get them.

Matt: I don't think this is well enough defined to ask the question,
but who is interested in having the group work on passive measurement?
12.

Matt: Who's willing to define what this means and contribute?

BC: I see the first bullet (on the "Passive" slide) as a case study.
The second bullet should be first.

Matt: In my mind, that's the right way to go about this.

Matt: Who's willing to help craft this? I see 5.

Henk: We will summarize all this to the list and see how it fits in
the charter. It is going to be September due to vacations.

Chairs: Any other comments?

LE: We have a shopping list of things, with interest in all of them.
The WG has been running 10 years, but the community is relatively
small. So what are the priorities here? If you can only take on one
thing, what is the most important? How are we narrowing this down?

Matt: A good question for the rest of the group.

AClark: [...]

Henk: First define what to do, see who's interested, then we can
prioritise.

LE: Should we ask people to write emails or drafts for discussion in
Minneapolis?

Henk: We should have something on track by then.

Matt: I encourage starting with emails, then drafts.

LE: Finalise this discussion in Minneapolis, but avoid repeating the
same slides because of no progress by then.

Al: This afternoon: extensions and new features.

Henk: That is work which is fixed in the charter - fixed, ongoing - it
will be in the afternoon session.

Henk: OK, that's it.

Matt: Summary slide:

* Advancement along the standards track
* Multi-provider profiles (applicability statements)
* Additional statistics
* SLA profiles (applicability statements for current metrics)
* Passive issues