Performance Measurement at Other Layers (PMOL) Working Group Minutes
=====================================================================

Reported by Alan Clark, based on notes from Henk Uijterwaal.

The Performance Measurement at Other Layers working group met once at
the 72nd IETF meeting (July 27 - August 1, 2008). The meeting was
chaired by Alan Clark and Al Morton. Subjects under discussion included
a review of the charter of the group, the Metrics Framework draft, and
SIP Performance Metrics.

Introduction and Review of Charter
----------------------------------

The working group chairs introduced the meeting and reviewed the
agenda. There were no changes made to the agenda.

The charter of the group was outlined: this is a short-lived working
group with two deliverables, a Metrics Framework draft due for
completion in September 2008 and a SIP Performance Metrics draft due
for completion in June 2008. These deliverables are expected to be
complete, or close to completion, by the next IETF. The working group
is due to close or be re-chartered in November 2008.

The SIP Performance Metrics draft (draft-ietf-pmol-sip-perf-metrics-01.txt)
is in WG Last Call, which ended on 25th July. The Performance Metrics
Framework draft (draft-ietf-pmol-metrics-framework-00.txt) was accepted
as a WG draft.

Dan Romascanu reminded the group that new PMOL work depends on the
decision to continue with the PMOL WG or to form a directorate (the
guidelines draft allows for both); this also depends on new PMOL work
actually being submitted. Al Morton noted that CCAMP had picked up one
document that could also have gone to PMOL, and commented that there
may be work items that could result from the IPPM++ BOF. Alan mentioned
that AVT was working on extensions to RTCP XR and that he was planning
to mention the metrics framework at the AVT session.

SIP-performance draft
---------------------

The SIP Performance Metrics draft was presented by Daryl Malas. There
had been a quick review; a timer section had been added (normalizing
time instance indicators and providing guidance on accuracy), averages
had been removed from the metrics, and other nits had been fixed.

There was extensive discussion of the Session Request Delay metric, as
Session Requests can either succeed or fail. One suggestion had been to
split this into two metrics, one for success and one for failure, as
opposed to one metric that describes both. Al Morton suggested that
processing successful and failed requests together without further
thought would give a bi-modal distribution, and that it may be better
to have a distribution for successful setups and a count of failures.
Al suggested that incorrect setups also have to be considered; Daryl
responded that these are hard to identify in practice. Daryl agreed
that mixing the failed Session Request measurements with successful
attempts would skew the metrics. Daryl pointed out that it can be
difficult to identify failures in cases such as an authentication
failure: a response is received, so it is a success from one
perspective, but it would be a failure from the user's perspective.
Daryl agreed with the idea of separating success and failure, provided
that a timeout could be regarded as the failure criterion. Dan asked if
failures could be separated into timeouts and negative responses. Daryl
responded that this could be difficult, as not all 4xx messages are
failures. Alan pointed out that a slow response may also be regarded as
a failure. Daryl responded that the metrics are defined at the SIP
level, not the human level, hence the term "successful" does not
necessarily mean success from a user view.
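
As an illustration of the separation Al suggested, the following
minimal sketch (a hypothetical data model, not taken from the draft)
summarizes request delay over successful setups only and keeps failed
attempts, including timeouts, as a separate count:

    # Minimal sketch (hypothetical data model, not taken from the draft):
    # summarize Session Request Delay over successful setups only, and keep
    # failed attempts (including timeouts) as a separate count, so the two
    # populations are not mixed into a single bi-modal distribution.
    from dataclasses import dataclass
    from statistics import mean

    @dataclass
    class SessionAttempt:
        request_delay_s: float    # time from request to final response (or timeout)
        succeeded: bool           # True if the setup is treated as successful
        timed_out: bool = False   # True if no final response arrived in time

    def summarize(attempts):
        success_delays = [a.request_delay_s for a in attempts if a.succeeded]
        failures = [a for a in attempts if not a.succeeded]
        return {
            "success_count": len(success_delays),
            "success_delay_mean_s": mean(success_delays) if success_delays else None,
            "success_delay_max_s": max(success_delays) if success_delays else None,
            "failure_count": len(failures),
            "timeout_count": sum(1 for a in failures if a.timed_out),
        }

    # Example: two successful setups and one timed-out attempt.
    print(summarize([SessionAttempt(0.12, True),
                     SessionAttempt(0.35, True),
                     SessionAttempt(32.0, False, timed_out=True)]))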
Dave Field suggested that it may be useful to have metrics per type of
response. Daryl said that this was plausible but not within the scope
of this draft. Dave responded that this could be done for categories of
response rather than individual responses, i.e. all 200 or all 400
responses. Daryl answered that SIP doesn't work that way: some 400
responses are failures while others are successes, indicating that the
user should do something. Al stated that he was interested in the
categories: is there a noun that describes the 200 class? If so, we
should use it.

Daryl explained that IM would be incorporated using the existing
metrics, as IM looks essentially the same from the perspective of the
existing metrics. Al replied that an IM exchange seemed sufficiently
different from session establishment that the metrics should be
separated, as the performance criteria may be different. Daryl did not
want to introduce IM-* versions of all the metrics and explained that
the considerations section contained some discussion of the
categorization of metrics. It was agreed to remove Section 6.6, which
was a framework for new metrics.

There was some discussion of the calculation of timing: should it be
based on first/last bit out/in, and should message length and MTU be
included? Session Duration Time is the length of a call; Daryl
explained that he viewed a resolution of seconds as sufficient. There
was no algorithm for session defects, and Daryl will add a simple
explanation of this.

There was some discussion of the use of Rate vs. Ratio. Paul Aitkin
commented that a Rate involves time, and hence Packet Loss Ratio would
be an expression of what proportion of packets were lost, whereas
Packet Loss Rate would be the number of packets lost during some time
interval.
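
To make the distinction concrete, a minimal sketch (hypothetical
function names, not taken from the draft) of the two calculations:

    # Minimal sketch (hypothetical names, not taken from the draft) of the
    # Rate vs. Ratio distinction: a ratio is dimensionless (lost / sent),
    # while a rate is per unit time (lost / measurement interval).
    def packet_loss_ratio(packets_lost: int, packets_sent: int) -> float:
        """Proportion of the packets sent that were lost (dimensionless)."""
        return packets_lost / packets_sent if packets_sent else 0.0

    def packet_loss_rate(packets_lost: int, interval_seconds: float) -> float:
        """Packets lost per second over the measurement interval."""
        return packets_lost / interval_seconds if interval_seconds else 0.0

    # Example: 50 of 10,000 packets lost during a 20-second interval.
    print(packet_loss_ratio(50, 10_000))   # 0.005, i.e. 0.5%
    print(packet_loss_rate(50, 20.0))      # 2.5 packets lost per second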
Al asked about the calculation of Session Efficiency Rate. Daryl said
that this had been the subject of extensive discussion but that he
would check to ensure that the correct response codes were included.
Daryl related this to the Network Efficiency Rate (NER) and
Answer-Seizure Rate (ASR) used by the ITU, where NER relates to whether
the signaling worked and ASR to whether the user answered the phone.

Daryl asked the chairs whether the updated draft should be re-reviewed
within SIPPING or whether it could be moved forward. Al answered that
the draft should be complete before SIPPING meets again and that it
should be sufficient to cross-post the WGLC to both PMOL and SIPPING.
Dan confirmed that this should be sufficient for WGLC, and by the time
the draft gets to the IESG, Dan will ensure that the correct ADs review
the draft.

Metrics Framework draft
-----------------------

Alan introduced the latest version of the Performance Metrics
Framework. This is now a WG draft and has been generally cleaned up to
provide a clean starting point. Dan had provided a number of comments:

(i) The IETF now has two "active" working groups rather than two
working groups; this will be corrected in the draft.

(ii) Section 3.3 - the wording for temporal aggregation should refer to
"sets of metrics" rather than "metrics"; this will be corrected.

(iii) Section 3.4.1 - it is not clear what the difference is between
the definition and the description of the metric. Alan explained that
this was

(iv) Section 3.4.1 - does not mention measurement points. Alan agreed
that if the measurement is specific to a measurement point, this should
be specified. Daryl commented that Section 3.4.1's proposal that
guidance be given on whether the value range for a metric is "good"
went against some of the WG's discussion and guidance on this issue.
Alan responded that some metrics (e.g. MOS) could be the subject of
guidance, whereas the "good" range for other metrics may be
application-specific. Daryl suggested that the IETF did not typically
provide this type of guidance, whereas the ITU might. Alan disagreed
with this, pointing out that providing metrics without some guidance
can be confusing for the user, giving Echo Return Loss as an example;
he agreed that for some situations, being able to get any communication
or connection at all would be acceptable.

(v) Should conformance testing be specified as part of the metric, and
should it be a SHOULD or a MAY? Dan clarified his opinion that this
should be milder, for example a lowercase "should" or a MAY. Daryl
commented that the intent of this seemed good, but that the group
should avoid setting a bar for implementers. Alan responded that it may
be better to give guidance that verification of the metric would be
helpful, and on how this could be achieved.

(vi) Section 3.5 provides a checklist on whether a metric definition is
good or not. Dan suggested that this was useful but could potentially
be merged with the review criteria (in the second part of the
document).

(vii) Section 3.6 contained text relating to reporting models; it was
suggested that this be merged with Section 3.4.1.

(viii) Section 4.2 - Dan commented that the approach for approving new
work items should be the existing IETF approach.

(ix) Section 4.3 - relating to the WG or Directorate, Dan suggested
that the draft was a little abstract and should more clearly state the
options that resulted from the BOF, leaving the decision to the IESG.

(x) Section 4.3 - there is no need for a dedicated mailing list in most
cases; use the existing list.

Daryl commented on the list of composed metrics, suggesting that more
examples could be included. He also suggested that there was
considerable use of SHOULD in the draft and that MUST would be
preferable. Alan asked whether a BCP is normative and, if not, whether
it can include normative language. Dan responded that even if MUST were
used, economic or practical considerations might mean that MUSTs would
be ignored; hence his recommendation was that strong guidance (i.e.
SHOULD) would be sufficient.

AOB
---

There was a brief discussion of other potential work items. Alan
outlined some work on a generalized model for measuring and reporting
the performance of applications such as HTTP and SMTP, and suggested
that this could be a potential new work item. He will submit an I-D on
this prior to the next IETF to see if there is interest in the topic.

The meeting closed.