[PMOL] Several questions/suggestions from my review of draft-ietf-pmol-sip-perf-metrics-04

Robert Sparks <rjsparks@nostrum.com> Tue, 29 September 2009 18:30 UTC

To: pmol@ietf.org
Cc: pmol-chairs@tools.ietf.org

Hi All -

I have several concerns about draft-ietf-pmol-sip-perf-metrics that I
would like to discuss. I've asked for a dedicated RAI-review of this
document, so there may be additional comments later, but I wanted to
get these to you now so we can start working through them.

These comments are more-or-less in document order, with a couple of
nits moved to the end. I've numbered them to help split responses into
threads later. When replying to just one of the items below, please
change the subject line to indicate what you're replying to.

Thanks,

RjS

--------------------------------------------------------------------------------------------------

1 The document should more carefully describe its scope (and consider
   changing its title). This document focuses on the use of SIP for
   simple telephony and relies on measurements in earlier telephony
   networks for guidance.  But telephony is only one use of SIP. These
   aren't the same metrics that would be most useful for observing a
   network that was involved primarily in setting up MSRP sessions for
   file transfer, for instance. An eventual set of generic SIP
   performance metrics will need to focus on the primitives rather than
   artifacts from any particular application.

2 That said, I'm skeptical of the utility of many of these metrics even
   for monitoring systems that are focusing only on delivering basic
   telephony. Has the group surveyed operators to see what they're
   measuring, what they're finding useful, and what they're just
   throwing away? Some additional text motivating why this particular
   set of metrics was chosen should be provided to help
   operators/implementers choose which ones they are going to try to
   use.

3 "Each session is identified by a unique Call-ID" is incorrect. You
   need at least Call-ID, to-tag, and from-tag here. And to be pedantic,
   you're describing the SIP dialog, not one of the sessions it manages.
   The session is what is described by the Session Description Protocol.
   The metrics in this draft are derived from signaling events, not
   session events, and make assumptions about how those correlate for
   a simple voice call that may not hold for more advanced uses.
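
   A rough illustrative sketch of the identifier that is actually
   needed (Python; the header dictionary and values are hypothetical):

      import re

      def dialog_id(headers):
          # A dialog is identified by the Call-ID together with the
          # local and remote tags (RFC 3261), not by the Call-ID alone.
          # 'headers' is a hypothetical {name: value} dict for one message.
          def tag(value):
              m = re.search(r';\s*tag=([^;\s]+)', value or '')
              return m.group(1) if m else None
          return (headers['Call-ID'], tag(headers.get('From')),
                  tag(headers.get('To')))

      # Same Call-ID, different To tags -> different dialogs.
      a = {'Call-ID': 'a84b4c76e66710',
           'From': '<sip:alice@example.com>;tag=1928301774',
           'To': '<sip:bob@example.com>;tag=8321234356'}
      b = dict(a, To='<sip:bob@example.com>;tag=314159')
      assert dialog_id(a) != dialog_id(b)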

4 The document is inconsistent about whether the metrics will describe
   any part of an early-dialog/early session. The introduction indicates
   it won't and focuses on the delivery of a 200 OK, but there are
   metrics that measure the arrival time of 180s. This should be
   reconciled. Do take note that early sessions are pervasive in real
   deployments today.

5 These metrics are intentionally designed to not measure (or be
   perturbed by) the hop-by-hop retransmission mechanisms. This should
   be made explicit. There should also be some discussion of the effect
   of the end-to-end retransmission of the 200 OK/ACK on the metrics
   based on those messages.

6 The document should consider the effects of the presence or absence
   of the reliable-provisional extension on its metrics (some of the
   metrics will be perturbed by a lost 18x that isn't sent reliably).

7 Using T1 and T4 as the timing interval measurement tokens is
   unfortunate. SIP uses those symbols already to mean something
   completely different. Is there a reason not to change these and avoid
   the confusion that the collision will cause?

8 The document uses the terms UAC and UAS incorrectly. It is trying to
   use them to mean the initiator and recipient of a simple phone call.
   But the terms are roles scoped to a particular transaction, not to a
   dialog. When an endpoint sends a BYE request, it is by definition
   acting as a UAC.

9 The document uses the word "dialog" in a way that's not the same as
   the formal term with the same name defined in RFC3261 and that will
   lead to confusion. (A sequence of register requests and responses,
   for example, are never part of any dialog. The INVITE/302/ACK
   messages shown in the call setup flows are not part of any dialog.)
   Please choose another word or phrase for this draft. I suggest
   "message exchange".

10 The 3rd to last paragraph of section 4 should be expanded. I think
   it's unlikely that implementers, especially those with other language
   backgrounds, will understand the subtlety of the quotes around
   "final".  Enumerating the cases where you want the measurement to
   span from the request of one transaction to the final response of
   some other transaction will help. (I'm guessing you were primarily
   considering redirection, but I suspect you also wanted to capture the
   additional delay due to Require-based negotiation or 488
   not-acceptable-here style re-attempts?). You may also want to
   consider the effect of the negotiation phase of extensions like
   session-timer on these metrics.
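
   A small worked example of the transaction-spanning measurement
   (timestamps and names are made up, purely to make the point
   concrete):

      # Hypothetical timeline for a redirected attempt: the measurement
      # runs from the first INVITE to the final response of the *second*
      # transaction, which is what the quoted "final" has to mean.
      events = [
          (0.000, 'INVITE', 'txn-1'),  # original request
          (0.040, '302',    'txn-1'),  # redirect ends transaction 1
          (0.060, 'INVITE', 'txn-2'),  # re-attempt toward the new target
          (0.850, '200',    'txn-2'),  # response that ends the measurement
      ]
      srd_seconds = events[-1][0] - events[0][0]   # 0.850, spanning two transactions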

11 The document assumes that a registration will be DIGEST challenged.
   That's a common deployment model, but it is not required. If other
   authentication mechanisms are used (such as SIP Identity), the RRD
   metric, for example, becomes muddied.

12 In section 4.2, "Subsequent REGISTER retries are identified by the
   same Call-ID" should say "identified by the same transaction
   identifier (same topmost Via header field branch parameter value)".
   Completely different REGISTER transactions from a given registrant
   are likely to have the same Call-ID.
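
   A rough sketch of the identifier being suggested (Python; the
   parsing is deliberately naive and only illustrative):

      import re

      def transaction_key(raw_message):
          # Return the branch parameter of the topmost Via header field,
          # which (together with the method) is the RFC 3261 transaction
          # identifier.
          for line in raw_message.split('\r\n'):
              if line.lower().startswith(('via:', 'v:')):
                  m = re.search(r';\s*branch=([^;,\s]+)', line)
                  return m.group(1) if m else None
          return None

      register = ('REGISTER sip:example.com SIP/2.0\r\n'
                  'Via: SIP/2.0/UDP ua.example.com;branch=z9hG4bK776asdhds\r\n'
                  'Call-ID: reg-call-1@ua.example.com\r\n'
                  '\r\n')
      print(transaction_key(register))   # -> z9hG4bK776asdhds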

13 The SRD metric definition in 4.3.1 ignores the effect of forking.
   Unlike 200 OKs, where receiving multiple 200s in response to a single
   INVITE only happens if a race is won, it is the _normal_ state of
   affairs for a UAC to receive provisional responses from multiple
   branches when a request forks. Deployed systems are increasingly
   sending 18x responses reliably with an answer, establishing early
   sessions, so when forking is present it is _highly_ likely that there
   will be multiple 18x's from different branches arriving at the UA.
   This section should provide guidance on what to report when this
   happens.
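
   One way the guidance could go, sketched only to make the question
   concrete (Python; the inputs are hypothetical):

      def per_branch_srd(invite_sent_at, provisionals):
          # 'provisionals' is a list of (arrival_time, to_tag) pairs for
          # the 18x responses seen for one INVITE; responses from
          # different forked branches carry different To tags.  The
          # earliest response from each branch yields one delay sample,
          # so forking produces a set of SRD values rather than one.
          first = {}
          for arrival, to_tag in sorted(provisionals):
              first.setdefault(to_tag, arrival - invite_sent_at)
          return first   # {to_tag: delay in seconds}

      print(per_branch_srd(10.0, [(10.4, 'b1'), (10.9, 'b2'), (11.2, 'b1')]))
      # -> one sample per branch: roughly 0.4 for 'b1' and 0.9 for 'b2'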

14 The Failed Session Setup SRD claims to be useful in detecting
   problems in downstream signaling functions. Please provide some text
   or a reference supporting that claim. As written, this metric could
   be dominated by how long the called user lets his phone ring. Is that
   what was intended? You might consider separate treatment for 408s and
   for explicit decline response codes.
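
   The separate treatment might look something like this (sketch only;
   the response-code lists are examples, not from the draft):

      def failed_setup_bucket(status_code):
          # Keep ring-timeout failures apart from explicit declines so
          # that how long callees let phones ring cannot dominate the
          # failed-setup delay distribution.
          if status_code == 408:
              return 'timeout'          # may just mean nobody answered
          if status_code in (480, 486, 600, 603):
              return 'busy/declined/unavailable'
          return 'other failure'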

15 What was the motivation for making MESSAGE special in section 4.3.3?
   Why didn't the group instead extend the concept to measuring _any_
   non-INVITE transaction (with the possible exception of CANCEL)?

16 In section 4.4, what does it mean to measure the delay in the
   disconnect of a failed session completion? Without a successful
   session completion, there can be no BYE. This section also raises
   the very hard-to-answer question of what to do when BYEs receive
   failure responses. It would be better to note that that edge case
   exists and to state what, if anything, the metric is going to say
   about it if it happens.

17 Section 4.5 is a particularly strong example of these metrics
   focusing on the simple telephony application. It may even be falling
   into the same traps that led to trying to build fraud-resistant
   billing based on the time difference between an INVITE and a BYE.
   Some additional discussion noting that the metric doesn't capture
   early media, and a recommendation on when to give up on seeing a
   BYE, would be useful. (Sometimes BYEs don't happen even when there
   is no malicious intent.)
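
   A recommendation could bound the wait explicitly, along these lines
   (sketch only; the cap value is made up):

      GIVE_UP_AFTER = 4 * 3600.0   # arbitrary illustrative cap in seconds

      def session_duration_sample(answered_at, bye_at, now):
          # If a BYE was seen, report the measured duration.  If not,
          # and the cap has passed, report the sample as 'no BYE
          # observed' rather than letting it grow without bound.
          # Otherwise keep waiting.
          if bye_at is not None:
              return ('measured', bye_at - answered_at)
          if now - answered_at > GIVE_UP_AFTER:
              return ('no-bye-observed', None)
          return ('pending', None)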

18 Trying to use Max-Forwards to determine how many hops a request took
   is going to produce incorrect results in any but the most simple of
   network deployments (I would have expected this to be based on
   counting Vias with a note pointing to the discussion on the problems
   B2BUAs introduce). Proxies can reduce Max-Forwards by more than one.
   There are many implementations in the wild that cap Max-Forwards. If
   this metric remains as defined, you should also point out that
   neither endpoint can calculate it. Some third entity will have to
   collect information from each end to make this calculation.
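
   The Via-counting alternative, as a rough sketch (Python; the parsing
   is deliberately naive and only illustrative):

      def via_hop_count(raw_request):
          # Count Via header field values in a received request: each
          # proxy that forwards the request adds one, so the count
          # approximates the number of signaling hops at the point of
          # observation.  B2BUAs rebuild the Via list, so the result can
          # still undercount when they are in the path.
          count = 0
          for line in raw_request.split('\r\n'):
              if line == '':            # end of the header section
                  break
              if line.lower().startswith(('via:', 'v:')):
                  # one Via line may carry several comma-separated values
                  count += line.split(':', 1)[1].count('SIP/2.0/')
          return count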

19 The ratio metrics don't define (or convey) the interval that totals
   are taken over. Are these supposed to be "# requests received since
   this instance was manufactured" or "since last reboot" or "since last
   reset of statistics" or something else? What is the implementation
   supposed to report when the denominator of a ratio is 0?
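
   A sketch of what an answer could look like (Python; names are
   hypothetical, not from the draft):

      class RatioMetric:
          # A ratio metric that carries its measurement interval
          # explicitly and treats a zero denominator as undefined rather
          # than as 0% or 100%.
          def __init__(self, interval_start, interval_end):
              self.interval_start = interval_start  # e.g. time counters were last reset
              self.interval_end = interval_end
              self.numerator = 0
              self.denominator = 0

          def value_percent(self):
              if self.denominator == 0:
                  return None   # undefined: nothing to take a ratio of
              return 100.0 * self.numerator / self.denominator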

20 Please add some discussion motivating why all 300s, 401, 402, and 407
   are treated specially (versus several other candidate 4xx and 6xx
   responses) in sections like section 4.8. Were other codes considered?
   If so, why were they rejected?

21 Section 4.9 seems to be implying that you can't receive a 500 class
   response to a reINVITE, which is not true. If you want this metric to
   only reflect the results of initial INVITEs, more definition will be
   needed.

22 ISA in section 4.10 claims that 408s indicate an overloaded state in
   a downstream element. Overload may induce 408s, but 408s do _not_
   indicate overload. It's possible to receive them just because someone
   is not answering a phone.

23 In section 5, why were these correlation dimensions chosen? Was the
   Request-URI considered? If so, why was it rejected?

24 The treatment of forking in section 6.3 is insufficient. As noted
   earlier, the use of provisional messages to establish early
   sessions is becoming common, and there will be multiple early
   sessions for a given INVITE
   when there is forking. The recommendation to latch onto the "first"
   200 (or 18x) and ignore the others only marginally works for playing
   media for simple telephony applications - we're seeing phones that
   mix or present multiple lines, and applications that go beyond basic
   phone calls (like file transfer) that make use of all the responses.
   Trying to dodge the complexity as the current section does will lead
   to metrics that don't reflect what the network is doing.

25 I'm a little surprised there is no discussion on privacy,
   particularly on profiling the usage patterns of individuals or
   organizations, in the security considerations section.

26 Nits:
     26.1 What does it mean in section 4.3.1 for the "user" to send the
       first bit of a message? Suggest deleting "or user" from the
       sentence.
     26.2 Section 4.11 has a stale internal pointer to a non-existent
        section 3.5. I suspect it's trying to point back into section 4
        somewhere.