[PMOL] Several questions/suggestions from my review of draft-ietf-pmol-sip-perf-metrics-04

Robert Sparks <rjsparks@nostrum.com> Tue, 29 September 2009 18:30 UTC

To: pmol@ietf.org
Cc: pmol-chairs@tools.ietf.org

Hi All -

I have several concerns about draft-ietf-pmol-sip-perf-metrics that I
would like to discuss. I've asked for a dedicated RAI-review of this
document, so there may be additional comments later, but I wanted to
get these to you now so we can start working through them.

These comments are more-or-less in document order, with a couple of
nits moved to the end. I've numbered them to help split responses into
threads later. When replying to just one of the items below, please
change the subject line to indicate what you're replying to.

Thanks,

RjS

--------------------------------------------------------------------------------------------------

1 The document should more carefully describe its scope (and consider
   changing its title). This document focuses on the use of SIP for
   simple telephony and relies on measurements in earlier telephony
   networks for guidance.  But telephony is only one use of SIP. These
   aren't the same metrics that would be most useful for observing a
   network that was involved primarily in setting up MSRP sessions for
   file transfer, for instance. An eventual set of generic SIP
   performance metrics will need to focus on the primitives rather than
   artifacts from any particular application.

2 That said, I'm skeptical of the utility of many of these metrics even
   for monitoring systems that are focusing only on delivering basic
   telephony. Has the group surveyed operators to see what they're
   measuring, what they're finding useful, and what they're just
   throwing away? Some additional text motivating why this particular
   set of metrics was chosen should be provided to help
   operators/implementers choose which ones they are going to try to
   use.

3 "Each session is identified by a unique Call-ID" is incorrect. You
   need at least Call-ID, to-tag, and from-tag here. And to be pedantic,
   you're describing the SIP dialog, not one of the sessions it manages.
   The session is what is described by the Session Description Protocol.
   The metrics in this draft are derived from signaling events, not
   session events, and make assumptions about how those correlate for
   a simple voice call that may not hold for more advanced uses.
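
   A rough illustrative sketch of the identifier that is actually
   needed (Python; the header dictionary and values are hypothetical):

      import re

      def dialog_id(headers):
          # A dialog is identified by the Call-ID together with the
          # local and remote tags (RFC 3261), not by the Call-ID alone.
          # 'headers' is a hypothetical {name: value} dict for one message.
          def tag(value):
              m = re.search(r';\s*tag=([^;\s]+)', value or '')
              return m.group(1) if m else None
          return (headers['Call-ID'], tag(headers.get('From')),
                  tag(headers.get('To')))

      # Same Call-ID, different To tags -> different dialogs.
      a = {'Call-ID': 'a84b4c76e66710',
           'From': '<sip:alice@example.com>;tag=1928301774',
           'To': '<sip:bob@example.com>;tag=8321234356'}
      b = dict(a, To='<sip:bob@example.com>;tag=314159')
      assert dialog_id(a) != dialog_id(b)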

4 The document is inconsistent about whether the metrics will describe
   any part of an early-dialog/early session. The introduction indicates
   it won't and focuses on the delivery of a 200 OK, but there are
   metrics that measure the arrival time of 180s. This should be
   reconciled. Do take note that early sessions are pervasive in real
   deployments today.

5 These metrics are intentionally designed to not measure (or be
   perturbed by) the hop-by-hop retransmission mechanisms. This should
   be made explicit. There should also be some discussion of the effect
   of the end-to-end retransmission of the 200 OK/ACK on the metrics
   based on those messages.

6 The document should consider the effects of the presence or absence
   of the reliable-provisional extension on its metrics (some of the
   metrics will be perturbed by a lost 18x that isn't sent reliably).

7 Using T1 and T4 as the timing interval measurement tokens is
   unfortunate. SIP uses those symbols already to mean something
   completely different. Is there a reason not to change these and avoid
   the confusion that the collision will cause?

8 The document uses the terms UAC and UAS incorrectly. It is trying to
   use them to mean the initiator and recipient of a simple phone call.
   But the terms are roles scoped to a particular transaction, not to a
   dialog. When an endpoint sends a BYE request, it is by definition
   acting as a UAC.

9 The document uses the word "dialog" in a way that's not the same as
   the formal term with the same name defined in RFC3261 and that will
   lead to confusion. (A sequence of register requests and responses,
   for example, are never part of any dialog. The INVITE/302/ACK
   messages shown in the call setup flows are not part of any dialog.)
   Please choose another word or phrase for this draft. I suggest
   "message exchange".

10 The 3rd to last paragraph of section 4 should be expanded. I think
   it's unlikely that implementers, especially those with other language
   backgrounds, will understand the subtlety of the quotes around
   "final".  Enumerating the cases where you want the measurement to
   span from the request of one transaction to the final response of
   some other transaction will help. (I'm guessing you were primarily
   considering redirection, but I suspect you also wanted to capture the
   additional delay due to Require-based negotiation or 488
   not-acceptable-here style re-attempts?). You may also want to
   consider the effect of the negotiation phase of extensions like
   session-timer on these metrics.
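
   A small worked example of the transaction-spanning measurement
   (timestamps and names are made up, purely to make the point
   concrete):

      # Hypothetical timeline for a redirected attempt: the measurement
      # runs from the first INVITE to the final response of the *second*
      # transaction, which is what the quoted "final" has to mean.
      events = [
          (0.000, 'INVITE', 'txn-1'),  # original request
          (0.040, '302',    'txn-1'),  # redirect ends transaction 1
          (0.060, 'INVITE', 'txn-2'),  # re-attempt toward the new target
          (0.850, '200',    'txn-2'),  # response that ends the measurement
      ]
      srd_seconds = events[-1][0] - events[0][0]   # 0.850, spanning two transactions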

11 The document assumes that a registration will be DIGEST challenged.
   That's a common deployment model, but it is not required. If other
   authentication mechanisms are used (such as SIP Identity), the RRD
   metric, for example, becomes muddied.

12 In section 4.2, "Subsequent REGISTER retries are identified by the
   same Call-ID" should say "identified by the same transaction
   identifier (same topmost Via header field branch parameter value)".
   Completely different REGISTER transactions from a given registrant
   are likely to have the same Call-ID.
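
   A rough sketch of the identifier being suggested (Python; the
   parsing is deliberately naive and only illustrative):

      import re

      def transaction_key(raw_message):
          # Return the branch parameter of the topmost Via header field,
          # which (together with the method) is the RFC 3261 transaction
          # identifier.
          for line in raw_message.split('\r\n'):
              if line.lower().startswith(('via:', 'v:')):
                  m = re.search(r';\s*branch=([^;,\s]+)', line)
                  return m.group(1) if m else None
          return None

      register = ('REGISTER sip:example.com SIP/2.0\r\n'
                  'Via: SIP/2.0/UDP ua.example.com;branch=z9hG4bK776asdhds\r\n'
                  'Call-ID: reg-call-1@ua.example.com\r\n'
                  '\r\n')
      print(transaction_key(register))   # -> z9hG4bK776asdhds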

13 The SRD metric definition in 4.3.1 ignores the effect of forking.
   Unlike 200 OKs, where receiving multiple 200s in response to a single
   INVITE only happens if a race is won, it is the _normal_ state of
   affairs for a UAC to receive provisional responses from multiple
   branches when a request forks. Deployed systems are increasingly
   sending 18x responses reliably with an answer, establishing early
   sessions, so when forking is present it is _highly_ likely that there
   will be multiple 18x's from different branches arriving at the UA.
   This section should provide guidance on what to report when this
   happens.
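
   One way the guidance could go, sketched only to make the question
   concrete (Python; the inputs are hypothetical):

      def per_branch_srd(invite_sent_at, provisionals):
          # 'provisionals' is a list of (arrival_time, to_tag) pairs for
          # the 18x responses seen for one INVITE; responses from
          # different forked branches carry different To tags.  The
          # earliest response from each branch yields one delay sample,
          # so forking produces a set of SRD values rather than one.
          first = {}
          for arrival, to_tag in sorted(provisionals):
              first.setdefault(to_tag, arrival - invite_sent_at)
          return first   # {to_tag: delay in seconds}

      print(per_branch_srd(10.0, [(10.4, 'b1'), (10.9, 'b2'), (11.2, 'b1')]))
      # -> one sample per branch: roughly 0.4 for 'b1' and 0.9 for 'b2'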

14 The Failed Session Setup SRD claims to be useful in detecting
   problems in downstream signaling functions. Please provide some text
   or a reference supporting that claim. As written, this metric could
   be dominated by how long the called user lets his phone ring. Is that
   what was intended? You might consider separate treatment for 408s and
   for explicit decline response codes.
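
   The separate treatment might look something like this (sketch only;
   the response-code lists are examples, not from the draft):

      def failed_setup_bucket(status_code):
          # Keep ring-timeout failures apart from explicit declines so
          # that how long callees let phones ring cannot dominate the
          # failed-setup delay distribution.
          if status_code == 408:
              return 'timeout'          # may just mean nobody answered
          if status_code in (480, 486, 600, 603):
              return 'busy/declined/unavailable'
          return 'other failure'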

15 What was the motivation for making MESSAGE special in section 4.3.3?
   Why didn't the group instead extend the concept to measuring _any_
   non-INVITE transaction (with the possible exception of CANCEL)?

16 In section 4.4, what does it mean to measure the delay in the
   disconnect of a failed session completion? Without a successful
   session completion, there can be no BYE. This section also raises
   the very hard-to-answer question of what to do when BYEs receive
   failure responses. It would be better to note that that edge case
   exists and to state what, if anything, the metric is going to say
   about it if it happens.

17 Section 4.5 is a particularly strong example of these metrics
   focusing on the simple telephony application. It may even be falling
   into the same traps that led to trying to build fraud-resistant
   billing based on the time difference between an INVITE and a BYE.
   Some additional discussion noting that the metric doesn't capture
   early media, and a recommendation on when to give up on seeing a
   BYE, would be useful. (Sometimes BYEs don't happen even when there
   is no malicious intent.)
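
   A recommendation could bound the wait explicitly, along these lines
   (sketch only; the cap value is made up):

      GIVE_UP_AFTER = 4 * 3600.0   # arbitrary illustrative cap in seconds

      def session_duration_sample(answered_at, bye_at, now):
          # If a BYE was seen, report the measured duration.  If not,
          # and the cap has passed, report the sample as 'no BYE
          # observed' rather than letting it grow without bound.
          # Otherwise keep waiting.
          if bye_at is not None:
              return ('measured', bye_at - answered_at)
          if now - answered_at > GIVE_UP_AFTER:
              return ('no-bye-observed', None)
          return ('pending', None)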

18 Trying to use Max-Forwards to determine how many hops a request took
   is going to produce incorrect results in any but the most simple of
   network deployments (I would have expected this to be based on
   counting Vias with a note pointing to the discussion on the problems
   B2BUAs introduce). Proxies can reduce Max-Forwards by more than one.
   There are many implementations in the wild that cap Max-Forwards. If
   this metric remains as defined, you should also point out that
   neither endpoint can calculate it. Some third entity will have to
   collect information from each end to make this calculation.
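
   The Via-counting alternative, as a rough sketch (Python; the parsing
   is deliberately naive and only illustrative):

      def via_hop_count(raw_request):
          # Count Via header field values in a received request: each
          # proxy that forwards the request adds one, so the count
          # approximates the number of signaling hops at the point of
          # observation.  B2BUAs rebuild the Via list, so the result can
          # still undercount when they are in the path.
          count = 0
          for line in raw_request.split('\r\n'):
              if line == '':            # end of the header section
                  break
              if line.lower().startswith(('via:', 'v:')):
                  # one Via line may carry several comma-separated values
                  count += line.split(':', 1)[1].count('SIP/2.0/')
          return count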

19 The ratio metrics don't define (or convey) the interval that totals
   are taken over. Are these supposed to be "# requests received since
   this instance was manufactured" or "since last reboot" or "since last
   reset of statistics" or something else? What is the implementation
   supposed to report when the denominator of a ratio is 0?
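
   A sketch of what an answer could look like (Python; names are
   hypothetical, not from the draft):

      class RatioMetric:
          # A ratio metric that carries its measurement interval
          # explicitly and treats a zero denominator as undefined rather
          # than as 0% or 100%.
          def __init__(self, interval_start, interval_end):
              self.interval_start = interval_start  # e.g. time counters were last reset
              self.interval_end = interval_end
              self.numerator = 0
              self.denominator = 0

          def value_percent(self):
              if self.denominator == 0:
                  return None   # undefined: nothing to take a ratio of
              return 100.0 * self.numerator / self.denominator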

20 Please add some discussion motivating why all 300s, 401, 402, and 407
   are treated specially (versus several other candidate 4xx and 6xx
   responses) in sections like section 4.8. Were other codes considered?
   If so, why were they rejected?

21 Section 4.9 seems to be implying that you can't receive a 500 class
   response to a reINVITE, which is not true. If you want this metric to
   only reflect the results of initial INVITEs, more definition will be
   needed.

22 ISA in section 4.10 claims that 408s indicate an overloaded state in
   a downstream element. Overload may induce 408s, but 408s do _not_
   indicate overload. It's possible to receive them just because someone
   is not answering a phone.

23 In section 5, why were these correlation dimensions chosen? Was the
   Request-URI considered? If so, why was it rejected?

24 The treatment of forking in section 6.3 is insufficient. As noted
   earlier, the use of provisional messages to establish early
   sessions is becoming common, and there will be multiple early
   sessions for a given INVITE
   when there is forking. The recommendation to latch onto the "first"
   200 (or 18x) and ignore the others only marginally works for playing
   media for simple telephony applications - we're seeing phones that
   mix or present multiple lines, and applications that go beyond basic
   phone calls (like file transfer) that make use of all the responses.
   Trying to dodge the complexity as the current section does will lead
   to metrics that don't reflect what the network is doing.

25 I'm a little surprised there is no discussion on privacy,
   particularly on profiling the usage patterns of individuals or
   organizations, in the security considerations section.

26 Nits:
     26.1 What does it mean in section 4.3.1 for the "user" to send the
       first bit of a message? Suggest deleting "or user" from the
       sentence.
     26.2 Section 4.11 has a stale internal pointer to a non-existent
        section 3.5. I suspect it's trying to point back into section 4
        somewhere.