[PMOL] RAI-ART review of draft-ietf-pmol-sip-perf-metrics-04 by Dale R. Worley
RAI-ART review of draft-ietf-pmol-sip-perf-metrics-04
Dale R. Worley
I am the assigned RAI-ART reviewer for draft-ietf-pmol-sip-perf-metrics-04.txt.
For background on RAI-ART, please see the FAQ at
<http://www.softarmor.com/rai/art/FAQ.html>.
Please resolve these comments along with any other Last Call comments
you may receive.
This draft is on the right track but has open issues, described in the review.
My comments are divided into groups of varying significance:
- Situations where the draft appears to have omitted measurements that
would be of value, and which appear to be of the same category as
measurements that are included in the draft. However, this omission
may have been deliberate due to the intended scope of the draft, in
which case the intended limitation should be described (and perhaps
justified) in the "Scope" section.
- Deficiencies in the exposition of how various timing measurements
are to be taken. These may also mask technical issues about how the
measurements are intended to be taken.
- Editorial issues, mostly involving aligning terminology with the
common usage in SIP discussions.
The items in these groups are numbered with discontinuous sets of
numbers.
Situations where the draft appears to have omitted measurements that
would be of value, and which appear to be of the same category as
measurements that are included in the draft. However, this omission
may have been deliberate due to the intended scope of the draft, in
which case the intended limitation should be described (and perhaps
justified) in the "Scope" section.
100. Is the intention to restrict attention to signaling (SIP) alone?
In our experience, performance problems first come to users' attention
in media (RTP), and any environment with tolerable media performance
has more than adequate signaling performance.
101. The measurements appear to be designed to closely parallel
performance metrics of TDM telephone systems. This may be
intentional, but this draft omits a number of measurements that do not
closely parallel TDM performance metrics, but are nonetheless
important for the performance of SIP systems, even when limiting
attention to "traditional telephone" usage. (See later items for
specifics.)
102. Many of the metrics appear to form natural triples:
- Average delay when the operation (of a particular class) was successful.
- Average delay when the operation was unsuccessful.
- Percentage of operations of the class which were successful.
The metrics scheme would be clarified and made more systematic if this
grouping were defined as an overall property of the metric scheme and
then applied as such to the various classes of operations:
registration request, session request, session establishment, session
disconnect, etc.
E.g., although "Failed session completion delay" (average
BYE-with-failure response delay) is defined, the *percentage* of
failed session completions is not defined, even though that metric
should be very low in any well-functioning network and thus can
provide a valuable performance indicator.
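As a rough illustration of the "natural triple" idea (not anything
taken from the draft), the grouping could be computed uniformly for any
class of operation. The names Operation, MetricTriple and metric_triple
are hypothetical, and the sketch assumes each operation has already been
reduced to a single delay and a success flag:

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional


@dataclass
class Operation:
    """One observed operation (hypothetical record, not from the draft)."""
    delay_s: float   # delay from request sent to response received, in seconds
    succeeded: bool  # whether the operation was judged successful


@dataclass
class MetricTriple:
    avg_success_delay_s: Optional[float]  # None if no successes were observed
    avg_failure_delay_s: Optional[float]  # None if no failures were observed
    success_percent: Optional[float]      # None if nothing was observed


def metric_triple(ops: Iterable[Operation]) -> MetricTriple:
    """Compute the three related metrics for one class of operation."""
    ops = list(ops)
    ok = [o.delay_s for o in ops if o.succeeded]
    bad = [o.delay_s for o in ops if not o.succeeded]
    return MetricTriple(
        avg_success_delay_s=mean(ok) if ok else None,
        avg_failure_delay_s=mean(bad) if bad else None,
        success_percent=100.0 * len(ok) / len(ops) if ops else None,
    )
```

The same function would then be applied separately to registration
requests, session requests, session disconnects, and so on, so that
every class of operation reports the same three numbers.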
103. Some metrics distinguish between "failure in the network" and
"failure at the destination user agent"; e.g., "Session Defects
Ratio". Beware that distinguishing between these two cases is very
difficult and cannot easily be done by listing response codes for the
two cases. (I've been developing an I-D for a couple of years on how
to deal with this problem -- I have yet to devise a good solution.)
It may also be desirable to apply this distinction to operations
other than initial INVITEs -- this would effectively add further
metrics to the "natural triple" described in item 102, as it divides
the class of failures into two sub-classes.
104. In regard to "telephone calls" (INVITE-initiated sessions), there
are three metrics described (each of which gives rise to a triple of
metrics):
- Session request (INVITE to 180)
- Session setup (INVITE to final response)
- Session disconnect (BYE to final response)
In addition, there is a metric for REGISTER requests.
But there are a number of additional operations whose performance is
important to the performance of SIP signaling, and which may differ
greatly from the performance of the above operations (and for which
the users' expectations may be very different from the above
operations):
- re-INVITEs
(These are particularly important as re-INVITE is used to implement
on-hold and off-hold operations, and users expect those operations
to complete much more quickly than initial INVITEs.)
- UPDATE (if it is used in practice)
- initial SUBSCRIBE (for dialog events)
- re-SUBSCRIBE
- NOTIFY
Several metrics might be defined regarding REFER requests, which are
used to implement blind and consultative transfers.
In addition, MESSAGE requests can be sent either out-of-dialog or
within-dialog, and one expects the performance of the two cases to be
quite different, so there should be separate metrics for the two
cases.
105. Similar to the problem with detecting "failure in the network",
determining when session request is finished (that is, ringing starts)
is difficult to specify. A 180 response is a clear indication that a
user has been notified, but other 18x responses may be considered as
successful setup in some situations. E.g., a 182 Queued message may
be considered "end of session request" if one is concerned about the
performance of the network, although it does not indicate the start of
useful communication from the user's point of view.
106. In regard to issues 105 and 103, it may be necessary to allow
that metrics may be "parametrized" by attaching a specification of
which response codes are treated as having which meaning. In any
case, allowance needs to be made that experience with the metrics may
show that the various specified sets of response codes need to be
modified to maximize the usefulness of the metrics.
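One possible shape for such parametrization, sketched under stated
assumptions: the class ResponseCodePolicy and its field names are
hypothetical, and the response-code sets shown are placeholders chosen
only to make the example run, not a recommendation for how the codes
should actually be divided (which, per item 103, is a hard problem).

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ResponseCodePolicy:
    """Hypothetical parameter set attached to a reported metric."""
    end_of_session_request: FrozenSet[int]  # codes that end "session request"
    network_failure: FrozenSet[int]         # codes counted as network failure
    destination_failure: FrozenSet[int]     # codes counted as UA failure

    def classify(self, code: int) -> str:
        """Return the meaning this policy assigns to a SIP response code."""
        if code in self.end_of_session_request:
            return "session-request-complete"
        if code in self.network_failure:
            return "network-failure"
        if code in self.destination_failure:
            return "destination-failure"
        return "unclassified"


# Placeholder sets (assumptions for illustration only).
user_view = ResponseCodePolicy(
    end_of_session_request=frozenset({180}),
    network_failure=frozenset({500, 503, 504}),
    destination_failure=frozenset({480, 486}),
)
# A network-performance view might also accept 182 Queued (see item 105).
network_view = ResponseCodePolicy(
    end_of_session_request=frozenset({180, 182}),
    network_failure=user_view.network_failure,
    destination_failure=user_view.destination_failure,
)
```

Reporting the policy alongside the metric value would let the sets be
revised with experience without making older measurements ambiguous.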
Deficiencies in the exposition of how various timing measurements are
to be taken. However, these may mask technical issues about how the
measurements are intended to be taken.
200. Section 3 of the draft gives a nice standard for how the time
from sending a request to receiving a response is to be measured ("T1
to T4"), including within it a standard for how to measure the time of
sending and receiving ("first bit sent" vs. "last bit received").
However, in many places (e.g., section 4.1) these standards are not
referenced, but rather they are restated. This leads the reader to
have to check each metric definition to see if it is defined in the
same way as described in section 3, and to wonder if all of the
metrics are defined in the same way. Better would be to have section
3 note explicitly that all metrics use this definition of time delay.
201. Metrics involving INVITE should explicitly note that ACK is not
considered as part of the time delay; all time delays are measured
from sending the INVITE to receiving its response. Or rather, there
should be an overall declaration of this standard in section 3.
202. The discussion of "Successful session duration SDT" shows some
examples but does not clearly indicate how all four cases are to be
handled. (The cases are the combinations of: measurement at the
originating end vs. measurement at the terminating end, sending BYE
vs. receiving BYE.) There is also some confusion regarding T1 and T4,
which are specified in multiple different ways in this section.
203. The various definitions should take into consideration calls
that are timed out by the session-timer mechanism or other
session-keepalive mechanisms. (Unless it is known that the intended
networks will not use session-keepalives, or that session-keepalive
failures will not cause UAs to see differing signaling flows.)
204. As Robert notes, the "Hops per Request" measurement is unlikely
to be useful for out-of-dialog requests, because it does not capture
redirections handled by proxies and failed forks, and so is poorly
correlated with the amount of work needed to deliver a request to its
destination. However, this metric may be useful for in-dialog
requests, as there is usually no redirection and (AFAIK) B2BUAs
maintain the Max-Forwards value.
Editorial issues, mostly involving aligning terminology with the
common usage in SIP discussions.
300. Many of the metrics appear to be intended to report the average
of a quantity that is measured for many executions of an operation.
But most of these metrics describe in detail how a delay quantity is
to be measured without stating that averaging is to be done. (E.g.,
the word "average" appears only twice in the draft.)
301. The term "session" seems to always be used to describe what is
called a "dialog" in the SIP world, seeing as how the media (RTP)
session is never discussed. Is this a telecom usage, or should it be
replaced with "dialog"?
302. As Robert notes, T1 and T4 are used in ways that conflict with
the use of those symbols in RFC 3261. They also lead one to wonder
what happened to T2 and T3. But perhaps these symbols are a reference
to a terminology used in TDM performance measurement and should be
retained due to that standardization.
303. In measurements regarding a single request and its response(s),
the roles UAC and UAS are well-defined. But in session-duration
measurements, the roles of UAC and UAS are defined only within
individual request-response transactions. In that case, a different
terminology should be used. (E.g., "originator" vs. "terminator"? --
Surely the telecom world has words for this.)