Session Initiation Protocol Service Example -- Music on Hold
Avaya Inc.
600 Technology Park Dr., Billerica, MA 01821, US
dworley@avaya.com
http://www.avaya.com
Transport
Dispatch
Music on hold
The "music on hold" feature is one of the most desired features of
telephone systems in the business environment.
"Music on hold" is where, when one party to a call
has the call "on hold", that party's telephone provides an audio
stream (often music) to be heard by the other party.
Architectural features of SIP make it
difficult to implement music-on-hold in a way that is fully compliant
with the standards.
The implementation of music-on-hold described in this document is
fully effective and standards-compliant, and has a number of
advantages over the methods previously documented.
In particular, it is less likely to produce peculiar user interface
effects and more likely to work in systems which perform
authentication than the method of RFC 5359.
Within SIP-based systems, it is desirable to be
able to provide
features that are similar to those provided by traditional telephony
systems.
A frequently requested feature is "music on hold":
The music-on-hold feature is where, when one party to a call
has the call "on hold", that party's telephone provides an audio
stream (often music) to be heard by the other party.
Architectural features of SIP make it
difficult to implement music-on-hold in a way that is fully compliant
with the standards. The purpose of this document is to describe a
method that is reasonably simple yet fully effective and standards-compliant.
The "intended status" of this document is "Informational".
The reason that it is not "Best Current Practice" is that this method is
not specified as "best", nor is this specification intended
to supersede all other methods for implementing music-on-hold.
Indeed, the two user agents in a call can use different methods for
implementing music-on-hold, as can different user agents within a
telephone system.
The essence of the technique is that when the executing UA (the user's
UA) performs
a re-INVITE of the remote UA (the other user's UA) to establish the
hold state, it provides no SDP
offer,
thus compelling the remote UA to provide an SDP offer.
The executing UA then extracts
the offer SDP from the remote UA's 2xx response,
and uses that as the offer SDP in a new INVITE to
the external media source. The external media source is thus directed
to provide media directly to the remote UA.
The media source's answer SDP is returned to the remote UA in the ACK
to the re-INVITE.
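The exchange just described can be summarized as an ordered list of signaling steps. The following is only an illustrative sketch: the function name and tuple layout are hypothetical, and the ACK sent to the MOH source (and its position relative to the ACK sent to the remote UA) is an assumption based on ordinary INVITE handling rather than something stated in the paragraph above.

```python
# Illustrative sketch only; names and tuple layout are assumptions, not part of the technique.
def hold_signaling_steps():
    """Return (sender, receiver, message, body) tuples for placing a call on hold."""
    return [
        ("executing UA", "remote UA", "re-INVITE", "no SDP"),
        ("remote UA", "executing UA", "2xx", "SDP offer"),
        ("executing UA", "MOH source", "INVITE", "SDP offer derived from remote UA's offer"),
        ("MOH source", "executing UA", "2xx", "SDP answer"),
        # Assumed: the INVITE to the MOH source is acknowledged as any INVITE would be.
        ("executing UA", "MOH source", "ACK", "no SDP"),
        ("executing UA", "remote UA", "ACK", "SDP answer from MOH source"),
    ]


if __name__ == "__main__":
    for sender, receiver, message, body in hold_signaling_steps():
        print(f"{sender:>12} -> {receiver:<12} {message:<9} ({body})")
```

The point of the sketch is that the executing UA never originates an offer of its own toward the remote UA while establishing hold; it only relays the remote UA's offer and the MOH source's answer.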
The executing user instructs the executing UA to put the dialog
on-hold.
The executing UA sends a re-INVITE without SDP to the remote UA,
which forces the remote UA to provide an SDP
offer in its 2xx response.
The Contact header of the re-INVITE includes the '+sip.rendering="no"'
field parameter to indicate that it is putting the call on
hold (RFC 4235, section 5.2).
The remote UA sends a 2xx to the re-INVITE, and includes an SDP offer
giving its own listening address/port.
If the remote UA understands the sip.rendering feature parameter, the
offer may indicate that it will not send media by specifying the media
directionalities as "recvonly" (the reverse of "on-hold") or "inactive".
But the remote UA may offer to send media.
The executing UA uses this offer to derive the offer SDP of an initial
INVITE that it
sends to the configured music-on-hold (MOH) source.
The SDP in this request is largely copied
from the SDP returned by the remote UA in the previous step,
particularly regarding the
provided listening address/port and payload type numbers.
But the media
directionalities are restricted to "recvonly" or "inactive" as appropriate.
The executing UA may want or need to change the o= line.
In addition, some a=rtpmap lines may need to be added to control the
assignment of RTP payload type numbers.
The MOH source sends a 2xx response to the INVITE, which contains an SDP
answer that should include
its media source address as its listening address/port.
This SDP must necessarily specify "sendonly" or "inactive" as the
directionality for all media streams.
Although this address/port should receive no RTP, the specified port
determines the port for receiving RTCP (and conventionally, for sending
RTCP).
By convention, UAs
use their declared RTP listening ports as their RTP source ports as well.
The answer SDP will reach the remote UA, thus informing it of the
address/port from which the MOH media will come, and presumably
preventing the remote UA from ignoring the MOH media if the remote UA
filters media packets based on the source address.
This functionality requires the SDP answer to contain the sending
address in the c= line, even though the MOH source does not
receive RTP.
The executing UA sends this SDP answer as its SDP answer in the ACK for the
re-INVITE to the remote UA. The o= line in the answer must be modified
to be within the sequence of o= lines previously generated by the executing
UA in the dialog. Any dynamic payload type number assignments that
have been created in the answer must be recorded in the state of the
original dialog.
Due to the sip.rendering feature parameter in the Contact of the
re-INVITE and the media directionality in the SDP answer contained in
the ACK, the on-hold state of the dialog is
established (at the executing end).
After this point, the MOH source generates RTP containing the
music-on-hold media, and sends it directly to the listening address/port of the
remote UA. The executing UA maintains two dialogs (one to
the remote UA, one to the MOH source), but does not see or handle the MOH
RTP.
The executing user instructs the executing UA to take the dialog off-hold.
The executing UA sends a re-INVITE to the remote UA with SDP that
requests to receive media.
The Contact header of the re-INVITE does not include the '+sip.rendering="no"'
field parameter.
(It may contain a sip.rendering field parameter with value "yes" or
"unknown", or it may omit the field parameter.)
Thus this INVITE removes the on-hold state of the
dialog (at the executing end).
(Note that the version in o= line of the offered SDP must account for
the SDP versions that were passed through from the MOH source.
Also note that
any payload type numbers that were assigned in SDP provided by
the MOH source must be respected.)
When the remote UA sends a 2xx response to the re-INVITE, the executing UA
sends a BYE request in the dialog to the MOH source.
After this point, the MOH source does not generate RTP and ordinary
RTP flow is reestablished in the original dialog.
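The step in which the executing UA derives the MOH offer from the remote UA's offer restricts the media directionalities to "recvonly" or "inactive". A minimal sketch of that restriction follows. The function name is hypothetical, the SDP handling is deliberately simplistic (line-oriented, no validation), and the sketch assumes that a media section with no direction attribute at either level is implicitly "sendrecv".

```python
# Sketch only: restrict SDP directions so the MOH source is asked to send, never to receive.
RESTRICT = {
    "sendrecv": "recvonly",
    "recvonly": "recvonly",
    "sendonly": "inactive",
    "inactive": "inactive",
}


def _is_direction(line):
    return line.startswith("a=") and line[2:] in RESTRICT


def restrict_directions(sdp):
    """Return a copy of `sdp` whose directions are only recvonly or inactive."""
    # Split into the session-level section and one section per m= line.
    sections = [[]]
    for line in sdp.strip().splitlines():
        if line.startswith("m="):
            sections.append([])
        sections[-1].append(line)

    def restrict(section):
        return ["a=" + RESTRICT[l[2:]] if _is_direction(l) else l for l in section]

    session_has_direction = any(_is_direction(l) for l in sections[0])
    out = restrict(sections[0])
    for media in sections[1:]:
        out.extend(restrict(media))
        # Assumed default: no direction attribute anywhere means sendrecv.
        if not session_has_direction and not any(_is_direction(l) for l in media):
            out.append("a=recvonly")
    return "\r\n".join(out) + "\r\n"
```

A real implementation would also rewrite the o= line and might add a=rtpmap lines, as the steps above note; this sketch covers only the directionality attributes.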
This section shows a message flow which is an example of this
technique. The scenario is: Alice establishes a call with Bob. Bob
then places the call on hold, with music-on-hold provided from an
external source. Bob then takes the call off hold.
Note that this is just one possible message flow that illustrates this
technique; numerous variations on these operations are allowed by the
applicable standards.
While the call is on-hold, the remote UA can send a request to
modify the SDP or the feature parameters of its Contact header. This
can be done with either an INVITE or UPDATE method, both of which have
much the same effect in regard to MOH.
A common reason for a re-INVITE is that the remote UA
desires to put the dialog on hold on its end. Because of the need
to support this case, an implementation must process
INVITEs and UPDATEs during the on-hold state as described below.
The executing UA handles these requests by echoing requests and
responses: an incoming request from the remote UA causes the executing
UA to send a similar request to the MOH source and an incoming response from the
MOH source causes the executing UA to send a similar response to the
remote UA. In all cases, SDP offers or
answers that are received are added as bodies to the stimulated
request or response to the other UA.
The passed-through SDP will usually need its o= line modified.
The directionality attributes may need to be restricted.
In regard to payload type numbers, since the mapping has already been
established within the MOH dialog, a=rtpmap lines need not be added.
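Modifying the o= line of passed-through SDP amounts to substituting the executing UA's own origin fields and advancing its own version counter. The sketch below illustrates one way to do that. The class name and fields are hypothetical, IPv4 addressing is assumed, and the version is incremented on every rewrite on the assumption that each passed-through SDP differs from the previous one.

```python
# Sketch only: keep passed-through SDP within the executing UA's own o= sequence.
class OriginRewriter:
    def __init__(self, username, session_id, version, address):
        self.username = username
        self.session_id = session_id
        self.version = version  # last version the executing UA sent in this dialog
        self.address = address

    def rewrite(self, sdp):
        """Replace the o= line of `sdp` with the next origin in this UA's sequence."""
        self.version += 1
        origin = (
            f"o={self.username} {self.session_id} {self.version} "
            f"IN IP4 {self.address}"  # assumption: IPv4 unicast address
        )
        lines = [origin if l.startswith("o=") else l for l in sdp.strip().splitlines()]
        return "\r\n".join(lines) + "\r\n"
```

One rewriter instance would be kept per direction of pass-through (toward the remote UA and toward the MOH source), since each dialog has its own sequence of o= lines.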
The executing UA must be prepared to receive INVITE requests with
Replaces headers that replace the original dialog, and similarly it
must be prepared to receive REFER requests within the dialog.
The SDP within the new dialog is negotiated by being passed through to
the MOH source within a new dialog with the MOH source.
The SDP
offer or answer can be passed to the MOH source with only
modification to the o= line and directionality attributes.
In some cases, the previous dialog with the MOH source can be reused,
but only if the executing UA presents the first offer within the new
dialog, as otherwise
there is no way to force the RTP payload types that have been used
previously in the MOH dialog to be mapped to the correct codecs in the
new dialog.
It is possible for the MOH source to send an INVITE or
UPDATE request, and the executing UA can support this in the same
manner as it supports requests from the remote UA.
However, if the MOH source is within the same
administrative domain as the executing UA, the executing UA may have
knowledge that the MOH
source will not (or need not) make such requests, and so can respond
to any such request with a failure response, avoiding the need to pass
the request through.
However, in an environment in which ICE
is supported, the MOH
source may need to send requests as part of ICE negotiation with the remote UA.
Hence, in environments that support ICE, the executing UA must be able to
pass through requests from the MOH source as well as requests from
the remote UA.
Again, as SDP is passed through, its o= line will need to be
modified.
In some cases, the directionality attributes will need to be
restricted.
In this technique, the MOH source generates an SDP answer that
the executing UA presents to the remote UA as an answer within the
original dialog.
In basic functionality, this presents no problem, because
RFC 3264 (section 6.1, at the very end) specifies that the
payload type numbers used in either direction of RTP are the ones
specified in the SDP sent by the recipient of the RTP.
Thus, the MOH source will send RTP to the remote UA using the payload type numbers
specified in the offer SDP it received (ultimately) from the remote UA.
But strict compliance with RFC 3264 (section 8.3.2)
requires that payload type
numbers used in SDP may only duplicate the payload type numbers used in
any previous SDP sent in the same direction
if the payload type numbers represent the same media format (codec) as
they did previously.
However, the MOH source has no knowledge of the payload type numbers
previously used in the original dialog, and it may accidentally
specify a different media format for a previously used payload type number in its
answer (or in a subsequently generated INVITE or UPDATE).
This would cause no problem with media decoding, as it cannot send any
format that was not in the remote UA's offer, but it would violate
RFC 3264.
Strictly speaking, it is impossible to avoid this problem because
the generator of a first answer in its dialog can
choose the payload numbers independently of the payload numbers in the
offer, and the MOH server believes that its answer is first in the dialog.
Thus the only absolute solution is to have the executing UA rewrite
the SDP that passes through it to
reassign payload type numbers, which would also require it to rewrite
the payload type numbers in the RTP packets -- a very undesirable solution.
The difficulty of solving this problem (and similar problems in other situations)
argues that strict adherence should not be required to the rule that
payload type numbers not be reused for different codecs.
The remainder of this section is devoted to describing a technique to
eliminate this problem, in case it is of practical significance in
some application.
We do not expect that user agents would need to implement it in most
applications.
However, we can construct a technique that will strictly adhere to the
payload type rule
by exploiting a SHOULD-level requirement in RFC 3264
(section 6.1): "In the case of RTP, if a particular codec was referenced with a
specific payload type number in the offer, that same payload type
number SHOULD be used for that codec in the answer."
Or rather, we exploit the "implied requirement" that if a specific payload number
in the offer is used for a particular codec, then the answer should not use that
payload number for a different codec.
If the MOH source obeys this restriction, the executing UA can modify
the offer SDP to "reserve" all payload type numbers that have ever
been offered by the executing UA to prevent the MOH source from
using them for different media formats.
When the executing UA is composing the INVITE to the MOH source, it
compiles a list of all the (dynamically-assigned) payload type numbers
and associated media formats
which have been used by it (or by MOH sources on its behalf) in the
original dialog.
(The executing UA must be maintaining a list of all previously used
payload type numbers anyway, in order to comply with
RFC 3264.)
Any payload type number that is present in the offer but has been used
previously by the executing UA in the original dialog for a different
media format is
rewritten to describe a dummy media format.
A payload type number is added to describe the deleted media format,
the number being either previously unused or previously used by the
executing UA for that media format.
Any further payload type numbers
which have been used by the executing UA in the
original dialog but which are not mapped to a media format in the
current offer are then mapped to a dummy media format.
The result is that the modified offer SDP:
offers the same set of
media formats (ignoring dummies) as the original offer SDP (though possibly with
different payload type numbers),
associates every payload type number
either with a dummy media format or with the media format that the
executing UA has previously used it for, and
provides a (real or
dummy) media format for every payload type number that the executing
UA has previously used.
These properties are sufficient to force an MOH server that obeys the
implied requirement to generate an answer that is a correct answer to the
original offer and is also compatible with previous SDP
from the executing UA.
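The reservation procedure described above can be sketched over simple mappings from payload type number to media format. Everything named here is hypothetical: the function name, the dummy format string, and the choice of 96 through 127 as the pool of numbers to draw from. The sketch ignores SDP syntax entirely and raises IndexError if no suitable number remains.

```python
# Sketch only: reserve previously used payload type numbers in an offer sent to the MOH source.
DUMMY = "x-reserved/8000"  # hypothetical placeholder media format


def reserve_payload_types(offer, history, pool=range(96, 128)):
    """offer, history: dicts mapping payload type number -> media format.

    `history` holds every mapping the executing UA has used in the original dialog.
    Returns the modified offer as a dict.
    """
    result = {}
    displaced = []
    for pt, fmt in offer.items():
        if pt in history and history[pt] != fmt:
            result[pt] = DUMMY      # number was used before for a different format
            displaced.append(fmt)   # the offered format needs another number
        else:
            result[pt] = fmt
    for fmt in displaced:
        # Prefer a number already used by the executing UA for this format.
        reuse = [pt for pt, f in history.items() if f == fmt and pt not in result]
        fresh = [pt for pt in pool if pt not in result and pt not in history]
        result[(reuse or fresh)[0]] = fmt
    for pt in history:
        result.setdefault(pt, DUMMY)  # block every remaining previously used number
    return result
```

For example, with history {96: X, 97: Z} and an offer {96: X, 97: Y}, the result maps 96 to X, 97 to the dummy format, and 98 to Y, so an MOH source that obeys the implied requirement cannot answer with 97 bound to anything the remote UA would misinterpret.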
Note that any re-INVITEs from the remote UA that the executing UA
passes through to the MOH server require similar modification, as
payload type numbers that the MOH server receives in past offers are not
absolutely reserved against its use (as they have not been sent in
SDP by the MOH server) nor is there a SHOULD-level proscription
against using them in the current answer (as they do not appear in
the current offer).
This should provide an adequate solution to the problems with
payload type numbers, as it will fail only if (1) the remote UA is
particular that other UAs follow the rule about not redefining
payload type numbers, and (2) the MOH server does not follow the
implied requirement of section 6.1.
Let us show how this process works by modifying the example message
flow above with this specific assignment of supported
codecs:
Alice supports formats X and Y
Bob supports formats X and Z
Music Source supports formats Y and Z
In this case, the SDP exchanges are:
F1 offers X and Y, F3 answers X and Z (which cannot be used)
F6 offers X and Y, but F7 offers X, Y, and a place-holder to block use of type 92
F8/F10 answers Y
The messages that are changed from the example message flow are:
The executing UA may discover that either the remote UA or the MOH
source wishes to use dialog/session liveness
timers.
Since the timers verify the liveness of dialogs, not
sessions (despite the terminology of RFC 4028), the
executing UA can support the timers on each dialog (to the remote UA
and to the MOH source) independently.
(If the executing UA becomes obliged to initiate a refresh
transaction, it must send an offerless UPDATE or re-INVITE. If it sent an
offer, the remote element would have the opportunity to provide an answer
that differs from its previous SDP, and that change could not easily be
conveyed to the other remote element.)
This technique for providing music-on-hold has advantages over other
methods now in use:
The original dialog is not transferred to another UA, so the "remote
endpoint URI" displayed by the remote endpoint's user interface and
dialog event package does not change during
the call, as contrasted to the method in RFC 5359
section 2.3.
This URI is usually displayed to the user as the name and number
of the other party on the call, and it is desirable for it not to change
to that of the MOH server.
Compared to RFC 5359, this method does not
require use of an out-of-dialog REFER, which is not otherwise used
much in SIP.
Out-of-dialog REFERs may not be routed correctly, since neither the
From nor Contact URI of the original dialog may route correctly to the remote UA.
Also, out-of-dialog requests to UA URIs may not be handled correctly
by authorization mechanisms.
The music-on-hold media are sent directly from the music-on-hold source
to the remote UA, rather than being relayed through the executing UA.
This reduces the computational load on the executing UA and can reduce the load on
the network (by eliminating "hairpinning" of the media through the link serving
the executing UA).
The remote UA sees, in the incoming SDP, the address/port that the MOH
source will send MOH media from (assuming that the MOH source follows
the convention of sending its media from its advertised media listening
address/port).
Thus the remote UA will render the MOH media
even if it is filtering incoming media based on originating
address as a media security measure.
The technique requires relatively simple manipulation of SDP, and
in particular: (1) does not require a SIP element to modify unrelated SDP to be
acceptable to be sent within an already established sequence of SDP (a
problem with section 2.3), and
(2) does not require converting an SDP answer into an SDP offer
(which was a problem with the -00 version of this document, as well as
with ).
Unnecessary failures can happen if SDP offerers do not always offer all media
formats that they support.
Doing so is considered best practice
( section 5.1 and 5.3), but some
elements offer only formats that have already been in use in the dialog.
An example of how omitting media formats in an offer can lead to
failure is as follows:
Suppose that the UAs in the example message flow each support the
following media formats:
Alice supports formats X and Y
Bob supports formats X and Z
Music Source supports formats Y and Z
In this case, the SDP exchanges are:
1. Alice calls Bob:
   Alice offers X and Y (message F1)
   Bob answers X (F3)
2. Bob puts Alice on hold:
   Alice (via Bob) offers X and Y (F6 and F7)
   Music Source (via Bob) answers Y (F8 and F10)
3. Bob takes Alice off hold:
   Bob offers X and Z (F11)
   Alice answers X (F12)
Note that in exchange 2, if Alice assumes that she should offer only X
because only format X is in use, the exchange fails.
In exchange 3, Bob offers formats X and Z, even though neither is
in use at the time (because Bob is not involved in the media streams).
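The three exchanges can be checked with a few lines of set arithmetic. This is only a sketch under a simplifying assumption: an answer is modeled as the intersection of the offered formats with the answerer's supported formats, and an empty result stands for a failed exchange. The function and table names are hypothetical.

```python
# Sketch only: model each SDP answer as the offered formats the answerer also supports.
SUPPORTED = {
    "Alice": {"X", "Y"},
    "Bob": {"X", "Z"},
    "Music Source": {"Y", "Z"},
}


def answer(offered, answerer):
    """Return the formats the answerer can accept; an empty set means failure."""
    return set(offered) & SUPPORTED[answerer]


if __name__ == "__main__":
    print(answer(SUPPORTED["Alice"], "Bob"))           # exchange 1
    print(answer(SUPPORTED["Alice"], "Music Source"))  # exchange 2
    print(answer(SUPPORTED["Bob"], "Alice"))           # exchange 3
    print(answer({"X"}, "Music Source"))               # exchange 2 if Alice offers only X
```

The last call returns an empty set, which is the failure described above: the Music Source supports neither of the formats in an offer that has been narrowed to X.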
Many UAs provide MOH in the interval during which they are processing a
blind transfer, between receiving the REFER and receiving the final
response to the stimulated INVITE.
This process involves switching the user's interface between three
media sources:
(1) the session of the original dialog,
(2) the session with the MOH server,
and (3) the session of the new dialog,
and involves a number of race conditions that must be handled
correctly.
If the UA is a B2BUA whose "other side" is maintaining a single dialog
with another UA, each switching of media sources potentially causes a
re-INVITE transaction within the other-side dialog.
Since re-INVITEs take time and must be sequenced correctly
(RFC 3261 section 14), such a B2BUA must allow the events
on each side to be non-synchronous and must coordinate them
correctly.
Failing to do so will lead to "glare" errors (491 or 500), leaving the
other-side UA not rendering the correct session.
Some UAs filter incoming media based on the address of origin
as a media security measure.
The technique described in this document ensures that any UA that
should render MOH media will
be informed of the source address of the media via the SDP that it
receives.
This should allow such UAs to filter without interfering with MOH
operation.
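As an illustration of how a filtering UA might use the SDP it receives, the sketch below collects the address and port advertised for each media stream and admits only packets from those sources. It relies on the convention, noted in this document, that a sender uses its advertised listening address/port as its source address/port. The function names are hypothetical, and the parsing assumes plain unicast c= lines and m= lines with a single port number.

```python
# Sketch only: derive permitted RTP sources from the c= and m= lines of received SDP.
def expected_sources(sdp):
    """Return the set of (address, port) pairs advertised in `sdp`."""
    session_address = None
    streams = []  # [address, port] per m= line
    for line in sdp.strip().splitlines():
        if line.startswith("c="):
            address = line.split()[2]
            if streams:
                streams[-1][0] = address   # media-level c= overrides session level
            else:
                session_address = address
        elif line.startswith("m="):
            streams.append([session_address, int(line.split()[1])])
    return {(address, port) for address, port in streams}


def is_allowed(packet_source, sdp):
    """True if `packet_source`, an (address, port) pair, was advertised in `sdp`."""
    return packet_source in expected_sources(sdp)
```

With this technique the SDP in question is the MOH source's answer, relayed by the executing UA, so the MOH source's address is the one the remote UA admits while the call is on hold.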
The original version of this proposal was derived from
section 2.3
and the similar implementation of MOH in the Snom UA.
Significant improvements to the sequence of operations, allowing
improvements to the SDP handling, were suggested by
Venkatesh.
John Elwell pointed out the need for the executing
UA to pass through re-INVITEs/UPDATEs in order to allow ICE
negotiation.
Paul Kyzivat pointed
out the difficulties regarding reuse of payload type numbers
and considerations that could be used to avoid those difficulties,
leading to the writing of the section on payload type numbers.
Paul Kyzivat suggested adding the example
showing why offerers should always include all supported formats.
M. Ranganathan pointed out the difficulties experienced by a B2BUA
due to the multiple changes of media source.
was significantly clarified based on
advice from Attila Sipos.
The need to discuss dialog/session
timers was pointed out by Rifaat
Shekh-Yusef.
Robert Sparks clarified the purpose of the "Best Current Practice" status,
leading to revising the intended status of this document to
"Informational".
(Note to RFC Editor: Please remove this entire section upon
publication as an RFC.)
Removed the original "Example Message Flow" and promoted the
"Alternative Example Message Flow" to replace it because of a number
of flaws that were found during the discussion of -00 on the SIPPING
mailing list.
Described the use of the sip.rendering feature parameter to indicate
on-hold status.
Added discussion of passing through re-INVITEs and UPDATEs.
Added discussion of payload type numbers.
Added Acknowledgments section.
Added a section showing the importance of the
offerer always including all supported media formats.
Updated references.
Revised handling of payload type numbers when passing offer to MOH
server, based on observations by Paul Kyzivat.
Added a section discussing handling of re-INVITEs by
B2BUAs when using this method.
Added "avoidance of out-of-dialog REFER" as an advantage.
Added "automaton", "sip.rendering", and "sip.byeless" feature tags to
the Contact URI of the Music Server in the examples.
Added initial discussion of dialog/session timer support.
Revised handling of payload type numbers based on further observations
by Paul Kyzivat.
Changed references to "SPIT" to refer to "media security", per
suggestion by Scott Lawrence.
Removed reference to the idea of having the executing UA not maintain
session timers itself, but rather, passing through session timer
negotiation and updates.
Examination showed this idea to be much more complex to implement than
having the executing UA terminate session timers itself for both
dialogs.
(Suggested by Rifaat Shekh-Yusef.)
On advice from Robert Sparks, changed the "intended status" from "BCP"
to "Informational", and added a section to explain the change.
Noted that the rule on not reusing payload type numbers is undesirable because
it complicates third-party operations (as noted by
Paul Kyzivat).
Updated author's contact information.
On suggestion from John Elwell, added mention that the Music
Source's SDP address/port implies its RTCP address/port, which will
be used to receive RTCP.
Updated references to and
to specify the sections of
documents, which are the ones that discuss music on hold.
An Offer/Answer Model with the Session Description Protocol (SDP)
SDP: Session Description Protocol
SIP: Session Initiation Protocol
Session Timers in the Session Initiation Protocol (SIP)
An INVITE-Initiated Dialog Event Package for the Session Initiation Protocol (SIP)
Subject: [Sipping] RE: I-D Action:draft-worley-service-example-00.txt
Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols
Subject: Re: [Sipping] I-D ACTION:draft-ietf-sipping-service-examples-11.txt
[Sip-implementors] draft-worley-service-example-02
[Sip-implementors] draft-worley-service-example-02
SIP (Session Initiation Protocol) Usage of the Offer/Answer Model
Session Initiation Protocol Service Examples
Session Initiation Protocol Service Examples
[sipcore] draft-worley-service-example-03
RE: [Sip-implementors] draft-worley-service-example-02
Indicating User Agent Capabilities in the Session Initiation Protocol (SIP)
Subject: Re: [Sipping] I-D ACTION:draft-ietf-sipping-service-examples-11.txt