Alan Duric <alan.duric at telio.no> writes:
See answers inline (if anything more is needed or left unclear, let me
know).
Thinking about it, there's another issue with the a=fmtp:NN mode=YY
stuff to watch out for in implementation within SIP:
1) SIP INVITE:
...
m=audio 12345/1 RTP/AVP 97
a=rtpmap:97 iLBC/8000
a=fmtp:97 mode=20
2) SIP 200 OK:
...
m=audio 54321/1 RTP/AVP 97
a=rtpmap:97 iLBC/8000
a=fmtp:97 mode=30
Ok, by the spec bidirectionally both sides should use 30ms. Fine.
However, also according to SIP, the caller needs to be able and ready
to receive audio on 12345 as soon as they send the INVITE (assuming the
stream isn't inactive). The 200 OK can be delayed or lost and
considerable traffic exchanged before then.
So, how is the initiator supposed to know what packetization to assume
on reception? I assume the initiator must be prepared to receive and
play either and if need be "sniff" the data within the packets to
determine if they're 20ms or 30ms. The initiator won't send data until
it sees 200 OK (of course). The initiator can't use the size alone
since an RTP packet might contain 2 frames of 30ms or 3 of 20ms. Or the
initiator could ignore incoming packets until it sees an OK (which of
course is non-optimal).
This is the sort of thing it would be good to warn implementors about.
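As a rough sketch of the "sniffing" idea, and not something the RFC
prescribes: assuming the frame sizes given in RFC 3951 (38 bytes per
20 ms frame, 50 bytes per 30 ms frame), a receiver could check which
frame size evenly divides the RTP payload length. The function name
candidate_modes is made up for illustration, and it returns a set so
the caller can handle a length that fits neither mode or both.

```python
# Frame sizes per RFC 3951: 304 bits (38 bytes) for a 20 ms frame,
# 400 bits (50 bytes) for a 30 ms frame.
ILBC_FRAME_BYTES = {20: 38, 30: 50}


def candidate_modes(payload_len):
    """Return the set of iLBC modes whose frame size divides payload_len.

    Hypothetical helper: an empty set means the payload matches neither
    mode; a set with both modes means length alone is not conclusive.
    """
    if payload_len <= 0:
        return set()
    return {
        mode
        for mode, frame_bytes in ILBC_FRAME_BYTES.items()
        if payload_len % frame_bytes == 0
    }
```

A caller would still need a fallback (for instance, dropping or
buffering the packet) whenever the result is not exactly one mode.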
RFC 3952 (RTP format for iLBC) is somewhat vague on what the default
packetization period used is (20 or 30ms). There are sentences that
imply it's 30ms, but it is not stated. Also, it's never stated what
mode is selected if the offer says mode=20 and the answer doesn't
specify a mode (or vice versa).
From the RFC (somewhat unclearly worded):
   If 20 ms frame size mode is used, remote iLBC encoder SHALL receive
   "mode" parameter in the SDP "a=fmtp" attribute by copying them
   directly from the MIME media type string as a semicolon separated
   with parameter=value, where parameter is "mode", and values can be 0
   and 20 (where 0 is reserved and 20 stands for preferred 20 ms frame
   size). An example of the media representation in SDP for describing
   iLBC when 20 ms frame size mode is used might be:

   m=audio 49120 RTP/AVP 97
   a=rtpmap:97 iLBC/8000
   a=fmtp:97 mode=20
This implies that the default is 30ms but doesn't state it. (The rest
of the RFC is agnostic either way.)
   It is important to emphasize the bi-directional character of the
   "mode" parameter - both sides of a bi-directional session MUST use
   the same "mode" value.

   The offer contains the preferred mode of the offerer. The answerer
   may agree to that mode by including the same mode in the answer, or
   may include a different mode. The resulting mode used by both
   parties SHALL be the lower of the bandwidth modes in the offer and
   answer.

   That is, an offer of "mode=20" receiving an answer of "mode=30" will
   result in "mode=30" being used by both participants. Similarly, an
   offer of "mode=30" and an answer of "mode=20" will result in
   "mode=30" being used by both participants.
What if there is no mode in the response?
Then only 30 ms mode is supported on the far end (it is a MUST to use
"mode" in order to have 20 ms frames in the established call; if
someone does not use it, that implies they do not support it) and thus
30 ms will be selected.
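
To make the resulting rule concrete, here is a minimal sketch of how
an implementation might resolve the mode from an offer and an answer.
It assumes the reading given above (a missing "mode" means 30 ms) and
additionally assumes that any value other than 20 is treated as 30;
the function names fmtp_mode and negotiated_mode are hypothetical.

```python
DEFAULT_MODE = 30  # assumed when "mode" is absent, per the answer above


def fmtp_mode(sdp, payload_type):
    """Return the iLBC mode (20 or 30) from the a=fmtp line for payload_type.

    Assumption for this sketch: a missing a=fmtp line, a missing "mode"
    parameter, or any value other than 20 all fall back to 30 ms.
    """
    prefix = "a=fmtp:%d " % payload_type
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Parameters are semicolon-separated name=value pairs.
        for param in line[len(prefix):].split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip() == "mode" and value.strip() == "20":
                return 20
    return DEFAULT_MODE


def negotiated_mode(offer_mode, answer_mode):
    """Pick the lower-bandwidth mode; 30 ms wins unless both sides say 20."""
    return max(offer_mode, answer_mode)


offer = "m=audio 12345 RTP/AVP 97\na=rtpmap:97 iLBC/8000\na=fmtp:97 mode=20"
answer = "m=audio 54321 RTP/AVP 97\na=rtpmap:97 iLBC/8000"
mode = negotiated_mode(fmtp_mode(offer, 97), fmtp_mode(answer, 97))
```

With the offer and answer shown, the answer carries no "mode", so both
sides would end up using 30 ms frames.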
--
Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team
rjesup at wgate.com
_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www1.ietf.org/mailman/listinfo/avt