
[AVT] RE: Comments on draft-ietf-avt-rtp-3gpp-timed-text-04.txt



Hi,

> -----Original Message-----
> From: Magnus Westerlund [mailto:magnus.westerlund at ericsson.com]
> Sent: Tuesday, July 27, 2004 4:16 PM
> To: Jose Rey
> Cc: Dave Singer; Jan van der Meer; IETF AVT WG;
> matsui.yoshinori at jp.panasonic.com
> Subject: Re: Comments on draft-ietf-avt-rtp-3gpp-timed-text-04.txt
>
>
> Hi Jose,
>
> Comments inline:
>
> Jose Rey wrote:
> >>>>4. Section 2.3.4: I think the tables of possible combinations to
> [snip]
> >
> > I would expect that both text strings and modifiers could get big, as
> > indicated in the draft.
> >
> > Your comments above are legitimate in any case; however, I assume in the
> > draft that the fragmentation rate is so low that it is not worth doing
> > this.
> >
> > The impact of including a timestamp offset would not be very big, but I
> > think this is not worth the effort.
> >
>
> Okay, as long as there is valid reasoning behind this, I don't see a
> problem with leaving it as it is.
>
Okay.

> > I only count the interarrival jitter as affected by the low resolution.
> > Is there anything else?
>
> Yes, I have also located the RFC 3611, RTCP XR measurement for RTP
> arrival timestamping that has this property.
>

Good point! I'll scan through that RFC.

> >>
> >>Okay, we are on different pages. My first question regarding the
> >>definition of a start value seems to be the crux here.
> >>
> >>If a start value is the first SIDX used, and there exists a restriction
> >>that any subsequent SIDX value used must be higher, then your comment
> >>holds.
> >
> >
> > This is the case. Thus no problem.
> >
> >
> >>However if there exist no restrictions on how you use the SIDX
> >>values, then it is not a problem.
> >
> >
> > ?? Both are not a problem? I'm lost...
>
> Then there is no need to have a start value at all. And the attempt to
> define a start value would be meaningless.
>
> However, you say that there is a need for a start value. This is because
> the SIDX must increase monotonically due to restrictions in the 3GPP TS.
> Therefore I would like to have some further clarification on what a
> start value actually is. To my understanding it is the SIDX value that
> represents the first sample description that is used. And then, as each
> new sample description is used, it must have a higher SIDX.
>

There is no restriction in the TS saying that the SIDX shall increase
monotonically.  Static SIDXs are something defined for RTP streaming.
The SIDX is exactly what you define above: a monotonically increasing
number with an initial value of 129 (start value).  Each new static SIDX
increases the SIDX by one unit.
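
To make the static SIDX numbering concrete, here is a minimal sketch.  The
class and method names are made up for illustration, and reusing the same
SIDX for a sample description that was already seen is my assumption.  No
upper bound is enforced, since none is discussed in this thread.

```python
STATIC_SIDX_START = 129  # start value for static SIDXs, as stated above


class StaticSidxAllocator:
    """Hands out static SIDX values, increasing by one per new description."""

    def __init__(self, start=STATIC_SIDX_START):
        self._next = start
        self._by_description = {}

    def sidx_for(self, description: bytes) -> int:
        # Assumption: a description seen before keeps its earlier SIDX.
        if description not in self._by_description:
            self._by_description[description] = self._next
            self._next += 1
        return self._by_description[description]


allocator = StaticSidxAllocator()
first = allocator.sidx_for(b"sample description A")
second = allocator.sidx_for(b"sample description B")
print(first, second)
```

The first description gets the start value and each new one gets the next
higher number, so the sequence of newly assigned values never decreases.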

>
> >>
> >>>>19. Section 8.2.1:
> >>>>  "This means that an offerer using these
> >>>>   parameters only specifies which values are going to be used for
> >>>>   the sent stream."
> >>>>
> >>>>I hope you are aware that you are changing the direction of how
> >>>>parameters normally work in offer-answer.
> >>>
> >>>
> >>>I am not sure what you mean.  This may need further discussion
> >>>depending on the answer to the one below.
> >>>
> >>
> >>In a sendrecv case, when the offerer offers a stream, the actual
> >>parameter values he provides are what he accepts to receive.
> >
> >
> > OK, let's see:
> >
> > Is the problem you are referring to in the following sentence (Section
> > 8.2.1): "The answerer MAY include these parameters or not in its
> > answer.  If included, the values MAY be different."?
>
> The problem is that, in Offer/Answer, any declarative parameter normally
> applies only to the stream that the declaring entity is going to
> receive, while the parameters present in the MIME registration need to
> apply in the other direction.
>
>
> > As I understand it, you mean that if the offerer 'offers' a sendrecv
> > stream and the answerer accepts it, the answerer must either include
> > the parameters verbatim in the answer or else remove the stream?
> >
> > However, if the stream offered is recvonly, the answerer may either
> > change the parameters (downgraded or equal) or else remove the stream?
> >
> > This interpretation clashes with some past discussion
> > (http://www1.ietf.org/mail-archive/web/avt/current/msg03364.html) where
> > declarative is meant as something that both may use with different
> > values.  That may be the problem.
> >
>
> No, I am not talking about whether they are declarative or negotiated.
> In fact there is no difference between "sendrecv" and "recvonly" in an
> offer when it comes to declarative parameters. In both stream types,
> the declarative parameters apply to the stream that will be incoming to
> the offerer.

This is getting fuzzy...

I think that 'declarative' is not the best word to use, as there seem to
be different definitions of it or, at least, a wider one.  I have the
feeling that one cannot use terms like "media format configuration
parameters", "media stream parameters" and "capability parameters" either
for those that are symmetric or downgradable (relating to the stream and to
the actual 'physical capabilities' of the offerer/receiver, respectively).

So I'll speak about symmetric and downgradable, which is more general.

Let's see:

tx,ty,layer,tx3g,height,width

- if sendrecv/recvonly: the parameters express the values the offerer wishes
to have for the incoming stream.  At the same time, these are the values the
offerer will use (for the O->A stream if sendrecv) if the stream is accepted
by the answerer.  In this case, for simplicity, the parameters preferred by
the offerer cannot be changed by the answerer if he accepts the stream (thus
symmetric use), since it is not guaranteed that the modified values are
supported by the offerer, right?

For interoperability, the offer must be constructed in such a way that the
parameter values cover the widest possible audience of answerers, so as to
avoid a rejected session.

- if sendonly: this is tricky... The problem is that the receiver's display
possibilities are unknown at the time of composing the offer and, *at the
same time*, not all parameters can be changed by the answerer AND/OR the
parameter ranges cannot be expressed in the offer (e.g. you cannot express
all supported tx3g combinations).  So, the solution would be that tx, ty and
layer shall indicate maximum values, while tx3g, height and width shall be
used symmetrically because, e.g., it may be a problem to change the h/w for
a given font size.

For interoperability, if different text box sizes are supported by the
offerer, then different sets of tx,ty,layer,tx3g,height,width parameters
with adjusted sample description values shall be present in the offer.  This
is the responsibility of the session creator.

spldesc: this indicates a requirement (sendrecv/recvonly) or a
property/limitation (sendonly) of the stream:
- if sendrecv/recvonly, this indicates whether the offerer *requires* to
receive a stream with out-of-band-only descriptions (i.e. a simple stream).
If present, it MUST be used symmetrically, no change.
- if sendonly, if set to "out", it indicates that the sender can only send
descriptions out-of-band.  MUST be used symmetrically, no change.

sver
- if sendrecv/recvonly or if sendonly, in both cases the parameter is
downgradable. No change.
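
To summarise the rules I propose above, here is a rough sketch of how an
answer could be checked against an offer.  This is only my reading of the
proposal, not text from the draft.  The function names are made up, values
are treated as plain integers or strings, "downgradable" is taken to mean
"equal to or lower than the offered value" (sver is simplified to a single
number), and rejecting an answered parameter that was not offered is an
assumption.

```python
SYMMETRIC = "symmetric"        # answer must repeat the offered value
DOWNGRADABLE = "downgradable"  # answer may be equal to or lower than the offer

DISPLAY_PARAMS = ("tx", "ty", "layer", "tx3g", "height", "width")


def rule_for(param, direction):
    """Return the rule proposed in this mail for one parameter."""
    if param == "sver":
        return DOWNGRADABLE
    if param == "spldesc":
        return SYMMETRIC
    if param in DISPLAY_PARAMS:
        # In a sendonly offer, tx, ty and layer indicate maximum values.
        if direction == "sendonly" and param in ("tx", "ty", "layer"):
            return DOWNGRADABLE
        return SYMMETRIC
    raise ValueError(f"unknown parameter: {param}")


def answer_is_acceptable(offer, answer, direction):
    """Check only the parameters the answerer chose to include."""
    for name, answered in answer.items():
        if name not in offer:
            return False  # assumption: answer may not add new parameters
        offered = offer[name]
        if rule_for(name, direction) == SYMMETRIC:
            if answered != offered:
                return False
        elif answered > offered:
            return False
    return True


offer = {"tx": 10, "ty": 20, "layer": 1, "width": 176, "height": 20, "sver": 60}
print(answer_is_acceptable(offer, {"tx": 5, "width": 176}, "sendonly"))
print(answer_is_acceptable(offer, {"tx": 5}, "sendrecv"))
```

In the sendonly case the lower tx is accepted because the offered value is
only a maximum; in the sendrecv case the same change is refused because the
display parameters are used symmetrically.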



>
> It is only the declarative parameters in a sendonly stream that are
> different: they provide preferences for how the offerer would like to
> send things. It is the answerer's declarative values that will decide
> how the offerer will need to send the stream to the answerer.
>

So the offerer MUST take care that the offered values are supported by the
widest audience possible, or else do as for tx,ty above, right?
> >
> >
> >
> >>For parameters that apply in the direction of what the declaring
> >>entity sends, this does not fit very well. For example, the offerer has
> >>to declare them prior to knowing what the answerer is accepting to
> >>receive.
> >>
> >
> >
> > Actually I thought that in a sendrecv stream, the answerer can use
> > different parameter values for the stream it sends from those for the
> > stream it receives (I am basing this on the email referenced)... that's
> > why I included the sentence in quotes above.  But as I understand from
> > your comments, this would mean a change in how O/A works, and an extra
> > message from offerer to receiver, right?
>
> The usage you are proposing would not change how O/A works, I think. It
> is a little confusing which exact case we are discussing. As stated
> above, the offerer and answerer can use different values for declarative
> parameters.

Not a little, it is very confusing ;)
>
> >
> > I think all of tx,ty,layer,tx3g and height,width are of the same type.
> > One could think of negotiating each of them differently, but since they
> > all refer to display, a change in one of them might also mean changes in
> > the others, e.g., if you change the position of the text track you have
> > to take care that the new position doesn't go off the screen, or else
> > reduce the size, etcetera.  I think it makes little sense to have 'fine
> > granularity' here.
> >
> >
>
> I think I can agree that they are linked together and need to be set
> consistently. However, the problem I am trying to explain is that the
> entity needing to set them is the sending entity rather than the
> receiving one.
>

Yes, I think (?)

> >
> >>I think it may in fact be a serious problem to not have a defined RTP
> >>timestamp rate.
> >
> >
> > The timestamp clockrate is always included in the offer and it MUST be
> > accepted or else the whole stream removed, where's the problem?
> >
>
> Yes, the clock rate is a negotiated parameter.  Let's assume an offer
> that looks like this:
>
> m=video x RTP/AVP 96
> a=rtpmap:96 3gpp-tt/50
> a=recvonly
> a=fmtp:96 ...
>
>
> Now it is the answerer that is going to send a stream to the offerer;
> however, its stream is stored in a file that assumes the clock rate is
> 1000 Hz. As the answerer must use the same value, he will either have to
> refuse the session or send an answer like the one below, which does not
> match his stream:
>
> m=video x RTP/AVP 96
> a=rtpmap:96 3gpp-tt/50
> a=sendonly
> a=fmtp:96 ...
>
>
> So the point is that, unless you have the possibility to rewrite the TS
> values, and SDUR fields you can't be certain that you can send a stream
> that is being negotiated in the offer/answer model.
>

In the case you address, there are two possibilities: the encoded text has
either a lower or a higher resolution.  If lower, this is clearly not a
problem, as the resolution is not lost when converting to higher clockrates.
The problem is if the original clockrate is higher than 1000 Hz.
What we can do is REQUIRE 1000 Hz and advise that resolution MAY be lost
when converting from high to low, although I really think that 1000 Hz is
good enough for capturing and synchronising speech... unless humans make
considerable developments in the spoken language ;)
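
To illustrate the conversion argument, here is a small sketch of rescaling
timestamps between clockrates.  The function name is made up, rounding to
the nearest tick is my assumption (the draft does not say how to round),
and the 90000 Hz source rate is just an example value.

```python
def rescale(ticks, src_hz, dst_hz):
    """Convert a tick count from src_hz to dst_hz, rounding to nearest tick."""
    return (ticks * dst_hz + src_hz // 2) // src_hz


# Low to high: a 50 Hz timestamp converted to 1000 Hz and back is unchanged.
up = rescale(3, 50, 1000)
back = rescale(up, 1000, 50)
print(up, back)

# High to low: two distinct 90000 Hz timestamps collapse to one 1000 Hz tick.
early = rescale(0, 90000, 1000)
late = rescale(44, 90000, 1000)
print(early, late)
```

The first pair survives the round trip, while the second pair becomes
indistinguishable at 1000 Hz, which is the loss of resolution that the
advice above would warn about.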

>
> If one looks at the 3GPP timed text parameters, one would need to have
> something like H.264, and define the following parameters:
>
>     rate: the RTP timestamp clockrate is equal to the clockrate of the
>          media.  If RTP packets are generated out of a 3GP file, the
>          clockrate of the text media MUST be copied from the 3GP file,
>          i.e. the clockrate is the value of "timescale" parameter in the
>          Media Header Box describing that text track.  Other tracks
>          (audio/video/text) in the 3GP file may have their own clockrates
>          as indicated in their corresponding Media Header Box.  For live
>          encoding, a clockrate of 1000 Hz is RECOMMENDED but other values
>          MAY be used.
>
>     sver=<Z1(x1*256+y1)>, <Z2(x2*256+y2)>, ..., <Zi(xi*256+yi)>,...
>          The parameter "sver" specifies the list of supported backwards-
>          compatible versions of the timed text format specification (3GPP
>          TS 26.245), which the
> **"receiver"** (instead of sender)

I don't understand this change.  The above comments:

> The problem is that, in Offer/Answer, any declarative parameter normally
> applies only to the stream that the declaring entity is going to
> receive, while the parameters present in the MIME registration need to
> apply in the other direction.
>
>
 don't change anything here since this must be negotiated...?



>          supports (or is willing to accept).
>          The first value is the current value used or the preferred
>          value.  This MAY be followed by a comma-separated list of
>          increasingly older versions that SHOULD be used as alternatives.
>          The order is meaningful, being first most preferred and last
>          least preferred.  Regarding the value calculation: "Zi" is the
>          number of the Release, "xi" and "yi" are taken from the 3GPP
>          specification version, i.e. vZi.xi.yi.  For example, for 3GPP TS
>          26.245 v6.0.0, Zi(xi*256+yi)=6(0), the version value is "60".
>
>          Note that "60" is the concatenation of the values Zi=6 and
>          (xi*256+yi)=0 and not its product.
>
>     sprop-sver= The version and compatible version of the stream
>          actually going to be sent.
>
>     sprop-width=<integer-value>, indicates the width in pixels of the text
>          track or area where the text is actually displayed.  This is a
>          16 bit integer.
>
>     maxwidth= The max display width in pixels that can be received.
>
>     sprop-height=<integer-value>, indicates the height in pixels of the
>          text track.  This is a 16 bit integer.
>
>     max-height= The max display height in pixels that can be received.
>
>     sprop-tx=<integer-value>, indicates the horizontal translation offset
>          in pixels of the text track with respect to the origin of the
>          video track.  This is a 16 bit integer.
>
>     sprop-ty=<integer-value>, indicates the vertical translation offset in
>          pixels of the text track.  This is a 16 bit integer.
>
>     sprop-layer=<integer-value>, indicates the proximity of the text
>          track to the viewer.  Higher values mean closer to the viewer.
>          This parameter has no units.  This is a 16 bit integer.
>
>     Optional parameters:
>
>     spldesc=<value> indicates the way the server sends the sample
>          descriptions.  This parameter MAY be absent, in which case
>          the value "both" is assumed.  In detail:
>
>          o "out": all sample descriptions are sent out-of-band, e.g. in
>             the SDP.  This may be used when the total number of sample
>             descriptions used is low.  This is useful, e.g., for those
>             clients that want to choose a simple text stream.
>
>          o "both": both in-band and out-of-band mechanisms MAY be
>             used.  Note that "spldesc=both" indicates that both in-band
>             and out-of-band sample descriptions MAY be sent for that
>             stream, and not that both are necessarily sent during a
>             session.  This is the default case.
>
>     sprop-tx3g=<base64-value-1>, <base64-value-2>,...
>          This parameter MUST be
>          used for conveying sample descriptions out-of-band.  The list of
>          sample entries MAY follow any particular order and it MAY be
>          empty.  The absence of this parameter is equivalent to an empty
>          list of sample descriptions.  The <base64-value-i> represents
>          the base64 encoding of the concatenation of the SIDX and the
>          sample description for that SIDX, in this order.  The format of
>          a sample description entry can be found in 3GPP TS 26.245
>          Release 6 and later releases.  All servers and clients MUST
>          understand this parameter and MUST be capable of using the
>          sample description(s) contained in it.  Please refer to RFC 3548
>          [6] for details on the base64 encoding.
>
>
> I don't know if there is any use for a tx3g parameter. If so, it would
> mean: here are the sample descriptions that you shall use to send me
> timed text.
>


I think that with the O/A usage I defined above this is not needed (?)... I
guess sprop- means sender properties.  Isn't it overkill to have such fine
granularity for the settings?  Instead, care shall be taken when composing
the session... I think...
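
As a side note on the "sver" value calculation quoted above, here is a small
sketch of how I read it.  The function names are made up, and joining the
list with a plain comma is my assumption about the exact separator.

```python
def sver_value(release, x, y):
    """Concatenate the release number Z with (x*256 + y) for version vZ.x.y."""
    return f"{release}{x * 256 + y}"


def sver_list(versions):
    """Build a comma-separated list, most preferred version first."""
    return ",".join(sver_value(z, x, y) for z, x, y in versions)


print(sver_value(6, 0, 0))               # 3GPP TS 26.245 v6.0.0
print(sver_list([(6, 1, 0), (6, 0, 0)]))
```

For v6.0.0 this yields "60", matching the example in the quoted text: the
two parts are concatenated as digits and not multiplied.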


Another issue:  I have been discussing with Jan and Dave offline.  There is
another possibility to combine TYPE2+TYPE3 fragments without adding any
fields.  As I said, this should happen very rarely and it is not worth
adding any new fields, but one can make an exception when the payload
contains fragments by saying that in that particular case, the timestamp
calculated for the first applies to all and no further timestamp calculation
shall be performed.  The SDUR is kept in TYPE3 and TYPE4 since they may
still be alone in RTP payloads...  It is a little exception that doesn't
require much change and takes care of it pretty elegantly.. what do you say?

Thanks,


Jose

>
> Cheers
>
> Magnus Westerlund
>
> Multimedia Technologies, Ericsson Research EAB/TVA/A
> ----------------------------------------------------------------------
> Ericsson AB                | Phone +46 8 4048287
> Torshamsgatan 23           | Fax   +46 8 7575550
> S-164 80 Stockholm, Sweden | mailto: magnus.westerlund at ericsson.com
>


