
[AVT] Comments on draft-ietf-avt-rtp-3gpp-timed-text-12.txt



Hi,

I have reviewed the draft and have mostly found editorial issues. I have a few minor technical issues to discuss, but otherwise I think the draft is in good shape.

Technical issues.

T1. Section 4, timestamp definition. It references Section 4.5 for details on how to calculate the timestamp. However, Section 4.5 does not seem to contain normative text on how to calculate it; it only contains some examples. Could you please provide complete normative text for this?

T2. Section 1, third paragraph, "MIME type usage". I would suggest that we stop calling it MIME and instead only use the name "Media Type". Multipurpose Internet Mail Extensions (MIME) is one of the users of the media type registry, while SDP is another user of the registered types. So please remove all "MIME" references and replace them with "media type".

T3. Section 4.1.2: page 18, second paragraph:

" Furthermore, only sample descriptions (TYPE 5 units) MAY follow units of unknown duration in the same aggregate payload. Otherwise, it would not be possible to calculate the timestamp of these other units."

With the currently proposed rules, a TYPE 5 unit following an unbounded (unknown duration) TYPE 1 unit seems to lack a definition of how its timestamp (TS) is derived. The text in Section 4:

"       TYPE 5 units receive their timestamp from the first non-TYPE 5
        unit following them in the payload or from the RTP packet header
        itself, if there are only TYPE 5 unit(s) or if one or several
        TYPE 5 units follow a sample of unknown duration (see Section
        4.1.2, SDUR definition).  If there are no non-TYPE 5 units that
        follow, the timestamp of the sample description is calculated in
        the usual way, i.e. by adding sample duration and timestamp
        value of the last unit encountered (see case a) in Figure 10).
        Finally, note that for TYPE 5 units, the timestamp actually does
        not represent the instant when they are played out, but instead
        the instant at which they become available for use."

The above sentences do not define how the second exception case mentioned in the first sentence is to be handled.

T4. Section 4.1.3, page 19, second paragraph in the "SLEN" bullet:

     If several TYPE 2 units having the same timestamp but different
     SLEN are received, they MUST be discarded: a fragment of a text
     sample has always a size value that does not change during
     transmission.

What does "they" refer to in the above sentence? Is it all units received with this timestamp, the offending unit, or something else?
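
To make the ambiguity concrete, here is a minimal Python sketch of two possible readings of "they MUST be discarded". Both functions and the (timestamp, SLEN) tuple representation are hypothetical illustrations only; neither is taken from the draft, and the draft may intend something else entirely.

```python
from collections import defaultdict

# Each TYPE 2 unit is modelled as a (timestamp, slen) tuple.
# This representation is an assumption made for illustration only.

def discard_all(units):
    """Reading A: drop every unit whose timestamp was seen with more than one SLEN."""
    slens_by_ts = defaultdict(set)
    for ts, slen in units:
        slens_by_ts[ts].add(slen)
    return [(ts, slen) for ts, slen in units if len(slens_by_ts[ts]) == 1]

def discard_offending(units):
    """Reading B: keep units that match the first SLEN seen for a timestamp, drop the rest."""
    first_slen = {}
    kept = []
    for ts, slen in units:
        # setdefault records the first SLEN seen for this timestamp
        if first_slen.setdefault(ts, slen) == slen:
            kept.append((ts, slen))
    return kept

received = [(1000, 200), (1000, 200), (1000, 180), (2000, 50)]
remaining_a = discard_all(received)
remaining_b = discard_offending(received)
```

With this input, reading A keeps only the unit at timestamp 2000, while reading B also keeps the two consistent fragments at timestamp 1000. The two readings give a receiver noticeably different recovery behaviour, which is why the text should say which one is meant.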

T5. Section 4.1.6.1, page 25, informative note:

"       Informative note: note that it is allowed to send any value of
        SIDX=X in the interval [0,127].  E.g. if [64..127] is the
        current active set and 65 is sent a new sample description is
        defined and an old one deleted (64).  Similarly one could send
        X=127, thus inverting the active and inactive sets."

I think there is an error in this text regarding which unit needs to be sent to remove SIDX 64 from the active set. The second sentence is also hard to understand. To my understanding, it is SIDX 0 that must be received to make the active set 0 and 65-127. If 65 is received, it only follows the duplicate rule, as it is within the set.
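
My reading can be sketched in a few lines of Python. This is only a model of my understanding, not of the draft text: it assumes the active set is a window of 64 consecutive indices modulo 128, and that receiving an index outside the window slides the window so that this index becomes its newest member. The names `active_set` and `receive_sidx` are hypothetical.

```python
WINDOW = 64     # number of active dynamic sample description indices (assumed)
MODULUS = 128   # SIDX values lie in [0, 127]

def active_set(newest):
    """Return the assumed active set: the 64 indices ending at 'newest', modulo 128."""
    return {(newest - k) % MODULUS for k in range(WINDOW)}

def receive_sidx(newest, x):
    """Process a received SIDX=x; return (new newest index, classification)."""
    if x in active_set(newest):
        # Already inside the window: treated as a duplicate, window unchanged.
        return newest, "duplicate"
    # Outside the window: x becomes the newest index and the window slides.
    return x, "new"

start = 127                              # active set is [64..127]
after_65 = receive_sidx(start, 65)       # 65 is already in the set
after_0 = receive_sidx(start, 0)         # 0 is outside the set
set_after_0 = active_set(after_0[0])
```

Under these assumptions, receiving 65 leaves the set [64..127] untouched, and only receiving 0 yields the set 0 and 65-127, with 64 dropped.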

T6. Section 4.3, page 29:

"  o If there is some bitrate and free space in the payload available,
     sample descriptions (if at hand) SHOULD be aggregated.  Sample
     descriptions (TYPE 5 units) MAY be placed anywhere in an aggregate
     payload, since the sample index (SIDX) is used to associate them
     to their text samples (explained in Section 4.2)."

The second sentence is in error. A sample description cannot be placed at an arbitrary point, because its TS value differs depending on its location.

T7. Section 4.3, Page 30:

"   o An additional requirement when fragmenting text samples is that
     the start of the modifiers MUST be indicated using the payload
     header defined for that purpose, i.e. a TYPE 3 unit MUST be used
     (see Section 4.1.4).  Otherwise, if packets are lost, a client may
     be unable to identify where the modifiers start and the text ends
     or whether either text strings or modifiers were received
     completely or not."

The sentence starting with "Otherwise, ..." is in error. This procedure still does not fully guarantee that the border can be detected. It does, however, enable the detection as long as only a single loss occurs. Thus I would propose to correct this sentence.

T8. A question on section 4.7:

Is there a relevant object identifier that should be used in RFC 3640 media type signalling to identify Timed Text content? If so, wouldn't it be good to have a reference to this?

T9. Section 5, first paragraph:

"  Apart from the basic fragmentation guidelines described in the
   section above, the simplest option for packet loss resilient
   transport is packet repetition.  A variant of packet repetition would
   be data carousel transmission, where data packets are sent in
   periodic cycles."

I don't think the usage of "carousel" is that appropriate. It is an unclear term that people interpret quite differently. In my definition, a carousel is something that transmits everything and then restarts from the beginning. This is clearly not what you do here. This is either a window-based repeat function or simply a repeating mechanism, depending on whether one aggregates the repeated packets with new ones or not.

T10. Section 7.2:

Does the position defined in SMIL relate to the corner of the text track, or is it the point from which TX and TY are applied? I think that needs to be clarified.


T11. Section 7.3 tx, ty bullet: "Therefore, only the first 16 bits are used in the payload header."

and

"        o width, height: they also have the same name in the box and
          the payload header.  All (unsigned) 32 bits are meaningful."


Is this really correct? Aren't tx, ty, height and width only expressed in the media type parameters? If so, they can't really be used in the payload header.


T12. Section 9.2.1:
"     o Text track (area) dimensions, "height" and "width": in the case
        of sendonly (sendrecv) offers, an answerer accepting the offer
        MUST be prepared to render (and send) the stream with the same
        exact values.  If any of these conditions are not met, the
        stream MUST be removed or the session rejected."

Isn't the text within the parentheses wrong? In a "sendrecv" offer, the answerer does not need to send with the parameters that the offer provides at all. The answerer can choose its own, as this is a declarative stream property parameter. I would suggest simply removing the two parentheses.





Editorial things
----------------

E1. The page length on the first 3 pages is not consistent with the rest of the draft.

E2. Section 1, paragraph 3. "Section 8 registers the MIME type usage.": In my eyes the formulation doesn't look right. I think it should read "Section 8 defines the media type." The request to register it is part of Section 10. The usage rules in SDP for the defined media type and its parameters are in Section 9.


E3. Section 2.3, bullet 3a:
"If sample descriptions are needed in the course of a session, these may be sent also out-of-band or in-band." I would suggest adding "further" as the second word in the above sentence.


E4. Section 2.5, bullet 2:
"Instead, it is
        recommended that some more overhead be invested to provide full
        error correction by protecting the less text sample fragments
        using the measures outlined in Section 5. "

Something is wrong with "the less"; should it read "at least"?

E5. Section 2.5, bullet 5:
"For this reason the fields SIDX
        and SDUR are swapped in TYPE 1 unit. "

Compared to what are they swapped? I would suggest to change this to:
"For this reason the fields SIDX and SDUR are swapped in TYPE 1 unit compared to the other units."


E6. Section 3, text strings definition:
"When using this payload format, the text string does contain any byte order mark (BOM)." I think there is a missing "not" before "contain".



E7. Page 10. "track / stream" is hanging alone on this page. Please go through the draft and adjust these things for better readability in the next version.


E8. Section 4, page 10:
"       Timestamp clockrates MUST be signaled by out-of-band means at
        session setup, e.g. using the "rate" attribute in SDP.  See
        Section 9 for details."

In my opinion the sentence should be changed to

"Timestamp clockrates MUST be signaled by out-of-band means at session setup, e.g. using the media type "rate" parameter in SDP. See Section 9 for details."

E9. Section 4.1.3, page 20, first example.

         "If lower delay and higher redundancy is required, a choice
          could be that the encoder 'collects' text every second; this
          yields text samples (TYPE 1 units) of 68 bytes, TYPE 1 header
          included.  Taking a smaller delay of 3s, three contiguous
          text samples could be aggregated in one RTP payload: the
          current and last two text samples."

I don't think "smaller" in the second sentence is correct. I interpret it as a comparison with the 1 second delay, and a delay of 3s is larger than that, not smaller.

E10. Section 4.1.5: Last paragraph:

"Regarding the SDUR field and the absence of the SLEN and SIDX fields,
   the same reasoning as for TYPE 3 applies."

Can this language be tightened up? I think separating the normative part and the informative part would be good:

"The SDUR field is defined as in TYPE 1. The reasoning behind the absence of SLEN and SIDX is the same as in TYPE 3 units."

E11. Section 4.2, page 26, SDUR bullet, last line:
Extra space in "SDUR= SDUR1+SDUR2".

E12. Page 27:
               "a) The total number of indices used is greater than the
                number of indices available, i. e., for static ones more
                than 127 and for dynamic ones more than 64 or,  "

This sentence is hard to parse. I would suggest to change it to:

"a) The total number of indices used is greater than the
number of indices available, i. e., if the static sample descriptions are more than 127, or the dynamic ones are more than 64 or, "


E13. Section 5, Page 40:

"  A server MAY decide to use repetition as a measure for packet loss
   resilience.  Thereby, a server MAY send the same RTP packet payloads
   or just parts of it, i.e. single units."

The second sentence is a bit strange. At a minimum the "s" in "payloads" should be removed, or "it" is wrong.

I propose:

"  A server MAY decide to use repetition as a measure for packet loss
   resilience.  Thereby, a server MAY send the same RTP payloads
   or just some of the units from the payloads."


E14. Section 8, section title:

I would suggest removing 8.1 completely and instead placing everything in 8. There is nothing in this section other than the media type definition.

Thus my proposal for section title is:
"8. 3GPP Timed Text media type"



Cheers

Magnus Westerlund

Multimedia Technologies, Ericsson Research EAB/TVA/A
----------------------------------------------------------------------
Ericsson AB                | Phone +46 8 4048287
Torshamsgatan 23           | Fax   +46 8 7575550
S-164 80 Stockholm, Sweden | mailto: magnus.westerlund at ericsson.com

_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www1.ietf.org/mailman/listinfo/avt