
[AVT] Comments on draft-ietf-avt-rtp-midi-format-11



Hi,

I have reviewed the RTP payload format for MIDI as part of the WGLC request. Sorry that it took so long; scraping together the 8+ hours needed for the review was not easy.

I think the IANA registration levels and the use of media types for rinit need to be discussed. The other comments should be more straightforward corrections. However, please remember that these are my opinions and should be challenged when you do not agree with the motivation.

Comments in order of appearance within the draft.

1. Section 3.2, page 18, second last paragraph:
"The RECOMMENDED receiver reaction to a cancellation depends on the
capabilities of the receiver."

To my reading this is not a normative "RECOMMENDED"; it appears to be an ordinary, informative "recommended".

2. Section 3.2, page 20:
"[1] reserves the undefined System Common commands 0xF4 and 0xF5 and the
undefined System Real-time commands 0xF9 and 0xFD for future use.  By
default, undefined commands MUST NOT appear in a MIDI Command field in
the MIDI list, with the exception of the 0xF5 octets used to code the
"dropped 0xF7" construction and the 0xF4 octets used by SysEx "cancel"
sublists."

So the MMA people reviewing the format are fine with this usage of 0xF4 and 0xF5?

4. Section 7, bullet "rendering":
"  o  Rendering.  The payload format may be extended to support
     new MIDI renderers (Appendix C.6.2).  The extension
     mechanism uses the normal media type registration process [26].
     Certain general aspects of the RTP MIDI rendering process may
     also be extended, via IETF standards-track documents that
     define new token values for the render (Appendix C.6) and
     smf_info (Appendix C.6.4.1) parameters."

Reviewing this and the later Appendix explaining the parameter made me react. I don't think the media type registry is the right registry for renderers. I will explain why, referencing text further into the specification, especially C.6.2 (the rinit parameter).

As I read C.6.2, the rinit parameter primarily encodes which type of renderer initialization information is needed. This is not only a matter of which file format the information is provided in; it also concerns the rules for the renderer, and so has a wider semantic meaning than simply identifying the format of the initialization information included in the data part.

If the above statement is true, I would recommend splitting rinit into several parameters. One parameter would indicate the complete rules for rinit and take its values from a MIDI-specific registry. A media type may then also be included to identify the format of the initialization data that is included inline or by reference.

Comments on this?

5. B.5.1, page 85, last paragraph:
"The FIRST field (present if F = 1) encodes a variable-length unsigned
integer value that sets the coverage of the DATA field."
What is meant here by "coverage of the DATA field"?

6. Appendix C: The use of normative language does not appear fully appropriate in the following sentences:

"The parameters listed above may optionally appear in session
descriptions of RTP MIDI streams; none are REQUIRED to appear."

"In Appendix C.7, we define REQUIRED RTP MIDI
features for two classes of applications: content-streaming using RTSP
(Appendix C.7.1) and network musical performance using SIP (Appendix
C.7.2)."

I don't see these usages of REQUIRED as being in line with RFC 2119. In both instances the sentences appear to be more informative in nature: although they specify behavior, that behavior seems to be a clarification rather than something that is crucial for interoperability.

7. C.1: The rules for cm_used and cm_unused are missing one important clarification: is the set full or empty before the parameters are applied? My understanding is that if the first of the two to appear is cm_used, the set starts empty, and if it is cm_unused, the set starts full. Is that correct? If so, please clarify this in the text.
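
The interpretation described above can be sketched in a few lines of Python. This is only an illustration of the reading in this comment, not the draft's definition: the four-letter command universe, the function name, and the choice that "no parameters" means "everything in use" are all assumptions made for the example, and the real parameter syntax is richer than a plain string of letters.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical, simplified universe of command-type letters (illustration only).
UNIVERSE: FrozenSet[str] = frozenset("NCPW")


def resolve_command_set(
    params: Iterable[Tuple[str, str]],
    universe: FrozenSet[str] = UNIVERSE,
) -> Set[str]:
    """Apply cm_used / cm_unused parameters in order of appearance.

    Assumed rule (the one this comment asks the draft to confirm):
    if the first parameter is cm_used the set starts empty,
    if it is cm_unused the set starts full.
    """
    params = list(params)
    if not params:
        # Assumption: with neither parameter present, all commands are in use.
        return set(universe)

    first_name = params[0][0]
    current: Set[str] = set() if first_name == "cm_used" else set(universe)

    for name, letters in params:
        if name == "cm_used":
            current |= set(letters) & universe   # add the listed commands
        elif name == "cm_unused":
            current -= set(letters)              # remove the listed commands
        else:
            raise ValueError(f"unexpected parameter: {name}")
    return current
```

Under this reading, a lone cm_used=NC yields only N and C, while a lone cm_unused=N yields everything except N. Without a stated starting set, both results would be ambiguous.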

8. C.1, page 96, first sentence after example SDP.
"(The a=fmtp line has been wrapped to fit the page to accommodate
 memo formatting restrictions; it comprises a single line in SDP)"

This is actually not correct in this case, as the fmtp line is not wrapped.

9. Section C.6: Why are IETF standards-track documents needed for defining new renderers? To me, defining renderers is something to be expected to happen outside of the IETF, and we should make registration easy for other organizations. I think that "Specification Required" is a suitable level for this.

10. Section C.6, page 123:
"If a party is offered a session description
that uses a token value that is not known to the party, and if the party
needs to know about the renderer for correct operation, the party MUST
NOT accept the renderer."

How does the answering party determine whether it needs to know the renderer or not? Is there any indication encoded within all tokens to show whether that property exists?

11. Section 6.3, fourth paragraph:
"Two url parameters that appear in sequence, the first assigned to a TCP
HTTP URL and second assigned to a secure HTTP URL, MUST point to the
same data object.  A receiver MUST first access the secure HTTP URL to
fetch the data object if it is able to do so, and only use the TCP URL
if the secure HTTP URL does not work.  The appearance of a TCP and
secure HTTP URL pair indicates the acceptability of using the TCP URL.
If secure access to a URL is REQUIRED for an application, a url
parameter with a TCP HTTP URL value MUST NOT be present."

I think this specification has the potential to create some issues. First of all, guaranteeing that both URLs point to the same object is difficult and can easily be messed up by mistake. Also, the second MUST is rather toothless: it amounts to no more than a recommendation, especially as there are no rules for which failure reasons are acceptable.

As I see it, if security is not deemed to be required (local render data, etc.), then one uses an unsecured HTTP URL. If security is required, then one includes a secure URL. There is never any reason to include both a secure URL and an unsecured one, as this doesn't provide any better security than having only the unsecured one.
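
To make the concern concrete, here is a rough Python sketch of the receiver behavior the quoted text seems to describe: try the secure URL first, then fall back. The function name and the injected fetch callback are hypothetical, and "does not work" is modelled simply as the callback returning None, since the draft gives no rules for acceptable failure reasons.

```python
from typing import Callable, Optional, Sequence, Tuple


def fetch_with_fallback(
    urls: Sequence[str],
    fetch: Callable[[str], Optional[bytes]],
) -> Optional[Tuple[str, bytes]]:
    """Try https URLs before http ones; return the first (url, data) that works.

    `fetch` is a caller-supplied function (hypothetical here) that returns
    the object's bytes, or None when the URL "does not work".
    """
    # sorted() is stable, and False sorts before True, so https URLs come first.
    ordered = sorted(urls, key=lambda u: not u.lower().startswith("https://"))
    for url in ordered:
        data = fetch(url)
        if data is not None:
            return url, data
    return None
```

Anyone able to make the secure fetch fail (for example by blocking it) causes the receiver to use the unsecured URL. The pair is therefore no more secure than the unsecured URL alone.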

12. Section C.6.4.1: IANA registration level of the "smf_info" parameter.
Here too I think that requiring an IETF specification is too strong a requirement.

13. Section C.6.4.2, Page 129 first paragraph:
"If the associated RTP MIDI streams also form an ordered
relationship, the first SMF is merged with the first name space of the
relationship, the second SMF is merged to the second name space of the
relationship, etc."

How is "first" defined for name spaces? What should be used to determine the ordering needed to perform the action this sentence requires?

14. Section C.7.1:

It seems that this section is written under the assumption that an SDP in an RTSP context can be reduced from a complex configuration to a simpler one by the client itself. That is not true, as an RTSP client needs to accept the SDP as it is or refuse the session. There are several ways around this problem; the most common is that the client identifies itself to the server in the DESCRIBE request. This allows the server to tailor the content and description to the client. In the 3GPP Packet-switched Streaming Service this is done by providing a URL to the UAProf profile describing the client's capabilities.

Thus the following paragraph is in error:
"Other session descriptions for a performance
SHOULD also be provided, to offer enhanced support for full-featured
clients (just as Internet Radio services may offer the same content
using several audio codecs at several bandwidths)."

15. Section C.7.2: Page 138:
"Parties MUST be able to implement the semantics of the
guardtime parameter, for times from 0 ms to 5000 ms."

I have a problem with requiring support for a guardtime of 0 ms, as this requires infinite bandwidth. Can we please select some more reasonable value here? Considering that network jitter can be many ms, too low a value does not seem to provide a benefit in a best-effort environment. I do understand that there are certain use cases that require tighter bounds. But as a requirement for a basic application, I think a lowest guardtime of 5 or 10 ms should be sufficient.
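
The bandwidth argument can be illustrated with some simple arithmetic in Python. This is a sketch under a stated assumption, namely that guardtime acts as an upper bound on the spacing between packets, so that the worst-case packet rate is 1/guardtime. That is the reading implied by this comment, not a quotation of the draft. The function names and the 60-byte packet size are hypothetical.

```python
import math


def min_packet_rate(guardtime_ms: float) -> float:
    """Worst-case packets per second if packets may be at most guardtime apart."""
    if guardtime_ms < 0:
        raise ValueError("guardtime must be non-negative")
    if guardtime_ms == 0:
        return math.inf  # zero spacing means an unbounded packet rate
    return 1000.0 / guardtime_ms


def min_bandwidth_bps(guardtime_ms: float, packet_bytes: int = 60) -> float:
    """Worst-case bit rate for a hypothetical fixed packet size (headers included)."""
    return min_packet_rate(guardtime_ms) * packet_bytes * 8
```

With these assumptions, a guardtime of 5 ms gives 200 packets per second and 10 ms gives 100 packets per second. The rate has no finite bound as guardtime approaches 0 ms.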

16. Section H.1:

Please reference draft-freed-media-type-reg-05 and RFC 3555 to indicate that you are using these registration rules.

17. Section H.1, Encoding considerations:
This should indicate that the format is "framed" and "binary". The statement that it is only defined for transport over RTP does not belong here; it is instead indicated in the "Restrictions on usage" section.


18. Section H.2: Same as (17). In addition, there is an error here: either the registry rules shouldn't be duplicated, or they should be correctly referenced to H.2.1.

Cheers

Magnus

--

Magnus Westerlund

Multimedia Technologies, Ericsson Research EAB/TVA/A
----------------------------------------------------------------------
Ericsson AB                | Phone +46 8 4048287
Torshamsgatan 23           | Fax   +46 8 7575550
S-164 80 Stockholm, Sweden | mailto: magnus.westerlund at ericsson.com

_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www1.ietf.org/mailman/listinfo/avt