I have been selected as the General Area Review Team (Gen-ART) reviewer for this draft. For background on Gen-ART, please see the FAQ at < http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq> Please resolve these comments along with any other comments you may receive. Document: draft-ietf-speechsc-mrcpv2-24.txt Reviewer: Miguel Garcia Review Date: 2011-05-03 IETF LC End Date: 2011-04-13 Summary: The document is on the right track, but has some issues that should be addressed before publication as a standards track RFC. Major issues: none Minor issues: - Section 4.2 reads: To remain backwards compatible with conventional SDP usage, the format field of the m-line MUST have the arbitrarily-selected value of "1". The comment is that in other protocols, for example, MSRP [RFC4975] it has been selected to use an asterisk "*" in the format field. I wonder if it is possible to unify criteria, in order to allow usage of conventional SDP parsers. - Although at the general level the draft says that it reuses the SDP offer/answer model, I think it would be good to explicitly mention it, just explicitly indicate that "we aren't doing anything new here". For example, in Section 4.2, the text reads When the client wants to add a media processing resource to the session, it issues a SIP re-INVITE transaction. and I think it would be better to say: When the client wants to add a media processing resource to the session, it issues a new SDP offer, according to the procedures of RFC 3264 [RFC3264], in a SIP re-INVITE request. Also, on page 17, the text reads: When the client wants to de-allocate the resource from this session, it issues a SIP re-INVITE transaction with the server. The SDP MUST offer the control m-line with port 0. It could be completed with: When the client wants to de-allocate the resource from this session, it issues a new SDP offer, according to RFC 3264 [RFC3264], where the control m-line port MUST be set to 0. This SDP offer is sent in a SIP re-INVITE request. - Section 4.2 describes the usage of the a=setup attribute, but it does not clarify if setting up MRCP sessions MUST always use the a=setup attribute or not. I think the answer is yes, at least for TCP or TLS transports (those defined in this version of the draft). But there should be a "MUST use the 'setup' attribute as per RFC 4145" somewhere. I recommend to add the following sentence at the beginning of the last paragraph in page 16: MRCVv2 clients and servers using TCP as a transport protocol MUST use the procedures specified in RFC 4145 [RFC4145] for setting up the TCP connection, with the considerations described hereby. Similarly, MRCVv2 clients and servers using TCP/TLS as a transport protocol MUST use the procedures specified in RFC 4572 [RFC4572] for setting up the TLS connection, with the considerations described hereby. The 'setup attribute, as described ... [continue with the current text] - Section 5.1. The ABNF of "mrcp-version", at the top of page 26, comes out of the blue: there is no "mrcp-version" in the ABNF on page 25, so, it is difficult to find out how to put "mrcp-version" into context. - Section 5.4, in the client failure table, the description of the 407 contains a normative text "MAY BE". I think these tables are an informative summary. All the normative text should be written in detail elsewhere in the document. You can safely turn them lowercase. - On Section 8.5.1 the spec talks about using the multipart/mixed Media Type. It is not clear to me if implementation of multipart/mixed is a MUST or a MAY that is negotiated. - On Section 8.6, first paragraph, the text reads: The SPEAK" method can carry voice and prosody.... I guess the "can" should be replaced by a normative "MAY". - On Sections 9.4.9, 10.4.8, and 11.4.11, there is no indication of the possible values of the "media-type-value". I guess you want to say that these are intended to be MIME types (so far, the word "MIME type" is not written), and in that case, you should also say that possible values are any of the values included in the MIME media types registry maintained by IANA. - On Sections 9.4.11, 10.4.3, and 11.4.16, the ABNF merely reads: completion-cause = "Completion-Cause" ":" 3DIGIT SP 1*VCHAR CRLF I am assuming that when the ABNF refers to "3DIGIT", these are supposed to be any of the values included in the "Cause-Code" column of the table in the same section. Similarly, I am assuming that the "1*VCHAR" should include any of the values of the "Cause-Name" column in the same table. I think this should be spelled out. I suggest to change the ABNF for this one: completion-cause = "Completion-Cause" ":" cause-code SP cause-name CRLF cause-code = 3DIGIT cause-name = *VCHAR And then do the missing mapping with a text similar to: "The 'cause-code' contains a numerical value selected from the Cause-Code column of the following table. The 'cause-name' contains the corresponding token selected from the Cause-Name column. Note that the resolution of this issue should be also applied to Sections 10.4.3 and 11.4.16. - Question: In Sections 9.4.21 and 10.4.6, should "1*UTFCHAR" be included in quotes? It is true that there is no other subfield in the same header, meaning that there is no intention to parse the text. But somehow I feel safer if you enclosed the text in quotes. Also, remember that this text is coming from other protocol beyond your control, so you never know, for example, if the other protocol is going to add CRLF or something weird that will crash the recipient of the header. Note that the resolution of this issue should also be applied to Section 10.4.6. - On Section 9.4.26, the ABNF reads: recognition-mode = "Recognition-Mode" ":" 1*ALPHA CRLF However, there is only to values for choice, "normal" and "hotword". I think the following ABNF represents better that there is only two possible values: recognition-mode = "Recognition-Mode" ":" normal-value / hotword-value normal-value = "normal" hotword-value = "hotword" - Section 9.4.40. I wonder why the Phrase-NL is defined as UTFCHAR, while most of the other headers are defined as VCHAR - Section 9.9, third paragraph, the text reads: "If the client needs to explicitly control grammar weights for the recognition operation, it must employ method 3 below. " I guess the "must" should be a "MUST". - Section 9.9, the paragraph that goes after bullet point number "3", the text reads: "In addition to performing recognition on the input, the recognizer may also enroll the collected utterance in a personal grammar if the ..." Here the "may" should be a "MAY". - Section 9.9, bullet point number 1 on page 114: "the recognizer must complete ...." I guess the "must" should be a "MUST". There is another instance with the same text on bullet point number 1 on page 115. - Section 9.9, bullet point number 2 on page 115: "the RECOGNIZE completes with ..." I guess it should be normative: "the RECOGNIZE MUST complete with ..." - Section 9.17, the text reads: "The END-PHRASE-ENROLLMENT method may be called ONLY during an active phrase enrollment session. " Two issues: First, the "may" should be a "MAY". Second, I guess the readability is increased if the sentence is turned into active. What about this: "The client MAY call the END-PHRASE-ENROLLMENT method ONLY during ..." - Section 9.17, the text reads: "... the client can abort ..." I guess the "can" should be a "MAY". - Section 9.17, second paragraph. The text should clearly indicate the name of header where the 'location/URI' should be included. - In Section 10.4.7, I would like to clarify that if the record-uri is not sent in the header, then the actual audio should be sent as a MIME body. I would suggest to replace: If this header field is not specified in the RECORD request, the server MUST capture the audio and send it in the "STOP" response or the RECORD- COMPLETE event as a message body. with this text: If this header field is not specified in the RECORD request, the server MUST capture the audio, MUST encode it as a MIME body, and MUST send it in the "STOP" response or the RECORD- COMPLETE event. - In Section 10.4.7, I believe you should write normative text at the end of paragraphs 1 and in paragraph 3. Also, in paragraph 3, you need to add references to all the valid schemes and add the 'cid' scheme. And perhaps write all the scheme names in lowercase. In this case, the response carrying the audio content MUST include a Content ID (cid) [RFC2392] value in this header pointing to the Content-ID in the message body. And in the the third paragraph: Implementations MUST only use 'http', 'https' [RFC2616], 'file' [RFC3986], and 'cid' [RFC2392] schemes in the URI. Other URI schemes MUST NOT be used. Note, however, that implementations already exist that support other schemes. - Section 10.5. Perhaps a reference to Section 10.4.7 should be added, because the details are described in 10.4.7. - Sections 4.2 and 10.6. The text says that if physical security is provided, one can void TLS and merely used TCP. I have the feeling that the connection of these two concepts (physical security and lack of TLS) is not sufficiently justified for the security experts. One could access the resources of the MRCP server remotely, via a TCP connection, not using TLS. Or even sniff the network. So, physical security is not enough for voiding TLS. - Section 11.4.2. I am missing more normative strength text here. For example: + a couple of occurrences of "is required" should be set to "MUST be present" + "may" should be "MAY" + "can" should be "MAY" - Section 11.5.2.10. I am missing more normative strength text. In particular, a couple of occurrences of "must" in the first paragraph should be "MUST". - Section 11.6, 7th paragraph. I am missing more normative strength text. When the text say: "... operations may be performed on the verifier resource". The "may" should be a "MAY". - Section 12.1, bullet point 1. Although the text says that clients and servers MUST support digest authentication, I think they SHOULD use it (isn't that the intention). In particular, the server SHOULD authenticate the client using SIP digest authentication. Also, a reference to RFC 3261 should be added here. - Section 12.1, bullet point 2. I agree that clients and servers SHOULD employ 'sips' URIs, but I guess the 'sips' is just the format of the underlying TLS security. I guess you should add "including that clients and servers SHOULD setup up TLS [ref] connections". - Section 12.3. The last word of this section, "recommended", should be normative: "RECOMMENDED". - Section 13.1.6 describes a mechanism where vendor-specific extensions use the reverse DNS mechanism, for example., "com.example.foo". Then, if the vendor-specific extension is connected to DNS to avoid clashes in names, why is there a need for an expert review policy prior to its registration? I see a contradiction in having a self-managing registry by avoiding clashes due to the connection to DNS, and then having anything else than a volunteer registry. - Section 13.7.2 registers new SDP "att-field" at the *session-level* (at least according to the title in Section 13.7.2. However, the text of both registers reads "media-level" in the type of attribute. So, are these two session-level or media-level SDP attributes? I think they are media-level, in which case, they both should be written in Section 13.7.3, which is the SDP "att-field" registration at the media-level. - I think it is not correct to have normative text (capitalized) in the examples, because examples are informative by nature. The text around the examples should describe what is happening. If there is a need for normative text, then it should be already written elsewhere, and if it isn't, it should be written in the normative sections, but not in the examples. So, I found the following instances of normative text in Section 14: + RECOMMENDED on page 186 + A couple of MUST on page 187 + MUST on page 189 - Sections 16.3 and 16.3. If these are schema definitions, then why don't they start with: and the namespace definition, like any other XML schema document? Nits: ==== - Expand acronyms at first occurrence. This includes: SSML, NLSML, VXML, RTSP - General to all the examples containing SDP. In Section 4.3. I think the "o=" lines are not correct, in particular, the IP address in the o= line is set to "192.0.2.4" in both the offer and the answer. So, the first thing is that it should be a different value in the offer and the answer. The second thing is that it should typically be equal to the IP address that we see in the c= line, unless you have a very good reason for it. Since it is not the goal of the spec to deal with these rare use cases, I would recommend to set the IP address in the o= line to become equal to that in the c= line. This also applies to the example in Section 7 and the examples in Section 14. - General to all the examples containing SDP. According to RFC 3264 you must include a "t=0 0" line in all the SDP offers and answers. - General to all the examples containing SDP. There is an "s=" line that contains text. According to RFC 3264, it is RECOMMENDED that it contains a white space or a dash "-". If you want to add a subject, the SIP Subject header field serves the purpose. - Section 5, add a formal reference to ISO 8859-1 when the text mentions it. - It would be good to have numbers in those tables in Section 5.4. I mean, there should be a caption saying "Table 1: Success 2xx response codes". - In Sections 9.4.21 and 10.4.6, add a formal reference to RFC3629 when you name "UTF-8". - Section 9.6.3.3., 1st bullet point: s/contains an float/contains a float - Section 9.6.3.4, add a formal reference to ISO 8601 when the text mentions it. - Section 9.9, bullet point numbered "3". Add a reference to the document that defines the "3C grammar weights" - Section 9.9, the first paragraph on page 114, s/The No-Input-Timer MUST BE started/The No-Input-Timer MUST be started - Section 10, add a formal reference to RFC 2326 when mentioning RTSP. - Around Section 10.4.7, it would be good to write a simple example showing a message body that includes the audio. This is to see an example of a 'cid' URI scheme. - General: Names of URI schemes should be written in lowercase, for example: 'https' URI scheme. There is an instance in Section 10.6, page 135. Other instances of "SIPS:" are in Section 12.1 - Starting in Section 10.6, but with many occurrences after that, there is a message from the Server to the Client which is written as "MRCP/2/0", where the second "/" should actually be a ".", such as: "MRCP/2.0". - Sections 9.20 and 9.21. The third message of the example starts with this line: S->C: MRCP/2.0 ... INTERPRETATION-COMPLETE 543267 200 COMPLETE The request-id value is 543267. However, the request-id of the client is 543266. I guess this request-id should also be 543266. - Section 10.6, 10.7, 10.8. The request-id of the request is set to 543257. However, the request-id of the following responses are all set to 456234. I guess they should all be set to the same value. - Section 10.8, page 137, first paragraph: s/audio ./audio. - Section 11, fourth paragraph: s/'multi-verification.'/'multi-verification'. - Section 11.1. The request-id of the fifth message is set to 314164, however, it should be set to 314162, because it is a response to the forth message. - Section 11.5.1. There is a closing bracket ")" at the end of the first sentence, for which there isn't an opening bracket. - Section 13.1.1. The registry of the set-cookie and set-cookie2 point to the wrong RFC. They should point to RFCs 2109 and 2965, respectively. - Section 13.1.3. I would suggest to split this long registry and create five separate registries, one per resource type and the Generic. This simplifies if in the future someone wants to add a new, e.g., Synthesizer header. With the current registry, IANA will add the new at the end of the registry, after the last header of the Verifier, and most people will miss this new hypothetical Synthesizer header. If you create five different registries, then the problem will be solved. - Sections 13.7.1 and 13.7.2. There are two tokens being registered in each Section. Can you add an empty line in between these two registrations (within the same Section) for the sake of readability? - Section 14.1. The example at the top of page 182, the "m=" line reads: m=audio 49170 RTP/AVP 0 96 However, there is no "a=rtpmap:96" line and an "a=fmtp:96" describing what the media type "96" is. I guess 96 represents a telephone event, in which case, you should add: a=rtpmap:96 telephone-event/8000 a=fmtp:96 0-15 - Next to last paragraph on page 187, the text reads "... turn around and issued a ...". s/issued/issue - First sentence on page 189: s/a SIP BYE/a SIP BYE request - Idnits reveals: ** Obsolete normative reference: RFC 3388 (Obsoleted by RFC 5888) ** Obsolete normative reference: RFC 2109 (Obsoleted by RFC 2965) ** Obsolete normative reference: RFC 4646 (Obsoleted by RFC 5646) ** Downref: Normative reference to an Experimental RFC: RFC 2483 -- Obsolete informational reference (is this intentional?): RFC 2234 (Obsoleted by RFC 4234) Regards, Miguel Garcia -- Miguel A. Garcia +34-91-339-3608 Ericsson Spain