[speechsc] Requests for Clarification
Andrew Wahbe <awahbe@voicegenie.com> Wed, 21 December 2005 22:02 UTC
Message-ID: <43A9D0FC.1080103@voicegenie.com>
Date: Wed, 21 Dec 2005 17:02:36 -0500
From: Andrew Wahbe <awahbe@voicegenie.com>
Organization: VoiceGenie Technologies
To: "IETF SPEECHSC (E-mail)" <speechsc@ietf.org>
Subject: [speechsc] Requests for Clarification
The VoiceXML Forum MRCP Liaison Committee is currently reviewing the latest MRCP v2 draft to (a) evaluate the compatibility between MRCP v2 and VoiceXML and (b) generate test assertions for MRCP v2 based VoiceXML browsers and MRCP v2 based media resources. We are currently examining the Speech Synthesis portion of the specification and have raised issues with it in prior emails to the SpeechSC list (see http://www.ietf.org/mail-archive/web/speechsc/current/msg01605.html , http://www.ietf.org/mail-archive/web/speechsc/current/msg01606.html and http://www.ietf.org/mail-archive/web/speechsc/current/msg01607.html ).

These issues (and the responses to them) have been discussed by the MRCP Liaison Committee, and we would like to make the following requests and suggestions:

1) The relationship between the Fetch Hint header and the Audio Fetch Hint header should be clarified. More specifically, it should be stated that, when specified, the Audio Fetch Hint header overrides the Fetch Hint header for audio files only.

2) It should be clarified that SPEAK completion code 003 "uri-failure" applies only to fetched SSML files and that a failure to fetch (or process) an audio file will not abort the SPEAK request. This does mean, however, that there is no way to communicate the failure to fetch (or process) the audio file to the MRCP client. While SSML requires that the processor "notify the hosting environment" when such a failure occurs, the members of the committee agree that logging this event at the MRCP server is sufficient. It may be advisable for the MRCP specification to suggest that these events be logged in some way. We would also like to suggest that future versions of MRCP consider adding an event (e.g. "Audio-Exception") to notify the MRCP client that such a failure has occurred without aborting the SPEAK request.

3) The definition of the Basic Synthesizer resource is a bit vague and should be clarified.
It is not entirely clear from the description in the spec how it is supposed to work. The general consensus in the Committee is that this resource can be used for audio-only prompts. It is supposed to accept a subset of SSML that includes only <speak>, <audio>, <say-as> and <mark>. What isn't clear is how <say-as> is supposed to work in this case and whether text strings are acceptable (one would think not, were it not for <say-as> being allowed). It may also be reasonable to make <mark> optional; a VoiceXML 2.0 browser certainly wouldn't need it anyway. These clarifications are needed before we can make any assertions about how a VoiceXML browser would use a basicsynth resource in an implementation.

A final issue worth noting is that the maxage and maxstale cache control headers are global in MRCP, while VoiceXML breaks them down by resource type (e.g. audiomaxage, audiomaxstale, grammarmaxage, grammarmaxstale, etc.). This may be acceptable because the context of each request should govern the type of file to which these headers apply. That is, in a SPEAK request they control audio file fetches, and in RECOGNIZE requests they control grammar file fetches. As we continue to evaluate the spec we will keep our eyes open for scenarios where this does not hold. Thus, we are not requesting any changes related to this issue at this time.

Related to the above issue is the fact that the <audio> tag in VoiceXML extends the attributes defined in SSML by adding maxstale, maxage, fetchtimeout, and fetchhint (it also adds expr, but that "evaluates away" to src). These fetch-related attributes override their associated properties. Unfortunately, since MRCP is based on SSML, these attributes cannot be included in an MRCP request; instead, the associated headers would need to be set to control this behavior. This obviously introduces a problem if a request contains two <audio> tags that had these attributes set differently in the original VoiceXML document.
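
The situation described above, where <audio> tags in one prompt disagree on a fetch-related attribute, can be detected before the prompt is turned into a single SPEAK request. The Python below is only a minimal sketch of such a check and is not part of MRCP or VoiceXML. The function name conflicting_fetch_attrs, the <prompt> wrapper and the sample values are hypothetical, and the sketch assumes a prompt fragment without XML namespaces. An <audio> tag that omits an attribute is treated as differing from one that sets it, since the omitted value would fall back to the associated property.

```python
import xml.etree.ElementTree as ET

# Fetch-related attributes that VoiceXML adds to <audio> (as listed above).
FETCH_ATTRS = ("maxage", "maxstale", "fetchtimeout", "fetchhint")


def conflicting_fetch_attrs(prompt_xml):
    """Return {attribute: set of values} for attributes on which the
    <audio> elements of one prompt disagree (hypothetical helper)."""
    root = ET.fromstring(prompt_xml)
    audios = list(root.iter("audio"))
    conflicts = {}
    for name in FETCH_ATTRS:
        # None stands for "attribute not set on this element".
        values = {audio.get(name) for audio in audios}
        if len(values) > 1:
            conflicts[name] = values
    return conflicts


# Illustrative prompt: two audio files with different maxstale values.
prompt = (
    '<prompt>'
    '<audio maxstale="10" src="A.wav"/>'
    '<audio maxstale="20" src="B.wav"/>'
    '</prompt>'
)
print(sorted(conflicting_fetch_attrs(prompt)))  # ['maxstale']
```

A non-empty result means that no single set of request-level headers can reproduce the per-file behaviour of the original VoiceXML document.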
It would seem that one way to address this problem is to break apart an SSML prompt so that each audio file is sent in its own request. Unfortunately, Issue (2) from above prevents this solution from working. Consider a prompt with alternate audio files such as:

<audio maxstale="A" src="A.wav"><audio maxstale="B" src="B.wav"/></audio>

where maxstale values A and B are not the same. These files can't be sent as part of the same request because their maxstale values differ. However, if they are sent as part of separate requests, the client would need to know whether A.wav could not be fetched in order to decide whether it should request that B.wav be played. But as discussed above, there is no way for the client to know this.

The MRCP Liaison Committee believes that the best way to address this is to ask the W3C Voice Browser Working Group to add these attributes to the audio tag in SSML. Again, we are not requesting any changes to MRCP related to this issue.

Regards,
Andrew Wahbe
VoiceXML Forum MRCP Liaison Committee