
[speechsc] VoiceXML Compatibility Issues with Synthesis



The VoiceXML Forum MRCP Liaison committee is currently reviewing the latest MRCP v2 draft in order to (a) evaluate the compatibility between MRCP v2 and VoiceXML and (b) generate test assertions for MRCP v2-based VoiceXML browsers and MRCP v2-based media resources. We are currently examining the Speech Synthesis portion of the specification and have found the following issues:
  1. Both the Fetch Hint header (8.4.10) and the Audio Fetch Hint header (8.4.11) say that they apply to audio files. Which one is it? Each description should state explicitly which resource types the header applies to.
  2. The Audio Fetch Hint header (8.4.11) supports a "stream" value that VoiceXML 2.x does not support. Moreover, this value says nothing about when to fetch the audio file; it determines when the file should start playing. A "safe" fetch could be either streaming or non-streaming, and the same applies to a "prefetched" audio file. If streaming control is required, it should be provided by a separate header. It's worth noting that VoiceXML leaves streaming as a platform-specific optimization.
  3. The maxage and maxstale cache control headers are global in MRCP, whereas VoiceXML breaks them down by resource type (e.g. audiomaxage, audiomaxstale, grammarmaxage, grammarmaxstale, etc.). When the VoiceXML settings differ between resource types, there is no single MRCP value that preserves them all, which creates compatibility issues between VoiceXML and MRCP. Is there a way to apply these parameters to specific resource types in MRCP?
  4. It isn't clear whether the "003 uri-failure" SPEAK completion cause applies to referenced SSML files, to audio files, or to both. If it applies to audio files, it conflicts with the VoiceXML/SSML behavior of continuing processing when audio cannot be successfully rendered. The SSML Recommendation says: "If the audio element is not successfully rendered, a synthesis processor should continue processing and should notify the hosting environment." To truly meet this requirement, it would seem that we need some way to indicate that an audio file was not played while continuing to play the other audio and SSML specified in the SPEAK request. A completion cause of 003 could perhaps be used to indicate this (if we allow the notification of the hosting environment to occur after the request completes), though the completion cause is documented as representing the "reason of request completion", which wouldn't be accurate in this case. This definitely needs clarification.
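To make item 3 concrete, here is a minimal Python sketch, assuming (as described above) that MRCP carries one global maxage value and one global maxstale value per request. The function name `collapse_to_global` and the restriction to the audio and grammar properties are hypothetical and chosen only for illustration; MRCP header syntax is not modeled. It tries to fold the VoiceXML per-resource-type properties (values in seconds) into single values and reports every parameter for which that is impossible.

```python
# Hypothetical sketch: folding VoiceXML per-resource-type cache properties into
# one global value per parameter, as a single MRCP maxage/maxstale would require.
from typing import Dict, Tuple

# Restricted to audio and grammar properties for brevity.
RESOURCE_TYPES = ("audio", "grammar")
CACHE_PARAMS = ("maxage", "maxstale")


def collapse_to_global(
    vxml_props: Dict[str, int],
) -> Tuple[Dict[str, int], Dict[str, Dict[str, int]]]:
    """Return (global_values, conflicts).

    A parameter is a conflict when two resource types set it to different
    values, because one global value cannot express both.
    """
    global_values: Dict[str, int] = {}
    conflicts: Dict[str, Dict[str, int]] = {}
    for param in CACHE_PARAMS:
        # Collect e.g. "audiomaxage" and "grammarmaxage" for param "maxage".
        seen = {rtype: vxml_props[rtype + param]
                for rtype in RESOURCE_TYPES if rtype + param in vxml_props}
        distinct = set(seen.values())
        if len(distinct) == 1:
            global_values[param] = distinct.pop()
        elif len(distinct) > 1:
            conflicts[param] = seen
        # If no resource type sets the parameter, it is simply left out.
    return global_values, conflicts


if __name__ == "__main__":
    props = {"audiomaxage": 60, "grammarmaxage": 3600,
             "audiomaxstale": 0, "grammarmaxstale": 0}
    print(collapse_to_global(props))
    # prints ({'maxstale': 0}, {'maxage': {'audio': 60, 'grammar': 3600}})
```

With audiomaxage=60 and grammarmaxage=3600, maxage ends up in the conflicts dictionary because no single global value represents both settings, while the matching maxstale values collapse cleanly.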
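Item 4 can likewise be sketched in Python as the two possible readings of an audio fetch failure. Everything here (`run_speak`, `SpeakOutcome`, the `fetch_ok` callback, and the `(kind, value)` item representation) is hypothetical; only the "003 uri-failure" string comes from the draft. With `continue_on_audio_failure=True` it follows the SSML rule, skipping the failed audio and recording it for a later notification to the hosting environment; with `False` it ends the request on the first failed audio fetch. SSML fallback content inside the audio element is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical item model: ("text", words) or ("audio", uri).
Item = Tuple[str, str]


@dataclass
class SpeakOutcome:
    rendered: List[str] = field(default_factory=list)      # items actually played
    failed_audio: List[str] = field(default_factory=list)  # audio URIs that could not be rendered
    aborted_with: Optional[str] = None                      # completion cause if the request ended early


def run_speak(items: List[Item],
              fetch_ok: Callable[[str], bool],
              continue_on_audio_failure: bool) -> SpeakOutcome:
    outcome = SpeakOutcome()
    for kind, value in items:
        if kind == "audio" and not fetch_ok(value):
            outcome.failed_audio.append(value)
            if not continue_on_audio_failure:
                # Strict reading: the failed audio fetch ends the whole SPEAK request.
                outcome.aborted_with = "003 uri-failure"
                return outcome
            # SSML reading: skip this audio, keep going, and leave failed_audio
            # for a later notification to the hosting environment.
            continue
        outcome.rendered.append(value)
    return outcome


if __name__ == "__main__":
    items = [("text", "Welcome"),
             ("audio", "http://example.com/missing.wav"),
             ("text", "Goodbye")]

    def fetch_ok(uri: str) -> bool:
        # Pretend only the "missing" file fails to fetch.
        return not uri.endswith("missing.wav")

    print(run_speak(items, fetch_ok, continue_on_audio_failure=True))
    print(run_speak(items, fetch_ok, continue_on_audio_failure=False))
```

The SSML reading plays "Welcome" and "Goodbye" and still knows which audio failed, but nothing in it says how MRCP should report that list, which is exactly the gap described in item 4.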
If you would like to participate in the VoiceXML Forum MRCP Liaison committee, please email me.
Thanks,

Andrew Wahbe
begin:vcard
fn:Andrew Wahbe
n:Wahbe;Andrew
org:VoiceGenie Technologies INC.;Multimodal and Development Tools
adr:8th Floor;;1120 Finch Avenue W.;Toronto;ON;M3J 3H7;Canada
email;internet:awahbe at voicegenie.com
title:Technical Manager
tel;work:(416) 736-0905 ext. 258
tel;fax:(416) 736-1551
x-mozilla-html:TRUE
url:http://www.voicegenie.com
version:2.1
end:vcard

_______________________________________________
Speechsc mailing list
Speechsc at ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc