
RE: [speechsc] VoiceXML Compatibility Issues with Synthesis



 


From: speechsc-bounces at ietf.org [mailto:speechsc-bounces at ietf.org] On Behalf Of Andrew Wahbe
Sent: Wednesday, December 07, 2005 1:51 PM
To: IETF SPEECHSC (E-mail)
Subject: [speechsc] VoiceXML Compatibility Issues with Synthesis

The VoiceXML Forum MRCP Liaison committee is currently evaluating the latest MRCP v2 draft to (a) evaluate the compatibility between MRCP v2 and VoiceXML and (b) generate test assertions for MRCP v2 based VoiceXML browsers and MRCP v2 based media resources. We are currently examining the Speech Synthesis portion of the specification and have found the following issues:
  1. Both the Fetch Hint header (8.4.10) and Audio Fetch Hint header (8.4.11) say that they apply to audio files. Which one is it? These descriptions should be specific about what resource types they apply to.
    [Sarvi>>] Fetch-Hint applies to all documents, including audio files. If the Audio Fetch Hint is specified, it applies to audio files only.
  2. Audio Fetch Hint header (8.4.11) supports a "stream" value that is not supported by VoiceXML 2.x. Moreover, this value says nothing about when to fetch the audio file; it determines when it should start playing. You could have a "safe" fetch that was either streaming or non-streaming. The same applies to a "prefetched" audio file.  A separate header should control streaming if required. It's worth noting that VoiceXML leaves streaming as a platform-specific optimization.
    [Sarvi>>] I think this should be fine. "safe" and "pre-fetched" should cover the VoiceXML needs. This just indicates to the media server whether it should fetch SSML documents or audio files ahead of time or just in time before playback. The streaming model tells the media server to stream the audio as it fetches the URI and not wait for the audio file to finish downloading into the media server, which is another possible option that VoiceXML allows for.
  3. The maxage and maxstale cache control headers are global in MRCP while VoiceXML breaks this down by resource type (e.g. audiomaxage, audiomaxstale, grammarmaxage, grammarmaxstale, etc.). This results in compatibility issues between VoiceXML and MRCP. Is there a way to apply these parameters to specific resource types in MRCP?
    [Sarvi>>] I'd like to hear some other opinions on this. There are different types of URI that will need to be loaded/cached. These include audio files, grammar files, SSML, voice-prints, etc. I don't believe we want to have separate headers for each. The best I might suggest is to leave the current cache parameters to cover all types of documents, including audio, SSML, grammar, etc. Then, if possible, implement separate headers such as audio-maxstale which, if specified, apply to audio files only.
  4. It isn't clear if the  "003 uri-failure" SPEAK completion cause applies to referenced SSML files, or audio files or both. If it applies to audio files then this is in conflict with the VoiceXML/SSML behavior of
    [Sarvi>>] 

    003 uri-failure was meant for cases where the SSML was fetched through a URI and the fetch failed. So we should leave this alone.

     
    continuing processing if audio cannot be successfully rendered. The SSML Recommendation says "If the audio element is not successfully rendered, a synthesis processor should continue processing and should notify the hosting environment." To truly meet this requirement it would seem that we need some way to indicate that audio was not played while continuing to play the other audio and SSML specified in the speak request. A completion-cause of 003 could perhaps be used to indicate this (if we allow the notification of the host environment to occur after the request completes), though the completion cause is documented as representing the "reason of request completion" which wouldn't be accurate in this case. This definitely needs some clarification.
    [Sarvi>>] 

    This definitely needs some clarification. We might want to add a new "success-partial" completion cause code that says the SSML completed but with some audio failures.

    An alternative, more extensive suggestion is to create an AUDIO-EXCEPTION event, similar to the SPEECH-MARKER event, to be generated when such audio elements fail to load.

    Or we could use the SPEECH-MARKER event, or rename it to something generic, and differentiate based on headers.
     
    Sarvi  
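
The override rule described in the replies to items 1 and 3 (a general header covers every document type, and a type-specific header, when present, takes over for that type only) can be sketched as follows. This is only an illustration of that reading, not text from the draft: the function name, the lookup table, the "ssml" resource label and the header values used here are assumptions made for the example.

```python
from typing import Mapping, Optional

# Assumed for illustration: which type-specific header overrides the
# general one. Only the audio entry corresponds to a header discussed
# in this thread.
TYPE_SPECIFIC_HINT = {"audio": "Audio-Fetch-Hint"}
GENERAL_HINT = "Fetch-Hint"


def effective_fetch_hint(headers: Mapping[str, str],
                         resource_type: str) -> Optional[str]:
    """Return the fetch hint governing a resource of the given type.

    A type-specific header wins when it is present; otherwise the
    general header applies. None means no hint was supplied at all.
    """
    specific = TYPE_SPECIFIC_HINT.get(resource_type)
    if specific is not None and specific in headers:
        return headers[specific]
    return headers.get(GENERAL_HINT)


# Hypothetical request headers; the values are illustrative strings.
request_headers = {"Fetch-Hint": "safe", "Audio-Fetch-Hint": "prefetch"}
print(effective_fetch_hint(request_headers, "audio"))
print(effective_fetch_hint(request_headers, "ssml"))
```

The same fallback shape would apply to the cache parameters if something like the suggested audio-maxstale were added: look for the audio-specific value first, then fall back to the global one.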

If you would like to participate in the VoiceXML Forum MRCP Liaison committee please email me.
Thanks,

Andrew Wahbe