The VoiceXML Forum MRCP Liaison committee is currently reviewing
the latest MRCP v2 draft to (a) evaluate the compatibility between MRCP v2 and
VoiceXML and (b) generate test assertions for MRCP v2-based VoiceXML browsers
and MRCP v2-based media resources. We are currently examining the Speech
Synthesis portion of the specification and have found the following
issues:
- Both the Fetch Hint header (8.4.10) and Audio Fetch Hint header (8.4.11)
say that they apply to audio files. Which one is it? These descriptions
should be specific about what resource types they apply to.
[Sarvi>>] Fetch-Hint applies to all documents, including
audio files. If the Audio Fetch Hint is specified, it applies specifically
to audio files only.
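
The precedence described in this reply can be sketched as follows. This is a
minimal illustration only, not text from the draft: the function name, the
"audio" resource kind label, the hyphenated header spellings used as
dictionary keys, and the "prefetch" default are all assumptions made for the
example.

```python
from typing import Mapping

def effective_fetch_hint(resource_kind: str,
                         headers: Mapping[str, str],
                         default: str = "prefetch") -> str:
    """Pick the fetch hint for one resource (hypothetical helper)."""
    # Audio-Fetch-Hint, when present, overrides Fetch-Hint for audio only.
    if resource_kind == "audio" and "Audio-Fetch-Hint" in headers:
        return headers["Audio-Fetch-Hint"]
    # Everything else (and audio without a specific hint) uses Fetch-Hint.
    return headers.get("Fetch-Hint", default)
```

With both headers set, an audio file would take the Audio Fetch Hint value
and an SSML document would take the Fetch Hint value.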
- Audio Fetch Hint header (8.4.11) supports a "stream" value that is not
supported by VoiceXML 2.x. Moreover, this value says nothing about when to
fetch the audio file; it determines when it should start playing. You could
have a "safe" fetch that was either streaming or non-streaming. The same
applies to a "prefetched" audio file. A separate header should control
streaming if required. It's worth noting that VoiceXML leaves streaming as a
platform-specific optimization.
[Sarvi>>] I think this should be
fine. "safe" and "pre-fetched" should cover
the VoiceXML needs. This just indicates to the media server whether
it should fetch SSML documents or audio files ahead of time or just in time
before playback. The streaming model tells the media server to stream the
audio as it fetches the URI and not wait for the audio file to finish
downloading into the media server, which is another possible option that
VoiceXML allows for.
- The maxage and maxstale cache control headers are global in MRCP while
VoiceXML breaks this down by resource type (e.g. audiomaxage, audiomaxstale,
grammarmaxage, grammarmaxstale, etc.). This results in compatibility issues
between VoiceXML and MRCP. Is there a way to apply these parameters to
specific resource types in MRCP?
[Sarvi>>] I'd like to hear some
other opinions on this. There are different types of URIs that will
need to be loaded/cached. This includes audio files, grammar files,
SSML, voice-prints, etc. I don't believe we want to have separate
headers for each. The best I might suggest is to leave the current cache
parameters to cover all types of documents, including audio, SSML, grammar,
etc. Then, if possible, implement separate audio-maxstale etc. headers which,
if specified, apply to audio files only.
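
One possible reading of this suggestion, as a rough sketch: a type-specific
parameter wins when present, and the global one covers every other case. The
function name, the "<kind>-<name>" key scheme, and the integer values are
assumptions for illustration; only "audio-maxstale" is named in the reply
above, and none of this is defined by the draft.

```python
from typing import Mapping, Optional

def cache_param(name: str,
                resource_kind: str,
                headers: Mapping[str, int]) -> Optional[int]:
    """Resolve a cache-control value for one resource (hypothetical helper)."""
    # Look for a type-specific override first, e.g. "audio-maxstale".
    specific = f"{resource_kind}-{name}"
    if specific in headers:
        return headers[specific]
    # Otherwise fall back to the global parameter; None if it is unset too.
    return headers.get(name)
```

Under this scheme a VoiceXML browser could map audiomaxstale to the
audio-specific override while grammars and SSML keep using the global values.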
- It isn't clear if the "003 uri-failure" SPEAK completion cause
applies to referenced SSML files, or audio files or both.
[Sarvi>>] 003 uri-failure was meant for cases where the SSML was
fetched through a URI and the fetch failed. So we should leave this
alone.
If it applies to audio files then this is in conflict with the
VoiceXML/SSML behavior of continuing processing if audio cannot
be successfully rendered. The SSML Recommendation says "If the audio
element is not successfully rendered, a synthesis
processor should continue processing and should notify the hosting
environment." To truly meet this requirement it would seem that we need
some way to indicate that audio was not played while continuing to play the
other audio and SSML specified in the speak request. A completion-cause of
003 could perhaps be used to indicate this (if we allow the notification of
the host environment to occur after the request completes), though the
completion cause is documented as representing the "reason of request
completion" which wouldn't be accurate in this case. This definitely needs
some clarification.
[Sarvi>>] This definitely needs some
clarification. We might want to add a new "success-partial" completion cause
code that says the SSML completed but with some audio failures.
An alternative, more extensive
suggestion is to create an AUDIO-EXCEPTION event, similar to the
SPEECH-MARKER event, to be generated when such audio elements fail to load.
Or we could use the SPEECH-MARKER
event, or rename it to something generic, and differentiate based on
headers.
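
To make the "success-partial" idea concrete, here is a minimal sketch of a
SPEAK that keeps going past failed audio and reports the failures when it
completes. Everything here is hypothetical: the SpeakOutcome type, the
run_speak function, the (kind, value) item list, and the fetch_audio callback
are invented for the example, and "success-partial" is only the name proposed
above, with no code number assigned.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class SpeakOutcome:
    completion_cause: str
    failed_audio: List[str] = field(default_factory=list)

def run_speak(items: List[Tuple[str, str]],
              fetch_audio: Callable[[str], bytes]) -> SpeakOutcome:
    """Process SSML items in order, continuing past audio fetch failures."""
    failed: List[str] = []
    for kind, value in items:
        if kind != "audio":
            continue  # text would be synthesized here; omitted in this sketch
        try:
            fetch_audio(value)
        except OSError:
            # An AUDIO-EXCEPTION style event could be raised at this point
            # instead of (or as well as) recording the failure.
            failed.append(value)
    # Assumed mapping: any audio failure turns a normal completion into
    # the proposed "success-partial" cause.
    cause = "success-partial" if failed else "000 normal"
    return SpeakOutcome(cause, failed)
```

The event-based alternative would notify the client at the moment of failure
rather than at completion, which is closer to the SSML wording about
notifying the hosting environment.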
Sarvi

If you would like to
participate in the VoiceXML Forum MRCP Liaison committee please email
me.

Thanks,
Andrew Wahbe