The VoiceXML Forum MRCP Liaison committee is currently reviewing the
latest MRCP v2 draft to (a) evaluate the compatibility between MRCP v2
and VoiceXML and (b) generate test assertions for MRCP v2-based
VoiceXML browsers and MRCP v2-based media resources. We are currently
examining the Speech Synthesis portion of the specification and have
found the following issues:
- Both the Fetch Hint header (8.4.10) and Audio Fetch Hint header
(8.4.11) say that they apply to audio files. Which one is it? These
descriptions should be specific about what resource types they apply to.
- Audio Fetch Hint header (8.4.11) supports a "stream" value that
is not supported by VoiceXML 2.x. Moreover, this value says nothing
about when to fetch the audio file; it only determines when playback
should start. You could have a "safe" fetch that was either streaming
or non-streaming. The same applies to a "prefetched" audio file. If
streaming needs to be controlled, a separate header should do it. It's
worth noting that VoiceXML leaves streaming as a platform-specific
optimization.
- The maxage and maxstale cache control headers are global in MRCP
while VoiceXML breaks this down by resource type (e.g. audiomaxage,
audiomaxstale, grammarmaxage, grammarmaxstale). This results in
compatibility issues between VoiceXML and MRCP. Is there a way to apply
these parameters to specific resource types in MRCP?
- It isn't clear if the "003 uri-failure" SPEAK completion cause
applies to referenced SSML files, audio files, or both. If it applies
to audio files then this is in conflict with the VoiceXML/SSML behavior
of continuing processing if audio cannot be successfully rendered. The
SSML Recommendation says "If the audio element is not successfully
rendered, a synthesis processor should continue processing and should
notify the hosting environment." To truly meet this requirement it
would seem that we need some way to indicate that audio was not played
while continuing to play the other audio and SSML specified in the
speak request. A completion-cause of 003 could perhaps be used to
indicate this (if we allow the notification of the host environment to
occur after the request completes), though the completion cause is
documented as representing the "reason of request completion", which
wouldn't be accurate in this case. This definitely needs some
clarification.
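
To make the cache control issue above more concrete, here is a rough
sketch of what a VoiceXML browser might be forced to do today. It is
only an illustration, not a proposal: the property values are made up,
the helper names are hypothetical, the "strictest value wins" policy is
just one possible workaround, and the Cache-Control line syntax is
assumed to follow the HTTP style and may not match the draft exactly.

```python
# Hypothetical VoiceXML property values, in seconds (made up for illustration).
VXML_PROPS = {
    "audiomaxage": 3600,
    "audiomaxstale": 60,
    "grammarmaxage": 0,
    "grammarmaxstale": 0,
}


def collapse_cache_props(props):
    """Collapse per-resource-type properties into one global pair.

    Assumed policy: take the strictest (smallest) value for each
    setting, since a single global value cannot express both.
    """
    ages = [v for k, v in props.items() if k.endswith("maxage")]
    stales = [v for k, v in props.items() if k.endswith("maxstale")]
    return min(ages), min(stales)


def cache_control_line(max_age, max_stale):
    """Format one global header line (HTTP-style syntax is an assumption)."""
    return f"Cache-Control: max-age={max_age}, max-stale={max_stale}"


max_age, max_stale = collapse_cache_props(VXML_PROPS)
print(cache_control_line(max_age, max_stale))
```

With these values the audio settings are lost entirely: the grammar
values of 0 win, so audio that VoiceXML would allow to be served from
cache for an hour is refetched every time.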
If you would like to participate in the VoiceXML Forum MRCP Liaison
committee please email me.
Thanks,
Andrew Wahbe