[speechsc] Hotword Recognition and Timers
The description of how timers (no-input and recognition) are used during
hotword recognition is inconsistent. In section 9.4.7, it is stated
that "For a hotword recognition mode, this timer is started when the
user begins speaking. Note that for Hotword mode recognition the
START-OF-INPUT event is not generated." However, section 9.9 states that
for the hotword case: "The Recognition-Timer gets started at the
beginning of RECOGNIZE."
It seems that section 9.9 is incorrect (or at least is inconsistent with
VoiceXML).
Section 9.9 omits any mention of the no-input timer for the hotword mode
recognition case; however, none of the sections that deal with the
no-input timer make a distinction between the hotword and non-hotword
cases. VoiceXML also does not make this distinction. It would seem that
section 9.9 should be changed to indicate that no-input timers are
started in the hotword case and that no-input-timeout is a valid
completion cause for a hotword recognition.
A related question worth considering is whether the recognition timer is
reset at any point, for example, on the detection of silence. Consider
the case where maxspeech has a value of, say, 20 seconds (a
typical/reasonable value) and hotword barge-in is being used on a prompt
that is 30 seconds long. This would mean that a user who spoke briefly
2 seconds into the prompt (and was silent for the remainder of the
prompt) would experience a maxspeech timeout at about 22 seconds into
the prompt. They would not hear the whole prompt, which seems
inappropriate. The reason for the maxspeech timeout is to catch continuous
noise and keep it from occupying a recognizer; but what should happen in
periods of silence in the hotword case?
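
To make the arithmetic concrete, here is a minimal sketch of the scenario
above. It assumes the recognition timer starts at speech onset (as in
section 9.4.7) and is never reset; the function and variable names are
made up for illustration and do not come from the draft.

```python
def recognition_expiry(speech_start_s, maxspeech_s):
    """Time at which the recognition timer fires, assuming it starts at
    speech onset and is never reset (hypothetical helper)."""
    return speech_start_s + maxspeech_s


prompt_length_s = 30.0  # length of the prompt being played
expiry_s = recognition_expiry(speech_start_s=2.0, maxspeech_s=20.0)

# True when the maxspeech timeout fires before the prompt has finished.
prompt_cut_short = expiry_s < prompt_length_s
print(expiry_s, prompt_cut_short)
```

With these numbers the timer fires 8 seconds before the prompt ends, even
though the user was silent for almost all of it.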
Similarly, when is the no-input timer canceled in the hotword case? Is
it when speech (not necessarily matching) is detected? Or is it only
upon a match?
The correct behavior in my opinion is that the no-input timer is
canceled only on a match, and that the recognition timer should be reset
if silence (determined by complete timeout and incomplete timeout) is
detected. If we are just processing intermittent noise, the no-input
timer will eventually expire. Continuous noise is handled by the
recognition timer. Of course, there are other possibilities as well;
this is just one option that I think fits with VoiceXML.
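
The behavior proposed above can be sketched as a small simulation. This
is only an illustration under several assumptions: the no-input timer is
taken to start at time 0, silence detection is reduced to a single
"silence" event (standing in for the complete/incomplete timeouts), and
the event names and the "recognition-timeout" and "success" labels are
hypothetical rather than taken from the draft.

```python
def simulate_hotword(events, no_input_timeout_s, maxspeech_s):
    """Replay time-sorted (time_s, kind) events, where kind is 'speech',
    'silence' or 'match', and return (completion_cause, time_s)."""
    no_input_deadline = float(no_input_timeout_s)  # assumed to start at t = 0
    recognition_deadline = None  # not running until speech is detected

    def next_expiry():
        # Earliest pending timer as (deadline, cause).
        pending = [(no_input_deadline, "no-input-timeout")]
        if recognition_deadline is not None:
            pending.append((recognition_deadline, "recognition-timeout"))
        return min(pending)

    for time_s, kind in events:
        deadline, cause = next_expiry()
        if deadline <= time_s:
            return cause, deadline  # a timer fired before this event
        if kind == "speech" and recognition_deadline is None:
            # Speech (or noise) starts the recognition timer.
            recognition_deadline = time_s + maxspeech_s
        elif kind == "silence":
            # Proposed rule: silence resets the recognition timer.
            recognition_deadline = None
        elif kind == "match":
            # Proposed rule: only a match cancels the no-input timer.
            return "success", time_s

    # No more input: whichever timer is still pending fires next.
    deadline, cause = next_expiry()
    return cause, deadline


# Brief non-matching speech, then silence: no-input timer ends the request.
print(simulate_hotword([(2.0, "speech"), (3.0, "silence")], 40.0, 20.0))
# Continuous noise from t = 2: recognition timer ends the request.
print(simulate_hotword([(2.0, "speech")], 40.0, 20.0))
```

Under this policy the brief utterance 2 seconds into the prompt no longer
causes a maxspeech timeout at 22 seconds, while continuous noise still
does.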
begin:vcard
fn:Andrew Wahbe
n:Wahbe;Andrew
org:VoiceGenie Technologies INC.
adr:8th Floor;;1120 Finch Avenue W.;Toronto;ON;M3J 3H7;Canada
email;internet:awahbe at voicegenie.com
title:Senior Architect
tel;work:(416) 736-0905 ext. 258
tel;fax:(416) 736-1551
x-mozilla-html:TRUE
url:http://www.voicegenie.com
version:2.1
end:vcard
_______________________________________________
Speechsc mailing list
Speechsc at ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc