
[speechsc] Hotword Recognition and Timers



The description of how timers (no-input and recognition) are used during hotword recognition is inconsistent. Section 9.4.7 states that "For a hotword recognition mode, this timer is started when the user begins speaking. Note that for Hotword mode recognition the START-OF-INPUT event is not generated." However, section 9.9 states that for the hotword case: "The Recognition-Timer gets started at the beginning of RECOGNIZE."

It seems that section 9.9 is incorrect (or at least is inconsistent with VoiceXML).

Section 9.9 omits any mention of the no-input timer for the hotword mode recognition case; however, none of the sections that deal with the no-input timer make a distinction between the hotword and non-hotword cases. VoiceXML also does not make this distinction. It would seem that section 9.9 should be changed to indicate that no-input timers are started in the hotword case and that no-input-timeout is a valid completion cause for a hotword recognition.

A related question worth considering is whether the recognition timer is reset at any point, for example, on the detection of silence. Consider the case where maxspeech has a value of, say, 20 seconds (a typical and reasonable value) and hotword barge-in is being used on a prompt that is 30 seconds long. A user who spoke briefly 2 seconds into the prompt (and was silent for the remainder of the prompt) would then experience a maxspeech timeout at about 22 seconds into the prompt. They would not hear the whole prompt, which seems inappropriate. The purpose of the maxspeech timeout is to catch continuous noise and keep it from occupying a recognizer; but what should happen during periods of silence in the hotword case?
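The arithmetic in this scenario can be written out as a tiny Python sketch. It assumes, as section 9.4.7 describes, that the recognition (maxspeech) timer starts at speech onset and is never reset; the helper name maxspeech_fire_time is hypothetical.

```python
def maxspeech_fire_time(speech_onset_s: float, maxspeech_s: float) -> float:
    """Hypothetical helper: the time (seconds from prompt start) at which a
    recognition timer that starts at speech onset and is never reset expires."""
    return speech_onset_s + maxspeech_s


prompt_s = 30.0
fire_s = maxspeech_fire_time(speech_onset_s=2.0, maxspeech_s=20.0)
lost_s = prompt_s - fire_s  # portion of the prompt the caller never hears
print(fire_s, lost_s)  # 22.0 8.0
```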

Similarly, when is the no-input timer canceled in the hotword case? Is it when speech (not necessarily matching) is detected? Or is it only upon a match?

The correct behavior, in my opinion, is that the no-input timer is canceled only on a match, and that the recognition timer should be reset if silence (determined by the complete timeout and incomplete timeout) is detected. If we are just processing intermittent noise, the no-input timer will eventually expire. Continuous noise is handled by the recognition timer. Of course, there are other possibilities as well; this is just one option that I think fits with VoiceXML.
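The proposed behavior can be sketched as a small time-stepped simulation in Python. This is only an illustration under stated assumptions, not MRCP text: a single hypothetical silence_ms stands in for the complete and incomplete timeouts, a match is assumed to be reported when its speech segment ends, and the outcome labels "recognition-timer-expired" and "still-listening" are hypothetical (only "no-input-timeout" comes from the discussion above). The reset_on_silence flag allows comparison with a timer that is never reset.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """A stretch of detected speech or noise, in ms from the start of RECOGNIZE."""
    start_ms: int
    end_ms: int
    matches: bool  # assumption: a match is reported when the segment ends


def simulate_hotword(segments: List[Segment], no_input_timeout_ms: int,
                     recognition_timeout_ms: int, silence_ms: int,
                     horizon_ms: int, reset_on_silence: bool = True,
                     tick_ms: int = 10) -> Tuple[str, int]:
    """Step through time; return (outcome, time_ms) under the proposed policy."""
    recog_started: Optional[int] = None  # when the recognition timer last started
    last_speech = 0                      # last tick at which speech was heard
    for t in range(0, horizon_ms + 1, tick_ms):
        # A match cancels both timers and completes the request.
        if any(s.matches and s.end_ms == t for s in segments):
            return ("success", t)
        if any(s.start_ms <= t < s.end_ms for s in segments):
            last_speech = t
            if recog_started is None:
                recog_started = t  # timer starts when the user begins speaking
        elif (reset_on_silence and recog_started is not None
              and t - last_speech >= silence_ms):
            recog_started = None   # hypothetical silence_ms: reset the timer
        # The no-input timer runs from RECOGNIZE; only a match cancels it.
        if t >= no_input_timeout_ms:
            return ("no-input-timeout", t)
        if recog_started is not None and t - recog_started >= recognition_timeout_ms:
            return ("recognition-timer-expired", t)
    return ("still-listening", horizon_ms)


# Brief speech 2 s into a 30 s prompt, maxspeech 20 s, no-input timeout 40 s.
brief = [Segment(2000, 2500, matches=False)]
print(simulate_hotword(brief, 40000, 20000, 1500, horizon_ms=30000))
# ('still-listening', 30000)
print(simulate_hotword(brief, 40000, 20000, 1500, horizon_ms=30000,
                       reset_on_silence=False))
# ('recognition-timer-expired', 22000)
```

With the reset, the caller hears the whole prompt; without it, the 22-second cutoff described earlier reappears. Continuous noise (for example, one long unmatched segment) still trips the recognition timer, and short bursts of noise eventually end in no-input-timeout.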

Andrew Wahbe
Senior Architect, VoiceGenie Technologies Inc.
8th Floor, 1120 Finch Avenue W., Toronto, ON M3J 3H7, Canada
Tel: (416) 736-0905 ext. 258   Fax: (416) 736-1551
awahbe at voicegenie.com   http://www.voicegenie.com

_______________________________________________
Speechsc mailing list
Speechsc at ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc