[Speechsc] Speaker Verification - Insufficient or Noisy Speech
I previously sent an email asking how a speaker verification system
implementing MRCPv2 should cope when insufficient or poor-quality speech
arrives on the RTP audio stream. This seems to me to be an area where the
specification is deficient. The only feedback I received was one response
saying that, to the sender's knowledge, there were no other implementers of
Speaker Verification.
Below I outline the MRCPv2 exchanges for a training operation:
C->S:  MRCP/2.0 207 START-SESSION 314161
       Channel-Identifier:32AECB23433801@speakverify
       Repository-URI:http://www.example.com/voiceprintdbase/
       Voiceprint-Mode:train
       Voiceprint-Identifier:johnsmith.voiceprint

S->C:  MRCP/2.0 82 314161 200 COMPLETE
       Channel-Identifier:32AECB23433801@speakverify

C->S:  MRCP/2.0 76 VERIFY 314162
       Channel-Identifier:32AECB23433801@speakverify

S->C:  MRCP/2.0 85 314162 200 IN-PROGRESS
       Channel-Identifier:32AECB23433801@speakverify

The end-point detector reports insufficient data (which is buffered) or bad
signal quality (a poor SNR, for example). Note that no START-OF-INPUT event
has been sent, although speech has begun.

S->C:  MRCP/2.0 140 VERIFICATION-COMPLETE 314162 COMPLETE
       Channel-Identifier:32AECB23433801@speakverify
       Completion-Cause:002 no-input-timeout

This is undesirable from my perspective, since it gives the client the
impression that no data has been received (untrue in the insufficient-data
case) and provides no distinction between this and the "bad data" case. That
distinction could be useful to a call-flow designer in an IVR system.
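
To make the client's view concrete, here is a minimal sketch of how a client
might read the Completion-Cause header from the event above. The function
name parse_completion_cause is hypothetical and the parsing is deliberately
simplistic; it is an illustration, not part of any MRCPv2 library.

```python
def parse_completion_cause(message: str) -> tuple[int, str]:
    """Return (code, name) from the Completion-Cause header of a message."""
    for line in message.splitlines():
        # Header lines look like "Name:value"; the start line has no colon.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "completion-cause":
            code, _, cause_name = value.strip().partition(" ")
            return int(code), cause_name.strip()
    raise ValueError("no Completion-Cause header found")


# The VERIFICATION-COMPLETE event from the failed turn above.
event = (
    "MRCP/2.0 140 VERIFICATION-COMPLETE 314162 COMPLETE\r\n"
    "Channel-Identifier:32AECB23433801@speakverify\r\n"
    "Completion-Cause:002 no-input-timeout\r\n"
)

code, cause = parse_completion_cause(event)
print(code, cause)
```

All the client can recover from this event is the pair (2, "no-input-timeout"),
whether the server heard nothing, too little, or speech of poor quality.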
I also note that text-independent verifiers may require several turns' worth
of data for a verification. Several rounds of "no input" timeouts would
surely confuse the client, yet this class of verifiers may be unable to
generate an nlsml+xml response on the nth dialog turn.
The enrolment might then continue:

C->S:  MRCP/2.0 76 VERIFY 314163
       Channel-Identifier:32AECB23433801@speakverify

S->C:  MRCP/2.0 85 314163 200 IN-PROGRESS
       Channel-Identifier:32AECB23433801@speakverify

S->C:  MRCP/2.0 96 START-OF-INPUT 314163 IN-PROGRESS
       Channel-Identifier:32AECB23433801@speakverify

S->C:  MRCP/2.0 131 VERIFICATION-COMPLETE 314163 COMPLETE
       Channel-Identifier:32AECB23433801@speakverify
       Completion-Cause:000 success

C->S:  MRCP/2.0 76 VERIFY 314164
       Channel-Identifier:32AECB23433801@speakverify

S->C:  MRCP/2.0 85 314164 200 IN-PROGRESS
       Channel-Identifier:32AECB23433801@speakverify

S->C:  MRCP/2.0 96 START-OF-INPUT 314164 IN-PROGRESS
       Channel-Identifier:32AECB23433801@speakverify

S->C:  MRCP/2.0 131 VERIFICATION-COMPLETE 314164 COMPLETE
       Channel-Identifier:32AECB23433801@speakverify
       Completion-Cause:000 success

C->S:  MRCP/2.0 81 END-SESSION 314174
       Channel-Identifier:32AECB23433801@speakverify

S->C:  MRCP/2.0 82 314174 200 COMPLETE
       Channel-Identifier:32AECB23433801@speakverify

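
The difference between the first turn and the later ones can be seen from the
start lines alone. The sketch below classifies each start line as a request,
response, or event, and lists the requests whose VERIFICATION-COMPLETE arrived
without a preceding START-OF-INPUT. The function names are hypothetical, and
the classification relies only on the token order visible in the exchanges
above.

```python
def classify_start_line(line: str) -> tuple[str, str, int]:
    """Classify an MRCPv2 start line as (kind, name, request_id)."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "MRCP/2.0":
        raise ValueError(f"not an MRCPv2 start line: {line!r}")
    if tokens[2].isdigit():
        # Response: version length request-id status-code request-state
        return "response", tokens[3], int(tokens[2])
    if len(tokens) == 5:
        # Event: version length event-name request-id request-state
        return "event", tokens[2], int(tokens[3])
    # Request: version length method-name request-id
    return "request", tokens[2], int(tokens[3])


def completed_without_start_of_input(start_lines: list[str]) -> list[int]:
    """Request ids that completed with no START-OF-INPUT event seen first."""
    saw_input: set[int] = set()
    silent: list[int] = []
    for line in start_lines:
        kind, name, request_id = classify_start_line(line)
        if kind != "event":
            continue
        if name == "START-OF-INPUT":
            saw_input.add(request_id)
        elif name == "VERIFICATION-COMPLETE" and request_id not in saw_input:
            silent.append(request_id)
    return silent


# Start lines of the first two VERIFY turns from the exchanges in this message.
trace = [
    "MRCP/2.0 76 VERIFY 314162",
    "MRCP/2.0 85 314162 200 IN-PROGRESS",
    "MRCP/2.0 140 VERIFICATION-COMPLETE 314162 COMPLETE",
    "MRCP/2.0 76 VERIFY 314163",
    "MRCP/2.0 85 314163 200 IN-PROGRESS",
    "MRCP/2.0 96 START-OF-INPUT 314163 IN-PROGRESS",
    "MRCP/2.0 131 VERIFICATION-COMPLETE 314163 COMPLETE",
]
print(completed_without_start_of_input(trace))
```

Only request 314162 is reported, which is the turn where speech had begun but
the server never announced it.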
Since I received no responses (perhaps because it was close to the holiday
season), I will venture a proposal for extending the RFC to include the
bad-signal cases (+ indicates an addition, * a modification):

  +------------+--------------------------+---------------------------+
  | Cause-Code | Cause-Name               | Description               |
  +------------+--------------------------+---------------------------+
  | 000        | success                  | VERIFY or                 |
  |            |                          | VERIFY-FROM-BUFFER        |
  |            |                          | request completed         |
  |            |                          | successfully. The verify  |
  |            |                          | decision can be           |
  |            |                          | "accepted", "rejected",   |
  |            |                          | or "undecided".           |
  | 001        | error                    | VERIFY or                 |
  |            |                          | VERIFY-FROM-BUFFER        |
  |            |                          | request terminated        |
  |            |                          | prematurely due to a      |
  |            |                          | verification resource or  |
  |            |                          | system error.             |
  | 002        | no-input-timeout         | VERIFY request completed  |
  |            |                          | with no result due to a   |
  |            |                          | no-input-timeout.         |
  | 003        | too-much-speech-timeout  | VERIFY request completed  |
  |            |                          | result due to too much    |
  |            |                          | speech.                   |
  | 004        | speech-too-early         | VERIFY request completed  |
  |            |                          | with no result due to     |
  |            |                          | spoke too soon.           |
+ | 005        | insufficient-speech      | VERIFY or                 |
+ |            |                          | VERIFY-FROM-BUFFER        |
+ |            |                          | request completed         |
+ |            |                          | successfully but had      |
+ |            |                          | insufficient speech to    |
+ |            |                          | complete. More speech     |
+ |            |                          | will complete the current |
+ |            |                          | incremental operation     |
+ | 006        | bad-speech               | VERIFY or                 |
+ |            |                          | VERIFY-FROM-BUFFER        |
+ |            |                          | request completed         |
+ |            |                          | unsuccessfully, the       |
+ |            |                          | speech quality was too    |
+ |            |                          | poor                      |
* | 007        | buffer-empty             | VERIFY-FROM-BUFFER        |
  |            |                          | request completed with no |
  |            |                          | result due to empty       |
  |            |                          | buffer.                   |
* | 008        | out-of-sequence          | Verification operation    |
  |            |                          | failed due to             |
  |            |                          | out-of-sequence method    |
  |            |                          | invocations. For example  |
  |            |                          | calling VERIFY before     |
  |            |                          | QUERY-VOICEPRINT.         |
* | 009        | repository-uri-failure   | Failure accessing         |
  |            |                          | Repository URI.           |
* | 010        | repository-uri-missing   | Repository-uri is not     |
  |            |                          | specified.                |
* | 011        | voiceprint-id-missing    | Voiceprint-identification |
  |            |                          | is not specified.         |
* | 012        | voiceprint-id-not-exist  | Voiceprint-identification |
  |            |                          | does not exist in the     |
  |            |                          | voiceprint repository.    |
  +------------+--------------------------+---------------------------+

Alternatively, the new entries could be appended for compatibility. The only
disadvantage of doing so would be that entries would not be grouped by
category in the table.
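
The trade-off between the two numbering options can be sketched as follows.
This is only an illustration: it assumes that the entries marked * currently
carry codes 005 through 010 (inferred from the two insertions ahead of them),
and that appending would give the new causes codes 011 and 012.

```python
# Cause names in their assumed current order (codes 000 to 010).
CURRENT_NAMES = [
    "success", "error", "no-input-timeout", "too-much-speech-timeout",
    "speech-too-early", "buffer-empty", "out-of-sequence",
    "repository-uri-failure", "repository-uri-missing",
    "voiceprint-id-missing", "voiceprint-id-not-exist",
]
NEW_NAMES = ["insufficient-speech", "bad-speech"]


def number(names: list[str]) -> dict[str, int]:
    """Assign consecutive cause codes, starting at 000, in list order."""
    return {name: code for code, name in enumerate(names)}


current = number(CURRENT_NAMES)
# Option 1: insert the new causes after speech-too-early, as in the table.
inserted = number(CURRENT_NAMES[:5] + NEW_NAMES + CURRENT_NAMES[5:])
# Option 2: append the new causes after the existing ones.
appended = number(CURRENT_NAMES + NEW_NAMES)


def renumbered(old: dict[str, int], new: dict[str, int]) -> list[str]:
    """Existing cause names whose code differs between two numberings."""
    return sorted(name for name in old if old[name] != new[name])


print("inserted:", renumbered(current, inserted))
print("appended:", renumbered(current, appended))
```

Under these assumptions, inserting shifts six existing codes, while appending
shifts none, at the cost of separating the speech-related causes in the table.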
I'll happily accept any corrections to my understanding, in case I have
misread the spec, and any feedback on my suggestions.
NIK WALDRON