I previously sent an email asking how a speaker verification system
implementing MRCPv2 should cope when insufficient or poor-quality
speech arrives on the RTP audio stream. It seemed to me that this was
an area of some deficiency in the specification. I received no
feedback other than one response saying that, to his knowledge, there
were no other implementers of Speaker Verification.

Below I outline the MRCPv2 exchanges for a training operation:

C->S: MRCP/2.0 207 START-SESSION 314161
      Channel-Identifier:32AECB23433801@speakverify
      Repository-URI:http://www.example.com/voiceprintdbase/
      Voiceprint-Mode:train
      Voiceprint-Identifier:johnsmith.voiceprint

S->C: MRCP/2.0 82 314161 200 COMPLETE
      Channel-Identifier:32AECB23433801@speakverify

C->S: MRCP/2.0 76 VERIFY 314162
      Channel-Identifier:32AECB23433801@speakverify

S->C: MRCP/2.0 85 314162 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speakverify

The end-point detector then shows insufficient data (which is
buffered) or bad signal quality (a bad SNR, for example). Note that
START-OF-INPUT has NOT been sent, although speech has begun.

S->C: MRCP/2.0 140 VERIFICATION-COMPLETE 314162 COMPLETE
      Channel-Identifier:32AECB23433801@speakverify
      Completion-Cause:002 no-input-timeout

This is undesirable from my perspective since it gives the client the
impression that no data has been received (untrue in the insufficient
data case), and provides no distinction between this and the "bad
data" case. That distinction might be useful to a call-flow designer
in an IVR system.

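To make the client's view concrete, here is a minimal Python sketch of
reading the Completion-Cause out of the VERIFICATION-COMPLETE event
quoted above. The helper names parse_event and completion_cause are my
own, for illustration only, and the start-line layout is simply
inferred from that one example. The point is that "002
no-input-timeout" is all the client has to go on.

```python
# Illustrative only: field layout is taken from the event quoted above.
EXAMPLE = """\
MRCP/2.0 140 VERIFICATION-COMPLETE 314162 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
Completion-Cause:002 no-input-timeout
"""


def parse_event(message):
    """Split an MRCPv2 event into its start-line fields and a header dict."""
    lines = [line.strip() for line in message.splitlines() if line.strip()]
    # Event start-line: version, message-length, event-name,
    # request-id, request-state.
    _version, _length, event, request_id, state = lines[0].split()
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "event": event,
        "request_id": int(request_id),
        "state": state,
        "headers": headers,
    }


def completion_cause(parsed):
    """Return (code, name) from Completion-Cause, or (None, None) if absent."""
    value = parsed["headers"].get("completion-cause")
    if value is None:
        return None, None
    code, _, name = value.partition(" ")
    return code, name


parsed = parse_event(EXAMPLE)
print(parsed["event"], completion_cause(parsed))
```

With only this cause available, a call flow cannot tell "the caller
said nothing" apart from "the caller spoke, but not enough" or "the
caller spoke, but the audio was unusable".
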
I also note that in the case of text-independent verifiers several
turns' worth of data may be required for a verification. Several
rounds of "no input" timeouts would surely be confusing to the client,
yet this class of verifiers may be unable to generate an nlsml+xml
response on the nth dialog turn.

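As a rough illustration of the multi-turn case, the Python sketch below
simulates a text-independent verifier that buffers speech across turns
and can only produce a result once enough has accumulated. The
function name, the per-turn durations and the 10-second requirement
are all made-up values for illustration; real verifiers will differ.

```python
def causes_per_turn(turn_seconds, needed_seconds):
    """Return the Completion-Cause the client sees after each VERIFY turn.

    Hypothetical model: speech is buffered across turns, and with the
    cause codes available today a turn that still falls short is
    reported as a no-input timeout.
    """
    buffered = 0.0
    causes = []
    for seconds in turn_seconds:
        buffered += seconds
        if buffered >= needed_seconds:
            causes.append("000 success")
        else:
            causes.append("002 no-input-timeout")
    return causes


# Three turns of 3 s, 4 s and 5 s of speech against a 10 s requirement.
for turn, cause in enumerate(causes_per_turn([3.0, 4.0, 5.0], 10.0), start=1):
    print(turn, cause)
```

In this model the first two turns both look like silence to the
client, even though seven seconds of usable speech were collected.
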
The enrolment might then continue:

C->S: MRCP/2.0 76 VERIFY 314163
      Channel-Identifier:32AECB23433801@speakverify

S->C: MRCP/2.0 85 314163 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speakverify

S->C: MRCP/2.0 96 START-OF-INPUT 314163 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speakverify

S->C: MRCP/2.0 131 VERIFICATION-COMPLETE 314163 COMPLETE
      Channel-Identifier:32AECB23433801@speakverify
      Completion-Cause:000 success

C->S: MRCP/2.0 76 VERIFY 314164
      Channel-Identifier:32AECB23433801@speakverify

S->C: MRCP/2.0 85 314164 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speakverify

S->C: MRCP/2.0 96 START-OF-INPUT 314164 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speakverify

S->C: MRCP/2.0 131 VERIFICATION-COMPLETE 314164 COMPLETE
      Channel-Identifier:32AECB23433801@speakverify
      Completion-Cause:000 success

C->S: MRCP/2.0 81 END-SESSION 314174
      Channel-Identifier:32AECB23433801@speakverify

S->C: MRCP/2.0 82 314174 200 COMPLETE
      Channel-Identifier:32AECB23433801@speakverify

Since I received no responses (perhaps due to being close to the
holiday season), I will venture a proposal for extending the RFC to
include the bad signal cases (+ indicates an addition, * a
modification):

  +------------+--------------------------+---------------------------+
  | Cause-Code | Cause-Name               | Description               |
  +------------+--------------------------+---------------------------+
  | 000        | success                  | VERIFY or                 |
  |            |                          | VERIFY-FROM-BUFFER        |
  |            |                          | request completed         |
  |            |                          | successfully. The verify  |
  |            |                          | decision can be           |
  |            |                          | "accepted", "rejected",   |
  |            |                          | or "undecided".           |
  | 001        | error                    | VERIFY or                 |
  |            |                          | VERIFY-FROM-BUFFER        |
  |            |                          | request terminated        |
  |            |                          | prematurely due to a      |
  |            |                          | verification resource or  |
  |            |                          | system error.             |
  | 002        | no-input-timeout         | VERIFY request completed  |
  |            |                          | with no result due to a   |
  |            |                          | no-input-timeout.         |
  | 003        | too-much-speech-timeout  | VERIFY request completed  |
  |            |                          | with no result due to too |
  |            |                          | much speech.              |
  | 004        | speech-too-early         | VERIFY request completed  |
  |            |                          | with no result due to     |
  |            |                          | speech too soon.          |
+ | 005        | insufficient-speech      | VERIFY or                 |
+ |            |                          | VERIFY-FROM-BUFFER        |
+ |            |                          | request completed         |
+ |            |                          | successfully but had      |
+ |            |                          | insufficient speech to    |
+ |            |                          | complete. More speech     |
+ |            |                          | will complete the current |
+ |            |                          | incremental operation.    |
+ | 006        | bad-speech               | VERIFY or                 |
+ |            |                          | VERIFY-FROM-BUFFER        |
+ |            |                          | request completed         |
+ |            |                          | unsuccessfully; the       |
+ |            |                          | speech quality was too    |
+ |            |                          | poor.                     |
* | 007        | buffer-empty             | VERIFY-FROM-BUFFER        |
  |            |                          | request completed with no |
  |            |                          | result due to empty       |
  |            |                          | buffer.                   |
* | 008        | out-of-sequence          | Verification operation    |
  |            |                          | failed due to             |
  |            |                          | out-of-sequence method    |
  |            |                          | invocations. For example, |
  |            |                          | calling VERIFY before     |
  |            |                          | QUERY-VOICEPRINT.         |
* | 009        | repository-uri-failure   | Failure accessing         |
  |            |                          | Repository URI.           |
* | 010        | repository-uri-missing   | Repository-uri is not     |
  |            |                          | specified.                |
* | 011        | voiceprint-id-missing    | Voiceprint-identification |
  |            |                          | is not specified.         |
* | 012        | voiceprint-id-not-exist  | Voiceprint-identification |
  |            |                          | does not exist in the     |
  |            |                          | voiceprint repository.    |
  +------------+--------------------------+---------------------------+

Alternatively, the new entries could be appended for compatibility.
The only disadvantage of doing so would be that entries would not be
grouped in the table by category.

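The compatibility trade-off between the two numbering options can be
sketched in Python. This assumes the rows marked * currently occupy
codes 005 through 010 (that is, they are "modified" only because
inserting two rows shifts them up by two); the list and function names
are mine, for illustration.

```python
# Cause names in table order. SHIFTED holds the rows marked '*',
# NEW holds the two proposed additions marked '+'.
BASE = [
    "success",
    "error",
    "no-input-timeout",
    "too-much-speech-timeout",
    "speech-too-early",
]
SHIFTED = [
    "buffer-empty",
    "out-of-sequence",
    "repository-uri-failure",
    "repository-uri-missing",
    "voiceprint-id-missing",
    "voiceprint-id-not-exist",
]
NEW = ["insufficient-speech", "bad-speech"]


def number(names):
    """Assign consecutive three-digit cause codes starting at 000."""
    return {f"{index:03d}": name for index, name in enumerate(names)}


current = number(BASE + SHIFTED)          # assumed existing numbering
inserted = number(BASE + NEW + SHIFTED)   # the table as proposed
appended = number(BASE + SHIFTED + NEW)   # the alternative: append


def changed_codes(old, new):
    """Codes an existing client knows whose meaning differs in `new`."""
    return sorted(code for code in old if new.get(code) != old[code])


print("inserted:", changed_codes(current, inserted))
print("appended:", changed_codes(current, appended))
```

Under these assumptions, inserting the new rows changes the meaning of
every existing code from 005 upward, while appending changes none. A
client that switches on the cause name rather than the number would be
unaffected either way.
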
I'll happily accept any corrections to my understanding, in case I
have misread the spec, or feedback on my suggestions.

NIK WALDRON