Thanks for your response, Dan.

The additional code resolves problem (2), noisy or otherwise ‘bad’
input, and (3) clarifies how to specify that additional data is needed
for training. I had not realised that the result structure was intended
to be used for enrolments as well as verifications. I’m not sure
whether my confusion extends beyond myself and justifies an explanatory
note in the verification section. Thanks for the clarification in any
case.

I think that the document would benefit from an appendix (or a separate
document, as is the case for SDP) with examples of all of the major use
cases. In my opinion, examples often resolve confusion for readers
learning a new protocol. I note that there are examples in the
document, although no training (enrolment) examples that I recall for
speaker verification.

I appreciate the enormous effort that goes into producing a standard
protocol (everyone’s a critic). I’d be happy to contribute some example
conversations for Verification if such a section or document eventuates.

Best regards,
NIK WALDRON

From: dburnett at voxeo.com [mailto:dburnett at voxeo.com]
Sent: Wednesday, May 06, 2009 6:29 AM
To: Nik Waldron
Cc: speechsc at ietf.org
Subject: Re: [Speechsc] Speaker Verification - Insufficient or Noisy Speech

Nik,
Thanks for your email.
There are three cases in what you have described:
1. speech not detected (because of SNR problem, etc.). This will
return no-input-timeout, just as it would for a speech recognizer.
2. speech detected, neither too early (speech-too-early) nor too much
(too-much-speech-timeout), but still unusable by the training or
verification process. Note that this could happen if the speech
passes the endpointer threshold but is too garbled or noisy to be of
use to the verification engine.
This case is not handled in MRCP today. I have added error code 011,
"speech-not-usable", for this case.
3. additional turns are needed: the <decision> result element can be
used for this. "undecided" was the value we chose to represent the
case where the engine did not yet have enough data to decide on a
verification or training result. Note that training decisions can
also be "accepted" or "rejected" just like verification results -- the
former case means there is sufficient training data and the new
voiceprint is acceptable. The latter means there is sufficient
training data but the new voiceprint is rejected, because for example
it is too close to an existing voiceprint.
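
As a rough illustration of how a client might tell the three cases above apart, the sketch below maps a completion cause name and an optional <decision> value to a next dialog step. The function name `next_action` and the action strings it returns are hypothetical labels for an IVR call flow, and "speech-not-usable" is taken from the description above rather than from a published registry.

```python
from typing import Optional


def next_action(cause_name: str, decision: Optional[str] = None) -> str:
    """Pick a dialog step from a completion cause name and <decision> value.

    The returned strings are hypothetical call-flow labels, not MRCP terms.
    """
    if cause_name == "no-input-timeout":
        # Case 1: no speech was detected at all.
        return "reprompt-no-input"
    if cause_name == "speech-not-usable":
        # Case 2: speech was detected but the engine could not use it.
        return "reprompt-bad-audio"
    if cause_name == "success":
        # Case 3: the request completed; the decision says whether the
        # engine has enough data yet.
        if decision == "undecided":
            return "collect-another-turn"
        if decision in ("accepted", "rejected"):
            return "finish-" + decision
        raise ValueError(f"unexpected decision: {decision!r}")
    # Any other cause is treated as an error in this sketch.
    return "handle-error"


if __name__ == "__main__":
    print(next_action("no-input-timeout"))
    print(next_action("success", "undecided"))
```

The same mapping would apply to training and verification alike, since both can report "accepted", "rejected", or "undecided".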
-- dan
On Jan 11, 2009, at 7:06 PM, Nik Waldron wrote:
> I sent an email previously requesting information on how a speaker
> verification system implementing MRCPv2 should cope in the situation,
> where there was insufficient or poor quality speech arriving on the
> RTP audio stream. It seemed to me that was an area of some deficiency
> in the specification. I received no feedback other than one response
> saying that to his knowledge there were no other implementers for
> Speaker Verification.
>
> Below I outline the MRCPv2 exchanges for a training operation:
>
> C->S: MRCP/2.0 207 START-SESSION 314161
>       Channel-Identifier:32AECB23433801 at speakverify
>       Repository-URI:http://www.example.com/voiceprintdbase/
>       Voiceprint-Mode:train
>       Voiceprint-Identifier:johnsmith.voiceprint
>
> S->C: MRCP/2.0 82 314161 200 COMPLETE
>       Channel-Identifier:32AECB23433801 at speakverify
>
> C->S: MRCP/2.0 76 VERIFY 314162
>       Channel-Identifier:32AECB23433801 at speakverify
>
> S->C: MRCP/2.0 85 314162 200 IN-PROGRESS
>       Channel-Identifier:32AECB23433801 at speakverify
>
> The end-point detector shows insufficient data (which is buffered),
> or bad signal quality (bad SNR for example). Note that START-OF-INPUT
> has NOT been sent although speech has begun.
>
> S->C: MRCP/2.0 140 VERIFICATION-COMPLETE 314162 COMPLETE
>       Channel-Identifier:32AECB23433801 at speakverify
>       Completion-Cause:002 no-input-timeout
>
> This is undesirable from my perspective since it gives the impression
> to the client that no data has been received (untrue in the
> insufficient data case), and provides no distinction between this and
> the "bad data" case. This information might be of utility to a
> call-flow designer in an IVR system.
>
> I also note that in the case of text-independent verifiers several
> turns worth of data may be required for a verification. Several rounds
> of "no input" timeouts would surely be confusing to the client, yet
> this class of verifiers may be unable to generate an nlsml+xml
> response on the nth dialog turn.
>
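
To make the multi-turn situation concrete, here is a minimal sketch of a client loop that keeps issuing VERIFY turns while the engine has not yet reached a decision. It assumes each turn yields a decision string of "accepted", "rejected", or "undecided"; the names `run_turns` and `verify_once`, and the `max_turns` cap, are hypothetical and only stand in for a real MRCP exchange.

```python
from typing import Callable, List


def run_turns(verify_once: Callable[[], str], max_turns: int = 5) -> List[str]:
    """Call verify_once until it stops returning "undecided".

    verify_once is a hypothetical callable standing in for one VERIFY
    request and its VERIFICATION-COMPLETE event. The list of decisions
    seen so far is returned so a caller can inspect the last one.
    """
    decisions: List[str] = []
    for _ in range(max_turns):
        decision = verify_once()
        decisions.append(decision)
        if decision != "undecided":
            # The engine has enough data; stop prompting the caller.
            break
    return decisions


if __name__ == "__main__":
    # Scripted decisions simulate an engine that needs three turns.
    scripted = iter(["undecided", "undecided", "accepted"])
    print(run_turns(lambda: next(scripted)))
```

A cap on the number of turns is a design choice for the call flow rather than anything the protocol requires.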
> The enrolment might then continue:
>
> C->S: MRCP/2.0 76 VERIFY 314163
>       Channel-Identifier:32AECB23433801 at speakverify
>
> S->C: MRCP/2.0 85 314163 200 IN-PROGRESS
>       Channel-Identifier:32AECB23433801 at speakverify
>
> S->C: MRCP/2.0 96 START-OF-INPUT 314163 IN-PROGRESS
>       Channel-Identifier:32AECB23433801 at speakverify
>
> S->C: MRCP/2.0 131 VERIFICATION-COMPLETE 314163 COMPLETE
>       Channel-Identifier:32AECB23433801 at speakverify
>       Completion-Cause:000 success
>
> C->S: MRCP/2.0 76 VERIFY 314164
>       Channel-Identifier:32AECB23433801 at speakverify
>
> S->C: MRCP/2.0 85 314164 200 IN-PROGRESS
>       Channel-Identifier:32AECB23433801 at speakverify
>
> S->C: MRCP/2.0 96 START-OF-INPUT 314164 IN-PROGRESS
>       Channel-Identifier:32AECB23433801 at speakverify
>
> S->C: MRCP/2.0 131 VERIFICATION-COMPLETE 314164 COMPLETE
>       Channel-Identifier:32AECB23433801 at speakverify
>       Completion-Cause:000 success
>
> C->S: MRCP/2.0 81 END-SESSION 314174
>       Channel-Identifier:32AECB23433801 at speakverify
>
> S->C: MRCP/2.0 82 314174 200 COMPLETE
>       Channel-Identifier:32AECB23433801 at speakverify
>
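
The number after "MRCP/2.0" in each start line is the message length in octets, counted over the whole message including the start line itself. Because the field counts its own digits, computing it takes a small fixed-point step. The sketch below shows one way to do that for a request without a body; `build_request` is a hypothetical helper, not part of any MRCP library, and it assumes the " at " in the archived Channel-Identifier stands for a single "@".

```python
from typing import Dict


def build_request(method: str, request_id: int, headers: Dict[str, str]) -> bytes:
    """Serialize a body-less MRCPv2 request with a self-consistent length.

    Hypothetical helper for illustration only.
    """
    header_block = "".join(f"{name}:{value}\r\n" for name, value in headers.items())
    prefix = b"MRCP/2.0 "
    # Everything after the length field: method, request id, headers,
    # and the blank line that ends the header section.
    tail = f" {method} {request_id}\r\n{header_block}\r\n".encode("ascii")
    fixed = len(prefix) + len(tail)

    # The length field counts its own digits, so iterate until the value
    # written into the start line equals the total size of the message.
    length = fixed + 1
    while fixed + len(str(length)) != length:
        length = fixed + len(str(length))
    return prefix + str(length).encode("ascii") + tail


if __name__ == "__main__":
    # Assumption: the list archive renders "@" as " at ".
    message = build_request(
        "VERIFY", 314162, {"Channel-Identifier": "32AECB23433801@speakverify"}
    )
    print(message.decode("ascii").splitlines()[0])  # MRCP/2.0 76 VERIFY 314162
```

Under that assumption the computed value matches the 76 shown in the VERIFY requests above.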
> Since I received no responses (perhaps due to being close to the
> holiday season), I will venture a proposal for extending the RFC to
> include the bad signal cases (+ indicates an addition, * a
> modification)
>
>   +------------+--------------------------+---------------------------+
>   | Cause-Code | Cause-Name               | Description               |
>   +------------+--------------------------+---------------------------+
>   | 000        | success                  | VERIFY or                 |
>   |            |                          | VERIFY-FROM-BUFFER        |
>   |            |                          | request completed         |
>   |            |                          | successfully. The verify  |
>   |            |                          | decision can be           |
>   |            |                          | "accepted", "rejected",   |
>   |            |                          | or "undecided".           |
>   | 001        | error                    | VERIFY or                 |
>   |            |                          | VERIFY-FROM-BUFFER        |
>   |            |                          | request terminated        |
>   |            |                          | prematurely due to a      |
>   |            |                          | verification resource or  |
>   |            |                          | system error.             |
>   | 002        | no-input-timeout         | VERIFY request completed  |
>   |            |                          | with no result due to a   |
>   |            |                          | no-input-timeout.         |
>   | 003        | too-much-speech-timeout  | VERIFY request completed  |
>   |            |                          | result due to too much    |
>   |            |                          | speech.                   |
>   | 004        | speech-too-early         | VERIFY request completed  |
>   |            |                          | with no result due to     |
>   |            |                          | spoke too soon.           |
> + | 005        | insufficient-speech      | VERIFY or                 |
> + |            |                          | VERIFY-FROM-BUFFER        |
> + |            |                          | request completed         |
> + |            |                          | successfully but had      |
> + |            |                          | insufficient speech to    |
> + |            |                          | complete. More speech     |
> + |            |                          | will complete the current |
> + |            |                          | incremental operation     |
> + | 006        | bad-speech               | VERIFY or                 |
> + |            |                          | VERIFY-FROM-BUFFER        |
> + |            |                          | request completed         |
> + |            |                          | unsuccessfully, the       |
> + |            |                          | speech quality was too    |
> + |            |                          | poor                      |
> * | 007        | buffer-empty             | VERIFY-FROM-BUFFER        |
>   |            |                          | request completed with no |
>   |            |                          | result due to empty       |
>   |            |                          | buffer.                   |
> * | 008        | out-of-sequence          | Verification operation    |
>   |            |                          | failed due to             |
>   |            |                          | out-of-sequence method    |
>   |            |                          | invocations. For example  |
>   |            |                          | calling VERIFY before     |
>   |            |                          | QUERY-VOICEPRINT.         |
> * | 009        | repository-uri-failure   | Failure accessing         |
>   |            |                          | Repository URI.           |
> * | 010        | repository-uri-missing   | Repository-uri is not     |
>   |            |                          | specified.                |
> * | 011        | voiceprint-id-missing    | Voiceprint-identification |
>   |            |                          | is not specified.         |
> * | 012        | voiceprint-id-not-exist  | Voiceprint-identification |
>   |            |                          | does not exist in the     |
>   |            |                          | voiceprint repository.    |
>   +------------+--------------------------+---------------------------+
>
> Alternatively the new entries could be appended for compatibility.
> The only disadvantage to doing so would be that entries would not be
> grouped in the table by category.
>
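
Whichever numbering is chosen, a client could reduce its sensitivity to renumbering by keying its logic on the cause name rather than the numeric code. The sketch below parses a Completion-Cause header line as it appears in the exchanges above and checks the name against a set of causes that warrant another prompt. The names `parse_completion_cause`, `RETRY_CAUSES`, and `should_reprompt` are hypothetical, and "insufficient-speech" and "bad-speech" are the proposed names from the table, not registered values.

```python
from typing import NamedTuple


class CompletionCause(NamedTuple):
    code: int
    name: str


def parse_completion_cause(header_line: str) -> CompletionCause:
    """Split a line such as 'Completion-Cause:002 no-input-timeout'."""
    field, _, value = header_line.partition(":")
    if field.strip().lower() != "completion-cause":
        raise ValueError(f"not a Completion-Cause header: {header_line!r}")
    # The value is a numeric code followed by the cause name.
    code_text, name = value.split()
    return CompletionCause(int(code_text), name)


# Proposed and existing cause names after which a client might prompt again.
RETRY_CAUSES = {"no-input-timeout", "insufficient-speech", "bad-speech"}


def should_reprompt(cause: CompletionCause) -> bool:
    """Decide by name, so the decision survives a change of numeric code."""
    return cause.name in RETRY_CAUSES


if __name__ == "__main__":
    cause = parse_completion_cause("Completion-Cause:002 no-input-timeout")
    print(cause, should_reprompt(cause))
```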
> I'll happily accept any corrections to my understanding, in case I
> have misread the spec, or feedback on my suggestions.
>
>
>
>
> NIK WALDRON
>