
RE: [Speechsc] Media stream synchronization in RECOGNIZE method



I agree with Sarvi.

I think that draft 9 spells out the required support in section 9.9,
RECOGNIZE:
   "The recognizer SHOULD expect the media to start flowing when it
   receives the recognize request, but SHOULD NOT buffer anything it
   receives beforehand."

The only exception would be for DTMF, as described in section 9.4.32,
DTMF-Buffer-Time.

Thanks,

Brett Gavagni 
WebSphere Voice Server Development 
http://www-306.ibm.com/software/pervasive/voice_server/
gavagni at us.ibm.com




"Shanmugham, Saravanan" <sarvi at cisco.com> 
03/16/2006 01:10 PM

To
<ilyak at nscspeech.com>, <speechsc at ietf.org>
cc

Subject
RE: [Speechsc] Media stream synchronization in RECOGNIZE method






I should let some of the server implementors respond to it.
 
From the standards point of view, the server should not buffer audio
received before the RECOGNIZE command.
This is mainly because we wouldn't know how much to buffer, and if we
buffer too long we would end up using audio segments received before the
RECOGNIZE command, which could be from a previous utterance.
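
For illustration only, here is a minimal sketch of one way a server could
apply this rule: audio that arrives while no RECOGNIZE is active is dropped
rather than buffered. The class and method names (RecognizerMediaGate,
on_recognize, on_rtp_payload, on_recognition_complete) are hypothetical and
do not come from the draft or from any particular implementation. DTMF
type-ahead is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecognizerMediaGate:
    """Hypothetical gate between an RTP receiver and a recognizer."""
    active: bool = False   # True while a RECOGNIZE request is in progress
    audio: List[bytes] = field(default_factory=list)
    dropped: int = 0       # payloads discarded outside a RECOGNIZE

    def on_recognize(self) -> None:
        # Start from an empty buffer so audio from a previous
        # utterance cannot leak into this recognition.
        self.audio.clear()
        self.active = True

    def on_rtp_payload(self, payload: bytes) -> None:
        if self.active:
            self.audio.append(payload)
        else:
            # Received before RECOGNIZE: discard instead of buffering.
            self.dropped += 1

    def on_recognition_complete(self) -> bytes:
        # Close the gate and hand back the audio collected for this request.
        self.active = False
        utterance = b"".join(self.audio)
        self.audio.clear()
        return utterance


gate = RecognizerMediaGate()
gate.on_rtp_payload(b"old")      # arrives before RECOGNIZE, so it is dropped
gate.on_recognize()
gate.on_rtp_payload(b"he")
gate.on_rtp_payload(b"llo")
result = gate.on_recognition_complete()
```

In this sketch only the payloads received after on_recognize() end up in
the utterance; a real server would also need to handle RTP ordering and
timing, which are left out here.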
 
Sarvi 

From: Ilya Knyazhansky [mailto:iltak at nscspeech.com] 
Sent: Wednesday, March 01, 2006 7:31 AM
To: speechsc at ietf.org
Subject: [Speechsc] Media stream synchronization in RECOGNIZE method

Hi All,
 
I am currently implementing the MRCP interface for an ASR developed by our
company and came across the problem of synchronizing the media stream
(RTP packets) with the recognition request (RECOGNIZE method). The
protocol draft states:
 
(Somewhere around page 100 of draft-ietf-speechsc-mrcpv2-09.txt)
<<
   A number of
   mechanisms exist to resolve this condition and the mechanism chosen
   is left to the implementers of recognition resource.  The recognizer
   SHOULD expect the media to start flowing when it receives the
   recognize request, but SHOULD NOT buffer anything it receives
   beforehand.
>>
 
I have a couple of questions regarding this statement:
 
1. What are the proposed mechanisms? I would appreciate any pointer to
some sort of solution/algorithm.
2. How can this be achieved without buffering packets received before the
recognition request?
 
Thanks,
Ilya Knyazhansky
ilyak at nscspeech.com
 _______________________________________________
Speechsc mailing list
Speechsc at ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc


