Last Modified: 2004-01-22
The speechsc Work Group will develop protocols to support distributed media processing of audio streams. The focus of this working group is to develop protocols to support ASR, TTS, and SV. The working group will only focus on the secure distributed control of these servers.
The working group will develop an informational RFC detailing the architecture and requirements for distributed speechsc control. In addition, the requirements document will describe the use cases driving these requirements. The working group will then examine existing media-related protocols, especially RTSP, for suitability as a protocol for carriage of speechsc server control. The working group will then propose extensions to existing protocols or the development of new protocols, as appropriate, to meet the requirements specified in the informational RFC.
The protocol will assume RTP carriage of media. Assuming session-oriented media transport, the protocol will use SDP to describe the session.
The working group will not be investigating distributed speech recognition (DSR), as exemplified by the ETSI Aurora project. The working group will not be recreating functionality available in other protocols, such as SIP or SDP. The working group will offer changes to existing protocols, with the possible exception of RTSP, to the appropriate IETF work group for consideration. This working group will explore modifications to RTSP, if required.
It is expected that we will coordinate our work in the IETF with the W3C Multimodal Interaction Work Group; the ITU-T Study Group 16 Working Party 3/16 on SG 16 Question 15/16; the 3GPP TSG SA WG1; and the ETSI Aurora STQ.
Once the current set of milestones is completed, the speechsc charter may be expanded, with IESG approval, to cover additional uses of the technology, such as the orchestration of multiple ASR/TTS/SV servers, the accommodation of additional types of servers such as simultaneous translation servers, etc.
|Done||Requirements ID submitted to IESG for publication (informational)|
|Done||Submit Internet Draft(s) Analyzing Existing Protocols (informational)|
|Done||Submit Internet Draft Describing New Protocol (if required) (standards track)|
|Oct 03||Submit Drafts to IESG for publication|
SpeechSC Minutes 040302 17.00-18.00
Magnus Westerlund

Chairs Introduction and WG Status
---------------------------------
The WG chairs started with agenda bashing, followed by a presentation of the WG status. The SpeechSC requirements document has been approved by the IESG with some smaller edits requested. However, the document was lost between the IESG and the RFC Editor, which has delayed publication. There is a milestone to request publication of any drafts by October 03. The current goal is to request publication no later than October 04.

Separate Record Function Discussion
-----------------------------------
The question to the WG was: Should there be an explicit recording function in MRCPv2? The draft version 01 does not have a pure recording function. Recording behaviour is determined by using speech recognition or at least voice activity detection. There was some discussion around the use cases for recording. One use case mentioned for this behaviour is voice mail recording, where the recording is controlled through voice recognition. Therefore there is a desire to have a RECORD resource and a RECORD method which has a header indicating how the recorder should perform speech recognition. Another use case that matches this behaviour is recording for training or verification. The conclusion of the discussion was that there is no expressed need for blind recording; anyone needing this can use RTSP record. This use case should also be mentioned in the protocol spec to motivate the functionality.

MRCPv2
------
The draft version 02 was made available on the mailing list and will be submitted when firstname.lastname@example.org opens again. A number of open issues were discussed. The presentation was made by Sarvi Shanmugham.

NAT traversal for the MRCPv2 TCP control channel setup: As long as only one end-point is in a private space it is possible to make things work. If both entities are in a private space a relay will be needed.
To get TCP to work, some signalling is needed to indicate how the TCP connection should be set up. This is similar to the MMUSIC work on Co-Media (draft-ietf-mmusic-sdp-comedia-05).

Do we need an INTERMEDIATE-RECOG-RESULT: The WG was asked whether there is any need for this functionality. Nobody expressed any desire for it. Unless anyone on the mailing list expresses a real need for this functionality, it will not be included. A mail will be sent to the mailing list to ask this question.

Speech or hotword barge-in: Eric Burger asked if there is any protocol difference between the two. The answer is that the real issue is identifying which type of barge-in has happened, as there may be policies accepting either type. The conclusion is to confirm with the mailing list that this feature is included if a solution exists.

Multiple instances of a header field vs. a single header field with multiple values: First there was some discussion around the historical reasons. Then it was asked: Is there any reason not to leave it as it is? Nobody had a real reason to change it from how SIP and HTTP handle headers. The list shall be asked whether anyone knows a reason to change it.

Header field ranges for Confidence-Threshold, Speed-Vs-Accuracy etc. (0.0 - 1.0 or 0 - 100): There was consensus in the room to use 0.0-1.0 ranges. It will be confirmed by the mailing list.

Proposal to specify a fixed header with a vendor identifier and a vendor registry for the Recognizer context block: David Oran stated that the IESG has concerns with vendor-specific extensions that make things fail. To make this work, the specification needs to ensure that the header is optional and can be ignored. No new error cases should be generated by this. The motivation was also discussed: the header allows some resources to work better, but they are not required to use it. A proposal was: The client MUST copy the header field to the next resource within the session. There was some discussion of making the MUST a SHOULD.
After some more discussion around the issues the following conclusion was reached in the room: Client must copy, Server must not barf. The server is not allowed to reject a request based on an empty or absent header. The WG should also look into whether there is an existing vendor registry that can be leveraged, for example with IANA.

DTMF support and RFC 2833 support: The conclusion was: If one supports DTMF, one MUST support RFC 2833. Confirm consensus on the mailing list.

Security support - sips? https? Digest? SRTP?: What is the minimal security support to implement? To help interoperability it is normal to require a single solution as mandatory when having security features. The discussion was split into the different parts. For the MRCP channel there was consensus for having MANDATORY support of TLS. For the media channel, the initial proposal from David Oran was: when transmitting media streams requiring security one uses either SRTP or IPSec. Further discussion observed that no other place in the system has a requirement for IPSec. Therefore there might be a reason for looking more at SRTP. A further comment was that the WG should contact security area advisors early on to have them check the proposal, thus avoiding late surprises. Another question was: how does one indicate in SDP that one should use IPSec for a media stream?

Define grammars one at a time: This proposal was supported by the room.

PAUSE on Barge In: Is there a need to have this instead of Stop on barge-in? There were no comments raised on this topic.

There are two proposals for similar functionality of determining available functionality: "OPTION commands for m-lines" and "SIP Callee capability for resource description and capability". The proposal is to check if the SIP Callee method can solve everything. If not, then one needs to look into SIP OPTIONS.
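
The room's conclusion on the vendor-specific header (client must copy; server must not reject a request because the header is empty or absent) can be illustrated with a minimal sketch. This is not taken from the MRCPv2 draft: the header name `Vendor-Context`, the `vendor;data` value layout, and the registry contents are all hypothetical, chosen only to show the rule.

```python
from typing import Dict, Optional

# Hypothetical header name and value layout ("vendor;data"); the minutes
# do not name the field or define its syntax.
VENDOR_HEADER = "Vendor-Context"
# Hypothetical registry contents, for illustration only.
KNOWN_VENDORS = {"example-vendor"}


def carry_vendor_header(previous_response: Dict[str, str],
                        next_request: Dict[str, str]) -> Dict[str, str]:
    """Client side: copy the vendor header, if present, to the next request."""
    out = dict(next_request)
    if VENDOR_HEADER in previous_response:
        out[VENDOR_HEADER] = previous_response[VENDOR_HEADER]
    return out


def vendor_hint(request: Dict[str, str]) -> Optional[str]:
    """Server side: return usable vendor data, or None.

    An absent, empty, or unrecognized header is never an error; the
    request is processed as if the header were not there.
    """
    value = request.get(VENDOR_HEADER, "").strip()
    if not value:
        return None
    vendor, _, data = value.partition(";")
    if vendor.strip() not in KNOWN_VENDORS:
        return None
    return data.strip()
```

In this sketch the server only ever gains an optional hint from the header, so ignoring it introduces no new error case.
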
3PCC model of connecting with the MRCPv2 server: It was proposed that using Offer-Answer will solve the problem. The INVITE response can be the initial offer.

Specification conclusion: The specification is believed to be functionally complete. However, it needs review to ensure that everything works. The WG chairs asked Sarvi if it was possible to have a target of working group last call by end of April, which he confirmed.