Speech Services Control (speechsc) Charter

2.7.15 Speech Services Control (speechsc)

NOTE: This charter is a snapshot of the 59th IETF Meeting in Seoul, Korea. It may now be out-of-date.

Last Modified: 2004-01-22

Chair(s):

David Oran <oran@cisco.com>
Eric Burger <eburger@snowshore.com>

Transport Area Director(s):

Allison Mankin <mankin@psg.com>
Jon Peterson <jon.peterson@neustar.biz>

Transport Area Advisor:

Jon Peterson <jon.peterson@neustar.biz>

Mailing Lists:

General Discussion: speechsc@ietf.org
To Subscribe: speechsc-request@ietf.org
In Body: subscribe
Archive: www.ietf.org/mail-archive/working-groups/speechsc/current/maillist.html

Description of Working Group:

Many multimedia applications can benefit from having Automated Speech Recognition (ASR), Text to Speech (TTS), and Speaker Verification (SV) processing available as a distributed, network resource. To date, there are a number of proprietary ASR, TTS, and SV APIs, as well as two IETF drafts, that address this problem. However, there are serious deficiencies in the existing drafts relating to this problem. In particular, they mix the semantics of existing protocols yet are close enough to other protocols as to be confusing to the implementer.

The speechsc Work Group will develop protocols to support distributed media processing of audio streams. The focus of this working group is to develop protocols to support ASR, TTS, and SV. The working group will only focus on the secure distributed control of these servers.

The working group will develop an informational RFC detailing the architecture and requirements for distributed speechsc control. In addition, the requirements document will describe the use cases driving these requirements. The working group will then examine existing media-related protocols, especially RTSP, for suitability as a protocol for carriage of speechsc server control. The working group will then propose extensions to existing protocols or the development of new protocols, as appropriate, to meet the requirements specified in the informational RFC.

The protocol will assume RTP carriage of media. Assuming session-oriented media transport, the protocol will use SDP to describe the session.

The working group will not be investigating distributed speech recognition (DSR), as exemplified by the ETSI Aurora project. The working group will not be recreating functionality available in other protocols, such as SIP or SDP. The working group will offer changes to existing protocols, with the possible exception of RTSP, to the appropriate IETF work group for consideration. This working group will explore modifications to RTSP, if required.

It is expected that we will coordinate our work in the IETF with the W3C Multimodal Interaction Work Group; the ITU-T Study Group 16 Working Party 3/16 on SG 16 Question 15/16; the 3GPP TSG SA WG1; and the ETSI Aurora STQ.

Once the current set of milestones is completed, the speechsc charter may be expanded, with IESG approval, to cover additional uses of the technology, such as the orchestration of multiple ASR/TTS/SV servers, the accommodation of additional types of servers such as simultaneous translation servers, etc.

Goals and Milestones:

Done		Requirements ID submitted to IESG for publication (informational)
Done		Submit Internet Draft(s) Analyzing Existing Protocols (informational)
Done		Submit Internet Draft Describing New Protocol (if required) (standards track)
Oct 03		Submit Drafts to IESG for publication

Internet-Drafts:

- draft-ietf-speechsc-reqts-05.txt

- draft-ietf-speechsc-mrcpv2-01.txt

No Request For Comments

Current Meeting Report

SpeechSC Minutes 040302 17.00-18.00
Magnus Westerlund



Chairs Introduction and WG Status
---------------------------------


The WG chairs started with agenda bashing, followed by a presentation 
of the WG status. The SpeechSC requirements document has been 
approved by the IESG with some smaller edits requested. However, the 
document was lost between the IESG and the RFC-Editor. This has delayed the 
publication. There is a milestone to request publication of any drafts by 
October 03. The current goal is to request publication no later than 
October 04.


Separate Record Function Discussion
-----------------------------------


The question to the WG was: Should there be an explicit recording 
function in MRCPv2? Draft version 01 does not have a pure 
recording function. Recording behaviour is determined by using speech 
recognition or at least voice activity detection. There was some 
discussion around the use cases for recording. One use case mentioned for 
this behaviour is voice mail recording, where the recording is 
controlled through voice recognition. Therefore there is a desire to have a 
RECORD resource and a RECORD method which has a header indicating how the 
recorder should perform speech recognition. Another use case that matches 
this behaviour is recording for training or verification. The 
conclusion of the discussion was that there is no expressed need for blind 
recording; anyone needing this can use RTSP record. This use case 
should also be mentioned in the protocol spec to motivate the 
functionality.



MRCPv2
------


Draft version 02 was made available on the mailing list and will be 
submitted when internet-drafts@ietf.org opens again. A number of open 
issues were discussed. The presentation was made by Sarvi Shanmugham.


NAT traversal for the MRCPv2 TCP control channel setup: As long as only one 
end-point is in a private space it is possible to make things work. If both 
entities are in a private space a relay will be needed. To get TCP to work, 
some signalling to indicate how the TCP connection should be set up is 
needed. This is similar to the MMUSIC work on Co-Media 
(draft-ietf-mmusic-sdp-comedia-05).
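
The decision described above can be sketched as a small function. This is
only an illustration, assuming an ordinary NAT where the end-point in the
private space has to open the TCP connection outward. The names TcpSetup and
plan_tcp_setup are hypothetical and do not come from the MRCPv2 or Co-Media
drafts.

```python
from enum import Enum


class TcpSetup(Enum):
    """Hypothetical outcomes for setting up the control channel."""
    CLIENT_CONNECTS = "client opens the TCP connection"
    SERVER_CONNECTS = "server opens the TCP connection"
    EITHER = "either side may open the TCP connection"
    RELAY_REQUIRED = "a relay is needed"


def plan_tcp_setup(client_private: bool, server_private: bool) -> TcpSetup:
    """Pick who should open the TCP connection.

    Assumption: an end-point in a private space cannot accept inbound
    connections, so it must be the side that connects.
    """
    if client_private and server_private:
        # Neither side is reachable from the other: the minutes note
        # that a relay will be needed in this case.
        return TcpSetup.RELAY_REQUIRED
    if client_private:
        return TcpSetup.CLIENT_CONNECTS
    if server_private:
        return TcpSetup.SERVER_CONNECTS
    return TcpSetup.EITHER
```

Whatever the signalling ends up looking like, it would need to carry the
equivalent of this result so that both sides agree on who connects.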


Do we need an INTERMEDIATE-RECOG-RESULT: The WG was asked whether there is 
any need for this functionality. Nobody expressed any desire for it. Unless 
anyone on the mailing list expresses a real need for this 
functionality, it will not be included. A mail will be sent to the 
mailing list to ask this question.


Speech or hotword barge-in: Eric Burger asked if there is any protocol 
difference between the two. The answer is that the real issue is to 
identify which type of barge-in has happened, as there may exist 
policies accepting either of the types. The conclusion is to confirm 
with the mailing list that this feature is included if a solution exists.


Multiple instances of a header field vs. a single header field with 
multiple values: First there was some discussion around the 
historical reasons. Then it was asked whether there is any reason not to 
leave it as it is. Nobody had a real reason to change it from how SIP and 
HTTP handle headers. The list shall be asked if they know of a reason to 
change it.
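
As an illustration of the two forms under discussion, the following sketch
parses repeated header lines and a single comma-separated line into the same
list of values. It assumes a plain "Name: value" line format and naive comma
splitting (no quoted strings). The function parse_headers and the header name
Example-Header are made up for this example and are not taken from the
MRCPv2 draft.

```python
from collections import defaultdict


def parse_headers(lines):
    """Collect header values so that both forms give the same result.

    Repeated lines with the same name and one line with comma-separated
    values are both flattened into a list per (lower-cased) header name.
    Assumption: values never contain quoted commas.
    """
    headers = defaultdict(list)
    for line in lines:
        name, _, value = line.partition(":")
        for item in value.split(","):
            item = item.strip()
            if item:
                headers[name.strip().lower()].append(item)
    return dict(headers)


repeated = ["Example-Header: a", "Example-Header: b"]
combined = ["Example-Header: a, b"]
print(parse_headers(repeated) == parse_headers(combined))
```

A receiver written this way does not need to care which form the sender
chose, which is one argument for leaving the behaviour as it is.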


Header field ranges for Confidence-Threshold, Speed-Vs-Accuracy, etc. 
(0.0 - 1.0 or 0 - 100): There was consensus in the room to use 0.0-1.0 
ranges. It will be confirmed by the mailing list.
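
A minimal sketch of what the 0.0-1.0 choice means for an implementation is
shown below. The helper names parse_unit_interval and from_percent are
hypothetical. The second helper only illustrates how a value from the
alternative 0 - 100 range would map onto the range preferred in the room.

```python
def parse_unit_interval(text: str) -> float:
    """Parse a header value that is expected to lie in 0.0-1.0."""
    value = float(text)  # raises ValueError for non-numeric text
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"value {text!r} is outside 0.0-1.0")
    return value


def from_percent(percent: int) -> float:
    """Map a value from the 0 - 100 range onto 0.0-1.0."""
    if not 0 <= percent <= 100:
        raise ValueError(f"value {percent!r} is outside 0-100")
    return percent / 100


print(parse_unit_interval("0.5"))
print(from_percent(50))
```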


Proposal to specify a fixed header with a vendor identifier and a vendor 
registry for the Recognizer context block: David Oran stated that the IESG 
has concerns with vendor specific extensions that make things fail. To make 
this work, the specification needs to ensure that it is optional and can be 
ignored. No new error cases should be generated by this. The 
motivation for this was also discussed: it allows some resources to work 
better, but is not required for them to work. A proposal was: The client 
MUST copy the header field to the next resource within the session. There 
was some discussion of making the MUST a SHOULD. After some more discussion 
around the issues the following conclusion was reached in the room: Client 
must copy, Server must not barf. The server is not allowed to reject a 
request based on an empty or non-present header.
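
The conclusion "Client must copy, Server must not barf" can be sketched as
follows. Everything here is an assumption made for illustration: the header
name Vendor-Context, the "vendor-id;opaque-data" value layout, and both
function names are hypothetical and are not defined by the draft.

```python
VENDOR_HEADER = "Vendor-Context"  # hypothetical header name


def client_next_request(previous_response: dict, request: dict) -> dict:
    """Client side: copy the vendor header on to the next request."""
    out = dict(request)
    if VENDOR_HEADER in previous_response:
        out[VENDOR_HEADER] = previous_response[VENDOR_HEADER]
    return out


def server_handle(request: dict, known_vendors: frozenset = frozenset()) -> dict:
    """Server side: use the header as a hint only, never as a reason to fail.

    A missing, empty, or unrecognised value leads to the same "ok"
    status; only the optional optimisation is skipped.
    """
    value = request.get(VENDOR_HEADER, "")
    vendor = value.partition(";")[0].strip()  # assumed "vendor-id;data" layout
    return {"status": "ok", "used_vendor_context": vendor in known_vendors}


response = {VENDOR_HEADER: "example-vendor;abc123"}
request = client_next_request(response, {"method": "RECOGNIZE"})
print(server_handle(request, frozenset({"example-vendor"})))
print(server_handle({"method": "RECOGNIZE"}))
```

The point of the sketch is that the status is the same in both calls; the
header can only change how well the resource works, not whether the request
is accepted.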


The WG should also look into whether there is an existing vendor 
registry that can be leveraged, for example with IANA.


DTMF support and RFC 2833 support: The conclusion was: If one supports 
DTMF, one MUST support RFC 2833. Consensus is to be confirmed on the 
mailing list.


Security support (sips? https? Digest? SRTP?): What is the minimal 
security support to implement? To help interoperability it is normal to 
require a single solution as being mandatory when having security 
features. The discussion was split into the different parts. For the MRCP 
channel there was consensus for having MANDATORY support of TLS. For the 
media channel, the initial proposal from David Oran was: when 
transmitting media streams requiring security one uses either SRTP or 
IPSec. Further discussion made the observation that there is no 
requirement for IPSec anywhere else in the system. Therefore there might 
be a reason for looking more at SRTP. A further comment was that the WG 
should contact the security area advisors early on to have them check the 
proposal, thus avoiding late surprises. Another question was, how does one 
indicate in SDP that one should use IPSec for a media stream?


Define grammars one at a time: This proposal was supported by the room.


PAUSE on barge-in: Is there a need to have this instead of stop on 
barge-in? There were no comments raised on this topic.


There are two proposals for similar functionality of determining 
available functionality: "OPTION commands for m-lines" and "SIP Callee 
capability for resource description and capability". The proposal is to 
check if the SIP Callee method can solve everything. If not, then one needs 
to look into SIP OPTIONS.


3PCC model of connecting with the MRCPv2 server: It was proposed that 
using Offer-Answer will solve the problem. Invite response can be the 
initial offer.


Specification conclusion: The specification is believed to be 
functionally complete. However, it needs review to ensure that 
everything works. The WG chairs asked Sarvi if it was possible to have a 
target of working group last call by the end of April, which he 
confirmed.

Slides

Agenda

Presentation 0

MRCPv2

Presentation 1