2.8.19 Speech Services Control (speechsc)

NOTE: This charter is a snapshot of the 60th IETF Meeting in San Diego, California USA. It may now be out-of-date.

Last Modified: 2004-07-26

Chair(s):
David Oran <oran@cisco.com>
Eric Burger <eburger@brooktrout.com>
Transport Area Director(s):
Allison Mankin <mankin@psg.com>
Jon Peterson <jon.peterson@neustar.biz>
Transport Area Advisor:
Jon Peterson <jon.peterson@neustar.biz>
Mailing Lists:
General Discussion: speechsc@ietf.org
To Subscribe: speechsc-request@ietf.org
In Body: subscribe
Archive: http://www.ietf.org/mail-archive/web/speechsc/index.html
Description of Working Group:
Many multimedia applications can benefit from having Automated Speech Recognition (ASR), Text to Speech (TTS), and Speaker Verification (SV) processing available as a distributed network resource. To date, there are a number of proprietary ASR, TTS, and SV APIs, as well as two IETF drafts, that address this problem. However, there are serious deficiencies in the existing drafts. In particular, they mix the semantics of existing protocols yet are close enough to other protocols as to be confusing to the implementer.

The speechsc Work Group will develop protocols to support distributed media processing of audio streams. The focus of this working group is to develop protocols to support ASR, TTS, and SV. The working group will only focus on the secure distributed control of these servers.

The working group will develop an informational RFC detailing the architecture and requirements for distributed speechsc control. In addition, the requirements document will describe the use cases driving these requirements. The working group will then examine existing media-related protocols, especially RTSP, for suitability as a protocol for carriage of speechsc server control. The working group will then propose extensions to existing protocols or the development of new protocols, as appropriate, to meet the requirements specified in the informational RFC.

The protocol will assume RTP carriage of media. Assuming session-oriented media transport, the protocol will use SDP to describe the session.

The working group will not be investigating distributed speech recognition (DSR), as exemplified by the ETSI Aurora project. The working group will not be recreating functionality available in other protocols, such as SIP or SDP. The working group will offer changes to existing protocols, with the possible exception of RTSP, to the appropriate IETF work group for consideration. This working group will explore modifications to RTSP, if required.

It is expected that we will coordinate our work in the IETF with the W3C Multimodal Interaction Work Group; the ITU-T Study Group 16 Working Party 3/16 on SG 16 Question 15/16; the 3GPP TSG SA WG1; and the ETSI Aurora STQ.

Once the current set of milestones is completed, the speechsc charter may be expanded, with IESG approval, to cover additional uses of the technology, such as the orchestration of multiple ASR/TTS/SV servers, the accommodation of additional types of servers such as simultaneous translation servers, etc.

Goals and Milestones:
Done  Requirements ID submitted to IESG for publication (informational)
Done  Submit Internet Draft(s) Analyzing Existing Protocols (informational)
Done  Submit Internet Draft Describing New Protocol (if required) (standards track)
Oct 03  Submit Drafts to IESG for publication
Internet-Drafts:
  • draft-ietf-speechsc-reqts-05.txt
  • draft-ietf-speechsc-mrcpv2-04.txt
No Request For Comments

    Current Meeting Report

    SPEECHSC WG Meeting minutes
    -------------------------------------------------


    The agenda was agreed with no bashing (see slides in proceedings)


    Thomas Gal kindly agreed to serve as scribe. Thank you Thomas! (He also took notes during an extension of the WG meeting after the official time slot had run out).


    Dave Oran gave a brief update on the status of the requirements document (draft-ietf-speechsc-reqts-06.txt). This is still stuck in review, because there is a DISCUSS outstanding concerning the security characteristics of using Speechsc for speaker verification. Dave Oran and Dan Burnett will work to try to eliminate this roadblock so the document can progress.


    The remainder of the allotted time was devoted to a status report and review of the outstanding issues on the protocol specification, draft-ietf-speechsc-mrcpv2-04.txt. The status and issues may be found in the proceedings in the slides prepared by Sarvi Shanmughan; please see those slides for the status update portion. A list of issues and proposed resolutions from the discussions follows.


    - NLSML -> "borrow" this from the W3C since they are abandoning it, and we have no normative output format.


    - The IANA considerations section of the spec needs considerable work. Eric Burger and Dave Oran volunteered to help.


    - We need to clarify the operation of verifying from buffer while the buffer is getting filled.


    - Security threshold needs to be better defined. Should forward this issue on to the list.


    - Accept-Threshold vs Adapt-Threshold are not well enough defined. In particular, the security implications of adapting to voiceprint input need to be addressed, and the transactional properties of adaptation both within a session and across sessions need to be clearly articulated. For example, do adaptations have ACID (Atomic, Consistent, Isolated, Durable) behavior? Is there a specific "commit" action for adaptation?


    - Voiceprint-Group -> Would we like to support aliasing of lists of possible verification subjects? The general feeling is that this can be done using content indirection through list URLs.


    - Voiceprint adaptation in multi-verification. This was a HUGE discussion that left a lot of issues open. People generally seemed to agree that adapting multiple voices was not a good idea. The issue of locking adaptation rights also came up and should be discussed further.


    - Speech Marker Timestamps: This was also a fairly lengthy topic covering the merits of using StartOfSpeech, SpeechComplete, SpeechMarker, etc. with NTP timestamps for the purpose of utilizing speech markers. There was rough consensus on locking NTP to the RTP timestamps, and that rough delays would be inconsequential.


    - Capability Routing: currently we have resource type and capability; do we need anything else? The general consensus was that we should use SIP callee-capabilities for this, and that mrcpv2 would introduce a couple of new code points for callee-capabilities, which could be registered through IANA as part of the mrcpv2 specification.


    [end of minutes]
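
    Illustrative note on the Speech Marker Timestamps item above: one way to "lock" NTP time to RTP timestamps is to keep a single reference pair of (NTP time, RTP timestamp), such as the pair carried in an RTCP sender report, and derive the wallclock time of any other RTP timestamp from it. The sketch below is only an assumption about how that could look; it is not taken from the mrcpv2 draft, and the names ClockReference and rtp_to_ntp are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockReference:
    """Hypothetical reference pair, e.g. taken from an RTCP sender report."""
    ntp_seconds: float   # NTP wallclock time, in seconds
    rtp_timestamp: int   # 32-bit RTP timestamp sampled at the same instant
    clock_rate: int      # RTP clock rate in Hz (8000 is typical for telephony audio)


def rtp_to_ntp(ref: ClockReference, rtp_timestamp: int) -> float:
    """Map an RTP timestamp to NTP seconds using the reference pair."""
    # Take the difference modulo 2**32, then reinterpret it as a signed value
    # so that 32-bit wraparound and slightly earlier timestamps both work.
    delta = (rtp_timestamp - ref.rtp_timestamp) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return ref.ntp_seconds + delta / ref.clock_rate


if __name__ == "__main__":
    ref = ClockReference(ntp_seconds=3_300_000_000.0,
                         rtp_timestamp=160_000,
                         clock_rate=8000)
    # A speech marker observed 8000 ticks (one second at 8 kHz) after the reference.
    print(rtp_to_ntp(ref, 168_000))
```

    With a mapping of this kind, a marker event only needs to carry the RTP timestamp at which it occurred; the receiver can recover wallclock time locally, which fits the view in the minutes that rough delays in delivering the event would not matter.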

    Slides

    Agenda
    MRCPv2 Status