2.7.15 Speech Services Control (speechsc)

NOTE: This charter is a snapshot of the 62nd IETF Meeting in Minneapolis, MN USA. It may now be out-of-date.

Last Modified: 2005-02-07


Chair(s):

David Oran <oran@cisco.com>
Eric Burger <eburger@brooktrout.com>

Transport Area Director(s):

Allison Mankin <mankin@psg.com>
Jon Peterson <jon.peterson@neustar.biz>

Transport Area Advisor:

Jon Peterson <jon.peterson@neustar.biz>

Mailing Lists:

General Discussion: speechsc@ietf.org
To Subscribe: speechsc-request@ietf.org
In Body: subscribe
Archive: http://www.ietf.org/mail-archive/web/speechsc/index.html

Description of Working Group:

Many multimedia applications can benefit from having Automated Speech
Recognition (ASR), Text to Speech (TTS), and Speaker Verification (SV)
processing available as a distributed, network resource. To date, there
are a number of proprietary ASR, TTS, and SV APIs, as well as two
drafts, that address this problem. However, there are serious
deficiencies in the existing drafts. In particular, they mix the
semantics of existing protocols yet are close enough to other protocols
as to be confusing to the implementer.

The speechsc Work Group will develop protocols to support distributed
media processing of audio streams. The focus of this working group is
to develop protocols to support ASR, TTS, and SV. The working group
will only focus on the secure distributed control of these servers.

The working group will develop an informational RFC detailing the
architecture and requirements for distributed speechsc control. In
addition, the requirements document will describe the use cases driving
these requirements. The working group will then examine existing
media-related protocols, especially RTSP, for suitability as a protocol
for carriage of speechsc server control. The working group will then
propose extensions to existing protocols or the development of new
protocols, as appropriate, to meet the requirements specified in the
informational RFC.

The protocol will assume RTP carriage of media. Assuming
session-oriented media transport, the protocol will use SDP to describe
the session.
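As a rough illustration of what such an SDP-described session might look like, the sketch below is loosely modeled on the approach taken in the MRCPv2 drafts: a control channel offered as an application media line, paired with an RTP audio stream. All addresses, ports, and identifiers here are hypothetical, and the exact attribute set is whatever the final specification defines, not this example:

```
v=0
o=- 3502341234 3502341234 IN IP4 client.example.com
s=-
c=IN IP4 client.example.com
t=0 0
m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:new
a=resource:speechrecog
a=cmid:1
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=sendonly
a=mid:1
```

The control channel (m=application) carries the speechsc server control protocol over a session-oriented transport, while the media itself flows over the separate RTP audio line, consistent with the "RTP carriage of media" assumption above.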

The working group will not be investigating distributed speech
recognition (DSR), as exemplified by the ETSI Aurora project. The
working group will not be recreating functionality available in other
protocols, such as SIP or SDP. The working group will offer changes to
existing protocols, with the possible exception of RTSP, to the
appropriate IETF working group for consideration. This working group will
explore modifications to RTSP, if required.

It is expected that we will coordinate our work in the IETF with the
W3C Multimodal Interaction Working Group; the ITU-T Study Group 16
Working Party 3/16 on SG 16 Question 15/16; the 3GPP TSG SA WG1; and
the ETSI Aurora STQ.

Once the current set of milestones is completed, the speechsc charter
may be expanded, with IESG approval, to cover additional uses of the
technology, such as the orchestration of multiple ASR/TTS/SV servers,
the accommodation of additional types of servers such as simultaneous
translation servers, etc.

Goals and Milestones:

Done  Requirements ID submitted to IESG for publication (informational)
Done  Submit Internet Draft(s) Analyzing Existing Protocols (informational)
Done  Submit Internet Draft Describing New Protocol (if required) (standards track)
Oct 03  Submit Drafts to IESG for publication
Sep 04  Work Group Last Call MRCPv2 specification
Oct 04  Submit MRCPv2 specification to IESG


  • draft-ietf-speechsc-reqts-05.txt
  • draft-ietf-speechsc-mrcpv2-06.txt

    No Request For Comments

    Current Meeting Report


    - Intro and agenda bashing
    - Push to last call for MRCPv2 (draft-ietf-speechsc-mrcpv2-06.txt)
    - Discussion of security issues on requirements (draft-ietf-speechsc-reqts-06.txt)
    - Wrap-up

    MRCPv2 (Sarvi Shanmugham)

    - At version 06; includes:
      - Edits from last meeting
      - IANA considerations section
      - Security considerations section
      - Clarifications and comments discussed on the alias through 2/21/05
      - Paolo Baggia: comments, part 1
      - Jeff Kusnitz: grammar weights issue
      - Klaus Reifenrath: NLSML issues (partial)

    - Resolved issues NOT included in rev 6:
      - Paolo Baggia comments, part 2:
        - Editorial remarks
        - Negotiate a future alternative for NLSML, allowing the client to optionally specify the alternative as a MIME type
        - Cite the Semantic Interpretation spec, but do NOT add any additional text
      - SDP changes:
        - Follow MMUSIC, per Magnus
        - Dropped SCTP for lack of an SDP mechanism to address it; propose to address it independently of the Speechsc WG
      - IANA registrations: resource types, method/event names, SDP proto values, SDP attributes, etc.
        - Dave and Sarvi to review with IANA tomorrow (3/9)

    - Open issues:
      - Do we want DEFINE-LEXICON support in the recognizer? Proposed by Paolo Baggia; Sarvi has no opinion. Will close offline via the list.
      - Security considerations (see below)

    - Should expect draft ready for last call within 2-3 weeks.


    Security Issues on Requirements

    - The requirements draft has been held up for over a year.

    - Principal concern (as voiced, e.g., in the CSTA report): Using any biometric or any sort of speech recognition for speaker identification is a bad idea.
    - Privacy: main concern is potential for large scale theft of voiceprints and concomitant privacy loss
    - Security of protocols: usual questions about confidentiality, integrity and authentication/authorization
    - Threats/vulnerabilities of speech as a biometric

    - (Real) threats to SI/SV protocols:
    - External threats (i.e., by unauthorized parties):
      - Attacks can be foiled by well-understood means
      - Speechsc employs:
        - TLS encryption of the control channel
        - SRTP encryption of the media channel
        - Authentication/authorization of all elements
      ==> NOT an issue.
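As a hedged sketch of what "authentication/authorization of all elements" could mean for the control channel, the Python standard library's `ssl` module can express a TLS endpoint that both enforces a modern protocol floor and demands a certificate from its peer (mutual authentication). The certificate file names in the comments are hypothetical; this is an illustration of the mechanism, not the Speechsc specification:

```python
import ssl

# Server-side context for a speechsc control channel: require TLS 1.2+
# and demand a certificate from the client, so BOTH ends are authenticated
# rather than only the server.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
ctx.verify_mode = ssl.CERT_REQUIRED   # reject clients without a valid certificate

# In a real deployment one would also load key material, e.g. (paths hypothetical):
#   ctx.load_cert_chain("server.pem", "server.key")      # server's own identity
#   ctx.load_verify_locations("clients-ca.pem")           # CA that vouches for clients
```

Sockets wrapped with such a context give the control channel both confidentiality/integrity (TLS) and element authentication; the media channel gets the analogous protection from SRTP.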
    - Internal attacks: stealing the voiceprint DB or compromising the server system, including its keying material
      - Prudent to store the data encrypted to foil theft.
      - But Speechsc is a protocol standard, so it is not clear that we need to do anything about this... But many people think we must. So...
      - Do whatever is done with RADIUS!
    - Replay and impersonation attacks:
      - Premise: possession of a stolen voiceprint does NOT by itself enable impersonation, due to the use of challenge-response protocols.
      - Stephan (?) disputed that premise, in light of the ability to construct suitable responses given enough voice data. He suggested adding a knowledge base on top of challenge-response -- e.g., "What's the name of your favorite pet?"
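The premise above rests on the server issuing a fresh, unpredictable challenge each time (e.g., a random phrase the caller must speak), so a captured response cannot be replayed. A stdlib-only analogy, with the voiceprint match abstracted as a keyed response (all function names hypothetical; real speaker verification compares audio, not HMACs):

```python
import hashlib
import hmac
import secrets

def issue_challenge() -> bytes:
    """Server side: a fresh random challenge, never reused
    (analogous to a randomly chosen phrase the caller must speak)."""
    return secrets.token_bytes(16)

def respond(voiceprint_secret: bytes, challenge: bytes) -> bytes:
    """Client side: a response that depends on BOTH the enrolled
    voiceprint material and the fresh challenge."""
    return hmac.new(voiceprint_secret, challenge, hashlib.sha256).digest()

def verify(voiceprint_secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response and compare in
    constant time."""
    expected = hmac.new(voiceprint_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

# A response captured from an old session fails against a new challenge:
secret = secrets.token_bytes(32)
old_challenge = issue_challenge()
old_response = respond(secret, old_challenge)
assert verify(secret, old_challenge, old_response)            # fresh exchange succeeds
assert not verify(secret, issue_challenge(), old_response)    # replayed response fails
```

Stephan's objection has no analogue in this abstraction: with enough recorded voice data an attacker may be able to synthesize a valid "response" to a new challenge, which is precisely why he suggested layering a knowledge base on top of challenge-response.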

    - Voiceprints of varying quality can be obtained by anything that can record your voice. So voiceprint used for SI/SV needs to be protected.
    - Question: Is it really useful to require servers holding voiceprints to be more secure than those holding speech recordings, especially if those recordings have metadata allowing the source to be identified (e.g., calling phone number, logged-in user id)?
    - To be discussed on the list.

    - Other issues:
    - Implicit agreement that your identity can be ascertained by the participants?
    - What about a secure session with explicit anonymity? Is SI out of line with common expectations of privacy?
    - If indirect methods of identification need to be thwarted, there are things like voice-distorting devices that render SI/SV impotent.

    - Stephan (?) expressed skepticism as to the need for Speechsc to address these issues. Unfortunately, per Dave, the IESG is equally skeptical about biometrics and has thus far required these issues be addressed.

    - Next steps:
    - Further discussion of the above issues on the list.
    - Work with Transport & Security ADs to resolve these issues.
    - If needed, establish an ongoing "security advisor" function to help get closure on both the requirements and the MRCPv2 spec.


    Revised milestones:

    - Apr 05: MRCPv2 WGLC
    - Jun 05: Submit MRCPv2 to IESG


    Slides

    MRCPv2 Status