speechsc working group minutes, Wed July 16
reported by Edwin Aoki <firstname.lastname@example.org>
Eric Burger and David Oran chair
Administrivia and Agenda Bashing
Agenda Bashing 5 min
Requirement Status 4 min
Protocol Proposal 90 min
Protocol Analysis 20 min
Wrap up and next steps
There were no objections to the agenda as proposed.
Requirements Status - Dave Oran
The requirements document was in the IESG for some time, and the
majority of comments were integrated into
draft-ietf-speechsc-reqts-04. The security ADs asked for a couple of minor
changes, which will be included in an -05 draft, including a reference to
the risks of use of biometrics, including speaker identification and
After those changes, the draft will go to the RFC editor.
Guido from the RNID had requested some changes in the wording of section
3.9. Dave indicated that he'd thought that those changes were already
incorporated in the -04 draft; Guido thought his comments were for -04.
Guido will verify that his comments are still appropriate for the -04
speechsc Protocol Proposal - Sarvi Shanmugham (via audio link)
The protocol proposal is now in draft form, based on the MRCP
proposal, also now in draft form. However, there were some issues that came
up relating to MRCP's tunneling capability. The proposal proposes a
SIP-based framework as a control channel to initiate sessions between
client and server. The control channel will run over TCP or SCTP and will
not use an unreliable protocol such as UDP.
This proposal doesn't address speaker identification or speaker
* The speechsc exchange is simple because it need not work around the
unreliability of the protocol
* Allows for TCP/SCTP connection sharing, unlike RTSP, which requires the
client to open a separate connection to the server for each session.
* Leverages MRCP - the state machine and flow are the same as MRCP, and are
Most of the issues that have been raised on the list have been noted and
simply need to be incorporated into the next set of drafts. Sarvi
presented a slide which listed the known issues, and the remainder of the
discussion focused on these issues (and others that would come up in the
course of the discussion). The chairs took a quick show of hands, which
revealed that a few people have read the most recent draft.
* Issue 1: Define SI and SV
The author has received some responses from a few people who might be
interested in working on the SI and SV problem, but if there are
additional people who are interested, they should contact the WG chairs.
Dan Burnett has volunteered.
* Issue 2: Why use SIP (Bryan Wild and others)
There was some discussion around the choice of SIP. Morna Hirsch asked the
question (which Bryan Wild and others have asked on the list) why we
wouldn't continue with the use of RTSP and extend that instead of going all
the way to SIP?
Sarvi explained that, while RTSP was being used, speechsc was primarily
using MRCP as a TCP pipe, and so it worked. The desire was to move the
messages to the top layer without
requiring tunneling, and the separation of the control channel provided a
clean way to do this. Additionally, going to SIP allowed for reuse of the
TCP pipe between client and server.
In getting some more detail around the use of SIP for speechsc, Colin
asked whether the proposal was a subset of SIP, or whether there would be
parts of SIP that people would expect to work, that wouldn't when used in a
Sarvi explained that everything one would expect for a standard RFC
3261-compliant UA would work; it is not a subset of SIP and there's no
expectation that a profile would be needed.
The chairs took a hum on the question: "Is there consensus on using SIP as
the session initiation protocol for speechsc?" The hum indicated rough
consensus for the statement; there was no opposition.
The chairs then took a hum on the question of whether it would be
appropriate to adopt this draft as a WG item. Again, there was no
opposition, but only a light hum in favor. The chairs will take this
question to the list.
* Issue 3: Multiple resources of a given type
Dan Burnett asked, regarding section 3.2, for more clarification on adding
and removing resources. Is it possible to have, for example, multiple ASR
resources and then to be able to drop just one? As long as there are only
references to resource type and not to specific resources, it's unclear
what would be dropped.
There was some discussion around why one would want to have multiple
resources - for example to have multiple recognizers in parallel, but the
current draft does not consider having multiple resources on a single
Further discussion was taken to the list.
* Issue 4: Resource Tokens as strings
The protocol currently defines resources by an integer number. In an XML
format, it costs the same (in bytes) to use strings such as "SI", "SV", or
even "ASR" or others. Colin and Eric independently asked the question of
the extensibility of the namespace and whether strings could be used
instead of numbers.
Sarvi indicated that he was open to using strings, perhaps even URIs of the
form channel ID@asr.
There was some followup discussion on whether these strings would be
arbitrary, negotiated strings, or fixed strings as in an IANA
registry. The discussion seemed to lean towards specific strings per
resource type.
The chairs asked for a concrete proposal to be sent to the list.
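To make the string-based option concrete, here is a minimal sketch of how
an identifier of the form "channel ID@resource type" might be parsed and
checked against a fixed set of tokens. The token set, the function name,
and the example channel IDs are assumptions for illustration only; none of
them come from the draft or from any registry.

```python
# Hypothetical fixed tokens; a real list would come from the WG or a registry.
KNOWN_RESOURCE_TYPES = {"asr", "si", "sv"}


def parse_resource_identifier(identifier: str):
    """Split 'channel-id@resource-type' and validate the resource type."""
    # Split on the last '@' so the channel ID itself may contain '@'.
    channel_id, sep, resource_type = identifier.rpartition("@")
    if not sep or not channel_id or not resource_type:
        raise ValueError(f"expected 'channel-id@resource-type': {identifier!r}")
    resource_type = resource_type.lower()
    if resource_type not in KNOWN_RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    return channel_id, resource_type
```

With fixed tokens the receiver can reject an unknown type immediately;
with arbitrary negotiated strings that check would instead have to consult
whatever was agreed at session setup.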
* Issue 5: Use of the m= line
Neil Deason brought up the issue of how one would specify the choice of TCP
or SCTP given the current specs. Two options were proposed.
Proposal 1: One m= line, with a protocol ID of "speechsc" and where the
MIME type is a resource ID
Proposal 2: One m= line with the protocol ID being the actual protocol used
(TCP or SCTP), MIME type of "application/speechsc" and additional
attributes a=resource ID <type>, a=channel ID <identifier>
There were no comments on this and further discussion was taken to the list.
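The two proposals can be compared side by side with a small sketch that
builds the SDP lines each one implies. This is only an illustration: the
"application" media type, the port value, the attribute names "resource"
and "channel", and the "a=name:value" attribute syntax are assumptions
chosen to resemble ordinary SDP, not text from the draft.

```python
def sdp_proposal_1(port: int, resource_id: str):
    """Proposal 1: protocol ID 'speechsc', format field carries the resource ID."""
    return [f"m=application {port} speechsc {resource_id}"]


def sdp_proposal_2(port: int, transport: str, resource_type: str, channel_id: str):
    """Proposal 2: protocol ID names the transport; resource and channel are attributes."""
    if transport not in ("TCP", "SCTP"):
        raise ValueError("transport must be TCP or SCTP")
    return [
        f"m=application {port} {transport} application/speechsc",
        f"a=resource:{resource_type}",  # hypothetical attribute name
        f"a=channel:{channel_id}",      # hypothetical attribute name
    ]
```

In this sketch only Proposal 2 lets the offer state the choice of TCP or
SCTP directly in the m= line, which is the question Neil raised.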
Adam Roach made the comment that having content-length headers in the
middle of the data has proven difficult to implement efficiently in other WGs
(like SIP). Subsequent work, for example in MSRP, has gone to more of a
fixed-position framing for ease of parsing. Other options
include either an easy-to-parse byte count or a well-known leader
text (a la MIME parts). This makes it easier to parse without having to
pull in the entire message.
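As a rough illustration of the framing point, the sketch below contrasts
the two styles. With a Content-Length header, the receiver must have the
whole header block and scan it before it knows the message size; with a
byte count at a fixed position in the first line, the first line alone is
enough. Both wire formats here (including the "EX/1.0" token) are invented
for the example and are not taken from the speechsc, MRCP, SIP, or MSRP
specifications.

```python
def frame_length_from_header(buf: bytes):
    """Content-Length style: scan every header line to find the body size."""
    head_end = buf.find(b"\r\n\r\n")
    if head_end < 0:
        return None  # header block incomplete; caller must read more bytes
    # Skip the start line, then search the headers for Content-Length.
    for line in buf[:head_end].split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return head_end + 4 + int(value.strip())
    return head_end + 4  # no Content-Length header: treat as no body


def frame_length_from_start_line(buf: bytes):
    """Fixed-position style: total message length is the second token of line one."""
    line_end = buf.find(b"\r\n")
    if line_end < 0:
        return None  # first line incomplete; caller must read more bytes
    tokens = buf[:line_end].split(b" ")
    if len(tokens) < 2:
        raise ValueError("start line has no length field")
    return int(tokens[1])
```

In the second style the receiver can learn the frame size from the first
line and then read exactly that many bytes, without parsing the remaining
headers first.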
Protocol Analysis Document - Eric Burger
The document is complete, though it still needs some more work,
particularly cross-review. A show of hands showed that 3 or 4 people had
read it. So now what? Does this document need to be published? Does it
need to be kept alive for the duration of the protocol? etc.
The AD felt that if it was interesting and/or worthwhile, or could convey
some of the rationale for using IETF-supported protocols rather than not,
it would be useful to document.
There was some collective intuition that it would be good to have
documented the reasons why the group moved in the direction that it did,
particularly because the group has made a fairly significant change in
direction. As of now, however, the document is not in a publishable
state, and needs further work.
Milestone Review - Eric Burger
The group is a little ahead of schedule on the milestones as far as draft
submissions are concerned. The milestones will be updated coming out of the