[12:22:09] --- nsb has joined
[12:22:09] --- nsb has left: Lost connection
[12:22:10] --- nsb has joined
[12:24:29] --- xcross has joined
[12:33:19] --- nsb has left: Disconnected.
[12:33:23] --- jengelsma has joined
[12:52:53] --- xcross has left
[12:55:28] --- nm has joined
[12:55:50] --- Ted has joined
[12:56:17] <Ted> Howdy. Anyone remote in the jabber chat?
[12:56:55] --- nsb has joined
[12:56:55] --- nsb has left: Lost connection
[12:56:56] --- nsb has joined
[12:58:56] --- nm has left: Replaced by new connection
[12:58:56] --- nm has joined
[12:58:56] --- nm has left
[12:59:22] --- tonyhansen has joined
[13:02:23] --- leiba has joined
[13:03:40] --- nm has joined
[13:06:22] --- dcrocker has joined
[13:07:55] <jengelsma> dave: is this only about sync'ing multiple streams to one recipient?
[13:08:33] <leiba> Jabber scribing is REALLY sparse...
[13:09:08] <jengelsma> Chris giving short multimodal demo...
[13:09:18] <Ted> he appears to be doing a multimodal demo using pizza ordering as a use case
[13:10:06] <Ted> demo shows the speech input reflected in the gui input
[13:11:24] <jengelsma> demo shows a markup language (XHTML+Voice) being used to provide both a GUI and a voice interface.
[13:11:30] --- tlr has joined
[13:11:52] <jengelsma> DMSP is used to distribute the two different modalities
[13:13:00] <jengelsma> demo was a "thick client". both modalities were implemented on a single PC.
[13:13:45] <jengelsma> in some cases a modality may be distributed, e.g. when a client doesn't have the resources necessary to do speech recognition.
[13:14:48] <jengelsma> for example, it may not be possible to handle a large street address grammar on a 500 MHz PDA/phone processor.
[13:15:43] <jengelsma> the user ought to make the choice of which modality they will use at any given time.
[13:16:16] <jengelsma> ted: are there candidate enablers being developed in OMA as well?
[13:16:58] <jengelsma> chris: Mentions the OMA reference architecture that enumerates the enablers required. The OMA spec refers to a protocol to synchronize, but assumes it will be done in IETF.
[13:18:03] <jengelsma> ted: recommends speaking to dean willis re: OMA to set expectations.
[13:18:27] <jengelsma> ted: would really like to see folks in OMA interested in this, participating in the IETF in defining it.
[13:19:15] <jengelsma> 4 DMSP building blocks: 1) modalities, 2) MVC design pattern, 3) view-independent model, 4) event-based modalities.
[13:20:23] <jengelsma> goes over two architecture figures - 1) individual browsers for each modality, 2) compound browser - one browser, multiple modalities.
[13:21:15] <jengelsma> a multimodal system can be modeled in terms of the MVC (model-view-controller) pattern.
[13:22:16] <jengelsma> view-independent model - multiple modalities update a centralized data model. DMSP is the protocol used.
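A minimal sketch of the view-independent model just described, assuming a simple observer arrangement in Python (the class and field names are hypothetical and are not taken from the presentation or from DMSP itself): several modality views observe one centralized data model, so input through either modality is reflected in the other.

    class DataModel:
        """Centralized, view-independent data model shared by all modalities."""
        def __init__(self):
            self.fields = {}
            self.views = []

        def attach(self, view):
            self.views.append(view)

        def update(self, field, value, source=None):
            # Any modality can write; every other view is told to re-render.
            self.fields[field] = value
            for view in self.views:
                if view is not source:
                    view.render(field, value)

    class GuiView:
        def render(self, field, value):
            print(f"GUI shows {field} = {value}")

    class VoiceView:
        def render(self, field, value):
            print(f"TTS confirms {field} = {value}")

    model = DataModel()
    gui, voice = GuiView(), VoiceView()
    model.attach(gui)
    model.attach(voice)

    # A recognition result arriving via the voice modality also updates the GUI.
    model.update("topping", "pepperoni", source=voice)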
[13:23:04] --- Glenn Parsons has joined
[13:24:16] <jengelsma> a modality may be implemented in the infrastructure (e.g. speech); a protocol is needed to support this. DMSP, along with other protocols (SIP, RTP, etc.), is used to accomplish this.
[13:24:45] <jengelsma> The "synch" part is what is missing at present, and DMSP is meant to address this.
[13:26:14] <jengelsma> David Oran: (speechsc chair) how are we going to approach WG charter?
[13:27:54] <jengelsma> David: outlines two approaches: 1) simply bless what is here and move forward, or 2) take what is done now as a starting point but revisit design decisions and go from there.
[13:28:34] <jengelsma> speechsc took the approach 2 path.
[13:29:40] <jengelsma> Jim Ferrans will show a brief demo.
[13:30:58] <jengelsma> Demo is a video... 3 applications are shown.
[13:31:22] <jengelsma> the demo shows a form that lets user utter (or enter) a zipcode and address.
[13:31:40] <jengelsma> a map of that address is then displayed after jim has spoken a zip/addr.
[13:32:01] <jengelsma> second app shows a "mobile service call" scenario.
[13:33:07] <jengelsma> handset alerts the user, who then navigates through a series of screens via speech commands to find the location of the call and determine what needs to be done. GUI updates on the phone display.
[13:33:31] <jengelsma> last demo shows a multimodal corporate directory application running on a handset.
[13:34:19] <jengelsma> user speaks a name. the name is displayed as text on the display and played via TTS. the user has the option to add it to the address book on the phone or place a call.
[13:35:49] <jengelsma> Dave Crocker: nice demo, but doesn't explain dmsp at all! I have no idea what the components are and how they interact... is that next?
[13:36:38] <jengelsma> Jim: presents viewgraph comparing widex and dmsp
[13:37:14] --- Lisa has joined
[13:37:20] <Lisa> What's an IM?
[13:37:31] <jengelsma> IM=Interaction Manager
[13:37:37] <Lisa> thx
[13:38:24] <jengelsma> viewgraph shows OMA reference architecture showing how DMSP links an IM on a client with a user agent (UA) implemented on a server.
[13:39:24] <jengelsma> also shown is how widex can be used to split a user agent across a client and server.
[13:39:44] <jengelsma> jim: dmsp and widex are orthogonal.
[13:40:27] <jengelsma> eric burger: paraphrases discussion at lunch: widex is about distributing GUI and dmsp about distributing a VUI.
[13:41:06] <jengelsma> eric burger: concurs with architecture diagram but suggests discussion on whether or not the boxes (DMSP/widex) are indeed the same.
[13:41:52] --- bhoeneis has joined
[13:42:00] <jengelsma> Vlad: the point is not how you render or the end result of the UI, but how you are interacting between the renderer and server. If they are XML they are the same thing.
[13:43:13] <jengelsma> Jim: in terms of the level of interaction, widex is much lower level; dmsp is higher level. For example, a voicexml interpreter is instructed to load a document. this was deliberate, to minimize network traffic.
[13:43:58] <jengelsma> Chris: One of the goals of dmsp was to handle different types of UAs, including those that do not have a DOM.
[13:44:02] --- elwynd has joined
[13:44:33] <jengelsma> Chris: VoiceXML 2.0 does not have a DOM.
[13:45:42] <jengelsma> Chris: W3C goal for next version of VXML is to have a DOM.
[13:45:46] --- tlr has left: Replaced by new connection
[13:46:46] <jengelsma> eric: explains VoiceXML is a language...
[13:47:15] <jengelsma> eric: VoiceXML 3.0 is completely different.... and will take years.
[13:47:33] <jengelsma> Nathaniel: VXML 2.x will be out there for a long time.
[13:49:46] <jengelsma> Ted: suggests relaxing widex to allow interaction with voicexml... would that fold this into the widex working group?
[13:50:13] <Ted> wasn't really suggestion, just trying to get a sense of the extent of the barrier between the two.
[13:50:21] <jengelsma> sorry
[13:50:47] <jengelsma> eric: vxml has no dom... but there is a good understanding of the data model behind it.
[13:51:06] <jengelsma> chris: there is a possibility that the DOM becomes the interface between widex and dmsp.
[13:51:53] <jengelsma> chris: believes multimodal sync is orthogonal to widex, which is about syncing a dom element on a server with a dom element on a client.
[13:53:01] <jengelsma> eric: this is the disconnect. if the dom is "what is the user's name", whether it's typed or spoken, the dom is the same and it can be rendered with speech or visually.
[13:54:03] <jengelsma> chris: true in a one-to-one mapping such as you cite, but in multimodal you don't always have a one-to-one mapping across modalities.
[13:55:46] <jengelsma> vlad: in widex you have one session, but the requirements define multiple renderers. widex also uses the MVC.
[13:56:18] --- tlr has joined
[13:56:57] <jengelsma> nathaniel: if widex could support voicexml, that may be one way. sometimes a divide and conquer approach is more efficient.
[13:57:11] --- Randall Gellens has joined
[13:58:08] <jengelsma> ted: sounds like there is a fair amount of stuff out there right now (voicexml). where is the pain point?
[13:59:26] <jengelsma> jim: my earlier comments might have been misleading. deployed voicexml apps today are voice-only, not multimodal. information is only presented to the user via speech/audio. With multimodal you can display the info as well.
[13:59:57] <jengelsma> jim: likewise, for data entry on a handset today, voice is easier than keying in data.
[14:00:06] <jengelsma> David: don't see how that answers ted's question.
[14:01:09] <jengelsma> chris: clarifies the existing voicexml market (deployed) is voice-only. the multimodal market has yet to emerge.
[14:03:47] <jengelsma> ??: positive you guys are doing something useful, but still don't understand what it is. you need to change your vocabulary to bring this into the IETF. Need a common frame of reference... what is being synchronized with what??
[14:04:39] <jengelsma> nathaniel: understand the comment, but don't know if there is a language to translate to.
[14:05:04] <Ted> our sons should marry their daughters; slow, but tried and true
[14:05:05] <jengelsma> ??: not saying ietf has a vocabulary for this. we have two cultures here... need to find some vocab that is mutually comfortable.
[14:05:16] <leiba> Ted: :-)
[14:05:54] <leiba> BTW, "??" == Dave Crocker
[14:06:06] <jengelsma> david: struggled with the same thing in speechsc. there is a lot you can talk about without understanding.
[14:07:52] <jengelsma> Dave C: agrees with ted and david. I didn't make clear what I meant. Right now we are trying to arrive at a common framework. I am distracted by repeated references to stuff that is not part of the framework but is about the details of what you've been working on for a long time.
[14:08:28] <jengelsma> Nathaniel: suggests we dive into some more detail on DMSP to see if that helps.
[14:09:36] <jengelsma> Chris: Agrees in general with Dave's comment, and mentions the companies involved are targeting multiple languages already, but use specific languages as examples.
[14:11:20] <jengelsma> ted: where is the interoperability here?
[14:13:23] <jengelsma> chris: resumes DMSP presentation - presents four abstract interfaces: Command, Response, Event, & Signal. explains the need for binary and xml bindings.
[14:14:22] <jengelsma> some off mic discussion on binary vs. xml... topic to be deferred.
[14:16:26] <jengelsma> chris: presents Signals - used for initializing state machines.
[14:16:55] --- Glenn Parsons has left
[14:17:40] <jengelsma> Command and control messages: list of messages such as event registration, loading documents/urls, running forms, setting focus on input items in forms, etc.
[14:18:33] <jengelsma> Response messages: responses to Command messages. Indicate how the UA responded to a Command message.
[14:20:21] <jengelsma> Event messages: the "glue" between modalities. Used to propagate events between UAs. e.g. recognition results, data filled on a visual form. Points out there is an "extended" recognition result to handle various standards for describing speech recognition results.
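A hedged sketch of the four message categories just described (Signal, Command, Response, Event); this is not the actual DMSP wire format, and the field names and example values below are illustrative assumptions only.

    from dataclasses import dataclass, field

    @dataclass
    class Signal:                 # used to initialize/drive protocol state machines
        name: str                 # e.g. "start-session"

    @dataclass
    class Command:                # control messages, e.g. load a document, set focus
        action: str
        target: str = ""

    @dataclass
    class Response:               # reports how the UA handled a Command
        command_action: str
        status: str               # e.g. "ok" or "error"

    @dataclass
    class Event:                  # the "glue" between modalities
        kind: str                 # e.g. "recognition-result", "field-filled"
        data: dict = field(default_factory=dict)

    # Hypothetical exchange: the IM tells a voice UA to load a form, the UA
    # acknowledges, and a recognition result is propagated back as an Event.
    cmd = Command(action="load-document", target="http://example.com/order.vxml")
    rsp = Response(command_action=cmd.action, status="ok")
    evt = Event(kind="recognition-result", data={"field": "zipcode", "value": "48309"})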
[14:22:01] <jengelsma> Conclusions: protocol enables distributed multimodal interactions, based on MVC, can be generalized for modalities besides GUI and Voice. Supports application-specific result protocols, EMMA, etc.
[14:22:44] <jengelsma> ted: suggests using apple "movement API" as the third UI!
[14:23:42] <jengelsma> Nathaniel: tries to get a sense of whether or not this work should be kept separate from widex.
[14:24:09] <jengelsma> Vlad: binary messages seem to suggest they are quite different.
[14:25:59] <jengelsma> ted: not clear how many people here think we are solving a problem that really belongs in the IETF. Maybe folks should get up and talk about this.
[14:27:33] <jengelsma> chris: OMA had a very focused effort (18 months or so) on defining a multimodal reference architecture with significant requirements document.
[14:27:53] <jengelsma> eric: W3C is really really bad at protocols.
[14:29:09] <jengelsma> eric: is there a need for a working group, or is what is needed just a well-reviewed spec?
[14:29:40] <jengelsma> nathaniel: IBM/Motorola is not simply seeking an IETF rubber stamp; they wanted broader input.
[14:30:46] <jengelsma> chris: enumerates other companies involved in the OMA MM arch. (since they aren't represented here): Nokia, Ericsson, Oracle.
[14:31:31] <jengelsma> ted: is there anybody in the room now interested in stepping up to the mic and expressing how they will support this effort?
[14:32:00] <jengelsma> nobody responds.
[14:32:53] <jengelsma> Thomas: Is it obvious this work is needed?
[14:33:07] <jengelsma> nathaniel: calls for a hum on this question.
[14:33:32] <jengelsma> nathaniel: calls for hum on people "who don't know"?
[14:34:08] <jengelsma> consensus: most don't know...
[14:36:46] <jengelsma> Jim Ferrans to display a slide illustrating how dmsp is used in an end-to-end architecture.
[14:37:21] <jengelsma> slide shows three entities: mobile device, voice server, and application server.
[14:37:45] <Lisa> I can't focus on the unmoving text with the moving arrows! Maybe I need more coffee. Or less.
[14:39:52] <jengelsma> DMSP is shown connecting the voice server to the mobile device, to synchronize GUI/Voice events. The audio is transported between the mobile device and voice server via RTP.
[14:40:34] <jengelsma> The voice server is shown fetching VoiceXML, XHTML+Voice, grammars, audio prompts, etc. from the application server.
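A small sketch of the links on that slide, recording which protocol runs over which hop; the entity names and purposes are paraphrased from the scribe notes and are not normative, and the code is only an illustration of the topology.

    from dataclasses import dataclass

    @dataclass
    class Link:
        protocol: str    # "DMSP", "RTP" or "HTTP"
        a: str
        b: str
        purpose: str

    links = [
        Link("DMSP", "mobile-device", "voice-server",
             "synchronize GUI and voice events"),
        Link("RTP", "mobile-device", "voice-server",
             "audio up (speech input) and down (prompts)"),
        Link("HTTP", "voice-server", "application-server",
             "fetch VoiceXML / XHTML+Voice, grammars, audio prompts"),
    ]

    for link in links:
        print(f"{link.a} <-{link.protocol}-> {link.b}: {link.purpose}")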
[14:41:07] <jengelsma> David O: I would expect the viewgraph to show that the voice server is running on my notebook computer. If it doesn't support that it is flawed.
[14:41:25] <jengelsma> Chris: Confirms the use case David describes is indeed supported.
[14:42:08] <jengelsma> Barry: points out there are two audio links, one up and one down.
[14:42:33] <jengelsma> Jim: clarifies these would be whatever the native encoders on the handset are.
[14:43:26] <jengelsma> nathaniel: asks for guidance from the area directors on how to proceed.
[14:46:28] --- dcrocker has left
[14:46:41] --- Ted has left
[14:46:42] <jengelsma> ted: go to the mailing list and see who is willing to participate. Agrees with Dave's point that there seems to be a disconnect between this present work and the IETF. Also agrees with Nathaniel's earlier point that it's odd there is interest in widex and not dmsp. Recommends reaching out via the mailing lists.
[14:47:00] --- leiba has left
[14:47:18] --- tlr has left
[14:47:58] --- nm has left
[14:48:46] --- Lisa has left: Logged out
[14:53:56] --- Randall Gellens has left
[14:55:43] --- nsb has left: Disconnected.
[15:00:55] --- bhoeneis has left
[15:06:05] --- jengelsma has left
[15:25:39] --- tonyhansen has left: Replaced by new connection
[15:25:56] --- elwynd has left
[16:55:21] --- elwynd has joined
[17:02:13] --- elwynd has left