Internet-draft       Media Server Markup Language            July 28 2009
                                (MSML)

Internet Engineering Task Force                                A. Saleem
Internet-Draft                                                    Y. Xin
Intended status: Informational                                   Radisys
Expires: January 29, 2010                                    G. Sharratt
                                                           July 28, 2009

                  Media Server Markup Language (MSML)
                         draft-saleem-msml-09

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 29, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   The Media Server Markup Language (MSML) is used to control and invoke
   many different types of services on IP Media Servers.  The MSML
   control interface was initially driven by Radisys with subsequent
   significant contributions from Intel, Dialogic, and others in the
   industry.  Clients can use it to define how multimedia sessions
   interact on a Media Server and to apply services to individuals or
   groups of users.  MSML can be used, for example, to control Media
   Server conferencing features such as video layout and audio mixing,
   create sidebar conferences or personal mixes, and set the properties
   of media streams.  In addition, clients can use MSML to define media
   processing dialogs, which may be used as parts of application
   interactions with users or conferences.  Transformation of media
   streams to and from users or conferences, as well as IVR dialogs, are
   examples of such interactions, which are specified using MSML.  MSML
   clients may also invoke dialogs with individual users or with groups
   of conference participants using VoiceXML.

Table of Contents

   1. Introduction
   2. Glossary
   3. MSML SIP Usage
      3.1 SIP INFO
      3.2 SIP Control Framework
   4. Language Structure
      4.1 Package Scheme
      4.2 Profile Scheme
   5. Execution Flow
   6. Media Server Object Model
      6.1 Objects
      6.2 Identifiers
   7. MSML Core Package
      7.1 <msml>
      7.2 <send>
      7.3 <result>
      7.4 <event>
   8. MSML Conference Core Package
      8.1 Conferences
      8.2 Media Streams
      8.3 <createconference>
      8.4 <modifyconference>
      8.5 <destroyconference>
      8.6
      8.7
      8.8
      8.9
      8.10
      8.11
      8.12
   9. MSML Dialog Packages
      9.1 Overview
      9.2 Primitives
      9.3 Events
      9.4 MSML Dialog Usage with SIP
      9.5 MSML Dialog Structure and Modularity
      9.6 MSML Dialog Core Package
      9.7 MSML Dialog Base Package
      9.8 MSML Dialog Group Package
      9.9 MSML Dialog Transform Package
      9.10 MSML Dialog Speech Package
      9.11 MSML Dialog Fax Detection Package
      9.12 MSML Dialog Fax Send/Receive Package
   10. MSML Audit Package
      10.1 MSML Audit Core Package
      10.2 MSML Audit Conference Package
      10.3 MSML Audit Connection Package
      10.4 MSML Audit Dialog Package
      10.5 MSML Audit Stream Package
   11. Response Codes
   12. MSML Conference Examples
      12.1 Establishing a Dial-in Conference
      12.2 Example of a Sidebar Audio Conference
      12.3 Example of Removing a Conference
      12.4 Example of Modifying Video Layout
   13. MSML Dialog Examples
      13.1 Announcement
      13.2 Voice Mail Retrieval
      13.3 Play and Record
      13.4 Speech Recognition
      13.5 Play and Collect
      13.6 User Controlled Gain
   14. MSML Audit Examples
      14.1 Audit All Conferences
      14.2 Audit Conference Dialogs
      14.3 Audit Conference Streams
      14.4 Audit All Connections
      14.5 Audit Connection Dialogs
      14.6 Audit Connection Streams
      14.7 Audit Connection With Selective States
   15. Change Summary
   16. Future Work
   17. XML Schema
      17.1 MSML Core
      17.2 MSML Conference Core Package
      17.3 MSML Dialog Packages
      17.4 MSML Audit Packages
   18. Security Considerations
   19. IANA Considerations
      19.1 IANA registrations for 'application' MIME Media Type
      19.2 IANA registrations for 'text' MIME Media Type
      19.3 URN Sub-Namespace Registration
      19.4 XML Schema Registration
   20. Normative References
   21. Informative References
   Acknowledgments
   Authors' Addresses

1. Introduction

   Media servers contain dynamic pools of media resources.  Control
   Agents and other users of media servers (called media server clients)
   can define and create many different services based on how they
   configure and use those resources.  Often, that configuration and the
   ways in which those resources interact will be changed dynamically
   over the course of a call, to reflect changes in the way that an
   application interacts with a user.

   For example, a call may undergo an initial IVR dialog before being
   placed into a conference.  Calls may be moved from a main conference
   to a sidebar conference and then back again.  Individual calls may be
   directly bridged to create small n-way calls or simple sidebars.  None
   of these changes affects the SIP [n1] dialog or RTP [i3] session, yet
   they do affect the media flow and processing internal to the media
   server.

   The Media Server Markup Language (MSML) is an XML [n2] language used
   to control the flow of media streams and the services applied to
   media streams within a media server.  It is used to invoke many
   different types of services on individual sessions, groups of
   sessions, and conferences.  MSML allows the creation of conferences,
   the bridging of different sessions together, and the bridging of
   sessions into conferences.

   MSML may also be used to create user interaction dialogs and allows
   the application of media transforms to media streams.  Media
   interaction dialogs created using MSML allow the construction of IVR
   dialog sessions to individual users as well as to groups of users
   participating in a conference.  Dialogs may also be specified using
   other languages, such as VoiceXML [n5], which support the execution
   of complete single-party application logic on the Media Server.

   MSML is a transport-independent language: it does not rely on
   underlying transport mechanisms, and its semantics are independent of
   transport.  However, SIP is a typical and commonly used transport
   mechanism for MSML, invoked using the SIP URI scheme.  This
   specification defines the use of MSML with SIP as the transport
   mechanism.

   A network connection may be established with the media server using
   SIP.  Media received and transmitted on that connection will flow
   through different media resources on the media server depending on
   the requested service.  Basic Network Media Services with SIP [n7]
   defines conventions for associating a basic service with a SIP
   Request-URI.  MSML allows services to be dynamically applied and
   changed by a Control Agent during the lifetime of the SIP dialog.

   MSML has been designed to address the control and manipulation of
   media processing operations (e.g., announcement, IVR, play and
   record, ASR/TTS, fax, video), as well as the control of media streams
   and the relationships between them (e.g., simple and advanced
   conferencing).  It provides a general-purpose media server control
   architecture.  MSML can additionally be used to invoke other, more
   complex IVR languages such as VoiceXML.

   The MSML control interface has been widely deployed in the industry,
   with numerous client-side and server-side implementations, since
   2003.  The in-service commercial deployments cover a wide variety of
   applications including, but not limited to, IP multimedia
   conferencing, network voice services, IVR/IVVR, and voice/video mail.

2. Glossary

   Media Server: a general-purpose platform for executing real-time
   media processing tasks.  This is a logical function that maps either
   to a single physical device or to a portion of a physical device.

   Media Server Client: an application that originates MSML requests to
   a media server; also referred to as a Control Agent in this
   specification.

   Network Connection: a participant that represents the termination on
   a media server of one or more RTP [i3] sessions (for example, audio
   and video) associated with a call.  Network connections are
   established and removed using a session establishment protocol such
   as SIP.  An instance of a network connection is independent of the
   MSML processing instructions applied to it.

   Dialog: an automated IVR participant.  Examples of dialogs may be
   announcement players, IVR interfaces, or voice recorders.  Dialogs may
   be defined in MSML or using VoiceXML [n5].

   Conference: an intermediary function that provides multimedia mixing
   and other advanced conferencing services.  This specification
   currently considers conferences with audio and/or video media types,
   but is extensible to other media types.

   Identifier: a name that is used to refer to a specific instance of an
   object on the media server, such as a conference or a dialog.
   Identifiers are composed of one or more terms, where each term
   identifies an object class and instance.

   Object: the generic term for a media server entity that terminates,
   originates, or processes media.  This specification defines four
   classes of objects and specifies mechanisms to create them, join them
   together, and destroy them.

   Participant Object: an object in a media server that sources original
   media in a call and/or receives and terminates media in a call.

   Intermediary Object: an object in a media server that acts on media
   within a call for the benefit of the participants.

   Independent Object: an object that can exist on a media server
   independent of other objects.

   Operator: an intermediary transformer that modifies or transforms a
   media stream.  Examples of operators may be audio gain controls, video
   scaling, or voice masking.  MSML defines operators as media transform
   objects, which transform media using operations such as gain control
   when applied to media streams.

   Media Stream: a single media flow between two objects.  A media stream
   has a media type and may be unidirectional or bidirectional.

3. MSML SIP Usage

   SIP is used to create and modify media sessions with a media server
   according to the procedures defined in RFC 3261 [n1].  Often, SIP
   third-party call control [i4] will be used to create sessions to a
   media server on behalf of end users.  MSML is used to define and
   change the service that a user connected to a media server will
   receive.  MSML clients are application servers, soft-switches, or
   other forms of control agents, and SHOULD have an authorized security
   relationship with the media server.  MSML itself does not define
   authorization mechanisms.

   MSML transactions are originated based upon events that occur in the
   application domain.  These events may be independent of any media or
   user interaction.  For example, an application may wish to play an
   announcement to a conference warning that its scheduled completion
   time is approaching.  Applications themselves are structured in many
   different ways, and their structure and requirements influence their
   selection of protocols and languages.  To accommodate differing
   application needs, MSML has been designed to be neutral to other
   languages and independent of the transport used to carry it.

   The MSML language is purposely designed to be transport independent.
   In this release of the specification, SIP INFO [i5] and the SIP
   Control Framework [i13] have been chosen as transport mechanisms for
   MSML, as described in the following sections.

3.1 SIP INFO

   SIP INVITE and INFO [i5] requests and responses MAY be used to carry
   MSML.  INFO requests allow asynchronous mid-call messages within SIP
   with few additional semantics.  In addition, the INFO method has
   existing, widely deployed implementations; it aids initial
   developments that are closely coupled with SIP session establishment;
   and it allows MSML to be directly associated with user dialogs when
   third-party call control is used.

   Although INFO is sometimes considered unsuitable as a general-purpose
   transport mechanism for messages within SIP, there have been
   proposals to make it more acceptable.  In future releases of this
   document, MSML may evolve to include other SIP usage and/or to work
   with other protocols, or as a stand-alone protocol established
   through SIP.

   MSML supports several models for client interaction.  When clients use
   3PCC to establish media sessions on behalf of end users, clients will
   have a SIP dialog for each media session.  MSML MAY be sent on these
   dialogs.  However, the targets of MSML actions are not inferred from
   the session associated with the SIP dialog.  The targets of MSML
   actions are always explicitly specified using identifiers as
   previously defined.

   An application, after interacting with a user, may want to affect
   multiple objects within a media server.  For example, tones or
   messages are often played to a conference when connections are added
   or removed.  A separate message may also be played to a participant
   as they are joined, or to moderators.  Explicit identifiers (that is,
   identifiers not inferred from a transport mechanism) allow these
   multiple actions to be easily grouped into a single transaction sent
   on any SIP dialog.

   MSML also supports a model of dedicated control associations.  This
   supports decoupled application architectures where a client can
   control media server services without also establishing all of the
   media sessions itself.  Control associations are created using SIP,
   but they do not have any associated media session.  Although INFO
   messages will initially be sent on this SIP dialog, just as with
   dialogs associated with media sessions, it is possible that in the
   future the SIP dialog will be used to establish a separate control
   session (defined in SDP [n9]) that does not use SIP as the transport
   for MSML messages.

   A media server using MSML also sends asynchronous events to a client
   using MSML scripts in SIP INFO.  Events are sent based on previous
   MSML requests and are sent within the SIP dialog on which the MSML
   request that caused the event to be generated was received.  If this
   dialog no longer exists when the event is generated, the event is
   discarded.

   Events may be generated during the execution of a dialog created by a
   <dialogstart> element.  For example, dialogs can send events based on
   user input.  VoiceXML dialogs, on the other hand, generally interact
   with other servers outside of MSML using HTTP.

   An event is also generated when the execution of a dialog terminates,
   either because of completion or failure.  The exact information
   returned depends on the dialog language, the capabilities of the
   dialog execution environment, and what was requested by the dialog.
   Both MSML and VoiceXML [n5] allow information to be returned when
   they exit.  These events may be sent in a SIP INFO or a SIP BYE.  SIP
   BYE is used when the dialog itself specifies that the connection
   should be disconnected, for example, through the use of the
   <disconnect> element.

   Conferences may also generate events based upon their configuration.
   An example of this is the notification of the set of active speakers.

3.2 SIP Control Framework

   The SIP Control Framework [i13] MAY be used as a transport mechanism
   for MSML.

   The Control Framework provides a generic approach for the
   establishment and reporting capabilities of remotely initiated
   commands.  The framework utilizes many functions provided by the
   Session Initiation Protocol [n1] (SIP) for the rendezvous and
   establishment of a reliable channel for control interactions.
   Compared to SIP INFO, the SIP Control Framework is a more
   general-purpose transport mechanism and one that is not constrained
   by the limitations of the SIP INFO mechanism.

   The Control Framework also introduces the concept of a Control
   Package, which is an explicit usage of the Control Framework for a
   particular interaction set.  This specification defines a set of
   packages that allow MSML to control many aspects of the Media Server,
   including basic dialog, advanced conferencing, advanced dialog, and
   audit service.  Each of these packages is assigned a unique Control
   Package name so that MSML can be used with the Control Framework.

   This section fulfills the mandatory requirement for information that
   MUST be specified during the definition of a Control Framework
   Package, as detailed in SIP Control Framework [i13].

3.2.1 Control Framework Package Names

   The Control Framework [i13] requires a Control Package definition to
   specify and register a unique name.

   The MSML specification defines Control Package names using a
   hierarchical scheme to indicate the inheritance relationship across
   packages.  For example, package "msml-x" is derived from package
   "msml", and package "msml-x-y" is derived from package "msml-x".

   The following is a list of Control Package names reserved by the
   MSML specification.

      "msml": this Control Package supports the MSML Core Package as
         specified in section 7.

      "msml-conf": this Control Package supports the MSML Conference
         Core Package as specified in section 8.

      "msml-dialog": this Control Package supports the MSML Dialog
         Core Package as specified in section 9.6.

      "msml-dialog-base": this Control Package supports the MSML
         Dialog Base Package as specified in section 9.7.

      "msml-dialog-transform": this Control Package supports the MSML
         Dialog Transform Package as specified in section 9.9.

      "msml-dialog-group": this Control Package supports the MSML
         Dialog Group Package as specified in section 9.8.

      "msml-dialog-speech": this Control Package supports the MSML
         Dialog Speech Package as specified in section 9.10.

      "msml-dialog-fax-detect": this Control Package supports the MSML
         Dialog Fax Detection Package as specified in section 9.11.

      "msml-dialog-fax-sendrecv": this Control Package supports the
         MSML Dialog Fax Send/Receive Package as specified in section
         9.12.

      "msml-audit": this Control Package supports the MSML Audit Core
         Package as specified in section 10.1.

      "msml-audit-conf": this Control Package supports the MSML Audit
         Conference Package as specified in section 10.2.

      "msml-audit-conn": this Control Package supports the MSML Audit
         Connection Package as specified in section 10.3.

      "msml-audit-dialog": this Control Package supports the MSML
         Audit Dialog Package as specified in section 10.4.

      "msml-audit-stream": this Control Package supports the MSML
         Audit Stream Package as specified in section 10.5.

   An Application Server using the Control Framework as transport for
   MSML MUST use one or more package names, depending on the service
   required from the Media Server.  The package name(s) are identified
   in the "Control-Packages" SIP header that is present in the SIP
   INVITE dialog request that creates the control channel, as specified
   in [i13].  The "Control-Packages" value MAY be re-negotiated via the
   SIP re-INVITE mechanism.

3.2.2 Control Framework Messages

   The usage of CONTROL, response, and REPORT messages, as defined in
   [i13], differs for each Control Package defined in MSML and is
   described separately in the following sections.

   MSML Core Package "msml"

      The Application Server may send a CONTROL message with a body
      containing an MSML request, using the following elements, to the
      MS:

      <msml>: the root element that may contain a list of child
      elements, each of which requests a specific operation.  The child
      elements are defined in extended packages (e.g., "msml-conf" and
      "msml-dialog").  This element is also the root element that
      contains an MSML result and event.

      <send>: sends an event to the specified recipient within the
      Media Server.  Specific event types are defined within the
      extended packages.

      The Media Server replies with a response message containing an
      MSML result using the following element:

      <result>: reports the results of an MSML transaction.

      The Media Server MAY send an MSML event to the Application
      Server, in a REPORT or CONTROL message, using the <event>
      element.  The actual content of the <event> and which Control
      Framework message to use are defined within the extended
      packages.
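
   As an illustration only, the following Python sketch shows how an
   Application Server might build a minimal CONTROL body carrying a
   <send> under the <msml> root, and how it might tell an MSML <result>
   apart from an MSML <event> in a body received from the Media Server.
   The attribute names ("version", "response", "target", "event",
   "name", "id") and the <name>/<value> pairs inside <event> are
   assumed to follow the MSML schema in section 17; they are not
   defined in this section.  The identifier strings and the event name
   "app.done" are hypothetical, and XML namespaces are ignored.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class MsmlResult:
    # Assumed attribute: response="200"-style status code on <result>.
    response: str
    description: Optional[str] = None


@dataclass
class MsmlEvent:
    # Assumed attributes: name (event type) and id (originating object).
    name: str
    source_id: Optional[str]
    fields: Dict[str, str] = field(default_factory=dict)


def build_send(target: str, event: str, version: str = "1.1") -> str:
    """Build a minimal CONTROL body: an <msml> root carrying one <send>."""
    root = ET.Element("msml", version=version)
    ET.SubElement(root, "send", target=target, event=event)
    return ET.tostring(root, encoding="unicode")


def parse_msml(body: str) -> List[Union[MsmlResult, MsmlEvent]]:
    """Return the <result> and <event> children found under <msml>."""
    root = ET.fromstring(body)
    if root.tag != "msml":
        raise ValueError(f"expected <msml> root element, got <{root.tag}>")
    items: List[Union[MsmlResult, MsmlEvent]] = []
    for child in root:
        if child.tag == "result":
            items.append(MsmlResult(child.get("response", ""),
                                    child.findtext("description")))
        elif child.tag == "event":
            # Pair up <name>/<value> children in document order.
            names = [n.text or "" for n in child.findall("name")]
            values = [v.text or "" for v in child.findall("value")]
            items.append(MsmlEvent(child.get("name", ""), child.get("id"),
                                   dict(zip(names, values))))
        # Request elements such as <send> are not expected in replies
        # from the Media Server and are skipped by this sketch.
    return items


if __name__ == "__main__":
    # Hypothetical target identifier and event name.
    print(build_send("conn:1234/dialog:d1", "app.done"))
    print(parse_msml('<msml version="1.1"><result response="200"/></msml>'))
```

   The sketch only inspects the body.  As described above, which
   Control Framework message (response, REPORT, or CONTROL) carries a
   given <event> is decided by the extended packages, so a real client
   would combine this with the message type it received.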
527 MSML Conference Core Package "msml-conf" 529 This package extends the MSML Core Package to define a 530 framework for creation, manipulation and deletion of a 531 conference. 533 AS can send CONTROL message with a body of MSML request which 534 contains one or multiple conference related commands to MS. MS 535 then replies with a response message with a body of MSML 536 result to indicate if the request has been fulfilled or not. 538 During the lifetime of a conference, whenever an event occurs, 539 the Media Server MAY send CONTROL messages containing MSML 540 events to notify the Application Server. The Application 541 Server SHOULD reply with a response message with no MSML body 542 to acknowledge the event has been received. 544 This package does NOT use the REPORT message. 546 Dialog Core Package "msml-dialog" 548 This package extends the MSML Core Package to define the 549 structural framework and abstractions for MSML dialogs. 551 The Application Server MAY send CONTROL messages containing a 552 MSML request using following elements: 554 : instantiate an MSML media dialog on a 555 connection or a conference. 557 Internet-draft Media Server Markup Language July 28 2009 558 (MSML) 560 : terminates a MSML dialog. 562 : sends an event and an optional namelist to the dialog, 563 dialog group, or dialog primitive. 565 : used by the dialog description language to cause the 566 execution of the MSML dialog to terminate. 568 For the command, the response message MUST 569 contain a MSML result which indicates that the dialog has been 570 started successfully. The MSML result MAY contain 571 to return dialog identifier, if the identifiers was assigned 572 by the Media Server. Subsequently, zero of more MSML events 573 MAY be initiated by the Media Server in (update) REPORT 574 messages to report information gathered during the dialog. 
575 Finally, a MSML event "msml.dialog.exit" SHOULD be generated 576 in a (terminate) REPORT message when the dialog terminates 577 (eg. MSML execution of ). 579 For the and commands, the response message 580 contains the final MSML result which indicates that the 581 request has either been fulfilled or rejected. 583 Dialog Base Package "msml-dialog-base" 585 This package extends the MSML Dialog Core Package to define a 586 set of base functionality for MSML dialogs. The extension 587 defines individual media primitives, including , 588 , , , and , to be 589 used as child element of . This package does not 590 change the framework message usage as defined by the MSML 591 Dialog Core Package. 593 Dialog Transform Package "msml-dialog-transform" 595 This package extends the MSML Dialog Core Package to define a 596 set of transform primitives which works as filter on half 597 duplex media streams. The extension defines transform 598 primitives, including , , , , 599 and , which MAY be used as child elements of 600 . This package does not change the framework 601 message usage as defined by the MSML Dialog Core Package. 603 Dialog Group Package "msml-dialog-group" 605 Internet-draft Media Server Markup Language July 28 2009 606 (MSML) 608 This package extends the MSML Dialog Core, Base and Transform 609 Packages to define a single control flow construct that 610 specifies concurrent execution of multiple media primitives. 611 The extension defines the element which MAY be used as 612 a child element of to enclose multiple media 613 primitives, such that they can be executed concurrently. This 614 package does not change the framework message usage as defined 615 by the MSML Dialog Core Package. 617 Dialog Speech Package "msml-dialog-speech" 619 This package extends the MSML Dialog Core and MSML Base 620 Package to define functionality which MAY be used for 621 automatic speech recognition and text-to-speech. The extension 622 extends the and the elements. 
      For , it defines a new child element to activate grammars or
      user input rules associated with speech recognition. For , it
      defines a new child element to initiate the text-to-speech
      service.

      This package does not change the framework message usage as
      defined by the MSML Dialog Core Package.

   Dialog Fax Detection Package "msml-dialog-fax-detect"

      This package extends the MSML Dialog Core Package to define
      primitives that provide a fax detection service. The extension
      defines a primitive to be used as a child element of . This
      package does not change the framework message usage as defined
      by the MSML Dialog Core Package.

   Dialog Fax Send/Receive Package "msml-dialog-fax-sendrecv"

      This package extends the MSML Dialog Core Package to define
      primitives which allow a media server to provide a fax send or
      receive service. The extension defines new primitives and , to
      be used as child elements of . This package does not change the
      framework message usage as defined in the MSML Dialog Core
      Package.

   Dialog Audit Core Package "msml-audit"

      This package extends the MSML Core Package to define a
      framework for auditing media resource(s) allocated on the
      Media Server.

      This package follows a simple request/response transaction,
      allowing the Application Server to send CONTROL messages
      containing MSML requests. The Media Server MUST reply with a
      response message containing the result. The result is
      contained within the element, returning the queried state
      information.

      This package does NOT use the REPORT message.

   Dialog Audit Conference Package "msml-audit-conf"

      This package extends the MSML Audit Core Package to define
      conference-specific states which MAY be queried via the command
      and the corresponding response MUST be returned by the element.
      This package does not change the framework message usage as
      defined by the MSML Audit Core Package.

   Dialog Audit Connection Package "msml-audit-conn"

      This package extends the MSML Audit Core Package to define
      connection-specific states which MAY be queried via the command
      and the corresponding response MUST be returned by the element.
      This package does not change the framework message usage as
      defined by the MSML Audit Core Package.

   Dialog Audit Dialog Package "msml-audit-dialog"

      This package extends the MSML Audit Core Package to define
      dialog-specific states which MAY be queried via the command and
      the corresponding response MUST be returned by the element.
      This package does not change the framework message usage as
      defined by the MSML Audit Core Package.

   Dialog Audit Stream Package "msml-audit-stream"

      This package extends the MSML Audit Core Package to define
      stream-specific states which MAY be queried via the command and
      the corresponding response MUST be returned by the element.
      This package does not change the framework message usage as
      defined by the MSML Audit Core Package.

3.2.3 Common XML Support

   The XML schema described in [i13] MUST be supported by all Control
   Packages defined by MSML. However, the "connection-id" value MUST
   be constructed as defined by MSML (i.e., the identifier MUST
   contain the local dialog tag only, while the SIP Control Framework
   [i13] requires that the "connection-id" contain both local and
   remote dialog tags).

3.2.4 Control Message Body

   A valid CONTROL message body MUST conform to the MSML schema, as
   included in this specification, for the MSML package(s) used.

3.2.5 REPORT Message Body

   A valid REPORT message body MUST conform to the MSML schema, as
   included in this specification, for the MSML package(s) used.

4. Language Structure

4.1 Package Scheme

   The primary mechanism for extending MSML is the "package". A
   package is an integrated set of one or more XML schemas that
   define additional features and functions via new or extended use
   of elements and attributes. Each package, except for those defined
   in the current document, is defined in a separate standards
   document, e.g., an Internet-Draft or an RFC. All packages that
   extend the base MSML functionality MUST include references to the
   MSML base set of schemas provided in the Internet-Drafts. A schema
   in a package MUST only extend MSML; that is, it must not alter the
   existing specification.

   A particular MSML script will include references to all the
   schemas defining the packages whose elements and attributes it
   makes use of. A particular script MUST reference the MSML base and
   optionally extension package(s). See the IANA Considerations
   section.

   Each package MUST define its own namespace so that elements or
   attributes with the same name in different packages do not
   conflict. A script using a particular element or attribute MUST
   prefix the namespace name on that element or attribute's name if
   it is defined in a package (as opposed to being defined in the
   base).

   MSML consists of a core package which provides structure without
   support for any specific feature set. Additional packages, relying
   on the core package, provide functional features. Any combination
   of additional packages may be used along with the core package.
   The following describes the set of MSML packages defined in this
   document.
     +-------------------------------------------------------------+
     |                          MSML Core                          |
     +-------------------------------------------------------------+
         |                          |                          |
     +--------+                 +--------+                 +-------+
     | Dialog |                 |  Conf  |                 | Audit |
     |  Core  |                 |  Core  |                 | Core  |
     +--------+                 +--------+                 +-------+
         |                                                     |
     +---+------+---------+--------+--------+---------+        |
     |          |         |        |        |         |        |
  +------+ +---------+ +------+ +------+ +------+ +-------+    |
  |Dialog| |Dialog   | |Dialog| |Dialog| |Dialog| |Dialog |    |
  |Base  | |Transform| |Group | |Speech| |Fax   | |Fax    |    |
  +------+ +---------+ +------+ +------+ |Detect| |Send/  |    |
                                         +------+ |Receive|    |
                                                  +-------+    |
                                                               |
                                   +-------+-------+--------+--+
                                   |       |       |        |
                                +-----+ +-----+ +------+ +------+
                                |Audit| |Audit| |Audit | |Audit |
                                |Conf | |Conn | |Dialog| |Stream|
                                +-----+ +-----+ +------+ +------+

   o  MSML Core package (Mandatory)

      Describes the minimum base framework which MUST be implemented
      to support additional core packages.

   o  MSML Conference Core package (Conditionally Mandatory, for
      Conferencing)

      Describes the audio and multimedia basic and advanced
      conferencing package, which MAY be implemented.

   o  MSML Dialog Core package (Conditionally Mandatory, for Dialogs)

      Describes the dialog core package which MUST be implemented for
      any dialog services. However, systems supporting conferencing
      only MAY omit support for MSML dialogs. The MSML dialog core
      package specifies the framework within which additional dialog
      packages are supported. The MSML dialog base package MUST be
      supported, while all other dialog packages MAY be supported.
   o  MSML Dialog Base package (Conditionally Mandatory, for
      Dialogs)

   o  MSML Dialog Group package (Optional)

   o  MSML Dialog Transform package (Optional)

   o  MSML Dialog Fax Detection package (Optional)

   o  MSML Dialog Fax Send/Receive package (Optional)

   o  MSML Dialog Speech package (Optional)

   o  MSML Audit Core package (Conditionally Mandatory, for Auditing)

      Describes the audit core package which MUST be implemented to
      support auditing services. The MSML audit core package
      specifies the framework within which additional audit packages
      are supported.

   o  MSML Audit Conference package (Conditionally Mandatory, for
      Auditing Conference, Conference Dialog and Conference Stream)

   o  MSML Audit Connection package (Conditionally Mandatory, for
      Auditing Connection, Connection Dialog and Connection Stream)

   o  MSML Audit Dialog package (Conditionally Mandatory, for
      Auditing Dialog, and MUST be used with either MSML Audit
      Conference Package or MSML Audit Connection Package)

   o  MSML Audit Stream package (Conditionally Mandatory, for
      Auditing Stream, and MUST be used with either MSML Audit
      Conference Package or MSML Audit Connection Package)

   The formal process for defining extensions to MSML Dialogs is to
   define a new package. The new package MUST provide a text
   description of what extensions are included and how they work. It
   MUST also define an XML schema file (if applicable) that defines
   the new package (which may be through extension, restriction of an
   existing package, or a specific profile of an existing package).
   Dependencies upon other packages MUST be stated. For example, a
   package that extends or restricts another package has a dependency
   on the original package specification. Finally, the new package
   MUST be assigned a unique name and version.
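The dependency statements in the package list above lend themselves to a mechanical check of a proposed profile. The following Python sketch is illustrative only and is not part of MSML. The string "msml-core" is a hypothetical placeholder, because this section does not give the core package name. The EXTENDS table is one reading of the "This package extends ..." statements in the package descriptions. The function name profile_errors is invented for this example.

```python
# Illustrative sketch only: checks a set of MSML package names against the
# dependency rules described in Section 4.1. "msml-core" is a hypothetical
# placeholder name for the MSML Core package.
CORE = "msml-core"

# Packages each package extends (an assumed reading of the package text).
EXTENDS = {
    "msml-conf": {CORE},
    "msml-dialog": {CORE},
    "msml-audit": {CORE},
    "msml-dialog-base": {"msml-dialog"},
    "msml-dialog-transform": {"msml-dialog"},
    "msml-dialog-group": {"msml-dialog", "msml-dialog-base",
                          "msml-dialog-transform"},
    "msml-dialog-speech": {"msml-dialog", "msml-dialog-base"},
    "msml-dialog-fax-detect": {"msml-dialog"},
    "msml-dialog-fax-sendrecv": {"msml-dialog"},
    "msml-audit-conf": {"msml-audit"},
    "msml-audit-conn": {"msml-audit"},
    "msml-audit-dialog": {"msml-audit"},
    "msml-audit-stream": {"msml-audit"},
}


def profile_errors(packages):
    """Return a list of rule violations for a set of package names."""
    pkgs = set(packages)
    errors = []
    if CORE not in pkgs:
        errors.append("MSML Core package is mandatory")
    for pkg in sorted(pkgs - {CORE}):
        if pkg not in EXTENDS:
            errors.append(f"unknown package: {pkg}")
            continue
        # Every package this one extends must also be present.
        for dep in sorted(EXTENDS[pkg] - pkgs):
            errors.append(f"{pkg} requires {dep}")
    # Dialog Core implementations MUST also support Dialog Base.
    if "msml-dialog" in pkgs and "msml-dialog-base" not in pkgs:
        errors.append("msml-dialog requires msml-dialog-base")
    # Audit Dialog/Stream MUST be used with Audit Conference or Connection.
    for pkg in ("msml-audit-dialog", "msml-audit-stream"):
        if pkg in pkgs and not pkgs & {"msml-audit-conf", "msml-audit-conn"}:
            errors.append(f"{pkg} requires msml-audit-conf or msml-audit-conn")
    return errors
```

For example, a profile containing only the core, audit core and audit dialog packages is reported as missing either msml-audit-conf or msml-audit-conn. A profile that lists every dialog package named in this document, together with the core and dialog core packages, produces no errors.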
   The types of things which can be defined in new packages are:

   o  new primitives

   o  extensions to existing primitives (events, shadow variables,
      attributes, content)

   o  new recognition grammars for existing primitives

   o  new markup languages for speech generation

   o  languages for specifying a topology schema

   o  new pre-defined topology schemas

   o  new variables / segment types (sets & languages)

   o  new control flow elements

   MSML packages are assembled together to form a specific MSML
   profile that is shared between different implementations. The
   base MSML Dialog profiles which are defined in this document
   consist of the MSML Core package, MSML Dialog Core package, MSML
   Dialog Base package, MSML Dialog Group package, MSML Dialog
   Transform package, MSML Dialog Fax packages, and the MSML Dialog
   Speech package.

   MSML extension packages which define primitives MUST define the
   following for each primitive within the package:

   o  the function which the primitive performs

   o  the attributes which may be used to tailor its behavior

   o  the events which it is capable of understanding

   o  the shadow variables which provide access to information
      determined as a result of the primitive's operation.

   The mechanism used to ensure that a media server and its client
   share a compatible set of packages is not defined. Currently, it
   is expected that provisioning will be used, possibly coupled with
   a future auditing capability. Additionally, when used in SIP
   networks, packages could be defined using feature tags, and the
   procedures defined for Indicating User Agent Capabilities in SIP
   [i1] could be used to allow a media server to describe its
   capabilities to other user agents.

4.2 Profile Scheme

   Not all devices and applications using MSML will need to support
   the entire MSML schema.
   For example, a media processing device might support only audio
   announcements, only audio simple conferencing, or only multimedia
   IVR. It is highly desirable to have a system for describing what
   portion of MSML a particular media processing device or Control
   Agent supports.

   The Package scheme described earlier allows MSML functionality to
   be grouped by function, relying on the MSML core package. This
   scheme allows a portion of the complete MSML specification to be
   implemented on a per-package basis and also creates a framework
   for future extension packages. However, in some cases only a
   subset of the functionality within a given package may be
   required. In order to support subsets of packages with a greater
   degree of granularity than the package level, a profile scheme is
   required.

   MSML package profiles would identify a subset of a given MSML
   package with specific definitions of elements and attributes.
   Each MSML package profile MUST be accompanied by one or more
   corresponding schemas. To use the examples above, there could be
   an audio announcements profile of the MSML Dialog Base package, an
   audio simple conferencing profile of the MSML Conference Core
   package, and a multimedia IVR profile of the MSML Dialog Base
   package.

   MSML package profiles MUST be published separately from the MSML
   specification, in one or more standards documents (e.g.,
   Internet-Drafts or RFCs) dedicated to MSML package profiles.
   Profiles would not be registered with IANA, and any organization
   would additionally be free to create its own profile(s) if
   required.

5. Execution Flow

   MSML assumes a model where there is a single control context
   within a media server for MSML processing. That context may have
   one or many SIP [n1] dialogs associated with it.
   It is assumed that any SIP dialogs associated with the MSML
   control context have been authorized, as appropriate, by
   mechanisms outside the scope of MSML.

   A media server control context maintains information about the
   state of all media objects and media streams within a media
   server. It receives and processes all MSML requests from
   authorized SIP dialogs, and it receives all events generated
   internally by media objects and sends them on the appropriate SIP
   dialog. An MSML request is able to create new media objects and
   streams, and to modify or destroy any existing media objects and
   streams.

   An MSML request may simply specify a single action for a media
   server to undertake. In this case, the document is very similar
   to a simple command request. Often, though, it may be more natural
   for a client to request multiple actions at one time, or the
   client may want several actions to be closely coordinated by the
   media server. Multiple MSML elements received in a single request
   MUST be processed sequentially in document order.

   An example of the first scenario would be to create a conference
   and join it with an initial participant. An example of the second
   case would be to unjoin one or more participants from a main
   conference and join them to a sidebar conference. In the first
   scenario, network latencies may not be an issue, but it is
   simpler for the client to combine the requests. In the second
   case, the added network latency between separate requests could
   mean perceptible audio loss to the participant.

   Each MSML request is processed as a single transaction. A media
   server MUST ensure that it has the necessary resources available
   to carry out the complete transaction before executing any
   elements of the request.
   If it does not have sufficient resources, it MUST return a 520
   response and MUST NOT execute the transaction.

   The MSML request MUST be checked for well-formedness and validated
   against the schema prior to executing any elements. This allows
   XML [n2] errors to be reported immediately and minimizes failures
   within a transaction that would otherwise leave only part of the
   transaction executed.

   Each element is expected to execute immediately. Elements such as
   , which take an unpredictable amount of time, are "forked" and
   executed in a separate thread (see MSML Dialog packages). Once
   successfully forked, execution continues with the element
   following the . As such, MSML does not provide mechanisms to
   sequence or coordinate other operations with dialog elements.

   Processing within a transaction MUST stop if any errors occur.
   Elements that were executed prior to the error are not rolled
   back. It is the responsibility of the client to determine
   appropriate actions based upon the results indicated in the
   response. Most elements MAY contain an optional "mark" attribute.
   The value of that attribute from the last successfully executed
   element MUST be returned in an error response. Note that errors
   that occur during the execution of a dialog occur outside the
   context of an MSML transaction. These errors will be indicated in
   an asynchronous event.

   Transaction results are returned as part of the response to the
   SIP request. The transaction results indicate the success or
   failure of the transaction. The result MUST also include
   identifiers for any objects created by a media server for which
   the client did not provide an instance name.
   Additionally, if the transaction fails, the reason for the failure
   MUST be returned, and an indication of how much of the transaction
   was executed before the failure occurred SHOULD be returned.

6. Media Server Object Model

   Media servers are general-purpose platforms for executing real-time
   media processing tasks. These tasks range in complexity from
   simple ones, such as serving announcements, to complex ones, such
   as speech interfaces, centralized multimedia conferencing, and
   sophisticated gaming applications.

   Calls are established to a media server using SIP. Clients will
   often use SIP third party call control (3PCC) [i4] to establish
   calls to a media server on behalf of end users. However, MSML does
   not require that 3PCC be used; it requires only that the client
   and the media server share a common identifier for the call and
   its associated RTP [i3] sessions.

   Objects represent entities which source, sink, or modify media
   streams. A media stream is a bidirectional or unidirectional media
   flow between objects on a media server. The following subsections
   define the classes of objects that exist on a media server and the
   way these are identified in MSML.

6.1 Objects

   A media object is an endpoint of one or more media streams. It may
   be a connection that terminates RTP sessions from the network or a
   resource that transforms or manipulates media. MSML defines four
   classes of media objects. Each class defines the basic properties
   of how object instances are used within a media server. However,
   most classes require that the function of specific instances be
   defined by the client, using MSML or other languages such as
   VoiceXML.

   The following classes of media processing objects are defined.
   The class names are given in parentheses:

   o  network connection (conn)

   o  conference (conf)

   o  dialog (dialog)

   Network connection is an abstraction for the media processing
   resources involved in terminating the RTP session(s) of a call.
   For audio services, a connection instance presents a full-duplex
   audio stream interface within a media server. Multimedia
   connections have multiple media streams of different media types,
   each corresponding to an RTP session. Network connections get
   instantiated through SIP [n1].

   A conference represents the media resources and state information
   required for a single logical mix of each media type in the
   conference (e.g., audio and video). MSML models multiple
   mixes/views of the same media type as separate conferences. Each
   conference has multiple inputs. Inputs may be divided into classes
   that allow an application to request different media treatment
   for different participants. For example, the video streams for
   some participants may be assigned to fixed regions of the screen,
   while those for other participants may only be shown when they are
   speaking.

   A conference has a single logical output per media type. For each
   participant, it consists of the audio conference mix, less any
   contributed audio of the participant, and the video mix shared by
   all conference participants. Video conferences using voice
   activated switching have an optional ability to show the previous
   speaker to the current speaker.

   Conferences are instantiated using the element. The content of
   the element specifies the parameters of the audio and/or video
   mixes.

   Dialogs are a class of objects that represent automated
   participants.
   They are similar to network connections from a media flow
   perspective and may have one or more media streams as the
   abstraction for their interface within a media server. Unlike
   connections, however, dialogs are created and destroyed through
   MSML, and the media server itself implements the dialog
   participant. Dialogs are instantiated through the element. The
   contents of the element define the desired or expected dialog
   behavior. Dialogs may also be invoked by referencing VoiceXML as
   the dialog description language.

   Operators are functions that are used to filter or transform a
   media stream. The function that an instance of an operator
   fulfills is defined as a property of the media stream. Operators
   may be unidirectional or bidirectional and have a media type.
   Unidirectional operators reflect simple atomic functions such as
   automatic gain control, filtering tones from conferences, or
   applying specific gain values to a stream. Unidirectional
   operators have a single media input, which is connected to the
   media stream from one object, and a single media output, which is
   connected to the media stream of a different object.

   Bidirectional operators have two media inputs and two media
   outputs. One media input and output is associated with the stream
   to one object, and the other input and output is associated with a
   stream to a different object. Bidirectional operators may treat
   the media differently in each direction. For example, an operator
   could be defined which changes the media sent to a connection
   based upon recognized speech or DTMF received from the
   connection. Operators are implicitly instantiated when streams are
   created or modified using the elements and respectively.
   The relationships between the different object classes (conf,
   conn, and dialog) are shown in the figure below.

              +--------------------------------------+
              |  Media Server                        |
              |                                      |
              |------+                      ,--.     |
              |      |      +------+       /    \    |
   <== RTP ==>| conn |<---->| oper |<---->( conf )   |
              |      |      +------+       \    /    |
              |------+                      `--'     |
              |  ^                            ^      |
              |  |                            |      |
              |  |   +------+    +------+     |      |
              |  |   |      |    |      |     |      |
              |  +-->|dialog|    |dialog|<----+      |
              |      |      |    |      |            |
              |      +------+    +------+            |
              +--------------------------------------+

   A single, full-duplex instance of each object class is shown
   together with common relationships between them. An operator (such
   as gain) is shown between a connection and a conference, and
   dialogs are shown participating both with an individual connection
   and with a conference. The figure is not meant to imply only
   one-to-one relationships. Conferences will often have hundreds of
   participants, and either connections or conferences may be
   interacting with more than one dialog. For example, one dialog may
   be recording a conference while other dialogs announce
   participants joining or leaving the conference.

6.2 Identifiers

   Objects are referenced using identifiers that are composed of one
   or more terms. Each term specifies an object class and names a
   specific instance within that class. The object class and instance
   are separated by a colon ":" in an identifier term.

   Identifiers are assigned to objects when they are first created.
   In general, either the MSML client or a media server may specify
   the instance name for an object. Objects for which a client does
   not assign an instance name will be assigned one by a media
   server.
   Instance names assigned by a media server are returned to the
   client as a complete object identifier in the response to the
   request that created the object.

   It is meaningful for some classes of objects to exist
   independently on a media server. Network connections may be
   created through SIP at any time. MSML can then be used to
   associate their media with other objects as required to create
   services. Conferences may be created and have specific resources
   reserved while waiting for participant connections.

   Objects from these two classes, connections and conferences, are
   considered independent objects since they can exist on a
   standalone basis. Identifiers for independent objects consist of a
   single term as defined above. For example, identifiers for a
   conference and a connection could be "conf:abc" and "conn:1234"
   respectively. Clients which choose to assign instance names to
   independent objects must use globally unique instance names. One
   way to create globally unique names is to include the domain name
   of the client as part of the name.

   Dialogs are created to provide a service to independent objects.
   Dialogs may act as a participant in a conference or interact with
   a connection similar to a two-participant call. Dialogs depend
   upon the existence of independent objects, and this is reflected
   in the composition of their identifiers. Operators modify the
   media flow between other objects, such as application of gain
   between a connection and a conference. As operators are merely
   media transform primitives defined as properties of the media
   stream, they are not represented by identifiers and are created
   implicitly.

   Identifiers for dialogs are composed of a structured list of slash
   ('/') separated terms. The left-most term of the identifier must
   specify a conference or connection. This serves as the root for
   the identifier.
   An example of an identifier for a dialog acting as a conference
   participant could be:

      conf:abc/dialog:recorder

   All objects except connections are created using MSML. Connections
   are created when media sessions get established through SIP.
   There are several options clients and media servers can use to
   establish a shared instance name for a connection and its media
   streams.

   When media servers support multiple media types, the instance name
   SHOULD be a call identifier that can be used to identify the
   collection of RTP sessions associated with a call. When MSML is
   used in conjunction with SIP and third party call control, the
   call identifier MUST be the same as the local tag assigned by the
   media server to identify the SIP dialog. This will be the tag the
   media server adds to the "To" header in its response to an initial
   INVITE transaction. RFC 3261 requires the tag values to be
   globally unique.

   An example of a connection identifier is: conn:74jgd63956ts.

   With third party call control, the MSML client acts as a
   back-to-back user agent (B2BUA) to establish the media sessions.
   SIP dialogs are established between the client and the media
   server, allowing the use of the media server local tag as a
   connection identifier. If third party call control is not used, a
   SIP event package MAY be used to allow a media server to notify a
   client that has subscribed to this information of new sessions.

   Identifiers as described above allow every object in a media
   server to be uniquely addressed. They can also be used to refer to
   multiple objects. There are two ways in which this can currently
   be done:

      wildcards

      common instance names

   An identifier can reference multiple objects when a wildcard is
   used as an instance name.
   MSML reserves the instance name consisting of a single asterisk
   ('*') to mean all objects that have the same identifier root and
   class. Instance names containing an asterisk cannot be created.
   Wildcards MUST only be used as the right-most term of an
   identifier and MUST NOT be used as part of the root for dialog
   identifiers. Wildcards are only allowed where explicitly indicated
   below.

   The following are examples of valid wildcards:

      conf:abc/dialog:*

      conn:*

   An example of illegal wildcard usage is:

      conf:*/dialog:73849

   Although identifiers share a common syntax, MSML elements restrict
   the class of objects which are valid in a given context. As an
   example, although it is valid to join two connections together, it
   is not valid to join two IVR dialogs.

7. MSML Core Package

   This section describes the core MSML package which MUST be
   supported in order to use any other MSML packages. The core MSML
   package defines a framework, without explicit functionality, over
   which functional packages are used.

7.1

   is the root element. When received by a media server, it defines
   the set of operations that form a single MSML request. Operations
   are requested by the contents of the element. Each operation MAY
   appear zero or more times as children of . Specific operations
   are defined within the Conference package and in the set of Dialog
   packages.

   The results of a request or the contents of events sent by a media
   server are also enclosed within the element. The results of the
   transaction are included as a body in the response to the SIP
   request that contained the transaction. This response will contain
   any identifiers that the media server assigned to newly created
   objects. All messages that a media server generates are correlated
   to an object identifier.
   Objects and identifiers are discussed in section 6 (Media Server
   Object Model).

   Attributes:

      version: "1.1". Mandatory.

7.2

   Events are used to affect the behavior of different objects within
   a media server. The element is used to send an event to the
   specified recipient within the Media Server.

   Attributes:

      event: the name of an event. Mandatory.

      target: an object identifier. When the identifier is for a
      dialog, it may optionally be appended with a slash "/" followed
      by the target to be included in an MSML Dialog . Mandatory.

      valuelist: a list of zero or more parameters that are included
      with the event.

      mark: a token that can be used to identify execution progress
      in the case of errors. The value of the mark attribute from the
      last successfully executed MSML element is returned in an error
      response. Therefore, the value of each mark attribute within an
      MSML document should be unique.

7.3

   The element is used to report the results of an MSML transaction.
   It is included as a body in the final response to the SIP request
   which initiated the transaction. An optional child element may
   include text which expands on the meaning of error responses.
   Response codes are defined in section 11 (Response Codes).

   Attributes:

      response: a numeric code indicating the overall success or
      failure of the transaction, and in the case of failure, an
      indication of the reason. Mandatory.

      mark: in the case of an error, the value of the mark attribute
      from the last successfully executed element that included the
      mark attribute.

   In the case of failure, a description of the reason SHOULD be
   provided using the child element .
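The following Python sketch illustrates how the "response" and "mark" attributes of a result could be derived under the transaction rules of Section 5. It is a simplified illustration, not a normative algorithm. Elements are modeled as Python callables that raise on failure, and resource needs are modeled as a single integer. Every name in the sketch (Element, Result, run_transaction) is hypothetical. Only the code 520 comes from this document. The success value 200 and the generic failure value 400 are assumed placeholders for the codes defined in section 11.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

OK = 200            # assumed success code (placeholder; see section 11)
NO_RESOURCES = 520  # insufficient resources (Section 5)
FAILED = 400        # hypothetical placeholder for any execution error


@dataclass
class Element:
    action: Callable[[], None]   # raises an exception on failure
    mark: Optional[str] = None   # optional "mark" attribute
    resources: int = 0           # hypothetical resource units required


@dataclass
class Result:
    response: int
    mark: Optional[str] = None
    description: Optional[str] = None


def run_transaction(elements: List[Element], available: int) -> Result:
    # Check resources for the whole transaction before executing anything.
    if sum(e.resources for e in elements) > available:
        return Result(NO_RESOURCES, description="insufficient resources")
    last_mark = None
    for element in elements:  # strictly in document order
        try:
            element.action()
        except Exception as exc:
            # Stop at the first error; earlier elements are not rolled back.
            return Result(FAILED, mark=last_mark, description=str(exc))
        if element.mark is not None:
            last_mark = element.mark
    return Result(OK)
```

This sketch follows Section 5 in one design choice. The resource check runs before any element executes, so a 520 response never leaves a partially executed transaction. An execution error can leave one, and the returned mark tells the client how far execution got.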
   Two other child elements allow the response to include identifiers
   for objects created by the request but which did not have instance
   names specified by the client. Those elements are <confid> and
   <dialogid>, for objects created through a <createconference> and
   <dialogstart> respectively.

7.4 <event>

   The <event> element is used to notify an event to a media server
   client. Three types of events are defined by the MSML Core package:
   "msml.dialog.exit", "msml.conf.nomedia", and "msml.conf.asn". These
   correspond to the termination of an executing dialog, a conference
   being automatically deleted when the last participant has left, and
   the notification of the current set of active speakers for a
   conference, respectively. Events may also be generated by an
   executing dialog. In this case the event type is specified by the
   dialog (see MSML Dialog Core Package <send>).

   Attributes:

      name: the type of event. If the event is generated because of the
      execution of an MSML Dialog <send>, the value MUST be the value of
      the "event" attribute from the <send> element within the MSML
      Dialog Core package. If the event is generated because of the
      execution of an <exit>, the value MUST be "moml.exit". If the
      event is generated because of the execution of a <disconnect>,
      the value MUST be "moml.disconnect". If the event is generated
      because of an error, the value must be "moml.error". Mandatory.

      id: the identifier of the conference or dialog that generated the
      event or caused the event to be generated. Mandatory.

   <event> has two children, <name> and <value>, which contain the name
   and value respectively of each namelist item associated with the
   event.

8. MSML Conference Core Package

8.1 Conferences

   A conference has a mixer for each type of media that the conference
   supports.
   Each mix has a corresponding description that defines how the
   media from participants contributes to that mix. A mixer has
   multiple inputs that are combined in a media-specific way to create
   a single logical output.

   The elements that describe the mix for each media type are called
   mixer description elements. They are:

      <audiomix> defines the parameters for mixing audio media.

      <videolayout> defines the composition of a video window.

   These elements, defined in sections 8.6 (Audio Mix) and 8.7 (Video
   Layout) respectively, are used as content of the <createconference>
   element to establish the initial properties of a conference. The
   elements are used within the <modifyconference> element to change
   the properties of a conference once it has been created, or within
   the <destroyconference> element to remove individual mixes from the
   conference.

   Conferences may be terminated by an MSML client using the
   <destroyconference> element to remove the entire conference or by
   removing the last mixer(s) associated with the conference.
   Conferences can also be terminated automatically by a media server
   based on criteria specified when the conference is created. When the
   conference is deleted, any remaining participants will have their
   associated SIP dialogs left unchanged or deleted based on the value
   of the "term" attribute specified when the conference was created.

8.2 Media Streams

   Objects have at least one media input and output for each type of
   media that they support. Each object class defines the number of
   inputs and outputs that objects of that class support. Media streams
   are created when objects are joined, either explicitly using <join>,
   or implicitly when dialogs are created using <dialogstart>.
   Dialog creation has two stages: allocating and configuring the
   resources required for the dialog instance, and implicitly joining
   those resources to the dialog target during dialog execution. Refer
   to the MSML Dialog Base package.

   A join operation by default creates a bidirectional audio stream
   between two objects. Video and unidirectional streams may also be
   created. A media stream is created by connecting the output from one
   object to the input of another object and vice versa (assuming a
   bidirectional or full-duplex join).

   Many objects may only support a single input for each type of media.
   Within this specification, only the conference object class supports
   an arbitrary number of inputs. When a stream is requested to be
   created to an object that already has a stream of the same type
   connected to its single input, the result of the request depends
   upon the type of the media stream.

   Audio mixing is done by summing audio signals. Automatically mixing
   audio streams has common and straightforward applications. For
   example, the ability to bridge two streams allows for the easy
   creation of simple three-way calls, or for bridging private
   announcements with a [whispered] conference mix for an individual
   participant. In the case of general conferences, however, an MSML
   client SHOULD create an audio conference and then join participants
   to the conference. Conference mixers SHOULD subtract the audio of
   each participant from the mix so that they do not hear themselves.

   A media server that receives a request that requires joining an
   audio stream to the single audio input of an object that already has
   an audio stream connected SHOULD automatically bridge the new stream
   with the existing stream, creating a mix of the two audio streams.
   The maximum number of streams that may be bridged in this manner is
   implementation-specific. It is RECOMMENDED that a media server
   support bridging at least two streams. A media server that cannot
   bridge a new stream with any existing streams MUST fail the
   operation requesting the join.

   Unlike audio mixing, there are many different ways that two video
   streams may be combined and presented. For example, they may be
   presented side by side in separate panes, picture in picture, or in
   a single pane which displays only a single stream at a time based on
   a heuristic such as active speaker. Each of these options creates a
   very different presentation and requires significantly different
   media resources.

   A join operation does not describe how a new stream can be combined
   with an existing stream. Therefore, automatic bridging of video is
   not supported. A media server MUST fail requests to join a new video
   stream to an object that only supports a single video input and
   already has a video stream connected to that input. For an object to
   have multiple video streams joined to it, the object itself must be
   capable of supporting multiple video streams. Conference objects can
   support multiple video streams and provide a way to specify the
   mixing presentation for the video streams.

   A media server MUST NOT establish any streams unless the media
   server is able to create all the streams requested by an operation.
   Streams are only able to be created if both objects support a media
   type and at least one of the following conditions is true:

   1.  Each object that is to receive media is not already receiving a
       stream of that type.

   2.  Any object that is to receive media and is already receiving a
       stream of that type supports receiving an additional stream of
       that type.
       The only class of objects defined in this specification that
       directly supports receiving multiple streams of the same type
       is the conference class.

   3.  The media server is able to automatically bridge media streams
       for an object that is to receive media and that is already
       receiving a stream of the requested type. The only type of media
       defined in this specification that MAY be automatically bridged
       is audio.

   The directionality of media streams associated with a connection is
   modeled independently from what SDP [n9] allows for the
   corresponding RTP [i3] sessions. Media servers MUST respect the SDP
   in what they actually transmit but MUST NOT allow the SDP to affect
   the directionality when joining streams internal to the media
   server.

8.3 <createconference>

   <createconference> is used to allocate and configure the media
   mixing resources for conferences. A description of the properties
   for each type of media mix required for the conference is defined
   within the content of the <createconference> element. Mixer
   descriptions are described in the Audio Mix and Video Layout
   sections. When no mixer descriptions are specified, the default
   behavior MUST be equivalent to inclusion of a single <audiomix>.

   Clients can request that a media server automatically delete a
   conference when a specified condition occurs by using the
   "deletewhen" attribute. A value of "nomedia" indicates that the
   conference MUST be deleted when no participants remain in the
   conference. When this occurs, an "msml.conf.nomedia" event MUST be
   notified to the MSML client. A value of "nocontrol" indicates the
   conference MUST be deleted when the SIP [n1] dialog that carries the
   <createconference> element is terminated. When this occurs, a media
   server MUST terminate all participant dialogs by sending a BYE for
   their associated SIP dialog.
   A value of "never" MUST leave the ability to delete a conference
   under the control of the MSML client.

   Attributes:

      name: the instance name of the conference. If the attribute is not
      present, the media server MUST assign a globally unique name for
      the conference. If the attribute is present but the name is
      already in use, an error (432) will result and MSML document
      execution MUST stop. Events which the conference generates use
      this name as the value of their "id" attribute (see section 7.4
      (<event>)).

      deletewhen: defines whether a media server should automatically
      delete the conference. Possible values are "nomedia",
      "nocontrol", and "never". Default is "nomedia".

      term: when true, the media server MUST send a BYE request on all
      SIP dialogs still associated with the conference when the
      conference is deleted. Setting term equal to false allows clients
      to start dialogs on connections once the conference has
      completed. Default true.

      mark: a token which MAY be used to identify execution progress in
      the case of errors. The value of the mark attribute from the last
      successfully executed MSML element is returned in an error
      response. Therefore, the value of all mark attributes within an
      MSML document should be unique.

   An example of creating an audio conference is shown below. This
   conference allows at most two participants to contend to be heard
   and reports the set of active speakers no more frequently than every
   ten seconds.

8.3.1 <reserve>

   Conference resources may be reserved by including the <reserve>
   element as a child of <createconference>. <reserve> allows the
   specification of a set of resources which a media server will
   reserve for the conference.
   Any requests for resources beyond those that have been reserved
   should be honored on a best-effort basis by a media server.

   Attributes:

      required: boolean that specifies whether <createconference>
      should fail if the requested resources are not available. When
      set to false, the conference will be created, with no reserved
      resources, if the complete reservation cannot be honored.
      Default true.

8.3.1.1 <resource>

   The resources to be reserved are defined using <resource>. The
   contents of these elements describe a resource that is to be
   reserved. Descriptions are implementation-dependent. Media servers
   that support MSML Dialogs may use the elements from that package as
   the basis for resource descriptions. Each resource element may use
   the attribute "n" to define the quantity of the resource to reserve.

   Attributes:

      n: number of resources to be reserved. Default 1.

      type: specifies whether the resource is to be reserved for each
      individual participant or reserved as a shared conference
      resource. Valid values for this attribute are "individual" or
      "shared". Default "individual".

   For example, the following creates a conference and reserves two
   types of resources. One resource element may represent resources
   that are shared by all participants of the conference, while the
   other may represent resources that are reserved for each of the
   expected participants.

8.4 <modifyconference>

   All of the properties of an audio mix or the presentation of a video
   mix may be changed during the life of a conference using the
   <modifyconference> element. Changes to an audio mix are requested by
   including an <audiomix> element as a child of <modifyconference>.
   This may also be used to add an audio mixer to the conference if
   none was previously allocated.
   Changes to a video presentation are requested by including a
   <videolayout> element as a child of <modifyconference>. Similar to
   an audio mixer, this may be used to add a video mixer if none was
   previously allocated.

   Mixers are removed by including a mixer description element within
   <destroyconference>.

   Features and presentation aspects are enabled/added or modified by
   including the element(s) that define the feature or presentation
   aspect within a mixer description. The complete specification of the
   element must be included just as it would be included when the
   conference is created. The new definition completely replaces any
   previous definition that existed. Only things that are defined by
   elements included in the mixer descriptions are affected. Any
   existing configuration aspects of a conference which are not
   specified within the <modifyconference> element MUST maintain their
   current state in the media server.

   For example, if an MSML client wanted to change the minimum
   reporting interval for active speaker notification from that shown
   in the Conference Examples section, it would send the following to
   the media server:

   This would also enable active speaker notification if it had not
   previously been enabled. The N-loudest mixing is unaffected.

   Multiple elements MAY be included in the mixer descriptions, similar
   to when conferences are created. For example, in a video conference,
   the video mix description (<videolayout>) could specify that the
   layout of the video being displayed should change such that the
   regions currently displaying participants get smaller and new
   region(s) are created to support additional participants. A media
   server MUST make all of the requested changes or none of the
   requested changes.
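   The replacement and all-or-nothing rules for <modifyconference> can
   be sketched in Python as follows. This illustrates the rules under
   stated assumptions and is not a media server implementation. A mixer
   description is modeled as a dictionary from child element names
   (such as "n-loudest" or "asn") to attribute dictionaries. The
   attribute values are illustrative strings. can_apply is a
   hypothetical callback standing in for whatever checks a media server
   performs.

```python
import copy
from typing import Callable, Dict

# Hypothetical model of a mixer description: element name -> attributes.
MixerDescription = Dict[str, Dict[str, str]]


def modify_mixer(
    current: MixerDescription,
    changes: MixerDescription,
    can_apply: Callable[[str, Dict[str, str]], bool],
) -> MixerDescription:
    """Return the mixer description after applying <modifyconference> changes.

    Each element in `changes` completely replaces any previous definition of
    that element; elements not mentioned keep their current state. If any
    change cannot be applied, none are applied and `current` is returned.
    """
    if not all(can_apply(name, attrs) for name, attrs in changes.items()):
        return current  # all or nothing: reject the whole request
    updated = copy.deepcopy(current)
    for name, attrs in changes.items():
        updated[name] = dict(attrs)  # full replacement, not a merge of attributes
    return updated
```

   The new definition replaces the old one completely. An attribute
   present in the old <asn> but omitted from the new one is therefore
   not carried over.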
   Additional examples of modifying conferences are presented in the
   Conference Examples section.

   Attributes:

      id: the identifier for a conference. Wildcards MUST NOT be used.
      Mandatory.

      mark: a token which can be used to identify execution progress in
      the case of errors. The value of the mark attribute from the last
      successfully executed MSML element is returned in an error
      response. Therefore, the value of all "mark" attributes within an
      MSML document SHOULD be unique.

8.5 <destroyconference>

   Destroy conference is used to delete mixers or to delete the entire
   conference and all state and shared resources. When a mixer is
   removed, all of the streams joined to that mixer are unjoined. When
   a conference is destroyed, SIP dialogs for any remaining
   participants MUST be maintained or removed based on the value of the
   "term" attribute when the conference was created.

   When there is no element content, <destroyconference> deletes the
   entire conference. Individual mixer(s) are removed by including a
   mixer description element identifying the mix(es) to be removed as
   content to <destroyconference>. <audiomix> is used to remove audio
   mixers and <videolayout> is used to remove video mixers. When one or
   more mixer descriptions are specified, the media server MUST only
   delete the specified mixers and MUST NOT affect any other existing
   mixers. When <audiomix> or <videolayout> are identified for
   individual removal, other feature aspects of the mix MUST NOT be
   included. If specified, the media server MUST ignore any such
   elements. When the last mixer is removed from a conference, a media
   server MUST remove all conference state, leaving or removing any
   remaining SIP dialogs as described above.

   Attributes:

      id: the identifier for a conference. Mandatory.

      mark: a token which can be used to identify execution progress in
      the case of errors.
      The value of the mark attribute from the last
      successfully executed MSML element is returned in an error
      response. Therefore, the value of all "mark" attributes within an
      MSML document SHOULD be unique.

8.6 <audiomix>

   The properties of the overall audio mix are specified using the
   <audiomix> element.

   Attributes:

      id: an optional identifier for the audio mix.

      samplerate: an integer value that specifies the sample rate (in
      Hz) for the audio mixer. Optional, default value of 8000.

   An example of the description for an audio mix is:

8.6.1 <n-loudest>

   The <n-loudest> element defines that participants contend to be
   included in the conference mix based upon their audio energy. When
   the <n-loudest> element is not present, all participants are mixed.

   Attributes:

      n: the number of participants that will be included in the audio
      mix based upon having the greatest audio energy. Mandatory.

8.6.2 <asn>

   The <asn> element enables notification of active speakers. Active
   speakers MUST be notified using the <event> element with an event
   name of "msml.conf.asn". The namelist of the event consists of the
   set of active speakers. The name of each item is the string
   "speaker", with a value of the connection identifier for the
   connection.

   Attributes:

      ri: the minimum reporting interval defines the minimum duration of
      time which must pass before changes to active speakers will be
      reported. A value of zero disables active speaker notification.

      asth: specifies the active speaker threshold (in units of dBm0).
      Valid value range is 0 to -96. Optional, default is -96.
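   The following Python sketch shows one way a media server might
   select the n loudest participants and rate-limit "msml.conf.asn"
   notifications. Several details are assumptions and are not required
   by this document:

   o  energies and levels are plain numbers per connection identifier;

   o  a connection counts as an active speaker when its level is at or
      above asth;

   o  ri is expressed in seconds;

   o  the namelist is returned as ("speaker", identifier) pairs rather
      than as an <event> body.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple


def n_loudest(energy: Dict[str, float], n: int) -> List[str]:
    """Connections mixed under <n-loudest>: the n with the greatest energy."""
    return sorted(energy, key=lambda conn: energy[conn], reverse=True)[:n]


def active_speakers(level_dbm0: Dict[str, float], asth: float = -96.0) -> FrozenSet[str]:
    """Assumed reading of asth: a connection is active at or above the threshold."""
    if not -96.0 <= asth <= 0.0:
        raise ValueError("asth must be between 0 and -96 dBm0")
    return frozenset(conn for conn, level in level_dbm0.items() if level >= asth)


class ActiveSpeakerNotifier:
    """Rate-limits msml.conf.asn notifications to at most one per ri seconds."""

    def __init__(self, ri: float) -> None:
        self.ri = ri  # minimum reporting interval; seconds is an assumed unit
        self.last_time: Optional[float] = None
        self.last_set: FrozenSet[str] = frozenset()

    def update(self, now: float, speakers: FrozenSet[str]) -> Optional[List[Tuple[str, str]]]:
        """Call periodically; returns the event namelist when a report is due."""
        if self.ri == 0 or speakers == self.last_set:
            return None  # ri of zero disables notification, or nothing changed
        if self.last_time is not None and now - self.last_time < self.ri:
            return None  # too soon; a later call reports the pending change
        self.last_time, self.last_set = now, speakers
        return [("speaker", conn) for conn in sorted(speakers)]
```

   The notifier is polled rather than driven by a timer. A change that
   is suppressed because it arrives too soon is reported by the first
   call made after ri has elapsed.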
   An example of an active speaker notification is:

      speaker    conn:hd93tg5hdf
      speaker    conn:w8cn59vei7
      speaker    conn:p78fnh6sek47fg

8.7 <videolayout>

   A video layout is specified using the <videolayout> element. It is
   used as a container to hold elements that describe all of the
   properties of a video mix. The parameters of the window that
   displays the video mix are defined by the <root> element. When the
   video mix is composed of multiple panes, the location and
   characteristics of the panes are defined by one or more <region>
   elements. A <region> element is not required when only a single
   video stream is displayed at one time and none of the visual
   attributes of regions are required.

   Some regions may be used to display a video stream based on
   selection criteria, rather than having a video stream of a single
   participant continuously presented in the region. One such example
   is a distance learning lecture where the instructor sees each of the
   students periodically displayed in a region. When a region is used
   to display one of a number of streams, it is placed as a child of a
   <selector> element.

   Attributes:

      type: specifies the language used to define the layout. Layouts
      defined using MSML MUST use the value "text/msml-basic-layout".
      This is the same convention as defined for the layout package
      from the W3C SMIL 2.0 specification [i6]. The default when
      omitted is "text/msml-basic-layout".

      id: an optional identifier for the video layout.

8.7.1 <root>

   The <root> element describes the root window or virtual screen in
   which the conference video mix will be displayed. Simple conferences
   can display participant video directly within the root window, but
   more complex conferences will use regions for this purpose.
   Areas of the window which are not used to display video will show
   the root window background.

   All video presentations require a root window. It MUST be present
   when a video mix is created and it cannot be deleted; however, its
   attributes MAY be changed using the <modifyconference> element.

   Attributes:

      size: the size of the root window, specified as one of the five
      standard common intermediate formats (e.g., CIF, QCIF).

      backgroundcolor: the color for the root window background,
      defined using the values for the "background-color" property of
      the CSS2 specification [n10].

      backgroundimage: the URI for an image to be displayed as the root
      window background. Transparent portions of the image allow the
      background color to show through.

8.7.2 <region>

   <region> elements define video panes that are used to display
   participant video streams. Regions are rendered on top of the root
   window.

   The size of a region is specified relative to the size of the root
   window using the "relativesize" attribute. Relative sizes are
   expressed as fractions (e.g., 1/4, 1/3) that preserve the aspect
   ratio of the original video stream while allowing for efficient
   scaling implementations.

   Regions are located on the root window based on the value of the
   position attributes "top" and "left". These attributes define the
   position of the top left corner of the region as an offset from the
   top left corner of the root window. Their values may be expressed
   either as a number of pixels or as a percent of the vertical or
   horizontal dimension of the root window. Percent values are appended
   with a percent ('%') character. Percent values of "33%" and "67%"
   should be interpreted as "1/3" and "2/3" to allow easy alignment of
   regions whose size is expressed relative to the size of the root
   window.
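   The position and size rules above can be sketched in Python as
   follows. The pixel dimensions are the conventional ITU-T values for
   the five common intermediate formats; this document itself names
   only CIF and QCIF. Rounding results down to whole pixels is an
   assumption. The helper names are hypothetical.

```python
import math
from fractions import Fraction
from typing import Tuple

# Conventional pixel dimensions of the five common intermediate formats.
ROOT_SIZES = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}

# Percent values that are to be read as exact thirds.
EXACT_PERCENTS = {"33": Fraction(1, 3), "67": Fraction(2, 3)}


def offset_pixels(value: str, dimension: int) -> Fraction:
    """Convert a "left" or "top" value (pixels or percent) to pixels."""
    if value.endswith("%"):
        number = value[:-1]
        ratio = EXACT_PERCENTS.get(number, Fraction(number) / 100)
        return ratio * dimension
    return Fraction(value)  # plain pixel offset


def region_rect(size: str, left: str, top: str, relativesize: str) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of a region in pixels, rounded down."""
    root_w, root_h = ROOT_SIZES[size]
    scale = Fraction(relativesize)  # e.g. "1/3"
    return (
        math.floor(offset_pixels(left, root_w)),
        math.floor(offset_pixels(top, root_h)),
        math.floor(scale * root_w),
        math.floor(scale * root_h),
    )
```

   Consider a CIF root window. A region with left "67%", top "0", and
   relativesize "1/3" occupies the 117 by 96 pixel area starting at
   x = 234. That edge lines up exactly with the right edge of a region
   of relativesize "2/3" placed at the origin.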
   An example of a video layout with six regions is:

      +-------+---+
      |       | 2 |
      |   1   +---+
      |       | 3 |
      +---+---+---+
      | 6 | 5 | 4 |
      +---+---+---+

   The area of the root window covered by a region is a function of the
   region's position and its size. When areas of different regions
   overlap, they are layered in order of their "priority" attribute.
   The region with the highest value for the "priority" attribute is
   below all other regions and will be hidden by overlapping regions.
   The region with the lowest non-zero value for the "priority"
   attribute is on top of all other regions and will not be hidden by
   overlapping regions. The priority attribute may be assigned values
   between 0 and 1. A value of zero disables the region, freeing any
   resources associated with the region and unjoining any video stream
   displayed in the region.

   Regions that do not specify a priority will be assigned a priority
   by a media server when a conference is created. The first region
   within the <videolayout> element that does not specify a priority
   will be assigned a priority of one, the second a priority of two,
   and so on. In this way, all regions that do not explicitly specify a
   priority will be underneath all regions that do specify a priority.
   Regions that do not specify a priority are themselves layered from
   top to bottom in the order they appear within the <videolayout>
   element.

   For example, if a layout was specified as follows:

   Then the regions would be layered, from top to bottom, c, a, b, d.

   Portions of regions that extend beyond the root window will be
   cropped.
   For example, a layout specified as:

   would appear similar to:

      +-----------+
      |   root    |
      |background |
      |     +-----+--
      |     |     |//
      |     | foo |//
      +-----+-----+//
            |////////

   Visual attributes are used to define aspects of the visual
   appearance of individual regions. A border may be defined together
   with a title and/or logo. Text and logos are displayed as images on
   top of the region's video, below all regions with a lower priority.
   The visual attributes are "title", "titletextcolor",
   "titlebackgroundcolor", "bordercolor", "borderwidth", and "logo".

   Visual attributes can also be defined for individual streams (Video
   Stream Properties). When visual attributes are specified as part of
   both a region and a stream, those associated with the stream MUST
   take precedence. This allows streams that are chosen for display
   automatically (Stream Selection) to have proper text and logos
   displayed. The region visual attributes are displayed when no stream
   is associated with the region.

   Two other attributes associated with a region, "blank" and "freeze",
   define the state of the video displayed in the region. When the
   blank or freeze attribute is assigned the value "true", the media
   server MUST display the region as a blank region or with the video
   image frozen at the last received frame, respectively.

   These attributes are specified for a region and are not allowed for
   streams, because that appears to be the common use case. Applying
   them to streams would allow only that stream to be affected within a
   selector while other streams continue to display normally. Except
   for personal mixing scenarios, the same effect can be achieved by
   having the participant mute their own transmission to the media
   server.
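   The layering and cropping rules can be sketched in Python as shown
   below. This is one reading of the rules and not a normative
   algorithm. The sketch makes three assumptions:

   o  regions are given as (id, priority) pairs in document order;

   o  rectangles are (x, y, width, height) tuples with non-negative
      offsets;

   o  when an explicit priority equals an assigned default, the region
      with the explicit priority is placed on top, so that regions
      without a priority stay underneath as the text requires.

```python
from typing import List, Optional, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def stacking_order(regions: Sequence[Tuple[str, Optional[float]]]) -> List[str]:
    """Return region ids from top to bottom.

    regions: (id, priority) pairs in the order they appear in <videolayout>;
    priority is None when the attribute is omitted.
    """
    keyed = []
    next_default = 1
    for index, (region_id, priority) in enumerate(regions):
        if priority is None:
            # Omitted priorities get 1, 2, 3, ... and sort below explicit ones.
            keyed.append((next_default, 1, index, region_id))
            next_default += 1
        elif priority == 0:
            continue  # a priority of zero disables the region
        else:
            keyed.append((priority, 0, index, region_id))
    # The lowest priority value is rendered on top.
    return [entry[-1] for entry in sorted(keyed)]


def crop(rect: Rect, root_w: int, root_h: int) -> Rect:
    """Clip a region to the root window (offsets assumed non-negative)."""
    x, y, w, h = rect
    return (x, y, max(0, min(w, root_w - x)), max(0, min(h, root_h - y)))
```

   As an example, consider regions a (priority 0.5), b (no priority),
   c (priority 0.2), and d (no priority). The sketch layers them c, a,
   b, d from top to bottom.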
   Attributes associated with each region:

      id: a name that can be used to refer to the region.

      left: the position of the region from the left side of the root
      window.

      top: the position of the region from the top of the root window.

      relativesize: the size of the region expressed as a fraction of
      the root window size.

      priority: a number between 0 and 1 that is used to define the
      precedence when rendering overlapping regions. A value of zero
      disables the region.

      title: text to be displayed as the title for the region.

      titletextcolor: the color of the title text.

      titlebackgroundcolor: the color of the title text background.

      bordercolor: the color of the region border.

      borderwidth: the width of the region border.

      logo: the URI of an image file to be displayed.

      freeze: a boolean value, with a default of false, that defines
      whether the video image should be frozen at the currently
      displayed frame.

      blank: a boolean value, with a default of false, that defines
      whether the region should display black instead of the associated
      video stream.

8.7.3 <selector>

   It is often desired that one of several video streams be
   automatically selected to be displayed. The <selector> element is
   used to define the selection criteria and its associated parameters.
   The selection algorithm is specified by the "method" attribute.
   Currently defined selection methods allow for voice-activated
   switching and for iterating sequentially through the set of
   associated video streams.

   The regions that will display the selected video stream are placed
   as child elements of the <selector> element. Including regions
   within a <selector> element does not affect their layout with
   respect to regions not subject to the selection.
   For simple video conferences that display the video directly in the
   root window, the <selector> element can be placed as a child of
   <root>. Region elements MUST NOT be used in this case.

   For example, below is a common video layout that allows the video
   stream from the currently active speaker to be displayed in the
   large region ("1") at the top left of the layout, while the streams
   from five other participants are displayed in regions located at the
   layout periphery.

      +-------+---+
      |       | 2 |
      |   1   +---+
      |       | 3 |
      +---+---+---+
      | 6 | 5 | 4 |
      +---+---+---+

   All selector methods must be defined so that they work if only a
   single region is a child of the selector. Selector methods that
   support more than one child region MUST specify how the method works
   across multiple regions. Media server implementations MAY support
   only a single region for methods that are defined to allow multiple
   regions.

   The selector or region for a participant's video is defined using
   the "display" attribute of <stream> during a join operation.
   Specifying a selector allows the stream to be displayed according to
   the criteria defined by the selector method. Specifying a region
   supports continuous presence display of participants. Some streams
   may be joined with both a selector and a region. In this case, the
   value of the selector's "blankothers" attribute defines whether the
   streams associated with a continuous presence region should be
   blanked when the stream is selected for display in one of the
   selector regions.

   Attributes common to all selector methods are:

      id: a name that can be used to refer to the selector.

      method: the name of the method used to select the video stream.
      A value of "vas" (see the section on Voice Activated Switching)
      MAY be specified.

      status: specifies whether the selector is "active" or "disabled".

      blankothers: when "true", video streams that are also displayed in
      continuous presence regions will have the continuous presence
      regions blanked when the stream is displayed in a selection
      region.

8.7.3.1 Voice Activated Switching (vas)

   Voice activated switching (VAS) is used to display the video stream
   that correlates with the participant who is currently speaking. It
   is specified using a selector method value of "vas".

   If the video stream associated with the active speaker is not
   currently displayed in a selection region, then it replaces the
   video in the region that is displaying the video of the speaker that
   was least recently active. If the video of the active speaker is
   currently displayed in a selection region, then there is no change
   to any region. When VAS is applied to a single region, this has the
   effect that the current speaker is displayed in that region.

   Attributes:

      si: switching interval is the minimum period of time that must
      elapse before allowing the video to switch to the active speaker.

      speakersees: defines whether the active speaker sees the "current"
      speaker (themselves) or the "previous" speaker.

8.8 <join>

   <join> is used to create one or more streams between two independent
   objects. Streams may be audio or video and may be bidirectional or
   unidirectional. A bidirectional stream is implicitly composed of two
   unidirectional streams that can be manipulated independently. The
   streams to be established are specified by <stream> elements as the
   content of <join>.

   Without any content, <join> by default establishes a bidirectional
   audio stream.
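   The stream admission rules of section 8.2 apply to the streams a
   <join> creates. They can be sketched in Python as follows. The model
   is hypothetical:

   o  inputs maps an (object identifier, media type) pair to the list
      of sources feeding that input;

   o  only conferences accept an unlimited number of inputs;

   o  other objects may bridge up to max_bridged_audio audio streams
      (two, the RECOMMENDED minimum, is assumed) and accept a single
      video stream;

   o  whether both objects support the media type is not modeled.

   As in section 8.2, no stream is created unless all requested streams
   can be.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical model: inputs[(object_id, media)] lists the sources feeding it.
Inputs = Dict[Tuple[str, str], List[str]]
Stream = Tuple[str, str, str]  # (media, source id, destination id)


class JoinRefused(Exception):
    """Raised when not every requested stream can be created."""


def is_conference(object_id: str) -> bool:
    """True when the right-most term of the identifier is a conference."""
    return object_id.rsplit("/", 1)[-1].startswith("conf:")


def join_streams(inputs: Inputs, id1: str, id2: str,
                 streams: Optional[Sequence[Stream]] = None,
                 max_bridged_audio: int = 2) -> None:
    """Create the requested unidirectional streams, all or nothing."""
    if streams is None:
        # No <stream> content: a bidirectional audio stream.
        streams = [("audio", id1, id2), ("audio", id2, id1)]
    requested = Counter((dst, media) for media, _src, dst in streams)
    # First check every receiving input; nothing is modified yet.
    for (dst, media), extra in requested.items():
        if is_conference(dst):
            continue  # conferences accept any number of inputs
        total = len(inputs.get((dst, media), [])) + extra
        limit = max_bridged_audio if media == "audio" else 1  # only audio is bridged
        if total > limit:
            raise JoinRefused(f"{dst} cannot receive another {media} stream")
    # All checks passed: create every stream.
    for media, src, dst in streams:
        inputs.setdefault((dst, media), []).append(src)
```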
   When only a stream of a single type has previously been created
   between two objects, or when only a unidirectional stream exists,
   <join> can be used to add a stream of another media type or to make
   the stream bidirectional by including the necessary <stream>
   elements. Bidirectional streams are made unidirectional by using
   <unjoin> (Section 8.10) to remove the unidirectional stream for the
   direction that is no longer required.

   In addition to defining the media type and direction of streams,
   <stream> elements are also used to establish the properties of
   streams, such as gain, voice masking, or tone clamping of audio
   streams, or labels and other visual characteristics of video streams.
   Properties are often defined asymmetrically for a single direction of
   a stream. If one direction of a bidirectional stream is to have
   different properties from the other direction, then two <stream>
   elements, one for each direction, are required within the <join>.

   If a media server can provide services using both compressed and
   uncompressed media, the MSML client may need to indicate within
   requests which format is to be used. When compressed streams are
   created, both objects must use the same media format or an error
   response (450) is generated.

   Attributes:

      id1: an identifier of either a connection or conference.
      Wildcards MUST NOT be used. Mandatory. Any other object class
      results in a 440 error.

      id2: an identifier of either a connection or conference.
      Wildcards MUST NOT be used. Mandatory. Any other object class
      results in a 440 error.

      mark: a token which can be used to identify execution progress
      in the case of errors. The value of the mark attribute from the
      last successfully executed MSML element is returned in an error
      response. Therefore the value of all mark attributes within an
      MSML document SHOULD be unique.
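   The "mark" rule above can be illustrated with a short Python sketch.
   It is hypothetical and not part of MSML: the `Element` and `MsmlError`
   types are invented for illustration, 200 is assumed to be the success
   code, and "last successfully executed element" is read as the most
   recent successfully executed element that carried a mark.

```python
# Hypothetical sketch of "mark" tracking; not taken from the MSML specification.
from dataclasses import dataclass
from typing import Callable, List, Optional


class MsmlError(Exception):
    """Raised by an element's action; carries an MSML-style error code."""

    def __init__(self, code: int, text: str) -> None:
        super().__init__(text)
        self.code = code


@dataclass
class Element:
    tag: str                    # e.g. "join", "unjoin"
    mark: Optional[str]         # value of the optional "mark" attribute
    action: Callable[[], None]  # raises MsmlError if the element fails


def execute(elements: List[Element]) -> dict:
    """Run elements in document order, stopping at the first error."""
    last_mark: Optional[str] = None
    for element in elements:
        try:
            element.action()
        except MsmlError as err:
            # The error response carries the mark of the last element that succeeded.
            return {"code": err.code, "mark": last_mark}
        if element.mark is not None:
            last_mark = element.mark
    return {"code": 200, "mark": last_mark}  # 200 assumed to mean success


def succeed() -> None:
    pass


def bad_object_class() -> None:
    raise MsmlError(440, "invalid object class")


doc = [
    Element("join", "m1", succeed),
    Element("join", "m2", succeed),
    Element("join", "m3", bad_object_class),  # e.g. id1 names an unsupported object
]
print(execute(doc))  # {'code': 440, 'mark': 'm2'}
```

   Because the error response identifies the failure point only through
   this value, duplicate marks would make the report ambiguous, which is
   why marks should be unique within a document.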
   For example, consider a call center coaching scenario where a
   supervisor can listen to the conversation between an agent and a
   customer, and provide hints to the agent that are not heard by the
   customer. One join establishes a stream between the agent and the
   customer and another join establishes a stream between the agent and
   the supervisor. A third join is used to establish a half-duplex
   stream from the customer to the supervisor. The media server
   automatically bridges the media streams from the customer and the
   supervisor for the agent, and from the customer and the agent for the
   supervisor.

   Assuming the following connections, each with a single audio stream:

      conn:supervisor

      conn:agent

      conn:customer

   The following would create the media flows previously described:

      <msml version="1.1">
         <join id1="conn:agent" id2="conn:customer"/>
         <join id1="conn:agent" id2="conn:supervisor"/>
         <join id1="conn:customer" id2="conn:supervisor">
            <stream media="audio" dir="from-id1"/>
         </join>
      </msml>

   The following example shows joining a participant to a multimedia
   conference. It assumes that the conference has a video presentation
   region named "topright". The "display" attribute is explained in
   Section 8.12.2 (Video Stream Properties).

      [MSML example not preserved in this copy of the draft.]

8.9 <modifystream>

   Media streams can have different properties such as the gain for an
   audio stream or a visual label for a video stream. These properties
   are specified as the content of <stream> elements (Section 8.12).
   <modifystream> is used to change the properties of a stream by
   including one or more <stream> elements that are to have their
   properties changed.

   Stream properties MUST be set as specified by the <stream> element
   that is a child of the <modifystream> element. Any properties not
   included in the <stream> element when modifying a stream MUST remain
   unchanged. Setting a property for only one direction of a
   bidirectional stream MUST NOT affect the other direction.
   The directionality of streams can be changed by issuing an <unjoin>
   followed by a <join>. Any streams that exist between the two objects
   that are not included within <modifystream> MUST NOT be affected.

   Attributes:

      id1: an identifier of either a conference or a connection. The
      instance name MUST NOT contain a wildcard if "id2" contains a
      wildcard. Mandatory.

      id2: an identifier of either a conference or a connection. The
      instance name MUST NOT contain a wildcard if "id1" contains a
      wildcard. Mandatory.

      mark: a token which can be used to identify execution progress
      in the case of errors. The value of the mark attribute from the
      last successfully executed MSML element is returned in an error
      response. Therefore the value of all mark attributes within an
      MSML document are RECOMMENDED to be unique.

8.10 <unjoin>

   <unjoin> removes one or more media streams between two objects. In
   the absence of any content in the <unjoin> element, all media streams
   between the objects MUST be removed. Individual streams may be
   removed by specifying them using <stream> elements, while the
   unspecified streams MUST NOT be removed. A bidirectional stream is
   changed to a unidirectional stream by unjoining the direction that is
   no longer required, using the <stream> element. Operator elements
   MUST NOT be specified within <stream> elements when streams are being
   unjoined using the <unjoin> element. Any specified stream operators
   MUST be ignored.

   <unjoin> and <join> may be used together to move a media stream, such
   as from a main conference to a sidebar conference.

   Attributes:

      id1: an identifier of either a conference or a connection. The
      instance name MUST NOT contain a wildcard if "id2" contains a
      wildcard. Mandatory.

      id2: an identifier of either a conference or a connection. The
      instance name MUST NOT contain a wildcard if "id1" contains a
      wildcard.
      Mandatory.

      mark: a token which can be used to identify execution progress
      in the case of errors. The value of the mark attribute from the
      last successfully executed MSML element is returned in an error
      response. Therefore the value of all mark attributes within an
      MSML document SHOULD be unique.

   The following removes a participant from a conference and plays a
   leave tone for the remaining participants in the conference.

      [MSML example not preserved in this copy of the draft.]

8.11 <monitor>

   <monitor> is a specialized unidirectional join that copies the media
   that is destined for a connection object. One example use of
   <monitor> is quality monitoring within a conference. The media
   stream may be removed using the <unjoin> element (see Section 8.10).

   Attributes:

      id1: an identifier of the connection to be monitored.
      Mandatory. Any other object class results in a 440 error.
      Wildcards MUST NOT be used.

      id2: an identifier of the object which is to receive the copy
      of the media destined to id1. id2 may be a connection or a
      conference. Mandatory. Any other object class results in a 440
      error. Wildcards MUST NOT be used.

      compressed: "true" or "false". Specifies whether the join
      should occur before or after compression. When "true", id2 must
      be a connection using the same media format as id1 or an error
      response (450) is generated. Default is "false".

      mark: a token which can be used to identify execution progress
      in the case of errors. The value of the mark attribute from the
      last successfully executed MSML element is returned in an error
      response. Therefore the value of all mark attributes within an
      MSML document SHOULD be unique.

8.12 <stream>

   Individual streams are specified using the <stream> element.
   They MAY be included as a child element in any of the stream
   manipulation elements <join>, <modifystream>, or <unjoin>.

   The type of the stream is specified using a "media" attribute that
   uses values corresponding to the top-level MIME media types as
   defined in RFC 2046 [i7]. This specification only addresses audio
   and video media. Other specifications may define procedures for
   additional types.

   A bidirectional stream is identified when no direction attribute
   "dir" is present. A unidirectional stream is identified when a
   direction attribute is present. The "dir" attribute MUST have a value
   of "from-id1" or "to-id1" depending on the required direction. These
   values are relative to the identifier attributes of the parent
   element.

   The "compressed" attribute is used, when necessary, to indicate
   whether the stream carries compressed media. When the attribute is
   not present, the choice is implementation specific. Joining
   compressed streams acts much like an RTP [i3] relay.

   The properties of the media streams are specified as the content of
   <stream> elements when the <stream> element is used as a child of
   <join> or <modifystream>. Stream elements MUST NOT have any content
   when they are used as a child of <unjoin> to identify specific
   streams to remove.

   Some properties are defined within MSML as additional attributes or
   child elements of <stream> that are media type specific. Those for
   audio streams and video streams are defined in the following two
   subsections. Operators, viewed as properties of the media stream, MAY
   be specified as child elements of the <stream> element.

   Attributes:

      media: "audio" or "video". Mandatory.

      dir: "from-id1" or "to-id1".

      compressed: "true" or "false". Specifies whether the stream
      uses compressed media. Default is implementation specific.
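   Since a bidirectional stream is two unidirectional streams, the rules
   of Sections 8.8, 8.10, and 8.12 can be modelled as a set of
   unidirectional edges. The following Python sketch is a simplification
   under stated assumptions: `StreamTable` and `expand` are hypothetical
   names, only media type and direction are tracked (no properties,
   operators, or compression), and object classes are not validated.

```python
# Hypothetical model of <join>/<unjoin> stream semantics (media and direction only).
from typing import List, Optional, Set, Tuple

Edge = Tuple[str, str, str]             # (from object, to object, media)
StreamSpec = Tuple[str, Optional[str]]  # (media, dir) of a <stream> element


def expand(id1: str, id2: str, media: str, direction: Optional[str]) -> Set[Edge]:
    """Map one <stream> to unidirectional edges; dir is relative to id1."""
    if direction is None:               # no "dir": bidirectional, i.e. two edges
        return {(id1, id2, media), (id2, id1, media)}
    if direction == "from-id1":
        return {(id1, id2, media)}
    if direction == "to-id1":
        return {(id2, id1, media)}
    raise ValueError(f"invalid dir value: {direction!r}")


class StreamTable:
    def __init__(self) -> None:
        self.edges: Set[Edge] = set()

    def join(self, id1: str, id2: str,
             streams: Optional[List[StreamSpec]] = None) -> None:
        # None stands for a <join> without content: bidirectional audio.
        specs = [("audio", None)] if streams is None else streams
        for media, direction in specs:
            self.edges |= expand(id1, id2, media, direction)

    def unjoin(self, id1: str, id2: str,
               streams: Optional[List[StreamSpec]] = None) -> None:
        if streams is None:
            # <unjoin> without content removes every stream between the objects.
            self.edges = {e for e in self.edges if {e[0], e[1]} != {id1, id2}}
            return
        for media, direction in streams:
            self.edges -= expand(id1, id2, media, direction)


# The call center coaching scenario from Section 8.8.
table = StreamTable()
table.join("conn:agent", "conn:customer")
table.join("conn:agent", "conn:supervisor")
table.join("conn:customer", "conn:supervisor", [("audio", "from-id1")])
for edge in sorted(table.edges):
    print(edge)  # five edges; none runs from the supervisor to the customer
```

   In this model, table.unjoin("conn:agent", "conn:supervisor",
   [("audio", "to-id1")]) removes only the supervisor-to-agent
   direction, which mirrors how <unjoin> with a directional <stream>
   turns a bidirectional stream into a unidirectional one.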
8.12.1 Audio Stream Properties

   Audio mixes can be specified to only mix the N-loudest participants.
   However, there may be some "preferred" participants that are always
   able to contribute. When audio streams are joined to a conference
   that uses N-loudest audio mixing, preferred streams need to be
   identified.

   A preferred audio stream is identified using the "preferred"
   attribute. The "preferred" attribute MAY be used for an audio stream
   that is input to a conference and MUST NOT be used for other streams.

   Additional attributes of the <stream> element for audio streams are:

   Attributes:

      preferred: a boolean value that defines whether the stream is
      exempt from contention for N-loudest mixing. A value of "true"
      means that the stream MUST always be mixed, while a value of
      "false" means that the stream MAY contend for mixing into a
      conference when N-loudest mixing is enabled. Default "false".

   There are two elements that can be used to change the characteristics
   of an audio stream, as defined below.

8.12.1.1 <gain>

   The <gain> element may be used to adjust the volume of an audio media
   stream. It may be set to a specific gain amount, to automatically
   adjust the gain to a desired target level, or to mute the stream.

   Attributes:

      id: an optional identifier which may be referenced elsewhere
      for sending events to the gain primitive.

      amt: a specific gain to apply, specified in dB, or the string
      "mute" indicating that the stream should be muted. This
      attribute MUST NOT be used if "agc" is present.

      agc: boolean indicating whether automatic gain control is to be
      used. This attribute MUST NOT be used if "amt" is present.

      tgtlvl: the desired target level for AGC, specified in dBm0.
      This attribute MUST be specified if "agc" is set to "true".
      This attribute MUST NOT be specified if "agc" is not present.

      maxgain: the maximum gain that AGC may apply, specified in dB.
      This attribute MUST be used if "agc" is present and MUST NOT be
      used when "agc" is not present.

8.12.1.2 <clamp>

   The <clamp> element is used to filter tones and/or audio-band DTMF
   from a media stream.

   Attributes:

      dtmf: boolean indicating whether DTMF tones should be removed.

      tone: boolean indicating whether other tones should be removed.

8.12.2 Video Stream Properties

   Video mixes define a presentation that may have multiple regions,
   such as a quad-split. Each region displays the video from one or more
   participants. When video streams are joined to such a conference, the
   region that will display the video needs to be specified as part of
   the join operation.

   The region that will display the video is specified using the
   "display" attribute. The "display" attribute MUST be used for a video
   stream that is input to a conference and MUST NOT be used for other
   streams. The value of the attribute MUST identify a <region> or a
   <selector> (see Section 8.7.3) that is defined for the conference. A
   stream MUST NOT be directly joined to a region that is defined within
   a selector. Changing the value of the "display" attribute can be used
   to change where in a video presentation layout a video stream is
   displayed.

   Additional attributes of the <stream> element for video streams are:

   Attributes:

      display: the identifier of a video layout region or selector
      that is to be used to display the video stream.

      override: specifies whether or not the given video stream is
      the override source in the region defined by the "display"
      attribute. Valid values are "true" or "false". Optional;
      default value is "false".
      Only a video stream that is input to a conference can be the
      override source. A particular region can have at most one
      override source at a time. The most recently joined video
      stream with this attribute set to "true" becomes the override
      source. When there is an override source in place, its video is
      always displayed in the region, regardless of what video
      selection algorithm (either a selector or continuous presence
      mode) is configured for that region. Once the override source
      is cleared, the conference MUST revert to the original video
      selection algorithm.

8.12.2.1 <visual>

   Some regions of video conferences may display different streams
   automatically, such as when voice activated switching is used.
   Connections MAY also be joined directly without the use of video
   mixing. In these cases, the <visual> element may be used to define
   visual display properties for a stream.

   The <visual> element MAY use any of the visual attributes defined for
   regions (see the definition of <region>). This allows the visual
   aspects of regions within a <selector> to be tailored to the selected
   video stream, or allows streams that are directly joined to display a
   name or logo.

9. MSML Dialog Packages

9.1 Overview

   MSML Dialog Packages define an XML [n2] language for composing
   complex media objects from a vocabulary of simple media resource
   objects called primitives. It is primarily a descriptive or
   declarative language for describing media processing objects. MSML
   dialogs operate on one or more streams, which are identified by the
   MSML document outside the scope of the MSML dialog package.

   MSML Dialogs are intended to be used in different environments. As
   such, the language itself does not define how an MSML Dialog is used.
   Each environment in which MSML Dialog is used must define how it is
   used, the set of services provided, and the mechanism for passing
   information between the environment and MSML Dialog. The specific
   mechanisms used to realize the interface between MSML Dialog and its
   environment are platform specific.

   MSML Dialog packages provide two models for access to media resources
   and service creation building blocks. Both models MAY be used in
   conjunction with each other in a complementary manner. The first
   model (referred to as "Media Primitives and Composites", part of the
   mandatory MSML Dialog Base package) contains media primitives (such
   as digit collection and announcements) and composite functions (such
   as play and collect combined as a single operation). The second model
   (referred to as "Media Groups", part of the optional MSML Dialog
   Group package) allows complex customized interactions between media
   primitives to be defined, if required, via event passing mechanisms.

      MSML Dialog Core Package

         Defines the core framework over which all MSML dialog packages
         operate.

      MSML Dialog Base Package

         Media Primitives

            <dtmf> or <collect>   DTMF digit collection
            <play>                Playing of announcements
            <dtmfgen>             Generation of DTMF digits
            <tonegen>             Tone generation
            <record>              Media recording

         Media Composites

            <collect>   Supports play and collect operation.
                        Composite function with inclusion of play.
            <record>    Supports play and record operation.
                        Composite function with inclusion of play.

      MSML Dialog Group Package

            <group>     Allows grouping of media primitives for parallel
                        execution, with an event exchange mechanism
                        between the media primitives to achieve
                        customized media operations. All the above
                        media primitive elements are accepted within
                        the group.
   The following operations MUST be supported, using the elements
   described above from either the MSML Dialog Base Package or the MSML
   Dialog Group Package.

      Announcement only
         <play>

      Collection only
         <dtmf> or <collect>

      Recording only
         <record>

      Play and Collect
         <collect>, or
         a <group> containing <play> and <dtmf>

      Play and Record
         <record>, or
         a <group> containing <play> and <record>

   Additional MSML Dialog packages are:

      o MSML Dialog Transform Package

      o MSML Dialog Speech Package

      o MSML Fax Detection Package

      o MSML Fax Send/Receive Package

   MSML Dialogs MAY be used simply to expose primitive media resource
   objects, but will more often be used to describe dialog operations
   and media transformation objects which can be controlled via user
   interaction.

   MSML Dialogs do not contain any computation or flow control
   constructs. No results are automatically generated when media
   operations complete. Results MUST be explicitly requested using a
   <send> or <exit> element within the definition of the MSML Dialog.

9.2 Primitives

   Primitives perform a single function on one or more media streams,
   such as generating audio/video, recognizing speech or DTMF, or
   adjusting the gain. They may be composed so that primitives execute
   concurrently. Primitives not composed for concurrent execution MUST
   simply execute sequentially in the order they occur in an MSML
   document. All concurrently executing primitives in the same MSML
   object (defined in one MSML document) MAY interact with each other
   through events (see MSML Dialog Group package).

   Primitives are categorized into one of the following descriptive
   categories.

      o recognizers have a media input but no output.
      They allow different things within a media stream to be
      recognized or detected and for events to be generated based upon
      received media.

      o transformers have one media input and output and may send and
      receive events.

      o sources and sinks generate or consume media. They have either a
      media input or a media output but not both. They may receive
      and generate events.

      o composites combine underlying primitives to provide higher-level
      user interaction, without the need for specific event based
      exchange between the primitives. The composite elements provide
      a simpler mechanism for more commonly used services, such as
      play and collect or play and record.

   Primitives may define different media processing behavior (states)
   based upon the events which they receive. Primitives which support
   different processing states must define their default starting state
   and should support the "initial" attribute to allow that state to be
   specified when the primitive is instantiated. All primitives must
   support the "terminate" event class.

   The following types of primitives are defined within this
   specification:

      Recognizers    Transformers   Source/Sink    Composites
      ---------------------------------------------------------
      dtmf/collect   agc            play           dtmf/collect
      faxdetect      clamp          record         record
      speech         gain           dtmfgen
      vad            gate           tonegen
                     relay          faxsend
                                    faxrcv

   Primitives have shadow variables, similar to those within VoiceXML
   [n5], which are automatically assigned values when the primitives are
   used. Upon initialization of an MSML Dialog context, all shadow
   variables have the string value "undefined". Each primitive has its
   own instance of shadow variables which are global in scope to the
   entire MSML Dialog context.
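   A minimal Python sketch of this scoping follows, assuming a dialog
   context that stores one namespace of shadow variables per primitive
   instance. The `DialogContext` class and the shadow variable name
   "digits" are hypothetical, and keying an unnamed primitive by its
   type is an assumption chosen to reproduce the overwrite behavior
   described in the next paragraph.

```python
# Hypothetical sketch of shadow-variable scoping in an MSML Dialog context.
from collections import defaultdict
from typing import DefaultDict, Optional

UNDEFINED = "undefined"


class DialogContext:
    def __init__(self) -> None:
        # One namespace per primitive instance, global to the whole context;
        # a variable that was never assigned reads as the string "undefined".
        self._shadow: DefaultDict[str, DefaultDict[str, str]] = defaultdict(
            lambda: defaultdict(lambda: UNDEFINED))

    def instantiate(self, primitive_type: str, name: Optional[str] = None) -> str:
        # Assumption: an unnamed primitive is keyed by its type, so a second
        # unnamed instance of that type reuses (and overwrites) the namespace.
        return name if name is not None else primitive_type

    def assign(self, key: str, variable: str, value: str) -> None:
        # Only the primitive itself assigns; shadow variables are not user-writable.
        self._shadow[key][variable] = value

    def read(self, key: str, variable: str) -> str:
        return self._shadow[key][variable]


ctx = DialogContext()
first = ctx.instantiate("dtmf")
ctx.assign(first, "digits", "1234")
second = ctx.instantiate("dtmf")       # unnamed again: same namespace as `first`
ctx.assign(second, "digits", "9")
print(ctx.read(first, "digits"))       # 9 (the first result was overwritten)
pin = ctx.instantiate("dtmf", name="pin")
print(ctx.read(pin, "digits"))         # undefined
```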
   Names SHOULD be assigned to individual primitives when more than one
   primitive of the same type is used within one MSML document. Shadow
   variables are overwritten if a primitive that has not been named is
   instantiated a second time.

   Shadow variables cannot be modified under user control. They may be
   returned from the MSML Dialog context using the <send> element.

9.3 Events

   Events provide the mechanism for primitives to interact with each
   other and for an MSML context to interact with its external
   environment. The external environment is defined by the way in which
   an MSML context has been invoked. This will often be through MSML,
   but other languages and protocols such as SIP may also be used.

   Every primitive and group conceptually implements its own event
   queue. Events sent to a primitive or group are placed in its queue,
   then removed from that queue and processed in order. Primitives
   within a group conceptually have their own thread of execution.
   Because events from multiple queues are serviced asynchronously, it
   cannot be assumed that several events sent in sequence to different
   queues will be processed in the order in which they were sent. For
   example, if recognition of something led to sending events to two
   different primitives in turn, the second primitive may process its
   event before the first.

   Primitives each define the set of events which they support and the
   behavior associated with their handling of each event. This allows
   many types of behaviors to be defined. For example, VCR-type controls
   can be constructed by defining primitives which support events
   corresponding to each control. Media recognition/detection can be
   used to cause those events to be generated.
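   The queueing model above, together with the longest-match rule
   described in the following paragraphs, could be sketched in Python as
   below. Everything here is illustrative: `Primitive`, `post`,
   `process_next`, and the event name "terminate.silence" are
   hypothetical, a bare "terminate" is assumed to behave like a cancel
   for a recording, and ordering across different primitives' queues is
   deliberately not modelled, since no such guarantee exists.

```python
# Hypothetical sketch: one FIFO queue per primitive, longest-match dispatch.
from collections import deque
from typing import Callable, Deque, Dict, Optional

Handler = Callable[[str], str]


class Primitive:
    def __init__(self, name: str, handlers: Dict[str, Handler]) -> None:
        self.name = name
        self.handlers = handlers            # event name or name prefix -> behavior
        self.queue: Deque[str] = deque()    # each primitive owns its own queue

    def post(self, event: str) -> None:
        self.queue.append(event)

    def lookup(self, event: str) -> Optional[Handler]:
        # Longest match: for "terminate.frowning" try the full name, then "terminate".
        tokens = event.split(".")
        for n in range(len(tokens), 0, -1):
            handler = self.handlers.get(".".join(tokens[:n]))
            if handler is not None:
                return handler
        return None

    def process_next(self) -> Optional[str]:
        """Take the oldest event from this primitive's queue and handle it."""
        if not self.queue:
            return None
        event = self.queue.popleft()        # FIFO within a single queue only
        handler = self.lookup(event)
        return handler(event) if handler is not None else None


recorder = Primitive("rec", {
    "terminate": lambda e: "discard recording",
    "terminate.silence": lambda e: "keep recording",  # hypothetical event name
})
recorder.post("terminate.frowning")  # no exact handler: falls back to "terminate"
recorder.post("terminate.silence")
print(recorder.process_next())  # discard recording
print(recorder.process_next())  # keep recording
```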
   Alternatively, events can originate elsewhere, such as from a
   Control Agent, and simply be received by the primitive implementing
   the control. Examples of the use of events include adjusting volume
   (gain) and pausing and resuming both announcement playout and record
   creation.

   Primitives act on events based upon the longest match of an event
   name. Event names are a period-delimited ('.') sequence of tokens.
   The first token, or the root of the name, can be considered an event
   class. Matching allows a standard meaning to be defined and then
   extended based upon what triggers an event's generation. For example,
   a record primitive has different behavior depending upon whether it
   completed because a user stopped speaking or because it was
   cancelled. The recording is retained in the first case but not the
   second.

   Longest match allows new recognizers to be created and used without
   changing how existing primitives are defined. For example, a face
   recognition capability could be created which generates a
   "terminate.frowning" event when a user looks puzzled. Although no
   primitive directly defines this event, it will still effect a generic
   terminate action. Primitives which require specialized behavior based
   upon frowning may be extended to support this. As well, the event can
   still be exported from the MSML context without requiring that
   primitives receiving the event understand facial expressions.

9.4 MSML Dialog Usage with SIP

   MSML Dialogs MAY be used directly with SIP for dialog interactions
   (e.g., IVR or fax). They can initially be invoked as part of the
   "Prompt and Collect" service described in "Basic Network Media
   Services with SIP" [n7].
   That document defines service indicators for a small number of
   well-defined services using the user part of the SIP Request-URI
   (R-URI).

   The prompt and collect service uses "dialog" as the service
   indicator. URI parameters further refine the specific IVR request.
   This document defines an additional parameter, "moml-param", for the
   dialog service indicator as follows:

      dialog-parameters = ";" ( dialog-param [ vxml-parameters ] )
                          | moml-param
      dialog-param      = "voicexml=" dialog-url
      moml-param        = "moml=" moml-url

   There are no additional URI parameters when MSML is used as the
   dialog language.

   MSML Dialogs define discrete IVR dialog commands. These commands MAY
   be included directly in the body of the INVITE to the "dialog"
   service indicator by using the "cid" [n8] URL scheme. This scheme
   identifies a message body part which, in this case, contains the
   MSML Dialog request. Note that a multipart message body, containing a
   single part, MUST be present even if the INVITE does not contain an
   SDP offer. Subsequent MSML Dialog requests are sent in the body of
   SIP INFO messages, as are all messages from a media server.

   An example of a SIP URI as described above is:

      sip:dialog@mediaserver.example.net;\
      moml=cid:14864099865376@appserver.example.net

   The body part that contains the MSML Dialog referenced by the URL
   would have a Content-Id header of:

      Content-Id: <14864099865376@appserver.example.net>

   The results of executing an <exit> or <disconnect>, or of executing a
   <send> which has a "target" attribute value equal to "source", are
   notified in SIP INFO messages using the <event> element from the MSML
   Core package. No messages are sent if execution completes normally
   without executing one of these elements.
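   The following Python sketch assembles the Request-URI, Content-Id
   header, and single-part multipart body described above. The helper
   names and the boundary string are hypothetical. Using
   "multipart/mixed" and an "application/moml+xml" part type are
   assumptions: the text only requires a multipart body, and names
   "application/moml+xml" as the MSML Dialog type for <dialogstart>.

```python
# Hypothetical helpers for an INVITE that carries an inline MSML Dialog.


def dialog_request_uri(media_server: str, content_id: str) -> str:
    # Same form as the example above. RFC 3261's URI-parameter grammar does
    # not list '@' as an unescaped character, so a strict stack may need it
    # escaped as %40; this sketch keeps the draft's form.
    return f"sip:dialog@{media_server};moml=cid:{content_id}"


def content_id_header(content_id: str) -> str:
    # The body part is referenced by the cid: URL, so its Content-Id must match.
    return f"Content-Id: <{content_id}>"


def invite_content_type(boundary: str) -> str:
    # Assumption: multipart/mixed; the draft only says "multipart".
    return f"multipart/mixed; boundary={boundary}"


def multipart_body(content_id: str, msml: str, boundary: str = "msml-part") -> str:
    # A multipart body is required even when it holds a single part (no SDP).
    return (
        f"--{boundary}\r\n"
        "Content-Type: application/moml+xml\r\n"
        f"{content_id_header(content_id)}\r\n"
        "\r\n"
        f"{msml}\r\n"
        f"--{boundary}--\r\n"
    )


cid = "14864099865376@appserver.example.net"
print(dialog_request_uri("mediaserver.example.net", cid))
print(invite_content_type("msml-part"))
print(multipart_body(cid, "<!-- MSML Dialog request -->"))
```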
   If there is an error during validation or execution, then a media
   server MUST notify the error as described above and must include the
   namelist items "moml.error.status" and "moml.error.description". The
   values for these items are defined in section 11.

   A restricted subset of MSML Dialogs can also be used with the
   "Announcement" service defined in [n7]. This service uses "annc" as
   the service indicator and defines parameters that describe an
   announcement. The "play=" parameter identifies the URL of a prompt or
   a provisioned announcement sequence. The value of the "play="
   parameter can refer to an MSML Dialog body part using a "cid" URL as
   described above. That body part must only contain the <play>
   primitive.

   Using MSML Dialogs enhances the announcement service by allowing the
   client to specify a sequence of audio segments, rather than requiring
   each sequence to be provisioned, and by supporting video. Moreover,
   MSML Dialogs define a standard set of variables, in contrast to [n7],
   which defines a parameterization mechanism but does not formally
   specify any semantics.

   If a media server does not understand the "cid" scheme or does not
   understand MSML Dialogs, it must respond with the SIP response code
   "488 - Not Acceptable Here". If the MSML Dialog body contains
   elements other than the <play> primitive, or there are errors during
   validation, a media server must respond with the SIP response code
   "400 - Bad Request". Finally, if there is a discrepancy between
   parameters specified in the Request-URI and corresponding attributes
   defined in the MSML Dialog body, the Request-URI parameters must be
   silently ignored.

   MSML Dialogs MUST NOT change the operation of the announcement
   service from that defined in [n7]. When the announcement completes,
   a media server issues a SIP BYE request.
   The INFO method MUST NOT be used with the announcement service.

9.5 MSML Dialog Structure and Modularity

   MSML is structured as a set of packages. Only the core and base
   packages are required. The Dialog Core package defines the framework
   for MSML requests to a media server, without specific functionality.
   It consists of the "primitive" abstraction, an abstract element for
   control flow, the sequential execution model, and the <send> element.
   That is, the MSML Dialog Core package allows for the execution of a
   sequence of one or more media processing primitives with the ability
   to notify events to the invocation environment.

   Primitives are contained within the MSML Dialog Base package, which
   defines the basic <play>, <dtmfgen>, <tonegen>, <record>, and <dtmf>
   elements. Another package, the MSML Dialog Transform package, defines
   the simple half-duplex filters. More advanced primitives are defined
   in the speech and fax packages. The MSML speech package depends on
   the MSML Dialog Base package, as it extends the capability of <play>
   by adding synthesized speech. Finally, the <group> execution model,
   which is currently the only element that changes the flow of control,
   is defined in a separate MSML Dialog Group package. All of these
   packages are optional, with the exception that the MSML Dialog Core
   and MSML Dialog Base packages MUST be implemented to provide the
   minimal functionality.

9.6 MSML Dialog Core Package

   The MSML Dialog Core package defines the structural framework and
   abstractions for MSML Dialogs (via its schema). It also defines the
   basic elements which are not part of the core primitive or control
   abstractions. This package is dependent on the MSML Core package.
   Events generated by MSML Dialogs, such as prompt completion, digits
   collected, or dialog termination, are communicated by the media
   server via the MSML Core package.

   MSML Dialogs are executed independently from the MSML core context.
   When an MSML Dialog is started, MSML allocates the dialog control
   resources and, if successful, starts those resources executing. MSML
   core execution then continues without waiting for the MSML dialog to
   complete. This forking of MSML dialog invocation from the MSML core
   context is done via the <dialogstart> element. Media streams are
   created between the MSML dialog target and other internal media
   server resources as part of dialog execution. Stream creation is
   subject to the requirements defined in the MSML Core package and to
   those for media streams defined in the MSML Conference Core package.

9.6.1 <dialogstart>

   The <dialogstart> element is used to instantiate an MSML media dialog
   on connections or conferences. The dialog is specified either inline
   or by a URI [n6]. Inline dialogs MUST be composed of elements from
   the MSML Dialog packages. MSML dialogs MAY be defined externally as
   VoiceXML [n5]. The MSML dialog description MUST NOT be inline if the
   "src" attribute, containing a URI, is present.

   The originator of the MSML dialog is notified using an
   "msml.dialog.exit" event when the dialog completes. Any results
   returned by the dialog when it exits are sent as a namelist with the
   event.

   The "msml.dialog.exit" event is also used when dialogs fail due to
   errors encountered fetching external documents or errors that occur
   within the dialog execution thread. In this case, a namelist
   containing the items "dialog.exit.status" and
   "dialog.exit.description" is returned with the event to inform the
   client of the failure and the failure reason.
The values of these items are defined within this package and the MSML Core package. Information from the failed dialog may be returned as additional namelist items.

Attributes:

   target: an identifier of a connection or a conference which will interact with the dialog. The identifier must not contain wildcards. Mandatory.

   src: the URL of the dialog description. MUST NOT be used if the MSML dialog description is inline; if both are present, an error (422) will result and MSML document execution will stop.

   type: a MIME type which identifies the type of language used to describe the dialog. application/moml+xml and application/vxml+xml are used to identify MSML Dialogs and VoiceXML [n5] respectively. Mandatory.

   name: an instance name for the dialog. If the attribute is not present, the media server will assign an identifier to the dialog. If the attribute is present but the name is already associated with the target, an error (431) will result and MSML document execution will stop. Any results that a dialog generates will be correlated with its identifier.

   mark: a token which can be used to identify execution progress in the case of errors. The value of the mark attribute from the last successfully executed MSML element is returned in an error response. Therefore, the values of all "mark" attributes within an MSML document should be unique.

The following examples show how to initiate an external MSML dialog, an in-line embedded MSML dialog, and an MSML-initiated VoiceXML dialog.

The following example starts an external MSML dialog on a connection.

The following example starts an in-line embedded MSML dialog on a connection.

The following example starts a VoiceXML dialog on a connection.

If this dialog fails once its execution thread has begun, for example because the fetch of the VoiceXML document failed, an example of the event which would be returned is:

      dialog.exit.status
      423
      dialog.exit.description
      External document fetch error

9.6.2 <dialogend>

Dialog end is used to terminate an MSML dialog created through <dialogstart> before it completes of its own accord. The operation of <dialogend> depends on the dialog language being used by the executing context. When that context is VoiceXML, a "connection.disconnected" event will be thrown to the VoiceXML application. When that context is MSML Dialog, a "terminate" event will be sent to the MSML core context.

<dialogend> gives the executing dialog the opportunity to complete gracefully before generating an "msml.dialog.exit" event. Dialog results may be returned in a namelist with that event.

Attributes:

   id: the identifier of a dialog. Mandatory.

   mark: a token which can be used to identify execution progress in the case of errors. The value of the mark attribute from the last successfully executed MSML Dialog element is returned in an error response. Therefore, the values of all "mark" attributes within an MSML document should be unique.

For example, if the dialog from the previous example were still executing, the following would terminate the dialog and generate an "msml.dialog.exit" event.

9.6.3 <send>

Sends an event and optional namelist to the recipient identified by the target attribute. Event names are defined by the recipient. In the case where the recipient is an MSML Dialog group or primitive, the events are defined within this document. Other recipients MAY use names that are suitable for their environment.
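
Both dialog completion ("msml.dialog.exit") and events sent as described in Section 9.6.3 reach the recipient as an event name together with an optional namelist. The following minimal Python sketch shows one way a client might unpack such an event. It assumes the namelist is carried as alternating <name> and <value> child elements of an <event> element inside <msml>; that layout, the sample event id, and the function name parse_event are assumptions for illustration, not definitions from this section. Only the status "423" and the description text come from the failure example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical failed-dialog report. The element layout and the id value are
# assumptions; the status and description are taken from the example above.
SAMPLE = """<msml version="1.1">
  <event name="msml.dialog.exit" id="conn:1234/dialog:12345">
    <name>dialog.exit.status</name>
    <value>423</value>
    <name>dialog.exit.description</name>
    <value>External document fetch error</value>
  </event>
</msml>"""


def parse_event(xml_text):
    """Return (event name, namelist dict) from an assumed MSML event layout."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <event> or one wrapped in <msml>.
    event = root if root.tag == "event" else root.find("event")
    if event is None:
        raise ValueError("no <event> element found")
    namelist = {}
    pending = None
    # Pair each <name> with the <value> that follows it, in document order.
    for child in event:
        if child.tag == "name":
            pending = (child.text or "").strip()
        elif child.tag == "value" and pending is not None:
            namelist[pending] = (child.text or "").strip()
            pending = None
    return event.get("name"), namelist


if __name__ == "__main__":
    name, items = parse_event(SAMPLE)
    print(name, items.get("dialog.exit.status"))
    # prints msml.dialog.exit 423
```

Pairing children in document order, rather than collecting all names and all values separately, keeps a stray or missing <value> from shifting every later pair.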

The "target" attribute specifies the recipient of the event. Recipients MAY be other MSML Dialog primitives or groups executing within the object, the object itself, or the environment which invoked the MSML Dialog. Sending events to media primitives or groups is supported by the MSML Dialog Group package. Any target which is unknown within the object is assumed to be destined for the external environment. By convention, the string "source" SHOULD be used to address that environment, but any target name distinct from the MSML Dialog namespace MAY be used.

Attributes:

   event: the name of an event. Mandatory.

   target: the recipient of the event. The recipient MUST be an MSML Dialog primitive, the currently executing group, or the MSML Dialog environment. A primitive is specified by a primitive type, optionally followed by a period '.' and the identifier of a primitive. Identifiers are only needed when more than one primitive of the same type exists in the object. The executing group is specified using the token "group". The environment is specified using the token "source", optionally followed by a period '.' and any environment-specific target. Mandatory.

   namelist: a list of zero or more shadow variables which are included with the event.

9.6.4 <exit>

Exit causes execution of the MSML Dialog to terminate.

Attributes:

   namelist: a list of one or more shadow variables which MAY be sent to the context which invoked the MSML Dialog object.

9.6.5 <disconnect>

Disconnect is similar to <exit> but has the additional semantics of indicating to the context which invoked the MSML Dialog that it should disconnect the media stream associated with the object from the media server.
The method of disconnection depends upon how the media stream was initially established. If SIP was used, a <disconnect> would cause a media server to issue a BYE request. The request would be sent for the SIP dialog associated with the media session on which the MSML Dialog was operating.

Attributes:

   namelist: a list of one or more shadow variables which MAY be sent to the context which invoked the MSML Dialog object.

9.7 MSML Dialog Base Package

The MSML Dialog Base package defines a required set of base functionality for a media server. It supports individual media primitives, such as playing an announcement or collecting digits, as well as composite operations such as play and collect. When this package is used in conjunction with the MSML Dialog Group package, the event-based mechanism is used to control primitives. This package may also be used in conjunction with the MSML Speech package to extend the functionality of prompts to include TTS and of user input collection to include ASR.

In the following sections, subsections of a primitive define child elements of that primitive and are not themselves considered primitives. They do not receive events or populate shadow variables.

9.7.1 <play>

Play is used to generate an audio or video stream. It MUST play in sequence the media created by the child media elements