Internet-draft        Media Server Markup Language         February 2008
                                 (MSML)

SIPPING                                                        A. Saleem
Internet Draft                                                    Y. Xin
Intended status: Informational                                   Radisys
Expires: August 09, 2008                                     G. Sharratt
                                                       February 11, 2008

                  Media Server Markup Language (MSML)
                          draft-saleem-msml-06

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 09, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   The Media Server Markup Language (MSML) is used to control and invoke
   many different types of services on IP Media Servers. Clients can use
   it to define how multimedia sessions interact on a Media Server and
   to apply services to individuals or groups of users. MSML can be
   used, for example, to control Media Server conferencing features such
   as video layout and audio mixing, create sidebar conferences or
   personal mixes, and set the properties of media streams. As well,
   clients can use MSML to define media processing dialogs, which may be
   used as parts of application interactions with users or conferences.
   Transformation of media streams to and from users or conferences as
   well as IVR dialogs are examples of such interactions, which are
   specified using MSML. MSML clients may also invoke dialogs with
   individual users or with groups of conference participants using
   VoiceXML.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Glossary
   4. MSML SIP Usage
      4.1 SIP INFO
      4.2 SIP Control Framework
   5. Language Structure
      5.1 Package Scheme
      5.2 Profile Scheme
   6. Execution Flow
   7. Media Server Object Model
      7.1 Objects
      7.2 Identifiers
   8. MSML Core Package
      8.1 <msml>
      8.2 <send>
      8.3 <result>
      8.4 <event>
   9. MSML Conference Core Package
      9.1 Conferences
      9.2 Media Streams
      9.3 <createconference>
      9.4 <modifyconference>
      9.5 <destroyconference>
      9.6 <audiomix>
      9.7 <videolayout>
      9.8 <join>
      9.9 <modifystream>
      9.10 <unjoin>
      9.11 <monitor>
      9.12 <stream>
   10. MSML Dialog Packages
      10.1 Overview
      10.2 Primitives
      10.3 Events
      10.4 MSML Dialog Usage with SIP
      10.5 MSML Dialog Structure and Modularity
      10.6 MSML Dialog Core Package
      10.7 MSML Dialog Base Package
      10.8 MSML Dialog Group Package
      10.9 MSML Dialog Transform Package
      10.10 MSML Dialog Speech Package
      10.11 MSML Dialog Fax Detection Package
      10.12 MSML Dialog Fax Send/Receive Package
   11. MSML Audit Package
      11.1 MSML Audit Core Package
      11.2 MSML Audit Conference Package
      11.3 MSML Audit Connection Package
      11.4 MSML Audit Dialog Package
      11.5 MSML Audit Stream Package
   12. Response Codes
   13. MSML Conference Examples
      13.1 Establishing a Dial-in Conference
      13.2 Example of a Sidebar Audio Conference
      13.3 Example of Removing a Conference
      13.4 Example of Modifying Video Layout
   14. MSML Dialog Examples
      14.1 Announcement
      14.2 Voice Mail Retrieval
      14.3 Play and Record
      14.4 Speech Recognition
      14.5 Play and Collect
      14.6 User Controlled Gain
   15. MSML Audit Examples
      15.1 Audit All Conferences
      15.2 Audit Conference Dialogs
      15.3 Audit Conference Streams
      15.4 Audit All Connections
      15.5 Audit Connection Dialogs
      15.6 Audit Connection Streams
      15.7 Audit Connection With Selective States
   16. Change Summary
   17. Future Work
   18. XML Schema
      18.1 MSML Core
      18.2 MSML Conference Core Package
      18.3 MSML Dialog Packages
      18.4 MSML Audit Packages
   19. Security Considerations
   20. IANA Considerations
      20.1 IANA registrations for 'application' MIME Media Type
      20.2 IANA registrations for 'text' MIME Media Type
      20.3 URN Sub-Namespace Registration
      20.4 XML Schema Registration
   21. Normative References
   22. Informative References
   Acknowledgments
   Authors' Addresses
   Intellectual Property Statement
   Full Copyright Statement
   Disclaimer of Validity
   Acknowledgement

1. Introduction

   Media servers contain dynamic pools of media resources. Control
   Agents and other users of media servers (called media server clients)
   can define and create many different services based on how they
   configure and use those resources. Often, that configuration and the
   ways in which those resources interact will be changed dynamically
   over the course of a call, to reflect changes in the way that an
   application interacts with a user.

   For example, a call may undergo an initial IVR dialog before being
   placed into a conference. Calls may be moved from a main conference
   to a sidebar conference and then back again. Individual calls may be
   directly bridged to create small n-way calls or simple sidebars. None
   of these change the SIP [n1] dialog or RTP [i3] session. Yet these do
   affect the media flow and processing internal to the media server.

   The Media Server Markup Language (MSML) is an XML [n2] language used
   to control the flow of media streams and services applied to media
   streams within a media server. It is used to invoke many different
   types of services on individual sessions, groups of sessions, and
   conferences. MSML allows the creation of conferences, bridging
   different sessions together, and bridging sessions into conferences.

   MSML may also be used to create user interaction dialogs and allows
   the application of media transforms to media streams. Media
   interaction dialogs created using MSML allow construction of IVR
   dialog sessions to individual users as well as to groups of users
   participating in a conference. Dialogs may also be specified using
   other languages, such as VoiceXML [n5], which support complete
   single-party application logic to be executed on the Media Server.

   MSML is a transport-independent language: it does not rely on any
   particular underlying transport mechanism, and its semantics are
   independent of the transport. However, SIP is a typical and commonly
   used transport mechanism for MSML, invoked using the SIP URI scheme.
   This specification defines the use of MSML Dialogs with SIP as the
   transport mechanism.

   A network connection may be established with the media server using
   SIP. Media received and transmitted on that connection will flow
   through different media resources on the media server depending on
   the requested service. Basic Network Media Services with SIP [n7]
   defines conventions for associating a basic service with a SIP
   Request-URI. MSML allows services to be dynamically applied and
   changed by a Control Agent during the lifetime of the SIP dialog.

   MSML has been designed to address the control and manipulation of
   media processing operations (e.g., announcement, IVR, play and
   record, ASR/TTS, fax, video), as well as control and relationships of
   media streams (e.g., simple and advanced conferencing). It provides a
   general-purpose media server control architecture. MSML can
   additionally be used to invoke other more complex IVR languages such
   as VoiceXML.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [i14].

3. Glossary

   Media Server: a general-purpose platform for executing real-time
   media processing tasks. This is a logical function that maps either
   to a single physical device or to a portion of a physical device.

   Media Server Client: an application which originates MSML requests to
   a media server; it is also referred to as a Control Agent in this
   specification.

   Network Connection: a participant that represents the termination on
   a media server of one or more RTP [i3] sessions (for example audio
   and video) associated with a call. Network connections are
   established and removed using a session establishment protocol such
   as SIP. An instance of a network connection is independent of MSML
   processing instructions applied to it.

   Dialog: an automated IVR participant. Examples of dialogs may be
   announcement players, IVR interfaces, or voice recorders. Dialogs may
   be defined in MSML or using VoiceXML [n5].

   Conference: an intermediary function that provides multimedia mixing
   and other advanced conferencing services. This specification
   currently considers conferences with audio and/or video media types,
   but is extensible to other media types.

   Identifier: a name that is used to refer to a specific instance of an
   object on the media server, such as a conference or a dialog.
   Identifiers are composed of one or more terms where each term
   identifies an object class and instance.

   Object: the generic term for a media server entity that terminates,
   originates, or processes media. This specification defines four
   classes of objects and specifies mechanisms to create them, join them
   together, and destroy them.

   Participant Object: an object in a media server that sources original
   media in a call and/or receives and terminates media in a call.

   Intermediary Object: an object in a media server that acts on media
   within a call for the benefit of the participants.

   Independent Object: an object that can exist on a media server
   independent of other objects.

   Operator: an intermediary transformer that modifies or transforms a
   media stream. Examples of operators may be audio gain controls, video
   scaling, or voice masking. MSML defines operators as media transform
   objects, which transform media using operations such as gain control,
   when applied to media streams.

   Media Stream: a single media flow between two objects. A media stream
   has a media type and may be unidirectional or bidirectional.

4. MSML SIP Usage

   SIP is used to create and modify media sessions with a media server
   according to the procedures defined in RFC 3261 [n1]. Often, SIP
   third party call control [i4] will be used to create sessions to a
   media server on behalf of end users. MSML is used to define and
   change the service which a user connected to a media server will
   receive. MSML clients are application servers, softswitches, or other
   forms of control agents, and SHOULD have an authorized security
   relationship with the media server. MSML itself does not define
   authorization mechanisms.

   MSML transactions are originated based upon events that occur in the
   application domain. These events may be independent from any media or
   user interaction. For example, an application may wish to play an
   announcement to a conference warning that its scheduled completion
   time is approaching. Applications themselves are structured in many
   different ways. Their structure and requirements contribute to their
   selection of protocols and languages. To accommodate differing
   application needs, MSML has been designed to be neutral to other
   languages and independent of the transport used to carry it.

   The MSML language is purposely designed to be transport independent.
   In this release of the specification, SIP INFO [i5] and the SIP
   Control Framework [i13] have been chosen as transport mechanisms for
   MSML, as described in the following sections.

4.1 SIP INFO

   SIP INVITE and INFO [i5] requests and responses MAY be used to carry
   MSML. INFO requests allow asynchronous mid-call messages within SIP
   with few additional semantics. In addition, there are widely deployed
   implementations of that method, it aids initial development efforts
   that are closely coupled with SIP session establishment, and it
   allows MSML to be directly associated with user dialogs when third
   party call control is used.

   Although INFO is sometimes considered to not be a suitable
   general-purpose transport mechanism for messages within SIP, there
   have been proposals to make it more acceptable. MSML may evolve to
   include other SIP usage and/or to work with other protocols or as a
   stand-alone protocol established through SIP, in future releases of
   this document.

   MSML supports several models for client interaction.
   When clients use 3PCC to establish media sessions on behalf of end
   users, clients will have a SIP dialog for each media session. MSML
   MAY be sent on these dialogs. However, the targets of MSML actions
   are not inferred from the session associated with the SIP dialog. The
   targets of MSML actions are always explicitly specified using
   identifiers as previously defined.

   An application, after interacting with a user, may want to affect
   multiple objects within a media server. For example, tones or
   messages are often played to a conference when connections are added
   or removed. A separate message may also be played to a participant as
   they are joined, or to moderators. Explicit identifiers, that is,
   identifiers not inferred from a transport mechanism, allow these
   multiple actions to be easily grouped into a single transaction sent
   on any SIP dialog.

   MSML also supports a model of dedicated control associations. This
   supports decoupled application architectures where a client can
   control media server services without also establishing all of the
   media sessions itself. Control associations are created using SIP but
   they do not have any associated media session. Although initially
   INFO messages will be sent on this SIP dialog, just as with dialogs
   associated with media sessions, it is possible that in the future,
   the SIP dialog will be used to establish a separate control session
   (defined in SDP [n9]) that does not use SIP as the transport for MSML
   messages.

   A media server using MSML also sends asynchronous events to a client
   using MSML scripts in SIP INFO. Events are sent based on previous
   MSML requests and are sent within the SIP dialog on which the MSML
   request that caused the event to be generated was received. If this
   dialog no longer exists when the event is generated, the event is
   discarded.
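
   The event delivery rule above (an event travels on the SIP dialog
   that carried the originating MSML request, and is discarded if that
   dialog is gone) can be illustrated with a minimal Python sketch. It
   is not part of MSML: the class name EventRouter, its methods, and the
   string identifiers are hypothetical, and SIP dialogs and MSML objects
   are modeled as opaque strings.

```python
from dataclasses import dataclass, field


@dataclass
class EventRouter:
    """Hypothetical model of where a media server sends MSML events."""

    live_dialogs: set = field(default_factory=set)  # SIP dialogs still open
    origin: dict = field(default_factory=dict)      # MSML object -> SIP dialog
    sent: list = field(default_factory=list)        # (SIP dialog, event name)

    def open_dialog(self, sip_dialog_id):
        self.live_dialogs.add(sip_dialog_id)

    def close_dialog(self, sip_dialog_id):
        self.live_dialogs.discard(sip_dialog_id)

    def record_request(self, sip_dialog_id, target_id):
        # Remember the SIP dialog on which the MSML request arrived.
        # The target is named explicitly; it is never inferred from
        # the SIP dialog itself.
        self.origin[target_id] = sip_dialog_id

    def emit(self, target_id, event_name):
        """Return True if the event was sent, False if it was discarded."""
        sip_dialog_id = self.origin.get(target_id)
        if sip_dialog_id not in self.live_dialogs:
            return False  # originating dialog no longer exists: discard
        self.sent.append((sip_dialog_id, event_name))
        return True
```

   In this sketch, an event emitted after close_dialog() has been
   called for the originating dialog is dropped rather than redirected
   to another dialog, which mirrors the behavior described above.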

   Events may be generated during the execution of a dialog created by a
   <dialogstart> element. For example, dialogs can send events based on
   user input. VoiceXML dialogs, on the other hand, generally interact
   with other servers outside of MSML using HTTP.

   An event is also generated when the execution of a dialog terminates,
   either because of completion or failure. The exact information
   returned is dependent on the dialog language, the capabilities of the
   dialog execution environment, and what was requested by the dialog.
   Both MSML and VoiceXML [n5] allow information to be returned when
   they exit. These events may be sent in a SIP INFO or a SIP BYE. SIP
   BYE is used when the dialog itself specifies that the connection
   should be disconnected, for example through the use of the
   <disconnect> element.

   Conferences may also generate events based upon their configuration.
   An example of this is the notification of the set of active speakers.

4.2 SIP Control Framework

   The SIP Control Framework [i13] MAY be used as a transport mechanism
   for MSML.

   The Control Framework provides a generic approach for establishment
   and reporting capabilities of remotely initiated commands. The
   framework utilizes many functions provided by the Session Initiation
   Protocol [n1] (SIP) for the rendezvous and establishment of a
   reliable channel for control interactions. Compared to SIP INFO, the
   SIP Control Framework is a more general purpose transport mechanism
   and one which is not constrained by limitations of the SIP INFO
   mechanism.

   The Control Framework also introduces the concept of a Control
   Package, which is an explicit usage of the Control Framework for a
   particular interaction set.
   This specification defines a list of packages for MSML to control the
   Media Server in many aspects, including basic dialog, advanced
   conferencing, advanced dialog and audit service. Each of these
   packages has a unique Control Package name assigned in order for MSML
   to be used with the Control Framework.

   This section fulfills the mandatory requirement for information that
   MUST be specified during the definition of a Control Framework
   Package, as detailed in SIP Control Framework [i13].

4.2.1 Control Framework Package Names

   The Control Framework [i13] requires a Control Package definition to
   specify and register a unique name.

   The MSML specification defines Control Package names using a
   hierarchical scheme to indicate the inherited relationship across
   packages. For example, package "msml-x" is derived from package
   "msml", and package "msml-x-y" is derived from package "msml-x".

   The following is a list of Control Package names reserved by the MSML
   specification.

   "msml": this Control Package supports MSML Core package as
      specified in section 8.
   "msml-conf": this Control Package supports MSML Conference
      Core package as specified in section 9.
   "msml-dialog": this Control Package supports MSML Dialog
      Core package as specified in section 10.6.
   "msml-dialog-base": this Control Package supports MSML
      Dialog Base package as specified in section 10.7.
   "msml-dialog-transform": this Control Package supports MSML
      Dialog Transform package as specified in section 10.9.
   "msml-dialog-group": this Control Package supports MSML
      Dialog Group package as specified in section 10.8.
   "msml-dialog-speech": this Control Package supports MSML
      Dialog Speech package as specified in section 10.10.
   "msml-dialog-fax-detect": this Control Package supports MSML
      Dialog Fax Detection package as specified in section 10.11.
   "msml-dialog-fax-sendrecv": this Control Package supports
      MSML Dialog Fax Send/Receive package as specified in
      section 10.12.
   "msml-audit": this Control Package supports MSML Audit Core
      Package as specified in section 11.1.
   "msml-audit-conf": this Control Package supports MSML Audit
      Conference Package as specified in section 11.2.
   "msml-audit-conn": this Control Package supports MSML Audit
      Connection Package as specified in section 11.3.
   "msml-audit-dialog": this Control Package supports MSML
      Audit Dialog Package as specified in section 11.4.
   "msml-audit-stream": this Control Package supports MSML
      Audit Stream Package as specified in section 11.5.

   An Application Server using the Control Framework as transport for
   MSML MUST use one or more package names, depending on the service
   required from the Media Server. The package name(s) are identified in
   the "Control-Packages" SIP header that is present in the SIP INVITE
   dialog request that creates the control channel, as specified in
   [i13]. The "Control-Packages" value MAY be re-negotiated via the SIP
   re-INVITE mechanism.

4.2.2 Control Framework Messages

   The usage of CONTROL, response and REPORT messages, as defined in
   [i13], differs for each Control Package defined in MSML and is
   described separately in the following sections.

   MSML Core Package "msml"

      The Application Server may send a CONTROL message whose body is
      an MSML request, using the following elements, to the Media
      Server:

      <msml>: the root element that may contain a list of child elements
         which request a specific operation. The child elements are
         defined in extended packages (e.g., "msml-conf" and
         "msml-dialog").
         This element is also the root element which contains MSML
         result and event.
      <send>: sends an event to the specified recipient within the
         Media Server. Specific event types are defined within the
         extended packages.

      The Media Server replies with a response message containing an
      MSML result using the following element:

      <result>: reports the results of an MSML transaction.

      The Media Server MAY send an MSML event to the Application
      Server, in a REPORT or CONTROL message, using the <event>
      element. The actual content of the <event> and which Control
      Framework message to use are defined within the extended
      packages.

   MSML Conference Core Package "msml-conf"

      This package extends the MSML Core Package to define a
      framework for creation, manipulation and deletion of a
      conference.

      The Application Server can send a CONTROL message whose body is
      an MSML request containing one or more conference related
      commands to the Media Server. The Media Server then replies with
      a response message whose body is an MSML result indicating
      whether or not the request has been fulfilled.

      During the lifetime of a conference, whenever an event occurs,
      the Media Server MAY send CONTROL messages containing MSML
      events to notify the Application Server. The Application
      Server SHOULD reply with a response message with no MSML body
      to acknowledge that the event has been received.

      This package does NOT use the REPORT message.

   Dialog Core Package "msml-dialog"

      This package extends the MSML Core Package to define the
      structural framework and abstractions for MSML dialogs.

      The Application Server MAY send CONTROL messages containing an
      MSML request using the following elements:

      <dialogstart>: instantiates an MSML media dialog on a
         connection or a conference.
      <dialogend>: terminates an MSML dialog.
      <send>: sends an event and an optional namelist to the dialog,
         dialog group, or dialog primitive.
      <exit>: used by the dialog description language to cause the
         execution of the MSML dialog to terminate.

      For the <dialogstart> command, the response message MUST
      contain an MSML result which indicates that the dialog has been
      started successfully. The MSML result MAY contain <dialogid>
      to return the dialog identifier, if the identifier was assigned
      by the Media Server. Subsequently, zero or more MSML events
      MAY be initiated by the Media Server in (update) REPORT
      messages to report information gathered during the dialog.
      Finally, an MSML event "msml.dialog.exit" SHOULD be generated
      in a (terminate) REPORT message when the dialog terminates
      (e.g., MSML execution of <exit>).

      For the <dialogend> and <send> commands, the response message
      contains the final MSML result which indicates that the
      request has either been fulfilled or rejected.

   Dialog Base Package "msml-dialog-base"

      This package extends the MSML Dialog Core Package to define a
      set of base functionality for MSML dialogs. The extension
      defines individual media primitives, including ,
      , , , and , to be
      used as child elements of <dialogstart>. This package does not
      change the framework message usage as defined by the MSML
      Dialog Core Package.

   Dialog Transform Package "msml-dialog-transform"

      This package extends the MSML Dialog Core Package to define a
      set of transform primitives which work as filters on half
      duplex media streams. The extension defines transform
      primitives, including , , , ,
      and , which MAY be used as child elements of
      <dialogstart>. This package does not change the framework
      message usage as defined by the MSML Dialog Core Package.

   Dialog Group Package "msml-dialog-group"

      This package extends the MSML Dialog Core, Base and Transform
      Packages to define a single control flow construct that
      specifies concurrent execution of multiple media primitives.
      The extension defines the <group> element which MAY be used as
      a child element of <dialogstart> to enclose multiple media
      primitives, such that they can be executed concurrently. This
      package does not change the framework message usage as defined
      by the MSML Dialog Core Package.

   Dialog Speech Package "msml-dialog-speech"

      This package extends the MSML Dialog Core and MSML Base
      Package to define functionality which MAY be used for
      automatic speech recognition and text-to-speech. The extension
      extends the and the elements.

      For , it defines a new child element to
      activate grammars or user input rules associated with speech
      recognition. For , it defines a new child element
      to initiate the text-to-speech service.

      This package does not change the framework message usage as
      defined by the MSML Dialog Core Package.

   Dialog Fax Detection Package "msml-dialog-fax-detect"

      This package extends the MSML Dialog Core Package to define
      primitives which provide fax detection service. The extension
      defines a primitive <faxdetect> to be used as a child element
      of <dialogstart>. This package does not change the framework
      message usage as defined by the MSML Dialog Core Package.

   Dialog Fax Send/Receive Package "msml-dialog-fax-sendrecv"

      This package extends the MSML Dialog Core Package to define
      primitives which allow a media server to provide fax send or
      receive service. The extension defines new primitives <faxsend>
      and <faxrcv>, to be used as child elements of
      <dialogstart>. This package does not change the framework
      message usage as defined in MSML Dialog Core Package.
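
   All of the dialog packages above keep the Control Framework message
   usage of the MSML Dialog Core Package. As a rough illustration of
   that usage for a single <dialogstart>, the following Python sketch
   lists the messages in the order described in this section. The
   function name, the tuple layout (sender, message, MSML content) and
   the sample event name are assumptions made for illustration only;
   "AS" and "MS" stand for Application Server and Media Server.

```python
def dialog_message_sequence(update_events):
    """List the Control Framework messages for one <dialogstart>.

    update_events: names of MSML events gathered while the dialog runs
    (may be empty). Returns (sender, message, MSML content) tuples.
    """
    steps = [
        ("AS", "CONTROL", "<dialogstart>"),
        # The response MUST carry an MSML result for the request.
        ("MS", "response", "<result>"),
    ]
    # Zero or more events MAY be reported while the dialog runs.
    for name in update_events:
        steps.append(("MS", "REPORT (update)", name))
    # The exit event SHOULD be sent when the dialog terminates; this
    # sketch always includes it.
    steps.append(("MS", "REPORT (terminate)", "msml.dialog.exit"))
    return steps
```

   For example, dialog_message_sequence(["example.update"]) yields four
   steps: the CONTROL request, the response, one (update) REPORT and the
   (terminate) REPORT carrying "msml.dialog.exit".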

      Dialog Audit Core Package "msml-audit"

         This package extends the MSML Core Package to define a
         framework for auditing media resource(s) allocated on the
         Media Server.

         This package follows a simple request/response transaction,
         allowing the Application Server to send CONTROL messages
         containing MSML <audit> requests. The Media Server MUST reply
         with a response message containing the result. The result is
         contained within the <auditresult> element, returning the
         queried state information.

         This package does NOT use the REPORT message.

      Dialog Audit Conference Package "msml-audit-conf"

         This package extends the MSML Audit Core Package to define
         conference specific states which MAY be queried via the
         <audit> command; the corresponding response MUST be
         returned by the <auditresult> element. This package does not
         change the framework message usage as defined by the MSML
         Audit Core Package.

      Dialog Audit Connection Package "msml-audit-conn"

         This package extends the MSML Audit Core Package to define
         connection specific states which MAY be queried via the
         <audit> command; the corresponding response MUST be
         returned by the <auditresult> element. This package does not
         change the framework message usage as defined by the MSML
         Audit Core Package.

      Dialog Audit Dialog Package "msml-audit-dialog"

         This package extends the MSML Audit Core Package to define
         dialog specific states which MAY be queried via the <audit>
         command; the corresponding response MUST be returned by the
         <auditresult> element. This package does not change the
         framework message usage as defined by the MSML Audit Core
         Package.

      Dialog Audit Stream Package "msml-audit-stream"

         This package extends the MSML Audit Core Package to define
         stream specific states which MAY be queried via the <audit>
         command; the corresponding response MUST be returned by the
         <auditresult> element. This package does not change the
         framework message usage as defined by the MSML Audit Core
         Package.

4.2.3 Common XML Support

   The XML schema described in [i13] MUST be supported by all Control
   Packages defined by MSML. However, the "connection-id" value MUST be
   constructed as defined by MSML (i.e., the identifier MUST contain
   the local dialog tag only, while the SIP Control Framework [i13]
   requires that the "connection-id" contain both local and remote
   dialog tags).

4.2.4 Control Message Body

   A valid CONTROL body message MUST conform to the MSML schema, as
   included in this specification, for the MSML package(s) used.

4.2.5 REPORT Message Body

   A valid REPORT body message MUST conform to the MSML schema, as
   included in this specification, for the MSML package(s) used.

5. Language Structure

5.1 Package Scheme

   The primary mechanism for extending MSML is the "package". A package
   is an integrated set of one or more XML schemas that define
   additional features and functions via new or extended use of elements
   and attributes. Each package, except for those defined in the current
   document, is defined in a separate standards document, e.g., an
   Internet Draft or an RFC. All packages that extend the base MSML
   functionality MUST include references to the MSML base set of
   schemas provided in the Internet drafts. A schema in a package MUST
   only extend MSML; that is, it must not alter the existing
   specification.

   A particular MSML script will include references to all the schemas
   defining the packages whose elements and attributes it makes use of.
   A particular script MUST reference MSML base and optionally extension
   package(s). See IANA Considerations section.

   Each package MUST define its own namespace so that elements or
   attributes with the same name in different packages do not conflict.
   A script using a particular element or attribute MUST prefix the
   namespace name on that element or attribute's name if it is defined
   in a package (as opposed to being defined in the base).

   MSML consists of a core package which provides structure without
   support for any specific feature set. Additional packages, relying on
   the core package, provide functional features. Any combination of
   additional packages may be used along with the core package. The
   following describes the set of MSML packages defined in this
   document.

        +--------------------------------------------------------+
        |                       MSML Core                        |
        +--------------------------------------------------------+
               /                      \                      \
           +--------+            +--------+              +-------+
           | Dialog |            |  Conf  |              | Audit |
           |  Core  |            |  Core  |              | Core  |
           +--------+            +--------+              +-------+
      ________ \_______________________________________      |
      ------------------------------------------------       |
      /          \          \        \        \        \     |
   +------+ +---------+ +------+ +------+ +------+ +-------+ |
   |Dialog| |Dialog   | |Dialog| |Dialog| |Dialog| |Dialog | |
   |Base  | |Transform| |Group | |Speech| |Fax   | |Fax    | |
   +------+ +---------+ +------+ +------+ |Detect| |Send/  | |
                                          +------+ |Receive| |
                                                   +-------+ |
                                     ________________________|
                                     -------------------------
                                    /       \        \        \
                                 +-----+ +-----+ +------+ +------+
                                 |Audit| |Audit| |Audit | |Audit |
                                 |Conf | |Conn | |Dialog| |Stream|
                                 +-----+ +-----+ +------+ +------+

   o  MSML Core package (Mandatory)

      Describes the minimum base framework which MUST be implemented
      to support additional core packages.

   o  MSML Conference Core package (Conditionally Mandatory, for
      Conferencing)

      Describes the audio and multimedia basic and advanced
      conferencing package, which MAY be implemented.

   o  MSML Dialog Core package (Conditionally Mandatory, for Dialogs)

      Describes the dialog core package which MUST be implemented for
      any dialog services. However, systems supporting conferencing
      only, MAY omit support for MSML dialogs. The MSML dialog core
      package specifies the framework within which additional dialog
      packages are supported. The MSML dialog base package MUST be
      supported, while all other dialog packages MAY be supported.

      o  MSML Dialog Base package (Conditionally Mandatory, for
         Dialogs)

      o  MSML Dialog Group package (Optional)

      o  MSML Dialog Transform package (Optional)

      o  MSML Dialog Fax Detection package (Optional)

      o  MSML Dialog Fax Send/Receive package (Optional)

      o  MSML Dialog Speech package (Optional)

   o  MSML Audit Core package (Conditionally Mandatory, for Auditing)

      Describes the audit core package which MUST be implemented to
      support auditing services. The MSML audit core package
      specifies the framework within which additional audit packages
      are supported.

      o  MSML Audit Conference package (Conditionally Mandatory, for
         Auditing Conference, Conference Dialog and Conference Stream)

      o  MSML Audit Connection package (Conditionally Mandatory, for
         Auditing Connection, Connection Dialog and Connection Stream)

      o  MSML Audit Dialog package (Conditionally Mandatory, for
         Auditing Dialog, and MUST be used with either MSML Audit
         Conference Package or MSML Audit Connection Package)

      o  MSML Audit Stream package (Conditionally Mandatory, for
         Auditing Stream, and MUST be used with either MSML Audit
         Conference Package or MSML Audit Connection Package)

   The formal process for defining extensions to MSML Dialogs is to
   define a new package. The new package MUST provide a text description
   of what extensions are included and how they work. It MUST also
   define an XML schema file (if applicable) that defines the new
   package (which may be through extension, restriction of an existing
   package, or a specific profile of an existing package). Dependencies
   upon other packages MUST be stated. For example a package that
   extends or restricts has a dependency on the original package
   specification. Finally, the new package MUST be assigned a unique
   name and version.

   The types of things which can be defined in new packages are:

   o  new primitives

   o  extensions to existing primitives (events, shadow variables,
      attributes, content)

   o  new recognition grammars for existing primitives

   o  new markup languages for speech generation

   o  languages for specifying a topology schema

   o  new pre-defined topology schemas

   o  new variables / segment types (sets & languages)

   o  new control flow elements

   MSML Packages are assembled together to form a specific MSML profile
   that is shared between different implementations.
   The base MSML Dialog profiles which are defined in this document
   consist of the MSML Core package, MSML Dialog Core package, MSML
   Dialog Base package, MSML Dialog Group package, MSML Transform
   package, MSML Fax packages, and the MSML Speech package.

   MSML extension packages, which define primitives, MUST define the
   following for each primitive within the package:

   o  the function which the primitive performs

   o  the attributes which may be used to tailor its behavior

   o  the events which it is capable of understanding

   o  the shadow variables which provide access to information
      determined as a result of the primitive's operation.

   The mechanism used to ensure that a media server and its client share
   a compatible set of packages is not defined. Currently it is expected
   that provisioning will be used, possibly coupled with a future
   auditing capability. Additionally, when used in SIP networks,
   packages could be defined using feature tags, and the procedures
   defined for Indicating User Agent Capabilities in SIP [i1] could be
   used to allow a media server to describe its capabilities to other
   user agents.

5.2 Profile Scheme

   Not all devices and applications using MSML will need to support the
   entire MSML schema. For example, a media processing device might
   support only audio announcements, only audio simple conferencing, or
   only multimedia IVR. It is highly desirable to have a system for
   describing what portion of MSML a particular media processing device
   or Control Agent supports.

   The Package scheme described earlier allows MSML functionality to be
   functionally grouped, relying on the MSML core package. This scheme
   allows a portion of the complete MSML specification to be
   implemented, on a per package basis and also creates a framework for
   future extension packages.
   However, within a given package, in some cases, only a subset of the
   package functionality may be required. In order to support subsets of
   packages, with a greater degree of granularity than at the package
   level, a profile scheme is required.

   MSML package profiles would identify a subset of a given MSML package
   with specific definitions of elements and attributes. Each MSML
   package profile MUST be accompanied by one or more corresponding
   schemas. To use the examples above, there could be an audio
   announcements profile of the MSML Dialog Base package, an audio
   simple conferencing profile of the MSML Conference Core package, and
   a multimedia IVR profile of the MSML Dialog Base package.

   MSML package profiles MUST be published separately from the MSML
   specification, in one or more standards documents (e.g., Internet
   Drafts or RFCs) dedicated to MSML package profiles. Profiles would
   not be registered with IANA and any organization would additionally
   be free to create its own profile(s) if required.

6. Execution Flow

   MSML assumes a model where there is a single control context within a
   media server for MSML processing. That context may have one or many
   SIP [n1] dialogs associated with it. It is assumed that any SIP
   dialogs associated with the MSML control context have been
   authorized, as appropriate, by mechanisms outside the scope of MSML.

   A media server control context maintains information about the state
   of all media objects and media streams within a media server. It
   receives and processes all MSML requests from authorized SIP dialogs.
   It also receives all events generated internally by media objects and
   sends them on the appropriate SIP dialog. An MSML request is able to
   create new media objects and streams, and to modify or destroy any
   existing media objects and streams.
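
   The role of the control context described above can be pictured with
   a small Python sketch. It is a toy model under stated assumptions,
   not an MSML implementation: the class ControlContext and all of its
   method names are hypothetical, and routing an event to the SIP dialog
   that created the object is only one possible reading of "the
   appropriate SIP dialog".

```python
class ControlContext:
    """Toy model of a single MSML control context (hypothetical names)."""

    def __init__(self):
        self.authorized = set()  # SIP dialogs authorized outside MSML
        self.owners = {}         # object identifier -> creating SIP dialog
        self.sent = []           # (sip_dialog, event_name, object_id)

    def authorize(self, sip_dialog):
        self.authorized.add(sip_dialog)

    def _check(self, sip_dialog):
        # Only requests arriving on authorized SIP dialogs are processed.
        if sip_dialog not in self.authorized:
            raise PermissionError(f"SIP dialog {sip_dialog!r} not authorized")

    def create(self, sip_dialog, object_id):
        self._check(sip_dialog)
        self.owners[object_id] = sip_dialog

    def destroy(self, sip_dialog, object_id):
        # A request may destroy any existing object, not only its own.
        self._check(sip_dialog)
        del self.owners[object_id]

    def emit(self, object_id, event_name):
        # ASSUMPTION: events go to the dialog that created the object.
        self.sent.append((self.owners[object_id], event_name, object_id))
```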

   An MSML request may simply specify a single action for a media server
   to undertake. In this case, the document is very similar to a simple
   command request. Often, though, it may be more natural for a client
   to request multiple actions at one time, or the client would like
   several actions to be closely coordinated by the media server.
   Multiple MSML elements received in a single request MUST be processed
   sequentially in document order.

   An example of the first scenario would be to create a conference and
   join it with an initial participant. An example of the second case
   would be to unjoin one or more participants from a main conference
   and join them to a sidebar conference. In the first scenario, network
   latencies may not be an issue, but it is simpler for the client to
   combine the requests. In the second case, the added network latency
   between separate requests could mean perceptible audio loss to the
   participant.

   Each MSML request is processed as a single transaction. A media
   server MUST ensure that it has the necessary resources available to
   carry out the complete transaction before executing any elements of
   the request. If it does not have sufficient resources, it MUST return
   a 520 response and MUST NOT execute the transaction.

   The MSML request MUST be checked for well-formedness and validated
   against the schema prior to executing any elements. This allows XML
   [n2] errors to be reported immediately and minimizes failures within
   a transaction and the corresponding execution of only part of the
   transaction.

   Each element is expected to execute immediately. Elements such as
   <dialogstart>, which take an unpredictable amount of time, are
   "forked" and executed in a separate thread (see MSML Dialog
   packages). Once successfully forked, execution continues with the
   element following the <dialogstart>.
   As such, MSML does not provide mechanisms to sequence or coordinate
   other operations with dialog elements.

   Processing within a transaction MUST stop if any errors occur.
   Elements that were executed prior to the error are not rolled back.
   It is the responsibility of the client to determine appropriate
   actions based upon the results indicated in the response. Most
   elements MAY contain an optional "mark" attribute. The value of that
   attribute from the last successfully executed element MUST be
   returned in an error response. Note that errors that occur during the
   execution of a dialog occur outside the context of an MSML
   transaction. These errors will be indicated in an asynchronous event.

   Transaction results are returned as part of the response to the SIP
   request. The transaction results indicate the success or failure of
   the transaction. The result MUST also include identifiers for any
   objects created by a media server for which the client did not
   provide an instance name. Additionally, if the transaction fails, the
   reason for the failure MUST be returned, and an indication of how
   much of the transaction was executed before the failure occurred
   SHOULD be returned.

7. Media Server Object Model

   Media servers are general-purpose platforms for executing real-time
   media processing tasks. These tasks range in complexity from simple
   ones such as serving announcements, to complex ones, such as speech
   interfaces, centralized multimedia conferencing, and sophisticated
   gaming applications.

   Calls are established to a media server using SIP. Clients will often
   use SIP third party call control (3PCC) [i4] to establish calls to a
   media server on behalf of end users.
   However, MSML does not require that 3PCC be used, only that the
   client and the media server share a common identifier for the call
   and its associated RTP [i3] sessions.

   Objects represent entities which source, sink, or modify media
   streams. A media stream is a bidirectional or unidirectional media
   flow between objects on a media server. The following subsections
   define the classes of objects that exist on a media server and the
   way these are identified in MSML.

7.1 Objects

   A media object is an endpoint of one or more media streams. It may be
   a connection that terminates RTP sessions from the network or a
   resource that transforms or manipulates media. MSML defines four
   classes of media objects. Each class defines the basic properties of
   how object instances are used within a media server. However, most
   classes require that the function of specific instances be defined by
   the client, using MSML or other languages such as VoiceXML.

   The following classes of media processing objects are defined. The
   class names are given in parentheses:

   o  network connection (conn)

   o  conference (conf)

   o  dialog (dialog)

   Network connection is an abstraction for the media processing
   resources involved in terminating the RTP session(s) of a call. For
   audio services a connection instance presents a full-duplex audio
   stream interface within a media server. Multimedia connections have
   multiple media streams of different media types, each corresponding
   to an RTP session. Network connections get instantiated through SIP
   [n1].

   A conference represents the media resources and state information
   required for a single logical mix of each media type in the
   conference (e.g., audio and video). MSML models multiple mixes/views
   of the same media type as separate conferences.
   Each conference has multiple inputs. Inputs may be divided into
   classes that allow an application to request different media
   treatment for different participants. For example, the video streams
   for some participants may be assigned to fixed regions of the screen
   while those for other participants may only be shown when they are
   speaking.

   A conference has a single logical output per media type. For each
   participant, it consists of the audio conference mix, less any
   contributed audio of the participant, and the video mix shared by all
   conference participants. Video conferences using voice activated
   switching have an optional ability to show the previous speaker to
   the current speaker.

   Conferences are instantiated using the <createconference> element.
   The content of the <createconference> element specifies the
   parameters of the audio and/or video mixes.

   Dialogs are a class of objects that represent automated participants.
   They are similar to network connections from a media flow perspective
   and may have one or more media streams as the abstraction for their
   interface within a media server. Unlike connections however, dialogs
   are created and destroyed through MSML, and the media server itself
   implements the dialog participant. Dialogs are instantiated through
   the <dialogstart> element. Contents of the <dialogstart> element
   define the desired or expected dialog behavior. Dialogs may also be
   invoked by referencing VoiceXML as the dialog description language.

   Operators are functions that are used to filter or transform a media
   stream. The function that an instance of an operator fulfills is
   defined as a property of the media stream. Operators may be
   unidirectional or bidirectional and have a media type. Unidirectional
   operators reflect simple atomic functions such as automatic gain
   control, filtering tones from conferences, or applying specific gain
   values to a stream.
   Unidirectional operators have a single media input, which is
   connected to the media stream from one object, and a single media
   output, which is connected to the media stream of a different object.

   Bidirectional operators have two media inputs and two media outputs.
   One media input and output is associated with the stream to one
   object and the other input and output is associated with a stream to
   a different object. Bidirectional operators may treat the media
   differently in each direction. For example, an operator could be
   defined which changed the media sent to a connection based upon
   recognized speech or DTMF received from the connection. Operators are
   implicitly instantiated when streams are created or modified using
   the elements <join> and <modifystream> respectively.

   The relationships between the different object classes (conf, conn,
   and dialog) are shown in the figure below.

              +--------------------------------------+
              |             Media Server             |
              |                                      |
              |------+                      ,---.    |
              |      |      +------+       /     \   |
   <== RTP ==>| conn |<---->| oper |<---->( conf  )  |
              |      |      +------+       \     /   |
              |------+                      `---'    |
              |   ^                          ^       |
              |   |                          |       |
              |   |   +------+   +------+    |       |
              |   |   |      |   |      |    |       |
              |   +-->|dialog|   |dialog|<---+       |
              |       |      |   |      |            |
              |       +------+   +------+            |
              +--------------------------------------+

   A single, full-duplex instance of each object class is shown together
   with common relationships between them. An operator (such as gain) is
   shown between a connection and a conference and dialogs are shown
   participating both with an individual connection and with a
   conference. The figure is not meant to imply only one to one
   relationships. Conferences will often have hundreds of participants,
   and either connections or conferences may be interacting with more
   than one dialog.
   For example, one dialog may be recording a conference while other
   dialogs announce participants joining or leaving the conference.

7.2 Identifiers

   Objects are referenced using identifiers that are composed of one or
   more terms. Each term specifies an object class and names a specific
   instance within that class. The object class and instance are
   separated by a colon ":" in an identifier term.

   Identifiers are assigned to objects when they are first created. In
   general, either the MSML client or a media server may specify the
   instance name for an object. Objects for which a client does not
   assign an instance name will be assigned one by a media server. Media
   server assigned instance names are returned to the client as a
   complete object identifier in the response to the request that
   created the object.

   It is meaningful for some classes of objects to exist independently
   on a media server. Network connections may be created through SIP at
   any time. MSML can then be used to associate their media with other
   objects as required to create services. Conferences may be created
   and have specific resources reserved waiting for participant
   connections.

   Objects from these two classes, connections and conferences, are
   considered independent objects since they can exist on a standalone
   basis. Identifiers for independent objects consist of a single term
   as defined above. For example, identifiers for a conference and
   connection could be "conf:abc" or "conn:1234" respectively. Clients
   which choose to assign instance names to independent objects must use
   globally unique instance names. One way to create globally unique
   names is to include the domain name of the client as part of the
   name.

   Dialogs are created to provide a service to independent objects.
   Dialogs may act as a participant in a conference or interact with a
   connection similar to a two participant call. Dialogs depend upon the
   existence of independent objects and this is reflected in the
   composition of their identifiers. Operators modify the media flow
   between other objects, such as application of gain between a
   connection and a conference. As operators are merely media transform
   primitives defined as properties of the media stream, they are not
   represented by identifiers and are created implicitly.

   Identifiers for dialogs are composed of a structured list of slash
   ('/') separated terms. The left-most term of the identifier must
   specify a conference or connection. This serves as the root for the
   identifier. An example of an identifier for a dialog acting as a
   conference participant could be:

      conf:abc/dialog:recorder

   All objects except connections are created using MSML. Connections
   are created when media sessions get established through SIP. There
   are several options clients and media servers can use to establish a
   shared instance name for a connection and its media streams.

   When media servers support multiple media types, the instance name
   SHOULD be a call identifier that can be used to identify the
   collection of RTP sessions associated with a call. When MSML is used
   in conjunction with SIP and third party call control, the call
   identifier MUST be the same as the local tag assigned by the media
   server to identify the SIP dialog. This will be the tag the media
   server adds to the "To" header in its response to an initial invite
   transaction. RFC 3261 requires the tag values to be globally unique.

   An example of a connection identifier is: conn:74jgd63956ts.
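
   The identifier composition rules above can be sketched as a small
   parser. This Python fragment is illustrative only. The names Term
   and parse_identifier are hypothetical, the set of class names is
   limited to the three listed in section 7.1, and rejecting a
   conference or connection term anywhere but the root is an inference
   from the statement that independent objects use single-term
   identifiers.

```python
from typing import List, NamedTuple


class Term(NamedTuple):
    object_class: str  # "conf", "conn" or "dialog"
    instance: str      # instance name within that class


INDEPENDENT = {"conf", "conn"}       # may exist on a standalone basis
KNOWN = INDEPENDENT | {"dialog"}     # classes listed in section 7.1


def parse_identifier(identifier: str) -> List[Term]:
    """Split an identifier into terms and check the root rule."""
    terms = []
    for raw in identifier.split("/"):
        # Class and instance are separated by the first colon.
        object_class, sep, instance = raw.partition(":")
        if not sep or not instance or object_class not in KNOWN:
            raise ValueError(f"malformed term: {raw!r}")
        terms.append(Term(object_class, instance))
    if terms[0].object_class not in INDEPENDENT:
        raise ValueError("root term must be a conference or connection")
    # ASSUMPTION: independent objects never appear below the root.
    if any(t.object_class in INDEPENDENT for t in terms[1:]):
        raise ValueError("conf/conn terms are only valid as the root")
    return terms
```

   With this sketch, "conf:abc/dialog:recorder" parses into a
   conference root followed by a dialog term, while a bare
   "dialog:recorder" is rejected because it has no independent root.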

   With third party call control, the MSML client acts as a back to back
   user agent (B2BUA) to establish the media sessions. SIP dialogs are
   established between the client and the media server allowing the use
   of the media server local tag as a connection identifier. If third
   party call control is not used, a SIP event package MAY be used to
   allow a media server to notify new sessions to a client that has
   subscribed to this information.

   Identifiers as described above allow every object in a media server
   to be uniquely addressed. They can also be used to refer to multiple
   objects. There are two ways in which this can currently be done:

      wildcards

      common instance names

   An identifier can reference multiple objects when a wildcard is used
   as an instance name. MSML reserves the instance name comprised of a
   single asterisk ('*') to mean all objects that have the same
   identifier root and class. Instance names containing an asterisk
   cannot be created. Wildcards MUST only be used as the right most term
   of an identifier and MUST NOT be used as part of the root for dialog
   identifiers. Wildcards are only allowed where explicitly indicated
   below.

   The following are examples of valid wildcards:

      conf:abc/dialog:*

      conn:*

   Examples of illegal wildcard usage are:

      conf:*/dialog:73849

   Although identifiers share a common syntax, MSML elements restrict
   the class of objects which are valid in a given context. As an
   example, although it is valid to join two connections together, it is
   not valid to join two IVR dialogs.

8. MSML Core Package

   This section describes the core MSML package which MUST be supported
   in order to use any other MSML packages.
   The core MSML package defines a framework, without explicit
   functionality, over which functional packages are used.

8.1 <msml>

   <msml> is the root element. When received by a media server, it
   defines the set of operations that form a single MSML request.
   Operations are requested by the contents of the element. Each
   operation MAY appear zero or more times as children of <msml>.
   Specific operations are defined within the Conference package and in
   the set of Dialog packages.

   The results of a request or the contents of events sent by a media
   server are also enclosed within the <msml> element. The results of
   the transaction are included as a body in the response to the SIP
   request that contained the transaction. This response will contain
   any identifiers that the media server assigned to newly created
   objects. All messages that a media server generates are correlated to
   an object identifier. Objects and identifiers are discussed in
   section 7 (Media Server Object Model).

   Attributes:

      version: "1.1" Mandatory

8.2 <send>

   Events are used to affect the behavior of different objects within a
   media server. The <send> element is used to send an event to the
   specified recipient within the Media Server.

   Attributes:

      event: the name of an event. Mandatory.

      target: an object identifier. When the identifier is for a
      dialog, it may optionally be appended with a slash "/" followed
      by the target to be included in an MSML Dialog <send>.
      Mandatory.

      valuelist: a list of zero or more parameters that are included
      with the event.

      mark: a token that can be used to identify execution progress
      in the case of errors. The value of the mark attribute from the
      last successfully executed MSML element is returned in an error
      response.
      Therefore the value of all mark attributes within an
      MSML document should be unique.

8.3 <result>

   The <result> element is used to report the results of an MSML
   transaction. It is included as a body in the final response to the
   SIP request which initiated the transaction. An optional child
   element <description> may include text which expands on the meaning
   of error responses. Response codes are defined in section 11
   (Response Codes).

   Attributes:

      response: a numeric code indicating the overall success or
      failure of the transaction, and in the case of failure, an
      indication of the reason. Mandatory.

      mark: in the case of an error, the value of the mark attribute
      from the last successfully executed element that included the
      mark attribute.

   In the case of failure, a description of the reason SHOULD be
   provided using the child element <description>.

   Three other child elements allow the response to include identifiers
   for objects created by the request but which did not have instance
   names specified by the client. Those elements are <confid> and
   <dialogid>, for objects created through a <createconference> and
   <dialogstart> respectively.

8.4 <event>

   The <event> element is used to notify an event to a media server
   client. Three types of events are defined by the MSML Core package:
   "msml.dialog.exit", "msml.conf.nomedia", and "msml.conf.asn". These
   correspond to the termination of an executing dialog, a conference
   being automatically deleted when the last participant has left, and
   the notification of the current set of active speakers for a
   conference, respectively. Events may also be generated by an
   executing dialog. In this case the event type is specified by the
   dialog (see MSML Dialog Core Package).

   Attributes:

      name: the type of event.
      If the event is generated because of
      the execution of an MSML Dialog <send>, the value MUST be the
      value of the "event" attribute from the <send> element within the
      MSML Dialog Core package. If the event is generated because of
      the execution of an <exit>, the value MUST be "moml.exit". If
      the event is generated because of the execution of a
      <disconnect>, the value MUST be "moml.disconnect". If the event
      is generated because of an error, the value must be
      "moml.error". Mandatory.

      id: the identifier of the conference or dialog that generated
      the event or caused the event to be generated. Mandatory.

   <event> has two children, <name> and <value>, which contain the
   name and value respectively of each namelist item associated
   with the event.

9. MSML Conference Core Package

9.1 Conferences

   A conference has a mixer for each type of media that the conference
   supports. Each mix has a corresponding description that defines how
   the media from participants contributes to that mix. A mixer has
   multiple inputs that are combined in a media-specific way to create a
   single logical output.

   The elements that describe the mix for each media type are called
   mixer description elements. They are:

      <audiomix> defines the parameters for mixing audio media.

      <videolayout> defines the composition of a video window.

   These elements, defined in sections 9.6 (Audio Mix) and 9.7 (Video
   Layout) respectively, are used as content of the <createconference>
   element to establish the initial properties of a conference. The
   elements are used within the <modifyconference> element to change the
   properties of a conference once it has been created, or within the
   <destroyconference> element to remove individual mixes from the
   conference.

   Conferences may be terminated by an MSML client using the
   <destroyconference> element to remove the entire conference or by
   removing the last mixer(s) associated with the conference.
   Conferences can also be terminated automatically by a media server
   based on criteria specified when the conference is created. When the
   conference is deleted, any remaining participants will have their
   associated SIP dialogs left unchanged or deleted based on the value
   of the "term" attribute specified when the conference was created.

9.2 Media Streams

   Objects have at least one media input and output for each type of
   media that they support. Each object class defines the number of
   inputs and outputs objects of that class support. Media streams are
   created when objects are joined, either explicitly using <join>, or
   implicitly when dialogs are created using <dialogstart>. Dialog
   creation has two stages: allocating and configuring the resources
   required for the dialog instance, and implicitly joining those
   resources to the dialog target during the dialog execution. Refer to
   the MSML Dialog Base package.

   A join operation by default creates a bidirectional audio stream
   between two objects. Video and unidirectional streams may also be
   created. A media stream is created by connecting the output from one
   object to the input of another object and vice versa (assuming a
   bidirectional or full-duplex join).

   Many objects may only support a single input for each type of media.
   Within this specification, only the conference object class supports
   an arbitrary number of inputs. When a stream is requested to be
   created to an object that already has a stream of the same type
   connected to its single input, the result of the request depends upon
   the type of the media stream.

   Audio mixing is done by summing audio signals. Automatically mixing
   audio streams has common and straightforward applications.
   For example, the ability to bridge two streams allows for the easy
   creation of simple three-way calls or to bridge private announcements
   with a [whispered] conference mix for an individual participant. In
   the case of general conferences however, an MSML client SHOULD create
   an audio conference and then join participants to the conference.
   Conference mixers SHOULD subtract the audio of each participant from
   the mix so that they do not hear themselves.

   A media server that receives a request that requires joining an audio
   stream to the single audio input of an object that already has an
   audio stream connected SHOULD automatically bridge the new stream
   with the existing stream, creating a mix of the two audio streams.
   The maximum number of streams that may be bridged in this manner is
   implementation-specific. It is RECOMMENDED that a media server
   support bridging at least two streams. A media server that cannot
   bridge a new stream with any existing streams MUST fail the operation
   requesting the join.

   Unlike audio mixing, there are many different ways that two video
   streams may be combined and presented. For example, they may be
   presented side by side in separate panes, picture in picture, or in a
   single pane which displays only a single stream at a time based on a
   heuristic such as active speaker. Each of these options creates a
   very different presentation and requires significantly different
   media resources.

   A join operation does not describe how a new stream can be combined
   with an existing stream. Therefore automatic bridging of video is not
   supported. A media server MUST fail requests to join a new video
   stream to an object that only supports a single video input and
   already has a video stream connected to that input.
   For an object to
   have multiple video streams joined to it, the object itself must be
   capable of supporting multiple video streams. Conference objects can
   support multiple video streams and provide a way to specify the
   mixing presentation for the video streams.

   A media server MUST NOT establish any streams unless the media server
   is able to create all the streams requested by an operation. Streams
   are only able to be created if both objects support a media type and
   at least one of the following conditions is true:

   1. each object that is to receive media is not already receiving a
      stream of that type.

   2. any object that is to receive media and is already receiving a
      stream of that type supports receiving an additional stream of
      that type. The only class of objects defined in this
      specification that directly support receiving multiple streams
      of the same type are conferences.

   3. the media server is able to automatically bridge media streams
      for an object that is to receive media and that is already
      receiving a stream of the requested type. The only type of
      media defined in this specification that MAY be automatically
      bridged is audio.

   The directionality of media streams associated with a connection is
   modeled independently from what SDP [n9] allows for the corresponding
   RTP [i3] sessions. Media servers MUST respect the SDP in what they
   actually transmit but MUST NOT allow the SDP to affect the
   directionality when joining streams internal to the media server.

9.3 <createconference>

   <createconference> is used to allocate and configure the media mixing
   resources for conferences. A description of the properties for each
   type of media mix required for the conference is defined within the
   content of the <createconference> element. Mixer descriptions are
   described in the Audio Mix and Video Layout sections.
   When no mixer
   descriptions are specified, the default behavior MUST be equivalent
   to inclusion of a single <audiomix>.

   Clients can request that a media server automatically delete a
   conference when a specified condition occurs by using the
   "deletewhen" attribute. A value of "nomedia" indicates that the
   conference MUST be deleted when no participants remain in the
   conference. When this occurs, an "msml.conf.nomedia" event MUST be
   notified to the MSML client. A value of "nocontrol" indicates the
   conference MUST be deleted when the SIP [n1] dialog that carries the
   <createconference> element is terminated. When this occurs, a media
   server MUST terminate all participant dialogs by sending a BYE for
   their associated SIP dialog. A value of "never" MUST leave the
   ability to delete a conference under the control of the MSML client.

   Attributes:

      name: the instance name of the conference. If the attribute is
      not present, the media server MUST assign a globally unique
      name for the conference. If the attribute is present but the
      name is already in use, an error (432) will result and MSML
      document execution MUST stop. Events which the conference
      generates use this name as the value of their "id" attribute
      (see section 5.6.2 (<event>)).

      deletewhen: defines whether a media server should automatically
      delete the conference. Possible values are "nomedia",
      "nocontrol", and "never". Default is "nomedia".

      term: when true, the media server MUST send a BYE request on
      all SIP dialogs still associated with the conference when the
      conference is deleted. Setting term equal to false allows
      clients to start dialogs on connections once the conference has
      completed. Default true.

      mark: a token which MAY be used to identify execution progress
      in the case of errors.
      The value of the mark attribute from the
      last successfully executed MSML element is returned in an error
      response. Therefore the value of all mark attributes within an
      MSML document should be unique.

   An example of creating an audio conference is shown below. This
   conference allows at most two participants to contend to be heard and
   reports the set of active speakers no more frequently than every ten
   seconds.

      [XML example missing in source]

9.3.1 <reserve>

   Conference resources may be reserved by including the <reserve>
   element as a child of <createconference>. <reserve> allows the
   specification of a set of resources which a media server will reserve
   for the conference. Any requests for resources beyond those that have
   been reserved should be honored on a best-effort basis by a media
   server.

   Attributes:

      required: boolean that specifies whether <createconference>
      should fail if the requested resources are not available. When
      set to false, the conference will be created, with no reserved
      resources, if the complete reservation cannot be honored.
      Default true.

9.3.1.1 <resource>

   The resources to be reserved are defined using <resource>. The
   contents of these elements describe a resource that is to be
   reserved. Descriptions are implementation-dependent. Media servers
   that support MSML Dialogs may use the elements from that package as
   the basis for resource descriptions. Each resource element may use
   the attribute "n" to define the quantity of the resource to reserve.

   For example, the following creates a conference and reserves two
   types of resources. One resource element may represent resources that
   are shared by all participants of the conference while the other may
   represent resources that are reserved for each of the expected
   participants.

   Attributes:

      n: number of resources to be reserved. Default 1.
      type: specifies whether the resource is to be reserved by each
      individual participant or reserved as a shared conference
      resource. Valid values for this attribute are "individual" or
      "shared". Default "individual".

      [XML example missing in source]

9.4 <modifyconference>

   All of the properties of an audio mix or the presentation of a video
   mix may be changed during the life of a conference using the
   <modifyconference> element. Changes to an audio mix are requested by
   including an <audiomix> element as a child of <modifyconference>.
   This may also be used to add an audio mixer to the conference if none
   was previously allocated. Changes to a video presentation are
   requested by including a <videolayout> element as a child of
   <modifyconference>. Similar to an audio mixer, this may be used to
   add a video mixer if none was previously allocated.

   Mixers are removed by including a mixer description element within
   <destroyconference>.

   Features and presentation aspects are enabled/added or modified by
   including the element(s) that define the feature or presentation
   aspect within a mixer description. The complete specification of the
   element must be included just as it would be included when the
   conference is created. The new definition completely replaces any
   previous definition that existed. Only things that are defined by
   elements included in the mixer descriptions are affected. Any
   existing configuration aspects of a conference, which are not
   specified within the <modifyconference> element, MUST maintain their
   current state in the media server.
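
   The replace-but-preserve rule above can be pictured with a small
   model. The following is a minimal sketch, not part of MSML: it
   assumes a conference configuration is held as a nested dictionary
   (mixer description name, then feature element name, then
   attributes). The function name apply_modify and the attribute
   values such as "10s" are hypothetical and chosen for illustration
   only.

```python
import copy


def apply_modify(current, requested):
    """Return the configuration after a modify-style update.

    Hypothetical model: both arguments map a mixer description name
    (e.g. "audiomix") to a dict of feature elements, and each feature
    element maps to a dict of its attributes.
    """
    updated = copy.deepcopy(current)
    for mixer, features in requested.items():
        # A mixer that was not previously allocated is simply added.
        mixer_cfg = updated.setdefault(mixer, {})
        for feature, attrs in features.items():
            # The new definition completely replaces the old one;
            # attributes of the old definition are not merged in.
            mixer_cfg[feature] = dict(attrs)
    # Anything not named in the request keeps its current state.
    return updated


before = {
    "audiomix": {
        "n-loudest": {"n": "2"},
        "asn": {"ri": "10s", "asth": "-40"},
    }
}
after = apply_modify(before, {"audiomix": {"asn": {"ri": "5s"}}})
```

   In this model the "asn" definition is replaced as a whole (the old
   "asth" value is not carried over), while "n-loudest" is untouched
   because the request did not mention it.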
   For example, if an MSML client wanted to change the minimum reporting
   interval for active speaker notification from that shown in the
   Conference Examples section, it would send the
   following to the media server:

      [XML example missing in source]

   This would also enable active speaker notification if it had not
   previously been enabled. The N-loudest mixing is unaffected.

   Multiple elements MAY be included in the mixer descriptions similar
   to when conferences are created. For example, in a video conference,
   the video mix description (<videolayout>) could specify that the
   layout of the video being displayed should change such that the
   regions currently displaying participants get smaller and new
   region(s) are created to support additional participants. A media
   server MUST make all of the requested changes or none of the
   requested changes.

   Additional examples of modifying conferences are presented in the
   Conference Examples section.

   Attributes:

      id: the identifier for a conference. Wildcards MUST NOT be
      used. Mandatory.

      mark: a token which can be used to identify execution progress
      in the case of errors. The value of the mark attribute from the
      last successfully executed MSML element is returned in an error
      response. Therefore the value of all "mark" attributes within
      an MSML document SHOULD be unique.

9.5 <destroyconference>

   Destroy conference is used to delete mixers or to delete the entire
   conference and all state and shared resources. When a mixer is
   removed, all of the streams joined to that mixer are unjoined. When a
   conference is destroyed, SIP dialogs for any remaining participants
   MUST be maintained or removed based on the value of the "term"
   attribute when the conference was created.

   When there is no element content, <destroyconference> deletes the
   entire conference.
   Individual mixer(s) are removed by including a
   mixer description element identifying the mix(es) to be removed as
   content to <destroyconference>. <audiomix> is used to remove audio
   mixers and <videolayout> is used to remove video mixers. When one or
   more mixer descriptions are specified, the media server MUST delete
   only the specified mixers and MUST NOT affect any other existing
   mixers. When <audiomix> or <videolayout> are identified for
   individual removal, other feature aspects of the mix MUST NOT be
   included. If specified, the media server MUST ignore any such
   elements. When the last mixer is removed from a conference, a media
   server MUST remove all conference state, leaving or removing any
   remaining SIP dialogs as described above.

   Attributes:

      id: the identifier for a conference. Mandatory.

      mark: a token which can be used to identify execution progress
      in the case of errors. The value of the mark attribute from the
      last successfully executed MSML element is returned in an error
      response. Therefore the value of all "mark" attributes within
      an MSML document SHOULD be unique.

9.6 <audiomix>

   The properties of the overall audio mix are specified using the
   <audiomix> element.

   Attributes:

      id: an optional identifier for the audio mix.

      samplerate: an integer value that specifies the sample rate (in
      Hz) for the audio mixer. Optional, default value of 8000.

   An example of the description for an audio mix is:

      [XML example missing in source]

9.6.1 <n-loudest>

   The <n-loudest> element defines that participants contend to be
   included in the conference mix based upon their audio energy. When
   the element is not present, all participants are mixed.

   Attributes:

      n: the number of participants that will be included in the
      audio mix based upon having the greatest audio energy.
      Mandatory.

9.6.2 <asn>

   The <asn> element enables notification of active speakers.
   Active
   speakers MUST be notified using the <event> element with an event
   name of "msml.conf.asn". The namelist of the event consists of the
   set of active speakers. The name of each item is the string "speaker"
   with a value of the connection identifier for the connection.

   Attributes:

      ri: the minimum reporting interval defines the minimum duration
      of time which must pass before changes to active speakers will
      be reported. A value of zero disables active speaker
      notification.

      asth: specifies the active speaker threshold (in units of dBm0).
      Valid value range is 0 to -96. Optional, default is -96.

   An example of an active speaker notification is (enclosing <event>
   element omitted):

      <name>speaker</name>
      <value>conn:hd93tg5hdf</value>
      <name>speaker</name>
      <value>conn:w8cn59vei7</value>
      <name>speaker</name>
      <value>conn:p78fnh6sek47fg</value>

9.7 <videolayout>

   A video layout is specified using the <videolayout> element. It is
   used as a container to hold elements that describe all of the
   properties of a video mix. The parameters of the window that displays
   the video mix are defined by the <root> element. When the video mix
   is composed of multiple panes, the location and characteristics of
   the panes are defined by one or more <region> elements. A <region>
   element is not required when only a single video stream is displayed
   at one time and none of the visual attributes of regions are
   required.

   Some regions may be used to display a video stream based on a
   selection criterion rather than having a video stream of a single
   participant continuously presented in the region. One such example
   is a distance learning lecture where the instructor sees each of the
   students periodically displayed in a region. When a region is used to
   display one of a number of streams, it is placed as a child of a
   <selector> element.

   Attributes:

      type: specifies the language used to define the layout.
      Layouts
      defined using MSML MUST use the value "text/msml-basic-layout".
      This is the same convention as defined for the layout package
      from the W3C SMIL 2.0 specification [i6]. The default when
      omitted is "text/msml-basic-layout".

      id: an optional identifier for the video layout.

9.7.1 <root>

   The <root> element describes the root window or virtual screen in
   which the conference video mix will be displayed. Simple conferences
   can display participant video directly within the root window but
   more complex conferences will use regions for this purpose. Areas of
   the window which are not used to display video will show the root
   window background.

   All video presentations require a root window. It MUST be present
   when a video mix is created and it cannot be deleted, however its
   attributes MAY be changed using the <modifyconference> element.

   Attributes:

      size: the size of the root window specified as one of the five
      standard common intermediate formats (e.g. CIF, QCIF, etc.).

      backgroundcolor: the color for the root window background
      defined using the values for the "background-color" property of
      the CSS2 specification [n10].

      backgroundimage: the URI for an image to be displayed as the
      root window background. Transparent portions of the image allow
      the background color to show through.

9.7.2 <region>

   <region> elements define video panes that are used to display
   participant video streams. Regions are rendered on top of the root
   window.

   The size of a region is specified relative to the size of the root
   window using the "relativesize" attribute. Relative sizes are
   expressed as fractions (e.g. 1/4, 1/3) that preserve the aspect ratio
   of the original video stream while allowing for efficient scaling
   implementations.
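
   As an illustration of relative sizing, the sketch below computes
   the pixel dimensions of a region from the root window format and a
   "relativesize" fraction. It is a sketch under stated assumptions:
   the fraction is taken to scale width and height by the same factor
   (which is what preserves the aspect ratio), fractional pixels are
   truncated, and the helper name region_size is hypothetical. Only
   the CIF (352x288) and QCIF (176x144) dimensions are listed.

```python
from fractions import Fraction

# Luma dimensions (width, height) of two common intermediate formats.
FORMATS = {"CIF": (352, 288), "QCIF": (176, 144)}


def region_size(root_size, relativesize):
    """Return (width, height) in pixels for a region.

    root_size    -- name of the root window format, e.g. "CIF"
    relativesize -- fraction as text, e.g. "1/4" or "1/3"
    """
    width, height = FORMATS[root_size]
    scale = Fraction(relativesize)  # parses strings such as "1/3"
    # Assumption: the same factor applies to both dimensions, and any
    # fractional pixel is truncated.
    return int(width * scale), int(height * scale)


half_cif = region_size("CIF", "1/2")
third_cif = region_size("CIF", "1/3")
```

   With these assumptions a region of size 1/2 on a CIF root has the
   dimensions of QCIF, which is one reason simple fractions lend
   themselves to efficient scaling.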
   Regions are located on the root window based on the value of the
   position attributes "top" and "left". These attributes define the
   position of the top left corner of the region as an offset from the
   top left corner of the root window. Their values may be expressed
   either as a number of pixels or as a percent of the vertical or
   horizontal dimension of the root window. Percent values are appended
   with a percent ('%') character. Percent values of "33%" and "67%"
   should be interpreted as "1/3" and "2/3" to allow easy alignment of
   regions whose size is expressed relative to the size of the root
   window.

   An example of a video layout with six regions is:

         +-------+---+
         |       | 2 |
         |   1   +---+
         |       | 3 |
         +---+---+---+
         | 6 | 5 | 4 |
         +---+---+---+

      [XML example missing in source]

   The area of the root window covered by a region is a function of the
   region's position and its size. When areas of different regions
   overlap, they are layered in order of their "priority" attribute. The
   region with the highest value for the "priority" attribute is below
   all other regions and will be hidden by overlapping regions. The
   region with the lowest non-zero value for the "priority" attribute is
   on top of all other regions and will not be hidden by overlapping
   regions. The priority attribute may be assigned values between 0 and
   1. A value of zero disables the region, freeing any resources
   associated with the region, and unjoining any video stream displayed
   in the region.

   Regions that do not specify a priority will be assigned a priority by
   a media server when a conference is created. The first region within
   the <videolayout> element that does not specify a priority will be
   assigned a priority of one, the second a priority of two, etc.
   In
   this way, all regions that do not explicitly specify a priority will
   be underneath all regions that do specify a priority. As well, within
   those regions that do not specify a priority, they will be layered
   from top to bottom, in the order they appear within the
   <videolayout> element.

   For example, if a layout was specified as follows:

      [XML example missing in source]

   Then the regions would be layered, from top to bottom, c,a,b,d.

   Portions of regions that extend beyond the root window will be
   cropped. For example, a layout specified as:

      [XML example missing in source]

   would appear similar to:

         +-----------+
         |   root    |
         |background |
         |     +-----+--
         |     |     |//
         |     | foo |//
         +-----+-----+//
               |////////

   Visual attributes are used to define aspects of the visual appearance
   of individual regions. A border may be defined together with a title
   and/or logo. Text and logos are displayed as images on top of the
   region's video, below all regions with a lower priority. The visual
   attributes are "title", "titletextcolor", "titlebackgroundcolor",
   "bordercolor", "borderwidth", and "logo".

   Visual attributes can also be defined for individual streams (Video
   Stream Properties). When visual attributes are specified as part of
   both a region and a stream, those associated with the stream MUST
   take precedence. This allows streams that are chosen for display
   automatically (Stream Selection) to have proper text and logos
   displayed. The region visual attributes are displayed when no stream
   is associated with the region.

   Two other attributes associated with a region, "blank" and "freeze",
   define the state of the video displayed in the region.
   When the blank
   or freeze attribute is assigned the value "true", the media
   server MUST display the region either as a blank region, or with the
   video image frozen at the last received frame, respectively.

   These attributes are specified for a region and not allowed for
   streams because that appears to be the common use case. Applying them
   to streams would allow only that stream to be affected within a
   selector while other streams continue to display normally. Except for
   personal mixing scenarios, the same effect can be achieved by having
   the participant mute their own transmission to the media server.

   Attributes associated with each region:

      id: a name that can be used to refer to the region.

      left: the position of the region from the left side of the root
      window.

      top: the position of the region from the top of the root
      window.

      relativesize: the size of the region expressed as a fraction of
      the root window size.

      priority: a number between 0 and 1 that is used to define the
      precedence when rendering overlapping regions. A value of zero
      disables the region.

      title: text to be displayed as the title for the region

      titletextcolor: the color of the text

      titlebackgroundcolor: the color of the text background

      bordercolor: the color of the region border

      borderwidth: the width of the region border

      logo: the URI of an image file to be displayed

      freeze: a boolean value, with a default of false, that defines
      whether the video image should be frozen at the currently
      displayed frame

      blank: a boolean value, with a default of false, that defines
      whether the region should display black instead of the
      associated video stream

9.7.3 <selector>

   It is often desired that one of several video streams be
   automatically selected to be displayed.
   The <selector> element is
   used to define the selection criteria and its associated parameters.
   The selection algorithm is specified by the "method" attribute.
   Currently defined selection methods allow for voice activated
   switching and to iterate sequentially through the set of associated
   video streams.

   The regions that will display the selected video stream are placed as
   child elements of the <selector> element. Including regions within a
   <selector> element does not affect their layout with respect to
   regions not subject to the selection. For simple video conferences
   that display the video directly in the root window, the <selector>
   element can be placed as a child of <root>. Region elements MUST
   NOT be used in this case.

   For example, below is a common video layout that allows the video
   stream from the currently active speaker to be displayed in the large
   region ("1") at the top left of the layout while the streams from
   five other participants are displayed in regions located at the
   layout periphery.

         +-------+---+
         |       | 2 |
         |   1   +---+
         |       | 3 |
         +---+---+---+
         | 6 | 5 | 4 |
         +---+---+---+

      [XML example missing in source]

   All selector methods must be defined so that they work if only a
   single region is a child of the selector. Selector methods that
   support more than one child region MUST specify how the method works
   across multiple regions. Media server implementations MAY support
   only a single region for methods that are defined to allow multiple
   regions.

   The selector or region for a participant's video is defined using the
   "display" attribute of <stream> during a join operation. Specifying a
   selector allows the stream to be displayed according to the criteria
   defined by the selector method. Specifying a region supports
   continuous presence display of participants.
   Some streams may be
   joined with both a selector and a region. In this case, the value of
   the "blankothers" attribute defines whether the streams associated
   with a continuous presence region should be blanked when the stream
   is selected for display in one of the selector regions.

   Attributes common to all selector methods are:

      id: a name that can be used to refer to the selector.

      method: the name of the method used to select the video stream.
      A value of "vas" (see section on Voice Activated Switching) MAY
      be specified.

      status: specifies whether the selector is "active" or
      "disabled".

      blankothers: when "true", video streams that are also displayed
      in continuous presence regions will have the continuous
      presence regions blanked when the stream is displayed in a
      selection region.

9.7.3.1 Voice Activated Switching (vas)

   Voice activated switching (VAS) is used to display the video stream
   that correlates with the participant who is currently speaking. It is
   specified using a selector method value of "vas".

   If the video stream associated with the active speaker is not
   currently displayed in a selection region, then it replaces the video
   in the region that is displaying the video of the speaker that was
   least recently active. If the video of the active speaker is
   currently displayed in a selection region, then there is no change to
   any region. When VAS is applied to a single region, this has the
   effect that the current speaker is displayed in that region.

   Attributes:

      si: switching interval is the minimum period of time that must
      elapse before allowing the video to switch to the active
      speaker.

      speakersees: defines whether the active speaker sees the
      "current" speaker (themselves) or the "previous" speaker.
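
   The region replacement rule for voice activated switching can be
   sketched as follows. This is an illustrative model only, not a
   media server implementation. The class name VasSelector and its
   methods are hypothetical. The sketch also makes two assumptions
   that the text does not state: the switching interval "si" is
   measured from the previous switch, and a region that has no stream
   yet is treated as the least recently active. The "speakersees"
   attribute is not modeled.

```python
class VasSelector:
    """Toy model of a selector using the "vas" method."""

    def __init__(self, region_ids, si=0.0):
        self.shown = {rid: None for rid in region_ids}  # region -> stream
        self.last_active = {}    # stream -> time it was last active
        self.si = si             # minimum time between switches
        self.last_switch = None  # time of the most recent switch

    def on_active_speaker(self, stream, now):
        """Handle a new active speaker; return the region changed, if any."""
        changed = None
        if stream not in self.shown.values():
            too_soon = (self.last_switch is not None
                        and now - self.last_switch < self.si)
            if not too_soon:
                # Replace the stream whose speaker was least recently
                # active; empty regions sort first (assumption).
                victim = min(
                    self.shown,
                    key=lambda r: self.last_active.get(
                        self.shown[r], float("-inf")),
                )
                self.shown[victim] = stream
                self.last_switch = now
                changed = victim
        # A speaker already on display causes no change to any region.
        self.last_active[stream] = now
        return changed


selector = VasSelector(["r1", "r2"], si=2.0)
first = selector.on_active_speaker("conn:a", now=0.0)
```

   With a single region in the selector, every permitted switch
   replaces that region's stream, so the region follows the current
   speaker.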
9.8 <join>

   <join> is used to create one or more streams between two independent
   objects. Streams may be audio or video and may be bidirectional or
   unidirectional. A bidirectional stream is implicitly composed of two
   unidirectional streams that can be manipulated independently. The
   streams to be established are specified by <stream> elements
   as the content of <join>.

   Without any content, <join> by default establishes a bidirectional
   audio stream. When only a stream of a single type has previously been
   created between two objects, or when only a unidirectional stream
   exists, <join> can be used to add a stream of another media type or
   make the stream bidirectional by including the necessary <stream>
   elements. Bidirectional streams are made unidirectional by using
   <unjoin> to remove the unidirectional stream for
   the direction that is no longer required.

   In addition to defining the media type and direction of streams,
   <stream> elements are also used to establish the properties of
   streams, such as gain, voice masking, or tone clamping of audio
   streams, or labels and other visual characteristics of video streams.
   Properties are often defined asymmetrically for a single direction of
   a stream. Creating a bidirectional stream requires two <stream>
   elements within the <join>, one for each direction, if one direction
   is to have different properties from the other direction.

   If a media server can provide services using either compressed or
   uncompressed media, the MSML client may need to distinguish within
   requests which format is to be used. When compressed streams are
   created, both objects must use the same media format or an error
   response (450) is generated.

   Attributes:

      id1: an identifier of either a connection or conference.
      Wildcards MUST NOT be used. Mandatory.
      Any other object class results in a 440 error.

      id2: an identifier of either a connection or conference.
      Wildcards MUST NOT be used. Mandatory. Any other object class
      results in a 440 error.

      mark: a token which can be used to identify execution progress
      in the case of errors. The value of the mark attribute from the
      last successfully executed MSML element is returned in an error
      response. Therefore the value of all mark attributes within an
      MSML document SHOULD be unique.

   For example, consider a call center coaching scenario where a
   supervisor can listen to the conversation between an agent and a
   customer, and provide hints to the agent, which are not heard by the
   customer. One join establishes a stream between the agent and the
   customer and another join establishes a stream between the agent and
   the supervisor. A third join is used to establish a half-duplex
   stream from the customer to the supervisor. The media server
   automatically bridges the media streams from the customer and the
   supervisor for the agent, and from the customer and the agent for the
   supervisor.

   Assuming the following connections, each with a single audio stream:

      conn:supervisor

      conn:agent

      conn:customer

   The following would create the media flows previously described:

   The following example shows joining a participant to a multimedia
   conference. It assumes that the conference has a video presentation
   region named "topright". The "display" attribute is explained in
   section Video Stream Properties.

9.9

   Media streams can have different properties such as the gain for an
   audio stream or a visual label for a video stream. These properties
   are specified as the content of elements (section ).
   is used to change the properties of a stream by
   including one or more elements that are to have their
   properties changed.

   Stream properties MUST be set as specified by the element as
   a child element of element. Any properties not
   included in the element when modifying a stream MUST remain
   unchanged. Setting a property for only one direction of a
   bidirectional stream MUST NOT affect the other direction. The
   directionality of streams can be changed by issuing an
   followed by a . Any streams that exist between the two objects
   that are not included within MUST NOT be affected.

   Attributes:

      id1: an identifier of either a conference or a connection. The
      instance name MUST NOT contain a wildcard if "id2" contains a
      wildcard. Mandatory.

      id2: an identifier of either a conference or a connection. The
      instance name MUST NOT contain a wildcard if "id1" contains a
      wildcard. Mandatory.

      mark: a token which can be used to identify execution progress
      in the case of errors. The value of the mark attribute from the
      last successfully executed MSML element is returned in an error
      response. Therefore the value of all mark attributes within an
      MSML document is RECOMMENDED to be unique.

9.10

   Unjoin removes one or more media streams between two objects. In the
   absence of any content in element, all media streams between
   the objects MUST be removed. Individual streams may be removed by
   specifying them using elements, while the unspecified
   streams MUST NOT be removed. A bidirectional stream is changed to a
   unidirectional stream by unjoining the direction that is no longer
   required, using the element. Operator elements MUST NOT be
   specified within elements when streams are being unjoined
   using the element.
   Any specified stream operators MUST be ignored.

   and may be used together to move a media stream, such
   as from a main conference to a sidebar conference.

   Attributes:

      id1: an identifier of either a conference or a connection. The
      instance name MUST NOT contain a wildcard if "id2" contains a
      wildcard. Mandatory.

      id2: an identifier of either a conference or a connection. The
      instance name MUST NOT contain a wildcard if "id1" contains a
      wildcard. Mandatory.

      mark: a token which can be used to identify execution progress
      in the case of errors. The value of the mark attribute from the
      last successfully executed MSML element is returned in an error
      response. Therefore the value of all mark attributes within an
      MSML document SHOULD be unique.

   The following removes a participant from a conference and plays a
   leave tone for the remaining participants in the conference.

9.11

   Monitor is a specialized unidirectional join that copies the media
   that is destined for a connection object. One example of the use for
   may be quality monitoring within a conference. The media
   stream may be removed using the element (see section
   ).

   Attributes:

      id1: an identifier of the connection to be monitored.
      Mandatory. Any other object class results in a 440 error.
      Wildcards MUST NOT be used.

      id2: an identifier of the object which is to receive the copy
      of the media destined to id1. id2 may be a connection or a
      conference. Mandatory. Any other object class results in a 440
      error. Wildcards MUST NOT be used.

      compressed: "true" or "false". Specifies whether the join
      should occur before or after compression. When "true", id2 must
      be a connection using the same media format as id1 or an error
      response (450) is generated.
      Default is "false".

      mark: a token which can be used to identify execution progress
      in the case of errors. The value of the mark attribute from the
      last successfully executed MSML element is returned in an error
      response. Therefore the value of all mark attributes within an
      MSML document SHOULD be unique.

9.12

   Individual streams are specified using the element. They MAY
   be included as a child element in any of the stream manipulation
   elements , , or .

   The type of the stream is specified using a "media" attribute that
   uses values corresponding to the top-level MIME media types as
   defined in RFC 2046 [i7]. This specification only addresses audio and
   video media. Other specifications may define procedures for
   additional types.

   A bidirectional stream is identified when no direction attribute
   "dir" is present. A unidirectional stream is identified when a
   direction attribute is present. The "dir" attribute MUST have a value
   of "from-id1" or "to-id1" depending on the required direction. These
   values are relative to the identifier attributes of the parent
   element.

   The compressed attribute is used to distinguish the compressed nature
   of the stream when necessary. It is implementation specific what is
   used when the attribute is not present. Joining compressed streams
   acts much like an RTP [i3] relay.

   The properties of the media streams are specified as the content of
   elements when the element is used as a child of or
   . Stream elements MUST NOT have any content when they
   are used as a child of to identify specific streams to
   remove.

   Some properties are defined within MSML as additional attributes or
   child elements of that are media type specific. Those for
   audio streams and video streams are defined in the following two
   subsections.
   Operators, viewed as properties of the media stream, MAY be
   specified as child elements of the element.

   Attributes:

      media: "audio" or "video". Mandatory.

      dir: "from-id1" or "to-id1".

      compressed: "true" or "false". Specifies whether the stream
      uses compressed media. Default is implementation specific.

9.12.1 Audio Stream Properties

   Audio mixes can be specified to only mix the N-loudest participants.
   However, there may be some "preferred" participants that are always
   able to contribute. When audio streams are joined to a conference
   that uses N-loudest audio mixing, preferred streams need to be
   identified.

   A preferred audio stream is identified using the "preferred"
   attribute. The "preferred" attribute MAY be used for an audio stream
   that is input to a conference and MUST NOT be used for other streams.

   Additional attributes of the element for audio streams are:

   Attributes:

      preferred: a boolean value that defines whether the stream does
      not contend for N-loudest mixing. A value of "true" means that
      the stream MUST always be mixed while a value of "false" means
      that the stream MAY contend for mixing into a conference when
      N-loudest mixing is enabled. Default "false".

   There are two elements that can be used to change the characteristics
   of an audio stream as defined below.

9.12.1.1

   The element may be used to adjust the volume of an audio media
   stream. It may be set to a specific gain amount, to automatically
   adjust the gain to a desired target level, or to mute the stream.

   Attributes:

      id: an optional identifier which may be referenced elsewhere
      for sending events to the gain primitive.

      amt: a specific gain to apply specified in dB or the string
      "mute" indicating that the stream should be muted.
      This attribute MUST NOT be used if "agc" is present.

      agc: boolean indicating whether automatic gain control is to be
      used. This attribute MUST NOT be used if "amt" is present.

      tgtlvl: the desired target level for AGC specified in dBm0.
      This attribute MUST be specified if "agc" is set to "true".
      This attribute MUST NOT be specified if "agc" is not present.

      maxgain: the maximum gain that AGC may apply. Maxgain is
      specified in dB. This attribute MUST be used if "agc" is
      present and MUST NOT be used when "agc" is not present.

9.12.1.2

   The element is used to filter tones and/or audio-band DTMF
   from a media stream.

   Attributes:

      dtmf: boolean indicating whether DTMF tones should be removed.

      tone: boolean indicating whether other tones should be removed.

9.12.2 Video Stream Properties

   Video mixes define a presentation that may have multiple regions,
   such as a quad-split. Each region displays the video from one or more
   participants. When video streams are joined to such a conference, the
   region that will display the video needs to be specified as part of
   the join operation.

   The region that will display the video is specified using the
   "display" attribute. The "display" attribute MUST be used for a video
   stream that is input to a conference and MUST NOT be used for other
   streams. The value of the attribute MUST identify a (see
   section ) or a (see section ) that is
   defined for the conference. A stream MUST NOT be directly joined to a
   region that is defined within a selector. Changing the value of the
   "display" attribute can be used to change where in a video
   presentation layout a video stream is displayed.
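   The constraints on the "display" attribute can be summarized as a
   small check. The Python sketch below is illustrative only: the
   function name, its parameters, and the representation of a
   conference as three sets of identifiers are assumptions, not
   something this specification defines.

```python
def check_display(is_conference_input, display, regions, selectors,
                  selector_regions):
    """Return a list of rule violations for a video stream's "display".

    regions, selectors: identifiers defined for the conference.
    selector_regions: identifiers of regions defined within a selector.
    display: the attribute value, or None when the attribute is absent.
    """
    problems = []
    if not is_conference_input:
        if display is not None:
            problems.append('"display" MUST NOT be used for streams that '
                            'are not input to a conference')
        return problems
    if display is None:
        problems.append('"display" MUST be used for a video stream that '
                        'is input to a conference')
    elif display in selector_regions:
        problems.append('a stream MUST NOT be joined directly to a region '
                        'defined within a selector')
    elif display not in regions and display not in selectors:
        problems.append('"display" MUST identify a region or selector '
                        'defined for the conference')
    return problems
```

   An empty list means the attribute is consistent with the rules in
   this section; how a media server reports a violation is outside the
   scope of the sketch.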
   Additional attributes of the element for video streams are:

   Attributes:

      display: the identifier of a video layout region or selector
      that is to be used to display the video stream.

      override: specifies whether or not the given video stream is
      the override source in the region defined by the "display"
      attribute. Valid values are "true" or "false". Optional;
      default value is "false". Only a video stream that is input to
      a conference can be the override source. A particular region
      can have at most one override source at a time. The most
      recently joined video stream with this attribute set to "true"
      becomes the override source. When there is an override source
      in place, its video is always displayed in the region,
      regardless of what video selection algorithm (either a selector
      or continuous presence mode) is configured for that region.
      Once the override source is cleared, the conference MUST revert
      to the original video selection algorithm.

9.12.2.1

   Some regions of video conferences may display different streams
   automatically, such as when voice activated switching is used.
   Connections MAY also be joined directly without the use of video
   mixing. In these cases, the element may be used to define
   visual display properties for a stream.

   The element MAY use any of the visual attributes defined for
   regions (see section ). This allows the visual aspects of
   regions within a to be tailored to the selected video
   stream, or for streams that are directly joined to display a name or
   logo.

10. MSML Dialog Packages

10.1 Overview

   MSML Dialog Packages define an XML [n2] language for composing
   complex media objects from a vocabulary of simple media resource
   objects called primitives.
   It is primarily a descriptive or
   declarative language to describe media processing objects. MSML
   dialogs operate on one or more streams which are identified
   by the MSML document outside the scope of the MSML dialog package.

   MSML Dialogs are intended to be used in different environments. As
   such, the language itself does not define how an MSML Dialog is used.
   Each environment in which MSML Dialog is used must define how it is
   used, the set of services provided, and the mechanism for passing
   information between the environment and MSML Dialog. The specific
   mechanisms used to realize the interface between MSML Dialog and its
   environment are platform specific.

   MSML Dialog packages provide two models for access to media resources
   and service creation building blocks. Both models MAY be used in
   conjunction with each other in a complementary manner. The first
   model (referred to as "Media Primitives and Composites", part of the
   mandatory MSML Dialog Base package) contains media primitives (such
   as digit collection and announcements) and composite functions (such
   as play and collect combined as a single operation). The second model
   (referred to as "Media Groups", part of the optional MSML Dialog
   Group package) provides the ability to define complex customized
   interactions, via event passing mechanisms, between media primitives,
   if required.

   MSML Dialog Core Package

      Defines core framework over which all MSML dialog packages
      operate.

   MSML Dialog Base Package

      Media Primitives
         or
            DTMF digit collection

            Playing of Announcements

            Generation of DTMF digits

            Tone generation

            Media recording

      Media Composites

            Supports play and collect operation.
            Composite function with inclusion of play.
            Supports play and record operation.
            Composite function with inclusion of play.

   MSML Dialog Group Package

            Allows grouping of media primitives for parallel
            execution, with an event exchange mechanism
            between the media primitives to achieve
            customized media operations. All the above media
            primitive elements are accepted within the
            group.

   The following operations MUST be supported, using the elements
   described above, with either the MSML Dialog Base Package or the MSML
   Dialog Group Package.

      Announcement only

      Collection only
         or

      Recording only

      Play and Collect

      Play and Record

   Additional MSML Dialog packages are:

   o  MSML Dialog Transform Package

   o  MSML Dialog Speech Package

   o  MSML Fax Detection Package

   o  MSML Fax Send/Receive Package

   MSML Dialogs MAY be used to simply expose primitive media resource
   objects but will be used more often to describe dialog operations and
   media transformation objects which can be controlled via user
   interaction.

   MSML Dialogs do not contain any computation or flow control
   constructs. There are no results automatically generated when media
   operations complete. Results MUST be explicitly requested using a
   or element within the definition of the MSML Dialog.

10.2 Primitives

   Primitives perform a single function on a media stream or multiple
   streams such as generating audio/video, recognizing speech or DTMF,
   or adjusting the gain. They may be composed so that primitives
   execute concurrently. Primitives not composed for concurrent
   execution MUST simply execute sequentially in the order they occur in
   an MSML document.
   All concurrently executing primitives in the same
   MSML object (defined in one MSML document) MAY interact with each
   other through events (see MSML Dialog Group package).

   Primitives are categorized into one of the following descriptive
   categories.

   o  recognizers have a media input but no output. They allow
      different things within a media stream to be recognized or
      detected and for events to be generated based upon received
      media.

   o  transformers have one media input and output and may send and
      receive events.

   o  sources and sinks generate or consume media. They have either a
      media input or a media output but not both. They may receive
      and generate events.

   o  composites combine underlying primitives to provide higher-
      level user interaction, without the need for specific event
      based exchange between the primitives. The composite elements
      provide a simpler mechanism for more commonly used services,
      such as play and collect or play and record.

   Primitives may define different media processing behavior (states)
   based upon the events which they receive. Primitives which support
   different processing states must define their default starting state
   and should support the "initial" attribute to allow that state to be
   specified when the primitive is instantiated. All primitives must
   support the "terminate" event class.
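   As an illustration of processing states, the following Python
   sketch models a hypothetical source primitive with two states. The
   class name, the state names, and the "pause" and "resume" event
   names are assumptions chosen for the example; only the "initial"
   attribute and the "terminate" event class come from the text above.
   The event class is taken to be the first period-delimited token of
   the event name, as described in section 10.3.

```python
class PausablePlayer:
    """Hypothetical source primitive with two processing states."""

    STATES = ("playing", "paused")
    DEFAULT_STATE = "playing"     # the default starting state

    def __init__(self, initial=None):
        # "initial" selects the starting state; otherwise use the default.
        state = self.DEFAULT_STATE if initial is None else initial
        if state not in self.STATES:
            raise ValueError(f"unknown initial state: {state!r}")
        self.state = state
        self.terminated = False

    def handle(self, event):
        """Apply one event and return the resulting state."""
        event_class = event.split(".", 1)[0]
        if event_class == "terminate":
            self.terminated = True      # every primitive supports this class
        elif event_class == "pause" and self.state == "playing":
            self.state = "paused"
        elif event_class == "resume" and self.state == "paused":
            self.state = "playing"
        return self.state
```

   Events that are not meaningful in the current state are ignored in
   this sketch; a real primitive would define that behavior itself.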
   The following types of primitives are defined within this
   specification:

   Recognizers    Transformers   Source/Sink    Composites
   ------------------------------------------------------
   dtmf/collect   agc            play           dtmf/collect
   faxdetect      clamp          record         record
   speech         gain           dtmfgen
   vad            gate           tonegen
                  relay          faxsend
                                 faxrcv

   Primitives have shadow variables, similar to those within VoiceXML
   [n5], which are automatically assigned values when the primitives are
   used. Upon initialization of an MSML Dialog context, all shadow
   variables have the string value "undefined". Each primitive has its
   own instance of shadow variables which are global in scope to the
   entire MSML Dialog context.

   Names SHOULD be assigned to individual primitives when more than one
   primitive of the same type is used within one MSML document. Shadow
   variables are overwritten if the primitive has not been named and is
   instantiated a second time.

   Shadow variables cannot be modified under user control. They may be
   returned from the MSML Dialog context using the element.

10.3 Events

   Events provide the mechanism for primitives to interact with each
   other and for an MSML context to interact with its external
   environment. The external environment is defined by the way in which
   an MSML context has been invoked. This will often be through MSML but
   other languages and protocols such as SIP may also be used.

   Every primitive and group conceptually implements its own event
   queue. Events sent to them get placed into their associated queue.
   Events are removed from their queues and processed in order.
   Primitives within a group conceptually have their own thread of
   execution.
   Due to the asynchronous nature of servicing events from
   multiple queues, it cannot be assumed that several events sent in
   sequence to different queues will be processed in the order in which
   they were sent. For example, if recognition of something led to
   sending events to both a and a in that order, it is
   possible that the may process its event before the .

   Primitives each define the set of events which they support and the
   behavior associated with their handling of each event. This allows
   many types of behaviors to be defined. For example, VCR type controls
   can be constructed by defining primitives which support events
   corresponding to each control. Media recognition/detection can be
   used to cause those events to be generated.

   Alternatively, events can be originated elsewhere, such as from a
   Control Agent, and simply received by the primitive implementing the
   control. Examples of the use of events include adjusting volume
   (gain) and pause and resume of both announcement playout and record
   creation.

   Primitives act on events based upon the longest match of an event
   name. Event names are a period '.' delimited sequence of tokens. The
   first token, or the root of the name, can be considered an event
   class. Matching allows a standard meaning to be defined and then
   extended based upon what triggers an event's generation. For example,
   a record primitive has different behavior depending upon whether it
   completed because a user stopped speaking or because it was
   cancelled. The recording is retained in the first case but not the
   second.

   Longest match allows new recognizers to be created and used without
   changing how existing primitives are defined.
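   One way to picture longest match is as a lookup that tries the full
   event name first and then drops trailing tokens until a registered
   name is found. The Python sketch below is a minimal illustration
   under that reading; the function name and the use of a dictionary
   of handlers are assumptions, and the event names in the usage note
   are hypothetical apart from "terminate".

```python
def longest_match(event, handlers):
    """Return the handler registered under the longest prefix of `event`.

    `event` is a period-delimited name such as "terminate.frowning".
    `handlers` maps registered event names to handler values.
    Returns None when not even the event class (first token) is known.
    """
    tokens = event.split(".")
    # Try the most specific name first, then progressively shorter ones.
    for end in range(len(tokens), 0, -1):
        name = ".".join(tokens[:end])
        if name in handlers:
            return handlers[name]
    return None
```

   A primitive that registers only "terminate" would therefore still
   act on an event named "terminate.frowning", while a primitive that
   also registers a more specific name would use that handler instead.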
   For example, a face
   recognition capability could be created which generates a
   terminate.frowning event when a user looks puzzled. Although no
   primitive directly defines this event, it will still effect a generic
   terminate action. Primitives which require specialized behavior based
   upon frowning may be extended to support this. As well, the event can
   still be exported from the MSML context without requiring that
   primitives receiving the event understand facial expressions.

10.4 MSML Dialog Usage with SIP

   MSML Dialogs MAY be used directly with SIP for dialog interactions
   (e.g., IVR or fax). An MSML Dialog can be initially invoked as part
   of the "Prompt and Collect" service described in "Basic Network Media
   Services with SIP" [n7]. That defines service indicators for a small
   number of well defined services using the user part of the SIP
   Request-URI (R-URI).

   The prompt and collect service uses "dialog" as the service
   indicator. URI parameters further refine the specific IVR request.
   This document defines an additional parameter "msml-param" for the
   dialog service indicator as follows:

      dialog-parameters = ";" ( dialog-param [ vxml-parameters ] )
                              | moml-param
      dialog-param      = "voicexml=" dialog-url
      moml-param        = "moml=" moml-url

   There are no additional URI parameters when MSML is used as the
   dialog language.

   MSML Dialogs define discrete IVR dialog commands. These commands MAY
   be included directly in the body of the INVITE to the "dialog"
   service indicator by using the "cid" [n8] URL scheme. This scheme
   identifies a message body part which in this case would contain the
   MSML Dialog request. Note that a multipart message body, containing a
   single part, MUST be present even if the INVITE does not contain an
   SDP offer.
   Subsequent MSML Dialog requests are sent in the body of
   SIP INFO messages as are all messages from a media server.

   An example of SIP URI as described above is:

      sip:dialog@mediaserver.example.net;\
         moml=cid:14864099865376@appserver.example.net

   The body part that contained the MSML Dialog referenced by the URL
   would have a Content-Id header of:

      Content-Id: <14864099865376@appserver.example.net>

   The results of executing an or , or of executing a
   which has a "target" attribute value equal to "source", are
   notified in SIP INFO messages using the element from MSML
   Core package. No messages are sent if execution completes normally
   without executing one of these elements.

   If there is an error during validation or execution, then a media
   server MUST notify the error as described above and must include the
   namelist items "moml.error.status" and "moml.error.description". The
   values for these items are defined in section 12.

   A restricted subset of MSML Dialogs can also be used with the
   "Announcement" service defined in [n7]. This service uses "annc" as
   the service indicator and defines parameters that describe an
   announcement. The "play=" parameter identifies the URL of a prompt or
   a provisioned announcement sequence. The value of the "play=" 
   parameter can refer to an MSML Dialog body part using a "cid" URL as
   described above. That body part must only contain the
   primitive.

   Using MSML Dialogs enhances the announcement service by allowing the
   client to specify a sequence of audio segments rather than requiring
   each sequence to be provisioned, and by adding support for video.
   Moreover, MSML Dialogs define a standard set of variables in contrast
   to [n7] which defines a parameterization mechanism but does not
   formally specify any semantics.
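   Both the "dialog" and "annc" services locate the MSML Dialog body
   part through a "cid" URL whose address matches the Content-Id header
   of that part. The Python sketch below shows that pairing for the
   "dialog" case, using the example values from this section. The
   function names are hypothetical, and the percent-decoding step
   reflects the general convention for "cid" URLs rather than anything
   stated in this document.

```python
from urllib.parse import quote, unquote


def content_id_for(cid_url):
    """Map a "cid:" URL to the Content-Id header value it refers to."""
    scheme, sep, addr = cid_url.partition(":")
    if scheme.lower() != "cid" or not sep or not addr:
        raise ValueError(f"not a cid URL: {cid_url!r}")
    # The header carries the same address, unescaped, in angle brackets.
    return f"<{unquote(addr)}>"


def dialog_request_uri(host, content_id):
    """Build a Request-URI for the "dialog" service with a moml parameter."""
    addr = content_id.strip("<>")
    return f"sip:dialog@{host};moml=cid:{quote(addr, safe='@')}"
```

   Given the Content-Id shown above and the host
   mediaserver.example.net, dialog_request_uri() yields the example
   URI without the line continuation.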
   If a media server does not understand the "cid" scheme or does not
   understand MSML Dialogs, it must respond with the SIP response code
   "488 - not acceptable here". If the MSML Dialog body contains
   elements other than the primitive, or there are errors during
   validation, a media server must respond with a SIP response code "400
   - bad request". Finally, if there is a discrepancy between parameters
   specified in the Request-URI and corresponding attributes defined in
   the MSML Dialog body, the Request-URI parameters must be silently
   ignored.

   MSML Dialogs MUST NOT change the operation of the announcement
   service from that defined in [n7]. When the announcement completes, a
   media server issues a SIP BYE request. The INFO method MUST NOT be
   used with the announcement service.

10.5 MSML Dialog Structure and Modularity

   MSML is structured as a set of packages. Only the core and base
   packages are required. The Dialog Core package defines the framework
   for MSML requests to a media server, without specific functionality.
   It consists of the "primitive" abstraction, an abstract element for
   control flow, the sequential execution model, and the element.
   That is, the MSML Dialog Core package allows for the execution of a
   sequence of one or more media processing primitives with the ability
   to notify events to the invocation environment.

   Primitives are contained within the MSML Dialog Base package, which
   defines the basic , , , , and
   elements. Another package, the MSML Dialog Transform
   package, defines the simple half duplex filters. More advanced
   primitives are defined in the speech and fax packages. The MSML
   speech package depends on the MSML Dialog base package as it extends
   the capability of by adding synthesized speech.
   Finally, the
   group execution model, which is currently the only element which
   changes the flow of control, is defined in a separate MSML Dialog
   Group package. All of these packages are optional with the exception
   that MSML Dialog Core and MSML Dialog Base packages MUST be
   implemented to provide the minimal functionality.

10.6 MSML Dialog Core Package

   The MSML Dialog Core package defines the structural framework and
   abstractions for MSML Dialogs (via its schema). It also defines the
   basic elements which are not part of the core primitive or control
   abstractions. This package is dependent on the MSML Core package.
   Events generated by MSML Dialogs, such as prompt completion, digits
   collected, or dialog termination, are communicated by the Media
   Server via the MSML Core Package (see MSML Core Package ).

   MSML Dialogs are executed independently from the MSML core context.
   When an MSML Dialog is started, MSML allocates the dialog control
   resources, and if successful, starts those resources executing. MSML
   core execution then continues without waiting for the MSML dialog to
   complete. This forking of MSML dialog invocation from the MSML core
   context is done via the element. Media streams are
   created between the MSML dialog target and other internal media
   server resources as part of dialog execution. Stream creation is
   subject to the requirements defined in MSML Core package and media
   streams as defined in MSML Conference Core package.

10.6.1

   The element is used to instantiate an MSML media dialog
   on connections or conferences. The dialog is specified either inline
   or by a URI [n6]. Inline dialogs MUST be composed of any of the MSML
   Dialog packages. MSML dialogs MAY be defined externally as VoiceXML
   [n5]. The MSML dialog description MUST NOT be inline if the src
   attribute, containing a URI, is present.
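   The rule that a dialog is given either inline or by URI, but not
   both, can be sketched as a check over a parsed element. In the
   Python example below, the function name is hypothetical, and the
   element name "dialogstart" and child "play" used in the usage note
   are assumptions for illustration. The 422 code is the error this
   section associates with combining "src" and an inline description;
   the handling of an element with neither is an assumption.

```python
import xml.etree.ElementTree as ET


def dialog_source(element):
    """Classify a dialog element as ("uri", src) or ("inline", children)."""
    src = element.get("src")
    children = list(element)
    if src is not None and children:
        # src and an inline description are mutually exclusive.
        raise ValueError("422: src attribute used with an inline dialog")
    if src is not None:
        return ("uri", src)
    if children:
        return ("inline", children)
    # Assumption: an element with neither form is rejected by the caller.
    raise ValueError("no dialog description: expected src or inline content")
```

   For instance, parsing '<dialogstart src="http://example.com/d"/>'
   with ET.fromstring() and passing the result to dialog_source()
   classifies the dialog as external, while an element with a child
   primitive and no "src" is classified as inline.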
   The originator of the MSML dialog is notified using a
   "msml.dialog.exit" event when the dialog completes. Any results
   returned by the dialog when it exits are sent as a namelist to the
   event.

   The "msml.dialog.exit" event is also used when dialogs fail due to
   errors encountered fetching external documents or errors that occur
   within the dialog execution thread. In this case, a namelist
   containing the items "dialog.exit.status" and
   "dialog.exit.description" is returned with the event to inform the
   client of the failure and the failure reason. The values of these
   items are defined within this package and the MSML Core package.
   Information from the failed dialog may be returned as additional
   namelist items.

   Attributes:

      target: an identifier of a connection or a conference which
      will interact with the dialog. The identifier must not contain
      wildcards. Mandatory.

      src: the URL of the dialog description. MUST NOT be used if the
      MSML dialog description is inline. Otherwise an error (422)
      will result and MSML document execution will stop.

      type: a MIME type which identifies the type of language used to
      describe the dialog. application/moml+xml and
      application/vxml+xml are used to identify MSML Dialogs and
      VoiceXML [n5] respectively. Mandatory.

      name: an instance name for the dialog. If the attribute is not
      present, the media server will assign an identifier to the
      dialog. If the attribute is present but the name is already
      associated with the target, an error (431) will result and MSML
      document execution will stop. Any results that a dialog
      generates will be correlated to its identifier.

      mark: a token which can be used to identify execution progress
      in the case of errors.
      The value of the mark attribute from the last successfully
      executed MSML element is returned in an error response.
      Therefore the value of all "mark" attributes within an MSML
      document should be unique.

   The following examples show how to initiate an external MSML
   dialog, an in-line embedded MSML dialog, and an MSML-initiated
   VoiceXML dialog.

   The following example starts an MSML dialog on a connection.

      [example markup missing from source]

   The following example starts an in-line embedded MSML dialog on a
   connection.

      [example markup missing from source]

   The following example starts a VoiceXML dialog on a connection.

      [example markup missing from source]

   If this dialog fails once its execution thread has begun, for
   example because the fetch of the VoiceXML document failed, an
   example of the event which would be returned would be:

      [event markup missing from source; the surviving name/value
      pairs are:]

         dialog.exit.status       = 423
         dialog.exit.description  = External document fetch error

10.6.2 <dialogend>

   Dialog end is used to terminate an MSML dialog created through
   <dialogstart> before it completes of its own accord. The operation
   of <dialogend> depends on the dialog language being used by the
   executing context. When that context is VoiceXML, a
   "connection.disconnected" event will be thrown to the VoiceXML
   application. When that context is MSML Dialog, a "terminate" event
   will be sent to the MSML core context.

   <dialogend> allows the executing dialog the opportunity to
   gracefully complete before generating a "msml.dialog.exit" event.
   Dialog results may be returned and will be contained as a namelist
   with that event.

   Attributes:

      id: the identifier of a dialog. Mandatory.

      mark: a token which can be used to identify execution progress
      in the case of errors.
      The value of the mark attribute from the last successfully
      executed MSML Dialog element is returned in an error response.
      Therefore the value of all "mark" attributes within an MSML
      document should be unique.

   For example, if the dialog from the previous example was still
   executing, the following would terminate the dialog and generate a
   "msml.dialog.exit" event.

      [example markup missing from source]

10.6.3 <send>

   Sends an event and optional namelist to the recipient identified by
   the target attribute. Event names are defined by the recipient. In
   the case where the recipient is an MSML Dialog group or primitive,
   the events are defined within this document. Other recipients MAY
   use names that are suitable for their environment.

   The "target" attribute specifies the recipient of the event.
   Recipients MAY be other MSML Dialog primitives or groups executing
   within the object, the object itself, or the environment which
   invoked the MSML Dialog. Sending events to media primitives or
   groups is supported by the MSML Dialog Group package. Any target
   which is unknown within the object is assumed to be destined for the
   external environment. By convention, the string "source" SHOULD be
   used to address that environment, but any target name distinct from
   the MSML Dialog namespace MAY be used.

   Attributes:

      event: the name of an event. Mandatory.

      target: the recipient of the event. The recipient MUST be an
      MSML Dialog primitive, the currently executing group, or the
      MSML Dialog environment. A primitive is specified by a
      primitive type, optionally appended by a period '.' followed by
      the identifier of a primitive. Identifiers are only needed when
      more than one primitive of the same type exists in the object.
      The executing group is specified using the token "group".
      The environment is specified using the token "source",
      optionally appended by a period '.' followed by any
      environment-specific target. Mandatory.

      namelist: a list of zero or more shadow variables which are
      included with the event.

10.6.4 <exit>

   Exit causes execution of the MSML Dialog to terminate.

   Attributes:

      namelist: a list of one or more shadow variables which MAY be
      sent to the context which invoked the MSML Dialog object.

10.6.5 <disconnect>

   Disconnect is similar to <exit> but has the additional semantics of
   indicating to the context which invoked the MSML Dialog that it
   should disconnect, from the media server, the media stream
   associated with the object. The method of disconnection depends upon
   how the media stream was initially established. If SIP was used, a
   <disconnect> would cause the media server to issue a BYE request.
   The request would be sent for the SIP dialog associated with the
   media session on which the MSML Dialog was operating.

   Attributes:

      namelist: a list of one or more shadow variables which MAY be
      sent to the context which invoked the MSML Dialog object.

10.7 MSML Dialog Base Package

   The MSML Dialog Base package defines a required set of base
   functionality for a Media Server. It supports individual media
   primitives, such as playing an announcement or collecting digits, as
   well as composite operations such as play and collect. When this
   package is used in conjunction with the MSML Dialog Group package,
   the event-based mechanism is used to control primitives. This
   package may also be used in conjunction with the MSML Speech package
   to extend the functionality of prompts to include TTS and user input
   collection to include ASR.
   In the following sections, subsections of a primitive define child
   elements of that primitive and are not themselves considered
   primitives. They do not receive events or populate shadow variables.

10.7.1 <play>

   Play is used to generate an audio or video stream. It MUST play in
   sequence the media created by the child media elements