Network Working Group                                           Kutscher
Internet-Draft                                                        Ott
Expires: January 18, 2002                                         Bormann
                                                 TZI, Universitaet Bremen
                                                            July 20, 2001


           Session Description and Capability Negotiation
                      draft-ietf-mmusic-sdpng-01.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   To view the entire list of Internet-Draft Shadow Directories, see
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 18, 2002.

Copyright Notice

   Copyright (C) The Internet Society (2001).  All Rights Reserved.

Abstract

   This document defines a language for describing multimedia sessions
   with respect to configuration parameters and capabilities of end
   systems.

   This document is a product of the Multiparty Multimedia Session
   Control (MMUSIC) working group of the Internet Engineering Task
   Force.  Comments are solicited and should be addressed to the
   working group's mailing list at confctrl@isi.edu and/or the authors.

Document Revision

   $Revision: 2.0 $

Table of Contents

   1.       Introduction
   2.       Terminology and System Model
   3.       SDPng
   3.1      Conceptual Outline
   3.1.1    Definitions
   3.1.2    Components & Configurations
   3.1.3    Constraints
   3.1.4    Session Attributes
   3.1.4.1  Owner
   3.1.4.2  Session Identification
   3.1.4.3  Time Specification (SDP 't=', 'r=', and 'z=' lines)
   3.1.4.4  Component Semantic Specification
   3.2      Syntax Definition Mechanisms
   3.3      External Definition Packages
   3.3.1    Profile Definitions
   3.3.2    Library Definitions
   3.4      Mappings
   4.       Formal Specification
   5.       Use of SDPng in conjunction with other IETF Signaling
            Protocols
   5.1      The Session Announcement Protocol (SAP)
   5.2      Session Initiation Protocol (SIP)
   5.3      Real-Time Streaming Protocol (RTSP)
   5.4      Media Gateway Control Protocol (MEGACO)
   6.       Open Issues
            References
            Authors' Addresses
   A.       Base SDPng Specifications for Audio Codec Descriptions
   A.1      DVI4
   A.2      G.722
   A.3      G.726
   A.4      G.728
   A.5      G.729
   A.6      G.729 Annex D and E
   A.7      GSM
   A.7.1    GSM Full Rate
   A.7.2    GSM Half Rate
   A.7.3    GSM Enhanced Full Rate
   A.8      L8
   A.9      L16
   A.10     LPC
   A.11     MPA
   A.12     PCMA and PCMU
   A.13     QCELP
   A.14     VDVI
            Full Copyright Statement

1. Introduction

   Multiparty multimedia conferencing is one of the applications that
   require dynamic interchange of end system capabilities and the
   negotiation of a parameter set that is appropriate for all sending
   and receiving end systems in a conference.

   For some applications, e.g. for loosely coupled conferences or for
   broadcast scenarios, it may be sufficient to simply have session
   parameters fixed by the initiator of a conference.  In such a
   scenario no negotiation is required because only those participants
   with media tools that support the predefined settings can join a
   media session and/or a conference.

   This approach is applicable for conferences that are announced some
   time ahead of the actual start date of the conference.  Potential
   participants can check the availability of media tools in advance,
   and tools like session directories can configure media tools on
   startup.

   This procedure, however, fails to work for spontaneously initiated
   conferences such as Internet phone calls or ad-hoc multiparty
   conferences.  Fixed settings for parameters like media types and
   their encodings can easily inhibit the initiation of conferences,
   for example in situations where a caller insists on a fixed audio
   encoding that is not available at the callee's end system.

   To allow for spontaneous conferences, the process of defining a
   conference's parameter set must therefore be performed either at
   conference start (for closed conferences) or potentially even
   repeatedly every time a new participant joins an active conference.
   The latter approach may not be appropriate for every type of
   conference without applying certain policies: for conferences with
   TV-broadcast or lecture characteristics (one main active source) it
   is usually not desired to re-negotiate parameters every time a new
   participant with an exotic configuration joins, because doing so may
   inconvenience existing participants or even exclude the main source
   from media sessions.  Conferences with equal "rights" for
   participants that are open to new participants, on the other hand,
   would need a different model of dynamic capability negotiation, for
   example a telephone call that is extended to a three-party
   conference at some point during the session.

   SDP [2] makes it possible to specify multimedia sessions (i.e.
   conferences; "session" as used here is not to be confused with "RTP
   session"!)
   by providing general information about the session as a whole and
   specifications for all the media streams (RTP sessions and others)
   to be used to exchange information within the multimedia session.

   Currently, media descriptions in SDP are used for two purposes:

   o  to describe session parameters for announcements and invitations
      (the original purpose of SDP) and

   o  to describe the capabilities of a system and possibly provide a
      choice between a number of alternatives (which SDP was not
      designed for).

   A distinction between these two "sets of semantics" is only made
   implicitly.

   This document is based upon a set of requirements specified in a
   companion document [1].

   In the following, we first introduce a model for session description
   and capability negotiation and define the basic terms used
   throughout this specification (section 2).  Then we outline the
   concepts underlying SDPng and introduce the syntactical components
   step by step in section 3.  In section 4, we provide a formal
   definition of the SDPng session description language.  Finally, we
   give an overview of aspects of using SDPng with various IETF
   signaling protocols in section 5.  In Appendix A, we introduce basic
   audio codec and payload type definitions.

2. Terminology and System Model

   Any (computer) system has, at a given point in time, a number of
   rather fixed hardware as well as software resources.  These
   resources ultimately define the limitations on what can be captured,
   displayed, rendered, replayed, etc. with this particular device.  We
   term the features enabled and restricted by these resources "system
   capabilities".

   Example: System capabilities may include: a limitation of the screen
   resolution for true color by the graphics board; available audio
   hardware or software may offer only certain media encodings (e.g.
   G.711 and G.723.1 but not GSM); and CPU processing power and quality
   of implementation may constrain the possible video encoding
   algorithms.

   In multiparty multimedia conferences, participants employ different
   "components" in conducting the conference.

   Example: In lecture multicast conferences one component might be the
   voice transmission for the lecturer, another the transmission of
   video pictures showing the lecturer, and a third the transmission of
   presentation material.

   Depending on system capabilities, user preferences and other
   technical and political constraints, different configurations can be
   chosen to accomplish the "deployment" of these components.

   Each component can be characterized at least by (a) its intended use
   (i.e. the function it shall provide) and (b) one or more possible
   ways to realize this function.  Each way of realizing a particular
   function is referred to as a "configuration".

   Example: A conference component's intended use may be to make
   transparencies of a presentation visible to the audience on the
   Mbone.  This can be achieved either by a video camera capturing the
   image and transmitting a video stream via some video tool, or by
   loading a copy of the slides into a distributed electronic
   whiteboard.  For each of these cases, additional parameters may
   exist, variations of which lead to additional configurations (see
   below).

   Two configurations are considered different regardless of whether
   they employ entirely different mechanisms and protocols (as in the
   previous example) or choose the same mechanism and differ only in a
   single parameter.
   Example: In the case of video transmission, a JPEG-based still image
   protocol may be used, H.261-encoded CIF images could be sent, as
   could H.261-encoded QCIF images.  All three cases constitute
   different configurations.  Of course there are many more detailed
   protocol parameters.

   Each component's configurations are limited by the participating
   systems' capabilities.  In addition, the intended use of a component
   may further constrain the possible configurations to a subset
   suitable for the particular component's purpose.

   Example: In a system for highly interactive audio communication the
   component responsible for audio may decide not to use the available
   G.723.1 audio codec, to avoid the additional latency, and only use
   G.711.  This would be reflected in this component only showing
   configurations based upon G.711.  Still, multiple configurations are
   possible, e.g. depending on the use of A-law or u-law, packetization
   and redundancy parameters, etc.

   In this system model, we distinguish two types of configurations:

   o  potential configurations (a set of any number of configurations
      per component) indicating a system's functional capabilities as
      constrained by the intended use of the various components;

   o  actual configurations (exactly one per instance of a component)
      reflecting the mode of operation of this component's particular
      instantiation.

   Example: The potential configuration of the aforementioned video
   component may indicate support for JPEG, H.261/CIF, and H.261/QCIF.
   A particular instantiation for a video conference may use the actual
   configuration of H.261/CIF for exchanging video streams.

   In summary, the key terms of this model are:

   o  A multimedia session (streaming or conference) consists of one or
      more conference components for multimedia "interaction".

   o  A component describes a particular type of interaction (e.g.
      audio conversation, slide presentation) that can be realized by
      means of different applications (possibly using different
      protocols).

   o  A configuration is a set of parameters that are required to
      implement a certain variation (realization) of a certain
      component.  There are actual and potential configurations.

      *  Potential configurations describe possible configurations that
         are supported by an end system.

      *  An actual configuration is an "instantiation" of one of the
         potential configurations, i.e. a decision how to realize a
         certain component.

   In less abstract words, potential configurations describe what a
   system can do ("capabilities") and actual configurations describe
   how a system is configured to operate at a certain point in time
   (media stream specification).

   To decide on a certain actual configuration, a negotiation process
   needs to take place between the involved peers:

   1.  to determine which potential configuration(s) they have in
       common, and

   2.  to select one of this shared set of common potential
       configurations to be used for information exchange (e.g. based
       upon preferences, external constraints, etc.).

   In SAP-based [11] session announcements on the Mbone, for which SDP
   was originally developed, the negotiation procedure is non-existent.
   Instead, the announcement contains the media stream descriptions
   sent out (i.e. the actual configurations), which implicitly describe
   what a receiver must understand to participate.
   In point-to-point scenarios, the negotiation procedure is typically
   carried out implicitly: each party informs the other about what it
   can receive, and the respective sender chooses from this set a
   configuration that it can transmit.

   Capability negotiation must not only work for two-party conferences
   but is also required for multi-party conferences.  Especially for
   the latter case it is required that the process of determining the
   subset of allowable potential configurations be deterministic, in
   order to reduce the number of round trips required before a session
   can be established.

   The requirements for the SDPng specification, subdivided into
   general requirements and requirements for session descriptions,
   potential and actual configurations as well as negotiation rules,
   are captured in a companion document [1].

3. SDPng

   This section introduces the underlying concepts of the Session
   Description Protocol - next generation (SDPng), which is intended to
   meet most of the above requirements.  The focus of this section is
   on the concepts of such a capability description and negotiation
   language, with a stepwise introduction of the various syntactical
   elements; a full formal specification is provided in section 4.

3.1 Conceptual Outline

   The description language follows the system model introduced in the
   beginning of this document.  We use a rather abstract language to
   avoid, as far as possible, misinterpretations due to different
   intuitive understandings of terms.

   The concept of a capability description language addresses the
   various pieces of a full description of system and application
   capabilities in four separate "sections":

   o  Definitions (elementary and compound); see Section 3.1.1.

   o  Potential or Actual Configurations; see Section 3.1.2.

   o  Constraints; see Section 3.1.3.

   o  Session attributes; see Section 3.1.4.

3.1.1 Definitions

   The definitions section specifies a number of basic abstractions
   that are later referenced to avoid repetitions in more complex
   specifications and to allow for a concise representation.

   Definition elements are labelled with an identifier by which they
   may be referenced.  They may be elementary or compound (i.e.
   combinations of elementary entities).  Examples of definitions in
   this section include (but are not limited to) codec definitions,
   redundancy schemes, transport mechanisms and payload formats.

   Elementary definition elements do not reference other elements.
   Each elementary entity consists only of one or more attributes and
   their values.  Default values specified in the definitions section
   may be overridden in descriptions for potential (and later actual)
   configurations.  A mechanism for overriding definitions is specified
   below.

   For the moment, elementary elements are defined for media types
   (i.e. codecs) and for media transports.  For each transport and for
   each codec to be used, the respective attributes need to be defined.
   This definition may either be provided within the "Definitions"
   section itself or in an external document (similar to the
   audio-video profile or an IANA registry that defines payload types
   and media stream identifiers).

   It is not required to define all codecs and transport mechanisms in
   a definitions section and reference them in the definition of
   potential and actual configurations.  Instead, a syntactic mechanism
   is defined that allows some definitions to be specified directly in
   a configurations section.
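   Since the formal SDPng syntax is not fixed yet (see Section 4), the
   following fragment is only a rough, non-normative sketch of what
   such elementary definitions might look like.  The "audio-codec"
   element type and its attribute names follow Appendix A, and the
   codec values are taken from RFC 1890 [4]; the "def" and "rtp-pt"
   element names, the "ref", "pt" and "format" attributes, and the
   inline variant are assumptions made purely for illustration.

      <def>
        <!-- Elementary audio codec definitions (attribute names as in
             Appendix A; values from RFC 1890). -->
        <audio-codec name="audio-L16-mono" encoding="L16" channels="1"
                     sampling="44100"/>

        <!-- The stereo variant re-uses the mono definition and
             overrides only the number of channels ("ref" is an
             assumed attribute). -->
        <audio-codec name="audio-L16-stereo" ref="audio-L16-mono"
                     channels="2"/>

        <!-- An RTP payload type referencing a previously defined
             codec. -->
        <rtp-pt name="rtp-avp-11" pt="11" format="audio-L16-mono"/>

        <!-- Alternatively, the format may be defined "inline" within
             the payload type definition itself. -->
        <rtp-pt name="rtp-avp-0" pt="0">
          <audio-codec encoding="PCMU" channels="1" sampling="8000"/>
        </rtp-pt>
      </def>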
   Examples for elementary definitions are given in the sketch above.
   The element type "audio-codec" is used in these examples to define
   audio codec configurations; the configuration parameters are given
   as attribute values.

   Definitions may have default values specified along with them for
   each attribute (as well as for their contents).  Some of these
   default values may be overridden so that a codec definition can
   easily be re-used in a different context (e.g. by specifying a
   different sampling rate) without the need for a large number of base
   specifications.  In the sketch above, the definition of
   audio-L16-mono is re-used for the definition of the corresponding
   stereo codec.  Appendix A provides a complete set of corresponding
   audio-codec definitions for the codecs used in RFC 1890 [4].

   The sketch also shows how existing definitions can be referenced in
   new definitions.  This approach allows simple as well as more
   complex, commonly used definitions to be made available in an
   extensible set of reference documents.  Section 3.3 specifies the
   mechanisms for external references.

   Besides definitions of audio codecs there will be other definitions,
   such as RTP payload formats and specific transport mechanisms, that
   are suitable to be defined in a definitions section for later
   referencing.  The sketch above also shows how RTP payload types can
   be defined using a pre-defined codec: the payload type "rtp-avp-11"
   is defined with payload type number 11, referencing the codec
   "audio-L16-mono".  Instead of referencing an existing definition, it
   is also possible to define the format "inline".

   Note: For negotiation between endpoints, it may be helpful to define
   two modes of operation: explicit and implicit.  Implicit
   specifications may refer to externally defined entities to minimize
   traffic volume; explicit specifications would list all external
   definitions used in a description in the "Definitions" section.
   Again, see Section 3.3 for a complete discussion of external
   definitions.

   The "Definitions" section may be empty if all transports, codecs,
   and other pieces needed to specify the Potential and Actual
   Configurations (as detailed below) are either included by
   referencing external definitions or are explicitly described within
   the Configurations themselves.

3.1.2 Components & Configurations

   The "Configurations" section contains all the components that
   constitute the multimedia conference (IP telephone call, multiplayer
   gaming session, etc.).  For each of these components, the potential
   and, later, the actual configurations are given.  Potential
   configurations are used during capability exchange and/or
   negotiation; actual configurations are used to configure media
   streams after negotiation (e.g. with RTSP) or in session
   announcements (e.g. via SAP).  A potential and the actual
   configuration of a component may be identical.

   Each component is labelled with an identifier so that it can be
   referenced, e.g. to associate semantics with a particular media
   stream.  For such a component, any number of configurations may be
   given, with each configuration describing an alternate way to
   realize the functionality of the respective component.  Each
   configuration (potential as well as actual) is labelled with an
   identifier.  A configuration combines one or more (elementary and/or
   compound) entities from the "Definitions" section to describe a
   potential or an actual configuration.
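   As with the definitions above, the following fragment is only a
   hypothetical sketch of such a component description.  The component
   and configuration identifiers ("interactive-audio", "AVP-audio-0",
   "AVP-audio-11") and the "udp" element are taken from the discussion
   below; the remaining element names as well as the multicast address
   and port number are assumptions chosen for illustration.

      <cfg>
        <component name="interactive-audio">
          <!-- Two alternative configurations for the same component.
               "rtp-avp-0" (PCMU) and "rtp-avp-11" (L16 mono) are
               assumed to be defined in the "Definitions" section or
               in a referenced library. -->
          <alt id="AVP-audio-0">
            <rtp-pt ref="rtp-avp-0"/>
            <udp address="224.2.0.53" port="7800"/>
          </alt>
          <alt id="AVP-audio-11">
            <rtp-pt ref="rtp-avp-11"/>
            <udp address="224.2.0.53" port="7800"/>
          </alt>
        </component>
      </cfg>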
   Within the specification of a configuration, default values from the
   referenced entities may be overwritten.

   Note: Not all protocol environments and their respective operation
   allow an explicit distinction between Potential and Actual
   Configurations.  Therefore, SDPng so far does not provide for
   syntactical identification of a Configuration as being a Potential
   or an Actual one.

   The sketch above illustrates how RTP sessions can be described by
   referencing payload definitions.  For example, an IP telephone call
   may require just a single component "interactive-audio" with two
   possible ways of implementing it.  The two corresponding
   configurations are "AVP-audio-0", which uses the referenced payload
   type without modification, and "AVP-audio-11", which uses linear
   16-bit encoding.  Typically, transport address parameters such as
   the port number would also be provided; in this example, this
   information is given by the "udp" element.  Of course, it must be
   possible to specify other transport mechanisms as well.  See Section
   3.2 for a discussion of extension mechanisms that allow applications
   to use non-standard transport (or other) specifications.

   During/after the negotiation phase, an actual configuration is
   chosen out of a number of alternative potential configurations.  The
   actual configuration may refer to the potential configuration just
   by its "id", possibly allowing for some parameter modifications.
   Alternatively, the full actual configuration may be given.

   Instead of referencing existing payload type definitions, it is also
   possible to provide the required information "inline" within the
   configuration.

   The UDP/IPv4 multicast transport used in the examples is a simple
   variant of a transport specification.  More complex ones are
   conceivable.  For example, it must also be possible to specify the
   usage of source filters (inclusion and exclusion), Source-Specific
   Multicast, the usage of multi-unicast, or other parameters.
   Therefore it is possible to extend the definition of transport
   mechanisms by providing the required information in the element
   content.  More transport mechanisms and options will be defined in
   future versions of this document.

3.1.3 Constraints

   Definitions specify media, transport, and other capabilities,
   whereas configurations indicate which combinations of these could be
   used to provide the desired functionality in a certain setting.
   There may, however, be further constraints within a system (such as
   CPU cycles, available DSPs, dedicated hardware, etc.) that limit
   which of these configurations can be instantiated in parallel (and
   how many instances of these may exist).  We deliberately do not
   couple this aspect of system resource limitations to the various
   application semantics, as the constraints exist across application
   boundaries.  Also, in many cases, expressing such constraints is
   simply not necessary (as many uses of the current SDP show), so
   additional overhead can be avoided where it is not needed.

   Therefore, we introduce a "Constraints" section to contain these
   additional limitations.  Constraints refer to potential
   configurations and to entity definitions and use simple logic to
   express mutual exclusion, limit the number of instantiations, and
   allow only certain combinations.
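   The following fragment is, again, only an illustrative sketch under
   assumed element and attribute names; it restricts how many of the
   two audio alternatives sketched above may be instantiated at the
   same time.

      <constraints>
        <!-- Hypothetical syntax: at most one of the two alternatives
             "AVP-audio-0" and "AVP-audio-11" may be instantiated
             simultaneously. -->
        <par max="1">
          <use ref="AVP-audio-0"/>
          <use ref="AVP-audio-11"/>
        </par>
      </constraints>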
   The sketch above shows a constraint that restricts the maximum
   number of instantiations of two alternatives (which would have to be
   defined in the configurations section before) when they are used in
   parallel.  As the example shows, constraints are defined by placing
   limits on simultaneous instantiations of alternatives; they are not
   defined by expressing abstract end-system resources, such as CPU
   speed or memory size.

   By default, the "Constraints" section is empty (or missing), which
   means that no further restrictions apply.

3.1.4 Session Attributes

   The fourth and final section of the SDPng syntax addresses
   session-level attributes.  These attributes largely include those
   defined by SDP [2] (which are explicitly indicated in the following
   specification) to describe originator, purpose, and timing of a
   multimedia session, among other characteristics.  Furthermore, SDPng
   includes attributes indicating the semantics of the various
   Components in a teleconference or other session.  This part of the
   specification is open-ended, with an IANA registry to be set up to
   register further types of components; only a few examples are listed
   here.

   A session-level specification for connection information (SDP "c="
   line), bandwidth information (SDP "b=" line), and encryption keys
   (SDP "k=" lines) is deliberately not provided for in SDPng.
   Session-level attributes as defined by SDP still have to be examined
   and adopted for SDPng in a future revision of this specification.

3.1.4.1 Owner

   The owner refers to the creator of a session as defined in RFC 2327
   ("o=" line).

   The owner field MUST be present if SDPng is used with SAP.  For all
   other protocols, the owner field MAY be specified.  The attributes
   of the owner element match the fields of the SDP "o=" line; all
   attributes MUST be present and they MUST be created following the
   rules of RFC 2327 (see the sketch at the end of Section 3.1.4).

   Note: There are several possible ways ahead on this part: "owner"
   could stand as it is right now, but the values of the various
   attributes could be concatenated (separated by blanks), the result
   being identical to the contents of the SDP "o=" line -- which could
   then be represented either as a single attribute or as the contents
   of the "owner" element.  Alternatively, the owner element could
   become part of the "session" element described below.  Or the
   contents of the owner element could become an attribute of the
   "session" element below.

3.1.4.2 Session Identification

   The "session" element is used to identify the session and to provide
   a description and possible further references.  The following
   attributes are defined:

   name: The session name as it is to appear, e.g., in a session
      directory.  This is equivalent to the SDP "s=" line.  This
      attribute MUST be present.

   info: A pointer to further information about the session; this
      attribute MUST contain a URI.  The attribute itself is OPTIONAL.

   The session element MAY contain arbitrary text of any length (but
   authors are encouraged to keep the inline description brief and
   provide additional information via URLs).  This text is used to
   provide a description of the session; it is the equivalent of the
   SDP "i=" lines.

   Furthermore, the session element MAY contain other elements of the
   following types to provide further information about the session and
   its creator:

   info: The info element is intended to provide a pointer to further
      information on the session itself.  Its contents MUST be exactly
      one URI.
      If both the info attribute and one or more info elements are
      present, the union of the respective values is used.  Info
      elements are OPTIONAL; they MAY be repeated any number of times.

   contact: The contact element provides contact information on the
      creator of the session; its contents MUST be exactly one URI.
      Any URI scheme suitable to reach a person or a group of persons
      is acceptable (e.g. sip:, mailto:, tel:).  Contact elements are
      OPTIONAL; they MAY be repeated any number of times.

   [Example omitted: a session element whose text content described a
   seminar ("And here comes a long description of the seminar
   indicating what this might be about and so forth."), with an info
   element pointing to http://www.ietf.org/ and contact elements for
   mailto:joe@example.com, mailto:bob@example.com, tel:+49421281,
   sip:joe@example.com and sip:bob@example.com; see the sketch at the
   end of Section 3.1.4.]

3.1.4.3 Time Specification (SDP 't=', 'r=', and 'z=' lines)

   The time specification for a session follows the same rules as in
   SDP.  Time specifications are usually only meaningful when used in
   conjunction with SAP and hence are OPTIONAL.  SDPng uses the
   following elements and attributes to specify timing:

   The element "time" is used to indicate a schedule for the session;
   time has two optional attributes:

   start: The starting time of the first occurrence of the session as
      defined in RFC 2327.

   end: The ending time of the last occurrence of the session as
      defined in RFC 2327.

   The time element MAY contain the following elements but otherwise
   MUST be empty:

   repeat: This element specifies the repetition pattern for the
      schedule.  There MAY be zero or more occurrences of this element
      within the time element.  "repeat" has two MANDATORY and one
      OPTIONAL attribute and no further contents; the attributes are as
      defined in SDP:

      interval: The duration between two start times of the session.
         This attribute MUST be present.

      duration: The duration for which the session will be active,
         starting at each repetition interval.  This attribute MUST be
         present.

      offset: The offset relative to the "start" attribute at which
         this repetition of the session starts.  This attribute is
         OPTIONAL; if it is absent, a default value of "0" is assumed.

      Formatting of the attribute values MUST follow the rules defined
      in RFC 2327.

   zone: The zone element specifies one or more time zone adjustments
      as defined in RFC 2327.  This element MAY have zero or more
      occurrences in the time element.  It has two attributes as
      defined in SDP:

      adjtime: The time at which the next adjustment will take place.

      delta: The adjustment offset (typically +/- 1 hour).

   The example from RFC 2327, page 16, can be expressed in SDPng using
   these elements and attributes.

3.1.4.4 Component Semantic Specification

   Another important session parameter is to specify - ideally in a
   machine-readable way, but at least understandable for humans - the
   function of the various components in a session.  Typically, the
   semantics of the streams are implicitly assumed (e.g. a video stream
   goes together with the only audio stream in a session).  There are,
   however, scenarios in which such intuitive understanding is not
   sufficient and the semantics must be made explicit.

   A simple definition of the semantics for the component
   "interactive-audio" might, for example, associate the description
   "Audio stream for the different speakers" with that component.
   Further options may be added to provide additional information, e.g.
   language, and other functions may be specified (e.g. "panel",
   "audience", "chair", etc.).
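   Taken together, the session-attribute elements described in Section
   3.1.4 might be written as in the following fragment.  This is only
   an illustrative sketch: the element and attribute names follow the
   prose above, but the exact syntax is not normative yet, the owner
   attributes are merely indicated, the "purpose" element name is an
   assumption, and the concrete values (session name, timestamps,
   intervals) are placeholders.

      <owner ... />   <!-- attributes as in the SDP "o=" line -->
      <session name="SDPng Seminar" info="http://www.ietf.org/">
        And here comes a long description of the seminar indicating
        what this might be about and so forth.
        <info>http://www.ietf.org/</info>
        <contact>mailto:joe@example.com</contact>
        <contact>mailto:bob@example.com</contact>
        <contact>tel:+49421281</contact>
        <contact>sip:joe@example.com</contact>
        <contact>sip:bob@example.com</contact>
      </session>
      <time start="3034423619" end="3042462419">
        <!-- Weekly one-hour repetitions; the timestamps above are
             placeholder NTP values. -->
        <repeat interval="604800" duration="3600" offset="0"/>
        <zone adjtime="3042462419" delta="-1h"/>
      </time>
      <purpose component="interactive-audio">
        Audio stream for the different speakers
      </purpose>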
3.2 Syntax Definition Mechanisms

   In order to make it possible to validate session descriptions, and
   in order to allow for structured extensibility, it is proposed to
   rely on a syntax framework that provides concepts as well as
   concrete procedures for document validation and for extending the
   set of allowed syntax elements.

   SGML/XML technologies allow for the preparation of Document Type
   Definitions (DTDs) that can define the allowed content models for
   the elements of conforming documents.  Documents can be formally
   validated against a given DTD to check their conformance and
   correctness.  XML DTDs, however, cannot easily be extended.  It is
   not possible to alter the content models of element types or to add
   new element types after the DTD has been specified.

   For SDPng, a mechanism is needed that allows the specification of a
   base syntax -- for example basic elements for the high-level
   structure of description documents -- while allowing extensions, for
   example elements and attributes for new transport mechanisms, new
   media types etc., to be added on demand.  Still, it has to be
   ensured that extensions do not result in name collisions.
   Furthermore, it must be possible for applications that process
   description documents to distinguish extensions from base
   definitions.

   For XML, mechanisms have been defined that allow for structured
   extensibility of a model of allowed syntax: XML Namespaces and XML
   Schema.  XML Schema mechanisms allow the allowed document content to
   be constrained, e.g. for documents that contain structured data, and
   also provide the possibility that document instances conform to
   several XML Schema definitions at the same time, while allowing
   Schema validators to check the conformance of these documents.

   Extensions of the session description language, say for expressing
   the parameters of a new media type, would require the creation of a
   corresponding XML Schema definition that contains the specification
   of element types that can be used to describe configurations of
   components for the new media type.  Session description documents
   have to reference the non-standard Schema module, thus enabling
   parsers and validators to identify the elements of the new extension
   module and to either ignore them (if they are not supported) or to
   consider them when processing the session/capability description.

   It is important to note that the functionality of validating
   capability and session description documents is not necessarily
   required to generate or process them.  For example, endpoints could
   be configured to understand only those parts of description
   documents that conform to the baseline specification and simply
   ignore extensions they cannot support.

   The usage of XML and XML Schema is thus rather motivated by the need
   to allow extensions to be defined and added to the language in a
   structured way that does not preclude the possibility for
   applications to identify and process the extension elements they
   might support.

   The baseline specification of XML Schema definitions and profiles
   must be well-defined and targeted to the set of parameters that are
   relevant for the protocols and algorithms of the Internet Multimedia
   Conferencing Architecture, i.e. transport over RTP/UDP/IP, the
   audio-video profile of RFC 1890, etc.  Section 3.3 describes profile
   definitions and library definitions.
   A detailed definition of the formal SDPng syntax and the
   corresponding extension mechanisms is to be provided in future
   versions of this document.  The example below shows how the
   definition of codecs, transport variants and configurations of
   components could be realized.  Please note that this is not a
   complete example and that identifiers have been chosen arbitrarily.

   [Example omitted: a complete description document for a seminar
   session ("This seminar is about SDPng..."), with the info URL
   http://www.ietf.org/, the contacts mailto:joe@example.com and
   sip:joe@example.com, and an audio component described as "Audio
   stream for the different speakers".]

   The example also does not include specifications of XML Schema
   definitions or references to such definitions.  These will be
   provided in a future version of this draft.

   A real-world capability description would likely be shorter than the
   presented example because the codec and transport definitions can be
   factored out into profile definition documents that would only be
   referenced in capability description documents.

3.3 External Definition Packages

3.3.1 Profile Definitions

   In order to allow for extensibility, it must be possible to define
   extensions to the basic SDPng configuration options.  For example,
   if some application requires the use of a new, esoteric transport
   protocol, endpoints must be able to describe their configuration
   with respect to the parameters of that transport protocol.

   The mandatory and optional parameters that can be configured and
   negotiated when using the transport protocol will be specified in a
   definition document.  Such a definition document is called a
   "profile".  A profile contains rules that specify how SDPng is used
   to describe conferences or end-system capabilities with respect to
   the parameters of the profile.  The concrete properties of the
   profile definition mechanism are still to be defined.

   An example of such a profile would be an RTP profile that defines
   how to specify RTP parameters.  Another example would be an audio
   codec profile that defines how to specify audio codec parameters.
   SDPng documents can reference profiles and provide concrete
   definitions, for example the definition for the GSM audio codec.
   (This would be done in the "Definitions" section of an SDPng
   document.)  An SDPng document that references a profile and provides
   concrete definitions of configurations can be validated against the
   profile definition.

3.3.2 Library Definitions

   While profile definitions specify the allowed parameters for a given
   profile, SDPng definition sections refer to profile definitions and
   define concrete configurations based on a specific profile.  In
   order for such definitions to be imported into SDPng documents,
   there will be the notion of "SDPng libraries".  A library is a set
   of definitions conforming to a certain profile definition (or to
   more than one profile definition -- this still needs to be defined).

   The purpose of the library concept is to allow certain common
   definitions to be factored out so that not every SDPng document has
   to include the basic definitions, for example the PCMU codec
   definition.  SDP [2] uses a similar concept by relying on the
   well-known static payload types (defined in RFC 1890 [4]) that are
   also just referenced but never defined in SDP documents.

   An SDPng document that references definitions from an external
   library has to declare the use of the external library.
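   The precise syntax for declaring and referencing profiles and
   libraries has not been worked out yet (see Section 6).  Purely as a
   speculative illustration, such declarations might take the following
   form; the element name, attribute names and library identifiers are
   all invented for this sketch.

      <!-- Declare one external library by a registered name and
           another one by the address from which it can be
           retrieved. -->
      <use-library name="rfc1890-audio"/>
      <use-library href="http://example.com/sdpng/libraries/video.xml"/>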
   The external library, being a set of configuration definitions for a
   given profile, in turn needs to declare the use of the profile to
   which it conforms.

   There are different possibilities for how profile definitions and
   libraries can be used in SDPng documents:

   o  In an SDPng document, a profile definition can be referenced and
      all the configuration definitions are provided within the
      document itself.  The SDPng document is self-contained with
      respect to the definitions it uses.

   o  In an SDPng document, the use of an external library can be
      declared.  The library references a profile definition and the
      SDPng document references the library.

      There are two alternatives for how external libraries can be
      referenced:

      by name: Referencing libraries by name implies the use of a
         registration authority with which definitions and reference
         names can be registered.  It is conceivable that the most
         common SDPng definitions be registered that way and that there
         will be a baseline set of definitions that minimal
         implementations must understand.  Secondly, a registration
         procedure will be defined that allows vendors to register
         frequently used definitions with a registration authority
         (e.g., IANA) and to declare the use of registered definition
         packages in conforming SDPng documents.  Of course, care
         should be taken not to make the external references too
         complex and thus require too much a priori knowledge in a
         protocol engine implementing SDPng.  Relying on this mechanism
         in general is also problematic because it impedes
         extensibility: it requires implementors to provide support for
         new extensions in their products before they can interoperate.
         Registration is also not useful for spontaneous or
         experimental extensions that are defined in an SDPng library.

      by address: An alternative to referencing libraries by name is to
         declare the use of an external library by providing an
         address, i.e., a URL, that specifies where the library can be
         obtained.  While this allows the use of arbitrary third-party
         libraries that can extend the basic SDPng set of configuration
         options in many ways, there are problems if the referenced
         libraries cannot be accessed by all communication partners.

   o  Because of these problematic properties of external libraries,
      the final SDPng specification will have to provide a set of
      recommendations regarding the circumstances under which the
      different mechanisms for externalizing definitions should be
      used.

3.4 Mappings

   A mapping needs to be defined, in particular to SDP, that allows
   final session descriptions (i.e. the result of capability
   negotiation processes) to be translated to SDP documents.  In
   principle, this can be done in a rather schematic fashion.

   Furthermore, to accommodate SIP-H.323 gateways, a mapping from SDPng
   to H.245 needs to be specified at some point.

4. Formal Specification

   To be provided.

5. Use of SDPng in conjunction with other IETF Signaling Protocols

   SDPng defines the notion of Components to indicate the intended
   types of collaboration between the users in, e.g., a
   teleconferencing scenario.
   For the means conceivable to realize a particular Component, SDPng
   conceptually distinguishes three levels of support:

   A Capability refers to the fact that one of the involved parties
   supports one particular way of exchanging media -- defined in terms
   of transport, codec, and other parameters -- as part of the
   teleconference.

   A Potential Configuration denotes a set of matching Capabilities
   from all those involved parties required to successfully realize one
   particular Component.

   An Actual Configuration indicates the Potential Configuration which
   was chosen by the involved parties to realize a certain Component at
   one particular point in time.

   As mentioned before, this abstract notion of the interactions
   between a number of communicating systems needs to be mapped to the
   application scenarios of SDPng in conjunction with the various IETF
   signaling protocols: SAP, SIP, RTSP, and MEGACO.

5.1 The Session Announcement Protocol (SAP)

   SAP is used to disseminate a previously created (and typically
   fixed) session description to a potentially large audience.  An
   interested member of the audience will use the SDPng description
   contained in SAP to join the announced media sessions.  This means
   that a SAP announcement contains the Actual Configurations of all
   Components that are part of the overall teleconference or broadcast.

   A SAP announcement may contain multiple Actual Configurations for
   the same Component.  In this case, the "same" (i.e. semantically
   equivalent) media data from one configuration must be available from
   each of the Actual Configurations.  In practice, this limits the use
   of multiple Actual Configurations to single-source multicast or
   broadcast scenarios.

   Each receiver of a SAP announcement with SDPng compares its locally
   stored Capabilities for realizing a certain Component against the
   Actual Configurations contained in the announcement.  If the
   intersection yields one or more Potential Configurations for the
   receiver, it chooses the one it deems the best fit.  If the
   intersection is empty, the receiver cannot participate in the
   announced session.

   SAP may be substituted by HTTP (in the general case, at least),
   SMTP, NNTP, or other IETF protocols suitable for conveying a media
   description from one entity to one or more others without the intent
   of further negotiation of the session parameters.

   Example from the SAP spec to be provided.

5.2 Session Initiation Protocol (SIP)

   SIP is used to establish and modify multimedia sessions, and SDPng
   may be carried at least in SIP INVITE and ACK messages as well as in
   a number of responses.  From dealing with legacy SDP (and its
   essential non-suitability for capability negotiation), a particular
   use and interpretation of SDP has been defined for SIP.

   One of the important flexibilities introduced by SIP's usage of SDP
   is that a sender can change dynamically between all codecs that a
   receiver has indicated support (and has provided an address) for.
   Codec changes are not signaled out-of-band but are only indicated by
   the payload type within the media stream.

   From this arises one important consequence for the conceptual view
   of a Component within SDPng: there is no clear distinction between
   Potential and Actual Configurations.  A single Actual Configuration
   need not be chosen at setup time within the SIP signaling.
   Instead, a number of Potential Configurations is signaled in SIP
   (with all transport parameters required for carrying media streams)
   and the Actual Configuration is only identified by the payload type
   which is actually being transmitted at any point in time.  Note that
   since SDPng does not explicitly distinguish between Potential and
   Actual Configurations, this has no implications on the SDPng
   signaling itself.

   SIP examples to be defined.

5.3 Real-Time Streaming Protocol (RTSP)

   In contrast to SIP, RTSP has, from its intended usage, a clear
   distinction between offering Potential Configurations (typically by
   the server) and choosing one out of these (by the client); in some
   cases, some parameters (such as multicast addresses) may be dictated
   by the server.  Hence, with RTSP there is a clear distinction
   between Potential Configurations during the negotiation phase and a
   finally chosen Actual Configuration according to which streaming
   will take place.

   Example from the RTSP spec to be provided.

5.4 Media Gateway Control Protocol (MEGACO)

   The MEGACO architecture also follows the SDPng model of a clear
   separation between Potential and Actual Configurations.  Upon
   startup, a Media Gateway (MG) will "register" with its Media Gateway
   Controller (MGC) and the latter will audit the MG for its
   Capabilities.  Those will be provided as Potential Configurations,
   possibly with extensive Constraints specifications.  Whenever a
   media path needs to be set up by the MGC between two MGs or an MG
   needs to be reconfigured internally, the MGC will use (updated)
   Actual Configurations.

   Details and examples to be defined.

6. Open Issues

   o  The precise syntax for referencing profiles and libraries needs
      to be worked out.

   o  A registry (re-use of SDP mechanisms and names, etc.) needs to be
      set up.

   o  Transport and payload type specifications need to be defined as
      additional appendices.

   o  Negotiation mechanisms for multiparty conferencing need to be
      formalized.

   o  Further details on the signaling protocols need to be filled in.

   o  Mapping to other media description formats (SDP, H.245, ...)
      should be provided.  For H.245, this is probably a different
      document (belonging to the SIP-H.323 interworking group).

References

   [1]   Kutscher, D., Ott, J., Bormann, C. and I. Curcio,
         "Requirements for Session Description and Capability
         Negotiation", Internet-Draft
         draft-ietf-mmusic-sdpng-req-01.txt, April 2001.

   [2]   Handley, M. and V. Jacobson, "SDP: Session Description
         Protocol", RFC 2327, April 1998.

   [3]   Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
         "RTP: A Transport Protocol for Real-Time Applications",
         RFC 1889, January 1996.

   [4]   Schulzrinne, H., "RTP Profile for Audio and Video Conferences
         with Minimal Control", RFC 1890, January 1996.

   [5]   Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
         Video Conferences with Minimal Control", Internet-Draft
         draft-ietf-avt-profile-new-10.txt, March 2001.

   [6]   Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley,
         M., Bolot, J., Vega-Garcia, A. and S. Fosse-Parisis, "RTP
         Payload for Redundant Audio Data", RFC 2198, September 1997.

   [7]   Klyne, G., "A Syntax for Describing Media Feature Sets",
         RFC 2533, March 1999.

   [8]   Klyne, G., "Protocol-independent Content Negotiation
         Framework", RFC 2703, September 1999.

   [9]   Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
         Generic Forward Error Correction", RFC 2733, December 1999.
   [10]  Perkins, C. and O. Hodson, "Options for Repair of Streaming
         Media", RFC 2354, June 1998.

   [11]  Handley, M., Perkins, C. and E. Whelan, "Session Announcement
         Protocol", RFC 2974, October 2000.

Authors' Addresses

   Dirk Kutscher
   TZI, Universitaet Bremen
   Bibliothekstr. 1
   Bremen  28359
   Germany

   Phone: +49.421.218-7595, sip:dku@tzi.org
   Fax:   +49.421.218-7000
   EMail: dku@tzi.uni-bremen.de

   Joerg Ott
   TZI, Universitaet Bremen
   Bibliothekstr. 1
   Bremen  28359
   Germany

   Phone: +49.421.201-7028, sip:jo@tzi.org
   Fax:   +49.421.218-7000
   EMail: jo@tzi.uni-bremen.de

   Carsten Bormann
   TZI, Universitaet Bremen
   Bibliothekstr. 1
   Bremen  28359
   Germany

   Phone: +49.421.218-7024, sip:cabo@tzi.org
   Fax:   +49.421.218-7000
   EMail: cabo@tzi.org

Appendix A. Base SDPng Specifications for Audio Codec Descriptions

   [5] specifies a number of audio codecs, including short names to be
   used as references by session description protocols such as SDP and
   SDPng.  Those codec names, as listed in [5], are used to identify
   codecs in SDPng.  The following sections indicate the default values
   that are assumed if nothing other than the codec reference is
   specified.

   The following attributes are defined for audio codecs:

   name: the identifier to be used later for referencing the codec
      specification

   encoding: the RTP/AVP profile identifier as registered with IANA

   mime: the MIME type; may alternatively be specified instead of
      "encoding"

   channels: the number of independent media channels

   pattern: the media channel pattern for mapping channels to payload

   sampling: the sample rate for the codec (which in most cases equals
      the RTP clock rate)

   Furthermore, options may be defined, either with a value associated
   with the option (note that arbitrarily complex values are allowed)
   or in an alternative form; a non-normative sketch of both forms is
   given below.
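   Purely as an illustration of how these attributes and the two option
   forms might be written down: the codec values below are taken from
   RFC 1890, but the "option" element, its attribute names and the
   option names themselves are invented assumptions.

      <audio-codec name="audio-GSM" encoding="GSM" channels="1"
                   sampling="8000">
        <!-- Option carrying a value (values may be arbitrarily
             complex). -->
        <option name="frames-per-packet" value="2"/>
        <!-- Alternative form: the value is given as element content
             instead of an attribute. -->
        <option name="annotation">example of a structured or textual
          value</option>
      </audio-codec>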