idnits 2.17.1 

draft-ietf-mmusic-fid-04.txt:
-(565): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == There are 4 instances of lines with non-ascii characters in the document.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 17
     longer pages, the longest (page 2) being 60 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  == There are 24 instances of lines with non-RFC6890-compliant IPv4
     addresses in the document.  If these are example addresses, they should
     be changed.

  == There are 5 instances of lines with multicast IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use the 233.252.0.x range defined in RFC 5771


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 2002) is 8105 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 2327 (ref. '2') (Obsoleted by RFC 4566)

  == Outdated reference: A later version (-08) exists of
     draft-ietf-mmusic-sdpng-00

  ** Obsolete normative reference: RFC 2326 (ref. '4') (Obsoleted by RFC 7826)

  ** Obsolete normative reference: RFC 1889 (ref. '5') (Obsoleted by RFC 3550)

  ** Obsolete normative reference: RFC 2543 (ref. '6') (Obsoleted by RFC
     3261, RFC 3262, RFC 3263, RFC 3264, RFC 3265)

  -- Possible downref: Normative reference to a draft: ref. '7' 

  == Outdated reference: A later version (-01) exists of
     draft-rosenberg-sip-app-components-00

  -- Possible downref: Normative reference to a draft: ref. '8' 

  ** Obsolete normative reference: RFC 2833 (ref. '9') (Obsoleted by RFC
     4733, RFC 4734)


     Summary: 10 errors (**), 0 flaws (~~), 8 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                     Gonzalo Camarillo
3	Internet draft                                             Jan Holler
4	                                                    Goran AP Eriksson
5	                                                             Ericsson

7	                                                  Henning Schulzrinne
8	                                                  Columbia University

10	                                                          August 2001
11	                                                Expires February 2002
12	                                       <draft-ietf-mmusic-fid-04.txt>

14	                     Grouping of media lines in SDP

16	Status of this Memo

18	   This document is an Internet-Draft and is in full conformance with
19	      all provisions of Section 10 of RFC2026.

21	   Internet-Drafts are working documents of the Internet Engineering
22	   Task Force (IETF), its areas, and its working groups. Note that
23	   other groups may also distribute working documents as Internet-
24	   Drafts. Internet-Drafts are draft documents valid for a maximum of
25	   six months and may be updated, replaced, or obsoleted by other
26	   documents at any time. It is inappropriate to use Internet-Drafts
27	   as reference material or to cite them other than as "work in
28	   progress."

30	   The list of current Internet-Drafts can be accessed at
31	   http://www.ietf.org/ietf/1id-abstracts.txt
32	   The list of Internet-Draft Shadow Directories can be accessed at
33	   http://www.ietf.org/shadow.html.

35	Abstract

37	   This document defines two SDP attributes: "group" and "mid". They
38	   allow several "m" lines to be grouped together for two different
39	   purposes: lip synchronization, and receiving the media of a single
40	   flow (several media streams), encoded in different formats during a
41	   particular session, on different ports and host interfaces.

43	Camarillo/Holler/Eriksson/Schulzrinne                                1
44	                    Grouping of media lines in SDP

46	TABLE OF CONTENTS

48	   1     Introduction...............................................2
49	   2     Terminology................................................3
50	   3     Media stream identification attribute......................3
51	   4     Group attribute............................................3
52	   5     Use of "group" and "mid"...................................3
53	   6     Lip Synchronization (LS)...................................4
54	   6.1   Example of LS..............................................4
55	   7     Flow Identification (FID)..................................5
56	   7.1   SIP and cellular access....................................5
57	   7.2   DTMF tones.................................................5
58	   7.3   Media flow definition......................................6
59	   7.4   FID semantics..............................................6
60	   7.4.1 Examples of FID............................................6
61	   8     Scenarios that FID does not cover..........................9
62	   8.1   Parallel encoding using different codecs...................9
63	   8.2   Layered encoding..........................................10
64	   8.3   Same IP address and port number...........................10
65	   9     Usage of the "group" attribute in SIP.....................11
66	   9.1   Mid value in responses....................................11
67	   9.1.1 Example...................................................11
68	   9.2   Group value in responses..................................12
69	   9.2.1 Example...................................................13
70	   9.3   Capability negotiation....................................14
71	   9.3.1 Example...................................................14
72	   9.4   Backward compatibility....................................14
73	   9.4.1 Client does not support "group"...........................15
74	   9.4.2 Server does not support "group"...........................15
75	   10    IANA considerations.......................................15
76	   11    Acknowledgements..........................................15
77	   12    References................................................15
78	   13    Authors' Addresses........................................16

80	1 Introduction

82	   An SDP session description typically contains a number (one or more)
83	   of media lines - they are commonly known as "m" lines. When a
84	   session description contains more than one "m" line, SDP does not
85	   provide any means to express a particular relationship between two
86	   or more of them. When an application receives an SDP session
87	   description with more than one "m" line it is up to the application
88	   what to do with them. SDP does not carry any information about
89	   grouping media streams.

91	   While in some environments this information can be carried out of
92	   band, it would be desirable to have extensions to SDP that allow
93	   expressing how different media streams within a session description
94	   relate to each other. This document defines such extensions.

99	2 Terminology

101	   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
102	   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
103	   and "OPTIONAL" are to be interpreted as described in RFC 2119 [1]
104	   and indicate requirement levels for compliant implementations.

106	3. Media stream identification attribute

108	   A new "media stream identification" media attribute is defined. It
109	   is used for identifying media streams within a session description.
110	   Its formatting in SDP [2] is described by the following BNF:

112	         mid-attribute      = "a=mid:" identification-tag
113	         identification-tag = token

115	   The identification tag MUST be unique within an SDP session
116	   description.
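
   The uniqueness rule above can be checked mechanically. The following
   is a minimal sketch, not a normative parser. The function names are
   hypothetical, and the character class used for "token" follows the
   token-char set of RFC 4566, which is an assumption because this
   document references the older SDP specification [2].

```python
import re

# Assumption: "token" uses the token-char set of RFC 4566.
TOKEN = re.compile(r"^[A-Za-z0-9!#$%&'*+\-.^_`{|}~]+$")


def collect_mids(sdp: str) -> list[str]:
    """Return the identification tags of all "a=mid:" lines, in order."""
    mids = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("a=mid:"):
            tag = line[len("a=mid:"):]
            if not TOKEN.match(tag):
                raise ValueError(f"invalid identification-tag: {tag!r}")
            mids.append(tag)
    return mids


def mids_are_unique(sdp: str) -> bool:
    """True if no identification tag appears twice in the description."""
    mids = collect_mids(sdp)
    return len(mids) == len(set(mids))
```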

118	4. Group attribute

120	   A new "group" session level attribute is defined. It is used for
121	   grouping together different media streams. Its formatting in SDP is
122	   described by the following BNF:

124	         group-attribute    = "a=group:" semantics
125	                              *(space identification-tag)
126	         semantics          = "LS" | "FID"

128	   This document defines two standard semantics: LS (Lip
129	   Synchronization) and FID (Flow Identification). If further
130	   semantics need to be standardized in the future, they need to be
131	   defined in a standards track document. However, defining new
132	   semantics apart from LS and FID is discouraged. Instead, it is
133	   RECOMMENDED to use other session description mechanisms such as
134	   SDPng [3].
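
   As an illustration of the BNF above, a "group" line could be split
   into its semantics and identification tags as sketched below. The
   function name parse_group is hypothetical, and the sketch assumes
   that "space" in the BNF is a single SP character.

```python
def parse_group(line: str) -> tuple[str, list[str]]:
    """Split an "a=group:" line into (semantics, identification tags)."""
    prefix = "a=group:"
    if not line.startswith(prefix):
        raise ValueError("not a group attribute")
    # Assumption: fields are separated by exactly one space.
    fields = line[len(prefix):].split(" ")
    semantics, tags = fields[0], fields[1:]
    if semantics not in ("LS", "FID"):
        raise ValueError(f"unknown semantics: {semantics!r}")
    if any(tag == "" for tag in tags):
        raise ValueError("empty identification-tag")
    return semantics, tags
```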

136	5. Use of "group" and "mid"

138	   All the "m" lines of a session description that uses "group" MUST be
139	   identified with an "mid" attribute regardless of whether they appear
140	   or not in the group line(s). If a session description contains at
141	   least one "m" line that has no "mid" identification the application
142	   MUST NOT perform any grouping of media lines.

144	   "a=group" lines are used to group together several "m" lines that
145	   are identified by their "mid" attribute. "a=group" lines that
146	   contain identification-tags that do not correspond to any "m" line
147	   within the session description MUST be simply ignored. The
148	   application acts as if the "a=group" line did not exist. The
149	   behavior of an application receiving an SDP with grouped "m" lines
150	   is defined by the semantics field in the "a=group" line.

155	   There MAY be several "a=group" lines in a session description.

157	   An application that wants to be compliant with this specification
158	   MUST support both "group" and "mid". An application that supported
159	   just one of them would not be compliant.
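
   The two receiver rules of this section (no grouping at all if any
   "m" line lacks a "mid", and ignoring "a=group" lines that name
   unknown tags) can be sketched as follows. The function name
   effective_groups is hypothetical and the line-based parsing is
   deliberately simplified.

```python
def effective_groups(sdp: str) -> list[tuple[str, list[str]]]:
    """Return the "a=group" lines a receiver should act upon."""
    groups = []
    mids = []  # one entry per "m" line; None until its a=mid is seen
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("a=group:"):
            fields = line[len("a=group:"):].split()
            if fields:
                groups.append((fields[0], fields[1:]))
        elif line.startswith("m="):
            mids.append(None)
        elif line.startswith("a=mid:") and mids:
            mids[-1] = line[len("a=mid:"):]
    # Any "m" line without a "mid": no grouping may be performed.
    if any(mid is None for mid in mids):
        return []
    known = set(mids)
    # Group lines that reference unknown tags are ignored.
    return [(s, tags) for s, tags in groups
            if all(tag in known for tag in tags)]
```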

161	6. Lip Synchronization (LS)

163	   An application that receives a session description that contains "m"
164	   lines that are grouped together using LS semantics MUST synchronize
165	   the play out of the corresponding media streams. Note that LS
166	   semantics apply not only to a video stream that has to be
167	   synchronized with an audio stream. The play out of two streams of
168	   the same type can be synchronized as well.

170	   For RTP streams synchronization is typically performed using RTCP,
171	   which provides enough information to map time stamps from the
172	   different streams into a wall clock. However, the concept of media
173	   stream synchronization MAY also apply to media streams that do not
174	   make use of RTP. If this is the case, the application MUST recover
175	   the original timing relationship between the streams using whatever
176	   mechanism is available.
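
   For the RTP case, the mapping of time stamps into a wall clock can
   be sketched as below. An RTCP sender report pairs a wall-clock (NTP)
   time with the RTP timestamp of the same instant, so a later RTP
   timestamp can be placed on the common time line. The names
   SenderReport and to_wall_clock are hypothetical, and the NTP time is
   assumed to be already converted to seconds.

```python
from dataclasses import dataclass


@dataclass
class SenderReport:
    ntp_seconds: float   # wall-clock time carried in the sender report
    rtp_timestamp: int   # RTP timestamp corresponding to that time
    clock_rate: int      # RTP clock rate of the stream, in Hz


def to_wall_clock(rtp_timestamp: int, sr: SenderReport) -> float:
    """Map an RTP timestamp of a stream onto the sender's wall clock."""
    # Signed 32-bit difference, so timestamp wrap-around is tolerated.
    delta = (rtp_timestamp - sr.rtp_timestamp) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr.ntp_seconds + delta / sr.clock_rate
```

   Two packets from streams in the same LS group are played out
   together when they map to the same wall-clock value.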

178	6.1 Example of LS

180	   The following example shows a session description of a conference
181	   that is being multicast. The first media stream (mid:1) contains the
182	   voice of the speaker, who speaks in English. The second media stream
183	   (mid:2) contains the video component and the third (mid:3) media
184	   stream carries the translation to Spanish of what he is saying. The
185	   first and the second media streams MUST be synchronized.

187	         v=0
188	         o=Laura 289083124 289083124 IN IP4 one.example.com
189	         t=0 0
190	         c=IN IP4 224.2.17.12/127
191	         a=group:LS 1 2
192	         m=audio 30000 RTP/AVP 0
193	         a=mid:1
194	         m=video 30002 RTP/AVP 31
195	         a=mid:2
196	         m=audio 30004 RTP/AVP 0
197	         i=This media stream contains the Spanish translation
198	         a=mid:3

200	   Note that although the third media stream is not present in the
201	   group line it still MUST contain an mid attribute (mid:3), as stated
202	   before.

207	7. Flow Identification (FID)

209	   An "m" line in an SDP session description defines a media stream.
210	   However, SDP does not define what a media stream is. To find the
211	   definition of a media stream we have to go to the RTSP
212	   specification. The RTSP RFC [4] defines a media stream as "a single
213	   media instance, e.g., an audio stream or a video stream as well as a
214	   single whiteboard or shared application group. When using RTP, a
215	   stream consists of all RTP and RTCP packets created by a source
216	   within an RTP session".

218	   This definition assumes that a single audio (or video) stream maps
219	   into an RTP session. To find the definition of an RTP session we go
220	   to the RTP specification. The RTP RFC [5] defines an RTP session as
221	   follows: "For each participant, the session is defined by a
222	   particular pair of destination transport addresses (one network
223	   address plus a port pair for RTP and RTCP)".

225	   While the previous definitions cover the most common cases, there
226	   are situations where a single media instance, (e.g., an audio stream
227	   or a video stream) is sent using more than one RTP session. Two
228	   examples (among many others) of this kind of situation are cellular
229	   systems using SIP [6] and systems receiving DTMF tones on a
230	   different host than the voice.

232	7.1 SIP and cellular access

234	   Systems using a cellular access and SIP as a signalling protocol
235	   need to receive media over the air. During a session the media can
236	   be encoded using different codecs. The encoded media has to traverse
237	   the radio interface. The radio interface is generally characterized
238	   by being bit error prone and associated with relatively high packet
239	   transfer delays. In addition, radio interface resources in a
240	   cellular environment are scarce and thus expensive, which calls for
241	   special measures in providing a highly efficient transport [7]. In
242	   order to get an appropriate speech quality in combination with an
243	   efficient transport, precise knowledge of codec properties is
244	   required so that a proper radio bearer for the RTP session can be
245	   configured before transferring the media. These radio bearers are
246	   dedicated bearers per media type, i.e. codec.

248	   Cellular systems typically configure different radio bearers on
249	   different port numbers. Therefore, incoming media has to have
250	   different destination port numbers for the different possible codecs
251	   in order to be routed properly to the correct radio bearer. Thus,
252	   this is an example in which several RTP sessions are used to carry a
253	   single media instance (the encoded speech from the sender).

255	7.2 DTMF tones

257	   Some voice sessions include DTMF tones. Sometimes the voice handling
258	   is performed by a different host than the DTMF handling. [8]
259	   contains several examples of how application servers in the network
264	   gather DTMF tones for the user while the user receives the encoded
265	   speech on his user agent. In these situations it is necessary to
266	   establish two RTP sessions: one for the voice and the other for the
267	   DTMF tones. Both RTP sessions are logically part of the same media
268	   instance.

270	7.3 Media flow definition

272	   The previous examples show that the definition of a media stream in
273	   [4] does not cover some scenarios. It cannot be assumed that a single
274	   media instance maps into a single RTP session. Therefore, we
275	   introduce the definition of a media flow:

277	   A media flow consists of a single media instance, e.g., an audio
278	   stream or a video stream as well as a single whiteboard or shared
279	   application group. When using RTP, a media flow comprises one or
280	   more RTP sessions.

282	7.4 FID semantics

284	   Several "m" lines grouped together using FID semantics form a media
285	   flow. A media agent handling a media flow that comprises several "m"
286	   lines MUST send a copy of the media to every "m" line that is part
287	   of the flow as long as the codecs and the direction attribute
288	   present in a particular "m" line allow it.

290	   It is assumed that the application uses only one codec at a time to
291	   encode the media produced. This codec MAY change dynamically during
292	   the session, but at any certain moment only one codec is in use.

294	   The application encodes the media using the current codec and checks
295	   one by one all the "m" lines that are part of the flow. If a
296	   particular "m" line contains the codec being used and the direction
297	   attribute is "sendonly" or "sendrecv" a copy of the encoded media is
298	   sent to the address/port specified in that particular media stream.
299	   If either the "m" line does not contain the codec being used or the
300	   direction attribute is neither "sendonly" nor "sendrecv" nothing is
301	   sent over this media stream.

303	   The application typically ends up sending media to different
304	   destinations (IP address/port number) depending on the codec used at
305	   any moment.
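
   The selection procedure described above can be sketched as follows.
   MediaLine and fid_destinations are hypothetical names. The direction
   field is taken from the sending application's point of view, as in
   the paragraph above; how a direction attribute found in a received
   description maps onto that view is outside this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaLine:
    address: str
    port: int
    payload_types: tuple[int, ...]
    direction: str = "sendrecv"  # assumption: sender's own view


def fid_destinations(flow: list[MediaLine],
                     current_payload_type: int) -> list[tuple[str, int]]:
    """Return every address/port that gets a copy of the encoded media."""
    return [
        (m.address, m.port)
        for m in flow
        if current_payload_type in m.payload_types
        and m.direction in ("sendonly", "sendrecv")
    ]
```

   With one "m" line per codec the result has a single entry that
   changes when the codec changes. With the same codec in several "m"
   lines the result has several entries and the media is duplicated.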

307	7.4.1 Examples of FID

309	   The session description below would be the SDP sent by a SIP user
310	   agent using a cellular access. The user agent supports GSM on port
311	   30000 and AMR on port 30002. When the remote party sends GSM it will
312	   send RTP packets to port number 30000. When AMR is the codec chosen,
313	   packets will be sent to port 30002. Note that the remote party can
314	   switch between both codecs dynamically in the middle of the session.
315	   However, in this example, only one media stream at a time carries
320	   voice. The other remains "muted" while its corresponding codec is
321	   not in use.

323	         v=0
324	         o=Laura 289083124 289083124 IN IP4 two.example.com
325	         t=0 0
326	         c=IN IP4 131.160.1.112
327	         a=group:FID 1 2
328	         m=audio 30000 RTP/AVP 3
329	         a=rtpmap:3 GSM/8000
330	         a=mid:1
331	         m=audio 30002 RTP/AVP 97
332	         a=rtpmap:97 AMR/8000
333	         a=fmtp:97 mode-set=0,2,5,7; mode-change-period=2; mode-change-neighbor; maxframes=1
335	         a=mid:2

337	   In the previous example a system receives media on the same IP
338	   address on different port numbers. The following example shows how a
339	   system can receive different codecs on different IP addresses.

341	         v=0
342	         o=Laura 289083124 289083124 IN IP4 three.example.com
343	         t=0 0
344	         c=IN IP4 131.160.1.112
345	         a=group:FID 1 2
346	         m=audio 20000 RTP/AVP 0
347	         c=IN IP4 131.160.1.111
348	         a=rtpmap:0 PCMU/8000
349	         a=mid:1
350	         m=audio 30002 RTP/AVP 97
351	         a=rtpmap:97 AMR/8000
352	         a=fmtp:97 mode-set=0,2,5,7; mode-change-period=2; mode-change-neighbor; maxframes=1
354	         a=mid:2

356	   The cellular terminal of this example only supports the AMR codec.
357	   However, many current IP phones only support PCM (payload 0). In
358	   order to be able to interoperate with them, the cellular terminal
359	   uses a transcoder whose IP address is 131.160.1.111. The cellular
360	   terminal includes in its SDP support for PCM at that IP address.
361	   Remote systems will send AMR directly to the terminal but PCM will
362	   be sent to the transcoder. The transcoder will be configured (using
363	   whatever method) to convert the incoming PCM audio to AMR and send
364	   it to the terminal.

366	   The next example shows that the "group" attribute used with FID
367	   semantics allows expressing uni-directional codecs for a
368	   bi-directional media flow, that is, a codec that is only used in
369	   one direction within a sendrecv media stream.

374	         v=0
375	         o=Laura 289083124 289083124 IN IP4 four.example.com
376	         t=0 0
377	         c=IN IP4 131.160.1.112
378	         a=group:FID 1 2
379	         m=audio 30000 RTP/AVP 0
380	         a=mid:1
381	         m=audio 30002 RTP/AVP 8
382	         a=recvonly
383	         a=mid:2

385	   A user agent that receives the SDP above knows that at a certain
386	   moment it can send either PCM u-law to port number 30000 or PCM A-
387	   law to port number 30002. However, the media agent also knows that
388	   the other end will only send PCM u-law (payload 0).

390	   The following example shows a session description with different "m"
391	   lines grouped together using FID semantics that contain the same
392	   codec.

394	         v=0
395	         o=Laura 289083124 289083124 IN IP4 five.example.com
396	         t=0 0
397	         c=IN IP4 131.160.1.112
398	         a=group:FID 1 2 3
399	         m=audio 30000 RTP/AVP 0
400	         a=mid:1
401	         m=audio 30002 RTP/AVP 8
402	         a=mid:2
403	         m=audio 20000 RTP/AVP 0 8
404	         c=IN IP4 131.160.1.111
405	         a=recvonly
406	         a=mid:3

408	   At a particular point of time, if the media agent is sending PCM u-
409	   law (payload 0) it sends RTP packets to 131.160.1.112 on port 30000
410	   and to 131.160.1.111 on port 20000 (first and third "m" lines). If
411	   it is sending PCM A-law (payload 8) it sends RTP packets to
412	   131.160.1.112 on port 30002 and to 131.160.1.111 on port 20000
413	   (second and third "m" lines).

415	   The system that generated the SDP above supports PCM u-law on port
416	   30000 and PCM A-law on port 30002. Besides, it uses an application
417	   server whose IP address is 131.160.1.111 that records all the
418	   conversation. That is why the application server always receives a
419	   copy of the audio stream regardless of the codec being used at any
420	   given moment (it actually performs an RTP dump, so it can
421	   effectively receive any codec).

423	   Remember that if several "m" lines grouped together using FID
424	   semantics contain the same codec the media agent MUST send media
425	   over several RTP sessions at the same time.

430	   The last example of this section deals with DTMF tones. DTMF tones
431	   can be transmitted using a regular voice codec or can be transmitted
432	   as telephony events. The RTP payload for DTMF tones treated as
433	   telephone events is described in RFC 2833 [9]. Below there is an
434	   example of an SDP session description using FID semantics and this
435	   payload type.

437	         v=0
438	         o=Laura 289083124 289083124 IN IP4 six.example.com
439	         t=0 0
440	         c=IN IP4 131.160.1.112
441	         a=group:FID 1 2
442	         m=audio 30000 RTP/AVP 0
443	         a=mid:1
444	         m=audio 20000 RTP/AVP 97
445	         c=IN IP4 131.160.1.111
446	         a=rtpmap:97 telephone-event/8000
447	         a=mid:2

449	   The remote party would send PCM encoded voice (payload 0) to
450	   131.160.1.112 and DTMF tones encoded as telephony events to
451	   131.160.1.111. Note that only voice or DTMF is sent at a particular
452	   point of time. When DTMF tones are sent the first media stream does
453	   not carry any data and when voice is sent there is no data in the
454	   second media stream. FID semantics provide different destinations
455	   for alternative codecs.

457	8 Scenarios that FID does not cover

459	   It is worthwhile mentioning some scenarios where the "group"
460	   attribute using existing semantics (particularly FID) might seem to
461	   be applicable but is not. This section has been included because
462	   we have observed some confusion within the community regarding the
463	   three scenarios described below. This section helps clarify them.

465	8.1 Parallel encoding using different codecs

467	   FID semantics are useful when the application only uses one codec at
468	   a time. When a particular application encodes the same media using
469	   different codecs FID MUST NOT be used. Some systems that handle DTMF
470	   tones are a typical example of parallel encoding using different
471	   codecs.

473	   Some systems implement the RTP payload defined in RFC 2833, but when
474	   they send DTMF tones they do not mute the voice channel. Therefore,
475	   effectively they are sending two copies of the same DTMF tone:
476	   encoded as voice and encoded as a telephony event. When the receiver
477	   gets both copies it typically uses the telephony event rather than
478	   the tone encoded as voice. FID semantics MUST NOT be used in this
479	   context to group both media streams since such a system is not using
480	   alternative codecs but rather different parallel encodings for the
481	   same information.

486	8.2 Layered encoding

488	   Layered encoding schemes encode media in different layers. Quality
489	   at the receiver varies depending on the number of layers received.
490	   SDP provides a means to group together contiguous multicast
491	   addresses that transport different layers. The "c" line below:

493	        c=IN IP4 224.2.1.1/127/3

495	   is equivalent to the following three "c" lines:

497	        c=IN IP4 224.2.1.1/127
498	        c=IN IP4 224.2.1.2/127
499	        c=IN IP4 224.2.1.3/127

501	   FID MUST NOT be used to group "m" lines that contain the different
502	   layers of a layered encoding scheme. Besides, we do not define new
503	   group semantics to provide a more flexible way of grouping different
504	   layers because the already existing SDP mechanism covers the most
505	   useful scenarios.
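
   The equivalence between the compact "c" line and the three expanded
   "c" lines shown above can be sketched as follows. The function name
   expand_connection is hypothetical, and only the IPv4 multicast form
   address/TTL[/count] is handled.

```python
import ipaddress


def expand_connection(c_line: str) -> list[str]:
    """Expand "c=IN IP4 <addr>/<ttl>/<n>" into n single-address lines."""
    prefix = "c=IN IP4 "
    if not c_line.startswith(prefix):
        raise ValueError("only IPv4 connection lines are handled")
    parts = c_line[len(prefix):].split("/")
    if len(parts) == 2:
        count = 1
    elif len(parts) == 3:
        count = int(parts[2])
    else:
        raise ValueError("expected <address>/<ttl>[/<count>]")
    base = ipaddress.IPv4Address(parts[0])
    ttl = parts[1]
    # Contiguous addresses start at the base address.
    return [f"c=IN IP4 {base + i}/{ttl}" for i in range(count)]
```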

507	8.3 Same IP address and port number

509	   If several codecs have to be sent to the same IP address and port,
510	   the traditional SDP syntax of listing several codecs in the same "m"
511	   line MUST be used. FID MUST NOT be used to group "m" lines with the
512	   same IP address/port. Therefore, an SDP like the one below MUST NOT
513	   be generated.

515	         v=0
516	         o=Laura 289083124 289083124 IN IP4 six.example.com
517	         t=0 0
518	         c=IN IP4 131.160.1.112
519	         a=group:FID 1 2
520	         m=audio 30000 RTP/AVP 0
521	         a=mid:1
522	         m=audio 30000 RTP/AVP 8
523	         a=mid:2

525	   The correct SDP for the session above would be the following one:

527	         v=0
528	         o=Laura 289083124 289083124 IN IP4 six.example.com
529	         t=0 0
530	         c=IN IP4 131.160.1.112
531	         m=audio 30000 RTP/AVP 0 8
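
   A generator of session descriptions could guard against the
   forbidden case with a check like the sketch below. The function name
   find_shared_destinations and the (mid, address, port) tuples are
   hypothetical; they stand for the "m" lines of one FID group after
   the applicable "c" line has been resolved.

```python
def find_shared_destinations(
        group: list[tuple[str, str, int]]) -> list[tuple[str, str]]:
    """Return pairs of mids in a FID group that share address and port."""
    seen: dict[tuple[str, int], str] = {}
    clashes = []
    for mid, address, port in group:
        key = (address, port)
        if key in seen:
            # Same destination already used by an earlier "m" line.
            clashes.append((seen[key], mid))
        else:
            seen[key] = mid
    return clashes
```

   A non-empty result indicates that the codecs should instead be
   listed in a single "m" line.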

533	9. Usage of the "group" attribute in SIP

535	   SDP descriptions are used by several different protocols, SIP among
536	   them. We include a section about SIP because the "group" attribute
537	   will most likely be used mainly by SIP systems.

542	   SIP [6] is an application layer protocol for establishing,
543	   terminating and modifying multimedia sessions. SIP carries session
544	   descriptions in the bodies of the SIP messages but is independent
545	   from the protocol used for describing sessions. SDP [2] is one of
546	   the protocols that can be used for this purpose.

548	   At session establishment SIP provides a three-way handshake (INVITE-
549	   200 OK-ACK) between end systems. However, just two of these three
550	   messages carry SDP. SDPs MAY be present in INVITE and 200 OK or in
551	   200 OK and ACK. The following sections assume that INVITE and 200 OK
552	   are the ones carrying SDP for the sake of clarity, but everything
553	   is also applicable to the other possible scenario (200 OK and ACK).

555	9.1 Mid value in responses

557	   The "mid" attribute is an identifier for a particular media stream.
558	   Therefore, the "mid" value in the response MUST be the same as the
559	   "mid" value in the request. Besides, subsequent requests such as re-
560	   INVITEs SHOULD use the same "mid" value for the already existing
561	   media streams.

563	   Appendix B of [6] describes the usage of SDP in relation to SIP. It
564	   states: "The caller and callee align their media description so that
565	   the nth media stream ("m=" line) in the caller's session description
566	   corresponds to the nth media stream in the callee's description."

568	   The presence of the "group" attribute in an SDP session description
569	   does not modify this behavior.

571	   Since the "mid" attribute provides a means to label "m" lines it
572	   would be possible to perform media alignment using "mid" labels
573	   rather than matching nth "m" lines. However this would not bring any
574	   gain and would add complexity to implementations. Therefore SIP
575	   systems MUST perform media alignment matching nth lines regardless
576	   of the presence of the "group" or "mid" attributes.

578	   If a media stream that contained a particular "mid" identifier in
579	   the request contains a different identifier in the response, the
580	   application MUST ignore all the "mid" and "group" lines that appear
581	   in the session description. The following example illustrates this
582	   scenario:

584	9.1.1 Example

586	   Two SIP entities exchange SDPs during session establishment. The
587	   INVITE contains the SDP below:

589	         v=0
590	         o=Laura 289083124 289083124 IN IP4 seven.example.com
591	         t=0 0
592	         c=IN IP4 131.160.1.112
593	         a=group:FID 1 2
598	         m=audio 30000 RTP/AVP 0 8
599	         a=mid:1
600	         m=audio 30002 RTP/AVP 0 8
601	         a=mid:2

603	   The 200 OK response contains the following SDP:

605	         v=0
606	         o=Bob 289083122 289083122 IN IP4 eight.example.com
607	         t=0 0
608	         c=IN IP4 131.160.1.113
609	         a=group:FID 1 2
610	         m=audio 25000 RTP/AVP 0 8
611	         a=mid:2
612	         m=audio 25002 RTP/AVP 0 8
613	         a=mid:1

615	   Since alignment of "m" lines is performed based on matching of nth
616	   lines, the first stream had "mid:1" in the INVITE and "mid:2" in the
617	   200 OK. Therefore, the application MUST ignore all "mid" and
618	   "group" lines contained in the SDP.
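
   The check described above can be sketched in a few lines of Python.
   This is only an illustration, not part of the specification: the
   function names parse_media_mids and mids_consistent are hypothetical,
   and the parser assumes one attribute per line and looks only at "m="
   and "a=mid:" lines.

```python
def parse_media_mids(sdp):
    """Return one entry per "m=" line, in order: its "mid" value or None."""
    mids = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # A new media description starts; no "mid" seen for it yet.
            mids.append(None)
        elif line.startswith("a=mid:") and mids:
            # Media-level "mid" belongs to the most recent "m=" line.
            mids[-1] = line[len("a=mid:"):]
    return mids


def mids_consistent(request_sdp, response_sdp):
    """True if the nth "m=" line carries the same "mid" in both SDPs."""
    return parse_media_mids(request_sdp) == parse_media_mids(response_sdp)


INVITE_SDP = """\
v=0
o=Laura 289083124 289083124 IN IP4 seven.example.com
t=0 0
c=IN IP4 131.160.1.112
a=group:FID 1 2
m=audio 30000 RTP/AVP 0 8
a=mid:1
m=audio 30002 RTP/AVP 0 8
a=mid:2
"""

# Response whose "mid" values are swapped relative to the request.
SWAPPED_SDP = """\
v=0
t=0 0
a=group:FID 1 2
m=audio 25000 RTP/AVP 0 8
a=mid:2
m=audio 25002 RTP/AVP 0 8
a=mid:1
"""

# Response that keeps the "mid" values aligned with the request.
ALIGNED_SDP = SWAPPED_SDP.replace("a=mid:2", "a=mid:X") \
                         .replace("a=mid:1", "a=mid:2") \
                         .replace("a=mid:X", "a=mid:1")
```

   With these inputs, an application would ignore all "mid" and "group"
   lines when mids_consistent(INVITE_SDP, SWAPPED_SDP) is false, and
   would honor them when the response is aligned.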

620	   A well-behaved SIP user agent would have returned the SDP below in
621	   the 200 OK:

623	         v=0
624	         o=Bob 289083122 289083122 IN IP4 nine.example.com
625	         t=0 0
626	         c=IN IP4 131.160.1.113
627	         a=group:FID 1 2
628	         m=audio 25002 RTP/AVP 0 8
629	         a=mid:1
630	         m=audio 25000 RTP/AVP 0 8
631	         a=mid:2

633	9.2 Group value in responses

635	   A SIP entity that receives a request that contains an "a=group" line
636	   with semantics that it does not understand MUST return a response
637	   without the "group" line. Note that, as described in the previous
638	   section, the "mid" lines MUST still be present in the
639	   response.

641	   A SIP entity that receives a request that contains an "a=group" line
642	   with semantics that it understands MUST return a response that
643	   contains an "a=group" line with the same semantics. The
644	   identification-tags contained in this "a=group" line MUST be the
645	   same as those received in the request or a subset of them (zero
646	   identification-tags is a valid subset). When the identification-tags
647	   in the response are a subset, the "group" value to be used in the
648	   session MUST be the one present in the response.

653	   SIP entities refuse media streams by setting the port to zero in the
654	   corresponding "m" line. "a=group" lines MUST NOT contain
655	   identification-tags that correspond to "m" lines with port zero.

657	   Note that grouping of m lines MUST always be requested by the issuer
658	   of the request (the client), never by the issuer of the response
659	   (the server). Since SIP provides a two-way SDP exchange, a server
660	   that requested grouping in a response would not know whether the
661	   "group" attribute was accepted by the client or not. A server that
662	   wants to group media lines SHOULD issue another request after having
663	   responded to the first one (a re-INVITE for instance).

665	        Note that, as we mentioned previously, in this section we are
666	        assuming that the SDPs are present in the INVITE and in the 200
667	        OK. Applying the statement above to the scenario where SDPs are
668	        present in the 200 OK and in the ACK, the entity requesting
669	        grouping would be the server.

671	9.2.1 Example

673	   The example below shows how the callee refuses a media stream
674	   offered by the caller by setting its port number to zero. The "mid"
675	   value corresponding to that media stream is removed from the "group"
676	   value in the response.

678	   SDP in the INVITE from caller to callee:

680	         v=0
681	         o=Laura 289083124 289083124 IN IP4 ten.example.com
682	         t=0 0
683	         c=IN IP4 131.160.1.112
684	         a=group:FID 1 2 3
685	         m=audio 30000 RTP/AVP 0
686	         a=mid:1
687	         m=audio 30002 RTP/AVP 8
688	         a=mid:2
689	         m=audio 30004 RTP/AVP 3
690	         a=mid:3

692	   SDP in the 200 OK from callee to caller:

694	         v=0
695	         o=Bob 289083125 289083125 IN IP4 eleven.example.com
696	         t=0 0
697	         c=IN IP4 131.160.1.113
698	         a=group:FID 1 3
699	         m=audio 20000 RTP/AVP 0
700	         a=mid:1
701	         m=audio 0 RTP/AVP 8
702	         a=mid:2
703	         m=audio 20002 RTP/AVP 3
704	         a=mid:3
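
   The way the callee derives the "group" value for its response can be
   sketched as follows. This is an illustrative Python fragment under
   stated assumptions: the helper names answer_group_tags and group_line
   are hypothetical, and the media streams of the response are assumed
   to be available as (mid, port) pairs in "m" line order.

```python
def answer_group_tags(request_tags, response_media):
    """Keep only the requested tags whose "m" line is accepted.

    request_tags   -- identification-tags from the request's "a=group" line
    response_media -- (mid, port) pairs, one per "m" line in the response
    """
    # A port of zero means the stream is refused; its tag must be dropped.
    accepted = {mid for mid, port in response_media if port != 0}
    return [tag for tag in request_tags if tag in accepted]


def group_line(semantics, tags):
    """Format an "a=group" line; zero tags is a valid (empty) group."""
    return " ".join(["a=group:" + semantics] + list(tags))


# Values taken from the example above: stream "2" is refused (port 0).
request_tags = ["1", "2", "3"]
response_media = [("1", 20000), ("2", 0), ("3", 20002)]
line = group_line("FID", answer_group_tags(request_tags, response_media))
```

   Because the result is always drawn from the tags in the request, the
   response carries either the same identification-tags or a subset of
   them, and never a tag for an "m" line with port zero.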

709	9.3 Capability negotiation

711	   A client that understands "group" and "mid" but does not want to
712	   make use of them in a particular session MAY want to indicate that it
713	   supports them. If a client decides to do that, it SHOULD add an
714	   "a=group" line with zero identification-tags for every semantics it
715	   understands.

717	   If a server receives a request that contains empty "a=group" lines
718	   it SHOULD add its capabilities also in the form of empty "a=group"
719	   lines to its response.

721	9.3.1 Example

723	   A system that supports both LS and FID semantics but does not want
724	   to group any media stream for this particular session generates the
725	   following SDP:

727	         v=0
728	         o=Bob 289083125 289083125 IN IP4 twelve.example.com
729	         t=0 0
730	         c=IN IP4 131.160.1.113
731	         a=group:LS
732	         a=group:FID
733	         m=audio 20000 RTP/AVP 0 8

735	   The server that receives that request supports FID but not LS. It
736	   responds with the SDP below:

738	         v=0
739	         o=Laura 289083124 289083124 IN IP4 thirteen.example.com
740	         t=0 0
741	         c=IN IP4 131.160.1.112
742	         a=group:FID
743	         m=audio 30000 RTP/AVP 0
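
   The server side of this exchange can be sketched as below. This is a
   minimal Python illustration, not a normative algorithm: the function
   names has_empty_group and capability_response are hypothetical, and
   the server's supported semantics are assumed to be given as a list.

```python
def has_empty_group(sdp):
    """True if the SDP has an "a=group" line with zero identification-tags."""
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=group:"):
            fields = line[len("a=group:"):].split()
            # Exactly one field means semantics only, no identification-tags.
            if len(fields) == 1:
                return True
    return False


def capability_response(request_sdp, supported):
    """Empty "a=group" lines the server adds to its response, if any."""
    if not has_empty_group(request_sdp):
        return []
    return ["a=group:" + semantics for semantics in supported]


REQUEST_SDP = """\
v=0
o=Bob 289083125 289083125 IN IP4 twelve.example.com
t=0 0
c=IN IP4 131.160.1.113
a=group:LS
a=group:FID
m=audio 20000 RTP/AVP 0 8
"""

# A server that supports FID but not LS, as in the example above.
response_lines = capability_response(REQUEST_SDP, ["FID"])
```

   Here response_lines holds the single line "a=group:FID", matching
   the response shown in the example.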

745	9.4 Backward compatibility

747	   This document does not define any SIP "Require" header. Therefore,
748	   if one of the SIP user agents does not understand the "group"
749	   attribute, the standard SDP fallback mechanism MUST be used
750	   (attributes that are not understood are simply ignored).

752	9.4.1 Client does not support "group"

754	   This situation does not represent a problem because grouping is
755	   always requested by clients, not by servers. If the client does not
760	   support "group", this attribute will just not be used.

763	9.4.2 Server does not support "group"

765	   The server will ignore the "group" attribute, since it does not
766	   understand it (it will also ignore the "mid" attribute). For LS
767	   semantics, the server might or might not perform synchronization
768	   between media streams.

770	   For FID semantics, the server will consider that the session
771	   comprises several media streams.

773	   Different implementations would behave in different ways.

775	   In the case of audio and different "m" lines for different codecs,
776	   an implementation might decide to act as a mixer with the different
777	   incoming RTP sessions, which is the correct behavior.

779	   An implementation might also decide to refuse the request (e.g., 488
780	   Not Acceptable Here or 606 Not Acceptable) because it contains
781	   several "m" lines. In this case, the server does not support the
782	   type of session that the caller wanted to establish. If the client
783	   is willing to establish a simpler session anyway, it SHOULD retry
784	   the request without the "group" attribute and with only one "m"
785	   line per flow.

787	10. IANA considerations

789	   As previously stated in section 4, this document defines two
790	   standard semantics related to the "group" attribute: LS (Lip
791	   Synchronization) and FID (Flow Identification). If further semantics
792	   need to be standardized in the future, they need to be defined in a
793	   standards track document.

795	11. Acknowledgments

797	   The authors would like to thank Jonathan Rosenberg, Adam Roach, Orit
798	   Levin and Joerg Ott for their feedback on this document.

800	12. References

802	   [1] S. Bradner, "Key words for use in RFCs to Indicate Requirement
803	   Levels", RFC 2119, IETF; March 1997.

805	   [2] M. Handley/V. Jacobson, "SDP: Session Description Protocol", RFC
806	   2327, IETF; April 1998.

808	   [3] D. Kutscher/J. Ott/C. Bormann, "Session Description and
809	   Capability Negotiation", draft-ietf-mmusic-sdpng-00.txt, IETF; April
810	   2001. Work in progress.

815	   [4] H. Schulzrinne/A. Rao/R. Lanphier, "Real Time Streaming Protocol
816	   (RTSP)", RFC 2326, IETF; April 1998.

818	   [5] H. Schulzrinne/S. Casner/R. Frederick/V. Jacobson, "RTP: A
819	   Transport Protocol for Real-Time Applications", RFC 1889, IETF;
820	   January 1996.

822	   [6] M. Handley/H. Schulzrinne/E. Schooler/J. Rosenberg, "SIP:
823	   Session Initiation Protocol", RFC 2543, IETF; March 1999.

825	   [7] L. Westberg/M. Lindqvist, "Realtime Traffic over Cellular Access
826	   Networks", draft-westberg-realtime-cellular-04.txt, IETF; June 2001.
827	   Work in progress.

829	   [8] J. Rosenberg/P. Mataga/H. Schulzrinne, "An Application Server
830	   Component Architecture for SIP", draft-rosenberg-sip-app-components-
831	   00.txt, IETF; November 2000. Work in progress.

833	   [9] H. Schulzrinne/S. Petrack, "RTP Payload for DTMF Digits,
834	   Telephony Tones and Telephony Signals", RFC 2833, IETF; May 2000.

836	13. Authors' Addresses

838	   Gonzalo Camarillo
839	   Ericsson
840	   Advanced Signalling Research Lab.
841	   FIN-02420 Jorvas
842	   Finland
843	   Phone: +358 9 299 3371
844	   Fax: +358 9 299 3052
845	   Email: Gonzalo.Camarillo@ericsson.com

847	   Jan Holler
848	   Ericsson Research
849	   S-16480 Stockholm
850	   Sweden
851	   Phone: +46 8 58532845
852	   Fax: +46 8 4047020
853	   Email: Jan.Holler@era.ericsson.se

855	   Goran AP Eriksson
856	   Ericsson Research
857	   S-16480 Stockholm
858	   Sweden
859	   Phone: +46 8 58531762
860	   Fax: +46 8 4047020
861	   Email: Goran.AP.Eriksson@era.ericsson.se

863	   Henning Schulzrinne
864	   Dept. of Computer Science
865	   Columbia University
866	   1214 Amsterdam Avenue
871	   New York, NY 10027
872	   USA
873	   Email: schulzrinne@cs.columbia.edu
