idnits 2.17.1 

draft-ietf-avtext-rtp-grouping-taxonomy-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (November 06, 2013) is 3821 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'UML' is defined on line 1775, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC3264' is defined on line 1816, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC6222' is defined on line 1861, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-11) exists of
     draft-ietf-avtcore-clksrc-07

  == Outdated reference: A later version (-25) exists of
     draft-ietf-clue-framework-12

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-05

  == Outdated reference: A later version (-19) exists of
     draft-ietf-rtcweb-overview-08

  -- Obsolete informational reference (is this intentional?): RFC 4566
     (Obsoleted by RFC 8866)

  -- Obsolete informational reference (is this intentional?): RFC 6222
     (Obsoleted by RFC 7022)


     Summary: 0 errors (**), 0 flaws (~~), 8 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                          J. Lennox
3	Internet-Draft                                                     Vidyo
4	Intended status: Informational                                  K. Gross
5	Expires: May 10, 2014                                                AVA
6	                                                           S. Nandakumar
7	                                                            G. Salgueiro
8	                                                           Cisco Systems
9	                                                               B. Burman
10	                                                                Ericsson
11	                                                       November 06, 2013

13	A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport
14	                         Protocol (RTP) Sources
15	               draft-ietf-avtext-rtp-grouping-taxonomy-00

17	Abstract

19	   The terminology about, and associations among, Real-Time Transport
20	   Protocol (RTP) sources can be complex and somewhat opaque.  This
21	   document describes a number of existing and proposed relationships
22	   among RTP sources, and attempts to define common terminology for
23	   discussing protocol entities and their relationships.

25	Status of This Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at http://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on May 10, 2014.

42	Copyright Notice

44	   Copyright (c) 2013 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (http://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the Simplified BSD License.

57	Table of Contents

59	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
60	   2.  Concepts  . . . . . . . . . . . . . . . . . . . . . . . . . .   4
61	     2.1.  Media Chain . . . . . . . . . . . . . . . . . . . . . . .   4
62	       2.1.1.  Physical Stimulus . . . . . . . . . . . . . . . . . .   7
63	       2.1.2.  Media Capture . . . . . . . . . . . . . . . . . . . .   7
64	       2.1.3.  Raw Stream  . . . . . . . . . . . . . . . . . . . . .   7
65	       2.1.4.  Media Source  . . . . . . . . . . . . . . . . . . . .   8
66	       2.1.5.  Source Stream . . . . . . . . . . . . . . . . . . . .   9
67	       2.1.6.  Media Encoder . . . . . . . . . . . . . . . . . . . .   9
68	       2.1.7.  Encoded Stream  . . . . . . . . . . . . . . . . . . .  10
69	       2.1.8.  Dependent Stream  . . . . . . . . . . . . . . . . . .  10
70	       2.1.9.  Media Packetizer  . . . . . . . . . . . . . . . . . .  10
71	       2.1.10. Packet Stream . . . . . . . . . . . . . . . . . . . .  11
72	       2.1.11. Media Redundancy  . . . . . . . . . . . . . . . . . .  12
73	       2.1.12. Redundancy Packet Stream  . . . . . . . . . . . . . .  12
74	       2.1.13. Media Transport . . . . . . . . . . . . . . . . . . .  12
75	       2.1.14. Received Packet Stream  . . . . . . . . . . . . . . .  15
76	       2.1.15. Received Redundancy Packet Stream . . . . . . . . . .  15
77	       2.1.16. Media Repair  . . . . . . . . . . . . . . . . . . . .  15
78	       2.1.17. Repaired Packet Stream  . . . . . . . . . . . . . . .  15
79	       2.1.18. Media Depacketizer  . . . . . . . . . . . . . . . . .  15
80	       2.1.19. Received Encoded Stream . . . . . . . . . . . . . . .  15
81	       2.1.20. Media Decoder . . . . . . . . . . . . . . . . . . . .  16
82	       2.1.21. Received Source Stream  . . . . . . . . . . . . . . .  16
83	       2.1.22. Media Sink  . . . . . . . . . . . . . . . . . . . . .  16
84	       2.1.23. Received Raw Stream . . . . . . . . . . . . . . . . .  16
85	       2.1.24. Media Render  . . . . . . . . . . . . . . . . . . . .  16
86	     2.2.  Communication Entities  . . . . . . . . . . . . . . . . .  17
87	       2.2.1.  End Point . . . . . . . . . . . . . . . . . . . . . .  17
88	       2.2.2.  RTP Session . . . . . . . . . . . . . . . . . . . . .  17
89	       2.2.3.  Participant . . . . . . . . . . . . . . . . . . . . .  18
90	       2.2.4.  Multimedia Session  . . . . . . . . . . . . . . . . .  19
91	       2.2.5.  Communication Session . . . . . . . . . . . . . . . .  19
92	   3.  Relations at Different Levels . . . . . . . . . . . . . . . .  20
93	     3.1.  Media Source Relations  . . . . . . . . . . . . . . . . .  20
94	       3.1.1.  Synchronization Context . . . . . . . . . . . . . . .  20
95	       3.1.2.  End Point . . . . . . . . . . . . . . . . . . . . . .  21
96	       3.1.3.  Participant . . . . . . . . . . . . . . . . . . . . .  22
97	       3.1.4.  WebRTC MediaStream  . . . . . . . . . . . . . . . . .  22
98	     3.2.  Packetization Time Relations  . . . . . . . . . . . . . .  22
99	       3.2.1.  Single Stream Transport of SVC  . . . . . . . . . . .  23
100	       3.2.2.  Multi-Channel Audio . . . . . . . . . . . . . . . . .  23
101	       3.2.3.  Redundancy Format . . . . . . . . . . . . . . . . . .  23
102	     3.3.  Packet Stream Relations . . . . . . . . . . . . . . . . .  24
103	       3.3.1.  Simulcast . . . . . . . . . . . . . . . . . . . . . .  24
104	       3.3.2.  Layered Multi-Stream Transmission . . . . . . . . . .  25
105	       3.3.3.  Robustness and Repair . . . . . . . . . . . . . . . .  26
106	       3.3.4.  Packet Stream Separation  . . . . . . . . . . . . . .  29
107	     3.4.  Multiple RTP Sessions over one Media Transport  . . . . .  30
108	   4.  Topologies and Communication Entities . . . . . . . . . . . .  30
109	     4.1.  Point-to-Point Communication  . . . . . . . . . . . . . .  31
110	     4.2.  Central Conferencing  . . . . . . . . . . . . . . . . . .  32
111	     4.3.  Full Mesh Conferencing  . . . . . . . . . . . . . . . . .  33
112	     4.4.  Source-Specific Multicast . . . . . . . . . . . . . . . .  36
113	   5.  Security Considerations . . . . . . . . . . . . . . . . . . .  37
114	   6.  Acknowledgement . . . . . . . . . . . . . . . . . . . . . . .  38
115	   7.  Contributors  . . . . . . . . . . . . . . . . . . . . . . . .  38
116	   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  38
117	   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  38
118	     9.1.  Normative References  . . . . . . . . . . . . . . . . . .  38
119	     9.2.  Informative References  . . . . . . . . . . . . . . . . .  38
120	   Appendix A.  Changes From Earlier Versions  . . . . . . . . . . .  40
121	     A.1.  Modifications Between Version -02 and -03 . . . . . . . .  40
122	     A.2.  Modifications Between Version -01 and -02 . . . . . . . .  40
123	     A.3.  Modifications Between Version -00 and -01 . . . . . . . .  40
124	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  41

126	1.  Introduction

128	   The existing taxonomy of sources in RTP is often regarded as
129	   confusing and inconsistent.  Consequently, a deep understanding of
130	   how the different terms relate to each other becomes a real
131	   challenge.  Frequently cited examples of this confusion are (1) how
132	   different protocols that make use of RTP use the same terms to
133	   signify different things and (2) how the complexities addressed at
134	   one layer are often glossed over or ignored at another.

136	   This document attempts to provide some clarity by reviewing the
137	   semantics of various aspects of sources in RTP.  As an organizing
138	   mechanism, it approaches this by describing various ways that RTP
139	   sources can be grouped and associated together.

141	   All non-specific references to ControLling mUltiple streams for
142	   tElepresence (CLUE) in this document map to [I-D.ietf-clue-framework]
143	   and all references to Web Real-Time Communications (WebRTC) map to
144	   [I-D.ietf-rtcweb-overview].

146	2.  Concepts

148	   This section defines concepts that serve to identify and name various
149	   transformations and streams in a given RTP usage.  For each concept
150	   an attempt is made to list any alternate definitions and usages that
151	   co-exist today along with various characteristics that further
152	   describe the concept.  These concepts are divided into two
153	   categories, one related to the chain of streams and transformations
154	   that media can be subject to, the other for entities involved in the
155	   communication.

157	2.1.  Media Chain

159	   This section contains the concepts that can be involved in taking a
160	   sequence of physical world stimuli (sound waves, photons,
161	   keystrokes) at a sender side and transporting them to a receiver,
162	   which may recover a sequence of physical stimuli.  The concepts in
163	   this chain are of two main types: streams and transformations.
164	   Streams are time-based sequences of samples of the physical stimulus
165	   in various representations, while transformations change the
166	   representation of the streams in some way.

168	   The examples below are basic ones, and it is important to keep in mind
169	   that this conceptual model enables more complex usages.  Some will be
170	   further discussed in later sections of this document.  In general the
171	   following applies to this model:

173	   o  A transformation may have zero or more inputs and one or more
174	      outputs.

176	   o  A Stream is of some type.

178	   o  A Stream has one source transformation and one or more sink
179	      transformations (with the exception of Physical Stimulus
180	      (Section 2.1.1), which may have no source or sink transformation).

182	   o  Streams can be forwarded from a transformation output to any
183	      number of inputs on other transformations that support that type.

185	   o  If the output of a transformation is sent to multiple
186	      transformations, those streams will be identical; it takes a
187	      transformation to make them different.

189	   o  There are no formal limitations on how streams are connected to
190	      transformations; this may include loops if required by a
191	      particular transformation.
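
   The rules above can be sketched as a small data model.  This is an
   illustrative Python sketch only; the names Stream, Transformation,
   emit and connect are assumptions made for this example and are not
   defined by this document or by any RTP specification.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Stream:
    """A typed stream with one source and one or more sink transformations."""
    kind: str
    source: "Transformation | None" = None
    sinks: list = field(default_factory=list)


@dataclass(eq=False)
class Transformation:
    """A processing step with zero or more inputs and one or more outputs."""
    name: str
    accepts: set  # stream kinds this transformation can take as input
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def emit(self, kind):
        """Create a new output stream of the given kind."""
        stream = Stream(kind=kind, source=self)
        self.outputs.append(stream)
        return stream


def connect(stream, sink):
    """Forward a stream to a transformation that supports its type."""
    if stream.kind not in sink.accepts:
        raise TypeError(f"{sink.name} does not accept {stream.kind}")
    stream.sinks.append(sink)
    sink.inputs.append(stream)


# One Raw Stream forwarded to a single Media Source (hypothetical setup).
capture = Transformation("Media Capture", accepts=set())
source = Transformation("Media Source", accepts={"raw"})
raw = capture.emit("raw")
connect(raw, source)
```

   The same Stream object can be passed to connect() several times with
   different sinks, which mirrors the rule that a stream forwarded to
   multiple transformations is identical at each of them.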

193	   It is also important to remember that this is a conceptual model.
194	   Thus, real-world implementations may look different and have a
195	   different structure.

197	   To provide a basic understanding of the relationships in the chain,
198	   we first introduce the concepts for the sender side (Figure 1).
199	   This covers the chain from physical stimulus until media packets are
200	   emitted onto the network.

202	                Physical Stimulus
203	                       |
204	                       V
205	             +--------------------+
206	             |    Media Capture   |
207	             +--------------------+
208	                       |
209	                  Raw stream
210	                       V
211	             +--------------------+
212	             |    Media Source    |<- Synchronization Timing
213	             +--------------------+
214	                       |
215	                 Source Stream
216	                       V
217	             +--------------------+
218	             |   Media Encoder    |
219	             +--------------------+
220	                       |
221	                 Encoded Stream     +-----------+
222	                       V            |           V
223	             +--------------------+ | +--------------------+
224	             |  Media Packetizer  | | |  Media Redundancy  |
225	             +--------------------+ | +--------------------+
226	                       |            |           |
227	                       +------------+ Redundancy Packet Stream
228	                Source Packet Stream            |
229	                       V                        V
230	             +--------------------+   +--------------------+
231	             |  Media Transport   |   |  Media Transport   |
232	             +--------------------+   +--------------------+

234	             Figure 1: Sender Side Concepts in the Media Chain

236	   In Figure 1 we have included a branched chain to cover the concepts
237	   for using redundancy to improve the reliability of the transport.
238	   The Media Transport concept is an aggregate that is decomposed below
239	   in Section 2.1.13.2.

241	   Below we review a receiver media chain (Figure 2) matching the sender
242	   side, to look at the inverse transformations and their attempts to
243	   recover streams that are possibly identical to those in the sender
244	   chain.  Note that the streams out of a reverse transformation, like
245	   the Source Stream out of the Media Decoder, are in many cases not the
246	   same as the corresponding ones on the sender side; thus they are
247	   prefixed with "Received" to denote a potentially modified version.
248	   They differ because some transformations are irreversible.  For
249	   example, lossy source coding in the Media Encoder prevents the Source
250	   Stream out of the Media Decoder from being the same as the one fed
251	   into the Media Encoder.  Other reasons include packet loss or late
252	   loss in the Media Transport transformation that even Media Repair, if
253	   used, fails to repair.  It should be noted that some transformations
254	   are not always present, like Media Repair, which cannot operate
255	   without Redundancy Packet Streams.

257	           +--------------------+   +--------------------+
258	           |  Media Transport   |   |  Media Transport   |
259	           +--------------------+   +--------------------+
260	                     |                        |
261	           Received Packet Stream   Received Redundancy PS
262	                     |                        |
263	                     |    +-------------------+
264	                     V    V
265	           +--------------------+
266	           |    Media Repair    |
267	           +--------------------+
268	                     |
269	           Repaired Packet Stream
270	                     V
271	           +--------------------+
272	           | Media Depacketizer |
273	           +--------------------+
274	                     |
275	           Received Encoded Stream
276	                     V
277	           +--------------------+
278	           |   Media Decoder    |
279	           +--------------------+
280	                     |
281	           Received Source Stream
282	                     V
283	           +--------------------+
284	           |     Media Sink     |--> Synchronization Information
285	           +--------------------+
286	                     |
287	           Received Raw Stream
288	                     V

290	           +--------------------+
291	           |   Media Renderer   |
292	           +--------------------+
293	                     |
294	                     V
295	             Physical Stimulus

297	            Figure 2: Receiver Side Concepts of the Media Chain

299	2.1.1.  Physical Stimulus

301	   The physical stimulus is a physical event that can be captured and
302	   provided as media to a receiver.  This includes sound waves making up
303	   audio, photons in a light field that is visible, or other excitations
304	   or interactions with sensors, like keystrokes on a keyboard.

306	2.1.2.  Media Capture

308	   The process of transforming the Physical Stimulus (Section 2.1.1)
309	   into captured media.  The Media Capture performs a digital sampling
310	   of the physical stimulus, usually periodically, and outputs this in
311	   some representation as a Raw Stream (Section 2.1.3).  Because this
312	   data is sampled periodically, or at least consists of timed
313	   asynchronous events, it forms a stream of media data.  The Media
314	   Capture is normally instantiated in some type of device, i.e., a
315	   media capture device.  Examples of different types of media capture
316	   devices are digital cameras, microphones connected to A/D converters,
317	   and keyboards.

319	2.1.2.1.  Alternate Usages

321	   The CLUE WG uses the term "Capture Device" to identify a physical
322	   capture device.

324	   WebRTC WG uses the term "Recording Device" to refer to the locally
325	   available capture devices in an end-system.

327	2.1.2.2.  Characteristics

329	   o  A Media Capture is identified either by hardware/manufacturer ID
330	      or via a session-scoped device identifier as mandated by the
331	      application usage.

333	   o  A Media Capture can generate an Encoded Stream (Section 2.1.7) if
334	      the capture device supports such a configuration.

336	2.1.3.  Raw Stream

337	   The time progressing stream of digitally sampled information, usually
338	   periodically sampled, provided by a Media Capture (Section 2.1.2).

340	2.1.4.  Media Source

342	   A Media Source is the logical source of a reference clock
343	   synchronized, time progressing, digital media stream, called a Source
344	   Stream (Section 2.1.5).  This transformation takes one or more Raw
345	   Streams (Section 2.1.3) and provides a Source Stream as output.  This
346	   output has been synchronized with some reference clock, even if just
347	   a system local wall clock.

349	   The output can be of different types.  One type is directly
350	   associated with a particular Media Capture's Raw Stream.  Others are
351	   more conceptual sources, like an audio mix of multiple Raw Streams
352	   (Figure 3), a mixed selection of the three loudest inputs regarding
353	   speech activity, or a selection of a particular video based on the
354	   current speaker, i.e., sources typically based on other Media Sources.

356	                 Raw       Raw       Raw
357	                Stream    Stream    Stream
358	                  |         |         |
359	                  V         V         V
360	              +--------------------------+
361	              |        Media Source      |<-- Reference Clock
362	              |           Mixer          |
363	              +--------------------------+
364	                            |
365	                            V
366	                      Source Stream

368	         Figure 3: Conceptual Media Source in form of Audio Mixer
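
   As a rough illustration of Figure 3, the following Python sketch
   mixes several Raw Streams into one Source Stream.  It assumes that
   each Raw Stream is a list of (timestamp, sample) pairs whose
   timestamps are already expressed on the shared reference clock; the
   function name mix_raw_streams and this representation are
   hypothetical and chosen only for this example.

```python
def mix_raw_streams(raw_streams):
    """Sum time-aligned samples from several Raw Streams.

    Each Raw Stream is a list of (timestamp, sample) pairs on a common
    reference clock.  The result is one Source Stream in time order.
    """
    mixed = {}
    for stream in raw_streams:
        for timestamp, sample in stream:
            # Samples that share a timestamp are added together.
            mixed[timestamp] = mixed.get(timestamp, 0) + sample
    # Order the Source Stream by reference-clock time.
    return sorted(mixed.items())


source_stream = mix_raw_streams([
    [(0, 1), (1, 2)],
    [(0, 10), (2, 5)],
])
```

   A real mixer would also resample, scale and clip the samples; those
   steps are left out to keep the relation between inputs and output
   visible.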

370	2.1.4.1.  Alternate Usages

372	   The CLUE WG uses the term "Media Capture" for this purpose.  A CLUE
373	   Media Capture is identified via indexed notation.  The terms Audio
374	   Capture and Video Capture are used to identify Audio Sources and
375	   Video Sources respectively.  Concepts such as "Capture Scene",
376	   "Capture Scene Entry" and "Capture" provide a flexible framework to
377	   represent media captured spanning spatial regions.

379	   The WebRTC WG defines the term "RtcMediaStreamTrack" to refer to a
380	   Media Source.  An "RtcMediaStreamTrack" is identified by the ID
381	   attribute.

383	   Typically a Media Source is mapped to a single m=line via the Session
384	   Description Protocol (SDP) [RFC4566] unless mechanisms such as
385	   Source-Specific attributes are in place [RFC5576].  In the latter
386	   cases, an m=line can represent either multiple Media Sources,
387	   multiple Packet Streams (Section 2.1.10), or both.

389	2.1.4.2.  Characteristics

391	   o  At any point, it can represent a physical captured source or
392	      conceptual source.

394	2.1.5.  Source Stream

396	   A time progressing stream of digital samples that has been
397	   synchronized with a reference clock and comes from a particular Media
398	   Source (Section 2.1.4).

400	2.1.6.  Media Encoder

402	   A Media Encoder is a transform that is responsible for encoding the
403	   media data from a Source Stream (Section 2.1.5) into another
404	   representation, usually more compact, that is output as an Encoded
405	   Stream (Section 2.1.7).

407	   The Media Encoder step commonly includes pre-encoding
408	   transformations, such as scaling and resampling.  The Media Encoder
409	   can have a significant number of configuration options that affect
410	   the properties of the Encoded Stream.  These include properties such
411	   as bit-rate, start points for decoding, resolution, bandwidth, or
412	   other fidelity-affecting properties.  The codec actually used, not
413	   only its parameters, is also an important factor in many
414	   communication systems.

416	   Scalable Media Encoders need special mention as they produce
417	   multiple outputs that are potentially of different types.  A scalable
418	   Media Encoder takes one input Source Stream and encodes it into
419	   multiple output streams of two different types: at least one Encoded
420	   Stream that is independently decodable, and one or more Dependent
421	   Streams (Section 2.1.8) that each require at least one Encoded Stream
422	   and zero or more Dependent Streams in order to be decoded.  A
423	   Dependent Stream's dependency is one of the grouping relations this
424	   document discusses further in Section 3.3.2.

426	                              Source Stream
427	                                    |
428	                                    V
429	                       +--------------------------+
430	                       |  Scalable Media Encoder  |
431	                       +--------------------------+
432	                          |         |   ...    |
433	                          V         V          V
434	                       Encoded  Dependent  Dependent
435	                       Stream    Stream     Stream

437	            Figure 4: Scalable Media Encoder Input and Outputs
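
   The dependency rule for scalable encoding can be illustrated with a
   short Python sketch.  It assumes a hypothetical mapping from each
   Dependent Stream to the set of streams it requires; an Encoded Stream
   has no entry because it is independently decodable.  The stream
   names "base", "enh1" and "enh2" are made up for the example.

```python
def decodable_streams(received, dependencies):
    """Return the subset of received streams that can be decoded.

    `dependencies` maps a Dependent Stream name to the set of stream
    names it requires.  Streams without an entry are Encoded Streams.
    """
    decodable = set()
    changed = True
    while changed:
        changed = False
        for name in received:
            if name in decodable:
                continue
            needed = dependencies.get(name, set())
            # A stream is usable once everything it needs is usable.
            if needed <= decodable:
                decodable.add(name)
                changed = True
    return decodable


deps = {"enh1": {"base"}, "enh2": {"base", "enh1"}}
usable = decodable_streams({"base", "enh2"}, deps)
```

   Here "enh2" is received but cannot be decoded, because the "enh1"
   stream it depends on is missing.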

439	2.1.6.1.  Alternate Usages

441	   Within the SDP usage, an SDP media description (m=line) describes
442	   part of the necessary configuration required for encoding purposes.

444	   CLUE's "Capture Encoding" provides specific encoding configuration
445	   for this purpose.

447	2.1.6.2.  Characteristics

449	   o  A Media Source can be multiply encoded by different Media Encoders
450	      to provide various encoded representations.

452	2.1.7.  Encoded Stream

454	   A stream of time synchronized encoded media that can be independently
455	   decoded.

457	2.1.7.1.  Characteristics

459	   o  Due to temporal dependencies, an Encoded Stream may have
460	      limitations in where decoding can be started.  These entry points,
461	      for example Intra frames from a video encoder, may require
462	      identification and their generation may be event based or
463	      configured to occur periodically.

465	2.1.8.  Dependent Stream

467	   A stream of time synchronized encoded media fragments that are
468	   dependent on one or more Encoded Streams (Section 2.1.7) and zero or
469	   more Dependent Streams in order to be decodable.

471	2.1.8.1.  Characteristics

473	   o  Each Dependent Stream has a set of dependencies.  These
474	      dependencies must be understood by the parties in a multi-media
475	      session that intend to use a Dependent Stream.

477	2.1.9.  Media Packetizer

479	   The transformation of taking one or more Encoded (Section 2.1.7) or
480	   Dependent Streams (Section 2.1.8), putting their content into one or
481	   more sequences of packets, normally RTP packets, and outputting
482	   Source Packet Streams (Section 2.1.10).  This step includes
483	   generating both RTP payloads and RTP packets.

485	   The Media Packetizer can use multiple inputs when producing a single
486	   Packet Stream.  One such example is packetization when using SVC: in
487	   Single Stream Transport (SST) usage of the payload format, both an
488	   Encoded Stream and Dependent Streams are packetized in a single
489	   Source Packet Stream using a single SSRC.

491	   The Media Packetizer can also produce multiple Packet Streams, for
492	   example when Encoded and/or Dependent Streams are distributed over
493	   multiple Packet Streams, possibly in different RTP sessions.
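
   The single-stream case can be sketched in Python as follows.  This is
   a toy model, not the SVC payload format: it assumes each unit is a
   (timestamp, layer, data) tuple, represents packets as dictionaries,
   and starts sequence numbers at zero.  The function name packetize_sst
   and the field names are hypothetical.

```python
def packetize_sst(ssrc, encoded_units, dependent_units):
    """Put units from all layers into one Packet Stream on one SSRC.

    Units are ordered by timestamp, then by layer, and numbered in a
    single sequence number space.
    """
    units = sorted(
        list(encoded_units) + list(dependent_units),
        key=lambda unit: (unit[0], unit[1]),
    )
    return [
        {"ssrc": ssrc, "seq": seq, "timestamp": ts, "layer": layer, "data": data}
        for seq, (ts, layer, data) in enumerate(units)
    ]


base = [(0, 0, b"b0"), (1, 0, b"b1")]         # Encoded Stream
enhancement = [(0, 1, b"e0"), (1, 1, b"e1")]  # Dependent Stream
packets = packetize_sst(0x1234ABCD, base, enhancement)
```

   Distributing the same units over multiple Packet Streams would mean
   calling a packetizer once per SSRC, each with its own sequence
   number space.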

495	2.1.9.1.  Alternate Usages

497	   An RTP sender is part of the Media Packetizer.

499	2.1.9.2.  Characteristics

501	   o  The Media Packetizer selects which Synchronization Source(s)
502	      (SSRC) [RFC3550] are used, and in which RTP sessions.

504	   o  The Media Packetizer can combine multiple Encoded or Dependent
505	      Streams into one or more Packet Streams.

507	2.1.10.  Packet Stream

509	   A stream of RTP packets containing media data, source or redundant.
510	   The Packet Stream is identified by an SSRC belonging to a particular
511	   RTP session.  The RTP session is identified as discussed in
512	   Section 2.2.2.

514	   A Source Packet Stream is a packet stream containing at least some
515	   content from an Encoded Stream.  Source material is any media
516	   material that is produced for transport over RTP without any
517	   additional redundancy applied to cope with network transport losses.
518	   Compare this with the Redundancy Packet Stream (Section 2.1.12).

520	2.1.10.1.  Alternate Usages

522	   The term "Stream" is used by the CLUE WG to define an encoded Media
523	   Source sent via RTP.  "Capture Encoding" and "Encoding Groups" are
524	   defined to capture specific details of the encoding scheme.

526	   RFC3550 [RFC3550] uses the terms media stream, audio stream, video
527	   stream and streams of (RTP) packets interchangeably.  It defines the
528	   SSRC as "The source of a stream of RTP packets, ..."

529	   The equivalent mapping of a Packet Stream in SDP [RFC4566] is defined
530	   per usage.  For example, each Media Description (m=line) and
531	   associated attributes can describe one Packet Stream OR properties
532	   for multiple Packet Streams OR for an RTP session (via [RFC5576]
533	   mechanisms for example).

535	2.1.10.2.  Characteristics

537	   o  Each Packet Stream is identified by a unique Synchronization
538	      source (SSRC) [RFC3550] that is carried in every RTP and RTP
539	      Control Protocol (RTCP) packet header in a specific RTP session
540	      context.

542	   o  At any given point in time, a Packet Stream can have one and only
543	      one SSRC.

545	   o  Each Packet Stream defines a unique RTP sequence numbering and
546	      timing space.

548	   o  Several Packet Streams may map to a single Media Source via the
549	      source transformations.

551	   o  Several Packet Streams can be carried over a single RTP Session.
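
   These characteristics can be illustrated with a minimal Python
   sketch.  The class names PacketStream and RtpSession and the session
   label are assumptions for this example.  Sequence numbers start at
   zero here for readability, whereas RTP [RFC3550] recommends a random
   initial value.

```python
from dataclasses import dataclass


@dataclass
class PacketStream:
    """One Packet Stream: a single SSRC within one RTP session."""
    session: str       # hypothetical label for the RTP session
    ssrc: int          # 32-bit Synchronization Source identifier
    next_seq: int = 0  # 16-bit RTP sequence number, wraps at 65536

    def next_packet(self, payload):
        """Return (ssrc, sequence number, payload) and advance the counter."""
        packet = (self.ssrc, self.next_seq, payload)
        self.next_seq = (self.next_seq + 1) % 65536
        return packet


class RtpSession:
    """Holds several Packet Streams, each identified by a unique SSRC."""

    def __init__(self, name):
        self.name = name
        self.streams = {}

    def add_stream(self, ssrc):
        if ssrc in self.streams:
            raise ValueError(f"SSRC {ssrc:#010x} already in use in {self.name}")
        stream = PacketStream(session=self.name, ssrc=ssrc)
        self.streams[ssrc] = stream
        return stream


session = RtpSession("audio")
first = session.add_stream(0x1234)
second = session.add_stream(0x5678)
```

   Each stream advances its own sequence numbers, so packets from
   `first` and `second` can carry the same sequence number without
   ambiguity as long as the SSRC is also considered.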

553	2.1.11.  Media Redundancy

555	   Media redundancy is a transformation that generates redundant or
556	   repair packets sent out as a Redundancy Packet Stream to mitigate
557	   network transport impairments, like packet loss and delay.

559	   Media Redundancy exists in many flavors: it may generate independent
560	   Repair Streams that are used in addition to the Source Stream (RTP
561	   Retransmission [RFC4588] and some FEC [RFC5109]), it may generate a
562	   new Source Stream by combining redundancy information with source
563	   information (using XOR FEC [RFC5109] as a redundancy payload
564	   [RFC2198]), or it may completely replace the source information with
565	   only redundancy packets.
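
   The idea behind XOR-based repair can be shown with a short Python
   sketch.  It is a simplified illustration of parity over payload
   bytes only and does not implement the packet format of [RFC5109];
   the function names xor_parity and recover_missing are hypothetical.

```python
def xor_parity(payloads):
    """Return the bytewise XOR of all payloads.

    Shorter payloads are treated as if padded with zero bytes.
    """
    parity = bytearray(max(len(p) for p in payloads))
    for payload in payloads:
        for i, byte in enumerate(payload):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the single missing payload from the survivors and the parity."""
    return xor_parity(list(received) + [parity])


source = [b"abcd", b"wxyz", b"1234"]
repair = xor_parity(source)
# Suppose the second payload is lost in transport.
recovered = recover_missing([source[0], source[2]], repair)
```

   One parity payload can repair at most one loss within the protected
   group.  With payloads of unequal length, the recovered payload
   carries trailing zero padding unless the original length is
   signalled separately.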

567	2.1.12.  Redundancy Packet Stream

569	   A Packet Stream (Section 2.1.10) that contains no original source
570	   data, only redundant data that may be combined with one or more
571	   Received Packet Streams (Section 2.1.14) to produce Repaired Packet
572	   Streams (Section 2.1.17).

574	2.1.13.  Media Transport

575	   A Media Transport defines the transformation that the Packet Streams
576	   (Section 2.1.10) are subjected to by the end-to-end transport from
577	   one RTP sender to one specific RTP receiver (an RTP session may
578	   contain multiple RTP receivers per sender).  Each Media Transport is
579	   defined by a transport association that is identified by a 5-tuple
580	   (source address, source port, destination address, destination port,
581	   transport protocol).  Each transport association normally contains
582	   only a single RTP session, although a proposal exists for sending
583	   multiple RTP sessions over one transport association
584	   [I-D.westerlund-avtcore-transport-multiplexing].
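
   The 5-tuple identification can be sketched in Python as follows.
   The type name TransportAssociation, its field names and the helper
   group_by_transport are assumptions for this example; the addresses
   are taken from the documentation range 192.0.2.0/24.

```python
from typing import NamedTuple


class TransportAssociation(NamedTuple):
    """The 5-tuple that identifies one Media Transport."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str


def group_by_transport(packets):
    """Group (five_tuple, packet) pairs by their transport association."""
    transports = {}
    for five_tuple, packet in packets:
        transports.setdefault(five_tuple, []).append(packet)
    return transports


to_b = TransportAssociation("192.0.2.1", 5004, "192.0.2.2", 5004, "UDP")
to_c = TransportAssociation("192.0.2.1", 5004, "192.0.2.3", 5004, "UDP")
grouped = group_by_transport([(to_b, "p1"), (to_c, "p2"), (to_b, "p3")])
```

   The two associations differ only in destination address, which
   reflects that one RTP sender reaching two receivers uses two Media
   Transports.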

586	2.1.13.1.  Characteristics

588	   o  Media Transport transmits Packet Streams of RTP Packets from a
589	      source transport address to a destination transport address.

591	2.1.13.2.  Media Stream Decomposition

593	   The Media Transport concept sometimes needs to be decomposed into
594	   more steps to enable discussion of what a sender emits that gets
595	   transformed by the network before it is received by the receiver.
596	   Thus, we also provide this Media Transport decomposition (Figure 5).

598	                             Packet Stream
599	                                    |
600	                                    V
601	                       +--------------------------+
602	                       |  Media Transport Sender  |
603	                       +--------------------------+
604	                                    |
605	                             Sent Packet Stream
606	                                    V
607	                       +--------------------------+
608	                       |    Network Transport     |
609	                       +--------------------------+
610	                                    |
611	                        Transported Packet Stream
612	                                    V
613	                       +--------------------------+
614	                       | Media Transport Receiver |
615	                       +--------------------------+
616	                                    |
617	                                    V
618	                           Received Packet Stream

620	                Figure 5: Decomposition of Media Transport

622	2.1.13.2.1.  Media Transport Sender

624	   The first transformation within the Media Transport (Section 2.1.13)
625	   is the Media Transport Sender, where the sending End Point
626	   (Section 2.2.1) takes a Packet Stream and emits the packets onto the
627	   network using the transport association established for this Media
628	   Transport, thus creating a Sent Packet Stream (Section 2.1.13.2.2).
629	   In this process it transforms the Packet Stream in several ways.
630	   First, it adds the necessary protocol headers for the transport
631	   association, for example IP and UDP headers, thus forming IP/UDP/RTP
632	   packets.  In addition, the Media Transport Sender may queue, pace or
633	   otherwise affect how the packets are emitted onto the network, thus
634	   adding the delay, jitter and inter-packet spacing that characterize
635	   the Sent Packet Stream.

637	2.1.13.2.2.  Sent Packet Stream

639	   The Sent Packet Stream is the Packet Stream as it enters the first
640	   hop of the network path to its destination.  The Sent Packet Stream
641	   is identified using network transport addresses; for IP/UDP, this is
642	   the 5-tuple (source IP address, source port, destination IP address,
643	   destination port, and protocol (UDP)).

645	2.1.13.2.3.  Network Transport

647	   Network Transport is the transformation that the Sent Packet Stream
648	   (Section 2.1.13.2.2) is subjected to by traveling from the source to
649	   the destination through the network.  These transformations include
650	   loss of some packets, varying delay on a per-packet basis, packet
651	   duplication, and packet header or data corruption.  These
652	   transformations produce a Transported Packet Stream
653	   (Section 2.1.13.2.4) at the exit of the network path.
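
   The effect of Network Transport can be sketched as a function from a
   Sent Packet Stream to a Transported Packet Stream.  The following
   Python example is a simplified model under stated assumptions: the
   function name, the packet representation, and the way loss,
   duplication and delay are specified per sequence number are all
   hypothetical, and header or data corruption is not modelled.

```python
from typing import Dict, List, Set, Tuple

# (send time in ms, RTP sequence number, payload); hypothetical layout
SentPacket = Tuple[int, int, bytes]
# (arrival time in ms, RTP sequence number, payload)
TransportedPacket = Tuple[int, int, bytes]


def network_transport(
    sent: List[SentPacket],
    lost: Set[int],
    duplicated: Set[int],
    delay_ms: Dict[int, int],
    default_delay_ms: int = 20,
) -> List[TransportedPacket]:
    """Apply loss, duplication and per-packet delay to a packet stream."""
    arrivals: List[TransportedPacket] = []
    for send_ms, seq, payload in sent:
        if seq in lost:
            continue  # the packet never reaches the destination
        arrival = send_ms + delay_ms.get(seq, default_delay_ms)
        arrivals.append((arrival, seq, payload))
        if seq in duplicated:
            arrivals.append((arrival, seq, payload))  # second copy
    # Varying delay can reorder packets; order by arrival time.
    arrivals.sort(key=lambda packet: packet[0])
    return arrivals


sent_stream = [(0, 1, b"a"), (20, 2, b"b"), (40, 3, b"c"), (60, 4, b"d")]
transported = network_transport(
    sent_stream, lost={2}, duplicated={4}, delay_ms={1: 70}
)
arrival_order = [seq for _, seq, _ in transported]
```

   In this example packet 2 is lost, packet 4 arrives twice, and the
   extra delay on packet 1 causes it to arrive after packet 3.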

655	2.1.13.2.4.  Transported Packet Stream

657	   The Packet Stream that is emitted out of the network path at the
658	   destination, subjected to the Network Transport's transformation
659	   (Section 2.1.13.2.3).

661	2.1.13.2.5.  Media Transport Receiver

663	   The receiving End Point's (Section 2.2.1) transformation of the
664	   Transported Packet Stream (Section 2.1.13.2.4) by its reception
665	   process, which results in the Received Packet Stream (Section 2.1.14).
666	   This transformation includes verifying transport checksums and
667	   discarding any packet whose checksum does not match.  Other
668	   transformations can include delay variations in receiving a packet on
669	   the network interface and providing it to the application.

671	2.1.14.  Received Packet Stream

673	   The Packet Stream (Section 2.1.10) resulting from the Media
674	   Transport's transformation, i.e. subjected to packet loss, packet
675	   corruption, packet duplication and varying transmission delay from
676	   sender to receiver.

678	2.1.15.  Received Redundancy Packet Stream

680	   The Redundancy Packet Stream (Section 2.1.12) resulting from the
681	   Media Transport's transformation, i.e. subjected to packet loss,
682	   packet corruption, and varying transmission delay from sender to
683	   receiver.

685	2.1.16.  Media Repair

687	   A Transformation that takes as input one or more Source Packet
688	   Streams (Section 2.1.10) as well as Redundancy Packet Streams
689	   (Section 2.1.12) and attempts to combine them to counter the
690	   transformations introduced by the Media Transport (Section 2.1.13) to
691	   minimize the difference between the Source Stream (Section 2.1.5) and
692	   the Received Source Stream (Section 2.1.21) after the Media Decoder
693	   (Section 2.1.20).  The output is a Repaired Packet Stream
694	   (Section 2.1.17).

696	2.1.17.  Repaired Packet Stream

698	   A Received Packet Stream (Section 2.1.14) for which Received
699	   Redundancy Packet Stream (Section 2.1.15) information has been used
700	   to try to re-create the Packet Stream (Section 2.1.10) as it was
701	   before Media Transport (Section 2.1.13).

703	2.1.18.  Media Depacketizer

705	   A Media Depacketizer takes one or more Packet Streams
706	   (Section 2.1.10) and depacketizes them and attempts to reconstitute
707	   the Encoded Streams (Section 2.1.7) or Dependent Streams
708	   (Section 2.1.8) present in those Packet Streams.

710	2.1.19.  Received Encoded Stream

712	   The received version of an Encoded Stream (Section 2.1.7).

714	2.1.20.  Media Decoder

716	   A Media Decoder is a transformation that is responsible for decoding
717	   Encoded Streams (Section 2.1.7) and any Dependent Streams
718	   (Section 2.1.8) into a Source Stream (Section 2.1.5).

720	2.1.20.1.  Alternate Usages

722	   Within the context of SDP, an m=line describes the necessary
723	   configuration and identification (RTP Payload Types) required to
724	   decode one or more incoming Media Streams.

726	2.1.20.2.  Characteristics

728	   o  A Media Decoder is the entity that will have to deal with any
729	      errors in the encoded streams that resulted from corruptions or
730	      failures to repair packet losses.  Because a Media Decoder is
731	      generally forced to produce some output periodically, it
732	      commonly includes concealment methods.

734	2.1.21.  Received Source Stream

736	   The received version of a Source Stream (Section 2.1.5).

738	2.1.22.  Media Sink

740	   The Media Sink receives a Source Stream (Section 2.1.5) that
741	   contains, usually periodically, sampled media data together with
742	   associated synchronization information.  Depending on application,
743	   this Source Stream then needs to be transformed into a Raw Stream
744	   (Section 2.1.3) that is sent in synchronization with the output from
745	   other Media Sinks to a Media Render (Section 2.1.24).  The media sink
746	   may also be connected with a Media Source (Section 2.1.4) and be used
747	   as part of a conceptual Media Source.

749	2.1.22.1.  Characteristics

751	   o  The media sink can further transform the source stream into a
752	      representation that is suitable for rendering on the Media Render
753	      as defined by the application or system-wide configuration.  This
754	      includes sample scaling, level adjustments, etc.

756	2.1.23.  Received Raw Stream

758	   The received version of a Raw Stream (Section 2.1.3).

760	2.1.24.  Media Render
761	   A Media Render takes a Raw Stream (Section 2.1.3) and converts it
762	   into Physical Stimulus (Section 2.1.1) that a human user can
763	   perceive.  Examples of such devices are screens, D/A converters
764	   connected to amplifiers and loudspeakers.

766	2.1.24.1.  Characteristics

768	   o  An End Point can potentially have multiple Media Renders for each
769	      media type.

771	2.2.  Communication Entities

773	   This section contains concepts for entities involved in the
774	   communication.

776	2.2.1.  End Point

778	   A single addressable entity sending or receiving RTP packets.  It may
779	   be decomposed into several functional blocks, but as long as it
780	   behaves as a single RTP stack entity it is classified as a single
781	   "End Point".

783	2.2.1.1.  Alternate Usages

785	   The CLUE Working Group (WG) uses the terms "Media Provider" and
786	   "Media Consumer" to describe aspects of an End Point pertaining to
787	   sending and receiving functionalities.

789	2.2.1.2.  Characteristics

791	   End Points can be identified in several different ways.  While RTCP
792	   Canonical Names (CNAMEs) [RFC3550] provide a globally unique and
793	   stable identification mechanism for the duration of the Communication
794	   Session (see Section 2.2.5), their validity applies exclusively
795	   within a Synchronization Context (Section 3.1.1).  Thus one End Point
796	   can have multiple CNAMEs.  Therefore, mechanisms outside the scope of
797	   RTP, such as application defined mechanisms, must be used to ensure
798	   End Point identification when outside this Synchronization Context.

800	2.2.2.  RTP Session

802	   An RTP session is an association among a group of participants
803	   communicating with RTP.  It is a group communications channel which
804	   can potentially carry a number of Packet Streams.  Within an RTP
805	   session, every participant can find meta-data and control information
806	   (over RTCP) about all the Packet Streams in the RTP session.  The
807	   bandwidth of the RTCP control channel is shared between all
808	   participants within an RTP Session.

810	2.2.2.1.  Alternate Usages

812	   Within the context of SDP, a single m=line can map to a single RTP
813	   Session or multiple m=lines can map to a single RTP Session.  The
814	   latter is enabled via multiplexing schemes such as BUNDLE
815	   [I-D.ietf-mmusic-sdp-bundle-negotiation], for example, which allows
816	   mapping of multiple m=lines to a single RTP Session.

818	2.2.2.2.  Characteristics

820	   o  Typically, an RTP Session can carry one or more Packet Streams.

822	   o  An RTP Session shares a single SSRC space as defined in RFC3550
823	      [RFC3550].  That is, the End Points participating in an RTP
824	      Session can see an SSRC identifier transmitted by any of the other
825	      End Points.  An End Point can receive an SSRC either as SSRC or as
826	      a Contributing source (CSRC) in RTP and RTCP packets, as defined
827	      by the endpoints' network interconnection topology.

829	   o  An RTP Session uses at least two Media Transports
830	      (Section 2.1.13), one for sending and one for receiving.
831	      Commonly, the receiving one is the reverse direction of the same
832	      one as used for sending.  An RTP Session may use many Media
833	      Transports and these define the session's network interconnection
834	      topology.  A single Media Transport can normally not transport
835	      more than one RTP Session, unless a solution for multiplexing
836	      multiple RTP sessions over a single Media Transport is used.  One
837	      example of such a scheme is Multiple RTP Sessions on a Single
838	      Lower-Layer Transport
839	      [I-D.westerlund-avtcore-transport-multiplexing].

841	   o  Multiple RTP Sessions can be related.

843	2.2.3.  Participant

845	   A participant is an entity reachable by a single signaling address,
846	   and is thus related more to the signaling context than to the media
847	   context.

849	2.2.3.1.  Characteristics

851	   o  A single signaling-addressable entity, using an application-
852	      specific signaling address space, for example a SIP URI.

854	   o  A participant can have several Multimedia Sessions
855	      (Section 2.2.4).

857	   o  A participant can have several associated transport flows,
858	      including several separate local transport addresses for those
859	      transport flows.

861	2.2.4.  Multimedia Session

863	   A multimedia session is an association among a group of participants
864	   engaged in the communication via one or more RTP Sessions
865	   (Section 2.2.2).  It defines logical relationships among Media
866	   Sources (Section 2.1.4) that appear in multiple RTP Sessions.

868	2.2.4.1.  Alternate Usages

870	   RFC4566 [RFC4566] defines a multimedia session as a set of multimedia
871	   senders and receivers and the data streams flowing from senders to
872	   receivers.

874	   RFC3550 [RFC3550] defines it as a set of concurrent RTP sessions among
875	   a common group of participants.  For example, a video conference
876	   (which is a multimedia session) may contain an audio RTP session and
877	   a video RTP session.

879	2.2.4.2.  Characteristics

881	   o  A Multimedia Session can be composed of several parallel RTP
882	      Sessions with potentially multiple Packet Streams per RTP Session.

884	   o  Each participant in a Multimedia Session can have a multitude of
885	      Media Captures and Media Rendering devices.

887	2.2.5.  Communication Session

889	   A Communication Session is an association among a group of participants
890	   communicating with each other via a set of Multimedia Sessions.

892	2.2.5.1.  Alternate Usages

894	   The Session Description Protocol (SDP) [RFC4566] defines a multimedia
895	   session as a set of multimedia senders and receivers and the data
896	   streams flowing from senders to receivers.  In that definition it is
897	   however not clear if a multimedia session includes both the sender's
898	   and the receiver's view of the same RTP Packet Stream.

900	2.2.5.2.  Characteristics

902	   o  Each participant in a Communication Session is identified via an
903	      application-specific signaling address.

905	   o  A Communication Session is composed of at least one Multimedia
906	      Session per participant, involving one or more parallel RTP
907	      Sessions with potentially multiple Packet Streams per RTP Session.

909	   For example, in a full mesh communication, the Communication Session
910	   consists of a set of separate Multimedia Sessions between each pair
911	   of Participants.  Another example is a centralized conference, where
912	   the Communication Session consists of a set of Multimedia Sessions
913	   between each Participant and the conference handler.

915	3.  Relations at Different Levels

917	   This section uses the concepts from the previous section and looks
918	   at different types of relationships among them.  These relationships
919	   occur at different levels and for different purposes.  The section is
920	   organized by the level at which a relation is required.
921	   The reason for the relationship may exist at another step in the
922	   media handling chain.  For example, Simulcast (discussed in
923	   Section 3.3.1) requires relations to be determined at the Packet
924	   Stream level; however, the reason to relate the Packet Streams is
925	   that multiple Media Encoders use the same Media Source, i.e. the
926	   goal is to be able to identify a common Media Source.

928	3.1.  Media Source Relations

930	   Media Sources (Section 2.1.4) are commonly grouped and related to an
931	   End Point (Section 2.2.1) or a Participant (Section 2.2.3).  This
932	   occurs for several reasons, including application logic and media
933	   handling purposes.  These cases are further discussed below.

935	3.1.1.  Synchronization Context

937	   A Synchronization Context defines a requirement on a strong timing
938	   relationship between the Media Sources, typically requiring alignment
939	   of clock sources.  Such a relationship can be identified in multiple
940	   ways as listed below.  A single Media Source can only belong to a
941	   single Synchronization Context, since it is assumed that a single
942	   Media Source can only have a single media clock and requiring
943	   alignment to several Synchronization Contexts (and thus reference
944	   clocks) will effectively merge those into a single Synchronization
945	   Context.

947	   A single Multimedia Session can contain media from one or more
948	   Synchronization Contexts.  An example of that is a Multimedia Session
949	   containing one set of audio and video for communication purposes
950	   belonging to one Synchronization Context, and another set of audio
951	   and video for presentation purposes (like playing a video file) with
952	   a separate Synchronization Context that has no strong timing
953	   relationship and need not be strictly synchronized with the audio and
954	   video used for communication.

956	3.1.1.1.  RTCP CNAME

958	   RFC3550 [RFC3550] describes Inter-media synchronization between RTP
959	   Sessions based on RTCP CNAME, RTP and Network Time Protocol (NTP)
960	   [RFC5905] formatted timestamps of a reference clock.  As indicated in
961	   [I-D.ietf-avtcore-clksrc], despite using NTP format timestamps, it is
962	   not required that the clock be synchronized to an NTP source.
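
   The mapping that underlies this synchronization can be sketched as
   follows.  An RTCP Sender Report pairs an RTP timestamp with a
   reference clock reading, so any later RTP timestamp of the same
   Packet Stream can be placed on the reference clock timeline.  The
   function and parameter names below are assumptions made for this
   illustration, and the reference clock reading is simplified to a
   floating point number of seconds.

```python
def rtp_to_reference_time(
    rtp_ts: int,
    sr_rtp_ts: int,
    sr_reference_seconds: float,
    clock_rate: int,
) -> float:
    """Map an RTP timestamp onto the sender's reference clock."""
    # RTP timestamps are 32 bits wide; take the signed difference so
    # that a wrap-around between the report and the packet is handled.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_reference_seconds + diff / clock_rate


# Two streams that share a CNAME, with hypothetical report values.
audio_time = rtp_to_reference_time(168000, 160000, 1000.0, 8000)
video_time = rtp_to_reference_time(945000, 900000, 1000.5, 90000)
```

   Here both packets map to the same instant on the reference clock, so
   a receiver would render them together.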

964	3.1.1.2.  Clock Source Signaling

966	   [I-D.ietf-avtcore-clksrc] provides a mechanism to signal the clock
967	   source in SDP both for the reference clock as well as the media
968	   clock, thus allowing a Synchronization Context to be defined beyond
969	   the one defined by the usage of CNAME source descriptions.

971	3.1.1.3.  CLUE Scenes

973	   In CLUE "Capture Scene", "Capture Scene Entry" and "Captures" define
974	   an implied Synchronization Context.

976	3.1.1.4.  Implicitly via RtcMediaStream

978	   The WebRTC WG defines "RtcMediaStream" with one or more
979	   "RtcMediaStreamTracks".  All tracks in an "RtcMediaStream" are
980	   intended to be synchronizable when rendered.

982	3.1.1.5.  Explicitly via SDP Mechanisms

984	   RFC5888 [RFC5888] defines an m=line grouping mechanism called "Lip
985	   Synchronization (LS)" for establishing the synchronization
986	   requirement across m=lines when they map to individual sources.

988	   RFC5576 [RFC5576] extends the above mechanism when multiple media
989	   sources are described by a single m=line.

991	3.1.2.  End Point

993	   Some applications require knowledge of which Media Sources originate
994	   from a particular End Point (Section 2.2.1).  This can support such
995	   decisions as packet routing between parts of the topology, which
996	   requires knowing the End Point origin of the Packet Streams.

998	   In RTP, this identification has been overloaded with the
999	   Synchronization Context through the usage of the source description
1000	   CNAME item.  This works for some usages, but sometimes it breaks
1001	   down, for example if an End Point has two sets of Media Sources
1002	   that have different Synchronization Contexts, like the audio and
1003	   video of the human participant as well as a set of Media Sources of
1004	   audio and video for a shared movie.  Thus, an End Point may have
1005	   multiple CNAMEs.  The CNAMEs or the Media Sources themselves can be
1006	   related to the End Point.
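
   The following Python sketch illustrates the situation described
   above.  Packet Streams are grouped by CNAME to find Synchronization
   Contexts, and a separate application-defined table relates CNAMEs to
   End Points.  All names, SSRC values and CNAME strings are
   hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from typing import Dict, Set


def group_by_cname(cname_of_ssrc: Dict[int, str]) -> Dict[str, Set[int]]:
    """Group SSRCs that share a CNAME (one Synchronization Context)."""
    contexts: Dict[str, Set[int]] = defaultdict(set)
    for ssrc, cname in cname_of_ssrc.items():
        contexts[cname].add(ssrc)
    return dict(contexts)


def ssrcs_of_end_point(
    end_point: str,
    end_point_of_cname: Dict[str, str],
    contexts: Dict[str, Set[int]],
) -> Set[int]:
    """Collect all SSRCs whose CNAME the application maps to end_point."""
    result: Set[int] = set()
    for cname, ssrcs in contexts.items():
        if end_point_of_cname.get(cname) == end_point:
            result |= ssrcs
    return result


# CNAMEs as learned from RTCP source descriptions (example values).
cname_of_ssrc = {
    0x1111: "camera@example",  # participant audio
    0x2222: "camera@example",  # participant video
    0x3333: "movie@example",   # shared movie audio
    0x4444: "movie@example",   # shared movie video
}
# Relation supplied by the application, outside of RTP (assumption).
end_point_of_cname = {"camera@example": "A", "movie@example": "A"}

contexts = group_by_cname(cname_of_ssrc)
sources_of_a = ssrcs_of_end_point("A", end_point_of_cname, contexts)
```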

1008	3.1.3.  Participant

1010	   In communication scenarios, one commonly needs to know which Media
1011	   Sources originate from which Participant (Section 2.2.3).  This
1012	   enables the application to, for example, display Participant Identity
1013	   information correctly associated with the Media Sources.  This
1014	   association is currently handled through the signaling solution to
1015	   point at a specific Multimedia Session where the Media Sources may be
1016	   explicitly or implicitly tied to a particular End Point.

1018	   Participant information becomes more problematic due to Media Sources
1019	   that are generated through mixing or other conceptual processing of
1020	   Raw Streams or Source Streams that originate from different
1021	   Participants.  Media Sources of this type can thus have a dynamically
1022	   varying set of origins and Participants.  RTP contains the concept of
1023	   Contributing Sources (CSRC) that carry such information about the
1024	   previous step origin of the included media content on RTP level.

1026	3.1.4.  WebRTC MediaStream

1028	   An RtcMediaStream, in addition to requiring a single Synchronization
1029	   Context as discussed above, is also an explicit grouping of a set of
1030	   Media Sources, as identified by RtcMediaStreamTracks, within the
1031	   RtcMediaStream.

1033	3.2.  Packetization Time Relations

1035	   At RTP Packetization time, there exists a possibility for a number of
1036	   different types of relationships between Encoded Streams
1037	   (Section 2.1.7), Dependent Streams (Section 2.1.8) and Packet Streams
1038	   (Section 2.1.10).  These are caused by grouping together or
1039	   distributing these different types of streams into Packet Streams.
1040	   This section will look at such relationships.

1042	3.2.1.  Single Stream Transport of SVC

1044	   Scalable Video Coding [RFC6190] has a mode of operation where Encoded
1045	   Streams and Dependent Streams from the SVC Media Encoder are grouped
1046	   together in a single Source Packet Stream using the SVC RTP Payload
1047	   format.

1049	3.2.2.  Multi-Channel Audio

1051	   There exist a number of RTP payload formats that can carry multi-
1052	   channel audio, despite the codec being a mono encoder.  Multi-channel
1053	   audio can be viewed as multiple Media Sources sharing a common
1054	   Synchronization Context.  These are then independently encoded by a
1055	   Media Encoder and the different Encoded Streams are then packetized
1056	   together in a time synchronized way into a single Source Packet
1057	   Stream using the codec's RTP Payload format.  Examples of such
1058	   codecs are PCMA and PCMU [RFC3551], AMR [RFC4867], and G.719
1059	   [RFC5404].

1061	3.2.3.  Redundancy Format

1063	   The RTP Payload for Redundant Audio Data [RFC2198] defines how one
1064	   can transport redundant audio data together with primary data in the
1065	   same RTP payload.  The redundant data can be a time delayed version
1066	   of the primary or another time delayed Encoded stream using a
1067	   different Media Encoder to encode the same Media Source as the
1068	   primary, as depicted below in Figure 6.

1070	              +--------------------+
1071	              |    Media Source    |
1072	              +--------------------+
1073	                        |
1074	                   Source Stream
1075	                        |
1076	                        +------------------------+
1077	                        |                        |
1078	                        V                        V
1079	              +--------------------+   +--------------------+
1080	              |   Media Encoder    |   |   Media Encoder    |
1081	              +--------------------+   +--------------------+
1082	                        |                        |
1083	                        |                 +------------+
1084	                  Encoded Stream          | Time Delay |
1085	                        |                 +------------+
1086	                        |                        |
1087	                        |     +------------------+
1088	                        V     V
1089	              +--------------------+
1090	              |  Media Packetizer  |
1091	              +--------------------+
1092	                        |
1093	                        V
1094	                 Packet Stream

1096	   Figure 6: Concept for usage of Audio Redundancy with different Media
1097	                                 Encoders

1099	   The Redundancy format thus provides the necessary meta
1100	   information to correctly relate different parts of the same Encoded
1101	   Stream, or in the case depicted above (Figure 6) relate the Received
1102	   Source Stream fragments coming out of different Media Decoders to be
1103	   able to combine them together into a less erroneous Source Stream.
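
   As a sketch of how this meta information is carried, the following
   Python example packs and unpacks a redundant audio payload using the
   block header layout of [RFC2198]: each redundant block has a 4-octet
   header (F bit set, 7-bit payload type, 14-bit timestamp offset,
   10-bit block length), and the primary block has a final 1-octet
   header (F bit clear, 7-bit payload type).  The function names and
   the example values are assumptions made for this illustration.

```python
import struct
from typing import List, Tuple

# (payload type, timestamp offset, encoded data) of one redundant block
RedundantBlock = Tuple[int, int, bytes]


def pack_red(primary_pt: int, primary: bytes,
             redundant: List[RedundantBlock]) -> bytes:
    """Build a redundancy payload: headers first, then the data blocks."""
    headers = b""
    for pt, ts_offset, data in redundant:
        if not (0 <= pt < 128 and 0 <= ts_offset < 1 << 14
                and len(data) < 1 << 10):
            raise ValueError("field does not fit the block header")
        word = (1 << 31) | (pt << 24) | (ts_offset << 10) | len(data)
        headers += struct.pack("!I", word)
    headers += bytes([primary_pt & 0x7F])  # F bit clear: last header
    return headers + b"".join(d for _, _, d in redundant) + primary


def unpack_red(payload: bytes) -> Tuple[int, bytes, List[RedundantBlock]]:
    """Split a redundancy payload into primary and redundant blocks."""
    fields = []
    pos = 0
    while payload[pos] & 0x80:  # F bit set: another header follows
        (word,) = struct.unpack_from("!I", payload, pos)
        fields.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF,
                       word & 0x3FF))
        pos += 4
    primary_pt = payload[pos] & 0x7F
    pos += 1
    blocks: List[RedundantBlock] = []
    for pt, ts_offset, length in fields:
        blocks.append((pt, ts_offset, payload[pos:pos + length]))
        pos += length
    return primary_pt, payload[pos:], blocks


red_payload = pack_red(0, b"\x01\x02", [(5, 160, b"\xaa")])
```

   The timestamp offset in each redundant block header is what lets a
   receiver relate that block to the primary data of an earlier packet.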

1105	3.3.  Packet Stream Relations

1107	   This section discusses various cases of relationships among Packet
1108	   Streams.  This is a common relation to handle in RTP, because
1109	   Packet Streams are separate and have their own SSRC, implying
1110	   independent sequence numbers and timestamp spaces.  The underlying
1111	   reasons for the Packet Stream relationships are different, as can be
1112	   seen in the cases below.  The different Packet Streams can be handled
1113	   within the same RTP Session or different RTP Sessions to accomplish
1114	   different transport goals.  This separation of Packet Streams is
1115	   further discussed in Section 3.3.4.

1117	3.3.1.  Simulcast

1119	   A Media Source represented as multiple independent Encoded Streams
1120	   constitutes a simulcast of that Media Source.  Figure 7 below
1121	   represents an example of a Media Source that is encoded into three
1122	   separate and different Simulcast streams, that are in turn sent on
1123	   the same Media Transport flow.  When using Simulcast, the Packet
1124	   Streams may be sharing RTP Session and Media Transport, or be
1125	   separated on different RTP Sessions and Media Transports, or be any
1126	   combination of these two.  Other considerations determine
1127	   which usage is desirable, as discussed in Section 3.3.4.

1129	                           +----------------+
1130	                           |  Media Source  |
1131	                           +----------------+
1132	                    Source Stream  |
1133	            +----------------------+----------------------+
1134	            |                      |                      |
1135	            v                      v                      v
1136	   +------------------+   +------------------+   +------------------+
1137	   |  Media Encoder   |   |  Media Encoder   |   |  Media Encoder   |
1138	   +------------------+   +------------------+   +------------------+
1139	            | Encoded              | Encoded              | Encoded
1140	            | Stream               | Stream               | Stream
1141	            v                      v                      v
1142	   +------------------+   +------------------+   +------------------+
1143	   | Media Packetizer |   | Media Packetizer |   | Media Packetizer |
1144	   +------------------+   +------------------+   +------------------+
1145	            | Source               | Source               | Source
1146	            | Packet               | Packet               | Packet
1147	            | Stream               | Stream               | Stream
1148	            +-----------------+    |    +-----------------+
1149	                              |    |    |
1150	                              V    V    V
1151	                         +-------------------+
1152	                         |  Media Transport  |
1153	                         +-------------------+

1155	                Figure 7: Example of Media Source Simulcast

1157	   The simulcast relation between the Packet Streams is the common Media
1158	   Source.  In addition to being able to identify the common Media
1159	   Source, a receiver of the Packet Stream may need to know which
1160	   configuration or encoding goals lay behind the produced Encoded
1161	   Stream and its properties.  This enables selection of the stream
1162	   that is most useful in the application at that moment.

1164	3.3.2.  Layered Multi-Stream Transmission

1166	   Multi-stream transmission (MST) is a mechanism by which different
1167	   portions of a layered encoding of a Source Stream are sent using
1168	   separate Packet Streams (sometimes in separate RTP sessions).  MSTs
1169	   are useful for receiver control of layered media.

1171	   A Media Source represented as an Encoded Stream and multiple
1172	   Dependent Streams constitutes a Media Source that has layered
1173	   dependency.  The figure below represents an example of a Media Source
1174	   that is encoded into three dependent layers, where two layers are
1175	   sent on the same Media Transport using different Packet Streams, i.e.
1176	   SSRCs, and the third layer is sent on a separate Media Transport,
1177	   i.e. a different RTP Session.

1179	                            +----------------+
1180	                            |  Media Source  |
1181	                            +----------------+
1182	                                    |
1183	                                    |
1184	                                    V
1185	       +---------------------------------------------------------+
1186	       |                      Media Encoder                      |
1187	       +---------------------------------------------------------+
1188	               |                    |                     |
1189	        Encoded Stream       Dependent Stream     Dependent Stream
1190	               |                    |                     |
1191	               V                    V                     V
1192	       +----------------+   +----------------+   +----------------+
1193	       |Media Packetizer|   |Media Packetizer|   |Media Packetizer|
1194	       +----------------+   +----------------+   +----------------+
1195	               |                    |                     |
1196	         Packet Stream         Packet Stream        Packet Stream
1197	               |                    |                     |
1198	               +------+      +------+                     |
1199	                      |      |                            |
1200	                      V      V                            V
1201	                +-----------------+              +-----------------+
1202	                | Media Transport |              | Media Transport |
1203	                +-----------------+              +-----------------+

1205	           Figure 8: Example of Media Source Layered Dependency

1207	   The SVC MST relation needs to identify the common Media Encoder
1208	   origin for the Encoded and Dependent Streams.  The SVC RTP Payload
1209	   RFC is not particularly explicit about how this relation is to be
1210	   implemented.  When using different RTP Sessions, thus different Media
1211	   Transports, and as long as there is only one Packet Stream per Media
1212	   Encoder and a single Media Source in each RTP Session, common SSRC
1213	   and CNAMEs can be used to identify the common Media Source.  When
1214	   multiple Packet Streams are sent from one Media Encoder in the same
1215	   RTP Session, then CNAME is the only currently specified RTP
1216	   identifier that can be used.  In cases where multiple Media Encoders
1217	   use multiple Media Sources sharing Synchronization Context, and thus
1218	   having a common CNAME, additional heuristics need to be applied to
1219	   create the MST relationship between the Packet Streams.

1221	3.3.3.  Robustness and Repair

1223	   Packet Streams may be protected by Redundancy Packet Streams during
1224	   transport.  Several approaches listed below can achieve the same
1225	   result:

1227	   o  Duplication of the original Packet Stream,

1229	   o  Duplication of the original Packet Stream with a time offset,

1231	   o  Forward Error Correction (FEC) techniques, and

1233	   o  Retransmission of lost packets (either globally or selectively).

1235	3.3.3.1.  RTP Retransmission

1237	   The figure below (Figure 9) represents an example where a Media
1238	   Source's Source Packet Stream is protected by a retransmission (RTX)
1239	   flow [RFC4588].  In this example the Source Packet Stream and the
1240	   Redundancy Packet Stream share the same Media Transport.

1242	          +--------------------+
1243	          |    Media Source    |
1244	          +--------------------+
1245	                    |
1246	                    V
1247	          +--------------------+
1248	          |   Media Encoder    |
1249	          +--------------------+
1250	                    |                              Retransmission
1251	              Encoded Stream     +--------+     +---- Request
1252	                    V            |        V     V
1253	          +--------------------+ | +--------------------+
1254	          |  Media Packetizer  | | | RTP Retransmission |
1255	          +--------------------+ | +--------------------+
1256	                    |            |           |
1257	                    +------------+  Redundancy Packet Stream
1258	             Source Packet Stream            |
1259	                    |                        |
1260	                    +---------+    +---------+
1261	                              |    |
1262	                              V    V
1263	                       +-----------------+
1264	                       | Media Transport |
1265	                       +-----------------+

1267	          Figure 9: Example of Media Source Retransmission Flows

1269	   The RTP Retransmission example (Figure 9) helps illustrate that this
1270	   mechanism works purely on the Source Packet Stream.  The RTP
1271	   Retransmission transform buffers the sent Source Packet Stream and
1272	   upon request emits a retransmitted packet with an extra payload
1273	   header as a Redundancy Packet Stream.  The RTP Retransmission
1274	   mechanism [RFC4588] is specified so that there is a one-to-one
1275	   relation between the Source Packet Stream and the Redundancy Packet
1276	   Stream.  Thus a Redundancy Packet Stream needs to be associated with
1277	   its Source Packet Stream upon being received.  This is done based on
1278	   CNAME selectors and heuristics to match requested packets for a given
1279	   Source Packet Stream with the original sequence number in the payload
1280	   of any new Redundancy Packet Stream using the RTX payload format.  In
1281	   cases where the Redundancy Packet Stream is sent in a separate RTP
1282	   Session from the Source Packet Stream, these sessions are related,
1283	   e.g. using the SDP Media Grouping's [RFC5888] FID semantics.
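
   As a rough illustration of the association step described above, the
   following Python sketch parses the original sequence number (OSN)
   that the RTX payload format [RFC4588] carries at the start of the
   payload, and matches it against outstanding retransmission requests.
   The bookkeeping table and the function names are assumptions made
   for this example only; they are not defined by [RFC4588].

```python
import struct


def parse_rtx_payload(payload: bytes):
    """Split an RTX payload into (original sequence number, media payload).

    RFC 4588 places the 16-bit original sequence number (OSN), in network
    byte order, at the start of the retransmission payload.
    """
    if len(payload) < 2:
        raise ValueError("RTX payload too short to carry an OSN")
    (osn,) = struct.unpack("!H", payload[:2])
    return osn, payload[2:]


def associate_rtx(outstanding, rtx_cname, osn):
    """Return the SSRC of the Source Packet Stream an RTX packet repairs.

    `outstanding` is a hypothetical bookkeeping table:
        source SSRC -> (CNAME, set of sequence numbers requested).
    The match is accepted only when it is unambiguous; otherwise None.
    """
    matches = [
        ssrc
        for ssrc, (cname, requested) in outstanding.items()
        if cname == rtx_cname and osn in requested
    ]
    return matches[0] if len(matches) == 1 else None
```

   Refusing to associate when more than one Source Packet Stream has
   requested the same sequence number is one conservative way to apply
   the heuristic; a real receiver may need additional state.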

1285	3.3.3.2.  Forward Error Correction

1287	   The figure below (Figure 10) represents an example where two Media
1288	   Sources' Source Packet Streams are protected by FEC.  Source Packet
1289	   Stream A has a Media Redundancy transformation in FEC Encoder 1.
1290	   This produces Redundancy Packet Stream 1, which is related only to
1291	   Source Packet Stream A.  FEC Encoder 2, however, takes two Source
1292	   Packet Streams (A and B) and produces Redundancy Packet Stream 2,
1293	   which protects them together, i.e. Redundancy Packet Stream 2 relates
1294	   to two Source Packet Streams (a FEC group).  FEC decoding, when
1295	   needed due to packet loss or packet corruption at the receiver,
1296	   requires knowledge of which Source Packet Streams the FEC encoding
1297	   was based on.

1299	   In Figure 10 all Packet Streams are sent on the same Media Transport.
1300	   This is however not the only possible choice.  Numerous combinations
1301	   exist for spreading these Packet Streams over different Media
1302	   Transports to achieve the communication application's goal.

1304	       +--------------------+                +--------------------+
1305	       |   Media Source A   |                |   Media Source B   |
1306	       +--------------------+                +--------------------+
1307	                 |                                     |
1308	                 V                                     V
1309	       +--------------------+                +--------------------+
1310	       |   Media Encoder A  |                |   Media Encoder B  |
1311	       +--------------------+                +--------------------+
1312	                 |                                     |
1313	           Encoded Stream                        Encoded Stream
1314	                 V                                     V
1315	       +--------------------+                +--------------------+
1316	       | Media Packetizer A |                | Media Packetizer B |
1317	       +--------------------+                +--------------------+
1318	                 |                                     |
1319	       Source Packet Stream A                Source Packet Stream B
1320	                 |                                     |
1321	           +-----+-------+-------------+       +-------+------+
1322	           |             V             V       V              |
1323	           |    +---------------+  +---------------+          |
1324	           |    | FEC Encoder 1 |  | FEC Encoder 2 |          |
1325	           |    +---------------+  +---------------+          |
1326	           |             |                 |                  |
1327	           |     Redundancy PS 1    Redundancy PS 2           |
1328	           V             V                 V                  V
1329	       +----------------------------------------------------------+
1330	       |                    Media Transport                       |
1331	       +----------------------------------------------------------+

1333	                      Figure 10: Example of FEC Flows

1335	   As FEC Encoding exists in various forms, there are many methods for
1336	   relating FEC Redundancy Packet Streams to their source information
1337	   in Source Packet Streams.  The XOR based RTP FEC Payload format
1338	   [RFC5109] is defined in such a way that a Redundancy Packet Stream
1339	   has a one-to-one relation with a Source Packet Stream.  In fact, the
1340	   RFC requires the Redundancy Packet Stream to use the same SSRC as the
1341	   Source Packet Stream.  This requires either using a separate RTP
1342	   session or using the Redundancy RTP Payload format [RFC2198].  The
1343	   underlying relation requirement for this FEC format and a particular
1344	   Redundancy Packet Stream is to know the related Source Packet Stream,
1345	   including its SSRC.
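
   The reason a FEC decoder must know exactly which source packets a
   repair packet covers can be seen from the XOR principle itself.  The
   Python sketch below is a simplified illustration under stated
   assumptions: it operates on bare payloads, pads shorter payloads with
   zero octets, and omits the header and length recovery fields that
   [RFC5109] defines.  The function names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(payloads):
    """Build one parity payload protecting all payloads in the group."""
    parity = b""
    for payload in payloads:
        parity = xor_bytes(parity, payload)
    return parity


def recover_missing(received, parity):
    """Recover the single missing payload of a protected group.

    `received` must hold every payload of the group except the lost one;
    XORing them with the parity leaves only the lost payload.
    """
    result = parity
    for payload in received:
        result = xor_bytes(result, payload)
    return result
```

   If the receiver XORs in payloads from a Source Packet Stream that was
   not part of the FEC group, the result is garbage, which is why the
   relation between Redundancy and Source Packet Streams has to be known.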

1347	3.3.4.  Packet Stream Separation

1349	   Packet Streams can be separated exclusively based on their SSRCs or
1350	   at the RTP Session level or at the Multi-Media Session level as
1351	   explained below.

1353	   When the Packet Streams that have a relationship are all sent in the
1354	   same RTP Session and are uniquely identified based on their SSRC
1355	   only, it is termed an SSRC-Only Based Separation.  Such streams can
1356	   be related via RTCP CNAME to identify that the streams belong to the
1357	   same End Point.  [RFC5576]-based approaches, when used, can
1358	   explicitly relate various such Packet Streams.

1360	   On the other hand, when Packet Streams that are related are sent in
1361	   different RTP Sessions to achieve separation, this is known as RTP
1362	   Session-based separation.  This is commonly used when the different
1363	   Packet Streams are intended for different Media Transports.

1366	   Several mechanisms that use RTP Session-based separation rely on it
1367	   to enable an implicit grouping mechanism expressing the relationship.
1368	   The solutions have been based on using the same SSRC value in the
1369	   different RTP Sessions to implicitly indicate their relation.  That
1370	   way, no explicit RTP level mechanism has been needed, only signalling
1371	   level relations have been established using semantics from the
1372	   Grouping of Media Lines framework [RFC5888].  Examples of this are
1373	   RTP Retransmission [RFC4588], SVC Multi Stream Transmission [RFC6190]
1374	   and XOR Based FEC [RFC5109].  RTCP CNAME explicitly relates Packet
1375	   Streams across different RTP Sessions, as explained in the previous
1376	   section.  Such a relationship can be used to perform inter-media
1377	   synchronization.
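
   The two kinds of relation discussed above, the explicit one via RTCP
   CNAME and the implicit one via a shared SSRC value in different RTP
   Sessions, can be sketched as simple lookups.  The following Python
   fragment is only an illustration; the tuple layout (RTP Session
   label, SSRC, CNAME) and the function names are assumptions made for
   this example.

```python
from collections import defaultdict


def streams_by_cname(streams):
    """Group (session, ssrc) pairs by the CNAME they report in RTCP."""
    groups = defaultdict(list)
    for session, ssrc, cname in streams:
        groups[cname].append((session, ssrc))
    return dict(groups)


def ssrcs_shared_across_sessions(streams):
    """Find SSRC values that appear in more than one RTP Session.

    Such reuse is the implicit relation that mechanisms using RTP
    Session-based separation have relied on.
    """
    sessions = defaultdict(set)
    for session, ssrc, _cname in streams:
        sessions[ssrc].add(session)
    return {ssrc: sorted(s) for ssrc, s in sessions.items() if len(s) > 1}
```

   Note that the SSRC-based lookup is only meaningful when signalling
   has already related the RTP Sessions in question; an SSRC value that
   happens to coincide in unrelated sessions carries no meaning.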

1379	   Packet Streams that are related and need to be associated can be part
1380	   of different Multi-Media Sessions, rather than just different RTP
1381	   sessions within the same Multi-Media Session context.  This puts
1382	   further demands on the scope of the mechanism(s) and their handling
1383	   of identifiers used for expressing the relationships.

1385	3.4.  Multiple RTP Sessions over one Media Transport

1387	   [I-D.westerlund-avtcore-transport-multiplexing] describes a mechanism
1388	   that allows several RTP Sessions to be carried over a single
1389	   underlying Media Transport.  The main reasons for doing this are
1390	   related to the impact of using one Media Transport rather than
1391	   several: the RTP Sessions share a common network path instead of
1392	   potentially having different ones, there is reduced need for NAT/FW
1393	   traversal resources, and there is no need for flow based QoS.

1395	   However, Multiple RTP Sessions over one Media Transport makes it
1396	   clear that a single Media Transport 5-tuple is not sufficient to
1397	   express which RTP Session context a particular Packet Stream exists
1398	   in.  Complexities in the relationship between Media Transports and
1399	   RTP Sessions already exist, as one RTP Session can contain multiple
1400	   Media Transports, e.g. even a Peer-to-Peer RTP Session with RTP/RTCP
1401	   Multiplexing requires two Media Transports, one in each direction.
1402	   The relationship between Media Transports and RTP Sessions as well as
1403	   additional levels of identifiers need to be considered in both
1404	   signalling design and when defining terminology.
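
   The need for an additional level of identifier can be sketched as
   follows.  In this Python illustration the RTP Session context is
   looked up from the pair (5-tuple, session identifier) rather than
   from the 5-tuple alone.  The `session_id` value, the class names and
   the addresses are hypothetical; the sketch does not reproduce the
   wire format of any particular multiplexing proposal.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class FiveTuple(NamedTuple):
    """Identifies a Media Transport flow in one direction."""
    protocol: str
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int


class Demultiplexer:
    """Maps (Media Transport, session identifier) to an RTP Session."""

    def __init__(self) -> None:
        self._sessions: Dict[Tuple[FiveTuple, int], str] = {}

    def register(self, transport: FiveTuple, session_id: int,
                 rtp_session: str) -> None:
        self._sessions[(transport, session_id)] = rtp_session

    def lookup(self, transport: FiveTuple,
               session_id: int) -> Optional[str]:
        # The 5-tuple alone is ambiguous once several RTP Sessions
        # share the transport, so the identifier is part of the key.
        return self._sessions.get((transport, session_id))
```

   With such a key, the same SSRC value can appear in two RTP Sessions
   on one Media Transport without the Packet Streams being confused.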

1406	4.  Topologies and Communication Entities

1408	   This section reviews some communication topologies and looks at the
1409	   relationship among the communication entities that are defined in
1410	   Section 2.2.  This section doesn't deal with discussions about the
1411	   streams and their relation to the transport.  Instead, it covers the
1412	   aspects that enable the transport of those streams.  For example, the
1413	   Media Transports (Section 2.1.13) that exist between the End Points
1414	   (Section 2.2.1) that are part of an RTP session (Section 2.2.2) and
1415	   their relationship to the Multi-Media Session (Section 2.2.4) between
1416	   Participants (Section 2.2.3) and the established Communication
1417	   Session (Section 2.2.5) are explained.

1419	4.1.  Point-to-Point Communication

1421	   Figure 11 gives a high level representation of a very basic
1422	   point-to-point communication session between A and B.  It uses two
1423	   different RTP sessions, one for audio and one for video, between A's
1424	   and B's end points.  Assume that the Multi-Media Session shared by
1425	   the participants is established using SIP (i.e., there is a SIP
1426	   Dialog between A and B).

1428	                            +---+         +---+
1429	                            | A |<------->| B |
1430	                            +---+         +---+

1432	                  Figure 11: Point to Point Communication

1434	   However, this picture gets slightly more complex when redrawn using
1435	   the communication entities concepts defined earlier in this document.

1437	       +-----------------------------------------------------------+
1438	       | Communication Session                                     |
1439	       |                                                           |
1440	       | +----------------+                     +----------------+ |
1441	       | | Participant A  |   +-------------+   | Participant B  | |
1442	       | |                |   | Multi-Media |   |                | |
1443	       | | +-------------+|<=>| Session     |<=>|+-------------+ | |
1444	       | | | End Point A ||   |(SIP Dialog) |   || End Point B | | |
1445	       | | |             ||   +-------------+   ||             | | |
1446	       | | | +-----------++---------------------++-----------+ | | |
1447	       | | | | RTP Session|                     |            | | | |
1448	       | | | | Audio      |---Media Transport-->|            | | | |
1449	       | | | |            |<--Media Transport---|            | | | |
1450	       | | | +-----------++---------------------++-----------+ | | |
1451	       | | |             ||                     ||             | | |
1452	       | | | +-----------++---------------------++-----------+ | | |
1453	       | | | | RTP Session|                     |            | | | |
1454	       | | | | Video      |---Media Transport-->|            | | | |
1455	       | | | |            |<--Media Transport---|            | | | |
1456	       | | | +-----------++---------------------++-----------+ | | |
1457	       | | +-------------+|                     |+-------------+ | |
1458	       | +----------------+                     +----------------+ |
1459	       +-----------------------------------------------------------+

1461	   Figure 12: Point to Point Communication Session with two RTP Sessions

1462	   Figure 12 shows that the two RTP Sessions exist only between the two
1463	   End Points A and B and over their respective Media Transports.  The
1464	   Multi-Media Session establishes the association between the two
1465	   Participants and configures these RTP sessions and the Media
1466	   Transports that are used.

1468	4.2.  Central Conferencing

1470	   This section looks at the central conferencing communication
1471	   topology, where a number of participants, like A, B, C, and D in
1472	   Figure 13, communicate using an RTP mixer.

1474	   +---+      +------------+      +---+
1475	   | A |<---->|            |<---->| B |
1476	   +---+      |            |      +---+
1477	              |   Mixer    |
1478	   +---+      |            |      +---+
1479	   | C |<---->|            |<---->| D |
1480	   +---+      +------------+      +---+

1482	          Figure 13: Centralized Conferencing using an RTP Mixer

1484	   In this case each of the Participants establishes its Multi-Media
1485	   Session with the Conference Bridge.  Thus, negotiation for the
1486	   establishment of the used RTP sessions and their configuration
1487	   happens between these entities.  The participants have their End
1488	   Points (A, B, C, D) and the Conference Bridge has the host running
1489	   the RTP mixer, referred to as End Point M in Figure 14.  However,
1490	   despite the individual establishment of four Multi-Media Sessions and
1491	   the corresponding Media Transports for each of the RTP sessions
1492	   between the respective End Points and the Conference Bridge, there
1493	   are actually only two RTP sessions, one for audio and one for video,
1494	   as these RTP sessions are, in this topology, shared between all the
1495	   Participants.

1497	   +-------------------------------------------------------------------+
1498	   | Communication Session                                             |
1499	   |                                                                   |
1500	   | +----------------+                             +----------------+ |
1501	   | | Participant A  |       +-------------+       | Conference     | |
1502	   | |                |       | Multi-Media |       | Bridge         | |
1503	   | | +-------------+|<=====>| Session A   |<=====>|+-------------+ | |
1504	   | | | End Point A ||       |(SIP Dialog) |       || End Point M | | |
1505	   | | |             ||       +-------------+       ||             | | |
1506	   | | | +-----------++-----------------------------++-----------+ | | |
1507	   | | | | RTP Session|                             |            | | | |
1508	   | | | | Audio      |-------Media Transport------>|            | | | |
1509	   | | | |            |<------Media Transport-------|            | | | |
1510	   | | | +-----------++-----------------------------++------+    | | | |
1511	   | | |             ||                             ||      |    | | | |
1512	   | | | +-----------++-----------------------------++----+ |    | | | |
1513	   | | | | RTP Session|                             |     | |    | | | |
1514	   | | | | Video      |-------Media Transport------>|     | |    | | | |
1515	   | | | |            |<------Media Transport-------|     | |    | | | |
1516	   | | | +-----------++-----------------------------++    | |    | | | |
1517	   | | +-------------+|                             ||    | |    | | | |
1518	   | +----------------+                             ||    | |    | | | |
1519	   |                                                ||    | |    | | | |
1520	   | +----------------+                             ||    | |    | | | |
1521	   | | Participant B  |       +-------------+       ||    | |    | | | |
1522	   | |                |       | Multi-Media |       ||    | |    | | | |
1523	   | | +-------------+|<=====>| Session B   |<=====>||    | |    | | | |
1524	   | | | End Point B ||       |(SIP Dialog) |       ||    | |    | | | |
1525	   | | |             ||       +-------------+       ||    | |    | | | |
1526	   | | | +-----------++-----------------------------++    | |    | | | |
1527	   | | | | RTP Session|                             |     | |    | | | |
1528	   | | | | Video      |-------Media Transport------>|     | |    | | | |
1529	   | | | |            |<------Media Transport-------|     | |    | | | |
1530	   | | | +-----------++-----------------------------++----+ |    | | | |
1531	   | | |             ||                             ||      |    | | | |
1532	   | | | +-----------++-----------------------------++------+    | | | |
1533	   | | | | RTP Session|                             |            | | | |
1534	   | | | | Audio      |-------Media Transport------>|            | | | |
1535	   | | | |            |<------Media Transport-------|            | | | |
1536	   | | | +-----------++-----------------------------++-----------+ | | |
1537	   | | +-------------+|                             |+-------------+ | |
1538	   | +----------------+                             +----------------+ |
1539	   +-------------------------------------------------------------------+

1541	       Figure 14: Central Conferencing with Two Participants A and B
1542	                  communicating over a Conference Bridge

1544	   It is important to stress that in the case of Figure 14, it might
1545	   appear that the Multi-Media Sessions' context is scoped between A
1546	   and B over M.  This might not always be true, and they can have
1547	   contexts that extend further.  In this case the RTP session and its
1548	   common SSRC space go beyond what occurs between A and M and between
1549	   B and M, respectively.

1551	4.3.  Full Mesh Conferencing

1552	   This section looks at the case where the three Participants (A, B and
1553	   C) wish to communicate.  They establish individual Multi-Media
1554	   Sessions and RTP sessions between themselves and the other two peers.
1555	   Thus, each Participant provides two copies of its media, one to each
1556	   of the other participants.  Figure 15 shows a high level
1557	   representation of such a topology.

1559	                             +---+      +---+
1560	                             | A |<---->| B |
1561	                             +---+      +---+
1562	                               ^         ^
1563	                                \       /
1564	                                 \     /
1565	                                  v   v
1566	                                  +---+
1567	                                  | C |
1568	                                  +---+

1570	   Figure 15: Full Mesh Conferencing with three Participants A, B and C

1572	   In this particular case there are two aspects worth noting.  The
1573	   first is that there will be multiple Multi-Media Sessions per
1574	   Communication Session between the participants.  This has not been
1575	   true in the earlier examples, with the exception of the Centralized
1576	   Conferencing in Section 4.2.  The second aspect is the
1577	   consideration of whether one needs to maintain relationships between
1578	   entities and concepts, for example Media Sources, between these
1579	   different Multi-Media Sessions and between Packet Streams in the
1580	   independent RTP sessions configured by those Multi-Media Sessions.

1582	                           +-----------------------------------------+
1583	                           | Participant A                           |
1584	       +----------+        | +--------------------------------------+|
1585	       | Multi-   |        | | End Point A                          ||
1586	       | Media    |<======>| |                                      ||
1587	       | Session  |        | |+-------+     +-------+     +-------+ ||
1588	       | 1        |        | || RTP 1 |<----| MS A1 |---->| RTP 2 | ||
1589	       +----------+        | ||       |     +-------+     |       | ||
1590	           ^^              | +|-------|-------------------|-------|-+|
1591	           ||              +--|-------|-------------------|-------|--+
1592	           ||                 |       |          ^^       |       |
1593	           VV                 |       |          ||       |       |
1594	    +-------------------------|-------|----+     ||       |       |
1595	    | Participant B           |       |    |     VV       |       |
1596	    | +-----------------------|-------|---+| +----------+ |       |
1597	    | | End Point B    +----->|       |   || | Multi-   | |       |
1598	    | |                |      +-------+   || | Media    | |       |
1599	    | | +-------+      |      +-------+   || | Session  | |       |
1600	    | | | MS B1 |------+----->| RTP 3 |   || | 2        | |       |
1601	    | | +-------+             |       |   || +----------+ |       |
1602	    | +-----------------------|-------|---+|     ^^       |       |
1603	    +-------------------------|-------|----+     ||       |       |
1604	           ^^                 |       |          ||       |       |
1605	           ||                 |       |          VV       |       |
1606	           ||              +--|-------|-------------------|-------|--+
1607	           VV              |  |       | Participant C     |       |  |
1608	       +----------+        | +|-------|-------------------|-------|-+|
1609	       | Multi-   |        | ||       | End Point C       |       | ||
1610	       | Media    |<======>| |+-------+                   +-------+ ||
1611	       | Session  |        | |    ^         +-------+         ^     ||
1612	       | 3        |        | |    +---------| MS C1 |---------+     ||
1613	       +----------+        | |              +-------+               ||
1614	                           | +--------------------------------------+|
1615	                           +-----------------------------------------+

1617	   Figure 16: Full Mesh Conferencing between three Participants A, B and
1618	                                     C

1620	   For the sake of clarity, Figure 16 above does not include all these
1621	   concepts.  The Media Sources (MS) from a given End Point are sent to
1622	   the two peers.  This requires encoding and Media Packetization to
1623	   enable the Packet Streams to be sent over Media Transports in the
1624	   context of the RTP sessions depicted.  The RTP sessions 1, 2, and 3
1625	   are independent, and established in the context of each of the Multi-
1626	   Media Sessions 1, 2 and 3.  The joint Communication Session that the
1627	   full figure represents (not drawn here, unlike in Figure 14, in order
1628	   to save space), however, combines the received representations of the
1629	   peers' Media Sources and plays them back.

1631	   It is noteworthy that the full mesh conferencing topologies described
1632	   here have the potential for creating loops.  For example, compare
1633	   the above full mesh with a mixing three-party communication session
1634	   as depicted in Figure 17.  In this example A's Media Source
1635	   A1 is sent to B over a Multi-Media Session (A-B).  In B the Media
1636	   Source A1 is mixed with Media Source B1 and the resulting Media
1637	   Source (MS AB) is sent to C over a Multi-Media Session (B-C).  If C
1638	   and A would establish a Multi-Media Session (A-C) and C would act in
1639	   the same role as B, then A would receive a Media Source from C that
1640	   contains a mix of A, B and C's individual Media Sources.  This would
1641	   result in A playing out a time-delayed version of its own signal
1642	   (i.e., the system has created an echo path).

1644	   +--------------+    +--------------+    +--------------+
1645	   | A            |    | B +-------+  |    | C            |
1646	   |              |    |   | MS B1 |  |    |              |
1647	   |              |    |   +-------+  |    |              |
1648	   | +-------+    |    |     |        |    |              |
1649	   | | MS A1 |----|--->|-----+ MS AB -|--->|              |
1650	   | +-------+    |    |              |    |              |
1651	   +--------------+    +--------------+    +--------------+

1653	            Figure 17: Mixing Three Party Communication Session

1655	   The looping issue can be avoided, detected or prevented using two
1656	   general methods.  The first method is to use great care when setting
1657	   up and establishing the communication session if participants have
1658	   any mixing or forwarding capacity, so that one doesn't end up getting
1659	   back a partial or full representation of one's own media believing it
1660	   is someone else's. The other method is to maintain some unique
1661	   identifiers at the communication session level for all Media Sources
1662	   and ensure that any Packet Streams received identify those Media
1663	   Sources that contributed to the content of the Packet Stream.
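
   The second method can be sketched as set bookkeeping.  In the Python
   illustration below every Packet Stream is assumed to carry the set of
   Media Source identifiers that contributed to it; a mixer forwards the
   union of its inputs, and a receiver checks the set against its own
   Media Sources.  The identifier strings and function names are
   hypothetical and chosen to mirror the A1/B1/C1 example above.

```python
def mix(*inputs):
    """Return the contributing-source set of a mix of the given inputs."""
    contributors = frozenset()
    for ids in inputs:
        contributors |= frozenset(ids)
    return contributors


def would_loop(own_source_ids, received_contributors):
    """True if a received stream contains any of the receiver's own media."""
    return not frozenset(own_source_ids).isdisjoint(received_contributors)
```

   In the scenario of Figure 17 extended with a session between C and
   A, B would send mix({"A1"}, {"B1"}) to C, C would add its own source
   C1, and A could then detect that the stream offered by C contains A1
   before playing it out.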

1665	4.4.  Source-Specific Multicast

1667	   In one-to-many media distribution cases (e.g., IPTV), where one Media
1668	   Sender or a set of Media Senders is allowed to send Packet Streams on
1669	   a particular Source-Specific Multicast (SSM) group to many receivers
1670	   (R), there are some different aspects to consider.  Figure 18
1671	   presents a high level SSM system for RTP/RTCP defined in [RFC5760].
1672	   In this case, several Media Senders send their Packet Streams to the
1673	   Distribution Source, which is the only one allowed to send to the SSM
1674	   group.  The Receivers joining the SSM group can provide RTCP feedback
1675	   on their reception by sending unicast feedback to a Feedback Target
1676	   (FT).

1678	   +--------+       +-----+
1679	   |Media   |       |     |       Source-Specific
1680	   |Sender 1|<----->| D S |       Multicast (SSM)
1681	   +--------+       | I O |  +--+----------------> R(1)
1682	                    | S U |  |  |                    |
1683	   +--------+       | T R |  |  +-----------> R(2)   |
1684	   |Media   |<----->| R C |->+  |           :   |    |
1685	   |Sender 2|       | I E |  |  +------> R(n-1) |    |
1686	   +--------+       | B   |  |  |          |    |    |
1687	       :            | U   |  +--+--> R(n)  |    |    |
1688	       :            | T +-|          |     |    |    |
1689	       :            | I | |<---------+     |    |    |
1690	   +--------+       | O |F|<---------------+    |    |
1691	   |Media   |       | N |T|<--------------------+    |
1692	   |Sender M|<----->|   | |<-------------------------+
1693	   +--------+       +-----+       RTCP Unicast

1695	   FT = Feedback Target
1696	        Figure 18: Source-Specific Multicast Communication Topology

1698	   Here the Media Transports from the Distribution Source to all the SSM
1699	   receivers (R) have the same 5-tuple, but in reality have different
1700	   paths.  Also, the Multi-Media Sessions between the Distribution
1701	   Source and the individual receivers are normally identical.  This is
1702	   due to the one-way communication of configuration information from
1703	   the Distribution Source to the receiver.  This information is
1704	   typically embedded in Electronic Program Guides (EPGs), distributed
1705	   by the Session Announcement Protocol (SAP) [RFC2974] or other one-way
1706	   protocols.  In some cases load balancing occurs, for example, by
1707	   providing the receiver with a set of Feedback Targets from which it
1708	   randomly selects one.

1710	   This scenario varies significantly from previously described
1711	   communication topologies due to the asymmetric nature of the RTP
1712	   Session context across the Distribution Source.  The Distribution
1713	   Source forms a focal point in collecting the unicasted RTCP feedback
1714	   from the receivers and then re-distributing it to the Media Senders.
1715	   Each Media Sender and the Distribution Source establish their own
1716	   Multi-Media Session Context for the underlying RTP Sessions but with
1717	   shared RTCP context across all the receivers.

1719	   To improve readability, Figure 18 intentionally hides the details
1720	   of the various entities.  Expanding on this, one can think of Media
1721	   Senders being part of one or more Multi-Media Sessions grouped under
1722	   a Communication Session.  The Media Sender in this scenario refers to
1723	   the Media Packetizer transformation (Section 2.1.9).  The Packet
1724	   Stream generated by such a Media Sender can be part of its own RTP
1725	   Session or can be multiplexed with other Packet Streams within an
1726	   End Point.  The latter case requires careful consideration since the
1727	   re-distributed RTCP packets now correspond to a single RTP Session
1728	   Context across all the Media Senders.

1730	5.  Security Considerations

1732	   This document simply tries to clarify the confusion prevalent in RTP
1733	   taxonomy because of inconsistent usage by multiple technologies and
1734	   protocols making use of the RTP protocol.  It does not introduce any
1735	   new security considerations beyond those already well documented in
1736	   the RTP protocol [RFC3550] and each of the many respective
1737	   specifications of the various protocols making use of it.

1739	   Hopefully having a well-defined common terminology and understanding
1740	   of the complexities of the RTP architecture will help lead us to
1741	   better standards, avoiding security problems.

1743	6.  Acknowledgement

1745	   This document has many concepts borrowed from several documents such
1746	   as WebRTC [I-D.ietf-rtcweb-overview], CLUE [I-D.ietf-clue-framework],
1747	   and Multiplexing Architecture
1748	   [I-D.westerlund-avtcore-transport-multiplexing].  The authors would
1749	   like to thank all the authors of each of those documents.

1751	   The authors would also like to acknowledge the insights, guidance and
1752	   contributions of Magnus Westerlund, Roni Even, Paul Kyzivat, Colin
1753	   Perkins, Keith Drage, and Harald Alvestrand.

1755	7.  Contributors

1757	   Magnus Westerlund has contributed the concept model for the media
1758	   chain, based on transformations and streams, including rewriting
1759	   pre-existing concepts into this model and adding missing concepts.
1760	   The first proposal for updating the relationships and the topologies
1761	   based on this concept was also made by Magnus.

1763	8.  IANA Considerations

1765	   This document makes no request of IANA.

1767	9.  References

1769	9.1.  Normative References

1771	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1772	              Jacobson, "RTP: A Transport Protocol for Real-Time
1773	              Applications", STD 64, RFC 3550, July 2003.

1775	   [UML]      Object Management Group, "OMG Unified Modeling Language
1776	              (OMG UML), Superstructure, V2.2", OMG formal/2009-02-02,
1777	              February 2009.

1779	9.2.  Informative References

1781	   [I-D.ietf-avtcore-clksrc]
1782	              Williams, A., Gross, K., Brandenburg, R., and H. Stokking,
1783	              "RTP Clock Source Signalling", draft-ietf-avtcore-
1784	              clksrc-07 (work in progress), October 2013.

1786	   [I-D.ietf-clue-framework]
1787	              Duckworth, M., Pepperell, A., and S. Wenger, "Framework
1788	              for Telepresence Multi-Streams", draft-ietf-clue-
1789	              framework-12 (work in progress), October 2013.

1791	   [I-D.ietf-mmusic-sdp-bundle-negotiation]
1792	              Holmberg, C., Alvestrand, H., and C. Jennings,
1793	              "Multiplexing Negotiation Using Session Description
1794	              Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp-
1795	              bundle-negotiation-05 (work in progress), October 2013.

1797	   [I-D.ietf-rtcweb-overview]
1798	              Alvestrand, H., "Overview: Real Time Protocols for Browser-
1799	              based Applications", draft-ietf-rtcweb-overview-08 (work
1800	              in progress), September 2013.

1802	   [I-D.westerlund-avtcore-transport-multiplexing]
1803	              Westerlund, M. and C. Perkins, "Multiplexing Multiple RTP
1804	              Sessions onto a Single Lower-Layer Transport", draft-
1805	              westerlund-avtcore-transport-multiplexing-07 (work in
1806	              progress), October 2013.

1808	   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
1809	              Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
1810	              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
1811	              September 1997.

1813	   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
1814	              Announcement Protocol", RFC 2974, October 2000.

1816	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
1817	              with Session Description Protocol (SDP)", RFC 3264, June
1818	              2002.

1820	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
1821	              Video Conferences with Minimal Control", STD 65, RFC 3551,
1822	              July 2003.

1824	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1825	              Description Protocol", RFC 4566, July 2006.

1827	   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
1828	              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
1829	              July 2006.

1831	   [RFC4867]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie,
1832	              "RTP Payload Format and File Storage Format for the
1833	              Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband
1834	              (AMR-WB) Audio Codecs", RFC 4867, April 2007.

1836	   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
1837	              Correction", RFC 5109, December 2007.

1839	   [RFC5404]  Westerlund, M. and I. Johansson, "RTP Payload Format for
1840	              G.719", RFC 5404, January 2009.

1842	   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
1843	              Media Attributes in the Session Description Protocol
1844	              (SDP)", RFC 5576, June 2009.

1846	   [RFC5760]  Ott, J., Chesterfield, J., and E. Schooler, "RTP Control
1847	              Protocol (RTCP) Extensions for Single-Source Multicast
1848	              Sessions with Unicast Feedback", RFC 5760, February 2010.

1850	   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
1851	              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

1853	   [RFC5905]  Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network
1854	              Time Protocol Version 4: Protocol and Algorithms
1855	              Specification", RFC 5905, June 2010.

1857	   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
1858	              "RTP Payload Format for Scalable Video Coding", RFC 6190,
1859	              May 2011.

1861	   [RFC6222]  Begen, A., Perkins, C., and D. Wing, "Guidelines for
1862	              Choosing RTP Control Protocol (RTCP) Canonical Names
1863	              (CNAMEs)", RFC 6222, April 2011.

1865	Appendix A.  Changes From Earlier Versions

1867	   NOTE TO RFC EDITOR: Please remove this section prior to publication.

1869	A.1.  Modifications Between Version -02 and -03

1871	   o  Section 4 rewritten (and new communication topologies added) to
1872	      reflect the major updates to Sections 1-3

1874	   o  Section 8 removed (carryover from initial -00 draft)

1876	   o  General cleanup of text, grammar, and nits

1878	A.2.  Modifications Between Version -01 and -02

1880	   o  Section 2 rewritten to add both streams and transformations in the
1881	      media chain.

1883	   o  Section 3 rewritten to focus on exposing relationships.

1885	A.3.  Modifications Between Version -00 and -01

1886	   o  Too many to list

1888	   o  Added new authors

1890	   o  Updated content organization and presentation

1892	Authors' Addresses

1894	   Jonathan Lennox
1895	   Vidyo, Inc.
1896	   433 Hackensack Avenue
1897	   Seventh Floor
1898	   Hackensack, NJ  07601
1899	   US

1901	   Email: jonathan@vidyo.com

1903	   Kevin Gross
1904	   AVA Networks, LLC
1905	   Boulder, CO
1906	   US

1908	   Email: kevin.gross@avanw.com

1910	   Suhas Nandakumar
1911	   Cisco Systems
1912	   170 West Tasman Drive
1913	   San Jose, CA  95134
1914	   US

1916	   Email: snandaku@cisco.com

1918	   Gonzalo Salgueiro
1919	   Cisco Systems
1920	   7200-12 Kit Creek Road
1921	   Research Triangle Park, NC  27709
1922	   US

1924	   Email: gsalguei@cisco.com

1925	   Bo Burman
1926	   Ericsson
1927	   Farogatan 6
1928	   SE-164 80 Kista
1929	   Sweden

1931	   Phone: +46 10 714 13 11
1932	   Email: bo.burman@ericsson.com