2	Network Working Group                                          J. Lennox
3	Internet-Draft                                                     Vidyo
4	Intended status: Informational                                  K. Gross
5	Expires: September 6, 2015                                           AVA
6	                                                           S. Nandakumar
7	                                                            G. Salgueiro
8	                                                           Cisco Systems
9	                                                               B. Burman
10	                                                                Ericsson
11	                                                           March 5, 2015

13	A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport
14	                         Protocol (RTP) Sources
15	               draft-ietf-avtext-rtp-grouping-taxonomy-06

17	Abstract

19	   The terminology about, and associations among, Real-Time Transport
20	   Protocol (RTP) sources can be complex and somewhat opaque.  This
21	   document describes a number of existing and proposed relationships
22	   among RTP sources, and attempts to define common terminology for
23	   discussing protocol entities and their relationships.

25	Status of This Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at http://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on September 6, 2015.

42	Copyright Notice

44	   Copyright (c) 2015 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (http://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the Simplified BSD License.

57	Table of Contents

59	   1.  Introduction
60	   2.  Concepts
61	     2.1.  Media Chain
62	       2.1.1.  Physical Stimulus
63	       2.1.2.  Media Capture
64	       2.1.3.  Raw Stream
65	       2.1.4.  Media Source
66	       2.1.5.  Source Stream
67	       2.1.6.  Media Encoder
68	       2.1.7.  Encoded Stream
69	       2.1.8.  Dependent Stream
70	       2.1.9.  Media Packetizer
71	       2.1.10. RTP Stream
72	       2.1.11. RTP-based Redundancy
73	       2.1.12. Redundancy RTP Stream
74	       2.1.13. Media Transport
75	       2.1.14. Media Transport Sender
76	       2.1.15. Sent RTP Stream
77	       2.1.16. Network Transport
78	       2.1.17. Transported RTP Stream
79	       2.1.18. Media Transport Receiver
80	       2.1.19. Received RTP Stream
81	       2.1.20. Received Redundancy RTP Stream
82	       2.1.21. RTP-based Repair
83	       2.1.22. Repaired RTP Stream
84	       2.1.23. Media Depacketizer
85	       2.1.24. Received Encoded Stream
86	       2.1.25. Media Decoder
87	       2.1.26. Received Source Stream
88	       2.1.27. Media Sink
89	       2.1.28. Received Raw Stream
90	       2.1.29. Media Render
91	     2.2.  Communication Entities
92	       2.2.1.  Endpoint
93	       2.2.2.  RTP Session
94	       2.2.3.  Participant
95	       2.2.4.  Multimedia Session
96	       2.2.5.  Communication Session

98	   3.  Concepts of Inter-Relations
99	     3.1.  Synchronization Context
100	       3.1.1.  RTCP CNAME
101	       3.1.2.  Clock Source Signaling
102	       3.1.3.  Implicitly via RtcMediaStream
103	       3.1.4.  Explicitly via SDP Mechanisms
104	     3.2.  Endpoint
105	     3.3.  Participant
106	     3.4.  RtcMediaStream
107	     3.5.  Multi-Channel Audio
108	     3.6.  Simulcast
109	     3.7.  Layered Multi-Stream
110	     3.8.  RTP Stream Duplication
111	     3.9.  Redundancy Format
112	     3.10. RTP Retransmission
113	     3.11. Forward Error Correction
114	     3.12. RTP Stream Separation
115	     3.13. Multiple RTP Sessions over one Media Transport
116	   4.  Mapping from Existing Terms
117	     4.1.  Telepresence Terms
118	       4.1.1.  Audio Capture
119	       4.1.2.  Capture Device
120	       4.1.3.  Capture Encoding
121	       4.1.4.  Capture Scene
122	       4.1.5.  Endpoint
123	       4.1.6.  Individual Encoding
124	       4.1.7.  Media Capture
125	       4.1.8.  Media Consumer
126	       4.1.9.  Media Provider
127	       4.1.10. Stream
128	       4.1.11. Video Capture
129	     4.2.  Media Description
130	     4.3.  Media Stream
131	     4.4.  Multimedia Conference
132	     4.5.  Multimedia Session
133	     4.6.  Multipoint Control Unit (MCU)
134	     4.7.  Multi-Session Transmission (MST)
135	     4.8.  Recording Device
136	     4.9.  RtcMediaStream
137	     4.10. RtcMediaStreamTrack
138	     4.11. RTP Sender
139	     4.12. RTP Session
140	     4.13. Single Session Transmission (SST)
141	     4.14. SSRC
142	   5.  Security Considerations
143	   6.  Acknowledgement
144	   7.  Contributors
145	   8.  IANA Considerations
146	   9.  Informative References
147	   Appendix A.  Changes From Earlier Versions
148	     A.1.  Modifications Between WG Version -05 and -06
149	     A.2.  Modifications Between WG Version -04 and -05
150	     A.3.  Modifications Between WG Version -03 and -04
151	     A.4.  Modifications Between WG Version -02 and -03
152	     A.5.  Modifications Between WG Version -01 and -02
153	     A.6.  Modifications Between WG Version -00 and -01
154	     A.7.  Modifications Between Version -02 and -03
155	     A.8.  Modifications Between Version -01 and -02
156	     A.9.  Modifications Between Version -00 and -01
157	   Authors' Addresses

159	1.  Introduction

161	   The existing taxonomy of sources in RTP is often regarded as
162	   confusing and inconsistent.  Consequently, a deep understanding of
163	   how the different terms relate to each other becomes a real
164	   challenge.  Frequently cited examples of this confusion are (1) how
165	   different protocols that make use of RTP use the same terms to
166	   signify different things and (2) how the complexities addressed at
167	   one layer are often glossed over or ignored at another.

169	   This document attempts to provide some clarity by reviewing the
170	   semantics of various aspects of sources in RTP.  As an organizing
171	   mechanism, it approaches this by describing various ways that RTP
172	   sources can be grouped and associated together.

174	   All non-specific references to ControLling mUltiple streams for
175	   tElepresence (CLUE) in this document map to [I-D.ietf-clue-framework]
176	   and all references to Web Real-Time Communications (WebRTC) map to
177	   [I-D.ietf-rtcweb-overview].

179	2.  Concepts

181	   This section defines concepts that serve to identify and name various
182	   transformations and streams in a given RTP usage.  For each concept
183	   an attempt is made to list any alternate definitions and usages that
184	   co-exist today along with various characteristics that further
185	   describe the concept.  These concepts are divided into two
186	   categories, one related to the chain of streams and transformations
187	   that media can be subject to, the other for entities involved in the
188	   communication.

190	2.1.  Media Chain

192	   In the context of this memo, Media is a sequence of synthetic or
193	   Physical Stimuli (Section 2.1.1) (sound waves, photons, keystrokes),
194	   represented in digital form.  Synthesized Media is typically
195	   generated directly in the digital domain.

197	   This section contains the concepts that can be involved in taking
198	   Media at a sender side and transporting it to a receiver, which may
199	   recover a sequence of physical stimuli.  This chain of concepts is of
200	   two main types, streams and transformations.  Streams are time-based
201	   sequences of samples of the physical stimulus in various
202	   representations, while transformations change the representation of
203	   the streams in some way.

205	   The examples below are basic ones, and it is important to keep in mind
206	   that this conceptual model enables more complex usages.  Some will be
207	   further discussed in later sections of this document.  In general the
208	   following applies to this model:

210	   o  A transformation may have zero or more inputs and one or more
211	      outputs.

213	   o  A stream is of some type, such as audio, video, real-time text,
214	      etc.

216	   o  A stream has one source transformation and one or more sink
217	      transformations (with the exception of Physical Stimulus
218	      (Section 2.1.1), which may lack a source or sink transformation).

220	   o  Streams can be forwarded from a transformation output to any
221	      number of inputs on other transformations that support that type.

223	   o  If the output of a transformation is sent to multiple
224	      transformations, those streams will be identical; it takes a
225	      transformation to make them different.

227	   o  There are no formal limitations on how streams are connected to
228	      transformations; this may include loops if required by a
229	      particular transformation.

231	   It is also important to remember that this is a conceptual model.
232	   Thus, real-world implementations may look different and have a
233	   different structure.
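
   The rules of the conceptual model listed above can be sketched as a
   small data structure.  This is only an illustration: the class
   names, the connect() helper, and the two-encoder example are
   assumptions made here and are not defined by this memo.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False, repr=False)
class Transformation:
    """A processing step with zero or more inputs and one or more outputs."""
    name: str
    inputs: List["Stream"] = field(default_factory=list)
    outputs: List["Stream"] = field(default_factory=list)


@dataclass(eq=False, repr=False)
class Stream:
    """A typed, time-based sequence of samples between transformations."""
    name: str
    media_type: str                          # e.g. "audio", "video", "text"
    source: Optional[Transformation] = None  # None only for a Physical Stimulus
    sinks: List[Transformation] = field(default_factory=list)


def connect(src: Optional[Transformation], name: str, media_type: str,
            *sinks: Transformation) -> Stream:
    """Create one stream from src and forward it, unchanged, to every sink."""
    stream = Stream(name, media_type, src, list(sinks))
    if src is not None:
        src.outputs.append(stream)
    for sink in sinks:
        sink.inputs.append(stream)
    return stream


# Hypothetical sender-side fragment with two encoders fed by one source.
capture = Transformation("Media Capture")
source = Transformation("Media Source")
encoder_a = Transformation("Media Encoder A")
encoder_b = Transformation("Media Encoder B")

stimulus = connect(None, "Physical Stimulus", "audio", capture)
raw = connect(capture, "Raw Stream", "audio", source)
# The same Source Stream feeds both encoders, so both see identical input;
# only a further transformation could make the two copies differ.
src_stream = connect(source, "Source Stream", "audio", encoder_a, encoder_b)
```

   Modelling a forwarded stream as one shared object reflects the rule
   that streams sent to multiple transformations are identical.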

235	   To provide a basic understanding of the relationships in the chain we
236	   first introduce the concepts for the sender side (Figure 1).  This
237	   covers physical stimuli until media packets are emitted onto the
238	   network.

240	                Physical Stimulus
241	                       |
242	                       V
243	             +--------------------+
244	             |    Media Capture   |
245	             +--------------------+
246	                       |
247	                  Raw Stream
248	                       V
249	             +--------------------+
250	             |    Media Source    |<- Synchronization Timing
251	             +--------------------+
252	                       |
253	                 Source Stream
254	                       V
255	             +--------------------+
256	             |   Media Encoder    |
257	             +--------------------+
258	                       |
259	                 Encoded Stream     +------------+
260	                       V            |            V
261	             +--------------------+ | +----------------------+
262	             |  Media Packetizer  | | | RTP-based Redundancy |
263	             +--------------------+ | +----------------------+
264	                       |            |            |
265	                       +------------+  Redundancy RTP Stream
266	                Source RTP Stream                |
267	                       V                         V
268	             +--------------------+    +--------------------+
269	             |  Media Transport   |    |  Media Transport   |
270	             +--------------------+    +--------------------+

272	             Figure 1: Sender Side Concepts in the Media Chain

274	   In Figure 1 we have included a branched chain to cover the concepts
275	   for using redundancy to improve the reliability of the transport.
276	   The Media Transport concept is an aggregate that is decomposed in
277	   Section 2.1.13.

279	   In Figure 2 we review a receiver media chain matching the sender
280	   side, to look at the inverse transformations and their attempts to
281	   recover streams identical to those in the sender chain, subject to
282	   what may be lossy compression and imperfect Media Transport.  Note
283	   that the streams out of a reverse transformation, like the Source
284	   Stream out of the Media Decoder, are in many cases not the same as
285	   the corresponding ones on the sender side; thus they are prefixed
286	   with "Received" to denote a potentially modified version.  The
287	   reason is that some transformations are irreversible.  For example,
288	   lossy source coding in the Media Encoder prevents the Source Stream
289	   out of the Media Decoder from being the same as the one fed into
290	   the Media Encoder.  Other reasons include packet loss or late loss
291	   in the Media Transport transformation that even RTP-based Repair,
292	   if used, fails to repair.  Note also that some transformations are
293	   not always present, like RTP-based Repair, which cannot operate
294	   without Redundancy RTP Streams.

296	           +--------------------+   +--------------------+
297	           |  Media Transport   |   |  Media Transport   |
298	           +--------------------+   +--------------------+
299	                     |                        |
300	            Received RTP Stream  Received Redundancy RTP Stream
301	                     |                        |
302	                     |    +-------------------+
303	                     V    V
304	           +--------------------+
305	           |  RTP-based Repair  |
306	           +--------------------+
307	                     |
308	            Repaired RTP Stream
309	                     V
310	           +--------------------+
311	           | Media Depacketizer |
312	           +--------------------+
313	                     |
314	           Received Encoded Stream
315	                     V
316	           +--------------------+
317	           |   Media Decoder    |
318	           +--------------------+
319	                     |
320	           Received Source Stream
321	                     V
322	           +--------------------+
323	           |     Media Sink     |--> Synchronization Information
324	           +--------------------+
325	                     |
326	            Received Raw Stream
327	                     V
328	           +--------------------+
329	           |   Media Renderer   |
330	           +--------------------+
331	                     |
332	                     V
333	             Physical Stimulus

335	            Figure 2: Receiver Side Concepts of the Media Chain
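
   The irreversibility of lossy source coding can be illustrated with a
   toy quantizer.  The encode() and decode() functions and the step
   size are assumptions chosen for illustration; no real codec works
   this simply.

```python
def encode(source_stream, step=8):
    """Toy lossy Media Encoder: map each sample to its nearest level."""
    return [(sample + step // 2) // step for sample in source_stream]


def decode(encoded_stream, step=8):
    """Toy Media Decoder: map each level back to a sample value."""
    return [level * step for level in encoded_stream]


source_stream = [3, 17, 42, 100, 127]
received_source_stream = decode(encode(source_stream))

# The decoder output approximates the encoder input but is not equal to
# it, which is why the receiver-side stream is called "Received".
max_error = max(abs(a - b)
                for a, b in zip(source_stream, received_source_stream))
```

   Here every received sample is within half a quantization step of
   the original, yet the Received Source Stream differs from the
   Source Stream.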

337	2.1.1.  Physical Stimulus

339	   The physical stimulus is a physical event that can be sampled and
340	   converted to digital form by an appropriate sensor or transducer.
341	   This includes sound waves making up audio, photons in a light field,
342	   or other excitations or interactions with sensors, like keystrokes on
343	   a keyboard.

345	2.1.2.  Media Capture

347	   Media Capture is the process of transforming the Physical Stimulus
348	   (Section 2.1.1) into digital Media using an appropriate sensor or
349	   transducer.  The Media Capture performs a digital sampling of the
350	   physical stimulus, usually periodically, and outputs this in some
351	   representation as a Raw Stream (Section 2.1.3).  Because it is
352	   sampled periodically, or at least consists of timed asynchronous
353	   events, this data forms a stream of media data.  The Media Capture
354	   is normally instantiated in some type of device, i.e., a media
355	   capture device.  Examples of media capture devices are digital
356	   cameras, microphones connected to A/D converters, and keyboards.

358	   Characteristics:

360	   o  A Media Capture is identified either by hardware/manufacturer ID
361	      or via a session-scoped device identifier as mandated by the
362	      application usage.

364	   o  A Media Capture can generate an Encoded Stream (Section 2.1.7) if
365	      the capture device supports such a configuration.

367	   o  The nature of the Media Capture may impose constraints on the
368	      clock handling in some of the subsequent steps.  For example, many
369	      audio or video capture devices are not completely free in
370	      selecting the sample rate.

372	2.1.3.  Raw Stream

374	   The time progressing stream of digitally sampled information, usually
375	   periodically sampled and provided by a Media Capture (Section 2.1.2).
376	   A Raw Stream can also contain synthesized Media that may not require
377	   any explicit Media Capture, since it is already in an appropriate
378	   digital form.

380	2.1.4.  Media Source

382	   A Media Source is the logical source of a reference clock
383	   synchronized, time progressing, digital media stream, called a Source
384	   Stream (Section 2.1.5).  This transformation takes one or more Raw
385	   Streams (Section 2.1.3) and provides a Source Stream as output.  The
386	   output is synchronized with a reference clock (Section 3.1), which
387	   can be as simple as a system local wall clock or as complex as an
388	   NTP-synchronized clock.

390	   The output can be of different types.  One type is directly
391	   associated with a particular Media Capture's Raw Stream.  Others are
392	   more conceptual sources, like an audio mix of multiple Source Streams
393	   (Figure 3).  Mixing multiple streams typically requires that the
394	   input streams can be related in time, meaning that they have
395	   to be Source Streams (Section 2.1.5) rather than Raw Streams.  In
396	   Figure 3, the generated Source Stream is a mix of the three input
397	   Source Streams.

399	                Source    Source    Source
400	                Stream    Stream    Stream
401	                  |         |         |
402	                  V         V         V
403	              +--------------------------+
404	              |        Media Source      |<-- Reference Clock
405	              |           Mixer          |
406	              +--------------------------+
407	                            |
408	                            V
409	                      Source Stream

411	         Figure 3: Conceptual Media Source in form of Audio Mixer
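
   A minimal sketch of such a mixing Media Source follows.  It assumes
   that each Source Stream is a list of (timestamp, sample) pairs on a
   shared reference clock and that mixing is plain addition; both are
   simplifications introduced here.

```python
from collections import defaultdict


def mix(*source_streams):
    """Sum the samples that share a reference-clock timestamp."""
    mixed = defaultdict(int)
    for stream in source_streams:
        for timestamp, sample in stream:
            mixed[timestamp] += sample
    # Return the mix as a new Source Stream ordered by timestamp.
    return sorted(mixed.items())


# Three hypothetical input Source Streams; stream_c starts one tick late.
stream_a = [(0, 10), (1, 20), (2, 30)]
stream_b = [(0, 1), (1, 2), (2, 3)]
stream_c = [(1, 100), (2, 200), (3, 300)]

mixed_source_stream = mix(stream_a, stream_b, stream_c)
```

   The common timestamps are what allow the inputs to be related in
   time; Raw Streams without a reference clock could not be aligned
   this way.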

413	   Another possible example of a conceptual Media Source is a video
414	   surveillance switch, where the input is multiple Source Streams from
415	   different cameras, and the output is one of those Source Streams
416	   based on some selection criteria, such as round-robin or some
417	   video activity measure.
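
   The round-robin variant of such a switch might look like the
   following sketch.  The function name, the "dwell" parameter (number
   of frames forwarded before switching), and the camera names are
   hypothetical.

```python
from itertools import cycle


def round_robin_switch(source_streams, dwell):
    """Yield (camera, frame) pairs, forwarding one camera at a time."""
    names = cycle(sorted(source_streams))
    current = next(names)
    length = min(len(frames) for frames in source_streams.values())
    for index in range(length):
        # Move to the next camera every `dwell` frames.
        if index > 0 and index % dwell == 0:
            current = next(names)
        yield current, source_streams[current][index]


cameras = {
    "cam1": ["a0", "a1", "a2", "a3"],
    "cam2": ["b0", "b1", "b2", "b3"],
}
output_source_stream = list(round_robin_switch(cameras, dwell=2))
```

   The output is a single Source Stream whose content comes from
   exactly one input at any point in time.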

419	   Characteristics:

421	   o  At any point, it can represent a physically captured source or
422	      a conceptual source.

424	2.1.5.  Source Stream

426	   A time progressing stream of digital samples that has been
427	   synchronized with a reference clock and comes from a particular Media
428	   Source (Section 2.1.4).

430	2.1.6.  Media Encoder

432	   A Media Encoder is a transformation that is responsible for encoding the
433	   media data from a Source Stream (Section 2.1.5) into another
434	   representation, usually more compact, that is output as an Encoded
435	   Stream (Section 2.1.7).

437	   The Media Encoder step commonly includes pre-encoding
438	   transformations, such as scaling, resampling, etc.  The Media Encoder
439	   can have a significant number of configuration options that affect
440	   the properties of the Encoded Stream.  These include properties such
441	   as bit-rate, start points for decoding, resolution, bandwidth, or
442	   other fidelity-affecting properties.  The codec actually used is also
443	   an important factor in many communication systems.

445	   Scalable Media Encoders need special attention as they produce
446	   multiple outputs that are potentially of different types.  As shown
447	   in Figure 4, a scalable Media Encoder takes one input Source Stream
448	   and encodes it into multiple output streams of two different types;
449	   at least one Encoded Stream that is independently decodable and one
450	   or more Dependent Streams (Section 2.1.8).  Decoding requires at
451	   least one Encoded Stream and zero or more Dependent Streams.  A
452	   Dependent Stream's dependency is one of the grouping relations this
453	   document discusses further in Section 3.7.

455	                              Source Stream
456	                                    |
457	                                    V
458	                       +--------------------------+
459	                       |  Scalable Media Encoder  |
460	                       +--------------------------+
461	                          |         |   ...    |
462	                          V         V          V
463	                       Encoded  Dependent  Dependent
464	                       Stream    Stream     Stream

466	            Figure 4: Scalable Media Encoder Input and Outputs

468	   There are also other variants of encoders, like so-called Multiple
469	   Description Coding (MDC).  Such Media Encoders produce multiple
470	   independent and thus individually decodable Encoded Streams.
471	   However, (logically) combining several of these Encoded Streams into
472	   a single Received Source Stream during decoding leads to an
473	   improvement in perceived reproduction quality when compared to
474	   decoding a single Encoded Stream.

476	   Creating multiple Encoded Streams from the same Source Stream, where
477	   the Encoded Streams are neither in a scalable nor in an MDC
478	   relationship, is commonly utilized in Simulcast
479	   [I-D.ietf-mmusic-sdp-simulcast] environments.

481	   Characteristics:

483	   o  A Media Source can be multiply encoded by different Media Encoders
484	      to provide various encoded representations.

486	2.1.7.  Encoded Stream

488	   A stream of time synchronized encoded media that can be independently
489	   decoded.

491	   Characteristics:

493	   o  Due to temporal dependencies, an Encoded Stream may have
494	      limitations in where decoding can be started.  These entry points,
495	      for example Intra frames from a video encoder, may require
496	      identification and their generation may be event based or
497	      configured to occur periodically.

499	2.1.8.  Dependent Stream

501	   A stream of time synchronized encoded media fragments that depend
502	   on one or more Encoded Streams (Section 2.1.7) and zero or more
503	   Dependent Streams in order to be decodable.

505	   Characteristics:

507	   o  Each Dependent Stream has a set of dependencies.  These
508	      dependencies must be understood by the parties in a Multimedia
509	      Session that intend to use a Dependent Stream.
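
   One way to reason about such dependencies is as a graph that a
   receiver walks before attempting to decode.  The sketch below
   assumes a hypothetical three-layer encoding; the stream names and
   the decodable() helper are illustrative only.

```python
def decodable(stream, dependencies, received):
    """Return True if stream and all its transitive dependencies arrived."""
    pending, seen = [stream], set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        if current not in received:
            return False
        seen.add(current)
        pending.extend(dependencies.get(current, ()))
    return True


# "base" is an Encoded Stream; "enh1" and "enh2" are Dependent Streams.
dependencies = {
    "base": [],
    "enh1": ["base"],
    "enh2": ["base", "enh1"],
}

can_decode_enh1 = decodable("enh1", dependencies, {"base", "enh1"})
can_decode_enh2 = decodable("enh2", dependencies, {"base", "enh2"})
```

   In this example "enh2" cannot be decoded because "enh1", on which
   it depends, was not received.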

511	2.1.9.  Media Packetizer

513	   The transformation of taking one or more Encoded (Section 2.1.7) or
514	   Dependent Streams (Section 2.1.8), putting their content into one or
515	   more sequences of packets, normally RTP packets, and outputting Source
516	   RTP Streams (Section 2.1.10).  This step includes generating both RTP
517	   payloads and RTP packets.

519	   The Media Packetizer can use multiple inputs when producing a single
520	   RTP Stream.  One such example is SRST packetization when using
521	   Scalable Video Coding (SVC) (Section 3.7).

523	   The Media Packetizer can also produce multiple RTP Streams, for
524	   example when Encoded and/or Dependent Streams are distributed over
525	   multiple RTP Streams.  One example of this is MRMT packetization when
526	   using SVC (Section 3.7).

528	   Characteristics:

530	   o  The Media Packetizer selects which Synchronization source(s)
531	      (SSRC) [RFC3550] are used, and in which RTP Sessions.

533	   o  The Media Packetizer can combine multiple Encoded or Dependent Streams
534	      into one or more RTP Streams.
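
   As a rough illustration of the packetization step, the sketch below
   wraps each frame of an Encoded Stream in the 12-byte fixed RTP
   header of [RFC3550].  The one-frame-per-packet rule, payload type
   96, the timestamp step of 3000, and the starting sequence number of
   zero are assumptions; real payload formats define their own rules,
   and RTP uses random initial values.

```python
import struct


def rtp_packet(payload, seq, timestamp, ssrc, payload_type=96, marker=False):
    """Prepend the fixed RTP header (no CSRCs, no extension) to a payload."""
    byte0 = 2 << 6  # version 2, padding = 0, extension = 0, CSRC count = 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload


def packetize(encoded_stream, ssrc, clock_step=3000, first_seq=0):
    """Turn a list of encoded frames into one Source RTP Stream."""
    return [
        rtp_packet(frame, first_seq + index, index * clock_step, ssrc)
        for index, frame in enumerate(encoded_stream)
    ]


# The packetizer chooses the SSRC that identifies the resulting RTP Stream.
packets = packetize([b"frame0", b"frame1"], ssrc=0x1234ABCD)
```

   Every packet of the resulting RTP Stream carries the same SSRC,
   while the sequence number and timestamp advance per packet.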

536	2.1.10.  RTP Stream

538	   A stream of RTP packets containing media data, source or redundant.
539	   The RTP Stream is identified by an SSRC belonging to a particular RTP
540	   Session.  The RTP Session is identified as discussed in
541	   Section 2.2.2.

543	   A Source RTP Stream is an RTP Stream containing at least some content
544	   from an Encoded Stream (Section 2.1.7).  Source material is any media
545	   material that is produced for transport over RTP without any
546	   additional RTP-based redundancy applied.  Note that RTP-based
547	   redundancy excludes the type of redundancy that most suitable Media
548	   Encoders (Section 2.1.6) may add to the media format of the Encoded
549	   Stream that makes it cope better with inevitable RTP packet losses.
550	   This is further described in RTP-based Redundancy (Section 2.1.11)
551	   and Redundancy RTP Stream (Section 2.1.12).

553	   Characteristics:

555	   o  Each RTP Stream is identified by a Synchronization source (SSRC)
556	      [RFC3550] that is carried in every RTP and RTP Control Protocol
557	      (RTCP) packet header.  The SSRC is unique in a specific RTP
558	      Session context.

560	   o  At any given point in time, an RTP Stream can have one and only one
561	      SSRC, but SSRCs for a given RTP Stream can change over time.  SSRC
562	      collision and clock rate change [RFC7160] are examples of valid
563	      reasons to change SSRC for an RTP Stream.  In those cases, the RTP
564	      Stream itself is not changed in any significant way, only the
565	      identifying SSRC number.

567	   o  Each SSRC defines a unique RTP sequence numbering and timing
568	      space.

570	   o  Several RTP Streams, each with their own SSRC, may represent a
571	      single Media Source.

573	   o  Several RTP Streams, each with their own SSRC, can be carried in a
574	      single RTP Session.
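
   The characteristics above can be sketched as a per-session table
   keyed by SSRC.  The RtpSession class and its methods are
   hypothetical, sequence numbers start at zero only for readability,
   and the collision handling is far simpler than the procedure in
   [RFC3550].

```python
import random


class RtpSession:
    """Tracks the RTP Streams of one RTP Session by their SSRC."""

    def __init__(self):
        self.next_seq = {}  # SSRC -> next sequence number for that stream

    def add_stream(self, ssrc=None):
        """Register an RTP Stream, picking a new SSRC on collision."""
        while ssrc is None or ssrc in self.next_seq:
            ssrc = random.getrandbits(32)
        self.next_seq[ssrc] = 0
        return ssrc

    def send(self, ssrc):
        """Return the sequence number for the next packet of this stream."""
        seq = self.next_seq[ssrc]
        self.next_seq[ssrc] = (seq + 1) & 0xFFFF  # 16-bit wrap-around
        return seq


session = RtpSession()
first = session.add_stream(ssrc=1)
second = session.add_stream(ssrc=1)  # collides, so a new SSRC is chosen

first_seqs = [session.send(first), session.send(first)]
second_seqs = [session.send(second)]
```

   Both streams live in the same RTP Session, yet each SSRC has its own
   independent sequence number space.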

576	2.1.11.  RTP-based Redundancy

578	   RTP-based Redundancy is defined here as a transformation that
579	   generates redundant or repair packets sent out as a Redundancy RTP
580	   Stream (Section 2.1.12) to mitigate network transport impairments,
581	   like packet loss and delay.

583	   RTP-based Redundancy exists in many flavors.  It may generate
584	   independent Repair Streams that are used in addition to the Source
585	   Stream (like RTP Retransmission (Section 3.10) and some special
586	   types of Forward Error Correction, like RTP stream duplication
587	   (Section 3.8)), it may generate a new Source Stream by combining
588	   redundancy information with source information (using XOR FEC
589	   (Section 3.11) as a redundancy payload (Section 3.9)), or it may
590	   completely replace the source information with only redundancy
591	   packets.

593	2.1.12.  Redundancy RTP Stream

595	   An RTP Stream (Section 2.1.10) that contains no original source data,
596	   only redundant data, which may either be used standalone or be
597	   combined with one or more Received RTP Streams (Section 2.1.19) to
598	   produce Repaired RTP Streams (Section 2.1.22).

600	2.1.13.  Media Transport

602	   A Media Transport defines the transformation that the RTP Streams
603	   (Section 2.1.10) are subjected to by the end-to-end transport from
604	   one RTP sender to one specific RTP receiver (an RTP Session
605	   (Section 2.2.2) may contain multiple RTP receivers per sender).  Each
606	   Media Transport is defined by a transport association that is
607	   normally identified by a 5-tuple (source address, source port,
608	   destination address, destination port, transport protocol), but a
609	   proposal exists for sending multiple transport associations on a
610	   single 5-tuple [I-D.westerlund-avtcore-transport-multiplexing].

612	   Characteristics:

614	   o  Media Transport transmits RTP Streams of RTP Packets from a source
615	      transport address to a destination transport address.

617	   o  Each Media Transport contains only a single RTP Session.

619	   o  A single RTP Session can span multiple Media Transports.
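
A minimal sketch of these relationships follows, assuming the normal
case in which a Media Transport is identified by its 5-tuple (the
multiplexing proposal mentioned above is not modeled).  The names
FiveTuple and TransportRegistry, and the use of a string as RTP Session
identifier, are hypothetical.

```python
from typing import Dict, List, NamedTuple


class FiveTuple(NamedTuple):
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str

    def reverse(self) -> "FiveTuple":
        """5-tuple of the opposite direction."""
        return FiveTuple(self.dst_addr, self.dst_port,
                         self.src_addr, self.src_port, self.protocol)


class TransportRegistry:
    """Hypothetical registry mapping each Media Transport to its RTP Session."""

    def __init__(self) -> None:
        self._session_of: Dict[FiveTuple, str] = {}

    def bind(self, transport: FiveTuple, session_id: str) -> None:
        # Each Media Transport contains only a single RTP Session.
        current = self._session_of.get(transport)
        if current is not None and current != session_id:
            raise ValueError("a Media Transport carries only one RTP Session")
        self._session_of[transport] = session_id

    def transports_of(self, session_id: str) -> List[FiveTuple]:
        # A single RTP Session can span multiple Media Transports.
        return [t for t, s in self._session_of.items() if s == session_id]
```

Binding both a 5-tuple and its reverse to the same session identifier
models one RTP Session that spans two Media Transports, one per
direction.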

621	   The Media Transport concept sometimes needs to be decomposed into
622	   more steps to enable discussion of what a sender emits that gets
623	   transformed by the network before it is received by the receiver.
624	   Thus, we also provide this Media Transport decomposition (Figure 5).

626	                               RTP Stream
627	                                    |
628	                                    V
629	                       +--------------------------+
630	                       |  Media Transport Sender  |
631	                       +--------------------------+
632	                                    |
633	                             Sent RTP Stream
634	                                    V
635	                       +--------------------------+
636	                       |    Network Transport     |
637	                       +--------------------------+
638	                                    |
639	                         Transported RTP Stream
640	                                    V
641	                       +--------------------------+
642	                       | Media Transport Receiver |
643	                       +--------------------------+
644	                                    |
645	                                    V
646	                           Received RTP Stream

648	                Figure 5: Decomposition of Media Transport

650	2.1.14.  Media Transport Sender

652	   The first transformation within the Media Transport (Section 2.1.13)
653	   is the Media Transport Sender.  The sending Endpoint (Section 2.2.1)
654	   takes an RTP Stream and emits the packets onto the network using the
655	   transport association established for this Media Transport, thereby
656	   creating a Sent RTP Stream (Section 2.1.15).  In the process, it
657	   transforms the RTP Stream in several ways.  First, it generates the
658	   necessary protocol headers for the transport association, for example
659	   IP and UDP headers, thus forming IP/UDP/RTP packets.  In addition,
660	   the Media Transport Sender may queue, pace or otherwise affect how
661	   the packets are emitted onto the network, thereby potentially
662	   introducing delay, jitter and inter-packet spacing that characterize
663	   the Sent RTP Stream.

665	2.1.15.  Sent RTP Stream

667	   The Sent RTP Stream is the RTP Stream as it enters the first hop of
668	   the network path to its destination.  The Sent RTP Stream is
669	   identified using network transport addresses, like for IP/UDP the
670	   5-tuple (source IP address, source port, destination IP address,
671	   destination port, and protocol (UDP)).

673	2.1.16.  Network Transport

675	   Network Transport is the transformation that subjects the Sent RTP
676	   Stream (Section 2.1.15) to traveling from the source to the
677	   destination through the network.  This transformation can result in
678	   loss of some packets, varying delay on a per packet basis, packet
679	   duplication, and packet header or data corruption.  This
680	   transformation produces a Transported RTP Stream (Section 2.1.17) at
681	   the exit of the network path.
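
The effect of this transformation can be illustrated with a toy
simulation.  This is a sketch under stated assumptions, not a network
model: the function name network_transport, the 20 ms sending interval,
the uniform delay distribution and all parameter names are hypothetical,
and packet corruption is left out for brevity.

```python
import random
from typing import List, Tuple

Packet = Tuple[int, bytes]          # (RTP sequence number, payload)


def network_transport(sent: List[Packet], loss: float, dup: float,
                      max_delay_ms: float, seed: int = 0
                      ) -> List[Tuple[float, Packet]]:
    """Return (arrival_time_ms, packet) pairs ordered by arrival time."""
    rng = random.Random(seed)       # seeded so that runs are repeatable
    arrivals = []
    for index, packet in enumerate(sent):
        send_time = index * 20.0    # assumption: one packet every 20 ms
        if rng.random() < loss:
            continue                # packet lost in the network
        copies = 2 if rng.random() < dup else 1     # packet duplication
        for _ in range(copies):
            delay = rng.uniform(0.0, max_delay_ms)  # varying per-packet delay
            arrivals.append((send_time + delay, packet))
    arrivals.sort(key=lambda item: item[0])
    return arrivals
```

Because the delay is drawn per packet, the returned Transported RTP
Stream may be reordered relative to the Sent RTP Stream whenever
max_delay_ms exceeds the sending interval.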

683	2.1.17.  Transported RTP Stream

685	   The RTP Stream that is emitted out of the network path at the
686	   destination, subjected to the Network Transport's transformation
687	   (Section 2.1.16).

689	2.1.18.  Media Transport Receiver

691	   The receiver Endpoint's (Section 2.2.1) transformation of the
692	   Transported RTP Stream (Section 2.1.17) by its reception process,
693	   which results in the Received RTP Stream (Section 2.1.19).  This
694	   transformation includes transport checksums being verified.  Sensible
695	   system designs typically either discard packets with mismatching
696	   checksums, or pass them on while somehow marking them in the
697	   resulting Received RTP Stream to alert subsequent transformations
698	   about the possible corrupt state.  In this context it is worth noting
699	   that there is typically some probability for corrupt packets to pass
700	   through undetected (with a seemingly correct checksum).  Other
701	   transformations can compensate for delay variations in receiving a
702	   packet on the network interface and providing it to the application
703	   (de-jitter buffer).
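
The two checksum-handling strategies described above can be sketched as
follows.  The checksum function is the 16-bit one's complement sum
commonly used by Internet transport protocols; as a simplification, it
is computed over the payload only, whereas a real UDP checksum also
covers a pseudo-header and the UDP header.  The names Received, receive
and discard_corrupt are hypothetical.

```python
from typing import List, NamedTuple, Tuple


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"             # pad to a whole number of 16-bit words
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:              # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


class Received(NamedTuple):
    payload: bytes
    suspect: bool                   # True if the checksum did not match


def receive(datagrams: List[Tuple[bytes, int]],
            discard_corrupt: bool = True) -> List[Received]:
    """Build the Received RTP Stream from (payload, checksum) pairs."""
    stream = []
    for payload, checksum in datagrams:
        if internet_checksum(payload) == checksum:
            stream.append(Received(payload, False))
        elif not discard_corrupt:
            stream.append(Received(payload, True))  # pass on, but marked
    return stream
```

This checksum also shows how corruption can pass undetected: swapping
two aligned 16-bit words of the payload leaves the sum, and therefore
the checksum, unchanged.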

705	2.1.19.  Received RTP Stream

707	   The RTP Stream (Section 2.1.10) resulting from the Media Transport's
708	   transformation, i.e. subjected to packet loss, packet corruption,
709	   packet duplication and varying transmission delay from sender to
710	   receiver.

712	2.1.20.  Received Redundancy RTP Stream

714	   The Redundancy RTP Stream (Section 2.1.12) resulting from the Media
715	   Transport transformation, i.e. subjected to packet loss, packet
716	   corruption, and varying transmission delay from sender to receiver.

718	2.1.21.  RTP-based Repair

720	   RTP-based Repair is a Transformation that takes as input zero or more
721	   Received RTP Streams (Section 2.1.19) and one or more Received
722	   Redundancy RTP Streams (Section 2.1.20), and produces one or more
723	   Repaired RTP Streams (Section 2.1.22) that are as close to the
724	   corresponding sent Source RTP Streams (Section 2.1.10) as possible,
725	   using different RTP-based repair methods, for example the ones
726	   referred in RTP-based Redundancy (Section 2.1.11).
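
As one illustration of such a transformation, the sketch below repairs a
single lost packet using a plain XOR parity over a group of equal-length
payloads.  It is a deliberately simplified stand-in and not the packet
format of any specified FEC scheme; the names xor_bytes, make_parity and
repair are hypothetical.

```python
from functools import reduce
from typing import Dict, List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(payloads: List[bytes]) -> bytes:
    """Redundancy data: XOR of all (equal-length) source payloads."""
    return reduce(xor_bytes, payloads)


def repair(received: Dict[int, bytes], parity: Optional[bytes],
           group: List[int]) -> Dict[int, bytes]:
    """Recover at most one missing payload of `group` from the parity.

    `received` maps sequence numbers to payloads of the Received RTP
    Stream; `parity` is the received redundancy data, or None if lost.
    """
    missing = [seq for seq in group if seq not in received]
    repaired = dict(received)
    if parity is not None and len(missing) == 1:
        present = [received[seq] for seq in group if seq in received]
        # XOR of the parity with every received payload yields the lost one.
        repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired
```

With more than one loss in the group, or with the parity itself lost,
the function returns the input unchanged, so the Repaired RTP Stream is
then only as complete as the Received RTP Stream.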

728	2.1.22.  Repaired RTP Stream

730	   A Received RTP Stream (Section 2.1.19) for which Received Redundancy
731	   RTP Stream (Section 2.1.20) information has been used to try to
732	   recover the Source RTP Stream (Section 2.1.10) as it was before Media
733	   Transport (Section 2.1.13).

735	2.1.23.  Media Depacketizer

737	   A Media Depacketizer takes one or more RTP Streams (Section 2.1.10),
738	   depacketizes them, and attempts to reconstitute the Encoded Streams
739	   (Section 2.1.7) or Dependent Streams (Section 2.1.8) present in those
740	   RTP Streams.

742	   In practical implementations, the Media Depacketizer and the Media
743	   Decoder may be tightly coupled and share information to improve or
744	   optimize the overall decoding and error concealment process.  It is,
745	   however, not expected that there would be any benefit in defining a
746	   taxonomy for those detailed (and likely very implementation-
747	   dependent) steps.

749	2.1.24.  Received Encoded Stream

751	   The received version of an Encoded Stream (Section 2.1.7).

753	2.1.25.  Media Decoder

755	   A Media Decoder is a transformation that is responsible for decoding
756	   Encoded Streams (Section 2.1.7) and any Dependent Streams
757	   (Section 2.1.8) into a Source Stream (Section 2.1.5).

759	   In practical implementations, the Media Decoder and the Media
760	   Depacketizer may be tightly coupled and share information to improve
761	   or optimize the overall decoding process in various ways.  It is,
762	   however, not expected that there would be any benefit in defining a
763	   taxonomy for those detailed (and likely very implementation-
764	   dependent) steps.

766	   Characteristics:

768	   o  A Media Decoder has to deal with any errors in the Encoded Streams
769	      that resulted from corruption or failure to repair packet losses.
770	      Therefore, it commonly is robust to error and losses, and includes
771	      concealment methods.

773	2.1.26.  Received Source Stream

775	   The received version of a Source Stream (Section 2.1.5).

777	2.1.27.  Media Sink

779	   The Media Sink receives a Source Stream (Section 2.1.5) that
780	   contains, usually periodically, sampled media data together with
781	   associated synchronization information.  Depending on application,
782	   this Source Stream then needs to be transformed into a Raw Stream
783	   (Section 2.1.3) that is conveyed to the Media Render
784	   (Section 2.1.29), synchronized with the output from other Media
785	   Sinks.  The Media Sink may also be connected with a Media Source
786	   (Section 2.1.4) and be used as part of a conceptual Media Source.

788	   Characteristics:

790	   o  The Media Sink can further transform the Source Stream into a
791	      representation that is suitable for rendering on the Media Render
792	      as defined by the application or system-wide configuration.  This
793	      includes sample scaling, level adjustments, etc.

795	2.1.28.  Received Raw Stream

797	   The received version of a Raw Stream (Section 2.1.3).

799	2.1.29.  Media Render

801	   A Media Render takes a Raw Stream (Section 2.1.3) and converts it
802	   into Physical Stimulus (Section 2.1.1) that a human user can
803	   perceive.  Examples of such devices are screens, and D/A converters
804	   connected to amplifiers and loudspeakers.

806	   Characteristics:

808	   o  An Endpoint can potentially have multiple Media Renders for each
809	      media type.

811	2.2.  Communication Entities

813	   This section contains concepts for entities involved in the
814	   communication.

816	      +------------------------------------------------------------+
817	      | Communication Session                                      |
818	      |                                                            |
819	      | +----------------+                      +----------------+ |
820	      | | Participant A  |    +------------+    | Participant B  | |
821	      | |                |    | Multimedia |    |                | |
822	      | | +------------+ |<==>| Session    |<==>| +------------+ | |
823	      | | | Endpoint A | |    |            |    | | Endpoint B | | |
824	      | | |            | |    +------------+    | |            | | |
825	      | | | +----------+-+----------------------+-+----------+ | | |
826	      | | | | RTP      | |                      | |          | | | |
827	      | | | | Session  |-+---Media Transport----+>|          | | | |
828	      | | | | Audio    |<+---Media Transport----+-|          | | | |
829	      | | | |          | |          ^           | |          | | | |
830	      | | | +----------+-+----------|-----------+-+----------+ | | |
831	      | | |            | |          v           | |            | | |
832	      | | |            | | +-----------------+  | |            | | |
833	      | | |            | | | Synchronization |  | |            | | |
834	      | | |            | | |     Context     |  | |            | | |
835	      | | |            | | +-----------------+  | |            | | |
836	      | | |            | |          ^           | |            | | |
837	      | | | +----------+-+----------|-----------+-+----------+ | | |
838	      | | | | RTP      | |          v           | |          | | | |
839	      | | | | Session  |<+---Media Transport----+-|          | | | |
840	      | | | | Video    |-+---Media Transport----+>|          | | | |
841	      | | | |          | |                      | |          | | | |
842	      | | | +----------+-+----------------------+-+----------+ | | |
843	      | | +------------+ |                      | +------------+ | |
844	      | +----------------+                      +----------------+ |
845	      +------------------------------------------------------------+

847	    Figure 6: Example Point to Point Communication Session with two RTP
848	                                 Sessions

850	   Figure 6 shows a high-level example representation of a very basic
851	   point-to-point Communication Session between Participants A and B.
852	   It uses two different RTP Sessions, one for audio and one for video,
853	   between A's and B's Endpoints, using separate Media Transports for
854	   those RTP Sessions.  The Multimedia Session shared by the
855	   Participants can, for example, be established using SIP (i.e., there
856	   is a SIP Dialog between A and B).  The terms used in Figure 6 are
857	   further elaborated in the sub-sections below.

859	2.2.1.  Endpoint

861	   A single addressable entity sending or receiving RTP packets.  It may
862	   be decomposed into several functional blocks, but as long as it
863	   behaves as a single RTP stack entity it is classified as a single
864	   "Endpoint".

866	   Characteristics:

868	   o  Endpoints can be identified in several different ways.  While RTCP
869	      Canonical Names (CNAMEs) [RFC3550] provide a globally unique and
870	      stable identification mechanism for the duration of the
871	      Communication Session (see Section 2.2.5), their validity applies
872	      exclusively within a Synchronization Context (Section 3.1).  Thus
873	      one Endpoint can handle multiple CNAMEs, each of which can be
874	      shared among a set of Endpoints belonging to the same Participant
875	      (Section 2.2.3).  Therefore, mechanisms outside the scope of RTP,
876	      such as application defined mechanisms, must be used to ensure
877	      Endpoint identification when outside this Synchronization Context.

879	   o  An Endpoint can be associated with at most one Participant
880	      (Section 2.2.3) at any single point in time.

882	   o  In some contexts, an Endpoint would typically correspond to a
883	      single "host", for example a computer using a single network
884	      interface and being used by a single human user.  In other
885	      contexts, a single "host" can serve multiple Participants, in
886	      which case each Participant's Endpoint may share properties, for
887	      example the IP address part of a transport address.

889	2.2.2.  RTP Session

891	   An RTP Session is an association among a group of Participants
892	   communicating with RTP.  It is a group communications channel which
893	   can potentially carry a number of RTP Streams.  Within an RTP
894	   Session, every Participant can find meta-data and control information
895	   (over RTCP) about all the RTP Streams in the RTP Session.  The
896	   bandwidth of the RTCP control channel is shared between all
897	   Participants within an RTP Session.

899	   Characteristics:

901	   o  An RTP Session can carry one or more RTP Streams.

903	   o  An RTP Session shares a single SSRC space as defined in RFC3550
904	      [RFC3550].  That is, the Endpoints participating in an RTP Session
905	      can see an SSRC identifier transmitted by any of the other
906	      Endpoints.  An Endpoint can receive an SSRC either as SSRC or as a
907	      Contributing source (CSRC) in RTP and RTCP packets, as defined by
908	      the Endpoints' network interconnection topology.

910	   o  An RTP Session uses at least two Media Transports
911	      (Section 2.1.13), one for sending and one for receiving.
912	      Commonly, the receiving Media Transport is the reverse direction
913	      of the Media Transport used for sending.  An RTP Session may use
914	      many Media Transports and these define the session's network
915	      interconnection topology.

917	   o  A single Media Transport always carries a single RTP Session.

919	   o  Multiple RTP Sessions can be conceptually related, for example
920	      originating from or targeted for the same Participant
921	      (Section 2.2.3) or Endpoint (Section 2.2.1), or by containing RTP
922	      Streams that are somehow related (Section 3).

924	2.2.3.  Participant

926	   A Participant is an entity reachable by a single signaling address,
927	   and is thus related more to the signaling context than to the media
928	   context.

930	   Characteristics:

932	   o  A single signaling-addressable entity, using an application-
933	      specific signaling address space, for example a SIP URI.

935	   o  A Participant can participate in several Multimedia Sessions
936	      (Section 2.2.4).

938	   o  A Participant can be comprised of several associated Endpoints
939	      (Section 2.2.1).

941	2.2.4.  Multimedia Session

943	   A Multimedia Session is an association among a group of Participants
944	   (Section 2.2.3) engaged in the communication via one or more RTP
945	   Sessions (Section 2.2.2).  It defines logical relationships among
946	   Media Sources (Section 2.1.4) that appear in multiple RTP Sessions.

948	   Characteristics:

950	   o  A Multimedia Session can be composed of several RTP Sessions with
951	      potentially multiple RTP Streams per RTP Session.

953	   o  Each Participant in a Multimedia Session can have a multitude of
954	      Media Captures and Media Rendering devices.

956	   o  A single Multimedia Session can contain media from one or more
957	      Synchronization Contexts (Section 3.1).  An example of that is a
958	      Multimedia Session containing one set of audio and video for
959	      communication purposes belonging to one Synchronization Context,
960	      and another set of audio and video for presentation purposes (like
961	      playing a video file) with a separate Synchronization Context that
962	      has no strong timing relationship and need not be strictly
963	      synchronized with the audio and video used for communication.

965	2.2.5.  Communication Session

967	   A Communication Session is an association among two or more
968	   Participants (Section 2.2.3) communicating with each other via one or
969	   more Multimedia Sessions (Section 2.2.4).

971	   Characteristics:

973	   o  Each Participant in a Communication Session is identified via an
974	      application-specific signaling address.

976	   o  A Communication Session is composed of Participants that share at
977	      least one Multimedia Session, involving one or more parallel RTP
978	      Sessions with potentially multiple RTP Streams per RTP Session.

980	   For example, in a full mesh communication, the Communication Session
981	   consists of a set of separate Multimedia Sessions between each pair
982	   of Participants.  Another example is a centralized conference, where
983	   the Communication Session consists of a set of Multimedia Sessions
984	   between each Participant and the conference handler.

986	3.  Concepts of Inter-Relations

988	   This section uses the concepts from previous sections, and looks at
989	   different types of relationships among them.  These relationships
990	   occur at different abstraction levels and for different purposes, but
991	   the reason for the needed relationship at a certain step in the media
992	   handling chain may exist at another step.  For example, the use of
993	   Simulcast (Section 3.6) implies a need to determine relations at RTP
994	   Stream level, but the underlying reason is that multiple Media
995	   Encoders use the same Media Source, i.e. to be able to identify a
996	   common Media Source.

998	3.1.  Synchronization Context

1000	   A Synchronization Context defines a requirement on a strong timing
1001	   relationship between the Media Sources, typically requiring alignment
1002	   of clock sources.  Such a relationship can be identified in multiple
1003	   ways as listed below.  A single Media Source can only belong to a
1004	   single Synchronization Context, since it is assumed that a single
1005	   Media Source can only have a single media clock and requiring
1006	   alignment to several Synchronization Contexts (and thus reference
1007	   clocks) will effectively merge those into a single Synchronization
1008	   Context.

1010	3.1.1.  RTCP CNAME

1012	   RFC3550 [RFC3550] describes Inter-media synchronization between RTP
1013	   Sessions based on RTCP CNAME, RTP and Network Time Protocol (NTP)
1014	   [RFC5905] formatted timestamps of a reference clock.  As indicated in
1015	   [RFC7273], despite using NTP format timestamps, it is not required
1016	   that the clock be synchronized to an NTP source.
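
The timestamp mapping behind this mechanism can be sketched as follows.
An RTCP Sender Report pairs an NTP-format timestamp of the reference
clock with the corresponding RTP timestamp, so any RTP timestamp of that
RTP Stream can be converted to reference clock time.  The names
SenderReport, to_reference_time and can_synchronize are hypothetical;
carrying the CNAME and the clock rate in the same record is a
convenience of this sketch (the CNAME arrives in RTCP source
description items and the clock rate is known from the payload format
or signaling), and the NTP timestamp is simplified to a floating-point
number of seconds.

```python
from typing import NamedTuple


class SenderReport(NamedTuple):
    cname: str            # RTCP CNAME associated with the sending SSRC
    ntp_seconds: float    # reference clock time, simplified to seconds
    rtp_timestamp: int    # RTP timestamp corresponding to ntp_seconds
    clock_rate: int       # RTP clock rate of the stream, in Hz


def to_reference_time(sr: SenderReport, rtp_timestamp: int) -> float:
    """Map an RTP timestamp to the sender's reference clock."""
    # Signed 32-bit difference, so RTP timestamp wrap-around is tolerated.
    diff = (rtp_timestamp - sr.rtp_timestamp) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr.ntp_seconds + diff / sr.clock_rate


def can_synchronize(a: SenderReport, b: SenderReport) -> bool:
    """Streams with the same CNAME share a reference clock."""
    return a.cname == b.cname
```

A receiver that has mapped an audio and a video packet to the same
reference time can present them together, regardless of the two
streams' different RTP clock rates and random timestamp offsets.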

1018	3.1.2.  Clock Source Signaling

1020	   [RFC7273] provides a mechanism to signal the clock source in Session
1021	   Description Protocol (SDP) [RFC4566] both for the reference clock as
1022	   well as the media clock, thus allowing a Synchronization Context to
1023	   be defined beyond the one defined by the usage of CNAME source
1024	   descriptions.

1026	3.1.3.  Implicitly via RtcMediaStream

1028	   WebRTC defines "RtcMediaStream" with one or more
1029	   "RtcMediaStreamTracks".  All tracks in a "RtcMediaStream" are
1030	   intended to be synchronized when rendered, implying that they must be
1031	   generated such that synchronization is possible.

1033	3.1.4.  Explicitly via SDP Mechanisms

1035	   The SDP Grouping Framework [RFC5888] defines an m= line (Section 4.2)
1036	   grouping mechanism called "Lip Synchronization" (with LS
1037	   identification-tag) for establishing the synchronization requirement
1038	   across m= lines when they map to individual sources.

1040	   Source-Specific Media Attributes in SDP [RFC5576] extends the above
1041	   mechanism when multiple Media Sources are described by a single m=
1042	   line.

1044	3.2.  Endpoint

1046	   Some applications require knowledge of what Media Sources originate
1047	   from a particular Endpoint (Section 2.2.1).  This can include such
1048	   decisions as packet routing between parts of the topology, knowing
1049	   the Endpoint origin of the RTP Streams.

1051	   In RTP, this identification has been overloaded with the
1052	   Synchronization Context (Section 3.1) through the usage of the RTCP
1053	   source description CNAME (Section 3.1.1).  This works for some
1054	   usages, but in others it breaks down.  For example, if an Endpoint
1055	   has two sets of Media Sources that have different Synchronization
1056	   Contexts, like the audio and video of the human Participant as well
1057	   as a set of Media Sources of audio and video for a shared movie,
1058	   CNAME would not be an appropriate identification for that Endpoint.
1059	   Therefore, an Endpoint may have multiple CNAMEs.  The CNAMEs or the
1060	   Media Sources themselves can be related to the Endpoint.

1062	3.3.  Participant

1064	   In communication scenarios, it is commonly needed to know which Media
1065	   Sources originate from which Participant (Section 2.2.3).  One reason
1066	   is, for example, to enable the application to display Participant
1067	   Identity information correctly associated with the Media Sources.
1068	   This association is handled through the signaling solution to point
1069	   at a specific Multimedia Session where the Media Sources may be
1070	   explicitly or implicitly tied to a particular Endpoint.

1072	   Participant information becomes more problematic due to Media Sources
1073	   that are generated through mixing or other conceptual processing of
1074	   Raw Streams or Source Streams that originate from different
1075	   Participants.  This type of Media Sources can thus have a dynamically
1076	   varying set of origins and Participants.  RTP contains the concept of
1077	   CSRCs, which carry information about the previous-step origin of the
1078	   included media content at the RTP level.

1080	3.4.  RtcMediaStream

1082	   An RtcMediaStream in WebRTC is an explicit grouping of a set of Media
1083	   Sources (RtcMediaStreamTracks) that share a common identifier and a
1084	   single Synchronization Context (Section 3.1).

1086	3.5.  Multi-Channel Audio

1088	   There exist a number of RTP payload formats that can carry multi-
1089	   channel audio, despite the codec being a mono encoder.  Multi-channel
1090	   audio can be viewed as multiple Media Sources sharing a common
1091	   Synchronization Context.  These are independently encoded by a Media
1092	   Encoder and the different Encoded Streams are packetized together in
1093	   a time-synchronized way into a single Source RTP Stream, using the
1094	   codec's RTP Payload format.  Examples of codecs that support
1095	   multi-channel audio are PCMA and PCMU [RFC3551], AMR [RFC4867], and
1096	   G.719 [RFC5404].

1098	3.6.  Simulcast

1100	   A Media Source represented as multiple independent Encoded Streams
1101	   constitutes a Simulcast [I-D.ietf-mmusic-sdp-simulcast] or Multiple
1102	   Description Coding (MDC) of that Media Source.  Figure 7 shows an
1103	   example of a Media Source that is encoded into three separate
1104	   Simulcast streams that are in turn sent on the same Media Transport
1105	   flow.  When using Simulcast, the RTP Streams may share an RTP Session
1106	   and Media Transport, be separated on different RTP Sessions and Media
1107	   Transports, or use any combination of the two.  Other considerations
1108	   affect which usage is desirable, as discussed in Section 3.12.

1110	                            +----------------+
1111	                            |  Media Source  |
1112	                            +----------------+
1113	                     Source Stream  |
1114	             +----------------------+----------------------+
1115	             |                      |                      |
1116	             V                      V                      V
1117	    +------------------+   +------------------+   +------------------+
1118	    |  Media Encoder   |   |  Media Encoder   |   |  Media Encoder   |
1119	    +------------------+   +------------------+   +------------------+
1120	             | Encoded              | Encoded              | Encoded
1121	             | Stream               | Stream               | Stream
1122	             V                      V                      V
1123	    +------------------+   +------------------+   +------------------+
1124	    | Media Packetizer |   | Media Packetizer |   | Media Packetizer |
1125	    +------------------+   +------------------+   +------------------+
1126	             | Source               | Source               | Source
1127	             | RTP                  | RTP                  | RTP
1128	             | Stream               | Stream               | Stream
1129	             +-----------------+    |    +-----------------+
1130	                               |    |    |
1131	                               V    V    V
1132	                          +-------------------+
1133	                          |  Media Transport  |
1134	                          +-------------------+

1136	                Figure 7: Example of Media Source Simulcast

1138	   The Simulcast relation between the RTP Streams is the common Media
1139	   Source.  In addition to being able to identify the common Media
1140	   Source, a receiver of the RTP Stream may need to know which
1141	   configuration or encoding goals lay behind the produced Encoded
1142	   Stream and its properties.  This enables selection of the stream
1143	   that is most useful in the application at that moment.
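
One possible selection policy is sketched below, assuming that the
receiver has learned, by some means outside this sketch, the common
Media Source and a few encoding properties of each Simulcast stream.
The record SimulcastStream, its fields, and the policy of picking the
highest bitrate that fits a budget are all hypothetical; an application
may well select on other grounds.

```python
from typing import Iterable, NamedTuple, Optional


class SimulcastStream(NamedTuple):
    ssrc: int
    media_source: str     # hypothetical label of the common Media Source
    height: int           # encoded picture height, in pixels
    bitrate_kbps: int     # target bitrate of the encoding


def select_stream(streams: Iterable[SimulcastStream], media_source: str,
                  budget_kbps: int) -> Optional[SimulcastStream]:
    """Pick the highest-bitrate encoding of a source that fits the budget."""
    candidates = [s for s in streams
                  if s.media_source == media_source
                  and s.bitrate_kbps <= budget_kbps]
    return max(candidates, key=lambda s: s.bitrate_kbps, default=None)
```

The function returns None when no encoding of the requested Media
Source fits the budget, leaving the fallback to the application.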

1145	3.7.  Layered Multi-Stream

1147	   Layered Multi-Stream (LMS) is a mechanism by which different portions
1148	   of a layered or scalable encoding of a Source Stream are sent using
1149	   separate RTP Streams (sometimes in separate RTP Sessions).  LMSs are
1150	   useful for receiver control of layered media.

1152	   A Media Source represented as an Encoded Stream and multiple
1153	   Dependent Streams constitutes a Media Source that has layered
1154	   dependencies.  Figure 8 represents an example of a Media Source that
1155	   is encoded into three dependent layers, where two layers are sent on
1156	   the same Media Transport using different RTP Streams, i.e. SSRCs, and
1157	   the third layer is sent on a separate Media Transport.

1159	                            +----------------+
1160	                            |  Media Source  |
1161	                            +----------------+
1162	                                    |
1163	                                    |
1164	                                    V
1165	       +---------------------------------------------------------+
1166	       |                      Media Encoder                      |
1167	       +---------------------------------------------------------+
1168	               |                    |                     |
1169	        Encoded Stream       Dependent Stream     Dependent Stream
1170	               |                    |                     |
1171	               V                    V                     V
1172	       +----------------+   +----------------+   +----------------+
1173	       |Media Packetizer|   |Media Packetizer|   |Media Packetizer|
1174	       +----------------+   +----------------+   +----------------+
1175	               |                    |                     |
1176	          RTP Stream           RTP Stream            RTP Stream
1177	               |                    |                     |
1178	               +------+      +------+                     |
1179	                      |      |                            |
1180	                      V      V                            V
1181	                +-----------------+              +-----------------+
1182	                | Media Transport |              | Media Transport |
1183	                +-----------------+              +-----------------+

1185	           Figure 8: Example of Media Source Layered Dependency

1187	   It is sometimes useful to make a distinction between using a single
1188	   Media Transport or multiple separate Media Transports when (in both
1189	   cases) using multiple RTP Streams to carry Encoded Streams and
1190	   Dependent Streams for a Media Source.  Therefore, the following new
1191	   terminology is defined here:

1193	   SRST:  Single RTP Stream on a Single Media Transport

1195	   MRST:  Multiple RTP Streams on a Single Media Transport

1197	   MRMT:  Multiple RTP Streams on Multiple Media Transports

1199	   MRST and MRMT relations need to identify the common Media Encoder
1200	   origin for the Encoded and Dependent Streams.  When using different
1201	   RTP Sessions, thus different Media Transports, and as long as there
1202	   is only one RTP Stream per Media Encoder and a single Media Source in
1203	   each RTP Session (MRMT), common SSRCs and CNAMEs can be used to
1204	   identify the common Media Source.  When multiple RTP Streams are sent
1205	   from one Media Encoder in the same RTP Session (MRST), then CNAME is
1206	   the only currently specified RTP identifier that can be used.  In
1207	   cases where multiple Media Encoders use multiple Media Sources
1208	   sharing Synchronization Context, and thus having a common CNAME,
1209	   additional heuristics or identification need to be applied to create
1210	   the MRST or MRMT relationships between the RTP Streams.
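
   As an illustration of the three terms above, the following Python
   sketch classifies a Media Source's RTP Streams by how they map onto
   Media Transports.  The function name and the string labels used for
   streams and transports are hypothetical and chosen only for this
   example; they are not defined by this memo.

```python
def classify_stream_relation(stream_transports):
    """Classify how one Media Source's RTP Streams map onto Media Transports.

    stream_transports maps a (hypothetical) RTP Stream label to the label
    of the Media Transport that carries it.
    """
    if not stream_transports:
        raise ValueError("at least one RTP Stream is required")
    n_streams = len(stream_transports)
    n_transports = len(set(stream_transports.values()))
    if n_streams == 1:
        return "SRST"   # Single RTP Stream on a Single Media Transport
    if n_transports == 1:
        return "MRST"   # Multiple RTP Streams on a Single Media Transport
    return "MRMT"       # Multiple RTP Streams on Multiple Media Transports


# Three RTP Streams spread over two Media Transports, as in Figure 8.
figure_8 = {"stream-1": "transport-1",
            "stream-2": "transport-1",
            "stream-3": "transport-2"}
print(classify_stream_relation(figure_8))
```

   The sketch only counts streams and transports; it does not attempt
   the identification of the common Media Encoder discussed above.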

1212	3.8.  RTP Stream Duplication

1214	   RTP Stream Duplication [RFC7198], using the same or different Media
1215	   Transports, and optionally also delaying the duplicate [RFC7197],
1216	   offers a simple way to protect media flows from packet loss in some
1217	   cases (see Figure 9).  It is a specific type of redundancy and all
1218	   but one Source RTP Stream (Section 2.1.10) are effectively Redundancy
1219	   RTP Streams (Section 2.1.12), but since both Source and Redundancy
1220	   RTP Streams are the same, it does not matter which one is which.
1221	   This can also be seen as a specific type of Simulcast (Section 3.6)
1222	   that transmits the same Encoded Stream (Section 2.1.7) multiple times.

1224	                            +----------------+
1225	                            |  Media Source  |
1226	                            +----------------+
1227	                     Source Stream  |
1228	                                    V
1229	                            +----------------+
1230	                            | Media Encoder  |
1231	                            +----------------+
1232	                    Encoded Stream  |
1233	                        +-----------+-----------+
1234	                        |                       |
1235	                        V                       V
1236	               +------------------+    +------------------+
1237	               | Media Packetizer |    | Media Packetizer |
1238	               +------------------+    +------------------+
1239	                 Source | RTP Stream     Source | RTP Stream
1240	                        |                       V
1241	                        |                +-------------+
1242	                        |                | Delay (opt) |
1243	                        |                +-------------+
1244	                        |                       |
1245	                        +-----------+-----------+
1246	                                    |
1247	                                    V
1248	                          +-------------------+
1249	                          |  Media Transport  |
1250	                          +-------------------+

1252	                Figure 9: Example of RTP Stream Duplication
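
   The receiver side of duplication can be sketched as follows, in
   Python.  This is a minimal illustration that assumes the duplicated
   RTP Streams have already been associated with each other and that
   sequence numbers do not wrap during the interval considered; the
   function name and the tuple layout are hypothetical.

```python
def merge_duplicates(received):
    """Merge packets received from a Source RTP Stream and its duplicate.

    received is an iterable of (sequence_number, payload) tuples in
    arrival order.  Each sequence number is kept once (first arrival
    wins) and payloads are returned in sequence-number order.
    """
    first_seen = {}
    for seq, payload in received:
        first_seen.setdefault(seq, payload)
    return [first_seen[seq] for seq in sorted(first_seen)]


# The original stream lost packet 2; the delayed duplicate lost packet 3.
arrivals = [(1, b"a"), (3, b"c"), (1, b"a"), (2, b"b")]
print(merge_duplicates(arrivals))
```

   Because both copies carry the same data, the sketch does not need to
   know which of the two was the "source" and which the "duplicate".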

1254	3.9.  Redundancy Format

1256	   The RTP Payload for Redundant Audio Data [RFC2198] defines a
1257	   transport for redundant audio data together with primary data in the
1258	   same RTP payload.  The redundant data can be a time-delayed version
1259	   of the primary or another time-delayed Encoded Stream using a
1260	   different Media Encoder to encode the same Media Source as the
1261	   primary, as depicted in Figure 10.

1263	              +--------------------+
1264	              |    Media Source    |
1265	              +--------------------+
1266	                        |
1267	                   Source Stream
1268	                        |
1269	                        +------------------------+
1270	                        |                        |
1271	                        V                        V
1272	              +--------------------+   +--------------------+
1273	              |   Media Encoder    |   |   Media Encoder    |
1274	              +--------------------+   +--------------------+
1275	                        |                        |
1276	                        |                 +------------+
1277	                  Encoded Stream          | Time Delay |
1278	                        |                 +------------+
1279	                        |                        |
1280	                        |     +------------------+
1281	                        V     V
1282	              +--------------------+
1283	              |  Media Packetizer  |
1284	              +--------------------+
1285	                        |
1286	                        V
1287	                   RTP Stream

1289	   Figure 10: Concept for usage of Audio Redundancy with different Media
1290	                                 Encoders

1292	   The Redundancy format is thus providing the necessary meta
1293	   information to correctly relate different parts of the same Encoded
1294	   Stream, or in the case depicted above (Figure 10) relate the Received
1295	   Source Stream fragments coming out of different Media Decoders to be
1296	   able to combine them together into a less erroneous Source Stream.
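
   The meta information mentioned above can be illustrated with a small
   Python sketch of the RFC 2198 block headers: each redundant block is
   preceded by a 4-octet header (F bit set, 7-bit block payload type,
   14-bit timestamp offset, 10-bit block length), and the primary block
   by a 1-octet header (F bit clear, 7-bit payload type).  The function
   names are hypothetical, and the sketch handles only the RTP payload,
   not the RTP header.

```python
import struct


def pack_red(redundant, primary):
    """Build a redundant-audio payload.

    redundant: list of (payload_type, timestamp_offset, data) tuples.
    primary:   (payload_type, data) tuple.
    """
    headers = b""
    body = b""
    for pt, ts_offset, data in redundant:
        if not (0 <= pt < 128 and 0 <= ts_offset < 2 ** 14
                and len(data) < 2 ** 10):
            raise ValueError("field out of range for a redundant block")
        word = (1 << 31) | (pt << 24) | (ts_offset << 10) | len(data)
        headers += struct.pack("!I", word)
        body += data
    pt, data = primary
    headers += bytes([pt & 0x7F])        # F bit clear: last header
    return headers + body + data


def unpack_red(payload):
    """Return (redundant_blocks, primary_block) from a payload."""
    specs = []
    pos = 0
    while payload[pos] & 0x80:           # F bit set: more headers follow
        (word,) = struct.unpack_from("!I", payload, pos)
        specs.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF,
                      word & 0x3FF))
        pos += 4
    primary_pt = payload[pos] & 0x7F
    pos += 1
    blocks = []
    for pt, ts_offset, length in specs:
        blocks.append((pt, ts_offset, payload[pos:pos + length]))
        pos += length
    return blocks, (primary_pt, payload[pos:])


packet = pack_red([(5, 160, b"old")], (0, b"new"))
print(unpack_red(packet))
```

   The timestamp offset in each redundant block header is what lets a
   receiver place that block relative to the primary data.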

1298	3.10.  RTP Retransmission

1300	   Figure 11 shows an example where a Media Source's Source RTP Stream
1301	   is protected by a retransmission (RTX) flow [RFC4588].  In this
1302	   example the Source RTP Stream and the Redundancy RTP Stream share the
1303	   same Media Transport.

1305	          +--------------------+
1306	          |    Media Source    |
1307	          +--------------------+
1308	                    |
1309	                    V
1310	          +--------------------+
1311	          |   Media Encoder    |
1312	          +--------------------+
1313	                    |                              Retransmission
1314	              Encoded Stream     +--------+     +---- Request
1315	                    V            |        V     V
1316	          +--------------------+ | +--------------------+
1317	          |  Media Packetizer  | | | RTP Retransmission |
1318	          +--------------------+ | +--------------------+
1319	                    |            |           |
1320	                    +------------+  Redundancy RTP Stream
1321	             Source RTP Stream               |
1322	                    |                        |
1323	                    +---------+    +---------+
1324	                              |    |
1325	                              V    V
1326	                       +-----------------+
1327	                       | Media Transport |
1328	                       +-----------------+

1330	          Figure 11: Example of Media Source Retransmission Flows

1332	   The RTP Retransmission example (Figure 11) illustrates that this
1333	   mechanism works purely on the Source RTP Stream.  The RTP
1334	   Retransmission transform buffers the sent Source RTP Stream and, upon
1335	   request, emits a retransmitted packet with an extra payload header as
1336	   a Redundancy RTP Stream.  The RTP Retransmission mechanism [RFC4588]
1337	   is specified such that there is a one-to-one relation between the
1338	   Source RTP Stream and the Redundancy RTP Stream.  Therefore, a
1339	   Redundancy RTP Stream needs to be associated with its Source RTP
1340	   Stream.  This is done based on CNAME selectors and heuristics to
1341	   match requested packets for a given Source RTP Stream with the
1342	   original sequence number in the payload of any new Redundancy RTP
1343	   Stream using the RTX payload format.  In cases where the Redundancy
1344	   RTP Stream is sent in a separate RTP Session from the Source RTP
1345	   Stream, these sessions are related, which is signaled by using the
1346	   SDP Media Grouping's [RFC5888] Flow Identification (FID
1347	   identification-tag) semantics.
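
   The "extra payload header" mentioned above is, in the RTX payload
   format [RFC4588], a 2-octet field carrying the original sequence
   number (OSN), followed by the original RTP payload.  The Python
   sketch below illustrates only that payload layout together with a
   simple send buffer; the class and function names are hypothetical,
   and the buffer is unbounded for brevity, which a real implementation
   would not be.

```python
import struct


class RetransmissionBuffer:
    """Keeps sent Source RTP Stream payloads, keyed by sequence number."""

    def __init__(self):
        self._sent = {}

    def store(self, seq, payload):
        self._sent[seq & 0xFFFF] = payload

    def retransmit(self, seq):
        """Return an RTX payload for seq, or None if it is not buffered."""
        payload = self._sent.get(seq & 0xFFFF)
        if payload is None:
            return None
        # OSN (original sequence number) first, then the original payload.
        return struct.pack("!H", seq & 0xFFFF) + payload


def parse_rtx_payload(rtx_payload):
    """Split an RTX payload into (original_sequence_number, payload)."""
    if len(rtx_payload) < 2:
        raise ValueError("RTX payload shorter than the 2-octet OSN field")
    (osn,) = struct.unpack_from("!H", rtx_payload, 0)
    return osn, rtx_payload[2:]


buf = RetransmissionBuffer()
buf.store(1000, b"frame")
rtx = buf.retransmit(1000)
print(parse_rtx_payload(rtx))
```

   A receiver can compare the recovered OSN with the sequence numbers it
   requested, which is the matching step described in the text above.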

1349	3.11.  Forward Error Correction

1351	   Figure 12 shows an example where two Media Sources' Source RTP
1352	   Streams are protected by Forward Error Correction (FEC).  Source RTP
1353	   Stream A has an RTP-based Redundancy transformation in FEC Encoder 1.
1354	   This produces a Redundancy RTP Stream 1, that is only related to
1355	   Source RTP Stream A.  The FEC Encoder 2, however, takes two Source
1356	   RTP Streams (A and B) and produces a Redundancy RTP Stream 2 that
1357	   protects them jointly, i.e., Redundancy RTP Stream 2 relates to two
1358	   Source RTP Streams (a FEC group).  FEC decoding, when needed due to
1359	   packet loss or packet corruption at the receiver, requires knowledge
1360	   about which Source RTP Streams the FEC encoding was based on.

1362	   In Figure 12 all RTP Streams are sent on the same Media Transport.
1363	   This is however not the only possible choice.  Numerous combinations
1364	   exist for spreading these RTP Streams over different Media Transports
1365	   to achieve the communication application's goal.

1367	       +--------------------+                +--------------------+
1368	       |   Media Source A   |                |   Media Source B   |
1369	       +--------------------+                +--------------------+
1370	                 |                                     |
1371	                 V                                     V
1372	       +--------------------+                +--------------------+
1373	       |   Media Encoder A  |                |   Media Encoder B  |
1374	       +--------------------+                +--------------------+
1375	                 |                                     |
1376	           Encoded Stream                        Encoded Stream
1377	                 V                                     V
1378	       +--------------------+                +--------------------+
1379	       | Media Packetizer A |                | Media Packetizer B |
1380	       +--------------------+                +--------------------+
1381	                 |                                     |
1382	        Source RTP Stream A                   Source RTP Stream B
1383	                 |                                     |
1384	           +-----+---------+-------------+         +---+---+
1385	           |               V             V         V       |
1386	           |       +---------------+  +---------------+    |
1387	           |       | FEC Encoder 1 |  | FEC Encoder 2 |    |
1388	           |       +---------------+  +---------------+    |
1389	           |  Redundancy   |     Redundancy   |            |
1390	           |  RTP Stream 1 |     RTP Stream 2 |            |
1391	           V               V                  V            V
1392	       +----------------------------------------------------------+
1393	       |                    Media Transport                       |
1394	       +----------------------------------------------------------+

1396	             Figure 12: Example of FEC Redundancy RTP Streams

1398	   As FEC Encoding exists in various forms, the methods for relating FEC
1399	   Redundancy RTP Streams with their source information in Source RTP
1400	   Streams are many.  The XOR based RTP FEC Payload format [RFC5109] is
1401	   defined in such a way that a Redundancy RTP Stream has a one-to-one
1402	   relation with a Source RTP Stream.  In fact, the RFC requires the
1403	   Redundancy RTP Stream to use the same SSRC as the Source RTP Stream.
1404	   This requires the use of either a separate RTP Session or the
1405	   Redundancy RTP Payload format [RFC2198].  The underlying relation
1406	   requirement for this FEC format and a particular Redundancy RTP
1407	   Stream is to know the related Source RTP Stream, including its SSRC.
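
   Why the decoder needs to know which source packets the FEC was based
   on can be seen in a simplified XOR parity sketch, shown below in
   Python.  This is not the RFC 5109 packet format: it protects only
   payload bytes, assumes exactly one packet of the protected group is
   lost, and uses hypothetical function names.

```python
def xor_bytes(a, b):
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(payloads):
    """Compute one parity block over all payloads in the protected group."""
    parity = b""
    for payload in payloads:
        parity = xor_bytes(parity, payload)
    return parity


def recover_missing(received, parity):
    """Recover the single missing payload of the group.

    received must hold every other payload that went into the parity;
    this is the knowledge about the protected group that the receiver
    needs in order to decode.
    """
    missing = parity
    for payload in received:
        missing = xor_bytes(missing, payload)
    return missing


# A group that jointly protects packets from Source RTP Streams A and B.
group = [b"A-01", b"A-02", b"B-01"]
parity = make_parity(group)
print(recover_missing([b"A-01", b"B-01"], parity))
```

   With payloads of unequal length, the recovered block would carry
   trailing zero padding; real FEC formats also protect the length so
   that it can be removed.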

1409	3.12.  RTP Stream Separation

1411	   RTP Streams can be separated exclusively based on their SSRCs, at the
1412	   RTP Session level, or at the Multi-Media Session level.

1414	   When the RTP Streams that have a relationship are all sent in the
1415	   same RTP Session and are uniquely identified based on their SSRC
1416	   only, it is termed an SSRC-Only Based Separation.  Such streams can
1417	   be related via RTCP CNAME to identify that the streams belong to the
1418	   same Endpoint.  SSRC-based approaches [RFC5576], when used, can
1419	   explicitly relate various such RTP Streams.

1421	   On the other hand, when RTP Streams that are related are sent in
1422	   the context of different RTP Sessions to achieve separation, it is
1423	   known as RTP Session-based separation.  This is commonly used when
1424	   the different RTP Streams are intended for different Media
1425	   Transports.

1427	   Several mechanisms that use RTP Session-based separation rely on it
1428	   to enable an implicit grouping mechanism expressing the relationship.
1429	   The solutions have been based on using the same SSRC value in the
1430	   different RTP Sessions to implicitly indicate their relation.  That
1431	   way, no explicit RTP level mechanism has been needed, only signaling
1432	   level relations have been established using semantics from Grouping
1433	   of Media lines framework [RFC5888].  Examples of this are RTP
1434	   Retransmission [RFC4588], SVC Multi-Session Transmission [RFC6190]
1435	   and XOR Based FEC [RFC5109].  RTCP CNAME explicitly relates RTP
1436	   Streams across different RTP Sessions, as explained in the previous
1437	   section.  Such a relationship can be used to perform inter-media
1438	   synchronization.
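
   The implicit relation described above, where the same SSRC value is
   used in different RTP Sessions, can be sketched in Python as follows.
   The function name and the session labels are hypothetical, and the
   sketch assumes that the sessions passed in have already been related
   at the signaling level; otherwise an identical SSRC value could be a
   coincidence.

```python
from collections import defaultdict


def implicit_ssrc_relations(streams):
    """Find SSRC values that appear in more than one RTP Session.

    streams is an iterable of (rtp_session_label, ssrc) tuples.
    Returns {ssrc: sorted list of session labels}.
    """
    sessions_by_ssrc = defaultdict(set)
    for session, ssrc in streams:
        sessions_by_ssrc[ssrc].add(session)
    return {ssrc: sorted(sessions)
            for ssrc, sessions in sessions_by_ssrc.items()
            if len(sessions) > 1}


observed = [("source", 0x1234), ("rtx", 0x1234), ("source", 0x9999)]
print(implicit_ssrc_relations(observed))
```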

1440	   RTP Streams that are related and need to be associated can be part of
1441	   different Multimedia Sessions, rather than just different RTP
1442	   Sessions within the same Multimedia Session context.  This puts
1443	   further demand on the scope of the mechanism(s) and its handling of
1444	   identifiers used for expressing the relationships.

1446	3.13.  Multiple RTP Sessions over one Media Transport

1448	   [I-D.westerlund-avtcore-transport-multiplexing] describes a mechanism
1449	   that allows several RTP Sessions to be carried over a single
1450	   underlying Media Transport.  The main reasons for doing this are
1451	   related to the impact of using one or more Media Transports (using a
1452	   common network path or potentially having different ones).  The
1453	   fewer Media Transports used, the less need for NAT/FW traversal
1454	   resources and for flow-based Quality of Service (QoS).

1456	   However, Multiple RTP Sessions over one Media Transport imply that a
1457	   single Media Transport 5-tuple is not sufficient to express in which
1458	   RTP Session context a particular RTP Stream exists.  Complexities in
1459	   the relationship between Media Transports and RTP Sessions already
1460	   exist as one RTP Session contains multiple Media Transports, e.g.,
1461	   even a Peer-to-Peer RTP Session with RTP/RTCP Multiplexing requires
1462	   two Media Transports, one in each direction.  The relationship
1463	   between Media Transports and RTP Sessions as well as additional
1464	   levels of identifiers need to be considered in both signaling design
1465	   and when defining terminology.

1467	4.  Mapping from Existing Terms

1469	   This section describes a selected set of terms from some relevant
1470	   IETF RFCs and Internet-Drafts (at the time of writing), using the
1471	   concepts from previous sections.

1473	4.1.  Telepresence Terms

1475	   The terms in this sub-section are used in the context of CLUE
1476	   [I-D.ietf-clue-framework].

1478	4.1.1.  Audio Capture

1480	   Describes an audio Media Source (Section 2.1.4).

1482	4.1.2.  Capture Device

1484	   Identifies a physical entity performing a Media Capture
1485	   (Section 2.1.2) transformation.

1487	4.1.3.  Capture Encoding

1489	   Describes an Encoded Stream (Section 2.1.7) related to CLUE specific
1490	   semantic information.

1492	4.1.4.  Capture Scene

1494	   Describes a set of spatially related Media Sources (Section 2.1.4).

1496	4.1.5.  Endpoint

1498	   Describes exactly one Participant (Section 2.2.3) and one or more
1499	   Endpoints (Section 2.2.1).

1501	4.1.6.  Individual Encoding

1503	   Describes the configuration information needed to perform a Media
1504	   Encoder (Section 2.1.6) transformation.

1506	4.1.7.  Media Capture

1508	   Describes either a Media Capture (Section 2.1.2) or a Media Source
1509	   (Section 2.1.4), depending on in which context the term is used.

1511	4.1.8.  Media Consumer

1513	   Describes the media receiving part of an Endpoint (Section 2.2.1).

1515	4.1.9.  Media Provider

1517	   Describes the media sending part of an Endpoint (Section 2.2.1).

1519	4.1.10.  Stream

1521	   Describes an RTP Stream (Section 2.1.10).

1523	4.1.11.  Video Capture

1525	   Describes a video Media Source (Section 2.1.4).

1527	4.2.  Media Description

1529	   A single Session Description Protocol (SDP) [RFC4566] media
1530	   description (or media block; an m-line and all subsequent lines until
1531	   the next m-line or the end of the SDP) describes part of the
1532	   necessary configuration and identification information needed for a
1533	   Media Encoder transformation, as well as the necessary configuration
1534	   and identification information for the Media Decoder to be able to
1535	   correctly interpret a received RTP Stream.

1537	   A Media Description typically relates to a single Media Source.  This
1538	   is for example an explicit restriction in WebRTC.  However, nothing
1539	   prevents the same Media Description (and same RTP Session) from
1540	   being reused for multiple Media Sources
1541	   [I-D.ietf-avtcore-rtp-multi-stream].  It can thus describe properties
1542	   of one or more RTP Streams, and can also describe properties valid
1543	   for an entire RTP Session (via [RFC5576] mechanisms, for example).
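
   The extent of a media description, as described above (an m-line and
   all subsequent lines until the next m-line or the end of the SDP),
   can be sketched in Python as a simple line splitter.  The function
   name and the example SDP are illustrative assumptions; the sketch
   does not validate SDP syntax.

```python
def split_media_descriptions(sdp):
    """Split SDP text into session-level lines and media descriptions.

    Returns (session_lines, media), where media is a list of line lists,
    each starting with its m= line.
    """
    session_lines = []
    media = []
    for line in sdp.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("m="):
            media.append([line])         # a new media description begins
        elif media:
            media[-1].append(line)       # belongs to the latest m= line
        else:
            session_lines.append(line)   # before the first m= line
    return session_lines, media


EXAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 51372 RTP/AVP 96
a=rtpmap:96 H264/90000
"""

session_lines, media = split_media_descriptions(EXAMPLE_SDP)
print(len(session_lines), [block[0] for block in media])
```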

1545	4.3.  Media Stream

1547	   RTP [RFC3550] uses media stream, audio stream, video stream, and
1548	   stream of (RTP) packets interchangeably, which are all RTP Streams.

1550	4.4.  Multimedia Conference

1552	   A Multimedia Conference is a Communication Session (Section 2.2.5)
1553	   between two or more Participants (Section 2.2.3), along with the
1554	   software they are using to communicate.

1556	4.5.  Multimedia Session

1558	   SDP [RFC4566] defines a Multimedia Session as a set of multimedia
1559	   senders and receivers and the data streams flowing from senders to
1560	   receivers, which would correspond to a set of Endpoints and the RTP
1561	   Streams that flow between them.  In this memo, Multimedia Session
1562	   (Section 2.2.4) also assumes those Endpoints belong to a set of
1563	   Participants that are engaged in communication via a set of related
1564	   RTP Streams.

1566	   RTP [RFC3550] defines a Multimedia Session as a set of concurrent RTP
1567	   Sessions among a common group of Participants.  For example, a video
1568	   conference may contain an audio RTP Session and a video RTP Session.
1569	   This would correspond to a group of Participants (each using one or
1570	   more Endpoints) sharing a set of concurrent RTP Sessions.  In this
1571	   memo, Multimedia Session also defines those RTP Sessions to have some
1572	   relation and be part of a communication among the Participants.

1574	4.6.  Multipoint Control Unit (MCU)

1576	   This term is commonly used to describe the central node in any type
1577	   of star topology [I-D.ietf-avtcore-rtp-topologies-update] conference.
1578	   It describes a device that includes one Participant (Section 2.2.3)
1579	   (usually corresponding to a so-called conference focus) and one or
1580	   more related Endpoints (Section 2.2.1) (sometimes one or more per
1581	   conference Participant).

1583	4.7.  Multi-Session Transmission (MST)

1585	   One of two transmission modes defined in H.264 based SVC [RFC6190],
1586	   the other mode being SST (Section 4.13).  In Multi-Session
1587	   Transmission (MST), the SVC Media Encoder sends Encoded Streams and
1588	   Dependent Streams distributed across two or more RTP Streams in one
1589	   or more RTP Sessions.  The term "MST" is ambiguous in RFC 6190,
1590	   especially since the name indicates the use of multiple "sessions",
1591	   while MST type packetization is in fact required whenever two or more
1592	   RTP Streams are used for the Encoded and Dependent Streams,
1593	   regardless of whether those are sent in one or more RTP Sessions.
1594	   MST corresponds either to MRST or MRMT (Section 3.7) stream relations
1595	   defined in this specification.  The SVC RTP Payload RFC [RFC6190] is
1596	   not particularly explicit about how the common Media Encoder
1597	   (Section 2.1.6) relation between Encoded Streams (Section 2.1.7) and
1598	   Dependent Streams (Section 2.1.8) is to be implemented.

1600	4.8.  Recording Device

1602	   WebRTC specifications use this term to refer to locally available
1603	   entities performing a Media Capture (Section 2.1.2) transformation.

1605	4.9.  RtcMediaStream

1607	   A WebRTC RtcMediaStream is a set of Media Sources (Section 2.1.4)
1608	   sharing the same Synchronization Context (Section 3.1).

1610	4.10.  RtcMediaStreamTrack

1612	   A WebRTC RtcMediaStreamTrack is a Media Source (Section 2.1.4).

1614	4.11.  RTP Sender

1616	   RTP [RFC3550] uses this term, which can be seen as the RTP protocol
1617	   part of a Media Packetizer (Section 2.1.9).

1619	4.12.  RTP Session

1621	   Within the context of SDP, a single m= line can map to a single RTP
1622	   Session (Section 2.2.2) or multiple m= lines can map to a single RTP
1623	   Session.  The latter is enabled via multiplexing schemes such as
1624	   BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation], for example, which
1625	   allows mapping of multiple m= lines to a single RTP Session.
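
   The two mappings described above can be illustrated with a Python
   sketch that groups m= lines into RTP Sessions based on an
   "a=group:BUNDLE" line and "a=mid" attributes.  This is a
   simplification that assumes the BUNDLE group has been successfully
   negotiated and that every m= line outside a group forms its own RTP
   Session; the function name and the example SDP fragment are
   hypothetical.

```python
def rtp_sessions_from_sdp(sdp):
    """Return a list of RTP Sessions, each a list of m= line indexes."""
    bundles = []    # one set of mid values per a=group:BUNDLE line
    mids = []       # mid value (or None) for each m= line, in order
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=group:BUNDLE"):
            bundles.append(set(line.split()[1:]))
        elif line.startswith("m="):
            mids.append(None)
        elif line.startswith("a=mid:") and mids:
            mids[-1] = line[len("a=mid:"):]

    sessions = []
    session_of_bundle = {}
    for index, mid in enumerate(mids):
        owner = next((i for i, group in enumerate(bundles) if mid in group),
                     None)
        if owner is None:
            sessions.append([index])             # one m= line, one session
        elif owner in session_of_bundle:
            session_of_bundle[owner].append(index)
        else:
            session_of_bundle[owner] = [index]   # first m= line of a bundle
            sessions.append(session_of_bundle[owner])
    return sessions


BUNDLE_SDP = """a=group:BUNDLE audio video
m=audio 10000 RTP/AVP 0
a=mid:audio
m=video 10002 RTP/AVP 96
a=mid:video
m=video 10004 RTP/AVP 97
a=mid:slides
"""

print(rtp_sessions_from_sdp(BUNDLE_SDP))
```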

1627	4.13.  Single Session Transmission (SST)

1629	   One of two transmission modes defined in H.264 based SVC [RFC6190],
1630	   the other mode being MST (Section 4.7).  In Single Session
1631	   Transmission (SST), the SVC Media Encoder sends Encoded Streams
1632	   (Section 2.1.7) and Dependent Streams (Section 2.1.8) combined into a
1633	   single RTP Stream (Section 2.1.10) in a single RTP Session
1634	   (Section 2.2.2), using the SVC RTP Payload format.  The term "SST" is
1635	   ambiguous in RFC 6190, in that it sometimes refers to the use of a
1636	   single RTP Stream, like in sections relating to packetization, and
1637	   sometimes appears to refer to use of a single RTP Session, like in
1638	   the context of discussing SDP.  SST closely corresponds to SRST
1639	   (Section 3.7) defined in this specification.

1641	4.14.  SSRC

1643	   RTP [RFC3550] defines this as "the source of a stream of RTP
1644	   packets", which indicates that an SSRC is not only a unique
1645	   identifier for the Encoded Stream (Section 2.1.7) carried in those
1646	   packets, but is also effectively used as a term to denote a Media
1647	   Packetizer (Section 2.1.9).

1649	5.  Security Considerations

1651	   This document simply tries to clarify the confusion prevalent in RTP
1652	   taxonomy because of inconsistent usage by multiple technologies and
1653	   protocols making use of the RTP protocol.  It does not introduce any
1654	   new security considerations beyond those already well documented in
1655	   the RTP protocol [RFC3550] and each of the many respective
1656	   specifications of the various protocols making use of it.

1658	   Hopefully having a well-defined common terminology and understanding
1659	   of the complexities of the RTP architecture will help lead us to
1660	   better standards, avoiding security problems.

1662	6.  Acknowledgement

1664	   This document has many concepts borrowed from several documents such
1665	   as WebRTC [I-D.ietf-rtcweb-overview], CLUE [I-D.ietf-clue-framework],
1666	   and Multiplexing Architecture
1667	   [I-D.westerlund-avtcore-transport-multiplexing].  The authors would
1668	   like to thank all the authors of each of those documents.

1670	   The authors would also like to acknowledge the insights, guidance and
1671	   contributions of Magnus Westerlund, Roni Even, Paul Kyzivat, Colin
1672	   Perkins, Keith Drage, Harald Alvestrand, Alex Eleftheriadis, Mo
1673	   Zanaty, Stephan Wenger, and Bernard Aboba.

1675	7.  Contributors

1677	   Magnus Westerlund has contributed the concept model for the media
1678	   chain using transformations and streams model, including rewriting
1679	   pre-existing concepts into this model and adding missing concepts.
1680	   The first proposal for updating the relationships and the topologies
1681	   based on this concept was also performed by Magnus.

1683	8.  IANA Considerations

1685	   This document makes no request of IANA.

1687	9.  Informative References

1689	   [I-D.ietf-avtcore-rtp-multi-stream]
1690	              Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
1691	              "Sending Multiple Media Streams in a Single RTP Session",
1692	              draft-ietf-avtcore-rtp-multi-stream-06 (work in progress),
1693	              October 2014.

1695	   [I-D.ietf-avtcore-rtp-topologies-update]
1696	              Westerlund, M. and S. Wenger, "RTP Topologies", draft-
1697	              ietf-avtcore-rtp-topologies-update-06 (work in progress),
1698	              March 2015.

1700	   [I-D.ietf-clue-framework]
1701	              Duckworth, M., Pepperell, A., and S. Wenger, "Framework
1702	              for Telepresence Multi-Streams", draft-ietf-clue-
1703	              framework-21 (work in progress), March 2015.

1705	   [I-D.ietf-mmusic-sdp-bundle-negotiation]
1706	              Holmberg, C., Alvestrand, H., and C. Jennings,
1707	              "Negotiating Media Multiplexing Using the Session
1708	              Description Protocol (SDP)", draft-ietf-mmusic-sdp-bundle-
1709	              negotiation-17 (work in progress), March 2015.

1711	   [I-D.ietf-mmusic-sdp-simulcast]
1712	              Westerlund, M., Nandakumar, S., and M. Zanaty, "Using
1713	              Simulcast in SDP and RTP Sessions", draft-ietf-mmusic-sdp-
1714	              simulcast-00 (work in progress), January 2015.

1716	   [I-D.ietf-rtcweb-overview]
1717	              Alvestrand, H., "Overview: Real Time Protocols for
1718	              Browser-based Applications", draft-ietf-rtcweb-overview-13
1719	              (work in progress), November 2014.

1721	   [I-D.westerlund-avtcore-transport-multiplexing]
1722	              Westerlund, M. and C. Perkins, "Multiplexing Multiple RTP
1723	              Sessions onto a Single Lower-Layer Transport", draft-
1724	              westerlund-avtcore-transport-multiplexing-07 (work in
1725	              progress), October 2013.

1727	   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
1728	              Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
1729	              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
1730	              September 1997.

1732	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1733	              Jacobson, "RTP: A Transport Protocol for Real-Time
1734	              Applications", STD 64, RFC 3550, July 2003.

1736	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
1737	              Video Conferences with Minimal Control", STD 65, RFC 3551,
1738	              July 2003.

1740	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1741	              Description Protocol", RFC 4566, July 2006.

1743	   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
1744	              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
1745	              July 2006.

1747	   [RFC4867]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie,
1748	              "RTP Payload Format and File Storage Format for the
1749	              Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband
1750	              (AMR-WB) Audio Codecs", RFC 4867, April 2007.

1752	   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
1753	              Correction", RFC 5109, December 2007.

1755	   [RFC5404]  Westerlund, M. and I. Johansson, "RTP Payload Format for
1756	              G.719", RFC 5404, January 2009.

1758	   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
1759	              Media Attributes in the Session Description Protocol
1760	              (SDP)", RFC 5576, June 2009.

1762	   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
1763	              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

1765	   [RFC5905]  Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network
1766	              Time Protocol Version 4: Protocol and Algorithms
1767	              Specification", RFC 5905, June 2010.

1769	   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
1770	              "RTP Payload Format for Scalable Video Coding", RFC 6190,
1771	              May 2011.

1773	   [RFC7160]  Petit-Huguenin, M. and G. Zorn, "Support for Multiple
1774	              Clock Rates in an RTP Session", RFC 7160, April 2014.

1776	   [RFC7197]  Begen, A., Cai, Y., and H. Ou, "Duplication Delay
1777	              Attribute in the Session Description Protocol", RFC 7197,
1778	              April 2014.

1780	   [RFC7198]  Begen, A. and C. Perkins, "Duplicating RTP Streams", RFC
1781	              7198, April 2014.

1783	   [RFC7273]  Williams, A., Gross, K., van Brandenburg, R., and H.
1784	              Stokking, "RTP Clock Source Signalling", RFC 7273, June
1785	              2014.

1787	Appendix A.  Changes From Earlier Versions

1789	   NOTE TO RFC EDITOR: Please remove this section prior to publication.

1791	A.1.  Modifications Between WG Version -05 and -06

1793	   o  Clarified that a Redundancy RTP Stream can be used standalone to
1794	      generate Repaired RTP Streams.

1796	   o  Clarified that (in accordance with above) RTP-based Repair takes
1797	      zero or more Received RTP Streams and one or more Received
1798	      Redundancy RTP Streams as input.

1800	   o  Changed Figure 6 to more clearly show that Media Transport is
1801	      terminated in the Endpoint, not in the Participant.

1803	   o  Added a sentence to Endpoint section that clarifies there may be
1804	      contexts where a single "host" can serve multiple Participants,
1805	      making those Endpoints share some properties.

1807	   o  Merged previous section 3.5 on SST/MST with previous section 3.8
1808	      on Layered Multi-Stream into a common section discussing the
1809	      scalable/layered stream relation, and moved improved, descriptive
1810	      text on SST and MST to new sub-sections 4.7 and 4.13, describing
1811	      them as existing terms.

1813	   o  Editorial improvements.

1815	A.2.  Modifications Between WG Version -04 and -05

1817	   o  Editorial improvements.

1819	A.3.  Modifications Between WG Version -03 and -04

1821	   o  Changed "Media Redundancy" and "Media Repair" to "RTP-based
1822	      Redundancy" and "RTP-based Repair", since those terms are more
1823	      specific and correct.

1825	   o  Changed "End Point" to "Endpoint" and removed Editor's Note on
1826	      this.

1828	   o  Clarified that a Media Capture may impose constraints on clock
1829	      handling.

1831	   o  Clarified that mixing multiple Raw Streams into a Source Stream is
1832	      not possible, since that requires mixed streams to have a timing
1833	      relation, requiring them to be Source Streams, and added an
1834	      example.

1836	   o  Clarified that RTP-based Redundancy excludes the type of encoding
1837	      redundancy found within the encoded media format in an Encoded
1838	      Stream.

1840	   o  Clarified that a Media Transport contains only a single RTP
1841	      Session, but a single RTP Session can span multiple Media
1842	      Transports.

1844	   o  Clarified that packets with seemingly correct checksum that are
1845	      received by a Media Transport Receiver may still be corrupt.

1847	   o  Clarified that a corrupt packet in a Media Transport Receiver is
1848	      typically either discarded or somehow marked and passed on in the
1849	      Received RTP Stream.

1851	   o  Added Synchronization Context to Figure 6.

1853	   o  Editorial improvements and clarifications.

1855	A.4.  Modifications Between WG Version -02 and -03

1857	   o  Changed section 3.5, removing SST-SS/MS and MST-SS/MS, replacing
1858	      them with SRST, MRST, and MRMT.

1860	   o  Updated section 3.8 to align with terminology changes in section
1861	      3.5.

1863	   o  Added a new section 4.12, describing the term Multimedia
1864	      Conference.

1866	   o  Changed reference from I-D to now published RFC 7273.

1868	   o  Editorial improvements and clarifications.

1870	A.5.  Modifications Between WG Version -01 and -02

1872	   o  Major restructuring

1874	   o  Moved media chain Media Transport detailing up one section level

1875	   o  Collapsed level 2 sub-sections of section 3 and thus moved level 3
1876	      sub-sections up one level, gathering some introductory text into
1877	      the beginning of section 3

1879	   o  Added that not only SSRC collision, but also a clock rate change
1880	      [RFC7160] is a valid reason to change SSRC value for an RTP stream

1882	   o  Added a sub-section on clock source signaling

1884	   o  Added a sub-section on RTP stream duplication

1886	   o  Elaborated a bit in section 2.2.1 on the relation between End
1887	      Points, Participants and CNAMEs

1889	   o  Elaborated a bit in section 2.2.4 on Multimedia Session and
1890	      synchronization contexts

1892	   o  Removed the section on CLUE scenes defining an implicit
1893	      synchronization context, since it was incorrect

1895	   o  Clarified text on SVC SST and MST according to list discussions

1897	   o  Removed the entire topology section to avoid possible
1898	      inconsistencies or duplications with draft-ietf-avtcore-rtp-
1899	      topologies-update, but saved one example overview figure of
1900	      Communication Entities into that section

1902	   o  Added a section 4 on mapping from existing terms with one sub-
1903	      section per term, mainly by moving text from sections 2 and 3

1905	   o  Changed all occurrences of Packet Stream to RTP Stream

1907	   o  Moved all normative references to informative, since this is an
1908	      informative document

1910	   o  Added references to RFC 7160, RFC 7197 and RFC 7198, and removed
1911	      unused references

1913	A.6.  Modifications Between WG Version -00 and -01

1915	   o  WG version -00 text is identical to individual draft -03

1917	   o  Amended description of SVC SST and MST encodings with respect to
1918	      concepts defined in this text

1920	   o  Removed UML as normative reference, since the text no longer uses
1921	      any UML notation

1923	   o  Removed a number of level 4 sections and moved out text to the
1924	      level above

1926	A.7.  Modifications Between Version -02 and -03

1928	   o  Section 4 rewritten (and new communication topologies added) to
1929	      reflect the major updates to Sections 1-3

1931	   o  Section 8 removed (carryover from initial -00 draft)

1933	   o  General clean up of text, grammar and nits

1935	A.8.  Modifications Between Version -01 and -02

1937	   o  Section 2 rewritten to add both streams and transformations in the
1938	      media chain.

1940	   o  Section 3 rewritten to focus on exposing relationships.

1942	A.9.  Modifications Between Version -00 and -01

1944	   o  Too many to list

1946	   o  Added new authors

1948	   o  Updated content organization and presentation

1950	Authors' Addresses

1952	   Jonathan Lennox
1953	   Vidyo, Inc.
1954	   433 Hackensack Avenue
1955	   Seventh Floor
1956	   Hackensack, NJ  07601
1957	   US

1959	   Email: jonathan@vidyo.com

1961	   Kevin Gross
1962	   AVA Networks, LLC
1963	   Boulder, CO
1964	   US

1966	   Email: kevin.gross@avanw.com

1967	   Suhas Nandakumar
1968	   Cisco Systems
1969	   170 West Tasman Drive
1970	   San Jose, CA  95134
1971	   US

1973	   Email: snandaku@cisco.com

1975	   Gonzalo Salgueiro
1976	   Cisco Systems
1977	   7200-12 Kit Creek Road
1978	   Research Triangle Park, NC  27709
1979	   US

1981	   Email: gsalguei@cisco.com

1983	   Bo Burman
1984	   Ericsson
1985	   Kistavagen 25
1986	   SE-164 80 Stockholm
1987	   Sweden

1989	   Email: bo.burman@ericsson.com