idnits 2.17.1 

draft-ietf-avtext-rtp-grouping-taxonomy-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (January 16, 2015) is 3387 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-11) exists of
     draft-ietf-avtcore-rtp-multi-stream-06

  == Outdated reference: A later version (-10) exists of
     draft-ietf-avtcore-rtp-topologies-update-05

  == Outdated reference: A later version (-25) exists of
     draft-ietf-clue-framework-19

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-14

  == Outdated reference: A later version (-19) exists of
     draft-ietf-rtcweb-overview-13

  -- Obsolete informational reference (is this intentional?): RFC 4566
     (Obsoleted by RFC 8866)


     Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                          J. Lennox
3	Internet-Draft                                                     Vidyo
4	Intended status: Informational                                  K. Gross
5	Expires: July 20, 2015                                               AVA
6	                                                           S. Nandakumar
7	                                                            G. Salgueiro
8	                                                           Cisco Systems
9	                                                               B. Burman
10	                                                                Ericsson
11	                                                        January 16, 2015

13	A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport
14	                         Protocol (RTP) Sources
15	               draft-ietf-avtext-rtp-grouping-taxonomy-04

17	Abstract

19	   The terminology about, and associations among, Real-Time Transport
20	   Protocol (RTP) sources can be complex and somewhat opaque.  This
21	   document describes a number of existing and proposed relationships
22	   among RTP sources, and attempts to define common terminology for
23	   discussing protocol entities and their relationships.

25	Status of This Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at http://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on July 20, 2015.

42	Copyright Notice

44	   Copyright (c) 2015 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (http://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the Simplified BSD License.

57	Table of Contents

59	   1.  Introduction
60	   2.  Concepts
61	     2.1.  Media Chain
62	       2.1.1.  Physical Stimulus
63	       2.1.2.  Media Capture
64	       2.1.3.  Raw Stream
65	       2.1.4.  Media Source
66	       2.1.5.  Source Stream
67	       2.1.6.  Media Encoder
68	       2.1.7.  Encoded Stream
69	       2.1.8.  Dependent Stream
70	       2.1.9.  Media Packetizer
71	       2.1.10. RTP Stream
72	       2.1.11. RTP-based Redundancy
73	       2.1.12. Redundancy RTP Stream
74	       2.1.13. Media Transport
75	       2.1.14. Media Transport Sender
76	       2.1.15. Sent RTP Stream
77	       2.1.16. Network Transport
78	       2.1.17. Transported RTP Stream
79	       2.1.18. Media Transport Receiver
80	       2.1.19. Received RTP Stream
81	       2.1.20. Received Redundancy RTP Stream
82	       2.1.21. RTP-based Repair
83	       2.1.22. Repaired RTP Stream
84	       2.1.23. Media Depacketizer
85	       2.1.24. Received Encoded Stream
86	       2.1.25. Media Decoder
87	       2.1.26. Received Source Stream
88	       2.1.27. Media Sink
89	       2.1.28. Received Raw Stream
90	       2.1.29. Media Render
91	     2.2.  Communication Entities
92	       2.2.1.  Endpoint
93	       2.2.2.  RTP Session
94	       2.2.3.  Participant
95	       2.2.4.  Multimedia Session
96	       2.2.5.  Communication Session

98	   3.  Concepts of Inter-Relations
99	     3.1.  Synchronization Context
100	       3.1.1.  RTCP CNAME
101	       3.1.2.  Clock Source Signaling
102	       3.1.3.  Implicitly via RtcMediaStream
103	       3.1.4.  Explicitly via SDP Mechanisms
104	     3.2.  Endpoint
105	     3.3.  Participant
106	     3.4.  RtcMediaStream
107	     3.5.  Single- and Multi-Session Transmission of Dependent
108	           Streams
109	     3.6.  Multi-Channel Audio
110	     3.7.  Simulcast
111	     3.8.  Layered Multi-Stream
112	     3.9.  RTP Stream Duplication
113	     3.10. Redundancy Format
114	     3.11. RTP Retransmission
115	     3.12. Forward Error Correction
116	     3.13. RTP Stream Separation
117	     3.14. Multiple RTP Sessions over one Media Transport
118	   4.  Mapping from Existing Terms
119	     4.1.  Telepresence Terms
120	       4.1.1.  Audio Capture
121	       4.1.2.  Capture Device
122	       4.1.3.  Capture Encoding
123	       4.1.4.  Capture Scene
124	       4.1.5.  Endpoint
125	       4.1.6.  Individual Encoding
126	       4.1.7.  Media Capture
127	       4.1.8.  Media Consumer
128	       4.1.9.  Media Provider
129	       4.1.10. Stream
130	       4.1.11. Video Capture
131	     4.2.  Media Description
132	     4.3.  Media Stream
133	     4.4.  Multimedia Conference
134	     4.5.  Multimedia Session
135	     4.6.  Multipoint Control Unit (MCU)
136	     4.7.  Recording Device
137	     4.8.  RtcMediaStream
138	     4.9.  RtcMediaStreamTrack
139	     4.10. RTP Sender
140	     4.11. RTP Session
141	     4.12. SSRC
142	   5.  Security Considerations
143	   6.  Acknowledgement
144	   7.  Contributors
145	   8.  IANA Considerations
146	   9.  Informative References
147	   Appendix A.  Changes From Earlier Versions
148	     A.1.  Modifications Between WG Version -03 and -04
149	     A.2.  Modifications Between WG Version -02 and -03
150	     A.3.  Modifications Between WG Version -01 and -02
151	     A.4.  Modifications Between WG Version -00 and -01
152	     A.5.  Modifications Between Version -02 and -03
153	     A.6.  Modifications Between Version -01 and -02
154	     A.7.  Modifications Between Version -00 and -01
155	   Authors' Addresses

157	1.  Introduction

159	   The existing taxonomy of sources in RTP is often regarded as
160	   confusing and inconsistent.  Consequently, a deep understanding of
161	   how the different terms relate to each other becomes a real
162	   challenge.  Frequently cited examples of this confusion are (1) how
163	   different protocols that make use of RTP use the same terms to
164	   signify different things and (2) how the complexities addressed at
165	   one layer are often glossed over or ignored at another.

167	   This document attempts to provide some clarity by reviewing the
168	   semantics of various aspects of sources in RTP.  As an organizing
169	   mechanism, it approaches this by describing various ways that RTP
170	   sources can be grouped and associated together.

172	   All non-specific references to ControLling mUltiple streams for
173	   tElepresence (CLUE) in this document map to [I-D.ietf-clue-framework]
174	   and all references to Web Real-Time Communications (WebRTC) map to
175	   [I-D.ietf-rtcweb-overview].

177	2.  Concepts

179	   This section defines concepts that serve to identify and name various
180	   transformations and streams in a given RTP usage.  For each concept
181	   an attempt is made to list any alternate definitions and usages that
182	   co-exist today along with various characteristics that further
183	   describe the concept.  These concepts are divided into two
184	   categories, one related to the chain of streams and transformations
185	   that media can be subject to, the other for entities involved in the
186	   communication.

188	2.1.  Media Chain

190	   In the context of this memo, Media is a sequence of synthetic or
191	   Physical Stimuli (Section 2.1.1) (sound waves, photons,
192	   keystrokes), represented in digital form.  Synthesized Media is
193	   typically generated directly in the digital domain.

195	   This section contains the concepts that can be involved in taking
196	   Media at a sender side and transporting it to a receiver, which may
197	   recover a sequence of physical stimulus.  This chain of concepts is
198	   of two main types, streams and transformations.  Streams are time-
199	   based sequences of samples of the physical stimulus in various
200	   representations, while transformations change the representation of
201	   the streams in some way.

203	   The examples below are basic ones and it is important to keep in mind
204	   that this conceptual model enables more complex usages.  Some will be
205	   further discussed in later sections of this document.  In general the
206	   following applies to this model:

208	   o  A transformation may have zero or more inputs and one or more
209	      outputs.

211	   o  A stream is of some type, such as audio, video, real-time text,
212	      etc.

214	   o  A stream has one source transformation and one or more sink
215	      transformations (with the exception of Physical Stimulus
216	      (Section 2.1.1) that may lack source or sink transformation).

218	   o  Streams can be forwarded from a transformation output to any
219	      number of inputs on other transformations that support that type.

221	   o  If the output of a transformation is sent to multiple
222	      transformations, those streams will be identical; it takes a
223	      transformation to make them different.

225	   o  There are no formal limitations on how streams are connected to
226	      transformations; this may include loops if required by a
227	      particular transformation.

229	   It is also important to remember that this is a conceptual model.
230	   Thus real-world implementations may look different and have different
231	   structure.
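
   The rules listed above can be sketched as a small data model.  This
   is an illustrative sketch only and not part of the taxonomy; the
   class and method names (Stream, Transformation, emit, consume) are
   assumptions made for this example, and Physical Stimulus is not
   modelled, so the capture step has no input here.

```python
class Stream:
    """A typed, time-based sequence of samples (hypothetical model class)."""

    def __init__(self, name, media_type):
        self.name = name
        self.media_type = media_type  # e.g. "audio", "video", "text"
        self.source = None            # the one transformation that outputs it
        self.sinks = []               # transformations that take it as input


class Transformation:
    """Changes the representation of streams (hypothetical model class)."""

    def __init__(self, name, accepts=()):
        self.name = name
        self.accepts = set(accepts)   # media types supported on the inputs
        self.inputs = []              # zero or more inputs
        self.outputs = []

    def emit(self, name, media_type):
        """Create a new output stream with this transformation as source."""
        stream = Stream(name, media_type)
        stream.source = self
        self.outputs.append(stream)
        return stream

    def consume(self, stream):
        """Connect an existing stream to an input of this transformation."""
        if stream.media_type not in self.accepts:
            raise ValueError(f"{self.name} does not support {stream.media_type}")
        self.inputs.append(stream)
        stream.sinks.append(self)


# Sender-side fragment: one Source Stream forwarded to two encoders.
capture = Transformation("Media Capture")
media_source = Transformation("Media Source", accepts={"audio"})
encoder_a = Transformation("Media Encoder A", accepts={"audio"})
encoder_b = Transformation("Media Encoder B", accepts={"audio"})

raw = capture.emit("Raw Stream", "audio")
media_source.consume(raw)
source_stream = media_source.emit("Source Stream", "audio")
encoder_a.consume(source_stream)  # both encoders receive the identical stream;
encoder_b.consume(source_stream)  # only their own outputs can differ
```

   In this sketch a stream records exactly one source and any number of
   sinks, and a type mismatch between a stream and an input is rejected.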

233	   To provide a basic understanding of the relationships in the chain,
234	   we first introduce the concepts for the sender side (Figure 1).
235	   This covers the chain from Physical Stimulus until media packets are
236	   emitted onto the network.

238	                Physical Stimulus
239	                       |
240	                       V
241	             +--------------------+
242	             |    Media Capture   |
243	             +--------------------+
244	                       |
245	                  Raw Stream
246	                       V
247	             +--------------------+
248	             |    Media Source    |<- Synchronization Timing
249	             +--------------------+
250	                       |
251	                 Source Stream
252	                       V
253	             +--------------------+
254	             |   Media Encoder    |
255	             +--------------------+
256	                       |
257	                 Encoded Stream     +------------+
258	                       V            |            V
259	             +--------------------+ | +----------------------+
260	             |  Media Packetizer  | | | RTP-based Redundancy |
261	             +--------------------+ | +----------------------+
262	                       |            |            |
263	                       +------------+  Redundancy RTP Stream
264	                Source RTP Stream                |
265	                       V                         V
266	             +--------------------+    +--------------------+
267	             |  Media Transport   |    |  Media Transport   |
268	             +--------------------+    +--------------------+

270	             Figure 1: Sender Side Concepts in the Media Chain

272	   In Figure 1 we have included a branched chain to cover the concepts
273	   for using redundancy to improve the reliability of the transport.
274	   The Media Transport concept is an aggregate that is decomposed below
275	   in Section 2.1.13.

277	   Below we review a receiver media chain (Figure 2) matching the sender
278	   side, to look at the inverse transformations and their attempts to
279	   recover streams identical to those in the sender chain, subject to
280	   possibly lossy compression and imperfect Media Transport.  Note that
281	   the streams out of a reverse transformation, like the Source Stream
282	   out of the Media Decoder, are in many cases not the same as the
283	   corresponding ones on the sender side; thus, they are prefixed with
284	   "Received" to denote a potentially modified version.  The streams
285	   differ because some of the transformations are irreversible.

287	   For example, lossy source coding in the Media Encoder prevents the
288	   Source Stream out of the Media Decoder from being the same as the one
289	   fed into the Media Encoder.  Other reasons include packet loss or
290	   late loss in the Media Transport transformation that even RTP-based
291	   Repair, if used, fails to repair.  It should be noted that some
292	   transformations are not always present, like RTP-based Repair, which
293	   cannot operate without Redundancy RTP Streams.
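
   The irreversibility of lossy source coding can be illustrated with a
   toy example.  The sketch below uses plain uniform quantization as a
   stand-in for a Media Encoder and Media Decoder pair; real codecs are
   far more elaborate, and the function names and sample values are
   assumptions chosen only for illustration.

```python
def encode(samples, step):
    """Toy lossy encoder: map each sample to the nearest quantizer index."""
    return [round(sample / step) for sample in samples]


def decode(indices, step):
    """Toy decoder: map each index back to its reconstruction value."""
    return [index * step for index in indices]


step = 0.25
source_stream = [0.12, 0.47, -0.33, 0.90]

encoded_stream = encode(source_stream, step)
received_source_stream = decode(encoded_stream, step)

# The decoder cannot tell which of the many inputs that share an index
# was originally fed into the encoder, so the output only approximates it.
max_error = max(abs(a - b)
                for a, b in zip(source_stream, received_source_stream))
```

   Here the Received Source Stream differs from the Source Stream even
   though no packet was lost, which is why the two are named separately.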

295	           +--------------------+   +--------------------+
296	           |  Media Transport   |   |  Media Transport   |
297	           +--------------------+   +--------------------+
298	                     |                        |
299	            Received RTP Stream  Received Redundancy RTP Stream
300	                     |                        |
301	                     |    +-------------------+
302	                     V    V
303	           +--------------------+
304	           |  RTP-based Repair  |
305	           +--------------------+
306	                     |
307	            Repaired RTP Stream
308	                     V
309	           +--------------------+
310	           | Media Depacketizer |
311	           +--------------------+
312	                     |
313	           Received Encoded Stream
314	                     V
315	           +--------------------+
316	           |   Media Decoder    |
317	           +--------------------+
318	                     |
319	           Received Source Stream
320	                     V
321	           +--------------------+
322	           |     Media Sink     |--> Synchronization Information
323	           +--------------------+
324	                     |
325	            Received Raw Stream
326	                     V
327	           +--------------------+
328	           |   Media Renderer   |
329	           +--------------------+
330	                     |
331	                     V
332	             Physical Stimulus

334	            Figure 2: Receiver Side Concepts of the Media Chain

336	2.1.1.  Physical Stimulus

338	   The physical stimulus is a physical event that can be sampled and
339	   converted to digital form by an appropriate sensor or transducer.
340	   This includes sound waves making up audio, photons in a light field,
341	   or other excitations or interactions with sensors, like keystrokes on
342	   a keyboard.

344	2.1.2.  Media Capture

346	   Media Capture is the process of transforming the Physical Stimulus
347	   (Section 2.1.1) into digital Media using an appropriate sensor or
348	   transducer.  The Media Capture performs a digital sampling of the
349	   physical stimulus, usually periodically, and outputs this in some
350	   representation as a Raw Stream (Section 2.1.3).  Because this data is
351	   sampled periodically, or at least consists of timed asynchronous
352	   events, it forms a stream of media data.  The Media Capture is
353	   normally instantiated in a device, i.e., a media capture device.
354	   Examples of different types of media capturing devices are digital
355	   cameras, microphones connected to A/D converters, or keyboards.

357	   Characteristics:

359	   o  A Media Capture is identified either by hardware/manufacturer ID
360	      or via a session-scoped device identifier as mandated by the
361	      application usage.

363	   o  A Media Capture can generate an Encoded Stream (Section 2.1.7) if
364	      the capture device supports such a configuration.

366	   o  The nature of the Media Capture may impose constraints on the
367	      clock handling in some of the subsequent steps.  For example, many
368	      audio or video capture devices are not completely free in
369	      selecting the sample rate.

371	2.1.3.  Raw Stream

373	   The time progressing stream of digitally sampled information, usually
374	   periodically sampled and provided by a Media Capture (Section 2.1.2).
375	   A Raw Stream can also contain synthesized Media that may not require
376	   any explicit Media Capture, since it is already in an appropriate
377	   digital form.

379	2.1.4.  Media Source

381	   A Media Source is the logical source of a reference clock
382	   synchronized, time progressing, digital media stream, called a Source
383	   Stream (Section 2.1.5).  This transformation takes one or more Raw
384	   Streams (Section 2.1.3) and provides a Source Stream as output.  The
385	   output is synchronized with a reference clock (Section 3.1), which
386	   can be as simple as a system-local wall clock or as complex as an
387	   NTP-synchronized clock.

389	   The output can be of different types.  One type is directly
390	   associated with a particular Media Capture's Raw Stream.  Others are
391	   more conceptual sources, like an audio mix of multiple Source Streams
392	   (Figure 3).  Mixing multiple streams typically requires that the
393	   input streams can be related in time, meaning that they have
394	   to be Source Streams (Section 2.1.5) rather than Raw Streams.  In the
395	   example below, the generated Source Stream is a mix of the three
396	   input Source Streams.

398	                Source    Source    Source
399	                Stream    Stream    Stream
400	                  |         |         |
401	                  V         V         V
402	              +--------------------------+
403	              |        Media Source      |<-- Reference Clock
404	              |           Mixer          |
405	              +--------------------------+
406	                            |
407	                            V
408	                      Source Stream

410	         Figure 3: Conceptual Media Source in form of Audio Mixer
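
   A minimal sketch of the mixing Media Source in Figure 3 follows.  It
   assumes that each input Source Stream is represented as a mapping
   from a timestamp on the shared reference clock (here in
   milliseconds) to an integer sample, and that a stream with no sample
   at a given timestamp contributes silence.  The representation and
   the function name mix are assumptions made for this example.

```python
def mix(source_streams):
    """Sum the samples that share the same reference-clock timestamp."""
    mixed = {}
    for stream in source_streams:
        for timestamp, sample in stream.items():
            mixed[timestamp] = mixed.get(timestamp, 0) + sample
    # Return the mixed Source Stream ordered by timestamp.
    return dict(sorted(mixed.items()))


# Three input Source Streams, already aligned to one reference clock.
stream_a = {0: 10, 20: 12, 40: 8}
stream_b = {0: -3, 20: 5, 40: 1}
stream_c = {20: 2, 40: 2, 60: 4}

mixed_stream = mix([stream_a, stream_b, stream_c])
```

   The sum is only meaningful because the timestamps are comparable
   across inputs, which is the reason the inputs have to be Source
   Streams rather than Raw Streams.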

412	   Another possible example of a conceptual Media Source is a video
413	   surveillance switch, where the input is multiple Source Streams from
414	   different cameras, and the output is one of those Source Streams
415	   based on some selection criterion, such as round-robin or some
416	   video activity measure.

418	   Characteristics:

420	   o  At any point, it can represent a physical captured source or
421	      conceptual source.

423	2.1.5.  Source Stream

425	   A time progressing stream of digital samples that has been
426	   synchronized with a reference clock and comes from a particular Media
427	   Source (Section 2.1.4).

429	2.1.6.  Media Encoder

431	   A Media Encoder is a transform that is responsible for encoding the
432	   media data from a Source Stream (Section 2.1.5) into another
433	   representation, usually more compact, that is output as an Encoded
434	   Stream (Section 2.1.7).

436	   The Media Encoder step commonly includes pre-encoding
437	   transformations, such as scaling, resampling, etc.  The Media Encoder
438	   can have a significant number of configuration options that affect
439	   the properties of the Encoded Stream.  These include properties such
440	   as bit-rate, start points for decoding, resolution, bandwidth, or
441	   other fidelity-affecting properties.  The codec actually used is also
442	   an important factor in many communication systems.

444	   Scalable Media Encoders need special attention as they produce
445	   multiple outputs that are potentially of different types.  A scalable
446	   Media Encoder takes one input Source Stream and encodes it into
447	   multiple output streams of two different types; at least one Encoded
448	   Stream that is independently decodable and one or more Dependent
449	   Streams (Section 2.1.8).  Decoding requires at least one Encoded
450	   Stream and zero or more Dependent Streams.  A Dependent Stream's
451	   dependency is one of the grouping relations this document discusses
452	   further in Section 3.8.

454	                              Source Stream
455	                                    |
456	                                    V
457	                       +--------------------------+
458	                       |  Scalable Media Encoder  |
459	                       +--------------------------+
460	                          |         |   ...    |
461	                          V         V          V
462	                       Encoded  Dependent  Dependent
463	                       Stream    Stream     Stream

465	            Figure 4: Scalable Media Encoder Input and Outputs

467	   There are also other variants of encoders, like so-called Multiple
468	   Description Coding (MDC).  Such Media Encoders produce multiple
469	   independent and thus individually decodable Encoded Streams.
470	   However, (logically) combining multiple of these Encoded Streams into
471	   a single Received Source Stream during decoding leads to an
472	   improvement in perceptual reproduced quality when compared to
473	   decoding a single Encoded Stream.

475	   Creating multiple Encoded Streams from the same Source Stream, where
476	   the Encoded Streams are neither in a scalable nor in an MDC
477	   relationship is commonly utilized in Simulcast environments.

479	   Characteristics:

481	   o  A Media Source can be multiply encoded by different Media Encoders
482	      to provide various encoded representations.

484	2.1.7.  Encoded Stream

486	   A stream of time synchronized encoded media that can be independently
487	   decoded.

489	   Characteristics:

491	   o  Due to temporal dependencies, an Encoded Stream may have
492	      limitations in where decoding can be started.  These entry points,
493	      for example Intra frames from a video encoder, may require
494	      identification and their generation may be event based or
495	      configured to occur periodically.

497	2.1.8.  Dependent Stream

499	   A stream of time synchronized encoded media fragments that depend
500	   on one or more Encoded Streams (Section 2.1.7) and zero or more
501	   other Dependent Streams in order to be decodable.

503	   Characteristics:

505	   o  Each Dependent Stream has a set of dependencies.  These
506	      dependencies must be understood by the parties in a Multimedia
507	      Session that intend to use a Dependent Stream.
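
   The dependency relation can be sketched as a simple check on the
   receiver side.  The stream names ("base", "enh1", "enh2"), the
   dependency table, and the function name decodable are hypothetical;
   the sketch assumes the dependencies form no cycles.

```python
def decodable(stream, dependencies, received):
    """Return True if `stream` and all streams it depends on were received.

    `dependencies` maps a stream name to the names it depends on; an
    Encoded Stream maps to an empty list because it is independently
    decodable.  The relation is assumed to be acyclic.
    """
    if stream not in received:
        return False
    return all(decodable(dep, dependencies, received)
               for dep in dependencies.get(stream, ()))


# One Encoded Stream and two Dependent Streams from a scalable encoder.
dependencies = {
    "base": [],        # Encoded Stream
    "enh1": ["base"],  # Dependent Stream
    "enh2": ["enh1"],  # Dependent Stream, indirectly depends on "base"
}

all_layers = decodable("enh2", dependencies, {"base", "enh1", "enh2"})
missing_middle = decodable("enh2", dependencies, {"base", "enh2"})
base_only = decodable("base", dependencies, {"base"})
```

   A party that does not understand the dependency table cannot make
   this decision, which is why the dependencies must be known to all
   parties that intend to use a Dependent Stream.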

509	2.1.9.  Media Packetizer

511	   The transformation of taking one or more Encoded (Section 2.1.7) or
512	   Dependent Streams (Section 2.1.8), putting their content into one or
513	   more sequences of packets, normally RTP packets, and outputting
514	   Source RTP Streams (Section 2.1.10).  This step includes generating
515	   both RTP payloads and RTP packets.

517	   The Media Packetizer can use multiple inputs when producing a single
518	   RTP Stream.  One such example is SRST packetization when using SVC
519	   (Section 3.5).

521	   The Media Packetizer can also produce multiple RTP Streams, for
522	   example when Encoded and/or Dependent Streams are distributed over
523	   multiple RTP Streams.  One example of this is MRMT packetization when
524	   using SVC (Section 3.5).

526	   Characteristics:

528	   o  The Media Packetizer selects which Synchronization source(s)
529	      (SSRC) [RFC3550] are used, and in which RTP Sessions.

531	   o  Media Packetizer can combine multiple Encoded or Dependent Streams
532	      into one or more RTP Streams.

534	2.1.10.  RTP Stream

536	   A stream of RTP packets containing media data, source or redundant.
537	   The RTP Stream is identified by an SSRC belonging to a particular RTP
538	   Session.  The RTP Session is identified as discussed in
539	   Section 2.2.2.

541	   A Source RTP Stream is an RTP Stream containing at least some content
542	   from an Encoded Stream (Section 2.1.7).  Source material is any media
543	   material that is produced for transport over RTP without any
544	   additional RTP-based redundancy applied.  Note that RTP-based
545	   redundancy excludes the type of redundancy that most suitable Media
546	   Encoders (Section 2.1.6) may add to the media format of the Encoded
547	   Stream that makes it cope better with inevitable RTP packet losses.
548	   This is further described in RTP-based Redundancy (Section 2.1.11)
549	   and Redundancy RTP Stream (Section 2.1.12).

551	   Characteristics:

553	   o  Each RTP Stream is identified by a Synchronization source (SSRC)
554	      [RFC3550] that is carried in every RTP and RTP Control Protocol
555	      (RTCP) packet header.  The SSRC is unique in a specific RTP
556	      Session context.

558	   o  At any given point in time, an RTP Stream can have one and only
559	      one SSRC, but SSRCs for a given RTP Stream can change over time.
560	      SSRC collision and clock rate change [RFC7160] are examples of
561	      valid reasons to change SSRC for an RTP Stream.  In those cases,
562	      the RTP Stream itself is not changed in any significant way, only
563	      the identifying SSRC number.

565	   o  Each SSRC defines a unique RTP sequence numbering and timing
566	      space.

568	   o  Several RTP Streams, each with their own SSRC, may represent a
569	      single Media Source.

571	   o  Several RTP Streams, each with their own SSRC, can be carried in a
572	      single RTP Session.
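
   As a sketch of how the SSRC identifies an RTP Stream, the example
   below reads the 12-octet fixed RTP header defined in [RFC3550] and
   groups the packets of one RTP Session by SSRC.  CSRC lists and
   header extensions are ignored, and the helper names, the payload
   type 96, and the SSRC values are assumptions made for illustration.

```python
import struct


def parse_rtp_header(packet):
    """Parse the fixed RTP header (first 12 octets) of a packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first, second, sequence, timestamp, ssrc = struct.unpack(
        "!BBHII", packet[:12])
    return {
        "version": first >> 6,
        "payload_type": second & 0x7F,
        "sequence_number": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def group_by_ssrc(packets):
    """Group the packets of one RTP Session into RTP Streams by SSRC."""
    streams = {}
    for packet in packets:
        streams.setdefault(parse_rtp_header(packet)["ssrc"], []).append(packet)
    return streams


def make_packet(sequence, timestamp, ssrc, payload=b""):
    """Build a minimal RTP packet (version 2, hypothetical payload type 96)."""
    return struct.pack("!BBHII", 0x80, 96, sequence, timestamp, ssrc) + payload


# Two RTP Streams in one RTP Session; each SSRC has its own sequence
# numbering and timing space.
packets = [
    make_packet(1, 160, 0x11111111),
    make_packet(7, 3000, 0x22222222),
    make_packet(2, 320, 0x11111111),
]
streams = group_by_ssrc(packets)
```

   Because an SSRC can change over time, for example after a collision,
   a real implementation needs additional state to recognize that two
   SSRC values belong to the same RTP Stream; this sketch does not
   attempt that.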

574	2.1.11.  RTP-based Redundancy

576	   RTP-based Redundancy is defined here as a transformation that
577	   generates redundant or repair packets sent out as a Redundancy RTP
578	   Stream (Section 2.1.12) to mitigate network transport impairments,
579	   like packet loss and delay.

581	   RTP-based Redundancy exists in many flavors.  It may generate
582	   independent Repair Streams that are used in addition to the Source
583	   Stream (like RTP Retransmission (Section 3.11) and some special
584	   types of Forward Error Correction, like RTP stream duplication
585	   (Section 3.9)), it may generate a new Source Stream by combining
586	   redundancy information with source information (using XOR FEC
587	   (Section 3.12) as a redundancy payload (Section 3.10)), or it may
588	   completely replace the source information with only redundancy
589	   packets.
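
   The idea of generating repair data from source data can be sketched
   with a single XOR parity block.  This is only an illustration of
   the principle, not the wire format of any XOR-based FEC payload
   format: it assumes equal-length payloads, protects payload octets
   only, and can recover at most one lost packet per protected group.
   The function names are hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length payloads")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(payloads: Sequence[bytes]) -> bytes:
    """Build one repair payload protecting all given source payloads."""
    return reduce(xor_bytes, payloads)


def recover_missing(received: Sequence[bytes], parity: bytes) -> bytes:
    """Recover the single missing payload from the survivors and parity."""
    return reduce(xor_bytes, received, parity)


source = [b"AAAA", b"BBBB", b"CCCC"]   # hypothetical source payloads
parity = make_parity(source)           # would travel as redundancy data
survivors = [source[0], source[2]]     # the second packet was lost
recovered = recover_missing(survivors, parity)
```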

591	2.1.12.  Redundancy RTP Stream

593	   An RTP Stream (Section 2.1.10) that contains no original source data,
594	   only redundant data that may be combined with one or more Received
595	   RTP Streams (Section 2.1.19) to produce Repaired RTP Streams
596	   (Section 2.1.22).

598	2.1.13.  Media Transport

600	   A Media Transport defines the transformation that the RTP Streams
601	   (Section 2.1.10) are subjected to by the end-to-end transport from
602	   one RTP sender to one specific RTP receiver (an RTP Session
603	   (Section 2.2.2) may contain multiple RTP receivers per sender).  Each
604	   Media Transport is defined by a transport association that is
605	   normally identified by a 5-tuple (source address, source port,
606	   destination address, destination port, transport protocol), but a
607	   proposal exists for sending multiple transport associations on a
608	   single 5-tuple [I-D.westerlund-avtcore-transport-multiplexing].

610	   Characteristics:

612	   o  Media Transport transmits RTP Streams of RTP Packets from a source
613	      transport address to a destination transport address.

615	   o  Each Media Transport contains only a single RTP Session.

617	   o  A single RTP Session can span multiple Media Transports.
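
   The relationship stated above (each Media Transport carries only one
   RTP Session, while one RTP Session can span several Media
   Transports) can be sketched as a small lookup table keyed by the
   5-tuple.  The class names, addresses, and session labels are
   assumptions for illustration, and the sketch covers only the normal
   case of one transport association per 5-tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiveTuple:
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str


class TransportTable:
    """Maps each Media Transport (5-tuple) to exactly one RTP Session."""

    def __init__(self) -> None:
        self._session_of: dict = {}

    def bind(self, transport: FiveTuple, session_id: str) -> None:
        # A transport may be bound repeatedly, but only to the same session.
        bound = self._session_of.setdefault(transport, session_id)
        if bound != session_id:
            raise ValueError(f"{transport} already carries session {bound}")

    def transports_of(self, session_id: str) -> list:
        return [t for t, s in self._session_of.items() if s == session_id]


table = TransportTable()
a_to_b = FiveTuple("192.0.2.1", 5004, "192.0.2.2", 5004, "UDP")
b_to_a = FiveTuple("192.0.2.2", 5004, "192.0.2.1", 5004, "UDP")
table.bind(a_to_b, "audio")
table.bind(b_to_a, "audio")   # one RTP Session spanning two Media Transports
```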

619	   The Media Transport concept sometimes needs to be decomposed into
620	   more steps to enable discussion of what a sender emits that gets
621	   transformed by the network before it is received by the receiver.
622	   Thus we also provide this Media Transport decomposition (Figure 5).

624	                               RTP Stream
625	                                    |
626	                                    V
627	                       +--------------------------+
628	                       |  Media Transport Sender  |
629	                       +--------------------------+
630	                                    |
631	                             Sent RTP Stream
632	                                    V
633	                       +--------------------------+
634	                       |    Network Transport     |
635	                       +--------------------------+
636	                                    |
637	                         Transported RTP Stream
638	                                    V
639	                       +--------------------------+
640	                       | Media Transport Receiver |
641	                       +--------------------------+
642	                                    |
643	                                    V
644	                           Received RTP Stream

646	                Figure 5: Decomposition of Media Transport

648	2.1.14.  Media Transport Sender

650	   The first transformation within the Media Transport (Section 2.1.13)
651	   is the Media Transport Sender.  The sending Endpoint (Section 2.2.1)
652	   takes an RTP Stream and emits the packets onto the network using the
653	   transport association established for this Media Transport, thereby
654	   creating a Sent RTP Stream (Section 2.1.15).  In the process, it
655	   transforms the RTP Stream in several ways.  First, it generates the
656	   necessary protocol headers for the transport association, for example
657	   IP and UDP headers, thus forming IP/UDP/RTP packets.  In addition,
658	   the Media Transport Sender may queue, pace or otherwise affect how
659	   the packets are emitted onto the network, thereby potentially
660	   introducing delay, jitter and inter-packet spacings that characterize
661	   the Sent RTP Stream.

663	2.1.15.  Sent RTP Stream

665	   The Sent RTP Stream is the RTP Stream as it enters the first hop of
666	   the network path to its destination.  The Sent RTP Stream is
667	   identified using network transport addresses, like for IP/UDP the
668	   5-tuple (source IP address, source port, destination IP address,
669	   destination port, and protocol (UDP)).

671	2.1.16.  Network Transport

673	   Network Transport is the transformation that subjects the Sent RTP
674	   Stream (Section 2.1.15) to traveling from the source to the
675	   destination through the network.  This transformation can result in
676	   loss of some packets, varying delay on a per packet basis, packet
677	   duplication, and packet header or data corruption.  This
678	   transformation produces a Transported RTP Stream (Section 2.1.17) at
679	   the exit of the network path.

681	2.1.17.  Transported RTP Stream

683	   The RTP Stream that is emitted out of the network path at the
684	   destination, subjected to the Network Transport's transformation
685	   (Section 2.1.16).

687	2.1.18.  Media Transport Receiver

689	   The receiver Endpoint's (Section 2.2.1) transformation of the
690	   Transported RTP Stream (Section 2.1.17) by its reception process,
691	   which results in the Received RTP Stream (Section 2.1.19).  This
692	   transformation includes verification of transport checksums.
693	   Sensible system designs typically either discard packets with
694	   mismatching checksums or pass them on while marking them in the
695	   resulting Received RTP Stream, so as to alert subsequent
696	   transformations to the possibly corrupt state.  In this context it is
697	   worth noting that there is typically some probability for corrupt
698	   packets to pass through undetected (with a seemingly correct
699	   checksum).  Other transformations can compensate for delay variations
700	   in receiving a packet on the network interface and providing it to
701	   the application (de-jitter buffer).
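
   A de-jitter buffer of the kind mentioned above can be sketched as a
   small priority queue that reorders packets by sequence number and
   drops duplicates.  This is a simplified illustration under stated
   assumptions: it releases packets based on buffer depth rather than
   on time, ignores 16-bit sequence number wrap-around, never forgets
   seen sequence numbers, and does not handle packets arriving after
   their position has already been released.  The class name is
   hypothetical.

```python
import heapq


class DejitterBuffer:
    """Reorders packets by sequence number and drops duplicates."""

    def __init__(self, depth: int) -> None:
        self.depth = depth          # packets held back before release
        self._heap: list = []       # entries are (sequence_number, payload)
        self._seen: set = set()

    def push(self, seq: int, payload: bytes) -> list:
        """Insert one packet; return the packets now ready for release."""
        if seq not in self._seen:
            self._seen.add(seq)
            heapq.heappush(self._heap, (seq, payload))
        released = []
        while len(self._heap) > self.depth:
            released.append(heapq.heappop(self._heap))
        return released

    def flush(self) -> list:
        """Release everything still buffered, in sequence order."""
        return [heapq.heappop(self._heap) for _ in range(len(self._heap))]


buf = DejitterBuffer(depth=2)
# Hypothetical arrival order with reordering and one duplicate.
arrivals = [(1, b"a"), (3, b"c"), (2, b"b"), (3, b"c"), (4, b"d")]
ordered = []
for seq, payload in arrivals:
    ordered.extend(buf.push(seq, payload))
ordered.extend(buf.flush())
```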

703	2.1.19.  Received RTP Stream

705	   The RTP Stream (Section 2.1.10) resulting from the Media Transport's
706	   transformation, i.e. subjected to packet loss, packet corruption,
707	   packet duplication and varying transmission delay from sender to
708	   receiver.

710	2.1.20.  Received Redundancy RTP Stream

712	   The Redundancy RTP Stream (Section 2.1.12) resulting from the Media
713	   Transport transformation, i.e. subjected to packet loss, packet
714	   corruption, and varying transmission delay from sender to receiver.

716	2.1.21.  RTP-based Repair

718	   RTP-based Repair is a Transformation that takes as input one or more
719	   Received RTP Streams (Section 2.1.19) and Received Redundancy RTP
720	   Streams (Section 2.1.20), and produces one or more Repaired RTP
721	   Streams (Section 2.1.22) that are as close to the corresponding sent
722	   Source RTP Streams (Section 2.1.10) as possible, using different
723	   RTP-based repair methods, for example the ones referred to in
724	   RTP-based Redundancy (Section 2.1.11).

726	2.1.22.  Repaired RTP Stream

728	   A Received RTP Stream (Section 2.1.19) for which Received Redundancy
729	   RTP Stream (Section 2.1.20) information has been used to try to
730	   recover the Source RTP Stream (Section 2.1.10) as it was before Media
731	   Transport (Section 2.1.13).

733	2.1.23.  Media Depacketizer

735	   A Media Depacketizer takes one or more RTP Streams (Section 2.1.10),
736	   depacketizes them, and attempts to reconstitute the Encoded Streams
737	   (Section 2.1.7) or Dependent Streams (Section 2.1.8) present in those
738	   RTP Streams.

740	   It should be noted that in practical implementations, the Media
741	   Depacketizer and the Media Decoder may be tightly coupled and share
742	   information to improve or optimize the overall decoding and error
743	   concealment process.  It is, however, not expected that there would
744	   be any benefit in defining a taxonomy for those detailed (and likely
745	   very implementation-dependent) steps.

747	2.1.24.  Received Encoded Stream

749	   The received version of an Encoded Stream (Section 2.1.7).

751	2.1.25.  Media Decoder

753	   A Media Decoder is a transformation that is responsible for decoding
754	   Encoded Streams (Section 2.1.7) and any Dependent Streams
755	   (Section 2.1.8) into a Source Stream (Section 2.1.5).

757	   It should be noted that in practical implementations, the Media
758	   Decoder and the Media Depacketizer may be tightly coupled and share
759	   information to improve or optimize the overall decoding process in
760	   various ways.  It is however not expected that there would be any
761	   benefit in defining a taxonomy for those detailed (and likely very
762	   implementation-dependent) steps.

764	   Characteristics:

766	   o  A Media Decoder has to deal with any errors in the Encoded Streams
767	      that resulted from corruption or failure to repair packet losses.
768	      Therefore, it commonly is robust to error and losses, and includes
769	      concealment methods.

771	2.1.26.  Received Source Stream

773	   The received version of a Source Stream (Section 2.1.5).

775	2.1.27.  Media Sink

777	   The Media Sink receives a Source Stream (Section 2.1.5) that
778	   contains, usually periodically, sampled media data together with
779	   associated synchronization information.  Depending on application,
780	   this Source Stream then needs to be transformed into a Raw Stream
781	   (Section 2.1.3) that is conveyed to the Media Render
782	   (Section 2.1.29), synchronized with the output from other Media
783	   Sinks.  The Media Sink may also be connected with a Media Source
784	   (Section 2.1.4) and be used as part of a conceptual Media Source.

786	   Characteristics:

788	   o  The Media Sink can further transform the Source Stream into a
789	      representation that is suitable for rendering on the Media Render
790	      as defined by the application or system-wide configuration.  This
791	      includes sample scaling, level adjustments, etc.

793	2.1.28.  Received Raw Stream

795	   The received version of a Raw Stream (Section 2.1.3).

797	2.1.29.  Media Render

799	   A Media Render takes a Raw Stream (Section 2.1.3) and converts it
800	   into Physical Stimulus (Section 2.1.1) that a human user can
801	   perceive.  Examples of such devices are screens, and D/A converters
802	   connected to amplifiers and loudspeakers.

804	   Characteristics:

806	   o  An Endpoint can potentially have multiple Media Renders for each
807	      media type.

809	2.2.  Communication Entities

811	   This section contains concepts for entities involved in the
812	   communication.

814	      +------------------------------------------------------------+
815	      | Communication Session                                      |
816	      |                                                            |
817	      | +----------------+                      +----------------+ |
818	      | | Participant A  |    +------------+    | Participant B  | |
819	      | |                |    | Multimedia |    |                | |
820	      | | +-------------+|<==>| Session    |<==>|+-------------+ | |
821	      | | | Endpoint A  ||    |            |    || Endpoint B  | | |
822	      | | |             ||    +------------+    ||             | | |
823	      | | | +-----------++----------------------++-----------+ | | |
824	      | | | |            |                      |            | | | |
825	      | | | | RTP Session|---Media Transport--->|            | | | |
826	      | | | | Audio      |<---Media Transport---|            | | | |
827	      | | | |            |          ^           |            | | | |
828	      | | | +-----------++----------|-----------++-----------+ | | |
829	      | | |             ||          v           ||             | | |
830	      | | |             || +-----------------+  ||             | | |
831	      | | |             || | Synchronization |  ||             | | |
832	      | | |             || |     Context     |  ||             | | |
833	      | | |             || +-----------------+  ||             | | |
834	      | | |             ||          ^           ||             | | |
835	      | | | +-----------++----------|-----------++-----------+ | | |
836	      | | | |            |          v           |            | | | |
837	      | | | | RTP Session|<---Media Transport---|            | | | |
838	      | | | | Video      |---Media Transport--->|            | | | |
839	      | | | |            |                      |            | | | |
840	      | | | +-----------++----------------------++-----------+ | | |
841	      | | +-------------+|                      |+-------------+ | |
842	      | +----------------+                      +----------------+ |
843	      +------------------------------------------------------------+

845	    Figure 6: Example Point to Point Communication Session with two RTP
846	                                 Sessions

848	   The figure above shows a high-level example representation of a very
849	   basic point-to-point Communication Session between Participants A and
850	   B.  It uses two RTP Sessions, one for audio and one for video,
851	   between A's and B's Endpoints, using separate Media Transports for
852	   those RTP Sessions.  The Multimedia Session shared by the
853	   Participants can, for example, be established using SIP (i.e., there
854	   is a SIP Dialog between A and B).  The terms used in that figure are
855	   further elaborated in the subsections below.

857	2.2.1.  Endpoint

859	   A single addressable entity sending or receiving RTP packets.  It may
860	   be decomposed into several functional blocks, but as long as it
861	   behaves as a single RTP stack entity it is classified as a single
862	   "Endpoint".

864	   Characteristics:

866	   o  Endpoints can be identified in several different ways.  While RTCP
867	      Canonical Names (CNAMEs) [RFC3550] provide a globally unique and
868	      stable identification mechanism for the duration of the
869	      Communication Session (see Section 2.2.5), their validity applies
870	      exclusively within a Synchronization Context (Section 3.1).  Thus
871	      one Endpoint can handle multiple CNAMEs, each of which can be
872	      shared among a set of Endpoints belonging to the same Participant
873	      (Section 2.2.3).  Therefore, mechanisms outside the scope of RTP,
874	      such as application defined mechanisms, must be used to ensure
875	      Endpoint identification when outside this Synchronization Context.

877	   o  An Endpoint can be associated with at most one Participant
878	      (Section 2.2.3) at any single point in time.

880	   o  In some contexts, an Endpoint would typically correspond to a
881	      single "host", for example a computer using a single network
882	      interface and being used by a single human user.

884	2.2.2.  RTP Session

886	   An RTP Session is an association among a group of Participants
887	   communicating with RTP.  It is a group communications channel which
888	   can potentially carry a number of RTP Streams.  Within an RTP
889	   Session, every Participant can find meta-data and control information
890	   (over RTCP) about all the RTP Streams in the RTP Session.  The
891	   bandwidth of the RTCP control channel is shared between all
892	   Participants within an RTP Session.

894	   Characteristics:

896	   o  An RTP Session can carry one or more RTP Streams.

898	   o  An RTP Session shares a single SSRC space as defined in RFC3550
899	      [RFC3550].  That is, the Endpoints participating in an RTP Session
900	      can see an SSRC identifier transmitted by any of the other
901	      Endpoints.  An Endpoint can receive an SSRC either as SSRC or as a
902	      Contributing source (CSRC) in RTP and RTCP packets, as defined by
903	      the Endpoints' network interconnection topology.

905	   o  An RTP Session uses at least two Media Transports
906	      (Section 2.1.13), one for sending and one for receiving.
907	      Commonly, the receiving Media Transport is the reverse direction
908	      of the Media Transport used for sending.  An RTP Session may use
909	      many Media Transports and these define the session's network
910	      interconnection topology.

912	   o  A single Media Transport always carries a single RTP Session.

914	   o  Multiple RTP Sessions can be conceptually related, for example
915	      originating from or targeted for the same Participant
916	      (Section 2.2.3) or Endpoint (Section 2.2.1), or by containing RTP
917	      Streams that are somehow related (Section 3).

919	2.2.3.  Participant

921	   A Participant is an entity reachable by a single signaling address,
922	   and is thus related more to the signaling context than to the media
923	   context.

925	   Characteristics:

927	   o  A single signaling-addressable entity, using an application-
928	      specific signaling address space, for example a SIP URI.

930	   o  A Participant can participate in several Multimedia Sessions
931	      (Section 2.2.4).

933	   o  A Participant can be composed of several associated Endpoints
934	      (Section 2.2.1).

936	2.2.4.  Multimedia Session

938	   A Multimedia Session is an association among a group of Participants
939	   (Section 2.2.3) engaged in the communication via one or more RTP
940	   Sessions (Section 2.2.2).  It defines logical relationships among
941	   Media Sources (Section 2.1.4) that appear in multiple RTP Sessions.

943	   Characteristics:

945	   o  A Multimedia Session can be composed of several RTP Sessions with
946	      potentially multiple RTP Streams per RTP Session.

948	   o  Each Participant in a Multimedia Session can have a multitude of
949	      Media Captures and Media Rendering devices.

951	   o  A single Multimedia Session can contain media from one or more
952	      Synchronization Contexts (Section 3.1).  An example of that is a
953	      Multimedia Session containing one set of audio and video for
954	      communication purposes belonging to one Synchronization Context,
955	      and another set of audio and video for presentation purposes (like
956	      playing a video file) with a separate Synchronization Context that
957	      has no strong timing relationship and need not be strictly
958	      synchronized with the audio and video used for communication.

960	2.2.5.  Communication Session

962	   A Communication Session is an association among two or more
963	   Participants (Section 2.2.3) communicating with each other via one or
964	   more Multimedia Sessions (Section 2.2.4).

966	   Characteristics:

968	   o  Each Participant in a Communication Session is identified via an
969	      application-specific signaling address.

971	   o  A Communication Session is composed of Participants that share at
972	      least one Multimedia Session, involving one or more parallel RTP
973	      Sessions with potentially multiple RTP Streams per RTP Session.

975	   For example, in a full mesh communication, the Communication Session
976	   consists of a set of separate Multimedia Sessions between each pair
977	   of Participants.  Another example is a centralized conference, where
978	   the Communication Session consists of a set of Multimedia Sessions
979	   between each Participant and the conference handler.
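
   The two examples above differ in how many Multimedia Sessions make
   up the Communication Session.  The following sketch enumerates them
   for a hypothetical set of four Participants; the function names and
   the "conference-handler" label are assumptions for illustration.

```python
from itertools import combinations


def full_mesh_sessions(participants):
    """One Multimedia Session per unordered pair of Participants."""
    return list(combinations(sorted(participants), 2))


def centralized_sessions(participants, handler="conference-handler"):
    """One Multimedia Session between each Participant and the handler."""
    return [(p, handler) for p in sorted(participants)]


people = {"A", "B", "C", "D"}
mesh = full_mesh_sessions(people)      # n * (n - 1) / 2 sessions
star = centralized_sessions(people)    # n sessions
```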

981	3.  Concepts of Inter-Relations

983	   This section uses the concepts from previous sections, and looks at
984	   different types of relationships among them.  These relationships
985	   occur at different abstraction levels and for different purposes, but
986	   the reason for the needed relationship at a certain step in the media
987	   handling chain may exist at another step.  For example, the use of
988	   Simulcast (Section 3.7) implies a need to determine relations at RTP
989	   Stream level, but the underlying reason is that multiple Media
990	   Encoders use the same Media Source, i.e. to be able to identify a
991	   common Media Source.

993	3.1.  Synchronization Context

995	   A Synchronization Context defines a requirement on a strong timing
996	   relationship between the Media Sources, typically requiring alignment
997	   of clock sources.  Such a relationship can be identified in multiple
998	   ways as listed below.  A single Media Source can only belong to a
999	   single Synchronization Context, since it is assumed that a single
1000	   Media Source can only have a single media clock and requiring
1001	   alignment to several Synchronization Contexts (and thus reference
1002	   clocks) will effectively merge those into a single Synchronization
1003	   Context.

1005	3.1.1.  RTCP CNAME

1007	   RFC3550 [RFC3550] describes Inter-media synchronization between RTP
1008	   Sessions based on RTCP CNAME, RTP and Network Time Protocol (NTP)
1009	   [RFC5905] formatted timestamps of a reference clock.  As indicated in
1010	   [RFC7273], despite using NTP format timestamps, it is not required
1011	   that the clock be synchronized to an NTP source.
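
   The timestamp mapping that underlies this synchronization can be
   sketched as follows.  Each RTCP Sender Report pairs an RTP timestamp
   with an NTP-format timestamp of the reference clock, so a later RTP
   timestamp can be projected onto that clock.  The function name and
   the numeric values are assumptions for illustration; the NTP
   timestamp is represented as a floating-point number of seconds for
   brevity.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp onto the sender's reference clock (seconds)."""
    # RTP timestamps are 32-bit and wrap; take the signed difference.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_ntp_seconds + delta / clock_rate


# Hypothetical Sender Report data for two streams sharing one CNAME.
audio_time = rtp_to_wallclock(48_000 + 960, 48_000, 1000.0, 48_000)
video_time = rtp_to_wallclock(90_000 + 1_800, 90_000, 1000.0, 90_000)
skew = audio_time - video_time   # offset to compensate when rendering
```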

1013	3.1.2.  Clock Source Signaling

1015	   [RFC7273] provides a mechanism to signal the clock source in SDP both
1016	   for the reference clock as well as the media clock, thus allowing a
1017	   Synchronization Context to be defined beyond the one defined by the
1018	   usage of CNAME source descriptions.

1020	3.1.3.  Implicitly via RtcMediaStream

1022	   The WebRTC WG defines "RtcMediaStream" with one or more
1023	   "RtcMediaStreamTracks".  All tracks in a "RtcMediaStream" are
1024	   intended to be synchronized when rendered, implying that they must be
1025	   generated such that synchronization is possible.

1027	3.1.4.  Explicitly via SDP Mechanisms

1029	   The SDP Grouping Framework [RFC5888] defines an m= line (Section 4.2)
1030	   grouping mechanism called "Lip Synchronization (LS)" for establishing
1031	   the synchronization requirement across m= lines when they map to
1032	   individual sources.

1034	   Source-Specific Media Attributes in SDP [RFC5576] extends the above
1035	   mechanism when multiple Media Sources are described by a single m=
1036	   line.
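
   As an illustration of the m= line grouping described above, the
   sketch below extracts the "mid" values named by each "a=group:LS"
   line.  The session description is a hypothetical example, the
   function name is an assumption, and the parser is deliberately
   minimal rather than a complete SDP implementation.

```python
def lip_sync_groups(sdp: str):
    """Return the mid values listed on each 'a=group:LS' line."""
    groups = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=group:LS "):
            groups.append(line.split()[1:])
    return groups


# Hypothetical session description with two synchronized m= lines.
example_sdp = """v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
a=group:LS 1 2
m=audio 30000 RTP/AVP 0
a=mid:1
m=video 30002 RTP/AVP 31
a=mid:2
"""
groups = lip_sync_groups(example_sdp)
```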

1038	3.2.  Endpoint

1040	   Some applications require knowledge of which Media Sources originate
1041	   from a particular Endpoint (Section 2.2.1).  This knowledge can
1042	   inform decisions such as packet routing between parts of the
1043	   topology, based on the Endpoint origin of the RTP Streams.

1045	   In RTP, this identification has been overloaded with the
1046	   Synchronization Context (Section 3.1) through the usage of the RTCP
1047	   source description CNAME (Section 3.1.1).  This works for some
1048	   usages, but in others it breaks down.  For example, if an Endpoint
1049	   has two sets of Media Sources that have different Synchronization
1050	   Contexts, like the audio and video of the human Participant as well
1051	   as a set of Media Sources of audio and video for a shared movie,
1052	   CNAME would not be an appropriate identification for that Endpoint.
1053	   Therefore, an Endpoint may have multiple CNAMEs.  The CNAMEs or the
1054	   Media Sources themselves can be related to the Endpoint.

1056	3.3.  Participant

1058	   In communication scenarios, it is commonly necessary to know which
1059	   Media Sources originate from which Participant (Section 2.2.3).  One
1060	   reason is, for example, to enable the application to display
1061	   Participant Identity information correctly associated with the Media
1062	   Sources.  This association is handled through the signaling solution
1063	   to point at a specific Multimedia Session where the Media Sources may
1064	   be explicitly or implicitly tied to a particular Endpoint.

1066	   Participant information becomes more problematic due to Media Sources
1067	   that are generated through mixing or other conceptual processing of
1068	   Raw Streams or Source Streams that originate from different
1069	   Participants.  This type of Media Source can thus have a dynamically
1070	   varying set of origins and Participants.  RTP contains the concept of
1071	   Contributing Sources (CSRC) that carry information about the previous
1072	   step origin of the included media content at the RTP level.
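
   A receiver can read the Contributing Sources of a mixed packet
   directly from the RTP header.  The sketch below assumes the header
   layout of [RFC3550], where the low four bits of the first octet give
   the CSRC count and the CSRC identifiers follow the 12-octet fixed
   header.  The function name and the example identifiers are
   hypothetical.

```python
import struct


def parse_csrc_list(packet: bytes):
    """Return (ssrc, [csrc, ...]) from an RTP packet (RFC 3550 layout)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    csrc_count = packet[0] & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("RTP packet truncated inside the CSRC list")
    ssrc = struct.unpack("!I", packet[8:12])[0]
    csrcs = list(struct.unpack(f"!{csrc_count}I", packet[12:end]))
    return ssrc, csrcs


# Hypothetical mixer output: SSRC 0x11111111 with two contributing sources.
mixed = struct.pack(
    "!BBHIIII", 0x82, 0, 1, 0, 0x11111111, 0xAAAAAAAA, 0xBBBBBBBB
)
ssrc, csrcs = parse_csrc_list(mixed)
```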

1074	3.4.  RtcMediaStream

1076	   An RtcMediaStream in WebRTC is an explicit grouping of a set of Media
1077	   Sources (RtcMediaStreamTracks) that share a common identifier and a
1078	   single Synchronization Context (Section 3.1).

1080	3.5.  Single- and Multi-Session Transmission of Dependent Streams

1082	   Scalable media coding formats such as, for example, H.264-based
1083	   Scalable Video Coding [RFC6190] have two modes of operation:

1085	   1.  In Single Session Transmission (SST), the SVC Media Encoder sends
1086	       Encoded Streams (Section 2.1.7) and Dependent Streams
1087	       (Section 2.1.8) as a single RTP Stream (Section 2.1.10) in a
1088	       single RTP Session (Section 2.2.2), using the SVC RTP Payload
1089	       format.

1091	   2.  In Multi-Session Transmission (MST), the SVC Media Encoder sends
1092	       Encoded Streams and Dependent Streams distributed across multiple
1093	       RTP Streams in one or more RTP Sessions.

1095	   SST denotes one RTP Stream (SSRC) per Media Source in a single RTP
1096	   Session.  MST denotes one or more RTP Streams (SSRC) per Media Source
1097	   in each of multiple RTP Sessions.  The above is not unambiguously
1098	   specified in the SVC payload format text [RFC6190], but it is what
1099	   existing deployments of that RFC have implemented.

1101	   The use of the term "RTP Session" in the SST/MST definition is
1102	   somewhat misleading, since a single RTP Session can contain multiple
1103	   RTP Streams.  Also, it is sometimes useful to make a distinction
1104	   between using a single Media Transport or multiple separate Media
1105	   Transports when (in both cases) using multiple RTP Streams to carry
1106	   Encoded Streams and Dependent Streams for a Media Source.  Therefore,
1107	   herein the following new terminology is defined:

1109	   SRST:  Single RTP Stream on a Single Media Transport

1111	   MRST:  Multiple RTP Streams on a Single Media Transport

1113	   MRMT:  Multiple RTP Streams on Multiple Media Transports
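
   The three terms can be read as a classification over two counts:
   the number of RTP Streams carrying the Encoded and Dependent Streams
   of a Media Source, and the number of Media Transports they use.  The
   function below is a hypothetical helper illustrating that reading;
   a single RTP Stream spread over several Media Transports is not one
   of the defined terms and is rejected.

```python
def classify_transmission(rtp_streams: int, media_transports: int) -> str:
    """Classify a layered transmission as SRST, MRST, or MRMT."""
    if rtp_streams < 1 or media_transports < 1:
        raise ValueError("at least one RTP Stream and one Media Transport")
    if media_transports == 1:
        return "SRST" if rtp_streams == 1 else "MRST"
    if rtp_streams > 1:
        return "MRMT"
    raise ValueError("single RTP Stream on multiple Media Transports "
                     "is not a defined case")


cases = {
    (1, 1): classify_transmission(1, 1),
    (3, 1): classify_transmission(3, 1),
    (3, 2): classify_transmission(3, 2),
}
```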

1115	3.6.  Multi-Channel Audio

1117	   There exist a number of RTP payload formats that can carry multi-
1118	   channel audio, despite the codec being a mono encoder.  Multi-channel
1119	   audio can be viewed as multiple Media Sources sharing a common
1120	   Synchronization Context.  These are independently encoded by a Media
1121	   Encoder and the different Encoded Streams are packetized together in
1122	   a time synchronized way into a single Source RTP Stream, using the
1123	   codec's RTP Payload format.  Examples of codecs that support
1124	   multi-channel audio are PCMA and PCMU [RFC3551], AMR [RFC4867], and
1125	   G.719 [RFC5404].

1127	3.7.  Simulcast

1129	   A Media Source represented as multiple independent Encoded Streams
1130	   constitutes a Simulcast or Multiple Description Coding of that Media
1131	   Source.  Figure 7 below shows an example of a Media Source that is
1132	   encoded into three separate Simulcast streams, that are in turn sent
1133	   on the same Media Transport flow.  When using Simulcast, the RTP
1134	   Streams may be sharing RTP Session and Media Transport, or be
1135	   separated on different RTP Sessions and Media Transports, or any
1136	   combination of these two.  It is other considerations that affect
1137	   which usage is desirable, as discussed in Section 3.13.

1139	                            +----------------+
1140	                            |  Media Source  |
1141	                            +----------------+
1142	                     Source Stream  |
1143	             +----------------------+----------------------+
1144	             |                      |                      |
1145	             V                      V                      V
1146	    +------------------+   +------------------+   +------------------+
1147	    |  Media Encoder   |   |  Media Encoder   |   |  Media Encoder   |
1148	    +------------------+   +------------------+   +------------------+
1149	             | Encoded              | Encoded              | Encoded
1150	             | Stream               | Stream               | Stream
1151	             V                      V                      V
1152	    +------------------+   +------------------+   +------------------+
1153	    | Media Packetizer |   | Media Packetizer |   | Media Packetizer |
1154	    +------------------+   +------------------+   +------------------+
1155	             | Source               | Source               | Source
1156	             | RTP                  | RTP                  | RTP
1157	             | Stream               | Stream               | Stream
1158	             +-----------------+    |    +-----------------+
1159	                               |    |    |
1160	                               V    V    V
1161	                          +-------------------+
1162	                          |  Media Transport  |
1163	                          +-------------------+

1165	                Figure 7: Example of Media Source Simulcast

1167	   The Simulcast relation between the RTP Streams is the common Media
1168	   Source.  In addition, to be able to identify the common Media Source,
1169	   a receiver of the RTP Stream may need to know which configuration or
1170	   encoding goals lay behind the produced Encoded Stream and its
1171	   properties.  This enables selection of the stream that is most
1172	   useful in the application at that moment.

1174	3.8.  Layered Multi-Stream

1176	   Layered Multi-Stream (LMS) is a mechanism by which different portions
1177	   of a layered encoding of a Source Stream are sent using separate RTP
1178	   Streams (sometimes in separate RTP Sessions).  LMSs are useful for
1179	   receiver control of layered media.

1181	   A Media Source represented as an Encoded Stream and multiple
1182	   Dependent Streams constitutes a Media Source that has layered
1183	   dependencies.  The figure below represents an example of a Media
1184	   Source that is encoded into three dependent layers, where two layers
1185	   are sent on the same Media Transport using different RTP Streams,
1186	   i.e., SSRCs, and the third layer is sent on a separate Media
1187	   Transport.

1189	                            +----------------+
1190	                            |  Media Source  |
1191	                            +----------------+
1192	                                    |
1193	                                    |
1194	                                    V
1195	       +---------------------------------------------------------+
1196	       |                      Media Encoder                      |
1197	       +---------------------------------------------------------+
1198	               |                    |                     |
1199	        Encoded Stream       Dependent Stream     Dependent Stream
1200	               |                    |                     |
1201	               V                    V                     V
1202	       +----------------+   +----------------+   +----------------+
1203	       |Media Packetizer|   |Media Packetizer|   |Media Packetizer|
1204	       +----------------+   +----------------+   +----------------+
1205	               |                    |                     |
1206	          RTP Stream           RTP Stream            RTP Stream
1207	               |                    |                     |
1208	               +------+      +------+                     |
1209	                      |      |                            |
1210	                      V      V                            V
1211	                +-----------------+              +-----------------+
1212	                | Media Transport |              | Media Transport |
1213	                +-----------------+              +-----------------+

1215	           Figure 8: Example of Media Source Layered Dependency

1217	   As an example, the SVC MRST and MRMT (Section 3.5) relations need to
1218	   identify the common Media Encoder origin for the Encoded and
1219	   Dependent Streams.  The SVC RTP Payload RFC [RFC6190] is not
1220	   particularly explicit about how this relation is to be implemented.
1221	   When using different RTP Sessions, and thus different Media Transports
1222	   (MRMT (Section 3.5)), and as long as there is only one RTP Stream per
1223	   Media Encoder and a single Media Source in each RTP Session, a
1224	   common SSRC and CNAME can be used to identify the common Media
1225	   Source.  When multiple RTP Streams are sent from one Media Encoder in
1226	   the same RTP Session (MRST), then CNAME is the only currently
1227	   specified RTP identifier that can be used.  In cases where multiple
1228	   Media Encoders use multiple Media Sources sharing Synchronization
1229	   Context, and thus having a common CNAME, additional heuristics or
1230	   identification need to be applied to create the MRST or MRMT
1231	   relationships between the RTP Streams.
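
	   The identification rules above can be sketched as follows.  This is
	   a minimal illustration only, not part of any specification: the
	   StreamInfo record, its field names, and the group_layered_streams
	   helper are hypothetical, and the sketch assumes one Media Source per
	   CNAME, which is the case where no additional heuristics are needed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamInfo:
    """Hypothetical record of what a receiver knows about one RTP Stream."""
    session: str   # local label for the RTP Session
    ssrc: int
    cname: str     # learned from RTCP SDES


def group_layered_streams(streams):
    """Group RTP Streams that may originate from one Media Encoder.

    MRMT: the same SSRC and CNAME appear in different RTP Sessions.
    MRST: only the CNAME relates streams inside a single RTP Session.
    """
    mrmt = defaultdict(list)
    mrst = defaultdict(list)
    for s in streams:
        mrmt[(s.cname, s.ssrc)].append(s)
        mrst[(s.cname, s.session)].append(s)
    # Keep only the keys that actually relate more than one stream.
    mrmt_groups = {k: v for k, v in mrmt.items()
                   if len({s.session for s in v}) > 1}
    mrst_groups = {k: v for k, v in mrst.items() if len(v) > 1}
    return mrmt_groups, mrst_groups
```

	   When several Media Sources share a CNAME, the MRST grouping in this
	   sketch becomes ambiguous, which is where the additional heuristics
	   or identification mentioned above would be required.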

1233	3.9.  RTP Stream Duplication

1235	   RTP Stream Duplication [RFC7198], using the same or different Media
1236	   Transports, and optionally also delaying the duplicate [RFC7197],
1237	   offers a simple way to protect media flows from packet loss in some
1238	   cases.  It is a specific type of redundancy: all but one Source
1239	   RTP Stream (Section 2.1.10) are effectively Redundancy RTP Streams
1240	   (Section 2.1.12), but since the Source and Redundancy RTP Streams are
1241	   identical, it does not matter which one is which.  This can also be
1242	   seen as a specific type of Simulcast (Section 3.7) that transmits the
1243	   same Encoded Stream (Section 2.1.7) multiple times.

1245	                            +----------------+
1246	                            |  Media Source  |
1247	                            +----------------+
1248	                     Source Stream  |
1249	                                    V
1250	                            +----------------+
1251	                            | Media Encoder  |
1252	                            +----------------+
1253	                    Encoded Stream  |
1254	                        +-----------+-----------+
1255	                        |                       |
1256	                        V                       V
1257	               +------------------+    +------------------+
1258	               | Media Packetizer |    | Media Packetizer |
1259	               +------------------+    +------------------+
1260	                 Source | RTP Stream     Source | RTP Stream
1261	                        |                       V
1262	                        |                +-------------+
1263	                        |                | Delay (opt) |
1264	                        |                +-------------+
1265	                        |                       |
1266	                        +-----------+-----------+
1267	                                    |
1268	                                    V
1269	                          +-------------------+
1270	                          |  Media Transport  |
1271	                          +-------------------+

1273	                Figure 9: Example of RTP Stream Duplication

1275	3.10.  Redundancy Format

1277	   The RTP Payload for Redundant Audio Data [RFC2198] defines a
1278	   transport for redundant audio data together with primary data in the
1279	   same RTP payload.  The redundant data can be a time-delayed version
1280	   of the primary, or another time-delayed Encoded Stream that uses a
1281	   different Media Encoder to encode the same Media Source as the
1282	   primary, as depicted below in Figure 10.

1284	              +--------------------+
1285	              |    Media Source    |
1286	              +--------------------+
1287	                        |
1288	                   Source Stream
1289	                        |
1290	                        +------------------------+
1291	                        |                        |
1292	                        V                        V
1293	              +--------------------+   +--------------------+
1294	              |   Media Encoder    |   |   Media Encoder    |
1295	              +--------------------+   +--------------------+
1296	                        |                        |
1297	                        |                 +------------+
1298	                  Encoded Stream          | Time Delay |
1299	                        |                 +------------+
1300	                        |                        |
1301	                        |     +------------------+
1302	                        V     V
1303	              +--------------------+
1304	              |  Media Packetizer  |
1305	              +--------------------+
1306	                        |
1307	                        V
1308	                   RTP Stream

1310	   Figure 10: Concept for usage of Audio Redundancy with different Media
1311	                                 Encoders

1313	   The Redundancy format thus provides the necessary meta-information
1314	   to correctly relate different parts of the same Encoded Stream or,
1315	   in the case depicted above (Figure 10), to relate the Received
1316	   Source Stream fragments coming out of different Media Decoders so
1317	   that they can be combined into a less erroneous Source Stream.
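
	   As an illustration of that meta-information, the sketch below packs
	   and parses the block headers that [RFC2198] places in front of the
	   redundant and primary data.  The function names and the tuple layout
	   are assumptions made for this example; only the header field widths
	   (1-bit F flag, 7-bit block payload type, 14-bit timestamp offset,
	   10-bit block length) are taken from the RFC.

```python
import struct


def pack_red(primary_pt, primary, redundant):
    """Build an RFC 2198 payload.

    redundant: list of (payload_type, timestamp_offset, data) tuples,
    oldest first; primary: bytes of the primary encoding.
    """
    headers = b""
    body = b""
    for pt, ts_offset, data in redundant:
        if not (0 <= pt < 128 and 0 <= ts_offset < 1 << 14
                and len(data) < 1 << 10):
            raise ValueError("field does not fit the block header")
        word = (1 << 31) | (pt << 24) | (ts_offset << 10) | len(data)
        headers += struct.pack("!I", word)
        body += data
    headers += struct.pack("!B", primary_pt & 0x7F)  # F=0: last header
    return headers + body + primary


def unpack_red(payload):
    """Return [(payload_type, timestamp_offset, data), ...], primary last."""
    blocks = []
    pos = 0
    while payload[pos] & 0x80:          # F=1: another header follows
        (word,) = struct.unpack_from("!I", payload, pos)
        blocks.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF,
                       word & 0x3FF))
        pos += 4
    primary_pt = payload[pos] & 0x7F
    pos += 1
    out = []
    for pt, ts_offset, length in blocks:
        out.append((pt, ts_offset, payload[pos:pos + length]))
        pos += length
    out.append((primary_pt, 0, payload[pos:]))
    return out
```

	   The timestamp offset is what lets a receiver place each redundant
	   block at the correct position relative to the primary data.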

1319	3.11.  RTP Retransmission

1321	   Figure 11 shows an example where a Media Source's Source RTP Stream
1322	   is protected by a retransmission (RTX) flow [RFC4588].  In this
1323	   example the Source RTP Stream and the Redundancy RTP Stream share the
1324	   same Media Transport.

1326	          +--------------------+
1327	          |    Media Source    |
1328	          +--------------------+
1329	                    |
1330	                    V
1331	          +--------------------+
1332	          |   Media Encoder    |
1333	          +--------------------+
1334	                    |                              Retransmission
1335	              Encoded Stream     +--------+     +---- Request
1336	                    V            |        V     V
1337	          +--------------------+ | +--------------------+
1338	          |  Media Packetizer  | | | RTP Retransmission |
1339	          +--------------------+ | +--------------------+
1340	                    |            |           |
1341	                    +------------+  Redundancy RTP Stream
1342	             Source RTP Stream               |
1343	                    |                        |
1344	                    +---------+    +---------+
1345	                              |    |
1346	                              V    V
1347	                       +-----------------+
1348	                       | Media Transport |
1349	                       +-----------------+

1351	          Figure 11: Example of Media Source Retransmission Flows

1353	   The RTP Retransmission example (Figure 11) illustrates that this
1354	   mechanism works purely on the Source RTP Stream.  The RTP
1355	   Retransmission transform buffers the sent Source RTP Stream and, upon
1356	   request, emits a retransmitted packet with an extra payload header as
1357	   a Redundancy RTP Stream.  The RTP Retransmission mechanism [RFC4588]
1358	   is specified such that there is a one to one relation between the
1359	   Source RTP Stream and the Redundancy RTP Stream.  Therefore, a
1360	   Redundancy RTP Stream needs to be associated with its Source RTP
1361	   Stream.  This is done based on CNAME selectors and heuristics to
1362	   match requested packets for a given Source RTP Stream with the
1363	   original sequence number in the payload of any new Redundancy RTP
1364	   Stream using the RTX payload format.  In cases where the Redundancy
1365	   RTP Stream is sent in a separate RTP Session from the Source RTP
1366	   Stream, these sessions are related, which is signaled by using the
1367	   SDP Media Grouping's [RFC5888] FID semantics.
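
	   The buffering and re-emission described above can be sketched as
	   follows.  The RtxSender class and recover_original function are
	   hypothetical names, and the sketch handles payloads only; RTP header
	   construction, SSRC selection, and RTCP feedback parsing are left
	   out.  The one detail taken from [RFC4588] is that the retransmission
	   payload starts with the two-octet original sequence number (OSN).

```python
import struct
from collections import OrderedDict


class RtxSender:
    """Hypothetical retransmission buffer for one Source RTP Stream."""

    def __init__(self, capacity=512):
        self.capacity = capacity
        self.sent = OrderedDict()   # sequence number -> original payload
        self.rtx_seq = 0            # the RTX stream has its own numbering

    def remember(self, seq, payload):
        self.sent[seq] = payload
        while len(self.sent) > self.capacity:
            self.sent.popitem(last=False)   # drop the oldest packet

    def retransmit(self, seq):
        """Return (rtx_seq, rtx_payload), or None if no longer buffered."""
        payload = self.sent.get(seq)
        if payload is None:
            return None
        rtx_payload = struct.pack("!H", seq) + payload   # OSN comes first
        rtx_seq = self.rtx_seq
        self.rtx_seq = (self.rtx_seq + 1) & 0xFFFF
        return rtx_seq, rtx_payload


def recover_original(rtx_payload):
    """Receiver side: split an RTX payload into (original_seq, payload)."""
    (osn,) = struct.unpack_from("!H", rtx_payload)
    return osn, rtx_payload[2:]
```

	   Because the Redundancy RTP Stream numbers its packets independently,
	   the OSN is the only field that ties a retransmitted packet back to
	   its position in the Source RTP Stream.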

1369	3.12.  Forward Error Correction

1371	   The figure below (Figure 12) shows an example where two Media
1372	   Sources' Source RTP Streams are protected by FEC.  Source RTP Stream
1373	   A has an RTP-based Redundancy transformation in FEC Encoder 1.  This
1374	   produces Redundancy RTP Stream 1, which is related only to Source
1375	   RTP Stream A.  FEC Encoder 2, however, takes two Source RTP
1376	   Streams (A and B) and produces Redundancy RTP Stream 2, which
1377	   protects them jointly, i.e., Redundancy RTP Stream 2 relates to two
1378	   Source RTP Streams (a FEC group).  FEC decoding, when needed due to
1379	   packet loss or packet corruption at the receiver, requires knowledge
1380	   of which Source RTP Streams the FEC encoding was based on.

1382	   In Figure 12 all RTP Streams are sent on the same Media Transport.
1383	   This is however not the only possible choice.  Numerous combinations
1384	   exist for spreading these RTP Streams over different Media Transports
1385	   to achieve the communication application's goal.

1387	       +--------------------+                +--------------------+
1388	       |   Media Source A   |                |   Media Source B   |
1389	       +--------------------+                +--------------------+
1390	                 |                                     |
1391	                 V                                     V
1392	       +--------------------+                +--------------------+
1393	       |   Media Encoder A  |                |   Media Encoder B  |
1394	       +--------------------+                +--------------------+
1395	                 |                                     |
1396	           Encoded Stream                        Encoded Stream
1397	                 V                                     V
1398	       +--------------------+                +--------------------+
1399	       | Media Packetizer A |                | Media Packetizer B |
1400	       +--------------------+                +--------------------+
1401	                 |                                     |
1402	        Source RTP Stream A                   Source RTP Stream B
1403	                 |                                     |
1404	           +-----+---------+-------------+         +---+---+
1405	           |               V             V         V       |
1406	           |       +---------------+  +---------------+    |
1407	           |       | FEC Encoder 1 |  | FEC Encoder 2 |    |
1408	           |       +---------------+  +---------------+    |
1409	           |  Redundancy   |     Redundancy   |            |
1410	           |  RTP Stream 1 |     RTP Stream 2 |            |
1411	           V               V                  V            V
1412	       +----------------------------------------------------------+
1413	       |                    Media Transport                       |
1414	       +----------------------------------------------------------+

1416	             Figure 12: Example of FEC Redundancy RTP Streams

1418	   As FEC Encoding exists in various forms, there are many methods for
1419	   relating FEC Redundancy RTP Streams to their source information in
1420	   Source RTP Streams.  The XOR based RTP FEC Payload format [RFC5109] is
1421	   defined in such a way that a Redundancy RTP Stream has a one-to-one
1422	   relation with a Source RTP Stream.  In fact, the RFC requires the
1423	   Redundancy RTP Stream to use the same SSRC as the Source RTP Stream.
1424	   This requires using either a separate RTP Session or the
1425	   Redundancy RTP Payload format [RFC2198].  The underlying relation
1426	   requirement for this FEC format and a particular Redundancy RTP
1427	   Stream is to know the related Source RTP Stream, including its SSRC.
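
	   To show why a FEC decoder has to know exactly which source packets a
	   repair packet protects, the sketch below implements plain XOR parity
	   over a group of payloads.  It is a simplified illustration and not
	   the [RFC5109] wire format, which also protects header fields and
	   carries a length recovery field; here the length of the missing
	   payload is passed in explicitly, and both function names are
	   hypothetical.

```python
def xor_parity(payloads):
    """XOR a group of payloads, zero-padding to the longest one."""
    size = max(len(p) for p in payloads)
    parity = bytearray(size)
    for p in payloads:
        for i, byte in enumerate(p):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity, missing_length):
    """Rebuild the single missing payload of a protected group.

    received: the payloads of the group that did arrive.
    parity: the repair payload computed over the complete group.
    """
    rebuilt = xor_parity(list(received) + [parity])
    return rebuilt[:missing_length]
```

	   Recovery only works if the receiver XORs the repair payload with
	   precisely the other members of the same FEC group; mixing in a
	   packet from an unrelated Source RTP Stream yields garbage.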

1429	3.13.  RTP Stream Separation

1431	   RTP Streams can be separated exclusively based on their SSRCs, at the
1432	   RTP Session level, or at the Multi-Media Session level.

1434	   When the RTP Streams that have a relationship are all sent in the
1435	   same RTP Session and are uniquely identified based on their SSRC
1436	   only, it is termed an SSRC-Only Based Separation.  Such streams can
1437	   be related via RTCP CNAME to identify that the streams belong to the
1438	   same Endpoint.  SSRC-based approaches [RFC5576], when used, can
1439	   explicitly relate various such RTP Streams.

1441	   On the other hand, when related RTP Streams are sent in the context
1442	   of different RTP Sessions to achieve separation, it is
1443	   known as RTP Session-based separation.  This is commonly used when
1444	   the different RTP Streams are intended for different Media
1445	   Transports.

1447	   Several mechanisms that use RTP Session-based separation rely on it
1448	   to enable an implicit grouping mechanism expressing the relationship.
1449	   The solutions have been based on using the same SSRC value in the
1450	   different RTP Sessions to implicitly indicate their relation.  That
1451	   way, no explicit RTP level mechanism has been needed; only signaling
1452	   level relations have been established, using semantics from the
1453	   Grouping of Media lines framework [RFC5888].  Examples of this are RTP
1454	   Retransmission [RFC4588], SVC Multi-Session Transmission [RFC6190]
1455	   and XOR Based FEC [RFC5109].  RTCP CNAME explicitly relates RTP
1456	   Streams across different RTP Sessions, as explained in the previous
1457	   section.  Such a relationship can be used to perform inter-media
1458	   synchronization.

1460	   RTP Streams that are related and need to be associated can be part of
1461	   different Multimedia Sessions, rather than just different RTP
1462	   Sessions within the same Multimedia Session context.  This puts
1463	   further demand on the scope of the mechanism(s) and its handling of
1464	   identifiers used for expressing the relationships.

1466	3.14.  Multiple RTP Sessions over one Media Transport

1468	   [I-D.westerlund-avtcore-transport-multiplexing] describes a mechanism
1469	   that allows several RTP Sessions to be carried over a single
1470	   underlying Media Transport.  The main reasons for doing this are
1471	   related to the impact of using one or more Media Transports (which
1472	   may share a common network path or use different ones).  The fewer
1473	   Media Transports used, the less need for NAT/FW traversal resources
1474	   and the fewer flows requiring flow-based QoS.

1476	   However, Multiple RTP Sessions over one Media Transport imply that a
1477	   single Media Transport 5-tuple is not sufficient to express in which
1478	   RTP Session context a particular RTP Stream exists.  Complexities in
1479	   the relationship between Media Transports and RTP Sessions already
1480	   exist, as one RTP Session contains multiple Media Transports; e.g.,
1481	   even a Peer-to-Peer RTP Session with RTP/RTCP Multiplexing requires
1482	   two Media Transports, one in each direction.  The relationship
1483	   between Media Transports and RTP Sessions as well as additional
1484	   levels of identifiers need to be considered in both signaling design
1485	   and when defining terminology.
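
	   The consequence for a receiver can be sketched as a lookup table
	   whose key is extended beyond the 5-tuple.  This is an illustration
	   under stated assumptions: the session_id value stands for whatever
	   additional identifier a multiplexing scheme provides and does not
	   reflect the encoding used by any particular draft, and the
	   FiveTuple and Demultiplexer names are hypothetical.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


class Demultiplexer:
    """Maps (Media Transport flow, session identifier) to an RTP Session."""

    def __init__(self):
        self.sessions = {}

    def register(self, flow, session_id, rtp_session):
        key = (flow, session_id)
        if key in self.sessions:
            raise ValueError("session identifier already in use on this flow")
        self.sessions[key] = rtp_session

    def lookup(self, flow, session_id):
        # The 5-tuple alone would be ambiguous once two RTP Sessions
        # share the same flow, so both parts of the key are required.
        return self.sessions.get((flow, session_id))
```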

1487	4.  Mapping from Existing Terms

1489	   This section describes a selected set of terms from some relevant
1490	   IETF RFCs and Internet-Drafts (at the time of writing), using the
1491	   concepts from previous sections.

1493	4.1.  Telepresence Terms

1495	   The terms in this sub-section are used in the context of CLUE
1496	   Telepresence [I-D.ietf-clue-framework].

1498	4.1.1.  Audio Capture

1500	   Describes an audio Media Source (Section 2.1.4).

1502	4.1.2.  Capture Device

1504	   Identifies a physical entity performing a Media Capture
1505	   (Section 2.1.2) transformation.

1507	4.1.3.  Capture Encoding

1509	   Describes an Encoded Stream (Section 2.1.7) related to CLUE specific
1510	   semantic information.

1512	4.1.4.  Capture Scene

1514	   Describes a set of spatially related Media Sources (Section 2.1.4).

1516	4.1.5.  Endpoint

1518	   Describes exactly one Participant (Section 2.2.3) and one or more
1519	   Endpoints (Section 2.2.1).

1521	4.1.6.  Individual Encoding

1523	   Describes the configuration information needed to perform a Media
1524	   Encoder (Section 2.1.6) transformation.

1526	4.1.7.  Media Capture

1528	   Describes either a Media Capture (Section 2.1.2) or a Media Source
1529	   (Section 2.1.4), depending on in which context the term is used.

1531	4.1.8.  Media Consumer

1533	   Describes the media receiving part of an Endpoint (Section 2.2.1).

1535	4.1.9.  Media Provider

1537	   Describes the media sending part of an Endpoint (Section 2.2.1).

1539	4.1.10.  Stream

1541	   Describes an RTP Stream (Section 2.1.10).

1543	4.1.11.  Video Capture

1545	   Describes a video Media Source (Section 2.1.4).

1547	4.2.  Media Description

1549	   A single Session Description Protocol (SDP) [RFC4566] media
1550	   description (or media block; an m-line and all subsequent lines until
1551	   the next m-line or the end of the SDP) describes part of the
1552	   necessary configuration and identification information needed for a
1553	   Media Encoder transformation, as well as the necessary configuration
1554	   and identification information for the Media Decoder to be able to
1555	   correctly interpret a received RTP Stream.

1557	   A Media Description typically relates to a single Media Source.  This
1558	   is for example an explicit restriction in WebRTC.  However, nothing
1559	   prevents the same Media Description (and the same RTP Session) from
1560	   being re-used for multiple Media Sources
1561	   [I-D.ietf-avtcore-rtp-multi-stream].  It can thus describe properties
1562	   of one or more RTP Streams, and can also describe properties valid
1563	   for an entire RTP Session (via [RFC5576] mechanisms, for example).

1565	4.3.  Media Stream

1567	   RTP [RFC3550] uses media stream, audio stream, video stream, and
1568	   stream of (RTP) packets interchangeably, which are all RTP Streams.

1570	4.4.  Multimedia Conference

1572	   A Multimedia Conference is a Communication Session (Section 2.2.5)
1573	   between two or more Participants (Section 2.2.3), along with the
1574	   software they are using to communicate.

1576	4.5.  Multimedia Session

1578	   SDP [RFC4566] defines a Multimedia Session as a set of multimedia
1579	   senders and receivers and the data streams flowing from senders to
1580	   receivers, which would correspond to a set of Endpoints and the RTP
1581	   Streams that flow between them.  In this memo, Multimedia Session
1582	   (Section 2.2.4) also assumes those Endpoints belong to a set of
1583	   Participants that are engaged in communication via a set of related
1584	   RTP Streams.

1586	   RTP [RFC3550] defines a Multimedia Session as a set of concurrent RTP
1587	   Sessions among a common group of Participants.  For example, a video
1588	   conference may contain an audio RTP Session and a video RTP Session.
1589	   This would correspond to a group of Participants (each using one or
1590	   more Endpoints) sharing a set of concurrent RTP Sessions.  In this
1591	   memo, Multimedia Session also defines those RTP Sessions to have some
1592	   relation and be part of a communication among the Participants.

1594	4.6.  Multipoint Control Unit (MCU)

1596	   This term is commonly used to describe the central node in any type
1597	   of star topology [I-D.ietf-avtcore-rtp-topologies-update] conference.
1598	   It describes a device that includes one Participant (Section 2.2.3)
1599	   (usually corresponding to a so-called conference focus) and one or
1600	   more related Endpoints (Section 2.2.1) (sometimes one or more per
1601	   conference Participant).

1603	4.7.  Recording Device

1605	   WebRTC specifications use this term to refer to locally available
1606	   entities performing a Media Capture (Section 2.1.2) transformation.

1608	4.8.  RtcMediaStream

1610	   A WebRTC RtcMediaStream is a set of Media Sources
1611	   (Section 2.1.4) sharing the same Synchronization Context
1612	   (Section 3.1).

1614	4.9.  RtcMediaStreamTrack

1616	   A WebRTC RtcMediaStreamTrack is a Media Source (Section 2.1.4).

1618	4.10.  RTP Sender

1620	   RTP [RFC3550] uses this term, which can be seen as the RTP protocol
1621	   part of a Media Packetizer (Section 2.1.9).

1623	4.11.  RTP Session

1625	   Within the context of SDP, a single m= line can map to a single RTP
1626	   Session (Section 2.2.2) or multiple m= lines can map to a single RTP
1627	   Session.  The latter is enabled via multiplexing schemes such as
1628	   BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation], for example, which
1629	   allows mapping of multiple m= lines to a single RTP Session.
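
	   The two mappings can be illustrated with a small SDP reader.  The
	   sketch below is a simplification with a hypothetical function name:
	   it looks only at the session-level "a=group:BUNDLE" attribute and
	   the media-level "a=mid" attributes, treats every m= line outside a
	   BUNDLE group as its own RTP Session, and ignores the offer/answer
	   negotiation that decides whether bundling is actually accepted.

```python
def rtp_sessions_from_sdp(sdp):
    """Return a list of RTP Sessions, each a list of m= line indexes."""
    bundles = []        # one set of mid values per BUNDLE group
    mids = {}           # mid value -> m= line index
    m_index = -1
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            m_index += 1
        elif line.startswith("a=group:BUNDLE") and m_index < 0:
            bundles.append(set(line.split()[1:]))
        elif line.startswith("a=mid:") and m_index >= 0:
            mids[line[len("a=mid:"):]] = m_index
    sessions = []
    assigned = set()
    for group in bundles:
        members = sorted(mids[m] for m in group if m in mids)
        if members:
            sessions.append(members)
            assigned.update(members)
    # Every m= line outside a BUNDLE group maps to its own RTP Session.
    for i in range(m_index + 1):
        if i not in assigned:
            sessions.append([i])
    return sessions
```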

1631	4.12.  SSRC

1633	   RTP [RFC3550] defines this as "the source of a stream of RTP
1634	   packets", which indicates that an SSRC is not only a unique
1635	   identifier for the Encoded Stream (Section 2.1.7) carried in those
1636	   packets, but is also effectively used as a term to denote a Media
1637	   Packetizer (Section 2.1.9).

1639	5.  Security Considerations

1641	   This document simply tries to clarify the confusion prevalent in RTP
1642	   taxonomy because of inconsistent usage by multiple technologies and
1643	   protocols making use of the RTP protocol.  It does not introduce any
1644	   new security considerations beyond those already well documented in
1645	   the RTP protocol [RFC3550] and each of the many respective
1646	   specifications of the various protocols making use of it.

1648	   Hopefully having a well-defined common terminology and understanding
1649	   of the complexities of the RTP architecture will help lead us to
1650	   better standards, avoiding security problems.

1652	6.  Acknowledgement

1654	   This document has many concepts borrowed from several documents such
1655	   as WebRTC [I-D.ietf-rtcweb-overview], CLUE [I-D.ietf-clue-framework],
1656	   and Multiplexing Architecture
1657	   [I-D.westerlund-avtcore-transport-multiplexing].  The authors would
1658	   like to thank all the authors of each of those documents.

1660	   The authors would also like to acknowledge the insights, guidance and
1661	   contributions of Magnus Westerlund, Roni Even, Paul Kyzivat, Colin
1662	   Perkins, Keith Drage, Harald Alvestrand, Alex Eleftheriadis, Mo
1663	   Zanaty, and Stephan Wenger.

1665	7.  Contributors

1667	   Magnus Westerlund has contributed the concept model for the media
1668	   chain using transformations and streams model, including rewriting
1669	   pre-existing concepts into this model and adding missing concepts.
1670	   The first proposal for updating the relationships and the topologies
1671	   based on this concept was also performed by Magnus.

1673	8.  IANA Considerations

1675	   This document makes no request of IANA.

1677	9.  Informative References

1679	   [I-D.ietf-avtcore-rtp-multi-stream]
1680	              Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
1681	              "Sending Multiple Media Streams in a Single RTP Session",
1682	              draft-ietf-avtcore-rtp-multi-stream-06 (work in progress),
1683	              October 2014.

1685	   [I-D.ietf-avtcore-rtp-topologies-update]
1686	              Westerlund, M. and S. Wenger, "RTP Topologies", draft-
1687	              ietf-avtcore-rtp-topologies-update-05 (work in progress),
1688	              November 2014.

1690	   [I-D.ietf-clue-framework]
1691	              Duckworth, M., Pepperell, A., and S. Wenger, "Framework
1692	              for Telepresence Multi-Streams", draft-ietf-clue-
1693	              framework-19 (work in progress), December 2014.

1695	   [I-D.ietf-mmusic-sdp-bundle-negotiation]
1696	              Holmberg, C., Alvestrand, H., and C. Jennings,
1697	              "Negotiating Media Multiplexing Using the Session
1698	              Description Protocol (SDP)", draft-ietf-mmusic-sdp-bundle-
1699	              negotiation-14 (work in progress), December 2014.

1701	   [I-D.ietf-rtcweb-overview]
1702	              Alvestrand, H., "Overview: Real Time Protocols for
1703	              Browser-based Applications", draft-ietf-rtcweb-overview-13
1704	              (work in progress), November 2014.

1706	   [I-D.westerlund-avtcore-transport-multiplexing]
1707	              Westerlund, M. and C. Perkins, "Multiplexing Multiple RTP
1708	              Sessions onto a Single Lower-Layer Transport", draft-
1709	              westerlund-avtcore-transport-multiplexing-07 (work in
1710	              progress), October 2013.

1712	   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
1713	              Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
1714	              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
1715	              September 1997.

1717	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1718	              Jacobson, "RTP: A Transport Protocol for Real-Time
1719	              Applications", STD 64, RFC 3550, July 2003.

1721	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
1722	              Video Conferences with Minimal Control", STD 65, RFC 3551,
1723	              July 2003.

1725	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1726	              Description Protocol", RFC 4566, July 2006.

1728	   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
1729	              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
1730	              July 2006.

1732	   [RFC4867]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie,
1733	              "RTP Payload Format and File Storage Format for the
1734	              Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband
1735	              (AMR-WB) Audio Codecs", RFC 4867, April 2007.

1737	   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
1738	              Correction", RFC 5109, December 2007.

1740	   [RFC5404]  Westerlund, M. and I. Johansson, "RTP Payload Format for
1741	              G.719", RFC 5404, January 2009.

1743	   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
1744	              Media Attributes in the Session Description Protocol
1745	              (SDP)", RFC 5576, June 2009.

1747	   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
1748	              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

1750	   [RFC5905]  Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network
1751	              Time Protocol Version 4: Protocol and Algorithms
1752	              Specification", RFC 5905, June 2010.

1754	   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
1755	              "RTP Payload Format for Scalable Video Coding", RFC 6190,
1756	              May 2011.

1758	   [RFC7160]  Petit-Huguenin, M. and G. Zorn, "Support for Multiple
1759	              Clock Rates in an RTP Session", RFC 7160, April 2014.

1761	   [RFC7197]  Begen, A., Cai, Y., and H. Ou, "Duplication Delay
1762	              Attribute in the Session Description Protocol", RFC 7197,
1763	              April 2014.

1765	   [RFC7198]  Begen, A. and C. Perkins, "Duplicating RTP Streams", RFC
1766	              7198, April 2014.

1768	   [RFC7273]  Williams, A., Gross, K., van Brandenburg, R., and H.
1769	              Stokking, "RTP Clock Source Signalling", RFC 7273, June
1770	              2014.

1772	Appendix A.  Changes From Earlier Versions

1774	   NOTE TO RFC EDITOR: Please remove this section prior to publication.

1776	A.1.  Modifications Between WG Version -03 and -04

1778	   o  Changed "Media Redundancy" and "Media Repair" to "RTP-based
1779	      Redundancy" and "RTP-based Repair", since those terms are more
1780	      specific and correct.

1782	   o  Changed "End Point" to "Endpoint" and removed Editor's Note on
1783	      this.

1785	   o  Clarified that a Media Capture may impose constraints on clock
1786	      handling.

1788	   o  Clarified that mixing multiple Raw Streams into a Source Stream is
1789	      not possible, since that requires mixed streams to have a timing
1790	      relation, requiring them to be Source Streams, and added an
1791	      example.

1793	   o  Clarified that RTP-based Redundancy excludes the type of encoding
1794	      redundancy found within the encoded media format in an Encoded
1795	      Stream.

1797	   o  Clarified that a Media Transport contains only a single RTP
1798	      Session, but a single RTP Session can span multiple Media
1799	      Transports.

1801	   o  Clarified that packets with seemingly correct checksum that are
1802	      received by a Media Transport Receiver may still be corrupt.

1804	   o  Clarified that a corrupt packet in a Media Transport Receiver is
1805	      typically either discarded or somehow marked and passed on in the
1806	      Received RTP Stream.

1808	   o  Added Synchronization Context to Figure 6.

1810	   o  Editorial improvements and clarifications.

1812	A.2.  Modifications Between WG Version -02 and -03

1814	   o  Changed section 3.5, removing SST-SS/MS and MST-SS/MS, replacing
1815	      them with SRST, MRST, and MRMT.

1817	   o  Updated section 3.8 to align with terminology changes in section
1818	      3.5.

1820	   o  Added a new section 4.12, describing the term Multimedia
1821	      Conference.

1823	   o  Changed reference from I-D to now published RFC 7273.

1825	   o  Editorial improvements and clarifications.

1827	A.3.  Modifications Between WG Version -01 and -02

1829	   o  Major re-structure

1831	   o  Moved media chain Media Transport detailing up one section level

1833	   o  Collapsed level 2 sub-sections of section 3 and thus moved level 3
1834	      sub-sections up one level, gathering some introductory text into
1835	      the beginning of section 3

1837	   o  Added that not only SSRC collision, but also a clock rate change
1838	      [RFC7160] is a valid reason to change SSRC value for an RTP stream

1840	   o  Added a sub-section on clock source signaling

1842	   o  Added a sub-section on RTP stream duplication

1843	   o  Elaborated a bit in section 2.2.1 on the relation between End
1844	      Points, Participants and CNAMEs

1846	   o  Elaborated a bit in section 2.2.4 on Multimedia Session and
1847	      synchronization contexts

1849	   o  Removed the section on CLUE scenes defining an implicit
1850	      synchronization context, since it was incorrect

1852	   o  Clarified text on SVC SST and MST according to list discussions

1854	   o  Removed the entire topology section to avoid possible
1855	      inconsistencies or duplications with draft-ietf-avtcore-rtp-
1856	      topologies-update, but saved one example overview figure of
1857	      Communication Entities into that section

1859	   o  Added a section 4 on mapping from existing terms with one sub-
1860	      section per term, mainly by moving text from sections 2 and 3

1862	   o  Changed all occurrences of Packet Stream to RTP Stream

1864	   o  Moved all normative references to informative, since this is an
1865	      informative document

1867	   o  Added references to RFC 7160, RFC 7197 and RFC 7198, and removed
1868	      unused references

1870	A.4.  Modifications Between WG Version -00 and -01

1872	   o  WG version -00 text is identical to individual draft -03

1874	   o  Amended description of SVC SST and MST encodings with respect to
1875	      concepts defined in this text

1877	   o  Removed UML as normative reference, since the text no longer uses
1878	      any UML notation

1880	   o  Removed a number of level 4 sections and moved out text to the
1881	      level above

1883	A.5.  Modifications Between Version -02 and -03

1885	   o  Section 4 rewritten (and new communication topologies added) to
1886	      reflect the major updates to Sections 1-3

1888	   o  Section 8 removed (carryover from initial -00 draft)

1890	   o  General clean up of text, grammar and nits

1892	A.6.  Modifications Between Version -01 and -02

1894	   o  Section 2 rewritten to add both streams and transformations in the
1895	      media chain.

1897	   o  Section 3 rewritten to focus on exposing relationships.

1899	A.7.  Modifications Between Version -00 and -01

1901	   o  Too many to list

1903	   o  Added new authors

1905	   o  Updated content organization and presentation

1907	Authors' Addresses

1909	   Jonathan Lennox
1910	   Vidyo, Inc.
1911	   433 Hackensack Avenue
1912	   Seventh Floor
1913	   Hackensack, NJ  07601
1914	   US

1916	   Email: jonathan@vidyo.com

1918	   Kevin Gross
1919	   AVA Networks, LLC
1920	   Boulder, CO
1921	   US

1923	   Email: kevin.gross@avanw.com

1925	   Suhas Nandakumar
1926	   Cisco Systems
1927	   170 West Tasman Drive
1928	   San Jose, CA  95134
1929	   US

1931	   Email: snandaku@cisco.com

1932	   Gonzalo Salgueiro
1933	   Cisco Systems
1934	   7200-12 Kit Creek Road
1935	   Research Triangle Park, NC  27709
1936	   US

1938	   Email: gsalguei@cisco.com

1940	   Bo Burman
1941	   Ericsson
1942	   Kistavagen 25
1943	   SE-164 80 Stockholm
1944	   Sweden

1946	   Phone: +46 10 714 13 11
1947	   Email: bo.burman@ericsson.com