Network Working Group                                          J. Lennox
Internet-Draft                                                     Vidyo
Intended status: Informational                                  K. Gross
Expires: January 21, 2016                                            AVA
                                                           S. Nandakumar
                                                            G. Salgueiro
                                                           Cisco Systems
                                                          B. Burman, Ed.
                                                                Ericsson
                                                           July 20, 2015

 A Taxonomy of Semantics and Mechanisms for Real-Time Transport Protocol
                              (RTP) Sources
               draft-ietf-avtext-rtp-grouping-taxonomy-08

Abstract

   The terminology about, and associations among, Real-Time Transport
   Protocol (RTP) sources can be complex and somewhat opaque.  This
   document describes a number of existing and proposed properties and
   relationships among RTP sources, and defines common terminology for
   discussing protocol entities and their relationships.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 21, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Concepts
     2.1.  Media Chain
       2.1.1.  Physical Stimulus
       2.1.2.  Media Capture
       2.1.3.  Raw Stream
       2.1.4.  Media Source
       2.1.5.  Source Stream
       2.1.6.  Media Encoder
       2.1.7.  Encoded Stream
       2.1.8.  Dependent Stream
       2.1.9.  Media Packetizer
       2.1.10. RTP Stream
       2.1.11. RTP-based Redundancy
       2.1.12. Redundancy RTP Stream
       2.1.13. RTP-based Security
       2.1.14. Secured RTP Stream
       2.1.15. Media Transport
       2.1.16. Media Transport Sender
       2.1.17. Sent RTP Stream
       2.1.18. Network Transport
       2.1.19. Transported RTP Stream
       2.1.20. Media Transport Receiver
       2.1.21. Received Secured RTP Stream
       2.1.22. RTP-based Validation
       2.1.23. Received RTP Stream
       2.1.24. Received Redundancy RTP Stream
       2.1.25. RTP-based Repair
       2.1.26. Repaired RTP Stream
       2.1.27. Media Depacketizer
       2.1.28. Received Encoded Stream
       2.1.29. Media Decoder
       2.1.30. Received Source Stream
       2.1.31. Media Sink
       2.1.32. Received Raw Stream
       2.1.33. Media Render
     2.2.  Communication Entities
       2.2.1.  Endpoint
       2.2.2.  RTP Session
       2.2.3.  Participant
       2.2.4.  Multimedia Session
       2.2.5.  Communication Session
   3.  Concepts of Inter-Relations
     3.1.  Synchronization Context
       3.1.1.  RTCP CNAME
       3.1.2.  Clock Source Signaling
       3.1.3.  Implicitly via RtcMediaStream
       3.1.4.  Explicitly via SDP Mechanisms
     3.2.  Endpoint
     3.3.  Participant
     3.4.  RtcMediaStream
     3.5.  Multi-Channel Audio
     3.6.  Simulcast
     3.7.  Layered Multi-Stream
     3.8.  RTP Stream Duplication
     3.9.  Redundancy Format
     3.10. RTP Retransmission
     3.11. Forward Error Correction
     3.12. RTP Stream Separation
     3.13. Multiple RTP Sessions over one Media Transport
   4.  Mapping from Existing Terms
     4.1.  Telepresence Terms
       4.1.1.  Audio Capture
       4.1.2.  Capture Device
       4.1.3.  Capture Encoding
       4.1.4.  Capture Scene
       4.1.5.  Endpoint
       4.1.6.  Individual Encoding
       4.1.7.  Media Capture
       4.1.8.  Media Consumer
       4.1.9.  Media Provider
       4.1.10. Stream
       4.1.11. Video Capture
     4.2.  Media Description
     4.3.  Media Stream
     4.4.  Multimedia Conference
     4.5.  Multimedia Session
     4.6.  Multipoint Control Unit (MCU)
     4.7.  Multi-Session Transmission (MST)
     4.8.  Recording Device
     4.9.  RtcMediaStream
     4.10. RtcMediaStreamTrack
     4.11. RTP Sender
     4.12. RTP Session
     4.13. Single Session Transmission (SST)
     4.14. SSRC
   5.  Security Considerations
   6.  Acknowledgement
   7.  Contributors
   8.  IANA Considerations
   9.  Informative References
   Appendix A.  Changes From Earlier Versions
     A.1.  Modifications Between WG Version -07 and -08
     A.2.  Modifications Between WG Version -06 and -07
     A.3.  Modifications Between WG Version -05 and -06
     A.4.  Modifications Between WG Version -04 and -05
     A.5.  Modifications Between WG Version -03 and -04
     A.6.  Modifications Between WG Version -02 and -03
     A.7.  Modifications Between WG Version -01 and -02
     A.8.  Modifications Between WG Version -00 and -01
     A.9.  Modifications Between Version -02 and -03
     A.10. Modifications Between Version -01 and -02
     A.11. Modifications Between Version -00 and -01
   Authors' Addresses

1.  Introduction

   The existing taxonomy of sources in the Real-Time Transport Protocol
   (RTP) [RFC3550] has previously been regarded as confusing and
   inconsistent.  Consequently, a deep understanding of how the
   different terms relate to each other becomes a real challenge.
   Frequently cited examples of this confusion are (1) how different
   protocols that make use of RTP use the same terms to signify
   different things and (2) how the complexities addressed at one layer
   are often glossed over or ignored at another.

   This document improves clarity by reviewing the semantics of various
   aspects of sources in RTP.  As an organizing mechanism, it approaches
   this by describing various ways that RTP sources are transformed on
   their way between sender and receiver, and how they can be grouped
   and associated together.

   All non-specific references to ControLling mUltiple streams for
   tElepresence (CLUE) in this document map to [I-D.ietf-clue-framework]
   and all references to Web Real-Time Communications (WebRTC) map to
   [I-D.ietf-rtcweb-overview].

2.  Concepts

   This section defines concepts that serve to identify and name various
   transformations and streams in a given RTP usage.  For each concept,
   alternate definitions and usages that co-exist today are listed along
   with various characteristics that further describe the concept.
   These concepts are divided into two categories, one related to the
   chain of streams and transformations that media can be subject to,
   the other for entities involved in the communication.

2.1.  Media Chain

   In the context of this document, Media is a sequence of synthetic or
   Physical Stimuli (Section 2.1.1) (sound waves, photons, key-strokes),
   represented in digital form.  Synthesized Media is typically
   generated directly in the digital domain.

   This section contains the concepts that can be involved in taking
   Media at a sender side and transporting it to a receiver, which may
   recover a sequence of physical stimuli.  This chain of concepts is of
   two main types, streams and transformations.  Streams are time-based
   sequences of samples of the physical stimulus in various
   representations, while transformations change the representation of
   the streams in some way.

   The examples below are basic ones, and it is important to keep in
   mind that this conceptual model enables more complex usages.  Some
   will be further discussed in later sections of this document.  In
   general, the following applies to this model:

   o  A transformation may have zero or more inputs and one or more
      outputs.

   o  A stream is of some type, such as audio, video, real-time text,
      etc.

   o  A stream has one source transformation and one or more sink
      transformations (with the exception of Physical Stimulus
      (Section 2.1.1), which may lack a source or sink transformation).

   o  Streams can be forwarded from a transformation output to any
      number of inputs on other transformations that support that type.

   o  If the output of a transformation is sent to multiple
      transformations, those streams will be identical; it takes a
      transformation to make them different.

   o  There are no formal limitations on how streams are connected to
      transformations.

   It is also important to remember that this is a conceptual model.
   Thus, real-world implementations may look different and have a
   different structure.

   To provide a basic understanding of the relationships in the chain,
   we first introduce the concepts for the sender side (Figure 1).  This
   covers the path from physical stimuli until media packets are emitted
   onto the network.

         Physical Stimulus
                 |
                 V
      +----------------------+
      |    Media Capture     |
      +----------------------+
                 |
            Raw Stream
                 V
      +----------------------+
      |     Media Source     |<- Synchronization Timing
      +----------------------+
                 |
           Source Stream
                 V
      +----------------------+
      |    Media Encoder     |
      +----------------------+
                 |
           Encoded Stream      +------------+
                 V             |            V
      +----------------------+ | +----------------------+
      |   Media Packetizer   | | | RTP-based Redundancy |
      +----------------------+ | +----------------------+
                 |             |            |
                 +-------------+  Redundancy RTP Stream
         Source RTP Stream                  |
                 V                          V
      +----------------------+   +----------------------+
      |  RTP-based Security  |   |  RTP-based Security  |
      +----------------------+   +----------------------+
                 |                          |
        Secured RTP Stream    Secured Redundancy RTP Stream
                 V                          V
      +----------------------+   +----------------------+
      |   Media Transport    |   |   Media Transport    |
      +----------------------+   +----------------------+

          Figure 1: Sender Side Concepts in the Media Chain

   In Figure 1 we have included a branched chain to cover the concepts
   for using redundancy to improve the reliability of the transport.

   The Media Transport concept is an aggregate that is decomposed in
   Section 2.1.15.

   In Figure 2 we review a receiver media chain matching the sender
   side, to look at the inverse transformations and their attempts to
   recover streams identical to those in the sender chain, subject to
   what may be lossy compression and imperfect Media Transport.  Note
   that the streams out of a reverse transformation, like the Source
   Stream out of the Media Decoder, are in many cases not the same as
   the corresponding ones on the sender side; thus, they are prefixed
   with "Received" to denote a potentially modified version.  They
   differ because some transformations are irreversible.  For example,
   lossy source coding in the Media Encoder prevents the Source Stream
   out of the Media Decoder from being the same as the one fed into the
   Media Encoder.  Other reasons include packet loss or late loss in the
   Media Transport transformation that even RTP-based Repair, if used,
   fails to repair.  However, some transformations are not always
   present, like RTP-based Repair that cannot operate without
   Redundancy RTP Streams.
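
   The effect of an irreversible transformation can be illustrated with
   a minimal sketch.  It is not part of the taxonomy; it assumes a toy
   Media Encoder that quantizes integer samples with a hypothetical step
   size (STEP), and a Media Decoder that applies the inverse mapping.
   The function names are chosen only to mirror the concept names.

```python
from typing import List

STEP = 10  # hypothetical quantization step; a larger step loses more detail


def media_encoder(source_stream: List[int], step: int = STEP) -> List[int]:
    """Toy lossy encoder: keep only the nearest quantization level."""
    return [round(sample / step) for sample in source_stream]


def media_decoder(encoded_stream: List[int], step: int = STEP) -> List[int]:
    """Inverse transformation: map each level back to a sample value."""
    return [level * step for level in encoded_stream]


source_stream = [3, 14, 27, 41]
received_source_stream = media_decoder(media_encoder(source_stream))

# The decoder output is close to, but not identical to, the encoder input,
# which is why the receiver-side stream carries the "Received" prefix.
print(source_stream, received_source_stream)
```

   In this sketch the information discarded by the encoder cannot be
   restored by the decoder, so the Received Source Stream is only an
   approximation of the Source Stream.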

      +----------------------+   +----------------------+
      |   Media Transport    |   |   Media Transport    |
      +----------------------+   +----------------------+
        Received |                 Received | Secured
        Secured RTP Stream        Redundancy RTP Stream
                 V                          V
      +----------------------+   +----------------------+
      | RTP-based Validation |   | RTP-based Validation |
      +----------------------+   +----------------------+
                 |                          |
        Received RTP Stream  Received Redundancy RTP Stream
                 |                          |
                 |     +--------------------+
                 V     V
      +----------------------+
      |   RTP-based Repair   |
      +----------------------+
                 |
        Repaired RTP Stream
                 V
      +----------------------+
      |  Media Depacketizer  |
      +----------------------+
                 |
      Received Encoded Stream
                 V
      +----------------------+
      |    Media Decoder     |
      +----------------------+
                 |
       Received Source Stream
                 V
      +----------------------+
      |      Media Sink      |--> Synchronization Information
      +----------------------+
                 |
        Received Raw Stream
                 V
      +----------------------+
      |    Media Renderer    |
      +----------------------+
                 |
                 V
         Physical Stimulus

         Figure 2: Receiver Side Concepts of the Media Chain

2.1.1.  Physical Stimulus

   The Physical Stimulus is a physical event in the analog domain that
   can be sampled and converted to digital form by an appropriate sensor
   or transducer.  This includes sound waves making up audio, photons in
   a light field, or other excitations or interactions with sensors,
   like keystrokes on a keyboard.

2.1.2.  Media Capture

   Media Capture is the process of transforming the analog Physical
   Stimulus (Section 2.1.1) into digital Media using an appropriate
   sensor or transducer.  The Media Capture performs a digital sampling
   of the physical stimulus, usually periodically, and outputs this in
   some representation as a Raw Stream (Section 2.1.3).
   This data is considered "Media", because it includes data that is
   periodically sampled, or made up of a set of timed asynchronous
   events.  The Media Capture is normally instantiated in some type of
   device, i.e., a media capture device.  Examples of different types of
   media capturing devices are digital cameras, microphones connected to
   A/D converters, or keyboards.

   Characteristics:

   o  A Media Capture is identified either by hardware/manufacturer ID
      or via a session-scoped device identifier as mandated by the
      application usage.

   o  A Media Capture can generate an Encoded Stream (Section 2.1.7) if
      the capture device supports such a configuration.

   o  The nature of the Media Capture may impose constraints on the
      clock handling in some of the subsequent steps.  For example, many
      audio or video capture devices are not completely free in
      selecting the sample rate.

2.1.3.  Raw Stream

   A Raw Stream is the time progressing stream of digitally sampled
   information, usually periodically sampled and provided by a Media
   Capture (Section 2.1.2).  A Raw Stream can also contain synthesized
   Media that may not require any explicit Media Capture, since it is
   already in an appropriate digital form.

2.1.4.  Media Source

   A Media Source is the logical source of a time progressing digital
   media stream synchronized to a reference clock.  This stream is
   called a Source Stream (Section 2.1.5).  This transformation takes
   one or more Raw Streams (Section 2.1.3) and provides a Source Stream
   as output.  The output is synchronized with a reference clock
   (Section 3.1), which can be as simple as a system local wall clock or
   as complex as an NTP synchronized clock.

   The output can be of different types.  One type is directly
   associated with a particular Media Capture's Raw Stream.  Others are
   more conceptual sources, like an audio mix of multiple Source Streams
   (Figure 3).
   Mixing multiple streams typically requires that the input streams can
   be related in time, meaning that they have to be Source Streams
   (Section 2.1.5) rather than Raw Streams.  In Figure 3, the generated
   Source Stream is a mix of the three input Source Streams.

        Source    Source    Source
        Stream    Stream    Stream
          |         |         |
          V         V         V
      +--------------------------+
      |       Media Source       |<-- Reference Clock
      |          Mixer           |
      +--------------------------+
                    |
                    V
              Source Stream

      Figure 3: Conceptual Media Source in form of Audio Mixer

   Another possible example of a conceptual Media Source is a video
   surveillance switch, where the input is multiple Source Streams from
   different cameras, and the output is one of those Source Streams
   based on some selection criteria, such as round-robin or some video
   activity measure.

2.1.5.  Source Stream

   A Source Stream is a stream of digital samples that has been
   synchronized with a reference clock and comes from a particular Media
   Source (Section 2.1.4).

2.1.6.  Media Encoder

   A Media Encoder is a transform that is responsible for encoding the
   media data from a Source Stream (Section 2.1.5) into another
   representation, usually more compact, that is output as an Encoded
   Stream (Section 2.1.7).

   The Media Encoder step commonly includes pre-encoding
   transformations, such as scaling, resampling, etc.  The Media Encoder
   can have a significant number of configuration options that affect
   the properties of the Encoded Stream.  These include properties such
   as codec, bit-rate, start points for decoding, resolution, bandwidth
   or other fidelity affecting properties.

   Scalable Media Encoders need special attention as they produce
   multiple outputs that are potentially of different types.
   As shown in Figure 4, a scalable Media Encoder takes one input Source
   Stream and encodes it into multiple output streams of two different
   types: at least one Encoded Stream that is independently decodable
   and one or more Dependent Streams (Section 2.1.8).  Decoding requires
   at least one Encoded Stream and zero or more Dependent Streams.  A
   Dependent Stream's dependency is one of the grouping relations this
   document discusses further in Section 3.7.

               Source Stream
                     |
                     V
      +--------------------------+
      |  Scalable Media Encoder  |
      +--------------------------+
         |         |   ...    |
         V         V          V
      Encoded  Dependent  Dependent
      Stream    Stream     Stream

       Figure 4: Scalable Media Encoder Input and Outputs

   There are also other variants of encoders, like so-called Multiple
   Description Coding (MDC).  Such Media Encoders produce multiple
   independent and thus individually decodable Encoded Streams.
   However, (logically) combining several of these Encoded Streams into
   a single Received Source Stream during decoding improves the
   perceived reproduction quality compared to decoding a single Encoded
   Stream.

   Creating multiple Encoded Streams from the same Source Stream, where
   the Encoded Streams are neither in a scalable nor in an MDC
   relationship, is commonly utilized in Simulcast
   [I-D.ietf-mmusic-sdp-simulcast] environments.

2.1.7.  Encoded Stream

   A stream of time synchronized encoded media that can be independently
   decoded.

   Due to temporal dependencies, an Encoded Stream may have limitations
   in where decoding can be started.  These entry points, for example
   Intra frames from a video encoder, may require identification and
   their generation may be event based or configured to occur
   periodically.

2.1.8.  Dependent Stream

   A stream of time synchronized encoded media fragments that depend on
   one or more Encoded Streams (Section 2.1.7) and zero or more
   Dependent Streams in order to be decodable.

   Each Dependent Stream has a set of dependencies.  These dependencies
   must be understood by the parties in a Multimedia Session that intend
   to use a Dependent Stream.

2.1.9.  Media Packetizer

   A Media Packetizer is the transformation that takes one or more
   Encoded (Section 2.1.7) or Dependent Streams (Section 2.1.8), puts
   their content into one or more sequences of packets, normally RTP
   packets, and outputs Source RTP Streams (Section 2.1.10).  This step
   includes generating both RTP payloads and RTP packets.  The Media
   Packetizer then selects which Synchronization source(s) (SSRC)
   [RFC3550] and RTP Sessions to use.

   The Media Packetizer can combine multiple Encoded or Dependent
   Streams into one or more RTP Streams:

   o  The Media Packetizer can use multiple inputs when producing a
      single RTP Stream.  One such example is SRST packetization when
      using Scalable Video Coding (SVC) (Section 3.7).

   o  The Media Packetizer can also produce multiple RTP Streams, for
      example when Encoded and/or Dependent Streams are distributed over
      multiple RTP Streams.  One example of this is MRMT packetization
      when using SVC (Section 3.7).

2.1.10.  RTP Stream

   An RTP Stream is a stream of RTP packets containing media data,
   source or redundant.  The RTP Stream is identified by an SSRC
   belonging to a particular RTP Session.  The RTP Session is identified
   as discussed in Section 2.2.2.

   A Source RTP Stream is an RTP Stream directly related to an Encoded
   Stream (Section 2.1.7), targeted for transport over RTP without any
   additional RTP-based Redundancy (Section 2.1.11) applied.
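
   As a rough illustration of identifying an RTP Stream by its SSRC
   within an RTP Session, the sketch below reads the 12-byte fixed RTP
   header defined in [RFC3550] and builds a lookup key.  The names
   RtpHeader, parse_rtp_header, stream_key, and the session_id label are
   hypothetical and used only for this example; how an RTP Session is
   actually identified is discussed in Section 2.2.2.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed RTP header (first 12 bytes, network byte order)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    # byte0 holds V(2) P(1) X(1) CC(4); byte1 holds M(1) PT(7).
    return RtpHeader(byte0 >> 6, byte1 & 0x7F, seq, timestamp, ssrc)


def stream_key(session_id: str, packet: bytes) -> Tuple[str, int]:
    """Key an RTP Stream by (hypothetical session label, SSRC)."""
    return (session_id, parse_rtp_header(packet).ssrc)


# Example packet: version 2, payload type 96, seq 1000, timestamp 160.
example = struct.pack("!BBHII", 0x80, 96, 1000, 160, 0x12345678) + b"media"
print(stream_key("session-a", example))
```

   Keying on the pair rather than on the SSRC alone reflects that an
   SSRC is only unique within a specific RTP Session context.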

   Characteristics:

   o  Each RTP Stream is identified by a Synchronization source (SSRC)
      [RFC3550] that is carried in every RTP and RTP Control Protocol
      (RTCP) packet header.  The SSRC is unique in a specific RTP
      Session context.

   o  At any given point in time, an RTP Stream can have one and only
      one SSRC, but SSRCs for a given RTP Stream can change over time.
      SSRC collision and clock rate change [RFC7160] are examples of
      valid reasons to change SSRC for an RTP Stream.  In those cases,
      the RTP Stream itself is not changed in any significant way, only
      the identifying SSRC number.

   o  Each SSRC defines a unique RTP sequence numbering and timing
      space.

   o  Several RTP Streams, each with their own SSRC, may represent a
      single Media Source.

   o  Several RTP Streams, each with their own SSRC, can be carried in a
      single RTP Session.

2.1.11.  RTP-based Redundancy

   RTP-based Redundancy is defined here as a transformation that
   generates redundant or repair packets sent out as a Redundancy RTP
   Stream (Section 2.1.12) to mitigate network transport impairments,
   like packet loss and delay.  Note that this excludes the type of
   redundancy that most suitable Media Encoders (Section 2.1.6) may add
   to the media format of the Encoded Stream (Section 2.1.7) that makes
   it cope better with inevitable RTP packet losses.

   The RTP-based Redundancy exists in many flavors; they may generate
   independent Repair Streams that are used in addition to the Source
   Stream (like RTP Retransmission (Section 3.10) and some special types
   of Forward Error Correction, like RTP stream duplication
   (Section 3.8)), they may generate a new Source Stream by combining
   redundancy information with source information (using XOR FEC
   (Section 3.11) as a redundancy payload (Section 3.9)), or completely
   replace the source information with only redundancy packets.

2.1.12.  Redundancy RTP Stream

   A Redundancy RTP Stream is an RTP Stream (Section 2.1.10) that
   contains no original source data, only redundant data, which may
   either be used standalone or be combined with one or more Received
   RTP Streams (Section 2.1.23) to produce Repaired RTP Streams
   (Section 2.1.26).

2.1.13.  RTP-based Security

   The optional RTP-based Security transformation applies security
   services such as authentication, integrity protection and
   confidentiality to an input RTP Stream, like what is specified in The
   Secure Real-time Transport Protocol (SRTP) [RFC3711], producing a
   Secured RTP Stream (Section 2.1.14).  Either an RTP Stream
   (Section 2.1.10) or a Redundancy RTP Stream (Section 2.1.12) can be
   used as input to this transformation.

   In SRTP and the related Secure RTCP (SRTCP), all of the above
   mentioned security services are optional, except for integrity
   protection of SRTCP, which is mandatory.  Also, confidentiality
   (encryption) is effectively optional in SRTP, since it is possible to
   use a NULL encryption algorithm.  As described in [RFC7201], the
   strength of SRTP data origin authentication depends on the
   cryptographic transform and key management used, for example in group
   communication where it is sometimes possible to authenticate group
   membership but not the actual RTP Stream sender.

   RTP-based Security and RTP-based Redundancy can be combined in a few
   different ways.  One way is depicted in Figure 1, where an RTP Stream
   and its corresponding Redundancy RTP Stream are protected by separate
   RTP-based Security transforms.  In other cases, like when a Media
   Translator is adding FEC in Section 3.2.1.3 of
   [I-D.ietf-avtcore-rtp-topologies-update], a middlebox can apply RTP-
   based Redundancy to an already Secured RTP Stream instead of a Source
   RTP Stream.  One example of that is depicted in Figure 5 below.

         Source RTP Stream     +------------+
                 V             |            V
      +----------------------+ | +----------------------+
      |  RTP-based Security  | | | RTP-based Redundancy |
      +----------------------+ | +----------------------+
                 |             |            |
                 |             |  Redundancy RTP Stream
                 +-------------+            |
                 |                          V
                 |               +----------------------+
        Secured RTP Stream       |  RTP-based Security  |
                 |               +----------------------+
                 |                          |
                 |            Secured Redundancy RTP Stream
                 V                          V
      +----------------------+   +----------------------+
      |   Media Transport    |   |   Media Transport    |
      +----------------------+   +----------------------+

         Figure 5: Adding Redundancy to a Secured RTP Stream

   In this case, the Redundancy RTP Stream may already have been secured
   for confidentiality (encrypted) by the first RTP-based Security, and
   it may therefore not be necessary to apply additional confidentiality
   protection in the second RTP-based Security.  To avoid attacks and
   negative impact on RTP-based Repair (Section 2.1.25) and the
   resulting Repaired RTP Stream (Section 2.1.26), it is however still
   necessary to have this second RTP-based Security apply both
   authentication and integrity protection to the Redundancy RTP Stream.

2.1.14.  Secured RTP Stream

   A Secured RTP Stream is a Source or Redundancy RTP Stream that is
   protected through RTP-based Security (Section 2.1.13) by one or more
   of the confidentiality, integrity, or authentication security
   services.

2.1.15.  Media Transport

   A Media Transport defines the transformation that the RTP Streams
   (Section 2.1.10) are subjected to by the end-to-end transport from
   one RTP sender to one specific RTP receiver (an RTP Session
   (Section 2.2.2) may contain multiple RTP receivers per sender).
Each 672 Media Transport is defined by a transport association that is 673 normally identified by a 5-tuple (source address, source port, 674 destination address, destination port, transport protocol), but a 675 proposal exists for sending multiple transport associations on a 676 single 5-tuple [I-D.westerlund-avtcore-transport-multiplexing]. 678 Characteristics: 680 o A Media Transport transmits RTP Streams of RTP Packets from a source 681 transport address to a destination transport address. 683 o Each Media Transport contains only a single RTP Session. 685 o A single RTP Session can span multiple Media Transports. 687 The Media Transport concept sometimes needs to be decomposed into 688 more steps to enable discussion of what a sender emits that gets 689 transformed by the network before it is received by the receiver. 690 Thus, we also provide this Media Transport decomposition (Figure 6). 692 RTP Stream 693 | 694 V 695 +--------------------------+ 696 | Media Transport Sender | 697 +--------------------------+ 698 | 699 Sent RTP Stream 700 V 701 +--------------------------+ 702 | Network Transport | 703 +--------------------------+ 704 | 705 Transported RTP Stream 706 V 707 +--------------------------+ 708 | Media Transport Receiver | 709 +--------------------------+ 710 | 711 V 712 Received RTP Stream 714 Figure 6: Decomposition of Media Transport 716 2.1.16. Media Transport Sender 718 The first transformation within the Media Transport (Section 2.1.15) 719 is the Media Transport Sender. The sending Endpoint (Section 2.2.1) 720 takes an RTP Stream and emits the packets onto the network using the 721 transport association established for this Media Transport, thereby 722 creating a Sent RTP Stream (Section 2.1.17). In the process, it 723 transforms the RTP Stream in several ways. First, it generates the 724 necessary protocol headers for the transport association, for example 725 IP and UDP headers, thus forming IP/UDP/RTP packets.
In addition, 726 the Media Transport Sender may queue, intentionally pace or otherwise 727 affect how the packets are emitted onto the network, thereby 728 potentially introducing delay and delay variations [RFC5481] that 729 characterize the Sent RTP Stream. 731 2.1.17. Sent RTP Stream 733 The Sent RTP Stream is the RTP Stream as it enters the first hop of 734 the network path to its destination. The Sent RTP Stream is 735 identified using network transport addresses, like for IP/UDP the 736 5-tuple (source IP address, source port, destination IP address, 737 destination port, and protocol (UDP)). 739 2.1.18. Network Transport 741 Network Transport is the transformation that subjects the Sent RTP 742 Stream (Section 2.1.17) to traveling from the source to the 743 destination through the network. This transformation can result in 744 loss of some packets, delay and delay variation on a per packet 745 basis, packet duplication, and packet header or data corruption. 746 This transformation produces a Transported RTP Stream 747 (Section 2.1.19) at the exit of the network path. 749 2.1.19. Transported RTP Stream 751 The Transported RTP Stream is the RTP Stream that is emitted out of 752 the network path at the destination, subjected to the Network 753 Transport's transformation (Section 2.1.18). 755 2.1.20. Media Transport Receiver 757 The Media Transport Receiver is the receiver Endpoint's 758 (Section 2.2.1) transformation of the Transported RTP Stream 759 (Section 2.1.19) by its reception process, which results in the 760 Received RTP Stream (Section 2.1.23). This transformation includes 761 transport checksums being verified. Sensible system designs 762 typically either discard packets with mismatching checksums, or pass 763 them on while somehow marking them in the resulting Received RTP 764 Stream so as to alert subsequent transformations about the possible 765 corrupt state.
In this context it is worth noting that there is 766 typically some probability for corrupt packets to pass through 767 undetected (with a seemingly correct checksum). Other 768 transformations can compensate for delay variations in receiving a 769 packet on the network interface and providing it to the application 770 (de-jitter buffer). 772 2.1.21. Received Secured RTP Stream 774 This is the Secured RTP Stream (Section 2.1.14) resulting from the 775 Media Transport (Section 2.1.15) aggregate transformation. 777 2.1.22. RTP-based Validation 779 RTP-based Validation is the reverse transformation of RTP-based 780 Security (Section 2.1.13). If this transformation fails, the result 781 is either not usable and must be discarded, or may be usable but 782 cannot be trusted. If the transformation succeeds, the result can be 783 a Received RTP Stream (Section 2.1.23) or a Received Redundancy RTP 784 Stream (Section 2.1.24), depending on what was input to the 785 corresponding RTP-based Security transformation, but can also be a 786 Received Secured RTP Stream (Section 2.1.21) in case several RTP- 787 based Security transformations were applied. 789 2.1.23. Received RTP Stream 791 The Received RTP Stream is the RTP Stream (Section 2.1.10) resulting 792 from the Media Transport's aggregate transformation (Section 2.1.15), 793 i.e. subjected to packet loss, packet corruption, packet duplication, 794 delay, and delay variation from sender to receiver. 796 2.1.24. Received Redundancy RTP Stream 798 The Received Redundancy RTP Stream is the Redundancy RTP Stream 799 (Section 2.1.12) resulting from the Media Transport transformation, 800 i.e. subjected to packet loss, packet corruption, delay, and delay 801 variation from sender to receiver. 803 2.1.25. 
RTP-based Repair 805 RTP-based Repair is a transformation that takes as input zero or more 806 Received RTP Streams (Section 2.1.23) and one or more Received 807 Redundancy RTP Streams (Section 2.1.24), and produces one or more 808 Repaired RTP Streams (Section 2.1.26) that are as close to the 809 corresponding sent Source RTP Streams (Section 2.1.10) as possible, 810 using different RTP-based repair methods, for example the ones 811 referred to in RTP-based Redundancy (Section 2.1.11). 813 2.1.26. Repaired RTP Stream 815 A Repaired RTP Stream is a Received RTP Stream (Section 2.1.23) for 816 which Received Redundancy RTP Stream (Section 2.1.24) information has 817 been used to try to recover the Source RTP Stream (Section 2.1.10) as 818 it was before Media Transport (Section 2.1.15). 820 2.1.27. Media Depacketizer 822 A Media Depacketizer takes one or more RTP Streams (Section 2.1.10), 823 depacketizes them, and attempts to reconstitute the Encoded Streams 824 (Section 2.1.7) or Dependent Streams (Section 2.1.8) present in those 825 RTP Streams. 827 In practical implementations, the Media Depacketizer and the Media 828 Decoder may be tightly coupled and share information to improve or 829 optimize the overall decoding and error concealment process. It is, 830 however, not expected that there would be any benefit in defining a 831 taxonomy for those detailed (and likely very implementation- 832 dependent) steps. 834 2.1.28. Received Encoded Stream 836 The Received Encoded Stream is the received version of an Encoded 837 Stream (Section 2.1.7). 839 2.1.29. Media Decoder 841 A Media Decoder is a transformation that is responsible for decoding 842 Encoded Streams (Section 2.1.7) and any Dependent Streams 843 (Section 2.1.8) into a Source Stream (Section 2.1.5). 845 In practical implementations, the Media Decoder and the Media 846 Depacketizer may be tightly coupled and share information to improve 847 or optimize the overall decoding process in various ways.
It is, 848 however, not expected that there would be any benefit in defining a 849 taxonomy for those detailed (and likely very implementation- 850 dependent) steps. 852 A Media Decoder has to deal with any errors in the Encoded Streams 853 that resulted from corruption or failure to repair packet losses. 854 Therefore, it is commonly robust to errors and losses, and includes 855 concealment methods. 857 2.1.30. Received Source Stream 859 The Received Source Stream is the received version of a Source Stream 860 (Section 2.1.5). 862 2.1.31. Media Sink 864 The Media Sink receives a Source Stream (Section 2.1.5) that 865 contains, usually periodically, sampled media data together with 866 associated synchronization information. Depending on the application, 867 this Source Stream then needs to be transformed into a Raw Stream 868 (Section 2.1.3) that is conveyed to the Media Render 869 (Section 2.1.33), synchronized with the output from other Media 870 Sinks. The Media Sink may also be connected with a Media Source 871 (Section 2.1.4) and be used as part of a conceptual Media Source. 873 The Media Sink can further transform the Source Stream into a 874 representation that is suitable for rendering on the Media Render as 875 defined by the application or system-wide configuration. This 876 includes sample scaling, level adjustments, etc. 878 2.1.32. Received Raw Stream 880 The Received Raw Stream is the received version of a Raw Stream 881 (Section 2.1.3). 883 2.1.33. Media Render 885 A Media Render takes a Raw Stream (Section 2.1.3) and converts it 886 into Physical Stimulus (Section 2.1.1) that a human user can 887 perceive. Examples of such devices are screens, and D/A converters 888 connected to amplifiers and loudspeakers. 890 An Endpoint can potentially have multiple Media Renders for each 891 media type. 893 2.2. Communication Entities 895 This section contains concepts for entities involved in the 896 communication.
898 +------------------------------------------------------------+ 899 | Communication Session | 900 | | 901 | +----------------+ +----------------+ | 902 | | Participant A | +------------+ | Participant B | | 903 | | | | Multimedia | | | | 904 | | +------------+ |<==>| Session |<==>| +------------+ | | 905 | | | Endpoint A | | | | | | Endpoint B | | | 906 | | | | | +------------+ | | | | | 907 | | | +----------+-+----------------------+-+----------+ | | | 908 | | | | RTP | | | | | | | | 909 | | | | Session |-+---Media Transport----+>| | | | | 910 | | | | Audio |<+---Media Transport----+-| | | | | 911 | | | | | | ^ | | | | | | 912 | | | +----------+-+----------|-----------+-+----------+ | | | 913 | | | | | v | | | | | 914 | | | | | +-----------------+ | | | | | 915 | | | | | | Synchronization | | | | | | 916 | | | | | | Context | | | | | | 917 | | | | | +-----------------+ | | | | | 918 | | | | | ^ | | | | | 919 | | | +----------+-+----------|-----------+-+----------+ | | | 920 | | | | RTP | | v | | | | | | 921 | | | | Session |<+---Media Transport----+-| | | | | 922 | | | | Video |-+---Media Transport----+>| | | | | 923 | | | | | | | | | | | | 924 | | | +----------+-+----------------------+-+----------+ | | | 925 | | +------------+ | | +------------+ | | 926 | +----------------+ +----------------+ | 927 +------------------------------------------------------------+ 929 Figure 7: Example Point to Point Communication Session with two RTP 930 Sessions 932 Figure 7 shows a high-level example representation of a very basic 933 point-to-point Communication Session between Participants A and B. 934 It uses two different audio and video RTP Sessions between A's and 935 B's Endpoints, where each RTP Session is a group communications 936 channel that can potentially carry a number of RTP Streams. It is 937 using separate Media Transports for those RTP Sessions. 
The 938 Multimedia Session shared by the Participants can, for example, be 939 established using SIP (i.e., there is a SIP Dialog between A and B). 940 The terms used in Figure 7 are further elaborated in the sub-sections 941 below. 943 2.2.1. Endpoint 945 An Endpoint is a single addressable entity sending or receiving RTP 946 packets. It may be decomposed into several functional blocks, but as 947 long as it behaves as a single RTP stack entity it is classified as a 948 single "Endpoint". 950 Characteristics: 952 o Endpoints can be identified in several different ways. While RTCP 953 Canonical Names (CNAMEs) [RFC3550] provide a globally unique and 954 stable identification mechanism for the duration of the 955 Communication Session (see Section 2.2.5), their validity applies 956 exclusively within a Synchronization Context (Section 3.1). Thus 957 one Endpoint can handle multiple CNAMEs, each of which can be 958 shared among a set of Endpoints belonging to the same Participant 959 (Section 2.2.3). Therefore, mechanisms outside the scope of RTP, 960 such as application defined mechanisms, must be used to provide 961 Endpoint identification when outside this Synchronization Context. 963 o An Endpoint can be associated with at most one Participant 964 (Section 2.2.3) at any single point in time. 966 o In some contexts, an Endpoint would typically correspond to a 967 single "host", for example a computer using a single network 968 interface and being used by a single human user. In other 969 contexts, a single "host" can serve multiple Participants, in 970 which case each Participant's Endpoint may share properties, for 971 example the IP address part of a transport address. 973 2.2.2. RTP Session 975 An RTP Session is an association among a group of Participants 976 communicating with RTP. It is a group communications channel which 977 can potentially carry a number of RTP Streams. 
Within an RTP 978 Session, every Participant can find meta-data and control information 979 (over RTCP) about all the RTP Streams in the RTP Session. The 980 bandwidth of the RTCP control channel is shared between all 981 Participants within an RTP Session. 983 Characteristics: 985 o An RTP Session can carry one or more RTP Streams. 987 o An RTP Session shares a single SSRC space as defined in RFC3550 988 [RFC3550]. That is, the Endpoints participating in an RTP Session 989 can see an SSRC identifier transmitted by any of the other 990 Endpoints. An Endpoint can receive an SSRC either as SSRC or as a 991 Contributing source (CSRC) in RTP and RTCP packets, as defined by 992 the Endpoints' network interconnection topology. 994 o An RTP Session uses at least two Media Transports 995 (Section 2.1.15), one for sending and one for receiving. 996 Commonly, the receiving Media Transport is the reverse direction 997 of the Media Transport used for sending. An RTP Session may use 998 many Media Transports and these define the session's network 999 interconnection topology. 1001 o A single Media Transport always carries a single RTP Session. 1003 o Multiple RTP Sessions can be conceptually related, for example 1004 originating from or targeted for the same Participant 1005 (Section 2.2.3) or Endpoint (Section 2.2.1), or by containing RTP 1006 Streams that are somehow related (Section 3). 1008 2.2.3. Participant 1010 A Participant is an entity reachable by a single signaling address, 1011 and is thus related more to the signaling context than to the media 1012 context. 1014 Characteristics: 1016 o A single signaling-addressable entity, using an application- 1017 specific signaling address space, for example a SIP URI. 1019 o A Participant can participate in several Multimedia Sessions 1020 (Section 2.2.4). 1022 o A Participant can be comprised of several associated Endpoints 1023 (Section 2.2.1). 1025 2.2.4.
Multimedia Session 1027 A Multimedia Session is an association among a group of Participants 1028 (Section 2.2.3) engaged in the communication via one or more RTP 1029 Sessions (Section 2.2.2). It defines logical relationships among 1030 Media Sources (Section 2.1.4) that appear in multiple RTP Sessions. 1032 Characteristics: 1034 o A Multimedia Session can be composed of several RTP Sessions with 1035 potentially multiple RTP Streams per RTP Session. 1037 o Each Participant in a Multimedia Session can have a multitude of 1038 Media Captures and Media Rendering devices. 1040 o A single Multimedia Session can contain media from one or more 1041 Synchronization Contexts (Section 3.1). An example of that is a 1042 Multimedia Session containing one set of audio and video for 1043 communication purposes belonging to one Synchronization Context, 1044 and another set of audio and video for presentation purposes (like 1045 playing a video file) with a separate Synchronization Context that 1046 has no strong timing relationship and need not be strictly 1047 synchronized with the audio and video used for communication. 1049 2.2.5. Communication Session 1051 A Communication Session is an association among two or more 1052 Participants (Section 2.2.3) communicating with each other via one or 1053 more Multimedia Sessions (Section 2.2.4). 1055 Characteristics: 1057 o Each Participant in a Communication Session is identified via an 1058 application-specific signaling address. 1060 o A Communication Session is composed of Participants that share at 1061 least one Multimedia Session, involving one or more parallel RTP 1062 Sessions with potentially multiple RTP Streams per RTP Session. 1064 For example, in a full mesh communication, the Communication Session 1065 consists of a set of separate Multimedia Sessions between each pair 1066 of Participants. 
Another example is a centralized conference, where 1067 the Communication Session consists of a set of Multimedia Sessions 1068 between each Participant and the conference handler. 1070 3. Concepts of Inter-Relations 1072 This section uses the concepts from previous sections, and looks at 1073 different types of relationships among them. These relationships 1074 occur at different abstraction levels and for different purposes, but 1075 the reason for the needed relationship at a certain step in the media 1076 handling chain may exist at another step. For example, the use of 1077 Simulcast (Section 3.6) implies a need to determine relations at RTP 1078 Stream level, but the underlying reason is that multiple Media 1079 Encoders use the same Media Source, i.e. to be able to identify a 1080 common Media Source. 1082 3.1. Synchronization Context 1084 A Synchronization Context defines a requirement on a strong timing 1085 relationship between the Media Sources, typically requiring alignment 1086 of clock sources. Such a relationship can be identified in multiple 1087 ways as listed below. A single Media Source can only belong to a 1088 single Synchronization Context, since it is assumed that a single 1089 Media Source can only have a single media clock and requiring 1090 alignment to several Synchronization Contexts (and thus reference 1091 clocks) will effectively merge those into a single Synchronization 1092 Context. 1094 3.1.1. RTCP CNAME 1096 RFC3550 [RFC3550] describes Inter-media synchronization between RTP 1097 Sessions based on RTCP CNAME, RTP and Network Time Protocol (NTP) 1098 [RFC5905] formatted timestamps of a reference clock. As indicated in 1099 [RFC7273], despite using NTP format timestamps, it is not required 1100 that the clock be synchronized to an NTP source. 1102 3.1.2.
Clock Source Signaling 1104 [RFC7273] provides a mechanism to signal the clock source in Session 1105 Description Protocol (SDP) [RFC4566] for both the reference clock 1106 and the media clock, thus allowing a Synchronization Context to 1107 be defined beyond the one defined by the usage of CNAME source 1108 descriptions. 1110 3.1.3. Implicitly via RtcMediaStream 1112 WebRTC defines "RtcMediaStream" with one or more 1113 "RtcMediaStreamTracks". All tracks in a "RtcMediaStream" are 1114 intended to be synchronized when rendered, implying that they must be 1115 generated such that synchronization is possible. 1117 3.1.4. Explicitly via SDP Mechanisms 1119 The SDP Grouping Framework [RFC5888] defines an m= line (Section 4.2) 1120 grouping mechanism called "Lip Synchronization" (with LS 1121 identification-tag) for establishing the synchronization requirement 1122 across m= lines when they map to individual sources. 1124 Source-Specific Media Attributes in SDP [RFC5576] extends the above 1125 mechanism when multiple Media Sources are described by a single m= 1126 line. 1128 3.2. Endpoint 1130 Some applications require knowledge of what Media Sources originate 1131 from a particular Endpoint (Section 2.2.1). This can include such 1132 decisions as packet routing between parts of the topology, knowing 1133 the Endpoint origin of the RTP Streams. 1135 In RTP, this identification has been overloaded with the 1136 Synchronization Context (Section 3.1) through the usage of the RTCP 1137 source description CNAME (Section 3.1.1). This works for some 1138 usages, but in others it breaks down. For example, if an Endpoint 1139 has two sets of Media Sources that have different Synchronization 1140 Contexts, like the audio and video of the human Participant as well 1141 as a set of Media Sources of audio and video for a shared movie, 1142 CNAME would not be an appropriate identification for that Endpoint. 1143 Therefore, an Endpoint may have multiple CNAMEs.
The CNAMEs or the 1144 Media Sources themselves can be related to the Endpoint. 1146 3.3. Participant 1148 In communication scenarios, it is commonly necessary to know which Media 1149 Sources originate from which Participant (Section 2.2.3). One reason 1150 is, for example, to enable the application to display Participant 1151 Identity information correctly associated with the Media Sources. 1152 This association is handled through the signaling solution to point 1153 at a specific Multimedia Session where the Media Sources may be 1154 explicitly or implicitly tied to a particular Endpoint. 1156 Participant information becomes more problematic due to Media Sources 1157 that are generated through mixing or other conceptual processing of 1158 Raw Streams or Source Streams that originate from different 1159 Participants. This type of Media Source can thus have a dynamically 1160 varying set of origins and Participants. RTP contains the concept of 1161 CSRC, which carries information about the previous step origin of the 1162 included media content on RTP level. 1164 3.4. RtcMediaStream 1166 An RtcMediaStream in WebRTC is an explicit grouping of a set of Media 1167 Sources (RtcMediaStreamTracks) that share a common identifier and a 1168 single Synchronization Context (Section 3.1). 1170 3.5. Multi-Channel Audio 1172 There exist a number of RTP payload formats that can carry multi- 1173 channel audio, despite the codec being a single-channel (mono) 1174 encoder. Multi-channel audio can be viewed as multiple Media Sources 1175 sharing a common Synchronization Context. These are independently 1176 encoded by a Media Encoder and the different Encoded Streams are 1177 packetized together in a time synchronized way into a single Source 1178 RTP Stream, using the codec's RTP Payload format. Examples of 1179 codecs that support multi-channel audio are PCMA and PCMU [RFC3551], 1180 AMR [RFC4867], and G.719 [RFC5404]. 1182 3.6.
Simulcast 1184 A Media Source represented as multiple independent Encoded Streams 1185 constitutes a Simulcast [I-D.ietf-mmusic-sdp-simulcast] or Multiple Description Coding (MDC) of 1186 that Media Source. Figure 8 shows an example of a Media Source that 1187 is encoded into three separate Simulcast streams, which are in turn 1188 sent on the same Media Transport flow. When using Simulcast, the RTP 1189 Streams may share an RTP Session and Media Transport, or be 1190 separated on different RTP Sessions and Media Transports, or any 1191 combination of these two. One major reason to use separate Media 1192 Transports is to make use of different Quality of Service for the 1193 different Source RTP Streams. Some considerations on separating 1194 related RTP Streams are discussed in Section 3.12. 1196 +----------------+ 1197 | Media Source | 1198 +----------------+ 1199 Source Stream | 1200 +----------------------+----------------------+ 1201 | | | 1202 V V V 1203 +------------------+ +------------------+ +------------------+ 1204 | Media Encoder | | Media Encoder | | Media Encoder | 1205 +------------------+ +------------------+ +------------------+ 1206 | Encoded | Encoded | Encoded 1207 | Stream | Stream | Stream 1208 V V V 1209 +------------------+ +------------------+ +------------------+ 1210 | Media Packetizer | | Media Packetizer | | Media Packetizer | 1211 +------------------+ +------------------+ +------------------+ 1212 | Source | Source | Source 1213 | RTP | RTP | RTP 1214 | Stream | Stream | Stream 1215 +-----------------+ | +-----------------+ 1216 | | | 1217 V V V 1218 +-------------------+ 1219 | Media Transport | 1220 +-------------------+ 1222 Figure 8: Example of Media Source Simulcast 1224 The Simulcast relation between the RTP Streams is the common Media 1225 Source.
In addition, to be able to identify the common Media Source, 1226 a receiver of the RTP Stream may need to know which configuration or 1227 encoding goals lay behind the produced Encoded Stream and its 1228 properties. This enables selection of the stream that is most useful 1229 in the application at that moment. 1231 3.7. Layered Multi-Stream 1233 Layered Multi-Stream (LMS) is a mechanism by which different portions 1234 of a layered or scalable encoding of a Source Stream are sent using 1235 separate RTP Streams (sometimes in separate RTP Sessions). LMSs are 1236 useful for receiver control of layered media. 1238 A Media Source represented as an Encoded Stream and multiple 1239 Dependent Streams constitutes a Media Source that has layered 1240 dependencies. Figure 9 represents an example of a Media Source that 1241 is encoded into three dependent layers, where two layers are sent on 1242 the same Media Transport using different RTP Streams, i.e. SSRCs, and 1243 the third layer is sent on a separate Media Transport.
1245 +----------------+ 1246 | Media Source | 1247 +----------------+ 1248 | 1249 | 1250 V 1251 +---------------------------------------------------------+ 1252 | Media Encoder | 1253 +---------------------------------------------------------+ 1254 | | | 1255 Encoded Stream Dependent Stream Dependent Stream 1256 | | | 1257 V V V 1258 +----------------+ +----------------+ +----------------+ 1259 |Media Packetizer| |Media Packetizer| |Media Packetizer| 1260 +----------------+ +----------------+ +----------------+ 1261 | | | 1262 RTP Stream RTP Stream RTP Stream 1263 | | | 1264 +------+ +------+ | 1265 | | | 1266 V V V 1267 +-----------------+ +-----------------+ 1268 | Media Transport | | Media Transport | 1269 +-----------------+ +-----------------+ 1271 Figure 9: Example of Media Source Layered Dependency 1273 It is sometimes useful to make a distinction between using a single 1274 Media Transport or multiple separate Media Transports when (in both 1275 cases) using multiple RTP Streams to carry Encoded Streams and 1276 Dependent Streams for a Media Source. Therefore, the following new 1277 terminology is defined here: 1279 SRST: Single RTP Stream on a Single Media Transport 1281 MRST: Multiple RTP Streams on a Single Media Transport 1283 MRMT: Multiple RTP Streams on Multiple Media Transports 1285 MRST and MRMT relations need to identify the common Media Encoder 1286 origin for the Encoded and Dependent Streams. When using different 1287 RTP Sessions (MRMT), a single RTP Stream per Media Encoder, and a 1288 single Media Source in each RTP Session, common SSRC and CNAMEs can 1289 be used to identify the common Media Source. When multiple RTP 1290 Streams are sent from one Media Encoder in the same RTP Session 1291 (MRST), then CNAME is the only currently specified RTP identifier 1292 that can be used.
In cases where multiple Media Encoders use 1293 multiple Media Sources sharing Synchronization Context, and thus 1294 having a common CNAME, additional heuristics or identification need 1295 to be applied to create the MRST or MRMT relationships between the 1296 RTP Streams. 1298 3.8. RTP Stream Duplication 1300 RTP Stream Duplication [RFC7198], using the same or different Media 1301 Transports, and optionally also delaying the duplicate [RFC7197], 1302 offers a simple way to protect media flows from packet loss in some 1303 cases (see Figure 10). This is a specific type of redundancy. All 1304 but one Source RTP Stream (Section 2.1.10) are effectively Redundancy 1305 RTP Streams (Section 2.1.12), but since both Source and Redundancy RTP 1306 Streams are the same, it does not matter which one is which. This 1307 can also be seen as a specific type of Simulcast (Section 3.6) that 1308 transmits the same Encoded Stream (Section 2.1.7) multiple times. 1310 +----------------+ 1311 | Media Source | 1312 +----------------+ 1313 Source Stream | 1314 V 1315 +----------------+ 1316 | Media Encoder | 1317 +----------------+ 1318 Encoded Stream | 1319 +-----------+-----------+ 1320 | | 1321 V V 1322 +------------------+ +------------------+ 1323 | Media Packetizer | | Media Packetizer | 1324 +------------------+ +------------------+ 1325 Source | RTP Stream Source | RTP Stream 1326 | V 1327 | +-------------+ 1328 | | Delay (opt) | 1329 | +-------------+ 1330 | | 1331 +-----------+-----------+ 1332 | 1333 V 1334 +-------------------+ 1335 | Media Transport | 1336 +-------------------+ 1338 Figure 10: Example of RTP Stream Duplication 1340 3.9. Redundancy Format 1342 The RTP Payload for Redundant Audio Data [RFC2198] defines a 1343 transport for redundant audio data together with primary data in the 1344 same RTP payload.
The redundant data can be a time delayed version 1345 of the primary or another time delayed Encoded Stream using a 1346 different Media Encoder to encode the same Media Source as the 1347 primary, as depicted in Figure 11. 1349 +--------------------+ 1350 | Media Source | 1351 +--------------------+ 1352 | 1353 Source Stream 1354 | 1355 +------------------------+ 1356 | | 1357 V V 1358 +--------------------+ +--------------------+ 1359 | Media Encoder | | Media Encoder | 1360 +--------------------+ +--------------------+ 1361 | | 1362 | +------------+ 1363 Encoded Stream | Time Delay | 1364 | +------------+ 1365 | | 1366 | +------------------+ 1367 V V 1368 +--------------------+ 1369 | Media Packetizer | 1370 +--------------------+ 1371 | 1372 V 1373 RTP Stream 1375 Figure 11: Concept for usage of Audio Redundancy with different Media 1376 Encoders 1378 The Redundancy format is thus providing the necessary meta 1379 information to correctly relate different parts of the same Encoded 1380 Stream. The case depicted above (Figure 11) relates the Received 1381 Source Stream fragments coming out of different Media Decoders, to be 1382 able to combine them together into a less erroneous Source Stream. 1384 3.10. RTP Retransmission 1386 Figure 12 shows an example where a Media Source's Source RTP Stream 1387 is protected by a retransmission (RTX) flow [RFC4588]. In this 1388 example the Source RTP Stream and the Redundancy RTP Stream share the 1389 same Media Transport. 
1391 +--------------------+ 1392 | Media Source | 1393 +--------------------+ 1394 | 1395 V 1396 +--------------------+ 1397 | Media Encoder | 1398 +--------------------+ 1399 | Retransmission 1400 Encoded Stream +--------+ +---- Request 1401 V | V V 1402 +--------------------+ | +--------------------+ 1403 | Media Packetizer | | | RTP Retransmission | 1404 +--------------------+ | +--------------------+ 1405 | | | 1406 +------------+ Redundancy RTP Stream 1407 Source RTP Stream | 1408 | | 1409 +---------+ +---------+ 1410 | | 1411 V V 1412 +-----------------+ 1413 | Media Transport | 1414 +-----------------+ 1416 Figure 12: Example of Media Source Retransmission Flows 1418 The RTP Retransmission example (Figure 12) illustrates that this 1419 mechanism works purely on the Source RTP Stream. The RTP 1420 Retransmission transform buffers the sent Source RTP Stream and, upon 1421 request, emits a retransmitted packet with an extra payload header as 1422 a Redundancy RTP Stream. The RTP Retransmission mechanism [RFC4588] 1423 is specified such that there is a one to one relation between the 1424 Source RTP Stream and the Redundancy RTP Stream. Therefore, a 1425 Redundancy RTP Stream needs to be associated with its Source RTP 1426 Stream. This is done based on CNAME selectors and heuristics to 1427 match requested packets for a given Source RTP Stream with the 1428 original sequence number in the payload of any new Redundancy RTP 1429 Stream using the RTX payload format. In cases where the Redundancy 1430 RTP Stream is sent in a different RTP Session than the Source RTP 1431 Stream, the RTP Session relation is signaled by using the SDP Media 1432 Grouping's [RFC5888] Flow Identification (FID identification-tag) 1433 semantics. 1435 3.11. Forward Error Correction 1437 Figure 13 shows an example where two Media Sources' Source RTP 1438 Streams are protected by Forward Error Correction (FEC). 
   Source RTP Stream A has an RTP-based Redundancy transformation in FEC
   Encoder 1.  This produces a Redundancy RTP Stream 1 that is related
   only to Source RTP Stream A.  FEC Encoder 2, however, takes two
   Source RTP Streams (A and B) and produces a Redundancy RTP Stream 2
   that protects them jointly, i.e., Redundancy RTP Stream 2 relates to
   two Source RTP Streams (a FEC group).  FEC decoding, when needed due
   to packet loss or packet corruption at the receiver, requires
   knowledge of which Source RTP Streams the FEC encoding was based on.

   In Figure 13, all RTP Streams are sent on the same Media Transport.
   This is, however, not the only possible choice.  Numerous
   combinations exist for spreading these RTP Streams over different
   Media Transports to achieve the communication application's goal.

      +--------------------+                +--------------------+
      |   Media Source A   |                |   Media Source B   |
      +--------------------+                +--------------------+
                |                                     |
                V                                     V
      +--------------------+                +--------------------+
      |  Media Encoder A   |                |  Media Encoder B   |
      +--------------------+                +--------------------+
                |                                     |
          Encoded Stream                        Encoded Stream
                V                                     V
      +--------------------+                +--------------------+
      | Media Packetizer A |                | Media Packetizer B |
      +--------------------+                +--------------------+
                |                                     |
       Source RTP Stream A                   Source RTP Stream B
                |                                     |
          +-----+---------+-------------+         +---+---+
          |               V             V         V       |
          |       +---------------+  +---------------+    |
          |       | FEC Encoder 1 |  | FEC Encoder 2 |    |
          |       +---------------+  +---------------+    |
          |  Redundancy   |     Redundancy   |            |
          |  RTP Stream 1 |     RTP Stream 2 |            |
          V               V                  V            V
      +----------------------------------------------------------+
      |                    Media Transport                       |
      +----------------------------------------------------------+

             Figure 13: Example of FEC Redundancy RTP Streams

   As FEC Encoding exists in various forms, the methods for
   relating FEC Redundancy RTP Streams with their source information in
   Source RTP Streams are many.  The XOR-based RTP FEC Payload format
   [RFC5109] is defined in such a way that a Redundancy RTP Stream has a
   one-to-one relation with a Source RTP Stream.  In fact, the RFC
   requires the Redundancy RTP Stream to use the same SSRC as the Source
   RTP Stream.  This requires the use of either a separate RTP Session
   or the Redundancy RTP Payload format [RFC2198].  The underlying
   relation requirement for this FEC format and a particular Redundancy
   RTP Stream is to know the related Source RTP Stream, including its
   SSRC.

3.12.  RTP Stream Separation

   RTP Streams can be separated exclusively based on their SSRCs, at the
   RTP Session level, or at the Multimedia Session level.

   When the RTP Streams that have a relationship are all sent in the
   same RTP Session and are uniquely identified based on their SSRC
   only, it is termed an SSRC-Only Based Separation.  Such streams can
   be related via RTCP CNAME to identify that the streams belong to the
   same Endpoint.  SSRC-based approaches [RFC5576], when used, can
   explicitly relate various such RTP Streams.

   On the other hand, when RTP Streams that are related are sent in the
   context of different RTP Sessions to achieve separation, it is known
   as RTP Session-based separation.  This is commonly used when the
   different RTP Streams are intended for different Media Transports.

   Several mechanisms that use RTP Session-based separation rely on it
   to enable an implicit grouping mechanism expressing the relationship.
   The solutions have been based on using the same SSRC value in the
   different RTP Sessions to implicitly indicate their relation.  That
   way, no explicit RTP level mechanism has been needed; only signaling
   level relations have been established using semantics from the
   Grouping of Media lines framework [RFC5888].
   Examples of this are RTP Retransmission [RFC4588], SVC Multi-Session
   Transmission [RFC6190], and XOR-based FEC [RFC5109].  RTCP CNAME
   explicitly relates RTP Streams across different RTP Sessions, as
   explained in the previous section.  Such a relationship can be used
   to perform inter-media synchronization.

   RTP Streams that are related and need to be associated can be part of
   different Multimedia Sessions, rather than just different RTP
   Sessions within the same Multimedia Session context.  This puts
   further demand on the scope of the mechanism(s) and its handling of
   identifiers used for expressing the relationships.

3.13.  Multiple RTP Sessions over one Media Transport

   [I-D.westerlund-avtcore-transport-multiplexing] describes a mechanism
   that allows several RTP Sessions to be carried over a single
   underlying Media Transport.  The main reasons for doing this are
   related to the impact of using one or more Media Transports (using a
   common network path or potentially having different ones).  The fewer
   Media Transports used, the less need for NAT/FW traversal resources
   and smaller number of flow-based Quality of Service (QoS).

   However, Multiple RTP Sessions over one Media Transport imply that a
   single Media Transport 5-tuple is not sufficient to express in which
   RTP Session context a particular RTP Stream exists.  Complexities in
   the relationship between Media Transports and RTP Sessions already
   exist, as one RTP Session contains multiple Media Transports; for
   example, even a Peer-to-Peer RTP Session with RTP/RTCP Multiplexing
   requires two Media Transports, one in each direction.  The
   relationship between Media Transports and RTP Sessions, as well as
   additional levels of identifiers, needs to be considered both in
   signaling design and when defining terminology.

4.  Mapping from Existing Terms

   This section describes a selected set of terms from some relevant
   IETF RFCs and Internet-Drafts (at the time of writing), using the
   concepts from previous sections.

4.1.  Telepresence Terms

   The terms in this sub-section are used in the context of CLUE
   [I-D.ietf-clue-framework].  Note that some terms listed in this
   sub-section use the same names as terms defined elsewhere in this
   document.  Unless explicitly marked as "RTP Taxonomy", such terms in
   this sub-section are to be read as references to the CLUE-specific
   term within this sub-section.

4.1.1.  Audio Capture

   Defined in CLUE as a Media Capture (Section 4.1.7) for audio.
   Describes an audio Media Source (Section 2.1.4).

4.1.2.  Capture Device

   Defined in CLUE as a device that converts physical input into an
   electrical signal.  Identifies a physical entity performing an RTP
   Taxonomy Media Capture (Section 2.1.2) transformation.

4.1.3.  Capture Encoding

   Defined in CLUE as a specific encoding (Section 4.1.6) of a Media
   Capture (Section 4.1.7).  Describes an Encoded Stream (Section 2.1.7)
   related to CLUE-specific semantic information.

4.1.4.  Capture Scene

   Defined in CLUE as a structure representing a spatial region captured
   by one or more Capture Devices (Section 4.1.2), each capturing media
   representing a portion of the region.  Describes a set of spatially
   related Media Sources (Section 2.1.4).

4.1.5.  Endpoint

   Defined in CLUE as a CLUE-capable device which is the logical point
   of final termination through receiving, decoding and rendering and/or
   initiation through capturing, encoding, and sending of media streams
   (Section 4.1.10).  CLUE further defines it to consist of one or more
   physical devices with source and sink media streams, and exactly one
   [RFC4353] Participant.
   Describes exactly one Participant (Section 2.2.3) and one or more RTP
   Taxonomy Endpoints (Section 2.2.1).

4.1.6.  Individual Encoding

   Defined in CLUE as a set of parameters representing a way to encode a
   Media Capture (Section 4.1.7) to become a Capture Encoding
   (Section 4.1.3).  Describes the configuration information needed to
   perform a Media Encoder (Section 2.1.6) transformation.

4.1.7.  Media Capture

   Defined in CLUE as a source of media, such as from one or more
   Capture Devices (Section 4.1.2) or constructed from other media
   streams (Section 4.1.10).  Describes either an RTP Taxonomy Media
   Capture (Section 2.1.2) or a Media Source (Section 2.1.4), depending
   on the context in which the term is used.

4.1.8.  Media Consumer

   Defined in CLUE as a CLUE-capable device that intends to receive
   Capture Encodings (Section 4.1.3).  Describes the media receiving
   part of an RTP Taxonomy Endpoint (Section 2.2.1).

4.1.9.  Media Provider

   Defined in CLUE as a CLUE-capable device that intends to send Capture
   Encodings (Section 4.1.3).  Describes the media sending part of an
   RTP Taxonomy Endpoint (Section 2.2.1).

4.1.10.  Stream

   Defined in CLUE as a Capture Encoding (Section 4.1.3) sent from a
   Media Provider (Section 4.1.9) to a Media Consumer (Section 4.1.8)
   via RTP.  Describes an RTP Stream (Section 2.1.10).

4.1.11.  Video Capture

   Defined in CLUE as a Media Capture (Section 4.1.7) for video.
   Describes a video Media Source (Section 2.1.4).

4.2.  Media Description

   A single Session Description Protocol (SDP) [RFC4566] media
   description (or media block; an m-line and all subsequent lines until
   the next m-line or the end of the SDP) describes part of the
   necessary configuration and identification information needed for a
   Media Encoder transformation, as well as the necessary configuration
   and identification information for the Media Decoder to be able to
   correctly interpret a received RTP Stream.

   A Media Description typically relates to a single Media Source.  This
   is, for example, an explicit restriction in WebRTC.  However, nothing
   prevents the same Media Description (and same RTP Session) from being
   reused for multiple Media Sources
   [I-D.ietf-avtcore-rtp-multi-stream].  It can thus describe properties
   of one or more RTP Streams, and can also describe properties valid
   for an entire RTP Session (via [RFC5576] mechanisms, for example).

4.3.  Media Stream

   RTP [RFC3550] uses the terms media stream, audio stream, video
   stream, and stream of (RTP) packets interchangeably; all of these are
   RTP Streams.

4.4.  Multimedia Conference

   A Multimedia Conference is a Communication Session (Section 2.2.5)
   between two or more Participants (Section 2.2.3), along with the
   software they are using to communicate.

4.5.  Multimedia Session

   SDP [RFC4566] defines a Multimedia Session as a set of multimedia
   senders and receivers and the data streams flowing from senders to
   receivers, which would correspond to a set of Endpoints and the RTP
   Streams that flow between them.  In this document, Multimedia Session
   (Section 2.2.4) also assumes those Endpoints belong to a set of
   Participants that are engaged in communication via a set of related
   RTP Streams.

   RTP [RFC3550] defines a Multimedia Session as a set of concurrent RTP
   Sessions among a common group of Participants.
   For example, a video conference may contain an audio RTP Session and
   a video RTP Session.  This would correspond to a group of
   Participants (each using one or more Endpoints) sharing a set of
   concurrent RTP Sessions.  In this document, Multimedia Session also
   defines those RTP Sessions to have some relation and be part of a
   communication among the Participants.

4.6.  Multipoint Control Unit (MCU)

   This term is commonly used to describe the central node in any type
   of star topology [I-D.ietf-avtcore-rtp-topologies-update] conference.
   It describes a device that includes one Participant (Section 2.2.3)
   (usually corresponding to a so-called conference focus) and one or
   more related Endpoints (Section 2.2.1) (sometimes one or more per
   conference Participant).

4.7.  Multi-Session Transmission (MST)

   One of two transmission modes defined in H.264-based SVC [RFC6190],
   the other mode being SST (Section 4.13).  In Multi-Session
   Transmission (MST), the SVC Media Encoder sends Encoded Streams and
   Dependent Streams distributed across two or more RTP Streams in one
   or more RTP Sessions.  The term "MST" is ambiguous in RFC 6190,
   especially since the name indicates the use of multiple "sessions",
   while MST type packetization is in fact required whenever two or more
   RTP Streams are used for the Encoded and Dependent Streams,
   regardless of whether those are sent in one or more RTP Sessions.
   MST corresponds to either the MRST or MRMT (Section 3.7) stream
   relations defined in this document.  The SVC RTP Payload RFC
   [RFC6190] is not particularly explicit about how the common Media
   Encoder (Section 2.1.6) relation between Encoded Streams
   (Section 2.1.7) and Dependent Streams (Section 2.1.8) is to be
   implemented.

4.8.  Recording Device

   WebRTC specifications use this term to refer to locally available
   entities performing a Media Capture (Section 2.1.2) transformation.

4.9.  RtcMediaStream

   A WebRTC RtcMediaStream is a set of Media Sources (Section 2.1.4)
   sharing the same Synchronization Context (Section 3.1).

4.10.  RtcMediaStreamTrack

   A WebRTC RtcMediaStreamTrack is a Media Source (Section 2.1.4).

4.11.  RTP Sender

   RTP [RFC3550] uses this term, which can be seen as the RTP protocol
   part of a Media Packetizer (Section 2.1.9).

4.12.  RTP Session

   Within the context of SDP, a single m= line can map to a single RTP
   Session (Section 2.2.2) or multiple m= lines can map to a single RTP
   Session.  The latter is enabled via multiplexing schemes such as
   BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation], for example, which
   allows mapping of multiple m= lines to a single RTP Session.

4.13.  Single Session Transmission (SST)

   One of two transmission modes defined in H.264-based SVC [RFC6190],
   the other mode being MST (Section 4.7).  In Single Session
   Transmission (SST), the SVC Media Encoder sends Encoded Streams
   (Section 2.1.7) and Dependent Streams (Section 2.1.8) combined into a
   single RTP Stream (Section 2.1.10) in a single RTP Session
   (Section 2.2.2), using the SVC RTP Payload format.  The term "SST" is
   ambiguous in RFC 6190, in that it sometimes refers to the use of a
   single RTP Stream, like in sections relating to packetization, and
   sometimes appears to refer to use of a single RTP Session, like in
   the context of discussing SDP.  SST closely corresponds to SRST
   (Section 3.7) defined in this document.

4.14.  SSRC

   RTP [RFC3550] defines this as "the source of a stream of RTP
   packets", which indicates that an SSRC is not only a unique
   identifier for the Encoded Stream (Section 2.1.7) carried in those
   packets, but is also effectively used as a term to denote a Media
   Packetizer (Section 2.1.9).  In [RFC3550], it is stated that "a
   synchronization source may change its data format, e.g., audio
   encoding, over time".  The related Encoded Stream data format in an
   RTP Stream (Section 2.1.10) is identified by the RTP Payload Type.
   Changing the data format for an Encoded Stream effectively also
   changes which Media Encoder (Section 2.1.6) is used for the Encoded
   Stream.  No ambiguity is introduced to SSRC as Encoded Stream
   identifier by allowing RTP Payload Type changes, as long as only a
   single RTP Payload Type is valid for any given RTP Time Stamp.  This
   is aligned with and further described by Section 5.2 of [RFC3550].

5.  Security Considerations

   The purpose of this document is to make clarifications and reduce the
   confusion prevalent in RTP taxonomy because of inconsistent usage by
   multiple technologies and protocols making use of the RTP protocol.
   It does not introduce any new security considerations beyond those
   already well documented in the RTP protocol [RFC3550] and each of the
   many respective specifications of the various protocols making use of
   it.

   Having a well-defined common terminology and understanding of the
   complexities of the RTP architecture will help lead us to better
   standards, avoiding security problems.

6.  Acknowledgement

   This document has many concepts borrowed from several documents such
   as WebRTC [I-D.ietf-rtcweb-overview], CLUE [I-D.ietf-clue-framework],
   and Multiplexing Architecture
   [I-D.westerlund-avtcore-transport-multiplexing].  The authors would
   like to thank all the authors of each of those documents.

   The authors would also like to acknowledge the insights, guidance and
   contributions of Magnus Westerlund, Roni Even, Paul Kyzivat, Colin
   Perkins, Keith Drage, Harald Alvestrand, Alex Eleftheriadis, Mo
   Zanaty, Stephan Wenger, and Bernard Aboba.

7.  Contributors

   Magnus Westerlund has contributed the concept model for the media
   chain using transformations and streams model, including rewriting
   pre-existing concepts into this model and adding missing concepts.
   The first proposal for updating the relationships and the topologies
   based on this concept was also performed by Magnus.

8.  IANA Considerations

   This document makes no request of IANA.

9.  Informative References

   [I-D.ietf-avtcore-rtp-multi-stream]
              Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
              "Sending Multiple Media Streams in a Single RTP Session",
              draft-ietf-avtcore-rtp-multi-stream-08 (work in progress),
              July 2015.

   [I-D.ietf-avtcore-rtp-topologies-update]
              Westerlund, M. and S. Wenger, "RTP Topologies",
              draft-ietf-avtcore-rtp-topologies-update-10 (work in
              progress), July 2015.

   [I-D.ietf-clue-framework]
              Duckworth, M., Pepperell, A., and S. Wenger, "Framework
              for Telepresence Multi-Streams",
              draft-ietf-clue-framework-22 (work in progress), April
              2015.

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings,
              "Negotiating Media Multiplexing Using the Session
              Description Protocol (SDP)",
              draft-ietf-mmusic-sdp-bundle-negotiation-23 (work in
              progress), July 2015.

   [I-D.ietf-mmusic-sdp-simulcast]
              Burman, B., Westerlund, M., Nandakumar, S., and M. Zanaty,
              "Using Simulcast in SDP and RTP Sessions",
              draft-ietf-mmusic-sdp-simulcast-00 (work in progress),
              January 2015.

   [I-D.ietf-rtcweb-overview]
              Alvestrand, H., "Overview: Real Time Protocols for
              Browser-based Applications", draft-ietf-rtcweb-overview-14
              (work in progress), June 2015.

   [I-D.westerlund-avtcore-transport-multiplexing]
              Westerlund, M. and C. Perkins, "Multiplexing Multiple RTP
              Sessions onto a Single Lower-Layer Transport",
              draft-westerlund-avtcore-transport-multiplexing-07 (work
              in progress), October 2013.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J., Vega-Garcia, A., and S.
              Fosse-Parisis, "RTP Payload for Redundant Audio Data",
              RFC 2198, DOI 10.17487/RFC2198, September 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              DOI 10.17487/RFC3551, July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, DOI 10.17487/RFC3711, March 2004.

   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
              Session Initiation Protocol (SIP)", RFC 4353,
              DOI 10.17487/RFC4353, February 2006.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
              July 2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              DOI 10.17487/RFC4588, July 2006.

   [RFC4867]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q.
              Xie, "RTP Payload Format and File Storage Format for the
              Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband
              (AMR-WB) Audio Codecs", RFC 4867, DOI 10.17487/RFC4867,
              April 2007.

   [RFC5109]  Li, A., Ed., "RTP Payload Format for Generic Forward Error
              Correction", RFC 5109, DOI 10.17487/RFC5109, December
              2007.

   [RFC5404]  Westerlund, M. and I. Johansson, "RTP Payload Format for
              G.719", RFC 5404, DOI 10.17487/RFC5404, January 2009.

   [RFC5481]  Morton, A. and B. Claise, "Packet Delay Variation
              Applicability Statement", RFC 5481, DOI 10.17487/RFC5481,
              March 2009.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888,
              DOI 10.17487/RFC5888, June 2010.

   [RFC5905]  Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
              "Network Time Protocol Version 4: Protocol and Algorithms
              Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              DOI 10.17487/RFC6190, May 2011.

   [RFC7160]  Petit-Huguenin, M. and G. Zorn, Ed., "Support for Multiple
              Clock Rates in an RTP Session", RFC 7160,
              DOI 10.17487/RFC7160, April 2014.

   [RFC7197]  Begen, A., Cai, Y., and H. Ou, "Duplication Delay
              Attribute in the Session Description Protocol", RFC 7197,
              DOI 10.17487/RFC7197, April 2014.

   [RFC7198]  Begen, A. and C. Perkins, "Duplicating RTP Streams",
              RFC 7198, DOI 10.17487/RFC7198, April 2014.

   [RFC7201]  Westerlund, M. and C. Perkins, "Options for Securing RTP
              Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014.

   [RFC7273]  Williams, A., Gross, K., van Brandenburg, R., and H.
              Stokking, "RTP Clock Source Signalling", RFC 7273,
              DOI 10.17487/RFC7273, June 2014.

Appendix A.  Changes From Earlier Versions

   NOTE TO RFC EDITOR: Please remove this section prior to publication.

A.1.  Modifications Between WG Version -07 and -08

   Addresses comments from IESG evaluation.

   o  Made text more firm around what improvements this document
      introduces.

   o  Clarified the distinction between analog and digital in sections
      2.1.1 and 2.1.2.

   o  Removed the explicit requirement that a Source RTP Stream must
      send at least some data from an Encoded Stream, replacing it with
      a statement that it is directly related to the Encoded Stream.

   o  Moved the clarification that RTP-based Redundancy excludes Media
      Encoder redundancy data in an Encoded Stream from Section 2.1.10
      (RTP Stream) to 2.1.11 (RTP-based Redundancy), since that
      statement applies to RTP-based Redundancy rather than to RTP
      Stream.

   o  Added clarification that a Media Transport Sender can
      intentionally pace packet transmission.

   o  Aligned text around delay variation to use this term throughout,
      and added a reference to RFC 5481.

   o  Added that RTP Session is a group communications channel that can
      potentially carry a number of RTP Streams, as an additional
      clarification below Figure 7.

   o  Added a clarification in Section 4.1 around Telepresence Terms on
      which references are to CLUE terms and which are to other sections
      of this document, for terms that have the same name in CLUE as in
      this document.

   o  Clarified in Section 4.14 what SSRC data format changes means,
      since the RFC 3550 SSRC definition mentions this possibility.

   o  Editorial improvements.

A.2.  Modifications Between WG Version -06 and -07

   Addresses comments from AD review and GenArt review.

   o  Added RTP-based Security and RTP-based Validation transform
      sections, as well as Secured RTP Stream and Received Secured RTP
      Stream sections.

   o  Improved wording in Abstract and Introduction sections.

   o  Clarified what is considered "media" in section 2.1.2 Media
      Capture.

   o  Changed a number of "Characteristics" lists to more suitable prose
      text.

   o  Re-worded text around use of Encoded and Dependent RTP Streams in
      section 2.1.9 Media Packetizer.

   o  Clarified description of Source RTP Stream in section 2.1.10.

   o  Clarified motivation to use separate Media Transports for
      Simulcast in section 3.6.

   o  Added local descriptions of terms imported from CLUE framework.

   o  Editorial improvements.

A.3.  Modifications Between WG Version -05 and -06

   o  Clarified that a Redundancy RTP Stream can be used standalone to
      generate Repaired RTP Streams.

   o  Clarified that (in accordance with above) RTP-based Repair takes
      zero or more Received RTP Streams and one or more Received
      Redundancy RTP Streams as input.

   o  Changed Figure 6 to more clearly show that Media Transport is
      terminated in the Endpoint, not in the Participant.

   o  Added a sentence to Endpoint section that clarifies there may be
      contexts where a single "host" can serve multiple Participants,
      making those Endpoints share some properties.

   o  Merged previous section 3.5 on SST/MST with previous section 3.8
      on Layered Multi-Stream into a common section discussing the
      scalable/layered stream relation, and moved improved, descriptive
      text on SST and MST to new sub-sections 4.7 and 4.13, describing
      them as existing terms.

   o  Editorial improvements.

A.4.  Modifications Between WG Version -04 and -05

   o  Editorial improvements.

A.5.  Modifications Between WG Version -03 and -04

   o  Changed "Media Redundancy" and "Media Repair" to "RTP-based
      Redundancy" and "RTP-based Repair", since those terms are more
      specific and correct.

   o  Changed "End Point" to "Endpoint" and removed Editor's Note on
      this.

   o  Clarified that a Media Capture may impose constraints on clock
      handling.

   o  Clarified that mixing multiple Raw Streams into a Source Stream is
      not possible, since that requires mixed streams to have a timing
      relation, requiring them to be Source Streams, and added an
      example.

   o  Clarified that RTP-based Redundancy excludes the type of encoding
      redundancy found within the encoded media format in an Encoded
      Stream.

   o  Clarified that a Media Transport contains only a single RTP
      Session, but a single RTP Session can span multiple Media
      Transports.

   o  Clarified that packets with seemingly correct checksum that are
      received by a Media Transport Receiver may still be corrupt.

   o  Clarified that a corrupt packet in a Media Transport Receiver is
      typically either discarded or somehow marked and passed on in the
      Received RTP Stream.

   o  Added Synchronization Context to Figure 6.

   o  Editorial improvements and clarifications.

A.6.  Modifications Between WG Version -02 and -03

   o  Changed section 3.5, removing SST-SS/MS and MST-SS/MS, replacing
      them with SRST, MRST, and MRMT.

   o  Updated section 3.8 to align with terminology changes in section
      3.5.

   o  Added a new section 4.12, describing the term Multimedia
      Conference.

   o  Changed reference from I-D to now published RFC 7273.

   o  Editorial improvements and clarifications.

A.7.  Modifications Between WG Version -01 and -02

   o  Major re-structure

   o  Moved media chain Media Transport detailing up one section level

   o  Collapsed level 2 sub-sections of section 3 and thus moved level 3
      sub-sections up one level, gathering some introductory text into
      the beginning of section 3

   o  Added that not only SSRC collision, but also a clock rate change
      [RFC7160] is a valid reason to change SSRC value for an RTP stream

   o  Added a sub-section on clock source signaling

   o  Added a sub-section on RTP stream duplication

   o  Elaborated a bit in section 2.2.1 on the relation between End
      Points, Participants and CNAMEs

   o  Elaborated a bit in section 2.2.4 on Multimedia Session and
      synchronization contexts

   o  Removed the section on CLUE scenes defining an implicit
      synchronization context, since it was incorrect

   o  Clarified text on SVC SST and MST according to list discussions

   o  Removed the entire topology section to avoid possible
      inconsistencies or duplications with
      draft-ietf-avtcore-rtp-topologies-update, but saved one example
      overview figure of Communication Entities into that section

   o  Added a section 4 on mapping from existing terms with one
      sub-section per term, mainly by moving text from sections 2 and 3

   o  Changed all occurrences of Packet Stream to RTP Stream

   o  Moved all normative references to informative, since this is an
      informative document

   o  Added references to RFC 7160, RFC 7197 and RFC 7198, and removed
      unused references

A.8.  Modifications Between WG Version -00 and -01

   o  WG version -00 text is identical to individual draft -03

   o  Amended description of SVC SST and MST encodings with respect to
      concepts defined in this text

   o  Removed UML as normative reference, since the text no longer uses
      any UML notation

   o  Removed a number of level 4 sections and moved out text to the
      level above

A.9.  Modifications Between Version -02 and -03

   o  Section 4 rewritten (and new communication topologies added) to
      reflect the major updates to Sections 1-3

   o  Section 8 removed (carryover from initial -00 draft)

   o  General clean up of text, grammar and nits

A.10.  Modifications Between Version -01 and -02

   o  Section 2 rewritten to add both streams and transformations in the
      media chain.

   o  Section 3 rewritten to focus on exposing relationships.

A.11.  Modifications Between Version -00 and -01

   o  Too many to list

   o  Added new authors

   o  Updated content organization and presentation

Authors' Addresses

   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ  07601
   US

   Email: jonathan@vidyo.com


   Kevin Gross
   AVA Networks, LLC
   Boulder, CO
   US

   Email: kevin.gross@avanw.com


   Suhas Nandakumar
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA  95134
   US

   Email: snandaku@cisco.com


   Gonzalo Salgueiro
   Cisco Systems
   7200-12 Kit Creek Road
   Research Triangle Park, NC  27709
   US

   Email: gsalguei@cisco.com


   Bo Burman (editor)
   Ericsson
   Kistavagen 25
   SE-16480 Stockholm
   Sweden

   Email: bo.burman@ericsson.com