Network Working Group                                          J. Lennox
Internet-Draft                                                     Vidyo
Intended status: Informational                                  K. Gross
Expires: December 25, 2015                                           AVA
                                                           S. Nandakumar
                                                            G. Salgueiro
                                                           Cisco Systems
                                                          B. Burman, Ed.
                                                                Ericsson
                                                           June 23, 2015

 A Taxonomy of Semantics and Mechanisms for Real-Time Transport Protocol
                             (RTP) Sources
               draft-ietf-avtext-rtp-grouping-taxonomy-07

Abstract

   The terminology about, and associations among, Real-Time Transport
   Protocol (RTP) sources can be complex and somewhat opaque.  This
   document describes a number of existing and proposed properties and
   relationships among RTP sources, and defines common terminology for
   discussing protocol entities and their relationships.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 25, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Concepts
     2.1.  Media Chain
       2.1.1.  Physical Stimulus
       2.1.2.  Media Capture
       2.1.3.  Raw Stream
       2.1.4.  Media Source
       2.1.5.  Source Stream
       2.1.6.  Media Encoder
       2.1.7.  Encoded Stream
       2.1.8.  Dependent Stream
       2.1.9.  Media Packetizer
       2.1.10. RTP Stream
       2.1.11. RTP-based Redundancy
       2.1.12. Redundancy RTP Stream
       2.1.13. RTP-based Security
       2.1.14. Secured RTP Stream
       2.1.15. Media Transport
       2.1.16. Media Transport Sender
       2.1.17. Sent RTP Stream
       2.1.18. Network Transport
       2.1.19. Transported RTP Stream
       2.1.20. Media Transport Receiver
       2.1.21. Received Secured RTP Stream
       2.1.22. RTP-based Validation
       2.1.23. Received RTP Stream
       2.1.24. Received Redundancy RTP Stream
       2.1.25. RTP-based Repair
       2.1.26. Repaired RTP Stream
       2.1.27. Media Depacketizer
       2.1.28. Received Encoded Stream
       2.1.29. Media Decoder
       2.1.30. Received Source Stream
       2.1.31. Media Sink
       2.1.32. Received Raw Stream
       2.1.33. Media Render
     2.2.  Communication Entities
       2.2.1.  Endpoint
       2.2.2.  RTP Session
       2.2.3.  Participant
       2.2.4.  Multimedia Session
       2.2.5.  Communication Session
   3.  Concepts of Inter-Relations
     3.1.  Synchronization Context
       3.1.1.  RTCP CNAME
       3.1.2.  Clock Source Signaling
       3.1.3.  Implicitly via RtcMediaStream
       3.1.4.  Explicitly via SDP Mechanisms
     3.2.  Endpoint
     3.3.  Participant
     3.4.  RtcMediaStream
     3.5.  Multi-Channel Audio
     3.6.  Simulcast
     3.7.  Layered Multi-Stream
     3.8.  RTP Stream Duplication
     3.9.  Redundancy Format
     3.10. RTP Retransmission
     3.11. Forward Error Correction
     3.12. RTP Stream Separation
     3.13. Multiple RTP Sessions over one Media Transport
   4.  Mapping from Existing Terms
     4.1.  Telepresence Terms
       4.1.1.  Audio Capture
       4.1.2.  Capture Device
       4.1.3.  Capture Encoding
       4.1.4.  Capture Scene
       4.1.5.  Endpoint
       4.1.6.  Individual Encoding
       4.1.7.  Media Capture
       4.1.8.  Media Consumer
       4.1.9.  Media Provider
       4.1.10. Stream
       4.1.11. Video Capture
     4.2.  Media Description
     4.3.  Media Stream
     4.4.  Multimedia Conference
     4.5.  Multimedia Session
     4.6.  Multipoint Control Unit (MCU)
     4.7.  Multi-Session Transmission (MST)
     4.8.  Recording Device
     4.9.  RtcMediaStream
     4.10. RtcMediaStreamTrack
     4.11. RTP Sender
     4.12. RTP Session
     4.13. Single Session Transmission (SST)
     4.14. SSRC
   5.  Security Considerations
   6.  Acknowledgement
   7.  Contributors
   8.  IANA Considerations
   9.  Informative References
   Appendix A.  Changes From Earlier Versions
     A.1.  Modifications Between WG Version -06 and -07
     A.2.  Modifications Between WG Version -05 and -06
     A.3.  Modifications Between WG Version -04 and -05
     A.4.  Modifications Between WG Version -03 and -04
     A.5.  Modifications Between WG Version -02 and -03
     A.6.  Modifications Between WG Version -01 and -02
     A.7.  Modifications Between WG Version -00 and -01
     A.8.  Modifications Between Version -02 and -03
     A.9.  Modifications Between Version -01 and -02
     A.10. Modifications Between Version -00 and -01
   Authors' Addresses

1.  Introduction

   The existing taxonomy of sources in Real-Time Transport Protocol
   (RTP) [RFC3550] has previously often been regarded as confusing and
   inconsistent.  Consequently, a deep understanding of how the
   different terms relate to each other becomes a real challenge.
   Frequently cited examples of this confusion are (1) how different
   protocols that make use of RTP use the same terms to signify
   different things and (2) how the complexities addressed at one layer
   are often glossed over or ignored at another.

   This document provides some clarity by reviewing the semantics of
   various aspects of sources in RTP.
   As an organizing mechanism, it approaches this by describing the
   various ways that RTP sources are transformed on their way between
   sender and receiver, and how they can be grouped and associated
   together.

   All non-specific references to ControLling mUltiple streams for
   tElepresence (CLUE) in this document map to [I-D.ietf-clue-framework]
   and all references to Web Real-Time Communications (WebRTC) map to
   [I-D.ietf-rtcweb-overview].

2.  Concepts

   This section defines concepts that serve to identify and name various
   transformations and streams in a given RTP usage.  For each concept,
   an attempt is made to list any alternate definitions and usages that
   co-exist today, along with various characteristics that further
   describe the concept.  These concepts are divided into two
   categories: one related to the chain of streams and transformations
   that media can be subject to, the other for entities involved in the
   communication.

2.1.  Media Chain

   In the context of this memo, Media is a sequence of synthetic or
   Physical Stimuli (Section 2.1.1) (sound waves, photons, key-strokes),
   represented in digital form.  Synthesized Media is typically
   generated directly in the digital domain.

   This section contains the concepts that can be involved in taking
   Media at a sender side and transporting it to a receiver, which may
   recover a sequence of physical stimuli.  This chain of concepts is of
   two main types, streams and transformations.  Streams are time-based
   sequences of samples of the physical stimulus in various
   representations, while transformations change the representation of
   the streams in some way.

   The examples below are basic ones, and it is important to keep in
   mind that this conceptual model enables more complex usages.  Some
   will be further discussed in later sections of this document.  In
   general, the following applies to this model:

   o  A transformation may have zero or more inputs and one or more
      outputs.

   o  A stream is of some type, such as audio, video, real-time text,
      etc.

   o  A stream has one source transformation and one or more sink
      transformations (with the exception of Physical Stimulus
      (Section 2.1.1), which may lack a source or sink transformation).

   o  Streams can be forwarded from a transformation output to any
      number of inputs on other transformations that support that type.

   o  If the output of a transformation is sent to multiple
      transformations, those streams will be identical; it takes a
      transformation to make them different.

   o  There are no formal limitations on how streams are connected to
      transformations.

   It is also important to remember that this is a conceptual model.
   Thus, real-world implementations may look different and have a
   different structure.

   To provide a basic understanding of the relationships in the chain,
   we first introduce the concepts for the sender side (Figure 1).  This
   covers the chain from physical stimuli until media packets are
   emitted onto the network.

         Physical Stimulus
                 |
                 V
      +----------------------+
      |     Media Capture    |
      +----------------------+
                 |
            Raw Stream
                 V
      +----------------------+
      |     Media Source     |<- Synchronization Timing
      +----------------------+
                 |
           Source Stream
                 V
      +----------------------+
      |    Media Encoder     |
      +----------------------+
                 |
          Encoded Stream       +------------+
                 V             |            V
      +----------------------+ | +----------------------+
      |   Media Packetizer   | | | RTP-based Redundancy |
      +----------------------+ | +----------------------+
                 |             |            |
                 +-------------+  Redundancy RTP Stream
         Source RTP Stream                  |
                 V                          V
      +----------------------+   +----------------------+
      |  RTP-based Security  |   |  RTP-based Security  |
      +----------------------+   +----------------------+
                 |                          |
        Secured RTP Stream    Secured Redundancy RTP Stream
                 V                          V
      +----------------------+   +----------------------+
      |   Media Transport    |   |   Media Transport    |
      +----------------------+   +----------------------+

             Figure 1: Sender Side Concepts in the Media Chain

   In Figure 1 we have included a branched chain to cover the concepts
   for using redundancy to improve the reliability of the transport.

   The Media Transport concept is an aggregate that is decomposed in
   Section 2.1.15.

   In Figure 2 we review a receiver media chain matching the sender
   side, to look at the inverse transformations and their attempts to
   recover streams identical to those in the sender chain, subject to
   what may be lossy compression and imperfect Media Transport.  Note
   that the streams out of a reverse transformation, like the Source
   Stream out of the Media Decoder, are in many cases not the same as
   the corresponding ones on the sender side; thus, they are prefixed
   with "Received" to denote a potentially modified version.  The
   reason they are not the same is that some transformations are
   irreversible.  For example, lossy source coding in the Media Encoder
   prevents the Source Stream out of the Media Decoder from being the
   same as the one fed into the Media Encoder.  Other reasons include
   packet loss or late loss in the Media Transport transformation that
   even RTP-based Repair, if used, fails to repair.  Note also that
   some transformations are not always present; for example, RTP-based
   Repair cannot operate without Redundancy RTP Streams.

      +----------------------+   +----------------------+
      |   Media Transport    |   |   Media Transport    |
      +----------------------+   +----------------------+
        Received |                 Received | Secured
        Secured RTP Stream        Redundancy RTP Stream
                 V                          V
      +----------------------+   +----------------------+
      | RTP-based Validation |   | RTP-based Validation |
      +----------------------+   +----------------------+
                 |                          |
        Received RTP Stream  Received Redundancy RTP Stream
                 |                          |
                 |     +--------------------+
                 V     V
      +----------------------+
      |   RTP-based Repair   |
      +----------------------+
                 |
        Repaired RTP Stream
                 V
      +----------------------+
      |  Media Depacketizer  |
      +----------------------+
                 |
      Received Encoded Stream
                 V
      +----------------------+
      |    Media Decoder     |
      +----------------------+
                 |
      Received Source Stream
                 V
      +----------------------+
      |      Media Sink      |--> Synchronization Information
      +----------------------+
                 |
        Received Raw Stream
                 V
      +----------------------+
      |    Media Renderer    |
      +----------------------+
                 |
                 V
         Physical Stimulus

            Figure 2: Receiver Side Concepts of the Media Chain

2.1.1.  Physical Stimulus

   The physical stimulus is a physical event that can be sampled and
   converted to digital form by an appropriate sensor or transducer.
   This includes sound waves making up audio, photons in a light field,
   or other excitations or interactions with sensors, like keystrokes on
   a keyboard.

2.1.2.  Media Capture

   Media Capture is the process of transforming the Physical Stimulus
   (Section 2.1.1) into digital Media using an appropriate sensor or
   transducer.  The Media Capture performs a digital sampling of the
   physical stimulus, usually periodically, and outputs this in some
   representation as a Raw Stream (Section 2.1.3).  This data is
   considered "Media", because it includes data that is periodically
   sampled, or made up of a set of timed asynchronous events.  The Media
   Capture is normally instantiated in some type of device, i.e., a
   media capture device.  Examples of different types of media capture
   devices are digital cameras, microphones connected to A/D converters,
   or keyboards.

   Characteristics:

   o  A Media Capture is identified either by hardware/manufacturer ID
      or via a session-scoped device identifier as mandated by the
      application usage.

   o  A Media Capture can generate an Encoded Stream (Section 2.1.7) if
      the capture device supports such a configuration.

   o  The nature of the Media Capture may impose constraints on the
      clock handling in some of the subsequent steps.  For example, many
      audio or video capture devices are not completely free in
      selecting the sample rate.

2.1.3.  Raw Stream

   The time progressing stream of digitally sampled information, usually
   periodically sampled and provided by a Media Capture (Section 2.1.2).
   A Raw Stream can also contain synthesized Media that may not require
   any explicit Media Capture, since it is already in an appropriate
   digital form.

2.1.4.  Media Source

   A Media Source is the logical source of a time progressing digital
   media stream synchronized to a reference clock.  This stream is
   called a Source Stream (Section 2.1.5).  This transformation takes
   one or more Raw Streams (Section 2.1.3) and provides a Source Stream
   as output.
   The output is synchronized with a reference clock (Section 3.1),
   which can be as simple as a system local wall clock or as complex as
   an NTP synchronized clock.

   The output can be of different types.  One type is directly
   associated with a particular Media Capture's Raw Stream.  Others are
   more conceptual sources, like an audio mix of multiple Source Streams
   (Figure 3).  Mixing multiple streams typically requires that the
   input streams can be related in time, meaning that they have to be
   Source Streams (Section 2.1.5) rather than Raw Streams.  In Figure 3,
   the generated Source Stream is a mix of the three input Source
   Streams.

        Source  Source  Source
        Stream  Stream  Stream
           |       |       |
           V       V       V
      +--------------------------+
      |       Media Source       |<-- Reference Clock
      |          Mixer           |
      +--------------------------+
                   |
                   V
             Source Stream

         Figure 3: Conceptual Media Source in form of Audio Mixer

   Another possible example of a conceptual Media Source is a video
   surveillance switch, where the input is multiple Source Streams from
   different cameras, and the output is one of those Source Streams
   chosen by some selection criterion, such as round-robin or a video
   activity measure.

2.1.5.  Source Stream

   A stream of digital samples that has been synchronized with a
   reference clock and comes from a particular Media Source
   (Section 2.1.4).

2.1.6.  Media Encoder

   A Media Encoder is a transform that is responsible for encoding the
   media data from a Source Stream (Section 2.1.5) into another
   representation, usually more compact, that is output as an Encoded
   Stream (Section 2.1.7).

   The Media Encoder step commonly includes pre-encoding
   transformations, such as scaling, resampling, etc.  The Media Encoder
   can have a significant number of configuration options that affect
   the properties of the Encoded Stream.  These include properties such
   as codec, bit-rate, start points for decoding, resolution, bandwidth
   or other fidelity affecting properties.

   Scalable Media Encoders need special attention as they produce
   multiple outputs that are potentially of different types.  As shown
   in Figure 4, a scalable Media Encoder takes one input Source Stream
   and encodes it into multiple output streams of two different types:
   at least one Encoded Stream that is independently decodable and one
   or more Dependent Streams (Section 2.1.8).  Decoding requires at
   least one Encoded Stream and zero or more Dependent Streams.  A
   Dependent Stream's dependency is one of the grouping relations this
   document discusses further in Section 3.7.

             Source Stream
                   |
                   V
      +--------------------------+
      |  Scalable Media Encoder  |
      +--------------------------+
         |         |   ...    |
         V         V          V
      Encoded  Dependent  Dependent
      Stream    Stream     Stream

            Figure 4: Scalable Media Encoder Input and Outputs

   There are also other variants of encoders, like so-called Multiple
   Description Coding (MDC).  Such Media Encoders produce multiple
   independent and thus individually decodable Encoded Streams.
   However, (logically) combining several of these Encoded Streams into
   a single Received Source Stream during decoding leads to an
   improvement in perceived reproduced quality when compared to
   decoding a single Encoded Stream.

   Creating multiple Encoded Streams from the same Source Stream, where
   the Encoded Streams are neither in a scalable nor in an MDC
   relationship, is commonly utilized in Simulcast
   [I-D.ietf-mmusic-sdp-simulcast] environments.

2.1.7.  Encoded Stream

   A stream of time synchronized encoded media that can be independently
   decoded.

   Due to temporal dependencies, an Encoded Stream may have limitations
   in where decoding can be started.  These entry points, for example
   Intra frames from a video encoder, may require identification and
   their generation may be event based or configured to occur
   periodically.

2.1.8.  Dependent Stream

   A stream of time synchronized encoded media fragments that depend on
   one or more Encoded Streams (Section 2.1.7) and zero or more
   Dependent Streams in order to be decodable.

   Each Dependent Stream has a set of dependencies.  These dependencies
   must be understood by the parties in a Multimedia Session that intend
   to use a Dependent Stream.

2.1.9.  Media Packetizer

   The transformation of taking one or more Encoded (Section 2.1.7) or
   Dependent Streams (Section 2.1.8), putting their content into one or
   more sequences of packets, normally RTP packets, and outputting
   Source RTP Streams (Section 2.1.10).  This step includes generating
   both RTP payloads and RTP packets.  The Media Packetizer then selects
   which Synchronization source(s) (SSRC) [RFC3550] and RTP Sessions to
   use.

   The Media Packetizer can combine multiple Encoded or Dependent
   Streams into one or more RTP Streams:

   o  The Media Packetizer can use multiple inputs when producing a
      single RTP Stream.  One such example is SRST packetization when
      using Scalable Video Coding (SVC) (Section 3.7).

   o  The Media Packetizer can also produce multiple RTP Streams, for
      example when Encoded and/or Dependent Streams are distributed over
      multiple RTP Streams.  One example of this is MRMT packetization
      when using SVC (Section 3.7).

2.1.10.  RTP Stream

   A stream of RTP packets containing media data, source or redundant.
   The RTP Stream is identified by an SSRC belonging to a particular RTP
   Session.  The RTP Session is identified as discussed in
   Section 2.2.2.

   A Source RTP Stream is an RTP Stream containing at least some content
   from an Encoded Stream (Section 2.1.7) at some point during its
   lifetime.
   Source material is any media material that is produced for
   transport over RTP without any additional RTP-based redundancy
   applied.  Note that RTP-based redundancy excludes the type of
   redundancy that most suitable Media Encoders (Section 2.1.6) may add
   to the media format of the Encoded Stream that makes it cope better
   with inevitable RTP packet losses.  This is further described in
   RTP-based Redundancy (Section 2.1.11) and Redundancy RTP Stream
   (Section 2.1.12).

   Characteristics:

   o  Each RTP Stream is identified by a Synchronization source (SSRC)
      [RFC3550] that is carried in every RTP and RTP Control Protocol
      (RTCP) packet header.  The SSRC is unique in a specific RTP
      Session context.

   o  At any given point in time, an RTP Stream can have one and only
      one SSRC, but SSRCs for a given RTP Stream can change over time.
      SSRC collision and clock rate change [RFC7160] are examples of
      valid reasons to change SSRC for an RTP Stream.  In those cases,
      the RTP Stream itself is not changed in any significant way, only
      the identifying SSRC number.

   o  Each SSRC defines a unique RTP sequence numbering and timing
      space.

   o  Several RTP Streams, each with their own SSRC, may represent a
      single Media Source.

   o  Several RTP Streams, each with their own SSRC, can be carried in a
      single RTP Session.

2.1.11.  RTP-based Redundancy

   RTP-based Redundancy is defined here as a transformation that
   generates redundant or repair packets sent out as a Redundancy RTP
   Stream (Section 2.1.12) to mitigate network transport impairments,
   like packet loss and delay.

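   As a minimal sketch of such a transformation, assuming a single XOR
   parity payload computed over a small group of source payloads (an
   illustrative scheme, not any of the FEC payload formats referred to
   in Section 3.11), the Python fragment below derives one repair
   payload and uses it to rebuild one lost source payload.  The
   function names and the zero-padding rule are assumptions made for
   this example only; a real payload format also has to protect packet
   lengths and header fields.

```python
from functools import reduce


def xor_parity(payloads: list[bytes]) -> bytes:
    """Return one repair payload: the byte-wise XOR of the inputs.

    Shorter payloads are zero-padded to the longest one (an assumption
    of this sketch, not a rule taken from any RTP payload format).
    """
    width = max(len(p) for p in payloads)
    padded = [p.ljust(width, b"\x00") for p in payloads]
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*padded))


def recover_missing(received: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing payload (still zero-padded).

    XOR-ing the parity with every payload that did arrive cancels them
    out and leaves only the payload that was lost.
    """
    return xor_parity(received + [parity])


# Sender side: three source payloads and the derived repair payload.
source = [b"frame-1", b"frame-22", b"frame-333"]
parity = xor_parity(source)

# Receiver side: the second payload was lost in transport.
repaired = recover_missing([source[0], source[2]], parity)
```

   In the terms of this memo, "source" stands in for the payloads of a
   Source RTP Stream, "parity" for a payload of a Redundancy RTP Stream,
   and "recover_missing" for a very small RTP-based Repair step.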
   RTP-based Redundancy exists in many flavors.  It may generate
   independent Repair Streams that are used in addition to the Source
   Stream (like RTP Retransmission (Section 3.10) and some special types
   of Forward Error Correction, like RTP stream duplication
   (Section 3.8)); it may generate a new Source Stream by combining
   redundancy information with source information (using XOR FEC
   (Section 3.11) as a redundancy payload (Section 3.9)); or it may
   completely replace the source information with only redundancy
   packets.

2.1.12.  Redundancy RTP Stream

   An RTP Stream (Section 2.1.10) that contains no original source data,
   only redundant data, which may either be used standalone or be
   combined with one or more Received RTP Streams (Section 2.1.23) to
   produce Repaired RTP Streams (Section 2.1.26).

2.1.13.  RTP-based Security

   The optional RTP-based Security transformation applies security
   services such as authentication, integrity protection and
   confidentiality to an input RTP Stream, like what is specified in The
   Secure Real-time Transport Protocol (SRTP) [RFC3711], producing a
   Secured RTP Stream (Section 2.1.14).  Either an RTP Stream
   (Section 2.1.10) or a Redundancy RTP Stream (Section 2.1.12) can be
   used as input to this transformation.

   In SRTP and the related Secure RTCP (SRTCP), all of the above
   mentioned security services are optional, except for integrity
   protection of SRTCP, which is mandatory.  Confidentiality
   (encryption) is also effectively optional in SRTP, since it is
   possible to use a NULL encryption algorithm.  As described in
   [RFC7201], the strength of SRTP data origin authentication depends on
   the cryptographic transform and key management used, for example in
   group communication where it is sometimes possible to authenticate
   group membership but not the actual RTP Stream sender.

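   The difference between authenticating group membership and
   authenticating the actual sender can be illustrated with a minimal
   sketch.  It assumes a single symmetric key shared by all group
   members and a plain HMAC-SHA256 tag over a claimed SSRC and a
   payload; this is not the SRTP packet format, key derivation, or any
   specific SRTP transform, and the key, function names, and packet
   layout are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key known to every member of the group (assumption).
GROUP_KEY = b"hypothetical-shared-group-key"
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def protect(ssrc: int, payload: bytes, key: bytes = GROUP_KEY) -> bytes:
    """Append an HMAC-SHA256 tag computed over the claimed SSRC and payload."""
    message = ssrc.to_bytes(4, "big") + payload
    return message + hmac.new(key, message, hashlib.sha256).digest()


def verify(packet: bytes, key: bytes = GROUP_KEY) -> bool:
    """Check the tag; success only proves the sender held the group key."""
    message, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


# Member A sends under its own SSRC.
genuine = protect(0x1234ABCD, b"audio from A")
# Member B holds the same key and can claim A's SSRC.
spoofed = protect(0x1234ABCD, b"audio from B posing as A")
# A non-member without the group key cannot produce a valid tag.
outsider = protect(0x1234ABCD, b"audio", key=b"not-the-group-key")
```

   Both "genuine" and "spoofed" pass "verify", while "outsider" does
   not: the receiver learns that the packet came from some holder of
   the group key, but not which one.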
623 RTP-based Security and RTP-based Redundancy can be combined in a few 624 different ways. One way is depicted in Figure 1, where an RTP Stream 625 and its corresponding Redundancy RTP Stream are protected by separate 626 RTP-based Security transforms. In other cases, like when a Media 627 Translator is adding FEC in Section 3.2.1.3 of 628 [I-D.ietf-avtcore-rtp-topologies-update], a middlebox can apply RTP- 629 based Redundancy to an already Secured RTP Stream instead of a Source 630 RTP Stream. One example of that is depicted in Figure 5 below. 632 Source RTP Stream +------------+ 633 V | V 634 +----------------------+ | +----------------------+ 635 | RTP-based Security | | | RTP-based Redundancy | 636 +----------------------+ | +----------------------+ 637 | | | 638 | | Redundancy RTP Stream 639 +-------------+ | 640 | V 641 | +----------------------+ 642 Secured RTP Stream | RTP-based Security | 643 | +----------------------+ 644 | | 645 | Secured Redundancy RTP Stream 646 V V 647 +----------------------+ +----------------------+ 648 | Media Transport | | Media Transport | 649 +----------------------+ +----------------------+ 651 Figure 5: Adding Redundancy to a Secured RTP Stream 653 In this case, the Redundancy RTP Stream may already have been secured 654 for confidentiality (encrypted) by the first RTP-based Security, and 655 it may therefore not be necessary to apply additional confidentiality 656 protection in the second RTP-based Security. To avoid attacks and 657 negative impact on RTP-based Repair (Section 2.1.25) and the 658 resulting Repaired RTP Stream (Section 2.1.26), it is however still 659 necessary to have this second RTP-based Security apply both 660 authentication and integrity protection to the Redundancy RTP Stream. 662 2.1.14. 
Secured RTP Stream 664 A Secured RTP Stream is a Source or Redundancy RTP Stream that is 665 protected through RTP-based Security (Section 2.1.13) by one or more 666 of the confidentiality, integrity, or authentication security 667 services. 669 2.1.15. Media Transport 671 A Media Transport defines the transformation that the RTP Streams 672 (Section 2.1.10) are subjected to by the end-to-end transport from 673 one RTP sender to one specific RTP receiver (an RTP Session 674 (Section 2.2.2) may contain multiple RTP receivers per sender). Each 675 Media Transport is defined by a transport association that is 676 normally identified by a 5-tuple (source address, source port, 677 destination address, destination port, transport protocol), but a 678 proposal exists for sending multiple transport associations on a 679 single 5-tuple [I-D.westerlund-avtcore-transport-multiplexing]. 681 Characteristics: 683 o Media Transport transmits RTP Streams of RTP Packets from a source 684 transport address to a destination transport address. 686 o Each Media Transport contains only a single RTP Session. 688 o A single RTP Session can span multiple Media Transports. 690 The Media Transport concept sometimes needs to be decomposed into 691 more steps to enable discussion of what a sender emits that gets 692 transformed by the network before it is received by the receiver. 693 Thus we provide also this Media Transport decomposition (Figure 6). 695 RTP Stream 696 | 697 V 698 +--------------------------+ 699 | Media Transport Sender | 700 +--------------------------+ 701 | 702 Sent RTP Stream 703 V 704 +--------------------------+ 705 | Network Transport | 706 +--------------------------+ 707 | 708 Transported RTP Stream 709 V 710 +--------------------------+ 711 | Media Transport Receiver | 712 +--------------------------+ 713 | 714 V 715 Received RTP Stream 717 Figure 6: Decomposition of Media Transport 719 2.1.16. 
Media Transport Sender 721 The first transformation within the Media Transport (Section 2.1.15) 722 is the Media Transport Sender. The sending Endpoint (Section 2.2.1) 723 takes an RTP Stream and emits the packets onto the network using the 724 transport association established for this Media Transport, thereby 725 creating a Sent RTP Stream (Section 2.1.17). In the process, it 726 transforms the RTP Stream in several ways. First, it generates the 727 necessary protocol headers for the transport association, for example 728 IP and UDP headers, thus forming IP/UDP/RTP packets. In addition, 729 the Media Transport Sender may queue, pace or otherwise affect how 730 the packets are emitted onto the network, thereby potentially 731 introducing delay, jitter and inter-packet spacing that characterize 732 the Sent RTP Stream. 734 2.1.17. Sent RTP Stream 736 The Sent RTP Stream is the RTP Stream as it enters the first hop of 737 the network path to its destination. The Sent RTP Stream is 738 identified using network transport addresses, for example, for IP/UDP, the 739 5-tuple (source IP address, source port, destination IP address, 740 destination port, and protocol (UDP)). 742 2.1.18. Network Transport 744 Network Transport is the transformation that the Sent RTP 745 Stream (Section 2.1.17) undergoes while traveling from the source to the 746 destination through the network. This transformation can result in 747 loss of some packets, varying delay on a per packet basis, packet 748 duplication, and packet header or data corruption. This 749 transformation produces a Transported RTP Stream (Section 2.1.19) at 750 the exit of the network path. 752 2.1.19. Transported RTP Stream 754 The RTP Stream that is emitted from the network path at the 755 destination, after being subjected to the Network Transport's transformation 756 (Section 2.1.18). 758 2.1.20.
Media Transport Receiver 760 The receiver Endpoint's (Section 2.2.1) transformation of the 761 Transported RTP Stream (Section 2.1.19) by its reception process, 762 which results in the Received RTP Stream (Section 2.1.23). This 763 transformation includes verification of transport checksums. Sensible 764 system designs typically either discard packets with mismatching 765 checksums, or pass them on while somehow marking them in the 766 resulting Received RTP Stream so as to alert subsequent transformations 767 about the possibly corrupt state. In this context it is worth noting 768 that there is typically some probability for corrupt packets to pass 769 through undetected (with a seemingly correct checksum). Other 770 transformations can compensate for delay variations in receiving a 771 packet on the network interface and providing it to the application 772 (de-jitter buffer). 774 2.1.21. Received Secured RTP Stream 776 This is the Secured RTP Stream (Section 2.1.14) resulting from the 777 Media Transport (Section 2.1.15) aggregate transformation. 779 2.1.22. RTP-based Validation 781 RTP-based Validation is the reverse transformation of RTP-based 782 Security (Section 2.1.13). If this transformation fails, the result 783 is either not usable and must be discarded, or may be usable but 784 cannot be trusted. If the transformation succeeds, the result can be 785 a Received RTP Stream (Section 2.1.23) or a Received Redundancy RTP 786 Stream (Section 2.1.24), depending on what was input to the 787 corresponding RTP-based Security transformation, but can also be a 788 Received Secured RTP Stream (Section 2.1.21) in case several RTP- 789 based Security transformations were applied. 791 2.1.23. Received RTP Stream 793 The RTP Stream (Section 2.1.10) resulting from the Media Transport's 794 aggregate transformation (Section 2.1.15), i.e. subjected to packet 795 loss, packet corruption, packet duplication and varying transmission 796 delay from sender to receiver.
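The aggregate effect of Media Transport on an RTP Stream (loss, corruption, duplication and varying delay) can be sketched as a small Python model. This is a toy simulation under stated assumptions, not a description of any real network or RTP stack: the `Packet` type, the `network_transport` function, and all of its probability parameters are hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    seq: int          # RTP sequence number
    send_time: float  # seconds, when the packet entered the network
    payload: bytes


def network_transport(sent, loss=0.0, dup=0.0, corrupt=0.0,
                      max_delay=0.0, seed=0):
    """Hypothetical model: return (arrival_time, Packet) pairs in arrival order."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    arrivals = []
    for pkt in sent:
        if rng.random() < loss:
            continue  # packet lost in the network
        copies = 2 if rng.random() < dup else 1  # possible duplication
        for _ in range(copies):
            payload = pkt.payload
            if payload and rng.random() < corrupt:
                # Flip one bit of one byte to model data corruption.
                i = rng.randrange(len(payload))
                payload = payload[:i] + bytes([payload[i] ^ 0x01]) + payload[i + 1:]
            arrival = pkt.send_time + rng.uniform(0.0, max_delay)  # varying delay
            arrivals.append((arrival, Packet(pkt.seq, pkt.send_time, payload)))
    arrivals.sort(key=lambda a: a[0])  # delay variation may reorder packets
    return arrivals
```

With all parameters left at zero the model is the identity, so the Received RTP Stream equals the Sent RTP Stream; raising `max_delay` above the packet interval shows how delay variation alone can reorder packets.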
798 2.1.24. Received Redundancy RTP Stream 800 The Redundancy RTP Stream (Section 2.1.12) resulting from the Media 801 Transport transformation, i.e. subjected to packet loss, packet 802 corruption, and varying transmission delay from sender to receiver. 804 2.1.25. RTP-based Repair 806 RTP-based Repair is a transformation that takes as input zero or more 807 Received RTP Streams (Section 2.1.23) and one or more Received 808 Redundancy RTP Streams (Section 2.1.24), and produces one or more 809 Repaired RTP Streams (Section 2.1.26) that are as close to the 810 corresponding sent Source RTP Streams (Section 2.1.10) as possible, 811 using different RTP-based repair methods, for example the ones 812 referred to in RTP-based Redundancy (Section 2.1.11). 814 2.1.26. Repaired RTP Stream 816 A Received RTP Stream (Section 2.1.23) for which Received Redundancy 817 RTP Stream (Section 2.1.24) information has been used to try to 818 recover the Source RTP Stream (Section 2.1.10) as it was before Media 819 Transport (Section 2.1.15). 821 2.1.27. Media Depacketizer 823 A Media Depacketizer takes one or more RTP Streams (Section 2.1.10), 824 depacketizes them, and attempts to reconstitute the Encoded Streams 825 (Section 2.1.7) or Dependent Streams (Section 2.1.8) present in those 826 RTP Streams. 828 In practical implementations, the Media Depacketizer and the Media 829 Decoder may be tightly coupled and share information to improve or 830 optimize the overall decoding and error concealment process. It is, 831 however, not expected that there would be any benefit in defining a 832 taxonomy for those detailed (and likely very implementation- 833 dependent) steps. 835 2.1.28. Received Encoded Stream 837 The received version of an Encoded Stream (Section 2.1.7). 839 2.1.29. Media Decoder 841 A Media Decoder is a transformation that is responsible for decoding 842 Encoded Streams (Section 2.1.7) and any Dependent Streams 843 (Section 2.1.8) into a Source Stream (Section 2.1.5).
845 In practical implementations, the Media Decoder and the Media 846 Depacketizer may be tightly coupled and share information to improve 847 or optimize the overall decoding process in various ways. It is, 848 however, not expected that there would be any benefit in defining a 849 taxonomy for those detailed (and likely very implementation- 850 dependent) steps. 852 A Media Decoder has to deal with any errors in the Encoded Streams 853 that resulted from corruption or failure to repair packet losses. 854 Therefore, it is commonly robust to errors and losses, and includes 855 concealment methods. 857 2.1.30. Received Source Stream 859 The received version of a Source Stream (Section 2.1.5). 861 2.1.31. Media Sink 863 The Media Sink receives a Source Stream (Section 2.1.5) that 864 contains, usually periodically, sampled media data together with 865 associated synchronization information. Depending on the application, 866 this Source Stream then needs to be transformed into a Raw Stream 867 (Section 2.1.3) that is conveyed to the Media Render 868 (Section 2.1.33), synchronized with the output from other Media 869 Sinks. The Media Sink may also be connected with a Media Source 870 (Section 2.1.4) and be used as part of a conceptual Media Source. 872 The Media Sink can further transform the Source Stream into a 873 representation that is suitable for rendering on the Media Render as 874 defined by the application or system-wide configuration. This 875 includes sample scaling, level adjustments, etc. 877 2.1.32. Received Raw Stream 879 The received version of a Raw Stream (Section 2.1.3). 881 2.1.33. Media Render 883 A Media Render takes a Raw Stream (Section 2.1.3) and converts it 884 into Physical Stimulus (Section 2.1.1) that a human user can 885 perceive. Examples of such devices are screens and D/A converters 886 connected to amplifiers and loudspeakers. 888 An Endpoint can potentially have multiple Media Renders for each 889 media type. 891 2.2.
Communication Entities 893 This section contains concepts for entities involved in the 894 communication. 896 +------------------------------------------------------------+ 897 | Communication Session | 898 | | 899 | +----------------+ +----------------+ | 900 | | Participant A | +------------+ | Participant B | | 901 | | | | Multimedia | | | | 902 | | +------------+ |<==>| Session |<==>| +------------+ | | 903 | | | Endpoint A | | | | | | Endpoint B | | | 904 | | | | | +------------+ | | | | | 905 | | | +----------+-+----------------------+-+----------+ | | | 906 | | | | RTP | | | | | | | | 907 | | | | Session |-+---Media Transport----+>| | | | | 908 | | | | Audio |<+---Media Transport----+-| | | | | 909 | | | | | | ^ | | | | | | 910 | | | +----------+-+----------|-----------+-+----------+ | | | 911 | | | | | v | | | | | 912 | | | | | +-----------------+ | | | | | 913 | | | | | | Synchronization | | | | | | 914 | | | | | | Context | | | | | | 915 | | | | | +-----------------+ | | | | | 916 | | | | | ^ | | | | | 917 | | | +----------+-+----------|-----------+-+----------+ | | | 918 | | | | RTP | | v | | | | | | 919 | | | | Session |<+---Media Transport----+-| | | | | 920 | | | | Video |-+---Media Transport----+>| | | | | 921 | | | | | | | | | | | | 922 | | | +----------+-+----------------------+-+----------+ | | | 923 | | +------------+ | | +------------+ | | 924 | +----------------+ +----------------+ | 925 +------------------------------------------------------------+ 927 Figure 7: Example Point to Point Communication Session with two RTP 928 Sessions 930 Figure 7 shows a high-level example representation of a very basic 931 point-to-point Communication Session between Participants A and B. 932 It uses two different audio and video RTP Sessions between A's and 933 B's Endpoints, using separate Media Transports for those RTP 934 Sessions. 
The Multimedia Session shared by the Participants can, for 935 example, be established using SIP (i.e., there is a SIP Dialog 936 between A and B). The terms used in Figure 7 are further elaborated 937 in the sub-sections below. 939 2.2.1. Endpoint 941 A single addressable entity sending or receiving RTP packets. It may 942 be decomposed into several functional blocks, but as long as it 943 behaves as a single RTP stack entity it is classified as a single 944 "Endpoint". 946 Characteristics: 948 o Endpoints can be identified in several different ways. While RTCP 949 Canonical Names (CNAMEs) [RFC3550] provide a globally unique and 950 stable identification mechanism for the duration of the 951 Communication Session (see Section 2.2.5), their validity applies 952 exclusively within a Synchronization Context (Section 3.1). Thus 953 one Endpoint can handle multiple CNAMEs, each of which can be 954 shared among a set of Endpoints belonging to the same Participant 955 (Section 2.2.3). Therefore, mechanisms outside the scope of RTP, 956 such as application defined mechanisms, must be used to provide 957 Endpoint identification when outside this Synchronization Context. 959 o An Endpoint can be associated with at most one Participant 960 (Section 2.2.3) at any single point in time. 962 o In some contexts, an Endpoint would typically correspond to a 963 single "host", for example a computer using a single network 964 interface and being used by a single human user. In other 965 contexts, a single "host" can serve multiple Participants, in 966 which case each Participant's Endpoint may share properties, for 967 example the IP address part of a transport address. 969 2.2.2. RTP Session 971 An RTP Session is an association among a group of Participants 972 communicating with RTP. It is a group communications channel which 973 can potentially carry a number of RTP Streams. 
Within an RTP 974 Session, every Participant can find meta-data and control information 975 (over RTCP) about all the RTP Streams in the RTP Session. The 976 bandwidth of the RTCP control channel is shared between all 977 Participants within an RTP Session. 979 Characteristics: 981 o An RTP Session can carry one or more RTP Streams. 983 o An RTP Session shares a single SSRC space as defined in RFC3550 984 [RFC3550]. That is, the Endpoints participating in an RTP Session 985 can see an SSRC identifier transmitted by any of the other 986 Endpoints. An Endpoint can receive an SSRC either as SSRC or as a 987 Contributing source (CSRC) in RTP and RTCP packets, as defined by 988 the Endpoints' network interconnection topology. 990 o An RTP Session uses at least two Media Transports 991 (Section 2.1.15), one for sending and one for receiving. 992 Commonly, the receiving Media Transport is the reverse direction 993 of the Media Transport used for sending. An RTP Session may use 994 many Media Transports and these define the session's network 995 interconnection topology. 997 o A single Media Transport always carries a single RTP Session. 999 o Multiple RTP Sessions can be conceptually related, for example by 1000 originating from or being targeted for the same Participant 1001 (Section 2.2.3) or Endpoint (Section 2.2.1), or by containing RTP 1002 Streams that are somehow related (Section 3). 1004 2.2.3. Participant 1006 A Participant is an entity reachable by a single signaling address, 1007 and is thus related more to the signaling context than to the media 1008 context. 1010 Characteristics: 1012 o A single signaling-addressable entity, using an application- 1013 specific signaling address space, for example a SIP URI. 1015 o A Participant can participate in several Multimedia Sessions 1016 (Section 2.2.4). 1018 o A Participant can comprise several associated Endpoints 1019 (Section 2.2.1). 1021 2.2.4.
Multimedia Session 1023 A Multimedia Session is an association among a group of Participants 1024 (Section 2.2.3) engaged in the communication via one or more RTP 1025 Sessions (Section 2.2.2). It defines logical relationships among 1026 Media Sources (Section 2.1.4) that appear in multiple RTP Sessions. 1028 Characteristics: 1030 o A Multimedia Session can be composed of several RTP Sessions with 1031 potentially multiple RTP Streams per RTP Session. 1033 o Each Participant in a Multimedia Session can have a multitude of 1034 Media Captures and Media Rendering devices. 1036 o A single Multimedia Session can contain media from one or more 1037 Synchronization Contexts (Section 3.1). An example of that is a 1038 Multimedia Session containing one set of audio and video for 1039 communication purposes belonging to one Synchronization Context, 1040 and another set of audio and video for presentation purposes (like 1041 playing a video file) with a separate Synchronization Context that 1042 has no strong timing relationship and need not be strictly 1043 synchronized with the audio and video used for communication. 1045 2.2.5. Communication Session 1047 A Communication Session is an association among two or more 1048 Participants (Section 2.2.3) communicating with each other via one or 1049 more Multimedia Sessions (Section 2.2.4). 1051 Characteristics: 1053 o Each Participant in a Communication Session is identified via an 1054 application-specific signaling address. 1056 o A Communication Session is composed of Participants that share at 1057 least one Multimedia Session, involving one or more parallel RTP 1058 Sessions with potentially multiple RTP Streams per RTP Session. 1060 For example, in a full mesh communication, the Communication Session 1061 consists of a set of separate Multimedia Sessions between each pair 1062 of Participants. 
Another example is a centralized conference, where 1063 the Communication Session consists of a set of Multimedia Sessions 1064 between each Participant and the conference handler. 1066 3. Concepts of Inter-Relations 1068 This section uses the concepts from previous sections, and looks at 1069 different types of relationships among them. These relationships 1070 occur at different abstraction levels and for different purposes, but 1071 the reason for the needed relationship at a certain step in the media 1072 handling chain may exist at another step. For example, the use of 1073 Simulcast (Section 3.6) implies a need to determine relations at RTP 1074 Stream level, but the underlying reason is that multiple Media 1075 Encoders use the same Media Source, i.e. the need is to identify a 1076 common Media Source. 1078 3.1. Synchronization Context 1080 A Synchronization Context defines a requirement for a strong timing 1081 relationship between the Media Sources, typically requiring alignment 1082 of clock sources. Such a relationship can be identified in multiple 1083 ways as listed below. A single Media Source can only belong to a 1084 single Synchronization Context, since it is assumed that a single 1085 Media Source can only have a single media clock, and requiring 1086 alignment to several Synchronization Contexts (and thus reference 1087 clocks) will effectively merge those into a single Synchronization 1088 Context. 1090 3.1.1. RTCP CNAME 1092 RFC3550 [RFC3550] describes inter-media synchronization between RTP 1093 Sessions based on RTCP CNAME, RTP and Network Time Protocol (NTP) 1094 [RFC5905] formatted timestamps of a reference clock. As indicated in 1095 [RFC7273], despite using NTP format timestamps, it is not required 1096 that the clock be synchronized to an NTP source. 1098 3.1.2.
Clock Source Signaling 1100 [RFC7273] provides a mechanism to signal the clock source in Session 1101 Description Protocol (SDP) [RFC4566] for both the reference clock 1102 and the media clock, thus allowing a Synchronization Context to 1103 be defined beyond the one defined by the usage of CNAME source 1104 descriptions. 1106 3.1.3. Implicitly via RtcMediaStream 1108 WebRTC defines "RtcMediaStream" with one or more 1109 "RtcMediaStreamTracks". All tracks in an "RtcMediaStream" are 1110 intended to be synchronized when rendered, implying that they must be 1111 generated such that synchronization is possible. 1113 3.1.4. Explicitly via SDP Mechanisms 1115 The SDP Grouping Framework [RFC5888] defines an m= line (Section 4.2) 1116 grouping mechanism called "Lip Synchronization" (with LS 1117 identification-tag) for establishing the synchronization requirement 1118 across m= lines when they map to individual sources. 1120 Source-Specific Media Attributes in SDP [RFC5576] extends the above 1121 mechanism to the case where multiple Media Sources are described by a single m= 1122 line. 1124 3.2. Endpoint 1126 Some applications require knowledge of what Media Sources originate 1127 from a particular Endpoint (Section 2.2.1). This can be needed, for example, for 1128 decisions such as packet routing between parts of the topology, which depend on knowing 1129 the Endpoint origin of the RTP Streams. 1131 In RTP, this identification has been overloaded with the 1132 Synchronization Context (Section 3.1) through the usage of the RTCP 1133 source description CNAME (Section 3.1.1). This works for some 1134 usages, but in others it breaks down. For example, if an Endpoint 1135 has two sets of Media Sources that have different Synchronization 1136 Contexts, like the audio and video of the human Participant as well 1137 as a set of Media Sources of audio and video for a shared movie, 1138 CNAME would not be an appropriate identification for that Endpoint. 1139 Therefore, an Endpoint may have multiple CNAMEs.
The CNAMEs or the 1140 Media Sources themselves can be related to the Endpoint. 1142 3.3. Participant 1144 In communication scenarios, it is commonly necessary to know which Media 1145 Sources originate from which Participant (Section 2.2.3). One reason 1146 is, for example, to enable the application to display Participant 1147 Identity information correctly associated with the Media Sources. 1148 This association is handled through the signaling solution to point 1149 at a specific Multimedia Session where the Media Sources may be 1150 explicitly or implicitly tied to a particular Endpoint. 1152 Participant information becomes more problematic due to Media Sources 1153 that are generated through mixing or other conceptual processing of 1154 Raw Streams or Source Streams that originate from different 1155 Participants. This type of Media Source can thus have a dynamically 1156 varying set of origins and Participants. RTP contains the concept of 1157 CSRCs, which carry information about the previous-step origin of the 1158 included media content on RTP level. 1160 3.4. RtcMediaStream 1162 An RtcMediaStream in WebRTC is an explicit grouping of a set of Media 1163 Sources (RtcMediaStreamTracks) that share a common identifier and a 1164 single Synchronization Context (Section 3.1). 1166 3.5. Multi-Channel Audio 1168 There exist a number of RTP payload formats that can carry multi- 1169 channel audio, despite the codec being a single-channel (mono) 1170 encoder. Multi-channel audio can be viewed as multiple Media Sources 1171 sharing a common Synchronization Context. These are independently 1172 encoded by a Media Encoder and the different Encoded Streams are 1173 packetized together in a time-synchronized way into a single Source 1174 RTP Stream, using the codec's RTP Payload format. Examples of 1175 codecs that support multi-channel audio are PCMA and PCMU [RFC3551], 1176 AMR [RFC4867], and G.719 [RFC5404]. 1178 3.6.
Simulcast 1180 A Media Source represented as multiple independent Encoded Streams 1181 constitutes a Simulcast [I-D.ietf-mmusic-sdp-simulcast] or MDC of 1182 that Media Source. Figure 8 shows an example of a Media Source that 1183 is encoded into three separate Simulcast streams, that are in turn 1184 sent on the same Media Transport flow. When using Simulcast, the RTP 1185 Streams may be sharing RTP Session and Media Transport, or be 1186 separated on different RTP Sessions and Media Transports, or any 1187 combination of these two. One major reason to use separate Media 1188 Transports is to make use of different Quality of Service for the 1189 different Source RTP Streams. Some considerations on separating 1190 related RTP Streams are discussed in Section 3.12. 1192 +----------------+ 1193 | Media Source | 1194 +----------------+ 1195 Source Stream | 1196 +----------------------+----------------------+ 1197 | | | 1198 V V V 1199 +------------------+ +------------------+ +------------------+ 1200 | Media Encoder | | Media Encoder | | Media Encoder | 1201 +------------------+ +------------------+ +------------------+ 1202 | Encoded | Encoded | Encoded 1203 | Stream | Stream | Stream 1204 V V V 1205 +------------------+ +------------------+ +------------------+ 1206 | Media Packetizer | | Media Packetizer | | Media Packetizer | 1207 +------------------+ +------------------+ +------------------+ 1208 | Source | Source | Source 1209 | RTP | RTP | RTP 1210 | Stream | Stream | Stream 1211 +-----------------+ | +-----------------+ 1212 | | | 1213 V V V 1214 +-------------------+ 1215 | Media Transport | 1216 +-------------------+ 1218 Figure 8: Example of Media Source Simulcast 1220 The Simulcast relation between the RTP Streams is the common Media 1221 Source. 
In addition to being able to identify the common Media Source, 1222 a receiver of the RTP Stream may need to know which configuration or 1223 encoding goals lay behind the produced Encoded Stream and its 1224 properties. This enables selection of the stream that is most useful 1225 in the application at that moment. 1227 3.7. Layered Multi-Stream 1229 Layered Multi-Stream (LMS) is a mechanism by which different portions 1230 of a layered or scalable encoding of a Source Stream are sent using 1231 separate RTP Streams (sometimes in separate RTP Sessions). LMSs are 1232 useful for receiver control of layered media. 1234 A Media Source represented as an Encoded Stream and multiple 1235 Dependent Streams constitutes a Media Source that has layered 1236 dependencies. Figure 9 represents an example of a Media Source that 1237 is encoded into three dependent layers, where two layers are sent on 1238 the same Media Transport using different RTP Streams, i.e. SSRCs, and 1239 the third layer is sent on a separate Media Transport.
1241 +----------------+ 1242 | Media Source | 1243 +----------------+ 1244 | 1245 | 1246 V 1247 +---------------------------------------------------------+ 1248 | Media Encoder | 1249 +---------------------------------------------------------+ 1250 | | | 1251 Encoded Stream Dependent Stream Dependent Stream 1252 | | | 1253 V V V 1254 +----------------+ +----------------+ +----------------+ 1255 |Media Packetizer| |Media Packetizer| |Media Packetizer| 1256 +----------------+ +----------------+ +----------------+ 1257 | | | 1258 RTP Stream RTP Stream RTP Stream 1259 | | | 1260 +------+ +------+ | 1261 | | | 1262 V V V 1263 +-----------------+ +-----------------+ 1264 | Media Transport | | Media Transport | 1265 +-----------------+ +-----------------+ 1267 Figure 9: Example of Media Source Layered Dependency 1269 It is sometimes useful to make a distinction between using a single 1270 Media Transport or multiple separate Media Transports when (in both 1271 cases) using multiple RTP Streams to carry Encoded Streams and 1272 Dependent Streams for a Media Source. Therefore, the following new 1273 terminology is defined here: 1275 SRST: Single RTP Stream on a Single Media Transport 1277 MRST: Multiple RTP Streams on a Single Media Transport 1279 MRMT: Multiple RTP Streams on Multiple Media Transports 1281 MRST and MRMT relations needs to identify the common Media Encoder 1282 origin for the Encoded and Dependent Streams. When using different 1283 RTP Sessions (MRMT), a single RTP Stream per Media Encoder, and a 1284 single Media Source in each RTP Session, common SSRC and CNAMEs can 1285 be used to identify the common Media Source. When multiple RTP 1286 Streams are sent from one Media Encoder in the same RTP Session 1287 (MRST), then CNAME is the only currently specified RTP identifier 1288 that can be used. 
In cases where multiple Media Encoders use 1289 multiple Media Sources sharing Synchronization Context, and thus 1290 having a common CNAME, additional heuristics or identification need 1291 to be applied to create the MRST or MRMT relationships between the 1292 RTP Streams. 1294 3.8. RTP Stream Duplication 1296 RTP Stream Duplication [RFC7198], using the same or different Media 1297 Transports, and optionally also delaying the duplicate [RFC7197], 1298 offers a simple way to protect media flows from packet loss in some 1299 cases (see Figure 10). This is a specific type of redundancy. All 1300 but one Source RTP Stream (Section 2.1.10) are effectively Redundancy 1301 RTP Streams (Section 2.1.12), but since both Source and Redundant RTP 1302 Streams are the same, it does not matter which one is which. This 1303 can also be seen as a specific type of Simulcast (Section 3.6) that 1304 transmits the same Encoded Stream (Section 2.1.7) multiple times. 1306 +----------------+ 1307 | Media Source | 1308 +----------------+ 1309 Source Stream | 1310 V 1311 +----------------+ 1312 | Media Encoder | 1313 +----------------+ 1314 Encoded Stream | 1315 +-----------+-----------+ 1316 | | 1317 V V 1318 +------------------+ +------------------+ 1319 | Media Packetizer | | Media Packetizer | 1320 +------------------+ +------------------+ 1321 Source | RTP Stream Source | RTP Stream 1322 | V 1323 | +-------------+ 1324 | | Delay (opt) | 1325 | +-------------+ 1326 | | 1327 +-----------+-----------+ 1328 | 1329 V 1330 +-------------------+ 1331 | Media Transport | 1332 +-------------------+ 1334 Figure 10: Example of RTP Stream Duplication 1336 3.9. Redundancy Format 1338 The RTP Payload for Redundant Audio Data [RFC2198] defines a 1339 transport for redundant audio data together with primary data in the 1340 same RTP payload. 
The redundant data can be a time delayed version 1341 of the primary or another time delayed Encoded Stream using a 1342 different Media Encoder to encode the same Media Source as the 1343 primary, as depicted in Figure 11. 1345 +--------------------+ 1346 | Media Source | 1347 +--------------------+ 1348 | 1349 Source Stream 1350 | 1351 +------------------------+ 1352 | | 1353 V V 1354 +--------------------+ +--------------------+ 1355 | Media Encoder | | Media Encoder | 1356 +--------------------+ +--------------------+ 1357 | | 1358 | +------------+ 1359 Encoded Stream | Time Delay | 1360 | +------------+ 1361 | | 1362 | +------------------+ 1363 V V 1364 +--------------------+ 1365 | Media Packetizer | 1366 +--------------------+ 1367 | 1368 V 1369 RTP Stream 1371 Figure 11: Concept for usage of Audio Redundancy with different Media 1372 Encoders 1374 The Redundancy format is thus providing the necessary meta 1375 information to correctly relate different parts of the same Encoded 1376 Stream. The case depicted above (Figure 11) relates the Received 1377 Source Stream fragments coming out of different Media Decoders, to be 1378 able to combine them together into a less erroneous Source Stream. 1380 3.10. RTP Retransmission 1382 Figure 12 shows an example where a Media Source's Source RTP Stream 1383 is protected by a retransmission (RTX) flow [RFC4588]. In this 1384 example the Source RTP Stream and the Redundancy RTP Stream share the 1385 same Media Transport. 
1387 +--------------------+ 1388 | Media Source | 1389 +--------------------+ 1390 | 1391 V 1392 +--------------------+ 1393 | Media Encoder | 1394 +--------------------+ 1395 | Retransmission 1396 Encoded Stream +--------+ +---- Request 1397 V | V V 1398 +--------------------+ | +--------------------+ 1399 | Media Packetizer | | | RTP Retransmission | 1400 +--------------------+ | +--------------------+ 1401 | | | 1402 +------------+ Redundancy RTP Stream 1403 Source RTP Stream | 1404 | | 1405 +---------+ +---------+ 1406 | | 1407 V V 1408 +-----------------+ 1409 | Media Transport | 1410 +-----------------+ 1412 Figure 12: Example of Media Source Retransmission Flows 1414 The RTP Retransmission example (Figure 12) illustrates that this 1415 mechanism works purely on the Source RTP Stream. The RTP 1416 Retransmission transform buffers the sent Source RTP Stream and, upon 1417 request, emits a retransmitted packet with an extra payload header as 1418 a Redundancy RTP Stream. The RTP Retransmission mechanism [RFC4588] 1419 is specified such that there is a one to one relation between the 1420 Source RTP Stream and the Redundancy RTP Stream. Therefore, a 1421 Redundancy RTP Stream needs to be associated with its Source RTP 1422 Stream. This is done based on CNAME selectors and heuristics to 1423 match requested packets for a given Source RTP Stream with the 1424 original sequence number in the payload of any new Redundancy RTP 1425 Stream using the RTX payload format. In cases where the Redundancy 1426 RTP Stream is sent in a different RTP Session than the Source RTP 1427 Stream, the RTP Session relation is signaled by using the SDP Media 1428 Grouping's [RFC5888] Flow Identification (FID identification-tag) 1429 semantics. 1431 3.11. Forward Error Correction 1433 Figure 13 shows an example where two Media Sources' Source RTP 1434 Streams are protected by Forward Error Correction (FEC). 
   Source RTP Stream A has an RTP-based Redundancy transformation in
   FEC Encoder 1. This produces Redundancy RTP Stream 1, which is
   related only to Source RTP Stream A. FEC Encoder 2, however, takes
   two Source RTP Streams (A and B) and produces a Redundancy RTP
   Stream 2 that protects them jointly, i.e., Redundancy RTP Stream 2
   relates to two Source RTP Streams (a FEC group). FEC decoding, when
   needed due to packet loss or packet corruption at the receiver,
   requires knowledge of which Source RTP Streams the FEC encoding was
   based on.

   In Figure 13, all RTP Streams are sent on the same Media Transport.
   This is, however, not the only possible choice. Numerous
   combinations exist for spreading these RTP Streams over different
   Media Transports to achieve the communication application's goal.

      +--------------------+                +--------------------+
      |   Media Source A   |                |   Media Source B   |
      +--------------------+                +--------------------+
                |                                     |
                V                                     V
      +--------------------+                +--------------------+
      |  Media Encoder A   |                |  Media Encoder B   |
      +--------------------+                +--------------------+
                |                                     |
         Encoded Stream                        Encoded Stream
                V                                     V
      +--------------------+                +--------------------+
      | Media Packetizer A |                | Media Packetizer B |
      +--------------------+                +--------------------+
                |                                     |
       Source RTP Stream A                   Source RTP Stream B
                |                                     |
          +-----+---------+-------------+         +---+---+
          |               V             V         V       |
          |       +---------------+  +---------------+    |
          |       | FEC Encoder 1 |  | FEC Encoder 2 |    |
          |       +---------------+  +---------------+    |
          |  Redundancy   |     Redundancy   |            |
          |  RTP Stream 1 |     RTP Stream 2 |            |
          V               V                  V            V
      +----------------------------------------------------------+
      |                     Media Transport                      |
      +----------------------------------------------------------+

             Figure 13: Example of FEC Redundancy RTP Streams

   As FEC Encoding exists in various forms, the methods for
   relating FEC Redundancy RTP Streams with their source information in
   Source RTP Streams are many. The XOR-based RTP FEC Payload format
   [RFC5109] is defined in such a way that a Redundancy RTP Stream has a
   one-to-one relation with a Source RTP Stream. In fact, the RFC
   requires the Redundancy RTP Stream to use the same SSRC as the Source
   RTP Stream. This requires the use of either a separate RTP Session,
   or the Redundancy RTP Payload format [RFC2198]. The underlying
   relation requirement for this FEC format and a particular Redundancy
   RTP Stream is to know the related Source RTP Stream, including its
   SSRC.

3.12. RTP Stream Separation

   RTP Streams can be separated exclusively based on their SSRCs, at the
   RTP Session level, or at the Multi-Media Session level.

   When the RTP Streams that have a relationship are all sent in the
   same RTP Session and are uniquely identified based on their SSRC
   only, it is termed an SSRC-Only Based Separation. Such streams can
   be related via RTCP CNAME to identify that the streams belong to the
   same Endpoint. SSRC-based approaches [RFC5576], when used, can
   explicitly relate various such RTP Streams.

   On the other hand, when RTP Streams that are related are sent in the
   context of different RTP Sessions to achieve separation, it is known
   as RTP Session-based separation. This is commonly used when the
   different RTP Streams are intended for different Media Transports.

   Several mechanisms that use RTP Session-based separation rely on it
   to enable an implicit grouping mechanism expressing the relationship.
   The solutions have been based on using the same SSRC value in the
   different RTP Sessions to implicitly indicate their relation. That
   way, no explicit RTP level mechanism has been needed, only signaling
   level relations have been established using semantics from Grouping
   of Media lines framework [RFC5888].
   Examples of this are RTP Retransmission [RFC4588], SVC Multi-Session
   Transmission [RFC6190] and XOR-based FEC [RFC5109]. RTCP CNAME
   explicitly relates RTP Streams across different RTP Sessions, as
   explained in the previous section. Such a relationship can be used
   to perform inter-media synchronization.

   RTP Streams that are related and need to be associated can be part of
   different Multimedia Sessions, rather than just different RTP
   Sessions within the same Multimedia Session context. This puts
   further demand on the scope of the mechanism(s) and its handling of
   identifiers used for expressing the relationships.

3.13. Multiple RTP Sessions over one Media Transport

   [I-D.westerlund-avtcore-transport-multiplexing] describes a mechanism
   that allows several RTP Sessions to be carried over a single
   underlying Media Transport. The main reasons for doing this are
   related to the impact of using one or more Media Transports (which
   may use a common network path or potentially different ones). The
   fewer Media Transports used, the less need for NAT/FW traversal
   resources and smaller number of flow-based Quality of Service (QoS).

   However, Multiple RTP Sessions over one Media Transport imply that a
   single Media Transport 5-tuple is not sufficient to express in which
   RTP Session context a particular RTP Stream exists. Complexities in
   the relationship between Media Transports and RTP Sessions already
   exist, as one RTP Session contains multiple Media Transports, e.g.,
   even a Peer-to-Peer RTP Session with RTP/RTCP Multiplexing requires
   two Media Transports, one in each direction. The relationship
   between Media Transports and RTP Sessions as well as additional
   levels of identifiers need to be considered in both signaling design
   and when defining terminology.

4. Mapping from Existing Terms

   This section describes a selected set of terms from some relevant
   IETF RFCs and Internet-Drafts (at the time of writing), using the
   concepts from previous sections.

4.1. Telepresence Terms

   The terms in this sub-section are used in the context of CLUE
   [I-D.ietf-clue-framework].

4.1.1. Audio Capture

   Defined in CLUE as a Media Capture (Section 4.1.7) for audio.
   Describes an audio Media Source (Section 2.1.4).

4.1.2. Capture Device

   Defined in CLUE as a device that converts physical input into an
   electrical signal. Identifies a physical entity performing a Media
   Capture (Section 2.1.2) transformation.

4.1.3. Capture Encoding

   Defined in CLUE as a specific encoding (Section 4.1.6) of a Media
   Capture (Section 4.1.7). Describes an Encoded Stream (Section 2.1.7)
   related to CLUE specific semantic information.

4.1.4. Capture Scene

   Defined in CLUE as a structure representing a spatial region captured
   by one or more Capture Devices (Section 4.1.2), each capturing media
   representing a portion of the region. Describes a set of spatially
   related Media Sources (Section 2.1.4).

4.1.5. Endpoint

   Defined in CLUE as a CLUE-capable device which is the logical point
   of final termination through receiving, decoding and rendering and/or
   initiation through capturing, encoding, and sending of media streams
   (Section 4.1.10). CLUE further defines it to consist of one or more
   physical devices with source and sink media streams, and exactly one
   [RFC4353] Participant. Describes exactly one Participant
   (Section 2.2.3) and one or more Endpoints (Section 2.2.1).

4.1.6. Individual Encoding

   Defined in CLUE as a set of parameters representing a way to encode a
   Media Capture (Section 4.1.7) to become a Capture Encoding
   (Section 4.1.3).
   Describes the configuration information needed to perform a Media
   Encoder (Section 2.1.6) transformation.

4.1.7. Media Capture

   Defined in CLUE as a source of media, such as from one or more
   Capture Devices (Section 4.1.2) or constructed from other media
   streams (Section 4.1.10). Describes either a Media Capture
   (Section 2.1.2) or a Media Source (Section 2.1.4), depending on the
   context in which the term is used.

4.1.8. Media Consumer

   Defined in CLUE as a CLUE-capable device that intends to receive
   Capture Encodings (Section 4.1.3). Describes the media receiving
   part of an Endpoint (Section 2.2.1).

4.1.9. Media Provider

   Defined in CLUE as a CLUE-capable device that intends to send Capture
   Encodings (Section 4.1.3). Describes the media sending part of an
   Endpoint (Section 2.2.1).

4.1.10. Stream

   Defined in CLUE as a Capture Encoding (Section 4.1.3) sent from a
   Media Provider (Section 4.1.9) to a Media Consumer (Section 4.1.8)
   via RTP. Describes an RTP Stream (Section 2.1.10).

4.1.11. Video Capture

   Defined in CLUE as a Media Capture (Section 4.1.7) for video.
   Describes a video Media Source (Section 2.1.4).

4.2. Media Description

   A single Session Description Protocol (SDP) [RFC4566] media
   description (or media block; an m-line and all subsequent lines until
   the next m-line or the end of the SDP) describes part of the
   necessary configuration and identification information needed for a
   Media Encoder transformation, as well as the necessary configuration
   and identification information for the Media Decoder to be able to
   correctly interpret a received RTP Stream.

   A Media Description typically relates to a single Media Source. This
   is, for example, an explicit restriction in WebRTC.
   However, nothing prevents the same Media Description (and same RTP
   Session) from being reused for multiple Media Sources
   [I-D.ietf-avtcore-rtp-multi-stream]. It can thus describe properties
   of one or more RTP Streams, and can also describe properties valid
   for an entire RTP Session (via [RFC5576] mechanisms, for example).

4.3. Media Stream

   RTP [RFC3550] uses media stream, audio stream, video stream, and
   stream of (RTP) packets interchangeably, which are all RTP Streams.

4.4. Multimedia Conference

   A Multimedia Conference is a Communication Session (Section 2.2.5)
   between two or more Participants (Section 2.2.3), along with the
   software they are using to communicate.

4.5. Multimedia Session

   SDP [RFC4566] defines a Multimedia Session as a set of multimedia
   senders and receivers and the data streams flowing from senders to
   receivers, which would correspond to a set of Endpoints and the RTP
   Streams that flow between them. In this memo, Multimedia Session
   (Section 2.2.4) also assumes those Endpoints belong to a set of
   Participants that are engaged in communication via a set of related
   RTP Streams.

   RTP [RFC3550] defines a Multimedia Session as a set of concurrent RTP
   Sessions among a common group of Participants. For example, a video
   conference may contain an audio RTP Session and a video RTP Session.
   This would correspond to a group of Participants (each using one or
   more Endpoints) sharing a set of concurrent RTP Sessions. In this
   memo, Multimedia Session also defines those RTP Sessions to have some
   relation and be part of a communication among the Participants.

4.6. Multipoint Control Unit (MCU)

   This term is commonly used to describe the central node in any type
   of star topology [I-D.ietf-avtcore-rtp-topologies-update] conference.
   It describes a device that includes one Participant (Section 2.2.3)
   (usually corresponding to a so-called conference focus) and one or
   more related Endpoints (Section 2.2.1) (sometimes one or more per
   conference Participant).

4.7. Multi-Session Transmission (MST)

   One of two transmission modes defined in H.264 based SVC [RFC6190],
   the other mode being SST (Section 4.13). In Multi-Session
   Transmission (MST), the SVC Media Encoder sends Encoded Streams and
   Dependent Streams distributed across two or more RTP Streams in one
   or more RTP Sessions. The term "MST" is ambiguous in RFC 6190,
   especially since the name indicates the use of multiple "sessions",
   while MST type packetization is in fact required whenever two or more
   RTP Streams are used for the Encoded and Dependent Streams,
   regardless of whether those are sent in one or more RTP Sessions.
   Corresponds either to MRST or MRMT (Section 3.7) stream relations
   defined in this specification. The SVC RTP Payload RFC [RFC6190] is
   not particularly explicit about how the common Media Encoder
   (Section 2.1.6) relation between Encoded Streams (Section 2.1.7) and
   Dependent Streams (Section 2.1.8) is to be implemented.

4.8. Recording Device

   WebRTC specifications use this term to refer to locally available
   entities performing a Media Capture (Section 2.1.2) transformation.

4.9. RtcMediaStream

   A WebRTC RtcMediaStream is a set of Media Sources (Section 2.1.4)
   sharing the same Synchronization Context (Section 3.1).

4.10. RtcMediaStreamTrack

   A WebRTC RtcMediaStreamTrack is a Media Source (Section 2.1.4).

4.11. RTP Sender

   RTP [RFC3550] uses this term, which can be seen as the RTP protocol
   part of a Media Packetizer (Section 2.1.9).

4.12. RTP Session

   Within the context of SDP, a single m= line can map to a single RTP
   Session (Section 2.2.2) or multiple m= lines can map to a single RTP
   Session. The latter is enabled via multiplexing schemes such as
   BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation], for example, which
   allows mapping of multiple m= lines to a single RTP Session.

4.13. Single Session Transmission (SST)

   One of two transmission modes defined in H.264 based SVC [RFC6190],
   the other mode being MST (Section 4.7). In Single Session
   Transmission (SST), the SVC Media Encoder sends Encoded Streams
   (Section 2.1.7) and Dependent Streams (Section 2.1.8) combined into a
   single RTP Stream (Section 2.1.10) in a single RTP Session
   (Section 2.2.2), using the SVC RTP Payload format. The term "SST" is
   ambiguous in RFC 6190, in that it sometimes refers to the use of a
   single RTP Stream, as in sections relating to packetization, and
   sometimes appears to refer to use of a single RTP Session, as in
   the context of discussing SDP. Closely corresponds to SRST
   (Section 3.7) defined in this specification.

4.14. SSRC

   RTP [RFC3550] defines this as "the source of a stream of RTP
   packets", which indicates that an SSRC is not only a unique
   identifier for the Encoded Stream (Section 2.1.7) carried in those
   packets, but is also effectively used as a term to denote a Media
   Packetizer (Section 2.1.9).

5. Security Considerations

   This document simply tries to clarify the confusion prevalent in RTP
   taxonomy because of inconsistent usage by multiple technologies and
   protocols making use of the RTP protocol. It does not introduce any
   new security considerations beyond those already well documented in
   the RTP protocol [RFC3550] and each of the many respective
   specifications of the various protocols making use of it.

   Hopefully, having a well-defined common terminology and understanding
   of the complexities of the RTP architecture will help lead us to
   better standards, avoiding security problems.

6. Acknowledgement

   This document has many concepts borrowed from several documents such
   as WebRTC [I-D.ietf-rtcweb-overview], CLUE [I-D.ietf-clue-framework],
   and Multiplexing Architecture
   [I-D.westerlund-avtcore-transport-multiplexing]. The authors would
   like to thank all the authors of each of those documents.

   The authors would also like to acknowledge the insights, guidance and
   contributions of Magnus Westerlund, Roni Even, Paul Kyzivat, Colin
   Perkins, Keith Drage, Harald Alvestrand, Alex Eleftheriadis, Mo
   Zanaty, Stephan Wenger, and Bernard Aboba.

7. Contributors

   Magnus Westerlund has contributed the concept model for the media
   chain, based on transformations and streams, including rewriting
   pre-existing concepts into this model and adding missing concepts.
   The first proposal for updating the relationships and the topologies
   based on this concept was also made by Magnus.

8. IANA Considerations

   This document makes no request of IANA.

9. Informative References

   [I-D.ietf-avtcore-rtp-multi-stream]
              Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
              "Sending Multiple Media Streams in a Single RTP Session",
              draft-ietf-avtcore-rtp-multi-stream-07 (work in progress),
              March 2015.

   [I-D.ietf-avtcore-rtp-topologies-update]
              Westerlund, M. and S. Wenger, "RTP Topologies",
              draft-ietf-avtcore-rtp-topologies-update-08 (work in
              progress), June 2015.

   [I-D.ietf-clue-framework]
              Duckworth, M., Pepperell, A., and S. Wenger, "Framework
              for Telepresence Multi-Streams",
              draft-ietf-clue-framework-22 (work in progress), April
              2015.

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C.
              Jennings, "Negotiating Media Multiplexing Using the
              Session Description Protocol (SDP)",
              draft-ietf-mmusic-sdp-bundle-negotiation-22 (work in
              progress), June 2015.

   [I-D.ietf-mmusic-sdp-simulcast]
              Burman, B., Westerlund, M., Nandakumar, S., and M. Zanaty,
              "Using Simulcast in SDP and RTP Sessions",
              draft-ietf-mmusic-sdp-simulcast-00 (work in progress),
              January 2015.

   [I-D.ietf-rtcweb-overview]
              Alvestrand, H., "Overview: Real Time Protocols for
              Browser-based Applications", draft-ietf-rtcweb-overview-14
              (work in progress), June 2015.

   [I-D.westerlund-avtcore-transport-multiplexing]
              Westerlund, M. and C. Perkins, "Multiplexing Multiple RTP
              Sessions onto a Single Lower-Layer Transport",
              draft-westerlund-avtcore-transport-multiplexing-07 (work
              in progress), October 2013.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J., Vega-Garcia, A., and S.
              Fosse-Parisis, "RTP Payload for Redundant Audio Data",
              RFC 2198, September 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
              Session Initiation Protocol (SIP)", RFC 4353, February
              2006.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC4867]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie,
              "RTP Payload Format and File Storage Format for the
              Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband
              (AMR-WB) Audio Codecs", RFC 4867, April 2007.

   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
              Correction", RFC 5109, December 2007.

   [RFC5404]  Westerlund, M. and I. Johansson, "RTP Payload Format for
              G.719", RFC 5404, January 2009.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

   [RFC5905]  Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network
              Time Protocol Version 4: Protocol and Algorithms
              Specification", RFC 5905, June 2010.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              May 2011.

   [RFC7160]  Petit-Huguenin, M. and G. Zorn, "Support for Multiple
              Clock Rates in an RTP Session", RFC 7160, April 2014.

   [RFC7197]  Begen, A., Cai, Y., and H. Ou, "Duplication Delay
              Attribute in the Session Description Protocol", RFC 7197,
              April 2014.

   [RFC7198]  Begen, A. and C. Perkins, "Duplicating RTP Streams", RFC
              7198, April 2014.

   [RFC7201]  Westerlund, M. and C. Perkins, "Options for Securing RTP
              Sessions", RFC 7201, April 2014.

   [RFC7273]  Williams, A., Gross, K., van Brandenburg, R., and H.
              Stokking, "RTP Clock Source Signalling", RFC 7273, June
              2014.

Appendix A. Changes From Earlier Versions

   NOTE TO RFC EDITOR: Please remove this section prior to publication.

A.1. Modifications Between WG Version -06 and -07

   Addresses comments from AD review and GenArt review.

   o  Added RTP-based Security and RTP-based Validation transform
      sections, as well as Secured RTP Stream and Received Secured RTP
      Stream sections.

   o  Improved wording in Abstract and Introduction sections.

   o  Clarified what is considered "media" in section 2.1.2 Media
      Capture.

   o  Changed a number of "Characteristics" lists to more suitable prose
      text.

   o  Re-worded text around use of Encoded and Dependent RTP Streams in
      section 2.1.9 Media Packetizer.

   o  Clarified description of Source RTP Stream in section 2.1.10.

   o  Clarified motivation to use separate Media Transports for
      Simulcast in section 3.6.

   o  Added local descriptions of terms imported from CLUE framework.

   o  Editorial improvements.

A.2. Modifications Between WG Version -05 and -06

   o  Clarified that a Redundancy RTP Stream can be used standalone to
      generate Repaired RTP Streams.

   o  Clarified that (in accordance with above) RTP-based Repair takes
      zero or more Received RTP Streams and one or more Received
      Redundancy RTP Streams as input.

   o  Changed Figure 6 to more clearly show that Media Transport is
      terminated in the Endpoint, not in the Participant.

   o  Added a sentence to Endpoint section that clarifies there may be
      contexts where a single "host" can serve multiple Participants,
      making those Endpoints share some properties.

   o  Merged previous section 3.5 on SST/MST with previous section 3.8
      on Layered Multi-Stream into a common section discussing the
      scalable/layered stream relation, and moved improved, descriptive
      text on SST and MST to new sub-sections 4.7 and 4.13, describing
      them as existing terms.

   o  Editorial improvements.

A.3. Modifications Between WG Version -04 and -05

   o  Editorial improvements.

A.4. Modifications Between WG Version -03 and -04

   o  Changed "Media Redundancy" and "Media Repair" to "RTP-based
      Redundancy" and "RTP-based Repair", since those terms are more
      specific and correct.

   o  Changed "End Point" to "Endpoint" and removed Editor's Note on
      this.

   o  Clarified that a Media Capture may impose constraints on clock
      handling.

   o  Clarified that mixing multiple Raw Streams into a Source Stream is
      not possible, since that requires mixed streams to have a timing
      relation, requiring them to be Source Streams, and added an
      example.

   o  Clarified that RTP-based Redundancy excludes the type of encoding
      redundancy found within the encoded media format in an Encoded
      Stream.

   o  Clarified that a Media Transport contains only a single RTP
      Session, but a single RTP Session can span multiple Media
      Transports.

   o  Clarified that packets with seemingly correct checksum that are
      received by a Media Transport Receiver may still be corrupt.

   o  Clarified that a corrupt packet in a Media Transport Receiver is
      typically either discarded or somehow marked and passed on in the
      Received RTP Stream.

   o  Added Synchronization Context to Figure 6.

   o  Editorial improvements and clarifications.

A.5. Modifications Between WG Version -02 and -03

   o  Changed section 3.5, removing SST-SS/MS and MST-SS/MS, replacing
      them with SRST, MRST, and MRMT.

   o  Updated section 3.8 to align with terminology changes in section
      3.5.

   o  Added a new section 4.12, describing the term Multimedia
      Conference.

   o  Changed reference from I-D to now published RFC 7273.

   o  Editorial improvements and clarifications.

A.6. Modifications Between WG Version -01 and -02

   o  Major re-structure

   o  Moved media chain Media Transport detailing up one section level

   o  Collapsed level 2 sub-sections of section 3 and thus moved level 3
      sub-sections up one level, gathering some introductory text into
      the beginning of section 3

   o  Added that not only SSRC collision, but also a clock rate change
      [RFC7160] is a valid reason to change SSRC value for an RTP stream

   o  Added a sub-section on clock source signaling

   o  Added a sub-section on RTP stream duplication

   o  Elaborated a bit in section 2.2.1 on the relation between End
      Points, Participants and CNAMEs

   o  Elaborated a bit in section 2.2.4 on Multimedia Session and
      synchronization contexts

   o  Removed the section on CLUE scenes defining an implicit
      synchronization context, since it was incorrect

   o  Clarified text on SVC SST and MST according to list discussions

   o  Removed the entire topology section to avoid possible
      inconsistencies or duplications with
      draft-ietf-avtcore-rtp-topologies-update, but saved one example
      overview figure of Communication Entities into that section

   o  Added a section 4 on mapping from existing terms with one
      sub-section per term, mainly by moving text from sections 2 and 3

   o  Changed all occurrences of Packet Stream to RTP Stream

   o  Moved all normative references to informative, since this is an
      informative document

   o  Added references to RFC 7160, RFC 7197 and RFC 7198, and removed
      unused references

A.7. Modifications Between WG Version -00 and -01

   o  WG version -00 text is identical to individual draft -03

   o  Amended description of SVC SST and MST encodings with respect to
      concepts defined in this text

   o  Removed UML as normative reference, since the text no longer uses
      any UML notation

   o  Removed a number of level 4 sections and moved out text to the
      level above

A.8. Modifications Between Version -02 and -03

   o  Section 4 rewritten (and new communication topologies added) to
      reflect the major updates to Sections 1-3

   o  Section 8 removed (carryover from initial -00 draft)

   o  General clean up of text, grammar and nits

A.9. Modifications Between Version -01 and -02

   o  Section 2 rewritten to add both streams and transformations in the
      media chain.

   o  Section 3 rewritten to focus on exposing relationships.

A.10. Modifications Between Version -00 and -01

   o  Too many to list

   o  Added new authors

   o  Updated content organization and presentation

Authors' Addresses

   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ 07601
   US

   Email: jonathan@vidyo.com


   Kevin Gross
   AVA Networks, LLC
   Boulder, CO
   US

   Email: kevin.gross@avanw.com


   Suhas Nandakumar
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US

   Email: snandaku@cisco.com


   Gonzalo Salgueiro
   Cisco Systems
   7200-12 Kit Creek Road
   Research Triangle Park, NC 27709
   US

   Email: gsalguei@cisco.com


   Bo Burman (editor)
   Ericsson
   Kistavagen 25
   SE-16480 Stockholm
   Sweden

   Email: bo.burman@ericsson.com