Network Working Group                                          J. Lennox
Internet-Draft                                                     Vidyo
Intended status: Informational                                  K. Gross
Expires: September 6, 2015                                           AVA
                                                           S. Nandakumar
                                                            G. Salgueiro
                                                           Cisco Systems
                                                               B. Burman
                                                                Ericsson
                                                           March 5, 2015


 A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport
                          Protocol (RTP) Sources
               draft-ietf-avtext-rtp-grouping-taxonomy-06

Abstract

   The terminology about, and associations among, Real-Time Transport
   Protocol (RTP) sources can be complex and somewhat opaque.  This
   document describes a number of existing and proposed relationships
   among RTP sources, and attempts to define common terminology for
   discussing protocol entities and their relationships.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 6, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Concepts
     2.1.  Media Chain
       2.1.1.  Physical Stimulus
       2.1.2.  Media Capture
       2.1.3.  Raw Stream
       2.1.4.  Media Source
       2.1.5.  Source Stream
       2.1.6.  Media Encoder
       2.1.7.  Encoded Stream
       2.1.8.  Dependent Stream
       2.1.9.  Media Packetizer
       2.1.10. RTP Stream
       2.1.11. RTP-based Redundancy
       2.1.12. Redundancy RTP Stream
       2.1.13. Media Transport
       2.1.14. Media Transport Sender
       2.1.15. Sent RTP Stream
       2.1.16. Network Transport
       2.1.17. Transported RTP Stream
       2.1.18. Media Transport Receiver
       2.1.19. Received RTP Stream
       2.1.20. Received Redundancy RTP Stream
       2.1.21. RTP-based Repair
       2.1.22. Repaired RTP Stream
       2.1.23. Media Depacketizer
       2.1.24. Received Encoded Stream
       2.1.25. Media Decoder
       2.1.26. Received Source Stream
       2.1.27. Media Sink
       2.1.28. Received Raw Stream
       2.1.29. Media Render
     2.2.  Communication Entities
       2.2.1.  Endpoint
       2.2.2.  RTP Session
       2.2.3.  Participant
       2.2.4.  Multimedia Session
       2.2.5.  Communication Session
   3.  Concepts of Inter-Relations
     3.1.  Synchronization Context
       3.1.1.  RTCP CNAME
       3.1.2.  Clock Source Signaling
       3.1.3.  Implicitly via RtcMediaStream
       3.1.4.  Explicitly via SDP Mechanisms
     3.2.  Endpoint
     3.3.  Participant
     3.4.  RtcMediaStream
     3.5.  Multi-Channel Audio
     3.6.  Simulcast
     3.7.  Layered Multi-Stream
     3.8.  RTP Stream Duplication
     3.9.  Redundancy Format
     3.10. RTP Retransmission
     3.11. Forward Error Correction
     3.12. RTP Stream Separation
     3.13. Multiple RTP Sessions over one Media Transport
   4.  Mapping from Existing Terms
     4.1.  Telepresence Terms
       4.1.1.  Audio Capture
       4.1.2.  Capture Device
       4.1.3.  Capture Encoding
       4.1.4.  Capture Scene
       4.1.5.  Endpoint
       4.1.6.  Individual Encoding
       4.1.7.  Media Capture
       4.1.8.  Media Consumer
       4.1.9.  Media Provider
       4.1.10. Stream
       4.1.11. Video Capture
     4.2.  Media Description
     4.3.  Media Stream
     4.4.  Multimedia Conference
     4.5.  Multimedia Session
     4.6.  Multipoint Control Unit (MCU)
     4.7.  Multi-Session Transmission (MST)
     4.8.  Recording Device
     4.9.  RtcMediaStream
     4.10. RtcMediaStreamTrack
     4.11. RTP Sender
     4.12. RTP Session
     4.13. Single Session Transmission (SST)
     4.14. SSRC
   5.  Security Considerations
   6.  Acknowledgement
   7.  Contributors
   8.  IANA Considerations
   9.  Informative References
   Appendix A.  Changes From Earlier Versions
     A.1.  Modifications Between WG Version -05 and -06
     A.2.  Modifications Between WG Version -04 and -05
     A.3.  Modifications Between WG Version -03 and -04
     A.4.  Modifications Between WG Version -02 and -03
     A.5.  Modifications Between WG Version -01 and -02
     A.6.  Modifications Between WG Version -00 and -01
     A.7.  Modifications Between Version -02 and -03
     A.8.  Modifications Between Version -01 and -02
     A.9.  Modifications Between Version -00 and -01
   Authors' Addresses

1.  Introduction

   The existing taxonomy of sources in RTP is often regarded as
   confusing and inconsistent.  Consequently, a deep understanding of
   how the different terms relate to each other becomes a real
   challenge.  Frequently cited examples of this confusion are (1) how
   different protocols that make use of RTP use the same terms to
   signify different things and (2) how the complexities addressed at
   one layer are often glossed over or ignored at another.

   This document attempts to provide some clarity by reviewing the
   semantics of various aspects of sources in RTP.  As an organizing
   mechanism, it approaches this by describing various ways that RTP
   sources can be grouped and associated together.

   All non-specific references to ControLling mUltiple streams for
   tElepresence (CLUE) in this document map to [I-D.ietf-clue-framework]
   and all references to Web Real-Time Communications (WebRTC) map to
   [I-D.ietf-rtcweb-overview].

2.  Concepts

   This section defines concepts that serve to identify and name various
   transformations and streams in a given RTP usage.  For each concept,
   an attempt is made to list any alternate definitions and usages that
   co-exist today, along with various characteristics that further
   describe the concept.  These concepts are divided into two
   categories: one related to the chain of streams and transformations
   that media can be subject to, the other for entities involved in the
   communication.

2.1.  Media Chain

   In the context of this memo, Media is a sequence of synthetic or
   Physical Stimuli (Section 2.1.1) (sound waves, photons, key-strokes),
   represented in digital form.  Synthesized Media is typically
   generated directly in the digital domain.

   This section contains the concepts that can be involved in taking
   Media at a sender side and transporting it to a receiver, which may
   recover a sequence of physical stimuli.  This chain of concepts is of
   two main types: streams and transformations.  Streams are time-based
   sequences of samples of the physical stimulus in various
   representations, while transformations change the representation of
   the streams in some way.

   The examples below are basic ones, and it is important to keep in
   mind that this conceptual model enables more complex usages.  Some
   will be further discussed in later sections of this document.  In
   general, the following applies to this model:

   o  A transformation may have zero or more inputs and one or more
      outputs.

   o  A stream is of some type, such as audio, video, real-time text,
      etc.

   o  A stream has one source transformation and one or more sink
      transformations (with the exception of Physical Stimulus
      (Section 2.1.1), which may lack a source or sink transformation).

   o  Streams can be forwarded from a transformation output to any
      number of inputs on other transformations that support that type.
   o  If the output of a transformation is sent to multiple
      transformations, those streams will be identical; it takes a
      transformation to make them different.

   o  There are no formal limitations on how streams are connected to
      transformations; this may include loops if required by a
      particular transformation.

   It is also important to remember that this is a conceptual model.
   Thus, real-world implementations may look different and have a
   different structure.

   To provide a basic understanding of the relationships in the chain,
   we first introduce the concepts for the sender side (Figure 1).  This
   covers the chain from physical stimuli until media packets are
   emitted onto the network.

       Physical Stimulus
               |
               V
    +--------------------+
    |   Media Capture    |
    +--------------------+
               |
          Raw Stream
               V
    +--------------------+
    |    Media Source    |<- Synchronization Timing
    +--------------------+
               |
         Source Stream
               V
    +--------------------+
    |   Media Encoder    |
    +--------------------+
               |
        Encoded Stream      +------------+
               V            |            V
    +--------------------+  | +----------------------+
    |  Media Packetizer  |  | | RTP-based Redundancy |
    +--------------------+  | +----------------------+
               |            |            |
               +------------+  Redundancy RTP Stream
       Source RTP Stream                 |
               V                         V
    +--------------------+     +--------------------+
    |  Media Transport   |     |  Media Transport   |
    +--------------------+     +--------------------+

           Figure 1: Sender Side Concepts in the Media Chain

   In Figure 1, we have included a branched chain to cover the concepts
   for using redundancy to improve the reliability of the transport.
   The Media Transport concept is an aggregate that is decomposed in
   Section 2.1.13.
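   The general rules above can be made concrete with a small model of
   the sender-side chain in Figure 1.  The Python sketch below is
   illustrative only: the classes `Transformation` and `Stream` and the
   helpers `emit` and `connect` are hypothetical names, not part of RTP
   or of any library.  It enforces two of the listed rules (a stream
   has exactly one source transformation, and a stream can only be
   forwarded to transformations that support its type) and shows one
   Source RTP Stream forwarded to two inputs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


# eq=False keeps identity comparison and avoids recursive field comparison
# between streams and transformations that reference each other.
@dataclass(eq=False)
class Transformation:
    """Hypothetical node in the conceptual media chain."""
    name: str
    accepts: set[str]                        # stream types this node can consume
    outputs: list[Stream] = field(default_factory=list)


@dataclass(eq=False)
class Stream:
    """Typed stream with exactly one source transformation."""
    name: str
    kind: str                                # e.g. "audio", "video", "text"
    source: Optional[Transformation]         # None only for a Physical Stimulus
    sinks: list[Transformation] = field(default_factory=list)


def emit(source: Transformation, name: str, kind: str) -> Stream:
    """Create a new output stream of `source`."""
    stream = Stream(name, kind, source)
    source.outputs.append(stream)
    return stream


def connect(stream: Stream, sink: Transformation) -> None:
    """Forward a stream to a transformation input that supports its type."""
    if stream.kind not in sink.accepts:
        raise ValueError(f"{sink.name} does not accept {stream.kind} streams")
    stream.sinks.append(sink)


# Sender side of Figure 1 for a single audio chain.
nodes = {name: Transformation(name, {"audio"}) for name in (
    "Media Capture", "Media Source", "Media Encoder", "Media Packetizer",
    "RTP-based Redundancy", "Media Transport",
    "Media Transport (redundancy)")}

raw = emit(nodes["Media Capture"], "Raw Stream", "audio")
connect(raw, nodes["Media Source"])
src = emit(nodes["Media Source"], "Source Stream", "audio")
connect(src, nodes["Media Encoder"])
enc = emit(nodes["Media Encoder"], "Encoded Stream", "audio")
connect(enc, nodes["Media Packetizer"])
rtp = emit(nodes["Media Packetizer"], "Source RTP Stream", "audio")
connect(rtp, nodes["Media Transport"])
connect(rtp, nodes["RTP-based Redundancy"])   # same output, two inputs
red = emit(nodes["RTP-based Redundancy"], "Redundancy RTP Stream", "audio")
connect(red, nodes["Media Transport (redundancy)"])

print([sink.name for sink in rtp.sinks])
# ['Media Transport', 'RTP-based Redundancy']
```

   Because `connect` only records an additional sink, forwarding one
   output to several transformations yields identical streams, in line
   with the rule that it takes a transformation to make them different.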
   In Figure 2, we review a receiver media chain matching the sender
   side, to look at the inverse transformations and their attempts to
   recover streams identical to those in the sender chain, subject to
   what may be lossy compression and imperfect Media Transport.  Note
   that the streams out of a reverse transformation, like the Source
   Stream out of the Media Decoder, are in many cases not the same as
   the corresponding ones on the sender side; thus, they are prefixed
   with "Received" to denote a potentially modified version.  They
   differ because some transformations are irreversible.  For example,
   lossy source coding in the Media Encoder prevents the Source Stream
   out of the Media Decoder from being the same as the one fed into the
   Media Encoder.  Other reasons include packet loss or late loss in the
   Media Transport transformation that even RTP-based Repair, if used,
   fails to repair.  However, some transformations are not always
   present, like RTP-based Repair, which cannot operate without
   Redundancy RTP Streams.
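   The irreversibility of lossy source coding can be shown with a toy
   example.  In the sketch below, uniform scalar quantization stands in
   for a lossy Media Encoder; this is a deliberate simplification, since
   real audio and video codecs are far more elaborate, and the functions
   `encode` and `decode` are hypothetical.  The decoder recovers only an
   approximation of its input, which is why its output is called a
   Received Source Stream.

```python
from __future__ import annotations


def encode(samples: list[float], step: float) -> list[int]:
    """Toy lossy 'encoder': replace each sample by its quantization index."""
    return [round(s / step) for s in samples]


def decode(indices: list[int], step: float) -> list[float]:
    """Toy 'decoder': map each index back to the centre of its bin."""
    return [i * step for i in indices]


step = 0.25
source_stream = [0.12, -0.47, 0.33, 0.98]
received_source_stream = decode(encode(source_stream, step), step)

print(received_source_stream)                    # [0.0, -0.5, 0.25, 1.0]
print(received_source_stream == source_stream)   # False
# The reconstruction error per sample is bounded by step / 2.
print(max(abs(a - b) for a, b in zip(source_stream, received_source_stream)) <= step / 2)  # True
```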
    +--------------------+     +--------------------+
    |  Media Transport   |     |  Media Transport   |
    +--------------------+     +--------------------+
               |                         |
      Received RTP Stream  Received Redundancy RTP Stream
               |                         |
               |     +-------------------+
               V     V
    +--------------------+
    |  RTP-based Repair  |
    +--------------------+
               |
      Repaired RTP Stream
               V
    +--------------------+
    | Media Depacketizer |
    +--------------------+
               |
    Received Encoded Stream
               V
    +--------------------+
    |   Media Decoder    |
    +--------------------+
               |
    Received Source Stream
               V
    +--------------------+
    |     Media Sink     |--> Synchronization Information
    +--------------------+
               |
      Received Raw Stream
               V
    +--------------------+
    |   Media Renderer   |
    +--------------------+
               |
               V
       Physical Stimulus

          Figure 2: Receiver Side Concepts of the Media Chain

2.1.1.  Physical Stimulus

   The physical stimulus is a physical event that can be sampled and
   converted to digital form by an appropriate sensor or transducer.
   This includes sound waves making up audio, photons in a light field,
   or other excitations or interactions with sensors, like keystrokes on
   a keyboard.

2.1.2.  Media Capture

   Media Capture is the process of transforming the Physical Stimulus
   (Section 2.1.1) into digital Media using an appropriate sensor or
   transducer.  The Media Capture performs a digital sampling of the
   physical stimulus, usually periodically, and outputs this in some
   representation as a Raw Stream (Section 2.1.3).  Because it is
   sampled periodically, or at least consists of timed asynchronous
   events, this data forms a stream of media data.  The Media Capture is
   normally instantiated in some type of device, i.e., a media capture
   device.  Examples of different types of media capturing devices are
   digital cameras, microphones connected to A/D converters, or
   keyboards.
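   The periodic sampling performed by a Media Capture can be sketched
   with a synthetic stimulus.  This is a toy model rather than a
   description of any real capture device: the generator name
   `capture`, the 440 Hz tone, the 8000 Hz sample rate, and the 20 ms
   frame size are illustrative assumptions, not values taken from this
   document.

```python
from __future__ import annotations

import math
from typing import Iterator


def capture(freq_hz: float, sample_rate: int, frame_ms: int,
            frames: int) -> Iterator[tuple[int, list[float]]]:
    """Periodically sample a sine-wave 'stimulus'.

    Yields (index of first sample, samples) for each frame of the Raw Stream.
    """
    per_frame = sample_rate * frame_ms // 1000      # samples per frame
    for f in range(frames):
        start = f * per_frame
        samples = [math.sin(2 * math.pi * freq_hz * n / sample_rate)
                   for n in range(start, start + per_frame)]
        yield start, samples


raw_stream = list(capture(440.0, sample_rate=8000, frame_ms=20, frames=3))
print([(start, len(samples)) for start, samples in raw_stream])
# [(0, 160), (160, 160), (320, 160)]
```

   Here the sample index is the only timing information; tying the
   stream to a reference clock is the role of the Media Source
   (Section 2.1.4).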
   Characteristics:

   o  A Media Capture is identified either by hardware/manufacturer ID
      or via a session-scoped device identifier as mandated by the
      application usage.

   o  A Media Capture can generate an Encoded Stream (Section 2.1.7) if
      the capture device supports such a configuration.

   o  The nature of the Media Capture may impose constraints on the
      clock handling in some of the subsequent steps.  For example, many
      audio or video capture devices are not completely free in
      selecting the sample rate.

2.1.3.  Raw Stream

   The time-progressing stream of digitally sampled information, usually
   periodically sampled and provided by a Media Capture (Section 2.1.2).
   A Raw Stream can also contain synthesized Media that may not require
   any explicit Media Capture, since it is already in an appropriate
   digital form.

2.1.4.  Media Source

   A Media Source is the logical source of a time-progressing digital
   media stream synchronized to a reference clock, called a Source
   Stream (Section 2.1.5).  This transformation takes one or more Raw
   Streams (Section 2.1.3) and provides a Source Stream as output.  The
   output is synchronized with a reference clock (Section 3.1), which
   can be as simple as a system-local wall clock or as complex as an
   NTP-synchronized clock.

   The output can be of different types.  One type is directly
   associated with a particular Media Capture's Raw Stream.  Others are
   more conceptual sources, like an audio mix of multiple Source Streams
   (Figure 3).  Mixing multiple streams typically requires that the
   input streams can be related in time, meaning that they have to be
   Source Streams (Section 2.1.5) rather than Raw Streams.  In Figure 3,
   the generated Source Stream is a mix of the three input Source
   Streams.
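   The requirement that mixer inputs be related in time can be sketched
   as follows.  Each hypothetical Source Stream is represented as a
   mapping from a shared reference-clock sample time to a sample value,
   and the mixer sums the samples that share a time and clips the
   result.  The dictionary representation, the function name `mix`, and
   the clipping range of -1.0 to 1.0 are assumptions made for
   illustration.

```python
from __future__ import annotations


def mix(*source_streams: dict[int, float]) -> dict[int, float]:
    """Mix Source Streams whose samples are keyed by a shared reference-clock time."""
    times = sorted(set().union(*source_streams))
    mixed = {}
    for t in times:
        total = sum(stream.get(t, 0.0) for stream in source_streams)  # missing sample = silence
        mixed[t] = max(-1.0, min(1.0, total))                          # clip to valid range
    return mixed


a = {0: 0.5, 1: 0.5, 2: 0.5}
b = {1: 0.25, 2: 0.75, 3: -0.25}
c = {0: -0.25, 2: 0.5}
print(mix(a, b, c))   # {0: 0.25, 1: 0.75, 2: 1.0, 3: -0.25}
```

   Raw Streams without a common time base could not be aligned by key
   like this, which is the reason mixing operates on Source Streams.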
      Source   Source   Source
      Stream   Stream   Stream
         |        |        |
         V        V        V
    +--------------------------+
    |       Media Source       |<-- Reference Clock
    |          Mixer           |
    +--------------------------+
                  |
                  V
            Source Stream

        Figure 3: Conceptual Media Source in form of Audio Mixer

   Another possible example of a conceptual Media Source is a video
   surveillance switch, where the input is multiple Source Streams from
   different cameras, and the output is one of those Source Streams,
   selected based on some criterion such as round-robin or a video
   activity measure.

   Characteristics:

   o  At any point, it can represent a physical captured source or a
      conceptual source.

2.1.5.  Source Stream

   A time-progressing stream of digital samples that has been
   synchronized with a reference clock and comes from a particular Media
   Source (Section 2.1.4).

2.1.6.  Media Encoder

   A Media Encoder is a transformation that is responsible for encoding
   the media data from a Source Stream (Section 2.1.5) into another
   representation, usually more compact, that is output as an Encoded
   Stream (Section 2.1.7).

   The Media Encoder step commonly includes pre-encoding
   transformations, such as scaling, resampling, etc.  The Media Encoder
   can have a significant number of configuration options that affect
   the properties of the Encoded Stream.  These include properties such
   as bit rate, start points for decoding, resolution, bandwidth, or
   other fidelity-affecting properties.  The codec actually used is also
   an important factor in many communication systems.

   Scalable Media Encoders need special attention, as they produce
   multiple outputs that are potentially of different types.
   As shown in Figure 4, a scalable Media Encoder takes one input Source
   Stream and encodes it into multiple output streams of two different
   types: at least one Encoded Stream that is independently decodable
   and one or more Dependent Streams (Section 2.1.8).  Decoding requires
   at least one Encoded Stream and zero or more Dependent Streams.  A
   Dependent Stream's dependency is one of the grouping relations this
   document discusses further in Section 3.7.

            Source Stream
                  |
                  V
    +--------------------------+
    |  Scalable Media Encoder  |
    +--------------------------+
        |         |   ...   |
        V         V         V
     Encoded  Dependent Dependent
     Stream    Stream    Stream

          Figure 4: Scalable Media Encoder Input and Outputs

   There are also other variants of encoders, like so-called Multiple
   Description Coding (MDC).  Such Media Encoders produce multiple
   independent, and thus individually decodable, Encoded Streams.
   However, (logically) combining several of these Encoded Streams into
   a single Received Source Stream during decoding leads to an
   improvement in perceived reproduction quality when compared to
   decoding a single Encoded Stream.

   Creating multiple Encoded Streams from the same Source Stream, where
   the Encoded Streams are in neither a scalable nor an MDC
   relationship, is commonly used in Simulcast
   [I-D.ietf-mmusic-sdp-simulcast] environments.

   Characteristics:

   o  A Media Source can be multiply encoded by different Media Encoders
      to provide various encoded representations.

2.1.7.  Encoded Stream

   A stream of time-synchronized encoded media that can be independently
   decoded.

   Characteristics:

   o  Due to temporal dependencies, an Encoded Stream may have
      limitations in where decoding can be started.  These entry points,
      for example, Intra frames from a video encoder, may require
      identification, and their generation may be event-based or
      configured to occur periodically.
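   The dependency rule for scalable coding described in Section 2.1.6
   (decoding needs at least one Encoded Stream, and each Dependent
   Stream needs all of its own dependencies) can be expressed as a
   closure check.  In this sketch the layer names, the `DEPENDENCIES`
   table, and the `decodable` function are hypothetical; timing and
   decoding entry points are ignored.

```python
from __future__ import annotations

# Hypothetical layer structure: stream name -> names of streams it depends on.
# An empty set marks an independently decodable Encoded Stream.
DEPENDENCIES: dict[str, set[str]] = {
    "base": set(),                  # Encoded Stream
    "enh1": {"base"},               # Dependent Stream
    "enh2": {"base", "enh1"},       # depends on another Dependent Stream too
}


def decodable(received: set[str], deps: dict[str, set[str]]) -> set[str]:
    """Return the received streams whose full dependency chain was also received."""
    ok: set[str] = set()
    changed = True
    while changed:                  # repeat until no further stream becomes decodable
        changed = False
        for name in received - ok:
            if deps[name] <= ok:    # all dependencies already decodable
                ok.add(name)
                changed = True
    return ok


print(sorted(decodable({"base", "enh1", "enh2"}, DEPENDENCIES)))  # ['base', 'enh1', 'enh2']
print(sorted(decodable({"enh1", "enh2"}, DEPENDENCIES)))          # []
print(sorted(decodable({"base", "enh2"}, DEPENDENCIES)))          # ['base']
```

   Losing the Encoded Stream makes every Dependent Stream useless, while
   losing an intermediate Dependent Stream only removes the layers above
   it.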
2.1.8.  Dependent Stream

   A stream of time-synchronized encoded media fragments that depend on
   one or more Encoded Streams (Section 2.1.7) and zero or more
   Dependent Streams in order to be decodable.

   Characteristics:

   o  Each Dependent Stream has a set of dependencies.  These
      dependencies must be understood by the parties in a Multimedia
      Session that intend to use a Dependent Stream.

2.1.9.  Media Packetizer

   The Media Packetizer is the transformation that takes one or more
   Encoded Streams (Section 2.1.7) or Dependent Streams (Section 2.1.8),
   puts their content into one or more sequences of packets, normally
   RTP packets, and outputs Source RTP Streams (Section 2.1.10).  This
   step includes generating both RTP payloads and RTP packets.

   The Media Packetizer can use multiple inputs when producing a single
   RTP Stream.  One such example is SRST packetization when using
   Scalable Video Coding (SVC) (Section 3.7).

   The Media Packetizer can also produce multiple RTP Streams, for
   example, when Encoded and/or Dependent Streams are distributed over
   multiple RTP Streams.  One example of this is MRMT packetization when
   using SVC (Section 3.7).

   Characteristics:

   o  The Media Packetizer selects which Synchronization Source(s)
      (SSRC) [RFC3550], in which RTP Sessions, are used.

   o  A Media Packetizer can combine multiple Encoded or Dependent
      Streams into one or more RTP Streams.

2.1.10.  RTP Stream

   A stream of RTP packets containing media data, source or redundant.
   The RTP Stream is identified by an SSRC belonging to a particular RTP
   Session.  The RTP Session is identified as discussed in
   Section 2.2.2.

   A Source RTP Stream is an RTP Stream containing at least some content
   from an Encoded Stream (Section 2.1.7).  Source material is any media
   material that is produced for transport over RTP without any
   additional RTP-based redundancy applied.
   Note that RTP-based redundancy excludes the type of redundancy that
   most suitable Media Encoders (Section 2.1.6) may add to the media
   format of the Encoded Stream to make it cope better with inevitable
   RTP packet losses.  This is further described in RTP-based
   Redundancy (Section 2.1.11) and Redundancy RTP Stream
   (Section 2.1.12).

   Characteristics:

   o  Each RTP Stream is identified by a Synchronization Source (SSRC)
      [RFC3550] that is carried in every RTP and RTP Control Protocol
      (RTCP) packet header.  The SSRC is unique in a specific RTP
      Session context.

   o  At any given point in time, an RTP Stream can have one and only
      one SSRC, but SSRCs for a given RTP Stream can change over time.
      SSRC collision and clock rate change [RFC7160] are examples of
      valid reasons to change the SSRC for an RTP Stream.  In those
      cases, the RTP Stream itself is not changed in any significant
      way, only the identifying SSRC number.

   o  Each SSRC defines a unique RTP sequence numbering and timing
      space.

   o  Several RTP Streams, each with their own SSRC, may represent a
      single Media Source.

   o  Several RTP Streams, each with their own SSRC, can be carried in a
      single RTP Session.

2.1.11.  RTP-based Redundancy

   RTP-based Redundancy is defined here as a transformation that
   generates redundant or repair packets sent out as a Redundancy RTP
   Stream (Section 2.1.12) to mitigate network transport impairments,
   like packet loss and delay.
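   To illustrate redundancy that is generated from source packets and
   later used for repair, the sketch below computes one XOR parity
   payload over a group of equal-length payloads and uses it to rebuild
   a single lost payload.  This is a simplified model of parity-based
   FEC, not the packet format of RFC 5109 (which, among other things,
   also covers RTP header fields and payloads of differing lengths); the
   function names are hypothetical.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length payloads."""
    if len(a) != len(b):
        raise ValueError("payloads must have equal length in this sketch")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(payloads: list[bytes]) -> bytes:
    """Redundancy: XOR of all source payloads in the protected group."""
    return reduce(xor_bytes, payloads)


def repair(received: list[bytes | None], parity: bytes) -> list[bytes]:
    """Repair: rebuild at most one missing payload (None) from the parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity payload can recover only one loss")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # parity ^ (all received payloads) leaves exactly the lost payload
        repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired


source = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = make_parity(source)                # carried in a Redundancy RTP Stream
received = [source[0], None, source[2]]     # second packet lost in transport
print(repair(received, parity) == source)   # True
```

   With a single parity payload, at most one loss per protected group
   can be recovered; the sketch raises an error otherwise.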
   RTP-based Redundancy exists in many flavors: it may generate
   independent Repair Streams that are used in addition to the Source
   Stream (like RTP Retransmission (Section 3.10) and some special types
   of Forward Error Correction, like RTP stream duplication
   (Section 3.8)); it may generate a new Source Stream by combining
   redundancy information with source information (using XOR FEC
   (Section 3.11) as a redundancy payload (Section 3.9)); or it may
   completely replace the source information with only redundancy
   packets.

2.1.12.  Redundancy RTP Stream

   An RTP Stream (Section 2.1.10) that contains no original source data,
   only redundant data, which may either be used standalone or be
   combined with one or more Received RTP Streams (Section 2.1.19) to
   produce Repaired RTP Streams (Section 2.1.22).

2.1.13.  Media Transport

   A Media Transport defines the transformation that the RTP Streams
   (Section 2.1.10) are subjected to by the end-to-end transport from
   one RTP sender to one specific RTP receiver (an RTP Session
   (Section 2.2.2) may contain multiple RTP receivers per sender).  Each
   Media Transport is defined by a transport association that is
   normally identified by a 5-tuple (source address, source port,
   destination address, destination port, transport protocol), but a
   proposal exists for sending multiple transport associations on a
   single 5-tuple [I-D.westerlund-avtcore-transport-multiplexing].

   Characteristics:

   o  A Media Transport transmits RTP Streams of RTP packets from a
      source transport address to a destination transport address.

   o  Each Media Transport contains only a single RTP Session.

   o  A single RTP Session can span multiple Media Transports.

   The Media Transport concept sometimes needs to be decomposed into
   more steps to enable discussion of what a sender emits that gets
   transformed by the network before it is received by the receiver.
   Thus, we also provide this Media Transport decomposition (Figure 5).

             RTP Stream
                  |
                  V
    +--------------------------+
    |  Media Transport Sender  |
    +--------------------------+
                  |
           Sent RTP Stream
                  V
    +--------------------------+
    |    Network Transport     |
    +--------------------------+
                  |
       Transported RTP Stream
                  V
    +--------------------------+
    | Media Transport Receiver |
    +--------------------------+
                  |
                  V
         Received RTP Stream

              Figure 5: Decomposition of Media Transport

2.1.14.  Media Transport Sender

   The first transformation within the Media Transport (Section 2.1.13)
   is the Media Transport Sender.  The sending Endpoint (Section 2.2.1)
   takes an RTP Stream and emits the packets onto the network using the
   transport association established for this Media Transport, thereby
   creating a Sent RTP Stream (Section 2.1.15).  In the process, it
   transforms the RTP Stream in several ways.  First, it generates the
   necessary protocol headers for the transport association, for example
   IP and UDP headers, thus forming IP/UDP/RTP packets.  In addition,
   the Media Transport Sender may queue, pace, or otherwise affect how
   the packets are emitted onto the network, thereby potentially
   introducing delay, jitter, and inter-packet spacings that
   characterize the Sent RTP Stream.

2.1.15.  Sent RTP Stream

   The Sent RTP Stream is the RTP Stream as it enters the first hop of
   the network path to its destination.  The Sent RTP Stream is
   identified using network transport addresses, such as, for IP/UDP,
   the 5-tuple (source IP address, source port, destination IP address,
   destination port, and protocol (UDP)).

2.1.16.  Network Transport

   Network Transport is the transformation that subjects the Sent RTP
   Stream (Section 2.1.15) to traveling from the source to the
   destination through the network.
   This transformation can result in loss of some packets, varying delay on a per-packet basis, packet duplication, and packet header or data corruption. This transformation produces a Transported RTP Stream (Section 2.1.17) at the exit of the network path.

2.1.17.  Transported RTP Stream

   The Transported RTP Stream is the RTP Stream that is emitted out of the network path at the destination, having been subjected to the Network Transport's transformation (Section 2.1.16).

2.1.18.  Media Transport Receiver

   The Media Transport Receiver is the receiver Endpoint's (Section 2.2.1) transformation of the Transported RTP Stream (Section 2.1.17) by its reception process, which results in the Received RTP Stream (Section 2.1.19). This transformation includes verifying transport checksums. Sensible system designs typically either discard packets with mismatching checksums or pass them on while marking them in the resulting Received RTP Stream, so as to alert subsequent transformations to their possibly corrupt state. In this context, it is worth noting that there is typically some probability that corrupt packets pass through undetected (with a seemingly correct checksum). Other transformations can compensate for delay variations in receiving a packet on the network interface and providing it to the application (de-jitter buffer).

2.1.19.  Received RTP Stream

   The RTP Stream (Section 2.1.10) resulting from the Media Transport's transformation, i.e., subjected to packet loss, packet corruption, packet duplication, and varying transmission delay from sender to receiver.

2.1.20.  Received Redundancy RTP Stream

   The Redundancy RTP Stream (Section 2.1.12) resulting from the Media Transport transformation, i.e., subjected to packet loss, packet corruption, and varying transmission delay from sender to receiver.

2.1.21.  RTP-based Repair

   RTP-based Repair is a transformation that takes as input zero or more Received RTP Streams (Section 2.1.19) and one or more Received Redundancy RTP Streams (Section 2.1.20). It produces one or more Repaired RTP Streams (Section 2.1.22) that are as close as possible to the corresponding sent Source RTP Streams (Section 2.1.10). It does so using different RTP-based repair methods, for example, the ones referred to in RTP-based Redundancy (Section 2.1.11).

2.1.22.  Repaired RTP Stream

   A Received RTP Stream (Section 2.1.19) for which Received Redundancy RTP Stream (Section 2.1.20) information has been used to try to recover the Source RTP Stream (Section 2.1.10) as it was before Media Transport (Section 2.1.13).

2.1.23.  Media Depacketizer

   A Media Depacketizer takes one or more RTP Streams (Section 2.1.10), depacketizes them, and attempts to reconstitute the Encoded Streams (Section 2.1.7) or Dependent Streams (Section 2.1.8) present in those RTP Streams.

   In practical implementations, the Media Depacketizer and the Media Decoder may be tightly coupled and share information to improve or optimize the overall decoding and error concealment process. It is, however, not expected that there would be any benefit in defining a taxonomy for those detailed (and likely very implementation-dependent) steps.

2.1.24.  Received Encoded Stream

   The received version of an Encoded Stream (Section 2.1.7).

2.1.25.  Media Decoder

   A Media Decoder is a transformation that is responsible for decoding Encoded Streams (Section 2.1.7) and any Dependent Streams (Section 2.1.8) into a Source Stream (Section 2.1.5).

   In practical implementations, the Media Decoder and the Media Depacketizer may be tightly coupled and share information to improve or optimize the overall decoding process in various ways.
   It is, however, not expected that there would be any benefit in defining a taxonomy for those detailed (and likely very implementation-dependent) steps.

   Characteristics:

   o  A Media Decoder has to deal with any errors in the Encoded Streams that resulted from corruption or failure to repair packet losses. Therefore, it is commonly robust to errors and losses, and includes concealment methods.

2.1.26.  Received Source Stream

   The received version of a Source Stream (Section 2.1.5).

2.1.27.  Media Sink

   The Media Sink receives a Source Stream (Section 2.1.5) that contains sampled media data, usually periodic, together with associated synchronization information. Depending on the application, this Source Stream then needs to be transformed into a Raw Stream (Section 2.1.3) that is conveyed to the Media Render (Section 2.1.29), synchronized with the output from other Media Sinks. The Media Sink may also be connected with a Media Source (Section 2.1.4) and be used as part of a conceptual Media Source.

   Characteristics:

   o  The Media Sink can further transform the Source Stream into a representation that is suitable for rendering on the Media Render, as defined by the application or system-wide configuration. This includes sample scaling, level adjustments, etc.

2.1.28.  Received Raw Stream

   The received version of a Raw Stream (Section 2.1.3).

2.1.29.  Media Render

   A Media Render takes a Raw Stream (Section 2.1.3) and converts it into Physical Stimulus (Section 2.1.1) that a human user can perceive. Examples of such devices are screens and D/A converters connected to amplifiers and loudspeakers.

   Characteristics:

   o  An Endpoint can potentially have multiple Media Renders for each media type.

2.2.  Communication Entities

   This section contains concepts for entities involved in the communication.

   +------------------------------------------------------------+
   | Communication Session                                      |
   |                                                            |
   | +----------------+                      +----------------+ |
   | | Participant A  |    +------------+    | Participant B  | |
   | |                |    | Multimedia |    |                | |
   | | +------------+ |<==>|  Session   |<==>| +------------+ | |
   | | | Endpoint A | |    |            |    | | Endpoint B | | |
   | | |            | |    +------------+    | |            | | |
   | | | +----------+-+----------------------+-+----------+ | | |
   | | | |   RTP    | |                      | |          | | | |
   | | | | Session  |-+---Media Transport----+>|          | | | |
   | | | |  Audio   |<+---Media Transport----+-|          | | | |
   | | | |          | |          ^           | |          | | | |
   | | | +----------+-+----------|-----------+-+----------+ | | |
   | | |            | |          v           | |            | | |
   | | |            | | +-----------------+  | |            | | |
   | | |            | | | Synchronization |  | |            | | |
   | | |            | | |     Context     |  | |            | | |
   | | |            | | +-----------------+  | |            | | |
   | | |            | |          ^           | |            | | |
   | | | +----------+-+----------|-----------+-+----------+ | | |
   | | | |   RTP    | |          v           | |          | | | |
   | | | | Session  |<+---Media Transport----+-|          | | | |
   | | | |  Video   |-+---Media Transport----+>|          | | | |
   | | | |          | |                      | |          | | | |
   | | | +----------+-+----------------------+-+----------+ | | |
   | | +------------+ |                      | +------------+ | |
   | +----------------+                      +----------------+ |
   +------------------------------------------------------------+

   Figure 6: Example Point-to-Point Communication Session with Two RTP Sessions

   Figure 6 shows a high-level example representation of a very basic point-to-point Communication Session between Participants A and B. It uses two RTP Sessions between A's and B's Endpoints, one for audio and one for video, with separate Media Transports for each RTP Session. The Multimedia Session shared by the Participants can, for example, be established using SIP (i.e., there is a SIP Dialog between A and B). The terms used in Figure 6 are further elaborated in the subsections below.
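
   As an illustration of how the entities in Figure 6 nest, the following Python sketch models them as plain data classes. The class names, field names, SIP URIs, CNAMEs, addresses, and port numbers are all hypothetical. They were chosen only for this example and are not defined by any specification. The sketch builds the two RTP Sessions of Figure 6. Each one uses one Media Transport per direction, and each transport is identified by its IP/UDP 5-tuple.

```python
from dataclasses import dataclass, field

# Hypothetical data model for the entities of Section 2.2 (illustrative only).


@dataclass(frozen=True)
class MediaTransport:
    """One direction of an IP/UDP transport association, keyed by its 5-tuple."""
    src: tuple[str, int]  # (source IP address, source port)
    dst: tuple[str, int]  # (destination IP address, destination port)
    protocol: str = "UDP"


@dataclass
class RtpSession:
    media: str
    transports: list[MediaTransport] = field(default_factory=list)


@dataclass
class Endpoint:
    name: str
    cnames: set[str] = field(default_factory=set)  # an Endpoint may have several CNAMEs


@dataclass
class Participant:
    signaling_address: str  # e.g., a SIP URI
    endpoints: list[Endpoint] = field(default_factory=list)


@dataclass
class MultimediaSession:
    participants: list[Participant]
    rtp_sessions: list[RtpSession]


@dataclass
class CommunicationSession:
    multimedia_sessions: list[MultimediaSession]


def figure6_example() -> CommunicationSession:
    """Point-to-point session between A and B with audio and video RTP Sessions."""
    a = Participant("sip:a@example.com", [Endpoint("A", {"a@example.com"})])
    b = Participant("sip:b@example.com", [Endpoint("B", {"b@example.com"})])
    sessions = []
    for media, port in (("audio", 5004), ("video", 5006)):
        a_addr, b_addr = ("192.0.2.1", port), ("198.51.100.2", port)
        # One Media Transport in each direction, as drawn in Figure 6.
        sessions.append(RtpSession(media, [MediaTransport(a_addr, b_addr),
                                           MediaTransport(b_addr, a_addr)]))
    return CommunicationSession([MultimediaSession([a, b], sessions)])
```

   The sketch only shows the containment structure. It does not enforce any of the rules listed in Sections 2.2.1 through 2.2.5.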

2.2.1.  Endpoint

   A single addressable entity sending or receiving RTP packets. It may be decomposed into several functional blocks, but as long as it behaves as a single RTP stack entity, it is classified as a single "Endpoint".

   Characteristics:

   o  Endpoints can be identified in several different ways. RTCP Canonical Names (CNAMEs) [RFC3550] provide a globally unique and stable identification mechanism for the duration of the Communication Session (see Section 2.2.5), but their validity applies exclusively within a Synchronization Context (Section 3.1). Thus, one Endpoint can handle multiple CNAMEs, each of which can be shared among a set of Endpoints belonging to the same Participant (Section 2.2.3). Therefore, mechanisms outside the scope of RTP, such as application-defined mechanisms, must be used to ensure Endpoint identification outside this Synchronization Context.

   o  An Endpoint can be associated with at most one Participant (Section 2.2.3) at any single point in time.

   o  In some contexts, an Endpoint would typically correspond to a single "host", for example, a computer using a single network interface and being used by a single human user. In other contexts, a single "host" can serve multiple Participants. In that case, each Participant's Endpoint may share properties, for example, the IP address part of a transport address.

2.2.2.  RTP Session

   An RTP Session is an association among a group of Participants communicating with RTP. It is a group communications channel that can potentially carry a number of RTP Streams. Within an RTP Session, every Participant can find metadata and control information (over RTCP) about all the RTP Streams in the RTP Session. The bandwidth of the RTCP control channel is shared among all Participants within an RTP Session.

   Characteristics:

   o  An RTP Session can carry one or more RTP Streams.

   o  An RTP Session shares a single SSRC space as defined in RFC 3550 [RFC3550]. That is, the Endpoints participating in an RTP Session can see an SSRC identifier transmitted by any of the other Endpoints. An Endpoint can receive an SSRC either as an SSRC or as a Contributing Source (CSRC) in RTP and RTCP packets, as defined by the Endpoints' network interconnection topology.

   o  An RTP Session uses at least two Media Transports (Section 2.1.13), one for sending and one for receiving. Commonly, the receiving Media Transport is the reverse direction of the Media Transport used for sending. An RTP Session may use many Media Transports, and these define the session's network interconnection topology.

   o  A single Media Transport always carries a single RTP Session.

   o  Multiple RTP Sessions can be conceptually related. They may, for example, originate from or be targeted to the same Participant (Section 2.2.3) or Endpoint (Section 2.2.1), or contain RTP Streams that are somehow related (Section 3).

2.2.3.  Participant

   A Participant is an entity reachable by a single signaling address and is thus related more to the signaling context than to the media context.

   Characteristics:

   o  A single signaling-addressable entity, using an application-specific signaling address space, for example, a SIP URI.

   o  A Participant can participate in several Multimedia Sessions (Section 2.2.4).

   o  A Participant can comprise several associated Endpoints (Section 2.2.1).

2.2.4.  Multimedia Session

   A Multimedia Session is an association among a group of Participants (Section 2.2.3) engaged in communication via one or more RTP Sessions (Section 2.2.2). It defines logical relationships among Media Sources (Section 2.1.4) that appear in multiple RTP Sessions.

   Characteristics:

   o  A Multimedia Session can be composed of several RTP Sessions, with potentially multiple RTP Streams per RTP Session.

   o  Each Participant in a Multimedia Session can have a multitude of Media Captures and Media Rendering devices.

   o  A single Multimedia Session can contain media from one or more Synchronization Contexts (Section 3.1). For example, a Multimedia Session may contain one set of audio and video for communication purposes that belongs to one Synchronization Context. It may also contain another set of audio and video for presentation purposes (like playing a video file) with a separate Synchronization Context. The second set has no strong timing relationship with the first and need not be strictly synchronized with the audio and video used for communication.

2.2.5.  Communication Session

   A Communication Session is an association among two or more Participants (Section 2.2.3) communicating with each other via one or more Multimedia Sessions (Section 2.2.4).

   Characteristics:

   o  Each Participant in a Communication Session is identified via an application-specific signaling address.

   o  A Communication Session is composed of Participants that share at least one Multimedia Session, involving one or more parallel RTP Sessions with potentially multiple RTP Streams per RTP Session.

   For example, in a full mesh communication, the Communication Session consists of a set of separate Multimedia Sessions between each pair of Participants. Another example is a centralized conference, where the Communication Session consists of a set of Multimedia Sessions between each Participant and the conference handler.

3.  Concepts of Inter-Relations

   This section uses the concepts from previous sections and looks at different types of relationships among them.
   These relationships occur at different abstraction levels and for different purposes. The reason a relationship is needed at one step in the media handling chain may lie at another step. For example, the use of Simulcast (Section 3.6) implies a need to determine relations at the RTP Stream level. The underlying reason, however, is that multiple Media Encoders use the same Media Source, and the relation is needed to identify that common Media Source.

3.1.  Synchronization Context

   A Synchronization Context defines a requirement for a strong timing relationship between the Media Sources, typically requiring alignment of clock sources. Such a relationship can be identified in multiple ways, as listed below. A single Media Source can only belong to a single Synchronization Context. This is because a single Media Source is assumed to have only a single media clock, and requiring alignment to several Synchronization Contexts (and thus reference clocks) would effectively merge those into a single Synchronization Context.

3.1.1.  RTCP CNAME

   RFC 3550 [RFC3550] describes inter-media synchronization between RTP Sessions based on RTCP CNAME, RTP timestamps, and Network Time Protocol (NTP) [RFC5905] formatted timestamps of a reference clock. As indicated in [RFC7273], despite the use of NTP-format timestamps, the clock is not required to be synchronized to an NTP source.

3.1.2.  Clock Source Signaling

   [RFC7273] provides a mechanism to signal the clock source in the Session Description Protocol (SDP) [RFC4566], both for the reference clock and for the media clock. This allows a Synchronization Context to be defined beyond the one defined by the use of CNAME source descriptions.

3.1.3.  Implicitly via RtcMediaStream

   WebRTC defines "RtcMediaStream" with one or more "RtcMediaStreamTracks".
   All tracks in an "RtcMediaStream" are intended to be synchronized when rendered, implying that they must be generated such that synchronization is possible.

3.1.4.  Explicitly via SDP Mechanisms

   The SDP Grouping Framework [RFC5888] defines an m= line (Section 4.2) grouping mechanism called "Lip Synchronization" (with the LS identification-tag). It is used to establish the synchronization requirement across m= lines when they map to individual sources.

   Source-Specific Media Attributes in SDP [RFC5576] extends the above mechanism to cases where multiple Media Sources are described by a single m= line.

3.2.  Endpoint

   Some applications require knowledge of which Media Sources originate from a particular Endpoint (Section 2.2.1). For example, decisions about packet routing between parts of the topology can depend on knowing the Endpoint origin of the RTP Streams.

   In RTP, this identification has been overloaded with the Synchronization Context (Section 3.1) through the use of the RTCP source description CNAME (Section 3.1.1). This works for some usages, but in others it breaks down. Consider an Endpoint with two sets of Media Sources that have different Synchronization Contexts: the audio and video of the human Participant, and a set of audio and video Media Sources for a shared movie. In that case, CNAME would not be an appropriate identification for that Endpoint. Therefore, an Endpoint may have multiple CNAMEs. The CNAMEs or the Media Sources themselves can be related to the Endpoint.

3.3.  Participant

   In communication scenarios, it is commonly necessary to know which Media Sources originate from which Participant (Section 2.2.3). One reason is, for example, to enable the application to display Participant identity information correctly associated with the Media Sources.
   This association is handled by the signaling solution, which points at a specific Multimedia Session where the Media Sources may be explicitly or implicitly tied to a particular Endpoint.

   Participant information becomes more problematic for Media Sources that are generated through mixing or other conceptual processing of Raw Streams or Source Streams that originate from different Participants. This type of Media Source can thus have a dynamically varying set of origins and Participants. RTP contains the concept of CSRCs. At the RTP level, CSRCs carry information about the origin of the included media content at the previous processing step.

3.4.  RtcMediaStream

   An RtcMediaStream in WebRTC is an explicit grouping of a set of Media Sources (RtcMediaStreamTracks) that share a common identifier and a single Synchronization Context (Section 3.1).

3.5.  Multi-Channel Audio

   A number of RTP payload formats can carry multi-channel audio even though the codec is a mono encoder. Multi-channel audio can be viewed as multiple Media Sources sharing a common Synchronization Context. These are independently encoded by a Media Encoder. The resulting Encoded Streams are then packetized together in a time-synchronized way into a single Source RTP Stream, using the codec's RTP payload format. Examples of codecs that support multi-channel audio are PCMA and PCMU [RFC3551], AMR [RFC4867], and G.719 [RFC5404].

3.6.  Simulcast

   A Media Source represented as multiple independent Encoded Streams constitutes a Simulcast [I-D.ietf-mmusic-sdp-simulcast] or MDC of that Media Source. Figure 7 shows an example of a Media Source that is encoded into three separate Simulcast streams, which are in turn sent on the same Media Transport flow.
   When using Simulcast, the RTP Streams may share an RTP Session and Media Transport, may be separated on different RTP Sessions and Media Transports, or may use any combination of these two. Other considerations determine which usage is desirable, as discussed in Section 3.12.

                           +----------------+
                           |  Media Source  |
                           +----------------+
                     Source Stream |
            +----------------------+----------------------+
            |                      |                      |
            V                      V                      V
   +------------------+   +------------------+   +------------------+
   |  Media Encoder   |   |  Media Encoder   |   |  Media Encoder   |
   +------------------+   +------------------+   +------------------+
            | Encoded              | Encoded              | Encoded
            | Stream               | Stream               | Stream
            V                      V                      V
   +------------------+   +------------------+   +------------------+
   | Media Packetizer |   | Media Packetizer |   | Media Packetizer |
   +------------------+   +------------------+   +------------------+
            | Source               | Source               | Source
            | RTP                  | RTP                  | RTP
            | Stream               | Stream               | Stream
            +-----------------+    |    +-----------------+
                              |    |    |
                              V    V    V
                         +-------------------+
                         |  Media Transport  |
                         +-------------------+

              Figure 7: Example of Media Source Simulcast

   The Simulcast relation between the RTP Streams is the common Media Source. Beyond identifying the common Media Source, a receiver of the RTP Streams may need to know the configuration or encoding goals behind each produced Encoded Stream and its properties. This enables selection of the stream that is most useful to the application at that moment.

3.7.  Layered Multi-Stream

   Layered Multi-Stream (LMS) is a mechanism by which different portions of a layered or scalable encoding of a Source Stream are sent using separate RTP Streams (sometimes in separate RTP Sessions). LMSs are useful for receiver control of layered media.
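
   As one illustration of such receiver control, the following Python sketch picks which layers' RTP Streams to receive under a bandwidth budget. It assumes a strictly linear layer dependency, in which each layer needs all lower layers. It also assumes that per-layer bitrates are known in advance, for example from signaling. The "LayerStream" type and "select_layers" function are hypothetical names and are not part of any specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerStream:
    """Hypothetical description of one RTP Stream in a Layered Multi-Stream."""
    ssrc: int
    layer: int         # 0 = base layer (Encoded Stream); >0 = Dependent Streams
    bitrate_kbps: int  # assumed known in advance, e.g., from signaling


def select_layers(streams: list[LayerStream], budget_kbps: int) -> list[LayerStream]:
    """Pick layers lowest-first while they fit within the budget.

    Assumes linear dependency: layer n is only decodable together with
    layers 0..n-1, so selection stops at the first layer that does not fit.
    """
    chosen = []
    used = 0
    for stream in sorted(streams, key=lambda s: s.layer):
        if used + stream.bitrate_kbps > budget_kbps:
            break  # higher layers would be useless without this one
        chosen.append(stream)
        used += stream.bitrate_kbps
    return chosen
```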

   A Media Source represented as an Encoded Stream and multiple Dependent Streams constitutes a Media Source that has layered dependencies. Figure 8 represents an example of a Media Source that is encoded into three dependent layers. Two layers are sent on the same Media Transport using different RTP Streams, i.e., SSRCs, and the third layer is sent on a separate Media Transport.

                           +----------------+
                           |  Media Source  |
                           +----------------+
                                   |
                                   |
                                   V
   +----------------------------------------------------------------+
   |                         Media Encoder                          |
   +----------------------------------------------------------------+
           |                       |                       |
    Encoded Stream         Dependent Stream        Dependent Stream
           |                       |                       |
           V                       V                       V
   +----------------+      +----------------+      +----------------+
   |Media Packetizer|      |Media Packetizer|      |Media Packetizer|
   +----------------+      +----------------+      +----------------+
           |                       |                       |
      RTP Stream              RTP Stream              RTP Stream
           |                       |                       |
           +------+         +------+                       |
                  |         |                              |
                  V         V                              V
              +-----------------+                 +-----------------+
              | Media Transport |                 | Media Transport |
              +-----------------+                 +-----------------+

           Figure 8: Example of Media Source Layered Dependency

   It is sometimes useful to distinguish between using a single Media Transport and using multiple separate Media Transports when (in both cases) multiple RTP Streams carry the Encoded Streams and Dependent Streams of a Media Source. Therefore, the following new terminology is defined here:

   SRST:  Single RTP Stream on a Single Media Transport

   MRST:  Multiple RTP Streams on a Single Media Transport

   MRMT:  Multiple RTP Streams on Multiple Media Transports

   MRST and MRMT relations need to identify the common Media Encoder origin for the Encoded and Dependent Streams.
   When using different RTP Sessions, and thus different Media Transports, a common SSRC and CNAME can be used to identify the common Media Source (MRMT). This holds as long as there is only one RTP Stream per Media Encoder and a single Media Source in each RTP Session. When multiple RTP Streams are sent from one Media Encoder in the same RTP Session (MRST), CNAME is the only currently specified RTP identifier that can be used. Multiple Media Encoders may use multiple Media Sources that share a Synchronization Context and thus a common CNAME. In such cases, additional heuristics or identification need to be applied to create the MRST or MRMT relationships between the RTP Streams.

3.8.  RTP Stream Duplication

   RTP Stream Duplication [RFC7198], using the same or different Media Transports and optionally also delaying the duplicate [RFC7197], offers a simple way to protect media flows from packet loss in some cases (see Figure 9). It is a specific type of redundancy. All but one of the RTP Streams are effectively Redundancy RTP Streams (Section 2.1.12), but since the Source RTP Stream (Section 2.1.10) and the Redundancy RTP Streams are identical, it does not matter which one is which. This can also be seen as a specific type of Simulcast (Section 3.6) that transmits the same Encoded Stream (Section 2.1.7) multiple times.
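
   Because the copies are identical, a receiver can keep the first copy of each packet and discard the rest. The following Python sketch shows that idea. It assumes, as in [RFC7198], that a duplicate carries the same SSRC and sequence number as its original. The "DuplicateFilter" class, its bounded history size, and the restriction to reading the fixed 12-byte RTP header are illustrative choices, not requirements of [RFC7198].

```python
import struct
from collections import OrderedDict


def rtp_key(packet: bytes) -> tuple[int, int]:
    """Return (SSRC, sequence number) from the fixed RTP header."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    seq, = struct.unpack_from("!H", packet, 2)
    ssrc, = struct.unpack_from("!I", packet, 8)
    return ssrc, seq


class DuplicateFilter:
    """Hypothetical helper: pass the first copy of each (SSRC, seq), drop later copies.

    Remembers at most `history` keys, so very late duplicates may slip
    through; the default size is an arbitrary assumption for this sketch.
    """

    def __init__(self, history: int = 1024) -> None:
        self.history = history
        self.seen = OrderedDict()  # (ssrc, seq) -> None, oldest first

    def accept(self, packet: bytes) -> bool:
        key = rtp_key(packet)
        if key in self.seen:
            return False  # a copy of a packet that was already delivered
        self.seen[key] = None
        if len(self.seen) > self.history:
            self.seen.popitem(last=False)  # forget the oldest key
        return True
```

   When the duplicate is delayed as in [RFC7197], the history needs to cover at least the packets sent during that delay. Otherwise, the delayed copies would no longer be recognized.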

                         +----------------+
                         |  Media Source  |
                         +----------------+
                   Source Stream |
                                 V
                         +----------------+
                         | Media Encoder  |
                         +----------------+
                  Encoded Stream |
                     +-----------+-----------+
                     |                       |
                     V                       V
            +------------------+    +------------------+
            | Media Packetizer |    | Media Packetizer |
            +------------------+    +------------------+
              Source | RTP Stream     Source | RTP Stream
                     |                       V
                     |                +-------------+
                     |                | Delay (opt) |
                     |                +-------------+
                     |                       |
                     +-----------+-----------+
                                 |
                                 V
                       +-------------------+
                       |  Media Transport  |
                       +-------------------+

              Figure 9: Example of RTP Stream Duplication

3.9.  Redundancy Format

   The RTP Payload for Redundant Audio Data [RFC2198] defines a transport for redundant audio data together with primary data in the same RTP payload. The redundant data can be a time delayed version of the primary or another time delayed Encoded Stream using a different Media Encoder to encode the same Media Source as the primary, as depicted in Figure 10.
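
   The block structure that carries this relation can be illustrated with a short Python sketch that splits an RFC 2198 payload into its blocks. It follows the header layout of [RFC2198]. Each redundant block has a 4-octet header containing an F bit, a 7-bit block payload type, a 14-bit timestamp offset, and a 10-bit block length. The primary block follows with a 1-octet header in which the F bit is clear. The sketch omits many sanity checks that a real receiver would need, and "parse_red" is a hypothetical helper name.

```python
def parse_red(payload: bytes) -> list[tuple[int, int, bytes]]:
    """Hypothetical helper: split an RFC 2198 payload into (PT, ts offset, data).

    Redundant blocks come first, in header order; the primary block is last
    and is reported with a timestamp offset of 0.
    """
    headers = []
    pos = 0
    while True:
        if pos >= len(payload):
            raise ValueError("truncated block header")
        if payload[pos] & 0x80:  # F bit set: 4-octet header of a redundant block
            if pos + 4 > len(payload):
                raise ValueError("truncated block header")
            word = int.from_bytes(payload[pos:pos + 4], "big")
            block_pt = (word >> 24) & 0x7F
            ts_offset = (word >> 10) & 0x3FFF  # 14 bits
            length = word & 0x3FF              # 10 bits
            headers.append((block_pt, ts_offset, length))
            pos += 4
        else:  # F bit clear: 1-octet header of the primary block, always last
            headers.append((payload[pos] & 0x7F, 0, None))
            pos += 1
            break
    blocks = []
    for block_pt, ts_offset, length in headers:
        if length is None:
            length = len(payload) - pos  # the primary block takes the rest
        if pos + length > len(payload):
            raise ValueError("block length exceeds payload")
        blocks.append((block_pt, ts_offset, payload[pos:pos + length]))
        pos += length
    return blocks
```

   A receiver uses each redundant block's timestamp offset to find the earlier packet that the block can stand in for. That packet's timestamp is the RTP header timestamp minus the offset. This offset is the meta-information that relates different parts of the same Encoded Stream.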

                        +--------------------+
                        |    Media Source    |
                        +--------------------+
                                  |
                            Source Stream
                                  |
                      +------------------------+
                      |                        |
                      V                        V
            +--------------------+   +--------------------+
            |   Media Encoder    |   |   Media Encoder    |
            +--------------------+   +--------------------+
                      |                        |
                      |                 +------------+
                Encoded Stream          | Time Delay |
                      |                 +------------+
                      |                        |
                      |     +------------------+
                      V     V
            +--------------------+
            |  Media Packetizer  |
            +--------------------+
                      |
                      V
                  RTP Stream

    Figure 10: Concept for usage of Audio Redundancy with different Media Encoders

   The Redundancy format thus provides the necessary meta-information to correctly relate different parts of the same Encoded Stream. In the case depicted above (Figure 10), it relates the Received Source Stream fragments coming out of different Media Decoders so that they can be combined into a less erroneous Source Stream.

3.10.  RTP Retransmission

   Figure 11 shows an example where a Media Source's Source RTP Stream is protected by a retransmission (RTX) flow [RFC4588]. In this example, the Source RTP Stream and the Redundancy RTP Stream share the same Media Transport.
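
   The following Python sketch shows how a receiver might turn an RTX packet back into the original packet of its Source RTP Stream. In the RTX payload format [RFC4588], the first two octets of the RTX payload carry the original sequence number (OSN), and the RTX packet carries the original timestamp. The RTX payload type is mapped back to the original one through the "apt" format parameter. The sketch leaves the association between the RTX SSRC and the original SSRC to the caller as a plain mapping. It also assumes packets without CSRCs, header extensions, or padding. The function name and its parameters are hypothetical.

```python
import struct


def rtx_to_original(rtx: bytes, apt: dict[int, int], ssrc_map: dict[int, int]) -> bytes:
    """Hypothetical helper: rebuild an original RTP packet from an RTX packet.

    apt:      RTX payload type -> original payload type (the "apt" parameter).
    ssrc_map: RTX SSRC -> original SSRC; learning this association (e.g., via
              CNAME and heuristics, or via signaling) is outside this sketch.
    Assumes no CSRCs, header extension, or padding in the RTX packet.
    """
    if len(rtx) < 14:
        raise ValueError("too short for an RTP header plus OSN")
    b0, b1, _rtx_seq, timestamp, rtx_ssrc = struct.unpack_from("!BBHII", rtx, 0)
    if b0 >> 6 != 2 or b0 & 0x3F:  # require version 2 with P, X and CC all zero
        raise ValueError("unsupported RTP header for this sketch")
    marker, rtx_pt = b1 & 0x80, b1 & 0x7F
    osn, = struct.unpack_from("!H", rtx, 12)  # original sequence number
    header = struct.pack("!BBHII", b0, marker | apt[rtx_pt], osn,
                         timestamp, ssrc_map[rtx_ssrc])
    return header + rtx[14:]  # the original payload follows the OSN
```

   When the RTX stream is instead sent in its own RTP Session, [RFC4588] uses the same SSRC in both sessions. In that case, "ssrc_map" reduces to the identity mapping.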

            +--------------------+
            |    Media Source    |
            +--------------------+
                      |
                      V
            +--------------------+
            |   Media Encoder    |
            +--------------------+
                      |                             Retransmission
                Encoded Stream     +--------+     +---- Request
                      V            |        V     V
            +--------------------+ | +--------------------+
            |  Media Packetizer  | | | RTP Retransmission |
            +--------------------+ | +--------------------+
                      |            |            |
                      +------------+  Redundancy RTP Stream
              Source RTP Stream                 |
                      |                         |
                      +---------+     +---------+
                                |     |
                                V     V
                          +-----------------+
                          | Media Transport |
                          +-----------------+

           Figure 11: Example of Media Source Retransmission Flows

   The RTP Retransmission example (Figure 11) illustrates that this mechanism works purely on the Source RTP Stream. The RTP Retransmission transform buffers the sent Source RTP Stream. Upon request, it emits a retransmitted packet with an extra payload header as a Redundancy RTP Stream. The RTP Retransmission mechanism [RFC4588] is specified such that there is a one-to-one relation between the Source RTP Stream and the Redundancy RTP Stream. Therefore, a Redundancy RTP Stream needs to be associated with its Source RTP Stream. This association is based on CNAME selectors and on heuristics. The heuristics match the packets requested for a given Source RTP Stream with the original sequence number carried in the payload of any new Redundancy RTP Stream that uses the RTX payload format. The Redundancy RTP Stream may be sent in a separate RTP Session from the Source RTP Stream. In that case, the sessions are related, and the relation is signaled using the SDP Media Grouping's [RFC5888] Flow Identification (FID identification-tag) semantics.

3.11.  Forward Error Correction

   Figure 12 shows an example where two Media Sources' Source RTP Streams are protected by Forward Error Correction (FEC).
   Source RTP Stream A has an RTP-based Redundancy transformation in FEC Encoder 1. This produces a Redundancy RTP Stream 1 that is only related to Source RTP Stream A. FEC Encoder 2, however, takes two Source RTP Streams (A and B) and produces a Redundancy RTP Stream 2 that protects them jointly, i.e., Redundancy RTP Stream 2 relates to two Source RTP Streams (a FEC group). FEC decoding may be needed due to packet loss or packet corruption at the receiver. When it is, it requires knowledge of which Source RTP Streams the FEC encoding was based on.

   In Figure 12, all RTP Streams are sent on the same Media Transport. This is, however, not the only possible choice. Numerous combinations exist for spreading these RTP Streams over different Media Transports to achieve the communication application's goal.

      +--------------------+            +--------------------+
      |   Media Source A   |            |   Media Source B   |
      +--------------------+            +--------------------+
                |                                 |
                V                                 V
      +--------------------+            +--------------------+
      |  Media Encoder A   |            |  Media Encoder B   |
      +--------------------+            +--------------------+
                |                                 |
         Encoded Stream                    Encoded Stream
                V                                 V
      +--------------------+            +--------------------+
      | Media Packetizer A |            | Media Packetizer B |
      +--------------------+            +--------------------+
                |                                 |
       Source RTP Stream A               Source RTP Stream B
                |                                 |
         +------+-------+----------------+   +----+-------+
         |              V                V   V            |
         |      +---------------+  +---------------+      |
         |      | FEC Encoder 1 |  | FEC Encoder 2 |      |
         |      +---------------+  +---------------+      |
         |   Redundancy |       Redundancy |              |
         | RTP Stream 1 |     RTP Stream 2 |              |
         V              V                  V              V
      +------------------------------------------------------+
      |                   Media Transport                    |
      +------------------------------------------------------+

             Figure 12: Example of FEC Redundancy RTP Streams

   FEC encoding exists in various forms, so there are many methods for relating FEC Redundancy RTP Streams to their source information in Source RTP Streams. The XOR-based RTP FEC Payload format [RFC5109] is defined in such a way that a Redundancy RTP Stream has a one-to-one relation with a Source RTP Stream. In fact, that RFC requires the Redundancy RTP Stream to use the same SSRC as the Source RTP Stream. This requires either using a separate RTP Session or using the Redundancy RTP Payload format [RFC2198]. The underlying relation requirement for this FEC format and a particular Redundancy RTP Stream is to know the related Source RTP Stream, including its SSRC.

3.12.  RTP Stream Separation

   RTP Streams can be separated exclusively based on their SSRCs, at the RTP Session level, or at the Multimedia Session level.

   Related RTP Streams may all be sent in the same RTP Session and be uniquely identified based on their SSRC only. This is termed SSRC-Only Based Separation. Such streams can be related via RTCP CNAME to identify that they belong to the same Endpoint. SSRC-based approaches [RFC5576], when used, can explicitly relate various such RTP Streams.

   On the other hand, related RTP Streams may be sent in the context of different RTP Sessions to achieve separation. This is known as RTP Session-based separation. It is commonly used when the different RTP Streams are intended for different Media Transports.

   Several mechanisms that use RTP Session-based separation rely on it to enable an implicit grouping mechanism expressing the relationship. These solutions use the same SSRC value in the different RTP Sessions to implicitly indicate the relation.
   That way, no explicit RTP-level mechanism has been needed. Only signaling-level relations have been established, using semantics from the Grouping of Media Lines framework [RFC5888]. Examples of this are RTP Retransmission [RFC4588], SVC Multi-Session Transmission [RFC6190], and XOR-Based FEC [RFC5109]. RTCP CNAME explicitly relates RTP Streams across different RTP Sessions, as explained in the previous section. Such a relationship can be used to perform inter-media synchronization.

   Related RTP Streams that need to be associated can be part of different Multimedia Sessions, rather than just different RTP Sessions within the same Multimedia Session context. This places further demands on the scope of the mechanism(s) and on their handling of the identifiers used to express the relationships.

3.13.  Multiple RTP Sessions over one Media Transport

   [I-D.westerlund-avtcore-transport-multiplexing] describes a mechanism that allows several RTP Sessions to be carried over a single underlying Media Transport. The main reasons for doing this relate to the impact of using one or more Media Transports, which may share a common network path or potentially use different ones. The fewer Media Transports used, the less need there is for NAT/FW traversal resources and for flow-based Quality of Service (QoS).

   However, carrying multiple RTP Sessions over one Media Transport means that a single Media Transport 5-tuple is not sufficient to express the RTP Session context in which a particular RTP Stream exists. Complexities in the relationship between Media Transports and RTP Sessions already exist, as one RTP Session contains multiple Media Transports. For example, even a peer-to-peer RTP Session with RTP/RTCP multiplexing requires two Media Transports, one in each direction.
   The relationship between Media Transports and RTP Sessions, as well as additional levels of identifiers, needs to be considered both in signaling design and when defining terminology.

4.  Mapping from Existing Terms

   This section describes a selected set of terms from some relevant IETF RFCs and Internet-Drafts (at the time of writing), using the concepts from previous sections.

4.1.  Telepresence Terms

   The terms in this sub-section are used in the context of CLUE [I-D.ietf-clue-framework].

4.1.1.  Audio Capture

   Describes an audio Media Source (Section 2.1.4).

4.1.2.  Capture Device

   Identifies a physical entity performing a Media Capture (Section 2.1.2) transformation.

4.1.3.  Capture Encoding

   Describes an Encoded Stream (Section 2.1.7) related to CLUE-specific semantic information.

4.1.4.  Capture Scene

   Describes a set of spatially related Media Sources (Section 2.1.4).

4.1.5.  Endpoint

   Describes exactly one Participant (Section 2.2.3) and one or more Endpoints (Section 2.2.1).

4.1.6.  Individual Encoding

   Describes the configuration information needed to perform a Media Encoder (Section 2.1.6) transformation.

4.1.7.  Media Capture

   Describes either a Media Capture (Section 2.1.2) or a Media Source (Section 2.1.4), depending on the context in which the term is used.

4.1.8.  Media Consumer

   Describes the media receiving part of an Endpoint (Section 2.2.1).

4.1.9.  Media Provider

   Describes the media sending part of an Endpoint (Section 2.2.1).

4.1.10.  Stream

   Describes an RTP Stream (Section 2.1.10).

4.1.11.  Video Capture

   Describes a video Media Source (Section 2.1.4).

4.2.  Media Description

   A single Session Description Protocol (SDP) [RFC4566] media description (or media block: an m-line and all subsequent lines until the next m-line or the end of the SDP) describes part of the configuration and identification information needed for a Media Encoder transformation, as well as the configuration and identification information the Media Decoder needs to correctly interpret a received RTP Stream.

   A Media Description typically relates to a single Media Source. This is, for example, an explicit restriction in WebRTC. However, nothing prevents the same Media Description (and the same RTP Session) from being re-used for multiple Media Sources [I-D.ietf-avtcore-rtp-multi-stream]. A Media Description can thus describe properties of one or more RTP Streams, and can also describe properties valid for an entire RTP Session (via [RFC5576] mechanisms, for example).

4.3.  Media Stream

   RTP [RFC3550] uses the terms media stream, audio stream, video stream, and stream of (RTP) packets interchangeably; all of these are RTP Streams.

4.4.  Multimedia Conference

   A Multimedia Conference is a Communication Session (Section 2.2.5) between two or more Participants (Section 2.2.3), along with the software they are using to communicate.

4.5.  Multimedia Session

   SDP [RFC4566] defines a Multimedia Session as a set of multimedia senders and receivers and the data streams flowing from senders to receivers, which would correspond to a set of Endpoints and the RTP Streams that flow between them. In this memo, Multimedia Session (Section 2.2.4) also assumes those Endpoints belong to a set of Participants that are engaged in communication via a set of related RTP Streams.

   RTP [RFC3550] defines a Multimedia Session as a set of concurrent RTP Sessions among a common group of Participants.
   For example, a video conference may contain an audio RTP Session and a video RTP Session. This would correspond to a group of Participants (each using one or more Endpoints) sharing a set of concurrent RTP Sessions. In this memo, Multimedia Session also requires those RTP Sessions to have some relation and to be part of a communication among the Participants.

4.6.  Multipoint Control Unit (MCU)

   This term is commonly used to describe the central node in any type of star-topology conference [I-D.ietf-avtcore-rtp-topologies-update]. It describes a device that includes one Participant (Section 2.2.3) (usually corresponding to a so-called conference focus) and one or more related Endpoints (Section 2.2.1) (sometimes one or more per conference Participant).

4.7.  Multi-Session Transmission (MST)

   One of two transmission modes defined in H.264-based SVC [RFC6190], the other mode being SST (Section 4.13). In Multi-Session Transmission (MST), the SVC Media Encoder sends Encoded Streams and Dependent Streams distributed across two or more RTP Streams in one or more RTP Sessions. The term "MST" is ambiguous in RFC 6190, especially since the name indicates the use of multiple "sessions", while MST-type packetization is in fact required whenever two or more RTP Streams are used for the Encoded and Dependent Streams, regardless of whether those are sent in one or more RTP Sessions. MST corresponds to either the MRST or the MRMT (Section 3.7) stream relation defined in this document. The SVC RTP Payload RFC [RFC6190] is not particularly explicit about how the common Media Encoder (Section 2.1.6) relation between Encoded Streams (Section 2.1.7) and Dependent Streams (Section 2.1.8) is to be implemented.

4.8.  Recording Device

   WebRTC specifications use this term to refer to locally available entities performing a Media Capture (Section 2.1.2) transformation.

4.9.  RtcMediaStream

   A WebRTC RtcMediaStream is a set of Media Sources (Section 2.1.4) sharing the same Synchronization Context (Section 3.1).

4.10.  RtcMediaStreamTrack

   A WebRTC RtcMediaStreamTrack is a Media Source (Section 2.1.4).

4.11.  RTP Sender

   RTP [RFC3550] uses this term, which can be seen as the RTP protocol part of a Media Packetizer (Section 2.1.9).

4.12.  RTP Session

   Within the context of SDP, a single m= line can map to a single RTP Session (Section 2.2.2), or multiple m= lines can map to a single RTP Session. The latter is enabled via multiplexing schemes such as BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation], which allows mapping of multiple m= lines to a single RTP Session.

4.13.  Single Session Transmission (SST)

   One of two transmission modes defined in H.264-based SVC [RFC6190], the other mode being MST (Section 4.7). In Single Session Transmission (SST), the SVC Media Encoder sends Encoded Streams (Section 2.1.7) and Dependent Streams (Section 2.1.8) combined into a single RTP Stream (Section 2.1.10) in a single RTP Session (Section 2.2.2), using the SVC RTP Payload format. The term "SST" is ambiguous in RFC 6190, in that it sometimes refers to the use of a single RTP Stream, as in sections relating to packetization, and sometimes appears to refer to the use of a single RTP Session, as in the context of discussing SDP. SST closely corresponds to SRST (Section 3.7) as defined in this document.

4.14.  SSRC

   RTP [RFC3550] defines this as "the source of a stream of RTP packets", which indicates that an SSRC is not only a unique identifier for the Encoded Stream (Section 2.1.7) carried in those packets, but is also effectively used as a term to denote a Media Packetizer (Section 2.1.9).

5.  Security Considerations

   This document simply tries to clarify the confusion prevalent in RTP taxonomy because of inconsistent usage by multiple technologies and protocols making use of the RTP protocol. It does not introduce any new security considerations beyond those already well documented in the RTP protocol [RFC3550] and in each of the many respective specifications of the various protocols making use of it.

   Having a well-defined common terminology and understanding of the complexities of the RTP architecture will hopefully help lead to better standards that avoid security problems.

6.  Acknowledgement

   This document borrows many concepts from several documents, such as WebRTC [I-D.ietf-rtcweb-overview], CLUE [I-D.ietf-clue-framework], and Multiplexing Architecture [I-D.westerlund-avtcore-transport-multiplexing]. The authors would like to thank all the authors of each of those documents.

   The authors would also like to acknowledge the insights, guidance, and contributions of Magnus Westerlund, Roni Even, Paul Kyzivat, Colin Perkins, Keith Drage, Harald Alvestrand, Alex Eleftheriadis, Mo Zanaty, Stephan Wenger, and Bernard Aboba.

7.  Contributors

   Magnus Westerlund has contributed the concept model for the media chain, using a transformations and streams model, including rewriting pre-existing concepts into this model and adding missing concepts. The first proposal for updating the relationships and the topologies based on this concept was also made by Magnus.

8.  IANA Considerations

   This document makes no request of IANA.

9.  Informative References

   [I-D.ietf-avtcore-rtp-multi-stream]
              Lennox, J., Westerlund, M., Wu, W., and C. Perkins, "Sending Multiple Media Streams in a Single RTP Session", draft-ietf-avtcore-rtp-multi-stream-06 (work in progress), October 2014.

   [I-D.ietf-avtcore-rtp-topologies-update]
              Westerlund, M. and S. Wenger, "RTP Topologies", draft-ietf-avtcore-rtp-topologies-update-06 (work in progress), March 2015.

   [I-D.ietf-clue-framework]
              Duckworth, M., Pepperell, A., and S. Wenger, "Framework for Telepresence Multi-Streams", draft-ietf-clue-framework-21 (work in progress), March 2015.

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings, "Negotiating Media Multiplexing Using the Session Description Protocol (SDP)", draft-ietf-mmusic-sdp-bundle-negotiation-17 (work in progress), March 2015.

   [I-D.ietf-mmusic-sdp-simulcast]
              Westerlund, M., Nandakumar, S., and M. Zanaty, "Using Simulcast in SDP and RTP Sessions", draft-ietf-mmusic-sdp-simulcast-00 (work in progress), January 2015.

   [I-D.ietf-rtcweb-overview]
              Alvestrand, H., "Overview: Real Time Protocols for Browser-based Applications", draft-ietf-rtcweb-overview-13 (work in progress), November 2014.

   [I-D.westerlund-avtcore-transport-multiplexing]
              Westerlund, M. and C. Perkins, "Multiplexing Multiple RTP Sessions onto a Single Lower-Layer Transport", draft-westerlund-avtcore-transport-multiplexing-07 (work in progress), October 2013.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, September 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. Hakenberg, "RTP Retransmission Payload Format", RFC 4588, July 2006.

   [RFC4867]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "RTP Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 4867, April 2007.

   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error Correction", RFC 5109, December 2007.

   [RFC5404]  Westerlund, M. and I. Johansson, "RTP Payload Format for G.719", RFC 5404, January 2009.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media Attributes in the Session Description Protocol (SDP)", RFC 5576, June 2009.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

   [RFC5905]  Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network Time Protocol Version 4: Protocol and Algorithms Specification", RFC 5905, June 2010.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis, "RTP Payload Format for Scalable Video Coding", RFC 6190, May 2011.

   [RFC7160]  Petit-Huguenin, M. and G. Zorn, "Support for Multiple Clock Rates in an RTP Session", RFC 7160, April 2014.

   [RFC7197]  Begen, A., Cai, Y., and H. Ou, "Duplication Delay Attribute in the Session Description Protocol", RFC 7197, April 2014.

   [RFC7198]  Begen, A. and C. Perkins, "Duplicating RTP Streams", RFC 7198, April 2014.

   [RFC7273]  Williams, A., Gross, K., van Brandenburg, R., and H. Stokking, "RTP Clock Source Signalling", RFC 7273, June 2014.

Appendix A.  Changes From Earlier Versions

   NOTE TO RFC EDITOR: Please remove this section prior to publication.

A.1.  Modifications Between WG Version -05 and -06

   o  Clarified that a Redundancy RTP Stream can be used standalone to generate Repaired RTP Streams.
   o  Clarified that (in accordance with the above) RTP-based Repair takes zero or more Received RTP Streams and one or more Received Redundancy RTP Streams as input.
   o  Changed Figure 6 to more clearly show that Media Transport is terminated in the Endpoint, not in the Participant.
   o  Added a sentence to the Endpoint section that clarifies there may be contexts where a single "host" can serve multiple Participants, making those Endpoints share some properties.
   o  Merged previous section 3.5 on SST/MST with previous section 3.8 on Layered Multi-Stream into a common section discussing the scalable/layered stream relation, and moved improved, descriptive text on SST and MST to new sub-sections 4.7 and 4.13, describing them as existing terms.
   o  Editorial improvements.

A.2.  Modifications Between WG Version -04 and -05

   o  Editorial improvements.

A.3.  Modifications Between WG Version -03 and -04

   o  Changed "Media Redundancy" and "Media Repair" to "RTP-based Redundancy" and "RTP-based Repair", since those terms are more specific and correct.
   o  Changed "End Point" to "Endpoint" and removed Editor's Note on this.
   o  Clarified that a Media Capture may impose constraints on clock handling.
   o  Clarified that mixing multiple Raw Streams into a Source Stream is not possible, since that requires mixed streams to have a timing relation, requiring them to be Source Streams, and added an example.
   o  Clarified that RTP-based Redundancy excludes the type of encoding redundancy found within the encoded media format in an Encoded Stream.
   o  Clarified that a Media Transport contains only a single RTP Session, but a single RTP Session can span multiple Media Transports.
   o  Clarified that packets with seemingly correct checksum that are received by a Media Transport Receiver may still be corrupt.
   o  Clarified that a corrupt packet in a Media Transport Receiver is typically either discarded or somehow marked and passed on in the Received RTP Stream.
   o  Added Synchronization Context to Figure 6.
   o  Editorial improvements and clarifications.

A.4.  Modifications Between WG Version -02 and -03

   o  Changed section 3.5, removing SST-SS/MS and MST-SS/MS, replacing them with SRST, MRST, and MRMT.
   o  Updated section 3.8 to align with terminology changes in section 3.5.
   o  Added a new section 4.12, describing the term Multimedia Conference.
   o  Changed reference from I-D to now published RFC 7273.
   o  Editorial improvements and clarifications.

A.5.  Modifications Between WG Version -01 and -02

   o  Major re-structure
      o  Moved media chain Media Transport detailing up one section level
      o  Collapsed level 2 sub-sections of section 3 and thus moved level 3 sub-sections up one level, gathering some introductory text into the beginning of section 3
   o  Added that not only SSRC collision, but also a clock rate change [RFC7160] is a valid reason to change SSRC value for an RTP stream
   o  Added a sub-section on clock source signaling
   o  Added a sub-section on RTP stream duplication
   o  Elaborated a bit in section 2.2.1 on the relation between End Points, Participants and CNAMEs
   o  Elaborated a bit in section 2.2.4 on Multimedia Session and synchronization contexts
   o  Removed the section on CLUE scenes defining an implicit synchronization context, since it was incorrect
   o  Clarified text on SVC SST and MST according to list discussions
   o  Removed the entire topology section to avoid possible inconsistencies or duplications with draft-ietf-avtcore-rtp-topologies-update, but saved one example overview figure of Communication Entities into that section
   o  Added a section 4 on mapping from existing terms with one sub-section per term, mainly by moving text from sections 2 and 3
   o  Changed all occurrences of Packet Stream to RTP Stream
   o  Moved all normative references to informative, since this is an informative document
   o  Added references to RFC 7160, RFC 7197 and RFC 7198, and removed unused references

A.6.  Modifications Between WG Version -00 and -01

   o  WG version -00 text is identical to individual draft -03
   o  Amended description of SVC SST and MST encodings with respect to concepts defined in this text
   o  Removed UML as normative reference, since the text no longer uses any UML notation
   o  Removed a number of level 4 sections and moved out text to the level above

A.7.  Modifications Between Version -02 and -03

   o  Section 4 rewritten (and new communication topologies added) to reflect the major updates to Sections 1-3
   o  Section 8 removed (carryover from initial -00 draft)
   o  General clean up of text, grammar and nits

A.8.  Modifications Between Version -01 and -02

   o  Section 2 rewritten to add both streams and transformations in the media chain.
   o  Section 3 rewritten to focus on exposing relationships.

A.9.  Modifications Between Version -00 and -01

   o  Too many to list
   o  Added new authors
   o  Updated content organization and presentation

Authors' Addresses

   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ  07601
   US

   Email: jonathan@vidyo.com

   Kevin Gross
   AVA Networks, LLC
   Boulder, CO
   US

   Email: kevin.gross@avanw.com

   Suhas Nandakumar
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA  95134
   US

   Email: snandaku@cisco.com

   Gonzalo Salgueiro
   Cisco Systems
   7200-12 Kit Creek Road
   Research Triangle Park, NC  27709
   US

   Email: gsalguei@cisco.com

   Bo Burman
   Ericsson
   Kistavagen 25
   SE-164 80 Stockholm
   Sweden

   Email: bo.burman@ericsson.com