Network Working Group                                          J. Lennox
Internet-Draft                                                     Vidyo
Intended status: Informational                                  K. Gross
Expires: July 20, 2015                                               AVA
                                                           S. Nandakumar
                                                            G. Salgueiro
                                                           Cisco Systems
                                                               B. Burman
                                                                Ericsson
                                                        January 16, 2015


 A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport
                         Protocol (RTP) Sources
               draft-ietf-avtext-rtp-grouping-taxonomy-04

Abstract

   The terminology about, and associations among, Real-Time Transport
   Protocol (RTP) sources can be complex and somewhat opaque.  This
   document describes a number of existing and proposed relationships
   among RTP sources, and attempts to define common terminology for
   discussing protocol entities and their relationships.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 20, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Concepts
     2.1.  Media Chain
       2.1.1.  Physical Stimulus
       2.1.2.  Media Capture
       2.1.3.  Raw Stream
       2.1.4.  Media Source
       2.1.5.  Source Stream
       2.1.6.  Media Encoder
       2.1.7.  Encoded Stream
       2.1.8.  Dependent Stream
       2.1.9.  Media Packetizer
       2.1.10. RTP Stream
       2.1.11. RTP-based Redundancy
       2.1.12. Redundancy RTP Stream
       2.1.13. Media Transport
       2.1.14. Media Transport Sender
       2.1.15. Sent RTP Stream
       2.1.16. Network Transport
       2.1.17. Transported RTP Stream
       2.1.18. Media Transport Receiver
       2.1.19. Received RTP Stream
       2.1.20. Received Redundancy RTP Stream
       2.1.21. RTP-based Repair
       2.1.22. Repaired RTP Stream
       2.1.23. Media Depacketizer
       2.1.24. Received Encoded Stream
       2.1.25. Media Decoder
       2.1.26. Received Source Stream
       2.1.27. Media Sink
       2.1.28. Received Raw Stream
       2.1.29. Media Render
     2.2.  Communication Entities
       2.2.1.  Endpoint
       2.2.2.  RTP Session
       2.2.3.  Participant
       2.2.4.  Multimedia Session
       2.2.5.  Communication Session
   3.  Concepts of Inter-Relations
     3.1.  Synchronization Context
       3.1.1.  RTCP CNAME
       3.1.2.  Clock Source Signaling
       3.1.3.  Implicitly via RtcMediaStream
       3.1.4.  Explicitly via SDP Mechanisms
     3.2.  Endpoint
     3.3.  Participant
     3.4.  RtcMediaStream
     3.5.  Single- and Multi-Session Transmission of Dependent Streams
     3.6.  Multi-Channel Audio
     3.7.  Simulcast
     3.8.  Layered Multi-Stream
     3.9.  RTP Stream Duplication
     3.10. Redundancy Format
     3.11. RTP Retransmission
     3.12. Forward Error Correction
     3.13. RTP Stream Separation
     3.14. Multiple RTP Sessions over one Media Transport
   4.  Mapping from Existing Terms
     4.1.  Telepresence Terms
       4.1.1.  Audio Capture
       4.1.2.  Capture Device
       4.1.3.  Capture Encoding
       4.1.4.  Capture Scene
       4.1.5.  Endpoint
       4.1.6.  Individual Encoding
       4.1.7.  Media Capture
       4.1.8.  Media Consumer
       4.1.9.  Media Provider
       4.1.10. Stream
       4.1.11. Video Capture
     4.2.  Media Description
     4.3.  Media Stream
     4.4.  Multimedia Conference
     4.5.  Multimedia Session
     4.6.  Multipoint Control Unit (MCU)
     4.7.  Recording Device
     4.8.  RtcMediaStream
     4.9.  RtcMediaStreamTrack
     4.10. RTP Sender
     4.11. RTP Session
     4.12. SSRC
   5.  Security Considerations
   6.  Acknowledgement
   7.  Contributors
   8.  IANA Considerations
   9.  Informative References
   Appendix A.  Changes From Earlier Versions
     A.1.  Modifications Between WG Version -03 and -04
     A.2.  Modifications Between WG Version -02 and -03
     A.3.  Modifications Between WG Version -01 and -02
     A.4.  Modifications Between WG Version -00 and -01
     A.5.  Modifications Between Version -02 and -03
     A.6.  Modifications Between Version -01 and -02
     A.7.  Modifications Between Version -00 and -01
   Authors' Addresses

1.  Introduction

   The existing taxonomy of sources in RTP is often regarded as
   confusing and inconsistent.  Consequently, a deep understanding of
   how the different terms relate to each other becomes a real
   challenge.  Frequently cited examples of this confusion are (1) how
   different protocols that make use of RTP use the same terms to
   signify different things and (2) how the complexities addressed at
   one layer are often glossed over or ignored at another.

   This document attempts to provide some clarity by reviewing the
   semantics of various aspects of sources in RTP.  As an organizing
   mechanism, it approaches this by describing various ways that RTP
   sources can be grouped and associated together.

   All non-specific references to ControLling mUltiple streams for
   tElepresence (CLUE) in this document map to [I-D.ietf-clue-framework]
   and all references to Web Real-Time Communications (WebRTC) map to
   [I-D.ietf-rtcweb-overview].

2.  Concepts

   This section defines concepts that serve to identify and name various
   transformations and streams in a given RTP usage.  For each concept,
   an attempt is made to list any alternate definitions and usages that
   co-exist today, along with various characteristics that further
   describe the concept.
   These concepts are divided into two
   categories: one related to the chain of streams and transformations
   that media can be subject to, the other for entities involved in the
   communication.

2.1.  Media Chain

   In the context of this memo, Media is a sequence of synthetic or
   Physical Stimuli (Section 2.1.1) (sound waves, photons,
   keystrokes), represented in digital form.  Synthesized Media is
   typically generated directly in the digital domain.

   This section contains the concepts that can be involved in taking
   Media at a sender side and transporting it to a receiver, which may
   recover a sequence of physical stimuli.  The concepts in this chain
   are of two main types: streams and transformations.  Streams are
   time-based sequences of samples of the physical stimulus in various
   representations, while transformations change the representation of
   the streams in some way.

   The examples below are basic ones, and it is important to keep in
   mind that this conceptual model enables more complex usages.  Some
   will be further discussed in later sections of this document.  In
   general, the following applies to this model:

   o  A transformation may have zero or more inputs and one or more
      outputs.

   o  A stream is of some type, such as audio, video, real-time text,
      etc.

   o  A stream has one source transformation and one or more sink
      transformations (with the exception of Physical Stimulus
      (Section 2.1.1), which may lack a source or sink transformation).

   o  Streams can be forwarded from a transformation output to any
      number of inputs on other transformations that support that type.

   o  If the output of a transformation is sent to multiple
      transformations, those streams will be identical; it takes a
      transformation to make them different.

   o  There are no formal limitations on how streams are connected to
      transformations; this may include loops if required by a
      particular transformation.

   It is also important to remember that this is a conceptual model.
   Thus real-world implementations may look different and have a
   different structure.

   To provide a basic understanding of the relationships in the chain,
   we first introduce the concepts for the sender side (Figure 1).
   This covers the chain from Physical Stimulus until media packets are
   emitted onto the network.

                 Physical Stimulus
                     |
                     V
          +--------------------+
          |   Media Capture    |
          +--------------------+
                     |
                 Raw Stream
                     V
          +--------------------+
          |    Media Source    |<- Synchronization Timing
          +--------------------+
                     |
               Source Stream
                     V
          +--------------------+
          |   Media Encoder    |
          +--------------------+
                     |
              Encoded Stream      +------------+
                     V            |            V
          +--------------------+  | +----------------------+
          |  Media Packetizer  |  | | RTP-based Redundancy |
          +--------------------+  | +----------------------+
                     |            |            |
                     +------------+  Redundancy RTP Stream
             Source RTP Stream                 |
                     V                         V
          +--------------------+     +--------------------+
          |  Media Transport   |     |  Media Transport   |
          +--------------------+     +--------------------+

             Figure 1: Sender Side Concepts in the Media Chain

   In Figure 1 we have included a branched chain to cover the concepts
   for using redundancy to improve the reliability of the transport.
   The Media Transport concept is an aggregate that is decomposed below
   in Section 2.1.13.

   Below we review a receiver media chain (Figure 2) matching the sender
   side, to look at the inverse transformations and their attempts to
   recover streams identical to those in the sender chain, subject to
   what may be lossy compression and imperfect Media Transport.
   Note that the
   streams out of a reverse transformation, like the Source Stream out
   of the Media Decoder, are in many cases not the same as the
   corresponding ones on the sender side; thus they are prefixed with
   "Received" to denote a potentially modified version.  The difference
   arises because some of the transformations are irreversible.

   For example, lossy source coding in the Media Encoder prevents the
   Source Stream out of the Media Decoder from being the same as the one
   fed into the Media Encoder.  Other reasons include packet loss or
   late loss in the Media Transport transformation that even RTP-based
   Repair, if used, fails to repair.  It should be noted that some
   transformations are not always present, like RTP-based Repair, which
   cannot operate without Redundancy RTP Streams.

          +--------------------+     +--------------------+
          |  Media Transport   |     |  Media Transport   |
          +--------------------+     +--------------------+
                  |                            |
         Received RTP Stream    Received Redundancy RTP Stream
                  |                            |
                  |        +-------------------+
                  V        V
          +--------------------+
          |  RTP-based Repair  |
          +--------------------+
                     |
             Repaired RTP Stream
                     V
          +--------------------+
          | Media Depacketizer |
          +--------------------+
                     |
           Received Encoded Stream
                     V
          +--------------------+
          |   Media Decoder    |
          +--------------------+
                     |
           Received Source Stream
                     V
          +--------------------+
          |     Media Sink     |--> Synchronization Information
          +--------------------+
                     |
             Received Raw Stream
                     V
          +--------------------+
          |   Media Renderer   |
          +--------------------+
                     |
                     V
              Physical Stimulus

            Figure 2: Receiver Side Concepts of the Media Chain

2.1.1.  Physical Stimulus

   The physical stimulus is a physical event that can be sampled and
   converted to digital form by an appropriate sensor or transducer.
   This includes sound waves making up audio, photons in a light field,
   or other excitations or interactions with sensors, like keystrokes on
   a keyboard.

2.1.2.  Media Capture

   Media Capture is the process of transforming the Physical Stimulus
   (Section 2.1.1) into digital Media using an appropriate sensor or
   transducer.  The Media Capture performs a digital sampling of the
   physical stimulus, usually periodically, and outputs this in some
   representation as a Raw Stream (Section 2.1.3).  Because it is
   sampled periodically, or at least consists of timed asynchronous
   events, this data forms a stream of media data.  The Media Capture is
   normally instantiated in some type of device, i.e., a media capture
   device.  Examples of different types of media capture devices are
   digital cameras, microphones connected to A/D converters, or
   keyboards.

   Characteristics:

   o  A Media Capture is identified either by hardware/manufacturer ID
      or via a session-scoped device identifier as mandated by the
      application usage.

   o  A Media Capture can generate an Encoded Stream (Section 2.1.7) if
      the capture device supports such a configuration.

   o  The nature of the Media Capture may impose constraints on the
      clock handling in some of the subsequent steps.  For example, many
      audio or video capture devices are not completely free in
      selecting the sample rate.

2.1.3.  Raw Stream

   The time progressing stream of digitally sampled information, usually
   periodically sampled and provided by a Media Capture (Section 2.1.2).
   A Raw Stream can also contain synthesized Media that may not require
   any explicit Media Capture, since it is already in an appropriate
   digital form.

2.1.4.  Media Source

   A Media Source is the logical source of a time progressing digital
   media stream that is synchronized to a reference clock, called a
   Source Stream (Section 2.1.5).
   This transformation takes one or more Raw
   Streams (Section 2.1.3) and provides a Source Stream as output.  The
   output is synchronized with a reference clock (Section 3.1), which
   can be as simple as a system local wall clock or as complex as an
   NTP-synchronized clock.

   The output can be of different types.  One type is directly
   associated with a particular Media Capture's Raw Stream.  Others are
   more conceptual sources, like an audio mix of multiple Source Streams
   (Figure 3).  Mixing multiple streams typically requires that the
   input streams can be related in time, meaning that they have to be
   Source Streams (Section 2.1.5) rather than Raw Streams.  In the
   example below, the generated Source Stream is a mix of the three
   input Source Streams.

          Source     Source     Source
          Stream     Stream     Stream
             |          |          |
             V          V          V
          +--------------------------+
          |       Media Source       |<-- Reference Clock
          |          Mixer           |
          +--------------------------+
                        |
                        V
                  Source Stream

         Figure 3: Conceptual Media Source in form of Audio Mixer

   Another possible example of a conceptual Media Source is a video
   surveillance switch, where the input is multiple Source Streams from
   different cameras, and the output is one of those Source Streams
   chosen by some selection criterion, such as round-robin or a measure
   of video activity.

   Characteristics:

   o  At any point, it can represent a physical captured source or
      conceptual source.

2.1.5.  Source Stream

   A time progressing stream of digital samples that has been
   synchronized with a reference clock and comes from a particular Media
   Source (Section 2.1.4).

2.1.6.  Media Encoder

   A Media Encoder is a transformation that is responsible for encoding
   the media data from a Source Stream (Section 2.1.5) into another
   representation, usually more compact, that is output as an Encoded
   Stream (Section 2.1.7).
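
   The sender-side chain up to this point (Raw Stream, Media Source,
   Source Stream, Media Encoder, Encoded Stream) can be illustrated
   with a small sketch.  This is only an illustration under stated
   assumptions, not part of the taxonomy: the class and function names
   are hypothetical, timestamps are plain seconds on an assumed
   reference clock, and zlib stands in for a real media codec.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RawStream:
    # Digitally sampled data as output by a Media Capture (hypothetical model).
    samples: List[bytes]


@dataclass
class SourceStream:
    # (reference-clock time in seconds, sample) pairs.
    samples: List[Tuple[float, bytes]]


@dataclass
class EncodedStream:
    # (reference-clock time in seconds, encoded sample) pairs.
    frames: List[Tuple[float, bytes]]


def media_source(raw: RawStream, start: float, period: float) -> SourceStream:
    """Relate periodically sampled raw data to a reference clock."""
    return SourceStream([(start + i * period, s) for i, s in enumerate(raw.samples)])


def media_encoder(source: SourceStream) -> EncodedStream:
    """Encode each sample; zlib is only a stand-in for a real codec."""
    return EncodedStream([(t, zlib.compress(s)) for t, s in source.samples])


# Two 20 ms blocks of silence, assumed to come from a capture device.
raw = RawStream([bytes(160), bytes(160)])
source = media_source(raw, start=0.0, period=0.02)
encoded = media_encoder(source)
```

   The timing information attached by the Media Source is carried
   through the encoder unchanged; only the representation of each
   sample changes.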

   The Media Encoder step commonly includes pre-encoding
   transformations, such as scaling and resampling.  The Media Encoder
   can have a significant number of configuration options that affect
   the properties of the Encoded Stream.  These include properties such
   as bit-rate, start points for decoding, resolution, bandwidth, or
   other properties that affect fidelity.  The codec actually used is
   also an important factor in many communication systems.

   Scalable Media Encoders need special attention as they produce
   multiple outputs that are potentially of different types.  A scalable
   Media Encoder takes one input Source Stream and encodes it into
   multiple output streams of two different types: at least one Encoded
   Stream that is independently decodable and one or more Dependent
   Streams (Section 2.1.8).  Decoding requires at least one Encoded
   Stream and zero or more Dependent Streams.  A Dependent Stream's
   dependency is one of the grouping relations this document discusses
   further in Section 3.8.

                  Source Stream
                        |
                        V
          +--------------------------+
          |  Scalable Media Encoder  |
          +--------------------------+
             |          |   ...    |
             V          V          V
          Encoded   Dependent  Dependent
          Stream     Stream     Stream

            Figure 4: Scalable Media Encoder Input and Outputs

   There are also other variants of encoders, like so-called Multiple
   Description Coding (MDC).  Such a Media Encoder produces multiple
   independent and thus individually decodable Encoded Streams.
   However, (logically) combining several of these Encoded Streams into
   a single Received Source Stream during decoding improves the
   perceived quality of the reproduction compared to decoding a single
   Encoded Stream.

   Creating multiple Encoded Streams from the same Source Stream, where
   the Encoded Streams are neither in a scalable nor in an MDC
   relationship, is commonly utilized in Simulcast environments.
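
   The decoding rule for scalable encoders described above (at least one
   independently decodable Encoded Stream, plus any Dependent Streams
   whose dependencies are all available) can be sketched as follows.
   This is a minimal illustration, not a normative algorithm: the
   Stream class, its depends_on field, and the stream names are
   hypothetical, and real codecs signal dependencies in
   codec-specific ways.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class Stream:
    name: str
    # Names of the streams this one needs; empty for an Encoded Stream.
    depends_on: FrozenSet[str] = frozenset()


def decodable(received: Dict[str, Stream]) -> Set[str]:
    """Return the names of received streams whose dependencies are all met."""
    usable: Set[str] = set()
    changed = True
    while changed:  # repeat until no further stream becomes usable
        changed = False
        for s in received.values():
            if s.name not in usable and s.depends_on <= usable:
                usable.add(s.name)
                changed = True
    return usable


base = Stream("base")                                # Encoded Stream
enh1 = Stream("enh1", frozenset({"base"}))           # Dependent Stream
enh2 = Stream("enh2", frozenset({"base", "enh1"}))   # Dependent Stream

all_layers = decodable({s.name: s for s in (base, enh1, enh2)})
missing_middle = decodable({s.name: s for s in (base, enh2)})
no_base = decodable({s.name: s for s in (enh1, enh2)})
```

   In this sketch, losing the middle layer leaves only the base Encoded
   Stream usable, and losing the base leaves nothing decodable.  MDC
   and Simulcast outputs would all be modeled as Encoded Streams with
   no dependencies.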

   Characteristics:

   o  A Media Source can be multiply encoded by different Media Encoders
      to provide various encoded representations.

2.1.7.  Encoded Stream

   A stream of time synchronized encoded media that can be independently
   decoded.

   Characteristics:

   o  Due to temporal dependencies, an Encoded Stream may have
      limitations in where decoding can be started.  These entry points,
      for example Intra frames from a video encoder, may require
      identification, and their generation may be event based or
      configured to occur periodically.

2.1.8.  Dependent Stream

   A stream of time synchronized encoded media fragments that depend on
   one or more Encoded Streams (Section 2.1.7) and zero or more
   Dependent Streams in order to be decodable.

   Characteristics:

   o  Each Dependent Stream has a set of dependencies.  These
      dependencies must be understood by the parties in a Multimedia
      Session that intend to use a Dependent Stream.

2.1.9.  Media Packetizer

   The transformation of taking one or more Encoded (Section 2.1.7) or
   Dependent Streams (Section 2.1.8), putting their content into one or
   more sequences of packets, normally RTP packets, and outputting
   Source RTP Streams (Section 2.1.10).  This step includes generating
   both RTP payloads and RTP packets.

   The Media Packetizer can use multiple inputs when producing a single
   RTP Stream.  One such example is SRST packetization when using SVC
   (Section 3.5).

   The Media Packetizer can also produce multiple RTP Streams, for
   example when Encoded and/or Dependent Streams are distributed over
   multiple RTP Streams.  One example of this is MRMT packetization when
   using SVC (Section 3.5).

   Characteristics:

   o  The Media Packetizer selects which Synchronization Source(s)
      (SSRC) [RFC3550] are used, and in which RTP Sessions.

   o  The Media Packetizer can combine multiple Encoded or Dependent
      Streams into one or more RTP Streams.

2.1.10.  RTP Stream

   A stream of RTP packets containing media data, source or redundant.
   The RTP Stream is identified by an SSRC belonging to a particular RTP
   Session.  The RTP Session is identified as discussed in
   Section 2.2.2.

   A Source RTP Stream is an RTP Stream containing at least some content
   from an Encoded Stream (Section 2.1.7).  Source material is any media
   material that is produced for transport over RTP without any
   additional RTP-based redundancy applied.  Note that RTP-based
   redundancy excludes the type of redundancy that most suitable Media
   Encoders (Section 2.1.6) may add to the media format of the Encoded
   Stream to make it cope better with inevitable RTP packet losses.
   This is further described in RTP-based Redundancy (Section 2.1.11)
   and Redundancy RTP Stream (Section 2.1.12).

   Characteristics:

   o  Each RTP Stream is identified by a Synchronization Source (SSRC)
      [RFC3550] that is carried in every RTP and RTP Control Protocol
      (RTCP) packet header.  The SSRC is unique in a specific RTP
      Session context.

   o  At any given point in time, an RTP Stream can have one and only
      one SSRC, but SSRCs for a given RTP Stream can change over time.
      SSRC collision and clock rate change [RFC7160] are examples of
      valid reasons to change SSRC for an RTP Stream.  In those cases,
      the RTP Stream itself is not changed in any significant way, only
      the identifying SSRC number.

   o  Each SSRC defines a unique RTP sequence numbering and timing
      space.

   o  Several RTP Streams, each with their own SSRC, may represent a
      single Media Source.

   o  Several RTP Streams, each with their own SSRC, can be carried in a
      single RTP Session.

2.1.11.  RTP-based Redundancy

   RTP-based Redundancy is defined here as a transformation that
   generates redundant or repair packets sent out as a Redundancy RTP
   Stream (Section 2.1.12) to mitigate network transport impairments,
   like packet loss and delay.

   RTP-based Redundancy exists in many flavors.  It may generate
   independent Repair Streams that are used in addition to the Source
   Stream (like RTP Retransmission (Section 3.11) and some special types
   of Forward Error Correction, like RTP Stream Duplication
   (Section 3.9)); it may generate a new Source Stream by combining
   redundancy information with source information (using XOR FEC
   (Section 3.12) as a redundancy payload (Section 3.10)); or it may
   completely replace the source information with only redundancy
   packets.

2.1.12.  Redundancy RTP Stream

   An RTP Stream (Section 2.1.10) that contains no original source data,
   only redundant data that may be combined with one or more Received
   RTP Streams (Section 2.1.19) to produce Repaired RTP Streams
   (Section 2.1.22).

2.1.13.  Media Transport

   A Media Transport defines the transformation that the RTP Streams
   (Section 2.1.10) are subjected to by the end-to-end transport from
   one RTP sender to one specific RTP receiver (an RTP Session
   (Section 2.2.2) may contain multiple RTP receivers per sender).  Each
   Media Transport is defined by a transport association that is
   normally identified by a 5-tuple (source address, source port,
   destination address, destination port, transport protocol), but a
   proposal exists for sending multiple transport associations on a
   single 5-tuple [I-D.westerlund-avtcore-transport-multiplexing].

   Characteristics:

   o  Media Transport transmits RTP Streams of RTP Packets from a source
      transport address to a destination transport address.

   o  Each Media Transport contains only a single RTP Session.

   o  A single RTP Session can span multiple Media Transports.

   The Media Transport concept sometimes needs to be decomposed into
   more steps to enable discussion of what a sender emits that gets
   transformed by the network before it is received by the receiver.
   Thus we also provide this decomposition of Media Transport
   (Figure 5).

                    RTP Stream
                        |
                        V
          +--------------------------+
          |  Media Transport Sender  |
          +--------------------------+
                        |
                 Sent RTP Stream
                        V
          +--------------------------+
          |    Network Transport     |
          +--------------------------+
                        |
              Transported RTP Stream
                        V
          +--------------------------+
          | Media Transport Receiver |
          +--------------------------+
                        |
                        V
               Received RTP Stream

                Figure 5: Decomposition of Media Transport

2.1.14.  Media Transport Sender

   The first transformation within the Media Transport (Section 2.1.13)
   is the Media Transport Sender.  The sending Endpoint (Section 2.2.1)
   takes an RTP Stream and emits the packets onto the network using the
   transport association established for this Media Transport, thereby
   creating a Sent RTP Stream (Section 2.1.15).  In the process, it
   transforms the RTP Stream in several ways.  First, it generates the
   necessary protocol headers for the transport association, for example
   IP and UDP headers, thus forming IP/UDP/RTP packets.  In addition,
   the Media Transport Sender may queue, pace, or otherwise affect how
   the packets are emitted onto the network, thereby potentially
   introducing the delay, jitter, and inter-packet spacing that
   characterize the Sent RTP Stream.

2.1.15.  Sent RTP Stream

   The Sent RTP Stream is the RTP Stream as it enters the first hop of
   the network path to its destination.
   The Sent RTP Stream is
   identified using network transport addresses; for IP/UDP, for
   example, this is the 5-tuple (source IP address, source port,
   destination IP address, destination port, and protocol (UDP)).

2.1.16.  Network Transport

   Network Transport is the transformation that subjects the Sent RTP
   Stream (Section 2.1.15) to traveling from the source to the
   destination through the network.  This transformation can result in
   loss of some packets, varying delay on a per packet basis, packet
   duplication, and packet header or data corruption.  This
   transformation produces a Transported RTP Stream (Section 2.1.17) at
   the exit of the network path.

2.1.17.  Transported RTP Stream

   The RTP Stream that is emitted out of the network path at the
   destination, subjected to the Network Transport's transformation
   (Section 2.1.16).

2.1.18.  Media Transport Receiver

   The receiver Endpoint's (Section 2.2.1) transformation of the
   Transported RTP Stream (Section 2.1.17) by its reception process,
   which results in the Received RTP Stream (Section 2.1.19).  This
   transformation includes verifying transport checksums.  Sensible
   system designs typically either discard packets with mismatching
   checksums, or pass them on while marking them in the resulting
   Received RTP Stream so as to alert subsequent transformations to
   their possibly corrupt state.  In this context it is worth noting
   that there is typically some probability for corrupt packets to pass
   through undetected (with a seemingly correct checksum).  Other
   transformations can compensate for delay variations in receiving a
   packet on the network interface and providing it to the application
   (de-jitter buffer).

2.1.19.  Received RTP Stream

   The RTP Stream (Section 2.1.10) resulting from the Media Transport's
   transformation, i.e.,
subjected to packet loss, packet corruption, 707 packet duplication and varying transmission delay from sender to 708 receiver. 710 2.1.20. Received Redundancy RTP Stream 712 The Redundancy RTP Stream (Section 2.1.12) resulting from the Media 713 Transport transformation, i.e. subjected to packet loss, packet 714 corruption, and varying transmission delay from sender to receiver. 716 2.1.21. RTP-based Repair 718 RTP-based Repair is a Transformation that takes as input one or more 719 Received RTP Streams (Section 2.1.19) and Received Redundancy RTP 720 Streams (Section 2.1.20), and produces one or more Repaired RTP 721 Streams (Section 2.1.22) that are as close to the corresponding sent 722 Source RTP Streams (Section 2.1.10) as possible, using different RTP- 723 based repair methods, for example the ones referred in RTP-based 724 Redundancy (Section 2.1.11). 726 2.1.22. Repaired RTP Stream 728 A Received RTP Stream (Section 2.1.19) for which Received Redundancy 729 RTP Stream (Section 2.1.20) information has been used to try to 730 recover the Source RTP Stream (Section 2.1.10) as it was before Media 731 Transport (Section 2.1.13). 733 2.1.23. Media Depacketizer 735 A Media Depacketizer takes one or more RTP Streams (Section 2.1.10), 736 depacketizes them, and attempts to reconstitute the Encoded Streams 737 (Section 2.1.7) or Dependent Streams (Section 2.1.8) present in those 738 RTP Streams. 740 It should be noted that in practical implementations, the Media 741 Depacketizer and the Media Decoder may be tightly coupled and share 742 information to improve or optimize the overall decoding and error 743 concealment process. It is, however, not expected that there would 744 be any benefit in defining a taxonomy for those detailed (and likely 745 very implementation-dependent) steps. 747 2.1.24. Received Encoded Stream 749 The received version of an Encoded Stream (Section 2.1.7). 751 2.1.25. 
Media Decoder 753 A Media Decoder is a transformation that is responsible for decoding 754 Encoded Streams (Section 2.1.7) and any Dependent Streams 755 (Section 2.1.8) into a Source Stream (Section 2.1.5). 757 It should be noted that in practical implementations, the Media 758 Decoder and the Media Depacketizer may be tightly coupled and share 759 information to improve or optimize the overall decoding process in 760 various ways. It is, however, not expected that there would be any 761 benefit in defining a taxonomy for those detailed (and likely very 762 implementation-dependent) steps. 764 Characteristics: 766 o A Media Decoder has to deal with any errors in the Encoded Streams 767 that resulted from corruption or failure to repair packet losses. 768 Therefore, it is commonly robust to errors and losses, and includes 769 concealment methods. 771 2.1.26. Received Source Stream 773 The received version of a Source Stream (Section 2.1.5). 775 2.1.27. Media Sink 777 The Media Sink receives a Source Stream (Section 2.1.5) that 778 contains, usually periodically, sampled media data together with 779 associated synchronization information. Depending on the application, 780 this Source Stream then needs to be transformed into a Raw Stream 781 (Section 2.1.3) that is conveyed to the Media Render 782 (Section 2.1.29), synchronized with the output from other Media 783 Sinks. The Media Sink may also be connected with a Media Source 784 (Section 2.1.4) and be used as part of a conceptual Media Source. 786 Characteristics: 788 o The Media Sink can further transform the Source Stream into a 789 representation that is suitable for rendering on the Media Render 790 as defined by the application or system-wide configuration. This 791 includes sample scaling, level adjustments, etc. 793 2.1.28. Received Raw Stream 795 The received version of a Raw Stream (Section 2.1.3). 797 2.1.29.
Media Render 799 A Media Render takes a Raw Stream (Section 2.1.3) and converts it 800 into Physical Stimulus (Section 2.1.1) that a human user can 801 perceive. Examples of such devices are screens, and D/A converters 802 connected to amplifiers and loudspeakers. 804 Characteristics: 806 o An Endpoint can potentially have multiple Media Renders for each 807 media type. 809 2.2. Communication Entities 811 This section contains concept for entities involved in the 812 communication. 814 +------------------------------------------------------------+ 815 | Communication Session | 816 | | 817 | +----------------+ +----------------+ | 818 | | Participant A | +------------+ | Participant B | | 819 | | | | Multimedia | | | | 820 | | +-------------+|<==>| Session |<==>|+-------------+ | | 821 | | | Endpoint A || | | || Endpoint B | | | 822 | | | || +------------+ || | | | 823 | | | +-----------++----------------------++-----------+ | | | 824 | | | | | | | | | | 825 | | | | RTP Session|---Media Transport--->| | | | | 826 | | | | Audio |<---Media Transport---| | | | | 827 | | | | | ^ | | | | | 828 | | | +-----------++----------|-----------++-----------+ | | | 829 | | | || v || | | | 830 | | | || +-----------------+ || | | | 831 | | | || | Synchronization | || | | | 832 | | | || | Context | || | | | 833 | | | || +-----------------+ || | | | 834 | | | || ^ || | | | 835 | | | +-----------++----------|-----------++-----------+ | | | 836 | | | | | v | | | | | 837 | | | | RTP Session|<---Media Transport---| | | | | 838 | | | | Video |---Media Transport--->| | | | | 839 | | | | | | | | | | 840 | | | +-----------++----------------------++-----------+ | | | 841 | | +-------------+| |+-------------+ | | 842 | +----------------+ +----------------+ | 843 +------------------------------------------------------------+ 845 Figure 6: Example Point to Point Communication Session with two RTP 846 Sessions 848 The figure above shows a high-level example representation of a very 849 basic 
point-to-point Communication Session between Participants A and 850 B. It uses two different audio and video RTP Sessions between A's 851 and B's Endpoints, using separate Media Transports for those RTP 852 Sessions. The Multimedia Session shared by the Participants can, for 853 example, be established using SIP (i.e., there is a SIP Dialog 854 between A and B). The terms used in that figure are further 855 elaborated in the sub-sections below. 857 2.2.1. Endpoint 859 A single addressable entity sending or receiving RTP packets. It may 860 be decomposed into several functional blocks, but as long as it 861 behaves as a single RTP stack entity it is classified as a single 862 "Endpoint". 864 Characteristics: 866 o Endpoints can be identified in several different ways. While RTCP 867 Canonical Names (CNAMEs) [RFC3550] provide a globally unique and 868 stable identification mechanism for the duration of the 869 Communication Session (see Section 2.2.5), their validity applies 870 exclusively within a Synchronization Context (Section 3.1). Thus 871 one Endpoint can handle multiple CNAMEs, each of which can be 872 shared among a set of Endpoints belonging to the same Participant 873 (Section 2.2.3). Therefore, mechanisms outside the scope of RTP, 874 such as application defined mechanisms, must be used to ensure 875 Endpoint identification when outside this Synchronization Context. 877 o An Endpoint can be associated with at most one Participant 878 (Section 2.2.3) at any single point in time. 880 o In some contexts, an Endpoint would typically correspond to a 881 single "host", for example a computer using a single network 882 interface and being used by a single human user. 884 2.2.2. RTP Session 886 An RTP Session is an association among a group of Participants 887 communicating with RTP. It is a group communications channel which 888 can potentially carry a number of RTP Streams. 
Within an RTP 889 Session, every Participant can find meta-data and control information 890 (over RTCP) about all the RTP Streams in the RTP Session. The 891 bandwidth of the RTCP control channel is shared among all 892 Participants within an RTP Session. 894 Characteristics: 896 o An RTP Session can carry one or more RTP Streams. 898 o An RTP Session shares a single SSRC space as defined in RFC3550 899 [RFC3550]. That is, the Endpoints participating in an RTP Session 900 can see an SSRC identifier transmitted by any of the other 901 Endpoints. An Endpoint can receive an SSRC either as SSRC or as a 902 Contributing Source (CSRC) in RTP and RTCP packets, as defined by 903 the Endpoints' network interconnection topology. 905 o An RTP Session uses at least two Media Transports 906 (Section 2.1.13), one for sending and one for receiving. 907 Commonly, the receiving Media Transport is the reverse direction 908 of the Media Transport used for sending. An RTP Session may use 909 many Media Transports and these define the session's network 910 interconnection topology. 912 o A single Media Transport always carries a single RTP Session. 914 o Multiple RTP Sessions can be conceptually related, for example 915 originating from or targeted for the same Participant 916 (Section 2.2.3) or Endpoint (Section 2.2.1), or by containing RTP 917 Streams that are somehow related (Section 3). 919 2.2.3. Participant 921 A Participant is an entity reachable by a single signaling address, 922 and is thus related more to the signaling context than to the media 923 context. 925 Characteristics: 927 o A single signaling-addressable entity, using an application- 928 specific signaling address space, for example a SIP URI. 930 o A Participant can participate in several Multimedia Sessions 931 (Section 2.2.4). 933 o A Participant can be comprised of several associated Endpoints 934 (Section 2.2.1). 936 2.2.4.
Multimedia Session 938 A Multimedia Session is an association among a group of Participants 939 (Section 2.2.3) engaged in the communication via one or more RTP 940 Sessions (Section 2.2.2). It defines logical relationships among 941 Media Sources (Section 2.1.4) that appear in multiple RTP Sessions. 943 Characteristics: 945 o A Multimedia Session can be composed of several RTP Sessions with 946 potentially multiple RTP Streams per RTP Session. 948 o Each Participant in a Multimedia Session can have a multitude of 949 Media Captures and Media Rendering devices. 951 o A single Multimedia Session can contain media from one or more 952 Synchronization Contexts (Section 3.1). An example of that is a 953 Multimedia Session containing one set of audio and video for 954 communication purposes belonging to one Synchronization Context, 955 and another set of audio and video for presentation purposes (like 956 playing a video file) with a separate Synchronization Context that 957 has no strong timing relationship and need not be strictly 958 synchronized with the audio and video used for communication. 960 2.2.5. Communication Session 962 A Communication Session is an association among two or more 963 Participants (Section 2.2.3) communicating with each other via one or 964 more Multimedia Sessions (Section 2.2.4). 966 Characteristics: 968 o Each Participant in a Communication Session is identified via an 969 application-specific signaling address. 971 o A Communication Session is composed of Participants that share at 972 least one Multimedia Session, involving one or more parallel RTP 973 Sessions with potentially multiple RTP Streams per RTP Session. 975 For example, in a full mesh communication, the Communication Session 976 consists of a set of separate Multimedia Sessions between each pair 977 of Participants. 
Another example is a centralized conference, where 978 the Communication Session consists of a set of Multimedia Sessions 979 between each Participant and the conference handler. 981 3. Concepts of Inter-Relations 983 This section uses the concepts from previous sections, and looks at 984 different types of relationships among them. These relationships 985 occur at different abstraction levels and for different purposes, but 986 the reason for the needed relationship at a certain step in the media 987 handling chain may exist at another step. For example, the use of 988 Simulcast (Section 3.7) implies a need to determine relations at the RTP 989 Stream level, but the underlying reason is that multiple Media 990 Encoders use the same Media Source, i.e., the need is to identify a 991 common Media Source. 993 3.1. Synchronization Context 995 A Synchronization Context defines a requirement for a strong timing 996 relationship between the Media Sources, typically requiring alignment 997 of clock sources. Such a relationship can be identified in multiple 998 ways as listed below. A single Media Source can only belong to a 999 single Synchronization Context, since it is assumed that a single 1000 Media Source can only have a single media clock and requiring 1001 alignment to several Synchronization Contexts (and thus reference 1002 clocks) will effectively merge those into a single Synchronization 1003 Context. 1005 3.1.1. RTCP CNAME 1007 RFC3550 [RFC3550] describes inter-media synchronization between RTP 1008 Sessions based on RTCP CNAME, RTP and Network Time Protocol (NTP) 1009 [RFC5905] formatted timestamps of a reference clock. As indicated in 1010 [RFC7273], despite using NTP format timestamps, it is not required 1011 that the clock be synchronized to an NTP source. 1013 3.1.2.
Clock Source Signaling 1015 [RFC7273] provides a mechanism to signal the clock source in SDP for 1016 both the reference clock and the media clock, thus allowing a 1017 Synchronization Context to be defined beyond the one defined by the 1018 usage of CNAME source descriptions. 1020 3.1.3. Implicitly via RtcMediaStream 1022 The WebRTC WG defines "RtcMediaStream" with one or more 1023 "RtcMediaStreamTracks". All tracks in an "RtcMediaStream" are 1024 intended to be synchronized when rendered, implying that they must be 1025 generated such that synchronization is possible. 1027 3.1.4. Explicitly via SDP Mechanisms 1029 The SDP Grouping Framework [RFC5888] defines an m= line (Section 4.2) 1030 grouping mechanism called "Lip Synchronization (LS)" for establishing 1031 the synchronization requirement across m= lines when they map to 1032 individual sources. 1034 Source-Specific Media Attributes in SDP [RFC5576] extends the above 1035 mechanism when multiple Media Sources are described by a single m= 1036 line. 1038 3.2. Endpoint 1040 Some applications require knowledge of what Media Sources originate 1041 from a particular Endpoint (Section 2.2.1). This knowledge can inform 1042 decisions such as packet routing between parts of the topology, which 1043 depend on knowing the Endpoint origin of the RTP Streams. 1045 In RTP, this identification has been overloaded with the 1046 Synchronization Context (Section 3.1) through the usage of the RTCP 1047 source description CNAME (Section 3.1.1). This works for some 1048 usages, but in others it breaks down. For example, if an Endpoint 1049 has two sets of Media Sources that have different Synchronization 1050 Contexts, like the audio and video of the human Participant as well 1051 as a set of Media Sources of audio and video for a shared movie, 1052 CNAME would not be an appropriate identification for that Endpoint. 1053 Therefore, an Endpoint may have multiple CNAMEs.
The CNAMEs or the 1054 Media Sources themselves can be related to the Endpoint. 1056 3.3. Participant 1058 In communication scenarios, it is commonly necessary to know which Media 1059 Sources originate from which Participant (Section 2.2.3). One reason 1060 is, for example, to enable the application to display Participant 1061 Identity information correctly associated with the Media Sources. 1062 This association is handled through the signaling solution to point 1063 at a specific Multimedia Session where the Media Sources may be 1064 explicitly or implicitly tied to a particular Endpoint. 1066 Participant information becomes more problematic due to Media Sources 1067 that are generated through mixing or other conceptual processing of 1068 Raw Streams or Source Streams that originate from different 1069 Participants. This type of Media Source can thus have a dynamically 1070 varying set of origins and Participants. RTP contains the concept of 1071 Contributing Sources (CSRC) that carry information about the 1072 previous-step origin of the included media content at the RTP level. 1074 3.4. RtcMediaStream 1076 An RtcMediaStream in WebRTC is an explicit grouping of a set of Media 1077 Sources (RtcMediaStreamTracks) that share a common identifier and a 1078 single Synchronization Context (Section 3.1). 1080 3.5. Single- and Multi-Session Transmission of Dependent Streams 1082 Scalable media coding formats, for example H.264-based 1083 Scalable Video Coding [RFC6190], have two modes of operation: 1085 1. In Single Session Transmission (SST), the SVC Media Encoder sends 1086 Encoded Streams (Section 2.1.7) and Dependent Streams 1087 (Section 2.1.8) as a single RTP Stream (Section 2.1.10) in a 1088 single RTP Session (Section 2.2.2), using the SVC RTP Payload 1089 format. 1091 2. In Multi-Session Transmission (MST), the SVC Media Encoder sends 1092 Encoded Streams and Dependent Streams distributed across multiple 1093 RTP Streams in one or more RTP Sessions.
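As a rough illustration of the two modes listed above, the following Python sketch labels a set of RTP Streams that carry the layers of one Media Source. The function name and the (RTP Session identifier, SSRC) tuple representation are assumptions made for this example only and do not come from [RFC6190].

```python
def transmission_mode(streams):
    """Label how one Media Source's SVC layers are carried.

    streams: iterable of (rtp_session_id, ssrc) pairs, one per RTP Stream.
    Both fields are hypothetical stand-ins chosen for this sketch.
    """
    unique_streams = set(streams)
    if not unique_streams:
        raise ValueError("at least one RTP Stream is required")
    # SST: everything is carried in a single RTP Stream (one RTP Session).
    # MST: the layers are spread over several RTP Streams.
    return "SST" if len(unique_streams) == 1 else "MST"


# One stream in one session, then the same SSRC reused in two sessions.
print(transmission_mode([("video", 0x1234)]))
print(transmission_mode([("video-base", 0x1234), ("video-enh", 0x1234)]))
```

The sketch follows the list items literally; it does not try to resolve the ambiguity in how deployments interpret the two modes.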
1095 SST denotes one RTP Stream (SSRC) per Media Source in a single RTP 1096 Session. MST denotes one or more RTP Streams (SSRC) per Media Source 1097 in each of multiple RTP Sessions. The above is not unambiguously 1098 specified in the SVC payload format text [RFC6190], but it is what 1099 existing deployments of that RFC have implemented. 1101 The use of the term "RTP Session" in the SST/MST definition is 1102 somewhat misleading, since a single RTP Session can contain multiple 1103 RTP Streams. Also, it is sometimes useful to make a distinction 1104 between using a single Media Transport or multiple separate Media 1105 Transports when (in both cases) using multiple RTP Streams to carry 1106 Encoded Streams and Dependent Streams for a Media Source. Therefore, 1107 herein the following new terminology is defined: 1109 SRST: Single RTP Stream on a Single Media Transport 1111 MRST: Multiple RTP Streams on a Single Media Transport 1113 MRMT: Multiple RTP Streams on Multiple Media Transports 1115 3.6. Multi-Channel Audio 1117 There exist a number of RTP payload formats that can carry multi- 1118 channel audio, despite the codec being a mono encoder. Multi-channel 1119 audio can be viewed as multiple Media Sources sharing a common 1120 Synchronization Context. These are independently encoded by a Media 1121 Encoder and the different Encoded Streams are packetized together in 1122 a time synchronized way into a single Source RTP Stream, using the 1123 used codec's RTP Payload format. Examples of codecs that support 1124 multi-channel audio are PCMA and PCMU [RFC3551], AMR [RFC4867], and 1125 G.719 [RFC5404]. 1127 3.7. Simulcast 1129 A Media Source represented as multiple independent Encoded Streams 1130 constitutes a Simulcast or Multiple Description Coding of that Media 1131 Source. Figure 7 below shows an example of a Media Source that is 1132 encoded into three separate Simulcast streams, that are in turn sent 1133 on the same Media Transport flow. 
When using Simulcast, the RTP 1134 Streams may be sharing RTP Session and Media Transport, or be 1135 separated on different RTP Sessions and Media Transports, or any 1136 combination of these two. It is other considerations that affect 1137 which usage is desirable, as discussed in Section 3.13. 1139 +----------------+ 1140 | Media Source | 1141 +----------------+ 1142 Source Stream | 1143 +----------------------+----------------------+ 1144 | | | 1145 V V V 1146 +------------------+ +------------------+ +------------------+ 1147 | Media Encoder | | Media Encoder | | Media Encoder | 1148 +------------------+ +------------------+ +------------------+ 1149 | Encoded | Encoded | Encoded 1150 | Stream | Stream | Stream 1151 V V V 1152 +------------------+ +------------------+ +------------------+ 1153 | Media Packetizer | | Media Packetizer | | Media Packetizer | 1154 +------------------+ +------------------+ +------------------+ 1155 | Source | Source | Source 1156 | RTP | RTP | RTP 1157 | Stream | Stream | Stream 1158 +-----------------+ | +-----------------+ 1159 | | | 1160 V V V 1161 +-------------------+ 1162 | Media Transport | 1163 +-------------------+ 1165 Figure 7: Example of Media Source Simulcast 1167 The Simulcast relation between the RTP Streams is the common Media 1168 Source. In addition, to be able to identify the common Media Source, 1169 a receiver of the RTP Stream may need to know which configuration or 1170 encoding goals that lay behind the produced Encoded Stream and its 1171 properties. This to enable selection of the stream that is most 1172 useful in the application at that moment. 1174 3.8. Layered Multi-Stream 1176 Layered Multi-Stream (LMS) is a mechanism by which different portions 1177 of a layered encoding of a Source Stream are sent using separate RTP 1178 Streams (sometimes in separate RTP Sessions). LMSs are useful for 1179 receiver control of layered media. 
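One way a receiver might exercise this control is to decide how many layers to receive for a given bitrate budget. The Python sketch below assumes a simple linear dependency chain (layer n depends on layer n-1) and uses hypothetical SSRC values and bitrates; real layered codecs can have richer dependency structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerStream:
    ssrc: int   # identifies the RTP Stream (hypothetical value)
    layer: int  # 0 = base Encoded Stream, 1.. = Dependent Streams
    kbps: int   # assumed bitrate contributed by this layer alone


def select_layers(streams, budget_kbps):
    """Return the base layer plus as many consecutive layers as fit."""
    chosen = []
    used = 0
    for stream in sorted(streams, key=lambda s: s.layer):
        # Stop at a missing layer or when the budget would be exceeded,
        # because every higher layer depends on the ones below it.
        if stream.layer != len(chosen) or used + stream.kbps > budget_kbps:
            break
        chosen.append(stream)
        used += stream.kbps
    return chosen


layers = [
    LayerStream(ssrc=0x1111, layer=0, kbps=300),
    LayerStream(ssrc=0x2222, layer=1, kbps=500),
    LayerStream(ssrc=0x3333, layer=2, kbps=1200),
]
print([s.layer for s in select_layers(layers, budget_kbps=1000)])
```

The selected RTP Streams are the ones the receiver would then subscribe to or request; how that request is signaled is outside the scope of this sketch.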
1181 A Media Source represented as an Encoded Stream and multiple 1182 Dependent Streams constitutes a Media Source that has layered 1183 dependencies. The figure below represents an example of a Media 1184 Source that is encoded into three dependent layers, where two layers 1185 are sent on the same Media Transport using different RTP Streams, 1186 i.e. SSRCs, and the third layer is sent on a separate Media 1187 Transport. 1189 +----------------+ 1190 | Media Source | 1191 +----------------+ 1192 | 1193 | 1194 V 1195 +---------------------------------------------------------+ 1196 | Media Encoder | 1197 +---------------------------------------------------------+ 1198 | | | 1199 Encoded Stream Dependent Stream Dependent Stream 1200 | | | 1201 V V V 1202 +----------------+ +----------------+ +----------------+ 1203 |Media Packetizer| |Media Packetizer| |Media Packetizer| 1204 +----------------+ +----------------+ +----------------+ 1205 | | | 1206 RTP Stream RTP Stream RTP Stream 1207 | | | 1208 +------+ +------+ | 1209 | | | 1210 V V V 1211 +-----------------+ +-----------------+ 1212 | Media Transport | | Media Transport | 1213 +-----------------+ +-----------------+ 1215 Figure 8: Example of Media Source Layered Dependency 1217 As an example, the SVC MRST and MRMT (Section 3.5) relations needs to 1218 identify the common Media Encoder origin for the Encoded and 1219 Dependent Streams. The SVC RTP Payload RFC [RFC6190] is not 1220 particularly explicit about how this relation is to be implemented. 1221 When using different RTP Sessions, thus different Media Transports 1222 (MRMT (Section 3.5)), and as long as there is only one RTP Stream per 1223 Media Encoder and a single Media Source in each RTP Session (MRMT), 1224 common SSRC and CNAMEs can be used to identify the common Media 1225 Source. 
When multiple RTP Streams are sent from one Media Encoder in 1226 the same RTP Session (MRST), then CNAME is the only currently 1227 specified RTP identifier that can be used. In cases where multiple 1228 Media Encoders use multiple Media Sources sharing Synchronization 1229 Context, and thus having a common CNAME, additional heuristics or 1230 identification need to be applied to create the MRST or MRMT 1231 relationships between the RTP Streams. 1233 3.9. RTP Stream Duplication 1235 RTP Stream Duplication [RFC7198], using the same or different Media 1236 Transports, and optionally also delaying the duplicate [RFC7197], 1237 offers a simple way to protect media flows from packet loss in some 1238 cases. It is a specific type of redundancy and all but one Source 1239 RTP Stream (Section 2.1.10) are effectively Redundancy RTP Streams 1240 (Section 2.1.12), but since both Source and Redundant RTP Streams are 1241 the same it does not matter which one is which. This can also be 1242 seen as a specific type of Simulcast (Section 3.7) that transmits the 1243 same Encoded Stream (Section 2.1.7) multiple times. 1245 +----------------+ 1246 | Media Source | 1247 +----------------+ 1248 Source Stream | 1249 V 1250 +----------------+ 1251 | Media Encoder | 1252 +----------------+ 1253 Encoded Stream | 1254 +-----------+-----------+ 1255 | | 1256 V V 1257 +------------------+ +------------------+ 1258 | Media Packetizer | | Media Packetizer | 1259 +------------------+ +------------------+ 1260 Source | RTP Stream Source | RTP Stream 1261 | V 1262 | +-------------+ 1263 | | Delay (opt) | 1264 | +-------------+ 1265 | | 1266 +-----------+-----------+ 1267 | 1268 V 1269 +-------------------+ 1270 | Media Transport | 1271 +-------------------+ 1273 Figure 9: Example of RTP Stream Duplication 1275 3.10. Redundancy Format 1277 The RTP Payload for Redundant Audio Data [RFC2198] defines a 1278 transport for redundant audio data together with primary data in the 1279 same RTP payload. 
The redundant data can be a time delayed version 1280 of the primary or another time delayed Encoded Stream using a 1281 different Media Encoder to encode the same Media Source as the 1282 primary, as depicted below in Figure 10. 1284 +--------------------+ 1285 | Media Source | 1286 +--------------------+ 1287 | 1288 Source Stream 1289 | 1290 +------------------------+ 1291 | | 1292 V V 1293 +--------------------+ +--------------------+ 1294 | Media Encoder | | Media Encoder | 1295 +--------------------+ +--------------------+ 1296 | | 1297 | +------------+ 1298 Encoded Stream | Time Delay | 1299 | +------------+ 1300 | | 1301 | +------------------+ 1302 V V 1303 +--------------------+ 1304 | Media Packetizer | 1305 +--------------------+ 1306 | 1307 V 1308 RTP Stream 1310 Figure 10: Concept for usage of Audio Redundancy with different Media 1311 Encoders 1313 The Redundancy format is thus providing the necessary meta 1314 information to correctly relate different parts of the same Encoded 1315 Stream, or in the case depicted above (Figure 10) relate the Received 1316 Source Stream fragments coming out of different Media Decoders to be 1317 able to combine them together into a less erroneous Source Stream. 1319 3.11. RTP Retransmission 1321 Figure 11 shows an example where a Media Source's Source RTP Stream 1322 is protected by a retransmission (RTX) flow [RFC4588]. In this 1323 example the Source RTP Stream and the Redundancy RTP Stream share the 1324 same Media Transport. 
1326 +--------------------+ 1327 | Media Source | 1328 +--------------------+ 1329 | 1330 V 1331 +--------------------+ 1332 | Media Encoder | 1333 +--------------------+ 1334 | Retransmission 1335 Encoded Stream +--------+ +---- Request 1336 V | V V 1337 +--------------------+ | +--------------------+ 1338 | Media Packetizer | | | RTP Retransmission | 1339 +--------------------+ | +--------------------+ 1340 | | | 1341 +------------+ Redundancy RTP Stream 1342 Source RTP Stream | 1343 | | 1344 +---------+ +---------+ 1345 | | 1346 V V 1347 +-----------------+ 1348 | Media Transport | 1349 +-----------------+ 1351 Figure 11: Example of Media Source Retransmission Flows 1353 The RTP Retransmission example (Figure 11) illustrates that this 1354 mechanism works purely on the Source RTP Stream. The RTP 1355 Retransmission transform buffers the sent Source RTP Stream and, upon 1356 request, emits a retransmitted packet with an extra payload header as 1357 a Redundancy RTP Stream. The RTP Retransmission mechanism [RFC4588] 1358 is specified such that there is a one to one relation between the 1359 Source RTP Stream and the Redundancy RTP Stream. Therefore, a 1360 Redundancy RTP Stream needs to be associated with its Source RTP 1361 Stream. This is done based on CNAME selectors and heuristics to 1362 match requested packets for a given Source RTP Stream with the 1363 original sequence number in the payload of any new Redundancy RTP 1364 Stream using the RTX payload format. In cases where the Redundancy 1365 RTP Stream is sent in a separate RTP Session from the Source RTP 1366 Stream, these sessions are related, which is signaled by using the 1367 SDP Media Grouping's [RFC5888] FID semantics. 1369 3.12. Forward Error Correction 1371 The figure below (Figure 12) shows an example where two Media 1372 Sources' Source RTP Streams are protected by FEC. Source RTP Stream 1373 A has a RTP-based Redundancy transformation in FEC Encoder 1. 
This 1374 produces a Redundancy RTP Stream 1, that is only related to Source 1375 RTP Stream A. The FEC Encoder 2, however, takes two Source RTP 1376 Streams (A and B) and produces a Redundancy RTP Stream 2 that 1377 protects them jointly, i.e. Redundancy RTP Stream 2 relates to two 1378 Source RTP Streams (a FEC group). FEC decoding, when needed due to 1379 packet loss or packet corruption at the receiver, requires knowledge 1380 about which Source RTP Streams that the FEC encoding was based on. 1382 In Figure 12 all RTP Streams are sent on the same Media Transport. 1383 This is however not the only possible choice. Numerous combinations 1384 exist for spreading these RTP Streams over different Media Transports 1385 to achieve the communication application's goal. 1387 +--------------------+ +--------------------+ 1388 | Media Source A | | Media Source B | 1389 +--------------------+ +--------------------+ 1390 | | 1391 V V 1392 +--------------------+ +--------------------+ 1393 | Media Encoder A | | Media Encoder B | 1394 +--------------------+ +--------------------+ 1395 | | 1396 Encoded Stream Encoded Stream 1397 V V 1398 +--------------------+ +--------------------+ 1399 | Media Packetizer A | | Media Packetizer B | 1400 +--------------------+ +--------------------+ 1401 | | 1402 Source RTP Stream A Source RTP Stream B 1403 | | 1404 +-----+---------+-------------+ +---+---+ 1405 | V V V | 1406 | +---------------+ +---------------+ | 1407 | | FEC Encoder 1 | | FEC Encoder 2 | | 1408 | +---------------+ +---------------+ | 1409 | Redundancy | Redundancy | | 1410 | RTP Stream 1 | RTP Stream 2 | | 1411 V V V V 1412 +----------------------------------------------------------+ 1413 | Media Transport | 1414 +----------------------------------------------------------+ 1416 Figure 12: Example of FEC Redundancy RTP Streams 1418 As FEC Encoding exists in various forms, the methods for relating FEC 1419 Redundancy RTP Streams with its source information in Source RTP 1420 
Streams are many. The XOR based RTP FEC Payload format [RFC5109] is 1421 defined in such a way that a Redundancy RTP Stream has a one to one 1422 relation with a Source RTP Stream. In fact, the RFC requires the 1423 Redundancy RTP Stream to use the same SSRC as the Source RTP Stream. 1424 This requires either using a separate RTP Session or using the 1425 Redundancy RTP Payload format [RFC2198]. The underlying relation 1426 requirement for this FEC format and a particular Redundancy RTP 1427 Stream is to know the related Source RTP Stream, including its SSRC. 1429 3.13. RTP Stream Separation 1431 RTP Streams can be separated exclusively based on their SSRCs, at the 1432 RTP Session level, or at the Multimedia Session level. 1434 When the RTP Streams that have a relationship are all sent in the 1435 same RTP Session and are uniquely identified based on their SSRC 1436 only, it is termed an SSRC-Only Based Separation. Such streams can 1437 be related via RTCP CNAME to identify that the streams belong to the 1438 same Endpoint. SSRC-based approaches [RFC5576], when used, can 1439 explicitly relate various such RTP Streams. 1441 On the other hand, when RTP Streams that are related are sent in 1442 the context of different RTP Sessions to achieve separation, it is 1443 known as RTP Session-based separation. This is commonly used when 1444 the different RTP Streams are intended for different Media 1445 Transports. 1447 Several mechanisms that use RTP Session-based separation rely on it 1448 to enable an implicit grouping mechanism expressing the relationship. 1449 The solutions have been based on using the same SSRC value in the 1450 different RTP Sessions to implicitly indicate their relation. That 1451 way, no explicit RTP-level mechanism has been needed; only 1452 signaling-level relations have been established using semantics from the Grouping 1453 of Media Lines framework [RFC5888].
   Examples of this are RTP Retransmission [RFC4588], SVC Multi-Session
   Transmission [RFC6190], and XOR-based FEC [RFC5109]. RTCP CNAME
   explicitly relates RTP Streams across different RTP Sessions, as
   explained in the previous section. Such a relationship can be used to
   perform inter-media synchronization.

   RTP Streams that are related and need to be associated can be part of
   different Multimedia Sessions, rather than just different RTP
   Sessions within the same Multimedia Session context. This puts
   further demands on the scope of the mechanism(s) and their handling
   of identifiers used for expressing the relationships.

3.14.  Multiple RTP Sessions over one Media Transport

   [I-D.westerlund-avtcore-transport-multiplexing] describes a mechanism
   that allows several RTP Sessions to be carried over a single
   underlying Media Transport. The main reasons for doing this are
   related to the impact of using one or more Media Transports (which
   may share a common network path or potentially use different ones).
   The fewer Media Transports used, the less need there is for NAT/FW
   traversal resources and flow-based QoS.

   However, Multiple RTP Sessions over one Media Transport imply that a
   single Media Transport 5-tuple is not sufficient to express in which
   RTP Session context a particular RTP Stream exists. Complexities in
   the relationship between Media Transports and RTP Sessions already
   exist, as one RTP Session contains multiple Media Transports; for
   example, even a Peer-to-Peer RTP Session with RTP/RTCP Multiplexing
   requires two Media Transports, one in each direction. The
   relationship between Media Transports and RTP Sessions, as well as
   additional levels of identifiers, needs to be considered both in
   signaling design and when defining terminology.

4.  Mapping from Existing Terms

   This section describes a selected set of terms from some relevant
   IETF RFCs and Internet-Drafts (at the time of writing), using the
   concepts from previous sections.

4.1.  Telepresence Terms

   The terms in this sub-section are used in the context of CLUE
   Telepresence [I-D.ietf-clue-framework].

4.1.1.  Audio Capture

   Describes an audio Media Source (Section 2.1.4).

4.1.2.  Capture Device

   Identifies a physical entity performing a Media Capture
   (Section 2.1.2) transformation.

4.1.3.  Capture Encoding

   Describes an Encoded Stream (Section 2.1.7) related to CLUE-specific
   semantic information.

4.1.4.  Capture Scene

   Describes a set of spatially related Media Sources (Section 2.1.4).

4.1.5.  Endpoint

   Describes exactly one Participant (Section 2.2.3) and one or more
   Endpoints (Section 2.2.1).

4.1.6.  Individual Encoding

   Describes the configuration information needed to perform a Media
   Encoder (Section 2.1.6) transformation.

4.1.7.  Media Capture

   Describes either a Media Capture (Section 2.1.2) or a Media Source
   (Section 2.1.4), depending on the context in which the term is used.

4.1.8.  Media Consumer

   Describes the media receiving part of an Endpoint (Section 2.2.1).

4.1.9.  Media Provider

   Describes the media sending part of an Endpoint (Section 2.2.1).

4.1.10.  Stream

   Describes an RTP Stream (Section 2.1.10).

4.1.11.  Video Capture

   Describes a video Media Source (Section 2.1.4).

4.2.  Media Description

   A single Session Description Protocol (SDP) [RFC4566] media
   description (or media block; an m-line and all subsequent lines until
   the next m-line or the end of the SDP) describes part of the
   necessary configuration and identification information needed for a
   Media Encoder transformation, as well as the necessary configuration
   and identification information for the Media Decoder to be able to
   correctly interpret a received RTP Stream.

   A Media Description typically relates to a single Media Source. This
   is, for example, an explicit restriction in WebRTC. However, nothing
   prevents the same Media Description (and the same RTP Session) from
   being reused for multiple Media Sources
   [I-D.ietf-avtcore-rtp-multi-stream]. It can thus describe properties
   of one or more RTP Streams, and can also describe properties valid
   for an entire RTP Session (via [RFC5576] mechanisms, for example).

4.3.  Media Stream

   RTP [RFC3550] uses the terms media stream, audio stream, video
   stream, and stream of (RTP) packets interchangeably; all of these are
   RTP Streams.

4.4.  Multimedia Conference

   A Multimedia Conference is a Communication Session (Section 2.2.5)
   between two or more Participants (Section 2.2.3), along with the
   software they are using to communicate.

4.5.  Multimedia Session

   SDP [RFC4566] defines a Multimedia Session as a set of multimedia
   senders and receivers and the data streams flowing from senders to
   receivers, which would correspond to a set of Endpoints and the RTP
   Streams that flow between them. In this memo, Multimedia Session
   (Section 2.2.4) also assumes those Endpoints belong to a set of
   Participants that are engaged in communication via a set of related
   RTP Streams.

   RTP [RFC3550] defines a Multimedia Session as a set of concurrent RTP
   Sessions among a common group of Participants.
   For example, a video conference may contain an audio RTP Session and
   a video RTP Session. This would correspond to a group of Participants
   (each using one or more Endpoints) sharing a set of concurrent RTP
   Sessions. In this memo, Multimedia Session also defines those RTP
   Sessions to have some relation and be part of a communication among
   the Participants.

4.6.  Multipoint Control Unit (MCU)

   This term is commonly used to describe the central node in any type
   of star topology [I-D.ietf-avtcore-rtp-topologies-update] conference.
   It describes a device that includes one Participant (Section 2.2.3)
   (usually corresponding to a so-called conference focus) and one or
   more related Endpoints (Section 2.2.1) (sometimes one or more per
   conference Participant).

4.7.  Recording Device

   WebRTC specifications use this term to refer to locally available
   entities performing a Media Capture (Section 2.1.2) transformation.

4.8.  RtcMediaStream

   A WebRTC RtcMediaStream is a set of Media Sources (Section 2.1.4)
   sharing the same Synchronization Context (Section 3.1).

4.9.  RtcMediaStreamTrack

   A WebRTC RtcMediaStreamTrack is a Media Source (Section 2.1.4).

4.10.  RTP Sender

   RTP [RFC3550] uses this term, which can be seen as the RTP protocol
   part of a Media Packetizer (Section 2.1.9).

4.11.  RTP Session

   Within the context of SDP, a single m= line can map to a single RTP
   Session (Section 2.2.2), or multiple m= lines can map to a single RTP
   Session. The latter is enabled via multiplexing schemes such as
   BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation], which allows
   mapping of multiple m= lines to a single RTP Session.

4.12.  SSRC

   RTP [RFC3550] defines this as "the source of a stream of RTP
   packets", which indicates that an SSRC is not only a unique
   identifier for the Encoded Stream (Section 2.1.7) carried in those
   packets, but is also effectively used as a term to denote a Media
   Packetizer (Section 2.1.9).

5.  Security Considerations

   This document simply tries to clarify the confusion prevalent in RTP
   taxonomy because of inconsistent usage by multiple technologies and
   protocols making use of RTP. It does not introduce any new security
   considerations beyond those already well documented in RTP [RFC3550]
   and in each of the many respective specifications of the various
   protocols making use of it.

   Hopefully, having a well-defined common terminology and understanding
   of the complexities of the RTP architecture will help lead us to
   better standards, avoiding security problems.

6.  Acknowledgement

   This document has many concepts borrowed from several documents such
   as WebRTC [I-D.ietf-rtcweb-overview], CLUE [I-D.ietf-clue-framework],
   and Multiplexing Architecture
   [I-D.westerlund-avtcore-transport-multiplexing]. The authors would
   like to thank all the authors of each of those documents.

   The authors would also like to acknowledge the insights, guidance and
   contributions of Magnus Westerlund, Roni Even, Paul Kyzivat, Colin
   Perkins, Keith Drage, Harald Alvestrand, Alex Eleftheriadis, Mo
   Zanaty, and Stephan Wenger.

7.  Contributors

   Magnus Westerlund has contributed the concept model for the media
   chain using transformations and streams model, including rewriting
   pre-existing concepts into this model and adding missing concepts.
   The first proposal for updating the relationships and the topologies
   based on this concept was also made by Magnus.

8.  IANA Considerations

   This document makes no request of IANA.

9.  Informative References

   [I-D.ietf-avtcore-rtp-multi-stream]
              Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
              "Sending Multiple Media Streams in a Single RTP Session",
              draft-ietf-avtcore-rtp-multi-stream-06 (work in progress),
              October 2014.

   [I-D.ietf-avtcore-rtp-topologies-update]
              Westerlund, M. and S. Wenger, "RTP Topologies",
              draft-ietf-avtcore-rtp-topologies-update-05 (work in
              progress), November 2014.

   [I-D.ietf-clue-framework]
              Duckworth, M., Pepperell, A., and S. Wenger, "Framework
              for Telepresence Multi-Streams",
              draft-ietf-clue-framework-19 (work in progress),
              December 2014.

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings,
              "Negotiating Media Multiplexing Using the Session
              Description Protocol (SDP)",
              draft-ietf-mmusic-sdp-bundle-negotiation-14 (work in
              progress), December 2014.

   [I-D.ietf-rtcweb-overview]
              Alvestrand, H., "Overview: Real Time Protocols for
              Browser-based Applications", draft-ietf-rtcweb-overview-13
              (work in progress), November 2014.

   [I-D.westerlund-avtcore-transport-multiplexing]
              Westerlund, M. and C. Perkins, "Multiplexing Multiple RTP
              Sessions onto a Single Lower-Layer Transport",
              draft-westerlund-avtcore-transport-multiplexing-07 (work
              in progress), October 2013.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J., Vega-Garcia, A., and S.
              Fosse-Parisis, "RTP Payload for Redundant Audio Data",
              RFC 2198, September 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC4867]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie,
              "RTP Payload Format and File Storage Format for the
              Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband
              (AMR-WB) Audio Codecs", RFC 4867, April 2007.

   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
              Correction", RFC 5109, December 2007.

   [RFC5404]  Westerlund, M. and I. Johansson, "RTP Payload Format for
              G.719", RFC 5404, January 2009.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

   [RFC5905]  Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network
              Time Protocol Version 4: Protocol and Algorithms
              Specification", RFC 5905, June 2010.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              May 2011.

   [RFC7160]  Petit-Huguenin, M. and G. Zorn, "Support for Multiple
              Clock Rates in an RTP Session", RFC 7160, April 2014.

   [RFC7197]  Begen, A., Cai, Y., and H. Ou, "Duplication Delay
              Attribute in the Session Description Protocol", RFC 7197,
              April 2014.

   [RFC7198]  Begen, A. and C. Perkins, "Duplicating RTP Streams",
              RFC 7198, April 2014.

   [RFC7273]  Williams, A., Gross, K., van Brandenburg, R., and H.
              Stokking, "RTP Clock Source Signalling", RFC 7273,
              June 2014.

Appendix A.  Changes From Earlier Versions

   NOTE TO RFC EDITOR: Please remove this section prior to publication.

A.1.  Modifications Between WG Version -03 and -04

   o  Changed "Media Redundancy" and "Media Repair" to "RTP-based
      Redundancy" and "RTP-based Repair", since those terms are more
      specific and correct.

   o  Changed "End Point" to "Endpoint" and removed Editor's Note on
      this.

   o  Clarified that a Media Capture may impose constraints on clock
      handling.

   o  Clarified that mixing multiple Raw Streams into a Source Stream is
      not possible, since that requires mixed streams to have a timing
      relation, requiring them to be Source Streams, and added an
      example.

   o  Clarified that RTP-based Redundancy excludes the type of encoding
      redundancy found within the encoded media format in an Encoded
      Stream.

   o  Clarified that a Media Transport contains only a single RTP
      Session, but a single RTP Session can span multiple Media
      Transports.

   o  Clarified that packets with seemingly correct checksum that are
      received by a Media Transport Receiver may still be corrupt.

   o  Clarified that a corrupt packet in a Media Transport Receiver is
      typically either discarded or somehow marked and passed on in the
      Received RTP Stream.

   o  Added Synchronization Context to Figure 6.

   o  Editorial improvements and clarifications.

A.2.  Modifications Between WG Version -02 and -03

   o  Changed section 3.5, removing SST-SS/MS and MST-SS/MS, replacing
      them with SRST, MRST, and MRMT.

   o  Updated section 3.8 to align with terminology changes in section
      3.5.

   o  Added a new section 4.12, describing the term Multimedia
      Conference.

   o  Changed reference from I-D to now published RFC 7273.

   o  Editorial improvements and clarifications.

A.3.  Modifications Between WG Version -01 and -02

   o  Major re-structure

   o  Moved media chain Media Transport detailing up one section level

   o  Collapsed level 2 sub-sections of section 3 and thus moved level 3
      sub-sections up one level, gathering some introductory text into
      the beginning of section 3

   o  Added that not only SSRC collision, but also a clock rate change
      [RFC7160] is a valid reason to change SSRC value for an RTP stream

   o  Added a sub-section on clock source signaling

   o  Added a sub-section on RTP stream duplication

   o  Elaborated a bit in section 2.2.1 on the relation between End
      Points, Participants and CNAMEs

   o  Elaborated a bit in section 2.2.4 on Multimedia Session and
      synchronization contexts

   o  Removed the section on CLUE scenes defining an implicit
      synchronization context, since it was incorrect

   o  Clarified text on SVC SST and MST according to list discussions

   o  Removed the entire topology section to avoid possible
      inconsistencies or duplications with
      draft-ietf-avtcore-rtp-topologies-update, but saved one example
      overview figure of Communication Entities into that section

   o  Added a section 4 on mapping from existing terms with one
      sub-section per term, mainly by moving text from sections 2 and 3

   o  Changed all occurrences of Packet Stream to RTP Stream

   o  Moved all normative references to informative, since this is an
      informative document

   o  Added references to RFC 7160, RFC 7197 and RFC 7198, and removed
      unused references

A.4.  Modifications Between WG Version -00 and -01

   o  WG version -00 text is identical to individual draft -03

   o  Amended description of SVC SST and MST encodings with respect to
      concepts defined in this text

   o  Removed UML as normative reference, since the text no longer uses
      any UML notation

   o  Removed a number of level 4 sections and moved out text to the
      level above

A.5.  Modifications Between Version -02 and -03

   o  Section 4 rewritten (and new communication topologies added) to
      reflect the major updates to Sections 1-3

   o  Section 8 removed (carryover from initial -00 draft)

   o  General clean up of text, grammar and nits

A.6.  Modifications Between Version -01 and -02

   o  Section 2 rewritten to add both streams and transformations in the
      media chain.

   o  Section 3 rewritten to focus on exposing relationships.

A.7.  Modifications Between Version -00 and -01

   o  Too many to list

   o  Added new authors

   o  Updated content organization and presentation

Authors' Addresses

   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ 07601
   US

   Email: jonathan@vidyo.com

   Kevin Gross
   AVA Networks, LLC
   Boulder, CO
   US

   Email: kevin.gross@avanw.com

   Suhas Nandakumar
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US

   Email: snandaku@cisco.com

   Gonzalo Salgueiro
   Cisco Systems
   7200-12 Kit Creek Road
   Research Triangle Park, NC 27709
   US

   Email: gsalguei@cisco.com

   Bo Burman
   Ericsson
   Kistavagen 25
   SE-164 80 Stockholm
   Sweden

   Phone: +46 10 714 13 11
   Email: bo.burman@ericsson.com