Network Working Group                                          J. Lennox
Internet-Draft                                                     Vidyo
Intended status: Informational                                  K. Gross
Expires: May 10, 2014                                                AVA
                                                           S. Nandakumar
                                                            G. Salgueiro
                                                           Cisco Systems
                                                               B. Burman
                                                                Ericsson
                                                       November 06, 2013


   A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport
                         Protocol (RTP) Sources
               draft-ietf-avtext-rtp-grouping-taxonomy-00

Abstract

   The terminology about, and associations among, Real-Time Transport
   Protocol (RTP) sources can be complex and somewhat opaque.  This
   document describes a number of existing and proposed relationships
   among RTP sources, and attempts to define common terminology for
   discussing protocol entities and their relationships.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 10, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Concepts
     2.1.  Media Chain
       2.1.1.  Physical Stimulus
       2.1.2.  Media Capture
       2.1.3.  Raw Stream
       2.1.4.  Media Source
       2.1.5.  Source Stream
       2.1.6.  Media Encoder
       2.1.7.  Encoded Stream
       2.1.8.  Dependent Stream
       2.1.9.  Media Packetizer
       2.1.10. Packet Stream
       2.1.11. Media Redundancy
       2.1.12. Redundancy Packet Stream
       2.1.13. Media Transport
       2.1.14. Received Packet Stream
       2.1.15. Received Redundancy Packet Stream
       2.1.16. Media Repair
       2.1.17. Repaired Packet Stream
       2.1.18. Media Depacketizer
       2.1.19. Received Encoded Stream
       2.1.20. Media Decoder
       2.1.21. Received Source Stream
       2.1.22. Media Sink
       2.1.23. Received Raw Stream
       2.1.24. Media Render
     2.2.  Communication Entities
       2.2.1.  End Point
       2.2.2.  RTP Session
       2.2.3.  Participant
       2.2.4.  Multimedia Session
       2.2.5.  Communication Session
   3.  Relations at Different Levels
     3.1.  Media Source Relations
       3.1.1.  Synchronization Context
       3.1.2.  End Point
       3.1.3.  Participant
       3.1.4.  WebRTC MediaStream
     3.2.  Packetization Time Relations
       3.2.1.  Single Stream Transport of SVC
       3.2.2.  Multi-Channel Audio
       3.2.3.  Redundancy Format
     3.3.  Packet Stream Relations
       3.3.1.  Simulcast
       3.3.2.  Layered Multi-Stream Transmission
       3.3.3.  Robustness and Repair
       3.3.4.  Packet Stream Separation
     3.4.  Multiple RTP Sessions over one Media Transport
   4.  Topologies and Communication Entities
     4.1.  Point-to-Point Communication
     4.2.  Central Conferencing
     4.3.  Full Mesh Conferencing
     4.4.  Source-Specific Multicast
   5.  Security Considerations
   6.  Acknowledgement
   7.  Contributors
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Changes From Earlier Versions
     A.1.  Modifications Between Version -02 and -03
     A.2.  Modifications Between Version -01 and -02
     A.3.  Modifications Between Version -00 and -01
   Authors' Addresses

1.  Introduction

   The existing taxonomy of sources in RTP is often regarded as
   confusing and inconsistent.  Consequently, a deep understanding of
   how the different terms relate to each other becomes a real
   challenge.  Frequently cited examples of this confusion are (1) how
   different protocols that make use of RTP use the same terms to
   signify different things and (2) how the complexities addressed at
   one layer are often glossed over or ignored at another.

   This document attempts to provide some clarity by reviewing the
   semantics of various aspects of sources in RTP.  As an organizing
   mechanism, it approaches this by describing various ways that RTP
   sources can be grouped and associated together.

   All non-specific references to ControLling mUltiple streams for
   tElepresence (CLUE) in this document map to [I-D.ietf-clue-framework]
   and all references to Web Real-Time Communications (WebRTC) map to
   [I-D.ietf-rtcweb-overview].

2.  Concepts

   This section defines concepts that serve to identify and name various
   transformations and streams in a given RTP usage.  For each concept,
   an attempt is made to list any alternate definitions and usages that
   co-exist today, along with various characteristics that further
   describe the concept.  These concepts are divided into two
   categories: one related to the chain of streams and transformations
   that media can be subject to, the other for entities involved in the
   communication.

2.1.  Media Chain

   This section contains the concepts that can be involved in taking a
   sequence of physical-world stimuli (sound waves, photons, keystrokes)
   at the sender side and transporting them to a receiver, which may
   recover a sequence of physical stimuli.  This chain of concepts is of
   two main types: streams and transformations.  Streams are time-based
   sequences of samples of the physical stimulus in various
   representations, while transformations change the representation of
   the streams in some way.

   The examples below are basic ones, and it is important to keep in
   mind that this conceptual model enables more complex usages.  Some
   of these are discussed further in later sections of this document.
   In general, the following applies to this model:

   o  A transformation may have zero or more inputs and one or more
      outputs.

   o  A Stream is of some type.

   o  A Stream has one source transformation and one or more sink
      transformations (with the exception of Physical Stimulus
      (Section 2.1.1), which can have no source or sink
      transformation).

   o  Streams can be forwarded from a transformation output to any
      number of inputs on other transformations that support that type.

   o  If the output of a transformation is sent to multiple
      transformations, those streams will be identical; it takes a
      transformation to make them different.

   o  There are no formal limitations on how streams are connected to
      transformations; this may include loops if required by a
      particular transformation.

   It is also important to remember that this is a conceptual model.
   Thus, real-world implementations may look different and have a
   different structure.

   To provide a basic understanding of the relationships in the chain,
   we first introduce the concepts for the sender side (Figure 1).  This
   covers the chain from the physical stimulus until media packets are
   emitted onto the network.

            Physical Stimulus
                    |
                    V
          +--------------------+
          |    Media Capture   |
          +--------------------+
                    |
               Raw Stream
                    V
          +--------------------+
          |    Media Source    |<- Synchronization Timing
          +--------------------+
                    |
              Source Stream
                    V
          +--------------------+
          |    Media Encoder   |
          +--------------------+
                    |
             Encoded Stream      +-----------+
                    V            |           V
          +--------------------+ | +--------------------+
          |  Media Packetizer  | | |  Media Redundancy  |
          +--------------------+ | +--------------------+
                    |            |           |
                    +------------+ Redundancy Packet Stream
          Source Packet Stream               |
                    V                        V
          +--------------------+   +--------------------+
          |   Media Transport  |   |   Media Transport  |
          +--------------------+   +--------------------+

            Figure 1: Sender Side Concepts in the Media Chain

   In Figure 1, we have included a branched chain to cover the concepts
   for using redundancy to improve the reliability of the transport.
   The Media Transport concept is an aggregate that is decomposed in
   Section 2.1.13.2.

   Below we review a receiver media chain (Figure 2) matching the sender
   side, to look at the inverse transformations and their attempts to
   recover streams possibly identical to those in the sender chain.
   Note that the streams out of a reverse transformation, like the
   Source Stream out of the Media Decoder, are in many cases not the
   same as the corresponding ones on the sender side; thus, they are
   prefixed with "Received" to denote a potentially modified version.
   They differ because some transformations are irreversible.  For
   example, lossy source coding in the Media Encoder prevents the Source
   Stream out of the Media Decoder from being the same as the one fed
   into the Media Encoder.  Other reasons include packet loss or late
   loss in the Media Transport transformation that even Media Repair, if
   used, fails to repair.  It should be noted that some transformations
   are not always present, such as Media Repair, which cannot operate
   without Redundancy Packet Streams.

          +--------------------+   +--------------------+
          |   Media Transport  |   |   Media Transport  |
          +--------------------+   +--------------------+
                    |                        |
         Received Packet Stream   Received Redundancy PS
                    |                        |
                    |    +-------------------+
                    V    V
          +--------------------+
          |    Media Repair    |
          +--------------------+
                    |
         Repaired Packet Stream
                    V
          +--------------------+
          | Media Depacketizer |
          +--------------------+
                    |
         Received Encoded Stream
                    V
          +--------------------+
          |    Media Decoder   |
          +--------------------+
                    |
         Received Source Stream
                    V
          +--------------------+
          |     Media Sink     |--> Synchronization Information
          +--------------------+
                    |
          Received Raw Stream
                    V
          +--------------------+
          |   Media Renderer   |
          +--------------------+
                    |
                    V
            Physical Stimulus

           Figure 2: Receiver Side Concepts of the Media Chain

2.1.1.  Physical Stimulus

   The physical stimulus is a physical event that can be captured and
   provided as media to a receiver.
   This includes sound waves making up audio, photons in a visible
   light field, or other excitations of or interactions with sensors,
   like keystrokes on a keyboard.

2.1.2.  Media Capture

   The process of transforming the Physical Stimulus (Section 2.1.1)
   into captured media.  The Media Capture performs a digital sampling
   of the physical stimulus, usually periodically, and outputs this in
   some representation as a Raw Stream (Section 2.1.3).  Because this
   data is sampled periodically, or at least consists of timed
   asynchronous events, it forms a stream of media data.  The Media
   Capture is normally instantiated in some type of device, i.e., a
   media capture device.  Examples of different types of media capture
   devices are digital cameras, microphones connected to A/D
   converters, or keyboards.

2.1.2.1.  Alternate Usages

   The CLUE WG uses the term "Capture Device" to identify a physical
   capture device.

   The WebRTC WG uses the term "Recording Device" to refer to the
   locally available capture devices in an end-system.

2.1.2.2.  Characteristics

   o  A Media Capture is identified either by hardware/manufacturer ID
      or via a session-scoped device identifier as mandated by the
      application usage.

   o  A Media Capture can generate an Encoded Stream (Section 2.1.7) if
      the capture device supports such a configuration.

2.1.3.  Raw Stream

   The time-progressing stream of digitally sampled information,
   usually periodically sampled, provided by a Media Capture
   (Section 2.1.2).

2.1.4.  Media Source

   A Media Source is the logical source of a time-progressing digital
   media stream synchronized to a reference clock, called a Source
   Stream (Section 2.1.5).  This transformation takes one or more Raw
   Streams (Section 2.1.3) and provides a Source Stream as output.  This
   output has been synchronized with some reference clock, even if only
   a system-local wall clock.
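
   As a minimal sketch of the Media Source transformation described
   above, the following Python models a Raw Stream as a list of
   periodically captured sample values and tags each sample with a time
   on a reference clock, producing a Source Stream.  The names
   SourceSample, media_source, and mixing_media_source are hypothetical.
   The sketch assumes that the reference-clock time of the first raw
   sample is known, and that all mixed Raw Streams share one sample rate
   and start time.  The mixing variant corresponds to the conceptual
   audio-mixer Media Source discussed in this section.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class SourceSample:
    """One sample of a Source Stream: a value tagged with reference-clock time."""
    ref_time: float  # seconds on the (assumed) reference clock
    value: float     # the digitally sampled value taken from the Raw Stream


def media_source(raw: Iterable[float], sample_rate: float,
                 capture_start: float) -> Iterator[SourceSample]:
    """Turn a periodically sampled Raw Stream into a Source Stream.

    capture_start is the reference-clock time of the first raw sample;
    sample n is stamped capture_start + n / sample_rate.
    """
    for index, value in enumerate(raw):
        yield SourceSample(capture_start + index / sample_rate, value)


def mixing_media_source(raws: List[List[float]], sample_rate: float,
                        capture_start: float) -> Iterator[SourceSample]:
    """Conceptual Media Source that sums several Raw Streams into one.

    Assumes all inputs share sample_rate and capture_start; mixing
    stops at the end of the shortest input.
    """
    mixed = (sum(frame) for frame in zip(*raws))
    return media_source(mixed, sample_rate, capture_start)
```

   The reference clock here is just a number of seconds; a real Media
   Source would read it from a system wall clock or another clock
   source, and a real mixer would also handle differing sample rates.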

   The output can be of different types.  One type is directly
   associated with a particular Media Capture's Raw Stream.  Others are
   more conceptual sources, like an audio mix of multiple Raw Streams
   (Figure 3), a mix of the three loudest inputs in terms of speech
   activity, or a selection of a particular video based on the current
   speaker; such conceptual sources are typically based on other Media
   Sources.

         Raw       Raw       Raw
        Stream    Stream    Stream
          |         |         |
          V         V         V
      +--------------------------+
      |       Media Source       |<-- Reference Clock
      |          Mixer           |
      +--------------------------+
                    |
                    V
              Source Stream

        Figure 3: Conceptual Media Source in form of Audio Mixer

2.1.4.1.  Alternate Usages

   The CLUE WG uses the term "Media Capture" for this purpose.  A CLUE
   Media Capture is identified via indexed notation.  The terms Audio
   Capture and Video Capture are used to identify Audio Sources and
   Video Sources, respectively.  Concepts such as "Capture Scene",
   "Capture Scene Entry", and "Capture" provide a flexible framework to
   represent media captured spanning spatial regions.

   The WebRTC WG defines the term "RtcMediaStreamTrack" to refer to a
   Media Source.  An "RtcMediaStreamTrack" is identified by the ID
   attribute.

   Typically, a Media Source is mapped to a single m=line via the
   Session Description Protocol (SDP) [RFC4566] unless mechanisms such
   as Source-Specific attributes are in place [RFC5576].  In the latter
   cases, an m=line can represent multiple Media Sources, multiple
   Packet Streams (Section 2.1.10), or both.

2.1.4.2.  Characteristics

   o  At any point, it can represent a physically captured source or a
      conceptual source.

2.1.5.  Source Stream

   A time-progressing stream of digital samples that has been
   synchronized with a reference clock and comes from a particular
   Media Source (Section 2.1.4).

2.1.6.  Media Encoder

   A Media Encoder is a transformation that is responsible for encoding
   the media data from a Source Stream (Section 2.1.5) into another
   representation, usually more compact, that is output as an Encoded
   Stream (Section 2.1.7).

   The Media Encoder step commonly includes pre-encoding
   transformations, such as scaling, resampling, etc.  The Media Encoder
   can have a significant number of configuration options that affect
   the properties of the encoded stream.  These include properties such
   as bit rate, start points for decoding, resolution, bandwidth, or
   other fidelity-affecting properties.  In many communication systems,
   the codec actually used, and not only its parameters, is also an
   important factor.

   Scalable Media Encoders need special mention, as they produce
   multiple outputs that are potentially of different types.  A
   scalable Media Encoder takes one input Source Stream and encodes it
   into multiple output streams of two different types: at least one
   Encoded Stream that is independently decodable, and one or more
   Dependent Streams (Section 2.1.8) that require at least one Encoded
   Stream and zero or more Dependent Streams in order to be decoded.  A
   Dependent Stream's dependency is one of the grouping relations this
   document discusses further in Section 3.3.2.

              Source Stream
                    |
                    V
      +--------------------------+
      |  Scalable Media Encoder  |
      +--------------------------+
          |         |   ...   |
          V         V         V
       Encoded  Dependent Dependent
        Stream    Stream    Stream

         Figure 4: Scalable Media Encoder Input and Outputs

2.1.6.1.  Alternate Usages

   Within the SDP usage, an SDP media description (m=line) describes
   part of the necessary configuration required for encoding purposes.

   CLUE's "Capture Encoding" provides specific encoding configuration
   for this purpose.

2.1.6.2.  Characteristics

   o  A Media Source can be multiply encoded by different Media Encoders
      to provide various encoded representations.

2.1.7.  Encoded Stream

   A stream of time-synchronized encoded media that can be
   independently decoded.

2.1.7.1.  Characteristics

   o  Due to temporal dependencies, an Encoded Stream may have
      limitations in where decoding can be started.  These entry points,
      for example Intra frames from a video encoder, may require
      identification, and their generation may be event-based or
      configured to occur periodically.

2.1.8.  Dependent Stream

   A stream of time-synchronized encoded media fragments that depend on
   one or more Encoded Streams (Section 2.1.7) and zero or more
   Dependent Streams in order to be decoded.

2.1.8.1.  Characteristics

   o  Each Dependent Stream has a set of dependencies.  These
      dependencies must be understood by the parties in a multimedia
      session that intend to use a Dependent Stream.

2.1.9.  Media Packetizer

   The transformation that takes one or more Encoded Streams
   (Section 2.1.7) or Dependent Streams (Section 2.1.8), puts their
   content into one or more sequences of packets, normally RTP packets,
   and outputs Source Packet Streams (Section 2.1.10).  This step
   includes generating both RTP payloads and RTP packets.

   The Media Packetizer can use multiple inputs when producing a single
   Packet Stream.  One such example is SVC packetization: in the Single
   Stream Transport (SST) usage of the payload format, both an Encoded
   Stream and Dependent Streams are packetized in a single Source Packet
   Stream using a single SSRC.

   The Media Packetizer can also produce multiple Packet Streams, for
   example when Encoded and/or Dependent Streams are distributed over
   multiple Packet Streams, possibly in different RTP sessions.

2.1.9.1.  Alternate Usages

   An RTP sender is part of the Media Packetizer.
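
   The following Python sketch illustrates what a Media Packetizer that
   emits a single Source Packet Stream does: it fragments each encoded
   frame into RTP payloads and prepends the 12-byte RTP fixed header of
   [RFC3550] (version, marker, payload type, sequence number, timestamp,
   SSRC).  The class name MediaPacketizer, the naive fixed-size
   fragmentation, and setting the marker bit on the last fragment of a
   frame are assumptions for illustration only; real RTP payload
   formats define their own fragmentation and marker-bit rules.

```python
import struct
from typing import List

RTP_VERSION = 2


def rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
               marker: bool = False) -> bytes:
    """Pack the 12-byte RTP fixed header (RFC 3550, Section 5.1).

    Padding, extension, and CSRC count are all zero in this sketch.
    """
    first = RTP_VERSION << 6                             # V=2, P=0, X=0, CC=0
    second = (int(marker) << 7) | (payload_type & 0x7F)  # M bit + 7-bit PT
    return struct.pack("!BBHII", first, second, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


class MediaPacketizer:
    """Hypothetical packetizer emitting one Source Packet Stream (one SSRC)."""

    def __init__(self, ssrc: int, payload_type: int, first_seq: int = 0,
                 max_payload: int = 1200) -> None:
        self.ssrc = ssrc
        self.payload_type = payload_type
        self.seq = first_seq & 0xFFFF
        self.max_payload = max_payload

    def packetize(self, frame: bytes, timestamp: int) -> List[bytes]:
        """Split one encoded frame into RTP packets sharing one timestamp."""
        chunks = [frame[i:i + self.max_payload]
                  for i in range(0, len(frame), self.max_payload)] or [b""]
        packets = []
        for n, chunk in enumerate(chunks):
            last = n == len(chunks) - 1  # marker set on the frame's last packet
            header = rtp_header(self.payload_type, self.seq, timestamp,
                                self.ssrc, marker=last)
            packets.append(header + chunk)
            self.seq = (self.seq + 1) & 0xFFFF  # 16-bit sequence wrap-around
        return packets
```

   Because all packets carry the same SSRC and share one sequence
   number and timestamp space, they form one Packet Stream in the sense
   of Section 2.1.10.  A packetizer producing several Packet Streams
   would keep one such state per SSRC.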

2.1.9.2.  Characteristics

   o  The Media Packetizer selects which Synchronization Source(s)
      (SSRC) [RFC3550], in which RTP sessions, are used.

   o  The Media Packetizer can combine multiple Encoded or Dependent
      Streams into one or more Packet Streams.

2.1.10.  Packet Stream

   A stream of RTP packets containing media data, either source or
   redundant.  The Packet Stream is identified by an SSRC belonging to a
   particular RTP session.  The RTP session is identified as discussed
   in Section 2.2.2.

   A Source Packet Stream is a packet stream containing at least some
   content from an Encoded Stream.  Source material is any media
   material that is produced for transport over RTP without any
   additional redundancy applied to cope with network transport losses.
   Compare this with the Redundancy Packet Stream (Section 2.1.12).

2.1.10.1.  Alternate Usages

   The term "Stream" is used by the CLUE WG to define an encoded Media
   Source sent via RTP.  "Capture Encoding" and "Encoding Groups" are
   defined to capture specific details of the encoding scheme.

   RFC3550 [RFC3550] uses the terms media stream, audio stream, video
   stream, and streams of (RTP) packets interchangeably.  It defines the
   SSRC as "The source of a stream of RTP packets, ..."

   The equivalent mapping of a Packet Stream in SDP [RFC4566] is defined
   per usage.  For example, each Media Description (m=line) and its
   associated attributes can describe one Packet Stream, properties for
   multiple Packet Streams, or properties for an RTP session (for
   example, via [RFC5576] mechanisms).

2.1.10.2.  Characteristics

   o  Each Packet Stream is identified by a unique Synchronization
      Source (SSRC) [RFC3550] that is carried in every RTP and RTP
      Control Protocol (RTCP) packet header in a specific RTP session
      context.

   o  At any given point in time, a Packet Stream can have one and only
      one SSRC.

   o  Each Packet Stream defines a unique RTP sequence numbering and
      timing space.

   o  Several Packet Streams may map to a single Media Source via the
      source transformations.

   o  Several Packet Streams can be carried over a single RTP Session.

2.1.11.  Media Redundancy

   Media Redundancy is a transformation that generates redundant or
   repair packets sent out as a Redundancy Packet Stream to mitigate
   network transport impairments, like packet loss and delay.

   Media Redundancy exists in many flavors: it may generate independent
   Repair Streams that are used in addition to the Source Stream (RTP
   Retransmission [RFC4588] and some FEC [RFC5109]); it may generate a
   new Source Stream by combining redundancy information with source
   information (using XOR FEC [RFC5109] as a redundancy payload
   [RFC2198]); or it may completely replace the source information with
   only redundancy packets.

2.1.12.  Redundancy Packet Stream

   A Packet Stream (Section 2.1.10) that contains no original source
   data, only redundant data that may be combined with one or more
   Received Packet Streams (Section 2.1.14) to produce Repaired Packet
   Streams (Section 2.1.17).

2.1.13.  Media Transport

   A Media Transport defines the transformation that the Packet Streams
   (Section 2.1.10) are subjected to by the end-to-end transport from
   one RTP sender to one specific RTP receiver (an RTP session may
   contain multiple RTP receivers per sender).  Each Media Transport is
   defined by a transport association that is identified by a 5-tuple
   (source address, source port, destination address, destination port,
   transport protocol).  Each transport association normally contains
   only a single RTP session, although a proposal exists for sending
   multiple RTP sessions over one transport association
   [I-D.westerlund-avtcore-transport-multiplexing].

2.1.13.1.  Characteristics

   o  A Media Transport transmits Packet Streams of RTP packets from a
      source transport address to a destination transport address.

2.1.13.2.  Media Stream Decomposition

   The Media Transport concept sometimes needs to be decomposed into
   more steps to enable discussion of what a sender emits that gets
   transformed by the network before it is received by the receiver.
   Thus, we also provide this Media Transport decomposition (Figure 5).

              Packet Stream
                    |
                    V
      +--------------------------+
      |  Media Transport Sender  |
      +--------------------------+
                    |
           Sent Packet Stream
                    V
      +--------------------------+
      |    Network Transport     |
      +--------------------------+
                    |
        Transported Packet Stream
                    V
      +--------------------------+
      | Media Transport Receiver |
      +--------------------------+
                    |
                    V
         Received Packet Stream

              Figure 5: Decomposition of Media Transport

2.1.13.2.1.  Media Transport Sender

   The first transformation within the Media Transport (Section 2.1.13)
   is the Media Transport Sender, where the sending End Point
   (Section 2.2.1) takes a Packet Stream and emits the packets onto the
   network using the transport association established for this Media
   Transport, thus creating a Sent Packet Stream (Section 2.1.13.2.2).
   In this process, it transforms the Packet Stream in several ways.
   First, the packets gain the necessary protocol headers for the
   transport association, for example IP and UDP headers, thus forming
   IP/UDP/RTP packets.  In addition, the Media Transport Sender may
   queue, pace, or otherwise affect how the packets are emitted onto the
   network, thus adding the delay, jitter, and inter-packet spacing that
   characterize the Sent Packet Stream.

2.1.13.2.2.  Sent Packet Stream

   The Sent Packet Stream is the Packet Stream as it enters the first
   hop of the network path to its destination.
   The Sent Packet Stream is identified using network transport
   addresses, such as, for IP/UDP, the 5-tuple (source IP address,
   source port, destination IP address, destination port, and protocol
   (UDP)).

2.1.13.2.3.  Network Transport

   Network Transport is the transformation that the Sent Packet Stream
   (Section 2.1.13.2.2) is subjected to by traveling from the source to
   the destination through the network.  These transformations include
   loss of some packets, varying per-packet delay, packet duplication,
   and packet header or data corruption.  These transformations produce
   a Transported Packet Stream (Section 2.1.13.2.4) at the exit of the
   network path.

2.1.13.2.4.  Transported Packet Stream

   The Packet Stream that is emitted out of the network path at the
   destination, subjected to the Network Transport's transformation
   (Section 2.1.13.2.3).

2.1.13.2.5.  Media Transport Receiver

   The receiving End Point's (Section 2.2.1) transformation of the
   Transported Packet Stream (Section 2.1.13.2.4) by its reception
   process, which results in the Received Packet Stream
   (Section 2.1.14).  This transformation includes verifying transport
   checksums and discarding any corrupted packet whose checksum does not
   match.  Other transformations can include delay variations between
   receiving a packet on the network interface and providing it to the
   application.

2.1.14.  Received Packet Stream

   The Packet Stream (Section 2.1.10) resulting from the Media
   Transport's transformation, i.e., subjected to packet loss, packet
   corruption, packet duplication, and varying transmission delay from
   sender to receiver.

2.1.15.  Received Redundancy Packet Stream

   The Redundancy Packet Stream (Section 2.1.12) resulting from the
   Media Transport's transformation, i.e., subjected to packet loss,
   packet corruption, and varying transmission delay from sender to
   receiver.

2.1.16.  Media Repair

   A transformation that takes as input one or more Source Packet
   Streams (Section 2.1.10) as well as Redundancy Packet Streams
   (Section 2.1.12) and attempts to combine them to counter the
   transformations introduced by the Media Transport (Section 2.1.13),
   in order to minimize the difference between the Source Stream
   (Section 2.1.5) and the Received Source Stream (Section 2.1.21)
   after the Media Decoder (Section 2.1.20).  The output is a Repaired
   Packet Stream (Section 2.1.17).

2.1.17.  Repaired Packet Stream

   A Received Packet Stream (Section 2.1.14) for which Received
   Redundancy Packet Stream (Section 2.1.15) information has been used
   to try to re-create the Packet Stream (Section 2.1.10) as it was
   before the Media Transport (Section 2.1.13).

2.1.18.  Media Depacketizer

   A Media Depacketizer takes one or more Packet Streams
   (Section 2.1.10), depacketizes them, and attempts to reconstitute the
   Encoded Streams (Section 2.1.7) or Dependent Streams (Section 2.1.8)
   present in those Packet Streams.

2.1.19.  Received Encoded Stream

   The received version of an Encoded Stream (Section 2.1.7).

2.1.20.  Media Decoder

   A Media Decoder is a transformation that is responsible for decoding
   Encoded Streams (Section 2.1.7) and any Dependent Streams
   (Section 2.1.8) into a Source Stream (Section 2.1.5).

2.1.20.1.  Alternate Usages

   Within the context of SDP, an m=line describes the necessary
   configuration and identification (RTP Payload Types) required to
   decode one or more incoming Media Streams.

2.1.20.2.  Characteristics

   o  A Media Decoder is the entity that will have to deal with any
      errors in the encoded streams that result from corruption or from
      failures to repair packet losses.  This is because a media decoder
      is generally forced to produce some output periodically; it thus
      commonly includes concealment methods.

2.1.21.  Received Source Stream

   The received version of a Source Stream (Section 2.1.5).

2.1.22.  Media Sink

   The Media Sink receives a Source Stream (Section 2.1.5) that
   contains, usually periodically, sampled media data together with
   associated synchronization information.  Depending on the
   application, this Source Stream then needs to be transformed into a
   Raw Stream (Section 2.1.3) that is sent, in synchronization with the
   output from other Media Sinks, to a Media Render (Section 2.1.24).
   The Media Sink may also be connected with a Media Source
   (Section 2.1.4) and be used as part of a conceptual Media Source.

2.1.22.1.  Characteristics

   o  The Media Sink can further transform the Source Stream into a
      representation that is suitable for rendering on the Media Render
      as defined by the application or system-wide configuration.  This
      includes sample scaling, level adjustments, etc.

2.1.23.  Received Raw Stream

   The received version of a Raw Stream (Section 2.1.3).

2.1.24.  Media Render

   A Media Render takes a Raw Stream (Section 2.1.3) and converts it
   into Physical Stimulus (Section 2.1.1) that a human user can
   perceive.  Examples of such devices are screens, and D/A converters
   connected to amplifiers and loudspeakers.

2.1.24.1.  Characteristics

   o  An End Point can potentially have multiple Media Renders for each
      media type.

2.2.  Communication Entities

   This section contains concepts for entities involved in the
   communication.

2.2.1.  End Point

   A single addressable entity sending or receiving RTP packets.  It may
   be decomposed into several functional blocks, but as long as it
   behaves as a single RTP stack entity, it is classified as a single
   "End Point".

2.2.1.1.  Alternate Usages

   The CLUE Working Group (WG) uses the terms "Media Provider" and
   "Media Consumer" to describe aspects of an End Point pertaining to
   sending and receiving functionality.

2.2.1.2.  Characteristics

   End Points can be identified in several different ways.  While RTCP
   Canonical Names (CNAMEs) [RFC3550] provide a globally unique and
   stable identification mechanism for the duration of the Communication
   Session (see Section 2.2.5), their validity applies exclusively
   within a Synchronization Context (Section 3.1.1).  Thus, one End
   Point can have multiple CNAMEs.  Therefore, mechanisms outside the
   scope of RTP, such as application-defined mechanisms, must be used to
   ensure End Point identification when outside this Synchronization
   Context.

2.2.2.  RTP Session

   An RTP session is an association among a group of participants
   communicating with RTP.  It is a group communications channel that
   can potentially carry a number of Packet Streams.  Within an RTP
   session, every participant can find meta-data and control
   information (over RTCP) about all the Packet Streams in the RTP
   session.  The bandwidth of the RTCP control channel is shared between
   all participants within an RTP Session.

2.2.2.1.  Alternate Usages

   Within the context of SDP, a single m=line can map to a single RTP
   Session, or multiple m=lines can map to a single RTP Session.  The
   latter is enabled via multiplexing schemes such as BUNDLE
   [I-D.ietf-mmusic-sdp-bundle-negotiation], which allows mapping of
   multiple m=lines to a single RTP Session.

2.2.2.2.  Characteristics

   o  Typically, an RTP Session can carry one or more Packet Streams.

   o  An RTP Session shares a single SSRC space as defined in RFC3550
      [RFC3550].  That is, the End Points participating in an RTP
      Session can see an SSRC identifier transmitted by any of the other
      End Points.  An End Point can receive an SSRC either as an SSRC or
      as a Contributing Source (CSRC) in RTP and RTCP packets, as
      defined by the End Points' network interconnection topology.

   o  An RTP Session uses at least two Media Transports
      (Section 2.1.13), one for sending and one for receiving.
      Commonly, the receiving one is the reverse direction of the one
      used for sending. An RTP Session may use many Media Transports,
      and these define the session's network interconnection topology.
      A single Media Transport normally cannot transport more than one
      RTP Session, unless a solution for multiplexing multiple RTP
      Sessions over a single Media Transport is used. One example of
      such a scheme is Multiple RTP Sessions on a Single Lower-Layer
      Transport [I-D.westerlund-avtcore-transport-multiplexing].

   o  Multiple RTP Sessions can be related.

2.2.3.  Participant

   A participant is an entity reachable by a single signaling address,
   and is thus related more to the signaling context than to the media
   context.

2.2.3.1.  Characteristics

   o  A single signaling-addressable entity, using an application-specific
      signaling address space, for example, a SIP URI.

   o  A participant can have several Multimedia Sessions
      (Section 2.2.4).

   o  A participant can have several associated transport flows,
      including several separate local transport addresses for those
      transport flows.

2.2.4.  Multimedia Session

   A Multimedia Session is an association among a group of participants
   engaged in communication via one or more RTP Sessions
   (Section 2.2.2). It defines logical relationships among Media
   Sources (Section 2.1.4) that appear in multiple RTP Sessions.

2.2.4.1.  Alternate Usages

   RFC4566 [RFC4566] defines a multimedia session as a set of multimedia
   senders and receivers and the data streams flowing from senders to
   receivers.

   RFC3550 [RFC3550] defines it as a set of concurrent RTP sessions
   among a common group of participants.
   For example, a video conference (which is a multimedia session) may
   contain an audio RTP session and a video RTP session.

2.2.4.2.  Characteristics

   o  A Multimedia Session can be composed of several parallel RTP
      Sessions with potentially multiple Packet Streams per RTP Session.

   o  Each participant in a Multimedia Session can have a multitude of
      Media Captures and Media Rendering devices.

2.2.5.  Communication Session

   A Communication Session is an association among a group of
   participants communicating with each other via a set of Multimedia
   Sessions.

2.2.5.1.  Alternate Usages

   The Session Description Protocol (SDP) [RFC4566] defines a multimedia
   session as a set of multimedia senders and receivers and the data
   streams flowing from senders to receivers. In that definition,
   however, it is not clear whether a multimedia session includes both
   the sender's and the receiver's view of the same RTP Packet Stream.

2.2.5.2.  Characteristics

   o  Each participant in a Communication Session is identified via an
      application-specific signaling address.

   o  A Communication Session is composed of at least one Multimedia
      Session per participant, involving one or more parallel RTP
      Sessions with potentially multiple Packet Streams per RTP Session.

   For example, in a full-mesh communication, the Communication Session
   consists of a set of separate Multimedia Sessions between each pair
   of Participants. Another example is a centralized conference, where
   the Communication Session consists of a set of Multimedia Sessions
   between each Participant and the conference handler.

3.  Relations at Different Levels

   This section uses the concepts from the previous section and looks at
   different types of relationships among them. These relationships
   occur at different levels and for different purposes. The section is
   organized according to the level at which a relation is required.
   The reason for the relationship may exist at another step in the
   media handling chain. For example, Simulcast (discussed in
   Section 3.3.1) needs to determine relations at the Packet Stream
   level; however, the reason to relate the Packet Streams is that
   multiple Media Encoders use the same Media Source, i.e., the relation
   is needed to identify a common Media Source.

3.1.  Media Source Relations

   Media Sources (Section 2.1.4) are commonly grouped and related to an
   End Point (Section 2.2.1) or a Participant (Section 2.2.3). This
   occurs for several reasons, including both application logic and
   media handling purposes. These cases are further discussed below.

3.1.1.  Synchronization Context

   A Synchronization Context defines a requirement for a strong timing
   relationship between the Media Sources, typically requiring alignment
   of clock sources. Such a relationship can be identified in multiple
   ways, as listed below. A single Media Source can only belong to a
   single Synchronization Context, since it is assumed that a single
   Media Source can only have a single media clock, and requiring
   alignment to several Synchronization Contexts (and thus reference
   clocks) would effectively merge those into a single Synchronization
   Context.

   A single Multimedia Session can contain media from one or more
   Synchronization Contexts. An example of this is a Multimedia Session
   containing one set of audio and video for communication purposes
   belonging to one Synchronization Context, and another set of audio
   and video for presentation purposes (like playing a video file) with
   a separate Synchronization Context that has no strong timing
   relationship and need not be strictly synchronized with the audio and
   video used for communication.

3.1.1.1.  RTCP CNAME

   RFC3550 [RFC3550] describes inter-media synchronization between RTP
   Sessions based on the RTCP CNAME, RTP timestamps, and Network Time
   Protocol (NTP) [RFC5905] formatted timestamps of a reference clock.
   As indicated in [I-D.ietf-avtcore-clksrc], despite using NTP-format
   timestamps, it is not required that the clock be synchronized to an
   NTP source.

3.1.1.2.  Clock Source Signaling

   [I-D.ietf-avtcore-clksrc] provides a mechanism to signal the clock
   source in SDP, both for the reference clock and for the media clock,
   thus allowing a Synchronization Context to be defined beyond the one
   defined by the usage of CNAME source descriptions.

3.1.1.3.  CLUE Scenes

   In CLUE, "Capture Scene", "Capture Scene Entry", and "Captures"
   define an implied Synchronization Context.

3.1.1.4.  Implicitly via RtcMediaStream

   The WebRTC WG defines "RtcMediaStream" with one or more
   "RtcMediaStreamTracks". All tracks in an "RtcMediaStream" are
   intended to be possible to synchronize when rendered.

3.1.1.5.  Explicitly via SDP Mechanisms

   RFC5888 [RFC5888] defines an m=line grouping mechanism called "Lip
   Synchronization (LS)" for establishing the synchronization
   requirement across m=lines when they map to individual sources.

   RFC5576 [RFC5576] extends the above mechanism to the case where
   multiple media sources are described by a single m=line.

3.1.2.  End Point

   Some applications require knowledge of which Media Sources originate
   from a particular End Point (Section 2.2.1). This can include such
   decisions as packet routing between parts of the topology, which
   requires knowing the End Point origin of the Packet Streams.

   In RTP, this identification has been overloaded with the
   Synchronization Context through the usage of the source description
   CNAME item. This works for some usages, but sometimes it breaks
   down.
   For example, an End Point may have two sets of Media Sources with
   different Synchronization Contexts, such as the audio and video of
   the human participant as well as a set of audio and video Media
   Sources for a shared movie. Thus, an End Point may have multiple
   CNAMEs. The CNAMEs or the Media Sources themselves can be related to
   the End Point.

3.1.3.  Participant

   In communication scenarios, it is commonly necessary to know which
   Media Sources originate from which Participant (Section 2.2.3), for
   example, to enable the application to display Participant identity
   information correctly associated with the Media Sources. This
   association is currently handled through the signaling solution,
   which points at a specific Multimedia Session where the Media Sources
   may be explicitly or implicitly tied to a particular End Point.

   Participant information becomes more problematic for Media Sources
   that are generated through mixing or other conceptual processing of
   Raw Streams or Source Streams that originate from different
   Participants. Such Media Sources can thus have a dynamically varying
   set of origins and Participants. RTP contains the concept of
   Contributing Sources (CSRC), which carries such information about the
   previous-step origin of the included media content at the RTP level.

3.1.4.  WebRTC MediaStream

   An RtcMediaStream, in addition to requiring a single Synchronization
   Context as discussed above, is also an explicit grouping of a set of
   Media Sources, as identified by RtcMediaStreamTracks, within the
   RtcMediaStream.

3.2.  Packetization Time Relations

   At RTP packetization time, there exists a possibility for a number of
   different types of relationships between Encoded Streams
   (Section 2.1.7), Dependent Streams (Section 2.1.8), and Packet
   Streams (Section 2.1.10).
   These are caused by grouping together or distributing these different
   types of streams into Packet Streams. This section looks at such
   relationships.

3.2.1.  Single Stream Transport of SVC

   Scalable Video Coding [RFC6190] has a mode of operation where the
   Encoded Streams and Dependent Streams from the SVC Media Encoder are
   grouped together in a single Source Packet Stream using the SVC RTP
   Payload format.

3.2.2.  Multi-Channel Audio

   There exist a number of RTP payload formats that can carry
   multi-channel audio, even though the codec itself is a mono encoder.
   Multi-channel audio can be viewed as multiple Media Sources sharing a
   common Synchronization Context. These are independently encoded by a
   Media Encoder, and the different Encoded Streams are then packetized
   together in a time-synchronized way into a single Source Packet
   Stream using the codec's RTP Payload format. Examples of such codecs
   are PCMA and PCMU [RFC3551], AMR [RFC4867], and G.719 [RFC5404].

3.2.3.  Redundancy Format

   The RTP Payload for Redundant Audio Data [RFC2198] defines how one
   can transport redundant audio data together with primary data in the
   same RTP payload. The redundant data can be a time-delayed version of
   the primary or another time-delayed Encoded Stream using a different
   Media Encoder to encode the same Media Source as the primary, as
   depicted below in Figure 6.
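
   To illustrate how one payload carries these related blocks, the
   following Python sketch splits a Redundancy Format payload into its
   blocks. It assumes the [RFC2198] header layout: each redundant block
   has a 4-byte header (F bit set, 7-bit block payload type, 14-bit
   timestamp offset, 10-bit block length), and the primary block has a
   final 1-byte header (F bit clear, payload type) whose length is
   implied by the rest of the payload. The names are hypothetical and
   error handling is minimal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RedBlock:
    payload_type: int  # 7-bit RTP payload type of this block
    timestamp: int     # RTP timestamp of this block (mod 2**32)
    data: bytes        # encoded media carried in this block
    primary: bool      # True for the final (primary) block


def parse_red_payload(payload: bytes, rtp_timestamp: int) -> List[RedBlock]:
    """Split an RFC 2198 payload into its redundant and primary blocks."""
    headers = []  # (payload type, timestamp offset, length or None, is_primary)
    pos = 0
    while True:
        if pos >= len(payload):
            raise ValueError("truncated RED header")
        if payload[pos] & 0x80:
            # F=1: 4-byte header (F, 7-bit PT, 14-bit offset, 10-bit length)
            if pos + 4 > len(payload):
                raise ValueError("truncated RED header")
            word = int.from_bytes(payload[pos:pos + 4], "big")
            headers.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF,
                            word & 0x3FF, False))
            pos += 4
        else:
            # F=0: 1-byte header of the primary block; its length is implicit
            headers.append((payload[pos] & 0x7F, 0, None, True))
            pos += 1
            break
    blocks = []
    for pt, ts_offset, length, primary in headers:
        end = len(payload) if primary else pos + length
        if end > len(payload):
            raise ValueError("block overruns payload")
        blocks.append(RedBlock(pt, (rtp_timestamp - ts_offset) % 2**32,
                               payload[pos:end], primary))
        pos = end
    return blocks


# One redundant block (PT 3, 160 timestamp units older, 3 bytes), then a primary (PT 0).
hdr = ((1 << 31) | (3 << 24) | (160 << 10) | 3).to_bytes(4, "big")
for block in parse_red_payload(hdr + bytes([0]) + b"old" + b"new-frame", 1000):
    print(block)
```

   The timestamp offset is what lets a receiver relate each redundant
   block back to the earlier primary data it duplicates.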

               +--------------------+
               |    Media Source    |
               +--------------------+
                         |
                   Source Stream
                         |
             +------------------------+
             |                        |
             V                        V
   +--------------------+   +--------------------+
   |   Media Encoder    |   |   Media Encoder    |
   +--------------------+   +--------------------+
             |                        |
             |                  +------------+
       Encoded Stream           | Time Delay |
             |                  +------------+
             |                        |
             |     +------------------+
             V     V
   +--------------------+
   |  Media Packetizer  |
   +--------------------+
             |
             V
       Packet Stream

    Figure 6: Concept for usage of Audio Redundancy with different
                             Media Encoders

   The Redundancy Format thus provides the necessary meta information
   to correctly relate different parts of the same Encoded Stream, or,
   in the case depicted above (Figure 6), to relate the Received Source
   Stream fragments coming out of different Media Decoders so that they
   can be combined into a less erroneous Source Stream.

3.3.  Packet Stream Relations

   This section discusses various cases of relationships among Packet
   Streams. This is a common relation to handle in RTP, because Packet
   Streams are separate and have their own SSRCs, implying independent
   sequence number and timestamp spaces. The underlying reasons for the
   Packet Stream relationships differ, as can be seen in the cases
   below. The different Packet Streams can be handled within the same
   RTP Session or different RTP Sessions to accomplish different
   transport goals. This separation of Packet Streams is further
   discussed in Section 3.3.4.

3.3.1.  Simulcast

   A Media Source represented as multiple independent Encoded Streams
   constitutes a simulcast of that Media Source. Figure 7 below
   represents an example of a Media Source that is encoded into three
   separate and different Simulcast streams, which are in turn sent on
   the same Media Transport flow.
   When using Simulcast, the Packet
   Streams may share an RTP Session and Media Transport, be separated
   on different RTP Sessions and Media Transports, or be any combination
   of these two. Other considerations affect which usage is desirable,
   as discussed in Section 3.3.4.

                           +----------------+
                           |  Media Source  |
                           +----------------+
                     Source Stream |
            +----------------------+----------------------+
            |                      |                      |
            v                      v                      v
   +------------------+   +------------------+   +------------------+
   |  Media Encoder   |   |  Media Encoder   |   |  Media Encoder   |
   +------------------+   +------------------+   +------------------+
            | Encoded              | Encoded              | Encoded
            | Stream               | Stream               | Stream
            v                      v                      v
   +------------------+   +------------------+   +------------------+
   | Media Packetizer |   | Media Packetizer |   | Media Packetizer |
   +------------------+   +------------------+   +------------------+
            | Source               | Source               | Source
            | Packet               | Packet               | Packet
            | Stream               | Stream               | Stream
            +-----------------+    |    +-----------------+
                              |    |    |
                              V    V    V
                         +-------------------+
                         |  Media Transport  |
                         +-------------------+

                 Figure 7: Example of Media Source Simulcast

   The simulcast relation between the Packet Streams is the common Media
   Source. In addition to being able to identify the common Media
   Source, a receiver of the Packet Stream may need to know which
   configuration or encoding goals lay behind the produced Encoded
   Stream and its properties. This enables selection of the stream that
   is most useful in the application at that moment.

3.3.2.  Layered Multi-Stream Transmission

   Multi-stream transmission (MST) is a mechanism by which different
   portions of a layered encoding of a Source Stream are sent using
   separate Packet Streams (sometimes in separate RTP Sessions). MSTs
   are useful for receiver control of layered media.
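
   As an illustration of receiver control, the following Python sketch
   shows a receiver deciding which layered Packet Streams to receive
   under a bandwidth budget. The layer names and bitrates are
   hypothetical and not taken from any specification. Because each
   Dependent Stream needs every layer below it, the receiver can only
   take a base-first prefix of the layers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str          # hypothetical label of the Packet Stream carrying the layer
    bitrate_kbps: int  # hypothetical bitrate of this layer alone


def select_layers(layers: List[Layer], budget_kbps: int) -> List[Layer]:
    """Pick the longest base-first prefix of `layers` that fits the budget.

    Each Dependent Stream needs every layer below it, so the receiver can
    only drop layers from the top; it never skips a layer in the middle.
    """
    chosen: List[Layer] = []
    total = 0
    for layer in layers:
        if total + layer.bitrate_kbps > budget_kbps:
            break
        chosen.append(layer)
        total += layer.bitrate_kbps
    return chosen


layers = [Layer("base", 300), Layer("enh1", 500), Layer("enh2", 1200)]
print([layer.name for layer in select_layers(layers, 1000)])  # ['base', 'enh1']
```

   With MST, the result maps directly to receiving or dropping whole
   Packet Streams (or RTP Sessions), rather than filtering packets
   inside a single stream.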

   A Media Source represented as an Encoded Stream and multiple
   Dependent Streams constitutes a Media Source that has layered
   dependency. The figure below represents an example of a Media Source
   that is encoded into three dependent layers, where two layers are
   sent on the same Media Transport using different Packet Streams,
   i.e., SSRCs, and the third layer is sent on a separate Media
   Transport, i.e., a different RTP Session.

                        +----------------+
                        |  Media Source  |
                        +----------------+
                                |
                                |
                                V
   +---------------------------------------------------------+
   |                      Media Encoder                      |
   +---------------------------------------------------------+
            |                   |                   |
     Encoded Stream     Dependent Stream    Dependent Stream
            |                   |                   |
            V                   V                   V
    +----------------+  +----------------+  +----------------+
    |Media Packetizer|  |Media Packetizer|  |Media Packetizer|
    +----------------+  +----------------+  +----------------+
            |                   |                   |
      Packet Stream       Packet Stream       Packet Stream
            |                   |                   |
            +------+     +------+                   |
                   |     |                          |
                   V     V                          V
             +-----------------+           +-----------------+
             | Media Transport |           | Media Transport |
             +-----------------+           +-----------------+

          Figure 8: Example of Media Source Layered Dependency

   The SVC MST relation needs to identify the common Media Encoder
   origin for the Encoded and Dependent Streams. The SVC RTP Payload
   RFC [RFC6190] is not particularly explicit about how this relation
   is to be implemented. When using different RTP Sessions, and thus
   different Media Transports, and as long as there is only one Packet
   Stream per Media Encoder and a single Media Source in each RTP
   Session, a common SSRC and CNAME can be used to identify the common
   Media Source. When multiple Packet Streams are sent from one Media
   Encoder in the same RTP Session, CNAME is the only currently
   specified RTP identifier that can be used.
   In cases where multiple Media Encoders use multiple Media Sources
   sharing a Synchronization Context, and thus having a common CNAME,
   additional heuristics need to be applied to create the MST
   relationship between the Packet Streams.

3.3.3.  Robustness and Repair

   Packet Streams may be protected by Redundancy Packet Streams during
   transport. Several approaches listed below can achieve the same
   result:

   o  Duplication of the original Packet Stream,

   o  Duplication of the original Packet Stream with a time offset,

   o  Forward Error Correction (FEC) techniques, and

   o  Retransmission of lost packets (either globally or selectively).

3.3.3.1.  RTP Retransmission

   The figure below (Figure 9) represents an example where a Media
   Source's Source Packet Stream is protected by a retransmission (RTX)
   flow [RFC4588]. In this example, the Source Packet Stream and the
   Redundancy Packet Stream share the same Media Transport.

   +--------------------+
   |    Media Source    |
   +--------------------+
             |
             V
   +--------------------+
   |   Media Encoder    |
   +--------------------+
             |                                 Retransmission
      Encoded Stream      +--------+     +---- Request
             V            |        V     V
   +--------------------+ | +--------------------+
   |  Media Packetizer  | | | RTP Retransmission |
   +--------------------+ | +--------------------+
             |            |           |
             +------------+ Redundancy Packet Stream
   Source Packet Stream               |
             |                        |
             +---------+    +---------+
                       |    |
                       V    V
                +-----------------+
                | Media Transport |
                +-----------------+

         Figure 9: Example of Media Source Retransmission Flows

   The RTP Retransmission example (Figure 9) helps illustrate that this
   mechanism works purely on the Source Packet Stream. The RTP
   Retransmission transformation buffers the sent Source Packet Stream
   and, upon request, emits a retransmitted packet with an extra payload
   header as a Redundancy Packet Stream.
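
   The buffering and the extra payload header described above can be
   sketched in Python as follows. This minimal sketch assumes only the
   RTX payload layout of [RFC4588], in which a 2-byte original sequence
   number (OSN) precedes the original payload. The class and function
   names and the buffer capacity are hypothetical, and the RTX packet's
   own RTP header (SSRC, sequence number, payload type) is left out.

```python
import struct
from typing import Dict, Optional, Tuple


class RtxSender:
    """Buffers sent packets of one Source Packet Stream (hypothetical helper)."""

    def __init__(self, capacity: int = 512) -> None:
        self.capacity = capacity          # assumed buffer size, in packets
        self.sent: Dict[int, bytes] = {}  # original sequence number -> payload

    def on_sent(self, seq: int, payload: bytes) -> None:
        self.sent[seq] = payload
        if len(self.sent) > self.capacity:
            # drop the entry inserted first (dicts preserve insertion order)
            del self.sent[next(iter(self.sent))]

    def on_nack(self, seq: int) -> Optional[bytes]:
        """Build the RTX payload for a requested packet: OSN + original payload."""
        payload = self.sent.get(seq)
        if payload is None:
            return None  # no longer buffered, so it cannot be retransmitted
        return struct.pack("!H", seq) + payload


def parse_rtx_payload(rtx_payload: bytes) -> Tuple[int, bytes]:
    """Receiver side: split an RTX payload into (OSN, original payload)."""
    if len(rtx_payload) < 2:
        raise ValueError("RTX payload too short to hold the OSN")
    (osn,) = struct.unpack("!H", rtx_payload[:2])
    return osn, rtx_payload[2:]


sender = RtxSender(capacity=2)
for seq, data in [(100, b"a"), (101, b"b"), (102, b"c")]:
    sender.on_sent(seq, data)
print(parse_rtx_payload(sender.on_nack(101)))  # (101, b'b')
print(sender.on_nack(100))                     # None: evicted from the buffer
```

   The recovered OSN only identifies a packet within some Source Packet
   Stream; the receiver still has to know which Source Packet Stream
   the Redundancy Packet Stream belongs to.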

   The RTP Retransmission mechanism [RFC4588] is specified so that there
   is a one-to-one relation between the Source Packet Stream and the
   Redundancy Packet Stream. Thus, a Redundancy Packet Stream needs to
   be associated with its Source Packet Stream upon being received.
   This is done based on CNAME selectors and heuristics that match
   requested packets for a given Source Packet Stream with the original
   sequence number in the payload of any new Redundancy Packet Stream
   using the RTX payload format. In cases where the Redundancy Packet
   Stream is sent in a separate RTP Session from the Source Packet
   Stream, these sessions are related, e.g., using the FID semantics of
   SDP Media Grouping [RFC5888].

3.3.3.2.  Forward Error Correction

   The figure below (Figure 10) represents an example where two Media
   Sources' Source Packet Streams are protected by FEC. Source Packet
   Stream A has a Media Redundancy transformation in FEC Encoder 1.
   This produces Redundancy Packet Stream 1, which is only related to
   Source Packet Stream A. FEC Encoder 2, however, takes two Source
   Packet Streams (A and B) and produces Redundancy Packet Stream 2,
   which protects them together, i.e., Redundancy Packet Stream 2
   relates to two Source Packet Streams (a FEC group). FEC decoding,
   when needed due to packet loss or packet corruption at the receiver,
   requires knowledge about which Source Packet Streams the FEC encoding
   was based on.

   In Figure 10, all Packet Streams are sent on the same Media
   Transport. This is, however, not the only possible choice. Numerous
   combinations exist for spreading these Packet Streams over different
   Media Transports to achieve the communication application's goal.
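
   To show why an FEC decoder must know which Source Packet Streams an
   encoding covered, the following Python sketch computes one XOR
   parity over a group of payloads (for example, one packet from Source
   Packet Stream A and one from B, as FEC Encoder 2 does) and uses it
   to rebuild a single lost payload. It is a simplified XOR parity in
   the spirit of [RFC5109], not that format itself: the real format
   also protects RTP header fields and identifies the protected packets
   in an FEC header. Only the payload XOR and an XOR of the payload
   lengths (used to trim the rebuilt payload) are shown, and all names
   are hypothetical.

```python
from functools import reduce
from typing import Dict, List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    return bytes(x ^ y for x, y in zip(a.ljust(n, b"\0"), b.ljust(n, b"\0")))


def fec_encode(payloads: List[bytes]) -> Tuple[int, bytes]:
    """Return (XOR of payload lengths, XOR parity) for one FEC group."""
    length_xor = reduce(lambda acc, p: acc ^ len(p), payloads, 0)
    return length_xor, reduce(xor_bytes, payloads, b"")


def fec_recover(group: Dict[str, Optional[bytes]], length_xor: int,
                parity: bytes) -> Dict[str, bytes]:
    """Rebuild at most one lost payload of an FEC group.

    `group` maps every protected Source Packet Stream of the group to the
    payload received for it, or None if it was lost. Knowing this
    membership is exactly the relation the receiver needs.
    """
    lost = [name for name, p in group.items() if p is None]
    present = [p for p in group.values() if p is not None]
    if len(lost) > 1:
        raise ValueError("a single XOR parity repairs at most one loss")
    repaired = {name: p for name, p in group.items() if p is not None}
    if lost:
        length = reduce(lambda acc, p: acc ^ len(p), present, length_xor)
        repaired[lost[0]] = reduce(xor_bytes, present, parity)[:length]
    return repaired


a, b = b"hello", b"RTP!"  # one packet each from streams A and B
length_xor, parity = fec_encode([a, b])
print(fec_recover({"A": None, "B": b}, length_xor, parity)["A"])  # b'hello'
```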

   +--------------------+          +--------------------+
   |   Media Source A   |          |   Media Source B   |
   +--------------------+          +--------------------+
             |                               |
             V                               V
   +--------------------+          +--------------------+
   |  Media Encoder A   |          |  Media Encoder B   |
   +--------------------+          +--------------------+
             |                               |
       Encoded Stream                  Encoded Stream
             V                               V
   +--------------------+          +--------------------+
   | Media Packetizer A |          | Media Packetizer B |
   +--------------------+          +--------------------+
             |                               |
   Source Packet Stream A          Source Packet Stream B
             |                               |
       +-----+-------+-------------+ +-------+------+
       |             V             V V              |
       |   +---------------+  +---------------+     |
       |   | FEC Encoder 1 |  | FEC Encoder 2 |     |
       |   +---------------+  +---------------+     |
       |           |                  |             |
       |    Redundancy PS 1    Redundancy PS 2      |
       V           V                  V             V
   +----------------------------------------------------------+
   |                     Media Transport                      |
   +----------------------------------------------------------+

                     Figure 10: Example of FEC Flows

   As FEC encoding exists in various forms, the methods for relating FEC
   Redundancy Packet Streams with their source information in Source
   Packet Streams are many. The XOR-based RTP FEC Payload format
   [RFC5109] is defined in such a way that a Redundancy Packet Stream
   has a one-to-one relation with a Source Packet Stream. In fact, the
   RFC requires the Redundancy Packet Stream to use the same SSRC as the
   Source Packet Stream. This requires either using a separate RTP
   Session or using the Redundancy RTP Payload format [RFC2198]. The
   underlying relation requirement for this FEC format and a particular
   Redundancy Packet Stream is to know the related Source Packet Stream,
   including its SSRC.

3.3.4.  Packet Stream Separation

   Packet Streams can be separated exclusively based on their SSRCs, at
   the RTP Session level, or at the Multimedia Session level, as
   explained below.
1353 When the Packet Streams that have a relationship are all sent in the 1354 same RTP Session and are uniquely identified based on their SSRC 1355 only, it is termed an SSRC-Only Based Separation. Such streams can 1356 be related via RTCP CNAME to identify that the streams belong to the 1357 same End Point. [RFC5576]-based approaches, when used, can 1358 explicitly relate various such Packet Streams. 1360 On the other hand, when Packet Streams that are related but are sent 1361 in the context of different RTP Sessions to achieve separation, it is 1362 known as RTP Session-based separation. This is commonly used when 1363 the different Packet Streams are intended for different Media 1364 Transports. 1366 Several mechanisms that use RTP Session-based separation rely on it 1367 to enable an implicit grouping mechanism expressing the relationship. 1368 The solutions have been based on using the same SSRC value in the 1369 different RTP Sessions to implicitly indicate their relation. That 1370 way, no explicit RTP level mechanism has been needed, only signalling 1371 level relations have been established using semantics from Grouping 1372 of Media lines framework [RFC5888]. Examples of this are RTP 1373 Retransmission [RFC4588], SVC Multi Stream Transmission [RFC6190] and 1374 XOR Based FEC [RFC5109]. RTCP CNAME explicitly relates Packet 1375 Streams across different RTP Sessions, as explained in the previous 1376 section. Such a relationship can be used to perform inter-media 1377 synchronization. 1379 Packet Streams that are related and need to be associated can be part 1380 of different Multimedia Sessions, rather than just different RTP 1381 sessions within the same Multimedia Session context. This puts 1382 further demand on the scope of the mechanism(s) and its handling of 1383 identifiers used for expressing the relationships. 1385 3.4. 
Multiple RTP Sessions over one Media Transport 1387 [I-D.westerlund-avtcore-transport-multiplexing] describes a mechanism 1388 that allow several RTP Sessions to be carried over a single 1389 underlying Media Transport. The main reasons for doing this are 1390 related to the impact of using one or more Media Transports. Thus 1391 using a common network path or potentially have different ones. 1392 There is reduced need for NAT/FW traversal resources and no need for 1393 flow based QoS. 1395 However, Multiple RTP Sessions over one Media Transport makes it 1396 clear that a single Media Transport 5-tuple is not sufficient to 1397 express which RTP Session context a particular Packet Stream exists 1398 in. Complexities in the relationship between Media Transports and 1399 RTP Session already exist as one RTP Session contains multiple Media 1400 Transports, e.g. even a Peer-to-Peer RTP Session with RTP/RTCP 1401 Multiplexing requires two Media Transports, one in each direction. 1402 The relationship between Media Transports and RTP Sessions as well as 1403 additional levels of identifiers need to be considered in both 1404 signalling design and when defining terminology. 1406 4. Topologies and Communication Entities 1408 This Section reviews some communication topologies and looks at the 1409 relationship among the communication entities that are defined in 1410 Section 2.2. This section doesn't deal with discussions about the 1411 streams and their relation to the transport. Instead, it covers the 1412 aspects that enable the transport of those streams. For example, the 1413 Media Transports (Section 2.1.13) that exists between the End Points 1414 (Section 2.2.1) that are part of an RTP session (Section 2.2.2) and 1415 their relationship to the Multi-Media Session (Section 2.2.4) between 1416 Participants (Section 2.2.3) and the established Communication 1417 session (Section 2.2.5) are explained. 1419 4.1. 
Point-to-Point Communication

   Figure 11 shows a high-level representation of a very basic point-to-point communication session between A and B. It uses two RTP sessions, one for audio and one for video, between A's and B's End Points. Assume that the Multi-Media Session shared by the participants is established using SIP (i.e., there is a SIP Dialog between A and B).

      +---+         +---+
      | A |<------->| B |
      +---+         +---+

           Figure 11: Point to Point Communication

   However, this picture gets slightly more complex when redrawn using the communication entity concepts defined earlier in this document.

   +-----------------------------------------------------------+
   | Communication Session                                     |
   |                                                           |
   | +----------------+                     +----------------+ |
   | | Participant A  |   +-------------+   | Participant B  | |
   | |                |   | Multi-Media |   |                | |
   | | +-------------+|<=>|   Session   |<=>|+-------------+ | |
   | | | End Point A ||   |(SIP Dialog) |   || End Point B | | |
   | | |             ||   +-------------+   ||             | | |
   | | | +-----------++---------------------++-----------+ | | |
   | | | | RTP Session|                     |            | | | |
   | | | | Audio      |---Media Transport-->|            | | | |
   | | | |            |<--Media Transport---|            | | | |
   | | | +-----------++---------------------++-----------+ | | |
   | | |             ||                     ||             | | |
   | | | +-----------++---------------------++-----------+ | | |
   | | | | RTP Session|                     |            | | | |
   | | | | Video      |---Media Transport-->|            | | | |
   | | | |            |<--Media Transport---|            | | | |
   | | | +-----------++---------------------++-----------+ | | |
   | | +-------------+|                     |+-------------+ | |
   | +----------------+                     +----------------+ |
   +-----------------------------------------------------------+

     Figure 12: Point to Point Communication Session with two RTP Sessions

   Figure 12 shows that the two RTP Sessions exist only between the two End Points A and B, over their respective Media Transports. The Multi-Media Session establishes the association between the two Participants and configures these RTP sessions and the Media Transports that are used.

4.2.  Central Conferencing

   This section looks at the central conferencing communication topology, where a number of participants, like A, B, C, and D in Figure 13, communicate using an RTP mixer.

   +---+      +------------+      +---+
   | A |<---->|            |<---->| B |
   +---+      |            |      +---+
              |   Mixer    |
   +---+      |            |      +---+
   | C |<---->|            |<---->| D |
   +---+      +------------+      +---+

         Figure 13: Centralized Conferencing using an RTP Mixer

   In this case, each Participant establishes its own Multi-Media Session with the Conference Bridge, so the RTP sessions used and their configuration are negotiated between each Participant and the Conference Bridge. The Participants have their End Points (A, B, C, D), and the Conference Bridge has the host running the RTP mixer, referred to as End Point M in Figure 14. However, despite the individual establishment of four Multi-Media Sessions, and of the corresponding Media Transports for each RTP session between the respective End Points and the Conference Bridge, there are actually only two RTP sessions, one for audio and one for video, because in this topology these RTP sessions are shared among all the Participants.
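   The contrast above between per-Participant signalling and shared RTP sessions can be illustrated with a small model. The following Python sketch is a hypothetical illustration only: the names `RtpSession` and `build_central_conference` and the SSRC values are assumptions introduced here, not defined by this document or by any RTP library. It records one Multi-Media Session per Participant but creates only one audio and one video RTP session, and it models the common SSRC space of each RTP session by rejecting an SSRC that another End Point already uses. As a simplification, it only reports such a collision instead of performing the collision resolution procedure of [RFC3550], Section 8.2, and it leaves out the mixer's own SSRCs.

```python
from dataclasses import dataclass, field


@dataclass
class RtpSession:
    """An RTP session whose members all share a single SSRC space."""
    media: str
    ssrc_owner: dict[int, str] = field(default_factory=dict)  # SSRC -> End Point

    def join(self, end_point: str, ssrc: int) -> None:
        # The SSRC space is shared by every End Point in the session, so an
        # SSRC already used by a different End Point is a collision.
        # (Simplified: RFC 3550 resolves collisions by choosing a new SSRC.)
        owner = self.ssrc_owner.get(ssrc)
        if owner is not None and owner != end_point:
            raise ValueError(f"SSRC {ssrc:#010x} already used by {owner}")
        self.ssrc_owner[ssrc] = end_point


def build_central_conference(participants: list[str]):
    """Hypothetical model of the central conferencing case: one Multi-Media
    Session (SIP dialog) per Participant with the Conference Bridge
    (End Point M), but only one audio and one video RTP session, shared by
    all Participants."""
    audio, video = RtpSession("audio"), RtpSession("video")
    multimedia_sessions = []
    for index, name in enumerate(participants):
        multimedia_sessions.append((name, "M"))  # signalling: Participant <-> bridge
        audio.join(name, 0x1000 + index)  # illustrative SSRC values only
        video.join(name, 0x2000 + index)
    return multimedia_sessions, [audio, video]


if __name__ == "__main__":
    mm, rtp = build_central_conference(["A", "B", "C", "D"])
    print(len(mm), "Multi-Media Sessions,", len(rtp), "RTP sessions")
```

   Running the module prints "4 Multi-Media Sessions, 2 RTP sessions". Keeping one SSRC table per RTP session, rather than one per Multi-Media Session, is what reflects the sharing described above.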
   +-------------------------------------------------------------------+
   | Communication Session                                             |
   |                                                                   |
   | +----------------+                             +----------------+ |
   | | Participant A  |       +-------------+       | Conference     | |
   | |                |       | Multi-Media |       | Bridge         | |
   | | +-------------+|<=====>|  Session A  |<=====>|+-------------+ | |
   | | | End Point A ||       |(SIP Dialog) |       || End Point M | | |
   | | |             ||       +-------------+       ||             | | |
   | | | +-----------++-----------------------------++-----------+ | | |
   | | | | RTP Session|                             |            | | | |
   | | | | Audio      |-------Media Transport------>|            | | | |
   | | | |            |<------Media Transport-------|            | | | |
   | | | +-----------++-----------------------------++------+    | | | |
   | | |             ||                             ||      |    | | | |
   | | | +-----------++-----------------------------++----+ |    | | | |
   | | | | RTP Session|                             |     | |    | | | |
   | | | | Video      |-------Media Transport------>|     | |    | | | |
   | | | |            |<------Media Transport-------|     | |    | | | |
   | | | +-----------++-----------------------------++    | |    | | | |
   | | +-------------+|                             ||    | |    | | | |
   | +----------------+                             ||    | |    | | | |
   |                                                ||    | |    | | | |
   | +----------------+                             ||    | |    | | | |
   | | Participant B  |       +-------------+       ||    | |    | | | |
   | |                |       | Multi-Media |       ||    | |    | | | |
   | | +-------------+|<=====>|  Session B  |<=====>||    | |    | | | |
   | | | End Point B ||       |(SIP Dialog) |       ||    | |    | | | |
   | | |             ||       +-------------+       ||    | |    | | | |
   | | | +-----------++-----------------------------++    | |    | | | |
   | | | | RTP Session|                             |     | |    | | | |
   | | | | Video      |-------Media Transport------>|     | |    | | | |
   | | | |            |<------Media Transport-------|     | |    | | | |
   | | | +-----------++-----------------------------++----+ |    | | | |
   | | |             ||                             ||      |    | | | |
   | | | +-----------++-----------------------------++------+    | | | |
   | | | | RTP Session|                             |            | | | |
   | | | | Audio      |-------Media Transport------>|            | | | |
   | | | |            |<------Media Transport-------|            | | | |
   | | | +-----------++-----------------------------++-----------+ | | |
   | | +-------------+|                             |+-------------+ | |
   | +----------------+                             +----------------+ |
   +-------------------------------------------------------------------+

         Figure 14: Central Conferencing with Two Participants A and B
                     communicating over a Conference Bridge

   It is important to stress that, in the case of Figure 14, it might appear that the context of the Multi-Media Sessions is scoped to A and B communicating via M. This is not always true, and the contexts can extend further. In that case the RTP session, and its common SSRC space, extends beyond what occurs between A and M and between B and M, respectively.

4.3.  Full Mesh Conferencing

   This section looks at the case where three Participants (A, B, and C) wish to communicate. Each Participant establishes an individual Multi-Media Session and individual RTP sessions with each of the other two peers. Thus, each Participant provides two copies of its media, one to each of the other Participants. Figure 15 shows a high-level representation of such a topology.

   +---+       +---+
   | A |<----->| B |
   +---+       +---+
      ^         ^
       \       /
        \     /
         v   v
         +---+
         | C |
         +---+

     Figure 15: Full Mesh Conferencing with three Participants A, B and C

   In this particular case there are two aspects worth noting. The first is that there will be multiple Multi-Media Sessions per Communication Session between the participants. This has not been true in the earlier examples, with the exception of the Centralized Conferencing in Section 4.2. The second aspect is whether one needs to maintain relationships between entities and concepts, for example Media Sources, across these different Multi-Media Sessions, and between Packet Streams in the independent RTP sessions configured by those Multi-Media Sessions.
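   The pairwise structure of full mesh conferencing can be sketched in the same spirit. In this hypothetical Python model (the function name `full_mesh` and Media Source labels such as "A1" are illustrative assumptions, not terms defined by this document), every pair of Participants gets its own Multi-Media Session and its own independent RTP session, and each Media Source is sent once to each peer. The returned mapping is one possible way to record the cross-session relationship mentioned above: for each Media Source, identified at the Communication Session level, it lists the independent RTP sessions that carry a copy of it.

```python
from itertools import combinations


def full_mesh(participants: list[str], sources_per_participant: int = 1):
    """Model of full mesh conferencing: one Multi-Media Session (and one
    independent RTP session) for every pair of Participants."""
    pairs = list(combinations(participants, 2))
    # Media Source label (e.g. "A1") -> RTP sessions carrying a copy of it.
    carried_in: dict[str, list[tuple[str, str]]] = {}
    for pair in pairs:
        for sender in pair:
            # Each sender encodes and sends every one of its Media Sources
            # into the RTP session it shares with this peer.
            for k in range(1, sources_per_participant + 1):
                carried_in.setdefault(f"{sender}{k}", []).append(pair)
    return pairs, carried_in


if __name__ == "__main__":
    sessions, carried_in = full_mesh(["A", "B", "C"])
    print(sessions)          # [('A', 'B'), ('A', 'C'), ('B', 'C')]
    print(carried_in["A1"])  # [('A', 'B'), ('A', 'C')]
```

   With three Participants this yields three Multi-Media Sessions and six copies of Media Sources in total, two per Participant; in general, n Participants need n(n-1)/2 pairwise sessions.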
                          +-----------------------------------------+
                          | Participant A                           |
      +----------+        | +--------------------------------------+|
      | Multi-   |        | | End Point A                          ||
      | Media    |<======>| |                                      ||
      | Session  |        | |+-------+     +-------+     +-------+ ||
      | 1        |        | || RTP 1 |<----| MS A1 |---->| RTP 2 | ||
      +----------+        | ||       |     +-------+     |       | ||
           ^^             | +|-------|-------------------|-------|-+|
           ||             +--|-------|-------------------|-------|--+
           ||                |       |           ^^      |       |
           VV                |       |           ||      |       |
   +-------------------------|-------|----+      ||      |       |
   | Participant B           |       |    |      VV      |       |
   | +-----------------------|-------|---+| +----------+ |       |
   | | End Point B    +----->|       |   || | Multi-   | |       |
   | |                |      +-------+   || | Media    | |       |
   | | +-------+      |      +-------+   || | Session  | |       |
   | | | MS B1 |------+----->| RTP 3 |   || | 2        | |       |
   | | +-------+             |       |   || +----------+ |       |
   | +-----------------------|-------|---+|      ^^      |       |
   +-------------------------|-------|----+      ||      |       |
           ^^                |       |           ||      |       |
           ||                |       |           VV      |       |
           ||             +--|-------|-------------------|-------|--+
           VV             |  |       |   Participant C   |       |  |
      +----------+        | +|-------|-------------------|-------|-+|
      | Multi-   |        | ||       |   End Point C     |       | ||
      | Media    |<======>| |+-------+                   +-------+ ||
      | Session  |        | |    ^         +-------+         ^     ||
      | 3        |        | |    +---------| MS C1 |---------+     ||
      +----------+        | |              +-------+               ||
                          | +--------------------------------------+|
                          +-----------------------------------------+

     Figure 16: Full Mesh Conferencing between three Participants A, B
                                   and C

   For the sake of clarity, Figure 16 does not include all of these concepts. The Media Source (MS) of a given End Point is sent to both peers. This requires encoding and Media Packetization to enable the Packet Streams to be sent over Media Transports in the context of the RTP sessions depicted.
   The RTP sessions 1, 2, and 3 are independent, and are established in the context of the Multi-Media Sessions 1, 2, and 3, respectively. However, the Communication Session that the figure represents as a whole (whose enclosing box is omitted here to save space, unlike in Figure 14) combines the received representations of the peers' Media Sources and plays them back.

   It is noteworthy that the full mesh conferencing topologies described here have the potential for creating loops. For example, compare the above full mesh with the mixing three-party communication session depicted in Figure 17. In this example, A's Media Source A1 is sent to B over a Multi-Media Session (A-B). In B, the Media Source A1 is mixed with Media Source B1, and the resulting Media Source (MS AB) is sent to C over a Multi-Media Session (B-C). If C and A were to establish a Multi-Media Session (A-C) and C acted in the same role as B, then A would receive a Media Source from C that contains a mix of A's, B's, and C's individual Media Sources. This would result in A playing out a time-delayed version of its own signal (i.e., the system has created an echo path).

   +--------------+    +--------------+    +--------------+
   | A            |    | B +-------+  |    | C            |
   |              |    |   | MS B1 |  |    |              |
   |              |    |   +-------+  |    |              |
   | +-------+    |    |     |        |    |              |
   | | MS A1 |----|--->|-----+ MS AB -|--->|              |
   | +-------+    |    |              |    |              |
   +--------------+    +--------------+    +--------------+

           Figure 17: Mixing Three Party Communication Session

   The looping issue can be avoided, detected, or prevented using two general methods. The first method is to use great care when setting up and establishing the communication session if participants have any mixing or forwarding capability, so that no one ends up receiving a partial or full representation of their own media while believing it is someone else's.
   The other method is to maintain unique identifiers at the Communication Session level for all Media Sources, and to ensure that every received Packet Stream identifies the Media Sources that contributed to its content.

4.4.  Source-Specific Multicast

   In one-to-many media distribution cases (e.g., IPTV), where one Media Sender or a set of Media Senders is allowed to send Packet Streams on a particular Source-Specific Multicast (SSM) group to many receivers (R), there are some different aspects to consider. Figure 18 presents a high-level SSM system for RTP/RTCP as defined in [RFC5760]. In this case, several Media Senders send their Packet Streams to the Distribution Source, which is the only entity allowed to send to the SSM group. The Receivers joining the SSM group can provide RTCP feedback on their reception by sending unicast feedback to a Feedback Target (FT).

   +--------+       +-----+
   |Media   |       |     |      Source-Specific
   |Sender 1|<----->| D S |      Multicast (SSM)
   +--------+       | I O |  +--+----------------> R(1)
                    | S U |  |  |                    |
   +--------+       | T R |  |  +-----------> R(2)   |
   |Media   |<----->| R C |->+  |        :      |    |
   |Sender 2|       | I E |  |  +------> R(n-1) |    |
   +--------+       | B   |  |  |          |    |    |
       :            | U   |  +--+--> R(n)  |    |    |
       :            | T +-|          |     |    |    |
       :            | I | |<---------+     |    |    |
   +--------+       | O |F|<---------------+    |    |
   |Media   |       | N |T|<--------------------+    |
   |Sender M|<----->|   | |<-------------------------+
   +--------+       +-----+            RTCP Unicast

               FT = Feedback Target

        Figure 18: Source-Specific Multicast Communication Topology

   Here the Media Transport from the Distribution Source to all the SSM receivers (R) has the same 5-tuple, but in reality follows different paths. Also, the Multi-Media Sessions between the Distribution Source and the individual receivers are normally identical.
   This is because configuration information is communicated one way, from the Distribution Source to the receivers. This information is typically embedded in Electronic Program Guides (EPGs), distributed by the Session Announcement Protocol (SAP) [RFC2974] or by other one-way protocols. In some cases load balancing is used, for example, by providing the receivers with a set of Feedback Targets from which each receiver randomly selects one.

   This scenario differs significantly from the previously described communication topologies due to the asymmetric nature of the RTP Session context across the Distribution Source. The Distribution Source forms a focal point that collects the unicast RTCP feedback from the receivers and then redistributes it to the Media Senders. Each Media Sender and the Distribution Source establish their own Multi-Media Session context for the underlying RTP Sessions, but with a shared RTCP context across all the receivers.

   To improve readability, Figure 18 intentionally hides the details of the various entities. Expanding on this, one can think of the Media Senders as being part of one or more Multi-Media Sessions grouped under a Communication Session. The Media Sender in this scenario refers to the Media Packetizer transformation (Section 2.1.9). The Packet Stream generated by such a Media Sender can be part of its own RTP Session or can be multiplexed with other Packet Streams within an End Point. The latter case requires careful consideration, since the redistributed RTCP packets then correspond to a single RTP Session context across all the Media Senders.

5.  Security Considerations

   This document simply tries to clarify the confusion prevalent in RTP taxonomy caused by inconsistent usage across the multiple technologies and protocols that make use of RTP.
   It does not introduce any new security considerations beyond those already well documented in the RTP specification [RFC3550] and in the respective specifications of the various protocols that make use of it.

   Hopefully, having a well-defined common terminology and understanding of the complexities of the RTP architecture will help lead to better standards and help avoid security problems.

6.  Acknowledgement

   This document has many concepts borrowed from several documents such as WebRTC [I-D.ietf-rtcweb-overview], CLUE [I-D.ietf-clue-framework], and Multiplexing Architecture [I-D.westerlund-avtcore-transport-multiplexing]. The authors would like to thank all the authors of each of those documents.

   The authors would also like to acknowledge the insights, guidance, and contributions of Magnus Westerlund, Roni Even, Paul Kyzivat, Colin Perkins, Keith Drage, and Harald Alvestrand.

7.  Contributors

   Magnus Westerlund contributed the conceptual model of the media chain based on transformations and streams, including rewriting pre-existing concepts into this model and adding missing concepts. The first proposal for updating the relationships and the topologies based on this concept was also made by Magnus.

8.  IANA Considerations

   This document makes no request of IANA.

9.  References

9.1.  Normative References

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

   [UML]      Object Management Group, "OMG Unified Modeling Language (OMG UML), Superstructure, V2.2", OMG formal/2009-02-02, February 2009.

9.2.  Informative References

   [I-D.ietf-avtcore-clksrc]
              Williams, A., Gross, K., Brandenburg, R., and H. Stokking, "RTP Clock Source Signalling", draft-ietf-avtcore-clksrc-07 (work in progress), October 2013.
   [I-D.ietf-clue-framework]
              Duckworth, M., Pepperell, A., and S. Wenger, "Framework for Telepresence Multi-Streams", draft-ietf-clue-framework-12 (work in progress), October 2013.

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings, "Multiplexing Negotiation Using Session Description Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp-bundle-negotiation-05 (work in progress), October 2013.

   [I-D.ietf-rtcweb-overview]
              Alvestrand, H., "Overview: Real Time Protocols for Brower-based Applications", draft-ietf-rtcweb-overview-08 (work in progress), September 2013.

   [I-D.westerlund-avtcore-transport-multiplexing]
              Westerlund, M. and C. Perkins, "Multiplexing Multiple RTP Sessions onto a Single Lower-Layer Transport", draft-westerlund-avtcore-transport-multiplexing-07 (work in progress), October 2013.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, September 1997.

   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session Announcement Protocol", RFC 2974, October 2000.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. Hakenberg, "RTP Retransmission Payload Format", RFC 4588, July 2006.

   [RFC4867]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "RTP Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 4867, April 2007.

   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error Correction", RFC 5109, December 2007.

   [RFC5404]  Westerlund, M. and I. Johansson, "RTP Payload Format for G.719", RFC 5404, January 2009.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media Attributes in the Session Description Protocol (SDP)", RFC 5576, June 2009.

   [RFC5760]  Ott, J., Chesterfield, J., and E. Schooler, "RTP Control Protocol (RTCP) Extensions for Single-Source Multicast Sessions with Unicast Feedback", RFC 5760, February 2010.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

   [RFC5905]  Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network Time Protocol Version 4: Protocol and Algorithms Specification", RFC 5905, June 2010.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis, "RTP Payload Format for Scalable Video Coding", RFC 6190, May 2011.

   [RFC6222]  Begen, A., Perkins, C., and D. Wing, "Guidelines for Choosing RTP Control Protocol (RTCP) Canonical Names (CNAMEs)", RFC 6222, April 2011.

Appendix A.  Changes From Earlier Versions

   NOTE TO RFC EDITOR: Please remove this section prior to publication.

A.1.  Modifications Between Version -02 and -03

   o  Section 4 rewritten (and new communication topologies added) to reflect the major updates to Sections 1-3

   o  Section 8 removed (carryover from initial -00 draft)

   o  General clean up of text, grammar and nits

A.2.  Modifications Between Version -01 and -02

   o  Section 2 rewritten to add both streams and transformations in the media chain.

   o  Section 3 rewritten to focus on exposing relationships.

A.3.  Modifications Between Version -00 and -01

   o  Too many to list

   o  Added new authors

   o  Updated content organization and presentation

Authors' Addresses

   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ  07601
   US

   Email: jonathan@vidyo.com

   Kevin Gross
   AVA Networks, LLC
   Boulder, CO
   US

   Email: kevin.gross@avanw.com

   Suhas Nandakumar
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA  95134
   US

   Email: snandaku@cisco.com

   Gonzalo Salgueiro
   Cisco Systems
   7200-12 Kit Creek Road
   Research Triangle Park, NC  27709
   US

   Email: gsalguei@cisco.com

   Bo Burman
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 13 11
   Email: bo.burman@ericsson.com