idnits 2.17.1

draft-ietf-avt-rapid-rtp-sync-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document updates RFC3550, but the
     abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

     (Using the creation date from RFC3550, updated by this document, for
     RFC5378 checks: 1998-04-07)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (May 30, 2010) is 5079 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 5285 (ref. '6') (Obsoleted by RFC 8285)

  ** Obsolete normative reference: RFC 1305 (ref. '7') (Obsoleted by RFC 5905)

  == Outdated reference: A later version (-27) exists of
     draft-ietf-avt-rtp-svc-21

  -- Obsolete informational reference (is this intentional?): RFC 5245 (ref.
     '12') (Obsoleted by RFC 8445, RFC 8839)

  == Outdated reference: A later version (-22) exists of
     draft-zimmermann-avt-zrtp-18

  -- Obsolete informational reference (is this intentional?): RFC 5117 (ref.
     '15') (Obsoleted by RFC 7667)

  == Outdated reference: A later version (-17) exists of
     draft-ietf-avt-rapid-acquisition-for-rtp-09

     Summary: 2 errors (**), 0 flaws (~~), 4 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                         C. Perkins
Internet-Draft                                     University of Glasgow
Updates: 3550 (if approved)                                   T. Schierl
Intended status: Standards Track                          Fraunhofer HHI
Expires: December 1, 2010                                   May 30, 2010

                   Rapid Synchronisation of RTP Flows
                  draft-ietf-avt-rapid-rtp-sync-11.txt

Abstract

   This memo outlines how RTP sessions are synchronised, and discusses
   how rapidly such synchronisation can occur.  We show that most RTP
   sessions can be synchronised immediately, but that the use of video
   switching multipoint conference units (MCUs) or large source specific
   multicast (SSM) groups can greatly increase the synchronisation
   delay.  This increase in delay can be unacceptable to some
   applications that use layered and/or multi-description codecs.

   This memo introduces three mechanisms to reduce the synchronisation
   delay for such sessions.  First, it updates the RTP Control Protocol
   (RTCP) timing rules to reduce the initial synchronisation delay for
   SSM sessions.  Second, a new feedback packet is defined for use with
   the Extended RTP Profile for RTCP-based Feedback (RTP/AVPF), allowing
   video switching MCUs to rapidly request resynchronisation.
   Finally, new RTP header extensions are defined to allow rapid
   synchronisation of late joiners, and guarantee correct timestamp
   based decoding order recovery for layered codecs in the presence of
   clock skew.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 1, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Synchronisation of RTP Flows
     2.1.  Initial Synchronisation Delay
       2.1.1.  Unicast Sessions
       2.1.2.  Source Specific Multicast (SSM) Sessions
       2.1.3.  Any Source Multicast (ASM) Sessions
       2.1.4.  Discussion
     2.2.  Synchronisation for Late Joiners
   3.  Reducing RTP Synchronisation Delays
     3.1.  Reduced Initial RTCP Interval for SSM Senders
     3.2.  Rapid Resynchronisation Request
     3.3.  In-band Delivery of Synchronisation Metadata
   4.  Application to Decoding Order Recovery in Layered Codecs
     4.1.  In-band Synchronisation for Decoding Order Recovery
     4.2.  Timestamp based decoding order recovery
     4.3.  Example
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   When using RTP to deliver multimedia content it's often necessary to
   synchronise playout of audio and video components of a presentation.
   This is achieved using information contained in RTP Control Protocol
   (RTCP) Sender Report (SR) packets [1].  These are sent periodically,
   and the components of a multimedia session cannot be synchronised
   until sufficient RTCP SR packets have been received for each RTP flow
   to allow the receiver to establish mappings between the media clock
   used for each RTP flow, and the common (NTP-format) reference clock
   used to establish synchronisation.

   Recently, concern has been expressed that this synchronisation delay
   is problematic for some applications, for example those using layered
   or multi-description video coding.  This memo reviews the operations
   of RTP synchronisation, and describes the synchronisation delay that
   can be expected.  Three backwards compatible extensions to the basic
   RTP synchronisation mechanism are proposed:

   o  The RTCP transmission timing rules are relaxed for SSM senders, to
      reduce the initial synchronisation latency for large SSM groups.
      See Section 3.1.

   o  An enhancement to the Extended RTP Profile for RTCP-based Feedback
      (RTP/AVPF) [2] is defined to allow receivers to request additional
      RTCP SR packets, providing the metadata needed to synchronise RTP
      flows.  This can reduce the synchronisation delay when joining
      sessions with large RTCP reporting intervals, in the presence of
      packet loss, or when video switching MCUs are employed.  See
      Section 3.2.

   o  Two RTP header extensions are defined, to deliver synchronisation
      metadata in-band with RTP data packets.  These extensions provide
      synchronisation metadata that is aligned with RTP data packets,
      and so eliminate the need to estimate clock-skew between flows
      before synchronisation.  They can also reduce the need to receive
      RTCP SR packets before flows can be synchronised, although they do
      not eliminate the need for RTCP.  See Section 3.3.

   The immediate use-case for these extensions is to reduce the delay
   due to synchronisation when joining a layered video session (e.g. an
   H.264/SVC session in NI-T mode [9]).  The extensions are not specific
   to layered coding, however, and can be used in any environment where
   synchronisation latency is an issue.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [3].

2.  Synchronisation of RTP Flows

   RTP flows are synchronised by receivers based on information that is
   contained in RTCP SR packets generated by senders (specifically, the
   NTP-format timestamp and the RTP timestamp).  Synchronisation
   requires that a common reference clock MUST be used to generate the
   NTP-format timestamps in a set of flows that are to be synchronised
   (i.e. when synchronising several RTP flows, the RTP timestamps for
   each flow are derived from separate, and media specific, clocks, but
   the NTP format timestamps in the RTCP SR packets of all flows to be
   synchronised MUST be sampled from the same clock).  To achieve faster
   and more accurate synchronisation, it is further RECOMMENDED that
   senders and receivers use a synchronised common NTP format reference
   clock with common properties, especially timebase, where possible
   (recognising that this is often not possible when RTP is used outside
   of controlled environments); the means by which that common reference
   clock and its properties are signalled and distributed is outside the
   scope of this memo.

   For multimedia sessions, each type of media (e.g. audio or video) is
   sent in a separate RTP session, and the receiver associates RTP flows
   to be synchronised by means of the canonical end-point identifier
   (CNAME) item included in the RTCP Source Description (SDES) packets
   generated by the sender or signalled out of band [10].  For layered
   media, different layers can be sent in different RTP sessions, or
   using different SSRC values within a single RTP session; in both
   cases, the CNAME is used to identify flows to be synchronised.
   To ensure synchronisation, an RTP sender MUST therefore send periodic
   compound RTCP packets following Section 6 of RFC 3550 [1].

   The timing of these periodic compound RTCP packets will depend on the
   number of members in each RTP session, the fraction of those that are
   sending data, the session bandwidth, the configured RTCP bandwidth
   fraction, and whether the session is multicast or unicast (see RFC
   3550 Section 6.2 for details).  In summary, RTCP control traffic is
   allocated a small fraction, generally 5%, of the session bandwidth,
   and of that fraction, one quarter is allocated to active RTP senders,
   while receivers use the remaining three quarters (these fractions can
   be configured via SDP [11]).  Each member of an RTP session derives
   an RTCP reporting interval based on these fractions, whether the
   session is multicast or unicast, the number of members it has
   observed, and whether it is actively sending data or not.  It then
   sends a compound RTCP packet on average once per reporting interval
   (the actual packet transmission time is randomised in the range [0.5
   ... 1.5] times the reporting interval to avoid synchronisation of
   reports).

   A minimum reporting interval of 5 seconds is RECOMMENDED, except that
   the delay before sending the initial report "MAY be set to half the
   minimum interval to allow quicker notification that the new
   participant is present" [1].  Also, for unicast sessions, "the delay
   before sending the initial compound RTCP packet MAY be zero" [1].  In
   addition, for unicast sessions, and for active senders in a multicast
   session, the fixed minimum reporting interval MAY be scaled to "360
   divided by the session bandwidth in kilobits/second.  This minimum is
   smaller than 5 seconds for bandwidths greater than 72 kb/s." [1]

2.1.  Initial Synchronisation Delay

   A multimedia session comprises a set of concurrent RTP sessions among
   a common group of participants, using one RTP session for each media
   type.  For example, a videoconference (which is a multimedia session)
   might contain an audio RTP session and a video RTP session.  To allow
   a receiver to synchronise the components of a multimedia session, a
   compound RTCP packet containing an RTCP SR packet and an RTCP SDES
   packet with a CNAME item MUST be sent to each of the RTP sessions in
   the multimedia session by each sender.  A receiver cannot synchronise
   playout across the multimedia session until such RTCP packets have
   been received on all of the component RTP sessions.  If there is no
   packet loss, this gives an expected initial synchronisation delay
   equal to the average time taken to receive the first RTCP packet in
   the RTP session with the longest RTCP reporting interval.  This will
   vary between unicast and multicast RTP sessions.

   The initial synchronisation delay for layered sessions is similar to
   that for multimedia sessions.  The layers cannot be synchronised
   until the RTCP SR and CNAME information has been received for each
   layer in the session.

2.1.1.  Unicast Sessions

   For unicast multimedia or layered sessions, senders SHOULD transmit
   an initial compound RTCP packet (containing an RTCP SR packet and an
   RTCP SDES packet with a CNAME item) immediately on joining each RTP
   session in the multimedia session.  The individual RTP sessions are
   considered to be joined once any in-band signalling for NAT traversal
   (e.g. [12]) and/or security keying (e.g. [13],[14]) has concluded,
   and the media path is open.
   This implies that the initial RTCP packet is sent in parallel with
   the first data packet following the guidance in RFC 3550 that "the
   delay before sending the initial compound RTCP packet MAY be zero"
   and, in the absence of any packet loss, flows can be synchronised
   immediately.

   It is expected that NAT pinholes, firewall holes, quality-of-service,
   and media security keys will have been negotiated as part of the
   signalling, whether in-band or out-of-band, before the first RTCP
   packet is sent.  This should ensure that any middleboxes are ready to
   accept traffic, and reduce the likelihood that the initial RTCP
   packet will be lost.

2.1.2.  Source Specific Multicast (SSM) Sessions

   For multicast sessions, the delay before sending the initial RTCP
   packet, and hence the synchronisation delay, varies with the session
   bandwidth and the number of members in the session.  For a multicast
   multimedia or layered session, the average synchronisation delay will
   depend on the slowest of the component RTP sessions; this will
   generally be the session with the lowest bandwidth (assuming all the
   RTP sessions have the same number of members).

   When sending to a multicast group, the reduced minimum RTCP reporting
   interval of 360 seconds divided by the session bandwidth in kilobits
   per second [1] should be used when synchronisation latency is likely
   to be an issue.  Also, as usual, the reporting interval is halved for
   the first RTCP packet.  Depending on the session bandwidth and the
   number of members, this gives the average synchronisation delays
   shown in Figure 1.

        Session|              Number of receivers:
      Bandwidth|     2     3     4     5    10   100  1000 10000
             --+------------------------------------------------
         8 kbps|  2.73  4.10  5.47  5.47  5.47  5.47  5.47  5.47
        16 kbps|  2.50  2.50  2.73  2.73  2.73  2.73  2.73  2.73
        32 kbps|  2.50  2.50  2.50  2.50  2.50  2.50  2.50  2.50
        64 kbps|  2.50  2.50  2.50  2.50  2.50  2.50  2.50  2.50
       128 kbps|  1.41  1.41  1.41  1.41  1.41  1.41  1.41  1.41
       256 kbps|  0.70  0.70  0.70  0.70  0.70  0.70  0.70  0.70
       512 kbps|  0.35  0.35  0.35  0.35  0.35  0.35  0.35  0.35
         1 Mbps|  0.18  0.18  0.18  0.18  0.18  0.18  0.18  0.18
         2 Mbps|  0.09  0.09  0.09  0.09  0.09  0.09  0.09  0.09
         4 Mbps|  0.04  0.04  0.04  0.04  0.04  0.04  0.04  0.04

   Figure 1: Average initial synchronisation delay in seconds for an RTP
                          Session with 1 sender.

   These numbers assume a source specific multicast channel with a
   single active sender, assuming an average RTCP packet size of 70
   octets.  These intervals are sufficient for lip-synchronisation
   without excessive delay, but might be viewed as having too much
   latency for synchronising parts of a layered video stream.

   The RTCP interval is randomised in the usual manner, so the minimum
   synchronisation delay will be half these intervals, and the maximum
   delay will be 1.5 times these intervals.  Note also that these RTCP
   intervals are calculated assuming perfect knowledge of the number of
   members in the session.

2.1.3.  Any Source Multicast (ASM) Sessions

   For ASM sessions, the fraction of members that are senders plays an
   important role, and causes more variation in average RTCP reporting
   interval.  This is illustrated in Figure 2 and Figure 3, which show
   the RTCP reporting interval for the same session bandwidths and
   receiver populations as the SSM session described in Figure 1, but
   for sessions with 2 and 10 senders respectively.
   It can be seen that the initial synchronisation delay scales with the
   number of senders (this is to ensure that the total RTCP traffic from
   all group members does not grow without bound) and can be
   significantly larger than for source specific groups.  Despite this,
   the initial synchronisation time remains acceptable for
   lip-synchronisation in typical small-to-medium sized group video
   conferencing scenarios.

   Note that multi-sender groups implemented using multi-unicast with a
   central RTP translator (Topo-Translator in the terminology of [15])
   or mixer (Topo-Mixer), or some forms of video switching MCU
   (Topo-Video-switch-MCU) distribute RTCP packets to all members of the
   group, and so scale in the same way as an ASM group with regard to
   initial synchronisation latency.

        Session|              Number of receivers:
      Bandwidth|     2     3     4     5    10   100  1000 10000
             --+------------------------------------------------
         8 kbps|  2.73  4.10  5.47  6.84 10.94 10.94 10.94 10.94
        16 kbps|  2.50  2.50  2.73  3.42  5.47  5.47  5.47  5.47
        32 kbps|  2.50  2.50  2.50  2.50  2.73  2.73  2.73  2.73
        64 kbps|  2.50  2.50  2.50  2.50  2.50  2.50  2.50  2.50
       128 kbps|  1.41  1.41  1.41  1.41  1.41  1.41  1.41  1.41
       256 kbps|  0.70  0.70  0.70  0.70  0.70  0.70  0.70  0.70
       512 kbps|  0.35  0.35  0.35  0.35  0.35  0.35  0.35  0.35
         1 Mbps|  0.18  0.18  0.18  0.18  0.18  0.18  0.18  0.18
         2 Mbps|  0.09  0.09  0.09  0.09  0.09  0.09  0.09  0.09
         4 Mbps|  0.04  0.04  0.04  0.04  0.04  0.04  0.04  0.04

   Figure 2: Average initial synchronisation delay in seconds for an RTP
                          Session with 2 senders.
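
   The memo does not spell out how the figures in Section 2.1.2 and
   Section 2.1.3 were computed.  The following Python sketch is one
   reading that reproduces the tabulated values that were spot-checked:
   it follows the deterministic part of the RTCP interval calculation
   in RFC 3550 (no randomisation and no compensation factor), and it
   assumes that 1 kbps is 1024 bits per second, that the reduced
   minimum is capped at 5 seconds, and that the column value is the
   total member count.  The function name and parameters are
   hypothetical.

```python
def initial_rtcp_interval(bandwidth_kbps, members, senders,
                          avg_rtcp_size=70, rtcp_fraction=0.05):
    """Deterministic initial RTCP interval for a sender, in seconds.

    Assumptions (not stated in the memo): 1 kbps = 1024 bit/s, the
    reduced minimum 360/bandwidth is capped at 5 s, and `members` is
    the value from the table's column heading.
    """
    # RTCP bandwidth in octets per second (5% of session bandwidth).
    rtcp_bw = bandwidth_kbps * 1024 * rtcp_fraction / 8

    # Reduced minimum interval, halved for the first RTCP packet.
    t_min = min(5.0, 360.0 / bandwidth_kbps) / 2

    # If senders are at most a quarter of the members, they share a
    # quarter of the RTCP bandwidth among themselves.
    n = members
    if senders <= members * 0.25:
        rtcp_bw *= 0.25
        n = senders

    return max(avg_rtcp_size * n / rtcp_bw, t_min)


if __name__ == "__main__":
    # One row of each table: 8 kbps with 1, 2 and 10 senders.
    for senders in (1, 2, 10):
        row = [initial_rtcp_interval(8, m, senders)
               for m in (2, 3, 4, 5, 10, 100, 1000, 10000)]
        print(senders, " ".join(f"{t:6.2f}" for t in row))
```

   Under these assumptions the 8 kbps rows of all three figures are
   reproduced; other rows can be checked by passing, for example, 1024
   for the "1 Mbps" row.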

        Session|              Number of receivers:
      Bandwidth|     2     3     4     5    10   100  1000 10000
             --+------------------------------------------------
         8 kbps|  2.73  4.10  5.47  6.84 13.67 54.69 54.69 54.69
        16 kbps|  2.50  2.50  2.73  3.42  6.84 27.34 27.34 27.34
        32 kbps|  2.50  2.50  2.50  2.50  3.42 13.67 13.67 13.67
        64 kbps|  2.50  2.50  2.50  2.50  2.50  6.84  6.84  6.84
       128 kbps|  1.41  1.41  1.41  1.41  1.41  3.42  3.42  3.42
       256 kbps|  0.70  0.70  0.70  0.70  0.70  1.71  1.71  1.71
       512 kbps|  0.35  0.35  0.35  0.35  0.35  0.85  0.85  0.85
         1 Mbps|  0.18  0.18  0.18  0.18  0.18  0.43  0.43  0.43
         2 Mbps|  0.09  0.09  0.09  0.09  0.09  0.21  0.21  0.21
         4 Mbps|  0.04  0.04  0.04  0.04  0.04  0.11  0.11  0.11

   Figure 3: Average initial synchronisation delay in seconds for an RTP
                          Session with 10 senders.

2.1.4.  Discussion

   For unicast sessions, the existing RTCP SR-based mechanism allows for
   immediate synchronisation, provided the initial RTCP packet is not
   lost.

   For SSM sessions, the initial synchronisation delay is sufficient for
   lip-synchronisation, but may be larger than desired for some layered
   codecs.  The rationale for not sending immediate RTCP packets for
   multicast groups is to avoid implosion of requests when large numbers
   of members simultaneously join the group ("flash crowd").  This is
   not an issue for SSM senders, since there can be at most one sender,
   so it is desirable to allow SSM senders to send an immediate RTCP SR
   on joining a session (as is currently allowed for unicast sessions,
   which also don't suffer from the implosion problem).  SSM receivers
   using unicast feedback would not be allowed to send immediate RTCP.
   For ASM sessions, implosion of responses is a concern, so no change
   is proposed to the RTCP timing rules.

   In all cases, it is possible that the initial RTCP SR packet is lost.
   In this case, the receiver will not be able to synchronise the media
   until the reporting interval has passed, and the next RTCP SR packet
   is sent.  This is undesirable.  Section 3.2 defines a new RTP/AVPF
   transport layer feedback message to request an RTCP SR be generated,
   allowing rapid resynchronisation in the case of packet loss.

2.2.  Synchronisation for Late Joiners

   Synchronisation between RTP sessions is potentially slower for late
   joiners than for participants present at the start of the session.
   The reasons for this are three-fold:

   1.  Many of the optimisations that allow rapid transmission of RTCP
       SR packets apply only at the start of a session.  This implies
       that a new participant may have to wait a complete RTCP reporting
       interval for each session before receiving the necessary data to
       synchronise media streams.  This might potentially take several
       seconds, depending on the configured session bandwidth and the
       number of participants.

   2.  Additional synchronisation delay comes from the nature of the
       RTCP timing rules.  Packets are generated on average once per
       reporting interval, but with the exact transmission times being
       randomised +/- 50% to avoid synchronisation of reports.  This is
       important to avoid network congestion in multicast sessions, but
       does mean that the timing of RTCP SR reports for different RTP
       sessions isn't synchronised.  Accordingly, a receiver must
       estimate the skew on the NTP-format clock in order to align RTP
       timestamps across sessions.  This estimation is an essential part
       of an RTP synchronisation implementation, and can be done with
       high accuracy given sufficient reports.  Collecting sufficient
       RTCP SR data to perform this estimation, however, may require
       reception of several RTCP reports, further increasing the
       synchronisation delay.

   3.  Many media codecs have the notion of periodic access points, such
       that a newly joined receiver often cannot start decoding a media
       stream until the packets corresponding to the access point have
       been received.  These access points may be sent less often than
       RTCP SR packets, and so may be the limiting factor in starting
       synchronised media playout for late joiners.  The RTP extension
       for unicast-based rapid acquisition of multicast RTP sessions
       [16] may be used to reduce the time taken to receive the access
       points in some scenarios.

   These delays are likely an issue for tuning in to an ongoing
   multicast RTP session, or for video switching MCUs.

3.  Reducing RTP Synchronisation Delays

   Three backwards compatible RTP extensions are defined to reduce the
   possible synchronisation delay: a reduced initial RTCP interval for
   SSM senders, a rapid resynchronisation request message, and RTP
   header extensions that can convey synchronisation metadata in-band.

3.1.  Reduced Initial RTCP Interval for SSM Senders

   In SSM sessions where the initial synchronisation delay is important,
   the RTP sender MAY set the delay before sending the initial compound
   RTCP packet to zero, and send its first RTCP packet immediately upon
   joining the SSM session.  RTP receivers in an SSM session, sending
   unicast RTCP feedback, MUST NOT send RTCP packets with zero initial
   delay; the timing rules defined in [4] apply unchanged to receivers.

3.2.  Rapid Resynchronisation Request

   The general format of an RTP/AVPF transport layer feedback message is
   shown in Figure 4 (see [2] for details).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|   FMT   | PT=RTPFB=205  |          length               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SSRC of packet sender                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SSRC of media source                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :            Feedback Control Information (FCI)                 :
   :                                                               :

          Figure 4: RTP/AVPF Transport Layer Feedback Message

   One new feedback message type, RTCP-SR-REQ, is defined with FMT = 5.
   The Feedback Control Information (FCI) part of the feedback message
   MUST be empty.  The SSRC of packet sender indicates the member that
   is unable to synchronise media streams, while the SSRC of media
   source indicates the sender of the media it is unable to synchronise.
   The length MUST equal 2.

   If the RTP/AVPF profile [2] is in use, this feedback message MAY be
   sent by a receiver to indicate that it's unable to synchronise some
   media streams, and desires that the media source transmit an RTCP SR
   packet as soon as possible (within the constraints of the RTCP timing
   rules for early feedback).  When it receives such an indication, a
   media source that understands the RTCP-SR-REQ packet SHOULD generate
   an RTCP SR packet as soon as possible while complying with the RTCP
   early feedback rules.  If the use of non-compound RTCP [5] was
   previously negotiated, both the feedback request and the RTCP SR
   response may be sent as non-compound RTCP packets.  The RTCP-SR-REQ
   packet MAY be repeated once per RTCP reporting interval if no RTCP SR
   packet is forthcoming.  The media source may ignore RTCP-SR-REQ
   packets if its regular schedule for transmission of synchronisation
   metadata can be expected to allow the receiver to synchronise the
   media streams within a reasonable time frame.
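
   As an illustration of the RTCP-SR-REQ format described above, the
   following Python sketch builds and checks the 12-octet message (the
   common feedback header with an empty FCI).  The function names are
   hypothetical, and the sketch covers only the wire format; it does
   not implement the RTCP early feedback timing rules or compound
   packet construction.

```python
import struct

RTPFB = 205           # payload type for transport layer feedback
FMT_RTCP_SR_REQ = 5   # feedback message type defined in this memo


def build_rtcp_sr_req(sender_ssrc: int, media_ssrc: int) -> bytes:
    """Return an RTCP-SR-REQ message with V=2, P=0 and empty FCI."""
    first_octet = (2 << 6) | FMT_RTCP_SR_REQ   # V=2, P=0, FMT=5
    length = 2                                 # 32-bit words minus one
    return struct.pack("!BBHII", first_octet, RTPFB, length,
                       sender_ssrc, media_ssrc)


def parse_rtcp_sr_req(packet: bytes):
    """Return (sender_ssrc, media_ssrc), or None if not RTCP-SR-REQ."""
    if len(packet) != 12:
        return None
    first, pt, length, sender, media = struct.unpack("!BBHII", packet)
    if first >> 6 != 2 or first & 0x1F != FMT_RTCP_SR_REQ:
        return None
    if pt != RTPFB or length != 2:
        return None
    return sender, media


if __name__ == "__main__":
    pkt = build_rtcp_sr_req(0x11111111, 0x22222222)
    print(pkt.hex(), parse_rtcp_sr_req(pkt))
```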

   When using SSM sessions with unicast feedback, it is possible that
   the feedback target and media source are not co-located.  If a
   feedback target receives an RTCP-SR-REQ feedback message in such a
   case, the request should be forwarded to the media source.  The
   mechanism to be used for forwarding such requests is not defined
   here.

3.3.  In-band Delivery of Synchronisation Metadata

   The RTP header extension mechanism defined in [6] can be adopted to
   carry an OPTIONAL NTP format timestamp in RTP data packets.  If such
   a timestamp is included, it MUST correspond to the same time instant
   as the RTP timestamp in the packet's header, and MUST be derived from
   the same clock used to generate the NTP format timestamps included in
   RTCP SR packets.  Provided it has knowledge of the SSRC to CNAME
   mapping, either from prior receipt of an RTCP CNAME packet or via
   out-of-band signalling [10], the receiver can use the information
   provided as input to the synchronisation algorithm, in exactly the
   same way as if an additional RTCP SR packet had been received for the
   flow.

   Two variants are defined for this header extension.  The first
   variant extends the RTP header with a 64 bit NTP timestamp format
   timestamp as defined in [7].  The second variant carries the lower 24
   bits of the Seconds field of an NTP timestamp format timestamp and
   the 32 bits of its Fraction field.  The formats of the two variants
   are shown in Figure 5 and Figure 6.
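
   The two variants can be sketched in Python as follows.  This is a
   minimal illustration that assumes the one-byte header extension
   form of [6] (a 0xBEDE block whose element header carries a 4-bit ID
   and a 4-bit length-minus-one) and takes the lengths from Figure 5
   and Figure 6.  The function names are hypothetical, the extension
   ID is whatever value was negotiated, and restore_seconds() ignores
   the case where the Seconds field wraps its lower 24 bits between
   the sender report and the data packet.

```python
import struct


def ntp64_extension(ext_id: int, seconds: int, fraction: int) -> bytes:
    """Variant A: 64-bit NTP timestamp, element length field L=7."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header extension IDs are 1..14")
    element = bytes([(ext_id << 4) | 7])
    element += struct.pack("!II", seconds, fraction)
    element += b"\x00" * 3              # pad 9 octets to 12
    return struct.pack("!HH", 0xBEDE, len(element) // 4) + element


def ntp56_extension(ext_id: int, seconds: int, fraction: int) -> bytes:
    """Variant B: lower 24 bits of Seconds plus Fraction, L=6."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header extension IDs are 1..14")
    element = bytes([(ext_id << 4) | 6])
    element += (seconds & 0xFFFFFF).to_bytes(3, "big")
    element += struct.pack("!I", fraction)
    return struct.pack("!HH", 0xBEDE, len(element) // 4) + element


def restore_seconds(low24: int, sr_seconds: int) -> int:
    """Naive recovery of full Seconds from an RTCP SR (no wrap check)."""
    return (sr_seconds & 0xFF000000) | (low24 & 0xFFFFFF)


if __name__ == "__main__":
    print(ntp64_extension(1, 0xCF123456, 0x80000000).hex())
    print(ntp56_extension(1, 0xCF123456, 0x80000000).hex())
```

   Variant B saves four octets per packet, at the cost of needing an
   RTCP SR before the upper 8 bits of the Seconds field are known.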

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|1|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+R
   |                           timestamp                           |T
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+P
   |           synchronisation source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |       0xBE    |    0xDE       |           length=3            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+E
   |  ID-A | L=7   |   NTP timestamp format - Seconds (bit 0-23)   |x
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+t
   |NTP Sec.(24-31)|   NTP timestamp format - Fraction(bit 0-23)   |n
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |NTP Frc.(24-31)|    0 (pad)    |    0 (pad)    |    0 (pad)    |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                         payload data                          |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 5: Variant A/64-bit NTP RTP header extension

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|1|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+R
   |                           timestamp                           |T
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+P
   |           synchronisation source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |       0xBE    |    0xDE       |           length=2            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+E
   |  ID-B | L=6   |   NTP timestamp format - Seconds (bit 8-31)   |x
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+t
   |          NTP timestamp format - Fraction (bit 0-31)           |n
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                         payload data                          |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 6: Variant B/56-bit NTP RTP header extension

   An NTP timestamp format timestamp MAY be included on any RTP packets
   the sender chooses, but it is RECOMMENDED when performing timestamp
   based decoding order recovery for layered codecs transported in
   multiple RTP flows, as further specified in Section 4.1.  This header
   extension SHOULD also be sent on the RTP packets corresponding to a
   video random access point, and on the associated audio packets, to
   allow rapid synchronisation for late joiners in multimedia sessions,
   and in video switching scenarios.

   Note: The inclusion of an RTP header extension will reduce the
   efficiency of RTP header compression, if it is used.  Furthermore,
   middle boxes which do not understand the header extensions may remove
   them or may not update the content according to this memo.

   In all cases, irrespective of whether in-band NTP timestamp format
   timestamps are included or not, regular RTCP SR packets MUST be sent
   to provide backwards compatibility with receivers that synchronise
   RTP flows according to [1], and robustness in the face of middleboxes
   (RTP translators) that might strip RTP header extensions.  If the
   Variant B/56-bit NTP RTP header extension is used, RTCP sender
   reports MUST be used to derive the upper 8 bits of the Seconds for
   the NTP timestamp format timestamp.

   When the SDP is used, the use of the RTP header extensions defined
   above MUST be indicated as specified in [6].  Therefore the following
   URIs MUST be used:

   o  The URI used for signalling the use of Variant A/64-bit NTP RTP
      header extension in SDP is "urn:ietf:params:rtp-hdrext:ntp-64".

   o  The URI used for signalling the use of Variant B/56-bit NTP RTP
      header extension in SDP is "urn:ietf:params:rtp-hdrext:ntp-56".

4.  Application to Decoding Order Recovery in Layered Codecs

   Packets in RTP flows are often predictively coded, with a receiver
   having to arrange the packets into a particular order before it can
   decode the media data.  Depending on the payload format, the decoding
   order might be explicitly specified as a field in the RTP payload
   header, or the receiver might decode the packets in order of their
   RTP timestamps.  If a layered encoding is used, where the media data
   is split across several RTP flows, then it is often necessary to
   exactly synchronise the RTP flows comprising the different layers
   before layers other than the base layer can be decoded.  Examples of
   such layered encodings are H.264 SVC in NI-T mode [9] and MPEG
   surround multi-channel audio [17].  As described in Section 2, such
   synchronisation is possible in RTP, but can be difficult to perform
   rapidly.  In the following, we describe how the extensions defined in
   Section 3.3 can be used to synchronise layered flows, and provide a
   common timestamp-based decoding order.

4.1.  In-band Synchronisation for Decoding Order Recovery

   When a layered, multi-description, or multi-view codec is used, with
   the different components of the media being transferred on separate
   RTP flows, the RTP sender SHOULD use periodic synchronous in-band
   delivery of synchronisation metadata to allow receivers to rapidly
   and accurately synchronise the separate components of the layered
   media flow.  There are three parts to this:

   o  The sender must negotiate the use of the RTP header extensions
      described in Section 3.3, and must periodically and synchronously
      insert such header extensions into all the RTP flows forming the
      separate components of the layered, multi-description, or
      multi-view flow.

   o  Synchronous insertion requires that the sender insert these RTP
      header extensions into packets corresponding to the exact same
      sampling instant in all the flows. Since the header extensions for
      each flow are inserted at exactly the same sampling instant, they
      will have identical NTP-format timestamps, hence allowing
      receivers to exactly align the RTP timestamps for the component
      flows. This may require the insertion of extra data packets into
      some of the component RTP flows, if some component flows contain
      packets for sampling instants that do not exist in other flows
      (for example, a layered video codec, where the layers have
      differing frame rates).

   o  The frequency with which the sender inserts the header extensions
      will directly correspond to the synchronisation latency, with more
      frequent insertion leading to higher per-flow overhead, but
      lower synchronisation latency. It is RECOMMENDED that the sender
      insert the header extensions synchronously into all component RTP
      flows at least once per random access point of the media, but they
      MAY be inserted more often.

   The sender MUST continue to send periodic RTCP reports including SR
   packets, and MUST ensure the RTP timestamp to NTP-format timestamp
   mapping in the RTCP SR packets is consistent with that used in the
   RTP header extensions.
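
   As an informal illustration of the consistency requirement above, the
   following Python sketch checks whether a mapping taken from an RTCP
   SR packet agrees with one taken from an in-band header extension.
   The function names, the 1 ms default tolerance, and the
   representation of each mapping as an (RTP timestamp, 64-bit
   NTP-format timestamp) pair are assumptions made for this example;
   they are not defined by this memo.

```python
NTP_UNITS_PER_SECOND = 1 << 32  # 32-bit fraction: 2**32 units per second


def rtp_to_ntp64(rtp_ts, ref_rtp_ts, ref_ntp64, clock_rate):
    """Map an RTP timestamp to a 64-bit NTP-format timestamp using one
    (RTP timestamp, NTP-format timestamp) reference pair."""
    # Interpret the difference as a signed 32-bit value so that RTP
    # timestamp wrap-around is handled.
    delta = (rtp_ts - ref_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 1 << 32
    return ref_ntp64 + (delta * NTP_UNITS_PER_SECOND) // clock_rate


def mappings_consistent(sr_pair, ext_pair, clock_rate, tolerance_s=0.001):
    """Return True if the header extension pair lies on the timeline
    predicted by the RTCP SR pair, within tolerance_s seconds
    (hypothetical threshold chosen for this sketch)."""
    sr_rtp, sr_ntp64 = sr_pair
    ext_rtp, ext_ntp64 = ext_pair
    predicted = rtp_to_ntp64(ext_rtp, sr_rtp, sr_ntp64, clock_rate)
    tolerance = int(tolerance_s * NTP_UNITS_PER_SECOND)
    return abs(predicted - ext_ntp64) <= tolerance
```

   Integer arithmetic is used throughout so that the 32-bit fraction of
   the NTP-format timestamp is not lost to floating-point rounding.
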
   Receivers should use both the information
   contained in RTCP SR packets and the in-band mapping of RTP and
   NTP-format timestamps as input to the synchronisation process, but it
   is RECOMMENDED that receivers sanity check the mappings received and
   discard outliers, to provide robustness against invalid data (one
   might think it more likely that the RTCP SR mappings are invalid,
   since they are sent at irregular times and subject to skew, but the
   presence of broken RTP translators could also corrupt the timestamps
   in the RTP header extension; receivers need to cope with both types
   of failure).

4.2. Timestamp-Based Decoding Order Recovery

   Once a receiver has synchronised the components of a layered,
   multi-description, or multi-view flow using the RTP header extensions
   as described in Section 4.1, it may then derive a decoding order
   based on the synchronised timestamps as follows (or it may use
   information in the RTP payload header to derive the decoding order,
   if present and desired).

   There may be explicit dependencies between the component flows of a
   layered, multi-description, or multi-view flow. For example, it is
   common for layered flows to be arranged in a hierarchy, where flows
   from "higher" layers cannot be decoded until the corresponding data
   in "lower" layer flows has been received and decoded. If such a
   decoding hierarchy exists, it MUST be signalled out of band, for
   example using [8] when SDP signalling is used.

   Each component RTP flow MUST contain packets corresponding to all the
   sampling instants of the RTP flows on which it depends. If such
   packets are not naturally present in the RTP flow, the sender MUST
   generate additional packets as necessary in order to satisfy this
   rule. The format of these packets depends on the payload format
   used. For H.264 SVC, the Empty NAL unit packet [9] should be used.
   Flows may also include packets corresponding to additional sampling
   instants that are not present in the flows on which they depend.

   The receiver should decode the packets in all the component RTP flows
   as follows:

   o  For each RTP packet in each flow, use the mapping contained in the
      RTP header extensions and RTCP SR packets to derive the NTP-format
      timestamp corresponding to its RTP timestamp.

   o  Group together RTP data packets from all component flows that have
      identical calculated NTP-format timestamps.

   o  Processing groups in order of ascending NTP-format timestamp,
      decode the RTP packets in each group according to the signalled
      RTP flow decoding hierarchy. That is, pass the RTP packet data
      from the flow on which all other flows depend to the decoder
      first, then that from the next dependent flow, and so on. The
      decoding order of the RTP flow hierarchy may be indicated by
      mechanisms defined in [8] or by some other means.

   Note that the decoding order will not necessarily match the packet
   transmission order. The receiver will need to buffer packets for a
   codec-dependent amount of time in order for all necessary packets to
   arrive to allow decoding.

4.3. Example

   The example shown in Figure 7 refers to three RTP flows A, B, and C
   containing a layered, multi-view, or multi-description media
   stream. In the example, the dependency signalling as defined in [8]
   indicates that flow A is the lowest RTP flow, B is the first higher
   RTP flow and depends on A, and C is the second higher RTP flow
   corresponding to flow A and depends on A and B. A media coding
   structure is used that results in samples present in higher flows but
   not present in all lower flows. Flow A has the lowest frame rate, and
   flows B and C share the same, higher frame rate. The figure shows
   the full video samples with their corresponding RTP timestamps "(x)".
   The video samples are already re-ordered according to their RTP
   sequence number order. The figure indicates the received samples
   in decoding order within each RTP flow, as well as the associated NTP
   media timestamps ("TS[..]"). These timestamps may be derived using
   the NTP format timestamp provided in the RTCP sender reports or, as
   shown in the figure, directly from the NTP timestamp contained in the
   RTP header extensions, as indicated by the timestamp in "<x>". Note
   that the timestamps are not in increasing order since, in this
   example, the decoding order is different from the output/presentation
   order.

   The process first proceeds to the sample parts associated with the
   first available synchronous insertion of an NTP timestamp into the
   RTP header extensions, at NTP media timestamp TS=[8]. Starting in the
   highest RTP flow C, it removes/ignores all sample parts that precede
   (in decoding order) the sample parts with TS=[8] in each of the
   de-jittering buffers of RTP flows A, B, and C. Then, starting from
   flow C, the first media timestamp available in decoding order
   (TS=[8]) is selected, and the sample parts from RTP flow A, followed
   by those from flows B and C, are placed in order of the RTP flow
   dependency as indicated by mechanisms defined in [8] (in the example
   for TS[8]: flow A first, then flow B, and then flow C) into the video
   sample VS(TS[8]) associated with NTP media timestamp TS=[8]. Then the
   next media timestamp TS=[6] (RTP timestamp=(5)) in order of
   appearance in the highest RTP flow C is processed and the process
   described above is repeated. Note that there may be video samples
   with no sample parts present, e.g., in the lowest RTP flow A (see,
   e.g., TS=[5]).
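
   The walk-through above can also be expressed as a short,
   non-normative Python sketch. It uses the RTP timestamps of flows A,
   B, and C from Figure 7 and the in-band mappings at TS=[8]. The
   variable and function names are illustrative only, and the sketch
   assumes that one RTP timestamp unit equals one NTP media timestamp
   unit and ignores RTP timestamp wrap-around.

```python
# RTP timestamps per flow, in received (decoding) order, from Figure 7.
FLOWS = {
    "A": [3, 1, 7, 5],
    "B": [3, 5, 10, 8, 7, 9, 14, 12],
    "C": [0, 2, 7, 5, 4, 6, 11, 9],
}

# (RTP timestamp, NTP media timestamp) pairs carried in-band at TS=[8].
SYNC = {"A": (3, 8), "B": (10, 8), "C": (7, 8)}

# Signalled flow dependency, lowest flow first (assumed to come from SDP).
DEPENDENCY_ORDER = ["A", "B", "C"]


def ntp_timestamps(flow):
    """Derive the NTP media timestamp of every sample in one flow."""
    rtp_ref, ntp_ref = SYNC[flow]
    return [ntp_ref + (rtp - rtp_ref) for rtp in FLOWS[flow]]


def recover_decoding_order():
    """Return a list of (TS, [flows in dependency order]) tuples."""
    ts = {flow: ntp_timestamps(flow) for flow in DEPENDENCY_ORDER}
    # Discard sample parts that precede the first synchronisation point.
    for flow in DEPENDENCY_ORDER:
        start = ts[flow].index(SYNC[flow][1])
        ts[flow] = ts[flow][start:]
    # Walk the highest flow in decoding order and collect, for each
    # timestamp, the flows that hold a sample part, lowest flow first.
    highest = DEPENDENCY_ORDER[-1]
    return [(t, [f for f in DEPENDENCY_ORDER if t in ts[f]])
            for t in ts[highest]]


for ts_value, flows in recover_decoding_order():
    print(f"VS(TS[{ts_value}]): {' -> '.join(flows)}")
```

   The samples at TS=[5] and TS=[7] are assembled from flows B and C
   only, since flow A carries no sample part for those instants.
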
   The decoding order recovery
   process could also be started after receiving the RTP timestamp to
   NTP-format timestamp mappings from the RTCP sender reports of all
   flows (indicated as timestamps "(x){y}"), assuming that there is no
   clock skew in the source used for the NTP-format timestamp
   generation.

   C:-(0)----(2)----(7)<8>--(5)----(4)----(6)-----(11)----(9){10}-
       |      |      |       |      |      |       |       |
   B:-(3)----(5)---(10)<8>--(8)----(7)----(9){7}--(14)----(12)----
                     |       |                     |       |
   A:---------------(3)<8>--(1)-------------------(7){12}-(5)-----

   ---------------------------------------decoding/transmission order->
   TS:[1]    [3]    [8]=<8> [6]    [5]    [7]     [12]    [10]

   Key:
   A, B, C                - RTP flows
   Integer values in "()" - video sample with its RTP timestamp as
                            indicated in its RTP packet.
   "|"                    - indicates corresponding samples / parts of
                            sample of the same video sample VS(TS[..])
                            in the RTP flows.
   Integer values in "[]" - NTP media timestamp TS, sampling time
                            as derived from the NTP timestamp associated
                            with the video sample VS(TS[..]), consisting
                            of sample parts in the flows above.
   Integer values in "<>" - NTP media timestamp TS as directly
                            taken from the NTP RTP header extensions.
   Integer values in "{}" - NTP media timestamp TS as provided in the
                            RTCP sender reports.

                Figure 7: Example of a layered RTP stream

5. Security Considerations

   The security considerations of the RTP specification [1], the
   Extended RTP profile for RTCP-Based Feedback [2], and the General
   Mechanism for RTP Header Extensions [6] apply.

   The RTP header extensions defined in Section 3.3 include an
   NTP-format timestamp. When an RTP session using this header extension
   is protected by the Secure RTP framework [18], that header extension
   is not part of the encrypted portion of the RTP data packets or RTCP
   control packets; however, these NTP-format timestamps are encrypted
   when using SRTP without this header extension.
   This is a minor
   information leak, but one that is not believed to be significant.
   The inclusion of this header extension will also reduce the
   efficiency of RTP header compression, if it is used. Furthermore,
   middleboxes that do not understand the header extensions may remove
   them or may not update the content according to this memo.

6. IANA Considerations

   NOTE TO RFC EDITOR: Please replace "RFC XXXX" in the following with
   the RFC number assigned to this memo, and delete this note.

   The IANA is requested to register one new value in the table of FMT
   Values for RTPFB Payload Types [2] as follows:

      Name:       RTCP-SR-REQ
      Long name:  RTCP Rapid Resynchronisation Request
      Value:      5
      Reference:  RFC XXXX

   The IANA is also requested to register two new RTP Compact Header
   Extensions [6], according to the following:

      Extension URI:  urn:ietf:params:rtp-hdrext:ntp-64
      Description:    Synchronisation metadata: 64-bit timestamp format
      Contact:        Thomas Schierl
                      IETF Audio/Video Transport Working Group
      Reference:      RFC XXXX

      Extension URI:  urn:ietf:params:rtp-hdrext:ntp-56
      Description:    Synchronisation metadata: 56-bit timestamp format
      Contact:        Thomas Schierl
                      IETF Audio/Video Transport Working Group
      Reference:      RFC XXXX

7. Acknowledgements

   This memo has benefited from discussions with numerous members of the
   IETF AVT working group, including Jonathan Lennox, Magnus Westerlund,
   Randell Jesup, Gerard Babonneau, Ingemar Johansson, Ali C. Begen,
   Ye-Kui Wang, Roni Even, Michael Dolan, Art Allison, and Stefan
   Doehla. The RTP header extension format of Variant A in Section 3.3
   was suggested by Dave Singer, matching a similar mechanism specified
   by ISMA.

8. References

8.1. Normative References

   [1]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", STD 64,
        RFC 3550, July 2003.

   [2]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
        "Extended RTP Profile for Real-time Transport Control Protocol
        (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July 2006.

   [3]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [4]  Ott, J., Chesterfield, J., and E. Schooler, "RTP Control
        Protocol (RTCP) Extensions for Single-Source Multicast Sessions
        with Unicast Feedback", RFC 5760, February 2010.

   [5]  Johansson, I. and M. Westerlund, "Support for Reduced-Size
        Real-Time Transport Control Protocol (RTCP): Opportunities and
        Consequences", RFC 5506, April 2009.

   [6]  Singer, D. and H. Desineni, "A General Mechanism for RTP Header
        Extensions", RFC 5285, July 2008.

   [7]  Mills, D., "Network Time Protocol (Version 3) Specification,
        Implementation and Analysis", RFC 1305, March 1992.

   [8]  Schierl, T. and S. Wenger, "Signaling Media Decoding Dependency
        in the Session Description Protocol (SDP)", RFC 5583,
        July 2009.

8.2. Informative References

   [9]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis, "RTP
        Payload Format for SVC Video", draft-ietf-avt-rtp-svc-21 (work
        in progress), April 2010.

   [10] Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media
        Attributes in the Session Description Protocol (SDP)",
        RFC 5576, June 2009.

   [11] Casner, S., "Session Description Protocol (SDP) Bandwidth
        Modifiers for RTP Control Protocol (RTCP) Bandwidth", RFC 3556,
        July 2003.

   [12] Rosenberg, J., "Interactive Connectivity Establishment (ICE): A
        Protocol for Network Address Translator (NAT) Traversal for
        Offer/Answer Protocols", RFC 5245, April 2010.

   [13] McGrew, D. and E. Rescorla, "Datagram Transport Layer Security
        (DTLS) Extension to Establish Keys for Secure Real-time
        Transport Protocol (SRTP)", draft-ietf-avt-dtls-srtp-07 (work
        in progress), February 2009.

   [14] Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media Path
        Key Agreement for Unicast Secure RTP",
        draft-zimmermann-avt-zrtp-18 (work in progress), April 2010.

   [15] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
        January 2008.

   [16] Steeg, B., Begen, A., Caenegem, T., and Z. Vax, "Unicast-Based
        Rapid Acquisition of Multicast RTP Sessions",
        draft-ietf-avt-rapid-acquisition-for-rtp-09 (work in progress),
        April 2010.

   [17] de Bont, F., Doehla, S., Schmidt, M., and R. Sperschneider,
        "RTP Payload Format for Elementary Streams with MPEG Surround
        Multi-Channel Audio", RFC 5691, October 2009.

   [18] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
        Norrman, "The Secure Real-time Transport Protocol (SRTP)",
        RFC 3711, March 2004.

Authors' Addresses

   Colin Perkins
   University of Glasgow
   Department of Computing Science
   Glasgow G12 8QQ
   UK

   Email: csp@csperkins.org

   Thomas Schierl
   Fraunhofer HHI
   Einsteinufer 37
   D-10587 Berlin
   Germany

   Phone: +49-30-31002-227
   Email: ts@thomas-schierl.de