Network Working Group                                         C. Perkins
Internet-Draft                                     University of Glasgow
Updates: RFC3550                                              T. Schierl
(if approved)                                             Fraunhofer HHI
Intended status: Standards Track                         January 8, 2010
Expires: July 12, 2010


                   Rapid Synchronisation of RTP Flows
                  draft-ietf-avt-rapid-rtp-sync-09.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 12, 2010.
Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This memo outlines how RTP sessions are synchronised, and discusses
   how rapidly such synchronisation can occur.  We show that most RTP
   sessions can be synchronised immediately, but that the use of video
   switching multipoint conference units (MCUs) or large source specific
   multicast (SSM) groups can greatly increase the synchronisation
   delay.  This increase in delay can be unacceptable to some
   applications that use layered and/or multi-description codecs.

   This memo introduces three mechanisms to reduce the synchronisation
   delay for such sessions.  First, it updates the RTP Control Protocol
   (RTCP) timing rules to reduce the initial synchronisation delay for
   SSM sessions.  Second, a new feedback packet is defined for use with
   the Extended RTP Profile for RTCP-based Feedback (RTP/AVPF), allowing
   video switching MCUs to rapidly request resynchronisation.  Finally,
   new RTP header extensions are defined to allow rapid synchronisation
   of late joiners, and guarantee correct timestamp based decoding order
   recovery for layered codecs in the presence of clock skew.

Table of Contents

   1.  Introduction
   2.  Synchronisation of RTP Flows
     2.1.  Initial Synchronisation Delay
       2.1.1.  Unicast Sessions
       2.1.2.  Source Specific Multicast (SSM) Sessions
       2.1.3.  Any Source Multicast (ASM) Sessions
       2.1.4.  Discussion
     2.2.  Synchronisation for Late Joiners
   3.  Reducing RTP Synchronisation Delays
     3.1.  Reduced Initial RTCP Interval for SSM Senders
     3.2.  Rapid Resynchronisation Request
     3.3.  In-band Delivery of Synchronisation Metadata
   4.  Application to Decoding Order Recovery in Layered Codecs
     4.1.  In-band Synchronisation for Decoding Order Recovery
     4.2.  Timestamp based decoding order recovery
     4.3.  Example
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   When using RTP to deliver multimedia content it is often necessary to
   synchronise playout of audio and video components of a presentation.
   This is achieved using information contained in RTP Control Protocol
   (RTCP) Sender Report (SR) packets [1].  These are sent periodically,
   and the components of a multimedia session cannot be synchronised
   until sufficient RTCP SR packets have been received for each RTP flow
   to allow the receiver to establish mappings between the media clock
   used for each RTP flow, and the common (NTP-format) reference clock
   used to establish synchronisation.
   Recently, concern has been expressed that this synchronisation delay
   is problematic for some applications, for example those using layered
   or multi-description video coding.  This memo reviews the operation
   of RTP synchronisation, and describes the synchronisation delay that
   can be expected.  Three backwards compatible extensions to the basic
   RTP synchronisation mechanism are proposed:

   o  The RTCP transmission timing rules are relaxed for SSM senders, to
      reduce the initial synchronisation latency for large SSM groups.
      See Section 3.1.

   o  An enhancement to the Extended RTP Profile for RTCP-based Feedback
      (RTP/AVPF) [2] is defined to allow receivers to request additional
      RTCP SR packets, providing the metadata needed to synchronise RTP
      flows.  This can reduce the synchronisation delay when joining
      sessions with large RTCP reporting intervals, in the presence of
      packet loss, or when video switching MCUs are employed.  See
      Section 3.2.

   o  Two RTP header extensions are defined, to deliver synchronisation
      metadata in-band with RTP data packets.  These extensions provide
      synchronisation metadata that is aligned with RTP data packets,
      and so eliminate the need to estimate clock skew between flows
      before synchronisation.  They can also reduce the need to receive
      RTCP SR packets before flows can be synchronised, although they do
      not eliminate the need for RTCP.  See Section 3.3.

   The immediate use case for these extensions is to reduce the delay
   due to synchronisation when joining a layered video session (e.g., an
   H.264/SVC session in NI-T mode [9]).  The extensions are not specific
   to layered coding, however, and can be used in any environment where
   synchronisation latency is an issue.
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [3].

2.  Synchronisation of RTP Flows

   RTP flows are synchronised by receivers based on information that is
   contained in RTCP SR packets generated by senders (specifically, the
   NTP-format timestamp and the RTP timestamp).  Synchronisation
   requires that a common reference clock MUST be used to generate the
   NTP-format timestamps in a set of flows that are to be synchronised
   (i.e., when synchronising several RTP flows, the RTP timestamps for
   each flow are derived from separate, and media specific, clocks, but
   the NTP-format timestamps in the RTCP SR packets of all flows to be
   synchronised MUST be sampled from the same clock).  To achieve faster
   and more accurate synchronisation, it is further RECOMMENDED that
   senders and receivers use a synchronised common NTP-format reference
   clock with common properties, especially timebase, where possible
   (recognising that this is often not possible when RTP is used outside
   of controlled environments); the means by which that common reference
   clock and its properties are signalled and distributed is outside the
   scope of this memo.

   For multimedia sessions, each type of media (e.g., audio or video) is
   sent in a separate RTP session, and the receiver associates RTP flows
   to be synchronised by means of the canonical end-point identifier
   (CNAME) item included in the RTCP Source Description (SDES) packets
   generated by the sender or signalled out of band [10].  For layered
   media, different layers can be sent in different RTP sessions, or
   using different SSRC values within a single RTP session; in both
   cases, the CNAME is used to identify flows to be synchronised.  To
   ensure synchronisation, an RTP sender MUST therefore send periodic
   compound RTCP packets following Section 6 of RFC 3550 [1].

   The timing of these periodic compound RTCP packets will depend on the
   number of members in each RTP session, the fraction of those that are
   sending data, the session bandwidth, the configured RTCP bandwidth
   fraction, and whether the session is multicast or unicast (see RFC
   3550 Section 6.2 for details).  In summary, RTCP control traffic is
   allocated a small fraction, generally 5%, of the session bandwidth,
   and of that fraction, one quarter is allocated to active RTP senders,
   while receivers use the remaining three quarters (these fractions can
   be configured via SDP [11]).  Each member of an RTP session derives
   an RTCP reporting interval based on these fractions, whether the
   session is multicast or unicast, the number of members it has
   observed, and whether it is actively sending data or not.  It then
   sends a compound RTCP packet on average once per reporting interval
   (the actual packet transmission time is randomised in the range
   [0.5 ... 1.5] times the reporting interval to avoid synchronisation
   of reports).

   A minimum reporting interval of 5 seconds is RECOMMENDED, except that
   the delay before sending the initial report "MAY be set to half the
   minimum interval to allow quicker notification that the new
   participant is present" [1].  Also, for unicast sessions, "the delay
   before sending the initial compound RTCP packet MAY be zero" [1].  In
   addition, for unicast sessions, and for active senders in a multicast
   session, the fixed minimum reporting interval MAY be scaled to "360
   divided by the session bandwidth in kilobits/second.  This minimum is
   smaller than 5 seconds for bandwidths greater than 72 kb/s." [1]

2.1.  Initial Synchronisation Delay

   A multimedia session comprises a set of concurrent RTP sessions among
   a common group of participants, using one RTP session for each media
   type.  For example, a videoconference (which is a multimedia session)
   might contain an audio RTP session and a video RTP session.  To allow
   a receiver to synchronise the components of a multimedia session, a
   compound RTCP packet containing an RTCP SR packet and an RTCP SDES
   packet with a CNAME item MUST be sent to each of the RTP sessions in
   the multimedia session by each sender.  A receiver cannot synchronise
   playout across the multimedia session until such RTCP packets have
   been received on all of the component RTP sessions.  If there is no
   packet loss, this gives an expected initial synchronisation delay
   equal to the average time taken to receive the first RTCP packet in
   the RTP session with the longest RTCP reporting interval.  This will
   vary between unicast and multicast RTP sessions.

   The initial synchronisation delay for layered sessions is similar to
   that for multimedia sessions.  The layers cannot be synchronised
   until the RTCP SR and CNAME information has been received for each
   layer in the session.

2.1.1.  Unicast Sessions

   For unicast multimedia or layered sessions, senders SHOULD transmit
   an initial compound RTCP packet (containing an RTCP SR packet and an
   RTCP SDES packet with a CNAME item) immediately on joining each RTP
   session in the multimedia session.  The individual RTP sessions are
   considered to be joined once any in-band signalling for NAT traversal
   (e.g., [12]) and/or security keying (e.g., [13], [14]) has concluded,
   and the media path is open.  This implies that the initial RTCP
   packet is sent in parallel with the first data packet, following the
   guidance in RFC 3550 that "the delay before sending the initial
   compound RTCP packet MAY be zero", and, in the absence of any packet
   loss, flows can be synchronised immediately.

   Note that NAT pinholes, firewall holes, quality-of-service, and media
   security keys should have been negotiated as part of the signalling,
   whether in-band or out-of-band, before the first RTCP packet is sent.
   This should ensure that any middleboxes are ready to accept traffic,
   and reduce the likelihood that the initial RTCP packet will be lost.

2.1.2.  Source Specific Multicast (SSM) Sessions

   For multicast sessions, the delay before sending the initial RTCP
   packet, and hence the synchronisation delay, varies with the session
   bandwidth and the number of members in the session.  For a multicast
   multimedia or layered session, the average synchronisation delay will
   depend on the slowest of the component RTP sessions; this will
   generally be the session with the lowest bandwidth (assuming all the
   RTP sessions have the same number of members).

   When sending to a multicast group, the reduced minimum RTCP reporting
   interval of 360 seconds divided by the session bandwidth in kilobits
   per second [1] should be used when synchronisation latency is likely
   to be an issue.  Also, as usual, the minimum reporting interval is
   halved for the first RTCP packet.  Depending on the session bandwidth
   and the number of members, this gives the average synchronisation
   delays shown in Figure 1.
   Session  |              Number of receivers:
   Bandwidth|     2     3     4     5    10   100  1000 10000
   ---------+------------------------------------------------
      8 kbps|  2.73  4.10  5.47  5.47  5.47  5.47  5.47  5.47
     16 kbps|  2.50  2.50  2.73  2.73  2.73  2.73  2.73  2.73
     32 kbps|  2.50  2.50  2.50  2.50  2.50  2.50  2.50  2.50
     64 kbps|  2.50  2.50  2.50  2.50  2.50  2.50  2.50  2.50
    128 kbps|  1.41  1.41  1.41  1.41  1.41  1.41  1.41  1.41
    256 kbps|  0.70  0.70  0.70  0.70  0.70  0.70  0.70  0.70
    512 kbps|  0.35  0.35  0.35  0.35  0.35  0.35  0.35  0.35
      1 Mbps|  0.18  0.18  0.18  0.18  0.18  0.18  0.18  0.18
      2 Mbps|  0.09  0.09  0.09  0.09  0.09  0.09  0.09  0.09
      4 Mbps|  0.04  0.04  0.04  0.04  0.04  0.04  0.04  0.04

    Figure 1: Average initial synchronisation delay in seconds for an
                        RTP session with 1 sender

   These numbers assume a source specific multicast channel with a
   single active sender and an average RTCP packet size of 70 octets.
   These intervals are sufficient for lip-synchronisation without
   excessive delay, but might be viewed as having too much latency for
   synchronising parts of a layered video stream.

   The RTCP interval is randomised in the usual manner, so the minimum
   synchronisation delay will be half these intervals, and the maximum
   delay will be 1.5 times these intervals.  Note also that these RTCP
   intervals are calculated assuming perfect knowledge of the number of
   members in the session.

2.1.3.  Any Source Multicast (ASM) Sessions

   For ASM sessions, the fraction of members that are senders plays an
   important role, and causes more variation in the average RTCP
   reporting interval.  This is illustrated in Figure 2 and Figure 3,
   which show the RTCP reporting interval for the same session
   bandwidths and receiver populations as the SSM session described in
   Figure 1, but for sessions with 2 and 10 senders respectively.  It
   can be seen that the initial synchronisation delay scales with the
   number of senders (this is to ensure that the total RTCP traffic from
   all group members does not grow without bound) and can be
   significantly larger than for source specific groups.  Despite this,
   the initial synchronisation time remains acceptable for
   lip-synchronisation in typical small-to-medium sized group video
   conferencing scenarios.

   Note that multi-sender groups implemented using multi-unicast with a
   central RTP translator (Topo-Translator in the terminology of [15])
   or mixer (Topo-Mixer), or some forms of video switching MCU
   (Topo-Video-switch-MCU) distribute RTCP packets to all members of the
   group, and so scale in the same way as an ASM group with regard to
   initial synchronisation latency.

   Session  |              Number of receivers:
   Bandwidth|     2     3     4     5    10   100  1000 10000
   ---------+------------------------------------------------
      8 kbps|  2.73  4.10  5.47  6.84 10.94 10.94 10.94 10.94
     16 kbps|  2.50  2.50  2.73  3.42  5.47  5.47  5.47  5.47
     32 kbps|  2.50  2.50  2.50  2.50  2.73  2.73  2.73  2.73
     64 kbps|  2.50  2.50  2.50  2.50  2.50  2.50  2.50  2.50
    128 kbps|  1.41  1.41  1.41  1.41  1.41  1.41  1.41  1.41
    256 kbps|  0.70  0.70  0.70  0.70  0.70  0.70  0.70  0.70
    512 kbps|  0.35  0.35  0.35  0.35  0.35  0.35  0.35  0.35
      1 Mbps|  0.18  0.18  0.18  0.18  0.18  0.18  0.18  0.18
      2 Mbps|  0.09  0.09  0.09  0.09  0.09  0.09  0.09  0.09
      4 Mbps|  0.04  0.04  0.04  0.04  0.04  0.04  0.04  0.04

    Figure 2: Average initial synchronisation delay in seconds for an
                        RTP session with 2 senders
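   The entries in Figures 1, 2 and 3 can be approximately reproduced
   from the deterministic part of the RTCP interval computation in
   RFC 3550 (Section 6.3.1 and Appendix A.7 of [1]).  The following
   Python sketch is illustrative only; it rests on assumptions inferred
   from the published values rather than stated in this memo: 1 kbps is
   taken as 1024 bit/s (so the Mbps rows correspond to 1024, 2048 and
   4096 kbps), the minimum interval is min(5 s, 360 / session kbps) and
   is halved for the first packet, the [0.5, 1.5] randomisation is
   replaced by its mean, the e - 3/2 compensation factor of RFC 3550 is
   not applied, and the "Number of receivers" column is treated as the
   total membership of the session.  The function name is hypothetical.

```python
def avg_initial_rtcp_interval(session_kbps: float, members: int,
                              senders: int, we_sent: bool = True,
                              avg_rtcp_size: float = 70.0) -> float:
    """Average delay (s) before a member's first compound RTCP packet.

    Hypothetical helper following the deterministic part of the RFC 3550
    interval computation, under the assumptions listed in the text.
    """
    # Assumption: 1 kbps = 1024 bit/s; RTCP gets 5% of the session
    # bandwidth.  Convert to octets per second.
    rtcp_bw = 0.05 * session_kbps * 1024 / 8
    # Reduced minimum interval (360 / kbps, capped at 5 s), halved
    # because this is the initial RTCP packet.
    t_min = min(5.0, 360.0 / session_kbps) / 2
    n = members
    if senders <= 0.25 * members:
        # Senders share 25% of the RTCP bandwidth, receivers the rest.
        if we_sent:
            rtcp_bw *= 0.25
            n = senders
        else:
            rtcp_bw *= 0.75
            n = members - senders
    # Deterministic interval; the [0.5, 1.5] randomisation averages to 1.
    return max(t_min, n * avg_rtcp_size / rtcp_bw)


if __name__ == "__main__":
    receivers = [2, 3, 4, 5, 10, 100, 1000, 10000]
    # 8 kbps row of Figure 1 (SSM, one sender).
    print([round(avg_initial_rtcp_interval(8, r, 1), 2) for r in receivers])
    # prints [2.73, 4.1, 5.47, 5.47, 5.47, 5.47, 5.47, 5.47]
```

   Under these assumptions the sketch gives, for example, 2.73 s for
   8 kbps with 2 receivers (Figure 1) and 54.69 s for 8 kbps with 100
   receivers and 10 senders (Figure 3), which suggests that the figures
   report the mean interval before randomisation and compensation.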
   Session  |              Number of receivers:
   Bandwidth|     2     3     4     5    10   100  1000 10000
   ---------+------------------------------------------------
      8 kbps|  2.73  4.10  5.47  6.84 13.67 54.69 54.69 54.69
     16 kbps|  2.50  2.50  2.73  3.42  6.84 27.34 27.34 27.34
     32 kbps|  2.50  2.50  2.50  2.50  3.42 13.67 13.67 13.67
     64 kbps|  2.50  2.50  2.50  2.50  2.50  6.84  6.84  6.84
    128 kbps|  1.41  1.41  1.41  1.41  1.41  3.42  3.42  3.42
    256 kbps|  0.70  0.70  0.70  0.70  0.70  1.71  1.71  1.71
    512 kbps|  0.35  0.35  0.35  0.35  0.35  0.85  0.85  0.85
      1 Mbps|  0.18  0.18  0.18  0.18  0.18  0.43  0.43  0.43
      2 Mbps|  0.09  0.09  0.09  0.09  0.09  0.21  0.21  0.21
      4 Mbps|  0.04  0.04  0.04  0.04  0.04  0.11  0.11  0.11

    Figure 3: Average initial synchronisation delay in seconds for an
                        RTP session with 10 senders

2.1.4.  Discussion

   For unicast sessions, the existing RTCP SR-based mechanism allows for
   immediate synchronisation, provided the initial RTCP packet is not
   lost.

   For SSM sessions, the initial synchronisation delay is sufficient for
   lip-synchronisation, but may be larger than desired for some layered
   codecs.  The rationale for not sending immediate RTCP packets for
   multicast groups is to avoid implosion of requests when large numbers
   of members simultaneously join the group ("flash crowd").  This is
   not an issue for SSM senders, since there can be at most one sender,
   so it is desirable to allow SSM senders to send an immediate RTCP SR
   on joining a session (as is currently allowed for unicast sessions,
   which also do not suffer from the implosion problem).  SSM receivers
   using unicast feedback would not be allowed to send immediate RTCP.
   For ASM sessions, implosion of responses is a concern, so no change
   is proposed to the RTCP timing rules.

   In all cases, it is possible that the initial RTCP SR packet is lost.
   In this case, the receiver will not be able to synchronise the media
   until the reporting interval has passed, and the next RTCP SR packet
   is sent.  This is undesirable.  Section 3.2 defines a new RTP/AVPF
   transport layer feedback message to request that an RTCP SR be
   generated, allowing rapid resynchronisation in the case of packet
   loss.

2.2.  Synchronisation for Late Joiners

   Synchronisation between RTP sessions is potentially slower for late
   joiners than for participants present at the start of the session.
   The reasons for this are three-fold:

   1.  Many of the optimisations that allow rapid transmission of RTCP
       SR packets apply only at the start of a session.  This implies
       that a new participant may have to wait a complete RTCP reporting
       interval for each session before receiving the necessary data to
       synchronise media streams.  This might potentially take several
       seconds, depending on the configured session bandwidth and the
       number of participants.

   2.  Additional synchronisation delay comes from the nature of the
       RTCP timing rules.  Packets are generated on average once per
       reporting interval, but with the exact transmission times being
       randomised +/- 50% to avoid synchronisation of reports.  This is
       important to avoid network congestion in multicast sessions, but
       does mean that the timing of RTCP SR reports for different RTP
       sessions is not synchronised.  Accordingly, a receiver must
       estimate the skew on the NTP-format clock in order to align RTP
       timestamps across sessions.  This estimation is an essential part
       of an RTP synchronisation implementation, and can be done with
       high accuracy given sufficient reports.  Collecting sufficient
       RTCP SR data to perform this estimation, however, may require
       reception of several RTCP reports, further increasing the
       synchronisation delay.

   3.  Many media codecs have the notion of periodic access points, such
       that a newly joined receiver often cannot start decoding a media
       stream until the packets corresponding to the access point have
       been received.  These access points may be sent less often than
       RTCP SR packets, and so may be the limiting factor in starting
       synchronised media playout for late joiners.  The RTP extension
       for unicast-based rapid acquisition of multicast RTP sessions
       [16] may be used to reduce the time taken to receive the access
       points in some scenarios.

   These delays are likely to be an issue when tuning in to an ongoing
   multicast RTP session, or for video switching MCUs.

3.  Reducing RTP Synchronisation Delays

   Three backwards compatible RTP extensions are defined to reduce the
   possible synchronisation delay: a reduced initial RTCP interval for
   SSM senders, a rapid resynchronisation request message, and RTP
   header extensions that can convey synchronisation metadata in-band.

3.1.  Reduced Initial RTCP Interval for SSM Senders

   In SSM sessions where the initial synchronisation delay is important,
   the RTP sender MAY set the delay before sending the initial compound
   RTCP packet to zero, and send its first RTCP packet immediately upon
   joining the SSM session.  RTP receivers in an SSM session, sending
   unicast RTCP feedback, MUST NOT send RTCP packets with zero initial
   delay; the timing rules defined in [4] apply unchanged to receivers.

3.2.  Rapid Resynchronisation Request

   The general format of an RTP/AVPF transport layer feedback message is
   shown in Figure 4 (see [2] for details).
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|   FMT   |  PT=RTPFB=205 |          length               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  SSRC of packet sender                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  SSRC of media source                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :            Feedback Control Information (FCI)                 :
   :                                                               :

            Figure 4: RTP/AVPF Transport Layer Feedback Message

   One new feedback message type, RTCP-SR-REQ, is defined with FMT = 5.
   The Feedback Control Information (FCI) part of the feedback message
   MUST be empty.  The SSRC of packet sender indicates the member that
   is unable to synchronise media streams, while the SSRC of media
   source indicates the sender of the media it is unable to synchronise.
   The length MUST equal 2.

   If the RTP/AVPF profile [2] is in use, this feedback message MAY be
   sent by a receiver to indicate that it is unable to synchronise some
   media streams, and desires that the media source transmit an RTCP SR
   packet as soon as possible (within the constraints of the RTCP timing
   rules for early feedback).  When it receives such an indication, a
   media source that understands the RTCP-SR-REQ packet SHOULD generate
   an RTCP SR packet as soon as possible while complying with the RTCP
   early feedback rules.  If the use of non-compound RTCP [5] was
   previously negotiated, both the feedback request and the RTCP SR
   response may be sent as non-compound RTCP packets.  The RTCP-SR-REQ
   packet MAY be repeated once per RTCP reporting interval if no RTCP SR
   packet is forthcoming.  The media source may ignore RTCP-SR-REQ
   packets if its regular schedule for transmission of synchronisation
   metadata can be expected to allow the receiver to synchronise the
   media streams within a reasonable time frame.
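   To make the RTCP-SR-REQ layout concrete, the following Python sketch
   packs and parses the 12-octet message: the common header with
   FMT = 5 and PT = 205, the two SSRCs, and an empty FCI, which gives
   length = 2.  It is a minimal illustration using only the standard
   struct module; the function names are hypothetical, and it does not
   implement the RTP/AVPF early feedback timing rules or the assembly of
   compound RTCP packets.

```python
import struct

RTPFB = 205          # RTP/AVPF transport layer feedback packet type
FMT_RTCP_SR_REQ = 5  # FMT value defined for RTCP-SR-REQ in this memo


def build_rtcp_sr_req(sender_ssrc: int, media_ssrc: int) -> bytes:
    """Encode an RTCP-SR-REQ: common header plus two SSRCs, empty FCI."""
    first_octet = (2 << 6) | FMT_RTCP_SR_REQ  # V=2, P=0, FMT=5
    length = 2  # length in 32-bit words minus one (3 words in total)
    return struct.pack("!BBHII", first_octet, RTPFB, length,
                       sender_ssrc, media_ssrc)


def parse_rtcp_sr_req(packet: bytes):
    """Return (sender_ssrc, media_ssrc) for an RTCP-SR-REQ, else None."""
    if len(packet) < 12:
        return None
    first_octet, pt, length, sender_ssrc, media_ssrc = struct.unpack(
        "!BBHII", packet[:12])
    if (first_octet >> 6) != 2 or pt != RTPFB:
        return None  # wrong RTP version or not a transport layer FB message
    if (first_octet & 0x1F) != FMT_RTCP_SR_REQ or length != 2:
        return None  # another FMT, or an FCI that should have been empty
    return sender_ssrc, media_ssrc


if __name__ == "__main__":
    pkt = build_rtcp_sr_req(0x11111111, 0x22222222)
    print(pkt.hex())  # prints 85cd00021111111122222222
```

   Unless non-compound RTCP [5] has been negotiated, such a message
   would be carried in a compound RTCP packet and sent subject to the
   early feedback rules of [2].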
   When using SSM sessions with unicast feedback, it is possible that the
   feedback target and media source are not co-located.  If a feedback
   target receives an RTCP-SR-REQ feedback message in such a case, the
   request should be forwarded to the media source.  The mechanism to be
   used for forwarding such requests is not defined here.

3.3.  In-band Delivery of Synchronisation Metadata

   The RTP header extension mechanism defined in [6] can be adopted to
   carry an OPTIONAL NTP-format timestamp in RTP data packets.  If such
   a timestamp is included, it MUST correspond to the same time instant
   as the RTP timestamp in the packet's header, and MUST be derived from
   the same clock used to generate the NTP-format timestamps included in
   RTCP SR packets.  Provided it has knowledge of the SSRC to CNAME
   mapping, either from prior receipt of an RTCP SDES CNAME item or via
   out-of-band signalling [10], the receiver can use the information
   provided as input to the synchronisation algorithm, in exactly the
   same way as if an additional RTCP SR packet had been received for the
   flow.

   Two variants are defined for this header extension.  The first
   variant extends the RTP header with a 64-bit NTP timestamp format
   timestamp as defined in [7].  The second variant carries the lower 24
   bits of the Seconds part of an NTP timestamp format timestamp and all
   32 bits of its Fraction part.  The formats of the two variants are
   shown in Figure 5 and Figure 6.
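   Whether it arrives in an RTCP SR packet or in one of the header
   extensions shown below, an (NTP, RTP) timestamp pair gives the
   receiver a reference point for mapping a flow's RTP timestamps onto
   the common NTP-format clock.  A minimal Python sketch of that mapping
   follows; it assumes the nominal media clock rate is known from the
   payload format or signalling, ignores the clock skew estimation
   discussed in Section 2.2, and uses hypothetical function names.

```python
NTP_FRACTION_SCALE = 2 ** 32  # one second in NTP fraction units


def ntp64_to_seconds(ntp64: int) -> float:
    """Convert a 64-bit NTP timestamp format value to seconds."""
    return ntp64 / NTP_FRACTION_SCALE


def rtp_to_reference_time(rtp_ts: int, ref_rtp_ts: int, ref_ntp64: int,
                          clock_rate: int) -> float:
    """Map an RTP timestamp to NTP-clock seconds via one (NTP, RTP) pair.

    The 32-bit RTP timestamp difference is interpreted as a signed value,
    so wrap-around between the reference pair and the packet is handled.
    """
    delta = (rtp_ts - ref_rtp_ts) & 0xFFFFFFFF
    if delta >= 2 ** 31:
        delta -= 2 ** 32
    return ntp64_to_seconds(ref_ntp64) + delta / clock_rate


if __name__ == "__main__":
    base = 3_000_000_000 << 32  # arbitrary reference NTP time
    # Hypothetical reference pairs for an 8 kHz audio flow and a 90 kHz
    # video flow with the same CNAME (and hence the same NTP clock).
    audio = rtp_to_reference_time(9_000, 1_000, base, 8_000)
    video = rtp_to_reference_time(95_000, 50_000, base + 2 ** 31, 90_000)
    print(audio - video)  # prints 0.0: play these samples out together
```

   Samples whose mapped times are equal can then be played out
   together; a real receiver would refresh the reference pair as new SR
   packets or header extensions arrive.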
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|1|   CC  |M|     PT      |        sequence number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+R
   |                           timestamp                           |T
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+P
   |           synchronisation source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |      0xBE     |      0xDE     |           length=3            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+E
   |  ID-A |  L=7  |   NTP timestamp format - Seconds (bit 0-23)   |x
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+t
   |NTP Sec.(24-31)|  NTP timestamp format - Fraction (bit 0-23)   |n
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |NTP Frc.(24-31)|    0 (pad)    |    0 (pad)    |    0 (pad)    |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                         payload data                          |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 5: Variant A/64-bit NTP RTP header extension

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|1|   CC  |M|     PT      |        sequence number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+R
   |                           timestamp                           |T
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+P
   |           synchronisation source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |      0xBE     |      0xDE     |           length=2            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+E
   |  ID-B |  L=6  |   NTP timestamp format - Seconds (bit 8-31)   |x
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+t
   |          NTP timestamp format - Fraction (bit 0-31)           |n
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                         payload data                          |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 6: Variant B/56-bit NTP RTP header extension

   An NTP-format timestamp MAY be included in any RTP packet the sender
   chooses, but it is RECOMMENDED when performing timestamp-based
   decoding order recovery for layered codecs transported in multiple
   RTP flows, as further specified in Section 4.1.  This header
   extension SHOULD also be sent in the RTP packets corresponding to a
   video random access point, and in the associated audio packets, to
   allow rapid synchronisation for late joiners in multimedia sessions
   and in video switching scenarios.

   Note: The inclusion of an RTP header extension will reduce the
   efficiency of RTP header compression, if it is used.  Furthermore,
   middleboxes that do not understand the header extensions may remove
   them, or may not update their content according to this memo.

   In all cases, irrespective of whether in-band NTP-format timestamps
   are included or not, regular RTCP SR packets MUST be sent to provide
   backwards compatibility with receivers that synchronise RTP flows
   according to [1], and robustness in the face of middleboxes (RTP
   translators) that might strip RTP header extensions.  If the Variant
   B/56-bit NTP RTP header extension is used, RTCP sender reports MUST
   be used to derive the upper 8 bits of the Seconds field of the
   NTP-format timestamp.

   When SDP is used, the use of the RTP header extensions defined above
   MUST be indicated as specified in [6].  The following URIs MUST be
   used:

   o  The URI used for signalling the use of the Variant A/64-bit NTP
      RTP header extension in SDP is "urn:ietf:params:rtp-hdrext:ntp-64".

   o  The URI used for signalling the use of the Variant B/56-bit NTP
      RTP header extension in SDP is "urn:ietf:params:rtp-hdrext:ntp-56".

4.  Application to Decoding Order Recovery in Layered Codecs

   Packets in RTP flows are often predictively coded, with a receiver
   having to arrange the packets into a particular order before it can
   decode the media data.  Depending on the payload format, the decoding
   order might be explicitly specified as a field in the RTP payload
   header, or the receiver might decode the packets in order of their
   RTP timestamps.  If a layered encoding is used, where the media data
   is split across several RTP flows, then it is often necessary to
   exactly synchronise the RTP flows comprising the different layers
   before layers other than the base layer can be decoded.  Examples of
   such layered encodings are H.264 SVC in NI-T mode [9] and MPEG
   Surround multi-channel audio [17].  As described in Section 2, such
   synchronisation is possible in RTP, but can be difficult to perform
   rapidly.  The following sections describe how the extensions defined
   in Section 3.3 can be used to synchronise layered flows and to
   provide a common timestamp-based decoding order.

4.1.  In-band Synchronisation for Decoding Order Recovery

   When a layered, multi-description, or multi-view codec is used, with
   the different components of the media being transferred in separate
   RTP flows, the RTP sender SHOULD use periodic, synchronous, in-band
   delivery of synchronisation metadata to allow receivers to rapidly
   and accurately synchronise the separate components of the layered
   media flow.  There are three parts to this:

   o  The sender must negotiate the use of the RTP header extensions
      described in Section 3.3, and must periodically and synchronously
      insert such header extensions into all the RTP flows forming the
      separate components of the layered, multi-description, or
      multi-view flow.
   o  Synchronous insertion requires the sender to insert these RTP
      header extensions into packets corresponding to the exact same
      sampling instant in all the flows.  Since the header extensions
      for each flow are inserted at exactly the same sampling instant,
      they will have identical NTP-format timestamps, allowing receivers
      to exactly align the RTP timestamps of the component flows.  This
      may require the insertion of extra data packets into some of the
      component RTP flows, if some component flows contain packets for
      sampling instants that do not exist in other flows (for example,
      a layered video codec where the layers have differing frame
      rates).

   o  The frequency with which the sender inserts the header extensions
      directly determines the synchronisation latency: more frequent
      insertion leads to higher per-flow overhead, but lower
      synchronisation latency.  It is RECOMMENDED that the sender insert
      the header extensions synchronously into all component RTP flows
      at least once per random access point of the media, but they MAY
      be inserted more often.

   The sender MUST continue to send periodic RTCP reports including SR
   packets, and MUST ensure that the RTP timestamp to NTP-format
   timestamp mapping in the RTCP SR packets is consistent with that used
   in the RTP header extensions.
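   As a sketch of the header extension elements a sender inserts (the
   formats of Figures 5 and 6), the Python code below builds and parses
   a one-byte-header extension block [6] carrying a Variant A (64-bit)
   or Variant B (56-bit) NTP-format timestamp, and restores the upper 8
   bits of the Seconds field of a Variant B value from the NTP-format
   timestamp of a recent RTCP SR.  It assumes the extension IDs have
   already been negotiated; the ID values and function names are
   hypothetical, and picking the candidate closest to the SR time is
   one possible way to apply the SR-based rule, not a procedure
   mandated by this memo.

```python
import struct
from typing import Optional, Tuple

ONE_BYTE_PROFILE = 0xBEDE      # one-byte header extension marker [6]
LOW56_MASK = (1 << 56) - 1     # low 24 bits of Seconds + 32-bit Fraction


def build_ntp_extension(ext_id: int, ntp64: int, variant: str = "A") -> bytes:
    """Build a header extension block carrying one NTP-format element.

    Variant A carries all 64 bits (L=7); Variant B omits the upper 8 bits
    of the Seconds field (L=6).  The block is zero-padded to 32 bits.
    """
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs must be in the range 1-14")
    if variant == "A":
        data = ntp64.to_bytes(8, "big")
    elif variant == "B":
        data = (ntp64 & LOW56_MASK).to_bytes(7, "big")
    else:
        raise ValueError("variant must be 'A' or 'B'")
    # One-byte element header: 4-bit ID, 4-bit L (= data length - 1).
    element = bytes([(ext_id << 4) | (len(data) - 1)]) + data
    element += b"\x00" * (-len(element) % 4)
    return struct.pack("!HH", ONE_BYTE_PROFILE, len(element) // 4) + element


def parse_ntp_extension(block: bytes, ext_id: int) -> Optional[Tuple[int, int]]:
    """Return (value, data_length) of element ext_id, or None if absent."""
    profile, words = struct.unpack("!HH", block[:4])
    if profile != ONE_BYTE_PROFILE:
        raise ValueError("not a one-byte header extension block")
    body = block[4:4 + 4 * words]
    i = 0
    while i < len(body):
        if body[i] == 0:               # padding byte
            i += 1
            continue
        elem_id, length = body[i] >> 4, (body[i] & 0x0F) + 1
        if elem_id == 15:              # reserved ID: stop processing
            break
        if elem_id == ext_id:
            data = body[i + 1:i + 1 + length]
            return int.from_bytes(data, "big"), length
        i += 1 + length
    return None


def expand_ntp56(low56: int, sr_ntp64: int) -> int:
    """Restore the upper 8 bits of Seconds for a Variant B timestamp.

    Picks the 64-bit value closest to the NTP-format timestamp of a recent
    RTCP SR, which also copes with a wrap of the low 56 bits near the SR.
    """
    base = sr_ntp64 & ~LOW56_MASK
    candidates = [base + delta + low56 for delta in (-(1 << 56), 0, 1 << 56)]
    valid = [c for c in candidates if 0 <= c < (1 << 64)]
    return min(valid, key=lambda c: abs(c - sr_ntp64))
```

   For a hypothetical timestamp ntp = (3471292800 << 32) | 0x80000000,
   build_ntp_extension(1, ntp, "A") yields a 16-byte block (length=3,
   as in Figure 5) and build_ntp_extension(2, ntp, "B") a 12-byte block
   (length=2, as in Figure 6); expanding the parsed Variant B value
   against an SR timestamp a few seconds earlier gives back ntp.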
   Receivers should use both the information contained in RTCP SR
   packets and the in-band mapping of RTP and NTP-format timestamps as
   input to the synchronisation process, but it is RECOMMENDED that
   receivers sanity check the mappings received and discard outliers,
   to provide robustness against invalid data.  One might think it more
   likely that the RTCP SR mappings are invalid, since they are sent at
   irregular times and are subject to skew, but broken RTP translators
   could also corrupt the timestamps in the RTP header extension;
   receivers need to cope with both types of failure.

4.2.  Timestamp-Based Decoding Order Recovery

   Once a receiver has synchronised the components of a layered,
   multi-description, or multi-view flow using the RTP header extensions
   as described in Section 4.1, it may then derive a decoding order
   based on the synchronised timestamps as follows (alternatively, it
   may use information in the RTP payload header to derive the decoding
   order, if present and desired).

   There may be explicit dependencies between the component flows of a
   layered, multi-description, or multi-view flow.  For example, it is
   common for layered flows to be arranged in a hierarchy, where flows
   from "higher" layers cannot be decoded until the corresponding data
   in "lower" layer flows has been received and decoded.  If such a
   decoding hierarchy exists, it MUST be signalled out of band, for
   example using [8] when SDP signalling is used.

   Each component RTP flow MUST contain packets corresponding to all the
   sampling instants of the RTP flows on which it depends.  If such
   packets are not naturally present in the RTP flow, the sender MUST
   generate additional packets as necessary in order to satisfy this
   rule.  The format of these packets depends on the payload format
   used.  For H.264 SVC, the Empty NAL unit packet [9] should be used.
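   A sender can check the rule above with a small helper.  The Python
   sketch below is illustrative only: it represents each flow by the
   set of sampling instants (for example, NTP-format timestamps) for
   which it carries packets, assumes the signalled hierarchy is given as
   a hypothetical mapping from each flow to the flows it directly
   depends on, and treats dependencies as transitive.  It reports, per
   flow, the instants for which additional packets (such as SVC Empty
   NAL unit packets) would have to be generated.

```python
from typing import Dict, List, Set


def transitive_deps(flow: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """All flows that `flow` depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(depends_on.get(flow, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends_on.get(dep, []))
    return seen


def missing_instants(instants: Dict[str, Set[int]],
                     depends_on: Dict[str, List[str]]) -> Dict[str, Set[int]]:
    """Sampling instants each flow must add so that it covers every instant
    of the flows it depends on.  Extra instants that appear only in a
    dependent flow are permitted and therefore not reported."""
    result: Dict[str, Set[int]] = {}
    for flow, own in instants.items():
        required: Set[int] = set()
        for dep in transitive_deps(flow, depends_on):
            required |= instants[dep]
        result[flow] = required - own
    return result
```

   With hypothetical instants A = {0, 4, 8}, B = {0, 2, 4, 6, 8}
   (depending on A), and C = {0, 2, 6, 8} (depending on A and B), only
   C is reported, as lacking instant 4.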
   Flows may also include packets corresponding to additional sampling
   instants that are not present in the flows on which they depend.

   The receiver should decode the packets in all the component RTP flows
   as follows:

   o  For each RTP packet in each flow, use the mapping contained in the
      RTP header extensions and RTCP SR packets to derive the NTP-format
      timestamp corresponding to its RTP timestamp.

   o  Group together RTP data packets from all component flows that have
      identical calculated NTP-format timestamps.

   o  Processing groups in order of ascending NTP-format timestamp,
      decode the RTP packets in each group according to the signalled
      RTP flow decoding hierarchy.  That is, pass the RTP packet data
      from the flow on which all other flows depend to the decoder
      first, then that from the next dependent flow, and so on.  The
      decoding order of the RTP flow hierarchy may be indicated by
      mechanisms defined in [8] or by some other means.

   Note that the decoding order will not necessarily match the packet
   transmission order.  The receiver will need to buffer packets for a
   codec-dependent amount of time, until all the packets needed for
   decoding have arrived.

4.3.  Example

   The example in the figure below refers to three RTP flows, A, B, and
   C, containing a layered, multi-view, or multi-description media
   stream.  In the example, the dependency signalling defined in [8]
   indicates that flow A is the lowest RTP flow; B is the first higher
   RTP flow and depends on A; and C is the second higher RTP flow and
   depends on both A and B.  A media coding structure is used that
   results in samples present in higher flows but not in all lower
   flows: flow A has the lowest frame rate, and flows B and C have the
   same, higher, frame rate.  The figure shows the full video samples
   with their corresponding RTP timestamps "(x)".
   The video samples have already been re-ordered according to their
   RTP sequence numbers.  For each RTP flow, the figure shows the
   received samples in decoding order, together with the associated NTP
   media timestamps ("TS[..]").  These timestamps may be derived using
   the NTP-format timestamps provided in the RTCP sender reports or, as
   shown in the figure, directly from the NTP-format timestamps
   contained in the RTP header extensions, indicated by the timestamp in
   "<>".  Note that the timestamps are not in increasing order since, in
   this example, the decoding order is different from the
   output/presentation order.

   The process first proceeds to the sample parts associated with the
   first available synchronous insertion of an NTP-format timestamp into
   the RTP header extensions, at NTP media timestamp TS[8].  Starting in
   the highest RTP flow, C, it removes or ignores all sample parts that
   precede (in decoding order) the sample parts with TS[8] in each of
   the de-jittering buffers of RTP flows A, B, and C.  Then, starting
   from flow C, the first media timestamp available in decoding order
   (TS[8]) is selected, and the sample parts from RTP flows A, B, and C
   are placed, in the order of the RTP flow dependency indicated by the
   mechanisms defined in [8] (for TS[8]: first flow A, then flow B, and
   then flow C), into the video sample VS(TS[8]) associated with NTP
   media timestamp TS[8].  Then the next media timestamp in order of
   appearance in the highest RTP flow C, TS[6] (RTP timestamp (5) in
   flow C), is processed, and the process described above is repeated.
   Note that some video samples may have no sample parts present in a
   given flow, e.g., in the lowest RTP flow A (see TS[5]).
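   The walk-through above can be reproduced with the short Python sketch
   below, using the integer timestamps of the flows A, B, and C shown in
   the figure in this section.  It is a toy model under stated
   assumptions: timestamps are abstract integers (no clock-rate scaling
   and no 32-bit RTP timestamp wrap-around), each flow's RTP-to-NTP
   offset is taken from the sample carrying the in-band "<8>"
   timestamp, and, as in the walk-through, groups are taken in the
   decoding order of the highest flow C.  All names are hypothetical.

```python
from typing import Dict, List, Tuple

# Example data from the figure: RTP timestamps per flow in decoding order,
# the (RTP timestamp, NTP media timestamp) pair carried in the header
# extension "<8>", and the dependency hierarchy from lowest to highest.
FLOWS: Dict[str, List[int]] = {
    "C": [0, 2, 7, 5, 4, 6, 11, 9],
    "B": [3, 5, 10, 8, 7, 9, 14, 12],
    "A": [3, 1, 7, 5],
}
ANCHORS: Dict[str, Tuple[int, int]] = {"C": (7, 8), "B": (10, 8), "A": (3, 8)}
HIERARCHY: List[str] = ["A", "B", "C"]

Group = Tuple[int, List[Tuple[str, int]]]


def recover_decoding_order(flows: Dict[str, List[int]],
                           anchors: Dict[str, Tuple[int, int]],
                           hierarchy: List[str]) -> List[Group]:
    """Return (TS, [(flow, RTP timestamp), ...]) groups in decoding order."""
    # Per-flow offset from RTP timestamp to NTP media timestamp.
    offset = {f: ntp - rtp for f, (rtp, ntp) in anchors.items()}
    by_ts: Dict[str, Dict[int, int]] = {}
    for flow, rtps in flows.items():
        # Ignore sample parts preceding the first synchronised sample.
        kept = rtps[rtps.index(anchors[flow][0]):]
        # Dicts keep insertion order, i.e. the flow's decoding order.
        by_ts[flow] = {rtp + offset[flow]: rtp for rtp in kept}
    groups: List[Group] = []
    for ts in by_ts[hierarchy[-1]]:    # walk the highest flow
        # Assemble the video sample VS(TS) lowest flow first.
        parts = [(f, by_ts[f][ts]) for f in hierarchy if ts in by_ts[f]]
        groups.append((ts, parts))
    return groups


for ts, parts in recover_decoding_order(FLOWS, ANCHORS, HIERARCHY):
    print(f"VS(TS[{ts}]):", parts)
```

   The resulting order is TS[8], TS[6], TS[5], TS[7], TS[12], TS[10],
   and the group for TS[5] contains parts from flows B and C only.  The
   same offsets also reproduce the "{}" values from the RTCP sender
   reports (e.g., RTP timestamp (9) in flow B maps to TS[7]).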
   The decoding order recovery process could also be started once the
   RTP timestamp to NTP-format timestamp mappings from the RTCP sender
   reports have been received for all flows (indicated as "(x){y}"),
   assuming that there is no clock skew in the source used to generate
   the NTP-format timestamps.

     C:-(0)----(2)----(7)<8>--(5)----(4)----(6)-----(11)----(9){10}-
         |      |      |       |      |      |       |       |
     B:-(3)----(5)---(10)<8>--(8)----(7)----(9){7}--(14)----(12)----
                       |       |                     |       |
     A:---------------(3)<8>--(1)-------------------(7){12}-(5)-----

     ---------------------------------------decoding/transmission order->
     TS:[1]    [3]    [8]=<8> [6]    [5]    [7]     [12]    [10]

   Key:
   A, B, C                - RTP flows
   Integer values in "()" - video sample with its RTP timestamp as
                            indicated in its RTP packet.
   "|"                    - indicates corresponding samples / parts of
                            samples of the same video sample VS(TS[..])
                            in the RTP flows.
   Integer values in "[]" - NTP media timestamp TS, the sampling time
                            as derived from the NTP-format timestamp
                            associated with the video sample VS(TS[..]),
                            consisting of sample parts in the flows
                            above.
   Integer values in "<>" - NTP media timestamp TS as directly taken
                            from the NTP RTP header extensions.
   Integer values in "{}" - NTP media timestamp TS as provided in the
                            RTCP sender reports.

5.  Security Considerations

   The security considerations of the RTP specification [1], the
   Extended RTP Profile for RTCP-Based Feedback [2], and the General
   Mechanism for RTP Header Extensions [6] apply.

   The RTP header extensions defined in Section 3.3 include an
   NTP-format timestamp.  When an RTP session using these header
   extensions is protected by the Secure RTP framework [18], the header
   extension is not part of the encrypted portion of the RTP data
   packets.  However, when SRTP is used without these header
   extensions, the NTP-format timestamps are conveyed only in RTCP SR
   packets, where they are encrypted.
   This is a minor information leak, but one that is not believed to be
   significant.

6.  IANA Considerations

   NOTE TO RFC EDITOR: Please replace "RFC XXXX" in the following with
   the RFC number assigned to this memo, and delete this note.

   The IANA is requested to register one new value in the table of FMT
   Values for RTPFB Payload Types [2] as follows:

      Name:       RTCP-SR-REQ
      Long name:  RTCP Rapid Resynchronisation Request
      Value:      5
      Reference:  RFC XXXX

   The IANA is also requested to register two new RTP Compact Header
   Extensions [6], according to the following:

      Extension URI: urn:ietf:params:rtp-hdrext:ntp-64
      Description:   Synchronisation metadata: 64-bit timestamp format
      Contact:       Thomas Schierl
                     IETF Audio/Video Transport Working Group
      Reference:     RFC XXXX

      Extension URI: urn:ietf:params:rtp-hdrext:ntp-56
      Description:   Synchronisation metadata: 56-bit timestamp format
      Contact:       Thomas Schierl
                     IETF Audio/Video Transport Working Group
      Reference:     RFC XXXX

7.  Acknowledgements

   This memo has benefited from discussions with numerous members of the
   IETF AVT working group, including Jonathan Lennox, Magnus Westerlund,
   Randell Jesup, Gerard Babonneau, Ingemar Johansson, Ali C. Begen,
   Ye-Kui Wang, Roni Even, Michael Dolan, Art Allison, and Stefan Doehla.
   The RTP header extension format of Variant A in Section 3.3 was
   suggested by Dave Singer, matching a similar mechanism specified by
   ISMA.

8.  References

8.1.  Normative References

   [1]   Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
         "RTP: A Transport Protocol for Real-Time Applications", STD 64,
         RFC 3550, July 2003.

   [2]   Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
         "Extended RTP Profile for Real-time Transport Control Protocol
         (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July 2006.
   [3]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

   [4]   Schooler, E., Ott, J., and J. Chesterfield, "RTCP Extensions
         for Single-Source Multicast Sessions with Unicast Feedback",
         draft-ietf-avt-rtcpssm-18 (work in progress), March 2009.

   [5]   Johansson, I. and M. Westerlund, "Support for Reduced-Size
         Real-Time Transport Control Protocol (RTCP): Opportunities and
         Consequences", RFC 5506, April 2009.

   [6]   Singer, D. and H. Desineni, "A General Mechanism for RTP Header
         Extensions", RFC 5285, July 2008.

   [7]   Mills, D., "Network Time Protocol (Version 3) Specification,
         Implementation and Analysis", RFC 1305, March 1992.

   [8]   Schierl, T. and S. Wenger, "Signaling media decoding dependency
         in Session Description Protocol (SDP)",
         draft-ietf-mmusic-decoding-dependency-08 (work in progress),
         April 2009.

8.2.  Informative References

   [9]   Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis, "RTP
         Payload Format for SVC Video", draft-ietf-avt-rtp-svc-18 (work
         in progress), March 2009.

   [10]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media
         Attributes in the Session Description Protocol (SDP)",
         draft-ietf-mmusic-sdp-source-attributes-02 (work in progress),
         October 2008.

   [11]  Casner, S., "Session Description Protocol (SDP) Bandwidth
         Modifiers for RTP Control Protocol (RTCP) Bandwidth", RFC 3556,
         July 2003.

   [12]  Rosenberg, J., "Interactive Connectivity Establishment (ICE): A
         Protocol for Network Address Translator (NAT) Traversal for
         Offer/Answer Protocols", draft-ietf-mmusic-ice-19 (work in
         progress), October 2007.

   [13]  McGrew, D. and E. Rescorla, "Datagram Transport Layer Security
         (DTLS) Extension to Establish Keys for Secure Real-time
         Transport Protocol (SRTP)", draft-ietf-avt-dtls-srtp-05 (work
         in progress), September 2008.

   [14]  Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media Path
         Key Agreement for Secure RTP", draft-zimmermann-avt-zrtp-13
         (work in progress), January 2009.

   [15]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
         January 2008.

   [16]  Steeg, B., Begen, A., Caenegem, T., and Z. Vax, "Unicast-Based
         Rapid Acquisition of Multicast RTP Sessions",
         draft-ietf-avt-rapid-acquisition-for-rtp-05 (work in progress),
         November 2009.

   [17]  Bont, F., Doehla, S., Schmidt, M., and R. Sperschneider, "RTP
         Payload Format for Elementary Streams with MPEG Surround
         multi-channel audio", draft-ietf-avt-rtp-mps-02 (work in
         progress), January 2009.

   [18]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
         Norrman, "The Secure Real-time Transport Protocol (SRTP)",
         RFC 3711, March 2004.

Authors' Addresses

   Colin Perkins
   University of Glasgow
   Department of Computing Science
   Glasgow  G12 8QQ
   UK

   Email: csp@csperkins.org


   Thomas Schierl
   Fraunhofer HHI
   Einsteinufer 37
   D-10587 Berlin
   Germany

   Phone: +49-30-31002-227
   Email: ts@thomas-schierl.de