idnits 2.17.1

draft-ietf-avt-rapid-rtp-sync-08.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices. See
     https://trustee.ietf.org/license-info/.)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == The 'Updates: ' line in the draft header should list only the _numbers_
     of the RFCs which will be updated by this document (if approved); it
     should not include the word 'RFC' in the list.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 884 has weird spacing: '...channel audio...'

  -- The document date (December 22, 2009) is 5229 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-19) exists of
     draft-ietf-avt-rtcpssm-18

  ** Obsolete normative reference: RFC 5285 (ref. '6') (Obsoleted by RFC 8285)

  ** Obsolete normative reference: RFC 1305 (ref. '7') (Obsoleted by RFC 5905)

  == Outdated reference: A later version (-27) exists of
     draft-ietf-avt-rtp-svc-18

  == Outdated reference: A later version (-07) exists of
     draft-ietf-avt-dtls-srtp-05

  == Outdated reference: A later version (-22) exists of
     draft-zimmermann-avt-zrtp-13

  -- Obsolete informational reference (is this intentional?): RFC 5117 (ref.
     '15') (Obsoleted by RFC 7667)

  == Outdated reference: A later version (-17) exists of
     draft-ietf-avt-rapid-acquisition-for-rtp-05

  == Outdated reference: A later version (-03) exists of
     draft-ietf-avt-rtp-mps-02

     Summary: 3 errors (**), 0 flaws (~~), 9 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2   Network Working Group                                         C. Perkins
3   Internet-Draft                                     University of Glasgow
4   Updates: RFC3550                                              T. Schierl
5   (if approved)                                             Fraunhofer HHI
6   Intended status: Standards Track                       December 22, 2009
7   Expires: June 25, 2010

9                      Rapid Synchronisation of RTP Flows
10                    draft-ietf-avt-rapid-rtp-sync-08.txt

12  Status of this Memo

14     This Internet-Draft is submitted to IETF in full conformance with the
15     provisions of BCP 78 and BCP 79.

17     Internet-Drafts are working documents of the Internet Engineering
18     Task Force (IETF), its areas, and its working groups. Note that
19     other groups may also distribute working documents as Internet-
20     Drafts.

22     Internet-Drafts are draft documents valid for a maximum of six months
23     and may be updated, replaced, or obsoleted by other documents at any
24     time. It is inappropriate to use Internet-Drafts as reference
25     material or to cite them other than as "work in progress."

27     The list of current Internet-Drafts can be accessed at
28     http://www.ietf.org/ietf/1id-abstracts.txt.

30     The list of Internet-Draft Shadow Directories can be accessed at
31     http://www.ietf.org/shadow.html.

33     This Internet-Draft will expire on June 25, 2010.

35  Copyright Notice

37     Copyright (c) 2009 IETF Trust and the persons identified as the
38     document authors. All rights reserved.

40     This document is subject to BCP 78 and the IETF Trust's Legal
41     Provisions Relating to IETF Documents in effect on the date of
42     publication of this document (http://trustee.ietf.org/license-info).
43     Please review these documents carefully, as they describe your rights
44     and restrictions with respect to this document.

46  Abstract

48     This memo outlines how RTP sessions are synchronised, and discusses
49     how rapidly such synchronisation can occur. We show that most RTP
50     sessions can be synchronised immediately, but that the use of video
51     switching multipoint conference units (MCUs) or large source specific
52     multicast (SSM) groups can greatly increase the synchronisation
53     delay. This increase in delay can be unacceptable to some
54     applications that use layered and/or multi-description codecs.

56     This memo introduces three mechanisms to reduce the synchronisation
57     delay for such sessions. First, it updates the RTP Control Protocol
58     (RTCP) timing rules to reduce the initial synchronisation delay for
59     SSM sessions. Second, a new feedback packet is defined for use with
60     the Extended RTP Profile for RTCP-based Feedback (RTP/AVPF), allowing
61     video switching MCUs to rapidly request resynchronisation. Finally,
62     new RTP header extensions are defined to allow rapid synchronisation
63     of late joiners, and guarantee correct timestamp based decoding order
64     recovery for layered codecs in the presence of clock skew.

66  Table of Contents

68     1.  Introduction
69     2.  Synchronisation of RTP Flows
70       2.1.  Initial Synchronisation Delay
71         2.1.1.  Unicast Sessions
72         2.1.2.  Source Specific Multicast (SSM) Sessions
73         2.1.3.  Any Source Multicast (ASM) Sessions
74         2.1.4.  Discussion
75       2.2.  Synchronisation for Late Joiners
76     3.  Reducing RTP Synchronisation Delays
77       3.1.  Reduced Initial RTCP Interval for SSM Senders
78       3.2.  Rapid Resynchronisation Request
79       3.3.  In-band Delivery of Synchronisation Metadata
80     4.  Application to Decoding Order Recovery in Layered Codecs
81       4.1.  In-band Synchronisation for Decoding Order Recovery
82       4.2.  Timestamp based decoding order recovery
83       4.3.  Example
84     5.  Security Considerations
85     6.  IANA Considerations
86     7.  Acknowledgements
87     8.  References
88       8.1.  Normative References
89       8.2.  Informative References
90     Authors' Addresses

92  1. Introduction

94     When using RTP to deliver multimedia content it's often necessary to
95     synchronise playout of audio and video components of a presentation.
96     This is achieved using information contained in RTP Control Protocol
97     (RTCP) Sender Report (SR) packets [1]. These are sent periodically,
98     and the components of a multimedia session cannot be synchronised
99     until sufficient RTCP SR packets have been received for each RTP flow
100    to allow the receiver to establish mappings between the media clock
101    used for each RTP flow, and the common (NTP-format) reference clock
102    used to establish synchronisation.

104    Recently, concern has been expressed that this synchronisation delay
105    is problematic for some applications, for example those using layered
106    or multi-description video coding. This memo reviews the operations
107    of RTP synchronisation, and describes the synchronisation delay that
108    can be expected. Three backwards compatible extensions to the basic
109    RTP synchronisation mechanism are proposed:

111    o  The RTCP transmission timing rules are relaxed for SSM senders, to
112       reduce the initial synchronisation latency for large SSM groups.
113       See Section 3.1.

115    o  An enhancement to the Extended RTP Profile for RTCP-based Feedback
116       (RTP/AVPF) [2] is defined to allow receivers to request additional
117       RTCP SR packets, providing the metadata needed to synchronise RTP
118       flows. This can reduce the synchronisation delay when joining
119       sessions with large RTCP reporting intervals, in the presence of
120       packet loss, or when video switching MCUs are employed. See
121       Section 3.2.

123    o  Two RTP header extensions are defined, to deliver synchronisation
124       metadata in-band with RTP data packets. These extensions provide
125       synchronisation metadata that is aligned with RTP data packets,
126       and so eliminate the need to estimate clock-skew between flows
127       before synchronisation. They can also reduce the need to receive
128       RTCP SR packets before flows can be synchronised, although they do
129       not eliminate the need for RTCP. See Section 3.3.

131    The immediate use-case for these extensions is to reduce the delay
132    due to synchronisation when joining a layered video session (e.g. an
133    H.264/SVC session in NI-T mode [9]). The extensions are not specific
134    to layered coding, however, and can be used in any environment where
135    synchronisation latency is an issue.

137    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
138    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
139    document are to be interpreted as described in RFC 2119 [3].

141 2. Synchronisation of RTP Flows

143    RTP flows are synchronised by receivers based on information that is
144    contained in RTCP SR packets generated by senders (specifically, the
145    NTP-format timestamp and the RTP timestamp). Synchronisation
146    requires that a common reference clock MUST be used to generate the
147    NTP-format timestamps in a set of flows that are to be synchronised
148    (i.e. when synchronising several RTP flows, the RTP timestamps for
149    each flow are derived from separate, and media specific, clocks, but
150    the NTP format timestamps in the RTCP SR packets of all flows to be
151    synchronised MUST be sampled from the same clock). To achieve faster
152    and more accurate synchronisation, it is further RECOMMENDED that
153    senders and receivers use a synchronised common NTP format reference
154    clock with common properties, especially timebase, where possible
155    (recognising that this is often not possible when RTP is used outside
156    of controlled environments); the means by which that common reference
157    clock and its properties are signalled and distributed is outside the
158    scope of this memo.

160    For multimedia sessions, each type of media (e.g. audio or video) is
161    sent in a separate RTP session, and the receiver associates RTP flows
162    to be synchronised by means of the canonical end-point identifier
163    (CNAME) item included in the RTCP Source Description (SDES) packets
164    generated by the sender or signalled out of band [10]. For layered
165    media, different layers can be sent in different RTP sessions, or
166    using different SSRC values within a single RTP session; in both
167    cases, the CNAME is used to identify flows to be synchronised. To
168    ensure synchronisation, an RTP sender MUST therefore send periodic
169    compound RTCP packets following Section 6 of RFC 3550 [1].

171    The timing of these periodic compound RTCP packets will depend on the
172    number of members in each RTP session, the fraction of those that are
173    sending data, the session bandwidth, the configured RTCP bandwidth
174    fraction, and whether the session is multicast or unicast (see RFC
175    3550 Section 6.2 for details). In summary, RTCP control traffic is
176    allocated a small fraction, generally 5%, of the session bandwidth,
177    and of that fraction, one quarter is allocated to active RTP senders,
178    while receivers use the remaining three quarters (these fractions can
179    be configured via SDP [11]). Each member of an RTP session derives
180    an RTCP reporting interval based on these fractions, whether the
181    session is multicast or unicast, the number of members it has
182    observed, and whether it is actively sending data or not. It then
183    sends a compound RTCP packet on average once per reporting interval
184    (the actual packet transmission time is randomised in the range [0.5
185    ... 1.5] times the reporting interval to avoid synchronisation of
186    reports).

188    A minimum reporting interval of 5 seconds is RECOMMENDED, except that
189    the delay before sending the initial report "MAY be set to half the
190    minimum interval to allow quicker notification that the new
191    participant is present" [1]. Also, for unicast sessions, "the delay
192    before sending the initial compound RTCP packet MAY be zero" [1]. In
193    addition, for unicast sessions, and for active senders in a multicast
194    session, the fixed minimum reporting interval MAY be scaled to "360
195    divided by the session bandwidth in kilobits/second. This minimum is
196    smaller than 5 seconds for bandwidths greater than 72 kb/s." [1]

198 2.1. Initial Synchronisation Delay

200    A multimedia session comprises a set of concurrent RTP sessions among
201    a common group of participants, using one RTP session for each media
202    type. For example, a videoconference (which is a multimedia session)
203    might contain an audio RTP session and a video RTP session. To allow
204    a receiver to synchronise the components of a multimedia session, a
205    compound RTCP packet containing an RTCP SR packet and an RTCP SDES
206    packet with a CNAME item MUST be sent to each of the RTP sessions in
207    the multimedia session by each sender. A receiver cannot synchronise
208    playout across the multimedia session until such RTCP packets have
209    been received on all of the component RTP sessions. If there is no
210    packet loss, this gives an expected initial synchronisation delay
211    equal to the average time taken to receive the first RTCP packet in
212    the RTP session with the longest RTCP reporting interval. This will
213    vary between unicast and multicast RTP sessions.

215    The initial synchronisation delay for layered sessions is similar to
216    that for multimedia sessions. The layers cannot be synchronised
217    until the RTCP SR and CNAME information has been received for each
218    layer in the session.

220 2.1.1. Unicast Sessions

222    For unicast multimedia or layered sessions, senders SHOULD transmit
223    an initial compound RTCP packet (containing an RTCP SR packet and an
224    RTCP SDES packet with a CNAME item) immediately on joining each RTP
225    session in the multimedia session. The individual RTP sessions are
226    considered to be joined once any in-band signalling for NAT traversal
227    (e.g. [12]) and/or security keying (e.g. [13],[14]) has concluded,
228    and the media path is open. This implies that the initial RTCP
229    packet is sent in parallel with the first data packet following the
230    guidance in RFC 3550 that "the delay before sending the initial
231    compound RTCP packet MAY be zero" and, in the absence of any packet
232    loss, flows can be synchronised immediately.

234    Note that NAT pinholes, firewall holes, quality-of-service, and media
235    security keys should have been negotiated as part of the signalling,
236    whether in-band or out-of-band, before the first RTCP packet is sent.

238    This should ensure that any middleboxes are ready to accept traffic,
239    and reduce the likelihood that the initial RTCP packet will be lost.

241 2.1.2. Source Specific Multicast (SSM) Sessions

243    For multicast sessions, the delay before sending the initial RTCP
244    packet, and hence the synchronisation delay, varies with the session
245    bandwidth and the number of members in the session. For a multicast
246    multimedia or layered session, the average synchronisation delay will
247    depend on the slowest of the component RTP sessions; this will
248    generally be the session with the lowest bandwidth (assuming all the
249    RTP sessions have the same number of members).

251    When sending to a multicast group, the reduced minimum RTCP reporting
252    interval of 360 seconds divided by the session bandwidth in kilobits
253    per second [1] should be used when synchronisation latency is likely
254    to be an issue. Also, as usual, the reporting interval is halved for
255    the first RTCP packet. Depending on the session bandwidth and the
256    number of members, this gives the average synchronisation delays
257    shown in Figure 1.

259         Session|          Number of receivers:
260       Bandwidth|      2      3      4      5     10    100   1000  10000
261              --+--------------------------------------------------------
262          8 kbps|   2.73   4.10   5.47   5.47   5.47   5.47   5.47   5.47
263         16 kbps|   2.50   2.50   2.73   2.73   2.73   2.73   2.73   2.73
264         32 kbps|   2.50   2.50   2.50   2.50   2.50   2.50   2.50   2.50
265         64 kbps|   2.50   2.50   2.50   2.50   2.50   2.50   2.50   2.50
266        128 kbps|   1.41   1.41   1.41   1.41   1.41   1.41   1.41   1.41
267        256 kbps|   0.70   0.70   0.70   0.70   0.70   0.70   0.70   0.70
268        512 kbps|   0.35   0.35   0.35   0.35   0.35   0.35   0.35   0.35
269          1 Mbps|   0.18   0.18   0.18   0.18   0.18   0.18   0.18   0.18
270          2 Mbps|   0.09   0.09   0.09   0.09   0.09   0.09   0.09   0.09
271          4 Mbps|   0.04   0.04   0.04   0.04   0.04   0.04   0.04   0.04

273    Figure 1: Average initial synchronisation delay in seconds for an RTP
274                          Session with 1 sender.

276    These numbers assume a source specific multicast channel with a
277    single active sender, assuming an average RTCP packet size of 70
278    octets. These intervals are sufficient for lip-synchronisation
279    without excessive delay, but might be viewed as having too much
280    latency for synchronising parts of a layered video stream.

282    The RTCP interval is randomised in the usual manner, so the minimum
283    synchronisation delay will be half these intervals, and the maximum
284    delay will be 1.5 times these intervals. Note also that these RTCP
285    intervals are calculated assuming perfect knowledge of the number of
286    members in the session.

288 2.1.3. Any Source Multicast (ASM) Sessions

290    For ASM sessions, the fraction of members that are senders plays an
291    important role, and causes more variation in average RTCP reporting
292    interval. This is illustrated in Figure 2 and Figure 3, which show
293    the RTCP reporting interval for the same session bandwidths and
294    receiver populations as the SSM session described in Figure 1, but
295    for sessions with 2 and 10 senders respectively. It can be seen that
296    the initial synchronisation delay scales with the number of senders
297    (this is to ensure that the total RTCP traffic from all group members
298    does not grow without bound) and can be significantly larger than for
299    source specific groups. Despite this, the initial synchronisation
300    time remains acceptable for lip-synchronisation in typical small-to-
301    medium sized group video conferencing scenarios.

303    Note that multi-sender groups implemented using multi-unicast with a
304    central RTP translator (Topo-Translator in the terminology of [15])
305    or mixer (Topo-Mixer), or some forms of video switching MCU (Topo-
306    Video-switch-MCU) distribute RTCP packets to all members of the
307    group, and so scale in the same way as an ASM group with regards to
308    initial synchronisation latency.

310         Session|          Number of receivers:
311       Bandwidth|      2      3      4      5     10    100   1000  10000
312              --+--------------------------------------------------------
313          8 kbps|   2.73   4.10   5.47   6.84  10.94  10.94  10.94  10.94
314         16 kbps|   2.50   2.50   2.73   3.42   5.47   5.47   5.47   5.47
315         32 kbps|   2.50   2.50   2.50   2.50   2.73   2.73   2.73   2.73
316         64 kbps|   2.50   2.50   2.50   2.50   2.50   2.50   2.50   2.50
317        128 kbps|   1.41   1.41   1.41   1.41   1.41   1.41   1.41   1.41
318        256 kbps|   0.70   0.70   0.70   0.70   0.70   0.70   0.70   0.70
319        512 kbps|   0.35   0.35   0.35   0.35   0.35   0.35   0.35   0.35
320          1 Mbps|   0.18   0.18   0.18   0.18   0.18   0.18   0.18   0.18
321          2 Mbps|   0.09   0.09   0.09   0.09   0.09   0.09   0.09   0.09
322          4 Mbps|   0.04   0.04   0.04   0.04   0.04   0.04   0.04   0.04

324    Figure 2: Average initial synchronisation delay in seconds for an RTP
325                          Session with 2 senders.
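
The tabulated delays in this section can be approximated with a short
calculation based on the sender interval rules of RFC 3550. The Python
sketch below is illustrative only. It rests on assumptions that are not
stated in the memo and were chosen because they reproduce the tabulated
values: 1 kbps is taken as 1024 bit/s, the column heading is read as the
total number of session members, the RTCP share is 5% of the session
bandwidth, and the randomisation of the interval is ignored. The function
and parameter names are hypothetical.

```python
def initial_sync_delay(bandwidth_kbps, members, senders,
                       avg_rtcp_size=70, rtcp_fraction=0.05):
    """Approximate delay, in seconds, before a sender's first RTCP packet."""
    # Assumption: 1 kbps = 1024 bit/s; RTCP bandwidth expressed in octets/s.
    rtcp_bw = rtcp_fraction * bandwidth_kbps * 1024 / 8

    # When senders are at most a quarter of the members, they share one
    # quarter of the RTCP bandwidth among themselves; otherwise all
    # members share the whole RTCP bandwidth equally.
    if senders <= members / 4:
        n, bw = senders, rtcp_bw * 0.25
    else:
        n, bw = members, rtcp_bw

    # Minimum interval: 5 s, or 360 / bandwidth (kbps) if that is smaller,
    # halved because this is the first RTCP packet.
    t_min = min(5.0, 360.0 / bandwidth_kbps) / 2

    return max(t_min, avg_rtcp_size * n / bw)


if __name__ == "__main__":
    # One sender at 8 kbps, for each tabulated session size.
    for size in (2, 3, 4, 5, 10, 100, 1000, 10000):
        print(size, round(initial_sync_delay(8, size, 1), 2))
```

With these assumptions the function agrees with the tabulated entries that
were spot-checked, for example 2.73 s for 8 kbps with one sender and a
session size of 2, and 1.41 s for 128 kbps. Other entries have not been
verified individually.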

327         Session|          Number of receivers:
328       Bandwidth|      2      3      4      5     10    100   1000  10000
329              --+--------------------------------------------------------
330          8 kbps|   2.73   4.10   5.47   6.84  13.67  54.69  54.69  54.69
331         16 kbps|   2.50   2.50   2.73   3.42   6.84  27.34  27.34  27.34
332         32 kbps|   2.50   2.50   2.50   2.50   3.42  13.67  13.67  13.67
333         64 kbps|   2.50   2.50   2.50   2.50   2.50   6.84   6.84   6.84
334        128 kbps|   1.41   1.41   1.41   1.41   1.41   3.42   3.42   3.42
335        256 kbps|   0.70   0.70   0.70   0.70   0.70   1.71   1.71   1.71
336        512 kbps|   0.35   0.35   0.35   0.35   0.35   0.85   0.85   0.85
337          1 Mbps|   0.18   0.18   0.18   0.18   0.18   0.43   0.43   0.43
338          2 Mbps|   0.09   0.09   0.09   0.09   0.09   0.21   0.21   0.21
339          4 Mbps|   0.04   0.04   0.04   0.04   0.04   0.11   0.11   0.11

341    Figure 3: Average initial synchronisation delay in seconds for an RTP
342                          Session with 10 senders.

344 2.1.4. Discussion

346    For unicast sessions, the existing RTCP SR-based mechanism allows for
347    immediate synchronisation, provided the initial RTCP packet is not
348    lost.

350    For SSM sessions, the initial synchronisation delay is sufficient for
351    lip-synchronisation, but may be larger than desired for some layered
352    codecs. The rationale for not sending immediate RTCP packets for
353    multicast groups is to avoid implosion of requests when large numbers
354    of members simultaneously join the group ("flash crowd"). This is
355    not an issue for SSM senders, since there can be at most one sender,
356    so it is desirable to allow SSM senders to send an immediate RTCP SR
357    on joining a session (as is currently allowed for unicast sessions,
358    which also don't suffer from the implosion problem). SSM receivers
359    using unicast feedback would not be allowed to send immediate RTCP.
360    For ASM sessions, implosion of responses is a concern, so no change
361    is proposed to the RTCP timing rules.

363    In all cases, it is possible that the initial RTCP SR packet is lost.
364    In this case, the receiver will not be able to synchronise the media
365    until the reporting interval has passed, and the next RTCP SR packet
366    is sent. This is undesirable. Section 3.2 defines a new RTP/AVPF
367    transport layer feedback message to request an RTCP SR be generated,
368    allowing rapid resynchronisation in the case of packet loss.

370 2.2. Synchronisation for Late Joiners

372    Synchronisation between RTP sessions is potentially slower for late
373    joiners than for participants present at the start of the session.
374    The reasons for this are three-fold:

376    1.  Many of the optimisations that allow rapid transmission of RTCP
377        SR packets apply only at the start of a session. This implies
378        that a new participant may have to wait a complete RTCP reporting
379        interval for each session before receiving the necessary data to
380        synchronise media streams. This might potentially take several
381        seconds, depending on the configured session bandwidth and the
382        number of participants.

384    2.  Additional synchronisation delay comes from the nature of the
385        RTCP timing rules. Packets are generated on average once per
386        reporting interval, but with the exact transmission times being
387        randomised +/- 50% to avoid synchronisation of reports. This is
388        important to avoid network congestion in multicast sessions, but
389        does mean that the timing of RTCP SR reports for different RTP
390        sessions isn't synchronised. Accordingly, a receiver must
391        estimate the skew on the NTP-format clock in order to align RTP
392        timestamps across sessions. This estimation is an essential part
393        of an RTP synchronisation implementation, and can be done with
394        high accuracy given sufficient reports. Collecting sufficient
395        RTCP SR data to perform this estimation, however, may require
396        reception of several RTCP reports, further increasing the
397        synchronisation delay.

399    3.  Many media codecs have the notion of periodic access points, such
400        that a newly joined receiver often cannot start decoding a media
401        stream until the packets corresponding to the access point have
402        been received. These access points may be sent less often than
403        RTCP SR packets, and so may be the limiting factor in starting
404        synchronised media playout for late joiners. The RTP extension
405        for unicast-based rapid acquisition of multicast RTP sessions
406        [16] may be used to reduce the time taken to receive the access
407        points in some scenarios.

409    These delays are likely an issue for tuning in to an ongoing
410    multicast RTP session, or for video switching MCUs.

412 3. Reducing RTP Synchronisation Delays

414    Three backwards compatible RTP extensions are defined to reduce the
415    possible synchronisation delay: a reduced initial RTCP interval for
416    SSM senders, a rapid resynchronisation request message, and RTP
417    header extensions that can convey synchronisation metadata in-band.

419 3.1. Reduced Initial RTCP Interval for SSM Senders

421    In SSM sessions where the initial synchronisation delay is important,
422    the RTP sender MAY set the delay before sending the initial compound
423    RTCP packet to zero, and send its first RTCP packet immediately upon
424    joining the SSM session. RTP receivers in an SSM session, sending
425    unicast RTCP feedback, MUST NOT send RTCP packets with zero initial
426    delay; the timing rules defined in [4] apply unchanged to receivers.

428 3.2. Rapid Resynchronisation Request

430    The general format of an RTP/AVPF transport layer feedback message is
431    shown in Figure 4 (see [2] for details).

433     0                   1                   2                   3
434     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
435    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
436    |V=2|P|   FMT   |  PT=RTPFB=205 |          length               |
437    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
438    |                     SSRC of packet sender                     |
439    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
440    |                     SSRC of media source                      |
441    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
442    :            Feedback Control Information (FCI)                 :
443    :                                                               :

445          Figure 4: RTP/AVPF Transport Layer Feedback Message

447    One new feedback message type, RTCP-SR-REQ, is defined with FMT = 5.
448    The Feedback Control Information (FCI) part of the feedback message
449    MUST be empty. The SSRC of packet sender indicates the member that
450    is unable to synchronise media streams, while the SSRC of media
451    source indicates the sender of the media it is unable to synchronise.
452    The length MUST equal 2.

454    If the RTP/AVPF profile [2] is in use, this feedback message MAY be
455    sent by a receiver to indicate that it's unable to synchronise some
456    media streams, and desires that the media source transmit an RTCP SR
457    packet as soon as possible (within the constraints of the RTCP timing
458    rules for early feedback). When it receives such an indication, a
459    media source that understands the RTCP-SR-REQ packet SHOULD generate
460    an RTCP SR packet as soon as possible while complying with the RTCP
461    early feedback rules. If the use of non-compound RTCP [5] was
462    previously negotiated, both the feedback request and the RTCP SR
463    response may be sent as non-compound RTCP packets. The RTCP-SR-REQ
464    packet MAY be repeated once per RTCP reporting interval if no RTCP SR
465    packet is forthcoming. The media source may ignore RTCP-SR-REQ
466    packets if its regular schedule for transmission of synchronisation
467    metadata can be expected to allow the receiver to synchronise the
468    media streams within a reasonable time frame.
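
As an illustration of the packet layout described above, the following
Python sketch serialises and parses an RTCP-SR-REQ feedback message. It
is a minimal sketch, not a complete RTCP implementation: it produces the
12-octet feedback packet only (no compound packet, no timing rules), and
the function names and SSRC values are hypothetical.

```python
import struct

RTPFB = 205           # RTCP packet type for transport layer feedback
FMT_RTCP_SR_REQ = 5   # feedback message type given in the text above


def build_rtcp_sr_req(sender_ssrc, media_ssrc):
    """Serialise an RTCP-SR-REQ message; the FCI part is empty."""
    # First octet: version 2 (2 bits), padding 0 (1 bit), FMT (5 bits).
    first_octet = (2 << 6) | (0 << 5) | FMT_RTCP_SR_REQ
    # Length is in 32-bit words minus one: three words in total, so 2.
    return struct.pack("!BBHII", first_octet, RTPFB, 2,
                       sender_ssrc, media_ssrc)


def parse_rtcp_sr_req(packet):
    """Return (sender_ssrc, media_ssrc), or raise ValueError."""
    if len(packet) != 12:
        raise ValueError("RTCP-SR-REQ must be exactly 12 octets")
    first_octet, ptype, length, sender, media = struct.unpack("!BBHII", packet)
    if first_octet >> 6 != 2 or first_octet & 0x1F != FMT_RTCP_SR_REQ:
        raise ValueError("not a version 2 packet with FMT = 5")
    if ptype != RTPFB or length != 2:
        raise ValueError("not an RTPFB packet with length 2")
    return sender, media


if __name__ == "__main__":
    pkt = build_rtcp_sr_req(0x11223344, 0xAABBCCDD)  # hypothetical SSRCs
    print(pkt.hex())
    print(parse_rtcp_sr_req(pkt))
```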

470    When using SSM sessions with unicast feedback, it is possible that
471    the feedback target and media source are not co-located. If a feedback
472    target receives an RTCP-SR-REQ feedback message in such a case, the
473    request should be forwarded to the media source. The mechanism to be
474    used for forwarding such requests is not defined here.

476 3.3. In-band Delivery of Synchronisation Metadata

478    The RTP header extension mechanism defined in [6] can be adopted to
479    carry an OPTIONAL NTP format timestamp in RTP data packets. If such
480    a timestamp is included, it MUST correspond to the same time instant
481    as the RTP timestamp in the packet's header, and MUST be derived from
482    the same clock used to generate the NTP format timestamps included in
483    RTCP SR packets. Provided it has knowledge of the SSRC to CNAME
484    mapping, either from prior receipt of an RTCP CNAME packet or via
485    out-of-band signalling [10], the receiver can use the information
486    provided as input to the synchronisation algorithm, in exactly the
487    same way as if an additional RTCP SR packet had been received for the
488    flow.

490    Two variants are defined for this header extension. The first
491    variant extends the RTP header with a 64 bit NTP format timestamp
492    as defined in [7]. The second variant carries the lower 24 bits
493    of the Seconds field of an NTP format timestamp and the 32 bits
494    of the Fraction field of an NTP format timestamp. The
495    formats of the two variants are shown in Figure 5 and Figure 6.
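
To make the two variants concrete, the Python sketch below builds the
header extension block for each, using the one-byte header form of [6]
(the 0xBEDE marker, a length in 32-bit words, then an element with a
4-bit ID and a 4-bit length-minus-one field). It is a minimal sketch
under stated assumptions: the timestamp is the only extension element in
the packet, the extension IDs are hypothetical values that would in
practice be negotiated in SDP, and the function names are illustrative.

```python
import struct


def pack_ntp64_extension(ext_id, ntp_seconds, ntp_fraction):
    """Header extension block carrying a full 64 bit NTP format timestamp."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs are in the range 1..14")
    # Element header: ID in the high nibble, L = 7 (8 data octets) in the low.
    element = bytes([(ext_id << 4) | 7])
    element += struct.pack("!II", ntp_seconds, ntp_fraction)
    element += b"\x00" * 3  # pad 9 octets up to a 32-bit boundary
    return struct.pack("!HH", 0xBEDE, len(element) // 4) + element


def pack_ntp56_extension(ext_id, ntp_seconds, ntp_fraction):
    """Header extension block carrying the lower 24 bits of Seconds
    followed by the 32 bit Fraction."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs are in the range 1..14")
    # Element header: L = 6 (7 data octets); no padding is needed.
    element = bytes([(ext_id << 4) | 6])
    element += struct.pack("!I", ntp_seconds & 0xFFFFFF)[1:]
    element += struct.pack("!I", ntp_fraction)
    return struct.pack("!HH", 0xBEDE, len(element) // 4) + element


if __name__ == "__main__":
    # Hypothetical IDs 1 and 2, with an arbitrary timestamp value.
    print(pack_ntp64_extension(1, 0x01020304, 0x05060708).hex())
    print(pack_ntp56_extension(2, 0x01020304, 0x05060708).hex())
```

The resulting block lengths are 3 words for the 64 bit variant and 2
words for the 56 bit variant.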

497     0                   1                   2                   3
498     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
499    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
500    |V=2|P|1|  CC   |M|     PT      |       sequence number         |
501    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+R
502    |                           timestamp                           |T
503    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+P
504    |           synchronisation source (SSRC) identifier            |
505    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
506    |       0xBE    |    0xDE       |           length=3            |
507    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+E
508    |  ID-A | L=7   |   NTP timestamp format - Seconds (bit 0-23)   |x
509    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+t
510    |NTP Sec.(24-31)|   NTP timestamp format - Fraction(bit 0-23)   |n
511    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
512    |NTP Frc.(24-31)|    0 (pad)    |    0 (pad)    |    0 (pad)    |
513    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
514    |                         payload data                          |
515    |                             ....                              |
516    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

518          Figure 5: Variant A/64-bit NTP RTP header extension

520     0                   1                   2                   3
521     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
522    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
523    |V=2|P|1|  CC   |M|     PT      |       sequence number         |
524    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+R
525    |                           timestamp                           |T
526    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+P
527    |           synchronisation source (SSRC) identifier            |
528    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
529    |       0xBE    |    0xDE       |           length=2            |
530    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+E
531    |  ID-B | L=6   |   NTP timestamp format - Seconds (bit 8-31)   |x
532    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+t
533    |          NTP timestamp format - Fraction (bit 0-31)           |n
534    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
535    |                         payload data                          |
536    |                             ....                              |
537    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

539          Figure 6: Variant B/56-bit NTP RTP header extension

541    An NTP format timestamp MAY be included on any RTP packets
542    the sender chooses, but it is RECOMMENDED when performing timestamp
543    based decoding order recovery for layered codecs transported in
544    multiple RTP flows, as further specified in Section 4.1. This header
545    extension SHOULD also be sent on the RTP packets corresponding to a
546    video random access point, and on the associated audio packets, to
547    allow rapid synchronisation for late joiners in multimedia sessions,
548    and in video switching scenarios.

550    Note: The inclusion of an RTP header extension will reduce the
551    efficiency of RTP header compression, if it is used. Furthermore,
552    middleboxes that do not understand the header extensions may remove
553    them or may not update the content according to this memo.

555    In all cases, irrespective of whether in-band NTP format
556    timestamps are included or not, regular RTCP SR packets MUST be sent
557    to provide backwards compatibility with receivers that synchronise
558    RTP flows according to [1], and robustness in the face of middleboxes
559    (RTP translators) that might strip RTP header extensions. The sender
560    reports are also required to receive the upper 8 bits of the Seconds
561    of the NTP format timestamp, which are not included in the Variant
562    B/56-bit NTP RTP header extension (although this may generally be
563    inferred from context).

565    When SDP is used, the use of the RTP header extensions defined
566    above MUST be indicated as specified in [6]. Therefore the following
567    URIs MUST be used:

569    o  The URI used for signalling the use of Variant A/64-bit NTP RTP
570       header extension in SDP is "urn:ietf:params:rtp-hdrext:ntp-64".

572    o  The URI used for signalling the use of Variant B/56-bit NTP RTP
573       header extension in SDP is "urn:ietf:params:rtp-hdrext:ntp-56".

575 4. Application to Decoding Order Recovery in Layered Codecs

577    Packets in RTP flows are often predictively coded, with a receiver
578    having to arrange the packets into a particular order before it can
579    decode the media data. Depending on the payload format, the decoding
580    order might be explicitly specified as a field in the RTP payload
581    header, or the receiver might decode the packets in order of their
582    RTP timestamps. If a layered encoding is used, where the media data
583    is split across several RTP flows, then it is often necessary to
584    exactly synchronise the RTP flows comprising the different layers
585    before layers other than the base layer can be decoded. Examples of
586    such layered encodings are H.264 SVC in NI-T mode [9] and MPEG
587    surround multi-channel audio [17]. As described in Section 2, such
588    synchronisation is possible in RTP, but can be difficult to perform
589    rapidly. In the following, we describe how the extensions defined in
590    Section 3.3 can be used to synchronise layered flows, and provide a
591    common timestamp-based decoding order.

593 4.1. In-band Synchronisation for Decoding Order Recovery

595    When a layered, multi-description, or multi-view codec is used, with
596    the different components of the media being transferred on separate
597    RTP flows, the RTP sender SHOULD use periodic synchronous in-band
598    delivery of synchronisation metadata to allow receivers to rapidly
599    and accurately synchronise the separate components of the layered
600    media flow. There are three parts to this:

602    o  The sender must negotiate the use of the RTP header extensions
603       described in Section 3.3, and must periodically and synchronously
604       insert such header extensions into all the RTP flows forming the
605       separate components of the layered, multi-description, or multi-
606       view flow.

   o  Synchronous insertion requires that the sender insert these RTP
      header extensions into packets corresponding to the exact same
      sampling instant in all the flows.  Since the header extensions
      for each flow are inserted at exactly the same sampling instant,
      they will have identical NTP-format timestamps, hence allowing
      receivers to exactly align the RTP timestamps for the component
      flows.  This may require the insertion of extra data packets into
      some of the component RTP flows, if some component flows contain
      packets for sampling instants that do not exist in other flows
      (for example, a layered video codec, where the layers have
      differing frame rates).

   o  The frequency with which the sender inserts the header extensions
      directly determines the synchronisation latency, with more
      frequent insertion leading to higher per-flow overhead, but lower
      synchronisation latency.  It is RECOMMENDED that the sender insert
      the header extensions synchronously into all component RTP flows
      at least once per random access point of the media, but they MAY
      be inserted more often.

   The sender MUST continue to send periodic RTCP reports including SR
   packets, and MUST ensure the RTP timestamp to NTP-format timestamp
   mapping in the RTCP SR packets is consistent with that used in the
   RTP header extensions.
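
   As a non-normative illustration of the consistency requirement
   above, the following Python sketch maps RTP timestamps to NTP-format
   time from a single reference pair, and checks that a pair taken from
   an RTCP SR packet lies on the same timeline.  The names Mapping,
   rtp_to_ntp, and consistent, and the 1 ms tolerance, are assumptions
   made for this example; they are not defined by this memo.

```python
from dataclasses import dataclass

NTP_FRAC = 1 << 32  # NTP-format fraction units per second (32.32 format)


@dataclass(frozen=True)
class Mapping:
    """One (NTP-format timestamp, RTP timestamp) reference pair."""
    ntp: int         # 64-bit NTP-format timestamp
    rtp: int         # 32-bit RTP timestamp of the same sampling instant
    clock_rate: int  # RTP clock rate in Hz, known from the payload format


def rtp_to_ntp(m: Mapping, rtp_ts: int) -> int:
    """Map an RTP timestamp to NTP-format time using reference pair m."""
    # Interpret the difference as a signed 32-bit value so that RTP
    # timestamp wrap-around is handled.
    delta = (rtp_ts - m.rtp) & 0xFFFFFFFF
    if delta >= 1 << 31:
        delta -= 1 << 32
    return m.ntp + delta * NTP_FRAC // m.clock_rate


def consistent(in_band: Mapping, sr: Mapping,
               tolerance_s: float = 0.001) -> bool:
    """True if the SR pair agrees with the in-band pair (assumed 1 ms)."""
    predicted = rtp_to_ntp(in_band, sr.rtp)
    return abs(predicted - sr.ntp) <= int(tolerance_s * NTP_FRAC)
```

   A sender could run such a check on the values it is about to place
   in an SR packet; the tolerance would need to reflect the precision
   of the clocks actually in use.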

   Receivers should use both the information contained in RTCP SR
   packets and the in-band mapping of RTP and NTP-format timestamps as
   input to the synchronisation process, but it is RECOMMENDED that
   receivers sanity check the mappings received and discard outliers, to
   provide robustness against invalid data (one might think it more
   likely that the RTCP SR mappings are invalid, since they are sent at
   irregular times and subject to skew, but the presence of broken RTP
   translators could also corrupt the timestamps in the RTP header
   extension; receivers need to cope with both types of failure).

4.2.  Timestamp based decoding order recovery

   Once a receiver has synchronised the components of a layered,
   multi-description, or multi-view flow using the RTP header extensions
   as described in Section 4.1, it may then derive a decoding order
   based on the synchronised timestamps as follows (or it may use
   information in the RTP payload header to derive the decoding order,
   if present and desired).

   There may be explicit dependencies between the component flows of a
   layered, multi-description, or multi-view flow.  For example, it is
   common for layered flows to be arranged in a hierarchy, where flows
   from "higher" layers cannot be decoded until the corresponding data
   in "lower" layer flows has been received and decoded.  If such a
   decoding hierarchy exists, it MUST be signalled out of band, for
   example using [8] when SDP signalling is used.

   Each component RTP flow MUST contain packets corresponding to all the
   sampling instants of the RTP flows on which it depends.  If such
   packets are not naturally present in the RTP flow, the sender MUST
   generate additional packets as necessary in order to satisfy this
   rule.  The format of these packets depends on the payload format
   used.  For H.264 SVC, the Empty NAL unit packet [9] should be used.
   Flows may also include packets corresponding to additional sampling
   instants that are not present in the flows on which they depend.

   The receiver should decode the packets in all the component RTP flows
   as follows:

   o  For each RTP packet in each flow, use the mapping contained in the
      RTP header extensions and RTCP SR packets to derive the NTP-format
      timestamp corresponding to its RTP timestamp.

   o  Group together RTP data packets from all component flows that have
      identical calculated NTP-format timestamps.

   o  Processing groups in order of ascending NTP-format timestamp,
      decode the RTP packets in each group according to the signalled
      RTP flow decoding hierarchy.  That is, pass the RTP packet data
      from the flow on which all other flows depend to the decoder
      first, then that from the next dependent flow, and so on.  The
      decoding order of the RTP flow hierarchy may be indicated by
      mechanisms defined in [8] or by some other means.

   Note that the decoding order will not necessarily match the packet
   transmission order.  The receiver will need to buffer packets for a
   codec-dependent amount of time in order for all necessary packets to
   arrive to allow decoding.

4.3.  Example

   The example shown in the figure at the end of this section refers to
   three RTP flows A, B, and C containing a layered, a multi-view, or a
   multi-description media stream.  In the example, the dependency
   signalling as defined in [8] indicates that flow A is the lowest RTP
   flow, B is the first higher RTP flow and depends on A, and C is the
   second higher RTP flow corresponding to flow A and depends on A and
   B.  A media coding structure is used that results in samples present
   in higher flows but not present in all lower flows.  Flow A has the
   lowest frame rate, and flows B and C have the same but higher frame
   rate.  The figure shows the full video samples with their
   corresponding RTP timestamps "(x)".
   The video samples are already re-ordered according to their RTP
   sequence number order.  The figure shows the received samples in
   decoding order within each RTP flow, as well as the associated NTP
   media timestamps ("TS[..]").  These timestamps may be derived using
   the NTP format timestamp provided in the RTCP sender reports or, as
   shown in the figure, directly from the NTP timestamp contained in the
   RTP header extensions, as indicated by the timestamp in "<x>".  Note
   that the timestamps are not in increasing order since, in this
   example, the decoding order is different from the output/presentation
   order.

   The process first locates the sample parts associated with the first
   available synchronous insertion of NTP timestamps into RTP header
   extensions, at NTP media timestamp TS=[8].  Starting in the highest
   RTP flow C, it removes/ignores all sample parts that precede (in
   decoding order) the sample parts with TS=[8] in each of the
   de-jittering buffers of RTP flows A, B, and C.  Then, starting from
   flow C, the first media timestamp available in decoding order
   (TS=[8]) is selected, and the sample parts, starting from RTP flow A
   and followed by flows B and C, are placed in order of the RTP flow
   dependency as indicated by mechanisms defined in [8] (in the example
   for TS[8]: first flow A, then flow B, and then flow C) into the video
   sample VS(TS[8]) associated with NTP media timestamp TS=[8].  Then
   the next media timestamp TS=[6] (RTP timestamp=(4)) in order of
   appearance in the highest RTP flow C is processed and the process
   described above is repeated.  Note that there may be video samples
   with no sample parts present, e.g., in the lowest RTP flow A (see,
   e.g., TS=[5]).
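
   The walk-through above can be sketched in Python as follows.  This
   is a non-normative illustration only: the per-flow offsets are read
   off the example figure (NTP media timestamp minus RTP timestamp,
   assuming all flows use the same clock rate), and the names OFFSETS,
   DEPENDENCY_ORDER, RECEIVED, and decoding_order are hypothetical.
   The sketch follows the order of appearance in the highest flow, as
   the example does.

```python
from collections import defaultdict

# Assumed per-flow offsets (NTP media timestamp minus RTP timestamp), as
# a receiver would learn them from header extensions or RTCP SR packets.
OFFSETS = {"A": 5, "B": -2, "C": 1}

# Flows listed from the lowest flow upwards (B depends on A, C on A and B).
DEPENDENCY_ORDER = ["A", "B", "C"]

# RTP timestamps per flow, in received order, taken from the figure.
RECEIVED = {
    "A": [3, 1, 7, 5],
    "B": [3, 5, 10, 8, 7, 9, 14, 12],
    "C": [0, 2, 7, 5, 4, 6, 11, 9],
}


def decoding_order(received, offsets, dependency_order, start_ts):
    """Return [(ts, [(flow, rtp_ts), ...]), ...] in decoding order."""
    # Group the sample parts of all flows by NTP media timestamp.
    groups = defaultdict(dict)
    for flow, rtp_timestamps in received.items():
        for rtp_ts in rtp_timestamps:
            groups[rtp_ts + offsets[flow]][flow] = rtp_ts

    # Follow the order of appearance in the highest flow, skipping
    # everything before the first synchronised timestamp.
    highest = dependency_order[-1]
    order = [rtp_ts + offsets[highest] for rtp_ts in received[highest]]
    order = order[order.index(start_ts):]

    # Within each group, emit parts from the lowest flow upwards.
    return [
        (ts, [(f, groups[ts][f]) for f in dependency_order
              if f in groups[ts]])
        for ts in order
    ]
```

   Called as decoding_order(RECEIVED, OFFSETS, DEPENDENCY_ORDER, 8),
   the sketch visits the media timestamps 8, 6, 5, 7, 12, and 10 in
   that order; the group for TS=[5] contains parts from flows B and C
   only.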

   The decoding order recovery process could also be started after
   receiving the RTP timestamp to NTP-format timestamp mappings carried
   in the RTCP sender reports of all flows (indicated as timestamps
   "(x){y}"), assuming that there is no clock skew in the source used
   for the NTP-format timestamp generation.

   C:-(0)----(2)----(7)<8>--(5)----(4)----(6)-----(11)----(9){10}-
       |      |      |       |      |      |       |       |
   B:-(3)----(5)---(10)<8>--(8)----(7)----(9){7}--(14)----(12)----
                     |       |                     |       |
   A:---------------(3)<8>--(1)-------------------(7){12}-(5)-----

   ---------------------------------------decoding/transmission order->
   TS:[1]    [3]    [8]=<8> [6]    [5]    [7]     [12]    [10]

   Key:
   A, B, C                - RTP flows
   Integer values in "()" - video sample with its RTP timestamp as
                            indicated in its RTP packet.
   "|"                    - indicates corresponding samples / parts of
                            sample of the same video sample VS(TS[..])
                            in the RTP flows.
   Integer values in "[]" - NTP media timestamp TS, sampling time
                            as derived from the NTP timestamp associated
                            with the video sample VS(TS[..]), consisting
                            of sample parts in the flows above.
   Integer values in "<>" - NTP media timestamp TS as directly
                            taken from the NTP RTP header extensions.
   Integer values in "{}" - NTP media timestamp TS as provided in the
                            RTCP sender reports.

5.  Security Considerations

   The security considerations of the RTP specification [1], the
   Extended RTP profile for RTCP-Based Feedback [2], and the General
   Mechanism for RTP Header Extensions [6] apply.

   The RTP header extensions defined in Section 3.3 include an
   NTP-format timestamp.  When an RTP session using this header
   extension is protected by the Secure RTP framework [18], that header
   extension is not part of the encrypted portion of the RTP data
   packets or RTCP control packets; however, these NTP-format timestamps
   are encrypted when using SRTP without this header extension.
   This is a minor information leak, but one that is not believed to be
   significant.

6.  IANA Considerations

   NOTE TO RFC EDITOR: Please replace "RFC XXXX" in the following with
   the RFC number assigned to this memo, and delete this note.

   The IANA is requested to register one new value in the table of FMT
   Values for RTPFB Payload Types [2] as follows:

      Name:           RTCP-SR-REQ
      Long name:      RTCP Rapid Resynchronisation Request
      Value:          5
      Reference:      RFC XXXX

   The IANA is also requested to register two new RTP Compact Header
   Extensions [6], according to the following:

      Extension URI:  urn:ietf:params:rtp-hdrext:ntp-64
      Description:    Synchronisation metadata: 64-bit timestamp format
      Contact:        Thomas Schierl
                      IETF Audio/Video Transport Working Group
      Reference:      RFC XXXX

      Extension URI:  urn:ietf:params:rtp-hdrext:ntp-56
      Description:    Synchronisation metadata: 56-bit timestamp format
      Contact:        Thomas Schierl
                      IETF Audio/Video Transport Working Group
      Reference:      RFC XXXX

7.  Acknowledgements

   This memo has benefited from discussions with numerous members of the
   IETF AVT working group, including Jonathan Lennox, Magnus Westerlund,
   Randell Jesup, Gerard Babonneau, Ingemar Johansson, Ali C. Begen,
   Ye-Kui Wang, Roni Even, Michael Dolan, Art Allison, and Stefan
   Doehla.  The RTP header extension format of Variant A in Section 3.3
   was suggested by Dave Singer, matching a similar mechanism specified
   by ISMA.

8.  References

8.1.  Normative References

   [1]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", STD 64,
        RFC 3550, July 2003.

   [2]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
        "Extended RTP Profile for Real-time Transport Control Protocol
        (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July 2006.

   [3]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [4]  Schooler, E., Ott, J., and J. Chesterfield, "RTCP Extensions
        for Single-Source Multicast Sessions with Unicast Feedback",
        draft-ietf-avt-rtcpssm-18 (work in progress), March 2009.

   [5]  Johansson, I. and M. Westerlund, "Support for Reduced-Size
        Real-Time Transport Control Protocol (RTCP): Opportunities and
        Consequences", RFC 5506, April 2009.

   [6]  Singer, D. and H. Desineni, "A General Mechanism for RTP Header
        Extensions", RFC 5285, July 2008.

   [7]  Mills, D., "Network Time Protocol (Version 3) Specification,
        Implementation", RFC 1305, March 1992.

   [8]  Schierl, T. and S. Wenger, "Signaling media decoding dependency
        in Session Description Protocol (SDP)",
        draft-ietf-mmusic-decoding-dependency-08 (work in progress),
        April 2009.

8.2.  Informative References

   [9]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis, "RTP
        Payload Format for SVC Video", draft-ietf-avt-rtp-svc-18 (work
        in progress), March 2009.

   [10] Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media
        Attributes in the Session Description Protocol (SDP)",
        draft-ietf-mmusic-sdp-source-attributes-02 (work in progress),
        October 2008.

   [11] Casner, S., "Session Description Protocol (SDP) Bandwidth
        Modifiers for RTP Control Protocol (RTCP) Bandwidth", RFC 3556,
        July 2003.

   [12] Rosenberg, J., "Interactive Connectivity Establishment (ICE): A
        Protocol for Network Address Translator (NAT) Traversal for
        Offer/Answer Protocols", draft-ietf-mmusic-ice-19 (work in
        progress), October 2007.

   [13] McGrew, D. and E. Rescorla, "Datagram Transport Layer Security
        (DTLS) Extension to Establish Keys for Secure Real-time
        Transport Protocol (SRTP)", draft-ietf-avt-dtls-srtp-05 (work
        in progress), September 2008.

   [14] Zimmermann, P., Johnston, A., and J.
        Callas, "ZRTP: Media Path Key Agreement for Secure RTP",
        draft-zimmermann-avt-zrtp-13 (work in progress), January 2009.

   [15] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
        January 2008.

   [16] Steeg, B., Begen, A., Caenegem, T., and Z. Vax, "Unicast-Based
        Rapid Acquisition of Multicast RTP Sessions",
        draft-ietf-avt-rapid-acquisition-for-rtp-05 (work in progress),
        November 2009.

   [17] Bont, F., Doehla, S., Schmidt, M., and R. Sperschneider, "RTP
        Payload Format for Elementary Streams with MPEG Surround
        multi-channel audio", draft-ietf-avt-rtp-mps-02 (work in
        progress), January 2009.

   [18] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
        Norrman, "The Secure Real-time Transport Protocol (SRTP)",
        RFC 3711, March 2004.

Authors' Addresses

   Colin Perkins
   University of Glasgow
   Department of Computing Science
   Glasgow  G12 8QQ
   UK

   Email: csp@csperkins.org

   Thomas Schierl
   Fraunhofer HHI
   Einsteinufer 37
   D-10587 Berlin
   Germany

   Phone: +49-30-31002-227
   Email: ts@thomas-schierl.de