Network Working Group                                          S. Wenger
Internet-Draft                                                 J. Lennox
Updates: 5104 (if approved)                                  Vidyo, Inc.
Intended status: Standards Track                               B. Burman
Expires: May 21, 2017                                      M. Westerlund
                                                                Ericsson
                                                       November 17, 2016

    Using Codec Control Messages in the RTP Audio-Visual Profile with
                      Feedback with Layered Codecs
                 draft-ietf-avtext-avpf-ccm-layered-03

Abstract

   This document updates RFC 5104 by fixing a shortcoming in the
   specification language of the Codec Control Message Full Intra
   Request (FIR) as defined in RFC 5104 when using it with layered
   codecs. In particular, a Decoder Refresh Point needs to be sent by a
   media sender when a FIR is received on any layer of the layered
   bitstream, regardless of whether those layers are being sent in a
   single or in multiple RTP flows. The other payload-specific feedback
   messages defined in RFC 5104 and RFC 4585 as updated by RFC 5506 have
   also been analyzed, and no corresponding shortcomings have been
   found.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 21, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction and Problem Statement
   2. Requirements Language
   3. Updated definition of Decoder Refresh Point
   4. Full Intra Request for Layered Codecs
   5. Identifying the use of layered bitstreams (Informative)
   6. Layered Codecs and non-FIR codec control messages
      (Informative)
     6.1. Picture Loss Indication (PLI)
     6.2. Slice Loss Indication (SLI)
     6.3. Reference Picture Selection Indication (RPSI)
     6.4. Temporal-Spatial Trade-off Request and Notification
          (TSTR/TSTN)
     6.5. H.271 Video Back Channel Message (VBCM)
   7. Acknowledgements
   8. IANA Considerations
   9. Security Considerations
   10. References
     10.1. Normative References
     10.2. Informative References
   Appendix A. Change Log
   Authors' Addresses

1. Introduction and Problem Statement

   The Extended RTP Profile for Real-time Transport Control Protocol
   (RTCP)-Based Feedback (RTP/AVPF) [RFC4585] and Codec Control Messages
   in the RTP Audio-Visual Profile with Feedback (AVPF) [RFC5104]
   specify a number of payload-specific feedback messages which a media
   receiver can use to inform a media sender of certain conditions, or
   make certain requests. The feedback messages are sent as RTCP
   receiver reports, and RFC 4585 specifies timing rules that make the
   use of those messages practical for time-sensitive codec control.

   Since the time those RFCs were developed, layered codecs have gained
   in popularity and deployment. Layered codecs use multiple
   sub-bitstreams called layers to represent the content in different
   fidelities. Depending on the media codec and its RTP payload format
   in use, a number of options exist for transporting those layers in
   RTP. With reference to A Taxonomy of Semantics and Mechanisms for
   Real-Time Transport Protocol (RTP) Sources [RFC7656]:

      single layers or groups of layers may be sent in their own RTP
      streams in MRST or MRMT mode;

      using media-codec specific multiplexing mechanisms, multiple
      layers may be sent in a single RTP stream in SRST mode.

   The dependency relationship between layers in a truly layered,
   pyramid-shaped bitstream forms a directed graph, with the base layer
   at the root. Enhancement layers depend on the base layer and
   potentially on other enhancement layers, and the target layer and all
   layers it depends on have to be decoded jointly in order to re-create
   the uncompressed media signal at the fidelity of the target layer.
   Such a layering structure is assumed henceforth; for more exotic
   layering structures please see Section 5.

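
   As a minimal illustration of the dependency relationship described
   above, the following Python sketch computes the set of layers that
   have to be decoded jointly for a given target layer. The layer names
   and the DEPENDS_ON table are hypothetical; they are not taken from
   any codec specification or from this document.

```python
# Hypothetical dependency table (an assumption for illustration only):
# each layer maps to the layers it directly depends on.
# "base" is the root of the pyramid-shaped bitstream.
DEPENDS_ON = {
    "base": [],
    "spatial1": ["base"],
    "spatial2": ["spatial1"],
    "quality1": ["base"],
}


def layers_to_decode(target, depends_on=DEPENDS_ON):
    """Return the target layer plus every layer it depends on,
    directly or indirectly."""
    needed = set()
    stack = [target]
    while stack:
        layer = stack.pop()
        if layer in needed:
            continue  # already visited via another dependency path
        needed.add(layer)
        stack.extend(depends_on[layer])
    return needed
```

   In this sketch, decoding "spatial2" requires "base" and "spatial1"
   as well, whereas decoding "base" requires no other layer.
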
   Implementation experience has shown that the Full Intra Request
   command as defined in [RFC5104] is underspecified when used with
   layered codecs and when more than one RTP stream is used to transport
   the layers of a layered bitstream at a given fidelity. In
   particular, from the [RFC5104] specification language it is not clear
   whether a FIR received for only a single RTP stream of multiple RTP
   streams covering the same layered bitstream necessarily triggers the
   sending of a Decoder Refresh Point (as defined in [RFC5104] Section
   2.2) for all layers, or only for the layer which is transported in
   the RTP stream with which the FIR is associated.

   This document fixes this shortcoming by:

   a. Updating the definition of the Decoder Refresh Point (as defined
      in [RFC5104] Section 2.2) to cover layered codecs, in line with
      the corresponding definitions used in a popular layered codec
      format, namely H.264/SVC [H.264]. Specifically, a decoder
      refresh point, in conjunction with layered codecs, resets the
      state of the whole decoder, which implies that it includes hard
      or gradual single-layer decoder refresh for all layers;

   b. Requiring a media sender to send a Decoder Refresh Point after the
      media sender has received a Full Intra Request over an RTCP
      stream associated with any of the RTP streams over which a part
      of the layered bitstream is transported;

   c. Requiring that a media receiver send the FIR on the RTCP stream
      associated with the base layer. The option of receiving FIR on
      an RTCP stream associated with an enhancement layer, as specified
      in item b above, is kept for backward compatibility; and

   d. Providing guidance on how to detect that a layered bitstream is
      in use for which the above rules apply.

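
   The behavior required by items b and c can be sketched as follows.
   This is an illustrative Python model only. It assumes a hypothetical
   LayeredBitstream record that maps the SSRC of each RTP stream to the
   layers it carries; none of these names are defined by [RFC5104] or
   by this document.

```python
from dataclasses import dataclass


@dataclass
class LayeredBitstream:
    """Hypothetical model of one layered bitstream (illustration only)."""
    base_ssrc: int          # SSRC of the RTP stream carrying the base layer
    layers_by_ssrc: dict    # SSRC -> list of layer names in that stream

    def all_layers(self):
        """Every layer of the bitstream, across all of its RTP streams."""
        return sorted(
            {layer
             for layers in self.layers_by_ssrc.values()
             for layer in layers}
        )


def fir_target_ssrc(bitstream):
    """Receiver side (item c): address the FIR to the base layer stream."""
    return bitstream.base_ssrc


def layers_to_refresh(bitstream, fir_ssrc):
    """Sender side (item b): a FIR on any stream of the layered
    bitstream leads to a decoder refresh covering all layers."""
    if fir_ssrc not in bitstream.layers_by_ssrc:
        return []  # the FIR does not concern this layered bitstream
    return bitstream.all_layers()
```

   Note that the sender-side function returns the same result whether
   the FIR names the base layer stream or an enhancement layer stream,
   which reflects the backward-compatibility provision in item c.
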
   While the reaction to FIR for layered codecs is clearly
   underspecified in [RFC5104] and companion documents, this does not
   appear to be the case for any of the other payload-specific codec
   control messages defined in [RFC4585] or [RFC5104]. A brief summary
   of the analysis that led to this conclusion is also included in this
   document.

2. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3. Updated definition of Decoder Refresh Point

   The remainder of this section replaces the definition of Decoder
   Refresh Point in Section 2.2 of [RFC5104] in its entirety.

   Decoder Refresh Point: A bit string, packetized in one or more RTP
   packets, that completely resets the decoder to a known state.

   Examples of "hard" single layer decoder refresh points are Intra
   pictures in H.261 [H.261], H.263 [H.263], MPEG-1 [MPEG-1], MPEG-2
   [MPEG-2], and MPEG-4 [MPEG-4]; Instantaneous Decoder Refresh (IDR)
   pictures in H.264 [H.264] and H.265 [H.265]; and Keyframes in VP8
   [RFC6386] and VP9 [I-D.grange-vp9-bitstream]. "Gradual" decoder
   refresh points may also be used; see for example H.264 [H.264].
   While both "hard" and "gradual" decoder refresh points are acceptable
   in the scope of this specification, in most cases the user experience
   will benefit from using a "hard" decoder refresh point.

   A decoder refresh point also contains all header information above
   the syntactical level of the picture layer that is conveyed in-band.
   In [H.264], for example, a decoder refresh point contains those
   parameter set Network Adaptation Layer (NAL) units that generate
   parameter sets necessary for the decoding of the following slice/data
   partition NAL units.
   (That is assuming the parameter sets have not been conveyed out of
   band.)

   When a layered codec is in use, the above definition--in particular,
   the requirement to completely reset the decoder to a known state--
   implies that the decoder refresh point includes hard or gradual
   single layer decoder refresh points for all layers.

4. Full Intra Request for Layered Codecs

   A media receiver or middlebox may decide to send a FIR command based
   on the guidance provided in Section 4.3.1 of [RFC5104]. When sending
   the FIR command, it MUST target the RTP stream that carries the base
   layer of the layered bitstream; this is done by setting the
   Feedback Control Information (FCI, and in particular the SSRC field
   therein) to refer to the SSRC of the forward RTP stream that carries
   the base layer.

   When a Full Intra Request Command is received by the designated media
   sender in the RTCP stream associated with any of the RTP streams in
   which any layer of a layered bitstream is sent, the designated media
   sender MUST send a Decoder Refresh Point (Section 3) at its earliest
   opportunity. The requirements related to congestion control on the
   forward RTP streams as specified in Sections 3.5.1 and 5 of
   [RFC5104] apply for the RTP streams both in isolation and combined.

   Note: the requirement to react to FIR commands associated with
   enhancement layers is included for robustness and backward
   compatibility reasons.

5. Identifying the use of layered bitstreams (Informative)

   The above modifications to RFC 5104 unambiguously define how to deal
   with FIR when layered bitstreams are in use. However, it is
   surprisingly difficult to identify the use of a layered bitstream.
   In general, it is expected that implementers know when layered
   bitstreams (in their commonly understood sense: with inter-layer
   prediction between pyramid-arranged layers) are in use and when
   not, and can therefore implement the above updates to RFC 5104
   correctly. However, there are scenarios in which layered codecs are
   employed to create non-pyramid-shaped bitstreams. Those scenarios may
   be viewed as somewhat exotic today but clearly are supported by
   certain video coding syntaxes, such as H.264/SVC. When blindly
   applying the above rules to those non-pyramid-arranged layering
   structures, suboptimal system behavior would result. Nothing would
   break, and there would not be an interoperability failure, but the
   user experience may suffer through the sending or receiving of
   Decoder Refresh Points at times or on parts of the bitstream that are
   unnecessary from a user experience viewpoint. Therefore, this
   informative section provides the current understanding of when a
   layered bitstream is in use and when not.

   The key observation made here is that the RTP payload format
   negotiated for the RTP streams, in isolation, is not necessarily an
   indicator for the use of a layered bitstream. Some layered codecs
   (including H.264/SVC) can form decodable bitstreams including only
   (one or more) enhancement layers, without the base layer, effectively
   creating simulcastable sub-bitstreams within a single scalable
   bitstream (as defined in the video coding standard), but without
   inter-layer prediction. In such a scenario, it is potentially,
   though not necessarily, counter-productive to send a decoder refresh
   point on all RTP streams using that payload format and SSRC. It is
   beyond the scope of this document to discuss optimized reactions to
   FIRs received on RTP streams carrying such exotic bitstreams.

   One good indication of the likely use of pyramid-shaped layering with
   inter-layer prediction is when the various RTP streams are "bound"
   together on the signaling level. In an SDP environment, this would
   be the case if they are marked as being dependent on each other
   using the Session Description Protocol (SDP) Grouping Framework
   [RFC5888] and the layer dependency signaling of [RFC5583].

6. Layered Codecs and non-FIR codec control messages (Informative)

   Between them, AVPF [RFC4585] and Codec Control Messages [RFC5104]
   define a total of seven Payload-specific Feedback messages. For the
   FIR command message, guidance has been provided above. In this
   section, some information is provided with respect to the remaining
   six codec control messages.

6.1. Picture Loss Indication (PLI)

   PLI is defined in Section 6.3.1 of [RFC4585]. The prudent response
   to a PLI message received for an enhancement layer is to "repair"
   that enhancement layer and all dependent enhancement layers through
   appropriate source-coding specific means. However, the reference
   layer(s) used by the enhancement layer for which the PLI was received
   do not require repair. The encoder can determine by itself what
   constitutes a dependent enhancement layer and does not need help from
   the system stack in doing so. Thus, there is nothing that needs to
   be specified herein.

6.2. Slice Loss Indication (SLI)

   SLI is defined in Section 6.3.2 of [RFC4585]. The authors' current
   understanding is that the prudent response to an SLI message received
   for an enhancement layer is to "repair" the affected spatial area of
   that enhancement layer and all dependent enhancement layers through
   appropriate source-coding specific means. As in PLI, the reference
   layers used by the enhancement layer for which the SLI was received
   do not need to be repaired.
   Again, as in PLI, the encoder can determine by itself what
   constitutes a dependent enhancement layer and does not need help from
   the system stack in doing so. Thus, there is nothing that needs to
   be specified herein. SLI has seen very little implementation and, as
   far as is known, none in conjunction with layered systems.

6.3. Reference Picture Selection Indication (RPSI)

   RPSI is defined in Section 6.3.3 of [RFC4585]. While a technical
   equivalent of RPSI has been in use with non-layered systems for many
   years, no implementations are known in conjunction with layered
   codecs. The authors' current understanding is that the reception of
   an RPSI message on any layer indicating a missing reference picture
   forces the encoder to appropriately handle that missing reference
   picture in the layer indicated, and all dependent layers. Thus, RPSI
   should work without further need for specification language.

6.4. Temporal-Spatial Trade-off Request and Notification (TSTR/TSTN)

   TSTR and TSTN are defined in Sections 4.3.2 and 4.3.3 of [RFC5104],
   respectively. The TSTR request communicates guidance on the
   preferred trade-off between spatial quality and frame rate. A
   technical equivalent of TSTR/TSTN has seen deployment for many years
   in non-scalable systems.

   The Temporal-Spatial Trade-off request and notification messages
   include an SSRC target, which, similarly to FIR, may refer to an RTP
   stream carrying a base layer, an enhancement layer, or multiple
   layers. Therefore, the authors' current understanding is that the
   semantics of the message apply to the layers present in the
   targeted RTP stream.

   It is noted that per-layer TSTR/TSTN is a mechanism that is, in some
   ways, counterproductive in a system using layered codecs.
   Given a sufficiently complex layered bitstream layout, a sending
   system has flexibility in adjusting the spatio/temporal quality
   balance by adding and removing temporal, spatial, or quality
   enhancement layers. At present it is unclear whether an allowed (or
   even recommended) response to the reception of a TSTR is to adjust
   the bit allocation within the layer(s) present in the addressed RTP
   stream, or to adjust the layering structure accordingly--which can
   involve more than just the addressed RTP stream.

   Until there is a sufficient critical mass of implementation practice,
   it is probably prudent for an implementer not to assume either of the
   two options or any middle ground that may exist between the two.
   Instead, it is suggested that an implementation be liberal in
   accepting TSTR messages and, upon receipt, respond with a TSTN
   indicating "no change". Further, it is suggested that new
   implementations do not send TSTR messages except when operating in
   SRST mode as defined in [RFC7656]. Finally, implementers are
   encouraged to contribute to the IETF documentation of any
   implementation requirements that make per-layer TSTR/TSTN useful.

6.5. H.271 Video Back Channel Message (VBCM)

   VBCM is defined in Section 4.3.4 of [RFC5104]. What was said above
   for RPSI (Section 6.3) applies here as well.

7. Acknowledgements

   The authors want to thank Mo Zanaty for useful discussions.

8. IANA Considerations

   This memo includes no request to IANA.

9. Security Considerations

   The security considerations of AVPF [RFC4585] (as updated by Support
   for Reduced-Size Real-Time Transport Control Protocol (RTCP):
   Opportunities and Consequences [RFC5506]) and Codec Control Messages
   [RFC5104] apply. The clarified response to FIR does not introduce
   additional security considerations.

10. References

10.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
              DOI 10.17487/RFC4585, July 2006.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
              February 2008.

   [RFC5506]  Johansson, I. and M. Westerlund, "Support for Reduced-Size
              Real-Time Transport Control Protocol (RTCP): Opportunities
              and Consequences", RFC 5506, DOI 10.17487/RFC5506, April
              2009.

10.2. Informative References

   [H.261]    ITU-T, "ITU-T Rec. H.261: Video codec for audiovisual
              services at p x 64 kbit/s", 1993.

   [H.263]    ITU-T, "ITU-T Rec. H.263: Video coding for low bit rate
              communication", 2005.

   [H.264]    ITU-T, "ITU-T Rec. H.264: Advanced video coding for
              generic audiovisual services", 2014.

   [H.265]    ITU-T, "ITU-T Rec. H.265: High efficiency video coding",
              2015.

   [I-D.grange-vp9-bitstream]
              Grange, A. and H. Alvestrand, "A VP9 Bitstream Overview",
              draft-grange-vp9-bitstream-00 (work in progress), February
              2013.

   [MPEG-1]   ISO/IEC, "ISO/IEC 11172-2:1993 Information technology --
              Coding of moving pictures and associated audio for digital
              storage media at up to about 1,5 Mbit/s -- Part 2: Video",
              1993.

   [MPEG-2]   ISO/IEC, "ISO/IEC 13818-2:2013 Information technology --
              Generic coding of moving pictures and associated audio
              information -- Part 2: Video", 2013.

   [MPEG-4]   ISO/IEC, "ISO/IEC 14496-2:2004 Information technology --
              Coding of audio-visual objects -- Part 2: Visual", 2004.

   [RFC5583]  Schierl, T. and S.
              Wenger, "Signaling Media Decoding
              Dependency in the Session Description Protocol (SDP)",
              RFC 5583, DOI 10.17487/RFC5583, July 2009.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888,
              DOI 10.17487/RFC5888, June 2010.

   [RFC6386]  Bankoski, J., Koleszar, J., Quillio, L., Salonen, J.,
              Wilkins, P., and Y. Xu, "VP8 Data Format and Decoding
              Guide", RFC 6386, DOI 10.17487/RFC6386, November 2011.

   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
              DOI 10.17487/RFC7656, November 2015.

Appendix A. Change Log

   NOTE TO RFC EDITOR: Please remove this section prior to publication.

   draft-wenger-avtext-avpf-ccm-layered-00-00: initial version

   draft-ietf-avtext-avpf-ccm-layered-00: resubmit as avtext WG draft
   per IETF95 and list confirmation by Rachel 4/25/2016

   draft-ietf-avtext-avpf-ccm-layered-00: In section "Identifying the
   use of Layered Codecs (Informative)", removed last sentence that
   could be misread that the explicit signaling of simulcasting in
   conjunction with payload formats supporting layered coding implies no
   layering.

   draft-ietf-avtext-avpf-ccm-layered-01: clarifications in section 5.

   draft-ietf-avtext-avpf-ccm-layered-02: addressing WGLC comments,
   mostly editorial; see reflector discussions 09/2016

   draft-ietf-avtext-avpf-ccm-layered-03: addressing AD writeup
   comments, editorial

Authors' Addresses

   Stephan Wenger
   Vidyo, Inc.

   Email: stewe@stewe.org

   Jonathan Lennox
   Vidyo, Inc.

   Email: jonathan@vidyo.com

   Bo Burman
   Ericsson
   Kistavagen 25
   SE - 164 80 Kista
   Sweden

   Email: bo.burman@ericsson.com

   Magnus Westerlund
   Ericsson
   Farogatan 2
   SE- 164 80 Kista
   Sweden

   Phone: +46107148287
   Email: magnus.westerlund@ericsson.com