Network Working Group                                          M. Zanaty
Internet-Draft                                                 E. Berger
Intended status: Experimental                              S. Nandakumar
Expires: 15 May 2022                                       Cisco Systems
                                                           November 2021


                   Frame Marking RTP Header Extension
                   draft-ietf-avtext-framemarking-13

Abstract

   This document describes a Frame Marking RTP header extension used to
   convey information about video frames that is critical for error
   recovery and packet forwarding in RTP middleboxes or network nodes.
   It is most useful when media is encrypted, and essential when the
   middlebox or node has no access to the media decryption keys.
   It is also useful for codec-agnostic processing of encrypted or
   unencrypted media, while it also supports extensions for
   codec-specific information.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 15 May 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Key Words for Normative Requirements
   3.  Frame Marking RTP Header Extension
     3.1.  Long Extension for Scalable Streams
     3.2.  Short Extension for Non-Scalable Streams
     3.3.  Layer ID Mappings for Scalable Streams
       3.3.1.  VP9 LID Mapping
       3.3.2.  H265 LID Mapping
       3.3.3.  H264-SVC LID Mapping
       3.3.4.  H264 (AVC) LID Mapping
       3.3.5.  VP8 LID Mapping
       3.3.6.  Future Codec LID Mapping
     3.4.  Signaling Information
     3.5.  Usage Considerations
       3.5.1.  Relation to Layer Refresh Request (LRR)
       3.5.2.  Scalability Structures
   4.  Security Considerations
   5.  Acknowledgements
   6.  IANA Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Authors' Addresses

1.  Introduction

   Many widely deployed RTP [RFC3550] topologies [RFC7667] used in
   modern voice and video conferencing systems include a centralized
   component that acts as an RTP switch.  It receives voice and video
   streams from each participant, which may be encrypted using SRTP
   [RFC3711], or extensions that provide participants with private media
   [RFC8871] via end-to-end encryption where the switch has no access to
   media decryption keys.  The goal is to provide a set of streams back
   to the participants which enable them to render the right media
   content.  In a simple video configuration, for example, the goal will
   be that each participant sees and hears just the active speaker.
   In that case, the goal of the switch is to receive the voice and
   video streams from each participant, determine the active speaker
   based on energy in the voice packets, possibly using the
   client-to-mixer audio level RTP header extension [RFC6464], and
   select the corresponding video stream for transmission to
   participants; see Figure 1.

   In this document, an "RTP switch" is used as a common short term for
   the terms "switching RTP mixer", "source projecting middlebox",
   "source forwarding unit/middlebox" and "video switching MCU" as
   discussed in [RFC7667].

            +---+      +------------+      +---+
            | A |<---->|            |<---->| B |
            +---+      |            |      +---+
                       |    RTP     |
            +---+      |   Switch   |      +---+
            | C |<---->|            |<---->| D |
            +---+      +------------+      +---+

                          Figure 1: RTP switch

   In order to properly support switching of video streams, the RTP
   switch typically needs some critical information about video frames
   in order to start and stop forwarding streams.

   *  Because of inter-frame dependencies, it should ideally switch
      video streams at a point where the first frame from the new
      speaker can be decoded by recipients without prior frames, e.g.,
      switch on an intra-frame.
   *  In many cases, the switch may need to drop frames in order to
      realize congestion control techniques, and needs to know which
      frames can be dropped with minimal impact to video quality.
   *  For scalable streams with dependent layers, the switch may need to
      selectively forward specific layers to specific recipients due to
      recipient bandwidth or decoder limits.
   *  Furthermore, it is highly desirable to do this in a payload
      format-agnostic way which is not specific to each different video
      codec.  Most modern video codecs share common concepts around
      frame types and other critical information to make this
      codec-agnostic handling possible.
   *  It is also desirable to be able to do this for SRTP without
      requiring the video switch to decrypt the packets.  SRTP will
      encrypt the RTP payload format contents and consequently this data
      is not usable for the switching function without decryption, which
      may not even be possible in the case of end-to-end encryption of
      private media [RFC8871].

   By providing meta-information about the RTP streams outside the
   encrypted media payload, an RTP switch can do codec-agnostic
   selective forwarding without decrypting the payload.  This document
   specifies the necessary meta-information in an RTP header extension.

2.  Key Words for Normative Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

3.  Frame Marking RTP Header Extension

   This specification uses RTP header extensions as defined in
   [RFC8285].  A subset of meta-information from the video stream is
   provided as an RTP header extension to allow an RTP switch to do
   generic selective forwarding of video streams encoded with
   potentially different video codecs.

   The Frame Marking RTP header extension is encoded using the one-byte
   header or two-byte header as described in [RFC8285].  The one-byte
   header format is used for examples in this memo.  The two-byte header
   format is used when other two-byte header extensions are present in
   the same RTP packet, since mixing one-byte and two-byte extensions is
   not possible in the same RTP packet.

   This extension is only specified for Source (not Redundancy) RTP
   Streams [RFC7656] that carry video payloads.  It is not specified for
   audio payloads, nor is it specified for Redundancy RTP Streams.
   The (separate) specifications for Redundancy RTP Streams often
   include provisions for recovering any header extensions that were
   part of the original source packet.  Such provisions SHALL be
   followed to recover the Frame Marking RTP header extension of the
   original source packet.  Source packet frame markings may be useful
   when generating Redundancy RTP Streams; for example, the I and D bits
   can be used to generate extra or no redundancy, respectively, and
   redundancy schemes with source blocks can align source block
   boundaries with Independent frame boundaries as marked by the I bit.

   A frame, in the context of this specification, is the set of RTP
   packets with the same RTP timestamp from a specific RTP
   synchronization source (SSRC).  A frame within a layer is the set of
   RTP packets with the same RTP timestamp, SSRC, Temporal ID (TID), and
   Layer ID (LID).

3.1.  Long Extension for Scalable Streams

   The following RTP header extension is RECOMMENDED for scalable
   streams.  It MAY also be used for non-scalable streams, in which case
   TID, LID and TL0PICIDX MUST be 0 or omitted.  The ID is assigned per
   [RFC8285], and the length is encoded as L=2 which indicates 3 octets
   of data when nothing is omitted, or L=1 for 2 octets when TL0PICIDX
   is omitted, or L=0 for 1 octet when both LID and TL0PICIDX are
   omitted.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |   LID         |    TL0PICIDX  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           or
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=1  |S|E|I|D|B| TID |   LID         | (TL0PICIDX omitted)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           or
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=0  |S|E|I|D|B| TID | (LID and TL0PICIDX omitted)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The following information is extracted from the media payload and
   sent in the Frame Marking RTP header extension.

   *  S: Start of Frame (1 bit) - MUST be 1 in the first packet in a
      frame within a layer; otherwise MUST be 0.
   *  E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame
      within a layer; otherwise MUST be 0.  Note that the RTP header
      marker bit MAY be used to infer the last packet of the highest
      enhancement layer, in payload formats with such semantics.
   *  I: Independent Frame (1 bit) - MUST be 1 for a frame within a
      layer that can be decoded independently of temporally prior
      frames, e.g., intra-frame, VPX keyframe, H.264 IDR [RFC6184],
      H.265 IDR/CRA/BLA/IRAP [RFC7798]; otherwise MUST be 0.  Note that
      this bit only signals temporal independence, so it can be 1 in
      spatial or quality enhancement layers that depend on temporally
      co-located layers but not temporally prior frames.
   *  D: Discardable Frame (1 bit) - MUST be 1 for a frame within a
      layer that the sender knows can be discarded while still providing
      a decodable media stream; otherwise MUST be 0.
   *  B: Base Layer Sync (1 bit) - When TID is not 0, this MUST be 1 if
      the sender knows this frame within a layer only depends on the
      base temporal layer; otherwise MUST be 0.  When TID is 0 or if no
      scalability is used, this MUST be 0.
   *  TID: Temporal ID (3 bits) - Identifies the temporal layer/sub-layer
      encoded, starting with 0 for the base layer, and increasing with
      higher temporal fidelity.  If no scalability is used, this MUST be
      0.  It is implicitly 0 in the short extension format.
   *  LID: Layer ID (8 bits) - Identifies the spatial and quality layer
      encoded, starting with 0 for the base layer, and increasing with
      higher fidelity.  If no scalability is used, this MUST be 0 or
      omitted to reduce length.  When omitted, TL0PICIDX MUST also be
      omitted.  It is implicitly 0 in the short extension format or when
      omitted in the long extension format.
   *  TL0PICIDX: Temporal Layer 0 Picture Index (8 bits) - When TID is 0
      and LID is 0, this is a cyclic counter labeling base layer frames.
      When TID is not 0 or LID is not 0, this indicates a dependency on
      the given index, such that this frame within this layer depends on
      the frame with this label in the layer with TID 0 and LID 0.  If
      no scalability is used, or the cyclic counter is unknown, this
      MUST be omitted to reduce length.  Note that 0 is a valid index
      value for TL0PICIDX.

   The layer information contained in TID and LID conveys useful aspects
   of the layer structure that can be utilized in selective forwarding.

   Without further information about the layer structure, these TID/LID
   identifiers can only be used for relative priority of layers and
   implicit dependencies between layers.  They convey a layer hierarchy
   with TID=0 and LID=0 identifying the base layer.  Higher values of
   TID identify higher temporal layers with higher frame rates.  Higher
   values of LID identify higher spatial and/or quality layers with
   higher resolutions and/or bitrates.  Implicit dependencies between
   layers assume that a layer with a given TID/LID MAY depend on
   layer(s) with the same or lower TID/LID, but MUST NOT depend on
   layer(s) with higher TID/LID.

   With further information, for example, possible future RTCP SDES
   items that convey full layer structure information, it may be
   possible to map these TIDs and LIDs to specific absolute frame rates,
   resolutions and bitrates, as well as explicit dependencies between
   layers.  Such additional layer information may be useful for
   forwarding decisions in the RTP switch, but is beyond the scope of
   this memo.
   The relative layer information is still useful for many selective
   forwarding decisions even without such additional layer information.

3.2.  Short Extension for Non-Scalable Streams

   The following RTP header extension is RECOMMENDED for non-scalable
   streams.  It is identical to the shortest form of the extension for
   scalable streams, except the last four bits (B and TID) are replaced
   with zeros.  It MAY also be used for scalable streams if the sender
   has limited or no information about stream scalability.  The ID is
   assigned per [RFC8285], and the length is encoded as L=0 which
   indicates 1 octet of data.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=0  |S|E|I|D|0 0 0 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The following information is extracted from the media payload and
   sent in the Frame Marking RTP header extension.

   *  S: Start of Frame (1 bit) - MUST be 1 in the first packet in a
      frame; otherwise MUST be 0.
   *  E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame;
      otherwise MUST be 0.  SHOULD match the RTP header marker bit in
      payload formats with such semantics for marking end of frame.
   *  I: Independent Frame (1 bit) - MUST be 1 for frames that can be
      decoded independently of temporally prior frames, e.g.,
      intra-frame, VPX keyframe, H.264 IDR [RFC6184], H.265
      IDR/CRA/BLA/IRAP [RFC7798]; otherwise MUST be 0.
   *  D: Discardable Frame (1 bit) - MUST be 1 for frames that the
      sender knows can be discarded while still providing a decodable
      media stream; otherwise MUST be 0.
   *  The remaining 4 bits are reserved/fixed values and not used for
      non-scalable streams; they MUST be set to 0 upon transmission and
      ignored upon reception.

3.3.  Layer ID Mappings for Scalable Streams

   This section maps the specific Layer ID information contained in
   specific scalable codecs to the generic LID and TID fields.
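
   As a non-normative illustration of the field layouts in Sections 3.1
   and 3.2, the following Python sketch decodes the 1 to 3 data octets
   that follow the one-byte ID/L header of [RFC8285].  The names
   FrameMarking and parse_frame_marking are hypothetical and are not
   defined by this memo; the sketch assumes the caller has already
   located the extension element and stripped its ID/L byte.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameMarking:
    """Hypothetical container for the decoded frame marking fields."""
    start: bool            # S bit
    end: bool              # E bit
    independent: bool      # I bit
    discardable: bool      # D bit
    base_layer_sync: bool  # B bit (always 0 in the short format)
    tid: int               # 3-bit Temporal ID (0 in the short format)
    lid: int = 0           # implicitly 0 when omitted
    tl0picidx: Optional[int] = None  # None when omitted; 0 is a valid index


def parse_frame_marking(data: bytes) -> FrameMarking:
    """Decode the extension data octets (L=0, L=1 or L=2)."""
    if not 1 <= len(data) <= 3:
        raise ValueError("frame marking data must be 1 to 3 octets")
    first = data[0]
    return FrameMarking(
        start=bool(first & 0x80),
        end=bool(first & 0x40),
        independent=bool(first & 0x20),
        discardable=bool(first & 0x10),
        base_layer_sync=bool(first & 0x08),
        tid=first & 0x07,
        lid=data[1] if len(data) >= 2 else 0,
        tl0picidx=data[2] if len(data) == 3 else None,
    )
```

   Because the short format sets the B and TID bits to zero, the same
   function handles both formats; a one-octet input simply yields
   TID=0, LID=0 and no TL0PICIDX.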

   Note that non-scalable streams have no Layer ID information and thus
   no mappings.

3.3.1.  VP9 LID Mapping

   The following shows the VP9 [I-D.ietf-payload-vp9] Spatial Layer ID
   (SID, 3 bits) and Temporal Layer ID (TID, 3 bits) from the VP9
   payload descriptor mapped to the generic LID and TID fields.

   The S bit MUST match the B bit in the VP9 payload descriptor.

   The E bit MUST match the E bit in the VP9 payload descriptor.

   The I bit MUST match the inverse of the P bit in the VP9 payload
   descriptor.

   The D bit MUST be 1 if the refresh_frame_flags in the VP9 payload
   uncompressed header are all 0, otherwise it MUST be 0.

   The B bit MUST be 0 if TID is 0; otherwise, if TID is not 0, it MUST
   match the U bit in the VP9 payload descriptor.  Note: When using
   temporally nested scalability structures as recommended in
   Section 3.5.2, the B bit and VP9 U bit will always be 1 if TID is not
   0, since it is always possible to switch up to a higher temporal
   layer in such nested structures.

   TID and TL0PICIDX MUST match the correspondingly named fields in the
   VP9 payload descriptor.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0| SID |    TL0PICIDX  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.2.  H265 LID Mapping

   The following shows the H265 [RFC7798] LayerID (6 bits) and TID (3
   bits) from the NAL unit header mapped to the generic LID and TID
   fields.

   The S and E bits MUST match the correspondingly named bits in
   PACI:PHES:TSCI payload structures.

   The I bit MUST be 1 when the NAL unit type is 16-23 (inclusive) or
   32-34 (inclusive), or an aggregation packet or fragmentation unit
   encapsulating any of these types, otherwise it MUST be 0.  These
   ranges cover intra (IRAP) frames as well as critical parameter sets
   (VPS, SPS, PPS).

   The D bit MUST be 1 when the NAL unit type is 0, 2, 4, 6, 8, 10, 12,
   14, or 38, or an aggregation packet or fragmentation unit
   encapsulating only these types, otherwise it MUST be 0.  These ranges
   cover non-reference frames as well as filler data.

   The B bit cannot be determined reliably from simple inspection of
   payload headers, and therefore is determined by
   implementation-specific means.  For example, internal codec
   interfaces may provide information to set this reliably.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|  LayerID  |    TL0PICIDX  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.3.  H264-SVC LID Mapping

   The following shows H264-SVC [RFC6190] Layer encoding information (3
   bits for spatial/dependency layer, 4 bits for quality layer and 3
   bits for temporal layer) mapped to the generic LID and TID fields.

   The S, E, I and D bits MUST match the correspondingly named bits in
   PACSI payload structures.

   The I bit MUST be 1 when the NAL unit type is 5, 7, 8, 13, or 15, or
   an aggregation packet or fragmentation unit encapsulating any of
   these types, otherwise it MUST be 0.  These ranges cover intra (IDR)
   frames as well as critical parameter sets (SPS/PPS variants).

   The D bit MUST be 1 when the NAL unit header NRI field is 0, or an
   aggregation packet or fragmentation unit encapsulating only NAL units
   with NRI=0, otherwise it MUST be 0.  The NRI=0 condition signals
   non-reference frames.

   The B bit cannot be determined reliably from simple inspection of
   payload headers, and therefore is determined by
   implementation-specific means.  For example, internal codec
   interfaces may provide information to set this reliably.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0| DID |  QID  |    TL0PICIDX  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.4.  H264 (AVC) LID Mapping

   The following shows the header extension for H264 (AVC) [RFC6184]
   that contains only temporal layer information.

   The S bit MUST be 1 when the timestamp in the RTP header differs from
   the timestamp in the prior RTP sequence number from the same SSRC,
   otherwise it MUST be 0.

   The E bit MUST match the M bit in the RTP header.

   The I bit MUST be 1 when the NAL unit type is 5, 7, or 8, or an
   aggregation packet or fragmentation unit encapsulating any of these
   types, otherwise it MUST be 0.  These ranges cover intra (IDR) frames
   as well as critical parameter sets (SPS/PPS).

   The D bit MUST be 1 when the NAL unit header NRI field is 0, or an
   aggregation packet or fragmentation unit encapsulating only NAL units
   with NRI=0, otherwise it MUST be 0.  The NRI=0 condition signals
   non-reference frames.

   The B bit cannot be determined reliably from simple inspection of
   payload headers, and therefore is determined by
   implementation-specific means.  For example, internal codec
   interfaces may provide information to set this reliably.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0|0|0|0|    TL0PICIDX  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.5.  VP8 LID Mapping

   The following shows the header extension for VP8 [RFC7741] that
   contains only temporal layer information.

   The S bit MUST match the correspondingly named bit in the VP8 payload
   descriptor when PID=0, otherwise it MUST be 0.

   The E bit MUST match the M bit in the RTP header.

   The I bit MUST match the inverse of the P bit in the VP8 payload
   header.

   The D bit MUST match the N bit in the VP8 payload descriptor.

   The B bit MUST match the Y bit in the VP8 payload descriptor.  Note:
   When using temporally nested scalability structures as recommended in
   Section 3.5.2, the B bit and VP8 Y bit will always be 1 if TID is not
   0, since it is always possible to switch up to a higher temporal
   layer in such nested structures.

   TID and TL0PICIDX MUST match the correspondingly named fields in the
   VP8 payload descriptor.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0|0|0|0|    TL0PICIDX  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.6.  Future Codec LID Mapping

   The RTP payload format specification for future video codecs SHOULD
   include a section describing the LID mapping and TID mapping for the
   codec.

3.4.  Signaling Information

   The URI for declaring this header extension in an extmap attribute is
   "urn:ietf:params:rtp-hdrext:framemarking".  It does not contain any
   extension attributes.

   An example attribute line in SDP:

      a=extmap:3 urn:ietf:params:rtp-hdrext:framemarking

3.5.  Usage Considerations

   The header extension values MUST represent what is already in the RTP
   payload.
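
   As a non-normative illustration of how an RTP switch might act on
   these values, the following Python sketch ranks queued frames for
   discard (D bit set first, then higher TID, then higher LID) and
   checks whether a set of spatial layers forms a switching point (I
   bit set in every layer).  All names are hypothetical, and ranking
   TID ahead of LID is an assumption of this sketch, not a requirement
   of this memo.

```python
from typing import Iterable, NamedTuple


class Marking(NamedTuple):
    """Hypothetical per-frame view of the frame marking fields."""
    independent: bool  # I bit
    discardable: bool  # D bit
    tid: int           # Temporal ID
    lid: int           # Layer ID


def drop_priority(m: Marking) -> tuple:
    """Sort key; a larger key means a better candidate for discard.

    D-marked frames rank highest, then higher TID, then higher LID.
    Ranking TID ahead of LID is an assumption made for illustration.
    """
    return (m.discardable, m.tid, m.lid)


def pick_frame_to_drop(frames: Iterable[Marking]) -> Marking:
    """Return the queued frame with the highest discard priority."""
    return max(frames, key=drop_priority)


def is_switch_point(layers: Iterable[Marking]) -> bool:
    """True if every spatial layer of one timestamp has the I bit set."""
    layers = list(layers)
    return bool(layers) and all(m.independent for m in layers)
```

   A real switch would also weigh factors outside this extension, such
   as per-receiver bandwidth estimates, before discarding anything.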

   When an RTP switch needs to discard a received video frame due to
   congestion control considerations, it is RECOMMENDED that it
   preferentially drop frames marked with the D (Discardable) bit set,
   or frames with the highest values of TID and LID, which indicate the
   highest temporal and spatial/quality enhancement layers, since those
   typically have fewer dependencies on them than lower layers.

   When an RTP switch wants to forward a new video stream to a receiver,
   it is RECOMMENDED to select the new video stream from the first
   switching point with the I (Independent) bit set in all spatial
   layers and forward the same.  An RTP switch can request a media
   source to generate a switching point by sending Full Intra Request
   (RTCP FIR) as defined in [RFC5104], for example.

3.5.1.  Relation to Layer Refresh Request (LRR)

   Receivers can use the Layer Refresh Request (LRR)
   [I-D.ietf-avtext-lrr] RTCP feedback message to upgrade to a higher
   layer in scalable encodings.  The TID/LID values and formats used in
   LRR messages MUST correspond to the same values and formats specified
   in Section 3.1.

   Because frame marking can only be used with temporally-nested
   streams, temporal-layer LRR refreshes are unnecessary for
   frame-marked streams.  Other refreshes can be detected based on the I
   bit being set for the specific spatial layers.

3.5.2.  Scalability Structures

   The LID and TID information is most useful for fixed scalability
   structures, such as nested hierarchical temporal layering structures,
   where each temporal layer only references lower temporal layers or
   the base temporal layer.  The LID and TID information is less useful,
   or even not useful at all, for complex, irregular scalability
   structures that do not conform to common, fixed patterns of
   inter-layer dependencies and referencing structures.  Therefore it is
   RECOMMENDED to use LID and TID information for RTP switch forwarding
   decisions only in the case of temporally nested scalability
   structures, and it is NOT RECOMMENDED for other (more complex or
   irregular) scalability structures.

4.  Security Considerations

   In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP
   header extensions are authenticated but usually not encrypted.  When
   header extensions are used, some of the payload type information is
   exposed and visible to middleboxes.  The encrypted media data is not
   exposed, so this is not seen as a high-risk exposure.

5.  Acknowledgements

   Many thanks to Bernard Aboba, Jonathan Lennox, Stephan Wenger, Dale
   Worley, and Magnus Westerlund for their inputs.

6.  IANA Considerations

   This document registers a new extension URI in the RTP Compact Header
   Extensions sub-registry of the Real-Time Transport Protocol (RTP)
   Parameters registry, according to the following data:

   Extension URI:  urn:ietf:params:rtp-hdrext:framemarking
   Description:  Frame marking information for video streams
   Contact:  mzanaty@cisco.com
   Reference:  RFC XXXX

   Note to RFC Editor: please replace RFC XXXX with the number of this
   RFC.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8285]  Singer, D., Desineni, H., and R. Even, Ed., "A General
              Mechanism for RTP Header Extensions", RFC 8285,
              DOI 10.17487/RFC8285, October 2017.

   [RFC6184]  Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
              Payload Format for H.264 Video", RFC 6184,
              DOI 10.17487/RFC6184, May 2011.

   [RFC6190]  Wenger, S., Wang, Y.-K., Schierl, T., and A.
              Eleftheriadis, "RTP Payload Format for Scalable Video
              Coding", RFC 6190, DOI 10.17487/RFC6190, May 2011.

   [RFC7741]  Westin, P., Lundin, H., Glover, M., Uberti, J., and F.
              Galligan, "RTP Payload Format for VP8 Video", RFC 7741,
              DOI 10.17487/RFC7741, March 2016.

   [RFC7798]  Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M.
              M. Hannuksela, "RTP Payload Format for High Efficiency
              Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798,
              March 2016.

7.2.  Informative References

   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
              DOI 10.17487/RFC7656, November 2015.

   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
              DOI 10.17487/RFC7667, November 2015.

   [RFC6464]  Lennox, J., Ed., Ivov, E., and E. Marocco, "A Real-time
              Transport Protocol (RTP) Header Extension for
              Client-to-Mixer Audio Level Indication", RFC 6464,
              DOI 10.17487/RFC6464, December 2011.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, DOI 10.17487/RFC3711, March 2004.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
              February 2008.

   [RFC8871]  Jones, P., Benham, D., and C. Groves, "A Solution
              Framework for Private Media in Privacy-Enhanced RTP
              Conferencing (PERC)", RFC 8871, DOI 10.17487/RFC8871,
              January 2021.

   [I-D.ietf-avtext-lrr]
              Lennox, J., Hong, D., Uberti, J., Holmer, S., and M.
              Flodman, "The Layer Refresh Request (LRR) RTCP Feedback
              Message", Work in Progress, Internet-Draft,
              draft-ietf-avtext-lrr-07, 2 July 2017.

   [I-D.ietf-payload-vp9]
              Uberti, J., Holmer, S., Flodman, M., Hong, D., and J.
              Lennox, "RTP Payload Format for VP9 Video", Work in
              Progress, Internet-Draft, draft-ietf-payload-vp9-16,
              10 June 2021.

Authors' Addresses

   Mo Zanaty
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   United States of America

   Email: mzanaty@cisco.com


   Espen Berger
   Cisco Systems

   Email: espeberg@cisco.com


   Suhas Nandakumar
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   United States of America

   Email: snandaku@cisco.com