idnits 2.17.1

draft-ietf-avtext-framemarking-12.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document date (March 10, 2021) is 1136 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-16) exists of
     draft-ietf-payload-vp9-10

     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                          M. Zanaty
Internet-Draft                                                 E. Berger
Intended status: Standards Track                           S. Nandakumar
Expires: September 11, 2021                                Cisco Systems
                                                          March 10, 2021

                   Frame Marking RTP Header Extension
                   draft-ietf-avtext-framemarking-12

Abstract

   This document describes a Frame Marking RTP header extension used to
   convey information about video frames that is critical for error
   recovery and packet forwarding in RTP middleboxes or network nodes.
   It is most useful when media is encrypted, and essential when the
   middlebox or node has no access to the media decryption keys. It is
   also useful for codec-agnostic processing of encrypted or unencrypted
   media, while it also supports extensions for codec-specific
   information.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 11, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Key Words for Normative Requirements
   3.  Frame Marking RTP Header Extension
     3.1.  Long Extension for Scalable Streams
     3.2.  Short Extension for Non-Scalable Streams
     3.3.  Layer ID Mappings for Scalable Streams
       3.3.1.  VP9 LID Mapping
       3.3.2.  H265 LID Mapping
       3.3.3.  H264-SVC LID Mapping
       3.3.4.  H264 (AVC) LID Mapping
       3.3.5.  VP8 LID Mapping
       3.3.6.  Future Codec LID Mapping
     3.4.  Signaling Information
     3.5.  Usage Considerations
       3.5.1.  Relation to Layer Refresh Request (LRR)
       3.5.2.  Scalability Structures
   4.  Security Considerations
   5.  Acknowledgements
   6.  IANA Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Authors' Addresses

1.  Introduction

   Many widely deployed RTP [RFC3550] topologies [RFC7667] used in
   modern voice and video conferencing systems include a centralized
   component that acts as an RTP switch. It receives voice and video
   streams from each participant, which may be encrypted using SRTP
   [RFC3711], or extensions that provide participants with private media
   [RFC8871] via end-to-end encryption where the switch has no access to
   media decryption keys. The goal is to provide a set of streams back
   to the participants which enable them to render the right media
   content. In a simple video configuration, for example, the goal will
   be that each participant sees and hears just the active speaker. In
   that case, the goal of the switch is to receive the voice and video
   streams from each participant, determine the active speaker based on
   energy in the voice packets, possibly using the client-to-mixer audio
   level RTP header extension [RFC6464], and select the corresponding
   video stream for transmission to participants; see Figure 1.

   In this document, an "RTP switch" is used as a common short term for
   the terms "switching RTP mixer", "source projecting middlebox",
   "source forwarding unit/middlebox" and "video switching MCU" as
   discussed in [RFC7667].

            +---+      +------------+      +---+
            | A |<---->|            |<---->| B |
            +---+      |            |      +---+
                       |    RTP     |
            +---+      |   Switch   |      +---+
            | C |<---->|            |<---->| D |
            +---+      +------------+      +---+

                        Figure 1: RTP switch

   To properly support switching of video streams, the RTP switch
   typically needs some critical information about video frames in
   order to start and stop forwarding streams.

   o  Because of inter-frame dependencies, it should ideally switch
      video streams at a point where the first frame from the new
      speaker can be decoded by recipients without prior frames, e.g.,
      switch on an intra-frame.
   o  In many cases, the switch may need to drop frames in order to
      realize congestion control techniques, and needs to know which
      frames can be dropped with minimal impact to video quality.
   o  For scalable streams with dependent layers, the switch may need to
      selectively forward specific layers to specific recipients due to
      recipient bandwidth or decoder limits.
   o  Furthermore, it is highly desirable to do this in a payload
      format-agnostic way which is not specific to each different video
      codec. Most modern video codecs share common concepts around
      frame types and other critical information to make this
      codec-agnostic handling possible.
   o  It is also desirable to be able to do this for SRTP without
      requiring the video switch to decrypt the packets. SRTP will
      encrypt the RTP payload format contents and consequently this data
      is not usable for the switching function without decryption, which
      may not even be possible in the case of end-to-end encryption of
      private media [RFC8871].

   By providing meta-information about the RTP streams outside the
   encrypted media payload, an RTP switch can do codec-agnostic
   selective forwarding without decrypting the payload. This document
   specifies the necessary meta-information in an RTP header extension.

2.  Key Words for Normative Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Frame Marking RTP Header Extension

   This specification uses RTP header extensions as defined in
   [RFC8285]. A subset of meta-information from the video stream is
   provided as an RTP header extension to allow an RTP switch to do
   generic selective forwarding of video streams encoded with
   potentially different video codecs.
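
   As a rough illustration of such generic selective forwarding, the
   following Python sketch shows how an RTP switch might act on frame
   marks alone, without touching the payload. It is not part of this
   specification: FrameMark, Receiver and should_forward are
   hypothetical names, the policy is deliberately simplified (it looks
   at one packet at a time and ignores frame boundaries and per-layer
   switching points), and the I, D, TID and LID values it reads are the
   fields defined in Section 3.1.

```python
from dataclasses import dataclass


@dataclass
class FrameMark:
    """Hypothetical, already-parsed frame marking values for one packet."""
    independent: bool   # I bit
    discardable: bool   # D bit
    tid: int = 0        # temporal layer
    lid: int = 0        # spatial/quality layer


@dataclass
class Receiver:
    """Hypothetical per-receiver forwarding state kept by the switch."""
    max_tid: int = 7
    max_lid: int = 255
    congested: bool = False
    started: bool = False   # True once an independent frame was forwarded


def should_forward(mark: FrameMark, rx: Receiver) -> bool:
    # Never forward layers above what this receiver can take.
    if mark.tid > rx.max_tid or mark.lid > rx.max_lid:
        return False
    # Wait for a frame that decodes without prior frames before starting.
    if not rx.started:
        if not mark.independent:
            return False
        rx.started = True
    # Under congestion, shed frames the sender marked as discardable.
    if rx.congested and mark.discardable:
        return False
    return True
```

   The point of the sketch is that every decision uses only header
   extension values, so the same logic applies to any codec and to
   encrypted payloads.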

   The Frame Marking RTP header extension is encoded using the one-byte
   header or two-byte header as described in [RFC8285]. The one-byte
   header format is used for examples in this memo. The two-byte header
   format is used when other two-byte header extensions are present in
   the same RTP packet, since mixing one-byte and two-byte extensions is
   not possible in the same RTP packet.

   This extension is only specified for Source (not Redundancy) RTP
   Streams [RFC7656] that carry video payloads. It is not specified for
   audio payloads, nor is it specified for Redundancy RTP Streams. The
   (separate) specifications for Redundancy RTP Streams often include
   provisions for recovering any header extensions that were part of the
   original source packet. Such provisions SHALL be followed to recover
   the Frame Marking RTP header extension of the original source packet.
   Source packet frame markings may be useful when generating Redundancy
   RTP Streams; for example, the I and D bits can be used to generate
   extra or no redundancy, respectively, and redundancy schemes with
   source blocks can align source block boundaries with Independent
   frame boundaries as marked by the I bit.

   A frame, in the context of this specification, is the set of RTP
   packets with the same RTP timestamp from a specific RTP
   synchronization source (SSRC). A frame within a layer is the set of
   RTP packets with the same RTP timestamp, SSRC, Temporal ID (TID), and
   Layer ID (LID).

3.1.  Long Extension for Scalable Streams

   The following RTP header extension is RECOMMENDED for scalable
   streams. It MAY also be used for non-scalable streams, in which case
   TID, LID and TL0PICIDX MUST be 0 or omitted.
   The ID is assigned per
   [RFC8285], and the length is encoded as L=2 which indicates 3 octets
   of data when nothing is omitted, or L=1 for 2 octets when TL0PICIDX
   is omitted, or L=0 for 1 octet when both LID and TL0PICIDX are
   omitted.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |      LID      |   TL0PICIDX   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                  or
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=1  |S|E|I|D|B| TID |      LID      | (TL0PICIDX omitted)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                  or
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=0  |S|E|I|D|B| TID | (LID and TL0PICIDX omitted)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The following information is extracted from the media payload and
   sent in the Frame Marking RTP header extension.

   o  S: Start of Frame (1 bit) - MUST be 1 in the first packet in a
      frame within a layer; otherwise MUST be 0.
   o  E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame
      within a layer; otherwise MUST be 0. Note that the RTP header
      marker bit MAY be used to infer the last packet of the highest
      enhancement layer, in payload formats with such semantics.
   o  I: Independent Frame (1 bit) - MUST be 1 for a frame within a
      layer that can be decoded independent of temporally prior frames,
      e.g. intra-frame, VPX keyframe, H.264 IDR [RFC6184], H.265
      IDR/CRA/BLA/IRAP [RFC7798]; otherwise MUST be 0. Note that this
      bit only signals temporal independence, so it can be 1 in spatial
      or quality enhancement layers that depend on temporally co-located
      layers but not temporally prior frames.
   o  D: Discardable Frame (1 bit) - MUST be 1 for a frame within a
      layer the sender knows can be discarded, and still provide a
      decodable media stream; otherwise MUST be 0.
   o  B: Base Layer Sync (1 bit) - When TID is not 0, this MUST be 1 if
      the sender knows this frame within a layer only depends on the
      base temporal layer; otherwise MUST be 0. When TID is 0 or if no
      scalability is used, this MUST be 0.
   o  TID: Temporal ID (3 bits) - Identifies the temporal layer/sub-layer
      encoded, starting with 0 for the base layer, and increasing
      with higher temporal fidelity. If no scalability is used, this
      MUST be 0. It is implicitly 0 in the short extension format.
   o  LID: Layer ID (8 bits) - Identifies the spatial and quality layer
      encoded, starting with 0 for the base layer, and increasing with
      higher fidelity. If no scalability is used, this MUST be 0 or
      omitted to reduce length. When omitted, TL0PICIDX MUST also be
      omitted. It is implicitly 0 in the short extension format or when
      omitted in the long extension format.
   o  TL0PICIDX: Temporal Layer 0 Picture Index (8 bits) - When TID is 0
      and LID is 0, this is a cyclic counter labeling base layer frames.
      When TID is not 0 or LID is not 0, this indicates a dependency on
      the given index, such that this frame within this layer depends on
      the frame with this label in the layer with TID 0 and LID 0. If
      no scalability is used, or the cyclic counter is unknown, this
      MUST be omitted to reduce length. Note that 0 is a valid index
      value for TL0PICIDX.

   The layer information contained in TID and LID conveys useful aspects
   of the layer structure that can be utilized in selective forwarding.

   Without further information about the layer structure, these TID/LID
   identifiers can only be used for relative priority of layers and
   implicit dependencies between layers. They convey a layer hierarchy
   with TID=0 and LID=0 identifying the base layer. Higher values of
   TID identify higher temporal layers with higher frame rates.
   Higher
   values of LID identify higher spatial and/or quality layers with
   higher resolutions and/or bitrates. Implicit dependencies between
   layers assume that a layer with a given TID/LID MAY depend on
   layer(s) with the same or lower TID/LID, but MUST NOT depend on
   layer(s) with higher TID/LID.

   With further information, for example, possible future RTCP SDES
   items that convey full layer structure information, it may be
   possible to map these TIDs and LIDs to specific absolute frame rates,
   resolutions and bitrates, as well as explicit dependencies between
   layers. Such additional layer information may be useful for
   forwarding decisions in the RTP switch, but is beyond the scope of
   this memo. The relative layer information is still useful for many
   selective forwarding decisions even without such additional layer
   information.

3.2.  Short Extension for Non-Scalable Streams

   The following RTP header extension is RECOMMENDED for non-scalable
   streams. It is identical to the shortest form of the extension for
   scalable streams, except the last four bits (B and TID) are replaced
   with zeros. It MAY also be used for scalable streams if the sender
   has limited or no information about stream scalability. The ID is
   assigned per [RFC8285], and the length is encoded as L=0 which
   indicates 1 octet of data.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=0  |S|E|I|D|0 0 0 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The following information is extracted from the media payload and
   sent in the Frame Marking RTP header extension.

   o  S: Start of Frame (1 bit) - MUST be 1 in the first packet in a
      frame; otherwise MUST be 0.
   o  E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame;
      otherwise MUST be 0. SHOULD match the RTP header marker bit in
      payload formats with such semantics for marking end of frame.
   o  I: Independent Frame (1 bit) - MUST be 1 for frames that can be
      decoded independent of temporally prior frames, e.g. intra-frame,
      VPX keyframe, H.264 IDR [RFC6184], H.265 IDR/CRA/BLA/IRAP
      [RFC7798]; otherwise MUST be 0.
   o  D: Discardable Frame (1 bit) - MUST be 1 for frames the sender
      knows can be discarded, and still provide a decodable media
      stream; otherwise MUST be 0.
   o  The remaining (4 bits) - are reserved/fixed values and not used
      for non-scalable streams; they MUST be set to 0 upon transmission
      and ignored upon reception.

3.3.  Layer ID Mappings for Scalable Streams

   This section maps the specific Layer ID information contained in
   specific scalable codecs to the generic LID and TID fields.

   Note that non-scalable streams have no Layer ID information and thus
   no mappings.

3.3.1.  VP9 LID Mapping

   The following shows the VP9 [I-D.ietf-payload-vp9] Spatial Layer ID
   (SID, 3 bits) and Temporal Layer ID (TID, 3 bits) from the VP9
   payload descriptor mapped to the generic LID and TID fields.

   The S bit MUST match the B bit in the VP9 payload descriptor.

   The E bit MUST match the E bit in the VP9 payload descriptor.

   The I bit MUST match the inverse of the P bit in the VP9 payload
   descriptor.

   The D bit MUST be 1 if the refresh_frame_flags in the VP9 payload
   uncompressed header are all 0, otherwise it MUST be 0.

   The B bit MUST be 0 if TID is 0; otherwise, if TID is not 0, it MUST
   match the U bit in the VP9 payload descriptor. Note: When using
   temporally nested scalability structures as recommended in
   Section 3.5.2, the B bit and VP9 U bit will always be 1 if TID is not
   0, since it is always possible to switch up to a higher temporal
   layer in such nested structures.

   TID and TL0PICIDX MUST match the correspondingly named fields in the
   VP9 payload descriptor.
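
   The VP9 rules above can be sketched in Python as follows. This is an
   illustration under stated assumptions, not a reference
   implementation: the function name vp9_frame_marking and its
   parameters are hypothetical, the VP9 values are assumed to have been
   parsed already from the payload descriptor and uncompressed header,
   and only the three data octets of the long extension are produced
   (the ID/length octet defined by [RFC8285] is left to the caller).

```python
def vp9_frame_marking(b: int, e: int, p: int, u: int, tid: int, sid: int,
                      tl0picidx: int, refresh_frame_flags: int) -> bytes:
    """Build the 3 data octets of the long extension from VP9 values.

    b, e, p, u, tid, sid and tl0picidx are assumed to come from the VP9
    payload descriptor, refresh_frame_flags from the uncompressed header.
    """
    s_bit = b & 1                        # S matches the VP9 B bit
    e_bit = e & 1                        # E matches the VP9 E bit
    i_bit = (p & 1) ^ 1                  # I is the inverse of the VP9 P bit
    d_bit = 1 if refresh_frame_flags == 0 else 0
    b_bit = 0 if tid == 0 else (u & 1)   # B is 0 at TID 0, else the U bit
    # First octet layout, most significant bit first: S E I D B TID(3).
    first = ((s_bit << 7) | (e_bit << 6) | (i_bit << 5) | (d_bit << 4)
             | (b_bit << 3) | (tid & 0x07))
    lid = sid & 0x07                     # SID occupies the low 3 bits of LID
    return bytes([first, lid, tl0picidx & 0xFF])


# First packet of a keyframe in the base layer.
print(vp9_frame_marking(1, 0, 0, 0, 0, 0, 5, 0xFF).hex())  # a00005
```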

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0| SID |   TL0PICIDX   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.2.  H265 LID Mapping

   The following shows the H265 [RFC7798] LayerID (6 bits) and TID (3
   bits) from the NAL unit header mapped to the generic LID and TID
   fields.

   The S and E bits MUST match the correspondingly named bits in
   PACI:PHES:TSCI payload structures.

   The I bit MUST be 1 when the NAL unit type is 16-23 (inclusive) or
   32-34 (inclusive), or an aggregation packet or fragmentation unit
   encapsulating any of these types, otherwise it MUST be 0. These
   ranges cover intra (IRAP) frames as well as critical parameter sets
   (VPS, SPS, PPS).

   The D bit MUST be 1 when the NAL unit type is 0, 2, 4, 6, 8, 10, 12,
   14, or 38, or an aggregation packet or fragmentation unit
   encapsulating only these types, otherwise it MUST be 0. These ranges
   cover non-reference frames as well as filler data.

   The B bit cannot be determined reliably from simple inspection of
   payload headers, and therefore is determined by
   implementation-specific means. For example, internal codec
   interfaces may provide information to set this reliably.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|  LayerID  |   TL0PICIDX   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.3.  H264-SVC LID Mapping

   The following shows H264-SVC [RFC6190] Layer encoding information (3
   bits for spatial/dependency layer, 4 bits for quality layer and 3
   bits for temporal layer) mapped to the generic LID and TID fields.
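
   As a small illustration of this packing, the Python sketch below
   combines the 3-bit dependency layer (DID) and 4-bit quality layer
   (QID) into the 8-bit LID, with the most significant bit left at
   zero, following the layout diagram of this section. The helper
   names svc_lid and split_svc_lid are hypothetical and not defined by
   this specification.

```python
from typing import Tuple


def svc_lid(did: int, qid: int) -> int:
    """Pack DID (3 bits) and QID (4 bits) into an 8-bit LID: 0|DID|QID."""
    if not 0 <= did <= 7 or not 0 <= qid <= 15:
        raise ValueError("DID must fit in 3 bits and QID in 4 bits")
    return (did << 4) | qid


def split_svc_lid(lid: int) -> Tuple[int, int]:
    """Recover (DID, QID) from an LID packed by svc_lid."""
    return (lid >> 4) & 0x07, lid & 0x0F


print(hex(svc_lid(2, 5)))  # 0x25
```

   Because DID sits in the higher bits, comparing two packed LID values
   orders layers by spatial/dependency layer first and quality layer
   second.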

   The S, E, I and D bits MUST match the correspondingly named bits in
   PACSI payload structures.

   The I bit MUST be 1 when the NAL unit type is 5, 7, 8, 13, or 15, or
   an aggregation packet or fragmentation unit encapsulating any of
   these types, otherwise it MUST be 0. These ranges cover intra (IDR)
   frames as well as critical parameter sets (SPS/PPS variants).

   The D bit MUST be 1 when the NAL unit header NRI field is 0, or an
   aggregation packet or fragmentation unit encapsulating only NAL units
   with NRI=0, otherwise it MUST be 0. The NRI=0 condition signals
   non-reference frames.

   The B bit cannot be determined reliably from simple inspection of
   payload headers, and therefore is determined by
   implementation-specific means. For example, internal codec
   interfaces may provide information to set this reliably.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0| DID |  QID  |   TL0PICIDX   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.4.  H264 (AVC) LID Mapping

   The following shows the header extension for H264 (AVC) [RFC6184]
   that contains only temporal layer information.

   The S bit MUST be 1 when the timestamp in the RTP header differs from
   the timestamp in the prior RTP sequence number from the same SSRC,
   otherwise it MUST be 0.

   The E bit MUST match the M bit in the RTP header.

   The I bit MUST be 1 when the NAL unit type is 5, 7, or 8, or an
   aggregation packet or fragmentation unit encapsulating any of these
   types, otherwise it MUST be 0. These ranges cover intra (IDR) frames
   as well as critical parameter sets (SPS/PPS).

   The D bit MUST be 1 when the NAL unit header NRI field is 0, or an
   aggregation packet or fragmentation unit encapsulating only NAL units
   with NRI=0, otherwise it MUST be 0.
   The NRI=0 condition signals non-reference frames.

   The B bit cannot be determined reliably from simple inspection of
   payload headers, and therefore is determined by
   implementation-specific means. For example, internal codec
   interfaces may provide information to set this reliably.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0|0|0|0|   TL0PICIDX   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.5.  VP8 LID Mapping

   The following shows the header extension for VP8 [RFC7741] that
   contains only temporal layer information.

   The S bit MUST match the correspondingly named bit in the VP8 payload
   descriptor when PID=0, otherwise it MUST be 0.

   The E bit MUST match the M bit in the RTP header.

   The I bit MUST match the inverse of the P bit in the VP8 payload
   header.

   The D bit MUST match the N bit in the VP8 payload descriptor.

   The B bit MUST match the Y bit in the VP8 payload descriptor. Note:
   When using temporally nested scalability structures as recommended in
   Section 3.5.2, the B bit and VP8 Y bit will always be 1 if TID is not
   0, since it is always possible to switch up to a higher temporal
   layer in such nested structures.

   TID and TL0PICIDX MUST match the correspondingly named fields in the
   VP8 payload descriptor.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0|0|0|0|   TL0PICIDX   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.6.  Future Codec LID Mapping

   The RTP payload format specification for future video codecs SHOULD
   include a section describing the LID mapping and TID mapping for the
   codec.

3.4.  Signaling Information

   The URI for declaring this header extension in an extmap attribute is
   "urn:ietf:params:rtp-hdrext:framemarking". It does not contain any
   extension attributes.

   An example attribute line in SDP:

      a=extmap:3 urn:ietf:params:rtp-hdrext:framemarking

3.5.  Usage Considerations

   The header extension values MUST represent what is already in the RTP
   payload.

   When an RTP switch needs to discard a received video frame due to
   congestion control considerations, it is RECOMMENDED that it
   preferably drop frames marked with the D (Discardable) bit set, or
   the highest values of TID and LID, which indicate the highest
   temporal and spatial/quality enhancement layers, since those
   typically have fewer dependencies on them than lower layers.

   When an RTP switch wants to forward a new video stream to a receiver,
   it is RECOMMENDED to select the new video stream from the first
   switching point with the I (Independent) bit set in all spatial
   layers and forward the same. An RTP switch can request a media
   source to generate a switching point by sending Full Intra Request
   (RTCP FIR) as defined in [RFC5104], for example.

3.5.1.  Relation to Layer Refresh Request (LRR)

   Receivers can use the Layer Refresh Request (LRR)
   [I-D.ietf-avtext-lrr] RTCP feedback message to upgrade to a higher
   layer in scalable encodings. The TID/LID values and formats used in
   LRR messages MUST correspond to the same values and formats specified
   in Section 3.1.

   Because frame marking can only be used with temporally-nested
   streams, temporal-layer LRR refreshes are unnecessary for
   frame-marked streams. Other refreshes can be detected based on the
   I bit being set for the specific spatial layers.

3.5.2.  Scalability Structures

   The LID and TID information is most useful for fixed scalability
   structures, such as nested hierarchical temporal layering structures,
   where each temporal layer only references lower temporal layers or
   the base temporal layer. The LID and TID information is less useful,
   or even not useful at all, for complex, irregular scalability
   structures that do not conform to common, fixed patterns of
   inter-layer dependencies and referencing structures. Therefore it is
   RECOMMENDED to use LID and TID information for RTP switch forwarding
   decisions only in the case of temporally nested scalability
   structures, and it is NOT RECOMMENDED for other (more complex or
   irregular) scalability structures.

4.  Security Considerations

   In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP
   header extensions are authenticated but usually not encrypted. When
   header extensions are used, some of the payload type information is
   exposed and visible to middleboxes. The encrypted media data is not
   exposed, so this is not seen as a high risk exposure.

5.  Acknowledgements

   Many thanks to Bernard Aboba, Jonathan Lennox, Stephan Wenger, Dale
   Worley, and Magnus Westerlund for their inputs.

6.  IANA Considerations

   This document defines a new extension URI to the RTP Compact
   Header Extensions sub-registry of the Real-Time Transport Protocol
   (RTP) Parameters registry, according to the following data:

   Extension URI: urn:ietf:params:rtp-hdrext:framemarking
   Description: Frame marking information for video streams
   Contact: mzanaty@cisco.com
   Reference: RFC XXXX

   Note to RFC Editor: please replace RFC XXXX with the number of this
   RFC.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6184]  Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP
              Payload Format for H.264 Video", RFC 6184,
              DOI 10.17487/RFC6184, May 2011.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              DOI 10.17487/RFC6190, May 2011.

   [RFC7741]  Westin, P., Lundin, H., Glover, M., Uberti, J., and F.
              Galligan, "RTP Payload Format for VP8 Video", RFC 7741,
              DOI 10.17487/RFC7741, March 2016.

   [RFC7798]  Wang, Y., Sanchez, Y., Schierl, T., Wenger, S., and M.
              Hannuksela, "RTP Payload Format for High Efficiency Video
              Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, March
              2016.

   [RFC8285]  Singer, D., Desineni, H., and R. Even, Ed., "A General
              Mechanism for RTP Header Extensions", RFC 8285,
              DOI 10.17487/RFC8285, October 2017.

7.2.  Informative References

   [I-D.ietf-avtext-lrr]
              Lennox, J., Hong, D., Uberti, J., Holmer, S., and M.
              Flodman, "The Layer Refresh Request (LRR) RTCP Feedback
              Message", draft-ietf-avtext-lrr-07 (work in progress),
              July 2017.

   [I-D.ietf-payload-vp9]
              Uberti, J., Holmer, S., Flodman, M., Hong, D., and J.
              Lennox, "RTP Payload Format for VP9 Video",
              draft-ietf-payload-vp9-10 (work in progress), July 2020.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, DOI 10.17487/RFC3711, March 2004.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
              February 2008.

   [RFC6464]  Lennox, J., Ed., Ivov, E., and E.
              Marocco, "A Real-time Transport Protocol (RTP) Header
              Extension for Client-to-Mixer Audio Level Indication",
              RFC 6464, DOI 10.17487/RFC6464, December 2011.

   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
              DOI 10.17487/RFC7656, November 2015.

   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
              DOI 10.17487/RFC7667, November 2015.

   [RFC8871]  Jones, P., Benham, D., and C. Groves, "A Solution
              Framework for Private Media in Privacy-Enhanced RTP
              Conferencing (PERC)", RFC 8871, DOI 10.17487/RFC8871,
              January 2021.

Authors' Addresses

   Mo Zanaty
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US

   Email: mzanaty@cisco.com

   Espen Berger
   Cisco Systems

   Phone: +47 98228179
   Email: espeberg@cisco.com

   Suhas Nandakumar
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US

   Email: snandaku@cisco.com