Network Working Group                                          M. Zanaty
Internet-Draft                                                 E. Berger
Intended status: Standards Track                           S. Nandakumar
Expires: February 5, 2021                                  Cisco Systems
                                                          August 4, 2020

                   Frame Marking RTP Header Extension
                   draft-ietf-avtext-framemarking-11

Abstract

   This document describes a Frame Marking RTP header extension used to
   convey information about video frames that is critical for error
   recovery and packet forwarding in RTP middleboxes or network nodes.
   It is most useful when media is encrypted, and essential when the
   middlebox or node has no access to the media decryption keys.  It is
   also useful for codec-agnostic processing of encrypted or unencrypted
   media, while it also supports extensions for codec-specific
   information.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 5, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Key Words for Normative Requirements
   3.  Frame Marking RTP Header Extension
     3.1.  Long Extension for Scalable Streams
     3.2.  Short Extension for Non-Scalable Streams
     3.3.  Layer ID Mappings for Scalable Streams
       3.3.1.  H265 LID Mapping
       3.3.2.  H264-SVC LID Mapping
       3.3.3.  H264 (AVC) LID Mapping
       3.3.4.  VP8 LID Mapping
       3.3.5.  Future Codec LID Mapping
     3.4.  Signaling Information
     3.5.  Usage Considerations
       3.5.1.  Relation to Layer Refresh Request (LRR)
       3.5.2.  Scalability Structures
   4.  Security Considerations
   5.  Acknowledgements
   6.  IANA Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Authors' Addresses

1.  Introduction

   Many widely deployed RTP [RFC3550] topologies [RFC7667] used in
   modern voice and video conferencing systems include a centralized
   component that acts as an RTP switch.
   It receives voice and video
   streams from each participant, which may be encrypted using SRTP
   [RFC3711], or extensions that provide participants with private media
   [I-D.ietf-perc-private-media-framework] via end-to-end encryption
   where the switch has no access to media decryption keys.  The goal is
   to provide a set of streams back to the participants which enable
   them to render the right media content.  In a simple video
   configuration, for example, the goal will be that each participant
   sees and hears just the active speaker.  In that case, the goal of
   the switch is to receive the voice and video streams from each
   participant, determine the active speaker based on energy in the
   voice packets, possibly using the client-to-mixer audio level RTP
   header extension [RFC6464], and select the corresponding video stream
   for transmission to participants; see Figure 1.

   In this document, an "RTP switch" is used as a common short term for
   the terms "switching RTP mixer", "source projecting middlebox",
   "source forwarding unit/middlebox" and "video switching MCU" as
   discussed in [RFC7667].

            +---+      +------------+      +---+
            | A |<---->|            |<---->| B |
            +---+      |            |      +---+
                       |    RTP     |
            +---+      |   Switch   |      +---+
            | C |<---->|            |<---->| D |
            +---+      +------------+      +---+

                          Figure 1: RTP switch

   In order to properly support switching of video streams, the RTP
   switch typically needs some critical information about video frames
   in order to start and stop forwarding streams.

   o  Because of inter-frame dependencies, it should ideally switch
      video streams at a point where the first frame from the new
      speaker can be decoded by recipients without prior frames, e.g.,
      switch on an intra-frame.
   o  In many cases, the switch may need to drop frames in order to
      realize congestion control techniques, and needs to know which
      frames can be dropped with minimal impact to video quality.
   o  For scalable streams with dependent layers, the switch may need to
      selectively forward specific layers to specific recipients due to
      recipient bandwidth or decoder limits.
   o  Furthermore, it is highly desirable to do this in a payload
      format-agnostic way which is not specific to each different video
      codec.  Most modern video codecs share common concepts around
      frame types and other critical information to make this codec-
      agnostic handling possible.
   o  It is also desirable to be able to do this for SRTP without
      requiring the video switch to decrypt the packets.  SRTP will
      encrypt the RTP payload format contents and consequently this data
      is not usable for the switching function without decryption, which
      may not even be possible in the case of end-to-end encryption of
      private media [I-D.ietf-perc-private-media-framework].

   By providing meta-information about the RTP streams outside the
   encrypted media payload, an RTP switch can do codec-agnostic
   selective forwarding without decrypting the payload.  This document
   specifies the necessary meta-information in an RTP header extension.

2.  Key Words for Normative Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

3.  Frame Marking RTP Header Extension

   This specification uses RTP header extensions as defined in
   [RFC8285].  A subset of meta-information from the video stream is
   provided as an RTP header extension to allow an RTP switch to do
   generic selective forwarding of video streams encoded with
   potentially different video codecs.

   The Frame Marking RTP header extension is encoded using the one-byte
   header or two-byte header as described in [RFC8285].  The one-byte
   header format is used for examples in this memo.
   The two-byte header
   format is used when other two-byte header extensions are present in
   the same RTP packet, since mixing one-byte and two-byte extensions is
   not possible in the same RTP packet.

   This extension is only specified for Source (not Redundancy) RTP
   Streams [RFC7656] that carry video payloads.  It is not specified for
   audio payloads, nor is it specified for Redundancy RTP Streams.  The
   (separate) specifications for Redundancy RTP Streams often include
   provisions for recovering any header extensions that were part of the
   original source packet.  Such provisions SHALL be followed to recover
   the Frame Marking RTP header extension of the original source packet.
   Source packet frame markings may be useful when generating Redundancy
   RTP Streams; for example, the I and D bits can be used to generate
   extra or no redundancy, respectively, and redundancy schemes with
   source blocks can align source block boundaries with Independent
   frame boundaries as marked by the I bit.

   A frame, in the context of this specification, is the set of RTP
   packets with the same RTP timestamp from a specific RTP
   synchronization source (SSRC).  A frame within a layer is the set of
   RTP packets with the same RTP timestamp, SSRC, Temporal ID (TID), and
   Layer ID (LID).

3.1.  Long Extension for Scalable Streams

   The following RTP header extension is RECOMMENDED for scalable
   streams.  It MAY also be used for non-scalable streams, in which case
   TID, LID and TL0PICIDX MUST be 0 or omitted.  The ID is assigned per
   [RFC8285], and the length is encoded as L=2 which indicates 3 octets
   of data when nothing is omitted, or L=1 for 2 octets when TL0PICIDX
   is omitted, or L=0 for 1 octet when both LID and TL0PICIDX are
   omitted.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |   LID         |    TL0PICIDX  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                 or
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=1  |S|E|I|D|B| TID |   LID         | (TL0PICIDX omitted)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                 or
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=0  |S|E|I|D|B| TID | (LID and TL0PICIDX omitted)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The following information is extracted from the media payload and
   sent in the Frame Marking RTP header extension.

   o  S: Start of Frame (1 bit) - MUST be 1 in the first packet in a
      frame within a layer; otherwise MUST be 0.
   o  E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame
      within a layer; otherwise MUST be 0.  Note that the RTP header
      marker bit MAY be used to infer the last packet of the highest
      enhancement layer, in payload formats with such semantics.
   o  I: Independent Frame (1 bit) - MUST be 1 for a frame within a
      layer that can be decoded independent of temporally prior frames,
      e.g. intra-frame, VPX keyframe, H.264 IDR [RFC6184], H.265
      IDR/CRA/BLA/IRAP [RFC7798]; otherwise MUST be 0.  Note that this
      bit only signals temporal independence, so it can be 1 in spatial
      or quality enhancement layers that depend on temporally co-located
      layers but not temporally prior frames.
   o  D: Discardable Frame (1 bit) - MUST be 1 for a frame within a
      layer the sender knows can be discarded, and still provide a
      decodable media stream; otherwise MUST be 0.
   o  B: Base Layer Sync (1 bit) - When TID is not 0, this MUST be 1 if
      the sender knows this frame within a layer only depends on the
      base temporal layer; otherwise MUST be 0.  When TID is 0 or if no
      scalability is used, this MUST be 0.
   o  TID: Temporal ID (3 bits) - Identifies the temporal layer/sub-
      layer encoded, starting with 0 for the base layer, and increasing
      with higher temporal fidelity.
      If no scalability is used, this
      MUST be 0.  It is implicitly 0 in the short extension format.
   o  LID: Layer ID (8 bits) - Identifies the spatial and quality layer
      encoded, starting with 0 for the base layer, and increasing with
      higher fidelity.  If no scalability is used, this MUST be 0 or
      omitted to reduce length.  When omitted, TL0PICIDX MUST also be
      omitted.  It is implicitly 0 in the short extension format or when
      omitted in the long extension format.
   o  TL0PICIDX: Temporal Layer 0 Picture Index (8 bits) - When TID is 0
      and LID is 0, this is a cyclic counter labeling base layer frames.
      When TID is not 0 or LID is not 0, this indicates a dependency on
      the given index, such that this frame within this layer depends on
      the frame with this label in the layer with TID 0 and LID 0.  If
      no scalability is used, or the cyclic counter is unknown, this
      MUST be omitted to reduce length.  Note that 0 is a valid index
      value for TL0PICIDX.

   The layer information contained in TID and LID conveys useful aspects
   of the layer structure that can be utilized in selective forwarding.

   Without further information about the layer structure, these TID/LID
   identifiers can only be used for relative priority of layers and
   implicit dependencies between layers.  They convey a layer hierarchy
   with TID=0 and LID=0 identifying the base layer.  Higher values of
   TID identify higher temporal layers with higher frame rates.  Higher
   values of LID identify higher spatial and/or quality layers with
   higher resolutions and/or bitrates.  Implicit dependencies between
   layers assume that a layer with a given TID/LID MAY depend on
   layer(s) with the same or lower TID/LID, but MUST NOT depend on
   layer(s) with higher TID/LID.
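
   As an informal illustration (not part of this specification), the
   field layout and the implicit dependency rule described above could
   be sketched in Python as follows.  The names FrameMarking,
   parse_frame_marking and may_depend_on are hypothetical, the input is
   assumed to be the 1 to 3 data octets that follow the one-byte
   [RFC8285] ID/length octet, and reading "same or lower TID/LID" as a
   per-field comparison is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FrameMarking:
    """Hypothetical container for the fields of the long extension."""
    start: bool                 # S bit
    end: bool                   # E bit
    independent: bool           # I bit
    discardable: bool           # D bit
    base_layer_sync: bool       # B bit
    tid: int                    # 3-bit Temporal ID
    lid: int                    # 8-bit Layer ID, implicitly 0 when omitted
    tl0picidx: Optional[int]    # None when omitted


def parse_frame_marking(data: bytes) -> FrameMarking:
    """Parse the extension data octets (L=0, L=1 or L=2 forms)."""
    if not 1 <= len(data) <= 3:
        raise ValueError("frame marking data must be 1 to 3 octets")
    flags = data[0]
    return FrameMarking(
        start=bool(flags & 0x80),
        end=bool(flags & 0x40),
        independent=bool(flags & 0x20),
        discardable=bool(flags & 0x10),
        base_layer_sync=bool(flags & 0x08),
        tid=flags & 0x07,
        lid=data[1] if len(data) >= 2 else 0,
        tl0picidx=data[2] if len(data) == 3 else None,
    )


def may_depend_on(frame: FrameMarking, other: FrameMarking) -> bool:
    """Assumed reading of the implicit rule: a layer may depend only on
    layers whose TID and LID are both the same or lower."""
    return other.tid <= frame.tid and other.lid <= frame.lid


# Usage: a 3-octet (L=2) marking with S=1, I=1, TID=2, LID=1, TL0PICIDX=7.
enhancement = parse_frame_marking(bytes([0xA2, 0x01, 0x07]))
# A 1-octet (L=0) marking with S=1 and D=1; LID and TL0PICIDX are omitted.
base = parse_frame_marking(bytes([0x90]))
```

   A 1-octet input is parsed the same way whether it came from the long
   or the short format; a receiver that knows the short format is in use
   would ignore the last four bits instead of reading them as B and TID.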

   With further information, for example, possible future RTCP SDES
   items that convey full layer structure information, it may be
   possible to map these TIDs and LIDs to specific absolute frame rates,
   resolutions and bitrates, as well as explicit dependencies between
   layers.  Such additional layer information may be useful for
   forwarding decisions in the RTP switch, but is beyond the scope of
   this memo.  The relative layer information is still useful for many
   selective forwarding decisions even without such additional layer
   information.

3.2.  Short Extension for Non-Scalable Streams

   The following RTP header extension is RECOMMENDED for non-scalable
   streams.  It is identical to the shortest form of the extension for
   scalable streams, except the last four bits (B and TID) are replaced
   with zeros.  It MAY also be used for scalable streams if the sender
   has limited or no information about stream scalability.  The ID is
   assigned per [RFC8285], and the length is encoded as L=0 which
   indicates 1 octet of data.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=0  |S|E|I|D|0 0 0 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The following information is extracted from the media payload and
   sent in the Frame Marking RTP header extension.

   o  S: Start of Frame (1 bit) - MUST be 1 in the first packet in a
      frame; otherwise MUST be 0.
   o  E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame;
      otherwise MUST be 0.  SHOULD match the RTP header marker bit in
      payload formats with such semantics for marking end of frame.
   o  I: Independent Frame (1 bit) - MUST be 1 for frames that can be
      decoded independent of temporally prior frames, e.g. intra-frame,
      VPX keyframe, H.264 IDR [RFC6184], H.265 IDR/CRA/BLA/IRAP
      [RFC7798]; otherwise MUST be 0.
   o  D: Discardable Frame (1 bit) - MUST be 1 for frames the sender
      knows can be discarded, and still provide a decodable media
      stream; otherwise MUST be 0.
   o  The remaining (4 bits) - are reserved/fixed values and not used
      for non-scalable streams; they MUST be set to 0 upon transmission
      and ignored upon reception.

3.3.  Layer ID Mappings for Scalable Streams

   This section maps the specific Layer ID information contained in
   specific scalable codecs to the generic LID and TID fields.

   Note that non-scalable streams have no Layer ID information and thus
   no mappings.

3.3.1.  H265 LID Mapping

   The following shows the H265 [RFC7798] LayerID (6 bits) and TID (3
   bits) from the NAL unit header mapped to the generic LID and TID
   fields.

   The S and E bits MUST match the correspondingly named bits in
   PACI:PHES:TSCI payload structures.

   The I bit MUST be 1 when the NAL unit type is 16-23 (inclusive) or
   32-34 (inclusive), or an aggregation packet or fragmentation unit
   encapsulating any of these types, otherwise it MUST be 0.  These
   ranges cover intra (IRAP) frames as well as critical parameter sets
   (VPS, SPS, PPS).

   The D bit MUST be 1 when the NAL unit type is 0, 2, 4, 6, 8, 10, 12,
   14, or 38, or an aggregation packet or fragmentation unit
   encapsulating only these types, otherwise it MUST be 0.  These ranges
   cover non-reference frames as well as filler data.

   The B bit cannot be determined reliably from simple inspection of
   payload headers, and therefore is determined by implementation-
   specific means.  For example, internal codec interfaces may provide
   information to set this reliably.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|  LayerID  |    TL0PICIDX  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.2.  H264-SVC LID Mapping

   The following shows H264-SVC [RFC6190] Layer encoding information (3
   bits for spatial/dependency layer, 4 bits for quality layer and 3
   bits for temporal layer) mapped to the generic LID and TID fields.

   The S, E, I and D bits MUST match the correspondingly named bits in
   PACSI payload structures.

   The I bit MUST be 1 when the NAL unit type is 5, 7, 8, 13, or 15, or
   an aggregation packet or fragmentation unit encapsulating any of
   these types, otherwise it MUST be 0.  These ranges cover intra (IDR)
   frames as well as critical parameter sets (SPS/PPS variants).

   The D bit MUST be 1 when the NAL unit header NRI field is 0, or an
   aggregation packet or fragmentation unit encapsulating only NAL units
   with NRI=0, otherwise it MUST be 0.  The NRI=0 condition signals non-
   reference frames.

   The B bit cannot be determined reliably from simple inspection of
   payload headers, and therefore is determined by implementation-
   specific means.  For example, internal codec interfaces may provide
   information to set this reliably.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0| DID |  QID  |    TL0PICIDX  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.3.  H264 (AVC) LID Mapping

   The following shows the header extension for H264 (AVC) [RFC6184]
   that contains only temporal layer information.

   The S bit MUST be 1 when the timestamp in the RTP header differs from
   the timestamp in the prior RTP sequence number from the same SSRC,
   otherwise it MUST be 0.

   The E bit MUST match the M bit in the RTP header.

   The I bit MUST be 1 when the NAL unit type is 5, 7, or 8, or an
   aggregation packet or fragmentation unit encapsulating any of these
   types, otherwise it MUST be 0.
   These ranges cover intra (IDR) frames
   as well as critical parameter sets (SPS/PPS).

   The D bit MUST be 1 when the NAL unit header NRI field is 0, or an
   aggregation packet or fragmentation unit encapsulating only NAL units
   with NRI=0, otherwise it MUST be 0.  The NRI=0 condition signals non-
   reference frames.

   The B bit cannot be determined reliably from simple inspection of
   payload headers, and therefore is determined by implementation-
   specific means.  For example, internal codec interfaces may provide
   information to set this reliably.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0|0|0|0|    TL0PICIDX  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.4.  VP8 LID Mapping

   The following shows the header extension for VP8 [RFC7741] that
   contains only temporal layer information.

   The S bit MUST match the correspondingly named bit in the VP8 payload
   descriptor when PID=0, otherwise it MUST be 0.

   The E bit MUST match the M bit in the RTP header.

   The I bit MUST match the inverse of the P bit in the VP8 payload
   header.

   The D bit MUST match the N bit in the VP8 payload descriptor.

   The B bit MUST match the Y bit in the VP8 payload descriptor.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0|0|0|0|    TL0PICIDX  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.5.  Future Codec LID Mapping

   The RTP payload format specification for future video codecs SHOULD
   include a section describing the LID mapping and TID mapping for the
   codec.  For example, the LID/TID mapping for the VP9 codec is
   described in the VP9 RTP Payload Format [I-D.ietf-payload-vp9].

3.4.  Signaling Information

   The URI for declaring this header extension in an extmap attribute is
   "urn:ietf:params:rtp-hdrext:framemarking".  It does not contain any
   extension attributes.

   An example attribute line in SDP:

      a=extmap:3 urn:ietf:params:rtp-hdrext:framemarking

3.5.  Usage Considerations

   The header extension values MUST represent what is already in the RTP
   payload.

   When an RTP switch needs to discard a received video frame due to
   congestion control considerations, it is RECOMMENDED that it
   preferentially drop frames marked with the D (Discardable) bit set,
   or frames with the highest values of TID and LID, which indicate the
   highest temporal and spatial/quality enhancement layers, since those
   typically have fewer dependencies on them than lower layers.

   When an RTP switch wants to forward a new video stream to a receiver,
   it is RECOMMENDED to select the new video stream from the first
   switching point with the I (Independent) bit set in all spatial
   layers and to forward the stream from that point onward.  An RTP
   switch can request a media source to generate a switching point by
   sending Full Intra Request (RTCP FIR) as defined in [RFC5104], for
   example.

3.5.1.  Relation to Layer Refresh Request (LRR)

   Receivers can use the Layer Refresh Request (LRR)
   [I-D.ietf-avtext-lrr] RTCP feedback message to upgrade to a higher
   layer in scalable encodings.  The TID/LID values and formats used in
   LRR messages MUST correspond to the same values and formats specified
   in Section 3.1.

   Because frame marking can only be used with temporally-nested
   streams, temporal-layer LRR refreshes are unnecessary for frame-
   marked streams.  Other refreshes can be detected based on the I bit
   being set for the specific spatial layers.

3.5.2.  Scalability Structures

   The LID and TID information is most useful for fixed scalability
   structures, such as nested hierarchical temporal layering structures,
   where each temporal layer only references lower temporal layers or
   the base temporal layer.  The LID and TID information is less useful,
   or even not useful at all, for complex, irregular scalability
   structures that do not conform to common, fixed patterns of inter-
   layer dependencies and referencing structures.  Therefore it is
   RECOMMENDED to use LID and TID information for RTP switch forwarding
   decisions only in the case of temporally nested scalability
   structures, and it is NOT RECOMMENDED for other (more complex or
   irregular) scalability structures.

4.  Security Considerations

   In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP
   header extensions are authenticated but usually not encrypted.  When
   header extensions are used, some of the payload type information is
   exposed and visible to middleboxes.  The encrypted media data is not
   exposed, so this is not seen as a high-risk exposure.

5.  Acknowledgements

   Many thanks to Bernard Aboba, Jonathan Lennox, Stephan Wenger, Dale
   Worley, and Magnus Westerlund for their inputs.

6.  IANA Considerations

   This document defines a new extension URI to the RTP Compact
   Header Extensions sub-registry of the Real-Time Transport Protocol
   (RTP) Parameters registry, according to the following data:

   Extension URI: urn:ietf:params:rtp-hdrext:framemarking
   Description: Frame marking information for video streams
   Contact: mzanaty@cisco.com
   Reference: RFC XXXX

   Note to RFC Editor: please replace RFC XXXX with the number of this
   RFC.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6184]  Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP
              Payload Format for H.264 Video", RFC 6184,
              DOI 10.17487/RFC6184, May 2011.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              DOI 10.17487/RFC6190, May 2011.

   [RFC7741]  Westin, P., Lundin, H., Glover, M., Uberti, J., and F.
              Galligan, "RTP Payload Format for VP8 Video", RFC 7741,
              DOI 10.17487/RFC7741, March 2016.

   [RFC7798]  Wang, Y., Sanchez, Y., Schierl, T., Wenger, S., and M.
              Hannuksela, "RTP Payload Format for High Efficiency Video
              Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, March
              2016.

   [RFC8285]  Singer, D., Desineni, H., and R. Even, Ed., "A General
              Mechanism for RTP Header Extensions", RFC 8285,
              DOI 10.17487/RFC8285, October 2017.

7.2.  Informative References

   [I-D.ietf-avtext-lrr]
              Lennox, J., Hong, D., Uberti, J., Holmer, S., and M.
              Flodman, "The Layer Refresh Request (LRR) RTCP Feedback
              Message", draft-ietf-avtext-lrr-07 (work in progress),
              July 2017.

   [I-D.ietf-payload-vp9]
              Uberti, J., Holmer, S., Flodman, M., Hong, D., and J.
              Lennox, "RTP Payload Format for VP9 Video", draft-ietf-
              payload-vp9-10 (work in progress), July 2020.

   [I-D.ietf-perc-private-media-framework]
              Jones, P., Benham, D., and C. Groves, "A Solution
              Framework for Private Media in Privacy Enhanced RTP
              Conferencing (PERC)", draft-ietf-perc-private-media-
              framework-12 (work in progress), June 2019.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, DOI 10.17487/RFC3711, March 2004.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
              February 2008.

   [RFC6464]  Lennox, J., Ed., Ivov, E., and E. Marocco, "A Real-time
              Transport Protocol (RTP) Header Extension for Client-to-
              Mixer Audio Level Indication", RFC 6464,
              DOI 10.17487/RFC6464, December 2011.

   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
              DOI 10.17487/RFC7656, November 2015.

   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
              DOI 10.17487/RFC7667, November 2015.

Authors' Addresses

   Mo Zanaty
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA  95134
   US

   Email: mzanaty@cisco.com

   Espen Berger
   Cisco Systems

   Phone: +47 98228179
   Email: espeberg@cisco.com

   Suhas Nandakumar
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA  95134
   US

   Email: snandaku@cisco.com