idnits 2.17.1 

draft-ietf-avtext-framemarking-12.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document date (March 10, 2021) is 1136 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-16) exists of
     draft-ietf-payload-vp9-10


     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                          M. Zanaty
3	Internet-Draft                                                 E. Berger
4	Intended status: Standards Track                           S. Nandakumar
5	Expires: September 11, 2021                                Cisco Systems
6	                                                          March 10, 2021

8	                   Frame Marking RTP Header Extension
9	                   draft-ietf-avtext-framemarking-12

11	Abstract

13	   This document describes a Frame Marking RTP header extension used to
14	   convey information about video frames that is critical for error
15	   recovery and packet forwarding in RTP middleboxes or network nodes.
16	   It is most useful when media is encrypted, and essential when the
17	   middlebox or node has no access to the media decryption keys.  It is
18	   also useful for codec-agnostic processing of encrypted or unencrypted
19	   media, while it also supports extensions for codec-specific
20	   information.

22	Status of This Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current Internet-
30	   Drafts is at https://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   This Internet-Draft will expire on September 11, 2021.

39	Copyright Notice

41	   Copyright (c) 2021 IETF Trust and the persons identified as the
42	   document authors.  All rights reserved.

44	   This document is subject to BCP 78 and the IETF Trust's Legal
45	   Provisions Relating to IETF Documents
46	   (https://trustee.ietf.org/license-info) in effect on the date of
47	   publication of this document.  Please review these documents
48	   carefully, as they describe your rights and restrictions with respect
49	   to this document.  Code Components extracted from this document must
50	   include Simplified BSD License text as described in Section 4.e of
51	   the Trust Legal Provisions and are provided without warranty as
52	   described in the Simplified BSD License.

54	Table of Contents

56	   1.  Introduction
57	   2.  Key Words for Normative Requirements
58	   3.  Frame Marking RTP Header Extension
59	     3.1.  Long Extension for Scalable Streams
60	     3.2.  Short Extension for Non-Scalable Streams
61	     3.3.  Layer ID Mappings for Scalable Streams
62	       3.3.1.  VP9 LID Mapping
63	       3.3.2.  H265 LID Mapping
64	       3.3.3.  H264-SVC LID Mapping
65	       3.3.4.  H264 (AVC) LID Mapping
66	       3.3.5.  VP8 LID Mapping
67	       3.3.6.  Future Codec LID Mapping
68	     3.4.  Signaling Information
69	     3.5.  Usage Considerations
70	       3.5.1.  Relation to Layer Refresh Request (LRR)
71	       3.5.2.  Scalability Structures
72	   4.  Security Considerations
73	   5.  Acknowledgements
74	   6.  IANA Considerations
75	   7.  References
76	     7.1.  Normative References
77	     7.2.  Informative References
78	   Authors' Addresses

80	1.  Introduction

82	   Many widely deployed RTP [RFC3550] topologies [RFC7667] used in
83	   modern voice and video conferencing systems include a centralized
84	   component that acts as an RTP switch.  It receives voice and video
85	   streams from each participant, which may be encrypted using SRTP
86	   [RFC3711], or extensions that provide participants with private media
87	   [RFC8871] via end-to-end encryption where the switch has no access to
88	   media decryption keys.  The goal is to provide a set of streams back
89	   to the participants which enable them to render the right media
90	   content.  In a simple video configuration, for example, the goal will
91	   be that each participant sees and hears just the active speaker.  In
92	   that case, the goal of the switch is to receive the voice and video
93	   streams from each participant, determine the active speaker based on
94	   energy in the voice packets, possibly using the client-to-mixer audio
95	   level RTP header extension [RFC6464], and select the corresponding
96	   video stream for transmission to participants; see Figure 1.

98	   In this document, the term "RTP switch" is used as shorthand for the
99	   terms "switching RTP mixer", "source projecting middlebox", "source
100	   forwarding unit/middlebox", and "video switching MCU" as discussed in
101	   [RFC7667].

103	            +---+      +------------+      +---+
104	            | A |<---->|            |<---->| B |
105	            +---+      |            |      +---+
106	                       |   RTP      |
107	            +---+      |  Switch    |      +---+
108	            | C |<---->|            |<---->| D |
109	            +---+      +------------+      +---+

111	                           Figure 1: RTP switch

113	   To properly support switching of video streams, the RTP switch
114	   typically needs some critical information about video frames in
115	   order to decide when to start and stop forwarding streams.

117	   o  Because of inter-frame dependencies, it should ideally switch
118	      video streams at a point where the first frame from the new
119	      speaker can be decoded by recipients without prior frames, e.g.,
120	      switch on an intra-frame.
121	   o  In many cases, the switch may need to drop frames in order to
122	      realize congestion control techniques, and needs to know which
123	      frames can be dropped with minimal impact to video quality.
124	   o  For scalable streams with dependent layers, the switch may need to
125	      selectively forward specific layers to specific recipients due to
126	      recipient bandwidth or decoder limits.
127	   o  Furthermore, it is highly desirable to do this in a payload
128	      format-agnostic way which is not specific to each different video
129	      codec.  Most modern video codecs share common concepts around
130	      frame types and other critical information to make this codec-
131	      agnostic handling possible.
132	   o  It is also desirable to be able to do this for SRTP without
133	      requiring the video switch to decrypt the packets.  SRTP will
134	      encrypt the RTP payload format contents and consequently this data
135	      is not usable for the switching function without decryption, which
136	      may not even be possible in the case of end-to-end encryption of
137	      private media [RFC8871].

139	   By providing meta-information about the RTP streams outside the
140	   encrypted media payload, an RTP switch can do codec-agnostic
141	   selective forwarding without decrypting the payload.  This document
142	   specifies the necessary meta-information in an RTP header extension.

144	2.  Key Words for Normative Requirements

146	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
147	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
148	   "OPTIONAL" in this document are to be interpreted as described in
149	   [RFC2119].

150	3.  Frame Marking RTP Header Extension

152	   This specification uses RTP header extensions as defined in
153	   [RFC8285].  A subset of meta-information from the video stream is
154	   provided as an RTP header extension to allow an RTP switch to do
155	   generic selective forwarding of video streams encoded with
156	   potentially different video codecs.

158	   The Frame Marking RTP header extension is encoded using the one-byte
159	   header or two-byte header as described in [RFC8285].  The one-byte
160	   header format is used for examples in this memo.  The two-byte header
161	   format is used when other two-byte header extensions are present in
162	   the same RTP packet, since mixing one-byte and two-byte extensions is
163	   not possible in the same RTP packet.
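
	   As a non-normative illustration, the one-byte header element that
	   carries this extension could be assembled as sketched below.  The
	   layout (a 4-bit ID followed by a 4-bit length field holding the
	   number of data octets minus one) follows [RFC8285]; the function
	   name one_byte_element is an assumption made for this example only.

```python
def one_byte_element(ext_id: int, data: bytes) -> bytes:
    """Prefix extension data with a one-byte header (4-bit ID, 4-bit L)."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs are limited to 1..14")
    if not 1 <= len(data) <= 16:
        raise ValueError("one-byte header elements carry 1..16 data octets")
    # L encodes the number of data octets minus one.
    return bytes([(ext_id << 4) | (len(data) - 1)]) + data
```

	   With a negotiated ID of 3 and three octets of frame marking data,
	   the first octet of the element is 0x32 (ID=3, L=2).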

165	   This extension is only specified for Source (not Redundancy) RTP
166	   Streams [RFC7656] that carry video payloads.  It is not specified for
167	   audio payloads, nor is it specified for Redundancy RTP Streams.  The
168	   (separate) specifications for Redundancy RTP Streams often include
169	   provisions for recovering any header extensions that were part of the
170	   original source packet.  Such provisions SHALL be followed to recover
171	   the Frame Marking RTP header extension of the original source packet.
172	   Source packet frame markings may be useful when generating Redundancy
173	   RTP Streams; for example, the I and D bits can be used to generate
174	   extra or no redundancy, respectively, and redundancy schemes with
175	   source blocks can align source block boundaries with Independent
176	   frame boundaries as marked by the I bit.

178	   A frame, in the context of this specification, is the set of RTP
179	   packets with the same RTP timestamp from a specific RTP
180	   synchronization source (SSRC).  A frame within a layer is the set of
181	   RTP packets with the same RTP timestamp, SSRC, Temporal ID (TID), and
182	   Layer ID (LID).

184	3.1.  Long Extension for Scalable Streams

186	   The following RTP header extension is RECOMMENDED for scalable
187	   streams.  It MAY also be used for non-scalable streams, in which case
188	   TID, LID and TL0PICIDX MUST be 0 or omitted.  The ID is assigned per
189	   [RFC8285], and the length is encoded as L=2 which indicates 3 octets
190	   of data when nothing is omitted, or L=1 for 2 octets when TL0PICIDX
191	   is omitted, or L=0 for 1 octet when both LID and TL0PICIDX are
192	   omitted.

194	    0                   1                   2                   3
195	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
196	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
197	   |  ID=? |  L=2  |S|E|I|D|B| TID |   LID         |    TL0PICIDX  |
198	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
199	              or
200	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
201	   |  ID=? |  L=1  |S|E|I|D|B| TID |   LID         | (TL0PICIDX omitted)
202	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
203	              or
204	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
205	   |  ID=? |  L=0  |S|E|I|D|B| TID | (LID and TL0PICIDX omitted)
206	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

208	   The following information is extracted from the media payload and
209	   sent in the Frame Marking RTP header extension.

211	   o  S: Start of Frame (1 bit) - MUST be 1 in the first packet in a
212	      frame within a layer; otherwise MUST be 0.
213	   o  E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame
214	      within a layer; otherwise MUST be 0.  Note that the RTP header
215	      marker bit MAY be used to infer the last packet of the highest
216	      enhancement layer, in payload formats with such semantics.
217	   o  I: Independent Frame (1 bit) - MUST be 1 for a frame within a
218	      layer that can be decoded independent of temporally prior frames,
219	      e.g. intra-frame, VPX keyframe, H.264 IDR [RFC6184], H.265
220	      IDR/CRA/BLA/IRAP [RFC7798]; otherwise MUST be 0.  Note that this
221	      bit only signals temporal independence, so it can be 1 in spatial
222	      or quality enhancement layers that depend on temporally co-located
223	      layers but not temporally prior frames.
224	   o  D: Discardable Frame (1 bit) - MUST be 1 for a frame within a
225	      layer the sender knows can be discarded, and still provide a
226	      decodable media stream; otherwise MUST be 0.
227	   o  B: Base Layer Sync (1 bit) - When TID is not 0, this MUST be 1 if
228	      the sender knows this frame within a layer only depends on the
229	      base temporal layer; otherwise MUST be 0.  When TID is 0 or if no
230	      scalability is used, this MUST be 0.
231	   o  TID: Temporal ID (3 bits) - Identifies the temporal layer/sub-
232	      layer encoded, starting with 0 for the base layer, and increasing
233	      with higher temporal fidelity.  If no scalability is used, this
234	      MUST be 0.  It is implicitly 0 in the short extension format.
235	   o  LID: Layer ID (8 bits) - Identifies the spatial and quality layer
236	      encoded, starting with 0 for the base layer, and increasing with
237	      higher fidelity.  If no scalability is used, this MUST be 0 or
238	      omitted to reduce length.  When omitted, TL0PICIDX MUST also be
239	      omitted.  It is implicitly 0 in the short extension format or when
240	      omitted in the long extension format.
241	   o  TL0PICIDX: Temporal Layer 0 Picture Index (8 bits) - When TID is 0
242	      and LID is 0, this is a cyclic counter labeling base layer frames.
243	      When TID is not 0 or LID is not 0, this indicates a dependency on
244	      the given index, such that this frame within this layer depends on
245	      the frame with this label in the layer with TID 0 and LID 0.  If
246	      no scalability is used, or the cyclic counter is unknown, this
247	      MUST be omitted to reduce length.  Note that 0 is a valid index
248	      value for TL0PICIDX.
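
	   The field layout above can be decoded with a few bit masks.  The
	   following is a minimal, non-normative sketch; the names FrameMarking
	   and parse_frame_marking are hypothetical, and the input is assumed
	   to be the 1 to 3 octets of extension data that follow the ID/L
	   octet.

```python
from typing import NamedTuple, Optional


class FrameMarking(NamedTuple):
    start: bool
    end: bool
    independent: bool
    discardable: bool
    base_layer_sync: bool
    tid: int
    lid: int                   # implicitly 0 when omitted
    tl0picidx: Optional[int]   # None when omitted


def parse_frame_marking(data: bytes) -> FrameMarking:
    """Decode the long-format extension data (L=0, L=1, or L=2)."""
    if not 1 <= len(data) <= 3:
        raise ValueError("frame marking data must be 1 to 3 octets")
    flags = data[0]
    return FrameMarking(
        start=bool(flags & 0x80),            # S
        end=bool(flags & 0x40),              # E
        independent=bool(flags & 0x20),      # I
        discardable=bool(flags & 0x10),      # D
        base_layer_sync=bool(flags & 0x08),  # B
        tid=flags & 0x07,                    # TID (3 bits)
        lid=data[1] if len(data) >= 2 else 0,
        tl0picidx=data[2] if len(data) == 3 else None,
    )
```

	   TL0PICIDX is represented as None rather than 0 when omitted, since
	   0 is a valid index value.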

250	   The layer information contained in TID and LID conveys useful aspects
251	   of the layer structure that can be utilized in selective forwarding.

253	   Without further information about the layer structure, these TID/LID
254	   identifiers can only be used for relative priority of layers and
255	   implicit dependencies between layers.  They convey a layer hierarchy
256	   with TID=0 and LID=0 identifying the base layer.  Higher values of
257	   TID identify higher temporal layers with higher frame rates.  Higher
258	   values of LID identify higher spatial and/or quality layers with
259	   higher resolutions and/or bitrates.  Implicit dependencies between
260	   layers assume that a layer with a given TID/LID MAY depend on
261	   layer(s) with the same or lower TID/LID, but MUST NOT depend on
262	   layer(s) with higher TID/LID.
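
	   The implicit dependency rule can be expressed as a simple
	   comparison.  The sketch below is non-normative and assumes the rule
	   is applied to TID and LID independently (a referenced layer must be
	   the same or lower in both); the function names are hypothetical.

```python
def may_depend_on(tid: int, lid: int, ref_tid: int, ref_lid: int) -> bool:
    """True if layer (tid, lid) is allowed to depend on (ref_tid, ref_lid)."""
    return ref_tid <= tid and ref_lid <= lid


def should_forward(tid: int, lid: int, max_tid: int, max_lid: int) -> bool:
    """Forward only layers at or below the receiver's target layer."""
    return tid <= max_tid and lid <= max_lid
```

	   Because no forwarded layer may depend on a higher one, a switch
	   that drops everything above a target TID/LID still forwards a
	   self-contained set of layers under this interpretation.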

264	   With further information, for example, possible future RTCP SDES
265	   items that convey full layer structure information, it may be
266	   possible to map these TIDs and LIDs to specific absolute frame rates,
267	   resolutions and bitrates, as well as explicit dependencies between
268	   layers.  Such additional layer information may be useful for
269	   forwarding decisions in the RTP switch, but is beyond the scope of
270	   this memo.  The relative layer information is still useful for many
271	   selective forwarding decisions even without such additional layer
272	   information.

274	3.2.  Short Extension for Non-Scalable Streams

276	   The following RTP header extension is RECOMMENDED for non-scalable
277	   streams.  It is identical to the shortest form of the extension for
278	   scalable streams, except the last four bits (B and TID) are replaced
279	   with zeros.  It MAY also be used for scalable streams if the sender
280	   has limited or no information about stream scalability.  The ID is
281	   assigned per [RFC8285], and the length is encoded as L=0 which
282	   indicates 1 octet of data.

284	    0                   1
285	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
286	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
287	   |  ID=? |  L=0  |S|E|I|D|0 0 0 0|
288	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

290	   The following information is extracted from the media payload and
291	   sent in the Frame Marking RTP header extension.

293	   o  S: Start of Frame (1 bit) - MUST be 1 in the first packet in a
294	      frame; otherwise MUST be 0.
295	   o  E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame;
296	      otherwise MUST be 0.  SHOULD match the RTP header marker bit in
297	      payload formats with such semantics for marking end of frame.
298	   o  I: Independent Frame (1 bit) - MUST be 1 for frames that can be
299	      decoded independent of temporally prior frames, e.g. intra-frame,
300	      VPX keyframe, H.264 IDR [RFC6184], H.265 IDR/CRA/BLA/IRAP
301	      [RFC7798]; otherwise MUST be 0.
302	   o  D: Discardable Frame (1 bit) - MUST be 1 for frames the sender
303	      knows can be discarded, and still provide a decodable media
304	      stream; otherwise MUST be 0.
305	   o  The remaining (4 bits) - are reserved/fixed values and not used
306	      for non-scalable streams; they MUST be set to 0 upon transmission
307	      and ignored upon reception.
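
	   A non-normative sketch of the short format follows.  The sender
	   sets the four reserved bits to 0, and the receiver ignores them;
	   the names build_short and parse_short are assumptions made for this
	   example.

```python
from typing import Tuple


def build_short(start: bool, end: bool, independent: bool,
                discardable: bool) -> int:
    """Pack S, E, I, D into the high nibble; reserved bits are sent as 0."""
    return (int(start) << 7) | (int(end) << 6) | \
           (int(independent) << 5) | (int(discardable) << 4)


def parse_short(octet: int) -> Tuple[bool, bool, bool, bool]:
    """Return (S, E, I, D); the low four bits are ignored on reception."""
    return (bool(octet & 0x80), bool(octet & 0x40),
            bool(octet & 0x20), bool(octet & 0x10))
```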

309	3.3.  Layer ID Mappings for Scalable Streams

311	   This section maps the specific Layer ID information contained in
312	   specific scalable codecs to the generic LID and TID fields.

314	   Note that non-scalable streams have no Layer ID information and thus
315	   no mappings.

317	3.3.1.  VP9 LID Mapping

319	   The following shows the VP9 [I-D.ietf-payload-vp9] Spatial Layer ID
320	   (SID, 3 bits) and Temporal Layer ID (TID, 3 bits) from the VP9
321	   payload descriptor mapped to the generic LID and TID fields.

323	   The S bit MUST match the B bit in the VP9 payload descriptor.

325	   The E bit MUST match the E bit in the VP9 payload descriptor.

327	   The I bit MUST match the inverse of the P bit in the VP9 payload
328	   descriptor.

330	   The D bit MUST be 1 if the refresh_frame_flags in the VP9 payload
331	   uncompressed header are all 0, otherwise it MUST be 0.

333	   The B bit MUST be 0 if TID is 0; otherwise, if TID is not 0, it MUST
334	   match the U bit in the VP9 payload descriptor.  Note: When using
335	   temporally nested scalability structures as recommended in
336	   Section 3.5.2, the B bit and VP9 U bit will always be 1 if TID is not
337	   0, since it is always possible to switch up to a higher temporal
338	   layer in such nested structures.

340	   TID and TL0PICIDX MUST match the correspondingly named fields in the
341	   VP9 payload descriptor.

343	      0                   1                   2                   3
344	      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
345	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
346	     |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0| SID |    TL0PICIDX  |
347	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
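
	   The VP9 rules above can be sketched as follows.  This is a
	   non-normative illustration that assumes the payload descriptor
	   fields (B, E, P, U, TID, SID, TL0PICIDX) and refresh_frame_flags
	   have already been parsed by the caller; vp9_frame_marking is a
	   hypothetical helper name.

```python
def vp9_frame_marking(b: bool, e: bool, p: bool, u: bool, tid: int,
                      sid: int, tl0picidx: int,
                      refresh_frame_flags: int) -> bytes:
    """Build the 3 octets of frame marking data from VP9 fields."""
    s_bit = int(bool(b))                    # S matches descriptor B bit
    e_bit = int(bool(e))                    # E matches descriptor E bit
    i_bit = int(not p)                      # I is the inverse of P
    d_bit = int(refresh_frame_flags == 0)   # D when no buffers refreshed
    b_bit = int(bool(u)) if tid != 0 else 0  # B is 0 for TID 0, else U
    flags = (s_bit << 7) | (e_bit << 6) | (i_bit << 5) | \
            (d_bit << 4) | (b_bit << 3) | (tid & 0x07)
    # LID carries SID in its low 3 bits; the upper 5 bits are 0.
    return bytes([flags, sid & 0x07, tl0picidx & 0xFF])
```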

349	3.3.2.  H265 LID Mapping

351	   The following shows the H265 [RFC7798] LayerID (6 bits) and TID (3
352	   bits) from the NAL unit header mapped to the generic LID and TID
353	   fields.

355	   The S and E bits MUST match the correspondingly named bits in
356	   PACI:PHES:TSCI payload structures.

358	   The I bit MUST be 1 when the NAL unit type is 16-23 (inclusive) or
359	   32-34 (inclusive), or an aggregation packet or fragmentation unit
360	   encapsulating any of these types, otherwise it MUST be 0.  These
361	   ranges cover intra (IRAP) frames as well as critical parameter sets
362	   (VPS, SPS, PPS).

364	   The D bit MUST be 1 when the NAL unit type is 0, 2, 4, 6, 8, 10, 12,
365	   14, or 38, or an aggregation packet or fragmentation unit
366	   encapsulating only these types, otherwise it MUST be 0.  These ranges
367	   cover non-reference frames as well as filler data.

369	   The B bit cannot be determined reliably from simple inspection of
370	   payload headers, and therefore is determined by implementation-
371	   specific means.  For example, internal codec interfaces may provide
372	   information to set this reliably.

374	    0                   1                   2                   3
375	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
376	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
377	   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|  LayerID  |    TL0PICIDX  |
378	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
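
	   The I and D rules for H265 reduce to checks on NAL unit types.  The
	   non-normative sketch below assumes the caller supplies the list of
	   NAL unit types carried in the packet (one entry for a single NAL
	   unit, or the encapsulated types for an aggregation packet or
	   fragmentation unit); the function names are hypothetical.

```python
from typing import Sequence

# NAL unit types listed above for the D bit.
H265_DISCARDABLE_TYPES = {0, 2, 4, 6, 8, 10, 12, 14, 38}


def h265_i_bit(nal_types: Sequence[int]) -> bool:
    """I is 1 if any carried NAL unit type is 16-23 or 32-34."""
    return any(16 <= t <= 23 or 32 <= t <= 34 for t in nal_types)


def h265_d_bit(nal_types: Sequence[int]) -> bool:
    """D is 1 only if every carried NAL unit type is discardable."""
    return bool(nal_types) and all(
        t in H265_DISCARDABLE_TYPES for t in nal_types)
```

	   Note the asymmetry: I uses "any of these types", whereas D uses
	   "only these types".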

380	3.3.3.  H264-SVC LID Mapping

382	   The following shows H264-SVC [RFC6190] Layer encoding information (3
383	   bits for spatial/dependency layer, 4 bits for quality layer and 3
384	   bits for temporal layer) mapped to the generic LID and TID fields.

386	   The S, E, I and D bits MUST match the correspondingly named bits in
387	   PACSI payload structures.

389	   The I bit MUST be 1 when the NAL unit type is 5, 7, 8, 13, or 15, or
390	   an aggregation packet or fragmentation unit encapsulating any of
391	   these types, otherwise it MUST be 0.  These ranges cover intra (IDR)
392	   frames as well as critical parameter sets (SPS/PPS variants).

394	   The D bit MUST be 1 when the NAL unit header NRI field is 0, or an
395	   aggregation packet or fragmentation unit encapsulating only NAL units
396	   with NRI=0, otherwise it MUST be 0.  The NRI=0 condition signals non-
397	   reference frames.

399	   The B bit cannot be determined reliably from simple inspection of
400	   payload headers, and therefore is determined by implementation-
401	   specific means.  For example, internal codec interfaces may provide
402	   information to set this reliably.

404	    0                   1                   2                   3
405	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
406	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
407	   |  ID=? |  L=2  |S|E|I|D|B| TID |0| DID |  QID  |    TL0PICIDX  |
408	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
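
	   Per the diagram above, the LID octet for H264-SVC holds a zero bit,
	   the 3-bit dependency ID (DID), and the 4-bit quality ID (QID).  A
	   non-normative packing sketch, with hypothetical function names:

```python
from typing import Tuple


def svc_lid(did: int, qid: int) -> int:
    """Pack DID (3 bits) and QID (4 bits) into the LID octet."""
    return ((did & 0x07) << 4) | (qid & 0x0F)


def svc_split_lid(lid: int) -> Tuple[int, int]:
    """Recover (DID, QID) from the LID octet."""
    return (lid >> 4) & 0x07, lid & 0x0F
```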

410	3.3.4.  H264 (AVC) LID Mapping

412	   The following shows the header extension for H264 (AVC) [RFC6184]
413	   that contains only temporal layer information.

415	   The S bit MUST be 1 when the timestamp in the RTP header differs from
416	   the timestamp in the prior RTP sequence number from the same SSRC,
417	   otherwise it MUST be 0.

419	   The E bit MUST match the M bit in the RTP header.

421	   The I bit MUST be 1 when the NAL unit type is 5, 7, or 8, or an
422	   aggregation packet or fragmentation unit encapsulating any of these
423	   types, otherwise it MUST be 0.  These ranges cover intra (IDR) frames
424	   as well as critical parameter sets (SPS/PPS).

426	   The D bit MUST be 1 when the NAL unit header NRI field is 0, or an
427	   aggregation packet or fragmentation unit encapsulating only NAL units
428	   with NRI=0, otherwise it MUST be 0.  The NRI=0 condition signals non-
429	   reference frames.

431	   The B bit cannot be determined reliably from simple inspection of
432	   payload headers, and therefore is determined by implementation-
433	   specific means.  For example, internal codec interfaces may provide
434	   information to set this reliably.

436	    0                   1                   2                   3
437	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
438	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
439	   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0|0|0|0|    TL0PICIDX  |
440	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
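
	   For H264 (AVC), the I and D bits can be derived from the one-octet
	   NAL unit header, which holds the 2-bit NRI field and the 5-bit NAL
	   unit type [RFC6184].  The non-normative sketch below assumes the
	   caller passes the header octets of the NAL units carried in the
	   packet (the encapsulated units for aggregation packets or
	   fragmentation units); the function names are hypothetical.

```python
from typing import Sequence

H264_INDEPENDENT_TYPES = {5, 7, 8}  # IDR slice, SPS, PPS


def h264_i_bit(nal_headers: Sequence[int]) -> bool:
    """I is 1 if any carried NAL unit has type 5, 7, or 8."""
    return any((h & 0x1F) in H264_INDEPENDENT_TYPES for h in nal_headers)


def h264_d_bit(nal_headers: Sequence[int]) -> bool:
    """D is 1 only if every carried NAL unit has NRI equal to 0."""
    return bool(nal_headers) and all(
        ((h >> 5) & 0x03) == 0 for h in nal_headers)
```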

442	3.3.5.  VP8 LID Mapping

444	   The following shows the header extension for VP8 [RFC7741] that
445	   contains only temporal layer information.

447	   The S bit MUST match the correspondingly named bit in the VP8 payload
448	   descriptor when PID=0, otherwise it MUST be 0.

450	   The E bit MUST match the M bit in the RTP header.

452	   The I bit MUST match the inverse of the P bit in the VP8 payload
453	   header.

455	   The D bit MUST match the N bit in the VP8 payload descriptor.

457	   The B bit MUST match the Y bit in the VP8 payload descriptor.  Note:
458	   When using temporally nested scalability structures as recommended in
459	   Section 3.5.2, the B bit and VP8 Y bit will always be 1 if TID is not
460	   0, since it is always possible to switch up to a higher temporal
461	   layer in such nested structures.

463	   TID and TL0PICIDX MUST match the correspondingly named fields in the
464	   VP8 payload descriptor.

466	    0                   1                   2                   3
467	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
468	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
469	   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0|0|0|0|    TL0PICIDX  |
470	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

472	3.3.6.  Future Codec LID Mapping

474	   The RTP payload format specification for future video codecs SHOULD
475	   include a section describing the LID mapping and TID mapping for the
476	   codec.

478	3.4.  Signaling Information

480	   The URI for declaring this header extension in an extmap attribute is
481	   "urn:ietf:params:rtp-hdrext:framemarking".  It does not contain any
482	   extension attributes.

484	   An example attribute line in SDP:

486	      a=extmap:3 urn:ietf:params:rtp-hdrext:framemarking
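
	   A receiver needs the negotiated ID to locate the extension in each
	   packet.  The following non-normative sketch scans SDP text for the
	   extmap line; frame_marking_id is a hypothetical helper, and the
	   optional "/direction" suffix on the ID is handled per the extmap
	   syntax in [RFC8285].

```python
from typing import Optional

FRAME_MARKING_URI = "urn:ietf:params:rtp-hdrext:framemarking"


def frame_marking_id(sdp: str) -> Optional[int]:
    """Return the extmap ID mapped to the frame marking URI, if any."""
    prefix = "a=extmap:"
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split()
        if len(parts) >= 2 and parts[1] == FRAME_MARKING_URI:
            # Drop an optional direction such as "/sendonly".
            return int(parts[0].split("/")[0])
    return None
```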

488	3.5.  Usage Considerations

490	   The header extension values MUST represent what is already in the RTP
491	   payload.

493	   When an RTP switch needs to discard a received video frame due to
494	   congestion control considerations, it is RECOMMENDED that it
495	   preferably drop frames marked with the D (Discardable) bit set, or
496	   the highest values of TID and LID, which indicate the highest
497	   temporal and spatial/quality enhancement layers, since those
498	   typically have fewer dependencies on them than lower layers.
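
	   One possible way to rank drop candidates is sketched below.  It is
	   non-normative: the precedence of D over TID over LID is an
	   assumption made for this example, and frames are represented as
	   plain dictionaries with hypothetical keys "d", "tid", and "lid".

```python
from typing import Dict, List


def drop_order(frames: List[Dict]) -> List[Dict]:
    """Return frames sorted so preferred drop candidates come first."""
    return sorted(frames,
                  key=lambda f: (bool(f["d"]), f["tid"], f["lid"]),
                  reverse=True)
```

	   A switch under congestion would then discard frames from the front
	   of the returned list until the target rate is met.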

500	   When an RTP switch wants to forward a new video stream to a receiver,
501	   it is RECOMMENDED to start forwarding the new video stream at the
502	   first switching point with the I (Independent) bit set in all spatial
503	   layers.  An RTP switch can request a media source to generate a
504	   switching point by sending a Full Intra Request (RTCP FIR) as defined
505	   in [RFC5104], for example.
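
	   The switching behavior can be sketched as a small gate that stays
	   closed until the start of an independent frame is seen.  This is a
	   simplified, non-normative illustration for a single-layer stream;
	   it does not check that the I bit is set in all spatial layers, and
	   the class name StreamGate is hypothetical.

```python
class StreamGate:
    """Holds back a newly selected stream until a switching point."""

    def __init__(self) -> None:
        self.open = False

    def should_forward(self, start: bool, independent: bool) -> bool:
        # Open only on the first packet (S=1) of an independent frame
        # (I=1), so the receiver never gets a partial or dependent frame.
        if not self.open and start and independent:
            self.open = True
        return self.open
```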

507	3.5.1.  Relation to Layer Refresh Request (LRR)

509	   Receivers can use the Layer Refresh Request (LRR)
510	   [I-D.ietf-avtext-lrr] RTCP feedback message to upgrade to a higher
511	   layer in scalable encodings.  The TID/LID values and formats used in
512	   LRR messages MUST correspond to the same values and formats specified
513	   in Section 3.1.

515	   Because frame marking can only be used with temporally-nested
516	   streams, temporal-layer LRR refreshes are unnecessary for frame-
517	   marked streams.  Other refreshes can be detected based on the I bit
518	   being set for the specific spatial layers.

520	3.5.2.  Scalability Structures

522	   The LID and TID information is most useful for fixed scalability
523	   structures, such as nested hierarchical temporal layering structures,
524	   where each temporal layer only references lower temporal layers or
525	   the base temporal layer.  The LID and TID information is less useful,
526	   or even not useful at all, for complex, irregular scalability
527	   structures that do not conform to common, fixed patterns of inter-
528	   layer dependencies and referencing structures.  Therefore it is
529	   RECOMMENDED to use LID and TID information for RTP switch forwarding
530	   decisions only in the case of temporally nested scalability
531	   structures, and it is NOT RECOMMENDED for other (more complex or
532	   irregular) scalability structures.

534	4.  Security Considerations

536	   In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP
537	   header extensions are authenticated but usually not encrypted.  When
538	   header extensions are used, some of the payload type information is
539	   exposed and visible to middleboxes.  The encrypted media data is not
540	   exposed, so this is not seen as a high-risk exposure.

542	5.  Acknowledgements

544	   Many thanks to Bernard Aboba, Jonathan Lennox, Stephan Wenger, Dale
545	   Worley, and Magnus Westerlund for their inputs.

547	6.  IANA Considerations

549	   This document registers a new extension URI in the RTP Compact
550	   Header Extensions sub-registry of the Real-Time Transport Protocol
551	   (RTP) Parameters registry, according to the following data:

553	   Extension URI: urn:ietf:params:rtp-hdrext:framemarking
554	   Description: Frame marking information for video streams
555	   Contact: mzanaty@cisco.com
556	   Reference: RFC XXXX

558	   Note to RFC Editor: please replace RFC XXXX with the number of this
559	   RFC.

561	7.  References

563	7.1.  Normative References

565	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
566	              Requirement Levels", BCP 14, RFC 2119,
567	              DOI 10.17487/RFC2119, March 1997,
568	              <https://www.rfc-editor.org/info/rfc2119>.

570	   [RFC6184]  Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP
571	              Payload Format for H.264 Video", RFC 6184,
572	              DOI 10.17487/RFC6184, May 2011,
573	              <https://www.rfc-editor.org/info/rfc6184>.

575	   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
576	              "RTP Payload Format for Scalable Video Coding", RFC 6190,
577	              DOI 10.17487/RFC6190, May 2011,
578	              <https://www.rfc-editor.org/info/rfc6190>.

580	   [RFC7741]  Westin, P., Lundin, H., Glover, M., Uberti, J., and F.
581	              Galligan, "RTP Payload Format for VP8 Video", RFC 7741,
582	              DOI 10.17487/RFC7741, March 2016,
583	              <https://www.rfc-editor.org/info/rfc7741>.

585	   [RFC7798]  Wang, Y., Sanchez, Y., Schierl, T., Wenger, S., and M.
586	              Hannuksela, "RTP Payload Format for High Efficiency Video
587	              Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, March
588	              2016, <https://www.rfc-editor.org/info/rfc7798>.

590	   [RFC8285]  Singer, D., Desineni, H., and R. Even, Ed., "A General
591	              Mechanism for RTP Header Extensions", RFC 8285,
592	              DOI 10.17487/RFC8285, October 2017,
593	              <https://www.rfc-editor.org/info/rfc8285>.

595	7.2.  Informative References

597	   [I-D.ietf-avtext-lrr]
598	              Lennox, J., Hong, D., Uberti, J., Holmer, S., and M.
599	              Flodman, "The Layer Refresh Request (LRR) RTCP Feedback
600	              Message", draft-ietf-avtext-lrr-07 (work in progress),
601	              July 2017.

603	   [I-D.ietf-payload-vp9]
604	              Uberti, J., Holmer, S., Flodman, M., Hong, D., and J.
605	              Lennox, "RTP Payload Format for VP9 Video", draft-ietf-
606	              payload-vp9-10 (work in progress), July 2020.

608	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
609	              Jacobson, "RTP: A Transport Protocol for Real-Time
610	              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
611	              July 2003, <https://www.rfc-editor.org/info/rfc3550>.

613	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
614	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
615	              RFC 3711, DOI 10.17487/RFC3711, March 2004,
616	              <https://www.rfc-editor.org/info/rfc3711>.

618	   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
619	              "Codec Control Messages in the RTP Audio-Visual Profile
620	              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
621	              February 2008, <https://www.rfc-editor.org/info/rfc5104>.

623	   [RFC6464]  Lennox, J., Ed., Ivov, E., and E. Marocco, "A Real-time
624	              Transport Protocol (RTP) Header Extension for Client-to-
625	              Mixer Audio Level Indication", RFC 6464,
626	              DOI 10.17487/RFC6464, December 2011,
627	              <https://www.rfc-editor.org/info/rfc6464>.

629	   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
630	              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
631	              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
632	              DOI 10.17487/RFC7656, November 2015,
633	              <https://www.rfc-editor.org/info/rfc7656>.

635	   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
636	              DOI 10.17487/RFC7667, November 2015,
637	              <https://www.rfc-editor.org/info/rfc7667>.

639	   [RFC8871]  Jones, P., Benham, D., and C. Groves, "A Solution
640	              Framework for Private Media in Privacy-Enhanced RTP
641	              Conferencing (PERC)", RFC 8871, DOI 10.17487/RFC8871,
642	              January 2021, <https://www.rfc-editor.org/info/rfc8871>.

644	Authors' Addresses

646	   Mo Zanaty
647	   Cisco Systems
648	   170 West Tasman Drive
649	   San Jose, CA  95134
650	   US

652	   Email: mzanaty@cisco.com

653	   Espen Berger
654	   Cisco Systems

656	   Phone: +47 98228179
657	   Email: espeberg@cisco.com

659	   Suhas Nandakumar
660	   Cisco Systems
661	   170 West Tasman Drive
662	   San Jose, CA  95134
663	   US

665	   Email: snandaku@cisco.com