idnits 2.17.1 

draft-ietf-avtext-framemarking-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document date (August 4, 2020) is 1362 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-16) exists of
     draft-ietf-payload-vp9-10


     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                          M. Zanaty
3	Internet-Draft                                                 E. Berger
4	Intended status: Standards Track                           S. Nandakumar
5	Expires: February 5, 2021                                  Cisco Systems
6	                                                          August 4, 2020

8	                   Frame Marking RTP Header Extension
9	                   draft-ietf-avtext-framemarking-11

11	Abstract

13	   This document describes a Frame Marking RTP header extension used to
14	   convey information about video frames that is critical for error
15	   recovery and packet forwarding in RTP middleboxes or network nodes.
16	   It is most useful when media is encrypted, and essential when the
17	   middlebox or node has no access to the media decryption keys.  It is
18	   also useful for codec-agnostic processing of encrypted or unencrypted
19	   media, while it also supports extensions for codec-specific
20	   information.

22	Status of This Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current Internet-
30	   Drafts is at https://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   This Internet-Draft will expire on February 5, 2021.

39	Copyright Notice

41	   Copyright (c) 2020 IETF Trust and the persons identified as the
42	   document authors.  All rights reserved.

44	   This document is subject to BCP 78 and the IETF Trust's Legal
45	   Provisions Relating to IETF Documents
46	   (https://trustee.ietf.org/license-info) in effect on the date of
47	   publication of this document.  Please review these documents
48	   carefully, as they describe your rights and restrictions with respect
49	   to this document.  Code Components extracted from this document must
50	   include Simplified BSD License text as described in Section 4.e of
51	   the Trust Legal Provisions and are provided without warranty as
52	   described in the Simplified BSD License.

54	Table of Contents

56	   1.  Introduction
57	   2.  Key Words for Normative Requirements
58	   3.  Frame Marking RTP Header Extension
59	     3.1.  Long Extension for Scalable Streams
60	     3.2.  Short Extension for Non-Scalable Streams
61	     3.3.  Layer ID Mappings for Scalable Streams
62	       3.3.1.  H265 LID Mapping
63	       3.3.2.  H264-SVC LID Mapping
64	       3.3.3.  H264 (AVC) LID Mapping
65	       3.3.4.  VP8 LID Mapping
66	       3.3.5.  Future Codec LID Mapping
67	     3.4.  Signaling Information
68	     3.5.  Usage Considerations
69	       3.5.1.  Relation to Layer Refresh Request (LRR)
70	       3.5.2.  Scalability Structures
71	   4.  Security Considerations
72	   5.  Acknowledgements
73	   6.  IANA Considerations
74	   7.  References
75	     7.1.  Normative References
76	     7.2.  Informative References
77	   Authors' Addresses

79	1.  Introduction

81	   Many widely deployed RTP [RFC3550] topologies [RFC7667] used in
82	   modern voice and video conferencing systems include a centralized
83	   component that acts as an RTP switch.  It receives voice and video
84	   streams from each participant, which may be encrypted using SRTP
85	   [RFC3711], or extensions that provide participants with private media
86	   [I-D.ietf-perc-private-media-framework] via end-to-end encryption
87	   where the switch has no access to media decryption keys.  The goal is
88	   to provide a set of streams back to the participants which enable
89	   them to render the right media content.  In a simple video
90	   configuration, for example, the goal will be that each participant
91	   sees and hears just the active speaker.  In that case, the goal of
92	   the switch is to receive the voice and video streams from each
93	   participant, determine the active speaker based on energy in the
94	   voice packets, possibly using the client-to-mixer audio level RTP
95	   header extension [RFC6464], and select the corresponding video stream
96	   for transmission to participants; see Figure 1.

98	   In this document, an "RTP switch" is used as a common short term for
99	   the terms "switching RTP mixer", "source projecting middlebox",
100	   "source forwarding unit/middlebox" and "video switching MCU" as
101	   discussed in [RFC7667].

103	            +---+      +------------+      +---+
104	            | A |<---->|            |<---->| B |
105	            +---+      |            |      +---+
106	                       |   RTP      |
107	            +---+      |  Switch    |      +---+
108	            | C |<---->|            |<---->| D |
109	            +---+      +------------+      +---+

111	                           Figure 1: RTP switch

113	   In order to properly support switching of video streams, the RTP
114	   switch typically needs some critical information about video frames
115	   in order to start and stop forwarding streams.

117	   o  Because of inter-frame dependencies, it should ideally switch
118	      video streams at a point where the first frame from the new
119	      speaker can be decoded by recipients without prior frames, e.g.,
120	      switch on an intra-frame.
121	   o  In many cases, the switch may need to drop frames in order to
122	      realize congestion control techniques, and needs to know which
123	      frames can be dropped with minimal impact to video quality.
124	   o  For scalable streams with dependent layers, the switch may need to
125	      selectively forward specific layers to specific recipients due to
126	      recipient bandwidth or decoder limits.
127	   o  Furthermore, it is highly desirable to do this in a payload
128	      format-agnostic way which is not specific to each different video
129	      codec.  Most modern video codecs share common concepts around
130	      frame types and other critical information to make this codec-
131	      agnostic handling possible.
132	   o  It is also desirable to be able to do this for SRTP without
133	      requiring the video switch to decrypt the packets.  SRTP will
134	      encrypt the RTP payload format contents and consequently this data
135	      is not usable for the switching function without decryption, which
136	      may not even be possible in the case of end-to-end encryption of
137	      private media [I-D.ietf-perc-private-media-framework].

139	   By providing meta-information about the RTP streams outside the
140	   encrypted media payload, an RTP switch can do codec-agnostic
141	   selective forwarding without decrypting the payload.  This document
142	   specifies the necessary meta-information in an RTP header extension.

144	2.  Key Words for Normative Requirements

146	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
147	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
148	   "OPTIONAL" in this document are to be interpreted as described in
149	   [RFC2119].

150	3.  Frame Marking RTP Header Extension

152	   This specification uses RTP header extensions as defined in
153	   [RFC8285].  A subset of meta-information from the video stream is
154	   provided as an RTP header extension to allow an RTP switch to do
155	   generic selective forwarding of video streams encoded with
156	   potentially different video codecs.

158	   The Frame Marking RTP header extension is encoded using the one-byte
159	   header or two-byte header as described in [RFC8285].  The one-byte
160	   header format is used for examples in this memo.  The two-byte header
161	   format is used when other two-byte header extensions are present in
162	   the same RTP packet, since mixing one-byte and two-byte extensions is
163	   not possible in the same RTP packet.
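
   The difference between the two header forms can be sketched as
   follows.  This is a minimal illustration, assuming the element
   layouts of [RFC8285]; the function names are hypothetical and padding
   of the overall extension block is not shown.

```python
def one_byte_element(ext_id: int, data: bytes) -> bytes:
    """Wrap data as a one-byte-header extension element (hypothetical helper)."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs are 1..14")
    if not 1 <= len(data) <= 16:
        raise ValueError("one-byte header carries 1..16 data octets")
    # The 4-bit length field stores (number of data octets - 1).
    return bytes([(ext_id << 4) | (len(data) - 1)]) + data


def two_byte_element(ext_id: int, data: bytes) -> bytes:
    """Wrap data as a two-byte-header extension element (hypothetical helper)."""
    if not 1 <= ext_id <= 255:
        raise ValueError("two-byte header IDs are 1..255")
    if len(data) > 255:
        raise ValueError("two-byte header carries at most 255 data octets")
    # The 8-bit length field stores the number of data octets directly.
    return bytes([ext_id, len(data)]) + data
```

   With ID 3 and a single data octet 0xA0, the first form yields
   0x30 0xA0 (L=0) and the second yields 0x03 0x01 0xA0.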

165	   This extension is only specified for Source (not Redundancy) RTP
166	   Streams [RFC7656] that carry video payloads.  It is not specified for
167	   audio payloads, nor is it specified for Redundancy RTP Streams.  The
168	   (separate) specifications for Redundancy RTP Streams often include
169	   provisions for recovering any header extensions that were part of the
170	   original source packet.  Such provisions SHALL be followed to recover
171	   the Frame Marking RTP header extension of the original source packet.
172	   Source packet frame markings may be useful when generating Redundancy
173	   RTP Streams; for example, the I and D bits can be used to generate
174	   extra or no redundancy, respectively, and redundancy schemes with
175	   source blocks can align source block boundaries with Independent
176	   frame boundaries as marked by the I bit.

178	   A frame, in the context of this specification, is the set of RTP
179	   packets with the same RTP timestamp from a specific RTP
180	   synchronization source (SSRC).  A frame within a layer is the set of
181	   RTP packets with the same RTP timestamp, SSRC, Temporal ID (TID), and
182	   Layer ID (LID).
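
   The two definitions above amount to two grouping keys.  The sketch
   below assumes each packet is a plain dictionary with hypothetical
   keys "ssrc", "timestamp", "tid" and "lid"; it is an illustration, not
   part of the extension.

```python
from collections import defaultdict


def group_frames(packets):
    """Group packets into frames and into frames within a layer."""
    frames = defaultdict(list)        # key: (SSRC, RTP timestamp)
    layer_frames = defaultdict(list)  # key: (SSRC, RTP timestamp, TID, LID)
    for pkt in packets:
        frames[(pkt["ssrc"], pkt["timestamp"])].append(pkt)
        layer_frames[
            (pkt["ssrc"], pkt["timestamp"], pkt["tid"], pkt["lid"])
        ].append(pkt)
    return dict(frames), dict(layer_frames)
```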

184	3.1.  Long Extension for Scalable Streams

186	   The following RTP header extension is RECOMMENDED for scalable
187	   streams.  It MAY also be used for non-scalable streams, in which case
188	   TID, LID and TL0PICIDX MUST be 0 or omitted.  The ID is assigned per
189	   [RFC8285], and the length is encoded as L=2 which indicates 3 octets
190	   of data when nothing is omitted, or L=1 for 2 octets when TL0PICIDX
191	   is omitted, or L=0 for 1 octet when both LID and TL0PICIDX are
192	   omitted.

194	    0                   1                   2                   3
195	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
196	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
197	   |  ID=? |  L=2  |S|E|I|D|B| TID |   LID         |    TL0PICIDX  |
198	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
199	              or
200	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
201	   |  ID=? |  L=1  |S|E|I|D|B| TID |   LID         | (TL0PICIDX omitted)
202	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
203	              or
204	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
205	   |  ID=? |  L=0  |S|E|I|D|B| TID | (LID and TL0PICIDX omitted)
206	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

208	   The following information is extracted from the media payload and
209	   sent in the Frame Marking RTP header extension.

211	   o  S: Start of Frame (1 bit) - MUST be 1 in the first packet in a
212	      frame within a layer; otherwise MUST be 0.
213	   o  E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame
214	      within a layer; otherwise MUST be 0.  Note that the RTP header
215	      marker bit MAY be used to infer the last packet of the highest
216	      enhancement layer, in payload formats with such semantics.
217	   o  I: Independent Frame (1 bit) - MUST be 1 for a frame within a
218	      layer that can be decoded independent of temporally prior frames,
219	      e.g. intra-frame, VPX keyframe, H.264 IDR [RFC6184], H.265
220	      IDR/CRA/BLA/IRAP [RFC7798]; otherwise MUST be 0.  Note that this
221	      bit only signals temporal independence, so it can be 1 in spatial
222	      or quality enhancement layers that depend on temporally co-located
223	      layers but not temporally prior frames.
224	   o  D: Discardable Frame (1 bit) - MUST be 1 for a frame within a
225	      layer the sender knows can be discarded, and still provide a
226	      decodable media stream; otherwise MUST be 0.
227	   o  B: Base Layer Sync (1 bit) - When TID is not 0, this MUST be 1 if
228	      the sender knows this frame within a layer only depends on the
229	      base temporal layer; otherwise MUST be 0.  When TID is 0 or if no
230	      scalability is used, this MUST be 0.
231	   o  TID: Temporal ID (3 bits) - Identifies the temporal layer/sub-
232	      layer encoded, starting with 0 for the base layer, and increasing
233	      with higher temporal fidelity.  If no scalability is used, this
234	      MUST be 0.  It is implicitly 0 in the short extension format.
235	   o  LID: Layer ID (8 bits) - Identifies the spatial and quality layer
236	      encoded, starting with 0 for the base layer, and increasing with
237	      higher fidelity.  If no scalability is used, this MUST be 0 or
238	      omitted to reduce length.  When omitted, TL0PICIDX MUST also be
239	      omitted.  It is implicitly 0 in the short extension format or when
240	      omitted in the long extension format.
241	   o  TL0PICIDX: Temporal Layer 0 Picture Index (8 bits) - When TID is 0
242	      and LID is 0, this is a cyclic counter labeling base layer frames.
243	      When TID is not 0 or LID is not 0, this indicates a dependency on
244	      the given index, such that this frame within this layer depends on
245	      the frame with this label in the layer with TID 0 and LID 0.  If
246	      no scalability is used, or the cyclic counter is unknown, this
247	      MUST be omitted to reduce length.  Note that 0 is a valid index
248	      value for TL0PICIDX.
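
   The field layout above can be sketched as an encoder and parser for
   the extension data octets (the element header with ID and L is not
   included).  The FrameMarking type and function names are hypothetical
   and chosen only for illustration; None is used here to represent an
   omitted field.

```python
from typing import NamedTuple, Optional


class FrameMarking(NamedTuple):
    s: bool
    e: bool
    i: bool
    d: bool
    b: bool
    tid: int = 0
    lid: Optional[int] = None        # None means "omitted"
    tl0picidx: Optional[int] = None  # None means "omitted"


def encode_long(fm: FrameMarking) -> bytes:
    """Return the 1 to 3 data octets of the long extension."""
    if not 0 <= fm.tid <= 7:
        raise ValueError("TID is a 3-bit field")
    if fm.lid is None and fm.tl0picidx is not None:
        raise ValueError("TL0PICIDX cannot be present when LID is omitted")
    first = (
        (int(fm.s) << 7) | (int(fm.e) << 6) | (int(fm.i) << 5)
        | (int(fm.d) << 4) | (int(fm.b) << 3) | fm.tid
    )
    out = [first]
    if fm.lid is not None:
        out.append(fm.lid & 0xFF)
        if fm.tl0picidx is not None:
            out.append(fm.tl0picidx & 0xFF)
    return bytes(out)


def parse_long(data: bytes) -> FrameMarking:
    """Parse 1 to 3 data octets; missing octets become None."""
    if not 1 <= len(data) <= 3:
        raise ValueError("expected 1 to 3 octets")
    first = data[0]
    return FrameMarking(
        s=bool(first & 0x80), e=bool(first & 0x40), i=bool(first & 0x20),
        d=bool(first & 0x10), b=bool(first & 0x08), tid=first & 0x07,
        lid=data[1] if len(data) >= 2 else None,
        tl0picidx=data[2] if len(data) == 3 else None,
    )
```

   In the one-byte header form, the L field is the number of octets
   returned by encode_long minus one, which gives the L=2, L=1 and L=0
   cases described above.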

250	   The layer information contained in TID and LID conveys useful aspects
251	   of the layer structure that can be utilized in selective forwarding.

253	   Without further information about the layer structure, these TID/LID
254	   identifiers can only be used for relative priority of layers and
255	   implicit dependencies between layers.  They convey a layer hierarchy
256	   with TID=0 and LID=0 identifying the base layer.  Higher values of
257	   TID identify higher temporal layers with higher frame rates.  Higher
258	   values of LID identify higher spatial and/or quality layers with
259	   higher resolutions and/or bitrates.  Implicit dependencies between
260	   layers assume that a layer with a given TID/LID MAY depend on
261	   layer(s) with the same or lower TID/LID, but MUST NOT depend on
262	   layer(s) with higher TID/LID.
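
   One consequence of the implicit dependency rule is that a switch can
   cap a stream at a target layer without breaking decodability.  The
   sketch below assumes the rule applies to TID and LID independently
   (a layer never depends on a higher TID or a higher LID); that reading
   and the function name are assumptions made for illustration.

```python
def should_forward(tid: int, lid: int, max_tid: int, max_lid: int) -> bool:
    """Forward only layers at or below the receiver's target TID and LID.

    Under the assumed rule, every forwarded layer depends only on layers
    that also pass this test, so the forwarded subset stays decodable.
    """
    return tid <= max_tid and lid <= max_lid
```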

264	   With further information, for example, possible future RTCP SDES
265	   items that convey full layer structure information, it may be
266	   possible to map these TIDs and LIDs to specific absolute frame rates,
267	   resolutions and bitrates, as well as explicit dependencies between
268	   layers.  Such additional layer information may be useful for
269	   forwarding decisions in the RTP switch, but is beyond the scope of
270	   this memo.  The relative layer information is still useful for many
271	   selective forwarding decisions even without such additional layer
272	   information.

274	3.2.  Short Extension for Non-Scalable Streams

276	   The following RTP header extension is RECOMMENDED for non-scalable
277	   streams.  It is identical to the shortest form of the extension for
278	   scalable streams, except the last four bits (B and TID) are replaced
279	   with zeros.  It MAY also be used for scalable streams if the sender
280	   has limited or no information about stream scalability.  The ID is
281	   assigned per [RFC8285], and the length is encoded as L=0 which
282	   indicates 1 octet of data.

284	    0                   1
285	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
286	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
287	   |  ID=? |  L=0  |S|E|I|D|0 0 0 0|
288	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

290	   The following information is extracted from the media payload and
291	   sent in the Frame Marking RTP header extension.

293	   o  S: Start of Frame (1 bit) - MUST be 1 in the first packet in a
294	      frame; otherwise MUST be 0.
295	   o  E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame;
296	      otherwise MUST be 0.  SHOULD match the RTP header marker bit in
297	      payload formats with such semantics for marking end of frame.
298	   o  I: Independent Frame (1 bit) - MUST be 1 for frames that can be
299	      decoded independent of temporally prior frames, e.g. intra-frame,
300	      VPX keyframe, H.264 IDR [RFC6184], H.265 IDR/CRA/BLA/IRAP
301	      [RFC7798]; otherwise MUST be 0.
302	   o  D: Discardable Frame (1 bit) - MUST be 1 for frames the sender
303	      knows can be discarded, and still provide a decodable media
304	      stream; otherwise MUST be 0.
305	   o  Remaining bits (4 bits) - Reserved and not used for non-scalable
306	      streams; they MUST be set to 0 upon transmission and ignored upon
307	      reception.
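
   A minimal sketch of the short form follows.  The function names are
   hypothetical; the parser ignores the four reserved bits, as required
   on reception.

```python
def encode_short(s: bool, e: bool, i: bool, d: bool) -> bytes:
    """Return the single data octet of the short extension."""
    # The low four bits are reserved and always sent as zero.
    return bytes([(int(s) << 7) | (int(e) << 6) | (int(i) << 5) | (int(d) << 4)])


def parse_short(data: bytes) -> dict:
    """Parse the short extension, ignoring the reserved low four bits."""
    octet = data[0]
    return {
        "s": bool(octet & 0x80),
        "e": bool(octet & 0x40),
        "i": bool(octet & 0x20),
        "d": bool(octet & 0x10),
    }
```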

309	3.3.  Layer ID Mappings for Scalable Streams

311	   This section maps the specific Layer ID information contained in
312	   specific scalable codecs to the generic LID and TID fields.

314	   Note that non-scalable streams have no Layer ID information and thus
315	   no mappings.

317	3.3.1.  H265 LID Mapping

319	   The following shows the H265 [RFC7798] LayerID (6 bits) and TID (3
320	   bits) from the NAL unit header mapped to the generic LID and TID
321	   fields.

323	   The S and E bits MUST match the correspondingly named bits in
324	   PACI:PHES:TSCI payload structures.

326	   The I bit MUST be 1 when the NAL unit type is 16-23 (inclusive) or
327	   32-34 (inclusive), or an aggregation packet or fragmentation unit
328	   encapsulating any of these types, otherwise it MUST be 0.  These
329	   ranges cover intra (IRAP) frames as well as critical parameter sets
330	   (VPS, SPS, PPS).

332	   The D bit MUST be 1 when the NAL unit type is 0, 2, 4, 6, 8, 10, 12,
333	   14, or 38, or an aggregation packet or fragmentation unit
334	   encapsulating only these types, otherwise it MUST be 0.  These ranges
335	   cover non-reference frames as well as filler data.

337	   The B bit cannot be determined reliably from simple inspection of
338	   payload headers, and therefore is determined by implementation-
339	   specific means.  For example, internal codec interfaces may provide
340	   information to set this reliably.

342	    0                   1                   2                   3
343	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
344	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
345	   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|  LayerID  |    TL0PICIDX  |
346	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
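
   The I and D rules above can be sketched as set membership tests on
   NAL unit types.  The helper names are hypothetical, and the caller is
   assumed to have already extracted the NAL unit type of every NAL unit
   carried in the packet (one type for a single NAL unit or
   fragmentation unit, several for an aggregation packet).

```python
H265_I_TYPES = set(range(16, 24)) | set(range(32, 35))  # 16-23 and 32-34
H265_D_TYPES = {0, 2, 4, 6, 8, 10, 12, 14, 38}


def h265_i_bit(nal_types) -> bool:
    """I is 1 if any carried NAL unit has one of the listed types."""
    return any(t in H265_I_TYPES for t in nal_types)


def h265_d_bit(nal_types) -> bool:
    """D is 1 only if every carried NAL unit has a discardable type."""
    types = list(nal_types)
    return bool(types) and all(t in H265_D_TYPES for t in types)


def h265_lid(layer_id: int) -> int:
    """Place the 6-bit LayerID in the low bits of LID; top two bits are 0."""
    if not 0 <= layer_id <= 63:
        raise ValueError("LayerID is a 6-bit field")
    return layer_id
```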

348	3.3.2.  H264-SVC LID Mapping

350	   The following shows H264-SVC [RFC6190] Layer encoding information (3
351	   bits for spatial/dependency layer, 4 bits for quality layer and 3
352	   bits for temporal layer) mapped to the generic LID and TID fields.

354	   The S, E, I and D bits MUST match the correspondingly named bits in
355	   PACSI payload structures.

357	   The I bit MUST be 1 when the NAL unit type is 5, 7, 8, 13, or 15, or
358	   an aggregation packet or fragmentation unit encapsulating any of
359	   these types, otherwise it MUST be 0.  These ranges cover intra (IDR)
360	   frames as well as critical parameter sets (SPS/PPS variants).

362	   The D bit MUST be 1 when the NAL unit header NRI field is 0, or an
363	   aggregation packet or fragmentation unit encapsulating only NAL units
364	   with NRI=0, otherwise it MUST be 0.  The NRI=0 condition signals non-
365	   reference frames.

367	   The B bit cannot be determined reliably from simple inspection of
368	   payload headers, and therefore is determined by implementation-
369	   specific means.  For example, internal codec interfaces may provide
370	   information to set this reliably.

372	    0                   1                   2                   3
373	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
374	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
375	   |  ID=? |  L=2  |S|E|I|D|B| TID |0| DID |  QID  |    TL0PICIDX  |
376	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
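
   Following the diagram above, LID carries a zero bit, then the 3-bit
   DID, then the 4-bit QID.  A small sketch of that packing, with
   hypothetical function names:

```python
def svc_lid(did: int, qid: int) -> int:
    """Pack DID (3 bits) and QID (4 bits) into the 8-bit LID field."""
    if not 0 <= did <= 7:
        raise ValueError("DID is a 3-bit field")
    if not 0 <= qid <= 15:
        raise ValueError("QID is a 4-bit field")
    return (did << 4) | qid  # the most significant bit stays 0


def split_svc_lid(lid: int):
    """Recover (DID, QID) from an LID built by svc_lid."""
    return (lid >> 4) & 0x07, lid & 0x0F
```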

378	3.3.3.  H264 (AVC) LID Mapping

380	   The following shows the header extension for H264 (AVC) [RFC6184]
381	   that contains only temporal layer information.

383	   The S bit MUST be 1 when the timestamp in the RTP header differs from
384	   the timestamp of the packet with the prior RTP sequence number from
385	   the same SSRC, otherwise it MUST be 0.

387	   The E bit MUST match the M bit in the RTP header.

389	   The I bit MUST be 1 when the NAL unit type is 5, 7, or 8, or an
390	   aggregation packet or fragmentation unit encapsulating any of these
391	   types, otherwise it MUST be 0.  These ranges cover intra (IDR) frames
392	   as well as critical parameter sets (SPS/PPS).

394	   The D bit MUST be 1 when the NAL unit header NRI field is 0, or an
395	   aggregation packet or fragmentation unit encapsulating only NAL units
396	   with NRI=0, otherwise it MUST be 0.  The NRI=0 condition signals non-
397	   reference frames.

399	   The B bit cannot be determined reliably from simple inspection of
400	   payload headers, and therefore is determined by implementation-
401	   specific means.  For example, internal codec interfaces may provide
402	   information to set this reliably.

404	    0                   1                   2                   3
405	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
406	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
407	   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0|0|0|0|    TL0PICIDX  |
408	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
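
   The I and D rules for H264 (AVC) can be sketched from the one-octet
   NAL unit header, which holds a 2-bit NRI field followed by a 5-bit
   NAL unit type.  The function name is hypothetical, and the caller is
   assumed to pass the header octet of every NAL unit carried in the
   packet (for a fragmentation unit, the header of the fragmented NAL
   unit).

```python
H264_I_TYPES = {5, 7, 8}  # IDR slice, SPS, PPS


def h264_i_and_d_bits(nal_headers):
    """Return (I, D) for a packet given its NAL unit header octets."""
    headers = list(nal_headers)
    if not headers:
        raise ValueError("at least one NAL unit header is required")
    # I: any carried NAL unit has one of the listed types.
    i_bit = any((h & 0x1F) in H264_I_TYPES for h in headers)
    # D: every carried NAL unit has NRI equal to 0.
    d_bit = all(((h >> 5) & 0x03) == 0 for h in headers)
    return i_bit, d_bit
```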

410	3.3.4.  VP8 LID Mapping

412	   The following shows the header extension for VP8 [RFC7741] that
413	   contains only temporal layer information.

415	   The S bit MUST match the correspondingly named bit in the VP8 payload
416	   descriptor when PID=0, otherwise it MUST be 0.

418	   The E bit MUST match the M bit in the RTP header.

420	   The I bit MUST match the inverse of the P bit in the VP8 payload
421	   header.

423	   The D bit MUST match the N bit in the VP8 payload descriptor.

425	   The B bit MUST match the Y bit in the VP8 payload descriptor.

427	    0                   1                   2                   3
428	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
429	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
430	   |  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0|0|0|0|    TL0PICIDX  |
431	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
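
   The VP8 rules above are direct bit copies.  The sketch below assumes
   the VP8 payload descriptor and payload header fields have already
   been parsed per [RFC7741] and are passed in as plain values; the
   function and parameter names are hypothetical.

```python
def vp8_frame_marking_bits(s_bit, pid, n_bit, y_bit, p_bit, rtp_marker):
    """Map parsed VP8 fields and the RTP marker bit to S, E, I, D and B."""
    return {
        "s": bool(s_bit) and pid == 0,  # S only counts in partition 0
        "e": bool(rtp_marker),          # E matches the RTP marker bit
        "i": not p_bit,                 # P=0 marks a key frame
        "d": bool(n_bit),               # N marks a non-reference frame
        "b": bool(y_bit),               # Y marks a layer sync frame
    }
```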

433	3.3.5.  Future Codec LID Mapping

435	   The RTP payload format specification for future video codecs SHOULD
436	   include a section describing the LID mapping and TID mapping for the
437	   codec.  For example, the LID/TID mapping for the VP9 codec is
438	   described in the VP9 RTP Payload Format [I-D.ietf-payload-vp9].

440	3.4.  Signaling Information

442	   The URI for declaring this header extension in an extmap attribute is
443	   "urn:ietf:params:rtp-hdrext:framemarking".  It does not contain any
444	   extension attributes.

446	   An example attribute line in SDP:

448	      a=extmap:3 urn:ietf:params:rtp-hdrext:framemarking
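
   A receiver needs the negotiated ID to locate this extension in each
   packet.  The sketch below scans SDP text for the extmap line shown
   above; the function name is hypothetical, and the optional direction
   suffix on the ID (for example "3/sendonly") follows the extmap syntax
   of [RFC8285].

```python
FRAME_MARKING_URI = "urn:ietf:params:rtp-hdrext:framemarking"


def find_frame_marking_id(sdp: str):
    """Return the extmap ID mapped to the Frame Marking URI, or None."""
    prefix = "a=extmap:"
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        fields = line[len(prefix):].split()
        if len(fields) >= 2 and fields[1] == FRAME_MARKING_URI:
            # Drop an optional "/direction" suffix before converting.
            return int(fields[0].split("/")[0])
    return None
```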

450	3.5.  Usage Considerations

452	   The header extension values MUST represent what is already in the RTP
453	   payload.

455	   When an RTP switch needs to discard a received video frame due to
456	   congestion control considerations, it is RECOMMENDED that it first
457	   drop frames marked with the D (Discardable) bit set, or frames with
458	   the highest values of TID and LID, which indicate the highest
459	   temporal and spatial/quality enhancement layers, since those
460	   typically have fewer dependencies on them than lower layers.
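
   One possible drop ordering is sketched below.  It assumes frames are
   dictionaries with hypothetical keys "d", "tid" and "lid", and it
   ranks TID ahead of LID when both differ; that tie-breaking choice is
   an assumption, not something this memo specifies.

```python
def drop_order(frames):
    """Return frames sorted so the best drop candidates come first."""
    # Discardable frames first, then higher TID, then higher LID.
    return sorted(frames, key=lambda f: (not f["d"], -f["tid"], -f["lid"]))
```

   A switch under congestion would then drop from the front of the
   returned list until its rate target is met.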

462	   When an RTP switch wants to forward a new video stream to a receiver,
463	   it is RECOMMENDED to start forwarding the new video stream at the
464	   first switching point that has the I (Independent) bit set in all
465	   spatial layers.  An RTP switch can request a media source to
466	   generate a switching point by sending Full Intra Request (RTCP FIR)
467	   as defined in [RFC5104], for example.
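
   Locating such a switching point can be sketched as a scan over
   frames in arrival order.  The input shape is an assumption: each
   frame is represented as a list holding the I bit of each spatial
   layer present in that frame.

```python
def find_switch_point(frames):
    """Return the index of the first frame with I set in all layers."""
    for index, layer_i_bits in enumerate(frames):
        if layer_i_bits and all(layer_i_bits):
            return index
    return None  # no switching point yet; the switch may send RTCP FIR
```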

469	3.5.1.  Relation to Layer Refresh Request (LRR)

471	   Receivers can use the Layer Refresh Request (LRR)
472	   [I-D.ietf-avtext-lrr] RTCP feedback message to upgrade to a higher
473	   layer in scalable encodings.  The TID/LID values and formats used in
474	   LRR messages MUST correspond to the same values and formats specified
475	   in Section 3.1.

477	   Because frame marking can only be used with temporally-nested
478	   streams, temporal-layer LRR refreshes are unnecessary for frame-
479	   marked streams.  Other refreshes can be detected based on the I bit
480	   being set for the specific spatial layers.

482	3.5.2.  Scalability Structures

484	   The LID and TID information is most useful for fixed scalability
485	   structures, such as nested hierarchical temporal layering structures,
486	   where each temporal layer only references lower temporal layers or
487	   the base temporal layer.  The LID and TID information is less useful,
488	   or even not useful at all, for complex, irregular scalability
489	   structures that do not conform to common, fixed patterns of inter-
490	   layer dependencies and referencing structures.  Therefore it is
491	   RECOMMENDED to use LID and TID information for RTP switch forwarding
492	   decisions only in the case of temporally nested scalability
493	   structures, and it is NOT RECOMMENDED for other (more complex or
494	   irregular) scalability structures.

496	4.  Security Considerations

498	   In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP
499	   header extensions are authenticated but usually not encrypted.  When
500	   header extensions are used, some of the payload type information is
501	   exposed and visible to middleboxes.  The encrypted media data is not
502	   exposed, so this is not seen as a high-risk exposure.

504	5.  Acknowledgements

506	   Many thanks to Bernard Aboba, Jonathan Lennox, Stephan Wenger, Dale
507	   Worley, and Magnus Westerlund for their inputs.

509	6.  IANA Considerations

511	   This document registers a new extension URI in the RTP Compact
512	   Header Extensions sub-registry of the Real-Time Transport Protocol
513	   (RTP) Parameters registry, according to the following data:

515	   Extension URI: urn:ietf:params:rtp-hdrext:framemarking
516	   Description: Frame marking information for video streams
517	   Contact: mzanaty@cisco.com
518	   Reference: RFC XXXX

520	   Note to RFC Editor: please replace RFC XXXX with the number of this
521	   RFC.

523	7.  References

525	7.1.  Normative References

527	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
528	              Requirement Levels", BCP 14, RFC 2119,
529	              DOI 10.17487/RFC2119, March 1997,
530	              <https://www.rfc-editor.org/info/rfc2119>.

532	   [RFC6184]  Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP
533	              Payload Format for H.264 Video", RFC 6184,
534	              DOI 10.17487/RFC6184, May 2011,
535	              <https://www.rfc-editor.org/info/rfc6184>.

537	   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
538	              "RTP Payload Format for Scalable Video Coding", RFC 6190,
539	              DOI 10.17487/RFC6190, May 2011,
540	              <https://www.rfc-editor.org/info/rfc6190>.

542	   [RFC7741]  Westin, P., Lundin, H., Glover, M., Uberti, J., and F.
543	              Galligan, "RTP Payload Format for VP8 Video", RFC 7741,
544	              DOI 10.17487/RFC7741, March 2016,
545	              <https://www.rfc-editor.org/info/rfc7741>.

547	   [RFC7798]  Wang, Y., Sanchez, Y., Schierl, T., Wenger, S., and M.
548	              Hannuksela, "RTP Payload Format for High Efficiency Video
549	              Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, March
550	              2016, <https://www.rfc-editor.org/info/rfc7798>.

552	   [RFC8285]  Singer, D., Desineni, H., and R. Even, Ed., "A General
553	              Mechanism for RTP Header Extensions", RFC 8285,
554	              DOI 10.17487/RFC8285, October 2017,
555	              <https://www.rfc-editor.org/info/rfc8285>.

557	7.2.  Informative References

559	   [I-D.ietf-avtext-lrr]
560	              Lennox, J., Hong, D., Uberti, J., Holmer, S., and M.
561	              Flodman, "The Layer Refresh Request (LRR) RTCP Feedback
562	              Message", draft-ietf-avtext-lrr-07 (work in progress),
563	              July 2017.

565	   [I-D.ietf-payload-vp9]
566	              Uberti, J., Holmer, S., Flodman, M., Hong, D., and J.
567	              Lennox, "RTP Payload Format for VP9 Video", draft-ietf-
568	              payload-vp9-10 (work in progress), July 2020.

570	   [I-D.ietf-perc-private-media-framework]
571	              Jones, P., Benham, D., and C. Groves, "A Solution
572	              Framework for Private Media in Privacy Enhanced RTP
573	              Conferencing (PERC)", draft-ietf-perc-private-media-
574	              framework-12 (work in progress), June 2019.

576	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
577	              Jacobson, "RTP: A Transport Protocol for Real-Time
578	              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
579	              July 2003, <https://www.rfc-editor.org/info/rfc3550>.

581	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
582	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
583	              RFC 3711, DOI 10.17487/RFC3711, March 2004,
584	              <https://www.rfc-editor.org/info/rfc3711>.

586	   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
587	              "Codec Control Messages in the RTP Audio-Visual Profile
588	              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
589	              February 2008, <https://www.rfc-editor.org/info/rfc5104>.

591	   [RFC6464]  Lennox, J., Ed., Ivov, E., and E. Marocco, "A Real-time
592	              Transport Protocol (RTP) Header Extension for Client-to-
593	              Mixer Audio Level Indication", RFC 6464,
594	              DOI 10.17487/RFC6464, December 2011,
595	              <https://www.rfc-editor.org/info/rfc6464>.

597	   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
598	              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
599	              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
600	              DOI 10.17487/RFC7656, November 2015,
601	              <https://www.rfc-editor.org/info/rfc7656>.

603	   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
604	              DOI 10.17487/RFC7667, November 2015,
605	              <https://www.rfc-editor.org/info/rfc7667>.

607	Authors' Addresses

609	   Mo Zanaty
610	   Cisco Systems
611	   170 West Tasman Drive
612	   San Jose, CA  95134
613	   US

615	   Email: mzanaty@cisco.com

616	   Espen Berger
617	   Cisco Systems

619	   Phone: +47 98228179
620	   Email: espeberg@cisco.com

622	   Suhas Nandakumar
623	   Cisco Systems
624	   170 West Tasman Drive
625	   San Jose, CA  95134
626	   US

628	   Email: snandaku@cisco.com