idnits 2.17.1 

draft-abhishek-mmusic-superimposition-grouping-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     All group and mid attributes MUST follow the rules defined in
     [RFC5888].  The "mid" attribute MUST be used for all "m" lines covering
     visual media within a session description for which a
     foreground/background relationship is to be defined.  The foreground/
     background relationship of visual media within a session description that
     is not covered in a group is undefined.  Multiple groups MUST not be used
     within one session.  If the identification-tags associated with "a=group"
     lines do not map to any "m" lines, the identification-tags MUST be
     ignored.

  -- The document date (June 1, 2021) is 1060 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '-128' is mentioned on line 230, but not defined

  -- Looks like a reference, but probably isn't: '127' on line 230

  -- Looks like a reference, but probably isn't: '0' on line 231

  -- Looks like a reference, but probably isn't: '255' on line 231


     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	mmusic                                                       R. Abhishek
3	Internet-Draft                                                 S. Wenger
4	Intended status: Standards Track                                 Tencent
5	Expires: December 3, 2021                                   June 1, 2021

7	                 SDP Superimposition Grouping framework
8	           draft-abhishek-mmusic-superimposition-grouping-02

10	Abstract

12	   This document defines semantics that allow for signaling a new SDP
13	   group "supim" for superimposed media in an SDP session.  The "supim"
14	   attribute can be used by the application to relate all the fully or
15	   partly superimposed visual media streams enabling them to be added as
16	   an overlay on top of any one or more background visual media streams.
17	   The superimposition grouping semantics is helpful if the media stream
18	   data is separate and transported via different sessions.

20	Status of This Memo

22	   This Internet-Draft is submitted in full conformance with the
23	   provisions of BCP 78 and BCP 79.

25	   Internet-Drafts are working documents of the Internet Engineering
26	   Task Force (IETF).  Note that other groups may also distribute
27	   working documents as Internet-Drafts.  The list of current Internet-
28	   Drafts is at https://datatracker.ietf.org/drafts/current/.

30	   Internet-Drafts are draft documents valid for a maximum of six months
31	   and may be updated, replaced, or obsoleted by other documents at any
32	   time.  It is inappropriate to use Internet-Drafts as reference
33	   material or to cite them other than as "work in progress."

35	   This Internet-Draft will expire on December 3, 2021.

37	Copyright Notice

39	   Copyright (c) 2021 IETF Trust and the persons identified as the
40	   document authors.  All rights reserved.

42	   This document is subject to BCP 78 and the IETF Trust's Legal
43	   Provisions Relating to IETF Documents
44	   (https://trustee.ietf.org/license-info) in effect on the date of
45	   publication of this document.  Please review these documents
46	   carefully, as they describe your rights and restrictions with respect
47	   to this document.  Code Components extracted from this document must
48	   include Simplified BSD License text as described in Section 4.e of
49	   the Trust Legal Provisions and are provided without warranty as
50	   described in the Simplified BSD License.

52	Table of Contents

54	   1.  Introduction
55	   2.  Terminology
56	   3.  Media Superimposition in SDP
57	   4.  Superimposition Group Identification Attribute
58	   5.  Use of group and mid
59	   6.  "superimposition" Attribute for Superimposition Group
60	       Identification Attribute
61	   7.  Example of Supim
62	   8.  Relationship with Existing Specifications (informative)
63	   9.  Security Considerations
64	   10. IANA Considerations
65	   11. Acknowledgements
66	   12. References
67	     12.1.  Normative References
68	     12.2.  Informative References
69	   Authors' Addresses

71	1.  Introduction

73	   This document defines semantics that allow for signaling a new SDP
74	   group "supim" for superimposed media in an SDP session.  The "supim"
75	   attribute can be used by the application to relate all the fully or
76	   partly superimposed visual media streams enabling them to be added as
77	   an overlay on top of any one or more background visual media streams.
78	   The superimposition grouping semantics is helpful if the media stream
79	   data is separate and transported via different sessions.

81	   Media superimposition herein is defined to be a visual media stream
82	   (video/image/text) that is fully or partly superimposed on top of an
83	   already existing visual media stream such that the resulting
84	   foreground and background media can be displayed simultaneously.
85	   Superimposition can be recursive in that visual media that is
86	   superimposed against its background can, in turn, be the background
87	   of another superimposed visual media.  The superimposed visual media
88	   displayed over a background media content may be anywhere between
89	   opaque and transparent.  Examples of applications for video
90	   superimposition include real-time multi-party gaming, where these
91	   superimposed media may be used to provide additional details or stats
92	   about each player, or multi-party teleconferencing where visual media
93	   from users in the teleconference may be superimposed over a
94	   background media or over each other.

96	   This document describes new SDP group semantics for grouping
97	   superimposed media in an SDP session.  An SDP session description
98	   consists of one or more media lines, known as "m" lines, each of
99	   which can be identified by a token carried in a "mid" attribute.  A
100	   session-level "group" attribute groups different media lines using
101	   defined group semantics.  The semantics defined in this memo are to
102	   be used in conjunction with "The Session Description Protocol (SDP)
103	   Grouping Framework" [RFC5888].

105	   We have studied the existing specifications, including the CLUE
106	   framework [RFC8845] and work in MPEG, and found that such work does
107	   not cover our intended application space; please refer to Section 8
108	   for details.  The superimposition grouping as described below enables
109	   a compliant receiver/renderer implementation to learn the relative
110	   relevance of the visual media, as signaled by the sender(s), and to
111	   reflect that relevance when rendering by superimposing the media
112	   when needed.

114	2.  Terminology

116	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
117	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
118	   "OPTIONAL" in this document are to be interpreted as described in BCP
119	   14 [RFC2119] [RFC8174] when, and only when, they appear in all
120	   capitals, as shown here.

122	3.  Media Superimposition in SDP

124	   SDP is predominantly used for describing the format for multimedia
125	   communication sessions.  Many SDP-based systems use open standards
126	   such as RTP [RFC3550] for media transport and SIP [RFC3261] for
127	   session setup and control.  An SDP session may contain more than one
128	   media description, with each media description introduced by an
129	   "m=" line.  Each line denotes a single media stream.  If multiple
130	   visual media lines are present in a session, their relationship at
131	   the rendering device, including their possible superimposition
132	   (foreground/background), is at present undefined.  This
133	   memo introduces a mechanism in which certain rendering information
134	   becomes available.  The rendering information herein is limited to
135	   the foreground/background relationship of each grouped media to other
136	   media streams through a layer order value, and optionally a
137	   transparency value.  Where, spatially, the media is rendered is not
138	   covered by this memo, and is in many application scenarios a function
139	   of the user interface.  An example is shown in Figure 1, where three
140	   foreground media streams have been superimposed over a background
141	   media stream, with Media B being partly superimposed over Media C.

143	                            _____________________________________
144	                           | =================                   |
145	                           | ==== Media A ====                   |
146	                           | =================                   |
147	                           | =================                   |
148	                           |                   +++++++++++++++++ |
149	                           |                   ++++ Media B ++++ |
150	                           |       ############+++++++++++++++++ |
151	                           |       ############+++++++++++++++++ |
152	                           |       #### Media C ####             |
153	                           |       #################             |
154	                           |_____________________________________|

156	               Figure 1: An example of media superimposition

158	   Of course, assuming sufficient screen real-estate, a renderer may not
159	   have to rely on superimposition mechanisms at all: when there is
160	   enough screen real-estate available, a valid display strategy may
161	   well be to show all media without overlapping and hence without
162	   superimposition.  However, when the screen real-estate becomes
163	   insufficient, then the information provided by the mechanisms defined
164	   in this memo can be used to order (in the sense of foreground to
165	   background) the visual media according to a hierarchy chosen by the
166	   sender or a MANE (media-aware network element), and based on their
167	   application knowledge.

169	   When multiple superimposed streams are transmitted within a session,
170	   the receiver needs to be able to relate the media streams to each
171	   other.  This is achieved by the SDP grouping framework [RFC5888] by
172	   using the "group" attribute that groups different "m" lines in a
173	   session.  Using the superimposition group semantics defined in this
174	   memo, a group's media streams can be uniquely identified across
175	   multiple SDP descriptions exchanged with different receivers, thereby
176	   identifying the streams in terms of their role in the session
177	   irrespective of their media type and transport protocol.  These
178	   superimposed streams within the group may be multiplexed based on the
179	   guidelines defined in [draft-ietf-avtcore-multiplex-guidelines-12].

181	4.  Superimposition Group Identification Attribute

183	   The "superimposition media stream identification" attribute, "supim",
184	   is used to identify the relationship of superimposed media streams
185	   within a session description.  In a superimposition group, the media
186	   lines MAY have different media formats.  There is no defined behavior
187	   for the rendering of non-visual media being grouped in a
188	   superimposition group.  It is assumed that all the media streams
189	   that need to be time-synchronized are time-synchronized.  Its
190	   formatting follows [RFC5888] in the use of the "mid" attribute to
191	   identify the media lines to be included in the superimposition.

193	   It is used for grouping the foreground and the background media
194	   streams intended for the purpose of composition with foreground media
195	   to be superimposed over the background media stream.  A media player
196	   that chooses to implement the extension and receives a session
197	   description that contains "m" lines grouped together using "supim"
198	   semantics is able to superimpose the foreground media streams on top
199	   of the background media stream in cases where there is overlap.  For
200	   non-supporting devices, these media streams are treated as
201	   independent media streams.

203	5.  Use of group and mid

205	   All group and mid attributes MUST follow the rules defined in
206	   [RFC5888].  The "mid" attribute MUST be used for all "m" lines
207	   covering visual media within a session description for which a
208	   foreground/background relationship is to be defined.  The foreground/
209	   background relationship of visual media within a session description
210	   that is not covered in a group is undefined.  Multiple groups MUST
211	   NOT be used within one session.  If the identification-tags
212	   associated with "a=group" lines do not map to any "m" lines, the
213	   identification-tags MUST be ignored.

215	       semantics =/ "supim" ; semantics extension
216	                            ; as defined in RFC5888
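
   The rules above can be sketched in code.  The following is a minimal
   Python illustration, not a normative algorithm.  The function name
   find_supim_group is hypothetical, and the sketch assumes that
   "multiple groups" refers to more than one "supim" group in a session
   description.  Identification-tags that do not map to any "m" line
   are ignored, as required above.

```python
def find_supim_group(sdp_text):
    """Return the mid tags of the single "supim" group, in listed order.

    Assumption: a second "supim" group is treated as an error.
    Tags that do not map to the "mid" of any "m" line are dropped.
    """
    groups = []        # one list of tags per session-level a=group:supim line
    mids = set()       # "mid" values found inside media sections
    in_media = False   # becomes True once the first "m=" line is seen
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            in_media = True
        elif line.startswith("a=group:") and not in_media:
            parts = line[len("a=group:"):].split()
            if parts and parts[0] == "supim":
                groups.append(parts[1:])
        elif line.startswith("a=mid:") and in_media:
            mids.add(line[len("a=mid:"):])
    if len(groups) > 1:
        raise ValueError("more than one supim group in the session")
    if not groups:
        return []
    return [tag for tag in groups[0] if tag in mids]
```

   A receiver that does not support "supim" would skip this step and
   treat every "m" line as an independent media stream.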

218	6.  "superimposition" Attribute for Superimposition Group Identification
219	    Attribute

221	   This memo defines a new media-level attribute, "superimposition",
222	   with the following ABNF [RFC5234].  The identification-tag is defined
223	   in [RFC5888].

225	           superimposition-attribute =
226	                   "superimposition:" super-opt *(SP super-opt)
227	           super-opt = super-trans / super-layer
228	           super-trans = "transparency:" super-trans-val
229	           super-layer = "layer:" super-layer-val
230	           super-trans-val = signed-integer ; range [-128, 127]
231	           super-layer-val = signed-integer ; range [0, 255]

233	           signed-integer =
234	                   <zero-based-integer defined in RFC8866>
235	                           / "-" <integer defined in RFC8866>
236	           attribute = <attribute defined in RFC8866>
237	           attribute =/ superimposition-attribute

239	   The transparency of the media stream is given by the super-trans-val
240	   value of the super-trans option.  The value MUST be an ASCII
241	   representation of an 8-bit signed integer between "-128" and "127",
242	   with linear weighting between the two extremes.  A value of -128
243	   means the media stream is opaque, and the highest value of 127 means
244	   it is transparent.  Further details of the interpretation are left
245	   open to the implementer.  The layering order value for the media
246	   stream is given by super-layer-val.  It MUST be an integer value
247	   between 0 and n, where the value 0 represents the deepest background
248	   layer.  For each k within 0..n, a reconstructed sample of the k-th
249	   media is superimposed (while perhaps applying a super-trans-val
250	   value) on the reconstructed samples of all lower layers in the same
251	   spatial position.  Each "m" line in a session MUST NOT contain more
252	   than one instance of the super-opt attribute.
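
   As an illustration of the ABNF in this section, the following Python
   sketch parses the text that follows "a=superimposition:".  It is one
   possible reading, not a reference implementation.  The function name
   parse_superimposition is hypothetical, and rejecting a repeated
   option is an assumption about how the "more than one instance" rule
   is meant to apply.

```python
import re

# One super-opt: a name, a colon, and an integer without leading zeros
# (zero-based-integer, or "-" followed by integer).
_SUPER_OPT = re.compile(r"(transparency|layer):(0|-?[1-9][0-9]*)")

# Value ranges taken from the comments in the ABNF.
_RANGES = {"transparency": (-128, 127), "layer": (0, 255)}


def parse_superimposition(value):
    """Parse e.g. "transparency:35 layer:1" into a dict of integers."""
    result = {}
    for token in value.split(" "):
        match = _SUPER_OPT.fullmatch(token)
        if match is None:
            raise ValueError(f"malformed super-opt: {token!r}")
        name, number = match.group(1), int(match.group(2))
        if name in result:
            # Assumption: each option may appear at most once.
            raise ValueError(f"duplicate option: {name}")
        low, high = _RANGES[name]
        if not low <= number <= high:
            raise ValueError(f"{name} value {number} out of range")
        result[name] = number
    return result
```

   Either option may be absent in this sketch, since the ABNF requires
   only one super-opt; a renderer would then need its own default.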

254	7.  Example of Supim

256	   The following example shows a session description for superimposed
257	   media streams in an SDP session.  The "group" line indicates that the
258	   "m" lines with tokens 1, 2 and 3 are grouped for the purpose of
259	   superimposition.

261	   In the example shown below, three media streams are being transmitted
262	   for superimposition.  The background media stream along with the
263	   foreground media streams are grouped together using "supim".  All
264	   media streams are videos with "superimposition" attribute.  The media
265	   stream with layer order value 0 is intended for background.

267	       v=0
268	       o=Alice 292742730 29277831 IN IP4 233.252.0.74
269	       c=IN IP4 233.252.0.79
270	       t=0 0
271	       a=group:supim 1 2 3
272	       m=video 30000 RTP/AVP 31
273	       a=mid:1
274	       a=superimposition:transparency:-128 layer:0
275	       m=video 30002 RTP/AVP 31
276	       a=mid:2
277	       a=superimposition:transparency:35 layer:1
278	       m=video 30003 RTP/AVP 31
279	       a=mid:3
280	       a=superimposition:transparency:75 layer:2

282	   The transparency value is used for composing the foreground with the
283	   background media [Wiki.Alpha-compositing].  This value itself does
284	   not define the transparency of each pixel but is applied to each
285	   pixel within a frame and defines the factor by which the transparency
286	   of each pixel within a frame is to be increased or decreased.  The
287	   "layer" value is relevant when two or more media streams are to be
288	   composed.  When the transparency value of the foreground is -128, the
289	   composed image will be the foreground image, as it is being displayed
290	   as opaque.  Similarly, if the transparency value for the foreground
291	   media is 127, the resulting image will be the background media, as
292	   the foreground media stream is being presented fully transparent,
293	   hence invisible.  The details of the weighting of foreground and
294	   background sample values based on a given super-trans value are left
295	   to the implementation, beyond the abstract definition that a value
296	   equal to -128 means opaque, a value equal to 127 means transparent,
297	   and the weighting is to be implemented such that it is visually
298	   linear for the values in between.  We do not define a weighting
299	   formula in this specification as these formulae would depend on many
300	   factors such as the colorspace and the sampling structure of the
301	   media.
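
   Purely as an illustration, one possible linear weighting is sketched
   below in Python.  This memo does not define a formula, so the mapping
   (127 - t) / 255 and the function names opacity and compose are
   assumptions made for this sketch only.  It works on single scalar
   sample values and ignores colorspace and sampling structure.

```python
def opacity(transparency):
    """Map a transparency value in [-128, 127] linearly to [0.0, 1.0].

    Assumed mapping: -128 gives 1.0 (opaque), 127 gives 0.0 (transparent).
    """
    if not -128 <= transparency <= 127:
        raise ValueError("transparency out of range")
    return (127 - transparency) / 255


def compose(samples):
    """Blend co-located samples given as (layer, transparency, value).

    Samples are applied from the lowest layer value upwards, so layer 0
    ends up as the deepest background.
    """
    result = 0.0
    for _, transparency, value in sorted(samples, key=lambda s: s[0]):
        weight = opacity(transparency)
        result = weight * value + (1.0 - weight) * result
    return result
```

   With this mapping, an opaque foreground replaces the background
   sample, and a fully transparent foreground leaves it unchanged, which
   matches the two extremes described above.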

303	8.  Relationship with Existing Specifications (informative)

305	   Edt. Note: maybe we remove this section later once there is a general
306	   understanding why the existing specifications in their current form
307	   are unsuitable.  The CLUE framework [RFC8845] is the IETF's chosen
308	   technology for applications that need to define multiple
309	   "captures" (camera views), and their geo-spatial relationship to
310	   each other.  However, information pertaining to display/rendering is
311	   outside of CLUE's scope.  While many CLUE-capable receivers infer
312	   appropriate rendering strategies from the information offered by
313	   CLUE, the CLUE framework has generally assumed non-overlapped
314	   rendering of transmitted and reconstructed video streams from the
315	   multiple captures, often on different physical rendering devices.
316	   We therefore concluded that the CLUE framework neither supports the
317	   application we contemplate in this memo, nor would it be sensible to
318	   enhance the CLUE specifications with rendering-related mechanisms.
319	   There are certain technologies from standards bodies such as MPEG
320	   [MPEG-4], often described as "scene descriptions", that to a certain
321	   extent can address the applications we contemplate.  We evaluated the
322	   technologies we are aware of and concluded that something different
323	   is required.  We base our assumption on a) the complexity of these
324	   mechanisms, and b) their design as a metadata media stream, which in
325	   the IETF context would be conveyed in RTP sessions or similar, rather
326	   than a static or semi-static stream description that is best conveyed
327	   at session setup or renegotiation using SDP.

329	9.  Security Considerations

331	   All security considerations as defined in [RFC5888] apply:

333	   Using the "group" parameter with FID semantics, an entity that
334	   managed to modify the session descriptions exchanged between the
335	   participants to establish a multimedia session could force the
336	   participants to send a copy of the media to any destination of its
337	   choosing.

339	   Integrity mechanisms provided by protocols used to exchange session
340	   descriptions and media encryption can be used to prevent this attack.
341	   In SIP, Secure/Multipurpose Internet Mail Extensions (S/MIME)
342	   [RFC8550] and Transport Layer Security (TLS) [RFC8446] can be used to
343	   protect session description exchanges in an end-to-end and a
344	   hop-by-hop fashion, respectively.

346	10.  IANA Considerations

348	   The following contact information shall be used for all registrations
349	   included here:

351	       Rohit Abhishek  <rabhishek@rabhishek.com>
352	       Stephan Wenger <stewe@stewe.org>
353	       The IETF MMUSIC working group <mmusic@ietf.org> or its successor
354	                                              as designated by the IESG.

356	   This document defines a new SDP group semantics value for media
357	   superimposition in an SDP session.  This attribute can be used by the
358	   application to group the foreground and the background media streams
359	   to be superimposed together in a session.  Semantics values to be
360	   used with this framework should be registered by the IANA following
361	   the Standards Action policy [RFC8126].  This document adds a new
362	   group semantics value to the "sdp-parameters" registry group defined
363	   in [RFC5888] [RFC8859].

365	   IANA is requested to register the following semantics value in the
366	   "sdp-parameters" registry.

368	   Semantics             Token          Reference
369	   ----------------------------------------------
370	   Superimposition       supim          RFCXXXX

372	   The "supim" attribute is used to group different media streams to be
373	   superimposed together, with one background media stream and the rest
374	   as foreground streams.  Its format is defined in Section 4.

376	   IANA is requested to register the SDP media-level attribute
377	   "superimposition" under "sdp-attributes (media level only)".
378	   The registration procedure in [RFC8866] applies.

380	   SDP Attribute ("sdp-attributes(media level only)"):

382	         Attribute name: superimposition: transparency, layer
383	         Long form: superimposition transparency, superimposition layer
384	         Type of name: att-field
385	         Type of attribute: media level only
386	         Subject to charset: no
387	         Purpose: Transparency and layer order of superimposed media
388	         Reference: RFCXXXX
389	         Values: super-trans-val, super-layer-val

391	11.  Acknowledgements

393	   The authors would like to thank Christer Holmberg and Paul Kyzivat
394	   for reviewing the draft and providing key ideas.

396	12.  References

398	12.1.  Normative References

400	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
401	              Requirement Levels", BCP 14, RFC 2119,
402	              DOI 10.17487/RFC2119, March 1997,
403	              <https://www.rfc-editor.org/info/rfc2119>.

405	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
406	              A., Peterson, J., Sparks, R., Handley, M., and E.
407	              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
408	              DOI 10.17487/RFC3261, June 2002,
409	              <https://www.rfc-editor.org/info/rfc3261>.

411	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
412	              Jacobson, "RTP: A Transport Protocol for Real-Time
413	              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
414	              July 2003, <https://www.rfc-editor.org/info/rfc3550>.

416	   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
417	              Specifications: ABNF", STD 68, RFC 5234,
418	              DOI 10.17487/RFC5234, January 2008,
419	              <https://www.rfc-editor.org/info/rfc5234>.

421	   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
422	              Protocol (SDP) Grouping Framework", RFC 5888,
423	              DOI 10.17487/RFC5888, June 2010,
424	              <https://www.rfc-editor.org/info/rfc5888>.

426	   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
427	              Writing an IANA Considerations Section in RFCs", BCP 26,
428	              RFC 8126, DOI 10.17487/RFC8126, June 2017,
429	              <https://www.rfc-editor.org/info/rfc8126>.

431	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
432	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
433	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

435	   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
436	              Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018,
437	              <https://www.rfc-editor.org/info/rfc8446>.

439	   [RFC8550]  Schaad, J., Ramsdell, B., and S. Turner, "Secure/
440	              Multipurpose Internet Mail Extensions (S/MIME) Version 4.0
441	              Certificate Handling", RFC 8550, DOI 10.17487/RFC8550,
442	              April 2019, <https://www.rfc-editor.org/info/rfc8550>.

444	   [RFC8859]  Nandakumar, S., "A Framework for Session Description
445	              Protocol (SDP) Attributes When Multiplexing", RFC 8859,
446	              DOI 10.17487/RFC8859, January 2021,
447	              <https://www.rfc-editor.org/info/rfc8859>.

449	   [RFC8866]  Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP:
450	              Session Description Protocol", RFC 8866,
451	              DOI 10.17487/RFC8866, January 2021,
452	              <https://www.rfc-editor.org/info/rfc8866>.

454	12.2.  Informative References

456	   [draft-ietf-avtcore-multiplex-guidelines-12]
457	              Westerlund, M., Burman, B., Perkins, C., Alvestrand, H.,
458	              and R. Even, "Guidelines for using the Multiplexing
459	              Features of RTP to Support Multiple Media Streams", draft-
460	              ietf-avtcore-multiplex-guidelines-12 (work in progress),
461	              June 2020.

463	   [MPEG-4]   "MPEG-4 Scene Description and Application Engine",
464	              <https://mpeg.chiariglione.org/standards/mpeg-4/scene-
465	              description-and-application-engine>.

467	   [RFC8845]  Duckworth, M., Ed., Pepperell, A., and S. Wenger,
468	              "Framework for Telepresence Multi-Streams", RFC 8845,
469	              DOI 10.17487/RFC8845, January 2021,
470	              <https://www.rfc-editor.org/info/rfc8845>.

472	   [Wiki.Alpha-compositing]
473	              "Alpha compositing",
474	              <https://en.wikipedia.org/wiki/Alpha_compositing>.

476	Authors' Addresses

478	   Rohit Abhishek
479	   Tencent
480	   2747 Park Blvd
481	   Palo Alto  94588
482	   USA

484	   Email: rabhishek@rabhishek.com

486	   Stephan Wenger
487	   Tencent
488	   2747 Park Blvd
489	   Palo Alto  94588
490	   USA

492	   Email: stewe@stewe.org