idnits 2.17.1

draft-abhishek-mmusic-superimposition-grouping-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'MUST not' in this paragraph:

     All group and mid attributes MUST follow the rules defined in
     [RFC5888]. The "mid" attribute MUST be used for all "m" lines
     covering visual media within a session description for which a
     foreground/background relationship is to be defined. The foreground/
     background relationship of visual media within a session description
     that is not covered in a group is undefined. Multiple groups MUST
     not be used within one session. If the identification-tags
     associated with "a=group" lines do not map to any "m" lines, the
     identification-tags MUST be ignored.

  -- The document date (June 1, 2021) is 1060 days in the past.  Is this
     intentional?
  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '-128' is mentioned on line 230, but not defined

  -- Looks like a reference, but probably isn't: '127' on line 230

  -- Looks like a reference, but probably isn't: '0' on line 231

  -- Looks like a reference, but probably isn't: '255' on line 231

     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2  mmusic                                                       R. Abhishek
3  Internet-Draft                                                 S. Wenger
4  Intended status: Standards Track                                 Tencent
5  Expires: December 3, 2021                                   June 1, 2021

7                   SDP Superimposition Grouping framework
8             draft-abhishek-mmusic-superimposition-grouping-02

10  Abstract

12     This document defines semantics that allow for signaling a new SDP
13     group "supim" for superimposed media in an SDP session. The "supim"
14     attribute can be used by the application to relate all the fully or
15     partly superimposed visual media streams enabling them to be added as
16     an overlay on top of any one or more background visual media streams.
17     The superimposition grouping semantics is helpful if the media stream
18     data is separate and transported via different sessions.

20  Status of This Memo

22     This Internet-Draft is submitted in full conformance with the
23     provisions of BCP 78 and BCP 79.

25     Internet-Drafts are working documents of the Internet Engineering
26     Task Force (IETF). Note that other groups may also distribute
27     working documents as Internet-Drafts. The list of current Internet-
28     Drafts is at https://datatracker.ietf.org/drafts/current/.

30     Internet-Drafts are draft documents valid for a maximum of six months
31     and may be updated, replaced, or obsoleted by other documents at any
32     time. It is inappropriate to use Internet-Drafts as reference
33     material or to cite them other than as "work in progress."

35     This Internet-Draft will expire on December 3, 2021.

37  Copyright Notice

39     Copyright (c) 2021 IETF Trust and the persons identified as the
40     document authors. All rights reserved.

42     This document is subject to BCP 78 and the IETF Trust's Legal
43     Provisions Relating to IETF Documents
44     (https://trustee.ietf.org/license-info) in effect on the date of
45     publication of this document. Please review these documents
46     carefully, as they describe your rights and restrictions with respect
47     to this document. Code Components extracted from this document must
48     include Simplified BSD License text as described in Section 4.e of
49     the Trust Legal Provisions and are provided without warranty as
50     described in the Simplified BSD License.

52  Table of Contents

54     1.  Introduction
55     2.  Terminology
56     3.  Media Superimposition in SDP
57     4.  Superimposition Group Identification Attribute
58     5.  Use of group and mid
59     6.  "superimposition" Attribute for Superimposition Group
60         Identification Attribute
61     7.  Example of Supim
62     8.  Relationship with Existing Specifications (informative)
63     9.  Security Considerations
64     10. IANA Considerations
65     11. Acknowledgements
66     12. References
67       12.1.  Normative References
68       12.2.  Informative References
69     Authors' Addresses

71  1.  Introduction

73     This document defines semantics that allow for signaling a new SDP
74     group "supim" for superimposed media in an SDP session. The "supim"
75     attribute can be used by the application to relate all the fully or
76     partly superimposed visual media streams enabling them to be added as
77     an overlay on top of any one or more background visual media streams.
78     The superimposition grouping semantics is helpful if the media stream
79     data is separate and transported via different sessions.

81     Media superimposition herein is defined to be a visual media stream
82     (video/image/text) that is fully or partly superimposed on top of an
83     already existing visual media stream such that the resulting
84     foreground and background media can be displayed simultaneously.
85     Superimposition can be recursive in that visual media that is
86     superimposed against its background can, in turn, be the background
87     of another superimposed visual media. The superimposed visual media
88     displayed over a background media content may be anywhere between
89     opaque and transparent. Examples of applications for video
90     superimposition include real-time multi-party gaming, where these
91     superimposed media may be used to provide additional details or stats
92     about each player, or multi-party teleconferencing where visual media
93     from users in the teleconference may be superimposed over a
94     background media or over each other.

96     This document describes new SDP group semantics for grouping
97     superimposed media in an SDP session. An SDP session description
98     consists of one or multiple media lines known as "m" lines which can
99     be identified by a token carried in a "mid" attribute. The SDP
100     session description carries a session-level "group" attribute that groups
101     different media lines using defined group semantics.
       The semantics
102     defined in this memo are to be used in conjunction with "The Session
103     Description Protocol (SDP) Grouping Framework" [RFC5888].

105     We have studied the existing specifications, including the CLUE
106     framework [RFC8845] and work in MPEG, and found that such work does not
107     cover our intended application space; please refer to Section 8
108     for details. The superimposition grouping as described below enables
109     a compliant receiver/renderer implementation to know the relative
110     relevance of the visual media as coded by the sender(s) and, in a
111     compliant implementation, observed by the renderer through
112     superimposition when needed.

114  2.  Terminology

116     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
117     "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
118     "OPTIONAL" in this document are to be interpreted as described in BCP
119     14 [RFC2119] [RFC8174] when, and only when, they appear in all
120     capitals, as shown here.

122  3.  Media Superimposition in SDP

124     SDP is predominantly used for describing the format for multimedia
125     communication sessions. Many SDP-based systems use open standards
126     such as RTP [RFC3550] for media transport and SIP [RFC3261] for
127     session setup and control. An SDP session may contain more than one
128     media description, with each media description identified by
129     an "m=" line. Each such line denotes a single media stream. If multiple
130     visual media lines are present in a session, their rendering aspects,
131     including their possible superimposition (foreground/background)
132     relationship at the rendering device, are currently undefined. This
133     memo introduces a mechanism through which certain rendering information
134     becomes available. The rendering information herein is limited to
135     the foreground/background relationship of each grouped media to other
136     media streams through a layer order value, and optionally a
137     transparency value.
       Where, spatially, the media is rendered is not
138     covered by this memo, and is in many application scenarios a function
139     of the user interface. An example is shown in Figure 1, where three
140     foreground media streams have been superimposed over a background
141     media stream, with Media B being partly superimposed over Media C.

143         _____________________________________
144        | =================                   |
145        | ==== Media A ====                   |
146        | =================                   |
147        | =================                   |
148        |                 +++++++++++++++++   |
149        |                 ++++ Media B ++++   |
150        |     ############+++++++++++++++++   |
151        |     ############+++++++++++++++++   |
152        |     #### Media C ####               |
153        |     #################               |
154        |_____________________________________|

156               Figure 1: An example of media superimposition

158     Of course, a renderer may not
159     have to rely on superimposition mechanisms at all: when there is
160     enough screen real-estate available, a valid display strategy may
161     well be to show all media without overlapping and hence without
162     superimposition. However, when the screen real-estate becomes
163     insufficient, the information provided by the mechanisms defined
164     in this memo can be used to order (in the sense of foreground to
165     background) the visual media according to a hierarchy chosen by the
166     sender or a MANE (media-aware network element), based on their
167     application knowledge.

169     When multiple superimposed streams are transmitted within a session,
170     the receiver needs to be able to relate the media streams to each
171     other. This is achieved by the SDP grouping framework [RFC5888] by
172     using the "group" attribute that groups different "m" lines in a
173     session.
       By using the new superimposition group semantics defined in this
174     memo, a group's media streams can be uniquely identified across
175     multiple SDP descriptions exchanged with different receivers, thereby
176     identifying the streams in terms of their role in the session
177     irrespective of their media type and transport protocol. These
178     superimposed streams within the group may be multiplexed based on the
179     guidelines defined in [draft-ietf-avtcore-multiplex-guidelines-12].

181  4.  Superimposition Group Identification Attribute

183     The "superimposition media stream identification" attribute, "supim",
184     is used to identify the relationship of superimposed media streams
185     within a session description. In a superimposition group, the media
186     lines MAY have different media formats. There is no defined behavior
187     for the rendering of non-visual media being grouped in a
188     superimposition group. It is assumed that all media streams
189     that need to be time-synchronized are time-synchronized. The
190     formatting of "supim" follows [RFC5888] in the use of the "mid" attribute to
191     identify the media line to be included in the superimposition.

193     The "supim" group is used for grouping the foreground and background media
194     streams intended for composition, with the foreground media
195     to be superimposed over the background media stream. A media player
196     that chooses to implement the extension and receives a session
197     description that contains "m" lines grouped together using "supim"
198     semantics is able to superimpose the foreground media streams on top
199     of the background media stream in cases where there is overlap. For
200     non-supporting devices, these media streams are treated as
201     independent media streams.

203  5.  Use of group and mid

205     All group and mid attributes MUST follow the rules defined in
206     [RFC5888].
       The "mid" attribute MUST be used for all "m" lines
207     covering visual media within a session description for which a
208     foreground/background relationship is to be defined. The
209     foreground/background relationship of visual media within a session description
210     that is not covered in a group is undefined. Multiple groups MUST
211     NOT be used within one session. If the identification-tags
212     associated with "a=group" lines do not map to any "m" lines, the
213     identification-tags MUST be ignored.

215     semantics =/ "supim"  ; semantics extension
216                           ; as defined in RFC5888

218  6.  "superimposition" Attribute for Superimposition Group Identification
219      Attribute

221     This memo defines a new media-level attribute, "superimposition",
222     with the following ABNF [RFC5234]. The identification-tag is defined
223     in [RFC5888].

225     superimposition-attribute =
226       "superimposition:" super-opt *(SP super-opt)
227     super-opt = super-trans / super-layer
228     super-trans = "transparency:" super-trans-val
229     super-layer = "layer:" super-layer-val
230     super-trans-val = signed-integer    ; range [-128, 127]
231     super-layer-val = signed-integer    ; range [0, 255]

233     signed-integer =
234
235                      / "-"
236     attribute =
237     attribute =/ superimposition-attribute

239     The transparency for the media stream is identified by its
240     super-trans-val value in the super-trans attribute. The value MUST be an
241     ASCII representation of an 8-bit signed integer with values between
242     "-128" and "127", and linear weighting between the two extremes. A
243     value of -128 means the media stream is opaque, and the highest value
244     of 127 means it is transparent. Further details of interpretation are
245     left open to the implementer. The layering order value for the
246     media stream is identified by super-layer-val. It MUST be an integer
247     value between 0 and n, where the value 0 represents the deepest
248     background layer.
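   As a non-normative illustration of the ABNF above, the following
   Python sketch parses a single "a=superimposition:" line and checks
   the value ranges given in the ABNF comments. The function name
   parse_superimposition is hypothetical. The sketch assumes that
   signed-integer is an optional "-" followed by decimal digits, and it
   chooses to reject a repeated option; neither choice is mandated by
   this memo.

```python
import re

# Hypothetical helper, not defined by this memo.
# One super-opt: "transparency:" or "layer:" followed by a signed integer.
# Assumption: signed-integer is an optional "-" followed by decimal digits.
_SUPER_OPT = re.compile(r"(transparency|layer):(-?[0-9]+)")

# Value ranges taken from the comments in the ABNF.
_RANGES = {"transparency": (-128, 127), "layer": (0, 255)}


def parse_superimposition(line):
    """Parse 'a=superimposition:...' into a dict of integer options."""
    prefix = "a=superimposition:"
    if not line.startswith(prefix):
        raise ValueError("not a superimposition attribute")
    options = {}
    # The ABNF separates super-opt entries with a single SP.
    for token in line[len(prefix):].split(" "):
        match = _SUPER_OPT.fullmatch(token)
        if match is None:
            raise ValueError(f"malformed super-opt: {token!r}")
        name, value = match.group(1), int(match.group(2))
        low, high = _RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} out of range: {value}")
        if name in options:
            # Assumption of this sketch: a repeated option is an error.
            raise ValueError(f"duplicate option: {name}")
        options[name] = value
    return options
```

   A receiver could call such a helper once per "m" section and then
   order the grouped streams by their "layer" value before rendering.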
       For each k within 0..n, a reconstructed sample of
249     the k-th media is superimposed (while perhaps applying a
250     super-trans-val value) on the 0 to k-th reconstructed samples in the same
251     spatial position. Each "m" line in a session MUST NOT contain more
252     than one instance of super-opt attribute.

254  7.  Example of Supim

256     The following example shows a session description for superimposed
257     media streams in an SDP session. The "group" line indicates that the
258     "m" lines with tokens 1, 2 and 3 are grouped for the purpose of
259     superimposition.

261     In the example shown below, three media streams are being transmitted
262     for superimposition. The background media stream along with the
263     foreground media streams are grouped together using "supim". All
264     media streams are video streams carrying the "superimposition" attribute. The media
265     stream with layer order value 0 is intended for background.

267     v=0
268     o=Alice 292742730 29277831 IN IP4 233.252.0.74
269     c=IN IP4 233.252.0.79
270     t=0 0
271     a=group:supim 1 2 3
272     m=video 30000 RTP/AVP 31
273     a=mid:1
274     a=superimposition:transparency:-128 layer:0
275     m=video 30002 RTP/AVP 31
276     a=mid:2
277     a=superimposition:transparency:35 layer:1
278     m=video 30003 RTP/AVP 31
279     a=mid:3
280     a=superimposition:transparency:75 layer:2

282     The transparency value is used for composing the foreground with the
283     background media [Wiki.Alpha-compositing]. This value itself does
284     not define the transparency of each pixel but is applied to each
285     pixel within a frame and defines the factor by which the transparency
286     of each pixel within a frame is to be increased or decreased. The
287     "layer" value is relevant when two or more media streams are to be
288     composed. When the transparency value of the foreground is -128, the
289     composed image will be the foreground image, as it is being displayed
290     as opaque.
       Similarly, if the transparency value for the foreground
291     media is 127, the resulting image will be the background media, as
292     the foreground media stream is being presented fully transparent,
293     hence invisible. The details of the weighting of foreground and
294     background sample values based on a given super-trans value are left
295     to the implementation, beyond the abstract definition that a value
296     of -128 means opaque, a value of 127 means transparent,
297     and the weighting is to be implemented such that it is visually
298     linear for the values in between. We do not define a weighting
299     formula in this specification as such formulae would depend on many
300     factors such as the colorspace and the sampling structure of the
301     media.

303  8.  Relationship with Existing Specifications (informative)

305     Editor's note: this section may be removed later, once there is a general
306     understanding of why the existing specifications in their current form are
307     unsuitable. The CLUE framework [RFC8845] is the IETF's chosen
308     technology for applications that require defining multiple
309     "captures" (camera views) and their geo-spatial relationship to
310     each other. However, information pertaining to display/rendering is
311     outside of CLUE's scope. While many CLUE-capable receivers infer
312     appropriate rendering strategies from the information offered by
313     CLUE, the CLUE framework has generally assumed non-overlapped
314     rendering of transmitted and reconstructed video streams from the
315     multiple captures, often on different physical rendering devices.
316     We therefore concluded that the CLUE framework neither supports the
317     application we contemplate in this memo, nor would it be sensible to
318     enhance the CLUE specifications with rendering-related mechanisms.
319     There are certain technologies from standards bodies such as MPEG
320     [MPEG-4], often described as "scene descriptions", that to a certain
321     extent can address the applications we contemplate. We evaluated the
322     technologies we are aware of and concluded that something different
323     is required. We base our conclusion on a) the complexity of these
324     mechanisms, and b) their design as a metadata media stream, which in
325     the IETF context would be conveyed in RTP sessions or similar, rather
326     than a static or semi-static stream description that is best conveyed
327     at session setup or renegotiation using SDP.

329  9.  Security Considerations

331     All security considerations as defined in [RFC5888] apply:

333     Using the "group" parameter with FID semantics, an entity that
334     managed to modify the session descriptions exchanged between the
335     participants to establish a multimedia session could force the
336     participants to send a copy of the media to any destination of its
337     choosing.

339     Integrity mechanisms provided by protocols used to exchange session
340     descriptions and media encryption can be used to prevent this attack.
341     In SIP, Secure/Multipurpose Internet Mail Extensions (S/MIME)
342     [RFC8550] and Transport Layer Security (TLS) [RFC8446] can be used to
343     protect session description exchanges in an end-to-end and a
344     hop-by-hop fashion, respectively.

346  10.  IANA Considerations

348     The following contact information shall be used for all registrations
349     included here:

351     Rohit Abhishek
352     Stephan Wenger
353     The IETF MMUSIC working group or its successor
354     as designated by the IESG.

356     This document defines a new SDP group semantics value for media
357     superimposition for an SDP session. This attribute can be used by the
358     application to group the foreground and the background media streams
359     to be superimposed together in a session.
       Semantics values to be
360     used with this framework should be registered by the IANA following
361     the Standards Action policy [RFC8126]. This document adds a new
362     group semantics value to the sdp-parameters registry group defined in
363     [RFC5888] [RFC8859].

365     IANA is requested to register the following semantics value in the
366     "sdp-parameters" registry.

368     Semantics              Token     Reference
369     ----------------------------------------------
370     Superimposition        supim     RFCXXXX

372     The "supim" attribute is used to group different media streams to be
373     superimposed together, with one background media stream and the rest
374     being foreground streams. Its format is defined in Section 4.

376     IANA is requested to register the semantics value for SDP media-level
377     attribute "superimposition" for "sdp-attributes(media-level only)".
378     The registration procedure in [RFC8866] applies.

380     SDP Attribute ("sdp-attributes(media level only)"):

382     Attribute name: superimposition: transparency, layer
383     Long form: superimposition transparency, superimposition layer
384     Type of name: att-field
385     Type of attribute: media level only
386     Subject to charset: no
387     Purpose: RFC 5583
388     Reference: RFC 5583
389     Values: super-trans-val, super-layer-val

391  11.  Acknowledgements

393     The authors would like to thank Christer Holmberg and Paul Kyzivat
394     for reviewing the draft and providing key ideas.

396  12.  References

398  12.1.  Normative References

400     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
401                Requirement Levels", BCP 14, RFC 2119,
402                DOI 10.17487/RFC2119, March 1997.

405     [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
406                A., Peterson, J., Sparks, R., Handley, M., and E.
407                Schooler, "SIP: Session Initiation Protocol", RFC 3261,
408                DOI 10.17487/RFC3261, June 2002.

411     [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
412                Jacobson, "RTP: A Transport Protocol for Real-Time
413                Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
414                July 2003.

416     [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
417                Specifications: ABNF", STD 68, RFC 5234,
418                DOI 10.17487/RFC5234, January 2008.

421     [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
422                Protocol (SDP) Grouping Framework", RFC 5888,
423                DOI 10.17487/RFC5888, June 2010.

426     [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
427                Writing an IANA Considerations Section in RFCs", BCP 26,
428                RFC 8126, DOI 10.17487/RFC8126, June 2017.

431     [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
432                2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
433                May 2017.

435     [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
436                Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018.

439     [RFC8550]  Schaad, J., Ramsdell, B., and S. Turner, "Secure/
440                Multipurpose Internet Mail Extensions (S/MIME) Version 4.0
441                Certificate Handling", RFC 8550, DOI 10.17487/RFC8550,
442                April 2019.

444     [RFC8859]  Nandakumar, S., "A Framework for Session Description
445                Protocol (SDP) Attributes When Multiplexing", RFC 8859,
446                DOI 10.17487/RFC8859, January 2021.

449     [RFC8866]  Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP:
450                Session Description Protocol", RFC 8866,
451                DOI 10.17487/RFC8866, January 2021.

454  12.2.  Informative References

456     [draft-ietf-avtcore-multiplex-guidelines-12]
457                Westerlund, M., Burman, B., Perkins, C., Alvestrand, H.,
458                and R. Even, "Guidelines for using the Multiplexing
459                Features of RTP to Support Multiple Media Streams", draft-
460                ietf-avtcore-multiplex-guidelines-12 (work in progress),
461                June 2020.

463     [MPEG-4]   "MPEG-4 Scene Description and Application Engine".

467     [RFC8845]  Duckworth, M., Ed., Pepperell, A., and S.
                Wenger,
468                "Framework for Telepresence Multi-Streams", RFC 8845,
469                DOI 10.17487/RFC8845, January 2021.

472     [Wiki.Alpha-compositing]
473                "Alpha compositing".

476  Authors' Addresses

478     Rohit Abhishek
479     Tencent
480     2747 Park Blvd
481     Palo Alto 94588
482     USA

484     Email: rabhishek@rabhishek.com

486     Stephan Wenger
487     Tencent
488     2747 Park Blvd
489     Palo Alto 94588
490     USA

492     Email: stewe@stewe.org