mmusic                                                       R. Abhishek
Internet-Draft                                                 S. Wenger
Intended status: Standards Track                                 Tencent
Expires: August 5, 2021                                 February 1, 2021


                 SDP Superimposition Grouping framework
           draft-abhishek-mmusic-superimposition-grouping-01

Abstract

   This document defines semantics that allow for signaling a new SDP
   group "supim" for superimposed media in an SDP session.  The "supim"
   attribute can be used by the application to relate all the
   superimposed visual media streams enabling them to be added as an
   overlay on top of any visual media stream.
   The superimposition grouping semantics is helpful if the media data
   is separate and transported via different sessions.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 5, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Superimposition Group Identification Attribute
   4.  Use of group and mid
   5.  "superposition" Attribute for Superimposition Group
       Identification Attribute
   6.  Example of Supim
   7.  Relationship with CLUE (informative)
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. References
       11.1.  Normative References
       11.2.  Informative References
   Authors' Addresses

1.  Introduction

   Media superimposition is defined herein as the placement of visual
   media (video, image, or text) on top of already existing visual
   media such that the resulting foreground and background media can be
   displayed simultaneously.  Superimposition can be recursive in that
   visual media that is superimposed against its background can, in
   turn, be the background of another superimposed visual media.  The
   superimposed visual media displayed over a background media content
   may be anywhere between opaque and transparent.  Examples of
   applications for video superimposition include real-time multi-party
   gaming, where the superimposed media may be used to provide
   additional details or stats about each player, and multi-party
   teleconferencing, where visual media from users in the
   teleconference may be superimposed on a background media or over
   each other.  An example is shown in the figure below, where three
   foreground media have been superimposed over a background media,
   with one foreground media being partly superimposed over another
   foreground media.

                ----------------------------
               | Background media           |
               |          _________         |
               |         | Media A |        |
               |         |_________|        |
               |              __________    |
               |       ______|__ Media B|   |
               |      |Media |__|_______|   |
               |      |_C_______|           |
                ----------------------------

            Figure 1: An example of media superimposition

   SDP is predominantly used for describing the format for multimedia
   communication sessions.  Many SDP-based systems use open standards
   such as RTP [RFC3550] for media transport and SIP [RFC3261] for
   session setup and control.  An SDP session may contain more than one
   media description, with each media description identified by an "m="
   line.  Each such line denotes a single media stream.  If multiple
   visual media lines are present in a session, their superimposition
   (foreground/background) relationship at the rendering device is
   currently undefined.  This memo introduces a mechanism through which
   certain rendering information becomes available.  The rendering
   information herein is limited to the foreground/background
   relationship of the grouped media vis-a-vis each other, expressed
   through a layer order value and, optionally, a transparency value.
   Where, spatially, the media is rendered is not covered by this memo,
   and is in many application scenarios a function of the user
   interface.  The CLUE framework [RFC8845] is available when an
   application needs to define captures (camera ports) and their
   geo-spatial relationship to each other.  The superimposition
   grouping as described below enables a compliant receiver/renderer
   implementation to know the relative relevance of the visual media as
   coded by the sender(s) and, in a compliant implementation, observed
   by the renderer through superimposition when needed.
   Of course, a renderer may not have to rely on superimposition
   mechanisms at all: when enough screen real estate is available, a
   valid display strategy may well be to show all media without overlap
   and hence without superimposition.  However, when screen real estate
   becomes insufficient, the information provided by the mechanisms
   defined in this memo can be used to order (in the sense of
   foreground to background) the visual media according to a hierarchy
   chosen by the sender or a middlebox based on their application
   knowledge.

   When multiple superimposed streams are transmitted within a session,
   the receiver needs to be able to relate the media streams to each
   other.  This is achieved by the SDP grouping framework [RFC5888],
   whose "group" attribute groups different "m" lines in a session.  By
   using the new superimposition group semantics defined in this memo,
   a group's media streams can be uniquely identified across multiple
   SDP descriptions exchanged with different receivers, thereby
   identifying the streams in terms of their role in the session
   irrespective of their media type and transport protocol.  The
   superimposed streams within the group may be multiplexed based on
   the guidelines defined in
   [draft-ietf-avtcore-multiplex-guidelines-12].

   This document describes a new SDP group semantics for grouping
   superimposed media in an SDP session.  An SDP session description
   consists of one or multiple media lines, known as "m" lines, which
   can be identified by a token carried in a "mid" attribute.  A
   session-level "group" attribute groups different media lines using a
   defined group semantics.  The semantics defined in this memo is to
   be used in conjunction with "The Session Description Protocol (SDP)
   Grouping Framework" [RFC5888].

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Superimposition Group Identification Attribute

   The "superimposition media stream identification" attribute is used
   to identify the relationship of superimposed media streams within a
   session description.  In a superimposition group, the media lines
   MAY have different media formats but, to be meaningful, SHOULD be
   visual media.  There is no defined behavior for the rendering of
   non-visual media grouped in a superimposition group.  The group's
   formatting follows [RFC5888] in its use of the "mid" attribute to
   identify the media lines to be included in the superimposition.

   This document defines a new group semantics, "supim", which is used
   to identify the media streams of a superimposition group within a
   session description.  It groups the foreground and background media
   streams that are intended for composition, with the foreground media
   superimposed over the background media stream.  An application that
   chooses to implement this extension and receives a session
   description that contains "m" lines grouped together using "supim"
   semantics MUST superimpose the foreground media streams on top of
   the background media stream in case there is overlap.  Devices that
   do not support this extension treat these media streams as
   independent media streams.

4.  Use of group and mid

   All group and mid attributes MUST follow the rules defined in
   [RFC5888].  The "mid" attribute MUST be used for all "m" lines
   covering visual media within a session description for which a
   foreground/background relationship is to be defined.
   The foreground/background relationship of visual media within a
   session description that is not covered in a group is undefined.
   More than one superimposition group MUST NOT be used within one
   session.  If an identification-tag on an "a=group" line does not map
   to any "m" line, it MUST be ignored.

   semantics = "supim" / ; semantics extension
                         ; as defined in RFC5888

5.  "superposition" Attribute for Superimposition Group Identification
    Attribute

   This memo defines a new media-level attribute, "superposition", with
   the following ABNF [RFC5234].  The identification-tag is defined in
   [RFC5888].

   superimposition-attribute =
       "a=superposition:" "transparency:" transparency-tag,
                          "layer:" layer-tag
   transparency-tag   = transparency-value *("," transparency-value) CRLF
   transparency-value = alpha
   layer-tag          = layering-order *("," layering-order) CRLF
   layering-order     = beta

   Alpha describes the transparency for the media stream.  It is
   identified by its transparency-tag values in the
   transparency-attribute.  The transparency value must be an ASCII
   representation of an 8-bit signed integer with values between "-128"
   and "127", with linear weighting between the two extremes.  A value
   of -128 means the media stream is opaque, and the highest value of
   127 means it is transparent.  Beta represents the layering order
   value for the media stream.  The layering order value is an integer
   between 0 and n, where the value 0 represents the backmost layer.
   For each k within 0..n, a reconstructed sample of the k-th media is
   superimposed (while perhaps applying an alpha transparency value) on
   the reconstructed samples of media 0 to k-1 in the same spatial
   position.  The transparency attribute MUST be omitted for the layer
   with order 0; the default transparency value of -128 is applied to
   the background media stream.

6.  Example of Supim

   The following example shows a session description for superimposed
   media streams in an SDP session.  The "group" line indicates that
   the "m" lines with tokens 1, 2 and 3 are grouped for the purpose of
   superimposition.

   In the example shown below, three media streams are being
   transmitted for superimposition.  The background media stream and
   the foreground media streams are grouped together using "supim".
   All media streams are video and carry the "superposition" attribute.
   The media stream with layer order value 0 is intended as the
   background.

   v=0
   o=Alice 292742730 29277831 IN IP4 233.252.0.74
   c=IN IP4 233.252.0.79
   t=0 0
   a=group:supim 1 2 3
   m=video 30000 RTP/AVP 31
   a=mid:1
   a= superposition:transparency= -128, layer=0
   m=video 30002 RTP/AVP 31
   a=mid:2
   a= superposition:transparency=35, layer=1
   m=video 30003 RTP/AVP 31
   a=mid:3
   a= superposition:transparency=75, layer=2

   The transparency value is used for composing the foreground with the
   background media [Wiki.Alpha-compositing].  The "layer" value is
   relevant when two or more media streams are to be composed.  When
   the transparency value of the foreground is -128, the composed image
   will be the foreground image, as it is being displayed as opaque.
   Similarly, if the transparency value for the foreground media is
   127, the resulting image will be the background media, as the
   foreground media stream is being presented fully transparent, hence
   invisible.  The details of the weighting of foreground and
   background sample values based on a given alpha value are left
   undefined herein, beyond the abstract definition that alpha equal to
   -128 means opaque, alpha equal to 127 means transparent, and the
   weighting is to be implemented such that it is visually linear for
   the values in between.
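
   The layering and transparency rules above can be illustrated with a
   short, non-normative Python sketch.  It is not part of this
   specification and rests on several assumptions: the attribute is
   parsed in the form used by the example (with "=" separators, while
   the ABNF in Section 5 uses ":"), the names parse_supim_layers,
   opacity, and composite are hypothetical, and the blend is a plain
   linear interpolation of raw sample values, which is only one of many
   possible weightings.

```python
import re
from typing import NamedTuple, Optional

# Session description taken from the example in this section.
EXAMPLE_SDP = """\
v=0
o=Alice 292742730 29277831 IN IP4 233.252.0.74
c=IN IP4 233.252.0.79
t=0 0
a=group:supim 1 2 3
m=video 30000 RTP/AVP 31
a=mid:1
a= superposition:transparency= -128, layer=0
m=video 30002 RTP/AVP 31
a=mid:2
a= superposition:transparency=35, layer=1
m=video 30003 RTP/AVP 31
a=mid:3
a= superposition:transparency=75, layer=2
"""

# Assumption: tolerant pattern accepting "=" or ":" separators and stray
# spaces, because the example and the ABNF differ on the exact syntax.
_SUPERPOSITION = re.compile(
    r"^a=\s*superposition:\s*(?:transparency\s*[=:]\s*(-?\d+)\s*,\s*)?"
    r"layer\s*[=:]\s*(\d+)\s*$"
)


class Layer(NamedTuple):
    mid: str
    layer: int          # 0 is the background
    transparency: int   # -128 (opaque) .. 127 (transparent)


def parse_supim_layers(sdp: str) -> list[Layer]:
    """Return the media of the "supim" group, ordered background first."""
    grouped: list[str] = []
    found: dict[str, Layer] = {}
    mid: Optional[str] = None
    for raw in sdp.splitlines():
        line = raw.strip()
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "a=group:supim":
            grouped = parts[1:]
        elif line.startswith("m="):
            mid = None  # a new media description begins
        elif line.startswith("a=mid:"):
            mid = line[len("a=mid:"):]
        else:
            match = _SUPERPOSITION.match(line)
            if match and mid is not None:
                # A missing transparency defaults to -128 (opaque).
                alpha = int(match.group(1)) if match.group(1) else -128
                if not -128 <= alpha <= 127:
                    raise ValueError(f"transparency out of range: {alpha}")
                found[mid] = Layer(mid, int(match.group(2)), alpha)
    # Tags that map to no "m" line are ignored, as required by Section 4.
    return sorted((found[t] for t in grouped if t in found),
                  key=lambda entry: entry.layer)


def opacity(transparency: int) -> float:
    """Map -128 (opaque) .. 127 (transparent) linearly onto 1.0 .. 0.0."""
    return (127 - transparency) / 255


def composite(samples: dict[str, float], layers: list[Layer]) -> float:
    """Blend one co-located sample per media, starting at the background."""
    out = samples[layers[0].mid]
    for entry in layers[1:]:
        weight = opacity(entry.transparency)
        out = weight * samples[entry.mid] + (1 - weight) * out
    return out


if __name__ == "__main__":
    for entry in parse_supim_layers(EXAMPLE_SDP):
        print(entry.mid, entry.layer, round(opacity(entry.transparency), 3))
```

   The sketch processes layers strictly in ascending layer order, so
   each foreground stream is blended over the result of all layers
   beneath it, which mirrors the recursive superimposition described in
   Section 1.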
   This memo does not define a weighting formula, as such formulae
   would depend on many factors such as the colorspace and the sampling
   structure of the media.

7.  Relationship with CLUE (informative)

   Editor's note: this section may be removed later, once there is a
   general understanding of why CLUE in its current form is unsuitable.

   The CLUE framework [RFC8845] and its associated suite of I-Ds and
   RFCs describe a telepresence framework that, at first glance, seems
   to have a lot in common with the technology proposed herein.  CLUE
   defines captures (camera ports) and their geo-spatial relationship
   to each other.  A renderer can use this information to put the
   reconstructed samples of the streams from the various captures into
   a suitable arrangement such that visually pleasant rendering can be
   achieved.  However, CLUE does not describe the relative relevance of
   the captures.  For that reason, CLUE would need to be extended in a
   spirit very similar to the one described in this memo to achieve the
   desired functionality.  CLUE has not seen wide deployment outside
   its intended key application (large room, multiple camera
   telepresence systems).  It is not reasonable to assume that small
   systems would willingly implement the overhead the (comparatively
   complex) CLUE protocols require when a simple SDP extension can
   serve the same purpose.

8.  Security Considerations

   All security considerations as defined in [RFC5888] apply:

   Using the "group" parameter with FID semantics, an entity that
   managed to modify the session descriptions exchanged between the
   participants to establish a multimedia session could force the
   participants to send a copy of the media to any destination of its
   choosing.

   Integrity mechanisms provided by protocols used to exchange session
   descriptions and media encryption can be used to prevent this attack.
   In SIP, Secure/Multipurpose Internet Mail Extensions (S/MIME)
   [RFC8550] and Transport Layer Security (TLS) [RFC8446] can be used to
   protect session description exchanges in an end-to-end and a
   hop-by-hop fashion, respectively.

9.  IANA Considerations

   The following contact information shall be used for all registrations
   included here:

      Rohit Abhishek
      Stephan Wenger
      The IETF MMUSIC working group or its successor
      as designated by the IESG.

   This document defines a new SDP group semantics for media
   superimposition in an SDP session.  This attribute can be used by
   the application to group the foreground and the background media
   streams to be superimposed together in a session.  Semantics values
   to be used with this framework should be registered by the IANA
   following the Standards Action policy [RFC8126].  This document adds
   a new group semantics and follows the registry group defined in
   [RFC5888].

   The following semantics needs to be registered by IANA in "Semantics
   for the "group" SDP Attribute" under SDP Parameters.

   Semantics          Token     Reference
   ----------------------------------------------
   Superimposition    supim     RFCXXXX

   The "supim" attribute is used to group different media streams to be
   superimposed together, with one background media stream and the
   remaining streams as foreground streams.  Its format is defined in
   Section 3.

   The SDP media-level attribute "superposition" needs to be registered
   by IANA under "att-field (media level only)".  The registration
   procedure in [RFC8866] applies.

   SDP Attribute ("att-field (media level only)"):

      Attribute name:      superposition: transparency, layer
      Long form:           superimposition transparency,
                           superimposition layer
      Type of name:        att-field
      Type of attribute:   media level only
      Subject to charset:  no
      Purpose:             RFC 5583
      Reference:           RFC 5583
      Values:              alpha, beta

   The IANA Considerations section of the RFC MUST include the following
   information, which appears in the IANA registry along with the RFC
   number of the publication.

   o  A brief description of the semantics.

   o  Token to be used within the "group" attribute.  This token may be
      of any length, but SHOULD be no more than four characters long.

   o  Reference to a standards track RFC.

10.  Acknowledgements

   The authors would like to thank Christer Holmberg and Paul Kyzivat
   for reviewing the draft and providing key ideas.

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              DOI 10.17487/RFC3261, June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888,
              DOI 10.17487/RFC5888, June 2010.

   [RFC8126]  Cotton, M., Leiba, B., and T.
              Narten, "Guidelines for Writing an IANA Considerations
              Section in RFCs", BCP 26, RFC 8126,
              DOI 10.17487/RFC8126, June 2017.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018.

   [RFC8550]  Schaad, J., Ramsdell, B., and S. Turner, "Secure/
              Multipurpose Internet Mail Extensions (S/MIME) Version 4.0
              Certificate Handling", RFC 8550, DOI 10.17487/RFC8550,
              April 2019.

   [RFC8866]  Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP:
              Session Description Protocol", RFC 8866,
              DOI 10.17487/RFC8866, January 2021.

11.2.  Informative References

   [draft-ietf-avtcore-multiplex-guidelines-12]
              Westerlund, M., Burman, B., Perkins, C., Alvestrand, H.,
              and R. Even, "Guidelines for using the Multiplexing
              Features of RTP to Support Multiple Media Streams", draft-
              ietf-avtcore-multiplex-guidelines-12 (work in progress),
              June 2020.

   [RFC8845]  Duckworth, M., Ed., Pepperell, A., and S. Wenger,
              "Framework for Telepresence Multi-Streams", RFC 8845,
              DOI 10.17487/RFC8845, January 2021.

   [Wiki.Alpha-compositing]
              "Alpha compositing".

Authors' Addresses

   Rohit Abhishek
   Tencent
   2747 Park Blvd
   Palo Alto 94588
   USA

   Email: rabhishek@rabhishek.com


   Stephan Wenger
   Tencent
   2747 Park Blvd
   Palo Alto 94588
   USA

   Email: stewe@stewe.org