INTERNET-DRAFT                                                  R. Huang
Intended Status: Standards Track                                  Huawei
Expires: April 21, 2016                                 October 19, 2015

               Multi-view streams in SDP and RTP Sessions
                     draft-huang-mmusic-multiview-00

Abstract

   This document analyses the streaming options of multi-view
   applications, and describes the required SDP signaling for them.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1 Introduction
   2 Terminology
   3. Use Cases
     3.1. 3D Channel of IPTV
     3.2 Multi-view video conference
       3.2.3 Tele-education with Free Viewpoint Video
       3.2.4 Immersive Telepresence with Multi-view Video
   4. Multi-view Video Transmission
   5. SDP Signaling Requirements for Multi-view Video
   6. Gap Analysis
     6.1. RFC5583
     6.2. Simulcast
     6.3. RID
     6.4. CLUE
     6.5. 3D Signaling
   7. Possible Solutions
   8. Security Considerations
   9. IANA Considerations
   10. Acknowledgments
   11. References
     11.1 Normative References
     11.2 Informative References
   Authors' Addresses

1 Introduction

   Multi-view video consists of multiple views that are taken by
   multiple cameras from different positions and angles. 3D video and
   free viewpoint video are two typical use cases. The first offers a 3D
   depth impression of the observed scenery, while the second allows for
   interactive selection of viewpoint and direction within a certain
   operating range, as known from computer graphics.

   Streaming of such multi-view applications over the Internet is
   usually offered at varying speeds and costs over a variety of
   physical infrastructures.
   However, because multi-view video consists of multiple video
   sequences, its traffic is several times larger than that of
   traditional multimedia, which dramatically increases the bandwidth
   requirement.

   For streaming of multi-view representations, two operating options
   are usually used: streaming all views in a highly compressed MVC
   bitstream with little possibility of random access, or encoding all
   views independently and streaming only the required views.

   This document analyses the streaming options of multi-view
   applications, and describes the required SDP signaling for them.

2 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   This document uses the following terms:

   Multi-view Video Compression: The use of efficient compression
      techniques to transmit multi-view video. Usually, the inter-view
      similarity between adjacent views and the temporal similarity
      between successive images of each video are exploited.

   MVC (Multiview Video Coding): An extension of the AVC standard,
      standardised in 2009, that covers a wide range of 3D video
      applications, including 3D video streaming and free-viewpoint
      video. It is inherently backward compatible with AVC, which is
      used for the base view so that it can be decoded independently in
      the absence of an MVC decoder. Any additional views are referred
      to as enhancement views and are typically coded using inter-view
      prediction within the same bitstream.

3. Use Cases

   Multi-view video communications can be applied to a wide variety of
   use cases. This section introduces several of the most likely usage
   scenarios.

3.1. 3D Channel of IPTV

   The multi-view video content is sent to a group of end users by
   multicast, which is widely applied in 3D channels of IPTV.
   Different views can be transmitted separately in different multicast
   groups, or together in a single multicast group using MVC.

3.2 Multi-view video conference

   Two or more users engage in a remote video conversation using mobile
   or desktop terminals, which have stereoscopic capture and display
   capabilities. The multi-view video stream requires the transmission
   of two or more views or, alternatively, of one view and a depth map.
   The views or depth maps may be transmitted as separate transport
   streams, or together, depending on the chosen multi-view video
   transmission method. In the case of a multiparty session, an
   intermediate media server should be used for mixing. Video mixing for
   the multi-view video stream needs to be aware of the additional views
   or depth maps.

3.2.3 Tele-education with Free Viewpoint Video

   A teacher intends to give a real-time lecture to students in one or
   more remote sites. The teacher's site is equipped with a multi-view
   capture setup, and has a 2D display to show feedback from the
   students, who are equipped with regular 2D cameras and displays.
   Multi-view video content is transmitted so that the student sites are
   capable of selecting perspectives of the captured scene within a
   certain operating range, and rendering them in real time. Some
   signaling interaction between the students and the teacher may
   therefore be required.

3.2.4 Immersive Telepresence with Multi-view Video

   A group of users wants to conduct a meeting through a telepresence
   system, connecting to the session from several sites, with each
   terminal supporting multiple users. Participants are arranged around
   a shared virtual table, and each remote participant is shown on a
   separate screen, which is autostereoscopic and displays two different
   views for each user in the room. At each site, several
   autostereoscopic displays and a multi-view capture setup are
   deployed. The viewpoint of these views is adjusted during the session
   so as to match the position of the observing users.
   The use of multi-view video lets telepresence systems offer accurate
   eye contact and spatial faithfulness.

4. Multi-view Video Transmission

   Multi-view video, which has two or more views, increases the encoding
   complexity and bandwidth requirement for transmission. There are
   several ways to transmit these views:

   *  Multi-view simulcast: encode each view and/or depth map
      independently using a monocular video codec. This enables
      streaming each view over a separate channel, and clients can
      request as many views as their displays require without worrying
      about inter-view dependencies. While this method has its benefits,
      it does not exploit the redundancies that are present between the
      views.

   *  Multi-view video compression: encode views using a specific
      technique that decreases the overall bit rate by exploiting the
      inter-view redundancies. Although this exploits the similarities
      that are present between the views, it increases the effect of
      transmission errors, and the inter-view compression techniques
      make the views depend on each other. In order to decode a frame
      correctly, the frames it depends on must be decoded first, which
      causes unnecessary transmission and delay when the view to be
      displayed is far from the reference view. MVC is a typical and
      often used video compression format applied to multi-view video.

   *  Adaptive multi-view video transmission: adaptive streaming of
      multi-view video is also considered when functionalities such as
      rate scalability, resolution scalability, view scalability, and
      packet-loss resilience should be offered. In such a case,
      simulcast or multi-view video compression is used together with
      adaptive streaming techniques, e.g., SVC.

   *  Combination transmission: sending multi-view video using a
      combination of simulcast and multi-view video compression
      techniques may also be a good option for some specific scenarios.
      For example, if one multi-view video compression technique works
      well for closely related views but not for widely differing views,
      sending several multi-view video compression streams
      simultaneously can solve the problem.

5. SDP Signaling Requirements for Multi-view Video

   The following requirements need to be met to support the multi-view
   video streaming options described in the previous sections:

   REQ-1:  It must be possible to signal whether multi-view simulcast or
           multi-view video compression is used.

   REQ-2:  It must be possible to signal adaptive multi-view video
           transmission, e.g., multi-view video simulcast used together
           with adaptive simulcast.

   REQ-3:  It must be possible to signal combination transmission, where
           multi-view video compression is used together with simulcast.

   REQ-4:  Bundled [I-D.ietf-mmusic-sdp-bundle-negotiation] usage must
           be considered.

   REQ-5:  It must be possible to signal multi-view video related
           decoder constraints, e.g., the maximum number of view streams
           that can be provided at the sender and the maximum number of
           view streams that can be received at the receiver.

   REQ-6:  It must be possible to support both declarative SDP and SDP
           offer/answer.

   REQ-7:  When multi-view simulcast is used, there must be a way for
           receivers to request the view streams that they wish to
           receive.

   REQ-8:  The solution must be compatible with other existing
           mechanisms, e.g., RTP retransmission [RFC4588] and Forward
           Error Correction [RFC5109].

6. Gap Analysis

6.1. RFC5583

   [RFC5583] defines an SDP mechanism for signaling the decoding
   dependency between different media descriptions of the same media
   type. It can be used to signal the use of SVC or MVC. However, it
   cannot distinguish multi-view simulcast from MVC, let alone cases
   where multi-view transmission is used together with simulcast or SVC.

6.2. Simulcast

   [I-D.ietf-mmusic-sdp-simulcast] describes simulcast as the scenario
   in which multiple differently encoded versions of the same media
   source are sent in different RTP streams. It is mainly used for the
   same video source encoded with different video encoder types or image
   resolutions. It still cannot handle the case where multi-view
   transmission is used together with simulcast or SVC.

6.3. RID

   [I-D.pthatcher-mmusic-rid] defines a framework for identifying source
   RTP streams, with constraints on their payload formats, in SDP. It
   can effectively identify a source RTP stream within an RTP session,
   which is quite useful for simulcast. It may therefore also be helpful
   for multi-view video signaling, but potential problems and issues
   need to be considered further.

6.4. CLUE

   CLUE is dedicated to telepresence systems, which provide
   high-definition, high-quality audio/video enabling a "being-there"
   experience. It involves multiple devices such as cameras, displays,
   microphones, and loudspeakers. It specifies the spatial relationship
   of these devices, their viewpoint and field of view/capture, and
   related information [I-D.ietf-clue-signal]. However, the use of 3D or
   free viewpoint video is not considered in the current CLUE scope.
   Thus, support for multi-view video in CLUE needs to be considered
   further.

6.5. 3D Signaling

   There has been some work in MMUSIC proposing 3D signaling solutions.
   [I-D.greevenbosch-mmusic-signal-3d-format] and
   [I-D.greevenbosch-mmusic-sdp-parallax] introduce new SDP attributes
   to provide format description and depth position signaling in 3D
   applications. [I-D.capelastegui-mmusic-3dv-sdp] introduces a
   mechanism to describe 3D video streams composed of multiple video
   views, or of a combination of views and depth maps. These proposals
   do not address multi-view video directly, but they should be
   evaluated when proposing possible solutions.

7. Possible Solutions

   TBD.

8. Security Considerations

   TBD.

9. IANA Considerations

   TBD.

10. Acknowledgments

11. References

11.1 Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC5109]  Li, A., Ed., "RTP Payload Format for Generic Forward Error
              Correction", RFC 5109, December 2007.

   [RFC5583]  Schierl, T. and S. Wenger, "Signaling Media Decoding
              Dependency in the Session Description Protocol (SDP)",
              RFC 5583, July 2009.

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings,
              "Negotiating Media Multiplexing Using the Session
              Description Protocol (SDP)",
              draft-ietf-mmusic-sdp-bundle-negotiation-23 (work in
              progress), July 2015.

   [I-D.ietf-mmusic-sdp-simulcast]
              Burman, B., Westerlund, M., Nandakumar, S., and M. Zanaty,
              "Using Simulcast in SDP and RTP Sessions",
              draft-ietf-mmusic-sdp-simulcast-02 (work in progress),
              October 2015.

   [I-D.pthatcher-mmusic-rid]
              Thatcher, P., Zanaty, M., Nandakumar, S., Roach, A.,
              Burman, B., and B. Campen, "RTP Payload Format
              Constraints", draft-pthatcher-mmusic-rid-01 (work in
              progress), October 2015.

   [I-D.ietf-clue-signal]
              Kyzivat, P., Xiao, L., Groves, C., and R. Hansen, "CLUE
              Signaling", draft-ietf-clue-signaling-06 (work in
              progress), August 2015.

   [I-D.greevenbosch-mmusic-signal-3d-format]
              Greevenbosch, B. and Y. Hui, "Signal 3D format",
              draft-greevenbosch-mmusic-sdp-3d-format-01 (work in
              progress), October 2012.

   [I-D.greevenbosch-mmusic-sdp-parallax]
              Greevenbosch, B. and Y. Hui, "SDP attribute to signal
              parallax", draft-greevenbosch-mmusic-sdp-parallax-01 (work
              in progress), October 2012.

   [I-D.capelastegui-mmusic-3dv-sdp]
              Capelastegui, P., "3D Video in the Session Description
              Protocol (SDP)", draft-capelastegui-mmusic-3dv-sdp-00
              (work in progress), April 2012.

11.2 Informative References

Authors' Addresses

   Rachel Huang
   Huawei
   101 Software Avenue
   Nanjing, China

   EMail: rachel.huang@huawei.com
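
   As an illustration of the decoding dependency signaling discussed in
   Section 6.1, the following Python sketch builds a small SDP body in
   which one video stream depends on another, using the "a=group:DDP",
   "a=mid" and "a=depend" attributes. It is a minimal sketch and not a
   proposed solution: the function name, addresses, ports, mid values
   and the "H264-MVC" encoding name are all assumptions made for
   illustration.

```python
def build_dependency_sdp(base_pt=96, dep_pt=97):
    """Build an SDP body with one base view and one dependent view.

    All concrete values (addresses, ports, mid values, encoding name of
    the dependent view) are illustrative assumptions.
    """
    lines = [
        "v=0",
        "o=- 0 0 IN IP4 192.0.2.1",
        "s=Multi-view dependency sketch",
        "c=IN IP4 192.0.2.1",
        "t=0 0",
        # Group the media descriptions that share a decoding dependency.
        "a=group:DDP V0 V1",
        # Base view: decodable on its own.
        "m=video 40000 RTP/AVP %d" % base_pt,
        "a=rtpmap:%d H264/90000" % base_pt,
        "a=mid:V0",
        # Dependent view; "H264-MVC" is a placeholder encoding name.
        "m=video 40002 RTP/AVP %d" % dep_pt,
        "a=rtpmap:%d H264-MVC/90000" % dep_pt,
        "a=mid:V1",
        # Payload type dep_pt has a layered ("lay") dependency on
        # payload type base_pt in the media description with mid V0.
        "a=depend:%d lay V0:%d" % (dep_pt, base_pt),
    ]
    # SDP lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_dependency_sdp(), end="")
```

   The "a=depend" line only states that one payload type depends on
   another. As Section 6.1 notes, this mechanism alone does not
   distinguish multi-view simulcast from MVC, so additional signaling
   would still be needed to meet REQ-1.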