INTERNET-DRAFT                                                    Q. Wei
Intended Status: Standards Track                                R. Huang
Expires: May 2, 2017                                            H. Zheng
                                                                  Huawei
                                                        October 29, 2016

             RTP Payload Format for HTTP Adaptive Streaming
                   draft-wei-payload-has-over-rtp-01

Abstract

This document introduces a new RTP payload format for encapsulating HTTP Adaptive Streaming (HAS) data into RTP, so that existing RTP schemes can be leveraged for OTT video delivery services. For example, operators can easily deliver OTT live content through multicast, eliminating the impact of live content consumption peaks.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright and License Notice

Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
2. Terminology
3. Existing Technologies
   3.1 HTTP Adaptive Streaming
   3.2 Multicast Adaptive Bit Rate (Multicast-ABR)
4. HAS Over RTP Use Scenarios
5. Overview of HTTP Adaptive Streaming over RTP
6. HTTP Adaptive Streaming Payload
   6.1 RTP Payload Format for HAS Content
   6.2 Use Existing RTP Payload Format for HAS Content
   6.3 Manifest file and Initial Information Consideration
7. Payload Format Parameters
   7.1 Media Type Definition
   7.2 SDP Signaling
8. Congestion Control
9. Security Considerations
10. IANA Considerations
11. Acknowledgments
12. References
   12.1 Normative References
   12.2 Informative References
Authors' Addresses

1. Introduction

Video consumption has exploded over the last few years as more and more consumers are watching live Over-the-Top (OTT) content on smartphones, tablets, PCs and other IP connected devices.
OTT video services rely on HTTP adaptive streaming (HAS) technology, e.g., DASH and HTTP Live Streaming (HLS), to deliver content, so every time a user requests a piece of content, a separate stream is sent through the entire network. If a significant number of users are requesting content, the operator's bandwidth is drained.

It is usually difficult for operators to predict the popularity of live video content, especially for major sporting events. For such an event, millions of users will be watching the content simultaneously, causing peak traffic in the network. Even when a Content Delivery Network (CDN) is used to help distribute the traffic load over network edge nodes, it might still not be sufficient. Since CDNs cannot be deployed close enough to the users, their scalability is in question. Furthermore, using a CDN may also incur a very high expense, especially for new video services like 4K live or VR live. All of this leads to a poor Quality of Experience (QoE) for the users and a high cost for the service providers.

The most effective solution is to use multicast technology, even for OTT live content delivery. Through multicast technology, operators can stream live content only once in their network, regardless of the number of viewers watching, which eliminates the impact of live content consumption peaks.

This document introduces a new RTP payload format for encapsulating HAS data into RTP, so that existing RTP schemes can be leveraged for OTT video delivery services. For example, operators can easily deliver OTT live content through multicast, eliminating the impact of live content consumption peaks.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

3. Existing Technologies

3.1 HTTP Adaptive Streaming

HTTP adaptive streaming has become a popular approach in commercial video deployments. The multimedia content is captured, divided into small segments, and stored on an HTTP server. The consuming user first obtains the manifest file, e.g., the Media Presentation Description (MPD), which describes the available segments, the corresponding bitrates, their URL addresses, and other characteristics. Based on this, the consuming user selects the appropriate encoded alternative and starts streaming the content by fetching the segments using HTTP GET requests in unicast.

MPEG developed the specification known as MPEG Dynamic Adaptive Streaming over HTTP (DASH) [DASH] to standardize the MPD and the segment formats. Other private mechanisms like Apple's HTTP Live Streaming (HLS) [HLS] are also popular.

HAS is a typical client pull model. All the manifest files, HAS segments, etc., are pulled from the HTTP server one after another by the clients issuing HTTP requests.

HTTP adaptive streaming is very efficient for Video on Demand (VOD). However, delivering live content simultaneously to millions of users becomes quite a problem. The peak bandwidth in video consumption is simply too much for an operator to handle, since each viewer counts as a separate unicast session. As live OTT multi-screen video consumption shows no signs of slowing down, the traditional unicast delivery method is becoming too expensive in terms of bandwidth and the investments that must be made to maintain the network. Partnering with a CDN provider only helps optimize the traffic on the backbone for known content.
Additional infrastructure investment is still required at the edge of the network to absorb the load, but this is too costly an undertaking and would only be a temporary solution, as there would always be a need for more servers when live OTT consumption increases.

3.2 Multicast Adaptive Bit Rate (Multicast-ABR)

Operators are seeking ways to improve the quality of the services available, while also creating a more balanced and effective delivery of data to improve their cost-efficiency and reduce waste across increasingly constrained bandwidth. Multicast-ABR, specified in [CableLabs], is one such innovation. Multicast-ABR brings HTTP streaming to multicast by keeping the different alternatives in separate multicast groups, so that smart network nodes or clients are able to select an appropriate rate by joining the corresponding multicast group and delivering these segments to clients. Multicast-ABR uses NACK-Oriented Reliable Multicast (NORM) [RFC5740] to deliver HTTP adaptive streaming data in multicast.

Multicast-ABR is a low-cost and easy-to-deploy solution that allows operators to see multicast gains on all in-home devices leveraging their TV Everywhere infrastructure. However, using NORM to convey HTTP adaptive streaming data has three shortcomings:

Firstly, NORM has no fast channel change (FCC) mechanism like [RFC6285], so switching between video resolutions may take some time and cause the video to freeze.

Secondly, some telecom operators only have an IPTV multicast platform, which may not support the NORM protocol.

Thirdly, NORM is not aware of media timing in the way that RTP is, since RTP is designed to handle multimedia.

Based on this, using RTP to deliver HTTP adaptive streaming data could be an alternative.

4. HAS Over RTP Use Scenarios

Network operators running IPTV have already built up an RTP over IP multicast infrastructure to deliver IPTV content.
With the introduction of a HAS payload for RTP, network operators can reuse the existing infrastructure for the delivery of OTT content using HAS. The benefit is that it can greatly reduce the investment in IPTV headend devices and simplify the whole architecture.

From the perspective of OTT content providers, HAS over RTP provides another means of content delivery besides CDN. OTT content providers can deliver their high-demand videos through multicast, thus ensuring the Quality of Experience (QoE) for the end users. This would not only help save bandwidth for network operators, but also provide them with more opportunities for revenue, since they can offer multicast as a service to OTT providers.

5. Overview of HTTP Adaptive Streaming over RTP

Figure 1 shows the architecture for HTTP adaptive streaming over RTP, which is similar to the one defined in Multicast-ABR.

                      +---------+
                      | Content |
                      | Source  |
                      +---------+
                           |            ----- Unicast
        -------------------|
        |                  |            ***** Multicast
  +-----------+      +------------+
  |    HAS    |      | Multicast  |
  | Consumer  |      |  Gateway   |
  +-----------+      +------------+
                           *
                           *
                           *
       ****************************************
       *                                      *
       *                                      *
+------------+                          +------------+
| Multicast  |     ................     |    M2U     |
| Consumer   |                          +------------+
+------------+                                |
                                              |
                                              |
                                        +------------+
                                        |  Unicast   |
                                        |  Consumer  |
                                        +------------+

Figure 1: HTTP Adaptive Streaming over RTP with Multicast

In the above figure, a HAS Consumer can access the Content Source using HTTP as normal. This memo considers only the operation of the Multicast Gateway, the packetisation of HAS content within Multicast RTP, the operation of the Multicast Consumer that receives that content, and the operation of the Unicast Consumer that receives the content by Unicast RTP.

A Multicast Gateway receives the unicast HAS streams of various bitrates from the Content Source, and converts each unicast stream to multicast, sending it to a specific multicast group. Multicast Consumers subscribe to a multicast group to receive data from the specific bit-rate stream. Unicast Consumers do not support multicast; they support only unicast RTP. Therefore, an intermediate node, the M2U (multicast-to-unicast), can be introduced to terminate the multicast and convert the stream back to unicast for further delivery.

Multicast Gateway: It is responsible for converting the HAS streams from unicast to multicast, and providing the multicast service to its receivers. It passes the live content through directly.

HAS Consumer: It is a standard HAS end point. It could be an application, or just a user device.

Multicast Consumer: It is an end point supporting HAS RTP by receiving the content in multicast.

Unicast Consumer: It is an end point supporting HAS RTP by receiving the content in unicast.

M2U: It stands for multicast-to-unicast. It converts multicast to unicast if the consumer does not support multicast.

HAS over RTP will be used throughout the Multicast Gateway, M2U, Multicast Consumer, and Unicast Consumer.

6. HTTP Adaptive Streaming Payload

This section specifies the format of the RTP payload of HTTP adaptive streaming data. The structure of the packet is illustrated in Figure 2. This payload format uses the fields of the RTP header in a manner consistent with [RFC3550].

+----------+------------+
|RTP Header|HAS Payload |
+----------+------------+

Figure 2: Packet Structure with RTP Header

There are two ways in which HAS content can be encoded in RTP packets: The HAS Payload can be a new RTP Payload Format, specially designed to carry HAS Content in RTP. Alternatively, an existing RTP payload format can be used if the HAS Content uses a codec with an existing format.
We describe a new RTP payload format for HAS content in Section 6.1, and discuss using existing formats in Section 6.2.

6.1 RTP Payload Format for HAS Content

The format of the RTP header is specified in [RFC3550] and is shown in Figure 3 for convenience.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|          [ contributing source (CSRC) identifiers ]           |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 3: RTP Header Defined in [RFC3550]

The RTP header fields are set according to this RTP payload format as follows:

Marker bit (M): 1 bit

   A marker bit set to "1" SHALL indicate the last RTP packet of the media segment carried in the current RTP stream. This is in line with the normal use of the M bit in video formats to allow efficient playout buffer handling.

Payload Type (PT): 7 bits

   The assignment of an RTP payload type for this new packet format is outside the scope of this document and will not be specified here. The assignment of a payload type has to be performed either through the profile used or in a dynamic way.

Sequence Number (SN): 16 bits

   Set and used in accordance with [RFC3550].

Timestamp: 32 bits

   The RTP timestamp is set to the sampling timestamp of the content. The clock rate is specified dynamically through non-RTP means. If no clock rate is signaled, 90 kHz MUST be used. When a media segment is encapsulated into several RTP packets, all of them share the same timestamp.

The format of the HAS payload is illustrated in Figure 4.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|     RSV     |            Length             | [URL Length]  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Offset                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          [ URL ...]                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                           HAS data                            |
|                                                               |
|             . . .             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 4: Format for HTTP Adaptive Streaming Payload

Fragmentation (F): 1 bit

   If the F bit is set, it indicates that the received packet is part of a decodable segment and cannot be decoded correctly until the whole decodable segment is received. The different parts belonging to the same decodable segment are ordered by their sequence numbers and share the same timestamp. If the F bit is set, the URL length field and the URL field are omitted.

RSV: 7 bits

   These bits are reserved for future use. They MUST be set to zero by senders and ignored by receivers.

Length: 16 bits

   The size of the RTP payload in bytes, excluding the RTP header but including the payload header.

URL length: 8 bits

   The size of the URL field in bytes, including the URL length field.

Offset: 32 bits

   The offset of this fragment in the current decodable segment.

URL: size defined by the URL length field

   This field indicates the URL of the content. Examples would be "/PLTV/88888888/224/3221225484/3221225484.mpd" or "Stream_1_1944000". It facilitates associating the content with an HTTP request, so that receivers can easily convert the content back into the HAS scheme when receiving it. It is used to relate the RTP packet to the corresponding HAS segment specified in the manifest.
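
As an informal illustration of the field layout in Figure 4, the following Python sketch packs and parses one HAS payload. It is not part of this specification and rests on several assumptions that the text above does not settle: fields are in network byte order; when the F bit is set, the URL length and URL fields are simply absent (without padding), so Offset directly follows Length; the URL length value counts its own byte plus the URL bytes; and Length does not cover any RTP padding. The names HasPayload, pack_has_payload, and parse_has_payload are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional

F_BIT = 0x80  # F occupies the most significant bit of the first octet


@dataclass
class HasPayload:
    fragment: bool         # F bit
    offset: int            # Offset field (32 bits)
    url: Optional[bytes]   # URL field; None when the F bit is set
    data: bytes            # HAS data


def pack_has_payload(p: HasPayload) -> bytes:
    """Serialize one HAS payload (payload header followed by HAS data)."""
    if p.fragment:
        if p.url is not None:
            raise ValueError("URL must be omitted when the F bit is set")
        tail = struct.pack("!I", p.offset)
    else:
        url = p.url or b""
        url_length = len(url) + 1  # assumed to count the URL length octet
        if url_length > 0xFF:
            raise ValueError("URL does not fit the 8-bit URL length field")
        tail = struct.pack("!BI", url_length, p.offset) + url
    length = 3 + len(tail) + len(p.data)  # payload header plus HAS data
    if length > 0xFFFF:
        raise ValueError("payload does not fit the 16-bit Length field")
    first = F_BIT if p.fragment else 0  # RSV bits are sent as zero
    return struct.pack("!BH", first, length) + tail + p.data


def parse_has_payload(buf: bytes) -> HasPayload:
    """Parse one HAS payload; RSV bits and bytes beyond Length are ignored."""
    first, length = struct.unpack_from("!BH", buf, 0)
    fragment = bool(first & F_BIT)
    header_size = 7 if fragment else 8  # assumed minimum header sizes
    if length < header_size or length > len(buf):
        raise ValueError("invalid Length field")
    pos, url = 3, None
    if fragment:
        (offset,) = struct.unpack_from("!I", buf, pos)
        pos += 4
    else:
        url_length, offset = struct.unpack_from("!BI", buf, pos)
        pos += 5
        url_end = pos + url_length - 1
        if url_length < 1 or url_end > length:
            raise ValueError("invalid URL length field")
        url = buf[pos:url_end]
        pos = url_end
    return HasPayload(fragment, offset, url, buf[pos:length])
```

Under these assumptions, a packet that carries the URL "Stream_1_1944000" and three bytes of HAS data occupies 27 bytes of RTP payload, while a fragment with the F bit set spends only 7 bytes on the payload header.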

Basically, application data needs to be split into RTP packets so that each packet is usable, no matter what is lost. This is possible when the multicast gateway has the ability and is authorized to access the HAS content. However, when the OTT content is encrypted and opaque to the multicast gateway, the boundaries of independently decodable frames can hardly be determined. Accordingly, the loss of one fragment of a HAS segment makes the whole segment undecodable, and receivers joining the multicast at a random point cannot view the content immediately. In this case, mechanisms like FEC [RFC5109] or retransmission [RFC4585] MUST be used to alleviate packet losses, and FCC [RFC6285] SHOULD be used to ensure that endpoints that join the multicast at a random point can view the content immediately.

Another possible way to do smart fragmenting is to extend the manifest files, e.g., the MPD, to allow the OTT content providers to indicate the fragmentation points where independently decodable application data can be extracted. The multicast gateway bridging HAS into RTP would then fetch the extended manifest file and use these hints to determine how to fragment each segment into RTP packets. Since DASH is an MPEG standard, this method requires work in MPEG to specify the extended fragmentation points.

6.2 Use Existing RTP Payload Format for HAS Content

It is also possible to use an existing RTP payload format to transport HAS content. For example, if the HAS content is H.264, the multicast gateway will parse the HAS segments to find the boundaries, extract the NAL units, and generate RTP packets using the H.264 RTP format [RFC6184]. This approach has the advantage that the resulting RTP stream will be more robust to packet loss, since it uses a loss-tolerant encoding, and will use a standard RTP payload format that many existing RTP clients can decode.
However, it has several limitations: it is only possible when the multicast gateway has the ability and is authorized to access the HAS content; it complicates the multicast gateway; and the multicast gateway is required to support multiple existing RTP formats, since HAS content usually uses a container, e.g., MP4 [MPEG-4 Part 14], which can carry multiple encodings.

6.3 Manifest file and Initial Information Consideration

HAS comes with manifest files to describe the segments and initial information to set up the session. Since these data need reliable delivery, we do not consider transporting them in-band over RTP with the HAS segments. In fact, the end devices may not need the manifest files, as HAS over RTP is a push model and allows some packets to be dropped. When the manifest files and initial information are needed, they can be acquired using an out-of-band method like HTTP or SDP. In particular, when a receiver joins the multicast, it is better to obtain the manifest or initial information out of band in advance.

7. Payload Format Parameters

This section specifies the media type and the parameters identifying this RTP payload format.

7.1 Media Type Definition

This registration is done using the template defined in [RFC6838] and following [RFC4855].

Type name: video

Subtype name: HAS

Required parameter:

   has-type: This parameter indicates the HTTP adaptive streaming protocol. The value of has-type MUST be in the range of 0 to 7, inclusive. The values are defined as follows:

      has-type=0: DASH

      has-type=1: HTTP Live Streaming (HLS)

      has-type=2-6: Reserved

      has-type=7: Profile-specific HTTP adaptive streaming

Optional parameters:

   bitrate: This parameter can be used to signal the bitrate of the stream to the receiver. How to use it is up to the receiver. The value of this parameter is an integer in units of kilobits per second.

Encoding considerations: This media type is framed in RTP and contains binary data; see Section 4.8 of [RFC6838].

Security considerations: See Section 9 of RFCXXXX.

Published specification: N/A

Additional information: None

File extensions: none

Macintosh file type code: none

Object identifier or OID: none

Person & email address to contact for further information: Rachel Huang (rachel.huang@huawei.com)

Intended usage: COMMON

Author: See Authors' Addresses section of RFCXXXX.

Change controller: IETF Audio/Video Transport Payloads working group delegated from the IESG.

7.2 SDP Signaling

TBD. Negotiation of the new RTP payload is required. Further details will be provided in the next versions.

8. Congestion Control

Current DASH clients perform congestion control individually. When using multicast to transport HAS data, it is expected that multicast receivers have the ability to dynamically join the corresponding multicast group based on network conditions. Multicast receivers share the same stream in one multicast group, whereas HAS receivers compete with each other using different streams, which means the congestion control mechanisms used in HAS do not work for multicast. However, it is expected that HAS over RTP for multicast is deployed in a managed network, where congestion can be well controlled. In the future, when it is deployed on the Internet, a new congestion control mechanism may be required to coordinate the multicast receivers that experience the same congestion problem, so that they can share the stream and obtain the best quality available to them.

9. Security Considerations

TBD.

10. IANA Considerations

TBD.

11. Acknowledgments

The authors would like to thank the following individuals, who helped review this document and provided very valuable comments: Colin Perkins, Roni Even, Yuping Jiang.

12. References

12.1 Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[CableLabs] "IP Multicast Adaptive Bit Rate Architecture Technical Report", http://www.cablelabs.com/wp-content/uploads/specdocs/OC-TR-IP-MULTI-ARCH-V01-1411121.pdf

12.2 Informative References

[RFC5740] Adamson, B., Bormann, C., Handley, M., and J. Macker, "NACK-Oriented Reliable Multicast (NORM) Transport Protocol", RFC 5740, November 2009.

[RFC6285] Ver Steeg, B., Begen, A., Van Caenegem, T., and Z. Vax, "Unicast-Based Rapid Acquisition of Multicast RTP Sessions", RFC 6285, June 2011.

[DASH] ISO/IEC 23009-1:2014, "Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats".

[HLS] Pantos, R. and W. May, "HTTP Live Streaming", https://tools.ietf.org/html/draft-pantos-http-live-streaming-20, September 2016.

[MPEG-4 Part 14] "Information technology - Coding of audio-visual objects - Part 14: MP4 file format", ISO/IEC 14496-14, November 2003.

Authors' Addresses

Qikun Wei
Huawei
Email: weiqikun@huawei.com

Rachel Huang
Huawei
Email: rachel.huang@huawei.com

Hui Zheng
Huawei
Email: marvin.zhenghui@huawei.com

Yong Xia
China SARFT
Email: xiayong@abs.ac.cn