Network Working Group                                            N. Zong
Internet-Draft                                       Huawei Technologies
Intended status: Informational                          October 24, 2010
Expires: April 27, 2011

Survey and Gap Analysis for HTTP Streaming Standards and Implementations
draft-zong-httpstreaming-gap-analysis-01

Abstract

With the explosive growth of Internet usage and the increasing demand for multimedia information on the web, media delivery over the Internet has attracted substantial attention from the media industry. To meet this demand, HTTP streaming technology has been developed and has come to play an important role in recent years. Several leading Standards Development Organizations (SDOs) have been producing a series of technical specifications to define streaming over HTTP. Moreover, several companies have devoted themselves to developing proprietary HTTP-based media delivery platforms that provide a high-quality, adaptive viewing experience to customers. Following a brief survey of existing HTTP streaming standards and implementations, this document summarizes this related work, analyzes the potential challenges, especially from the network point of view, and lists the gaps between existing work and the possible working scope on the topic of HTTP streaming in the IETF.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 27, 2011.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors.
All rights reserved.

Zong                     Expires April 27, 2011                 [Page 1]

Internet-Draft           Survey and Gap Analysis            October 2010

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

Table of Contents

1. Introduction
2. Terminology
3. HTTP Streaming Standards
   3.1. 3GPP
      3.1.1. Media Presentation Components
      3.1.2. Media Presentation Description
      3.1.3. Streaming Procedure
         3.1.3.1. Overview
         3.1.3.2. Segment list generation
         3.1.3.3. Seeking, trick mode and adaptation support
   3.2. OIPF
      3.2.1. MPD
      3.2.2. Segmentation
      3.2.3. Media formats for MPEG2-TS
      3.2.4. Use cases
         3.2.4.1. Live streaming
         3.2.4.2. Trick mode and seeking
   3.3. MPEG
      3.3.1. Objectives
      3.3.2. Requirements for proposal
4. HTTP Streaming Implementations
   4.1. Microsoft Smooth Streaming
      4.1.1. On-disk MP4 file format
      4.1.2. On-wire segments transmission
      4.1.3. Adaptive support
   4.2. Adobe
      4.2.1. Components
      4.2.2. Workflow
      4.2.3. Top features
   4.3. Apple
      4.3.1. Basic process
5. Gap Analysis
   5.1. Brief Summary of Existing Work
   5.2. Challenges
   5.3. Gap List and Potential Working Scope in IETF
6. IANA Considerations
7. Security Considerations
8. Acknowledgements
9. References
   9.1.
Normative References
   9.2. Informative References
Author's Address

1. Introduction

Media streaming has played an increasingly important role in Internet content delivery and is becoming indispensable in many applications (e.g., distance learning, digital libraries, home shopping, and video-on-demand). Currently, several streaming protocols are commonly used to deliver media content on the Internet, such as HTTP, RTSP/RTP, RTMP, and MMS. Of these, HTTP streaming is rapidly becoming one of the most commonly used approaches for media content distribution on the Internet.

HTTP streaming is a mechanism for sending media data or files that are divided into several chunks/fragments and supplied in order to the user through port 80/8080. HTTP streaming covers various streaming media formats and codecs, including MP4, MPEG2-TS, and H.264/AAC, as well as streaming services over HTTP, such as Windows Media/Silverlight Streaming, Flash Video, QuickTime Streaming Server, Real Media Streaming, and others.

HTTP streaming offers two advantages:

1) Media protocols often have difficulty getting around firewalls and routers because they are commonly based on UDP sockets over unusual port numbers. HTTP-based media delivery has no such problems because firewalls and routers know to pass HTTP downloads through port 80.

2) HTTP media delivery can use standard HTTP servers and standard HTTP caches (or cheap servers in general) to deliver the content, so it does not require special proxies or caches. Additionally, most Content Delivery Networks (CDNs) make use of HTTP to redirect requests, retrieve cached multimedia objects, and communicate with policy servers.
Several leading Standards Development Organizations (SDOs) have been producing a series of technical specifications to define streaming over HTTP. 3GPP introduces adaptive HTTP streaming in Technical Specification (TS) 26.234 [3GPP], which describes HTTP streaming in detail, including the Media Presentation Description (MPD), the Media Segmentation Format, and HTTP server and client behavior, as an alternative approach to RTSP/RTP-based media delivery. The Open IPTV Forum (OIPF) introduces HTTP adaptive streaming in its technical specification [OIPF], which defines the usage of and extensions to 3GPP HTTP streaming to enable HTTP-based adaptive streaming for OIPF-compliant services and devices. Recently, ISO/IEC JTC1/SC29/WG11 (MPEG) launched a new standard on HTTP streaming. A set of documents [MPEG-1][MPEG-2][MPEG-3][MPEG-4] has been proposed to address the background, objectives, use cases and requirements of the transport of MPEG media over HTTP.

Several companies have devoted themselves to developing proprietary HTTP-based media delivery platforms that provide a high-quality, adaptive viewing experience to customers. Microsoft has implemented its Smooth Streaming technology, which is a web-based, adaptive media content delivery approach that uses standard HTTP [MS-IIS]. Instead of delivering media as a full-file download, Smooth Streaming delivers the content to the client as a series of small file chunks that can easily be cached at edge servers, closer to the client. Adobe HTTP Dynamic Streaming is a new Adobe-defined delivery method for enabling on-demand and live adaptive bitrate video streaming over regular HTTP connections [Adobe]. Adobe HTTP Dynamic Streaming packages media files into fragments that Flash Player clients can access instantly without downloading the entire file.
Apple HTTP Live Streaming [Apple] allows live or prerecorded audio and video to be sent to iPhone or other devices, such as desktop computers, using an ordinary Web server, with support for adaptive bitrate.

Following a brief survey of the above-mentioned HTTP streaming standards and implementations, this document summarizes this related work, analyzes the potential challenges, especially from the network point of view, and lists the gaps between existing work and the possible working scope on the topic of HTTP streaming in the IETF.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119] and indicate requirement levels for compliant implementations.

Live Streaming: Live events can be streamed over the Internet with the help of broadcast software, which encodes the live source (from a microphone, video camera, or other recording device) and delivers the resulting stream to the server. The server then transfers the stream, so the user experiences the event as it happens.

On-Demand Streaming: To provide "anytime" access to media content, the client is allowed to select and play back content on demand.

Progressive Download: A mode that allows the client to play back the media file while the file is still downloading, after waiting only a few seconds for buffering (the process of collecting the first part of a media file before playing).

Adaptive Streaming: Adaptive streaming is a process that adjusts the quality of a video delivered to a client based on the changing network conditions to ensure the best possible viewer experience.

3. HTTP Streaming Standards

3.1. 3GPP

3GPP introduces adaptive HTTP streaming in Technical Specification (TS) 26.234 [3GPP].
TS 26.234 specifies the protocols and codecs for the Packet-Switched Streaming Service (PSS) within the 3GPP system. Protocols for control signalling, capability exchange, media transport, rate adaptation and protection are specified. Codecs for speech, natural and synthetic audio, video, still images, bitmap graphics, vector graphics, timed text and text are specified.

The delivery of media over HTTP provides an alternative delivery mechanism to RTSP/RTP-based media delivery. It is assumed that the HTTP-Streaming Client has access to a Media Presentation Description (MPD). An MPD provides sufficient information for the HTTP-Streaming Client to provide a streaming service to the user by sequentially downloading media data from an HTTP server and rendering the included media appropriately.

3.1.1. Media Presentation Components

A media presentation is a structured collection of data that is accessible to the HTTP-Streaming Client and is described in an MPD. The media presentation structure is shown in the following figure.

^ resolution / bit-rate / language / etc
|
|  representation
| +------------------------------------------------+
| |  segment                       segment         |
| | +----------------------------+ +-----+         |
| | |  +----------+ +----------+ | |     |         |
| | |  |   meta   | |  media   | | |     |         |
| | |  |   data   | |   data   | | |     | ... ... |
| | |  +----------+ +----------+ | |     |         |
| | +----------------------------+ +-----+         |
| +------------------------------------------------+
|
| +------------------------------------------------+
| |                 representation                 |
| +------------------------------------------------+
|                      ... ...
| +------------------------------------------------+
| |                 representation                 |
| +------------------------------------------------+
|         period 1                period 2  ...
+-------------------------------------------------------------------> time

A media presentation consists of:

1) A sequence of Periods.
2) Each Period contains one or more Representations of the same media content. Different Representations usually differ in attributes such as media resolution, bit-rate, and language.

3) Each Representation consists of one or more segments.

4) A segment contains media data and/or the metadata needed to decode and present the included media data, and is defined as a unit that can be uniquely referenced by an http-URL element in the MPD. The Initialisation Segment contains initialisation information (no media data) for accessing the Representation. A Media Segment contains media data that are described either within this Media Segment or by the Initialisation Segment. Each segment has a start time relative to the start time of the representation (period), so that the client can download a specific segment. The segment provides random access information, namely whether and how the media within this segment can be randomly accessed. There is no requirement that a segment start with a random access point (RAP), but it is possible that all segments start with a RAP.

3.1.2. Media Presentation Description

The logical structure of the media presentation is described as a data structure (e.g., an XML schema) in the MPD file. That is, the MPD contains the metadata required by the client to construct appropriate URIs to access segments and to provide the streaming service to the user. Several important attributes and elements contained in an MPD are listed below:

1) "type" attribute: type of the media presentation, i.e., VoD or live.

2) "availabilityStartTime" attribute: media presentation start time if "type"=live. If "type"=VoD, the media presentation start time is 0.

3) "duration" attribute: duration/length of the media presentation. For a live presentation, the sum of "duration" and "availabilityStartTime" specifies the end time of the media presentation.
If "duration" is not provided, then the MPD does not describe an entire media presentation and the MPD may be updated during the live presentation.

4) "minimumUpdatePeriodMPD" attribute: minimum MPD update period.

5) "timeShiftBufferDepth" attribute: duration of the time-shifting buffer maintained at the server for a live presentation. This attribute is used in the case of trick mode.

6) "minBufferTime" attribute: minimum buffer time for the stream.

7) Multiple "Period" elements: each describes a period. A "Period" element contains the following important attributes and elements:

   7.1) "start" attribute: start time of this period.

   7.2) Multiple "Representation" elements: each describes a representation with a different bit-rate, resolution, language, etc. A "Representation" element contains the following important attributes and elements:

      7.2.1) "bandwidth" attribute: maximum bit-rate of the representation averaged over any interval of "minBufferTime" duration.

      7.2.2) "startWithRAP" attribute: when True, indicates that all segments in the representation start with a random access point (RAP).

      7.2.3) "qualityRanking" attribute: quality ranking of the representation.

      7.2.4) "TrickMode" element: provides the information for trick mode. In this element, the "AlternatePlayoutRate" attribute denotes the playout speed as a multiple of the regular playout speed.

      7.2.5) "SegmentInfo" element: describes all segments in a representation. Each "SegmentInfo" element permits generating a list of Media Segment URLs (possibly with a byte range) and Media Segment start times relative to the start time of the Representation. A "SegmentInfo" element contains the following important attributes and elements:

         7.2.5.1) "duration" attribute: gives the constant approximate segment duration.

         7.2.5.2) at most one "InitialisationSegmentURL" element.
If not present, then each media segment within this representation shall be self-initialising.

         7.2.5.3) either a "URLtemplate" element that specifies a default segment URL template for all segments, or one or more "Url" elements that provide a set of explicit URL(s) for segments.

Note that a client derives the time at which to request an MPD update as the sum of the time of its last requested update of the MPD and the "minimumUpdatePeriodMPD" attribute.

3.1.3. Streaming Procedure

3.1.3.1. Overview

Initially, the client parses the MPD and creates a segment list for each representation. The client then selects one representation based on the information in the representation attributes and other information, e.g., available bandwidth and client capabilities. The client acquires the initialisation segment and the media segments of the selected representation by using the generated segment list. The client continues consuming the media content by continuously requesting media segments, taking MPD updates into account. The client may change representations, taking into account updated MPD information and/or updated information from its environment, e.g., changes in access bit-rate.

3.1.3.2. Segment list generation

A segment list contains:

1) the URL of the initialisation segment;

2) the URLs of the media segments;

3) the start times of the media segments in the period.

There are two approaches for generating a segment list. One is template-based generation, which utilizes the "URLtemplate" and "duration" attributes in the "SegmentInfo" element in the MPD. The other is play-list-based generation, which utilizes the "URLs" and "duration" in the "SegmentInfo" element in the MPD.

3.1.3.3. Seeking, trick mode and adaptation support

Suppose that the client wants to seek to time "tp". The corresponding segment can be searched by the server through:

Target_segment_index = max { i | MediaSegment[i].StartTime <= tp - Period.start }.
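The selection rule above can be sketched in Python as follows. This is a minimal sketch, assuming the media segment start times are available as a sorted list of seconds relative to the period start; the function name and the sample values are hypothetical and are not taken from TS 26.234.

```python
from bisect import bisect_right


def target_segment_index(segment_start_times, tp, period_start=0.0):
    """Return max{ i | start[i] <= tp - period_start }.

    segment_start_times: sorted start times (seconds) relative to the period.
    Returns None when tp lies before the first segment of the period.
    """
    offset = tp - period_start
    # bisect_right counts the start times that are <= offset.
    index = bisect_right(segment_start_times, offset) - 1
    return index if index >= 0 else None


# Hypothetical period that starts at 60 s and holds four 10 s segments.
starts = [0.0, 10.0, 20.0, 30.0]
print(target_segment_index(starts, tp=85.0, period_start=60.0))  # 2
```

A binary search is used here only because the start times are sorted; with a constant segment duration, an integer division of the offset by the duration would give the same index.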
For accurate seeking to time "tp", the client needs to access a RAP. The client may use the information in the 'sidx' box to locate the RAP and the corresponding presentation time in the media presentation. For fast start-up, the client may initially request the 'sidx' box from the beginning of the media segment using byte range requests.

Trick mode can be implemented by utilizing the "AlternatePlayoutRate" attribute in the "TrickMode" element in the MPD.

Switching to a new representation is equivalent to seeking to the new representation. The client should seek to a RAP in the new representation at a desired presentation time "tp" later than the current presentation time.

3.2. OIPF

The Open IPTV Forum (OIPF) introduces HTTP adaptive streaming in its technical specification [OIPF]. This specification defines the usage of and, where necessary, extensions to the technologies defined in 3GPP TS 26.234 to enable HTTP-based adaptive streaming for Release 2 OIPF-compliant services and devices. Most details on HTTP adaptive streaming in this specification are based on 3GPP TS 26.234. The extensions and designs specific to OIPF are introduced in this document.

3.2.1. MPD

A Representation may be made up of multiple components, for example audio and video. A partial Representation may contain only some of these components, and a terminal may need to download (and play) multiple partial Representations to build up a complete Representation, with the appropriate components according to the preferences and wishes of the user. Accordingly, in the MPD, the "Representation" element may consist of one or more Components which may be downloaded and provided to the terminal in addition to content being downloaded from other "Representation" elements. In this case the "Representation" element in the MPD SHALL contain one or more "Component" elements.

The "Representation" element in the MPD may carry a "group" attribute.
The value of the "group" attribute SHALL be the same for Representations that contain at least one same Component. Two Representations with completely different Components (e.g., audio in two different languages) SHALL have different values for the "group" attribute.

To provide nPVR functionality, when the Segments of the live Content are stored on the nPVR server, the URLs indicating the Segments on the nPVR server SHOULD be provided to the OIPF to enable it to access these Segments by the MPD update mechanism defined in 3GPP TS 26.234.

3.2.2. Segmentation

Each Segment SHALL start with a random access point (RAP). Moreover, to enable seamless switching:

1) Different Component Streams of the same Component SHALL be encoded in the same media format but MAY differ in the profile of that format. (For example, if a Representation contains a Component Stream of a certain video Component that is encoded using H.264/AVC with the HD profile, then all other Representations that have a Component Stream of that Component must use H.264/AVC but may use different configurations within the HD profile.)

2) Segments of Representations with the same value for the "group" attribute SHALL be time aligned.

3.2.3. Media formats for MPEG2-TS

Component Streams of the same Component (e.g., "video angle 1 in H.264 at 720x576" and "video angle 1 in H.264 at 320x288") SHALL be carried in transport stream packets that have the same PID. When the Segments of a Representation contain MPEG-2 TS packets, the value of the "id" attribute in each Component element, if present, SHALL be the PID of the Transport Stream packets which carry the Component.

For all Representations, the PAT and PMT are contained either in the initialisation Segments or in the media Segments. Representations with a zero "group" attribute will have the same PAT/PMT as Representations with a non-zero "group" attribute.
A media Segment SHALL contain the concatenation of one or several contiguous (and complete) PES packets which are split and encapsulated into TS packets. When packetizing video elementary streams, up to one frame SHALL be included in one PES packet. The PES packet where a frame starts SHALL always contain PTS/DTS header fields in the PES header.

3.2.4. Use cases

3.2.4.1. Live streaming

If the "timeShiftBufferDepth" attribute is present in the MPD, the terminal may use it to determine at any moment which Segments are actually available for downloading with the current MPD. If this timeshift information is not present in the MPD, the terminal may assume that all Segments described in the MPD which are already in the past are available for downloading.

Periods may be used in the live streaming scenario to appropriately describe successive live events with different encoding or adaptive streaming properties.

3.2.4.2. Trick mode and seeking

The basic implementation of trick modes is based on the processing of Segments by the terminal software: downloaded Segments may be provided to the decoder at a speed lower or higher than normal. The playback of Segments in fast forward and fast rewind has an immediate effect on the bitrate, because the Segments also need to be downloaded at a faster rate than normal.

Dedicated streams may be used to implement efficient trick modes: it is recommended to produce these streams with a lower frame rate, longer Segments or a lower resolution to ensure that the bitrate is kept at a reasonable level even when the Segments are downloaded at a faster rate. The dedicated stream is described as a Representation with a "TrickMode" element in the MPD. It is also recommended that if there are dedicated fast forward Representations, the normal Representations do not contain the "TrickMode" element in the MPD.
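The effect of trick modes on the download bitrate can be illustrated with a small calculation. This is only a sketch under a simplifying assumption that every Segment is still downloaded in full, so the required download rate scales linearly with the playout speed; the function name and all bit rates below are hypothetical.

```python
def required_download_rate(encoded_bitrate_bps, playout_rate):
    """Bit rate needed to fetch Segments fast enough for playback at
    `playout_rate` times normal speed (negative values mean rewind)."""
    return encoded_bitrate_bps * abs(playout_rate)


# Hypothetical figures: a 2 Mbit/s normal Representation and a
# 250 kbit/s dedicated trick-mode Representation, both played at 8x.
normal = required_download_rate(2_000_000, 8)
dedicated = required_download_rate(250_000, 8)
print(normal, dedicated)  # 16000000 2000000
```

Under these assumed numbers, the dedicated stream at 8x needs about the same download rate as the normal stream at regular speed, which is the motivation the text gives for producing dedicated trick-mode Representations.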
To determine the random access point in a media Segment, the client should download and search RAPs one by one until the required RAP is found.

3.3. MPEG

Recently, ISO/IEC JTC1/SC29/WG11 (MPEG) launched a new standard on HTTP streaming. A series of documents [MPEG-1][MPEG-2][MPEG-3][MPEG-4] has been produced to address the background, objectives, use cases and requirements of the transport of MPEG media over HTTP, as well as a call for proposals on this topic.

3.3.1. Objectives

The main objectives of this new standard are:

1) Efficient delivery of MPEG media over HTTP in an adaptive, progressive, download/streaming fashion.

2) Support of live streaming of multimedia content.

3) Efficient and easy use of existing content distribution infrastructure components such as CDNs, proxies, caches, NATs and firewalls.

4) Support of integrated services with multiple components.

5) Support for signaling, delivery, and utilization of multiple content protection and rights management schemes, and support for efficient content forwarding and relay.

3.3.2. Requirements for proposal

MPEG encourages proposals to meet a list of requirements on HTTP streaming. Only those related to media delivery are introduced here.

1) This standard shall support streaming of content and content components over HTTP 1.1.

2) The media files prepared for this standard should be deliverable using progressive download with minimal changes.

3) This standard shall support streaming of live content of possibly indefinite length, including PVR functionalities such as pause and time-shifted play.

4) The standard shall support random access (seeking).

5) The standard shall support trick modes at least to the extent that the underlying formats support them in local playback.

6) The standard shall not require any extension to HTTP 1.1.
It shall support the efficient use of HTTP-optimized infrastructures such as Content Delivery Networks (CDNs), caches and proxies.

7) The standard shall allow segmentation of the content. The standard shall not require fixed-size or fixed-duration segments during delivery of content.

8) The standard should introduce minimal transport overhead and should incur minimal presentation startup delay.

9) The standard shall support description of media components for delivery and presentation.

10) The standard shall support interactive selection of media components for delivery and presentation, for example view selection in multi-view content.

11) This standard shall support prioritization of content and content components.

12) This standard shall support signaling the relationship among content components.

13) The standard should support network transition during delivery of the content.

14) The standard shall enable adaptation of content along axes such as bitrate, temporal resolution, spatial resolution, quality/fidelity or view perspective.

15) The standard shall support initial selection and dynamic adaptation of the content without presentation interruption during delivery.

4. HTTP Streaming Implementations

4.1. Microsoft Smooth Streaming

Smooth Streaming is Microsoft's implementation of adaptive streaming technology, a web-based media content delivery approach that uses standard HTTP [MS-IIS]. Instead of delivering media as a full-file download or as a progressive download, the content is delivered to the client as a series of small file chunks that can be easily and cheaply cached at edge servers, closer to the client.

Smooth Streaming defines each chunk/GOP as an MPEG-4 Movie Fragment and stores it within a contiguous MP4 file for easy random access. One MP4 file is expected for each bit rate.
Because the media is "virtually" split into fragment files, the server must translate sequential URL requests into exact byte range offsets within the MP4 file. The server extracts the fragment box and sends it over the wire to the client as a standalone file.

4.1.1. On-disk MP4 file format

+-------------------------------------------------------------------+
| +----+ +---------------------+ +--------------+ +------+ +------+ |
| |    | | Movie Metadata(moov)| |Movie Fragment| |Media | |Movie | |
| |file| |+-----++-----++-----+| |    (moof)    | |Data  | |Frag  | |
| |type| ||Movie||Track||Movie|| |+----+ +-----+| |(mdat)| |Random| |
| |    | ||hdr  ||     ||Ext. || ||Frag| |Track|| |      | |Access| |
| |    | ||     ||     ||     || ||hdr | |Frag || |      | |(mfra)| |
| |    | ||     ||     ||     || ||    | |     || |      | |      | |
| |    | |+-----++-----++-----+| |+----+ +-----+| |      | |      | |
| +----+ +---------------------+ +--------------+ +------+ +------+ |
+-------------------------------------------------------------------+

In a nutshell, the MP4 file starts with file-level metadata ('moov') that generically describes the file, but the bulk of the payload is actually contained in the fragment boxes, which carry more accurate fragment-level metadata ('moof') and media data ('mdat'). Closing the file is an 'mfra' index box that allows easy and accurate seeking within the file.

In Smooth Streaming, the MP4 files are classified into two kinds. One is the *.ismv file, which contains video and audio. The other is the *.isma file, which contains audio only.

Besides media files, there are manifest files. The server manifest file (*.ism) describes the relationships between the media tracks, bit rates and files on disk. The client manifest file (*.ismc) describes the available streams to the client: the codecs used, bit rates encoded, video resolutions, markers, captions, etc.

4.1.2. On-wire segments transmission

Initially, the client requests the *.ismc client manifest from the server.
The client then requests fragments in the form of a URL, e.g., http://video.foo.com/NBA.ism/QualityLevels(400000)/Fragments(video=610275114). The server then looks up the quality level (bit rate) in the corresponding *.ism server manifest and maps it to a physical *.ismv or *.isma file on disk. The server reads the appropriate MP4 file and, based on its 'tfra' index box, figures out which fragment box ('moof' + 'mdat') corresponds to the requested start time offset. The server extracts the fragment box and sends it over the wire to the client as a standalone file. The sent fragment/file can now be automatically cached further down the network, potentially saving the origin server from sending the same fragment/file again to another client that requests the same URL.

4.1.3. Adaptive support

Smooth Streaming provides multiple encoded bit rates of the same media source and thus allows the client to switch seamlessly between bit rates. As the client plays chunks, network conditions may change or media processing may be impacted by other applications. The client can immediately request that the next chunk come from a stream encoded at a different bit rate to accommodate the changing conditions. This helps the client play media without stuttering, buffering or freezing, and so provides the best-fitting playback quality for the current conditions.

4.2. Adobe

Adobe HTTP Dynamic Streaming is a new Adobe-defined delivery method for enabling on-demand and live adaptive bitrate video streaming over regular HTTP connections [Adobe]. HTTP Dynamic Streaming packages media files into fragments that Flash Player clients can access instantly without downloading the entire file. Adobe HTTP Dynamic Streaming contains several components that work together to package media and stream it over HTTP to Flash Player.

4.2.1. Components

File Packagers include Live Packager and VoD Packager.
VoD Packager translates on-demand media files into fragments and writes the fragments to F4F files. Live Packager translates live streams ingested over the Real Time Messaging Protocol (RTMP) into F4F files in real time.

HTTP Origin Module is an Apache HTTP Server module that serves the F4F files created by the File Packagers.

The F4F file format describes how to divide media content into segments and fragments. Each fragment has its own bootstrap information that provides cache management and fast seeking.

The F4M manifest file format contains information about a package of files that the HTTP Origin Module can serve. Manifest information includes codecs, resolutions, and the availability of files encoded at multiple bit rates.

4.2.2. Workflow

The HTTP Dynamic Streaming workflow includes content preparation, which writes media fragments into files, distribution of the files over HTTP, media consumption and protection, etc.

            +--------+        +-------+        +-------+        +------+
            |        |        |       |        |       |        |      |
   Live     |        |F4F/F4M |       |        |       |        |      |
   streaming|File    |Files   | HTTP  |HTTP    | HTTP  |HTTP    |Client|
   -------->|Packager|------->| Origin|Delivery| Cache/|Delivery|Appl. |
            |        |        | Module|------->| CDN   |------->|      |
   VoD      |        |        |       |        |       |        |      |
   content  |        |        |       |        |       |        |      |
   -------->|        |        |       |        |       |        |      |
            +--------+        +-------+        +-------+        +------+

4.2.3. Top features

HTTP Dynamic Streaming supports features such as adaptive bitrate and DVR functionality.

1) Adaptive bitrate. To stream multi-bitrate content, the server encodes a piece of media at multiple bitrates, creating multiple files. The media files share a manifest file that lists information about each media file. With this information, the client detects its available bandwidth, computer resources, etc., and requests content fragments encoded at the most appropriate bitrate for the best viewing experience.

2) DVR functionality. Viewers can pause, rewind, and skip forward to real time, which adds interactivity to live streams.

3) Support for standard HTTP caching systems. Existing standard server hardware and caching infrastructures can be used to maximize capacity and reach.

4.3. Apple

Apple HTTP Live Streaming [Apple] allows sending live or prerecorded audio and video to iPhone or other devices, such as desktop computers, using an ordinary Web server. Playback requires iPhone OS 3.0 or later on iPhone or iPod touch; QuickTime X or later is required on the desktop.

4.3.1. Basic process

            +-------+      +---------+         +------+        +------+
            |       |      |         |         |      |        |      |
   Live     |       |MPEG2 |         |Index/   |      |        |      |
   streaming|Media  |TS    |Stream   |.ts Files|HTTP  |HTTP    |Client|
   -------->|Encoder|----->|Segmenter|-------->|Server|Delivery|Appl. |
            |       |      |         |         |      |------->|      |
   VoD      |       |      |         |         |      |        |      |
   content  |       |      |         |         |      |        |      |
   -------->|       |      |         |         |      |        |      |
            +-------+      +---------+         +------+        +------+

The Media Encoder takes audio-video input and turns it into an MPEG-2 Transport Stream. Currently, the supported formats are MPEG-2 Transport Streams (with H.264 video and AAC audio) for audio-video, or MPEG elementary streams for audio.

The stream segmenter reads the Transport Stream from the local network and divides it into a series of small media files (.ts files) of equal duration, and creates an index file containing a playlist of the media files as well as metadata. The index file is in .M3U8 format. In the case of a live stream, each time the segmenter completes a new media file, the index file is updated. The index is used to track the availability and location of the media files. Both the .ts and .M3U8 files are placed on an HTTP server.

An HTTP server or a web caching system delivers the media files and index files to the client over HTTP.

A client begins by fetching the index file, based on a URL identifying the stream.
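
As a rough illustration of the index file produced by the stream segmenter, the following sketch parses a small playlist and resolves each media file to an absolute URL. It is not taken from Apple's documentation: the sample playlist, the example.com URL and the function name parse_index are assumptions made for illustration, and the tag names (#EXTINF, #EXT-X-ENDLIST, etc.) come from the HTTP Live Streaming playlist format rather than from this document. Only the handful of tags needed for the example are handled.

```python
from urllib.parse import urljoin

# Hypothetical index file for a short, finished stream of three media files.
SAMPLE_INDEX = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10,
segment0.ts
#EXTINF:10,
segment1.ts
#EXTINF:10,
segment2.ts
#EXT-X-ENDLIST
"""


def parse_index(text, index_url):
    """Return (segments, ended).

    segments is a list of (duration_in_seconds, absolute_url) tuples in
    playback order; ended is True if the playlist carries #EXT-X-ENDLIST.
    """
    segments = []
    ended = False
    duration = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#EXTINF:"):
            # "#EXTINF:<duration>,<title>" describes the next media file.
            duration = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line == "#EXT-X-ENDLIST":
            # No further media files will be appended to this index.
            ended = True
        elif not line.startswith("#"):
            # Any non-tag line is a media file URI, relative to the index URL.
            segments.append((duration, urljoin(index_url, line)))
            duration = None
    return segments, ended


segments, ended = parse_index(SAMPLE_INDEX, "http://example.com/stream/index.m3u8")
for duration, url in segments:
    print(duration, url)
```

For a live stream the #EXT-X-ENDLIST tag would be absent, and a client built along these lines would have to fetch and parse the index again to discover newly completed media files.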
The index file in turn specifies the location of the available media files, decryption keys, and any alternate streams available. For the selected stream, the client downloads each available media file in sequence. Each file contains a consecutive segment of the stream. Once it has downloaded a sufficient amount of data, the client begins presenting the reassembled stream to the user.

In addition, HTTP Live Streaming supports adaptive bitrate and automatically switches to the optimal bitrate based on network conditions for smooth playback.

5. Gap Analysis

5.1. Brief Summary of Existing Work

It can be observed that 3GPP, OIPF, MS Smooth Streaming, Adobe Dynamic Streaming and Apple HTTP Live Streaming all follow a similar design, that is:

1) The streaming server uses a stream encoder/segmenter to write the media content into a series of small files and to produce a manifest file that describes these media files. The table below summarizes the media and manifest files currently defined for HTTP streaming, regardless of codec and media container type.

                      | Media File       | Manifest File   |
   =========================================================
   3GPP/OIPF          | .3GP file        | .3GP file       |
   ---------------------------------------------------------
   MS Smooth HTTP     | .ismv/.isma file | .ism/.ismc file |
   ---------------------------------------------------------
   Adobe Dynamic HTTP | .F4F file        | .F4M file       |
   ---------------------------------------------------------
   Apple Live HTTP    | .ts file         | .M3U8 file      |

2) The HTTP client first obtains the manifest file, then constructs a series of URIs pointing to the media files. Based on the client's conditions (e.g., network, device type), or when the user operates trick mode, the client chooses to request a certain media file using an HTTP request with the corresponding URI.

3) Upon receiving the HTTP request, the HTTP server sends the media file corresponding to the URI in the request to the client.

Evidently, the above design leaves the network transport out of scope; that is, the media (both live streaming and VoD content) is encapsulated into files and transmitted by standard HTTP as payload. From the network transport point of view, there is no difference between the transmission of such media data and that of a normal text file. All the main features of media streaming, such as media meta-information, PVR function, seeking, trick mode, and adaptation between different viewing qualities, are implemented (or can be implemented) through negotiation between server and client using a flexible MPD or manifest file. In other words, all the intelligence in the current HTTP streaming design resides in the server and client software, rather than in the network transport.

5.2. Challenges

However, streaming long-duration, high-quality media over the best-effort Internet while satisfying real-time streaming requirements faces several challenges when there is no network capability support for HTTP streaming.

The first challenge is that current HTTP streaming is based on a pull mode, in which the HTTP client relies on the updated manifest file from the server to pull chunks one after another by issuing a sequence of HTTP requests to the HTTP server. In the case of live streaming, the server needs to update the manifest file each time a new chunk of live media becomes available. Hence, a potential problem is that there will be additional round trips between the client and the server for manifest file updates before the client can request each new chunk, which could put the real-time nature of live streaming at risk.
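
The extra round trips of the pull mode can be made concrete with a small simulation. The sketch below is only illustrative: LiveServer, pull_session and the chunk names are hypothetical, and the server is an in-memory stand-in rather than a real HTTP origin. It assumes the simplest case, in which the client refreshes the manifest exactly once for each newly produced chunk; a real client that polls before the chunk is ready would issue even more manifest requests.

```python
from dataclasses import dataclass, field


@dataclass
class LiveServer:
    """Hypothetical in-memory stand-in for an HTTP streaming origin."""
    chunks: list = field(default_factory=list)

    def publish(self, name):
        # The segmenter finishes a chunk and the manifest is updated.
        self.chunks.append(name)

    def get_manifest(self):
        # One round trip: return the names of all chunks published so far.
        return list(self.chunks)

    def get_chunk(self, name):
        # One round trip: return the (fake) payload of a chunk.
        return f"payload-of-{name}"


def pull_session(server, total_chunks):
    """Fetch total_chunks live chunks in pull mode and count round trips."""
    fetched = []
    manifest_requests = 0
    chunk_requests = 0
    while len(fetched) < total_chunks:
        # The live encoder produces the next chunk.
        server.publish(f"chunk{len(fetched)}")
        # The client must refresh the manifest to learn about the new chunk
        manifest = server.get_manifest()
        manifest_requests += 1
        # before it can request the chunk itself.
        for name in manifest[len(fetched):]:
            server.get_chunk(name)
            chunk_requests += 1
            fetched.append(name)
    return manifest_requests, chunk_requests


manifest_rtts, chunk_rtts = pull_session(LiveServer(), total_chunks=5)
print(manifest_rtts, chunk_rtts)
```

Under these assumptions every chunk costs two request/response exchanges, one for the manifest and one for the media, so the manifest refreshes double the number of round trips in a live session.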
The HTTP server push model, on the other hand, enables the server to actively and continuously push chunks to the client as soon as a new chunk is available on the server, without the round trips between the client and the server for manifest file updates. In this sense, the push model could be more efficient and a better candidate for time-sensitive scenarios.

The second challenge is the lack of QoE improvement and monitoring mechanisms in current HTTP streaming systems. Compared to a dedicated IPTV system, HTTP streaming over the best-effort Internet may suffer more from network transition. For example, when a user switches live channels, the current group of pictures (GoP) and the initialization information for decoders (a.k.a. Reference Information (RI)) of the media content need to be acquired by the client as soon as possible to start playback. Unfortunately, there is so far no mechanism to improve the transmission of these important HTTP packets, which may introduce a long delay before playback starts in the HTTP streaming scenario.

Additionally, some session-level QoE metrics, such as startup delay, are important to the HTTP streaming system for monitoring or diagnostic purposes. Unfortunately, there is no such quality monitoring mechanism (e.g., like RTCP reports) in current HTTP streaming systems. To provide a high-quality service for the user, monitoring and analyzing the system's overall performance is extremely important, since a performance monitoring capability can help diagnose potential network impairments.

Given the above challenges, the typical user experience in existing HTTP streaming schemes can be limited by delayed startups, poor quality, buffering delays, etc. In particular, in the case of "Multi-Screen" applications, the service provider intends to provide a common user experience when the user enjoys the media content across PCs, TVs, and smart-phones.
Therefore, HTTP streaming over the Internet without some optimization of the network transport for QoE improvement may make it difficult for the service provider to comply with the service level agreements (SLAs) between the service provider and users.

5.3. Gap List and Potential Working Scope in IETF

The following table lists the gaps in existing work on HTTP streaming, including 3GPP, MS, Adobe, and Apple.

                                     |   If satisfied by   |
   Characteristic                    |    existing work    |
   =========================================================
   Bit-rate adaptation               |   Yes    |          |
   ---------------------------------------------------------
   Playback control                  |   Yes    |          |
   ---------------------------------------------------------
   Use existing cache, CDN           |   Yes    |          |
   ---------------------------------------------------------
   Client pull model                 |   Yes    |          |
   ---------------------------------------------------------
   Server push model                 |          |    No    |
   ---------------------------------------------------------
   Reliable transmission in network  |   Yes    |          |
   ---------------------------------------------------------
   Real-time support in network      |          |    No    |
   ---------------------------------------------------------
   QoE improvement (e.g. startup)    |          |    No    |
   ---------------------------------------------------------
   QoE monitoring                    |          |    No    |
   ---------------------------------------------------------
   Multicast support for scalability |          |    No    |

As the leading SDO on making the Internet work better, the IETF is a suitable place to address the above-mentioned gaps by studying and enhancing the network to meet the real-time requirements of HTTP streaming systems. A potential working scope could be:

1) investigate the use of a server push model in HTTP streaming to find a better model for more time-sensitive applications, such as live streaming;

2) study QoE monitoring and feedback mechanisms (e.g., like RTCP reports) in HTTP streaming systems, including monitoring architecture, feedback message coding, QoE metrics for HTTP streaming, etc.;

3) define mechanisms for QoE improvement for HTTP streaming, such as reducing startup delay in playback when the user switches live channels or starts VoD;

4) further improve real-time streaming performance from the aspect of network transport functions.

Please refer to [HTTPStreamingPS] for more details on the problem statement and scope of work.

6. IANA Considerations

This document presently raises no IANA considerations.

7. Security Considerations

This document presently raises no security considerations.

8. Acknowledgements

The authors would like to thank the many people who gave valuable comments on this draft.

9. References

9.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2. Informative References

[3GPP] 3GPP, "Transparent end-to-end Packet-switched Streaming Service (PSS) - Protocols and codecs (Release 9)", March 2010.

[OIPF] OIPF, "HTTP Adaptive Streaming (Release 2)", September 2010.

[MPEG-1] ISO/IEC JTC1/SC29/WG11, "HTTP Streaming of MPEG Media Context and Objectives (N11337)", April 2010.

[MPEG-2] ISO/IEC JTC1/SC29/WG11, "Call for Proposals on HTTP Streaming of MPEG Media (N11338)", April 2010.

[MPEG-3] ISO/IEC JTC1/SC29/WG11, "Use Cases for HTTP Streaming of MPEG Media (N11339)", April 2010.

[MPEG-4] ISO/IEC JTC1/SC29/WG11, "Requirements on HTTP Streaming of MPEG Media (N11340)", April 2010.

[MS-IIS] Microsoft Corporation, "IIS Smooth Streaming Technical Overview", March 2009.

[Adobe] Adobe, "Using ADOBE HTTP DYNAMIC STREAMING", 2010.

[Apple] Apple, "HTTP Live Streaming Overview", November 2009.
[HTTPStreamingPS] Wu, Q., "Problem Statement for HTTP Streaming", draft-wu-http-streaming-optimization-ps-02.txt (work in progress), September 2010.

Author's Address

   Ning Zong
   Huawei Technologies

   Phone: +86 25 56624760
   Email: zongning@huawei.com