Internet Engineering Task Force                          Rufael Mekuria
Internet-Draft                                        Unified Streaming
Intended status: Best Current Practice                Sam Geqiang Zhang
Expires: January 15, 2019                                     Microsoft
                                                           July 15 2018

              Live Media and Metadata Ingest Protocol
                  draft-mekuria-mmediaingest-01.txt

Abstract

This Internet draft presents a best industry practice for ingesting
encoded live media to media processing entities. Two profiles of the
media ingest are defined covering the most common use cases. The
first profile facilitates active media processing and is based on
the fragmented MPEG-4 format. The second profile enables efficient
ingest of media streaming presentations based on established
streaming protocols by also adding a manifest besides the fragmented
MPEG-4 stream. Details on carriage of metadata markers, timed text,
subtitles and encryption specific metadata are also included.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current
Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.
Please review these documents carefully, as they describe your
rights and restrictions with respect to this document. Code
Components extracted from this document must include Simplified BSD
License text as described in Section 4.e of the Trust Legal
Provisions and are provided without warranty as described in the
Simplified BSD License.

Table of Contents

1. Introduction
2. Conventions and Terminology
3. Media Ingest Workflows and Use Cases
4. General Media Ingest Protocol Behavior
5. Profile 1: Fragmented MPEG-4 Ingest General Considerations
6. Profile 1: Fragmented MPEG-4 Ingest Protocol Behavior
   6.1 General Protocol Requirements
   6.2 Requirements for Formatting Media Tracks
   6.3 Requirements for Timed Text Captions and Subtitle Streams
   6.4 Requirements for Timed Metadata
   6.5 Requirements for Media Processing Entity Failover
   6.6 Requirements for Live Media Source Failover
7. Profile 2: DASH and HLS Ingest General Considerations
8. Profile 2: DASH and HLS Ingest Protocol Behavior
   8.1 General Protocol Requirements
   8.2 Requirements for Formatting Media Tracks
   8.3 Requirements for Timed Text, Caption and Subtitle Streams
   8.4 Requirements for Timed Metadata
   8.5 Requirements for Media Processing Entity Failover
   8.6 Requirements for Live Media Source Failover
9. Security Considerations
10. IANA Considerations
11. Contributors
12. References
   12.1. Normative References
   12.2. Informative References
   12.3. URL References
Author's Address

1. Introduction

This document describes a best practice for ingesting encoded media
content from a live source, such as a live video encoder, towards
distributed media processing entities. Examples of such entities
include media packagers, publishing points, streaming origins and
content delivery networks. The combination of live sources ingesting
media and distributed media processing entities is important in
practical video streaming deployments. In such deployments,
interoperability between live sources and downstream processing
entities can be challenging. This challenge comes from the fact that
there are multiple levels of interoperability that need to be
addressed and achieved.

For example, the network protocol for transmission of data and the
setup of the connectivity are important. This includes schemes for
establishing the ingest connection, handling disconnections and
failures, procedures for repeatedly sending and receiving the data,
and timely resolution of hostnames.

A second level of interoperability lies in the media container and
coded media formats. The Moving Picture Experts Group defined
several media container formats such as [ISOBMFF] and MPEG-2
Transport Stream which are widely adopted and well supported.
However, these are general purpose formats, targeting several
different application areas. To do so they provide many different
profiles and options. Detailed interoperability is often achieved
through other application standards, such as those for broadcast or
storage. In addition, the codec and profile used, e.g. [HEVC], is an
important interoperability point that itself also has different
profiles and options.
A third level is the way metadata is inserted in streams, which can
be a source of interoperability issues, especially for live content
that needs such metadata to signal opportunities for ad insertion,
or other metadata like timed graphics. Examples of such metadata
include [SCTE-35] markers, which are often found in broadcast
streams, and other metadata like ID3 tags [ID3v2].

Fourth, for live media, handling the timeline of the presentation
consistently is important. This includes correct sampling of media,
avoiding timeline discontinuities and synchronizing timestamps
attached by different live sources.

Fifth, in streaming workflows it is important to have support for
failover of both the live sources and the media processing
entities. This is important to avoid interruptions of 24/7 live
services such as Internet television, where components can fail. In
practical deployments, multiple live sources and media processing
entities are used. This requires the multiple live sources and
media processing entities to work together in a redundant workflow
where some of the components might fail.

This document provides an industry best practice approach for
establishing these interoperability points for live media ingest.
The approaches are based on known standardized technologies and have
been tested and deployed in several large scale streaming
deployments. Two key workflows have been identified, for which two
different media ingest profiles will be detailed.

In the first workflow, encoded media is ingested downstream for
further processing of the media. Examples of such media processing
could be any media transformation such as packaging, encrypting or
transcoding the stream. Other operations could include watermarking,
content insertion and generating streaming manifests based on [DASH]
or HLS [RFC8216]. What is typical of these operations is that they
actively inspect, or modify, the media content and may generate new
derived media content. In this workflow it is important to convey
media data and metadata that assist such active media processing
operations. This workflow type will be addressed in the first
profile.

In the second workflow, the encoded media is ingested into an entity
that does no or only minimal inspection or modification of the media
content. The main aim of such processing entities often lies in
storage, caching and delivery of the media content. An example of
such an entity is a Content Delivery Network (CDN) for delivering
and caching Internet content. Content delivery networks are often
designed for Internet content like web pages and might not be aware
of media specific aspects. In fact, streaming protocols like MPEG
DASH and HTTP Live Streaming have been developed with re-use of such
media agnostic Content Delivery Networks in mind. For ingesting
encoded media into a content delivery network it is important to
have the media presentation in a form that closely matches the
format that clients need to play back the presentation, as changing
or complementing the media presentation will be difficult. This
second workflow is addressed in profile 2.
Diagram 1: Example with media ingest in profile 1

 ============        ==============        ==============
 ||  live  || ingest||  Active  ||  HLS  ||  Content || HLS
 || media  ||====>>>||processing||===>>> || Delivery ||==>>>Client
 || source ||       ||  entity  || DASH  || Network  || DASH
 ============        ==============        ==============

Diagram 2: Example with media ingest in profile 2

 ============        ==============
 ||  live  || ingest|| Content  ||
 || media  ||====>>>||Delivery  ||==>>>> Client
 || source ||       || Network  ||
 ============        ==============

Diagram 1 shows the workflow with a live media ingest from a live
media source towards an active media processing entity. In the
example in diagram 1 the media processing entity prepares the final
media presentation for the client, which is delivered by the Content
Delivery Network to the client.

Diagram 2 shows the example in workflow 2, where content is ingested
directly into a Content Delivery Network. The content delivery
network enables the delivery to the client.

An example of a media ingest protocol is the ingest part of the
Microsoft Smooth Streaming protocol [MS-SSTR]. This protocol
connects live encoders to the Microsoft Smooth Streaming server and
to the Microsoft Azure cloud. This protocol has proven to be robust,
flexible and easy to implement in live encoders. In addition it
provides features for high availability and server side redundancy.

The first profile, relating to workflow 1, advances over the smooth
ingest protocol, incorporating lessons learned in the roughly ten
years since the initial deployment of smooth streaming in 2009 and
several advances in the signalling of information such as timed
metadata markers for content insertion. In addition, it incorporates
the latest media formats and protocols, making it ready for current
and next generation media codecs such as [HEVC] and protocols like
MPEG DASH [DASH].

A second profile is included for ingest of media streaming
presentations to entities where the media is not altered actively,
with further media processing perhaps restricted to the manifests. A
key idea of this part of the specification is to re-use the
similarities of the MPEG DASH [DASH] and HLS [RFC8216] protocols to
enable a simultaneous ingest of media presentations in these two
formats using common media segments based on the [ISOBMFF] and
[CMAF] formats. In addition, in this approach naming is important to
enable direct processing and storage of the presentation.

Based on our experience we present these two as separate profiles to
handle the two workflows. We made this decision as it reduces a lot
of overhead in the information that needs to be signalled, compared
to having both profiles combined into one, as was the case in a
prior version of this draft.

We further motivate the choices of HTTP [RFC2616] and [ISOBMFF] in
the best practice presented in this document. We believe that Smooth
streaming [MS-SSTR] and HLS [RFC8216] have shown that HTTP usage can
survive the Internet ecosystem for media delivery. In addition, HTTP
based ingest fits well with current HTTP based streaming protocols
including [DASH].
In addition, there is good support for HTTP middleboxes and HTTP
routing available, making it easier to debug and trace errors. The
HTTP POST provides a push based method for pushing the live content
when available.

The binary media format for conveying the media is based on
fragmented MPEG-4 as specified in [ISOBMFF] and [CMAF]. A key
benefit of this format is that it allows easy identification of
stream boundaries, enabling switching, redundancy and
re-transmission, resulting in a good fit with current Internet
infrastructures. Many problems in practical streaming deployments
relate to issues with the binary media format. We believe that
fragmented MPEG-4 will make things easier and that the industry is
already heading in this direction following recent specifications
like [CMAF] and HLS [RFC8216].

Regarding the transport protocol, in future versions alternative
transport protocols could be considered, advancing over HTTP. We
believe the proposed media format will provide the same benefits
with other transport protocols. Our view is that for current and
near future deployments using HTTP [RFC2616] is still a good
approach.

The document is structured as follows: in section 2 we present the
conventions and terminology used throughout this document. In
section 3 we present use cases and workflows related to media ingest
and the two profiles presented. Sections 4-8 detail the protocol and
the two different profiles.

2. Conventions and Terminology

The following terminology is used in the rest of this document.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119
[RFC2119].

ISOBMFF:
 the ISO Base Media File Format specified in [ISOBMFF].
Live Ingest Stream:
 the stream of media produced by the live source,
 transmitted to the media processing entity.
Live Stream Event:
 the total live stream for the ingest.
(Live) encoder:
 entity performing live encoding and producing a high quality
 live stream; can serve as media ingest source.
Media (Ingest) source:
 a media source ingesting media content, typically a live
 encoder but not restricted to this; the media ingest source
 could be any type of source, such as a stored file that is
 sent in partial chunks.
Live Ingest Source:
 media ingest source producing live content.
Publishing point:
 entity used to publish the media content;
 consumes/receives the incoming media ingest stream.
Media processing entity:
 entity used to process the media content;
 receives/consumes a media ingest stream.
Media processing function:
 media processing entity.
Connection:
 a connection setup between two hosts, typically the
 media ingest source and the media processing entity.
ftyp:
 the filetype and compatibility "ftyp" box as described in the
 ISOBMFF [ISOBMFF] that describes the "brand".
moov:
 the container box for all metadata "moov" described in the
 ISO base media file format [ISOBMFF].
moof:
 the movie fragment "moof" box as described in the ISO base
 media file format [ISOBMFF] that describes the metadata of a
 fragment of media.
mdat:
 the media data container "mdat" box contained in an ISOBMFF
 file [ISOBMFF]; this box contains the compressed media samples.
kind:
 the track kind box defined in the ISOBMFF [ISOBMFF] to label a
 track with its usage.
mfra:
 the movie fragment random access "mfra" box defined in the
 ISOBMFF [ISOBMFF] to signal random access samples (these are
 samples that require no prior or other samples for decoding).
tfdt:
 the TrackFragmentBaseMediaDecodeTimeBox "tfdt" in the ISO base
 media file format [ISOBMFF], used to signal the decode time of
 the media fragment signalled in the moof box.
mdhd:
 the media header box "mdhd" as defined in [ISOBMFF]; this box
 contains information about the media such as timescale,
 duration and language using ISO 639-2/T codes [ISO639-2].
pssh:
 the protection specific system header "pssh" box defined in
 [CENC] that can be used to signal the content protection
 information according to the MPEG Common Encryption [CENC].
sinf:
 the protection scheme information box "sinf" defined in
 [ISOBMFF] that provides information on the encryption scheme
 used in the file.
elng:
 the extended language box "elng" defined in [ISOBMFF] that can
 override the language information.
nmhd:
 the null media header box "nmhd" as defined in [ISOBMFF] to
 signal a track for which no specific media header is defined,
 often used for metadata tracks.
HTTP:
 Hyper Text Transfer Protocol, version 1.1, as specified by
 [RFC2616].
HTTP POST:
 command used in the Hyper Text Transfer Protocol for sending
 data from a source to a destination [RFC2616].
fragmentedMP4stream:
 stream of [ISOBMFF] fragments (moof and mdat); a more precise
 definition follows below.
POST_URL:
 target URL of a POST command in the HTTP protocol for posting
 data from a source to a destination.
TCP:
 Transmission Control Protocol (TCP) as defined in [RFC793].
URI_SAFE_IDENTIFIER:
 identifier/string formatted according to [RFC3986].

A fragmentedMP4stream can be defined using the IETF RFC 5234 ABNF
[RFC5234] as follows:

 fragmentedMP4stream = headerboxes fragments
 headerboxes         = ftyp moov
 fragments           = 1*fragment
 fragment            = moof mdat

This fragmentedMP4 stream is used in both profiles.
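As an informative illustration of this structure, the following
Python sketch walks the top-level boxes of a fragmentedMP4stream and
checks the pattern defined by the ABNF above. It is a minimal
sketch, not a complete ISOBMFF parser; the function names are
illustrative and not part of this document.

   import struct

   def iter_boxes(stream):
       # Yield (type, payload) for each top-level ISOBMFF box: a box
       # starts with a 32-bit big-endian size and a 4-character type.
       while True:
           header = stream.read(8)
           if len(header) < 8:
               return
           size, box_type = struct.unpack(">I4s", header)
           if size == 1:  # 64-bit "largesize" variant
               size = struct.unpack(">Q", stream.read(8))[0]
               yield box_type.decode("ascii"), stream.read(size - 16)
           else:
               yield box_type.decode("ascii"), stream.read(size - 8)

   def check_fragmented_mp4_stream(stream):
       # Enforce the ABNF above: headerboxes (ftyp moov) followed by
       # fragments (moof mdat pairs), optionally preceded by styp.
       types = [t for t, _ in iter_boxes(stream)]
       assert types[:2] == ["ftyp", "moov"], "must start with ftyp moov"
       body = [t for t in types[2:] if t != "styp"]
       assert all(t in ("moof", "mdat", "mfra") for t in body)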
3. Media Ingest Workflows and Use Cases

In this section we highlight some of the target use cases and
example workflows for the media ingest. Diagram 3 shows an example
workflow of media ingest with profile 1 in a streaming workflow. The
live media is ingested into the media processing entity, which
performs operations like on-the-fly encryption, content stitching,
packaging and possibly other operations before delivery of the final
media presentation to the client. This type of distributed media
processing offloads many functionalities from the live media source.
As long as the stream originating from the media source contains
sufficient metadata, the media processing entity can generate the
media presentation for streaming to clients, or other derived media
presentations as needed by a client.

Diagram 4 shows an alternative example with ingest to a content
delivery network, or perhaps another passive media entity such as a
storage. In this case the live media source posts the segments and
the manifests for the media presentation. Fragmented MPEG-4 segments
can still be used, but the ingest works slightly differently.

Diagram 3:
Streaming workflow with fragmented MPEG-4 ingest in profile 1

 ============        ==============        ==============
 ||  live  ||ingest ||  Media   ||  HLS  ||  Content || HLS
 || media  ||====>>>||processing||===>>> || Delivery ||==>>> Client
 || source || fmp4  ||  entity  || DASH  || Network  || DASH
 ============        ==============        ==============

Diagram 4:
Streaming workflow with DASH ingest in profile 2

 ============ingest  ==============
 ||  live  || DASH  || Content  ||
 || media  ||====>>>||Delivery  ||==>>>> Client
 || source ||       || Network  ||
 ============        ==============

Practice has shown that the ingest schemes can be quite different
for the two configurations, and that combining them into a single
protocol results in overhead, such as sending duplicate information
in the manifest or ISOBMFF moov box, and increased signalling
overhead for starting, closing and resetting the connection.
Therefore, the two procedures for media ingest in these two common
workflows are presented as separate profiles in the next sections.

In Diagram 5 we highlight some of the key differences between the
profiles for practical consideration. In profile 1 the encoder can
be simple, as the media processing entity can do many of the
operations related to the delivery, such as encryption or generating
the streaming manifests. In addition, the distribution of
functionalities can make it easier to scale a deployment with many
live media sources and media processing entities.

In some cases, an encoder has sufficient capabilities to prepare the
final presentation for the client; in that case content can be
ingested directly into a more passive media processing entity that
provides a more pass through like functionality. In this case
manifests and other client specific information also need to be
ingested. Besides these factors, choosing a workflow for a video
streaming platform depends on many factors. The media ingest best
practice covers these two types of workflows with two different
profiles. The best choice for a specific platform depends on many of
the use case specific requirements, circumstances and the available
technologies.

In Diagram 6 we highlight another aspect taken into consideration
for large scale systems with many users. Often one would like to run
multiple encoders and multiple processing entities and make them
available to the clients via a load balancer. This way requests can
be balanced over multiple processing nodes. This approach is common
when serving web pages, and this architecture also applies to video
streaming platforms that also use HTTP.
Diagram 6 highlights how one or more live encoders can be sending
data to one or more processing entities. In such a workflow it is
important to handle the case when one source or media processing
entity fails over. We call this support for failover. It is an
important consideration in practical video streaming systems that
need to run 24/7. Failovers must be handled robustly and seamlessly
without causing service interruption. In both profiles we detail how
this failover and redundancy support can be achieved.

Diagram 5: Differences between profile 1 and profile 2 for use cases

 ============================================================
 |Profile   | Encoder/Live source  | Media processing       |
 |----------|----------------------|------------------------|
 |Profile 1 | limited overview,    | DRM, transcode,        |
 |          | simple encoder,      | watermark, manifest    |
 |          | multiple sources     | creation, packaging,   |
 |          |                      | content stitching,     |
 |          |                      | timed metadata         |
 |----------|----------------------|------------------------|
 |Profile 2 | global overview,     | cache, store, deliver, |
 |          | encoder targets      | manifest manipulation  |
 |          | client, only         |                        |
 |          | duplicate sources    |                        |
 ============================================================

Diagram 6:
workflow with redundant sources and media processing entities

 ============  fmp4  ==============
 ||  live  || stream||  Media   ||
 || media  ||====>>>||Processing|| \\
 || source ||   //  ||  Entity  ||  \\
 ============  //   ==============   \\  ============
 ||  live  || //                      \\ ||  load  ||
 || media  ||// redundant stream   >>>  ||balancer|| ==>>> Client
 || source ||\\ stream        //        =============
 ============ \\   ============== //
 ||  live  ||  \\  ||  Media    ||//
 ||ingest  ||====>>>||Processing ||
 || source ||  //  ||  Entity   ||
 ============ //   ===============
 ||  live  || //
 ||ingest  ||// redundant stream
 || source ||
 ============

4. General Ingest Protocol Behavior

The media ingest follows the following general requirements for both
target profiles.

1. The live encoder or ingest source communicates with the
   publishing point/processing entity using the HTTP POST method as
   defined in the HTTP protocol [RFC2616].
2. The media ingest source SHOULD use HTTP over TLS [RFC2818] to
   connect to the media processing entity.
3. The live encoder/media source SHOULD repeatedly resolve the
   hostname to adapt to changes in the IP to hostname mapping, for
   example by using the domain name system DNS [RFC1035] or any
   other system that is in place.
4. The live encoder/media source MUST update the IP to hostname
   resolution respecting the TTL (time to live) from DNS query
   responses. This enables better resilience to changes of the IP
   address in large scale deployments, where the IP address of the
   publishing point or media processing nodes may change frequently.
5. In case the HTTPS [RFC2818] protocol is used, basic
   authentication HTTP AUTH [RFC7617] or better methods like TLS
   client certificates SHOULD be used.
6. As compatibility profile for the TLS encryption we recommend that
   the ingest SHOULD use the Mozilla intermediate compatibility
   profile, which is supported in many available implementations
   [MozillaTLS].
7. The encoder or ingest source SHOULD terminate the HTTP POST
   request if data is not being sent at a rate commensurate with the
   MP4 segment duration. An HTTP POST request that does not send
   data can prevent publishing points or media processing entities
   from quickly disconnecting from the live encoder or media ingest
   source in the event of a service update. For this reason, the
   HTTP POST for sparse data such as sparse tracks SHOULD be
   short-lived, terminating as soon as the sparse fragment is sent.
8. The POST request uses a POST_URL to the basepath of the
   publishing point at the media processing entity and MAY use a
   relative path for different streams and segments.
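An informative sketch of this connection setup, using Python's
standard library; the hostname and path below are placeholders, not
part of this practice:

   import ssl, http.client

   HOSTNAME = "ingest.example.com"     # placeholder publishing host
   POST_URL = "/live/channel1.isml"    # placeholder basepath (req. 8)

   def probe_publishing_point():
       # Requirements 1 and 2: HTTP POST over TLS. Creating a fresh
       # connection re-resolves the hostname (requirements 3 and 4);
       # a production encoder would additionally honor the DNS TTL.
       context = ssl.create_default_context()
       conn = http.client.HTTPSConnection(HOSTNAME, 443,
                                          context=context, timeout=10)
       # An empty POST (zero content length) tests whether the
       # publishing point is valid and whether authentication or
       # other conditions are required (see 6.1).
       conn.request("POST", POST_URL, body=b"")
       status = conn.getresponse().status
       conn.close()
       return status   # e.g. 200/202, or 403 if authentication fails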
5. Profile 1: Fragmented MPEG-4 Ingest General Considerations

The first profile assumes ingest to an active media processing
entity, from one or more live ingest sources, ingesting one or more
types of media streams. This advances over the ingest part of the
smooth ingest protocol [MS-SSTR] by using standardized media
container formats based on [ISOBMFF] and [CMAF]. In addition, this
allows extension to codecs like [HEVC] and ingest of timed metadata,
subtitle and timed text streams. The workflow ingesting multiple
media ingest streams with fragmented MPEG-4 ingest is illustrated in
Diagram 7.

Diagram 7: fragmented MPEG-4 ingest with multiple ingest sources

 ============  fmp4  ==============
 ||  live  || video ||          ||
 || ingest ||====>>>||          ||
 || source ||       ||          ||
 ============       ||          ||
 ||  live  || fmp4  ||          ||
 || ingest ||====>>>||  Active  ||        ==============
 || source || audio ||  Media   ||  HLS  ||  Content || HLS
 ============       ||processing||===>>> || Delivery ||==>>> Client
 ||  live  || fmp4  ||  entity  || DASH  || Network  || DASH
 ||ingest  ||====>>>||          ||        =============
 || source || text  ||          ||
 ============       ||          ||
 ||  live  || fmp4  ||          ||
 ||ingest  || meta  ||          ||
 || source || data  ||          ||
 ||        ||====>>>||          ||
 ============        ==============

In diagrams 8-10 we detail some of the concepts and structures.
Diagram 8 shows the data format structure of fragmented MPEG-4
[ISOBMFF] and [CMAF]. In this format media metadata (playback time,
sample duration) and sample data (encoded samples) are interleaved.
The moof box as specified in [ISOBMFF] is used to signal the
information needed to play back and decode the samples that follow
in the mdat box. The ftyp and moov box contain the track specific
information and can be seen as a header of the stream, sometimes
referred to as a [CMAF] header. The styp box can be used to signal
the type of segment. The combination of styp, moof and mdat can be
referred to as a segment; the combination of ftyp and moov can be
referred to as an init segment or a CMAF header.

Diagram 8: fragmented mp4 stream:

 =========================================================
 ||ftyp||moov||styp||moof||mdat||styp||moof||mdat|| .....=
 =========================================================

In diagram 9 we illustrate the synchronisation model, which is in
many ways similar to, but simplified from, the synchronisation model
proposed in [CMAF]. Different bit-rate tracks and/or media streams
are conveyed in separate fragmented mp4 streams. By having the
segment boundaries time aligned for tracks comprising the same
stream at different bit-rates, bit-rate switching can be achieved.
By using a common timeline, different streams can be synchronized at
the receiver, while each is carried in a separate fragmented mp4
stream sent over a separate connection, possibly from a different
live ingest source.

In diagram 10 another advantage of this synchronisation model is
illustrated: the concept of late binding. In the case of late
binding a new stream becomes available. By using the segment
boundaries and a common timeline it can be received by the media
processing entity and embedded in the presentation. Late binding is
useful for many practical use cases when broadcasting television
content with different types of metadata tracks.

Diagram 9: fmp4 stream synchronisation:

 =========================================================
 ||ftyp||moov||styp||moof||mdat||styp||moof||mdat|| .....=
 =========================================================
 ||ftyp||moov||styp||moof||mdat||styp||moof||mdat|| .....=
 =========================================================
 ||ftyp||moov||styp||moof||mdat||styp||moof||mdat|| .....=
 =========================================================

Diagram 10: fmp4 late binding:

 ===================================================
 ||ftyp||moov||styp||moof||mdat||moof||mdat|| .....=
 ===================================================
 ==========================
 ||ftyp||moov||styp||moof||
 ==========================

Diagram 11 shows the flow of the media ingest. It starts with a DNS
resolution (if needed) and an authentication step (HTTP AUTH, TLS
client certificate) to establish a secure TCP connection. In some
private datacenter deployments where nodes are not reachable from
outside, a non authenticated connection MAY also be used. The ingest
source then issues an empty POST to test that the media processing
entity is listening. It then starts sending the ftyp and moov boxes
(the init segment), followed by the rest of the segments in the
fragmented MPEG-4 stream. At the end of the session, for tear down,
the source can send an empty mfra box to close the connection.
Diagram 11: fmp4 ingest flow

||===============================================================||
||=====================          ============================    ||
||| live ingest source |         | Media processing entity  |    ||
||=====================          ============================    ||
||   ||     <<------ DNS Resolve --------->>       ||            ||
||   ||     <<------ Authenticate -------->>       ||            ||
||   ||     <<------ POST fmp4stream ----->>       ||            ||
||=============== empty POST to test connection =================||
||   ||     <<------ 200 OK --------------------||               ||
||   ||     <<------ 202 Accepted --------------||               ||
||   ||     <<------ 400 Bad Request -----------||               ||
||   ||     <<------ 403 Forbidden -------------||               ||
||   ||     <<------ 404 Not Found -------------||               ||
||   ||     <<------ 415 Unsupported Media -----||               ||
||================== ftyp + moov Sending ========================||
||============= fragmented MP4 Sending ==========================||
||   ||     <<------ 400 Bad Request -----------||               ||
||============= mfra box Sending (close) ========================||
||   ||     <<------ 200 OK --------------------||               ||
||=====================          ============================    ||
||| live ingest source |         | Media processing entity  |    ||
||=====================          ============================    ||
||===============================================================||

6. Profile 1: Fragmented MPEG-4 Ingest Protocol Behavior

This section describes the protocol behavior specific to profile 1:
fragmented MPEG-4 ingest. Operation of this profile MUST also adhere
to the general requirements in section 4.

6.1 General Protocol Requirements

1. The live encoder or ingest source SHOULD start by sending an HTTP
   POST request with an empty "body" (zero content length) to the
   POST_URL. This can help the live encoder or media ingest source
   to quickly detect whether the live ingest publishing point is
   valid, and whether any authentication or other conditions are
   required.
2. The live encoder or ingest source MUST initiate a media ingest
   connection by POSTing the header boxes "ftyp" and "moov" after
   step 1.
3. The encoder or ingest source SHOULD use the chunked transfer
   encoding option of the HTTP POST command [RFC2616], as it might
   be difficult to predict the entire content length of the segment.
   This can also be used, for example, to support use cases that
   require low latency.
4. If the HTTP POST request terminates or times out with a TCP error
   prior to the end of the stream, the encoder MUST issue a new
   connection and follow the preceding requirements. Additionally,
   the encoder MAY resend the previous segment that was already
   sent.
5. The live encoder or ingest source MUST handle any error or failed
   authentication responses received from the media processing
   entity by issuing a new connection and following the preceding
   requirements, including retransmitting the ftyp and moov boxes.
6. In case the live stream event is over, the live media source or
   ingest source SHOULD signal the stop by transmitting an empty
   "mfra" box towards the publishing point/processing entity.
7. The live ingest source SHOULD use a separate TCP connection for
   ingest of each different track.
8. The live ingest source MAY use a separate relative path in the
   POST_URL for ingest of each different track.
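A minimal informative sketch of this connection behavior, assuming a
fragment_source iterable that yields complete moof/mdat fragments as
the encoder produces them (the names are illustrative):

   import ssl, http.client

   def ingest_track(hostname, post_url, init_segment, fragment_source):
       def body():
           yield init_segment                # req. 2: ftyp + moov first
           for fragment in fragment_source:  # moof/mdat pairs as produced
               yield fragment
           yield (8).to_bytes(4, "big") + b"mfra"  # req. 6: empty mfra

       conn = http.client.HTTPSConnection(
           hostname, 443, context=ssl.create_default_context())
       # Requirement 3: chunked transfer encoding, since the total
       # content length of a live stream cannot be predicted.
       conn.request("POST", post_url, body=body(),
                    headers={"Transfer-Encoding": "chunked"},
                    encode_chunked=True)
       return conn.getresponse().status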
6.2 Requirements for Formatting Media Tracks

1. The TrackFragmentBaseMediaDecodeTime box "tfdt" MUST be present
   for each segment posted.
2. The ISOBMFF media fragment duration SHOULD be constant; the
   duration MAY fluctuate to compensate for non-integer frame rates.
   By choosing an appropriate timescale (a multiple of the frame
   rate is recommended) this issue SHOULD be avoided.
3. The MPEG-4 fragment durations SHOULD be between approximately 1
   and 6 seconds.
4. The fragment decode timestamps "tfdt" of fragments in the
   fragmentedMP4stream and the indexed basemediadecodetime SHOULD
   arrive in increasing order for each of the different
   tracks/streams that are ingested.
5. The segments formatted as a fragmented MP4 stream SHOULD use a
   timescale for video streams based on the framerate, and 44.1 kHz
   or 48 kHz for audio streams, or any other timescale that enables
   integer increments of the decode times of fragments signalled in
   the "tfdt" box based on this scale.
6. The language of the stream SHOULD be signalled in the "mdhd" box
   or "elng" boxes in the init segment and/or moof headers ("mdhd").
7. Encryption specific information SHOULD be signalled in the
   "pssh", "schm" and "sinf" boxes following [ISOBMFF] and [CENC].
8. Segments posted towards the media processing entity SHOULD
   contain the bitrate "btrt" box specifying the target bitrate of
   the segments.
9. Segments posted towards the media processing entity SHOULD
   contain the "tfdt" box specifying the fragment's decode time and
   the "tfhd" box specifying the track id.
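The interaction between requirements 2, 4 and 5 can be made concrete
with a small worked example; the frame rate and fragment length
below are illustrative only:

   FRAME_RATE = 30                 # frames per second (example)
   TIMESCALE = FRAME_RATE * 1000   # 30000: a multiple of the frame rate
   FRAMES_PER_FRAGMENT = 60        # 2 second fragments, within 1..6 s

   def tfdt_for_fragment(index):
       # baseMediaDecodeTime of fragment `index` in timescale units;
       # integer and strictly increasing, per requirements 1 and 4.
       return index * FRAMES_PER_FRAGMENT * (TIMESCALE // FRAME_RATE)

   # tfdt_for_fragment(0) == 0, tfdt_for_fragment(1) == 60000, i.e.
   # exactly 60000 / 30000 = 2 seconds per fragment, with no rounding
   # drift even over a long-running 24/7 stream.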
6.3 Requirements for Timed Text Captions and Subtitle Streams

The media ingest follows the following requirements for ingesting a
track with timed text, captions and/or subtitle streams.

1. The track will be a sparse track signalled by a null media header
   "nmhd" containing the timed text, images or captions,
   corresponding to the recommendation for storing such tracks in
   fragmented MPEG-4 [CMAF].
2. Based on this recommendation the track handler "hdlr" SHALL be
   set to "text" for WebVTT and "subt" for TTML, following
   [MPEG-4-30].
3. In case TTML is used, the track MUST use the XMLSampleEntry to
   signal the sample description of the subtitle stream [MPEG-4-30].
4. In case WebVTT is used, the track MUST use the WVTTSampleEntry to
   signal the sample description of the text stream [MPEG-4-30].
5. These boxes SHOULD signal the mime type and specifics as
   described in [CMAF] sections 11.3, 11.4 and 11.5.
6. The boxes described in 2-5 MUST be present in the init segment
   (ftyp + moov) for the given track.
7. Subtitles in CTA-608 and CTA-708 format SHOULD be conveyed
   following the recommendation in section 11.5 of [CMAF], via
   Supplemental Enhancement Information (SEI) messages in the video
   track [CMAF].
8. The "ftyp" box in the init segment for the track containing timed
   text, images, captions and subtitles MAY use signalling based on
   [CMAF] profiles:
   8a. WebVTT, specified in 11.2 of ISO/IEC 14496-30 [MPEG-4-30]:
       'cwvt'
   8b. TTML IMSC1 Text, specified in 11.3.3 [MPEG-4-30], IMSC1 Text
       Profile: 'im1t'
   8c. TTML IMSC1 Image, specified in 11.3.4 [MPEG-4-30], IMSC1
       Image Profile: 'im1i'
   8d. CEA CTA-608 and CTA-708, specified in 11.4 [MPEG-4-30],
       caption data embedded in SEI messages in the video track:
       'ccea'

6.4 Requirements for Timed Metadata

This section discusses the specific formatting requirements for
ingest of timed metadata related to events and markers for ad
insertion, or other timed metadata. An example of these are
opportunities for dynamic live ad insertion signalled by SCTE-35
markers. This type of event signalling is different from regular
audio/video information because of its sparse nature: the signalling
data usually does not happen continuously, and the intervals can be
hard to predict. Examples of timed metadata are ID3 tags [ID3v2],
SCTE-35 markers [SCTE-35] and DASH emsg messages defined in section
5.10.3.3 of [DASH]. For example, DASH event messages contain a
schemeIdUri that defines the payload of the message. Table 1
provides some example schemes used in DASH event messages and Table
2 illustrates an example of a SCTE-35 marker stored in a DASH emsg.
The presented approach allows ingest of timed metadata from
different sources, possibly at different locations, by embedding
them in sparse metadata tracks.

Table 1: Example DASH emsg scheme URIs

 Scheme URI               | Reference
 -------------------------|-----------------------------
 urn:mpeg:dash:event:2012 | [DASH], 5.10.4
 urn:dvb:iptv:cpm:2014    | [DVB-DASH], 9.1.2.1
 urn:scte:scte35:2013:bin | [SCTE-35] 14-3 (2015), 7.3.2
 www.nielsen.com:id3:v1   | Nielsen ID3 in MPEG-DASH

Table 2: Example of a SCTE-35 marker embedded in a DASH emsg

 Tag                     | Value
 ------------------------|-----------------------------------------
 scheme_uri_id           | "urn:scte:scte35:2013:bin"
 value                   | the value of the SCTE-35 PID
 timescale               | positive number
 presentation_time_delta | non-negative number expressing the
                         | splice time relative to the tfdt
 event_duration          | duration of the event; "0xFFFFFFFF"
                         | indicates unknown duration
 id                      | unique identifier for the message
 message_data            | splice info section including CRC
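For illustration, the following sketch serializes an emsg box
(version 0, [DASH] section 5.10.3.3) with the fields of Table 2; the
splice_info_section argument stands in for a real SCTE-35 payload
and the value "1" is a placeholder PID:

   import struct

   def emsg_scte35(timescale, presentation_time_delta, event_duration,
                   event_id, splice_info_section):
       payload = (b"\x00\x00\x00\x00"                # version 0, flags 0
                  + b"urn:scte:scte35:2013:bin\x00"  # scheme_uri_id
                  + b"1\x00"                         # value: SCTE-35 PID
                  + struct.pack(">IIII", timescale,
                                presentation_time_delta,
                                event_duration, event_id)
                  + splice_info_section)             # message_data + CRC
       return struct.pack(">I", 8 + len(payload)) + b"emsg" + payload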
The following steps are recommended for timed metadata ingest
related to events, tags, ad markers and program information:

1. Create the metadata stream as a fragmentedMP4stream that conveys
   the metadata; the media handler (hdlr) is "meta" and the track
   uses a null media header box "nmhd".
2. The metadata stream applies to the media streams in the
   presentation ingested to the active publishing point at the media
   processing entity.
3. The URIMetaSampleEntry entry contains, in a URIBox, the URI
   following the URI syntax in [RFC3986] defining the form of the
   metadata (see the ISO Base Media File Format specification
   [ISOBMFF]). For example, the URIBox could contain the URL
   http://www.id3.org for ID3 tags [ID3v2], or
   urn:scte:scte35:2013:bin for SCTE-35 markers [SCTE-35].
4. The timescale of the metadata SHOULD match the value specified in
   the media header box "mdhd" of the metadata track.
5. The arrival time is signalled in the "tfdt" box of the track
   fragment as the basemediadecodetime. This time is often different
   from the media presentation time at which the message is applied.
   The duration of a metadata fragment can be set to zero, letting
   it be determined by the time (tfdt) of the next metadata segment
   received.
6. All timed metadata samples SHOULD be sync samples [ISOBMFF],
   defining the entire set of metadata for the time interval they
   cover. Hence, the sync sample table box SHOULD NOT be present in
   the metadata stream.
7. The metadata segment becomes available to the publishing
   point/media processing entity when the corresponding track
   fragment from the media has an equal or larger timestamp compared
   to the arrival time signalled in the tfdt basemediadecodetime.
   For example, if the sparse fragment has a timestamp of t=1000, it
   is expected that after the publishing point/processing entity
   sees a "video" fragment (assuming the parent track name is
   "video") with timestamp 1000 or beyond, it can retrieve the
   sparse fragment from the binary payload.
8. The payload of sparse track fragments is conveyed in the mdat box
   as sample information. This enables muxing of the metadata
   tracks. For example, XML metadata can be coded as base64, as is
   common for [SCTE-35] metadata messages.
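Step 7 above can be sketched as follows: the media processing entity
holds back each sparse fragment until the parent track reaches its
arrival time. The data structures are illustrative, not normative:

   import collections

   pending = collections.deque()   # sparse fragments, ordered by tfdt

   def on_sparse_fragment(tfdt, fragment):
       pending.append((tfdt, fragment))

   def on_media_fragment(parent_tfdt):
       # Release every sparse fragment whose arrival time has been
       # reached by the parent ("video") track: a sparse fragment
       # with tfdt 1000 is released once a video fragment with
       # tfdt >= 1000 is seen.
       released = []
       while pending and pending[0][0] <= parent_tfdt:
           released.append(pending.popleft()[1])
       return released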
6.5 Requirements for Media Processing Entity Failover

Given the nature of live streaming, good failover support is
critical for ensuring the availability of the service. Typically,
media services are designed to handle various types of failures,
including network errors, server errors and storage issues. When
used in conjunction with proper failover logic on the live encoder
side, highly reliable live streaming setups can be built. In this
section, we discuss requirements for failover scenarios.

The following steps are required for a live encoder or media ingest
source to deal with a failing media processing entity.

1. Use a 10-second timeout for establishing the TCP connection. If
   an attempt to establish the connection takes longer than 10
   seconds, abort the operation and try again.
2. Use a short timeout for sending the HTTP requests. If the target
   segment duration is N seconds, use a send timeout between N and
   2N seconds; for example, if the segment duration is 6 seconds,
   use a timeout of 6 to 12 seconds. If a timeout occurs, reset the
   connection, open a new connection, and resume stream ingest on
   the new connection. This is needed to avoid latency introduced by
   failing connectivity in the workflow.
3. Resend track segments for which a connection was terminated
   early.
4. We recommend that the encoder or ingest source does NOT limit the
   number of retries to establish a connection or resume streaming
   after a TCP error occurs.
5. After a TCP error:
   a. The current connection MUST be closed, and a new connection
      MUST be created for a new HTTP POST request.
   b. The new HTTP POST URL MUST be the same as the initial POST URL
      for the segment to be ingested.
   c. The new HTTP POST MUST include stream headers ("ftyp" and
      "moov" boxes) identical to the stream headers in the initial
      POST request for fragmented media ingest.
6. In case the media processing entity cannot process the POST
   request due to authentication or permission problems, it SHOULD
   return a permission denied HTTP 403.
7. In case the media processing entity can process the request, it
   SHOULD return an HTTP 200 OK or 202 Accepted.
8. In case the media processing entity can process the manifest or
   segment in the POST request body but finds the media type cannot
   be supported, it SHOULD return an HTTP 415 Unsupported Media
   Type.
9. In case an unknown error happened during the processing of the
   HTTP POST request, an HTTP 400 Bad Request SHOULD be returned.
10. In case the media processing entity cannot process a segment
    posted due to a missing or incorrect init segment, an HTTP 412
    Precondition Failed SHOULD be returned.
11. In case a media source receives an HTTP 412 response, it SHOULD
    resend the "ftyp" and "moov" boxes.
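An informative sketch of the retry behavior in steps 1-5;
post_stream() stands in for the chunked POST of 6.1 and the fragment
buffer interface is hypothetical:

   import time

   SEGMENT_DURATION = 6   # seconds, example value

   def ingest_with_failover(post_stream, init_segment, fragments):
       while True:        # step 4: do not cap the number of retries
           try:
               # step 1: 10 s connect timeout;
               # step 2: a send timeout between N and 2N seconds
               post_stream(init_segment, fragments,
                           connect_timeout=10,
                           send_timeout=2 * SEGMENT_DURATION)
               return
           except (ConnectionError, TimeoutError):
               # steps 3 and 5: reopen the same POST URL and resend
               # the "ftyp"/"moov" headers plus the cut-off segments.
               fragments.rewind_to_last_unacknowledged()  # hypothetical
               time.sleep(1)   # brief backoff before reconnecting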
6.6 Requirements for Live Media Source Failover

Live encoder or media ingest source failover is the second type of
failover scenario that needs to be addressed for end-to-end live
streaming delivery. In this scenario, the error condition occurs on
the encoder side. The following expectations apply to the live
ingestion endpoint when encoder failover happens:

1. A new encoder or media ingest source instance SHOULD be
   instantiated to continue streaming.
2. The new encoder or media ingest source MUST use the same URL for
   HTTP POST requests as the failed instance.
3. The new encoder or media ingest source POST request MUST include
   the same header boxes moov and ftyp as the failed instance.
4. The new encoder or media ingest source MUST be properly synced
   with all other running encoders for the same live presentation,
   to generate synced audio/video samples with aligned fragment
   boundaries. This implies that UTC timestamps for fragments in the
   "tfdt" match between encoders, and that encoders start running at
   an appropriate segment boundary.
5. The new stream MUST be semantically equivalent to the previous
   stream, and interchangeable at the header and media fragment
   levels.
6. The new encoder or media ingest source SHOULD try to minimize
   data loss. The basemediadecodetime "tfdt" of media fragments
   SHOULD increase from the point where the encoder last stopped.
   The basemediadecodetime in the "tfdt" box SHOULD increase in a
   continuous manner, but it is permissible to introduce a
   discontinuity, if necessary. Media processing entities or
   publishing points can ignore fragments that they have already
   received and processed, so it is better to err on the side of
   resending fragments than to introduce discontinuities in the
   media timeline.

7. Profile 2: DASH Ingest General Considerations

Profile 2 is designed to ingest media into entities that only
provide pass through functionality. In this case the media ingest
source also provides the manifest, based on MPEG DASH [DASH] or HTTP
Live Streaming [RFC8216].

The key idea here is to reuse the fragmented MPEG-4 ingest to enable
simultaneous ingest of DASH and HLS based on fragmented MPEG-4
files, using the commonalities described in [CMAF], which is a
format based on fragmented MPEG-4 that can be used in both DASH and
HLS presentations.

The flow of operation in profile 2 is shown in Diagram 12. In this
case the live ingest source (media source) sends a manifest first.
Based on this manifest the media processing entity can set up
reception paths for the ingest URL, e.g.
http://hostname/presentationpath

In the next step, segments are sent in individual POST requests
using URLs corresponding to the relative paths and segment names in
the manifest, e.g.
http://hostname/presentationpath/relative_path/segment1.cmf

This profile re-uses as much functionality as possible from profile
1, as the manifest can be seen as a complementary addition to the
fragmented MPEG-4 stream. A difference lies in the way the
connection is set up and the way data is transmitted, which can use
relative URL paths for the segments based on the paths in the
manifest. For the rest, it largely uses the same fragmented MPEG-4
layer based on [ISOBMFF] and [CMAF].

Diagram 12

||===============================================================||
||=====================          ============================    ||
||| live media source  |         | Media processing entity  |    ||
||=====================          ============================    ||
||   ||                                      ||                  ||
||===============Initial Manifest Sending========================||
||   ||                                      ||                  ||
||   ||-- POST /prefix/media.mpd --------->> ||                  ||
||   ||               Success                ||                  ||
||   || <<------ 200 OK --------------------||                   ||
||   ||          Permission denied           ||                  ||
||   || <<------ 403 Forbidden -------------||                   ||
||   ||             Bad request              ||                  ||
||   || <<------ 400 Bad Request -----------||                   ||
||   ||         Precondition failed          ||                  ||
||   || <<--- 412 Precondition Failed ------||                   ||
||   ||        Unsupported media type        ||                  ||
||   || <<------ 415 Unsupported Media -----||                   ||
||   ||                                      ||                  ||
||==================== Segment Sending ==========================||
||   ||-- POST /prefix/chunk.cmaf --------->>||                  ||
||   ||           Success/Accepted           ||                  ||
||   || <<------ 200 OK --------------------||                   ||
||   || <<------ 202 Accepted --------------||                   ||
||   ||          Permission denied           ||                  ||
||   || <<------ 403 Forbidden -------------||                   ||
||   ||             Bad request              ||                  ||
||   || <<------ 400 Bad Request -----------||                   ||
||   ||        Unsupported media type        ||                  ||
||   || <<------ 415 Unsupported Media -----||                   ||
||   ||         Precondition failed          ||                  ||
||   || <<--- 412 Precondition Failed ------||                   ||
||   ||                                      ||                  ||
||=====================          ============================    ||
||| live media source  |         | Media processing entity  |    ||
||=====================          ============================    ||
||===============================================================||
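The flow of Diagram 12 can be sketched as follows, assuming a
persistent connection (see 8.1) and illustrative path and file
names:

   import ssl, http.client

   def ingest_presentation(hostname, prefix, manifest, segments):
       conn = http.client.HTTPSConnection(
           hostname, 443, context=ssl.create_default_context())
       # Step 1: POST the manifest so the media processing entity
       # can set up reception paths for the relative segment URLs.
       conn.request("POST", prefix + "/media.mpd", body=manifest)
       response = conn.getresponse()
       response.read()                 # drain to reuse the connection
       assert response.status in (200, 202)
       # Step 2: one POST per segment, POST_URL + relative path from
       # the manifest, e.g. ("relative_path/segment1.cmf", b"...").
       for relative_path, data in segments:
           conn.request("POST", prefix + "/" + relative_path,
                        body=data)
           conn.getresponse().read()
       conn.close()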
8. Profile 2: DASH and HLS Ingest Protocol Behavior

   Operation of this profile MUST also adhere to the general
   requirements in section 4.

8.1 General Protocol Requirements

   1. Before sending segments based on the fragmentedMP4stream,
   the live encoder/source MUST send a manifest [DASH]
   with the following limitations/constraints:
   1a. Only relative URL paths are used for each segment.
   1b. Only unique paths are used for each new presentation.
   1c. In case the manifest contains these relative paths,
   these paths SHOULD be used in combination with the
   POST_URL to POST each of the different segments from
   the live encoder or ingest source to the processing entity.
   2. The live encoder or ingest source MAY send updated
   versions of the manifest; such an updated manifest MUST NOT
   override current settings and relative paths or break
   currently running and incoming POST requests. The updated
   manifest can only be slightly different from the one that
   was sent previously, e.g. introducing newly available
   segments or event messages. The updated manifest SHOULD be
   sent using a PUT request instead of a POST request
   (illustrated in the sketch after section 8.6).
   3. POST_URLs of the subsequent media segment requests SHOULD
   correspond to the segments listed in the manifest, as
   POST_URL + relative URL.
   4. The encoder or ingest source SHOULD use individual HTTP
   POST commands [RFC2616] for uploading media segments when
   available.
   5. In case fixed-length POST commands are used, the live
   source entity MUST, upon receiving an HTTP 400, 404, 412 or
   415 response, resend the posted segment described in the
   manifest in its entirety, together with the init segment
   consisting of the "moov" and "ftyp" boxes.
   6. A persistent connection SHOULD be used for the different
   individual POST requests as defined in [RFC2616], enabling
   re-use of the TCP connection for multiple POST requests.

8.2 Requirements for Formatting Media Tracks

   1. Media data tracks and segments MUST be formatted and
   delivered conforming to the same requirements as stated in 6.2.
   2. Media specific information SHOULD be signalled in the
   manifest.
   3. Formatting described in the manifest and the media track
   MUST correspond consistently.

8.3 Requirements for Timed Text Captions and Subtitle Streams

   1. Timed text, caption and subtitle stream tracks MUST be
   formatted conforming to the same requirements as in 6.3.
   2. Timed text, caption and subtitle specific information
   SHOULD also be signalled in the manifest.
   3. Formatting described in the manifest and the media track
   MUST correspond consistently.

8.4 Requirements for Timed Metadata

   1. Timed metadata tracks MAY be formatted conforming to the
   same requirements as in 6.4.
   2. In addition, the emsg box containing the metadata SHOULD
   also be signalled inband in the media track as recommended
   in [CMAF].
   3. DASH event messages SHOULD also be signalled in the
   manifest.

8.5 Requirements for Media Processing Entity Failover

   1. Requirements for failover are similar to those stated in 6.5.
   2. In addition, the live encoder source SHOULD resend the
   manifest before sending any of the other segments.

8.6 Requirements for Live Media Source Failover

   1. Requirements for failover are similar to those stated in 6.6.
   2. In addition, the live encoder source SHOULD resend the
   manifest before sending any of the other segments
   (see the sketch below).
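   A non-normative sketch of the manifest handling implied by
   8.1 rule 2 and the failover rules in 8.5/8.6: updates to an
   already-sent manifest go out as PUT, and after a failover the
   manifest is resent before any further segments. The helper
   names, and the choice of a plain POST for the failover resend,
   are assumptions of this sketch, not mandated by this draft.

      import http.client

      def send_manifest(conn, post_url, manifest_bytes, first_time):
          """POST the initial manifest; send compatible updates as
          PUT (8.1 rule 2). Returns True when it was accepted."""
          method = "POST" if first_time else "PUT"
          conn.request(method, post_url + "/media.mpd",
                       body=manifest_bytes)
          resp = conn.getresponse()
          resp.read()
          return resp.status in (200, 202)

      def resume_after_failover(conn, post_url, manifest_bytes,
                                segments):
          """Resend the manifest before any further segments
          (8.5/8.6); segments is an iterable of (relative path,
          bytes). The draft only requires that the manifest be
          resent; a fresh POST is assumed here."""
          conn.request("POST", post_url + "/media.mpd",
                       body=manifest_bytes)
          conn.getresponse().read()
          for rel, data in segments:
              conn.request("POST", "%s/%s" % (post_url, rel),
                           body=data)
              conn.getresponse().read()   # drain for connection reuse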
9. Security Considerations

   Security considerations are extremely important for media
   ingest. Retrieving media from an illicit source can cause
   inappropriate content to be broadcast and possibly lead to
   failure of infrastructure. Basic security requirements have
   been covered in section 4. No security considerations other
   than the ones mentioned in this part of the text are
   explicitly considered. Further security considerations will
   be added once they have been investigated further, based on
   review of this draft.

10. IANA Considerations

   This memo includes no request to IANA.

11. Contributors

   Will Law             Akamai
   James Gruessing      BBC R&D
   Kevin Moore          Amazon AWS Elemental
   Kevin Johns          CenturyLink
   John Deutscher       Microsoft
   Patrick Gendron      Harmonic Inc.
   Nikos Kyriopoulos    MediaExcel
   Rufael Mekuria       Unified Streaming
   Sam Geqiang Zhang    Microsoft
   Arjen Wagenaar       Unified Streaming
   Dirk Griffioen       Unified Streaming
   Matt Poole           ITV
   Alex Giladi          Comcast

12. References

12.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [DASH]    MPEG ISO/IEC JTC1/SC29 WG11, "ISO/IEC 23009-1:2014:
             Dynamic adaptive streaming over HTTP (DASH) -- Part 1:
             Media presentation description and segment formats", 2014.

   [SCTE-35] Society of Cable Telecommunications Engineers,
             "Digital Program Insertion Cueing Message for Cable",
             ANSI/SCTE 35 2013.

   [ISOBMFF] MPEG ISO/IEC JTC1/SC29 WG11, "Information technology
             -- Coding of audio-visual objects -- Part 12: ISO base
             media file format", ISO/IEC 14496-12:2012.

   [HEVC]    MPEG ISO/IEC JTC1/SC29 WG11, "Information technology
             -- High efficiency coding and media delivery in
             heterogeneous environments -- Part 2: High efficiency
             video coding", ISO/IEC 23008-2:2015, 2015.

   [RFC793]  Postel, J., "Transmission Control Protocol",
             STD 7, RFC 793, September 1981.

   [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter,
             "Uniform Resource Identifier (URI): Generic Syntax",
             STD 66, RFC 3986, January 2005.

   [RFC1035] Mockapetris, P., "Domain Names - Implementation and
             Specification", STD 13, RFC 1035, November 1987.

   [CMAF]    MPEG ISO/IEC JTC1/SC29 WG11, "Information technology
             -- Multimedia application format (MPEG-A) -- Part 19:
             Common media application format (CMAF) for segmented
             media", ISO/IEC 23000-19.

   [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
             Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [CENC]    MPEG ISO/IEC JTC1/SC29 WG11, "Information technology
             -- MPEG systems technologies -- Part 7: Common
             encryption in ISO base media file format files",
             ISO/IEC 23001-7:2016.

   [MPEG-4-30] MPEG ISO/IEC JTC1/SC29 WG11, "Information technology
             -- Coding of audio-visual objects -- Part 30: Timed
             text and other visual overlays in ISO base media file
             format", ISO/IEC 14496-30:2014.

   [ISO639-2] ISO, "Codes for the Representation of Names of
             Languages -- Part 2", ISO 639-2:1998.

   [DVB-DASH] ETSI Digital Video Broadcasting, "MPEG-DASH Profile
             for Transport of ISOBMFF Based DVB Services over IP
             Based Networks", ETSI TS 103 285.

   [RFC7617] Reschke, J., "The 'Basic' HTTP Authentication Scheme",
             RFC 7617, September 2015.

12.2. Informative References

   [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
             Masinter, L., Leach, P., and T. Berners-Lee,
             "Hypertext Transfer Protocol -- HTTP/1.1",
             RFC 2616, June 1999.

   [RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000.

   [RFC8216] Pantos, R. and W. May, "HTTP Live Streaming",
             RFC 8216, August 2017.

12.3. URL References

   [fmp4git] Unified Streaming github fmp4 ingest,
             https://github.com/unifiedstreaming/fmp4-ingest
   [MozillaTLS] Mozilla Wiki, "Security/Server Side TLS",
             https://wiki.mozilla.org/Security/Server_Side_TLS
             #Intermediate_compatibility_.28default.29
             (last accessed 30th of March 2018)

   [ID3v2]   M. Nilsson, "ID3 Tag version 2.4.0 Main Structure",
             http://id3.org/id3v2.4.0-structure,
             November 2000 (last accessed 2nd of May 2018)

   [MS-SSTR] Microsoft, "Smooth Streaming Protocol",
             https://msdn.microsoft.com/en-us/library/ff469518.aspx,
             last updated March 16 2018 (last accessed June 11 2018)

Authors' Addresses

   Rufael Mekuria (editor)
   Unified Streaming
   Overtoom 60, 1054HK Amsterdam

   Phone: +31 (0)202338801
   E-Mail: rufael@unified-streaming.com

   Sam Geqiang Zhang
   Microsoft
   E-mail: Geqiang.Zhang@microsoft.com