Internet Engineering Task Force                               R. Mekuria
Internet-Draft                                    Unified Streaming B.V.
Intended status: Best Current Practice                       May 7, 2018
Expires: November 7, 2018

              Live Media and Metadata Ingest Protocol
                  draft-mekuria-mmediaingest-00.txt

Abstract

   This Internet-Draft presents a protocol specification for ingesting
   live media and metadata content from a live media source, such as a
   live encoder, towards a media processing entity or content delivery
   network.  It defines the media format usage, the preferred
   transmission methods, and the handling of failover and redundancy.
   The live media considered includes high-quality encoded audiovisual
   content.  The timed metadata supported includes timed graphics,
   captions, subtitles, and metadata markers and information.  This
   protocol can, for example, be used in advanced live streaming
   workflows that combine high-quality live encoders and advanced
   media processing entities.  The specification follows best current
   industry practice.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute working documents as
   Internet-Drafts.  The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Conventions and Terminology
   3. Media Ingest Protocol Behavior
   4. Formatting Requirements for Timed Text, Captions and Subtitles
   5. Formatting Requirements for Timed Metadata Markers
   6. Guidelines for Handling of Media Processing Entity Failover
   7. Guidelines for Handling of Live Media Source Failover
   8. Security Considerations
   9. IANA Considerations
   10. Contributors
   11. References
      11.1. Normative References
      11.2. Informative References
      11.3. URL References
   Author's Address

1. Introduction

   This specification describes a protocol for media ingest from a
   live source (e.g. a live encoder) towards media processing
   entities.  Examples of media processing entities include media
   packagers, publishing points, streaming origins, content delivery
   networks, and others.  In particular, we distinguish active and
   passive media processing entities.  Active media processing
   entities perform media processing such as encryption, packaging,
   changing (parts of) the media content, and deriving additional
   information.  Passive media processing entities provide
   pass-through and/or delivery and caching functions that do not
   alter the media content itself.  An example of a passive media
   processing entity is a content delivery network (CDN) that provides
   functionality for the delivery of the content.  An example of an
   active media processing entity is a just-in-time packager or a
   just-in-time transcoder.

   Diagram 1: Example workflow with media ingest
   Live Media Source -> Media processing entity -> CDN -> End User

   Diagram 1 shows a workflow with live media ingest from a live media
   source towards a media processing entity.  The media processing
   entity provides additional processing such as content stitching,
   encryption, packaging, manifest generation, transcoding, etc.  Such
   setups are beneficial for advanced media delivery.  The ingest
   described in this draft covers the latest technologies and
   standards used in the industry, such as timed metadata, captions,
   timed text, and encoding standards such as HEVC [HEVC].  The media
   ingest protocol specification and associated requirements were
   discussed with stakeholders including broadcasters, live encoder
   vendors, content delivery networks, telecommunications companies,
   and cloud service providers.  This draft specification has been
   extensively discussed and reviewed by these stakeholders and
   represents current best practices.
   Nevertheless, this draft solely reflects the point of view of its
   authors, taking the feedback received from these stakeholders into
   account.  Some insights into the discussions leading to this draft
   can be found at [fmp4git].

2. Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in BCP 14, RFC
   2119 [RFC2119].

   This specification uses the following additional terminology.

   ISOBMFF: the ISO Base Media File Format specified in [ISOBMFF].
   ftyp: the file type and compatibility box "ftyp" described in the
      ISOBMFF [ISOBMFF] that signals the "brand".
   moov: the container box for all metadata, "moov", described in the
      ISO base media file format [ISOBMFF].
   moof: the movie fragment box "moof" described in the ISO base media
      file format [ISOBMFF] that carries the metadata of a fragment of
      media.
   mdat: the media data container box "mdat" defined in [ISOBMFF];
      this box contains the compressed media samples.
   kind: the track kind box defined in the ISOBMFF [ISOBMFF] to label
      a track with its usage.
   mfra: the movie fragment random access box "mfra" defined in the
      ISOBMFF [ISOBMFF] to signal random access samples (samples that
      require no prior or other samples for decoding).
   tfdt: the TrackFragmentBaseMediaDecodeTimeBox "tfdt" defined in the
      ISO base media file format [ISOBMFF], used to signal the decode
      time of the media fragment signalled in the moof box.
   mdhd: the media header box "mdhd" defined in [ISOBMFF]; this box
      contains information about the media such as timescale,
      duration, and language using ISO 639-2/T codes [ISO639-2].
   pssh: the protection system specific header box "pssh" defined in
      [CENC] that can be used to signal content protection information
      according to MPEG Common Encryption (CENC).
   sinf: the protection scheme information box "sinf" defined in
      [ISOBMFF] that provides information on the encryption scheme
      used in the file.
   elng: the extended language box "elng" defined in [ISOBMFF] that
      can override the language information.
   nmhd: the null media header box "nmhd" defined in [ISOBMFF] to
      signal a track for which no specific media header is defined,
      often used for metadata tracks.
   HTTP: Hypertext Transfer Protocol, version 1.1, as specified by
      [RFC2616].
   HTTP POST: method used in the Hypertext Transfer Protocol for
      sending data from a source to a destination [RFC2616].
   fragmentedMP4stream: stream of [ISOBMFF] fragments (moof and mdat);
      see Section 3 for the definition.
   POST_URL: target URL of a POST command in the HTTP protocol for
      pushing data from a source to a destination.
   TCP: Transmission Control Protocol (TCP) as defined in [RFC793].
   URI_SAFE_IDENTIFIER: identifier/string formatted according to
      [RFC3986].
   Connection: connection setup between a host and a source.
   Live stream event: the total media broadcast stream of the ingest.
   (Live) encoder: entity performing live encoding and producing a
      high-quality encoded stream; can serve as a media ingest source.
   (Media) ingest source: a media source ingesting media content,
      typically a live encoder but not restricted to this; the media
      ingest source could be any type of source, such as a stored file
      that is sent in partial chunks.
   Publishing point: entity used to publish the media content;
      consumes/receives the incoming media ingest stream.
   Media processing entity: entity used to process media content;
      receives/consumes a media ingest stream.
   Media processing function: media processing entity.

3. Media Ingest Protocol Behavior

   The specification uses multiple HTTP POST and/or PUT requests to
   transmit an optional manifest followed by encoded media data
   packaged in fragmented [ISOBMFF].  The subsequently posted segments
   correspond to those described in the manifest.  Each HTTP POST
   sends a complete manifest or media segment towards the processing
   entity.  The sequence of POST commands starts with the manifest and
   the init segment that includes the header boxes (ftyp and moov
   boxes).  It continues with the sequence of segments (combinations
   of moof and mdat boxes).

   An example of a POST URL targeting the publishing point is:

   http://HostName/presentationPath/manifestPath/rSegmentPath/Identifier

   The PostURL syntax is defined as follows, using IETF RFC 5234 ABNF
   [RFC5234] to specify the structure.
   PostURL          = Protocol "://" BroadcastURL
                      ["/" ManifestPath] ["/" Rsegmentpath]
                      "/" Identifier
   Protocol         = "http" / "https"
   BroadcastURL     = HostName "/" PresentationPath
   HostName         = URI_SAFE_IDENTIFIER
   PresentationPath = URI_SAFE_IDENTIFIER
   ManifestPath     = URI_SAFE_IDENTIFIER
   Rsegmentpath     = URI_SAFE_IDENTIFIER
   Identifier       = segment_file_name

   In this PostURL, the HostName is typically the hostname of the
   media processing entity or publishing point.  The PresentationPath
   is the path to the specific presentation at the publishing point.
   The ManifestPath can be used to signal the specific manifest of the
   presentation.  The Rsegmentpath is an optional extended path based
   on the relative paths in the manifest file.  The Identifier is the
   filename of the segment as described in the manifest.  The live
   source sender first sends the manifest to the path
   http://hostname/presentationpath/, allowing the receiving entity to
   set up reception paths for the following segments and manifests.
   In case no manifest is used, any POST_URL set up for media ingest,
   such as http://hostname/presentationpath/, can be used.  The
   fragmentedMP4stream can be defined using IETF RFC 5234 ABNF
   [RFC5234] as follows.

   fragmentedMP4stream = headerboxes fragments
   headerboxes         = ftyp moov
   fragments           = *fragment
   fragment            = moof mdat

   The communication between the live encoder/media ingest source and
   the receiving media processing entity follows these requirements:

   1. The live encoder or ingest source communicates with the
      publishing point/processing entity using the HTTP POST method as
      defined in the HTTP protocol [RFC2616], or, in the case of
      manifest updates, the HTTP PUT method.
   2. The live encoder or ingest source SHOULD start by sending an
      HTTP POST request with an empty "body" (zero content length)
      using the same POST_URL.  This can help the live encoder or
      media ingest source quickly detect whether the live ingest
      publishing point is valid, and whether any authentication or
      other conditions are required.
   3. The live encoder/media source SHOULD use secure transmission
      over HTTPS as specified in [RFC2818] for connecting to the
      receiving media processing entity or publishing point.
   4. In case HTTPS is used, HTTP basic authentication [RFC7617] or
      better methods such as TLS client certificates SHOULD be used to
      secure the connection.
   5. As a compatibility profile for the TLS encryption, we recommend
      the Mozilla intermediate compatibility profile, which is
      supported in many available implementations [MozillaTLS].
   6. Before sending the segments based on the fragmentedMP4stream,
      the live encoder/source MAY send a manifest with the following
      limitations/constraints:
      6a. Only relative URL paths are used for each segment.
      6b. Only unique paths are used for each new presentation.
      6c. In case the manifest contains these relative paths, these
          paths MAY be used in combination with the POST_URL +
          relative URL to POST each of the different segments from the
          live encoder or ingest source to the processing entity.
      6d. In case the manifest contains no relative paths, or no
          manifest is used, the segments SHOULD be posted to the
          original POST_URL specified by the service.
      6e. In this case the "tfdt" and track ids MAY be used by the
          processing entity to distinguish incoming segments instead
          of the target POST_URL.
   7. The live encoder MAY send an updated version of the manifest;
      this manifest cannot override current settings and relative
      paths or break currently running and incoming POST requests.
      The updated manifest can only differ slightly from the one sent
      previously, e.g. by introducing newly available segments or
      event messages.  The updated manifest SHOULD be sent using a PUT
      request instead of a POST request.

      Note: this manifest is useful mostly for passive media
      processing entities; for ingest towards active media processing
      entities the manifest can be avoided and the information
      signalled through the boxes available in the ISOBMFF.

   8. The encoder or ingest source MUST handle any error or failed
      authentication responses received from the media processing
      entity, such as 403 (Forbidden), 400 (Bad Request), 415
      (Unsupported Media Type) and 412 (Precondition Failed).
   9. In case of a 412 (Precondition Failed) or 415 (Unsupported Media
      Type) response, the live source/encoder MUST resend the init
      segment consisting of the "ftyp" and "moov" boxes.
   10. The live encoder or ingest source SHOULD start a new HTTP POST
      segment request sequence with the init segment including the
      header boxes "ftyp" and "moov".
   11. Subsequent media segment requests SHOULD correspond to the
      segments listed in the manifest, if a manifest was sent.
   12. The payload of each request MAY start with the header boxes
      "ftyp" and "moov", followed by segments that consist of a
      combination of "moof" and "mdat" boxes.

      Note that the "ftyp" and "moov" boxes (in this order) MAY be
      transmitted with each request, especially if the encoder must
      reconnect because the previous POST request was terminated prior
      to the end of the stream with a 412 or 415 response.  Resending
      the "moov" and "ftyp" boxes allows the receiving entity to
      recover the init segment and the track information needed for
      interpreting the content.

   13. The encoder or ingest source MAY use the chunked transfer
      encoding option of the HTTP POST command [RFC2616] for
      uploading, as it might be difficult to predict the entire
      content length of the segment.  This can be used, for example,
      to support use cases that require low latency.
   14. The encoder or ingest source SHOULD use individual HTTP POST
      commands [RFC2616] for uploading media segments when ready.
   15. If the HTTP POST request terminates or times out with a TCP
      error prior to the end of the stream, the encoder MUST issue a
      new POST request using a new connection, and follow the
      preceding requirements.  Additionally, the encoder MAY resend
      the previous two segments that were already sent.
   16. In case fixed-length POST commands are used, the live source
      entity MUST, on HTTP 400, 412 or 415 responses, resend the
      segment to be posted as described in the manifest in its
      entirety, together with the init segment consisting of the
      "moov" and "ftyp" boxes.
   17. In case the live stream event is over, the live media
      source/encoder SHOULD signal the stop by transmitting an empty
      "mfra" box towards the publishing point/processing entity.
   18. The TrackFragmentBaseMediaDecodeTimeBox "tfdt" MUST be present
      for each segment posted.
   19. The ISOBMFF media fragment duration SHOULD be constant, to
      reduce the size of the client manifests.  A constant MPEG-4
      fragment duration also improves client download heuristics
      through the use of repeat tags.  The duration MAY fluctuate to
      compensate for non-integer frame rates.  By choosing an
      appropriate timescale (a multiple of the frame rate is
      recommended) this issue can be avoided.
   20. The MPEG-4 fragment duration SHOULD be between approximately 2
      and 6 seconds.
   21.
      The fragment decode timestamps "tfdt" of fragments in the
      fragmentedMP4stream and their base_media_decode_time values
      SHOULD arrive in increasing order for each of the different
      tracks/streams that are ingested.
   22. The segments formatted as a fragmented MP4 stream SHOULD use a
      timescale based on the framerate for video streams and 44.1 kHz
      or 48 kHz for audio streams, or any other timescale that enables
      integer increments of the decode times of fragments signalled in
      the "tfdt" box based on this scale.
   23. The manifest MAY be used to signal the language of the stream,
      which SHOULD also be signalled in the "mdhd" or "elng" boxes in
      the init segment and/or moof headers ("mdhd").
   24. The manifest SHOULD be used to signal encryption-specific
      information, which SHOULD also be signalled in the "pssh",
      "schm" and "sinf" boxes in the init segment and media segments.
   25. The manifest SHOULD be used to signal information about the
      different tracks, such as durations, media encoding types and
      content types, which SHOULD also be signalled in the "moov" box
      in the init segment or the "moof" box in the media segments.
   26. The manifest SHOULD be used to signal information about the
      timed text, images and subtitles in adaptation sets, and this
      information SHOULD also be signalled in the "moov" box in the
      init segment; for more information see the next section.
   27. Segments posted towards the media processing entity MUST
      contain the bitrate box "btrt" specifying the target bitrate of
      the segments, the "tfdt" box specifying the fragment's decode
      time, and the "tfhd" box specifying the track id.
   28. The live encoder/media source SHOULD repeatedly resolve the
      hostname to adapt to changes in the IP-to-hostname mapping, for
      example by using the Domain Name System (DNS) [RFC1035] or any
      other system that is in place.
   29. The live encoder/media source MUST update the IP-to-hostname
      resolution respecting the TTL (time to live) from DNS query
      responses.  This enables better resilience to changes of the IP
      address in large-scale deployments, where the IP address of the
      publishing point or media processing nodes may change
      frequently.
   30. To support the ingest of live events with low latency, shorter
      segment and fragment durations MAY be used, such as segments
      with a duration of 1 second.
   31. The live encoder/media source SHOULD use a separate TCP
      connection for the ingest of each different bitrate track
      ingested.

4. Formatting Requirements for Timed Text, Captions and Subtitles

   The specification supports ingest of timed text, images, captions
   and subtitles.  This section follows the normative reference
   [MPEG-4-30].

   1. The tracks containing timed text, images, captions or subtitles
      MAY be signalled in the manifest by an adaptation set with the
      different segments containing the data of the track.
   2. The segment data MAY be posted to the URL corresponding to the
      path in the manifest for the segment; otherwise it MUST be
      posted towards the original POST_URL.
   3. The track will be a sparse track signalled by a null media
      header "nmhd" containing the timed text, images or captions,
      corresponding to the recommendation for storing tracks in
      fragmented MPEG-4 [CMAF].
   4. Based on this recommendation, the track handler "hdlr" SHALL be
      set to "text" for WebVTT and "subt" for TTML.
   5. In case TTML is used, the track MUST use the XMLSampleEntry to
      signal the sample description of the subtitle stream.
   6. In case WebVTT is used, the track MUST use the WVTTSampleEntry
      to signal the sample description of the text stream.
   7. These boxes SHOULD signal the MIME type and specifics as
      described in [CMAF] sections 11.3, 11.4 and 11.5.
   8.
      The boxes described in items 3-7 MUST be present in the init
      segment ("ftyp" + "moov") for the given track.
   9. Subtitles in CTA-608 and CTA-708 can be transmitted following
      the recommendation in section 11.5 of [CMAF], via SEI messages
      in the video track.
   10. The "ftyp" box in the init segment for the track containing
      timed text, images, captions and subtitles can use signalling
      with CMAF profiles based on [CMAF]:
      10a. WebVTT, specified in 11.2 of ISO/IEC 14496-30 [MPEG-4-30]:
           'cwvt'
      10b. TTML IMSC1 Text, specified in 11.3.3 of [MPEG-4-30], IMSC1
           Text Profile: 'im1t'
      10c. TTML IMSC1 Image, specified in 11.3.4 of [MPEG-4-30], IMSC1
           Image Profile: 'im1i'
      10d. CEA CTA-608 and CTA-708, specified in 11.4 of [MPEG-4-30],
           caption data embedded in SEI messages in the video track:
           'ccea'
   11. The segments of the tracks containing timed text, images,
      captions and subtitles SHOULD use the bitrate box "btrt" to
      signal the bitrate of the track in each segment.

5. Formatting Requirements for Timed Metadata

   This section discusses the specific formatting requirements for the
   ingest of timed metadata related to events and markers for ad
   insertion, or other timed metadata related to the media content,
   such as information about the content.  When delivering a live
   streaming presentation with a rich client experience, it is often
   necessary to transmit time-synced events, metadata or other signals
   in-band with the main media data.  An example of these are
   opportunities for dynamic live ad insertion signalled by SCTE-35
   markers.  This type of event signalling is different from regular
   audio/video streaming because of its sparse nature.  In other
   words, the signalling data usually does not happen continuously,
   and the interval can be hard to predict.  Examples of timed
   metadata are ID3 tags [ID3v2], SCTE-35 markers [SCTE-35] and DASH
   emsg messages defined in section 5.10.3.3 of [DASH].  For example,
   DASH event messages contain a schemeIdUri that defines the payload
   of the message.  Table 1 provides some example schemes in DASH
   event messages and Table 2 illustrates an example of a SCTE-35
   marker stored in a DASH emsg.  The presented approach allows the
   ingest of timed metadata from different sources, possibly at
   different locations, by embedding them in sparse metadata tracks.

   Table 1: Example DASH emsg scheme URIs

   Scheme URI               | Reference
   -------------------------|------------------
   urn:mpeg:dash:event:2012 | [DASH], 5.10.4
   urn:dvb:iptv:cpm:2014    | [DVB-DASH], 9.1.2.1
   urn:scte:scte35:2013:bin | [SCTE-35] 14-3 (2015), 7.3.2
   www.nielsen.com:id3:v1   | Nielsen ID3 in MPEG-DASH

   Table 2: Example of a SCTE-35 marker embedded in a DASH emsg

   Tag                     | Value
   ------------------------|-----------------------------------------
   scheme_uri_id           | "urn:scte:scte35:2013:bin"
   value                   | the value of the SCTE-35 PID
   timescale               | positive number
   presentation_time_delta | non-negative number expressing the
                           | splice time relative to the tfdt
   event_duration          | duration of the event;
                           | "0xFFFFFFFF" indicates unknown duration
   id                      | unique identifier for the message
   message_data            | splice info section including CRC

   The following steps are recommended for timed metadata ingest
   related to events, tags, ad markers and program information:

   1. Create a fragmentedMP4stream that contains only a sparse
      metadata track, i.e. a track without audio/video.
   2. Metadata tracks MAY be signalled in a manifest using an
      adaptation set with a sparse track; the actual data is in the
      sparse media track in the segments.
   3.
      For a metadata track, the media handler type is "meta" and the
      track's media header box is the null media header box "nmhd".
   4. The URIMetaSampleEntry contains, in a URIBox, the URI following
      the URI syntax in [RFC3986] defining the form of the metadata
      (see the ISO Base Media File Format specification [ISOBMFF]).
      For example, for ID3 tags [ID3v2] the URIBox could contain the
      URL http://www.id3.org
   5. In the case of ID3, a sample contains a single ID3 tag.  The ID3
      tag may contain one or more ID3 frames.
   6. In the case of DASH emsg, a sample may contain one or more event
      message ("emsg") boxes.  Version 0 event messages SHOULD be
      used.  The presentation_time_delta field is relative to the
      absolute timestamp specified in the
      TrackFragmentBaseMediaDecodeTimeBox ("tfdt").  The timescale
      field SHOULD match the value specified in the media header box
      "mdhd".
   7. In the case of DASH emsg, the kind box (contained in the "udta"
      box) MUST be used to signal the scheme URI of the type of
      metadata.
   8. A BitRateBox ("btrt") SHOULD be present at the end of the
      MetaDataSampleEntry to signal the bitrate information of the
      stream.
   9. If the specific format uses internal timing values, then the
      timescale MUST match the timescale field set in the media header
      box "mdhd".
   10. All timed metadata samples are sync samples [ISOBMFF], defining
      the entire set of metadata for the time interval they cover.
      Hence, the sync sample table box is not present.
   11. When timed metadata is stored in a TrackRunBox ("trun"), a
      single sample is present with the duration set to the duration
      of that run.

   Given the sparse nature of the signalling events, the following is
   recommended:

   12.
      At the beginning of the live event, the encoder or media ingest
      source sends the initial header boxes to the processing
      entity/publishing point, which allows the service to register
      the sparse track.
   13. When sending segments, the encoder SHOULD start sending from
      the header boxes, followed by the new fragments.
   14. The sparse track segment becomes available to the publishing
      point/processing entity when the corresponding parent track
      fragment that has an equal or larger timestamp value is made
      available.  For example, if the sparse fragment has a timestamp
      of t=1000, it is expected that after the publishing
      point/processing entity sees the "video" fragment (assuming the
      parent track name is "video") with timestamp 1000 or beyond, it
      can retrieve the sparse fragment t=1000.  Note that the actual
      signal could be used for a different position in the
      presentation timeline for its designated purpose.  In this
      example, it is possible that the sparse fragment of t=1000 has
      an XML payload for inserting an ad at a position a few seconds
      later.
   15. The payload of sparse track fragments can be in different
      formats (such as XML, text, or binary), depending on the
      scenario.

6. Guidelines for Handling of Media Processing Entity Failover

   Given the nature of live streaming, good failover support is
   critical for ensuring the availability of the service.  Typically,
   media services are designed to handle various types of failures,
   including network errors, server errors, and storage issues.  When
   used in conjunction with proper failover logic on the live encoder
   side, customers can achieve a highly reliable live streaming
   service from the cloud.  In this section, we discuss service
   failover scenarios, where the failure happens somewhere within the
   service and manifests itself as a network error.
   Here are some recommendations for the encoder implementation for
   handling service failover:

   1. Use a 10-second timeout for establishing the TCP connection.  If
      an attempt to establish the connection takes longer than 10
      seconds, abort the operation and try again.
   2. Use a short timeout for sending the HTTP requests.  If the
      target segment duration is N seconds, use a send timeout between
      N and 2N seconds; for example, if the segment duration is 6
      seconds, use a timeout of 6 to 12 seconds.  If a timeout occurs,
      reset the connection, open a new connection, and resume stream
      ingest on the new connection.  This is needed to avoid latency
      introduced by failing connectivity in the workflow.
   3. Completely resend segments from the ingest source for which a
      connection was terminated early.
   4. We recommend that the encoder or ingest source does NOT limit
      the number of retries to establish a connection or resume
      streaming after a TCP error occurs.
   5. After a TCP error:
      a. The current connection MUST be closed, and a new connection
         MUST be created for a new HTTP POST request.
      b. The new HTTP POST URL MUST be the same as the initial POST
         URL for the segment to be ingested.
      c. The new HTTP POST MUST include stream headers ("ftyp" and
         "moov" boxes) identical to the stream headers in the initial
         POST request for fragmented media ingest.
      d. The last two fragments sent for each segment MAY be
         retransmitted.  Other ISOBMFF fragment timestamps MUST
         increase continuously, even across HTTP POST requests.
   6. The encoder or ingest source SHOULD terminate the HTTP POST
      request if data is not being sent at a rate commensurate with
      the MP4 segment duration.
   An HTTP POST request that does not send data can
   prevent publishing points or media processing entities
   from quickly disconnecting from the live encoder or
   media ingest source in the event of a service update.
   For this reason, the HTTP POST for sparse (ad signal)
   tracks SHOULD be short-lived, terminating as soon as
   the sparse fragment is sent.

   In addition, this draft defines responses to the POST
   requests in order to signal the status to the live media
   source:
7. In case the media processing entity cannot process the
   manifest or segment POST request due to authentication or
   permission problems, it returns a permission denied
   HTTP 403 Forbidden.
8. In case the media processing entity can process the manifest
   or segment POSTed to the POST_URL, it returns HTTP 200 OK or
   202 Accepted.
9. In case the media processing entity can process
   the manifest or segment POST request but finds
   the media type cannot be supported, it returns HTTP 415
   Unsupported Media Type.
10. In case an unknown error happened during
    the processing of the HTTP POST request,
    an HTTP 400 Bad Request is returned.
11. In case the media processing entity cannot
    process a segment POSTed due to a missing init
    segment, an HTTP 412 Precondition Failed
    is returned.
12. In case a media source receives an HTTP 412 response,
    it SHOULD resend the manifest and the "ftyp" and "moov"
    boxes for the track.
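   As a non-normative sketch, the response handling in items 7-12
   could map onto ingest-source actions as follows. The action
   names are illustrative assumptions, not part of the protocol.

```python
# Illustrative sketch only; the action strings are hypothetical
# labels for encoder-side behavior, not defined by this draft.

def ingest_response_action(status):
    """Map a media processing entity's HTTP response code for a
    manifest/segment POST to a recommended ingest-source action."""
    actions = {
        200: "continue",                  # OK: request processed
        202: "continue",                  # Accepted
        403: "stop-permission-denied",    # authentication/permission problem
        415: "stop-unsupported-media",    # media type not supported
        412: "resend-init",               # resend manifest, "ftyp" and "moov"
        400: "retry",                     # unknown error during processing
    }
    # Treat unrecognized codes like an unknown error and retry,
    # following the failover recommendations above.
    return actions.get(status, "retry")
```

   For example, on a 412 response the source resends the manifest
   and the "ftyp" and "moov" boxes, as required by item 12.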
   An example of media ingest with failure and HTTP
   responses is shown in the following figure:

||===============================================================||
||=====================          ============================    ||
||| live media source |          | Media processing entity  |    ||
||=====================          ============================    ||
||      ||                           ||                          ||
||===============Initial Manifest Sending========================||
||      ||                                    ||                 ||
||      ||-- POST /prefix/media.mpd -------->>||                 ||
||      ||          Success                   ||                 ||
||      || <<------ 200 OK -------------------||                 ||
||      ||          Permission denied         ||                 ||
||      || <<------ 403 Forbidden ------------||                 ||
||      ||          Bad Request               ||                 ||
||      || <<------ 400 Bad Request ----------||                 ||
||      ||          Unsupported Media Type    ||                 ||
||      || <<------ 415 Unsupported Media ----||                 ||
||      ||                                    ||                 ||
||==================== Segment Sending ==========================||
||      ||-- POST /prefix/chunk.cmaf ------->>||                 ||
||      ||          Success/Accepted          ||                 ||
||      || <<------ 200 OK -------------------||                 ||
||      ||          Success/Accepted          ||                 ||
||      || <<------ 202 Accepted -------------||                 ||
||      ||          Permission Denied         ||                 ||
||      || <<------ 403 Forbidden ------------||                 ||
||      ||          Bad Request               ||                 ||
||      || <<------ 400 Bad Request ----------||                 ||
||      ||          Unsupported Media Type    ||                 ||
||      || <<------ 415 Unsupported Media ----||                 ||
||      ||          Missing Init Segment      ||                 ||
||      || <<--- 412 Precondition Failed -----||                 ||
||      ||                                    ||                 ||
||=====================          ============================    ||
||| live media source |          | Media processing entity  |    ||
||=====================          ============================    ||
||      ||                           ||                          ||
||===============================================================||

7. Guidelines for Handling of Live Media Source Failover

   Encoder or media ingest source failover is the second type
   of failover scenario that needs to be addressed for end-to-end
   live streaming delivery. In this scenario, the error condition
   occurs on the encoder side. The following expectations apply
   from the live ingestion endpoint when encoder failover happens:
1. A new encoder or media ingest source instance
   SHOULD be created to continue streaming.
2. The new encoder or media ingest source MUST use
   the same URL for HTTP POST requests as the failed instance.
3. The new encoder or media ingest source POST request
   MUST include the same header boxes "moov"
   and "ftyp" as the failed instance.
4. The new encoder or media ingest source
   MUST be properly synced with all other running encoders
   for the same live presentation to generate synced audio/video
   samples with aligned fragment boundaries.
   This implies that UTC timestamps
   for fragments in the "tfdt" match between encoders,
   and encoders start running at
   an appropriate segment boundary.
5. The new stream MUST be semantically equivalent
   to the previous stream, and interchangeable
   at the header and media fragment levels.
6. The new encoder or media ingest source SHOULD
   try to minimize data loss. The baseMediaDecodeTime in the
   "tfdt" box of media fragments SHOULD increase from the point
   where the encoder last stopped. The baseMediaDecodeTime in the
   "tfdt" box SHOULD increase in a continuous manner, but it
   is permissible to introduce a discontinuity, if necessary.
   Media processing entities or publishing points can ignore
   fragments that they have already received and processed, so
   it is better to err on the side of resending fragments
   than to introduce discontinuities in the media timeline.

8. Security Considerations

   No security considerations apply except the ones mentioned
   in the preceding text.
   Further security considerations will be added
   when they become known.

9. IANA Considerations

   This memo includes no request to IANA.

10. Contributors

   Arjen Wagenaar, Dirk Griffioen, Unified Streaming B.V.
   We thank all of the individual contributors to the discussions
   in [fmp4git], representing major content delivery networks,
   broadcasters, commercial encoders and cloud service providers.

11. References

11.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [DASH] MPEG ISO/IEC JTC1/SC29 WG11, "ISO/IEC 23009-1:2014:
          Dynamic adaptive streaming over HTTP (DASH) -- Part 1:
          Media presentation description and segment formats",
          2014.

   [SCTE-35] Society of Cable Telecommunications Engineers,
             "Digital Program Insertion Cueing Message for Cable",
             SCTE-35 (ANSI/SCTE 35 2013).

   [ISOBMFF] MPEG ISO/IEC JTC1/SC29 WG11, "Information technology
             -- Coding of audio-visual objects -- Part 12: ISO
             base media file format", ISO/IEC 14496-12:2012.

   [HEVC] MPEG ISO/IEC JTC1/SC29 WG11,
          "Information technology -- High efficiency coding
          and media delivery in heterogeneous environments
          -- Part 2: High efficiency video coding",
          ISO/IEC 23008-2:2015, 2015.

   [RFC793] Postel, J., "Transmission Control Protocol",
            RFC 793, September 1981.

   [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter,
             "Uniform Resource Identifier (URI): Generic Syntax",
             RFC 3986, January 2005.

   [RFC1035] Mockapetris, P., "Domain Names - Implementation and
             Specification", RFC 1035, November 1987.

   [CMAF] MPEG ISO/IEC JTC1/SC29 WG11, "Information technology
          (MPEG-A) -- Part 19: Common media application format
          (CMAF) for segmented media", ISO/IEC International
          Standard.

   [RFC5234] Crocker, D., Ed., and P. Overell, "Augmented BNF for
             Syntax Specifications: ABNF", RFC 5234, January 2008.

   [CENC] MPEG ISO/IEC JTC1/SC29 WG11, "Information technology --
          MPEG systems technologies -- Part 7: Common encryption
          in ISO base media file format files",
          ISO/IEC 23001-7:2016.

   [MPEG-4-30] MPEG ISO/IEC JTC1/SC29 WG11,
               "Information technology -- Coding of audio-visual
               objects -- Part 30: Timed text and other visual
               overlays in ISO base media file format",
               ISO/IEC 14496-30:2014.

   [ISO639-2] ISO 639-2, "Codes for the Representation of Names
              of Languages -- Part 2", ISO 639-2:1998.

   [DVB-DASH] ETSI Digital Video Broadcasting,
              "MPEG-DASH Profile for Transport of ISOBMFF
              Based DVB Services over IP Based Networks",
              ETSI TS 103 285.

   [RFC7617] Reschke, J., "The 'Basic' HTTP Authentication
             Scheme", RFC 7617, September 2015.

11.2. Informative References

   [RFC2616] Fielding, R., et al.,
             "Hypertext Transfer Protocol -- HTTP/1.1",
             RFC 2616, June 1999.

   [RFC2818] Rescorla, E., "HTTP Over TLS",
             RFC 2818, May 2000.

11.3. URL References

   [fmp4git] Unified Streaming github fmp4 ingest,
             "https://github.com/unifiedstreaming/fmp4-ingest".

   [MozillaTLS] Mozilla Wiki, Security/Server Side TLS,
                https://wiki.mozilla.org/Security/Server_Side_TLS
                #Intermediate_compatibility_.28default.29
                (last accessed 30th of March 2018)

   [ID3v2] M. Nilsson, "ID3 Tag version 2.4.0 Main structure",
           http://id3.org/id3v2.4.0-structure,
           November 2000 (last accessed 2nd of May 2018)

Author's Address

   Rufael Mekuria (editor)
   Unified Streaming
   Overtoom 60 1054HK

   Phone: +31 (0)202338801
   E-Mail: rufael@unified-streaming.com