Individual                                             P. Speelmans, Ed.
Internet-Draft                                         THEO Technologies
Intended status: Informational                               13 May 2022
Expires: 14 November 2022


               HESP - High Efficiency Streaming Protocol
                           draft-theo-hesp-02

Abstract

   This document describes a protocol for delivering multimedia data,
   enabling ultra-low latency and fast channel change over HTTP
   networks. It specifies the data format of the files and the actions
   to be taken by the server (sender) and the clients (receivers) of the
   streams. It describes version 1 of this protocol.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 14 November 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

   This document may not be modified, and derivative works of it may not
   be created, except to format it for publication as an RFC or to
   translate it into languages other than English.

Table of Contents

   1.  Introduction
   2.  Overview
     2.1.  HESP components
     2.2.  Two complementary streams
     2.3.  HESP object model
     2.4.  Reference flow
   3.  HESP Manifest
     3.1.  Timestamps
       3.1.1.  Manifest Timestamp and Media Timestamp
       3.1.2.  Sequence Numbers
       3.1.3.  Calculating the Sequence Number of an Initialization
               Packet
     3.2.  Manifest data types
       3.2.1.  Additional JSON data types
       3.2.2.  ManifestType
       3.2.3.  TimeSource
       3.2.4.  ScaledValue
       3.2.5.  PresentationType
       3.2.6.  TimeBounds
       3.2.7.  AudioSwitchingSetType
       3.2.8.  SwitchingSetProtection
       3.2.9.  SwitchingSetProtectionSystem
       3.2.10. VideoSwitchingSetType
       3.2.11. MetadataSwitchingSetType
       3.2.12. AudioTrackType
       3.2.13. VideoTrackType
       3.2.14. Resolution
       3.2.15. MetadataTrackType
       3.2.16. SegmentType
       3.2.17. PresentationEventType
       3.2.18. PresentationEventTimeBounds
     3.3.  Manifest requests
       3.3.1.  Manifest responses
     3.4.  Addressing of content requests
       3.4.1.  Content request URL resolution
       3.4.2.  Requesting using an identifier and the content request
               URL
     3.5.  Manifest example
   4.  Initialization Stream
     4.1.  Initialization Stream purpose
     4.2.  Initialization Packet format
       4.2.1.  Video Initialization Packet
       4.2.2.  Audio Initialization Packet
       4.2.3.  CMAF header
       4.2.4.  Event message information
     4.3.  Initialization Stream addressing
       4.3.1.  Initialization Stream requests
       4.3.2.  Initialization Stream responses
   5.  Continuation Stream
     5.1.  Continuation Stream format
       5.1.1.  Media content
     5.2.  Continuation Segment availability
     5.3.  Continuation Stream addressing
       5.3.1.  Continuation Stream URLs
       5.3.2.  Continuation Stream requests
       5.3.3.  Continuation Stream responses
   6.  Timed metadata
     6.1.  Metadata Tracks
     6.2.  Metadata events
       6.2.1.  In-band events
       6.2.2.  Out-of-band events
   7.  Content protection
     7.1.  Common encryption support
     7.2.  HESP Manifest
     7.3.  CMAF box structure
       7.3.1.  Initialization Stream
       7.3.2.  Continuation Stream
   8.  Contributors
   9.  IANA Considerations
   10. Security Considerations
   11. Normative References
   12. Informative References
   Appendix A.  Example usage
     A.1.  Manifest
       A.1.1.  Retrieving the Manifest
       A.1.2.  Timing information
       A.1.3.  Content addressing
     A.2.  Initialization Stream
       A.2.1.  Retrieving Initialization Packets
       A.2.2.  Parsing offset information
     A.3.  Continuation Stream
       A.3.1.  Retrieving Continuation Segments
   Appendix B.  CDNs
   Appendix C.  HESP Profiles (using H.264 as video codec)
     C.1.  Maximal Gain Profile
     C.2.  Compatibility Profile
       C.2.1.  Example
   Author's Address

1.  Introduction

   Viewers are more demanding than ever, calling for a streaming
   protocol that combines ultra-low latency, fast zapping and
   cost-effective scalability.

   HESP is an HTTP-based streaming approach that works with standard
   contribution feeds from (live) productions, standard encoders (albeit
   specifically configured), an HESP-compliant packager, regular CDNs
   and an HESP-compliant player.

   HESP offers sub-second latency, near real-time interactivity, fast
   startup and channel change times and cost-effective scalability up to
   millions of viewers.

   The purpose of this document is to facilitate interoperability
   between HESP implementations by describing the media transmission
   protocol.

   This document describes version 1 of the protocol.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Overview

   This section contains an overview of the HESP protocol and its
   building blocks.

2.1.  HESP components

   HESP follows a regular approach to online video streaming. The
   content is ingested and transcoded into different qualities. Each
   quality requires two streams. The encoded streams are packaged by an
   HESP packager and made available via an origin server. An HESP
   player requests the stream using HTTP requests. A CDN can be used.

    _________   ________   __________   ________   _____   ________
   |         | |        | |          | |        | |     | |        |
   |  Input  |\| xcoder |\| packager |\| origin |\| CDN |\|  HESP  |
   | Streams |/|        |/|          |/|        |/|     |/| player |
   |_________| |________| |__________| |________| |_____| |________|

              Figure 1: HESP chain from source to playback

2.2.  Two complementary streams

   HESP is based on using two streams for each track, the Initialization
   Stream and the Continuation Stream. The encoder MUST ensure that the
   corresponding media data of both streams are issued with synchronized
   presentation timestamps (PTS). The packager MUST ensure this data
   remains in sync.

   The Initialization Stream consists of Initialization Packets.
   Initialization Packets MUST be individually addressable. An
   Initialization Packet MUST contain an independent media sample, the
   reference to the segment and the position in the segment where the
   Continuation Stream can start. Since the contained samples are
   independent, playback can start with any Initialization Packet.

   The Continuation Stream can start playback immediately after an
   Initialization Packet, allowing for very fast channel start and
   switch times. This mechanism places referencing restrictions on the
   Continuation Stream. When a reference is made from the
   Initialization Stream to a Continuation Stream, all data needed to
   render the sample with the subsequent presentation timestamp MUST be
   available at the reference offset. In addition, the samples in the
   Initialization Stream and the samples in the Continuation Stream MUST
   be aligned. That is, the corresponding media samples of both streams
   MUST have the same PTS and MUST be made available at the same time.
   The Continuation Stream is addressed using byte-range requests. The
   Continuation Stream SHOULD be published in chunks in order to reduce
   end-to-end latency.

   The player receives the Initialization Packet, initializes the
   decoder, puts the media data in the decoder buffer, and requests the
   subsequent media data from the Continuation Stream using HTTP Range
   requests that start at the range offset given by the Initialization
   Packet.
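   As a minimal sketch of the last step, assuming the player has
   already extracted the byte offset from the Initialization Packet, the
   open-ended Range header for the Continuation Stream request could be
   built as shown below. The function name and the example offset are
   hypothetical and not defined by this document; only the
   "bytes=<first>-" syntax is taken from the HTTP range request
   specification.

```python
def continuation_range_header(offset: int) -> dict:
    """Build an open-ended HTTP Range header starting at `offset`.

    `offset` is assumed to be the byte position inside the Continuation
    Segment that the Initialization Packet points to (hypothetical input).
    """
    if offset < 0:
        raise ValueError("byte offset must be non-negative")
    # "bytes=N-" asks for everything from byte N up to the end of the
    # resource, so the remaining chunks of the segment can be streamed.
    return {"Range": f"bytes={offset}-"}


# Example: the Initialization Packet referenced byte 18432 of the segment.
print(continuation_range_header(18432))  # {'Range': 'bytes=18432-'}
```

   Leaving the last byte position out keeps the request valid while the
   segment is still being published in chunks.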
   time sequence           T1   T2   T3   T4   T5   T6   T7   T8
                         +----+----+----+----+----+----+----+----+
   initialization stream | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8 |
                         +----+----+----+----+----+----+----+----+
                         +----+----+----+----+----+----+----+----+
   continuation stream   | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
                         +----+----+----+----+----+----+----+----+
                                   +----+----+----+----+----+----+
   playback buffer                 | I3 | C4 | C5 | C6 | C7 | C8 |
                                   +----+----+----+----+----+----+

                   Figure 2: Start of an HESP stream

2.3.  HESP object model

   The object model follows the CMAF media object model [CMAF].

   A Track is used to contain media samples (audio or video) or
   metadata. It consists of a Continuation Stream and (except for
   metadata Tracks) an Initialization Stream.

   The Initialization Stream consists of Initialization Packets, where
   each such packet MUST contain a CMAF Header and possibly a CMAF
   Fragment. Each Initialization Packet MUST be individually
   addressable through Sequence Numbers. The Initialization Stream is
   further explained in Section 4.

   The Continuation Stream consists of Continuation Segments. These
   Segments MUST consist of CMAF Chunks that can be sent individually to
   clients. The Continuation Stream is further explained in Section 5.

   A Switching Set groups together Tracks that contain the same content
   but with different encoding parameters (e.g., different resolution or
   different bitrate). As a result, a client can switch seamlessly
   between the Tracks of a Switching Set. If multiple Switching Sets
   contain differing content but are aligned in their timings (e.g.,
   multiple view perspectives on the same performance or different
   languages of the same audio), they are considered Aligned Switching
   Sets.

   A Selection Set groups together multiple Switching Sets of the same
   media type. HESP currently allows three Selection Sets: audio, video
   and metadata Selection Sets.
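   The grouping described so far can be sketched as a small data
   structure. This is an illustration only: the class names, field
   names and example identifiers below are hypothetical and are not the
   Manifest attribute names defined in Section 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MEDIA_TYPES = ("audio", "video", "metadata")


@dataclass
class Track:
    track_id: str
    media_type: str  # one of MEDIA_TYPES

    @property
    def has_initialization_stream(self) -> bool:
        # Metadata Tracks only consist of a Continuation Stream.
        return self.media_type != "metadata"


@dataclass
class SwitchingSet:
    set_id: str
    media_type: str
    tracks: List[Track] = field(default_factory=list)
    align_id: Optional[str] = None  # shared by Aligned Switching Sets


def build_selection_sets(
    switching_sets: List[SwitchingSet],
) -> Dict[str, List[SwitchingSet]]:
    """Group Switching Sets by media type: one Selection Set per type."""
    selection_sets: Dict[str, List[SwitchingSet]] = {m: [] for m in MEDIA_TYPES}
    for switching_set in switching_sets:
        if switching_set.media_type not in selection_sets:
            raise ValueError(f"unknown media type: {switching_set.media_type}")
        selection_sets[switching_set.media_type].append(switching_set)
    return selection_sets


video = SwitchingSet("main-video", "video",
                     [Track("720p", "video"), Track("1080p", "video")])
subtitles = SwitchingSet("subtitles", "metadata", [Track("en", "metadata")])
sets = build_selection_sets([video, subtitles])
print([s.set_id for s in sets["video"]])  # ['main-video']
```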
   A Presentation contains the Selection Sets for a given period of
   time. Multiple Presentations together form a continuous timeline of
   media content, even though each Presentation might have different
   content, encodings or timestamps. A Presentation can be, for
   example, an advertisement, a part of a show, or the first half of a
   game. The Presentation is the lowest granularity inside a Manifest.
   All Tracks of a Presentation MUST have media data for the full
   duration of the Presentation.

   The Manifest informs clients of the aforementioned data structure and
   must be retrieved before any media data. The format of the Manifest
   is detailed in Section 3.

   A new Manifest is not needed to request a new Segment. Segment
   addressing can happen automatically within a Presentation for an
   efficient and continuous delivery of the Continuation Stream. A
   Manifest only needs to be updated before the start of a new
   Presentation. This gives the opportunity for low-frequency Manifest
   updates, though this update rate can be freely configured.

   Since Presentations can be of unknown duration, a different mechanism
   is used to signal a new Presentation to a client. To that end,
   in-band metadata events are introduced in the Continuation Stream.
   Such an event can signal the client to retrieve a new Manifest when
   information about an upcoming Presentation becomes available.

2.4.  Reference flow

            +---------+                                   +---------+
            | Player  |                                   | Origin  |
            +---------+                                   +---------+
                 |                                             |
    _____________|_____________________________________________|_____
   ! LOOP   /  all presentations                               |     !
   !_______/     |                                             |     !
   !             |                                             |     !
   !             | request Manifest file                       |     !
   !             |-------------------------------------------->|     !
   !             |                                             |     !
   !             | Return Manifest                             |     !
   !             |<--------------------------------------------|     !
   !             |                                             |     !
   !             | parse Manifest                              |     !
   !             |---------------                              |     !
   !             |              |                              |     !
   !             |<--------------                              |     !
   !             |                                             |     !
   !             |                                             |     !
   !             | Request Initialization Packet               |     !
   !             |-------------------------------------------->|     !
   !             |                                             |     !
   !             | Return Initialization Packet                |     !
   !             |<--------------------------------------------|     !
   !             |                                             |     !
   !             | Initialize decode pipeline                  |     !
   !             |---------------------------                  |     !
   !             |                          |                  |     !
   !             |<--------------------------                  |     !
   !             |                                             |     !
   !             | Parse position information                  |     !
   !             | of the corresponding data                   |     !
   !             | in the Continuation Stream                  |     !
   !             |--------------------------                   |     !
   !             |                         |                   |     !
   !             |<-------------------------                   |     !
   !             |                                             |     !
   !             |                                             |     !
   !             | Request Continuation Stream:                |     !
   !             | (Segment n) byte-range [now, -)             |     !
   !             |-------------------------------------------->|     !
   !             |                                             |     !
   !             | Return CMAF Fragments until the             |     !
   !             | end of the Segment                          |     !
   !             |<--------------------------------------------|     !
   !   __________|_____________________________________________|__   !
   !  ! LOOP   /  all Segments of the Presentation             |  !  !
   !  !_______/  |                                             |  !  !
   !  !          |                                             |  !  !
   !  !          | Request Continuation Stream:                |  !  !
   !  !          | (Segment n+i) without byte-range            |  !  !
   !  !          |-------------------------------------------->|  !  !
   !  !          |                                             |  !  !
   !  !          | Return CMAF Fragments until the             |  !  !
   !  !          | end of the Segment                          |  !  !
   !  !          |<--------------------------------------------|  !  !
   !  !          |                                             |  !  !
   !  !__________|_____________________________________________|__!  !
   !             |                                             |     !
   !_____________|_____________________________________________|_____!
                 |                                             |
                 |                                             |

                     Figure 3: HESP reference flow

   Although the Manifest file is shared, video, audio and metadata
   (subtitles) media data MUST be distributed separately. They each
   require separate requests from the client.

3.  HESP Manifest

   The first step for playback of an HESP stream is to fetch a Manifest.
   It contains information on the available Tracks and how to request
   content from every Track's Initialization and Continuation Streams.

3.1.  Timestamps

3.1.1.  Manifest Timestamp and Media Timestamp

   A distinction is made between Manifest Timestamps and Media
   Timestamps in the definitions below. Media Timestamps MUST represent
   presentation timestamps as they are given by the media data itself.
   Manifest Timestamps MUST represent those same Media Timestamps, but
   with an additional offset such that the timestamps of all Tracks of a
   Presentation are aligned. This offset MUST be given by the Manifest
   for each Track or Switching Set.

   For example, consider the first Segment of a Track with Media
   Timestamps starting at 0 seconds. This Track is part of the second
   Presentation of an HESP stream. The first Presentation has run for a
   significant amount of time and the second Presentation follows
   immediately after the first Presentation ends. In this case, the
   Manifest Timestamps of the second Presentation must succeed the
   Manifest Timestamps of the first Presentation. As a result, the
   Media Timestamps and Manifest Timestamps of the second Presentation
   now differ. An offset must be given by the Manifest in order to
   inform clients about this difference between timestamp types for each
   Track of the second Presentation.

   An example of this distinction is given in Appendix A.1.2.

3.1.2.  Sequence Numbers

   Each Initialization Packet belonging to a Track is given a Sequence
   Number. This is a simple identifier that the client uses to retrieve
   specific Initialization Packets. The Sequence Number is a positive
   integer that MUST be increased by 1 for each subsequent
   Initialization Packet of the same Track.

3.1.3.  Calculating the Sequence Number of an Initialization Packet

   To calculate the Sequence Number of a chosen Initialization Packet,
   the Manifest provides the client with several values. Each active
   Presentation MUST contain both its current Manifest Timestamp and,
   for each Track, the Sequence Number of the latest Initialization
   Packet and the (constant) frame rate.

   It is given that:

   *  the frame rate MUST be constant for each Track.

   *  Initialization Packets MUST become available in real-time.

   *  Sequence Numbers MUST be increased by 1 per Initialization Packet.

   *  the Sequence Number and Manifest Timestamp of the latest
      Initialization Packet for a Track are given by the Manifest.

   As a result, it is possible to derive Sequence Numbers for arbitrary
   Initialization Packets. By taking the latest Sequence Number of a
   Track, its associated Manifest Timestamp, and the frame rate, it is
   possible to calculate the number of frames between the given
   Initialization Packet and the required Initialization Packet in the
   past or future. This amount can then be subtracted from that latest
   Sequence Number in order to derive the Sequence Number of the desired
   Initialization Packet.

   For example, suppose the latest Sequence Number is 103 at Manifest
   Timestamp 00:00:04.120 for a Track with a frame rate of 25 fps. To
   find the Sequence Number at Manifest Timestamp 00:00:01.360, the time
   difference between the two frames is calculated first. This is
   2.760 seconds, or 2.760 * 25 = 69 frames. This means that the
   Sequence Number for the provided Manifest Timestamp is 103 - 69 = 34.

3.2.  Manifest data types

   The Manifest MUST be formatted as a JSON file. All data types used
   in the information below MUST satisfy the JavaScript Object Notation
   specification [RFC8259].
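   Because the Manifest is plain JSON, a client can feed its values
   directly into the Sequence Number derivation of Section 3.1.3. The
   following is a minimal sketch of that calculation, not a normative
   algorithm. The function and parameter names are hypothetical, times
   are passed as exact fractions of a second to avoid floating-point
   rounding, and rejecting timestamps that do not fall on a frame
   boundary is an assumption made here for simplicity.

```python
from fractions import Fraction


def sequence_number_at(latest_seq: int, latest_time: Fraction,
                       target_time: Fraction, frame_rate: Fraction) -> int:
    """Derive the Sequence Number at `target_time` (Manifest Time, seconds).

    `latest_seq` and `latest_time` describe the latest Initialization
    Packet announced by the Manifest; `frame_rate` is the constant frame
    rate of the Track.
    """
    # Number of frames between the latest packet and the requested one;
    # negative when the requested packet lies in the future.
    frames = (latest_time - target_time) * frame_rate
    if frames.denominator != 1:
        # Assumption of this sketch: only exact frame boundaries are accepted.
        raise ValueError("target time does not fall on a frame boundary")
    return latest_seq - int(frames)


# Worked example from Section 3.1.3: 103 at 4.120 s, 25 fps, target 1.360 s.
print(sequence_number_at(103, Fraction(4120, 1000),
                         Fraction(1360, 1000), Fraction(25)))  # 34
```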
With regards to the definition of numbers 444 within the JSON specification, their range and precision are 445 implementation-dependent. It should be noted that for 446 interoperability, numbers SHOULD be representable using IEEE 754 447 double-precision numbers. 449 Some additional data types are introduced on top of the JSON 450 specification to narrow the allowed values of some fields further. 452 3.2.1. Additional JSON data types 454 3.2.1.1. Integer 456 An integer (or int) is a subset of the number type defined in the 457 JSON specification, limited to integer numbers. Concretely, 458 following Section 6 of the JSON specification, a number MUST NOT have 459 a fraction part to be considered an integer. Note that as mentioned 460 above, integers SHOULD be representable using IEEE 754 double- 461 precision numbers. As a result, integers SHOULD be within the 462 [(-2**53)+1, (2**53)-1] range to make sure implementations agree on 463 the exact numeric value. 465 3.2.1.2. Unsigned integer 467 An unsigned integer (or uint) is a subset of the integer type, only 468 allowing for nonnegative integer numbers. Concretely, following 469 Section 6 of the JSON specification, an integer MUST NOT contain a 470 minus to be considered an unsigned integer. 472 3.2.1.3. Enumeration 474 An enumeration defines a list of constant strings, where a valid 475 value of this type MUST equal one of these strings. The type is 476 written as Enum(x, y, z), where x, y and z are the constant strings 477 allowed to be used. 479 3.2.1.4. 
DateTime 481 A DateTime MUST be formatted as a string in the format defined by ISO 482 8601 [ISO8601]: 484 YYYY-MM-DDThh:mm:ss.mmmTZD 486 where YYYY = four-digit year 487 MM = two-digit month 488 DD = two-digit day of the month 489 hh = two digits of the hour (00 through 23) 490 mm = two digits of a minute (00 through 59) 491 ss = two digits of a second (00 through 59) 492 mmm = three digits of a millisecond (000 through 999) 493 TZD = time zone designator (Z or +hh:mm or -hh:mm) 495 3.2.2. ManifestType 497 This data type represents the root of the Manifest. The structure 498 consists of general playback information and a collection of 499 Presentations. 501 The table below gives the possible fields. The "Required?" column 502 indicates if a field is REQUIRED (Y) or OPTIONAL (N). 504 +======================+===============+====+=======================+ 505 | Attribute | Type |Req?|Description | 506 +======================+===============+====+=======================+ 507 | availabilityDuration | ScaledValue |Y |The amount of time (in | 508 | | | |seconds) that live | 509 | | | |content MUST be kept | 510 | | | |available to be | 511 | | | |retrieved by a client. | 512 | | | |It is used to define an| 513 | | | |interval [live - n, | 514 | | | |live], where live is | 515 | | | |the current live point | 516 | | | |and n is | 517 | | | |availabilityDuration, | 518 | | | |where content MUST be | 519 | | | |available for clients | 520 | | | |of a live stream. For | 521 | | | |VOD streams, this value| 522 | | | |MUST be ignored. | 523 +----------------------+---------------+----+-----------------------+ 524 | creationDate | DateTime |Y |The timestamp of the | 525 | | | |moment that this | 526 | | | |specific Manifest was | 527 | | | |created by the packager| 528 | | | |(in local packager | 529 | | | |time.) 
| 530 +----------------------+---------------+----+-----------------------+ 531 | fallbackPollRate | uint |Y |The number of seconds a| 532 | | | |player SHOULD wait to | 533 | | | |poll a new Manifest, if| 534 | | | |it hasn't requested any| 535 | | | |since retrieving the | 536 | | | |current Manifest. | 537 +----------------------+---------------+----+-----------------------+ 538 | manifestVersion | Enum("1.1.0") |Y |The version number of | 539 | | | |the Manifest. | 540 +----------------------+---------------+----+-----------------------+ 541 | presentations | Presentation |Y |A list of all | 542 | | Type[] | |Presentations that are | 543 | | | |currently known. | 544 +----------------------+---------------+----+-----------------------+ 545 | streamType | Enum("live", |Y |Indicates whether the | 546 | | "vod") | |stream is a live stream| 547 | | | |(live) which does not | 548 | | | |have a known ending, or| 549 | | | |a video on demand | 550 | | | |stream (vod) with a | 551 | | | |known ending and | 552 | | | |duration. | 553 +----------------------+---------------+----+-----------------------+ 554 | activePresentation | string |N |The identifier of the | 555 | | | |currently active | 556 | | | |Presentation. This | 557 | | | |value MUST be available| 558 | | | |if streamType equals | 559 | | | |live, but MUST be | 560 | | | |ignored if streamType | 561 | | | |equals vod. | 562 +----------------------+---------------+----+-----------------------+ 563 | contentBaseUrl | string |N |The base URL for all | 564 | | | |content requests | 565 | | | |relating to this | 566 | | | |Manifest. See | 567 | | | |Section 3.4 for more | 568 | | | |information. | 569 +----------------------+---------------+----+-----------------------+ 570 | timeSource | TimeSource |N |A reference to a time | 571 | | | |server with which the | 572 | | | |packager is synced. | 573 | | | |This value MUST be | 574 | | | |ignored if streamType | 575 | | | |equals vod. 
| 576 +----------------------+---------------+----+-----------------------+ 578 Table 1 580 3.2.3. TimeSource 582 A TimeSource is used to sync the packager and player internal clocks 583 to make requests for data only once it is available. 585 +===========+========+==========+============================+ 586 | Attribute | Type | Required | Description | 587 +===========+========+==========+============================+ 588 | scheme | string | Y | A universally unique | 589 | | | | identifier (UUID) [UUID] | 590 | | | | that defines the format of | 591 | | | | the time server response. | 592 +-----------+--------+----------+----------------------------+ 593 | url | string | Y | The URL to be queried for | 594 | | | | a time server response. | 595 +-----------+--------+----------+----------------------------+ 597 Table 2 599 3.2.4. ScaledValue 601 In order to avoid rounding issues introduced through floating-point 602 numbers, this structure defines two integers. The total value is 603 calculated by dividing value over scale. 605 +===========+======+==========+============================+ 606 | Attribute | Type | Required | Description | 607 +===========+======+==========+============================+ 608 | value | int | Y | The defined integer value. | 609 +-----------+------+----------+----------------------------+ 610 | scale | uint | N | If not defined, the scale | 611 | | | | SHALL equal 1. | 612 +-----------+------+----------+----------------------------+ 614 Table 3 616 3.2.5. PresentationType 618 This is the type definition of a Presentation. 620 +=============+============================+======+=================+ 621 | Attribute | Type | Req? | Description | 622 +=============+============================+======+=================+ 623 | id | string | Y | The unique | 624 | | | | identifier for | 625 | | | | this | 626 | | | | Presentation. 
| 627 | | | | It MUST be | 628 | | | | unique over | 629 | | | | all | 630 | | | | Presentations | 631 | | | | of this | 632 | | | | Manifest. It | 633 | | | | MUST NOT | 634 | | | | change over | 635 | | | | Manifest | 636 | | | | updates. | 637 +-------------+----------------------------+------+-----------------+ 638 | timeBounds | TimeBounds | Y | The time | 639 | | | | boundaries of | 640 | | | | this | 641 | | | | Presentation, | 642 | | | | in Manifest | 643 | | | | Time. The | 644 | | | | start time | 645 | | | | MUST be | 646 | | | | announced at | 647 | | | | least 2 | 648 | | | | seconds before | 649 | | | | the | 650 | | | | Presentation | 651 | | | | is active. | 652 | | | | The end time | 653 | | | | of an active | 654 | | | | Presentation | 655 | | | | MUST be | 656 | | | | available at | 657 | | | | least 2 | 658 | | | | seconds before | 659 | | | | the actual end | 660 | | | | of that | 661 | | | | Presentation. | 662 +-------------+----------------------------+------+-----------------+ 663 | audio | AudioSwitchingSetType[] | N | The audio | 664 | | | | Selection Set. | 665 | | | | It contains | 666 | | | | all audio | 667 | | | | Switching Sets | 668 | | | | of this | 669 | | | | Presentation. | 670 | | | | If not | 671 | | | | defined, this | 672 | | | | value SHALL | 673 | | | | equal an empty | 674 | | | | list. | 675 +-------------+----------------------------+------+-----------------+ 676 | baseUrl | string | N | The base URL | 677 | | | | of this | 678 | | | | Presentation. | 679 | | | | It is part of | 680 | | | | the content | 681 | | | | base URLs for | 682 | | | | all Switching | 683 | | | | Sets belonging | 684 | | | | to this | 685 | | | | Presentation. | 686 | | | | See | 687 | | | | Section 3.4 | 688 | | | | for more | 689 | | | | information. 
| 690 +-------------+----------------------------+------+-----------------+ 691 | currentTime | ScaledValue | N | The most | 692 | | | | recent | 693 | | | | composition | 694 | | | | timestamp of | 695 | | | | any Track | 696 | | | | contained by | 697 | | | | this | 698 | | | | Presentation | 699 | | | | when this | 700 | | | | Manifest is | 701 | | | | created. It | 702 | | | | SHOULD be | 703 | | | | specified in | 704 | | | | Manifest Time, | 705 | | | | not in Media | 706 | | | | Time (as that | 707 | | | | could have | 708 | | | | different | 709 | | | | offsets from | 710 | | | | Track to | 711 | | | | Track.) If | 712 | | | | the | 713 | | | | Presentation | 714 | | | | is active, the | 715 | | | | current time | 716 | | | | MUST be | 717 | | | | defined. If | 718 | | | | the | 719 | | | | Presentation | 720 | | | | is not active, | 721 | | | | this value | 722 | | | | MUST be | 723 | | | | ignored. | 724 +-------------+----------------------------+------+-----------------+ 725 | events | PresentationEventType[] | N | List of all | 726 | | | | currently | 727 | | | | available | 728 | | | | Presentation | 729 | | | | Events related | 730 | | | | to this | 731 | | | | Presentation. | 732 | | | | If not | 733 | | | | defined, this | 734 | | | | value SHALL | 735 | | | | equal an empty | 736 | | | | list. | 737 +-------------+----------------------------+------+-----------------+ 738 | metadata | MetadataSwitchingSetType[] | N | The metadata | 739 | | | | Selection Set. | 740 | | | | It contains | 741 | | | | all metadata | 742 | | | | Switching Sets | 743 | | | | of this | 744 | | | | Presentation. | 745 | | | | If not | 746 | | | | defined, this | 747 | | | | value SHALL | 748 | | | | equal an empty | 749 | | | | list. | 750 +-------------+----------------------------+------+-----------------+ 751 | video | VideoSwitchingSetType[] | N | The video | 752 | | | | Selection Set. 
|             |                            |      | It contains all video Switching Sets of this       |
|             |                            |      | Presentation. If not defined, this value SHALL     |
|             |                            |      | equal an empty list.                               |
+-------------+----------------------------+------+----------------------------------------------------+

                                 Table 4

3.2.6.  TimeBounds

   A TimeBounds structure denotes a time interval with a start and end
   time. A start or end time may be undefined. The desired behavior in
   such a situation depends on the specific usage of the structure.

   These boundaries are inclusive at the start time and exclusive at the
   end time. I.e., if two TimeBounds need to be continuous, then the
   end time of the first one must equal the start time of the second.

+===========+======+==========+============================+
| Attribute | Type | Required | Description                |
+===========+======+==========+============================+
| startTime | uint | N        | This value denotes the     |
|           |      |          | start time in seconds when |
|           |      |          | divided by the timescale.  |
+-----------+------+----------+----------------------------+
| endTime   | uint | N        | This value denotes the end |
|           |      |          | time in seconds when       |
|           |      |          | divided by the timescale.  |
+-----------+------+----------+----------------------------+
| scale     | uint | N        | If the timescale is not    |
|           |      |          | defined, it SHALL equal 1. |
+-----------+------+----------+----------------------------+

                                 Table 5

3.2.7.  AudioSwitchingSetType

   This is the type definition of an audio Switching Set.

+=======================+==================+======+====================================================+
| Attribute             | Type             | Req? | Description                                        |
+=======================+==================+======+====================================================+
| id                    | string           | Y    | The unique identifier for this Switching Set.      |
|                       |                  |      | It MUST be unique within its Presentation.         |
+-----------------------+------------------+------+----------------------------------------------------+
| language              | string           | Y    | The language of all audio Tracks of this Switching |
|                       |                  |      | Set. It MUST be specified here by its ISO 639-2    |
|                       |                  |      | [ISO6392] code.                                    |
+-----------------------+------------------+------+----------------------------------------------------+
| tracks                | AudioTrackType[] | Y    | The collection of all Tracks belonging to this     |
|                       |                  |      | Switching Set.                                     |
+-----------------------+------------------+------+----------------------------------------------------+
| alignId               | string           | N    | A unique identifier that SHOULD be set by all      |
|                       |                  |      | Switching Sets that are Aligned Switching Sets     |
|                       |                  |      | with each other.                                   |
+-----------------------+------------------+------+----------------------------------------------------+
| baseUrl               | string           | N    | The base URL of this Switching Set. It is part of  |
|                       |                  |      | the content base URLs for all Tracks belonging to  |
|                       |                  |      | this Switching Set. See Section 3.4 for more       |
|                       |                  |      | information.                                       |
+-----------------------+------------------+------+----------------------------------------------------+
| channels              | uint             | N    | The audio channel configuration of all audio       |
|                       |                  |      | Tracks belonging to this Switching Set. It is      |
|                       |                  |      | defined as the total number of audio channels.     |
|                       |                  |      | This value is used for ABR selection by the        |
|                       |                  |      | player.                                            |
+-----------------------+------------------+------+----------------------------------------------------+
| codecs                | string           | N    | The definition of the codec(s) necessary to render |
|                       |                  |      | the content of all Tracks belonging to this        |
|                       |                  |      | Switching Set. It MUST follow the ISO File Format  |
|                       |                  |      | Name Space as defined by [RFC6381]. If this is not |
|                       |                  |      | defined, it MUST be defined on all Tracks of this  |
|                       |                  |      | Switching Set separately.                          |
+-----------------------+------------------+------+----------------------------------------------------+
| continuationPattern   | string           | N    | The URL pattern used to request Continuation       |
|                       |                  |      | Segments belonging to this Switching Set. The      |
|                       |                  |      | pattern MUST include the {segmentId} string; see   |
|                       |                  |      | Section 3.4 for more information. If this is not   |
|                       |                  |      | defined, it MUST be defined on all Tracks of this  |
|                       |                  |      | Switching Set separately.                          |
+-----------------------+------------------+------+----------------------------------------------------+
| initializationPattern | string           | N    | The URL pattern used to request Initialization     |
|                       |                  |      | Packets belonging to this Switching Set. The       |
|                       |                  |      | pattern MUST include the {initId} string; see      |
|                       |                  |      | Section 3.4 for more information.                  |
|                       |                  |      | If this is not defined, it MUST be defined on all  |
|                       |                  |      | Tracks of this Switching Set separately.           |
+-----------------------+------------------+------+----------------------------------------------------+
| label                 | string           | N    | A human-readable name for this Switching Set.      |
+-----------------------+------------------+------+----------------------------------------------------+
| mediaTimeOffset       | ScaledValue      | N    | The offset to be added to Manifest Timestamps to   |
|                       |                  |      | calculate Media Timestamps contained by Segments   |
|                       |                  |      | of this Switching Set. If neither the Track nor    |
|                       |                  |      | the Switching Set has this value set, it SHALL     |
|                       |                  |      | equal 0.                                           |
+-----------------------+------------------+------+----------------------------------------------------+
| mimeType              | string           | N    | The MIME type of all content belonging to this     |
|                       |                  |      | Switching Set. It MUST be a valid audio MIME type. |
|                       |                  |      | If not set, the MIME type of all content of this   |
|                       |                  |      | Switching Set SHALL equal audio/mp4.               |
+-----------------------+------------------+------+----------------------------------------------------+
| protection            | SwitchingSet     | N    | The information related to content protection for  |
|                       | Protection       |      | all Tracks belonging to this Switching Set.        |
+-----------------------+------------------+------+----------------------------------------------------+
| sampleRate            | uint             | N    | The sample rate (in Hz) of all audio Tracks        |
|                       |                  |      | belonging to this Switching Set. If this is not    |
|                       |                  |      | defined, it MUST be defined on all Tracks of this  |
|                       |                  |      | Switching Set separately.                          |
+-----------------------+------------------+------+----------------------------------------------------+
| samplesPerFrame       | uint             | N    | The number of audio samples in one frame. If set,  |
|                       |                  |      | this MUST apply to each Track belonging to this    |
|                       |                  |      | Switching Set that does not define this value      |
|                       |                  |      | itself. If neither the Track nor the Switching Set |
|                       |                  |      | has this value set, it SHALL equal 1024.           |
+-----------------------+------------------+------+----------------------------------------------------+

                                 Table 6

3.2.8.  SwitchingSetProtection

   The following fields are defined on the SwitchingSetProtection
   structure:

+===========+================================+======+====================================================+
| Attribute | Type                           | Req? | Description                                        |
+===========+================================+======+====================================================+
| type      | Enum("cenc", "cbcs")           | Y    | The protection scheme used to encrypt this         |
|           |                                |      | Switching Set.                                     |
+-----------+--------------------------------+------+----------------------------------------------------+
| systems   | SwitchingSetProtectionSystem[] | Y    | Metadata about the DRM systems that can be used to |
|           |                                |      | play back this Switching Set.                      |
|           |                                |      | This list MUST contain at least one entry.         |
+-----------+--------------------------------+------+----------------------------------------------------+

                                 Table 7

   More information on content protection can be found in Section 7.

3.2.9.  SwitchingSetProtectionSystem

   The following fields are defined on the SwitchingSetProtectionSystem
   structure:

+===========+========+==========+====================================+
| Attribute | Type   | Required | Description                        |
+===========+========+==========+====================================+
| pssh      | string | N        | A Base64-encoded                   |
|           |        |          | ProtectionSystemSpecificHeaderBox. |
|           |        |          | If it is not defined in the        |
|           |        |          | Manifest, then such a box MUST be  |
|           |        |          | available in the Initialization    |
|           |        |          | Stream of each Track belonging to  |
|           |        |          | this Switching Set.                |
+-----------+--------+----------+------------------------------------+
| schemeId  | string | Y        | A UUID [UUID] that identifies the  |
|           |        |          | protection system used to protect  |
|           |        |          | this Switching Set.                |
+-----------+--------+----------+------------------------------------+

                                 Table 8

   Additional attributes, such as license acquisition URLs,
   authorization-related URLs, or specific initialization data (e.g., a
   default key ID), MAY be defined by specific protection systems. These
   attributes SHOULD be defined on this SwitchingSetProtectionSystem
   structure; however, they are not defined in this document. Depending
   on the scheme ID given here, a client SHOULD verify that these
   additional attributes exist.

3.2.10.  VideoSwitchingSetType

   This is the type definition of a video Switching Set.

+=======================+==================+======+====================================================+
| Attribute             | Type             | Req? | Description                                        |
+=======================+==================+======+====================================================+
| id                    | string           | Y    | The unique identifier for this Switching Set. It   |
|                       |                  |      | MUST be unique within its Presentation.            |
+-----------------------+------------------+------+----------------------------------------------------+
| tracks                | VideoTrackType[] | Y    | The collection of all Tracks belonging to this     |
|                       |                  |      | Switching Set.                                     |
+-----------------------+------------------+------+----------------------------------------------------+
| alignId               | string           | N    | A unique identifier that SHOULD be set by all      |
|                       |                  |      | Switching Sets that are Aligned Switching Sets     |
|                       |                  |      | with each other.                                   |
+-----------------------+------------------+------+----------------------------------------------------+
| baseUrl               | string           | N    | The base URL of this Switching Set. It is part of  |
|                       |                  |      | the content base URLs for all Tracks belonging to  |
|                       |                  |      | this Switching Set. See Section 3.4 for more       |
|                       |                  |      | information.                                       |
+-----------------------+------------------+------+----------------------------------------------------+
| codecs                | string           | N    | The definition of the codec(s) necessary to render |
|                       |                  |      | the content of all Tracks belonging to this        |
|                       |                  |      | Switching Set. It MUST follow the ISO File Format  |
|                       |                  |      | Name Space as defined by [RFC6381].                |
|                       |                  |      | If this is not defined, it MUST be defined on all  |
|                       |                  |      | Tracks of this Switching Set separately.           |
+-----------------------+------------------+------+----------------------------------------------------+
| continuationPattern   | string           | N    | The URL pattern used to request Continuation       |
|                       |                  |      | Segments belonging to this Switching Set. The      |
|                       |                  |      | pattern MUST include the {segmentId} string; see   |
|                       |                  |      | Section 3.4 for more information. If this is not   |
|                       |                  |      | defined, it MUST be defined on all Tracks of this  |
|                       |                  |      | Switching Set separately.                          |
+-----------------------+------------------+------+----------------------------------------------------+
| frameRate             | ScaledValue      | N    | The frame rate of all video Tracks belonging to    |
|                       |                  |      | this Switching Set. If this is not defined, then   |
|                       |                  |      | every Track MUST set its frame rate separately.    |
+-----------------------+------------------+------+----------------------------------------------------+
| initializationPattern | string           | N    | The URL pattern used to request Initialization     |
|                       |                  |      | Packets belonging to this Switching Set. The       |
|                       |                  |      | pattern MUST include the {initId} string; see      |
|                       |                  |      | Section 3.4 for more information.                  |
|                       |                  |      | If this is not defined, it MUST be defined on all  |
|                       |                  |      | Tracks of this Switching Set separately.           |
+-----------------------+------------------+------+----------------------------------------------------+
| label                 | string           | N    | A human-readable name for this Switching Set.      |
+-----------------------+------------------+------+----------------------------------------------------+
| mediaTimeOffset       | ScaledValue      | N    | The offset to be added to Manifest Timestamps to   |
|                       |                  |      | calculate Media Timestamps contained by Segments   |
|                       |                  |      | of this Switching Set. If neither the Track nor    |
|                       |                  |      | the Switching Set has this value set, it SHALL     |
|                       |                  |      | equal 0.                                           |
+-----------------------+------------------+------+----------------------------------------------------+
| mimeType              | string           | N    | The MIME type of all content belonging to this     |
|                       |                  |      | Switching Set. It MUST be a valid video MIME type. |
|                       |                  |      | If not set, the MIME type of all content of this   |
|                       |                  |      | Switching Set SHALL equal video/mp4.               |
+-----------------------+------------------+------+----------------------------------------------------+
| protection            | SwitchingSet     | N    | The information related to content protection for  |
|                       | Protection       |      | all Tracks belonging to this Switching Set.        |
+-----------------------+------------------+------+----------------------------------------------------+

                                 Table 9

3.2.11.  MetadataSwitchingSetType

   This is the type definition of a metadata Switching Set.
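
   The attribute fallback shared by the Switching Set types (a value may
   be set on the Switching Set, otherwise on each Track, otherwise a
   stated default applies) can be sketched as follows. This is a minimal
   illustration only, using the video defaults from Table 9. It assumes
   the Manifest has already been parsed into plain dictionaries and that
   a Track-level value takes precedence over the Switching Set value.
   The helper name resolve, the constant names, and all sample
   identifiers and URL patterns are hypothetical and are not defined by
   this document. The default mediaTimeOffset is represented as the
   plain number 0 rather than as a ScaledValue structure.

```python
from typing import Any, Mapping

# Defaults stated in Table 9 for a video Switching Set.
DEFAULTS: dict[str, Any] = {"mimeType": "video/mp4", "mediaTimeOffset": 0}

# Attributes that Table 9 requires on every Track when the Switching
# Set itself does not define them.
TRACK_OR_SET = ("codecs", "continuationPattern",
                "initializationPattern", "frameRate")


def resolve(switching_set: Mapping[str, Any],
            track: Mapping[str, Any], name: str) -> Any:
    """Return the effective value of attribute `name` for one Track."""
    if name in track:            # assumption: the Track-level value wins
        return track[name]
    if name in switching_set:
        return switching_set[name]
    if name in DEFAULTS:
        return DEFAULTS[name]
    if name in TRACK_OR_SET:
        raise ValueError(f"{name!r} is set on neither Switching Set nor Track")
    return None                  # optional attribute without a default


# Hypothetical sample data; identifiers and URL patterns are made up.
video_set = {
    "id": "main",
    "codecs": "avc1.64001f",
    "continuationPattern": "video-{segmentId}.mp4",
}
track_720p = {"id": "720p", "bandwidth": 3_000_000}
track_1080p = {"id": "1080p", "bandwidth": 6_000_000,
               "codecs": "avc1.640028"}

print(resolve(video_set, track_720p, "codecs"))    # inherited from the set
print(resolve(video_set, track_1080p, "codecs"))   # overridden by the Track
print(resolve(video_set, track_720p, "mimeType"))  # falls back to the default
```

   Raising an error for a missing codecs or pattern value is a design
   choice of this sketch; a real client may prefer to skip the affected
   Track instead.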

+=====================+=====================+======+====================================================+
| Attribute           | Type                | Req? | Description                                        |
+=====================+=====================+======+====================================================+
| id                  | string              | Y    | The unique identifier for this Switching Set. It   |
|                     |                     |      | MUST be unique within its Presentation.            |
+---------------------+---------------------+------+----------------------------------------------------+
| mimeType            | string              | Y    | The MIME type of all content belonging to this     |
|                     |                     |      | Switching Set.                                     |
+---------------------+---------------------+------+----------------------------------------------------+
| tracks              | MetadataTrackType[] | Y    | The collection of all Tracks belonging to this     |
|                     |                     |      | Switching Set.                                     |
+---------------------+---------------------+------+----------------------------------------------------+
| schemeId            | string              | Y    | An identifier that denotes the type of metadata    |
|                     |                     |      | contained by this metadata Switching Set. Client   |
|                     |                     |      | behavior may differ depending on the               |
|                     |                     |      | vendor-specific identifier set here.               |
+---------------------+---------------------+------+----------------------------------------------------+
| alignId             | string              | N    | A unique identifier that SHOULD be set by all      |
|                     |                     |      | Switching Sets that are Aligned Switching Sets     |
|                     |                     |      | with each other.                                   |
+---------------------+---------------------+------+----------------------------------------------------+
| baseUrl             | string              | N    | The base URL of this Switching Set. It is part of  |
|                     |                     |      | the content base URLs for all Tracks belonging to  |
|                     |                     |      | this Switching Set.                                |
|                     |                     |      | See Section 3.4 for more information.              |
+---------------------+---------------------+------+----------------------------------------------------+
| codecs              | string              | N    | The definition of the codec(s) necessary to render |
|                     |                     |      | the content of all Tracks belonging to this        |
|                     |                     |      | Switching Set. It MUST follow the ISO File Format  |
|                     |                     |      | Name Space as defined by [RFC6381]. If this        |
|                     |                     |      | Switching Set does not define it, then it must be  |
|                     |                     |      | defined on all its Tracks separately.              |
+---------------------+---------------------+------+----------------------------------------------------+
| continuationPattern | string              | N    | The URL pattern used to request Continuation       |
|                     |                     |      | Segments belonging to this Switching Set. The      |
|                     |                     |      | pattern MUST include the {segmentId} string; see   |
|                     |                     |      | Section 3.4 for more information. If this is not   |
|                     |                     |      | defined, it MUST be defined on all Tracks of this  |
|                     |                     |      | Switching Set separately.                          |
+---------------------+---------------------+------+----------------------------------------------------+
| label               | string              | N    | A human-readable name for this Switching Set.      |
+---------------------+---------------------+------+----------------------------------------------------+
| language            | string              | N    | The language of all Tracks of the metadata         |
|                     |                     |      | Switching Set. It MUST be specified here by its    |
|                     |                     |      | ISO 639-2 [ISO6392] code.                          |
+---------------------+---------------------+------+----------------------------------------------------+
| mediaTimeOffset     | ScaledValue         | N    | The offset to be added to Manifest Timestamps to   |
|                     |                     |      | calculate Media Timestamps contained by Segments   |
|                     |                     |      | of this Switching Set. If neither the Track nor    |
|                     |                     |      | the Switching Set has this value set, it SHALL     |
|                     |                     |      | equal 0.                                           |
+---------------------+---------------------+------+----------------------------------------------------+

                                 Table 10

3.2.12.  AudioTrackType

   This is the type definition of an audio Track.

+=======================+===============+======+====================================================+
| Attribute             | Type          | Req? | Description                                        |
+=======================+===============+======+====================================================+
| bandwidth             | uint          | Y    | The peak bitrate of this Track. It is denoted in   |
|                       |               |      | bits per second. The measured bitrates of all      |
|                       |               |      | Segments of the Track MUST NOT exceed this value.  |
|                       |               |      | This value is used to aid in ABR decisions by the  |
|                       |               |      | player.                                            |
+-----------------------+---------------+------+----------------------------------------------------+
| id                    | string        | Y    | The unique identifier for this Track. It MUST be   |
|                       |               |      | unique within its Switching Set.                   |
+-----------------------+---------------+------+----------------------------------------------------+
| segments              | SegmentType[] | Y    | The metadata of the Segments contained by this     |
|                       |               |      | Track at the moment of Manifest creation.          |
|                       |               |      | If a Segment duration is set in the Track of this  |
|                       |               |      | Segment, then only the active Segment MUST be      |
|                       |               |      | shown here. If a segment duration is not set, then |
|                       |               |      | each Segment MUST be announced by the Manifest     |
|                       |               |      | before it is active. More information about        |
|                       |               |      | Segment availability is given in Section 5.2.      |
+-----------------------+---------------+------+----------------------------------------------------+
| activeSegment         | uint          | N    | The identifier of the Segment that is currently    |
|                       |               |      | active. If this Track's Presentation is currently  |
|                       |               |      | active, or if the Track's Presentation was         |
|                       |               |      | previously active but contains Segments that can   |
|                       |               |      | still be requested, then this value MUST equal the |
|                       |               |      | identifier of the most recent Continuation Segment |
|                       |               |      | that is available for retrieval. If the Track's    |
|                       |               |      | Presentation has not yet been active (and as a     |
|                       |               |      | result, no Continuation Segments are available     |
|                       |               |      | yet), then this value MUST be ignored by the       |
|                       |               |      | client.                                            |
+-----------------------+---------------+------+----------------------------------------------------+
| activeSequenceNumber  | uint          | N    | The active Sequence Number of this Track.          |
|                       |               |      | It MUST denote the Sequence Number of the          |
|                       |               |      | Initialization Packet of this Track that was most  |
|                       |               |      | recently published at the time of the creation of  |
|                       |               |      | this Manifest. If this Track's Presentation was    |
|                       |               |      | previously active or is currently active, then the |
|                       |               |      | most recent Initialization Packet's Sequence       |
|                       |               |      | Number MUST be shown here. If the Track's          |
|                       |               |      | Presentation has not yet been active (i.e., no     |
|                       |               |      | Initialization Packets are yet available), then    |
|                       |               |      | the client MUST ignore this value.                 |
+-----------------------+---------------+------+----------------------------------------------------+
| averageBandwidth      | uint          | N    | The average bitrate of this Track, denoted in bits |
|                       |               |      | per second. It is expected that over a duration of |
|                       |               |      | 10 minutes, the average bitrate of this Track      |
|                       |               |      | SHALL be within 5% of the given value. This value  |
|                       |               |      | is used to aid in ABR decisions by the player.     |
+-----------------------+---------------+------+----------------------------------------------------+
| baseUrl               | string        | N    | The base URL of this Track. It is part of the      |
|                       |               |      | content base URLs used to request this Track's     |
|                       |               |      | Initialization Packets and Continuation Segments.  |
|                       |               |      | See Section 3.4 for more information.              |
+-----------------------+---------------+------+----------------------------------------------------+
| channels              | integer       | N    | The audio channel configuration of this audio      |
|                       |               |      | Track, defined as the total number of audio        |
|                       |               |      | channels in the media data. This is used for ABR   |
|                       |               |      | selection by the player.                           |
+-----------------------+---------------+------+----------------------------------------------------+
| codecs                | string        | N    | The definition of the codec(s) necessary to render |
|                       |               |      | the content of this Track. It MUST follow the ISO  |
|                       |               |      | File Format Name Space as defined by [RFC6381]. If |
|                       |               |      | this is not defined here, then it MUST be defined  |
|                       |               |      | by the Track's Switching Set.                      |
+-----------------------+---------------+------+----------------------------------------------------+
| continuationPattern   | string        | N    | The URL pattern used to request Continuation       |
|                       |               |      | Segments belonging to this Track. The pattern      |
|                       |               |      | needs to include the {segmentId} string; see       |
|                       |               |      | Section 3.4 for more information. If this is not   |
|                       |               |      | defined here, then it MUST be defined by the       |
|                       |               |      | Track's Switching Set.                             |
+-----------------------+---------------+------+----------------------------------------------------+
| label                 | string        | N    | A human-readable name for this Track.              |
+-----------------------+---------------+------+----------------------------------------------------+
| initializationPattern | string        | N    | The URL pattern used to request Initialization     |
|                       |               |      | Packets belonging to this media Track. The pattern |
|                       |               |      | needs to include the {initId} string; see Section  |
|                       |               |      | 3.4 for more information. If this is not defined   |
|                       |               |      | here, then it MUST be defined by the Track's       |
|                       |               |      | Switching Set.                                     |
+-----------------------+---------------+------+----------------------------------------------------+
| mediaTimeOffset       | ScaledValue   | N    | The offset to be added to Manifest Timestamps to   |
|                       |               |      | calculate Media Timestamps contained by segments   |
|                       |               |      | of this Track. If neither the Track nor the        |
|                       |               |      | Switching Set has this value set, it SHALL         |
|                       |               |      | equal 0.                                           |
+-----------------------+---------------+------+----------------------------------------------------+
| sampleRate            | uint          | N    | The sample rate (in Hz) of this audio Track. If    |
|                       |               |      | this is not defined here, then it MUST be defined  |
|                       |               |      | by the Track's Switching Set.                      |
+-----------------------+---------------+------+----------------------------------------------------+
| samplesPerFrame       | uint          | N    | The number of audio samples in one frame of this   |
|                       |               |      | audio Track. If neither the Track nor the          |
|                       |               |      | Switching Set has this value set, it SHALL         |
|                       |               |      | equal 1024.                                        |
+-----------------------+---------------+------+----------------------------------------------------+
| segmentDuration       | ScaledValue   | N    | The duration (in seconds) of each Segment          |
|                       |               |      | contained by this Track. If this value is set,     |
|                       |               |      | then every segmentDuration seconds, a new Segment  |
|                       |               |      | SHOULD be available. More information about the    |
|                       |               |      | availability is given in Section 5.2. If not set,  |
|                       |               |      | then each Segment MUST be individually defined by  |
|                       |               |      | the segments field.                                |
+-----------------------+---------------+------+----------------------------------------------------+

                                 Table 11

   For any attributes that also exist on AudioSwitchingSetType, if both
   AudioSwitchingSetType and AudioTrackType have a value for this
   attribute, then only the value set by AudioTrackType must be
   considered (except for the identifier and label.)

3.2.13.  VideoTrackType

   This is the type definition of a video Track.

+=======================+===============+======+====================================================+
| Attribute             | Type          | Req? | Description                                        |
+=======================+===============+======+====================================================+
| bandwidth             | uint          | Y    | The peak bitrate of this Track. It is denoted in   |
|                       |               |      | bits per second. The measured bitrates of all      |
|                       |               |      | Segments of the Track MUST NOT exceed this value.  |
|                       |               |      | This value is used to aid in ABR decisions by the  |
|                       |               |      | player.                                            |
+-----------------------+---------------+------+----------------------------------------------------+
| id                    | string        | Y    | The unique identifier for this Track.              |
|                       |               |      | It MUST be unique within its Switching Set.        |
+-----------------------+---------------+------+----------------------------------------------------+
| resolution            | Resolution    | Y    | The resolution of this video Track.                |
+-----------------------+---------------+------+----------------------------------------------------+
| segments              | SegmentType[] | Y    | The metadata of the Segments contained by this     |
|                       |               |      | Track at the moment of Manifest creation.          |
|                       |               |      | If a Segment duration is set in the Track of this  |
|                       |               |      | Segment, then only the active Segment MUST be      |
|                       |               |      | shown here. If a segment duration is not set, then |
|                       |               |      | each Segment MUST be announced by the Manifest     |
|                       |               |      | before it is active. More information about        |
|                       |               |      | Segment availability is given in Section 5.2.      |
+-----------------------+---------------+------+----------------------------------------------------+
| activeSegment         | uint          | N    | The identifier of the Segment that is currently    |
|                       |               |      | active. If this Track's Presentation is currently  |
|                       |               |      | active, or if the Track's Presentation was         |
|                       |               |      | previously active but contains Segments that can   |
|                       |               |      | still be requested, then this value MUST equal the |
|                       |               |      | identifier of the most recent Continuation Segment |
|                       |               |      | that is available for retrieval.                   |
If | 1763 | | | | the Track's | 1764 | | | | Presentation has | 1765 | | | | not yet been | 1766 | | | | active (and as a | 1767 | | | | result, no | 1768 | | | | Continuation | 1769 | | | | Segments are | 1770 | | | | available yet), | 1771 | | | | then this value | 1772 | | | | MUST be ignored | 1773 | | | | by the client. | 1774 +-----------------------+---------------+------+--------------------+ 1775 | activeSequenceNumber | uint | N | The active | 1776 | | | | Sequence Number | 1777 | | | | of this Track. | 1778 | | | | It MUST denote | 1779 | | | | the Sequence | 1780 | | | | Number of the | 1781 | | | | Initialization | 1782 | | | | Packet of this | 1783 | | | | Track that was | 1784 | | | | most recently | 1785 | | | | published at the | 1786 | | | | time of the | 1787 | | | | creation of this | 1788 | | | | Manifest. If | 1789 | | | | this Track's | 1790 | | | | Presentation was | 1791 | | | | previously active | 1792 | | | | or is currently | 1793 | | | | active, then the | 1794 | | | | most recent | 1795 | | | | Initialization | 1796 | | | | Packet's Sequence | 1797 | | | | Number MUST be | 1798 | | | | shown here. If | 1799 | | | | the Track's | 1800 | | | | Presentation has | 1801 | | | | not yet been | 1802 | | | | active (i.e., no | 1803 | | | | Initialization | 1804 | | | | Packets are yet | 1805 | | | | available), then | 1806 | | | | the client MUST | 1807 | | | | ignore this | 1808 | | | | value. | 1809 +-----------------------+---------------+------+--------------------+ 1810 | averageBandwidth | uint | N | The average | 1811 | | | | bitrate of this | 1812 | | | | Track, denoted in | 1813 | | | | bits per second. | 1814 | | | | It is expected | 1815 | | | | that over a | 1816 | | | | duration of 10 | 1817 | | | | minutes, the | 1818 | | | | average bitrate | 1819 | | | | of this Track | 1820 | | | | SHALL be within | 1821 | | | | 5% of the given | 1822 | | | | value. 
This | 1823 | | | | value is used to | 1824 | | | | aid in ABR | 1825 | | | | decisions by the | 1826 | | | | player. | 1827 +-----------------------+---------------+------+--------------------+ 1828 | baseUrl | string | N | The base URL of | 1829 | | | | this Track. It | 1830 | | | | is part of the | 1831 | | | | content base URLs | 1832 | | | | used to request | 1833 | | | | this Track's | 1834 | | | | Initialization | 1835 | | | | Packets and | 1836 | | | | Continuation | 1837 | | | | Segments. See | 1838 | | | | Section 3.4 for | 1839 | | | | more information. | 1840 +-----------------------+---------------+------+--------------------+ 1841 | codecs | string | N | The definition of | 1842 | | | | the codec(s) | 1843 | | | | necessary to | 1844 | | | | render the | 1845 | | | | content of this | 1846 | | | | Track. It MUST | 1847 | | | | follow the ISO | 1848 | | | | File Format Name | 1849 | | | | Space as defined | 1850 | | | | by [RFC6381]. If | 1851 | | | | this is not | 1852 | | | | defined here, | 1853 | | | | then it MUST be | 1854 | | | | defined by the | 1855 | | | | Track's Switching | 1856 | | | | Set. | 1857 +-----------------------+---------------+------+--------------------+ 1858 | continuationPattern | string | N | The URL pattern | 1859 | | | | used to request | 1860 | | | | Continuation | 1861 | | | | Segments | 1862 | | | | belonging to this | 1863 | | | | Track. The | 1864 | | | | pattern needs to | 1865 | | | | include the | 1866 | | | | {segmentId} | 1867 | | | | string; see | 1868 | | | | Section 3.4 for | 1869 | | | | more information. | 1870 | | | | If this is not | 1871 | | | | defined here, | 1872 | | | | then it MUST be | 1873 | | | | defined by the | 1874 | | | | Track's Switching | 1875 | | | | Set. | 1876 +-----------------------+---------------+------+--------------------+ 1877 | frameRate | ScaledValue | N | The frame rate of | 1878 | | | | this video Track. 
| 1879 | | | | If it is not | 1880 | | | | defined by the | 1881 | | | | Switching Set, | 1882 | | | | then it must be | 1883 | | | | defined here. | 1884 +-----------------------+---------------+------+--------------------+ 1885 | label | string | N | A human-readable | 1886 | | | | name for this | 1887 | | | | Track. | 1888 +-----------------------+---------------+------+--------------------+ 1889 | initializationPattern | string | N | The URL pattern | 1890 | | | | used to request | 1891 | | | | Initialization | 1892 | | | | Packets belonging | 1893 | | | | to this media | 1894 | | | | Track. The | 1895 | | | | pattern needs to | 1896 | | | | include the | 1897 | | | | {initId} string; | 1898 | | | | see Section 3.4 | 1899 | | | | for more | 1900 | | | | information. If | 1901 | | | | this is not | 1902 | | | | defined here, | 1903 | | | | then it MUST be | 1904 | | | | defined by the | 1905 | | | | Track's Switching | 1906 | | | | Set. | 1907 +-----------------------+---------------+------+--------------------+ 1908 | mediaTimeOffset | ScaledValue | N | The offset to be | 1909 | | | | added to Manifest | 1910 | | | | Timestamps to | 1911 | | | | calculate Media | 1912 | | | | Timestamps | 1913 | | | | contained by | 1914 | | | | segments of this | 1915 | | | | Track. If | 1916 | | | | neither the Track | 1917 | | | | nor the Switching | 1918 | | | | Set has this | 1919 | | | | value set, it | 1920 | | | | SHALL equal 0. | 1921 +-----------------------+---------------+------+--------------------+ 1922 | segmentDuration | ScaledValue | N | The duration (in | 1923 | | | | seconds) of each | 1924 | | | | Segment contained | 1925 | | | | by this Track. | 1926 | | | | If this value is | 1927 | | | | set, then every | 1928 | | | | segmentDuration | 1929 | | | | seconds, a new | 1930 | | | | Segment SHOULD be | 1931 | | | | available. More | 1932 | | | | information about | 1933 | | | | the availability | 1934 | | | | is given in | 1935 | | | | Section 5.2. 
If | 1936 | | | | not set, then | 1937 | | | | each Segment MUST | 1938 | | | | be individually | 1939 | | | | defined by the | 1940 | | | | segments field. | 1941 +-----------------------+---------------+------+--------------------+ 1943 Table 12 1945 For any attributes that also exist on VideoSwitchingSetType, if both 1946 VideoSwitchingSetType and VideoTrackType have a value for this 1947 attribute, then only the value set by VideoTrackType must be 1948 considered (except for the identifier and label.) 1950 3.2.14. Resolution 1952 A Resolution contains the following elements: 1954 +===========+======+==========+=================================+ 1955 | Attribute | Type | Required | Description | 1956 +===========+======+==========+=================================+ 1957 | width | uint | Y | The width of the picture. | 1958 +-----------+------+----------+---------------------------------+ 1959 | height | uint | Y | The height of the picture. | 1960 +-----------+------+----------+---------------------------------+ 1961 | sarWidth | uint | N | The width of the sample aspect | 1962 | | | | ratio of the resolution. If it | 1963 | | | | is not set, it SHALL equal 1. | 1964 +-----------+------+----------+---------------------------------+ 1965 | sarHeight | uint | N | The height of the sample aspect | 1966 | | | | ratio of the resolution. If it | 1967 | | | | is not set, it SHALL equal 1. | 1968 +-----------+------+----------+---------------------------------+ 1970 Table 13 1972 The display aspect ratio belonging to this Resolution can be 1973 calculated by using this sample aspect ratio width and height, and 1974 the picture width and height as follows: 1976 dar = (darWidth, darHeight) 1977 darWidth = sarWidth * width 1978 darHeight = sarHeight * height 1980 3.2.15. MetadataTrackType 1982 This is the type definition of a metadata Track. 1984 +=====================+===============+======+======================+ 1985 | Attribute | Type | Req? 
| Description | 1986 +=====================+===============+======+======================+ 1987 | id | string | Y | The unique | 1988 | | | | identifier for this | 1989 | | | | Track. It MUST be | 1990 | | | | unique within its | 1991 | | | | Switching Set. | 1992 +---------------------+---------------+------+----------------------+ 1993 | segments | SegmentType[] | Y | The metadata of the | 1994 | | | | Segments that are | 1995 | | | | contained by this | 1996 | | | | Track at the moment | 1997 | | | | of Manifest | 1998 | | | | creation. | 1999 | | | | If a Segment | 2000 | | | | duration is set in | 2001 | | | | the Track of this | 2002 | | | | Segment, then only | 2003 | | | | the active Segment | 2004 | | | | MUST be shown here. | 2005 | | | | If a segment | 2006 | | | | duration is not | 2007 | | | | set, then each | 2008 | | | | Segment MUST be | 2009 | | | | announced by the | 2010 | | | | Manifest before it | 2011 | | | | is active. More | 2012 | | | | information about | 2013 | | | | Segment | 2014 | | | | availability is | 2015 | | | | given in | 2016 | | | | Section 5.2. | 2017 +---------------------+---------------+------+----------------------+ 2018 | activeSegment | uint | N | The identifier of | 2019 | | | | the Segment that is | 2020 | | | | currently active. | 2021 | | | | If this Track's | 2022 | | | | Presentation is | 2023 | | | | currently active, | 2024 | | | | or if the Track's | 2025 | | | | Presentation was | 2026 | | | | previously active | 2027 | | | | but contains | 2028 | | | | Segments that can | 2029 | | | | still be requested, | 2030 | | | | then this value | 2031 | | | | MUST equal the | 2032 | | | | identifier of the | 2033 | | | | most recent | 2034 | | | | Continuation | 2035 | | | | Segment that is | 2036 | | | | available for | 2037 | | | | retrieval. 
If the | 2038 | | | | Track's | 2039 | | | | Presentation has | 2040 | | | | not yet been active | 2041 | | | | (and as a result, | 2042 | | | | no Continuation | 2043 | | | | Segments are | 2044 | | | | available yet), | 2045 | | | | then this value | 2046 | | | | MUST be ignored by | 2047 | | | | the client. | 2048 +---------------------+---------------+------+----------------------+ 2049 | averageBandwidth | uint | N | The average bitrate | 2050 | | | | of this Track, | 2051 | | | | denoted in bits per | 2052 | | | | second. | 2053 | | | | It is expected that | 2054 | | | | over a duration of | 2055 | | | | 10 minutes, the | 2056 | | | | average bitrate of | 2057 | | | | this Track SHALL be | 2058 | | | | within 5% of the | 2059 | | | | given value. This | 2060 | | | | value is used to | 2061 | | | | aid in ABR | 2062 | | | | decisions by the | 2063 | | | | player. | 2064 +---------------------+---------------+------+----------------------+ 2065 | bandwidth | uint | N | The peak bitrate of | 2066 | | | | this Track. It is | 2067 | | | | denoted in bits per | 2068 | | | | second. The | 2069 | | | | measured bitrates | 2070 | | | | of all Segments of | 2071 | | | | the Track MUST NOT | 2072 | | | | exceed this value. | 2073 | | | | This value is used | 2074 | | | | to aid in ABR | 2075 | | | | decisions by the | 2076 | | | | player. | 2077 +---------------------+---------------+------+----------------------+ 2078 | baseUrl | string | N | The base URL of | 2079 | | | | this Track. It is | 2080 | | | | part of the content | 2081 | | | | base URLs used to | 2082 | | | | request this | 2083 | | | | Track's | 2084 | | | | Continuation | 2085 | | | | Segments. See | 2086 | | | | Section 3.4 for | 2087 | | | | more information. | 2088 +---------------------+---------------+------+----------------------+ 2089 | codecs | string | N | The definition of | 2090 | | | | the codec(s) | 2091 | | | | necessary to render | 2092 | | | | the content of this | 2093 | | | | Track. 
It MUST | 2094 | | | | follow the ISO File | 2095 | | | | Format Name Space | 2096 | | | | as defined by | 2097 | | | | [RFC6381]. | 2098 +---------------------+---------------+------+----------------------+ 2099 | continuationPattern | string | N | The URL pattern | 2100 | | | | used to request | 2101 | | | | Continuation | 2102 | | | | Segments belonging | 2103 | | | | to this Track. The | 2104 | | | | pattern needs to | 2105 | | | | include the | 2106 | | | | {segmentId} string; | 2107 | | | | see Section 3.4 for | 2108 | | | | more information. | 2109 | | | | If this is not | 2110 | | | | defined here, then | 2111 | | | | it MUST be defined | 2112 | | | | by the Track's | 2113 | | | | Switching Set. | 2114 +---------------------+---------------+------+----------------------+ 2115 | label | string | N | A human-readable | 2116 | | | | name for this | 2117 | | | | Track. | 2118 +---------------------+---------------+------+----------------------+ 2119 | mediaTimeOffset | ScaledValue | N | The offset to be | 2120 | | | | added to Manifest | 2121 | | | | Timestamps to | 2122 | | | | calculate Media | 2123 | | | | Timestamps | 2124 | | | | contained by | 2125 | | | | segments of this | 2126 | | | | Track. If neither | 2127 | | | | the Track nor the | 2128 | | | | Switching Set has | 2129 | | | | this value set, it | 2130 | | | | SHALL equal 0. | 2131 +---------------------+---------------+------+----------------------+ 2132 | segmentDuration | ScaledValue | N | The duration (in | 2133 | | | | seconds) of each | 2134 | | | | Segment contained | 2135 | | | | by this Track. If | 2136 | | | | this value is set, | 2137 | | | | then every | 2138 | | | | segmentDuration | 2139 | | | | seconds, a new | 2140 | | | | Segment SHOULD be | 2141 | | | | available. More | 2142 | | | | information about | 2143 | | | | the availability is | 2144 | | | | given in | 2145 | | | | Section 5.2. 
If      |
   |                     |               |      | not set, then each   |
   |                     |               |      | Segment MUST be      |
   |                     |               |      | individually         |
   |                     |               |      | defined by the       |
   |                     |               |      | segments field.      |
   +---------------------+---------------+------+----------------------+

                               Table 14

3.2.16. SegmentType

   This is the type definition of a Segment.

   +============+============+==========+==============================+
   | Attribute  | Type       | Required | Description                  |
   +============+============+==========+==============================+
   | id         | uint       | Y        | The unique identifier for    |
   |            |            |          | this Segment within the      |
   |            |            |          | Track. This identifier       |
   |            |            |          | MUST be incremented by 1     |
   |            |            |          | for every new Segment of     |
   |            |            |          | the same Track.              |
   +------------+------------+----------+------------------------------+
   | timeBounds | TimeBounds | N        | The time boundaries of       |
   |            |            |          | this Segment. If this        |
   |            |            |          | Segment's Track does not     |
   |            |            |          | have a constant segment      |
   |            |            |          | duration, then this value    |
   |            |            |          | MUST be set, and the         |
   |            |            |          | Segment's duration MUST      |
   |            |            |          | be available at least 2      |
   |            |            |          | seconds before the actual    |
   |            |            |          | end of that Segment.         |
   +------------+------------+----------+------------------------------+

                               Table 15

3.2.17. PresentationEventType

   Timed metadata events can be embedded in the Manifest by using
   Presentation Events. Each PresentationEventType contains the
   following elements:

   +============+=============================+======+=================+
   | Attribute  | Type                        | Req? | Description     |
   +============+=============================+======+=================+
   | data       | string                      | Y    | The event       |
   |            |                             |      | payload.        |
   +------------+-----------------------------+------+-----------------+
   | id         | string                      | Y    | A unique        |
   |            |                             |      | identifier for  |
   |            |                             |      | this event.     |
   |            |                             |      | It MUST be      |
   |            |                             |      | unique within   |
   |            |                             |      | this            |
   |            |                             |      | Presentation's  |
   |            |                             |      | events.         |
   +------------+-----------------------------+------+-----------------+
   | timeBounds | PresentationEventTimeBounds | Y    | The time        |
   |            |                             |      | boundaries for  |
   |            |                             |      | the event,      |
   |            |                             |      | during which    |
   |            |                             |      | the event       |
   |            |                             |      | SHALL be        |
   |            |                             |      | active.         |
   +------------+-----------------------------+------+-----------------+
   | encoding   | Enum("identity", "base64",  | N    | The content     |
   |            | "json")                     |      | encoding of     |
   |            |                             |      | the event       |
   |            |                             |      | data. identity  |
   |            |                             |      | signifies that  |
   |            |                             |      | the encoding    |
   |            |                             |      | is plaintext.   |
   |            |                             |      | base64          |
   |            |                             |      | signifies       |
   |            |                             |      | Base64          |
   |            |                             |      | encoding. json  |
   |            |                             |      | signifies that  |
   |            |                             |      | the payload     |
   |            |                             |      | given by data   |
   |            |                             |      | is a valid      |
   |            |                             |      | JSON document.  |
   |            |                             |      | If this JSON    |
   |            |                             |      | document is     |
   |            |                             |      | not valid,      |
   |            |                             |      | this event      |
   |            |                             |      | MUST be         |
   |            |                             |      | ignored. If     |
   |            |                             |      | not set, the    |
   |            |                             |      | encoding SHALL  |
   |            |                             |      | default to      |
   |            |                             |      | identity.       |
   +------------+-----------------------------+------+-----------------+

                               Table 16

3.2.18. PresentationEventTimeBounds

   The time boundaries of a PresentationEventType are defined as
   follows:

   +=================+======+==========+==============================+
   | Attribute       | Type | Required | Description                  |
   +=================+======+==========+==============================+
   | startTimeOffset | uint | N        | The scaled start time offset |
   |                 |      |          | of the event, defined in     |
   |                 |      |          | seconds divided by the       |
   |                 |      |          | timescale. The actual start  |
   |                 |      |          | time of the event can be     |
   |                 |      |          | calculated by dividing this  |
   |                 |      |          | value by the scale and       |
   |                 |      |          | adding the resulting value   |
   |                 |      |          | to the start time of the     |
   |                 |      |          | Presentation. If it is not   |
   |                 |      |          | set, it SHALL equal 0.       |
   +-----------------+------+----------+------------------------------+
   | duration        | uint | N        | The scaled duration of the   |
   |                 |      |          | event, defined in seconds    |
   |                 |      |          | divided by the timescale.    |
   |                 |      |          | If it is not set, it SHALL   |
   |                 |      |          | equal 0.                     |
   +-----------------+------+----------+------------------------------+
   | scale           | uint | N        | The timescale used for these |
   |                 |      |          | time bounds. If it is not    |
   |                 |      |          | set, it SHALL be equal to 1. |
   +-----------------+------+----------+------------------------------+

                               Table 17

3.3. Manifest requests

   The Manifest SHOULD be available through an HTTP GET request. The
   URL of the Manifest is chosen by the stream publisher.

   +==================================+========+==================+
   | Request path                     | Method | Summary          |
   +==================================+========+==================+
   | (chosen by the stream publisher) | GET    | Retrieve the     |
   |                                  |        | stream Manifest. |
   +----------------------------------+--------+------------------+

                     Table 18: Manifest Requests

3.3.1.
Manifest responses

   +==============+========================================+
   | Success      |                                        |
   +==============+========================================+
   | Status code  | 200                                    |
   +--------------+----------------------------------------+
   | Content-Type | JSON (MIME type MUST equal             |
   |              | application/vnd.theo.hesp+json)        |
   +--------------+----------------------------------------+
   | Description  | When the given Manifest exists on the  |
   |              | server, the Manifest data is returned. |
   +--------------+----------------------------------------+

             Table 19: Manifest Responses: Success

   +=============+==================================================+
   | Status code | Description                                      |
   +=============+==================================================+
   | 404         | The Manifest is not available for the given URL. |
   +-------------+--------------------------------------------------+

                 Table 20: Manifest Responses: Error

   If an error occurs, the player SHOULD attempt to retry the request.
   In case of consecutive unsuccessful requests, the media player SHOULD
   assume the content is unavailable and cease playback.

3.4. Addressing of content requests

   Several elements of the Manifest contain a baseUrl attribute. These
   attributes are used to construct the URLs of media content (and
   metadata.)

3.4.1. Content request URL resolution

   A content request URL for a specific Track SHALL be constructed by
   applying relative resolution (as explained in Section 5.2 of RFC 3986
   [RFC3986]) to each defined baseUrl or contentBaseUrl attribute
   relating to that Track, starting from the root of the Manifest. The
   URL of the Manifest SHALL be used as a base for the first resolution,
   after which the target URL of the previous resolution SHALL be used
   as the base URL of the next resolution.
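
   As a rough illustration of this chain of relative resolutions, the
   Python sketch below folds a list of base URLs onto the Manifest URL
   with the standard library function urllib.parse.urljoin, which
   resolves references in the manner of RFC 3986. The helper name
   resolve_chain, the host name, and the path values are assumptions
   made for illustration only; none of them is defined by this
   document.

```python
from urllib.parse import urljoin


def resolve_chain(manifest_url, base_urls):
    """Fold each defined base URL onto the previous resolution target.

    base_urls is ordered from the Manifest root down to the Track
    (contentBaseUrl, Presentation, Switching Set, Track). None marks an
    attribute that is not defined and is therefore skipped.
    """
    target = manifest_url
    for reference in base_urls:
        if reference is not None:
            target = urljoin(target, reference)
    return target


# Hypothetical values, chosen only to show the mechanics.
track_base = resolve_chain(
    "https://www.example.com/live/manifest.json",
    ["content/", None, "video/", "720p/"],
)
print(track_base)
```

   Because every step is an ordinary relative resolution, an absolute
   baseUrl at any level replaces everything that was resolved before it.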

   This means that the content request URL for a Track's Initialization
   Stream is constructed as follows (in pseudocode):

   T = manifestUrl
   if isDefined(Manifest.contentBaseUrl):
       T = resolve(Manifest.contentBaseUrl, T)
   if isDefined(Presentation.baseUrl):
       T = resolve(Presentation.baseUrl, T)
   if isDefined(SwitchingSet.baseUrl):
       T = resolve(SwitchingSet.baseUrl, T)
   if isDefined(Track.baseUrl):
       T = resolve(Track.baseUrl, T)
   contentRequestURL = resolve(initializationPattern, T)

   where initializationPattern is the attribute given by either
   SwitchingSet or Track (with Track taking precedence if given by both)
   and resolve(R, B) is relative resolution, where R is a URI reference
   and B is a base URI.

   The content request URL for the Track's Continuation Stream can be
   constructed in the same manner if the initializationPattern is
   replaced with the continuationPattern. Both URLs MUST be unique for
   each unique Track.

   An example of content addressing is given in Appendix A.1.3.

3.4.2. Requesting using an identifier and the content request URL

   The content request URL MUST include the {initId} pattern for
   Initialization Stream URLs and the {segmentId} pattern for
   Continuation Stream URLs. These patterns are replaced in the actual
   content requests: {initId} is replaced with the requested Sequence
   Number of the Initialization Stream, and {segmentId} is replaced with
   the requested segment identifier of the Continuation Stream.

   To add leading zeros to the identifier, the suffix :0(n)d can be
   appended to the pattern names initId and segmentId, where (n) is the
   minimum number of characters of the resulting identifier (an
   identifier that is already longer is not altered.)
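
   The substitution rule above can be sketched in Python as follows.
   This is a minimal illustration under stated assumptions, not a
   normative algorithm: the helper name expand_pattern and the example
   URL are hypothetical, and only numeric identifiers are handled.

```python
import re

# Matches {initId} or {segmentId}, optionally followed by :0<n>d.
_PLACEHOLDER = re.compile(r"\{(initId|segmentId)(?::0(\d+)d)?\}")


def expand_pattern(url, name, identifier):
    """Replace the {name} placeholder in url with a numeric identifier."""

    def substitute(match):
        if match.group(1) != name:
            return match.group(0)  # leave the other placeholder untouched
        width = int(match.group(2) or 0)
        # zfill pads with leading zeros and never truncates longer values.
        return str(identifier).zfill(width)

    return _PLACEHOLDER.sub(substitute, url)


# Hypothetical initialization URL; Sequence Number 7 padded to 4 digits.
print(expand_pattern("https://www.example.com/s-1/init-{initId:04d}.mp4", "initId", 7))
```

   Padding through zfill mirrors the rule that an identifier longer
   than the requested width is left unchanged.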

   For example, for a Continuation Segment with identifier 100, the
   content request URL
   https://www.example.com/s-1/content-{segmentId:06d}.mp4 resolves to
   https://www.example.com/s-1/content-000100.mp4, while the content
   request URL https://www.example.com/s-1/content-{segmentId:02d}.mp4
   resolves to https://www.example.com/s-1/content-100.mp4.

   Further details about these requests are given in Section 4 and
   Section 5.

3.5. Manifest example

   A full Manifest example, together with information on content
   addressing and timing information, is given in Appendix A.1.

4. Initialization Stream

   The player uses the information in the HESP Manifest to fetch an
   Initialization Packet.

4.1. Initialization Stream purpose

   The Initialization Stream is a stream containing only independent
   samples. A single client neither uses this stream continuously nor
   streams it in full. The purpose of the Initialization Stream is to
   make available, upon request of the client, the separate independent
   frames packaged in an Initialization Packet. This way, a client has
   more control over the specific position of the stream where it wants
   to initiate playback. For example, it allows for more granularity
   when starting playback at the live edge, seeking a specific point in
   time, or switching to an alternative Track.

   Additionally, the Initialization Packet contains information on the
   position of the following frame in the Continuation Stream to achieve
   regular media playback.

4.2. Initialization Packet format

4.2.1. Video Initialization Packet

   A video Initialization Packet MUST contain:

   *  A CMAF header, i.e., all information required to initialize the
      media decoder, stored in ISO Base Media File Format [ISOBMFF]
      boxes as specified by CMAF [CMAF].
      The minimal required contents are defined in Section 4.2.3.

      -  An additional in-band metadata event containing extra HESP
         information, included in the aforementioned CMAF segment. The
         definition of such an event is given in Section 4.2.4.

   *  A CMAF segment, i.e., an independent media sample and timing
      information related to this sample, stored in ISOBMFF [ISOBMFF]
      boxes as specified by CMAF [CMAF].

4.2.2. Audio Initialization Packet

   An audio Initialization Packet MUST contain:

   *  A CMAF header, i.e., all information required to initialize the
      media decoder, stored in ISO Base Media File Format [ISOBMFF]
      boxes as specified by CMAF [CMAF]. The minimal required contents
      are defined in Section 4.2.3.

   *  An additional in-band metadata event containing extra HESP
      information, included in the aforementioned CMAF segment. The
      definition of such an event is given in Section 4.2.4.

   It MAY also contain an independent (audio) media sample, as is
   defined for the video Initialization Packet. However, this is
   strongly discouraged, as it leads to higher storage costs for the
   provider without any significant advantage.

4.2.3. CMAF header

   A CMAF header is defined in ISO/IEC 23000-19:2020 [CMAF] as: "a
   sequence of CMAF constrained ISOBMFF boxes that do not reference any
   media samples (3.3.15), but are associated with a CMAF track (3.2.1)
   and necessary for the decoding of its CMAF fragments (3.1.1)." As
   such, the header MUST NOT contain any samples.

   This means that for the ISO Base Media File Format, at least the
   following boxes MUST be given:

   *  An ftyp box.

   *  A moov box.

   *  One or more tracks and accompanying boxes, though stbl and other
      related boxes cannot give information about any actual sample
      entries.

4.2.4.
Event message information

   In order to pass information about the Continuation Stream, an
   in-band event in the form of an emsg box must be added to the
   Initialization Packet. The format of this box is defined in
   Section 6.2.1.1. The message_data field MUST contain the following
   two values (see Table 31):

   *  the Continuation Segment index (index), which defines the index of
      the Continuation Segment containing the next media sample.

   *  the Continuation Segment byte offset (offset), which defines the
      position of the next media sample in the Continuation Segment.
      This field is OPTIONAL if the next media sample is located at the
      start of the segment, in which case a byte-range request is not
      needed.

   The message_data field of the emsg box could, for example, look like
   this:

   {"index":1,"offset":1234}

4.3. Initialization Stream addressing

   Section 3.4 defines how to create the correct URLs and how to request
   Initialization Packets. The content request URL is used together
   with the desired Sequence Number to retrieve an Initialization
   Packet.

   Additionally, {initId} can be replaced with the string "now". If
   this URL is requested, the most recently available Initialization
   Packet MUST be returned.

4.3.1. Initialization Stream requests

   Initialization Packets MUST be available through a basic GET request.
   They MUST be retrievable in two ways:

   +========================+========+============================+
   | Request path           | Method | Summary                    |
   +========================+========+============================+
   | Initialization content | GET    | Retrieve the most recent   |
   | request URL            |        | Initialization Packet from |
   | {initId} is replaced   |        | the given track.           |
   | with the string "now"  |        |                            |
   +------------------------+--------+----------------------------+
   | Initialization content | GET    | Retrieve the               |
   | request URL            |        | Initialization Packet with |
   | {initId} is replaced   |        | the given Sequence Number  |
   | with a Sequence Number |        | from the given track.      |
   +------------------------+--------+----------------------------+

              Table 21: Initialization Stream requests

4.3.2. Initialization Stream responses

   +==============+================================================+
   | Success      |                                                |
   +==============+================================================+
   | Status Code  | 200                                            |
   +--------------+------------------------------------------------+
   | Content-Type | MUST match the mimeType given by the Stream's  |
   |              | Switching Set (see Section 3 for more details) |
   +--------------+------------------------------------------------+
   | Description  | Return the requested Initialization Packet     |
   |              | when it exists on the media server.            |
   +--------------+------------------------------------------------+

          Table 22: Initialization Stream responses: success

   +========+============================================+
   | Status | Description                                |
   | code   |                                            |
   +========+============================================+
   | 404    | The Track does not exist on the media      |
   |        | server, or the requested Initialization    |
   |        | Packet does not exist on the media server. |
   +--------+--------------------------------------------+

          Table 23: Initialization Stream responses: error

5. Continuation Stream

   The client uses the information given by the Manifest and the
   Initialization Packet to request the Continuation Stream. This
   low-latency stream is used for the bulk of media playback in HESP.
   The Continuation Stream is an independently playable CMAF stream.

5.1. Continuation Stream format

5.1.1.
Media content

   The Continuation Stream is packaged as a regular CMAF stream, albeit
   with some encoding constraints, depending on the chosen profile (see
   Appendix C.)

   Maximum Gain Profile

   *  Each sample MUST reference only the one sample preceding it.

   *  Each CMAF chunk MUST contain at most one sample (for the lowest
      possible latency.)

   *  Long CMAF Segments (values of multiple minutes are possible.)

   Compatibility Profile

   *  Regular-sized CMAF Segments (recommendation: between 1 and 30
      seconds)

   *  Chunk sizes range between one sample and one sub-GOP

   *  The following GOP structure MUST be followed: I B ... B P B ... B
      P B ... B P

   *  The number of B frames MAY vary from 0 (same behavior as Maximum
      Gain Profile) to 4 (recommended maximum)

   *  Each sub-GOP (B ... B P) MUST only reference one previous frame
      (allowing injection of keyframes)

5.2. Continuation Segment availability

   Continuation Segments must together create a continuous stream of
   media data. There MUST NOT be gaps in the time boundaries of
   consecutive Continuation Segments.

   There are two ways to signal Segment availability. The first is for
   the Track to have a segmentDuration set in the Manifest, which
   represents the constant duration of each Segment. In this case, a
   client can derive on its own when to start requesting a new Segment.
   A new Segment SHOULD be published every n seconds, where n equals
   segmentDuration. This new Segment MUST become available within 100
   milliseconds from that point. Publication is not allowed to drift,
   i.e., if a Segment is only published at n+100ms, then the next
   Segment MUST be available at 2n+100ms at the latest.

   The duration of the last Segment of a Track MAY be shorter than this
   constant segmentDuration.
   As a new Manifest must be retrieved near the end of a Presentation,
   a client should be able to start requesting content of the new
   Presentation in a timely manner, regardless of the length of this
   last Segment.

   The other option is for the Track to specify all its Segments
   outright in the segments field of the Manifest.  As the Manifest is
   not retrieved regularly, it is REQUIRED to signal a Manifest update
   (see Section 6.2.1.2) at the end of each Segment.  It is recommended
   to use this option only in the "Maximum Gain Profile", as it can
   otherwise significantly increase the number of Manifest requests
   necessary.

5.3. Continuation Stream addressing

   To start playback of a Track, a client MUST request the Continuation
   Segments of the chosen Track(s) using the information given by the
   already obtained Manifest and Initialization Packet.

   Requests for Continuation Segments should be made using HTTP GET
   requests.  The first request, following an Initialization Packet,
   fetches the Segment indicated by the Initialization Packet.  A
   byte-range header should be used to specify the range of data that
   needs to be requested.  This information is also given by the
   Initialization Packet.

   A client then automatically calculates the URL of the next Segment as
   it is indicated in the Manifest and requests the next Segment using a
   regular HTTP GET request.

5.3.1. Continuation Stream URLs

   Section 3.4 defines how to create the correct URLs and how to request
   a Continuation Segment.  The content request URL is used together
   with the desired segment identifier to retrieve the Segment.

5.3.2. Continuation Stream requests

   Suppose a client needs to request a partial Continuation Segment, for
   example, starting at the byte offset given by metadata of an
   Initialization Packet.
   In that case, it MUST use one or more HTTP Range Requests [RFC7233].
   The start and end of the range of each request MUST be defined.

   Often, the total length of the Continuation Segment will not yet be
   known at the time of the request.  If a client cannot define an
   accurate end value for the HTTP Range of a request, then 2^53 - 1
   (9007199254740991) SHOULD be used.

   +===========================+========+===========================+
   | Request path              | Method | Summary                   |
   +===========================+========+===========================+
   | Continuation content      | GET    | Retrieve the Continuation |
   | request URL               |        | Segment with the          |
   | {segmentId} is replaced   |        | requested identifier from |
   | with a segment identifier |        | the given track.          |
   +---------------------------+--------+---------------------------+

                 Table 24: Continuation Stream requests

5.3.3. Continuation Stream responses

   A distinction is made between how responses should be returned,
   depending on the version of the HTTP protocol used.

5.3.3.1. HTTP/1.1 successful response

   If an HTTP Range request is sent, then a 206 Partial Content response
   MUST be returned upon success.  The response MUST use Chunked
   Transfer Coding [RFC7230] to ensure timely delivery of media data.
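
   As an illustration only, the Range rule of Section 5.3.2 and the
   success status codes required in this section can be sketched in
   Python.  The helper names range_header and expected_success_status
   are hypothetical and not part of HESP; the offset in the usage note
   is borrowed from the example in Appendix A.

```python
# Sketch only: helper names are illustrative assumptions, not defined by HESP.

# Section 5.3.2: end value to use when the Segment length is not yet known.
OPEN_ENDED_RANGE_END = 2**53 - 1  # 9007199254740991


def range_header(start, end=None):
    """Build a Range header value in which both start and end are defined."""
    if start < 0:
        raise ValueError("start must be non-negative")
    last = OPEN_ENDED_RANGE_END if end is None else end
    if last < start:
        raise ValueError("end must not precede start")
    return f"bytes={start}-{last}"


def expected_success_status(has_range_header):
    """206 for requests carrying a Range header, 200 otherwise."""
    return 206 if has_range_header else 200
```

   For instance, range_header(63275) yields
   "bytes=63275-9007199254740991", and a successful response to such a
   request is expected to carry status code 206.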
2678 +===================+==========================================+ 2679 | Success | | 2680 +===================+==========================================+ 2681 | Status Code | 206 | 2682 +-------------------+------------------------------------------+ 2683 | Content-Type | MUST match the mimeType given by the | 2684 | | Stream's Switching Set (see Section 3 | 2685 | | for more details) | 2686 +-------------------+------------------------------------------+ 2687 | Transfer-Encoding | chunked | 2688 +-------------------+------------------------------------------+ 2689 | Description | The requested range of Segment data is | 2690 | | returned from the server. Depending on | 2691 | | the byte-range requested, the connection | 2692 | | is kept open to retrieve live data. | 2693 +-------------------+------------------------------------------+ 2695 Table 25: Continuation Stream responses: success (Range 2696 header is given) 2698 For requests without a Range header, a 200 OK response MUST be 2699 returned upon success. 2701 +===================+========================================+ 2702 | Success | | 2703 +===================+========================================+ 2704 | Status Code | 200 | 2705 +-------------------+----------------------------------------+ 2706 | Content-Type | MUST match the mimeType given by the | 2707 | | Stream's Switching Set (see Section 3 | 2708 | | for more details) | 2709 +-------------------+----------------------------------------+ 2710 | Transfer-Encoding | chunked | 2711 +-------------------+----------------------------------------+ 2712 | Description | The Segment data is returned from the | 2713 | | server. Depending on the availability | 2714 | | of the Segment, the connection is kept | 2715 | | open to retrieve live data. | 2716 +-------------------+----------------------------------------+ 2718 Table 26: Continuation Stream responses: success (Range 2719 header is not given) 2721 5.3.3.2. 
HTTP/2 successful response 2723 HTTP/2 uses frame-based transmission and cannot use Chunked Transfer 2724 Coding. As such, this header is not given here. 2726 +==============+==========================================+ 2727 | Success | | 2728 +==============+==========================================+ 2729 | Status code | 206 | 2730 +--------------+------------------------------------------+ 2731 | Content-Type | MUST match the mimeType given by the | 2732 | | Stream's Switching Set (see Section 3 | 2733 | | for more details) | 2734 +--------------+------------------------------------------+ 2735 | Description | The requested range of Segment data is | 2736 | | returned from the server. Depending on | 2737 | | the byte-range requested, the connection | 2738 | | is kept open to retrieve live data. | 2739 +--------------+------------------------------------------+ 2741 Table 27: Continuation Stream responses: success (Range 2742 header is given) 2744 For requests without a Range header, a 200 OK response MUST be 2745 returned upon success. 2747 +==============+===================================================+ 2748 | Success | | 2749 +==============+===================================================+ 2750 | Status code | 200 | 2751 +--------------+---------------------------------------------------+ 2752 | Content-Type | MUST match the mimeType given by the Stream's | 2753 | | Switching Set (see Section 3 for more details) | 2754 +--------------+---------------------------------------------------+ 2755 | Description | The Segment data is returned from the server. | 2756 | | Depending on the availability of the Segment, the | 2757 | | connection is kept open to retrieve live data. | 2758 +--------------+---------------------------------------------------+ 2760 Table 28: Continuation Stream responses: success (Range header 2761 is not given) 2763 5.3.3.3. 
Response errors 2765 +========+=======================================================+ 2766 | Status | Description | 2767 | code | | 2768 +========+=======================================================+ 2769 | 404 | The Track does not exist on the media server, or the | 2770 | | requested Segment does not exist on the media server. | 2771 +--------+-------------------------------------------------------+ 2772 | 416 | The requested byte-range could not be fulfilled. | 2773 +--------+-------------------------------------------------------+ 2775 Table 29: Continuation Stream responses: error 2777 6. Timed metadata 2779 HESP supports timed metadata through two methods: metadata Tracks for 2780 continuous, often segmented metadata and metadata events for sporadic 2781 metadata updates. 2783 6.1. Metadata Tracks 2785 Metadata Tracks function similarly to media Tracks, however, without 2786 the need for an Initialization Stream. The attributes for metadata 2787 Tracks in the Manifest can be found at Section 3.2.15. The 2788 addressing happens similarly to the addressing of Continuation 2789 segments: a continuationPattern is given for each metadata Track, 2790 either directly set on the Track or on the Switching Set of the 2791 Track. Each metadata Segment contains a numerical identifier that 2792 increments by one for each new Segment of the Track. This 2793 identifier, together with the addressing pattern, creates the URL 2794 used to retrieve the contents of the Segment. 2796 The Manifest contains the duration of each metadata Segment, either 2797 stated individually per Segment or through the segmentDuration 2798 attribute set on the metadata Track. It is possible that chunked 2799 encoding is used here to ensure that the contents are delivered as 2800 soon as possible, but this is not a requirement. This can be useful 2801 for subtitles alongside live content, for example. 2803 6.2. 
Metadata events

   Metadata events can be used to signal information that does not
   become available at regular intervals.  These events can be
   transmitted either in-band, in which case they MUST be added to the
   video or audio Continuation Streams, or out-of-band, in which case
   details about the metadata event MUST be available in the Manifest.

6.2.1. In-band events

   In order to deliver events in-band, root-level Event Message (emsg)
   boxes MUST be added to ongoing media Continuation Streams.  The
   definition of an emsg box can be found in the CMAF [CMAF]
   specification.

   These boxes should be appended to each Track of a Presentation to
   ensure all viewers can receive such an event.  It is recommended to
   use out-of-band events if the included data is large.

   The HESP specification defines a few in-band events that it leverages
   for client initialization and Manifest updates.  The structure of
   these events is given below.

6.2.1.1. Initialization data

   This event is appended to each Initialization Packet for the client
   to request the correct Continuation Segments.  It is an emsg box
   with the following REQUIRED values:

   +=========================+==================================+
   | Attribute               | Value                            |
   +=========================+==================================+
   | version                 | 0                                |
   +-------------------------+----------------------------------+
   | scheme_id_uri           | "urn:theo:hesp:2020"             |
   +-------------------------+----------------------------------+
   | value                   | "initdata"                       |
   +-------------------------+----------------------------------+
   | timescale               | MUST match the timescale of the  |
   |                         | Initialization Packet in case of |
   |                         | video or MUST equal 1 otherwise. |
   +-------------------------+----------------------------------+
   | presentation_time_delta | 0                                |
   +-------------------------+----------------------------------+
   | event_duration          | MUST match the duration of the   |
   |                         | Initialization Packet in case of |
   |                         | video or MUST equal 0 otherwise. |
   +-------------------------+----------------------------------+
   | id                      | MAY be freely set (and MUST be   |
   |                         | ignored by the player)           |
   +-------------------------+----------------------------------+
   | message_data            | MUST contain the data defined by |
   |                         | Table 31, formatted as JSON      |
   +-------------------------+----------------------------------+

        Table 30: emsg box containing an initialization data event

   +===========+=========+==========+================================+
   | Attribute | Type    | Required | Description                    |
   +===========+=========+==========+================================+
   | index     | integer | Y        | The Continuation Segment index |
   |           |         |          | (see Section 4.2.4.)           |
   +-----------+---------+----------+--------------------------------+
   | offset    | integer | N        | The Continuation Segment byte  |
   |           |         |          | offset (see Section 4.2.4); it |
   |           |         |          | SHALL be 0 if not given here.  |
   +-----------+---------+----------+--------------------------------+

      Table 31: message_data contents of an initialization data event

6.2.1.2. Manifest update

   This event is used to signal the client that a new Manifest must be
   retrieved.  This can occur for many reasons, such as before a
   Presentation change, availability of new out-of-band metadata, etc.
2878 It is an emsg box with the following REQUIRED values: 2880 +=========================+================================+ 2881 | Attribute | Value | 2882 +=========================+================================+ 2883 | version | 0 | 2884 +-------------------------+--------------------------------+ 2885 | scheme_id_uri | "urn:theo:hesp:2020" | 2886 +-------------------------+--------------------------------+ 2887 | value | "manifestupdate" | 2888 +-------------------------+--------------------------------+ 2889 | timescale | 1 | 2890 +-------------------------+--------------------------------+ 2891 | presentation_time_delta | 0 | 2892 +-------------------------+--------------------------------+ 2893 | event_duration | 0 | 2894 +-------------------------+--------------------------------+ 2895 | id | MAY be freely set (and MUST be | 2896 | | ignored by the player) | 2897 +-------------------------+--------------------------------+ 2898 | message_data | MUST contain the data defined | 2899 | | by Table 33, formatted as JSON | 2900 +-------------------------+--------------------------------+ 2902 Table 32: emsg box containing a Manifest update event 2904 +===========+========+==========+===========================+ 2905 | Attribute | Type | Required | Description | 2906 +===========+========+==========+===========================+ 2907 | url | string | N | the URL of an alternative | 2908 | | | | location of the Manifest | 2909 +-----------+--------+----------+---------------------------+ 2911 Table 33: message_data contents of a Manifest data event 2913 If the event contains a URL, then this Manifest request MUST be made 2914 to this address. Further Manifest requests do not have this 2915 requirement. 2917 6.2.2. Out-of-band events 2919 Out-of-band events can be added to the Manifest through Presentation 2920 Events. The definition of a Presentation Event is given in 2921 Section 3.2.17. The contents of the event data are not predefined. 
   It is possible to include arbitrary data as Base64-encoded data, and
   plaintext data can be included as-is.  If needed, the data can be a
   URL that needs to be resolved separately, but this is up to the
   publisher of this data.

7. Content protection

   HESP supports content protection.  It allows DRM systems to be
   implemented through the common encryption standard.

7.1. Common encryption support

   As HESP requires media to be structured as ISOBMFF [ISOBMFF], common
   encryption [CENC] is to be used to encrypt that media content.

   Common encryption specifies four protection schemes that may be
   used: cenc, cbc1, cens and cbcs.  For HESP, either the AES-CBC
   subsample pattern encryption scheme (cbcs) or the AES-CTR full sample
   pattern encryption scheme (cenc) MUST be used.

7.2. HESP Manifest

   The HESP Manifest has a SwitchingSetProtection structure to set up
   common encryption for audio and video Switching Sets.  The definition
   of this structure can be found in Section 3.  As a result of this
   being set on a Switching Set level:

   *  All Tracks belonging to such a Switching Set MUST be encrypted
      with the same content key.  Aligned Switching Sets can be used to
      ensure that a client can still switch through Tracks of different
      Switching Sets.

   *  Audio and video data SHOULD be encrypted with different content
      keys.  This is a recommendation as both often have different
      encryption strength requirements.

   The SwitchingSetProtection structure MAY contain a
   ProtectionSystemSpecificHeaderBox (pssh), which can also be contained
   by the Initialization Stream.  Note that this box MUST be given by at
   least one of the two in order for a license request to be made.
   If both exist for a Switching Set, the pssh box from the
   Initialization Stream MUST be disregarded.
This allows for more 2965 straightforward alterations of license information after a stream has 2966 been created or published. 2968 7.3. CMAF box structure 2970 The media of a protected stream needs to contain certain ISOBMFF 2971 boxes to be compliant with CENC (and CMAF). HESP requires the same 2972 boxes to be present in media streams. A brief overview of these 2973 boxes is given below. More information on this requirement can be 2974 found in the CENC [CENC] and CMAF [CMAF] specifications. 2976 7.3.1. Initialization Stream 2978 Initialization Packets MUST contain the following boxes: 2980 * TrackEncryptionBox (tenc): This box contains default parameters 2981 regarding sample encryption. 2983 * SchemeTypeBox (schm): This box identifies the protection scheme. 2984 scheme_type MUST be set to cbcs or cenc. 2986 * ProtectionSystemSpecificHeaderBox (pssh): If the Manifest does not 2987 contain a pssh box that applies to this Track, then this box MUST 2988 be included in each Initialization Packet of each Track of the 2989 protected Switching Set. If the Manifest does contain a pssh box 2990 that applies to this Track, then this box MUST be disregarded. 2992 In order to signal that the Track is encrypted, the stream type MUST 2993 equal encv for video Tracks and enca for audio Tracks. 2995 Additionally, video Initialization Packets contain a sample. If that 2996 sample is encrypted, then the same requirements for CMAF fragments of 2997 a Continuation Segment also apply here: a senc box MUST be included, 2998 pssh, saiz and saio boxes MAY be included. The following section 2999 contains more information about these requirements. 3001 7.3.2. Continuation Stream 3003 CMAF fragments of a Continuation Segment MUST contain the following 3004 box: 3006 * SampleEncryptionBox (senc): This box is used to store 3007 initialization vector data and information on subsample 3008 encryption. 
      As each chunk in HESP must currently contain at most one sample,
      sample_count SHALL always be 1.

   The following boxes MAY be included:

   *  ProtectionSystemSpecificHeaderBox (pssh): If any updates need to
      be made to the underlying licensing system, a pssh box MAY be
      included.

   *  SampleAuxiliaryInformationSizesBox (saiz): This box is used to
      store the size of per-sample auxiliary information.  It is only
      REQUIRED if such per-sample information exists.

   *  SampleAuxiliaryInformationOffsetsBox (saio): This box is used to
      store the offsets of per-sample auxiliary information.  It is only
      REQUIRED if such per-sample information exists.

8. Contributors

   Significant contributions to the design of this protocol were made by
   Egon Okerman, Samie Beheydt, and Johan Vounckx.

9. IANA Considerations

   This memo requests that the following MIME type [RFC2046] be
   registered with the IANA:

   |  Type name: application
   |
   |  Subtype name: vnd.theo.hesp+json
   |
   |  Required parameters: (none)
   |
   |  Optional parameters: (none)
   |
   |  Encoding considerations: encoded as text.
   |
   |  Security considerations: See Section 10.
   |
   |  Compression: this media type does not employ compression.
   |
   |  Interoperability considerations: There are no byte-ordering
   |  issues, since files are 7- or 8-bit text.  Applications could
   |  encounter unrecognized tags, which SHOULD be ignored.

10. Security Considerations

   Since the protocol uses HTTP to transmit data, the regular HTTP
   security considerations apply.  See Section 9 of RFC 7230 [RFC7230].

   Clients SHOULD take care when parsing files received from a server so
   that non-compliant files are rejected.  Clients SHOULD range-check
   responses to prevent buffer overflows.  See also the Security
   Considerations section of RFC 3986 [RFC3986].
Clients SHOULD load 3062 resources identified by URI lazily to avoid contributing to denial- 3063 of-service attacks. 3065 HTTP requests often include session state ("cookies"), which may 3066 contain private user data. Implementations MUST follow cookie 3067 restriction and expiry rules specified by RFC 6265 [RFC6265]. See 3068 also the Security Considerations section of RFC 6265, and RFC 2964 3069 [RFC2964]. 3071 Encryption keys are specified by URI. The delivery of these keys 3072 SHOULD be secured by a mechanism such as HTTP over TLS [RFC8446] 3073 (formerly SSL) in conjunction with a secure realm or a session 3074 cookie. 3076 11. Normative References 3078 [CENC] International Organization for Standardization, 3079 "Information technology - MPEG systems technologies - Part 3080 7: Common encryption in ISO base media file format files", 3081 December 2020, . 3083 [CMAF] International Organization for Standardization, 3084 "Information technology - Multimedia application format 3085 (MPEG-A) - Part 19: Common media application format (CMAF) 3086 for segmented media", March 2020, 3087 . 3089 [ISO6392] International Organization for Standardization, "Codes for 3090 the representation of names of languages - Part 2: Alpha-3 3091 code", November 1998, 3092 . 3094 [ISO8601] International Organization for Standardization, "Date and 3095 time - Representations for information interchange", 3096 February 2019, . 3098 [ISOBMFF] International Organization for Standardization, 3099 "Information technology - Coding of audio-visual objects - 3100 Part 12: ISO base media file format", December 2020, 3101 . 3103 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 3104 Requirement Levels", BCP 14, RFC 2119, 3105 DOI 10.17487/RFC2119, March 1997, 3106 . 3108 [RFC2964] Moore, K. and N. Freed, "Use of HTTP State Management", 3109 BCP 44, RFC 2964, DOI 10.17487/RFC2964, October 2000, 3110 . 3112 [RFC3986] Berners-Lee, T., Fielding, R., and L. 
Masinter, "Uniform 3113 Resource Identifier (URI): Generic Syntax", STD 66, 3114 RFC 3986, DOI 10.17487/RFC3986, January 2005, 3115 . 3117 [RFC6265] Barth, A., "HTTP State Management Mechanism", RFC 6265, 3118 DOI 10.17487/RFC6265, April 2011, 3119 . 3121 [RFC6381] Gellens, R., Singer, D., and P. Frojdh, "The 'Codecs' and 3122 'Profiles' Parameters for "Bucket" Media Types", RFC 6381, 3123 DOI 10.17487/RFC6381, August 2011, 3124 . 3126 [RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer 3127 Protocol (HTTP/1.1): Message Syntax and Routing", 3128 RFC 7230, DOI 10.17487/RFC7230, June 2014, 3129 . 3131 [RFC7233] Fielding, R., Ed., Lafon, Y., Ed., and J. Reschke, Ed., 3132 "Hypertext Transfer Protocol (HTTP/1.1): Range Requests", 3133 RFC 7233, DOI 10.17487/RFC7233, June 2014, 3134 . 3136 [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data 3137 Interchange Format", STD 90, RFC 8259, 3138 DOI 10.17487/RFC8259, December 2017, 3139 . 3141 [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol 3142 Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, 3143 . 3145 [UUID] International Organization for Standardization, 3146 "Information technology - Procedures for the operation of 3147 object identifier registration authorities - Part 8: 3148 Generation of universally unique identifiers (UUIDs) and 3149 their use in object identifiers", August 2014, 3150 . 3152 12. Informative References 3154 [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail 3155 Extensions (MIME) Part Two: Media Types", RFC 2046, 3156 DOI 10.17487/RFC2046, November 1996, 3157 . 3159 Appendix A. Example usage 3161 A.1. Manifest 3163 For the initial step, the client retrieves a Manifest. 3165 A.1.1. Retrieving the Manifest 3167 The URL of the Manifest is given out of band to the client. The 3168 client sends out a GET request. In this case, let's suppose the 3169 Manifest is available at https://example.com/stream1/manifest.json. 
3170 The client then makes such a request: 3172 GET /stream1/manifest.json HTTP/1.1 3173 Host: example.com 3174 Accept: application/vnd.theo.hesp+json 3176 The server responds with the following headers: 3178 HTTP/1.1 200 OK 3179 Content-Type: application/vnd.theo.hesp+json; charset=utf-8 3180 Content-Length: 6867 3181 Date: Wed, 31 Mar 2021 08:00:00 GMT 3183 and the following body: 3185 { 3186 "activePresentation":"1", 3187 "availabilityDuration":{ 3188 "value":2400 3189 }, 3190 "creationDate":"2021-03-31T08:00:00.000Z", 3191 "fallbackPollRate":300, 3192 "manifestVersion":"1.1.0", 3193 "streamType":"live", 3194 "presentations":[ 3195 { 3196 "id":"0", 3197 "timeBounds":{ 3198 "startTime":0, 3199 "endTime":972000000, 3200 "scale":90000 3201 }, 3202 "audio":[ 3203 { 3204 "id":"main-audio", 3205 "language":"eng", 3206 "baseUrl":"audio/", 3207 "channels":2, 3208 "codecs":"mp4a.40.2", 3209 "continuationPattern":"content-{segmentId}.mp4", 3210 "initializationPattern":"init-{initId}.mp4", 3211 "sampleRate":48000, 3212 "tracks":[ 3213 { 3214 "id":"96kbps", 3215 "activeSegment":1799, 3216 "activeSequenceNumber":506249, 3217 "averageBandwidth":96000, 3218 "bandwidth":96000, 3219 "baseUrl":"96k/", 3220 "segmentDuration":{ 3221 "value":540000, 3222 "scale":90000 3223 }, 3224 "segments":[ 3225 { 3226 "id":1799, 3227 "timeBounds":{ 3228 "startTime":971460000, 3229 "scale":90000 3230 } 3231 } 3232 ] 3233 } 3234 ] 3235 } 3236 ], 3237 "video":[ 3238 { 3239 "id":"main-video", 3240 "baseUrl":"video/", 3241 "frameRate":{ 3242 "value":25 3243 }, 3244 "continuationPattern":"content-{segmentId}.mp4", 3245 "initializationPattern":"init-{initId}.mp4", 3246 "tracks":[ 3247 { 3248 "id":"720p", 3249 "activeSegment":1799, 3250 "activeSequenceNumber":269999, 3251 "bandwidth":3000000, 3252 "baseUrl":"720p/", 3253 "codecs":"avc1.4d001f", 3254 "resolution":{ 3255 "width":1280, 3256 "height":720 3257 }, 3258 "segmentDuration":{ 3259 "value":540000, 3260 "scale":90000 3262 }, 3263 "segments":[ 3264 
{ 3265 "id":1799, 3266 "timeBounds":{ 3267 "startTime":971460000, 3268 "scale":90000 3269 } 3270 } 3271 ] 3272 } 3273 ] 3274 } 3275 ] 3276 }, 3277 { 3278 "id":"1", 3279 "timeBounds":{ 3280 "startTime":972000000, 3281 "scale":90000 3282 }, 3283 "currentTime":{ 3284 "value":1134000000, 3285 "scale":90000 3286 }, 3287 "baseUrl":"https://otherexample.com/s2/", 3288 "audio":[ 3289 { 3290 "id":"main-audio", 3291 "language":"eng", 3292 "baseUrl":"audio/", 3293 "channels":2, 3294 "codecs":"mp4a.40.2", 3295 "sampleRate":48000, 3296 "mediaTimeOffset":{ 3297 "value":-972000000, 3298 "scale":90000 3299 }, 3300 "tracks":[ 3301 { 3302 "id":"128kbps", 3303 "activeSegment":300, 3304 "activeSequenceNumber":84375, 3305 "averageBandwidth":128000, 3306 "bandwidth":128000, 3307 "continuationPattern": 3308 "128k-content-{segmentId}.mp4", 3309 "initializationPattern": 3311 "128k-init-{initId}.mp4", 3312 "segmentDuration":{ 3313 "value":540000, 3314 "scale":90000 3315 }, 3316 "segments":[ 3317 { 3318 "id":300, 3319 "timeBounds":{ 3320 "startTime":1134000000, 3321 "scale":90000 3322 } 3323 } 3324 ] 3325 } 3326 ] 3327 } 3328 ], 3329 "video":[ 3330 { 3331 "id":"main-video", 3332 "baseUrl":"video/", 3333 "frameRate":{ 3334 "value":25 3335 }, 3336 "mediaTimeOffset":{ 3337 "value":-972000000, 3338 "scale":90000 3339 }, 3340 "tracks":[ 3341 { 3342 "id":"720p", 3343 "activeSegment":300, 3344 "activeSequenceNumber":45000, 3345 "bandwidth":3000000, 3346 "codecs":"avc1.4d001f", 3347 "continuationPattern": 3348 "720p-content-{segmentId}.mp4", 3349 "initializationPattern": 3350 "720p-init-{initId}.mp4", 3351 "resolution":{ 3352 "width":1280, 3353 "height":720 3354 }, 3355 "segmentDuration":{ 3356 "value":540000, 3357 "scale":90000 3358 }, 3359 "segments":[ 3360 { 3361 "id":300, 3362 "timeBounds":{ 3363 "startTime":1134000000, 3364 "scale":90000 3365 } 3366 } 3367 ] 3368 }, 3369 { 3370 "id":"1080p", 3371 "activeSegment":300, 3372 "activeSequenceNumber":45000, 3373 "bandwidth":5000000, 3374 
"codecs":"avc1.4d001f", 3375 "continuationPattern": 3376 "1080p-content-{segmentId}.mp4", 3377 "initializationPattern": 3378 "1080p-init-{initId}.mp4", 3379 "resolution":{ 3380 "width":1920, 3381 "height":1080 3382 }, 3383 "segmentDuration":{ 3384 "value":540000, 3385 "scale":90000 3386 }, 3387 "segments":[ 3388 { 3389 "id":300, 3390 "timeBounds":{ 3391 "startTime":1134000000, 3392 "scale":90000 3393 } 3394 } 3395 ] 3396 } 3397 ] 3398 } 3399 ] 3400 } 3401 ] 3402 } 3404 A.1.2. Timing information 3406 In the above Manifest, we have two Presentations: 3408 * The first Presentation with ID "0" is not active at the current 3409 time, though it was active previously. It contains one audio 3410 Track and one video Track. It is 3 hours long in total and starts 3411 at 00:00:00.000 (Manifest Timestamp.) No Track has a 3412 mediaTimeOffset defined, so the Manifest Timestamps match the 3413 Media Timestamps. 3415 * The second Presentation with ID "1" is active at the current time. 3416 It contains one audio Track and two video Tracks. Its total 3417 duration is yet to be defined. Its start time is 03:00:00.000 in 3418 Manifest Time and its current time (at the moment the Manifest was 3419 retrieved) is 03:30:00.000. All Tracks have the same 3420 mediaTimeOffset defined. The Media Timestamps for this 3421 Presentation start at 00:00:00.000. That also means that for all 3422 the currently active Segments, the media data will contain a 3423 starting timestamp of 00:30:00.000. 3425 The availabilityDuration of the Manifest is 40 minutes. As only 30 3426 minutes of the second Presentation have elapsed, some Segments of the 3427 first Presentation are still available. Therefore, the first 3428 Presentation must still be included in the Manifest. Once 10 minutes 3429 have passed, the first Presentation can be left out of the Manifest. 3431 A.1.3. Content addressing 3433 The client uses the Manifest to derive the content request URLs of 3434 each Track. 
3436 For the audio Track of the first Presentation, the following parts 3437 are found: 3439 AudioSwitchingSet.baseUrl = "audio/" 3440 AudioTrack.baseUrl = "96k/" 3441 AudioSwitchingSet.initializationPattern = "init-{initId}.mp4" 3442 AudioSwitchingSet.continuationPattern = "content-{segmentId}.mp4" 3444 The Manifest URL (https://example.com/stream1/manifest.json) is used 3445 as the base for resolution, and then relative resolution is applied 3446 to each of the parts above. 3448 T = resolve(AudioSwitchingSet.baseUrl, manifestUrl) 3449 = "https://example.com/stream1/audio/" 3450 T = resolve(AudioTrack.baseUrl, T) 3451 = "https://example.com/stream1/audio/96k/" 3452 initBaseUrl = resolve(AudioSwitchingSet.initializationPattern, T) 3453 = "https://example.com/stream1/audio/96k/init-{initId}.mp4" 3454 contBaseUrl = resolve(AudioSwitchingSet.continuationPattern, T) 3455 = "https://example.com/stream1/audio/96k/content-{segmentId}.mp4" 3457 The final content request URLs are 3458 https://example.com/stream1/audio/96k/init-{initId}.mp4 and 3459 https://example.com/stream1/audio/96k/content-{segmentId}.mp4 for the 3460 Initialization Stream and Continuation Stream respectively. 3462 For the audio Track of the second Presentation, the following parts 3463 are found: 3465 Presentation.baseUrl = "https://otherexample.com/s2/" 3466 AudioSwitchingSet.baseUrl = "audio/" 3467 AudioTrack.initializationPattern = "128k-init-{initId}.mp4" 3468 AudioTrack.continuationPattern = "128k-content-{segmentId}.mp4" 3470 The Manifest URL (https://example.com/stream1/manifest.json) is used 3471 as the base for resolution, and then relative resolution is applied 3472 to each of the parts above. 
   T = resolve(Presentation.baseUrl, manifestUrl)
     = "https://otherexample.com/s2/"
   T = resolve(AudioSwitchingSet.baseUrl, T)
     = "https://otherexample.com/s2/audio/"
   initBaseUrl = resolve(AudioTrack.initializationPattern, T)
     = "https://otherexample.com/s2/audio/128k-init-{initId}.mp4"
   contBaseUrl = resolve(AudioTrack.continuationPattern, T)
     = "https://otherexample.com/s2/audio/128k-content-{segmentId}.mp4"

   The final content request URLs are https://otherexample.com/s2/
   audio/128k-init-{initId}.mp4 and https://otherexample.com/s2/
   audio/128k-content-{segmentId}.mp4 for the Initialization Stream and
   Continuation Stream respectively.

A.2. Initialization Stream

   In the second step, the client uses the Manifest to retrieve and then
   parse an Initialization Packet.

A.2.1. Retrieving Initialization Packets

   The client decides to retrieve the only audio Track of the active
   Presentation.  The URL pattern for the Initialization Stream of this
   Track is https://otherexample.com/s2/audio/128k-init-{initId}.mp4.

   The client opts to retrieve the most recent Initialization Packet and
   sends out a request:

   GET /s2/audio/128k-init-now.mp4 HTTP/1.1
   Host: otherexample.com
   Accept: */*

   The server responds with the following headers:

   HTTP/1.1 200 OK
   Content-Type: audio/mp4
   Content-Length: 742
   Date: Wed, 31 Mar 2021 08:00:02 GMT

   and a binary body containing the ISOBMFF media data of the audio
   Initialization Packet.

   The client repeats this for one of the video Tracks.

A.2.2. Parsing offset information

   The client parses the ISOBMFF boxes in the audio Initialization
   Packet.  Once the emsg box with scheme ID urn:theo:hesp:2020 and
   value initdata is found, its message_data field is parsed.

   In this case, message_data equals {"index":200,"offset":63275}.
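
   The parsing step can be sketched as follows, assuming the emsg box
   has already been located and its message_data payload extracted as
   UTF-8 encoded JSON.  The function name parse_initdata is
   hypothetical; the default for offset follows Table 31.

```python
import json


def parse_initdata(message_data):
    """Return (index, offset) from an 'initdata' event payload."""
    fields = json.loads(message_data)
    index = fields["index"]           # REQUIRED per Table 31
    offset = fields.get("offset", 0)  # SHALL be 0 if not given
    return index, offset


index, offset = parse_initdata(b'{"index":200,"offset":63275}')
print(index, offset)
```
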
   The following media data for the chosen audio Track is available in
   the Continuation Segment with ID 200, at a byte offset of 63275.

A.3.  Continuation Stream

   In the final step, the client uses this information to retrieve
   Continuation Segments and start playback.

A.3.1.  Retrieving Continuation Segments

   The client retrieves the Continuation Segment of the chosen audio
   Track. The URL pattern for the Continuation Stream of this Track is
   https://otherexample.com/s2/audio/128k-content-{segmentId}.mp4.

   The client sends out a request for the segment with ID 200, starting
   at offset 63275:

      GET /s2/audio/128k-content-200.mp4 HTTP/1.1
      Host: otherexample.com
      Accept: */*
      Range: bytes=63275-9007199254740991

   As discussed in Section 5.3.2, a sufficiently large end byte value is
   chosen to ensure the entire range is retrieved.

   The server responds with the following headers

      HTTP/1.1 200 OK
      Content-Type: audio/mp4
      Content-Range: bytes 63275-9007199254740991/9007199254677716
      Transfer-Encoding: chunked
      Date: Wed, 31 Mar 2021 08:00:03 GMT

   and a (chunked) binary body containing the ISOBMFF media data of the
   Continuation Segment.

   The client repeats this for the chosen video Track and uses the
   retrieved media data to start playback. The Manifest that was
   retrieved previously contains sufficient information to retrieve new
   Continuation Segments from this point on.

Appendix B.  CDNs

   A Content Delivery Network (CDN) MAY be employed to increase the
   scalability and cacheability of HESP delivery.

   While HESP uses HTTP/1.1 and delivery should be possible over most
   HTTP CDNs, care must be taken to ensure that the CDN has all the
   required features.

   To correctly handle HESP, the CDN MUST either support HTTP/1.1 with
   chunked transfer encoding or support HTTP/2.
   It MUST also support Range Requests as well as the caching of partial
   object responses. This ensures that HESP requests and responses pass
   correctly through the CDN and that responses can be cached for future
   use.

   The CDN SHOULD support collapsing multiple HTTP/1.1 Range Requests
   with overlapping byte-ranges into a single request. This ensures
   that two requests with partially overlapping byte-ranges require only
   a single request to the media server for the overlapping part, which
   reduces the number of concurrent requests arriving at the media
   server.

Appendix C.  HESP Profiles (using H.264 as video codec)

   This appendix describes two possible profiles for video Tracks of
   HESP streams. Both profiles place certain requirements on the
   underlying H.264 encoding of the HESP stream.

C.1.  Maximal Gain Profile

   The goal of this profile is to allow the stream to reach the lowest
   latency and zapping times possible using HESP. In this profile, an
   Initialization Packet should exist for each frame of the
   Continuation Stream. As a result, a client can start playback, seek
   and switch between Tracks at any time position of the stream.

   The Initialization Stream for Tracks of the Maximal Gain Profile must
   satisfy the following requirements:

   *  The frame rate of the Initialization Stream MUST match that of the
      Continuation Stream.

   *  Each media sample of the Initialization Stream MUST be independent
      (i.e., an I frame in H.264) and individually addressable (the
      latter is currently always true for HESP).
   The Continuation Stream for Tracks of the Maximal Gain Profile must
   satisfy the following requirements:

   *  Each media sample of the Continuation Stream MUST be either
      independent (i.e., an I frame in H.264) or dependent on the media
      sample directly preceding it in decode order (i.e., a P frame
      referencing only the previous frame).

   *  Each CMAF Fragment of the Continuation Segment MUST contain only
      one media sample.

   *  Each Continuation Segment SHOULD be long (durations of multiple
      minutes are possible and encouraged).

                  +----+----+----+----+----+----+----+----+----+----+
   time positions | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 |
                  +----+----+----+----+----+----+----+----+----+----+

                  +----+----+----+----+----+----+----+----+----+----+
   initialization | IDR| IDR| IDR| IDR| IDR| IDR| IDR| IDR| IDR| IDR|
   stream         | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 |
                  +----+----+----+----+----+----+----+----+----+----+
                  | REF| REF| REF| REF| REF| REF| REF| REF| REF| REF|
                  |  a |  b |  c |  d |  e |  f |  g |  h |  i |  j |
                  +----+----+----+----+----+----+----+----+----+----+

                  +----+----+----+----+----+----+----+----+----+----+
   continuation   | I,P| I,P| I,P| I,P| I,P| I,P| I,P| I,P| I,P| I,P|
   stream         | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 |
                  +----+----+----+----+----+----+----+----+----+----+
                       |    |    |
                       |    |    \___ position c
                       |    |
                       |    \___ position b
                       |
                       \___ position a

                     Figure 4: Maximal Gain Profile

C.2.  Compatibility Profile

   The goal of this profile is to allow media data from other HTTP-based
   adaptive bitrate protocols to be reused. This comes at the cost of
   some optimizations made by the previous profile. In this profile, it
   is not possible to start playback at any time position of the stream,
   as it is not possible for Initialization Packets to refer to every
   sample of a Continuation Segment.
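   To illustrate this limitation: the requirements that follow state
   that a query for a position without its own Initialization Packet
   returns the last valid one. The Python sketch below shows one way
   such a lookup could work. It assumes the server keeps a sorted list
   of the positions for which an Initialization Packet was published;
   that list and the function name are hypothetical and not defined by
   HESP.

```python
from bisect import bisect_right


def last_valid_init_position(valid_positions, requested):
    """Return the most recent valid position at or before `requested`.

    `valid_positions` is a sorted list of positions that have a published
    Initialization Packet (a hypothetical server-side structure).
    Returns None when no valid position precedes the request.
    """
    i = bisect_right(valid_positions, requested)
    return valid_positions[i - 1] if i else None


# Hypothetical stream in which only positions 1 and 4 have a packet.
valid = [1, 4]
print(last_valid_init_position(valid, 2))  # 1
print(last_valid_init_position(valid, 4))  # 4
```

   A binary search keeps the lookup cheap even for long Continuation
   Segments; a linear scan would behave identically.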
   The Initialization Stream for Tracks of the Compatibility Profile
   must satisfy the following requirements:

   *  Each media sample of the Initialization Stream MUST be independent
      (i.e., an I frame in H.264) and individually addressable (the
      latter is currently always true for HESP).

   *  Initialization Packets MUST contain a reference to the subsequent
      sample in the Continuation Segment, where this subsequent sample
      MUST be either independent (i.e., an I frame in H.264) or
      dependent on the media sample directly preceding it in decode
      order (i.e., a P frame referencing only the previous frame). If
      the subsequent sample does not meet this constraint, then this
      Initialization Packet MUST NOT be published. Instead, the last
      valid Initialization Packet MUST be returned if this Sequence
      Number is queried.

   The Continuation Stream for Tracks of the Compatibility Profile must
   satisfy the following requirements:

   *  Continuation Segments SHOULD NOT exceed a duration of 30 seconds.

   *  Each CMAF Fragment of the Continuation Segment MUST contain at
      most as many media samples as a sub-GOP (defined below).

   *  The following GOP structure must be followed in the underlying
      H.264 stream: I, B (repeated n times), P, B (repeated n times), P,
      where n lies between 0 and 4.

   *  Each sub-GOP (B ... B P) MUST depend on at most one previous frame
      (allowing for keyframe insertion).

C.2.1.  Example

   Figure 5 depicts part of an H.264 output of 7 frames, sorted in
   decode order. The dependencies of each frame are shown with arrows.

               ______________________________________
              |                     |      |         |
              V                     |      |         |
      +----+ +----+ +----+ +----+ +----+ +----+ +----+
      |####| |%%%%| |%%%%| |%%%%| |@@@@| |@@@@| |@@@@|
      | I1 | | P4 | | B2 | | B3 | | P7 | | B5 | | B6 |
      +----+ +----+ +----+ +----+ +----+ +----+ +----+
        ^     |  ^     |       |    ^      |       |
        |_____|  |     |       |    |______|       |
        |        |     |       |    |              |
        |________|_____|       |    |______________|
        |                      |
        |______________________|

              Figure 5: Continuation stream with sub-GOPs

C.2.1.1.  Sub-GOPs

   A "sub-GOP" is a set of B and P frames that, as a group, depends on
   at most one previous frame.

   In Figure 5, there are 3 sub-GOPs:

   *  sub-GOP 1 (####) contains only a single I frame.

   *  sub-GOP 2 (%%%%) contains a P frame (P4) that only depends on I1;
      its B frames (B2 and B3) depend on I1 and P4.

   *  sub-GOP 3 (@@@@) contains a P frame (P7) that only depends on P4;
      its B frames (B5 and B6) depend on P7 and P4.

C.2.1.2.  Initialization Packets

   An Initialization Packet can be published if the subsequent media
   sample of the Continuation Stream depends on at most one previous
   frame.

   For Figure 5, this is the case at the following positions:

   *  position 1: an Initialization Packet can be published. It will
      contain IDR1 (a keyframe matching the timestamp of I1) and will
      reference P4, the subsequent media sample of the Continuation
      Stream that only depends on I1. On the client side, the media
      data will be decoded with IDR1 inserted at the location of I1.

   *  position 4: an Initialization Packet can be published. It will
      contain IDR4 (a keyframe matching the timestamp of P4) and will
      reference P7, the subsequent media sample of the Continuation
      Stream that only depends on P4. On the client side, the media
      data will be decoded with IDR4 inserted at the location of P4.

   A client requesting an Initialization Packet at other time positions
   must receive the most recent valid Initialization Packet.
   For example, a request for an Initialization Packet at position 2 in
   Figure 5 must return the Initialization Packet at position 1.

Author's Address

   Pieter-Jan Speelmans (editor)
   THEO Technologies
   Leuven
   Belgium
   Email: pieter-jan.speelmans@theoplayer.com