idnits 2.17.1 draft-ietf-cellar-ebml-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but does not include the phrase in its RFC 2119 key words list. -- The document date (July 2, 2017) is 2490 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'EBMLParentPath' is mentioned on line 977, but not defined == Missing Reference: 'EBMLMinOccurrence' is mentioned on line 988, but not defined == Missing Reference: 'EBMLMaxOccurrence' is mentioned on line 988, but not defined == Missing Reference: 'PathMinOccurrence' is mentioned on line 991, but not defined == Missing Reference: 'PathMaxOccurrence' is mentioned on line 991, but not defined -- Possible downref: Non-RFC (?) normative reference: ref. 'IEEE.754.1985' -- Possible downref: Non-RFC (?) normative reference: ref. 'ITU.V42.1994' Summary: 0 errors (**), 0 flaws (~~), 7 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 2 cellar S. Lhomme 3 Internet-Draft 4 Intended status: Standards Track D. Rice 5 Expires: January 3, 2018 6 M. Bunkus 7 July 2, 2017 9 Extensible Binary Meta Language 10 draft-ietf-cellar-ebml-03 12 Abstract 14 This document defines the Extensible Binary Meta Language (EBML) 15 format as a generalized file format for any type of data in a 16 hierarchical form. EBML is designed as a binary equivalent to XML 17 and uses a storage-efficient approach to build nested Elements with 18 identifiers, lengths, and values. Similar to how an XML Schema 19 defines the structure and semantics of an XML Document, this document 20 defines how EBML Schemas are created to convey the semantics of an 21 EBML Document. 23 Status of This Memo 25 This Internet-Draft is submitted in full conformance with the 26 provisions of BCP 78 and BCP 79. 28 Internet-Drafts are working documents of the Internet Engineering 29 Task Force (IETF). Note that other groups may also distribute 30 working documents as Internet-Drafts. The list of current Internet- 31 Drafts is at http://datatracker.ietf.org/drafts/current/. 33 Internet-Drafts are draft documents valid for a maximum of six months 34 and may be updated, replaced, or obsoleted by other documents at any 35 time. It is inappropriate to use Internet-Drafts as reference 36 material or to cite them other than as "work in progress." 38 This Internet-Draft will expire on January 3, 2018. 40 Copyright Notice 42 Copyright (c) 2017 IETF Trust and the persons identified as the 43 document authors. All rights reserved. 45 This document is subject to BCP 78 and the IETF Trust's Legal 46 Provisions Relating to IETF Documents 47 (http://trustee.ietf.org/license-info) in effect on the date of 48 publication of this document. Please review these documents 49 carefully, as they describe your rights and restrictions with respect 50 to this document. 
Code Components extracted from this document must 51 include Simplified BSD License text as described in Section 4.e of 52 the Trust Legal Provisions and are provided without warranty as 53 described in the Simplified BSD License. 55 Table of Contents 57 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 58 2. Notation and Conventions . . . . . . . . . . . . . . . . . . 4 59 3. Security Considerations . . . . . . . . . . . . . . . . . . . 6 60 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 61 5. Structure . . . . . . . . . . . . . . . . . . . . . . . . . . 7 62 6. Variable Size Integer . . . . . . . . . . . . . . . . . . . . 7 63 6.1. VINT_WIDTH . . . . . . . . . . . . . . . . . . . . . . . 8 64 6.2. VINT_MARKER . . . . . . . . . . . . . . . . . . . . . . . 8 65 6.3. VINT_DATA . . . . . . . . . . . . . . . . . . . . . . . . 8 66 6.4. VINT Examples . . . . . . . . . . . . . . . . . . . . . . 9 67 7. Element ID . . . . . . . . . . . . . . . . . . . . . . . . . 9 68 8. Element Data Size . . . . . . . . . . . . . . . . . . . . . . 11 69 9. EBML Element Types . . . . . . . . . . . . . . . . . . . . . 12 70 9.1. Signed Integer Element . . . . . . . . . . . . . . . . . 13 71 9.2. Unsigned Integer Element . . . . . . . . . . . . . . . . 13 72 9.3. Float Element . . . . . . . . . . . . . . . . . . . . . . 13 73 9.4. String Element . . . . . . . . . . . . . . . . . . . . . 14 74 9.5. UTF-8 Element . . . . . . . . . . . . . . . . . . . . . . 14 75 9.6. Date Element . . . . . . . . . . . . . . . . . . . . . . 14 76 9.7. Master Element . . . . . . . . . . . . . . . . . . . . . 14 77 9.8. Binary Element . . . . . . . . . . . . . . . . . . . . . 15 78 10. Terminating Elements . . . . . . . . . . . . . . . . . . . . 15 79 11. Guidelines for Updating Elements . . . . . . . . . . . . . . 16 80 11.1. Reducing a Element Data in Size . . . . . . . . . . . . 16 81 11.1.1. Adding a Void Element . . . . . . . . . . . . . . . 16 82 11.1.2. 
Extending the Element Data Size . . . . . . . . . . 16 83 11.1.3. Terminating Element Data . . . . . . . . . . . . . . 17 84 11.2. Considerations when Updating Elements with CRC . . . . . 17 85 12. EBML Document . . . . . . . . . . . . . . . . . . . . . . . . 18 86 12.1. EBML Header . . . . . . . . . . . . . . . . . . . . . . 18 87 12.2. EBML Body . . . . . . . . . . . . . . . . . . . . . . . 18 88 13. EBML Stream . . . . . . . . . . . . . . . . . . . . . . . . . 19 89 14. Elements semantic . . . . . . . . . . . . . . . . . . . . . . 19 90 14.1. EBML Schema . . . . . . . . . . . . . . . . . . . . . . 19 91 14.1.1. Element . . . . . . . . . . . . . . . . . . . . . . 20 92 14.1.2. Attributes . . . . . . . . . . . . . . . . . . . . . 20 93 14.1.3. Element . . . . . . . . . . . . . . . . . . . . . . 20 94 14.1.4. Attributes . . . . . . . . . . . . . . . . . . . . . 21 95 14.1.5. Element . . . . . . . . . . . . . . . . . . . . . . 25 96 14.1.6. Attributes . . . . . . . . . . . . . . . . . . . . . 25 97 14.1.7. Element . . . . . . . . . . . . . . . . . . . . . . 26 98 14.1.8. Element . . . . . . . . . . . . . . . . . . . . . . 26 99 14.1.9. Attributes . . . . . . . . . . . . . . . . . . . . . 26 100 14.1.10. XML Schema for EBML Schema . . . . . . . . . . . . . 26 101 14.1.11. EBML Schema Example . . . . . . . . . . . . . . . . 28 102 14.1.12. Identically Recurring Elements . . . . . . . . . . . 29 103 14.1.13. Expression of range . . . . . . . . . . . . . . . . 29 104 14.1.14. Textual expression of Floats . . . . . . . . . . . . 30 105 14.1.15. Note on the Use of default attributes to define 106 Mandatory EBML Elements . . . . . . . . . . . . . . 31 107 14.2. EBML Header Elements . . . . . . . . . . . . . . . . . . 31 108 14.2.1. EBML Element . . . . . . . . . . . . . . . . . . . . 31 109 14.2.2. EBMLVersion Element . . . . . . . . . . . . . . . . 32 110 14.2.3. EBMLReadVersion Element . . . . . . . . . . . . . . 32 111 14.2.4. EBMLMaxIDLength Element . . . . . . . . . . 
. . . . 33 112 14.2.5. EBMLMaxSizeLength Element . . . . . . . . . . . . . 33 113 14.2.6. DocType Element . . . . . . . . . . . . . . . . . . 34 114 14.2.7. DocTypeVersion Element . . . . . . . . . . . . . . . 34 115 14.2.8. DocTypeReadVersion Element . . . . . . . . . . . . . 35 116 14.3. Global elements (used everywhere in the format) . . . . 35 117 14.3.1. Void Element . . . . . . . . . . . . . . . . . . . . 36 118 15. References . . . . . . . . . . . . . . . . . . . . . . . . . 36 119 15.1. Normative References . . . . . . . . . . . . . . . . . . 36 120 15.2. Informative References . . . . . . . . . . . . . . . . . 37 121 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 37 123 1. Introduction 125 "EBML", short for Extensible Binary Meta Language, specifies a binary 126 and octet (byte) aligned format inspired by the principle of XML (a 127 framework for structuring data). 129 The goal of this document is to define a generic, binary, space- 130 efficient format that can be used to define more complex formats 131 (such as containers for multimedia content) using an "EBML Schema". 132 The definition of the "EBML" format recognizes the idea behind HTML 133 and XML as a good one: separate structure and semantics allowing the 134 same structural layer to be used with multiple, possibly widely 135 differing semantic layers. Except for the "EBML Header" and a few 136 global elements this specification does not define particular "EBML" 137 format semantics; however this specification is intended to define 138 how other "EBML"-based formats can be defined. 140 "EBML" uses a simple approach of building "Elements" upon three 141 pieces of data (tag, length, and value) as this approach is well 142 known, easy to parse, and allows selective data parsing. The "EBML" 143 structure additionally allows for hierarchical arrangement to support 144 complex structural formats in an efficient manner. 146 2. 
Notation and Conventions 148 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 149 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 150 document are to be interpreted as described in [RFC2119]. 152 This document defines specific terms in order to define the format 153 and application of "EBML". Specific terms are defined below: 155 "EBML": Extensible Binary Meta Language 157 "EBML Document Type": An "EBML Document Type" is a name provided by 158 an "EBML Schema" for a particular implementation of "EBML" for a data 159 format (examples: matroska and webm). 161 "EBML Schema": A standardized definition for the structure of an 162 "EBML Document Type". 164 "EBML Document": An "EBML Document" is a datastream comprised of only 165 two components, an "EBML Header" and an "EBML Body". 167 "EBML Reader": An "EBML Reader" is a data parser that interprets the 168 semantics of an "EBML Document" and creates a way for programs to use 169 "EBML". 171 "EBML Stream": An "EBML Stream" is a file that consists of one or 172 more "EBML Documents" that are concatenated together. 174 "EBML Header": The "EBML Header" is a declaration that provides 175 processing instructions and identification of the "EBML Body". The 176 "EBML Header" may be considered as analogous to an XML Declaration 177 [W3C.REC-xml-20081126] (see section 2.8 on Prolog and Document Type 178 Declaration). 180 "EBML Body": All data of an "EBML Document" following the "EBML 181 Header" may be considered the "EBML Body". 183 "Variable Size Integer": A compact variable-length binary value which 184 defines its own length. 186 "VINT": Also known as "Variable Size Integer". 188 "EBML Element": A foundation block of data that contains three parts: 189 an "Element ID", an "Element Data Size", and "Element Data". 
191 "Element ID": The "Element ID" is a binary value, encoded as a 192 "Variable Size Integer", used to uniquely identify a defined "EBML 193 Element" within a specific "EBML Schema". 195 "EBML Class": A representation of the octet length of an "Element 196 ID". 198 "Element Data Size": An expression, encoded as a "Variable Size 199 Integer", of the length in octets of "Element Data". 201 "VINTMAX": The maximum possible value that can be stored as "Element 202 Data Size". 204 "Unknown-Sized Element": An Element with an unknown "Element Data 205 Size". 207 "Element Data": The value(s) of the "EBML Element" which is 208 identified by its "Element ID" and "Element Data Size". The form of 209 the "Element Data" is defined by this document and the corresponding 210 "EBML Schema" of the Element's "EBML Document Type". 212 "Root Level": The starting level in the hierarchy of an "EBML 213 Document". 215 "Root Element": A mandatory, non-repeating "EBML Element" which 216 occurs at the top level of the path hierarchy within an "EBML Body" 217 and contains all other "EBML Elements" of the "EBML Body", excepting 218 optional "Void Elements". 220 "Top-Level Element": An "EBML Element" defined to only occur as a 221 "Child Element" of the "Root Element". 223 "Master Element": The "Master Element" contains zero, one, or many 224 other "EBML Elements". 226 "Child Element": A "Child Element" is a relative term to describe the 227 "EBML Elements" immediately contained within a "Master Element". 229 "Parent Element": A relative term to describe the "Master Element" 230 which contains a specified element. For any specified "EBML Element" 231 that is not at "Root Level", the "Parent Element" refers to the 232 "Master Element" in which that "EBML Element" is contained. 
234 "Descendant Element": A "Descendant Element" is a relative term to 235 describe any "EBML Elements" contained within a "Master Element", 236 including any of the "Child Elements" of its "Child Elements", and so 237 on. 239 "Element Name": The official human-readable name of the "EBML 240 Element". 242 "Element Path": The hierarchy of "Parent Element" where the "EBML 243 Element" is expected to be found in the "EBML Body". 245 "Empty Element": An "Empty Element" is an "EBML Element" that has an 246 "Element Data Size" with all "VINT_DATA" bits set to zero which 247 indicates that the "Element Data" of the Element is zero octets in 248 length. 250 3. Security Considerations 252 "EBML" itself does not offer any kind of security and does not 253 provide confidentiality. "EBML" does not provide any kind of 254 authorization. "EBML" only offers marginally useful and effective 255 data integrity options, such as CRC elements. 257 Even if the semantic layer offers any kind of encryption, "EBML" 258 itself could leak information at both the semantic layer (as declared 259 via the DocType element) and within the "EBML" structure (you can 260 derive the presence of "EBML Elements" even with an unknown semantic 261 layer with a heuristic approach; not without errors, of course, but 262 with a certain degree of confidence). 264 Attacks on an "EBML Reader" could include: 266 o Invalid "Element IDs" that are longer than the limit stated in the 267 "EBMLMaxIDLength Element" of the "EBML Header". 269 o Invalid "Element IDs" that are not encoded in the shortest- 270 possible way. 272 o Invalid "Element IDs" comprised of reserved values. 274 o Invalid "Element Data Size" values that are longer than the limit 275 stated in the "EBMLMaxSizeLength Element" of the "EBML Header". 277 o Invalid "Element Data Size" values (e.g. extending the length of 278 the "EBML Element" beyond the scope of the "Parent Element"; 279 possibly triggering access-out-of-bounds issues). 
281 o Very high lengths in order to force out-of-memory situations 282 resulting in a denial of service, access-out-of-bounds issues, etc. 284 o Missing "EBML Elements" that are mandatory and have no declared 285 default value. 287 o Usage of "0x00" octets in "EBML Elements" with a string type. 289 o Usage of invalid UTF-8 encoding in "EBML Elements" of UTF-8 type 290 (e.g. in order to trigger access-out-of-bounds or buffer overflow 291 issues). 293 o Usage of invalid data in "EBML Elements" with a date type. 295 Side channel attacks could exploit: 297 o The semantic equivalence of the same string stored in a "String 298 Element" or "UTF-8 Element" with and without zero-bit padding. 300 o The semantic equivalence of "VINT_DATA" within "Element Data Size" 301 with two different lengths due to left-padding zero bits. 303 o Data contained within a "Master Element" which is not itself part 304 of an "EBML Element". 306 o Extraneous copies of "Identically Recurring Element". 308 o Copies of "Identically Recurring Element" within a "Parent 309 Element" that contain invalid "CRC-32 Elements". 311 o Use of "Void Elements". 313 4. IANA Considerations 315 This document has no IANA actions. 317 5. Structure 319 "EBML" uses a system of Elements to compose an "EBML Document". 320 "EBML Elements" incorporate three parts: an "Element ID", an "Element 321 Data Size", and "Element Data". The "Element Data", which is 322 described by the "Element ID", includes either binary data, one or 323 many other "EBML Elements", or both. 325 6. Variable Size Integer 327 The "Element ID" and "Element Data Size" are both encoded as a 328 "Variable Size Integer", developed according to a UTF-8 like system. 329 The "Variable Size Integer" is composed of a "VINT_WIDTH", 330 "VINT_MARKER", and "VINT_DATA", in that order. "Variable Size 331 Integers" SHALL left-pad the "VINT_DATA" value with zero bits so that 332 the whole "Variable Size Integer" is octet-aligned. 
"Variable Size 333 Integers" SHALL be referred to as "VINT" for shorthand. 335 6.1. VINT_WIDTH 337 Each "Variable Size Integer" begins with a "VINT_WIDTH" which 338 consists of zero or many zero-value bits. The count of consecutive 339 zero-values of the "VINT_WIDTH" plus one equals the length in octets 340 of the "Variable Size Integer". For example, a "Variable Size 341 Integer" that starts with a "VINT_WIDTH" which contains zero 342 consecutive zero-value bits is one octet in length and a "Variable 343 Size Integer" that starts with one consecutive zero-value bit is two 344 octets in length. The "VINT_WIDTH" MUST only contain zero-value bits 345 or be empty. 347 Within the "EBML Header" the "VINT_WIDTH" MUST NOT exceed three bits 348 in length (meaning that the "Variable Size Integer" MUST NOT exceed 349 four octets in length). Within the "EBML Body", when "VINTs" are 350 used to express an "Element ID", the maximum length allowed for the 351 "VINT_WIDTH" is one less than the value set in the "EBMLMaxIDLength 352 Element". Within the "EBML Body", when "VINTs" are used to express 353 an "Element Data Size", the maximum length allowed for the 354 "VINT_WIDTH" is one less than the value set in the "EBMLMaxSizeLength 355 Element". 357 6.2. VINT_MARKER 359 The "VINT_MARKER" serves as a separator between the "VINT_WIDTH" and 360 "VINT_DATA". Each "Variable Size Integer" MUST contain exactly one 361 "VINT_MARKER". The "VINT_MARKER" MUST be one bit in length and 362 contain a bit with a value of one. The first bit with a value of one 363 within the "Variable Size Integer" is the "VINT_MARKER". 365 6.3. VINT_DATA 367 The "VINT_DATA" portion of the "Variable Size Integer" includes all 368 data that follows (but not including) the "VINT_MARKER" until end of 369 the "Variable Size Integer" whose length is derived from the 370 "VINT_WIDTH". 
The bits required for the "VINT_WIDTH" and the 371 "VINT_MARKER" combined use one out of eight bits of the total length 372 of the "Variable Size Integer". Thus a "Variable Size Integer" of 1 373 octet length supplies 7 bits for "VINT_DATA", a 2 octet length 374 supplies 14 bits for "VINT_DATA", and a 3 octet length supplies 21 375 bits for "VINT_DATA". If the number of bits required for "VINT_DATA" 376 is less than the bit size of "VINT_DATA", then "VINT_DATA" SHOULD be 377 zero-padded to the left to a size that fits. The "VINT_DATA" value 378 MUST be expressed as a big-endian unsigned integer. 380 6.4. VINT Examples 382 This table shows examples of "Variable Size Integers" with lengths 383 from 1 to 5 octets. The Size column gives the number of distinct values that the 384 "VINT_DATA" can hold, expressed as two raised to the number of "VINT_DATA" bits. The Representation column depicts a binary 385 expression of "Variable Size Integers" where "VINT_WIDTH" is depicted 386 by '0', the "VINT_MARKER" as '1', and the "VINT_DATA" as 'x'. 388 +-------------+------+----------------------------------------------+ 389 | Octet | Size | Representation | 390 | Length | | | 391 +-------------+------+----------------------------------------------+ 392 | 1 | 2^7 | 1xxx xxxx | 393 | 2 | 2^14 | 01xx xxxx xxxx xxxx | 394 | 3 | 2^21 | 001x xxxx xxxx xxxx xxxx xxxx | 395 | 4 | 2^28 | 0001 xxxx xxxx xxxx xxxx xxxx xxxx xxxx | 396 | 5 | 2^35 | 0000 1xxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx | 397 | | | xxxx | 398 +-------------+------+----------------------------------------------+ 400 Data encoded as a "Variable Size Integer" MAY be rendered at octet 401 lengths larger than needed to store the data. In the following table, a binary 402 value of "0b10" is shown encoded as different "Variable Size 403 Integers" with lengths from one octet to four octets. All four 404 encoded examples have identical semantic meaning though the 405 "VINT_WIDTH" and the padding of the "VINT_DATA" vary. 
407 +--------------+--------------+-------------------------------------+ 408 | Binary Value | Octet Length | As Represented in Variable Size | 409 | | | Integer | 410 +--------------+--------------+-------------------------------------+ 411 | 10 | 1 | 1000 0010 | 412 | 10 | 2 | 0100 0000 0000 0010 | 413 | 10 | 3 | 0010 0000 0000 0000 0000 0010 | 414 | 10 | 4 | 0001 0000 0000 0000 0000 0000 0000 | 415 | | | 0010 | 416 +--------------+--------------+-------------------------------------+ 418 7. Element ID 420 The "Element ID" MUST be encoded as a "Variable Size Integer". By 421 default, "Element IDs" are encoded in lengths from one octet to four 422 octets, although "Element IDs" of greater lengths are used if the 423 octet length of the longest "Element ID" of the "EBML Document" is 424 declared in the "EBMLMaxIDLength Element" of the "EBML Header" (see 425 Section 14.2.4). The "VINT_DATA" component of the "Element ID" MUST 426 NOT be either defined or written as either all zero values or all one 427 values. Any "Element ID" with the "VINT_DATA" component set as all 428 zero values or all one values MUST be ignored and MUST NOT be 429 considered an error in the "EBML Document". The "VINT_DATA" 430 component of the "Element ID" MUST be encoded at the shortest valid 431 length. For example, an "Element ID" with binary encoding of "1011 432 1111" is valid, whereas an "Element ID" with binary encoding of "0100 433 0000 0011 1111" stores a semantically equal "VINT_DATA" but is 434 invalid because a shorter "VINT" encoding is possible. Additionally, 435 an "Element ID" with binary encoding of "1111 1111" is invalid since 436 the "VINT_DATA" section is set to all one values, whereas an "Element 437 ID" with binary encoding of "0100 0000 0111 1111" stores a 438 semantically equal "VINT_DATA" and is the shortest possible "VINT" 439 encoding. 
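As a minimal, non-normative sketch, the VINT layout of Section 6 and the "Element ID" rules above could be checked as follows. The function names "read_vint" and "element_id_status", the status strings, and the eight-octet ceiling inside "read_vint" are assumptions made for this illustration only; none of them is defined by this document.

```python
from typing import Tuple


def read_vint(buf: bytes, max_len: int = 8) -> Tuple[int, int]:
    """Return (octet_length, vint_data) for the VINT at the start of buf.

    Assumption of this sketch: VINTs longer than 8 octets are not handled.
    """
    if not buf:
        raise ValueError("empty buffer")
    first = buf[0]
    # VINT_WIDTH: count the zero bits that precede the first one bit.
    width = 0
    mask = 0x80
    while mask and not (first & mask):
        width += 1
        mask >>= 1
    length = width + 1  # octet length = VINT_WIDTH + 1
    if mask == 0 or length > max_len:
        raise ValueError("VINT_WIDTH exceeds the permitted length")
    if len(buf) < length:
        raise ValueError("truncated VINT")
    # Drop VINT_WIDTH and VINT_MARKER, keep the VINT_DATA bits of octet one.
    value = first & (mask - 1)
    # VINT_DATA is a big-endian unsigned integer.
    for octet in buf[1:length]:
        value = (value << 8) | octet
    return length, value


def element_id_status(buf: bytes, max_id_length: int = 4) -> str:
    """Classify an Element ID according to the VINT_DATA rules."""
    length, data = read_vint(buf, max_id_length)
    bits = 7 * length  # VINT_DATA bits available at this octet length
    if data == 0:
        return "invalid: all zero"
    if data == (1 << bits) - 1:
        return "invalid: all one"
    # A shorter encoding exists when the value fits in one octet less
    # without becoming the reserved all-one pattern at that length.
    if length > 1 and data < (1 << (bits - 7)) - 1:
        return "invalid: shorter encoding available"
    return "valid"


if __name__ == "__main__":
    # The four Element ID encodings discussed in the text.
    for raw in (b"\xbf", b"\x40\x3f", b"\xff", b"\x40\x7f"):
        print(raw.hex(), element_id_status(raw))
```

The sketch keeps decoding and validation separate because the same decoder also applies to "Element Data Size", which follows different validity rules.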
441 The following table details these specific examples further: 443 +------------+-------------+----------------+-----------------------+ 444 | VINT_WIDTH | VINT_MARKER | VINT_DATA | Element ID Status | 445 +------------+-------------+----------------+-----------------------+ 446 | | 1 | 0000000 | Invalid: "VINT_DATA" | 447 | | | | MUST NOT be set to | 448 | | | | all 0 | 449 | 0 | 1 | 00000000000000 | Invalid: "VINT_DATA" | 450 | | | | MUST NOT be set to | 451 | | | | all 0 | 452 | | 1 | 0000001 | Valid | 453 | 0 | 1 | 00000000000001 | Invalid: A shorter | 454 | | | | "VINT_DATA" encoding | 455 | | | | is available. | 456 | | 1 | 0111111 | Valid | 457 | 0 | 1 | 00000000111111 | Invalid: A shorter | 458 | | | | "VINT_DATA" encoding | 459 | | | | is available. | 460 | | 1 | 1111111 | Invalid: "VINT_DATA" | 461 | | | | MUST NOT be set to | 462 | | | | all 1 | 463 | 0 | 1 | 00000001111111 | Valid | 464 +------------+-------------+----------------+-----------------------+ 466 The octet length of an "Element ID" determines its "EBML Class". 468 +------------+--------------+--------------------------------+ 469 | EBML Class | Octet Length | Number of Possible Element IDs | 470 +------------+--------------+--------------------------------+ 471 | Class A | 1 | 2^7 - 2 = 126 | 472 | Class B | 2 | 2^14 - 2^7 - 1 = 16,255 | 473 | Class C | 3 | 2^21 - 2^14 - 1 = 2,080,767 | 474 | Class D | 4 | 2^28 - 2^21 - 1 = 266,338,303 | 475 +------------+--------------+--------------------------------+ 477 8. Element Data Size 479 The "Element Data Size" expresses the length in octets of "Element 480 Data". The "Element Data Size" itself MUST be encoded as a "Variable 481 Size Integer". 
By default, "Element Data Sizes" can be encoded in 482 lengths from one octet to eight octets, although "Element Data Sizes" 483 of greater lengths MAY be used if the octet length of the longest 484 "Element Data Size" of the "EBML Document" is declared in the 485 "EBMLMaxSizeLength Element" of the "EBML Header" (see 486 Section 14.2.5). Unlike the "VINT_DATA" of the "Element ID", the 487 "VINT_DATA" component of the "Element Data Size" is not mandated to 488 be encoded at the shortest valid length. For example, an "Element 489 Data Size" with binary encoding of "1011 1111" or a binary encoding 490 of "0100 0000 0011 1111" are both valid "Element Data Sizes" and both 491 store a semantically equal value (both "0b00000000111111" and 492 "0b0111111", the "VINT_DATA" sections of the examples, represent the 493 integer 63). 495 Although an "Element ID" with all "VINT_DATA" bits set to zero is 496 invalid, an "Element Data Size" with all "VINT_DATA" bits set to zero 497 is allowed for "EBML Element Types" which do not mandate a non-zero 498 length (see Section 9). An "Element Data Size" with all "VINT_DATA" 499 bits set to zero indicates that the "Element Data" is zero octets in 500 length. Such an "EBML Element" is referred to as an "Empty Element". 501 If an "Empty Element" has a "default" value declared then the "EBML 502 Reader" MUST interpret the value of the "Empty Element" as the 503 "default" value. If an "Empty Element" has no "default" value 504 declared then the "EBML Reader" MUST interpret the value of the 505 "Empty Element" as defined as part of the definition of the 506 corresponding "EBML Element Type" associated with the "Element ID". 508 An "Element Data Size" with all "VINT_DATA" bits set to one is 509 reserved as an indicator that the size of the "EBML Element" is 510 unknown. The only reserved value for the "VINT_DATA" of "Element 511 Data Size" is all bits set to one. 
An "EBML Element" with an unknown 512 "Element Data Size" is referred to as an "Unknown-Sized Element". 513 Only "Master Elements" SHALL be "Unknown-Sized Elements". "Master 514 Elements" MUST NOT use an unknown size unless the 515 "unknownsizeallowed" attribute of their "EBML Schema" is set to true 516 (see Section 14.1.4.10). The use of "Unknown-Sized Elements" allows 517 for an "EBML Element" to be written and read before the size of the 518 "EBML Element" is known. "Unknown-Sized Element" MUST NOT be used or 519 defined unnecessarily; however if the "Element Data Size" is not 520 known before the "Element Data" is written, such as in some cases of 521 data streaming, then "Unknown-Sized Elements" MAY be used. The end 522 of an "Unknown-Sized Element" is determined by whichever comes first: 523 the end of the file or the beginning of the next "EBML Element", 524 defined by this document or the corresponding "EBML Schema", that is 525 not independently valid as "Descendant Element" of the "Unknown-Sized 526 Element". 528 For "Element Data Sizes" encoded at octet lengths from one to eight, 529 this table depicts the range of possible values that can be encoded 530 as an "Element Data Size". An "Element Data Size" with an octet 531 length of 8 is able to express a size of 2^56-2 or 532 72,057,594,037,927,934 octets (or about 72 petabytes). The maximum 533 possible value that can be stored as "Element Data Size" is referred 534 to as "VINTMAX". 536 +--------------+----------------------+ 537 | Octet Length | Possible Value Range | 538 +--------------+----------------------+ 539 | 1 | 0 to 2^7-2 | 540 | 2 | 0 to 2^14-2 | 541 | 3 | 0 to 2^21-2 | 542 | 4 | 0 to 2^28-2 | 543 | 5 | 0 to 2^35-2 | 544 | 6 | 0 to 2^42-2 | 545 | 7 | 0 to 2^49-2 | 546 | 8 | 0 to 2^56-2 | 547 +--------------+----------------------+ 549 If the length of "Element Data" equals "2^(n*7)-1" then the octet 550 length of the "Element Data Size" MUST be at least "n+1". 
This rule 551 prevents an "Element Data Size" from being expressed as a reserved 552 value. For example, an "EBML Element" with an octet length of 127 553 MUST NOT be encoded in an "Element Data Size" encoding with a one 554 octet length. The following table clarifies this rule by showing a 555 valid and invalid expression of an "Element Data Size" with a 556 "VINT_DATA" of 127 (which is equal to 2^(1*7)-1). 558 +------------+-------------+----------------+-----------------------+ 559 | VINT_WIDTH | VINT_MARKER | VINT_DATA | Element Data Size | 560 | | | | Status | 561 +------------+-------------+----------------+-----------------------+ 562 | | 1 | 1111111 | Reserved (meaning | 563 | | | | Unknown) | 564 | 0 | 1 | 00000001111111 | Valid (meaning 127 | 565 | | | | octets) | 566 +------------+-------------+----------------+-----------------------+ 568 9. EBML Element Types 570 "EBML Elements" are defined by an "EBML Schema" which MUST declare 571 one of the following "EBML Element Types" for each "EBML Element". 572 An "EBML Element Type" defines a concept of storing data within an 573 "EBML Element" that describes such characteristics as length, 574 endianness, and definition. 576 "EBML Elements" which are defined as a "Signed Integer Element", 577 "Unsigned Integer Element", "Float Element", or "Date Element" use 578 big endian storage. 580 9.1. Signed Integer Element 582 A "Signed Integer Element" MUST declare a length from zero to eight 583 octets. If the "EBML Element" is not defined to have a "default" 584 value, then a "Signed Integer Element" with a zero-octet length 585 represents an integer value of zero. 587 A "Signed Integer Element" stores an integer (meaning that it can be 588 written without a fractional component) which could be negative, 589 positive, or zero. Signed Integers MUST be stored with two's 590 complement notation with the leftmost bit being the sign bit. 
591 Because "EBML" limits Signed Integers to 8 octets in length a "Signed 592 Integer Element" stores a number from -9,223,372,036,854,775,808 to 593 +9,223,372,036,854,775,807. 595 9.2. Unsigned Integer Element 597 An "Unsigned Integer Element" MUST declare a length from zero to 598 eight octets. If the "EBML Element" is not defined to have a 599 "default" value, then an "Unsigned Integer Element" with a zero-octet 600 length represents an integer value of zero. 602 An "Unsigned Integer Element" stores an integer (meaning that it can 603 be written without a fractional component) which could be positive or 604 zero. Because "EBML" limits Unsigned Integers to 8 octets in length 605 an "Unsigned Integer Element" stores a number from 0 to 606 18,446,744,073,709,551,615. 608 9.3. Float Element 610 A "Float Element" MUST declare a length of either zero octets (0 611 bit), four octets (32 bit) or eight octets (64 bit). If the "EBML 612 Element" is not defined to have a "default" value, then a "Float 613 Element" with a zero-octet length represents a numerical value of 614 zero. 616 A "Float Element" stores a floating-point number as defined in 617 [IEEE.754.1985]. 619 9.4. String Element 621 A "String Element" MUST declare a length in octets from zero to 622 "VINTMAX". If the "EBML Element" is not defined to have a "default" 623 value, then a "String Element" with a zero-octet length represents an 624 empty string. 626 A "String Element" MUST either be empty (zero-length) or contain 627 printable ASCII characters [RFC0020] in the range of "0x20" to 628 "0x7E", with an exception made for termination (see Section 10). 630 9.5. UTF-8 Element 632 A "UTF-8 Element" MUST declare a length in octets from zero to 633 "VINTMAX". If the "EBML Element" is not defined to have a "default" 634 value, then a "UTF-8 Element" with a zero-octet length represents an 635 empty string. 
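The zero-length and big-endian conventions stated so far for the integer, float, and string types could be sketched as below. This is a hedged illustration, not a reference decoder: the function names are hypothetical, "default" values declared by an "EBML Schema" are not modelled (so a zero-length payload always yields the type's own zero value), and the termination exception of Section 10 is deliberately left out.

```python
import struct


def decode_signed(data: bytes) -> int:
    """Signed Integer Element: 0 to 8 octets, big-endian two's complement."""
    if len(data) > 8:
        raise ValueError("Signed Integer Element longer than 8 octets")
    return int.from_bytes(data, "big", signed=True)  # b"" decodes to 0


def decode_unsigned(data: bytes) -> int:
    """Unsigned Integer Element: 0 to 8 octets, big-endian."""
    if len(data) > 8:
        raise ValueError("Unsigned Integer Element longer than 8 octets")
    return int.from_bytes(data, "big")  # b"" decodes to 0


def decode_float(data: bytes) -> float:
    """Float Element: 0, 4, or 8 octets of big-endian IEEE 754 data."""
    if len(data) == 0:
        return 0.0
    if len(data) == 4:
        return struct.unpack(">f", data)[0]
    if len(data) == 8:
        return struct.unpack(">d", data)[0]
    raise ValueError("Float Element must be 0, 4, or 8 octets long")


def decode_string(data: bytes) -> str:
    """String Element: empty, or printable ASCII in 0x20 to 0x7E.

    Assumption: Null Octet termination is not handled in this sketch,
    so a terminated value is rejected here.
    """
    if any(not 0x20 <= octet <= 0x7E for octet in data):
        raise ValueError("octet outside the printable ASCII range")
    return data.decode("ascii")


def decode_utf8(data: bytes) -> str:
    """UTF-8 Element: a zero-octet length represents an empty string."""
    return data.decode("utf-8")  # strict decoding rejects invalid UTF-8
```

For example, under these assumptions "decode_signed(b'\xff')" yields -1 and "decode_float(b'')" yields 0.0.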
637 A "UTF-8 Element" contains only a valid Unicode string as defined in 638 [RFC3629], with an exception made for termination (see Section 10). 640 9.6. Date Element 642 A "Date Element" MUST declare a length of either zero octets or eight 643 octets. If the "EBML Element" is not defined to have a "default" 644 value, then a "Date Element" with a zero-octet length represents a 645 timestamp of 2001-01-01T00:00:00.000000000 UTC [RFC3339]. 647 The "Date Element" stores an integer in the same format as the 648 "Signed Integer Element" that expresses a point in time referenced in 649 nanoseconds from the precise beginning of the third millennium of the 650 Gregorian Calendar in Coordinated Universal Time (also known as 651 2001-01-01T00:00:00.000000000 UTC). This provides a possible 652 expression of time from 1708-09-11T00:12:44.854775808 UTC to 653 2293-04-11T11:47:16.854775807 UTC. 655 9.7. Master Element 657 A "Master Element" MUST declare a length in octets from zero to 658 "VINTMAX". The "Master Element" MAY also use an unknown length. See 659 Section 8 for rules that apply to elements of unknown length. 661 The "Master Element" contains zero, one, or many other elements. 662 "EBML Elements" contained within a "Master Element" MUST have the 663 "EBMLParentPath" of their "Element Path" equals to the 664 "EBMLReferencePath" of the "Master Element" "Element Path" (see 665 Section 14.1.4.2). "Element Data" stored within "Master Elements" 666 SHOULD only consist of "EBML Elements" and SHOULD NOT contain any 667 data that is not part of an "EBML Element". When "EBML" is used in 668 transmission or streaming, data that is not part of an "EBML Element" 669 is permitted to be present within a "Master Element" if 670 "unknownsizeallowed" is enabled within the definition for that 671 "Master Element". 
In this case, the "EBML Reader" should skip data
672    until a valid "Element ID" of the same "EBMLParentPath" or the next
673    upper level "Element Path" of the "Master Element" is found. Which
674    "Element IDs" are considered valid within a "Master Element" is
675    identified by the "EBML Schema" for that version of the "EBML
676    Document Type". Any data contained within a "Master Element" that is
677    not part of a "Child Element" MUST be ignored.

679 9.8. Binary Element

681    A "Binary Element" MUST declare a length in octets from zero to
682    "VINTMAX".

684    The contents of a "Binary Element" should not be interpreted by the
685    "EBML Reader".

687 10. Terminating Elements

689    "Null Octets", which are octets with all bits set to zero, MAY follow
690    the value of a "String Element" or "UTF-8 Element" to serve as a
691    terminator. An "EBML Writer" MAY terminate a "String Element" or
692    "UTF-8 Element" with "Null Octets" in order to overwrite a stored
693    value with a new value of lesser length while maintaining the same
694    "Element Data Size" (this can prevent the need to rewrite large
695    portions of an "EBML Document"); otherwise the use of "Null Octets"
696    within a "String Element" or "UTF-8 Element" is NOT RECOMMENDED. An
697    "EBML Reader" MUST consider the value of the "String Element" or
698    "UTF-8 Element" to be terminated upon the first read "Null Octet" and
699    MUST ignore any data following the first "Null Octet" within that
700    "Element". A string value and a copy of that string value terminated
701    by one or more "Null Octets" are semantically equal.

703    The following table shows examples of semantics and validation for
704    the use of "Null Octets". The "Stored Value" and "Semantic Meaning"
705    columns are represented as hexadecimal values.
707    +---------------------+---------------------+
708    | Stored Value        | Semantic Meaning    |
709    +---------------------+---------------------+
710    | 0x65 0x62 0x6d 0x6c | 0x65 0x62 0x6d 0x6c |
711    | 0x65 0x62 0x00 0x6c | 0x65 0x62           |
712    | 0x65 0x62 0x00 0x00 | 0x65 0x62           |
713    | 0x65 0x62           | 0x65 0x62           |
714    +---------------------+---------------------+

716 11. Guidelines for Updating Elements

718    An EBML Document can be updated without requiring that the entire
719    EBML Document be rewritten. These recommendations describe
720    strategies to change the "Element Data" of a written "EBML Element"
721    with minimal disruption to the rest of the "EBML Document".

723 11.1. Reducing Element Data in Size

725    There are three methods to reduce the size of "Element Data" of a
726    written "EBML Element".

728 11.1.1. Adding a Void Element

730    When an "EBML Element" is changed to reduce its total length by more
731    than one octet, an "EBML Writer" SHOULD fill the freed space with a
732    "Void Element".

734 11.1.2. Extending the Element Data Size

736    The same value for "Element Data Size" MAY be written in variable
737    lengths, so for minor reductions in octet length the "Element Data
738    Size" MAY be written to a longer octet length to fill the freed
739    space.

741    For example, the first row of the following table depicts a "String
742    Element" that stores an "Element ID" (3 octets), "Element Data Size"
743    (1 octet), and "Element Data" (4 octets). If the "Element Data" is
744    changed to reduce the length by one octet and if the current length
745    of the "Element Data Size" is less than its maximum permitted length,
746    then the "Element Data Size" of that "Element" MAY be rewritten to
747    increase its length by one octet. Thus before and after the change
748    the "EBML Element" maintains the same length of 8 octets and data
749    around the "Element" does not need to be moved.
751    +-------------+------------+-------------------+--------------+
752    | Status      | Element ID | Element Data Size | Element Data |
753    +-------------+------------+-------------------+--------------+
754    | Before edit | 0x3B4040   | 0x84              | 0x65626d6c   |
755    | After edit  | 0x3B4040   | 0x4003            | 0x6d6b76     |
756    +-------------+------------+-------------------+--------------+

758    This method is only RECOMMENDED for reducing "Element Data" by a
759    single octet; for reductions by two or more octets it is RECOMMENDED
760    to fill the freed space with a "Void Element".

762    Note that if the "Element Data" length needs to be
763    shortened by one octet and the "Element Data Size" could be rewritten
764    as a shorter "VINT" then it is RECOMMENDED to rewrite the "Element
765    Data Size" as one octet shorter, shorten the "Element Data" by one
766    octet, and follow that "Element" with a "Void Element". For example,
767    the following table depicts a "String Element" that stores an
768    "Element ID" (3 octets), "Element Data Size" (2 octets, but could be
769    rewritten in one octet), and "Element Data" (3 octets). If the
770    "Element Data" is to be rewritten to a two octet length, then another
771    octet can be taken from "Element Data Size" so that there is enough
772    space to add a two octet "Void Element".

774    +--------+------------+-----------------+-------------+-------------+
775    | Status | Element ID | Element Data    | Element     | Void        |
776    |        |            | Size            | Data        | Element     |
777    +--------+------------+-----------------+-------------+-------------+
778    | Before | 0x3B4040   | 0x4003          | 0x6d6b76    |             |
779    | After  | 0x3B4040   | 0x82            | 0x6869      | 0xEC80      |
780    +--------+------------+-----------------+-------------+-------------+

782 11.1.3. Terminating Element Data

784    For "String Elements" and "UTF-8 Elements" the length of "Element
785    Data" MAY be reduced by adding "Null Octets" to terminate the
786    "Element Data" (see Section 10).
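As a non-normative illustration, the following Python sketch shows two of the in-place shrinking techniques of Section 11.1: writing an "Element Data Size" at a chosen octet width (Section 11.1.2) and padding a shorter string with "Null Octets" (this section and Section 10). The function names `encode_data_size`, `terminate`, and `read_string` are hypothetical and not defined by this document. The sketch assumes sizes of at most 8 octets and does not check "EBMLMaxSizeLength".

```python
def encode_data_size(value: int, width: int) -> bytes:
    """Encode value as an Element Data Size VINT exactly `width` octets long."""
    if not 1 <= width <= 8:
        raise ValueError("width must be 1 to 8 octets")
    # `width` octets carry 7*width VINT_DATA bits; the all-ones pattern is
    # reserved for unknown size, so it is rejected here.
    if not 0 <= value < (1 << (7 * width)) - 1:
        raise ValueError("value does not fit in this width")
    marker = 1 << (7 * width)  # VINT_MARKER bit, preceded by width-1 zero bits
    return (marker | value).to_bytes(width, "big")


def terminate(value: bytes, length: int) -> bytes:
    """Pad a shorter string value with Null Octets to keep the stored length."""
    if len(value) > length:
        raise ValueError("new value is longer than the stored Element Data")
    return value + b"\x00" * (length - len(value))


def read_string(data: bytes) -> bytes:
    """Return the semantic value: everything before the first Null Octet."""
    return data.split(b"\x00", 1)[0]
```

Under these assumptions, `encode_data_size(4, 1)` and `encode_data_size(3, 2)` yield the "Element Data Size" octets 0x84 and 0x4003 used in the tables of Section 11.1.2, and `read_string` applied to 0x65 0x62 0x00 0x6c yields 0x65 0x62 as in the table of Section 10.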
788    In the following table, a four octet long "Element Data" is changed
789    to a three octet long value followed by a "Null Octet"; the "Element
790    Data Size" includes any "Null Octets" used to terminate "Element
791    Data" and so remains unchanged.

793    +-------------+------------+-------------------+--------------+
794    | Status      | Element ID | Element Data Size | Element Data |
795    +-------------+------------+-------------------+--------------+
796    | Before edit | 0x3B4040   | 0x84              | 0x65626d6c   |
797    | After edit  | 0x3B4040   | 0x84              | 0x6d6b7600   |
798    +-------------+------------+-------------------+--------------+

800    Note that this method is NOT RECOMMENDED. For reductions of one
801    octet, the method for "Extending the Element Data Size" SHOULD be
802    used. For reduction by more than one octet, the method for "Adding a
803    Void Element" SHOULD be used.

805 11.2. Considerations when Updating Elements with CRC

807    If the "Element" to be changed is a "Descendant Element" of any
808    "Master Element" that contains a "CRC-32 Element" then the "CRC-32
809    Element" MUST be verified before permitting the change. Additionally
810    the "CRC-32 Element" value MUST be subsequently updated to reflect
811    the changed data.

813 12. EBML Document

815    An "EBML Document" is composed of only two components, an "EBML
816    Header" and an "EBML Body". An "EBML Document" MUST start with an
817    "EBML Header" that declares significant characteristics of the entire
818    "EBML Body". An "EBML Document" consists of "EBML Elements" and MUST
819    NOT contain any data that is not part of an "EBML Element".

821 12.1. EBML Header

823    The "EBML Header" is a declaration that provides processing
824    instructions and identification of the "EBML Body". The "EBML
825    Header" of an "EBML Document" is analogous to the XML Declaration of
826    an XML Document.
828    The "EBML Header" documents the "EBML Schema" (also known as the
829    "EBML DocType") that is used to semantically interpret the structure
830    and meaning of the "EBML Document". Additionally the "EBML Header"
831    documents the versions of both "EBML" and the "EBML Schema" that were
832    used to write the "EBML Document" and the versions required to read
833    the "EBML Document".

835    The "EBML Header" MUST contain a single "Master Element" with an
836    "Element Name" of "EBML" and "Element ID" of "0x1A45DFA3" (see
837    Section 14.2.1) and any number of additional "EBML Elements" within
838    it. The "EBML Header" MUST only contain "EBML Elements" that are
839    defined as part of this document.

841    All "EBML Elements" within the "EBML Header" MUST NOT use any
842    "Element ID" with a length greater than 4 octets. All "EBML
843    Elements" within the "EBML Header" MUST NOT use any "Element Data
844    Size" with a length greater than 4 octets.

846 12.2. EBML Body

848    All data of an "EBML Document" following the "EBML Header" is the
849    "EBML Body". The end of the "EBML Body", as well as the end of the
850    "EBML Document" that contains the "EBML Body", is
851    whichever comes first: the beginning of a new "EBML Header" at the
852    "Root Level" or the end of the file. The "EBML Body" MUST consist
853    only of "EBML Elements" and MUST NOT contain any data that is not
854    part of an "EBML Element". This document defines precisely what
855    "EBML Elements" are to be used within the "EBML Header", but does not
856    name or define what "EBML Elements" are to be used within the "EBML
857    Body". Which "EBML Elements" are to be used within
858    the "EBML Body" is defined by an "EBML Schema".

860 13. EBML Stream

862    An "EBML Stream" is a file that consists of one or many "EBML
863    Documents" that are concatenated together. An occurrence of an "EBML
864    Header" at the "Root Level" marks the beginning of an "EBML
865    Document".

867 14. Element Semantics

869 14.1. EBML Schema

871    An "EBML Schema" is an XML Document that defines the properties,
872    arrangement, and usage of "EBML Elements" that compose a specific
873    "EBML Document Type". The relationship of an "EBML Schema" to an
874    "EBML Document" may be considered analogous to the relationship of an
875    XML Schema [W3C.REC-xmlschema-0-20010502] to an XML Document
876    [W3C.REC-xml-20081126]. An "EBML Schema" MUST be clearly associated
877    with one or many "EBML Document Types". An "EBML Schema" must be
878    expressed as well-formed XML. An "EBML Document Type" is identified
879    by a string stored within the "EBML Header" in the "DocType Element";
880    for example "matroska" or "webm" (see Section 14.2.6). The "DocType"
881    value for an "EBML Document Type" SHOULD be unique and persistent.

883    An "EBML Schema" MUST declare exactly one "EBML Element" at "Root
884    Level" (referred to as the "Root Element") that MUST occur exactly
885    once within an "EBML Document". The "Void Element" MAY also occur at
886    "Root Level" but is not considered to be a "Root Element" (see
887    Section 14.3.1).

889    The "EBML Schema" MUST document all Elements of the "EBML Body". The
890    "EBML Schema" does not document "Global Elements" that are defined by
891    this document (namely the "Void Element" and the "CRC-32 Element").

893    An "EBML Schema" MAY constrain the use of "EBML Header Elements" (see
894    Section 14.2) by adding or constraining that Element's "range"
895    attribute. For example, an "EBML Schema" MAY constrain the
896    "EBMLMaxSizeLength" to a maximum value of "8" or MAY constrain the
897    "EBMLVersion" to only support a value of "1". If an "EBML Schema"
898    adopts the "EBML Header Element" as-is, then it is not REQUIRED to
899    document that Element within the "EBML Schema". If an "EBML Schema"
900    constrains the range of an "EBML Header Element", then that "Element"
901    MUST be documented within an "<element>" node of the "EBML Schema".
902    This document provides an example of an "EBML Schema", see
903    Section 14.1.11.

905 14.1.1. <EBMLSchema> Element

907    As an XML Document, the "EBML Schema" MUST use "<EBMLSchema>" as the
908    top level element. The "<EBMLSchema>" element MAY contain
909    "<element>" sub-elements.

911 14.1.2. <EBMLSchema> Attributes

913    Within an "EBML Schema" the "<EBMLSchema>" element uses the following
914    attributes:

916 14.1.2.1. docType

918    The "docType" lists the official name of the "EBML Document Type"
919    that is defined by the "EBML Schema"; for example, "<EBMLSchema
920    docType="matroska">".

922    The "docType" attribute is REQUIRED within the "<EBMLSchema>"
923    Element.

925 14.1.2.2. version

927    The "version" lists an incremental non-negative integer that
928    specifies the version of the docType documented by the "EBML Schema".
929    Unlike XML Schemas, an "EBML Schema" documents all versions of a
930    docType's definition rather than using separate "EBML Schemas" for
931    each version of a "docType". "EBML Elements" may be introduced and
932    deprecated by using the "minver" and "maxver" attributes of
933    "<element>".

935    The "version" attribute is REQUIRED within the "<EBMLSchema>"
936    Element.

938 14.1.3. <element> Element

940    Each "<element>" defines one "EBML Element" through the use of
941    several attributes that are defined in Section 14.1.4. "EBML
942    Schemas" MAY contain additional attributes to extend the semantics
943    but MUST NOT conflict with the definitions of the "<element>"
944    attributes defined within this document.

946    The "<element>" nodes contain a description of the meaning and use of
947    the "EBML Element" stored within one or many "<documentation>"
948    sub-elements and zero or one "<restriction>" sub-element. All
949    "<element>" nodes MUST be sub-elements of the "<EBMLSchema>".

951 14.1.4. <element> Attributes

953    Within an "EBML Schema" the "<element>" uses the following attributes
954    to define an "EBML Element":

956 14.1.4.1. name

958    The "name" provides the official human-readable name of the "EBML
959    Element". The value of the name MUST be in the form of characters
960    "A" to "Z", "a" to "z", "0" to "9", "-" and ".".

962    The "name" attribute is REQUIRED.

964 14.1.4.2.
path 966 The path defines the allowed storage locations of the "EBML Element" 967 within an "EBML Document". This path MUST be defined with the full 968 hierarchy of "EBML Elements" separated with a "/". The top "EBML 969 Element" in the path hierarchy being the first in the value. The 970 syntax of the "path" attribute is defined using this Augmented 971 Backus-Naur Form (ABNF) [RFC5234] with the case sensitive update 972 [RFC7405] notation: 974 The "path" attribute is REQUIRED. 976 EBMLFullPath = EBMLElementOccurrence "(" EBMLReferencePath ")" 977 EBMLReferencePath = [EBMLParentPath] EBMLElementPath 978 EBMLParentPath = EBMLFixedParent EBMLLastParent 979 EBMLFixedParent = *(EBMLPathAtom) 980 EBMLElementPath = EBMLPathAtom / EBMLPathAtomRecursive 981 EBMLPathAtom = PathDelimiter EBMLAtomName 982 EBMLPathAtomRecursive = "(1*(" EBMLPathAtom "))" 983 EBMLLastParent = EBMLPathAtom / EBMLVariableParent 984 EBMLVariableParent = "(" VariableParentOccurrence "\)" 985 EBMLAtomName = 1*(EBMLNameChar) 986 EBMLNameChar = ALPHA / DIGIT / "-" / "." 987 PathDelimiter = "\" 988 EBMLElementOccurrence = [EBMLMinOccurrence] "*" [EBMLMaxOccurrence] 989 EBMLMinOccurrence = 1*DIGIT 990 EBMLMaxOccurrence = 1*DIGIT 991 VariableParentOccurrence = [PathMinOccurrence] "*" [PathMaxOccurrence] 992 PathMinOccurrence = 1*DIGIT 993 PathMaxOccurrence = 1*DIGIT 995 The ""*"", ""("" and "")"" symbols MUST be interpreted as they are 996 defined in the ABNF. 998 The "EBMLPathAtom" part of the "EBMLElementPath" MUST be equal to the 999 "name" attribute of the "EBML Schema". 1001 The starting "PathDelimiter" of the path corresponds to the root of 1002 the "EBML Document". 1004 The "EBMLElementOccurrence" part is interpreted as an ABNF Variable 1005 Repetition. The repetition amounts correspond to how many times the 1006 "EBML Element" can be found in its "Parent Element". 1008 The "EBMLMinOccurrence" represents the minimum number of occurrences 1009 of this "EBML Element" within its "Parent Element". 
Each instance of 1010 the "Parent Element" MUST contain at least this many instances of 1011 this "EBML Element". If the "EBML Element" has an empty 1012 "EBMLParentPath" then "EBMLMinOccurrence" refers to constraints on 1013 the occurrence of the "EBML Element" within the "EBML Document". If 1014 "EBMLMinOccurrence" is not present then that "EBML Element" is 1015 considered to have a "EBMLMinOccurrence" value of 0. The semantic 1016 meaning of "EBMLMinOccurrence" within an "EBML Schema" is considered 1017 analogous to the meaning of "minOccurs" within an "XML Schema". 1018 "EBML Elements" with "EBMLMinOccurrence" set to "1" that also have a 1019 "default" value (see Section 14.1.4.8) declared are not REQUIRED to 1020 be stored but are REQUIRED to be interpreted, see Section 14.1.15. 1021 An "EBML Element" defined with a "EBMLMinOccurrence" value greater 1022 than zero is called a "Mandatory EBML Element". 1024 The "EBMLMaxOccurrence" represents the maximum number of occurrences 1025 of this "EBML Element" within its "Parent Element". Each instance of 1026 the "Parent Element" MUST contain at most this many instances of this 1027 "EBML Element". If the "EBML Element" has an empty "EBMLParentPath" 1028 then "EBMLMaxOccurrence" refers to constraints on the occurrence of 1029 the "EBML Element" within the "EBML Document". If 1030 "EBMLMaxOccurrence" is not present then that "EBML Element" is 1031 considered to have no maximum occurrence. The semantic meaning of 1032 "EBMLMaxOccurrence" within an "EBML Schema path" is considered 1033 analogous to the meaning of "maxOccurs" within an "XML Schema". 1035 The "VariableParentOccurrence" part is interpreted as an ABNF 1036 Variable Repetition. The repetition amounts correspond to the amount 1037 of unspecified "Parent Element" levels there can be between the 1038 "EBMLFixedParent" and the actual "EBMLElementPath". 
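As a non-normative illustration of the occurrence rules above, the following Python sketch splits a "path" value into its "EBMLMinOccurrence", "EBMLMaxOccurrence", and "EBMLReferencePath" parts. The function name `parse_full_path` and the regular expression are hypothetical and not defined by this document. The sketch only handles the outer "EBMLElementOccurrence" wrapper and does not validate the reference path against the ABNF.

```python
import re

# Outer form of EBMLFullPath: [min] "*" [max] "(" EBMLReferencePath ")"
_FULL_PATH = re.compile(r"^(\d*)\*(\d*)\((.*)\)$")


def parse_full_path(path: str):
    """Return (min_occurrence, max_occurrence, reference_path).

    A missing EBMLMinOccurrence is treated as 0, and a missing
    EBMLMaxOccurrence is returned as None, meaning no maximum.
    """
    match = _FULL_PATH.match(path)
    if match is None:
        raise ValueError("not of the form [min]*[max](path)")
    min_text, max_text, reference_path = match.groups()
    min_occurrence = int(min_text) if min_text else 0
    max_occurrence = int(max_text) if max_text else None
    return min_occurrence, max_occurrence, reference_path


def is_mandatory(path: str) -> bool:
    """A Mandatory EBML Element has an EBMLMinOccurrence greater than zero."""
    return parse_full_path(path)[0] > 0
```

For the "DocType Element" path "1*1(\EBML\DocType)" used later in this document, the sketch reports a minimum and maximum of 1. For the "Void Element" path "*((*\)\Void)" it reports a minimum of 0 and no maximum.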
1040 If the path contains a "EBMLPathAtomRecursive" part, the "EBML 1041 Element" can occur within itself recursively (see the 1042 Section 14.1.4.11). 1044 14.1.4.3. id 1046 The "Element ID" encoded as a "Variable Size Integer" expressed in 1047 hexadecimal notation prefixed by a "0x" that is read and stored in 1048 big-endian order. To reduce the risk of false positives while 1049 parsing "EBML Streams", the "Element IDs" of the "Root Element" and 1050 "Top-Level Elements" SHOULD be at least 4 octets in length. "Element 1051 IDs" defined for use at "Root Level" or directly under the "Root 1052 Level" MAY use shorter octet lengths to facilitate padding and 1053 optimize edits to "EBML Documents"; for instance, the "Void Element" 1054 uses an "Element ID" with a one octet length to allow its usage in 1055 more writing and editing scenarios. 1057 The "id" attribute is REQUIRED. 1059 14.1.4.4. minOccurs 1061 An integer expressing the minimum number of occurrences of this "EBML 1062 Element" within its "Parent Element". The "minOccurs" value MUST be 1063 equal to the "EBMLMinOccurrence" value of the "path". 1065 The "minOccurs" attribute is OPTIONAL. If the "minOccurs" attribute 1066 is not present then that "EBML Element" is considered to have a 1067 "minOccurs" value of 0. 1069 14.1.4.5. maxOccurs 1071 An integer expressing the maximum number of occurrences of this "EBML 1072 Element" within its "Parent Element". The "maxOccurs" value MUST be 1073 equal to the "EBMLMaxOccurrence" value of the "path". 1075 The "maxOccurs" attribute is OPTIONAL. If the "maxOccurs" attribute 1076 is not present then that "EBML Element" is considered to have no 1077 maximum occurrence, similar to "unbounded" in the XML world. 1079 14.1.4.6. range 1081 A numerical range for "EBML Elements" which are of numerical types 1082 (Unsigned Integer, Signed Integer, Float, and Date). If specified 1083 the value of the "EBML Element" MUST be within the defined range. 
1084 See Section 14.1.13 for rules applied to expression of range values. 1086 The "range" attribute is OPTIONAL. If the "range" attribute is not 1087 present then any value legal for the "type" attribute is valid. 1089 14.1.4.7. size 1091 A value to express the valid length of the "Element Data" as written 1092 measured in octets. The "size" provides a constraint in addition to 1093 the Length value of the definition of the corresponding "EBML Element 1094 Type". This "size" MUST be expressed as either a non-negative 1095 integer or a range (see Section 14.1.13) that consists of only non- 1096 negative integers and valid operators. 1098 The "size" attribute is OPTIONAL. If the "size" attribute is not 1099 present for that "EBML Element" then that "EBML Element" is only 1100 limited in size by the definition of the associated "EBML Element 1101 Type". 1103 14.1.4.8. default 1105 If an Element is mandatory (has a "EBMLMinOccurrence" value greater 1106 than zero) but not written within its "Parent Element" or stored as 1107 an "Empty Element", then the "EBML Reader" of the "EBML Document" 1108 MUST semantically interpret the "EBML Element" as present with this 1109 specified default value for the "EBML Element". "EBML Elements" that 1110 are "Master Elements" MUST NOT declare a "default" value. "EBML 1111 Elements" with a "minOccurs" value greater than 1 MUST NOT declare a 1112 "default" value. 1114 The "default" attribute is OPTIONAL. 1116 14.1.4.9. type 1118 The "type" MUST be set to one of the following values: 'integer' 1119 (signed integer), 'uinteger' (unsigned integer), 'float', 'string', 1120 'date', 'utf-8', 'master', or 'binary'. The content of each "type" 1121 is defined within Section 9. 1123 The "type" attribute is REQUIRED. 1125 14.1.4.10. unknownsizeallowed 1127 A boolean to express if an "EBML Element" MAY be used as an "Unknown- 1128 Sized Element" (having all "VINT_DATA" bits of "Element Data Size" 1129 set to 1). 
"EBML Elements" that are not "Master Elements" MUST NOT 1130 set "unknownsizeallowed" to true. An "EBML Element" that is defined 1131 with an "unknownsizeallowed" attribute set to 1 MUST also have the 1132 "unknownsizeallowed" attribute of its "Parent Element" set to 1. 1134 The "unknownsizeallowed" attribute is OPTIONAL. If the 1135 "unknownsizeallowed" attribute is not used then that "EBML Element" 1136 is not allowed to use an unknown "Element Data Size". 1138 14.1.4.11. recursive 1140 A boolean to express if an "EBML Element" MAY be stored recursively. 1141 In this case the "EBML Element" MAY be stored within another "EBML 1142 Element" that has the same "Element ID". Which itself can be stored 1143 in an "EBML Element" that has the same "Element ID", and so on. 1144 "EBML Elements" that are not "Master Elements" MUST NOT set 1145 "recursive" to true. 1147 If the "path" contains a "EBMLPathAtomRecursive" part then the 1148 "recursive" value MUST be true and false otherwise. 1150 The "recursive" attribute is OPTIONAL. If the "recursive" attribute 1151 is not present then the "EBML Element" MUST NOT be used recursively. 1153 14.1.4.12. minver 1155 The "minver" (minimum version) attribute stores a non-negative 1156 integer that represents the first version of the "docType" to support 1157 the "EBML Element". 1159 The "minver" attribute is OPTIONAL. If the "minver" attribute is not 1160 present then the "EBML Element" has a minimum version of "1". 1162 14.1.4.13. maxver 1164 The "maxver" (maximum version) attribute stores a non-negative 1165 integer that represents the last or most recent version of the 1166 "docType" to support the element. "maxver" MUST be greater than or 1167 equal to "minver". 1169 The "maxver" attribute is OPTIONAL. If the "maxver" attribute is not 1170 present then the "EBML Element" has a maximum version equal to the 1171 value stored in the "version" attribute of "". 1173 14.1.5. 
Element 1175 The "" element provides additional information about 1176 the "EBML Element". 1178 14.1.6. Attributes 1180 14.1.6.1. lang 1182 A "lang" attribute which is set to the [RFC5646] value of the 1183 language of the element's documentation. 1185 The "lang" attribute is OPTIONAL. 1187 14.1.6.2. type 1189 A "type" attribute distinguishes the meaning of the documentation. 1190 Values for the "" sub-element's "type" attribute MUST 1191 include one of the following: "definition", "rationale", "usage 1192 notes", and "references". 1194 The "type" attribute is OPTIONAL. 1196 14.1.7. Element 1198 The "" element provides information about restrictions 1199 to the allowable values for the "EBML Element" which are listed in 1200 "" elements. 1202 14.1.8. Element 1204 The "" element stores a list of values allowed for storage in 1205 the "EBML Element". The values MUST match the "type" of the "EBML 1206 Element" (for example "" can not be a valid value 1207 for a "EBML Element" that is defined as an unsigned integer). An 1208 "" element MAY also store "" elements to further 1209 describe the "". 1211 14.1.9. Attributes 1213 14.1.9.1. label 1215 The "label" provides a concise expression for human consumption that 1216 describes what the "value" of the "" represents. 1218 The "label" attribute is OPTIONAL. 1220 14.1.9.2. value 1222 The "value" represents data that MAY be stored within the "EBML 1223 Element". 1225 The "value" attribute is REQUIRED. 1227 14.1.10. XML Schema for EBML Schema 1229 1230 1234 1235 1236 1237 1239 1240 1241 1242 1243 1244 1245 1247 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1268 1269 1270 1271 1272 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1287 14.1.11. EBML Schema Example 1289 1290 1292 1293 1296 1299 1300 1301 Container of data and 1302 attributes representing one or many files. 1303 1304 1306 1307 An attached file. 1308 1309 1310 1313 1314 Filename of the attached file. 
1315 1316 1317 1320 1321 MIME type of the file. 1322 1323 1324 1327 1328 Modification timestamp of the file. 1329 1330 1331 1333 1334 The data of the file. 1335 1336 1337 1339 14.1.12. Identically Recurring Elements 1341 An "Identically Recurring Element" is an "EBML Element" that MAY 1342 occur within its "Parent Element" more than once but that each 1343 recurrence within that "Parent Element" MUST be identical both in 1344 storage and semantics. "Identically Recurring Elements" are 1345 permitted to be stored multiple times within the same "Parent 1346 Element" in order to increase data resilience and optimize the use of 1347 "EBML" in transmission. For instance a pertinent "Top-Level Element" 1348 could be periodically resent within a data stream so that an "EBML 1349 Reader" which starts reading the stream from the middle could better 1350 interpret the contents. "Identically Recurring Elements" SHOULD 1351 include a "CRC-32 Element" as a "Child Element"; this is especially 1352 recommended when "EBML" is used for long-term storage or 1353 transmission. If a "Parent Element" contains more than one copy of 1354 an "Identically Recurring Element" which includes a "CRC-32 Element" 1355 as a "Child Element" then the first instance of the "Identically 1356 Recurring Element" with a valid CRC-32 value should be used for 1357 interpretation. If a "Parent Element" contains more than one copy of 1358 an "Identically Recurring Element" which does not contain a "CRC-32 1359 Element" or if "CRC-32 Elements" are present but none are valid then 1360 the first instance of the "Identically Recurring Element" should be 1361 used for interpretation. 1363 14.1.13. Expression of range 1365 The "range" attribute MUST only be used with "EBML Elements" that are 1366 either "signed integer", "unsigned integer", "float", or "date". The 1367 "range" expression may contain whitespace for readability but 1368 whitespace within a "range" expression MUST NOT convey meaning. 
The 1369 expression of the "range" MUST adhere to one of the following forms: 1371 o "x-y" where x and y are integers or floats and "y" MUST be greater 1372 than "x", meaning that the value MUST be greater than or equal to 1373 "x" and less than or equal to "y". "x" MUST be less than "y". 1375 o ">x" where "x" is an integer or float, meaning that the value MUST 1376 be greater than "x". 1378 o ">=x" where "x" is an integer or float, meaning that the value 1379 MUST be greater than or equal to "x". 1381 o "=4 1537 default: 4 1539 type: Unsigned Integer 1541 description: The "EBMLMaxIDLength Element" stores the maximum length 1542 in octets of the "Element IDs" to be found within the "EBML Body". 1543 An "EBMLMaxIDLength Element" value of four is RECOMMENDED, though 1544 larger values are allowed. 1546 14.2.5. EBMLMaxSizeLength Element 1548 name: "EBMLMaxSizeLength" 1550 path: "1*1(\EBML\EBMLMaxSizeLength)" 1552 id "0x42F3" 1554 minOccurs: 1 1556 maxOccurs: 1 1558 range: not 0 1560 default: 8 1562 type: Unsigned Integer 1564 description: The "EBMLMaxSizeLength Element" stores the maximum 1565 length in octets of the expression of all "Element Data Sizes" to be 1566 found within the "EBML Body". To be clear the "EBMLMaxSizeLength 1567 Element" documents the maximum 'length' of all "Element Data Size" 1568 expressions within the "EBML Body" and not the maximum 'value' of all 1569 "Element Data Size" expressions within the "EBML Body". "EBML 1570 Elements" that have an "Element Data Size" expression which is larger 1571 in octets than what is expressed by "EBMLMaxSizeLength ELEMENT" SHALL 1572 be considered invalid. 1574 14.2.6. DocType Element 1576 name: "DocType" 1578 path: "1*1(\EBML\DocType)" 1580 id "0x4282" 1582 minOccurs: 1 1584 maxOccurs: 1 1586 size: >0 1588 type: String 1590 description: A string that describes and identifies the content of 1591 the "EBML Body" that follows this "EBML Header". 1593 14.2.7. 
DocTypeVersion Element 1595 name: "DocTypeVersion" 1597 path: "1*1(\EBML\DocTypeVersion)" 1599 id "0x4287" 1601 minOccurs: 1 1603 maxOccurs: 1 1605 default: 1 1607 type: Unsigned Integer 1609 description: The version of "DocType" interpreter used to create the 1610 "EBML Document". 1612 14.2.8. DocTypeReadVersion Element 1614 name: DocTypeReadVersion 1616 path: "1*1(\EBML\DocTypeReadVersion)" 1618 id "0x4285" 1620 minOccurs: 1 1622 maxOccurs: 1 1624 default: 1 1626 type: Unsigned Integer 1628 description: The minimum "DocType" version an "EBML Reader" has to 1629 support to read this "EBML Document". The value of the 1630 "DocTypeReadVersion Element" MUST be less than or equal to the value 1631 of the "DocTypeVersion Element". 1633 14.3. Global elements (used everywhere in the format) 1635 name: CRC-32 1637 path: "*1((1*\)\CRC-32)" 1639 id: "0xBF" 1641 minOccurs: 0 1643 maxOccurs: 1 1645 size: 4 1647 type: Binary 1649 description: The "CRC-32 Element" contains a 32-bit Cyclic Redundancy 1650 Check value of all the "Element Data" of the "Parent Element" as 1651 stored except for the "CRC-32 Element" itself. When the "CRC-32 1652 Element" is present, the "CRC-32 Element" MUST be the first ordered 1653 "EBML Element" within its "Parent Element" for easier reading. All 1654 "Top-Level Elements" of an "EBML Document" that are "Master Elements" 1655 SHOULD include a "CRC-32 Element" as a "Child Element". The CRC in 1656 use is the IEEE-CRC-32 algorithm as used in the [ISO.3309.1979] 1657 standard and in section 8.1.1.6.2 of [ITU.V42.1994], with initial 1658 value of "0xFFFFFFFF". The CRC value MUST be computed on a little 1659 endian bitstream and MUST use little endian storage. 1661 14.3.1. Void Element 1663 name: Void 1665 path: "*((*\)\Void)" 1667 id: "0xEC" 1669 minOccurs: 0 1671 type: Binary 1673 description: Used to void damaged data, to avoid unexpected behaviors 1674 when using damaged data. The content is discarded. 
   Also used to reserve space in a sub-element for later use.

15.  References

15.1.  Normative References

   [IEEE.754.1985]
              Institute of Electrical and Electronics Engineers,
              "Standard for Binary Floating-Point Arithmetic",
              IEEE Standard 754, August 1985.

   [ISO.3309.1979]
              International Organization for Standardization, "Data
              communication - High-level data link control procedures -
              Frame structure", ISO Standard 3309, 1979.

   [ISO.9899.2011]
              International Organization for Standardization,
              "Programming languages - C", ISO Standard 9899, 2011.

   [ITU.V42.1994]
              International Telecommunication Union, "Error-correcting
              Procedures for DCEs Using Asynchronous-to-Synchronous
              Conversion", ITU-T Recommendation V.42, 1994.

   [RFC0020]  Cerf, V., "ASCII format for network interchange", STD 80,
              RFC 20, DOI 10.17487/RFC0020, October 1969.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
              Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
              2003.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008.

   [RFC5646]  Phillips, A., Ed. and M. Davis, Ed., "Tags for Identifying
              Languages", BCP 47, RFC 5646, DOI 10.17487/RFC5646,
              September 2009.

   [RFC7405]  Kyzivat, P., "Case-Sensitive String Support in ABNF",
              RFC 7405, DOI 10.17487/RFC7405, December 2014.

   [W3C.REC-xml-20081126]
              Bray, T., Paoli, J., Sperberg-McQueen, M., Maler, E., and
              F.
              Yergeau, "Extensible Markup Language (XML) 1.0 (Fifth
              Edition)", World Wide Web Consortium Recommendation
              REC-xml-20081126, November 2008.

15.2.  Informative References

   [W3C.REC-xmlschema-0-20010502]
              Fallside, D., "XML Schema Part 0: Primer", World Wide Web
              Consortium Recommendation REC-xmlschema-0-20010502, May
              2001.

Authors' Addresses

   Steve Lhomme

   Email: slhomme@matroska.org

   Dave Rice

   Email: dave@dericed.com

   Moritz Bunkus

   Email: moritz@bunkus.org