cellar                                                     M. Richardson
Internet-Draft
Intended status: Informational                                 A. Weaver
Expires: 2 May 2022                                      29 October 2021


                        Free Lossless Audio Codec
                        draft-ietf-cellar-flac-02

Abstract

   This document defines FLAC, which stands for Free Lossless Audio
   Codec, a free, open source codec for lossless audio compression and
   decompression.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 2 May 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document. Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Notation and Conventions
   3.  Acknowledgments
   4.  Scope
   5.  Architecture
   6.  Definitions
   7.  Blocking
   8.  Interchannel Decorrelation
   9.  Prediction
   10. Residual Coding
   11. Format
     11.1.  Principles
     11.2.  Overview
     11.3.  Subset
     11.4.  Conventions
     11.5.  STREAM
     11.6.  METADATA_BLOCK
     11.7.  METADATA_BLOCK_HEADER
     11.8.  BLOCK_TYPE
     11.9.  METADATA_BLOCK_DATA
     11.10. METADATA_BLOCK_STREAMINFO
     11.11. METADATA_BLOCK_PADDING
     11.12. METADATA_BLOCK_APPLICATION
     11.13. METADATA_BLOCK_SEEKTABLE
     11.14. SEEKPOINT
     11.15. METADATA_BLOCK_VORBIS_COMMENT
     11.16. METADATA_BLOCK_CUESHEET
     11.17. CUESHEET_TRACK
     11.18. CUESHEET_TRACK_INDEX
     11.19. METADATA_BLOCK_PICTURE
     11.20. PICTURE_TYPE
     11.21. FRAME
     11.22. FRAME_HEADER
       11.22.1.  FRAME HEADER RESERVED
       11.22.2.  BLOCKING STRATEGY
       11.22.3.  INTERCHANNEL SAMPLE BLOCK SIZE
       11.22.4.  SAMPLE RATE
       11.22.5.  CHANNEL ASSIGNMENT
       11.22.6.  SAMPLE SIZE
       11.22.7.  FRAME HEADER RESERVED2
       11.22.8.  CODED NUMBER
       11.22.9.  BLOCK SIZE INT
       11.22.10. SAMPLE RATE INT
       11.22.11. FRAME CRC
     11.23. FRAME_FOOTER
     11.24. SUBFRAME
     11.25. SUBFRAME_HEADER
       11.25.1.  SUBFRAME TYPE
       11.25.2.  WASTED BITS PER SAMPLE FLAG
     11.26. SUBFRAME_CONSTANT
     11.27. SUBFRAME_FIXED
     11.28. SUBFRAME_LPC
     11.29. SUBFRAME_VERBATIM
     11.30. RESIDUAL
       11.30.1.  RESIDUAL_CODING_METHOD
       11.30.2.  RESIDUAL_CODING_METHOD_PARTITIONED_EXP_GOLOMB
       11.30.3.  RESIDUAL_CODING_METHOD_PARTITIONED_EXP_GOLOMB2
       11.30.4.  ENCODED RESIDUAL
   12. Security Considerations
   13. Normative References
   14. Informative References
   Authors' Addresses

1.  Introduction

   This is a detailed description of the FLAC format. There is also a
   companion document that describes FLAC-to-Ogg mapping
   (https://xiph.org/flac/ogg_mapping.html).

   For a user-oriented overview, see About the FLAC Format
   (https://xiph.org/flac/documentation_format_overview.html).

2.  Notation and Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Acknowledgments

   FLAC owes much to the many people who have advanced the audio
   compression field so freely. For instance:

   *  A. J. Robinson (http://svr-www.eng.cam.ac.uk/~ajr/) for his work
      on Shorten (http://svr-www.eng.cam.ac.uk/reports/abstracts/
      robinson_tr156.html); his paper is a good starting point on some
      of the basic methods used by FLAC. FLAC trivially extends and
      improves the fixed predictors, LPC coefficient quantization, and
      Exponential-Golomb coding used in Shorten.

   *  S. W. Golomb
      (https://web.archive.org/web/20040215005354/http://csi.usc.edu/
      faculty/golomb.html) and Robert F. Rice; their universal codes
      are used by FLAC's entropy coder.

   *  N. Levinson and J. Durbin; the reference encoder uses an
      algorithm developed and refined by them for determining the LPC
      coefficients from the autocorrelation coefficients.

   *  And of course, Claude Shannon
      (http://en.wikipedia.org/wiki/Claude_Shannon)

4.  Scope

   FLAC stands for Free Lossless Audio Codec: it is designed to reduce
   the amount of computer storage space needed to store digital audio
   signals without needing to remove information in doing so (i.e.
   lossless). FLAC is free in the sense that its specification is open,
   its reference implementation is open-source and it is not encumbered
   by any known patent.

   FLAC is able to achieve lossless compression because samples in audio
   signals tend to be highly correlated with their close neighbors.
   In contrast with general purpose compressors, which often use
   dictionaries, do run-length coding or exploit long-term repetition,
   FLAC removes redundancy solely in the very short term, looking back
   at most 32 samples.

   The FLAC format is suited for pulse-code modulated (PCM) audio with 1
   to 8 channels, sample rates from 1 to 1048576 Hertz and bit depths
   between 4 and 32 bits. Most tools for reading and writing the FLAC
   format have been optimized for CD-audio, which is PCM audio with 2
   channels, a sample rate of 44.1 kHz and a bit depth of 16 bits.

   Compared to other lossless (audio) coding formats, FLAC is a format
   with low complexity and can be encoded and decoded with few
   computing resources. FLAC decoding has seen many independent
   implementations on many different platforms, and both encoding and
   decoding can be implemented without floating-point arithmetic.

   The coding methods provided by the FLAC format work best on PCM
   audio signals whose samples have a signed representation and are
   centered around zero. Audio signals in which samples have an
   unsigned representation must be transformed to a signed
   representation as described in this document in order to achieve
   reasonable compression. The FLAC format is not suited to compressing
   audio that is not PCM. Pulse-density modulated audio, e.g. DSD,
   cannot be compressed by FLAC.

5.  Architecture

   Similar to many audio coders, a FLAC encoder has the following
   stages:

   *  Blocking (see Section 7). The input is broken up into many
      contiguous blocks. With FLAC, the blocks MAY vary in size. The
      optimal size of the block is usually affected by many factors,
      including the sample rate, spectral characteristics over time,
      etc. Though FLAC allows the block size to vary within a stream,
      the reference encoder uses a fixed block size.

   *  Interchannel Decorrelation (see Section 8). In the case of stereo
      streams, the encoder will create mid and side signals based on
      the average and difference (respectively) of the left and right
      channels. The encoder will then pass the best form of the signal
      to the next stage.

   *  Prediction (see Section 9). The block is passed through a
      prediction stage where the encoder tries to find a mathematical
      description (usually an approximate one) of the signal. This
      description is typically much smaller than the raw signal itself.
      Since the methods of prediction are known to both the encoder and
      decoder, only the parameters of the predictor need be included in
      the compressed stream. FLAC currently uses four different classes
      of predictors, but the format has reserved space for additional
      methods. FLAC allows the class of predictor to change from block
      to block, or even within the channels of a block.

   *  Residual Coding (see Section 10). If the predictor does not
      describe the signal exactly, the difference between the original
      signal and the predicted signal (called the error or residual
      signal) MUST be coded losslessly. If the predictor is effective,
      the residual signal will require fewer bits per sample than the
      original signal. FLAC currently uses only one method for encoding
      the residual, but the format has reserved space for additional
      methods. FLAC allows the residual coding method to change from
      block to block, or even within the channels of a block.

   In addition, FLAC specifies a metadata system, which allows arbitrary
   information about the stream to be included at the beginning of the
   stream.

6.  Definitions

   *  *Block*: A (short) section of linear pulse-code modulated audio,
      with one or more channels.

   *  *Subblock*: All samples within a corresponding block for 1
      channel. One or more subblocks form a block, and all subblocks in
      a certain block contain the same number of samples.

   *  *Frame*: A frame header plus one or more subframes. It encodes
      the contents of a corresponding block.

   *  *Subframe*: An encoded subblock. All subframes within a frame
      code for the same number of samples. A subframe MAY correspond to
      a single subblock; otherwise it corresponds to either the
      addition or subtraction of two subblocks (see Section 8).

   *  *Blocksize*: The total number of samples contained in a block or
      coded in a frame, divided by the number of channels. In other
      words, the number of samples in any subblock of a block, or any
      subframe of a frame. This is also called *interchannel samples*.

   *  *Bit depth* or *bits per sample*: The number of bits used to
      contain each sample. This MUST be the same for all subblocks in a
      block but MAY be different for different subframes in a frame
      because of interchannel decorrelation (see Section 8).

   *  *Predictor*: A model used to predict samples in an audio signal
      based on past samples. FLAC uses such predictors to remove
      redundancy in a signal in order to be able to compress it.

   *  *Linear predictor*: A predictor using linear prediction
      (https://en.wikipedia.org/wiki/Linear_prediction). This is also
      called *linear predictive coding (LPC)*. With a linear predictor
      each prediction is a linear combination of past samples, hence the
      name. A linear predictor has a causal discrete-time finite
      impulse response (https://en.wikipedia.org/wiki/
      Finite_impulse_response).

   *  *Fixed predictor*: A linear predictor in which the model
      parameters are the same across all FLAC files, and thus do not
      need to be stored.

   *  *Predictor order*: The number of past samples that a predictor
      uses. For example, a 4th order predictor uses the 4 samples
      directly preceding a certain sample to predict it. In FLAC,
      samples used in a predictor are always consecutive, and are always
      the samples directly before the sample that is being predicted.

   *  *Residual*: The audio signal that remains after a predictor has
      been subtracted from a subblock. If the predictor has been able
      to remove redundancy from the signal, the samples of the remaining
      signal (the *residual samples*) will have, on average, a smaller
      numerical value than the original signal.

   *  *Rice code*: A variable-length code
      (https://en.wikipedia.org/wiki/Variable-length_code) which
      compresses data by making use of the observation that, after using
      an effective predictor, most residual samples are closer to zero
      than the original samples, while still allowing for a small part
      of the samples to be much larger.

7.  Blocking

   The size used for blocking the audio data has a direct effect on the
   compression ratio. If the block size is too small, the resulting
   large number of frames means that excess bits will be wasted on frame
   headers. If the block size is too large, the characteristics of the
   signal may vary so much that the encoder will be unable to find a
   good predictor. In order to simplify encoder/decoder design, FLAC
   imposes a minimum block size of 16 samples, and a maximum block size
   of 65535 samples. This range covers the optimal size for all of the
   audio data FLAC supports.

   Currently the reference encoder uses a fixed block size, optimized on
   the sample rate of the input. Future versions MAY vary the block
   size depending on the characteristics of the signal.

   Blocked data is passed to the predictor stage one subblock (channel)
   at a time.
   Each subblock is independently coded into a subframe, and the
   subframes are concatenated into a frame. Because each channel is
   coded separately, one channel of a stereo frame MAY be encoded as a
   constant subframe, and the other an LPC subframe.

8.  Interchannel Decorrelation

   In many audio files, channels are correlated. The FLAC format can
   exploit this correlation in stereo files by not directly coding
   subblocks into subframes, but instead coding the average of
   corresponding samples in both subblocks (a mid channel) or the
   difference between corresponding samples in both subblocks (a side
   channel). The following combinations are possible:

   *  *Independent*. All channels are coded independently. All
      non-stereo files MUST be encoded this way.

   *  *Mid-side*. A left and right subblock are converted to mid and
      side subframes. To calculate a sample for a mid subframe, the
      corresponding left and right samples are summed and the result is
      shifted right by 1 bit. To calculate a sample for a side
      subframe, the corresponding right sample is subtracted from the
      corresponding left sample. On decoding, the mid channel has to be
      shifted left by 1 bit. Also, if the side channel sample is odd, 1
      has to be added to the mid channel sample after the left shift.
      To reconstruct the left channel, the corresponding samples in the
      mid and side subframes are added and the result shifted right by
      1 bit, while for the right channel the side channel has to be
      subtracted from the mid channel and the result shifted right by 1
      bit.

   *  *Left-side*. The left subblock is coded and the left and right
      subblocks are used to code a side subframe. The side subframe is
      constructed in the same way as for mid-side. To decode, the right
      subblock is restored by subtracting the samples in the side
      subframe from the corresponding samples in the left subframe.

   *  *Right-side*. The right subblock is coded and the left and right
      subblocks are used to code a side subframe. Note that the actual
      coded subframe order is side-right. The side subframe is
      constructed in the same way as for mid-side. To decode, the left
      subblock is restored by adding the samples in the side subframe to
      the corresponding samples in the right subframe.

   The side channel needs one extra bit of bit depth as the subtraction
   can produce sample values twice as large as the maximum possible in
   any given bit depth. The mid channel in mid-side stereo does not
   need one extra bit, as it is shifted right by one bit. The right
   shift of the mid channel does not make the coding lossy, because an
   odd sum of a left and right sample is always accompanied by an odd
   sample in the side subframe, which means the lost least significant
   bit can be restored by taking it from the sample in the side
   subframe.

9.  Prediction

   FLAC uses four methods for modeling the input signal:

   1.  *Verbatim*. This is essentially a zero-order predictor of the
       signal. The predicted signal is zero, meaning the residual is
       the signal itself, and the compression is zero. This is the
       baseline against which the other predictors are measured. If you
       feed random data to the encoder, the verbatim predictor will
       probably be used for every subblock. Since the raw signal is not
       actually passed through the residual coding stage (it is added to
       the stream 'verbatim'), the encoding results will not be the same
       as a zero-order linear predictor.

   2.  *Constant*. This predictor is used whenever the subblock is pure
       DC ("digital silence"), i.e. a constant value throughout. The
       signal is run-length encoded and added to the stream.

   3.  *Fixed linear predictor*.
       FLAC uses a class of computationally efficient fixed linear
       predictors (for a good description, see audiopak
       (http://www.hpl.hp.com/techreports/1999/HPL-1999-144.pdf) and
       shorten (http://svr-www.eng.cam.ac.uk/reports/abstracts/
       robinson_tr156.html)). FLAC adds a fourth-order predictor to the
       zero-to-third-order predictors used by Shorten. Since the
       predictors are fixed, the predictor order is the only parameter
       that needs to be stored in the compressed stream. The error
       signal is then passed to the residual coder.

   4.  *FIR Linear prediction*. For more accurate modeling (at a cost of
       slower encoding), FLAC supports up to 32nd order FIR linear
       prediction (again, for information on linear prediction, see
       audiopak
       (http://www.hpl.hp.com/techreports/1999/HPL-1999-144.pdf) and
       shorten (http://svr-www.eng.cam.ac.uk/reports/abstracts/
       robinson_tr156.html)). The reference encoder uses the
       Levinson-Durbin method for calculating the LPC coefficients from
       the autocorrelation coefficients, and the coefficients are
       quantized before computing the residual. Whereas encoders such
       as Shorten used a fixed quantization for the entire input, FLAC
       allows the quantized coefficient precision to vary from subframe
       to subframe. The FLAC reference encoder estimates the optimal
       precision to use based on the block size and dynamic range of the
       original signal.

10.  Residual Coding

   FLAC uses Exponential-Golomb (a variant of Rice) coding as its
   residual encoder. You can learn more about exp-golomb coding
   (https://en.wikipedia.org/wiki/Exponential-Golomb_coding) on
   Wikipedia.

   FLAC currently defines two similar methods for the coding of the
   error signal from the prediction stage. The error signal is coded
   using Exponential-Golomb codes in one of two ways:

   1.  the encoder estimates a single exp-golomb parameter based on the
       variance of the residual and exp-golomb codes the entire residual
       using this parameter;

   2.  the residual is partitioned into several equal-length regions of
       contiguous samples, and each region is coded with its own
       exp-golomb parameter based on the region's mean.

   (Note that the first method is a special case of the second method
   with one partition, except the exp-golomb parameter is based on the
   residual variance instead of the mean.)

   The FLAC format has reserved space for other coding methods. Some
   possibilities for volunteers would be to explore better context-
   modeling of the exp-golomb parameter, or Huffman coding. See LOCO-I
   (http://www.hpl.hp.com/techreports/98/HPL-98-193.html) and pucrunch
   (http://web.archive.org/web/20140827133312/http://www.cs.tut.fi/
   ~albert/Dev/pucrunch/packing.html) for descriptions of several
   universal codes.

11.  Format

   This section specifies the FLAC bitstream format.

11.1.  Principles

   FLAC has no format version information, but it does contain reserved
   space in several places. Future versions of the format MAY use this
   reserved space safely without breaking the format of older streams.
   Older decoders MAY choose to abort decoding or skip data encoded with
   newer methods. Apart from reserved patterns, the format in places
   specifies invalid patterns, meaning that the patterns MUST NOT
   appear in any valid bitstream, in any prior, present, or future
   versions of the format. These invalid patterns are usually used to
   make the synchronization mechanism more robust.

   All numbers used in a FLAC bitstream MUST be integers; there are no
   floating-point representations. All numbers MUST be big-endian
   coded, except the length field used in Vorbis comments, which MUST be
   little-endian coded.
   All numbers MUST be unsigned except linear predictor coefficients,
   the linear prediction shift and numbers which directly represent
   samples, which MUST be signed. None of these restrictions apply to
   application metadata blocks.

   All samples encoded to and decoded from the FLAC format MUST be in a
   signed representation.

   There are several ways to convert unsigned sample representations to
   signed sample representations, but the coding methods provided by the
   FLAC format work best on audio signals of which the numerical values
   of the samples are centered around zero, i.e. have no DC offset. In
   most unsigned audio formats, signals are centered around the
   midpoint of the range of the unsigned integer type used. If that is
   the case, all sample representations SHOULD be converted by first
   copying the number to a signed integer with sufficient range and
   then subtracting half of the range of the unsigned integer type,
   which should result in a signal with samples centered around 0.

11.2.  Overview

   Before the formal description of the stream, an overview might be
   helpful.

   *  A FLAC bitstream consists of the "fLaC" (i.e. 0x664C6143) marker
      at the beginning of the stream, followed by a mandatory metadata
      block (called the STREAMINFO block), any number of other metadata
      blocks, then the audio frames.

   *  FLAC supports up to 128 kinds of metadata blocks; currently the
      following are defined:

      -  STREAMINFO: This block has information about the whole stream,
         like sample rate, number of channels, total number of samples,
         etc. It MUST be present as the first metadata block in the
         stream. Other metadata blocks MAY follow; a decoder skips any
         that it does not understand.

      -  PADDING: This block allows for an arbitrary amount of padding.
         The contents of a PADDING block have no meaning.
         This block is useful when it is known that metadata will be
         edited after encoding; the user can instruct the encoder to
         reserve a PADDING block of sufficient size so that when
         metadata is added, it will simply overwrite the padding (which
         is relatively quick) instead of having to insert it into the
         right place in the existing file (which would normally require
         rewriting the entire file).

      -  APPLICATION: This block is for use by third-party applications.
         The only mandatory field is a 32-bit identifier. This ID is
         granted upon request to an application by the FLAC maintainers.
         The remainder of the block is defined by the registered
         application. Visit the registration page
         (https://xiph.org/flac/id.html) if you would like to register
         an ID for your application with FLAC.

      -  SEEKTABLE: This is an OPTIONAL block for storing seek points.
         It is possible to seek to any given sample in a FLAC stream
         without a seek table, but the delay can be unpredictable since
         the bitrate may vary widely within a stream. By adding seek
         points to a stream, this delay can be significantly reduced.
         Each seek point takes 18 bytes, so 1% resolution within a
         stream adds less than 2 kilobytes. There can be only one
         SEEKTABLE in a stream, but the table can have any number of
         seek points. There is also a special 'placeholder' seekpoint
         which will be ignored by decoders but which can be used to
         reserve space for future seek point insertion.

      -  VORBIS_COMMENT: This block is for storing a list of human-
         readable name/value pairs. Values are encoded using UTF-8. It
         is an implementation of the Vorbis comment specification
         (http://xiph.org/vorbis/doc/v-comment.html) (without the
         framing bit). This is the only officially supported tagging
         mechanism in FLAC. There MUST NOT be more than one
         VORBIS_COMMENT block in a stream.
         In some external documentation, Vorbis comments are called
         FLAC tags to lessen confusion.

      -  CUESHEET: This block is for storing various information that
         can be used in a cue sheet. It supports track and index
         points, compatible with Red Book CD digital audio discs, as
         well as other CD-DA metadata such as media catalog number and
         track ISRCs. The CUESHEET block is especially useful for
         backing up CD-DA discs, but it can be used as a general purpose
         cueing mechanism for playback.

      -  PICTURE: This block is for storing pictures associated with the
         file, most commonly cover art from CDs. There MAY be more than
         one PICTURE block in a file. The picture format is similar to
         the APIC frame in ID3v2 (http://www.id3.org/id3v2.4.0-frames).
         The PICTURE block has a type, MIME type, and UTF-8 description
         like ID3v2, and supports external linking via URL (though this
         is discouraged). The differences are that there is no
         uniqueness constraint on the description field, and the MIME
         type is mandatory. The FLAC PICTURE block also includes the
         resolution, color depth, and palette size so that the client
         can search for a suitable picture without having to scan them
         all.

   *  The audio data is composed of one or more audio frames. Each
      frame consists of a frame header, which contains a sync code,
      information about the frame like the block size, sample rate,
      number of channels, et cetera, and an 8-bit CRC. The frame header
      also contains either the sample number of the first sample in the
      frame (for variable-blocksize streams), or the frame number (for
      fixed-blocksize streams). This allows for fast, sample-accurate
      seeking to be performed. Following the frame header are encoded
      subframes, one for each channel, and finally, the frame is zero-
      padded to a byte boundary. Each subframe has its own header that
      specifies how the subframe is encoded.

   *  Since a decoder may start decoding in the middle of a stream,
      there MUST be a method to determine the start of a frame. A
      14-bit sync code begins each frame. The sync code will not appear
      anywhere else in the frame header. However, since it may appear
      in the subframes, the decoder has two other ways of ensuring a
      correct sync. The first is to check that the rest of the frame
      header contains no invalid data. Even this is not foolproof since
      valid header patterns can still occur within the subframes. The
      decoder's final check is to generate an 8-bit CRC of the frame
      header and compare this to the CRC stored at the end of the frame
      header.

   *  Again, since a decoder may start decoding at an arbitrary frame in
      the stream, each frame header MUST contain some basic information
      about the stream because the decoder might not have access to the
      STREAMINFO metadata block at the start of the stream. This
      information includes sample rate, bits per sample, number of
      channels, etc. Since the frame header is pure overhead, it has a
      direct effect on the compression ratio. To keep the frame header
      as small as possible, FLAC uses lookup tables for the most
      commonly used values for frame parameters. For instance, the
      sample rate part of the frame header is specified using 4 bits.
      Eight of the bit patterns correspond to the commonly used sample
      rates of 8, 16, 22.05, 24, 32, 44.1, 48 or 96 kHz. However, odd
      sample rates can be specified by using one of the 'hint' bit
      patterns, directing the decoder to find the exact sample rate at
      the end of the frame header. The same method is used for
      specifying the block size and bits per sample. In this way, the
      frame header size stays small for all of the most common forms of
      audio data.

   *  Individual subframes (one for each channel) are coded separately
      within a frame, and appear serially in the stream.
In other 605 words, the encoded audio data is NOT channel-interleaved. This 606 reduces decoder complexity at the cost of requiring larger decode 607 buffers. Each subframe has its own header specifying the 608 attributes of the subframe, like prediction method and order, 609 residual coding parameters, etc. The header is followed by the 610 encoded audio data for that channel. 612 11.3. Subset 614 FLAC specifies a subset of itself as the Subset format. The purpose 615 of this is to ensure that any streams encoded according to the Subset 616 are truly "streamable", meaning that a decoder that cannot seek 617 within the stream can still pick up in the middle of the stream and 618 start decoding. It also makes hardware decoder implementations more 619 practical by limiting the encoding parameters such that decoder 620 buffer sizes and other resource requirements can be easily 621 determined. *flac* generates Subset streams by default unless the "-- 622 lax" command-line option is used. The Subset makes the following 623 limitations on what MAY be used in the stream: 625 * The blocksize bits in the FRAME_HEADER (see FRAME_HEADER section 626 (#frameheader)) MUST be 0b0001-0b1110. The blocksize MUST be <= 627 16384; if the sample rate is <= 48000 Hz, the blocksize MUST be <= 628 4608 = 2^9 * 3^2. 630 * The sample rate bits in the FRAME_HEADER MUST be 0b0001-0b1110. 632 * The bits-per-sample bits in the FRAME_HEADER MUST be 0b001-0b111. 634 * If the sample rate is <= 48000 Hz, the filter order in LPC 635 subframes (see SUBFRAME_LPC section (#subframelpc)) MUST be less 636 than or equal to 12, i.e. the subframe type bits in the 637 SUBFRAME_HEADER (see SUBFRAME_HEADER section (#subframeheader)) 638 SHOULD NOT be 0b101100-0b111111. 640 * The Rice partition order (see Coded residual section (#coded- 641 residual)) MUST be less than or equal to 8. 643 11.4. Conventions 645 The following tables constitute a formal description of the FLAC 646 format. 
   Values expressed as u(n) represent an unsigned big-endian
   integer using n bits. n may be expressed as an equation using *
   (multiplication), / (division), + (addition), or - (subtraction). An
   inclusive range of the number of bits expressed may be represented
   with an ellipsis, such as u(m...n). The name of a value followed by
   an asterisk * indicates zero or more occurrences of the value. The
   name of a value followed by a plus sign + indicates one or more
   occurrences of the value.

11.5.  STREAM

   +===========================+=====================================+
   | Data                      | Description                         |
   +===========================+=====================================+
   | u(32)                     | "fLaC", the FLAC stream marker in   |
   |                           | ASCII, meaning byte 0 of the stream |
   |                           | is 0x66, followed by 0x4C 0x61 0x43 |
   +---------------------------+-------------------------------------+
   | METADATA_BLOCK_STREAMINFO | This is the mandatory STREAMINFO    |
   |                           | metadata block that has the basic   |
   |                           | properties of the stream.           |
   +---------------------------+-------------------------------------+
   | METADATA_BLOCK*           | Zero or more metadata blocks        |
   +---------------------------+-------------------------------------+
   | FRAME+                    | One or more audio frames            |
   +---------------------------+-------------------------------------+

                                 Table 1

11.6.  METADATA_BLOCK

   +=======================+========================================+
   | Data                  | Description                            |
   +=======================+========================================+
   | METADATA_BLOCK_HEADER | A block header that specifies the type |
   |                       | and size of the metadata block data.   |
   +-----------------------+----------------------------------------+
   | METADATA_BLOCK_DATA   |                                        |
   +-----------------------+----------------------------------------+

                                 Table 2

11.7.  METADATA_BLOCK_HEADER

   +=======+=========================================================+
   | Data  | Description                                             |
   +=======+=========================================================+
   | u(1)  | Last-metadata-block flag: '1' if this block is the last |
   |       | metadata block before the audio blocks, '0' otherwise.  |
   +-------+---------------------------------------------------------+
   | u(7)  | BLOCK_TYPE                                              |
   +-------+---------------------------------------------------------+
   | u(24) | Length (in bytes) of metadata to follow (does not       |
   |       | include the size of the METADATA_BLOCK_HEADER)          |
   +-------+---------------------------------------------------------+

                                 Table 3

11.8.  BLOCK_TYPE

   +=========+====================================================+
   | Value   | Description                                        |
   +=========+====================================================+
   | 0       | STREAMINFO                                         |
   +---------+----------------------------------------------------+
   | 1       | PADDING                                            |
   +---------+----------------------------------------------------+
   | 2       | APPLICATION                                        |
   +---------+----------------------------------------------------+
   | 3       | SEEKTABLE                                          |
   +---------+----------------------------------------------------+
   | 4       | VORBIS_COMMENT                                     |
   +---------+----------------------------------------------------+
   | 5       | CUESHEET                                           |
   +---------+----------------------------------------------------+
   | 6       | PICTURE                                            |
   +---------+----------------------------------------------------+
   | 7 - 126 | reserved                                           |
   +---------+----------------------------------------------------+
   | 127     | invalid, to avoid confusion with a frame sync code |
   +---------+----------------------------------------------------+

                                 Table 4

11.9.
METADATA_BLOCK_DATA

   +===================================================+==============+
   | Data                                              | Description  |
   +===================================================+==============+
   | METADATA_BLOCK_STREAMINFO ||                      | The block    |
   | METADATA_BLOCK_PADDING ||                         | data MUST    |
   | METADATA_BLOCK_APPLICATION ||                     | match the    |
   | METADATA_BLOCK_SEEKTABLE ||                       | block type   |
   | METADATA_BLOCK_VORBIS_COMMENT ||                  | in the block |
   | METADATA_BLOCK_CUESHEET || METADATA_BLOCK_PICTURE | header.      |
   +---------------------------------------------------+--------------+

                                 Table 5

11.10.  METADATA_BLOCK_STREAMINFO

   +========+=================================================+
   | Data   | Description                                     |
   +========+=================================================+
   | u(16)  | The minimum block size (in samples) used in the |
   |        | stream.                                         |
   +--------+-------------------------------------------------+
   | u(16)  | The maximum block size (in samples) used in the |
   |        | stream. (Minimum blocksize == maximum           |
   |        | blocksize) implies a fixed-blocksize stream.    |
   +--------+-------------------------------------------------+
   | u(24)  | The minimum frame size (in bytes) used in the   |
   |        | stream. A value of 0 signifies that the value   |
   |        | is not known.                                   |
   +--------+-------------------------------------------------+
   | u(24)  | The maximum frame size (in bytes) used in the   |
   |        | stream. A value of 0 signifies that the value   |
   |        | is not known.                                   |
   +--------+-------------------------------------------------+
   | u(20)  | Sample rate in Hz. Though 20 bits are           |
   |        | available, the maximum sample rate is limited   |
   |        | by the structure of frame headers to 655350 Hz. |
   |        | Also, a value of 0 is invalid.                  |
   +--------+-------------------------------------------------+
   | u(3)   | (number of channels)-1. FLAC supports from 1    |
   |        | to 8 channels                                   |
   +--------+-------------------------------------------------+
   | u(5)   | (bits per sample)-1. FLAC supports from 4 to    |
   |        | 32 bits per sample. Currently the reference     |
   |        | encoder and decoders only support up to 24 bits |
   |        | per sample.                                     |
   +--------+-------------------------------------------------+
   | u(36)  | Total samples in stream. 'Samples' means        |
   |        | inter-channel sample, i.e. one second of 44.1   |
   |        | kHz audio will have 44100 samples regardless of |
   |        | the number of channels. A value of zero here    |
   |        | means the number of total samples is unknown.   |
   +--------+-------------------------------------------------+
   | u(128) | MD5 signature of the unencoded audio data.      |
   |        | This allows the decoder to determine if an      |
   |        | error exists in the audio data even when the    |
   |        | error does not result in an invalid bitstream.  |
   +--------+-------------------------------------------------+

                                 Table 6

   FLAC specifies a minimum block size of 16 and a maximum block size of
   65535, meaning the bit patterns corresponding to the numbers 0-15 in
   the minimum blocksize and maximum blocksize fields are invalid.

   The MD5 signature is made by performing an MD5 transformation on the
   samples of all channels interleaved, represented in signed,
   little-endian form. This interleaving is on a per-sample basis, so
   for a stereo file this means first the first sample of the first
   channel, then the first sample of the second channel, then the second
   sample of the first channel etc. Before performing the MD5
   transformation, all samples must be byte-aligned. So, in case the bit
   depth is not a whole number of bytes, additional zero bits are
   inserted at the most-significant position until each sample
   representation is a whole number of bytes.

11.11.
METADATA_BLOCK_PADDING

   +======+========================================+
   | Data | Description                            |
   +======+========================================+
   | u(n) | n '0' bits (n MUST be a multiple of 8) |
   +------+----------------------------------------+

                                 Table 7

11.12.  METADATA_BLOCK_APPLICATION

   +=======+===========================================+
   | Data  | Description                               |
   +=======+===========================================+
   | u(32) | Registered application ID. (Visit the     |
   |       | registration page (https://xiph.org/flac/ |
   |       | id.html) to register an ID with FLAC.)    |
   +-------+-------------------------------------------+
   | u(n)  | Application data (n MUST be a multiple of |
   |       | 8)                                        |
   +-------+-------------------------------------------+

                                 Table 8

11.13.  METADATA_BLOCK_SEEKTABLE

   +============+==========================+
   | Data       | Description              |
   +============+==========================+
   | SEEKPOINT+ | One or more seek points. |
   +------------+--------------------------+

                                 Table 9

   NOTE - The number of seek points is implied by the metadata header
   'length' field, i.e. equal to length / 18.

11.14.  SEEKPOINT

   +=======+==========================================================+
   | Data  | Description                                              |
   +=======+==========================================================+
   | u(64) | Sample number of first sample in the target frame, or    |
   |       | 0xFFFFFFFFFFFFFFFF for a placeholder point.              |
   +-------+----------------------------------------------------------+
   | u(64) | Offset (in bytes) from the first byte of the first frame |
   |       | header to the first byte of the target frame's header.   |
   +-------+----------------------------------------------------------+
   | u(16) | Number of samples in the target frame.                   |
   +-------+----------------------------------------------------------+

                                 Table 10

   NOTES

   *  For placeholder points, the second and third field values are
      undefined.

   *  Seek points within a table MUST be sorted in ascending order by
      sample number.

   *  Seek points within a table MUST be unique by sample number, with
      the exception of placeholder points.

   *  The previous two notes imply that there MAY be any number of
      placeholder points, but they MUST all occur at the end of the
      table.

11.15.  METADATA_BLOCK_VORBIS_COMMENT

   +======+===========================================================+
   | Data | Description                                               |
   +======+===========================================================+
   | u(n) | Also known as FLAC tags, the contents of a vorbis comment |
   |      | packet as specified here (http://www.xiph.org/vorbis/doc/ |
   |      | v-comment.html) (without the framing bit). Note that the  |
   |      | vorbis comment spec allows for on the order of 2^64 bytes |
   |      | of data whereas the FLAC metadata block is limited to     |
   |      | 2^24 bytes. Given the stated purpose of vorbis comments,  |
   |      | i.e. human-readable textual information, this limit is    |
   |      | unlikely to be restrictive. Also note that the 32-bit     |
   |      | field lengths are little-endian coded according to the    |
   |      | vorbis spec, as opposed to the usual big-endian coding of |
   |      | fixed-length integers in the rest of FLAC.                |
   +------+-----------------------------------------------------------+

                                 Table 11

11.16.  METADATA_BLOCK_CUESHEET

   +=================+================================================+
   | Data            | Description                                    |
   +=================+================================================+
   | u(128*8)        | Media catalog number, in ASCII printable       |
   |                 | characters 0x20-0x7E. In general, the media    |
   |                 | catalog number SHOULD be 0 to 128 bytes long;  |
   |                 | any unused characters SHOULD be right-padded   |
   |                 | with NUL characters. For CD-DA, this is a      |
   |                 | thirteen digit number, followed by 115 NUL     |
   |                 | bytes.                                         |
   +-----------------+------------------------------------------------+
   | u(64)           | The number of lead-in samples. This field has  |
   |                 | meaning only for CD-DA cuesheets; for other    |
   |                 | uses it SHOULD be 0. For CD-DA, the lead-in    |
   |                 | is the TRACK 00 area where the table of        |
   |                 | contents is stored; more precisely, it is the  |
   |                 | number of samples from the first sample of the |
   |                 | media to the first sample of the first index   |
   |                 | point of the first track. According to the     |
   |                 | Red Book, the lead-in MUST be silence and CD   |
   |                 | grabbing software does not usually store it;   |
   |                 | additionally, the lead-in MUST be at least two |
   |                 | seconds but MAY be longer. For these reasons   |
   |                 | the lead-in length is stored here so that the  |
   |                 | absolute position of the first track can be    |
   |                 | computed. Note that the lead-in stored here    |
   |                 | is the number of samples up to the first index |
   |                 | point of the first track, not necessarily to   |
   |                 | INDEX 01 of the first track; even the first    |
   |                 | track MAY have INDEX 00 data.                  |
   +-----------------+------------------------------------------------+
   | u(1)            | 1 if the CUESHEET corresponds to a Compact     |
   |                 | Disc, else 0.                                  |
   +-----------------+------------------------------------------------+
   | u(7+258*8)      | Reserved. All bits MUST be set to zero.        |
   +-----------------+------------------------------------------------+
   | u(8)            | The number of tracks. Must be at least 1       |
   |                 | (because of the requisite lead-out track).     |
   |                 | For CD-DA, this number MUST be no more than    |
   |                 | 100 (99 regular tracks and one lead-out        |
   |                 | track).                                        |
   +-----------------+------------------------------------------------+
   | CUESHEET_TRACK+ | One or more tracks. A CUESHEET block is        |
   |                 | REQUIRED to have a lead-out track; it is       |
   |                 | always the last track in the CUESHEET. For     |
   |                 | CD-DA, the lead-out track number MUST be 170   |
   |                 | as specified by the Red Book, otherwise it     |
   |                 | MUST be 255.                                   |
   +-----------------+------------------------------------------------+

                                 Table 12

11.17.  CUESHEET_TRACK

   +=====================+=================================================+
   |Data                 |Description                                      |
   +=====================+=================================================+
   |u(64)                |Track offset in samples, relative to the         |
   |                     |beginning of the FLAC audio stream. It is the    |
   |                     |offset to the first index point of the track.    |
   |                     |(Note how this differs from CD-DA, where the     |
   |                     |track's offset in the TOC is that of the track's |
   |                     |INDEX 01 even if there is an INDEX 00.) For CD-  |
   |                     |DA, the offset MUST be evenly divisible by 588   |
   |                     |samples (588 samples = 44100 samples/s * 1/75 s).|
   +---------------------+-------------------------------------------------+
   |u(8)                 |Track number. A track number of 0 is not allowed |
   |                     |to avoid conflicting with the CD-DA spec, which  |
   |                     |reserves this for the lead-in. For CD-DA the     |
   |                     |number MUST be 1-99, or 170 for the lead-out; for|
   |                     |non-CD-DA, the track number MUST be 255 for the  |
   |                     |lead-out. It is not REQUIRED but encouraged to   |
   |                     |start with track 1 and increase sequentially.    |
   |                     |Track numbers MUST be unique within a CUESHEET.  |
   +---------------------+-------------------------------------------------+
   |u(12*8)              |Track ISRC. This is a 12-digit alphanumeric      |
   |                     |code; see here (http://isrc.ifpi.org/) and here  |
   |                     |(http://www.disctronics.co.uk/technology/cdaudio/|
   |                     |cdaud_isrc.htm). A value of 12 ASCII NUL         |
   |                     |characters MAY be used to denote absence of an   |
   |                     |ISRC.                                            |
   +---------------------+-------------------------------------------------+
   |u(1)                 |The track type: 0 for audio, 1 for non-audio.    |
   |                     |This corresponds to the CD-DA Q-channel control  |
   |                     |bit 3.                                           |
   +---------------------+-------------------------------------------------+
   |u(1)                 |The pre-emphasis flag: 0 for no pre-emphasis, 1  |
   |                     |for pre-emphasis. This corresponds to the CD-DA  |
   |                     |Q-channel control bit 5; see here                |
   |                     |(http://www.chipchapin.com/CDMedia/cdda9.php3).  |
   +---------------------+-------------------------------------------------+
   |u(6+13*8)            |Reserved. All bits MUST be set to zero.          |
   +---------------------+-------------------------------------------------+
   |u(8)                 |The number of track index points. There MUST be  |
   |                     |at least one index in every track in a CUESHEET  |
   |                     |except for the lead-out track, which MUST have   |
   |                     |zero. For CD-DA, this number SHOULD NOT be more  |
   |                     |than 100.                                        |
   +---------------------+-------------------------------------------------+
   |CUESHEET_TRACK_INDEX+|For all tracks except the lead-out track, one or |
   |                     |more track index points.                         |
   +---------------------+-------------------------------------------------+

                                 Table 13

11.18.  CUESHEET_TRACK_INDEX

   +========+=========================================================+
   | Data   | Description                                             |
   +========+=========================================================+
   | u(64)  | Offset in samples, relative to the track offset, of the |
   |        | index point. For CD-DA, the offset MUST be evenly       |
   |        | divisible by 588 samples (588 samples = 44100 samples/s |
   |        | * 1/75 s). Note that the offset is from the beginning   |
   |        | of the track, not the beginning of the audio data.      |
   +--------+---------------------------------------------------------+
   | u(8)   | The index point number. For CD-DA, an index number of   |
   |        | 0 corresponds to the track pre-gap. The first index in  |
   |        | a track MUST have a number of 0 or 1, and subsequently, |
   |        | index numbers MUST increase by 1. Index numbers MUST    |
   |        | be unique within a track.                               |
   +--------+---------------------------------------------------------+
   | u(3*8) | Reserved. All bits MUST be set to zero.                 |
   +--------+---------------------------------------------------------+

                                 Table 14

11.19.  METADATA_BLOCK_PICTURE

   +========+==================================================+
   | Data   | Description                                      |
   +========+==================================================+
   | u(32)  | The PICTURE_TYPE according to the ID3v2 APIC     |
   |        | frame.                                           |
   +--------+--------------------------------------------------+
   | u(32)  | The length of the MIME type string in bytes.     |
   +--------+--------------------------------------------------+
   | u(n*8) | The MIME type string, in printable ASCII         |
   |        | characters 0x20-0x7E. The MIME type MAY also be  |
   |        | --> to signify that the data part is a URL of    |
   |        | the picture instead of the picture data itself.  |
   +--------+--------------------------------------------------+
   | u(32)  | The length of the description string in bytes.   |
   +--------+--------------------------------------------------+
   | u(n*8) | The description of the picture, in UTF-8.        |
   +--------+--------------------------------------------------+
   | u(32)  | The width of the picture in pixels.              |
   +--------+--------------------------------------------------+
   | u(32)  | The height of the picture in pixels.             |
   +--------+--------------------------------------------------+
   | u(32)  | The color depth of the picture in                |
   |        | bits-per-pixel.                                  |
   +--------+--------------------------------------------------+
   | u(32)  | For indexed-color pictures (e.g. GIF), the       |
   |        | number of colors used, or 0 for non-indexed      |
   |        | pictures.                                        |
   +--------+--------------------------------------------------+
   | u(32)  | The length of the picture data in bytes.         |
   +--------+--------------------------------------------------+
   | u(n*8) | The binary picture data.                         |
   +--------+--------------------------------------------------+

                                 Table 15

11.20.  PICTURE_TYPE

   +=======+=====================================+
   | Value | Description                         |
   +=======+=====================================+
   | 0     | Other                               |
   +-------+-------------------------------------+
   | 1     | 32x32 pixels 'file icon' (PNG only) |
   +-------+-------------------------------------+
   | 2     | Other file icon                     |
   +-------+-------------------------------------+
   | 3     | Cover (front)                       |
   +-------+-------------------------------------+
   | 4     | Cover (back)                        |
   +-------+-------------------------------------+
   | 5     | Leaflet page                        |
   +-------+-------------------------------------+
   | 6     | Media (e.g. label side of CD)       |
   +-------+-------------------------------------+
   | 7     | Lead artist/lead performer/soloist  |
   +-------+-------------------------------------+
   | 8     | Artist/performer                    |
   +-------+-------------------------------------+
   | 9     | Conductor                           |
   +-------+-------------------------------------+
   | 10    | Band/Orchestra                      |
   +-------+-------------------------------------+
   | 11    | Composer                            |
   +-------+-------------------------------------+
   | 12    | Lyricist/text writer                |
   +-------+-------------------------------------+
   | 13    | Recording Location                  |
   +-------+-------------------------------------+
   | 14    | During recording                    |
   +-------+-------------------------------------+
   | 15    | During performance                  |
   +-------+-------------------------------------+
   | 16    | Movie/video screen capture          |
   +-------+-------------------------------------+
   | 17    | A bright colored fish               |
   +-------+-------------------------------------+
   | 18    | Illustration                        |
   +-------+-------------------------------------+
   | 19    | Band/artist logotype                |
   +-------+-------------------------------------+
   | 20    | Publisher/Studio logotype           |
   +-------+-------------------------------------+

                                 Table 16

   Other values are reserved and SHOULD NOT be used. There MAY only be
   one each of picture type 1 and 2 in a file.

11.21.  FRAME

   +==============+=================================+
   | Data         | Description                     |
   +==============+=================================+
   | FRAME_HEADER |                                 |
   +--------------+---------------------------------+
   | SUBFRAME+    | One SUBFRAME per channel.       |
   +--------------+---------------------------------+
   | u(?)         | Zero-padding to byte alignment. |
   +--------------+---------------------------------+
   | FRAME_FOOTER |                                 |
   +--------------+---------------------------------+

                                 Table 17

11.22.
FRAME_HEADER

   +=======+================================+
   | Data  | Description                    |
   +=======+================================+
   | u(14) | Sync code '0b11111111111110'   |
   +-------+--------------------------------+
   | u(1)  | FRAME HEADER RESERVED          |
   +-------+--------------------------------+
   | u(1)  | BLOCKING STRATEGY              |
   +-------+--------------------------------+
   | u(4)  | INTERCHANNEL SAMPLE BLOCK SIZE |
   +-------+--------------------------------+
   | u(4)  | SAMPLE RATE                    |
   +-------+--------------------------------+
   | u(4)  | CHANNEL ASSIGNMENT             |
   +-------+--------------------------------+
   | u(3)  | SAMPLE SIZE                    |
   +-------+--------------------------------+
   | u(1)  | FRAME HEADER RESERVED2         |
   +-------+--------------------------------+
   | u(?)  | CODED NUMBER                   |
   +-------+--------------------------------+
   | u(?)  | BLOCK SIZE INT                 |
   +-------+--------------------------------+
   | u(?)  | SAMPLE RATE INT                |
   +-------+--------------------------------+
   | u(8)  | FRAME CRC                      |
   +-------+--------------------------------+

                                 Table 18

11.22.1.  FRAME HEADER RESERVED

   +=======+=========================+
   | Value | Description             |
   +=======+=========================+
   | 0     | mandatory value         |
   +-------+-------------------------+
   | 1     | reserved for future use |
   +-------+-------------------------+

                                 Table 19

   FRAME HEADER RESERVED MUST remain reserved for 0 in order for a FLAC
   frame's initial 15 bits to be distinguishable from the start of an
   MPEG audio frame (see also
   (http://lists.xiph.org/pipermail/flac-dev/2008-December/002607.html)).

11.22.2.
BLOCKING STRATEGY

   +=======+==================================+
   | Value | Description                      |
   +=======+==================================+
   | 0     | fixed-blocksize stream; frame    |
   |       | header encodes the frame number  |
   +-------+----------------------------------+
   | 1     | variable-blocksize stream; frame |
   |       | header encodes the sample number |
   +-------+----------------------------------+

                                 Table 20

   The BLOCKING STRATEGY bit MUST be the same throughout the entire
   stream.

   The BLOCKING STRATEGY bit determines how to calculate the sample
   number of the first sample in the frame. If the bit is 0
   (fixed-blocksize), the frame header encodes the frame number as
   above, and the frame's starting sample number will be the frame
   number times the blocksize. If it is 1 (variable-blocksize), the
   frame header encodes the frame's starting sample number itself. (In
   the case of a fixed-blocksize stream, only the last block MAY be
   shorter than the stream blocksize; its starting sample number will be
   calculated as the frame number times the previous frame's blocksize,
   or zero if it is the first frame).

11.22.3.  INTERCHANNEL SAMPLE BLOCK SIZE

   +=================+=========================================+
   | Value           | Description                             |
   +=================+=========================================+
   | 0b0000          | reserved                                |
   +-----------------+-----------------------------------------+
   | 0b0001          | 192 samples                             |
   +-----------------+-----------------------------------------+
   | 0b0010 - 0b0101 | 576 * (2^(n-2)) samples, i.e. 576,      |
   |                 | 1152, 2304 or 4608                      |
   +-----------------+-----------------------------------------+
   | 0b0110          | get 8 bit (blocksize-1) from end of     |
   |                 | header                                  |
   +-----------------+-----------------------------------------+
   | 0b0111          | get 16 bit (blocksize-1) from end of    |
   |                 | header                                  |
   +-----------------+-----------------------------------------+
   | 0b1000 - 0b1111 | 256 * (2^(n-8)) samples, i.e. 256, 512, |
   |                 | 1024, 2048, 4096, 8192, 16384 or 32768  |
   +-----------------+-----------------------------------------+

                                 Table 21

11.22.4.  SAMPLE RATE

   +========+=====================================================+
   | Value  | Description                                         |
   +========+=====================================================+
   | 0b0000 | get from STREAMINFO metadata block                  |
   +--------+-----------------------------------------------------+
   | 0b0001 | 88.2 kHz                                            |
   +--------+-----------------------------------------------------+
   | 0b0010 | 176.4 kHz                                           |
   +--------+-----------------------------------------------------+
   | 0b0011 | 192 kHz                                             |
   +--------+-----------------------------------------------------+
   | 0b0100 | 8 kHz                                               |
   +--------+-----------------------------------------------------+
   | 0b0101 | 16 kHz                                              |
   +--------+-----------------------------------------------------+
   | 0b0110 | 22.05 kHz                                           |
   +--------+-----------------------------------------------------+
   | 0b0111 | 24 kHz                                              |
   +--------+-----------------------------------------------------+
   | 0b1000 | 32 kHz                                              |
   +--------+-----------------------------------------------------+
   | 0b1001 | 44.1 kHz                                            |
   +--------+-----------------------------------------------------+
   | 0b1010 | 48 kHz                                              |
   +--------+-----------------------------------------------------+
   | 0b1011 | 96 kHz                                              |
   +--------+-----------------------------------------------------+
   | 0b1100 | get 8 bit sample rate (in kHz) from end of header   |
   +--------+-----------------------------------------------------+
   | 0b1101 | get 16 bit sample rate (in Hz) from end of header   |
   +--------+-----------------------------------------------------+
   | 0b1110 | get 16 bit sample rate (in daHz) from end of header |
   +--------+-----------------------------------------------------+
   | 0b1111 | invalid, to prevent sync-fooling string of 1s       |
   +--------+-----------------------------------------------------+

                                 Table 22

11.22.5.  CHANNEL ASSIGNMENT

   Values 0b0000-0b0111 represent the (number of independent channels)-1
   coded independently; channel order follows SMPTE/ITU-R
   recommendations. Values 0b1000-0b1010 represent 2 channel (stereo)
   audio where the signal has been mapped to a different representation;
   see the section on Interchannel Decorrelation
   (#interchannel-decorrelation).

   +==========+======================================================+
   | Value    | Description                                          |
   +==========+======================================================+
   | 0b0000   | 1 channel: mono                                      |
   +----------+------------------------------------------------------+
   | 0b0001   | 2 channels: left, right                              |
   +----------+------------------------------------------------------+
   | 0b0010   | 3 channels: left, right, center                      |
   +----------+------------------------------------------------------+
   | 0b0011   | 4 channels: front left, front right, back left, back |
   |          | right                                                |
   +----------+------------------------------------------------------+
   | 0b0100   | 5 channels: front left, front right, front center,   |
   |          | back/surround left, back/surround right              |
   +----------+------------------------------------------------------+
   | 0b0101   | 6 channels: front left, front right, front center,   |
   |          | LFE, back/surround left, back/surround right         |
   +----------+------------------------------------------------------+
   | 0b0110   | 7 channels: front left, front right, front center,   |
   |          | LFE, back center, side left, side right              |
   +----------+------------------------------------------------------+
   | 0b0111   | 8 channels: front left, front right, front center,   |
   |          | LFE, back left, back right, side left, side right    |
   +----------+------------------------------------------------------+
   | 0b1000   | left/side stereo: channel 0 is the left channel,     |
   |          | channel 1 is the side(difference) channel            |
   +----------+------------------------------------------------------+
   | 0b1001   | right/side stereo: channel 0 is the side(difference) |
   |          | channel, channel 1 is the right channel              |
   +----------+------------------------------------------------------+
   | 0b1010   | mid/side stereo: channel 0 is the mid(average)       |
   |          | channel, channel 1 is the side(difference) channel   |
   +----------+------------------------------------------------------+
   | 0b1011 - | reserved                                             |
   | 0b1111   |                                                      |
   +----------+------------------------------------------------------+

                                 Table 23

   Please note that the actual coded subframe order for right/side
   stereo is side-right.

11.22.6.
SAMPLE SIZE 1324 +=======+====================================+ 1325 | Value | Description | 1326 +=======+====================================+ 1327 | 0b000 | get from STREAMINFO metadata block | 1328 +-------+------------------------------------+ 1329 | 0b001 | 8 bits per sample | 1330 +-------+------------------------------------+ 1331 | 0b010 | 12 bits per sample | 1332 +-------+------------------------------------+ 1333 | 0b011 | reserved | 1334 +-------+------------------------------------+ 1335 | 0b100 | 16 bits per sample | 1336 +-------+------------------------------------+ 1337 | 0b101 | 20 bits per sample | 1338 +-------+------------------------------------+ 1339 | 0b110 | 24 bits per sample | 1340 +-------+------------------------------------+ 1341 | 0b111 | reserved | 1342 +-------+------------------------------------+ 1344 Table 24 1346 For subframes that encode a difference channel, the sample size is 1347 one bit larger than the sample size of the frame, in order to be able 1348 to encode the difference between extreme values. 1350 11.22.7. FRAME HEADER RESERVED2 1352 +=======+=========================+ 1353 | Value | Description | 1354 +=======+=========================+ 1355 | 0 | mandatory value | 1356 +-------+-------------------------+ 1357 | 1 | reserved for future use | 1358 +-------+-------------------------+ 1360 Table 25 1362 11.22.8. CODED NUMBER 1364 Frame/Sample numbers are encoded using the UTF-8 format, from BEFORE 1365 it was limited to 4 bytes by RFC3629, this variant supports the 1366 original 7 byte maximum. 1368 Note to implementors: All Unicode compliant UTF-8 decoders and 1369 encoders are limited to 4 bytes, it's best to just write your own one 1370 off solution. 1372 if(variable blocksize) 1373 `u(8...56)`: "UTF-8" coded sample number (decoded number is 36 bits) 1374 else 1375 `u(8...48)`: "UTF-8" coded frame number (decoded number is 31 bits) 1377 11.22.9. 
BLOCK SIZE INT 1379 if(`INTERCHANNEL SAMPLE BLOCK SIZE` == 0b0110) 1380 8 bit (blocksize-1) 1381 else if(`INTERCHANNEL SAMPLE BLOCK SIZE` == 0b0111) 1382 16 bit (blocksize-1) 1384 11.22.10. SAMPLE RATE INT 1386 if(`SAMPLE RATE` == 0b1100) 1387 8 bit sample rate (in kHz) 1388 else if(`SAMPLE RATE` == 0b1101) 1389 16 bit sample rate (in Hz) 1390 else if(`SAMPLE RATE` == 0b1110) 1391 16 bit sample rate (in daHz) 1393 11.22.11. FRAME CRC 1395 CRC-8 (polynomial = x^8 + x^2 + x^1 + x^0, initialized with 0) of 1396 everything before the CRC, including the sync code 1398 11.23. FRAME_FOOTER 1400 +=======+===================================================+ 1401 | Data | Description | 1402 +=======+===================================================+ 1403 | u(16) | CRC-16 (polynomial = x^16 + x^15 + x^2 + x^0, | 1404 | | initialized with 0) of everything before the CRC, | 1405 | | back to and including the frame header sync code | 1406 +-------+---------------------------------------------------+ 1408 Table 26 1410 11.24. SUBFRAME 1412 +========================================+======================+ 1413 | Data | Description | 1414 +========================================+======================+ 1415 | SUBFRAME_HEADER | | 1416 +----------------------------------------+----------------------+ 1417 | SUBFRAME_CONSTANT || SUBFRAME_FIXED || | The SUBFRAME_HEADER | 1418 | SUBFRAME_LPC || SUBFRAME_VERBATIM | specifies which one. | 1419 +----------------------------------------+----------------------+ 1421 Table 27 1423 11.25. 
SUBFRAME_HEADER 1425 +========+========================================================+ 1426 | Data | Description | 1427 +========+========================================================+ 1428 | u(1) | Zero bit padding, to prevent sync-fooling string of 1s | 1429 +--------+--------------------------------------------------------+ 1430 | u(6) | SUBFRAME TYPE (see section on SUBFRAME TYPE | 1431 | | (#subframe-type)) | 1432 +--------+--------------------------------------------------------+ 1433 | u(1+k) | WASTED BITS PER SAMPLE FLAG (see section on WASTED | 1434 | | BITS PER SAMPLE FLAG (#wasted-bits-per-sample-flag)) | 1435 +--------+--------------------------------------------------------+ 1437 Table 28 1439 11.25.1. SUBFRAME TYPE 1441 +==========+=======================================================+ 1442 | Value | Description | 1443 +==========+=======================================================+ 1444 | 0b000000 | SUBFRAME_CONSTANT | 1445 +----------+-------------------------------------------------------+ 1446 | 0b000001 | SUBFRAME_VERBATIM | 1447 +----------+-------------------------------------------------------+ 1448 | 0b00001x | reserved | 1449 +----------+-------------------------------------------------------+ 1450 | 0b0001xx | reserved | 1451 +----------+-------------------------------------------------------+ 1452 | 0b001xxx | if(xxx <= 4) SUBFRAME_FIXED, xxx=order; else reserved | 1453 +----------+-------------------------------------------------------+ 1454 | 0b01xxxx | reserved | 1455 +----------+-------------------------------------------------------+ 1456 | 0b1xxxxx | SUBFRAME_LPC, xxxxx=order-1 | 1457 +----------+-------------------------------------------------------+ 1458 Table 29 1460 11.25.2. 
WASTED BITS PER SAMPLE FLAG 1462 Certain file formats, like AIFF, can store audio samples with a bit 1463 depth that is not an integer number of bytes by padding them with 1464 least significant zero bits to a bit depth that is an integer number 1465 of bytes. For example, shifting a 14-bit sample right by 2 pads it 1466 to a 16-bit sample, which then has two zero least-significant bits. 1467 In this specification, these least-significant zero bits are referred 1468 to as wasted bits-per-sample or simply wasted bits. They are wasted 1469 in a sense that they contain no information, but are stored anyway. 1471 The wasted bits-per-sample flag in a subframe header is set to 1 if a 1472 certain number of least-significant bits of all samples in the 1473 current subframe are zero. If this is the case, the number of wasted 1474 bits-per-sample (k) minus 1 follows the flag in an unary encoding. 1475 For example, if k is 3, 0b001 follows. If k = 0, the wasted bits- 1476 per-sample flag is 0 and no unary coded k follows. 1478 In case k is not equal to 0, samples are coded ignoring k least- 1479 significant bits. For example, if the preceding frame header 1480 specified a sample size of 16 bits per sample and k is 3, samples in 1481 the subframe are coded as 13 bits per sample. A decoder MUST add k 1482 least-significant zero bits by shifting left (padding) after decoding 1483 a subframe sample. In case the frame has left/side, right/side or 1484 mid/side stereo, padding MUST happen to a sample before it is used to 1485 reconstruct a left or right sample. 1487 Besides audio files that have a certain number of wasted bits for the 1488 whole file, there exist audio files in which the number of wasted 1489 bits varies. There are DVD-Audio discs in which blocks of samples 1490 have had their least-significant bits selectively zeroed, as to 1491 slightly improve the compression of their otherwise lossless Meridian 1492 Lossless Packing codec. 
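
   As a non-normative illustration of the flag and unary code described
   above, the following Python sketch reads k from a stream of bits and
   restores the padding on a decoded sample. The names
   read_wasted_bits and restore_sample, and the use of an iterator of
   0/1 integers as the bit source, are assumptions made for this
   example and are not part of the format.

```python
from typing import Iterator


def read_wasted_bits(bits: Iterator[int]) -> int:
    """Return k, given an iterator positioned at the wasted bits flag.

    `bits` yields 0 or 1 in stream order (a hypothetical bit source
    used only for this sketch).
    """
    if next(bits) == 0:
        # Flag is 0: there are no wasted bits and no unary code follows.
        return 0
    # Flag is 1: k - 1 is stored in unary as (k - 1) zero bits
    # terminated by a single one bit.
    k = 1
    while next(bits) == 0:
        k += 1
    return k


def restore_sample(coded_sample: int, k: int) -> int:
    """Re-add the k least-significant zero bits by shifting left."""
    return coded_sample << k


# Flag 1 followed by 0b001 gives k = 3, as in the example in the text.
k = read_wasted_bits(iter([1, 0, 0, 1]))
# With a 16-bit frame sample size, samples are then coded in 16 - k bits.
coded_bits = 16 - k
```

   A decoder written along these lines would presumably also bound the
   run of zero bits it accepts, so that a malformed stream cannot yield
   a k that is not smaller than the sample size.
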
   There are also audio processors like lossyWAV that enable users to
   improve compression of their files by a lossless audio codec in a
   non-lossless way. Because of this, the number of wasted bits k MAY
   change between frames and MAY differ between subframes.

11.26. SUBFRAME_CONSTANT

   +======+========================================+
   | Data | Description                            |
   +======+========================================+
   | u(n) | Unencoded constant value of the        |
   |      | subblock, n = frame's bits-per-sample. |
   +------+----------------------------------------+

                                Table 30

11.27. SUBFRAME_FIXED

   +==========+========================================+
   | Data     | Description                            |
   +==========+========================================+
   | u(n)     | Unencoded warm-up samples (n = frame's |
   |          | bits-per-sample * predictor order).    |
   +----------+----------------------------------------+
   | RESIDUAL | Encoded residual                       |
   +----------+----------------------------------------+

                                Table 31

11.28. SUBFRAME_LPC

   +==========+========================================================+
   | Data     | Description                                            |
   +==========+========================================================+
   | u(n)     | Unencoded warm-up samples (n = frame's bits-           |
   |          | per-sample * lpc order).                               |
   +----------+--------------------------------------------------------+
   | u(4)     | (quantized linear predictor coefficients'              |
   |          | precision in bits)-1 (NOTE: 0b1111 is invalid).        |
   +----------+--------------------------------------------------------+
   | u(5)     | Quantized linear predictor coefficient shift           |
   |          | needed in bits (NOTE: this number is signed            |
   |          | two's-complement).                                     |
   +----------+--------------------------------------------------------+
   | u(n)     | Unencoded predictor coefficients (n = qlp coeff        |
   |          | precision * lpc order) (NOTE: the coefficients         |
   |          | are signed two's-complement).                          |
   +----------+--------------------------------------------------------+
   | RESIDUAL | Encoded residual                                       |
   +----------+--------------------------------------------------------+

                                Table 32

11.29. SUBFRAME_VERBATIM

   +=========+=============================================+
   | Data    | Description                                 |
   +=========+=============================================+
   | u(n*i)  | Unencoded subblock, where n is frame's      |
   |         | bits-per-sample and i is frame's blocksize. |
   +---------+---------------------------------------------+

                                Table 33

11.30. RESIDUAL

   +================================================+======================+
   |Data                                            |Description           |
   +================================================+======================+
   |u(2)                                            |RESIDUAL_CODING_METHOD|
   +------------------------------------------------+----------------------+
   |RESIDUAL_CODING_METHOD_PARTITIONED_EXP_GOLOMB |||                      |
   |RESIDUAL_CODING_METHOD_PARTITIONED_EXP_GOLOMB2  |                      |
   +------------------------------------------------+----------------------+

                                Table 34

11.30.1. RESIDUAL_CODING_METHOD

   +=======+========================================================+
   | Value | Description                                            |
   +=======+========================================================+
   | 0b00  | partitioned Exp-Golomb coding with 4-bit Exp-Golomb    |
   |       | parameter;                                             |
   |       | RESIDUAL_CODING_METHOD_PARTITIONED_EXP_GOLOMB follows  |
   +-------+--------------------------------------------------------+
   | 0b01  | partitioned Exp-Golomb coding with 5-bit Exp-Golomb    |
   |       | parameter;                                             |
   |       | RESIDUAL_CODING_METHOD_PARTITIONED_EXP_GOLOMB2 follows |
   +-------+--------------------------------------------------------+
   | 0b10  | reserved                                               |
   | -     |                                                        |
   | 0b11  |                                                        |
   +-------+--------------------------------------------------------+

                                Table 35

11.30.2. RESIDUAL_CODING_METHOD_PARTITIONED_EXP_GOLOMB

   +=======================+===================================+
   | Data                  | Description                       |
   +=======================+===================================+
   | u(4)                  | Partition order.                  |
   +-----------------------+-----------------------------------+
   | EXP_GOLOMB_PARTITION+ | There will be 2^order partitions. |
   +-----------------------+-----------------------------------+

                                Table 36

11.30.2.1. EXP_GOLOMB_PARTITION

   +==========+====================================================+
   | Data     | Description                                        |
   +==========+====================================================+
   | u(4(+5)) | EXP-GOLOMB PARTITION ENCODING PARAMETER (see       |
   |          | section on EXP-GOLOMB PARTITION ENCODING PARAMETER |
   |          | (#exp-golomb-partition-encoding-parameter))        |
   +----------+----------------------------------------------------+
   | u(?)     | ENCODED RESIDUAL (see section on ENCODED RESIDUAL  |
   |          | (#encoded-residual))                               |
   +----------+----------------------------------------------------+

                                Table 37

11.30.2.2. EXP-GOLOMB PARTITION ENCODING PARAMETER

   +==========+==========================================+
   | Value    | Description                              |
   +==========+==========================================+
   | 0b0000 - | Exp-golomb parameter.                    |
   | 0b1110   |                                          |
   +----------+------------------------------------------+
   | 0b1111   | Escape code, meaning the partition is in |
   |          | unencoded binary form using n bits per   |
   |          | sample; n follows as a 5-bit number.     |
   +----------+------------------------------------------+

                                Table 38

11.30.3. RESIDUAL_CODING_METHOD_PARTITIONED_EXP_GOLOMB2

   +========================+===================================+
   | Data                   | Description                       |
   +========================+===================================+
   | u(4)                   | Partition order.                  |
   +------------------------+-----------------------------------+
   | EXP_GOLOMB2_PARTITION+ | There will be 2^order partitions. |
   +------------------------+-----------------------------------+

                                Table 39

11.30.3.1. EXP_GOLOMB2_PARTITION

   +==========+=====================================================+
   | Data     | Description                                         |
   +==========+=====================================================+
   | u(5(+5)) | EXP-GOLOMB2 PARTITION ENCODING PARAMETER (see       |
   |          | section on EXP-GOLOMB2 PARTITION ENCODING PARAMETER |
   |          | (#expgolomb2-partition-encoding-parameter))         |
   +----------+-----------------------------------------------------+
   | u(?)     | ENCODED RESIDUAL (see section on ENCODED RESIDUAL   |
   |          | (#encoded-residual))                                |
   +----------+-----------------------------------------------------+

                                Table 40

11.30.3.2. EXP-GOLOMB2 PARTITION ENCODING PARAMETER

   +===========+==========================================+
   | Value     | Description                              |
   +===========+==========================================+
   | 0b00000 - | Exp-golomb parameter.                    |
   | 0b11110   |                                          |
   +-----------+------------------------------------------+
   | 0b11111   | Escape code, meaning the partition is in |
   |           | unencoded binary form using n bits per   |
   |           | sample; n follows as a 5-bit number.     |
   +-----------+------------------------------------------+

                                Table 41

11.30.4. ENCODED RESIDUAL

   The number of samples (n) in the partition is determined as follows:

   * if the partition order is zero, n = frame's blocksize - predictor
     order

   * else if this is not the first partition of the subframe, n =
     (frame's blocksize / (2^partition order))

   * else n = (frame's blocksize / (2^partition order)) - predictor
     order

12. Security Considerations

   Like any other codec (such as [RFC6716]), FLAC should not be used
   with insecure ciphers or cipher modes that are vulnerable to
   known-plaintext attacks. Some of the header bits as well as the
   padding are easily predictable.

   Implementations of the FLAC codec need to take appropriate security
   considerations into account. Those related to denial of service are
   outlined in Section 2.1 of [RFC4732]. It is extremely important for
   the decoder to be robust against malicious payloads. Malicious
   payloads MUST NOT cause the decoder to overrun its allocated memory
   or to take an excessive amount of resources to decode. An overrun in
   allocated memory could lead to arbitrary code execution by an
   attacker. The same applies to the encoder, even though problems in
   encoders are typically rarer. Malicious audio streams MUST NOT cause
   the encoder to misbehave because this would allow an attacker to
   attack transcoding gateways.
   An example is allocating more memory than is available, especially
   with blocksizes of more than 10000 or with big metadata blocks, or
   not allocating enough memory before copying data, either of which
   leads to execution of malicious code, crashes, freezes, or reboots
   on some known implementations. See the FLAC decoder testbench
   (https://wiki.hydrogenaud.io/index.php?title=FLAC_decoder_testbench)
   for a non-exhaustive list of FLAC files with extreme configurations
   which lead to crashes or reboots on some known implementations.

   None of the content carried in FLAC is intended to be executable.

13. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119,
             DOI 10.17487/RFC2119, March 1997.

   [RFC4732] Handley, M., Ed., Rescorla, E., Ed., and IAB, "Internet
             Denial-of-Service Considerations", RFC 4732,
             DOI 10.17487/RFC4732, December 2006.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
             2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
             May 2017.

14. Informative References

   [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the
             Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
             September 2012.

Authors' Addresses

   Michael Richardson

   Email: mcr@sandelman.ca

   Andrew Weaver

   Email: theandrewjw@gmail.com