idnits 2.17.1

draft-ietf-cellar-flac-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == There is 1 instance of lines with non-ascii characters in the document.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (23 April 2022) is 732 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Downref: Normative reference to an Informational RFC: RFC 4732

     Summary: 2 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

cellar                                                M.Q.C. van Beurden
Internet-Draft
Intended status: Standards Track                               A. Weaver
Expires: 25 October 2022                                   23 April 2022

                       Free Lossless Audio Codec
                       draft-ietf-cellar-flac-03

Abstract

   This document defines the Free Lossless Audio Codec (FLAC) format.
   FLAC is designed to reduce the amount of computer storage space
   needed to store digital audio signals without needing to remove
   information in doing so (i.e. lossless). FLAC is free in the sense
   that its specification is open, its reference implementation is
   open-source and it is not encumbered by any known patent. Compared
   to other lossless (audio) coding formats, FLAC is a format with low
   complexity and can be encoded and decoded with few computing
   resources. Decoding of FLAC has seen many independent
   implementations on many different platforms, and both encoding and
   decoding can be implemented without needing floating-point
   arithmetic.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 25 October 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Revised
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Notation and Conventions
   3.  Acknowledgments
   4.  Definitions
   5.  Conceptual overview
     5.1.  Blocking
     5.2.  Interchannel Decorrelation
     5.3.  Prediction
     5.4.  Residual Coding
   6.  Format principles
   7.  Format lay-out
   8.  Format subset
   9.  File-level metadata
     9.1.  Metadata block header
     9.2.  Streaminfo
     9.3.  Padding
     9.4.  Application
     9.5.  Seektable
       9.5.1.  Seekpoint
     9.6.  Vorbis comment
       9.6.1.  Standard field names
       9.6.2.  Channel mask
     9.7.  Cuesheet
       9.7.1.  Cuesheet track
     9.8.  Picture
   10. Frame structure
     10.1.  Frame header
       10.1.1.  Blocksize bits
       10.1.2.  Sample rate bits
       10.1.3.  Channels bits
       10.1.4.  Bit depth bits
       10.1.5.  Coded number
       10.1.6.  Uncommon blocksize
       10.1.7.  Uncommon sample rate
       10.1.8.  Frame header CRC
     10.2.  Subframes
       10.2.1.  Subframe header
       10.2.2.  Wasted bits per sample
       10.2.3.  Constant subframe
       10.2.4.  Verbatim subframe
       10.2.5.  Fixed predictor subframe
       10.2.6.  Linear predictor subframe
       10.2.7.  Coded residual
     10.3.  Frame footer
   11. Implementation status
   12. Security Considerations
   13. Normative References
   14. Informative References
   Appendix A.  Numerical considerations
     A.1.  Determining necessary data type size
     A.2.  Stereo decorrelation
     A.3.  Prediction
     A.4.  Rice coding
   Appendix B.  Examples
     B.1.  Decoding example 1
       B.1.1.  Example file 1 in hexadecimal representation
       B.1.2.  Example file 1 in binary representation
       B.1.3.  Signature and streaminfo
       B.1.4.  Audio frames
     B.2.  Decoding example 2
       B.2.1.  Example file 2 in hexadecimal representation
       B.2.2.  Example file 2 in binary representation (only audio
               frames)
       B.2.3.  Signature and streaminfo
       B.2.4.  Seektable
       B.2.5.  Vorbis comment
       B.2.6.  Padding
       B.2.7.  First audio frame
       B.2.8.  Second audio frame
       B.2.9.  MD5 checksum verification
     B.3.  Decoding example 3
       B.3.1.  Example file 3 in hexadecimal representation
       B.3.2.  Example file 3 in binary representation (only audio
               frame)
       B.3.3.  Signature and streaminfo
       B.3.4.  Audio frame
   Authors' Addresses

1.  Introduction

   This document defines the FLAC format. FLAC files and streams can
   code for pulse-code modulated (PCM) audio with 1 to 8 channels,
   sample rates from 1 to 1048576 Hertz and bit depths between 4 and 32
   bits. Most tools for coding to and decoding from the FLAC format
   have been optimized for CD-audio, which is PCM audio with 2 channels,
   a sample rate of 44.1 kHz and a bit depth of 16 bits.

   FLAC is able to achieve lossless compression because samples in audio
   signals tend to be highly correlated with their close neighbors. In
   contrast with general purpose compressors, which often use
   dictionaries, apply run-length coding or exploit long-term
   repetition, FLAC removes redundancy solely in the very short term,
   looking back at most 32 samples.

   The coding methods provided by the FLAC format work best on PCM audio
   signals of which the samples have a signed representation and are
   centered around zero. Audio signals in which samples have an
   unsigned representation must be transformed to a signed
   representation as described in this document in order to achieve
   reasonable compression. The FLAC format is not suited to compress
   audio that is not PCM. Pulse-density modulated audio, e.g. DSD,
   cannot be compressed by FLAC.

2.  Notation and Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   Values expressed as u(n) represent an unsigned big-endian integer
   using n bits. Values expressed as s(n) represent a signed big-endian
   integer using n bits, in two's complement representation. n may be
   expressed as an equation using * (multiplication), / (division),
   + (addition), or - (subtraction). An inclusive range of the number
   of bits expressed may be represented with an ellipsis, such as
   u(m...n). The name of a value followed by an asterisk * indicates
   zero or more occurrences of the value. The name of a value followed
   by a plus sign + indicates one or more occurrences of the value.

3.  Acknowledgments

   FLAC owes much to the many people who have advanced the audio
   compression field so freely. For instance:

   *  A. J.
      Robinson
      (https://web.archive.org/web/20160315141134/http://mi.eng.cam.ac.uk/~ajr/)
      for his work on Shorten; his paper ([robinson-tr156]) is a good
      starting point on some of the basic methods used by FLAC. FLAC
      trivially extends and improves the fixed predictors, LPC
      coefficient quantization, and Rice coding used in Shorten.

   *  S. W. Golomb
      (https://web.archive.org/web/20040215005354/http://csi.usc.edu/faculty/golomb.html)
      and Robert F. Rice; their universal codes are used by FLAC's
      entropy coder.

   *  N. Levinson and J. Durbin; the reference encoder uses an
      algorithm developed and refined by them for determining the LPC
      coefficients from the autocorrelation coefficients.

   *  And of course, Claude Shannon
      (https://en.wikipedia.org/wiki/Claude_Shannon)

4.  Definitions

   *  *Lossless compression*: reducing the amount of computer storage
      space needed to store data without needing to remove or
      irreversibly alter any of this data in doing so. In other words,
      decompressing losslessly compressed information returns exactly
      the original data.

   *  *Lossy compression*: like lossless compression, but instead
      removing, irreversibly altering or only approximating information
      for the purpose of further reducing the amount of computer storage
      space needed. In other words, decompressing lossy compressed
      information returns an approximation of the original data.

   *  *Block*: A (short) section of linear pulse-code modulated audio,
      with one or more channels.

   *  *Subblock*: All samples within a corresponding block for 1
      channel. One or more subblocks form a block, and all subblocks in
      a certain block contain the same number of samples.

   *  *Frame*: A frame header plus one or more subframes. It encodes
      the contents of a corresponding block.

   *  *Subframe*: An encoded subblock. All subframes within a frame
      code for the same number of samples.
      A subframe MAY correspond to a subblock; otherwise it corresponds
      to either the addition or subtraction of two subblocks (see the
      section on interchannel decorrelation
      (#interchannel-decorrelation)).

   *  *Blocksize*: The total number of samples contained in a block or
      coded in a frame, divided by the number of channels. In other
      words, the number of samples in any subblock of a block, or any
      subframe of a frame. This is also called *interchannel samples*.

   *  *Bit depth* or *bits per sample*: the number of bits used to
      contain each sample. This MUST be the same for all subblocks in a
      block but MAY be different for different subframes in a frame
      because of interchannel decorrelation
      (#interchannel-decorrelation).

   *  *Predictor*: a model used to predict samples in an audio signal
      based on past samples. FLAC uses such predictors to remove
      redundancy in a signal in order to be able to compress it.

   *  *Linear predictor*: a predictor using linear prediction
      (https://en.wikipedia.org/wiki/Linear_prediction). This is also
      called *linear predictive coding (LPC)*. With a linear predictor
      each prediction is a linear combination of past samples, hence the
      name. A linear predictor has a causal discrete-time finite
      impulse response
      (https://en.wikipedia.org/wiki/Finite_impulse_response).

   *  *Fixed predictor*: a linear predictor in which the model
      parameters are the same across all FLAC files, and thus need not
      be stored.

   *  *Predictor order*: the number of past samples that a predictor
      uses. For example, a 4th order predictor uses the 4 samples
      directly preceding a certain sample to predict it. In FLAC,
      samples used in a predictor are always consecutive, and are always
      the samples directly before the sample that is being predicted.

   *  *Residual*: The audio signal that remains after a predictor has
      been subtracted from a subblock.
      If the predictor has been able to remove redundancy from the
      signal, the samples of the remaining signal (the *residual
      samples*) will have, on average, a smaller magnitude than the
      samples of the original signal.

   *  *Rice code*: A variable-length code
      (https://en.wikipedia.org/wiki/Variable-length_code) which
      compresses data by making use of the observation that, after using
      an effective predictor, most residual samples are closer to zero
      than the original samples, while still allowing for a small part
      of the samples to be much larger.

5.  Conceptual overview

   Similar to many audio coders, a FLAC file is encoded following the
   steps below. On decoding a FLAC file, these steps are undone in
   reverse order, i.e. from bottom to top.

   *  Blocking (see section on Blocking (#blocking)). The input is
      split up into many contiguous blocks. With FLAC, the blocks MAY
      vary in size. The optimal size of the block is usually affected
      by many factors, including the sample rate, spectral
      characteristics over time, etc. However, as finding the optimal
      block size arrangement is a rather complex problem, the FLAC
      format allows for a constant block size throughout a stream as
      well.

   *  Interchannel Decorrelation (see section on Interchannel
      Decorrelation (#interchannel-decorrelation)). In the case of
      stereo streams, the FLAC format allows for transforming the
      left-right signal into a mid-side signal to remove redundancy, if
      there is any. Besides coding as left-right and mid-side, it is
      also possible to code left-side and side-right, whichever ordering
      results in the highest compression. Choosing between any of these
      transformations is done independently for each block.

   *  Prediction (see section on Prediction (#prediction)). To remove
      redundancy in a signal, a predictor is stored for each subblock or
      its transformation as formed in the previous step.
      A predictor consists of a simple mathematical description that can
      be used, as the name implies, to predict a certain sample from the
      samples that preceded it. As this prediction is rarely exact, the
      error of this prediction is passed to the next stage. The
      predictor of each subblock is completely independent from other
      subblocks. Since the methods of prediction are known to both the
      encoder and decoder, only the parameters of the predictor need be
      included in the compressed stream. In case no usable predictor
      can be found for a certain subblock, the signal is stored instead
      of compressed and the next stage is skipped.

   *  Residual Coding (see section on Residual Coding
      (#residual-coding)). As the predictor does not describe the
      signal exactly, the difference between the original signal and the
      predicted signal (called the error or residual signal) MUST be
      coded losslessly. If the predictor is effective, the residual
      signal will require fewer bits per sample than the original
      signal. FLAC uses Rice coding, a subset of Golomb coding, with
      either 4-bit or 5-bit parameters to code the residual signal.

   In addition, FLAC specifies a metadata system (see section on
   File-level metadata (#file-level-metadata)), which allows arbitrary
   information about the stream to be included at the beginning of the
   stream.

5.1.  Blocking

   The size used for blocking the audio data has a direct effect on the
   compression ratio. If the block size is too small, the resulting
   large number of frames means that excess bits will be wasted on frame
   headers. If the block size is too large, the characteristics of the
   signal may vary so much that the encoder will be unable to find a
   good predictor. In order to simplify encoder/decoder design, FLAC
   imposes a minimum block size of 16 samples, and a maximum block size
   of 65535 samples.
   This range covers the optimal size for all of the audio data FLAC
   supports.

   While the block size MAY vary in a FLAC file, it is often difficult
   to find the optimal arrangement of block sizes for maximum
   compression. Because of this the FLAC format explicitly stores
   whether a file has a constant or a variable blocksize throughout the
   stream, and stores a block number instead of a sample number to
   slightly improve compression in case a stream has a constant block
   size.

   Blocked data is passed to the predictor stage one subblock at a time.
   Each subblock is independently coded into a subframe, and the
   subframes are concatenated into a frame. Because each channel is
   coded separately, subframes MAY use different predictors, even within
   a frame.

5.2.  Interchannel Decorrelation

   In many audio files, channels are correlated. The FLAC format can
   exploit this correlation in stereo files by not directly coding
   subblocks into subframes, but instead coding an average of all
   samples in both subblocks (a mid channel) or the difference between
   all samples in both subblocks (a side channel). The following
   combinations are possible:

   *  *Independent*. All channels are coded independently. All
      non-stereo files MUST be encoded this way.

   *  *Mid-side*. A left and right subblock are converted to mid and
      side subframes. To calculate a sample for a mid subframe, the
      corresponding left and right samples are summed and the result is
      shifted right by 1 bit. To calculate a sample for a side
      subframe, the corresponding right sample is subtracted from the
      corresponding left sample. On decoding, the mid channel has to be
      shifted left by 1 bit. Also, if the side channel sample is odd, 1
      has to be added to the mid channel after the left shift.
      To reconstruct the left channel, the corresponding samples in the
      mid and side subframes are added and the result shifted right by 1
      bit, while for the right channel the side channel has to be
      subtracted from the mid channel and the result shifted right by 1
      bit.

   *  *Left-side*. The left subblock is coded and the left and right
      subblocks are used to code a side subframe. The side subframe is
      constructed in the same way as for mid-side. To decode, the right
      subblock is restored by subtracting the samples in the side
      subframe from the corresponding samples in the left subframe.

   *  *Right-side*. The right subblock is coded and the left and right
      subblocks are used to code a side subframe. Note that the actual
      coded subframe order is side-right. The side subframe is
      constructed in the same way as for mid-side. To decode, the left
      subblock is restored by adding the samples in the side subframe to
      the corresponding samples in the right subframe.

   The side channel needs one extra bit of bit depth as the subtraction
   can produce sample values twice as large as the maximum possible in
   any given bit depth. The mid channel in mid-side stereo does not
   need one extra bit, as it is shifted right one bit. The right shift
   of the mid channel does not make the coding lossy, because the sum of
   a left and a right sample is odd exactly when their difference is
   odd. The least significant bit lost in the shift can therefore be
   restored by taking it from the corresponding sample in the side
   subframe.

5.3.  Prediction

   The FLAC format has four methods for modeling the input signal:

   1.  *Verbatim*. Samples are stored directly, without any modeling.
       This method is used for inputs with little correlation like white
       noise.
       Since the raw signal is not actually passed through the residual
       coding stage (it is added to the stream 'verbatim'), the method
       is different from using a zero-order fixed predictor.

   2.  *Constant*. A single sample value is stored. This method is used
       whenever a signal is pure DC ("digital silence"), i.e. a constant
       value throughout.

   3.  *Fixed predictor*. Samples are predicted with one of five fixed
       (i.e. predefined) predictors; the error of this prediction is
       processed by the residual coder. These fixed predictors are well
       suited for predicting simple waveforms. Since the predictors are
       fixed, no predictor coefficients are stored. From a mathematical
       point of view, the predictors work by extrapolating the signal
       from the previous samples. The number of previous samples used
       is equal to the predictor order. For more information see the
       section on the fixed predictor subframe
       (#fixed-predictor-subframe).

   4.  *Linear predictor*. Samples are predicted using past samples and
       a set of predictor coefficients; the error of this prediction is
       processed by the residual coder. Compared to a fixed predictor,
       using a generic linear predictor adds overhead as predictor
       coefficients need to be stored. Therefore, this method of
       prediction is best suited for predicting more complex waveforms,
       where the added overhead is offset by space savings in the
       residual coding stage resulting from more accurate prediction. A
       linear predictor in FLAC has two parameters besides the predictor
       coefficients and the predictor order: the number of bits with
       which each coefficient is stored (the coefficient precision) and
       a prediction right shift. A prediction is formed by multiplying
       each predictor coefficient with the corresponding past sample,
       summing the products, and dividing that sum by applying the
       specified right shift.
       For more information see the section on the linear predictor
       subframe (#linear-predictor-subframe).

   For more information on fixed and linear predictors, see
   [HPL-1999-144] and [robinson-tr156].

5.4.  Residual Coding

   In case a subframe uses a predictor to approximate the audio signal,
   a residual needs to be stored to 'correct' the approximation to the
   exact value. When an effective predictor is used, the average
   numerical value of the residual samples is smaller than that of the
   samples before prediction. While having smaller values on average,
   it is possible that a few 'outlier' residual samples are much larger
   than any of the original samples. Sometimes these outliers even
   exceed the range the bit depth of the original audio offers.

   To be able to efficiently code such a stream of relatively small
   numbers with an occasional outlier, Rice coding (a subset of Golomb
   coding) is used. Depending on how small the numbers are that have to
   be coded, a Rice parameter is chosen. The numerical value of each
   residual sample is split in two parts by dividing it by 2^(Rice
   parameter), creating a quotient and a remainder. The quotient is
   stored in unary form, the remainder in binary form. If indeed most
   residual samples are close to zero and the Rice parameter is chosen
   right, this form of coding, a so-called variable-length code, needs
   fewer bits than storing the residual in unencoded form.

   As Rice codes can only handle unsigned numbers, signed numbers are
   zigzag encoded to a so-called folded residual. For a more thorough
   explanation, see the section on the coded residual
   (#coded-residual).

   Quite often the optimal Rice parameter varies over the course of a
   subframe. To accommodate this, the residual can be split up into
   partitions, where each partition has its own Rice parameter.
   To keep overhead and complexity low, the number of partitions used in
   a subframe is limited to powers of two.

   The FLAC format uses two forms of Rice coding, which only differ in
   the number of bits used for encoding the Rice parameter, either 4 or
   5 bits.

6.  Format principles

   FLAC has no format version information, but it does contain reserved
   space in several places. Future versions of the format MAY use this
   reserved space safely without breaking the format of older streams.
   Older decoders MAY choose to abort decoding or skip data encoded with
   newer methods. Apart from reserved patterns, in places the format
   specifies invalid patterns, meaning that the patterns MUST NOT
   appear in any valid bitstream, in any prior, present, or future
   versions of the format. These invalid patterns are usually used to
   make the synchronization mechanism more robust.

   All numbers used in a FLAC bitstream MUST be integers; there are no
   floating-point representations. All numbers MUST be big-endian
   coded, except the length field used in Vorbis comments, which MUST be
   little-endian coded. All numbers MUST be unsigned except linear
   predictor coefficients, the linear prediction shift and numbers which
   directly represent samples, which MUST be signed. None of these
   restrictions apply to application metadata blocks.

   All samples encoded to and decoded from the FLAC format MUST be in a
   signed representation.

   There are several ways to convert unsigned sample representations to
   signed sample representations, but the coding methods provided by the
   FLAC format work best on audio signals of which the numerical values
   of the samples are centered around zero, i.e. have no DC offset. In
   most unsigned audio formats, signals are centered around the midpoint
   of the range of the unsigned integer type used.
   If that is the case, all sample representations SHOULD be converted
   by first copying the number to a signed integer with sufficient range
   and then subtracting half of the range of the unsigned integer type,
   which should result in a signal with samples centered around 0.

7.  Format lay-out

   Before the formal description of the stream, an overview of the
   lay-out of a FLAC file might be helpful.

   *  A FLAC bitstream consists of the "fLaC" (i.e. 0x664C6143) marker
      at the beginning of the stream, followed by a mandatory metadata
      block (called the STREAMINFO block), any number of other metadata
      blocks, then the audio frames.

   *  FLAC supports up to 128 kinds of metadata blocks; currently the
      following are defined:

      -  STREAMINFO: This block has information about the whole stream,
         like sample rate, number of channels, total number of samples,
         etc. It MUST be present as the first metadata block in the
         stream. Other metadata blocks MAY follow, and a decoder will
         skip the ones it does not understand.

      -  PADDING: This block allows for an arbitrary amount of padding.
         The contents of a PADDING block have no meaning. This block is
         useful when it is known that metadata will be edited after
         encoding; the user can instruct the encoder to reserve a
         PADDING block of sufficient size so that when metadata is
         added, it will simply overwrite the padding (which is
         relatively quick) instead of having to insert it into the right
         place in the existing file (which would normally require
         rewriting the entire file).

      -  APPLICATION: This block is for use by third-party applications.
         The only mandatory field is a 32-bit identifier. This ID is
         granted upon request to an application by the FLAC maintainers.
         The remainder of the block is defined by the registered
         application.
         Visit the registration page (https://xiph.org/flac/id.html) if
         you would like to register an ID for your application with
         FLAC.

      -  SEEKTABLE: This is an OPTIONAL block for storing seek points.
         It is possible to seek to any given sample in a FLAC stream
         without a seek table, but the delay can be unpredictable since
         the bitrate can vary widely within a stream. By adding seek
         points to a stream, this delay can be significantly reduced.

         Each seek point takes 18 bytes, so seek points at 1% resolution
         within a stream (100 seek points) add less than 2 kilobytes.
         There can be only one SEEKTABLE in a stream, but the table can
         have any number of seek points. There is also a special
         'placeholder' seekpoint which will be ignored by decoders but
         which can be used to reserve space for future seek point
         insertion.

      -  VORBIS_COMMENT: This block is for storing a list of
         human-readable name/value pairs. Values are encoded using
         UTF-8. It is an implementation of the Vorbis comment
         specification (http://xiph.org/vorbis/doc/v-comment.html)
         (without the framing bit). This is the only officially
         supported tagging mechanism in FLAC. There MUST be at most one
         VORBIS_COMMENT block in a stream. In some external
         documentation, Vorbis comments are called FLAC tags to lessen
         confusion.

      -  CUESHEET: This block is for storing various information that
         can be used in a cue sheet. It supports track and index
         points, compatible with Red Book CD digital audio discs, as
         well as other CD-DA metadata such as media catalog number and
         track ISRCs. The CUESHEET block is especially useful for
         backing up CD-DA discs, but it can be used as a general purpose
         cueing mechanism for playback.

      -  PICTURE: This block is for storing pictures associated with the
         file, most commonly cover art from CDs. There MAY be more than
         one PICTURE block in a file.
         The picture format is similar to
         the APIC frame in ID3v2 (http://www.id3.org/id3v2.4.0-frames).
         The PICTURE block has a type, MIME type, and UTF-8 description
         like ID3v2, and supports external linking via URL (though this
         is discouraged). The differences are that there is no
         uniqueness constraint on the description field, and the MIME
         type is mandatory. The FLAC PICTURE block also includes the
         resolution, color depth, and palette size so that the client
         can search for a suitable picture without having to scan them
         all.

   *  The audio data is composed of one or more audio frames. Each
      frame consists of a frame header, which contains a sync code,
      information about the frame like the block size, sample rate,
      number of channels, et cetera, and an 8-bit CRC. The frame header
      also contains either the sample number of the first sample in the
      frame (for variable-blocksize streams), or the frame number (for
      fixed-blocksize streams). This allows for fast, sample-accurate
      seeking to be performed. Following the frame header are encoded
      subframes, one for each channel, and finally, the frame is zero-
      padded to a byte boundary. Each subframe has its own header that
      specifies how the subframe is encoded.

   *  Since a decoder MAY start decoding in the middle of a stream,
      there MUST be a method to determine the start of a frame. A
      15-bit sync code begins each frame. The sync code will not appear
      anywhere else in the frame header. However, since it can appear
      in the subframes, the decoder has two other ways of ensuring a
      correct sync. The first is to check that the rest of the frame
      header contains no invalid data. Even this is not foolproof since
      valid header patterns can still occur within the subframes. The
      decoder's final check is to generate an 8-bit CRC of the frame
      header and compare this to the CRC stored at the end of the frame
      header.

   *  Again, since a decoder MAY start decoding at an arbitrary frame in
      the stream, each frame header MUST contain some basic information
      about the stream because the decoder might not have access to the
      STREAMINFO metadata block at the start of the stream. This
      information includes sample rate, bits per sample, number of
      channels, etc. Since the frame header is pure overhead, it has a
      direct effect on the compression ratio. To keep the frame header
      as small as possible, FLAC uses lookup tables for the most
      commonly used values for frame parameters. For instance, the
      sample rate part of the frame header is specified using 4 bits.
      Eight of the bit patterns correspond to the commonly used sample
      rates of 8, 16, 22.05, 24, 32, 44.1, 48 or 96 kHz. However, odd
      sample rates can be specified by using one of the 'hint' bit
      patterns, directing the decoder to find the exact sample rate at
      the end of the frame header. The same method is used for
      specifying the block size and bits per sample. In this way, the
      frame header size stays small for all of the most common forms of
      audio data.

   *  Individual subframes (one for each channel) are coded separately
      within a frame, and appear serially in the stream. In other
      words, the encoded audio data is NOT channel-interleaved. This
      reduces decoder complexity at the cost of requiring larger decode
      buffers. Each subframe has its own header specifying the
      attributes of the subframe, like prediction method and order,
      residual coding parameters, etc. The header is followed by the
      encoded audio data for that channel.

8.  Format subset

   FLAC specifies a subset of itself as the Subset format. The purpose
   of this is to ensure that any streams encoded according to the Subset
   are truly "streamable", meaning that a decoder that cannot seek
   within the stream can still pick up in the middle of the stream and
   start decoding.
   It also makes hardware decoder implementations more
   practical by limiting the encoding parameters such that decoder
   buffer sizes and other resource requirements can be easily
   determined. *flac* generates Subset streams by default unless the
   "--lax" command-line option is used. The Subset makes the following
   limitations on what MAY be used in the stream:

   *  The blocksize bits (#blocksize-bits) in the frame header MUST be
      0b0001-0b1110. The blocksize MUST be <= 16384; if the sample rate
      is <= 48000 Hz, the blocksize MUST be <= 4608 = 2^9 * 3^2.

   *  The sample rate bits (#sample-rate-bits) in the frame header MUST
      be 0b0001-0b1110.

   *  The bit depth bits (#bit-depth-bits) in the frame header MUST be
      0b001-0b111.

   *  If the sample rate is <= 48000 Hz, the filter order in linear
      predictor subframes (see section linear predictor subframe
      (#linear-predictor-subframe)) MUST be less than or equal to 12,
      i.e. the subframe type bits in the subframe header (see subframe
      header section (#subframe-header)) MUST NOT be 0b101100-0b111111.

   *  The Rice partition order (see coded residual section
      (#coded-residual)) MUST be less than or equal to 8.

9.  File-level metadata

   At the start of a FLAC file or stream, following the fLaC ASCII file
   signature, one or more metadata blocks MUST be present before any
   audio frames appear. The first metadata block MUST be a streaminfo
   block.

9.1.  Metadata block header

   Each metadata block starts with a 4 byte header. The first bit in
   this header flags whether a metadata block is the last one: it is a 0
   when other metadata blocks follow, otherwise it is a 1. The 7
   remaining bits of the first header byte contain the type of the
   metadata block as an unsigned number between 0 and 126 according to
   the following table. A value of 127 (i.e. 0b1111111) is invalid.
   The three bytes that follow code for the size of the metadata block
   in bytes excluding the 4 header bytes as an unsigned number coded
   big-endian.

   +=========+====================================================+
   | Value   | Metadata block type                                |
   +=========+====================================================+
   | 0       | Streaminfo                                         |
   +---------+----------------------------------------------------+
   | 1       | Padding                                            |
   +---------+----------------------------------------------------+
   | 2       | Application                                        |
   +---------+----------------------------------------------------+
   | 3       | Seektable                                          |
   +---------+----------------------------------------------------+
   | 4       | Vorbis comment                                     |
   +---------+----------------------------------------------------+
   | 5       | Cuesheet                                           |
   +---------+----------------------------------------------------+
   | 6       | Picture                                            |
   +---------+----------------------------------------------------+
   | 7 - 126 | reserved                                           |
   +---------+----------------------------------------------------+
   | 127     | invalid, to avoid confusion with a frame sync code |
   +---------+----------------------------------------------------+

                                Table 1

9.2.  Streaminfo

   The streaminfo metadata block contains technical information about
   the FLAC stream relevant for decoding. Decoder behavior in case of
   incorrect or incomplete information is left unspecified (i.e. up to
   the decoder implementation). A decoder MAY choose to stop further
   decoding in case the information supplied by the streaminfo metadata
   block turns out to be incorrect or invalid. A decoder accepting
   information from the streaminfo block (most significantly the maximum
   frame size, maximum block size, number of audio channels, number of
   bits per sample and total number of samples) without doing further
   checks during decoding of audio frames could be vulnerable to buffer
   overflows.
   See also the section on security considerations
   (#security-considerations).

   +========+===================================================+
   | Data   | Description                                       |
   +========+===================================================+
   | u(16)  | The minimum block size (in samples) used in the   |
   |        | stream, excluding the last block.                 |
   +--------+---------------------------------------------------+
   | u(16)  | The maximum block size (in samples) used in the   |
   |        | stream.                                           |
   +--------+---------------------------------------------------+
   | u(24)  | The minimum frame size (in bytes) used in the     |
   |        | stream. A value of 0 signifies that the value is  |
   |        | not known.                                        |
   +--------+---------------------------------------------------+
   | u(24)  | The maximum frame size (in bytes) used in the     |
   |        | stream. A value of 0 signifies that the value is  |
   |        | not known.                                        |
   +--------+---------------------------------------------------+
   | u(20)  | Sample rate in Hz. Though 20 bits are available,  |
   |        | the maximum sample rate is limited by the         |
   |        | structure of frame headers to 655350 Hz. Also, a  |
   |        | value of 0 is invalid.                            |
   +--------+---------------------------------------------------+
   | u(3)   | (number of channels)-1. FLAC supports from 1 to   |
   |        | 8 channels                                        |
   +--------+---------------------------------------------------+
   | u(5)   | (bits per sample)-1. FLAC supports from 4 to 32   |
   |        | bits per sample. Currently the reference encoder  |
   |        | and decoders only support up to 24 bits per       |
   |        | sample.                                           |
   +--------+---------------------------------------------------+
   | u(36)  | Total samples in stream. 'Samples' means inter-   |
   |        | channel sample, i.e. one second of 44.1 kHz audio |
   |        | will have 44100 samples regardless of the number  |
   |        | of channels. A value of zero here means the       |
   |        | number of total samples is unknown.               |
   +--------+---------------------------------------------------+
   | u(128) | MD5 signature of the unencoded audio data. This   |
   |        | allows the decoder to determine if an error       |
   |        | exists in the audio data even when the error does |
   |        | not result in an invalid bitstream. A value of 0  |
   |        | signifies that the value is not known.            |
   +--------+---------------------------------------------------+

                               Table 2

   The minimum block size excludes the last block of a FLAC file,
   which may be smaller. If the minimum block size is equal to the
   maximum block size, the file contains a fixed block size stream.
   Note that the actual maximum block size might be smaller than the
   maximum block size listed in the streaminfo block, and the actual
   smallest block size excluding the last block might be larger than the
   minimum block size listed in the streaminfo block. This is because
   the encoder has to write these fields before receiving any input
   audio data, and cannot know beforehand what block sizes it will use,
   only between what bounds these will be chosen.

   FLAC specifies a minimum block size of 16 and a maximum block size of
   65535, meaning the bit patterns corresponding to the numbers 0-15 in
   the minimum block size and maximum block size fields are invalid.

   The MD5 signature is made by performing an MD5 transformation on the
   samples of all channels interleaved, represented in signed, little-
   endian form. This interleaving is on a per-sample basis, so for a
   stereo file this means first the first sample of the first channel,
   then the first sample of the second channel, then the second sample
   of the first channel etc. Before performing the MD5 transformation,
   all samples must be byte-aligned.
   So, in case the bit depth is not a
   whole number of bytes, additional zero bits are inserted at the most-
   significant position until each sample representation is a whole
   number of bytes.

9.3.  Padding

   +======+========================================+
   | Data | Description                            |
   +======+========================================+
   | u(n) | n '0' bits (n MUST be a multiple of 8) |
   +------+----------------------------------------+

                        Table 3

9.4.  Application

   +=======+===========================================+
   | Data  | Description                               |
   +=======+===========================================+
   | u(32) | Registered application ID. (Visit the     |
   |       | registration page (https://xiph.org/flac/ |
   |       | id.html) to register an ID with FLAC.)    |
   +-------+-------------------------------------------+
   | u(n)  | Application data (n MUST be a multiple of |
   |       | 8)                                        |
   +-------+-------------------------------------------+

                          Table 4

9.5.  Seektable

   +============+==========================+
   | Data       | Description              |
   +============+==========================+
   | SEEKPOINT+ | One or more seek points. |
   +------------+--------------------------+

                    Table 5

   NOTE - The number of seek points is implied by the metadata header
   'length' field, i.e. equal to length / 18.

9.5.1.  Seekpoint

   +=======+==========================================================+
   | Data  | Description                                              |
   +=======+==========================================================+
   | u(64) | Sample number of first sample in the target frame, or    |
   |       | 0xFFFFFFFFFFFFFFFF for a placeholder point.              |
   +-------+----------------------------------------------------------+
   | u(64) | Offset (in bytes) from the first byte of the first frame |
   |       | header to the first byte of the target frame's header.   |
   +-------+----------------------------------------------------------+
   | u(16) | Number of samples in the target frame.                   |
   +-------+----------------------------------------------------------+

                                 Table 6

   NOTES

   *  For placeholder points, the second and third field values are
      undefined.

   *  Seek points within a table MUST be sorted in ascending order by
      sample number.

   *  Seek points within a table MUST be unique by sample number, with
      the exception of placeholder points.

   *  The previous two notes imply that there MAY be any number of
      placeholder points, but they MUST all occur at the end of the
      table.

9.6.  Vorbis comment

   A vorbis comment metadata block contains human-readable information
   coded in UTF-8. The name vorbis comment points to the fact that the
   vorbis codec stores such metadata in almost the same way. A vorbis
   comment metadata block consists of a vendor string optionally
   followed by a number of fields, which are pairs of field names and
   field contents. Many users refer to these fields as FLAC tags or
   simply as tags. A FLAC file MUST NOT contain more than one vorbis
   comment metadata block.

   In a vorbis comment metadata block, the metadata block header is
   directly followed by 4 bytes containing the length in bytes of the
   vendor string as an unsigned number coded little-endian. The vendor
   string follows UTF-8 coded, and is not terminated in any way.

   Following the vendor string are 4 bytes containing the number of
   fields that are in the vorbis comment block, stored as an unsigned
   number, coded little-endian. If this number is non-zero, it is
   followed by the fields themselves, each field stored with a 4 byte
   length. First, the 4 byte field length in bytes is stored as an
   unsigned number, coded little-endian. The field itself is, like the
   vendor string, UTF-8 coded, not terminated in any way.
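
   The layout described above can be illustrated with a short sketch.
   It is not part of the format definition: the function name
   parse_vorbis_comment, the helper _length_prefixed and the example
   strings are hypothetical, and the input is assumed to be the body of
   a vorbis comment metadata block with the 4 byte metadata block
   header already removed.

```python
import struct


def parse_vorbis_comment(body: bytes):
    """Split a vorbis comment block body into (vendor, list of fields).

    Assumption: `body` excludes the 4 byte metadata block header.
    """
    pos = 0

    def read_u32_le() -> int:
        # All lengths and counts in this block are 32-bit little-endian.
        nonlocal pos
        if pos + 4 > len(body):
            raise ValueError("truncated length field")
        (value,) = struct.unpack_from("<I", body, pos)
        pos += 4
        return value

    def read_string() -> str:
        # A string is a 4 byte length followed by unterminated UTF-8.
        nonlocal pos
        length = read_u32_le()
        if pos + length > len(body):
            raise ValueError("string runs past the end of the block")
        text = body[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    field_count = read_u32_le()
    fields = [read_string() for _ in range(field_count)]
    return vendor, fields


def _length_prefixed(text: str) -> bytes:
    # Hypothetical helper used only to build the example input below.
    data = text.encode("utf-8")
    return struct.pack("<I", len(data)) + data


example = (
    _length_prefixed("demo vendor")
    + struct.pack("<I", 2)
    + _length_prefixed("TITLE=Example")
    + _length_prefixed("ARTIST=Someone")
)
vendor, fields = parse_vorbis_comment(example)
```

   The bounds checks reflect that the length fields come from the file
   and cannot be trusted; a reader that skips them could read past the
   end of the block.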

   Each field consists of a field name and a field content, separated by
   an = character. The field name MUST only consist of UTF-8 code
   points U+0020 through U+007E, excluding U+003D, which is the =
   character. In other words, the field name can contain all printable
   ASCII characters except the equals sign. The evaluation of the field
   names MUST be case insensitive, so U+0041 through U+005A (A-Z) MUST
   be considered equivalent to U+0061 through U+007A (a-z) respectively.
   The field contents can contain any UTF-8 character.

   Note that the vorbis comment as used in vorbis allows for on the
   order of 2^64 bytes of data whereas the FLAC metadata block is
   limited to 2^24 bytes. Given the stated purpose of vorbis comments,
   i.e. human-readable textual information, this limit is unlikely to be
   restrictive. Also note that the 32-bit field lengths are coded
   little-endian, as opposed to the usual big-endian coding of fixed-
   length integers in the rest of the FLAC format.

9.6.1.  Standard field names

   Except the one defined in the section channel mask (#channel-mask),
   no standard field names are defined. In general, most software
   recognizes the following field names:

   *  Title: name of the current work

   *  Artist: name of the artist generally responsible for the current
      work. For orchestral works this is usually the composer,
      otherwise it is often the performer

   *  Album: name of the collection the current work belongs to

   For a more comprehensive list of possible field names, the list of
   tags used in the MusicBrainz project
   (http://picard-docs.musicbrainz.org/en/variables/variables.html) is
   recommended.

9.6.2.  Channel mask

   Besides fields containing information about the work itself, one
   field is defined for technical reasons, of which the field name is
   WAVEFORMATEXTENSIBLE_CHANNEL_MASK. This field contains information
   on which channels the file contains.
   Use of this field is
   RECOMMENDED in case these differ from the channels defined in the
   section channels bits (#channels-bits).

   The channel mask consists of flag bits indicating which channels are
   present, stored in a hexadecimal representation preceded by 0x. The
   flags only signal which channels are present, not their order, so if
   the channels of an input are ordered differently from the table
   below, they have to be reordered before encoding. Please note that a
   file in which the channel order is defined through the
   WAVEFORMATEXTENSIBLE_CHANNEL_MASK is not streamable, i.e. non-subset,
   as the field is not found in each frame header. The mask bits can be
   found in the following table:

   +============+=============================+
   | Bit number | Channel description         |
   +============+=============================+
   | 0          | Front left                  |
   +------------+-----------------------------+
   | 1          | Front right                 |
   +------------+-----------------------------+
   | 2          | Front center                |
   +------------+-----------------------------+
   | 3          | Low-frequency effects (LFE) |
   +------------+-----------------------------+
   | 4          | Back left                   |
   +------------+-----------------------------+
   | 5          | Back right                  |
   +------------+-----------------------------+
   | 6          | Front left of center        |
   +------------+-----------------------------+
   | 7          | Front right of center       |
   +------------+-----------------------------+
   | 8          | Back center                 |
   +------------+-----------------------------+
   | 9          | Side left                   |
   +------------+-----------------------------+
   | 10         | Side right                  |
   +------------+-----------------------------+
   | 11         | Top center                  |
   +------------+-----------------------------+
   | 12         | Top front left              |
   +------------+-----------------------------+
   | 13         | Top front center            |
   +------------+-----------------------------+
   | 14         | Top front right             |
   +------------+-----------------------------+
   | 15         | Top rear left               |
   +------------+-----------------------------+
   | 16         | Top rear center             |
   +------------+-----------------------------+
   | 17         | Top rear right              |
   +------------+-----------------------------+

                     Table 7

   Following are 3 examples:

   *  if a file has a single channel, being an LFE channel, the vorbis
      comment field is WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x8

   *  if a file has 4 channels, being front left, front right, top front
      left and top front right, the vorbis comment field is
      WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x5003

   *  if an input has 4 channels, being back center, top front center,
      front center and top rear center in that order, they have to be
      reordered to front center, back center, top front center and top
      rear center. These correspond to bits 2, 8, 13 and 16, so the
      vorbis comment field added is
      WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x12104.

   WAVEFORMATEXTENSIBLE_CHANNEL_MASK fields MAY be padded with zeros,
   for example, 0x0008 for a single LFE channel. Parsing of
   WAVEFORMATEXTENSIBLE_CHANNEL_MASK fields MUST be case-insensitive for
   both the field name and the field contents.

9.7.  Cuesheet

   To either store the track and index point structure of a CD-DA along
   with its audio or to provide a mechanism to store locations of
   interest within a FLAC file, a cuesheet metadata block can be used.
   Certain aspects of this metadata block follow directly from the CD-DA
   specification, called Red Book, which is standardized as
   [IEC.60908.1999]. For more information on the function and history
   of these aspects, please refer to [IEC.60908.1999].

   The structure of a cuesheet metadata block is enumerated in the
   following table.

   +============+================================================+
   | Data       | Description                                    |
   +============+================================================+
   | u(128*8)   | Media catalog number, in ASCII printable       |
   |            | characters 0x20-0x7E.                          |
   +------------+------------------------------------------------+
   | u(64)      | Number of lead-in samples.                     |
   +------------+------------------------------------------------+
   | u(1)       | 1 if the cuesheet corresponds to a Compact     |
   |            | Disc, else 0.                                  |
   +------------+------------------------------------------------+
   | u(7+258*8) | Reserved. All bits MUST be set to zero.        |
   +------------+------------------------------------------------+
   | u(8)       | Number of tracks in this cuesheet.             |
   +------------+------------------------------------------------+
   | Cuesheet   | A number of structures as specified in the     |
   | tracks     | section cuesheet track (#cuesheet-track) equal |
   |            | to the number of tracks specified previously.  |
   +------------+------------------------------------------------+

                               Table 8

   If the media catalog number is less than 128 bytes long, it SHOULD be
   right-padded with NUL characters. For CD-DA, this is a thirteen
   digit number, followed by 115 NUL bytes.

   The number of lead-in samples has meaning only for CD-DA cuesheets;
   for other uses it SHOULD be 0. For CD-DA, the lead-in is the TRACK
   00 area where the table of contents is stored; more precisely, it is
   the number of samples from the first sample of the media to the first
   sample of the first index point of the first track. According to
   [IEC.60908.1999], the lead-in MUST be silence and CD grabbing
   software does not usually store it; additionally, the lead-in MUST be
   at least two seconds but MAY be longer. For these reasons the lead-
   in length is stored here so that the absolute position of the first
   track can be computed.
   Note that the lead-in stored here is the
   number of samples up to the first index point of the first track, not
   necessarily to INDEX 01 of the first track; even the first track MAY
   have INDEX 00 data.

   The number of tracks MUST be at least 1, as a cuesheet block MUST
   have a lead-out track. For CD-DA, this number MUST be no more than
   100 (99 regular tracks and one lead-out track). The lead-out track
   is always the last track in the cuesheet. For CD-DA, the lead-out
   track number MUST be 170 as specified by [IEC.60908.1999], otherwise
   it MUST be 255.

9.7.1.  Cuesheet track

   +===========+=======================================================+
   | Data      | Description                                           |
   +===========+=======================================================+
   | u(64)     | Track offset of first index point in                  |
   |           | samples, relative to the beginning of the             |
   |           | FLAC audio stream.                                    |
   +-----------+-------------------------------------------------------+
   | u(8)      | Track number.                                         |
   +-----------+-------------------------------------------------------+
   | u(12*8)   | Track ISRC.                                           |
   +-----------+-------------------------------------------------------+
   | u(1)      | The track type: 0 for audio, 1 for non-               |
   |           | audio. This corresponds to the CD-DA                  |
   |           | Q-channel control bit 3.                              |
   +-----------+-------------------------------------------------------+
   | u(1)      | The pre-emphasis flag: 0 for no pre-                  |
   |           | emphasis, 1 for pre-emphasis. This                    |
   |           | corresponds to the CD-DA Q-channel control            |
   |           | bit 5.                                                |
   +-----------+-------------------------------------------------------+
   | u(6+13*8) | Reserved. All bits MUST be set to zero.               |
   +-----------+-------------------------------------------------------+
   | u(8)      | The number of track index points.                     |
   +-----------+-------------------------------------------------------+
   | Cuesheet  | For all tracks except the lead-out track, a           |
   | track     | number of structures as specified in the              |
   | index     | section cuesheet track index point                    |
   | points    | (#cuesheet-track-index-point) equal to the            |
   |           | number of index points specified previously.          |
   +-----------+-------------------------------------------------------+

                                  Table 9

   Note that the track offset differs from the one in CD-DA, where the
   track's offset in the TOC is that of the track's INDEX 01 even if
   there is an INDEX 00. For CD-DA, the track offset MUST be evenly
   divisible by 588 samples (588 samples = 44100 samples/s * 1/75 s).

   A track number of 0 is not allowed to avoid conflicting with the CD-
   DA spec, which reserves this for the lead-in. For CD-DA the number
   MUST be 1-99, or 170 for the lead-out; for non-CD-DA, the track
   number MUST be 255 for the lead-out. It is RECOMMENDED to start
   with track 1 and increase sequentially. Track numbers MUST be unique
   within a cuesheet.

   The track ISRC (International Standard Recording Code) is a 12-digit
   alphanumeric code; see [ISRC-handbook]. A value of 12 ASCII NUL
   characters MAY be used to denote absence of an ISRC.

   There MUST be at least one index point in every track in a cuesheet
   except for the lead-out track, which MUST have zero. For CD-DA, the
   number of index points SHOULD NOT be more than 100.

9.7.1.1.  Cuesheet track index point

   +========+====================================+
   | Data   | Description                        |
   +========+====================================+
   | u(64)  | Offset in samples, relative to the |
   |        | track offset, of the index point.  |
   +--------+------------------------------------+
   | u(8)   | The track index point number.      |
   +--------+------------------------------------+
   | u(3*8) | Reserved. All bits MUST be set to  |
   |        | zero.                              |
   +--------+------------------------------------+

                       Table 10

   For CD-DA, the track index point offset MUST be evenly divisible by
   588 samples (588 samples = 44100 samples/s * 1/75 s). Note that the
   offset is from the beginning of the track, not the beginning of the
   audio data.

   For CD-DA, a track index point number of 0 corresponds to the track
   pre-gap. The first index point in a track MUST have a number of 0 or
   1, and subsequently, index point numbers MUST increase by 1. Index
   point numbers MUST be unique within a track.

9.8.  Picture

   The picture metadata block contains image data of a picture in some
   way belonging to the audio contained in the FLAC file. Its format is
   derived from the APIC frame in the ID3v2 specification. However,
   contrary to the APIC frame in ID3v2, the MIME type and description
   are preceded by a 4-byte length field instead of being null
   delimited strings. A FLAC file MAY contain one or more picture
   metadata blocks.

   Note that while the length fields for MIME type, description and
   picture data are 4 bytes in length and could in theory code for a
   size up to 4 GiB, the total metadata block size cannot exceed what
   can be described by the metadata block header, i.e. 16 MiB.

   +========+==================================================+
   | Data   | Description                                      |
   +========+==================================================+
   | u(32)  | The picture type according to the next table     |
   +--------+--------------------------------------------------+
   | u(32)  | The length of the MIME type string in bytes.     |
   +--------+--------------------------------------------------+
   | u(n*8) | The MIME type string, in printable ASCII         |
   |        | characters 0x20-0x7E. The MIME type MAY also be  |
   |        | --> to signify that the data part is a URL of    |
   |        | the picture instead of the picture data itself.  |
   +--------+--------------------------------------------------+
   | u(32)  | The length of the description string in bytes.   |
   +--------+--------------------------------------------------+
   | u(n*8) | The description of the picture, in UTF-8.        |
   +--------+--------------------------------------------------+
   | u(32)  | The width of the picture in pixels.              |
   +--------+--------------------------------------------------+
   | u(32)  | The height of the picture in pixels.             |
   +--------+--------------------------------------------------+
   | u(32)  | The color depth of the picture in bits-per-      |
   |        | pixel.                                           |
   +--------+--------------------------------------------------+
   | u(32)  | For indexed-color pictures (e.g. GIF), the       |
   |        | number of colors used, or 0 for non-indexed      |
   |        | pictures.                                        |
   +--------+--------------------------------------------------+
   | u(32)  | The length of the picture data in bytes.         |
   +--------+--------------------------------------------------+
   | u(n*8) | The binary picture data.                         |
   +--------+--------------------------------------------------+

                              Table 11

   The following table contains all defined picture types. Values other
   than those listed in the table are reserved and SHOULD NOT be used.
   There MAY only be one each of picture type 1 and 2 in a file. In
   general practice, many decoders display the contents of a picture
   metadata block with picture type 3 (front cover) during playback, if
   present.

   +=======+=================================================+
   | Value | Picture type                                    |
   +=======+=================================================+
   | 0     | Other                                           |
   +-------+-------------------------------------------------+
   | 1     | PNG file icon of 32x32 pixels                   |
   +-------+-------------------------------------------------+
   | 2     | General file icon                               |
   +-------+-------------------------------------------------+
   | 3     | Front cover                                     |
   +-------+-------------------------------------------------+
   | 4     | Back cover                                      |
   +-------+-------------------------------------------------+
   | 5     | Liner notes page                                |
   +-------+-------------------------------------------------+
   | 6     | Media label (e.g. CD, Vinyl or Cassette label)  |
   +-------+-------------------------------------------------+
   | 7     | Lead artist, lead performer or soloist          |
   +-------+-------------------------------------------------+
   | 8     | Artist or performer                             |
   +-------+-------------------------------------------------+
   | 9     | Conductor                                       |
   +-------+-------------------------------------------------+
   | 10    | Band or orchestra                               |
   +-------+-------------------------------------------------+
   | 11    | Composer                                        |
   +-------+-------------------------------------------------+
   | 12    | Lyricist or text writer                         |
   +-------+-------------------------------------------------+
   | 13    | Recording location                              |
   +-------+-------------------------------------------------+
   | 14    | During recording                                |
   +-------+-------------------------------------------------+
   | 15    | During performance                              |
   +-------+-------------------------------------------------+
   | 16    | Movie or video screen capture                   |
   +-------+-------------------------------------------------+
   | 17    | A bright colored fish                           |
   +-------+-------------------------------------------------+
   | 18    | Illustration                                    |
+-------+-------------------------------------------------+
| 19    | Band or artist logotype                         |
+-------+-------------------------------------------------+
| 20    | Publisher or studio logotype                    |
+-------+-------------------------------------------------+

Table 12

10. Frame structure

Directly after the last metadata block, one or more frames follow.
Each frame consists of a frame header, one or more subframes, padding
zero bits to achieve byte-alignment and a frame footer. The number
of subframes in each frame is equal to the number of audio channels.

10.1. Frame header

Each frame starts with the 15-bit frame sync code 0b111111111111100.
Following the sync code is the blocking strategy bit, which MUST NOT
change during the audio stream. The blocking strategy bit is 0 for a
fixed blocksize stream or 1 for a variable blocksize stream. If the
blocking strategy is known, a decoder can search for a 16-bit sync
code, either 0xFFF8 for a fixed blocksize stream or 0xFFF9 for a
variable blocksize stream. To ease the search for the sync code and
to further reduce false positives, all frames MUST start on a byte
boundary.

10.1.1. Blocksize bits

Following the frame sync code and blocking strategy bit are 4 bits
referred to as the blocksize bits. Their value relates to the
blocksize according to the following table, where v is the value of
the 4 bits as an unsigned number. Uncommon blocksizes are stored
after the coded number.

+=================+===========================================+
| Value           | Blocksize                                 |
+=================+===========================================+
| 0b0000          | reserved                                  |
+-----------------+-------------------------------------------+
| 0b0001          | 192                                       |
+-----------------+-------------------------------------------+
| 0b0010 - 0b0101 | 144 * (2^v), i.e. 576, 1152, 2304 or 4608 |
+-----------------+-------------------------------------------+
| 0b0110          | uncommon blocksize minus 1 stored as an   |
|                 | 8-bit number                              |
+-----------------+-------------------------------------------+
| 0b0111          | uncommon blocksize minus 1 stored as a    |
|                 | 16-bit number                             |
+-----------------+-------------------------------------------+
| 0b1000 - 0b1111 | 2^v, i.e. 256, 512, 1024, 2048, 4096,     |
|                 | 8192, 16384 or 32768                      |
+-----------------+-------------------------------------------+

Table 13

10.1.2. Sample rate bits

The next 4 bits, referred to as the sample rate bits, contain the
sample rate according to the following table.

+========+=======================================================+
| Value  | Sample rate                                           |
+========+=======================================================+
| 0b0000 | sample rate only stored in streaminfo metadata block  |
+--------+-------------------------------------------------------+
| 0b0001 | 88.2 kHz                                              |
+--------+-------------------------------------------------------+
| 0b0010 | 176.4 kHz                                             |
+--------+-------------------------------------------------------+
| 0b0011 | 192 kHz                                               |
+--------+-------------------------------------------------------+
| 0b0100 | 8 kHz                                                 |
+--------+-------------------------------------------------------+
| 0b0101 | 16 kHz                                                |
+--------+-------------------------------------------------------+
| 0b0110 | 22.05 kHz                                             |
+--------+-------------------------------------------------------+
| 0b0111 | 24 kHz                                                |
+--------+-------------------------------------------------------+
| 0b1000 | 32 kHz                                                |
+--------+-------------------------------------------------------+
| 0b1001 | 44.1 kHz                                              |
+--------+-------------------------------------------------------+
| 0b1010 | 48 kHz                                                |
+--------+-------------------------------------------------------+
| 0b1011 | 96 kHz                                                |
+--------+-------------------------------------------------------+
| 0b1100 | uncommon sample rate in kHz stored as an 8-bit number |
+--------+-------------------------------------------------------+
| 0b1101 | uncommon sample rate in Hz stored as a 16-bit number  |
+--------+-------------------------------------------------------+
| 0b1110 | uncommon sample rate in Hz divided by 10, stored as a |
|        | 16-bit number                                         |
+--------+-------------------------------------------------------+
| 0b1111 | invalid                                               |
+--------+-------------------------------------------------------+

Table 14

10.1.3. Channels bits

The next 4 bits (the first 4 bits of the fourth byte of each frame),
referred to as the channel bits, code for both the number of channels
as well as any stereo decorrelation used according to the following
table, where v is the value of the 4 bits as an unsigned number. See
also the section on interchannel decorrelation
(#interchannel-decorrelation).

+==========+====================================================+
| Value    | Channels                                           |
+==========+====================================================+
| 0b0000   | 1 channel: mono                                    |
+----------+----------------------------------------------------+
| 0b0001   | 2 channels: left, right                            |
+----------+----------------------------------------------------+
| 0b0010   | 3 channels: left, right, center                    |
+----------+----------------------------------------------------+
| 0b0011   | 4 channels: front left, front right, back left,    |
|          | back right                                         |
+----------+----------------------------------------------------+
| 0b0100   | 5 channels: front left, front right, front center, |
|          | back/surround left, back/surround right            |
+----------+----------------------------------------------------+
| 0b0101   | 6 channels: front left, front right, front center, |
|          | LFE, back/surround left, back/surround right       |
+----------+----------------------------------------------------+
| 0b0110   | 7 channels: front left, front right, front center, |
|          | LFE, back center, side left, side right            |
+----------+----------------------------------------------------+
| 0b0111   | 8 channels: front left, front right, front center, |
|          | LFE, back left, back right, side left, side right  |
+----------+----------------------------------------------------+
| 0b1000   | 2 channels, stored as left/side stereo             |
+----------+----------------------------------------------------+
| 0b1001   | 2 channels, stored as right/side stereo            |
+----------+----------------------------------------------------+
| 0b1010   | 2 channels, stored as mid/side stereo              |
+----------+----------------------------------------------------+
| 0b1011 - | reserved                                           |
| 0b1111   |                                                    |
+----------+----------------------------------------------------+

Table 15

10.1.4. Bit depth bits

The next 3 bits code for the bit depth of the samples in the subframe
according to the following table.

+=======+====================================================+
| Value | Bit depth                                          |
+=======+====================================================+
| 0b000 | bit depth only stored in streaminfo metadata block |
+-------+----------------------------------------------------+
| 0b001 | 8 bits per sample                                  |
+-------+----------------------------------------------------+
| 0b010 | 12 bits per sample                                 |
+-------+----------------------------------------------------+
| 0b011 | reserved                                           |
+-------+----------------------------------------------------+
| 0b100 | 16 bits per sample                                 |
+-------+----------------------------------------------------+
| 0b101 | 20 bits per sample                                 |
+-------+----------------------------------------------------+
| 0b110 | 24 bits per sample                                 |
+-------+----------------------------------------------------+
| 0b111 | reserved                                           |
+-------+----------------------------------------------------+

Table 16

The next bit is reserved and MUST be zero.

10.1.5. Coded number

Following the reserved bit (starting at the fifth byte of the frame)
is either a sample or a frame number, which will be referred to as
the coded number. When dealing with variable blocksize streams, the
sample number of the first sample in the frame is encoded. When the
file contains a fixed blocksize stream, the frame number is encoded.
The coded number is stored in a variable length code like UTF-8, but
extended to a maximum of 36 bits unencoded, 7 bytes encoded. When a
frame number is encoded, the value MUST NOT be larger than what fits
in 31 bits unencoded or 6 bytes encoded. Please note that most
general purpose UTF-8 encoders and decoders will not be able to
handle these extended codes.

10.1.6. Uncommon blocksize

If the blocksize bits defined earlier in this section were 0b0110 or
0b0111 (uncommon blocksize minus 1 stored), this follows the coded
number as either an 8-bit or a 16-bit unsigned number coded
big-endian.

10.1.7. Uncommon sample rate

Following either the coded number or an uncommon blocksize is the
sample rate, if the sample rate bits were 0b1100, 0b1101 or 0b1110
(uncommon sample rate stored), as either an 8-bit or a 16-bit
unsigned number coded big-endian.

10.1.8. Frame header CRC

Finally, after either the frame/sample number, the blocksize or the
sample rate, is an 8-bit CRC. This CRC is initialized with 0 and has
the polynomial x^8 + x^2 + x^1 + x^0. This CRC covers the whole
frame header before the CRC, including the sync code.

10.2. Subframes

Following the frame header are a number of subframes equal to the
number of audio channels.

10.2.1. Subframe header

Each subframe starts with a header. The first bit of the header is
always 0, followed by 6 bits describing which subframe type is used
according to the following table, where v is the value of the 6 bits
as an unsigned number.

+=====================+=====================================+
| Value               | Subframe type                       |
+=====================+=====================================+
| 0b000000            | Constant subframe                   |
+---------------------+-------------------------------------+
| 0b000001            | Verbatim subframe                   |
+---------------------+-------------------------------------+
| 0b000010 - 0b000111 | reserved                            |
+---------------------+-------------------------------------+
| 0b001000 - 0b001100 | Subframe with a fixed predictor     |
|                     | v-8, i.e. 0, 1, 2, 3 or 4           |
+---------------------+-------------------------------------+
| 0b001101 - 0b011111 | reserved                            |
+---------------------+-------------------------------------+
| 0b100000 - 0b111111 | Subframe with a linear predictor    |
|                     | v-31, i.e. 1 through 32 (inclusive) |
+---------------------+-------------------------------------+

Table 17

Following the subframe type bits is a bit that flags whether the
subframe has any wasted bits. If it is 0, the subframe doesn't have
any wasted bits and the subframe header is complete. If it is 1, the
subframe does have wasted bits and the number of wasted bits follows
unary coded.

10.2.2. Wasted bits per sample

Certain file formats, like AIFF, can store audio samples with a bit
depth that is not an integer number of bytes by padding them with
least significant zero bits to a bit depth that is an integer number
of bytes. For example, shifting a 14-bit sample left by 2 pads it
to a 16-bit sample, which then has two zero least-significant bits.
In this specification, these least-significant zero bits are referred
to as wasted bits-per-sample or simply wasted bits. They are wasted
in the sense that they contain no information, but are stored anyway.

The wasted bits-per-sample flag in a subframe header is set to 1 if a
certain number of least-significant bits of all samples in the
current subframe are zero. If this is the case, the number of wasted
bits-per-sample (k) minus 1 follows the flag in a unary encoding.
For example, if k is 3, 0b001 follows. If k = 0, the wasted
bits-per-sample flag is 0 and no unary coded k follows.

In case k is not equal to 0, samples are coded ignoring k
least-significant bits. For example, if the preceding frame header
specified a sample size of 16 bits per sample and k is 3, samples in
the subframe are coded as 13 bits per sample.
A decoder MUST add k least-significant zero bits by shifting left
(padding) after decoding a subframe sample. In case the frame has
left/side, right/side or mid/side stereo, padding MUST happen to a
sample before it is used to reconstruct a left or right sample.

Besides audio files that have a certain number of wasted bits for the
whole file, there exist audio files in which the number of wasted
bits varies. There are DVD-Audio discs in which blocks of samples
have had their least-significant bits selectively zeroed, so as to
slightly improve the compression of their otherwise lossless Meridian
Lossless Packing codec. There are also audio processors like
lossyWAV that enable users to improve compression of their files by a
lossless audio codec in a non-lossless way. Because of this the
number of wasted bits k MAY change between frames and MAY differ
between subframes.

10.2.3. Constant subframe

In a constant subframe only a single sample is stored. This sample
is stored as an integer number coded big-endian, signed two's
complement. The number of bits used to store this sample depends on
the bit depth of the current subframe. The bit depth of a subframe
is equal to the bit depth as coded in the frame header
(#bit-depth-bits), minus the number of wasted bits coded in the
subframe header (#wasted-bits-per-sample). In case a subframe is a
side subframe (see the section on interchannel decorrelation
(#interchannel-decorrelation)), the bit depth of that subframe is
increased by 1 bit.

10.2.4. Verbatim subframe

A verbatim subframe stores all samples unencoded in sequential order.
See section on Constant subframe (#constant-subframe) on how a sample
is stored unencoded. The number of samples that need to be stored in
a subframe is given by the blocksize in the frame header.

10.2.5. Fixed predictor subframe

Five different fixed predictors are defined in the following table,
one for each prediction order 0 through 4. The table also gives a
derivation, which explains the rationale for choosing these fixed
predictors.

+=======+==================================+======================+
| Order | Prediction                       | Derivation           |
+=======+==================================+======================+
| 0     | 0                                | N/A                  |
+-------+----------------------------------+----------------------+
| 1     | s(n-1)                           | N/A                  |
+-------+----------------------------------+----------------------+
| 2     | 2 * s(n-1) - s(n-2)              | s(n-1) + s'(n-1)     |
+-------+----------------------------------+----------------------+
| 3     | 3 * s(n-1) - 3 * s(n-2) + s(n-3) | s(n-1) + s'(n-1) +   |
|       |                                  | s''(n-1)             |
+-------+----------------------------------+----------------------+
| 4     | 4 * s(n-1) - 6 * s(n-2) + 4 *    | s(n-1) + s'(n-1) +   |
|       | s(n-3) - s(n-4)                  | s''(n-1) + s'''(n-1) |
+-------+----------------------------------+----------------------+

Table 18

Where

* n is the number of the sample being predicted

* s(n) is the sample being predicted

* s(n-1) is the sample before the one being predicted

* s'(n-1) is the difference between the previous sample and the
  sample before that, i.e. s(n-1) - s(n-2). This is the closest
  available first-order discrete derivative

* s''(n-1) is s'(n-1) - s'(n-2) or the closest available
  second-order discrete derivative

* s'''(n-1) is s''(n-1) - s''(n-2) or the closest available
  third-order discrete derivative

To encode a signal with a fixed predictor, each sample has the
corresponding prediction subtracted and sent to the residual coder.
To decode a signal with a fixed predictor, first the residual has to
be decoded, after which for each sample the prediction can be added.
This means that decoding MUST be a sequential process within a
subframe, as for each sample, enough fully decoded previous samples
are needed to calculate the prediction.

For fixed predictor order 0, the prediction is always 0, thus each
residual sample is equal to its corresponding input or decoded
sample. The difference between a fixed predictor with order 0 and a
verbatim subframe, is that a verbatim subframe stores all samples
unencoded, while a fixed predictor with order 0 has all its samples
processed by the residual coder.

The first order fixed predictor is comparable to how DPCM encoding
works, as the resulting residual sample is the difference between the
corresponding sample and the sample before it. The higher fixed
predictors can be understood as polynomials fitted to the previous
samples.

As the fixed predictors are specified, they do not have to be stored.
The fixed predictor order specifies which predictor is used. To be
able to predict samples, warm-up samples are stored, as the predictor
needs previous samples in its prediction. The number of warm-up
samples is equal to the predictor order. See section on Constant
subframe (#constant-subframe) on how samples are stored unencoded.
Directly following the warm-up samples is the coded residual.

10.2.6. Linear predictor subframe

Whereas fixed predictors are well suited for simple signals, using a
(non-fixed) linear predictor on more complex signals can improve
compression by making the residual samples even smaller. There is a
certain trade-off however, as storing the predictor coefficients
takes up space as well.

In the FLAC format, a predictor is defined by up to 32 predictor
coefficients and a shift. To form a prediction, each coefficient is
multiplied with its corresponding past sample, the results are added
and this addition is then shifted.
To encode a signal with a linear predictor, each sample has the
corresponding prediction subtracted and sent to the residual coder.
To decode a signal with a linear predictor, first the residual has to
be decoded, after which for each sample the prediction can be added.
This means that decoding MUST be a sequential process within a
subframe, as for each sample, enough fully decoded previous samples
are needed to calculate the prediction.

The table below defines how a linear predictor subframe appears in
the bitstream.

+==========+===============================================+
| Data     | Description                                   |
+==========+===============================================+
| s(n)     | Unencoded warm-up samples (n = frame's bits-  |
|          | per-sample * lpc order).                      |
+----------+-----------------------------------------------+
| u(4)     | (Predictor coefficient precision in bits)-1   |
|          | (NOTE: 0b1111 is invalid).                    |
+----------+-----------------------------------------------+
| s(5)     | Prediction right shift needed in bits.        |
+----------+-----------------------------------------------+
| s(n)     | Unencoded predictor coefficients (n =         |
|          | predictor coefficient precision * lpc order). |
+----------+-----------------------------------------------+
| Coded    | Encoded residual                              |
| residual |                                               |
+----------+-----------------------------------------------+

Table 19

See section on Constant subframe (#constant-subframe) on how the
warm-up samples are stored unencoded. The unencoded predictor
coefficients are stored the same way as the warm-up samples, but the
number of bits needed for each coefficient is defined by the
predictor coefficient precision. While the prediction right shift is
stored as a signed two's complement number, its value MUST NOT be
negative.
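
The reconstruction described in this section can be sketched in Python
as follows. This is a minimal illustration under stated assumptions,
not a reference implementation: the function name lpc_decode and its
parameters are hypothetical, the warm-up samples, coefficients, shift
and residual are assumed to have been parsed from the bitstream
already, and the right shift is assumed to be an arithmetic
(flooring) shift, which is how Python shifts negative integers.

```python
def lpc_decode(warmup, coefficients, shift, residual):
    """Rebuild the samples of one linear predictor subframe (sketch)."""
    if len(warmup) != len(coefficients):
        raise ValueError("expected one warm-up sample per coefficient")
    if shift < 0:
        raise ValueError("the prediction right shift must not be negative")
    samples = list(warmup)
    for r in residual:
        # coefficients[0] is multiplied with the most recent sample,
        # coefficients[1] with the sample before that, and so on.
        total = sum(c * samples[-1 - i] for i, c in enumerate(coefficients))
        prediction = total >> shift  # flooring shift on Python integers
        samples.append(prediction + r)
    return samples


# Coefficients [4, -2] with a shift of 1 behave like 2*s(n-1) - s(n-2).
decoded = lpc_decode([10, 12], [4, -2], 1, [0, 1, -1])
```

Because every prediction depends on samples that were just
reconstructed, the loop cannot be reordered or parallelized within a
subframe, which mirrors the sequential-decoding requirement above.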

Please note that the order in which the predictor coefficients appear
in the bitstream corresponds to which *past* sample they belong. In
other words, the order of the predictor coefficients is opposite to
the chronological order of the samples. So, the first predictor
coefficient has to be multiplied with the sample directly before the
sample that is being predicted, the second predictor coefficient has
to be multiplied with the sample before that etc.

10.2.7. Coded residual

The first two bits in a coded residual indicate which coding method
is used. See the table below.

+=============+=============================================+
| Value       | Description                                 |
+=============+=============================================+
| 0b00        | partitioned Rice code with 4-bit parameters |
+-------------+---------------------------------------------+
| 0b01        | partitioned Rice code with 5-bit parameters |
+-------------+---------------------------------------------+
| 0b10 - 0b11 | reserved                                    |
+-------------+---------------------------------------------+

Table 20

Both defined coding methods work the same way, but differ in the
number of bits used for Rice parameters. The 4 bits that directly
follow the coding method bits form the partition order, which is an
unsigned number. The rest of the coded residual consists of
2^(partition order) partitions. For example, if the 4 bits are
0b1000, the partition order is 8 and the residual is split up into
2^8 = 256 partitions.

Each partition contains a certain number of residual samples. The
number of residual samples in the first partition is equal to
(blocksize >> partition order) - predictor order, i.e. the blocksize
divided by the number of partitions minus the predictor order. In
all other partitions the number of residual samples is equal to
(blocksize >> partition order).
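
As an illustration of the partition sizes just described, the
following Python sketch lists how many residual samples each
partition holds. The function name and parameter names are
hypothetical, the blocksize and predictor order are assumed to be
known from the frame and subframe headers, and the sketch does not
check whether the partition order is valid for the given blocksize.

```python
def residual_samples_per_partition(blocksize, partition_order, predictor_order):
    """Return the residual sample count of every partition (sketch)."""
    per_partition = blocksize >> partition_order
    partitions = 1 << partition_order  # 2^(partition order)
    counts = [per_partition] * partitions
    # The warm-up samples are stored unencoded, so the first partition
    # holds predictor_order fewer residual samples than the others.
    counts[0] -= predictor_order
    return counts


# Blocksize 4096, partition order 3, predictor order 4.
counts = residual_samples_per_partition(4096, 3, 4)
```

Summed over all partitions, the counts equal the blocksize minus the
predictor order, i.e. one residual sample for every sample that is
not a warm-up sample.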

The partition order MUST be such that the blocksize is evenly
divisible by the number of partitions. This means for example that
for all odd blocksizes, only partition order 0 is allowed. The
partition order also MUST be such that (blocksize >> partition order)
is larger than the predictor order. This means for example that with
a blocksize of 4096 and a predictor order of 4, partition order
cannot be larger than 9.

In case the coded residual of a subframe is one with a 4-bit Rice
parameter (see table at the start of this section), the first 4 bits
of each partition are either a Rice parameter or an escape code.
These 4 bits indicate an escape code if they are 0b1111, otherwise
they contain the Rice parameter as an unsigned number. In case the
coded residual of the current subframe is one with a 5-bit Rice
parameter, the first 5 bits indicate an escape code if they are
0b11111, otherwise they contain the Rice parameter as an unsigned
number as well.

In case an escape code was used, the partition does not contain a
variable-length Rice coded residual, but a fixed-length unencoded
residual. Directly following the escape code are 5 bits containing
the number of bits with which each residual sample is stored, as an
unsigned number. The residual samples themselves are stored signed
two's complement.

In case a Rice parameter was provided, the partition contains a Rice
coded residual. The residual samples, which are signed numbers, are
represented by unsigned numbers in the Rice code. For positive
numbers, the representation is the number doubled, for negative
numbers, the representation is the number multiplied by -2 and has 1
subtracted. This representation of signed numbers is also known as
zigzag encoding and the zigzag encoded residual is called the folded
residual.
The folded residual samples are then each divided by
2^(Rice parameter). The result of each division rounded down (the
quotient) is stored unary, the remainder is stored binary.

Decoding the coded residual thus involves selecting the right coding
method, finding the number of partitions, reading unary and binary
parts of each codeword one-by-one and keeping track of when a new
partition starts and thus when a new Rice parameter needs to be read.

10.3. Frame footer

Following the last subframe is the frame footer. If the last
subframe is not byte aligned (i.e. the bits required to store all
subframes put together are not divisible by 8), zero bits are added
until byte alignment is reached. Following this is a 16-bit CRC,
initialized with 0, with polynomial x^16 + x^15 + x^2 + x^0. This
CRC covers the whole frame excluding the 16-bit CRC, including the
sync code.

11. Implementation status

This section records the status of known implementations of the FLAC
format, and is based on a proposal described in [RFC7942]. Please
note that the listing of any individual implementation here does not
imply endorsement by the IETF. Furthermore, no effort has been spent
to verify the information presented here that was supplied by IETF
contributors. This is not intended as, and must not be construed to
be, a catalog of available implementations or their features.
Readers are advised to note that other implementations may exist.

A reference encoder and decoder implementation of the FLAC format
exists, known as libFLAC, maintained by Xiph.Org. It can be found at
https://xiph.org/flac/. Note that while all libFLAC components are
licensed under 3-clause BSD, the flac and metaflac command line tools
often supplied together with libFLAC are licensed under GPL.

Another completely independent implementation of both encoder and
decoder of the FLAC format is available in libavcodec, maintained by
FFmpeg, licensed under LGPL 2.1 or later. It can be found at
https://ffmpeg.org/.

A list of other implementations and an overview of which parts of the
format they implement can be found here:
https://github.com/ietf-wg-cellar/flac-specification/wiki/Implementations

12. Security Considerations

Like any other codec (such as [RFC6716]), FLAC should not be used
with insecure ciphers or cipher modes that are vulnerable to known
plaintext attacks. Some of the header bits as well as the padding
are easily predictable.

Implementations of the FLAC codec need to take appropriate security
considerations into account. Those related to denial of service are
outlined in Section 2.1 of [RFC4732]. It is extremely important for
the decoder to be robust against malicious payloads. Malicious
payloads MUST NOT cause the decoder to overrun its allocated memory
or to take an excessive amount of resources to decode. An overrun in
allocated memory could lead to arbitrary code execution by an
attacker. The same applies to the encoder, even though problems in
encoders are typically rarer. Malicious audio streams MUST NOT cause
the encoder to misbehave because this would allow an attacker to
attack transcoding gateways. An example is allocating more memory
than available especially with blocksizes of more than 10000 or with
big metadata blocks, or not allocating enough memory before copying
data, which lead to execution of malicious code, crashes, freezes or
reboots on some known implementations.
See the FLAC decoder testbench
(https://wiki.hydrogenaud.io/index.php?title=FLAC_decoder_testbench)
for a non-exhaustive list of FLAC files with extreme configurations
which lead to crashes or reboots on some known implementations.

None of the content carried in FLAC is intended to be executable.

13. Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997.

[RFC4732]  Handley, M., Ed., Rescorla, E., Ed., and IAB, "Internet
           Denial-of-Service Considerations", RFC 4732,
           DOI 10.17487/RFC4732, December 2006.

[RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
           2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
           May 2017.

14. Informative References

[HPL-1999-144]
           Hans, M. and RW. Schafer, "Lossless Compression of Digital
           Audio", DOI 10.1109/79.939834, November 1999.

[IEC.60908.1999]
           International Electrotechnical Commission, "Audio
           recording - Compact disc digital audio system",
           IEC International standard 60908 second edition, 1999.

[ISRC-handbook]
           "International Standard Recording Code (ISRC) Handbook,
           4th edition", 2021.

[RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
           Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
           September 2012.

[RFC7942]  Sheffer, Y. and A. Farrel, "Improving Awareness of Running
           Code: The Implementation Status Section", BCP 205,
           RFC 7942, DOI 10.17487/RFC7942, July 2016.

[robinson-tr156]
           Robinson, T., "SHORTEN: Simple lossless and near-lossless
           waveform compression", December 1994.

Appendix A. Numerical considerations

In order to maintain lossless behavior, all arithmetic used in
encoding and decoding sample values MUST be done with integer data
types to eliminate the possibility of introducing rounding errors
associated with floating-point arithmetic. Use of floating-point
representations in analysis (e.g. finding a good predictor or Rice
parameter) is not a concern, as long as the process of using the
found predictor and Rice parameter to encode audio samples is
implemented with only integer math.

Furthermore, the possibility of integer overflow MUST be eliminated
by using data types large enough to never overflow. Choosing a
64-bit signed data type for all arithmetic involving sample values
would make sure the possibility for overflow is eliminated, but
usually smaller data types are chosen for increased performance,
especially in embedded devices. This section will provide guidelines
for choosing the right data type in each step of encoding and
decoding FLAC files.

A.1. Determining necessary data type size

To find the smallest data type size that is guaranteed not to
overflow for a certain sequence of arithmetic operations, the
combination of values producing the largest possible result should be
considered.

If for example two 16-bit signed integers are added, the largest
possible result forms if both values are the largest number that can
be represented with a 16-bit signed integer. To store the result, a
signed integer data type with at least 17 bits is needed. Similarly,
when adding 4 of these values, 18 bits are needed, when adding 8, 19
bits are needed etc. In general, the number of bits necessary when
adding numbers together is increased by the log base 2 of the number
of values rounded up to the nearest integer.
   So, when adding 18 unknown values stored in 8-bit signed integers,
   we need a signed integer data type of at least 13 bits to store the
   result, as the log base 2 of 18 rounded up is 5.

   In case of multiplication, the number of bits needed for the result
   is the size of the first variable plus the size of the second
   variable, but counting only one sign bit if working with signed data
   types.  If, for example, a 16-bit signed integer is multiplied by a
   16-bit signed integer, the result needs at least 31 bits to store
   without overflowing.

A.2.  Stereo decorrelation

   When stereo decorrelation is used, the side channel will have one
   extra bit of bit depth; see the section on interchannel
   decorrelation (#interchannel-decorrelation).

   This means that while 16-bit signed integers have sufficient range to
   store samples from a fully decoded FLAC frame with a bit depth of 16
   bits, the decoding of a side subframe in such a file will need a
   data type with at least 17 bits to store decoded subframe samples
   before undoing stereo decorrelation.

   Most FLAC decoders store decoded (subframe) samples as 32-bit values,
   which is sufficient for files with bit depths up to (and including)
   31 bits.

A.3.  Prediction

   A prediction (which is used to calculate the residual on encoding or
   added to the residual to calculate the sample value on decoding) is
   formed by multiplying and summing preceding sample values.  In order
   to eliminate the possibility of integer overflow, the combination of
   preceding sample values and predictor coefficients producing the
   largest possible value should be considered.
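   The bit-counting rules from the section on determining the necessary
   data type size, which the prediction calculations build on, can be
   sketched in a few lines of Python.  This is an illustrative sketch
   only and not part of the format: the function names bits_for_sum and
   bits_for_product are hypothetical helpers introduced here, and they
   implement the rules exactly as stated in that section.

```python
def bits_for_sum(bits_per_value: int, count: int) -> int:
    """Width needed for the sum of `count` signed values.

    Rule from A.1: add ceil(log2(count)) bits to the operand width.
    Assumes count >= 1.
    """
    # (count - 1).bit_length() equals ceil(log2(count)) for count >= 1
    # and avoids floating-point arithmetic altogether.
    return bits_per_value + (count - 1).bit_length()


def bits_for_product(bits_a: int, bits_b: int) -> int:
    """Width needed for a product of two signed values.

    Rule from A.1: add both widths, counting only one sign bit.
    """
    return bits_a + bits_b - 1


if __name__ == "__main__":
    # The worked examples from A.1.
    print(bits_for_sum(16, 2))       # two 16-bit values
    print(bits_for_sum(8, 18))       # 18 values of 8 bits
    print(bits_for_product(16, 16))  # 16-bit times 16-bit
```

   Computing the rounded-up logarithm with integer operations keeps the
   sketch itself free of floating-point rounding concerns.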

   To determine the size of the data type needed to calculate either a
   residual sample (on encoding) or an audio sample value (on decoding)
   in a fixed predictor subframe, the maximal possible value for these
   is calculated in the following table, using the method described in
   the section on determining the necessary data type size
   (#determining-necessary-data-type-size).  For example: if a frame
   codes for 16-bit audio and has some form of stereo decorrelation,
   the subframe coding for the side channel would need 16+1+3 = 20 bits
   if a third-order fixed predictor is used.

   +=======+==============================+===============+=======+
   | Order | Calculation of residual      | Sample values | Extra |
   |       |                              | summed        | bits  |
   +=======+==============================+===============+=======+
   | 0     | s(n)                         | 1             | 0     |
   +-------+------------------------------+---------------+-------+
   | 1     | s(n) - s(n-1)                | 2             | 1     |
   +-------+------------------------------+---------------+-------+
   | 2     | s(n) - 2 * s(n-1) + s(n-2)   | 4             | 2     |
   +-------+------------------------------+---------------+-------+
   | 3     | s(n) - 3 * s(n-1) + 3 *      | 8             | 3     |
   |       | s(n-2) - s(n-3)              |               |       |
   +-------+------------------------------+---------------+-------+
   | 4     | s(n) - 4 * s(n-1) + 6 *      | 16            | 4     |
   |       | s(n-2) - 4 * s(n-3) + s(n-4) |               |       |
   +-------+------------------------------+---------------+-------+

                                Table 21

   Where

   *  n is the number of the sample being predicted

   *  s(n) is the sample being predicted

   *  s(n-1) is the sample before the one being predicted, s(n-2) is the
      sample before that etc.

   For subframes with a linear predictor, calculation is a little more
   complicated.  Each prediction is a sum of several multiplications.
   Each of these multiplies a sample value by a predictor coefficient.
   The number of bits needed for each of these products can be
   calculated by adding the predictor coefficient precision (in bits)
   to the bit depth of the audio samples.  As both are signed numbers
   and only one 'sign bit' is necessary, 1 bit can be subtracted.  To
   account for the summing of these multiplications, the log base 2 of
   the predictor order rounded up is added.

   For example, if the sample bit depth of the source is 24, the
   current subframe encodes a side channel (see the section on
   interchannel decorrelation (#interchannel-decorrelation)), the
   predictor order is 12 and the predictor coefficient precision is 15
   bits, the minimum required size of the signed integer data type is
   (24 + 1) + (15 - 1) + ceil(log2(12)) = 43 bits.  As another example,
   with a sample bit depth of 16, a subframe encoding a side channel, a
   predictor order of 8 and a predictor coefficient precision of 12
   bits, the minimum required size of the signed integer data type is
   (16 + 1) + (12 - 1) + ceil(log2(8)) = 31 bits.

   After the prediction has been shifted right, the number of bits
   needed is reduced by the amount of right shift and increased by one
   bit for the subtraction from the current sample on encoding.  On
   decoding, the result of adding the residual to the prediction should
   fit the subframe bit depth, assuming all calculations were done
   correctly.

   Taking the last example where 31 bits were needed for the prediction,
   the required data type size for the residual samples in case of a
   right shift of 10 bits would be 31 - 10 + 1 = 22 bits.

A.4.  Rice coding

   When folding (i.e.
   zig-zag encoding) the residual sample values, no extra bits are
   needed when the absolute value of each residual sample is first
   stored in an unsigned data type of the size of the last step, then
   doubled and then has one subtracted depending on whether the
   residual sample was positive or negative.  Many implementations,
   however, choose to use one extra bit of data type size so that
   zig-zag encoding can happen in one step and without a cast, instead
   of using the procedure described in the previous sentence.

Appendix B.  Examples

   This informational appendix contains short example FLAC files and
   short parts of FLAC files which are decoded step by step.  These
   examples provide a more engaging way to understand the FLAC format
   than the formal specification.  The text explaining these examples
   assumes the reader has at least cursorily read the specification and
   refers to the specification for an explanation of the terminology
   used.  These examples mostly focus on the layout of several metadata
   blocks and subframe types and the implications of certain aspects
   (for example wasted bits and stereo decorrelation) on this layout.

   The examples feature (parts of) files generated by various FLAC
   encoders.  These are presented in hexadecimal or binary format,
   followed by tables and text referring to various features by their
   starting bit positions in these representations.  Each starting
   position (shortened to 'start' in the tables) is a hexadecimal byte
   position and a start bit within that byte, separated by a plus sign.
   Counts for these start at zero.  For example, a feature starting at
   the 3rd bit of the 17th byte is referred to as starting at 0x10+2.

   All data in this appendix has been thoroughly verified.
   However, as this appendix is informational, in case any information
   here conflicts with statements in the formal specification, the
   latter takes precedence.

B.1.  Decoding example 1

   This very short example FLAC file codes for PCM audio that has two
   channels, each containing 1 sample.  The focus of this example is on
   the essential parts of a FLAC file.

B.1.1.  Example file 1 in hexadecimal representation

   00000000: 664c 6143 8000 0022 1000 1000  fLaC..."....
   0000000c: 0000 0f00 000f 0ac4 42f0 0000  ........B...
   00000018: 0001 3e84 b418 07dc 6903 0758  ..>.....i..X
   00000024: 6a3d ad1a 2e0f fff8 6918 0000  j=......i...
   00000030: bf03 58fd 0312 8baa 9a         ..X......

B.1.2.  Example file 1 in binary representation

   00000000: 01100110 01001100 01100001 01000011  fLaC
   00000004: 10000000 00000000 00000000 00100010  ..."
   00000008: 00010000 00000000 00010000 00000000  ....
   0000000c: 00000000 00000000 00001111 00000000  ....
   00000010: 00000000 00001111 00001010 11000100  ....
   00000014: 01000010 11110000 00000000 00000000  B...
   00000018: 00000000 00000001 00111110 10000100  ..>.
   0000001c: 10110100 00011000 00000111 11011100  ....
   00000020: 01101001 00000011 00000111 01011000  i..X
   00000024: 01101010 00111101 10101101 00011010  j=..
   00000028: 00101110 00001111 11111111 11111000  ....
   0000002c: 01101001 00011000 00000000 00000000  i...
   00000030: 10111111 00000011 01011000 11111101  ..X.
   00000034: 00000011 00010010 10001011 10101010  ....
   00000038: 10011010

B.1.3.  Signature and streaminfo

   The first 4 bytes of the file contain the fLaC file signature.
   Directly following it is a metadata block.
   The signature and the first metadata block header are broken down in
   the following table.

   +========+========+============+===========================+
   | Start  | Length | Contents   | Description               |
   +========+========+============+===========================+
   | 0x00+0 | 4 byte | 0x664C6143 | fLaC                      |
   +--------+--------+------------+---------------------------+
   | 0x04+0 | 1 bit  | 0b1        | Last metadata block       |
   +--------+--------+------------+---------------------------+
   | 0x04+1 | 7 bit  | 0b0000000  | Streaminfo metadata block |
   +--------+--------+------------+---------------------------+
   | 0x05+0 | 3 byte | 0x000022   | Length 34 byte            |
   +--------+--------+------------+---------------------------+

                                Table 22

   As the header indicates that this is the last metadata block, the
   position of the first audio frame can now be calculated as the
   position of the first byte after the metadata block header + the
   length of the block, i.e. 8+34 = 42 or 0x2a.  As can be seen, 0x2a
   indeed contains the frame sync code for fixed blocksize streams,
   0xfff8.

   The streaminfo metadata block contents are broken down in the
   following table.

   +========+=========+====================+=========================+
   | Start  | Length  | Contents           | Description             |
   +========+=========+====================+=========================+
   | 0x08+0 | 2 byte  | 0x1000             | Min. blocksize 4096     |
   +--------+---------+--------------------+-------------------------+
   | 0x0a+0 | 2 byte  | 0x1000             | Max. blocksize 4096     |
   +--------+---------+--------------------+-------------------------+
   | 0x0c+0 | 3 byte  | 0x00000f           | Min. frame size 15 byte |
   +--------+---------+--------------------+-------------------------+
   | 0x0f+0 | 3 byte  | 0x00000f           | Max. frame size 15 byte |
   +--------+---------+--------------------+-------------------------+
   | 0x12+0 | 20 bit  | 0x0ac4, 0b0100     | Sample rate 44100 Hertz |
   +--------+---------+--------------------+-------------------------+
   | 0x14+4 | 3 bit   | 0b001              | 2 channels              |
   +--------+---------+--------------------+-------------------------+
   | 0x14+7 | 5 bit   | 0b01111            | Sample bit depth 16     |
   +--------+---------+--------------------+-------------------------+
   | 0x15+4 | 36 bit  | 0b0000, 0x00000001 | Total no. of samples 1  |
   +--------+---------+--------------------+-------------------------+
   | 0x1a   | 16 byte | (...)              | MD5 signature           |
   +--------+---------+--------------------+-------------------------+

                                Table 23

   The minimum and maximum blocksize are both 4096.  This was apparently
   the blocksize the encoder planned to use, but as only 1 interchannel
   sample was provided, no frames with 4096 samples are actually present
   in this file.

   Note that anywhere a number of samples is mentioned (blocksize, total
   number of samples, sample rate), interchannel samples are meant.

   The MD5 sum (starting at 0x1a) is 0x3e84 b418 07dc 6903 0758 6a3d
   ad1a 2e0f.  This will be validated after decoding the samples.

B.1.4.  Audio frames

   The frame header starts at position 0x2a and is broken down in the
   following table.

   +========+========+=================+==============================+
   | Start  | Length | Contents        | Description                  |
   +========+========+=================+==============================+
   | 0x2a+0 | 15 bit | 0xff, 0b1111100 | frame sync                   |
   +--------+--------+-----------------+------------------------------+
   | 0x2b+7 | 1 bit  | 0b0             | blocksize strategy           |
   +--------+--------+-----------------+------------------------------+
   | 0x2c+0 | 4 bit  | 0b0110          | 8-bit blocksize further down |
   +--------+--------+-----------------+------------------------------+
   | 0x2c+4 | 4 bit  | 0b1001          | sample rate 44.1kHz          |
   +--------+--------+-----------------+------------------------------+
   | 0x2d+0 | 4 bit  | 0b0001          | stereo, no decorrelation     |
   +--------+--------+-----------------+------------------------------+
   | 0x2d+4 | 3 bit  | 0b100           | bit depth 16 bit             |
   +--------+--------+-----------------+------------------------------+
   | 0x2d+7 | 1 bit  | 0b0             | mandatory 0 bit              |
   +--------+--------+-----------------+------------------------------+
   | 0x2e+0 | 1 byte | 0x00            | frame number 0               |
   +--------+--------+-----------------+------------------------------+
   | 0x2f+0 | 1 byte | 0x00            | blocksize 1                  |
   +--------+--------+-----------------+------------------------------+
   | 0x30+0 | 1 byte | 0xbf            | frame header CRC             |
   +--------+--------+-----------------+------------------------------+

                                Table 24

   As the stream is a fixed blocksize stream, the number at 0x2e
   contains a frame number.  As the value is smaller than 128, only 1
   byte is used for the encoding.

   At byte 0x31 the subframe header of the first subframe starts; it is
   broken down in the following table.

   +========+========+================+=========================+
   | Start  | Length | Contents       | Description             |
   +========+========+================+=========================+
   | 0x31+0 | 1 bit  | 0b0            | mandatory 0 bit         |
   +--------+--------+----------------+-------------------------+
   | 0x31+1 | 6 bit  | 0b000001       | verbatim subframe       |
   +--------+--------+----------------+-------------------------+
   | 0x31+7 | 1 bit  | 0b1            | wasted bits present     |
   +--------+--------+----------------+-------------------------+
   | 0x32+0 | 2 bit  | 0b01           | 2 wasted bits           |
   +--------+--------+----------------+-------------------------+
   | 0x32+2 | 14 bit | 0b011000, 0xfd | 14-bit unencoded sample |
   +--------+--------+----------------+-------------------------+

                                Table 25

   As the wasted bits flag is 1 in this subframe, a unary coded number
   follows.  Starting at 0x32, we see 0b01, which unary codes for 1,
   meaning we have 2 wasted bits in this subframe.

   As this is a verbatim subframe, the subframe only contains unencoded
   sample values.  With a blocksize of 1, it contains only a single
   sample.  The bit depth of the audio is 16 bit, but as the subframe
   header signals 2 wasted bits, only 14 bits are stored.  As no stereo
   decorrelation is used, a bit depth increase for the side channel is
   not applicable.  So, the next 14 bits (starting at position 0x32+2)
   contain the unencoded sample coded big-endian, signed two's
   complement.  The value reads 0b011000 11111101, or 6397.  This value
   needs to be shifted left by 2 bits, to account for the wasted bits.
   The value is then 0b011000 11111101 00, or 25588.

   The second subframe starts at 0x34; it is broken down in the
   following table.

   +========+========+==============+=========================+
   | Start  | Length | Contents     | Description             |
   +========+========+==============+=========================+
   | 0x34+0 | 1 bit  | 0b0          | mandatory 0 bit         |
   +--------+--------+--------------+-------------------------+
   | 0x34+1 | 6 bit  | 0b000001     | verbatim subframe       |
   +--------+--------+--------------+-------------------------+
   | 0x34+7 | 1 bit  | 0b1          | wasted bits present     |
   +--------+--------+--------------+-------------------------+
   | 0x35+0 | 4 bit  | 0b0001       | 4 wasted bits           |
   +--------+--------+--------------+-------------------------+
   | 0x35+4 | 12 bit | 0b0010, 0x8b | 12-bit unencoded sample |
   +--------+--------+--------------+-------------------------+

                                Table 26

   Here the wasted bits flag is also one, but the unary coded number
   that follows it is 4 bits long, indicating 4 wasted bits.  This
   means the sample is stored in 12 bits.  The sample value is 0b0010
   10001011, or 651.  This value now has to be shifted left by 4 bits,
   i.e. 0b0010 10001011 0000 or 10416.

   At this point, we would undo stereo decorrelation if that were
   applicable.

   As the last subframe ends byte-aligned, no padding bits were
   inserted.  The next 2 bytes, starting at 0x38, contain the frame CRC.
   As this is the only frame in the file, the file ends with the CRC.

   To validate the MD5, we line up the samples interleaved, byte-
   aligned, little endian, signed two's complement.  The first sample,
   which had a value of 25588, translates to 0xf463; the second sample,
   which had a value of 10416, translates to 0xb028.  When MD5 summing
   0xf463b028, we get the MD5 sum found in the header, so decoding was
   lossless.

B.2.  Decoding example 2

   This FLAC file is larger than the first example, but still contains
   very little audio.
   The focus of this example is on decoding a subframe with a fixed
   predictor and a coded residual, but it also contains a very short
   seektable, vorbis comment and padding metadata block.

B.2.1.  Example file 2 in hexadecimal representation

   00000000: 664c 6143 0000 0022 0010 0010  fLaC..."....
   0000000c: 0000 1700 0044 0ac4 42f0 0000  .....D..B...
   00000018: 0013 d5b0 5649 75e9 8b8d 8b93  ....VIu.....
   00000024: 0422 757b 8103 0300 0012 0000  ."u{........
   00000030: 0000 0000 0000 0000 0000 0000  ............
   0000003c: 0000 0010 0400 003a 2000 0000  .......: ...
   00000048: 7265 6665 7265 6e63 6520 6c69  reference li
   00000054: 6246 4c41 4320 312e 332e 3320  bFLAC 1.3.3
   00000060: 3230 3139 3038 3034 0100 0000  20190804....
   0000006c: 0e00 0000 5449 544c 453d d7a9  ....TITLE=..
   00000078: d79c d795 d79d 8100 0006 0000  ............
   00000084: 0000 0000 fff8 6998 000f 9912  ......i.....
   00000090: 0867 0162 3d14 4299 8f5d f70d  .g.b=.B..]..
   0000009c: 6fe0 0c17 caeb 2100 0ee7 a77a  o.....!....z
   000000a8: 24a1 590c 1217 b603 097b 784f  $.Y......{xO
   000000b4: aa9a 33d2 85e0 70ad 5b1b 4851  ..3...p.[.HQ
   000000c0: b401 0d99 d2cd 1a68 f1e6 b810  .......h....
   000000cc: fff8 6918 0102 a402 c382 c40b  ..i.........
   000000d8: c14a 03ee 48dd 03b6 7c13 30    .J..H...|.0

B.2.2.  Example file 2 in binary representation (only audio frames)

   00000088: 11111111 11111000 01101001 10011000  ..i.
   0000008c: 00000000 00001111 10011001 00010010  ....
   00000090: 00001000 01100111 00000001 01100010  .g.b
   00000094: 00111101 00010100 01000010 10011001  =.B.
   00000098: 10001111 01011101 11110111 00001101  .]..
   0000009c: 01101111 11100000 00001100 00010111  o...
   000000a0: 11001010 11101011 00100001 00000000  ..!.
   000000a4: 00001110 11100111 10100111 01111010  ...z
   000000a8: 00100100 10100001 01011001 00001100  $.Y.
   000000ac: 00010010 00010111 10110110 00000011  ....
   000000b0: 00001001 01111011 01111000 01001111  .{xO
   000000b4: 10101010 10011010 00110011 11010010  ..3.
   000000b8: 10000101 11100000 01110000 10101101  ..p.
   000000bc: 01011011 00011011 01001000 01010001  [.HQ
   000000c0: 10110100 00000001 00001101 10011001  ....
   000000c4: 11010010 11001101 00011010 01101000  ...h
   000000c8: 11110001 11100110 10111000 00010000  ....
   000000cc: 11111111 11111000 01101001 00011000  ..i.
   000000d0: 00000001 00000010 10100100 00000010  ....
   000000d4: 11000011 10000010 11000100 00001011  ....
   000000d8: 11000001 01001010 00000011 11101110  .J..
   000000dc: 01001000 11011101 00000011 10110110  H...
   000000e0: 01111100 00010011 00110000           |.0

B.2.3.  Signature and streaminfo

   Most of the streaminfo block is the same as in example 1, so only
   parts that are different are listed in the following table.

   +========+========+============+=============================+
   | Start  | Length | Contents   | Description                 |
   +========+========+============+=============================+
   | 0x04+0 | 1 bit  | 0b0        | Not the last metadata block |
   +--------+--------+------------+-----------------------------+
   | 0x08+0 | 2 byte | 0x0010     | Min. blocksize 16           |
   +--------+--------+------------+-----------------------------+
   | 0x0a+0 | 2 byte | 0x0010     | Max. blocksize 16           |
   +--------+--------+------------+-----------------------------+
   | 0x0c+0 | 3 byte | 0x000017   | Min. frame size 23 byte     |
   +--------+--------+------------+-----------------------------+
   | 0x0f+0 | 3 byte | 0x000044   | Max. frame size 68 byte     |
   +--------+--------+------------+-----------------------------+
   | 0x15+4 | 36 bit | 0b0000,    | Total no. of samples 19     |
   |        |        | 0x00000013 |                             |
   +--------+--------+------------+-----------------------------+
   | 0x1a   | 16     | (...)      | MD5 signature               |
   |        | byte   |            |                             |
   +--------+--------+------------+-----------------------------+

                                Table 27

   This time, the minimum and maximum blocksizes are reflected in the
   file: there is one block of 16 samples.  The last block (which has 3
   samples) is not taken into account for these numbers.  The MD5
   signature is 0xd5b0 5649 75e9 8b8d 8b93 0422 757b 8103; this will be
   verified at the end of this example.

B.2.4.  Seektable

   The seektable metadata block only holds one entry.  It is not really
   useful here, as it points to the first frame, but it is enough for
   this example.  The seektable metadata block is broken down in the
   following table.

   +========+========+====================+================+
   | Start  | Length | Contents           | Description    |
   +========+========+====================+================+
   | 0x2a+0 | 1 bit  | 0b0                | Not the last   |
   |        |        |                    | metadata block |
   +--------+--------+--------------------+----------------+
   | 0x2a+1 | 7 bit  | 0b0000011          | Seektable      |
   |        |        |                    | metadata block |
   +--------+--------+--------------------+----------------+
   | 0x2b+0 | 3 byte | 0x000012           | Length 18 byte |
   +--------+--------+--------------------+----------------+
   | 0x2e+0 | 8 byte | 0x0000000000000000 | Seekpoint to   |
   |        |        |                    | sample 0       |
   +--------+--------+--------------------+----------------+
   | 0x36+0 | 8 byte | 0x0000000000000000 | Seekpoint to   |
   |        |        |                    | offset 0       |
   +--------+--------+--------------------+----------------+
   | 0x3e+0 | 2 byte | 0x0010             | Seekpoint to   |
   |        |        |                    | blocksize 16   |
   +--------+--------+--------------------+----------------+

                                Table 28

B.2.5.  Vorbis comment

   The vorbis comment metadata block contains the vendor string and a
   single comment.  It is broken down in the following table.

   +========+=========+============+===============================+
   | Start  | Length  | Contents   | Description                   |
   +========+=========+============+===============================+
   | 0x40+0 | 1 bit   | 0b0        | Not the last metadata block   |
   +--------+---------+------------+-------------------------------+
   | 0x40+1 | 7 bit   | 0b0000100  | Vorbis comment metadata block |
   +--------+---------+------------+-------------------------------+
   | 0x41+0 | 3 byte  | 0x00003a   | Length 58 byte                |
   +--------+---------+------------+-------------------------------+
   | 0x44+0 | 4 byte  | 0x20000000 | Vendor string length 32 byte  |
   +--------+---------+------------+-------------------------------+
   | 0x48+0 | 32 byte | (...)      | Vendor string                 |
   +--------+---------+------------+-------------------------------+
   | 0x68+0 | 4 byte  | 0x01000000 | Number of fields 1            |
   +--------+---------+------------+-------------------------------+
   | 0x6c+0 | 4 byte  | 0x0e000000 | Field length 14 byte          |
   +--------+---------+------------+-------------------------------+
   | 0x70+0 | 14 byte | (...)      | Field contents                |
   +--------+---------+------------+-------------------------------+

                                Table 29

   The vendor string is reference libFLAC 1.3.3 20190804; the contents
   of the only field are TITLE=שלום (U+05E9 U+05DC U+05D5 U+05DD).  The
   vorbis comment field is 14 bytes but only 10 characters in size,
   because it contains four 2-byte characters.

B.2.6.  Padding

   The last metadata block is a (very short) padding block.

   +========+========+================+========================+
   | Start  | Length | Contents       | Description            |
   +========+========+================+========================+
   | 0x7e+0 | 1 bit  | 0b1            | Last metadata block    |
   +--------+--------+----------------+------------------------+
   | 0x7e+1 | 7 bit  | 0b0000001      | Padding metadata block |
   +--------+--------+----------------+------------------------+
   | 0x7f+0 | 3 byte | 0x000006       | Length 6 byte          |
   +--------+--------+----------------+------------------------+
   | 0x82+0 | 6 byte | 0x000000000000 | Padding bytes          |
   +--------+--------+----------------+------------------------+

                                Table 30

B.2.7.  First audio frame

   The frame header starts at position 0x88 and is broken down in the
   following table.

   +========+========+=================+==============================+
   | Start  | Length | Contents        | Description                  |
   +========+========+=================+==============================+
   | 0x88+0 | 15 bit | 0xff, 0b1111100 | frame sync                   |
   +--------+--------+-----------------+------------------------------+
   | 0x89+7 | 1 bit  | 0b0             | blocksize strategy           |
   +--------+--------+-----------------+------------------------------+
   | 0x8a+0 | 4 bit  | 0b0110          | 8-bit blocksize further down |
   +--------+--------+-----------------+------------------------------+
   | 0x8a+4 | 4 bit  | 0b1001          | sample rate 44.1kHz          |
   +--------+--------+-----------------+------------------------------+
   | 0x8b+0 | 4 bit  | 0b1001          | right-side stereo            |
   +--------+--------+-----------------+------------------------------+
   | 0x8b+4 | 3 bit  | 0b100           | bit depth 16 bit             |
   +--------+--------+-----------------+------------------------------+
   | 0x8b+7 | 1 bit  | 0b0             | mandatory 0 bit              |
   +--------+--------+-----------------+------------------------------+
   | 0x8c+0 | 1 byte | 0x00            | frame number 0               |
   +--------+--------+-----------------+------------------------------+
   | 0x8d+0 | 1 byte | 0x0f            | blocksize 16                 |
   +--------+--------+-----------------+------------------------------+
   | 0x8e+0 | 1 byte | 0x99            | frame header CRC             |
   +--------+--------+-----------------+------------------------------+

                                Table 31

   The first subframe starts at byte 0x8f; it is broken down in the
   following table, excluding the coded residual.  As this subframe
   codes for a side channel, the bit depth is increased by 1 bit from
   16 bit to 17 bit.  This is most clearly visible in the unencoded
   warm-up sample.

   +========+========+=============+===========================+
   | Start  | Length | Contents    | Description               |
   +========+========+=============+===========================+
   | 0x8f+0 | 1 bit  | 0b0         | mandatory 0 bit           |
   +--------+--------+-------------+---------------------------+
   | 0x8f+1 | 6 bit  | 0b001001    | fixed subframe, 1st order |
   +--------+--------+-------------+---------------------------+
   | 0x8f+7 | 1 bit  | 0b0         | no wasted bits present    |
   +--------+--------+-------------+---------------------------+
   | 0x90+0 | 17 bit | 0x0867, 0b0 | unencoded warm-up sample  |
   +--------+--------+-------------+---------------------------+

                                Table 32

   The coded residual is broken down in the following table.  All
   quotients are unary coded; all remainders are stored unencoded with
   a number of bits specified by the Rice parameter.

   +========+========+=================+=================+
   | Start  | Length | Contents        | Description     |
   +========+========+=================+=================+
   | 0x92+1 | 2 bit  | 0b00            | Rice code with  |
   |        |        |                 | 4-bit parameter |
   +--------+--------+-----------------+-----------------+
   | 0x92+3 | 4 bit  | 0b0000          | Partition order |
   |        |        |                 | 0               |
   +--------+--------+-----------------+-----------------+
   | 0x92+7 | 4 bit  | 0b1011          | Rice parameter  |
   |        |        |                 | 11              |
   +--------+--------+-----------------+-----------------+
   | 0x93+3 | 4 bit  | 0b0001          | Quotient 3      |
   +--------+--------+-----------------+-----------------+
   | 0x93+7 | 11 bit | 0b00011110100   | Remainder 244   |
   +--------+--------+-----------------+-----------------+
   | 0x95+2 | 2 bit  | 0b01            | Quotient 1      |
   +--------+--------+-----------------+-----------------+
   | 0x95+4 | 11 bit | 0b01000100001   | Remainder 545   |
   +--------+--------+-----------------+-----------------+
   | 0x96+7 | 2 bit  | 0b01            | Quotient 1      |
   +--------+--------+-----------------+-----------------+
   | 0x97+1 | 11 bit | 0b00110011000   | Remainder 408   |
   +--------+--------+-----------------+-----------------+
   | 0x98+4 | 1 bit  | 0b1             | Quotient 0      |
   +--------+--------+-----------------+-----------------+
   | 0x98+5 | 11 bit | 0b11101011101   | Remainder 1885  |
   +--------+--------+-----------------+-----------------+
   | 0x9a+0 | 1 bit  | 0b1             | Quotient 0      |
   +--------+--------+-----------------+-----------------+
   | 0x9a+1 | 11 bit | 0b11101110000   | Remainder 1904  |
   +--------+--------+-----------------+-----------------+
   | 0x9b+4 | 1 bit  | 0b1             | Quotient 0      |
   +--------+--------+-----------------+-----------------+
   | 0x9b+5 | 11 bit | 0b10101101111   | Remainder 1391  |
   +--------+--------+-----------------+-----------------+
   | 0x9d+0 | 1 bit  | 0b1             | Quotient 0      |
   +--------+--------+-----------------+-----------------+
   | 0x9d+1 | 11 bit | 0b11000000000   | Remainder 1536  |
   +--------+--------+-----------------+-----------------+
   | 0x9e+4 | 1 bit  | 0b1             | Quotient 0      |
   +--------+--------+-----------------+-----------------+
   | 0x9e+5 | 11 bit | 0b10000010111   | Remainder 1047  |
   +--------+--------+-----------------+-----------------+
   | 0xa0+0 | 1 bit  | 0b1             | Quotient 0      |
   +--------+--------+-----------------+-----------------+
   | 0xa0+1 | 11 bit | 0b10010101110   | Remainder 1198  |
   +--------+--------+-----------------+-----------------+
   | 0xa1+4 | 1 bit  | 0b1             | Quotient 0      |
   +--------+--------+-----------------+-----------------+
   | 0xa1+5 | 11 bit | 0b01100100001   | Remainder 801   |
   +--------+--------+-----------------+-----------------+
   | 0xa3+0 | 13 bit | 0b0000000000001 | Quotient 12     |
   +--------+--------+-----------------+-----------------+
   | 0xa4+5 | 11 bit | 0b11011100111   | Remainder 1767  |
   +--------+--------+-----------------+-----------------+
   | 0xa6+0 | 1 bit  | 0b1             | Quotient 0      |
   +--------+--------+-----------------+-----------------+
   | 0xa6+1 | 11 bit | 0b01001110111   | Remainder 631   |
   +--------+--------+-----------------+-----------------+
   | 0xa7+4 | 1 bit  | 0b1             | Quotient 0      |
   +--------+--------+-----------------+-----------------+
   | 0xa7+5 | 11 bit | 0b01000100100   | Remainder 548   |
   +--------+--------+-----------------+-----------------+
   | 0xa9+0 | 1 bit  | 0b1             | Quotient 0      |
   +--------+--------+-----------------+-----------------+
   | 0xa9+1 | 11 bit | 0b01000010101   | Remainder 533   |
   +--------+--------+-----------------+-----------------+
   | 0xaa+4 | 1 bit  | 0b1             | Quotient 0      |
   +--------+--------+-----------------+-----------------+
   | 0xaa+5 | 11 bit | 0b00100001100   | Remainder 268   |
   +--------+--------+-----------------+-----------------+

                                Table 33

   At this
point, the decoder should know it is done decoding the coded 2560 residual, as it received 16 samples: 1 warm-up sample and 15 residual 2561 samples. Each residual sample can be calculated from the quotient 2562 and remainder, and undoing the zig-zag encoding. For example, the 2563 value of the first zig-zag encoded residual sample is 3 * 2^11 + 244 2564 = 6388. As this is an even number, the zig-zag encoding is undone by 2565 dividing by 2, the residual sample value is 3194. This is done for 2566 all residual samples in the next table 2568 +==========+===========+=================+=======================+ 2569 | Quotient | Remainder | Zig-zag encoded | Residual sample value | 2570 +==========+===========+=================+=======================+ 2571 | 3 | 244 | 6388 | 3194 | 2572 +----------+-----------+-----------------+-----------------------+ 2573 | 1 | 545 | 2593 | -1297 | 2574 +----------+-----------+-----------------+-----------------------+ 2575 | 1 | 408 | 2456 | 1228 | 2576 +----------+-----------+-----------------+-----------------------+ 2577 | 0 | 1885 | 1885 | -943 | 2578 +----------+-----------+-----------------+-----------------------+ 2579 | 0 | 1904 | 1904 | 952 | 2580 +----------+-----------+-----------------+-----------------------+ 2581 | 0 | 1391 | 1391 | -696 | 2582 +----------+-----------+-----------------+-----------------------+ 2583 | 0 | 1536 | 1536 | 768 | 2584 +----------+-----------+-----------------+-----------------------+ 2585 | 0 | 1047 | 1047 | -524 | 2586 +----------+-----------+-----------------+-----------------------+ 2587 | 0 | 1198 | 1198 | 599 | 2588 +----------+-----------+-----------------+-----------------------+ 2589 | 0 | 801 | 801 | -401 | 2590 +----------+-----------+-----------------+-----------------------+ 2591 | 12 | 1767 | 26343 | -13172 | 2592 +----------+-----------+-----------------+-----------------------+ 2593 | 0 | 631 | 631 | -316 | 2594 
   +----------+-----------+-----------------+-----------------------+
   | 0        | 548       | 548             | 274                   |
   +----------+-----------+-----------------+-----------------------+
   | 0        | 533       | 533             | -267                  |
   +----------+-----------+-----------------+-----------------------+
   | 0        | 268       | 268             | 134                   |
   +----------+-----------+-----------------+-----------------------+

                               Table 34

   It can be calculated that using a Rice code is in this case more
   efficient than storing values unencoded.  The Rice code (excluding
   the partition order and parameter) is 199 bits in length.  The
   largest residual value (-13172) would need 15 bits to be stored
   unencoded, so storing all 15 samples with 15 bits results in a
   sequence with a length of 225 bits.

   The next step is using the predictor and the residuals to restore the
   sample values.  As this subframe uses a fixed predictor with order 1,
   this means adding the residual value to the value of the previous
   sample.

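   The residual decoding and prediction steps described above can be
   illustrated with a short Python sketch.  It is not part of the format
   definition: the function names (unfold_zigzag, rice_to_residual,
   restore_fixed_order1) are hypothetical and chosen for this
   illustration only.  The quotients, remainders, Rice parameter and
   warm-up sample are the ones from this example.

```python
# Illustrative sketch only; all function and variable names are hypothetical.

RICE_PARAMETER = 11  # Rice parameter of the single partition in this subframe
WARM_UP = 4302       # warm-up sample of the first subframe

# (quotient, remainder) pairs of the 15 Rice-coded residuals, in stream order
RICE_SYMBOLS = [
    (3, 244), (1, 545), (1, 408), (0, 1885), (0, 1904),
    (0, 1391), (0, 1536), (0, 1047), (0, 1198), (0, 801),
    (12, 1767), (0, 631), (0, 548), (0, 533), (0, 268),
]


def unfold_zigzag(folded):
    """Undo zig-zag encoding: even values map to >= 0, odd values to < 0."""
    if folded % 2 == 0:
        return folded // 2
    return -((folded + 1) // 2)


def rice_to_residual(quotient, remainder, parameter):
    """Combine quotient and remainder, then undo the zig-zag encoding."""
    folded = quotient * (1 << parameter) + remainder
    return unfold_zigzag(folded)


def restore_fixed_order1(warm_up, residuals):
    """Fixed predictor of order 1: the prediction is the previous sample."""
    samples = [warm_up]
    for residual in residuals:
        samples.append(samples[-1] + residual)
    return samples


residuals = [rice_to_residual(q, r, RICE_PARAMETER) for q, r in RICE_SYMBOLS]
samples = restore_fixed_order1(WARM_UP, residuals)

print(residuals)
print(samples)
```

   Under these assumptions, the first residual comes out as 3194 and the
   last restored sample as -6165, in line with the values tabulated for
   this subframe.
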
   +===========+==============+
   | Residual  | Sample value |
   +===========+==============+
   | (warm-up) | 4302         |
   +-----------+--------------+
   | 3194      | 7496         |
   +-----------+--------------+
   | -1297     | 6199         |
   +-----------+--------------+
   | 1228      | 7427         |
   +-----------+--------------+
   | -943      | 6484         |
   +-----------+--------------+
   | 952       | 7436         |
   +-----------+--------------+
   | -696      | 6740         |
   +-----------+--------------+
   | 768       | 7508         |
   +-----------+--------------+
   | -524      | 6984         |
   +-----------+--------------+
   | 599       | 7583         |
   +-----------+--------------+
   | -401      | 7182         |
   +-----------+--------------+
   | -13172    | -5990        |
   +-----------+--------------+
   | -316      | -6306        |
   +-----------+--------------+
   | 274       | -6032        |
   +-----------+--------------+
   | -267      | -6299        |
   +-----------+--------------+
   | 134       | -6165        |
   +-----------+--------------+

            Table 35

   With this, decoding of the first subframe is complete.  Decoding of
   the second subframe is very similar, as it also uses a fixed
   predictor of order 1, so this is left as an exercise for the reader;
   the results are in the next table.  The next step is undoing the
   stereo decorrelation, which is done in the following table.  As the
   stereo decorrelation is right-side, in which the actual ordering of
   the subframes is side-right, the samples in the right channel come
   directly from the second subframe, while the samples in the left
   channel are found by adding the values of both subframes for each
   sample.

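   As a minimal sketch of this step, the following Python fragment
   rebuilds the left and right channels from the two decoded subframes.
   The function name undo_side_right and the variable names are
   hypothetical; the sample values are those of this example.

```python
# Illustrative sketch only; names are hypothetical.

# Decoded sample values of the two subframes of the first audio frame.
# With right-side decorrelation, subframe 1 is side and subframe 2 is right.
SUBFRAME_1 = [4302, 7496, 6199, 7427, 6484, 7436, 6740, 7508,
              6984, 7583, 7182, -5990, -6306, -6032, -6299, -6165]
SUBFRAME_2 = [6070, 10545, 8743, 10449, 9143, 10463, 9502, 10569,
              9840, 10680, 10113, -8428, -8895, -8476, -8896, -8653]


def undo_side_right(side, right):
    """Return (left, right): left is side + right, right is passed through."""
    left = [s + r for s, r in zip(side, right)]
    return left, list(right)


left, right = undo_side_right(SUBFRAME_1, SUBFRAME_2)

for left_sample, right_sample in zip(left, right):
    print(left_sample, right_sample)
```

   The first interchannel sample printed by this sketch is 10372 (left)
   and 6070 (right).
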
   +============+============+========+=======+
   | Subframe 1 | Subframe 2 | Left   | Right |
   +============+============+========+=======+
   | 4302       | 6070       | 10372  | 6070  |
   +------------+------------+--------+-------+
   | 7496       | 10545      | 18041  | 10545 |
   +------------+------------+--------+-------+
   | 6199       | 8743       | 14942  | 8743  |
   +------------+------------+--------+-------+
   | 7427       | 10449      | 17876  | 10449 |
   +------------+------------+--------+-------+
   | 6484       | 9143       | 15627  | 9143  |
   +------------+------------+--------+-------+
   | 7436       | 10463      | 17899  | 10463 |
   +------------+------------+--------+-------+
   | 6740       | 9502       | 16242  | 9502  |
   +------------+------------+--------+-------+
   | 7508       | 10569      | 18077  | 10569 |
   +------------+------------+--------+-------+
   | 6984       | 9840       | 16824  | 9840  |
   +------------+------------+--------+-------+
   | 7583       | 10680      | 18263  | 10680 |
   +------------+------------+--------+-------+
   | 7182       | 10113      | 17295  | 10113 |
   +------------+------------+--------+-------+
   | -5990      | -8428      | -14418 | -8428 |
   +------------+------------+--------+-------+
   | -6306      | -8895      | -15201 | -8895 |
   +------------+------------+--------+-------+
   | -6032      | -8476      | -14508 | -8476 |
   +------------+------------+--------+-------+
   | -6299      | -8896      | -15195 | -8896 |
   +------------+------------+--------+-------+
   | -6165      | -8653      | -14818 | -8653 |
   +------------+------------+--------+-------+

                    Table 36

   As the second subframe ends byte-aligned, no padding bits follow it.
   Finally, the last 2 bytes in the frame are the frame CRC.

B.2.8.  Second audio frame

   The second audio frame is very similar to the frame decoded in the
   first example, but this time 3 samples are present instead of 1.

   The frame header starts at position 0xcc and is broken down in the
   following table.

   +========+========+=================+==============================+
   | Start  | Length | Contents        | Description                  |
   +========+========+=================+==============================+
   | 0xcc+0 | 15 bit | 0xff, 0b1111100 | frame sync                   |
   +--------+--------+-----------------+------------------------------+
   | 0xcd+7 | 1 bit  | 0b0             | blocksize strategy           |
   +--------+--------+-----------------+------------------------------+
   | 0xce+0 | 4 bit  | 0b0110          | 8-bit blocksize further down |
   +--------+--------+-----------------+------------------------------+
   | 0xce+4 | 4 bit  | 0b1001          | sample rate 44.1kHz          |
   +--------+--------+-----------------+------------------------------+
   | 0xcf+0 | 4 bit  | 0b0001          | stereo, no decorrelation     |
   +--------+--------+-----------------+------------------------------+
   | 0xcf+4 | 3 bit  | 0b100           | bit depth 16 bit             |
   +--------+--------+-----------------+------------------------------+
   | 0xcf+7 | 1 bit  | 0b0             | mandatory 0 bit              |
   +--------+--------+-----------------+------------------------------+
   | 0xd0+0 | 1 byte | 0x01            | frame number 1               |
   +--------+--------+-----------------+------------------------------+
   | 0xd1+0 | 1 byte | 0x02            | blocksize 3                  |
   +--------+--------+-----------------+------------------------------+
   | 0xd2+0 | 1 byte | 0xa4            | frame header CRC             |
   +--------+--------+-----------------+------------------------------+

                               Table 37

   The first subframe starts at 0xd3+0 and is broken down in the
   following table.

   +========+========+==========+=========================+
   | Start  | Length | Contents | Description             |
   +========+========+==========+=========================+
   | 0xd3+0 | 1 bit  | 0b0      | mandatory 0 bit         |
   +--------+--------+----------+-------------------------+
   | 0xd3+1 | 6 bit  | 0b000001 | verbatim subframe       |
   +--------+--------+----------+-------------------------+
   | 0xd3+7 | 1 bit  | 0b0      | no wasted bits present  |
   +--------+--------+----------+-------------------------+
   | 0xd4+0 | 16 bit | 0xc382   | 16-bit unencoded sample |
   +--------+--------+----------+-------------------------+
   | 0xd6+0 | 16 bit | 0xc40b   | 16-bit unencoded sample |
   +--------+--------+----------+-------------------------+
   | 0xd8+0 | 16 bit | 0xc14a   | 16-bit unencoded sample |
   +--------+--------+----------+-------------------------+

                          Table 38

   The second subframe starts at 0xda+0 and is broken down in the
   following table.

   +========+========+===================+=========================+
   | Start  | Length | Contents          | Description             |
   +========+========+===================+=========================+
   | 0xda+0 | 1 bit  | 0b0               | mandatory 0 bit         |
   +--------+--------+-------------------+-------------------------+
   | 0xda+1 | 6 bit  | 0b000001          | verbatim subframe       |
   +--------+--------+-------------------+-------------------------+
   | 0xda+7 | 1 bit  | 0b1               | wasted bits present     |
   +--------+--------+-------------------+-------------------------+
   | 0xdb+0 | 1 bit  | 0b1               | 1 wasted bit            |
   +--------+--------+-------------------+-------------------------+
   | 0xdb+1 | 15 bit | 0b110111001001000 | 15-bit unencoded sample |
   +--------+--------+-------------------+-------------------------+
   | 0xdd+0 | 15 bit | 0b110111010000001 | 15-bit unencoded sample |
   +--------+--------+-------------------+-------------------------+
   | 0xde+7 | 15 bit | 0b110110110011111 | 15-bit unencoded sample |
   +--------+--------+-------------------+-------------------------+

                              Table 39

   As this subframe has wasted bits, the 15-bit unencoded samples need
   to be shifted left by 1 bit.  For example, sample 1 is stored as
   -4536 and becomes -9072 after shifting left by 1 bit.

   As the last subframe does not end on byte alignment, 2 padding bits
   are added before the 2-byte frame CRC, which follows at 0xe1+0.

B.2.9.  MD5 checksum verification

   Now that all samples in the file have been decoded, we can verify the
   MD5 sum.  All sample values must be interleaved and stored signed,
   coded little-endian.  The result of this follows in groups of 12
   samples (i.e. 6 interchannel samples).

   0x8428 B617 7946 3129 5E3A 2722 D445 D128 0B3D B723 EB45 DF28
   0x723F 1E25 9D46 4929 B841 7026 5747 B829 8F43 8127 AEC7 14DF
   0x9FC4 41DD 54C7 E4DE A5C4 40DD 1EC6 33DE 82C3 90DC 0BC4 02DD
   0x4AC1 3EDB

   The MD5 sum of this is indeed the same as the one found in the
   streaminfo metadata block.

B.3.  Decoding example 3

   This example is once again a very short FLAC file.  The focus of this
   example is on decoding a subframe with a linear predictor and a coded
   residual with more than one partition.

B.3.1.  Example file 3 in hexadecimal representation

   00000000: 664c 6143 8000 0022 1000 1000  fLaC..."....
   0000000c: 0000 1f00 001f 07d0 0070 0000  .........p..
   00000018: 0018 f8f9 e396 f5cb cfc6 dc80  ............
   00000024: 7f99 7790 6b32 fff8 6802 0017  ..w.k2..h...
   00000030: e944 004f 6f31 3d10 47d2 27cb  .D.Oo1=.G.'.
   0000003c: 6d09 0831 452b dc28 2222 8057  m..1E+.("".W
   00000048: a3                             .

B.3.2.  Example file 3 in binary representation (only audio frame)

   0000002a: 11111111 11111000 01101000 00000010  ..h.
   0000002e: 00000000 00010111 11101001 01000100  ...D
   00000032: 00000000 01001111 01101111 00110001  .Oo1
   00000036: 00111101 00010000 01000111 11010010  =.G.
   0000003a: 00100111 11001011 01101101 00001001  '.m.
   0000003e: 00001000 00110001 01000101 00101011  .1E+
   00000042: 11011100 00101000 00100010 00100010  .(""
   00000046: 10000000 01010111 10100011           .W.

B.3.3.  Signature and streaminfo

   Most of the streaminfo block is the same as in example 1, so only
   the parts that are different are listed in the following table.

   +========+=========+====================+=========================+
   | Start  | Length  | Contents           | Description             |
   +========+=========+====================+=========================+
   | 0x0c+0 | 3 byte  | 0x00001f           | Min. frame size 31 byte |
   +--------+---------+--------------------+-------------------------+
   | 0x0f+0 | 3 byte  | 0x00001f           | Max. frame size 31 byte |
   +--------+---------+--------------------+-------------------------+
   | 0x12+0 | 20 bit  | 0x07d0, 0b0000     | Sample rate 32000 Hertz |
   +--------+---------+--------------------+-------------------------+
   | 0x14+4 | 3 bit   | 0b000              | 1 channel               |
   +--------+---------+--------------------+-------------------------+
   | 0x14+7 | 5 bit   | 0b00111            | Sample bit depth 8 bit  |
   +--------+---------+--------------------+-------------------------+
   | 0x15+4 | 36 bit  | 0b0000, 0x00000018 | Total no. of samples 24 |
   +--------+---------+--------------------+-------------------------+
   | 0x1a   | 16 byte | (...)              | MD5 signature           |
   +--------+---------+--------------------+-------------------------+

                               Table 40

B.3.4.  Audio frame

   The frame header starts at position 0x2a and is broken down in the
   following table.

   +========+========+=================+==============================+
   | Start  | Length | Contents        | Description                  |
   +========+========+=================+==============================+
   | 0x2a+0 | 15 bit | 0xff, 0b1111100 | Frame sync                   |
   +--------+--------+-----------------+------------------------------+
   | 0x2b+7 | 1 bit  | 0b0             | Blocksize strategy           |
   +--------+--------+-----------------+------------------------------+
   | 0x2c+0 | 4 bit  | 0b0110          | 8-bit blocksize further down |
   +--------+--------+-----------------+------------------------------+
   | 0x2c+4 | 4 bit  | 0b1000          | Sample rate 32kHz            |
   +--------+--------+-----------------+------------------------------+
   | 0x2d+0 | 4 bit  | 0b0000          | Mono audio (1 channel)       |
   +--------+--------+-----------------+------------------------------+
   | 0x2d+4 | 3 bit  | 0b001           | Bit depth 8 bit              |
   +--------+--------+-----------------+------------------------------+
   | 0x2d+7 | 1 bit  | 0b0             | Mandatory 0 bit              |
   +--------+--------+-----------------+------------------------------+
   | 0x2e+0 | 1 byte | 0x00            | Frame number 0               |
   +--------+--------+-----------------+------------------------------+
   | 0x2f+0 | 1 byte | 0x17            | Blocksize 24                 |
   +--------+--------+-----------------+------------------------------+
   | 0x30+0 | 1 byte | 0xe9            | Frame header CRC             |
   +--------+--------+-----------------+------------------------------+

                               Table 41

   The first and only subframe starts at byte 0x31.  It is broken down
   in the following table, without the coded residual.

   +========+========+==========+=====================+
   | Start  | Length | Contents | Description         |
   +========+========+==========+=====================+
   | 0x31+0 | 1 bit  | 0b0      | Mandatory 0 bit     |
   +--------+--------+----------+---------------------+
   | 0x31+1 | 6 bit  | 0b100010 | Linear prediction   |
   |        |        |          | subframe, 3rd order |
   +--------+--------+----------+---------------------+
   | 0x31+7 | 1 bit  | 0b0      | No wasted bits      |
   |        |        |          | present             |
   +--------+--------+----------+---------------------+
   | 0x32+0 | 8 bit  | 0x00     | Unencoded warm-up   |
   |        |        |          | sample 0            |
   +--------+--------+----------+---------------------+
   | 0x33+0 | 8 bit  | 0x4f     | Unencoded warm-up   |
   |        |        |          | sample 79           |
   +--------+--------+----------+---------------------+
   | 0x34+0 | 8 bit  | 0x6f     | Unencoded warm-up   |
   |        |        |          | sample 111          |
   +--------+--------+----------+---------------------+
   | 0x35+0 | 4 bit  | 0b0011   | Coefficient         |
   |        |        |          | precision 4 bit     |
   +--------+--------+----------+---------------------+
   | 0x35+4 | 5 bit  | 0b00010  | Prediction right    |
   |        |        |          | shift 2             |
   +--------+--------+----------+---------------------+
   | 0x36+1 | 4 bit  | 0b0111   | Predictor           |
   |        |        |          | coefficient 7       |
   +--------+--------+----------+---------------------+
   | 0x36+5 | 4 bit  | 0b1010   | Predictor           |
   |        |        |          | coefficient -6      |
   +--------+--------+----------+---------------------+
   | 0x37+1 | 4 bit  | 0b0010   | Predictor           |
   |        |        |          | coefficient 2       |
   +--------+--------+----------+---------------------+

                        Table 42

   The data stream continues with the coded residual, which is broken
   down in the following table.  Residual partitions 3 and 4 are left as
   an exercise for the reader.

   +========+========+==========+======================================+
   | Start  | Length | Contents | Description                          |
   +========+========+==========+======================================+
   | 0x37+5 | 2 bit  | 0b00     | Rice-coded residual,                 |
   |        |        |          | 4-bit parameter                      |
   +--------+--------+----------+--------------------------------------+
   | 0x37+7 | 4 bit  | 0b0010   | Partition order 2                    |
   +--------+--------+----------+--------------------------------------+
   | 0x38+3 | 4 bit  | 0b0011   | Rice parameter 3                     |
   +--------+--------+----------+--------------------------------------+
   | 0x38+7 | 1 bit  | 0b1      | Quotient 0                           |
   +--------+--------+----------+--------------------------------------+
   | 0x39+0 | 3 bit  | 0b110    | Remainder 6                          |
   +--------+--------+----------+--------------------------------------+
   | 0x39+3 | 1 bit  | 0b1      | Quotient 0                           |
   +--------+--------+----------+--------------------------------------+
   | 0x39+4 | 3 bit  | 0b001    | Remainder 1                          |
   +--------+--------+----------+--------------------------------------+
   | 0x39+7 | 4 bit  | 0b0001   | Quotient 3                           |
   +--------+--------+----------+--------------------------------------+
   | 0x3a+3 | 3 bit  | 0b001    | Remainder 1                          |
   +--------+--------+----------+--------------------------------------+
   | 0x3a+6 | 4 bit  | 0b1111   | No Rice parameter,                   |
   |        |        |          | escape code                          |
   +--------+--------+----------+--------------------------------------+
   | 0x3b+2 | 5 bit  | 0b00101  | Partition encoded                    |
   |        |        |          | with 5 bits                          |
   +--------+--------+----------+--------------------------------------+
   | 0x3b+7 | 5 bit  | 0b10110  | Residual -10                         |
   +--------+--------+----------+--------------------------------------+
   | 0x3c+4 | 5 bit  | 0b11010  | Residual -6                          |
   +--------+--------+----------+--------------------------------------+
   | 0x3d+1 | 5 bit  | 0b00010  | Residual 2                           |
   +--------+--------+----------+--------------------------------------+
   | 0x3d+6 | 5 bit  | 0b01000  | Residual 8                           |
   +--------+--------+----------+--------------------------------------+
   | 0x3e+3 | 5 bit  | 0b01000  | Residual 8                           |
   +--------+--------+----------+--------------------------------------+
   | 0x3f+0 | 5 bit  | 0b00110  | Residual 6                           |
   +--------+--------+----------+--------------------------------------+
   | 0x3f+5 | 4 bit  | 0b0010   | Rice parameter 2                     |
   +--------+--------+----------+--------------------------------------+
   | 0x40+1 | 22 bit | (...)    | Residual partition 3                 |
   +--------+--------+----------+--------------------------------------+
   | 0x42+7 | 4 bit  | 0b0001   | Rice parameter 1                     |
   +--------+--------+----------+--------------------------------------+
   | 0x43+3 | 23 bit | (...)    | Residual partition 4                 |
   +--------+--------+----------+--------------------------------------+

                                Table 43

   The frame ends with 6 padding bits and a 2-byte frame CRC.

   To decode this subframe, 21 predictions have to be calculated and
   added to their corresponding residuals.  This is a sequential
   process: as each prediction uses previous samples, it is not possible
   to start this decoding halfway through a subframe or to decode a
   subframe with parallel threads.

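   The sequential nature of this process can be seen in the following
   Python sketch.  It is an illustration under stated assumptions, not
   a reference implementation: the function name decode_lpc is
   hypothetical, and the first coefficient is assumed to apply to the
   most recent sample, which is consistent with the predictor values of
   this example (7 * 111 - 6 * 79 + 2 * 0 = 303).  The residuals of
   partitions 3 and 4 are taken from the decoded values listed for this
   subframe.

```python
# Illustrative sketch only; names are hypothetical.

WARM_UP = [0, 79, 111]        # unencoded warm-up samples
COEFFICIENTS = [7, -6, 2]     # predictor coefficients, in stream order
SHIFT = 2                     # prediction right shift

# Residuals of all four partitions, in stream order (21 values)
RESIDUALS = [3, -1, -13,
             -10, -6, 2, 8, 8, 6,
             0, -3, -5, -4, -1, 1,
             1, 4, 2, 2, 2, 0]


def decode_lpc(warm_up, coefficients, shift, residuals):
    """Restore samples one by one; each needs the samples before it."""
    samples = list(warm_up)
    for residual in residuals:
        # coefficients[0] is multiplied with the most recent sample,
        # coefficients[1] with the one before that, and so on.
        unshifted = sum(c * samples[-1 - j]
                        for j, c in enumerate(coefficients))
        # Python's >> on negative integers rounds toward minus infinity,
        # i.e. it behaves as an arithmetic right shift.
        prediction = unshifted >> shift
        samples.append(prediction + residual)
    return samples


samples = decode_lpc(WARM_UP, COEFFICIENTS, SHIFT, RESIDUALS)
print(samples)
```

   Because every iteration reads the samples appended by the previous
   iterations, the loop cannot be split across threads or entered
   partway through the subframe.
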
   +===========+=====================+===========+==============+
   | Residual  | Predictor w/o shift | Predictor | Sample value |
   +===========+=====================+===========+==============+
   | (warm-up) | N/A                 | N/A       | 0            |
   +-----------+---------------------+-----------+--------------+
   | (warm-up) | N/A                 | N/A       | 79           |
   +-----------+---------------------+-----------+--------------+
   | (warm-up) | N/A                 | N/A       | 111          |
   +-----------+---------------------+-----------+--------------+
   | 3         | 303                 | 75        | 78           |
   +-----------+---------------------+-----------+--------------+
   | -1        | 38                  | 9         | 8            |
   +-----------+---------------------+-----------+--------------+
   | -13       | -190                | -48       | -61          |
   +-----------+---------------------+-----------+--------------+
   | -10       | -319                | -80       | -90          |
   +-----------+---------------------+-----------+--------------+
   | -6        | -248                | -62       | -68          |
   +-----------+---------------------+-----------+--------------+
   | 2         | -58                 | -15       | -13          |
   +-----------+---------------------+-----------+--------------+
   | 8         | 137                 | 34        | 42           |
   +-----------+---------------------+-----------+--------------+
   | 8         | 236                 | 59        | 67           |
   +-----------+---------------------+-----------+--------------+
   | 6         | 191                 | 47        | 53           |
   +-----------+---------------------+-----------+--------------+
   | 0         | 53                  | 13        | 13           |
   +-----------+---------------------+-----------+--------------+
   | -3        | -93                 | -24       | -27          |
   +-----------+---------------------+-----------+--------------+
   | -5        | -161                | -41       | -46          |
   +-----------+---------------------+-----------+--------------+
   | -4        | -134                | -34       | -38          |
   +-----------+---------------------+-----------+--------------+
   | -1        | -44                 | -11       | -12          |
   +-----------+---------------------+-----------+--------------+
   | 1         | 52                  | 13        | 14           |
   +-----------+---------------------+-----------+--------------+
   | 1         | 94                  | 23        | 24           |
   +-----------+---------------------+-----------+--------------+
   | 4         | 60                  | 15        | 19           |
   +-----------+---------------------+-----------+--------------+
   | 2         | 17                  | 4         | 6            |
   +-----------+---------------------+-----------+--------------+
   | 2         | -24                 | -6        | -4           |
   +-----------+---------------------+-----------+--------------+
   | 2         | -26                 | -7        | -5           |
   +-----------+---------------------+-----------+--------------+
   | 0         | 1                   | 0         | 0            |
   +-----------+---------------------+-----------+--------------+

                             Table 44

   Lining all these samples up, we get the following input for the MD5
   summing process.

   0x004F 6F4E 08C3 A6BC F32A 4335 0DE5 D2DA F40E 1813 06FC FB00

   This indeed results in the MD5 signature found in the streaminfo
   metadata block.

Authors' Addresses

   Martijn van Beurden
   Netherlands
   Email: mvanb1@gmail.com


   Andrew Weaver
   Email: theandrewjw@gmail.com