cellar                                                    M. Niedermayer
Internet-Draft
Intended status: Informational                                   D. Rice
Expires: 10 April 2021
                                                             J. Martinez
                                                          7 October 2020

              FFV1 Video Coding Format Version 0, 1, and 3
                       draft-ietf-cellar-ffv1-18

Abstract

   This document defines FFV1, a lossless intra-frame video encoding
   format. FFV1 is designed to efficiently compress video data in a
   variety of pixel formats. Compared to uncompressed video, FFV1
   offers storage compression, frame fixity, and self-description, which
   makes FFV1 useful as a preservation or intermediate video format.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 10 April 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Notation and Conventions
     2.1. Definitions
     2.2. Conventions
       2.2.1. Pseudo-code
       2.2.2. Arithmetic Operators
       2.2.3. Assignment Operators
       2.2.4. Comparison Operators
       2.2.5. Mathematical Functions
       2.2.6. Order of Operation Precedence
       2.2.7. Range
       2.2.8. NumBytes
       2.2.9. Bitstream Functions
   3. Sample Coding
     3.1. Border
     3.2. Samples
     3.3. Median Predictor
     3.4. Quantization Table Sets
     3.5. Context
     3.6. Quantization Table Set Indexes
     3.7. Color spaces
       3.7.1. YCbCr
       3.7.2. RGB
     3.8. Coding of the Sample Difference
       3.8.1. Range Coding Mode
       3.8.2. Golomb Rice Mode
   4. Bitstream
     4.1. Quantization Table Set
       4.1.1. quant_tables
       4.1.2. context_count
     4.2. Parameters
       4.2.1. version
       4.2.2. micro_version
       4.2.3. coder_type
       4.2.4. state_transition_delta
       4.2.5. colorspace_type
       4.2.6. chroma_planes
       4.2.7. bits_per_raw_sample
       4.2.8.
               log2_h_chroma_subsample
       4.2.9. log2_v_chroma_subsample
       4.2.10. extra_plane
       4.2.11. num_h_slices
       4.2.12. num_v_slices
       4.2.13. quant_table_set_count
       4.2.14. states_coded
       4.2.15. initial_state_delta
       4.2.16. ec
       4.2.17. intra
     4.3. Configuration Record
       4.3.1. reserved_for_future_use
       4.3.2. configuration_record_crc_parity
       4.3.3. Mapping FFV1 into Containers
     4.4. Frame
     4.5. Slice
     4.6. Slice Header
       4.6.1. slice_x
       4.6.2. slice_y
       4.6.3. slice_width
       4.6.4. slice_height
       4.6.5. quant_table_set_index_count
       4.6.6. quant_table_set_index
       4.6.7. picture_structure
       4.6.8. sar_num
       4.6.9. sar_den
     4.7. Slice Content
       4.7.1. primary_color_count
       4.7.2. plane_pixel_height
       4.7.3. slice_pixel_height
       4.7.4. slice_pixel_y
     4.8. Line
       4.8.1. plane_pixel_width
       4.8.2. slice_pixel_width
       4.8.3. slice_pixel_x
       4.8.4. sample_difference
     4.9. Slice Footer
       4.9.1. slice_size
       4.9.2. error_status
       4.9.3. slice_crc_parity
   5. Restrictions
   6. Security Considerations
   7. IANA Considerations
     7.1. Media Type Definition
   8. Changelog
   9. Normative References
   10. Informative References
   Appendix A. Multi-threaded decoder implementation suggestions
   Appendix B. Future handling of some streams created by
           non-conforming encoders
   Authors' Addresses

1. Introduction

   This document describes FFV1, a lossless video encoding format. The
   design of FFV1 considers the storage of image characteristics, data
   fixity, and the optimized use of encoding time and storage
   requirements. FFV1 is designed to support a wide range of lossless
   video applications such as long-term audiovisual preservation,
   scientific imaging, screen recording, and other video encoding
   scenarios that seek to avoid the generational loss of lossy video
   encodings.

   This document defines versions 0, 1, and 3 of FFV1. The distinctions
   between the versions are provided throughout the document, but in
   summary:

   * Version 0 of FFV1 was the original implementation of FFV1 and was
     flagged as stable on April 14, 2006 [FFV1_V0].

   * Version 1 of FFV1 adds support for more video bit depths and was
     flagged as stable on April 24, 2009 [FFV1_V1].

   * Version 2 of FFV1 only existed in experimental form and is not
     described by this document, but is available as a LyX file at
     https://github.com/FFmpeg/FFV1/blob/8ad772b6d61c3dd8b0171979a2cd9f11924d5532/ffv1.lyx

   * Version 3 of FFV1 adds several features such as increased
     description of the characteristics of the encoding images and
     embedded CRC data to support fixity verification of the encoding.
     Version 3 was flagged as stable on August 17, 2013 [FFV1_V3].

   This document assumes familiarity with mathematical and coding
   concepts such as Range coding [range-coding] and YCbCr color spaces
   [YCbCr].

   This specification describes the valid bitstream and how to decode
   such a valid bitstream. How bitstreams that do not conform to this
   specification are handled is outside the scope of this specification.
   A decoder could reject every invalid bitstream or attempt to perform
   error concealment or re-download or use a redundant copy of the
   invalid part or any other action it deems appropriate.

2. Notation and Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.1.
Definitions

   "FFV1": chosen name of this video encoding format, short version of
   "FF Video 1", the letters "FF" coming from "FFmpeg", the name of the
   reference decoder, whose first letters originally meant "Fast
   Forward".

   "Container": Format that encapsulates Frames (see Section 4.4) and
   (when required) a "Configuration Record" into a bitstream.

   "Sample": The smallest addressable representation of a color
   component or a luma component in a Frame. Examples of Sample are
   Luma (Y), Blue-difference Chroma (Cb), Red-difference Chroma (Cr),
   Transparency, Red, Green, and Blue.

   "Symbol": A value stored in the bitstream, which is defined and
   decoded through one of the methods described in Table 4.

   "Line": A discrete component of a static image composed of Samples
   that represent a specific quantification of Samples of that image.

   "Plane": A discrete component of a static image composed of Lines
   that represent a specific quantification of Lines of that image.

   "Pixel": The smallest addressable representation of a color in a
   Frame. It is composed of one or more Samples.

   "ESC": An ESCape Symbol to indicate that the Symbol to be stored is
   too large for normal storage and that an alternate storage method is
   used.

   "MSB": Most Significant Bit, the bit that can cause the largest
   change in magnitude of the Symbol.

   "VLC": Variable Length Code, a code that maps source symbols to a
   variable number of bits.

   "RGB": A reference to the method of storing the value of a Pixel by
   using three numeric values that represent Red, Green, and Blue.

   "YCbCr": A reference to the method of storing the value of a Pixel by
   using three numeric values that represent the luma of the Pixel (Y)
   and the chroma of the Pixel (Cb and Cr). The term YCbCr is used for
   historical reasons and currently references any color space relying
   on 1 luma Sample and 2 chroma Samples, e.g.
   YCbCr, YCgCo or ICtCp.
   The exact meaning of the three numeric values is unspecified.

2.2. Conventions

2.2.1. Pseudo-code

   The FFV1 bitstream is described in this document using pseudo-code.
   Note that the pseudo-code is used for clarity in order to illustrate
   the structure of FFV1 and not intended to specify any particular
   implementation. The pseudo-code used is based upon the C programming
   language [ISO.9899.2018] and uses its "if/else", "while" and "for"
   keywords as well as functions defined within this document.

   In some instances, pseudo-code is presented in a two-column format
   such as shown in Figure 1. In this form the "type" column provides a
   Symbol as defined in Table 4 that defines the storage of the data
   referenced in that same line of pseudo-code.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   ExamplePseudoCode( ) {                                        |
       value                                                     | ur
   }                                                             |

   Figure 1: A depiction of type-labelled pseudo-code used within
   this document.

2.2.2. Arithmetic Operators

   Note: the operators and the order of precedence are the same as used
   in the C programming language [ISO.9899.2018], with the exception of
   ">>" (removal of implementation defined behavior) and "^" (power
   instead of XOR) operators which are re-defined within this section.

   "a + b" means a plus b.

   "a - b" means a minus b.

   "-a" means negation of a.

   "a * b" means a multiplied by b.

   "a / b" means a divided by b.

   "a ^ b" means a raised to the b-th power.

   "a & b" means bit-wise "and" of a and b.

   "a | b" means bit-wise "or" of a and b.

   "a >> b" means arithmetic right shift of two's complement integer
   representation of a by b binary digits. This is equivalent to
   dividing a by 2, b times, with rounding toward negative infinity.

   "a << b" means arithmetic left shift of two's complement integer
   representation of a by b binary digits.

2.2.3.
Assignment Operators

   "a = b" means a is assigned b.

   "a++" is equivalent to a is assigned a + 1.

   "a--" is equivalent to a is assigned a - 1.

   "a += b" is equivalent to a is assigned a + b.

   "a -= b" is equivalent to a is assigned a - b.

   "a *= b" is equivalent to a is assigned a * b.

2.2.4. Comparison Operators

   "a > b" is true when a is greater than b.

   "a >= b" is true when a is greater than or equal to b.

   "a < b" is true when a is less than b.

   "a <= b" is true when a is less than or equal to b.

   "a == b" is true when a is equal to b.

   "a != b" is true when a is not equal to b.

   "a && b" is true when both a is true and b is true.

   "a || b" is true when either a is true or b is true.

   "!a" is true when a is not true.

   "a ? b : c" if a is true, then b, otherwise c.

2.2.5. Mathematical Functions

   "floor(a)" means the largest integer less than or equal to a.

   "ceil(a)" means the smallest integer greater than or equal to a.

   "sign(a)" extracts the sign of a number, i.e. if a < 0 then -1, else
   if a > 0 then 1, else 0.

   "abs(a)" means the absolute value of a, i.e. "abs(a)" = "sign(a) *
   a".

   "log2(a)" means the base-two logarithm of a.

   "min(a,b)" means the smaller of two values a and b.

   "max(a,b)" means the larger of two values a and b.

   "median(a,b,c)" means the numerical middle value in a data set of a,
   b, and c, i.e. a+b+c-min(a,b,c)-max(a,b,c).

   "A <== B" means B implies A.

   "A <==> B" means A <== B , B <== A.

   a_(b) means the b-th value of a sequence of a

   a_(b,c) means the 'b,c'-th value of a sequence of a

2.2.6. Order of Operation Precedence

   When order of precedence is not indicated explicitly by use of
   parentheses, operations are evaluated in the following order (from
   top to bottom, operations of same precedence being evaluated from
   left to right). This order of operations is based on the order of
   operations used in Standard C.

   a++, a--
   !a, -a
   a ^ b
   a * b, a / b
   a + b, a - b
   a << b, a >> b
   a < b, a <= b, a > b, a >= b
   a == b, a != b
   a & b
   a | b
   a && b
   a || b
   a ? b : c
   a = b, a += b, a -= b, a *= b

2.2.7. Range

   "a...b" means any value from a to b, inclusive.

2.2.8. NumBytes

   "NumBytes" is a non-negative integer that expresses the size in 8-bit
   octets of a particular FFV1 "Configuration Record" or "Frame". FFV1
   relies on its Container to store the "NumBytes" values; see
   Section 4.3.3.

2.2.9. Bitstream Functions

2.2.9.1. remaining_bits_in_bitstream

   "remaining_bits_in_bitstream( NumBytes )" means the count of
   remaining bits after the pointer in that "Configuration Record" or
   "Frame". It is computed from the "NumBytes" value multiplied by 8
   minus the count of bits of that "Configuration Record" or "Frame"
   already read by the bitstream parser.

2.2.9.2. remaining_symbols_in_syntax

   "remaining_symbols_in_syntax( )" is true as long as the RangeCoder
   has not consumed all the given input bytes.

2.2.9.3. byte_aligned

   "byte_aligned( )" is true if "remaining_bits_in_bitstream( NumBytes )"
   is a multiple of 8, otherwise false.

2.2.9.4. get_bits

   "get_bits( i )" is the action to read the next "i" bits in the
   bitstream, from most significant bit to least significant bit, and to
   return the corresponding value. The pointer is increased by "i".

3. Sample Coding

   For each "Slice" (as described in Section 4.5) of a Frame, the
   Planes, Lines, and Samples are coded in an order determined by the
   color space (see Section 3.7). Each Sample is predicted by the
   median predictor as described in Section 3.3 from other Samples
   within the same Plane and the difference is stored using the method
   described in Section 3.8.

3.1.
Border

   A border is assumed for each coded "Slice" for the purpose of the
   median predictor and context according to the following rules:

   * one column of Samples to the left of the coded slice is assumed as
     identical to the Samples of the leftmost column of the coded slice
     shifted down by one row. The value of the topmost Sample of the
     column of Samples to the left of the coded slice is assumed to be
     "0"

   * one column of Samples to the right of the coded slice is assumed
     as identical to the Samples of the rightmost column of the coded
     slice

   * an additional column of Samples to the left of the coded slice and
     two rows of Samples above the coded slice are assumed to be "0"

   Figure 2 depicts a slice of 9 Samples "a,b,c,d,e,f,g,h,i" in a 3x3
   arrangement along with its assumed border.

   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | 0 | 0 | 0 |   | 0 |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | 0 | 0 | 0 |   | 0 |
   +---+---+---+---+---+---+---+---+
   |   |   |   |   |   |   |   |   |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | a | b | c |   | c |
   +---+---+---+---+---+---+---+---+
   | 0 | a |   | d | e | f |   | f |
   +---+---+---+---+---+---+---+---+
   | 0 | d |   | g | h | i |   | i |
   +---+---+---+---+---+---+---+---+

   Figure 2: A depiction of FFV1's assumed border for a set of example
   Samples.

3.2. Samples

   Relative to any Sample "X", six other relatively positioned Samples
   from the coded Samples and presumed border are identified according
   to the labels used in Figure 3. The labels for these relatively
   positioned Samples are used within the median predictor and context.

   +---+---+---+---+
   |   |   | T |   |
   +---+---+---+---+
   |   |tl | t |tr |
   +---+---+---+---+
   | L | l | X |   |
   +---+---+---+---+

   Figure 3: A depiction of how relatively positioned Samples are
   referenced within this document.
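
   The border rules of Section 3.1 and the labels of Figure 3 can be
   illustrated with a short C sketch. This is a minimal illustration
   and not part of the specification: the names "Neighbors",
   "sample_at", "neighbors_of" and "check_figure2", as well as the
   row-by-row "int" array layout, are assumptions made for this example
   only.

```c
#include <assert.h>

/* Labels from Figure 3, relative to the Sample X at column x, row y. */
typedef struct { int l, t, tl, tr, L, T; } Neighbors;

/* Hypothetical helper: read the Sample at (x, y) of a slice plane stored
 * row by row in s (w Samples per row), applying the assumed border for
 * positions outside the slice. Intended for -2 <= x <= w and y >= -2,
 * which covers every position the six labels can reach. */
static int sample_at(const int *s, int w, int x, int y)
{
    if (y < 0 || x < -1)
        return 0;                          /* rows above, second left column */
    if (x == -1)
        return sample_at(s, w, 0, y - 1);  /* leftmost column shifted down */
    if (x >= w)
        x = w - 1;                         /* right border repeats last column */
    return s[y * w + x];
}

/* Collect the six relatively positioned Samples for X at (x, y). */
static Neighbors neighbors_of(const int *s, int w, int x, int y)
{
    Neighbors n;
    n.l  = sample_at(s, w, x - 1, y);
    n.L  = sample_at(s, w, x - 2, y);
    n.t  = sample_at(s, w, x,     y - 1);
    n.T  = sample_at(s, w, x,     y - 2);
    n.tl = sample_at(s, w, x - 1, y - 1);
    n.tr = sample_at(s, w, x + 1, y - 1);
    return n;
}

/* Self-check against Figure 2, with a..i replaced by 1..9. */
static void check_figure2(void)
{
    const int s[9] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    Neighbors n = neighbors_of(s, 3, 0, 1);   /* X = d */
    assert(n.l == 1 && n.L == 0);             /* border holds a, then 0 */
    assert(n.t == 1 && n.tl == 0 && n.tr == 2 && n.T == 0);
}
```

   Resolving the left border column through a recursive call keeps the
   "shifted down by one row" rule and the "topmost Sample is 0" rule in
   a single place; an implementation could equally materialize the
   border in a padded buffer.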

   The labels for these relative Samples are made of the first letters
   of the words Top, Left and Right.

3.3. Median Predictor

   The prediction for any Sample value at position "X" may be computed
   based upon the relative neighboring values of "l", "t", and "tl" via
   this equation:

   median(l, t, l + t - tl)

   Note, this prediction template is also used in [ISO.14495-1.1999] and
   [HuffYUV].

   Exception for the median predictor: if "colorspace_type == 0 &&
   bits_per_raw_sample == 16 && ( coder_type == 1 || coder_type == 2 )"
   (see Section 4.2.5, Section 4.2.7 and Section 4.2.3), the following
   median predictor MUST be used:

   median(left16s, top16s, left16s + top16s - diag16s)

   where:

   left16s = l >= 32768 ? ( l - 65536 ) : l
   top16s = t >= 32768 ? ( t - 65536 ) : t
   diag16s = tl >= 32768 ? ( tl - 65536 ) : tl

   Background: a two's complement 16-bit signed integer was used for
   storing Sample values in all known implementations of FFV1 bitstream.
   So in some circumstances, the most significant bit was wrongly
   interpreted (used as a sign bit instead of the 16th bit of an
   unsigned integer). Note that when the issue was discovered, the only
   impacted configuration of all known implementations was 16-bit
   YCbCr with no Pixel transformation with the Range Coder, as other
   potentially impacted configurations (e.g. 15/16-bit JPEG2000-RCT with
   the Range Coder, or 16-bit content with the Golomb Rice coder) were
   not implemented anywhere [ISO.15444-1.2016]. Meanwhile, 16-bit
   JPEG2000-RCT with the Range Coder was implemented without this
   issue in one implementation and validated by one conformance checker.
   This exception for the median predictor is expected (to be
   confirmed) to be removed in the next version of the FFV1 bitstream.

3.4. Quantization Table Sets

   The FFV1 bitstream contains one or more Quantization Table Sets.
   Each Quantization Table Set contains exactly 5 Quantization Tables
   with each Quantization Table corresponding to one of the five
   Quantized Sample Differences. For each Quantization Table, both the
   number of quantization steps and their distribution are stored in the
   FFV1 bitstream; each Quantization Table has exactly 256 entries, and
   the 8 least significant bits of the Quantized Sample Difference are
   used as index:

   Q_(j)[k] = quant_tables[i][j][k&255]

   Figure 4

   In this formula, "i" is the Quantization Table Set index, "j" is the
   Quantization Table index, "k" the Quantized Sample Difference.

3.5. Context

   Relative to any Sample "X", the Quantized Sample Differences "L-l",
   "l-tl", "tl-t", "T-t", and "t-tr" are used as context:

   context = Q_(0)[l - tl] +
             Q_(1)[tl - t] +
             Q_(2)[t - tr] +
             Q_(3)[L - l] +
             Q_(4)[T - t]

   Figure 5

   If "context >= 0" then "context" is used and the difference between
   the Sample and its predicted value is encoded as is, else "-context"
   is used and the difference between the Sample and its predicted value
   is encoded with a flipped sign.

3.6. Quantization Table Set Indexes

   For each Plane of each slice, a Quantization Table Set is selected
   from an index:

   * For Y Plane, "quant_table_set_index[ 0 ]" index is used

   * For Cb and Cr Planes, "quant_table_set_index[ 1 ]" index is used

   * For extra Plane, "quant_table_set_index[ (version <= 3 ||
     chroma_planes) ? 2 : 1 ]" index is used

   Background: in the first implementations of FFV1 bitstream, the
   index for Cb and Cr Planes was stored even if it was not used
   (chroma_planes set to 0). This index is kept for "version" <= 3 in
   order to keep compatibility with FFV1 bitstreams in the wild.

3.7. Color spaces

   FFV1 supports several color spaces. The count of allowed coded
   planes and the meaning of the extra Plane are determined by the
   selected color space.

   The FFV1 bitstream interleaves data in an order determined by the
   color space. In YCbCr for each Plane, each Line is coded from top to
   bottom and for each Line, each Sample is coded from left to right.
   In JPEG2000-RCT for each Line from top to bottom, each Plane is coded
   and for each Plane, each Sample is encoded from left to right.

3.7.1. YCbCr

   This color space allows 1 to 4 Planes.

   The Cb and Cr Planes are optional, but if used then MUST be used
   together. Omitting the Cb and Cr Planes codes the frames in
   grayscale without color data.

   An optional transparency Plane can be used to code transparency data.

   An FFV1 Frame using YCbCr MUST use one of the following arrangements:

   * Y

   * Y, Transparency

   * Y, Cb, Cr

   * Y, Cb, Cr, Transparency

   The Y Plane MUST be coded first. If the Cb and Cr Planes are used
   then they MUST be coded after the Y Plane. If a transparency Plane
   is used, then it MUST be coded last.

3.7.2. RGB

   This color space allows 3 or 4 Planes.

   An optional transparency Plane can be used to code transparency data.

   JPEG2000-RCT is a Reversible Color Transform that codes RGB (red,
   green, blue) Planes losslessly in a modified YCbCr color space
   [ISO.15444-1.2016]. Reversible Pixel transformations between YCbCr
   and RGB use the following formulae.

   Cb = b - g
   Cr = r - g
   Y = g + ((Cb + Cr) >> 2)
   g = Y - ((Cb + Cr) >> 2)
   r = Cr + g
   b = Cb + g

   Figure 6

   Exception for the JPEG2000-RCT conversion: if "bits_per_raw_sample"
   is between 9 and 15 inclusive and "extra_plane" is 0, the following
   formulae for reversible conversions between YCbCr and RGB MUST be
   used instead of the ones above:

   Cb = g - b
   Cr = r - b
   Y = b + ((Cb + Cr) >> 2)
   b = Y - ((Cb + Cr) >> 2)
   r = Cr + b
   g = Cb + b

   Figure 7

   Background: At the time of this writing, in all known implementations
   of FFV1 bitstream, when "bits_per_raw_sample" was between 9 and 15
   inclusive and "extra_plane" was 0, GBR Planes were used as BGR Planes
   during both encoding and decoding. Meanwhile, 16-bit
   JPEG2000-RCT was implemented without this issue in one implementation
   and validated by one conformance checker. Methods to address this
   exception for the transform are under consideration for the next
   version of the FFV1 bitstream.

   Cb and Cr are positively offset by "1 << bits_per_raw_sample" after
   the conversion from RGB to the modified YCbCr and are negatively
   offset by the same value before the conversion from the modified
   YCbCr to RGB, in order to have only non-negative values after the
   conversion.

   When FFV1 uses the JPEG2000-RCT, the horizontal Lines are interleaved
   to improve caching efficiency since it is most likely that the
   JPEG2000-RCT will immediately be converted to RGB during decoding.
   The interleaved coding order is also Y, then Cb, then Cr, and then,
   if used, transparency.
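
   The reversible transform of Figure 6, together with the offset
   applied to Cb and Cr, can be sketched in C as follows. This is a
   minimal sketch under stated assumptions, not a normative
   implementation: the names "Rgb", "Ycc", "asr", "rct_forward",
   "rct_inverse" and "rct_roundtrip_ok" are hypothetical, the shift is
   applied to (Cb + Cr) only, and only the Figure 6 formulae are
   covered (not the exception of Figure 7).

```c
#include <assert.h>

typedef struct { int r, g, b; } Rgb;
typedef struct { int y, cb, cr; } Ycc;

/* Right shift rounding toward negative infinity, as ">>" is defined in
 * Section 2.2.2; C leaves ">>" on negative ints implementation-defined. */
static int asr(int a, int b)
{
    return a >= 0 ? a >> b : -((-a + (1 << b) - 1) >> b);
}

/* RGB to modified YCbCr (Figure 6), then offset Cb and Cr. */
static Ycc rct_forward(Rgb p, int bits)
{
    Ycc o;
    int cb = p.b - p.g;
    int cr = p.r - p.g;
    o.y  = p.g + asr(cb + cr, 2);
    o.cb = cb + (1 << bits);      /* offset keeps the value non-negative */
    o.cr = cr + (1 << bits);
    return o;
}

/* Remove the offset, then modified YCbCr back to RGB (Figure 6). */
static Rgb rct_inverse(Ycc c, int bits)
{
    Rgb o;
    int cb = c.cb - (1 << bits);
    int cr = c.cr - (1 << bits);
    o.g = c.y - asr(cb + cr, 2);
    o.r = cr + o.g;
    o.b = cb + o.g;
    return o;
}

/* Returns 1 if a sampled grid of RGB triples of the given bit depth
 * survives a round trip with non-negative coded values, else 0. */
static int rct_roundtrip_ok(int bits)
{
    int max = (1 << bits) - 1;
    for (int r = 0; r <= max; r += 5)
        for (int g = 0; g <= max; g += 5)
            for (int b = 0; b <= max; b += 5) {
                Rgb p = { r, g, b };
                Ycc c = rct_forward(p, bits);
                Rgb q = rct_inverse(c, bits);
                if (c.y < 0 || c.cb < 0 || c.cr < 0)
                    return 0;
                if (q.r != r || q.g != g || q.b != b)
                    return 0;
            }
    return 1;
}
```

   The transform is lossless because the decoder recomputes the same
   "(Cb + Cr) >> 2" term from the unmodified differences, so the
   rounding discarded in Y is recovered exactly.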

   As an example, a Frame that is two Pixels wide and two Pixels high
   could comprise the following structure:

   +------------------------+------------------------+
   | Pixel(1,1)             | Pixel(2,1)             |
   | Y(1,1) Cb(1,1) Cr(1,1) | Y(2,1) Cb(2,1) Cr(2,1) |
   +------------------------+------------------------+
   | Pixel(1,2)             | Pixel(2,2)             |
   | Y(1,2) Cb(1,2) Cr(1,2) | Y(2,2) Cb(2,2) Cr(2,2) |
   +------------------------+------------------------+

   In JPEG2000-RCT, the coding order would be left to right and then top
   to bottom, with values interleaved by Lines and stored in this order:

   Y(1,1) Y(2,1) Cb(1,1) Cb(2,1) Cr(1,1) Cr(2,1) Y(1,2) Y(2,2) Cb(1,2)
   Cb(2,2) Cr(1,2) Cr(2,2)

3.8. Coding of the Sample Difference

   Instead of coding the n+1 bits of the Sample Difference with Huffman
   or Range coding (or n+2 bits, in the case of JPEG2000-RCT), only the
   n (or n+1, in the case of JPEG2000-RCT) least significant bits are
   used, since this is sufficient to recover the original Sample. In
   the equation below, the term "bits" represents "bits_per_raw_sample +
   1" for JPEG2000-RCT or "bits_per_raw_sample" otherwise:

   coder_input = [(sample_difference + 2 ^ (bits - 1)) &
                  (2 ^ bits - 1)] - 2 ^ (bits - 1)

   Figure 8: Description of the coding of the Sample Difference in
   the bitstream.

3.8.1. Range Coding Mode

   Early experimental versions of FFV1 used the CABAC Arithmetic coder
   from H.264 as defined in [ISO.14496-10.2014] but due to the uncertain
   patent/royalty situation, as well as its slightly worse performance,
   CABAC was replaced by a Range coder based on an algorithm defined by
   G. Nigel N. Martin in 1979 [range-coding].

3.8.1.1. Range Binary Values

   To encode binary digits efficiently a Range coder is used. C_(i) is
   the i-th Context. B_(i) is the i-th byte of the bytestream. b_(i) is
   the i-th Range coded binary value, S_(0, i) is the i-th initial
   state.
   The length of the bytestream encoding n binary symbols is
   j_(n) bytes.

   r_(i) = floor( ( R_(i) * S_(i, C_(i)) ) / 2 ^ 8 )

   Figure 9: A formula of the read of a binary value in Range Binary
   mode.

   S_(i + 1, C_(i)) = zero_state_(S_(i, C_(i))) AND
              l_(i) = L_(i) AND
              t_(i) = R_(i) - r_(i) <==
              b_(i) = 0 <==>
              L_(i) < R_(i) - r_(i)

   S_(i + 1, C_(i)) = one_state_(S_(i, C_(i))) AND
              l_(i) = L_(i) - R_(i) + r_(i) AND
              t_(i) = r_(i) <==
              b_(i) = 1 <==>
              L_(i) >= R_(i) - r_(i)

   Figure 10

   S_(i + 1, k) = S_(i, k) <== C_(i) != k

   Figure 11

   R_(i + 1) = 2 ^ 8 * t_(i) AND
   L_(i + 1) = 2 ^ 8 * l_(i) + B_(j_(i)) AND
   j_(i + 1) = j_(i) + 1 <==
       t_(i) < 2 ^ 8

   R_(i + 1) = t_(i) AND
   L_(i + 1) = l_(i) AND
   j_(i + 1) = j_(i) <==
       t_(i) >= 2 ^ 8

   Figure 12

   R_(0) = 65280

   Figure 13

   L_(0) = 2 ^ 8 * B_(0) + B_(1)

   Figure 14

   j_(0) = 2

   Figure 15

   range = 0xFF00;
   end = 0;
   low = get_bits(16);
   if (low >= range) {
       low = range;
       end = 1;
   }

   Figure 16: A pseudo-code description of the initial states in
   Range Binary mode.

   refill() {
       if (range < 256) {
           range = range * 256;
           low = low * 256;
           if (!end) {
               low += get_bits(8);
               if (remaining_bits_in_bitstream( NumBytes ) == 0) {
                   end = 1;
               }
           }
       }
   }

   Figure 17: A pseudo-code description of refilling the Range
   Binary Value coder buffer.

   get_rac(state) {
       rangeoff = (range * state) / 256;
       range -= rangeoff;
       if (low < range) {
           state = zero_state[state];
           refill();
           return 0;
       } else {
           low -= range;
           state = one_state[state];
           range = rangeoff;
           refill();
           return 1;
       }
   }

   Figure 18: A pseudo-code description of the read of a binary
   value in Range Binary mode.

3.8.1.1.1. Termination

   The range coder can be used in three modes.
*  In "Open mode", when decoding, every Symbol the reader attempts to
   read is available. In this mode, arbitrary data can be appended
   without affecting the range coder output. This mode is not used in
   FFV1.

*  In "Closed mode", the length in bytes of the bytestream is provided
   to the range decoder. Bytes beyond the length are read as 0 by the
   range decoder. A bytestream in this mode is generally one byte
   shorter than in "Open mode".

*  In "Sentinel mode", the exact length in bytes is not known, and thus
   the range decoder MAY read into the data that follows the range
   coded bytestream by one byte. In "Sentinel mode", the end of the
   range coded bytestream is a binary Symbol with state 129, whose
   value SHALL be discarded. After reading this Symbol, the range
   decoder will have read one byte beyond the end of the range coded
   bytestream. This way, the byte position of the end can be
   determined. Bytestreams written in "Sentinel mode" can be read in
   "Closed mode" if the length can be determined. In this case, the
   last (sentinel) Symbol will be read uncorrupted and be of value 0.

The above describes range decoding. Encoding is defined as any process
that produces a decodable bytestream.

There are three places where range coder termination is needed in FFV1.
The first is in the "Configuration Record". In this case, the size of
the range coded bytestream is known and handled as "Closed mode". The
second is the switch from the "Slice Header", which is range coded, to
Golomb coded slices, handled as "Sentinel mode". The third is the end
of range coded Slices, which need to terminate before the CRC at their
end. This can be handled as "Sentinel mode" or as "Closed mode" if the
CRC position has been determined.

3.8.1.2. Range Non Binary Values

To encode scalar integers, it would be possible to encode each bit
separately and use the past bits as context.
However, that would mean 255 contexts per 8-bit Symbol, which not only
wastes memory but also requires more past data to reach a reasonably
good estimate of the probabilities. Alternatively, it would be possible
to assume a Laplacian distribution and deal only with its variance and
mean (as in Huffman coding). However, for maximum flexibility and
simplicity, the chosen method uses a single Symbol to encode whether a
number is 0 and, if it is not, encodes the number using its exponent,
mantissa, and sign. The exact contexts used are best described by
Figure 19.

   int get_symbol(RangeCoder *c, uint8_t *state, int is_signed) {
       if (get_rac(c, state + 0)) {
           return 0;
       }

       int e = 0;
       while (get_rac(c, state + 1 + min(e, 9))) { // 1..10
           e++;
       }

       int a = 1;
       for (int i = e - 1; i >= 0; i--) {
           a = a * 2 + get_rac(c, state + 22 + min(i, 9)); // 22..31
       }

       if (!is_signed) {
           return a;
       }

       if (get_rac(c, state + 11 + min(e, 10))) { // 11..21
           return -a;
       } else {
           return a;
       }
   }

Figure 19: A pseudo-code description of the contexts of Range Non
Binary Values.

"get_symbol" is used to read the "sample_difference" indicated in
Figure 8.

"get_rac" returns a boolean, computed from the bytestream as described
in Figure 9 as a formula and in Figure 18 as pseudo-code.

3.8.1.3. Initial Values for the Context Model

When the "keyframe" (see Section 4.4) value is 1, all Range coder state
variables are set to their initial state.

3.8.1.4. State Transition Table

   one_state_(i) =
          default_state_transition_(i) + state_transition_delta_(i)

Figure 20

   zero_state_(i) = 256 - one_state_(256-i)

Figure 21

3.8.1.5.
default_state_transition 916 0, 0, 0, 0, 0, 0, 0, 0, 20, 21, 22, 23, 24, 25, 26, 27, 918 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 37, 38, 39, 40, 41, 42, 920 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 56, 57, 922 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 924 74, 75, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 926 89, 90, 91, 92, 93, 94, 94, 95, 96, 97, 98, 99,100,101,102,103, 928 104,105,106,107,108,109,110,111,112,113,114,114,115,116,117,118, 930 119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,133, 932 134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149, 934 150,151,152,152,153,154,155,156,157,158,159,160,161,162,163,164, 936 165,166,167,168,169,170,171,171,172,173,174,175,176,177,178,179, 938 180,181,182,183,184,185,186,187,188,189,190,190,191,192,194,194, 940 195,196,197,198,199,200,201,202,202,204,205,206,207,208,209,209, 942 210,211,212,213,215,215,216,217,218,219,220,220,222,223,224,225, 944 226,227,227,229,229,230,231,232,234,234,235,236,237,238,239,240, 946 241,242,243,244,245,246,247,248,248, 0, 0, 0, 0, 0, 0, 0, 948 3.8.1.6. Alternative State Transition Table 950 The alternative state transition table has been built using iterative 951 minimization of frame sizes and generally performs better than the 952 default. To use it, the "coder_type" (see Section 4.2.3) MUST be set 953 to 2 and the difference to the default MUST be stored in the 954 "Parameters", see Section 4.2. The reference implementation of FFV1 955 in FFmpeg uses Figure 22 by default at the time of this writing when 956 Range coding is used. 
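As a rough illustration of the relationship between a custom table, the stored difference, and the formulas in Figures 20 and 21, the sketch below derives "state_transition_delta" from a custom table and then rebuilds "one_state" and "zero_state" from it. The function names, the "FFV1_STATE_COUNT" constant, and the use of "int" entries (chosen so that 256 - 0 does not wrap around) are assumptions made for this example and are not defined by this document.

```c
#include <assert.h>

/* Number of Range coder states; index 0 is not used by Figures 20-21. */
#define FFV1_STATE_COUNT 256

/*
 * Hypothetical helper: computes the per-state difference between a
 * custom transition table and the default one, which is the quantity
 * stored as state_transition_delta.
 */
static void compute_state_transition_delta(
        const int custom[FFV1_STATE_COUNT],
        const int default_transition[FFV1_STATE_COUNT],
        int delta[FFV1_STATE_COUNT]) {
    delta[0] = 0;
    for (int i = 1; i < FFV1_STATE_COUNT; i++) {
        delta[i] = custom[i] - default_transition[i];
    }
}

/*
 * Hypothetical helper: applies Figure 20 (one_state) and Figure 21
 * (zero_state). Entries are kept as int, an assumption of this sketch,
 * so that a one_state value of 0 yields 256 instead of wrapping.
 */
static void build_state_tables(
        const int default_transition[FFV1_STATE_COUNT],
        const int delta[FFV1_STATE_COUNT],
        int one_state[FFV1_STATE_COUNT],
        int zero_state[FFV1_STATE_COUNT]) {
    one_state[0] = 0;
    zero_state[0] = 0;
    for (int i = 1; i < FFV1_STATE_COUNT; i++) {
        one_state[i] = default_transition[i] + delta[i];
        assert(one_state[i] >= 0 && one_state[i] < FFV1_STATE_COUNT);
    }
    for (int i = 1; i < FFV1_STATE_COUNT; i++) {
        zero_state[i] = 256 - one_state[256 - i];
    }
}
```

When no custom table is stored, every delta is 0 and "one_state" is simply the default table.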
958 0, 10, 10, 10, 10, 16, 16, 16, 28, 16, 16, 29, 42, 49, 20, 49, 960 59, 25, 26, 26, 27, 31, 33, 33, 33, 34, 34, 37, 67, 38, 39, 39, 962 40, 40, 41, 79, 43, 44, 45, 45, 48, 48, 64, 50, 51, 52, 88, 52, 964 53, 74, 55, 57, 58, 58, 74, 60,101, 61, 62, 84, 66, 66, 68, 69, 966 87, 82, 71, 97, 73, 73, 82, 75,111, 77, 94, 78, 87, 81, 83, 97, 968 85, 83, 94, 86, 99, 89, 90, 99,111, 92, 93,134, 95, 98,105, 98, 970 105,110,102,108,102,118,103,106,106,113,109,112,114,112,116,125, 972 115,116,117,117,126,119,125,121,121,123,145,124,126,131,127,129, 974 165,130,132,138,133,135,145,136,137,139,146,141,143,142,144,148, 976 147,155,151,149,151,150,152,157,153,154,156,168,158,162,161,160, 978 172,163,169,164,166,184,167,170,177,174,171,173,182,176,180,178, 980 175,189,179,181,186,183,192,185,200,187,191,188,190,197,193,196, 982 197,194,195,196,198,202,199,201,210,203,207,204,205,206,208,214, 984 209,211,221,212,213,215,224,216,217,218,219,220,222,228,223,225, 986 226,224,227,229,240,230,231,232,233,234,235,236,238,239,237,242, 988 241,243,242,244,245,246,247,248,249,250,251,252,252,253,254,255, 990 Figure 22: Alternative state transition table for Range coding. 992 3.8.2. Golomb Rice Mode 994 The end of the bitstream of the Frame is padded with 0-bits until the 995 bitstream contains a multiple of 8 bits. 997 3.8.2.1. Signed Golomb Rice Codes 999 This coding mode uses Golomb Rice codes. The VLC is split into two 1000 parts. The prefix stores the most significant bits and the suffix 1001 stores the k least significant bits or stores the whole number in the 1002 ESC case. 1004 int get_ur_golomb(k) { 1005 for (prefix = 0; prefix < 12; prefix++) { 1006 if (get_bits(1)) { 1007 return get_bits(k) + (prefix << k); 1008 } 1009 } 1010 return get_bits(bits) + 11; 1011 } 1013 Figure 23: A pseudo-code description of the read of an unsigned 1014 integer in Golomb Rice mode. 
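The read in Figure 23 could be sketched in C roughly as follows. The "BitReader" type, the explicit "bits" parameter, and the choice to return 0-bits when reading past the end of the buffer are assumptions made for this example. The pseudo-code above leaves the bitstream and "bits" implicit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t *buf; /* input bytes */
    size_t size;        /* number of bytes in buf */
    size_t pos;         /* next bit, counted from the MSB of buf[0] */
} BitReader;

/* Reads n bits, MSB first. Bits past the end read as 0 (assumption). */
static uint32_t get_bits(BitReader *br, int n) {
    uint32_t v = 0;
    assert(n >= 0 && n <= 32);
    while (n-- > 0) {
        uint32_t bit = 0;
        if (br->pos < 8 * br->size) {
            bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
        }
        br->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

/*
 * Unsigned Golomb Rice read following Figure 23: up to 12 prefix bits,
 * then either k suffix bits or, in the ESC case, `bits` bits holding
 * the value minus 11.
 */
static uint32_t get_ur_golomb(BitReader *br, int k, int bits) {
    for (uint32_t prefix = 0; prefix < 12; prefix++) {
        if (get_bits(br, 1)) {
            return get_bits(br, k) + (prefix << k);
        }
    }
    return get_bits(br, bits) + 11;
}
```

For instance, with k = 2 the bits "01 01" give prefix 1 and suffix 1, so the function returns 1 + (1 << 2) = 5.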
1016 int get_sr_golomb(k) { 1017 v = get_ur_golomb(k); 1018 if (v & 1) return - (v >> 1) - 1; 1019 else return (v >> 1); 1020 } 1022 Figure 24: A pseudo-code description of the read of a signed 1023 integer in Golomb Rice mode. 1025 3.8.2.1.1. Prefix 1027 +================+=======+ 1028 | bits | value | 1029 +================+=======+ 1030 | 1 | 0 | 1031 +----------------+-------+ 1032 | 01 | 1 | 1033 +----------------+-------+ 1034 | ... | ... | 1035 +----------------+-------+ 1036 | 0000 0000 01 | 9 | 1037 +----------------+-------+ 1038 | 0000 0000 001 | 10 | 1039 +----------------+-------+ 1040 | 0000 0000 0001 | 11 | 1041 +----------------+-------+ 1042 | 0000 0000 0000 | ESC | 1043 +----------------+-------+ 1045 Table 1 1047 3.8.2.1.2. Suffix 1049 +=========+========================================+ 1050 +=========+========================================+ 1051 | non ESC | the k least significant bits MSB first | 1052 +---------+----------------------------------------+ 1053 | ESC | the value - 11, in MSB first order | 1054 +---------+----------------------------------------+ 1056 Table 2 1058 ESC MUST NOT be used if the value can be coded as non ESC. 1060 3.8.2.1.3. Examples 1062 Table 3 shows practical examples of how Signed Golomb Rice Codes are 1063 decoded based on the series of bits extracted from the bitstream as 1064 described by the method above: 1066 +=====+=======================+=======+ 1067 | k | bits | value | 1068 +=====+=======================+=======+ 1069 | 0 | 1 | 0 | 1070 +-----+-----------------------+-------+ 1071 | 0 | 001 | 2 | 1072 +-----+-----------------------+-------+ 1073 | 2 | 1 00 | 0 | 1074 +-----+-----------------------+-------+ 1075 | 2 | 1 10 | 2 | 1076 +-----+-----------------------+-------+ 1077 | 2 | 01 01 | 5 | 1078 +-----+-----------------------+-------+ 1079 | any | 000000000000 10000000 | 139 | 1080 +-----+-----------------------+-------+ 1082 Table 3: Examples of decoded Signed 1083 Golomb Rice Codes. 1085 3.8.2.2. 
Run Mode 1087 Run mode is entered when the context is 0 and left as soon as a non-0 1088 difference is found. The sample difference is identical to the 1089 predicted one. The run and the first different sample difference are 1090 coded as defined in Section 3.8.2.4.1. 1092 3.8.2.2.1. Run Length Coding 1094 The run value is encoded in two parts. The prefix part stores the 1095 more significant part of the run as well as adjusting the "run_index" 1096 that determines the number of bits in the less significant part of 1097 the run. The second part of the value stores the less significant 1098 part of the run as it is. The "run_index" is reset for each Plane 1099 and slice to 0. 1101 log2_run[41] = { 1102 0, 0, 0, 0, 1, 1, 1, 1, 1103 2, 2, 2, 2, 3, 3, 3, 3, 1104 4, 4, 5, 5, 6, 6, 7, 7, 1105 8, 9,10,11,12,13,14,15, 1106 16,17,18,19,20,21,22,23, 1107 24, 1108 }; 1110 if (run_count == 0 && run_mode == 1) { 1111 if (get_bits(1)) { 1112 run_count = 1 << log2_run[run_index]; 1113 if (x + run_count <= w) { 1114 run_index++; 1115 } 1116 } else { 1117 if (log2_run[run_index]) { 1118 run_count = get_bits(log2_run[run_index]); 1119 } else { 1120 run_count = 0; 1121 } 1122 if (run_index) { 1123 run_index--; 1124 } 1125 run_mode = 2; 1126 } 1127 } 1129 The "log2_run" array is also used within [ISO.14495-1.1999]. 1131 3.8.2.3. Sign extension 1133 "sign_extend" is the function of increasing the number of bits of an 1134 input binary number in twos complement signed number representation 1135 while preserving the input number's sign (positive/negative) and 1136 value, in order to fit in the output bit width. 
It MAY be computed with:

   sign_extend(input_number, input_bits) {
       negative_bias = 1 << (input_bits - 1);
       bits_mask = negative_bias - 1;
       output_number = input_number & bits_mask; // Remove negative bit
       is_negative = input_number & negative_bias; // Test negative bit
       if (is_negative)
           output_number -= negative_bias;
       return output_number;
   }

3.8.2.4. Scalar Mode

Each difference is coded with the per context mean prediction removed
and a per context value for k.

   get_vlc_symbol(state) {
       i = state->count;
       k = 0;
       while (i < state->error_sum) {
           k++;
           i += i;
       }

       v = get_sr_golomb(k);

       if (2 * state->drift < -state->count) {
           v = -1 - v;
       }

       ret = sign_extend(v + state->bias, bits);

       state->error_sum += abs(v);
       state->drift     += v;

       if (state->count == 128) {
           state->count     >>= 1;
           state->drift     >>= 1;
           state->error_sum >>= 1;
       }
       state->count++;
       if (state->drift <= -state->count) {
           state->bias = max(state->bias - 1, -128);

           state->drift = max(state->drift + state->count,
                              -state->count + 1);
       } else if (state->drift > 0) {
           state->bias = min(state->bias + 1, 127);

           state->drift = min(state->drift - state->count, 0);
       }

       return ret;
   }

3.8.2.4.1. Golomb Rice Sample Difference Coding

Level coding is identical to the normal difference coding with the
exception that the 0 value is removed as it cannot occur:

   diff = get_vlc_symbol(context_state);
   if (diff >= 0) {
       diff++;
   }

Note that this is different from JPEG-LS, which doesn't use prediction
in run mode and uses a different encoding and context model for the
last difference. On a small set of test Samples, the use of prediction
slightly improved the compression rate.

3.8.2.5.
Initial Values for the VLC context state 1210 When "keyframe" (see Section 4.4) value is 1, all coder state 1211 variables are set to their initial state. 1213 drift = 0; 1214 error_sum = 4; 1215 bias = 0; 1216 count = 1; 1218 4. Bitstream 1220 An FFV1 bitstream is composed of a series of one or more Frames and 1221 (when required) a "Configuration Record". 1223 Within the following sub-sections, pseudo-code is used, as described 1224 in Section 2.2.1, to explain the structure of each FFV1 bitstream 1225 component. Table 4 lists symbols used to annotate that pseudo-code 1226 in order to define the storage of the data referenced in that line of 1227 pseudo-code. 1229 +========+=================================================+ 1230 | Symbol | Definition | 1231 +========+=================================================+ 1232 | u(n) | unsigned big endian integer Symbol using n bits | 1233 +--------+-------------------------------------------------+ 1234 | sg | Golomb Rice coded signed scalar Symbol coded | 1235 | | with the method described in Section 3.8.2 | 1236 +--------+-------------------------------------------------+ 1237 | br | Range coded Boolean (1-bit) Symbol with the | 1238 | | method described in Section 3.8.1.1 | 1239 +--------+-------------------------------------------------+ 1240 | ur | Range coded unsigned scalar Symbol coded with | 1241 | | the method described in Section 3.8.1.2 | 1242 +--------+-------------------------------------------------+ 1243 | sr | Range coded signed scalar Symbol coded with the | 1244 | | method described in Section 3.8.1.2 | 1245 +--------+-------------------------------------------------+ 1246 | sd | Sample difference Symbol coded with the method | 1247 | | described in Section 3.8 | 1248 +--------+-------------------------------------------------+ 1250 Table 4: Definition of pseudo-code symbols for this 1251 document. 
1253 The following MUST be provided by external means during 1254 initialization of the decoder: 1256 "frame_pixel_width" is defined as Frame width in Pixels. 1258 "frame_pixel_height" is defined as Frame height in Pixels. 1260 Default values at the decoder initialization phase: 1262 "ConfigurationRecordIsPresent" is set to 0. 1264 4.1. Quantization Table Set 1266 The Quantization Table Sets are stored by storing the number of equal 1267 entries -1 of the first half of the table (represented as "len - 1" 1268 in the pseudo-code below) using the method described in 1269 Section 3.8.1.2. The second half doesn't need to be stored as it is 1270 identical to the first with flipped sign. "scale" and "len_count[ i 1271 ][ j ]" are temporary values used for the computing of 1272 "context_count[ i ]" and are not used outside Quantization Table Set 1273 pseudo-code. 1275 Example: 1277 Table: 0 0 1 1 1 1 2 2 -2 -2 -2 -1 -1 -1 -1 0 1279 Stored values: 1, 3, 1 1281 "QuantizationTableSet" has its own initial states, all set to 128. 1283 pseudo-code | type 1284 --------------------------------------------------------------|----- 1285 QuantizationTableSet( i ) { | 1286 scale = 1 | 1287 for (j = 0; j < MAX_CONTEXT_INPUTS; j++) { | 1288 QuantizationTable( i, j, scale ) | 1289 scale *= 2 * len_count[ i ][ j ] - 1 | 1290 } | 1291 context_count[ i ] = ceil( scale / 2 ) | 1292 } | 1294 "MAX_CONTEXT_INPUTS" is 5. 1296 pseudo-code | type 1297 --------------------------------------------------------------|----- 1298 QuantizationTable(i, j, scale) { | 1299 v = 0 | 1300 for (k = 0; k < 128;) { | 1301 len - 1 | ur 1302 for (n = 0; n < len; n++) { | 1303 quant_tables[ i ][ j ][ k ] = scale * v | 1304 k++ | 1305 } | 1306 v++ | 1307 } | 1308 for (k = 1; k < 128; k++) { | 1309 quant_tables[ i ][ j ][ 256 - k ] = \ | 1310 -quant_tables[ i ][ j ][ k ] | 1311 } | 1312 quant_tables[ i ][ j ][ 128 ] = \ | 1313 -quant_tables[ i ][ j ][ 127 ] | 1314 len_count[ i ][ j ] = v | 1315 } | 1317 4.1.1. 
quant_tables

"quant_tables[ i ][ j ][ k ]" indicates the quantization table value of
the Quantized Sample Difference "k" of the Quantization Table "j" of
the Quantization Table Set "i".

4.1.2. context_count

"context_count[ i ]" indicates the count of contexts for Quantization
Table Set "i". "context_count[ i ]" MUST be less than or equal to
32768.

4.2. Parameters

The "Parameters" section contains significant characteristics about
the decoding configuration used for all instances of Frame (in FFV1
version 0 and 1) or the whole FFV1 bitstream (other versions),
including the stream version, color configuration, and quantization
tables. Figure 25 describes the contents of the bitstream.

"Parameters" has its own initial states, all set to 128.

pseudo-code                                                   | type
--------------------------------------------------------------|-----
Parameters( ) {                                               |
    version                                                   | ur
    if (version >= 3) {                                       |
        micro_version                                         | ur
    }                                                         |
    coder_type                                                | ur
    if (coder_type > 1) {                                     |
        for (i = 1; i < 256; i++) {                           |
            state_transition_delta[ i ]                       | sr
        }                                                     |
    }                                                         |
    colorspace_type                                           | ur
    if (version >= 1) {                                       |
        bits_per_raw_sample                                   | ur
    }                                                         |
    chroma_planes                                             | br
    log2_h_chroma_subsample                                   | ur
    log2_v_chroma_subsample                                   | ur
    extra_plane                                               | br
    if (version >= 3) {                                       |
        num_h_slices - 1                                      | ur
        num_v_slices - 1                                      | ur
        quant_table_set_count                                 | ur
    }                                                         |
    for (i = 0; i < quant_table_set_count; i++) {             |
        QuantizationTableSet( i )                             |
    }                                                         |
    if (version >= 3) {                                       |
        for (i = 0; i < quant_table_set_count; i++) {         |
            states_coded                                      | br
            if (states_coded) {                               |
                for (j = 0; j < context_count[ i ]; j++) {    |
                    for (k = 0; k < CONTEXT_SIZE; k++) {      |
                        initial_state_delta[ i ][ j ][ k ]    | sr
                    }                                         |
                }                                             |
            }                                                 |
        }                                                     |
        ec                                                    | ur
        intra                                                 | ur
    }                                                         |
}                                                             |

Figure 25: A pseudo-code description of the bitstream
contents. 1386 CONTEXT_SIZE is 32. 1388 4.2.1. version 1390 "version" specifies the version of the FFV1 bitstream. 1392 Each version is incompatible with other versions: decoders SHOULD 1393 reject FFV1 bitstreams due to an unknown version. 1395 Decoders SHOULD reject FFV1 bitstreams with version <= 1 && 1396 ConfigurationRecordIsPresent == 1. 1398 Decoders SHOULD reject FFV1 bitstreams with version >= 3 && 1399 ConfigurationRecordIsPresent == 0. 1401 +=======+=========================+ 1402 | value | version | 1403 +=======+=========================+ 1404 | 0 | FFV1 version 0 | 1405 +-------+-------------------------+ 1406 | 1 | FFV1 version 1 | 1407 +-------+-------------------------+ 1408 | 2 | reserved* | 1409 +-------+-------------------------+ 1410 | 3 | FFV1 version 3 | 1411 +-------+-------------------------+ 1412 | Other | reserved for future use | 1413 +-------+-------------------------+ 1415 Table 5 1417 * Version 2 was experimental and this document does not describe it. 1419 4.2.2. micro_version 1421 "micro_version" specifies the micro-version of the FFV1 bitstream. 1423 After a version is considered stable (a micro-version value is 1424 assigned to be the first stable variant of a specific version), each 1425 new micro-version after this first stable variant is compatible with 1426 the previous micro-version: decoders SHOULD NOT reject FFV1 1427 bitstreams due to an unknown micro-version equal or above the micro- 1428 version considered as stable. 1430 Meaning of "micro_version" for "version" 3: 1432 +=======+=========================+ 1433 | value | micro_version | 1434 +=======+=========================+ 1435 | 0...3 | reserved* | 1436 +-------+-------------------------+ 1437 | 4 | first stable variant | 1438 +-------+-------------------------+ 1439 | Other | reserved for future use | 1440 +-------+-------------------------+ 1442 Table 6: The definitions for 1443 "micro_version" values for FFV1 1444 version 3. 
* Development versions may be incompatible with the stable variants.

4.2.3. coder_type

"coder_type" specifies the coder used.

+=======+=================================================+
| value | coder used                                      |
+=======+=================================================+
| 0     | Golomb Rice                                     |
+-------+-------------------------------------------------+
| 1     | Range Coder with default state transition table |
+-------+-------------------------------------------------+
| 2     | Range Coder with custom state transition table  |
+-------+-------------------------------------------------+
| Other | reserved for future use                         |
+-------+-------------------------------------------------+

Table 7

Restrictions:

If "coder_type" is 0, then "bits_per_raw_sample" SHOULD NOT be > 8.

Background: At the time of this writing, there is no known
implementation of an FFV1 bitstream that supports the Golomb Rice
algorithm with "bits_per_raw_sample" greater than 8, and the Range
Coder is preferred.

4.2.4. state_transition_delta

"state_transition_delta" specifies the Range coder custom state
transition table.

If "state_transition_delta" is not present in the FFV1 bitstream, all
Range coder custom state transition table elements are assumed to be 0.

4.2.5. colorspace_type

"colorspace_type" specifies the color space encoded, the pixel
transformation used by the encoder, the extra plane content, as well
as the interleave method.
1490 +=======+==============+================+==============+============+ 1491 | value | color space | pixel | extra plane | interleave | 1492 | | encoded | transformation | content | method | 1493 +=======+==============+================+==============+============+ 1494 | 0 | YCbCr | None | Transparency | Plane then | 1495 | | | | | Line | 1496 +-------+--------------+----------------+--------------+------------+ 1497 | 1 | RGB | JPEG2000-RCT | Transparency | Line then | 1498 | | | | | Plane | 1499 +-------+--------------+----------------+--------------+------------+ 1500 | Other | reserved | reserved for | reserved for | reserved | 1501 | | for future | future use | future use | for future | 1502 | | use | | | use | 1503 +-------+--------------+----------------+--------------+------------+ 1505 Table 8 1507 FFV1 bitstreams with "colorspace_type" == 1 && ("chroma_planes" != 1508 1 || "log2_h_chroma_subsample" != 0 || "log2_v_chroma_subsample" != 1509 0) are not part of this specification. 1511 4.2.6. chroma_planes 1513 "chroma_planes" indicates if chroma (color) Planes are present. 1515 +=======+===============================+ 1516 | value | presence | 1517 +=======+===============================+ 1518 | 0 | chroma Planes are not present | 1519 +-------+-------------------------------+ 1520 | 1 | chroma Planes are present | 1521 +-------+-------------------------------+ 1523 Table 9 1525 4.2.7. bits_per_raw_sample 1527 "bits_per_raw_sample" indicates the number of bits for each Sample. 1528 Inferred to be 8 if not present. 1530 +=======+=================================+ 1531 | value | bits for each sample | 1532 +=======+=================================+ 1533 | 0 | reserved* | 1534 +-------+---------------------------------+ 1535 | Other | the actual bits for each Sample | 1536 +-------+---------------------------------+ 1538 Table 10 1540 * Encoders MUST NOT store "bits_per_raw_sample" = 0. 
Decoders SHOULD 1541 accept and interpret "bits_per_raw_sample" = 0 as 8. 1543 4.2.8. log2_h_chroma_subsample 1545 "log2_h_chroma_subsample" indicates the subsample factor, stored in 1546 powers to which the number 2 is raised, between luma and chroma width 1547 ("chroma_width = 2 ^ -log2_h_chroma_subsample * luma_width"). 1549 4.2.9. log2_v_chroma_subsample 1551 "log2_v_chroma_subsample" indicates the subsample factor, stored in 1552 powers to which the number 2 is raised, between luma and chroma 1553 height ("chroma_height = 2 ^ -log2_v_chroma_subsample * 1554 luma_height"). 1556 4.2.10. extra_plane 1558 "extra_plane" indicates if an extra Plane is present. 1560 +=======+============================+ 1561 | value | presence | 1562 +=======+============================+ 1563 | 0 | extra Plane is not present | 1564 +-------+----------------------------+ 1565 | 1 | extra Plane is present | 1566 +-------+----------------------------+ 1568 Table 11 1570 4.2.11. num_h_slices 1572 "num_h_slices" indicates the number of horizontal elements of the 1573 slice raster. 1575 Inferred to be 1 if not present. 1577 4.2.12. num_v_slices 1579 "num_v_slices" indicates the number of vertical elements of the slice 1580 raster. 1582 Inferred to be 1 if not present. 1584 4.2.13. quant_table_set_count 1586 "quant_table_set_count" indicates the number of Quantization 1587 Table Sets. "quant_table_set_count" MUST be less than or equal to 8. 1589 Inferred to be 1 if not present. 1591 MUST NOT be 0. 1593 4.2.14. states_coded 1595 "states_coded" indicates if the respective Quantization Table Set has 1596 the initial states coded. 1598 Inferred to be 0 if not present. 
1600 +=======+================================+ 1601 | value | initial states | 1602 +=======+================================+ 1603 | 0 | initial states are not present | 1604 | | and are assumed to be all 128 | 1605 +-------+--------------------------------+ 1606 | 1 | initial states are present | 1607 +-------+--------------------------------+ 1609 Table 12 1611 4.2.15. initial_state_delta 1613 "initial_state_delta[ i ][ j ][ k ]" indicates the initial Range 1614 coder state, it is encoded using "k" as context index and 1616 pred = j ? initial_states[ i ][j - 1][ k ] : 128 1617 Figure 26 1619 initial_state[ i ][ j ][ k ] = 1620 ( pred + initial_state_delta[ i ][ j ][ k ] ) & 255 1622 Figure 27 1624 4.2.16. ec 1626 "ec" indicates the error detection/correction type. 1628 +=======+=================================================+ 1629 | value | error detection/correction type | 1630 +=======+=================================================+ 1631 | 0 | 32-bit CRC in "ConfigurationRecord" | 1632 +-------+-------------------------------------------------+ 1633 | 1 | 32-bit CRC in "Slice" and "ConfigurationRecord" | 1634 +-------+-------------------------------------------------+ 1635 | Other | reserved for future use | 1636 +-------+-------------------------------------------------+ 1638 Table 13 1640 4.2.17. intra 1642 "intra" indicates the constraint on "keyframe" in each instance of 1643 Frame. 1645 Inferred to be 0 if not present. 
1647 +=======+=======================================================+ 1648 | value | relationship | 1649 +=======+=======================================================+ 1650 | 0 | "keyframe" can be 0 or 1 (non keyframes or keyframes) | 1651 +-------+-------------------------------------------------------+ 1652 | 1 | "keyframe" MUST be 1 (keyframes only) | 1653 +-------+-------------------------------------------------------+ 1654 | Other | reserved for future use | 1655 +-------+-------------------------------------------------------+ 1657 Table 14 1659 4.3. Configuration Record 1661 In the case of a FFV1 bitstream with "version >= 3", a "Configuration 1662 Record" is stored in the underlying Container as described in 1663 Section 4.3.3. It contains the "Parameters" used for all instances 1664 of Frame. The size of the "Configuration Record", "NumBytes", is 1665 supplied by the underlying Container. 1667 pseudo-code | type 1668 -----------------------------------------------------------|----- 1669 ConfigurationRecord( NumBytes ) { | 1670 ConfigurationRecordIsPresent = 1 | 1671 Parameters( ) | 1672 while (remaining_symbols_in_syntax(NumBytes - 4)) { | 1673 reserved_for_future_use | br/ur/sr 1674 } | 1675 configuration_record_crc_parity | u(32) 1676 } | 1678 4.3.1. reserved_for_future_use 1680 "reserved_for_future_use" is a placeholder for future updates of this 1681 specification. 1683 Encoders conforming to this version of this specification SHALL NOT 1684 write "reserved_for_future_use". 1686 Decoders conforming to this version of this specification SHALL 1687 ignore "reserved_for_future_use". 1689 4.3.2. configuration_record_crc_parity 1691 "configuration_record_crc_parity" 32 bits that are chosen so that the 1692 "Configuration Record" as a whole has a CRC remainder of 0. 1694 This is equivalent to storing the CRC remainder in the 32-bit parity. 1696 The CRC generator polynomial used is described in Section 4.9.3. 1698 4.3.3. 
Mapping FFV1 into Containers

The "Configuration Record" can be placed in any file format that
supports "Configuration Records", following as closely as possible the
way that file format stores "Configuration Records". The storage
location of the "Configuration Record" and the meaning of "NumBytes"
are currently defined and supported by this version of this
specification for the following formats:

4.3.3.1. AVI File Format

The "Configuration Record" extends the stream format chunk ("AVI ",
"hdlr", "strl", "strf") with the ConfigurationRecord bitstream.

See [AVI] for more information about chunks.

"NumBytes" is defined as the size, in bytes, of the strf chunk
indicated in the chunk header minus the size of the stream format
structure.

4.3.3.2. ISO Base Media File Format

The "Configuration Record" extends the sample description box
("moov", "trak", "mdia", "minf", "stbl", "stsd") with a "glbl" box
that contains the ConfigurationRecord bitstream. See
[ISO.14496-12.2015] for more information about boxes.

"NumBytes" is defined as the size, in bytes, of the "glbl" box
indicated in the box header minus the size of the box header.

4.3.3.3. NUT File Format

The "codec_specific_data" element (in the "stream_header" packet)
contains the ConfigurationRecord bitstream. See [NUT] for more
information about elements.

"NumBytes" is defined as the size, in bytes, of the
"codec_specific_data" element as indicated in the "length" field of
"codec_specific_data".

4.3.3.4. Matroska File Format

FFV1 SHOULD use "V_FFV1" as the Matroska "Codec ID". For FFV1
versions 2 or less, the Matroska "CodecPrivate" Element SHOULD NOT be
used. For FFV1 versions 3 or greater, the Matroska "CodecPrivate"
Element MUST contain the FFV1 "Configuration Record" structure and no
other data. See [Matroska] for more information about elements.
1746 "NumBytes" is defined as the "Element Data Size" of the 1747 "CodecPrivate" Element. 1749 4.4. Frame 1751 A Frame is an encoded representation of a complete static image. The 1752 whole Frame is provided by the underlaying container. 1754 A Frame consists of the "keyframe" field, "Parameters" (if "version" 1755 <= 1), and a sequence of independent slices. The pseudo-code below 1756 describes the contents of a Frame. 1758 "keyframe" field has its own initial state, set to 128. 1760 pseudo-code | type 1761 --------------------------------------------------------------|----- 1762 Frame( NumBytes ) { | 1763 keyframe | br 1764 if (keyframe && !ConfigurationRecordIsPresent { | 1765 Parameters( ) | 1766 } | 1767 while (remaining_bits_in_bitstream( NumBytes )) { | 1768 Slice( ) | 1769 } | 1770 } | 1772 Architecture overview of slices in a Frame: 1774 +=================================================================+ 1775 +=================================================================+ 1776 | first slice header | 1777 +-----------------------------------------------------------------+ 1778 | first slice content | 1779 +-----------------------------------------------------------------+ 1780 | first slice footer | 1781 +-----------------------------------------------------------------+ 1782 | --------------------------------------------------------------- | 1783 +-----------------------------------------------------------------+ 1784 | second slice header | 1785 +-----------------------------------------------------------------+ 1786 | second slice content | 1787 +-----------------------------------------------------------------+ 1788 | second slice footer | 1789 +-----------------------------------------------------------------+ 1790 | --------------------------------------------------------------- | 1791 +-----------------------------------------------------------------+ 1792 | ... 
   |
   +-----------------------------------------------------------------+
   | --------------------------------------------------------------- |
   +-----------------------------------------------------------------+
   | last slice header                                               |
   +-----------------------------------------------------------------+
   | last slice content                                              |
   +-----------------------------------------------------------------+
   | last slice footer                                               |
   +-----------------------------------------------------------------+

                                Table 15

4.5.  Slice

   A "Slice" is an independent spatial sub-section of a Frame that is
   encoded separately from the other regions of the same Frame. The use
   of more than one "Slice" per Frame can be useful for taking advantage
   of the opportunities of multithreaded encoding and decoding.

   A "Slice" consists of a "Slice Header" (when relevant), a "Slice
   Content", and a "Slice Footer" (when relevant). The pseudo-code
   below describes the contents of a "Slice".

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Slice( ) {                                                    |
       if (version >= 3) {                                       |
           SliceHeader( )                                        |
       }                                                         |
       SliceContent( )                                           |
       if (coder_type == 0) {                                    |
           while (!byte_aligned()) {                             |
               padding                                           | u(1)
           }                                                     |
       }                                                         |
       if (version <= 1) {                                       |
           while (remaining_bits_in_bitstream( NumBytes ) != 0) {|
               reserved                                          | u(1)
           }                                                     |
       }                                                         |
       if (version >= 3) {                                       |
           SliceFooter( )                                        |
       }                                                         |
   }                                                             |

   "padding" specifies a bit without any significance, used only for
   byte alignment. It MUST be 0.

   "reserved" specifies a bit without any significance in this revision
   of the specification; it may have a significance in a later revision
   of this specification.

   Encoders SHOULD NOT fill "reserved".

   Decoders SHOULD ignore "reserved".

4.6.
Slice Header

   A "Slice Header" provides information about the decoding
   configuration of the "Slice", such as its spatial position, size, and
   aspect ratio. The pseudo-code below describes the contents of the
   "Slice Header".

   "Slice Header" has its own initial states, all set to 128.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   SliceHeader( ) {                                              |
       slice_x                                                   | ur
       slice_y                                                   | ur
       slice_width - 1                                           | ur
       slice_height - 1                                          | ur
       for (i = 0; i < quant_table_set_index_count; i++) {       |
           quant_table_set_index[ i ]                            | ur
       }                                                         |
       picture_structure                                         | ur
       sar_num                                                   | ur
       sar_den                                                   | ur
   }                                                             |

4.6.1.  slice_x

   "slice_x" indicates the x position on the slice raster formed by
   num_h_slices.

   Inferred to be 0 if not present.

4.6.2.  slice_y

   "slice_y" indicates the y position on the slice raster formed by
   num_v_slices.

   Inferred to be 0 if not present.

4.6.3.  slice_width

   "slice_width" indicates the width on the slice raster formed by
   num_h_slices.

   Inferred to be 1 if not present.

4.6.4.  slice_height

   "slice_height" indicates the height on the slice raster formed by
   num_v_slices.

   Inferred to be 1 if not present.

4.6.5.  quant_table_set_index_count

   "quant_table_set_index_count" is defined as:

   1 + ( ( chroma_planes || version <= 3 ) ? 1 : 0 )
     + ( extra_plane ? 1 : 0 )

4.6.6.  quant_table_set_index

   "quant_table_set_index" indicates the Quantization Table Set index to
   select the Quantization Table Set and the initial states for the
   "Slice Content".

   Inferred to be 0 if not present.

4.6.7.  picture_structure

   "picture_structure" specifies the temporal and spatial relationship
   of each Line of the Frame.

   Inferred to be 0 if not present.

   +=======+=========================+
   | value | picture structure used  |
   +=======+=========================+
   | 0     | unknown                 |
   +-------+-------------------------+
   | 1     | top field first         |
   +-------+-------------------------+
   | 2     | bottom field first      |
   +-------+-------------------------+
   | 3     | progressive             |
   +-------+-------------------------+
   | Other | reserved for future use |
   +-------+-------------------------+

                 Table 16

4.6.8.  sar_num

   "sar_num" specifies the Sample aspect ratio numerator.

   Inferred to be 0 if not present.

   A value of 0 means that the aspect ratio is unknown.

   Encoders MUST write 0 if the Sample aspect ratio is unknown.

   If "sar_den" is 0, decoders SHOULD ignore the encoded value and
   consider that "sar_num" is 0.

4.6.9.  sar_den

   "sar_den" specifies the Sample aspect ratio denominator.

   Inferred to be 0 if not present.

   A value of 0 means that the aspect ratio is unknown.

   Encoders MUST write 0 if the Sample aspect ratio is unknown.

   If "sar_num" is 0, decoders SHOULD ignore the encoded value and
   consider that "sar_den" is 0.

4.7.  Slice Content

   A "Slice Content" contains all Line elements that are part of the
   "Slice".

   Depending on the configuration, Line elements are ordered by Plane
   then by row (YCbCr) or by row then by Plane (RGB).

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   SliceContent( ) {                                             |
       if (colorspace_type == 0) {                               |
           for (p = 0; p < primary_color_count; p++) {           |
               for (y = 0; y < plane_pixel_height[ p ]; y++) {   |
                   Line( p, y )                                  |
               }                                                 |
           }                                                     |
       } else if (colorspace_type == 1) {                        |
           for (y = 0; y < slice_pixel_height; y++) {            |
               for (p = 0; p < primary_color_count; p++) {       |
                   Line( p, y )                                  |
               }                                                 |
           }                                                     |
       }                                                         |
   }                                                             |

4.7.1.  primary_color_count

   "primary_color_count" is defined as:

   1 + ( chroma_planes ?
   2 : 0 ) + ( extra_plane ? 1 : 0 )

4.7.2.  plane_pixel_height

   "plane_pixel_height[ p ]" is the height in Pixels of Plane p of the
   "Slice". It is defined as:

   chroma_planes == 1 && (p == 1 || p == 2)
       ? ceil(slice_pixel_height / (1 << log2_v_chroma_subsample))
       : slice_pixel_height

4.7.3.  slice_pixel_height

   "slice_pixel_height" is the height in Pixels of the slice. It is
   defined as:

   floor(
       ( slice_y + slice_height )
       * frame_pixel_height
       / num_v_slices
   ) - slice_pixel_y

4.7.4.  slice_pixel_y

   "slice_pixel_y" is the slice vertical position in Pixels. It is
   defined as:

   floor( slice_y * frame_pixel_height / num_v_slices )

4.8.  Line

   A Line is a list of the sample differences (relative to the
   predictor) of primary color components. The pseudo-code below
   describes the contents of the Line.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Line( p, y ) {                                                |
       if (colorspace_type == 0) {                               |
           for (x = 0; x < plane_pixel_width[ p ]; x++) {        |
               sample_difference[ p ][ y ][ x ]                  | sd
           }                                                     |
       } else if (colorspace_type == 1) {                        |
           for (x = 0; x < slice_pixel_width; x++) {             |
               sample_difference[ p ][ y ][ x ]                  | sd
           }                                                     |
       }                                                         |
   }                                                             |

4.8.1.  plane_pixel_width

   "plane_pixel_width[ p ]" is the width in Pixels of Plane p of the
   "Slice". It is defined as:

   chroma_planes == 1 && (p == 1 || p == 2)
       ? ceil( slice_pixel_width / (1 << log2_h_chroma_subsample) )
       : slice_pixel_width

4.8.2.  slice_pixel_width

   "slice_pixel_width" is the width in Pixels of the slice. It is
   defined as:

   floor(
       ( slice_x + slice_width )
       * frame_pixel_width
       / num_h_slices
   ) - slice_pixel_x

4.8.3.  slice_pixel_x

   "slice_pixel_x" is the slice horizontal position in Pixels. It is
   defined as:

   floor( slice_x * frame_pixel_width / num_h_slices )

4.8.4.
sample_difference

   "sample_difference[ p ][ y ][ x ]" is the sample difference for the
   Sample at Plane "p", y position "y", and x position "x". The Sample
   value is computed based on the median predictor and context
   described in Section 3.2.

4.9.  Slice Footer

   A "Slice Footer" provides information about slice size and
   (optionally) parity. The pseudo-code below describes the contents of
   the "Slice Footer".

   Note: "Slice Footer" is always byte aligned.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   SliceFooter( ) {                                              |
       slice_size                                                | u(24)
       if (ec) {                                                 |
           error_status                                          | u(8)
           slice_crc_parity                                      | u(32)
       }                                                         |
   }                                                             |

4.9.1.  slice_size

   "slice_size" indicates the size of the slice in bytes.

   Note: this allows finding the start of slices before previous slices
   have been fully decoded, and allows parallel decoding as well as
   error resilience.

4.9.2.  error_status

   "error_status" specifies the error status.

   +=======+=======================================+
   | value | error status                          |
   +=======+=======================================+
   | 0     | no error                              |
   +-------+---------------------------------------+
   | 1     | slice contains a correctable error    |
   +-------+---------------------------------------+
   | 2     | slice contains an uncorrectable error |
   +-------+---------------------------------------+
   | Other | reserved for future use               |
   +-------+---------------------------------------+

                        Table 17

4.9.3.  slice_crc_parity

   "slice_crc_parity" specifies 32 bits that are chosen so that the
   slice as a whole has a CRC remainder of 0.

   This is equivalent to storing the CRC remainder in the 32-bit parity.

   The CRC generator polynomial used is the standard IEEE CRC polynomial
   (0x104C11DB7), with initial value 0, without pre-inversion and
   without post-inversion.

5.
Restrictions

   To ensure that fast multithreaded decoding is possible, starting with
   version 3 and if "frame_pixel_width * frame_pixel_height" is more
   than 101376, "slice_width * slice_height" MUST be less than or equal
   to "num_h_slices * num_v_slices / 4". Note: 101376 is the frame size
   in Pixels of a 352x288 frame, also known as the CIF ("Common
   Intermediate Format") frame size.

   For each Frame, each position in the slice raster MUST be filled by
   one and only one slice of the Frame (no missing slice position, no
   slice overlapping).

   For each Frame with a "keyframe" value of 0, each slice MUST have the
   same values of "slice_x", "slice_y", "slice_width", and
   "slice_height" as a slice in the previous Frame.

6.  Security Considerations

   Like any other codec (such as [RFC6716]), FFV1 should not be used
   with insecure ciphers or cipher modes that are vulnerable to
   known-plaintext attacks. Some of the header bits as well as the
   padding are easily predictable.

   Implementations of the FFV1 codec need to take appropriate security
   considerations into account, as outlined in [RFC4732]. It is
   extremely important for the decoder to be robust against malicious
   payloads. Malicious payloads MUST NOT cause the decoder to overrun
   its allocated memory or to take an excessive amount of resources to
   decode. The same applies to the encoder, even though problems in
   encoders are typically rarer. Malicious video streams MUST NOT cause
   the encoder to misbehave because this would allow an attacker to
   attack transcoding gateways. A frequent security problem in image
   and video codecs is failure to check for integer overflows. An
   example is computing "frame_pixel_width * frame_pixel_height" in
   Pixel count computations without considering that the multiplication
   result may have overflowed the arithmetic type's range. The range
The range 2169 coder could, if implemented naively, read one byte over the end. The 2170 implementation MUST ensure that no read outside allocated and 2171 initialized memory occurs. 2173 None of the content carried in FFV1 is intended to be executable. 2175 The reference implementation [REFIMPL] contains no known buffer 2176 overflow or cases where a specially crafted packet or video segment 2177 could cause a significant increase in CPU load. 2179 The reference implementation [REFIMPL] was validated in the following 2180 conditions: 2182 * Sending the decoder valid packets generated by the reference 2183 encoder and verifying that the decoder's output matches the 2184 encoder's input. 2186 * Sending the decoder packets generated by the reference encoder and 2187 then subjected to random corruption. 2189 * Sending the decoder random packets that are not FFV1. 2191 In all of the conditions above, the decoder and encoder was run 2192 inside the [VALGRIND] memory debugger as well as clangs address 2193 sanitizer [Address-Sanitizer], which track reads and writes to 2194 invalid memory regions as well as the use of uninitialized memory. 2195 There were no errors reported on any of the tested conditions. 2197 7. IANA Considerations 2199 The IANA is requested to register the following values: 2201 7.1. Media Type Definition 2203 This registration is done using the template defined in [RFC6838] and 2204 following [RFC4855]. 2206 Type name: video 2208 Subtype name: FFV1 2210 Required parameters: None. 2212 Optional parameters: These parameters are used to signal the 2213 capabilities of a receiver implementation. These parameters MUST NOT 2214 be used for any other purpose. 2216 * "version": The "version" of the FFV1 encoding as defined by 2217 Section 4.2.1. 2219 * "micro_version": The "micro_version" of the FFV1 encoding as 2220 defined by Section 4.2.2. 2222 * "coder_type": The "coder_type" of the FFV1 encoding as defined by 2223 Section 4.2.3. 

   *  "colorspace_type": The "colorspace_type" of the FFV1 encoding as
      defined by Section 4.2.5.

   *  "bits_per_raw_sample": The "bits_per_raw_sample" of the FFV1
      encoding as defined by Section 4.2.7.

   *  "max_slices": The value of "max_slices" is an integer indicating
      the maximum count of slices within a frame of the FFV1 encoding.

   Encoding considerations: This media type is defined for encapsulation
   in several audiovisual container formats and contains binary data;
   see Section 4.3.3. This media type is framed binary data; see
   Section 4.8 of [RFC6838].

   Security considerations: See Section 6 of this document.

   Interoperability considerations: None.

   Published specification: RFC XXXX.

   [RFC Editor: Upon publication as an RFC, please replace "XXXX" with
   the number assigned to this document and remove this note.]

   Applications which use this media type: Any application that requires
   the transport of lossless video can use this media type. Examples
   include, but are not limited to, screen recording, scientific
   imaging, and digital video preservation.

   Fragment identifier considerations: N/A.

   Additional information: None.

   Person & email address to contact for further information: Michael
   Niedermayer (michael@niedermayer.cc)

   Intended usage: COMMON

   Restrictions on usage: None.

   Author: Dave Rice (dave@dericed.com)

   Change controller: IETF cellar working group delegated from the IESG.

8.  Changelog

   See https://github.com/FFmpeg/FFV1/commits/master

   [RFC Editor: Please remove this Changelog section prior to
   publication.]

9.  Normative References

   [ISO.15444-1.2016]
              International Organization for Standardization,
              "Information technology -- JPEG 2000 image coding system:
              Core coding system", October 2016.

   [ISO.9899.2018]
              International Organization for Standardization,
              "Programming languages - C", ISO Standard 9899, 2018.

   [Matroska] IETF, "Matroska", 2019.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4732]  Handley, M., Ed., Rescorla, E., Ed., and IAB, "Internet
              Denial-of-Service Considerations", RFC 4732,
              DOI 10.17487/RFC4732, December 2006.

   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
              Formats", RFC 4855, DOI 10.17487/RFC4855, February 2007.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
              Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
              September 2012.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13,
              RFC 6838, DOI 10.17487/RFC6838, January 2013.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

10.  Informative References

   [Address-Sanitizer]
              The Clang Team, "ASAN AddressSanitizer website", undated.

   [AVI]      Microsoft, "AVI RIFF File Reference", undated.

   [FFV1_V0]  Niedermayer, M., "Commit to mark FFV1 version 0 as
              non-experimental", April 2006.

   [FFV1_V1]  Niedermayer, M., "Commit to release FFV1 version 1", April
              2009.

   [FFV1_V3]  Niedermayer, M., "Commit to mark FFV1 version 3 as
              non-experimental", August 2013.

   [HuffYUV]  Rudiak-Gould, B., "HuffYUV", December 2003.

   [ISO.14495-1.1999]
              International Organization for Standardization,
              "Information technology -- Lossless and near-lossless
              compression of continuous-tone still images: Baseline",
              December 1999.

   [ISO.14496-10.2014]
              International Organization for Standardization,
              "Information technology -- Coding of audio-visual objects
              -- Part 10: Advanced Video Coding", September 2014.

   [ISO.14496-12.2015]
              International Organization for Standardization,
              "Information technology -- Coding of audio-visual objects
              -- Part 12: ISO base media file format", December 2015.

   [NUT]      Niedermayer, M., "NUT Open Container Format", December
              2013.

   [range-coding]
              Martin, G. N. N., "Range encoding: an algorithm for
              removing redundancy from a digitised message", Proceedings
              of the Conference on Video and Data Recording. Institution
              of Electronic and Radio Engineers, Hampshire, England,
              July 1979.

   [REFIMPL]  Niedermayer, M., "The reference FFV1 implementation / the
              FFV1 codec in FFmpeg", undated.

   [VALGRIND] Valgrind Developers, "Valgrind website", undated.

   [YCbCr]    Wikipedia, "YCbCr", undated.

Appendix A.  Multi-threaded decoder implementation suggestions

   This appendix is informative.

   The FFV1 bitstream is parsable in two ways: in sequential order as
   described in this document or with the pre-analysis of the footer of
   each slice. Each slice footer contains a "slice_size" field, so the
   boundary of each slice is computable without having to parse the
   slice content. That allows multi-threading as well as independence
   of slice content (a bitstream error in a slice header or slice
   content has no impact on the decoding of the other slices).

   After having checked the "keyframe" field, a decoder SHOULD parse
   the "slice_size" fields, from the "slice_size" of the last slice at
   the end of the "Frame" up to the "slice_size" of the first slice at
   the beginning of the "Frame", before parsing slices, in order to
   obtain the slice boundaries. A decoder MAY fall back on sequential
   order e.g.
   in case of a corrupted "Frame" (frame size unknown, "slice_size" of
   slices not coherent...) or if there is no possibility of seeking
   into the stream.

Appendix B.  Future handling of some streams created by nonconforming
             encoders

   This appendix is informative.

   Some bitstreams were found with 40 extra bits corresponding to
   "error_status" and "slice_crc_parity" in the "reserved" bits of
   "Slice()". Any revision of this specification SHOULD avoid adding
   40 bits of content after "SliceContent" if "version" == 0 or
   "version" == 1. Otherwise, a decoder conforming to the revised
   specification could not distinguish between a revised bitstream and
   such a buggy bitstream in the wild.

Authors' Addresses

   Michael Niedermayer

   Email: michael@niedermayer.cc

   Dave Rice

   Email: dave@dericed.com

   Jerome Martinez

   Email: jerome@mediaarea.net