cellar                                                    M. Niedermayer
Internet-Draft
Intended status: Informational                                   D. Rice
Expires: 22 February 2021
                                                             J. Martinez
                                                          21 August 2020


              FFV1 Video Coding Format Version 0, 1, and 3
                       draft-ietf-cellar-ffv1-17

Abstract

   This document defines FFV1, a lossless intra-frame video encoding
   format. FFV1 is designed to efficiently compress video data in a
   variety of pixel formats. Compared to uncompressed video, FFV1
   offers storage compression, frame fixity, and self-description, which
   makes FFV1 useful as a preservation or intermediate video format.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 22 February 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document. Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Notation and Conventions
     2.1.  Definitions
     2.2.  Conventions
       2.2.1.  Pseudo-code
       2.2.2.  Arithmetic Operators
       2.2.3.  Assignment Operators
       2.2.4.  Comparison Operators
       2.2.5.  Mathematical Functions
       2.2.6.  Order of Operation Precedence
       2.2.7.  Range
       2.2.8.  NumBytes
       2.2.9.  Bitstream Functions
   3.  Sample Coding
     3.1.  Border
     3.2.  Samples
     3.3.  Median Predictor
     3.4.  Quantization Table Sets
     3.5.  Context
     3.6.  Quantization Table Set Indexes
     3.7.  Color spaces
       3.7.1.  YCbCr
       3.7.2.  RGB
     3.8.  Coding of the Sample Difference
       3.8.1.  Range Coding Mode
       3.8.2.  Golomb Rice Mode
   4.  Bitstream
     4.1.  Quantization Table Set
       4.1.1.  quant_tables
       4.1.2.  context_count
     4.2.  Parameters
       4.2.1.  version
       4.2.2.  micro_version
       4.2.3.  coder_type
       4.2.4.  state_transition_delta
       4.2.5.  colorspace_type
       4.2.6.  chroma_planes
       4.2.7.  bits_per_raw_sample
       4.2.8.  log2_h_chroma_subsample
       4.2.9.  log2_v_chroma_subsample
       4.2.10. extra_plane
       4.2.11. num_h_slices
       4.2.12. num_v_slices
       4.2.13. quant_table_set_count
       4.2.14. states_coded
       4.2.15. initial_state_delta
       4.2.16. ec
       4.2.17. intra
     4.3.  Configuration Record
       4.3.1.  reserved_for_future_use
       4.3.2.  configuration_record_crc_parity
       4.3.3.  Mapping FFV1 into Containers
     4.4.  Frame
     4.5.  Slice
     4.6.  Slice Header
       4.6.1.  slice_x
       4.6.2.  slice_y
       4.6.3.  slice_width
       4.6.4.  slice_height
       4.6.5.  quant_table_set_index_count
       4.6.6.  quant_table_set_index
       4.6.7.  picture_structure
       4.6.8.  sar_num
       4.6.9.  sar_den
     4.7.  Slice Content
       4.7.1.  primary_color_count
       4.7.2.  plane_pixel_height
       4.7.3.  slice_pixel_height
       4.7.4.  slice_pixel_y
     4.8.  Line
       4.8.1.  plane_pixel_width
       4.8.2.  slice_pixel_width
       4.8.3.  slice_pixel_x
       4.8.4.  sample_difference
     4.9.  Slice Footer
       4.9.1.  slice_size
       4.9.2.  error_status
       4.9.3.  slice_crc_parity
   5.  Restrictions
   6.  Security Considerations
   7.  Media Type Definition
   8.  IANA Considerations
   9.  Changelog
   10. Normative References
   11. Informative References
   Appendix A.  Multi-threaded decoder implementation suggestions
   Appendix B.  Future handling of some streams created by non
           conforming encoders
   Authors' Addresses

1.  Introduction

   This document describes FFV1, a lossless video encoding format. The
   design of FFV1 considers the storage of image characteristics, data
   fixity, and the optimized use of encoding time and storage
   requirements. FFV1 is designed to support a wide range of lossless
   video applications such as long-term audiovisual preservation,
   scientific imaging, screen recording, and other video encoding
   scenarios that seek to avoid the generational loss of lossy video
   encodings.

   This document defines versions 0, 1, and 3 of FFV1. The distinctions
   of the versions are provided throughout the document, but in summary:

   *  Version 0 of FFV1 was the original implementation of FFV1 and has
      been in non-experimental use since April 14, 2006 [FFV1_V0].

   *  Version 1 of FFV1 adds support of more video bit depths and has
      been in use since April 24, 2009 [FFV1_V1].

   *  Version 2 of FFV1 only existed in experimental form and is not
      described by this document, but is available as a LyX file at
      https://github.com/FFmpeg/FFV1/blob/8ad772b6d61c3dd8b0171979a2cd9f11924d5532/ffv1.lyx.

   *  Version 3 of FFV1 adds several features such as increased
      description of the characteristics of the encoding images and
      embedded CRC data to support fixity verification of the encoding.
      Version 3 has been in non-experimental use since August 17, 2013
      [FFV1_V3].

   This document assumes familiarity with mathematical and coding
   concepts such as Range coding [range-coding] and YCbCr color spaces
   [YCbCr].

   This specification describes the valid bitstream and how to decode
   such a valid bitstream. Bitstreams that do not conform to this
   specification, and how they are handled, are outside its scope. A
   decoder could reject every invalid bitstream, attempt to perform
   error concealment, re-download or use a redundant copy of the
   invalid part, or take any other action it deems appropriate.

2.  Notation and Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.1.  Definitions

   "Container": Format that encapsulates "Frames" (see Section 4.4) and
   (when required) a "Configuration Record" into a bitstream.

   "Sample": The smallest addressable representation of a color
   component or a luma component in a "Frame". Examples of "Sample" are
   Luma (Y), Blue-difference Chroma (Cb), Red-difference Chroma (Cr),
   Transparency, Red, Green, and Blue.

   "Plane": A discrete component of a static image comprised of
   "Samples" that represent a specific quantification of "Samples" of
   that image.

   "Pixel": The smallest addressable representation of a color in a
   "Frame". It is composed of one or more "Samples".

   "ESC": An ESCape symbol to indicate that the symbol to be stored is
   too large for normal storage and that an alternate storage method is
   used.

   "MSB": Most Significant Bit, the bit that can cause the largest
   change in magnitude of the symbol.

   "VLC": Variable Length Code, a code that maps source symbols to a
   variable number of bits.

   "RGB": A reference to the method of storing the value of a "Pixel" by
   using three numeric values that represent Red, Green, and Blue.

   "YCbCr": A reference to the method of storing the value of a "Pixel"
   by using three numeric values that represent the luma of the "Pixel"
   (Y) and the chroma of the "Pixel" (Cb and Cr). The term YCbCr is
   used for historical reasons and currently references any color space
   relying on 1 luma "Sample" and 2 chroma "Samples", e.g., YCbCr, YCgCo,
   or ICtCp. The exact meaning of the three numeric values is
   unspecified.

2.2.  Conventions

2.2.1.  Pseudo-code

   The FFV1 bitstream is described in this document using pseudo-code.
   Note that the pseudo-code is used for clarity in order to illustrate
   the structure of FFV1 and is not intended to specify any particular
   implementation.
   The pseudo-code used is based upon the C programming
   language [ISO.9899.1990] and uses its "if/else", "while" and "for"
   keywords as well as functions defined within this document.

   In some instances, pseudo-code is presented in a two-column format
   such as shown in Figure 1. In this form the "type" column provides a
   symbol as defined in Table 4 that defines the storage of the data
   referenced in that same line of pseudo-code.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   ExamplePseudoCode( ) {                                        |
       value                                                     | ur
   }                                                             |

      Figure 1: A depiction of type-labelled pseudo-code used within
                             this document.

2.2.2.  Arithmetic Operators

   Note: the operators and the order of precedence are the same as used
   in the C programming language [ISO.9899.2018], with the exception of
   ">>" (removal of implementation defined behavior) and "^" (power
   instead of XOR) operators which are re-defined within this section.

   "a + b" means a plus b.

   "a - b" means a minus b.

   "-a" means negation of a.

   "a * b" means a multiplied by b.

   "a / b" means a divided by b.

   "a ^ b" means a raised to the b-th power.

   "a & b" means bit-wise "and" of a and b.

   "a | b" means bit-wise "or" of a and b.

   "a >> b" means arithmetic right shift of two's complement integer
   representation of a by b binary digits. This is equivalent to
   dividing a by 2, b times, with rounding toward negative infinity.

   "a << b" means arithmetic left shift of two's complement integer
   representation of a by b binary digits.

2.2.3.  Assignment Operators

   "a = b" means a is assigned b.

   "a++" is equivalent to a is assigned a + 1.

   "a--" is equivalent to a is assigned a - 1.

   "a += b" is equivalent to a is assigned a + b.

   "a -= b" is equivalent to a is assigned a - b.

   "a *= b" is equivalent to a is assigned a * b.

2.2.4.  Comparison Operators

   "a > b" means a is greater than b.

   "a >= b" means a is greater than or equal to b.

   "a < b" means a is less than b.

   "a <= b" means a is less than or equal to b.

   "a == b" means a is equal to b.

   "a != b" means a is not equal to b.

   "a && b" means Boolean logical "and" of a and b.

   "a || b" means Boolean logical "or" of a and b.

   "!a" means Boolean logical "not" of a.

   "a ? b : c" if a is true, then b, otherwise c.

2.2.5.  Mathematical Functions

   "floor(a)" means the largest integer less than or equal to a.

   "ceil(a)" means the smallest integer greater than or equal to a.

   "sign(a)" extracts the sign of a number, i.e., if a < 0 then -1, else
   if a > 0 then 1, else 0.

   "abs(a)" means the absolute value of a, i.e., "abs(a)" = "sign(a) *
   a".

   "log2(a)" means the base-two logarithm of a.

   "min(a,b)" means the smallest of two values a and b.

   "max(a,b)" means the largest of two values a and b.

   "median(a,b,c)" means the numerical middle value in a data set of a,
   b, and c, i.e., a+b+c-min(a,b,c)-max(a,b,c).

   "A <== B" means B implies A.

   "A <==> B" means A <== B , B <== A.

   a_(b) means the b-th value of a sequence of a

   a_(b,c) means the 'b,c'-th value of a sequence of a

2.2.6.  Order of Operation Precedence

   When order of precedence is not indicated explicitly by use of
   parentheses, operations are evaluated in the following order (from
   top to bottom, operations of same precedence being evaluated from
   left to right). This order of operations is based on the order of
   operations used in Standard C.

   a++, a--
   !a, -a
   a ^ b
   a * b, a / b, a % b
   a + b, a - b
   a << b, a >> b
   a < b, a <= b, a > b, a >= b
   a == b, a != b
   a & b
   a | b
   a && b
   a || b
   a ? b : c
   a = b, a += b, a -= b, a *= b

2.2.7.  Range

   "a...b" means any value starting from a to b, inclusive.

2.2.8.  NumBytes

   "NumBytes" is a non-negative integer that expresses the size in 8-bit
   octets of a particular FFV1 "Configuration Record" or "Frame". FFV1
   relies on its "Container" to store the "NumBytes" values; see
   Section 4.3.3.

2.2.9.  Bitstream Functions

2.2.9.1.  remaining_bits_in_bitstream

   "remaining_bits_in_bitstream( )" means the count of remaining bits
   after the pointer in that "Configuration Record" or "Frame". It is
   computed from the "NumBytes" value multiplied by 8 minus the count of
   bits of that "Configuration Record" or "Frame" already read by the
   bitstream parser.

2.2.9.2.  remaining_symbols_in_syntax

   "remaining_symbols_in_syntax( )" is true as long as the RangeCoder
   has not consumed all the given input bytes.

2.2.9.3.  byte_aligned

   "byte_aligned( )" is true if "remaining_bits_in_bitstream( NumBytes
   )" is a multiple of 8, otherwise false.

2.2.9.4.  get_bits

   "get_bits( i )" is the action to read the next "i" bits in the
   bitstream, from most significant bit to least significant bit, and to
   return the corresponding value. The pointer is increased by "i".

3.  Sample Coding

   For each "Slice" (as described in Section 4.5) of a "Frame", the
   "Planes", "Lines", and "Samples" are coded in an order determined by
   the "Color Space" (see Section 3.7). Each "Sample" is predicted by
   the median predictor as described in Section 3.3 from other "Samples"
   within the same "Plane" and the difference is stored using the method
   described in Section 3.8.

3.1.  Border

   A border is assumed for each coded "Slice" for the purpose of the
   median predictor and context according to the following rules:

   *  one column of "Samples" to the left of the coded slice is assumed
      as identical to the "Samples" of the leftmost column of the coded
      slice shifted down by one row.
      The value of the topmost "Sample" of the column of "Samples" to
      the left of the coded slice is assumed to be "0"

   *  one column of "Samples" to the right of the coded slice is assumed
      as identical to the "Samples" of the rightmost column of the coded
      slice

   *  an additional column of "Samples" to the left of the coded slice
      and two rows of "Samples" above the coded slice are assumed to be
      "0"

   Figure 2 depicts a slice of 9 "Samples" "a,b,c,d,e,f,g,h,i" in a 3x3
   arrangement along with its assumed border.

   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | 0 | 0 | 0 |   | 0 |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | 0 | 0 | 0 |   | 0 |
   +---+---+---+---+---+---+---+---+
   |   |   |   |   |   |   |   |   |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | a | b | c |   | c |
   +---+---+---+---+---+---+---+---+
   | 0 | a |   | d | e | f |   | f |
   +---+---+---+---+---+---+---+---+
   | 0 | d |   | g | h | i |   | i |
   +---+---+---+---+---+---+---+---+

      Figure 2: A depiction of FFV1's assumed border for an example
                             set of Samples.

3.2.  Samples

   Relative to any "Sample" "X", six other relatively positioned
   "Samples" from the coded "Samples" and presumed border are identified
   according to the labels used in Figure 3. The labels for these
   relatively positioned "Samples" are used within the median predictor
   and context.

   +---+---+---+---+
   |   |   | T |   |
   +---+---+---+---+
   |   |tl | t |tr |
   +---+---+---+---+
   | L | l | X |   |
   +---+---+---+---+

      Figure 3: A depiction of how relatively positioned Samples are
                     referenced within this document.

   The labels for these relative "Samples" are made of the first letters
   of the words Top, Left and Right.

3.3.  Median Predictor

   The prediction for any "Sample" value at position "X" may be computed
   based upon the relative neighboring values of "l", "t", and "tl" via
   this equation:

   median(l, t, l + t - tl)

   Note, this prediction template is also used in [ISO.14495-1.1999] and
   [HuffYUV].

   Exception for the median predictor: if "colorspace_type == 0 &&
   bits_per_raw_sample == 16 && ( coder_type == 1 || coder_type == 2 )",
   the following median predictor MUST be used:

   median(left16s, top16s, left16s + top16s - diag16s)

   where:

   left16s = l  >= 32768 ? ( l  - 65536 ) : l
   top16s  = t  >= 32768 ? ( t  - 65536 ) : t
   diag16s = tl >= 32768 ? ( tl - 65536 ) : tl

   Background: a two's complement 16-bit signed integer was used for
   storing "Sample" values in all known implementations of FFV1
   bitstream. So in some circumstances, the most significant bit was
   wrongly interpreted (used as a sign bit instead of the 16th bit of an
   unsigned integer). Note that when the issue was discovered, the only
   impacted configuration in all known implementations was 16-bit YCbCr
   with no Pixel transformation and with the Range Coder, as the other
   potentially impacted configurations (e.g., 15/16-bit JPEG2000-RCT
   with the Range Coder, or 16-bit content with the Golomb Rice coder)
   were not implemented anywhere [ISO.15444-1.2016]. Meanwhile, 16-bit
   JPEG2000-RCT with the Range Coder was implemented without this issue
   in one implementation and validated by one conformance checker. This
   exception for the median predictor is expected (to be confirmed) to
   be removed in the next version of the FFV1 bitstream.

3.4.  Quantization Table Sets

   The FFV1 bitstream contains one or more Quantization Table Sets.
   Each Quantization Table Set contains exactly 5 Quantization Tables
   with each Quantization Table corresponding to one of the five
   Quantized Sample Differences.
   For each Quantization Table, both the number of quantization steps
   and their distribution are stored in the FFV1 bitstream; each
   Quantization Table has exactly 256 entries, and the 8 least
   significant bits of the Quantized Sample Difference are used as
   index:

   Q_(j)[k] = quant_tables[i][j][k&255]

                                Figure 4

   In this formula, "i" is the Quantization Table Set index, "j" is the
   Quantization Table index, and "k" is the Quantized Sample Difference.

3.5.  Context

   Relative to any "Sample" "X", the Quantized Sample Differences "L-l",
   "l-tl", "tl-t", "T-t", and "t-tr" are used as context:

   context = Q_(0)[l - tl] +
             Q_(1)[tl - t] +
             Q_(2)[t - tr] +
             Q_(3)[L - l]  +
             Q_(4)[T - t]

                                Figure 5

   If "context >= 0" then "context" is used and the difference between
   the "Sample" and its predicted value is encoded as is, else
   "-context" is used and the difference between the "Sample" and its
   predicted value is encoded with a flipped sign.

3.6.  Quantization Table Set Indexes

   For each "Plane" of each slice, a Quantization Table Set is selected
   from an index:

   *  For Y "Plane", "quant_table_set_index[ 0 ]" index is used

   *  For Cb and Cr "Planes", "quant_table_set_index[ 1 ]" index is used

   *  For extra "Plane", "quant_table_set_index[ (version <= 3 ||
      chroma_planes) ? 2 : 1 ]" index is used

   Background: in the first implementations of FFV1 bitstream, the index
   for Cb and Cr "Planes" was stored even if it was not used
   (chroma_planes set to 0). This index is kept for "version" <= 3 in
   order to keep compatibility with FFV1 bitstreams in the wild.

3.7.  Color spaces

   FFV1 supports several color spaces. The count of allowed coded
   planes and the meaning of the extra "Plane" are determined by the
   selected color space.

   The FFV1 bitstream interleaves data in an order determined by the
   color space.
   In YCbCr for each "Plane", each "Line" is coded from
   top to bottom and for each "Line", each "Sample" is coded from left
   to right. In JPEG2000-RCT for each "Line" from top to bottom, each
   "Plane" is coded and for each "Plane", each "Sample" is encoded from
   left to right.

3.7.1.  YCbCr

   This color space allows 1 to 4 "Planes".

   The Cb and Cr "Planes" are optional, but if used then MUST be used
   together. Omitting the Cb and Cr "Planes" codes the frames in
   grayscale without color data.

   An optional transparency "Plane" can be used to code transparency
   data.

   An FFV1 "Frame" using YCbCr MUST use one of the following
   arrangements:

   *  Y

   *  Y, Transparency

   *  Y, Cb, Cr

   *  Y, Cb, Cr, Transparency

   The Y "Plane" MUST be coded first. If the Cb and Cr "Planes" are
   used then they MUST be coded after the Y "Plane". If a transparency
   "Plane" is used, then it MUST be coded last.

3.7.2.  RGB

   This color space allows 3 or 4 "Planes".

   An optional transparency "Plane" can be used to code transparency
   data.

   JPEG2000-RCT is a Reversible Color Transform that codes RGB (red,
   green, blue) "Planes" losslessly in a modified YCbCr color space
   [ISO.15444-1.2016]. Reversible Pixel transformations between YCbCr
   and RGB use the following formulae.

   Cb = b - g
   Cr = r - g
   Y  = g + ((Cb + Cr) >> 2)
   g  = Y - ((Cb + Cr) >> 2)
   r  = Cr + g
   b  = Cb + g

                                Figure 6

   Exception for the JPEG2000-RCT conversion: if "bits_per_raw_sample"
   is between 9 and 15 inclusive and "extra_plane" is 0, the following
   formulae for reversible conversions between YCbCr and RGB MUST be
   used instead of the ones above:

   Cb = g - b
   Cr = r - b
   Y  = b + ((Cb + Cr) >> 2)
   b  = Y - ((Cb + Cr) >> 2)
   r  = Cr + b
   g  = Cb + b

                                Figure 7

   Background: At the time of this writing, in all known implementations
   of FFV1 bitstream, when "bits_per_raw_sample" was between 9 and 15
   inclusive and "extra_plane" was 0, GBR "Planes" were used as BGR
   "Planes" during both encoding and decoding. Meanwhile, 16-bit
   JPEG2000-RCT was implemented without this issue in one implementation
   and validated by one conformance checker. Methods to address this
   exception for the transform are under consideration for the next
   version of the FFV1 bitstream.

   Cb and Cr are positively offset by "1 << bits_per_raw_sample" after
   the conversion from RGB to the modified YCbCr and are negatively
   offset by the same value before the conversion from the modified
   YCbCr to RGB, in order to have only non-negative values after the
   conversion.

   When FFV1 uses the JPEG2000-RCT, the horizontal "Lines" are
   interleaved to improve caching efficiency since it is most likely
   that the JPEG2000-RCT will immediately be converted to RGB during
   decoding. The interleaved coding order is also Y, then Cb, then Cr,
   and then, if used, transparency.
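
   As an informal illustration only, and not part of the format
   definition, the two reversible transforms and the Cb/Cr offset
   described above could be sketched in C as follows. The names "asr",
   "Rgb", "Ycc", "rct_forward", and "rct_inverse" are hypothetical and
   exist only in this example. The sketch assumes that the right shift
   applies to (Cb + Cr) alone, which is the reading under which the
   transform is reversible, and it spells out the rounding of ">>"
   because C leaves right shifts of negative values implementation
   defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ">>" as Section 2.2.2 defines it, i.e. division
 * by 2^b with rounding toward negative infinity. */
int32_t asr(int32_t a, int b)
{
    return a >= 0 ? a >> b : -((-a + (1 << b) - 1) >> b);
}

typedef struct { int32_t r, g, b; } Rgb;    /* illustrative types only */
typedef struct { int32_t y, cb, cr; } Ycc;

/* The exception applies for 9..15 bits without an extra plane; in that
 * case the roles of g and b are swapped (second set of formulae). */
static int uses_exception(int bits_per_raw_sample, int extra_plane)
{
    return bits_per_raw_sample >= 9 && bits_per_raw_sample <= 15 &&
           !extra_plane;
}

/* RGB -> modified YCbCr, then Cb and Cr are offset to be non-negative. */
Ycc rct_forward(Rgb p, int bits_per_raw_sample, int extra_plane)
{
    assert(bits_per_raw_sample >= 1 && bits_per_raw_sample <= 16);
    int swap = uses_exception(bits_per_raw_sample, extra_plane);
    int32_t base   = swap ? p.b : p.g;   /* sample subtracted from others */
    int32_t other  = swap ? p.g : p.b;
    int32_t cb     = other - base;
    int32_t cr     = p.r - base;
    int32_t offset = 1 << bits_per_raw_sample;
    Ycc out = { base + asr(cb + cr, 2), cb + offset, cr + offset };
    return out;
}

/* Modified YCbCr -> RGB; the offset is removed before the conversion. */
Rgb rct_inverse(Ycc c, int bits_per_raw_sample, int extra_plane)
{
    assert(bits_per_raw_sample >= 1 && bits_per_raw_sample <= 16);
    int swap = uses_exception(bits_per_raw_sample, extra_plane);
    int32_t offset = 1 << bits_per_raw_sample;
    int32_t cb     = c.cb - offset;
    int32_t cr     = c.cr - offset;
    int32_t base   = c.y - asr(cb + cr, 2);
    int32_t other  = cb + base;
    Rgb out = { cr + base, swap ? other : base, swap ? base : other };
    return out;
}
```

   Calling "rct_inverse" on the result of "rct_forward" with the same
   "bits_per_raw_sample" and "extra_plane" values returns the original
   red, green, and blue values, because the same floor((Cb + Cr) / 4)
   term is added in one direction and subtracted in the other.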

   As an example, a "Frame" that is two "Pixels" wide and two "Pixels"
   high could comprise the following structure:

   +------------------------+------------------------+
   | Pixel(1,1)             | Pixel(2,1)             |
   | Y(1,1) Cb(1,1) Cr(1,1) | Y(2,1) Cb(2,1) Cr(2,1) |
   +------------------------+------------------------+
   | Pixel(1,2)             | Pixel(2,2)             |
   | Y(1,2) Cb(1,2) Cr(1,2) | Y(2,2) Cb(2,2) Cr(2,2) |
   +------------------------+------------------------+

   In JPEG2000-RCT, the coding order would be left to right and then top
   to bottom, with values interleaved by "Lines" and stored in this
   order:

   Y(1,1) Y(2,1) Cb(1,1) Cb(2,1) Cr(1,1) Cr(2,1) Y(1,2) Y(2,2) Cb(1,2)
   Cb(2,2) Cr(1,2) Cr(2,2)

3.8.  Coding of the Sample Difference

   Instead of coding the n+1 bits of the Sample Difference with Huffman
   or Range coding (or n+2 bits, in the case of JPEG2000-RCT), only the
   n (or n+1, in the case of JPEG2000-RCT) least significant bits are
   used, since this is sufficient to recover the original "Sample". In
   the equation below, the term "bits" represents "bits_per_raw_sample +
   1" for JPEG2000-RCT or "bits_per_raw_sample" otherwise:

   coder_input = [(sample_difference + 2 ^ (bits - 1)) &
                 (2 ^ bits - 1)] - 2 ^ (bits - 1)

      Figure 8: Description of the coding of the Sample Difference in
                              the bitstream.

3.8.1.  Range Coding Mode

   Early experimental versions of FFV1 used the CABAC Arithmetic coder
   from H.264 as defined in [ISO.14496-10.2014] but due to the uncertain
   patent/royalty situation, as well as its slightly worse performance,
   CABAC was replaced by a Range coder based on an algorithm defined by
   G. Nigel N. Martin in 1979 [range-coding].

3.8.1.1.  Range Binary Values

   To encode binary digits efficiently a Range coder is used. C_(i) is
   the i-th Context. B_(i) is the i-th byte of the bytestream. b_(i) is
   the i-th Range coded binary value. S_(0, i) is the i-th initial
   state.
   The length of the bytestream encoding n binary symbols is
   j_(n) bytes.

   r_(i) = floor( ( R_(i) * S_(i, C_(i)) ) / 2 ^ 8 )

                                Figure 9

   S_(i + 1, C_(i)) = zero_state_(S_(i, C_(i))) AND
              l_(i) = L_(i) AND
              t_(i) = R_(i) - r_(i) <==
              b_(i) = 0 <==>
              L_(i) < R_(i) - r_(i)

   S_(i + 1, C_(i)) = one_state_(S_(i, C_(i))) AND
              l_(i) = L_(i) - R_(i) + r_(i) AND
              t_(i) = r_(i) <==
              b_(i) = 1 <==>
              L_(i) >= R_(i) - r_(i)

                               Figure 10

   S_(i + 1, k) = S_(i, k) <== C_(i) != k

                               Figure 11

   R_(i + 1) = 2 ^ 8 * t_(i) AND
   L_(i + 1) = 2 ^ 8 * l_(i) + B_(j_(i)) AND
   j_(i + 1) = j_(i) + 1 <==
   t_(i) < 2 ^ 8

   R_(i + 1) = t_(i) AND
   L_(i + 1) = l_(i) AND
   j_(i + 1) = j_(i) <==
   t_(i) >= 2 ^ 8

                               Figure 12

   R_(0) = 65280

                               Figure 13

   L_(0) = 2 ^ 8 * B_(0) + B_(1)

                               Figure 14

   j_(0) = 2

                               Figure 15

3.8.1.1.1.  Termination

   The range coder can be used in three modes.

   *  In "Open mode" when decoding, every symbol the reader attempts to
      read is available. In this mode arbitrary data can have been
      appended without affecting the range coder output. This mode is
      not used in FFV1.

   *  In "Closed mode" the length in bytes of the bytestream is provided
      to the range decoder. Bytes beyond the length are read as 0 by
      the range decoder. This is generally one byte shorter than the
      open mode.

   *  In "Sentinel mode" the exact length in bytes is not known and thus
      the range decoder MAY read into the data that follows the range
      coded bytestream by one byte. In "Sentinel mode", the end of the
      range coded bytestream is a binary symbol with state 129, whose
      value SHALL be discarded. After reading this symbol, the range
      decoder will have read one byte beyond the end of the range coded
      bytestream. This way the byte position of the end can be
      determined.
      Bytestreams written in "Sentinel mode" can be read in "Closed
      mode" if the length can be determined; in this case, the last
      (sentinel) symbol will be read uncorrupted and be of value 0.

   The above describes the range decoding.  Encoding is defined as any
   process which produces a decodable bytestream.

   There are three places where range coder termination is needed in
   FFV1.  The first is in the "Configuration Record"; in this case, the
   size of the range coded bytestream is known and handled as "Closed
   mode".  The second is the switch from the "Slice Header", which is
   range coded, to Golomb coded slices, handled as "Sentinel mode".  The
   third is the end of range coded Slices, which need to terminate
   before the CRC at their end.  This can be handled as "Sentinel mode"
   or as "Closed mode" if the CRC position has been determined.

3.8.1.2.  Range Non Binary Values

   To encode scalar integers, it would be possible to encode each bit
   separately and use the past bits as context.  However, that would
   mean 255 contexts per 8-bit symbol, which is not only a waste of
   memory but also requires more past data to reach a reasonably good
   estimate of the probabilities.  Alternatively, assuming a Laplacian
   distribution and only dealing with its variance and mean (as in
   Huffman coding) would also be possible.  However, for maximum
   flexibility and simplicity, the chosen method uses a single symbol to
   encode if a number is 0 and, if not, encodes the number using its
   exponent, mantissa, and sign.  The exact contexts used are best
   described by Figure 16.
   int get_symbol(RangeCoder *c, uint8_t *state, int is_signed) {
       if (get_rac(c, state + 0)) {
           return 0;
       }

       int e = 0;
       while (get_rac(c, state + 1 + min(e, 9))) { //1..10
           e++;
       }

       int a = 1;
       for (int i = e - 1; i >= 0; i--) {
           a = a * 2 + get_rac(c, state + 22 + min(i, 9));  // 22..31
       }

       if (!is_signed) {
           return a;
       }

       if (get_rac(c, state + 11 + min(e, 10))) { //11..21
           return -a;
       } else {
           return a;
       }
   }

      Figure 16: A pseudo-code description of the contexts of Range Non
                               Binary Values.

   "get_symbol" is used for the read out of "sample_difference"
   indicated in Figure 8.

   "get_rac" returns a boolean, computed from the bytestream as
   described in Section 3.8.1.1.

3.8.1.3.  Initial Values for the Context Model

   At keyframes, all Range coder state variables are set to their
   initial state.

3.8.1.4.  State Transition Table

   one_state_(i) =
          default_state_transition_(i) + state_transition_delta_(i)

                               Figure 17

   zero_state_(i) = 256 - one_state_(256-i)

                               Figure 18

3.8.1.5.  default_state_transition

     0,  0,  0,  0,  0,  0,  0,  0, 20, 21, 22, 23, 24, 25, 26, 27,

    28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 37, 38, 39, 40, 41, 42,

    43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 56, 57,

    58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,

    74, 75, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,

    89, 90, 91, 92, 93, 94, 94, 95, 96, 97, 98, 99,100,101,102,103,

   104,105,106,107,108,109,110,111,112,113,114,114,115,116,117,118,

   119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,133,

   134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,

   150,151,152,152,153,154,155,156,157,158,159,160,161,162,163,164,

   165,166,167,168,169,170,171,171,172,173,174,175,176,177,178,179,

   180,181,182,183,184,185,186,187,188,189,190,190,191,192,194,194,

   195,196,197,198,199,200,201,202,202,204,205,206,207,208,209,209,

   210,211,212,213,215,215,216,217,218,219,220,220,222,223,224,225,

   226,227,227,229,229,230,231,232,234,234,235,236,237,238,239,240,

   241,242,243,244,245,246,247,248,248,  0,  0,  0,  0,  0,  0,  0,

3.8.1.6.  Alternative State Transition Table

   The alternative state transition table has been built using iterative
   minimization of frame sizes and generally performs better than the
   default.  To use it, the "coder_type" (see Section 4.2.3) MUST be set
   to 2 and the difference to the default MUST be stored in the
   "Parameters" (see Section 4.2).  The reference implementation of FFV1
   in FFmpeg uses Figure 19 by default at the time of this writing when
   Range coding is used.
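
   As an illustration only, the relationship between a custom table, the
   stored "state_transition_delta" values (Section 4.2.4), and the
   "one_state" and "zero_state" tables of Figures 17 and 18 could be
   sketched in C as follows.  The function names, the "FFV1_STATES"
   constant, and the use of "int" arrays are assumptions made for this
   sketch; they are not taken from the reference implementation.

```c
#include <assert.h>
#include <stddef.h>

#define FFV1_STATES 256  /* hypothetical name for the table size */

/* Encoder side (sketch): derive the values that "Parameters" would carry
 * as state_transition_delta[ i ] for i in 1..255. */
static void compute_state_transition_delta(const int *default_tab,
                                           const int *custom_tab,
                                           int *delta)
{
    assert(default_tab != NULL && custom_tab != NULL && delta != NULL);
    delta[0] = 0;  /* index 0 is not transmitted */
    for (int i = 1; i < FFV1_STATES; i++)
        delta[i] = custom_tab[i] - default_tab[i];
}

/* Decoder side (sketch): rebuild both tables.
 * Figure 17: one_state(i)  = default_state_transition(i) + delta(i)
 * Figure 18: zero_state(i) = 256 - one_state(256 - i) */
static void build_state_tables(const int *default_tab, const int *delta,
                               int *one_state, int *zero_state)
{
    assert(default_tab != NULL && delta != NULL);
    assert(one_state != NULL && zero_state != NULL);
    one_state[0] = 0;
    zero_state[0] = 0;
    for (int i = 1; i < FFV1_STATES; i++)
        one_state[i] = default_tab[i] + delta[i];
    for (int i = 1; i < FFV1_STATES; i++)
        zero_state[i] = 256 - one_state[256 - i];
}
```

   For index 128, the default table holds 134 and Figure 19 holds 165,
   so under this sketch the stored delta for that index would be 31.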

     0, 10, 10, 10, 10, 16, 16, 16, 28, 16, 16, 29, 42, 49, 20, 49,

    59, 25, 26, 26, 27, 31, 33, 33, 33, 34, 34, 37, 67, 38, 39, 39,

    40, 40, 41, 79, 43, 44, 45, 45, 48, 48, 64, 50, 51, 52, 88, 52,

    53, 74, 55, 57, 58, 58, 74, 60,101, 61, 62, 84, 66, 66, 68, 69,

    87, 82, 71, 97, 73, 73, 82, 75,111, 77, 94, 78, 87, 81, 83, 97,

    85, 83, 94, 86, 99, 89, 90, 99,111, 92, 93,134, 95, 98,105, 98,

   105,110,102,108,102,118,103,106,106,113,109,112,114,112,116,125,

   115,116,117,117,126,119,125,121,121,123,145,124,126,131,127,129,

   165,130,132,138,133,135,145,136,137,139,146,141,143,142,144,148,

   147,155,151,149,151,150,152,157,153,154,156,168,158,162,161,160,

   172,163,169,164,166,184,167,170,177,174,171,173,182,176,180,178,

   175,189,179,181,186,183,192,185,200,187,191,188,190,197,193,196,

   197,194,195,196,198,202,199,201,210,203,207,204,205,206,208,214,

   209,211,221,212,213,215,224,216,217,218,219,220,222,228,223,225,

   226,224,227,229,240,230,231,232,233,234,235,236,238,239,237,242,

   241,243,242,244,245,246,247,248,249,250,251,252,252,253,254,255,

      Figure 19: Alternative state transition table for Range coding.

3.8.2.  Golomb Rice Mode

   The end of the bitstream of the "Frame" is padded with 0-bits until
   the bitstream contains a multiple of 8 bits.

3.8.2.1.  Signed Golomb Rice Codes

   This coding mode uses Golomb Rice codes.  The VLC is split into two
   parts.  The prefix stores the most significant bits, and the suffix
   stores the k least significant bits or stores the whole number in the
   ESC case.

   int get_ur_golomb(k) {
       for (prefix = 0; prefix < 12; prefix++) {
           if (get_bits(1)) {
               return get_bits(k) + (prefix << k);
           }
       }
       return get_bits(bits) + 11;
   }

      Figure 20: A pseudo-code description of the read of an unsigned
                      integer in Golomb Rice mode.
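
   The unsigned read in Figure 20 can be made concrete with a small C
   sketch.  The "BitReader" structure and its MSB-first "get_bits"
   helper are hypothetical and exist only for this illustration, and
   passing "bits" (the width of the ESC suffix) as a parameter is an
   assumption of the sketch rather than something this document
   prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; not defined by this document. */
typedef struct {
    const uint8_t *data;  /* input bytes */
    size_t size;          /* number of input bytes */
    size_t pos;           /* next bit, counted from the MSB of data[0] */
} BitReader;

/* Read n bits (0 <= n <= 32), MSB first. Bits past the end read as 0. */
static uint32_t get_bits(BitReader *br, int n)
{
    uint32_t v = 0;
    assert(br != NULL && n >= 0 && n <= 32);
    for (int i = 0; i < n; i++) {
        size_t byte = br->pos >> 3;
        uint32_t bit = 0;
        if (byte < br->size)
            bit = (uint32_t)(br->data[byte] >> (7 - (br->pos & 7))) & 1u;
        v = (v << 1) | bit;
        br->pos++;
    }
    return v;
}

/* Unsigned Golomb Rice read following Figure 20.
 * k is the suffix length; bits is the width of the ESC suffix. */
static uint32_t get_ur_golomb(BitReader *br, int k, int bits)
{
    assert(k >= 0 && k < 32);
    for (uint32_t prefix = 0; prefix < 12; prefix++) {
        if (get_bits(br, 1))
            return get_bits(br, k) + (prefix << k);
    }
    /* ESC: twelve 0-bits were read, the whole value follows minus 11. */
    return get_bits(br, bits) + 11;
}
```

   With the bit string "01 01" and k = 2, the sketch reads a prefix of 1
   and a suffix of 1 and returns 5; with twelve 0-bits followed by
   "10000000" and an 8-bit ESC suffix, it returns 139.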

   int get_sr_golomb(k) {
       v = get_ur_golomb(k);
       if (v & 1) return - (v >> 1) - 1;
       else       return   (v >> 1);
   }

       Figure 21: A pseudo-code description of the read of a signed
                      integer in Golomb Rice mode.

3.8.2.1.1.  Prefix

   +================+=======+
   | bits           | value |
   +================+=======+
   | 1              | 0     |
   +----------------+-------+
   | 01             | 1     |
   +----------------+-------+
   | ...            | ...   |
   +----------------+-------+
   | 0000 0000 01   | 9     |
   +----------------+-------+
   | 0000 0000 001  | 10    |
   +----------------+-------+
   | 0000 0000 0001 | 11    |
   +----------------+-------+
   | 0000 0000 0000 | ESC   |
   +----------------+-------+

                Table 1

3.8.2.1.2.  Suffix

   +=========+========================================+
   +=========+========================================+
   | non ESC | the k least significant bits MSB first |
   +---------+----------------------------------------+
   | ESC     | the value - 11, in MSB first order     |
   +---------+----------------------------------------+

                         Table 2

   "ESC" MUST NOT be used if the value can be coded as "non ESC".

3.8.2.1.3.  Examples

   +=====+=======================+=======+
   | k   | bits                  | value |
   +=====+=======================+=======+
   | 0   | 1                     | 0     |
   +-----+-----------------------+-------+
   | 0   | 001                   | 2     |
   +-----+-----------------------+-------+
   | 2   | 1 00                  | 0     |
   +-----+-----------------------+-------+
   | 2   | 1 10                  | 2     |
   +-----+-----------------------+-------+
   | 2   | 01 01                 | 5     |
   +-----+-----------------------+-------+
   | any | 000000000000 10000000 | 139   |
   +-----+-----------------------+-------+

                   Table 3

3.8.2.2.  Run Mode

   Run mode is entered when the context is 0 and left as soon as a non-0
   difference is found.  The level is identical to the predicted one.
   The run and the first different level are coded.

3.8.2.2.1.  Run Length Coding

   The run value is encoded in two parts.  The prefix part stores the
   more significant part of the run as well as adjusting the "run_index"
   that determines the number of bits in the less significant part of
   the run.  The second part of the value stores the less significant
   part of the run as it is.  The "run_index" is reset to 0 for each
   "Plane" and slice.

   log2_run[41] = {
        0,  0,  0,  0,  1,  1,  1,  1,
        2,  2,  2,  2,  3,  3,  3,  3,
        4,  4,  5,  5,  6,  6,  7,  7,
        8,  9, 10, 11, 12, 13, 14, 15,
       16, 17, 18, 19, 20, 21, 22, 23,
       24,
   };

   if (run_count == 0 && run_mode == 1) {
       if (get_bits(1)) {
           run_count = 1 << log2_run[run_index];
           if (x + run_count <= w) {
               run_index++;
           }
       } else {
           if (log2_run[run_index]) {
               run_count = get_bits(log2_run[run_index]);
           } else {
               run_count = 0;
           }
           if (run_index) {
               run_index--;
           }
           run_mode = 2;
       }
   }

   The "log2_run" array is also used within [ISO.14495-1.1999].

3.8.2.3.  Sign extension

   "sign_extend" is the function of increasing the number of bits of an
   input binary number in two's complement signed number representation
   while preserving the input number's sign (positive/negative) and
   value, in order to fit in the output bit width.  It MAY be computed
   with:

   sign_extend(input_number, input_bits) {
       negative_bias = 1 << (input_bits - 1);
       bits_mask = negative_bias - 1;
       output_number = input_number & bits_mask; // Remove negative bit
       is_negative = input_number & negative_bias; // Test negative bit
       if (is_negative)
           output_number -= negative_bias;
       return output_number;
   }

3.8.2.4.  Scalar Mode

   Each difference is coded with the per context mean prediction removed
   and a per context value for k.

   get_vlc_symbol(state) {
       i = state->count;
       k = 0;
       while (i < state->error_sum) {
           k++;
           i += i;
       }

       v = get_sr_golomb(k);

       if (2 * state->drift < -state->count) {
           v = -1 - v;
       }

       ret = sign_extend(v + state->bias, bits);

       state->error_sum += abs(v);
       state->drift     += v;

       if (state->count == 128) {
           state->count     >>= 1;
           state->drift     >>= 1;
           state->error_sum >>= 1;
       }
       state->count++;
       if (state->drift <= -state->count) {
           state->bias = max(state->bias - 1, -128);

           state->drift = max(state->drift + state->count,
                              -state->count + 1);
       } else if (state->drift > 0) {
           state->bias = min(state->bias + 1, 127);

           state->drift = min(state->drift - state->count, 0);
       }

       return ret;
   }

3.8.2.4.1.  Level Coding

   Level coding is identical to the normal difference coding with the
   exception that the 0 value is removed as it cannot occur:

   diff = get_vlc_symbol(context_state);
   if (diff >= 0) {
       diff++;
   }

   Note, this is different from JPEG-LS, which doesn't use prediction in
   run mode and uses a different encoding and context model for the last
   difference.  On a small set of test "Samples" the use of prediction
   slightly improved the compression rate.

3.8.2.5.  Initial Values for the VLC context state

   At keyframes all coder state variables are set to their initial
   state.

   drift     = 0;
   error_sum = 4;
   bias      = 0;
   count     = 1;

4.  Bitstream

   An FFV1 bitstream is composed of a series of one or more "Frames" and
   (when required) a "Configuration Record".

   Within the following sub-sections, pseudo-code is used to explain the
   structure of each FFV1 bitstream component, as described in
   Section 2.2.1.  Table 4 lists symbols used to annotate that pseudo-
   code in order to define the storage of the data referenced in that
   line of pseudo-code.

   +========+==============================================+
   | Symbol | Definition                                   |
   +========+==============================================+
   | u(n)   | unsigned big endian integer using n bits     |
   +--------+----------------------------------------------+
   | sg     | Golomb Rice coded signed scalar symbol coded |
   |        | with the method described in Section 3.8.2   |
   +--------+----------------------------------------------+
   | br     | Range coded Boolean (1-bit) symbol with the  |
   |        | method described in Section 3.8.1.1          |
   +--------+----------------------------------------------+
   | ur     | Range coded unsigned scalar symbol coded     |
   |        | with the method described in Section 3.8.1.2 |
   +--------+----------------------------------------------+
   | sr     | Range coded signed scalar symbol coded with  |
   |        | the method described in Section 3.8.1.2      |
   +--------+----------------------------------------------+
   | sd     | Sample difference coded with the method      |
   |        | described in Section 3.8                     |
   +--------+----------------------------------------------+

          Table 4: Definition of pseudo-code symbols for this
                               document.

   The following MUST be provided by external means during
   initialization of the decoder:

   "frame_pixel_width" is defined as "Frame" width in "Pixels".

   "frame_pixel_height" is defined as "Frame" height in "Pixels".

   Default values at the decoder initialization phase:

   "ConfigurationRecordIsPresent" is set to 0.

4.1.  Quantization Table Set

   The Quantization Table Sets are stored by storing the number of equal
   entries -1 of the first half of the table (represented as "len - 1"
   in the pseudo-code below) using the method described in
   Section 3.8.1.2.  The second half doesn't need to be stored as it is
   identical to the first with flipped sign.  "scale" and
   "len_count[ i ][ j ]" are temporary values used for the computing of
   "context_count[ i ]" and are not used outside Quantization Table Set
   pseudo-code.

   Example:

   Table: 0 0 1 1 1 1 2 2 -2 -2 -2 -1 -1 -1 -1 0

   Stored values: 1, 3, 1

   "QuantizationTableSet" has its own initial states, all set to 128.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   QuantizationTableSet( i ) {                                   |
       scale = 1                                                 |
       for (j = 0; j < MAX_CONTEXT_INPUTS; j++) {                |
           QuantizationTable( i, j, scale )                      |
           scale *= 2 * len_count[ i ][ j ] - 1                  |
       }                                                         |
       context_count[ i ] = ceil( scale / 2 )                    |
   }                                                             |

   "MAX_CONTEXT_INPUTS" is 5.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   QuantizationTable(i, j, scale) {                              |
       v = 0                                                     |
       for (k = 0; k < 128;) {                                   |
           len - 1                                               | ur
           for (n = 0; n < len; n++) {                           |
               quant_tables[ i ][ j ][ k ] = scale * v           |
               k++                                               |
           }                                                     |
           v++                                                   |
       }                                                         |
       for (k = 1; k < 128; k++) {                               |
           quant_tables[ i ][ j ][ 256 - k ] = \                 |
           -quant_tables[ i ][ j ][ k ]                          |
       }                                                         |
       quant_tables[ i ][ j ][ 128 ] = \                         |
           -quant_tables[ i ][ j ][ 127 ]                        |
       len_count[ i ][ j ] = v                                   |
   }                                                             |

4.1.1.  quant_tables

   "quant_tables[ i ][ j ][ k ]" indicates the quantization table value
   of the Quantized Sample Difference "k" of the Quantization Table "j"
   of the Quantization Table Set "i".

4.1.2.  context_count

   "context_count[ i ]" indicates the count of contexts for Quantization
   Table Set "i".  "context_count[ i ]" MUST be less than or equal to
   32768.

4.2.  Parameters

   The "Parameters" section contains significant characteristics about
   the decoding configuration used for all instances of "Frame" (in FFV1
   version 0 and 1) or the whole FFV1 bitstream (other versions),
   including the stream version, color configuration, and quantization
   tables.  Figure 22 describes the contents of the bitstream.

   "Parameters" has its own initial states, all set to 128.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Parameters( ) {                                               |
       version                                                   | ur
       if (version >= 3) {                                       |
           micro_version                                         | ur
       }                                                         |
       coder_type                                                | ur
       if (coder_type > 1) {                                     |
           for (i = 1; i < 256; i++) {                           |
               state_transition_delta[ i ]                       | sr
           }                                                     |
       }                                                         |
       colorspace_type                                           | ur
       if (version >= 1) {                                       |
           bits_per_raw_sample                                   | ur
       }                                                         |
       chroma_planes                                             | br
       log2_h_chroma_subsample                                   | ur
       log2_v_chroma_subsample                                   | ur
       extra_plane                                               | br
       if (version >= 3) {                                       |
           num_h_slices - 1                                      | ur
           num_v_slices - 1                                      | ur
           quant_table_set_count                                 | ur
       }                                                         |
       for (i = 0; i < quant_table_set_count; i++) {             |
           QuantizationTableSet( i )                             |
       }                                                         |
       if (version >= 3) {                                       |
           for (i = 0; i < quant_table_set_count; i++) {         |
               states_coded                                      | br
               if (states_coded) {                               |
                   for (j = 0; j < context_count[ i ]; j++) {    |
                       for (k = 0; k < CONTEXT_SIZE; k++) {      |
                           initial_state_delta[ i ][ j ][ k ]    | sr
                       }                                         |
                   }                                             |
               }                                                 |
           }                                                     |
           ec                                                    | ur
           intra                                                 | ur
       }                                                         |
   }                                                             |

     Figure 22: A pseudo-code description of the bitstream contents.

   "CONTEXT_SIZE" is 32.

4.2.1.  version

   "version" specifies the version of the FFV1 bitstream.

   Each version is incompatible with other versions: decoders SHOULD
   reject FFV1 bitstreams due to an unknown version.

   Decoders SHOULD reject FFV1 bitstreams with version <= 1 &&
   ConfigurationRecordIsPresent == 1.

   Decoders SHOULD reject FFV1 bitstreams with version >= 3 &&
   ConfigurationRecordIsPresent == 0.

   +=======+=========================+
   | value | version                 |
   +=======+=========================+
   | 0     | FFV1 version 0          |
   +-------+-------------------------+
   | 1     | FFV1 version 1          |
   +-------+-------------------------+
   | 2     | reserved*               |
   +-------+-------------------------+
   | 3     | FFV1 version 3          |
   +-------+-------------------------+
   | Other | reserved for future use |
   +-------+-------------------------+

                 Table 5

   * Version 2 was experimental and this document does not describe it.

4.2.2.  micro_version

   "micro_version" specifies the micro-version of the FFV1 bitstream.

   After a version is considered stable (a micro-version value is
   assigned to be the first stable variant of a specific version), each
   new micro-version after this first stable variant is compatible with
   the previous micro-version: decoders SHOULD NOT reject FFV1
   bitstreams due to an unknown micro-version equal to or above the
   micro-version considered as stable.

   Meaning of "micro_version" for "version" 3:

   +=======+=========================+
   | value | micro_version           |
   +=======+=========================+
   | 0...3 | reserved*               |
   +-------+-------------------------+
   | 4     | first stable variant    |
   +-------+-------------------------+
   | Other | reserved for future use |
   +-------+-------------------------+

       Table 6: The definitions for
     "micro_version" values for FFV1
                version 3.

   * Development versions may be incompatible with the stable variants.

4.2.3.  coder_type

   "coder_type" specifies the coder used.

   +=======+=================================================+
   | value | coder used                                      |
   +=======+=================================================+
   | 0     | Golomb Rice                                     |
   +-------+-------------------------------------------------+
   | 1     | Range Coder with default state transition table |
   +-------+-------------------------------------------------+
   | 2     | Range Coder with custom state transition table  |
   +-------+-------------------------------------------------+
   | Other | reserved for future use                         |
   +-------+-------------------------------------------------+

                             Table 7

   Restrictions:

   If "coder_type" is 0, then "bits_per_raw_sample" SHOULD NOT be > 8.

   Background: At the time of this writing, there is no known
   implementation of FFV1 bitstream supporting the Golomb Rice algorithm
   with "bits_per_raw_sample" greater than 8, and Range Coder is
   preferred.

4.2.4.  state_transition_delta

   "state_transition_delta" specifies the Range coder custom state
   transition table.

   If "state_transition_delta" is not present in the FFV1 bitstream, all
   Range coder custom state transition table elements are assumed to be
   0.

4.2.5.  colorspace_type

   "colorspace_type" specifies the color space encoded, the pixel
   transformation used by the encoder, the extra plane content, as well
   as the interleave method.

   +=======+=============+================+==============+=============+
   | value | color space | pixel          | extra plane  | interleave  |
   |       | encoded     | transformation | content      | method      |
   +=======+=============+================+==============+=============+
   | 0     | YCbCr       | None           | Transparency | "Plane"     |
   |       |             |                |              | then        |
   |       |             |                |              | "Line"      |
   +-------+-------------+----------------+--------------+-------------+
   | 1     | RGB         | JPEG2000-RCT   | Transparency | "Line"      |
   |       |             |                |              | then        |
   |       |             |                |              | "Plane"     |
   +-------+-------------+----------------+--------------+-------------+
   | Other | reserved    | reserved for   | reserved for | reserved    |
   |       | for future  | future use     | future use   | for future  |
   |       | use         |                |              | use         |
   +-------+-------------+----------------+--------------+-------------+

                                  Table 8

   FFV1 bitstreams with "colorspace_type" == 1 && ("chroma_planes" !=
   1 || "log2_h_chroma_subsample" != 0 || "log2_v_chroma_subsample" !=
   0) are not part of this specification.

4.2.6.  chroma_planes

   "chroma_planes" indicates if chroma (color) "Planes" are present.

   +=======+=================================+
   | value | presence                        |
   +=======+=================================+
   | 0     | chroma "Planes" are not present |
   +-------+---------------------------------+
   | 1     | chroma "Planes" are present     |
   +-------+---------------------------------+

                     Table 9

4.2.7.  bits_per_raw_sample

   "bits_per_raw_sample" indicates the number of bits for each "Sample".
   Inferred to be 8 if not present.

   +=======+===================================+
   | value | bits for each sample              |
   +=======+===================================+
   | 0     | reserved*                         |
   +-------+-----------------------------------+
   | Other | the actual bits for each "Sample" |
   +-------+-----------------------------------+

                      Table 10

   * Encoders MUST NOT store "bits_per_raw_sample" = 0.
   Decoders SHOULD accept and interpret "bits_per_raw_sample" = 0 as 8.

4.2.8.  log2_h_chroma_subsample

   "log2_h_chroma_subsample" indicates the subsample factor, stored in
   powers to which the number 2 must be raised, between luma and chroma
   width ("chroma_width = 2 ^ -log2_h_chroma_subsample * luma_width").

4.2.9.  log2_v_chroma_subsample

   "log2_v_chroma_subsample" indicates the subsample factor, stored in
   powers to which the number 2 must be raised, between luma and chroma
   height ("chroma_height = 2 ^ -log2_v_chroma_subsample *
   luma_height").

4.2.10.  extra_plane

   "extra_plane" indicates if an extra "Plane" is present.

   +=======+==============================+
   | value | presence                     |
   +=======+==============================+
   | 0     | extra "Plane" is not present |
   +-------+------------------------------+
   | 1     | extra "Plane" is present     |
   +-------+------------------------------+

                   Table 11

4.2.11.  num_h_slices

   "num_h_slices" indicates the number of horizontal elements of the
   slice raster.

   Inferred to be 1 if not present.

4.2.12.  num_v_slices

   "num_v_slices" indicates the number of vertical elements of the slice
   raster.

   Inferred to be 1 if not present.

4.2.13.  quant_table_set_count

   "quant_table_set_count" indicates the number of Quantization
   Table Sets.  "quant_table_set_count" MUST be less than or equal to 8.

   Inferred to be 1 if not present.

   MUST NOT be 0.

4.2.14.  states_coded

   "states_coded" indicates if the respective Quantization Table Set has
   the initial states coded.

   Inferred to be 0 if not present.

   +=======+================================+
   | value | initial states                 |
   +=======+================================+
   | 0     | initial states are not present |
   |       | and are assumed to be all 128  |
   +-------+--------------------------------+
   | 1     | initial states are present     |
   +-------+--------------------------------+

                    Table 12

4.2.15.  initial_state_delta

   "initial_state_delta[ i ][ j ][ k ]" indicates the initial Range
   coder state.  It is encoded using "k" as context index and the
   following prediction:

   pred = j ? initial_state[ i ][ j - 1 ][ k ] : 128

                               Figure 23

   initial_state[ i ][ j ][ k ] =
          ( pred + initial_state_delta[ i ][ j ][ k ] ) & 255

                               Figure 24

4.2.16.  ec

   "ec" indicates the error detection/correction type.

   +=======+=================================================+
   | value | error detection/correction type                 |
   +=======+=================================================+
   | 0     | 32-bit CRC in "ConfigurationRecord"             |
   +-------+-------------------------------------------------+
   | 1     | 32-bit CRC in "Slice" and "ConfigurationRecord" |
   +-------+-------------------------------------------------+
   | Other | reserved for future use                         |
   +-------+-------------------------------------------------+

                             Table 13

4.2.17.  intra

   "intra" indicates the constraint on "keyframe" in each instance of
   "Frame".

   Inferred to be 0 if not present.

   +=======+=======================================================+
   | value | relationship                                          |
   +=======+=======================================================+
   | 0     | "keyframe" can be 0 or 1 (non keyframes or keyframes) |
   +-------+-------------------------------------------------------+
   | 1     | "keyframe" MUST be 1 (keyframes only)                 |
   +-------+-------------------------------------------------------+
   | Other | reserved for future use                               |
   +-------+-------------------------------------------------------+

                                Table 14

4.3.  Configuration Record

   In the case of an FFV1 bitstream with "version >= 3", a
   "Configuration Record" is stored in the underlying "Container" as
   described in Section 4.3.3.  It contains the "Parameters" used for
   all instances of "Frame".  The size of the "Configuration Record",
   "NumBytes", is supplied by the underlying "Container".

   pseudo-code                                                | type
   -----------------------------------------------------------|-----
   ConfigurationRecord( NumBytes ) {                          |
       ConfigurationRecordIsPresent = 1                       |
       Parameters( )                                          |
       while (remaining_symbols_in_syntax(NumBytes - 4)) {    |
           reserved_for_future_use                            | br/ur/sr
       }                                                      |
       configuration_record_crc_parity                        | u(32)
   }                                                          |

4.3.1.  reserved_for_future_use

   "reserved_for_future_use" has semantics that are reserved for future
   use.

   Encoders conforming to this version of this specification SHALL NOT
   write this value.

   Decoders conforming to this version of this specification SHALL
   ignore its value.

4.3.2.  configuration_record_crc_parity

   "configuration_record_crc_parity" is 32 bits that are chosen so that
   the "Configuration Record" as a whole has a CRC remainder of 0.

   This is equivalent to storing the CRC remainder in the 32-bit parity.

   The CRC generator polynomial used is described in Section 4.9.3.

4.3.3.  Mapping FFV1 into Containers

   This "Configuration Record" can be placed in any file format that
   supports "Configuration Records", following as closely as possible
   the way that file format stores "Configuration Records".  The
   "Configuration Record" storage place and "NumBytes" are currently
   defined and supported by this version of this specification for the
   following formats:

4.3.3.1.  AVI File Format

   The "Configuration Record" extends the stream format chunk ("AVI ",
   "hdlr", "strl", "strf") with the ConfigurationRecord bitstream.

   See [AVI] for more information about chunks.

   "NumBytes" is defined as the size, in bytes, of the strf chunk
   indicated in the chunk header minus the size of the stream format
   structure.

4.3.3.2.  ISO Base Media File Format

   The "Configuration Record" extends the sample description box
   ("moov", "trak", "mdia", "minf", "stbl", "stsd") with a "glbl" box
   that contains the ConfigurationRecord bitstream.  See
   [ISO.14496-12.2015] for more information about boxes.

   "NumBytes" is defined as the size, in bytes, of the "glbl" box
   indicated in the box header minus the size of the box header.

4.3.3.3.  NUT File Format

   The "codec_specific_data" element (in "stream_header" packet)
   contains the ConfigurationRecord bitstream.  See [NUT] for more
   information about elements.

   "NumBytes" is defined as the size, in bytes, of the
   "codec_specific_data" element as indicated in the "length" field of
   "codec_specific_data".

4.3.3.4.  Matroska File Format

   FFV1 SHOULD use "V_FFV1" as the Matroska "Codec ID".  For FFV1
   versions 2 or less, the Matroska "CodecPrivate" Element SHOULD NOT be
   used.  For FFV1 versions 3 or greater, the Matroska "CodecPrivate"
   Element MUST contain the FFV1 "Configuration Record" structure and no
   other data.  See [Matroska] for more information about elements.

   "NumBytes" is defined as the "Element Data Size" of the
   "CodecPrivate" Element.

4.4.  Frame

   A "Frame" is an encoded representation of a complete static image.
   The whole "Frame" is provided by the underlying container.

   A "Frame" consists of the "keyframe" field, "Parameters" (if
   "version" <= 1), and a sequence of independent slices.  The pseudo-
   code below describes the contents of a "Frame".

   The "keyframe" field has its own initial state, set to 128.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Frame( NumBytes ) {                                           |
       keyframe                                                  | br
       if (keyframe && !ConfigurationRecordIsPresent) {          |
           Parameters( )                                         |
       }                                                         |
       while (remaining_bits_in_bitstream( NumBytes )) {         |
           Slice( )                                              |
       }                                                         |
   }                                                             |

   Architecture overview of slices in a "Frame":

   +=================================================================+
   +=================================================================+
   | first slice header                                              |
   +-----------------------------------------------------------------+
   | first slice content                                             |
   +-----------------------------------------------------------------+
   | first slice footer                                              |
   +-----------------------------------------------------------------+
   | --------------------------------------------------------------- |
   +-----------------------------------------------------------------+
   | second slice header                                             |
   +-----------------------------------------------------------------+
   | second slice content                                            |
   +-----------------------------------------------------------------+
   | second slice footer                                             |
   +-----------------------------------------------------------------+
   | --------------------------------------------------------------- |
   +-----------------------------------------------------------------+
   | ...                                                             |
   +-----------------------------------------------------------------+
   | --------------------------------------------------------------- |
   +-----------------------------------------------------------------+
   | last slice header                                               |
   +-----------------------------------------------------------------+
   | last slice content                                              |
   +-----------------------------------------------------------------+
   | last slice footer                                               |
   +-----------------------------------------------------------------+

                                 Table 15

4.5.  Slice

   A "Slice" is an independent spatial sub-section of a "Frame" that is
   encoded separately from another region of the same "Frame".  The use
   of more than one "Slice" per "Frame" can be useful for taking
   advantage of the opportunities of multithreaded encoding and
   decoding.

   A "Slice" consists of a "Slice Header" (when relevant), a "Slice
   Content", and a "Slice Footer" (when relevant).  The pseudo-code
   below describes the contents of a "Slice".

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Slice( ) {                                                    |
       if (version >= 3) {                                       |
           SliceHeader( )                                        |
       }                                                         |
       SliceContent( )                                           |
       if (coder_type == 0) {                                    |
           while (!byte_aligned()) {                             |
               padding                                           | u(1)
           }                                                     |
       }                                                         |
       if (version <= 1) {                                       |
           while (remaining_bits_in_bitstream( NumBytes ) != 0) {|
               reserved                                          | u(1)
           }                                                     |
       }                                                         |
       if (version >= 3) {                                       |
           SliceFooter( )                                        |
       }                                                         |
   }                                                             |

   "padding" specifies a bit without any significance and used only for
   byte alignment.  MUST be 0.

   "reserved" specifies a bit without any significance in this revision
   of the specification and may have a significance in a later revision
   of this specification.

   Encoders SHOULD NOT fill these bits.

   Decoders SHOULD ignore these bits.

4.6.  Slice Header

   A "Slice Header" provides information about the decoding
   configuration of the "Slice", such as its spatial position, size, and
   aspect ratio.  The pseudo-code below describes the contents of the
   "Slice Header".

   "Slice Header" has its own initial states, all set to 128.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   SliceHeader( ) {                                              |
       slice_x                                                   | ur
       slice_y                                                   | ur
       slice_width - 1                                           | ur
       slice_height - 1                                          | ur
       for (i = 0; i < quant_table_set_index_count; i++) {       |
           quant_table_set_index[ i ]                            | ur
       }                                                         |
       picture_structure                                         | ur
       sar_num                                                   | ur
       sar_den                                                   | ur
   }                                                             |

4.6.1.  slice_x

   "slice_x" indicates the x position on the slice raster formed by
   num_h_slices.

   Inferred to be 0 if not present.

4.6.2.  slice_y

   "slice_y" indicates the y position on the slice raster formed by
   num_v_slices.

   Inferred to be 0 if not present.

4.6.3.  slice_width

   "slice_width" indicates the width on the slice raster formed by
   num_h_slices.

   Inferred to be 1 if not present.

4.6.4.  slice_height

   "slice_height" indicates the height on the slice raster formed by
   num_v_slices.

   Inferred to be 1 if not present.

4.6.5.  quant_table_set_index_count

   "quant_table_set_index_count" is defined as:

   1 + ( ( chroma_planes || version <= 3 ) ? 1 : 0 )
       + ( extra_plane ? 1 : 0 )

4.6.6.  quant_table_set_index

   "quant_table_set_index" indicates the Quantization Table Set index to
   select the Quantization Table Set and the initial states for the
   "Slice Content".

   Inferred to be 0 if not present.

4.6.7.  picture_structure

   "picture_structure" specifies the temporal and spatial relationship
   of each "Line" of the "Frame".

   Inferred to be 0 if not present.
   +=======+=========================+
   | value | picture structure used  |
   +=======+=========================+
   | 0     | unknown                 |
   +-------+-------------------------+
   | 1     | top field first         |
   +-------+-------------------------+
   | 2     | bottom field first      |
   +-------+-------------------------+
   | 3     | progressive             |
   +-------+-------------------------+
   | Other | reserved for future use |
   +-------+-------------------------+

                 Table 16

4.6.8.  sar_num

   "sar_num" specifies the "Sample" aspect ratio numerator.

   Inferred to be 0 if not present.

   A value of 0 means that the aspect ratio is unknown.

   Encoders MUST write 0 if the "Sample" aspect ratio is unknown.

   If "sar_den" is 0, decoders SHOULD ignore the encoded value and
   consider that "sar_num" is 0.

4.6.9.  sar_den

   "sar_den" specifies the "Sample" aspect ratio denominator.

   Inferred to be 0 if not present.

   A value of 0 means that the aspect ratio is unknown.

   Encoders MUST write 0 if the "Sample" aspect ratio is unknown.

   If "sar_num" is 0, decoders SHOULD ignore the encoded value and
   consider that "sar_den" is 0.

4.7.  Slice Content

   A "Slice Content" contains all "Line" elements that are part of the
   "Slice".

   Depending on the configuration, "Line" elements are ordered by
   "Plane" then by row (YCbCr) or by row then by "Plane" (RGB).

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   SliceContent( ) {                                             |
       if (colorspace_type == 0) {                               |
           for (p = 0; p < primary_color_count; p++) {           |
               for (y = 0; y < plane_pixel_height[ p ]; y++) {   |
                   Line( p, y )                                  |
               }                                                 |
           }                                                     |
       } else if (colorspace_type == 1) {                        |
           for (y = 0; y < slice_pixel_height; y++) {            |
               for (p = 0; p < primary_color_count; p++) {       |
                   Line( p, y )                                  |
               }                                                 |
           }                                                     |
       }                                                         |
   }                                                             |

4.7.1.  primary_color_count

   "primary_color_count" is defined as:

   1 + ( chroma_planes ? 2 : 0 ) + ( extra_plane ? 1 : 0 )

4.7.2.  plane_pixel_height

   "plane_pixel_height[ p ]" is the height in "Pixels" of "Plane" p of
   the "Slice".  It is defined as:

   chroma_planes == 1 && (p == 1 || p == 2)
       ? ceil( slice_pixel_height / (1 << log2_v_chroma_subsample) )
       : slice_pixel_height

4.7.3.  slice_pixel_height

   "slice_pixel_height" is the height in "Pixels" of the slice.  It is
   defined as:

   floor(
       ( slice_y + slice_height )
       * frame_pixel_height
       / num_v_slices
   ) - slice_pixel_y

4.7.4.  slice_pixel_y

   "slice_pixel_y" is the slice vertical position in "Pixels".  It is
   defined as:

   floor( slice_y * frame_pixel_height / num_v_slices )

4.8.  Line

   A "Line" is a list of the sample differences (relative to the
   predictor) of primary color components.  The pseudo-code below
   describes the contents of the "Line".

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Line( p, y ) {                                                |
       if (colorspace_type == 0) {                               |
           for (x = 0; x < plane_pixel_width[ p ]; x++) {        |
               sample_difference[ p ][ y ][ x ]                  | sd
           }                                                     |
       } else if (colorspace_type == 1) {                        |
           for (x = 0; x < slice_pixel_width; x++) {             |
               sample_difference[ p ][ y ][ x ]                  | sd
           }                                                     |
       }                                                         |
   }                                                             |

4.8.1.  plane_pixel_width

   "plane_pixel_width[ p ]" is the width in "Pixels" of "Plane" p of the
   "Slice".  It is defined as:

   chroma_planes == 1 && (p == 1 || p == 2)
       ? ceil( slice_pixel_width / (1 << log2_h_chroma_subsample) )
       : slice_pixel_width

4.8.2.  slice_pixel_width

   "slice_pixel_width" is the width in "Pixels" of the slice.  It is
   defined as:

   floor(
       ( slice_x + slice_width )
       * frame_pixel_width
       / num_h_slices
   ) - slice_pixel_x

4.8.3.  slice_pixel_x

   "slice_pixel_x" is the slice horizontal position in "Pixels".  It is
   defined as:

   floor( slice_x * frame_pixel_width / num_h_slices )

4.8.4.  sample_difference

   "sample_difference[ p ][ y ][ x ]" is the sample difference for the
   "Sample" at "Plane" "p", y position "y", and x position "x".  The
   "Sample" value is computed based on the median predictor and context
   described in Section 3.2.

4.9.  Slice Footer

   A "Slice Footer" provides information about slice size and
   (optionally) parity.  The pseudo-code below describes the contents of
   the "Slice Footer".

   Note: the "Slice Footer" is always byte aligned.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   SliceFooter( ) {                                              |
       slice_size                                                | u(24)
       if (ec) {                                                 |
           error_status                                          | u(8)
           slice_crc_parity                                      | u(32)
       }                                                         |
   }                                                             |

4.9.1.  slice_size

   "slice_size" indicates the size of the slice in bytes.

   Note: this allows finding the start of slices before previous slices
   have been fully decoded, and allows parallel decoding as well as
   error resilience.

4.9.2.  error_status

   "error_status" specifies the error status.

   +=======+=======================================+
   | value | error status                          |
   +=======+=======================================+
   | 0     | no error                              |
   +-------+---------------------------------------+
   | 1     | slice contains a correctable error    |
   +-------+---------------------------------------+
   | 2     | slice contains an uncorrectable error |
   +-------+---------------------------------------+
   | Other | reserved for future use               |
   +-------+---------------------------------------+

                        Table 17

4.9.3.  slice_crc_parity

   "slice_crc_parity" is 32 bits that are chosen so that the slice as a
   whole has a CRC remainder of 0.

   This is equivalent to storing the CRC remainder in the 32-bit parity.
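
   As a non-normative illustration of the zero-remainder property
   described above, the following Python sketch computes a CRC with the
   IEEE generator polynomial 0x04C11DB7 (initial value 0, no inversion)
   and appends it as parity.  The function names, the
   most-significant-bit-first processing order, and the big-endian
   storage of the parity are assumptions of this sketch; they are not
   taken from the reference implementation.

```python
# Hypothetical helper names; a sketch, not the reference implementation.
POLY = 0x04C11DB7  # IEEE CRC-32 generator without the implicit x^32 term


def crc32_msb_first(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32: MSB first, initial value 0, no pre/post-inversion."""
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def append_parity(slice_bytes: bytes) -> bytes:
    """Append the remainder as 4 bytes (big-endian order is an assumption)."""
    return slice_bytes + crc32_msb_first(slice_bytes).to_bytes(4, "big")


def slice_is_intact(slice_with_parity: bytes) -> bool:
    """A slice that includes its parity must leave a CRC remainder of 0."""
    return crc32_msb_first(slice_with_parity) == 0
```

   With these assumptions, a checker only needs to run the CRC over the
   whole slice, parity included, and compare the result with 0; it does
   not need to locate the parity field first.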

   The CRC generator polynomial used is the standard IEEE CRC polynomial
   (0x104C11DB7), with initial value 0, without pre-inversion and
   without post-inversion.

5.  Restrictions

   To ensure that fast multithreaded decoding is possible, starting with
   version 3, if "frame_pixel_width * frame_pixel_height" is more than
   101376, "slice_width * slice_height" MUST be less than or equal to
   "num_h_slices * num_v_slices / 4".  Note: 101376 is the frame size in
   "Pixels" of a 352x288 frame, also known as the CIF ("Common
   Intermediate Format") frame size.

   For each "Frame", each position in the slice raster MUST be filled by
   one and only one slice of the "Frame" (no missing slice position, no
   slice overlapping).

   For each "Frame" with a "keyframe" value of 0, each slice MUST have
   the same values of "slice_x", "slice_y", "slice_width", and
   "slice_height" as a slice in the previous "Frame".

6.  Security Considerations

   Like any other codec (such as [RFC6716]), FFV1 should not be used
   with insecure ciphers or cipher modes that are vulnerable to
   known-plaintext attacks.  Some of the header bits as well as the
   padding are easily predictable.

   Implementations of the FFV1 codec need to take appropriate security
   considerations into account, as outlined in [RFC4732].  It is
   extremely important for the decoder to be robust against malicious
   payloads.  Malicious payloads must not cause the decoder to overrun
   its allocated memory or to take an excessive amount of resources to
   decode.  The same applies to the encoder, even though problems in
   encoders are typically rarer.  Malicious video streams must not cause
   the encoder to misbehave because this would allow an attacker to
   attack transcoding gateways.  Another frequent security problem in
   image and video codecs is the failure to check for integer overflows.
   An example is allocating "frame_pixel_width * frame_pixel_height" in
   "Pixel" count computations without considering that the
   multiplication result may have overflowed the range of the arithmetic
   type.  The range coder could, if implemented naively, read one byte
   past the end of its input.  The implementation must ensure that no
   read outside allocated and initialized memory occurs.

   None of the content carried in FFV1 is intended to be executable.

   The reference implementation [REFIMPL] contains no known buffer
   overflows or cases where a specially crafted packet or video segment
   could cause a significant increase in CPU load.

   The reference implementation [REFIMPL] was validated in the following
   conditions:

   *  Sending the decoder valid packets generated by the reference
      encoder and verifying that the decoder's output matches the
      encoder's input.

   *  Sending the decoder packets generated by the reference encoder and
      then subjected to random corruption.

   *  Sending the decoder random packets that are not FFV1.

   In all of the conditions above, the decoder and encoder were run
   inside the [VALGRIND] memory debugger as well as Clang's address
   sanitizer [Address-Sanitizer], which track reads and writes to
   invalid memory regions as well as the use of uninitialized memory.
   No errors were reported in any of the tested conditions.

7.  Media Type Definition

   This registration is done using the template defined in [RFC6838] and
   following [RFC4855].

   Type name: video

   Subtype name: FFV1

   Required parameters: None.

   Optional parameters: These parameters are used to signal the
   capabilities of a receiver implementation.  These parameters MUST NOT
   be used for any other purpose.

   *  "version": The "version" of the FFV1 encoding as defined by
      Section 4.2.1.

   *  "micro_version": The "micro_version" of the FFV1 encoding as
      defined by Section 4.2.2.

   *  "coder_type": The "coder_type" of the FFV1 encoding as defined by
      Section 4.2.3.

   *  "colorspace_type": The "colorspace_type" of the FFV1 encoding as
      defined by Section 4.2.5.

   *  "bits_per_raw_sample": The "bits_per_raw_sample" of the FFV1
      encoding as defined by Section 4.2.7.

   *  "max_slices": The value of "max_slices" is an integer indicating
      the maximum count of slices within a frame of the FFV1 encoding.

   Encoding considerations: This media type is defined for encapsulation
   in several audiovisual container formats and contains binary data;
   see Section 4.3.3.  This media type is framed binary data; see
   Section 4.8 of [RFC6838].

   Security considerations: See Section 6 of this document.

   Interoperability considerations: None.

   Published specification: RFC XXXX.

   [RFC Editor: Upon publication as an RFC, please replace "XXXX" with
   the number assigned to this document and remove this note.]

   Applications which use this media type: Any application that requires
   the transport of lossless video can use this media type.  Some
   examples include, but are not limited to, screen recording,
   scientific imaging, and digital video preservation.

   Fragment identifier considerations: N/A.

   Additional information: None.

   Person & email address to contact for further information: Michael
   Niedermayer (michael@niedermayer.cc)

   Intended usage: COMMON

   Restrictions on usage: None.

   Author: Dave Rice (dave@dericed.com)

   Change controller: IETF cellar working group delegated from the IESG.

8.  IANA Considerations

   The IANA is requested to register the following values:

   *  Media type registration as described in Section 7.

9.  Changelog

   See https://github.com/FFmpeg/FFV1/commits/master

   [RFC Editor: Please remove this Changelog section prior to
   publication.]

10.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [ISO.9899.2018]
              International Organization for Standardization,
              "Programming languages - C", ISO Standard 9899, 2018.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
              Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
              September 2012.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13,
              RFC 6838, DOI 10.17487/RFC6838, January 2013.

   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
              Formats", RFC 4855, DOI 10.17487/RFC4855, February 2007.

   [ISO.9899.1990]
              International Organization for Standardization,
              "Programming languages - C", ISO Standard 9899, 1990.

   [RFC4732]  Handley, M., Ed., Rescorla, E., Ed., and IAB, "Internet
              Denial-of-Service Considerations", RFC 4732,
              DOI 10.17487/RFC4732, December 2006.

   [Matroska] IETF, "Matroska", 2019.

   [ISO.15444-1.2016]
              International Organization for Standardization,
              "Information technology -- JPEG 2000 image coding system:
              Core coding system", October 2016.

11.  Informative References

   [AVI]      Microsoft, "AVI RIFF File Reference", undated.

   [NUT]      Niedermayer, M., "NUT Open Container Format", December
              2013.

   [range-coding]
              Nigel, G. and N. Martin, "Range encoding: an algorithm for
              removing redundancy from a digitised message.",
              Proceedings of the Conference on Video and Data
              Recording.  Institution of Electronic and Radio Engineers,
              Hampshire, England, July 1979.

   [REFIMPL]  Niedermayer, M., "The reference FFV1 implementation / the
              FFV1 codec in FFmpeg", undated.

   [Address-Sanitizer]
              The Clang Team, "ASAN AddressSanitizer website", undated.

   [ISO.14496-10.2014]
              International Organization for Standardization,
              "Information technology -- Coding of audio-visual objects
              -- Part 10: Advanced Video Coding", September 2014.

   [VALGRIND] Valgrind Developers, "Valgrind website", undated.

   [FFV1_V0]  Niedermayer, M., "Commit to mark FFV1 version 0 as non-
              experimental", April 2006.

   [FFV1_V1]  Niedermayer, M., "Commit to release FFV1 version 1", April
              2009.

   [ISO.14495-1.1999]
              International Organization for Standardization,
              "Information technology -- Lossless and near-lossless
              compression of continuous-tone still images: Baseline",
              December 1999.

   [HuffYUV]  Rudiak-Gould, B., "HuffYUV", December 2003.

   [FFV1_V3]  Niedermayer, M., "Commit to mark FFV1 version 3 as non-
              experimental", August 2013.

   [YCbCr]    Wikipedia, "YCbCr", undated.

   [ISO.14496-12.2015]
              International Organization for Standardization,
              "Information technology -- Coding of audio-visual objects
              -- Part 12: ISO base media file format", December 2015.

Appendix A.  Multi-threaded decoder implementation suggestions

   This appendix is informative.

   The FFV1 bitstream is parsable in two ways: in sequential order as
   described in this document or with the pre-analysis of the footer of
   each slice.  Each slice footer contains a "slice_size" field so the
   boundary of each slice is computable without having to parse the
   slice content.  That allows multi-threading as well as independence
   of slice content (a bitstream error in a slice header or slice
   content has no impact on the decoding of the other slices).

   After having checked the "keyframe" field, a decoder SHOULD parse the
   "slice_size" fields, from the "slice_size" of the last slice at the
   end of the "Frame" up to the "slice_size" of the first slice at the
   beginning of the "Frame", before parsing slices, in order to obtain
   the slice boundaries.  A decoder MAY fall back to sequential order,
   e.g., in case of a corrupted "Frame" (frame size unknown,
   "slice_size" of slices not coherent...) or if there is no possibility
   of seeking into the stream.

Appendix B.  Future handling of some streams created by non-conforming
             encoders

   This appendix is informative.

   Some bitstreams were found with 40 extra bits corresponding to
   "error_status" and "slice_crc_parity" in the "reserved" bits of
   "Slice()".  Any revision of this specification SHOULD avoid adding 40
   bits of content after "SliceContent" if "version" == 0 or "version"
   == 1.  Otherwise, a decoder conforming to the revised specification
   could not distinguish between a revised bitstream and such buggy
   bitstreams in the wild.

Authors' Addresses

   Michael Niedermayer

   Email: michael@niedermayer.cc


   Dave Rice

   Email: dave@dericed.com


   Jerome Martinez

   Email: jerome@mediaarea.net