cellar                                                    M. Niedermayer
Internet-Draft
Intended status: Standards Track                                 D. Rice
Expires: July 30, 2018
                                                             J. Martinez
                                                        January 26, 2018


                            FF Video Codec 1
                        draft-ietf-cellar-ffv1-01

Abstract

   This document defines FFV1, a lossless intra-frame video encoding
   format.
   FFV1 is designed to efficiently compress video data in a
   variety of pixel formats. Compared to uncompressed video, FFV1
   offers storage compression, frame fixity, and self-description, which
   makes FFV1 useful as a preservation or intermediate video format.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 30, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Notation and Conventions
     2.1.  Definitions
     2.2.  Conventions
       2.2.1.  Arithmetic operators
       2.2.2.  Assignment operators
       2.2.3.  Comparison operators
       2.2.4.  Mathematical functions
       2.2.5.  Order of operation precedence
       2.2.6.  Pseudo-code
       2.2.7.  Range
       2.2.8.  NumBytes
       2.2.9.  Bitstream functions
   3.  General Description
     3.1.  Border
     3.2.  Samples
     3.3.  Median predictor
     3.4.  Context
     3.5.  Quantization Table Sets
     3.6.  Quantization Table Set indexes
     3.7.  Color spaces
       3.7.1.  YCbCr
       3.7.2.  RGB
     3.8.  Coding of the Sample Difference
       3.8.1.  Range coding mode
       3.8.2.  Golomb Rice mode
   4.  Bitstream
     4.1.  Configuration Record
       4.1.1.  reserved_for_future_use
       4.1.2.  configuration_record_crc_parity
       4.1.3.  Mapping FFV1 into Containers
     4.2.  Frame
     4.3.  Slice
     4.4.  Slice Header
       4.4.1.  slice_x
       4.4.2.  slice_y
       4.4.3.  slice_width
       4.4.4.  slice_height
       4.4.5.  quant_table_set_index_count
       4.4.6.  quant_table_set_index
       4.4.7.  picture_structure
       4.4.8.  sar_num
       4.4.9.  sar_den
       4.4.10. reset_contexts
       4.4.11. slice_coding_mode
     4.5.  Slice Content
       4.5.1.  primary_color_count
       4.5.2.  plane_pixel_height
       4.5.3.  slice_pixel_height
       4.5.4.  slice_pixel_y
     4.6.  Line
       4.6.1.  plane_pixel_width
       4.6.2.  slice_pixel_width
       4.6.3.  slice_pixel_x
       4.6.4.  sample_difference
     4.7.  Slice Footer
       4.7.1.  slice_size
       4.7.2.  error_status
       4.7.3.  slice_crc_parity
     4.8.  Parameters
       4.8.1.  version
       4.8.2.  micro_version
       4.8.3.  coder_type
       4.8.4.  state_transition_delta
       4.8.5.  colorspace_type
       4.8.6.  chroma_planes
       4.8.7.  bits_per_raw_sample
       4.8.8.  log2_h_chroma_subsample
       4.8.9.  log2_v_chroma_subsample
       4.8.10. alpha_plane
       4.8.11. num_h_slices
       4.8.12. num_v_slices
       4.8.13. quant_table_set_count
       4.8.14. states_coded
       4.8.15. initial_state_delta
       4.8.16. ec
       4.8.17. intra
     4.9.  Quantization Table Set
       4.9.1.  quant_tables
       4.9.2.  context_count
   5.  Restrictions
   6.  Security Considerations
   7.  Appendixes
     7.1.  Decoder implementation suggestions
       7.1.1.  Multi-threading support and independence of slices
   8.  Changelog
   9.  ToDo
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document describes FFV1, a lossless video encoding format.
   The design of FFV1 considers the storage of image characteristics,
   data fixity, and the optimized use of encoding time and storage
   requirements. FFV1 is designed to support a wide range of lossless
   video applications such as long-term audiovisual preservation,
   scientific imaging, screen recording, and other video encoding
   scenarios that seek to avoid the generational loss of lossy video
   encodings.

   This document defines versions 0, 1, and 3 of FFV1. The
   distinctions between the versions are provided throughout the
   document, but in summary:

   o  Version 0 of FFV1 was the original implementation of FFV1 and has
      been in non-experimental use since April 14, 2006 [FFV1_V0].

   o  Version 1 of FFV1 adds support for more video bit depths and has
      been in use since April 24, 2009 [FFV1_V1].

   o  Version 2 of FFV1 only existed in experimental form and is not
      described by this document.

   o  Version 3 of FFV1 adds several features such as increased
      description of the characteristics of the encoding images and
      embedded CRC data to support fixity verification of the encoding.
      Version 3 has been in non-experimental use since August 17, 2013
      [FFV1_V3].

   The latest version of this document is available at

   This document assumes familiarity with mathematical and coding
   concepts such as Range coding [range-coding] and YCbCr color spaces
   [YCbCr].

2.  Notation and Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.1.  Definitions

   "Sample": The smallest addressable representation of a color
   component or a luma component in a frame. Examples of samples are
   Luma, Blue Chrominance, Red Chrominance, Alpha, Red, Green, Blue.

   "Pixel": The smallest addressable representation of a color in a
   frame.
   It is composed of 1 or more samples.

   "ESC": An ESCape symbol to indicate that the symbol to be stored is
   too large for normal storage and that an alternate storage method
   is used.

   "MSB": Most Significant Bit, the bit that can cause the largest
   change in magnitude of the symbol.

   "RCT": Reversible Color Transform, a near linear, exactly reversible
   integer transform that converts between RGB and YCbCr representations
   of a Pixel.

   "VLC": Variable Length Code, a code which maps source symbols to a
   variable number of bits.

   "RGB": A reference to the method of storing the value of a Pixel by
   using three numeric values that represent Red, Green, and Blue.

   "YCbCr": A reference to the method of storing the value of a Pixel by
   using three numeric values that represent the luma of the Pixel (Y)
   and the chrominance of the Pixel (Cb and Cr). The term YCbCr is used
   for historical reasons and currently references any color space
   relying on 1 luma and 2 chrominances, e.g. YCbCr, YCgCo or ICtCp. The
   exact meaning of the three numeric values is unspecified.

   "TBA": To Be Announced. Used in reference to the development of
   future iterations of the FFV1 specification.

2.2.  Conventions

   Note: the operators and the order of precedence are the same as used
   in the C programming language [ISO.9899.1990].

2.2.1.  Arithmetic operators

   "a + b" means a plus b.

   "a - b" means a minus b.

   "-a" means negation of a.

   "a * b" means a multiplied by b.

   "a / b" means a divided by b.

   "a & b" means bit-wise "and" of a and b.

   "a | b" means bit-wise "or" of a and b.

   "a >> b" means arithmetic right shift of two's complement integer
   representation of a by b binary digits.

   "a << b" means arithmetic left shift of two's complement integer
   representation of a by b binary digits.

2.2.2.  Assignment operators

   "a = b" means a is assigned b.

   "a++" is equivalent to a is assigned a + 1.

   "a--" is equivalent to a is assigned a - 1.

   "a += b" is equivalent to a is assigned a + b.

   "a -= b" is equivalent to a is assigned a - b.

   "a *= b" is equivalent to a is assigned a * b.

2.2.3.  Comparison operators

   "a > b" means a is greater than b.

   "a >= b" means a is greater than or equal to b.

   "a < b" means a is less than b.

   "a <= b" means a is less than or equal to b.

   "a == b" means a is equal to b.

   "a != b" means a is not equal to b.

   "a && b" means Boolean logical "and" of a and b.

   "a || b" means Boolean logical "or" of a and b.

   "!a" means Boolean logical "not" of a.

   "a ? b : c" if a is true, then b, otherwise c.

2.2.4.  Mathematical functions

   floor(a) the largest integer less than or equal to a

   ceil(a) the smallest integer greater than or equal to a

   sign(a) extracts the sign of a number, i.e. if a < 0 then -1, else if
   a > 0 then 1, else 0

   abs(a) the absolute value of a, i.e. abs(a) = sign(a)*a

   log2(a) the base-two logarithm of a

   min(a,b) the smallest of two values a and b

   max(a,b) the largest of two values a and b

   median(a,b,c) the numerical middle value in a data set of a, b, and
   c, i.e. a+b+c-min(a,b,c)-max(a,b,c)

   a_{b} the b-th value of a sequence of a

   a_{b,c} the 'b,c'-th value of a sequence of a

2.2.5.  Order of operation precedence

   When order of precedence is not indicated explicitly by use of
   parentheses, operations are evaluated in the following order (from
   top to bottom, operations of same precedence being evaluated from
   left to right). This order of operations is based on the order of
   operations used in Standard C.

   a++, a--
   !a, -a
   a * b, a / b, a % b
   a + b, a - b
   a << b, a >> b
   a < b, a <= b, a > b, a >= b
   a == b, a != b
   a & b
   a | b
   a && b
   a || b
   a ? b : c
   a = b, a += b, a -= b, a *= b

2.2.6.  Pseudo-code

   Several components of FFV1 are described in this document using
   pseudo-code. Note that the pseudo-code is used for clarity in order
   to illustrate the structure of FFV1 and is not intended to specify
   any particular implementation. The pseudo-code used is based upon
   the C programming language [ISO.9899.1990] and uses its "if/else",
   "while" and "for" statements as well as functions defined within
   this document.

2.2.7.  Range

   "a...b" means any value starting from a to b, inclusive.

2.2.8.  NumBytes

   "NumBytes" is a non-negative integer that expresses the size in 8-bit
   octets of particular FFV1 components such as the "Configuration
   Record" and "Frame". FFV1 relies on its container to store the
   "NumBytes" values, see Section 4.1.3.

2.2.9.  Bitstream functions

2.2.9.1.  remaining_bits_in_bitstream

   "remaining_bits_in_bitstream( )" means the count of remaining bits
   after the pointer in that bitstream component. It is computed from
   the "NumBytes" value multiplied by 8 minus the count of bits of that
   component already read by the bitstream parser.

2.2.9.2.  byte_aligned

   "byte_aligned( )" is true if "remaining_bits_in_bitstream( NumBytes
   )" is a multiple of 8, otherwise false.

2.2.9.3.  get_bits

   "get_bits( i )" is the action to read the next "i" bits in the
   bitstream, from most significant bit to least significant bit, and to
   return the corresponding value. The pointer is increased by "i".

3.  General Description

   Samples within a plane are coded in raster scan order (left->right,
   top->bottom). Each sample is predicted by the median predictor from
   samples in the same plane and the difference is stored; see
   Section 3.8.

3.1.  Border

   A border is assumed for each coded slice for the purpose of the
   predictor and context according to the following rules:

   o  one column of samples to the left of the coded slice is assumed as
      identical to the samples of the leftmost column of the coded slice
      shifted down by one row. The value of the topmost sample of the
      column of samples to the left of the coded slice is assumed to be
      "0"

   o  one column of samples to the right of the coded slice is assumed
      as identical to the samples of the rightmost column of the coded
      slice

   o  an additional column of samples to the left of the coded slice and
      two rows of samples above the coded slice are assumed to be "0"

   The following table depicts a slice of samples "a,b,c,d,e,f,g,h,i"
   along with its assumed border.

   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | 0 | 0 | 0 |   | 0 |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | 0 | 0 | 0 |   | 0 |
   +---+---+---+---+---+---+---+---+
   |   |   |   |   |   |   |   |   |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | a | b | c |   | c |
   +---+---+---+---+---+---+---+---+
   | 0 | a |   | d | e | f |   | f |
   +---+---+---+---+---+---+---+---+
   | 0 | d |   | g | h | i |   | i |
   +---+---+---+---+---+---+---+---+

3.2.  Samples

   Positions used for context and median predictor are:

   +---+---+---+---+
   |   |   | T |   |
   +---+---+---+---+
   |   |tl | t |tr |
   +---+---+---+---+
   | L | l | X |   |
   +---+---+---+---+

   "X" is the current processed Sample. The identifiers are made of the
   first letters of the words Top, Left and Right.

3.3.  Median predictor

   The prediction for any sample value at position "X" may be computed
   based upon the relative neighboring values of "l", "t", and "tl" via
   this equation:

   "median(l, t, l + t - tl)".

   Note, this prediction template is also used in [ISO.14495-1.1999] and
   [HuffYUV].
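
   As an illustration only, the median prediction could be written in C
   roughly as follows. The function names median3 and predict are
   hypothetical and do not appear in this specification. The sketch
   assumes that the border rules of Section 3.1 have already supplied
   "l", "t" and "tl", and it does not handle the exception for 16-bit
   content described in this section.

```c
#include <assert.h>

/* Middle value of three integers, i.e. median(a,b,c) of Section 2.2.4.
 * The name median3 is illustrative, not part of the specification. */
int median3(int a, int b, int c)
{
    int lo = a < b ? a : b;   /* min(a,b) */
    int hi = a < b ? b : a;   /* max(a,b) */
    assert(lo <= hi);
    if (c < lo)
        return lo;            /* c is the smallest, so lo is the middle */
    if (c > hi)
        return hi;            /* c is the largest, so hi is the middle */
    return c;                 /* c lies between lo and hi */
}

/* Prediction for sample X from its left (l), top (t) and top-left (tl)
 * neighbors.  The name predict is illustrative. */
int predict(int l, int t, int tl)
{
    return median3(l, t, l + t - tl);
}
```

   For example, predict(10, 20, 15) evaluates median(10, 20, 15) and
   yields 15, while predict(10, 20, 5) evaluates median(10, 20, 25) and
   yields 20.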

   Exception for the median predictor: if "colorspace_type == 0 &&
   bits_per_raw_sample == 16 && ( coder_type == 1 || coder_type == 2 )",
   the following median predictor MUST be used:

   "median(left16s, top16s, left16s + top16s - diag16s)"

   where:

   left16s = l  >= 32768 ? ( l  - 65536 ) : l
   top16s  = t  >= 32768 ? ( t  - 65536 ) : t
   diag16s = tl >= 32768 ? ( tl - 65536 ) : tl

   Background: a two's complement 16-bit signed integer was used for
   storing sample values in all known implementations of FFV1
   bitstream. So in some circumstances, the most significant bit was
   wrongly interpreted (used as a sign bit instead of the 16th bit of an
   unsigned integer). Note that when the issue was discovered, the only
   configuration of all known implementations being impacted was 16-bit
   YCbCr with no Pixel transformation with Range Coder coder, as other
   potentially impacted configurations (e.g. 15/16-bit JPEG2000-RCT with
   Range Coder coder, or 16-bit content with Golomb Rice coder) were
   implemented nowhere. In the meantime, 16-bit JPEG2000-RCT with
   Range Coder coder was implemented without this issue in one
   implementation and validated by one conformance checker. It is
   expected (to be confirmed) that this exception for the median
   predictor will be removed in the next version of the FFV1 bitstream.

3.4.  Context

   Relative to any sample "X", the Quantized Sample Differences "L-l",
   "l-tl", "tl-t", "T-t", and "t-tr" are used as context:

   context = Q_{0}[l - tl] +
             Q_{1}[tl - t] +
             Q_{2}[t - tr] +
             Q_{3}[L - l]  +
             Q_{4}[T - t]

   If "context >= 0" then "context" is used and the difference between
   the sample and its predicted value is encoded as is, else "-context"
   is used and the difference between the sample and its predicted value
   is encoded with a flipped sign.

3.5.  Quantization Table Sets

   The FFV1 bitstream contains 1 or more Quantization Table Sets.
   Each Quantization Table Set contains exactly 5 Quantization Tables,
   each Quantization Table corresponding to 1 of the 5 Quantized Sample
   Differences. For each Quantization Table, both the number of
   quantization steps and their distribution are stored in the FFV1
   bitstream; each Quantization Table has exactly 256 entries, and the 8
   least significant bits of the Quantized Sample Difference are used as
   index:

   Q_{j}[k] = quant_tables[i][j][k&255]

   In this formula, "i" is the Quantization Table Set index, "j" is the
   Quantization Table index, and "k" is the Quantized Sample Difference.

3.6.  Quantization Table Set indexes

   For each plane of each slice, a Quantization Table Set is selected
   from an index:

   o  For Y plane, "quant_table_set_index [ 0 ]" index is used

   o  For Cb and Cr planes, "quant_table_set_index [ 1 ]" index is used

   o  For Alpha plane, "quant_table_set_index [ (version <= 3 ||
      chroma_planes) ? 2 : 1 ]" index is used

   Background: in the first implementations of FFV1 bitstream, the index
   for Cb and Cr planes was stored even if it was not used
   (chroma_planes set to 0). This index is kept for version <= 3 in
   order to keep compatibility with FFV1 bitstreams in the wild.

3.7.  Color spaces

   FFV1 supports two color spaces: YCbCr and RGB. Both color spaces
   allow an optional Alpha plane that can be used to code transparency
   data.

3.7.1.  YCbCr

   In YCbCr color space, the Cb and Cr planes are optional, but if used
   then MUST be used together. Omitting the Cb and Cr planes codes the
   frames in grayscale without color data. An FFV1 "Frame" using YCbCr
   MUST use one of the following arrangements:

   o  Y

   o  Y, Alpha

   o  Y, Cb, Cr

   o  Y, Cb, Cr, Alpha

   The Y plane MUST be coded first. If the Cb and Cr planes are used
   then they MUST be coded after the Y plane. If an Alpha
   (transparency) plane is used, then it MUST be coded last.

3.7.2.  RGB

   JPEG2000-RCT is a Reversible Color Transform that codes RGB (red,
   green, blue) planes losslessly in a modified YCbCr color space.
   Reversible Pixel transformations between YCbCr and RGB use the
   following formulae.

   Cb=b-g

   Cr=r-g

   Y=g+((Cb+Cr)>>2)

   g=Y-((Cb+Cr)>>2)

   r=Cr+g

   b=Cb+g

   Exception for the JPEG2000-RCT conversion: if bits_per_raw_sample is
   between 9 and 15 inclusive, the following formulae for reversible
   conversions between YCbCr and RGB MUST be used instead of the ones
   above:

   Cb=g-b

   Cr=r-b

   Y=b+((Cb+Cr)>>2)

   b=Y-((Cb+Cr)>>2)

   r=Cr+b

   g=Cb+b

   Background: At the time of this writing, in all known implementations
   of FFV1 bitstream, when bits_per_raw_sample was between 9 and 15
   inclusive, GBR planes were used as BGR planes during both encoding
   and decoding. In the meantime, 16-bit JPEG2000-RCT was implemented
   without this issue in one implementation and validated by one
   conformance checker. Methods to address this exception for the
   transform are under consideration for the next version of the FFV1
   bitstream.

   [ISO.15444-1.2016]

   When FFV1 uses the JPEG2000-RCT, the horizontal lines are interleaved
   to improve caching efficiency since it is most likely that the RCT
   will immediately be converted to RGB during decoding. The
   interleaved coding order is also Y, then Cb, then Cr, and then if
   used Alpha.
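
   As a rough sketch of the default transform formulae of this section
   (the ones that apply when bits_per_raw_sample is not between 9 and
   15), the following C fragment converts one Pixel in each direction.
   The type and function names (rgb_px, ycc_px, rct_forward,
   rct_inverse, rct_round_trips) are hypothetical. The sketch assumes
   that the shift applies to the sum Cb+Cr only, that input samples are
   non-negative, and that the compiler implements ">>" on negative
   values as the arithmetic shift of Section 2.2.1.

```c
#include <assert.h>

/* Hypothetical container types; not defined by this specification. */
typedef struct { int r, g, b; } rgb_px;
typedef struct { int y, cb, cr; } ycc_px;

/* RGB -> modified YCbCr, default formulae. */
ycc_px rct_forward(rgb_px in)
{
    ycc_px out;
    /* Assumption: raw samples are non-negative. */
    assert(in.r >= 0 && in.g >= 0 && in.b >= 0);
    out.cb = in.b - in.g;
    out.cr = in.r - in.g;
    /* Assumption: the shift applies to (Cb+Cr) only. */
    out.y  = in.g + ((out.cb + out.cr) >> 2);
    return out;
}

/* Modified YCbCr -> RGB; undoes rct_forward(). */
rgb_px rct_inverse(ycc_px in)
{
    rgb_px out;
    out.g = in.y - ((in.cb + in.cr) >> 2);
    out.r = in.cr + out.g;
    out.b = in.cb + out.g;
    return out;
}

/* Returns 1 if a Pixel survives a forward and an inverse transform. */
int rct_round_trips(rgb_px p)
{
    rgb_px q = rct_inverse(rct_forward(p));
    return q.r == p.r && q.g == p.g && q.b == p.b;
}
```

   For the Pixel (r, g, b) = (200, 100, 50) this gives Cb = -50,
   Cr = 100 and Y = 100 + (50 >> 2) = 112. The inverse subtracts the
   same shifted term, so it recovers g = 100, r = 200 and b = 50
   exactly.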

   As an example, a "Frame" that is two pixels wide and two pixels high
   could be composed of the following structure:

   +------------------------+------------------------+
   | Pixel[1,1]             | Pixel[2,1]             |
   | Y[1,1] Cb[1,1] Cr[1,1] | Y[2,1] Cb[2,1] Cr[2,1] |
   +------------------------+------------------------+
   | Pixel[1,2]             | Pixel[2,2]             |
   | Y[1,2] Cb[1,2] Cr[1,2] | Y[2,2] Cb[2,2] Cr[2,2] |
   +------------------------+------------------------+

   In JPEG2000-RCT, the coding order would be left to right and then top
   to bottom, with values interleaved by lines and stored in this order:

   Y[1,1] Y[2,1] Cb[1,1] Cb[2,1] Cr[1,1] Cr[2,1] Y[1,2] Y[2,2] Cb[1,2]
   Cb[2,2] Cr[1,2] Cr[2,2]

3.8.  Coding of the Sample Difference

   Instead of coding the n+1 bits of the Sample Difference with Huffman
   or Range coding (or n+2 bits, in the case of RCT), only the n (or
   n+1) least significant bits are used, since this is sufficient to
   recover the original sample. In the equation below, the term "bits"
   represents bits_per_raw_sample+1 for RCT or bits_per_raw_sample
   otherwise:

   coder_input =
       [(sample_difference + 2^(bits-1)) & (2^bits - 1)] - 2^(bits-1)

3.8.1.  Range coding mode

   Early experimental versions of FFV1 used the CABAC Arithmetic coder
   from H.264 as defined in [ISO.14496-10.2014] but due to the uncertain
   patent/royalty situation, as well as its slightly worse performance,
   CABAC was replaced by a Range coder based on an algorithm defined by
   G. Nigel N. Martin in 1979 [range-coding].

3.8.1.1.  Range binary values

   To encode binary digits efficiently a Range coder is used. "C_{i}"
   is the i-th Context. "B_{i}" is the i-th byte of the bytestream.
   "b_{i}" is the i-th Range coded binary value, "S_{0,i}" is the i-th
   initial state, which is 128. The length of the bytestream encoding n
   binary symbols is "j_{n}" bytes.

   r_{i} = floor( ( R_{i} * S_{i,C_{i}} ) / 2^8 )

   S_{i+1,C_{i}} = zero_state_{S_{i,C_{i}}} AND
             l_i = L_i                      AND
             t_i = R_i - r_i                <==
             b_i = 0                        <==>
             L_i < R_i - r_i

   S_{i+1,C_{i}} = one_state_{S_{i,C_{i}}}  AND
             l_i = L_i - R_i + r_i          AND
             t_i = r_i                      <==
             b_i = 1                        <==>
             L_i >= R_i - r_i

   S_{i+1,k} = S_{i,k} <== C_i != k

   R_{i+1} = 2^8 * t_{i}                    AND
   L_{i+1} = 2^8 * l_{i} + B_{j_{i}}        AND
   j_{i+1} = j_{i} + 1                      <==
   t_{i} < 2^8

   R_{i+1} = t_{i}                          AND
   L_{i+1} = l_{i}                          AND
   j_{i+1} = j_{i}                          <==
   t_{i} >= 2^8

   R_{0} = 65280

   L_{0} = 2^8 * B_{0} + B_{1}

   j_{0} = 2

3.8.1.2.  Range non binary values

   To encode scalar integers, it would be possible to encode each bit
   separately and use the past bits as context. However that would mean
   255 contexts per 8-bit symbol, which is not only a waste of memory
   but also requires more past data to reach a reasonably good estimate
   of the probabilities. Alternatively, assuming a Laplacian
   distribution and only dealing with its variance and mean (as in
   Huffman coding) would also be possible. However, for maximum
   flexibility and simplicity, the chosen method uses a single symbol to
   encode if a number is 0 and, if not, encodes the number using its
   exponent, mantissa and sign. The exact contexts used are best
   described by the following code, followed by some comments.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   void put_symbol(RangeCoder *c, uint8_t *state, int v, int \   |
   is_signed) {                                                  |
       int i;                                                    |
       put_rac(c, state+0, !v);                                  |
       if (v) {                                                  |
           int a= abs(v);                                        |
           int e= log2(a);                                       |
                                                                 |
           for (i=0; i<e; i++)                                   |
               put_rac(c, state+1+min(i,9), 1);  //1..10         |
                                                                 |
           put_rac(c, state+1+min(i,9), 0);                      |
           for (i=e-1; i>=0; i--)                                |
               put_rac(c, state+22+min(i,9), (a>>i)&1); //22..31 |
                                                                 |
           if (is_signed)                                        |
               put_rac(c, state+11 + min(e, 10), v < 0); //11..21|
       }                                                         |
   }                                                             |

3.8.1.3.  Initial values for the context model

   At keyframes all Range coder state variables are set to their initial
   state.

3.8.1.4.  State transition table

   one_state_{i} =
       default_state_transition_{i} + state_transition_delta_{i}

   zero_state_{i} = 256 - one_state_{256-i}

3.8.1.5.  default_state_transition

     0,  0,  0,  0,  0,  0,  0,  0, 20, 21, 22, 23, 24, 25, 26, 27,

    28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 37, 38, 39, 40, 41, 42,

    43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 56, 57,

    58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,

    74, 75, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,

    89, 90, 91, 92, 93, 94, 94, 95, 96, 97, 98, 99,100,101,102,103,

   104,105,106,107,108,109,110,111,112,113,114,114,115,116,117,118,

   119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,133,

   134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,

   150,151,152,152,153,154,155,156,157,158,159,160,161,162,163,164,

   165,166,167,168,169,170,171,171,172,173,174,175,176,177,178,179,

   180,181,182,183,184,185,186,187,188,189,190,190,191,192,194,194,

   195,196,197,198,199,200,201,202,202,204,205,206,207,208,209,209,

   210,211,212,213,215,215,216,217,218,219,220,220,222,223,224,225,

   226,227,227,229,229,230,231,232,234,234,235,236,237,238,239,240,

   241,242,243,244,245,246,247,248,248,  0,  0,  0,  0,  0,  0,  0,

3.8.1.6.  alternative state transition table

   The alternative state transition table has been built using iterative
   minimization of frame sizes and generally performs better than the
   default. To use it, the coder_type MUST be set to 2 and the
   difference to the default MUST be stored in the parameters. The
   reference implementation of FFV1 in FFmpeg uses this table by default
   at the time of this writing when Range coding is used.
     0, 10, 10, 10, 10, 16, 16, 16, 28, 16, 16, 29, 42, 49, 20, 49,
    59, 25, 26, 26, 27, 31, 33, 33, 33, 34, 34, 37, 67, 38, 39, 39,
    40, 40, 41, 79, 43, 44, 45, 45, 48, 48, 64, 50, 51, 52, 88, 52,
    53, 74, 55, 57, 58, 58, 74, 60,101, 61, 62, 84, 66, 66, 68, 69,
    87, 82, 71, 97, 73, 73, 82, 75,111, 77, 94, 78, 87, 81, 83, 97,
    85, 83, 94, 86, 99, 89, 90, 99,111, 92, 93,134, 95, 98,105, 98,
   105,110,102,108,102,118,103,106,106,113,109,112,114,112,116,125,
   115,116,117,117,126,119,125,121,121,123,145,124,126,131,127,129,
   165,130,132,138,133,135,145,136,137,139,146,141,143,142,144,148,
   147,155,151,149,151,150,152,157,153,154,156,168,158,162,161,160,
   172,163,169,164,166,184,167,170,177,174,171,173,182,176,180,178,
   175,189,179,181,186,183,192,185,200,187,191,188,190,197,193,196,
   197,194,195,196,198,202,199,201,210,203,207,204,205,206,208,214,
   209,211,221,212,213,215,224,216,217,218,219,220,222,228,223,225,
   226,224,227,229,240,230,231,232,233,234,235,236,238,239,237,242,
   241,243,242,244,245,246,247,248,249,250,251,252,252,253,254,255,

3.8.2.  Golomb Rice mode

   This coding mode uses Golomb Rice codes.  The VLC is split into two
   parts: the prefix stores the most significant bits, and the suffix
   stores the k least significant bits or, in the ESC case, the whole
   number.  The end of the bitstream of the "Frame" is padded with
   0-bits until the bitstream contains a multiple of 8 bits.

3.8.2.1.  Prefix

   +----------------+-------+
   | bits           | value |
   +----------------+-------+
   | 1              | 0     |
   | 01             | 1     |
   | ...            | ...   |
   | 0000 0000 0001 | 11    |
   | 0000 0000 0000 | ESC   |
   +----------------+-------+

3.8.2.2.  Suffix

   +-------+-----------------------------------------------------------+
   | non   | the k least significant bits, MSB first                   |
   | ESC   |                                                           |
   | ESC   | the value - 11, in MSB first order; ESC may only be used  |
   |       | if the value cannot be coded as non ESC                   |
   +-------+-----------------------------------------------------------+

3.8.2.3.  Examples

   +-----+-------------------------+-------+
   | k   | bits                    | value |
   +-----+-------------------------+-------+
   | 0   | "1"                     | 0     |
   | 0   | "001"                   | 2     |
   | 2   | "1 00"                  | 0     |
   | 2   | "1 10"                  | 2     |
   | 2   | "01 01"                 | 5     |
   | any | "000000000000 10000000" | 139   |
   +-----+-------------------------+-------+

3.8.2.4.  Run mode

   Run mode is entered when the context is 0 and left as soon as a
   non-0 difference is found.  The level is identical to the predicted
   one.  The run and the first different level are coded.

3.8.2.5.  Run length coding

   The run value is encoded in two parts.  The prefix part stores the
   more significant part of the run and also adjusts the run_index,
   which determines the number of bits in the less significant part of
   the run.  The second part stores the less significant part of the
   run as it is.  The run_index is reset to 0 for each plane and slice.
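   As an illustration of the prefix and suffix scheme of
   Sections 3.8.2.1 to 3.8.2.3, the following C sketch decodes one
   unsigned value from a string of '0' and '1' characters.  It is a
   minimal sketch and not the reference implementation: the "BitString"
   type and the "next_bit" and "decode_golomb_rice" names are
   hypothetical, and the 8-bit ESC suffix length ("esc_len") is an
   assumption inferred from the last row of the examples table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit reader over a string of '0'/'1' characters.  Spaces
 * are skipped so the rows of the examples table can be used verbatim. */
typedef struct {
    const char *s;
    size_t pos;
} BitString;

static int next_bit(BitString *b)
{
    while (b->s[b->pos] == ' ')
        b->pos++;
    return b->s[b->pos++] - '0';
}

/* Decode one unsigned Golomb Rice value.
 * k       : number of suffix bits in the non-ESC case
 * esc_len : number of suffix bits in the ESC case (assumed, see lead-in) */
static unsigned decode_golomb_rice(BitString *b, unsigned k, unsigned esc_len)
{
    unsigned prefix = 0;
    unsigned suffix = 0;
    unsigned i;

    /* Count 0-bits up to the terminating 1-bit; twelve 0-bits mean ESC
     * and are not followed by a terminating 1-bit. */
    while (prefix < 12 && next_bit(b) == 0)
        prefix++;

    if (prefix < 12) {
        /* Non-ESC: prefix holds the high bits, suffix the k low bits. */
        for (i = 0; i < k; i++)
            suffix = (suffix << 1) | (unsigned)next_bit(b);
        return (prefix << k) | suffix;
    }

    /* ESC: the suffix stores the value minus 11, MSB first. */
    for (i = 0; i < esc_len; i++)
        suffix = (suffix << 1) | (unsigned)next_bit(b);
    return suffix + 11;
}
```

   With k = 2, the bit string "01 01" gives a prefix of 1 and a suffix
   of 1, so the decoded value is (1 << 2) | 1 = 5, which agrees with
   the examples table.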
   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   log2_run[41]={                                                |
    0, 0, 0, 0, 1, 1, 1, 1,                                      |
    2, 2, 2, 2, 3, 3, 3, 3,                                      |
    4, 4, 5, 5, 6, 6, 7, 7,                                      |
    8, 9,10,11,12,13,14,15,                                      |
   16,17,18,19,20,21,22,23,                                      |
   24,                                                           |
   };                                                            |
                                                                 |
   if (run_count == 0 && run_mode == 1) {                        |
       if (get_bits(1)) {                                        |
           run_count = 1 << log2_run[run_index];                 |
           if (x + run_count <= w)                               |
               run_index++;                                      |
       } else {                                                  |
           if (log2_run[run_index])                              |
               run_count = get_bits(log2_run[run_index]);        |
           else                                                  |
               run_count = 0;                                    |
           if (run_index)                                        |
               run_index--;                                      |
           run_mode = 2;                                         |
       }                                                         |
   }                                                             |

   The log2_run array is also used within [ISO.14495-1.1999].

3.8.2.6.  Level coding

   Level coding is identical to the normal difference coding with the
   exception that the 0 value is removed as it cannot occur:

       if (diff>0) diff--;
       encode(diff);

   Note that this is different from JPEG-LS, which doesn't use
   prediction in run mode and uses a different encoding and context
   model for the last difference.  On a small set of test samples, the
   use of prediction slightly improved the compression rate.

4.  Bitstream

   +--------+----------------------------------------------------------+
   | Symbol | Definition                                               |
   +--------+----------------------------------------------------------+
   | u(n)   | unsigned big endian integer using n bits                 |
   | sg     | Golomb Rice coded signed scalar symbol coded with the    |
   |        | method described in Section 3.8.2                        |
   | br     | Range coded Boolean (1-bit) symbol with the method       |
   |        | described in Section 3.8.1.1                             |
   | ur     | Range coded unsigned scalar symbol coded with the method |
   |        | described in Section 3.8.1.2                             |
   | sr     | Range coded signed scalar symbol coded with the method   |
   |        | described in Section 3.8.1.2                             |
   +--------+----------------------------------------------------------+

   The same context, which is initialized to 128, is used for all
   fields in the header.

   The following MUST be provided by external means during
   initialization of the decoder:

   "frame_pixel_width" is defined as "Frame" width in pixels.

   "frame_pixel_height" is defined as "Frame" height in pixels.

   Default values at the decoder initialization phase:

   "ConfigurationRecordIsPresent" is set to 0.

4.1.  Configuration Record

   In the case of an FFV1 bitstream with "version >= 3", a
   "Configuration Record" is stored in the underlying container, at the
   track header level.  It contains the parameters used for all
   instances of "Frame".  The size of the "Configuration Record",
   "NumBytes", is supplied by the underlying container.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   ConfigurationRecord( NumBytes ) {                             |
       ConfigurationRecordIsPresent = 1                          |
       Parameters( )                                             |
       while( remaining_bits_in_bitstream( NumBytes ) > 32 )     |
           reserved_for_future_use                               | u(1)
       configuration_record_crc_parity                           | u(32)
   }                                                             |

4.1.1.  reserved_for_future_use

   "reserved_for_future_use" has semantics that are reserved for future
   use.
   Encoders conforming to this version of this specification SHALL NOT
   write this value.
   Decoders conforming to this version of this specification SHALL
   ignore its value.

4.1.2.  configuration_record_crc_parity

   "configuration_record_crc_parity" is a 32-bit field whose value is
   chosen so that the "Configuration Record" as a whole has a CRC
   remainder of 0.
   This is equivalent to storing the CRC remainder in the 32-bit parity.
   The CRC generator polynomial used is the standard IEEE CRC polynomial
   (0x104C11DB7) with initial value 0.

4.1.3.  Mapping FFV1 into Containers

   This "Configuration Record" can be placed in any file format that
   supports "Configuration Records", following as closely as possible
   the way that file format stores "Configuration Records".  This
   version of this specification defines the "Configuration Record"
   storage location and "NumBytes" for the following container formats:

4.1.3.1.  AVI File Format

   The "Configuration Record" extends the stream format chunk ("AVI ",
   "hdrl", "strl", "strf") with the ConfigurationRecord bitstream.
   See [AVI] for more information about chunks.

   "NumBytes" is defined as the size, in bytes, of the strf chunk
   indicated in the chunk header minus the size of the stream format
   structure.

4.1.3.2.  ISO Base Media File Format

   The "Configuration Record" extends the sample description box
   ("moov", "trak", "mdia", "minf", "stbl", "stsd") with a "glbl" box
   which contains the ConfigurationRecord bitstream.  See
   [ISO.14496-12.2015] for more information about boxes.

   "NumBytes" is defined as the size, in bytes, of the "glbl" box
   indicated in the box header minus the size of the box header.

4.1.3.3.  NUT File Format

   The codec_specific_data element (in "stream_header" packet) contains
   the ConfigurationRecord bitstream.  See [NUT] for more information
   about elements.
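   Returning to the parity field of Section 4.1.2, the zero-remainder
   property can be sketched in C as follows.  This is a minimal sketch
   under stated assumptions, not the reference implementation: the
   function names are hypothetical, and the bit ordering (most
   significant bit first, no reflection, no final XOR) is an assumption
   that this document does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 with generator 0x104C11DB7 (the x^32 term is implicit
 * in the 32-bit constant below) and initial value 0.
 * Assumption: bits are processed MSB first, with no reflection and no
 * final XOR. */
static uint32_t crc32_msb_first(const uint8_t *data, size_t len)
{
    uint32_t crc = 0;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u
                                      : (crc << 1);
    }
    return crc;
}

/* Append the 32-bit parity, big endian, after len payload bytes.
 * buf must have room for len + 4 bytes.  Returns the new length. */
static size_t append_crc_parity(uint8_t *buf, size_t len)
{
    uint32_t crc = crc32_msb_first(buf, len);

    buf[len + 0] = (uint8_t)(crc >> 24);
    buf[len + 1] = (uint8_t)(crc >> 16);
    buf[len + 2] = (uint8_t)(crc >> 8);
    buf[len + 3] = (uint8_t)crc;
    return len + 4;
}
```

   Under these assumptions, running crc32_msb_first over the payload
   followed by its parity returns 0, so a reader can validate a record
   by checking for a zero remainder instead of comparing two values.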
   "NumBytes" is defined as the size, in bytes, of the
   codec_specific_data element as indicated in the "length" field of
   codec_specific_data.

4.1.3.4.  Matroska File Format

   FFV1 SHOULD use "V_FFV1" as the Matroska "Codec ID".  For FFV1
   versions 2 or less, the Matroska "CodecPrivate" Element SHOULD NOT be
   used.  For FFV1 versions 3 or greater, the Matroska "CodecPrivate"
   Element MUST contain the FFV1 "Configuration Record" structure and no
   other data.  See [Matroska] for more information about elements.

   "NumBytes" is defined as the "Element Data Size" of the
   "CodecPrivate" Element.

4.2.  Frame

   A "Frame" consists of the keyframe field, parameters (if version
   <= 1), and a sequence of independent slices.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Frame( NumBytes ) {                                           |
       keyframe                                                  | br
       if (keyframe && !ConfigurationRecordIsPresent)            |
           Parameters( )                                         |
       while ( remaining_bits_in_bitstream( NumBytes ) )         |
           Slice( )                                              |
   }                                                             |

4.3.  Slice

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Slice( ) {                                                    |
       if (version >= 3)                                         |
           SliceHeader( )                                        |
       SliceContent( )                                           |
       if (coder_type == 0)                                      |
           while (!byte_aligned())                               |
               padding                                           | u(1)
       if (version >= 3)                                         |
           SliceFooter( )                                        |
   }                                                             |

   "padding" specifies a bit without any significance that is used only
   for byte alignment.  It MUST be 0.

4.4.  Slice Header

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   SliceHeader( ) {                                              |
       slice_x                                                   | ur
       slice_y                                                   | ur
       slice_width - 1                                           | ur
       slice_height - 1                                          | ur
       for( i = 0; i < quant_table_set_index_count; i++ )        |
           quant_table_set_index [ i ]                           | ur
       picture_structure                                         | ur
       sar_num                                                   | ur
       sar_den                                                   | ur
       if (version >= 4) {                                       |
           reset_contexts                                        | br
           slice_coding_mode                                     | ur
       }                                                         |
   }                                                             |

4.4.1.  slice_x

   "slice_x" indicates the x position on the slice raster formed by
   num_h_slices.
   Inferred to be 0 if not present.

4.4.2.  slice_y

   "slice_y" indicates the y position on the slice raster formed by
   num_v_slices.
   Inferred to be 0 if not present.

4.4.3.  slice_width

   "slice_width" indicates the width on the slice raster formed by
   num_h_slices.
   Inferred to be 1 if not present.

4.4.4.  slice_height

   "slice_height" indicates the height on the slice raster formed by
   num_v_slices.
   Inferred to be 1 if not present.

4.4.5.  quant_table_set_index_count

   "quant_table_set_index_count" is defined as "1 + ( ( chroma_planes ||
   version <= 3 ) ? 1 : 0 ) + ( alpha_plane ? 1 : 0 )".

4.4.6.  quant_table_set_index

   "quant_table_set_index" indicates the Quantization Table Set index to
   select the Quantization Table Set and the initial states for the
   slice.
   Inferred to be 0 if not present.

4.4.7.  picture_structure

   "picture_structure" specifies the temporal and spatial relationship
   of each line of the "Frame".
   Inferred to be 0 if not present.

   +-------+-------------------------+
   | value | picture structure used  |
   +-------+-------------------------+
   | 0     | unknown                 |
   | 1     | top field first         |
   | 2     | bottom field first      |
   | 3     | progressive             |
   | Other | reserved for future use |
   +-------+-------------------------+

4.4.8.  sar_num

   "sar_num" specifies the sample aspect ratio numerator.
   Inferred to be 0 if not present.
   MUST be 0 if the sample aspect ratio is unknown.

4.4.9.  sar_den

   "sar_den" specifies the sample aspect ratio denominator.
   Inferred to be 0 if not present.
   MUST be 0 if the sample aspect ratio is unknown.

4.4.10.  reset_contexts

   "reset_contexts" indicates if slice contexts must be reset.
   Inferred to be 0 if not present.

4.4.11.  slice_coding_mode

   "slice_coding_mode" indicates the slice coding mode.
   Inferred to be 0 if not present.

   +-------+-----------------------------+
   | value | slice coding mode           |
   +-------+-----------------------------+
   | 0     | Range Coding or Golomb Rice |
   | 1     | raw PCM                     |
   | Other | reserved for future use     |
   +-------+-----------------------------+

4.5.  Slice Content

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   SliceContent( ) {                                             |
       if (colorspace_type == 0) {                               |
           for( p = 0; p < primary_color_count; p++ )            |
               for( y = 0; y < plane_pixel_height[ p ]; y++ )    |
                   Line( p, y )                                  |
       } else if (colorspace_type == 1) {                        |
           for( y = 0; y < slice_pixel_height; y++ )             |
               for( p = 0; p < primary_color_count; p++ )        |
                   Line( p, y )                                  |
       }                                                         |
   }                                                             |

4.5.1.  primary_color_count

   "primary_color_count" is defined as "1 + ( chroma_planes ? 2 : 0 ) +
   ( alpha_plane ? 1 : 0 )".

4.5.2.  plane_pixel_height

   "plane_pixel_height[ p ]" is the height in pixels of plane p of the
   slice.
   The value of "plane_pixel_height[ 0 ]" and of "plane_pixel_height[ 1
   + ( chroma_planes ? 2 : 0 ) ]" is "slice_pixel_height".
   If "chroma_planes" is set to 1, the value of "plane_pixel_height[ 1
   ]" and of "plane_pixel_height[ 2 ]" is "ceil(slice_pixel_height / (1
   << log2_v_chroma_subsample))".

4.5.3.  slice_pixel_height

   "slice_pixel_height" is the height in pixels of the slice.
   Its value is "floor(( slice_y + slice_height ) * frame_pixel_height /
   num_v_slices) - slice_pixel_y".

4.5.4.  slice_pixel_y

   "slice_pixel_y" is the slice vertical position in pixels.
   Its value is "floor(slice_y * frame_pixel_height / num_v_slices)".

4.6.  Line

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Line( p, y ) {                                                |
       if (colorspace_type == 0) {                               |
           for( x = 0; x < plane_pixel_width[ p ]; x++ )         |
               sample_difference[ p ][ y ][ x ]                  |
       } else if (colorspace_type == 1) {                        |
           for( x = 0; x < slice_pixel_width; x++ )              |
               sample_difference[ p ][ y ][ x ]                  |
       }                                                         |
   }                                                             |

4.6.1.  plane_pixel_width

   "plane_pixel_width[ p ]" is the width in pixels of plane p of the
   slice.
   The value of "plane_pixel_width[ 0 ]" and of "plane_pixel_width[ 1 +
   ( chroma_planes ? 2 : 0 ) ]" is "slice_pixel_width".
   If "chroma_planes" is set to 1, the value of "plane_pixel_width[ 1 ]"
   and of "plane_pixel_width[ 2 ]" is "ceil(slice_pixel_width / (1 <<
   log2_h_chroma_subsample))".

4.6.2.  slice_pixel_width

   "slice_pixel_width" is the width in pixels of the slice.
   Its value is "floor(( slice_x + slice_width ) * frame_pixel_width /
   num_h_slices) - slice_pixel_x".

4.6.3.  slice_pixel_x

   "slice_pixel_x" is the slice horizontal position in pixels.
   Its value is "floor(slice_x * frame_pixel_width / num_h_slices)".

4.6.4.  sample_difference

   "sample_difference[ p ][ y ][ x ]" is the sample difference for the
   sample at plane "p", y position "y", and x position "x".  The sample
   value is computed based on the prediction and context described in
   Section 3.2.

4.7.  Slice Footer

   Note: slice footer is always byte aligned.
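   Before turning to the footer syntax, the slice geometry of
   Sections 4.5.2 to 4.6.3 can be illustrated with a short C sketch.
   It is not taken from the reference implementation: "SliceRect",
   "raster_to_pixel", "slice_rect", and "chroma_size" are hypothetical
   names, and the sketch assumes that slice positions are scaled by the
   frame dimensions ("frame_pixel_width" and "frame_pixel_height"), in
   line with the definitions of "slice_pixel_x" and "slice_pixel_y".

```c
#include <assert.h>
#include <stdint.h>

/* Pixel rectangle covered by one slice (hypothetical helper type). */
typedef struct {
    uint32_t x, y, width, height;
} SliceRect;

/* floor(pos * frame_size / num_slices), using a 64-bit intermediate so
 * that the multiplication cannot overflow. */
static uint32_t raster_to_pixel(uint32_t pos, uint32_t frame_size,
                                uint32_t num_slices)
{
    return (uint32_t)((uint64_t)pos * frame_size / num_slices);
}

/* Map a slice position on the slice raster to a pixel rectangle. */
static SliceRect slice_rect(uint32_t frame_pixel_width,
                            uint32_t frame_pixel_height,
                            uint32_t num_h_slices, uint32_t num_v_slices,
                            uint32_t slice_x, uint32_t slice_y,
                            uint32_t slice_width, uint32_t slice_height)
{
    SliceRect r;

    r.x = raster_to_pixel(slice_x, frame_pixel_width, num_h_slices);
    r.y = raster_to_pixel(slice_y, frame_pixel_height, num_v_slices);
    r.width = raster_to_pixel(slice_x + slice_width, frame_pixel_width,
                              num_h_slices) - r.x;
    r.height = raster_to_pixel(slice_y + slice_height, frame_pixel_height,
                               num_v_slices) - r.y;
    return r;
}

/* ceil(luma_size / (1 << log2_subsample)) for a chroma plane. */
static uint32_t chroma_size(uint32_t luma_size, unsigned log2_subsample)
{
    return (luma_size + (1u << log2_subsample) - 1) >> log2_subsample;
}
```

   Because each slice ends where the next one starts, the slice heights
   of a column always add up to the frame height, even when the frame
   height is not a multiple of num_v_slices.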
   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   SliceFooter( ) {                                              |
       slice_size                                                | u(24)
       if (ec) {                                                 |
           error_status                                          | u(8)
           slice_crc_parity                                      | u(32)
       }                                                         |
   }                                                             |

4.7.1.  slice_size

   "slice_size" indicates the size of the slice in bytes.
   Note: this allows finding the start of slices before previous slices
   have been fully decoded, which enables parallel decoding as well as
   error resilience.

4.7.2.  error_status

   "error_status" specifies the error status.

   +-------+---------------------------------------+
   | value | error status                          |
   +-------+---------------------------------------+
   | 0     | no error                              |
   | 1     | slice contains a correctable error    |
   | 2     | slice contains an uncorrectable error |
   | Other | reserved for future use               |
   +-------+---------------------------------------+

4.7.3.  slice_crc_parity

   "slice_crc_parity" is a 32-bit field whose value is chosen so that
   the slice as a whole has a CRC remainder of 0.
   This is equivalent to storing the CRC remainder in the 32-bit parity.
   The CRC generator polynomial used is the standard IEEE CRC polynomial
   (0x104C11DB7) with initial value 0.

4.8.  Parameters

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Parameters( ) {                                               |
       version                                                   | ur
       if (version >= 3)                                         |
           micro_version                                         | ur
       coder_type                                                | ur
       if (coder_type > 1)                                       |
           for (i = 1; i < 256; i++)                             |
               state_transition_delta[ i ]                       | sr
       colorspace_type                                           | ur
       if (version >= 1)                                         |
           bits_per_raw_sample                                   | ur
       chroma_planes                                             | br
       log2_h_chroma_subsample                                   | ur
       log2_v_chroma_subsample                                   | ur
       alpha_plane                                               | br
       if (version >= 3) {                                       |
           num_h_slices - 1                                      | ur
           num_v_slices - 1                                      | ur
           quant_table_set_count                                 | ur
       }                                                         |
       for( i = 0; i < quant_table_set_count; i++ )              |
           QuantizationTableSet( i )                             |
       if (version >= 3) {                                       |
           for( i = 0; i < quant_table_set_count; i++ ) {        |
               states_coded                                      | br
               if (states_coded)                                 |
                   for( j = 0; j < context_count[ i ]; j++ )     |
                       for( k = 0; k < CONTEXT_SIZE; k++ )       |
                           initial_state_delta[ i ][ j ][ k ]    | sr
           }                                                     |
           ec                                                    | ur
           intra                                                 | ur
       }                                                         |
   }                                                             |

4.8.1.  version

   "version" specifies the version of the FFV1 bitstream.
   Each version is incompatible with other versions: decoders SHOULD
   reject a file due to an unknown version.
   Decoders SHOULD reject a file with version <= 1 &&
   ConfigurationRecordIsPresent == 1.
   Decoders SHOULD reject a file with version >= 3 &&
   ConfigurationRecordIsPresent == 0.

   +-------+-------------------------+
   | value | version                 |
   +-------+-------------------------+
   | 0     | FFV1 version 0          |
   | 1     | FFV1 version 1          |
   | 2     | reserved*               |
   | 3     | FFV1 version 3          |
   | Other | reserved for future use |
   +-------+-------------------------+

   * Version 2 was never enabled in the encoder; thus, version 2 files
   SHOULD NOT exist, and this document does not describe them in order
   to keep the text simpler.

4.8.2.  micro_version

   "micro_version" specifies the micro-version of the FFV1 bitstream.
   After a version is considered stable (a micro-version value is
   assigned to be the first stable variant of a specific version), each
   new micro-version after this first stable variant is compatible with
   the previous micro-version: decoders SHOULD NOT reject a file due to
   an unknown micro-version that is equal to or above the micro-version
   considered as stable.

   Meaning of micro_version for version 3:

   +-------+-------------------------+
   | value | micro_version           |
   +-------+-------------------------+
   | 0...3 | reserved*               |
   | 4     | first stable variant    |
   | Other | reserved for future use |
   +-------+-------------------------+

   * development versions which may be incompatible with the stable
   variants.

   Meaning of micro_version for version 4 (note: at the time of writing
   of this specification, version 4 is not considered stable, so the
   first stable version value is to be announced in the future):

   +---------+-------------------------+
   | value   | micro_version           |
   +---------+-------------------------+
   | 0...TBA | reserved*               |
   | TBA     | first stable variant    |
   | Other   | reserved for future use |
   +---------+-------------------------+

   * development versions which may be incompatible with the stable
   variants.

4.8.3.  coder_type

   "coder_type" specifies the coder used.

   +-------+-------------------------------------------------+
   | value | coder used                                      |
   +-------+-------------------------------------------------+
   | 0     | Golomb Rice                                     |
   | 1     | Range Coder with default state transition table |
   | 2     | Range Coder with custom state transition table  |
   | Other | reserved for future use                         |
   +-------+-------------------------------------------------+

4.8.4.  state_transition_delta

   "state_transition_delta" specifies the Range coder custom state
   transition table.
   If state_transition_delta is not present in the FFV1 bitstream, all
   Range coder custom state transition table elements are assumed to be
   0.

4.8.5.  colorspace_type

   "colorspace_type" specifies the color space that is losslessly
   encoded, the Pixel transformation used by the encoder, and the
   interleave method.

   +-------+---------------------+------------------+------------------+
   | value | color space         | transformation   | interleave       |
   |       | losslessly encoded  |                  | method           |
   +-------+---------------------+------------------+------------------+
   | 0     | YCbCr               | No Pixel         | plane then line  |
   |       |                     | transformation   |                  |
   | 1     | RGB                 | JPEG2000-RCT     | line then plane  |
   | Other | reserved for future | reserved for     | reserved for     |
   |       | use                 | future use       | future use       |
   +-------+---------------------+------------------+------------------+

   Restrictions:
   If "colorspace_type" is 1, "chroma_planes" MUST be 1, and the
   horizontal and vertical chroma subsample factors MUST both be 1
   (that is, "log2_h_chroma_subsample" and "log2_v_chroma_subsample"
   MUST both be 0).

4.8.6.  chroma_planes

   "chroma_planes" indicates if chroma (color) planes are present.

   +-------+-------------------------------+
   | value | presence                      |
   +-------+-------------------------------+
   | 0     | chroma planes are not present |
   | 1     | chroma planes are present     |
   +-------+-------------------------------+

4.8.7.  bits_per_raw_sample

   "bits_per_raw_sample" indicates the number of bits for each sample.
   Inferred to be 8 if not present.

   +-------+---------------------------------+
   | value | bits for each sample            |
   +-------+---------------------------------+
   | 0     | reserved*                       |
   | Other | the actual bits for each sample |
   +-------+---------------------------------+

   * Encoders MUST NOT store bits_per_raw_sample = 0.  Decoders SHOULD
   accept and interpret bits_per_raw_sample = 0 as 8.

4.8.8.  log2_h_chroma_subsample

   "log2_h_chroma_subsample" indicates the subsample factor between
   luma and chroma width, stored as the power to which the number 2
   must be raised ("chroma_width = 2^(-log2_h_chroma_subsample) *
   luma_width").

4.8.9.  log2_v_chroma_subsample

   "log2_v_chroma_subsample" indicates the subsample factor between
   luma and chroma height, stored as the power to which the number 2
   must be raised ("chroma_height = 2^(-log2_v_chroma_subsample) *
   luma_height").

4.8.10.  alpha_plane

   "alpha_plane" indicates if a transparency plane is present.

   +-------+-----------------------------------+
   | value | presence                          |
   +-------+-----------------------------------+
   | 0     | transparency plane is not present |
   | 1     | transparency plane is present     |
   +-------+-----------------------------------+

4.8.11.  num_h_slices

   "num_h_slices" indicates the number of horizontal elements of the
   slice raster.
   Inferred to be 1 if not present.

4.8.12.  num_v_slices

   "num_v_slices" indicates the number of vertical elements of the slice
   raster.
   Inferred to be 1 if not present.

4.8.13.  quant_table_set_count

   "quant_table_set_count" indicates the number of Quantization
   Table Sets.
   Inferred to be 1 if not present.
   MUST NOT be 0.

4.8.14.  states_coded

   "states_coded" indicates if the respective Quantization Table Set has
   the initial states coded.
   Inferred to be 0 if not present.

   +-------+-----------------------------------------------------------+
   | value | initial states                                            |
   +-------+-----------------------------------------------------------+
   | 0     | initial states are not present and are assumed to be all  |
   |       | 128                                                       |
   | 1     | initial states are present                                |
   +-------+-----------------------------------------------------------+

4.8.15.  initial_state_delta

   "initial_state_delta[ i ][ j ][ k ]" indicates the initial Range
   coder state.  It is encoded using "k" as context index, and the
   initial state is derived from it as follows:

   pred = j ? initial_state[ i ][ j - 1 ][ k ] : 128

   initial_state[ i ][ j ][ k ] =
          ( pred + initial_state_delta[ i ][ j ][ k ] ) & 255

4.8.16.  ec

   "ec" indicates the error detection/correction type.

   +-------+--------------------------------------------+
   | value | error detection/correction type            |
   +-------+--------------------------------------------+
   | 0     | 32-bit CRC on the global header            |
   | 1     | 32-bit CRC per slice and the global header |
   | Other | reserved for future use                    |
   +-------+--------------------------------------------+

4.8.17.  intra

   "intra" indicates the relationship between the instances of "Frame".
   Inferred to be 0 if not present.

   +-------+-----------------------------------------------------------+
   | value | relationship                                              |
   +-------+-----------------------------------------------------------+
   | 0     | Frames are independent or dependent (keyframes and non    |
   |       | keyframes)                                                |
   | 1     | Frames are independent (keyframes only)                   |
   | Other | reserved for future use                                   |
   +-------+-----------------------------------------------------------+

4.9.  Quantization Table Set

   The Quantization Table Sets are stored as run lengths: for each run
   of equal entries in the first half of a table, the number of entries
   minus 1 (represented as "len - 1" in the pseudo-code below) is coded
   using the method described in Section 3.8.1.2.  The second half does
   not need to be stored, as it is identical to the first with flipped
   sign.
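   The run-length storage described above can be sketched in C as
   follows.  This is an illustrative sketch and not the reference
   implementation: "expand_quant_table" is a hypothetical name, the
   table size is a parameter so that a toy 16-entry table can be used
   (tables in the bitstream have 256 entries), and the bound check in
   the inner loop is an added safety measure.

```c
#include <assert.h>

/* Expand stored run lengths into a table of `size` entries (size even).
 * stored[] holds "len - 1" for each run of equal values in the first
 * half.  Returns the number of distinct values in the first half. */
static int expand_quant_table(const unsigned *stored, int size, int scale,
                              int *table)
{
    int half = size / 2;
    int k = 0;
    int v = 0;
    int n = 0;

    /* First half: each run repeats scale * v, then v increases by 1. */
    while (k < half) {
        unsigned len = stored[n++] + 1;
        unsigned a;

        for (a = 0; a < len && k < half; a++)
            table[k++] = scale * v;
        v++;
    }

    /* Second half: mirror of the first half with flipped sign. */
    for (k = 1; k < half; k++)
        table[size - k] = -table[k];
    table[half] = -table[half - 1];

    return v;
}
```

   For a toy table of 16 entries, the stored values 1, 3, 1 expand to
   runs of two 0s, four 1s, and two 2s in the first half, followed by
   the sign-flipped mirror in the second half.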
   Example:

   Table: 0 0 1 1 1 1 2 2 -2 -2 -2 -1 -1 -1 -1 0

   Stored values: 1, 3, 1

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   QuantizationTableSet( i ) {                                   |
       scale = 1                                                 |
       for( j = 0; j < MAX_CONTEXT_INPUTS; j++ ) {               |
           QuantizationTable( i, j, scale )                      |
           scale *= 2 * len_count[ i ][ j ] - 1                  |
       }                                                         |
       context_count[ i ] = ( scale + 1 ) / 2                    |
   }                                                             |

   MAX_CONTEXT_INPUTS is 5.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   QuantizationTable(i, j, scale) {                              |
       v = 0                                                     |
       for( k = 0; k < 128; ) {                                  |
           len - 1                                               | ur
           for( a = 0; a < len; a++ ) {                          |
               quant_tables[ i ][ j ][ k ] = scale * v           |
               k++                                               |
           }                                                     |
           v++                                                   |
       }                                                         |
       for( k = 1; k < 128; k++ ) {                              |
           quant_tables[ i ][ j ][ 256 - k ] = \                 |
               -quant_tables[ i ][ j ][ k ]                      |
       }                                                         |
       quant_tables[ i ][ j ][ 128 ] = \                         |
           -quant_tables[ i ][ j ][ 127 ]                        |
       len_count[ i ][ j ] = v                                   |
   }                                                             |

4.9.1.  quant_tables

   "quant_tables[ i ][ j ][ k ]" indicates the quantization table value
   of the Quantized Sample Difference "k" of the Quantization Table "j"
   of the Quantization Table Set "i".

4.9.2.  context_count

   "context_count[ i ]" indicates the count of contexts for Quantization
   Table Set "i".

5.  Restrictions

   To ensure that fast multithreaded decoding is possible, starting
   with version 3 and if frame_pixel_width * frame_pixel_height is more
   than 101376, slice_width * slice_height MUST be less than or equal
   to num_h_slices * num_v_slices / 4.  Note: 101376 is the frame size
   in pixels of a 352x288 frame, also known as the CIF ("Common
   Intermediate Format") frame size.

   For each "Frame", each position in the slice raster MUST be filled by
   one and only one slice of the "Frame" (no missing slice position, no
   slice overlapping).
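   The requirement that each slice raster position be filled by one and
   only one slice can be checked as sketched below.  This is a minimal
   sketch, assuming the slice headers of a "Frame" have already been
   parsed into an array: the "SlicePos" type, the
   "raster_exactly_covered" function, and the "MAX_RASTER_CELLS" bound
   are hypothetical and not part of this specification.

```c
#include <assert.h>
#include <string.h>

#define MAX_RASTER_CELLS 4096 /* illustrative bound, not from the spec */

/* Position and size of one slice on the slice raster. */
typedef struct {
    unsigned slice_x, slice_y, slice_width, slice_height;
} SlicePos;

/* Return 1 if every raster cell is covered by exactly one slice. */
static int raster_exactly_covered(const SlicePos *s, unsigned count,
                                  unsigned num_h_slices,
                                  unsigned num_v_slices)
{
    unsigned char seen[MAX_RASTER_CELLS];
    unsigned cells, i, x, y;

    if (num_h_slices == 0 || num_v_slices == 0 ||
        num_h_slices > MAX_RASTER_CELLS / num_v_slices)
        return 0;
    cells = num_h_slices * num_v_slices;
    memset(seen, 0, cells);

    for (i = 0; i < count; i++) {
        /* Reject empty slices and slices that leave the raster. */
        if (s[i].slice_width == 0 || s[i].slice_height == 0 ||
            s[i].slice_x >= num_h_slices ||
            s[i].slice_y >= num_v_slices ||
            s[i].slice_width > num_h_slices - s[i].slice_x ||
            s[i].slice_height > num_v_slices - s[i].slice_y)
            return 0;
        for (y = s[i].slice_y; y < s[i].slice_y + s[i].slice_height; y++)
            for (x = s[i].slice_x;
                 x < s[i].slice_x + s[i].slice_width; x++) {
                unsigned cell = y * num_h_slices + x;

                if (seen[cell])
                    return 0; /* slice overlapping */
                seen[cell] = 1;
            }
    }
    for (i = 0; i < cells; i++)
        if (!seen[i])
            return 0; /* missing slice position */
    return 1;
}
```

   A decoder could run such a check once all slice headers of a "Frame"
   are known and treat a failure as a bitstream error.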
   For each "Frame" with keyframe value of 0, each slice MUST have the
   same value of slice_x, slice_y, slice_width, slice_height as a slice
   in the previous "Frame", except if reset_contexts is 1.

6.  Security Considerations

   Like any other codec (such as [RFC6716]), FFV1 should not be used
   with insecure ciphers or cipher modes that are vulnerable to
   known-plaintext attacks.  Some of the header bits as well as the
   padding are easily predictable.

   Implementations of the FFV1 codec need to take appropriate security
   considerations into account, as outlined in [RFC4732].  It is
   extremely important for the decoder to be robust against malicious
   payloads.  Malicious payloads must not cause the decoder to overrun
   its allocated memory or to take an excessive amount of resources to
   decode.  Although problems in encoders are typically rarer, the same
   applies to the encoder.  Malicious video streams must not cause the
   encoder to misbehave because this would allow an attacker to attack
   transcoding gateways.  Another frequent security problem in image
   and video codecs is failing to check for integer overflows in Pixel
   count computations, that is, allocating width * height without
   considering that the multiplication result may have overflowed the
   range of the arithmetic type.

   The reference implementation [REFIMPL] contains no known buffer
   overflows or cases where a specially crafted packet or video segment
   could cause a significant increase in CPU load.

   The reference implementation [REFIMPL] was validated in the following
   conditions:

   o  Sending the decoder valid packets generated by the reference
      encoder and verifying that the decoder's output matches the
      encoder's input.

   o  Sending the decoder packets generated by the reference encoder and
      then subjected to random corruption.

   o  Sending the decoder random packets that are not FFV1.
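   The integer overflow concern raised earlier in this section, for
   Pixel count computations, can be illustrated with a checked
   multiplication in C.  This is a minimal sketch and not code from the
   reference implementation: "checked_frame_bytes" and its parameters
   are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute width * height * bytes_per_sample without overflow.
 * Returns 0 and stores the result in *out on success; returns -1 if a
 * dimension is zero or the product does not fit in size_t. */
static int checked_frame_bytes(uint32_t width, uint32_t height,
                               uint32_t bytes_per_sample, size_t *out)
{
    size_t n;

    if (width == 0 || height == 0 || bytes_per_sample == 0)
        return -1;

    n = width;
    if (height > SIZE_MAX / n)
        return -1; /* width * height would overflow */
    n *= height;
    if (bytes_per_sample > SIZE_MAX / n)
        return -1; /* the full product would overflow */
    n *= bytes_per_sample;

    *out = n;
    return 0;
}
```

   An allocation would only be attempted after such a check succeeds,
   so that a crafted width and height cannot produce a buffer that is
   smaller than the decoder later assumes.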
   In all of the conditions above, the decoder and encoder were run
   inside the [VALGRIND] memory debugger as well as Clang's
   AddressSanitizer [Address-Sanitizer], which track reads and writes to
   invalid memory regions as well as the use of uninitialized memory.
   There were no errors reported on any of the tested conditions.

7.  Appendixes

7.1.  Decoder implementation suggestions

7.1.1.  Multi-threading support and independence of slices

   The FFV1 bitstream is parsable in two ways: in sequential order as
   described in this document or with the pre-analysis of the footer of
   each slice.  Each slice footer contains a slice_size field so the
   boundary of each slice is computable without having to parse the
   slice content.  That allows multi-threading as well as independence
   of slice content (a bitstream error in a slice header or slice
   content has no impact on the decoding of the other slices).

   After having checked the keyframe field, a decoder SHOULD parse the
   slice_size fields, from slice_size of the last slice at the end of
   the "Frame" up to slice_size of the first slice at the beginning of
   the "Frame", before parsing slices, in order to have the slice
   boundaries.  A decoder MAY fall back on sequential order, e.g., in
   case of a corrupted "Frame" (frame size unknown, slice_size of
   slices not coherent...) or if there is no possibility of seeking
   into the stream.

   Architecture overview of slices in a "Frame":

   +-----------------------------------------------------------------+
   | first slice header                                              |
   | first slice content                                             |
   | first slice footer                                              |
   | --------------------------------------------------------------- |
   | second slice header                                             |
   | second slice content                                            |
   | second slice footer                                             |
   | --------------------------------------------------------------- |
   | ...                                                             |
   | --------------------------------------------------------------- |
   | last slice header                                               |
   | last slice content                                              |
   | last slice footer                                               |
   +-----------------------------------------------------------------+

8.  Changelog

   See

9.  ToDo

   o  mean,k estimation for the Golomb Rice codes

10.  References

10.1.  Normative References

   [ISO.15444-1.2016]
              International Organization for Standardization,
              "Information technology -- JPEG 2000 image coding system:
              Core coding system", October 2016.

   [ISO.9899.1990]
              International Organization for Standardization,
              "Programming languages - C", ISO Standard 9899, 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

10.2.  Informative References

   [Address-Sanitizer]
              The Clang Team, "ASAN AddressSanitizer website", undated.

   [AVI]      Microsoft, "AVI RIFF File Reference", undated.

   [FFV1_V0]  Niedermayer, M., "Commit to mark FFV1 version 0 as
              non-experimental", April 2006.

   [FFV1_V1]  Niedermayer, M., "Commit to release FFV1 version 1", April
              2009.

   [FFV1_V3]  Niedermayer, M., "Commit to mark FFV1 version 3 as
              non-experimental", August 2013.

   [HuffYUV]  Rudiak-Gould, B., "HuffYUV", December 2003.

   [ISO.14495-1.1999]
              International Organization for Standardization,
              "Information technology -- Lossless and near-lossless
              compression of continuous-tone still images: Baseline",
              December 1999.

   [ISO.14496-10.2014]
              International Organization for Standardization,
              "Information technology -- Coding of audio-visual objects
              -- Part 10: Advanced Video Coding", September 2014.
1749 [ISO.14496-12.2015] 1750 International Organization for Standardization, 1751 "Information technology -- Coding of audio-visual objects 1752 -- Part 12: ISO base media file format", December 2015. 1754 [Matroska] 1755 IETF, "Matroska", 2016, . 1758 [NUT] Niedermayer, M., "NUT Open Container Format", December 1759 2013, . 1761 [range-coding] 1762 Nigel, G. and N. Martin, "Range encoding: an algorithm for 1763 removing redundancy from a digitised message.", Proc. 1764 Institution of Electronic and Radio Engineers 1765 International Conference on Video and Data Recording , 1766 July 1979. 1768 [REFIMPL] Niedermayer, M., "The reference FFV1 implementation / the 1769 FFV1 codec in FFmpeg", undated, . 1771 [RFC4732] Handley, M., Ed., Rescorla, E., Ed., and IAB, "Internet 1772 Denial-of-Service Considerations", RFC 4732, 1773 DOI 10.17487/RFC4732, December 2006, 1774 . 1776 [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the 1777 Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716, 1778 September 2012, . 1780 [VALGRIND] 1781 Valgrind Developers, "Valgrind website", undated, 1782 . 1784 [YCbCr] Wikipedia, "YCbCr", undated, 1785 . 1787 Authors' Addresses 1789 Michael Niedermayer 1791 Email: michael@niedermayer.cc 1793 Dave Rice 1795 Email: dave@dericed.com 1797 Jerome Martinez 1799 Email: jerome@mediaarea.net