cellar                                                    M. Niedermayer
Internet-Draft
Intended status: Informational                                   D. Rice
Expires: March 29, 2019
                                                             J. Martinez
                                                      September 25, 2018


              FFV1 Video Coding Format Version 0, 1, and 3
                       draft-ietf-cellar-ffv1-05

Abstract

   This document defines FFV1, a lossless intra-frame video encoding
   format. FFV1 is designed to efficiently compress video data in a
   variety of pixel formats.
   Compared to uncompressed video, FFV1 offers storage compression,
   frame fixity, and self-description, which makes FFV1 useful as a
   preservation or intermediate video format.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 29, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Notation and Conventions
      2.1. Definitions
      2.2. Conventions
         2.2.1. Pseudo-code
         2.2.2. Arithmetic Operators
         2.2.3. Assignment Operators
         2.2.4. Comparison Operators
         2.2.5. Mathematical Functions
         2.2.6. Order of Operation Precedence
         2.2.7. Range
         2.2.8. NumBytes
         2.2.9. Bitstream Functions
   3. General Description
      3.1. Border
      3.2. Samples
      3.3. Median Predictor
      3.4. Context
      3.5. Quantization Table Sets
      3.6. Quantization Table Set Indexes
      3.7. Color spaces
         3.7.1. YCbCr
         3.7.2. RGB
      3.8. Coding of the Sample Difference
         3.8.1. Range Coding Mode
         3.8.2. Golomb Rice Mode
   4. Bitstream
      4.1. Parameters
         4.1.1. version
         4.1.2. micro_version
         4.1.3. coder_type
         4.1.4. state_transition_delta
         4.1.5. colorspace_type
         4.1.6. chroma_planes
         4.1.7. bits_per_raw_sample
         4.1.8. log2_h_chroma_subsample
         4.1.9. log2_v_chroma_subsample
         4.1.10. alpha_plane
         4.1.11. num_h_slices
         4.1.12. num_v_slices
         4.1.13. quant_table_set_count
         4.1.14. states_coded
         4.1.15. initial_state_delta
         4.1.16. ec
         4.1.17. intra
      4.2. Configuration Record
         4.2.1. reserved_for_future_use
         4.2.2. configuration_record_crc_parity
         4.2.3. Mapping FFV1 into Containers
      4.3. Frame
      4.4. Slice
      4.5. Slice Header
         4.5.1. slice_x
         4.5.2. slice_y
         4.5.3. slice_width
         4.5.4. slice_height
         4.5.5. quant_table_set_index_count
         4.5.6. quant_table_set_index
         4.5.7. picture_structure
         4.5.8. sar_num
         4.5.9. sar_den
      4.6. Slice Content
         4.6.1. primary_color_count
         4.6.2. plane_pixel_height
         4.6.3. slice_pixel_height
         4.6.4. slice_pixel_y
      4.7. Line
         4.7.1. plane_pixel_width
         4.7.2. slice_pixel_width
         4.7.3. slice_pixel_x
         4.7.4. sample_difference
      4.8. Slice Footer
         4.8.1. slice_size
         4.8.2. error_status
         4.8.3. slice_crc_parity
      4.9. Quantization Table Set
         4.9.1. quant_tables
         4.9.2. context_count
   5. Restrictions
   6. Security Considerations
   7. Media Type Definition
   8. IANA Considerations
   9. Appendixes
      9.1. Decoder implementation suggestions
         9.1.1. Multi-threading Support and Independence of Slices
   10. Changelog
   11. References
      11.1. Normative References
      11.2. Informative References
   Authors' Addresses

1. Introduction

   This document describes FFV1, a lossless video encoding format. The
   design of FFV1 considers the storage of image characteristics, data
   fixity, and the optimized use of encoding time and storage
   requirements.
   FFV1 is designed to support a wide range of lossless video
   applications such as long-term audiovisual preservation, scientific
   imaging, screen recording, and other video encoding scenarios that
   seek to avoid the generational loss of lossy video encodings.

   This document defines versions 0, 1, and 3 of FFV1. The distinctions
   between the versions are noted throughout the document, but in
   summary:

   o  Version 0 of FFV1 was the original implementation of FFV1 and has
      been in non-experimental use since April 14, 2006 [FFV1_V0].

   o  Version 1 of FFV1 adds support for more video bit depths and has
      been in use since April 24, 2009 [FFV1_V1].

   o  Version 2 of FFV1 only existed in experimental form and is not
      described by this document, but is available as a LyX file at
      .

   o  Version 3 of FFV1 adds several features such as increased
      description of the characteristics of the encoded images and
      embedded CRC data to support fixity verification of the encoding.
      Version 3 has been in non-experimental use since August 17, 2013
      [FFV1_V3].

   The latest version of this document is available at

   This document assumes familiarity with mathematical and coding
   concepts such as Range coding [range-coding] and YCbCr color spaces
   [YCbCr].

2. Notation and Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.1. Definitions

   "Frame": An encoded representation of a complete static image.

   "Slice": A spatial sub-section of a "Frame" that is encoded
   separately from any other region of the same frame.

   "Container": Format that encapsulates "Frames" and (when required) a
   "Configuration Record" into a bitstream.

   "Sample": The smallest addressable representation of a color
   component or a luma component in a "Frame".
   Examples of samples are Luma, Blue Chrominance, Red Chrominance,
   Alpha, Red, Green, and Blue.

   "Pixel": The smallest addressable representation of a color in a
   "Frame". It is composed of 1 or more samples.

   "ESC": An ESCape symbol to indicate that the symbol to be stored is
   too large for normal storage and that an alternate storage method is
   used.

   "MSB": Most Significant Bit, the bit that can cause the largest
   change in magnitude of the symbol.

   "RCT": Reversible Color Transform, a near linear, exactly reversible
   integer transform that converts between RGB and YCbCr representations
   of a Pixel.

   "VLC": Variable Length Code, a code that maps source symbols to a
   variable number of bits.

   "RGB": A reference to the method of storing the value of a Pixel by
   using three numeric values that represent Red, Green, and Blue.

   "YCbCr": A reference to the method of storing the value of a Pixel by
   using three numeric values that represent the luma of the Pixel (Y)
   and the chrominance of the Pixel (Cb and Cr). The term YCbCr is used
   for historical reasons and currently refers to any color space
   relying on 1 luma sample and 2 chrominance samples, e.g. YCbCr,
   YCgCo, or ICtCp. The exact meaning of the three numeric values is
   unspecified.

   "TBA": To Be Announced. Used in reference to the development of
   future iterations of the FFV1 specification.

2.2. Conventions

2.2.1. Pseudo-code

   The FFV1 bitstream is described in this document using pseudo-code.
   Note that the pseudo-code is used for clarity in order to illustrate
   the structure of FFV1 and is not intended to specify any particular
   implementation. The pseudo-code used is based upon the C programming
   language [ISO.9899.1990] and uses its "if/else", "while", and "for"
   keywords as well as functions defined within this document.

2.2.2. Arithmetic Operators

   Note: the operators and the order of precedence are the same as used
   in the C programming language [ISO.9899.1990].

   "a + b" means a plus b.

   "a - b" means a minus b.

   "-a" means negation of a.

   "a * b" means a multiplied by b.

   "a / b" means a divided by b.

   "a % b" means a modulo b.

   "a & b" means bit-wise "and" of a and b.

   "a | b" means bit-wise "or" of a and b.

   "a >> b" means arithmetic right shift of two's complement integer
   representation of a by b binary digits.

   "a << b" means arithmetic left shift of two's complement integer
   representation of a by b binary digits.

2.2.3. Assignment Operators

   "a = b" means a is assigned b.

   "a++" is equivalent to a is assigned a + 1.

   "a--" is equivalent to a is assigned a - 1.

   "a += b" is equivalent to a is assigned a + b.

   "a -= b" is equivalent to a is assigned a - b.

   "a *= b" is equivalent to a is assigned a * b.

2.2.4. Comparison Operators

   "a > b" means a is greater than b.

   "a >= b" means a is greater than or equal to b.

   "a < b" means a is less than b.

   "a <= b" means a is less than or equal to b.

   "a == b" means a is equal to b.

   "a != b" means a is not equal to b.

   "a && b" means Boolean logical "and" of a and b.

   "a || b" means Boolean logical "or" of a and b.

   "!a" means Boolean logical "not" of a.

   "a ? b : c" if a is true, then b, otherwise c.

2.2.5. Mathematical Functions

   floor(a) the largest integer less than or equal to a

   ceil(a) the smallest integer greater than or equal to a

   sign(a) extracts the sign of a number, i.e. if a < 0 then -1, else if
   a > 0 then 1, else 0

   abs(a) the absolute value of a, i.e. abs(a) = sign(a)*a

   log2(a) the base-two logarithm of a

   min(a,b) the smallest of two values a and b

   max(a,b) the largest of two values a and b

   median(a,b,c) the numerical middle value in a data set of a, b, and
   c, i.e.
   a+b+c-min(a,b,c)-max(a,b,c)

   a_{b} the b-th value of a sequence of a

   a_{b,c} the 'b,c'-th value of a sequence of a

2.2.6. Order of Operation Precedence

   When order of precedence is not indicated explicitly by use of
   parentheses, operations are evaluated in the following order (from
   top to bottom, operations of same precedence being evaluated from
   left to right). This order of operations is based on the order of
   operations used in Standard C.

      a++, a--
      !a, -a
      a * b, a / b, a % b
      a + b, a - b
      a << b, a >> b
      a < b, a <= b, a > b, a >= b
      a == b, a != b
      a & b
      a | b
      a && b
      a || b
      a ? b : c
      a = b, a += b, a -= b, a *= b

2.2.7. Range

   "a...b" means any value starting from a to b, inclusive.

2.2.8. NumBytes

   "NumBytes" is a non-negative integer that expresses the size in 8-bit
   octets of a particular FFV1 "Configuration Record" or "Frame". FFV1
   relies on its "Container" to store the "NumBytes" values, see
   Section 4.2.3.

2.2.9. Bitstream Functions

2.2.9.1. remaining_bits_in_bitstream

   "remaining_bits_in_bitstream( )" means the count of remaining bits
   after the pointer in that "Configuration Record" or "Frame". It is
   computed from the "NumBytes" value multiplied by 8 minus the count of
   bits of that "Configuration Record" or "Frame" already read by the
   bitstream parser.

2.2.9.2. byte_aligned

   "byte_aligned( )" is true if "remaining_bits_in_bitstream( )" is a
   multiple of 8, otherwise false.

2.2.9.3. get_bits

   "get_bits( i )" is the action to read the next "i" bits in the
   bitstream, from most significant bit to least significant bit, and to
   return the corresponding value. The pointer is increased by "i".

3. General Description

   Samples within a plane are coded in raster scan order (left->right,
   top->bottom).
   Each sample is predicted by the median predictor from samples in the
   same plane and the difference is stored; see Section 3.8.

3.1. Border

   A border is assumed for each coded slice for the purpose of the
   predictor and context according to the following rules:

   o  one column of samples to the left of the coded slice is assumed as
      identical to the samples of the leftmost column of the coded slice
      shifted down by one row. The value of the topmost sample of the
      column of samples to the left of the coded slice is assumed to be
      "0"

   o  one column of samples to the right of the coded slice is assumed
      as identical to the samples of the rightmost column of the coded
      slice

   o  an additional column of samples to the left of the coded slice and
      two rows of samples above the coded slice are assumed to be "0"

   The following table depicts a slice of samples "a,b,c,d,e,f,g,h,i"
   along with its assumed border.

   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | 0 | 0 | 0 |   | 0 |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | 0 | 0 | 0 |   | 0 |
   +---+---+---+---+---+---+---+---+
   |   |   |   |   |   |   |   |   |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | a | b | c |   | c |
   +---+---+---+---+---+---+---+---+
   | 0 | a |   | d | e | f |   | f |
   +---+---+---+---+---+---+---+---+
   | 0 | d |   | g | h | i |   | i |
   +---+---+---+---+---+---+---+---+

3.2. Samples

   Positions used for context and median predictor are:

   +---+---+---+---+
   |   |   | T |   |
   +---+---+---+---+
   |   |tl | t |tr |
   +---+---+---+---+
   | L | l | X |   |
   +---+---+---+---+

   "X" is the current processed Sample. The identifiers are made of the
   first letters of the words Top, Left and Right.

3.3. Median Predictor

   The prediction for any sample value at position "X" may be computed
   based upon the relative neighboring values of "l", "t", and "tl" via
   this equation:

   "median(l, t, l + t - tl)".
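   As a non-normative illustration, the general-case median prediction
   can be sketched in C as follows. The function names "median3" and
   "predict_sample" are hypothetical and are not defined by this
   specification; the sketch assumes the neighboring values "l", "t",
   and "tl" have already been fetched (including any border values) and
   it does not cover any special-case handling of particular bit depths.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the specification): returns the
 * numerical middle value of a, b, and c, which is equivalent to
 * a + b + c - min(a,b,c) - max(a,b,c). */
int32_t median3(int32_t a, int32_t b, int32_t c)
{
    int32_t lo = a < b ? a : b;   /* smaller of a and b */
    int32_t hi = a < b ? b : a;   /* larger of a and b  */

    if (c < lo)
        return lo;                /* c is the minimum, lo is the middle */
    if (c > hi)
        return hi;                /* c is the maximum, hi is the middle */
    return c;                     /* c lies between lo and hi           */
}

/* Hypothetical helper: prediction for the sample at position X from
 * its left (l), top (t), and top-left (tl) neighbors. */
int32_t predict_sample(int32_t l, int32_t t, int32_t tl)
{
    return median3(l, t, l + t - tl);
}
```

   For instance, with l = 10, t = 20, and tl = 15 the gradient term
   l + t - tl is 15, so the prediction is 15; with tl = 5 the gradient
   term is 25 and the prediction is clamped to 20.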
   Note, this prediction template is also used in [ISO.14495-1.1999] and
   [HuffYUV].

   Exception for the median predictor: if "colorspace_type == 0 &&
   bits_per_raw_sample == 16 && ( coder_type == 1 || coder_type == 2 )",
   the following median predictor MUST be used:

   "median(left16s, top16s, left16s + top16s - diag16s)"

   where:

      left16s = l  >= 32768 ? ( l  - 65536 ) : l
      top16s  = t  >= 32768 ? ( t  - 65536 ) : t
      diag16s = tl >= 32768 ? ( tl - 65536 ) : tl

   Background: a two's complement 16-bit signed integer was used for
   storing sample values in all known implementations of FFV1 bitstream.
   So in some circumstances, the most significant bit was wrongly
   interpreted (used as a sign bit instead of the 16th bit of an
   unsigned integer). Note that when the issue was discovered, the only
   configuration of all known implementations being impacted was 16-bit
   YCbCr with no Pixel transformation with Range Coder coder, as other
   potentially impacted configurations (e.g. 15/16-bit JPEG2000-RCT with
   Range Coder coder, or 16-bit content with Golomb Rice coder) were not
   implemented anywhere [ISO.15444-1.2016]. Meanwhile, 16-bit
   JPEG2000-RCT with Range Coder coder was implemented without this
   issue in one implementation and validated by one conformance checker.
   It is expected (to be confirmed) that this exception for the median
   predictor will be removed in the next version of the FFV1 bitstream.

3.4. Context

   Relative to any sample "X", the Quantized Sample Differences "L-l",
   "l-tl", "tl-t", "T-t", and "t-tr" are used as context:

      context = Q_{0}[l - tl] +
                Q_{1}[tl - t] +
                Q_{2}[t - tr] +
                Q_{3}[L - l] +
                Q_{4}[T - t]

   If "context >= 0" then "context" is used and the difference between
   the sample and its predicted value is encoded as is, else "-context"
   is used and the difference between the sample and its predicted value
   is encoded with a flipped sign.

3.5. Quantization Table Sets

   The FFV1 bitstream contains 1 or more Quantization Table Sets. Each
   Quantization Table Set contains exactly 5 Quantization Tables, each
   Quantization Table corresponding to 1 of the 5 Quantized Sample
   Differences. For each Quantization Table, both the number of
   quantization steps and their distribution are stored in the FFV1
   bitstream; each Quantization Table has exactly 256 entries, and the 8
   least significant bits of the Quantized Sample Difference are used as
   index:

      Q_{j}[k] = quant_tables[i][j][k&255]

   In this formula, "i" is the Quantization Table Set index, "j" is the
   Quantization Table index, and "k" is the Quantized Sample Difference.

3.6. Quantization Table Set Indexes

   For each plane of each slice, a Quantization Table Set is selected
   from an index:

   o  For Y plane, "quant_table_set_index [ 0 ]" index is used

   o  For Cb and Cr planes, "quant_table_set_index [ 1 ]" index is used

   o  For Alpha plane, "quant_table_set_index [ (version <= 3 ||
      chroma_planes) ? 2 : 1 ]" index is used

   Background: in the first implementations of FFV1 bitstream, the index
   for Cb and Cr planes was stored even if it was not used
   (chroma_planes set to 0). This index is kept for version <= 3 in
   order to keep compatibility with FFV1 bitstreams in the wild.

3.7. Color spaces

   FFV1 supports two color spaces: YCbCr and RGB. Both color spaces
   allow an optional Alpha plane that can be used to code transparency
   data.

3.7.1. YCbCr

   In YCbCr color space, the Cb and Cr planes are optional, but if used
   then MUST be used together. Omitting the Cb and Cr planes codes the
   frames in grayscale without color data. An FFV1 "Frame" using YCbCr
   MUST use one of the following arrangements:

   o  Y

   o  Y, Alpha

   o  Y, Cb, Cr

   o  Y, Cb, Cr, Alpha

   The Y plane MUST be coded first. If the Cb and Cr planes are used
   then they MUST be coded after the Y plane.
   If an Alpha (transparency) plane is used, then it MUST be coded last.

3.7.2. RGB

   JPEG2000-RCT is a Reversible Color Transform that codes RGB (red,
   green, blue) planes losslessly in a modified YCbCr color space
   [ISO.15444-1.2016]. Reversible Pixel transformations between YCbCr
   and RGB use the following formulae. The inner parentheses are written
   out explicitly because, under the precedence rules of Section 2.2.6,
   "+" and "-" bind more tightly than ">>".

      Cb = b - g

      Cr = r - g

      Y = g + ((Cb + Cr) >> 2)

      g = Y - ((Cb + Cr) >> 2)

      r = Cr + g

      b = Cb + g

   Exception for the JPEG2000-RCT conversion: if bits_per_raw_sample is
   between 9 and 15 inclusive and alpha_plane is 0, the following
   formulae for reversible conversions between YCbCr and RGB MUST be
   used instead of the ones above:

      Cb = g - b

      Cr = r - b

      Y = b + ((Cb + Cr) >> 2)

      b = Y - ((Cb + Cr) >> 2)

      r = Cr + b

      g = Cb + b

   Background: At the time of this writing, in all known implementations
   of FFV1 bitstream, when bits_per_raw_sample was between 9 and 15
   inclusive and alpha_plane was 0, GBR planes were used as BGR planes
   during both encoding and decoding. Meanwhile, 16-bit JPEG2000-RCT was
   implemented without this issue in one implementation and validated by
   one conformance checker. Methods to address this exception for the
   transform are under consideration for the next version of the FFV1
   bitstream.

   When FFV1 uses the JPEG2000-RCT, the horizontal lines are interleaved
   to improve caching efficiency since it is most likely that the RCT
   will immediately be converted to RGB during decoding. The interleaved
   coding order is also Y, then Cb, then Cr, and then, if used, Alpha.
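   As a non-normative illustration, the general-case JPEG2000-RCT
   formulae of this section (not the 9-to-15-bit exception) can be
   sketched in C as follows. The type and function names ("RgbPixel",
   "RctPixel", "floor_shift2", "rct_forward", "rct_inverse") are
   hypothetical and are not defined by this specification. Because C
   leaves right-shifting a negative value implementation-defined, the
   sketch computes the arithmetic shift of Section 2.2.2 explicitly as a
   floor division by 4.

```c
/* Non-normative sketch; every name in this block is hypothetical. */
typedef struct { int r, g, b; } RgbPixel;    /* one RGB Pixel        */
typedef struct { int y, cb, cr; } RctPixel;  /* one transformed Pixel */

/* Arithmetic right shift by 2 (floor division by 4), written so that
 * only non-negative values are ever shifted. */
int floor_shift2(int v)
{
    return v >= 0 ? v >> 2 : -((-v + 3) >> 2);
}

/* Forward transform: Cb = b - g, Cr = r - g, Y = g + ((Cb + Cr) >> 2) */
RctPixel rct_forward(RgbPixel p)
{
    RctPixel o;
    o.cb = p.b - p.g;
    o.cr = p.r - p.g;
    o.y  = p.g + floor_shift2(o.cb + o.cr);
    return o;
}

/* Inverse transform: g = Y - ((Cb + Cr) >> 2), r = Cr + g, b = Cb + g */
RgbPixel rct_inverse(RctPixel p)
{
    RgbPixel o;
    o.g = p.y - floor_shift2(p.cb + p.cr);
    o.r = p.cr + o.g;
    o.b = p.cb + o.g;
    return o;
}
```

   For instance, the RGB Pixel (200, 50, 10) gives Cb = -40, Cr = 150,
   and Y = 50 + (110 >> 2) = 77, and applying the inverse transform to
   (77, -40, 150) recovers (200, 50, 10). The same shifted term is
   subtracted in the inverse as was added in the forward direction,
   which is why the transform is exactly reversible despite the
   discarded low bits.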
   As an example, a "Frame" that is two pixels wide and two pixels high
   could comprise the following structure:

   +------------------------+------------------------+
   | Pixel[1,1]             | Pixel[2,1]             |
   | Y[1,1] Cb[1,1] Cr[1,1] | Y[2,1] Cb[2,1] Cr[2,1] |
   +------------------------+------------------------+
   | Pixel[1,2]             | Pixel[2,2]             |
   | Y[1,2] Cb[1,2] Cr[1,2] | Y[2,2] Cb[2,2] Cr[2,2] |
   +------------------------+------------------------+

   In JPEG2000-RCT, the coding order would be left to right and then top
   to bottom, with values interleaved by lines and stored in this order:

   Y[1,1] Y[2,1] Cb[1,1] Cb[2,1] Cr[1,1] Cr[2,1] Y[1,2] Y[2,2] Cb[1,2]
   Cb[2,2] Cr[1,2] Cr[2,2]

3.8. Coding of the Sample Difference

   Instead of coding the n+1 bits of the Sample Difference with Huffman
   or Range coding (or n+2 bits, in the case of RCT), only the n (or
   n+1) least significant bits are used, since this is sufficient to
   recover the original sample. In the equation below, the term "bits"
   represents bits_per_raw_sample+1 for RCT or bits_per_raw_sample
   otherwise:

      coder_input =
          [(sample_difference + 2^(bits-1)) & (2^bits - 1)] - 2^(bits-1)

3.8.1. Range Coding Mode

   Early experimental versions of FFV1 used the CABAC Arithmetic coder
   from H.264 as defined in [ISO.14496-10.2014] but due to the uncertain
   patent/royalty situation, as well as its slightly worse performance,
   CABAC was replaced by a Range coder based on an algorithm defined by
   G. Nigel N. Martin in 1979 [range-coding].

3.8.1.1. Range Binary Values

   To encode binary digits efficiently a Range coder is used. "C_{i}"
   is the i-th Context. "B_{i}" is the i-th byte of the bytestream.
   "b_{i}" is the i-th Range coded binary value, "S_{0,i}" is the i-th
   initial state, which is 128. The length of the bytestream encoding n
   binary symbols is "j_{n}" bytes.
   In the relations below, "AND" joins conditions that hold together,
   "x <== y" reads "x holds if y holds", and "x <==> y" reads "x holds
   if and only if y holds".

      r_{i} = floor( ( R_{i} * S_{i,C_{i}} ) / 2^8 )

      S_{i+1,C_{i}} =  zero_state_{S_{i,C_{i}}}   AND
                l_i =  L_i                        AND
                t_i =  R_i - r_i                  <==
                b_i =  0                          <==>
                L_i <  R_i - r_i

      S_{i+1,C_{i}} =  one_state_{S_{i,C_{i}}}    AND
                l_i =  L_i - R_i + r_i            AND
                t_i =  r_i                        <==
                b_i =  1                          <==>
                L_i >= R_i - r_i

      S_{i+1,k} = S_{i,k} <== C_i != k

      R_{i+1} =  2^8 * t_{i}                      AND
      L_{i+1} =  2^8 * l_{i} + B_{j_{i}}          AND
      j_{i+1} =  j_{i} + 1                        <==
      t_{i}   <  2^8

      R_{i+1} =  t_{i}                            AND
      L_{i+1} =  l_{i}                            AND
      j_{i+1} =  j_{i}                            <==
      t_{i}   >= 2^8

      R_{0} = 65280

      L_{0} = 2^8 * B_{0} + B_{1}

      j_{0} = 2

3.8.1.2. Range Non Binary Values

   To encode scalar integers, it would be possible to encode each bit
   separately and use the past bits as context. However, that would mean
   255 contexts per 8-bit symbol, which not only wastes memory but also
   requires more past data to reach a reasonably good estimate of the
   probabilities. Alternatively, assuming a Laplacian distribution and
   only dealing with its variance and mean (as in Huffman coding) would
   also be possible. However, for maximum flexibility and simplicity,
   the chosen method uses a single symbol to encode whether a number is
   0 and, if not, encodes the number using its exponent, mantissa, and
   sign. The exact contexts used are best described by the following
   code, followed by some comments.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   void put_symbol(RangeCoder *c, uint8_t *state, int v, int \   |
   is_signed) {                                                  |
       int i;                                                    |
       put_rac(c, state+0, !v);                                  |
       if (v) {                                                  |
           int a= abs(v);                                        |
           int e= log2(a);                                       |
                                                                 |
           for (i=0; i<e; i++)                                   |
               put_rac(c, state+1+min(i,9), 1);  //1..10         |
                                                                 |
           put_rac(c, state+1+min(i,9), 0);                      |
           for (i=e-1; i>=0; i--)                                |
               put_rac(c, state+22+min(i,9), (a>>i)&1); //22..31 |
                                                                 |
           if (is_signed)                                        |
               put_rac(c, state+11 + min(e, 10), v < 0); //11..21|
       }                                                         |
   }                                                             |

3.8.1.3. Initial Values for the Context Model

   At keyframes all Range coder state variables are set to their initial
   state.

3.8.1.4. State Transition Table

      one_state_{i} =
          default_state_transition_{i} + state_transition_delta_{i}

      zero_state_{i} = 256 - one_state_{256-i}

3.8.1.5. default_state_transition

        0,  0,  0,  0,  0,  0,  0,  0, 20, 21, 22, 23, 24, 25, 26, 27,

       28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 37, 38, 39, 40, 41, 42,

       43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 56, 57,

       58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,

       74, 75, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,

       89, 90, 91, 92, 93, 94, 94, 95, 96, 97, 98, 99,100,101,102,103,

      104,105,106,107,108,109,110,111,112,113,114,114,115,116,117,118,

      119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,133,

      134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,

      150,151,152,152,153,154,155,156,157,158,159,160,161,162,163,164,

      165,166,167,168,169,170,171,171,172,173,174,175,176,177,178,179,

      180,181,182,183,184,185,186,187,188,189,190,190,191,192,194,194,

      195,196,197,198,199,200,201,202,202,204,205,206,207,208,209,209,

      210,211,212,213,215,215,216,217,218,219,220,220,222,223,224,225,

      226,227,227,229,229,230,231,232,234,234,235,236,237,238,239,240,

      241,242,243,244,245,246,247,248,248,  0,  0,  0,  0,  0,  0,  0,

3.8.1.6. Alternative State Transition Table

   The alternative state transition table has been built using iterative
   minimization of frame sizes and generally performs better than the
   default. To use it, the coder_type MUST be set to 2 and the
   difference from the default MUST be stored in the "Parameters", see
   Section 4.1. At the time of this writing, the reference
   implementation of FFV1 in FFmpeg uses this table by default when
   Range coding is used.
770 0, 10, 10, 10, 10, 16, 16, 16, 28, 16, 16, 29, 42, 49, 20, 49, 772 59, 25, 26, 26, 27, 31, 33, 33, 33, 34, 34, 37, 67, 38, 39, 39, 774 40, 40, 41, 79, 43, 44, 45, 45, 48, 48, 64, 50, 51, 52, 88, 52, 776 53, 74, 55, 57, 58, 58, 74, 60,101, 61, 62, 84, 66, 66, 68, 69, 778 87, 82, 71, 97, 73, 73, 82, 75,111, 77, 94, 78, 87, 81, 83, 97, 780 85, 83, 94, 86, 99, 89, 90, 99,111, 92, 93,134, 95, 98,105, 98, 782 105,110,102,108,102,118,103,106,106,113,109,112,114,112,116,125, 784 115,116,117,117,126,119,125,121,121,123,145,124,126,131,127,129, 786 165,130,132,138,133,135,145,136,137,139,146,141,143,142,144,148, 788 147,155,151,149,151,150,152,157,153,154,156,168,158,162,161,160, 790 172,163,169,164,166,184,167,170,177,174,171,173,182,176,180,178, 792 175,189,179,181,186,183,192,185,200,187,191,188,190,197,193,196, 794 197,194,195,196,198,202,199,201,210,203,207,204,205,206,208,214, 796 209,211,221,212,213,215,224,216,217,218,219,220,222,228,223,225, 798 226,224,227,229,240,230,231,232,233,234,235,236,238,239,237,242, 800 241,243,242,244,245,246,247,248,249,250,251,252,252,253,254,255, 802 3.8.2. Golomb Rice Mode 804 This coding mode uses Golomb Rice codes. The VLC is split into 2 805 parts, the prefix stores the most significant bits and the suffix 806 stores the k least significant bits or stores the whole number in the 807 ESC case. The end of the bitstream of the "Frame" is filled with 808 0-bits until that the bitstream contains a multiple of 8 bits. 810 3.8.2.1. Prefix 812 +----------------+-------+ 813 | bits | value | 814 +----------------+-------+ 815 | 1 | 0 | 816 | 01 | 1 | 817 | ... | ... | 818 | 0000 0000 0001 | 11 | 819 | 0000 0000 0000 | ESC | 820 +----------------+-------+ 822 3.8.2.2. 
Suffix

   +-------+-----------------------------------------------------------+
   | non   | the k least significant bits MSB first                    |
   | ESC   |                                                           |
   | ESC   | the value - 11, in MSB first order, ESC may only be used  |
   |       | if the value cannot be coded as non ESC                   |
   +-------+-----------------------------------------------------------+

3.8.2.3.  Examples

   +-----+-------------------------+-------+
   | k   | bits                    | value |
   +-----+-------------------------+-------+
   | 0   | "1"                     | 0     |
   | 0   | "001"                   | 2     |
   | 2   | "1 00"                  | 0     |
   | 2   | "1 10"                  | 2     |
   | 2   | "01 01"                 | 5     |
   | any | "000000000000 10000000" | 139   |
   +-----+-------------------------+-------+

3.8.2.4.  Run Mode

   Run mode is entered when the context is 0 and left as soon as a non-0
   difference is found. The level is identical to the predicted one.
   The run and the first different level are coded.

3.8.2.5.  Run Length Coding

   The run value is encoded in 2 parts. The prefix part stores the more
   significant part of the run and adjusts the run_index that
   determines the number of bits in the less significant part of the
   run. The second part of the value stores the less significant part
   of the run as it is. The run_index is reset to 0 for each plane and
   slice.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   log2_run[41]={                                                |
    0, 0, 0, 0, 1, 1, 1, 1,                                      |
    2, 2, 2, 2, 3, 3, 3, 3,                                      |
    4, 4, 5, 5, 6, 6, 7, 7,                                      |
    8, 9,10,11,12,13,14,15,                                      |
   16,17,18,19,20,21,22,23,                                      |
   24,                                                           |
   };                                                            |
                                                                 |
   if (run_count == 0 && run_mode == 1) {                        |
       if (get_bits(1)) {                                        |
           run_count = 1 << log2_run[run_index];                 |
           if (x + run_count <= w)                               |
               run_index++;                                      |
       } else {                                                  |
           if (log2_run[run_index])                              |
               run_count = get_bits(log2_run[run_index]);        |
           else                                                  |
               run_count = 0;                                    |
           if (run_index)                                        |
               run_index--;                                      |
           run_mode = 2;                                         |
       }                                                         |
   }                                                             |

   The log2_run array is also used within [ISO.14495-1.1999].

3.8.2.6.  Level Coding

   Level coding is identical to the normal difference coding with the
   exception that the 0 value is removed as it cannot occur:

       if (diff>0) diff--;
       encode(diff);

   Note, this is different from JPEG-LS, which doesn't use prediction in
   run mode and uses a different encoding and context model for the last
   difference. On a small set of test samples the use of prediction
   slightly improved the compression rate.

4.
Bitstream

   +--------+----------------------------------------------------------+
   | Symbol | Definition                                               |
   +--------+----------------------------------------------------------+
   | u(n)   | unsigned big endian integer using n bits                 |
   | sg     | Golomb Rice coded signed scalar symbol coded with the    |
   |        | method described in Section 3.8.2                        |
   | br     | Range coded Boolean (1-bit) symbol with the method       |
   |        | described in Section 3.8.1.1                             |
   | ur     | Range coded unsigned scalar symbol coded with the method |
   |        | described in Section 3.8.1.2                             |
   | sr     | Range coded signed scalar symbol coded with the method   |
   |        | described in Section 3.8.1.2                             |
   +--------+----------------------------------------------------------+

   The same context, which is initialized to 128, is used for all fields
   in the header.

   The following MUST be provided by external means during
   initialization of the decoder:

   "frame_pixel_width" is defined as "Frame" width in pixels.

   "frame_pixel_height" is defined as "Frame" height in pixels.

   Default values at the decoder initialization phase:

   "ConfigurationRecordIsPresent" is set to 0.

4.1.  Parameters

   The "Parameters" section contains significant characteristics used
   for all instances of "Frame". The pseudo-code below describes the
   contents of the bitstream.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Parameters( ) {                                               |
       version                                                   | ur
       if (version >= 3)                                         |
           micro_version                                         | ur
       coder_type                                                | ur
       if (coder_type > 1)                                       |
           for (i = 1; i < 256; i++)                             |
               state_transition_delta[ i ]                       | sr
       colorspace_type                                           | ur
       if (version >= 1)                                         |
           bits_per_raw_sample                                   | ur
       chroma_planes                                             | br
       log2_h_chroma_subsample                                   | ur
       log2_v_chroma_subsample                                   | ur
       alpha_plane                                               | br
       if (version >= 3) {                                       |
           num_h_slices - 1                                      | ur
           num_v_slices - 1                                      | ur
           quant_table_set_count                                 | ur
       }                                                         |
       for( i = 0; i < quant_table_set_count; i++ )              |
           QuantizationTableSet( i )                             |
       if (version >= 3) {                                       |
           for( i = 0; i < quant_table_set_count; i++ ) {        |
               states_coded                                      | br
               if (states_coded)                                 |
                   for( j = 0; j < context_count[ i ]; j++ )     |
                       for( k = 0; k < CONTEXT_SIZE; k++ )       |
                           initial_state_delta[ i ][ j ][ k ]    | sr
           }                                                     |
           ec                                                    | ur
           intra                                                 | ur
       }                                                         |
   }                                                             |

4.1.1.  version

   "version" specifies the version of the FFV1 bitstream.
   Each version is incompatible with other versions: decoders SHOULD
   reject a file with an unknown version.
   Decoders SHOULD reject a file with version <= 1 &&
   ConfigurationRecordIsPresent == 1.
   Decoders SHOULD reject a file with version >= 3 &&
   ConfigurationRecordIsPresent == 0.

   +-------+-------------------------+
   | value | version                 |
   +-------+-------------------------+
   | 0     | FFV1 version 0          |
   | 1     | FFV1 version 1          |
   | 2     | reserved*               |
   | 3     | FFV1 version 3          |
   | Other | reserved for future use |
   +-------+-------------------------+

   * Version 2 was never enabled in the encoder; thus version 2 files
   SHOULD NOT exist, and this document does not describe them to keep
   the text simpler.

4.1.2.  micro_version

   "micro_version" specifies the micro-version of the FFV1 bitstream.
   After a version is considered stable (a micro-version value is
   assigned to be the first stable variant of a specific version), each
   new micro-version after this first stable variant is compatible with
   the previous micro-version: decoders SHOULD NOT reject a file due to
   an unknown micro-version equal to or above the micro-version
   considered as stable.

   Meaning of micro_version for version 3:

   +-------+-------------------------+
   | value | micro_version           |
   +-------+-------------------------+
   | 0...3 | reserved*               |
   | 4     | first stable variant    |
   | Other | reserved for future use |
   +-------+-------------------------+

   * development versions may be incompatible with the stable variants.

4.1.3.  coder_type

   "coder_type" specifies the coder used.

   +-------+-------------------------------------------------+
   | value | coder used                                      |
   +-------+-------------------------------------------------+
   | 0     | Golomb Rice                                     |
   | 1     | Range Coder with default state transition table |
   | 2     | Range Coder with custom state transition table  |
   | Other | reserved for future use                         |
   +-------+-------------------------------------------------+

4.1.4.  state_transition_delta

   "state_transition_delta" specifies the Range coder custom state
   transition table.
   If state_transition_delta is not present in the FFV1 bitstream, all
   Range coder custom state transition table elements are assumed to be
   0.

4.1.5.  colorspace_type

   "colorspace_type" specifies the color space losslessly encoded, the
   Pixel transformation used by the encoder, and the interleave method.

   +-------+---------------------+------------------+------------------+
   | value | color space         | transformation   | interleave       |
   |       | losslessly encoded  |                  | method           |
   +-------+---------------------+------------------+------------------+
   | 0     | YCbCr               | No Pixel         | plane then line  |
   |       |                     | transformation   |                  |
   | 1     | RGB                 | JPEG2000-RCT     | line then plane  |
   | Other | reserved for future | reserved for     | reserved for     |
   |       | use                 | future use       | future use       |
   +-------+---------------------+------------------+------------------+

   Restrictions:
   If "colorspace_type" is 1, then "chroma_planes" MUST be 1,
   "log2_h_chroma_subsample" MUST be 0, and "log2_v_chroma_subsample"
   MUST be 0.

4.1.6.  chroma_planes

   "chroma_planes" indicates if chroma (color) planes are present.

   +-------+-------------------------------+
   | value | presence                      |
   +-------+-------------------------------+
   | 0     | chroma planes are not present |
   | 1     | chroma planes are present     |
   +-------+-------------------------------+

4.1.7.  bits_per_raw_sample

   "bits_per_raw_sample" indicates the number of bits for each sample.
   Inferred to be 8 if not present.

   +-------+---------------------------------+
   | value | bits for each sample            |
   +-------+---------------------------------+
   | 0     | reserved*                       |
   | Other | the actual bits for each sample |
   +-------+---------------------------------+

   * Encoders MUST NOT store bits_per_raw_sample = 0. Decoders SHOULD
   accept and interpret bits_per_raw_sample = 0 as 8.

4.1.8.  log2_h_chroma_subsample

   "log2_h_chroma_subsample" indicates the subsample factor, stored in
   powers to which the number 2 must be raised, between luma and chroma
   width ("chroma_width = 2^(-log2_h_chroma_subsample) * luma_width").

4.1.9.
log2_v_chroma_subsample

   "log2_v_chroma_subsample" indicates the subsample factor, stored in
   powers to which the number 2 must be raised, between luma and chroma
   height ("chroma_height = 2^(-log2_v_chroma_subsample) *
   luma_height").

4.1.10.  alpha_plane

   "alpha_plane" indicates if a transparency plane is present.

   +-------+-----------------------------------+
   | value | presence                          |
   +-------+-----------------------------------+
   | 0     | transparency plane is not present |
   | 1     | transparency plane is present     |
   +-------+-----------------------------------+

4.1.11.  num_h_slices

   "num_h_slices" indicates the number of horizontal elements of the
   slice raster.
   Inferred to be 1 if not present.

4.1.12.  num_v_slices

   "num_v_slices" indicates the number of vertical elements of the slice
   raster.
   Inferred to be 1 if not present.

4.1.13.  quant_table_set_count

   "quant_table_set_count" indicates the number of Quantization
   Table Sets.
   Inferred to be 1 if not present.
   MUST NOT be 0.

4.1.14.  states_coded

   "states_coded" indicates if the respective Quantization Table Set has
   the initial states coded.
   Inferred to be 0 if not present.

   +-------+-----------------------------------------------------------+
   | value | initial states                                            |
   +-------+-----------------------------------------------------------+
   | 0     | initial states are not present and are assumed to be all  |
   |       | 128                                                       |
   | 1     | initial states are present                                |
   +-------+-----------------------------------------------------------+

4.1.15.  initial_state_delta

   "initial_state_delta[ i ][ j ][ k ]" indicates the initial Range
   coder state. It is encoded using "k" as context index, and the
   initial state is derived from it as follows:

   pred = j ? initial_state[ i ][ j - 1 ][ k ] : 128

   initial_state[ i ][ j ][ k ] =
       ( pred + initial_state_delta[ i ][ j ][ k ] ) & 255

4.1.16.
ec

   "ec" indicates the error detection/correction type.

   +-------+--------------------------------------------+
   | value | error detection/correction type            |
   +-------+--------------------------------------------+
   | 0     | 32-bit CRC on the global header            |
   | 1     | 32-bit CRC per slice and the global header |
   | Other | reserved for future use                    |
   +-------+--------------------------------------------+

4.1.17.  intra

   "intra" indicates the relationship between the instances of "Frame".
   Inferred to be 0 if not present.

   +-------+-----------------------------------------------------------+
   | value | relationship                                              |
   +-------+-----------------------------------------------------------+
   | 0     | Frames are independent or dependent (keyframes and non    |
   |       | keyframes)                                                |
   | 1     | Frames are independent (keyframes only)                   |
   | Other | reserved for future use                                   |
   +-------+-----------------------------------------------------------+

4.2.  Configuration Record

   In the case of an FFV1 bitstream with "version >= 3", a
   "Configuration Record" is stored in the underlying "Container", at
   the track header level. It contains the "Parameters" used for all
   instances of "Frame". The size of the "Configuration Record",
   "NumBytes", is supplied by the underlying "Container".

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   ConfigurationRecord( NumBytes ) {                             |
       ConfigurationRecordIsPresent = 1                          |
       Parameters( )                                             |
       while( remaining_bits_in_bitstream( NumBytes ) > 32 )     |
           reserved_for_future_use                               | u(1)
       configuration_record_crc_parity                           | u(32)
   }                                                             |

4.2.1.  reserved_for_future_use

   "reserved_for_future_use" has semantics that are reserved for future
   use.
   Encoders conforming to this version of this specification SHALL NOT
   write this value.
   Decoders conforming to this version of this specification SHALL
   ignore its value.

4.2.2.  configuration_record_crc_parity

   "configuration_record_crc_parity" is 32 bits that are chosen so that
   the "Configuration Record" as a whole has a CRC remainder of 0.
   This is equivalent to storing the CRC remainder in the 32-bit parity.
   The CRC generator polynomial used is the standard IEEE CRC polynomial
   (0x104C11DB7) with initial value 0.

4.2.3.  Mapping FFV1 into Containers

   This "Configuration Record" can be placed in any file format
   supporting "Configuration Records", following as closely as possible
   the way the file format stores "Configuration Records". The
   "Configuration Record" storage place and "NumBytes" are currently
   defined and supported by this version of this specification for the
   following formats:

4.2.3.1.  AVI File Format

   The "Configuration Record" extends the stream format chunk ("AVI ",
   "hdrl", "strl", "strf") with the ConfigurationRecord bitstream.
   See [AVI] for more information about chunks.

   "NumBytes" is defined as the size, in bytes, of the strf chunk
   indicated in the chunk header minus the size of the stream format
   structure.

4.2.3.2.  ISO Base Media File Format

   The "Configuration Record" extends the sample description box
   ("moov", "trak", "mdia", "minf", "stbl", "stsd") with a "glbl" box
   that contains the ConfigurationRecord bitstream. See
   [ISO.14496-12.2015] for more information about boxes.

   "NumBytes" is defined as the size, in bytes, of the "glbl" box
   indicated in the box header minus the size of the box header.

4.2.3.3.  NUT File Format

   The codec_specific_data element (in "stream_header" packet) contains
   the ConfigurationRecord bitstream. See [NUT] for more information
   about elements.

   "NumBytes" is defined as the size, in bytes, of the
   codec_specific_data element as indicated in the "length" field of
   codec_specific_data.

4.2.3.4.  Matroska File Format

   FFV1 SHOULD use "V_FFV1" as the Matroska "Codec ID". For FFV1
   versions 2 or less, the Matroska "CodecPrivate" Element SHOULD NOT be
   used. For FFV1 versions 3 or greater, the Matroska "CodecPrivate"
   Element MUST contain the FFV1 "Configuration Record" structure and no
   other data. See [Matroska] for more information about elements.

   "NumBytes" is defined as the "Element Data Size" of the
   "CodecPrivate" Element.

4.3.  Frame

   A "Frame" consists of the keyframe field, "Parameters" (if version
   <= 1), and a sequence of independent slices.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Frame( NumBytes ) {                                           |
       keyframe                                                  | br
       if (keyframe && !ConfigurationRecordIsPresent)            |
           Parameters( )                                         |
       while ( remaining_bits_in_bitstream( NumBytes ) )         |
           Slice( )                                              |
   }                                                             |

   Architecture overview of slices in a "Frame":

   +-----------------------------------------------------------------+
   | first slice header                                              |
   | first slice content                                             |
   | first slice footer                                              |
   | --------------------------------------------------------------- |
   | second slice header                                             |
   | second slice content                                            |
   | second slice footer                                             |
   | --------------------------------------------------------------- |
   | ...                                                             |
   | --------------------------------------------------------------- |
   | last slice header                                               |
   | last slice content                                              |
   | last slice footer                                               |
   +-----------------------------------------------------------------+

4.4.
Slice

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Slice( ) {                                                    |
       if (version >= 3)                                         |
           SliceHeader( )                                        |
       SliceContent( )                                           |
       if (coder_type == 0)                                      |
           while (!byte_aligned())                               |
               padding                                           | u(1)
       if (version <= 1) {                                       |
           while (remaining_bits_in_bitstream( NumBytes ) != 0 ) |
               reserved                                          | u(1)
       }                                                         |
       if (version >= 3)                                         |
           SliceFooter( )                                        |
   }                                                             |

   "padding" specifies a bit without any significance that is used only
   for byte alignment. It MUST be 0.

   "reserved" specifies a bit without any significance in this revision
   of the specification; it may have a significance in a later revision
   of this specification.
   Encoders SHOULD NOT fill these bits.
   Decoders SHOULD ignore these bits.
   Note in case these bits are used in a later revision of this
   specification: any revision of this specification SHOULD avoid
   adding 40 bits of content after "SliceContent" for version 0 and 1
   of the bitstream. Background: due to some non-conforming encoders,
   some bitstreams were found with 40 extra bits corresponding to
   "error_status" and "slice_crc_parity"; a decoder conforming to the
   revised specification could not distinguish between a revised
   bitstream and a buggy bitstream.

4.5.  Slice Header

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   SliceHeader( ) {                                              |
       slice_x                                                   | ur
       slice_y                                                   | ur
       slice_width - 1                                           | ur
       slice_height - 1                                          | ur
       for( i = 0; i < quant_table_set_index_count; i++ )        |
           quant_table_set_index[ i ]                            | ur
       picture_structure                                         | ur
       sar_num                                                   | ur
       sar_den                                                   | ur
   }                                                             |

4.5.1.  slice_x

   "slice_x" indicates the x position on the slice raster formed by
   num_h_slices.
   Inferred to be 0 if not present.

4.5.2.
slice_y

   "slice_y" indicates the y position on the slice raster formed by
   num_v_slices.
   Inferred to be 0 if not present.

4.5.3.  slice_width

   "slice_width" indicates the width on the slice raster formed by
   num_h_slices.
   Inferred to be 1 if not present.

4.5.4.  slice_height

   "slice_height" indicates the height on the slice raster formed by
   num_v_slices.
   Inferred to be 1 if not present.

4.5.5.  quant_table_set_index_count

   "quant_table_set_index_count" is defined as "1 + ( ( chroma_planes ||
   version <= 3 ) ? 1 : 0 ) + ( alpha_plane ? 1 : 0 )".

4.5.6.  quant_table_set_index

   "quant_table_set_index" indicates the Quantization Table Set index to
   select the Quantization Table Set and the initial states for the
   slice.

   Inferred to be 0 if not present.

4.5.7.  picture_structure

   "picture_structure" specifies the temporal and spatial relationship
   of each line of the "Frame".
   Inferred to be 0 if not present.

   +-------+-------------------------+
   | value | picture structure used  |
   +-------+-------------------------+
   | 0     | unknown                 |
   | 1     | top field first         |
   | 2     | bottom field first      |
   | 3     | progressive             |
   | Other | reserved for future use |
   +-------+-------------------------+

4.5.8.  sar_num

   "sar_num" specifies the sample aspect ratio numerator.
   Inferred to be 0 if not present.
   MUST be 0 if sample aspect ratio is unknown.

4.5.9.  sar_den

   "sar_den" specifies the sample aspect ratio denominator.
   Inferred to be 0 if not present.
   MUST be 0 if sample aspect ratio is unknown.

4.6.
Slice Content

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   SliceContent( ) {                                             |
       if (colorspace_type == 0) {                               |
           for( p = 0; p < primary_color_count; p++ )            |
               for( y = 0; y < plane_pixel_height[ p ]; y++ )    |
                   Line( p, y )                                  |
       } else if (colorspace_type == 1) {                        |
           for( y = 0; y < slice_pixel_height; y++ )             |
               for( p = 0; p < primary_color_count; p++ )        |
                   Line( p, y )                                  |
       }                                                         |
   }                                                             |

4.6.1.  primary_color_count

   "primary_color_count" is defined as "1 + ( chroma_planes ? 2 : 0 ) +
   ( alpha_plane ? 1 : 0 )".

4.6.2.  plane_pixel_height

   "plane_pixel_height[ p ]" is the height in pixels of plane p of the
   slice.
   "plane_pixel_height[ 0 ]" and "plane_pixel_height[ 1 + (
   chroma_planes ? 2 : 0 ) ]" value is "slice_pixel_height".
   If "chroma_planes" is set to 1, "plane_pixel_height[ 1 ]" and
   "plane_pixel_height[ 2 ]" value is "ceil(slice_pixel_height / (1 <<
   log2_v_chroma_subsample))".

4.6.3.  slice_pixel_height

   "slice_pixel_height" is the height in pixels of the slice.
   Its value is "floor(( slice_y + slice_height ) * frame_pixel_height /
   num_v_slices) - slice_pixel_y".

4.6.4.  slice_pixel_y

   "slice_pixel_y" is the slice vertical position in pixels.
   Its value is "floor(slice_y * frame_pixel_height / num_v_slices)".

4.7.  Line

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   Line( p, y ) {                                                |
       if (colorspace_type == 0) {                               |
           for( x = 0; x < plane_pixel_width[ p ]; x++ )         |
               sample_difference[ p ][ y ][ x ]                  |
       } else if (colorspace_type == 1) {                        |
           for( x = 0; x < slice_pixel_width; x++ )              |
               sample_difference[ p ][ y ][ x ]                  |
       }                                                         |
   }                                                             |

4.7.1.  plane_pixel_width

   "plane_pixel_width[ p ]" is the width in pixels of plane p of the
   slice.
   "plane_pixel_width[ 0 ]" and "plane_pixel_width[ 1 + ( chroma_planes ?
   2 : 0 ) ]" value is "slice_pixel_width".

   If "chroma_planes" is set to 1, "plane_pixel_width[ 1 ]" and
   "plane_pixel_width[ 2 ]" value is "ceil(slice_pixel_width / (1 <<
   log2_h_chroma_subsample))".

4.7.2.  slice_pixel_width

   "slice_pixel_width" is the width in pixels of the slice.
   Its value is "floor(( slice_x + slice_width ) * frame_pixel_width /
   num_h_slices) - slice_pixel_x".

4.7.3.  slice_pixel_x

   "slice_pixel_x" is the slice horizontal position in pixels.
   Its value is "floor(slice_x * frame_pixel_width / num_h_slices)".

4.7.4.  sample_difference

   "sample_difference[ p ][ y ][ x ]" is the sample difference for
   sample at plane "p", y position "y", and x position "x". The sample
   value is computed based on prediction and context described in
   Section 3.2.

4.8.  Slice Footer

   Note: slice footer is always byte aligned.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   SliceFooter( ) {                                              |
       slice_size                                                | u(24)
       if (ec) {                                                 |
           error_status                                          | u(8)
           slice_crc_parity                                      | u(32)
       }                                                         |
   }                                                             |

4.8.1.  slice_size

   "slice_size" indicates the size of the slice in bytes.
   Note: this allows finding the start of slices before previous slices
   have been fully decoded, and allows parallel decoding as well as
   error resilience.

4.8.2.  error_status

   "error_status" specifies the error status.

   +-------+---------------------------------------+
   | value | error status                          |
   +-------+---------------------------------------+
   | 0     | no error                              |
   | 1     | slice contains a correctable error    |
   | 2     | slice contains an uncorrectable error |
   | Other | reserved for future use               |
   +-------+---------------------------------------+

4.8.3.  slice_crc_parity

   "slice_crc_parity" is 32 bits that are chosen so that the slice as a
   whole has a CRC remainder of 0.
   This is equivalent to storing the CRC remainder in the 32-bit parity.
   The CRC generator polynomial used is the standard IEEE CRC polynomial
   (0x104C11DB7) with initial value 0.

4.9.  Quantization Table Set

   The Quantization Table Sets are stored by storing the number of equal
   entries minus 1 of the first half of the table (represented as
   "len - 1" in the pseudo-code below) using the method described in
   Section 3.8.1.2. The second half doesn't need to be stored as it is
   identical to the first with flipped sign. "scale" and
   "len_count[ i ][ j ]" are temporary values used for the computing of
   "context_count[ i ]" and are not used outside Quantization Table Set
   pseudo-code.

   example:

   Table: 0 0 1 1 1 1 2 2 -2 -2 -2 -1 -1 -1 -1 0

   Stored values: 1, 3, 1

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   QuantizationTableSet( i ) {                                   |
       scale = 1                                                 |
       for( j = 0; j < MAX_CONTEXT_INPUTS; j++ ) {               |
           QuantizationTable( i, j, scale )                      |
           scale *= 2 * len_count[ i ][ j ] - 1                  |
       }                                                         |
       context_count[ i ] = ceil( scale / 2 )                    |
   }                                                             |

   MAX_CONTEXT_INPUTS is 5.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   QuantizationTable( i, j, scale ) {                            |
       v = 0                                                     |
       for( k = 0; k < 128; ) {                                  |
           len - 1                                               | ur
           for( a = 0; a < len; a++ ) {                          |
               quant_tables[ i ][ j ][ k ] = scale * v           |
               k++                                               |
           }                                                     |
           v++                                                   |
       }                                                         |
       for( k = 1; k < 128; k++ ) {                              |
           quant_tables[ i ][ j ][ 256 - k ] = \                 |
               -quant_tables[ i ][ j ][ k ]                      |
       }                                                         |
       quant_tables[ i ][ j ][ 128 ] = \                         |
           -quant_tables[ i ][ j ][ 127 ]                        |
       len_count[ i ][ j ] = v                                   |
   }                                                             |

4.9.1.  quant_tables

   "quant_tables[ i ][ j ][ k ]" indicates the quantization table
   value of the Quantized Sample Difference "k" of the Quantization
   Table "j" of the Quantization Table Set "i".
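
   The run-length storage described in Section 4.9 can be illustrated
   with a small C sketch. This is not the reference implementation:
   the function name expand_quant_table, the "size" parameter, and
   passing the already-decoded "len - 1" values as a plain array
   (instead of reading "ur" symbols from the Range coder) are
   assumptions made for illustration only. The error checks are also
   an addition and are not required by the pseudo-code.

```c
#include <stddef.h>

/*
 * Illustrative sketch only (hypothetical helper, not from the draft).
 * Expands run lengths into a table whose second half mirrors the first
 * half with flipped sign.
 *
 * stored       : already-decoded "len - 1" values
 * stored_count : number of entries in stored
 * scale        : multiplier applied to each table value
 * table        : output array with room for size entries
 * size         : table size; 256 in FFV1, smaller even sizes are
 *                accepted so the 16-entry example can be checked
 *
 * Returns the number of distinct values in the first half (len_count),
 * or -1 if the input is malformed.
 */
int expand_quant_table(const int *stored, size_t stored_count,
                       int scale, int *table, size_t size)
{
    size_t half = size / 2;
    size_t k = 0;
    int v = 0;

    if (size < 2 || size % 2 != 0)
        return -1;

    /* First half: each stored value is a run length minus 1. */
    for (size_t s = 0; k < half; s++) {
        if (s >= stored_count || stored[s] < 0)
            return -1;              /* ran out of input or bad value */
        size_t len = (size_t)stored[s] + 1;
        if (len > half - k)
            return -1;              /* run would leave the first half */
        for (size_t a = 0; a < len; a++)
            table[k++] = scale * v;
        v++;
    }

    /* Second half: identical to the first with flipped sign. */
    for (k = 1; k < half; k++)
        table[size - k] = -table[k];
    table[half] = -table[half - 1];

    return v;
}
```

   Called with the stored values 1, 3, 1, a scale of 1, and a size of
   16, the sketch reproduces the 16-entry example table from
   Section 4.9 and returns 3. An FFV1 decoder would use a size of 256.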

4.9.2.  context_count

   "context_count[ i ]" indicates the count of contexts for Quantization
   Table Set "i".

5.  Restrictions

   To ensure that fast multithreaded decoding is possible, starting with
   version 3 and if frame_pixel_width * frame_pixel_height is more than
   101376, slice_width * slice_height MUST be less than or equal to
   num_h_slices * num_v_slices / 4. Note: 101376 is the frame size in
   pixels of a 352x288 frame, also known as the CIF ("Common
   Intermediate Format") frame size.

   For each "Frame", each position in the slice raster MUST be filled by
   one and only one slice of the "Frame" (no missing slice position, no
   slice overlapping).

   For each "Frame" with keyframe value of 0, each slice MUST have the
   same value of slice_x, slice_y, slice_width, slice_height as a slice
   in the previous "Frame".

6.  Security Considerations

   Like any other codec (such as [RFC6716]), FFV1 should not be used
   with insecure ciphers or cipher-modes that are vulnerable to known
   plaintext attacks. Some of the header bits as well as the padding
   are easily predictable.

   Implementations of the FFV1 codec need to take appropriate security
   considerations into account, as outlined in [RFC4732]. It is
   extremely important for the decoder to be robust against malicious
   payloads. Malicious payloads must not cause the decoder to overrun
   its allocated memory or to take an excessive amount of resources to
   decode. Although problems in encoders are typically rarer, the same
   applies to the encoder. Malicious video streams must not cause the
   encoder to misbehave because this would allow an attacker to attack
   transcoding gateways.
   A frequent security problem in image and video codecs is failing to
   check for integer overflows in Pixel count computations, that is,
   allocating width * height without considering that the
   multiplication result may have overflowed the arithmetic type's
   range.

   The reference implementation [REFIMPL] contains no known buffer
   overflow or cases where a specially crafted packet or video segment
   could cause a significant increase in CPU load.

   The reference implementation [REFIMPL] was validated in the following
   conditions:

   o  Sending the decoder valid packets generated by the reference
      encoder and verifying that the decoder's output matches the
      encoder's input.

   o  Sending the decoder packets generated by the reference encoder and
      then subjected to random corruption.

   o  Sending the decoder random packets that are not FFV1.

   In all of the conditions above, the decoder and encoder were run
   inside the [VALGRIND] memory debugger as well as clang's address
   sanitizer [Address-Sanitizer], which track reads and writes to
   invalid memory regions as well as the use of uninitialized memory.
   There were no errors reported on any of the tested conditions.

7.  Media Type Definition

   This registration is done using the template defined in [RFC6838] and
   following [RFC4855].

   Type name: video

   Subtype name: FFV1

   Required parameters: None.

   Optional parameters:

   These parameters are used to signal the capabilities of a receiver
   implementation. These parameters MUST NOT be used for any other
   purpose.

   version: The version of the FFV1 encoding as defined by
   Section 4.1.1.

   micro_version: The micro_version of the FFV1 encoding as defined by
   Section 4.1.2.

   coder_type: The coder_type of the FFV1 encoding as defined by
   Section 4.1.3.

   colorspace_type: The colorspace_type of the FFV1 encoding as defined
   by Section 4.1.5.

   bits_per_raw_sample: The bits_per_raw_sample of the FFV1 encoding as
   defined by Section 4.1.7.

   max-slices: The value of max-slices is an integer indicating the
   maximum count of slices within a frame of the FFV1 encoding.

   Encoding considerations:

   This media type is defined for encapsulation in several audiovisual
   container formats and contains binary data; see Section 4.2.3. This
   media type is framed binary data (see Section 4.8 of [RFC4288]).

   Security considerations:

   See Section 6 of this document.

   Interoperability considerations: None.

   Published specification:

   [I-D.ietf-cellar-ffv1] and RFC XXXX.

   [RFC Editor: Upon publication as an RFC, please replace "XXXX" with
   the number assigned to this document and remove this note.]

   Applications which use this media type:

   Any application that requires the transport of lossless video can use
   this media type. Examples include, but are not limited to, screen
   recording, scientific imaging, and digital video preservation.

   Fragment identifier considerations: N/A.

   Additional information: None.

   Person & email address to contact for further information: Michael
   Niedermayer

   Intended usage: COMMON

   Restrictions on usage: None.

   Author: Dave Rice

   Change controller: IETF cellar working group delegated from the IESG.

8.  IANA Considerations

   The IANA is requested to register the following values:

   o  Media type registration as described in Section 7.

9.  Appendixes

9.1.  Decoder implementation suggestions

9.1.1.  Multi-threading Support and Independence of Slices

   The FFV1 bitstream is parsable in two ways: in sequential order as
   described in this document or with the pre-analysis of the footer of
   each slice. Each slice footer contains a slice_size field so the
   boundary of each slice is computable without having to parse the
   slice content.
   Knowing the slice boundaries in advance allows multi-threading as
   well as independence of slice content (a bitstream error in a slice
   header or slice content has no impact on the decoding of the other
   slices).

   After having checked the keyframe field, a decoder SHOULD parse the
   slice_size fields, from the slice_size of the last slice at the end
   of the "Frame" up to the slice_size of the first slice at the
   beginning of the "Frame", before parsing slices, in order to obtain
   the slice boundaries.  A decoder MAY fall back on sequential order,
   e.g., in case of a corrupted "Frame" (frame size unknown, slice_size
   of slices not coherent...) or if there is no possibility of seeking
   in the stream.

10.  Changelog

   See

11.  References

11.1.  Normative References

   [I-D.ietf-cellar-ffv1]
              Niedermayer, M., Rice, D., and J. Martinez, "FFV1 Video
              Coding Format Version 0, 1, and 3", draft-ietf-cellar-
              ffv1-04 (work in progress), July 2018.

   [ISO.15444-1.2016]
              International Organization for Standardization,
              "Information technology -- JPEG 2000 image coding system:
              Core coding system", October 2016.

   [ISO.9899.1990]
              International Organization for Standardization,
              "Programming languages - C", ISO Standard 9899, 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
              Registration Procedures", RFC 4288, DOI 10.17487/RFC4288,
              December 2005.

   [RFC4732]  Handley, M., Ed., Rescorla, E., Ed., and IAB, "Internet
              Denial-of-Service Considerations", RFC 4732,
              DOI 10.17487/RFC4732, December 2006.

   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
              Formats", RFC 4855, DOI 10.17487/RFC4855, February 2007.

   [RFC6716]  Valin, JM., Vos, K., and T.
              Terriberry, "Definition of the Opus Audio Codec",
              RFC 6716, DOI 10.17487/RFC6716, September 2012.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13,
              RFC 6838, DOI 10.17487/RFC6838, January 2013.

11.2.  Informative References

   [Address-Sanitizer]
              The Clang Team, "ASAN AddressSanitizer website", undated.

   [AVI]      Microsoft, "AVI RIFF File Reference", undated.

   [FFV1_V0]  Niedermayer, M., "Commit to mark FFV1 version 0 as non-
              experimental", April 2006.

   [FFV1_V1]  Niedermayer, M., "Commit to release FFV1 version 1", April
              2009.

   [FFV1_V3]  Niedermayer, M., "Commit to mark FFV1 version 3 as non-
              experimental", August 2013.

   [HuffYUV]  Rudiak-Gould, B., "HuffYUV", December 2003.

   [ISO.14495-1.1999]
              International Organization for Standardization,
              "Information technology -- Lossless and near-lossless
              compression of continuous-tone still images: Baseline",
              December 1999.

   [ISO.14496-10.2014]
              International Organization for Standardization,
              "Information technology -- Coding of audio-visual objects
              -- Part 10: Advanced Video Coding", September 2014.

   [ISO.14496-12.2015]
              International Organization for Standardization,
              "Information technology -- Coding of audio-visual objects
              -- Part 12: ISO base media file format", December 2015.

   [Matroska]
              IETF, "Matroska", 2016.

   [NUT]      Niedermayer, M., "NUT Open Container Format", December
              2013.

   [range-coding]
              Martin, G. N. N., "Range encoding: an algorithm for
              removing redundancy from a digitised message", Proc.
              Institution of Electronic and Radio Engineers
              International Conference on Video and Data Recording,
              July 1979.

   [REFIMPL]  Niedermayer, M., "The reference FFV1 implementation / the
              FFV1 codec in FFmpeg", undated.

   [VALGRIND]
              Valgrind Developers, "Valgrind website", undated.

   [YCbCr]    Wikipedia, "YCbCr", undated.

Authors' Addresses

   Michael Niedermayer

   Email: michael@niedermayer.cc

   Dave Rice

   Email: dave@dericed.com

   Jerome Martinez

   Email: jerome@mediaarea.net