cellar                                                    M. Niedermayer
Internet-Draft
Intended status: Standards Track                                 D. Rice
Expires: October 24, 2018                                    J. Martinez
                                                          April 22, 2018

                   FFV1 Video Coding Format Version 4
                      draft-ietf-cellar-ffv1-v4-00

Abstract

   This document defines FFV1, a lossless intra-frame video encoding
   format. FFV1 is designed to efficiently compress video data in a
   variety of pixel formats. Compared to uncompressed video, FFV1
   offers storage compression, frame fixity, and self-description, which
   makes FFV1 useful as a preservation or intermediate video format.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 24, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Notation and Conventions
     2.1.  Definitions
     2.2.  Conventions
       2.2.1.  Arithmetic operators
       2.2.2.  Assignment operators
       2.2.3.  Comparison operators
       2.2.4.  Mathematical functions
       2.2.5.  Order of operation precedence
       2.2.6.  Pseudo-code
       2.2.7.  Range
       2.2.8.  NumBytes
       2.2.9.  Bitstream functions
   3.  General Description
     3.1.  Border
     3.2.  Samples
     3.3.  Median predictor
     3.4.  Context
     3.5.  Quantization Table Sets
     3.6.  Quantization Table Set indexes
     3.7.  Color spaces
       3.7.1.  YCbCr
       3.7.2.  RGB
     3.8.  Coding of the Sample Difference
       3.8.1.  Range coding mode
       3.8.2.  Golomb Rice mode
   4.  Bitstream
     4.1.  Configuration Record
       4.1.1.  reserved_for_future_use
       4.1.2.  configuration_record_crc_parity
       4.1.3.  Mapping FFV1 into Containers
     4.2.  Frame
     4.3.  Slice
     4.4.  Slice Header
       4.4.1.  slice_x
       4.4.2.  slice_y
       4.4.3.  slice_width
       4.4.4.  slice_height
       4.4.5.  quant_table_set_index_count
       4.4.6.  quant_table_set_index
       4.4.7.  picture_structure
       4.4.8.  sar_num
       4.4.9.  sar_den
       4.4.10. reset_contexts
       4.4.11. slice_coding_mode
     4.5.  Slice Content
       4.5.1.  primary_color_count
       4.5.2.  plane_pixel_height
       4.5.3.  slice_pixel_height
       4.5.4.  slice_pixel_y
     4.6.  Line
       4.6.1.  plane_pixel_width
       4.6.2.  slice_pixel_width
       4.6.3.  slice_pixel_x
       4.6.4.  sample_difference
     4.7.  Slice Footer
       4.7.1.  slice_size
       4.7.2.  error_status
       4.7.3.  slice_crc_parity
     4.8.  Parameters
       4.8.1.  version
       4.8.2.  micro_version
       4.8.3.  coder_type
       4.8.4.  state_transition_delta
       4.8.5.  colorspace_type
       4.8.6.  chroma_planes
       4.8.7.  bits_per_raw_sample
       4.8.8.  log2_h_chroma_subsample
       4.8.9.  log2_v_chroma_subsample
       4.8.10. alpha_plane
       4.8.11. num_h_slices
       4.8.12. num_v_slices
       4.8.13. quant_table_set_count
       4.8.14. states_coded
       4.8.15. initial_state_delta
       4.8.16. ec
       4.8.17. intra
     4.9.  Quantization Table Set
       4.9.1.  quant_tables
       4.9.2.  context_count
   5.  Restrictions
   6.  Security Considerations
   7.  Media Type Definition
   8.  IANA Considerations
   9.  Appendixes
     9.1.  Decoder implementation suggestions
       9.1.1.  Multi-threading support and independence of slices
   10. Changelog
   11. ToDo
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document describes FFV1, a lossless video encoding format. The
   design of FFV1 considers the storage of image characteristics, data
   fixity, and the optimized use of encoding time and storage
   requirements. FFV1 is designed to support a wide range of lossless
   video applications such as long-term audiovisual preservation,
   scientific imaging, screen recording, and other video encoding
   scenarios that seek to avoid the generational loss of lossy video
   encodings.

   This document defines version 4 of FFV1. Prior versions of FFV1
   are defined within [I-D.ietf-cellar-ffv1].

   The latest version of this document is available at

   This document assumes familiarity with mathematical and coding
   concepts such as Range coding [range-coding] and YCbCr color spaces
   [YCbCr].

2.  Notation and Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.1.  Definitions

   "Container": Format that encapsulates "Frames" and (when required) a
   "Configuration Record" into a bitstream.

   "Sample": The smallest addressable representation of a color
   component or a luma component in a "Frame". Examples of samples are
   Luma, Blue Chrominance, Red Chrominance, Alpha, Red, Green, and Blue.

   "Pixel": The smallest addressable representation of a color in a
   "Frame". It is composed of 1 or more samples.

   "ESC": An ESCape symbol to indicate that the symbol to be stored is
   too large for normal storage and that an alternate storage method is
   used.

   "MSB": Most Significant Bit, the bit that can cause the largest
   change in magnitude of the symbol.

   "RCT": Reversible Color Transform, a near linear, exactly reversible
   integer transform that converts between RGB and YCbCr representations
   of a Pixel.

   "VLC": Variable Length Code, a code that maps source symbols to a
   variable number of bits.

   "RGB": A reference to the method of storing the value of a Pixel by
   using three numeric values that represent Red, Green, and Blue.

   "YCbCr": A reference to the method of storing the value of a Pixel by
   using three numeric values that represent the luma of the Pixel (Y)
   and the chrominance of the Pixel (Cb and Cr). The term YCbCr is used
   for historical reasons and currently refers to any color space
   relying on 1 luma sample and 2 chrominance samples, e.g. YCbCr, YCgCo
   or ICtCp. The exact meaning of the three numeric values is
   unspecified.

   "TBA": To Be Announced. Used in reference to the development of
   future iterations of the FFV1 specification.

2.2.  Conventions

   Note: the operators and the order of precedence are the same as used
   in the C programming language [ISO.9899.1990].

2.2.1.  Arithmetic operators

   "a + b" means a plus b.

   "a - b" means a minus b.

   "-a" means negation of a.

   "a * b" means a multiplied by b.

   "a / b" means a divided by b.

   "a & b" means bit-wise "and" of a and b.

   "a | b" means bit-wise "or" of a and b.

   "a >> b" means arithmetic right shift of two's complement integer
   representation of a by b binary digits.

   "a << b" means arithmetic left shift of two's complement integer
   representation of a by b binary digits.

2.2.2.  Assignment operators

   "a = b" means a is assigned b.

   "a++" is equivalent to a is assigned a + 1.

   "a--" is equivalent to a is assigned a - 1.

   "a += b" is equivalent to a is assigned a + b.

   "a -= b" is equivalent to a is assigned a - b.

   "a *= b" is equivalent to a is assigned a * b.

2.2.3.  Comparison operators

   "a > b" means a is greater than b.

   "a >= b" means a is greater than or equal to b.

   "a < b" means a is less than b.

   "a <= b" means a is less than or equal to b.

   "a == b" means a is equal to b.

   "a != b" means a is not equal to b.

   "a && b" means Boolean logical "and" of a and b.

   "a || b" means Boolean logical "or" of a and b.

   "!a" means Boolean logical "not" of a.

   "a ? b : c" if a is true, then b, otherwise c.

2.2.4.  Mathematical functions

   floor(a) the largest integer less than or equal to a

   ceil(a) the smallest integer greater than or equal to a

   sign(a) extracts the sign of a number, i.e. if a < 0 then -1, else if
   a > 0 then 1, else 0

   abs(a) the absolute value of a, i.e. abs(a) = sign(a)*a

   log2(a) the base-two logarithm of a

   min(a,b) the smallest of two values a and b

   max(a,b) the largest of two values a and b

   median(a,b,c) the numerical middle value in a data set of a, b, and
   c, i.e. a+b+c-min(a,b,c)-max(a,b,c)

   a_{b} the b-th value of a sequence of a

   a_{b,c} the 'b,c'-th value of a sequence of a

2.2.5.  Order of operation precedence

   When order of precedence is not indicated explicitly by use of
   parentheses, operations are evaluated in the following order (from
   top to bottom, operations of same precedence being evaluated from
   left to right). This order of operations is based on the order of
   operations used in Standard C.

      a++, a--
      !a, -a
      a * b, a / b, a % b
      a + b, a - b
      a << b, a >> b
      a < b, a <= b, a > b, a >= b
      a == b, a != b
      a & b
      a | b
      a && b
      a || b
      a ? b : c
      a = b, a += b, a -= b, a *= b

2.2.6.  Pseudo-code

   The FFV1 bitstream is described in this document using pseudo-code.
   Note that the pseudo-code is used for clarity in order to illustrate
   the structure of FFV1 and is not intended to specify any particular
   implementation. The pseudo-code used is based upon the C programming
   language [ISO.9899.1990] and uses its "if/else", "while" and "for"
   keywords as well as functions defined within this document.

2.2.7.  Range

   "a...b" means any value starting from a to b, inclusive.

2.2.8.  NumBytes

   "NumBytes" is a non-negative integer that expresses the size in 8-bit
   octets of a particular FFV1 "Configuration Record" or "Frame". FFV1
   relies on its "Container" to store the "NumBytes" values, see
   Section 4.1.3.

2.2.9.  Bitstream functions

2.2.9.1.  remaining_bits_in_bitstream

   "remaining_bits_in_bitstream( )" means the count of remaining bits
   after the pointer in that "Configuration Record" or "Frame". It is
   computed from the "NumBytes" value multiplied by 8 minus the count of
   bits of that "Configuration Record" or "Frame" already read by the
   bitstream parser.

2.2.9.2.  byte_aligned

   "byte_aligned( )" is true if "remaining_bits_in_bitstream( NumBytes
   )" is a multiple of 8, otherwise false.

2.2.9.3.  get_bits

   "get_bits( i )" is the action to read the next "i" bits in the
   bitstream, from most significant bit to least significant bit, and to
   return the corresponding value. The pointer is increased by "i".

3.  General Description

   Samples within a plane are coded in raster scan order (left->right,
   top->bottom). Each sample is predicted by the median predictor from
   samples in the same plane and the difference is stored; see
   Section 3.8.

3.1.  Border

   A border is assumed for each coded slice for the purpose of the
   predictor and context according to the following rules:

   o  one column of samples to the left of the coded slice is assumed as
      identical to the samples of the leftmost column of the coded slice
      shifted down by one row.
      The value of the topmost sample of the
      column of samples to the left of the coded slice is assumed to be
      "0"

   o  one column of samples to the right of the coded slice is assumed
      as identical to the samples of the rightmost column of the coded
      slice

   o  an additional column of samples to the left of the coded slice and
      two rows of samples above the coded slice are assumed to be "0"

   The following table depicts a slice of samples "a,b,c,d,e,f,g,h,i"
   along with its assumed border.

   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | 0 | 0 | 0 |   | 0 |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | 0 | 0 | 0 |   | 0 |
   +---+---+---+---+---+---+---+---+
   |   |   |   |   |   |   |   |   |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |   | a | b | c |   | c |
   +---+---+---+---+---+---+---+---+
   | 0 | a |   | d | e | f |   | f |
   +---+---+---+---+---+---+---+---+
   | 0 | d |   | g | h | i |   | i |
   +---+---+---+---+---+---+---+---+

3.2.  Samples

   Positions used for context and median predictor are:

   +---+---+---+---+
   |   |   | T |   |
   +---+---+---+---+
   |   |tl | t |tr |
   +---+---+---+---+
   | L | l | X |   |
   +---+---+---+---+

   "X" is the current processed Sample. The identifiers are made of the
   first letters of the words Top, Left and Right.

3.3.  Median predictor

   The prediction for any sample value at position "X" may be computed
   based upon the relative neighboring values of "l", "t", and "tl" via
   this equation:

   "median(l, t, l + t - tl)".

   Note, this prediction template is also used in [ISO.14495-1.1999] and
   [HuffYUV].

   Exception for the median predictor: if "colorspace_type == 0 &&
   bits_per_raw_sample == 16 && ( coder_type == 1 || coder_type == 2 )",
   the following median predictor MUST be used:

   "median(left16s, top16s, left16s + top16s - diag16s)"

   where:

   left16s = l  >= 32768 ? ( l  - 65536 ) : l
   top16s  = t  >= 32768 ? ( t  - 65536 ) : t
   diag16s = tl >= 32768 ? ( tl - 65536 ) : tl

   Background: a two's complement 16-bit signed integer was used for
   storing sample values in all known implementations of the FFV1
   bitstream. So in some circumstances, the most significant bit was
   wrongly interpreted (used as a sign bit instead of the 16th bit of an
   unsigned integer). Note that when the issue was discovered, the only
   configuration impacted in all known implementations was 16-bit YCbCr
   with no Pixel transformation and the Range Coder, as other
   potentially impacted configurations (e.g. 15/16-bit JPEG2000-RCT with
   the Range Coder, or 16-bit content with the Golomb Rice coder) were
   not implemented anywhere. In the meantime, 16-bit JPEG2000-RCT with
   the Range Coder was implemented without this issue in one
   implementation and validated by one conformance checker. It is
   expected (to be confirmed) that this exception for the median
   predictor will be removed in the next version of the FFV1 bitstream.

3.4.  Context

   Relative to any sample "X", the Quantized Sample Differences "L-l",
   "l-tl", "tl-t", "T-t", and "t-tr" are used as context:

   context = Q_{0}[l - tl] +
             Q_{1}[tl - t] +
             Q_{2}[t - tr] +
             Q_{3}[L - l]  +
             Q_{4}[T - t]

   If "context >= 0" then "context" is used and the difference between
   the sample and its predicted value is encoded as is, else "-context"
   is used and the difference between the sample and its predicted value
   is encoded with a flipped sign.

3.5.  Quantization Table Sets

   The FFV1 bitstream contains 1 or more Quantization Table Sets. Each
   Quantization Table Set contains exactly 5 Quantization Tables, each
   Quantization Table corresponding to 1 of the 5 Quantized Sample
   Differences.
   For each Quantization Table, both the number of
   quantization steps and their distribution are stored in the FFV1
   bitstream; each Quantization Table has exactly 256 entries, and the 8
   least significant bits of the Quantized Sample Difference are used as
   index:

   Q_{j}[k] = quant_tables[i][j][k&255]

   In this formula, "i" is the Quantization Table Set index, "j" is the
   Quantization Table index, and "k" is the Quantized Sample Difference.

3.6.  Quantization Table Set indexes

   For each plane of each slice, a Quantization Table Set is selected
   from an index:

   o  For Y plane, "quant_table_set_index [ 0 ]" index is used

   o  For Cb and Cr planes, "quant_table_set_index [ 1 ]" index is used

   o  For Alpha plane, "quant_table_set_index [ (version <= 3 ||
      chroma_planes) ? 2 : 1 ]" index is used

   Background: in the first implementations of the FFV1 bitstream, the
   index for the Cb and Cr planes was stored even if it was not used
   (chroma_planes set to 0). This index is kept for version <= 3 in
   order to keep compatibility with FFV1 bitstreams in the wild.

3.7.  Color spaces

   FFV1 supports two color spaces: YCbCr and RGB. Both color spaces
   allow an optional Alpha plane that can be used to code transparency
   data.

3.7.1.  YCbCr

   In YCbCr color space, the Cb and Cr planes are optional, but if used
   then MUST be used together. Omitting the Cb and Cr planes codes the
   frames in grayscale without color data. An FFV1 "Frame" using YCbCr
   MUST use one of the following arrangements:

   o  Y

   o  Y, Alpha

   o  Y, Cb, Cr

   o  Y, Cb, Cr, Alpha

   The Y plane MUST be coded first. If the Cb and Cr planes are used
   then they MUST be coded after the Y plane. If an Alpha
   (transparency) plane is used, then it MUST be coded last.

3.7.2.  RGB

   JPEG2000-RCT is a Reversible Color Transform that codes RGB (red,
   green, blue) planes losslessly in a modified YCbCr color space.
   Reversible Pixel transformations between YCbCr and RGB use the
   following formulae.

   Cb=b-g

   Cr=r-g

   Y=g+((Cb+Cr)>>2)

   g=Y-((Cb+Cr)>>2)

   r=Cr+g

   b=Cb+g

   Exception for the JPEG2000-RCT conversion: if bits_per_raw_sample is
   between 9 and 15 inclusive and alpha_plane is 0, the following
   formulae for reversible conversions between YCbCr and RGB MUST be
   used instead of the ones above:

   Cb=g-b

   Cr=r-b

   Y=b+((Cb+Cr)>>2)

   b=Y-((Cb+Cr)>>2)

   r=Cr+b

   g=Cb+b

   Background: At the time of this writing, in all known implementations
   of the FFV1 bitstream, when bits_per_raw_sample was between 9 and 15
   inclusive and alpha_plane was 0, GBR planes were used as BGR planes
   during both encoding and decoding. In the meantime, 16-bit
   JPEG2000-RCT was implemented without this issue in one implementation
   and validated by one conformance checker. Methods to address this
   exception for the transform are under consideration for the next
   version of the FFV1 bitstream.

   [ISO.15444-1.2016]

   When FFV1 uses the JPEG2000-RCT, the horizontal lines are interleaved
   to improve caching efficiency since it is most likely that the RCT
   will immediately be converted to RGB during decoding. The
   interleaved coding order is also Y, then Cb, then Cr, and then, if
   used, Alpha.
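
   The reversible transform formulae of this section can be sketched in
   Python as follows. This is an illustrative sketch only, not part of
   the bitstream definition. It assumes that the right shift applies to
   the sum (Cb + Cr) before the addition or subtraction, and that ">>"
   is an arithmetic (floor) shift, which holds for Python integers. The
   function names and the "exception" parameter are hypothetical.

```python
def rct_forward(r, g, b, exception=False):
    """Map one RGB pixel to (Y, Cb, Cr) with the reversible transform.

    exception=True selects the variant for bits_per_raw_sample in 9...15
    with alpha_plane == 0; the parameter name is hypothetical.
    """
    if exception:
        cb, cr = g - b, r - b
        y = b + ((cb + cr) >> 2)
    else:
        cb, cr = b - g, r - g
        y = g + ((cb + cr) >> 2)
    return y, cb, cr


def rct_inverse(y, cb, cr, exception=False):
    """Recover (r, g, b) from (Y, Cb, Cr); exact inverse of rct_forward."""
    base = y - ((cb + cr) >> 2)  # g in the default case, b in the exception
    if exception:
        return cr + base, cb + base, base  # r, g, b
    return cr + base, base, cb + base      # r, g, b


if __name__ == "__main__":
    # Round-trip check over a coarse grid of 8-bit values, both variants.
    grid = range(0, 256, 15)
    for exception in (False, True):
        for r in grid:
            for g in grid:
                for b in grid:
                    ycc = rct_forward(r, g, b, exception)
                    assert rct_inverse(*ycc, exception) == (r, g, b)
    print(rct_forward(255, 0, 0))  # (63, 0, 255)
```

   Because Cb and Cr are plain differences, they need one more bit than
   the input samples, while the shifted sum is recomputed identically on
   both sides, which is what makes the transform exactly reversible.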

   As an example, a "Frame" that is two pixels wide and two pixels high
   could be comprised of the following structure:

   +------------------------+------------------------+
   | Pixel[1,1]             | Pixel[2,1]             |
   | Y[1,1] Cb[1,1] Cr[1,1] | Y[2,1] Cb[2,1] Cr[2,1] |
   +------------------------+------------------------+
   | Pixel[1,2]             | Pixel[2,2]             |
   | Y[1,2] Cb[1,2] Cr[1,2] | Y[2,2] Cb[2,2] Cr[2,2] |
   +------------------------+------------------------+

   In JPEG2000-RCT, the coding order would be left to right and then top
   to bottom, with values interleaved by lines and stored in this order:

   Y[1,1] Y[2,1] Cb[1,1] Cb[2,1] Cr[1,1] Cr[2,1] Y[1,2] Y[2,2] Cb[1,2]
   Cb[2,2] Cr[1,2] Cr[2,2]

3.8.  Coding of the Sample Difference

   Instead of coding the n+1 bits of the Sample Difference with Huffman
   or Range coding (or n+2 bits, in the case of RCT), only the n (or
   n+1) least significant bits are used, since this is sufficient to
   recover the original sample. In the equation below, the term "bits"
   represents bits_per_raw_sample+1 for RCT or bits_per_raw_sample
   otherwise:

   coder_input =
       [(sample_difference + 2^(bits-1)) & (2^bits - 1)] - 2^(bits-1)

3.8.1.  Range coding mode

   Early experimental versions of FFV1 used the CABAC Arithmetic coder
   from H.264 as defined in [ISO.14496-10.2014] but due to the uncertain
   patent/royalty situation, as well as its slightly worse performance,
   CABAC was replaced by a Range coder based on an algorithm defined by
   G. Nigel N. Martin in 1979 [range-coding].

3.8.1.1.  Range binary values

   To encode binary digits efficiently a Range coder is used. "C_{i}"
   is the i-th Context. "B_{i}" is the i-th byte of the bytestream.
   "b_{i}" is the i-th Range coded binary value, "S_{0,i}" is the i-th
   initial state, which is 128. The length of the bytestream encoding n
   binary symbols is "j_{n}" bytes.

   r_{i} = floor( ( R_{i} * S_{i,C_{i}} ) / 2^8 )

   S_{i+1,C_{i}} =  zero_state_{S_{i,C_{i}}} XOR
             l_i =  L_i                      XOR
             t_i =  R_i - r_i                <==
             b_i =  0                        <==>
             L_i <  R_i - r_i

   S_{i+1,C_{i}} =  one_state_{S_{i,C_{i}}}  XOR
             l_i =  L_i - R_i + r_i          XOR
             t_i =  r_i                      <==
             b_i =  1                        <==>
             L_i >= R_i - r_i

   S_{i+1,k} = S_{i,k} <== C_i != k

   R_{i+1} =  2^8 * t_{i}                    XOR
   L_{i+1} =  2^8 * l_{i} + B_{j_{i}}        XOR
   j_{i+1} =  j_{i} + 1                      <==
     t_{i} <  2^8

   R_{i+1} =  t_{i}                          XOR
   L_{i+1} =  l_{i}                          XOR
   j_{i+1} =  j_{i}                          <==
     t_{i} >= 2^8

   R_{0} = 65280

   L_{0} = 2^8 * B_{0} + B_{1}

   j_{0} = 2

3.8.1.2.  Range non binary values

   To encode scalar integers, it would be possible to encode each bit
   separately and use the past bits as context. However that would mean
   255 contexts per 8-bit symbol, which is not only a waste of memory
   but also requires more past data to reach a reasonably good estimate
   of the probabilities. Alternatively, assuming a Laplacian
   distribution and only dealing with its variance and mean (as in
   Huffman coding) would also be possible. However, for maximum
   flexibility and simplicity, the chosen method uses a single symbol to
   encode if a number is 0 and, if not, encodes the number using its
   exponent, mantissa and sign. The exact contexts used are best
   described by the following code, followed by some comments.

   pseudo-code                                                   | type
   --------------------------------------------------------------|-----
   void put_symbol(RangeCoder *c, uint8_t *state, int v, int \   |
   is_signed) {                                                  |
       int i;                                                    |
       put_rac(c, state+0, !v);                                  |
       if (v) {                                                  |
           int a= abs(v);                                        |
           int e= log2(a);                                       |
                                                                 |
           for (i=0; i<e; i++)                                   |
               put_rac(c, state+1+min(i,9), 1);  //1..10         |
           put_rac(c, state+1+min(i,9), 0);                      |
                                                                 |
           for (i=e-1; i>=0; i--)                                |
               put_rac(c, state+22+min(i,9), (a>>i)&1); //22..31 |
                                                                 |
           if (is_signed)                                        |
               put_rac(c, state+11 + min(e, 10), v < 0); //11..21|
       }                                                         |
   }                                                             |

3.8.1.3.  Initial values for the context model

   At keyframes all Range coder state variables are set to their initial
   state.

3.8.1.4.  State transition table

   one_state_{i} =
          default_state_transition_{i} + state_transition_delta_{i}

   zero_state_{i} = 256 - one_state_{256-i}

3.8.1.5.  default_state_transition

     0,  0,  0,  0,  0,  0,  0,  0, 20, 21, 22, 23, 24, 25, 26, 27,

    28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 37, 38, 39, 40, 41, 42,

    43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 56, 57,

    58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,

    74, 75, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,

    89, 90, 91, 92, 93, 94, 94, 95, 96, 97, 98, 99,100,101,102,103,

   104,105,106,107,108,109,110,111,112,113,114,114,115,116,117,118,

   119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,133,

   134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,

   150,151,152,152,153,154,155,156,157,158,159,160,161,162,163,164,

   165,166,167,168,169,170,171,171,172,173,174,175,176,177,178,179,

   180,181,182,183,184,185,186,187,188,189,190,190,191,192,194,194,

   195,196,197,198,199,200,201,202,202,204,205,206,207,208,209,209,

   210,211,212,213,215,215,216,217,218,219,220,220,222,223,224,225,

   226,227,227,229,229,230,231,232,234,234,235,236,237,238,239,240,

   241,242,243,244,245,246,247,248,248,  0,  0,  0,  0,  0,  0,  0,

3.8.1.6.  alternative state transition table

   The alternative state transition table has been built using iterative
   minimization of frame sizes and generally performs better than the
   default. To use it, the coder_type MUST be set to 2 and the
   difference to the default MUST be stored in the parameters. The
   reference implementation of FFV1 in FFmpeg uses this table by default
   at the time of this writing when Range coding is used.

     0, 10, 10, 10, 10, 16, 16, 16, 28, 16, 16, 29, 42, 49, 20, 49,

    59, 25, 26, 26, 27, 31, 33, 33, 33, 34, 34, 37, 67, 38, 39, 39,

    40, 40, 41, 79, 43, 44, 45, 45, 48, 48, 64, 50, 51, 52, 88, 52,

    53, 74, 55, 57, 58, 58, 74, 60,101, 61, 62, 84, 66, 66, 68, 69,

    87, 82, 71, 97, 73, 73, 82, 75,111, 77, 94, 78, 87, 81, 83, 97,

    85, 83, 94, 86, 99, 89, 90, 99,111, 92, 93,134, 95, 98,105, 98,

   105,110,102,108,102,118,103,106,106,113,109,112,114,112,116,125,

   115,116,117,117,126,119,125,121,121,123,145,124,126,131,127,129,

   165,130,132,138,133,135,145,136,137,139,146,141,143,142,144,148,

   147,155,151,149,151,150,152,157,153,154,156,168,158,162,161,160,

   172,163,169,164,166,184,167,170,177,174,171,173,182,176,180,178,

   175,189,179,181,186,183,192,185,200,187,191,188,190,197,193,196,

   197,194,195,196,198,202,199,201,210,203,207,204,205,206,208,214,

   209,211,221,212,213,215,224,216,217,218,219,220,222,228,223,225,

   226,224,227,229,240,230,231,232,233,234,235,236,238,239,237,242,

   241,243,242,244,245,246,247,248,249,250,251,252,252,253,254,255,

3.8.2.  Golomb Rice mode

   This coding mode uses Golomb Rice codes. The VLC is split into 2
   parts: the prefix stores the most significant bits and the suffix
   stores the k least significant bits or stores the whole number in the
   ESC case. The end of the bitstream of the "Frame" is filled with
   0-bits until the bitstream contains a multiple of 8 bits.

3.8.2.1.  Prefix

   +----------------+-------+
   | bits           | value |
   +----------------+-------+
   | 1              | 0     |
   | 01             | 1     |
   | ...            | ...   |
   | 0000 0000 0001 | 11    |
   | 0000 0000 0000 | ESC   |
   +----------------+-------+

3.8.2.2.  Suffix

   +-------+-----------------------------------------------------------+
   | non   | the k least significant bits MSB first                    |
   | ESC   |                                                           |
   | ESC   | the value - 11, in MSB first order, ESC may only be used  |
   |       | if the value cannot be coded as non ESC                   |
   +-------+-----------------------------------------------------------+

3.8.2.3.  Examples

   +-----+-------------------------+-------+
   | k   | bits                    | value |
   +-----+-------------------------+-------+
   | 0   | "1"                     | 0     |
   | 0   | "001"                   | 2     |
   | 2   | "1 00"                  | 0     |
   | 2   | "1 10"                  | 2     |
   | 2   | "01 01"                 | 5     |
   | any | "000000000000 10000000" | 139   |
   +-----+-------------------------+-------+

3.8.2.4.  Run mode

   Run mode is entered when the context is 0 and left as soon as a non-0
   difference is found. The level is identical to the predicted one.
   The run and the first different level are coded.

3.8.2.5.  Run length coding

   The run value is encoded in 2 parts: the prefix part stores the more
   significant part of the run as well as adjusting the run_index that
   determines the number of bits in the less significant part of the
   run. The 2nd part of the value stores the less significant part of
   the run as it is. The run_index is reset to 0 for each plane and
   slice.
839 pseudo-code | type 840 --------------------------------------------------------------|----- 841 log2_run[41]={ | 842 0, 0, 0, 0, 1, 1, 1, 1, | 843 2, 2, 2, 2, 3, 3, 3, 3, | 844 4, 4, 5, 5, 6, 6, 7, 7, | 845 8, 9,10,11,12,13,14,15, | 846 16,17,18,19,20,21,22,23, | 847 24, | 848 }; | 849 | 850 if (run_count == 0 && run_mode == 1) { | 851 if (get_bits(1)) { | 852 run_count = 1 << log2_run[run_index]; | 853 if (x + run_count <= w) | 854 run_index++; | 855 } else { | 856 if (log2_run[run_index]) | 857 run_count = get_bits(log2_run[run_index]); | 858 else | 859 run_count = 0; | 860 if (run_index) | 861 run_index--; | 862 run_mode = 2; | 863 } | 864 } | 866 The log2_run table is also used within [ISO.14495-1.1999]. 868 3.8.2.6. Level coding 870 Level coding is identical to the normal difference coding with the 871 exception that the 0 value is removed as it cannot occur: 873 if (diff>0) diff--; 874 encode(diff); 876 Note that this is different from JPEG-LS, which doesn't use prediction in 877 run mode and uses a different encoding and context model for the last 878 difference. On a small set of test samples the use of prediction 879 slightly improved the compression rate. 881 4.
Bitstream 882 +--------+----------------------------------------------------------+ 883 | Symbol | Definition | 884 +--------+----------------------------------------------------------+ 885 | u(n) | unsigned big endian integer using n bits | 886 | sg | Golomb Rice coded signed scalar symbol coded with the | 887 | | method described in Section 3.8.2 | 888 | br | Range coded Boolean (1-bit) symbol with the method | 889 | | described in Section 3.8.1.1 | 890 | ur | Range coded unsigned scalar symbol coded with the method | 891 | | described in Section 3.8.1.2 | 892 | sr | Range coded signed scalar symbol coded with the method | 893 | | described in Section 3.8.1.2 | 894 +--------+----------------------------------------------------------+ 896 The same context that is initialized to 128 is used for all fields in 897 the header. 899 The following MUST be provided by external means during 900 initialization of the decoder: 902 "frame_pixel_width" is defined as "Frame" width in pixels. 904 "frame_pixel_height" is defined as "Frame" height in pixels. 906 Default values at the decoder initialization phase: 908 "ConfigurationRecordIsPresent" is set to 0. 910 4.1. Configuration Record 912 In the case of a FFV1 bitstream with "version >= 3", a "Configuration 913 Record" is stored in the underlying "Container", at the track header 914 level. It contains the parameters used for all instances of "Frame". 915 The size of the "Configuration Record", "NumBytes", is supplied by 916 the underlying "Container". 918 pseudo-code | type 919 --------------------------------------------------------------|----- 920 ConfigurationRecord( NumBytes ) { | 921 ConfigurationRecordIsPresent = 1 | 922 Parameters( ) | 923 while( remaining_bits_in_bitstream( NumBytes ) > 32 ) | 924 reserved_for_future_use | u(1) 925 configuration_record_crc_parity | u(32) 926 } | 928 4.1.1. reserved_for_future_use 930 "reserved_for_future_use" has semantics that are reserved for future 931 use. 
932 Encoders conforming to this version of this specification SHALL NOT 933 write this value. 934 Decoders conforming to this version of this specification SHALL 935 ignore its value. 937 4.1.2. configuration_record_crc_parity 939 "configuration_record_crc_parity" is 32 bits that are chosen so that the 940 "Configuration Record" as a whole has a CRC remainder of 0. 941 This is equivalent to storing the CRC remainder in the 32-bit parity. 942 The CRC generator polynomial used is the standard IEEE CRC polynomial 943 (0x104C11DB7) with initial value 0. 945 4.1.3. Mapping FFV1 into Containers 947 This "Configuration Record" can be placed in any file format 948 supporting "Configuration Records", following as closely as possible 949 the way the file format stores "Configuration Records". The 950 "Configuration Record" storage place and "NumBytes" are currently 951 defined and supported by this version of this specification for the 952 following formats: 954 4.1.3.1. AVI File Format 956 The "Configuration Record" extends the stream format chunk ("AVI ", 957 "hdrl", "strl", "strf") with the ConfigurationRecord bitstream. 958 See [AVI] for more information about chunks. 960 "NumBytes" is defined as the size, in bytes, of the strf chunk 961 indicated in the chunk header minus the size of the stream format 962 structure. 964 4.1.3.2. ISO Base Media File Format 966 The "Configuration Record" extends the sample description box 967 ("moov", "trak", "mdia", "minf", "stbl", "stsd") with a "glbl" box 968 that contains the ConfigurationRecord bitstream. See 969 [ISO.14496-12.2015] for more information about boxes. 971 "NumBytes" is defined as the size, in bytes, of the "glbl" box 972 indicated in the box header minus the size of the box header. 974 4.1.3.3. NUT File Format 976 The codec_specific_data element (in "stream_header" packet) contains 977 the ConfigurationRecord bitstream. See [NUT] for more information 978 about elements.
980 "NumBytes" is defined as the size, in bytes, of the 981 codec_specific_data element as indicated in the "length" field of 982 codec_specific_data. 984 4.1.3.4. Matroska File Format 986 FFV1 SHOULD use "V_FFV1" as the Matroska "Codec ID". For FFV1 987 versions 2 or less, the Matroska "CodecPrivate" Element SHOULD NOT be 988 used. For FFV1 versions 3 or greater, the Matroska "CodecPrivate" 989 Element MUST contain the FFV1 "Configuration Record" structure and no 990 other data. See [Matroska] for more information about elements. 992 "NumBytes" is defined as the "Element Data Size" of the 993 "CodecPrivate" Element. 995 4.2. Frame 997 A "Frame" consists of the keyframe field, parameters (if version 998 <=1), and a sequence of independent slices. 1000 pseudo-code | type 1001 --------------------------------------------------------------|----- 1002 Frame( NumBytes ) { | 1003 keyframe | br 1004 if (keyframe && !ConfigurationRecordIsPresent) | 1005 Parameters( ) | 1006 while ( remaining_bits_in_bitstream( NumBytes ) ) | 1007 Slice( ) | 1008 } | 1010 4.3. Slice 1011 pseudo-code | type 1012 --------------------------------------------------------------|----- 1013 Slice( ) { | 1014 if (version >= 3) | 1015 SliceHeader( ) | 1016 SliceContent( ) | 1017 if (coder_type == 0) | 1018 while (!byte_aligned()) | 1019 padding | u(1) 1020 if (version >= 3) | 1021 SliceFooter( ) | 1022 } | 1024 "padding" specifies a bit without any significance that is used only for 1025 byte alignment. It MUST be 0. 1027 4.4.
Slice Header 1029 pseudo-code | type 1030 --------------------------------------------------------------|----- 1031 SliceHeader( ) { | 1032 slice_x | ur 1033 slice_y | ur 1034 slice_width - 1 | ur 1035 slice_height - 1 | ur 1036 for( i = 0; i < quant_table_set_index_count; i++ ) | 1037 quant_table_set_index [ i ] | ur 1038 picture_structure | ur 1039 sar_num | ur 1040 sar_den | ur 1041 if (version >= 4) { | 1042 reset_contexts | br 1043 slice_coding_mode | ur 1044 } | 1045 } | 1047 4.4.1. slice_x 1049 "slice_x" indicates the x position on the slice raster formed by 1050 num_h_slices. 1051 Inferred to be 0 if not present. 1053 4.4.2. slice_y 1055 "slice_y" indicates the y position on the slice raster formed by 1056 num_v_slices. 1057 Inferred to be 0 if not present. 1059 4.4.3. slice_width 1061 "slice_width" indicates the width on the slice raster formed by 1062 num_h_slices. 1063 Inferred to be 1 if not present. 1065 4.4.4. slice_height 1067 "slice_height" indicates the height on the slice raster formed by 1068 num_v_slices. 1069 Inferred to be 1 if not present. 1071 4.4.5. quant_table_set_index_count 1073 "quant_table_set_index_count" is defined as "1 + ( ( chroma_planes || 1074 version \<= 3 ) ? 1 : 0 ) + ( alpha_plane ? 1 : 0 )". 1076 4.4.6. quant_table_set_index 1078 "quant_table_set_index" indicates the Quantization Table Set index to 1079 select the Quantization Table Set and the initial states for the 1080 slice. 1081 Inferred to be 0 if not present. 1083 4.4.7. picture_structure 1085 "picture_structure" specifies the temporal and spatial relationship 1086 of each line of the "Frame". 1087 Inferred to be 0 if not present. 1089 +-------+-------------------------+ 1090 | value | picture structure used | 1091 +-------+-------------------------+ 1092 | 0 | unknown | 1093 | 1 | top field first | 1094 | 2 | bottom field first | 1095 | 3 | progressive | 1096 | Other | reserved for future use | 1097 +-------+-------------------------+ 1099 4.4.8. 
sar_num 1101 "sar_num" specifies the sample aspect ratio numerator. 1102 Inferred to be 0 if not present. 1103 MUST be 0 if sample aspect ratio is unknown. 1105 4.4.9. sar_den 1107 "sar_den" specifies the sample aspect ratio denominator. 1108 Inferred to be 0 if not present. 1109 MUST be 0 if sample aspect ratio is unknown. 1111 4.4.10. reset_contexts 1113 "reset_contexts" indicates if slice contexts must be reset. 1114 Inferred to be 0 if not present. 1116 4.4.11. slice_coding_mode 1118 "slice_coding_mode" indicates the slice coding mode. 1119 Inferred to be 0 if not present. 1121 +-------+-----------------------------+ 1122 | value | slice coding mode | 1123 +-------+-----------------------------+ 1124 | 0 | Range Coding or Golomb Rice | 1125 | 1 | raw PCM | 1126 | Other | reserved for future use | 1127 +-------+-----------------------------+ 1129 4.5. Slice Content 1131 pseudo-code | type 1132 --------------------------------------------------------------|----- 1133 SliceContent( ) { | 1134 if (colorspace_type == 0) { | 1135 for( p = 0; p < primary_color_count; p++ ) | 1136 for( y = 0; y < plane_pixel_height[ p ]; y++ ) | 1137 Line( p, y ) | 1138 } else if (colorspace_type == 1) { | 1139 for( y = 0; y < slice_pixel_height; y++ ) | 1140 for( p = 0; p < primary_color_count; p++ ) | 1141 Line( p, y ) | 1142 } | 1143 } | 1145 4.5.1. primary_color_count 1147 "primary_color_count" is defined as 1 + ( chroma_planes ? 2 : 0 ) + ( 1148 alpha_plane ? 1 : 0 ). 1150 4.5.2. plane_pixel_height 1152 "plane_pixel_height[ p ]" is the height in pixels of plane p of the 1153 slice. 1154 The value of "plane_pixel_height[ 0 ]" and "plane_pixel_height[ 1 + ( 1155 chroma_planes ? 2 : 0 ) ]" is "slice_pixel_height". 1156 If "chroma_planes" is set to 1, the value of "plane_pixel_height[ 1 ]" and 1157 "plane_pixel_height[ 2 ]" is "ceil(slice_pixel_height / (1 << 1158 log2_v_chroma_subsample))". 1160 4.5.3. slice_pixel_height 1162 "slice_pixel_height" is the height in pixels of the slice.
1163 Its value is "floor(( slice_y + slice_height ) * frame_pixel_height / 1164 num_v_slices) - slice_pixel_y". 1166 4.5.4. slice_pixel_y 1168 "slice_pixel_y" is the slice vertical position in pixels. 1169 Its value is "floor(slice_y * frame_pixel_height / num_v_slices)". 1171 4.6. Line 1173 pseudo-code | type 1174 --------------------------------------------------------------|----- 1175 Line( p, y ) { | 1176 if (colorspace_type == 0) { | 1177 for( x = 0; x < plane_pixel_width[ p ]; x++ ) | 1178 sample_difference[ p ][ y ][ x ] | 1179 } else if (colorspace_type == 1) { | 1180 for( x = 0; x < slice_pixel_width; x++ ) | 1181 sample_difference[ p ][ y ][ x ] | 1182 } | 1183 } | 1185 4.6.1. plane_pixel_width 1187 "plane_pixel_width[ p ]" is the width in pixels of plane p of the 1188 slice. 1189 The value of "plane_pixel_width[ 0 ]" and "plane_pixel_width[ 1 + ( chroma_planes 1190 ? 2 : 0 ) ]" is "slice_pixel_width". 1191 If "chroma_planes" is set to 1, the value of "plane_pixel_width[ 1 ]" and 1192 "plane_pixel_width[ 2 ]" is "ceil(slice_pixel_width / (1 << 1193 log2_h_chroma_subsample))". 1195 4.6.2. slice_pixel_width 1197 "slice_pixel_width" is the width in pixels of the slice. 1198 Its value is "floor(( slice_x + slice_width ) * frame_pixel_width / 1199 num_h_slices) - slice_pixel_x". 1201 4.6.3. slice_pixel_x 1203 "slice_pixel_x" is the slice horizontal position in pixels. 1204 Its value is "floor(slice_x * frame_pixel_width / num_h_slices)". 1206 4.6.4. sample_difference 1208 "sample_difference[ p ][ y ][ x ]" is the sample difference for 1209 sample at plane "p", y position "y", and x position "x". The sample 1210 value is computed based on prediction and context described in 1211 Section 3.2. 1213 4.7. Slice Footer 1215 Note: slice footer is always byte aligned.
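As an informal illustration of the slice geometry defined in Sections 4.5.3, 4.5.4, 4.6.2, and 4.6.3, the following C sketch maps a position on the slice raster to a pixel rectangle. It is not part of the bitstream definition. The type "SliceRect" and the function "ffv1_slice_rect" are hypothetical names introduced here. The sketch assumes that the slice extents are scaled by frame_pixel_width and frame_pixel_height, in the same way as slice_pixel_x and slice_pixel_y, and that the slice_width and slice_height arguments are the decoded values (the header stores them minus 1).

```c
#include <stdint.h>

/* Hypothetical container for the pixel rectangle of one slice. */
typedef struct {
    uint32_t x, y, width, height;
} SliceRect;

/* Sketch only: the caller must ensure num_h_slices and num_v_slices are
 * non-zero and that the slice lies inside the slice raster. Unsigned
 * integer division truncates, which equals floor() for these values. */
SliceRect ffv1_slice_rect(uint32_t slice_x, uint32_t slice_y,
                          uint32_t slice_width, uint32_t slice_height,
                          uint32_t num_h_slices, uint32_t num_v_slices,
                          uint32_t frame_pixel_width,
                          uint32_t frame_pixel_height)
{
    SliceRect r;
    uint64_t right, bottom;

    /* 64-bit intermediates avoid overflow in the multiplications. */
    r.x = (uint32_t)((uint64_t)slice_x * frame_pixel_width / num_h_slices);
    r.y = (uint32_t)((uint64_t)slice_y * frame_pixel_height / num_v_slices);

    right  = ((uint64_t)slice_x + slice_width) * frame_pixel_width
             / num_h_slices;
    bottom = ((uint64_t)slice_y + slice_height) * frame_pixel_height
             / num_v_slices;

    r.width  = (uint32_t)right - r.x;
    r.height = (uint32_t)bottom - r.y;
    return r;
}
```

For a 10x7 frame with a 3x2 slice raster, this yields slice widths of 3, 3, and 4 pixels and slice heights of 3 and 4 pixels, so adjacent slices tile the frame without gaps or overlap.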
1217 pseudo-code | type 1218 --------------------------------------------------------------|----- 1219 SliceFooter( ) { | 1220 slice_size | u(24) 1221 if (ec) { | 1222 error_status | u(8) 1223 slice_crc_parity | u(32) 1224 } | 1225 } | 1227 4.7.1. slice_size 1229 "slice_size" indicates the size of the slice in bytes. 1230 Note: this allows finding the start of slices before previous slices 1231 have been fully decoded, which in turn allows parallel decoding as 1232 well as error resilience. 1234 4.7.2. error_status 1236 "error_status" specifies the error status. 1238 +-------+--------------------------------------+ 1239 | value | error status | 1240 +-------+--------------------------------------+ 1241 | 0 | no error | 1242 | 1 | slice contains a correctable error | 1243 | 2 | slice contains an uncorrectable error | 1244 | Other | reserved for future use | 1245 +-------+--------------------------------------+ 1247 4.7.3. slice_crc_parity 1249 "slice_crc_parity" is 32 bits that are chosen so that the slice as a 1250 whole has a CRC remainder of 0. 1251 This is equivalent to storing the CRC remainder in the 32-bit parity. 1252 The CRC generator polynomial used is the standard IEEE CRC polynomial 1253 (0x104C11DB7) with initial value 0. 1255 4.8.
Parameters 1256 pseudo-code | type 1257 --------------------------------------------------------------|----- 1258 Parameters( ) { | 1259 version | ur 1260 if (version >= 3) | 1261 micro_version | ur 1262 coder_type | ur 1263 if (coder_type > 1) | 1264 for (i = 1; i < 256; i++) | 1265 state_transition_delta[ i ] | sr 1266 colorspace_type | ur 1267 if (version >= 1) | 1268 bits_per_raw_sample | ur 1269 chroma_planes | br 1270 log2_h_chroma_subsample | ur 1271 log2_v_chroma_subsample | ur 1272 alpha_plane | br 1273 if (version >= 3) { | 1274 num_h_slices - 1 | ur 1275 num_v_slices - 1 | ur 1276 quant_table_set_count | ur 1277 } | 1278 for( i = 0; i < quant_table_set_count; i++ ) | 1279 QuantizationTableSet( i ) | 1280 if (version >= 3) { | 1281 for( i = 0; i < quant_table_set_count; i++ ) { | 1282 states_coded | br 1283 if (states_coded) | 1284 for( j = 0; j < context_count[ i ]; j++ ) | 1285 for( k = 0; k < CONTEXT_SIZE; k++ ) | 1286 initial_state_delta[ i ][ j ][ k ] | sr 1287 } | 1288 ec | ur 1289 intra | ur 1290 } | 1291 } | 1293 4.8.1. version 1295 "version" specifies the version of the FFV1 bitstream. 1296 Each version is incompatible with others versions: decoders SHOULD 1297 reject a file due to unknown version. 1298 Decoders SHOULD reject a file with version <= 1 && 1299 ConfigurationRecordIsPresent == 1. 1300 Decoders SHOULD reject a file with version >= 3 && 1301 ConfigurationRecordIsPresent == 0. 1303 +-------+-------------------------+ 1304 | value | version | 1305 +-------+-------------------------+ 1306 | 0 | FFV1 version 0 | 1307 | 1 | FFV1 version 1 | 1308 | 2 | reserved* | 1309 | 3 | FFV1 version 3 | 1310 | 4 | FFV1 version 4 | 1311 | Other | reserved for future use | 1312 +-------+-------------------------+ 1314 * Version 2 was never enabled in the encoder thus version 2 files 1315 SHOULD NOT exist, and this document does not describe them to keep 1316 the text simpler. 1318 4.8.2. 
micro_version 1320 "micro_version" specifies the micro-version of the FFV1 bitstream. 1321 After a version is considered stable (a micro-version value is 1322 assigned to be the first stable variant of a specific version), each 1323 new micro-version after this first stable variant is compatible with 1324 the previous micro-version: decoders SHOULD NOT reject a file due to 1325 an unknown micro-version equal or above the micro-version considered 1326 as stable. 1328 Meaning of micro_version for version 3: 1330 +-------+-------------------------+ 1331 | value | micro_version | 1332 +-------+-------------------------+ 1333 | 0...3 | reserved* | 1334 | 4 | first stable variant | 1335 | Other | reserved for future use | 1336 +-------+-------------------------+ 1338 * development versions may be incompatible with the stable variants. 1340 Meaning of micro_version for version 4 (note: at the time of writing 1341 of this specification, version 4 is not considered stable so the 1342 first stable version value is to be announced in the future): 1344 +---------+-------------------------+ 1345 | value | micro_version | 1346 +---------+-------------------------+ 1347 | 0...TBA | reserved* | 1348 | TBA | first stable variant | 1349 | Other | reserved for future use | 1350 +---------+-------------------------+ 1352 * development versions which may be incompatible with the stable 1353 variants. 1355 4.8.3. coder_type 1357 "coder_type" specifies the coder used. 1359 +-------+-------------------------------------------------+ 1360 | value | coder used | 1361 +-------+-------------------------------------------------+ 1362 | 0 | Golomb Rice | 1363 | 1 | Range Coder with default state transition table | 1364 | 2 | Range Coder with custom state transition table | 1365 | Other | reserved for future use | 1366 +-------+-------------------------------------------------+ 1368 4.8.4. 
state_transition_delta 1370 "state_transition_delta" specifies the Range coder custom state 1371 transition table. 1372 If state_transition_delta is not present in the FFV1 bitstream, all 1373 Range coder custom state transition table elements are assumed to be 1374 0. 1376 4.8.5. colorspace_type 1378 "colorspace_type" specifies color space losslessly encoded, Pixel 1379 transformation used by the encoder, as well as interleave method. 1381 +-------+---------------------+------------------+------------------+ 1382 | value | color space | transformation | interleave | 1383 | | losslessly encoded | | method | 1384 +-------+---------------------+------------------+------------------+ 1385 | 0 | YCbCr | No Pixel | plane then line | 1386 | | | transformation | | 1387 | 1 | RGB | JPEG2000-RCT | line then plane | 1388 | Other | reserved for future | reserved for | reserved for | 1389 | | use | future use | future use | 1390 +-------+---------------------+------------------+------------------+ 1392 Restrictions: 1393 If "colorspace_type" is 1, then "chroma_planes" MUST be 1, 1394 "log2_h_chroma_subsample" MUST be 0, and "log2_v_chroma_subsample" 1395 MUST be 0. 1397 4.8.6. chroma_planes 1399 "chroma_planes" indicates if chroma (color) planes are present. 1401 +-------+-------------------------------+ 1402 | value | presence | 1403 +-------+-------------------------------+ 1404 | 0 | chroma planes are not present | 1405 | 1 | chroma planes are present | 1406 +-------+-------------------------------+ 1408 4.8.7. bits_per_raw_sample 1410 "bits_per_raw_sample" indicates the number of bits for each sample. 1411 Inferred to be 8 if not present. 
1413 +-------+---------------------------------+ 1414 | value | bits for each sample | 1415 +-------+---------------------------------+ 1416 | 0 | reserved* | 1417 | Other | the actual bits for each sample | 1418 +-------+---------------------------------+ 1420 * Encoders MUST NOT store bits_per_raw_sample = 0. Decoders SHOULD 1421 accept and interpret bits_per_raw_sample = 0 as 8. 1423 4.8.8. log2_h_chroma_subsample 1425 "log2_h_chroma_subsample" indicates the subsample factor, stored in 1426 powers to which the number 2 must be raised, between luma and chroma 1427 width ("chroma_width = 2^(-log2_h_chroma_subsample) * luma_width"). 1429 4.8.9. log2_v_chroma_subsample 1431 "log2_v_chroma_subsample" indicates the subsample factor, stored in 1432 powers to which the number 2 must be raised, between luma and chroma 1433 height ("chroma_height = 2^(-log2_v_chroma_subsample) * luma_height"). 1435 4.8.10. alpha_plane 1437 "alpha_plane" indicates if a transparency plane is present. 1439 +-------+-----------------------------------+ 1440 | value | presence | 1441 +-------+-----------------------------------+ 1442 | 0 | transparency plane is not present | 1443 | 1 | transparency plane is present | 1444 +-------+-----------------------------------+ 1446 4.8.11. num_h_slices 1448 "num_h_slices" indicates the number of horizontal elements of the 1449 slice raster. 1450 Inferred to be 1 if not present. 1452 4.8.12. num_v_slices 1454 "num_v_slices" indicates the number of vertical elements of the slice 1455 raster. 1456 Inferred to be 1 if not present. 1458 4.8.13. quant_table_set_count 1460 "quant_table_set_count" indicates the number of Quantization 1461 Table Sets. 1462 Inferred to be 1 if not present. 1463 MUST NOT be 0. 1465 4.8.14. states_coded 1467 "states_coded" indicates if the respective Quantization Table Set has 1468 the initial states coded. 1469 Inferred to be 0 if not present.
1471 +-------+-----------------------------------------------------------+ 1472 | value | initial states | 1473 +-------+-----------------------------------------------------------+ 1474 | 0 | initial states are not present and are assumed to be all | 1475 | | 128 | 1476 | 1 | initial states are present | 1477 +-------+-----------------------------------------------------------+ 1479 4.8.15. initial_state_delta 1481 "initial_state_delta[ i ][ j ][ k ]" indicates the initial Range 1482 coder state, it is encoded using "k" as context index and 1484 pred = j ? initial_states[ i ][j - 1][ k ] : 128 1486 initial_state[ i ][ j ][ k ] = 1487 ( pred + initial_state_delta[ i ][ j ][ k ] ) & 255 1489 4.8.16. ec 1491 "ec" indicates the error detection/correction type. 1493 +-------+--------------------------------------------+ 1494 | value | error detection/correction type | 1495 +-------+--------------------------------------------+ 1496 | 0 | 32-bit CRC on the global header | 1497 | 1 | 32-bit CRC per slice and the global header | 1498 | Other | reserved for future use | 1499 +-------+--------------------------------------------+ 1501 4.8.17. intra 1503 "intra" indicates the relationship between the instances of "Frame". 1504 Inferred to be 0 if not present. 1506 +-------+-----------------------------------------------------------+ 1507 | value | relationship | 1508 +-------+-----------------------------------------------------------+ 1509 | 0 | Frames are independent or dependent (keyframes and non | 1510 | | keyframes) | 1511 | 1 | Frames are independent (keyframes only) | 1512 | Other | reserved for future use | 1513 +-------+-----------------------------------------------------------+ 1515 4.9. Quantization Table Set 1517 The Quantization Table Sets are stored by storing the number of equal 1518 entries -1 of the first half of the table (represented as "len - 1" 1519 in the pseudo-code below) using the method described in 1520 Section 3.8.1.2. 
The second half doesn't need to be stored as it is 1521 identical to the first with flipped sign. "scale" and "len_count[ i 1522 ][ j ]" are temporary values used for computing 1523 "context_count[ i ]" and are not used outside the Quantization Table Set 1524 pseudo-code. 1526 example: 1528 Table: 0 0 1 1 1 1 2 2 -2 -2 -2 -1 -1 -1 -1 0 1530 Stored values: 1, 3, 1 1531 pseudo-code | type 1532 --------------------------------------------------------------|----- 1533 QuantizationTableSet( i ) { | 1534 scale = 1 | 1535 for( j = 0; j < MAX_CONTEXT_INPUTS; j++ ) { | 1536 QuantizationTable( i, j, scale ) | 1537 scale *= 2 * len_count[ i ][ j ] - 1 | 1538 } | 1539 context_count[ i ] = ceil ( scale / 2 ) | 1540 } | 1542 MAX_CONTEXT_INPUTS is 5. 1544 pseudo-code | type 1545 --------------------------------------------------------------|----- 1546 QuantizationTable(i, j, scale) { | 1547 v = 0 | 1548 for( k = 0; k < 128; ) { | 1549 len - 1 | ur 1550 for( a = 0; a < len; a++ ) { | 1551 quant_tables[ i ][ j ][ k ] = scale* v | 1552 k++ | 1553 } | 1554 v++ | 1555 } | 1556 for( k = 1; k < 128; k++ ) { | 1557 quant_tables[ i ][ j ][ 256 - k ] = \ | 1558 -quant_tables[ i ][ j ][ k ] | 1559 } | 1560 quant_tables[ i ][ j ][ 128 ] = \ | 1561 -quant_tables[ i ][ j ][ 127 ] | 1562 len_count[ i ][ j ] = v | 1563 } | 1565 4.9.1. quant_tables 1567 "quant_tables[ i ][ j ][ k ]" indicates the quantization table 1568 value of the Quantized Sample Difference "k" of the Quantization 1569 Table "j" of the Quantization Table Set "i". 1571 4.9.2. context_count 1573 "context_count[ i ]" indicates the count of contexts for Quantization 1574 Table Set "i". 1576 5. Restrictions 1578 To ensure that fast multithreaded decoding is possible, starting with 1579 version 3 and if frame_pixel_width * frame_pixel_height is more than 1580 101376, slice_width * slice_height MUST be less than or equal to 1581 num_h_slices * num_v_slices / 4.
Note: 101376 is the frame size in 1582 pixels of a 352x288 frame also known as CIF ("Common Intermediate 1583 Format") frame size format. 1585 For each "Frame", each position in the slice raster MUST be filled by 1586 one and only one slice of the "Frame" (no missing slice position, no 1587 slice overlapping). 1589 For each "Frame" with keyframe value of 0, each slice MUST have the 1590 same value of slice_x, slice_y, slice_width, slice_height as a slice 1591 in the previous "Frame", except if reset_contexts is 1. 1593 6. Security Considerations 1595 Like any other codec, (such as [RFC6716]), FFV1 should not be used 1596 with insecure ciphers or cipher-modes that are vulnerable to known 1597 plaintext attacks. Some of the header bits as well as the padding 1598 are easily predictable. 1600 Implementations of the FFV1 codec need to take appropriate security 1601 considerations into account, as outlined in [RFC4732]. It is 1602 extremely important for the decoder to be robust against malicious 1603 payloads. Malicious payloads must not cause the decoder to overrun 1604 its allocated memory or to take an excessive amount of resources to 1605 decode. Although problems in encoders are typically rarer, the same 1606 applies to the encoder. Malicious video streams must not cause the 1607 encoder to misbehave because this would allow an attacker to attack 1608 transcoding gateways. A frequent security problem in image and video 1609 codecs is also to not check for integer overflows in Pixel count 1610 computations, that is to allocate width * height without considering 1611 that the multiplication result may have overflowed the arithmetic 1612 types range. 1614 The reference implementation [REFIMPL] contains no known buffer 1615 overflow or cases where a specially crafted packet or video segment 1616 could cause a significant increase in CPU load. 
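The integer overflow concern described above can be illustrated with a small C sketch. It is only an example of one possible check, not a requirement of this specification. The function name "checked_plane_size" and its parameters are hypothetical and do not appear in the reference implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes width * height * bytes_per_sample.
 * Returns 0 and stores the result in *out on success.
 * Returns -1 if any factor is 0 or if the product does not fit in size_t,
 * so that a caller never allocates a buffer from a wrapped-around size. */
int checked_plane_size(uint32_t width, uint32_t height,
                       uint32_t bytes_per_sample, size_t *out)
{
    size_t pixels;

    if (width == 0 || height == 0 || bytes_per_sample == 0)
        return -1;

    /* width * height overflows exactly when width > SIZE_MAX / height. */
    if (width > SIZE_MAX / height)
        return -1;
    pixels = (size_t)width * height;

    if (pixels > SIZE_MAX / bytes_per_sample)
        return -1;

    *out = pixels * bytes_per_sample;
    return 0;
}
```

A decoder written along these lines would call the helper with frame_pixel_width and frame_pixel_height before allocating sample buffers, and would reject the "Frame" when the helper reports failure.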
1618 The reference implementation [REFIMPL] was validated in the following 1619 conditions: 1621 o Sending the decoder valid packets generated by the reference 1622 encoder and verifying that the decoder's output matches the 1623 encoder's input. 1625 o Sending the decoder packets generated by the reference encoder and 1626 then subjected to random corruption. 1628 o Sending the decoder random packets that are not FFV1. 1630 In all of the conditions above, the decoder and encoder were run 1631 inside the [VALGRIND] memory debugger as well as Clang's address 1632 sanitizer [Address-Sanitizer], which track reads and writes to 1633 invalid memory regions as well as the use of uninitialized memory. 1634 There were no errors reported on any of the tested conditions. 1636 7. Media Type Definition 1638 This registration is done using the template defined in [RFC6838] and 1639 following [RFC4855]. 1641 Type name: video 1643 Subtype name: FFV1 1645 Required parameters: None. 1647 Optional parameters: 1649 These parameters are used to signal the capabilities of a receiver 1650 implementation. These parameters MUST NOT be used for any other 1651 purpose. 1653 version: The version of the FFV1 encoding as defined by 1654 Section 4.8.1. 1656 micro_version: The micro_version of the FFV1 encoding as defined by 1657 Section 4.8.2. 1659 coder_type: The coder_type of the FFV1 encoding as defined by 1660 Section 4.8.3. 1662 colorspace_type: The colorspace_type of the FFV1 encoding as defined 1663 by Section 4.8.5. 1665 bits_per_raw_sample: The bits_per_raw_sample of the FFV1 encoding as defined by 1666 Section 4.8.7. 1668 max-slices: The value of max-slices is an integer indicating the 1669 maximum count of slices within a frame of the FFV1 encoding. 1671 Encoding considerations: 1673 This media type is defined for encapsulation in several audiovisual 1674 container formats and contains binary data; see Section 4.1.3. This 1675 media type is framed binary data; see Section 4.8 of [RFC4288].
1677 Security considerations: 1679 See Section 6 of this document. 1681 Interoperability considerations: None. 1683 Published specification: 1685 [I-D.ietf-cellar-ffv1] and RFC XXXX. 1687 [RFC Editor: Upon publication as an RFC, please replace "XXXX" with 1688 the number assigned to this document and remove this note.] 1690 Applications which use this media type: 1692 Any application that requires the transport of lossless video can use 1693 this media type. Examples include, but are not limited to, screen 1694 recording, scientific imaging, and digital video preservation. 1696 Fragment identifier considerations: N/A. 1698 Additional information: None. 1700 Person & email address to contact for further information: Michael 1701 Niedermayer 1703 Intended usage: COMMON 1705 Restrictions on usage: None. 1707 Author: Dave Rice 1709 Change controller: IETF cellar working group delegated from the IESG. 1711 8. IANA Considerations 1713 The IANA is requested to register the following values: 1715 o Media type registration as described in Section 7. 1717 9. Appendixes 1719 9.1. Decoder implementation suggestions 1721 9.1.1. Multi-threading support and independence of slices 1723 The FFV1 bitstream is parsable in two ways: in sequential order as 1724 described in this document or with the pre-analysis of the footer of 1725 each slice. Each slice footer contains a slice_size field so the 1726 boundary of each slice is computable without having to parse the 1727 slice content. That allows multi-threading as well as independence 1728 of slice content (a bitstream error in a slice header or slice 1729 content has no impact on the decoding of the other slices). 1731 After having checked the keyframe field, a decoder SHOULD parse the 1732 slice_size fields, from slice_size of the last slice at the end of 1733 the "Frame" up to slice_size of the first slice at the beginning of 1734 the "Frame", before parsing slices, in order to obtain the slice 1735 boundaries.
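The backward walk over the slice footers described above can be sketched in C as follows. This is an informal illustration only. The function name "ffv1_find_slices" and its parameters are hypothetical. The sketch assumes that slice_size counts the slice header and slice content but not the footer itself, which this document does not state explicitly, and that the first slice starts at the first byte of the "Frame". The footer length follows Section 4.7: 3 bytes for slice_size, plus 5 bytes for error_status and slice_crc_parity when ec is non-zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: locates slice start offsets inside one Frame by
 * reading slice_size from each footer, last slice first.
 * starts[0] receives the offset of the last slice, starts[1] the one
 * before it, and so on. Returns the number of slices found, or -1 if the
 * sizes are not coherent or more than max_slices slices are present. */
int ffv1_find_slices(const uint8_t *frame, size_t frame_size, int ec,
                     size_t *starts, int max_slices)
{
    const size_t footer = ec ? 8 : 3;  /* u(24) [+ u(8) + u(32)] */
    size_t end = frame_size;           /* end of the slice being examined */
    int count = 0;

    while (end > 0) {
        const uint8_t *f;
        size_t size;

        if (end < footer || count == max_slices)
            return -1;

        /* slice_size is an unsigned big endian 24-bit integer. */
        f = frame + end - footer;
        size = ((size_t)f[0] << 16) | ((size_t)f[1] << 8) | (size_t)f[2];

        if (size > end - footer)
            return -1;                 /* slice would start before the Frame */

        end -= footer + size;
        starts[count++] = end;
    }
    return count;
}
```

When the helper reports failure, falling back to sequential parsing is one way to handle the "Frame"; when it succeeds, each slice can be handed to a separate thread.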

   A decoder MAY fall back to sequential order, e.g., in case of a
   corrupted "Frame" (frame size unknown, slice_size values not
   coherent, ...) or if seeking into the stream is not possible.

   Architecture overview of slices in a "Frame":

   +-----------------------------------------------------------------+
   | first slice header                                              |
   | first slice content                                             |
   | first slice footer                                              |
   | --------------------------------------------------------------- |
   | second slice header                                             |
   | second slice content                                            |
   | second slice footer                                             |
   | --------------------------------------------------------------- |
   | ...                                                             |
   | --------------------------------------------------------------- |
   | last slice header                                               |
   | last slice content                                              |
   | last slice footer                                               |
   +-----------------------------------------------------------------+

10.  Changelog

   See

11.  ToDo

   o  mean,k estimation for the Golomb Rice codes

12.  References

12.1.  Normative References

   [I-D.ietf-cellar-ffv1]
              Niedermayer, M., Rice, D., and J. Martinez, "FF Video
              Codec 1", draft-ietf-cellar-ffv1-01 (work in progress),
              January 2018.

   [ISO.15444-1.2016]
              International Organization for Standardization,
              "Information technology -- JPEG 2000 image coding system:
              Core coding system", October 2016.

   [ISO.9899.1990]
              International Organization for Standardization,
              "Programming languages - C", ISO Standard 9899, 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
              Registration Procedures", RFC 4288, DOI 10.17487/RFC4288,
              December 2005.

   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
              Formats", RFC 4855, DOI 10.17487/RFC4855, February 2007.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13,
              RFC 6838, DOI 10.17487/RFC6838, January 2013.

12.2.  Informative References

   [Address-Sanitizer]
              The Clang Team, "ASAN AddressSanitizer website", undated.

   [AVI]      Microsoft, "AVI RIFF File Reference", undated.

   [HuffYUV]  Rudiak-Gould, B., "HuffYUV", December 2003.

   [ISO.14495-1.1999]
              International Organization for Standardization,
              "Information technology -- Lossless and near-lossless
              compression of continuous-tone still images: Baseline",
              December 1999.

   [ISO.14496-10.2014]
              International Organization for Standardization,
              "Information technology -- Coding of audio-visual objects
              -- Part 10: Advanced Video Coding", September 2014.

   [ISO.14496-12.2015]
              International Organization for Standardization,
              "Information technology -- Coding of audio-visual objects
              -- Part 12: ISO base media file format", December 2015.

   [Matroska]
              IETF, "Matroska", 2016.

   [NUT]      Niedermayer, M., "NUT Open Container Format", December
              2013.

   [range-coding]
              Martin, G. N. N., "Range encoding: an algorithm for
              removing redundancy from a digitised message", Proc.
              Institution of Electronic and Radio Engineers
              International Conference on Video and Data Recording,
              July 1979.

   [REFIMPL]  Niedermayer, M., "The reference FFV1 implementation / the
              FFV1 codec in FFmpeg", undated.

   [RFC4732]  Handley, M., Ed., Rescorla, E., Ed., and IAB, "Internet
              Denial-of-Service Considerations", RFC 4732,
              DOI 10.17487/RFC4732, December 2006.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
              Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
              September 2012.

   [VALGRIND]
              Valgrind Developers, "Valgrind website", undated.

   [YCbCr]    Wikipedia, "YCbCr", undated.

Authors' Addresses

   Michael Niedermayer

   Email: michael@niedermayer.cc


   Dave Rice

   Email: dave@dericed.com


   Jerome Martinez

   Email: jerome@mediaarea.net