idnits 2.17.1 draft-bankoski-vp8-bitstream-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 635 has weird spacing: '...typedef sign...' == Line 644 has weird spacing: '...def int int32...' -- The document date (January 6, 2011) is 4852 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Informational ---------------------------------------------------------------------------- -- Looks like a reference, but probably isn't: 'A' on line 2059 -- Looks like a reference, but probably isn't: 'L' on line 2059 == Missing Reference: '8' is mentioned on line 5192, but not defined -- Looks like a reference, but probably isn't: '-1' on line 2498 == Missing Reference: '0' is mentioned on line 5453, but not defined == Missing Reference: '7' is mentioned on line 2551, but not defined == Missing Reference: '9' is mentioned on line 5260, but not defined == Missing Reference: '5' is mentioned on line 4789, but not defined == Missing Reference: '6' is mentioned on line 5207, but not defined == Missing Reference: '16' is mentioned on line 2891, but not defined == Missing Reference: '12' is mentioned on line 3629, but not defined -- Looks like a reference, but probably isn't: 'MVPcount' on line 4869 -- Looks like a reference, but probably isn't: 'MVPsign' on line 4918 Summary: 3 errors (**), 0 flaws (~~), 11 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group J. Bankoski 3 Internet-Draft P. Wilkins 4 Intended status: Informational Y. Xu 5 Expires: July 10, 2011 Google, Inc. 6 January 6, 2011 8 VP8 Data Format and Decoding Guide 9 draft-bankoski-vp8-bitstream-00 11 Abstract 13 This document describes the VP8 compressed video data format created 14 by Google On2, together with a discussion of the decoding procedure 15 for this format. 17 Status of this Memo 19 This Internet-Draft is submitted in full conformance with the 20 provisions of BCP 78 and BCP 79. 22 Internet-Drafts are working documents of the Internet Engineering 23 Task Force (IETF). Note that other groups may also distribute 24 working documents as Internet-Drafts. The list of current Internet- 25 Drafts is at http://datatracker.ietf.org/drafts/current/. 27 Internet-Drafts are draft documents valid for a maximum of six months 28 and may be updated, replaced, or obsoleted by other documents at any 29 time. It is inappropriate to use Internet-Drafts as reference 30 material or to cite them other than as "work in progress." 32 This Internet-Draft will expire on July 10, 2011. 34 Copyright Notice 36 Copyright (c) 2011 IETF Trust and the persons identified as the 37 document authors. All rights reserved. 39 This document is subject to BCP 78 and the IETF Trust's Legal 40 Provisions Relating to IETF Documents 41 (http://trustee.ietf.org/license-info) in effect on the date of 42 publication of this document. Please review these documents 43 carefully, as they describe your rights and restrictions with respect 44 to this document. 46 Table of Contents 48 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 49 2. Format Overview . . . . . . . . . . . . . . . . . . . . . . . 6 50 3. Compressed Frame Types . . . . . . . . . . . . . . . . . . . 8 51 4. Overview of Compressed Data Format . . . . . . . . . . . . . 9 52 5. Overview of the Decoding Process . . . . . . . . . . . . . . 11 53 6. Description of Algorithms . . . . . . . . . . . . . . . . . . 16 54 7. Boolean Entropy Decoder . . . . . . . . . . . . . . . . . . . 19 55 7.1. Underlying Theory of Coding . . . . . . . . . . . . . . . 20 56 7.2. Practical Algorithm Description . . . . . . . . . . . . . 21 57 7.3. Actual Implementation . . . . . . . . . . . . . . . . . . 23 58 8. Compressed Data Components . . . . . . . . . . . . . . . . . 28 59 8.1. Tree Coding Implementation . . . . . . . . . . . . . . . 30 60 8.2. Tree Coding Example . . . . . . . . . . . . . . . . . . . 31 61 9. Frame Header . . . . . . . . . . . . . . . . . . . . . . . . 34 62 9.1. Uncompressed Data Chunk . . . . . . . . . . . . . . . . . 34 63 9.2. Color Space and Pixel Type (Key Frames-only) . . . . . . 37 64 9.3. Segment-based Adjustments . . . . . . . . . . . . . . . . 37 65 9.4. Loop Filter Type and Levels . . . . . . . . . . . . . . . 38 66 9.5. Token Partition and Partition Data Offsets . . . . . . . 39 67 9.6. Dequantization Indices . . . . . . . . . . . . . . . . . 40 68 9.7. Refresh Golden Frame and AltRef Frame . . . . . . . . . . 41 69 9.8. Refresh Last Frame Buffer . . . . . . . . . . . . . . . . 43 70 9.9. DCT Coefficient Probability Update . . . . . . . . . . . 43 71 9.10. Remaining Frame Header Data (non-Key Frame) . . . . . . . 43 72 9.11. Remaining Frame Header Data (Key Frame) . . . . . . . . . 44 73 10. Segment-based Feature Adjustments . . . . . . . . . . . . . . 45 74 11. Key Frame Macroblock Prediction Records . . . . . . . . . . . 46 75 11.1. mb_skip_coeff . . . . . . . . . . . . . . . . . . . . . . 46 76 11.2. Luma Modes . . . . . . . . . . . . . . . . . . . . . . . 46 77 11.3. Subblock Mode Contexts . . . . . . . . . . . . . . . . . 49 78 11.4. Chroma Modes . . . . . . . . . . . . . . . . . . . . . . 50 79 11.5. Subblock Mode Probability Table . . . . . . . . . . . . . 51 80 12. Intraframe Prediction . . . . . . . . . . . . . . . . . . . . 55 81 12.1. mb_skip_coeff . . . . . . . . . . . . . . . . . . . . . . 55 82 12.2. Chroma Prediction . . . . . . . . . . . . . . . . . . . . 56 83 12.3. Luma Prediction . . . . . . . . . . . . . . . . . . . . . 58 84 13. DCT Coefficient Decoding . . . . . . . . . . . . . . . . . . 64 85 13.1. MB Without non-Zero Coefficient Values . . . . . . . . . 64 86 13.2. Coding of Individual Coefficient Values . . . . . . . . . 65 87 13.3. Token Probabilities . . . . . . . . . . . . . . . . . . . 67 88 13.4. Token Probability Updates . . . . . . . . . . . . . . . . 71 89 13.5. Default Token Probability Table . . . . . . . . . . . . . 76 90 14. DCT and WHT Inversion and Macroblock Reconstruction . . . . . 81 91 14.1. Dequantization . . . . . . . . . . . . . . . . . . . . . 81 92 14.2. Inverse Transforms . . . . . . . . . . . . . . . . . . . 82 93 14.3. Implementation of the WHT Inversion . . . . . . . . . . . 83 94 14.4. Implementation of the DCT Inversion . . . . . . . . . . . 85 95 14.5. Summation of Predictor and Residue . . . . . . . . . . . 88 96 15. Loop Filter . . . . . . . . . . . . . . . . . . . . . . . . . 89 97 15.1. Filter Geometry and Overall Procedure . . . . . . . . . . 90 98 15.2. Simple Filter . . . . . . . . . . . . . . . . . . . . . . 92 99 15.3. Normal Filter . . . . . . . . . . . . . . . . . . . . . . 96 100 15.4. Calculation of Control Parameters . . . . . . . . . . . . 101 101 16. Interframe Macroblock Prediction Records . . . . . . . . . . 103 102 16.1. Intra-Predicted Macroblocks . . . . . . . . . . . . . . . 103 103 16.2. Inter-Predicted Macroblocks . . . . . . . . . . . . . . . 104 104 16.3. Mode and Motion Vector Contexts . . . . . . . . . . . . . 105 105 16.4. Split Prediction . . . . . . . . . . . . . . . . . . . . 111 106 17. Motion Vector Decoding . . . . . . . . . . . . . . . . . . . 115 107 17.1. Coding of Each Component . . . . . . . . . . . . . . . . 115 108 17.2. Probability Updates . . . . . . . . . . . . . . . . . . . 117 109 18. Interframe Prediction . . . . . . . . . . . . . . . . . . . . 120 110 18.1. Bounds on and Adjustment of Motion Vectors . . . . . . . 120 111 18.2. Prediction Subblocks . . . . . . . . . . . . . . . . . . 121 112 18.3. Sub-pixel Interpolation . . . . . . . . . . . . . . . . . 122 113 18.4. Filter Properties . . . . . . . . . . . . . . . . . . . . 125 114 19. Annex A: Bitstream Syntax . . . . . . . . . . . . . . . . . . 128 115 19.1. Uncompressed Data Chunk . . . . . . . . . . . . . . . . . 128 116 19.2. Frame Header . . . . . . . . . . . . . . . . . . . . . . 130 117 19.3. Macroblock Data . . . . . . . . . . . . . . . . . . . . . 142 118 20. License . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 119 21. Copyright . . . . . . . . . . . . . . . . . . . . . . . . . . 148 120 22. References . . . . . . . . . . . . . . . . . . . . . . . . . 149 121 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 150 123 1. Introduction 125 IMPORTANT NOTE: Portions of this document contain mathematical 126 expressions, code blocks and other non-narrative content that may not 127 render suitably in the text-only RFC format. Formatted versions of 128 this document's content is available at http://www.webmproject.org/. 130 * * * 132 This document describes the VP8 compressed video data format created 133 by Google On2, together with a discussion of the decoding procedure 134 for this format. It is intended to be used in conjunction with and 135 as a guide to the reference decoder provided by Google On2. If there 136 are any conflicts between this document and the reference source 137 code, the reference source code should be considered correct. The 138 bitstream is defined by the reference source code and not this 139 document. 141 Like many modern video compression schemes, VP8 is based on 142 decomposition of frames into square subblocks of pixels, prediction 143 of such subblocks using previously constructed blocks, and adjustment 144 of such predictions (as well as synthesis of unpredicted blocks) 145 using a discrete cosine transform (hereafter abbreviated as DCT). In 146 one special case, however, VP8 uses a "Walsh-Hadamard" (hereafter 147 abbreviated as WHT) transform instead of a DCT. 149 Roughly speaking, such systems reduce datarate by exploiting the 150 temporal and spatial coherence of most video signals. It is more 151 efficient to specify the location of a visually similar portion of a 152 prior frame than it is to specify pixel values. The frequency 153 segregation provided by the DCT and WHT facilitate the exploitation 154 of both spatial coherence in the original signal and the tolerance of 155 the human visual system to moderate losses of fidelity in the 156 reconstituted signal. 158 VP8 augments these basic concepts with, among other things, 159 sophisticated usage of contextual probabilities. The result is a 160 significant reduction in datarate at a given quality. 162 Unlike some similar schemes (the older MPEG formats, for example), 163 VP8 specifies exact values for reconstructed pixels. Specifically, 164 the specification for the DCT and WHT portions of the reconstruction 165 does not allow for any "drift" caused by truncation of fractions. 166 Rather, the algorithm is specified using fixed-precision integer 167 operations exclusively. This greatly facilitates the verification of 168 the correctness of a decoder implementation as well as avoiding 169 difficult-to-predict visual incongruities between such 170 implementations. 172 It should be remarked that, in a complete video playback system, the 173 displayed frames may or may not be identical to the reconstructed 174 frames. Many systems apply a final level of filtering (commonly 175 referred to as postprocessing) to the reconstructed frames prior to 176 viewing. Such postprocessing has no effect on the decoding and 177 reconstruction of subsequent frames (which are predicted using the 178 completely-specified reconstructed frames) and is beyond the scope of 179 this document. In practice, the nature and extent of this sort of 180 postprocessing is dependent on both the taste of the user and on the 181 computational facilities of the playback environment. 183 2. Format Overview 185 VP8 works exclusively with an 8-bit YUV 4:2:0 image format. In this 186 format, each 8-bit pixel in the two chroma planes (U and V) 187 corresponds positionally to a 2x2 block of 8-bit luma pixels in the Y 188 plane; coordinates of the upper left corner of the Y block are of 189 course exactly twice the coordinates of the corresponding chroma 190 pixels. When we refer to pixels or pixel distances without 191 specifying a plane, we are implicitly referring to the Y plane or to 192 the complete image, both of which have the same (full) resolution. 194 As is usually the case, the pixels are simply a large array of bytes 195 stored in rows from top to bottom, each row being stored from left to 196 right. This "left to right" then "top to bottom" raster-scan order 197 is reflected in the layout of the compressed data as well. 199 Provision has been made for the support of two different YUV color 200 formats in the VP8 bitstream header, however only one format is 201 supported in the first release of VP8. 203 The YUV formats differ in terms of their conversion to and from RGB 204 color space. The first corresponds to the traditional YUV color 205 space similar to the YCrCb color space defined in ITU-R BT.601. The 206 second (currently unsupported) format corresponds to a new YUV color 207 space whose digital conversion to and from RGB can be implemented 208 without multiplications and divides. The VP8 Decoder should decode 209 and pass the information on to the processes that convert the YUV 210 output to RGB color space. 212 Occasionally, at very low datarates, a compression system may decide 213 to reduce the resolution of the input signal to facilitate efficient 214 compression. The VP8 data format supports this via optional 215 upscaling of its internal reconstruction buffer prior to output (this 216 is completely distinct from the optional postprocessing discussed 217 earlier, which has nothing to do with decoding per se). This 218 upsampling restores the video frames to their original resolution. 219 In other words, the compression/decompression system can be viewed as 220 a "black box", where the input and output is always at a given 221 resolution. The compressor might decide to "cheat" and process the 222 signal at a lower resolution. In that case, the decompressor needs 223 the ability to restore the signal to its original resolution. 225 Internally, VP8 decomposes each output frame into an array of 226 macroblocks. A macroblock is a square array of pixels whose Y 227 dimensions are 16x16 and whose U and V dimensions are 8x8. 228 Macroblock-level data in a compressed frame occurs (and must be 229 processed) in a raster order similar to that of the pixels comprising 230 the frame. 232 Macroblocks are further decomposed into 4x4 subblocks. Every 233 macroblock has 16 Y subblocks, 4 U subblocks, and 4 V subblocks. Any 234 subblock-level data (and processing of such data) again occurs in 235 raster order, this time in raster order within the containing 236 macroblock. 238 As discussed in further detail below, data can be specified at the 239 levels of both macroblocks and their subblocks. 241 Pixels are always treated, at a minimum, at the level of subblocks, 242 which may be thought of as the "atoms" of the VP8 algorithm. In 243 particular, the 2x2 chroma blocks corresponding to 4x4 Y subblocks 244 are never treated explicitly in the data format or in the algorithm 245 specification. 247 The DCT and WHT always operate at a 4x4 resolution. The DCT is used 248 for the 16Y, 4U and 4V subblocks. The WHT is used (with some but not 249 all prediction modes) to encode a 4x4 array comprising the average 250 intensities of the 16 Y subblocks of a macroblock. These average 251 intensities are, up to a constant normalization factor, nothing more 252 that the zeroth DCT coefficients of the Y subblocks. This "higher- 253 level" WHT is a substitute for the explicit specification of those 254 coefficients, in exactly the same way as the DCT of a subblock 255 substitutes for the specification of the pixel values comprising the 256 subblock. We consider this 4x4 array as a second-order subblock 257 called Y2, and think of a macroblock as containing 24 "real" 258 subblocks and, sometimes, a 25th "virtual" subblock. This is dealt 259 with further in Chapter 13. 261 The frame layout used by the reference decoder may be found in the 262 file yv12config.h. 264 3. Compressed Frame Types 266 There are only two types of frames in VP8. 268 Intraframes (also called key frames and, in MPEG terminology, 269 I-frames) are decoded without reference to any other frame in a 270 sequence, that is, the decompressor reconstructs such frames 271 beginning from its "default" state. Key frames provide random access 272 (or seeking) points in a video stream. 274 Interframes (also called prediction frames and, in MPEG terminology, 275 P-frames) are encoded with reference to prior frames, specifically 276 all prior frames up to and including the most recent key frame. 277 Generally speaking, the correct decoding of an interframe depends on 278 the correct decoding of the most recent key frame and all ensuing 279 frames. Consequently, the decoding algorithm is not tolerant of 280 dropped frames: In an environment in which frames may be dropped or 281 corrupted, correct decoding will not be possible until a key frame is 282 correctly received. 284 In contrast to MPEG, there is no use of bidirectional prediction. No 285 frame is predicted using frames temporally subsequent to it; there is 286 no analog to an MPEG B-frame. 288 Secondly, VP8 augments these notions with that of alternate 289 prediction frames, called golden frames and altref frames 290 (alternative reference frames). Blocks in an interframe may be 291 predicted using blocks in the immediately previous frame as well as 292 the most recent golden frame or altref frame. Every key frame is 293 automatically golden and altref, and any interframe may optionally 294 replace the most recent golden or altref frame. 296 Golden frames and altref frames may also be used to partially 297 overcome the intolerance to dropped frames discussed above: If a 298 compressor is configured to code golden frames only with reference to 299 the prior golden frame (and key frame) then the "substream" of key 300 and golden frames may be decoded regardless of loss of other 301 interframes. Roughly speaking, the implementation requires (on the 302 compressor side) that golden frames subsume and recode any context 303 updates effected by the intervening interframes. A typical 304 application of this approach is video conferencing, in which 305 retransmission of a prior golden frame and/or a delay in playback 306 until receipt of the next golden frame is preferable to a larger 307 retransmit and/or delay until the next key frame. 309 4. Overview of Compressed Data Format 311 The input to a VP8 decoder is a sequence of compressed frames whose 312 order matches their order in time. Issues such as the duration of 313 frames, the corresponding audio, and synchronization are generally 314 provided by the playback environment and are irrelevant to the 315 decoding process itself, however, to aid in fast seeking a start code 316 is included in the header of each key frame. 318 The decoder is simply presented with a sequence of compressed frames 319 and produces a sequence of decompressed (reconstructed) YUV frames 320 corresponding to the input sequence. As stated in the introduction, 321 the exact pixel values in the reconstructed frame are part of VP8's 322 specification. This document specifies the layout of the compressed 323 frames and gives unambiguous algorithms for the correct production of 324 reconstructed frames. 326 The first frame presented to the decompressor is of course a key 327 frame. This may be followed by any number of interframes; the 328 correct reconstruction of each frame depends on all prior frames up 329 to the key frame. The next key frame restarts this process: The 330 decompressor resets to its default initial condition upon reception 331 of a key frame and the decoding of a key frame (and its ensuing 332 interframes) is completely independent of any prior decoding. 334 At the highest level, every compressed frame has three or more 335 pieces. It begins with an uncompressed data chunk comprising 10 336 bytes in the case of key frames and 3-bytes for inter frames. This 337 is followed by two or more blocks of compressed data (called 338 partitions). These compressed data partitions begin and end on byte 339 boundaries. 341 The first compressed partition has two subsections: 343 1. Header information that applies to the frame as a whole. 345 2. Per-macroblock information specifying how each macroblock is 346 predicted from the already-reconstructed data that is available 347 to the decompressor. 349 As stated above, the macroblock-level information occurs in raster- 350 scan order. 352 The rest of the partitions contain, for each block, the DCT/WHT 353 coefficients (quantized and logically compressed) of the residue 354 signal to be added to the predicted block values. It typically 355 accounts for roughly 70% of the overall datarate. VP8 supports 356 packing the compressed DCT/WHT coefficients' data from macroblock 357 rows into separate partitions. If there is more than one partition 358 for these coefficients, the sizes of the partitions -- except the 359 last partition -- in bytes are also present in the bitstream right 360 after the above first partition. Each of the sizes is a 3-byte data 361 item written in little endian format. These sizes provide the 362 decoder direct access to all DCT/WHT coefficient partitions, which 363 enables parallel processing of the coefficients in a decoder. 365 The separate partitioning of the prediction data and coefficient data 366 also allows flexibility in the implementation of a decompressor: An 367 implementation may decode and store the prediction information for 368 the whole frame and then decode, transform, and add the residue 369 signal to the entire frame, or it may simultaneously decode both 370 partitions, calculating prediction information and adding in the 371 residue signal for each block in order. The length field in the 372 frame tag, which allows decoding of the second partition to begin 373 before the first partition has been completely decoded, is necessary 374 for the second "block-at-a-time" decoder implementation. 376 All partitions are decoded using separate instances of the boolean 377 entropy decoder described in Chapter 7. Although some of the data 378 represented within the partitions is conceptually "flat" (a bit is 379 just a bit with no probabilistic expectation one way or the other), 380 because of the way such coders work, there is never a direct 381 correspondence between a "conceptual bit" and an actual physical bit 382 in the compressed data partitions. Only in the 3 or 10 byte 383 uncompressed chunk described above is there such a physical 384 correspondence. 386 A related matter, which is true for most lossless compression 387 formats, is that seeking within a partition is not supported. The 388 data must be decompressed and processed (or at least stored) in the 389 order in which it occurs in the partition. 391 While this document specifies the ordering of the partition data 392 correctly, the details and semantics of this data are discussed in a 393 more logical fashion to facilitate comprehension. For example, the 394 frame header contains updates to many probability tables used in 395 decoding per-macroblock data. The latter is often described before 396 the layouts of the probabilities and their updates, even though this 397 is the opposite of their order in the bitstream. 399 5. Overview of the Decoding Process 401 A VP8 decoder needs to maintain four YUV frame buffers whose 402 resolutions are at least equal to that of the encoded image. These 403 buffers hold the current frame being reconstructed, the immediately 404 previous reconstructed frame, the most recent golden frame, and the 405 most recent altref frame. 407 Most implementations will wish to "pad" these buffers with 408 "invisible" pixels that extend a moderate number of pixels beyond all 409 four edges of the visible image. This simplifies interframe 410 prediction by allowing all (or most) prediction blocks -- which are 411 not guaranteed to lie within the visible area of a prior frame -- to 412 address usable image data. 414 Regardless of the amount of padding chosen, the invisible rows above 415 (below) the image are filled with copies of the top (bottom) row of 416 the image; the invisible columns to the left (right) of the image are 417 filled with copies of the leftmost (rightmost) visible row; and the 418 four invisible corners are filled with copies of the corresponding 419 visible corner pixels. The use of these prediction buffers (and 420 suggested sizes for the halo) will be elaborated on in the discussion 421 of motion vectors, interframe prediction, and sub-pixel interpolation 422 later in this document. 424 As will be seen in the description of the frame header, the image 425 dimensions are specified (and can change) with every key frame. 426 These buffers (and any other data structures whose size depends on 427 the size of the image) should be allocated (or re-allocated) 428 immediately after the dimensions are decoded. 430 Leaving most of the details for later elaboration, the following is 431 an outline the decoding process. 433 First, the frame header (beginning of the first data partition) is 434 decoded. Altering or augmenting the maintained state of the decoder, 435 this provides the context in which the per-macroblock data can be 436 interpreted. 438 The macroblock data occurs (and must be processed) in raster-scan 439 order. This data comes in two or more parts. The first (prediction 440 or mode) part comes in the remainder of the first data partition. 441 The other parts comprise the data partition(s) for the DCT/WHT 442 coefficients of the residue signal. For each macroblock, the 443 prediction data must be processed before the residue. 445 Each macroblock is predicted using one (and only one) of four 446 possible frames. All macroblocks in a key frame, and all intra-coded 447 macroblocks in an interframe, are predicted using the already-decoded 448 macroblocks in the current frame. Macroblocks in an interframe may 449 also be predicted using the previous frame, the golden frame or the 450 altref frame. Such macroblocks are said to be inter-coded. 452 The purpose of prediction is to use already-constructed image data to 453 approximate the portion of the original image being reconstructed. 454 The effect of any of the prediction modes is then to write a 455 macroblock-sized prediction buffer containing this approximation. 457 Regardless of the prediction method, the residue DCT signal is 458 decoded, dequantized, reverse-transformed, and added to the 459 prediction buffer to produce the (almost final) reconstruction value 460 of the macroblock, which is stored in the correct position of the 461 current frame buffer. 463 The residue signal consists of 24 (sixteen Y, four U, and four V) 4x4 464 quantized and losslessly-compressed DCT transforms approximating the 465 difference between the original macroblock in the uncompressed source 466 and the prediction buffer. For most prediction modes, the zeroth 467 coefficients of the sixteen Y subblocks are expressed via a 25th WHT 468 of the second-order virtual Y2 subblock discussed above. 470 Intra-prediction exploits the spatial coherence of frames. The 16x16 471 luma (Y) and 8x8 chroma (UV) components are predicted independently 472 of each other using one of four simple means of pixel propagation, 473 starting from the already-reconstructed (16-pixel long luma, 8-pixel 474 long chroma) row above and column to the left of the current 475 macroblock. The four methods are: 477 1. Copying the row from above throughout the prediction buffer. 479 2. Copying the column from left throughout the prediction buffer. 481 3. Copying the average value of the row and column throughout the 482 prediction buffer. 484 4. Extrapolation from the row and column using the (fixed) second 485 difference (horizontal and vertical) from the upper left corner. 487 Additionally, the sixteen Y subblocks may be predicted independently 488 of each other using one of ten different modes, four of which are 4x4 489 analogs of those described above, augmented with six "diagonal" 490 prediction methods. There are two types of predictions, one intra 491 and one prediction (among all the modes), for which the residue 492 signal does not use the Y2 block to encode the DC portion of the 493 sixteen 4x4 Y subblock DCTs. This "independent Y subblock" mode has 494 no effect on the 8x8 chroma prediction. 496 Inter-prediction exploits the temporal coherence between nearby 497 frames. Except for the choice of the prediction frame itself, there 498 is no difference between inter-prediction based on the previous frame 499 and that based on the golden frame or altref frame. 501 Inter-prediction is conceptually very simple. While, for reasons of 502 efficiency, there are several methods of encoding the relationship 503 between the current macroblock and corresponding sections of the 504 prediction frame, ultimately each of the sixteen Y subblocks is 505 related to a 4x4 subblock of the prediction frame, whose position in 506 that frame differs from the current subblock position by a (usually 507 small) displacement. These two-dimensional displacements are called 508 motion vectors. 510 The motion vectors used by VP8 have quarter-pixel precision. 511 Prediction of a subblock using a motion vector that happens to have 512 integer (whole number) components is very easy: the 4x4 block of 513 pixels from the displaced block in the previous, golden, or altref 514 frame are simply copied into the correct position of the current 515 macroblock's prediction buffer. 517 Fractional displacements are conceptually and implementationally more 518 complex. They require the inference (or synthesis) of sample values 519 that, strictly speaking, do not exist. This is one of the most basic 520 problems in signal processing and readers conversant with that 521 subject will see that the approach taken by VP8 provides a good 522 balance of robustness, accuracy, and efficiency. 524 Leaving the details for the implementation discussion below, the 525 pixel interpolation is calculated by applying a kernel filter (using 526 reasonable-precision integer math) three pixels on either side, both 527 horizontally and vertically, of the pixel to be synthesized. The 528 resulting 4x4 block of synthetic pixels is then copied into position 529 exactly as in the case of integer displacements. 531 Each of the eight chroma subblocks is handled similarly. Their 532 motion vectors are never specified explicitly; instead, the motion 533 vector for each chroma subblock is calculated by averaging the 534 vectors of the four Y subblocks that occupy the same area of the 535 frame. Since chroma pixels have twice the diameter (and four times 536 the area) of luma pixels, the calculated chroma motion vectors have 537 1/8 pixel resolution, but the procedure for copying or generating 538 pixels for each subblock is essentially identical to that done in the 539 luma plane. 541 After all the macroblocks have been generated (predicted and 542 corrected with the DCT/WHT residue), a filtering step (the loop 543 filter) is applied to the entire frame. The purpose of the loop 544 filter is to reduce blocking artifacts at the boundaries between 545 macroblocks and between subblocks of the macroblocks. The term loop 546 filter is used because this filter is part of the "coding loop," that 547 is, it affects the reconstructed frame buffers that are used to 548 predict ensuing frames. This is distinguished from the 549 postprocessing filters discussed earlier which affect only the viewed 550 video and do not "feed into" subsequent frames. 552 Next, if signaled in the data, the current frame (or individual 553 macroblocks within the current frame) may replace the golden frame 554 prediction buffer and/or the altref frame buffer. 556 The halos of the frame buffers are next filled as specified above. 557 Finally, at least as far as decoding is concerned, the (references 558 to) the "current" and "last" frame buffers should be exchanged in 559 preparation for the next frame. 561 Various processes may be required (or desired) before viewing the 562 generated frame. As discussed in the frame dimension information 563 below, truncation and/or upscaling of the frame may be required. 564 Some playback systems may require a different frame format (RGB, 565 YUY2, etc.). Finally, as mentioned in the introduction, further 566 postprocessing or filtering of the image prior to viewing may be 567 desired. Since the primary purpose of this document is a decoding 568 specification, the postprocessing is not specified in this document. 570 While the basic ideas of prediction and correction used by VP8 are 571 straightforward, many of the details are quite complex. The 572 management of probabilities is particularly elaborate. Not only do 573 the various modes of intra-prediction and motion vector specification 574 have associated probabilities but they, together with the coding of 575 DCT coefficients and motion vectors, often base these probabilities 576 on a variety of contextual information (calculated from what has been 577 decoded so far), as well as on explicit modification via the frame 578 header. 580 The "top-level" of decoding and frame reconstruction is implemented 581 in the reference decoder files onyxd_if.c and decodframe.c. 583 This concludes our summary of decoding and reconstruction; we 584 continue by discussing the individual aspects in more depth. 586 A reasonable "divide and conquer" approach to implementation of a 587 decoder is to begin by decoding streams composed exclusively of key 588 frames. After that works reliably, interframe handling can be added 589 more easily than if complete functionality were attempted 590 immediately. In accordance with this, we first discuss components 591 needed to decode key frames (most of which are also used in the 592 decoding of interframes) and conclude with topics exclusive to 593 interframes. 595 6. Description of Algorithms 597 As the intent of this document, together with the reference decoder 598 source code, is to specify a platform-independent procedure for the 599 decoding and reconstruction of a VP8 video stream, many (small) 600 algorithms must be described exactly. 602 Due to its near-universality, terseness, ability to easily describe 603 calculation at specific precisions, and the fact that On2's reference 604 VP8 decoder is written in C, these algorithm fragments are written 605 using the C programming language, augmented with a few simple 606 definitions below. 608 The standard (and best) reference for C is The C Programming 609 Language, written by Brian W. Kernighan and Dennis M. Ritchie, and 610 published by Prentice-Hall. 612 Many code fragments will be presented in this document. Some will be 613 nearly identical to corresponding sections of the reference decoder; 614 others will differ. Roughly speaking, there are three reasons for 615 such differences: 617 1. For reasons of efficiency, the reference decoder version may be 618 less obvious. 620 2. The reference decoder often uses large data structures to 621 maintain context that need not be described or used here. 623 3. The authors of this document felt that a different expression of 624 the same algorithm might facilitate exposition. 626 Regardless of the chosen presentation, the calculation effected by 627 any of the algorithms described here is identical to that effected by 628 the corresponding portion of the reference decoder. 630 All VP8 decoding algorithms use integer math. To facilitate 631 specification of arithmetic precision, we define the following types. 633 ---- Begin code block -------------------------------------- 635 typedef signed char int8; /* signed int exactly 8 bits wide */ 636 typedef unsigned char uint8; /* unsigned "" */ 638 typedef short int16; /* signed int exactly 16 bits wide */ 639 typedef unsigned int16 uint16; /* unsigned "" */ 641 /* int32 is a signed integer type at least 32 bits wide */ 643 typedef long int32; /* guaranteed to work on all systems */ 644 typedef int int32; /* will be more efficient on some systems */ 646 typedef unsigned int32 uint32; 648 /* unsigned integer type, at least 16 bits wide, whose exact size 649 is most convenient to whatever processor we are using */ 651 typedef unsigned int uint; 653 /* While pixels themselves are 8-bit unsigned integers, 654 pixel arithmetic often occurs at 16- or 32-bit precision and 655 the results need to be "saturated" or clamped to an 8-bit 656 range. */ 658 typedef uint8 Pixel; 660 Pixel clamp255( int32 v) { return v < 0? 0 : (v < 255? v : 255);} 662 /* As is elaborated in the discussion of the bool_decoder below, 663 VP8 represents probabilities as unsigned 8-bit numbers. */ 665 typedef uint8 Prob; 667 ---- End code block ---------------------------------------- 669 We occasionally need to discuss mathematical functions involving 670 honest-to-goodness "infinite precision" real numbers. The DCT is 671 first described via the cosine function cos; the ratio of the lengths 672 of the circumference and diameter of a circle is denoted pi; at one 673 point, we take a (base 1//2) logarithm denoted log; and pow( x, y) 674 denotes x raised to the power y. If x = 2 and y is a small non- 675 negative integer, pow( 2, y) may be expressed in C as 1 << y. 677 Finally, we sometimes need to divide signed integers by powers of 678 two, that is, we occasionally right-shift signed numbers. The 679 behavior of such shifts (i.e., the propagation of the sign bit) is, 680 perhaps surprisingly, not defined by the C language itself and is 681 left up to individual compilers. Because of the utility of this 682 frequently needed operation, it is at least arguable that it should 683 be defined by the language (to naturally propagate the sign bit) and, 684 at a minimum, should be correctly implemented by any reasonable 685 compiler. In the interest of strict portability, we attempt to call 686 attention to these shifts when they arise. 688 7. Boolean Entropy Decoder 690 As discussed in the overview above, essentially the entire VP8 data 691 stream is encoded using a boolean entropy coder. 693 An understanding of the bool_decoder is critical to the 694 implementation of a VP8 decompressor, so we discuss in detail. It is 695 easier to comprehend the bool_decoder in conjunction with the 696 bool_encoder used by the compressor to write the compressed data 697 partitions. 699 The bool_encoder encodes (and the bool_decoder decodes) one bool 700 (zero-or-one boolean value) at a time. Its purpose is to losslessly 701 compress a sequence of bools for which the probability of their being 702 zero or one can be well-estimated (via constant or previously-coded 703 information) at the time they are written, using identical 704 corresponding probabilities at the time they are read. 706 As the reader is probably aware, if a bool is much more likely to be 707 zero than one (for instance), it can, on average, be faithfully 708 encoded using much less than one bit per value. The bool_encoder 709 exploits this. 711 In the 1940s, Claude Shannon proved that there is a lower bound for 712 the average datarate of a faithful encoding of a sequence of bools 713 (whose probability distributions are known and are independent of 714 each other) and also that there are encoding algorithms that 715 approximate this lower bound as closely as one wishes. 717 If we encode a sequence of bools whose probability of being zero is p 718 (and whose probability of being 1 is 1-p), the lowest possible 719 datarate per value is 721 plog(p) + (1-p)log(1-p); 723 taking the logarithms to the base 1//2 expresses the datarate in 724 bits/value. 726 We give two simple examples. At one extreme, if p=1//2, then log(p) 727 = log(1-p) = 1 and the lowest possible datarate per bool is 1//2 + 728 1//2 = 1, that is, we cannot do any better than simply literally 729 writing out bits. At another extreme, if p is very small, say p=1// 730 1024, then log(p)=10, log(1-p) is roughly .0014, and the lowest 731 possible datarate is approximately 10//1024 + .0014, roughly 1/100 of 732 a bit per bool. 734 Because most of the bools in the VP8 datastream have zero- 735 probabilities nowhere near 1//2, the compression provided by the 736 bool_encoder is critical to the performance of VP8. 738 The bool coder used by VP8 is a variant of an arithmetic coder. An 739 excellent discussion of arithmetic coding (and other lossless 740 compression techniques) can be found in the book Text Compression by 741 Timothy C. Bell, John G. Cleary, and Ian H. Witten, published in 1990 742 by Prentice-Hall. 744 7.1. Underlying Theory of Coding 746 The basic idea used by the bool coder is to consider the entire data 747 stream (either of the partitions in our case) as the binary expansion 748 of a single number x with 0 <= x < 1. The bits (or bytes) in x are 749 of course written from high to low order and if b[j] (B[j]) is the 750 j^(th) bit (byte) in the partition, the value x is simply the sum 751 (starting with j = 1) of pow(2, -j) * b[j] or pow(256, -j) * B[j]. 753 Before the first bool is coded, all values of x are possible. 755 The coding of each bool restricts the possible values of x in 756 proportion to the probability of what is coded. If p1 is the 757 probability of the first bool being zero and a zero is coded, the 758 range of possible x is restricted to 0 <= x < p1. If a one is coded, 759 the range becomes p1 <= x < 1. 761 The coding continues by repeating the same idea. At every stage, 762 there is an interval a <= x < b of possible values of x. If p is the 763 probability of a zero being coded at this stage and a zero is coded, 764 the interval becomes a <= x < a + (p(b-a)). If a one is coded, the 765 possible x are restricted to a + (p(b-a)) <= x < b. 767 Assuming only finitely many values are to be coded, after the encoder 768 has received the last bool, it can write as its output any value x 769 that lies in the final interval. VP8 simply writes the left endpoint 770 of the final interval. Consequently, the output it would make if 771 encoding were to stop at any time either increases or stays the same 772 as each bool is encoded. 774 Decoding parallels encoding. The decoder is presented with the 775 number x, which has only the initial restriction 0 <= x < 1. To 776 decode the first bool, the decoder is given the first probability p1. 777 If x < p1, a zero is decoded; if x >= p1, a one is decoded. In 778 either case, the new restriction on x, that is, the interval of 779 possible x, is remembered. 781 Decoding continues in exactly the same way: If a <= x < b is the 782 current interval and we are to decode a bool with zero-probability p, 783 we return a zero if a <= x < a + (p(b-a)) and a one if a + (p(b-a)) 784 <= x < b. In either case, the new restriction is remembered in 785 preparation for decoding the next bool. 787 The process outlined above uses real numbers of infinite precision to 788 express the probabilities and ranges. It is true that, if one could 789 actualize this process and coded a large number of bools whose 790 supplied probabilities matched their value distributions, the 791 datarate achieved would approach the theoretical minimum as the 792 number of bools encoded increased. 794 Unfortunately, computers operate at finite precision and an 795 approximation to the theoretically perfect process described above is 796 necessary. Such approximation increases the datarate but, at quite 797 moderate precision and for a wide variety of data sets, this increase 798 is negligible. 800 The only conceptual limitations are, first, that coder probabilities 801 must be expressed at finite precision and, second, that the decoder 802 be able to detect each individual modification to the value interval 803 via examination of a fixed amount of input. As a practical matter, 804 many of the implementation details stem from the fact that the coder 805 can function using only a small "window" to incrementally read or 806 write the arbitrarily precise number x. 808 7.2. Practical Algorithm Description 810 VP8's bool coder works with 8-bit probabilities p. The range of such 811 p is 0 <= p <= 255; the actual probability represented by p is 812 p//256. Also, the coder is designed so that decoding of a bool 813 requires no more than an 8-bit comparison and so that the state of 814 both the encoder and decoder can be easily represented using a small 815 number of unsigned 16-bit integers. 817 The details are most easily understood if we first describe the 818 algorithm using bit-at-a-time input and output. Aside from the 819 ability to maintain a position in this bitstream and write/read bits, 820 the encoder also needs the ability to add 1 to the bits already 821 output; after writing n bits, adding 1 to the existing output is the 822 same thing as adding pow( 2, -n) to x. 824 Together with the bit position, the encoder must maintain two 825 unsigned 8-bit numbers which we call "bottom" and "range". Writing w 826 for the n bits already written and S = pow( 2, - n - 8) for the scale 827 of the current bit position one byte out, we have the following 828 constraint on all future values v of w (including the final value v = 829 x): 831 w + ( S * "bottom" ) <= v < w + ( S * ( "bottom" + "range" ) ) 832 Thus, appending "bottom" to the already-written bits w gives the left 833 endpoint of the interval of possible values, appending "bottom" + 834 "range" gives the right endpoint, "range" itself (scaled to the 835 current output position) is the length of the interval. 837 So that our probabilistic encodings are reasonably accurate, we do 838 not let "range" vary by more than a factor of two: It stays within 839 the bounds 128 <= "range" <= 255. 841 The process for encoding a boolean value "val" whose probability of 842 being zero is "prob" / 256 -- and whose probability of being one is ( 843 256 - "prob" ) / 256 -- with 1 <= "prob" <= 255 is as follows. 845 Using an unsigned 16-bit multiply followed by an unsigned right 846 shift, we calculate an unsigned 8-bit "split" value: 848 split = 1 + ((("range" - 1) * "probability")]] >> 8) 850 "split" is approximately ( "prob" / 256 ) * "range" and lies within 851 the bounds 1 <= "split" <= "range" - 1. These bounds ensure the 852 correctness of the decoding procedure described below. 854 If "val" is false, we leave the left interval endpoint "bottom" alone 855 and reduce "range", replacing it by "split". If "val" is true, we 856 move up the left endpoint to "bottom" + "split", propagating any 857 carry to the already-written value "w" (this is where we need the 858 ability to add 1 to "w"), and reduce "range" to "range" - "split". 860 Regardless of the value encoded, "range" has been reduced and now has 861 the bounds 1 <= "range" <= 254. If "range" < 128, the encoder 862 doubles it and shifts the high-order bit out of "bottom" to the 863 output as it also doubles "bottom", repeating this process one bit at 864 a time until 128 <= "range" <= 255. Once this is completed, the 865 encoder is ready to accept another bool, maintaining the constraints 866 described above. 868 After encoding the last bool, the partition may be completed by 869 appending "bottom" to the bitstream. 871 The decoder mimics the state of the encoder. It maintains, together 872 with an input bit position, two unsigned 8-bit numbers, a "range" 873 identical to that maintained by the encoder and a "value". Decoding 874 one bool at a time, the decoder (in effect) tracks the same left 875 interval endpoint as does the encoder and subtracts it from the 876 remaining input. Appending the unread portion of the bitstream to 877 the 8-bit "value" gives the difference between the actual value 878 encoded and the known left endpoint. 880 The decoder is initialized by setting "range" = 255 and reading the 881 first 16 input bits into "value". The decoder maintains "range" and 882 calculates "split" in exactly the same way as does the encoder. 884 To decode a bool, it compares "value" to "split"; if "value" < 885 "split", the bool is zero, and "range" is replaced with "split". If 886 "value" >= "split", the bool is one, "range" is replaced with "range" 887 - "split", and "value" is replaced with "value" - "split". 889 Again, "range" is doubled one bit at a time until it is at least 128. 890 The "value" is doubled in parallel, shifting a new input bit into the 891 "bottom" each time. 893 Writing "Value" for "value" together with the unread input bits and 894 "Range" for "range" extended indefinitely on the right by zeros, the 895 condition "Value" < "Range" is maintained at all times by the 896 decoder. In particular, the bits shifted out of "value" as it is 897 doubled are always zero. 899 7.3. Actual Implementation 901 The C code below gives complete implementations of the encoder and 902 decoder described above. While they are logically identical to the 903 "bit-at-a-time" versions, they internally buffer a couple of extra 904 bytes of the bitstream. This allows I/O to be done (more 905 practically) a byte at a time and drastically reduces the number of 906 carries the encoder has to propagate into the already-written data. 908 Another (logically equivalent) implementation may be found in the 909 reference decoder files dboolhuff.h and dboolhuff.c. 911 ---- Begin code block -------------------------------------- 913 /* Encoder first */ 915 typedef struct { 916 uint8 *output; /* ptr to next byte to be written */ 917 uint32 range; /* 128 <= range <= 255 */ 918 uint32 bottom; /* minimum value of remaining output */ 919 int bit_count; /* # of shifts before an output byte 920 is available */ 921 } bool_encoder; 923 /* Must set initial state of encoder before writing any bools. */ 925 void init_bool_encoder( bool_encoder *e, uint8 *start_partition) 926 { 927 e->output = start_partition; 928 e->range = 255; 930 e->bottom = 0; 931 e->bit_count = 24; 932 } 934 /* Encoding very rarely produces a carry that must be propagated 935 to the already-written output. The arithmetic guarantees that 936 the propagation will never go beyond the beginning of the 937 output. Put another way, the encoded value x is always less 938 than one. */ 940 void add_one_to_output( uint8 *q) 941 { 942 while( *--q == 255) 943 *q = 0; 944 ++*q; 945 } 947 /* Main function writes a bool_value whose probability of being 948 zero is (expected to be) prob/256. */ 950 void write_bool( bool_encoder *e, Prob prob, int bool_value) 951 { 952 /* split is approximately (range * prob) / 256 and, 953 crucially, is strictly bigger than zero and strictly 954 smaller than range */ 956 uint32 split = 1 + ( ((e->range - 1) * prob) >> 8); 958 if( bool_value) { 959 e->bottom += split; /* move up bottom of interval */ 960 e->range -= split; /* with corresponding decrease in range */ 961 } else 962 e->range = split; /* decrease range, leaving bottom alone */ 964 while( e->range < 128) 965 { 966 e->range <<= 1; 968 if( e->bottom & (1 << 31)) /* detect carry */ 969 add_one_to_output( e->output); 971 e->bottom <<= 1; /* before shifting bottom */ 973 if( !--e->bit_count) { /* write out high byte of bottom ... */ 975 *e->output++ = (uint8) (e->bottom >> 24); 976 e->bottom &= (1 << 24) - 1; /* ... keeping low 3 bytes */ 978 e->bit_count = 8; /* 8 shifts until next output */ 979 } 980 } 981 } 983 /* Call this function (exactly once) after encoding the last 984 bool value for the partition being written */ 986 void flush_bool_encoder( bool_encoder *e) 987 { 988 int c = e->bit_count; 989 uint32 v = e->bottom; 991 if( v & (1 << (32 - c))) /* propagate (unlikely) carry */ 992 add_one_to_output( e->output); 993 v <<= c & 7; /* before shifting remaining output */ 994 c >>= 3; /* to top of internal buffer */ 995 while( --c >= 0) 996 v <<= 8; 997 c = 4; 998 while( --c >= 0) { /* write remaining data, possibly padded */ 999 *e->output++ = (uint8) (v >> 24); 1000 v <<= 8; 1001 } 1002 } 1004 /* Decoder state exactly parallels that of the encoder. 1005 "value", together with the remaining input, equals the 1006 complete encoded number x less the left endpoint of the 1007 current coding interval. */ 1009 typedef struct { 1010 uint8 *input; /* pointer to next compressed data byte */ 1011 uint32 range; /* always identical to encoder's range */ 1012 uint32 value; /* contains at least 24 significant bits */ 1013 int bit_count; /* # of bits shifted out of 1014 value, at most 7 */ 1015 } bool_decoder; 1017 /* Call this function before reading any bools from the 1018 partition.*/ 1020 void init_bool_decoder( bool_decoder *d, uint8 *start_partition) 1021 { 1022 { 1023 int i = 0; 1024 d->value = 0; /* value = first 24 input bytes */ 1025 while( ++i <= 24) 1026 d->value = (d->value << 8) | *start_partition++; 1027 } 1029 d->input = start_partition; /* ptr to next byte to be read */ 1030 d->range = 255; /* initial range is full */ 1031 d->bit_count = 0; /* have not yet shifted out any bits */ 1032 } 1034 /* Main function reads a bool encoded at probability prob/256, 1035 which of course must agree with the probability used when the 1036 bool was written. */ 1038 int read_bool( bool_decoder *d, Prob prob) 1039 { 1040 /* range and split are identical to the corresponding values 1041 used by the encoder when this bool was written */ 1043 uint32 split = 1 + ( ((d->range - 1) * prob) >> 8); 1044 uint32 SPLIT = split << 8; 1045 int retval; /* will be 0 or 1 */ 1047 if( d->value >= SPLIT) { /* encoded a one */ 1048 retval = 1; 1049 d->range -= split; /* reduce range */ 1050 d->value -= SPLIT; /* subtract off left endpoint of interval */ 1051 } else { /* encoded a zero */ 1052 retval = 0; 1053 d->range = split; /* reduce range, no change in left endpoint */ 1054 } 1056 while( d->range < 128) { /* shift out irrelevant value bits */ 1057 d->value <<= 1; 1058 d->range <<= 1; 1059 if( ++d->bit_count == 8) { /* shift in new bits 8 at a time */ 1060 d->bit_count = 0; 1061 d->value |= *d->input++; 1062 } 1063 } 1064 return retval; 1065 } 1067 /* Convenience function reads a "literal", that is, a "num_bits" 1068 wide unsigned value whose bits come high- to low-order, with 1069 each bit encoded at probability 128 (i.e., 1/2). */ 1071 uint32 read_literal( bool_decoder *d, int num_bits) 1072 { 1073 uint32 v = 0; 1074 while( num_bits--) 1075 v = (v << 1) + read_bool( d, 128); 1076 return v; 1077 } 1079 /* Variant reads a signed number */ 1081 int32 read_signed_literal( bool_decoder *d, int num_bits) 1082 { 1083 int32 v = 0; 1084 if( !num_bits) 1085 return 0; 1086 if( read_bool( d, 128)) 1087 v = -1; 1088 while( --num_bits) 1089 v = (v << 1) + read_bool( d, 128); 1090 return v; 1091 } 1093 ---- End code block ---------------------------------------- 1095 8. Compressed Data Components 1097 At the lowest level, VP8's compressed data is simply a sequence of 1098 probabilistically-encoded bools. Most of this data is composed of 1099 (slightly) larger semantic units fashioned from bools, which we 1100 describe here. 1102 We sometimes use these descriptions in C expressions within data 1103 format specifications. In this context, they refer to the return 1104 value of a call to an appropriate bool_decoder d, reading (as always) 1105 from its current reference point. 1107 +--------------+-------+--------------------------------------------+ 1108 | Call | Alt. | Return | 1109 +--------------+-------+--------------------------------------------+ 1110 | Bool(p) | B(p) | Bool with probability p of being 0. | 1111 | | | Abbreviated B(p). Return value of | 1112 | | | read_bool(d, p). | 1113 | | | | 1114 | Flag | F | A one-bit flag (same thing as a B(128) or | 1115 | | | an L(1)). Abbreviated F. Return value of | 1116 | | | read_bool(d, 128). | 1117 | | | | 1118 | Lit(n) | L(n) | Unsigned n-bit number encoded as n flags | 1119 | | | (a "literal"). Abbreviated L(n). The bits | 1120 | | | are read from high to low order. Return | 1121 | | | value of read_literal(d, n). | 1122 | | | | 1123 | SignedLit(n) | | Signed n-bit number encoded similarly to | 1124 | | | an L(n). Return value of | 1125 | | | read_signed_literal(d, n). These are rare. | 1126 | | | | 1127 | P(8) | | An 8-bit probability. No different from an | 1128 | | | L(8), but we sometimes use this notation | 1129 | | | to emphasize that a probability is being | 1130 | | | coded. | 1131 | | | | 1132 | P(7) | | A 7-bit specification of an 8-bit | 1133 | | | probability. Coded as an L(7) number x; | 1134 | | | the resulting 8-bit probability is x ? x | 1135 | | | << 1 : 1. | 1136 | | | | 1137 | F? X | | A flag which, if true, is followed by a | 1138 | | | piece of data X. | 1139 | | | | 1140 | F? X:Y | | A flag which, if true, is followed by X | 1141 | | | and, if false, is followed by Y. Also used | 1142 | | | to express a value where Y is an implicit | 1143 | | | default (not encoded in the data stream), | 1144 | | | as in F? P(8):255, which expresses an | 1145 | | | optional probability: if the flag is true, | 1146 | | | the probability is specified as an 8-bit | 1147 | | | literal, while if the flag is false, the | 1148 | | | probability defaults to 255. | 1149 | | | | 1150 | B(p)? X | B(p)? | Variants of the above using a boolean | 1151 | | X:Y | indicator whose probability is not | 1152 | | | necessarily 128. | 1153 | | | | 1154 | X | | Multi-component field, the specifics of | 1155 | | | which will be given at a more appropriate | 1156 | | | point in the discussion. | 1157 | | | | 1158 | T | | Tree-encoded value from small alphabet. | 1159 +--------------+-------+--------------------------------------------+ 1161 The last type requires elaboration. We often wish to encode 1162 something whose value is restricted to a small number of 1163 possibilities (the alphabet). 1165 This is done by representing the alphabet as the leaves of a small 1166 binary tree. The (non-leaf) nodes of the tree have associated 1167 probabilities p and correspond to calls to read_bool(d, p). We think 1168 of a zero as choosing the left branch below the node and a one as 1169 choosing the right branch. 1171 Thus every value (leaf) whose tree depth is x is decoded after 1172 exactly x calls to read_bool. 1174 A tree representing an encoding of an alphabet of n possible values 1175 always contains n-1 non-leaf nodes, regardless of its shape (this is 1176 easily seen by induction on n). 1178 There are many ways that a given alphabet can be so represented. The 1179 choice of tree has little impact on datarate but does affect decoder 1180 performance. The trees used by VP8 are chosen to (on average) 1181 minimize the number of calls to read_bool. This amounts to shaping 1182 the tree so that more probable values have smaller tree depth than do 1183 less probable values. 1185 Readers familiar with Huffman coding will notice that, given an 1186 alphabet together with probabilities for each value, the associated 1187 Huffman tree minimizes the expected number of calls to read_bool. 1189 Such readers will also realize that the coding method described here 1190 never results in higher datarates than does the Huffman method and, 1191 indeed, often results in much lower datarates. Huffman coding is, in 1192 fact, nothing more than a special case of this method in which each 1193 node probability is fixed at 128 (i.e., 1/2). 1195 8.1. Tree Coding Implementation 1197 We give a suggested implementation of a tree data structure followed 1198 by a couple of actual examples of its usage by VP8. 1200 It is most convenient to represent the values using small positive 1201 integers, typically an enum counting up from zero. The largest 1202 alphabet (used to code DCT coefficients, described in Chapter 13 that 1203 is tree-coded by VP8 has only 12 values. The tree for this alphabet 1204 adds 11 interior nodes and so has a total of 23 positions. Thus, an 1205 8-bit number easily accommodates both a tree position and a return 1206 value. 1208 A tree may then be compactly represented as an array of (pairs of) 1209 8-bit integers. Each (even) array index corresponds to an interior 1210 node of the tree;, the zeroth index of course corresponds to the root 1211 of the tree. The array entries come in pairs corresponding to the 1212 left (0) and right (1) branches of the subtree below the interior 1213 node. We use the convention that a positive (even) branch entry is 1214 the index of a deeper interior node, while a nonpositive entry v 1215 corresponds to a leaf whose value is -v. 1217 The node probabilities associated to a tree-coded value are stored in 1218 an array whose indices are half the indices of the corresponding tree 1219 positions. The length of the probability array is one less than the 1220 size of the alphabet. 1222 Here is C code implementing the foregoing. The advantages of our 1223 data structure should be noted. Aside from the smallness of the 1224 structure itself, the tree-directed reading algorithm is essentially 1225 a single line of code. 1227 ---- Begin code block -------------------------------------- 1229 /* A tree specification is simply an array of 8-bit integers. */ 1231 typedef int8 tree_index; 1232 typedef const tree_index Tree[]; 1234 /* Read and return a tree-coded value at the current decoder 1235 position. */ 1237 int treed_read( 1238 bool_decoder * const d, /* bool_decoder always returns a 0 or 1 */ 1239 Tree t, /* tree specification */ 1240 const Prob p[] /* corresponding interior node probabilities */ 1241 ) { 1242 register tree_index i = 0; /* begin at root */ 1244 /* Descend tree until leaf is reached */ 1246 while( ( i = t[ i + read_bool( d, p[i>>1]) ] ) > 0) {} 1248 return -i; /* return value is negation of nonpositive index */ 1249 } 1251 ---- End code block ---------------------------------------- 1253 Tree-based decoding is implemented in the reference decoder file 1254 tree_reader.h. 1256 8.2. Tree Coding Example 1258 As a multi-part example, without getting too far into the semantics 1259 of macroblock decoding (which is of course taken up below), we look 1260 at the "mode" coding for intra-predicted macroblocks. 1262 It so happens that, because of a difference in statistics, the Y (or 1263 luma) mode encoding uses two different trees: one for key frames and 1264 another for interframes. This is the only instance in VP8 of the 1265 same dataset being coded by different trees under different 1266 circumstances. The UV (or chroma) modes are a proper subset of the Y 1267 modes and, as such, have their own decoding tree. 1269 ---- Begin code block -------------------------------------- 1271 typedef enum 1272 { 1273 DC_PRED, /* predict DC using row above and column to the left */ 1274 V_PRED, /* predict rows using row above */ 1275 H_PRED, /* predict columns using column to the left */ 1276 TM_PRED, /* propagate second differences a la "true motion" */ 1278 B_PRED, /* each Y subblock is independently predicted */ 1280 num_uv_modes = B_PRED, /* first four modes apply to chroma */ 1281 num_ymodes /* all modes apply to luma */ 1282 } 1283 intra_mbmode; 1285 /* The aforementioned trees together with the implied codings as 1286 comments. 1287 Actual (i.e., positive) indices are always even. 1288 Value (i.e., nonpositive) indices are arbitrary. */ 1290 const tree_index ymode_tree [2 * (num_ymodes - 1)] = 1291 { 1292 -DC_PRED, 2, /* root: DC_PRED = "0", "1" subtree */ 1293 4, 6, /* "1" subtree has 2 descendant subtrees */ 1294 -V_PRED, -H_PRED, /* "10" subtree: V_PRED = "100", 1295 H_PRED = "101" */ 1296 -TM_PRED, -B_PRED /* "11" subtree: TM_PRED = "110", 1297 B_PRED = "111" */ 1298 }; 1300 const tree_index kf_ymode_tree [2 * (num_ymodes - 1)] = 1301 { 1302 -B_PRED, 2, /* root: B_PRED = "0", "1" subtree */ 1303 4, 6, /* "1" subtree has 2 descendant subtrees */ 1304 -DC_PRED, -V_PRED, /* "10" subtree: DC_PRED = "100", 1305 V_PRED = "101" */ 1306 -H_PRED, -TM_PRED /* "11" subtree: H_PRED = "110", 1307 TM_PRED = "111" */ 1308 }; 1310 const tree_index uv_mode_tree [2 * (num_uv_modes - 1)] = 1311 { 1312 -DC_PRED, 2, /* root: DC_PRED = "0", "1" subtree */ 1313 -V_PRED, 4, /* "1" subtree: V_PRED = "10", 1314 "11" subtree */ 1315 -H_PRED, -TM_PRED /* "11" subtree: H_PRED = "110", 1316 TM_PRED = "111" */ 1317 }; 1319 /* Given a bool_decoder d, a Y mode might be decoded as follows.*/ 1321 const Prob pretend_its_huffman [num_ymodes - 1] = 1322 { 128, 128, 128, 128}; 1324 Ymode = (intra_mbmode) treed_read( d, ymode_tree, 1325 pretend_its_huffman); 1327 ---- End code block ---------------------------------------- 1329 Since it greatly facilitates re-use of reference code and since there 1330 is no real reason to do otherwise, it is strongly suggested that any 1331 decoder implementation use exactly the same enumeration values and 1332 probability table layouts as described in this document (and in the 1333 reference code) for all tree-coded data in VP8. 1335 9. Frame Header 1337 The uncompressed data chunk at the start of each frame and the first 1338 part of the first data partition contains information pertaining to 1339 the frame as a whole. We list the fields in the order of occurrence, 1340 giving details for some of the fields. Other details are postponed 1341 until a more logical point in our overall description. Most of the 1342 header decoding occurs in the reference decoder file decodeframe.c. 1344 9.1. Uncompressed Data Chunk 1346 The uncompressed data chunk comprises a common (for key frames and 1347 interframes) 3-byte frame tag that contains four fields, as follows: 1349 1. A 1-bit frame type (0 for key frames, 1 for interframes). 1351 2. A 3-bit version number (0 - 3 are defined as four different 1352 profiles with different decoding complexity; other values may be 1353 defined for future variants of the VP8 data format). 1355 3. A 1-bit show_frame flag (0 when current frame is not for display, 1356 1 when current frame is for display). 1358 4. A 19-bit field containing the size of the first data partition in 1359 bytes. 1361 Version number enables or disables certain features in the bitstream, 1362 as follows: 1364 +---------+-------------------------+-------------+ 1365 | Version | Reconstruction filter | Loop filter | 1366 +---------+-------------------------+-------------+ 1367 | 0 | Bicubic | Normal | 1368 | | | | 1369 | 1 | Bilinear | Simple | 1370 | | | | 1371 | 2 | Bilinear | None | 1372 | | | | 1373 | 3 | None | None | 1374 | | | | 1375 | Other | Reserved for future use | | 1376 +---------+-------------------------+-------------+ 1378 The reference software also adjusts the loop filter based on version 1379 number, as per the table above. Version number 1 implies a "simple" 1380 loop filter and version numbers 2 and 3 imply no loop filter. 1381 However, the "simple" filter setting in this context has no effect 1382 whatsoever on the decoding process, and the "no loop filter" setting 1383 only forces the reference encoder to set filter level equal to 0. 1384 Neither affect the decoding process. In decoding, the only loop 1385 filter settings that matter are those in the frame header. 1387 For key frames the frame tag is followed by a further 7 bytes of 1388 uncompressed data, as follows: 1390 ---- Begin code block -------------------------------------- 1392 Start code byte 0 0x9d 1393 Start code byte 1 0x01 1394 Start code byte 2 0x2a 1396 16 bits : (2 bits Horizontal Scale << 14) | Width (14 bits) 1397 16 bits : (2 bits Vertical Scale << 14) | Height (14 bits) 1399 ---- End code block ---------------------------------------- 1401 The following source code segment illustrates validation of the start 1402 code and reading the width, height and scale factors for a key frame. 1404 ---- Begin code block -------------------------------------- 1406 unsigned char *c = pbi->Source+3; 1408 // vet via sync code 1409 if(c[0]!=0x9d||c[1]!=0x01||c[2]!=0x2a) 1410 return -1; 1412 ---- End code block ---------------------------------------- 1414 where pbi->source points to the beginning of the frame. 1416 The following code reads the image dimension from the bitstream: 1418 ---- Begin code block -------------------------------------- 1420 pc->Width = swap2(*(unsigned short*)(c+3))&0x3fff; 1421 pc->horiz_scale = swap2(*(unsigned short*)(c+3))>>14; 1422 pc->Height = swap2(*(unsigned short*)(c+5))&0x3fff; 1423 pc->vert_scale = swap2(*(unsigned short*)(c+5))>>14; 1425 ---- End code block ---------------------------------------- 1427 where swap2 macro takes care of the endian on different platform: 1429 ---- Begin code block -------------------------------------- 1431 #if defined(__ppc__) || defined(__ppc64__) 1432 # define swap2(d) \ 1433 ((d&0x000000ff)<<8) | \ 1434 ((d&0x0000ff00)>>8) 1435 #else 1436 # define swap2(d) d 1437 #endif 1439 ---- End code block ---------------------------------------- 1441 While each frame is encoded as a raster scan of 16x16 macroblocks, 1442 the frame dimensions are not necessarily evenly divisible by 16. In 1443 this case, write ew = 16 - (width & 15) and eh = 16 - (height & 15) 1444 for the excess width and height, respectively. Although they are 1445 encoded, the last ew columns and eh rows are not actually part of the 1446 image and should be discarded before final output. However, these 1447 "excess pixels" should be maintained in the internal reconstruction 1448 buffer used to predict ensuing frames. 1450 The scaling specifications for each dimension are encoded as follows. 1452 +-------+--------------------------------------+ 1453 | Value | Scaling | 1454 +-------+--------------------------------------+ 1455 | 0 | No upscaling (the most common case). | 1456 | | | 1457 | 1 | Upscale by 5/4. | 1458 | | | 1459 | 2 | Upscale by 5/3. | 1460 | | | 1461 | 3 | Upscale by 2. | 1462 +-------+--------------------------------------+ 1464 Upscaling does not affect the reconstruction buffer, which should be 1465 maintained at the encoded resolution. Any reasonable method of 1466 upsampling (including any that may be supported by video hardware in 1467 the playback environment) may be used. Since scaling has no effect 1468 on decoding, we do not discuss it any further. 1470 As discussed in Chapter 5, allocation (or re-allocation) of data 1471 structures (such as the reconstruction buffer) whose size depends on 1472 dimension will be triggered here. 1474 9.2. Color Space and Pixel Type (Key Frames-only) 1476 +-------+------------------------------------------+ 1477 | Field | Value | 1478 +-------+------------------------------------------+ 1479 | L(1) | 1-bit color space type specification | 1480 | | | 1481 | L(1) | 1-bit pixel value clamping specification | 1482 +-------+------------------------------------------+ 1484 The color space type bit is encoded as the following: 1486 o 0 - YUV color space similar to the YCrCb color space defined in 1487 ITU-R BT.601 1489 o 1 - YUV color space whose digital conversion to RGB does not 1490 involve multiplication and division 1492 It should be noted that in either case, the actual conversion between 1493 YUV and RGB is not part of this specification. 1495 Note: In the initial release of VP8 only color space type 0 is 1496 supported. 1498 The pixel value clamping type bit is encoded as the following: 1500 o 0 - Decoders are required to clamp the reconstructed pixel values 1501 to between 0 and 255 (inclusive). 1503 o 1 - Reconstructed pixel values are guaranteed to be between 0 and 1504 255, no clamping is necessary. 1506 Information in this subsection does not appear in interframes. 1508 9.3. Segment-based Adjustments 1510 This subsection contains probability and value information for 1511 implementing segment adaptive adjustments to default decoder 1512 behaviors. The data in this section is used in the decoding of the 1513 ensuing per-segment information and applies to the entire frame. 1514 When segment adaptive adjustments are enabled, each macroblock will 1515 be assigned a segment ID. Macroblocks with the same segment ID 1516 belong to same segment, and have the same adaptive adjustments over 1517 default baseline values for the frame. The adjustments can be 1518 quantization level or loop filter strength. 1520 The context for decoding this feature is provided by section B of the 1521 frame header. It contains: 1523 1. A segmentation_enabled Flag which if 1 (0), enables (disables) 1524 the feature for this frame. The remaining fields occur if the 1525 feature is enabled. 1527 2. L(1) indicates if the segment map is updated for the current 1528 frame (update_mb_segmentaton_map) 1530 3. L(1) indicates if the segment feature data items are updated for 1531 the current frame 1533 4. If flag in 3 is 1, the following fields occur: 1535 1. L(1) the mode of segment feature data, can be absolute value 1536 mode or delta value mode, later mode, feature data is the 1537 difference against current frame defaults. 1539 2. Segment feature data items are decoded segment by each 1540 segment for each segment feature. For every data item, a one 1541 bit flag indicating if the item is 0 or a non-zero value to 1542 be decoded. If there is non-zero value, the value is decoded 1543 as a magnitude L(n) followed by a one bit sign (L(1), 0 for 1544 positive and 1 for negative). The length n can be looked up 1545 from a pre-defined length table for all feature data. 1547 5. If flag in 2 is 1, the probabilities of the decoding tree for 1548 segment map are decoded from the bitstream. Each probability is 1549 decoded with one bit flag indicating if the probability is the 1550 default value of 255 (flag is 0), or the probability is an 8-bit 1551 value, L(8), from the bitstream. 1553 The layout and semantics supporting this feature at the macroblock 1554 level will be described in Chapter 10. 1556 9.4. Loop Filter Type and Levels 1558 VP8 supports two types of loop filter, having different computational 1559 complexity. The following bits occur in the header to support the 1560 selection of the baseline type, strength and sharpness behavior of 1561 the loop filter used for the current frame. 1563 +-------+-------------------+ 1564 | Index | Description | 1565 +-------+-------------------+ 1566 | L(1) | filter_type | 1567 | | | 1568 | L(6) | loop_filter_level | 1569 | | | 1570 | L(3) | sharpness_level | 1571 +-------+-------------------+ 1573 The meaning of these numbers will be further explained in Chapter 15. 1575 VP8 has a feature in the bitstream that enables adjustment of the 1576 loop filter level based on a macroblock's prediction mode and 1577 reference frame. The per-macroblock adjustment is done through delta 1578 values against default loop filter level for the current frame. This 1579 subsection contains flag and value information for implementing per- 1580 macroblock loop filter level adjustment to default decoder behaviors. 1581 The data in this section is used in the decoding of the ensuing per- 1582 macroblock information and applies to the entire frame. 1584 L(1) is a one-bit flag indicating if macroblock loop filter 1585 adjustment is on for the current frame. 0 means such feature is not 1586 supported in the current frame and 1 means this feature is enabled 1587 for the current frame. 1589 Whether the adjustment is based on reference frame or encoding mode, 1590 the adjustment of loop filter level is done via a delta value against 1591 a baseline loop filter value. The delta values are updated for the 1592 current frame if an L(1) bit, mode_ref_lf_delta_update, takes the 1593 value 1. There are two groups of delta values, one group of delta 1594 values are for reference frame-based adjustments, the other group is 1595 for mode-based adjustments. The number of delta values in the two 1596 groups is MAX_REF_LF_DELTAS and MAX_MODE_LF_DELTAS, respectively. 1597 For every value within the two groups, there is one bit L(1) to 1598 indicate if the particular value is updated. When one is updated 1599 (1), it is transmitted as a six-bit magnitude L(6) followed by a one- 1600 bit sign flag (L(1), 0 for positive and 1 for negative). 1602 9.5. Token Partition and Partition Data Offsets 1604 VP8 allows DCT coefficients to be packed into multiple partitions 1605 besides the first partition with header and per-macroblock prediction 1606 information, so the decoder can perform parallel decoding in an 1607 efficient manner. There are two bits L(2) used to indicate the 1608 number of coefficient data partitions within a compressed frame. The 1609 two bits are defined in the following table: 1611 +-------+-------+----------------------+ 1612 | Bit 1 | Bit 0 | Number of Partitions | 1613 +-------+-------+----------------------+ 1614 | 0 | 0 | 1 | 1615 | | | | 1616 | 0 | 1 | 2 | 1617 | | | | 1618 | 1 | 0 | 4 | 1619 | | | | 1620 | 1 | 1 | 8 | 1621 +-------+-------+----------------------+ 1623 When the number of partitions is greater than one, offsets are 1624 embedded in the bitstream to provide the decoder direct access to 1625 token partitions. Each offset is written in 3 bytes (24 bits). 1626 Since the offset to the first partition is always 0, only the offsets 1627 for partitions other than the first partition are encoded in the 1628 bitstream. The partitioned data are consecutive in the bitstream, so 1629 offsets can also be used to calculate the data size of each 1630 partition. The following pseudo code illustrates how the size/offset 1631 is defined by the three bytes in the bitstream. 1633 ---- Begin code block -------------------------------------- 1635 Offset/size = (uint32)(byte0) + ((uint32)(byte1)<<8) 1636 + ((uint32)(byte2)<<16); 1638 ---- End code block ---------------------------------------- 1640 9.6. Dequantization Indices 1642 All residue signals are specified via a quantized 4x4 DCT applied to 1643 the Y, U, V, or Y2 subblocks of a macroblock. As detailed in Chapter 1644 14, before inverting the transform, each decoded coefficient is 1645 multiplied by one of six dequantization factors, the choice of which 1646 depends on the plane (Y, chroma = U or V, Y2) and coefficient 1647 position (DC = coefficient 0, AC = coefficients 1-15). The six 1648 values are specified using 7-bit indices into six corresponding fixed 1649 tables (the tables are given in Chapter 14). 1651 The first 7-bit index gives the dequantization table index for Y 1652 plane AC coefficients, called yac_qi. It is always coded and acts as 1653 a baseline for the other 5 quantization indices, each of which is 1654 represented by a delta from this baseline index. Following is pseudo 1655 code for reading the indices: 1657 ---- Begin code block -------------------------------------- 1659 yac_qi = L(7); /* Y ac index always specified */ 1660 ydc_delta = F? delta(): 0; /* Y dc delta specified if 1661 flag is true */ 1663 y2dc_delta = F? delta(): 0; /* Y2 dc delta specified if 1664 flag is true */ 1665 y2ac_delta = F? delta(): 0; /* Y2 ac delta specified if 1666 flag is true */ 1668 uvdc_delta = F? delta(): 0; /* chroma dc delta specified 1669 if flag is true */ 1670 uvac_delta = F? delta(): 0; /* chroma ac delta specified 1671 if flag is true */ 1673 ---- End code block ---------------------------------------- 1675 Where delta() is the process to read 5 bits from the bitstream to 1676 determine a signed delta value: 1678 +-------+--------------------------------------------------+ 1679 | Index | Description | 1680 +-------+--------------------------------------------------+ 1681 | L(4) | Magnitude of delta | 1682 | | | 1683 | L(1) | Sign of delta, 0 for positive and 1 for negative | 1684 +-------+--------------------------------------------------+ 1686 9.7. Refresh Golden Frame and AltRef Frame 1688 For key frames, both golden frame and altref frame are refreshed/ 1689 replaced by the current reconstructed frame, by default. For non-key 1690 frames, VP8 uses two bits to indicate whether the two frame buffers 1691 are refreshed, using the reconstructed current frame: 1693 +-------+-----------------------------------------------------------+ 1694 | Index | Description | 1695 +-------+-----------------------------------------------------------+ 1696 | L(1) | Whether golden frame is refreshed (0 complete. This flag | 1697 | | does not occur for no, 1 for yes). | 1698 | | | 1699 | L(1) | Whether altref frame is refreshed (0 for no, 1 for yes). | 1700 +-------+-----------------------------------------------------------+ 1702 When the flag for golden frame is 0, VP8 uses 2 more bits in the 1703 bitstream to indicate whether the buffer (and which buffer) is copied 1704 to the golden frame, or if no buffer is copied: 1706 +-------+------------------------------------------+ 1707 | Index | Description | 1708 +-------+------------------------------------------+ 1709 | L(2) | Buffer copy flag for golden frame buffer | 1710 +-------+------------------------------------------+ 1712 Where: 1714 o 0 means no buffer is copied to golden frame 1716 o 1 means last_frame is copied to golden frame 1718 o 2 means alt_ref_frame is copied to golden frame 1720 Similarly, when the flag for altref is 0, VP8 uses 2 bits in the 1721 bitstream to indicate which buffer is copied to alt_ref_frame. 1723 +-------+------------------------------------------+ 1724 | Index | Description | 1725 +-------+------------------------------------------+ 1726 | L(2) | Buffer copy flag for altref frame buffer | 1727 +-------+------------------------------------------+ 1729 Where: 1731 o 0 means no buffer is copied to altref frame 1733 o 1 means last_frame is copied to altref frame 1735 o 2 means golden_frame is copied to altref frame 1737 Two bits are transmitted for ref_frame_sign_bias for golden_frame and 1738 alt_ref_frame respectively. 1740 +-------+---------------------------------+ 1741 | Index | Description | 1742 +-------+---------------------------------+ 1743 | L(1) | Sign bias flag for golden frame | 1744 | | | 1745 | L(1) | Sign bias flag for altref frame | 1746 +-------+---------------------------------+ 1748 These values are used to control the sign of the motion vectors when 1749 a golden frame or an altref frame is used as the reference frame for 1750 a macroblock. 1752 9.8. Refresh Last Frame Buffer 1754 VP8 uses one bit, L(1), to indicate if the last frame reference 1755 buffer is refreshed using the constructed current frame. On key 1756 frame this bit is overridden, and the last frame buffer is always 1757 refreshed. 1759 9.9. DCT Coefficient Probability Update 1761 Contains a partial update of the probability tables used to decode 1762 DCT coefficients. These tables are maintained across interframes but 1763 are of course replaced with their defaults at the beginning of every 1764 key frame. 1766 The layout and semantics of this field will be taken up in Chapter 1767 13. 1769 9.10. Remaining Frame Header Data (non-Key Frame) 1771 +-------+-----------------------------------------------------------+ 1772 | Index | Description | 1773 +-------+-----------------------------------------------------------+ 1774 | L(1) | mb_no_coeff_skip. This flag indicates at the frame level | 1775 | | if skipping of macroblocks with no non-zero coefficients | 1776 | | is enabled. If it is set to 0 then prob_skip_false is not | 1777 | | read and mb_skip_coeff is forced to 0 for all macroblocks | 1778 | | (see Sections 11.1 and 12.1). | 1779 | | | 1780 | L(8) | prob_skip_false = probability used for decoding a | 1781 | | macroblock level flag, which indicates if a macroblock | 1782 | | has any non-zero coefficients. Only read if | 1783 | | mb_no_coeff_skip is 1. | 1784 | | | 1785 | L(8) | prob_intra = probability that a macroblock is "intra" | 1786 | | predicted, that is, predicted from the already-encoded | 1787 | | portions of the current frame as opposed to "inter" | 1788 | | predicted, that is, predicted from the contents of a | 1789 | | prior frame. | 1790 | | | 1791 | L(8) | prob_last = probability that an inter-predicted | 1792 | | macroblock is predicted from the immediately previous | 1793 | | frame, as opposed to the most recent golden frame or | 1794 | | altref frame.. | 1795 | | | 1796 | L(8) | prob_gf = probability that an inter-predicted macroblock | 1797 | | is predicted from the most recent golden frame, as | 1798 | | opposed to the altref frame | 1799 | | | 1800 | F | If true, followed by four L(8)s updating the | 1801 | | probabilities for the different types of intra-prediction | 1802 | | for the Y plane. These probabilities correspond to the | 1803 | | four interior nodes of the decoding tree for intra Y | 1804 | | modes in an interframe, that is, the even positions in | 1805 | | the ymode_tree array given above. | 1806 | | | 1807 | F | If true, followed by three L(8)s updating the | 1808 | | probabilities for the different types of intra-prediction | 1809 | | for the chroma planes. These probabilities correspond to | 1810 | | the even positions in the uv_mode_tree array given above. | 1811 | | | 1812 | X | Motion vector probability update. The details will be | 1813 | | given after the discussion of motion vector decoding. | 1814 +-------+-----------------------------------------------------------+ 1816 Decoding of this portion (only) of the frame header is handled in the 1817 reference decoder file decodemv.c. 1819 9.11. Remaining Frame Header Data (Key Frame) 1821 +-------+-----------------------------------------------------------+ 1822 | Index | Description | 1823 +-------+-----------------------------------------------------------+ 1824 | L(1) | mb_no_coeff_skip. This flag indicates at the frame level | 1825 | | if skipping of macroblocks with no non-zero coefficients | 1826 | | is enabled. If it is set to 0 then prob_skip_false is not | 1827 | | read and mb_skip_coeff is forced to 0 for all macroblocks | 1828 | | (see Sections 11.1 and 12.1). | 1829 | | | 1830 | L(8) | prob_skip_false = Probability used for decoding a | 1831 | | macroblock level flag, which indicates if a macroblock | 1832 | | has any non-zero coefficients. Only read if | 1833 | | mb_no_coeff_skip is 1. | 1834 +-------+-----------------------------------------------------------+ 1836 Decoding of this portion of the frame header is handled in the 1837 reference decoder file demode.c. 1839 This completes the layout of the frame header. The remainder of the 1840 first data partition consists of macroblock-level prediction data. 1842 After the frame header is processed, all probabilities needed to 1843 decode the prediction and residue data are known and will not change 1844 until the next frame. 1846 10. Segment-based Feature Adjustments 1848 Every macroblock may optionally override some of the default 1849 behaviors of the decoder. Specifically, VP8 uses segment based 1850 adjustments to support changing quantizer level and loop filter level 1851 for a macroblock. When the segment-based adjustment feature is 1852 enabled for a frame, each macroblock within the frame is coded with a 1853 segment_id. This effectively segments all the macroblocks in the 1854 current frame into a number of different segments. Macroblocks 1855 within the same segment behave exactly the same for quantizer and 1856 loop filter level adjustments. 1858 If both the segmentation_enabled and update_mb_segmentation_map flags 1859 in subsection B of the frame header take a value of 1, the prediction 1860 data for each (intra- or inter-coded) macroblock begins with a 1861 specification of segment_id for the current macroblock. It is 1862 decoded using this simple tree ... 1864 ---- Begin code block -------------------------------------- 1866 const tree_index mb_segment_tree [2 * (4-1)] = 1867 { 1868 2, 4, /* root: "0", "1" subtrees */ 1869 -0, -1, /* "00" = 0th value, "01" = 1st value */ 1870 -2, -3 /* "10" = 2nd value, "11" = 3rd value */ 1871 } 1873 ---- End code block ---------------------------------------- 1875 ... combined with a 3-entry probability table 1876 mb_segment_tree_probs[3]. The macroblock's segment_id is used later 1877 in the decoding process to look into the segment_feature_data table 1878 and determine how the quantizer and loop filter levels are adjusted. 1880 The decoding of segment_id, together with the parsing of intra- 1881 prediction modes (which is taken up next), is implemented in the 1882 reference decoder file demode.c. 1884 11. Key Frame Macroblock Prediction Records 1886 After the features described above, the macroblock prediction record 1887 next specifies the prediction mode used for the macroblock. 1889 11.1. mb_skip_coeff 1891 The single bool flag is decoded using prob_skip_false if and only if 1892 mb_no_coeff_skip is set to 1 (see sections 9.10 and 9.11). If 1893 mb_no_coeff_skip is set to 0 then this value defaults to 0. 1895 11.2. Luma Modes 1897 First comes the luma specification of type intra_mbmode, coded using 1898 the kf_ymode_tree, as described in Chapter 8 and repeated here for 1899 convenience: 1901 ---- Begin code block -------------------------------------- 1903 typedef enum 1904 { 1905 DC_PRED, /* predict DC using row above and column to the left */ 1906 V_PRED, /* predict rows using row above */ 1907 H_PRED, /* predict columns using column to the left */ 1908 TM_PRED, /* propagate second differences a la "true motion" */ 1910 B_PRED, /* each Y subblock is independently predicted */ 1912 num_uv_modes = B_PRED, /* first four modes apply to chroma */ 1913 num_ymodes /* all modes apply to luma */ 1914 } 1915 intra_mbmode; 1917 const tree_index kf_ymode_tree [2 * (num_ymodes - 1)] = 1918 { 1919 -B_PRED, 2, /* root: B_PRED = "0", "1" subtree */ 1920 4, 6, /* "1" subtree has 2 descendant subtrees */ 1921 -DC_PRED, -V_PRED, /* "10" subtree: DC_PRED = "100", 1922 V_PRED = "101" */ 1923 -H_PRED, -TM_PRED /* "11" subtree: H_PRED = "110", 1924 TM_PRED = "111" */ 1925 }; 1927 ---- End code block ---------------------------------------- 1929 For key frames, the Y mode is decoded using a fixed probability array 1930 as follows: 1932 ---- Begin code block -------------------------------------- 1934 const Prob kf_ymode_prob [num_ymodes - 1] = { 145, 156, 163, 128}; 1935 Ymode = (intra_mbmode) treed_read( d, kf_ymode_tree, kf_ymode_prob); 1937 ---- End code block ---------------------------------------- 1939 d is of course the bool_decoder being used to read the first data 1940 partition. 1942 If the Ymode is B_PRED, it is followed by a (tree-coded) mode for 1943 each of the 16 Y subblocks. The 10 subblock modes and their coding 1944 tree as follows: 1946 ---- Begin code block -------------------------------------- 1948 typedef enum 1949 { 1950 B_DC_PRED, /* predict DC using row above and column 1951 to the left */ 1952 B_TM_PRED, /* propagate second differences a la 1953 "true motion" */ 1955 B_VE_PRED, /* predict rows using row above */ 1956 B_HE_PRED, /* predict columns using column to the left */ 1958 B_LD_PRED, /* southwest (left and down) 45 degree diagonal 1959 prediction */ 1960 B_RD_PRED, /* southeast (right and down) "" */ 1962 B_VR_PRED, /* SSE (vertical right) diagonal prediction */ 1963 B_VL_PRED, /* SSW (vertical left) "" */ 1964 B_HD_PRED, /* ESE (horizontal down) "" */ 1965 B_HU_PRED, /* ENE (horizontal up) "" */ 1967 num_intra_bmodes 1968 } 1969 intra_bmode; 1971 /* Coding tree for the above, with implied codings as comments */ 1973 const tree_index bmode_tree [2 * (num_intra_bmodes - 1)] = 1974 { 1975 -B_DC_PRED, 2, /* B_DC_PRED = "0" */ 1976 -B_TM_PRED, 4, /* B_TM_PRED = "10" */ 1977 -B_VE_PRED, 6, /* B_VE_PRED = "110" */ 1978 8, 12, 1979 -B_HE_PRED, 10, /* B_HE_PRED = "1110" */ 1980 -B_RD_PRED, -B_VR_PRED, /* B_RD_PRED = "111100", 1981 B_VR_PRED = "111101" */ 1982 -B_LD_PRED, 14, /* B_LD_PRED = "111110" */ 1983 -B_VL_PRED, 16 /* B_VL_PRED = "1111110" */ 1984 -B_HD_PRED, -B_HU_PRED /* HD = "11111110", 1985 HU = "11111111" */ 1986 }; 1988 ---- End code block ---------------------------------------- 1990 The first four modes are smaller versions of the similarly-named 1991 16x16 modes above, albeit with slightly different numbering. The 1992 last six "diagonal" modes are unique to luma subblocks. 1994 11.3. Subblock Mode Contexts 1996 The coding of subblock modes in key frames uses the modes already 1997 coded for the subblocks to the left of and above the subblock to 1998 select a probability array for decoding the current subblock mode. 1999 This is our first instance of contextual prediction, and there are 2000 several caveats associated with it: 2002 1. The adjacency relationships between subblocks are based on the 2003 normal default raster placement of the subblocks. 2005 2. The adjacent subblocks need not lie in the current macroblock. 2006 The subblocks to the left of the left-edge subblocks 0, 4, 8, and 2007 12 are the right-edge subblocks 3, 7, 11, and 15, respectively, 2008 of the (already coded) macroblock immediately to the left. 2009 Similarly, the subblocks above the top-edge subblocks 0, 1, 2, 2010 and 3 are the bottom-edge subblocks 12, 13, 14, and 15 of the 2011 already-coded macroblock immediately above us. 2013 3. For macroblocks on the top row or left edge of the image, some of 2014 the predictors will be non-existent. Such predictors are taken 2015 to have had the value B_DC_PRED which, perhaps conveniently, 2016 takes the value 0 in the enumeration above. A simple management 2017 scheme for these contexts might maintain a row of above 2018 predictors and four left predictors. Before decoding the frame, 2019 the entire row is initialized to B_DC_PRED; before decoding each 2020 row of macroblocks, the four left predictors are also set to 2021 B_DC_PRED. After decoding a macroblock, the bottom four subblock 2022 modes are copied into the row predictor (at the current position, 2023 which then advances to be above the next macroblock) and the 2024 right four subblock modes are copied into the left predictor. 2026 4. Many macroblocks will of course be coded using a 16x16 luma 2027 prediction mode. For the purpose of predicting ensuing subblock 2028 modes (only), such macroblocks derive a subblock mode, constant 2029 throughout the macroblock, from the 16x16 luma mode as follows: 2030 DC_PRED uses B_DC_PRED, V_PRED uses B_VE_PRED, H_PRED uses 2031 B_HE_PRED, and TM_PRED uses B_TM_PRED. 2033 5. Although we discuss interframe modes later, we remark here that, 2034 while interframes do use all the intra-coding modes described 2035 here and below, the subblock modes in an interframe are coded 2036 using a single constant probability array that does not depend on 2037 any context. 2039 The dependence of subblock mode probability on the nearby subblock 2040 mode context is most easily handled using a three-dimensional 2041 constant array: 2043 ---- Begin code block -------------------------------------- 2045 const Prob kf_bmode_prob [num_intra_bmodes] [num_intra_bmodes] 2046 [num_intra_bmodes-1]; 2048 ---- End code block ---------------------------------------- 2050 The outer two dimensions of this array are indexed by the already- 2051 coded subblock modes above and to the left of the current block, 2052 respectively. The inner dimension is a typical tree probability list 2053 whose indices correspond to the even indices of the bmode_tree above. 2054 The mode for the j^(th) luma subblock is then 2056 ---- Begin code block -------------------------------------- 2058 Bmode = (intra_bmode) treed_read( d, bmode_tree, kf_bmode_prob 2059 [A] [L]); 2061 ---- End code block ---------------------------------------- 2063 where the 4x4 Y subblock index j varies from 0 to 15 in raster order 2064 and A and L are the modes used above and to-the-left of the j^(th) 2065 subblock. 2067 The contents of the kf_bmode_prob array are given at the end of this 2068 chapter. 2070 11.4. Chroma Modes 2072 After the Y mode (and optional subblock mode) specification comes the 2073 chroma mode. The chroma modes are a subset of the Y modes and are 2074 coded using the uv_mode_tree described in Chapter 8, again repeated 2075 here for convenience: 2077 ---- Begin code block -------------------------------------- 2079 const tree_index uv_mode_tree [2 * (num_uv_modes - 1)] = 2080 { 2081 -DC_PRED, 2, /* root: DC_PRED = "0", "1" subtree */ 2082 -V_PRED, 4, /* "1" subtree: V_PRED = "10", 2083 "11" subtree */ 2084 -H_PRED, -TM_PRED /* "11" subtree: H_PRED = "110", 2085 TM_PRED = "111" */ 2086 }; 2088 ---- End code block ---------------------------------------- 2090 As for the Y modes (in a key frame), the chroma modes are coded using 2091 a fixed, contextless probability table: 2093 ---- Begin code block -------------------------------------- 2095 const Prob kf_uv_mode_prob [num_uv_modes - 1] = { 142, 114, 183}; 2096 uv_mode = (intra_mbmode) treed_read( d, uv_mode_tree, 2097 kf_uv_mode_prob); 2099 ---- End code block ---------------------------------------- 2101 This completes the description of macroblock prediction coding for 2102 key frames. As will be discussed in Chapter 16, the coding of intra 2103 modes within interframes is similar, but not identical, to that 2104 described here (and in the reference code) for prediction modes and, 2105 indeed, for all tree-coded data in VP8. 2107 11.5. Subblock Mode Probability Table 2109 Finally, here is the fixed probability table used to decode subblock 2110 modes in key frames. 2112 ---- Begin code block -------------------------------------- 2114 const Prob kf_bmode_prob [num_intra_bmodes] [num_intra_bmodes] 2115 [num_intra_bmodes-1] = 2116 { 2117 { 2118 { 231, 120, 48, 89, 115, 113, 120, 152, 112}, 2119 { 152, 179, 64, 126, 170, 118, 46, 70, 95}, 2120 { 175, 69, 143, 80, 85, 82, 72, 155, 103}, 2121 { 56, 58, 10, 171, 218, 189, 17, 13, 152}, 2122 { 144, 71, 10, 38, 171, 213, 144, 34, 26}, 2123 { 114, 26, 17, 163, 44, 195, 21, 10, 173}, 2124 { 121, 24, 80, 195, 26, 62, 44, 64, 85}, 2125 { 170, 46, 55, 19, 136, 160, 33, 206, 71}, 2126 { 63, 20, 8, 114, 114, 208, 12, 9, 226}, 2127 { 81, 40, 11, 96, 182, 84, 29, 16, 36} 2128 }, 2129 { 2130 { 134, 183, 89, 137, 98, 101, 106, 165, 148}, 2131 { 72, 187, 100, 130, 157, 111, 32, 75, 80}, 2132 { 66, 102, 167, 99, 74, 62, 40, 234, 128}, 2133 { 41, 53, 9, 178, 241, 141, 26, 8, 107}, 2134 { 104, 79, 12, 27, 217, 255, 87, 17, 7}, 2135 { 74, 43, 26, 146, 73, 166, 49, 23, 157}, 2136 { 65, 38, 105, 160, 51, 52, 31, 115, 128}, 2137 { 87, 68, 71, 44, 114, 51, 15, 186, 23}, 2138 { 47, 41, 14, 110, 182, 183, 21, 17, 194}, 2139 { 66, 45, 25, 102, 197, 189, 23, 18, 22} 2140 }, 2141 { 2142 { 88, 88, 147, 150, 42, 46, 45, 196, 205}, 2143 { 43, 97, 183, 117, 85, 38, 35, 179, 61}, 2144 { 39, 53, 200, 87, 26, 21, 43, 232, 171}, 2145 { 56, 34, 51, 104, 114, 102, 29, 93, 77}, 2146 { 107, 54, 32, 26, 51, 1, 81, 43, 31}, 2147 { 39, 28, 85, 171, 58, 165, 90, 98, 64}, 2148 { 34, 22, 116, 206, 23, 34, 43, 166, 73}, 2149 { 68, 25, 106, 22, 64, 171, 36, 225, 114}, 2150 { 34, 19, 21, 102, 132, 188, 16, 76, 124}, 2151 { 62, 18, 78, 95, 85, 57, 50, 48, 51} 2152 }, 2153 { 2154 { 193, 101, 35, 159, 215, 111, 89, 46, 111}, 2155 { 60, 148, 31, 172, 219, 228, 21, 18, 111}, 2156 { 112, 113, 77, 85, 179, 255, 38, 120, 114}, 2157 { 40, 42, 1, 196, 245, 209, 10, 25, 109}, 2158 { 100, 80, 8, 43, 154, 1, 51, 26, 71}, 2159 { 88, 43, 29, 140, 166, 213, 37, 43, 154}, 2160 { 61, 63, 30, 155, 67, 45, 68, 1, 209}, 2161 { 142, 78, 78, 16, 255, 128, 34, 197, 171}, 2162 { 41, 40, 5, 102, 211, 183, 4, 1, 221}, 2163 { 51, 50, 17, 168, 209, 192, 23, 25, 82} 2164 }, 2165 { 2166 { 125, 98, 42, 88, 104, 85, 117, 175, 82}, 2167 { 95, 84, 53, 89, 128, 100, 113, 101, 45}, 2168 { 75, 79, 123, 47, 51, 128, 81, 171, 1}, 2169 { 57, 17, 5, 71, 102, 57, 53, 41, 49}, 2170 { 115, 21, 2, 10, 102, 255, 166, 23, 6}, 2171 { 38, 33, 13, 121, 57, 73, 26, 1, 85}, 2172 { 41, 10, 67, 138, 77, 110, 90, 47, 114}, 2173 { 101, 29, 16, 10, 85, 128, 101, 196, 26}, 2174 { 57, 18, 10, 102, 102, 213, 34, 20, 43}, 2175 { 117, 20, 15, 36, 163, 128, 68, 1, 26} 2176 }, 2177 { 2178 { 138, 31, 36, 171, 27, 166, 38, 44, 229}, 2179 { 67, 87, 58, 169, 82, 115, 26, 59, 179}, 2180 { 63, 59, 90, 180, 59, 166, 93, 73, 154}, 2181 { 40, 40, 21, 116, 143, 209, 34, 39, 175}, 2182 { 57, 46, 22, 24, 128, 1, 54, 17, 37}, 2183 { 47, 15, 16, 183, 34, 223, 49, 45, 183}, 2184 { 46, 17, 33, 183, 6, 98, 15, 32, 183}, 2185 { 65, 32, 73, 115, 28, 128, 23, 128, 205}, 2186 { 40, 3, 9, 115, 51, 192, 18, 6, 223}, 2187 { 87, 37, 9, 115, 59, 77, 64, 21, 47} 2188 }, 2189 { 2190 { 104, 55, 44, 218, 9, 54, 53, 130, 226}, 2191 { 64, 90, 70, 205, 40, 41, 23, 26, 57}, 2192 { 54, 57, 112, 184, 5, 41, 38, 166, 213}, 2193 { 30, 34, 26, 133, 152, 116, 10, 32, 134}, 2194 { 75, 32, 12, 51, 192, 255, 160, 43, 51}, 2195 { 39, 19, 53, 221, 26, 114, 32, 73, 255}, 2196 { 31, 9, 65, 234, 2, 15, 1, 118, 73}, 2197 { 88, 31, 35, 67, 102, 85, 55, 186, 85}, 2198 { 56, 21, 23, 111, 59, 205, 45, 37, 192}, 2199 { 55, 38, 70, 124, 73, 102, 1, 34, 98} 2200 }, 2201 { 2202 { 102, 61, 71, 37, 34, 53, 31, 243, 192}, 2203 { 69, 60, 71, 38, 73, 119, 28, 222, 37}, 2204 { 68, 45, 128, 34, 1, 47, 11, 245, 171}, 2205 { 62, 17, 19, 70, 146, 85, 55, 62, 70}, 2206 { 75, 15, 9, 9, 64, 255, 184, 119, 16}, 2207 { 37, 43, 37, 154, 100, 163, 85, 160, 1}, 2208 { 63, 9, 92, 136, 28, 64, 32, 201, 85}, 2209 { 86, 6, 28, 5, 64, 255, 25, 248, 1}, 2210 { 56, 8, 17, 132, 137, 255, 55, 116, 128}, 2211 { 58, 15, 20, 82, 135, 57, 26, 121, 40} 2212 }, 2213 { 2214 { 164, 50, 31, 137, 154, 133, 25, 35, 218}, 2215 { 51, 103, 44, 131, 131, 123, 31, 6, 158}, 2216 { 86, 40, 64, 135, 148, 224, 45, 183, 128}, 2217 { 22, 26, 17, 131, 240, 154, 14, 1, 209}, 2218 { 83, 12, 13, 54, 192, 255, 68, 47, 28}, 2219 { 45, 16, 21, 91, 64, 222, 7, 1, 197}, 2220 { 56, 21, 39, 155, 60, 138, 23, 102, 213}, 2221 { 85, 26, 85, 85, 128, 128, 32, 146, 171}, 2222 { 18, 11, 7, 63, 144, 171, 4, 4, 246}, 2223 { 35, 27, 10, 146, 174, 171, 12, 26, 128} 2224 }, 2225 { 2226 { 190, 80, 35, 99, 180, 80, 126, 54, 45}, 2227 { 85, 126, 47, 87, 176, 51, 41, 20, 32}, 2228 { 101, 75, 128, 139, 118, 146, 116, 128, 85}, 2229 { 56, 41, 15, 176, 236, 85, 37, 9, 62}, 2230 { 146, 36, 19, 30, 171, 255, 97, 27, 20}, 2231 { 71, 30, 17, 119, 118, 255, 17, 18, 138}, 2232 { 101, 38, 60, 138, 55, 70, 43, 26, 142}, 2233 { 138, 45, 61, 62, 219, 1, 81, 188, 64}, 2234 { 32, 41, 20, 117, 151, 142, 20, 21, 163}, 2235 { 112, 19, 12, 61, 195, 128, 48, 4, 24} 2236 } 2237 }; 2239 ---- End code block ---------------------------------------- 2241 12. Intraframe Prediction 2243 Intraframe prediction uses already-coded macroblocks within the 2244 current frame to approximate the contents of the current macroblock. 2245 It applies to intra-coded macroblocks in an interframe and to all 2246 macroblocks in a key frame. 2248 Relative to the current macroblock "M", the already-coded macroblocks 2249 include all macroblocks above M together with the macroblocks on the 2250 same row as, and to the left of, M, though at most four of these 2251 macroblocks are actually used: the block "A" directly above M, the 2252 blocks immediately to the left and right of A, and the block 2253 immediately to the left of M. 2255 Each of the prediction modes (i.e., means of extrapolation from 2256 already-calculated values) uses fairly simple arithmetic on pixel 2257 values whose positions, relative to the current position, are defined 2258 by the mode. 2260 The chroma (U and V) and luma (Y) predictions are independent of each 2261 other. 2263 The relative addressing of pixels applied to macroblocks on the upper 2264 row or left column of the frame will sometimes cause pixels outside 2265 the visible frame to be referenced. Usually such out-of-bounds 2266 pixels have an assumed value of 129 for pixels to the left of the 2267 leftmost column of the visible frame and 127 for pixels above the top 2268 row of the visible frame (including the special case of the pixel 2269 above and to the left of the top-left pixel in the visible frame). 2270 Exceptions to this (associated to certain modes) will be noted below. 2272 The already-coded macroblocks referenced by intra-prediction have 2273 been "reconstructed", that is, have been predicted and residue- 2274 adjusted (as described in Chapter 14), but have not been loop- 2275 filtered. While it does process the edges between individual 2276 macroblocks and individual subblocks, loop filtering (described in 2277 Chapter 15) is applied to the frame as a whole, after all of the 2278 macroblocks have been reconstructed. 2280 12.1. mb_skip_coeff 2282 The single bool flag is decoded using prob_skip_false if and only if 2283 mb_no_coeff_skip is set to 1 (see Sections 9.10 and 9.11). If 2284 mb_no_coeff_skip is set to 0 then this value defaults to 0. 2286 12.2. Chroma Prediction 2288 The chroma prediction is a little simpler than the luma prediction, 2289 so we treat it first. Each of the chroma modes treats U and V 2290 identically, that is, the U and V prediction values are calculated in 2291 parallel, using the same relative addressing and arithmetic in each 2292 of the two planes. 2294 The modes extrapolate prediction values using the 8-pixel row "A" 2295 lying immediately above the block (that is, the bottom chroma row of 2296 the macroblock immediately above the current macroblock) and the 2297 8-pixel column "L" immediately to the left of the block (that is, the 2298 rightmost chroma column of the macroblock immediately to the left of 2299 the current macroblock). 2301 Vertical prediction (chroma mode V_PRED) simply fills each 8-pixel 2302 row of the 8x8 chroma block with a copy of the "above" row (A). If 2303 the current macroblock lies on the top row of the frame, all 8 of the 2304 pixel values in A are assigned the value 127. 2306 Similarly, horizontal prediction (H_PRED) fills each 8-pixel column 2307 of the 8x8 chroma block with a copy of the "left" column (L). If the 2308 current macroblock is in the left column of the frame, all 8 pixel 2309 values in L are assigned the value 129. 2311 DC prediction (DC_PRED) fills the 8x8 chroma block with a single 2312 value. In the generic case of a macroblock lying below the top row 2313 and right of the leftmost column of the frame, this value is the 2314 average of the 16 (genuinely visible) pixels in the (union of the) 2315 above row A and left column L. 2317 Otherwise, if the current macroblock lies on the top row of the 2318 frame, the average of the 8 pixels in L is used; if it lies in the 2319 left column of the frame, the average of the 8 pixels in A is used. 2320 Note that the averages used in these exceptional cases are not the 2321 same as those that would be arrived at by using the out-of-bounds A 2322 and L values defined for V_PRED and H_PRED. In the case of the 2323 leftmost macroblock on the top row of the frame the 8x8 block is 2324 simply filled with the constant value 128. 2326 For DC_PRED, apart from the exceptional case of the top left 2327 macroblock, we are averaging either 16 or 8 pixel values to get a 2328 single prediction value that fills the 8x8 block. The rounding is 2329 done as follows: 2331 ---- Begin code block -------------------------------------- 2333 int sum; /* sum of 8 or 16 pixels at (at least) 16-bit precision */ 2334 int shf; /* base 2 logarithm of the number of pixels (3 or 4) */ 2336 Pixel DCvalue = (sum + (1 << (shf-1))) >> shf; 2338 ---- End code block ---------------------------------------- 2340 Because the summands are all valid pixels, no "clamp" is necessary in 2341 the calculation of DCvalue. 2343 The remaining "True Motion" (TM_PRED) chroma mode gets its name from 2344 an older technique of video compression used by On2 Technologies, to 2345 which it bears some relation. In addition to the row "A" and column 2346 "L", TM_PRED uses the pixel "P" above and to the left of the chroma 2347 block. 2349 In this mode, we propagate the horizontal differences between pixels 2350 in A (starting from P), using the pixels from L to start each row. 2351 The exact algorithm is as follows. 2353 ---- Begin code block -------------------------------------- 2355 void TMpred( 2356 Pixel b[8][8], /* chroma (U or V) prediction block */ 2357 const Pixel A[8], /* row of already-constructed pixels 2358 above block */ 2359 const Pixel L[8], /* column of "" just to the left of 2360 block */ 2361 const Pixel P /* pixel just to the left of A and 2362 above L*/ 2363 ) { 2364 int r = 0; /* row */ 2365 do { 2366 int c = 0; /* column */ 2367 do { 2368 b[r][c] = clamp255( L[r]+ A[c] - P); 2369 } while( ++c < 8); 2370 } while( ++r < 8); 2371 } 2373 ---- End code block ---------------------------------------- 2375 Note that the process could equivalently be described as propagating 2376 the vertical differences between pixels in L (starting from P), using 2377 the pixels from A to start each column. 2379 An implementation of chroma intra-prediction may be found in the 2380 reference decoder file reconintra.c. 2382 Unlike DC_PRED, for macroblocks on the top row or left edge TM_PRED 2383 does use the out-of-bounds values of 127 and 129 (respectively) 2384 defined for V_PRED and H_PRED. 2386 12.3. Luma Prediction 2388 The prediction processes for the first four 16x16 luma modes 2389 (DC_PRED, V_PRED, H_PRED, and TM_PRED) are essentially identical to 2390 the corresponding chroma prediction processes described above, the 2391 only difference being that we are predicting a single 16x16 luma 2392 block instead of two 8x8 chroma blocks. 2394 Thus, the row "A" and column "L" here contain 16 pixels, the DC 2395 prediction is calculated using 16 or 32 pixels (and shf is 4 or 5), 2396 and we of course fill the entire prediction buffer, that is, 16 rows 2397 (or columns) containing 16 pixels each. The reference implementation 2398 of 16x16 luma prediction is also in reconintra.c. 2400 In the remaining luma mode (B_PRED), each 4x4 Y subblock is 2401 independently predicted using one of ten modes (listed, along with 2402 their encodings, in Chapter 11). 2404 Also, unlike the full-macroblock modes already described, some of the 2405 subblock modes use prediction pixels above and to the right of the 2406 current subblock. In detail, each 4x4 subblock "B" is predicted 2407 using (at most) the 4-pixel column "L" immediately to the left of B 2408 and the 8-pixel row "A" immediately above B, consisting of the 4 2409 pixels above B followed by the 4 adjacent pixels above and to the 2410 right of B, together with the single pixel "P" immediately to the 2411 left of A (and immediately above L). 2413 For the purpose of subblock intra-prediction, the pixels immediately 2414 to the left and right of a pixel in a subblock are the same as the 2415 pixels immediately to the left and right of the corresponding pixel 2416 in the frame buffer "F". Vertical offsets behave similarly: The 2417 above row A lies immediately above B in F, and the adjacent pixels in 2418 the left column L are separated by a single row in F. 2420 Because entire macroblocks (as opposed to their constituent 2421 subblocks) are reconstructed in raster-scan order, for subblocks 2422 lying along the right edge (and not along the top row) of the current 2423 macroblock, the four "extra" prediction pixels in A above and to the 2424 right of B have not yet actually been constructed. 2426 Subblocks 7, 11, and 15 are affected. All three of these subblocks 2427 use the same extra pixels as does subblock 3 (at the upper right 2428 corner of the macroblock), namely the 4 pixels immediately above and 2429 to the right of subblock 3. Writing (R,C) for a frame buffer 2430 position offset from the upper left corner of the current macroblock 2431 by R rows and C columns, the extra pixels for all the right-edge 2432 subblocks (3, 7, 11, and 15) are at positions (-1,16), (-1,17), 2433 (-1,18), and (-1,19). 2435 The details of the prediction modes are most easily described in 2436 code. 2438 ---- Begin code block -------------------------------------- 2440 /* Result pixels are often averages of two or three predictor 2441 pixels. The following subroutines are used to calculate 2442 these averages. Because the arguments are valid pixels, no 2443 clamping is necessary. An actual implementation would 2444 probably use inline functions or macros. */ 2446 /* Compute weighted average centered at y w/adjacent x, z */ 2448 Pixel avg3( Pixel x, Pixel y, Pixel z) { 2449 return (x + y + y + z + 2) >> 2;} 2451 /* Weighted average of 3 adjacent pixels centered at p */ 2453 Pixel avg3p( const Pixel *p) { return avg3( p[-1], p[0], p[1]);} 2455 /* Simple average of x and y */ 2457 Pixel avg2( Pixel x, Pixel y) { return (x + y + 1) >> 1;} 2459 /* Average of p[0] and p[1] may be considered to be a synthetic 2460 pixel lying between the two, that is, one half-step past p. */ 2462 Pixel avg2p( const Pixel *p) { return avg2( p[0], p[1]);} 2464 /* Main function. Out-of-frame pixels in A or L should be set 2465 to 128. */ 2467 void subblock_intra_predict( 2468 Pixel B[4][4], /* Y subblock prediction buffer */ 2469 const Pixel *A, /* A[0]...A[7] = above row, A[-1] = P */ 2470 const Pixel *L, /* L[0]...L[3] = left column, L[-1] = P */ 2471 intra_bmode mode /* enum is in section 11.1 above */ 2472 ) { 2473 Pixel E[9]; /* 9 already-constructed edge pixels */ 2474 E[0] = L[3]; E[1] = L[2]; E[2] = L[1]; E[3] = L[0]; 2475 E[4] = A[-1]; /* == L[-1] == P */ 2476 E[5] = A[0]; E[6] = A[1]; E[7] = A[2]; E[8] = A[3]; 2478 switch( mode) { 2479 /* First four modes are similar to corresponding 2480 full-block modes. */ 2482 case B_DC_PRED: 2483 { 2484 int v = 4; /* DC sum/avg, 4 is rounding adjustment */ 2485 int i = 0; do { v += A[i] + L[i];} while( ++i < 4); 2486 v >>= 3; /* averaging 8 pixels */ 2487 i = 0; do { /* fill prediction buffer with constant DC 2488 value */ 2489 int j = 0; do { B[i][j] = v;} while( ++j < 4); 2490 } while( ++i < 4); 2491 break; 2492 } 2494 case B_TM_PRED: /* just like 16x16 TM_PRED */ 2495 { 2496 int r = 0; do { 2497 int c = 0; do { 2498 B[r][c] = clamp255( L[r] + A[c] - A[-1]); 2499 } while( ++c < 4); 2500 } while( ++r < 4); 2501 break; 2502 } 2504 case B_VE_PRED: /* like 16x16 V_PRED except using averages */ 2505 { 2506 int c = 0; do { /* all 4 rows = smoothed top row */ 2507 B[0][c] = B[1][c] = B[2][c] = B[3][c] = avg3p( A + c); 2508 } while( ++c < 4); 2509 break; 2510 } 2512 case B_HE_PRED: /* like 16x16 H_PRED except using averages */ 2513 { 2514 /* Bottom row is exceptional because L[4] does not exist */ 2515 int v = avg3( L[2], L[3], L[3]); 2516 int r = 3; while( 1) { /* all 4 columns = smoothed left 2517 column */ 2518 B[r][0] = B[r][1] = B[r][2] = B[r][3] = v; 2519 if( --r < 0) 2520 break; 2521 v = avg3p( L + r); /* upper 3 rows use average of 2522 3 pixels */ 2524 } 2525 break; 2526 } 2528 /* The remaining six "diagonal" modes subdivide the 2529 prediction buffer into diagonal lines. All the pixels 2530 on each line are assigned the same value; this value is 2531 (a smoothed or synthetic version of) an 2532 already-constructed predictor value lying on the same 2533 line. For clarity, in the comments, we express the 2534 positions of these predictor pixels relative to the 2535 upper left corner of the destination array B. 2537 These modes are unique to subblock prediction and have 2538 no full-block analogues. The first two use lines at 2539 +|- 45 degrees from horizontal (or, equivalently, 2540 vertical), that is, lines whose slopes are +|- 1. */ 2542 case B_LD_PRED: /* southwest (left and down) step = 2543 (-1, 1) or (1,-1) */ 2544 /* avg3p( A + j) is the "smoothed" pixel at (-1,j) */ 2545 B[0][0] = avg3p( A + 1); 2546 B[0][1] = B[1][0] = avg3p( A + 2); 2547 B[0][2] = B[1][1] = B[2][0] = avg3p( A + 3); 2548 B[0][3] = B[1][2] = B[2][1] = B[3][0] = avg3p( A + 4); 2549 B[1][3] = B[2][2] = B[3][1] = avg3p( A + 5); 2550 B[2][3] = B[3][2] = avg3p( A + 6); 2551 B[3][3] = avg3( A[6], A[7], A[7]); /* A[8] does not exist */ 2552 break; 2554 case B_RD_PRED: /* southeast (right and down) step = 2555 (1,1) or (-1,-1) */ 2556 B[3][0] = avg3p( E + 1); /* predictor is from (2, -1) */ 2557 B[3][1] = B[2][0] = avg3p( E + 2); /* (1, -1) */ 2558 B[3][2] = B[2][1] = B[1][0] = avg3p( E + 3); /* (0, -1) */ 2559 B[3][3] = B[2][2] = B[1][1] = B[0][0] = 2560 avg3p( E + 4); /* (-1, -1) */ 2561 B[2][3] = B[1][2] = B[0][1] = avg3p( E + 5); /* (-1, 0) */ 2562 B[1][3] = B[0][2] = avg3p( E + 6); /* (-1, 1) */ 2563 B[0][3] = avg3p( E + 7); /* (-1, 2) */ 2564 break; 2566 /* The remaining 4 diagonal modes use lines whose slopes are 2567 +|- 2 and +|- 1/2. The angles of these lines are roughly 2568 +|- 27 degrees from horizontal or vertical. 2570 Unlike the 45 degree diagonals, here we often need to 2571 "synthesize" predictor pixels midway between two actual 2572 predictors using avg2p(p), which we think of as returning 2573 the pixel "at" p[1/2]. */ 2575 case B_VR_PRED: /* SSE (vertical right) step = 2576 (2,1) or (-2,-1) */ 2577 B[3][0] = avg3p( E + 2); /* predictor is from (1, -1) */ 2578 B[2][0] = avg3p( E + 3); /* (0, -1) */ 2579 B[3][1] = B[1][0] = avg3p( E + 4); /* (-1, -1) */ 2580 B[2][1] = B[0][0] = avg2p( E + 4); /* (-1, -1/2) */ 2581 B[3][2] = B[1][1] = avg3p( E + 5); /* (-1, 0) */ 2582 B[2][2] = B[0][1] = avg2p( E + 5); /* (-1, 1/2) */ 2583 B[3][3] = B[1][2] = avg3p( E + 6); /* (-1, 1) */ 2584 B[2][3] = B[0][2] = avg2p( E + 6); /* (-1, 3/2) */ 2585 B[1][3] = avg3p( E + 7); /* (-1, 2) */ 2586 B[0][3] = avg2p( E + 7); /* (-1, 5/2) */ 2587 break; 2589 case B_VL_PRED: /* SSW (vertical left) step = 2590 (2,-1) or (-2,1) */ 2591 B[0][0] = avg2p( A); /* predictor is from (-1, 1/2) */ 2592 B[1][0] = avg3p( A + 1); /* (-1, 1) */ 2593 B[2][0] = B[0][1] = avg2p( A + 1); /* (-1, 3/2) */ 2594 B[1][1] = B[3][0] = avg3p( A + 2); /* (-1, 2) */ 2595 B[2][1] = B[0][2] = avg2p( A + 2); /* (-1, 5/2) */ 2596 B[3][1] = B[1][2] = avg3p( A + 3); /* (-1, 3) */ 2597 B[2][2] = B[0][3] = avg2p( A + 3); /* (-1, 7/2) */ 2598 B[3][2] = B[1][3] = avg3p( A + 4); /* (-1, 4) */ 2599 /* Last two values do not strictly follow the pattern. */ 2600 B[2][3] = avg3p( A + 5); /* (-1, 5) [avg2p( A + 4) = 2601 (-1,9/2)] */ 2602 B[3][3] = avg3p( A + 6); /* (-1, 6) [avg3p( A + 5) = 2603 (-1,5)] */ 2604 break; 2606 case B_HD_PRED: /* ESE (horizontal down) step = 2607 (1,2) or (-1,-2) */ 2608 B[3][0] = avg2p( E); /* predictor is from (5/2, -1) */ 2609 B[3][1] = avg3p( E + 1); /* (2, -1) */ 2610 B[2][0] = B[3][2] = svg2p( E + 1); /* ( 3/2, -1) */ 2611 B[2][1] = B[3][3] = avg3p( E + 2); /* ( 1, -1) */ 2612 B[2][2] = B[1][0] = avg2p( E + 2); /* ( 1/2, -1) */ 2613 B[2][3] = B[1][1] = avg3p( E + 3); /* ( 0, -1) */ 2614 B[1][2] = B[0][0] = avg2p( E + 3); /* (-1/2, -1) */ 2615 B[1][3] = B[0][1] = avg3p( E + 4); /* ( -1, -1) */ 2616 B[0][2] = avg3p( E + 5); /* (-1, 0) */ 2617 B[0][3] = avg3p( E + 6); /* (-1, 1) */ 2618 break; 2620 case B_HU_PRED: /* ENE (horizontal up) step = (1,-2) 2621 or (-1,2) */ 2622 B[0][0] = avg2p( L); /* predictor is from ( 1/2, -1) */ 2623 B[0][1] = avg3p( L + 1); /* ( 1, -1) */ 2624 B[0][2] = B[1][0] = avg2p( L + 1); /* (3/2, -1) */ 2625 B[0][3] = B[1][1] = avg3p( L + 2); /* ( 2, -1) */ 2626 B[1][2] = B[2][0] = avg2p( L + 2); /* (5/2, -1) */ 2627 B[1][3] = B[2][1] = avg3( L[2], L[3], L[3]); /* ( 3, -1) */ 2628 /* Not possible to follow pattern for much of the bottom 2629 row because no (nearby) already-constructed pixels lie 2630 on the diagonals in question. */ 2631 B[2][2] = B[2][3] = B[3][0] = B[3][1] = B[3][2] = B[3][3] 2632 = L[3]; 2633 } 2634 } 2636 ---- End code block ---------------------------------------- 2638 The reference decoder implementation of subblock intra-prediction may 2639 be found in reconintra4x4.c. 2641 13. DCT Coefficient Decoding 2643 The second data partition consists of an encoding of the quantized 2644 DCT (and WHT) coefficients of the residue signal. As discussed in 2645 the format overview (Chapter 2), for each macroblock, the residue is 2646 added to the (intra- or inter-generated) prediction buffer to produce 2647 the final (except for loop-filtering) reconstructed macroblock. 2649 VP8 works exclusively with 4x4 DCTs and WHTs, applied to the 24 (or 2650 25 with the Y2 subblock) 4x4 subblocks of a macroblock. The ordering 2651 of macroblocks within any of the "residue" partitions in general 2652 follows the same raster-scan as used in the first "prediction" 2653 partition. 2655 For all intra- and inter-prediction modes apart from B_PRED (intra: 2656 whose Y subblocks are independently predicted) and SPLIT_MV (inter) 2657 each macroblock's residue record begins with the Y2 component of the 2658 residue, coded using a WHT. B_PRED and SPLIT_MV coded macroblocks 2659 omit this WHT, instead specifying the 0th DCT coefficient of each of 2660 the 16 Y subblocks as part of its DCT. 2662 After the optional Y2 block, the residue record continues with 16 2663 DCTs for the Y subblocks, followed by 4 DCTs for the U subblocks, 2664 ending with 4 DCTs for the V subblocks. The subblocks occur in the 2665 usual order. 2667 The DCTs and WHT are tree-coded using a 12-element alphabet whose 2668 members we call tokens. Except for the end of block token (which 2669 sets the remaining subblock coefficients to zero and is followed by 2670 the next block), each token (sometimes augmented with data 2671 immediately following the token) specifies the value of the single 2672 coefficient at the current (implicit) position and is followed by a 2673 token applying to the next (implicit) position. 2675 For all the Y and chroma subblocks, the ordering of the coefficients 2676 follows a so-called zig-zag order. DCTs begin at coefficient 1 if Y2 2677 is present, and begin at coefficient 0 if Y2 is absent. The WHT for 2678 a Y2 subblock always begins at coefficient 0. 2680 13.1. MB Without non-Zero Coefficient Values 2682 If the flag within macroblock mode info indicates that a macroblock 2683 does not have any non-zero coefficients, the decoding process of DCT 2684 coefficients is skipped for the macroblock. 2686 13.2. Coding of Individual Coefficient Values 2688 The coding of coefficient tokens is the same for the DCT and WHT and 2689 for the remainder of this chapter DCT should be taken to mean either 2690 DCT or WHT. 2692 All tokens (except end-of-block) specify either a single unsigned 2693 value or a range of unsigned values (immediately) followed by a 2694 simple probabilistic encoding of the offset of the value from the 2695 base of that range. 2697 Non-zero values (of either type) are then followed by a flag 2698 indicating the sign of the coded value (negative if 1, positive if 2699 0). 2701 Here are the tokens and decoding tree. 2703 ---- Begin code block -------------------------------------- 2705 typedef enum 2706 { 2707 DCT_0, /* value 0 */ 2708 DCT_1, /* 1 */ 2709 DCT_2, /* 2 */ 2710 DCT_3, /* 3 */ 2711 DCT_4, /* 4 */ 2712 dct_cat1, /* range 5 - 6 (size 2) */ 2713 dct_cat2, /* 7 - 10 (4) */ 2714 dct_cat3, /* 11 - 18 (8) */ 2715 dct_cat4, /* 19 - 34 (16) */ 2716 dct_cat5, /* 35 - 66 (32) */ 2717 dct_cat6, /* 67 - 2048 (1982) */ 2718 dct_eob, /* end of block */ 2720 num_dct_tokens /* 12 */ 2721 } 2722 dct_token; 2724 const tree_index coef_tree [2 * (num_dct_tokens - 1)] = 2725 { 2726 -dct_eob, 2, /* eob = "0" */ 2727 -DCT_0, 4, /* 0 = "10" */ 2728 -DCT_1, 6, /* 1 = "110" */ 2729 8, 12, 2730 -DCT_2, 10, /* 2 = "11100" */ 2731 -DCT_3, -DCT_4, /* 3 = "111010", 4 = "111011" */ 2732 14, 16, 2733 -dct_cat1, -dct_cat2, /* cat1 = "111100", 2734 cat2 = "111101" */ 2735 18, 20, 2736 -dct_cat3, -dct_cat4, /* cat3 = "1111100", 2737 cat4 = "1111101" */ 2738 -dct_cat5, -dct_cat6 /* cat4 = "1111110", 2739 cat4 = "1111111" */ 2740 }; 2742 ---- End code block ---------------------------------------- 2744 While in general all DCT coefficients are decoded using the same 2745 tree, decoding of certain DCT coefficients may skip the first branch, 2746 whose preceding coefficient is a DCT_0. This makes use of the fact 2747 that in any block last non zero coefficient before the end of the 2748 block is not 0, therefore no dct_eob follows a DCT_0 coefficient in 2749 any block. 2751 The tokens dct_cat1 ... dct_cat6 specify ranges of unsigned values, 2752 the value within the range being formed by adding an unsigned offset 2753 (whose width is 1, 2, 3, 4, 5, or 11 bits, respectively) to the base 2754 of the range, using the following algorithm and fixed probability 2755 tables. 2757 ---- Begin code block -------------------------------------- 2759 uint DCTextra( bool_decoder *d, const Prob *p) 2760 { 2761 uint v = 0; 2762 do { v += v + read_bool( d, *p);} while( *++p); 2763 return v; 2764 } 2766 const Prob Pcat1[] = { 159, 0}; 2767 const Prob Pcat2[] = { 165, 145, 0}; 2768 const Prob Pcat3[] = { 173, 148, 140, 0}; 2769 const Prob Pcat4[] = { 176, 155, 140, 135, 0}; 2770 const Prob Pcat5[] = { 180, 157, 141, 134, 130, 0}; 2771 const Prob Pcat6[] = 2772 { 254, 254, 243, 230, 196, 177, 153, 140, 133, 130, 129, 0}; 2774 ---- End code block ---------------------------------------- 2776 If v, the unsigned value decoded using the coefficient tree, possibly 2777 augmented by the process above, is non-zero, its sign is set by 2778 simply reading a flag: 2780 ---- Begin code block -------------------------------------- 2782 if( read_bool( d, 128)) 2783 v = -v; 2785 ---- End code block ---------------------------------------- 2787 13.3. Token Probabilities 2789 The probability specification for the token tree (unlike that for the 2790 "extra bits" described above) is rather involved. It uses three 2791 pieces of context to index a large probability table, the contents of 2792 which may be incrementally modified in the frame header. The full 2793 (non-constant) probability table is laid out as follows. 2795 ---- Begin code block -------------------------------------- 2797 Prob coef_probs [4] [8] [3] [num_dct_tokens-1]; 2798 ---- End code block ---------------------------------------- 2800 Working from the outside in, the outermost dimension is indexed by 2801 the type of plane being decoded: 2803 o 0 - Y beginning at coefficient 1 (i.e., Y after Y2) 2805 o 1 - Y2 2807 o 2 - U or V 2809 o 3 - Y beginning at coefficient 0 (i.e., Y in the absence of Y2). 2811 The next dimension is selected by the position of the coefficient 2812 being decoded. That position c steps by ones up to 15, starting from 2813 zero for block types 1, 2, or 3 and starting from one for block type 2814 0. The second array index is then 2816 ---- Begin code block -------------------------------------- 2818 coef_bands [c] 2820 ---- End code block ---------------------------------------- 2822 where 2824 ---- Begin code block -------------------------------------- 2826 const int coef_bands [16] = { 2827 0, 1, 2, 3, 6, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7 2828 }; 2830 ---- End code block ---------------------------------------- 2832 is a fixed mapping of position to "band". 2834 The third dimension is the trickiest. Roughly speaking, it measures 2835 the "local complexity" or extent to which nearby coefficients are 2836 non-zero. 2838 For the first coefficient (DC, unless the block type is 0), we 2839 consider the (already encoded) blocks within the same plane (Y2, Y, 2840 U, or V) above and to the left of the current block. The context 2841 index is then the number (0, 1 or 2) of these blocks that had at 2842 least one non-zero coefficient in their residue record. 2844 Beyond the first coefficient, the context index is determined by the 2845 absolute value of the most recently decoded coefficient (necessarily 2846 within the current block) and is 0 if the last coefficient was a 2847 zero, 1 if it was plus or minus one, and 2 if its absolute value 2848 exceeded one. 2850 Note that the intuitive meaning of this measure changes as 2851 coefficients are decoded. For example, prior to the first token, a 2852 zero means that the neighbors are empty, suggesting that the current 2853 block may also be empty. After the first token, because an end-of- 2854 block token must have at least one non-zero value before it, a zero 2855 means that we just decoded a zero and hence guarantees that a non- 2856 zero coefficient will appear later in this block. However, this 2857 shift in meaning is perfectly okay because the complete context 2858 depends also on the coefficient band (and since band 0 is occupied 2859 exclusively by position 0). 2861 As with other contexts used by VP8, the "neighboring block" context 2862 described here needs a special definition for subblocks lying along 2863 the top row or left edge of the frame. These "non-existent" 2864 predictors above and to the left of the image are simply taken to be 2865 empty -- that is, taken to contain no non-zero coefficients. 2867 The residue decoding of each macroblock then requires, in each of two 2868 directions (above and to the left), an aggregate coefficient 2869 predictor consisting of a single Y2 predictor, two predictors for 2870 each of U and V, and four predictors for Y. In accordance with the 2871 scan-ordering of macroblocks, a decoder needs to maintain a single 2872 "left" aggregate predictor and a row of "above" aggregate predictors. 2874 Before decoding any residue, these maintained predictors may simply 2875 be cleared, in compliance with the definition of "non-existent" 2876 prediction. After each block is decoded, the two predictors 2877 referenced by the block are replaced with the (empty or non-empty) 2878 state of the block, in preparation for the later decoding of the 2879 blocks below and to the right of the block just decoded. 2881 The fourth, and final, dimension of the token probability array is of 2882 course indexed by (half) the position in the token tree structure, as 2883 are all tree probability arrays. 2885 The below pseudo-code illustrates the decoding process. Note that 2886 criteria, functions, etc. delimited with ** are either dependent on 2887 decoder architecture or are elaborated on elsewhere in this document. 2889 ---- Begin code block -------------------------------------- 2891 int block[16] = { 0 }; /* current 4x4 block coeffs */ 2892 int firstCoeff = 0; 2893 int plane; 2894 int ctx2; 2895 int ctx3 = 0; /* the 3rd context referred to in above description */ 2896 Prob *probTable; 2897 int token; 2898 int sign; 2899 int absValue; 2900 int extraBits; 2901 bool prevCoeffWasZero = false; 2902 bool currentBlockHasCoeffs = false; 2903 /* base coeff abs values per each category, elem #0 is 2904 DCT_VAL_CATEGORY1, * #1 is DCT_VAL_CATEGORY2 etc */ 2905 int categoryBase[6] = { 5, 7, 11, 19, 35, 67 }; 2907 /* Determine plane to use */ 2908 if( **current_block_is_Y2_block** ) plane = 0; 2909 else if ( **current_block_is_chroma** ) plane = 2; 2910 else if ( **current_macroblock_has_Y2** ) plane = 1; 2911 else plane = 3; 2913 /* For luma blocks of an "Y2 macroblock" we skip coeff index #0 */ 2914 if( plane == 1 ) 2915 firstCoeff++; 2917 /* Determine whether neighbour 4x4 blocks have coeffiecients. 2918 This is dependant of the plane we are currently decoding; 2919 i.e. we check only coefficients from same plane as current 2920 block. */ 2921 if( **left_neighbor_block_has_coefficients(plane)** ) 2922 ctx3++; 2923 if( **above_neighbor_block_has_coefficients(plane)** ) 2924 ctx3++; 2926 for( i = firstCoeff ; i < 16 ; ++i ) 2927 { 2928 ctx2 = coef_bands[i]; 2929 probTable = coef_probs[plane][ctx2][ctx3]; 2931 /* skip first code (dct_eob) if previous token was DCT_0 */ 2932 if( prevCoeffWasZero ) 2933 token = treed_read ( d, **coef_tree_without_eob**, 2934 probTable ); 2935 else 2936 token = treed_read ( d, coef_tree, probTable ); 2938 if( token == dct_eob ) 2939 break; 2941 if( token != DCT_0 ) 2942 { 2943 currentBlockHasCoeffs = true; 2944 if( **token_has_extra_bits(token)** ) 2945 { 2946 extraBits = DCTextra( token ); 2947 absValue = 2948 categoryBase[**token_to_cat_index(token)**] + 2949 extraBits; 2950 } 2951 else 2952 { 2953 absValue = **token_to_abs_value(token)**; 2954 } 2956 sign = read_bool(d, 128); 2957 block[i] = sign ? -absValue : absValue; 2958 } 2959 else 2960 { 2961 absValue = 0; 2962 } 2964 /* Set contexts and stuff for next coeff */ 2965 if( absValue == 0 ) ctx3 = 0; 2966 else if ( absValue == 1 ) ctx3 = 1; 2967 else ctx3 = 2; 2968 prevCoeffWasZero = true; 2969 } 2971 /* Store current block status to decoder internals */ 2972 **block_has_coefficients[currentMb][currentBlock]** = 2973 currentBlockHasCoeffs; 2975 ---- End code block ---------------------------------------- 2977 While we have in fact completely described the coefficient decoding 2978 procedure, the reader will probably find it helpful to consult the 2979 reference implementation, which can be found in the file 2980 detokenize.c. 2982 13.4. Token Probability Updates 2984 As mentioned above, the token-decoding probabilities may change from 2985 frame to frame. After detection of a key frame, they are of course 2986 set to their defaults shown in Section 13.5; this must occur before 2987 decoding the remainder of the header, as both key frames and 2988 interframes may adjust these probabilities. 2990 The layout and semantics of the coefficient probability update record 2991 (Section I of the frame header) are straightforward. For each 2992 position in the coef_probs array there occurs a fixed-probability 2993 bool indicating whether or not the corresponding probability should 2994 be updated. If the bool is true, there follows a P(8) replacing that 2995 probability. Note that updates are cumulative, that is, a 2996 probability updated on one frame is in effect for all ensuing frames 2997 until the next key frame, or until the probability is explicitly 2998 updated by another frame. 3000 The algorithm to effect the foregoing is simple: 3002 ---- Begin code block -------------------------------------- 3004 int i = 0; do { 3005 int j = 0; do { 3006 int k = 0; do { 3007 int t = 0; do { 3009 if( read_bool( d, coef_update_probs [i] [j] [k] [t])) 3010 coef_probs [i] [j] [k] [t] = read_literal( d, 8); 3012 } while( ++t < num_dct_tokens - 1); 3013 } while( ++k < 3); 3014 } while( ++j < 8); 3015 } while( ++i < 4); 3017 ---- End code block ---------------------------------------- 3019 The (constant) update probabilities are as follows (they may also be 3020 found in the reference decoder file coef_update_probs.c). 3022 ---- Begin code block -------------------------------------- 3024 const Prob coef_update_probs [4] [8] [3] [num_dct_tokens-1] = 3025 { 3026 { 3027 { 3028 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3029 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3030 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3031 }, 3032 { 3033 { 176, 246, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3034 { 223, 241, 252, 255, 255, 255, 255, 255, 255, 255, 255}, 3035 { 249, 253, 253, 255, 255, 255, 255, 255, 255, 255, 255} 3036 }, 3037 { 3038 { 255, 244, 252, 255, 255, 255, 255, 255, 255, 255, 255}, 3039 { 234, 254, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3040 { 253, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3041 }, 3042 { 3043 { 255, 246, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3044 { 239, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3045 { 254, 255, 254, 255, 255, 255, 255, 255, 255, 255, 255} 3046 }, 3047 { 3048 { 255, 248, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3049 { 251, 255, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3050 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3051 }, 3052 { 3053 { 255, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3054 { 251, 254, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3055 { 254, 255, 254, 255, 255, 255, 255, 255, 255, 255, 255} 3056 }, 3057 { 3058 { 255, 254, 253, 255, 254, 255, 255, 255, 255, 255, 255}, 3059 { 250, 255, 254, 255, 254, 255, 255, 255, 255, 255, 255}, 3060 { 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3061 }, 3062 { 3063 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3064 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3065 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3066 } 3067 }, 3068 { 3069 { 3070 { 217, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3071 { 225, 252, 241, 253, 255, 255, 254, 255, 255, 255, 255}, 3072 { 234, 250, 241, 250, 253, 255, 253, 254, 255, 255, 255} 3073 }, 3074 { 3075 { 255, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3076 { 223, 254, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3077 { 238, 253, 254, 254, 255, 255, 255, 255, 255, 255, 255} 3078 }, 3079 { 3080 { 255, 248, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3081 { 249, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3082 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3083 }, 3084 { 3085 { 255, 253, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3086 { 247, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3087 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3088 }, 3089 { 3090 { 255, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3091 { 252, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3092 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3093 }, 3094 { 3095 { 255, 254, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3096 { 253, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3097 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3098 }, 3099 { 3100 { 255, 254, 253, 255, 255, 255, 255, 255, 255, 255, 255}, 3101 { 250, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3102 { 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3103 }, 3104 { 3105 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3106 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3107 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3108 } 3109 }, 3110 { 3111 { 3112 { 186, 251, 250, 255, 255, 255, 255, 255, 255, 255, 255}, 3113 { 234, 251, 244, 254, 255, 255, 255, 255, 255, 255, 255}, 3114 { 251, 251, 243, 253, 254, 255, 254, 255, 255, 255, 255} 3115 }, 3116 { 3117 { 255, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3118 { 236, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3119 { 251, 253, 253, 254, 254, 255, 255, 255, 255, 255, 255} 3120 }, 3121 { 3122 { 255, 254, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3123 { 254, 254, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3124 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3125 }, 3126 { 3127 { 255, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3128 { 254, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3129 { 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3130 }, 3131 { 3132 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3133 { 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3134 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3135 }, 3136 { 3137 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3138 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3139 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3140 }, 3141 { 3142 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3143 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3144 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3145 }, 3146 { 3147 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3148 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3149 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3150 } 3151 }, 3152 { 3153 { 3154 { 248, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3155 { 250, 254, 252, 254, 255, 255, 255, 255, 255, 255, 255}, 3156 { 248, 254, 249, 253, 255, 255, 255, 255, 255, 255, 255} 3157 }, 3158 { 3159 { 255, 253, 253, 255, 255, 255, 255, 255, 255, 255, 255}, 3160 { 246, 253, 253, 255, 255, 255, 255, 255, 255, 255, 255}, 3161 { 252, 254, 251, 254, 254, 255, 255, 255, 255, 255, 255} 3162 }, 3163 { 3164 { 255, 254, 252, 255, 255, 255, 255, 255, 255, 255, 255}, 3165 { 248, 254, 253, 255, 255, 255, 255, 255, 255, 255, 255}, 3166 { 253, 255, 254, 254, 255, 255, 255, 255, 255, 255, 255} 3167 }, 3168 { 3169 { 255, 251, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3170 { 245, 251, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3171 { 253, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255} 3172 }, 3173 { 3174 { 255, 251, 253, 255, 255, 255, 255, 255, 255, 255, 255}, 3175 { 252, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3176 { 255, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3177 }, 3178 { 3179 { 255, 252, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3180 { 249, 255, 254, 255, 255, 255, 255, 255, 255, 255, 255}, 3181 { 255, 255, 254, 255, 255, 255, 255, 255, 255, 255, 255} 3183 }, 3184 { 3185 { 255, 255, 253, 255, 255, 255, 255, 255, 255, 255, 255}, 3186 { 250, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3187 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3188 }, 3189 { 3190 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3191 { 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}, 3192 { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} 3193 } 3194 } 3195 }; 3197 ---- End code block ---------------------------------------- 3199 13.5. Default Token Probability Table 3201 The default token probabilities are as follows. 3203 ---- Begin code block -------------------------------------- 3205 const Prob default_coef_probs [4] [8] [3] [num_dct_tokens - 1] = 3206 { 3207 { 3208 { 3209 { 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128}, 3210 { 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128}, 3211 { 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128} 3212 }, 3213 { 3214 { 253, 136, 254, 255, 228, 219, 128, 128, 128, 128, 128}, 3215 { 189, 129, 242, 255, 227, 213, 255, 219, 128, 128, 128}, 3216 { 106, 126, 227, 252, 214, 209, 255, 255, 128, 128, 128} 3217 }, 3218 { 3219 { 1, 98, 248, 255, 236, 226, 255, 255, 128, 128, 128}, 3220 { 181, 133, 238, 254, 221, 234, 255, 154, 128, 128, 128}, 3221 { 78, 134, 202, 247, 198, 180, 255, 219, 128, 128, 128} 3222 }, 3223 { 3224 { 1, 185, 249, 255, 243, 255, 128, 128, 128, 128, 128}, 3225 { 184, 150, 247, 255, 236, 224, 128, 128, 128, 128, 128}, 3226 { 77, 110, 216, 255, 236, 230, 128, 128, 128, 128, 128} 3227 }, 3228 { 3229 { 1, 101, 251, 255, 241, 255, 128, 128, 128, 128, 128}, 3230 { 170, 139, 241, 252, 236, 209, 255, 255, 128, 128, 128}, 3231 { 37, 116, 196, 243, 228, 255, 255, 255, 128, 128, 128} 3232 }, 3233 { 3234 { 1, 204, 254, 255, 245, 255, 128, 128, 128, 128, 128}, 3235 { 207, 160, 250, 255, 238, 128, 128, 128, 128, 128, 128}, 3236 { 102, 103, 231, 255, 211, 171, 128, 128, 128, 128, 128} 3237 }, 3238 { 3239 { 1, 152, 252, 255, 240, 255, 128, 128, 128, 128, 128}, 3240 { 177, 135, 243, 255, 234, 225, 128, 128, 128, 128, 128}, 3241 { 80, 129, 211, 255, 194, 224, 128, 128, 128, 128, 128} 3242 }, 3243 { 3244 { 1, 1, 255, 128, 128, 128, 128, 128, 128, 128, 128}, 3245 { 246, 1, 255, 128, 128, 128, 128, 128, 128, 128, 128}, 3246 { 255, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128} 3247 } 3248 }, 3249 { 3250 { 3251 { 198, 35, 237, 223, 193, 187, 162, 160, 145, 155, 62}, 3252 { 131, 45, 198, 221, 172, 176, 220, 157, 252, 221, 1}, 3253 { 68, 47, 146, 208, 149, 167, 221, 162, 255, 223, 128} 3254 }, 3255 { 3256 { 1, 149, 241, 255, 221, 224, 255, 255, 128, 128, 128}, 3257 { 184, 141, 234, 253, 222, 220, 255, 199, 128, 128, 128}, 3258 { 81, 99, 181, 242, 176, 190, 249, 202, 255, 255, 128} 3259 }, 3260 { 3261 { 1, 129, 232, 253, 214, 197, 242, 196, 255, 255, 128}, 3262 { 99, 121, 210, 250, 201, 198, 255, 202, 128, 128, 128}, 3263 { 23, 91, 163, 242, 170, 187, 247, 210, 255, 255, 128} 3264 }, 3265 { 3266 { 1, 200, 246, 255, 234, 255, 128, 128, 128, 128, 128}, 3267 { 109, 178, 241, 255, 231, 245, 255, 255, 128, 128, 128}, 3268 { 44, 130, 201, 253, 205, 192, 255, 255, 128, 128, 128} 3269 }, 3270 { 3271 { 1, 132, 239, 251, 219, 209, 255, 165, 128, 128, 128}, 3272 { 94, 136, 225, 251, 218, 190, 255, 255, 128, 128, 128}, 3273 { 22, 100, 174, 245, 186, 161, 255, 199, 128, 128, 128} 3274 }, 3275 { 3276 { 1, 182, 249, 255, 232, 235, 128, 128, 128, 128, 128}, 3277 { 124, 143, 241, 255, 227, 234, 128, 128, 128, 128, 128}, 3278 { 35, 77, 181, 251, 193, 211, 255, 205, 128, 128, 128} 3280 }, 3281 { 3282 { 1, 157, 247, 255, 236, 231, 255, 255, 128, 128, 128}, 3283 { 121, 141, 235, 255, 225, 227, 255, 255, 128, 128, 128}, 3284 { 45, 99, 188, 251, 195, 217, 255, 224, 128, 128, 128} 3285 }, 3286 { 3287 { 1, 1, 251, 255, 213, 255, 128, 128, 128, 128, 128}, 3288 { 203, 1, 248, 255, 255, 128, 128, 128, 128, 128, 128}, 3289 { 137, 1, 177, 255, 224, 255, 128, 128, 128, 128, 128} 3290 } 3291 }, 3292 { 3293 { 3294 { 253, 9, 248, 251, 207, 208, 255, 192, 128, 128, 128}, 3295 { 175, 13, 224, 243, 193, 185, 249, 198, 255, 255, 128}, 3296 { 73, 17, 171, 221, 161, 179, 236, 167, 255, 234, 128} 3297 }, 3298 { 3299 { 1, 95, 247, 253, 212, 183, 255, 255, 128, 128, 128}, 3300 { 239, 90, 244, 250, 211, 209, 255, 255, 128, 128, 128}, 3301 { 155, 77, 195, 248, 188, 195, 255, 255, 128, 128, 128} 3302 }, 3303 { 3304 { 1, 24, 239, 251, 218, 219, 255, 205, 128, 128, 128}, 3305 { 201, 51, 219, 255, 196, 186, 128, 128, 128, 128, 128}, 3306 { 69, 46, 190, 239, 201, 218, 255, 228, 128, 128, 128} 3307 }, 3308 { 3309 { 1, 191, 251, 255, 255, 128, 128, 128, 128, 128, 128}, 3310 { 223, 165, 249, 255, 213, 255, 128, 128, 128, 128, 128}, 3311 { 141, 124, 248, 255, 255, 128, 128, 128, 128, 128, 128} 3312 }, 3313 { 3314 { 1, 16, 248, 255, 255, 128, 128, 128, 128, 128, 128}, 3315 { 190, 36, 230, 255, 236, 255, 128, 128, 128, 128, 128}, 3316 { 149, 1, 255, 128, 128, 128, 128, 128, 128, 128, 128} 3317 }, 3318 { 3319 { 1, 226, 255, 128, 128, 128, 128, 128, 128, 128, 128}, 3320 { 247, 192, 255, 128, 128, 128, 128, 128, 128, 128, 128}, 3321 { 240, 128, 255, 128, 128, 128, 128, 128, 128, 128, 128} 3322 }, 3323 { 3324 { 1, 134, 252, 255, 255, 128, 128, 128, 128, 128, 128}, 3325 { 213, 62, 250, 255, 255, 128, 128, 128, 128, 128, 128}, 3326 { 55, 93, 255, 128, 128, 128, 128, 128, 128, 128, 128} 3327 }, 3328 { 3329 { 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128}, 3330 { 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128}, 3331 { 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128} 3332 } 3333 }, 3334 { 3335 { 3336 { 202, 24, 213, 235, 186, 191, 220, 160, 240, 175, 255}, 3337 { 126, 38, 182, 232, 169, 184, 228, 174, 255, 187, 128}, 3338 { 61, 46, 138, 219, 151, 178, 240, 170, 255, 216, 128} 3339 }, 3340 { 3341 { 1, 112, 230, 250, 199, 191, 247, 159, 255, 255, 128}, 3342 { 166, 109, 228, 252, 211, 215, 255, 174, 128, 128, 128}, 3343 { 39, 77, 162, 232, 172, 180, 245, 178, 255, 255, 128} 3344 }, 3345 { 3346 { 1, 52, 220, 246, 198, 199, 249, 220, 255, 255, 128}, 3347 { 124, 74, 191, 243, 183, 193, 250, 221, 255, 255, 128}, 3348 { 24, 71, 130, 219, 154, 170, 243, 182, 255, 255, 128} 3349 }, 3350 { 3351 { 1, 182, 225, 249, 219, 240, 255, 224, 128, 128, 128}, 3352 { 149, 150, 226, 252, 216, 205, 255, 171, 128, 128, 128}, 3353 { 28, 108, 170, 242, 183, 194, 254, 223, 255, 255, 128} 3354 }, 3355 { 3356 { 1, 81, 230, 252, 204, 203, 255, 192, 128, 128, 128}, 3357 { 123, 102, 209, 247, 188, 196, 255, 233, 128, 128, 128}, 3358 { 20, 95, 153, 243, 164, 173, 255, 203, 128, 128, 128} 3359 }, 3360 { 3361 { 1, 222, 248, 255, 216, 213, 128, 128, 128, 128, 128}, 3362 { 168, 175, 246, 252, 235, 205, 255, 255, 128, 128, 128}, 3363 { 47, 116, 215, 255, 211, 212, 255, 255, 128, 128, 128} 3364 }, 3365 { 3366 { 1, 121, 236, 253, 212, 214, 255, 255, 128, 128, 128}, 3367 { 141, 84, 213, 252, 201, 202, 255, 219, 128, 128, 128}, 3368 { 42, 80, 160, 240, 162, 185, 255, 205, 128, 128, 128} 3369 }, 3370 { 3371 { 1, 1, 255, 128, 128, 128, 128, 128, 128, 128, 128}, 3372 { 244, 1, 255, 128, 128, 128, 128, 128, 128, 128, 128}, 3373 { 238, 1, 255, 128, 128, 128, 128, 128, 128, 128, 128} 3374 } 3375 } 3377 }; 3379 ---- End code block ---------------------------------------- 3381 14. DCT and WHT Inversion and Macroblock Reconstruction 3383 14.1. Dequantization 3385 After decoding the DCTs/WHTs as described above, each (quantized) 3386 coefficient in each subblock is multiplied by one of six 3387 dequantization factors, the choice of factor depending on the plane 3388 (Y2, Y, or chroma) and position (DC = coefficient zero, AC = any 3389 other coefficient). If the current macroblock has overridden the 3390 quantization level (as described in Chapter 10) then the six factors 3391 are looked up from two dequantization tables with appropriate scaling 3392 and clamping using the single index supplied by the override. 3393 Otherwise, the frame-level dequantization factors (as described in 3394 Section 9.6 are used. In either case, the multiplies are computed 3395 and stored using 16-bit signed integers. 3397 The two dequantization tables, which may also be found in the 3398 reference decoder file quant_common.c, are as follows. 3400 ---- Begin code block -------------------------------------- 3402 static const int dc_qlookup[QINDEX_RANGE] = 3403 { 3404 4, 5, 6, 7, 8, 9, 10, 10, 11, 12, 13, 14, 15, 3405 16, 17, 17, 18, 19, 20, 20, 21, 21, 22, 22, 23, 23, 3406 24, 25, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 3407 36, 37, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 46, 3408 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 3409 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 3410 73, 74, 75, 76, 76, 77, 78, 79, 80, 81, 82, 83, 84, 3411 85, 86, 87, 88, 89, 91, 93, 95, 96, 98, 100, 101, 102, 3412 104, 106, 108, 110, 112, 114, 116, 118, 122, 124, 126, 128, 130, 3413 132, 134, 136, 138, 140, 143, 145, 148, 151, 154, 157, 3414 }; 3416 static const int ac_qlookup[QINDEX_RANGE] = 3417 { 3418 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 3419 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 3420 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 3421 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 3422 56, 57, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 3423 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 3424 106, 108, 110, 112, 114, 116, 119, 122, 125, 128, 131, 134, 137, 3425 140, 143, 146, 149, 152, 155, 158, 161, 164, 167, 170, 173, 177, 3426 181, 185, 189, 193, 197, 201, 205, 209, 213, 217, 221, 225, 229, 3427 234, 239, 245, 249, 254, 259, 264, 269, 274, 279, 284, 3428 }; 3430 ---- End code block ---------------------------------------- 3432 Lookup values from the above two tables are directly used the DC and 3433 AC coefficients in Y1 respectively. For Y2 and chroma, values from 3434 above tables undergo either a scaling process or clamping processing 3435 before the multiplies. Details to these scaling and clamping can be 3436 found related lookup functions in quant_common.c. 3438 14.2. Inverse Transforms 3440 If the Y2 residue block exists (i.e., the macroblock luma mode is not 3441 SPLITMV or B_PRED), it is inverted first (using the inverse WHT) and 3442 the element of the result at row i, column j is used as the 0th 3443 coefficient of the Y subblock at position (i, j), that is, the Y 3444 subblock whose index is (i * 4) + j. As discussed in Chapter 13, if 3445 the luma mode is B_PRED or SPLITMV, the 0th Y coefficients are part 3446 of the residue signal for the subblocks themselves. 3448 In either case, the inverse transforms for the sixteen Y subblocks 3449 and eight chroma subblocks are computed next. All 24 of these 3450 inversions are independent of each other; their results may (at least 3451 conceptually) be stored in 24 separate 4x4 arrays. 3453 As is done by the reference decoder, an implementation may wish to 3454 represent the prediction and residue buffers as macroblock-sized 3455 arrays (that is, a 16x16 Y buffer and two 8x8 chroma buffers). 3456 Regarding the inverse DCT implementation given below, this requires a 3457 simple adjustment to the address calculation for the resulting 3458 residue pixels. 3460 14.3. Implementation of the WHT Inversion 3462 As previously discussed (see Chapters 2 and 13), for macroblocks 3463 encoded using prediction modes other than B_PRED and SPLITMV, the DC 3464 values derived from the DCT transform on the 16 Y blocks are 3465 collected to construct a 25th block of a macroblock(16 Y, 4 U, 4 V 3466 constitute the 24 blocks). This 25th block is transformed using a 3467 Walsh-Hadamard transform (WHT). 3469 The inputs to the inverse WHT (that is, the dequantized 3470 coefficients), the intermediate "horizontally detransformed" signal, 3471 and the completely detransformed residue signal are all stored as 3472 arrays of 16-bit signed integers. 3474 Following the tradition of specifying bitstream format using the 3475 decoding process, we specify the inverse WHT in the decoding process 3476 using the following C style source code: 3478 ---- Begin code block -------------------------------------- 3480 void vp8_short_inv_walsh4x4_c(short *input, short *output) 3481 { 3482 int i; 3483 int a1, b1, c1, d1; 3484 int a2, b2, c2, d2; 3485 short *ip = input; 3486 short *op = output; 3487 int temp1, temp2; 3489 for(i=0;i<4;i++) 3490 { 3491 a1 = ip[0] + ip[12]; 3492 b1 = ip[4] + ip[8]; 3493 c1 = ip[4] - ip[8]; 3494 d1 = ip[0] - ip[12]; 3495 op[0] = a1 + b1; 3496 op[4] = c1 + d1; 3497 op[8] = a1 - b1; 3498 op[12]= d1 - c1; 3499 ip++; 3500 op++; 3501 } 3502 ip = output; 3503 op = output; 3504 for(i=0;i<4;i++) 3505 { 3506 a1 = ip[0] + ip[3]; 3507 b1 = ip[1] + ip[2]; 3508 c1 = ip[1] - ip[2]; 3509 d1 = ip[0] - ip[3]; 3511 a2 = a1 + b1; 3512 b2 = c1 + d1; 3513 c2 = a1 - b1; 3514 d2 = d1 - c1; 3516 op[0] = (a2+3)>>3; 3517 op[1] = (b2+3)>>3; 3518 op[2] = (c2+3)>>3; 3519 op[3] = (d2+3)>>3; 3521 ip+=4; 3522 op+=4; 3523 } 3524 } 3526 ---- End code block ---------------------------------------- 3528 In the case that there is only one non-zero DC value in input, the 3529 inverse transform can be simplified to the following: 3531 ---- Begin code block -------------------------------------- 3533 void vp8_short_inv_walsh4x4_1_c(short *input, short *output) 3534 { 3535 int i; 3536 int a1; 3537 short *op=output; 3539 a1 = ((input[0] + 3)>>3); 3541 for(i=0;i<4;i++) 3542 { 3543 op[0] = a1; 3544 op[1] = a1; 3545 op[2] = a1; 3546 op[3] = a1; 3547 op+=4; 3548 } 3549 } 3551 ---- End code block ---------------------------------------- 3553 It should be noted, a conforming decoder should implement the inverse 3554 transform using exactly the same rounding to achieve bit-wise 3555 matching output to the output of the process specified by the above 3556 "C" source code. 3558 The reference decoder WHT inversion may be found in the files 3559 invtrans.c and idctllm.c. 3561 14.4. Implementation of the DCT Inversion 3563 All of the DCT inversions are computed in exactly the same way. In 3564 principle, VP8 uses a classical 2D inverse discrete cosine transform, 3565 implemented as two passes of 1-D inverse DCT. The 1-D inverse DCT 3566 was calculated using a similar algorithm to what was described in the 3567 paper "Practical Fast 1-D DCT Algorithms with 11 Multiplications" by 3568 Loeffler, Lightenberg and Moschytz. However, the paper only provided 3569 the 8-point and 16-point version of the algorithms, which was adapted 3570 by On2 to perform the 4-point 1-D DCT. 3572 Accurate calculation of 1-D DCT of the above algorithm requires 3573 infinite precision. VP8 of course can use only a finite-precision 3574 approximation. Also, the inverse DCT used by VP8 takes care of 3575 normalization of the standard unitary transform, that is, every 3576 dequantized coefficient has roughly double the size of the 3577 corresponding unitary coefficient. However, at all but the highest 3578 datarates, the discrepancy between transmitted and ideal coefficients 3579 is due almost entirely to (lossy) compression and not to errors 3580 induced by finite-precision arithmetic. 3582 The inputs to the inverse DCT (that is, the dequantized 3583 coefficients), the intermediate "horizontally detransformed" signal, 3584 and the completely detransformed residue signal are all stored as 3585 arrays of 16-bit signed integers. The details of the computation are 3586 as follows. 3588 It should also be noted that this implementation makes use of 16-bit 3589 fixed point version of two multiplication constants: 3591 sqrt(2) * cos (pi/8) 3593 sqrt(2) * sin (pi/8) 3595 Because the first constant is bigger than 1, to maintain the same 16- 3596 bit fixed point precision as the second one, we make use of the fact 3597 that 3599 x * a = x + x*(a-1) 3601 therefore 3603 x * sqrt(2) * cos (pi/8) = x + x * ( sqrt(2) * cos(pi/8)-1) 3605 ---- Begin code block -------------------------------------- 3607 /* IDCT implementation */ 3608 static const int cospi8sqrt2minus1=20091; 3609 static const int sinpi8sqrt2 =35468; 3610 void short_idct4x4llm_c(short *input, short *output, int pitch) 3611 { 3612 int i; 3613 int a1, b1, c1, d1; 3615 short *ip=input; 3616 short *op=output; 3617 int temp1, temp2; 3618 int shortpitch = pitch>>1; 3620 for(i=0;i<4;i++) 3621 { 3622 a1 = ip[0]+ip[8]; 3623 b1 = ip[0]-ip[8]; 3624 temp1 = (ip[4] * sinpi8sqrt2)>>16; 3625 temp2 = ip[12]+((ip[12] * cospi8sqrt2minus1)>>16); 3626 c1 = temp1 - temp2; 3628 temp1 = ip[4] + ((ip[4] * cospi8sqrt2minus1)>>16); 3629 temp2 = (ip[12] * sinpi8sqrt2)>>16; 3630 d1 = temp1 + temp2; 3632 op[shortpitch*0] = a1+d1; 3633 op[shortpitch*3] = a1-d1; 3634 op[shortpitch*1] = b1+c1; 3635 op[shortpitch*2] = b1-c1; 3637 ip++; 3638 op++; 3639 } 3640 ip = output; 3641 op = output; 3642 for(i=0;i<4;i++) 3643 { 3644 a1 = ip[0]+ip[2]; 3645 b1 = ip[0]-ip[2]; 3647 temp1 = (ip[1] * sinpi8sqrt2)>>16; 3648 temp2 = ip[3]+((ip[3] * cospi8sqrt2minus1)>>16); 3649 c1 = temp1 - temp2; 3651 temp1 = ip[1] + ((ip[1] * cospi8sqrt2minus1)>>16); 3652 temp2 = (ip[3] * sinpi8sqrt2)>>16; 3653 d1 = temp1 + temp2; 3655 op[0] = (a1+d1+4)>>3; 3656 op[3] = (a1-d1+4)>>3; 3657 op[1] = (b1+c1+4)>>3; 3658 op[2] = (b1-c1+4)>>3; 3660 ip+=shortpitch; 3661 op+=shortpitch; 3662 } 3663 } 3665 ---- End code block ---------------------------------------- 3667 The reference decoder DCT inversion may be found in the files 3668 invtrans.c and idctllm.c. 3670 14.5. Summation of Predictor and Residue 3672 Finally, the prediction and residue signals are summed to form the 3673 reconstructed macroblock, which, except for loop filtering (taken up 3674 next), completes the decoding process. 3676 The summing procedure is fairly straightforward, having only a couple 3677 of details. The prediction and residue buffers are both arrays of 3678 16-bit signed integers. Each individual (Y, U, and V pixel) result 3679 is calculated first as a 32-bit sum of the prediction and residue, 3680 and is then saturated to 8-bit unsigned range (using, say, the 3681 clamp255 function defined above) before being stored as an 8-bit 3682 unsigned pixel value. 3684 VP8 also supports a mode where the encoding of a bitstream guarantees 3685 all reconstructed pixel values between 0 and 255, compliant 3686 bitstreams of such requirements have the clamp_type bit in the frame 3687 header set to 1. In such case, the clamp255 is no longer required. 3689 The summation process is the same, regardless of the (intra or inter) 3690 mode of prediction in effect for the macroblock. The reference 3691 decoder implementation of reconstruction may be found in the file 3692 recon.c. 3694 15. Loop Filter 3696 Loop filtering is the last stage of frame reconstruction and the 3697 next-to-last stage of the decoding process. The loop filter is 3698 applied to the entire frame after the summation of predictor and 3699 residue described in Chapter 14. 3701 The purpose of the loop filter is to eliminate (or at least reduce) 3702 visually objectionable artifacts associated with the semi- 3703 independence of the coding of macroblocks and their constituent 3704 subblocks. 3706 As was discussed in Chapter 5, the loop filter is "integral" to 3707 decoding, in that the results of loop filtering are used in the 3708 prediction of subsequent frames. Consequently, a functional decoder 3709 implementation must perform loop filtering exactly as described here. 3710 This is in distinction to any postprocessing that may be applied only 3711 to the image immediately before display; such postprocessing is 3712 entirely at the option of the implementor (and/or user) and has no 3713 effect on decoding per se. 3715 The baseline frame level parameters controlling the loop filter are 3716 defined in the frame header (Chapter 9.4) along with a mechanism for 3717 adjustment based on a macroblock's prediction mode and/or reference 3718 frame. The first is a flag selecting the type of filter (normal or 3719 simple), the other two are numbers (loop_filter_level and 3720 sharpness_level) that adjust the strength or sensitivity of the 3721 filter. As described in Chapters 9.3 and 10, loop_filter_level may 3722 be also overridden on a per-macroblock basis using segmentation. 3724 Loop filtering is one of the more computationally-intensive aspects 3725 of VP8 decoding. This is the reason for the existence of the 3726 optional less-demanding simple filter type. Also, the loop filter is 3727 completely disabled if the loop_filter_level in the frame header is 3728 zero; macroblock-level overrides are ignored in this case. (It is of 3729 course possible for a compressor to encode a frame in which only a 3730 few macroblocks are loop filtered: The global loop_filter_level must 3731 be non-zero and each macroblock can select one of four levels, most 3732 of which could be zero.) 3734 To facilitate efficient implementation, the VP8 decoding algorithms 3735 generally, and the loop filter especially, were designed with SIMD 3736 ("Single Instruction Multiple Datum" or "integer vector") processors 3737 in mind. The reference decoder implementation of loop filtering 3738 (found in loopfilter.c) is, in effect, a portable SIMD specification 3739 of the loop filtering algorithms intended to simplify a realization 3740 on an actual SIMD processor. 3742 Unfortunately, the approach taken there does not lead to maximal 3743 efficency (restricted to the C language, that is) and, as far as a 3744 pure algorithm specification is concerned, is in places obscure. For 3745 example, various aspects of filtering are conditioned on absolute 3746 differences lying below certain thresholds. An ordinary C 3747 implementation would simply discriminate amongst these behaviors 3748 using if statements. The reference decoder instead effects this by 3749 "masking arithmetic", that is, using "and" operations to 3750 (conditionally) zero-out values to be added or subtracted to pixels. 3751 Furthermore, the structure holding the various threshold values is 3752 artificially parallelized. While this mimics closely the approach 3753 taken in vector-processor machine language, it is not how one usually 3754 programs in C. 3756 In this document, we take a different approach and present the 3757 algorithms in a more straightforward, idiomatic, and terse C style. 3758 Together with the reference version, we hope to provide the "best of 3759 both worlds", that is, a pure algorithm specification here and a 3760 strong suggestion as to an optimal actual implementation in 3761 loopfilter.c. 3763 We begin by discussing the aspects of loop filtering that are 3764 independent of the controlling parameters and type of filter chosen. 3766 15.1. Filter Geometry and Overall Procedure 3768 The Y, U, and V planes are processed independently and, except for 3769 the values of certain control parameters (derived from the 3770 loop_filter_level and sharpness_level), identically. 3772 The loop filter acts on the edges between adjacent macroblocks and on 3773 the edges between adjacent subblocks of a macroblock. All such edges 3774 are horizontal or vertical. For each pixel position on an edge, a 3775 small number (two or three) of pixels adjacent to either side of the 3776 position are examined and possibly modified. The displacements of 3777 these pixels are at a right angle to the edge orientation, that is, 3778 for a horizontal edge, we treat the pixels immediately above and 3779 below the edge position, for a vertical edge, we treat the pixels 3780 immediately to the left and right of the edge. 3782 We call this collection of pixels associated to an edge position a 3783 segment; the length of a segment is 2, 4, 6, or 8. Excepting that 3784 the normal filter uses slightly different algorithms for, and that 3785 either filter may apply different control parameters to, the edges 3786 between macroblocks and those between subblocks, the treatment of 3787 edges is quite uniform: All segments straddling an edge are treated 3788 identically, there is no distinction between the treatment of 3789 horizontal and vertical edges, whether between macroblocks or between 3790 subblocks. 3792 As a consequence, adjacent subblock edges within a macroblock may be 3793 concatenated and processed in their entirety. There is a single 3794 8-pixel long vertical edge horizontally centered in each of the U and 3795 V blocks (the concatenation of upper and lower 4-pixel edges between 3796 chroma subblocks), and three 16-pixel long vertical edges at 3797 horizontal positions 1/4, 1/2, and 3/4 the width of the luma 3798 macroblock, each representing the concatenation of four 4-pixel sub- 3799 edges between pairs of Y subblocks. 3801 The macroblocks comprising the frame are processed in the usual 3802 raster-scan order. Each macroblock is "responsible for" the inter- 3803 macroblock edges immediately above and left of it (but not the edges 3804 below and right of it), as well as the edges between its subblocks. 3806 For each macroblock M, there are four filtering steps, which are, 3807 (almost) in order: 3809 1. If M is not on the leftmost column of macroblocks, filter across 3810 the left (vertical) inter-macroblock edge of M. 3812 2. Filter across the vertical subblock edges within M. 3814 3. If M is not on the topmost row of macroblocks, filter across the 3815 top (horizontal) inter-macroblock edge of M. 3817 4. Filter across the horizontal subblock edges within M. 3819 We write MY, MU, and MV for the planar constituents of M, that is, 3820 the 16x16 luma block, 8x8 U block, and 8x8 V block comprising M. 3822 In step 1, for each of the three blocks MY, MU, and MV, we filter 3823 each of the (16 luma or 8 chroma) segments straddling the column 3824 separating the block from the block immediately to the left of it, 3825 using the inter-macroblock filter and controls associated to the 3826 loop_filter_level and sharpness_level. 3828 In step 4, we filter across the (three luma and one each for U and V) 3829 vertical subblock edges described above, this time using the inter- 3830 subblock filter and controls. 3832 Step 2 and 4 are skipped for macroblocks that satisfy the following 3833 two conditions: 3835 1. Macroblock coding mode is neither B_PRED nor SPLTMV; 3836 2. There is no DCT coefficient coded for the whole macroblock. 3838 For these macroblocks, loop filtering for edges between subblocks 3839 internal to a macroblock is effectively skipped. This skip strategy 3840 significantly reduces VP8 loop-filtering complexity. 3842 Edges between macroblocks and those between subblocks are treated 3843 with different control parameters (and, in the case of the normal 3844 filter, with different algorithms); luma and chroma edges are also 3845 treated with different control parameters. Except for pixel 3846 addressing, there is no distinction between the treatment of vertical 3847 and horizontal edges. Luma edges are always 16 pixels long, chroma 3848 edges are always 8 pixels long, and the segments straddling an edge 3849 are treated identically; this of course facilitates vector 3850 processing. 3852 Because many pixels belong to segments straddling two or more edges, 3853 and so will be filtered more than once, the order in which edges are 3854 processed given above must be respected by any implementation. 3855 Within a single edge, however, the segments straddling that edge are 3856 disjoint and the order in which these segments are processed is 3857 immaterial. 3859 Before taking up the filtering algorithms themselves, we should 3860 emphasize a point already made: Even though the pixel segments 3861 associated to a macroblock are antecedent to the macroblock (that is, 3862 lie within the macroblock or in already-constructed macroblocks), a 3863 macroblock must not be filtered immediately after its 3864 "reconstruction" (described in Chapter 14). Rather, the loop filter 3865 applies after all the macroblocks have been "reconstructed" (i.e., 3866 had their predictor summed with their residue); correct decoding is 3867 predicated on the fact that already-constructed portions of the 3868 current frame referenced via intra-prediction (described in Chapter 3869 12) are not yet filtered. 3871 15.2. Simple Filter 3873 Having described the overall procedure of, and pixels affected by, 3874 the loop filter, we turn our attention to the treatment of individual 3875 segments straddling edges. We begin by describing the simple filter, 3876 which, as the reader might guess, is somewhat simpler than the normal 3877 filter. 3879 Note that the simple filter only applies to luma edges. Chroma edges 3880 are left unfiltered. 3882 Roughly speaking, the idea of loop filtering is, within limits, to 3883 reduce the difference between pixels straddling an edge. Differences 3884 in excess of a threshold (associated to the loop_filter_level) are 3885 assumed to be "natural" and are unmodified; differences below the 3886 threshold are assumed to be artifacts of quantization and the 3887 (partially) separate coding of blocks, and are reduced via the 3888 procedures described below. While the loop_filter_level is in 3889 principle arbitrary, the levels chosen by a VP8 compressor tend to be 3890 correlated to quantization levels. 3892 Most of the filtering arithmetic is done using 8-bit signed operands 3893 (having a range -128 to +127, inclusive), supplemented by 16-bit 3894 temporaries holding results of multiplies. 3896 Sums and other temporaries need to be "clamped" to a valid signed 3897 8-bit range: 3899 ---- Begin code block -------------------------------------- 3901 int8 c( int v) 3902 { 3903 return (int8) (v < -128 ? -128 : (v < 128 ? v : 127)); 3904 } 3906 ---- End code block ---------------------------------------- 3908 Since pixel values themselves are unsigned 8-bit numbers, we need to 3909 convert between signed and unsigned values: 3911 ---- Begin code block -------------------------------------- 3913 /* Convert pixel value (0 <= v <= 255) to an 8-bit signed 3914 number. */ 3915 int8 u2s( Pixel v) { return (int8) (v - 128);} 3917 /* Clamp, then convert signed number back to pixel value. */ 3918 Pixel s2u( int v) { return (Pixel) ( c(v) + 128);} 3920 ---- End code block ---------------------------------------- 3922 Filtering is often predicated on absolute-value thresholds. The 3923 following function is the equivalent of the standard library function 3924 abs, whose prototype is found in the standard header file stdlib.h. 3925 For us, the argument v is always the difference between two pixels 3926 and lies in the range -255 <= v <= +255. 3928 ---- Begin code block -------------------------------------- 3930 int abs( int v) { return v < 0? -v : v;} 3931 ---- End code block ---------------------------------------- 3933 An actual implementation would of course use inline functions or 3934 macros to accomplish these trivial procedures (which are used by both 3935 the normal and simple loop filters). An optimal implementation would 3936 probably express them in machine language, perhaps using SIMD vector 3937 instructions. On many SIMD processors, the saturation accomplished 3938 by the above clamping function is often folded into the arithmetic 3939 instructions themselves, obviating the explicit step taken here. 3941 To simplify the specification of relative pixel positions, we use the 3942 word before to mean "immediately above" (for a vertical segment 3943 straddling a horizontal edge) or "immediately to the left of" (for a 3944 horizontal segment straddling a vertical edge) and the word after to 3945 mean "immediately below" or "immediately to the right of". 3947 Given an edge, a segment, and a limit value, the simple loop filter 3948 computes a value based on the four pixels that straddle the edge (two 3949 either side). If that value is below a supplied limit, then, very 3950 roughly speaking, the two pixel values are brought closer to each 3951 other, "shaving off" something like a quarter of the difference. The 3952 same procedure is used for all segments straddling any type of edge, 3953 regardless of the nature (inter-macroblock, inter-subblock, luma, or 3954 chroma) of the edge; only the limit value depends on the edge-type. 3956 The exact procedure (for a single segment) is as follows; the 3957 subroutine common_adjust is used by both the simple filter presented 3958 here and the normal filters discussed in Section 15.3 (Section 15.3). 3960 ---- Begin code block -------------------------------------- 3962 int8 common_adjust( 3963 int use_outer_taps, /* filter is 2 or 4 taps wide */ 3964 const Pixel *P1, /* pixel before P0 */ 3965 Pixel *P0, /* pixel before edge */ 3966 Pixel *Q0, /* pixel after edge */ 3967 const Pixel *Q1 /* pixel after Q0 */ 3968 ) { 3969 cint8 p1 = u2s( *P1); /* retrieve and convert all 4 pixels */ 3970 cint8 p0 = u2s( *P0); 3971 cint8 q0 = u2s( *Q0); 3972 cint8 q1 = u2s( *Q1); 3974 /* Disregarding clamping, when "use_outer_taps" is false, 3975 "a" is 3*(q0-p0). Since we are about to divide "a" by 3976 8, in this case we end up multiplying the edge 3977 difference by 5/8. 3979 When "use_outer_taps" is true (as for the simple filter), 3980 "a" is p1 - 3*p0 + 3*q0 - q1, which can be thought of as 3981 a refinement of 2*(q0 - p0) and the adjustment is 3982 something like (q0 - p0)/4. */ 3984 int8 a = c( ( use_outer_taps? c(p1 - q1) : 0 ) + 3*(q0 - p0) ); 3986 /* b is used to balance the rounding of a/8 in the case where 3987 the "fractional" part "f" of a/8 is exactly 1/2. */ 3989 cint8 b = (a & 7)==4 ? -1 : 0; 3991 /* Divide a by 8, rounding up when f >= 1/2. 3992 Although not strictly part of the "C" language, 3993 the right-shift is assumed to propagate the sign bit. */ 3995 a = c( a + 4) >> 3; 3997 /* Subtract "a" from q0, "bringing it closer" to p0. */ 3999 *Q0 = s2u( q0 - a); 4001 /* Add "a" (with adjustment "b") to p0, "bringing it closer" 4002 to q0. 4004 The clamp of "a+b", while present in the reference decoder, 4005 is superfluous; we have -16 <= a <= 15 at this point. */ 4007 *P0 = s2u( p0 + c( a + b)); 4009 return a; 4010 } 4012 ---- End code block ---------------------------------------- 4013 ---- Begin code block -------------------------------------- 4015 void simple_segment( 4016 uint8 edge_limit, /* do nothing if edge difference 4017 exceeds limit */ 4018 const Pixel *P1, /* pixel before P0 */ 4019 Pixel *P0, /* pixel before edge */ 4020 Pixel *Q0, /* pixel after edge */ 4021 const Pixel *Q1 /* pixel after Q0 */ 4022 ) { 4023 if( (abs(*P0 - *Q0)*2 + abs(*P1-*Q1)/2) <= edge_limit)) 4024 common_adjust( 1, P1, P0, Q0, Q1); /* use outer taps */ 4025 } 4027 ---- End code block ---------------------------------------- 4029 We make a couple of remarks about the rounding procedure above. When 4030 b is zero (that is, when the "fractional part" of a is not 1/2 ), we 4031 are (except for clamping) adding the same number to p0 as we are 4032 subtracting from q0. This preserves the average value of p0 and q0 4033 but the resulting difference between p0 and q0 is always even; in 4034 particular, the smallest non-zero gradation +-1 is not possible here. 4036 When b is one, the value we add to p0 (again except for clamping) is 4037 one less than the value we are subtracting from q0. In this case, 4038 the resulting difference is always odd (and the small gradation +-1 4039 is possible) but the average value is reduced by 1/2, yielding, for 4040 instance, a very slight darkening in the luma plane. (In the very 4041 unlikely event of appreciable darkening after a large number of 4042 interframes, a compressor would of course eventually compensate for 4043 this in the selection of predictor and/or residue.) 4045 The derivation of the edge_limit value used above, which depends on 4046 the loop_filter_level and sharpness_level, as well as the type of 4047 edge being processed, will be taken up after we describe the normal 4048 loop filtering algorithm below. 4050 15.3. Normal Filter 4052 The normal loop filter is a refinement of the simple loop filter; all 4053 of the general discussion above applies here as well. In particular, 4054 the functions c, u2s, s2u, abs, and common_adjust are used by both 4055 the normal and simple filters. 4057 As mentioned above, the normal algorithms for inter-macroblock and 4058 inter-subblock edges differ. Nonetheless, they have a great deal in 4059 common: They use similar threshold algorithms to disable the filter 4060 and to detect high internal edge variance (which influences the 4061 filtering algorithm). Both algorithms also use, at least 4062 conditionally, the simple filter adjustment procedure described 4063 above. 4065 The common thresholding algorithms are as follows. 4067 ---- Begin code block -------------------------------------- 4069 /* All functions take (among other things) a segment (of length 4070 at most 4 + 4 = 8) symmetrically straddling an edge. 4072 The pixel values (or pointers) are always given in order, 4073 from the "beforemost" to the "aftermost". So, for a 4074 horizontal edge (written "|"), an 8-pixel segment would be 4075 ordered p3 p2 p1 p0 | q0 q1 q2 q3. */ 4077 /* Filtering is disabled if the difference between any two 4078 adjacent "interior" pixels in the 8-pixel segment exceeds 4079 the relevant threshold (I). A more complex thresholding 4080 calculation is done for the group of four pixels that 4081 straddle the edge, in line with the calculation in 4082 simple_segment() above. */ 4084 int filter_yes( 4085 uint8 I, /* limit on interior differences */ 4086 uint8 E, /* limit at the edge */ 4088 cint8 p3, cint8 p2, cint8 p1, cint8 p0, /* pixels before 4089 edge */ 4090 cint8 q0, cint8 q1, cint8 q2, cint8 q3 /* pixels after 4091 edge */ 4092 ) { 4093 return (abs(p0 - q0)*2 + abs(p1-q1)/2) <= E 4094 && abs(p3 - p2) <= I && abs(p2 - p1) <= I && 4095 abs(p1 - p0) <= I 4096 && abs(q3 - q2) <= I && abs(q2 - q1) <= I && 4097 abs(q1 - q0) <= I; 4098 } 4100 ---- End code block ---------------------------------------- 4101 ---- Begin code block -------------------------------------- 4103 /* Filtering is altered if (at least) one of the differences 4104 on either side of the edge exceeds a threshold (we have 4105 "high edge variance"). */ 4107 int hev( 4108 uint8 threshold, 4109 cint8 p1, cint8 p0, /* pixels before edge */ 4110 cint8 q0, cint8 q1 /* pixels after edge */ 4111 ) { 4112 return abs(p1 - p0) > threshold || abs(q1 - q0) > threshold; 4113 } 4115 ---- End code block ---------------------------------------- 4117 The subblock filter is a variant of the simple filter. In fact, if 4118 we have high edge variance, the adjustment is exactly as for the 4119 simple filter. Otherwise, the simple adjustment (without outer taps) 4120 is applied and the two pixels one step in from the edge pixels are 4121 adjusted by roughly half the amount by which the two edge pixels are 4122 adjusted; since the edge adjustment here is essentially 3/8 the edge 4123 difference, the inner adjustment is approximately 3/16 the edge 4124 difference. 4126 ---- Begin code block -------------------------------------- 4128 void subblock_filter( 4129 uint8 hev_threshold, /* detect high edge variance */ 4130 uint8 interior_limit, /* possibly disable filter */ 4131 uint8 edge_limit, 4132 cint8 *P3, cint8 *P2, int8 *P1, int8 *P0, /* pixels before 4133 edge */ 4134 int8 *Q0, int8 *Q1, cint8 *Q2, cint8 *Q3 /* pixels after 4135 edge */ 4136 ) { 4137 cint8 p3 = u2s(*P3), p2 = u2s(*P2), p1 = u2s(*P1), 4138 p0 = u2s(*P0); 4139 cint8 q0 = u2s(*Q0), q1 = u2s(*Q1), q2 = u2s(*Q2), 4140 q3 = u2s(*Q3); 4142 if( filter_yes( interior_limit, edge_limit, q3, q2, q1, q0, 4143 p0, p1, p2, p3)) 4144 { 4145 const int hv = hev( hev_threshold, p1, p0, q0, q1); 4147 cint8 a = ( common_adjust( hv, P1, P0, Q0, Q1) + 1) >> 1; 4149 if( !hv) { 4150 *Q1 = s2u( q1 - a); 4151 *P1 = s2u( p1 + a); 4152 } 4153 } 4154 } 4156 ---- End code block ---------------------------------------- 4158 The inter-macroblock filter has potentially wider scope. If the edge 4159 variance is high, it performs the simple adjustment (using the outer 4160 taps, just like the simple filter and the corresponding case of the 4161 normal subblock filter). If the edge variance is low, we begin with 4162 the same basic filter calculation and apply multiples of it to pixel 4163 pairs symmetric about the edge; the magnitude of adjustment decays as 4164 we move away from the edge and six of the pixels in the segment are 4165 affected. 4167 ---- Begin code block -------------------------------------- 4169 void MBfilter( 4170 uint8 hev_threshold, /* detect high edge variance */ 4171 uint8 interior_limit, /* possibly disable filter */ 4172 uint8 edge_limit, 4173 cint8 *P3, int8 *P2, int8 *P1, int8 *P0, /* pixels before 4174 edge */ 4175 int8 *Q0, int8 *Q1, int8 *Q2, cint8 *Q3 /* pixels after 4176 edge */ 4177 ) { 4178 cint8 p3 = u2s(*P3), p2 = u2s(*P2), p1 = u2s(*P1), 4179 p0 = u2s(*P0); 4180 cint8 q0 = u2s(*Q0), q1 = u2s(*Q1), q2 = u2s(*Q2), 4181 q3 = u2s(*Q3); 4183 if( filter_yes( interior_limit, edge_limit, q3, q2, q1, q0, 4184 p0, p1, p2, p3)) 4185 { 4186 if( !hev( hev_threshold, p1, p0, q0, q1)) 4187 { 4188 /* Same as the initial calculation in "common_adjust", 4189 w is something like twice the edge difference */ 4191 const int8 w = c( c(p1 - q1) + 3*(q0 - p0) ); 4193 /* 9/64 is approximately 9/63 = 1/7 and 1<<7 = 128 = 4194 2*64. So this a, used to adjust the pixels adjacent 4195 to the edge, is something like 3/7 the edge 4196 difference. */ 4198 int8 a = c( (27*w + 63) >> 7); 4200 *Q0 = s2u( q0 - a); *P0 = s2u( p0 + a); 4202 /* Next two are adjusted by 2/7 the edge difference */ 4204 a = c( (18*w + 63) >> 7); 4206 *Q1 = s2u( q1 - a); *P1 = s2u( p1 + a); 4208 /* Last two are adjusted by 1/7 the edge difference */ 4210 a = c( (9*w + 63) >> 7); 4212 *Q2 = s2u( q2 - a); *P2 = s2u( p2 + a); 4214 } else /* if hev, do simple filter */ 4215 common_adjust( 1, P1, P0, Q0, Q1); /* using outer 4216 taps */ 4217 } 4219 } 4221 ---- End code block ---------------------------------------- 4223 15.4. Calculation of Control Parameters 4225 We conclude the discussion of loop filtering by showing how the 4226 thresholds supplied to the procedures above are derived from the two 4227 control parameters sharpness_level (an unsigned 3-bit number having 4228 maximum value 7) and loop_filter_level (an unsigned 6-bit number 4229 having maximum value 63). 4231 While the sharpness_level is constant over the frame, individual 4232 macroblocks may override the loop_filter_level with one of four 4233 possibilities supplied in the frame header (as described in Chapter 4234 10). 4236 Both the simple and normal filters disable filtering if a value 4237 derived from the four pixels that straddle the edge (2 either side) 4238 exceeds a threshold / limit value. 4240 ---- Begin code block -------------------------------------- 4242 /* Luma and Chroma use the same inter-macroblock edge limit */ 4243 uint8 mbedge_limit = ((loop_filter_level + 2) * 2) + 4244 interior_limit; 4246 /* Luma and Chroma use the same inter-subblock edge limit */ 4247 uint8 sub_bedge_limit = (loop_filter_level * 2) + interior_limit; 4249 ---- End code block ---------------------------------------- 4251 The remaining thresholds are used only by the normal filters. The 4252 filter-disabling interior difference limit is the same for all edges 4253 (luma, chroma, inter-subblock, inter-macroblock) and is given by the 4254 following. 4256 ---- Begin code block -------------------------------------- 4258 uint8 interior_limit = loop_filter_level; 4260 if( sharpness_level) 4261 { 4262 interior_limit >>= sharpness_level > 4 ? 2 : 1; 4263 if( interior_limit > 9 - sharpness_level) 4264 interior_limit = 9 - sharpness_level; 4265 } 4266 if( !interior_limit) 4267 interior_limit = 1; 4269 ---- End code block ---------------------------------------- 4270 Finally, we give the derivation of the high edge-variance threshold, 4271 which is also the same for all edge types. 4273 ---- Begin code block -------------------------------------- 4275 uint8 hev_threshold = 0; 4277 if( we_are_decoding_akey_frame) /* current frame is a key frame */ 4278 { 4279 if( loop_filter_level >= 40) 4280 hev_threshold = 2; 4281 else if( loop_filter_level >= 15) 4282 hev_threshold = 1; 4283 } 4284 else /* current frame is an interframe */ 4285 { 4286 if( loop_filter_level >= 40) 4287 hev_threshold = 3; 4288 else if( loop_filter_level >= 20) 4289 hev_threshold = 2; 4290 else if( loop_filter_level >= 15) 4291 hev_threshold = 1; 4292 } 4294 ---- End code block ---------------------------------------- 4296 16. Interframe Macroblock Prediction Records 4298 We describe the layout and semantics of the prediction records for 4299 macroblocks in an interframe. 4301 After the feature specification (which is described in Chapter 10 and 4302 is identical for intraframes and interframes), there comes a Bool( 4303 prob_intra), which indicates inter-prediction (i.e., prediction from 4304 prior frames) when true and intra-prediction (i.e., prediction from 4305 already-coded portions of the current frame) when false. The zero- 4306 probability prob_intra is set by field J of the frame header. 4308 16.1. Intra-Predicted Macroblocks 4310 For intra-prediction, the layout of the prediction data is 4311 essentially the same as the layout for key frames, although the 4312 contexts used by the decoding process are slightly different. 4314 As discussed in Chapter 8, the "outer" Y mode here uses a different 4315 tree from that used in key frames, repeated here for convenience. 4317 ---- Begin code block -------------------------------------- 4319 const tree_index ymode_tree [2 * (num_ymodes - 1)] = 4320 { 4321 -DC_PRED, 2, /* root: DC_PRED = "0", "1" subtree */ 4322 4, 6, /* "1" subtree has 2 descendant subtrees */ 4323 -V_PRED, -H_PRED, /* "10" subtree: V_PRED = "100", 4324 H_PRED = "101" */ 4325 -TM_PRED, -B_PRED /* "11" subtree: TM_PRED = "110", 4326 B_PRED = "111" */ 4327 }; 4329 ---- End code block ---------------------------------------- 4331 The probability table used to decode this tree is variable. As 4332 described in Chapter 9 (Section 9), it (along with the similarly- 4333 treated UV table) can be updated by field J of the frame header. 4334 Similar to the coefficient-decoding probabilities, such updates are 4335 cumulative and affect all ensuing frames until the next key frame or 4336 explicit update. The default probabilities for the Y and UV tables 4337 are 4339 ---- Begin code block -------------------------------------- 4341 Prob ymode_prob [num_ymodes - 1] = { 112, 86, 140, 37}; 4342 Prob uv_mode_prob [num_uv_modes - 1] = { 162, 101, 204}; 4343 ---- End code block ---------------------------------------- 4345 These defaults must be restored after detection of a key frame. 4347 Just as for key frames, if the Y mode is B_PRED, there next comes an 4348 encoding of the intra_bpred mode used by each of the sixteen Y 4349 subblocks. These encodings use the same tree as does that for key 4350 frames but, in place of the contexts used in key frames, use the 4351 single fixed probability table 4353 ---- Begin code block -------------------------------------- 4355 const Prob bmode_prob [num_intra_bmodes - 1] = { 4356 120, 90, 79, 133, 87, 85, 80, 111, 151 4357 }; 4359 ---- End code block ---------------------------------------- 4361 Last comes the chroma mode, again coded using the same tree as that 4362 for key frames, this time using the dynamic uv_mode_prob table 4363 described above. 4365 The calculation of the intra-prediction buffer is identical to that 4366 described for key frames in Chapter 12. 4368 16.2. Inter-Predicted Macroblocks 4370 Otherwise (when the above bool is true), we are using inter- 4371 prediction (which of course only happens for interframes), to which 4372 we now restrict our attention. 4374 The next datum is then another bool, B( prob_last), selecting the 4375 reference frame. If 0, the reference frame is previous frame (last 4376 frame); if 1, another bool (prob_gf) selects the reference frame 4377 between golden frame (0) or altref frame (1). The probabilities 4378 prob_last and prob_gf are set in field J of the frame header. 4380 Together with setting the reference frame, the purpose of inter-mode 4381 decoding is to set a motion vector for each of the sixteen Y 4382 subblocks of the current macroblock. This then defines the 4383 calculation of the inter-prediction buffer (detailed in Chapter 18). 4384 While the net effect of inter-mode decoding is straightforward, the 4385 implementation is somewhat complex; the (lossless) compression 4386 achieved by this method justifies the complexity. 4388 After the reference frame selector comes the mode (or motion vector 4389 reference) applied to the macroblock as a whole, coded using the 4390 following enumeration and tree. Setting mv_nearest = num_ymodes is a 4391 convenience that allows a single variable to unambiguously hold an 4392 inter- or intraprediction mode. 4394 ---- Begin code block -------------------------------------- 4396 typedef enum 4397 { 4398 mv_nearest = num_ymodes, /* use "nearest" motion vector 4399 for entire MB */ 4400 mv_near, /* use "next nearest" "" */ 4401 mv_zero, /* use zero "" */ 4402 mv_new, /* use explicit offset from 4403 implicit "" */ 4404 mv_split, /* use multiple motion vectors */ 4406 num_mv_refs = mv_split + 1 - mv_nearest 4407 } 4408 mv_ref; 4410 const tree_index mv_ref_tree [2 * (num_mv_refs - 1)] = 4411 { 4412 -mv_zero, 2, /* zero = "0" */ 4413 -mv_nearest, 4, /* nearest = "10" */ 4414 -mv_near, 6, /* near = "110" */ 4415 -mv_new, -mv_split /* new = "1110", split = "1111" */ 4416 }; 4418 ---- End code block ---------------------------------------- 4420 16.3. Mode and Motion Vector Contexts 4422 The probability table used to decode the mv_ref, along with three 4423 reference motion vectors used by the selected mode, is calculated via 4424 a survey of the already-decoded motion vectors in (up to) 3 nearby 4425 macroblocks. 4427 The algorithm generates a sorted list of distinct motion vectors 4428 adjacent to the search site. The best_mv is the vector with the 4429 highest score. The nearest_mv is the non-zero vector with the 4430 highest score. The near_mv is the non-zero vector with the next 4431 highest score. The number of motion vectors coded using the SPLITMV 4432 mode is scored using the same weighting and is returned with the 4433 scores of the best, nearest, and near vectors. 4435 The three adjacent macroblocks above, left, and above-left are 4436 considered in order. If the macroblock is intra-coded, no action is 4437 taken. Otherwise, the motion vector is compared to other previously 4438 found motion vectors to determine if it has been seen before, and if 4439 so contributes its weight to that vector, otherwise enters a new 4440 vector in the list. The above and left vectors have twice the weight 4441 of the above-left vector. 4443 As is the case with many contexts used by VP8, it is possible for 4444 macroblocks near the top or left edges of the image to reference 4445 blocks that are outside the visible image. VP8 provides a border of 4446 1 macroblock filled with 0x0 motion vectors left of the left edge, 4447 and a border filled with 0,0 motion vectors of 1 macroblocks above 4448 the top edge. 4450 Much of the process is more easily described in C than in English. 4451 The reference code for this can be found in findnearmv.c. The 4452 calculation of reference vectors, probability table, and, finally, 4453 the inter-prediction mode itself is implemented as follows. 4455 ---- Begin code block -------------------------------------- 4457 typedef union 4458 { 4459 unsigned int as_int; 4460 MV as_mv; 4461 } int_mv; /* facilitates rapid equality tests */ 4463 static void mv_bias(MODE_INFO *x,int refframe, int_mv *mvp, 4464 int * ref_frame_sign_bias ) 4465 { 4466 MV xmv; 4467 xmv = x->mbmi.mv.as_mv; 4468 if ( ref_frame_sign_bias[x->mbmi.ref_frame] != 4469 ref_frame_sign_bias[refframe] ) 4470 { 4471 xmv.row*=-1; 4472 xmv.col*=-1; 4473 } 4474 mvp->as_mv = xmv; 4475 } 4477 ---- End code block ---------------------------------------- 4478 ---- Begin code block -------------------------------------- 4480 void vp8_clamp_mv(MV *mv, const MACROBLOCKD *xd) 4481 { 4482 if ( mv->col < (xd->mb_to_left_edge - LEFT_TOP_MARGIN) ) 4483 mv->col = xd->mb_to_left_edge - LEFT_TOP_MARGIN; 4484 else if ( mv->col > xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN ) 4485 mv->col = xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN; 4487 if ( mv->row < (xd->mb_to_top_edge - LEFT_TOP_MARGIN) ) 4488 mv->row = xd->mb_to_top_edge - LEFT_TOP_MARGIN; 4489 else if ( mv->row > xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN ) 4490 mv->row = xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN; 4491 } 4493 ---- End code block ---------------------------------------- 4495 In the function vp8_find_near_mvs(), the vectors "nearest" and "near" 4496 are used by the corresponding modes. 4498 The vector best_mv is used as a base for explicitly-coded motion 4499 vectors. 4501 The first three entries in the return value cnt are (in order) 4502 weighted census values for "zero", "nearest", and "near" vectors. 4503 The final value indicates the extent to which SPLIT_MV was used by 4504 the neighboring macroblocks. The largest possible "weight" value in 4505 each case is 5. 4507 ---- Begin code block -------------------------------------- 4509 void vp8_find_near_mvs 4510 ( 4511 MACROBLOCKD *xd, 4512 const MODE_INFO *here, 4513 MV *nearest, 4514 MV *near, 4515 MV *best_mv, 4516 int cnt[4], 4517 int refframe, 4518 int * ref_frame_sign_bias 4519 ) 4520 { 4521 const MODE_INFO *above = here - xd->mode_info_stride; 4522 const MODE_INFO *left = here - 1; 4523 const MODE_INFO *aboveleft = above - 1; 4524 int_mv near_mvs[4]; 4525 int_mv *mv = near_mvs; 4526 int *cntx = cnt; 4527 enum {CNT_ZERO, CNT_NEAREST, CNT_NEAR, CNT_SPLITMV}; 4529 /* Zero accumulators */ 4530 mv[0].as_int = mv[1].as_int = mv[2].as_int = 0; 4531 cnt[0] = cnt[1] = cnt[2] = cnt[3] = 0; 4533 /* Process above */ 4534 if(above->mbmi.ref_frame != INTRA_FRAME) { 4535 if(above->mbmi.mv.as_int) { 4536 (++mv)->as_int = above->mbmi.mv.as_int; 4537 mv_bias(above, refframe, mv, ref_frame_sign_bias); 4538 ++cntx; 4539 } 4540 *cntx += 2; 4541 } 4543 /* Process left */ 4544 if(left->mbmi.ref_frame != INTRA_FRAME) { 4545 if(left->mbmi.mv.as_int) { 4546 int_mv this_mv; 4548 this_mv.as_int = left->mbmi.mv.as_int; 4549 mv_bias(left, refframe, &this_mv, ref_frame_sign_bias); 4551 if(this_mv.as_int != mv->as_int) { 4552 (++mv)->as_int = this_mv.as_int; 4553 ++cntx; 4554 } 4555 *cntx += 2; 4556 } else 4557 cnt[CNT_ZERO] += 2; 4558 } 4560 /* Process above left */ 4561 if(aboveleft->mbmi.ref_frame != INTRA_FRAME) { 4562 if(aboveleft->mbmi.mv.as_int) { 4563 int_mv this_mv; 4565 this_mv.as_int = aboveleft->mbmi.mv.as_int; 4566 mv_bias(aboveleft, refframe, &this_mv, 4567 ref_frame_sign_bias); 4569 if(this_mv.as_int != mv->as_int) { 4570 (++mv)->as_int = this_mv.as_int; 4571 ++cntx; 4572 } 4573 *cntx += 1; 4575 } else 4576 cnt[CNT_ZERO] += 1; 4577 } 4579 /* If we have three distinct MV's ... */ 4580 if(cnt[CNT_SPLITMV]) { 4581 /* See if above-left MV can be merged with NEAREST */ 4582 if(mv->as_int == near_mvs[CNT_NEAREST].as_int) 4583 cnt[CNT_NEAREST] += 1; 4584 } 4586 cnt[CNT_SPLITMV] = ((above->mbmi.mode == SPLITMV) 4587 + (left->mbmi.mode == SPLITMV)) * 2 4588 + (aboveleft->mbmi.mode == SPLITMV); 4590 /* Swap near and nearest if necessary */ 4591 if(cnt[CNT_NEAR] > cnt[CNT_NEAREST]) { 4592 int tmp; 4593 tmp = cnt[CNT_NEAREST]; 4594 cnt[CNT_NEAREST] = cnt[CNT_NEAR]; 4595 cnt[CNT_NEAR] = tmp; 4596 tmp = near_mvs[CNT_NEAREST].as_int; 4597 near_mvs[CNT_NEAREST].as_int = near_mvs[CNT_NEAR].as_int; 4598 near_mvs[CNT_NEAR].as_int = tmp; 4599 } 4601 /* Use near_mvs[0] to store the "best" MV */ 4602 if(cnt[CNT_NEAREST] >= cnt[CNT_ZERO]) 4603 near_mvs[CNT_ZERO] = near_mvs[CNT_NEAREST]; 4605 /* Set up return values */ 4606 *best_mv = near_mvs[0].as_mv; 4607 *nearest = near_mvs[CNT_NEAREST].as_mv; 4608 *near = near_mvs[CNT_NEAR].as_mv; 4610 vp8_clamp_mv(nearest, xd); 4611 vp8_clamp_mv(near, xd); 4612 vp8_clamp_mv(best_mv, xd); //TODO: move this up before 4613 the copy 4614 } 4616 ---- End code block ---------------------------------------- 4618 The mv_ref probability table (mv_ref_p) is then derived from the 4619 census as follows. 4621 ---- Begin code block -------------------------------------- 4623 const int vp8_mode_contexts[6][4] = 4624 { 4625 { 7, 1, 1, 143, }, 4626 { 14, 18, 14, 107, }, 4627 { 135, 64, 57, 68, }, 4628 { 60, 56, 128, 65, }, 4629 { 159, 134, 128, 34, }, 4630 { 234, 188, 128, 28, }, 4631 } 4633 ---- End code block ---------------------------------------- 4635 ---- Begin code block -------------------------------------- 4637 vp8_prob *vp8_mv_ref_probs(vp8_prob mv_ref_p[VP8_MVREFS-1], 4638 int cnt[4]) 4639 { 4640 mv_ref_p[0] = vp8_mode_contexts [cnt[0]] [0]; 4641 mv_ref_p[1] = vp8_mode_contexts [cnt[1]] [1]; 4642 mv_ref_p[2] = vp8_mode_contexts [cnt[2]] [2]; 4643 mv_ref_p[3] = vp8_mode_contexts [cnt[3]] [3]; 4644 return p; 4645 } 4647 ---- End code block ---------------------------------------- 4649 Once mv_ref_p is established, the mv_ref is decoded as usual. 4651 ---- Begin code block -------------------------------------- 4653 mvr = (mv_ref) treed_read( d, mv_ref_tree, mv_ref_p); 4655 ---- End code block ---------------------------------------- 4657 For the first four inter-coding modes, the same motion vector is used 4658 for all the Y subblocks. The first three modes use an implicit 4659 motion vector. 4661 +------------+------------------------------------------------------+ 4662 | Mode | Instruction | 4663 +------------+------------------------------------------------------+ 4664 | mv_nearest | Use the nearest vector returned by | 4665 | | vp8_find_near_mvs. | 4666 | | | 4667 | mv_near | Use the near vector returned by vp8_find_near_mvs. | 4668 | | | 4669 | mv_zero | Use a zero vector, that is, predict the current | 4670 | | macroblock from the corresponding macroblock in the | 4671 | | prediction frame. | 4672 | | | 4673 | NEWMV | This mode is followed by an explicitly-coded motion | 4674 | | vector (the format of which is described in the next | 4675 | | chapter) that is added (component-wise) to the | 4676 | | best_mv reference vector returned by find_near_mvs | 4677 | | and applied to all 16 subblocks. | 4678 +------------+------------------------------------------------------+ 4680 16.4. Split Prediction 4682 The remaining mode (SPLITMV) causes multiple vectors to be applied to 4683 the Y subblocks. It is immediately followed by a partition 4684 specification that determines how many vectors will be specified and 4685 how they will be assigned to the subblocks. The possible partitions, 4686 with indicated subdivisions and coding tree, are as follows. 4688 ---- Begin code block -------------------------------------- 4690 typedef enum 4691 { 4692 mv_top_bottom, /* two pieces {0...7} and {8...15} */ 4693 mv_left_right, /* {0,1,4,5,8,9,12,13} and 4694 {2,3,6,7,10,11,14,15} */ 4695 mv_quarters, /* {0,1,4,5}, {2,3,6,7}, {8,9,12,13}, 4696 {10,11,14,15} */ 4697 MV_16, /* every subblock gets its own vector 4698 {0} ... {15} */ 4700 mv_num_partitions 4701 } 4702 MVpartition; 4704 const tree_index mvpartition_tree [2 * (mvnum_partition - 1)] = 4705 { 4706 -MV_16, 2, /* MV_16 = "0" */ 4707 -mv_quarters, 4, /* mv_quarters = "10" */ 4708 -mv_top_bottom, -mv_left_right /* top_bottom = "110", 4709 left_right = "111" */ 4710 }; 4712 ---- End code block ---------------------------------------- 4714 The partition is decoded using a fixed, constant probability table: 4716 ---- Begin code block -------------------------------------- 4718 const Prob mvpartition_probs [mvnum_partition - 1] = 4719 { 110, 111, 150}; 4720 part = (MVpartition) treed_read( d, mvpartition_tree, 4721 mvpartition_probs); 4723 ---- End code block ---------------------------------------- 4725 After the partition come two (for mv_top_bottom or mv_left_right), 4726 four (for mv_quarters), or sixteen (for MV_16) subblock inter- 4727 prediction modes. These modes occur in the order indicated by the 4728 partition layouts (given as comments to the MVpartition enum) and are 4729 coded as follows. (As was done for the macroblock-level modes, we 4730 offset the mode enumeration so that a single variable may 4731 unambiguously hold either an intra- or inter-subblock mode.) 4733 Prior to decoding each subblock, a decoding tree context is chosen as 4734 illustrated in the code snippet below. The context is based on the 4735 immediate left and above subblock neighbors, and whether they are 4736 equal, are zero, or a combination of those. 4738 ---- Begin code block -------------------------------------- 4740 typedef enum 4741 { 4742 LEFT4x4 = num_intra_bmodes, /* use already-coded MV to 4743 my left */ 4744 ABOVE4x4, /* use already-coded MV above me */ 4745 ZERO4x4, /* use zero MV */ 4746 NEW4x4, /* explicit offset from "best" */ 4748 num_sub_mv_ref 4749 }; 4750 sub_mv_ref; 4752 const tree_index sub_mv_ref_tree [2 * (num_sub_mv_ref - 1)] = 4753 { 4754 -LEFT4X4, 2, /* LEFT = "0" */ 4755 -ABOVE4X4, 4, /* ABOVE = "10" */ 4756 -ZERO4X4, -NEW4X4 /* ZERO = "110", NEW = "111" */ 4757 }; 4759 /* Choose correct decoding tree context 4760 * Function parameters are left subblock neighbor MV and above 4761 * subblock neighbor MV */ 4762 int vp8_mvCont(MV *l, MV*a) 4763 { 4764 int lez = (l->row == 0 && l->col == 0); /* left neighbour 4765 is zero */ 4766 int aez = (a->row == 0 && a->col == 0); /* above neighbour 4767 is zero */ 4768 int lea = (l->row == a->row && l->col == a->col); /* left 4769 neighbour equals above neighbour */ 4771 if(lea && lez) 4772 return SUBMVREF_LEFT_ABOVE_ZED; /* =4 */ 4774 if(lea) 4775 return SUBMVREF_LEFT_ABOVE_SAME; /* =3 */ 4777 if(aez) 4778 return SUBMVREF_ABOVE_ZED; /* =2 */ 4780 if(lez) 4781 return SUBMVREF_LEFT_ZED; /* =1*/ 4783 return SUBMVREF_NORMAL; /* =0 */ 4785 } 4787 /* Constant probabilities and decoding procedure. */ 4789 const Prob sub_mv_ref_prob [5][num_sub_mv_ref - 1] = { 4790 { 147,136,18 }, 4791 { 106,145,1 }, 4792 { 179,121,1 }, 4793 { 223,1 ,34 }, 4794 { 208,1 ,1 } }; 4796 sub_ref = (sub_mv_ref) treed_read( d, sub_mv_ref_tree, 4797 sub_mv_ref_prob[context]); 4799 ---- End code block ---------------------------------------- 4801 The first two sub-prediction modes simply copy the already-coded 4802 motion vectors used by the blocks above and to-the-left of the 4803 subblock at the upper left corner of the current subset (i.e., 4804 collection of subblocks being predicted). These prediction blocks 4805 need not lie in the current macroblock and, if the current subset 4806 lies at the top or left edges of the frame, need not lie in the 4807 frame. In this latter case, their motion vectors are taken to be 4808 zero, as are subblock motion vectors within an intra-predicted 4809 macroblock. Also, to ensure the correctness of prediction within 4810 this macroblock, all subblocks lying in an already-decoded subset of 4811 the current macroblock must have their motion vectors set. 4813 ZERO4x4 uses a zero motion vector and predicts the current subset 4814 using the corresponding subset from the prediction frame. 4816 NEW4x4 is exactly like NEWMV except applied only to the current 4817 subset. It is followed by a 2-dimensional motion vector offset 4818 (described in the next chapter) that is added to the best vector 4819 returned by the earlier call to find_near_mvs to form the motion 4820 vector in effect for the subset. 4822 Parsing of both inter-prediction modes and motion vectors (described 4823 next) can be found in the reference decoder file decodemv.c. 4825 17. Motion Vector Decoding 4827 As discussed above, motion vectors appear in two places in the VP8 4828 datastream: applied to whole macroblocks in NEWMV mode and applied to 4829 subsets of macroblocks in NEW4x4 mode. The format of the vectors is 4830 identical in both cases. 4832 Each vector has two pieces: A vertical component (row) followed by a 4833 horizontal component (column). The row and column use separate 4834 coding probabilities but are otherwise represented identically. 4836 17.1. Coding of Each Component 4838 Each component is a signed integer "V" representing a vertical or 4839 horizontal luma displacement of "V" quarter-pixels (and a chroma 4840 displacement of "V" eighth-pixels). The absolute value of "V", if 4841 non-zero, is followed by a boolean sign. "V" may take any value 4842 between -255 and +255, inclusive. 4844 The absolute value "A" is coded in one of two different ways 4845 according to its size. For 0 <= "A" <= 7, "A" is tree-coded, and for 4846 8 <= "A" <= 255, the bits in the binary expansion of "A" are coded 4847 using independent boolean probabilities. The coding of "A" begins 4848 with a bool specifying which range is in effect. 4850 Decoding a motion vector component then requires a 17-position 4851 probability table, whose offsets, along with the procedure used to 4852 decode components, are as follows: 4854 ---- Begin code block -------------------------------------- 4856 typedef enum 4857 { 4858 mvpis_short, /* short (<= 7) vs long (>= 8) */ 4859 MVPsign, /* sign for non-zero */ 4860 MVPshort, /* 8 short values = 7-position tree */ 4862 MVPbits = MVPshort + 7, /* 8 long value bits 4863 w/independent probs */ 4865 MVPcount = MVPbits + 8 /* 17 probabilities in total */ 4866 } 4867 MVPindices; 4869 typedef Prob MV_CONTEXT [MVPcount]; /* Decoding spec for 4870 a single component */ 4872 /* Tree used for small absolute values (has expected 4873 correspondence). */ 4875 const tree_index small_mvtree [2 * (8 - 1)] = 4876 { 4877 2, 8, /* "0" subtree, "1" subtree */ 4878 4, 6, /* "00" subtree", "01" subtree */ 4879 -0, -1, /* 0 = "000", 1 = "001" */ 4880 -2, -3, /* 2 = "010", 3 = "011" */ 4881 10, 12, /* "10" subtree, "11" subtree */ 4882 -4, -5, /* 4 = "100", 5 = "101" */ 4883 -6, -7 /* 6 = "110", 7 = "111" */ 4884 }; 4886 /* Read MV component at current decoder position, using 4887 supplied probs. */ 4889 int read_mvcomponent( bool_decoder *d, const MV_CONTEXT *mvc) 4890 { 4891 const Prob * const p = (const Prob *) mvc; 4892 int A = 0; 4894 if( read_bool( d, p [mvpis_short])) /* 8 <= A <= 255 */ 4895 { 4896 /* Read bits 0, 1, 2 */ 4898 int i = 0; 4899 do { A += read_bool( d, p [MVPbits + i]) << i;} 4900 while( ++i < 3); 4902 /* Read bits 7, 6, 5, 4 */ 4904 i = 7; 4905 do { A += read_bool( d, p [MVPbits + i]) << i;} 4906 while( --i > 3); 4908 /* We know that A >= 8 because it is coded long, 4909 so if A <= 15, bit 3 is one and is not 4910 explicitly coded. */ 4912 if( !(A & 0xfff0) || read_bool( d, p [MVPbits + 3])) 4913 A += 8; 4914 } 4915 else /* 0 <= A <= 7 */ 4916 A = treed_read( d, small_mvtree, p + MVPshort); 4918 return A && read_bool( r, p [MVPsign]) ? -A : A; 4919 } 4920 ---- End code block ---------------------------------------- 4922 17.2. Probability Updates 4924 The decoder should maintain an array of two MV_CONTEXTs for decoding 4925 row and column components, respectively. These MV_CONTEXTs should be 4926 set to their defaults every key frame. Each individual probability 4927 may be updated every interframe (by field "J" of the frame header) 4928 using a constant table of update probabilities. Each optional update 4929 is of the form B? P(7), that is, a bool followed by a 7-bit 4930 probability specification if true. 4932 As with other dynamic probabilities used by VP8, the updates remain 4933 in effect until the next key frame or until replaced via another 4934 update. 4936 In detail, the probabilities should then be managed as follows. 4938 ---- Begin code block -------------------------------------- 4940 /* Never-changing table of update probabilities for each 4941 individual probability used in decoding motion vectors. */ 4943 const MV_CONTEXT vp8_mv_update_probs[2] = 4944 { 4945 { 4946 237, 4947 246, 4948 253, 253, 254, 254, 254, 254, 254, 4949 254, 254, 254, 254, 254, 250, 250, 252, 254, 254 4950 }, 4951 { 4952 231, 4953 243, 4954 245, 253, 254, 254, 254, 254, 254, 4955 254, 254, 254, 254, 254, 251, 251, 254, 254, 254 4956 } 4957 }; 4959 /* Default MV decoding probabilities. */ 4961 const MV_CONTEXT default_mv_context[2] = 4962 { 4963 { // row 4964 162, // is short 4965 128, // sign 4966 225, 146, 172, 147, 214, 39, 156, // short tree 4967 128, 129, 132, 75, 145, 178, 206, 239, 254, 254 // long bits 4969 }, 4971 { // same for column 4972 164, // is short 4973 128, 4974 204, 170, 119, 235, 140, 230, 228, 4975 128, 130, 130, 74, 148, 180, 203, 236, 254, 254 // long bits 4977 } 4978 }; 4980 /* Current MV decoding probabilities, set to above defaults 4981 every key frame. */ 4983 MV_CONTEXT mvc [2]; /* always row, then column */ 4985 /* Procedure for decoding a complete motion vector. */ 4987 typedef struct { int16 row, col;} MV; /* as in previous chapter */ 4989 MV read_mv( bool_decoder *d) 4990 { 4991 MV v; 4992 v.row = (int16) read_mvcomponent( d, mvc); 4993 v.col = (int16) read_mvcomponent( d, mvc + 1); 4994 return v; 4995 } 4997 /* Procedure for updating MV decoding probabilities, called 4998 every interframe with "d" at the appropriate position in 4999 the frame header. */ 5001 void update_mvcontexts( bool_decoder *d) 5002 { 5003 int i = 0; 5004 do { /* component = row, then column */ 5005 const Prob *up = mv_update_probs[i]; /* update probs 5006 for component */ 5007 Prob *p = mvc[i]; /* start decode tbl "" */ 5008 Prob * const pstop = p + MVPcount; /* end decode tbl "" */ 5009 do { 5010 if( read_bool( d, *up++)) /* update this position */ 5011 { 5012 const Prob x = read_literal( d, 7); 5014 *p = x? x<<1 : 1; 5015 } 5016 } while( ++p < pstop); /* next position */ 5018 } while( ++i < 2); /* next component */ 5019 } 5021 ---- End code block ---------------------------------------- 5023 This completes the description of the motion-vector decoding 5024 procedure and, with it, the procedure for decoding interframe 5025 macroblock prediction records. 5027 18. Interframe Prediction 5029 Given an inter-prediction specification for the current macroblock, 5030 that is, a reference frame together with a motion vector for each of 5031 the sixteen Y subblocks, we describe the calculation of the 5032 prediction buffer for the macroblock. Frame reconstruction is then 5033 completed via the previously-described processes of residue summation 5034 (Chapter 14 (Section 14)) and loop filtering (Chapter 15 5035 (Section 15)). 5037 The management of inter-predicted subblocks may be found in the 5038 reference decoder file reconinter.c; sub-pixel interpolation is 5039 implemented in filter_c.c. 5041 18.1. Bounds on and Adjustment of Motion Vectors 5043 It is possible within the VP8 format for a block or macroblock to 5044 have an arbitrarily large motion vectors, due to the fact that each 5045 motion vector is differentially encoded without any clamp from a 5046 neighboring block or macroblock. 5048 Because the motion vectors applied to the chroma subblocks have 1/8 5049 pixel resolution, the synthetic pixel calculation, outlined in 5050 Chapter 5 and detailed below, uses this resolution for the luma 5051 subblocks as well. In accordance, the stored luma motion vectors are 5052 all doubled, each component of each luma vector becoming an even 5053 integer in the range -510 to +510, inclusive. 5055 The vector applied to each chroma subblock is calculated by averaging 5056 the vectors for the 4 luma subblocks occupying the same visible area 5057 as the chroma subblock in the usual correspondence, that is, the 5058 vector for U and V block 0 is the average of the vectors for the Y 5059 subblocks { 0, 1, 4, 5}, chroma block 1 corresponds to Y blocks { 2, 5060 3, 6, 7}, chroma block 2 to Y blocks { 8, 9, 12, 13}, and chroma 5061 block 3 to Y blocks { 10, 11, 14, 15}. 5063 In detail, each of the two components of the vectors for each of the 5064 chroma subblocks is calculated from the corresponding luma vector 5065 components as follows: 5067 ---- Begin code block -------------------------------------- 5069 int avg( int c1, int c2, int c3, int c4) 5070 { 5071 int s = c1 + c2 + c3 + c4; 5073 /* The shift divides by 8 (not 4) because chroma pixels 5074 have half the diameter of luma pixels. The handling 5075 of negative motion vector components is slightly 5076 cumbersome because, strictly speaking, right shifts 5077 of negative numbers are not well-defined in C. */ 5079 return s >= 0 ? (s + 4) >> 3 : -( (-s + 4) >> 3); 5080 } 5082 ---- End code block ---------------------------------------- 5084 Furthermore, if the version number in the frame tag specifies only 5085 full-pel chroma motion vectors, then the fractional parts of both 5086 components of the vector are truncated to zero, as illustrated in the 5087 following pseudo-code (assuming 3 bits of fraction for both luma and 5088 chroma vectors): 5090 ---- Begin code block -------------------------------------- 5092 x = x & (~7); 5093 y = y & (~7); 5095 ---- End code block ---------------------------------------- 5097 Earlier in this document we described the vp8_clamp_mv() function to 5098 limit "nearest" and "near" motion vector predictors inside specified 5099 margins within the frame boundaries. Additional clamping is 5100 performed for NEW_MV macroblocks, for which the final motion vector 5101 is clamped again after combining the "best" predictor and the 5102 differential vector decoded from the stream. 5104 However, the secondary clamping is not performed for SPLIT_MV 5105 macroblocks, meaning that any subblock's motion vector within the 5106 SPLIT_MV macroblock may point outside the clamping zone. These non- 5107 clamped vectors are also used when determining the decoding tree 5108 context for subsequent subblocks' modes in the vp8_mvCont() function. 5110 18.2. Prediction Subblocks 5112 The prediction calculation for each subblock is then as follows. 5113 Temporarily disregarding the fractional part of the motion vector 5114 (that is, rounding "up" or "left" by right-shifting each component 3 5115 bits with sign propagation) and adding the origin (upper left 5116 position) of the (16x16 luma or 8x8 chroma) current macroblock gives 5117 us an origin in the Y, U, or V plane of the predictor frame (either 5118 the golden frame or previous frame). 5120 Considering that origin to be the upper left corner of a (luma or 5121 chroma) macroblock, we need to specify the relative positions of the 5122 pixels associated to that subblock, that is, any pixels that might be 5123 involved in the sub-pixel interpolation processes for the subblock. 5125 18.3. Sub-pixel Interpolation 5127 The sub-pixel interpolation is effected via two one-dimensional 5128 convolutions. These convolutions may be thought of as operating on a 5129 two-dimensional array of pixels whose origin is the subblock origin, 5130 that is the origin of the prediction macroblock described above plus 5131 the offset to the subblock. Because motion vectors are arbitrary, so 5132 are these "prediction subblock origins". 5134 The integer part of the motion vector is subsumed in the origin of 5135 the prediction subblock, the 16 (synthetic) pixels we need to 5136 construct are given by 16 offsets from the origin. The integer part 5137 of each of these offsets is the offset of the corresponding pixel 5138 from the subblock origin (using the vertical stride). To these 5139 integer parts is added a constant fractional part, which is simply 5140 the difference between the actual motion vector and its integer 5141 truncation used to calculate the origins of the prediction macroblock 5142 and subblock. Each component of this fractional part is an integer 5143 between 0 and 7, representing a forward displacement in eighths of a 5144 pixel. 5146 It is these fractional displacements that determine the filtering 5147 process. If they both happen to be zero (that is, we had a "whole 5148 pixel" motion vector), the prediction subblock is simply copied into 5149 the corresponding piece of the current macroblock's prediction 5150 buffer. As discussed in Chapter 14, the layout of the macroblock's 5151 prediction buffer can depend on the specifics of the reconstruction 5152 implementation chosen. Of course, the vertical displacement between 5153 lines of the prediction subblock is given by the stride, as are all 5154 vertical displacements used here. 5156 Otherwise, at least one of the fractional displacements is non-zero. 5157 We then synthesize the missing pixels via a horizontal, followed by a 5158 vertical, one-dimensional interpolation. 5160 The two interpolations are essentially identical. Each uses an (at 5161 most) six-tap filter (the choice of which of course depends on the 5162 one-dimensional offset). Thus, every calculated pixel references at 5163 most three pixels before (above or to-the-left of) it and at most 5164 three pixels after (below or to-the-right of) it. The horizontal 5165 interpolation must calculate two extra rows above and three extra 5166 rows below the 4x4 block, to provide enough samples for the vertical 5167 interpolation to proceed. 5169 Depending on the reconstruction filter type given in the field 5170 Version Number in the frame tag, either a bicubic or a bilinear tap 5171 set is used. 5173 The exact implementation of subsampling is as follows. 5175 ---- Begin code block -------------------------------------- 5177 /* Filter taps taken to 7-bit precision. 5178 Because DC is always passed, taps always sum to 128. */ 5180 const int BilinearFilters[8][6] = 5181 { 5182 { 0, 0, 128, 0, 0, 0 }, 5183 { 0, 0, 112, 16, 0, 0 }, 5184 { 0, 0, 96, 32, 0, 0 }, 5185 { 0, 0, 80, 48, 0, 0 }, 5186 { 0, 0, 64, 64, 0, 0 }, 5187 { 0, 0, 48, 80, 0, 0 }, 5188 { 0, 0, 32, 96, 0, 0 }, 5189 { 0, 0, 16, 112, 0, 0 } 5190 }; 5192 const int filters [8] [6] = { /* indexed by displacement */ 5193 { 0, 0, 128, 0, 0, 0 }, /* degenerate whole-pixel */ 5194 { 0, -6, 123, 12, -1, 0 }, /* 1/8 */ 5195 { 2, -11, 108, 36, -8, 1 }, /* 1/4 */ 5196 { 0, -9, 93, 50, -6, 0 }, /* 3/8 */ 5197 { 3, -16, 77, 77, -16, 3 }, /* 1/2 is symmetric */ 5198 { 0, -6, 50, 93, -9, 0 }, /* 5/8 = reverse of 3/8 */ 5199 { 1, -8, 36, 108, -11, 2 }, /* 3/4 = reverse of 1/4 */ 5200 { 0, -1, 12, 123, -6, 0 } /* 7/8 = reverse of 1/8 */ 5201 }; 5203 /* One-dimensional synthesis of a single sample. 5204 Filter is determined by fractional displacement */ 5206 Pixel interp( 5207 const int fil[6], /* filter to apply */ 5208 const Pixel *p, /* origin (rounded "before") in 5209 prediction area */ 5211 const int s /* size of one forward step "" */ 5212 ) { 5213 int32 a = 0; 5214 int i = 0; 5215 p -= s + s; /* move back two positions */ 5217 do { 5218 a += *p * fil[i]; 5219 p += s; 5220 } while( ++i < 6); 5222 return clamp255( (a + 64) >> 7); /* round to nearest 5223 8-bit value */ 5224 } 5226 /* First do horizontal interpolation, producing intermediate 5227 buffer. */ 5229 void Hinterp( 5230 Pixel temp[9][4], /* 9 rows of 4 (intermediate) 5231 destination values */ 5232 const Pixel *p, /* subblock origin in prediction 5233 frame */ 5234 int s, /* vertical stride to be used in 5235 prediction frame */ 5236 uint hfrac, /* 0 <= horizontal displacement <= 7 */ 5237 uint bicubic /* 1=bicubic filter, 0=bilinear */ 5238 ) { 5239 const int * const fil = bicubic ? filters [hfrac] : 5240 BilinearFilters[hfrac]; 5242 int r = 0; do /* for each row */ 5243 { 5244 int c = 0; do /* for each destination sample */ 5245 { 5246 /* Pixel separation = one horizontal step = 1 */ 5248 temp[r][c] = interp( fil, p + c, 1); 5249 } 5250 while( ++c < 4); 5251 } 5252 while( p += s, ++r < 9); /* advance p to next row */ 5253 } 5255 /* Finish with vertical interpolation, producing final results. 5256 Input array "temp" is of course that computed above. */ 5258 void Vinterp( 5259 Pixel final[4][4], /* 4 rows of 4 (final) destination values */ 5260 const Pixel temp[9][4], 5261 uint vfrac, /* 0 <= vertical displacement <= 7 */ 5262 uint bicubic /* 1=bicubic filter, 0=bilinear */ 5263 ) { 5264 const int * const fil = bicubic ? filters [vfrac] : 5265 BilinearFilters[vfrac]; 5267 int r = 0; do /* for each row */ 5268 { 5269 int c = 0; do /* for each destination sample */ 5270 { 5271 /* Pixel separation = one vertical step = width 5272 of array = 4 */ 5274 final[r][c] = interp( fil, temp[r] + c, 4); 5275 } 5276 while( ++c < 4); 5277 } 5278 while( ++r < 4); 5279 } 5281 ---- End code block ---------------------------------------- 5283 18.4. Filter Properties 5285 We discuss briefly the rationale behind the choice of filters. Our 5286 approach is necessarily cursory; a genuinely accurate discussion 5287 would require a couple of books. Readers unfamiliar with signal 5288 processing may or may not wish to skip this. 5290 All digital signals are of course sampled in some fashion. The case 5291 where the inter-sample spacing (say in time for audio samples, or 5292 space for pixels) is uniform, that is, the same at all positions, is 5293 particularly common and amenable to analysis. Many aspects of the 5294 treatment of such signals are best-understood in the frequency domain 5295 via Fourier Analysis, particularly those aspects of the signal that 5296 are not changed by shifts in position, especially when those 5297 positional shifts are not given by a whole number of samples. 5299 Non-integral translates of a sampled signal are a textbook example of 5300 the foregoing. In our case of non-integral motion vectors, we wish 5301 to say what the underlying image "really is" at these pixels we don't 5302 have values for but feel that it makes sense to talk about. The 5303 correctness of this feeling is predicated on the underlying signal 5304 being band-limited, that is, not containing any energy in spatial 5305 frequencies that cannot be faithfully rendered at the pixel 5306 resolution at our disposal. In one dimension, this range of "OK" 5307 frequencies is called the Nyquist band; in our two-dimensional case 5308 of integer-grid samples, this range might be termed a Nyquist 5309 rectangle. The finer the grid, the more we know about the image, and 5310 the wider the Nyquist rectangle. 5312 It turns out that, for such band-limited signals, there is indeed an 5313 exact mathematical formula to produce the correct sample value at an 5314 arbitrary point. Unfortunately, this calculation requires the 5315 consideration of every single sample in the image, as well as needing 5316 to operate at infinite precision. Also, strictly speaking, all band- 5317 limited signals have infinite spatial (or temporal) extent, so 5318 everything we are discussing is really some sort of approximation. 5320 It is true that the theoretically correct subsampling procedure, as 5321 well as any approximation thereof, is always given by a translation- 5322 invariant weighted sum (or filter) similar to that used by VP8. It 5323 is also true that the reconstruction error made by such a filter can 5324 be simply represented as a multiplier in the frequency domain, that 5325 is, such filters simply multiply the Fourier transform of any signal 5326 to which they are applied by a fixed function associated to the 5327 filter. This fixed function is usually called the frequency response 5328 (or transfer function); the ideal subsampling filter has a frequency 5329 response equal to one in the Nyquist rectangle and zero everywhere 5330 else. 5332 Another basic fact about approximations to "truly correct" 5333 subsampling is that, the wider the subrectangle (within the Nyquist 5334 rectangle) of spatial frequencies one wishes to "pass" (that is, 5335 correctly render) or, put more accurately, the closer one wishes to 5336 approximate the ideal transfer function, the more samples of the 5337 original signal must be considered by the subsampling, and the wider 5338 the calculation precision necessitated. 5340 The filters chosen by VP8 were chosen, within the constraints of 4 or 5341 6 taps and 7-bit precision, to do the best possible job of handling 5342 the low spatial frequencies near the zeroth DC frequency along with 5343 introducing no resonances (places where the absolute value of the 5344 frequency response exceeds one). 5346 The justification for the foregoing has two parts. First, resonances 5347 can produce extremely objectionable visible artifacts when, as often 5348 happens in actual compressed video streams, filters are applied 5349 repeatedly. Second, the vast majority of energy in real-world images 5350 lies near DC and not at the high-end; also, roughly speaking, human 5351 perception tends to be more sensitive at these lower spatial 5352 frequencies. 5354 To get slightly more specific, the filters chosen by VP8 are the best 5355 resonance-free 4- or 6-tap filters possible, where "best" describes 5356 the frequency response near the origin: the response at 0 is required 5357 to be 1 and the graph of the response at 0 is as flat as possible. 5359 To provide an intuitively more obvious point of reference, the "best" 5360 2-tap filter is given by simple linear interpolation between the 5361 surrounding actual pixels. 5363 Finally, it should be noted that, because of the way motion vectors 5364 are calculated, the (shorter) 4-tap filters (used for odd fractional 5365 displacements) are applied in the chroma plane only. Human color 5366 perception is notoriously poor, especially where higher spatial 5367 frequencies are involved. The shorter filters are easier to 5368 understand mathematically, and the difference between them and a 5369 theoretically slightly better 6-tap filter is negligible where chroma 5370 is concerned. 5372 19. Annex A: Bitstream Syntax 5374 This annex presents the bitstream syntax in a tabular form. All the 5375 information elements have been introduced and explained in the 5376 previous chapters but are collected here for a quick reference. Each 5377 syntax element is shortly described after the tabular representation 5378 along with a reference to the corresponding paragraph in the main 5379 document. The meaning of each syntax element value is not repeated 5380 here. 5382 The top-level hierarchy of the bitstream is introduced in Chapter 4 5383 (Section 4). 5385 Definition of syntax element coding types can be found in Chapter 8 5386 (Section 8). The types used in the representation in this annex are: 5388 o f(n), n-bit value from stream (n successive bits, not boolean 5389 encoded) 5391 o L(n), n-bit number encoded as n booleans (with equal probability 5392 of being 0 or 1) 5394 o B(p), bool with probability p of being 0 5396 o T, tree-encoded value 5398 19.1. Uncompressed Data Chunk 5400 +----------------------+-------+ 5401 | Frame Tag | Type | 5402 +----------------------+-------+ 5403 | frame_tag | f(24) | 5404 | | | 5405 | if (key_frame) { | | 5406 | | | 5407 | start_code | f(24) | 5408 | | | 5409 | horizontal_size_code | f(16) | 5410 | | | 5411 | vertical_size_code | f(16) | 5412 | | | 5413 | } | | 5414 +----------------------+-------+ 5416 The 3-byte frame tag can be parsed as follows: 5418 ---- Begin code block -------------------------------------- 5420 unsigned char *c = pbi->Source; 5421 unsigned int tmp; 5423 tmp = (c[2] << 16) | (c[1] << 8) | c[0]; 5425 key_frame = tmp & 0x1; 5426 version = (tmp >> 1) & 0x7; 5427 show_frame = (tmp >> 4) & 0x1; 5428 first_part_size = (tmp >> 5) & 0x7FFFF; 5430 ---- End code block ---------------------------------------- 5432 Where: 5434 o key_frame indicates if the current frame is a key frame or not 5436 o version determines the bitstream version 5438 o show_frame indicates if the current frame is meant to be displayed 5439 or not 5441 o first_part_size determines the size of the first partition 5442 (control partition) 5444 The start_code is a constant 3-byte pattern having value 0x9d012a. 5445 The latter part of the uncompressed chunk (after the start_code) can 5446 be parsed as follows: 5448 ---- Begin code block -------------------------------------- 5450 unsigned char *c = pbi->Source + 6; 5451 unsigned int tmp; 5453 tmp = (c[1] << 8) | c[0]; 5455 width = tmp & 0x3FFF; 5456 horizontal_scale = tmp >> 14; 5458 tmp = (c[3] << 8) | c[2]; 5460 height = tmp & 0x3FFF; 5461 vertical_scale = tmp >> 14; 5463 ---- End code block ---------------------------------------- 5465 19.2. Frame Header 5467 +-------------------------------+------+ 5468 | Frame Header | Type | 5469 +-------------------------------+------+ 5470 | if (key_frame) { | | 5471 | | | 5472 | color_space | L(1) | 5473 | | | 5474 | clamping_type | L(1) | 5475 | | | 5476 | } | | 5477 | | | 5478 | segmentation_enabled | L(1) | 5479 | | | 5480 | if (segmentation_enabled) { | | 5481 | | | 5482 | update_segmentation() | | 5483 | | | 5484 | } | | 5485 | | | 5486 | filter_type | L(1) | 5487 | | | 5488 | loop_filter_level | L(6) | 5489 | | | 5490 | sharpness_level | L(3) | 5491 | | | 5492 | mb_lf_adjustments() | | 5493 | | | 5494 | log2_nbr_of_dct_partitions | L(2) | 5495 | | | 5496 | quant_indices() | | 5497 | | | 5498 | if (key_frame) { | | 5499 | | | 5500 | refresh_entropy_probs | L(1) | 5501 | | | 5502 | } else { | | 5503 | | | 5504 | refresh_golden_frame | L(1) | 5505 | | | 5506 | refresh_alternate_frame | L(1) | 5507 | | | 5508 | if (!refresh_golden_frame) | | 5509 | | | 5510 | copy_buffer_to_golden | L(2) | 5511 | | | 5512 | if (!refresh_alternate_frame) | | 5513 | copy_buffer_to_alternate | L(2) | 5514 | | | 5515 | sign_bias_golden | L(1) | 5516 | | | 5517 | sign_bias_alternate | L(1) | 5518 | | | 5519 | refresh_entropy_probs | L(1) | 5520 | | | 5521 | refresh_last | L(1) | 5522 | | | 5523 | } | | 5524 | | | 5525 | token_prob_update() | | 5526 | | | 5527 | mb_no_coeff_skip | L(1) | 5528 +-------------------------------+------+ 5530 +--------------------------------------+------+ 5531 | Frame Header | Type | 5532 +--------------------------------------+------+ 5533 | prob_skip_false | L(8) | 5534 | | | 5535 | if (!key_frame) { | | 5536 | | | 5537 | prob_intra | L(8) | 5538 | | | 5539 | prob_last | L(8) | 5540 | | | 5541 | prob_golden | L(8) | 5542 | | | 5543 | intra_16x16_prob_update_flag | L(1) | 5544 | | | 5545 | if (intra_16x16_prob_update_flag) { | | 5546 | | | 5547 | for (i = 0; i < 4; i++) | | 5548 | | | 5549 | intra_16x16_prob | L(8) | 5550 | | | 5551 | } | | 5552 | | | 5553 | intra_chroma prob_update_flag | L(1) | 5554 | | | 5555 | if (intra_chroma_prob_update_flag) { | | 5556 | | | 5557 | for (i = 0; i < 3; i++) | | 5558 | | | 5559 | intra_chroma_prob | L(8) | 5560 | | | 5561 | } | | 5562 | | | 5563 | mv_prob_update() | | 5564 | | | 5565 | } | | 5566 +--------------------------------------+------+ 5568 o color_space defines the YUV color space of the sequence (9.2 5569 (Section 9.2)) 5571 o clamping_type specifies if the decoder is required to clamp the 5572 reconstructed pixel values (9.2 (Section 9.2)) 5574 o segmentation_enabled enables the segmentation feature for the 5575 current frame (9.3 (Section 9.3)) 5577 o filter_type determines whether the normal or the simple loop 5578 filter is used (9.4 (Section 9.4), 15 (Section 15)) 5580 o loop_filter_level controls the deblocking filter (9.4 5581 (Section 9.4), 15 (Section 15)) 5583 o sharpness_level controls the deblocking filter (9.4 (Section 9.4), 5584 15 (Section 15)) 5586 o log2_nbr_of_dct_partitions determines the number of separate 5587 partitions containing the DCT coefficients of the macroblocks (9.5 5588 (Section 9.5)) 5590 o refresh_entropy_probs determines whether updated token 5591 probabilities are used only for this frame or until further update 5593 o refresh_golden_frame determines if the current decoded frame 5594 refreshes the golden frame (9.7 (Section 9.7)) 5596 o refresh_alternate_frame determines if the current decoded frame 5597 refreshes the alternate reference frame (9.7 (Section 9.7)) 5599 o copy_buffer_to_golden determines if the golden reference is 5600 replaced by another reference (9.7 (Section 9.7)) 5602 o copy_buffer_to_alternate determines if the alternate reference is 5603 replaced by another reference (9.7 (Section 9.7)) 5605 o sign_bias_golden controls the sign of motion vectors when the 5606 golden frame is referenced (9.7 (Section 9.7)) 5608 o sign_bias_alternate controls the sign of motion vectors when the 5609 alternate frame is referenced (9.7 (Section 9.7)) 5611 o refresh_last determines if the current decoded frame refreshes the 5612 last frame reference buffer (9.8 (Section 9.8)) 5614 o mb_no_coeff_skip enables or disables the skipping of macroblocks 5615 containing no non-zero coefficients (9.10 (Section 9.10)) 5617 o prob_skip_false the probability that the macroblock is not skipped 5618 (flag indicating skipped macroblock is false) (9.10 5619 (Section 9.10)) 5621 o prob_intra the probability of an intra macroblock (9.10 5622 (Section 9.10)) 5624 o prob_last the probability that the last reference frame is used 5625 for inter prediction (9.10 (Section 9.10)) 5627 o prob_golden the probability that the golden reference frame is 5628 used for inter prediction (9.10 (Section 9.10)) 5630 o intra_16x16_prob_update_flag indicates if the branch probabilies 5631 used in the decoding of luma intra prediction mode are updated 5632 (9.10 (Section 9.10)) 5634 o intra_16x16_prob the branch probabilities of the luma intra 5635 prediction mode decoding tree 5637 o intra_chroma_prob_update_flag indicates if the branch probabilies 5638 used in the decoding of chroma intra prediction mode are updated 5639 (9.10 (Section 9.10)) 5641 o intra_chroma_prob the branch probabilities of the chroma intra 5642 prediction mode decoding tree 5644 +------------------------------------+------+ 5645 | update_segmentation() | Type | 5646 +------------------------------------+------+ 5647 | update_mb_segmentation_map | L(1) | 5648 | | | 5649 | update_segment_feature_data | L(1) | 5650 | | | 5651 | if (update_segment_feature_data) { | | 5652 | | | 5653 | segment_feature_mode | L(1) | 5654 | | | 5655 | for (i = 0; i < 4; i++) { | | 5656 | | | 5657 | quantizer_update | L(1) | 5658 | | | 5659 | if (quantizer_update) { | | 5660 | | | 5661 | quantizer_update_value | L(7) | 5662 | | | 5663 | quantizer_update_sign | L(1) | 5664 | | | 5665 | } | | 5666 | | | 5667 | } | | 5668 | | | 5669 | for (i = 0; i < 4; i++) { | | 5670 | | | 5671 | loop_filter_update | L(1) | 5672 | if (loop_filter_update) { | | 5673 | | | 5674 | lf_update_value | L(6) | 5675 | | | 5676 | lf_update_sign | L(1) | 5677 | | | 5678 | } | | 5679 | | | 5680 | } | | 5681 | | | 5682 | } | | 5683 | | | 5684 | if (update_mb_segmentation_map) { | | 5685 | | | 5686 | for (i = 0; i < 3; i++) { | | 5687 | | | 5688 | segment_prob_update | L(1) | 5689 | | | 5690 | if (segment_prob_update) { | | 5691 | | | 5692 | segment_prob | L(8) | 5693 | | | 5694 | } | | 5695 | | | 5696 | } | | 5697 | | | 5698 | } | | 5699 +------------------------------------+------+ 5701 o update_mb_segmentation_map determines if the MB segmentation map 5702 is updated in the current frame (9.3 (Section 9.3)) 5704 o update_segment_feature_data indicates if the segment feature data 5705 is updated in the current frame (9.3 (Section 9.3)) 5707 o segment_feature_mode indicates the feature data update mode, 0 for 5708 delta and 1 for the absolute value (9.3 (Section 9.3)) 5710 o quantizer_update indicates if the quantizer value is updated for 5711 the i^(th) segment (9.3 (Section 9.3)) 5713 o quantizer_update_value indicates the update value for the segment 5714 quantizer (9.3 (Section 9.3)) 5716 o quantizer_update_sign indicates the update sign for the segment 5717 quantizer (9.3 (Section 9.3)) 5719 o loop_filter_update indicates if the loop filter level value is 5720 updated for the i^(th) segment (9.3 (Section 9.3)) 5722 o lf_update_value indicates the update value for the loop filter 5723 level (9.3 (Section 9.3)) 5725 o lf_update_sign indicates the update sign for the loop filter level 5726 (9.3 (Section 9.3)) 5728 o segment_prob_update indicates if the branch probabilities used to 5729 decode the segment_id in the MB header are decoded from the stream 5730 or use the default value of 255 (9.3 (Section 9.3)) 5732 o segment_prob the branch probabilities of the segment_id decoding 5733 tree (9.3 (Section 9.3)) 5734 +------------------------------------+------+ 5735 | mb_lf_adjustments() | Type | 5736 +------------------------------------+------+ 5737 | loop_filter_adj_enable | L(1) | 5738 | | | 5739 | if (loop_filter_adj_enable) { | | 5740 | | | 5741 | mode_ref_lf_delta_update | L(1) | 5742 | | | 5743 | if (mode_ref_lf_delta_update) { | | 5744 | | | 5745 | for (i = 0; i < 4; i++) { | | 5746 | | | 5747 | ref_frame_delta_update_flag | L(1) | 5748 | | | 5749 | if (ref_frame_delta_update_flag) { | | 5750 | | | 5751 | delta_magnitude | L(6) | 5752 | | | 5753 | delta_sign | L(1) | 5754 | | | 5755 | } | | 5756 | | | 5757 | } | | 5758 | | | 5759 | for (i = 0; i < 4; i++) { | | 5760 | | | 5761 | mb_mode_delta_update_flag | L(1) | 5762 | | | 5763 | if (mb_mode_delta_update_flag) { | | 5764 | | | 5765 | delta_magnitude | L(6) | 5766 | | | 5767 | delta_sign | L(1) | 5768 | | | 5769 | } | | 5770 | | | 5771 | } | | 5772 | | | 5773 | } | | 5774 | | | 5775 | } | | 5776 +------------------------------------+------+ 5778 o loop_filter_adj_enable indicates if the MB-level loop filter 5779 adjustment (based on the used reference frame and coding mode) is 5780 on for the current frame (9.4 (Section 9.4)) 5782 o mode_ref_lf_delta_update indicates if the delta values used in 5783 adjustment are updated in the current frame (9.4 (Section 9.4)) 5785 o ref_frame_delta_update_flag indicates if the adjustment delta 5786 value corresponding to a certain used reference frame is updated 5787 (9.4 (Section 9.4)) 5789 o delta_magnitude is the absolute value of the delta value 5791 o delta_sign is the sign of the delta value 5793 o mb_mode_delta_update_flag indicates if the adjustment delta value 5794 corresponding to certain MB prediction mode is updated (9.4 5795 (Section 9.4)) 5796 +----------------------------+------+ 5797 | quant_indices() | Type | 5798 +----------------------------+------+ 5799 | y_ac_qi | L(7) | 5800 | | | 5801 | y_dc_delta_present | L(1) | 5802 | | | 5803 | if (y_dc_delta_present) { | | 5804 | | | 5805 | y_dc_delta_magnitude | L(4) | 5806 | | | 5807 | y_dc_delta_sign | L(1) | 5808 | | | 5809 | } | | 5810 | | | 5811 | if (y2_dc_delta_present) { | | 5812 | | | 5813 | y2_dc_delta_magnitude | L(4) | 5814 | | | 5815 | y2_dc_delta_sign | L(1) | 5816 | | | 5817 | } | | 5818 | | | 5819 | if (y2_ac_delta_present) { | | 5820 | | | 5821 | y2_ac_delta_magnitude | L(4) | 5822 | | | 5823 | y2_ac_delta_sign | L(1) | 5824 | | | 5825 | } | | 5826 | | | 5827 | if (uv_dc_delta_present) { | | 5828 | | | 5829 | uv_dc_delta_magnitude | L(4) | 5830 | | | 5831 | uv_dc_delta_sign | L(1) | 5832 | | | 5833 | } | | 5834 | | | 5835 | if (uv_ac_delta_present) { | | 5836 | | | 5837 | uv_ac_delta_magnitude | L(4) | 5838 | | | 5839 | uv_ac_delta_sign | L(1) | 5840 | | | 5841 | } | | 5842 +----------------------------+------+ 5844 o y_ac_qi is the dequantization table index used for the luma AC 5845 coefficients (and other coefficient groups if no delta value is 5846 present) (9.6 (Section 9.6)) 5848 o y_dc_delta_present indicates if the stream contains a delta value 5849 that is added to the baseline index to obtain the luma DC 5850 coefficient dequantization index (9.6 (Section 9.6)) 5852 o y_dc_delta_magnitude the magnitude of the delta value (9.6 5853 (Section 9.6)) 5855 o y_dc_delta_sign the sign of the delta value (9.6 (Section 9.6)) 5857 o y2_dc_delta_present indicates if the stream contains a delta value 5858 that is added to the baseline index to obtain the Y2 block DC 5859 coefficient dequantization index (9.6 (Section 9.6)) 5861 o y2_ac_delta_present indicates if the stream contains a delta value 5862 that is added to the baseline index to obtain the Y2 block AC 5863 coefficient dequantization index (9.6 (Section 9.6)) 5865 o uv_dc_delta_present indicates if the stream contains a delta value 5866 that is added to the baseline index to obtain the chroma DC 5867 coefficient dequantization index (9.6 (Section 9.6)) 5869 o uv_ac_delta_present indicates if the stream contains a delta value 5870 that is added to the baseline index to obtain the chroma AC 5871 coefficient dequantization index (9.6 (Section 9.6)) 5872 +-------------------------------+------+ 5873 | token_prob_update() | Type | 5874 +-------------------------------+------+ 5875 | for (i = 0; i < 4; i++) { | | 5876 | | | 5877 | for (j = 0; j < 8; j++) { | | 5878 | | | 5879 | for (k = 0; k < 3; k++) { | | 5880 | | | 5881 | for (l = 0; l < 11; l++) { | | 5882 | | | 5883 | coeff_prob_update_flag | L(1) | 5884 | | | 5885 | if (coeff_prob_update_flag) { | | 5886 | | | 5887 | coeff_prob | L(8) | 5888 | | | 5889 | } | | 5890 | | | 5891 | } | | 5892 | | | 5893 | } | | 5894 | | | 5895 | } | | 5896 | | | 5897 | } | | 5898 +-------------------------------+------+ 5900 o coeff_prob_update_flag indicates if the corresponding branch 5901 probability is updated in the current frame (13.4 (Section 13.4)) 5903 o coeff_prob is the new branch probability (13.4 (Section 13.4)) 5904 +----------------------------+------+ 5905 | mv_prob_update() | Type | 5906 +----------------------------+------+ 5907 | for (i = 0; i < 2; i++) { | | 5908 | | | 5909 | for (j = 0; j < 19; j++) { | | 5910 | | | 5911 | mv_prob_update_flag | L(1) | 5912 | | | 5913 | if (mv_prob_update_flag) { | | 5914 | | | 5915 | prob | L(7) | 5916 | | | 5917 | } | | 5918 | | | 5919 | } | | 5920 | | | 5921 | } | | 5922 +----------------------------+------+ 5924 o mv_prob_update_flag indicates if the corresponding MV decoding 5925 probability is updated in the current frame (17.2 (Section 17.2)) 5927 o prob is the updated probability (17.2 (Section 17.2)) 5929 19.3. Macroblock Data 5931 +---------------------+------+ 5932 | Macroblock Data | Type | 5933 +---------------------+------+ 5934 | macroblock_header() | | 5935 | | | 5936 | residual_data() | | 5937 +---------------------+------+ 5939 +--------------------------------+------+ 5940 | macroblock_header() | Type | 5941 +--------------------------------+------+ 5942 | if (segmentation_map_update) { | | 5943 | | | 5944 | segment_id | T | 5945 | | | 5946 | if (mb_no_coeff_skip) { | | 5947 | | | 5948 | mb_coeff_skip | B(p) | 5949 | | | 5950 | } | | 5951 | | | 5952 | if (!key_frame) { | | 5953 | | | 5954 | is_inter_mb | B(p) | 5955 | | | 5956 | if (is_inter_mb) { | | 5957 | | | 5958 | mb_ref_frame_sel1 | B(p) | 5959 | | | 5960 | if (mb_ref_frame_sel1) | | 5961 | | | 5962 | mb_ref_frame_sel2 | B(p) | 5963 | | | 5964 | mv_mode | T | 5965 | | | 5966 | if (mv_mode == SPLITMV) { | | 5967 | | | 5968 | mv_split_mode | T | 5969 | | | 5970 | for (i = 0; i < numMvs; i++) { | | 5971 | | | 5972 | sub_mv_mode | T | 5973 | | | 5974 | if (sub_mv_mode == NEWMV4x4) { | | 5975 | | | 5976 | read_mvcomponent() | | 5977 | | | 5978 | read_mvcomponent() | | 5979 | | | 5980 | } | | 5981 | | | 5982 | } | | 5983 | | | 5984 | } else if (mv_mode == NEWMV) { | | 5985 | | | 5986 | read_mvcomponent() | | 5987 | | | 5988 | read_mvcomponent() | | 5989 | | | 5990 | } | | 5991 | | | 5992 | } else { /* intra mb */ | | 5993 | | | 5994 | intra_y_mode | T | 5995 +--------------------------------+------+ 5996 +-------------------------------+------+ 5997 | macroblock_header() | Type | 5998 +-------------------------------+------+ 5999 | if (intra_y_mode == B_PRED) { | | 6000 | | | 6001 | for (i = 0; i < 16; i++) | | 6002 | | | 6003 | intra_b_mode | T | 6004 | | | 6005 | } | | 6006 | | | 6007 | intra_uv_mode | T | 6008 | | | 6009 | } | | 6010 +-------------------------------+------+ 6012 o segment_id indicates to which segment the macroblock belongs (10 6013 (Section 10)) 6015 o mb_coeff_skip indicates if the macroblock contains any coded 6016 coefficients or not (11.1 (Section 11.1)) 6018 o is_inter_mb indicates if the macroblock is intra or inter coded 6019 (16 (Section 16)) 6021 o mb_ref_frame_sel1 selects the reference frame to be used; last 6022 frame (0), golden/alternate (1) (16.2 (Section 16.2)) 6024 o mb_ref_frame_sel2 selects whether the golden (0) or alternate 6025 reference frame (1) is used (16.2 (Section 16.2)) 6027 o mv_mode determines the macroblock motion vector mode (16.2 6028 (Section 16.2)) 6030 o mv_split_mode gives macroblock partitioning specification and 6031 determines number of motion vectors used (numMvs)(16.2 6032 (Section 16.2)) 6034 o sub_mv_mode determines the sub-macroblock motion vector mode for 6035 macroblocks coded using SPLITMV motion vector mode (16.2 6036 (Section 16.2)) 6038 o intra_y_mode selects the luminance intra prediction mode (16.1 6039 (Section 16.1)) 6041 o intra_b_mode selects the sub-macroblock luminance prediction mode 6042 for macroblocks coded using B_PRED mode (16.1 (Section 16.1)) 6044 o intra_uv_mode selects the chrominance intra prediction mode (16.1 6045 (Section 16.1)) 6047 +----------------------------------------------+------+ 6048 | residual_data() | Type | 6049 +----------------------------------------------+------+ 6050 | if (!mb_coeff_skip) { | | 6051 | | | 6052 | if ( (is_inter_mb && mv_mode != SPLITMV) || | | 6053 | | | 6054 | (!is_inter_mb && intra_y_mode != B_PRED) ) { | | 6055 | | | 6056 | residual_block() /* Y2 */ | | 6057 | | | 6058 | } | | 6059 | | | 6060 | for (i = 0; i < 24; i++) | | 6061 | | | 6062 | residual_block() /* 16 Y, 4 U, 4 V */ | | 6063 | | | 6064 | } | | 6065 +----------------------------------------------+------+ 6067 +-------------------------------------+------+ 6068 | residual_block() | Type | 6069 +-------------------------------------+------+ 6070 | for (i = firstCoeff; i < 16; i++) { | | 6071 | | | 6072 | token | T | 6073 | | | 6074 | if (token == EOB) break; | | 6075 | | | 6076 | if (token_has_extra_bits) { | | 6077 | | | 6078 | extra_bits | L(n) | 6079 | | | 6080 | sign | L(1) | 6081 +-------------------------------------+------+ 6083 o firstCoeff is 1 for luma blocks of macroblocks containing Y2 6084 subblock, otherwise 0 6086 o token defines the value of the coefficient, the value range of the 6087 coefficient or the end of block (13.2 (Section 13.2)) 6089 o extra_bits determine the value of the coefficient within the value 6090 range defined by token (13.2 (Section 13.2)) 6092 o sign indicates the sign of the coefficient (13.2 (Section 13.2)) 6094 20. License 6096 Google hereby grants to You a perpetual, worldwide, non-exclusive, 6097 no-charge, royalty-free, irrevocable (except as stated in this 6098 section) patent license to make, have made, use, offer to sell, sell, 6099 import, and otherwise implementations of this specification where 6100 such license applies only to those patent claims, both currently 6101 owned by Google and acquired in the future, licensable by Google that 6102 are necessarily infringed by implementation of this specification. 6103 If You or your agent or exclusive licensee institute or order or 6104 agree to the institution of patent litigation against any entity 6105 (including a cross-claim or counterclaim in a lawsuit) alleging that 6106 any implementation of this specification constitutes direct or 6107 contributory patent infringement, or inducement of patent 6108 infringement, then any rights granted to You under the License for 6109 this specification shall terminate as of the date such litigation is 6110 filed. 6112 21. Copyright 6114 This specification is made available under a Creative Commons 6115 Attribution 3.0 License [4]. 6117 22. References 6119 [1] International Telecommunication Union, "ITU BT.601: Studio 6120 encoding parameters of digital television for standard 4:3 and 6121 wide screen 16:9 aspect ratios", January 2007. 6123 [2] Kernighan, B. and D. Ritchie, "The C Programming Language (2nd 6124 edition)", April 1988. 6126 [3] Shannon, C., "A Mathematical Theory of Communication", Bell 6127 System Technical Journal Vol. 27, pp. 379-423, 623-656, July, 6128 October 1948. 6130 [4] 6132 Authors' Addresses 6134 James Bankoski 6135 Google, Inc. 6137 Email: jimbankoski@google.com 6139 Paul Wilkins 6140 Google, Inc. 6142 Email: paulwilkins@google.com 6144 Yaowu Xu 6145 Google, Inc. 6147 Email: yaowu@google.com