idnits 2.17.1 draft-bankoski-vp8-bitstream-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 635 has weird spacing: '...typedef sign...' == Line 644 has weird spacing: '...def int int32...' -- The document date (January 6, 2011) is 4852 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '
' and
     '' lines.


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: 'A' on line 2059

  -- Looks like a reference, but probably isn't: 'L' on line 2059

  == Missing Reference: '8' is mentioned on line 5192, but not defined

  -- Looks like a reference, but probably isn't: '-1' on line 2498

  == Missing Reference: '0' is mentioned on line 5453, but not defined

  == Missing Reference: '7' is mentioned on line 2551, but not defined

  == Missing Reference: '9' is mentioned on line 5260, but not defined

  == Missing Reference: '5' is mentioned on line 4789, but not defined

  == Missing Reference: '6' is mentioned on line 5207, but not defined

  == Missing Reference: '16' is mentioned on line 2891, but not defined

  == Missing Reference: '12' is mentioned on line 3629, but not defined

  -- Looks like a reference, but probably isn't: 'MVPcount' on line 4869

  -- Looks like a reference, but probably isn't: 'MVPsign' on line 4918


     Summary: 3 errors (**), 0 flaws (~~), 11 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                        J. Bankoski
3	Internet-Draft                                                P. Wilkins
4	Intended status: Informational                                     Y. Xu
5	Expires: July 10, 2011                                      Google, Inc.
6	                                                         January 6, 2011

8	                   VP8 Data Format and Decoding Guide
9	                    draft-bankoski-vp8-bitstream-00

11	Abstract

13	   This document describes the VP8 compressed video data format created
14	   by Google On2, together with a discussion of the decoding procedure
15	   for this format.

17	Status of this Memo

19	   This Internet-Draft is submitted in full conformance with the
20	   provisions of BCP 78 and BCP 79.

22	   Internet-Drafts are working documents of the Internet Engineering
23	   Task Force (IETF).  Note that other groups may also distribute
24	   working documents as Internet-Drafts.  The list of current Internet-
25	   Drafts is at http://datatracker.ietf.org/drafts/current/.

27	   Internet-Drafts are draft documents valid for a maximum of six months
28	   and may be updated, replaced, or obsoleted by other documents at any
29	   time.  It is inappropriate to use Internet-Drafts as reference
30	   material or to cite them other than as "work in progress."

32	   This Internet-Draft will expire on July 10, 2011.

34	Copyright Notice

36	   Copyright (c) 2011 IETF Trust and the persons identified as the
37	   document authors.  All rights reserved.

39	   This document is subject to BCP 78 and the IETF Trust's Legal
40	   Provisions Relating to IETF Documents
41	   (http://trustee.ietf.org/license-info) in effect on the date of
42	   publication of this document.  Please review these documents
43	   carefully, as they describe your rights and restrictions with respect
44	   to this document.

46	Table of Contents

48	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   4
49	   2.  Format Overview . . . . . . . . . . . . . . . . . . . . . . .   6
50	   3.  Compressed Frame Types  . . . . . . . . . . . . . . . . . . .   8
51	   4.  Overview of Compressed Data Format  . . . . . . . . . . . . .   9
52	   5.  Overview of the Decoding Process  . . . . . . . . . . . . . .  11
53	   6.  Description of Algorithms . . . . . . . . . . . . . . . . . .  16
54	   7.  Boolean Entropy Decoder . . . . . . . . . . . . . . . . . . .  19
55	     7.1.  Underlying Theory of Coding . . . . . . . . . . . . . . .  20
56	     7.2.  Practical Algorithm Description . . . . . . . . . . . . .  21
57	     7.3.  Actual Implementation . . . . . . . . . . . . . . . . . .  23
58	   8.  Compressed Data Components  . . . . . . . . . . . . . . . . .  28
59	     8.1.  Tree Coding Implementation  . . . . . . . . . . . . . . .  30
60	     8.2.  Tree Coding Example . . . . . . . . . . . . . . . . . . .  31
61	   9.  Frame Header  . . . . . . . . . . . . . . . . . . . . . . . .  34
62	     9.1.  Uncompressed Data Chunk . . . . . . . . . . . . . . . . .  34
63	     9.2.  Color Space and Pixel Type (Key Frames-only)  . . . . . .  37
64	     9.3.  Segment-based Adjustments . . . . . . . . . . . . . . . .  37
65	     9.4.  Loop Filter Type and Levels . . . . . . . . . . . . . . .  38
66	     9.5.  Token Partition and Partition Data Offsets  . . . . . . .  39
67	     9.6.  Dequantization Indices  . . . . . . . . . . . . . . . . .  40
68	     9.7.  Refresh Golden Frame and AltRef Frame . . . . . . . . . .  41
69	     9.8.  Refresh Last Frame Buffer . . . . . . . . . . . . . . . .  43
70	     9.9.  DCT Coefficient Probability Update  . . . . . . . . . . .  43
71	     9.10. Remaining Frame Header Data (non-Key Frame) . . . . . . .  43
72	     9.11. Remaining Frame Header Data (Key Frame) . . . . . . . . .  44
73	   10. Segment-based Feature Adjustments . . . . . . . . . . . . . .  45
74	   11. Key Frame Macroblock Prediction Records . . . . . . . . . . .  46
75	     11.1. mb_skip_coeff . . . . . . . . . . . . . . . . . . . . . .  46
76	     11.2. Luma Modes  . . . . . . . . . . . . . . . . . . . . . . .  46
77	     11.3. Subblock Mode Contexts  . . . . . . . . . . . . . . . . .  49
78	     11.4. Chroma Modes  . . . . . . . . . . . . . . . . . . . . . .  50
79	     11.5. Subblock Mode Probability Table . . . . . . . . . . . . .  51
80	   12. Intraframe Prediction . . . . . . . . . . . . . . . . . . . .  55
81	     12.1. mb_skip_coeff . . . . . . . . . . . . . . . . . . . . . .  55
82	     12.2. Chroma Prediction . . . . . . . . . . . . . . . . . . . .  56
83	     12.3. Luma Prediction . . . . . . . . . . . . . . . . . . . . .  58
84	   13. DCT Coefficient Decoding  . . . . . . . . . . . . . . . . . .  64
85	     13.1. MB Without non-Zero Coefficient Values  . . . . . . . . .  64
86	     13.2. Coding of Individual Coefficient Values . . . . . . . . .  65
87	     13.3. Token Probabilities . . . . . . . . . . . . . . . . . . .  67
88	     13.4. Token Probability Updates . . . . . . . . . . . . . . . .  71
89	     13.5. Default Token Probability Table . . . . . . . . . . . . .  76
90	   14. DCT and WHT Inversion and Macroblock Reconstruction . . . . .  81
91	     14.1. Dequantization  . . . . . . . . . . . . . . . . . . . . .  81
92	     14.2. Inverse Transforms  . . . . . . . . . . . . . . . . . . .  82
93	     14.3. Implementation of the WHT Inversion . . . . . . . . . . .  83
94	     14.4. Implementation of the DCT Inversion . . . . . . . . . . .  85
95	     14.5. Summation of Predictor and Residue  . . . . . . . . . . .  88
96	   15. Loop Filter . . . . . . . . . . . . . . . . . . . . . . . . .  89
97	     15.1. Filter Geometry and Overall Procedure . . . . . . . . . .  90
98	     15.2. Simple Filter . . . . . . . . . . . . . . . . . . . . . .  92
99	     15.3. Normal Filter . . . . . . . . . . . . . . . . . . . . . .  96
100	     15.4. Calculation of Control Parameters . . . . . . . . . . . . 101
101	   16. Interframe Macroblock Prediction Records  . . . . . . . . . . 103
102	     16.1. Intra-Predicted Macroblocks . . . . . . . . . . . . . . . 103
103	     16.2. Inter-Predicted Macroblocks . . . . . . . . . . . . . . . 104
104	     16.3. Mode and Motion Vector Contexts . . . . . . . . . . . . . 105
105	     16.4. Split Prediction  . . . . . . . . . . . . . . . . . . . . 111
106	   17. Motion Vector Decoding  . . . . . . . . . . . . . . . . . . . 115
107	     17.1. Coding of Each Component  . . . . . . . . . . . . . . . . 115
108	     17.2. Probability Updates . . . . . . . . . . . . . . . . . . . 117
109	   18. Interframe Prediction . . . . . . . . . . . . . . . . . . . . 120
110	     18.1. Bounds on and Adjustment of Motion Vectors  . . . . . . . 120
111	     18.2. Prediction Subblocks  . . . . . . . . . . . . . . . . . . 121
112	     18.3. Sub-pixel Interpolation . . . . . . . . . . . . . . . . . 122
113	     18.4. Filter Properties . . . . . . . . . . . . . . . . . . . . 125
114	   19. Annex A: Bitstream Syntax . . . . . . . . . . . . . . . . . . 128
115	     19.1. Uncompressed Data Chunk . . . . . . . . . . . . . . . . . 128
116	     19.2. Frame Header  . . . . . . . . . . . . . . . . . . . . . . 130
117	     19.3. Macroblock Data . . . . . . . . . . . . . . . . . . . . . 142
118	   20. License . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
119	   21. Copyright . . . . . . . . . . . . . . . . . . . . . . . . . . 148
120	   22. References  . . . . . . . . . . . . . . . . . . . . . . . . . 149
121	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . 150

123	1.  Introduction

125	   IMPORTANT NOTE: Portions of this document contain mathematical
126	   expressions, code blocks and other non-narrative content that may not
127	   render suitably in the text-only RFC format.  Formatted versions of
128	   this document's content is available at http://www.webmproject.org/.

130	   * * *

132	   This document describes the VP8 compressed video data format created
133	   by Google On2, together with a discussion of the decoding procedure
134	   for this format.  It is intended to be used in conjunction with and
135	   as a guide to the reference decoder provided by Google On2.  If there
136	   are any conflicts between this document and the reference source
137	   code, the reference source code should be considered correct.  The
138	   bitstream is defined by the reference source code and not this
139	   document.

141	   Like many modern video compression schemes, VP8 is based on
142	   decomposition of frames into square subblocks of pixels, prediction
143	   of such subblocks using previously constructed blocks, and adjustment
144	   of such predictions (as well as synthesis of unpredicted blocks)
145	   using a discrete cosine transform (hereafter abbreviated as DCT).  In
146	   one special case, however, VP8 uses a "Walsh-Hadamard" (hereafter
147	   abbreviated as WHT) transform instead of a DCT.

149	   Roughly speaking, such systems reduce datarate by exploiting the
150	   temporal and spatial coherence of most video signals.  It is more
151	   efficient to specify the location of a visually similar portion of a
152	   prior frame than it is to specify pixel values.  The frequency
153	   segregation provided by the DCT and WHT facilitate the exploitation
154	   of both spatial coherence in the original signal and the tolerance of
155	   the human visual system to moderate losses of fidelity in the
156	   reconstituted signal.

158	   VP8 augments these basic concepts with, among other things,
159	   sophisticated usage of contextual probabilities.  The result is a
160	   significant reduction in datarate at a given quality.

162	   Unlike some similar schemes (the older MPEG formats, for example),
163	   VP8 specifies exact values for reconstructed pixels.  Specifically,
164	   the specification for the DCT and WHT portions of the reconstruction
165	   does not allow for any "drift" caused by truncation of fractions.
166	   Rather, the algorithm is specified using fixed-precision integer
167	   operations exclusively.  This greatly facilitates the verification of
168	   the correctness of a decoder implementation as well as avoiding
169	   difficult-to-predict visual incongruities between such
170	   implementations.

172	   It should be remarked that, in a complete video playback system, the
173	   displayed frames may or may not be identical to the reconstructed
174	   frames.  Many systems apply a final level of filtering (commonly
175	   referred to as postprocessing) to the reconstructed frames prior to
176	   viewing.  Such postprocessing has no effect on the decoding and
177	   reconstruction of subsequent frames (which are predicted using the
178	   completely-specified reconstructed frames) and is beyond the scope of
179	   this document.  In practice, the nature and extent of this sort of
180	   postprocessing is dependent on both the taste of the user and on the
181	   computational facilities of the playback environment.

183	2.  Format Overview

185	   VP8 works exclusively with an 8-bit YUV 4:2:0 image format.  In this
186	   format, each 8-bit pixel in the two chroma planes (U and V)
187	   corresponds positionally to a 2x2 block of 8-bit luma pixels in the Y
188	   plane; coordinates of the upper left corner of the Y block are of
189	   course exactly twice the coordinates of the corresponding chroma
190	   pixels.  When we refer to pixels or pixel distances without
191	   specifying a plane, we are implicitly referring to the Y plane or to
192	   the complete image, both of which have the same (full) resolution.

194	   As is usually the case, the pixels are simply a large array of bytes
195	   stored in rows from top to bottom, each row being stored from left to
196	   right.  This "left to right" then "top to bottom" raster-scan order
197	   is reflected in the layout of the compressed data as well.

199	   Provision has been made for the support of two different YUV color
200	   formats in the VP8 bitstream header, however only one format is
201	   supported in the first release of VP8.

203	   The YUV formats differ in terms of their conversion to and from RGB
204	   color space.  The first corresponds to the traditional YUV color
205	   space similar to the YCrCb color space defined in ITU-R BT.601.  The
206	   second (currently unsupported) format corresponds to a new YUV color
207	   space whose digital conversion to and from RGB can be implemented
208	   without multiplications and divides.  The VP8 Decoder should decode
209	   and pass the information on to the processes that convert the YUV
210	   output to RGB color space.

212	   Occasionally, at very low datarates, a compression system may decide
213	   to reduce the resolution of the input signal to facilitate efficient
214	   compression.  The VP8 data format supports this via optional
215	   upscaling of its internal reconstruction buffer prior to output (this
216	   is completely distinct from the optional postprocessing discussed
217	   earlier, which has nothing to do with decoding per se).  This
218	   upsampling restores the video frames to their original resolution.
219	   In other words, the compression/decompression system can be viewed as
220	   a "black box", where the input and output is always at a given
221	   resolution.  The compressor might decide to "cheat" and process the
222	   signal at a lower resolution.  In that case, the decompressor needs
223	   the ability to restore the signal to its original resolution.

225	   Internally, VP8 decomposes each output frame into an array of
226	   macroblocks.  A macroblock is a square array of pixels whose Y
227	   dimensions are 16x16 and whose U and V dimensions are 8x8.
228	   Macroblock-level data in a compressed frame occurs (and must be
229	   processed) in a raster order similar to that of the pixels comprising
230	   the frame.

232	   Macroblocks are further decomposed into 4x4 subblocks.  Every
233	   macroblock has 16 Y subblocks, 4 U subblocks, and 4 V subblocks.  Any
234	   subblock-level data (and processing of such data) again occurs in
235	   raster order, this time in raster order within the containing
236	   macroblock.

238	   As discussed in further detail below, data can be specified at the
239	   levels of both macroblocks and their subblocks.

241	   Pixels are always treated, at a minimum, at the level of subblocks,
242	   which may be thought of as the "atoms" of the VP8 algorithm.  In
243	   particular, the 2x2 chroma blocks corresponding to 4x4 Y subblocks
244	   are never treated explicitly in the data format or in the algorithm
245	   specification.

247	   The DCT and WHT always operate at a 4x4 resolution.  The DCT is used
248	   for the 16Y, 4U and 4V subblocks.  The WHT is used (with some but not
249	   all prediction modes) to encode a 4x4 array comprising the average
250	   intensities of the 16 Y subblocks of a macroblock.  These average
251	   intensities are, up to a constant normalization factor, nothing more
252	   that the zeroth DCT coefficients of the Y subblocks.  This "higher-
253	   level" WHT is a substitute for the explicit specification of those
254	   coefficients, in exactly the same way as the DCT of a subblock
255	   substitutes for the specification of the pixel values comprising the
256	   subblock.  We consider this 4x4 array as a second-order subblock
257	   called Y2, and think of a macroblock as containing 24 "real"
258	   subblocks and, sometimes, a 25th "virtual" subblock.  This is dealt
259	   with further in Chapter 13.

261	   The frame layout used by the reference decoder may be found in the
262	   file yv12config.h.

264	3.  Compressed Frame Types

266	   There are only two types of frames in VP8.

268	   Intraframes (also called key frames and, in MPEG terminology,
269	   I-frames) are decoded without reference to any other frame in a
270	   sequence, that is, the decompressor reconstructs such frames
271	   beginning from its "default" state.  Key frames provide random access
272	   (or seeking) points in a video stream.

274	   Interframes (also called prediction frames and, in MPEG terminology,
275	   P-frames) are encoded with reference to prior frames, specifically
276	   all prior frames up to and including the most recent key frame.
277	   Generally speaking, the correct decoding of an interframe depends on
278	   the correct decoding of the most recent key frame and all ensuing
279	   frames.  Consequently, the decoding algorithm is not tolerant of
280	   dropped frames: In an environment in which frames may be dropped or
281	   corrupted, correct decoding will not be possible until a key frame is
282	   correctly received.

284	   In contrast to MPEG, there is no use of bidirectional prediction.  No
285	   frame is predicted using frames temporally subsequent to it; there is
286	   no analog to an MPEG B-frame.

288	   Secondly, VP8 augments these notions with that of alternate
289	   prediction frames, called golden frames and altref frames
290	   (alternative reference frames).  Blocks in an interframe may be
291	   predicted using blocks in the immediately previous frame as well as
292	   the most recent golden frame or altref frame.  Every key frame is
293	   automatically golden and altref, and any interframe may optionally
294	   replace the most recent golden or altref frame.

296	   Golden frames and altref frames may also be used to partially
297	   overcome the intolerance to dropped frames discussed above: If a
298	   compressor is configured to code golden frames only with reference to
299	   the prior golden frame (and key frame) then the "substream" of key
300	   and golden frames may be decoded regardless of loss of other
301	   interframes.  Roughly speaking, the implementation requires (on the
302	   compressor side) that golden frames subsume and recode any context
303	   updates effected by the intervening interframes.  A typical
304	   application of this approach is video conferencing, in which
305	   retransmission of a prior golden frame and/or a delay in playback
306	   until receipt of the next golden frame is preferable to a larger
307	   retransmit and/or delay until the next key frame.

309	4.  Overview of Compressed Data Format

311	   The input to a VP8 decoder is a sequence of compressed frames whose
312	   order matches their order in time.  Issues such as the duration of
313	   frames, the corresponding audio, and synchronization are generally
314	   provided by the playback environment and are irrelevant to the
315	   decoding process itself, however, to aid in fast seeking a start code
316	   is included in the header of each key frame.

318	   The decoder is simply presented with a sequence of compressed frames
319	   and produces a sequence of decompressed (reconstructed) YUV frames
320	   corresponding to the input sequence.  As stated in the introduction,
321	   the exact pixel values in the reconstructed frame are part of VP8's
322	   specification.  This document specifies the layout of the compressed
323	   frames and gives unambiguous algorithms for the correct production of
324	   reconstructed frames.

326	   The first frame presented to the decompressor is of course a key
327	   frame.  This may be followed by any number of interframes; the
328	   correct reconstruction of each frame depends on all prior frames up
329	   to the key frame.  The next key frame restarts this process: The
330	   decompressor resets to its default initial condition upon reception
331	   of a key frame and the decoding of a key frame (and its ensuing
332	   interframes) is completely independent of any prior decoding.

334	   At the highest level, every compressed frame has three or more
335	   pieces.  It begins with an uncompressed data chunk comprising 10
336	   bytes in the case of key frames and 3-bytes for inter frames.  This
337	   is followed by two or more blocks of compressed data (called
338	   partitions).  These compressed data partitions begin and end on byte
339	   boundaries.

341	   The first compressed partition has two subsections:

343	   1.  Header information that applies to the frame as a whole.

345	   2.  Per-macroblock information specifying how each macroblock is
346	       predicted from the already-reconstructed data that is available
347	       to the decompressor.

349	   As stated above, the macroblock-level information occurs in raster-
350	   scan order.

352	   The rest of the partitions contain, for each block, the DCT/WHT
353	   coefficients (quantized and logically compressed) of the residue
354	   signal to be added to the predicted block values.  It typically
355	   accounts for roughly 70% of the overall datarate.  VP8 supports
356	   packing the compressed DCT/WHT coefficients' data from macroblock
357	   rows into separate partitions.  If there is more than one partition
358	   for these coefficients, the sizes of the partitions -- except the
359	   last partition -- in bytes are also present in the bitstream right
360	   after the above first partition.  Each of the sizes is a 3-byte data
361	   item written in little endian format.  These sizes provide the
362	   decoder direct access to all DCT/WHT coefficient partitions, which
363	   enables parallel processing of the coefficients in a decoder.

365	   The separate partitioning of the prediction data and coefficient data
366	   also allows flexibility in the implementation of a decompressor: An
367	   implementation may decode and store the prediction information for
368	   the whole frame and then decode, transform, and add the residue
369	   signal to the entire frame, or it may simultaneously decode both
370	   partitions, calculating prediction information and adding in the
371	   residue signal for each block in order.  The length field in the
372	   frame tag, which allows decoding of the second partition to begin
373	   before the first partition has been completely decoded, is necessary
374	   for the second "block-at-a-time" decoder implementation.

376	   All partitions are decoded using separate instances of the boolean
377	   entropy decoder described in Chapter 7.  Although some of the data
378	   represented within the partitions is conceptually "flat" (a bit is
379	   just a bit with no probabilistic expectation one way or the other),
380	   because of the way such coders work, there is never a direct
381	   correspondence between a "conceptual bit" and an actual physical bit
382	   in the compressed data partitions.  Only in the 3 or 10 byte
383	   uncompressed chunk described above is there such a physical
384	   correspondence.

386	   A related matter, which is true for most lossless compression
387	   formats, is that seeking within a partition is not supported.  The
388	   data must be decompressed and processed (or at least stored) in the
389	   order in which it occurs in the partition.

391	   While this document specifies the ordering of the partition data
392	   correctly, the details and semantics of this data are discussed in a
393	   more logical fashion to facilitate comprehension.  For example, the
394	   frame header contains updates to many probability tables used in
395	   decoding per-macroblock data.  The latter is often described before
396	   the layouts of the probabilities and their updates, even though this
397	   is the opposite of their order in the bitstream.

399	5.  Overview of the Decoding Process

401	   A VP8 decoder needs to maintain four YUV frame buffers whose
402	   resolutions are at least equal to that of the encoded image.  These
403	   buffers hold the current frame being reconstructed, the immediately
404	   previous reconstructed frame, the most recent golden frame, and the
405	   most recent altref frame.

407	   Most implementations will wish to "pad" these buffers with
408	   "invisible" pixels that extend a moderate number of pixels beyond all
409	   four edges of the visible image.  This simplifies interframe
410	   prediction by allowing all (or most) prediction blocks -- which are
411	   not guaranteed to lie within the visible area of a prior frame -- to
412	   address usable image data.

414	   Regardless of the amount of padding chosen, the invisible rows above
415	   (below) the image are filled with copies of the top (bottom) row of
416	   the image; the invisible columns to the left (right) of the image are
417	   filled with copies of the leftmost (rightmost) visible row; and the
418	   four invisible corners are filled with copies of the corresponding
419	   visible corner pixels.  The use of these prediction buffers (and
420	   suggested sizes for the halo) will be elaborated on in the discussion
421	   of motion vectors, interframe prediction, and sub-pixel interpolation
422	   later in this document.

424	   As will be seen in the description of the frame header, the image
425	   dimensions are specified (and can change) with every key frame.
426	   These buffers (and any other data structures whose size depends on
427	   the size of the image) should be allocated (or re-allocated)
428	   immediately after the dimensions are decoded.

430	   Leaving most of the details for later elaboration, the following is
431	   an outline the decoding process.

433	   First, the frame header (beginning of the first data partition) is
434	   decoded.  Altering or augmenting the maintained state of the decoder,
435	   this provides the context in which the per-macroblock data can be
436	   interpreted.

438	   The macroblock data occurs (and must be processed) in raster-scan
439	   order.  This data comes in two or more parts.  The first (prediction
440	   or mode) part comes in the remainder of the first data partition.
441	   The other parts comprise the data partition(s) for the DCT/WHT
442	   coefficients of the residue signal.  For each macroblock, the
443	   prediction data must be processed before the residue.

445	   Each macroblock is predicted using one (and only one) of four
446	   possible frames.  All macroblocks in a key frame, and all intra-coded
447	   macroblocks in an interframe, are predicted using the already-decoded
448	   macroblocks in the current frame.  Macroblocks in an interframe may
449	   also be predicted using the previous frame, the golden frame or the
450	   altref frame.  Such macroblocks are said to be inter-coded.

452	   The purpose of prediction is to use already-constructed image data to
453	   approximate the portion of the original image being reconstructed.
454	   The effect of any of the prediction modes is then to write a
455	   macroblock-sized prediction buffer containing this approximation.

457	   Regardless of the prediction method, the residue DCT signal is
458	   decoded, dequantized, reverse-transformed, and added to the
459	   prediction buffer to produce the (almost final) reconstruction value
460	   of the macroblock, which is stored in the correct position of the
461	   current frame buffer.

463	   The residue signal consists of 24 (sixteen Y, four U, and four V) 4x4
464	   quantized and losslessly-compressed DCT transforms approximating the
465	   difference between the original macroblock in the uncompressed source
466	   and the prediction buffer.  For most prediction modes, the zeroth
467	   coefficients of the sixteen Y subblocks are expressed via a 25th WHT
468	   of the second-order virtual Y2 subblock discussed above.

470	   Intra-prediction exploits the spatial coherence of frames.  The 16x16
471	   luma (Y) and 8x8 chroma (UV) components are predicted independently
472	   of each other using one of four simple means of pixel propagation,
473	   starting from the already-reconstructed (16-pixel long luma, 8-pixel
474	   long chroma) row above and column to the left of the current
475	   macroblock.  The four methods are:

477	   1.  Copying the row from above throughout the prediction buffer.

479	   2.  Copying the column from left throughout the prediction buffer.

481	   3.  Copying the average value of the row and column throughout the
482	       prediction buffer.

484	   4.  Extrapolation from the row and column using the (fixed) second
485	       difference (horizontal and vertical) from the upper left corner.

487	   Additionally, the sixteen Y subblocks may be predicted independently
488	   of each other using one of ten different modes, four of which are 4x4
489	   analogs of those described above, augmented with six "diagonal"
490	   prediction methods.  There are two types of predictions, one intra
491	   and one prediction (among all the modes), for which the residue
492	   signal does not use the Y2 block to encode the DC portion of the
493	   sixteen 4x4 Y subblock DCTs.  This "independent Y subblock" mode has
494	   no effect on the 8x8 chroma prediction.

496	   Inter-prediction exploits the temporal coherence between nearby
497	   frames.  Except for the choice of the prediction frame itself, there
498	   is no difference between inter-prediction based on the previous frame
499	   and that based on the golden frame or altref frame.

501	   Inter-prediction is conceptually very simple.  While, for reasons of
502	   efficiency, there are several methods of encoding the relationship
503	   between the current macroblock and corresponding sections of the
504	   prediction frame, ultimately each of the sixteen Y subblocks is
505	   related to a 4x4 subblock of the prediction frame, whose position in
506	   that frame differs from the current subblock position by a (usually
507	   small) displacement.  These two-dimensional displacements are called
508	   motion vectors.

510	   The motion vectors used by VP8 have quarter-pixel precision.
511	   Prediction of a subblock using a motion vector that happens to have
512	   integer (whole number) components is very easy: the 4x4 block of
513	   pixels from the displaced block in the previous, golden, or altref
514	   frame are simply copied into the correct position of the current
515	   macroblock's prediction buffer.

517	   Fractional displacements are conceptually and implementationally more
518	   complex.  They require the inference (or synthesis) of sample values
519	   that, strictly speaking, do not exist.  This is one of the most basic
520	   problems in signal processing and readers conversant with that
521	   subject will see that the approach taken by VP8 provides a good
522	   balance of robustness, accuracy, and efficiency.

524	   Leaving the details for the implementation discussion below, the
525	   pixel interpolation is calculated by applying a kernel filter (using
526	   reasonable-precision integer math) three pixels on either side, both
527	   horizontally and vertically, of the pixel to be synthesized.  The
528	   resulting 4x4 block of synthetic pixels is then copied into position
529	   exactly as in the case of integer displacements.

531	   Each of the eight chroma subblocks is handled similarly.  Their
532	   motion vectors are never specified explicitly; instead, the motion
533	   vector for each chroma subblock is calculated by averaging the
534	   vectors of the four Y subblocks that occupy the same area of the
535	   frame.  Since chroma pixels have twice the diameter (and four times
536	   the area) of luma pixels, the calculated chroma motion vectors have
537	   1/8 pixel resolution, but the procedure for copying or generating
538	   pixels for each subblock is essentially identical to that done in the
539	   luma plane.

541	   After all the macroblocks have been generated (predicted and
542	   corrected with the DCT/WHT residue), a filtering step (the loop
543	   filter) is applied to the entire frame.  The purpose of the loop
544	   filter is to reduce blocking artifacts at the boundaries between
545	   macroblocks and between subblocks of the macroblocks.  The term loop
546	   filter is used because this filter is part of the "coding loop," that
547	   is, it affects the reconstructed frame buffers that are used to
548	   predict ensuing frames.  This is distinguished from the
549	   postprocessing filters discussed earlier which affect only the viewed
550	   video and do not "feed into" subsequent frames.

552	   Next, if signaled in the data, the current frame (or individual
553	   macroblocks within the current frame) may replace the golden frame
554	   prediction buffer and/or the altref frame buffer.

556	   The halos of the frame buffers are next filled as specified above.
557	   Finally, at least as far as decoding is concerned, the (references
558	   to) the "current" and "last" frame buffers should be exchanged in
559	   preparation for the next frame.

561	   Various processes may be required (or desired) before viewing the
562	   generated frame.  As discussed in the frame dimension information
563	   below, truncation and/or upscaling of the frame may be required.
564	   Some playback systems may require a different frame format (RGB,
565	   YUY2, etc.).  Finally, as mentioned in the introduction, further
566	   postprocessing or filtering of the image prior to viewing may be
567	   desired.  Since the primary purpose of this document is a decoding
568	   specification, the postprocessing is not specified in this document.

570	   While the basic ideas of prediction and correction used by VP8 are
571	   straightforward, many of the details are quite complex.  The
572	   management of probabilities is particularly elaborate.  Not only do
573	   the various modes of intra-prediction and motion vector specification
574	   have associated probabilities but they, together with the coding of
575	   DCT coefficients and motion vectors, often base these probabilities
576	   on a variety of contextual information (calculated from what has been
577	   decoded so far), as well as on explicit modification via the frame
578	   header.

580	   The "top-level" of decoding and frame reconstruction is implemented
581	   in the reference decoder files onyxd_if.c and decodframe.c.

583	   This concludes our summary of decoding and reconstruction; we
584	   continue by discussing the individual aspects in more depth.

586	   A reasonable "divide and conquer" approach to implementation of a
587	   decoder is to begin by decoding streams composed exclusively of key
588	   frames.  After that works reliably, interframe handling can be added
589	   more easily than if complete functionality were attempted
590	   immediately.  In accordance with this, we first discuss components
591	   needed to decode key frames (most of which are also used in the
592	   decoding of interframes) and conclude with topics exclusive to
593	   interframes.

595	6.  Description of Algorithms

597	   As the intent of this document, together with the reference decoder
598	   source code, is to specify a platform-independent procedure for the
599	   decoding and reconstruction of a VP8 video stream, many (small)
600	   algorithms must be described exactly.

602	   Due to its near-universality, terseness, ability to easily describe
603	   calculation at specific precisions, and the fact that On2's reference
604	   VP8 decoder is written in C, these algorithm fragments are written
605	   using the C programming language, augmented with a few simple
606	   definitions below.

608	   The standard (and best) reference for C is The C Programming
609	   Language, written by Brian W. Kernighan and Dennis M. Ritchie, and
610	   published by Prentice-Hall.

612	   Many code fragments will be presented in this document.  Some will be
613	   nearly identical to corresponding sections of the reference decoder;
614	   others will differ.  Roughly speaking, there are three reasons for
615	   such differences:

617	   1.  For reasons of efficiency, the reference decoder version may be
618	       less obvious.

620	   2.  The reference decoder often uses large data structures to
621	       maintain context that need not be described or used here.

623	   3.  The authors of this document felt that a different expression of
624	       the same algorithm might facilitate exposition.

626	   Regardless of the chosen presentation, the calculation effected by
627	   any of the algorithms described here is identical to that effected by
628	   the corresponding portion of the reference decoder.

630	   All VP8 decoding algorithms use integer math.  To facilitate
631	   specification of arithmetic precision, we define the following types.

633	   ---- Begin code block --------------------------------------

635	   typedef   signed char  int8; /* signed int exactly 8 bits wide */
636	   typedef unsigned char uint8; /* unsigned "" */

638	   typedef short int16;         /* signed int exactly 16 bits wide */
639	   typedef unsigned int16 uint16; /* unsigned "" */

641	   /* int32 is a signed integer type at least 32 bits wide */

643	   typedef long int32; /* guaranteed to work on all systems */
644	   typedef int  int32; /* will be more efficient on some systems */

646	   typedef unsigned int32 uint32;

648	   /* unsigned integer type, at least 16 bits wide, whose exact size
649	      is most convenient to whatever processor we are using */

651	   typedef unsigned int uint;

653	   /* While pixels themselves are 8-bit unsigned integers,
654	      pixel arithmetic often occurs at 16- or 32-bit precision and
655	      the results need to be "saturated" or clamped to an 8-bit
656	      range. */

658	   typedef uint8 Pixel;

660	   Pixel clamp255( int32 v) { return v < 0? 0 : (v < 255? v : 255);}

662	   /*  As is elaborated in the discussion of the bool_decoder below,
663	       VP8 represents probabilities as unsigned 8-bit numbers. */

665	   typedef uint8 Prob;

667	   ---- End code block ----------------------------------------

669	   We occasionally need to discuss mathematical functions involving
670	   honest-to-goodness "infinite precision" real numbers.  The DCT is
671	   first described via the cosine function cos; the ratio of the lengths
672	   of the circumference and diameter of a circle is denoted pi; at one
673	   point, we take a (base 1//2) logarithm denoted log; and pow( x, y)
674	   denotes x raised to the power y.  If x = 2 and y is a small non-
675	   negative integer, pow( 2, y) may be expressed in C as 1 << y.

677	   Finally, we sometimes need to divide signed integers by powers of
678	   two, that is, we occasionally right-shift signed numbers.  The
679	   behavior of such shifts (i.e., the propagation of the sign bit) is,
680	   perhaps surprisingly, not defined by the C language itself and is
681	   left up to individual compilers.  Because of the utility of this
682	   frequently needed operation, it is at least arguable that it should
683	   be defined by the language (to naturally propagate the sign bit) and,
684	   at a minimum, should be correctly implemented by any reasonable
685	   compiler.  In the interest of strict portability, we attempt to call
686	   attention to these shifts when they arise.

688	7.  Boolean Entropy Decoder

690	   As discussed in the overview above, essentially the entire VP8 data
691	   stream is encoded using a boolean entropy coder.

693	   An understanding of the bool_decoder is critical to the
694	   implementation of a VP8 decompressor, so we discuss in detail.  It is
695	   easier to comprehend the bool_decoder in conjunction with the
696	   bool_encoder used by the compressor to write the compressed data
697	   partitions.

699	   The bool_encoder encodes (and the bool_decoder decodes) one bool
700	   (zero-or-one boolean value) at a time.  Its purpose is to losslessly
701	   compress a sequence of bools for which the probability of their being
702	   zero or one can be well-estimated (via constant or previously-coded
703	   information) at the time they are written, using identical
704	   corresponding probabilities at the time they are read.

706	   As the reader is probably aware, if a bool is much more likely to be
707	   zero than one (for instance), it can, on average, be faithfully
708	   encoded using much less than one bit per value.  The bool_encoder
709	   exploits this.

711	   In the 1940s, Claude Shannon proved that there is a lower bound for
712	   the average datarate of a faithful encoding of a sequence of bools
713	   (whose probability distributions are known and are independent of
714	   each other) and also that there are encoding algorithms that
715	   approximate this lower bound as closely as one wishes.

717	   If we encode a sequence of bools whose probability of being zero is p
718	   (and whose probability of being 1 is 1-p), the lowest possible
719	   datarate per value is

721	   plog(p) + (1-p)log(1-p);

723	   taking the logarithms to the base 1//2 expresses the datarate in
724	   bits/value.

726	   We give two simple examples.  At one extreme, if p=1//2, then log(p)
727	   = log(1-p) = 1 and the lowest possible datarate per bool is 1//2 +
728	   1//2 = 1, that is, we cannot do any better than simply literally
729	   writing out bits.  At another extreme, if p is very small, say p=1//
730	   1024, then log(p)=10, log(1-p) is roughly .0014, and the lowest
731	   possible datarate is approximately 10//1024 + .0014, roughly 1/100 of
732	   a bit per bool.

734	   Because most of the bools in the VP8 datastream have zero-
735	   probabilities nowhere near 1//2, the compression provided by the
736	   bool_encoder is critical to the performance of VP8.

738	   The bool coder used by VP8 is a variant of an arithmetic coder.  An
739	   excellent discussion of arithmetic coding (and other lossless
740	   compression techniques) can be found in the book Text Compression by
741	   Timothy C. Bell, John G. Cleary, and Ian H. Witten, published in 1990
742	   by Prentice-Hall.

744	7.1.  Underlying Theory of Coding

746	   The basic idea used by the bool coder is to consider the entire data
747	   stream (either of the partitions in our case) as the binary expansion
748	   of a single number x with 0 <= x < 1.  The bits (or bytes) in x are
749	   of course written from high to low order and if b[j] (B[j]) is the
750	   j^(th) bit (byte) in the partition, the value x is simply the sum
751	   (starting with j = 1) of pow(2, -j) * b[j] or pow(256, -j) * B[j].

753	   Before the first bool is coded, all values of x are possible.

755	   The coding of each bool restricts the possible values of x in
756	   proportion to the probability of what is coded.  If p1 is the
757	   probability of the first bool being zero and a zero is coded, the
758	   range of possible x is restricted to 0 <= x < p1.  If a one is coded,
759	   the range becomes p1 <= x < 1.

761	   The coding continues by repeating the same idea.  At every stage,
762	   there is an interval a <= x < b of possible values of x.  If p is the
763	   probability of a zero being coded at this stage and a zero is coded,
764	   the interval becomes a <= x < a + (p(b-a)).  If a one is coded, the
765	   possible x are restricted to a + (p(b-a)) <= x < b.

767	   Assuming only finitely many values are to be coded, after the encoder
768	   has received the last bool, it can write as its output any value x
769	   that lies in the final interval.  VP8 simply writes the left endpoint
770	   of the final interval.  Consequently, the output it would make if
771	   encoding were to stop at any time either increases or stays the same
772	   as each bool is encoded.

774	   Decoding parallels encoding.  The decoder is presented with the
775	   number x, which has only the initial restriction 0 <= x < 1.  To
776	   decode the first bool, the decoder is given the first probability p1.
777	   If x < p1, a zero is decoded; if x >= p1, a one is decoded.  In
778	   either case, the new restriction on x, that is, the interval of
779	   possible x, is remembered.

781	   Decoding continues in exactly the same way: If a <= x < b is the
782	   current interval and we are to decode a bool with zero-probability p,
783	   we return a zero if a <= x < a + (p(b-a)) and a one if a + (p(b-a))
784	   <= x < b.  In either case, the new restriction is remembered in
785	   preparation for decoding the next bool.

787	   The process outlined above uses real numbers of infinite precision to
788	   express the probabilities and ranges.  It is true that, if one could
789	   actualize this process and coded a large number of bools whose
790	   supplied probabilities matched their value distributions, the
791	   datarate achieved would approach the theoretical minimum as the
792	   number of bools encoded increased.

794	   Unfortunately, computers operate at finite precision and an
795	   approximation to the theoretically perfect process described above is
796	   necessary.  Such approximation increases the datarate but, at quite
797	   moderate precision and for a wide variety of data sets, this increase
798	   is negligible.

800	   The only conceptual limitations are, first, that coder probabilities
801	   must be expressed at finite precision and, second, that the decoder
802	   be able to detect each individual modification to the value interval
803	   via examination of a fixed amount of input.  As a practical matter,
804	   many of the implementation details stem from the fact that the coder
805	   can function using only a small "window" to incrementally read or
806	   write the arbitrarily precise number x.

808	7.2.  Practical Algorithm Description

810	   VP8's bool coder works with 8-bit probabilities p.  The range of such
811	   p is 0 <= p <= 255; the actual probability represented by p is
812	   p//256.  Also, the coder is designed so that decoding of a bool
813	   requires no more than an 8-bit comparison and so that the state of
814	   both the encoder and decoder can be easily represented using a small
815	   number of unsigned 16-bit integers.

817	   The details are most easily understood if we first describe the
818	   algorithm using bit-at-a-time input and output.  Aside from the
819	   ability to maintain a position in this bitstream and write/read bits,
820	   the encoder also needs the ability to add 1 to the bits already
821	   output; after writing n bits, adding 1 to the existing output is the
822	   same thing as adding pow( 2, -n) to x.

824	   Together with the bit position, the encoder must maintain two
825	   unsigned 8-bit numbers which we call "bottom" and "range".  Writing w
826	   for the n bits already written and S = pow( 2, - n - 8) for the scale
827	   of the current bit position one byte out, we have the following
828	   constraint on all future values v of w (including the final value v =
829	   x):

831	   w + ( S * "bottom" ) <= v < w + ( S * ( "bottom" + "range" ) )
832	   Thus, appending "bottom" to the already-written bits w gives the left
833	   endpoint of the interval of possible values, appending "bottom" +
834	   "range" gives the right endpoint, "range" itself (scaled to the
835	   current output position) is the length of the interval.

837	   So that our probabilistic encodings are reasonably accurate, we do
838	   not let "range" vary by more than a factor of two: It stays within
839	   the bounds 128 <= "range" <= 255.

841	   The process for encoding a boolean value "val" whose probability of
842	   being zero is "prob" / 256 -- and whose probability of being one is (
843	   256 - "prob" ) / 256 -- with 1 <= "prob" <= 255 is as follows.

845	   Using an unsigned 16-bit multiply followed by an unsigned right
846	   shift, we calculate an unsigned 8-bit "split" value:

848	   split = 1 + ((("range" - 1) * "probability")]] >> 8)

850	   "split" is approximately ( "prob" / 256 ) * "range" and lies within
851	   the bounds 1 <= "split" <= "range" - 1.  These bounds ensure the
852	   correctness of the decoding procedure described below.

854	   If "val" is false, we leave the left interval endpoint "bottom" alone
855	   and reduce "range", replacing it by "split".  If "val" is true, we
856	   move up the left endpoint to "bottom" + "split", propagating any
857	   carry to the already-written value "w" (this is where we need the
858	   ability to add 1 to "w"), and reduce "range" to "range" - "split".

860	   Regardless of the value encoded, "range" has been reduced and now has
861	   the bounds 1 <= "range" <= 254.  If "range" < 128, the encoder
862	   doubles it and shifts the high-order bit out of "bottom" to the
863	   output as it also doubles "bottom", repeating this process one bit at
864	   a time until 128 <= "range" <= 255.  Once this is completed, the
865	   encoder is ready to accept another bool, maintaining the constraints
866	   described above.

868	   After encoding the last bool, the partition may be completed by
869	   appending "bottom" to the bitstream.

871	   The decoder mimics the state of the encoder.  It maintains, together
872	   with an input bit position, two unsigned 8-bit numbers, a "range"
873	   identical to that maintained by the encoder and a "value".  Decoding
874	   one bool at a time, the decoder (in effect) tracks the same left
875	   interval endpoint as does the encoder and subtracts it from the
876	   remaining input.  Appending the unread portion of the bitstream to
877	   the 8-bit "value" gives the difference between the actual value
878	   encoded and the known left endpoint.

880	   The decoder is initialized by setting "range" = 255 and reading the
881	   first 16 input bits into "value".  The decoder maintains "range" and
882	   calculates "split" in exactly the same way as does the encoder.

884	   To decode a bool, it compares "value" to "split"; if "value" <
885	   "split", the bool is zero, and "range" is replaced with "split".  If
886	   "value" >= "split", the bool is one, "range" is replaced with "range"
887	   - "split", and "value" is replaced with "value" - "split".

889	   Again, "range" is doubled one bit at a time until it is at least 128.
890	   The "value" is doubled in parallel, shifting a new input bit into the
891	   "bottom" each time.

893	   Writing "Value" for "value" together with the unread input bits and
894	   "Range" for "range" extended indefinitely on the right by zeros, the
895	   condition "Value" < "Range" is maintained at all times by the
896	   decoder.  In particular, the bits shifted out of "value" as it is
897	   doubled are always zero.

899	7.3.  Actual Implementation

901	   The C code below gives complete implementations of the encoder and
902	   decoder described above.  While they are logically identical to the
903	   "bit-at-a-time" versions, they internally buffer a couple of extra
904	   bytes of the bitstream.  This allows I/O to be done (more
905	   practically) a byte at a time and drastically reduces the number of
906	   carries the encoder has to propagate into the already-written data.

908	   Another (logically equivalent) implementation may be found in the
909	   reference decoder files dboolhuff.h and dboolhuff.c.

911	   ---- Begin code block --------------------------------------

913	   /* Encoder first */

915	   typedef struct {
916	     uint8 *output;  /* ptr to next byte to be written */
917	     uint32 range;   /* 128 <= range <= 255 */
918	     uint32 bottom;  /* minimum value of remaining output */
919	     int bit_count;  /* # of shifts before an output byte
920	                        is available */
921	   } bool_encoder;

923	   /* Must set initial state of encoder before writing any bools. */

925	   void init_bool_encoder( bool_encoder *e, uint8 *start_partition)
926	   {
927	     e->output = start_partition;
928	     e->range = 255;

930	     e->bottom = 0;
931	     e->bit_count = 24;
932	   }

934	   /* Encoding very rarely produces a carry that must be propagated
935	      to the already-written output. The arithmetic guarantees that
936	      the propagation will never go beyond the beginning of the
937	      output. Put another way, the encoded value x is always less
938	      than one. */

940	   void add_one_to_output( uint8 *q)
941	   {
942	     while( *--q == 255)
943	       *q = 0;
944	     ++*q;
945	   }

947	   /* Main function writes a bool_value whose probability of being
948	      zero is (expected to be) prob/256. */

950	   void write_bool( bool_encoder *e, Prob prob, int bool_value)
951	   {
952	     /* split is approximately (range * prob) / 256  and,
953	        crucially, is strictly bigger than zero and strictly
954	        smaller than range */

956	     uint32 split = 1 + ( ((e->range - 1) * prob) >> 8);

958	     if( bool_value) {
959	       e->bottom += split; /* move up bottom of interval */
960	       e->range -= split;  /* with corresponding decrease in range */
961	     } else
962	       e->range = split;   /* decrease range, leaving bottom alone */

964	     while( e->range < 128)
965	     {
966	       e->range <<= 1;

968	       if( e->bottom & (1 << 31))  /* detect carry */
969	         add_one_to_output( e->output);

971	       e->bottom <<= 1;        /* before shifting bottom */

973	       if( !--e->bit_count) {  /* write out high byte of bottom ... */

975	         *e->output++ = (uint8) (e->bottom >> 24);
976	         e->bottom &= (1 << 24) - 1;  /* ... keeping low 3 bytes */

978	         e->bit_count = 8;            /* 8 shifts until next output */
979	       }
980	     }
981	   }

983	   /* Call this function (exactly once) after encoding the last
984	      bool value for the partition being written */

986	   void flush_bool_encoder( bool_encoder *e)
987	   {
988	     int c = e->bit_count;
989	     uint32 v = e->bottom;

991	     if( v & (1 << (32 - c)))   /* propagate (unlikely) carry */
992	       add_one_to_output( e->output);
993	     v <<= c & 7;               /* before shifting remaining output */
994	     c >>= 3;                   /* to top of internal buffer */
995	     while( --c >= 0)
996	       v <<= 8;
997	     c = 4;
998	     while( --c >= 0) {    /* write remaining data, possibly padded */
999	       *e->output++ = (uint8) (v >> 24);
1000	       v <<= 8;
1001	     }
1002	   }

1004	   /* Decoder state exactly parallels that of the encoder.
1005	      "value", together with the remaining input, equals the
1006	      complete encoded number x less the left endpoint of the
1007	      current coding interval. */

1009	   typedef struct {
1010	     uint8   *input;     /* pointer to next compressed data byte */
1011	     uint32  range;      /* always identical to encoder's range */
1012	     uint32  value;      /* contains at least 24 significant bits */
1013	     int     bit_count;  /* # of bits shifted out of
1014	                            value, at most 7 */
1015	   } bool_decoder;

1017	   /* Call this function before reading any bools from the
1018	      partition.*/

1020	   void init_bool_decoder( bool_decoder *d, uint8 *start_partition)
1021	   {
1022	     {
1023	       int i = 0;
1024	       d->value = 0;           /* value = first 24 input bytes */
1025	       while( ++i <= 24)
1026	         d->value = (d->value << 8)  |  *start_partition++;
1027	     }

1029	     d->input = start_partition;  /* ptr to next byte to be read */
1030	     d->range = 255;           /* initial range is full */
1031	     d->bit_count = 0;         /* have not yet shifted out any bits */
1032	   }

1034	   /* Main function reads a bool encoded at probability prob/256,
1035	      which of course must agree with the probability used when the
1036	      bool was written. */

1038	   int read_bool( bool_decoder *d, Prob prob)
1039	   {
1040	     /* range and split are identical to the corresponding values
1041	        used by the encoder when this bool was written */

1043	     uint32  split = 1 + ( ((d->range - 1) * prob) >> 8);
1044	     uint32  SPLIT = split << 8;
1045	     int     retval;           /* will be 0 or 1 */

1047	     if( d->value >= SPLIT) {  /* encoded a one */
1048	       retval = 1;
1049	       d->range -= split;  /* reduce range */
1050	       d->value -= SPLIT;  /* subtract off left endpoint of interval */
1051	     } else {              /* encoded a zero */
1052	       retval = 0;
1053	       d->range = split;  /* reduce range, no change in left endpoint */
1054	     }

1056	     while( d->range < 128) {  /* shift out irrelevant value bits */
1057	       d->value <<= 1;
1058	       d->range <<= 1;
1059	       if( ++d->bit_count == 8) {  /* shift in new bits 8 at a time */
1060	         d->bit_count = 0;
1061	         d->value |= *d->input++;
1062	       }
1063	     }
1064	     return retval;
1065	   }

1067	   /* Convenience function reads a "literal", that is, a "num_bits"
1068	      wide unsigned value whose bits come high- to low-order, with
1069	      each bit encoded at probability 128 (i.e., 1/2). */

1071	   uint32 read_literal( bool_decoder *d, int num_bits)
1072	   {
1073	     uint32 v = 0;
1074	     while( num_bits--)
1075	       v = (v << 1) + read_bool( d, 128);
1076	     return v;
1077	   }

1079	   /* Variant reads a signed number */

1081	   int32 read_signed_literal( bool_decoder *d, int num_bits)
1082	   {
1083	     int32 v = 0;
1084	     if( !num_bits)
1085	       return 0;
1086	     if( read_bool( d, 128))
1087	       v = -1;
1088	     while( --num_bits)
1089	       v = (v << 1) + read_bool( d, 128);
1090	     return v;
1091	   }

1093	   ---- End code block ----------------------------------------

1095	8.  Compressed Data Components

1097	   At the lowest level, VP8's compressed data is simply a sequence of
1098	   probabilistically-encoded bools.  Most of this data is composed of
1099	   (slightly) larger semantic units fashioned from bools, which we
1100	   describe here.

1102	   We sometimes use these descriptions in C expressions within data
1103	   format specifications.  In this context, they refer to the return
1104	   value of a call to an appropriate bool_decoder d, reading (as always)
1105	   from its current reference point.

1107	   +--------------+-------+--------------------------------------------+
1108	   | Call         | Alt.  | Return                                     |
1109	   +--------------+-------+--------------------------------------------+
1110	   | Bool(p)      | B(p)  | Bool with probability p of being 0.        |
1111	   |              |       | Abbreviated B(p). Return value of          |
1112	   |              |       | read_bool(d, p).                           |
1113	   |              |       |                                            |
1114	   | Flag         | F     | A one-bit flag (same thing as a B(128) or  |
1115	   |              |       | an L(1)). Abbreviated F. Return value of   |
1116	   |              |       | read_bool(d, 128).                         |
1117	   |              |       |                                            |
1118	   | Lit(n)       | L(n)  | Unsigned n-bit number encoded as n flags   |
1119	   |              |       | (a "literal"). Abbreviated L(n). The bits  |
1120	   |              |       | are read from high to low order. Return    |
1121	   |              |       | value of read_literal(d, n).               |
1122	   |              |       |                                            |
1123	   | SignedLit(n) |       | Signed n-bit number encoded similarly to   |
1124	   |              |       | an L(n). Return value of                   |
1125	   |              |       | read_signed_literal(d, n). These are rare. |
1126	   |              |       |                                            |
1127	   | P(8)         |       | An 8-bit probability. No different from an |
1128	   |              |       | L(8), but we sometimes use this notation   |
1129	   |              |       | to emphasize that a probability is being   |
1130	   |              |       | coded.                                     |
1131	   |              |       |                                            |
1132	   | P(7)         |       | A 7-bit specification of an 8-bit          |
1133	   |              |       | probability. Coded as an L(7) number x;    |
1134	   |              |       | the resulting 8-bit probability is x ? x   |
1135	   |              |       | << 1 : 1.                                  |
1136	   |              |       |                                            |
1137	   | F? X         |       | A flag which, if true, is followed by a    |
1138	   |              |       | piece of data X.                           |
1139	   |              |       |                                            |
1140	   | F? X:Y       |       | A flag which, if true, is followed by X    |
1141	   |              |       | and, if false, is followed by Y. Also used |
1142	   |              |       | to express a value where Y is an implicit  |
1143	   |              |       | default (not encoded in the data stream),  |
1144	   |              |       | as in F? P(8):255, which expresses an      |
1145	   |              |       | optional probability: if the flag is true, |
1146	   |              |       | the probability is specified as an 8-bit   |
1147	   |              |       | literal, while if the flag is false, the   |
1148	   |              |       | probability defaults to 255.               |
1149	   |              |       |                                            |
1150	   | B(p)? X      | B(p)? | Variants of the above using a boolean      |
1151	   |              | X:Y   | indicator whose probability is not         |
1152	   |              |       | necessarily 128.                           |
1153	   |              |       |                                            |
1154	   | X            |       | Multi-component field, the specifics of    |
1155	   |              |       | which will be given at a more appropriate  |
1156	   |              |       | point in the discussion.                   |
1157	   |              |       |                                            |
1158	   | T            |       | Tree-encoded value from small alphabet.    |
1159	   +--------------+-------+--------------------------------------------+

1161	   The last type requires elaboration.  We often wish to encode
1162	   something whose value is restricted to a small number of
1163	   possibilities (the alphabet).

1165	   This is done by representing the alphabet as the leaves of a small
1166	   binary tree.  The (non-leaf) nodes of the tree have associated
1167	   probabilities p and correspond to calls to read_bool(d, p).  We think
1168	   of a zero as choosing the left branch below the node and a one as
1169	   choosing the right branch.

1171	   Thus every value (leaf) whose tree depth is x is decoded after
1172	   exactly x calls to read_bool.

1174	   A tree representing an encoding of an alphabet of n possible values
1175	   always contains n-1 non-leaf nodes, regardless of its shape (this is
1176	   easily seen by induction on n).

1178	   There are many ways that a given alphabet can be so represented.  The
1179	   choice of tree has little impact on datarate but does affect decoder
1180	   performance.  The trees used by VP8 are chosen to (on average)
1181	   minimize the number of calls to read_bool.  This amounts to shaping
1182	   the tree so that more probable values have smaller tree depth than do
1183	   less probable values.

1185	   Readers familiar with Huffman coding will notice that, given an
1186	   alphabet together with probabilities for each value, the associated
1187	   Huffman tree minimizes the expected number of calls to read_bool.

1189	   Such readers will also realize that the coding method described here
1190	   never results in higher datarates than does the Huffman method and,
1191	   indeed, often results in much lower datarates.  Huffman coding is, in
1192	   fact, nothing more than a special case of this method in which each
1193	   node probability is fixed at 128 (i.e., 1/2).

1195	8.1.  Tree Coding Implementation

1197	   We give a suggested implementation of a tree data structure followed
1198	   by a couple of actual examples of its usage by VP8.

1200	   It is most convenient to represent the values using small positive
1201	   integers, typically an enum counting up from zero.  The largest
1202	   alphabet (used to code DCT coefficients, described in Chapter 13 that
1203	   is tree-coded by VP8 has only 12 values.  The tree for this alphabet
1204	   adds 11 interior nodes and so has a total of 23 positions.  Thus, an
1205	   8-bit number easily accommodates both a tree position and a return
1206	   value.

1208	   A tree may then be compactly represented as an array of (pairs of)
1209	   8-bit integers.  Each (even) array index corresponds to an interior
1210	   node of the tree;, the zeroth index of course corresponds to the root
1211	   of the tree.  The array entries come in pairs corresponding to the
1212	   left (0) and right (1) branches of the subtree below the interior
1213	   node.  We use the convention that a positive (even) branch entry is
1214	   the index of a deeper interior node, while a nonpositive entry v
1215	   corresponds to a leaf whose value is -v.

1217	   The node probabilities associated to a tree-coded value are stored in
1218	   an array whose indices are half the indices of the corresponding tree
1219	   positions.  The length of the probability array is one less than the
1220	   size of the alphabet.

1222	   Here is C code implementing the foregoing.  The advantages of our
1223	   data structure should be noted.  Aside from the smallness of the
1224	   structure itself, the tree-directed reading algorithm is essentially
1225	   a single line of code.

1227	   ---- Begin code block --------------------------------------

1229	   /* A tree specification is simply an array of 8-bit integers. */

1231	   typedef int8 tree_index;
1232	   typedef const tree_index Tree[];

1234	   /* Read and return a tree-coded value at the current decoder
1235	      position. */

1237	   int treed_read(
1238	     bool_decoder * const d, /* bool_decoder always returns a 0 or 1 */
1239	     Tree t,                 /* tree specification */
1240	     const Prob p[]     /* corresponding interior node probabilities */
1241	   ) {
1242	     register tree_index i = 0;   /* begin at root */

1244	     /* Descend tree until leaf is reached */

1246	     while( ( i = t[ i + read_bool( d, p[i>>1]) ] ) > 0) {}

1248	     return -i;     /* return value is negation of nonpositive index */
1249	   }

1251	   ---- End code block ----------------------------------------

1253	   Tree-based decoding is implemented in the reference decoder file
1254	   tree_reader.h.

1256	8.2.  Tree Coding Example

1258	   As a multi-part example, without getting too far into the semantics
1259	   of macroblock decoding (which is of course taken up below), we look
1260	   at the "mode" coding for intra-predicted macroblocks.

1262	   It so happens that, because of a difference in statistics, the Y (or
1263	   luma) mode encoding uses two different trees: one for key frames and
1264	   another for interframes.  This is the only instance in VP8 of the
1265	   same dataset being coded by different trees under different
1266	   circumstances.  The UV (or chroma) modes are a proper subset of the Y
1267	   modes and, as such, have their own decoding tree.

1269	   ---- Begin code block --------------------------------------

1271	   typedef enum
1272	   {
1273	       DC_PRED, /* predict DC using row above and column to the left */
1274	       V_PRED,  /* predict rows using row above */
1275	       H_PRED,  /* predict columns using column to the left */
1276	       TM_PRED, /* propagate second differences a la "true motion" */

1278	       B_PRED,  /* each Y subblock is independently predicted */

1280	       num_uv_modes = B_PRED,  /* first four modes apply to chroma */
1281	       num_ymodes   /* all modes apply to luma */
1282	   }
1283	   intra_mbmode;

1285	   /* The aforementioned trees together with the implied codings as
1286	      comments.
1287	      Actual (i.e., positive) indices are always even.
1288	      Value (i.e., nonpositive) indices are arbitrary. */

1290	   const tree_index ymode_tree [2 * (num_ymodes - 1)] =
1291	   {
1292	    -DC_PRED, 2,        /* root: DC_PRED = "0", "1" subtree */
1293	     4, 6,              /* "1" subtree has 2 descendant subtrees */
1294	      -V_PRED, -H_PRED, /* "10" subtree: V_PRED = "100",
1295	                           H_PRED = "101" */
1296	      -TM_PRED, -B_PRED /* "11" subtree: TM_PRED = "110",
1297	                           B_PRED = "111" */
1298	   };

1300	   const tree_index kf_ymode_tree [2 * (num_ymodes - 1)] =
1301	   {
1302	    -B_PRED, 2,            /* root: B_PRED = "0", "1" subtree */
1303	     4, 6,                 /* "1" subtree has 2 descendant subtrees */
1304	      -DC_PRED, -V_PRED,   /* "10" subtree: DC_PRED = "100",
1305	                              V_PRED = "101" */
1306	      -H_PRED, -TM_PRED    /* "11" subtree: H_PRED = "110",
1307	                              TM_PRED = "111" */
1308	   };

1310	   const tree_index uv_mode_tree [2 * (num_uv_modes - 1)] =
1311	   {
1312	    -DC_PRED, 2,          /* root: DC_PRED = "0", "1" subtree */
1313	     -V_PRED, 4,          /* "1" subtree:  V_PRED = "10",
1314	                             "11" subtree */
1315	      -H_PRED, -TM_PRED   /* "11" subtree: H_PRED = "110",
1316	                             TM_PRED = "111" */
1317	   };

1319	   /* Given a bool_decoder d, a Y mode might be decoded as follows.*/

1321	   const Prob pretend_its_huffman [num_ymodes - 1] =
1322	     { 128, 128, 128, 128};

1324	   Ymode = (intra_mbmode) treed_read( d, ymode_tree,
1325	     pretend_its_huffman);

1327	   ---- End code block ----------------------------------------

1329	   Since it greatly facilitates re-use of reference code and since there
1330	   is no real reason to do otherwise, it is strongly suggested that any
1331	   decoder implementation use exactly the same enumeration values and
1332	   probability table layouts as described in this document (and in the
1333	   reference code) for all tree-coded data in VP8.

1335	9.  Frame Header

1337	   The uncompressed data chunk at the start of each frame and the first
1338	   part of the first data partition contains information pertaining to
1339	   the frame as a whole.  We list the fields in the order of occurrence,
1340	   giving details for some of the fields.  Other details are postponed
1341	   until a more logical point in our overall description.  Most of the
1342	   header decoding occurs in the reference decoder file decodeframe.c.

1344	9.1.  Uncompressed Data Chunk

1346	   The uncompressed data chunk comprises a common (for key frames and
1347	   interframes) 3-byte frame tag that contains four fields, as follows:

1349	   1.  A 1-bit frame type (0 for key frames, 1 for interframes).

1351	   2.  A 3-bit version number (0 - 3 are defined as four different
1352	       profiles with different decoding complexity; other values may be
1353	       defined for future variants of the VP8 data format).

1355	   3.  A 1-bit show_frame flag (0 when current frame is not for display,
1356	       1 when current frame is for display).

1358	   4.  A 19-bit field containing the size of the first data partition in
1359	       bytes.

1361	   Version number enables or disables certain features in the bitstream,
1362	   as follows:

1364	            +---------+-------------------------+-------------+
1365	            | Version | Reconstruction filter   | Loop filter |
1366	            +---------+-------------------------+-------------+
1367	            | 0       | Bicubic                 | Normal      |
1368	            |         |                         |             |
1369	            | 1       | Bilinear                | Simple      |
1370	            |         |                         |             |
1371	            | 2       | Bilinear                | None        |
1372	            |         |                         |             |
1373	            | 3       | None                    | None        |
1374	            |         |                         |             |
1375	            | Other   | Reserved for future use |             |
1376	            +---------+-------------------------+-------------+

1378	   The reference software also adjusts the loop filter based on version
1379	   number, as per the table above.  Version number 1 implies a "simple"
1380	   loop filter and version numbers 2 and 3 imply no loop filter.
1381	   However, the "simple" filter setting in this context has no effect
1382	   whatsoever on the decoding process, and the "no loop filter" setting
1383	   only forces the reference encoder to set filter level equal to 0.
1384	   Neither affect the decoding process.  In decoding, the only loop
1385	   filter settings that matter are those in the frame header.

1387	   For key frames the frame tag is followed by a further 7 bytes of
1388	   uncompressed data, as follows:

1390	   ---- Begin code block --------------------------------------

1392	   Start code byte 0     0x9d
1393	   Start code byte 1     0x01
1394	   Start code byte 2     0x2a

1396	   16 bits      :     (2 bits Horizontal Scale << 14) | Width (14 bits)
1397	   16 bits      :     (2 bits Vertical Scale << 14) | Height (14 bits)

1399	   ---- End code block ----------------------------------------

1401	   The following source code segment illustrates validation of the start
1402	   code and reading the width, height and scale factors for a key frame.

1404	   ---- Begin code block --------------------------------------

1406	   unsigned char *c = pbi->Source+3;

1408	   // vet via sync code
1409	   if(c[0]!=0x9d||c[1]!=0x01||c[2]!=0x2a)
1410	       return -1;

1412	   ---- End code block ----------------------------------------

1414	   where pbi->source points to the beginning of the frame.

1416	   The following code reads the image dimension from the bitstream:

1418	   ---- Begin code block --------------------------------------

1420	   pc->Width      = swap2(*(unsigned short*)(c+3))&0x3fff;
1421	   pc->horiz_scale = swap2(*(unsigned short*)(c+3))>>14;
1422	   pc->Height     = swap2(*(unsigned short*)(c+5))&0x3fff;
1423	   pc->vert_scale  = swap2(*(unsigned short*)(c+5))>>14;

1425	   ---- End code block ----------------------------------------

1427	   where swap2 macro takes care of the endian on different platform:

1429	   ---- Begin code block --------------------------------------

1431	   #if defined(__ppc__) || defined(__ppc64__)
1432	   # define swap2(d)  \
1433	     ((d&0x000000ff)<<8) |  \
1434	     ((d&0x0000ff00)>>8)
1435	   #else
1436	     # define swap2(d) d
1437	   #endif

1439	   ---- End code block ----------------------------------------

1441	   While each frame is encoded as a raster scan of 16x16 macroblocks,
1442	   the frame dimensions are not necessarily evenly divisible by 16.  In
1443	   this case, write ew = 16 - (width & 15) and eh = 16 - (height & 15)
1444	   for the excess width and height, respectively.  Although they are
1445	   encoded, the last ew columns and eh rows are not actually part of the
1446	   image and should be discarded before final output.  However, these
1447	   "excess pixels" should be maintained in the internal reconstruction
1448	   buffer used to predict ensuing frames.

1450	   The scaling specifications for each dimension are encoded as follows.

1452	             +-------+--------------------------------------+
1453	             | Value | Scaling                              |
1454	             +-------+--------------------------------------+
1455	             | 0     | No upscaling (the most common case). |
1456	             |       |                                      |
1457	             | 1     | Upscale by 5/4.                      |
1458	             |       |                                      |
1459	             | 2     | Upscale by 5/3.                      |
1460	             |       |                                      |
1461	             | 3     | Upscale by 2.                        |
1462	             +-------+--------------------------------------+

1464	   Upscaling does not affect the reconstruction buffer, which should be
1465	   maintained at the encoded resolution.  Any reasonable method of
1466	   upsampling (including any that may be supported by video hardware in
1467	   the playback environment) may be used.  Since scaling has no effect
1468	   on decoding, we do not discuss it any further.

1470	   As discussed in Chapter 5, allocation (or re-allocation) of data
1471	   structures (such as the reconstruction buffer) whose size depends on
1472	   dimension will be triggered here.

1474	9.2.  Color Space and Pixel Type (Key Frames-only)

1476	           +-------+------------------------------------------+
1477	           | Field | Value                                    |
1478	           +-------+------------------------------------------+
1479	           | L(1)  | 1-bit color space type specification     |
1480	           |       |                                          |
1481	           | L(1)  | 1-bit pixel value clamping specification |
1482	           +-------+------------------------------------------+

1484	   The color space type bit is encoded as the following:

1486	   o  0 - YUV color space similar to the YCrCb color space defined in
1487	      ITU-R BT.601

1489	   o  1 - YUV color space whose digital conversion to RGB does not
1490	      involve multiplication and division

1492	   It should be noted that in either case, the actual conversion between
1493	   YUV and RGB is not part of this specification.

1495	   Note: In the initial release of VP8 only color space type 0 is
1496	   supported.

1498	   The pixel value clamping type bit is encoded as the following:

1500	   o  0 - Decoders are required to clamp the reconstructed pixel values
1501	      to between 0 and 255 (inclusive).

1503	   o  1 - Reconstructed pixel values are guaranteed to be between 0 and
1504	      255, no clamping is necessary.

1506	   Information in this subsection does not appear in interframes.

1508	9.3.  Segment-based Adjustments

1510	   This subsection contains probability and value information for
1511	   implementing segment adaptive adjustments to default decoder
1512	   behaviors.  The data in this section is used in the decoding of the
1513	   ensuing per-segment information and applies to the entire frame.
1514	   When segment adaptive adjustments are enabled, each macroblock will
1515	   be assigned a segment ID.  Macroblocks with the same segment ID
1516	   belong to same segment, and have the same adaptive adjustments over
1517	   default baseline values for the frame.  The adjustments can be
1518	   quantization level or loop filter strength.

1520	   The context for decoding this feature is provided by section B of the
1521	   frame header.  It contains:

1523	   1.  A segmentation_enabled Flag which if 1 (0), enables (disables)
1524	       the feature for this frame.  The remaining fields occur if the
1525	       feature is enabled.

1527	   2.  L(1) indicates if the segment map is updated for the current
1528	       frame (update_mb_segmentaton_map)

1530	   3.  L(1) indicates if the segment feature data items are updated for
1531	       the current frame

1533	   4.  If flag in 3 is 1, the following fields occur:

1535	       1.  L(1) the mode of segment feature data, can be absolute value
1536	           mode or delta value mode, later mode, feature data is the
1537	           difference against current frame defaults.

1539	       2.  Segment feature data items are decoded segment by each
1540	           segment for each segment feature.  For every data item, a one
1541	           bit flag indicating if the item is 0 or a non-zero value to
1542	           be decoded.  If there is non-zero value, the value is decoded
1543	           as a magnitude L(n) followed by a one bit sign (L(1), 0 for
1544	           positive and 1 for negative).  The length n can be looked up
1545	           from a pre-defined length table for all feature data.

1547	   5.  If flag in 2 is 1, the probabilities of the decoding tree for
1548	       segment map are decoded from the bitstream.  Each probability is
1549	       decoded with one bit flag indicating if the probability is the
1550	       default value of 255 (flag is 0), or the probability is an 8-bit
1551	       value, L(8), from the bitstream.

1553	   The layout and semantics supporting this feature at the macroblock
1554	   level will be described in Chapter 10.

1556	9.4.  Loop Filter Type and Levels

1558	   VP8 supports two types of loop filter, having different computational
1559	   complexity.  The following bits occur in the header to support the
1560	   selection of the baseline type, strength and sharpness behavior of
1561	   the loop filter used for the current frame.

1563	                       +-------+-------------------+
1564	                       | Index | Description       |
1565	                       +-------+-------------------+
1566	                       | L(1)  | filter_type       |
1567	                       |       |                   |
1568	                       | L(6)  | loop_filter_level |
1569	                       |       |                   |
1570	                       | L(3)  | sharpness_level   |
1571	                       +-------+-------------------+

1573	   The meaning of these numbers will be further explained in Chapter 15.

1575	   VP8 has a feature in the bitstream that enables adjustment of the
1576	   loop filter level based on a macroblock's prediction mode and
1577	   reference frame.  The per-macroblock adjustment is done through delta
1578	   values against default loop filter level for the current frame.  This
1579	   subsection contains flag and value information for implementing per-
1580	   macroblock loop filter level adjustment to default decoder behaviors.
1581	   The data in this section is used in the decoding of the ensuing per-
1582	   macroblock information and applies to the entire frame.

1584	   L(1) is a one-bit flag indicating if macroblock loop filter
1585	   adjustment is on for the current frame. 0 means such feature is not
1586	   supported in the current frame and 1 means this feature is enabled
1587	   for the current frame.

1589	   Whether the adjustment is based on reference frame or encoding mode,
1590	   the adjustment of loop filter level is done via a delta value against
1591	   a baseline loop filter value.  The delta values are updated for the
1592	   current frame if an L(1) bit, mode_ref_lf_delta_update, takes the
1593	   value 1.  There are two groups of delta values, one group of delta
1594	   values are for reference frame-based adjustments, the other group is
1595	   for mode-based adjustments.  The number of delta values in the two
1596	   groups is MAX_REF_LF_DELTAS and MAX_MODE_LF_DELTAS, respectively.
1597	   For every value within the two groups, there is one bit L(1) to
1598	   indicate if the particular value is updated.  When one is updated
1599	   (1), it is transmitted as a six-bit magnitude L(6) followed by a one-
1600	   bit sign flag (L(1), 0 for positive and 1 for negative).

1602	9.5.  Token Partition and Partition Data Offsets

1604	   VP8 allows DCT coefficients to be packed into multiple partitions
1605	   besides the first partition with header and per-macroblock prediction
1606	   information, so the decoder can perform parallel decoding in an
1607	   efficient manner.  There are two bits L(2) used to indicate the
1608	   number of coefficient data partitions within a compressed frame.  The
1609	   two bits are defined in the following table:

1611	                 +-------+-------+----------------------+
1612	                 | Bit 1 | Bit 0 | Number of Partitions |
1613	                 +-------+-------+----------------------+
1614	                 | 0     | 0     | 1                    |
1615	                 |       |       |                      |
1616	                 | 0     | 1     | 2                    |
1617	                 |       |       |                      |
1618	                 | 1     | 0     | 4                    |
1619	                 |       |       |                      |
1620	                 | 1     | 1     | 8                    |
1621	                 +-------+-------+----------------------+

1623	   When the number of partitions is greater than one, offsets are
1624	   embedded in the bitstream to provide the decoder direct access to
1625	   token partitions.  Each offset is written in 3 bytes (24 bits).
1626	   Since the offset to the first partition is always 0, only the offsets
1627	   for partitions other than the first partition are encoded in the
1628	   bitstream.  The partitioned data are consecutive in the bitstream, so
1629	   offsets can also be used to calculate the data size of each
1630	   partition.  The following pseudo code illustrates how the size/offset
1631	   is defined by the three bytes in the bitstream.

1633	   ---- Begin code block --------------------------------------

1635	   Offset/size  =  (uint32)(byte0) + ((uint32)(byte1)<<8)
1636	     + ((uint32)(byte2)<<16);

1638	   ---- End code block ----------------------------------------

1640	9.6.  Dequantization Indices

1642	   All residue signals are specified via a quantized 4x4 DCT applied to
1643	   the Y, U, V, or Y2 subblocks of a macroblock.  As detailed in Chapter
1644	   14, before inverting the transform, each decoded coefficient is
1645	   multiplied by one of six dequantization factors, the choice of which
1646	   depends on the plane (Y, chroma = U or V, Y2) and coefficient
1647	   position (DC = coefficient 0, AC = coefficients 1-15).  The six
1648	   values are specified using 7-bit indices into six corresponding fixed
1649	   tables (the tables are given in Chapter 14).

1651	   The first 7-bit index gives the dequantization table index for Y
1652	   plane AC coefficients, called yac_qi.  It is always coded and acts as
1653	   a baseline for the other 5 quantization indices, each of which is
1654	   represented by a delta from this baseline index.  Following is pseudo
1655	   code for reading the indices:

1657	   ---- Begin code block --------------------------------------

1659	   yac_qi     = L(7);           /* Y ac index always specified */
1660	   ydc_delta  = F? delta(): 0;  /* Y dc delta specified if
1661	                                   flag is true */

1663	   y2dc_delta = F? delta(): 0;  /* Y2 dc delta specified if
1664	                                   flag is true */
1665	   y2ac_delta = F? delta(): 0;  /* Y2 ac delta specified if
1666	                                   flag is true */

1668	   uvdc_delta = F? delta(): 0;  /* chroma dc delta specified
1669	                                   if flag is true */
1670	   uvac_delta = F? delta(): 0;  /* chroma ac delta specified
1671	                                   if flag is true */

1673	   ---- End code block ----------------------------------------

1675	   Where delta() is the process to read 5 bits from the bitstream to
1676	   determine a signed delta value:

1678	       +-------+--------------------------------------------------+
1679	       | Index | Description                                      |
1680	       +-------+--------------------------------------------------+
1681	       | L(4)  | Magnitude of delta                               |
1682	       |       |                                                  |
1683	       | L(1)  | Sign of delta, 0 for positive and 1 for negative |
1684	       +-------+--------------------------------------------------+

1686	9.7.  Refresh Golden Frame and AltRef Frame

1688	   For key frames, both golden frame and altref frame are refreshed/
1689	   replaced by the current reconstructed frame, by default.  For non-key
1690	   frames, VP8 uses two bits to indicate whether the two frame buffers
1691	   are refreshed, using the reconstructed current frame:

1693	   +-------+-----------------------------------------------------------+
1694	   | Index | Description                                               |
1695	   +-------+-----------------------------------------------------------+
1696	   | L(1)  | Whether golden frame is refreshed (0 complete. This flag  |
1697	   |       | does not occur for no, 1 for yes).                        |
1698	   |       |                                                           |
1699	   | L(1)  | Whether altref frame is refreshed (0 for no, 1 for yes).  |
1700	   +-------+-----------------------------------------------------------+

1702	   When the flag for golden frame is 0, VP8 uses 2 more bits in the
1703	   bitstream to indicate whether the buffer (and which buffer) is copied
1704	   to the golden frame, or if no buffer is copied:

1706	           +-------+------------------------------------------+
1707	           | Index | Description                              |
1708	           +-------+------------------------------------------+
1709	           | L(2)  | Buffer copy flag for golden frame buffer |
1710	           +-------+------------------------------------------+

1712	   Where:

1714	   o  0 means no buffer is copied to golden frame

1716	   o  1 means last_frame is copied to golden frame

1718	   o  2 means alt_ref_frame is copied to golden frame

1720	   Similarly, when the flag for altref is 0, VP8 uses 2 bits in the
1721	   bitstream to indicate which buffer is copied to alt_ref_frame.

1723	           +-------+------------------------------------------+
1724	           | Index | Description                              |
1725	           +-------+------------------------------------------+
1726	           | L(2)  | Buffer copy flag for altref frame buffer |
1727	           +-------+------------------------------------------+

1729	   Where:

1731	   o  0 means no buffer is copied to altref frame

1733	   o  1 means last_frame is copied to altref frame

1735	   o  2 means golden_frame is copied to altref frame

1737	   Two bits are transmitted for ref_frame_sign_bias for golden_frame and
1738	   alt_ref_frame respectively.

1740	                +-------+---------------------------------+
1741	                | Index | Description                     |
1742	                +-------+---------------------------------+
1743	                | L(1)  | Sign bias flag for golden frame |
1744	                |       |                                 |
1745	                | L(1)  | Sign bias flag for altref frame |
1746	                +-------+---------------------------------+

1748	   These values are used to control the sign of the motion vectors when
1749	   a golden frame or an altref frame is used as the reference frame for
1750	   a macroblock.

1752	9.8.  Refresh Last Frame Buffer

1754	   VP8 uses one bit, L(1), to indicate if the last frame reference
1755	   buffer is refreshed using the constructed current frame.  On key
1756	   frame this bit is overridden, and the last frame buffer is always
1757	   refreshed.

1759	9.9.  DCT Coefficient Probability Update

1761	   Contains a partial update of the probability tables used to decode
1762	   DCT coefficients.  These tables are maintained across interframes but
1763	   are of course replaced with their defaults at the beginning of every
1764	   key frame.

1766	   The layout and semantics of this field will be taken up in Chapter
1767	   13.

1769	9.10.  Remaining Frame Header Data (non-Key Frame)

1771	   +-------+-----------------------------------------------------------+
1772	   | Index | Description                                               |
1773	   +-------+-----------------------------------------------------------+
1774	   | L(1)  | mb_no_coeff_skip. This flag indicates at the frame level  |
1775	   |       | if skipping of macroblocks with no non-zero coefficients  |
1776	   |       | is enabled. If it is set to 0 then prob_skip_false is not |
1777	   |       | read and mb_skip_coeff is forced to 0 for all macroblocks |
1778	   |       | (see Sections 11.1 and 12.1).                             |
1779	   |       |                                                           |
1780	   | L(8)  | prob_skip_false = probability used for decoding a         |
1781	   |       | macroblock level flag, which indicates if a macroblock    |
1782	   |       | has any non-zero coefficients. Only read if               |
1783	   |       | mb_no_coeff_skip is 1.                                    |
1784	   |       |                                                           |
1785	   | L(8)  | prob_intra = probability that a macroblock is "intra"     |
1786	   |       | predicted, that is, predicted from the already-encoded    |
1787	   |       | portions of the current frame as opposed to "inter"       |
1788	   |       | predicted, that is, predicted from the contents of a      |
1789	   |       | prior frame.                                              |
1790	   |       |                                                           |
1791	   | L(8)  | prob_last = probability that an inter-predicted           |
1792	   |       | macroblock is predicted from the immediately previous     |
1793	   |       | frame, as opposed to the most recent golden frame or      |
1794	   |       | altref frame..                                            |
1795	   |       |                                                           |
1796	   | L(8)  | prob_gf = probability that an inter-predicted macroblock  |
1797	   |       | is predicted from the most recent golden frame, as        |
1798	   |       | opposed to the altref frame                               |
1799	   |       |                                                           |
1800	   | F     | If true, followed by four L(8)s updating the              |
1801	   |       | probabilities for the different types of intra-prediction |
1802	   |       | for the Y plane. These probabilities correspond to the    |
1803	   |       | four interior nodes of the decoding tree for intra Y      |
1804	   |       | modes in an interframe, that is, the even positions in    |
1805	   |       | the ymode_tree array given above.                         |
1806	   |       |                                                           |
1807	   | F     | If true, followed by three L(8)s updating the             |
1808	   |       | probabilities for the different types of intra-prediction |
1809	   |       | for the chroma planes. These probabilities correspond to  |
1810	   |       | the even positions in the uv_mode_tree array given above. |
1811	   |       |                                                           |
1812	   | X     | Motion vector probability update. The details will be     |
1813	   |       | given after the discussion of motion vector decoding.     |
1814	   +-------+-----------------------------------------------------------+

1816	   Decoding of this portion (only) of the frame header is handled in the
1817	   reference decoder file decodemv.c.

1819	9.11.  Remaining Frame Header Data (Key Frame)

1821	   +-------+-----------------------------------------------------------+
1822	   | Index | Description                                               |
1823	   +-------+-----------------------------------------------------------+
1824	   | L(1)  | mb_no_coeff_skip. This flag indicates at the frame level  |
1825	   |       | if skipping of macroblocks with no non-zero coefficients  |
1826	   |       | is enabled. If it is set to 0 then prob_skip_false is not |
1827	   |       | read and mb_skip_coeff is forced to 0 for all macroblocks |
1828	   |       | (see Sections 11.1 and 12.1).                             |
1829	   |       |                                                           |
1830	   | L(8)  | prob_skip_false = Probability used for decoding a         |
1831	   |       | macroblock level flag, which indicates if a macroblock    |
1832	   |       | has any non-zero coefficients. Only read if               |
1833	   |       | mb_no_coeff_skip is 1.                                    |
1834	   +-------+-----------------------------------------------------------+

1836	   Decoding of this portion of the frame header is handled in the
1837	   reference decoder file demode.c.

1839	   This completes the layout of the frame header.  The remainder of the
1840	   first data partition consists of macroblock-level prediction data.

1842	   After the frame header is processed, all probabilities needed to
1843	   decode the prediction and residue data are known and will not change
1844	   until the next frame.

1846	10.  Segment-based Feature Adjustments

1848	   Every macroblock may optionally override some of the default
1849	   behaviors of the decoder.  Specifically, VP8 uses segment based
1850	   adjustments to support changing quantizer level and loop filter level
1851	   for a macroblock.  When the segment-based adjustment feature is
1852	   enabled for a frame, each macroblock within the frame is coded with a
1853	   segment_id.  This effectively segments all the macroblocks in the
1854	   current frame into a number of different segments.  Macroblocks
1855	   within the same segment behave exactly the same for quantizer and
1856	   loop filter level adjustments.

1858	   If both the segmentation_enabled and update_mb_segmentation_map flags
1859	   in subsection B of the frame header take a value of 1, the prediction
1860	   data for each (intra- or inter-coded) macroblock begins with a
1861	   specification of segment_id for the current macroblock.  It is
1862	   decoded using this simple tree ...

1864	   ---- Begin code block --------------------------------------

1866	   const tree_index mb_segment_tree [2 * (4-1)] =
1867	     {
1868	       2,  4,     /* root: "0", "1" subtrees */
1869	       -0, -1,    /* "00" = 0th value, "01" = 1st value */
1870	        -2, -3    /* "10" = 2nd value, "11" = 3rd value */
1871	     }

1873	   ---- End code block ----------------------------------------

1875	   ... combined with a 3-entry probability table
1876	   mb_segment_tree_probs[3].  The macroblock's segment_id is used later
1877	   in the decoding process to look into the segment_feature_data table
1878	   and determine how the quantizer and loop filter levels are adjusted.

1880	   The decoding of segment_id, together with the parsing of intra-
1881	   prediction modes (which is taken up next), is implemented in the
1882	   reference decoder file demode.c.

1884	11.  Key Frame Macroblock Prediction Records

1886	   After the features described above, the macroblock prediction record
1887	   next specifies the prediction mode used for the macroblock.

1889	11.1.  mb_skip_coeff

1891	   The single bool flag is decoded using prob_skip_false if and only if
1892	   mb_no_coeff_skip is set to 1 (see sections 9.10 and 9.11).  If
1893	   mb_no_coeff_skip is set to 0 then this value defaults to 0.

1895	11.2.  Luma Modes

1897	   First comes the luma specification of type intra_mbmode, coded using
1898	   the kf_ymode_tree, as described in Chapter 8 and repeated here for
1899	   convenience:

1901	   ---- Begin code block --------------------------------------

1903	   typedef enum
1904	   {
1905	       DC_PRED, /* predict DC using row above and column to the left */
1906	       V_PRED,  /* predict rows using row above */
1907	       H_PRED,  /* predict columns using column to the left */
1908	       TM_PRED, /* propagate second differences a la "true motion" */

1910	       B_PRED,  /* each Y subblock is independently predicted */

1912	       num_uv_modes = B_PRED,  /* first four modes apply to chroma */
1913	       num_ymodes   /* all modes apply to luma */
1914	   }
1915	   intra_mbmode;

1917	   const tree_index kf_ymode_tree [2 * (num_ymodes - 1)] =
1918	   {
1919	    -B_PRED, 2,            /* root: B_PRED = "0", "1" subtree */
1920	     4, 6,                 /* "1" subtree has 2 descendant subtrees */
1921	      -DC_PRED, -V_PRED,   /* "10" subtree: DC_PRED = "100",
1922	                              V_PRED = "101" */
1923	      -H_PRED, -TM_PRED    /* "11" subtree: H_PRED = "110",
1924	                              TM_PRED = "111" */
1925	   };

1927	   ---- End code block ----------------------------------------

1929	   For key frames, the Y mode is decoded using a fixed probability array
1930	   as follows:

1932	   ---- Begin code block --------------------------------------

1934	   const Prob kf_ymode_prob [num_ymodes - 1] = { 145, 156, 163, 128};
1935	   Ymode = (intra_mbmode) treed_read( d, kf_ymode_tree, kf_ymode_prob);

1937	   ---- End code block ----------------------------------------

1939	   d is of course the bool_decoder being used to read the first data
1940	   partition.

1942	   If the Ymode is B_PRED, it is followed by a (tree-coded) mode for
1943	   each of the 16 Y subblocks.  The 10 subblock modes and their coding
1944	   tree as follows:

1946	   ---- Begin code block --------------------------------------

1948	   typedef enum
1949	   {
1950	       B_DC_PRED,  /* predict DC using row above and column
1951	                      to the left */
1952	       B_TM_PRED,  /* propagate second differences a la
1953	                      "true motion" */

1955	       B_VE_PRED,  /* predict rows using row above */
1956	       B_HE_PRED,  /* predict columns using column to the left */

1958	       B_LD_PRED,  /* southwest (left and down) 45 degree diagonal
1959	                      prediction */
1960	       B_RD_PRED,  /* southeast (right and down) "" */

1962	       B_VR_PRED,  /* SSE (vertical right) diagonal prediction */
1963	       B_VL_PRED,  /* SSW (vertical left) "" */
1964	       B_HD_PRED,  /* ESE (horizontal down) "" */
1965	       B_HU_PRED,  /* ENE (horizontal up) "" */

1967	       num_intra_bmodes
1968	   }
1969	   intra_bmode;

1971	   /* Coding tree for the above, with implied codings as comments */

1973	   const tree_index bmode_tree [2 * (num_intra_bmodes - 1)] =
1974	   {
1975	    -B_DC_PRED, 2,                   /* B_DC_PRED = "0" */
1976	     -B_TM_PRED, 4,                  /* B_TM_PRED = "10" */
1977	      -B_VE_PRED, 6,                 /* B_VE_PRED = "110" */
1978	       8, 12,
1979	        -B_HE_PRED, 10,              /* B_HE_PRED = "1110" */
1980	         -B_RD_PRED, -B_VR_PRED,     /* B_RD_PRED = "111100",
1981	                                        B_VR_PRED = "111101" */
1982	        -B_LD_PRED, 14,              /* B_LD_PRED = "111110" */
1983	          -B_VL_PRED, 16             /* B_VL_PRED = "1111110" */
1984	            -B_HD_PRED, -B_HU_PRED   /* HD = "11111110",
1985	                                        HU = "11111111" */
1986	   };

1988	   ---- End code block ----------------------------------------

1990	   The first four modes are smaller versions of the similarly-named
1991	   16x16 modes above, albeit with slightly different numbering.  The
1992	   last six "diagonal" modes are unique to luma subblocks.

1994	11.3.  Subblock Mode Contexts

1996	   The coding of subblock modes in key frames uses the modes already
1997	   coded for the subblocks to the left of and above the subblock to
1998	   select a probability array for decoding the current subblock mode.
1999	   This is our first instance of contextual prediction, and there are
2000	   several caveats associated with it:

2002	   1.  The adjacency relationships between subblocks are based on the
2003	       normal default raster placement of the subblocks.

2005	   2.  The adjacent subblocks need not lie in the current macroblock.
2006	       The subblocks to the left of the left-edge subblocks 0, 4, 8, and
2007	       12 are the right-edge subblocks 3, 7, 11, and 15, respectively,
2008	       of the (already coded) macroblock immediately to the left.
2009	       Similarly, the subblocks above the top-edge subblocks 0, 1, 2,
2010	       and 3 are the bottom-edge subblocks 12, 13, 14, and 15 of the
2011	       already-coded macroblock immediately above us.

2013	   3.  For macroblocks on the top row or left edge of the image, some of
2014	       the predictors will be non-existent.  Such predictors are taken
2015	       to have had the value B_DC_PRED which, perhaps conveniently,
2016	       takes the value 0 in the enumeration above.  A simple management
2017	       scheme for these contexts might maintain a row of above
2018	       predictors and four left predictors.  Before decoding the frame,
2019	       the entire row is initialized to B_DC_PRED; before decoding each
2020	       row of macroblocks, the four left predictors are also set to
2021	       B_DC_PRED.  After decoding a macroblock, the bottom four subblock
2022	       modes are copied into the row predictor (at the current position,
2023	       which then advances to be above the next macroblock) and the
2024	       right four subblock modes are copied into the left predictor.

2026	   4.  Many macroblocks will of course be coded using a 16x16 luma
2027	       prediction mode.  For the purpose of predicting ensuing subblock
2028	       modes (only), such macroblocks derive a subblock mode, constant
2029	       throughout the macroblock, from the 16x16 luma mode as follows:
2030	       DC_PRED uses B_DC_PRED, V_PRED uses B_VE_PRED, H_PRED uses
2031	       B_HE_PRED, and TM_PRED uses B_TM_PRED.

2033	   5.  Although we discuss interframe modes later, we remark here that,
2034	       while interframes do use all the intra-coding modes described
2035	       here and below, the subblock modes in an interframe are coded
2036	       using a single constant probability array that does not depend on
2037	       any context.

2039	   The dependence of subblock mode probability on the nearby subblock
2040	   mode context is most easily handled using a three-dimensional
2041	   constant array:

2043	   ---- Begin code block --------------------------------------

2045	   const Prob kf_bmode_prob [num_intra_bmodes] [num_intra_bmodes]
2046	     [num_intra_bmodes-1];

2048	   ---- End code block ----------------------------------------

2050	   The outer two dimensions of this array are indexed by the already-
2051	   coded subblock modes above and to the left of the current block,
2052	   respectively.  The inner dimension is a typical tree probability list
2053	   whose indices correspond to the even indices of the bmode_tree above.
2054	   The mode for the j^(th) luma subblock is then

2056	   ---- Begin code block --------------------------------------

2058	   Bmode = (intra_bmode) treed_read( d, bmode_tree, kf_bmode_prob
2059	     [A] [L]);

2061	   ---- End code block ----------------------------------------

2063	   where the 4x4 Y subblock index j varies from 0 to 15 in raster order
2064	   and A and L are the modes used above and to-the-left of the j^(th)
2065	   subblock.

2067	   The contents of the kf_bmode_prob array are given at the end of this
2068	   chapter.

2070	11.4.  Chroma Modes

2072	   After the Y mode (and optional subblock mode) specification comes the
2073	   chroma mode.  The chroma modes are a subset of the Y modes and are
2074	   coded using the uv_mode_tree described in Chapter 8, again repeated
2075	   here for convenience:

2077	   ---- Begin code block --------------------------------------

2079	   const tree_index uv_mode_tree [2 * (num_uv_modes - 1)] =
2080	   {
2081	    -DC_PRED, 2,           /* root: DC_PRED = "0", "1" subtree */
2082	     -V_PRED, 4,           /* "1" subtree:  V_PRED = "10",
2083	                              "11" subtree */
2084	      -H_PRED, -TM_PRED    /* "11" subtree: H_PRED = "110",
2085	                              TM_PRED = "111" */
2086	   };

2088	   ---- End code block ----------------------------------------

2090	   As for the Y modes (in a key frame), the chroma modes are coded using
2091	   a fixed, contextless probability table:

2093	   ---- Begin code block --------------------------------------

2095	   const Prob kf_uv_mode_prob [num_uv_modes - 1] = { 142, 114, 183};
2096	   uv_mode = (intra_mbmode) treed_read( d, uv_mode_tree,
2097	     kf_uv_mode_prob);

2099	   ---- End code block ----------------------------------------

2101	   This completes the description of macroblock prediction coding for
2102	   key frames.  As will be discussed in Chapter 16, the coding of intra
2103	   modes within interframes is similar, but not identical, to that
2104	   described here (and in the reference code) for prediction modes and,
2105	   indeed, for all tree-coded data in VP8.

2107	11.5.  Subblock Mode Probability Table

2109	   Finally, here is the fixed probability table used to decode subblock
2110	   modes in key frames.

2112	   ---- Begin code block --------------------------------------

2114	   const Prob kf_bmode_prob [num_intra_bmodes] [num_intra_bmodes]
2115	     [num_intra_bmodes-1] =
2116	   {
2117	     {
2118	       { 231, 120,  48,  89, 115, 113, 120, 152, 112},
2119	       { 152, 179,  64, 126, 170, 118,  46,  70,  95},
2120	       { 175,  69, 143,  80,  85,  82,  72, 155, 103},
2121	       {  56,  58,  10, 171, 218, 189,  17,  13, 152},
2122	       { 144,  71,  10,  38, 171, 213, 144,  34,  26},
2123	       { 114,  26,  17, 163,  44, 195,  21,  10, 173},
2124	       { 121,  24,  80, 195,  26,  62,  44,  64,  85},
2125	       { 170,  46,  55,  19, 136, 160,  33, 206,  71},
2126	       {  63,  20,   8, 114, 114, 208,  12,   9, 226},
2127	       {  81,  40,  11,  96, 182,  84,  29,  16,  36}
2128	     },
2129	     {
2130	       { 134, 183,  89, 137,  98, 101, 106, 165, 148},
2131	       {  72, 187, 100, 130, 157, 111,  32,  75,  80},
2132	       {  66, 102, 167,  99,  74,  62,  40, 234, 128},
2133	       {  41,  53,   9, 178, 241, 141,  26,   8, 107},
2134	       { 104,  79,  12,  27, 217, 255,  87,  17,   7},
2135	       {  74,  43,  26, 146,  73, 166,  49,  23, 157},
2136	       {  65,  38, 105, 160,  51,  52,  31, 115, 128},
2137	       {  87,  68,  71,  44, 114,  51,  15, 186,  23},
2138	       {  47,  41,  14, 110, 182, 183,  21,  17, 194},
2139	       {  66,  45,  25, 102, 197, 189,  23,  18,  22}
2140	     },
2141	     {
2142	       {  88,  88, 147, 150,  42,  46,  45, 196, 205},
2143	       {  43,  97, 183, 117,  85,  38,  35, 179,  61},
2144	       {  39,  53, 200,  87,  26,  21,  43, 232, 171},
2145	       {  56,  34,  51, 104, 114, 102,  29,  93,  77},
2146	       { 107,  54,  32,  26,  51,   1,  81,  43,  31},
2147	       {  39,  28,  85, 171,  58, 165,  90,  98,  64},
2148	       {  34,  22, 116, 206,  23,  34,  43, 166,  73},
2149	       {  68,  25, 106,  22,  64, 171,  36, 225, 114},
2150	       {  34,  19,  21, 102, 132, 188,  16,  76, 124},
2151	       {  62,  18,  78,  95,  85,  57,  50,  48,  51}
2152	     },
2153	     {
2154	       { 193, 101,  35, 159, 215, 111,  89,  46, 111},
2155	       {  60, 148,  31, 172, 219, 228,  21,  18, 111},
2156	       { 112, 113,  77,  85, 179, 255,  38, 120, 114},
2157	       {  40,  42,   1, 196, 245, 209,  10,  25, 109},
2158	       { 100,  80,   8,  43, 154,   1,  51,  26,  71},
2159	       {  88,  43,  29, 140, 166, 213,  37,  43, 154},
2160	       {  61,  63,  30, 155,  67,  45,  68,   1, 209},
2161	       { 142,  78,  78,  16, 255, 128,  34, 197, 171},
2162	       {  41,  40,   5, 102, 211, 183,   4,   1, 221},
2163	       {  51,  50,  17, 168, 209, 192,  23,  25,  82}
2164	     },
2165	     {
2166	       { 125,  98,  42,  88, 104,  85, 117, 175,  82},
2167	       {  95,  84,  53,  89, 128, 100, 113, 101,  45},
2168	       {  75,  79, 123,  47,  51, 128,  81, 171,   1},
2169	       {  57,  17,   5,  71, 102,  57,  53,  41,  49},
2170	       { 115,  21,   2,  10, 102, 255, 166,  23,   6},
2171	       {  38,  33,  13, 121,  57,  73,  26,   1,  85},
2172	       {  41,  10,  67, 138,  77, 110,  90,  47, 114},
2173	       { 101,  29,  16,  10,  85, 128, 101, 196,  26},
2174	       {  57,  18,  10, 102, 102, 213,  34,  20,  43},
2175	       { 117,  20,  15,  36, 163, 128,  68,   1,  26}
2176	     },
2177	     {
2178	       { 138,  31,  36, 171,  27, 166,  38,  44, 229},
2179	       {  67,  87,  58, 169,  82, 115,  26,  59, 179},
2180	       {  63,  59,  90, 180,  59, 166,  93,  73, 154},
2181	       {  40,  40,  21, 116, 143, 209,  34,  39, 175},
2182	       {  57,  46,  22,  24, 128,   1,  54,  17,  37},
2183	       {  47,  15,  16, 183,  34, 223,  49,  45, 183},
2184	       {  46,  17,  33, 183,   6,  98,  15,  32, 183},
2185	       {  65,  32,  73, 115,  28, 128,  23, 128, 205},
2186	       {  40,   3,   9, 115,  51, 192,  18,   6, 223},
2187	       {  87,  37,   9, 115,  59,  77,  64,  21,  47}
2188	     },
2189	     {
2190	       { 104,  55,  44, 218,   9,  54,  53, 130, 226},
2191	       {  64,  90,  70, 205,  40,  41,  23,  26,  57},
2192	       {  54,  57, 112, 184,   5,  41,  38, 166, 213},
2193	       {  30,  34,  26, 133, 152, 116,  10,  32, 134},
2194	       {  75,  32,  12,  51, 192, 255, 160,  43,  51},
2195	       {  39,  19,  53, 221,  26, 114,  32,  73, 255},
2196	       {  31,   9,  65, 234,   2,  15,   1, 118,  73},
2197	       {  88,  31,  35,  67, 102,  85,  55, 186,  85},
2198	       {  56,  21,  23, 111,  59, 205,  45,  37, 192},
2199	       {  55,  38,  70, 124,  73, 102,   1,  34,  98}
2200	     },
2201	     {
2202	       { 102,  61,  71,  37,  34,  53,  31, 243, 192},
2203	       {  69,  60,  71,  38,  73, 119,  28, 222,  37},
2204	       {  68,  45, 128,  34,   1,  47,  11, 245, 171},
2205	       {  62,  17,  19,  70, 146,  85,  55,  62,  70},
2206	       {  75,  15,   9,   9,  64, 255, 184, 119,  16},
2207	       {  37,  43,  37, 154, 100, 163,  85, 160,   1},
2208	       {  63,   9,  92, 136,  28,  64,  32, 201,  85},
2209	       {  86,   6,  28,   5,  64, 255,  25, 248,   1},
2210	       {  56,   8,  17, 132, 137, 255,  55, 116, 128},
2211	       {  58,  15,  20,  82, 135,  57,  26, 121,  40}
2212	     },
2213	     {
2214	       { 164,  50,  31, 137, 154, 133,  25,  35, 218},
2215	       {  51, 103,  44, 131, 131, 123,  31,   6, 158},
2216	       {  86,  40,  64, 135, 148, 224,  45, 183, 128},
2217	       {  22,  26,  17, 131, 240, 154,  14,   1, 209},
2218	       {  83,  12,  13,  54, 192, 255,  68,  47,  28},
2219	       {  45,  16,  21,  91,  64, 222,   7,   1, 197},
2220	       {  56,  21,  39, 155,  60, 138,  23, 102, 213},
2221	       {  85,  26,  85,  85, 128, 128,  32, 146, 171},
2222	       {  18,  11,   7,  63, 144, 171,   4,   4, 246},
2223	       {  35,  27,  10, 146, 174, 171,  12,  26, 128}
2224	     },
2225	     {
2226	       { 190,  80,  35,  99, 180,  80, 126,  54,  45},
2227	       {  85, 126,  47,  87, 176,  51,  41,  20,  32},
2228	       { 101,  75, 128, 139, 118, 146, 116, 128,  85},
2229	       {  56,  41,  15, 176, 236,  85,  37,   9,  62},
2230	       { 146,  36,  19,  30, 171, 255,  97,  27,  20},
2231	       {  71,  30,  17, 119, 118, 255,  17,  18, 138},
2232	       { 101,  38,  60, 138,  55,  70,  43,  26, 142},
2233	       { 138,  45,  61,  62, 219,   1,  81, 188,  64},
2234	       {  32,  41,  20, 117, 151, 142,  20,  21, 163},
2235	       { 112,  19,  12,  61, 195, 128,  48,   4,  24}
2236	     }
2237	   };

2239	   ---- End code block ----------------------------------------

2241	12.  Intraframe Prediction

2243	   Intraframe prediction uses already-coded macroblocks within the
2244	   current frame to approximate the contents of the current macroblock.
2245	   It applies to intra-coded macroblocks in an interframe and to all
2246	   macroblocks in a key frame.

2248	   Relative to the current macroblock "M", the already-coded macroblocks
2249	   include all macroblocks above M together with the macroblocks on the
2250	   same row as, and to the left of, M, though at most four of these
2251	   macroblocks are actually used: the block "A" directly above M, the
2252	   blocks immediately to the left and right of A, and the block
2253	   immediately to the left of M.

2255	   Each of the prediction modes (i.e., means of extrapolation from
2256	   already-calculated values) uses fairly simple arithmetic on pixel
2257	   values whose positions, relative to the current position, are defined
2258	   by the mode.

2260	   The chroma (U and V) and luma (Y) predictions are independent of each
2261	   other.

2263	   The relative addressing of pixels applied to macroblocks on the upper
2264	   row or left column of the frame will sometimes cause pixels outside
2265	   the visible frame to be referenced.  Usually such out-of-bounds
2266	   pixels have an assumed value of 129 for pixels to the left of the
2267	   leftmost column of the visible frame and 127 for pixels above the top
2268	   row of the visible frame (including the special case of the pixel
2269	   above and to the left of the top-left pixel in the visible frame).
2270	   Exceptions to this (associated to certain modes) will be noted below.

2272	   The already-coded macroblocks referenced by intra-prediction have
2273	   been "reconstructed", that is, have been predicted and residue-
2274	   adjusted (as described in Chapter 14), but have not been loop-
2275	   filtered.  While it does process the edges between individual
2276	   macroblocks and individual subblocks, loop filtering (described in
2277	   Chapter 15) is applied to the frame as a whole, after all of the
2278	   macroblocks have been reconstructed.

2280	12.1.  mb_skip_coeff

2282	   The single bool flag is decoded using prob_skip_false if and only if
2283	   mb_no_coeff_skip is set to 1 (see Sections 9.10 and 9.11).  If
2284	   mb_no_coeff_skip is set to 0 then this value defaults to 0.

2286	12.2.  Chroma Prediction

2288	   The chroma prediction is a little simpler than the luma prediction,
2289	   so we treat it first.  Each of the chroma modes treats U and V
2290	   identically, that is, the U and V prediction values are calculated in
2291	   parallel, using the same relative addressing and arithmetic in each
2292	   of the two planes.

2294	   The modes extrapolate prediction values using the 8-pixel row "A"
2295	   lying immediately above the block (that is, the bottom chroma row of
2296	   the macroblock immediately above the current macroblock) and the
2297	   8-pixel column "L" immediately to the left of the block (that is, the
2298	   rightmost chroma column of the macroblock immediately to the left of
2299	   the current macroblock).

2301	   Vertical prediction (chroma mode V_PRED) simply fills each 8-pixel
2302	   row of the 8x8 chroma block with a copy of the "above" row (A).  If
2303	   the current macroblock lies on the top row of the frame, all 8 of the
2304	   pixel values in A are assigned the value 127.

2306	   Similarly, horizontal prediction (H_PRED) fills each 8-pixel column
2307	   of the 8x8 chroma block with a copy of the "left" column (L).  If the
2308	   current macroblock is in the left column of the frame, all 8 pixel
2309	   values in L are assigned the value 129.

2311	   DC prediction (DC_PRED) fills the 8x8 chroma block with a single
2312	   value.  In the generic case of a macroblock lying below the top row
2313	   and right of the leftmost column of the frame, this value is the
2314	   average of the 16 (genuinely visible) pixels in the (union of the)
2315	   above row A and left column L.

2317	   Otherwise, if the current macroblock lies on the top row of the
2318	   frame, the average of the 8 pixels in L is used; if it lies in the
2319	   left column of the frame, the average of the 8 pixels in A is used.
2320	   Note that the averages used in these exceptional cases are not the
2321	   same as those that would be arrived at by using the out-of-bounds A
2322	   and L values defined for V_PRED and H_PRED.  In the case of the
2323	   leftmost macroblock on the top row of the frame the 8x8 block is
2324	   simply filled with the constant value 128.

2326	   For DC_PRED, apart from the exceptional case of the top left
2327	   macroblock, we are averaging either 16 or 8 pixel values to get a
2328	   single prediction value that fills the 8x8 block.  The rounding is
2329	   done as follows:

2331	   ---- Begin code block --------------------------------------

2333	   int sum;  /* sum of 8 or 16 pixels at (at least) 16-bit precision */
2334	   int shf;  /* base 2 logarithm of the number of pixels (3 or 4) */

2336	   Pixel DCvalue = (sum + (1 << (shf-1))) >> shf;

2338	   ---- End code block ----------------------------------------

2340	   Because the summands are all valid pixels, no "clamp" is necessary in
2341	   the calculation of DCvalue.

2343	   The remaining "True Motion" (TM_PRED) chroma mode gets its name from
2344	   an older technique of video compression used by On2 Technologies, to
2345	   which it bears some relation.  In addition to the row "A" and column
2346	   "L", TM_PRED uses the pixel "P" above and to the left of the chroma
2347	   block.

2349	   In this mode, we propagate the horizontal differences between pixels
2350	   in A (starting from P), using the pixels from L to start each row.
2351	   The exact algorithm is as follows.

2353	   ---- Begin code block --------------------------------------

2355	   void TMpred(
2356	       Pixel b[8][8],      /* chroma (U or V) prediction block */
2357	       const Pixel A[8],   /* row of already-constructed pixels
2358	                              above block */
2359	       const Pixel L[8],   /* column of "" just to the left of
2360	                              block */
2361	       const Pixel P       /* pixel just to the left of A and
2362	                              above L*/
2363	   ) {
2364	       int r = 0;          /* row */
2365	       do {
2366	           int c = 0;      /* column */
2367	           do {
2368	               b[r][c] = clamp255( L[r]+ A[c] - P);
2369	           } while( ++c < 8);
2370	       } while( ++r < 8);
2371	   }

2373	   ---- End code block ----------------------------------------

2375	   Note that the process could equivalently be described as propagating
2376	   the vertical differences between pixels in L (starting from P), using
2377	   the pixels from A to start each column.

2379	   An implementation of chroma intra-prediction may be found in the
2380	   reference decoder file reconintra.c.

2382	   Unlike DC_PRED, for macroblocks on the top row or left edge TM_PRED
2383	   does use the out-of-bounds values of 127 and 129 (respectively)
2384	   defined for V_PRED and H_PRED.

2386	12.3.  Luma Prediction

2388	   The prediction processes for the first four 16x16 luma modes
2389	   (DC_PRED, V_PRED, H_PRED, and TM_PRED) are essentially identical to
2390	   the corresponding chroma prediction processes described above, the
2391	   only difference being that we are predicting a single 16x16 luma
2392	   block instead of two 8x8 chroma blocks.

2394	   Thus, the row "A" and column "L" here contain 16 pixels, the DC
2395	   prediction is calculated using 16 or 32 pixels (and shf is 4 or 5),
2396	   and we of course fill the entire prediction buffer, that is, 16 rows
2397	   (or columns) containing 16 pixels each.  The reference implementation
2398	   of 16x16 luma prediction is also in reconintra.c.

2400	   In the remaining luma mode (B_PRED), each 4x4 Y subblock is
2401	   independently predicted using one of ten modes (listed, along with
2402	   their encodings, in Chapter 11).

2404	   Also, unlike the full-macroblock modes already described, some of the
2405	   subblock modes use prediction pixels above and to the right of the
2406	   current subblock.  In detail, each 4x4 subblock "B" is predicted
2407	   using (at most) the 4-pixel column "L" immediately to the left of B
2408	   and the 8-pixel row "A" immediately above B, consisting of the 4
2409	   pixels above B followed by the 4 adjacent pixels above and to the
2410	   right of B, together with the single pixel "P" immediately to the
2411	   left of A (and immediately above L).

2413	   For the purpose of subblock intra-prediction, the pixels immediately
2414	   to the left and right of a pixel in a subblock are the same as the
2415	   pixels immediately to the left and right of the corresponding pixel
2416	   in the frame buffer "F".  Vertical offsets behave similarly: The
2417	   above row A lies immediately above B in F, and the adjacent pixels in
2418	   the left column L are separated by a single row in F.

2420	   Because entire macroblocks (as opposed to their constituent
2421	   subblocks) are reconstructed in raster-scan order, for subblocks
2422	   lying along the right edge (and not along the top row) of the current
2423	   macroblock, the four "extra" prediction pixels in A above and to the
2424	   right of B have not yet actually been constructed.

2426	   Subblocks 7, 11, and 15 are affected.  All three of these subblocks
2427	   use the same extra pixels as does subblock 3 (at the upper right
2428	   corner of the macroblock), namely the 4 pixels immediately above and
2429	   to the right of subblock 3.  Writing (R,C) for a frame buffer
2430	   position offset from the upper left corner of the current macroblock
2431	   by R rows and C columns, the extra pixels for all the right-edge
2432	   subblocks (3, 7, 11, and 15) are at positions (-1,16), (-1,17),
2433	   (-1,18), and (-1,19).

2435	   The details of the prediction modes are most easily described in
2436	   code.

2438	   ---- Begin code block --------------------------------------

2440	   /* Result pixels are often averages of two or three predictor
2441	      pixels. The following subroutines are used to calculate
2442	      these averages. Because the arguments are valid pixels, no
2443	      clamping is necessary. An actual implementation would
2444	      probably use inline functions or macros. */

2446	   /* Compute weighted average centered at y w/adjacent x, z */

2448	   Pixel avg3( Pixel x, Pixel y, Pixel z) {
2449	     return (x + y + y + z + 2) >> 2;}

2451	   /* Weighted average of 3 adjacent pixels centered at p */

2453	   Pixel avg3p( const Pixel *p) { return avg3( p[-1], p[0], p[1]);}

2455	   /* Simple average of x and y */

2457	   Pixel avg2( Pixel x, Pixel y) { return (x + y + 1) >> 1;}

2459	   /* Average of p[0] and p[1] may be considered to be a synthetic
2460	      pixel lying between the two, that is, one half-step past p. */

2462	   Pixel avg2p( const Pixel *p) { return avg2( p[0], p[1]);}

2464	   /* Main function.  Out-of-frame pixels in A or L should be set
2465	      to 128. */

2467	   void subblock_intra_predict(
2468	       Pixel B[4][4],     /* Y subblock prediction buffer */
2469	       const Pixel *A,    /* A[0]...A[7] = above row, A[-1] = P */
2470	       const Pixel *L,    /* L[0]...L[3] = left column, L[-1] = P */
2471	       intra_bmode mode   /* enum is in section 11.1 above */
2472	   ) {
2473	       Pixel E[9];        /* 9 already-constructed edge pixels */
2474	       E[0] = L[3];  E[1] = L[2];  E[2] = L[1];  E[3] = L[0];
2475	       E[4] = A[-1];      /* == L[-1] == P */
2476	       E[5] = A[0];  E[6] = A[1];  E[7] = A[2];  E[8] = A[3];

2478	     switch( mode) {
2479	       /* First four modes are similar to corresponding
2480	          full-block modes. */

2482	       case B_DC_PRED:
2483	       {
2484	           int v = 4;      /* DC sum/avg, 4 is rounding adjustment */
2485	           int i = 0;  do { v += A[i] + L[i];}  while( ++i < 4);
2486	           v >>= 3;        /* averaging 8 pixels */
2487	           i = 0;  do {    /* fill prediction buffer with constant DC
2488	                              value */
2489	               int j = 0;  do { B[i][j] = v;}  while( ++j < 4);
2490	           } while( ++i < 4);
2491	           break;
2492	       }

2494	       case B_TM_PRED: /* just like 16x16 TM_PRED */
2495	       {
2496	           int r = 0;  do {
2497	               int c = 0;  do {
2498	                   B[r][c] = clamp255( L[r] + A[c] - A[-1]);
2499	               } while( ++c < 4);
2500	           } while( ++r < 4);
2501	           break;
2502	       }

2504	       case B_VE_PRED: /* like 16x16 V_PRED except using averages */
2505	       {
2506	           int c = 0;  do { /* all 4 rows = smoothed top row */
2507	               B[0][c] = B[1][c] = B[2][c] = B[3][c] = avg3p( A + c);
2508	           } while( ++c < 4);
2509	           break;
2510	       }

2512	       case B_HE_PRED: /* like 16x16 H_PRED except using averages */
2513	       {
2514	           /* Bottom row is exceptional because L[4] does not exist */
2515	           int v = avg3( L[2], L[3], L[3]);
2516	           int r = 3;  while( 1) {  /* all 4 columns = smoothed left
2517	                                       column */
2518	               B[r][0] = B[r][1] = B[r][2] = B[r][3] = v;
2519	               if( --r < 0)
2520	                   break;
2521	               v = avg3p( L + r);  /* upper 3 rows use average of
2522	                                      3 pixels */

2524	           }
2525	           break;
2526	       }

2528	       /* The remaining six "diagonal" modes subdivide the
2529	          prediction buffer into diagonal lines.  All the pixels
2530	          on each line are assigned the same value; this value is
2531	          (a smoothed or synthetic version of) an
2532	          already-constructed predictor value lying on the same
2533	          line.  For clarity, in the comments, we express the
2534	          positions of these predictor pixels relative to the
2535	          upper left corner of the destination array B.

2537	          These modes are unique to subblock prediction and have
2538	          no full-block analogues.  The first two use lines at
2539	          +|- 45 degrees from horizontal (or, equivalently,
2540	          vertical), that is, lines whose slopes are +|- 1. */

2542	       case B_LD_PRED:    /* southwest (left and down) step =
2543	                             (-1, 1) or (1,-1) */
2544	           /* avg3p( A + j) is the "smoothed" pixel at (-1,j) */
2545	           B[0][0] = avg3p( A + 1);
2546	           B[0][1] = B[1][0] = avg3p( A + 2);
2547	           B[0][2] = B[1][1] = B[2][0] = avg3p( A + 3);
2548	           B[0][3] = B[1][2] = B[2][1] = B[3][0] = avg3p( A + 4);
2549	           B[1][3] = B[2][2] = B[3][1] = avg3p( A + 5);
2550	           B[2][3] = B[3][2] = avg3p( A + 6);
2551	           B[3][3] = avg3( A[6], A[7], A[7]); /* A[8] does not exist */
2552	           break;

2554	       case B_RD_PRED: /* southeast (right and down) step =
2555	                          (1,1) or (-1,-1) */
2556	           B[3][0] = avg3p( E + 1);  /* predictor is from (2, -1) */
2557	           B[3][1] = B[2][0] = avg3p( E + 2);  /* (1, -1) */
2558	           B[3][2] = B[2][1] = B[1][0] = avg3p( E + 3);  /* (0, -1) */
2559	           B[3][3] = B[2][2] = B[1][1] = B[0][0] =
2560	             avg3p( E + 4);  /* (-1, -1) */
2561	           B[2][3] = B[1][2] = B[0][1] = avg3p( E + 5);  /* (-1, 0) */
2562	           B[1][3] = B[0][2] = avg3p( E + 6);  /* (-1, 1) */
2563	           B[0][3] = avg3p( E + 7);  /* (-1, 2) */
2564	           break;

2566	       /* The remaining 4 diagonal modes use lines whose slopes are
2567	          +|- 2 and +|- 1/2.  The angles of these lines are roughly
2568	          +|- 27 degrees from horizontal or vertical.

2570	          Unlike the 45 degree diagonals, here we often need to
2571	          "synthesize" predictor pixels midway between two actual
2572	          predictors using avg2p(p), which we think of as returning
2573	          the pixel "at" p[1/2]. */

2575	       case B_VR_PRED:    /* SSE (vertical right) step =
2576	                             (2,1) or (-2,-1) */
2577	           B[3][0] = avg3p( E + 2);  /* predictor is from (1, -1) */
2578	           B[2][0] = avg3p( E + 3);  /* (0, -1) */
2579	           B[3][1] = B[1][0] = avg3p( E + 4);  /* (-1,  -1) */
2580	           B[2][1] = B[0][0] = avg2p( E + 4);  /* (-1, -1/2) */
2581	           B[3][2] = B[1][1] = avg3p( E + 5);  /* (-1,   0) */
2582	           B[2][2] = B[0][1] = avg2p( E + 5);  /* (-1, 1/2) */
2583	           B[3][3] = B[1][2] = avg3p( E + 6);  /* (-1,   1) */
2584	           B[2][3] = B[0][2] = avg2p( E + 6);  /* (-1, 3/2) */
2585	           B[1][3] = avg3p( E + 7);  /* (-1, 2) */
2586	           B[0][3] = avg2p( E + 7);  /* (-1, 5/2) */
2587	           break;

2589	       case B_VL_PRED:    /* SSW (vertical left) step =
2590	                             (2,-1) or (-2,1) */
2591	           B[0][0] = avg2p( A);  /* predictor is from (-1, 1/2) */
2592	           B[1][0] = avg3p( A + 1);  /* (-1, 1) */
2593	           B[2][0] = B[0][1] = avg2p( A + 1);  /* (-1, 3/2) */
2594	           B[1][1] = B[3][0] = avg3p( A + 2);  /* (-1,   2) */
2595	           B[2][1] = B[0][2] = avg2p( A + 2);  /* (-1, 5/2) */
2596	           B[3][1] = B[1][2] = avg3p( A + 3);  /* (-1,   3) */
2597	           B[2][2] = B[0][3] = avg2p( A + 3);  /* (-1, 7/2) */
2598	           B[3][2] = B[1][3] = avg3p( A + 4);  /* (-1,   4) */
2599	           /* Last two values do not strictly follow the pattern. */
2600	           B[2][3] = avg3p( A + 5);  /* (-1, 5) [avg2p( A + 4) =
2601	                                        (-1,9/2)] */
2602	           B[3][3] = avg3p( A + 6);  /* (-1, 6) [avg3p( A + 5) =
2603	                                        (-1,5)] */
2604	           break;

2606	       case B_HD_PRED:    /* ESE (horizontal down) step =
2607	                             (1,2) or (-1,-2) */
2608	           B[3][0] = avg2p( E);  /* predictor is from (5/2, -1) */
2609	           B[3][1] = avg3p( E + 1);  /* (2, -1) */
2610	           B[2][0] = B[3][2] = svg2p( E + 1);  /* ( 3/2, -1) */
2611	           B[2][1] = B[3][3] = avg3p( E + 2);  /* (   1, -1) */
2612	           B[2][2] = B[1][0] = avg2p( E + 2);  /* ( 1/2, -1) */
2613	           B[2][3] = B[1][1] = avg3p( E + 3);  /* (   0, -1) */
2614	           B[1][2] = B[0][0] = avg2p( E + 3);  /* (-1/2, -1) */
2615	           B[1][3] = B[0][1] = avg3p( E + 4);  /* (  -1, -1) */
2616	           B[0][2] = avg3p( E + 5);  /* (-1, 0) */
2617	           B[0][3] = avg3p( E + 6);  /* (-1, 1) */
2618	           break;

2620	       case B_HU_PRED:    /* ENE (horizontal up) step = (1,-2)
2621	                             or (-1,2) */
2622	           B[0][0] = avg2p( L);  /* predictor is from ( 1/2, -1) */
2623	           B[0][1] = avg3p( L + 1);  /* ( 1, -1) */
2624	           B[0][2] = B[1][0] = avg2p( L + 1);  /* (3/2, -1) */
2625	           B[0][3] = B[1][1] = avg3p( L + 2);  /* (  2, -1) */
2626	           B[1][2] = B[2][0] = avg2p( L + 2);  /* (5/2, -1) */
2627	           B[1][3] = B[2][1] = avg3( L[2], L[3], L[3]);  /* ( 3, -1) */
2628	           /* Not possible to follow pattern for much of the bottom
2629	              row because no (nearby) already-constructed pixels lie
2630	              on the diagonals in question. */
2631	           B[2][2] = B[2][3] = B[3][0] = B[3][1] = B[3][2] = B[3][3]
2632	             = L[3];
2633	     }
2634	   }

2636	   ---- End code block ----------------------------------------

2638	   The reference decoder implementation of subblock intra-prediction may
2639	   be found in reconintra4x4.c.

2641	13.  DCT Coefficient Decoding

2643	   The second data partition consists of an encoding of the quantized
2644	   DCT (and WHT) coefficients of the residue signal.  As discussed in
2645	   the format overview (Chapter 2), for each macroblock, the residue is
2646	   added to the (intra- or inter-generated) prediction buffer to produce
2647	   the final (except for loop-filtering) reconstructed macroblock.

2649	   VP8 works exclusively with 4x4 DCTs and WHTs, applied to the 24 (or
2650	   25 with the Y2 subblock) 4x4 subblocks of a macroblock.  The ordering
2651	   of macroblocks within any of the "residue" partitions in general
2652	   follows the same raster-scan as used in the first "prediction"
2653	   partition.

2655	   For all intra- and inter-prediction modes apart from B_PRED (intra:
2656	   whose Y subblocks are independently predicted) and SPLIT_MV (inter)
2657	   each macroblock's residue record begins with the Y2 component of the
2658	   residue, coded using a WHT.  B_PRED and SPLIT_MV coded macroblocks
2659	   omit this WHT, instead specifying the 0th DCT coefficient of each of
2660	   the 16 Y subblocks as part of its DCT.

2662	   After the optional Y2 block, the residue record continues with 16
2663	   DCTs for the Y subblocks, followed by 4 DCTs for the U subblocks,
2664	   ending with 4 DCTs for the V subblocks.  The subblocks occur in the
2665	   usual order.

2667	   The DCTs and WHT are tree-coded using a 12-element alphabet whose
2668	   members we call tokens.  Except for the end of block token (which
2669	   sets the remaining subblock coefficients to zero and is followed by
2670	   the next block), each token (sometimes augmented with data
2671	   immediately following the token) specifies the value of the single
2672	   coefficient at the current (implicit) position and is followed by a
2673	   token applying to the next (implicit) position.

2675	   For all the Y and chroma subblocks, the ordering of the coefficients
2676	   follows a so-called zig-zag order.  DCTs begin at coefficient 1 if Y2
2677	   is present, and begin at coefficient 0 if Y2 is absent.  The WHT for
2678	   a Y2 subblock always begins at coefficient 0.

2680	13.1.  MB Without non-Zero Coefficient Values

2682	   If the flag within macroblock mode info indicates that a macroblock
2683	   does not have any non-zero coefficients, the decoding process of DCT
2684	   coefficients is skipped for the macroblock.

2686	13.2.  Coding of Individual Coefficient Values

2688	   The coding of coefficient tokens is the same for the DCT and WHT and
2689	   for the remainder of this chapter DCT should be taken to mean either
2690	   DCT or WHT.

2692	   All tokens (except end-of-block) specify either a single unsigned
2693	   value or a range of unsigned values (immediately) followed by a
2694	   simple probabilistic encoding of the offset of the value from the
2695	   base of that range.

2697	   Non-zero values (of either type) are then followed by a flag
2698	   indicating the sign of the coded value (negative if 1, positive if
2699	   0).

2701	   Here are the tokens and decoding tree.

2703	   ---- Begin code block --------------------------------------

2705	   typedef enum
2706	   {
2707	       DCT_0,      /* value 0 */
2708	       DCT_1,      /* 1 */
2709	       DCT_2,      /* 2 */
2710	       DCT_3,      /* 3 */
2711	       DCT_4,      /* 4 */
2712	       dct_cat1,   /* range 5 - 6  (size 2) */
2713	       dct_cat2,   /* 7 - 10   (4) */
2714	       dct_cat3,   /* 11 - 18  (8) */
2715	       dct_cat4,   /* 19 - 34  (16) */
2716	       dct_cat5,   /* 35 - 66  (32) */
2717	       dct_cat6,   /* 67 - 2048  (1982) */
2718	       dct_eob,    /* end of block */

2720	       num_dct_tokens   /* 12 */
2721	   }
2722	   dct_token;

2724	   const tree_index coef_tree [2 * (num_dct_tokens - 1)] =
2725	   {
2726	    -dct_eob, 2,               /* eob = "0"   */
2727	     -DCT_0, 4,                /* 0   = "10"  */
2728	      -DCT_1, 6,               /* 1   = "110" */
2729	       8, 12,
2730	        -DCT_2, 10,            /* 2   = "11100" */
2731	         -DCT_3, -DCT_4,       /* 3   = "111010", 4 = "111011" */
2732	        14, 16,
2733	         -dct_cat1, -dct_cat2, /* cat1 =  "111100",
2734	                                  cat2 = "111101" */
2735	        18, 20,
2736	         -dct_cat3, -dct_cat4, /* cat3 = "1111100",
2737	                                  cat4 = "1111101" */
2738	         -dct_cat5, -dct_cat6  /* cat4 = "1111110",
2739	                                  cat4 = "1111111" */
2740	   };

2742	   ---- End code block ----------------------------------------

2744	   While in general all DCT coefficients are decoded using the same
2745	   tree, decoding of certain DCT coefficients may skip the first branch,
2746	   whose preceding coefficient is a DCT_0.  This makes use of the fact
2747	   that in any block last non zero coefficient before the end of the
2748	   block is not 0, therefore no dct_eob follows a DCT_0 coefficient in
2749	   any block.

2751	   The tokens dct_cat1 ... dct_cat6 specify ranges of unsigned values,
2752	   the value within the range being formed by adding an unsigned offset
2753	   (whose width is 1, 2, 3, 4, 5, or 11 bits, respectively) to the base
2754	   of the range, using the following algorithm and fixed probability
2755	   tables.

2757	   ---- Begin code block --------------------------------------

2759	   uint DCTextra( bool_decoder *d, const Prob *p)
2760	   {
2761	       uint v = 0;
2762	       do { v += v + read_bool( d, *p);}  while( *++p);
2763	       return v;
2764	   }

2766	   const Prob Pcat1[] = { 159, 0};
2767	   const Prob Pcat2[] = { 165, 145, 0};
2768	   const Prob Pcat3[] = { 173, 148, 140, 0};
2769	   const Prob Pcat4[] = { 176, 155, 140, 135, 0};
2770	   const Prob Pcat5[] = { 180, 157, 141, 134, 130, 0};
2771	   const Prob Pcat6[] =
2772	       { 254, 254, 243, 230, 196, 177, 153, 140, 133, 130, 129, 0};

2774	   ---- End code block ----------------------------------------

2776	   If v, the unsigned value decoded using the coefficient tree, possibly
2777	   augmented by the process above, is non-zero, its sign is set by
2778	   simply reading a flag:

2780	   ---- Begin code block --------------------------------------

2782	   if( read_bool( d, 128))
2783	       v = -v;

2785	   ---- End code block ----------------------------------------

2787	13.3.  Token Probabilities

2789	   The probability specification for the token tree (unlike that for the
2790	   "extra bits" described above) is rather involved.  It uses three
2791	   pieces of context to index a large probability table, the contents of
2792	   which may be incrementally modified in the frame header.  The full
2793	   (non-constant) probability table is laid out as follows.

2795	   ---- Begin code block --------------------------------------

2797	   Prob coef_probs [4] [8] [3] [num_dct_tokens-1];
2798	   ---- End code block ----------------------------------------

2800	   Working from the outside in, the outermost dimension is indexed by
2801	   the type of plane being decoded:

2803	   o  0 - Y beginning at coefficient 1 (i.e., Y after Y2)

2805	   o  1 - Y2

2807	   o  2 - U or V

2809	   o  3 - Y beginning at coefficient 0 (i.e., Y in the absence of Y2).

2811	   The next dimension is selected by the position of the coefficient
2812	   being decoded.  That position c steps by ones up to 15, starting from
2813	   zero for block types 1, 2, or 3 and starting from one for block type
2814	   0.  The second array index is then

2816	   ---- Begin code block --------------------------------------

2818	   coef_bands [c]

2820	   ---- End code block ----------------------------------------

2822	   where

2824	   ---- Begin code block --------------------------------------

2826	   const int coef_bands [16] = {
2827	        0, 1, 2, 3, 6, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7
2828	   };

2830	   ---- End code block ----------------------------------------

2832	   is a fixed mapping of position to "band".

2834	   The third dimension is the trickiest.  Roughly speaking, it measures
2835	   the "local complexity" or extent to which nearby coefficients are
2836	   non-zero.

2838	   For the first coefficient (DC, unless the block type is 0), we
2839	   consider the (already encoded) blocks within the same plane (Y2, Y,
2840	   U, or V) above and to the left of the current block.  The context
2841	   index is then the number (0, 1 or 2) of these blocks that had at
2842	   least one non-zero coefficient in their residue record.

2844	   Beyond the first coefficient, the context index is determined by the
2845	   absolute value of the most recently decoded coefficient (necessarily
2846	   within the current block) and is 0 if the last coefficient was a
2847	   zero, 1 if it was plus or minus one, and 2 if its absolute value
2848	   exceeded one.

2850	   Note that the intuitive meaning of this measure changes as
2851	   coefficients are decoded.  For example, prior to the first token, a
2852	   zero means that the neighbors are empty, suggesting that the current
2853	   block may also be empty.  After the first token, because an end-of-
2854	   block token must have at least one non-zero value before it, a zero
2855	   means that we just decoded a zero and hence guarantees that a non-
2856	   zero coefficient will appear later in this block.  However, this
2857	   shift in meaning is perfectly okay because the complete context
2858	   depends also on the coefficient band (and since band 0 is occupied
2859	   exclusively by position 0).

2861	   As with other contexts used by VP8, the "neighboring block" context
2862	   described here needs a special definition for subblocks lying along
2863	   the top row or left edge of the frame.  These "non-existent"
2864	   predictors above and to the left of the image are simply taken to be
2865	   empty -- that is, taken to contain no non-zero coefficients.

2867	   The residue decoding of each macroblock then requires, in each of two
2868	   directions (above and to the left), an aggregate coefficient
2869	   predictor consisting of a single Y2 predictor, two predictors for
2870	   each of U and V, and four predictors for Y. In accordance with the
2871	   scan-ordering of macroblocks, a decoder needs to maintain a single
2872	   "left" aggregate predictor and a row of "above" aggregate predictors.

2874	   Before decoding any residue, these maintained predictors may simply
2875	   be cleared, in compliance with the definition of "non-existent"
2876	   prediction.  After each block is decoded, the two predictors
2877	   referenced by the block are replaced with the (empty or non-empty)
2878	   state of the block, in preparation for the later decoding of the
2879	   blocks below and to the right of the block just decoded.

2881	   The fourth, and final, dimension of the token probability array is of
2882	   course indexed by (half) the position in the token tree structure, as
2883	   are all tree probability arrays.

2885	   The below pseudo-code illustrates the decoding process.  Note that
2886	   criteria, functions, etc. delimited with ** are either dependent on
2887	   decoder architecture or are elaborated on elsewhere in this document.

2889	   ---- Begin code block --------------------------------------

2891	   int block[16] = { 0 }; /* current 4x4 block coeffs */
2892	   int firstCoeff = 0;
2893	   int plane;
2894	   int ctx2;
2895	   int ctx3 = 0; /* the 3rd context referred to in above description */
2896	   Prob *probTable;
2897	   int token;
2898	   int sign;
2899	   int absValue;
2900	   int extraBits;
2901	   bool prevCoeffWasZero = false;
2902	   bool currentBlockHasCoeffs = false;
2903	   /* base coeff abs values per each category, elem #0 is
2904	      DCT_VAL_CATEGORY1, * #1 is DCT_VAL_CATEGORY2 etc */
2905	   int categoryBase[6] = { 5, 7, 11, 19, 35, 67 };

2907	   /* Determine plane to use */
2908	   if( **current_block_is_Y2_block** )       plane = 0;
2909	   else if ( **current_block_is_chroma** )   plane = 2;
2910	   else if ( **current_macroblock_has_Y2** ) plane = 1;
2911	   else                                      plane = 3;

2913	   /* For luma blocks of an "Y2 macroblock" we skip coeff index #0 */
2914	   if( plane == 1 )
2915	       firstCoeff++;

2917	   /* Determine whether neighbour 4x4 blocks have coeffiecients.
2918	      This is dependant of the plane we are currently decoding;
2919	      i.e. we check only coefficients from same plane as current
2920	      block. */
2921	   if( **left_neighbor_block_has_coefficients(plane)** )
2922	       ctx3++;
2923	   if( **above_neighbor_block_has_coefficients(plane)** )
2924	       ctx3++;

2926	   for( i = firstCoeff ; i < 16 ; ++i )
2927	   {
2928	       ctx2 = coef_bands[i];
2929	       probTable = coef_probs[plane][ctx2][ctx3];

2931	       /* skip first code (dct_eob) if previous token was DCT_0 */
2932	       if( prevCoeffWasZero )
2933	           token = treed_read ( d, **coef_tree_without_eob**,
2934	             probTable );
2935	       else
2936	           token = treed_read ( d, coef_tree, probTable );

2938	       if( token == dct_eob )
2939	           break;

2941	       if( token != DCT_0 )
2942	       {
2943	           currentBlockHasCoeffs = true;
2944	     if( **token_has_extra_bits(token)** )
2945	     {
2946	         extraBits = DCTextra( token );
2947	         absValue =
2948	             categoryBase[**token_to_cat_index(token)**] +
2949	       extraBits;
2950	     }
2951	     else
2952	     {
2953	         absValue = **token_to_abs_value(token)**;
2954	     }

2956	     sign = read_bool(d, 128);
2957	           block[i] = sign ? -absValue : absValue;
2958	       }
2959	       else
2960	       {
2961	           absValue = 0;
2962	       }

2964	       /* Set contexts and stuff for next coeff */
2965	       if( absValue == 0 )         ctx3 = 0;
2966	       else if ( absValue == 1 )   ctx3 = 1;
2967	       else                        ctx3 = 2;
2968	       prevCoeffWasZero = true;
2969	   }

2971	   /* Store current block status to decoder internals */
2972	   **block_has_coefficients[currentMb][currentBlock]** =
2973	     currentBlockHasCoeffs;

2975	   ---- End code block ----------------------------------------

2977	   While we have in fact completely described the coefficient decoding
2978	   procedure, the reader will probably find it helpful to consult the
2979	   reference implementation, which can be found in the file
2980	   detokenize.c.

2982	13.4.  Token Probability Updates

2984	   As mentioned above, the token-decoding probabilities may change from
2985	   frame to frame.  After detection of a key frame, they are of course
2986	   set to their defaults shown in Section 13.5; this must occur before
2987	   decoding the remainder of the header, as both key frames and
2988	   interframes may adjust these probabilities.

2990	   The layout and semantics of the coefficient probability update record
2991	   (Section I of the frame header) are straightforward.  For each
2992	   position in the coef_probs array there occurs a fixed-probability
2993	   bool indicating whether or not the corresponding probability should
2994	   be updated.  If the bool is true, there follows a P(8) replacing that
2995	   probability.  Note that updates are cumulative, that is, a
2996	   probability updated on one frame is in effect for all ensuing frames
2997	   until the next key frame, or until the probability is explicitly
2998	   updated by another frame.

3000	   The algorithm to effect the foregoing is simple:

3002	   ---- Begin code block --------------------------------------

3004	   int i = 0;  do {
3005	    int j = 0;  do {
3006	     int k = 0;  do {
3007	      int t = 0;  do {

3009	           if( read_bool( d, coef_update_probs [i] [j] [k] [t]))
3010	               coef_probs [i] [j] [k] [t] = read_literal( d, 8);

3012	      } while( ++t < num_dct_tokens - 1);
3013	     } while( ++k < 3);
3014	    } while( ++j < 8);
3015	   } while( ++i < 4);

3017	   ---- End code block ----------------------------------------

3019	   The (constant) update probabilities are as follows (they may also be
3020	   found in the reference decoder file coef_update_probs.c).

3022	   ---- Begin code block --------------------------------------

3024	   const Prob coef_update_probs [4] [8] [3] [num_dct_tokens-1] =
3025	   {
3026	    {
3027	     {
3028	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3029	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3030	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3031	     },
3032	     {
3033	      { 176, 246, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3034	      { 223, 241, 252, 255, 255, 255, 255, 255, 255, 255, 255},
3035	      { 249, 253, 253, 255, 255, 255, 255, 255, 255, 255, 255}
3036	     },
3037	     {
3038	      { 255, 244, 252, 255, 255, 255, 255, 255, 255, 255, 255},
3039	      { 234, 254, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3040	      { 253, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3041	     },
3042	     {
3043	      { 255, 246, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3044	      { 239, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3045	      { 254, 255, 254, 255, 255, 255, 255, 255, 255, 255, 255}
3046	     },
3047	     {
3048	      { 255, 248, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3049	      { 251, 255, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3050	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3051	     },
3052	     {
3053	      { 255, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3054	      { 251, 254, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3055	      { 254, 255, 254, 255, 255, 255, 255, 255, 255, 255, 255}
3056	     },
3057	     {
3058	      { 255, 254, 253, 255, 254, 255, 255, 255, 255, 255, 255},
3059	      { 250, 255, 254, 255, 254, 255, 255, 255, 255, 255, 255},
3060	      { 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3061	     },
3062	     {
3063	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3064	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3065	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3066	     }
3067	    },
3068	    {
3069	     {
3070	      { 217, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3071	      { 225, 252, 241, 253, 255, 255, 254, 255, 255, 255, 255},
3072	      { 234, 250, 241, 250, 253, 255, 253, 254, 255, 255, 255}
3073	     },
3074	     {
3075	      { 255, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3076	      { 223, 254, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3077	      { 238, 253, 254, 254, 255, 255, 255, 255, 255, 255, 255}
3078	     },
3079	     {
3080	      { 255, 248, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3081	      { 249, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3082	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3083	     },
3084	     {
3085	      { 255, 253, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3086	      { 247, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3087	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3088	     },
3089	     {
3090	      { 255, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3091	      { 252, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3092	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3093	     },
3094	     {
3095	      { 255, 254, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3096	      { 253, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3097	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3098	     },
3099	     {
3100	      { 255, 254, 253, 255, 255, 255, 255, 255, 255, 255, 255},
3101	      { 250, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3102	      { 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3103	     },
3104	     {
3105	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3106	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3107	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3108	     }
3109	    },
3110	    {
3111	     {
3112	      { 186, 251, 250, 255, 255, 255, 255, 255, 255, 255, 255},
3113	      { 234, 251, 244, 254, 255, 255, 255, 255, 255, 255, 255},
3114	      { 251, 251, 243, 253, 254, 255, 254, 255, 255, 255, 255}
3115	     },
3116	     {
3117	      { 255, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3118	      { 236, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3119	      { 251, 253, 253, 254, 254, 255, 255, 255, 255, 255, 255}
3120	     },
3121	     {
3122	      { 255, 254, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3123	      { 254, 254, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3124	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3125	     },
3126	     {
3127	      { 255, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3128	      { 254, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3129	      { 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3130	     },
3131	     {
3132	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3133	      { 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3134	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3135	     },
3136	     {
3137	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3138	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3139	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3140	     },
3141	     {
3142	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3143	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3144	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3145	     },
3146	     {
3147	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3148	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3149	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3150	     }
3151	    },
3152	    {
3153	     {
3154	      { 248, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3155	      { 250, 254, 252, 254, 255, 255, 255, 255, 255, 255, 255},
3156	      { 248, 254, 249, 253, 255, 255, 255, 255, 255, 255, 255}
3157	     },
3158	     {
3159	      { 255, 253, 253, 255, 255, 255, 255, 255, 255, 255, 255},
3160	      { 246, 253, 253, 255, 255, 255, 255, 255, 255, 255, 255},
3161	      { 252, 254, 251, 254, 254, 255, 255, 255, 255, 255, 255}
3162	     },
3163	     {
3164	      { 255, 254, 252, 255, 255, 255, 255, 255, 255, 255, 255},
3165	      { 248, 254, 253, 255, 255, 255, 255, 255, 255, 255, 255},
3166	      { 253, 255, 254, 254, 255, 255, 255, 255, 255, 255, 255}
3167	     },
3168	     {
3169	      { 255, 251, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3170	      { 245, 251, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3171	      { 253, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255}
3172	     },
3173	     {
3174	      { 255, 251, 253, 255, 255, 255, 255, 255, 255, 255, 255},
3175	      { 252, 253, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3176	      { 255, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3177	     },
3178	     {
3179	      { 255, 252, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3180	      { 249, 255, 254, 255, 255, 255, 255, 255, 255, 255, 255},
3181	      { 255, 255, 254, 255, 255, 255, 255, 255, 255, 255, 255}

3183	     },
3184	     {
3185	      { 255, 255, 253, 255, 255, 255, 255, 255, 255, 255, 255},
3186	      { 250, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3187	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3188	     },
3189	     {
3190	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3191	      { 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255},
3192	      { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
3193	     }
3194	    }
3195	   };

3197	   ---- End code block ----------------------------------------

3199	13.5.  Default Token Probability Table

3201	   The default token probabilities are as follows.

3203	   ---- Begin code block --------------------------------------

3205	   const Prob default_coef_probs [4] [8] [3] [num_dct_tokens - 1] =
3206	   {
3207	    {
3208	     {
3209	      { 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128},
3210	      { 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128},
3211	      { 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128}
3212	     },
3213	     {
3214	      { 253, 136, 254, 255, 228, 219, 128, 128, 128, 128, 128},
3215	      { 189, 129, 242, 255, 227, 213, 255, 219, 128, 128, 128},
3216	      { 106, 126, 227, 252, 214, 209, 255, 255, 128, 128, 128}
3217	     },
3218	     {
3219	      {   1,  98, 248, 255, 236, 226, 255, 255, 128, 128, 128},
3220	      { 181, 133, 238, 254, 221, 234, 255, 154, 128, 128, 128},
3221	      {  78, 134, 202, 247, 198, 180, 255, 219, 128, 128, 128}
3222	     },
3223	     {
3224	      {   1, 185, 249, 255, 243, 255, 128, 128, 128, 128, 128},
3225	      { 184, 150, 247, 255, 236, 224, 128, 128, 128, 128, 128},
3226	      {  77, 110, 216, 255, 236, 230, 128, 128, 128, 128, 128}
3227	     },
3228	     {
3229	      {   1, 101, 251, 255, 241, 255, 128, 128, 128, 128, 128},
3230	      { 170, 139, 241, 252, 236, 209, 255, 255, 128, 128, 128},
3231	      {  37, 116, 196, 243, 228, 255, 255, 255, 128, 128, 128}
3232	     },
3233	     {
3234	      {   1, 204, 254, 255, 245, 255, 128, 128, 128, 128, 128},
3235	      { 207, 160, 250, 255, 238, 128, 128, 128, 128, 128, 128},
3236	      { 102, 103, 231, 255, 211, 171, 128, 128, 128, 128, 128}
3237	     },
3238	     {
3239	      {   1, 152, 252, 255, 240, 255, 128, 128, 128, 128, 128},
3240	      { 177, 135, 243, 255, 234, 225, 128, 128, 128, 128, 128},
3241	      {  80, 129, 211, 255, 194, 224, 128, 128, 128, 128, 128}
3242	     },
3243	     {
3244	      {   1,   1, 255, 128, 128, 128, 128, 128, 128, 128, 128},
3245	      { 246,   1, 255, 128, 128, 128, 128, 128, 128, 128, 128},
3246	      { 255, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128}
3247	     }
3248	    },
3249	    {
3250	     {
3251	      { 198,  35, 237, 223, 193, 187, 162, 160, 145, 155,  62},
3252	      { 131,  45, 198, 221, 172, 176, 220, 157, 252, 221,   1},
3253	      {  68,  47, 146, 208, 149, 167, 221, 162, 255, 223, 128}
3254	     },
3255	     {
3256	      {   1, 149, 241, 255, 221, 224, 255, 255, 128, 128, 128},
3257	      { 184, 141, 234, 253, 222, 220, 255, 199, 128, 128, 128},
3258	      {  81,  99, 181, 242, 176, 190, 249, 202, 255, 255, 128}
3259	     },
3260	     {
3261	      {   1, 129, 232, 253, 214, 197, 242, 196, 255, 255, 128},
3262	      {  99, 121, 210, 250, 201, 198, 255, 202, 128, 128, 128},
3263	      {  23,  91, 163, 242, 170, 187, 247, 210, 255, 255, 128}
3264	     },
3265	     {
3266	      {   1, 200, 246, 255, 234, 255, 128, 128, 128, 128, 128},
3267	      { 109, 178, 241, 255, 231, 245, 255, 255, 128, 128, 128},
3268	      {  44, 130, 201, 253, 205, 192, 255, 255, 128, 128, 128}
3269	     },
3270	     {
3271	      {   1, 132, 239, 251, 219, 209, 255, 165, 128, 128, 128},
3272	      {  94, 136, 225, 251, 218, 190, 255, 255, 128, 128, 128},
3273	      {  22, 100, 174, 245, 186, 161, 255, 199, 128, 128, 128}
3274	     },
3275	     {
3276	      {   1, 182, 249, 255, 232, 235, 128, 128, 128, 128, 128},
3277	      { 124, 143, 241, 255, 227, 234, 128, 128, 128, 128, 128},
3278	      {  35,  77, 181, 251, 193, 211, 255, 205, 128, 128, 128}

3280	     },
3281	     {
3282	      {   1, 157, 247, 255, 236, 231, 255, 255, 128, 128, 128},
3283	      { 121, 141, 235, 255, 225, 227, 255, 255, 128, 128, 128},
3284	      {  45,  99, 188, 251, 195, 217, 255, 224, 128, 128, 128}
3285	     },
3286	     {
3287	      {   1,   1, 251, 255, 213, 255, 128, 128, 128, 128, 128},
3288	      { 203,   1, 248, 255, 255, 128, 128, 128, 128, 128, 128},
3289	      { 137,   1, 177, 255, 224, 255, 128, 128, 128, 128, 128}
3290	     }
3291	    },
3292	    {
3293	     {
3294	      { 253,   9, 248, 251, 207, 208, 255, 192, 128, 128, 128},
3295	      { 175,  13, 224, 243, 193, 185, 249, 198, 255, 255, 128},
3296	      {  73,  17, 171, 221, 161, 179, 236, 167, 255, 234, 128}
3297	     },
3298	     {
3299	      {   1,  95, 247, 253, 212, 183, 255, 255, 128, 128, 128},
3300	      { 239,  90, 244, 250, 211, 209, 255, 255, 128, 128, 128},
3301	      { 155,  77, 195, 248, 188, 195, 255, 255, 128, 128, 128}
3302	     },
3303	     {
3304	      {   1,  24, 239, 251, 218, 219, 255, 205, 128, 128, 128},
3305	      { 201,  51, 219, 255, 196, 186, 128, 128, 128, 128, 128},
3306	      {  69,  46, 190, 239, 201, 218, 255, 228, 128, 128, 128}
3307	     },
3308	     {
3309	      {   1, 191, 251, 255, 255, 128, 128, 128, 128, 128, 128},
3310	      { 223, 165, 249, 255, 213, 255, 128, 128, 128, 128, 128},
3311	      { 141, 124, 248, 255, 255, 128, 128, 128, 128, 128, 128}
3312	     },
3313	     {
3314	      {   1,  16, 248, 255, 255, 128, 128, 128, 128, 128, 128},
3315	      { 190,  36, 230, 255, 236, 255, 128, 128, 128, 128, 128},
3316	      { 149,   1, 255, 128, 128, 128, 128, 128, 128, 128, 128}
3317	     },
3318	     {
3319	      {   1, 226, 255, 128, 128, 128, 128, 128, 128, 128, 128},
3320	      { 247, 192, 255, 128, 128, 128, 128, 128, 128, 128, 128},
3321	      { 240, 128, 255, 128, 128, 128, 128, 128, 128, 128, 128}
3322	     },
3323	     {
3324	      {   1, 134, 252, 255, 255, 128, 128, 128, 128, 128, 128},
3325	      { 213,  62, 250, 255, 255, 128, 128, 128, 128, 128, 128},
3326	      {  55,  93, 255, 128, 128, 128, 128, 128, 128, 128, 128}
3327	     },
3328	     {
3329	      { 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128},
3330	      { 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128},
3331	      { 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128}
3332	     }
3333	    },
3334	    {
3335	     {
3336	      { 202,  24, 213, 235, 186, 191, 220, 160, 240, 175, 255},
3337	      { 126,  38, 182, 232, 169, 184, 228, 174, 255, 187, 128},
3338	      {  61,  46, 138, 219, 151, 178, 240, 170, 255, 216, 128}
3339	     },
3340	     {
3341	      {   1, 112, 230, 250, 199, 191, 247, 159, 255, 255, 128},
3342	      { 166, 109, 228, 252, 211, 215, 255, 174, 128, 128, 128},
3343	      {  39,  77, 162, 232, 172, 180, 245, 178, 255, 255, 128}
3344	     },
3345	     {
3346	      {   1,  52, 220, 246, 198, 199, 249, 220, 255, 255, 128},
3347	      { 124,  74, 191, 243, 183, 193, 250, 221, 255, 255, 128},
3348	      {  24,  71, 130, 219, 154, 170, 243, 182, 255, 255, 128}
3349	     },
3350	     {
3351	      {   1, 182, 225, 249, 219, 240, 255, 224, 128, 128, 128},
3352	      { 149, 150, 226, 252, 216, 205, 255, 171, 128, 128, 128},
3353	      {  28, 108, 170, 242, 183, 194, 254, 223, 255, 255, 128}
3354	     },
3355	     {
3356	      {   1,  81, 230, 252, 204, 203, 255, 192, 128, 128, 128},
3357	      { 123, 102, 209, 247, 188, 196, 255, 233, 128, 128, 128},
3358	      {  20,  95, 153, 243, 164, 173, 255, 203, 128, 128, 128}
3359	     },
3360	     {
3361	      {   1, 222, 248, 255, 216, 213, 128, 128, 128, 128, 128},
3362	      { 168, 175, 246, 252, 235, 205, 255, 255, 128, 128, 128},
3363	      {  47, 116, 215, 255, 211, 212, 255, 255, 128, 128, 128}
3364	     },
3365	     {
3366	      {   1, 121, 236, 253, 212, 214, 255, 255, 128, 128, 128},
3367	      { 141,  84, 213, 252, 201, 202, 255, 219, 128, 128, 128},
3368	      {  42,  80, 160, 240, 162, 185, 255, 205, 128, 128, 128}
3369	     },
3370	     {
3371	      {   1,   1, 255, 128, 128, 128, 128, 128, 128, 128, 128},
3372	      { 244,   1, 255, 128, 128, 128, 128, 128, 128, 128, 128},
3373	      { 238,   1, 255, 128, 128, 128, 128, 128, 128, 128, 128}
3374	     }
3375	    }

3377	   };

3379	   ---- End code block ----------------------------------------

3381	14.  DCT and WHT Inversion and Macroblock Reconstruction

3383	14.1.  Dequantization

3385	   After decoding the DCTs/WHTs as described above, each (quantized)
3386	   coefficient in each subblock is multiplied by one of six
3387	   dequantization factors, the choice of factor depending on the plane
3388	   (Y2, Y, or chroma) and position (DC = coefficient zero, AC = any
3389	   other coefficient).  If the current macroblock has overridden the
3390	   quantization level (as described in Chapter 10) then the six factors
3391	   are looked up from two dequantization tables with appropriate scaling
3392	   and clamping using the single index supplied by the override.
3393	   Otherwise, the frame-level dequantization factors (as described in
3394	   Section 9.6 are used.  In either case, the multiplies are computed
3395	   and stored using 16-bit signed integers.

3397	   The two dequantization tables, which may also be found in the
3398	   reference decoder file quant_common.c, are as follows.

3400	   ---- Begin code block --------------------------------------

3402	   static const int dc_qlookup[QINDEX_RANGE] =
3403	   {
3404	       4,   5,   6,   7,   8,   9,  10,  10,   11,  12,  13,  14,  15,
3405	      16,  17,  17,  18,  19,  20,  20,  21,   21,  22,  22,  23,  23,
3406	      24,  25,  25,  26,  27,  28,  29,  30,   31,  32,  33,  34,  35,
3407	      36,  37,  37,  38,  39,  40,  41,  42,   43,  44,  45,  46,  46,
3408	      47,  48,  49,  50,  51,  52,  53,  54,   55,  56,  57,  58,  59,
3409	      60,  61,  62,  63,  64,  65,  66,  67,   68,  69,  70,  71,  72,
3410	      73,  74,  75,  76,  76,  77,  78,  79,   80,  81,  82,  83,  84,
3411	      85,  86,  87,  88,  89,  91,  93,  95,   96,  98, 100, 101, 102,
3412	      104, 106, 108, 110, 112, 114, 116, 118, 122, 124, 126, 128, 130,
3413	      132, 134, 136, 138, 140, 143, 145, 148, 151, 154, 157,
3414	   };

3416	   static const int ac_qlookup[QINDEX_RANGE] =
3417	   {
3418	       4,   5,   6,   7,   8,   9,  10,  11,  12,  13,  14,  15,  16,
3419	      17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
3420	      30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,
3421	      43,  44,  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,
3422	      56,  57,  58,  60,  62,  64,  66,  68,  70,  72,  74,  76,  78,
3423	      80,  82,  84,  86,  88,  90,  92,  94,  96,  98, 100, 102, 104,
3424	     106, 108, 110, 112, 114, 116, 119, 122, 125, 128, 131, 134, 137,
3425	     140, 143, 146, 149, 152, 155, 158, 161, 164, 167, 170, 173, 177,
3426	     181, 185, 189, 193, 197, 201, 205, 209, 213, 217, 221, 225, 229,
3427	     234, 239, 245, 249, 254, 259, 264, 269, 274, 279, 284,
3428	   };

3430	   ---- End code block ----------------------------------------

3432	   Lookup values from the above two tables are directly used the DC and
3433	   AC coefficients in Y1 respectively.  For Y2 and chroma, values from
3434	   above tables undergo either a scaling process or clamping processing
3435	   before the multiplies.  Details to these scaling and clamping can be
3436	   found related lookup functions in quant_common.c.

3438	14.2.  Inverse Transforms

3440	   If the Y2 residue block exists (i.e., the macroblock luma mode is not
3441	   SPLITMV or B_PRED), it is inverted first (using the inverse WHT) and
3442	   the element of the result at row i, column j is used as the 0th
3443	   coefficient of the Y subblock at position (i, j), that is, the Y
3444	   subblock whose index is (i * 4) + j.  As discussed in Chapter 13, if
3445	   the luma mode is B_PRED or SPLITMV, the 0th Y coefficients are part
3446	   of the residue signal for the subblocks themselves.

3448	   In either case, the inverse transforms for the sixteen Y subblocks
3449	   and eight chroma subblocks are computed next.  All 24 of these
3450	   inversions are independent of each other; their results may (at least
3451	   conceptually) be stored in 24 separate 4x4 arrays.

3453	   As is done by the reference decoder, an implementation may wish to
3454	   represent the prediction and residue buffers as macroblock-sized
3455	   arrays (that is, a 16x16 Y buffer and two 8x8 chroma buffers).
3456	   Regarding the inverse DCT implementation given below, this requires a
3457	   simple adjustment to the address calculation for the resulting
3458	   residue pixels.

3460	14.3.  Implementation of the WHT Inversion

3462	   As previously discussed (see Chapters 2 and 13), for macroblocks
3463	   encoded using prediction modes other than B_PRED and SPLITMV, the DC
3464	   values derived from the DCT transform on the 16 Y blocks are
3465	   collected to construct a 25th block of a macroblock(16 Y, 4 U, 4 V
3466	   constitute the 24 blocks).  This 25th block is transformed using a
3467	   Walsh-Hadamard transform (WHT).

3469	   The inputs to the inverse WHT (that is, the dequantized
3470	   coefficients), the intermediate "horizontally detransformed" signal,
3471	   and the completely detransformed residue signal are all stored as
3472	   arrays of 16-bit signed integers.

3474	   Following the tradition of specifying bitstream format using the
3475	   decoding process, we specify the inverse WHT in the decoding process
3476	   using the following C style source code:

3478	   ---- Begin code block --------------------------------------

3480	   void vp8_short_inv_walsh4x4_c(short *input, short *output)
3481	   {
3482	     int i;
3483	     int a1, b1, c1, d1;
3484	     int a2, b2, c2, d2;
3485	     short *ip = input;
3486	     short *op = output;
3487	     int temp1, temp2;

3489	     for(i=0;i<4;i++)
3490	     {
3491	       a1 = ip[0] + ip[12];
3492	       b1 = ip[4] + ip[8];
3493	       c1 = ip[4] - ip[8];
3494	       d1 = ip[0] - ip[12];
3495	       op[0] = a1 + b1;
3496	       op[4] = c1 + d1;
3497	       op[8] = a1 - b1;
3498	       op[12]= d1 - c1;
3499	       ip++;
3500	       op++;
3501	     }
3502	     ip = output;
3503	     op = output;
3504	     for(i=0;i<4;i++)
3505	     {
3506	       a1 = ip[0] + ip[3];
3507	       b1 = ip[1] + ip[2];
3508	       c1 = ip[1] - ip[2];
3509	       d1 = ip[0] - ip[3];

3511	       a2 = a1 + b1;
3512	       b2 = c1 + d1;
3513	       c2 = a1 - b1;
3514	       d2 = d1 - c1;

3516	       op[0] = (a2+3)>>3;
3517	       op[1] = (b2+3)>>3;
3518	       op[2] = (c2+3)>>3;
3519	       op[3] = (d2+3)>>3;

3521	       ip+=4;
3522	       op+=4;
3523	     }
3524	   }

3526	   ---- End code block ----------------------------------------

3528	   In the case that there is only one non-zero DC value in input, the
3529	   inverse transform can be simplified to the following:

3531	   ---- Begin code block --------------------------------------

3533	   void vp8_short_inv_walsh4x4_1_c(short *input, short *output)
3534	   {
3535	     int i;
3536	     int a1;
3537	     short *op=output;

3539	     a1 = ((input[0] + 3)>>3);

3541	     for(i=0;i<4;i++)
3542	     {
3543	       op[0] = a1;
3544	       op[1] = a1;
3545	       op[2] = a1;
3546	       op[3] = a1;
3547	       op+=4;
3548	     }
3549	   }

3551	   ---- End code block ----------------------------------------

3553	   It should be noted, a conforming decoder should implement the inverse
3554	   transform using exactly the same rounding to achieve bit-wise
3555	   matching output to the output of the process specified by the above
3556	   "C" source code.

3558	   The reference decoder WHT inversion may be found in the files
3559	   invtrans.c and idctllm.c.

3561	14.4.  Implementation of the DCT Inversion

3563	   All of the DCT inversions are computed in exactly the same way.  In
3564	   principle, VP8 uses a classical 2D inverse discrete cosine transform,
3565	   implemented as two passes of 1-D inverse DCT.  The 1-D inverse DCT
3566	   was calculated using a similar algorithm to what was described in the
3567	   paper "Practical Fast 1-D DCT Algorithms with 11 Multiplications" by
3568	   Loeffler, Lightenberg and Moschytz.  However, the paper only provided
3569	   the 8-point and 16-point version of the algorithms, which was adapted
3570	   by On2 to perform the 4-point 1-D DCT.

3572	   Accurate calculation of 1-D DCT of the above algorithm requires
3573	   infinite precision.  VP8 of course can use only a finite-precision
3574	   approximation.  Also, the inverse DCT used by VP8 takes care of
3575	   normalization of the standard unitary transform, that is, every
3576	   dequantized coefficient has roughly double the size of the
3577	   corresponding unitary coefficient.  However, at all but the highest
3578	   datarates, the discrepancy between transmitted and ideal coefficients
3579	   is due almost entirely to (lossy) compression and not to errors
3580	   induced by finite-precision arithmetic.

3582	   The inputs to the inverse DCT (that is, the dequantized
3583	   coefficients), the intermediate "horizontally detransformed" signal,
3584	   and the completely detransformed residue signal are all stored as
3585	   arrays of 16-bit signed integers.  The details of the computation are
3586	   as follows.

3588	   It should also be noted that this implementation makes use of 16-bit
3589	   fixed point version of two multiplication constants:

3591	   sqrt(2) * cos (pi/8)

3593	   sqrt(2) * sin (pi/8)

3595	   Because the first constant is bigger than 1, to maintain the same 16-
3596	   bit fixed point precision as the second one, we make use of the fact
3597	   that

3599	   x * a = x + x*(a-1)

3601	   therefore

3603	   x * sqrt(2) * cos (pi/8) = x + x * ( sqrt(2) * cos(pi/8)-1)

3605	   ---- Begin code block --------------------------------------

3607	   /* IDCT implementation */
3608	   static const int cospi8sqrt2minus1=20091;
3609	   static const int sinpi8sqrt2      =35468;
3610	   void short_idct4x4llm_c(short *input, short *output, int pitch)
3611	   {
3612	     int i;
3613	     int a1, b1, c1, d1;

3615	     short *ip=input;
3616	     short *op=output;
3617	     int temp1, temp2;
3618	     int shortpitch = pitch>>1;

3620	     for(i=0;i<4;i++)
3621	     {
3622	       a1 = ip[0]+ip[8];
3623	       b1 = ip[0]-ip[8];
3624	       temp1 = (ip[4] * sinpi8sqrt2)>>16;
3625	       temp2 = ip[12]+((ip[12] * cospi8sqrt2minus1)>>16);
3626	       c1 = temp1 - temp2;

3628	       temp1 = ip[4] + ((ip[4] * cospi8sqrt2minus1)>>16);
3629	       temp2 = (ip[12] * sinpi8sqrt2)>>16;
3630	       d1 = temp1 + temp2;

3632	       op[shortpitch*0] = a1+d1;
3633	       op[shortpitch*3] = a1-d1;
3634	       op[shortpitch*1] = b1+c1;
3635	       op[shortpitch*2] = b1-c1;

3637	       ip++;
3638	       op++;
3639	     }
3640	     ip = output;
3641	     op = output;
3642	     for(i=0;i<4;i++)
3643	     {
3644	       a1 = ip[0]+ip[2];
3645	       b1 = ip[0]-ip[2];

3647	       temp1 = (ip[1] * sinpi8sqrt2)>>16;
3648	       temp2 = ip[3]+((ip[3] * cospi8sqrt2minus1)>>16);
3649	       c1 = temp1 - temp2;

3651	       temp1 = ip[1] + ((ip[1] * cospi8sqrt2minus1)>>16);
3652	       temp2 = (ip[3] * sinpi8sqrt2)>>16;
3653	       d1 = temp1 + temp2;

3655	       op[0] = (a1+d1+4)>>3;
3656	       op[3] = (a1-d1+4)>>3;
3657	       op[1] = (b1+c1+4)>>3;
3658	       op[2] = (b1-c1+4)>>3;

3660	       ip+=shortpitch;
3661	       op+=shortpitch;
3662	     }
3663	   }

3665	   ---- End code block ----------------------------------------

3667	   The reference decoder DCT inversion may be found in the files
3668	   invtrans.c and idctllm.c.

3670	14.5.  Summation of Predictor and Residue

3672	   Finally, the prediction and residue signals are summed to form the
3673	   reconstructed macroblock, which, except for loop filtering (taken up
3674	   next), completes the decoding process.

3676	   The summing procedure is fairly straightforward, having only a couple
3677	   of details.  The prediction and residue buffers are both arrays of
3678	   16-bit signed integers.  Each individual (Y, U, and V pixel) result
3679	   is calculated first as a 32-bit sum of the prediction and residue,
3680	   and is then saturated to 8-bit unsigned range (using, say, the
3681	   clamp255 function defined above) before being stored as an 8-bit
3682	   unsigned pixel value.

3684	   VP8 also supports a mode where the encoding of a bitstream guarantees
3685	   all reconstructed pixel values between 0 and 255, compliant
3686	   bitstreams of such requirements have the clamp_type bit in the frame
3687	   header set to 1.  In such case, the clamp255 is no longer required.

3689	   The summation process is the same, regardless of the (intra or inter)
3690	   mode of prediction in effect for the macroblock.  The reference
3691	   decoder implementation of reconstruction may be found in the file
3692	   recon.c.

3694	15.  Loop Filter

3696	   Loop filtering is the last stage of frame reconstruction and the
3697	   next-to-last stage of the decoding process.  The loop filter is
3698	   applied to the entire frame after the summation of predictor and
3699	   residue described in Chapter 14.

3701	   The purpose of the loop filter is to eliminate (or at least reduce)
3702	   visually objectionable artifacts associated with the semi-
3703	   independence of the coding of macroblocks and their constituent
3704	   subblocks.

3706	   As was discussed in Chapter 5, the loop filter is "integral" to
3707	   decoding, in that the results of loop filtering are used in the
3708	   prediction of subsequent frames.  Consequently, a functional decoder
3709	   implementation must perform loop filtering exactly as described here.
3710	   This is in distinction to any postprocessing that may be applied only
3711	   to the image immediately before display; such postprocessing is
3712	   entirely at the option of the implementor (and/or user) and has no
3713	   effect on decoding per se.

3715	   The baseline frame level parameters controlling the loop filter are
3716	   defined in the frame header (Chapter 9.4) along with a mechanism for
3717	   adjustment based on a macroblock's prediction mode and/or reference
3718	   frame.  The first is a flag selecting the type of filter (normal or
3719	   simple), the other two are numbers (loop_filter_level and
3720	   sharpness_level) that adjust the strength or sensitivity of the
3721	   filter.  As described in Chapters 9.3 and 10, loop_filter_level may
3722	   be also overridden on a per-macroblock basis using segmentation.

3724	   Loop filtering is one of the more computationally-intensive aspects
3725	   of VP8 decoding.  This is the reason for the existence of the
3726	   optional less-demanding simple filter type.  Also, the loop filter is
3727	   completely disabled if the loop_filter_level in the frame header is
3728	   zero; macroblock-level overrides are ignored in this case.  (It is of
3729	   course possible for a compressor to encode a frame in which only a
3730	   few macroblocks are loop filtered: The global loop_filter_level must
3731	   be non-zero and each macroblock can select one of four levels, most
3732	   of which could be zero.)

3734	   To facilitate efficient implementation, the VP8 decoding algorithms
3735	   generally, and the loop filter especially, were designed with SIMD
3736	   ("Single Instruction Multiple Datum" or "integer vector") processors
3737	   in mind.  The reference decoder implementation of loop filtering
3738	   (found in loopfilter.c) is, in effect, a portable SIMD specification
3739	   of the loop filtering algorithms intended to simplify a realization
3740	   on an actual SIMD processor.

3742	   Unfortunately, the approach taken there does not lead to maximal
3743	   efficency (restricted to the C language, that is) and, as far as a
3744	   pure algorithm specification is concerned, is in places obscure.  For
3745	   example, various aspects of filtering are conditioned on absolute
3746	   differences lying below certain thresholds.  An ordinary C
3747	   implementation would simply discriminate amongst these behaviors
3748	   using if statements.  The reference decoder instead effects this by
3749	   "masking arithmetic", that is, using "and" operations to
3750	   (conditionally) zero-out values to be added or subtracted to pixels.
3751	   Furthermore, the structure holding the various threshold values is
3752	   artificially parallelized.  While this mimics closely the approach
3753	   taken in vector-processor machine language, it is not how one usually
3754	   programs in C.

3756	   In this document, we take a different approach and present the
3757	   algorithms in a more straightforward, idiomatic, and terse C style.
3758	   Together with the reference version, we hope to provide the "best of
3759	   both worlds", that is, a pure algorithm specification here and a
3760	   strong suggestion as to an optimal actual implementation in
3761	   loopfilter.c.

3763	   We begin by discussing the aspects of loop filtering that are
3764	   independent of the controlling parameters and type of filter chosen.

3766	15.1.  Filter Geometry and Overall Procedure

3768	   The Y, U, and V planes are processed independently and, except for
3769	   the values of certain control parameters (derived from the
3770	   loop_filter_level and sharpness_level), identically.

3772	   The loop filter acts on the edges between adjacent macroblocks and on
3773	   the edges between adjacent subblocks of a macroblock.  All such edges
3774	   are horizontal or vertical.  For each pixel position on an edge, a
3775	   small number (two or three) of pixels adjacent to either side of the
3776	   position are examined and possibly modified.  The displacements of
3777	   these pixels are at a right angle to the edge orientation, that is,
3778	   for a horizontal edge, we treat the pixels immediately above and
3779	   below the edge position, for a vertical edge, we treat the pixels
3780	   immediately to the left and right of the edge.

3782	   We call this collection of pixels associated to an edge position a
3783	   segment; the length of a segment is 2, 4, 6, or 8.  Excepting that
3784	   the normal filter uses slightly different algorithms for, and that
3785	   either filter may apply different control parameters to, the edges
3786	   between macroblocks and those between subblocks, the treatment of
3787	   edges is quite uniform: All segments straddling an edge are treated
3788	   identically, there is no distinction between the treatment of
3789	   horizontal and vertical edges, whether between macroblocks or between
3790	   subblocks.

3792	   As a consequence, adjacent subblock edges within a macroblock may be
3793	   concatenated and processed in their entirety.  There is a single
3794	   8-pixel long vertical edge horizontally centered in each of the U and
3795	   V blocks (the concatenation of upper and lower 4-pixel edges between
3796	   chroma subblocks), and three 16-pixel long vertical edges at
3797	   horizontal positions 1/4, 1/2, and 3/4 the width of the luma
3798	   macroblock, each representing the concatenation of four 4-pixel sub-
3799	   edges between pairs of Y subblocks.

3801	   The macroblocks comprising the frame are processed in the usual
3802	   raster-scan order.  Each macroblock is "responsible for" the inter-
3803	   macroblock edges immediately above and left of it (but not the edges
3804	   below and right of it), as well as the edges between its subblocks.

3806	   For each macroblock M, there are four filtering steps, which are,
3807	   (almost) in order:

3809	   1.  If M is not on the leftmost column of macroblocks, filter across
3810	       the left (vertical) inter-macroblock edge of M.

3812	   2.  Filter across the vertical subblock edges within M.

3814	   3.  If M is not on the topmost row of macroblocks, filter across the
3815	       top (horizontal) inter-macroblock edge of M.

3817	   4.  Filter across the horizontal subblock edges within M.

3819	   We write MY, MU, and MV for the planar constituents of M, that is,
3820	   the 16x16 luma block, 8x8 U block, and 8x8 V block comprising M.

3822	   In step 1, for each of the three blocks MY, MU, and MV, we filter
3823	   each of the (16 luma or 8 chroma) segments straddling the column
3824	   separating the block from the block immediately to the left of it,
3825	   using the inter-macroblock filter and controls associated to the
3826	   loop_filter_level and sharpness_level.

3828	   In step 4, we filter across the (three luma and one each for U and V)
3829	   vertical subblock edges described above, this time using the inter-
3830	   subblock filter and controls.

3832	   Step 2 and 4 are skipped for macroblocks that satisfy the following
3833	   two conditions:

3835	   1.  Macroblock coding mode is neither B_PRED nor SPLTMV;
3836	   2.  There is no DCT coefficient coded for the whole macroblock.

3838	   For these macroblocks, loop filtering for edges between subblocks
3839	   internal to a macroblock is effectively skipped.  This skip strategy
3840	   significantly reduces VP8 loop-filtering complexity.

3842	   Edges between macroblocks and those between subblocks are treated
3843	   with different control parameters (and, in the case of the normal
3844	   filter, with different algorithms); luma and chroma edges are also
3845	   treated with different control parameters.  Except for pixel
3846	   addressing, there is no distinction between the treatment of vertical
3847	   and horizontal edges.  Luma edges are always 16 pixels long, chroma
3848	   edges are always 8 pixels long, and the segments straddling an edge
3849	   are treated identically; this of course facilitates vector
3850	   processing.

3852	   Because many pixels belong to segments straddling two or more edges,
3853	   and so will be filtered more than once, the order in which edges are
3854	   processed given above must be respected by any implementation.
3855	   Within a single edge, however, the segments straddling that edge are
3856	   disjoint and the order in which these segments are processed is
3857	   immaterial.

3859	   Before taking up the filtering algorithms themselves, we should
3860	   emphasize a point already made: Even though the pixel segments
3861	   associated to a macroblock are antecedent to the macroblock (that is,
3862	   lie within the macroblock or in already-constructed macroblocks), a
3863	   macroblock must not be filtered immediately after its
3864	   "reconstruction" (described in Chapter 14).  Rather, the loop filter
3865	   applies after all the macroblocks have been "reconstructed" (i.e.,
3866	   had their predictor summed with their residue); correct decoding is
3867	   predicated on the fact that already-constructed portions of the
3868	   current frame referenced via intra-prediction (described in Chapter
3869	   12) are not yet filtered.

3871	15.2.  Simple Filter

3873	   Having described the overall procedure of, and pixels affected by,
3874	   the loop filter, we turn our attention to the treatment of individual
3875	   segments straddling edges.  We begin by describing the simple filter,
3876	   which, as the reader might guess, is somewhat simpler than the normal
3877	   filter.

3879	   Note that the simple filter only applies to luma edges.  Chroma edges
3880	   are left unfiltered.

3882	   Roughly speaking, the idea of loop filtering is, within limits, to
3883	   reduce the difference between pixels straddling an edge.  Differences
3884	   in excess of a threshold (associated to the loop_filter_level) are
3885	   assumed to be "natural" and are unmodified; differences below the
3886	   threshold are assumed to be artifacts of quantization and the
3887	   (partially) separate coding of blocks, and are reduced via the
3888	   procedures described below.  While the loop_filter_level is in
3889	   principle arbitrary, the levels chosen by a VP8 compressor tend to be
3890	   correlated to quantization levels.

3892	   Most of the filtering arithmetic is done using 8-bit signed operands
3893	   (having a range -128 to +127, inclusive), supplemented by 16-bit
3894	   temporaries holding results of multiplies.

3896	   Sums and other temporaries need to be "clamped" to a valid signed
3897	   8-bit range:

3899	   ---- Begin code block --------------------------------------

3901	   int8 c( int v)
3902	   {
3903	       return (int8) (v < -128 ? -128 : (v < 128 ? v : 127));
3904	   }

3906	   ---- End code block ----------------------------------------

3908	   Since pixel values themselves are unsigned 8-bit numbers, we need to
3909	   convert between signed and unsigned values:

3911	   ---- Begin code block --------------------------------------

3913	   /* Convert pixel value (0 <= v <= 255) to an 8-bit signed
3914	      number. */
3915	   int8 u2s( Pixel v) { return (int8) (v - 128);}

3917	   /* Clamp, then convert signed number back to pixel value. */
3918	   Pixel s2u( int v) { return (Pixel) ( c(v) + 128);}

3920	   ---- End code block ----------------------------------------

3922	   Filtering is often predicated on absolute-value thresholds.  The
3923	   following function is the equivalent of the standard library function
3924	   abs, whose prototype is found in the standard header file stdlib.h.
3925	   For us, the argument v is always the difference between two pixels
3926	   and lies in the range -255 <= v <= +255.

3928	   ---- Begin code block --------------------------------------

3930	   int abs( int v) { return v < 0?  -v : v;}
3931	   ---- End code block ----------------------------------------

3933	   An actual implementation would of course use inline functions or
3934	   macros to accomplish these trivial procedures (which are used by both
3935	   the normal and simple loop filters).  An optimal implementation would
3936	   probably express them in machine language, perhaps using SIMD vector
3937	   instructions.  On many SIMD processors, the saturation accomplished
3938	   by the above clamping function is often folded into the arithmetic
3939	   instructions themselves, obviating the explicit step taken here.

3941	   To simplify the specification of relative pixel positions, we use the
3942	   word before to mean "immediately above" (for a vertical segment
3943	   straddling a horizontal edge) or "immediately to the left of" (for a
3944	   horizontal segment straddling a vertical edge) and the word after to
3945	   mean "immediately below" or "immediately to the right of".

3947	   Given an edge, a segment, and a limit value, the simple loop filter
3948	   computes a value based on the four pixels that straddle the edge (two
3949	   either side).  If that value is below a supplied limit, then, very
3950	   roughly speaking, the two pixel values are brought closer to each
3951	   other, "shaving off" something like a quarter of the difference.  The
3952	   same procedure is used for all segments straddling any type of edge,
3953	   regardless of the nature (inter-macroblock, inter-subblock, luma, or
3954	   chroma) of the edge; only the limit value depends on the edge-type.

3956	   The exact procedure (for a single segment) is as follows; the
3957	   subroutine common_adjust is used by both the simple filter presented
3958	   here and the normal filters discussed in Section 15.3 (Section 15.3).

3960	   ---- Begin code block --------------------------------------

3962	   int8 common_adjust(
3963	       int use_outer_taps,   /* filter is 2 or 4 taps wide */
3964	       const Pixel *P1,    /* pixel before P0 */
3965	       Pixel *P0,          /* pixel before edge */
3966	       Pixel *Q0,          /* pixel after edge */
3967	       const Pixel *Q1     /* pixel after Q0 */
3968	   ) {
3969	       cint8 p1 = u2s( *P1);   /* retrieve and convert all 4 pixels */
3970	       cint8 p0 = u2s( *P0);
3971	       cint8 q0 = u2s( *Q0);
3972	       cint8 q1 = u2s( *Q1);

3974	       /* Disregarding clamping, when "use_outer_taps" is false,
3975	          "a" is 3*(q0-p0). Since we are about to divide "a" by
3976	          8, in this case we end up multiplying the edge
3977	          difference by 5/8.

3979	          When "use_outer_taps" is true (as for the simple filter),
3980	          "a" is p1 - 3*p0 + 3*q0 - q1, which can be thought of as
3981	          a refinement of 2*(q0 - p0) and the adjustment is
3982	          something like (q0 - p0)/4. */

3984	       int8 a = c( ( use_outer_taps? c(p1 - q1) : 0 ) + 3*(q0 - p0) );

3986	       /* b is used to balance the rounding of a/8 in the case where
3987	          the "fractional" part "f" of a/8 is exactly 1/2. */

3989	       cint8 b = (a & 7)==4 ? -1 : 0;

3991	       /* Divide a by 8, rounding up when f >= 1/2.
3992	          Although not strictly part of the "C" language,
3993	          the right-shift is assumed to propagate the sign bit. */

3995	       a = c( a + 4) >> 3;

3997	       /* Subtract "a" from q0, "bringing it closer" to p0. */

3999	       *Q0 = s2u( q0 - a);

4001	       /* Add "a" (with adjustment "b") to p0, "bringing it closer"
4002	          to q0.

4004	          The clamp of "a+b", while present in the reference decoder,
4005	          is superfluous; we have -16 <= a <= 15 at this point. */

4007	       *P0 = s2u( p0 + c( a + b));

4009	       return a;
4010	   }

4012	   ---- End code block ----------------------------------------
4013	   ---- Begin code block --------------------------------------

4015	   void simple_segment(
4016	       uint8 edge_limit,   /* do nothing if edge difference
4017	                              exceeds limit */
4018	       const Pixel *P1,    /* pixel before P0 */
4019	       Pixel *P0,          /* pixel before edge */
4020	       Pixel *Q0,          /* pixel after edge */
4021	       const Pixel *Q1     /* pixel after Q0 */
4022	   ) {
4023	       if( (abs(*P0 - *Q0)*2 + abs(*P1-*Q1)/2) <= edge_limit))
4024	           common_adjust( 1, P1, P0, Q0, Q1);   /* use outer taps */
4025	   }

4027	   ---- End code block ----------------------------------------

4029	   We make a couple of remarks about the rounding procedure above.  When
4030	   b is zero (that is, when the "fractional part" of a is not 1/2 ), we
4031	   are (except for clamping) adding the same number to p0 as we are
4032	   subtracting from q0.  This preserves the average value of p0 and q0
4033	   but the resulting difference between p0 and q0 is always even; in
4034	   particular, the smallest non-zero gradation +-1 is not possible here.

4036	   When b is one, the value we add to p0 (again except for clamping) is
4037	   one less than the value we are subtracting from q0.  In this case,
4038	   the resulting difference is always odd (and the small gradation +-1
4039	   is possible) but the average value is reduced by 1/2, yielding, for
4040	   instance, a very slight darkening in the luma plane.  (In the very
4041	   unlikely event of appreciable darkening after a large number of
4042	   interframes, a compressor would of course eventually compensate for
4043	   this in the selection of predictor and/or residue.)

4045	   The derivation of the edge_limit value used above, which depends on
4046	   the loop_filter_level and sharpness_level, as well as the type of
4047	   edge being processed, will be taken up after we describe the normal
4048	   loop filtering algorithm below.

4050	15.3.  Normal Filter

4052	   The normal loop filter is a refinement of the simple loop filter; all
4053	   of the general discussion above applies here as well.  In particular,
4054	   the functions c, u2s, s2u, abs, and common_adjust are used by both
4055	   the normal and simple filters.

4057	   As mentioned above, the normal algorithms for inter-macroblock and
4058	   inter-subblock edges differ.  Nonetheless, they have a great deal in
4059	   common: They use similar threshold algorithms to disable the filter
4060	   and to detect high internal edge variance (which influences the
4061	   filtering algorithm).  Both algorithms also use, at least
4062	   conditionally, the simple filter adjustment procedure described
4063	   above.

4065	   The common thresholding algorithms are as follows.

4067	   ---- Begin code block --------------------------------------

4069	   /* All functions take (among other things) a segment (of length
4070	      at most 4 + 4 = 8) symmetrically straddling an edge.

4072	      The pixel values (or pointers) are always given in order,
4073	      from the "beforemost" to the "aftermost". So, for a
4074	      horizontal edge (written "|"), an 8-pixel segment would be
4075	      ordered p3 p2 p1 p0 | q0 q1 q2 q3. */

4077	   /* Filtering is disabled if the difference between any two
4078	      adjacent "interior" pixels in the 8-pixel segment exceeds
4079	      the relevant threshold (I). A more complex thresholding
4080	      calculation is done for the group of four pixels that
4081	      straddle the edge, in line with the calculation in
4082	      simple_segment() above. */

4084	   int filter_yes(
4085	       uint8 I,        /* limit on interior differences */
4086	       uint8 E,        /* limit at the edge */

4088	       cint8 p3, cint8 p2, cint8 p1, cint8 p0, /* pixels before
4089	                                                  edge */
4090	       cint8 q0, cint8 q1, cint8 q2, cint8 q3  /* pixels after
4091	                                                  edge */
4092	   ) {
4093	       return  (abs(p0 - q0)*2 + abs(p1-q1)/2) <= E
4094	           &&  abs(p3 - p2) <= I  &&  abs(p2 - p1) <= I  &&
4095	             abs(p1 - p0) <= I
4096	           &&  abs(q3 - q2) <= I  &&  abs(q2 - q1) <= I  &&
4097	             abs(q1 - q0) <= I;
4098	   }

4100	   ---- End code block ----------------------------------------
4101	   ---- Begin code block --------------------------------------

4103	   /* Filtering is altered if (at least) one of the differences
4104	      on either side of the edge exceeds a threshold (we have
4105	      "high edge variance"). */

4107	   int hev(
4108	       uint8 threshold,
4109	       cint8 p1, cint8 p0, /* pixels before edge */
4110	       cint8 q0, cint8 q1  /* pixels after edge */
4111	   ) {
4112	       return abs(p1 - p0) > threshold  ||  abs(q1 - q0) > threshold;
4113	   }

4115	   ---- End code block ----------------------------------------

4117	   The subblock filter is a variant of the simple filter.  In fact, if
4118	   we have high edge variance, the adjustment is exactly as for the
4119	   simple filter.  Otherwise, the simple adjustment (without outer taps)
4120	   is applied and the two pixels one step in from the edge pixels are
4121	   adjusted by roughly half the amount by which the two edge pixels are
4122	   adjusted; since the edge adjustment here is essentially 3/8 the edge
4123	   difference, the inner adjustment is approximately 3/16 the edge
4124	   difference.

4126	   ---- Begin code block --------------------------------------

4128	   void subblock_filter(
4129	       uint8 hev_threshold,     /* detect high edge variance */
4130	       uint8 interior_limit,    /* possibly disable filter */
4131	       uint8 edge_limit,
4132	       cint8 *P3, cint8 *P2, int8 *P1, int8 *P0,   /* pixels before
4133	                                                      edge */
4134	       int8 *Q0, int8 *Q1, cint8 *Q2, cint8 *Q3    /* pixels after
4135	                                                      edge */
4136	   ) {
4137	       cint8 p3 = u2s(*P3), p2 = u2s(*P2), p1 = u2s(*P1),
4138	         p0 = u2s(*P0);
4139	       cint8 q0 = u2s(*Q0), q1 = u2s(*Q1), q2 = u2s(*Q2),
4140	         q3 = u2s(*Q3);

4142	       if( filter_yes( interior_limit, edge_limit, q3, q2, q1, q0,
4143	         p0, p1, p2, p3))
4144	       {
4145	           const int hv = hev( hev_threshold, p1, p0, q0, q1);

4147	           cint8 a = ( common_adjust( hv, P1, P0, Q0, Q1) + 1) >> 1;

4149	           if( !hv) {
4150	               *Q1 = s2u( q1 - a);
4151	               *P1 = s2u( p1 + a);
4152	           }
4153	       }
4154	   }

4156	   ---- End code block ----------------------------------------

4158	   The inter-macroblock filter has potentially wider scope.  If the edge
4159	   variance is high, it performs the simple adjustment (using the outer
4160	   taps, just like the simple filter and the corresponding case of the
4161	   normal subblock filter).  If the edge variance is low, we begin with
4162	   the same basic filter calculation and apply multiples of it to pixel
4163	   pairs symmetric about the edge; the magnitude of adjustment decays as
4164	   we move away from the edge and six of the pixels in the segment are
4165	   affected.

4167	   ---- Begin code block --------------------------------------

4169	   void MBfilter(
4170	       uint8 hev_threshold,     /* detect high edge variance */
4171	       uint8 interior_limit,    /* possibly disable filter */
4172	       uint8 edge_limit,
4173	       cint8 *P3, int8 *P2, int8 *P1, int8 *P0,  /* pixels before
4174	                                                    edge */
4175	       int8 *Q0, int8 *Q1, int8 *Q2, cint8 *Q3   /* pixels after
4176	                                                    edge */
4177	   ) {
4178	       cint8 p3 = u2s(*P3), p2 = u2s(*P2), p1 = u2s(*P1),
4179	         p0 = u2s(*P0);
4180	       cint8 q0 = u2s(*Q0), q1 = u2s(*Q1), q2 = u2s(*Q2),
4181	         q3 = u2s(*Q3);

4183	       if( filter_yes( interior_limit, edge_limit, q3, q2, q1, q0,
4184	         p0, p1, p2, p3))
4185	       {
4186	           if( !hev( hev_threshold, p1, p0, q0, q1))
4187	           {
4188	               /* Same as the initial calculation in "common_adjust",
4189	                  w is something like twice the edge difference */

4191	               const int8 w = c( c(p1 - q1) + 3*(q0 - p0) );

4193	               /* 9/64 is approximately 9/63 = 1/7 and 1<<7 = 128 =
4194	                  2*64. So this a, used to adjust the pixels adjacent
4195	                  to the edge, is something like 3/7 the edge
4196	                  difference. */

4198	               int8 a = c( (27*w + 63) >> 7);

4200	               *Q0 = s2u( q0 - a);  *P0 = s2u( p0 + a);

4202	               /* Next two are adjusted by 2/7 the edge difference */

4204	               a = c( (18*w + 63) >> 7);

4206	               *Q1 = s2u( q1 - a);  *P1 = s2u( p1 + a);

4208	               /* Last two are adjusted by 1/7 the edge difference */

4210	               a = c( (9*w + 63) >> 7);

4212	               *Q2 = s2u( q2 - a);  *P2 = s2u( p2 + a);

4214	           } else                      /* if hev, do simple filter */
4215	               common_adjust( 1, P1, P0, Q0, Q1);   /* using outer
4216	                                                       taps */
4217	       }

4219	   }

4221	   ---- End code block ----------------------------------------

4223	15.4.  Calculation of Control Parameters

4225	   We conclude the discussion of loop filtering by showing how the
4226	   thresholds supplied to the procedures above are derived from the two
4227	   control parameters sharpness_level (an unsigned 3-bit number having
4228	   maximum value 7) and loop_filter_level (an unsigned 6-bit number
4229	   having maximum value 63).

4231	   While the sharpness_level is constant over the frame, individual
4232	   macroblocks may override the loop_filter_level with one of four
4233	   possibilities supplied in the frame header (as described in Chapter
4234	   10).

4236	   Both the simple and normal filters disable filtering if a value
4237	   derived from the four pixels that straddle the edge (2 either side)
4238	   exceeds a threshold / limit value.

4240	   ---- Begin code block --------------------------------------

4242	   /* Luma and Chroma use the same inter-macroblock edge limit */
4243	   uint8 mbedge_limit = ((loop_filter_level + 2) * 2) +
4244	     interior_limit;

4246	   /* Luma and Chroma use the same inter-subblock edge limit */
4247	   uint8 sub_bedge_limit = (loop_filter_level * 2) + interior_limit;

4249	   ---- End code block ----------------------------------------

4251	   The remaining thresholds are used only by the normal filters.  The
4252	   filter-disabling interior difference limit is the same for all edges
4253	   (luma, chroma, inter-subblock, inter-macroblock) and is given by the
4254	   following.

4256	   ---- Begin code block --------------------------------------

4258	   uint8 interior_limit = loop_filter_level;

4260	   if( sharpness_level)
4261	   {
4262	       interior_limit  >>=  sharpness_level > 4 ?  2 : 1;
4263	       if( interior_limit > 9 - sharpness_level)
4264	           interior_limit = 9 - sharpness_level;
4265	   }
4266	   if( !interior_limit)
4267	       interior_limit = 1;

4269	   ---- End code block ----------------------------------------
4270	   Finally, we give the derivation of the high edge-variance threshold,
4271	   which is also the same for all edge types.

4273	   ---- Begin code block --------------------------------------

4275	   uint8 hev_threshold = 0;

4277	   if( we_are_decoding_akey_frame)   /* current frame is a key frame */
4278	   {
4279	       if( loop_filter_level >= 40)
4280	           hev_threshold = 2;
4281	       else if( loop_filter_level >= 15)
4282	           hev_threshold = 1;
4283	   }
4284	   else                            /* current frame is an interframe */
4285	   {
4286	       if( loop_filter_level >= 40)
4287	           hev_threshold = 3;
4288	       else if( loop_filter_level >= 20)
4289	           hev_threshold = 2;
4290	       else if( loop_filter_level >= 15)
4291	           hev_threshold = 1;
4292	   }

4294	   ---- End code block ----------------------------------------

4296	16.  Interframe Macroblock Prediction Records

4298	   We describe the layout and semantics of the prediction records for
4299	   macroblocks in an interframe.

4301	   After the feature specification (which is described in Chapter 10 and
4302	   is identical for intraframes and interframes), there comes a Bool(
4303	   prob_intra), which indicates inter-prediction (i.e., prediction from
4304	   prior frames) when true and intra-prediction (i.e., prediction from
4305	   already-coded portions of the current frame) when false.  The zero-
4306	   probability prob_intra is set by field J of the frame header.

4308	16.1.  Intra-Predicted Macroblocks

4310	   For intra-prediction, the layout of the prediction data is
4311	   essentially the same as the layout for key frames, although the
4312	   contexts used by the decoding process are slightly different.

4314	   As discussed in Chapter 8, the "outer" Y mode here uses a different
4315	   tree from that used in key frames, repeated here for convenience.

4317	   ---- Begin code block --------------------------------------

4319	   const tree_index ymode_tree [2 * (num_ymodes - 1)] =
4320	   {
4321	    -DC_PRED, 2,           /* root: DC_PRED = "0", "1" subtree */
4322	     4, 6,                 /* "1" subtree has 2 descendant subtrees */
4323	      -V_PRED, -H_PRED,    /* "10" subtree:  V_PRED = "100",
4324	                              H_PRED = "101" */
4325	      -TM_PRED, -B_PRED    /* "11" subtree:  TM_PRED = "110",
4326	                              B_PRED = "111" */
4327	   };

4329	   ---- End code block ----------------------------------------

4331	   The probability table used to decode this tree is variable.  As
4332	   described in Chapter 9 (Section 9), it (along with the similarly-
4333	   treated UV table) can be updated by field J of the frame header.
4334	   Similar to the coefficient-decoding probabilities, such updates are
4335	   cumulative and affect all ensuing frames until the next key frame or
4336	   explicit update.  The default probabilities for the Y and UV tables
4337	   are

4339	   ---- Begin code block --------------------------------------

4341	   Prob ymode_prob [num_ymodes - 1] = { 112, 86, 140, 37};
4342	   Prob uv_mode_prob [num_uv_modes - 1] = { 162, 101, 204};
4343	   ---- End code block ----------------------------------------

4345	   These defaults must be restored after detection of a key frame.

4347	   Just as for key frames, if the Y mode is B_PRED, there next comes an
4348	   encoding of the intra_bpred mode used by each of the sixteen Y
4349	   subblocks.  These encodings use the same tree as does that for key
4350	   frames but, in place of the contexts used in key frames, use the
4351	   single fixed probability table

4353	   ---- Begin code block --------------------------------------

4355	   const Prob bmode_prob [num_intra_bmodes - 1] = {
4356	       120, 90, 79, 133, 87, 85, 80, 111, 151
4357	   };

4359	   ---- End code block ----------------------------------------

4361	   Last comes the chroma mode, again coded using the same tree as that
4362	   for key frames, this time using the dynamic uv_mode_prob table
4363	   described above.

4365	   The calculation of the intra-prediction buffer is identical to that
4366	   described for key frames in Chapter 12.

4368	16.2.  Inter-Predicted Macroblocks

4370	   Otherwise (when the above bool is true), we are using inter-
4371	   prediction (which of course only happens for interframes), to which
4372	   we now restrict our attention.

4374	   The next datum is then another bool, B( prob_last), selecting the
4375	   reference frame.  If 0, the reference frame is previous frame (last
4376	   frame); if 1, another bool (prob_gf) selects the reference frame
4377	   between golden frame (0) or altref frame (1).  The probabilities
4378	   prob_last and prob_gf are set in field J of the frame header.

4380	   Together with setting the reference frame, the purpose of inter-mode
4381	   decoding is to set a motion vector for each of the sixteen Y
4382	   subblocks of the current macroblock.  This then defines the
4383	   calculation of the inter-prediction buffer (detailed in Chapter 18).
4384	   While the net effect of inter-mode decoding is straightforward, the
4385	   implementation is somewhat complex; the (lossless) compression
4386	   achieved by this method justifies the complexity.

4388	   After the reference frame selector comes the mode (or motion vector
4389	   reference) applied to the macroblock as a whole, coded using the
4390	   following enumeration and tree.  Setting mv_nearest = num_ymodes is a
4391	   convenience that allows a single variable to unambiguously hold an
4392	   inter- or intraprediction mode.

4394	   ---- Begin code block --------------------------------------

4396	   typedef enum
4397	   {
4398	       mv_nearest = num_ymodes, /* use "nearest" motion vector
4399	                                   for entire MB */
4400	       mv_near,                 /* use "next nearest" "" */
4401	       mv_zero,                 /* use zero "" */
4402	       mv_new,                  /* use explicit offset from
4403	                                   implicit "" */
4404	       mv_split,                /* use multiple motion vectors */

4406	       num_mv_refs = mv_split + 1 - mv_nearest
4407	   }
4408	   mv_ref;

4410	   const tree_index mv_ref_tree [2 * (num_mv_refs - 1)] =
4411	   {
4412	    -mv_zero, 2,                /* zero = "0" */
4413	     -mv_nearest, 4,            /* nearest = "10" */
4414	      -mv_near, 6,              /* near = "110" */
4415	        -mv_new, -mv_split      /* new = "1110", split = "1111" */
4416	   };

4418	   ---- End code block ----------------------------------------

4420	16.3.  Mode and Motion Vector Contexts

4422	   The probability table used to decode the mv_ref, along with three
4423	   reference motion vectors used by the selected mode, is calculated via
4424	   a survey of the already-decoded motion vectors in (up to) 3 nearby
4425	   macroblocks.

4427	   The algorithm generates a sorted list of distinct motion vectors
4428	   adjacent to the search site.  The best_mv is the vector with the
4429	   highest score.  The nearest_mv is the non-zero vector with the
4430	   highest score.  The near_mv is the non-zero vector with the next
4431	   highest score.  The number of motion vectors coded using the SPLITMV
4432	   mode is scored using the same weighting and is returned with the
4433	   scores of the best, nearest, and near vectors.

4435	   The three adjacent macroblocks above, left, and above-left are
4436	   considered in order.  If the macroblock is intra-coded, no action is
4437	   taken.  Otherwise, the motion vector is compared to other previously
4438	   found motion vectors to determine if it has been seen before, and if
4439	   so contributes its weight to that vector, otherwise enters a new
4440	   vector in the list.  The above and left vectors have twice the weight
4441	   of the above-left vector.

4443	   As is the case with many contexts used by VP8, it is possible for
4444	   macroblocks near the top or left edges of the image to reference
4445	   blocks that are outside the visible image.  VP8 provides a border of
4446	   1 macroblock filled with 0x0 motion vectors left of the left edge,
4447	   and a border filled with 0,0 motion vectors of 1 macroblocks above
4448	   the top edge.

4450	   Much of the process is more easily described in C than in English.
4451	   The reference code for this can be found in findnearmv.c.  The
4452	   calculation of reference vectors, probability table, and, finally,
4453	   the inter-prediction mode itself is implemented as follows.

4455	   ---- Begin code block --------------------------------------

4457	   typedef union
4458	   {
4459	       unsigned int as_int;
4460	       MV           as_mv;
4461	   } int_mv;        /* facilitates rapid equality tests */

4463	   static void mv_bias(MODE_INFO *x,int refframe, int_mv *mvp,
4464	     int * ref_frame_sign_bias )
4465	   {
4466	       MV xmv;
4467	       xmv = x->mbmi.mv.as_mv;
4468	       if ( ref_frame_sign_bias[x->mbmi.ref_frame] !=
4469	         ref_frame_sign_bias[refframe] )
4470	       {
4471	           xmv.row*=-1;
4472	           xmv.col*=-1;
4473	       }
4474	       mvp->as_mv = xmv;
4475	   }

4477	   ---- End code block ----------------------------------------
4478	   ---- Begin code block --------------------------------------

4480	   void vp8_clamp_mv(MV *mv, const MACROBLOCKD *xd)
4481	   {
4482	       if ( mv->col < (xd->mb_to_left_edge - LEFT_TOP_MARGIN) )
4483	           mv->col = xd->mb_to_left_edge - LEFT_TOP_MARGIN;
4484	       else if ( mv->col > xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN )
4485	           mv->col = xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN;

4487	       if ( mv->row < (xd->mb_to_top_edge - LEFT_TOP_MARGIN) )
4488	           mv->row = xd->mb_to_top_edge - LEFT_TOP_MARGIN;
4489	       else if ( mv->row > xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN )
4490	           mv->row = xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN;
4491	   }

4493	   ---- End code block ----------------------------------------

4495	   In the function vp8_find_near_mvs(), the vectors "nearest" and "near"
4496	   are used by the corresponding modes.

4498	   The vector best_mv is used as a base for explicitly-coded motion
4499	   vectors.

4501	   The first three entries in the return value cnt are (in order)
4502	   weighted census values for "zero", "nearest", and "near" vectors.
4503	   The final value indicates the extent to which SPLIT_MV was used by
4504	   the neighboring macroblocks.  The largest possible "weight" value in
4505	   each case is 5.

4507	   ---- Begin code block --------------------------------------

4509	   void vp8_find_near_mvs
4510	   (
4511	       MACROBLOCKD *xd,
4512	       const MODE_INFO *here,
4513	       MV *nearest,
4514	       MV *near,
4515	       MV *best_mv,
4516	       int cnt[4],
4517	       int refframe,
4518	       int * ref_frame_sign_bias
4519	   )
4520	   {
4521	       const MODE_INFO *above = here - xd->mode_info_stride;
4522	       const MODE_INFO *left = here - 1;
4523	       const MODE_INFO *aboveleft = above - 1;
4524	       int_mv            near_mvs[4];
4525	       int_mv           *mv = near_mvs;
4526	       int             *cntx = cnt;
4527	       enum {CNT_ZERO, CNT_NEAREST, CNT_NEAR, CNT_SPLITMV};

4529	       /* Zero accumulators */
4530	       mv[0].as_int = mv[1].as_int = mv[2].as_int = 0;
4531	       cnt[0] = cnt[1] = cnt[2] = cnt[3] = 0;

4533	       /* Process above */
4534	       if(above->mbmi.ref_frame != INTRA_FRAME) {
4535	           if(above->mbmi.mv.as_int) {
4536	               (++mv)->as_int = above->mbmi.mv.as_int;
4537	               mv_bias(above, refframe, mv, ref_frame_sign_bias);
4538	               ++cntx;
4539	           }
4540	           *cntx += 2;
4541	       }

4543	       /* Process left */
4544	       if(left->mbmi.ref_frame != INTRA_FRAME) {
4545	           if(left->mbmi.mv.as_int) {
4546	               int_mv this_mv;

4548	               this_mv.as_int = left->mbmi.mv.as_int;
4549	               mv_bias(left, refframe, &this_mv, ref_frame_sign_bias);

4551	               if(this_mv.as_int != mv->as_int) {
4552	                   (++mv)->as_int = this_mv.as_int;
4553	                   ++cntx;
4554	               }
4555	               *cntx += 2;
4556	           } else
4557	               cnt[CNT_ZERO] += 2;
4558	       }

4560	       /* Process above left */
4561	       if(aboveleft->mbmi.ref_frame != INTRA_FRAME) {
4562	           if(aboveleft->mbmi.mv.as_int) {
4563	               int_mv this_mv;

4565	               this_mv.as_int = aboveleft->mbmi.mv.as_int;
4566	               mv_bias(aboveleft, refframe, &this_mv,
4567	                 ref_frame_sign_bias);

4569	               if(this_mv.as_int != mv->as_int) {
4570	                   (++mv)->as_int = this_mv.as_int;
4571	                   ++cntx;
4572	               }
4573	               *cntx += 1;

4575	           } else
4576	               cnt[CNT_ZERO] += 1;
4577	       }

4579	       /* If we have three distinct MV's ... */
4580	       if(cnt[CNT_SPLITMV]) {
4581	           /* See if above-left MV can be merged with NEAREST */
4582	           if(mv->as_int == near_mvs[CNT_NEAREST].as_int)
4583	               cnt[CNT_NEAREST] += 1;
4584	       }

4586	       cnt[CNT_SPLITMV] = ((above->mbmi.mode == SPLITMV)
4587	                            + (left->mbmi.mode == SPLITMV)) * 2
4588	                           + (aboveleft->mbmi.mode == SPLITMV);

4590	       /* Swap near and nearest if necessary */
4591	       if(cnt[CNT_NEAR] > cnt[CNT_NEAREST]) {
4592	           int tmp;
4593	           tmp = cnt[CNT_NEAREST];
4594	           cnt[CNT_NEAREST] = cnt[CNT_NEAR];
4595	           cnt[CNT_NEAR] = tmp;
4596	           tmp = near_mvs[CNT_NEAREST].as_int;
4597	           near_mvs[CNT_NEAREST].as_int = near_mvs[CNT_NEAR].as_int;
4598	           near_mvs[CNT_NEAR].as_int = tmp;
4599	       }

4601	       /* Use near_mvs[0] to store the "best" MV */
4602	       if(cnt[CNT_NEAREST] >= cnt[CNT_ZERO])
4603	           near_mvs[CNT_ZERO] = near_mvs[CNT_NEAREST];

4605	       /* Set up return values */
4606	       *best_mv = near_mvs[0].as_mv;
4607	       *nearest = near_mvs[CNT_NEAREST].as_mv;
4608	       *near = near_mvs[CNT_NEAR].as_mv;

4610	       vp8_clamp_mv(nearest, xd);
4611	       vp8_clamp_mv(near, xd);
4612	       vp8_clamp_mv(best_mv, xd); //TODO: move this up before
4613	                                    the copy
4614	   }

4616	   ---- End code block ----------------------------------------

4618	   The mv_ref probability table (mv_ref_p) is then derived from the
4619	   census as follows.

4621	   ---- Begin code block --------------------------------------

4623	   const int vp8_mode_contexts[6][4] =
4624	   {
4625	     {   7,     1,     1,   143,   },
4626	     {  14,    18,    14,   107,   },
4627	     { 135,    64,    57,    68,   },
4628	     {  60,    56,   128,    65,   },
4629	     { 159,   134,   128,    34,   },
4630	     { 234,   188,   128,    28,   },
4631	   }

4633	   ---- End code block ----------------------------------------

4635	   ---- Begin code block --------------------------------------

4637	   vp8_prob *vp8_mv_ref_probs(vp8_prob mv_ref_p[VP8_MVREFS-1],
4638	     int cnt[4])
4639	   {
4640	       mv_ref_p[0] = vp8_mode_contexts [cnt[0]] [0];
4641	       mv_ref_p[1] = vp8_mode_contexts [cnt[1]] [1];
4642	       mv_ref_p[2] = vp8_mode_contexts [cnt[2]] [2];
4643	       mv_ref_p[3] = vp8_mode_contexts [cnt[3]] [3];
4644	       return p;
4645	   }

4647	   ---- End code block ----------------------------------------

4649	   Once mv_ref_p is established, the mv_ref is decoded as usual.

4651	   ---- Begin code block --------------------------------------

4653	     mvr = (mv_ref) treed_read( d, mv_ref_tree, mv_ref_p);

4655	   ---- End code block ----------------------------------------

4657	   For the first four inter-coding modes, the same motion vector is used
4658	   for all the Y subblocks.  The first three modes use an implicit
4659	   motion vector.

4661	   +------------+------------------------------------------------------+
4662	   | Mode       | Instruction                                          |
4663	   +------------+------------------------------------------------------+
4664	   | mv_nearest | Use the nearest vector returned by                   |
4665	   |            | vp8_find_near_mvs.                                   |
4666	   |            |                                                      |
4667	   | mv_near    | Use the near vector returned by vp8_find_near_mvs.   |
4668	   |            |                                                      |
4669	   | mv_zero    | Use a zero vector, that is, predict the current      |
4670	   |            | macroblock from the corresponding macroblock in the  |
4671	   |            | prediction frame.                                    |
4672	   |            |                                                      |
4673	   | NEWMV      | This mode is followed by an explicitly-coded motion  |
4674	   |            | vector (the format of which is described in the next |
4675	   |            | chapter) that is added (component-wise) to the       |
4676	   |            | best_mv reference vector returned by find_near_mvs   |
4677	   |            | and applied to all 16 subblocks.                     |
4678	   +------------+------------------------------------------------------+

4680	16.4.  Split Prediction

4682	   The remaining mode (SPLITMV) causes multiple vectors to be applied to
4683	   the Y subblocks.  It is immediately followed by a partition
4684	   specification that determines how many vectors will be specified and
4685	   how they will be assigned to the subblocks.  The possible partitions,
4686	   with indicated subdivisions and coding tree, are as follows.

4688	   ---- Begin code block --------------------------------------

4690	   typedef enum
4691	   {
4692	       mv_top_bottom,   /* two pieces {0...7} and {8...15} */
4693	       mv_left_right,   /* {0,1,4,5,8,9,12,13} and
4694	                           {2,3,6,7,10,11,14,15} */
4695	       mv_quarters,    /* {0,1,4,5}, {2,3,6,7}, {8,9,12,13},
4696	                          {10,11,14,15} */
4697	       MV_16,          /* every subblock gets its own vector
4698	                          {0} ... {15} */

4700	       mv_num_partitions
4701	   }
4702	   MVpartition;

4704	   const tree_index mvpartition_tree [2 * (mvnum_partition - 1)] =
4705	   {
4706	    -MV_16, 2,                         /* MV_16 = "0" */
4707	     -mv_quarters, 4,                  /* mv_quarters = "10" */
4708	      -mv_top_bottom, -mv_left_right   /* top_bottom = "110",
4709	                                          left_right = "111" */
4710	   };

4712	   ---- End code block ----------------------------------------

4714	   The partition is decoded using a fixed, constant probability table:

4716	   ---- Begin code block --------------------------------------

4718	   const Prob mvpartition_probs [mvnum_partition - 1] =
4719	     { 110, 111, 150};
4720	   part = (MVpartition) treed_read( d, mvpartition_tree,
4721	     mvpartition_probs);

4723	   ---- End code block ----------------------------------------

4725	   After the partition come two (for mv_top_bottom or mv_left_right),
4726	   four (for mv_quarters), or sixteen (for MV_16) subblock inter-
4727	   prediction modes.  These modes occur in the order indicated by the
4728	   partition layouts (given as comments to the MVpartition enum) and are
4729	   coded as follows.  (As was done for the macroblock-level modes, we
4730	   offset the mode enumeration so that a single variable may
4731	   unambiguously hold either an intra- or inter-subblock mode.)

4733	   Prior to decoding each subblock, a decoding tree context is chosen as
4734	   illustrated in the code snippet below.  The context is based on the
4735	   immediate left and above subblock neighbors, and whether they are
4736	   equal, are zero, or a combination of those.

4738	   ---- Begin code block --------------------------------------

4740	   typedef enum
4741	   {
4742	       LEFT4x4 = num_intra_bmodes,   /* use already-coded MV to
4743	                                        my left */
4744	       ABOVE4x4,             /* use already-coded MV above me */
4745	       ZERO4x4,              /* use zero MV */
4746	       NEW4x4,               /* explicit offset from "best" */

4748	       num_sub_mv_ref
4749	   };
4750	   sub_mv_ref;

4752	   const tree_index sub_mv_ref_tree [2 * (num_sub_mv_ref - 1)] =
4753	   {
4754	    -LEFT4X4, 2,           /* LEFT = "0" */
4755	     -ABOVE4X4, 4,         /* ABOVE = "10" */
4756	      -ZERO4X4, -NEW4X4    /* ZERO = "110", NEW = "111" */
4757	   };

4759	   /* Choose correct decoding tree context
4760	    * Function parameters are left subblock neighbor MV and above
4761	    * subblock neighbor MV */
4762	   int vp8_mvCont(MV *l, MV*a)
4763	   {
4764	       int lez = (l->row == 0 && l->col == 0);   /* left neighbour
4765	                                                    is zero */
4766	       int aez = (a->row == 0 && a->col == 0);   /* above neighbour
4767	                                                    is zero */
4768	       int lea = (l->row == a->row && l->col == a->col);  /* left
4769	                                neighbour equals above neighbour */

4771	       if(lea && lez)
4772	           return SUBMVREF_LEFT_ABOVE_ZED; /* =4 */

4774	       if(lea)
4775	           return SUBMVREF_LEFT_ABOVE_SAME; /* =3 */

4777	       if(aez)
4778	           return SUBMVREF_ABOVE_ZED; /* =2 */

4780	       if(lez)
4781	           return SUBMVREF_LEFT_ZED; /* =1*/

4783	       return SUBMVREF_NORMAL; /* =0 */

4785	   }

4787	   /* Constant probabilities and decoding procedure. */

4789	   const Prob sub_mv_ref_prob [5][num_sub_mv_ref - 1] = {
4790	       { 147,136,18 },
4791	       { 106,145,1  },
4792	       { 179,121,1  },
4793	       { 223,1  ,34 },
4794	       { 208,1  ,1  } };

4796	       sub_ref = (sub_mv_ref) treed_read( d, sub_mv_ref_tree,
4797	         sub_mv_ref_prob[context]);

4799	   ---- End code block ----------------------------------------

4801	   The first two sub-prediction modes simply copy the already-coded
4802	   motion vectors used by the blocks above and to-the-left of the
4803	   subblock at the upper left corner of the current subset (i.e.,
4804	   collection of subblocks being predicted).  These prediction blocks
4805	   need not lie in the current macroblock and, if the current subset
4806	   lies at the top or left edges of the frame, need not lie in the
4807	   frame.  In this latter case, their motion vectors are taken to be
4808	   zero, as are subblock motion vectors within an intra-predicted
4809	   macroblock.  Also, to ensure the correctness of prediction within
4810	   this macroblock, all subblocks lying in an already-decoded subset of
4811	   the current macroblock must have their motion vectors set.

4813	   ZERO4x4 uses a zero motion vector and predicts the current subset
4814	   using the corresponding subset from the prediction frame.

4816	   NEW4x4 is exactly like NEWMV except applied only to the current
4817	   subset.  It is followed by a 2-dimensional motion vector offset
4818	   (described in the next chapter) that is added to the best vector
4819	   returned by the earlier call to find_near_mvs to form the motion
4820	   vector in effect for the subset.

4822	   Parsing of both inter-prediction modes and motion vectors (described
4823	   next) can be found in the reference decoder file decodemv.c.

4825	17.  Motion Vector Decoding

4827	   As discussed above, motion vectors appear in two places in the VP8
4828	   datastream: applied to whole macroblocks in NEWMV mode and applied to
4829	   subsets of macroblocks in NEW4x4 mode.  The format of the vectors is
4830	   identical in both cases.

4832	   Each vector has two pieces: A vertical component (row) followed by a
4833	   horizontal component (column).  The row and column use separate
4834	   coding probabilities but are otherwise represented identically.

4836	17.1.  Coding of Each Component

4838	   Each component is a signed integer "V" representing a vertical or
4839	   horizontal luma displacement of "V" quarter-pixels (and a chroma
4840	   displacement of "V" eighth-pixels).  The absolute value of "V", if
4841	   non-zero, is followed by a boolean sign.  "V" may take any value
4842	   between -255 and +255, inclusive.

4844	   The absolute value "A" is coded in one of two different ways
4845	   according to its size.  For 0 <= "A" <= 7, "A" is tree-coded, and for
4846	   8 <= "A" <= 255, the bits in the binary expansion of "A" are coded
4847	   using independent boolean probabilities.  The coding of "A" begins
4848	   with a bool specifying which range is in effect.

4850	   Decoding a motion vector component then requires a 17-position
4851	   probability table, whose offsets, along with the procedure used to
4852	   decode components, are as follows:

4854	   ---- Begin code block --------------------------------------

4856	   typedef enum
4857	   {
4858	       mvpis_short,         /* short (<= 7) vs long (>= 8) */
4859	       MVPsign,             /* sign for non-zero */
4860	       MVPshort,            /* 8 short values = 7-position tree */

4862	       MVPbits = MVPshort + 7,      /* 8 long value bits
4863	                                       w/independent probs */

4865	       MVPcount = MVPbits + 8       /* 17 probabilities in total */
4866	   }
4867	   MVPindices;

4869	   typedef Prob MV_CONTEXT [MVPcount];    /* Decoding spec for
4870	                                             a single component */

4872	   /* Tree used for small absolute values (has expected
4873	      correspondence). */

4875	   const tree_index small_mvtree [2 * (8 - 1)] =
4876	   {
4877	    2, 8,          /* "0" subtree, "1" subtree */
4878	     4, 6,         /* "00" subtree", "01" subtree */
4879	      -0, -1,      /* 0 = "000", 1 = "001" */
4880	      -2, -3,      /* 2 = "010", 3 = "011" */
4881	     10, 12,       /* "10" subtree, "11" subtree */
4882	      -4, -5,      /* 4 = "100", 5 = "101" */
4883	      -6, -7       /* 6 = "110", 7 = "111" */
4884	   };

4886	   /* Read MV component at current decoder position, using
4887	      supplied probs. */

4889	   int read_mvcomponent( bool_decoder *d, const MV_CONTEXT *mvc)
4890	   {
4891	       const Prob * const p = (const Prob *) mvc;
4892	       int A = 0;

4894	       if( read_bool( d, p [mvpis_short]))      /* 8 <= A <= 255 */
4895	       {
4896	           /* Read bits 0, 1, 2 */

4898	           int i = 0;
4899	           do { A += read_bool( d, p [MVPbits + i]) << i;}
4900	             while( ++i < 3);

4902	           /* Read bits 7, 6, 5, 4 */

4904	           i = 7;
4905	           do { A += read_bool( d, p [MVPbits + i]) << i;}
4906	             while( --i > 3);

4908	           /* We know that A >= 8 because it is coded long,
4909	              so if A <= 15, bit 3 is one and is not
4910	              explicitly coded. */

4912	           if( !(A & 0xfff0)  ||  read_bool( d, p [MVPbits + 3]))
4913	               A += 8;
4914	       }
4915	       else                                    /* 0 <= A <= 7 */
4916	           A = treed_read( d, small_mvtree, p + MVPshort);

4918	       return A && read_bool( r, p [MVPsign]) ?  -A : A;
4919	   }
4920	   ---- End code block ----------------------------------------

4922	17.2.  Probability Updates

4924	   The decoder should maintain an array of two MV_CONTEXTs for decoding
4925	   row and column components, respectively.  These MV_CONTEXTs should be
4926	   set to their defaults every key frame.  Each individual probability
4927	   may be updated every interframe (by field "J" of the frame header)
4928	   using a constant table of update probabilities.  Each optional update
4929	   is of the form B?  P(7), that is, a bool followed by a 7-bit
4930	   probability specification if true.

4932	   As with other dynamic probabilities used by VP8, the updates remain
4933	   in effect until the next key frame or until replaced via another
4934	   update.

4936	   In detail, the probabilities should then be managed as follows.

4938	   ---- Begin code block --------------------------------------

4940	   /* Never-changing table of update probabilities for each
4941	      individual probability used in decoding motion vectors. */

4943	   const MV_CONTEXT vp8_mv_update_probs[2] =
4944	   {
4945	     {
4946	       237,
4947	       246,
4948	       253, 253, 254, 254, 254, 254, 254,
4949	       254, 254, 254, 254, 254, 250, 250, 252, 254, 254
4950	     },
4951	     {
4952	       231,
4953	       243,
4954	       245, 253, 254, 254, 254, 254, 254,
4955	       254, 254, 254, 254, 254, 251, 251, 254, 254, 254
4956	     }
4957	   };

4959	   /* Default MV decoding probabilities. */

4961	   const MV_CONTEXT default_mv_context[2] =
4962	   {
4963	     {                       // row
4964	       162,                    // is short
4965	       128,                    // sign
4966	         225, 146, 172, 147, 214,  39, 156,      // short tree
4967	       128, 129, 132,  75, 145, 178, 206, 239, 254, 254 // long bits

4969	     },

4971	     {                       // same for column
4972	       164,                    // is short
4973	       128,
4974	       204, 170, 119, 235, 140, 230, 228,
4975	       128, 130, 130,  74, 148, 180, 203, 236, 254, 254 // long bits

4977	     }
4978	   };

4980	   /* Current MV decoding probabilities, set to above defaults
4981	      every key frame. */

4983	   MV_CONTEXT mvc [2];     /* always row, then column */

4985	   /* Procedure for decoding a complete motion vector. */

4987	   typedef struct { int16 row, col;}  MV;  /* as in previous chapter */

4989	   MV read_mv( bool_decoder *d)
4990	   {
4991	       MV v;
4992	       v.row = (int16) read_mvcomponent( d, mvc);
4993	       v.col = (int16) read_mvcomponent( d, mvc + 1);
4994	       return v;
4995	   }

4997	   /* Procedure for updating MV decoding probabilities, called
4998	      every interframe with "d" at the appropriate position in
4999	      the frame header. */

5001	   void update_mvcontexts( bool_decoder *d)
5002	   {
5003	       int i = 0;
5004	       do {                      /* component = row, then column */
5005	           const Prob *up = mv_update_probs[i];    /* update probs
5006	                                                      for component */
5007	           Prob *p = mvc[i];                  /* start decode tbl "" */
5008	           Prob * const pstop = p + MVPcount; /* end decode tbl "" */
5009	           do {
5010	               if( read_bool( d, *up++))     /* update this position */
5011	               {
5012	                   const Prob x = read_literal( d, 7);

5014	                   *p = x? x<<1 : 1;
5015	               }
5016	           } while( ++p < pstop);              /* next position */

5018	       } while( ++i < 2);                      /* next component */
5019	   }

5021	   ---- End code block ----------------------------------------

5023	   This completes the description of the motion-vector decoding
5024	   procedure and, with it, the procedure for decoding interframe
5025	   macroblock prediction records.

5027	18.  Interframe Prediction

5029	   Given an inter-prediction specification for the current macroblock,
5030	   that is, a reference frame together with a motion vector for each of
5031	   the sixteen Y subblocks, we describe the calculation of the
5032	   prediction buffer for the macroblock.  Frame reconstruction is then
5033	   completed via the previously-described processes of residue summation
5034	   (Chapter 14 (Section 14)) and loop filtering (Chapter 15
5035	   (Section 15)).

5037	   The management of inter-predicted subblocks may be found in the
5038	   reference decoder file reconinter.c; sub-pixel interpolation is
5039	   implemented in filter_c.c.

5041	18.1.  Bounds on and Adjustment of Motion Vectors

5043	   It is possible within the VP8 format for a block or macroblock to
5044	   have an arbitrarily large motion vectors, due to the fact that each
5045	   motion vector is differentially encoded without any clamp from a
5046	   neighboring block or macroblock.

5048	   Because the motion vectors applied to the chroma subblocks have 1/8
5049	   pixel resolution, the synthetic pixel calculation, outlined in
5050	   Chapter 5 and detailed below, uses this resolution for the luma
5051	   subblocks as well.  In accordance, the stored luma motion vectors are
5052	   all doubled, each component of each luma vector becoming an even
5053	   integer in the range -510 to +510, inclusive.

5055	   The vector applied to each chroma subblock is calculated by averaging
5056	   the vectors for the 4 luma subblocks occupying the same visible area
5057	   as the chroma subblock in the usual correspondence, that is, the
5058	   vector for U and V block 0 is the average of the vectors for the Y
5059	   subblocks { 0, 1, 4, 5}, chroma block 1 corresponds to Y blocks { 2,
5060	   3, 6, 7}, chroma block 2 to Y blocks { 8, 9, 12, 13}, and chroma
5061	   block 3 to Y blocks { 10, 11, 14, 15}.

5063	   In detail, each of the two components of the vectors for each of the
5064	   chroma subblocks is calculated from the corresponding luma vector
5065	   components as follows:

5067	   ---- Begin code block --------------------------------------

5069	   int avg( int c1, int c2, int c3, int c4)
5070	   {
5071	       int s = c1 + c2 + c3 + c4;

5073	       /* The shift divides by 8 (not 4) because chroma pixels
5074	          have half the diameter of luma pixels.  The handling
5075	          of negative motion vector components is slightly
5076	          cumbersome because, strictly speaking, right shifts
5077	          of negative numbers are not well-defined in C. */

5079	       return s >= 0 ?  (s + 4) >> 3 : -( (-s + 4) >> 3);
5080	   }

5082	   ---- End code block ----------------------------------------

5084	   Furthermore, if the version number in the frame tag specifies only
5085	   full-pel chroma motion vectors, then the fractional parts of both
5086	   components of the vector are truncated to zero, as illustrated in the
5087	   following pseudo-code (assuming 3 bits of fraction for both luma and
5088	   chroma vectors):

5090	   ---- Begin code block --------------------------------------

5092	       x = x & (~7);
5093	       y = y & (~7);

5095	   ---- End code block ----------------------------------------

5097	   Earlier in this document we described the vp8_clamp_mv() function to
5098	   limit "nearest" and "near" motion vector predictors inside specified
5099	   margins within the frame boundaries.  Additional clamping is
5100	   performed for NEW_MV macroblocks, for which the final motion vector
5101	   is clamped again after combining the "best" predictor and the
5102	   differential vector decoded from the stream.

5104	   However, the secondary clamping is not performed for SPLIT_MV
5105	   macroblocks, meaning that any subblock's motion vector within the
5106	   SPLIT_MV macroblock may point outside the clamping zone.  These non-
5107	   clamped vectors are also used when determining the decoding tree
5108	   context for subsequent subblocks' modes in the vp8_mvCont() function.

5110	18.2.  Prediction Subblocks

5112	   The prediction calculation for each subblock is then as follows.
5113	   Temporarily disregarding the fractional part of the motion vector
5114	   (that is, rounding "up" or "left" by right-shifting each component 3
5115	   bits with sign propagation) and adding the origin (upper left
5116	   position) of the (16x16 luma or 8x8 chroma) current macroblock gives
5117	   us an origin in the Y, U, or V plane of the predictor frame (either
5118	   the golden frame or previous frame).

5120	   Considering that origin to be the upper left corner of a (luma or
5121	   chroma) macroblock, we need to specify the relative positions of the
5122	   pixels associated to that subblock, that is, any pixels that might be
5123	   involved in the sub-pixel interpolation processes for the subblock.

5125	18.3.  Sub-pixel Interpolation

5127	   The sub-pixel interpolation is effected via two one-dimensional
5128	   convolutions.  These convolutions may be thought of as operating on a
5129	   two-dimensional array of pixels whose origin is the subblock origin,
5130	   that is the origin of the prediction macroblock described above plus
5131	   the offset to the subblock.  Because motion vectors are arbitrary, so
5132	   are these "prediction subblock origins".

5134	   The integer part of the motion vector is subsumed in the origin of
5135	   the prediction subblock, the 16 (synthetic) pixels we need to
5136	   construct are given by 16 offsets from the origin.  The integer part
5137	   of each of these offsets is the offset of the corresponding pixel
5138	   from the subblock origin (using the vertical stride).  To these
5139	   integer parts is added a constant fractional part, which is simply
5140	   the difference between the actual motion vector and its integer
5141	   truncation used to calculate the origins of the prediction macroblock
5142	   and subblock.  Each component of this fractional part is an integer
5143	   between 0 and 7, representing a forward displacement in eighths of a
5144	   pixel.

5146	   It is these fractional displacements that determine the filtering
5147	   process.  If they both happen to be zero (that is, we had a "whole
5148	   pixel" motion vector), the prediction subblock is simply copied into
5149	   the corresponding piece of the current macroblock's prediction
5150	   buffer.  As discussed in Chapter 14, the layout of the macroblock's
5151	   prediction buffer can depend on the specifics of the reconstruction
5152	   implementation chosen.  Of course, the vertical displacement between
5153	   lines of the prediction subblock is given by the stride, as are all
5154	   vertical displacements used here.

5156	   Otherwise, at least one of the fractional displacements is non-zero.
5157	   We then synthesize the missing pixels via a horizontal, followed by a
5158	   vertical, one-dimensional interpolation.

5160	   The two interpolations are essentially identical.  Each uses an (at
5161	   most) six-tap filter (the choice of which of course depends on the
5162	   one-dimensional offset).  Thus, every calculated pixel references at
5163	   most three pixels before (above or to-the-left of) it and at most
5164	   three pixels after (below or to-the-right of) it.  The horizontal
5165	   interpolation must calculate two extra rows above and three extra
5166	   rows below the 4x4 block, to provide enough samples for the vertical
5167	   interpolation to proceed.

5169	   Depending on the reconstruction filter type given in the field
5170	   Version Number in the frame tag, either a bicubic or a bilinear tap
5171	   set is used.

5173	   The exact implementation of subsampling is as follows.

5175	   ---- Begin code block --------------------------------------

5177	   /* Filter taps taken to 7-bit precision.
5178	      Because DC is always passed, taps always sum to 128. */

5180	   const int BilinearFilters[8][6] =
5181	   {
5182	       { 0, 0, 128,   0, 0, 0 },
5183	       { 0, 0, 112,  16, 0, 0 },
5184	       { 0, 0,  96,  32, 0, 0 },
5185	       { 0, 0,  80,  48, 0, 0 },
5186	       { 0, 0,  64,  64, 0, 0 },
5187	       { 0, 0,  48,  80, 0, 0 },
5188	       { 0, 0,  32,  96, 0, 0 },
5189	       { 0, 0,  16, 112, 0, 0 }
5190	   };

5192	   const int filters [8] [6] = {        /* indexed by displacement */
5193	       { 0,  0,  128,    0,   0,  0 },  /* degenerate whole-pixel */
5194	       { 0, -6,  123,   12,  -1,  0 },  /* 1/8 */
5195	       { 2, -11, 108,   36,  -8,  1 },  /* 1/4 */
5196	       { 0, -9,   93,   50,  -6,  0 },  /* 3/8 */
5197	       { 3, -16,  77,   77, -16,  3 },  /* 1/2 is symmetric */
5198	       { 0, -6,   50,   93,  -9,  0 },  /* 5/8 = reverse of 3/8 */
5199	       { 1, -8,   36,  108, -11,  2 },  /* 3/4 = reverse of 1/4 */
5200	       { 0, -1,   12,  123,  -6,  0 }   /* 7/8 = reverse of 1/8 */
5201	   };

5203	   /* One-dimensional synthesis of a single sample.
5204	      Filter is determined by fractional displacement */

5206	   Pixel interp(
5207	       const int fil[6],   /* filter to apply */
5208	       const Pixel *p,     /* origin (rounded "before") in
5209	                              prediction area */

5211	       const int s         /* size of one forward step "" */
5212	   ) {
5213	       int32 a = 0;
5214	       int i = 0;
5215	       p -= s + s;         /* move back two positions */

5217	       do {
5218	           a += *p * fil[i];
5219	           p += s;
5220	       }  while( ++i < 6);

5222	       return clamp255( (a + 64) >> 7);    /* round to nearest
5223	                                              8-bit value */
5224	   }

5226	   /* First do horizontal interpolation, producing intermediate
5227	      buffer. */

5229	   void Hinterp(
5230	       Pixel temp[9][4],   /* 9 rows of 4 (intermediate)
5231	                              destination values */
5232	       const Pixel *p,     /* subblock origin in prediction
5233	                              frame */
5234	       int s,              /* vertical stride to be used in
5235	                              prediction frame */
5236	       uint hfrac,         /* 0 <= horizontal displacement <= 7 */
5237	       uint bicubic        /* 1=bicubic filter, 0=bilinear */
5238	   ) {
5239	       const int * const fil = bicubic ? filters [hfrac] :
5240	         BilinearFilters[hfrac];

5242	       int r = 0;  do              /* for each row */
5243	       {
5244	           int c = 0;  do          /* for each destination sample */
5245	           {
5246	               /* Pixel separation = one horizontal step = 1 */

5248	               temp[r][c] = interp( fil, p + c, 1);
5249	           }
5250	           while( ++c < 4);
5251	       }
5252	       while( p += s, ++r < 9);    /* advance p to next row */
5253	   }

5255	   /* Finish with vertical interpolation, producing final results.
5256	      Input array "temp" is of course that computed above. */

5258	   void Vinterp(
5259	       Pixel final[4][4],  /* 4 rows of 4 (final) destination values */
5260	       const Pixel temp[9][4],
5261	       uint vfrac,         /* 0 <= vertical displacement <= 7 */
5262	       uint bicubic        /* 1=bicubic filter, 0=bilinear */
5263	   ) {
5264	       const int * const fil = bicubic ? filters [vfrac] :
5265	         BilinearFilters[vfrac];

5267	       int r = 0;  do              /* for each row */
5268	       {
5269	           int c = 0;  do          /* for each destination sample */
5270	           {
5271	               /* Pixel separation = one vertical step = width
5272	                  of array = 4 */

5274	               final[r][c] = interp( fil, temp[r] + c, 4);
5275	           }
5276	           while( ++c < 4);
5277	       }
5278	       while( ++r < 4);
5279	   }

5281	   ---- End code block ----------------------------------------

5283	18.4.  Filter Properties

5285	   We discuss briefly the rationale behind the choice of filters.  Our
5286	   approach is necessarily cursory; a genuinely accurate discussion
5287	   would require a couple of books.  Readers unfamiliar with signal
5288	   processing may or may not wish to skip this.

5290	   All digital signals are of course sampled in some fashion.  The case
5291	   where the inter-sample spacing (say in time for audio samples, or
5292	   space for pixels) is uniform, that is, the same at all positions, is
5293	   particularly common and amenable to analysis.  Many aspects of the
5294	   treatment of such signals are best-understood in the frequency domain
5295	   via Fourier Analysis, particularly those aspects of the signal that
5296	   are not changed by shifts in position, especially when those
5297	   positional shifts are not given by a whole number of samples.

5299	   Non-integral translates of a sampled signal are a textbook example of
5300	   the foregoing.  In our case of non-integral motion vectors, we wish
5301	   to say what the underlying image "really is" at these pixels we don't
5302	   have values for but feel that it makes sense to talk about.  The
5303	   correctness of this feeling is predicated on the underlying signal
5304	   being band-limited, that is, not containing any energy in spatial
5305	   frequencies that cannot be faithfully rendered at the pixel
5306	   resolution at our disposal.  In one dimension, this range of "OK"
5307	   frequencies is called the Nyquist band; in our two-dimensional case
5308	   of integer-grid samples, this range might be termed a Nyquist
5309	   rectangle.  The finer the grid, the more we know about the image, and
5310	   the wider the Nyquist rectangle.

5312	   It turns out that, for such band-limited signals, there is indeed an
5313	   exact mathematical formula to produce the correct sample value at an
5314	   arbitrary point.  Unfortunately, this calculation requires the
5315	   consideration of every single sample in the image, as well as needing
5316	   to operate at infinite precision.  Also, strictly speaking, all band-
5317	   limited signals have infinite spatial (or temporal) extent, so
5318	   everything we are discussing is really some sort of approximation.

5320	   It is true that the theoretically correct subsampling procedure, as
5321	   well as any approximation thereof, is always given by a translation-
5322	   invariant weighted sum (or filter) similar to that used by VP8.  It
5323	   is also true that the reconstruction error made by such a filter can
5324	   be simply represented as a multiplier in the frequency domain, that
5325	   is, such filters simply multiply the Fourier transform of any signal
5326	   to which they are applied by a fixed function associated to the
5327	   filter.  This fixed function is usually called the frequency response
5328	   (or transfer function); the ideal subsampling filter has a frequency
5329	   response equal to one in the Nyquist rectangle and zero everywhere
5330	   else.

5332	   Another basic fact about approximations to "truly correct"
5333	   subsampling is that, the wider the subrectangle (within the Nyquist
5334	   rectangle) of spatial frequencies one wishes to "pass" (that is,
5335	   correctly render) or, put more accurately, the closer one wishes to
5336	   approximate the ideal transfer function, the more samples of the
5337	   original signal must be considered by the subsampling, and the wider
5338	   the calculation precision necessitated.

5340	   The filters chosen by VP8 were chosen, within the constraints of 4 or
5341	   6 taps and 7-bit precision, to do the best possible job of handling
5342	   the low spatial frequencies near the zeroth DC frequency along with
5343	   introducing no resonances (places where the absolute value of the
5344	   frequency response exceeds one).

5346	   The justification for the foregoing has two parts.  First, resonances
5347	   can produce extremely objectionable visible artifacts when, as often
5348	   happens in actual compressed video streams, filters are applied
5349	   repeatedly.  Second, the vast majority of energy in real-world images
5350	   lies near DC and not at the high-end; also, roughly speaking, human
5351	   perception tends to be more sensitive at these lower spatial
5352	   frequencies.

5354	   To get slightly more specific, the filters chosen by VP8 are the best
5355	   resonance-free 4- or 6-tap filters possible, where "best" describes
5356	   the frequency response near the origin: the response at 0 is required
5357	   to be 1 and the graph of the response at 0 is as flat as possible.

5359	   To provide an intuitively more obvious point of reference, the "best"
5360	   2-tap filter is given by simple linear interpolation between the
5361	   surrounding actual pixels.

5363	   Finally, it should be noted that, because of the way motion vectors
5364	   are calculated, the (shorter) 4-tap filters (used for odd fractional
5365	   displacements) are applied in the chroma plane only.  Human color
5366	   perception is notoriously poor, especially where higher spatial
5367	   frequencies are involved.  The shorter filters are easier to
5368	   understand mathematically, and the difference between them and a
5369	   theoretically slightly better 6-tap filter is negligible where chroma
5370	   is concerned.

5372	19.  Annex A: Bitstream Syntax

5374	   This annex presents the bitstream syntax in a tabular form.  All the
5375	   information elements have been introduced and explained in the
5376	   previous chapters but are collected here for a quick reference.  Each
5377	   syntax element is shortly described after the tabular representation
5378	   along with a reference to the corresponding paragraph in the main
5379	   document.  The meaning of each syntax element value is not repeated
5380	   here.

5382	   The top-level hierarchy of the bitstream is introduced in Chapter 4
5383	   (Section 4).

5385	   Definition of syntax element coding types can be found in Chapter 8
5386	   (Section 8).  The types used in the representation in this annex are:

5388	   o  f(n), n-bit value from stream (n successive bits, not boolean
5389	      encoded)

5391	   o  L(n), n-bit number encoded as n booleans (with equal probability
5392	      of being 0 or 1)

5394	   o  B(p), bool with probability p of being 0

5396	   o  T, tree-encoded value

5398	19.1.  Uncompressed Data Chunk

5400	                     +----------------------+-------+
5401	                     | Frame Tag            | Type  |
5402	                     +----------------------+-------+
5403	                     | frame_tag            | f(24) |
5404	                     |                      |       |
5405	                     | if (key_frame) {     |       |
5406	                     |                      |       |
5407	                     | start_code           | f(24) |
5408	                     |                      |       |
5409	                     | horizontal_size_code | f(16) |
5410	                     |                      |       |
5411	                     | vertical_size_code   | f(16) |
5412	                     |                      |       |
5413	                     | }                    |       |
5414	                     +----------------------+-------+

5416	   The 3-byte frame tag can be parsed as follows:

5418	   ---- Begin code block --------------------------------------

5420	   unsigned char *c = pbi->Source;
5421	   unsigned int tmp;

5423	   tmp = (c[2] << 16) | (c[1] << 8) | c[0];

5425	   key_frame = tmp & 0x1;
5426	   version = (tmp >> 1) & 0x7;
5427	   show_frame = (tmp >> 4) & 0x1;
5428	   first_part_size = (tmp >> 5) & 0x7FFFF;

5430	   ---- End code block ----------------------------------------

5432	   Where:

5434	   o  key_frame indicates if the current frame is a key frame or not

5436	   o  version determines the bitstream version

5438	   o  show_frame indicates if the current frame is meant to be displayed
5439	      or not

5441	   o  first_part_size determines the size of the first partition
5442	      (control partition)

5444	   The start_code is a constant 3-byte pattern having value 0x9d012a.
5445	   The latter part of the uncompressed chunk (after the start_code) can
5446	   be parsed as follows:

5448	   ---- Begin code block --------------------------------------

5450	   unsigned char *c = pbi->Source + 6;
5451	   unsigned int tmp;

5453	   tmp = (c[1] << 8) | c[0];

5455	   width = tmp & 0x3FFF;
5456	   horizontal_scale = tmp >> 14;

5458	   tmp = (c[3] << 8) | c[2];

5460	   height = tmp & 0x3FFF;
5461	   vertical_scale = tmp >> 14;

5463	   ---- End code block ----------------------------------------

5465	19.2.  Frame Header

5467	                 +-------------------------------+------+
5468	                 | Frame Header                  | Type |
5469	                 +-------------------------------+------+
5470	                 | if (key_frame) {              |      |
5471	                 |                               |      |
5472	                 | color_space                   | L(1) |
5473	                 |                               |      |
5474	                 | clamping_type                 | L(1) |
5475	                 |                               |      |
5476	                 | }                             |      |
5477	                 |                               |      |
5478	                 | segmentation_enabled          | L(1) |
5479	                 |                               |      |
5480	                 | if (segmentation_enabled) {   |      |
5481	                 |                               |      |
5482	                 | update_segmentation()         |      |
5483	                 |                               |      |
5484	                 | }                             |      |
5485	                 |                               |      |
5486	                 | filter_type                   | L(1) |
5487	                 |                               |      |
5488	                 | loop_filter_level             | L(6) |
5489	                 |                               |      |
5490	                 | sharpness_level               | L(3) |
5491	                 |                               |      |
5492	                 | mb_lf_adjustments()           |      |
5493	                 |                               |      |
5494	                 | log2_nbr_of_dct_partitions    | L(2) |
5495	                 |                               |      |
5496	                 | quant_indices()               |      |
5497	                 |                               |      |
5498	                 | if (key_frame) {              |      |
5499	                 |                               |      |
5500	                 | refresh_entropy_probs         | L(1) |
5501	                 |                               |      |
5502	                 | } else {                      |      |
5503	                 |                               |      |
5504	                 | refresh_golden_frame          | L(1) |
5505	                 |                               |      |
5506	                 | refresh_alternate_frame       | L(1) |
5507	                 |                               |      |
5508	                 | if (!refresh_golden_frame)    |      |
5509	                 |                               |      |
5510	                 | copy_buffer_to_golden         | L(2) |
5511	                 |                               |      |
5512	                 | if (!refresh_alternate_frame) |      |
5513	                 | copy_buffer_to_alternate      | L(2) |
5514	                 |                               |      |
5515	                 | sign_bias_golden              | L(1) |
5516	                 |                               |      |
5517	                 | sign_bias_alternate           | L(1) |
5518	                 |                               |      |
5519	                 | refresh_entropy_probs         | L(1) |
5520	                 |                               |      |
5521	                 | refresh_last                  | L(1) |
5522	                 |                               |      |
5523	                 | }                             |      |
5524	                 |                               |      |
5525	                 | token_prob_update()           |      |
5526	                 |                               |      |
5527	                 | mb_no_coeff_skip              | L(1) |
5528	                 +-------------------------------+------+

5530	              +--------------------------------------+------+
5531	              | Frame Header                         | Type |
5532	              +--------------------------------------+------+
5533	              | prob_skip_false                      | L(8) |
5534	              |                                      |      |
5535	              | if (!key_frame) {                    |      |
5536	              |                                      |      |
5537	              | prob_intra                           | L(8) |
5538	              |                                      |      |
5539	              | prob_last                            | L(8) |
5540	              |                                      |      |
5541	              | prob_golden                          | L(8) |
5542	              |                                      |      |
5543	              | intra_16x16_prob_update_flag         | L(1) |
5544	              |                                      |      |
5545	              | if (intra_16x16_prob_update_flag) {  |      |
5546	              |                                      |      |
5547	              | for (i = 0; i < 4; i++)              |      |
5548	              |                                      |      |
5549	              | intra_16x16_prob                     | L(8) |
5550	              |                                      |      |
5551	              | }                                    |      |
5552	              |                                      |      |
5553	              | intra_chroma prob_update_flag        | L(1) |
5554	              |                                      |      |
5555	              | if (intra_chroma_prob_update_flag) { |      |
5556	              |                                      |      |
5557	              | for (i = 0; i < 3; i++)              |      |
5558	              |                                      |      |
5559	              | intra_chroma_prob                    | L(8) |
5560	              |                                      |      |
5561	              | }                                    |      |
5562	              |                                      |      |
5563	              | mv_prob_update()                     |      |
5564	              |                                      |      |
5565	              | }                                    |      |
5566	              +--------------------------------------+------+

5568	   o  color_space defines the YUV color space of the sequence (9.2
5569	      (Section 9.2))

5571	   o  clamping_type specifies if the decoder is required to clamp the
5572	      reconstructed pixel values (9.2 (Section 9.2))

5574	   o  segmentation_enabled enables the segmentation feature for the
5575	      current frame (9.3 (Section 9.3))

5577	   o  filter_type determines whether the normal or the simple loop
5578	      filter is used (9.4 (Section 9.4), 15 (Section 15))

5580	   o  loop_filter_level controls the deblocking filter (9.4
5581	      (Section 9.4), 15 (Section 15))

5583	   o  sharpness_level controls the deblocking filter (9.4 (Section 9.4),
5584	      15 (Section 15))

5586	   o  log2_nbr_of_dct_partitions determines the number of separate
5587	      partitions containing the DCT coefficients of the macroblocks (9.5
5588	      (Section 9.5))

5590	   o  refresh_entropy_probs determines whether updated token
5591	      probabilities are used only for this frame or until further update

5593	   o  refresh_golden_frame determines if the current decoded frame
5594	      refreshes the golden frame (9.7 (Section 9.7))

5596	   o  refresh_alternate_frame determines if the current decoded frame
5597	      refreshes the alternate reference frame (9.7 (Section 9.7))

5599	   o  copy_buffer_to_golden determines if the golden reference is
5600	      replaced by another reference (9.7 (Section 9.7))

5602	   o  copy_buffer_to_alternate determines if the alternate reference is
5603	      replaced by another reference (9.7 (Section 9.7))

5605	   o  sign_bias_golden controls the sign of motion vectors when the
5606	      golden frame is referenced (9.7 (Section 9.7))

5608	   o  sign_bias_alternate controls the sign of motion vectors when the
5609	      alternate frame is referenced (9.7 (Section 9.7))

5611	   o  refresh_last determines if the current decoded frame refreshes the
5612	      last frame reference buffer (9.8 (Section 9.8))

5614	   o  mb_no_coeff_skip enables or disables the skipping of macroblocks
5615	      containing no non-zero coefficients (9.10 (Section 9.10))

5617	   o  prob_skip_false the probability that the macroblock is not skipped
5618	      (flag indicating skipped macroblock is false) (9.10
5619	      (Section 9.10))

5621	   o  prob_intra the probability of an intra macroblock (9.10
5622	      (Section 9.10))

5624	   o  prob_last the probability that the last reference frame is used
5625	      for inter prediction (9.10 (Section 9.10))

5627	   o  prob_golden the probability that the golden reference frame is
5628	      used for inter prediction (9.10 (Section 9.10))

5630	   o  intra_16x16_prob_update_flag indicates if the branch probabilies
5631	      used in the decoding of luma intra prediction mode are updated
5632	      (9.10 (Section 9.10))

5634	   o  intra_16x16_prob the branch probabilities of the luma intra
5635	      prediction mode decoding tree

5637	   o  intra_chroma_prob_update_flag indicates if the branch probabilies
5638	      used in the decoding of chroma intra prediction mode are updated
5639	      (9.10 (Section 9.10))

5641	   o  intra_chroma_prob the branch probabilities of the chroma intra
5642	      prediction mode decoding tree

5644	               +------------------------------------+------+
5645	               | update_segmentation()              | Type |
5646	               +------------------------------------+------+
5647	               | update_mb_segmentation_map         | L(1) |
5648	               |                                    |      |
5649	               | update_segment_feature_data        | L(1) |
5650	               |                                    |      |
5651	               | if (update_segment_feature_data) { |      |
5652	               |                                    |      |
5653	               | segment_feature_mode               | L(1) |
5654	               |                                    |      |
5655	               | for (i = 0; i < 4; i++) {          |      |
5656	               |                                    |      |
5657	               | quantizer_update                   | L(1) |
5658	               |                                    |      |
5659	               | if (quantizer_update) {            |      |
5660	               |                                    |      |
5661	               | quantizer_update_value             | L(7) |
5662	               |                                    |      |
5663	               | quantizer_update_sign              | L(1) |
5664	               |                                    |      |
5665	               | }                                  |      |
5666	               |                                    |      |
5667	               | }                                  |      |
5668	               |                                    |      |
5669	               | for (i = 0; i < 4; i++) {          |      |
5670	               |                                    |      |
5671	               | loop_filter_update                 | L(1) |
5672	               | if (loop_filter_update) {          |      |
5673	               |                                    |      |
5674	               | lf_update_value                    | L(6) |
5675	               |                                    |      |
5676	               | lf_update_sign                     | L(1) |
5677	               |                                    |      |
5678	               | }                                  |      |
5679	               |                                    |      |
5680	               | }                                  |      |
5681	               |                                    |      |
5682	               | }                                  |      |
5683	               |                                    |      |
5684	               | if (update_mb_segmentation_map) {  |      |
5685	               |                                    |      |
5686	               | for (i = 0; i < 3; i++) {          |      |
5687	               |                                    |      |
5688	               | segment_prob_update                | L(1) |
5689	               |                                    |      |
5690	               | if (segment_prob_update) {         |      |
5691	               |                                    |      |
5692	               | segment_prob                       | L(8) |
5693	               |                                    |      |
5694	               | }                                  |      |
5695	               |                                    |      |
5696	               | }                                  |      |
5697	               |                                    |      |
5698	               | }                                  |      |
5699	               +------------------------------------+------+

5701	   o  update_mb_segmentation_map determines if the MB segmentation map
5702	      is updated in the current frame (9.3 (Section 9.3))

5704	   o  update_segment_feature_data indicates if the segment feature data
5705	      is updated in the current frame (9.3 (Section 9.3))

5707	   o  segment_feature_mode indicates the feature data update mode, 0 for
5708	      delta and 1 for the absolute value (9.3 (Section 9.3))

5710	   o  quantizer_update indicates if the quantizer value is updated for
5711	      the i^(th) segment (9.3 (Section 9.3))

5713	   o  quantizer_update_value indicates the update value for the segment
5714	      quantizer (9.3 (Section 9.3))

5716	   o  quantizer_update_sign indicates the update sign for the segment
5717	      quantizer (9.3 (Section 9.3))

5719	   o  loop_filter_update indicates if the loop filter level value is
5720	      updated for the i^(th) segment (9.3 (Section 9.3))

5722	   o  lf_update_value indicates the update value for the loop filter
5723	      level (9.3 (Section 9.3))

5725	   o  lf_update_sign indicates the update sign for the loop filter level
5726	      (9.3 (Section 9.3))

5728	   o  segment_prob_update indicates if the branch probabilities used to
5729	      decode the segment_id in the MB header are decoded from the stream
5730	      or use the default value of 255 (9.3 (Section 9.3))

5732	   o  segment_prob the branch probabilities of the segment_id decoding
5733	      tree (9.3 (Section 9.3))
5734	               +------------------------------------+------+
5735	               | mb_lf_adjustments()                | Type |
5736	               +------------------------------------+------+
5737	               | loop_filter_adj_enable             | L(1) |
5738	               |                                    |      |
5739	               | if (loop_filter_adj_enable) {      |      |
5740	               |                                    |      |
5741	               | mode_ref_lf_delta_update           | L(1) |
5742	               |                                    |      |
5743	               | if (mode_ref_lf_delta_update) {    |      |
5744	               |                                    |      |
5745	               | for (i = 0; i < 4; i++) {          |      |
5746	               |                                    |      |
5747	               | ref_frame_delta_update_flag        | L(1) |
5748	               |                                    |      |
5749	               | if (ref_frame_delta_update_flag) { |      |
5750	               |                                    |      |
5751	               | delta_magnitude                    | L(6) |
5752	               |                                    |      |
5753	               | delta_sign                         | L(1) |
5754	               |                                    |      |
5755	               | }                                  |      |
5756	               |                                    |      |
5757	               | }                                  |      |
5758	               |                                    |      |
5759	               | for (i = 0; i < 4; i++) {          |      |
5760	               |                                    |      |
5761	               | mb_mode_delta_update_flag          | L(1) |
5762	               |                                    |      |
5763	               | if (mb_mode_delta_update_flag) {   |      |
5764	               |                                    |      |
5765	               | delta_magnitude                    | L(6) |
5766	               |                                    |      |
5767	               | delta_sign                         | L(1) |
5768	               |                                    |      |
5769	               | }                                  |      |
5770	               |                                    |      |
5771	               | }                                  |      |
5772	               |                                    |      |
5773	               | }                                  |      |
5774	               |                                    |      |
5775	               | }                                  |      |
5776	               +------------------------------------+------+

5778	   o  loop_filter_adj_enable indicates if the MB-level loop filter
5779	      adjustment (based on the used reference frame and coding mode) is
5780	      on for the current frame (9.4 (Section 9.4))

5782	   o  mode_ref_lf_delta_update indicates if the delta values used in
5783	      adjustment are updated in the current frame (9.4 (Section 9.4))

5785	   o  ref_frame_delta_update_flag indicates if the adjustment delta
5786	      value corresponding to a certain used reference frame is updated
5787	      (9.4 (Section 9.4))

5789	   o  delta_magnitude is the absolute value of the delta value

5791	   o  delta_sign is the sign of the delta value

5793	   o  mb_mode_delta_update_flag indicates if the adjustment delta value
5794	      corresponding to certain MB prediction mode is updated (9.4
5795	      (Section 9.4))
5796	                   +----------------------------+------+
5797	                   | quant_indices()            | Type |
5798	                   +----------------------------+------+
5799	                   | y_ac_qi                    | L(7) |
5800	                   |                            |      |
5801	                   | y_dc_delta_present         | L(1) |
5802	                   |                            |      |
5803	                   | if (y_dc_delta_present) {  |      |
5804	                   |                            |      |
5805	                   | y_dc_delta_magnitude       | L(4) |
5806	                   |                            |      |
5807	                   | y_dc_delta_sign            | L(1) |
5808	                   |                            |      |
5809	                   | }                          |      |
5810	                   |                            |      |
5811	                   | if (y2_dc_delta_present) { |      |
5812	                   |                            |      |
5813	                   | y2_dc_delta_magnitude      | L(4) |
5814	                   |                            |      |
5815	                   | y2_dc_delta_sign           | L(1) |
5816	                   |                            |      |
5817	                   | }                          |      |
5818	                   |                            |      |
5819	                   | if (y2_ac_delta_present) { |      |
5820	                   |                            |      |
5821	                   | y2_ac_delta_magnitude      | L(4) |
5822	                   |                            |      |
5823	                   | y2_ac_delta_sign           | L(1) |
5824	                   |                            |      |
5825	                   | }                          |      |
5826	                   |                            |      |
5827	                   | if (uv_dc_delta_present) { |      |
5828	                   |                            |      |
5829	                   | uv_dc_delta_magnitude      | L(4) |
5830	                   |                            |      |
5831	                   | uv_dc_delta_sign           | L(1) |
5832	                   |                            |      |
5833	                   | }                          |      |
5834	                   |                            |      |
5835	                   | if (uv_ac_delta_present) { |      |
5836	                   |                            |      |
5837	                   | uv_ac_delta_magnitude      | L(4) |
5838	                   |                            |      |
5839	                   | uv_ac_delta_sign           | L(1) |
5840	                   |                            |      |
5841	                   | }                          |      |
5842	                   +----------------------------+------+

5844	   o  y_ac_qi is the dequantization table index used for the luma AC
5845	      coefficients (and other coefficient groups if no delta value is
5846	      present) (9.6 (Section 9.6))

5848	   o  y_dc_delta_present indicates if the stream contains a delta value
5849	      that is added to the baseline index to obtain the luma DC
5850	      coefficient dequantization index (9.6 (Section 9.6))

5852	   o  y_dc_delta_magnitude the magnitude of the delta value (9.6
5853	      (Section 9.6))

5855	   o  y_dc_delta_sign the sign of the delta value (9.6 (Section 9.6))

5857	   o  y2_dc_delta_present indicates if the stream contains a delta value
5858	      that is added to the baseline index to obtain the Y2 block DC
5859	      coefficient dequantization index (9.6 (Section 9.6))

5861	   o  y2_ac_delta_present indicates if the stream contains a delta value
5862	      that is added to the baseline index to obtain the Y2 block AC
5863	      coefficient dequantization index (9.6 (Section 9.6))

5865	   o  uv_dc_delta_present indicates if the stream contains a delta value
5866	      that is added to the baseline index to obtain the chroma DC
5867	      coefficient dequantization index (9.6 (Section 9.6))

5869	   o  uv_ac_delta_present indicates if the stream contains a delta value
5870	      that is added to the baseline index to obtain the chroma AC
5871	      coefficient dequantization index (9.6 (Section 9.6))
5872	                 +-------------------------------+------+
5873	                 | token_prob_update()           | Type |
5874	                 +-------------------------------+------+
5875	                 | for (i = 0; i < 4; i++) {     |      |
5876	                 |                               |      |
5877	                 | for (j = 0; j < 8; j++) {     |      |
5878	                 |                               |      |
5879	                 | for (k = 0; k < 3; k++) {     |      |
5880	                 |                               |      |
5881	                 | for (l = 0; l < 11; l++) {    |      |
5882	                 |                               |      |
5883	                 | coeff_prob_update_flag        | L(1) |
5884	                 |                               |      |
5885	                 | if (coeff_prob_update_flag) { |      |
5886	                 |                               |      |
5887	                 | coeff_prob                    | L(8) |
5888	                 |                               |      |
5889	                 | }                             |      |
5890	                 |                               |      |
5891	                 | }                             |      |
5892	                 |                               |      |
5893	                 | }                             |      |
5894	                 |                               |      |
5895	                 | }                             |      |
5896	                 |                               |      |
5897	                 | }                             |      |
5898	                 +-------------------------------+------+

5900	   o  coeff_prob_update_flag indicates if the corresponding branch
5901	      probability is updated in the current frame (13.4 (Section 13.4))

5903	   o  coeff_prob is the new branch probability (13.4 (Section 13.4))
5904	                   +----------------------------+------+
5905	                   | mv_prob_update()           | Type |
5906	                   +----------------------------+------+
5907	                   | for (i = 0; i < 2; i++) {  |      |
5908	                   |                            |      |
5909	                   | for (j = 0; j < 19; j++) { |      |
5910	                   |                            |      |
5911	                   | mv_prob_update_flag        | L(1) |
5912	                   |                            |      |
5913	                   | if (mv_prob_update_flag) { |      |
5914	                   |                            |      |
5915	                   | prob                       | L(7) |
5916	                   |                            |      |
5917	                   | }                          |      |
5918	                   |                            |      |
5919	                   | }                          |      |
5920	                   |                            |      |
5921	                   | }                          |      |
5922	                   +----------------------------+------+

5924	   o  mv_prob_update_flag indicates if the corresponding MV decoding
5925	      probability is updated in the current frame (17.2 (Section 17.2))

5927	   o  prob is the updated probability (17.2 (Section 17.2))

5929	19.3.  Macroblock Data

5931	                      +---------------------+------+
5932	                      | Macroblock Data     | Type |
5933	                      +---------------------+------+
5934	                      | macroblock_header() |      |
5935	                      |                     |      |
5936	                      | residual_data()     |      |
5937	                      +---------------------+------+

5939	                 +--------------------------------+------+
5940	                 | macroblock_header()            | Type |
5941	                 +--------------------------------+------+
5942	                 | if (segmentation_map_update) { |      |
5943	                 |                                |      |
5944	                 | segment_id                     | T    |
5945	                 |                                |      |
5946	                 | if (mb_no_coeff_skip) {        |      |
5947	                 |                                |      |
5948	                 | mb_coeff_skip                  | B(p) |
5949	                 |                                |      |
5950	                 | }                              |      |
5951	                 |                                |      |
5952	                 | if (!key_frame) {              |      |
5953	                 |                                |      |
5954	                 | is_inter_mb                    | B(p) |
5955	                 |                                |      |
5956	                 | if (is_inter_mb) {             |      |
5957	                 |                                |      |
5958	                 | mb_ref_frame_sel1              | B(p) |
5959	                 |                                |      |
5960	                 | if (mb_ref_frame_sel1)         |      |
5961	                 |                                |      |
5962	                 | mb_ref_frame_sel2              | B(p) |
5963	                 |                                |      |
5964	                 | mv_mode                        | T    |
5965	                 |                                |      |
5966	                 | if (mv_mode == SPLITMV) {      |      |
5967	                 |                                |      |
5968	                 | mv_split_mode                  | T    |
5969	                 |                                |      |
5970	                 | for (i = 0; i < numMvs; i++) { |      |
5971	                 |                                |      |
5972	                 | sub_mv_mode                    | T    |
5973	                 |                                |      |
5974	                 | if (sub_mv_mode == NEWMV4x4) { |      |
5975	                 |                                |      |
5976	                 | read_mvcomponent()             |      |
5977	                 |                                |      |
5978	                 | read_mvcomponent()             |      |
5979	                 |                                |      |
5980	                 | }                              |      |
5981	                 |                                |      |
5982	                 | }                              |      |
5983	                 |                                |      |
5984	                 | } else if (mv_mode == NEWMV) { |      |
5985	                 |                                |      |
5986	                 | read_mvcomponent()             |      |
5987	                 |                                |      |
5988	                 | read_mvcomponent()             |      |
5989	                 |                                |      |
5990	                 | }                              |      |
5991	                 |                                |      |
5992	                 | } else { /* intra mb */        |      |
5993	                 |                                |      |
5994	                 | intra_y_mode                   | T    |
5995	                 +--------------------------------+------+
5996	                 +-------------------------------+------+
5997	                 | macroblock_header()           | Type |
5998	                 +-------------------------------+------+
5999	                 | if (intra_y_mode == B_PRED) { |      |
6000	                 |                               |      |
6001	                 | for (i = 0; i < 16; i++)      |      |
6002	                 |                               |      |
6003	                 | intra_b_mode                  | T    |
6004	                 |                               |      |
6005	                 | }                             |      |
6006	                 |                               |      |
6007	                 | intra_uv_mode                 | T    |
6008	                 |                               |      |
6009	                 | }                             |      |
6010	                 +-------------------------------+------+

6012	   o  segment_id indicates to which segment the macroblock belongs (10
6013	      (Section 10))

6015	   o  mb_coeff_skip indicates if the macroblock contains any coded
6016	      coefficients or not (11.1 (Section 11.1))

6018	   o  is_inter_mb indicates if the macroblock is intra or inter coded
6019	      (16 (Section 16))

6021	   o  mb_ref_frame_sel1 selects the reference frame to be used; last
6022	      frame (0), golden/alternate (1) (16.2 (Section 16.2))

6024	   o  mb_ref_frame_sel2 selects whether the golden (0) or alternate
6025	      reference frame (1) is used (16.2 (Section 16.2))

6027	   o  mv_mode determines the macroblock motion vector mode (16.2
6028	      (Section 16.2))

6030	   o  mv_split_mode gives macroblock partitioning specification and
6031	      determines number of motion vectors used (numMvs)(16.2
6032	      (Section 16.2))

6034	   o  sub_mv_mode determines the sub-macroblock motion vector mode for
6035	      macroblocks coded using SPLITMV motion vector mode (16.2
6036	      (Section 16.2))

6038	   o  intra_y_mode selects the luminance intra prediction mode (16.1
6039	      (Section 16.1))

6041	   o  intra_b_mode selects the sub-macroblock luminance prediction mode
6042	      for macroblocks coded using B_PRED mode (16.1 (Section 16.1))

6044	   o  intra_uv_mode selects the chrominance intra prediction mode (16.1
6045	      (Section 16.1))

6047	          +----------------------------------------------+------+
6048	          | residual_data()                              | Type |
6049	          +----------------------------------------------+------+
6050	          | if (!mb_coeff_skip) {                        |      |
6051	          |                                              |      |
6052	          | if ( (is_inter_mb && mv_mode != SPLITMV) ||  |      |
6053	          |                                              |      |
6054	          | (!is_inter_mb && intra_y_mode != B_PRED) ) { |      |
6055	          |                                              |      |
6056	          | residual_block() /* Y2 */                    |      |
6057	          |                                              |      |
6058	          | }                                            |      |
6059	          |                                              |      |
6060	          | for (i = 0; i < 24; i++)                     |      |
6061	          |                                              |      |
6062	          | residual_block() /* 16 Y, 4 U, 4 V */        |      |
6063	          |                                              |      |
6064	          | }                                            |      |
6065	          +----------------------------------------------+------+

6067	              +-------------------------------------+------+
6068	              | residual_block()                    | Type |
6069	              +-------------------------------------+------+
6070	              | for (i = firstCoeff; i < 16; i++) { |      |
6071	              |                                     |      |
6072	              | token                               | T    |
6073	              |                                     |      |
6074	              | if (token == EOB) break;            |      |
6075	              |                                     |      |
6076	              | if (token_has_extra_bits) {         |      |
6077	              |                                     |      |
6078	              | extra_bits                          | L(n) |
6079	              |                                     |      |
6080	              | sign                                | L(1) |
6081	              +-------------------------------------+------+

6083	   o  firstCoeff is 1 for luma blocks of macroblocks containing Y2
6084	      subblock, otherwise 0

6086	   o  token defines the value of the coefficient, the value range of the
6087	      coefficient or the end of block (13.2 (Section 13.2))

6089	   o  extra_bits determine the value of the coefficient within the value
6090	      range defined by token (13.2 (Section 13.2))

6092	   o  sign indicates the sign of the coefficient (13.2 (Section 13.2))

6094	20.  License

6096	   Google hereby grants to You a perpetual, worldwide, non-exclusive,
6097	   no-charge, royalty-free, irrevocable (except as stated in this
6098	   section) patent license to make, have made, use, offer to sell, sell,
6099	   import, and otherwise implementations of this specification where
6100	   such license applies only to those patent claims, both currently
6101	   owned by Google and acquired in the future, licensable by Google that
6102	   are necessarily infringed by implementation of this specification.
6103	   If You or your agent or exclusive licensee institute or order or
6104	   agree to the institution of patent litigation against any entity
6105	   (including a cross-claim or counterclaim in a lawsuit) alleging that
6106	   any implementation of this specification constitutes direct or
6107	   contributory patent infringement, or inducement of patent
6108	   infringement, then any rights granted to You under the License for
6109	   this specification shall terminate as of the date such litigation is
6110	   filed.

6112	21.  Copyright

6114	   This specification is made available under a Creative Commons
6115	   Attribution 3.0 License [4].

6117	22.  References

6119	   [1]  International Telecommunication Union, "ITU BT.601: Studio
6120	        encoding parameters of digital television for standard 4:3 and
6121	        wide screen 16:9 aspect ratios", January 2007.

6123	   [2]  Kernighan, B. and D. Ritchie, "The C Programming Language (2nd
6124	        edition)", April 1988.

6126	   [3]  Shannon, C., "A Mathematical Theory of Communication", Bell
6127	        System Technical Journal Vol. 27, pp. 379-423, 623-656, July,
6128	        October 1948.

6130	   [4]  

6132	Authors' Addresses

6134	   James Bankoski
6135	   Google, Inc.

6137	   Email: jimbankoski@google.com

6139	   Paul Wilkins
6140	   Google, Inc.

6142	   Email: paulwilkins@google.com

6144	   Yaowu Xu
6145	   Google, Inc.

6147	   Email: yaowu@google.com