Network Working Group                                          A. Grange
Internet-Draft                                             H. Alvestrand
Intended status: Informational                                    Google
Expires: August 22, 2013                               February 18, 2013

                        A VP9 Bitstream Overview
                      draft-grange-vp9-bitstream-00

Abstract

   This document describes VP9, a video codec being developed
   specifically to meet the demand for the consumption of video over
   the Internet, including professionally and amateur-produced
   video-on-demand and conversational video content.  VP9 is an
   evolution of the VP8 video codec described in [RFC6386] and
   includes a number of enhancements and new coding tools that improve
   coding efficiency.  The new tools added so far include: larger
   prediction block sizes up to 64x64, various forms of compound INTER
   prediction, more modes for INTRA prediction, 1/8-pel motion
   vectors, 8-tap switchable sub-pixel interpolation filters, improved
   motion reference generation, improved motion vector coding,
   improved entropy coding including frame-level entropy adaptation
   for various symbols, improved loop filtering, the incorporation of
   the Asymmetric Discrete Sine Transform (ADST), larger 16x16 and
   32x32 DCTs, and improved frame-level segmentation.  VP9 is under
   active development, and this document provides only a snapshot of
   the coding tools as they exist today.  The finalized version of the
   VP9 bitstream may differ considerably from the description
   contained herein; existing coding tools may be modified or
   excluded, and new coding tools may be added.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as "work in progress."

   This Internet-Draft will expire on August 22, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Outline of the Codec
     2.1.  Prediction Block Size
     2.2.  Prediction Modes
       2.2.1.  INTRA Modes
       2.2.2.  INTER Modes
       2.2.3.  Compound INTER-INTRA Mode
     2.3.  Sub-Pixel Interpolation
     2.4.  Transforms
     2.5.  Motion Vector Reference Selection and Coding
     2.6.  Entropy Coding and Adaptation
     2.7.  Loop Filter
     2.8.  Segmentation
   3.  Bitstream Features
     3.1.  Error Resilience
     3.2.  Parallel Decodability
       3.2.1.  Frame-Level Parallelism
       3.2.2.  Tiling
     3.3.  Scalability
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgements
   7.  Informative References
   Authors' Addresses

1.  Introduction

   Video data accounts for a significant proportion of all Internet
   traffic, and the trend is toward higher-quality, larger-format, and
   often professionally produced video, encoded at higher data rates
   and supported by the improved provisioning of high-bandwidth
   Internet connections.  VP9 is being developed as an open-source
   solution tailored to the specific characteristics of the Internet,
   under the auspices of the WebM project [Google-webm], with the aim
   of providing the highest-quality user experience and supporting the
   widest range of use cases on a diverse set of target devices.  This
   document provides a high-level technical overview of the coding
   tools that are likely to be included in the final VP9 bitstream.
2.  Outline of the Codec

   A large proportion of the advance that VP9 makes over VP8 can be
   attributed to a straightforward generational progression, driven by
   the need for the greater efficiency required to serve a new coding
   "sweet spot" that has evolved around larger frame sizes and higher-
   quality video formats.

2.1.  Prediction Block Size

   A large part of the coding efficiency improvement achieved by VP9
   can be attributed to the introduction of larger prediction block
   sizes.  Specifically, VP9 introduces the notion of superblocks of
   size up to 64x64 and their quad-tree-like decomposition all the way
   down to a block size of 4x4, with some quirks as described below.
   In particular, a superblock of size 64x64 (SB64) can be split into
   four superblocks of size 32x32 (SB32), each of which can be further
   split into four 16x16 macroblocks (MBs).  Each SB64, SB32, or MB
   can be predicted as a whole using a conveyed INTRA prediction mode,
   or using an INTER prediction mode with up to two motion vectors and
   corresponding reference frames, as described in Section 2.2.2.  A
   macroblock can be further split using one of three mode families:
   B_PRED, where each 4x4 sub-block within the MB is coded using a
   signaled 4x4 INTRA prediction mode; I8X8_PRED, where each 8x8 block
   within the MB is coded using a signaled 8x8 INTRA prediction mode;
   and SPLITMV, where each 4x4 sub-block within the MB is coded in
   INTER mode with a corresponding motion vector, with the option of
   grouping common motion vectors over 16x8, 8x16, or 8x8 partitions
   within the MB.  Note that the B_PRED and SPLITMV modes in VP9 work
   in the same way as they do in VP8.

2.2.  Prediction Modes

   VP9 supports the following prediction modes for the various block
   sizes:

2.2.1.  INTRA Modes

   At block size 4x4, VP9 supports ten INTRA prediction modes: DC,
   Vertical, Horizontal, TM (True Motion), Horizontal Up, Left
   Diagonal, Vertical Right, Vertical Left, Right Diagonal, and
   Horizontal Down (the same set defined by VP8).  For blocks from 8x8
   to 64x64 there is also support for ten INTRA modes: DC, Vertical,
   Horizontal, TM (True Motion), and six angular predictors
   corresponding, approximately, to angles of 27, 45, 63, 117, 135,
   and 153 degrees.  Furthermore, there is the option, signaled in the
   bitstream, of applying a low-pass filter to the prediction.

2.2.2.  INTER Modes

   VP9 currently supports INTER prediction from up to three reference
   frame buffers (named LAST_FRAME, GOLDEN_FRAME, and ALTREF_FRAME, as
   in VP8), but for any particular frame the three available
   references are dynamically selectable from a pool of eight stored
   reference frames.  A syntax element in the frame header indicates
   which subset of three reference buffers is available when encoding
   the frame.  A further syntax element indicates which of the three
   frame buffers, if any, are to be updated at the end of encoding a
   frame.  Some coded frames may be designated as invisible, in the
   sense that they are only ever used as a reference and never
   actually displayed, akin to the ALTREF frame in VP8.  It is also
   likely that the number of available working reference buffers will
   be increased from three to four in the final VP9 bitstream.
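   As a rough illustration of this buffer management, the following C
   sketch models a pool of eight stored frames from which working
   references are chosen, with an end-of-frame refresh step.  All
   names here (RefSyntax, refresh_mask, and so on) are hypothetical,
   not the actual VP9 syntax:

      /* Hypothetical model of the reference-buffer pool described
       * above: eight stored frames, three working references per
       * coded frame, and a per-frame refresh step. */
      typedef struct Frame Frame;

      #define REF_POOL_SIZE 8   /* stored reference frames        */
      #define ACTIVE_REFS   3   /* working references per frame   */

      typedef struct {
          int      active_ref_idx[ACTIVE_REFS]; /* pool slots used
                                                   as LAST, GOLDEN,
                                                   and ALTREF      */
          unsigned refresh_mask;  /* bit i set => update pool[i]  */
      } RefSyntax;

      /* After a frame is coded, overwrite the pool slots that the
       * frame header marked for update with the new
       * reconstruction. */
      static void end_of_frame_update(Frame *pool[REF_POOL_SIZE],
                                      const RefSyntax *hdr,
                                      Frame *decoded)
      {
          for (int i = 0; i < REF_POOL_SIZE; i++)
              if (hdr->refresh_mask & (1u << i))
                  pool[i] = decoded;
      }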
   Each INTER-coded block within a frame may be coded using up to two
   motion vectors referencing two different buffers out of the three
   working reference buffers selected for the frame.  When a single
   motion vector is used, sub-pixel interpolation from the indicated
   reference frame buffer is used to obtain the predictor.  When two
   motion vectors, mv1 and mv2, are conveyed for a given block, the
   corresponding reference frame buffers ref1 and ref2 must be
   different from each other, and the final predictor is then obtained
   by averaging the individual predictors from each of the motion
   vectors, i.e.,

      P[i, j] = floor((P_mv1,ref1[i, j] + P_mv2,ref2[i, j] + 1) / 2)

   where P[i, j] is the predictor value at pixel location [i, j], and
   P_mv1,ref1 and P_mv2,ref2 are the INTER predictors corresponding to
   the two motion vectors and reference buffers conveyed.
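   In 8-bit integer arithmetic, this rounded average is one shift per
   pixel.  A minimal sketch, with illustrative names and an assumed
   common stride for all three blocks:

      #include <stdint.h>

      /* Compound INTER predictor: rounded average of the two
       * single-reference predictors p1 and p2 over a w x h block. */
      static void compound_average(const uint8_t *p1,
                                   const uint8_t *p2,
                                   uint8_t *dst, int stride,
                                   int w, int h)
      {
          for (int i = 0; i < h; i++)
              for (int j = 0; j < w; j++)
                  dst[i * stride + j] =
                      (uint8_t)((p1[i * stride + j] +
                                 p2[i * stride + j] + 1) >> 1);
      }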
2.2.3.  Compound INTER-INTRA Mode

   A further prediction mode under consideration is a combined INTER/
   INTRA mode.  In this mode, an INTER predictor and an INTRA
   predictor are combined in such a way that pixels closer to the
   INTRA prediction edge (top or left) are weighted more heavily
   towards the INTRA predictor, whilst pixels further away from the
   edges are weighted more heavily towards the INTER predictor.  The
   exact weights used for each pixel thus depend on the particular
   INTRA prediction direction in use.  Conceptually, each INTRA
   prediction mode at a given block size is associated with a constant
   weighting block of the same size that provides the weight for the
   INTRA predictor relative to the INTER predictor.  For instance, if
   the weighting matrix for a given INTRA mode m and block size n is
   given by an nxn matrix Wm, with values between 0 and 1, then the
   predictor of pixel [i, j], denoted P[i, j], is obtained by:

      P[i, j] = Wm[i, j] * Pm[i, j] + (1 - Wm[i, j]) * P_mv,ref[i, j]

   where Pm is the INTRA predictor for the given INTRA mode, and
   P_mv,ref is the INTER predictor obtained using motion vector mv and
   reference frame index ref.  This mode is restricted to one motion
   vector per block and to blocks of size 16x16 and above, i.e.,
   MB/SB32/SB64.  The weighting matrix may be obtained from a 1-D
   exponential decay function of the form A + B*exp(-Kx), where x
   represents the distance along the prediction direction to the
   nearest left/top edge.
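   In a fixed-point implementation, the real-valued weights Wm would
   typically be quantized; the sketch below assumes 8-bit weights in
   which 256 represents 1.0, with all names illustrative:

      #include <stdint.h>

      /* Per-pixel INTER/INTRA blend over an n x n block.  w holds
       * a quantized weighting matrix Wm: values near 255 favor the
       * INTRA predictor (pixels near the prediction edge), decaying
       * with distance from that edge. */
      static void blend_inter_intra(const uint8_t *intra,
                                    const uint8_t *inter,
                                    const uint8_t *w,
                                    uint8_t *dst, int n)
      {
          for (int k = 0; k < n * n; k++)
              dst[k] = (uint8_t)((w[k] * intra[k] +
                                  (256 - w[k]) * inter[k] + 128) >> 8);
      }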
2.3.  Sub-Pixel Interpolation

   The filters used for sub-pixel interpolation of fractional motion
   are critical to the performance of a video codec.  The maximum
   motion vector precision supported is 1/8-pixel, with the option of
   switching between 1/4-pixel and 1/8-pixel precision using a frame-
   level flag.  Even when 1/8-pixel precision is enabled for a frame,
   it is only used for small motion, depending on the magnitude of the
   reference motion vector.  Larger motion, indicated by a larger
   reference vector, is almost always accompanied by motion blur,
   which obviates the need for higher-precision interpolation.

   VP9 defines a family of three 8-tap filters, selectable at either
   the frame or macroblock level in the bitstream:

   o  8-tap Regular: an 8-tap Lagrangian interpolation filter designed
      using the intfilt function in MATLAB,

   o  8-tap Sharp: a DCT-based interpolation filter with a sharper
      response, used mostly around sharp edges,

   o  8-tap Smooth (non-interpolating): a smoothing filter designed
      using the windowed-Fourier-series approach with a Hamming
      window.  Note that, unlike the other two filters, this filter is
      non-interpolating in the sense that the prediction at integer-
      pixel-aligned locations is a smoothed version of the reference
      frame pixels.

2.4.  Transforms

   VP9 supports the Discrete Cosine Transform (DCT) at sizes 4x4, 8x8,
   16x16, and 32x32 and removes the second-order transform that was
   employed in VP8.  Only transform sizes equal to, or smaller than,
   the prediction block size may be specified.  Modes B_PRED and 4x4
   SPLITMV are thus restricted to using only the 4x4 transform; modes
   I8X8_PRED and non-4x4 SPLITMV can use either the 4x4 or 8x8
   transform; full-size (16x16) macroblock predictors can be coupled
   with the 4x4, 8x8, or 16x16 transforms; and superblocks can use any
   transform size up to 32x32.  Further restrictions on the available
   subset of transforms can be signaled at the frame level, by
   specifying a maximum allowable transform size, or at the macroblock
   level, by explicitly signaling which of the available transform
   sizes is used.

   In addition, VP9 introduces support for a new transform type, the
   Asymmetric Discrete Sine Transform (ADST), which can be used in
   combination with specific INTRA prediction modes.  It has been
   shown in [Han-Icassp] and [Han-Itip] that when a one-sided boundary
   is available, as in most INTRA prediction modes, the ADST rather
   than the DCT is the optimal transform for the residual signal.
   INTRA prediction modes that predict from a left edge can use the
   1-D ADST in the horizontal direction, combined with a 1-D DCT in
   the vertical direction.  Similarly, the residual signal resulting
   from INTRA prediction modes that predict from the top edge can
   employ a vertical 1-D ADST combined with a horizontal 1-D DCT.
   INTRA prediction modes that predict from both edges, such as the
   True Motion (TM_PRED) mode and some diagonal INTRA prediction
   modes, use the 1-D ADST in both the horizontal and vertical
   directions.
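   The rule above amounts to a small mode-to-transform mapping, as in
   this sketch.  The enum values are illustrative, and only three
   representative modes are shown:

      /* Pick the 1-D transform for each direction: ADST along a
       * direction whose prediction boundary is available, DCT
       * otherwise.  Hypothetical names; not the VP9 syntax. */
      typedef enum { TX_DCT, TX_ADST } Tx1D;
      typedef enum {
          V_PRED,   /* predicts from the top edge  */
          H_PRED,   /* predicts from the left edge */
          TM_PRED   /* predicts from both edges    */
      } IntraMode;

      static void pick_transform(IntraMode m, Tx1D *vert, Tx1D *horz)
      {
          switch (m) {
          case H_PRED:  *horz = TX_ADST; *vert = TX_DCT;  break;
          case V_PRED:  *horz = TX_DCT;  *vert = TX_ADST; break;
          case TM_PRED: *horz = TX_ADST; *vert = TX_ADST; break;
          default:      *horz = TX_DCT;  *vert = TX_DCT;  break;
          }
      }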
2.5.  Motion Vector Reference Selection and Coding

   One of the most critical factors in the efficiency of motion vector
   encoding is the generation of a suitable reference motion vector to
   be used as a predictor.  VP9 creates a sorted list of candidate
   reference motion vectors that encompasses the three vectors best,
   nearest, and near as defined by VP8.  In addition to the candidates
   produced by the VP8 algorithm, VP9 also evaluates the motion vector
   of the co-located block in the reference frame and those of nearby
   blocks.  VP9 introduces a new scoring mechanism to rank these
   reference vectors, whereby each candidate is evaluated to determine
   how well it would have predicted the reconstructed pixels in close
   proximity to the current block (specifically, a small number of
   rows immediately above the current block and, possibly, a small
   number of columns to its left).  A predictor is created using each
   candidate vector in turn to displace the pixels in the reference
   frame, and the variance of the resulting error signal, with respect
   to that set of pixels in the current frame, is used to rank the
   reference vectors.

   With the three best candidate reference vectors best, nearest, and
   near identified, the encoder can either signal the use of the
   vector identified as nearest (NEAREST_MV mode) or near (NEAR_MV
   mode) or, if neither of them is deemed appropriate, signal the use
   of a completely new motion vector (NEW_MV mode) that is then
   specified as a delta from the best reference candidate.

   One further mode, ZERO_MV, signals the use of the (0, 0) motion
   vector.

   In addition, a more efficient motion vector offset encoding
   mechanism has been introduced.

2.6.  Entropy Coding and Adaptation

   The VP9 bitstream employs the VP8 BoolCoder as the underlying
   arithmetic coder.  Generally speaking, given a symbol from an n-ary
   alphabet, a static binary tree is constructed with n-1 internal
   nodes, and a binary arithmetic coder is run at each such node as
   the tree is traversed to encode a particular symbol.  The
   probabilities at each node use 8-bit precision.  The set of n-1
   probabilities for coding the symbol is referred to as the entropy
   coding context of the symbol.  Almost all of the coding elements
   conveyed in the bitstream, including modes, motion vectors,
   reference frames, and prediction residuals for each transform type
   and size, use this strategy.

   Video content is inherently highly non-stationary, and a critical
   component of any codec is the mechanism used to track the
   statistics of the various encoded symbols and update the parameters
   of the entropy coding contexts to match.  VP9 makes use of forward
   context updates through flags in the frame header that signal
   modifications of the coding contexts at the start of each frame.
   The syntax for forward updates is designed to allow an arbitrary
   subset of the node probabilities to be updated whilst leaving the
   others unchanged.  The advantage of forward adaptation is that
   decoding performance can be substantially improved, because no
   intermediate computation based on encountered token counts is
   necessary.  Updates are encoded differentially to allow a more
   efficient specification of updated coding contexts, which is
   essential given the expanded set of tokens available in VP9.

   In addition, there is a limited option for signaling backward
   adaptation, which in VP9 is only applied at the end of encoding
   each frame so that the impact on decoding speed is minimal.
   Specifically, for every frame encoded, a forward update first
   modifies the entropy coding contexts for the various symbols,
   starting from the state at the beginning of the frame.  Thereafter,
   all symbols encoded in the frame are coded using this modified
   coding state.  At the end of the frame, both the encoder and the
   decoder are expected to have accumulated counts for the various
   symbols actually encoded or decoded over the frame.  Using these
   actual distributions, a backward update step is applied to adapt
   the entropy coding contexts for use as the baseline for the next
   frame.
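   As an illustration of the tree-coding strategy, the following
   decoder-side sketch is patterned after the tree reader of the VP8
   decoder described in [RFC6386]; the read_bool primitive, which
   decodes one binary decision against an 8-bit probability, is
   assumed:

      #include <stdint.h>

      typedef int8_t tree_index;
      typedef struct BoolDecoder BoolDecoder;

      /* Assumed primitive: decode one binary decision whose
       * probability of being 0 is prob/256. */
      extern int read_bool(BoolDecoder *d, uint8_t prob);

      /* Decode one n-ary symbol by walking a static binary tree
       * with n-1 internal nodes.  Positive entries index the two
       * children of a node; negative entries hold the negated
       * symbol value at a leaf.  probs supplies one 8-bit
       * probability per internal node -- the symbol's entropy
       * coding context. */
      static int tree_read(BoolDecoder *d, const tree_index *tree,
                           const uint8_t *probs)
      {
          tree_index i = 0;
          while ((i = tree[i + read_bool(d, probs[i >> 1])]) > 0)
              ;
          return -i;  /* leaf values are stored negated */
      }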
2.7.  Loop Filter

   VP9 introduces a variety of new prediction block and transform
   sizes that require additional loop-filtering options to handle a
   larger number of combinations of boundary types.  VP9 also
   incorporates a flatness detector in the loop filter that detects
   flat regions and varies the filter strength and size accordingly.

2.8.  Segmentation

   VP9 introduces more advanced segmentation features that make
   segmentation considerably more efficient and powerful, allowing
   each superblock or macroblock to specify a segment ID to which it
   belongs.  For each segment, the frame header can then convey common
   features that are applied to all MBs/SB32s/SB64s belonging to the
   same segment ID.  Further, the segmentation map is coded
   differentially across frames in order to minimize the signaling
   overhead.  Examples of information that can be conveyed for a
   segment include: restrictions on the reference frames that can be
   used, coefficient skips, quantizer and loop filter strength, and
   transform size options.  Generally speaking, the segmentation
   mechanism provides a flexible set of tools that can be used, in an
   application-specific way, to target improvements in perceptual
   quality for a given compression ratio.

   In the reference implementation, segmentation is currently used to
   identify background and foreground areas in encoded video content.
   The (static) background is then coded at a higher quality than the
   rest of the frame in certain reference frames (such as the alt-ref
   frame) that provide prediction persisting over a number of frames.
   In contrast, for the frames between these persistent reference
   frames, the background is given fewer bits by, for example,
   restricting the set of available reference buffers, using only the
   ZERO_MV coding mode, or skipping the residual coefficient block.
   The result is that more bits are available to code the foreground
   portion of the scene, while still preserving very good perceptual
   quality on the static background.  Other use cases involving
   spatial and temporal masking for perceptual-quality improvement are
   conceivable.
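   The per-segment features listed above could be modeled as in the
   following sketch; the structure, the field names, and the assumed
   limit of eight segments are illustrative, not the actual VP9
   syntax:

      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical per-segment feature set, mirroring the
       * examples given above. */
      typedef struct {
          unsigned allowed_ref_mask;   /* reference frames usable
                                          by this segment         */
          bool     skip_coeffs;        /* force coefficient skip  */
          int8_t   quant_delta;        /* quantizer adjustment    */
          int8_t   loop_filter_delta;  /* loop filter strength    */
          int8_t   max_tx_size;        /* transform size cap      */
      } SegmentFeatures;

      /* The frame header would convey one feature set per segment;
       * each MB/SB32/SB64 then carries (differentially coded
       * across frames) the ID of the segment it belongs to. */
      #define MAX_SEGMENTS 8           /* assumed                 */
      typedef struct {
          SegmentFeatures seg[MAX_SEGMENTS];
      } FrameSegmentation;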
3.  Bitstream Features

   In addition to providing high compression efficiency at reasonable
   complexity, the VP9 bitstream includes features designed to support
   a variety of specific use cases that are important to Internet
   video delivery and consumption.  This section provides an overview
   of these features.

3.1.  Error Resilience

   For communication of conversational video with low latency over an
   unreliable network, it is imperative to support a coding mode in
   which decoding can continue without errors even when arbitrary
   frames are lost.  Specifically, the arithmetic decoder should still
   be able to decode symbols correctly in frames subsequent to lost
   frames, even though frame buffers have been corrupted, leading to
   encoder-decoder mismatch.  The hope is that the drift between the
   encoder and decoder will remain manageable until a key frame is
   sent or other corrective action (such as reference picture
   selection) can be taken.  VP9 supports a frame-level
   error_resilient_mode flag that, when turned on, allows only those
   coding modes for which this is achievable.  In particular, the
   following restrictions are imposed in error-resilient mode:

   1.  The entropy coding context probabilities are reset to defaults
       at the beginning of each frame.  (This effectively prevents
       propagation of forward updates as well as backward updates.)

   2.  For motion vector reference selection, the co-located motion
       vector from a previously encoded reference frame can no longer
       be included in the reference candidate list.

   3.  For motion vector reference selection, sorting of the initial
       list of motion vector reference candidates based on search in
       the reference frame buffer is disabled.

   These restrictions produce a modest performance drop.

3.2.  Parallel Decodability

   Smooth encoding and playback of high-definition video in software
   on resource-constrained personal devices (smartphones, tablets,
   netbooks, etc.) necessitates exploiting some form of parallelism,
   so that multi-threaded applications can be built around the codec
   to exploit the inherent multi-processing capabilities of modern
   processors.  This may include the ability to encode/decode parts of
   a frame in parallel, the ability to decode successive frames in
   parallel, or a combination of both.  VP9 supports both forms of
   parallelism, as described below.

3.2.1.  Frame-Level Parallelism

   A frame-level flag, frame_parallel_mode, when turned on, enables an
   encoding mode in which the entropy decoding for successive frames
   can be conducted in a quasi-parallel manner just by parsing the
   frame headers, before these frames actually need to be
   reconstructed.  In this mode, only the frame headers need to be
   decoded sequentially.  Beyond that, the entropy decoding for each
   frame can be conducted in a lagged parallel manner, as long as the
   co-located motion vector information from a previous reference
   frame has been decoded prior to the current frame.  The
   reconstruction of the frames can then be conducted sequentially in
   coding order as they are required for display.  This mode enables
   multi-threaded decoder implementations that result in smoother
   playback.  Specifically, this mode imposes the following
   restrictions on the bitstream, which are a subset of the
   restrictions for error-resilient mode:

   1.  Backward entropy coding context updates are disabled, but
       forward updates are allowed to propagate.

   2.  For motion vector reference selection, sorting of the initial
       list of motion vector reference candidates based on a search in
       the reference frame buffer is disabled.  However, the
       co-located motion vector from a previously encoded reference
       frame can be included in the initial candidate list.

3.2.2.  Tiling

   In addition to making provision for decoding multiple frames in
   parallel, VP9 also supports decoding a single frame using multiple
   threads.  For this, VP9 introduces tiles: independently coded and
   decodable sub-units of the video frame.  When enabled, a frame can
   be split into, for example, two or four column-based tiles.  Each
   tile shares the same frame entropy model, but all contexts and
   pixel values (for INTRA prediction) that cross tile boundaries take
   the same values as those at the left, top, or right edge of the
   frame.  Each tile can thus be encoded and decoded completely
   independently, which is expected to enable significant speedups in
   multi-threaded encoders/decoders without introducing any additional
   latency.
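   A minimal sketch of the decoder-side threading this enables, with
   one worker per column tile; TileContext and decode_tile are assumed
   names for per-tile state and a per-tile decode routine:

      #include <assert.h>
      #include <pthread.h>

      typedef struct TileContext TileContext;   /* per-tile state */
      extern void decode_tile(TileContext *tc); /* assumed        */

      static void *tile_worker(void *arg)
      {
          decode_tile((TileContext *)arg);
          return NULL;
      }

      /* Because tiles are coded independently, each one can be
       * handed to its own thread; the frame is complete when all
       * of the workers have joined. */
      static void decode_frame_tiles(TileContext **tiles, int n_tiles)
      {
          pthread_t th[4];      /* e.g., 2 or 4 column tiles */
          assert(n_tiles <= 4);
          for (int i = 0; i < n_tiles; i++)
              pthread_create(&th[i], NULL, tile_worker, tiles[i]);
          for (int i = 0; i < n_tiles; i++)
              pthread_join(th[i], NULL);
      }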
   Note that loop filtering across tile edges can still be applied,
   assuming a decoder implementation model in which the loop-filtering
   operation lags the decoder's reconstruction of the individual tiles
   within the frame, so that it never uses a pixel that has not
   already been reconstructed.  Further, backward entropy adaptation,
   a lightweight operation, can still be conducted for the whole frame
   after entropy decoding of all tiles has finished.

3.3.  Scalability

   The VP9 bitstream will provide a number of flexible features that
   can be combined in specific ways to efficiently provide various
   forms of scalability.  VP9 increases the number of available
   reference frame buffers to eight, from which three may be selected
   for each frame.  In addition, each coded frame may be resampled and
   coded at a resolution different from that of the reference buffers,
   allowing internal spatial resolution changes on the fly without
   having to resort to keyframes.  When such a resolution change is
   signaled in the bitstream, the reference buffers as well as the
   corresponding motion vector information are suitably transformed to
   the new resolution before the standard coding tools are applied.
   Furthermore, VP9 maintains four different entropy coding contexts
   that can be selected and optionally updated on every frame, making
   it possible for the encoder to use a different entropy coding
   context for each scalable layer, if required.  Together, these
   flexible features enable an encoder/decoder to implement various
   forms of coarse-grained scalability, including temporal, spatial,
   or combined spatio-temporal scalability, without explicitly
   creating spatially scalable encoding modes.

4.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as
   an RFC.

5.  Security Considerations

   The VP9 bitstream offers no security functions.  Integrity and
   confidentiality must be ensured by functions outside the bitstream.

   The VP9 bitstream does not offer functions for embedding other
   types of objects, either active or passive, so this class of attack
   cannot be mounted using VP9.

   Implementations of codecs are often written with a strong focus on
   speed.  The reference software has been carefully vetted for
   security issues, but no guarantees can be given.  Those who run
   decoder software from other parties should take appropriate care
   when executing it in a security-sensitive context.

6.  Acknowledgements

   This document is heavily based on the paper "Towards a Next
   Generation Open-source Video Codec" by Bankoski, Bultje, Grange,
   Gu, Han, Koleszar, Mukherjee, Wilkins, and Xu [vp9-paper].

7.  Informative References

   [Google-webm]
              The WebM Project, "WebM project website",
              http://www.webmproject.org/.

   [Han-Icassp]
              Han, J., "Towards jointly optimal spatial prediction and
              adaptive transform in video/image coding", IEEE Int.
              Conf. on Acoustics, Speech and Signal Processing
              (ICASSP), pp. 726-729, March 2010.

   [Han-Itip] Han, J., "Jointly optimized spatial prediction and block
              transform for video and image coding", IEEE Transactions
              on Image Processing, vol. 21, pp. 1874-1884, April 2012.

   [RFC6386]  Bankoski, J., Koleszar, J., Quillio, L., Salonen, J.,
              Wilkins, P., and Y. Xu, "VP8 Data Format and Decoding
              Guide", RFC 6386, November 2011.

   [vp9-paper]
              Bankoski, J., Bultje, R., Grange, A., Gu, Q., Han, J.,
              Koleszar, J., Mukherjee, D., Wilkins, P., and Y. Xu,
              "Towards a Next Generation Open-source Video Codec",
              IS&T / SPIE EI Conference on Visual Information
              Processing and Communication IV, February 5-7, 2013.

Authors' Addresses

   Adrian Grange
   Google

   Email: agrange@google.com


   Harald Alvestrand
   Google

   Email: hta@google.com