Network Working Group                                          A. Grange
Internet-Draft                                             H. Alvestrand
Intended status: Informational                                    Google
Expires: August 22, 2013                               February 18, 2013

                        A VP9 Bitstream Overview
                      draft-grange-vp9-bitstream-00

Abstract

   This document describes VP9, a video codec being developed
   specifically to meet the demand for the consumption of video over
   the Internet, including professionally and amateur-produced
   video-on-demand and conversational video content.  VP9 is an
   evolution of the VP8 video codec described in [RFC6386] and
   includes a number of enhancements and new coding tools that improve
   coding efficiency.  The new tools added so far include: larger
   prediction block sizes up to 64x64, various forms of compound INTER
   prediction, more modes for INTRA prediction, 1/8-pel motion
   vectors, 8-tap switchable sub-pixel interpolation filters, improved
   motion reference generation, improved motion vector coding,
   improved entropy coding including frame-level entropy adaptation
   for various symbols, improved loop filtering, the incorporation of
   the Asymmetric Discrete Sine Transform (ADST), larger 16x16 and
   32x32 DCTs, and improved frame-level segmentation.  VP9 is under
   active development, and this document provides only a snapshot of
   the coding tools as they exist today.  The finalized version of the
   VP9 bitstream may differ considerably from the description
   contained herein; existing coding tools may be modified or
   excluded, and new coding tools may be added.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as "work in progress."

   This Internet-Draft will expire on August 22, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Outline of the Codec
     2.1.  Prediction Block Size
     2.2.  Prediction Modes
       2.2.1.  INTRA Modes
       2.2.2.  INTER Modes
       2.2.3.  Compound INTER-INTRA Mode
     2.3.  Sub-Pixel Interpolation
     2.4.  Transforms
     2.5.  Motion Vector Reference Selection and Coding
     2.6.  Entropy Coding and Adaptation
     2.7.  Loop Filter
     2.8.  Segmentation
   3.  Bitstream Features
     3.1.  Error Resilience
     3.2.  Parallel Decodability
       3.2.1.  Frame-Level Parallelism
       3.2.2.  Tiling
     3.3.  Scalability
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgements
   7.  Informative References
   Authors' Addresses

1.  Introduction

   Video data accounts for a significant proportion of all Internet
   traffic, and the trend is toward higher-quality, larger-format, and
   often professionally produced video, encoded at higher data rates
   and supported by the improved provisioning of high-bandwidth
   Internet connections.  VP9 is being developed as an open-source
   solution tailored to the specific characteristics of the Internet,
   under the auspices of the WebM project [Google-webm], with the aim
   of providing the highest-quality user experience and supporting the
   widest range of use cases on a diverse set of target devices.  This
   document provides a high-level technical overview of the coding
   tools that are likely to be included in the final VP9 bitstream.
2.  Outline of the Codec

   A large proportion of the advance that VP9 makes over VP8 can be
   attributed to a straightforward generational progression, driven by
   the need for the greater efficiency required to serve a new coding
   "sweet spot" that has evolved around larger frame sizes and higher-
   quality video formats.

2.1.  Prediction Block Size

   A large part of the coding efficiency improvement achieved by VP9
   can be attributed to the introduction of larger prediction block
   sizes.  Specifically, VP9 introduces the notion of superblocks of
   size up to 64x64 and their quad-tree-like decomposition all the way
   down to a block size of 4x4, with some quirks as described below.
   In particular, a superblock of size 64x64 (SB64) can be split into
   four superblocks of size 32x32 (SB32), each of which can be further
   split into four 16x16 macroblocks (MBs).  Each SB64, SB32, or MB
   can be predicted as a whole using a conveyed INTRA prediction mode,
   or using an INTER prediction mode with up to two motion vectors and
   corresponding reference frames, as described in Section 2.2.2.  A
   macroblock can be further split using one of three mode families:
   B_PRED, where each 4x4 sub-block within the MB is coded using a
   signaled 4x4 INTRA prediction mode; I8X8_PRED, where each 8x8 block
   within the MB is coded using a signaled 8x8 INTRA prediction mode;
   and SPLITMV, where each 4x4 sub-block within the MB is coded in
   INTER mode with a corresponding motion vector, with the option of
   grouping common motion vectors over 16x8, 8x16, or 8x8 partitions
   within the MB.  Note that the B_PRED and SPLITMV modes in VP9 work
   in the same way as they do in VP8.

2.2.  Prediction Modes

   VP9 supports the following prediction modes for the various block
   sizes:

2.2.1.  INTRA Modes

   At block size 4x4, VP9 supports ten INTRA prediction modes: DC,
   Vertical, Horizontal, TM (True Motion), Horizontal Up, Left
   Diagonal, Vertical Right, Vertical Left, Right Diagonal, and
   Horizontal Down (the same set defined by VP8).  For blocks from 8x8
   to 64x64 there is also support for ten INTRA modes: DC, Vertical,
   Horizontal, TM (True Motion), and six angular predictors
   corresponding, approximately, to angles of 27, 45, 63, 117, 135,
   and 153 degrees.  Furthermore, there is the option, signaled in the
   bitstream, of applying a low-pass filter to the prediction.

2.2.2.  INTER Modes

   VP9 currently supports INTER prediction from up to three reference
   frame buffers (named LAST_FRAME, GOLDEN_FRAME, and ALTREF_FRAME, as
   in VP8), but for any particular frame the three available
   references are dynamically selectable from a pool of eight stored
   reference frames.  A syntax element in the frame header indicates
   which subset of three reference buffers is available when encoding
   the frame.  A further syntax element indicates which of the three
   frame buffers, if any, are to be updated at the end of encoding a
   frame.  Some coded frames may be designated as invisible, in the
   sense that they are only ever used as a reference and never
   actually displayed, akin to the ALTREF frame in VP8.  It is also
   likely that the number of available working reference buffers will
   be increased from three to four in the final VP9 bitstream.
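   As a rough illustration of this buffer management, the following C
   sketch models a pool of eight stored frames from which working
   references are chosen, with an end-of-frame refresh step.  All
   names here (RefSyntax, refresh_mask, and so on) are hypothetical,
   not the actual VP9 syntax:

      /* Hypothetical model of the reference-buffer pool described
       * above: eight stored frames, three working references per
       * coded frame, and a per-frame refresh step. */
      typedef struct Frame Frame;

      #define REF_POOL_SIZE 8   /* stored reference frames        */
      #define ACTIVE_REFS   3   /* working references per frame   */

      typedef struct {
          int      active_ref_idx[ACTIVE_REFS]; /* pool slots used
                                                   as LAST, GOLDEN,
                                                   and ALTREF      */
          unsigned refresh_mask;  /* bit i set => update pool[i]  */
      } RefSyntax;

      /* After a frame is coded, overwrite the pool slots that the
       * frame header marked for update with the new
       * reconstruction. */
      static void end_of_frame_update(Frame *pool[REF_POOL_SIZE],
                                      const RefSyntax *hdr,
                                      Frame *decoded)
      {
          for (int i = 0; i < REF_POOL_SIZE; i++)
              if (hdr->refresh_mask & (1u << i))
                  pool[i] = decoded;
      }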
   Each INTER-coded block within a frame may be coded using up to two
   motion vectors referencing two different buffers out of the three
   working reference buffers selected for the frame.  When a single
   motion vector is used, sub-pixel interpolation from the indicated
   reference frame buffer is used to obtain the predictor.  When two
   motion vectors, mv1 and mv2, are conveyed for a given block, the
   corresponding reference frame buffers ref1 and ref2 must be
   different from each other, and the final predictor is then obtained
   by averaging the individual predictors from each of the motion
   vectors, i.e.,

      P[i, j] = floor((P_mv1,ref1[i, j] + P_mv2,ref2[i, j] + 1) / 2)

   where P[i, j] is the predictor value at pixel location [i, j], and
   P_mv1,ref1 and P_mv2,ref2 are the INTER predictors corresponding to
   the two motion vectors and reference buffers conveyed.
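   In 8-bit integer arithmetic, this rounded average is one shift per
   pixel.  A minimal sketch, with illustrative names and an assumed
   common stride for all three blocks:

      #include <stdint.h>

      /* Compound INTER predictor: rounded average of the two
       * single-reference predictors p1 and p2 over a w x h block. */
      static void compound_average(const uint8_t *p1,
                                   const uint8_t *p2,
                                   uint8_t *dst, int stride,
                                   int w, int h)
      {
          for (int i = 0; i < h; i++)
              for (int j = 0; j < w; j++)
                  dst[i * stride + j] =
                      (uint8_t)((p1[i * stride + j] +
                                 p2[i * stride + j] + 1) >> 1);
      }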
2.2.3.  Compound INTER-INTRA Mode

   A further prediction mode under consideration is a combined INTER/
   INTRA mode.  In this mode, an INTER predictor and an INTRA
   predictor are combined in such a way that pixels closer to the
   INTRA prediction edge (top or left) are weighted more heavily
   towards the INTRA predictor, whilst pixels further away from the
   edges are weighted more heavily towards the INTER predictor.  The
   exact weights used for each pixel thus depend on the particular
   INTRA prediction direction in use.  Conceptually, each INTRA
   prediction mode at a given block size is associated with a constant
   weighting block of the same size that provides the weight for the
   INTRA predictor relative to the INTER predictor.  For instance, if
   the weighting matrix for a given INTRA mode m and block size n is
   given by an nxn matrix Wm, with values between 0 and 1, then the
   predictor of pixel [i, j], denoted P[i, j], is obtained by:

      P[i, j] = Wm[i, j] * Pm[i, j] + (1 - Wm[i, j]) * P_mv,ref[i, j]

   where Pm is the INTRA predictor for the given INTRA mode, and
   P_mv,ref is the INTER predictor obtained using motion vector mv and
   reference frame index ref.  This mode is restricted to one motion
   vector per block and to blocks of size 16x16 and above, i.e.,
   MB/SB32/SB64.  The weighting matrix may be obtained from a 1-D
   exponential decay function of the form A + B*exp(-Kx), where x
   represents the distance along the prediction direction to the
   nearest left/top edge.
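   In a fixed-point implementation, the real-valued weights Wm would
   typically be quantized; the sketch below assumes 8-bit weights in
   which 256 represents 1.0, with all names illustrative:

      #include <stdint.h>

      /* Per-pixel INTER/INTRA blend over an n x n block.  w holds
       * a quantized weighting matrix Wm: values near 255 favor the
       * INTRA predictor (pixels near the prediction edge), decaying
       * with distance from that edge. */
      static void blend_inter_intra(const uint8_t *intra,
                                    const uint8_t *inter,
                                    const uint8_t *w,
                                    uint8_t *dst, int n)
      {
          for (int k = 0; k < n * n; k++)
              dst[k] = (uint8_t)((w[k] * intra[k] +
                                  (256 - w[k]) * inter[k] + 128) >> 8);
      }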
2.3.  Sub-Pixel Interpolation

   The filters used for sub-pixel interpolation of fractional motion
   are critical to the performance of a video codec.  The maximum
   motion vector precision supported is 1/8-pixel, with the option of
   switching between 1/4-pixel and 1/8-pixel precision using a frame-
   level flag.  Even when 1/8-pixel precision is enabled for a frame,
   it is only used for small motion, depending on the magnitude of the
   reference motion vector.  Larger motion, indicated by a larger
   reference vector, is almost always accompanied by motion blur,
   which obviates the need for higher-precision interpolation.

   VP9 defines a family of three 8-tap filters, selectable at either
   the frame or macroblock level in the bitstream:

   o  8-tap Regular: an 8-tap Lagrangian interpolation filter designed
      using the intfilt function in MATLAB,

   o  8-tap Sharp: a DCT-based interpolation filter with a sharper
      response, used mostly around sharp edges,

   o  8-tap Smooth (non-interpolating): a smoothing filter designed
      using the windowed-Fourier-series approach with a Hamming
      window.  Note that, unlike the other two filters, this filter is
      non-interpolating in the sense that the prediction at integer-
      pixel-aligned locations is a smoothed version of the reference
      frame pixels.

2.4.  Transforms

   VP9 supports the Discrete Cosine Transform (DCT) at sizes 4x4, 8x8,
   16x16, and 32x32 and removes the second-order transform that was
   employed in VP8.  Only transform sizes equal to, or smaller than,
   the prediction block size may be specified.  Modes B_PRED and 4x4
   SPLITMV are thus restricted to using only the 4x4 transform; modes
   I8X8_PRED and non-4x4 SPLITMV can use either the 4x4 or 8x8
   transform; full-size (16x16) macroblock predictors can be coupled
   with the 4x4, 8x8, or 16x16 transforms; and superblocks can use any
   transform size up to 32x32.  Further restrictions on the available
   subset of transforms can be signaled at the frame level, by
   specifying a maximum allowable transform size, or at the macroblock
   level, by explicitly signaling which of the available transform
   sizes is used.

   In addition, VP9 introduces support for a new transform type, the
   Asymmetric Discrete Sine Transform (ADST), which can be used in
   combination with specific INTRA prediction modes.  It has been
   shown in [Han-Icassp] and [Han-Itip] that when a one-sided boundary
   is available, as in most INTRA prediction modes, the ADST rather
   than the DCT is the optimal transform for the residual signal.
   INTRA prediction modes that predict from a left edge can use the
   1-D ADST in the horizontal direction, combined with a 1-D DCT in
   the vertical direction.  Similarly, the residual signal resulting
   from INTRA prediction modes that predict from the top edge can
   employ a vertical 1-D ADST combined with a horizontal 1-D DCT.
   INTRA prediction modes that predict from both edges, such as the
   True Motion (TM_PRED) mode and some diagonal INTRA prediction
   modes, use the 1-D ADST in both the horizontal and vertical
   directions.
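   The rule above amounts to a small mode-to-transform mapping, as in
   this sketch.  The enum values are illustrative, and only three
   representative modes are shown:

      /* Pick the 1-D transform for each direction: ADST along a
       * direction whose prediction boundary is available, DCT
       * otherwise.  Hypothetical names; not the VP9 syntax. */
      typedef enum { TX_DCT, TX_ADST } Tx1D;
      typedef enum {
          V_PRED,   /* predicts from the top edge  */
          H_PRED,   /* predicts from the left edge */
          TM_PRED   /* predicts from both edges    */
      } IntraMode;

      static void pick_transform(IntraMode m, Tx1D *vert, Tx1D *horz)
      {
          switch (m) {
          case H_PRED:  *horz = TX_ADST; *vert = TX_DCT;  break;
          case V_PRED:  *horz = TX_DCT;  *vert = TX_ADST; break;
          case TM_PRED: *horz = TX_ADST; *vert = TX_ADST; break;
          default:      *horz = TX_DCT;  *vert = TX_DCT;  break;
          }
      }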
2.5.  Motion Vector Reference Selection and Coding

   One of the most critical factors in the efficiency of motion vector
   encoding is the generation of a suitable reference motion vector to
   be used as a predictor.  VP9 creates a sorted list of candidate
   reference motion vectors that encompasses the three vectors best,
   nearest, and near as defined by VP8.  In addition to the candidates
   produced by the VP8 algorithm, VP9 also evaluates the motion vector
   of the co-located block in the reference frame and those of nearby
   blocks.  VP9 introduces a new scoring mechanism to rank these
   reference vectors, whereby each candidate is evaluated to determine
   how well it would have predicted the reconstructed pixels in close
   proximity to the current block (specifically, a small number of
   rows immediately above the current block and, possibly, a small
   number of columns to its left).  A predictor is created using each
   candidate vector in turn to displace the pixels in the reference
   frame, and the variance of the resulting error signal, with respect
   to that set of pixels in the current frame, is used to rank the
   reference vectors.

   With the three best candidate reference vectors best, nearest, and
   near identified, the encoder can either signal the use of the
   vector identified as nearest (NEAREST_MV mode) or near (NEAR_MV
   mode) or, if neither of them is deemed appropriate, signal the use
   of a completely new motion vector (NEW_MV mode) that is then
   specified as a delta from the best reference candidate.

   One further mode, ZERO_MV, signals the use of the (0, 0) motion
   vector.

   In addition, a more efficient motion vector offset encoding
   mechanism has been introduced.

2.6.  Entropy Coding and Adaptation

   The VP9 bitstream employs the VP8 BoolCoder as the underlying
   arithmetic coder.  Generally speaking, given a symbol from an n-ary
   alphabet, a static binary tree is constructed with n-1 internal
   nodes, and a binary arithmetic coder is run at each such node as
   the tree is traversed to encode a particular symbol.  The
   probabilities at each node use 8-bit precision.  The set of n-1
   probabilities for coding the symbol is referred to as the entropy
   coding context of the symbol.  Almost all of the coding elements
   conveyed in the bitstream, including modes, motion vectors,
   reference frames, and prediction residuals for each transform type
   and size, use this strategy.

   Video content is inherently highly non-stationary, and a critical
   component of any codec is the mechanism used to track the
   statistics of the various encoded symbols and update the parameters
   of the entropy coding contexts to match.  VP9 makes use of forward
   context updates through flags in the frame header that signal
   modifications of the coding contexts at the start of each frame.
   The syntax for forward updates is designed to allow an arbitrary
   subset of the node probabilities to be updated whilst leaving the
   others unchanged.  The advantage of forward adaptation is that
   decoding performance can be substantially improved, because no
   intermediate computation based on encountered token counts is
   necessary.  Updates are encoded differentially to allow a more
   efficient specification of updated coding contexts, which is
   essential given the expanded set of tokens available in VP9.

   In addition, there is a limited option for signaling backward
   adaptation, which in VP9 is only applied at the end of encoding
   each frame so that the impact on decoding speed is minimal.
   Specifically, for every frame encoded, a forward update first
   modifies the entropy coding contexts for the various symbols,
   starting from the state at the beginning of the frame.  Thereafter,
   all symbols encoded in the frame are coded using this modified
   coding state.  At the end of the frame, both the encoder and the
   decoder are expected to have accumulated counts for the various
   symbols actually encoded or decoded over the frame.  Using these
   actual distributions, a backward update step is applied to adapt
   the entropy coding contexts for use as the baseline for the next
   frame.
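   As an illustration of the tree-coding strategy, the following
   decoder-side sketch is patterned after the tree reader of the VP8
   decoder described in [RFC6386]; the read_bool primitive, which
   decodes one binary decision against an 8-bit probability, is
   assumed:

      #include <stdint.h>

      typedef int8_t tree_index;
      typedef struct BoolDecoder BoolDecoder;

      /* Assumed primitive: decode one binary decision whose
       * probability of being 0 is prob/256. */
      extern int read_bool(BoolDecoder *d, uint8_t prob);

      /* Decode one n-ary symbol by walking a static binary tree
       * with n-1 internal nodes.  Positive entries index the two
       * children of a node; negative entries hold the negated
       * symbol value at a leaf.  probs supplies one 8-bit
       * probability per internal node -- the symbol's entropy
       * coding context. */
      static int tree_read(BoolDecoder *d, const tree_index *tree,
                           const uint8_t *probs)
      {
          tree_index i = 0;
          while ((i = tree[i + read_bool(d, probs[i >> 1])]) > 0)
              ;
          return -i;  /* leaf values are stored negated */
      }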
2.7.  Loop Filter

   VP9 introduces a variety of new prediction block and transform
   sizes that require additional loop-filtering options to handle a
   larger number of combinations of boundary types.  VP9 also
   incorporates a flatness detector in the loop filter that detects
   flat regions and varies the filter strength and size accordingly.

2.8.  Segmentation

   VP9 introduces more advanced segmentation features that make
   segmentation considerably more efficient and powerful, allowing
   each superblock or macroblock to specify a segment ID to which it
   belongs.  For each segment, the frame header can then convey common
   features that are applied to all MBs/SB32s/SB64s belonging to the
   same segment ID.  Further, the segmentation map is coded
   differentially across frames in order to minimize the signaling
   overhead.  Examples of information that can be conveyed for a
   segment include: restrictions on the reference frames that can be
   used, coefficient skips, quantizer and loop filter strength, and
   transform size options.  Generally speaking, the segmentation
   mechanism provides a flexible set of tools that can be used, in an
   application-specific way, to target improvements in perceptual
   quality for a given compression ratio.

   In the reference implementation, segmentation is currently used to
   identify background and foreground areas in encoded video content.
   The (static) background is then coded at a higher quality than the
   rest of the frame in certain reference frames (such as the alt-ref
   frame) that provide prediction persisting over a number of frames.
   In contrast, for the frames between these persistent reference
   frames, the background is given fewer bits by, for example,
   restricting the set of available reference buffers, using only the
   ZERO_MV coding mode, or skipping the residual coefficient block.
   The result is that more bits are available to code the foreground
   portion of the scene, while still preserving very good perceptual
   quality on the static background.  Other use cases involving
   spatial and temporal masking for perceptual-quality improvement are
   conceivable.
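   The per-segment features listed above could be modeled as in the
   following sketch; the structure, the field names, and the assumed
   limit of eight segments are illustrative, not the actual VP9
   syntax:

      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical per-segment feature set, mirroring the
       * examples given above. */
      typedef struct {
          unsigned allowed_ref_mask;   /* reference frames usable
                                          by this segment         */
          bool     skip_coeffs;        /* force coefficient skip  */
          int8_t   quant_delta;        /* quantizer adjustment    */
          int8_t   loop_filter_delta;  /* loop filter strength    */
          int8_t   max_tx_size;        /* transform size cap      */
      } SegmentFeatures;

      /* The frame header would convey one feature set per segment;
       * each MB/SB32/SB64 then carries (differentially coded
       * across frames) the ID of the segment it belongs to. */
      #define MAX_SEGMENTS 8           /* assumed                 */
      typedef struct {
          SegmentFeatures seg[MAX_SEGMENTS];
      } FrameSegmentation;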
3.  Bitstream Features

   In addition to providing high compression efficiency at reasonable
   complexity, the VP9 bitstream includes features designed to support
   a variety of specific use cases that are important to Internet
   video delivery and consumption.  This section provides an overview
   of these features.

3.1.  Error Resilience

   For communication of conversational video with low latency over an
   unreliable network, it is imperative to support a coding mode in
   which decoding can continue without errors even when arbitrary
   frames are lost.  Specifically, the arithmetic decoder should still
   be able to decode symbols correctly in frames subsequent to lost
   frames, even though frame buffers have been corrupted, leading to
   encoder-decoder mismatch.  The hope is that the drift between the
   encoder and decoder will remain manageable until a key frame is
   sent or other corrective action (such as reference picture
   selection) can be taken.  VP9 supports a frame-level
   error_resilient_mode flag that, when turned on, allows only those
   coding modes for which this is achievable.  In particular, the
   following restrictions are imposed in error-resilient mode:

   1.  The entropy coding context probabilities are reset to defaults
       at the beginning of each frame.  (This effectively prevents
       propagation of forward updates as well as backward updates.)

   2.  For motion vector reference selection, the co-located motion
       vector from a previously encoded reference frame can no longer
       be included in the reference candidate list.

   3.  For motion vector reference selection, sorting of the initial
       list of motion vector reference candidates based on search in
       the reference frame buffer is disabled.

   These restrictions produce a modest performance drop.

3.2.  Parallel Decodability

   Smooth encoding and playback of high-definition video in software
   on resource-constrained personal devices (smartphones, tablets,
   netbooks, etc.) necessitates exploiting some form of parallelism,
   so that multi-threaded applications can be built around the codec
   to exploit the inherent multi-processing capabilities of modern
   processors.  This may include the ability to encode/decode parts of
   a frame in parallel, the ability to decode successive frames in
   parallel, or a combination of both.  VP9 supports both forms of
   parallelism, as described below.

3.2.1.  Frame-Level Parallelism

   A frame-level flag, frame_parallel_mode, when turned on, enables an
   encoding mode in which the entropy decoding for successive frames
   can be conducted in a quasi-parallel manner just by parsing the
   frame headers, before these frames actually need to be
   reconstructed.  In this mode, only the frame headers need to be
   decoded sequentially.  Beyond that, the entropy decoding for each
   frame can be conducted in a lagged parallel manner, as long as the
   co-located motion vector information from a previous reference
   frame has been decoded prior to the current frame.  The
   reconstruction of the frames can then be conducted sequentially in
   coding order as they are required for display.  This mode enables
   multi-threaded decoder implementations that result in smoother
   playback.  Specifically, this mode imposes the following
   restrictions on the bitstream, which are a subset of the
   restrictions for error-resilient mode:

   1.  Backward entropy coding context updates are disabled, but
       forward updates are allowed to propagate.

   2.  For motion vector reference selection, sorting of the initial
       list of motion vector reference candidates based on a search in
       the reference frame buffer is disabled.  However, the
       co-located motion vector from a previously encoded reference
       frame can be included in the initial candidate list.

3.2.2.  Tiling

   In addition to making provision for decoding multiple frames in
   parallel, VP9 also supports decoding a single frame using multiple
   threads.  For this, VP9 introduces tiles: independently coded and
   decodable sub-units of the video frame.  When enabled, a frame can
   be split into, for example, two or four column-based tiles.  Each
   tile shares the same frame entropy model, but all contexts and
   pixel values (for INTRA prediction) that cross tile boundaries take
   the same values as those at the left, top, or right edge of the
   frame.  Each tile can thus be encoded and decoded completely
   independently, which is expected to enable significant speedups in
   multi-threaded encoders/decoders without introducing any additional
   latency.
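   A minimal sketch of the decoder-side threading this enables, with
   one worker per column tile; TileContext and decode_tile are assumed
   names for per-tile state and a per-tile decode routine:

      #include <assert.h>
      #include <pthread.h>

      typedef struct TileContext TileContext;   /* per-tile state */
      extern void decode_tile(TileContext *tc); /* assumed        */

      static void *tile_worker(void *arg)
      {
          decode_tile((TileContext *)arg);
          return NULL;
      }

      /* Because tiles are coded independently, each one can be
       * handed to its own thread; the frame is complete when all
       * of the workers have joined. */
      static void decode_frame_tiles(TileContext **tiles, int n_tiles)
      {
          pthread_t th[4];      /* e.g., 2 or 4 column tiles */
          assert(n_tiles <= 4);
          for (int i = 0; i < n_tiles; i++)
              pthread_create(&th[i], NULL, tile_worker, tiles[i]);
          for (int i = 0; i < n_tiles; i++)
              pthread_join(th[i], NULL);
      }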
   Note that loop filtering across tile edges can still be applied,
   assuming a decoder implementation model in which the loop-filtering
   operation lags the decoder's reconstruction of the individual tiles
   within the frame, so that it never uses a pixel that has not
   already been reconstructed.  Further, backward entropy adaptation,
   a lightweight operation, can still be conducted for the whole frame
   after entropy decoding of all tiles has finished.

3.3.  Scalability

   The VP9 bitstream will provide a number of flexible features that
   can be combined in specific ways to efficiently provide various
   forms of scalability.  VP9 increases the number of available
   reference frame buffers to eight, from which three may be selected
   for each frame.  In addition, each coded frame may be resampled and
   coded at a resolution different from that of the reference buffers,
   allowing internal spatial resolution changes on the fly without
   having to resort to keyframes.  When such a resolution change is
   signaled in the bitstream, the reference buffers as well as the
   corresponding motion vector information are suitably transformed to
   the new resolution before the standard coding tools are applied.
   Furthermore, VP9 maintains four different entropy coding contexts
   that can be selected and optionally updated on every frame, making
   it possible for the encoder to use a different entropy coding
   context for each scalable layer, if required.  Together, these
   flexible features enable an encoder/decoder to implement various
   forms of coarse-grained scalability, including temporal, spatial,
   or combined spatio-temporal scalability, without explicitly
   creating spatially scalable encoding modes.

4.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as
   an RFC.

5.  Security Considerations

   The VP9 bitstream offers no security functions.  Integrity and
   confidentiality must be ensured by functions outside the bitstream.

   The VP9 bitstream does not offer functions for embedding other
   types of objects, either active or passive, so this class of attack
   cannot be mounted using VP9.

   Implementations of codecs are often written with a strong focus on
   speed.  The reference software has been carefully vetted for
   security issues, but no guarantees can be given.  Those who run
   decoder software from other parties should take appropriate care
   when executing it in a security-sensitive context.

6.  Acknowledgements

   This document is heavily based on the paper "Towards a Next
   Generation Open-source Video Codec" by Bankoski, Bultje, Grange,
   Gu, Han, Koleszar, Mukherjee, Wilkins, and Xu [vp9-paper].

7.  Informative References

   [Google-webm]
              The WebM Project, "WebM project website",
              http://www.webmproject.org/.

   [Han-Icassp]
              Han, J., "Towards jointly optimal spatial prediction and
              adaptive transform in video/image coding", IEEE Int.
              Conf. on Acoustics, Speech and Signal Processing
              (ICASSP), pp. 726-729, March 2010.

   [Han-Itip] Han, J., "Jointly optimized spatial prediction and block
              transform for video and image coding", IEEE Transactions
              on Image Processing, vol. 21, pp. 1874-1884, April 2012.

   [RFC6386]  Bankoski, J., Koleszar, J., Quillio, L., Salonen, J.,
              Wilkins, P., and Y. Xu, "VP8 Data Format and Decoding
              Guide", RFC 6386, November 2011.

   [vp9-paper]
              Bankoski, J., Bultje, R., Grange, A., Gu, Q., Han, J.,
              Koleszar, J., Mukherjee, D., Wilkins, P., and Y. Xu,
              "Towards a Next Generation Open-source Video Codec",
              IS&T / SPIE EI Conference on Visual Information
              Processing and Communication IV, February 5-7, 2013.

Authors' Addresses

   Adrian Grange
   Google

   Email: agrange@google.com


   Harald Alvestrand
   Google

   Email: hta@google.com