NETVC (Internet Video Codec)                                          Y.
Cho
Internet-Draft                                       Mozilla Corporation
Intended status: Informational                         November 14, 2016
Expires: May 18, 2017

                       Applying PVQ Outside Daala
                      draft-cho-netvc-applypvq-03

Abstract

   This document describes the use of Perceptual Vector Quantization
   (PVQ) outside of the Daala video codec, where PVQ was originally
   developed.  It discusses the issues that arise when integrating PVQ
   into a traditional video codec, AV1.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 18, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Background  . . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Integration of PVQ into a non-Daala codec, AV1  . . . . . . .   3
     2.1.  Signaling Skip for Partition and Transform Block  . . . .   4
     2.2.  Issues  . . . . . . . . . . . . . . . . . . . . . . . . .   5
   3.  Performance of PVQ in AV1 . . . . . . . . . . . . . . . . . .   5
     3.1.  Coding Gain . . . . . . . . . . . . . . . . . . . . . . .   5
     3.2.  Speed . . . . . . . . . . . . . . . . . . . . . . . . . .   7
   4.  Future Work . . . . . . . . . . . . . . . . . . . . . . . . .   8
   5.  Development Repository  . . . . . . . . . . . . . . . . . . .   8
   6.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   8
   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .   9
     8.1.  Informative References  . . . . . . . . . . . . . . . . .   9
     8.2.  URIs  . . . . . . . . . . . . . . . . . . . . . . . . . .   9
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  10

1.  Background

   Perceptual Vector Quantization (PVQ)
   [Perceptual-VQ][I-D.valin-netvc-pvq] has been proposed as a
   quantization and coefficient coding tool for an internet video
   codec.  PVQ was originally developed for the Daala video codec [1]
   [PVQ-demo], which performs gain-shape coding of transform
   coefficients instead of the more traditional scalar quantization.
   (The original expansion of PVQ, "Pyramid Vector Quantizer", as in
   [I-D.valin-netvc-pvq], is now commonly given as "Perceptual Vector
   Quantization".)

   The most distinguishing idea of PVQ is the way it references a
   predictor.  With PVQ, the predictor is not subtracted from the input
   to produce a residual that is then transformed and coded.  Instead,
   both the predictor and the input are transformed into the frequency
   domain.  Then, PVQ applies a reflection to both the predictor and
   the input such that the prediction vector lies on one of the
   coordinate axes, and codes the angle between them.

   By not subtracting the predictor from the input, the gain of the
   predictor can be preserved and explicitly coded, which is one of the
   benefits of PVQ.  Since DC is not quantized by PVQ, the gain can be
   viewed as the amount of contrast in an image, which is an important
   perceptual parameter.

   In addition, an input block of transform coefficients is split into
   frequency bands based on spatial orientation and scale, and each
   band is quantized by PVQ separately.  The 'gain' of a band, which is
   simply its L2 norm, indicates the amount of contrast in the
   corresponding orientation and scale.  The gain is non-linearly
   companded, then scalar quantized and coded.  The remaining
   information in the band, the 'shape', is then defined as a point on
   the surface of a unit hypersphere.

   Another benefit of PVQ is activity masking based on the gain, which
   automatically controls the quantization resolution according to the
   image contrast, without any signaling.  For example, in a smooth
   image area (i.e., low contrast and thus low gain), the quantization
   resolution increases, so fewer quantization errors are visible.  A
   succinct summary of the benefits of PVQ can be found in Section 2.4
   of [Terriberry_16].

   Since PVQ has only been used in the Daala video codec, which
   contains many non-traditional design elements, there has been no
   opportunity to measure the relative coding performance of PVQ
   against scalar quantization in a more traditional codec design.  We
   have applied PVQ in the AV1 video codec, which is currently being
   developed by the Alliance for Open Media (AOM) as an open-source and
   royalty-free video codec.

   While most of the benefits of PVQ arise from improved subjective
   video quality, compression results with activity masking enabled are
   not yet available in this draft, because the required parameters,
   which were tuned for Daala, have not yet been adjusted for AV1.  The
   results reported here were obtained by optimizing solely for PSNR.

2.  Integration of PVQ into a non-Daala codec, AV1

   Adopting PVQ in AV1 requires replacing both the scalar quantization
   step and the coefficient coding of AV1 with those of PVQ.  In terms
   of the inputs to PVQ and the usage of transforms, as shown in
   Figure 1 and Figure 2, the biggest conceptual changes required in a
   traditional coding system, such as AV1, are:

   o  The introduction of a transformed predictor, in both the encoder
      and the decoder.  For this, we apply a forward transform to the
      predictors, both intra-predicted pixels and inter-predicted
      (i.e., motion-compensated) pixels.  This is required because PVQ
      references the predictor in the transform domain, instead of
      using a pixel-domain residual as in traditional scalar
      quantization.

   o  The absence of a difference signal (i.e., residual) defined as
      "input source - predictor".  Hence, AV1 with PVQ performs no
      subtraction to make the input reference the predictor; instead,
      PVQ references the predictor in the transform domain.
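   The dataflow difference summarized in these bullets can be sketched
   minimally as follows.  T() here is a trivial stand-in transform so
   that the sketch stays runnable, and all function names are
   illustrative rather than the actual AV1 API.

```c
#include <assert.h>

static int T(int v)     { return 2 * v; }  /* stand-in forward transform */
static int T_inv(int v) { return v / 2; }  /* stand-in inverse transform */

/* Traditional decoder: inverse-transform the coded residual and add
 * the predictor back in the pixel domain. */
static int decode_traditional(int coded_t_r, int predictor) {
  return predictor + T_inv(coded_t_r);
}

/* PVQ-style decoder: the coded data already represents T(X), so the
 * decoder must forward-transform the predictor to dequantize against
 * T(P), then inverse-transform the result directly; there is no
 * pixel-domain addition. */
static int decode_pvq_style(int coded_t_x, int predictor) {
  int t_p = T(predictor);  /* forward transform needed in the decoder */
  (void)t_p;               /* consumed by the real PVQ dequantizer    */
  return T_inv(coded_t_x);
}
```

   The key point is the extra forward transform of the predictor on the
   decoder side, which a traditional decoder never needs.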
   input X --> +-------------+                    +-------------+
               | Subtraction | -->  residue  -->  | Transform T |
 predictor --> +-------------+     signal R       +-------------+
     P                                                   |
     |                                                   v
     v                                                 T(R)
    [+]--> decoded X                                     |
     ^                                                   v
     |     +-----------+    +-----------+          +-----------+
  decoded  | Inverse   |    | Inverse   |          | Scalar    |
     R  <--| Transform |<-- | Quantizer |<-- +-----| Quantizer |
           +-----------+    +-----------+    |     +-----------+
                                             v
                                      +-------------+
                       bitstream  <-- | Coefficient |
                       of coded T(R)  | Coder       |
                                      +-------------+

       Figure 1: Traditional architecture containing Quantization and
                                 Transforms

            +-------------+             +-----------+
   input X->| Transform T |--> T(X) --> | PVQ       |     +-------------+
            +-------------+        +--> | Quantizer |---> | PVQ         |
            +-------------+        |    +-----------+     | Coefficient |
 predictor->| Transform T |--> T(P)+         |            | Coder       |
     P      +-------------+        |         v            +-------------+
                                   |    +-----------+            |
                                   +--> | PVQ       |            v
                                        | Inverse   |        bitstream
                                        | Quantizer |        of coded T(X)
                                        +-----------+
                                             |
            +-----------+                    v
 decoded X<-| Inverse   | <--------- dequantized T(X)
            | Transform |
            +-----------+

                          Figure 2: AV1 with PVQ

2.1.  Signaling Skip for Partition and Transform Block

   In AV1, the skip flag for a partition block is true if all of the
   quantized coefficients in the partition are zero.  The signaling of
   the prediction mode for a partition cannot be skipped.  If the skip
   flag is true then, as in AV1, the predicted pixels (plus frame-wise
   in-loop filtering such as deblocking) are the final decoded pixels,
   so with PVQ a forward transform of the predictor is not required.

   While AV1 currently defines only one 'skip' flag for each
   'partition' (the unit where prediction is done), PVQ introduces
   another kind of 'skip' flag, called 'ac_dc_coded', which is defined
   for each transform block (and thus for each Y'CbCr plane as well).
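   One way to model the two levels of skip signaling is sketched below.
   The four-state encoding of ac_dc_coded (a DC bit plus an AC bit) is
   an assumption for illustration, and the names do not match the
   actual AV1/PVQ sources.

```c
#include <assert.h>

/* Hypothetical per-transform-block flag: which parts are coded. */
enum AcDcCoded {
  PVQ_SKIP    = 0,  /* neither DC nor AC coded for this block */
  DC_CODED    = 1,  /* only DC coded                          */
  AC_CODED    = 2,  /* only AC coefficients coded             */
  AC_DC_CODED = 3   /* both DC and AC coded                   */
};

/* The partition-level skip flag can only be true when every transform
 * block in the partition ended up fully skipped. */
static int partition_skip(const enum AcDcCoded *blocks, int n_blocks) {
  for (int i = 0; i < n_blocks; i++)
    if (blocks[i] != PVQ_SKIP) return 0;
  return 1;
}
```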

   AV1 allows a transform size smaller than the partition size, so a
   partition can contain multiple transform blocks.  The ac_dc_coded
   flag signals whether the DC and/or AC coefficients of a transform
   block are coded by PVQ (though PVQ itself does not quantize the DC
   coefficient).

2.2.  Issues

   o  PVQ has its own rate-distortion optimization (RDO) that differs
      from that of traditional scalar quantization.  This leads to a
      different balance of quality between luma and chroma than with
      scalar quantization.  When AV1 performs scalar quantization of a
      block of coefficients, RDO, such as trellis coding, can
      optionally be performed.  The second pass of 2-pass encoding in
      AV1 currently uses trellis coding, and when doing so it appears
      that a different scaling factor is applied to each of the Y'CbCr
      channels.

   o  In AV1, to optimize speed, there are inverse transforms that can
      skip applying certain 1D basis functions based on the
      distribution of quantized coefficients.  However, this is mostly
      not possible with PVQ, since the inverse transform is applied
      directly to a dequantized input, instead of to a dequantized
      difference (i.e., input source - predictor) as in a traditional
      video codec.  This is true for both the encoder and the decoder.

   o  PVQ was originally designed for the 2D DCT, while AV1 also uses
      hybrid 2D transforms consisting of a 1D DCT and a 1D ADST.  This
      requires PVQ to have new coefficient scanning orders for the two
      new 2D transforms, DCT-ADST and ADST-DCT (ADST-ADST uses the same
      scan order as DCT-DCT).  These new scan orders have been derived
      from those of AV1, for each PVQ-defined band of the new 2D
      transforms.

3.  Performance of PVQ in AV1

3.1.  Coding Gain

   With the encoding options specified by both NETVC [2] and AOM for
   high-latency testing, PVQ gives coding efficiency similar to that of
   AV1, as measured in PSNR BD-rate.
Again, PVQ's activity
   masking is not enabled for this test.  Note also that scalar
   quantization has matured over decades, while video coding with PVQ
   is much more recent.

   We compare the coding efficiency on the IETF test sequence set
   "objective-1-fast" defined in [3], which consists of sixteen 1080p,
   seven 720p, and seven 640x360 sequences of various types of content,
   including slow/high motion of people and objects, animation,
   computer games, and screen casting.  The encoding is done for the
   first 30 frames of each sequence.  The encoding options used are
   "--end-usage=q --cq-level=x --passes=2 --good --cpu-used=0
   --auto-alt-ref=2 --lag-in-frames=25 --limit=30", which is the
   official IETF and AOM test condition for high-latency encoding,
   except for the limit to 30 frames.

   For comparison purposes, some of the lambda values used in RDO were
   adjusted to match the balance of luma and chroma quality of
   PVQ-enabled AV1 to that of current AV1:

   o  Use half the value of lambda during intra prediction for the
      chroma channels.

   o  Scale PVQ's lambda by 0.9 for the chroma channels.

   o  Do not do RDO of DC for the chroma channels.

   The results are shown in Table 1, which gives the BD-rate change for
   several image quality metrics.  (The encoders used to generate these
   results are available from the author's git repository [4] and AOM's
   repository [5].)

                  +-----------+----------------------+
                  | Metric    | AV1 --> AV1 with PVQ |
                  +-----------+----------------------+
                  | PSNR      |               -0.17% |
                  |           |                      |
                  | PSNR-HVS  |                0.27% |
                  |           |                      |
                  | SSIM      |                0.93% |
                  |           |                      |
                  | MS-SSIM   |                0.14% |
                  |           |                      |
                  | CIEDE2000 |               -0.28% |
                  +-----------+----------------------+

                    Table 1: Coding Gain by PVQ in AV1

3.2.  Speed

   Total encoding time increases roughly 20 times or more when
   intensive RDO options, such as "--passes=2 --good --cpu-used=0
   --auto-alt-ref=2 --lag-in-frames=25", are enabled.  The significant
   increase in encoding time is due to the additional computation
   performed by PVQ.  PVQ tries to find asymptotically optimal
   codepoints (in the RD optimization sense) on a hypersphere with a
   greedy search, which has time complexity close to O(n*n) for n
   coefficients, while scalar quantization has time complexity of O(n).

   Compared to Daala, the search space for an RDO decision in AV1 is
   far larger, because AV1 considers ten intra prediction modes and
   four different transforms (for transform block sizes 4x4, 8x8, and
   16x16 only), and the transform block size can be smaller than the
   prediction block size.  Since the largest transform and prediction
   sizes in AV1 are currently 32x32 and 64x64, respectively, PVQ can be
   called approximately 5,160 times more often in AV1 than in Daala.
   In addition, AV1 applies the transform and quantization to each
   candidate considered by RDO.

   As an example, AV1 calls the PVQ function 632,520 times to encode
   the grandma_qcif (176x144) clip in intra frame mode, while Daala
   calls it only 3,843 times (for QP = 30 and 39 for AV1 and Daala,
   respectively, which corresponds to an actual quantizer of 38 being
   used for quantization).  PVQ is thus called about 165 times more
   often in AV1 than in Daala.

   Table 2 shows the frequency of function calls to the PVQ and scalar
   quantizers in AV1 at each speed level (where the AV1 encoding mode
   is 'good') for the same sequence and QP as in the example above.
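   The greedy search whose cost dominates these numbers can be sketched
   as follows.  This is a simplified, distortion-only version, not the
   actual pvq_search_rdo_double(); the function name is illustrative.
   Each of k pulses scans all n positions, giving roughly O(n*k), i.e.,
   close to O(n*n) when k is proportional to n.

```c
#include <assert.h>
#include <math.h>

/* Distribute k unit pulses over n coefficients so that the pulse
 * vector y matches the direction of x: for each pulse, pick the
 * position that maximizes the squared correlation <|x|,y>^2 over the
 * energy <y,y>.  Simplified sketch without rate terms. */
static void pvq_greedy_search(const double *x, int n, int k, int *y) {
  double xy = 0.0, yy = 0.0;  /* running correlation and energy */
  for (int i = 0; i < n; i++) y[i] = 0;
  for (int p = 0; p < k; p++) {
    int best = 0;
    double best_score = -1.0;
    for (int j = 0; j < n; j++) {
      /* Score of adding one pulse at position j. */
      double nxy = xy + fabs(x[j]);
      double nyy = yy + 2.0 * y[j] + 1.0;
      double score = nxy * nxy / nyy;
      if (score > best_score) { best_score = score; best = j; }
    }
    xy += fabs(x[best]);
    yy += 2.0 * y[best] + 1.0;
    y[best]++;
  }
  /* Pulses take the sign of the matching coefficient. */
  for (int i = 0; i < n; i++) if (x[i] < 0.0) y[i] = -y[i];
}
```

   Scalar quantization, by contrast, rounds each coefficient
   independently in a single O(n) pass.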

   The first column indicates the speed level, the second column shows
   the number of calls to AV1's block quantizer, the third column shows
   the number of calls to PVQ quantization of a transform block (the
   function od_pvq_encode() in [7]), and the fourth column shows the
   number of calls to PVQ's search inside each band (the function
   pvq_search_rdo_double() in [6]).  A smaller speed level gives slower
   encoding but better quality at the same rate, by doing more RDO
   optimization.  The major difference from speed level 4 to 3 is
   enabling the use of transform blocks smaller than the prediction
   (i.e., partition) block.

   +--------+-----------------+-----------------+----------------------+
   | Speed  | # of calls to   | # of calls to   | # of calls to PVQ    |
   | Level  | AV1 quantizer   | PVQ quantizer   | search inside a band |
   +--------+-----------------+-----------------+----------------------+
   | 5      |          28,028 |          26,786 |              365,913 |
   |        |                 |                 |                      |
   | 4      |          57,445 |          56,980 |              472,222 |
   |        |                 |                 |                      |
   | 3      |         505,039 |         564,724 |            3,680,366 |
   |        |                 |                 |                      |
   | 2      |         505,039 |         564,724 |            3,680,366 |
   |        |                 |                 |                      |
   | 1      |         535,100 |         580,566 |            3,990,327 |
   |        |                 |                 |                      |
   | 0      |         589,931 |         632,520 |            4,109,113 |
   +--------+-----------------+-----------------+----------------------+

        Table 2: Comparison of the Frequency of Calls to the PVQ and
                         Scalar Quantizers in AV1

4.  Future Work

   Possible future work includes:

   o  Enabling activity masking, which also needs an HVS-tuned
      quantization matrix (band-wise QP scalars).

   o  Adjusting the balance between luma and chroma quality, tuning for
      subjective quality.

   o  Optimizing the speed of the PVQ code, including adding SIMD.

   o  RDO with more model-driven decision making, instead of a full
      transform + quantization for each candidate.

5.  Development Repository

   The ongoing work of integrating PVQ into the AV1 video codec is
   located at the git repository [8].

6.  Acknowledgements

   Thanks to Tim Terriberry for his proofreading and valuable comments.
   Thanks also to Guillaume Martres for his contributions to
   integrating PVQ into AV1 during his internship at Mozilla, and to
   Thomas Daede for providing and maintaining the testing
   infrastructure by way of the www.arewecompressedyet.com (AWCY)
   website [9].

7.  IANA Considerations

   This memo includes no request to IANA.

8.  References

8.1.  Informative References

   [I-D.valin-netvc-pvq]
              Valin, J., "Pyramid Vector Quantization for Video
              Coding", draft-valin-netvc-pvq-00 (work in progress),
              June 2015.

   [Perceptual-VQ]
              Valin, JM. and TB. Terriberry, "Perceptual Vector
              Quantization for Video Coding", Proceedings of SPIE
              Visual Information Processing and Communication,
              February 2015.

   [PVQ-demo] Valin, JM., "Daala: Perceptual Vector Quantization
              (PVQ)", November 2014.

   [Terriberry_16]
              Terriberry, TB., "Perceptually-Driven Video Coding with
              the Daala Video Codec", Proceedings SPIE Volume 9971,
              Applications of Digital Image Processing XXXIX,
              September 2016.

8.2.  URIs

   [1] https://xiph.org/daala/

   [2] https://tools.ietf.org/html/draft-ietf-netvc-testing-03

   [3] https://tools.ietf.org/html/draft-ietf-netvc-testing-03

   [4] https://github.com/ycho/aom/
       commit/2478029a9b6d02ee2ccc9dbafe7809b5ef345814

   [5] https://aomedia.googlesource.com/
       aom/+/59848c5c797ddb6051e88b283353c7562d3a2c24

   [6] https://github.com/ycho/aom/blob/14981eebb4a08f74182cea3c17f7361b
       c79cf04f/av1/encoder/pvq_encoder.c#L84

   [7] https://github.com/ycho/aom/blob/14981eebb4a08f74182cea3c17f7361b
       c79cf04f/av1/encoder/pvq_encoder.c#L763

   [8] https://github.com/ycho/aom/tree/av1_pvq

   [9] https://arewecompressedyet.com/

Author's Address

   Yushin Cho
   Mozilla Corporation
   331 E. Evelyn Avenue
   Mountain View, CA 94041
   USA

   Phone: +1 650 903 0800
   Email: ycho@mozilla.com