NETVC (Internet Video Codec)                                          Y.
Cho
Internet-Draft                                       Mozilla Corporation
Intended status: Informational                         November 14, 2016
Expires: May 18, 2017

                       Applying PVQ Outside Daala
                      draft-cho-netvc-applypvq-03

Abstract

   This document describes the use of Perceptual Vector Quantization
   (PVQ) outside of the Daala video codec, where PVQ was originally
   developed.  It discusses the issues that arise when integrating PVQ
   into a traditional video codec, AV1.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 18, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Background  . . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Integration of PVQ into a non-Daala codec, AV1  . . . . . . .   3
     2.1.  Signaling Skip for Partition and Transform Block  . . . .   4
     2.2.  Issues  . . . . . . . . . . . . . . . . . . . . . . . . .   5
   3.  Performance of PVQ in AV1 . . . . . . . . . . . . . . . . . .   5
     3.1.  Coding Gain . . . . . . . . . . . . . . . . . . . . . . .   5
     3.2.  Speed . . . . . . . . . . . . . . . . . . . . . . . . . .   7
   4.  Future Work . . . . . . . . . . . . . . . . . . . . . . . . .   8
   5.  Development Repository  . . . . . . . . . . . . . . . . . . .   8
   6.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   8
   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .   9
     8.1.  Informative References  . . . . . . . . . . . . . . . . .   9
     8.2.  URIs  . . . . . . . . . . . . . . . . . . . . . . . . . .   9
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  10

1.  Background

   Perceptual Vector Quantization (PVQ)
   [Perceptual-VQ][I-D.valin-netvc-pvq] has been proposed as a
   quantization and coefficient coding tool for an internet video
   codec.  PVQ was originally developed for the Daala video codec [1]
   [PVQ-demo], which performs gain-shape coding of transform
   coefficients instead of the more traditional scalar quantization.
   (The original expansion of PVQ, "Pyramid Vector Quantizer", as in
   [I-D.valin-netvc-pvq], is now commonly given as "Perceptual Vector
   Quantization".)

   The most distinguishing idea of PVQ is the way it references a
   predictor.  With PVQ, the predictor is not subtracted from the input
   to produce a residual that is then transformed and coded.  Instead,
   both the predictor and the input are transformed into the frequency
   domain.  Then, PVQ applies a reflection to both the predictor and
   the input such that the prediction vector lies on one of the
   coordinate axes, and codes the angle between them.

   By not subtracting the predictor from the input, the gain of the
   predictor can be preserved and explicitly coded, which is one of the
   benefits of PVQ.  Since DC is not quantized by PVQ, the gain can be
   viewed as the amount of contrast in an image, which is an important
   perceptual parameter.

   In addition, an input block of transform coefficients is split into
   frequency bands based on spatial orientation and scale, and each
   band is quantized by PVQ separately.  The 'gain' of a band, which is
   simply its L2 norm, indicates the amount of contrast in the
   corresponding orientation and scale.  The gain is non-linearly
   companded, then scalar quantized and coded.  The remaining
   information in the band, the 'shape', is then defined as a point on
   the surface of a unit hypersphere.

   Another benefit of PVQ is activity masking based on the gain, which
   automatically controls the quantization resolution according to the
   image contrast, without any signaling.  For example, in a smooth
   image area (i.e., low contrast and thus low gain), the quantization
   resolution increases, so fewer quantization errors are visible.  A
   succinct summary of the benefits of PVQ can be found in Section 2.4
   of [Terriberry_16].

   Since PVQ has only been used in the Daala video codec, which
   contains many non-traditional design elements, there has been no
   opportunity to measure the relative coding performance of PVQ
   against scalar quantization in a more traditional codec design.  We
   have applied PVQ in the AV1 video codec, which is currently being
   developed by the Alliance for Open Media (AOM) as an open-source and
   royalty-free video codec.

   While most of the benefits of PVQ arise from improved subjective
   video quality, compression results with activity masking enabled are
   not yet available in this draft, because the required parameters,
   which were tuned for Daala, have not yet been adjusted for AV1.  The
   results reported here were obtained by optimizing solely for PSNR.

2.  Integration of PVQ into a non-Daala codec, AV1

   Adopting PVQ in AV1 requires replacing both the scalar quantization
   step and the coefficient coding of AV1 with those of PVQ.  In terms
   of the inputs to PVQ and the usage of transforms, as shown in
   Figure 1 and Figure 2, the biggest conceptual changes required in a
   traditional coding system, such as AV1, are:

   o  The introduction of a transformed predictor, in both the encoder
      and the decoder.  For this, we apply a forward transform to the
      predictors, both intra-predicted pixels and inter-predicted
      (i.e., motion-compensated) pixels.  This is required because PVQ
      references the predictor in the transform domain, instead of
      using a pixel-domain residual as in traditional scalar
      quantization.

   o  The absence of a difference signal (i.e., residual) defined as
      "input source - predictor".  Hence, AV1 with PVQ performs no
      subtraction to make the input reference the predictor; instead,
      PVQ references the predictor in the transform domain.
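   The dataflow difference summarized in these bullets can be sketched
   minimally as follows.  T() here is a trivial stand-in transform so
   that the sketch stays runnable, and all function names are
   illustrative rather than the actual AV1 API.

```c
#include <assert.h>

static int T(int v)     { return 2 * v; }  /* stand-in forward transform */
static int T_inv(int v) { return v / 2; }  /* stand-in inverse transform */

/* Traditional decoder: inverse-transform the coded residual and add
 * the predictor back in the pixel domain. */
static int decode_traditional(int coded_t_r, int predictor) {
  return predictor + T_inv(coded_t_r);
}

/* PVQ-style decoder: the coded data already represents T(X), so the
 * decoder must forward-transform the predictor to dequantize against
 * T(P), then inverse-transform the result directly; there is no
 * pixel-domain addition. */
static int decode_pvq_style(int coded_t_x, int predictor) {
  int t_p = T(predictor);  /* forward transform needed in the decoder */
  (void)t_p;               /* consumed by the real PVQ dequantizer    */
  return T_inv(coded_t_x);
}
```

   The key point is the extra forward transform of the predictor on the
   decoder side, which a traditional decoder never needs.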
   input X --> +-------------+                    +-------------+
               | Subtraction | -->  residue  -->  | Transform T |
 predictor --> +-------------+     signal R       +-------------+
     P                                                   |
     |                                                   v
     v                                                 T(R)
    [+]--> decoded X                                     |
     ^                                                   v
     |     +-----------+    +-----------+          +-----------+
  decoded  | Inverse   |    | Inverse   |          | Scalar    |
     R  <--| Transform |<-- | Quantizer |<-- +-----| Quantizer |
           +-----------+    +-----------+    |     +-----------+
                                             v
                                      +-------------+
                       bitstream  <-- | Coefficient |
                       of coded T(R)  | Coder       |
                                      +-------------+

       Figure 1: Traditional architecture containing Quantization and
                                 Transforms

            +-------------+             +-----------+
   input X->| Transform T |--> T(X) --> | PVQ       |     +-------------+
            +-------------+        +--> | Quantizer |---> | PVQ         |
            +-------------+        |    +-----------+     | Coefficient |
 predictor->| Transform T |--> T(P)+         |            | Coder       |
     P      +-------------+        |         v            +-------------+
                                   |    +-----------+            |
                                   +--> | PVQ       |            v
                                        | Inverse   |        bitstream
                                        | Quantizer |        of coded T(X)
                                        +-----------+
                                             |
            +-----------+                    v
 decoded X<-| Inverse   | <--------- dequantized T(X)
            | Transform |
            +-----------+

                          Figure 2: AV1 with PVQ

2.1.  Signaling Skip for Partition and Transform Block

   In AV1, the skip flag for a partition block is true if all of the
   quantized coefficients in the partition are zero.  The signaling of
   the prediction mode for a partition cannot be skipped.  If the skip
   flag is true then, as in AV1, the predicted pixels (plus frame-wise
   in-loop filtering such as deblocking) are the final decoded pixels,
   so with PVQ a forward transform of the predictor is not required.

   While AV1 currently defines only one 'skip' flag for each
   'partition' (the unit where prediction is done), PVQ introduces
   another kind of 'skip' flag, called 'ac_dc_coded', which is defined
   for each transform block (and thus for each Y'CbCr plane as well).
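   One way to model the two levels of skip signaling is sketched below.
   The four-state encoding of ac_dc_coded (a DC bit plus an AC bit) is
   an assumption for illustration, and the names do not match the
   actual AV1/PVQ sources.

```c
#include <assert.h>

/* Hypothetical per-transform-block flag: which parts are coded. */
enum AcDcCoded {
  PVQ_SKIP    = 0,  /* neither DC nor AC coded for this block */
  DC_CODED    = 1,  /* only DC coded                          */
  AC_CODED    = 2,  /* only AC coefficients coded             */
  AC_DC_CODED = 3   /* both DC and AC coded                   */
};

/* The partition-level skip flag can only be true when every transform
 * block in the partition ended up fully skipped. */
static int partition_skip(const enum AcDcCoded *blocks, int n_blocks) {
  for (int i = 0; i < n_blocks; i++)
    if (blocks[i] != PVQ_SKIP) return 0;
  return 1;
}
```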

   AV1 allows a transform size smaller than the partition size, so a
   partition can contain multiple transform blocks.  The ac_dc_coded
   flag signals whether the DC and/or AC coefficients of a transform
   block are coded by PVQ (though PVQ itself does not quantize the DC
   coefficient).

2.2.  Issues

   o  PVQ has its own rate-distortion optimization (RDO) that differs
      from that of traditional scalar quantization.  This leads to a
      different balance of quality between luma and chroma than with
      scalar quantization.  When AV1 performs scalar quantization of a
      block of coefficients, RDO, such as trellis coding, can
      optionally be performed.  The second pass of 2-pass encoding in
      AV1 currently uses trellis coding, and when doing so it appears
      that a different scaling factor is applied to each of the Y'CbCr
      channels.

   o  In AV1, to optimize speed, there are inverse transforms that can
      skip applying certain 1D basis functions based on the
      distribution of quantized coefficients.  However, this is mostly
      not possible with PVQ, since the inverse transform is applied
      directly to a dequantized input, instead of to a dequantized
      difference (i.e., input source - predictor) as in a traditional
      video codec.  This is true for both the encoder and the decoder.

   o  PVQ was originally designed for the 2D DCT, while AV1 also uses
      hybrid 2D transforms consisting of a 1D DCT and a 1D ADST.  This
      requires PVQ to have new coefficient scanning orders for the two
      new 2D transforms, DCT-ADST and ADST-DCT (ADST-ADST uses the same
      scan order as DCT-DCT).  These new scan orders have been derived
      from those of AV1, for each PVQ-defined band of the new 2D
      transforms.

3.  Performance of PVQ in AV1

3.1.  Coding Gain

   With the encoding options specified by both NETVC [2] and AOM for
   high-latency testing, PVQ gives coding efficiency similar to that of
   AV1, as measured in PSNR BD-rate.
Again, PVQ's activity
   masking is not enabled for this test.  Note also that scalar
   quantization has matured over decades, while video coding with PVQ
   is much more recent.

   We compare the coding efficiency on the IETF test sequence set
   "objective-1-fast" defined in [3], which consists of sixteen 1080p,
   seven 720p, and seven 640x360 sequences of various types of content,
   including slow/high motion of people and objects, animation,
   computer games, and screen casting.  The encoding is done for the
   first 30 frames of each sequence.  The encoding options used are
   "--end-usage=q --cq-level=x --passes=2 --good --cpu-used=0
   --auto-alt-ref=2 --lag-in-frames=25 --limit=30", which is the
   official IETF and AOM test condition for high-latency encoding,
   except for the limit to 30 frames.

   For comparison purposes, some of the lambda values used in RDO were
   adjusted to match the balance of luma and chroma quality of
   PVQ-enabled AV1 to that of current AV1:

   o  Use half the value of lambda during intra prediction for the
      chroma channels.

   o  Scale PVQ's lambda by 0.9 for the chroma channels.

   o  Do not do RDO of DC for the chroma channels.

   The results are shown in Table 1, which gives the BD-rate change for
   several image quality metrics.  (The encoders used to generate these
   results are available from the author's git repository [4] and AOM's
   repository [5].)

                  +-----------+----------------------+
                  | Metric    | AV1 --> AV1 with PVQ |
                  +-----------+----------------------+
                  | PSNR      |               -0.17% |
                  |           |                      |
                  | PSNR-HVS  |                0.27% |
                  |           |                      |
                  | SSIM      |                0.93% |
                  |           |                      |
                  | MS-SSIM   |                0.14% |
                  |           |                      |
                  | CIEDE2000 |               -0.28% |
                  +-----------+----------------------+

                    Table 1: Coding Gain by PVQ in AV1

3.2.  Speed

   Total encoding time increases roughly 20 times or more when
   intensive RDO options, such as "--passes=2 --good --cpu-used=0
   --auto-alt-ref=2 --lag-in-frames=25", are enabled.  The significant
   increase in encoding time is due to the additional computation
   performed by PVQ.  PVQ tries to find asymptotically optimal
   codepoints (in the RD optimization sense) on a hypersphere with a
   greedy search, which has time complexity close to O(n*n) for n
   coefficients, while scalar quantization has time complexity of O(n).

   Compared to Daala, the search space for an RDO decision in AV1 is
   far larger, because AV1 considers ten intra prediction modes and
   four different transforms (for transform block sizes 4x4, 8x8, and
   16x16 only), and the transform block size can be smaller than the
   prediction block size.  Since the largest transform and prediction
   sizes in AV1 are currently 32x32 and 64x64, respectively, PVQ can be
   called approximately 5,160 times more often in AV1 than in Daala.
   In addition, AV1 applies the transform and quantization to each
   candidate considered by RDO.

   As an example, AV1 calls the PVQ function 632,520 times to encode
   the grandma_qcif (176x144) clip in intra frame mode, while Daala
   calls it only 3,843 times (for QP = 30 and 39 for AV1 and Daala,
   respectively, which corresponds to an actual quantizer of 38 being
   used for quantization).  PVQ is thus called about 165 times more
   often in AV1 than in Daala.

   Table 2 shows the frequency of function calls to the PVQ and scalar
   quantizers in AV1 at each speed level (where the AV1 encoding mode
   is 'good') for the same sequence and QP as in the example above.
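   The greedy search whose cost dominates these numbers can be sketched
   as follows.  This is a simplified, distortion-only version, not the
   actual pvq_search_rdo_double(); the function name is illustrative.
   Each of k pulses scans all n positions, giving roughly O(n*k), i.e.,
   close to O(n*n) when k is proportional to n.

```c
#include <assert.h>
#include <math.h>

/* Distribute k unit pulses over n coefficients so that the pulse
 * vector y matches the direction of x: for each pulse, pick the
 * position that maximizes the squared correlation <|x|,y>^2 over the
 * energy <y,y>.  Simplified sketch without rate terms. */
static void pvq_greedy_search(const double *x, int n, int k, int *y) {
  double xy = 0.0, yy = 0.0;  /* running correlation and energy */
  for (int i = 0; i < n; i++) y[i] = 0;
  for (int p = 0; p < k; p++) {
    int best = 0;
    double best_score = -1.0;
    for (int j = 0; j < n; j++) {
      /* Score of adding one pulse at position j. */
      double nxy = xy + fabs(x[j]);
      double nyy = yy + 2.0 * y[j] + 1.0;
      double score = nxy * nxy / nyy;
      if (score > best_score) { best_score = score; best = j; }
    }
    xy += fabs(x[best]);
    yy += 2.0 * y[best] + 1.0;
    y[best]++;
  }
  /* Pulses take the sign of the matching coefficient. */
  for (int i = 0; i < n; i++) if (x[i] < 0.0) y[i] = -y[i];
}
```

   Scalar quantization, by contrast, rounds each coefficient
   independently in a single O(n) pass.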

   The first column indicates the speed level, the second column shows
   the number of calls to AV1's block quantizer, the third column shows
   the number of calls to PVQ quantization of a transform block (the
   function od_pvq_encode() in [7]), and the fourth column shows the
   number of calls to PVQ's search inside each band (the function
   pvq_search_rdo_double() in [6]).  A smaller speed level gives slower
   encoding but better quality at the same rate, by doing more RDO
   optimization.  The major difference from speed level 4 to 3 is
   enabling the use of transform blocks smaller than the prediction
   (i.e., partition) block.

   +--------+-----------------+-----------------+----------------------+
   | Speed  | # of calls to   | # of calls to   | # of calls to PVQ    |
   | Level  | AV1 quantizer   | PVQ quantizer   | search inside a band |
   +--------+-----------------+-----------------+----------------------+
   | 5      |          28,028 |          26,786 |              365,913 |
   |        |                 |                 |                      |
   | 4      |          57,445 |          56,980 |              472,222 |
   |        |                 |                 |                      |
   | 3      |         505,039 |         564,724 |            3,680,366 |
   |        |                 |                 |                      |
   | 2      |         505,039 |         564,724 |            3,680,366 |
   |        |                 |                 |                      |
   | 1      |         535,100 |         580,566 |            3,990,327 |
   |        |                 |                 |                      |
   | 0      |         589,931 |         632,520 |            4,109,113 |
   +--------+-----------------+-----------------+----------------------+

        Table 2: Comparison of the Frequency of Calls to the PVQ and
                         Scalar Quantizers in AV1

4.  Future Work

   Possible future work includes:

   o  Enabling activity masking, which also needs an HVS-tuned
      quantization matrix (band-wise QP scalars).

   o  Adjusting the balance between luma and chroma quality, tuning for
      subjective quality.

   o  Optimizing the speed of the PVQ code, including adding SIMD.

   o  RDO with more model-driven decision making, instead of a full
      transform + quantization for each candidate.

5.  Development Repository

   The ongoing work of integrating PVQ into the AV1 video codec is
   located at the git repository [8].

6.  Acknowledgements

   Thanks to Tim Terriberry for his proofreading and valuable comments.
   Thanks also to Guillaume Martres for his contributions to
   integrating PVQ into AV1 during his internship at Mozilla, and to
   Thomas Daede for providing and maintaining the testing
   infrastructure by way of the www.arewecompressedyet.com (AWCY)
   website [9].

7.  IANA Considerations

   This memo includes no request to IANA.

8.  References

8.1.  Informative References

   [I-D.valin-netvc-pvq]
              Valin, J., "Pyramid Vector Quantization for Video
              Coding", draft-valin-netvc-pvq-00 (work in progress),
              June 2015.

   [Perceptual-VQ]
              Valin, JM. and TB. Terriberry, "Perceptual Vector
              Quantization for Video Coding", Proceedings of SPIE
              Visual Information Processing and Communication,
              February 2015.

   [PVQ-demo] Valin, JM., "Daala: Perceptual Vector Quantization
              (PVQ)", November 2014.

   [Terriberry_16]
              Terriberry, TB., "Perceptually-Driven Video Coding with
              the Daala Video Codec", Proceedings SPIE Volume 9971,
              Applications of Digital Image Processing XXXIX,
              September 2016.

8.2.  URIs

   [1] https://xiph.org/daala/

   [2] https://tools.ietf.org/html/draft-ietf-netvc-testing-03

   [3] https://tools.ietf.org/html/draft-ietf-netvc-testing-03

   [4] https://github.com/ycho/aom/
       commit/2478029a9b6d02ee2ccc9dbafe7809b5ef345814

   [5] https://aomedia.googlesource.com/
       aom/+/59848c5c797ddb6051e88b283353c7562d3a2c24

   [6] https://github.com/ycho/aom/blob/14981eebb4a08f74182cea3c17f7361b
       c79cf04f/av1/encoder/pvq_encoder.c#L84

   [7] https://github.com/ycho/aom/blob/14981eebb4a08f74182cea3c17f7361b
       c79cf04f/av1/encoder/pvq_encoder.c#L763

   [8] https://github.com/ycho/aom/tree/av1_pvq

   [9] https://arewecompressedyet.com/

Author's Address

   Yushin Cho
   Mozilla Corporation
   331 E. Evelyn Avenue
   Mountain View, CA 94041
   USA

   Phone: +1 650 903 0800
   Email: ycho@mozilla.com