Network Working Group                                          JM. Valin
Internet-Draft                                                   Mozilla
Intended status: Standards Track                            July 6, 2015
Expires: January 7, 2016

        Screencasting Considerations and L1-Tree Wavelet Coding
                       draft-valin-netvc-l1tw-01

Abstract

   This document proposes a screencasting encoding mode based on the
   Haar wavelet transform and L1-tree wavelet (L1TW) coding.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 7, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may not be modified, and derivative works of it may not
   be created, and it may not be published except as an Internet-Draft.

Table of Contents

   1.  Introduction
   2.  The Haar Wavelet
   3.  L1-Tree Coding
   4.  Results
   5.  Objective Evaluation
   6.  Development Repository
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. Informative References
   Author's Address

1.  Introduction

   Screensharing is an important application for an Internet video
   codec.  Screensharing content differs from photographic images in
   many ways, including:

   o  Text: Screenshots often contain anti-aliased text on a perfectly
      flat background.  This makes ringing artefacts highly perceptible.
      Also, typical photographic codecs based on the discrete cosine
      transform (DCT) cannot take advantage of the fact that the
      background often has a constant colour.

   o  Lines and edges: Screenshots often contain perfectly straight
      horizontal and/or vertical lines.  They appear in window frames,
      toolbars, widgets, spreadsheets, etc.  DCT-based codecs can
      represent those lines and edges, but not as compactly as formats
      such as PNG.

   o  Reduced number of colours: Screenshots are much less "noisy" than
      photographic images.  It is common for a region of an image to
      contain only a handful of different colours, a property we would
      like to exploit in a video codec.

   o  Motion: A very common motion pattern in screensharing content is
      the displacement of windows.  This typically involves rectangular
      boundaries.

   The technique described in this document deals only with still
   images for now and focuses on the problem of efficiently coding
   anti-aliased text.  While it is implemented for the Daala
   [Daala-website] codec, it should be applicable to most other video
   codecs.

2.  The Haar Wavelet

   The Haar wavelet <https://en.wikipedia.org/wiki/Haar_wavelet> is the
   simplest of all orthogonal wavelets, and also the only one with
   linear phase.  We use the Haar transform both because it is spatially
   compact and because it makes it easy to switch between a wavelet
   transform and the DCT.

   In 1-D, a single level of the Haar transform is expressed as:

                [ y0 ]                 [  1  1 ] [ x0 ]
                [    ] = sqrt(1/2)  *  [       ] [    ]
                [ y1 ]                 [ -1  1 ] [ x1 ]
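
   As an illustration only, the 1-D transform above can be sketched in
   floating-point Python.  The function names haar_1d and
   haar_1d_inverse are hypothetical and do not come from the Daala
   code.  The inverse is simply the transpose of the matrix, which is
   valid because the transform is orthogonal.

```python
import math

SCALE = math.sqrt(0.5)  # the sqrt(1/2) factor of the 1-D Haar matrix


def haar_1d(x0, x1):
    """One level of the 1-D Haar transform on a pair of samples."""
    y0 = SCALE * (x0 + x1)   # first matrix row:  [ 1 1]
    y1 = SCALE * (x1 - x0)   # second matrix row: [-1 1]
    return y0, y1


def haar_1d_inverse(y0, y1):
    """Inverse transform (transpose of the orthogonal forward matrix)."""
    x0 = SCALE * (y0 - y1)
    x1 = SCALE * (y0 + y1)
    return x0, x1
```

   Because the matrix is orthogonal, the energy x0^2 + x1^2 of the
   input equals the energy y0^2 + y1^2 of the output, up to
   floating-point rounding.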

   The 2-D Haar transform is implemented from a 2x2 lifting Haar kernel:

                          inputs: x0, x1, x2, x3
                          x0 <= x0 + x2
                          x3 <= x3 - x1
                          tmp <= (x0 - x3) >> 1
                          x1 <= tmp - x1
                          x2 <= tmp - x2
                          x0 <= x0 - x1
                          x3 <= x3 + x2
                          outputs: x0, x1, x2, x3

   This kernel has perfect reconstruction, making it also useful for
   lossless compression.
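
   The perfect-reconstruction property can be checked with a small
   Python sketch.  The forward function transcribes the lifting steps
   above.  The inverse is not given in this document; it is derived
   here by undoing the lifting steps in reverse order, and the function
   names are hypothetical.  Python's ">>" on integers is an arithmetic
   (flooring) shift, which is assumed to match the intended behaviour
   for negative values.

```python
def haar2x2_forward(x0, x1, x2, x3):
    """2x2 lifting Haar kernel, transcribed from the steps in the text."""
    x0 = x0 + x2
    x3 = x3 - x1
    tmp = (x0 - x3) >> 1
    x1 = tmp - x1
    x2 = tmp - x2
    x0 = x0 - x1
    x3 = x3 + x2
    return x0, x1, x2, x3


def haar2x2_inverse(x0, x1, x2, x3):
    """Assumed inverse: each forward lifting step undone in reverse order."""
    x3 = x3 - x2              # undo x3 <= x3 + x2
    x0 = x0 + x1              # undo x0 <= x0 - x1
    tmp = (x0 - x3) >> 1      # same tmp as in the forward pass
    x2 = tmp - x2             # undo x2 <= tmp - x2
    x1 = tmp - x1             # undo x1 <= tmp - x1
    x3 = x3 + x1              # undo x3 <= x3 - x1
    x0 = x0 - x2              # undo x0 <= x0 + x2
    return x0, x1, x2, x3
```

   Since every step only adds or subtracts a value that the inverse can
   recompute exactly, the round trip is lossless for any integer input,
   including negative values.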

   The kernel above is applied on 5 levels for 32x32 superblocks.  The
   resulting wavelet coefficients are quantized non-uniformly using the
   following quantization scales relative to the DC quantizer (from low
   frequency to high frequency):

              horizontal/vertical: [1.0, 1.0, 1.0, 1.5, 2.0]
              diagonal:            [1.0, 1.0, 1.5, 2.0, 3.0]
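
   One possible reading of these scales is sketched in Python.  The
   two tables are taken from the text.  Everything else is an
   assumption made for illustration: the function names, the
   convention that level 0 is the lowest-frequency level and level 4
   the highest, and the round-to-nearest quantization rule.  The actual
   Daala quantizer may differ.

```python
# Scales relative to the DC quantizer, from low to high frequency.
HV_SCALES = [1.0, 1.0, 1.0, 1.5, 2.0]    # horizontal/vertical bands
DIAG_SCALES = [1.0, 1.0, 1.5, 2.0, 3.0]  # diagonal bands


def quant_step(dc_quantizer, level, diagonal):
    """Step size for a band; level 0 is assumed to be the lowest frequency."""
    scales = DIAG_SCALES if diagonal else HV_SCALES
    return dc_quantizer * scales[level]


def quantize(coeff, step):
    """Round-to-nearest quantization (assumed rule, not from the draft)."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step + 0.5)


def dequantize(index, step):
    """Reconstruct a coefficient value from its quantization index."""
    return index * step
```

   With a DC quantizer of 16, for example, the highest-frequency
   diagonal band would use a step of 48 under these assumptions, so
   fine diagonal detail is quantized three times more coarsely than
   the DC.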

3.  L1-Tree Coding

   Like other wavelet coding methods such as EZW and SPIHT, we code the
   wavelet coefficients using trees.  The main difference, however, is
   that rather than being based on the maximum coefficient value in a
   tree, this technique is based on the sum of the absolute values of
   all coefficients in the tree.  Let x(i,j) denote the quantized
   wavelet coefficient at position (i,j).  The children of x(i,j) are
   x(2*i,2*j), x(2*i,2*j+1), x(2*i+1,2*j), and x(2*i+1,2*j+1).  The
   absolute sum of the tree rooted in (i,j) is defined recursively as:

               S(i,j) = |x(i,j)| + S(2*i,2*j) + S(2*i,2*j+1)
                      + S(2*i+1,2*j) + S(2*i+1,2*j+1),

   with S(i,j)=0 for i or j >= N, where N is the size of the NxN block
   of coefficients.  C(i,j), the absolute sum of all descendants of
   x(i,j), is defined as S(i,j) - |x(i,j)|.
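
   The recursive definitions of S(i,j) and C(i,j) can be sketched
   directly in Python.  This is an illustration under assumptions, not
   the Daala implementation: the coefficients are stored as a square
   list of lists indexed as x[i][j], and the function names are
   hypothetical.  The position (0,0) is rejected because it would be
   its own child under the 2*i, 2*j rule; it holds the DC.

```python
def tree_sum(x, i, j):
    """S(i,j): sum of |x| over the tree rooted at (i,j)."""
    n = len(x)
    if i == 0 and j == 0:
        raise ValueError("(0,0) is the DC and is not a tree root")
    if i >= n or j >= n:
        return 0  # S(i,j) = 0 outside the block
    total = abs(x[i][j])
    for di in (0, 1):
        for dj in (0, 1):
            total += tree_sum(x, 2 * i + di, 2 * j + dj)
    return total


def children_sum(x, i, j):
    """C(i,j) = S(i,j) - |x(i,j)|: absolute sum of all descendants."""
    return tree_sum(x, i, j) - abs(x[i][j])
```

   For a 4x4 block, the three trees rooted at (1,0), (0,1), and (1,1)
   together cover every coefficient except the DC, so their sums add
   up to the total absolute value of all non-DC coefficients.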

   Coefficient coding starts at the root of each of the three "direction
   trees": (1,0), (0,1), and (1,1).  At each level, we code the value
   of |x(i,j)| using a cumulative distribution function adapted based
   on the value of S(i,j).  Because S(i,j) is already known to the
   decoder, coding |x(i,j)| implies that the value of C(i,j) is also
   known to the decoder, so it does not need to be coded.  The sums of
   the four new roots, S(2*i,2*j), S(2*i,2*j+1), S(2*i+1,2*j), and
   S(2*i+1,2*j+1), add up to C(i,j), so only three symbols are
   required to encode them.

   At the top level, we have S(0,0) = S(1,0) + S(0,1) + S(1,1), so that
   completely flat blocks can be coded with a single S(0,0)=0 symbol.
   The DC is coded separately.
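
   The following Python sketch shows why these symbols are sufficient
   to recover every coefficient magnitude.  It only lists the integer
   symbols in traversal order and performs no entropy coding, no
   probability adaptation, and no sign coding.  Several details are
   assumptions made for illustration: the function names, the
   depth-first symbol order, the choice of which sum is left implicit,
   the skipping of all-zero subtrees, and a block size that is a power
   of two.

```python
ROOTS = [(1, 0), (0, 1), (1, 1)]  # the three direction trees


def kids(i, j):
    return [(2 * i + di, 2 * j + dj) for di in (0, 1) for dj in (0, 1)]


def tree_sum(x, i, j):
    """S(i,j), with S = 0 outside the block."""
    if i >= len(x) or j >= len(x):
        return 0
    return abs(x[i][j]) + sum(tree_sum(x, a, b) for a, b in kids(i, j))


def encode_magnitudes(x):
    sums = [tree_sum(x, i, j) for i, j in ROOTS]
    symbols = [sum(sums)]                 # S(0,0)
    if symbols[0] > 0:
        symbols += sums[:2]               # S(1,1) is implied by S(0,0)
    for (i, j), s in zip(ROOTS, sums):
        _encode_tree(x, i, j, s, symbols)
    return symbols


def _encode_tree(x, i, j, s, symbols):
    if s == 0 or 2 * i >= len(x) or 2 * j >= len(x):
        return                            # zero tree, or leaf with |x| = S
    symbols.append(abs(x[i][j]))          # |x(i,j)|, which also gives C(i,j)
    ksums = [tree_sum(x, a, b) for a, b in kids(i, j)]
    if s - abs(x[i][j]) > 0:
        symbols += ksums[:3]              # the fourth sum is implied by C(i,j)
    for (a, b), ks in zip(kids(i, j), ksums):
        _encode_tree(x, a, b, ks, symbols)


def decode_magnitudes(symbols, n):
    it = iter(symbols)
    mag = [[0] * n for _ in range(n)]     # mag[0][0] (the DC) is left at 0
    total = next(it)
    sums = [0, 0, 0]
    if total > 0:
        s10, s01 = next(it), next(it)
        sums = [s10, s01, total - s10 - s01]
    for (i, j), s in zip(ROOTS, sums):
        _decode_tree(mag, i, j, s, it)
    return mag


def _decode_tree(mag, i, j, s, it):
    if s == 0:
        return
    if 2 * i >= len(mag) or 2 * j >= len(mag):
        mag[i][j] = s                     # leaf: the magnitude is S itself
        return
    mag[i][j] = next(it)
    c = s - mag[i][j]                     # C(i,j), never transmitted
    ksums = [0, 0, 0, 0]
    if c > 0:
        first = [next(it), next(it), next(it)]
        ksums = first + [c - sum(first)]
    for (a, b), ks in zip(kids(i, j), ksums):
        _decode_tree(mag, a, b, ks, it)
```

   In this sketch, an all-zero block produces the single symbol 0, and
   decoding the symbol list of any block returns the magnitudes of all
   its non-DC coefficients.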

4.  Results

   The coded images obtained with the Haar transform and L1TW have far
   better subjective visual quality than those obtained with the lapped
   DCT or with JPEG, and are of comparable quality to those obtained
   with x264 <http://www.videolan.org/developers/x264.html> and x265
   <http://x265.org/>.  An example image at around 0.35 bit/pixel is
   provided at <http://jmvalin.ca/video/haar_example/>.  The x264 image
   is encoded with options "--preset placebo --crf=27" and the x265
   image is encoded with "--preset slow --crf=29".

   While the technique presented here works relatively well on the
   example above, there are still cases where it performs significantly
   worse than x265.  These include gradients, such as those in toolbars
   and window titlebars, and long horizontal and vertical lines such as
   those found in spreadsheets.  These cases should improve once we
   implement the ability to dynamically switch between the lapped DCT
   and the Haar transform.  Other ways of improving performance on long
   lines and edges would be to use a different 2-D wavelet
   decomposition or an overcomplete basis.

5.  Objective Evaluation

   As a first step for evaluating screensharing quality, we have added a
   small collection of screenshot images to the "Are We Compressed Yet?"
   (AWCY) <https://arewecompressedyet.com/> website, under the
   "screenshots" set name.  AWCY currently runs four quality metrics:
   PSNR, PSNR-HVS, SSIM, and FAST-SSIM [I-D.daede-netvc-testing].  It is
   not yet clear that any of these metrics is suitable for evaluating
   the quality of screensharing material.
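
   For reference, the simplest of these metrics, PSNR, can be sketched
   in Python from its standard definition.  This is not the AWCY
   implementation.  The function name, the flat-list input format, and
   the default peak value of 255 (8-bit samples) are assumptions made
   for illustration.

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak * peak / mse)
```

   PSNR weights every sample error equally, so it does not distinguish
   ringing around text from the same amount of error spread over a
   noisy region.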

6.  Development Repository

   The algorithms in this proposal are being developed as part of
   Xiph.Org's Daala project.  The code is available in the Daala git
   repository at <https://git.xiph.org/daala.git>.  See [Daala-website]
   for more information.

7.  IANA Considerations

   This document makes no request of IANA.

8.  Security Considerations

   This draft has no security considerations.

9.  Acknowledgements

   Thanks to Timothy B. Terriberry for useful feedback and for
   designing the 2-D Haar lifting kernel.

10.  Informative References

   [Daala-website]
              "Daala website", Xiph.Org Foundation,
              <https://xiph.org/daala/>.

   [I-D.daede-netvc-testing]
              Daede, T. and J. Jack, "Video Codec Testing and Quality
              Measurement", draft-daede-netvc-testing-00 (work in
              progress), March 2015.

Author's Address

   Jean-Marc Valin
   Mozilla
   331 E. Evelyn Avenue
   Mountain View, CA  94041
   USA

   Email: jmvalin@jmvalin.ca