Network Working Group                                          JM. Valin
Internet-Draft                                                   Mozilla
Intended status: Standards Track                            July 6, 2015
Expires: January 7, 2016


        Screencasting Considerations and L1-Tree Wavelet Coding
                       draft-valin-netvc-l1tw-01

Abstract

   This document proposes a screencasting encoding mode based on the
   Haar wavelet transform and L1-tree wavelet (L1TW) coding.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 7, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may not be modified, and derivative works of it may not
   be created, and it may not be published except as an Internet-Draft.

Table of Contents

   1.  Introduction
   2.  The Haar Wavelet
   3.  L1-Tree Coding
   4.  Results
   5.  Objective Evaluation
   6.  Development Repository
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. Informative References
   Author's Address

1.  Introduction

   Screensharing is an important application for an Internet video
   codec.  Screensharing content differs from photographic images in
   many ways, including:

   o  Text: screenshots often contain anti-aliased text on a perfectly
      flat background.  This makes ringing artefacts highly perceptible.
      Also, typical photographic codecs based on the discrete cosine
      transform (DCT) cannot take advantage of the fact that the
      background often has a constant colour.

   o  Lines and edges: screenshots often contain perfectly straight
      horizontal and/or vertical lines.  They appear in window frames,
      toolbars, widgets, spreadsheets, etc.  DCT-based codecs can
      represent those lines and edges, but not as compactly as codecs
      like PNG.

   o  Reduced number of colours: screenshots are much less "noisy" than
      photographic images.  It is common for a given region of an image
      to contain only a handful of different colours, another property
      we would like to exploit in a video codec.

   o  Motion: a very common motion pattern in screensharing content is
      the displacement of windows, which typically involves rectangular
      boundaries.

   The technique described in this document only deals with still images
   for now and focuses on the problem of efficiently coding anti-aliased
   text.  While it is implemented for the Daala [Daala-website] codec,
   it should be applicable to most other video codecs.

2.  The Haar Wavelet

   The Haar wavelet is the simplest of all orthogonal wavelets, and also
   the only compactly supported one with linear phase.
   We use the Haar transform both because it is spatially compact and
   because it makes it easy to switch between a wavelet transform and
   the DCT.

   In 1-D, a single level of the Haar transform is expressed as:

                     ___
      [ y0 ]        / 1  [  1  1 ] [ x0 ]
      [    ] =     / --- [       ] [    ]
      [ y1 ]      v   2  [ -1  1 ] [ x1 ]

   The 2-D Haar transform is implemented from a 2x2 lifting Haar kernel:

      inputs: x0, x1, x2, x3
      x0 <= x0 + x2
      x3 <= x3 - x1
      tmp <= (x0 - x3) >> 1
      x1 <= tmp - x1
      x2 <= tmp - x2
      x0 <= x0 - x1
      x3 <= x3 + x2
      outputs: x0, x1, x2, x3

   This kernel has perfect reconstruction, making it also useful for
   lossless compression.

   The kernel above is applied on 5 levels for 32x32 superblocks.  The
   resulting wavelet coefficients are quantized non-uniformly using the
   following quantization scales relative to the DC quantizer (from low
   frequency to high frequency):

      horizontal/vertical: [1.0, 1.0, 1.0, 1.5, 2.0]
      diagonal:            [1.0, 1.0, 1.5, 2.0, 3.0]

3.  L1-Tree Coding

   Like other wavelet coding methods such as EZW and SPIHT, we code the
   wavelet coefficients using trees.  The main difference, however, is
   that rather than being based on the maximum coefficient value in a
   tree, this technique is based on the sum of the absolute values of
   all coefficients in the tree.  Let x(i,j) denote the quantized
   wavelet coefficient at position (i,j).  The children of x(i,j) are
   x(2*i,2*j), x(2*i,2*j+1), x(2*i+1,2*j), and x(2*i+1,2*j+1).  The
   absolute sum of the tree rooted in (i,j) is defined recursively as:

      S(i,j) = |x(i,j)| + S(2*i,2*j) + S(2*i,2*j+1)
                        + S(2*i+1,2*j) + S(2*i+1,2*j+1),

   with S(i,j)=0 for i or j >= N, where N is the block size (32 for the
   superblocks of Section 2).  C(i,j) is defined as S(i,j) - |x(i,j)|,
   that is, the sum of the four child sums.

   Coefficient coding starts at the root of each of the three "direction
   trees": (1,0), (0,1), and (1,1).  At each level we code the value of
   |x(i,j)| using a cumulative distribution function adapted based on
   the value of S(i,j).
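
   As an illustration only, the lifting kernel of Section 2 and the
   sums S(i,j) and C(i,j) can be sketched in Python as follows.  This
   sketch is not taken from the Daala source.  The function names
   (haar_kernel_fwd, haar_kernel_inv, tree_sums) are hypothetical, the
   inverse kernel is an assumption obtained by reversing the lifting
   steps of the forward listing, and x is assumed to be a square N x N
   list of integer coefficients with N >= 2.

```python
def haar_kernel_fwd(x0, x1, x2, x3):
    """Forward 2x2 lifting Haar kernel, following the listing in Section 2."""
    x0 = x0 + x2
    x3 = x3 - x1
    tmp = (x0 - x3) >> 1  # Python's >> floors, also for negative values
    x1 = tmp - x1
    x2 = tmp - x2
    x0 = x0 - x1
    x3 = x3 + x2
    return x0, x1, x2, x3


def haar_kernel_inv(x0, x1, x2, x3):
    """Assumed inverse: undo the forward lifting steps in reverse order."""
    x3 = x3 - x2
    x0 = x0 + x1
    tmp = (x0 - x3) >> 1  # same intermediate value as in the forward pass
    x2 = tmp - x2
    x1 = tmp - x1
    x3 = x3 + x1
    x0 = x0 - x2
    return x0, x1, x2, x3


def tree_sums(x):
    """Return (S, C) for a square block x[i][j] of quantized coefficients.

    S[i][j] is the absolute sum of the tree rooted at (i,j) and
    C[i][j] = S[i][j] - |x[i][j]|.  The DC position (0,0) is skipped in the
    sweep; S[0][0] is set to the sum of the three direction-tree roots.
    """
    n = len(x)
    S = [[0] * n for _ in range(n)]
    C = [[0] * n for _ in range(n)]
    # Children always come later in raster order than their parent, so a
    # reverse raster sweep visits every child before its parent.
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if i == 0 and j == 0:
                continue
            children = ((2 * i, 2 * j), (2 * i, 2 * j + 1),
                        (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1))
            # Positions outside the block contribute S = 0.
            c = sum(S[ci][cj] for ci, cj in children if ci < n and cj < n)
            C[i][j] = c
            S[i][j] = abs(x[i][j]) + c
    S[0][0] = S[1][0] + S[0][1] + S[1][1]
    return S, C
```

   For example, haar_kernel_inv(*haar_kernel_fwd(5, -3, 7, 2)) returns
   the original inputs (5, -3, 7, 2), and calling tree_sums on an
   all-zero block gives S[0][0] equal to 0.
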
   Coding |x(i,j)| implies that the value of C(i,j) = S(i,j) - |x(i,j)|
   is known to the decoder, so it does not need to be coded.  Since the
   four child sums add up to C(i,j), only three symbols are then
   required to encode the four new roots: S(2*i,2*j), S(2*i,2*j+1),
   S(2*i+1,2*j), and S(2*i+1,2*j+1).

   At the top level, we have S(0,0) = S(1,0) + S(0,1) + S(1,1), so that
   completely flat blocks can be coded with a single S(0,0)=0 symbol.
   The DC is coded separately.

4.  Results

   The coded images obtained with the Haar transform and L1TW have far
   better subjective visual quality than those obtained with the lapped
   DCT or with JPEG, and are of comparable quality to those obtained
   with x264 and x265.  An example image at around 0.35 bit/pixel is
   provided at .  The x264 image is encoded with options "--preset
   placebo --crf=27" and the x265 image is encoded with "--preset slow
   --crf=29".

   While the technique presented here works relatively well on the
   example above, there are still cases where it performs significantly
   worse than x265.  These include gradients, such as those in toolbars
   and window titlebars, and long horizontal and vertical lines such as
   those found in spreadsheets.  These cases should improve once we
   implement the ability to dynamically switch between the lapped DCT
   and the Haar transform.  Other ways of improving performance on long
   lines and edges would be to use a different 2-D wavelet
   decomposition or an overcomplete basis.

5.  Objective Evaluation

   As a first step for evaluating screensharing quality, we have added a
   small collection of screenshot images to the "Are We Compressed Yet?"
   (AWCY) website, under the "screenshots" set name.  AWCY currently
   runs four quality metrics: PSNR, PSNR-HVS, SSIM, and FAST-SSIM
   [I-D.daede-netvc-testing].  It is not yet clear that any of these
   metrics is suitable for evaluating the quality of screensharing
   material.

6.  Development Repository

   The algorithms in this proposal are being developed as part of
   Xiph.Org's Daala project.  The code is available in the Daala git
   repository at .  See [Daala-website] for more information.

7.  IANA Considerations

   This document makes no request of IANA.

8.  Security Considerations

   This draft has no security considerations.

9.  Acknowledgements

   Thanks to Timothy B. Terriberry for useful feedback and for
   designing the 2-D Haar lifting kernel.

10.  Informative References

   [Daala-website]
              "Daala website", Xiph.Org Foundation, .

   [I-D.daede-netvc-testing]
              Daede, T. and J. Moffitt, "Video Codec Testing and Quality
              Measurement", draft-daede-netvc-testing-00 (work in
              progress), March 2015.

Author's Address

   Jean-Marc Valin
   Mozilla
   331 E. Evelyn Avenue
   Mountain View, CA 94041
   USA

   Email: jmvalin@jmvalin.ca