idnits 2.17.1

draft-davies-netvc-qmtx-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 7) being 59 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There is 1 instance of too long lines in the document, the longest one
     being 4 characters in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 206 has weird spacing: '...rameter q is ...'

  -- The document date (March 16, 2016) is 2962 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'IRFVC' is defined on line 280, but no explicit
     reference was found in the text

     Summary: 3 errors (**), 0 flaws (~~), 4 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1   Network Working Group                                          T. Davies
2   Internet-Draft                                                     Cisco
3   Intended status: Standards Track                          March 16, 2016
4   Expires: September 17, 2016

6                Quantisation matrices for Thor video coding
7                        draft-davies-netvc-qmtx-00

9   Abstract

11     This draft describes a family of default quantisation matrices
12     that may be used to improve perceptual quality when encoding with
13     Thor.  Similar quantisation matrix designs may be used in most
14     block-based video and image codecs.

16  Status of This Memo

18     This Internet-Draft is submitted in full conformance with the
19     provisions of BCP 78 and BCP 79.

21     Internet-Drafts are working documents of the Internet Engineering
22     Task Force (IETF).  Note that other groups may also distribute
23     working documents as Internet-Drafts.  The list of current Internet-
24     Drafts is at http://datatracker.ietf.org/drafts/current/.

26     Internet-Drafts are draft documents valid for a maximum of six months
27     and may be updated, replaced, or obsoleted by other documents at any
28     time.  It is inappropriate to use Internet-Drafts as reference
29     material or to cite them other than as "work in progress."

31     This Internet-Draft will expire on September 17, 2016.

33  Copyright Notice

35     Copyright (c) 2016 IETF Trust and the persons identified as the
36     document authors.  All rights reserved.

38     This document is subject to BCP 78 and the IETF Trust's Legal
39     Provisions Relating to IETF Documents
40     (http://trustee.ietf.org/license-info) in effect on the date of
41     publication of this document.  Please review these documents
42     carefully, as they describe your rights and restrictions with respect
43     to this document.  Code Components extracted from this document must
44     include Simplified BSD License text as described in Section 4.e of
45     the Trust Legal Provisions and are provided without warranty as
46     described in the Simplified BSD License.

48  Table of Contents

50     1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2

52     2.  Definitions . . . . . . . . . . . . . . . . . . . . . . . . .   2
53       2.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   2

55     3.  Quantisation matrix design  . . . . . . . . . . . . . . . . .   3
56       3.1.  The function of quantisation matrices . . . . . . . . . .   3
57       3.2.  Quantisation matrix design in AVC and HEVC  . . . . . . .   4
58       3.3.  Design of quantisation matrices in Thor . . . . . . . . .   4
59       3.4.  Implementation  . . . . . . . . . . . . . . . . . . . . .   5

61     4.  Compression performance . . . . . . . . . . . . . . . . . . .   6

63     5.  Informative References  . . . . . . . . . . . . . . . . . . .   7

65     Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .   7

67  1.  Introduction

69     This document describes a family of default quantisation matrices
70     that may be used to improve perceptual quality when encoding with
71     Thor.  The quantisation matrices are designed to be near-flat at high
72     quantisation levels and more strongly profiled at low quantisation
73     levels, to avoid ringing artefacts and better shape quantisation
74     error across a whole sequence with varying quantisation levels.

76  2.  Definitions

78  2.1.  Terminology

80     This document uses the following terms.

82     QP: quantisation parameter

84     QM: quantisation matrix

86     CSF: contrast sensitivity function

88     BDR: Bjontegaard Delta-Rate

90  3.  Quantisation matrix design

92  3.1.  The function of quantisation matrices

94     Quantisation matrices work by shaping the residual error after
95     quantisation in the spatial frequency domain, usually the DCT domain.
96     This is done by varying the quantisation factor applied across
97     spatial frequencies in the transform block.  Typically a high
98     quantisation factor is applied at high spatial frequencies and a
99     low one at low spatial frequencies.

101    The aim is roughly to match a Contrast Sensitivity Function (CSF)
102    for the human visual system, which gives a curve of sensitivity to
103    detail (and therefore to coding errors) against spatial frequency.
104    Given known resolutions and assumed viewing distances, a weighting
105    function can be simply defined for every coefficient in a block.

107    This simple approach is complicated, however, by a number of factors.
108    The first is that the CSF is in reality not a simple function
109    of spatial frequency, but depends on factors such as brightness,
110    which are imperfectly corrected for by television gammas.  There is
111    little that can be done about that in the quantisation matrices
112    themselves, but adjusting QP itself may help.

114    The second factor is that CSFs are determined experimentally based
115    on models of Just Noticeable Difference (JND) and reflect less
116    accurately the impact of distortions well above this level.
117    Adjustments at high levels of quantisation are needed to allow for
       this.

119    Finally, applying quantisation matrices to video is affected by the
120    fact that most frames are predicted and the QM is applied to the
121    residual after prediction.  This means that the quantisation error
122    for a block consists of the quantisation error in the reference
123    block, plus any additional error introduced in the current block.
124    These errors add in energy if they are uncorrelated, but they may
125    well be correlated at high QP.

127    Despite these difficulties, QMs are widely used and known to work
128    well, and are available in video coding standards such as
129    H.264/AVC and H.265/HEVC [AVC,HEVC].

131 3.2.  Quantisation matrix design in AVC and HEVC

133    Quantisation matrices are available in a number of different codecs.
134    The design in AVC and HEVC is to provide default matrices together
135    with the ability to signal bespoke matrices [AVC,HEVC].  These
136    matrices must cover all the different transform block sizes,
137    components (Y, Cb, Cr) and intra and inter frame or block types,
138    with fall-backs defined if bespoke matrices are not provided.
139    Default inter block matrices are flatter than intra matrices,
140    presumably because of the error-addition effect
141    described in Section 3.1: if they had the same profile as for intra
142    then the overall profile of the combined prediction and residual
143    could be over-shaped.

145 3.3.  Design of quantisation matrices in Thor

147    Thor provides a set of matrices for each component of 4:2:0-sampled
148    video, for each block size and each quantisation parameter.  The
149    principles behind the design are as follows:

151    1) QP dependence.  Matrices become flatter as quantisation levels
152       increase.

154    2) Energy preservation for intra.  The inverse quantisation matrices
155       for intra blocks are normalised to approximately preserve the
156       energy of the residual.

158    3) DC preservation for inter.  The inverse quantisation matrices for
159       inter blocks are normalised to preserve the DC level.

161    4) Matrices are also flatter for inter blocks than for intra blocks.

163    5) Quantisation matrix strength is globally adjustable.

165    The QP dependence takes account of a number of factors.  Firstly, it
166    reflects that inter blocks typically have higher QPs than the blocks
167    used to predict them.  This means that flattening the matrices at
168    higher QP naturally prevents over-shaping the quantisation error.

170    Secondly, the high-QP flattening process also reflects the fact
171    that errors at this level are very visible even at high spatial
172    frequencies.  Strong error-shaping at these QP levels leads to very
173    visible additional ringing.

175    SSIM-based metrics [SSIM,MSSSIM,FASTSSIM] indicate that preserving
176    image variances, and therefore residual energies, is perceptually
177    important.  This is feasible for intra, where residuals are
178    substantial, but in the case of inter it is also important to
179    preserve DC levels, since errors here produce very visible artefacts.

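As an illustration of principles 2) and 3), the following Python sketch normalises a raw weighting profile in the two ways described. It is not taken from the Thor source: the function names, the example profile and the reading of "energy preservation" as a unit root-mean-square weight are assumptions made for this example. The only value drawn from this document is the unity weight of 64, which follows from the extra shift of 6 in formulae (3) and (4) of Section 3.4.

```python
import math

# Weight that leaves a coefficient unchanged: formulae (3) and (4)
# shift by an extra 6 bits, so 64 corresponds to a factor of 1.0.
UNITY = 64


def normalise_dc(weights):
    """Scale a matrix of positive weights so the DC entry equals UNITY.

    Sketch of principle 3 (DC preservation for inter blocks).
    """
    dc = weights[0][0]
    return [[max(1, round(UNITY * w / dc)) for w in row] for row in weights]


def normalise_energy(weights):
    """Scale a matrix so its root-mean-square weight equals UNITY.

    Sketch of principle 2 (energy preservation for intra blocks).
    ASSUMPTION: energy is treated as evenly spread over all coefficient
    positions; the draft only says energy is "approximately" preserved.
    """
    count = sum(len(row) for row in weights)
    rms = math.sqrt(sum(w * w for row in weights for w in row) / count)
    return [[max(1, round(UNITY * w / rms)) for w in row] for row in weights]


# Hypothetical 4x4 profile: weights grow with spatial frequency (i + j).
raw = [[1.0 + 0.25 * (i + j) for j in range(4)] for i in range(4)]
inter_qm = normalise_dc(raw)      # DC weight is exactly UNITY
intra_qm = normalise_energy(raw)  # DC weight falls below UNITY
```

Under these assumptions the inter matrix never alters the DC coefficient, while the intra matrix trades a finer DC step for coarser high-frequency steps so that the average squared weight stays at unity.
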
181    Intra frames tend to have lower QP than inter frames, and this means
182    that QP dependence absorbs most of the requirement for inter
183    matrices to be flatter than intra matrices.  However, inter matrices
184    are still a little flatter, to take account of the different
185    characteristics of intra and inter blocks within the same frame.

187    There are 12 possible sets of quantisation matrices, giving a new
188    set of matrices for each change of approximately 4 in QP.
189    Thor also supports a global adjustment
190    or strength parameter, which offsets the QP used in the LUT that maps
191    quantisation parameter to quantisation matrix set.  This is a value
192    from -32 to 31.  A value of -32 reduces the QP used for the lookup
193    by 32, increasing the strength of the quantisation matrices
194    dramatically.  Conversely, a value of 31 eliminates quantisation
195    matrices for all but the smallest QPs.

197    The ability to signal strength and the provision
198    of a range of QP-dependent matrices are intended to remove the need
199    to signal bespoke matrices at all.

201 3.4.  Implementation

203    Quantisation matrices are applied as multiplicative factors in
204    forward or inverse quantisation processes.  In Thor the basic
205    unweighted dequantisation process for a coefficient c with
206    quantisation parameter q is based on two values: scale[q], which
207    depends only on q%6, and shift[q], which depends only on q/6,
208    the block size and the signal dynamic range.  scale[q] takes care of
209    quantisation step sizes which fall between powers of 2 and shift[q]
210    takes care of the basic power of 2 part of the quantisation step.

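The split into scale[q] and shift[q] can be sketched in Python as follows. The scale table and the base_shift constant are hypothetical placeholders: the real values are defined by the Thor codec and also depend on block size and dynamic range, which this sketch folds into a single assumed constant. The optional weight m follows formulae (1) to (4) given next in this section.

```python
# HYPOTHETICAL scale table indexed by q % 6. The entries approximate
# 64 * 2**(k/6) and are NOT taken from the Thor specification.
SCALE = [64, 72, 81, 91, 102, 114]


def scale_and_shift(q, base_shift=8):
    """Return (scale, shift) for quantisation parameter q.

    base_shift is an assumed constant standing in for the block-size
    and dynamic-range terms. The shift drops by one for every 6 steps
    of q, so the quantisation step doubles every 6 QP.
    """
    return SCALE[q % 6], base_shift - q // 6


def dequantise(c, q, m=None, base_shift=8):
    """Dequantise coefficient c at QP q.

    With m=None this follows formulae (1) and (2); with a matrix
    weight m (64 = unity) it follows formulae (3) and (4).
    """
    scale, shift = scale_and_shift(q, base_shift)
    value = c * scale
    if m is not None:
        value *= m
        shift += 6  # compensate for the 6 fractional bits of the weight
    if shift > 0:
        # Right shift with rounding offset of half the divisor.
        return (value + (1 << (shift - 1))) >> shift
    # Zero or negative shift: scale up with a left shift, no rounding.
    return value << -shift
```

With these assumed tables, raising q by 6 lowers the shift by one and so doubles the reconstructed value, and a weight of 64 reproduces the unweighted result exactly.
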
212    The formula for unweighted dequantisation is then:

214       c -> (c*scale[q] + (1<<(shift[q]-1))) >> shift[q]               (1)

216    for positive shift[q], otherwise

218       c -> (c*scale[q])<<(-shift[q])                                  (2)

220    To apply a matrix M to a coefficient c[i,j] at position
221    (i,j) within a block, the formulae (1) and (2) change to:

223       c[i,j]->(c[i,j]*M[i,j]*scale[q]+(1<<(shift[q]+5)))>>(shift[q]+6) (3)

225    if shift[q]+6 > 0, otherwise

227       c[i,j]->(c[i,j]*M[i,j]*scale[q])<<(-shift[q]-6)                 (4)

229    The extra shift of 6 means that M[i,j] = 64 is a unity weight.

231    Exactly complementary formulae can be derived for the forward
232    quantisation process.

234 4.  Compression performance

236    Although largely a visual tool, the effectiveness of QMs can be
237    inferred from changes in the PSNRHVS [PSNRHVS] and FASTSSIM metrics.
238    FASTSSIM tends to over-estimate gains a little, as it has a bias
239    towards low-pass filtering.  Overall BDR results for the Low-Delay
240    B (LDB) and High-Delay B GOP 16 (HDB16) configurations are as
241    follows (QPs 22, 27, 32, 37):

243       Config | PSNR   | PSNRHVS | FASTSSIM |
244       --------------------------------------------
245       LDB    | +1.1%  | -3.3%   | -9.0%    |
246       --------------------------------------------
247       HDB16  | +2.2%  | -2.6%   | -11.6%   |
248       --------------------------------------------

250    These were computed on the same test sequences as in IRFVC.

252    FASTSSIM and PSNRHVS gains are typically larger, and PSNR losses
253    smaller, for higher resolution material.

255 5.  Informative References

257    [AVC]      ITU-T Recommendation H.264, "Advanced video coding for
258               generic audiovisual services", March 2010.

260    [HEVC]     ITU-T Recommendation H.265, "High efficiency video
261               coding", April 2013.

263    [FASTSSIM] Chen, M. and A. Bovik, "Fast structural similarity index
264               algorithm", 2010.

267    [MSSSIM]   Wang, Z., Simoncelli, E., and A. Bovik, "Multi-Scale
268               Structural Similarity for Image Quality Assessment", n.d.

271    [PSNRHVS]  Egiazarian, K., Astola, J., Ponomarenko, N., Lukin, V.,
272               Battisti, F., and M. Carli, "A New Full-Reference Quality
273               Metrics Based on HVS", 2002.

275    [SSIM]     Wang, Z., Bovik, A., Sheikh, H., and E. Simoncelli, "Image
276               Quality Assessment: From Error Visibility to Structural
277               Similarity", 2004.

280    [IRFVC]    Davies, T., "Interpolated reference frames for video
281               coding", IETF draft,
282               https://www.ietf.org/id/draft-davies-netvc-irfvc-00.txt

284 Author's Address

286    Thomas Davies
287    Cisco
288    Feltham
289    UK

291    Email: thdavies@cisco.com