Network Working Group                                      S. Midtskogen
Internet-Draft                                                     Cisco
Intended status: Standards Track                        October 31, 2016
Expires: May 4, 2017


                       Improved chroma prediction
                  draft-midtskogen-netvc-chromapred-02

Abstract

   This document describes the technique used to improve the chroma
   prediction in the Thor video codec.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 4, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Requirements Language
   3.  Background
   4.  Computing the improved prediction
   5.  Performance
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgments
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Author's Address

1.  Introduction

   Modern video codecs such as Thor [I-D.fuldseth-netvc-thor] form
   predictions for the luma channel (Y) and the chroma channels (U and
   V), which are encoded separately (in that order).  The prediction for
   each channel has spatial or temporal dependencies only in its own
   channel.  Most of the perceived information of a video is found in
   the luma channel, but correlations remain between the luma and
   chroma channels.  For instance, the same shape of an object can often
   be seen in all three channels, and if this correlation is not
   exploited, some structural information will be transmitted three
   times.  Thor attempts to improve the chroma prediction by finding a
   linear relationship between each of the initial chroma predictions
   and the luma prediction.  If certain criteria are satisfied, that
   relationship is used to form a new prediction based on the
   reconstructed luma samples.

2.  Definitions

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Background

   The improved predictions are derived from the reconstructed luma
   samples using a mapping.  The underlying assumption is that the
   colours can be identified by their luminosities.  Informally we can
   say that a new chroma prediction is formed from the reconstructed
   luma block painted with the colours of the initial chroma prediction.

   There is often a linear correlation between the luma and chroma
   channels, so that a chroma sample c can be expressed by the linear
   function

                               c = a*y + b

                      Figure 1: Linear relationship

   where y is the corresponding luma sample.  This observation has
   previously been used in techniques to convert YUV 4:2:0 and YUV 4:2:2
   images to YUV 4:4:4, and in a (rejected) proposal for HEVC as a
   special intra mode.  Thor, however, generalises the prediction, so it
   does not depend on the coding mode (i.e. whether inter or intra, or
   the kind of inter/intra mode).

   Since it would be too costly to transmit the values a and b in the
   linear mapping, and since both the encoder and decoder must be able
   to compute identical predictions, a and b are derived from data
   available to both using linear regression.

4.  Computing the improved prediction

   Since the assumption that the correlation is the same in the
   predicted block and in the reconstructed block is not always true,
   the new prediction from luma might not be better even when there is a
   very good correlation in the predicted block.  Therefore, we can only
   expect an improvement if the initial prediction is bad, and the
   luma residual is used as an estimate for this.
   The initial chroma prediction is kept unless the average squared
   difference between the reconstructed luma samples yr and the
   predicted luma samples y for an N*N prediction block is above 64:

               _N_ _N_
               \   \
               /__ /__ (yr(i, j) - y(i, j)) ^ 2
               i=1 j=1
               -------------------------------- > 64
                             N*N

                 Figure 2: Requirement for improvement 1

   The encoder and decoder must compute a and b using the same least
   squares fit for an N*N prediction block, where y and c denote the
   luma and chroma samples in the initial prediction:

          _N_ _N_                          _N_ _N_
          \   \                            \   \
   Ysum = /__ /__ y(i, j)           Csum = /__ /__ c(i, j)
          i=1 j=1                          i=1 j=1

           _N_ _N_                          _N_ _N_
           \   \                            \   \
   YYsum = /__ /__ y(i, j) ^ 2      CCsum = /__ /__ c(i, j) ^ 2
           i=1 j=1                          i=1 j=1

                     _N_ _N_
                     \   \
             YCsum = /__ /__ y(i, j) * c(i, j)
                     i=1 j=1

               Figure 3: Equations for linear regression 1

   These sums will all be contained within a 32 bit signed integer when
   the internal bitdepth is 8.  Otherwise 64 bit integers must be used.
   Then the following must be computed using 64 bit arithmetic
   regardless of bitdepth:

             SSyy = YYsum - ((Ysum * Ysum) >> 2*log2(N))
             SScc = CCsum - ((Csum * Csum) >> 2*log2(N))
             SSyc = YCsum - ((Ysum * Csum) >> 2*log2(N))

               Figure 4: Equations for linear regression 2

   Still using 64 bit arithmetic, if

             SSyy > 0 /\ 2 * SSyc * SSyc > SSyy * SScc

                 Figure 5: Requirement for improvement 2

   then it is assumed that the correlation is reasonably good and a new
   prediction will be computed and used.  Otherwise, the initial
   prediction will be kept.  First, a and b must be computed.  2^15 is
   added to b to ensure correct rounding later on.
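
   The tests and sums above, together with the slope, offset and 4:4:4
   prediction step that Figures 6 and 7 specify, can be sketched in
   Python as follows.  This is a minimal illustration under stated
   assumptions, not the Thor implementation.  The function name
   improve_chroma_444, the helper names, the list-of-rows block layout
   and the power-of-two block size are all assumptions made here.  The
   sketch further assumes that the test in Figure 2 is an exact
   comparison (sum > 64*N*N), that the division truncates toward zero
   as in C, that right shifts of negative values are arithmetic, that
   the correlation test compares 2*SSyc*SSyc with SSyy*SScc (a squared
   correlation above one half), and that the final clip range follows
   the bitdepth.

```python
def _div_trunc(num, den):
    """Integer division truncating toward zero (C-style); den must be > 0."""
    q = abs(num) // den
    return q if num >= 0 else -q


def _clip(v, lo, hi):
    """Saturate v to the closed range [lo, hi]."""
    return max(lo, min(hi, v))


def improve_chroma_444(y, c, yr, bitdepth=8):
    """Return the chroma prediction for one N*N block in 4:4:4 format.

    y, c: initial luma and chroma predictions; yr: reconstructed luma.
    Each is a list of N rows of N ints, N a power of two (assumption).
    Python integers do not overflow, so the 32/64 bit width
    requirements of the text are not modelled here.
    """
    n = len(y)
    shift = 2 * (n.bit_length() - 1)           # 2*log2(N)
    cells = [(i, j) for i in range(n) for j in range(n)]
    keep = [row[:] for row in c]               # copy of initial prediction

    # Figure 2: keep the initial prediction unless the luma residual
    # is large (assumed exact comparison, no integer division).
    sq_err = sum((yr[i][j] - y[i][j]) ** 2 for i, j in cells)
    if sq_err <= 64 * n * n:
        return keep

    # Figure 3: regression sums over the initial predictions.
    ysum = sum(y[i][j] for i, j in cells)
    csum = sum(c[i][j] for i, j in cells)
    yysum = sum(y[i][j] ** 2 for i, j in cells)
    ccsum = sum(c[i][j] ** 2 for i, j in cells)
    ycsum = sum(y[i][j] * c[i][j] for i, j in cells)

    # Figure 4: sums of squares about the mean.
    ssyy = yysum - ((ysum * ysum) >> shift)
    sscc = ccsum - ((csum * csum) >> shift)
    ssyc = ycsum - ((ysum * csum) >> shift)

    # Correlation test: squared correlation above one half (assumption).
    if not (ssyy > 0 and 2 * ssyc * ssyc > ssyy * sscc):
        return keep

    # Figure 6: slope and offset in 16.16 fixed point, then clipping.
    a = _div_trunc(ssyc << 16, ssyy)
    b = (((csum << 16) - a * ysum) >> shift) + (1 << 15)
    a = _clip(a, -(1 << (31 - bitdepth)), 1 << (31 - bitdepth))
    b = _clip(b, -(1 << 31), (1 << 31) - 1)

    # Figure 7: new prediction from the reconstructed luma samples.
    max_sample = (1 << bitdepth) - 1           # 255 for 8 bit samples
    return [[_clip((a * yr[i][j] + b) >> 16, 0, max_sample)
             for j in range(n)] for i in range(n)]
```

   For a 2x2 block whose initial prediction satisfies c = 2*y + 5
   exactly, and whose luma residual passes the first test, the sketch
   returns 2*yr + 5 for each reconstructed luma sample.  When the luma
   residual is small or the predicted luma block is flat, it returns
   the initial chroma prediction unchanged.
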

           a = (SSyc << 16) / SSyy
           b = (((Csum << 16) - a * Ysum) >> 2*log2(N)) + (1 << 15)

               Figure 6: Equation for linear regression 3

   The final operations are performed with 32 bit arithmetic, so a must
   be clipped to [-2^(31-B), 2^(31-B)], where B is the bitdepth, and b
   must be clipped to [-2^31, 2^31-1].  Then a new chroma prediction c'
   is computed using the reconstructed luma samples yr, a and b, and a
   clipping function saturating the results to an 8 bit value:

                c'(i, j) = clip((a * yr(i, j) + b) >> 16)

                  Figure 7: Improved chroma prediction

   The above assumes 4:4:4 format.  For the 4:2:0 format the predicted
   luma block must be subsampled first:

           y'(i, j) = (y(2*i, 2*j)   + y(2*i+1, 2*j) +
                       y(2*i, 2*j+1) + y(2*i+1, 2*j+1) + 2) >> 2

              Figure 8: Subsampling of predicted luma block

   The resulting new chroma prediction must also be subsampled.  The
   clipping is performed before the subsampling.

        c'(i, j) = (clip((a*yr(2*i, 2*j) + b) >> 16) +
                    clip((a*yr(2*i+1, 2*j) + b) >> 16) +
                    clip((a*yr(2*i, 2*j+1) + b) >> 16) +
                    clip((a*yr(2*i+1, 2*j+1) + b) >> 16) + 2) >> 2

           Figure 9: Subsampling of improved chroma prediction

   In intra mode the chroma prediction improvement must be performed
   right after each transform, since the new chroma reconstruction will
   be used to predict the next block.

5.  Performance

   The improved chroma prediction may significantly improve the
   compression efficiency for images or video containing high
   correlations between the channels.  It is particularly useful for
   encoding screen content, 4:4:4 content, high frequency content and
   "difficult" content where traditional prediction techniques perform
   poorly.  Little quality change is seen for content not in these
   categories, but there is a general small increase in chroma PSNR.

   An encoder configured for low delay and high complexity was used for
   the following results.
   The numbers have been computed using the Bjontegaard Delta Rate
   (BDR [BDR]).  The rates for Y, U and V are shown separately.

       +--------------+--------------------+--------------------+
       |              |       4:4:4        |       4:2:0        |
       +--------------+------+------+------+------+------+------+
       |Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
       +--------------+------+------+------+------+------+------+
       |cad_waveform  |-21.3%|-27.0%|-24.0%|  0.5%| -1.3%| -1.1%|
       |pcb_layout    | -9.2%|-13.3%|-10.6%| -1.6%| -3.1%| -3.5%|
       |ppt_doc_xls   | -6.3%|-14.1%|-12.7%| -0.1%| -0.8%| -0.8%|
       |vc_doc_sharing| -2.9%| -6.4%| -6.9%|  0.3%| -1.2%| -0.6%|
       |web_browsing  | -0.5%| -1.1%| -1.5%|  0.3%| -0.5%| -1.0%|
       |wordEditing   | -1.8%| -5.9%| -4.8%|  1.5%|  1.2%|  1.1%|
       |park_joy      | -0.5%| -2.6%| -0.9%| -0.0%| -0.8%|  0.4%|
       |old_town_cross| -0.1%| -2.2%| -1.2%|  0.0%| -0.6%| -0.2%|
       +--------------+------+------+------+------+------+------+
       |Average       | -5.3%| -9.1%| -7.8%|  0.1%| -0.9%| -0.7%|
       +--------------+------+------+------+------+------+------+

     Figure 10: Compression Performance, improved prediction for intra
                               blocks only

       +--------------+--------------------+--------------------+
       |              |       4:4:4        |       4:2:0        |
       +--------------+------+------+------+------+------+------+
       |Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
       +--------------+------+------+------+------+------+------+
       |cad_waveform  |-23.1%|-28.9%|-26.1%| -2.6%| -3.6%| -3.5%|
       |pcb_layout    |-21.0%|-29.0%|-21.0%| -5.4%| -7.9%| -5.4%|
       |ppt_doc_xls   | -9.0%|-19.0%|-17.5%| -0.2%| -0.2%| -1.2%|
       |vc_doc_sharing| -4.7%| -9.6%| -9.6%| -0.1%| -1.0%| -0.4%|
       |web_browsing  | -0.6%| -1.5%| -1.5%| -0.5%| -1.2%| -1.2%|
       |wordEditing   |-11.3%|-13.7%|-11.7%| -3.0%| -4.2%| -3.2%|
       |park_joy      | -5.5%| -7.4%| -7.1%| -0.9%| -1.9%| -1.6%|
       |old_town_cross| -1.7%| -3.6%| -2.2%| -0.3%| -4.1%| -1.6%|
       +--------------+------+------+------+------+------+------+
       |Average       | -9.6%|-14.1%|-12.1%| -1.6%| -3.0%| -2.3%|
       +--------------+------+------+------+------+------+------+

     Figure 11: Compression Performance, improved prediction using intra
                               only coding

       +--------------+--------------------+--------------------+
       |              |       4:4:4        |       4:2:0        |
       +--------------+------+------+------+------+------+------+
       |Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
       +--------------+------+------+------+------+------+------+
       |cad_waveform  |-11.5%|-14.4%|-12.7%|  0.0%| -1.8%| -1.7%|
       |pcb_layout    | -3.2%| -5.5%| -4.8%| -0.9%| -2.4%| -3.4%|
       |ppt_doc_xls   | -0.1%| -0.7%| -0.3%|  0.0%| -0.2%| -0.6%|
       |vc_doc_sharing| -0.4%| -0.6%| -1.6%| -0.0%| -0.4%| -0.6%|
       |web_browsing  |  0.1%|  0.2%|  0.1%|  0.5%| -0.0%| -0.9%|
       |wordEditing   | -3.7%| -5.8%| -6.2%|  0.4%| -0.9%| -1.4%|
       |park_joy      | -1.6%| -8.6%| -1.5%|  0.0%| -3.5%| -0.2%|
       |old_town_cross| -0.0%| -0.4%| -0.1%|  0.0%|  0.1%| -0.2%|
       +--------------+------+------+------+------+------+------+
       |Average       | -2.5%| -4.5%| -3.4%|  0.0%| -1.1%| -1.1%|
       +--------------+------+------+------+------+------+------+

     Figure 12: Compression Performance, improved prediction for inter
                               blocks only

       +--------------+--------------------+--------------------+
       |              |       4:4:4        |       4:2:0        |
       +--------------+------+------+------+------+------+------+
       |Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
       +--------------+------+------+------+------+------+------+
       |cad_waveform  |-25.8%|-31.7%|-28.2%| -2.4%| -5.5%| -5.4%|
       |pcb_layout    |-11.5%|-16.1%|-13.5%| -2.4%| -4.1%| -5.6%|
       |ppt_doc_xls   | -6.3%|-14.3%|-13.2%| -0.2%| -0.8%| -0.8%|
       |vc_doc_sharing| -3.0%| -6.7%| -8.2%|  0.1%| -0.9%| -1.1%|
       |web_browsing  | -0.5%| -1.2%| -1.5%|  0.2%| -0.3%| -2.0%|
       |wordEditing   | -3.4%| -6.8%| -6.6%|  0.6%| -0.5%| -1.4%|
       |park_joy      | -1.7%| -9.2%| -1.7%| -0.0%| -4.0%|  0.0%|
       |old_town_cross| -0.1%| -2.2%| -1.0%|  0.1%| -0.5%| -0.1%|
       +--------------+------+------+------+------+------+------+
       |Average       | -6.5%|-11.0%| -9.2%| -0.5%| -2.1%| -2.0%|
       +--------------+------+------+------+------+------+------+

   Figure 13: Compression Performance, improved prediction for intra and
                               inter blocks

6.  IANA Considerations

   This document has no IANA considerations yet.  TBD

7.  Security Considerations

   This document has no security considerations yet.  TBD

8.  Acknowledgments

   The author would like to thank Arild Fuldseth and Mo Zanaty for
   reviewing this document, and Timothy Terriberry for pointing out a
   couple of errors in the first draft.

9.  References

9.1.  Normative References

   [I-D.fuldseth-netvc-thor]
              Fuldseth, A., Bjontegaard, G., Midtskogen, S., Davies, T.,
              and M. Zanaty, "Thor Video Codec",
              draft-fuldseth-netvc-thor-02 (work in progress),
              March 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

9.2.  Informative References

   [BDR]      Bjontegaard, G., "Calculation of average PSNR differences
              between RD-curves", ITU-T SG16 Q6 VCEG-M33, April 2001.

Author's Address

   Steinar Midtskogen
   Cisco
   Lysaker
   Norway

   Email: stemidts@cisco.com