2	Network Working Group                                      S. Midtskogen
3	Internet-Draft                                                     Cisco
4	Intended status: Standards Track                        October 31, 2016
5	Expires: May 4, 2017

7	                       Improved chroma prediction
8	                  draft-midtskogen-netvc-chromapred-02

10	Abstract

12	   This document describes the technique used to improve the chroma
13	   prediction in the Thor video codec.

15	Status of This Memo

17	   This Internet-Draft is submitted in full conformance with the
18	   provisions of BCP 78 and BCP 79.

20	   Internet-Drafts are working documents of the Internet Engineering
21	   Task Force (IETF).  Note that other groups may also distribute
22	   working documents as Internet-Drafts.  The list of current Internet-
23	   Drafts is at http://datatracker.ietf.org/drafts/current/.

25	   Internet-Drafts are draft documents valid for a maximum of six months
26	   and may be updated, replaced, or obsoleted by other documents at any
27	   time.  It is inappropriate to use Internet-Drafts as reference
28	   material or to cite them other than as "work in progress."

30	   This Internet-Draft will expire on May 4, 2017.

32	Copyright Notice

34	   Copyright (c) 2016 IETF Trust and the persons identified as the
35	   document authors.  All rights reserved.

37	   This document is subject to BCP 78 and the IETF Trust's Legal
38	   Provisions Relating to IETF Documents
39	   (http://trustee.ietf.org/license-info) in effect on the date of
40	   publication of this document.  Please review these documents
41	   carefully, as they describe your rights and restrictions with respect
42	   to this document.  Code Components extracted from this document must
43	   include Simplified BSD License text as described in Section 4.e of
44	   the Trust Legal Provisions and are provided without warranty as
45	   described in the Simplified BSD License.

47	Table of Contents

49	   1.  Introduction
50	   2.  Definitions
51	     2.1.  Requirements Language
52	   3.  Background
53	   4.  Computing the improved prediction
54	   5.  Performance
55	   6.  IANA Considerations
56	   7.  Security Considerations
57	   8.  Acknowledgments
58	   9.  References
59	     9.1.  Normative References
60	     9.2.  Informative References
61	   Author's Address

63	1.  Introduction

65	   Modern video coding standards such as Thor [I-D.fuldseth-netvc-thor]
66	   form predictions for the luma channel (Y) and chroma channels (U and
67	   V) which are encoded separately (in that order).  The prediction for
68	   each channel has spatial or temporal dependencies only in its own
69	   channel.  Most of the perceived information of a video is to be found
70	   in the luma channel, but there still remain correlations between the
71	   luma and chroma channels.  For instance, the same shape of an object
72	   can often be seen in all three channels, and if this correlation is
73	   not exploited, some structural information will be transmitted three
74	   times.  Thor will attempt to improve the chroma prediction by finding
75	   linear relationships between each of the initial chroma
76	   predictions and the luma prediction, and if certain criteria are
77	   satisfied, use that relationship to form a new prediction based on
78	   the reconstructed luma samples.

80	2.  Definitions

82	2.1.  Requirements Language

84	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
85	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
86	   document are to be interpreted as described in RFC 2119 [RFC2119].

88	3.  Background

90	   The improved predictions are derived from the reconstructed luma
91	   samples using a mapping.  The underlying assumption is that the
92	   colours can be identified by their luminosities.  Informally we can
93	   say that a new chroma prediction is formed from the reconstructed
94	   luma block painted with the colours of the initial chroma prediction.

96	   There is often a linear correlation between the luma and chroma
97	   channels, so that a chroma sample c can be expressed by the linear
98	   function

100	                                c = a*y + b

102	                       Figure 1: Linear relationship

104	   where y is the corresponding luma sample.  This observation has been
105	   previously used in techniques to convert YUV 4:2:0 and YUV 4:2:2
106	   images to YUV 4:4:4, and in a (rejected) proposal for HEVC as a
107	   special intra mode.  Thor, however, generalises the prediction, so it
108	   does not depend on the coding mode (i.e. whether inter or intra, or
109	   the kind of inter/intra mode).

111	   Since it would be too costly to transmit the values a and b in the
112	   linear mapping, and since both the encoder and decoder must be able
113	   to compute identical predictions, a and b are derived from data
114	   available to both using linear regression.
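
As a rough floating-point illustration of the idea only (the integer procedure the codec actually uses is given in Section 4), the sketch below fits c = a*y + b by ordinary least squares. The function name fit_line and the sample values are invented for this example.

```python
def fit_line(ys, cs):
    """Ordinary least-squares fit of c = a*y + b over paired samples.

    Returns (a, b), or None when the luma samples are all equal and the
    slope is therefore undefined.
    """
    n = len(ys)
    mean_y = sum(ys) / n
    mean_c = sum(cs) / n
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    ss_yc = sum((y - mean_y) * (c - mean_c) for y, c in zip(ys, cs))
    if ss_yy == 0:
        return None
    a = ss_yc / ss_yy
    b = mean_c - a * mean_y
    return a, b


# Hypothetical predicted samples that follow c = 0.5*y + 100 exactly.
luma = [16, 32, 48, 64]
chroma = [0.5 * y + 100 for y in luma]
a, b = fit_line(luma, chroma)
```

Because the fit uses only predicted samples, which both sides already have, an encoder and a decoder running the same procedure arrive at the same a and b without any extra signalling.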

116	4.  Computing the improved prediction

118	   Since the assumption that the correlation is the same in the
119	   predicted block and in the reconstructed block is not always true,
120	   the new prediction from luma might not be better even when there is a
121	   very good correlation in the predicted block.  Therefore, we can only
122	   expect an improvement if the initial prediction is bad, and the
123	   luma residual is used as an estimate for this.  The initial chroma
124	   prediction is kept unless the average squared difference between the
125	   reconstructed luma samples yr and the predicted y samples for an N*N
126	   prediction block is above 64:

128	                       _N_ _N_
129	                       \   \
130	                       /__ /__ (yr(i, j) - y(i, j)) ^ 2
131	                       i=1 j=1
132	                       -------------------------------- > 64
133	                                      N*N

135	                  Figure 2: Requirement for improvement 1
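
The test in Figure 2 might be sketched as follows. This is a minimal illustration, assuming square blocks stored as lists of rows and treating the division as exact (the sum is compared against 64*N*N rather than divided); the text does not say whether integer division is intended. The name luma_residual_is_large is hypothetical.

```python
def luma_residual_is_large(yr, y, threshold=64):
    """Return True when the mean squared difference between the
    reconstructed luma block yr and the predicted luma block y
    exceeds the threshold (64 in Figure 2)."""
    n = len(y)
    sq_sum = sum((yr[i][j] - y[i][j]) ** 2
                 for i in range(n) for j in range(n))
    # Multiply instead of dividing so the comparison stays in integers.
    return sq_sum > threshold * n * n


predicted = [[100] * 4 for _ in range(4)]
off_by_8 = [[108] * 4 for _ in range(4)]   # mean squared error is exactly 64
off_by_9 = [[109] * 4 for _ in range(4)]   # mean squared error is 81
```

With these sample blocks, a uniform error of 8 sits exactly on the threshold and keeps the initial prediction, while a uniform error of 9 passes the test.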

137	   The encoder and decoder must compute a and b using the same
138	   least-squares fit for an N*N prediction block, where y and c denote
139	   the luma and chroma samples in the initial prediction:

141	              _N_ _N_                            _N_ _N_
142	              \   \                              \   \
143	       Ysum = /__ /__ y(i, j)             Csum = /__ /__ c(i, j)
144	              i=1 j=1                            i=1 j=1

146	              _N_ _N_                            _N_ _N_
147	              \   \                              \   \
148	      YYsum = /__ /__ y(i, j) ^ 2        CCsum = /__ /__ c(i, j) ^ 2
149	              i=1 j=1                            i=1 j=1

151	              _N_ _N_
152	              \   \
153	      YCsum = /__ /__ y(i, j) * c(i, j)
154	              i=1 j=1

156	                Figure 3: Equations for linear regression 1

158	   These sums will all be contained within a 32 bit signed integer when
159	   the internal bitdepth is 8.  Otherwise 64 bit integers must be used.
160	   Then the following must be computed using 64 bit arithmetic
161	   regardless of bitdepth:

163	                SSyy = YYsum - ((Ysum * Ysum) >> 2*log2(N))
164	                SScc = CCsum - ((Csum * Csum) >> 2*log2(N))
165	                SSyc = YCsum - ((Ysum * Csum) >> 2*log2(N))

167	                Figure 4: Equations for linear regression 2
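
The sums in Figure 3 and the derived quantities in Figure 4 could be computed along these lines. This is a sketch under stated assumptions: N is a power of two, blocks are lists of rows, and Python's unbounded integers stand in for the 64 bit arithmetic the text requires. The function name regression_sums is made up for the example.

```python
def regression_sums(y, c):
    """Accumulate the Figure 3 sums over an N*N block and derive
    SSyy, SScc and SSyc as in Figure 4."""
    n = len(y)
    shift = 2 * (n.bit_length() - 1)   # 2*log2(N), valid for powers of two
    ysum = csum = yysum = ccsum = ycsum = 0
    for i in range(n):
        for j in range(n):
            yv = y[i][j]
            cv = c[i][j]
            ysum += yv
            csum += cv
            yysum += yv * yv
            ccsum += cv * cv
            ycsum += yv * cv
    ss_yy = yysum - ((ysum * ysum) >> shift)
    ss_cc = ccsum - ((csum * csum) >> shift)
    ss_yc = ycsum - ((ysum * csum) >> shift)
    return ysum, csum, ss_yy, ss_cc, ss_yc


# Hypothetical 2x2 predicted blocks with c = 0.5*y + 100.
y_pred = [[10, 20], [30, 40]]
c_pred = [[105, 110], [115, 120]]
sums = regression_sums(y_pred, c_pred)
```

For this block the ratio SSyc / SSyy is 250 / 500, which matches the slope of 0.5 used to build the sample data.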

169	   Still using 64 bit arithmetic, if

171	                 SSyy > 0 /\ 2 * SSyy * SSyy > SSyy * SScc

173	                  Figure 5: Requirement for improvement 2

175	   then it is assumed that the correlation is reasonably good and a new
176	   prediction will be computed and used.  Otherwise, the initial
177	   prediction will be kept.  First, a and b must be computed.  2^15 is
178	   added to b to ensure correct rounding later on.

180	         a = (SSyc << 16) / SSyy
181	         b = (((Csum << 16) - a * Ysum) >> 2*log2(N)) + (1 << 15)

183	                Figure 6: Equation for linear regression 3
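
A possible rendering of Figures 5 and 6 in code is shown below. The condition is transcribed exactly as Figure 5 states it. The division is assumed to truncate toward zero, as integer division does in C; the text does not specify the rounding direction. N is assumed to be a power of two, and the name fit_fixed_point is hypothetical.

```python
def fit_fixed_point(ysum, csum, ss_yy, ss_cc, ss_yc, n):
    """Return (a, b) in 16 bit fixed point, or None when the Figure 5
    test fails and the initial chroma prediction should be kept."""
    shift = 2 * (n.bit_length() - 1)   # 2*log2(N), valid for powers of two
    # Figure 5, transcribed as written.
    if not (ss_yy > 0 and 2 * ss_yy * ss_yy > ss_yy * ss_cc):
        return None
    num = ss_yc << 16
    # Assumption: truncate toward zero, like C integer division.
    a = abs(num) // ss_yy
    if num < 0:
        a = -a
    # Figure 6: the added 1 << 15 provides rounding for the later >> 16.
    b = (((csum << 16) - a * ysum) >> shift) + (1 << 15)
    return a, b


# Values for a 2x2 block with y = 10, 20, 30, 40 and c = 0.5*y + 100.
result = fit_fixed_point(ysum=100, csum=450, ss_yy=500, ss_cc=125,
                         ss_yc=250, n=2)
```

In this example a comes out as 32768, which is 0.5 in 16 bit fixed point, and b as 100 * 65536 plus the rounding offset of 32768.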

185	   The final operations are performed with 32 bit arithmetic, so a must
186	   be clipped to [-2^(31-B), 2^(31-B)], where B is the bitdepth, and b
187	   must be clipped to [-2^31, 2^31-1].  Then a new chroma prediction c'
188	   is computed using the reconstructed luma samples yr, a and b, and a
189	   clipping function saturating the results to an 8 bit value:

191	                 c'(i, j) = clip((a * yr(i, j) + b) >> 16)

193	                   Figure 7: Improved chroma prediction
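
The clipping of a and b and the mapping in Figure 7 might look like this for 8 bit samples. It is a sketch only: the output range [0, 255] is an assumption about what "an 8 bit value" means here, and the helper names clamp and predict_chroma_444 are invented.

```python
def clamp(v, lo, hi):
    """Saturate v to the closed range [lo, hi]."""
    return max(lo, min(hi, v))


def predict_chroma_444(yr, a, b, bitdepth=8):
    """Apply c' = clip((a*yr + b) >> 16) to every reconstructed luma
    sample, after clipping a and b to the stated ranges."""
    a = clamp(a, -(1 << (31 - bitdepth)), 1 << (31 - bitdepth))
    b = clamp(b, -(1 << 31), (1 << 31) - 1)
    # Assumption: the result saturates to the unsigned 8 bit range.
    return [[clamp((a * v + b) >> 16, 0, 255) for v in row] for row in yr]


# a = 0.5 and b = 100 in 16 bit fixed point, b including the 2^15 offset.
pred = predict_chroma_444([[10, 20], [30, 40]], a=32768, b=6586368)
```

Python's right shift on negative integers is an arithmetic shift, so negative intermediate values round toward minus infinity before being saturated to 0.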

195	   The above assumes 4:4:4 format.  For the 4:2:0 format the predicted
196	   luma block must be subsampled first:

198	           y'(i,j) = (y(2*i, 2*j)   + y(2*i+1, 2*j)   +
199	                      y(2*i, 2*j+1) + y(2*i+1, 2*j+1) + 2) >> 2

201	               Figure 8: Subsampling of predicted luma block
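
The luma subsampling can be sketched as a rounded average over each 2x2 neighbourhood. The example assumes zero-based indices, blocks stored as lists of rows, and even block dimensions; the name subsample_luma is hypothetical.

```python
def subsample_luma(y):
    """Halve the luma block in both directions by averaging each 2x2
    neighbourhood, with + 2 giving round-to-nearest before the shift."""
    h = len(y) // 2
    w = len(y[0]) // 2
    return [[(y[2 * i][2 * j] + y[2 * i + 1][2 * j] +
              y[2 * i][2 * j + 1] + y[2 * i + 1][2 * j + 1] + 2) >> 2
             for j in range(w)]
            for i in range(h)]


block = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
half = subsample_luma(block)
```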

203	   The resulting new chroma prediction must also be subsampled.  The
204	   clipping is performed before the subsampling.

206	        c'(i, j) = (clip((a*yr(2*i, 2*j) + b) >> 16) +
207	                    clip((a*yr(2*i+1, 2*j) + b) >> 16) +
208	                    clip((a*yr(2*i, 2*j+1) + b) >> 16) +
209	                    clip((a*yr(2*i+1, 2*j+1) + b) >> 16) + 2) >> 2

211	            Figure 9: Subsampling of improved chroma prediction
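
A sketch of Figure 9 follows, again assuming zero-based indices, lists of rows, even dimensions and an output range of [0, 255]. The names clip8 and predict_chroma_420 are invented for the example. Each of the four samples is clipped before the average is taken, as the text requires.

```python
def clip8(v):
    """Saturate v to the unsigned 8 bit range (assumed to be [0, 255])."""
    return max(0, min(255, v))


def predict_chroma_420(yr, a, b):
    """Map each reconstructed luma sample through c = (a*y + b) >> 16,
    clip it, then average each 2x2 group with rounding."""
    h = len(yr) // 2
    w = len(yr[0]) // 2

    def mapped(r, c):
        return clip8((a * yr[r][c] + b) >> 16)

    return [[(mapped(2 * i, 2 * j) + mapped(2 * i + 1, 2 * j) +
              mapped(2 * i, 2 * j + 1) + mapped(2 * i + 1, 2 * j + 1) + 2) >> 2
             for j in range(w)]
            for i in range(h)]


# a = 2.0 in 16 bit fixed point; b holds only the 2^15 rounding offset.
out = predict_chroma_420([[0, 0], [255, 255]], a=2 << 16, b=1 << 15)
```

The order of operations is visible in this example: the two bright samples map to 510 and saturate to 255 before averaging, giving 128. Averaging first and clipping afterwards would give 255 instead.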

213	   In intra mode the chroma prediction improvement must be performed
214	   right after each transform, since the new chroma reconstruction will
215	   be used to predict the next block.

217	5.  Performance

219	   The improved chroma prediction may significantly improve the
220	   compression efficiency for images or video containing high
221	   correlations between the channels.  It is particularly useful for
222	   encoding screen content, 4:4:4 content, high frequency content and
223	   "difficult" content where traditional prediction techniques perform
224	   poorly.  Little quality change is seen for content not in these
225	   categories, but there is a general small increase in chroma PSNR.

227	   An encoder configured for low delay and high complexity was used for
228	   the following results.  The numbers have been computed using the
229	   Bjontegaard Delta Rate (BDR [BDR]).  The rates for Y, U and V have
230	   been shown separately.

232	        +--------------+--------------------+--------------------+
233	        |              |        4:4:4       |        4:2:0       |
234	        +--------------+------+------+------+------+------+------+
235	        |Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
236	        +--------------+------+------+------+------+------+------+
237	        |cad_waveform  |-21.3%|-27.0%|-24.0%|  0.5%| -1.3%| -1.1%|
238	        |pcb_layout    | -9.2%|-13.3%|-10.6%| -1.6%| -3.1%| -3.5%|
239	        |ppt_doc_xls   | -6.3%|-14.1%|-12.7%| -0.1%| -0.8%| -0.8%|
240	        |vc_doc_sharing| -2.9%| -6.4%| -6.9%|  0.3%| -1.2%| -0.6%|
241	        |web_browsing  | -0.5%| -1.1%| -1.5%|  0.3%| -0.5%| -1.0%|
242	        |wordEditing   | -1.8%| -5.9%| -4.8%|  1.5%|  1.2%|  1.1%|
243	        |park_joy      | -0.5%| -2.6%| -0.9%| -0.0%| -0.8%|  0.4%|
244	        |old_town_cross| -0.1%| -2.2%| -1.2%|  0.0%| -0.6%| -0.2%|
245	        +--------------+------+------+------+------+------+------+
246	        |Average       | -5.3%| -9.1%| -7.8%|  0.1%| -0.9%| -0.7%|
247	        +--------------+------+------+------+------+------+------+

249	     Figure 10: Compression Performance, improved prediction for intra
250	                                blocks only

252	        +--------------+--------------------+--------------------+
253	        |              |        4:4:4       |        4:2:0       |
254	        +--------------+------+------+------+------+------+------+
255	        |Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
256	        +--------------+------+------+------+------+------+------+
257	        |cad_waveform  |-23.1%|-28.9%|-26.1%| -2.6%| -3.6%| -3.5%|
258	        |pcb_layout    |-21.0%|-29.0%|-21.0%| -5.4%| -7.9%| -5.4%|
259	        |ppt_doc_xls   | -9.0%|-19.0%|-17.5%| -0.2%| -0.2%| -1.2%|
260	        |vc_doc_sharing| -4.7%| -9.6%| -9.6%| -0.1%| -1.0%| -0.4%|
261	        |web_browsing  | -0.6%| -1.5%| -1.5%| -0.5%| -1.2%| -1.2%|
262	        |wordEditing   |-11.3%|-13.7%|-11.7%| -3.0%| -4.2%| -3.2%|
263	        |park_joy      | -5.5%| -7.4%| -7.1%| -0.9%| -1.9%| -1.6%|
264	        |old_town_cross| -1.7%| -3.6%| -2.2%| -0.3%| -4.1%| -1.6%|
265	        +--------------+------+------+------+------+------+------+
266	        |Average       | -9.6%|-14.1%|-12.1%| -1.6%| -3.0%| -2.3%|
267	        +--------------+------+------+------+------+------+------+

269	    Figure 11: Compression Performance, improved prediction using intra
270	                                only coding

272	        +--------------+--------------------+--------------------+
273	        |              |        4:4:4       |        4:2:0       |
274	        +--------------+------+------+------+------+------+------+
275	        |Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
276	        +--------------+------+------+------+------+------+------+
277	        |cad_waveform  |-11.5%|-14.4%|-12.7%|  0.0%| -1.8%| -1.7%|
278	        |pcb_layout    | -3.2%| -5.5%| -4.8%| -0.9%| -2.4%| -3.4%|
279	        |ppt_doc_xls   | -0.1%| -0.7%| -0.3%|  0.0%| -0.2%| -0.6%|
280	        |vc_doc_sharing| -0.4%| -0.6%| -1.6%| -0.0%| -0.4%| -0.6%|
281	        |web_browsing  |  0.1%|  0.2%|  0.1%|  0.5%| -0.0%| -0.9%|
282	        |wordEditing   | -3.7%| -5.8%| -6.2%|  0.4%| -0.9%| -1.4%|
283	        |park_joy      | -1.6%| -8.6%| -1.5%|  0.0%| -3.5%| -0.2%|
284	        |old_town_cross| -0.0%| -0.4%| -0.1%|  0.0%|  0.1%| -0.2%|
285	        +--------------+------+------+------+------+------+------+
286	        |Average       | -2.5%| -4.5%| -3.4%|  0.0%| -1.1%| -1.1%|
287	        +--------------+------+------+------+------+------+------+

289	     Figure 12: Compression Performance, improved prediction for inter
290	                                blocks only

292	        +--------------+--------------------+--------------------+
293	        |              |        4:4:4       |        4:2:0       |
294	        +--------------+------+------+------+------+------+------+
295	        |Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
296	        +--------------+------+------+------+------+------+------+
297	        |cad_waveform  |-25.8%|-31.7%|-28.2%| -2.4%| -5.5%| -5.4%|
298	        |pcb_layout    |-11.5%|-16.1%|-13.5%| -2.4%| -4.1%| -5.6%|
299	        |ppt_doc_xls   | -6.3%|-14.3%|-13.2%| -0.2%| -0.8%| -0.8%|
300	        |vc_doc_sharing| -3.0%| -6.7%| -8.2%|  0.1%| -0.9%| -1.1%|
301	        |web_browsing  | -0.5%| -1.2%| -1.5%|  0.2%| -0.3%| -2.0%|
302	        |wordEditing   | -3.4%| -6.8%| -6.6%|  0.6%| -0.5%| -1.4%|
303	        |park_joy      | -1.7%| -9.2%| -1.7%| -0.0%| -4.0%|  0.0%|
304	        |old_town_cross| -0.1%| -2.2%| -1.0%|  0.1%| -0.5%| -0.1%|
305	        +--------------+------+------+------+------+------+------+
306	        |Average       | -6.5%|-11.0%| -9.2%| -0.5%| -2.1%| -2.0%|
307	        +--------------+------+------+------+------+------+------+

309	   Figure 13: Compression Performance, improved prediction for intra and
310	                               inter blocks

312	6.  IANA Considerations

314	   This document has no IANA considerations yet.  TBD

316	7.  Security Considerations

318	   This document has no security considerations yet.  TBD

320	8.  Acknowledgments

322	   The author would like to thank Arild Fuldseth and Mo Zanaty for
323	   reviewing this document, and Timothy Terriberry for pointing out a
324	   couple of errors in the first draft.

326	9.  References

328	9.1.  Normative References

330	   [I-D.fuldseth-netvc-thor]
331	              Fuldseth, A., Bjontegaard, G., Midtskogen, S., Davies, T.,
332	              and M. Zanaty, "Thor Video Codec", draft-fuldseth-netvc-
333	              thor-02 (work in progress), March 2016.

335	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
336	              Requirement Levels", BCP 14, RFC 2119,
337	              DOI 10.17487/RFC2119, March 1997,
338	              <http://www.rfc-editor.org/info/rfc2119>.

340	9.2.  Informative References

342	   [BDR]      Bjontegaard, G., "Calculation of average PSNR differences
343	              between RD-curves", ITU-T SG16 Q6 VCEG-M33 , April 2001.

345	Author's Address

347	   Steinar Midtskogen
348	   Cisco
349	   Lysaker
350	   Norway

352	   Email: stemidts@cisco.com