
Network Working Group                                          T. Davies
Internet-Draft                                                     Cisco
Intended status: Standards Track                        October 19, 2015
Expires: January 7, 2016

              Interpolated reference frames for video coding
                      draft-davies-netvc-irfvc-00

Abstract

   This document describes the use of interpolated reference frames in
   video coding in general, and in the Thor video codec in particular.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 7, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Requirements Language
     2.2.  Terminology
   3.  The interpolation process
     3.1.  Interpolation framework
     3.2.  Motion estimation process
     3.3.  Complexity considerations
   4.  Coding using interpolated reference frames
   5.  Compression performance
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  Normative References
   Author's Address

1.  Introduction

   This document describes a method of generating synthetic reference
   frames for video coding using a simplified frame interpolation
   method.  The aim is to create a reference frame that is temporally
   co-located with the current frame being predicted, leveraging the
   motion information already present in previously coded frames and
   removing the need for techniques such as motion vector scaling in
   motion vector prediction.

   Since the decoder has to generate the same interpolated reference
   frame as the encoder, complexity is a paramount concern.  The
   interpolation process is therefore a highly simplified
   block-matching algorithm that, for example, uses only pixel-accurate
   motion vectors.  Worst-case complexity can be managed by limiting
   the number of matches per block, per region and per frame, and by
   limiting the total vertical motion vector excursion in order to
   bound memory bandwidth.

   The method gives the most gain in Thor at high quantisation
   parameter (QP) values, i.e. at low bitrates.  Over the standard QP
   range (22-37), the average Bjontegaard delta-rate (BDR) reduction
   for a range of HD test sequences is 5.2%.  For higher QPs (32-44),
   the reduction is larger: 8.8% on average.

   Interpolated reference frames are enabled by default in the
   high-complexity random access (RA) and High Delay B (HDB)
   configurations in the Thor repository (github.com/cisco/thor).

   Section 3 describes the interpolation process, which is based on a
   simplified hierarchical motion estimation (HME).  Section 4
   describes the modifications to the Thor syntax coding processes.
   Section 5 provides details of compression performance.

2.  Definitions

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.2.  Terminology

   This document frequently uses the following terms.

      MV: Motion Vector - a horizontal and vertical displacement (x,y)

      ME: Motion Estimation

      HME: Hierarchical ME

      SAD: Sum of Absolute Differences.  A metric defined for a pair of
      equal-dimension blocks of numerical values, consisting of the sum
      of the absolute differences between the corresponding values at
      each location in the blocks
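   As an illustration of the SAD definition, the following Python
   sketch computes the metric for two equal-size blocks.  The function
   name sad and the list-of-rows block representation are assumptions
   made for illustration; they are not taken from the Thor code.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences of two equal-size 2-D blocks.

    Blocks are lists of rows of integer samples (an illustrative
    representation, not Thor's data layout).
    """
    if len(block_a) != len(block_b) or any(
            len(row_a) != len(row_b) for row_a, row_b in zip(block_a, block_b)):
        raise ValueError("blocks must have equal dimensions")
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


print(sad([[10, 20], [30, 40]], [[12, 18], [30, 45]]))  # prints 9
```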

      QP: Quantisation Parameter

      BDR: Bjontegaard Delta-Rate

3.  The interpolation process

3.1.  Interpolation framework

   Consider two reference frames R0 and R1 and a frame F, equidistant
   in time between them, which is to be interpolated (Figure 1).  Image
   data has to be created for each block in F by combining information
   from R0 and R1 using a linear model of the block motion.

    ______________________________|_______|__________________________ R0
                                  |   /\  |
                                      /
                                     /
                                    / mv0
                                   /
                                  /
    ________________________|____/__|________________________________ F
                            |   /   |
                               /
                              /
                             / mv1
                            /
                           /
    __________________|___\/__|______________________________________ R1
                      |       |

            Figure 1: Forward and backward motion pairs for a block

   For each block B in the frame F, there is an associated motion
   vector mv0 pointing at a displaced block in R0 and a corresponding
   motion vector mv1 = -mv0 pointing at a displaced block in R1.

   Where F is not equidistant from the reference frames, the linear
   model can simply be scaled appropriately.
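   One way to read "scaled appropriately" is shown in the following
   Python sketch.  It assumes that F lies at temporal distance d0 from
   R0 and d1 from R1 and that motion is linear, so that
   mv1 = -(d1/d0) * mv0; the function name and the rounding of the
   result to integer pixels are illustrative assumptions.  In the
   equidistant case (d0 == d1) it reduces to mv1 = -mv0.

```python
from fractions import Fraction


def backward_mv(mv0, d0, d1):
    """Derive mv1 from mv0 under a linear motion model.

    mv0: integer-pixel (x, y) displacement from F into R0.
    d0, d1: temporal distances from F to R0 and from F to R1 (both > 0).
    The vector is scaled by -d1/d0 and rounded to the nearest integer
    pixel (ties to even); the rounding rule is an assumption.
    """
    scale = Fraction(-d1, d0)
    return tuple(round(c * scale) for c in mv0)


print(backward_mv((4, -2), 1, 1))  # equidistant case: (-4, 2)
print(backward_mv((4, -2), 1, 2))  # R1 twice as far away: (-8, 4)
```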

   If both displaced blocks fall within their reference frames, the
   interpolated block is simply the average of the two reference
   blocks.  At the frame edges, one of the reference blocks may fall
   outside the frame; in that case only the other reference block is
   used.
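   The averaging and edge fallback can be sketched in Python as
   follows.  Frames are assumed to be lists of rows of integer samples,
   the rounded average (a + b + 1) >> 1 is an assumed rounding rule,
   and treating a block as unusable if any part of it lies outside the
   frame is likewise an assumption; interpolate_block is a hypothetical
   name, not a Thor function.

```python
def interpolate_block(r0, r1, bx, by, size, mv0, mv1):
    """Build one size x size block of F at (bx, by) from R0 and R1.

    r0, r1: reference frames as lists of rows of integer samples.
    mv0, mv1: integer-pixel (x, y) vectors into R0 and R1.
    Returns None if neither displaced block lies inside its frame.
    """
    def fetch(ref, mv):
        x, y = bx + mv[0], by + mv[1]
        h, w = len(ref), len(ref[0])
        if x < 0 or y < 0 or x + size > w or y + size > h:
            return None  # displaced block falls off the frame edge
        return [row[x:x + size] for row in ref[y:y + size]]

    p0, p1 = fetch(r0, mv0), fetch(r1, mv1)
    if p0 is not None and p1 is not None:
        # Both references available: rounded average of the two blocks.
        return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
                for row0, row1 in zip(p0, p1)]
    # At most one reference available: use it alone (or None).
    return p0 if p0 is not None else p1
```

   For example, with R0 filled with 10 and R1 with 21, a block whose
   two displaced positions are both inside the frames is predicted as
   16 everywhere, while a block whose R0 position is off the left edge
   is predicted from R1 alone.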

3.2.  Motion estimation process

   Since F does not exist, the motion estimation process consists of
   matching blocks B+mv0 in R0 with blocks B+mv1 in R1.  A basic block
   size of 8x8 is used, but the bulk of the motion estimation is done
   for 16x16 blocks.  For UHD resolutions, a larger basic block size
   may be preferable.  The overall approach is to use hierarchical
   motion estimation (HME), as this is amenable to limiting both
   average and worst-case complexity.

   In the HME scheme, each reference frame is down-scaled vertically
   and horizontally by a factor of 2 using a (1/2,1/2) filter.  This is
   done repeatedly to obtain a series of reference frames R0(n) and
   R1(n).  Motion estimation is then done very simply on each
   resolution layer n, using candidates from the next layer n+1 (one
   level lower in resolution) as well as from spatial neighbours.  The
   block sizes are the same at each layer, so each block at layer n+1
   corresponds to 4 blocks at layer n.
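   The pyramid construction can be sketched in Python as below.  The
   separable (1/2,1/2) filter in both directions amounts to averaging
   each 2x2 group of samples; this sketch applies a single rounding to
   that 2x2 sum and drops any odd trailing row or column, both of which
   are assumptions rather than details from Thor.  The function names
   are hypothetical.

```python
def downscale(frame):
    """Halve a frame horizontally and vertically.

    Applying the (1/2, 1/2) filter in both directions averages each
    2x2 group of samples; one rounding step is used here.
    """
    h = len(frame) // 2 * 2     # drop an odd trailing row, if any
    w = len(frame[0]) // 2 * 2  # drop an odd trailing column, if any
    return [[(frame[y][x] + frame[y][x + 1] +
              frame[y + 1][x] + frame[y + 1][x + 1] + 2) >> 2
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def build_pyramid(frame, levels):
    """Return [layer 0 (full resolution), layer 1, ..., layer levels-1]."""
    layers = [frame]
    for _ in range(levels - 1):
        layers.append(downscale(layers[-1]))
    return layers
```

   Because each layer halves the resolution, a vector found for a
   block at layer n+1 would naturally be doubled before being used as a
   candidate for the corresponding 4 blocks at layer n.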

   For each layer, the ME stages are as follows:

      1.  For each 16x16 block in raster order:
          a.  Check whether ME can be bypassed.
          b.  If not bypassed, determine candidates from lower-layer
              blocks and from neighbouring blocks in raster order.
          c.  Perform an adaptive cross search around each candidate
              vector and determine the best vector.

      2.  For each 8x8 block in raster order, find the best merge
          candidate, i.e. choose which MV to use: the vector of its
          own 16x16 block, or one of the vectors of the 4 neighbouring
          16x16 blocks (above, below, left or right).

   The majority of blocks bypass ME at step 1a.  Here a skip candidate
   is generated as:

   skipmv = argmin{mvx in {mv0,mv1,mv2}: sum_{i=0}^{2} |mvx-mvi|}

   where mv0, mv1 and mv2 are the motion vectors of the blocks above,
   to the left of and above-right of the current block (here they
   denote neighbouring vectors, not the forward and backward vectors
   of Section 3.1).  The skip candidate is thus the vector median of
   the three neighbouring vectors.  If the cost of this vector is below
   a fixed threshold for each 8x8 sub-block, no further ME is done.
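   A Python sketch of the skip candidate and bypass test follows.  It
   assumes that |mvx-mvi| is the L1 distance |dx| + |dy|, that ties are
   broken in the order above, left, above-right, and that the per-8x8
   costs are computed elsewhere; the threshold value is not given in
   the text, and all names are hypothetical.

```python
def mv_l1(a, b):
    """|a - b| for two vectors, taken here as |dx| + |dy| (assumption)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def skip_candidate(mv_above, mv_left, mv_above_right):
    """Vector median of the three neighbouring vectors (step 1a)."""
    cands = [mv_above, mv_left, mv_above_right]
    # min() keeps the first of equal-cost candidates, fixing tie order.
    return min(cands, key=lambda mvx: sum(mv_l1(mvx, mvi) for mvi in cands))


def can_bypass(subblock_costs, threshold):
    """Bypass ME if every 8x8 sub-block cost is below the threshold."""
    return all(cost < threshold for cost in subblock_costs)


print(skip_candidate((0, 0), (2, 0), (10, 10)))  # (2, 0): the outlier is rejected
```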

   In step 1c, if the search is not at the lowest-resolution layer,
   the cross search is restricted to just 2 steps (at most 8 matches)
   for each candidate.  This is because candidate vectors from the
   lower layer or from neighbours will already be highly accurate by
   this point.
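   The text does not detail how the cross search adapts.  The Python
   sketch below assumes a greedy one-pixel cross pattern: each step
   tests the 4 vectors left, right, above and below the current best
   and moves to the cheapest if it improves, stopping early otherwise.
   With max_steps=2 this uses at most 8 matches, consistent with the
   limit above, but it is not necessarily Thor's search.  The cost
   callable and function name are hypothetical.

```python
def cross_search(cost, start, max_steps=2):
    """Greedy cross search around a candidate vector.

    cost: callable mapping an (x, y) vector to a matching cost.
    Returns (best vector, its cost, number of new matches evaluated).
    """
    best, best_cost = start, cost(start)
    matches = 0
    for _ in range(max_steps):
        x, y = best
        step_best, step_cost = best, best_cost
        for mv in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            c = cost(mv)
            matches += 1
            if c < step_cost:
                step_best, step_cost = mv, c
        if step_best == best:
            break  # no neighbour improves: local minimum reached
        best, best_cost = step_best, step_cost
    return best, best_cost, matches
```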

   In step 1, the cost metric is a combination of luma SAD and a fixed
   multiple of the sum of absolute motion vector differences between
   the vector mvx and the four neighbouring vectors mv0, mv1, mv2 and
   mv3 to the left, right, above and above-right, i.e. a fixed multiple
   of

   sum_{i=0}^{3} |mvx-mvi|

   This helps make the motion estimation process less sensitive to
   noise and spurious matches.
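   The step-1 cost can be sketched in Python as follows, for the
   equidistant case where a candidate mvx in R0 is paired with -mvx in
   R1.  The weight value is not given in the text, the L1 vector
   distance is an assumption, and the function names are hypothetical.

```python
def bilateral_sad(r0, r1, bx, by, size, mv):
    """Luma SAD between block B+mv in R0 and block B-mv in R1."""
    x0, y0 = bx + mv[0], by + mv[1]
    x1, y1 = bx - mv[0], by - mv[1]
    for ref, x, y in ((r0, x0, y0), (r1, x1, y1)):
        if x < 0 or y < 0 or x + size > len(ref[0]) or y + size > len(ref):
            raise ValueError("displaced block lies outside the frame")
    return sum(abs(r0[y0 + j][x0 + i] - r1[y1 + j][x1 + i])
               for j in range(size) for i in range(size))


def step1_cost(r0, r1, bx, by, size, mvx, neighbours, weight):
    """SAD plus a fixed multiple of the MV-difference penalty.

    neighbours: vectors to the left, right, above and above-right.
    weight: the fixed multiplier (an illustrative parameter).
    """
    penalty = sum(abs(mvx[0] - n[0]) + abs(mvx[1] - n[1])
                  for n in neighbours)
    return bilateral_sad(r0, r1, bx, by, size, mvx) + weight * penalty
```

   A larger weight favours vectors that agree with their neighbours,
   which is what makes the search less sensitive to noise.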

   In step 2, the cost metric is SAD alone.
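   Step 2 can be sketched in Python as a choice among at most 5
   vectors by SAD alone.  The sad_for_mv callable (for example a
   bilateral 8x8 SAD), the preference for the block's own vector on
   ties, and the omission of neighbours missing at frame edges are all
   assumptions; best_merge is a hypothetical name.

```python
def best_merge(sad_for_mv, own_mv, neighbour_mvs):
    """Choose the MV for one 8x8 block by SAD alone.

    own_mv: vector of the 16x16 block containing this 8x8 block.
    neighbour_mvs: vectors of the 16x16 blocks above, below, left and
    right (neighbours missing at frame edges are simply omitted).
    """
    candidates = [own_mv] + list(neighbour_mvs)
    # min() returns the first minimum, so own_mv wins ties.
    return min(candidates, key=sad_for_mv)
```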

3.3.  Complexity considerations

   The ME process is not very sensitive to the choice of candidates, at
   least in terms of its impact on coding performance.  This might not
   be so if the interpolated frames were used directly, but in effect
   interpolated blocks are only used for prediction where they are
   interpolated well.  Effort spent refining bad matches is therefore
   generally wasted and should be avoided.

   This means that the ME process can be heavily truncated.  The only
   candidates considered are up to three neighbouring block candidates
   and one from the layer below.  For the majority of blocks, motion
   estimation is skipped and so requires only a single match.  For
   hardware implementations, a hard limit on the total number of
   matches per frame would still be required, as well as limits on the
   number of matches per block and possibly per region.  Vertical
   motion vector limits could also be imposed to reduce memory
   bandwidth costs.
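   The hard limits mentioned above could be enforced with a simple
   match budget, sketched below in Python.  MatchBudget is a
   hypothetical class, the limit values are illustrative (the text
   gives none), and a per-region limit, which would follow the same
   pattern, is omitted for brevity.

```python
class MatchBudget:
    """Hypothetical hard limits on block matches for one frame."""

    def __init__(self, per_frame, per_block, max_vertical):
        self.per_frame = per_frame
        self.per_block = per_block
        self.max_vertical = max_vertical
        self.frame_used = 0
        self.block_used = 0

    def start_block(self):
        """Reset the per-block counter before searching a new block."""
        self.block_used = 0

    def try_match(self, mv):
        """Return True and consume budget if a match at mv is allowed."""
        if abs(mv[1]) > self.max_vertical:
            return False  # vertical MV limit bounds memory bandwidth
        if self.frame_used >= self.per_frame or self.block_used >= self.per_block:
            return False  # frame or block budget exhausted
        self.frame_used += 1
        self.block_used += 1
        return True
```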

4.  Coding using interpolated reference frames

   In the Thor implementation, when an interpolated reference frame is
   used, it is inserted at the beginning of the reference picture list
   and is given the same frame number as the current frame.  Typically,
   use of the interpolated reference frame causes a considerable
   increase in uni-prediction, often with no residual to code, and a
   reduction in bi-prediction modes.  This changes the probabilities of
   the various supermode values used in Thor, so in such frames the
   supermode coding is modified to reflect this, which contributes a
   small amount to the coding gain.  Full details are in [Fuld1].
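   The reference list handling described above can be sketched in
   Python as follows.  RefFrame and build_ref_list are hypothetical
   names used for illustration only; they are not Thor's data
   structures, and the supermode coding changes are not modelled.

```python
from dataclasses import dataclass


@dataclass
class RefFrame:
    frame_num: int
    interpolated: bool = False


def build_ref_list(current_frame_num, decoded_refs, use_interp):
    """Prepend the interpolated reference, numbered as the current frame."""
    refs = list(decoded_refs)
    if use_interp:
        refs.insert(0, RefFrame(current_frame_num, interpolated=True))
    return refs
```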

5.  Compression performance

   Luma PSNR BDR results (in percent) for the standard QP set
   (22,27,32,37) are given in Table 1.  Results for high QPs
   (32,36,40,44) are given in Table 2.  Negative values indicate
   bitrate reductions.

  -------------------------------------------------------
  Sequence                                        BDR (%)
  -------------------------------------------------------
  1920x1080
  -------------------------------------------------------
  Kimono                                             -3.5
  ParkScene                                          -3.1
  Cactus                                             -4.9
  BasketballDrive                                    -2.1
  BQTerrace                                          -1.9
  ChangeSeats                                        -5.8
  HeadAndShoulder                                    -6.6
  TelePresence                                       -6.6
  WhiteBoard                                         -7.5
  -------------------------------------------------------
  1280x720
  -------------------------------------------------------
  FourPeople                                         -7.0
  Johnny                                             -6.2
  KristenAndSara                                     -7.0
  -------------------------------------------------------
  Average                                            -5.2

        Table 1: BDR reductions for standard QPs

  -------------------------------------------------------
  Sequence                                        BDR (%)
  -------------------------------------------------------
  1920x1080
  -------------------------------------------------------
  Kimono                                             -6.6
  ParkScene                                          -7.0
  Cactus                                             -8.9
  BasketballDrive                                    -5.5
  BQTerrace                                          -4.7
  ChangeSeats                                       -12.1
  HeadAndShoulder                                   -10.1
  TelePresence                                      -11.0
  WhiteBoard                                        -12.4
  -------------------------------------------------------
  1280x720
  -------------------------------------------------------
  FourPeople                                         -9.1
  Johnny                                             -8.0
  KristenAndSara                                     -9.9
  -------------------------------------------------------
  Average                                            -8.8

        Table 2: BDR reductions for high QPs

6.  IANA Considerations

   This document has no IANA considerations.

7.  Security Considerations

   This document has no security considerations.

8.  Acknowledgements

   The author would like to thank Arild Fuldseth for assistance with
   experimental investigations, and Mo Zanaty for reviewing this
   document.

9.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [Fuld1]    Fuldseth, A., Bjontegaard, G., and M. Zanaty, "The Thor
              video codec", draft-fuldseth-netvc-thor-01 (work in
              progress), October 2015.

Author's Address

   Thomas Davies
   Cisco
   Feltham
   UK

   Email: thdavies@cisco.com