Network Working Group                                          T. Davies
Internet-Draft                                                     Cisco
Intended status: Standards Track                        October 19, 2015
Expires: January 7, 2016


             Interpolated reference frames for video coding
                      draft-davies-netvc-irfvc-00

Abstract

   This document describes the use of interpolated reference frames in
   video coding in general, and in the Thor video codec in particular.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 7, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Requirements Language
     2.2.  Terminology
   3.  The interpolation process
     3.1.  Interpolation framework
     3.2.  Motion estimation process
     3.3.  Complexity considerations
   4.
       Coding using interpolated reference frames
   5.  Compression performance
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  Normative References
   Author's Address

1.  Introduction

   This document describes a method of generating synthetic reference
   frames for video coding using a simplified frame interpolation
   method.  The aim is to create a reference frame that is temporally
   co-located with the current frame being predicted, leveraging the
   motion information already present in the previously coded frames
   and removing the need for techniques such as motion vector scaling
   in motion vector prediction.

   Since the decoder has to generate the same interpolated reference
   frame as the encoder, complexity is a paramount concern.  The
   interpolation process is therefore a highly simplified block-matching
   algorithm that, for example, uses only pixel-accurate motion vectors.
   Worst-case complexity can be managed by limiting the number of
   matches per block, per region and per frame, and by limiting the
   total vertical excursion to manage memory bandwidth.

   The method gives the most gain in Thor at high quantisation
   parameter (QP) levels, i.e. at low bitrates.  Over the QP range
   22-37, Bjontegaard delta-rate (BDR) reductions average 5.2% across a
   range of HD test sequences.  For higher QPs (32-44) the reductions
   are larger: 8.8% on average.

   Interpolated reference frames are enabled by default in the high
   complexity random access (RA) and High Delay B (HDB) configurations
   in the Thor repository github.com/cisco/thor.

   Section 3 describes the interpolation process, which is based on
   a simplified hierarchical motion estimation (HME).  Section 4
   describes the modifications to the Thor syntax coding processes.
   Section 5 provides details of compression performance.

2.  Definitions

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.2.  Terminology

   This document frequently uses the following terms.

   MV:  Motion vector - a horizontal and vertical vector displacement
      (x,y)

   ME:  Motion estimation

   HME:  Hierarchical ME

   SAD:  Sum of Absolute Differences.  A metric defined for a pair of
      equal-dimension blocks of numerical values, consisting of the sum
      of the absolute differences of the corresponding values at each
      location in the blocks

   QP:  Quantisation parameter

   BDR:  Bjontegaard Delta-Rate

3.  The interpolation process

3.1.  Interpolation framework

   Consider two frames R0 and R1 and a frame F equidistant in time
   between them which is to be interpolated (Figure 1).  Image data must
   be created for each block in F by combining information from R0 and
   R1 using a linear model for the block motion.

   ______________________________|_______|__________________________ R0
                                 |    /\ |
                                     /
                                    /
                                   /  mv0
                                  /
                                 /
   ________________________|____/__|________________________________ F
                           |   /   |
                              /
                             /
                            /  mv1
                           /
                          /
   __________________|___\/__|______________________________________ R1
                     |       |

        Figure 1: forward and backward motion pairs for a block

   For each block in the frame F there is an associated motion vector
   mv0 pointing at a displaced block in R0 and a corresponding motion
   vector mv1, equal to -mv0, pointing at a displaced block in R1.

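   As a rough illustration of the paired vectors described above, the
   following Python sketch computes the two displaced block positions
   for a block in F and checks that, for an object moving linearly, the
   two referenced blocks hold the same content.  It is not taken from
   the Thor source: the function names, the frame dimensions and the
   toy frames are all assumptions made for this example.

```python
# Illustrative sketch only: function names, frame sizes and the toy
# frames are assumptions, not taken from the Thor source code.

def paired_block_positions(bx, by, mv0):
    """Top-left positions of the blocks referenced in R0 and R1 for a
    block at (bx, by) in F, using mv1 = -mv0 (F equidistant in time)."""
    mvx, mvy = mv0
    mv1 = (-mvx, -mvy)
    pos_r0 = (bx + mvx, by + mvy)
    pos_r1 = (bx + mv1[0], by + mv1[1])
    return pos_r0, pos_r1


def get_block(frame, x, y, size):
    """Extract a size x size block whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def make_frame(width, height, obj_x, obj_y, size):
    """Black frame containing a single bright size x size square."""
    frame = [[0] * width for _ in range(height)]
    for y in range(obj_y, obj_y + size):
        for x in range(obj_x, obj_x + size):
            frame[y][x] = 255
    return frame


# A 2x2 object that sits 4 pixels further right in R1 than in R0.
r0 = make_frame(16, 8, obj_x=2, obj_y=2, size=2)
r1 = make_frame(16, 8, obj_x=6, obj_y=2, size=2)

# Under linear motion the object sits half-way (x = 4) in F, so the
# block at (4, 2) in F uses mv0 = (-2, 0) and hence mv1 = (+2, 0).
pos_r0, pos_r1 = paired_block_positions(4, 2, mv0=(-2, 0))
block_r0 = get_block(r0, pos_r0[0], pos_r0[1], 2)
block_r1 = get_block(r1, pos_r1[0], pos_r1[1], 2)
```

   Because the vector pair is symmetric about F, a single vector per
   block describes both references, and a good pair is one for which
   the two displaced blocks agree.
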
   Where F is not equidistant from the reference frames, the linear
   model can simply be scaled appropriately.

   If both blocks fall within the reference frames, then the
   interpolated block is just the average of the two reference blocks.
   At the edges of the frames one of the reference blocks may fall off
   the edge, in which case only the other reference block is used.

3.2.  Motion estimation process

   Since F does not exist, the motion estimation process consists of
   matching blocks B+mv0 in R0 with blocks B+mv1 in R1.  A basic block
   size of 8x8 is used, but the bulk of the motion estimation is done
   for 16x16 blocks.  For UHD resolutions, a larger basic block size
   might be better.  The overall approach is to use hierarchical motion
   estimation (HME), as this is amenable to limiting both average and
   worst-case complexity.

   In the HME scheme each reference frame is down-scaled vertically and
   horizontally by a factor of 2, using a (1/2,1/2) filter.  This is
   done repeatedly to get a series R0(n) and R1(n) of reference frames.
   Motion estimation is then done very simply on each resolution layer
   n, using candidates from the next layer (n+1) as well as from
   spatial neighbours.  The block sizes are the same at each layer, so
   each block at layer n+1 corresponds to 4 blocks at layer n.

   For each layer, the ME stages are as follows:

   1.  For each 16x16 block in raster order:
       a.  Check if ME can be bypassed.
       b.  If not bypassed, determine candidates from lower layer
           blocks and from neighbour blocks in raster order.
       c.  Perform an adaptive cross search around each candidate
           vector and determine the best vector.

   2.  For each 8x8 block in raster order, find the best merge
       candidate, i.e. choose which MV to use: the original 16x16
       block vector, or one of the 4 neighbouring 16x16 block vectors
       (above, below, left or right).

   The majority of blocks bypass ME at step 1a.
   Here a skip candidate is generated as:

      skipmv = argmin{mvx in {mv0,mv1,mv2}: sum_{i=0}^{2} |mvx-mvi|}

   where mv0, mv1 and mv2 are the motion vectors of the blocks above,
   to the left of, and above-right of the current block.  If the cost
   for this vector is below a fixed value for each 8x8 sub-block, no
   further ME is done.

   In step 1c, the range of the cross search is restricted to just 2
   steps (max 8 matches) for each candidate if the search is not at
   the lowest resolution layer.  This is because vector candidates from
   the lower layer or from neighbours will already be highly accurate by
   this point.

   In step 1, the cost metric is a combination of luma SAD and a fixed
   multiple of the sum of absolute motion vector differences between
   the vector mvx and the vectors mv0, mv1, mv2 and mv3 of the four
   neighbours to the left, right, above and above-right, i.e.

      sum_{i=0}^{3} |mvx-mvi|

   This helps make the motion estimation process less sensitive to noise
   and spurious matches.

   In step 2 the cost metric is SAD alone.

3.3.  Complexity considerations

   The ME process is not very sensitive to the selection of candidates,
   at least in terms of the impact on coding performance.  If the
   interpolated frames were used directly this might not be so, but in
   effect the interpolated blocks are only going to be used for
   prediction if they are interpolated well.  Effort spent refining bad
   matches is therefore generally wasted and should be avoided.

   This means that the ME process can be quite truncated.  The only
   candidates considered are up to three neighbour block candidates and
   one from the layer below.  For the majority of blocks motion
   estimation is bypassed, so only a single match is required.  For
   hardware applications the total number of matches would still
   require a hard limit, as well as limits on the matches per block and
   possibly per region.
   Vertical motion vector limits could also be imposed to reduce memory
   bandwidth costs.

4.  Coding using interpolated reference frames

   In the Thor implementation, when an interpolated reference frame is
   used it is inserted at the beginning of the reference picture list
   and is given the same frame number as the current frame.  Typically,
   use of the interpolated reference frame causes a considerable
   increase in uni-prediction, often with no residual to code, and a
   reduction in bi-prediction modes.  This changes the probability of
   the various supermode values used in Thor.  In such frames it
   therefore makes sense to modify the supermode coding to reflect
   this, which contributes a small amount to the coding gains.  Full
   details are in [Fuld1].

5.  Compression performance

   Luma PSNR BDR percentage changes for the standard QP set (22, 27, 32,
   37) are given in Table 1.  For the high QP set (32, 36, 40, 44), the
   results are in Table 2.  Negative values indicate a bitrate
   reduction.

   -------------------------------------------------------
   Sequence             BDR (%)
   -------------------------------------------------------
   1920x1080
   -------------------------------------------------------
   Kimono                  -3.5
   ParkScene               -3.1
   Cactus                  -4.9
   BasketballDrive         -2.1
   BQTerrace               -1.9
   ChangeSeats             -5.8
   HeaAndShoulder          -6.6
   TelePresence            -6.6
   WhiteBoard              -7.5
   -------------------------------------------------------
   1280x720
   -------------------------------------------------------
   FourPeople              -7.0
   Johnny                  -6.2
   KristenAndSara          -7.0
   -------------------------------------------------------
   Average                 -5.2

              Table 1: BDR reductions for standard QPs

   -------------------------------------------------------
   Sequence             BDR (%)
   -------------------------------------------------------
   1920x1080
   -------------------------------------------------------
   Kimono                  -6.6
   ParkScene               -7.0
   Cactus                  -8.9
   BasketballDrive         -5.5
   BQTerrace               -4.7
   ChangeSeats            -12.1
   HeaAndShoulder         -10.1
   TelePresence           -11.0
   WhiteBoard             -12.4
   -------------------------------------------------------
   1280x720
   -------------------------------------------------------
   FourPeople              -9.1
   Johnny                  -8.0
   KristenAndSara          -9.9
   -------------------------------------------------------
   Average                 -8.8

                Table 2: BDR reductions for high QPs

6.  IANA Considerations

   This document has no IANA considerations.

7.  Security Considerations

   This document has no security considerations.

8.  Acknowledgements

   The author would like to thank Arild Fuldseth for assistance with
   experimental investigations, and Mo Zanaty for reviewing this
   document.

9.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [Fuld1]    Fuldseth, A., Bjontegaard, G., and M. Zanaty, "The Thor
              video codec", draft-fuldseth-netvc-thor-01, October 2015.

Author's Address

   Thomas Davies
   Cisco
   Feltham
   UK

   Email: thdavies@cisco.com