Network Working Group                                      S. Midtskogen
Internet-Draft                                               A. Fuldseth
Intended status: Standards Track                               M. Zanaty
Expires: October 7, 2016                                           Cisco
                                                           April 5, 2016


                      Constrained Low Pass Filter
                     draft-midtskogen-netvc-clpf-02

Abstract

   This document describes a low complexity filtering technique which is
   being used as a low pass loop filter in the Thor video codec.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 7, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Requirements Language
     2.2.  Terminology
   3.  Filtering Process
   4.  Further complexity considerations
   5.  Performance
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   Modern video coding standards such as Thor [I-D.fuldseth-netvc-thor]
   include in-loop filters which correct artifacts introduced in the
   encoding process.  Thor includes a deblocking filter which corrects
   artifacts introduced by the block based nature of the encoding
   process, and a low pass filter correcting artifacts not corrected by
   the deblocking filter, in particular artifacts introduced by
   quantisation errors of transform coefficients and by the
   interpolation filter.  Since in-loop filters have to be applied in
   both the encoder and decoder, it is highly desirable that these
   filters have low computational complexity.

2.  Definitions

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.2.  Terminology

   This document will refer to a pixel X and six of its neighbouring
   pixels A, B, C, D, E, F ordered in the following pattern.

                         +---+---+---+---+---+
                         |   |   | A |   |   |
                         +---+---+---+---+---+
                         | B | C | X | D | E |
                         +---+---+---+---+---+
                         |   |   | F |   |   |
                         +---+---+---+---+---+

                    Figure 1: Filter pixel positions

   In Thor the frames are divided into filter blocks (FB) of 128x128,
   64x64 or 32x32 pixels.  The size is signalled for each frame to be
   filtered.  Also, each frame is divided into coding blocks (CB) which
   range from 8x8 to 128x128 independent of the FB size.  The filter
   described in this draft can be switched on or off for the entire
   frame or optionally on or off for each FB.  CBs that have been coded
   using the skip mode are not filtered, and if an FB only contains CBs
   that have been coded in skip mode, the FB will not be filtered and no
   signal will be transmitted for this FB.

   If the frame dimensions are not a multiple of the FB size, the FBs at
   the right and bottom edges are clipped to fit.  For instance, if the
   frame resolution is 1920x1080 and the FB size is 128x128, the size of
   the FBs at the bottom of the frame becomes 128x56, since
   1080 = 8*128 + 56.

3.  Filtering Process

   Given a pixel X and its neighbouring pixels described above we can
   define a general non-linear filter as:

   X' = X + clip(a*clip(A-X,-s,s) + b*clip(B-X,-s,s) +
                 c*clip(C-X,-s,s) + d*clip(D-X,-s,s) +
                 e*clip(E-X,-s,s) + f*clip(F-X,-s,s), -g, g)

                          Figure 2: Equation 1

   Here clip(v,lo,hi) limits v to the range lo to hi, a to f are the
   weights of the six neighbours, s is the filter strength and g bounds
   the total correction.

   If a neighbour pixel is outside the image frame, it is given the same
   value as the closest pixel within the frame.  To avoid dependencies
   prohibiting parallel processing, all neighbour pixels must be the
   unfiltered pixels of the frame being filtered.

   Experiments in Thor have shown that a good compromise between
   complexity and performance is a=f=1/4, b=e=1/16, c=d=3/16 and the
   filter strength s being 1, 2 or 4 signalled at frame level.  Because
   these coefficients sum to 1, the magnitude of the correction can
   never exceed s, which eliminates the need for the outer clipping to
   +/-g.  The result is rounded to the nearest integer.
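
   The border rule and the requirement to read only unfiltered pixels
   can be illustrated with a minimal C sketch.  It is not taken from
   the Thor source.  The names clip(), pixel_at() and
   clpf_plane_sketch(), the 8 bit sample type and the stride parameter
   are assumptions made for illustration, and skip mode and FB
   signalling are left out.  The rounding follows the "nearest integer"
   rule above, with halves rounded away from zero as one possible
   interpretation.

```c
#include <assert.h>
#include <stdint.h>

/* Limit v to the range [lo, hi]. */
static int clip(int v, int lo, int hi)
{
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Read an unfiltered pixel; coordinates outside the frame are moved to
   the closest position inside it, which replicates the edge pixel. */
static int pixel_at(const uint8_t *src, int width, int height,
                    int stride, int x, int y)
{
  return src[clip(y, 0, height - 1) * stride + clip(x, 0, width - 1)];
}

/* Hypothetical helper, not taken from Thor: filter a whole 8 bit plane
   with strength s.  src and dst must not overlap, so every neighbour
   is read from the unfiltered frame. */
static void clpf_plane_sketch(const uint8_t *src, uint8_t *dst,
                              int width, int height, int stride, int s)
{
  assert(s == 1 || s == 2 || s == 4);
  for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
      int X = pixel_at(src, width, height, stride, x, y);
      int A = pixel_at(src, width, height, stride, x, y - 1);
      int B = pixel_at(src, width, height, stride, x - 2, y);
      int C = pixel_at(src, width, height, stride, x - 1, y);
      int D = pixel_at(src, width, height, stride, x + 1, y);
      int E = pixel_at(src, width, height, stride, x + 2, y);
      int F = pixel_at(src, width, height, stride, x, y + 1);
      /* Weights a=f=4/16, b=e=1/16, c=d=3/16, kept as integers. */
      int sum = 4 * clip(A - X, -s, s) + clip(B - X, -s, s) +
                3 * clip(C - X, -s, s) + 3 * clip(D - X, -s, s) +
                clip(E - X, -s, s) + 4 * clip(F - X, -s, s);
      /* |sum| <= 16*s, so the rounded sum/16 is already within
         [-s, s] and no outer clip to +/-g is applied.  Halves round
         away from zero; >> is assumed to be an arithmetic shift. */
      int delta = (8 + sum - (sum < 0)) >> 4;
      dst[y * stride + x] = (uint8_t)(X + delta);
    }
  }
}
```

   For a 5x3 plane in which every pixel is 10 except a centre pixel of
   20, calling clpf_plane_sketch() with s=2 changes the centre pixel to
   18 and leaves the corner pixels at 10.  Writing to a separate output
   buffer is one simple way to guarantee that neighbours are unfiltered;
   an implementation could instead buffer a few lines.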

   With these coefficients, Equation 1 becomes:

   X' = X + (4*clip(A-X,-s,s) + clip(B-X,-s,s) + 3*clip(C-X,-s,s) +
             3*clip(D-X,-s,s) + clip(E-X,-s,s) + 4*clip(F-X,-s,s)) / 16

                          Figure 3: Equation 2

   It can be noted that a=c=d=f=1/4, b=e=0 and s=1 give a slightly
   simpler filter which is very similar to the one described in the
   first version of this draft.

   The filter leaves the encoder 13 different choices for a frame.  The
   filter can be disabled for the entire frame (1 choice), or the frame
   is filtered with one of three strengths (1, 2 or 4), either without
   FB signalling (3 choices) or with FB signalling for non-skip FBs and
   one of three FB sizes (32x32, 64x64 or 128x128; 9 choices).  Note
   that the FB size only matters when FB signalling is in use.

   The decisions at both frame level and FB level may be based on rate-
   distortion optimisation (RDO), but an encoder running in a low-
   complexity mode, or possibly a low-delay mode, may instead assume
   that a fixed mode will be beneficial.  In general, using s=2, a QP
   dependent FB size and RDO only at the FB level gives good results.

   However, because of the low complexity of the filter, fully RDO-based
   decisions are not costly.  The distortion of the 13 configurations of
   the filter can easily be computed in a single pass by keeping track
   of the distortions of the three different strengths and the bit costs
   for different FB sizes.

   The filter is applied after the deblocking filter.

4.  Further complexity considerations

   The filter has been designed to offer the best compromise between low
   complexity and performance.
   A single pixel can be filtered with simple operations as illustrated
   by this C function:

   /* Assumed helper, not defined here: limits v to [lo, hi] */
   static int clip(int v, int lo, int hi)
   { return v < lo ? lo : v > hi ? hi : v; }

   /* Returns the rounded correction; the filtered pixel is X + result */
   int clpf_sample(int X, int A, int B, int C, int D, int E, int F,
                   int s)
   {
     int delta =
      4*clip(A - X, -s, s) + clip(B - X, -s, s) + 3*clip(C - X, -s, s) +
      3*clip(D - X, -s, s) + clip(E - X, -s, s) + 4*clip(F - X, -s, s);
     return (8 + delta - (delta < 0)) >> 4; // Assumes arithmetic shift
   }

                            Figure 4: C code

   Also, these operations are easily vectorised in architectures
   supporting SIMD instructions, such as x86/SSE4 and ARM/NEON.  The
   pixel difference is 9 bit, but it can be computed by adding an 8 bit
   offset and using 8 bit saturated signed subtraction.  This means
   that 16 pixels per core can be filtered in parallel on these
   architectures.  Clipping at frame borders can be implemented using
   shuffle instructions.

   A C implementation using x86/SSE4 intrinsics required 6.8
   instructions per pixel to filter a single 8x8 block.  The
   corresponding number for ARM/NEON (armv7) was 4.9.  The compiler was
   gcc 4.8.4 in both cases.

   Since the filter only needs to look up pixels in the line directly
   above and below the pixel to be filtered, the line buffer requirement
   in hardware implementations is very low.

5.  Performance

   The table below shows the filter's effect on the bandwidth for a
   selection of 10 second video sequences encoded in Thor with
   uni-prediction only.  The numbers have been computed using the
   Bjontegaard Delta Rate (BDR).  BDR-low and BDR-high indicate the
   effect at low and high bitrates, respectively, as described in
   [BDR].

   The effect of the filter was tested in two low-delay encoder
   configurations: high complexity, in which the encoder strongly
   favours compression efficiency over CPU usage, and medium complexity,
   which is more suited for real-time applications.  The bandwidth
   reduction is somewhat less in the high complexity configuration.

   +----------------+--------------------+--------------------+
   |                |  MEDIUM COMPLEXITY |   HIGH COMPLEXITY  |
   +----------------+------+------+------+------+------+------+
   |                |      | BDR- | BDR- |      | BDR- | BDR- |
   |Sequence        |  BDR |  low | high |  BDR |  low | high |
   +----------------+------+------+------+------+------+------+
   |Kimono          | -2.7%| -2.3%| -3.4%| -1.9%| -1.8%| -2.0%|
   |BasketballDrive | -3.3%| -2.5%| -4.5%| -2.1%| -1.6%| -3.0%|
   |BQTerrace       | -7.2%| -4.9%| -9.1%| -5.5%| -3.7%| -6.7%|
   |FourPeople      | -5.7%| -3.9%| -8.6%| -4.0%| -2.8%| -6.0%|
   |Johnny          | -5.9%| -4.0%| -9.0%| -4.7%| -4.0%| -5.8%|
   |ChangeSeats     | -6.4%| -3.4%|-10.8%| -4.5%| -2.8%| -6.8%|
   |HeadAndShoulder | -8.6%| -2.6%|-18.8%| -5.8%| -2.2%|-11.1%|
   |TelePresence    | -5.9%| -3.1%|-10.7%| -4.0%| -2.0%| -7.0%|
   +----------------+------+------+------+------+------+------+
   |Average         | -5.7%| -3.3%| -9.4%| -4.0%| -2.6%| -6.0%|
   +----------------+------+------+------+------+------+------+

         Figure 5: Compression Performance without Biprediction

   While the filter objectively performs better at relatively high
   bitrates, the subjective effect seems better at relatively low
   bitrates, and overall the subjective effect seems better than what
   the objective numbers suggest.

   If biprediction is allowed, there is generally less bandwidth
   reduction as the table below shows.  These results reflect low-delay
   biprediction without frame reordering.

   +----------------+--------------------+--------------------+
   |                |  MEDIUM COMPLEXITY |   HIGH COMPLEXITY  |
   +----------------+------+------+------+------+------+------+
   |                |      | BDR- | BDR- |      | BDR- | BDR- |
   |Sequence        |  BDR |  low | high |  BDR |  low | high |
   +----------------+------+------+------+------+------+------+
   |Kimono          | -2.2%| -1.8%| -2.7%| -1.4%| -1.3%| -1.5%|
   |BasketballDrive | -2.6%| -2.5%| -2.7%| -1.4%| -1.6%| -1.1%|
   |BQTerrace       | -4.1%| -3.1%| -4.7%| -2.7%| -2.7%| -2.5%|
   |FourPeople      | -4.0%| -2.9%| -5.3%| -2.7%| -1.9%| -3.4%|
   |Johnny          | -3.5%| -2.7%| -4.6%| -2.2%| -1.6%| -3.1%|
   |ChangeSeats     | -4.2%| -3.0%| -6.1%| -2.6%| -2.0%| -3.2%|
   |HeadAndShoulder | -4.1%| -2.9%| -6.1%| -2.3%| -1.8%| -2.8%|
   |TelePresence    | -2.8%| -1.9%| -4.3%| -1.6%| -1.2%| -2.1%|
   +----------------+------+------+------+------+------+------+
   |Average         | -3.4%| -2.6%| -4.6%| -2.1%| -1.9%| -2.5%|
   +----------------+------+------+------+------+------+------+

           Figure 6: Compression Performance with Biprediction

6.  IANA Considerations

   This document has no IANA considerations yet.  TBD

7.  Security Considerations

   This document has no security considerations yet.  TBD

8.  Acknowledgements

   The authors would like to thank Gisle Bjontegaard for reviewing this
   document and design, and providing constructive feedback and
   direction.

9.  References

9.1.  Normative References

   [I-D.fuldseth-netvc-thor]
              Fuldseth, A., Bjontegaard, G., Midtskogen, S., Davies, T.,
              and M. Zanaty, "Thor Video Codec", draft-fuldseth-netvc-
              thor-02 (work in progress), March 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

9.2.  Informative References

   [BDR]      Bjontegaard, G., "Calculation of average PSNR differences
              between RD-curves", ITU-T SG16 Q6 VCEG-M33, April 2001.

Authors' Addresses

   Steinar Midtskogen
   Cisco
   Lysaker
   Norway

   Email: stemidts@cisco.com


   Arild Fuldseth
   Cisco
   Lysaker
   Norway

   Email: arilfuld@cisco.com


   Mo Zanaty
   Cisco
   RTP, NC
   USA

   Email: mzanaty@cisco.com