Network Working Group                                        A. Fuldseth
Internet-Draft                                            G. Bjontegaard
Intended status: Standards Track                           S. Midtskogen
Expires: May 4, 2017                                           T. Davies
                                                                      M.
Zanaty
                                                                   Cisco
                                                        October 31, 2016


                            Thor Video Codec
                      draft-fuldseth-netvc-thor-03

Abstract

   This document provides a high-level description of the Thor video
   codec. Thor is designed to achieve high compression efficiency with
   moderate complexity, using the well-known hybrid video coding
   approach of motion-compensated prediction and transform coding.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 4, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Requirements Language
     2.2.  Terminology
   3.  Block Structure
     3.1.  Super Blocks and Coding Blocks
     3.2.  Special Processing at Frame Boundaries
     3.3.  Transform Blocks
     3.4.  Prediction Blocks
   4.  Intra Prediction
   5.  Inter Prediction
     5.1.  Multiple Reference Frames
     5.2.  Bi-Prediction
     5.3.  Improved chroma prediction
     5.4.  Reordered Frames
     5.5.  Interpolated Reference Frames
     5.6.  Sub-Pixel Interpolation
       5.6.1.  Luma Poly-phase Filter
       5.6.2.  Luma Special Filter Position
       5.6.3.  Chroma Poly-phase Filter
     5.7.  Motion Vector Coding
       5.7.1.  Inter0 and Inter1 Modes
       5.7.2.  Inter2 and Bi-Prediction Modes
       5.7.3.  Motion Vector Direction
   6.  Transforms
   7.  Quantization
     7.1.  Quantization matrices
       7.1.1.  Quantization matrix selection
       7.1.2.  Quantization matrix design
   8.  Loop Filtering
     8.1.  Deblocking
       8.1.1.  Luma deblocking
       8.1.2.
Chroma Deblocking
     8.2.  Constrained Low Pass Filter (CLPF)
   9.  Entropy coding
     9.1.  Overview
     9.2.  Low Level Syntax
       9.2.1.  CB Level
       9.2.2.  PB Level
       9.2.3.  TB Level
       9.2.4.  Super Mode
       9.2.5.  CBP
       9.2.6.  Transform Coefficients
   10. High Level Syntax
     10.1.  Sequence Header
     10.2.  Frame Header
   11. IANA Considerations
   12. Security Considerations
   13. Normative References
   Authors' Addresses

1.  Introduction

   This document provides a high-level description of the Thor video
   codec. Thor is designed to achieve high compression efficiency with
   moderate complexity, using the well-known hybrid video coding
   approach of motion-compensated prediction and transform coding.

   The Thor video codec is a block-based hybrid video codec similar in
   structure to widespread standards. The high level encoder and
   decoder structures are illustrated in Figure 1 and Figure 2
   respectively.

              +---+   +-----------+   +-----------+   +--------+
   Input--+-->| + |-->| Transform |-->| Quantizer |-->| Entropy|
   Video  |   +---+   +-----------+   +-----------+   | Coding |
          |     ^ -                         |         +--------+
          |     |                           v             |
          |     |                     +-----------+       v
          |     |                     |  Inverse  |     Output
          |     |                     | Transform |    Bitstream
          |     |                     +-----------+
          |     |                           |
          |     |                           v
          |     |                         +---+
          |     +------------------------>| + |
          |     |      +-------------+    +---+
          |     |   ___| Intra Frame |      |
          |     |  /   | Prediction  |<-----+
          |     | /    +-------------+      |
          |     |/                          v
          |      \     +-------------+ +---------+
          |       \    | Inter Frame | |  Loop   |
          |        \___| Prediction  | | Filters |
          |            +-------------+ +---------+
          |                   ^             |
          |                   |             v
          |            +------------+   +---------------+
          |            |   Motion   |   | Reconstructed |
          +----------->| Estimation |<--| Frame Memory  |
                       +------------+   +---------------+

                       Figure 1: Encoder Structure

                 +----------+      +-----------+
   Input ------->| Entropy  |----->|  Inverse  |
   Bitstream     | Decoding |      | Transform |
                 +----------+      +-----------+
                                         |
                                         v
                                       +---+
             +------------------------>| + |
             |      +-------------+    +---+
             |   ___| Intra Frame |      |
             |  /   | Prediction  |<-----+
             | /    +-------------+      |
             |/                          v
              \     +-------------+ +---------+
               \    | Inter Frame | |  Loop   |
                \___| Prediction  | | Filters |
                    +-------------+ +---------+
                           ^             |-------------> Output
                           |             v                Video
                   +--------------+   +---------------+
                   |    Motion    |   | Reconstructed |
                   | Compensation |<--| Frame Memory  |
                   +--------------+   +---------------+

                       Figure 2: Decoder Structure

   The remainder of this document is organized as follows. First, some
   requirements language and terms are defined. Block structures are
   described in detail, followed by intra-frame prediction techniques,
   inter-frame prediction techniques, transforms, quantization, loop
   filters, entropy coding, and finally high level syntax.

   An open source reference implementation is available at
   github.com/cisco/thor.

2.
Definitions

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.2.  Terminology

   This document frequently uses the following terms.

      SB: Super Block - 64x64 or 128x128 block (luma pixels) which can
      be divided into CBs.

      CB: Coding Block - Subdivision of a SB, down to 8x8 (luma pixels).

      PB: Prediction Block - Subdivision of a CB, into 1, 2 or 4 equal
      blocks.

      TB: Transform Block - Subdivision of a CB, into 1 or 4 equal
      blocks.

3.  Block Structure

3.1.  Super Blocks and Coding Blocks

   Input frames with bitdepths of 8, 10 or 12 are supported. The
   internal bitdepth can be 8, 10 or 12 regardless of input bitdepth.
   The bitdepth of the output frames always follows the input frames.
   Chroma can be subsampled in both directions (4:2:0) or have full
   resolution (4:4:4).

   Each frame is divided into 64x64 or 128x128 Super Blocks (SB) which
   are processed in raster-scan order. The SB size is signaled in the
   sequence header. Each SB can be divided into Coding Blocks (CB)
   using a quad-tree structure. The smallest allowed CB size is 8x8
   luma pixels. The four CBs of a larger block are coded/signaled in
   the following order: upleft, downleft, upright, and downright.

   The following modes are signaled at the CB level:

   o  Intra

   o  Inter0 (skip): MV index, no residual information

   o  Inter1 (merge): MV index, residual information

   o  Inter2 (uni-pred): explicit motion information, residual
      information

   o  Inter3 (bi-pred): explicit motion information, residual
      information

3.2.  Special Processing at Frame Boundaries

   At frame boundaries some square blocks might not be complete. For
   example, for 1920x1080 resolutions, the bottom row would consist of
   rectangular blocks of size 64x56.
   Rectangular blocks at frame
   boundaries are handled as follows. For each rectangular block, send
   one bit to choose between:

   o  A rectangular inter0 block and

   o  Further split.

   For the bottom part of a 1920x1080 frame, this implies the following:

   o  For each 64x56 block, transmit one bit to signal a 64x56 inter0
      block or a split into two 32x32 blocks and two 32x24 blocks.

   o  For each 32x24 block, transmit one bit to signal a 32x24 inter0
      block or a split into two 16x16 blocks and two 16x8 blocks.

   o  For each 16x8 block, transmit one bit to signal a 16x8 inter0
      block or a split into two 8x8 blocks.

   Two examples of handling 64x56 blocks at the bottom row of a
   1920x1080 frame are shown in Figure 3 and Figure 4 respectively.

                                  64
                  +-------------------------------+
                  |                               |
                  |                               |
                  |                               |
                  |                               |
                  |                               |
                  |                               |
                  |                               |
               64 |    56        64x56            |
                  |               SKIP            |
                  |                               |
                  |                               |
                  |                               |
                  |                               |
- - - - - - - - - + - - - - - - - - - - - - - - - + - - -
   Frame boundary |    8                          |
                  +-------------------------------+

                 Figure 3: Super block at frame boundary

                                  64
                  +---------------+---------------+
                  |               |               |
                  |               |               |
                  |               |               |
                  |               |               |
                  |               |               |
                  |               |               |
                  |               |               |
               64 +---------------+-------+-------+
                  |               |       |       |
                  |               |       |       |
                  |     32x24     |       |       |
                  |     SKIP      +---+---+-------+
                  |               |   |   | 16x8  |
- - - - - - - - - + - - - - - - - +---+---+ - - - + - - -
   Frame boundary |       8       |   |   | SKIP  |
                  +---------------+---+---+-------+

                 Figure 4: Coding block at frame boundary

3.3.  Transform Blocks

   A coding block (CB) can be divided into four smaller transform blocks
   (TBs).

3.4.  Prediction Blocks

   A coding block (CB) can also be divided into smaller prediction
   blocks (PBs) for the purpose of motion-compensated prediction.
   Horizontal, vertical and quad split are used.

4.  Intra Prediction

   8 intra prediction modes are used:

   1.  DC

   2.  Vertical (V)

   3.  Horizontal (H)

   4.
Upupright (north-northeast)

   5.  Upupleft (north-northwest)

   6.  Upleft (northwest)

   7.  Upleftleft (west-northwest)

   8.  Downleftleft (west-southwest)

   The definitions of the DC, vertical, and horizontal modes are
   straightforward.

   The upleft direction is exactly 45 degrees.

   The upupright, upupleft, and upleftleft directions are equal to
   arctan(1/2) from the horizontal or vertical direction, since they are
   defined by going one pixel horizontally and two pixels vertically (or
   vice versa).

   For the 5 angular intra modes (i.e. angle different from 90 degrees),
   the pixels of the neighbor blocks are filtered before they are used
   for prediction:

   y(n) = (x(n-1) + 2*x(n) + x(n+1) + 2)/4

   For the angular intra modes that are not 45 degrees, the prediction
   sometimes requires sample values at a half-pixel position. These
   sample values are determined by an additional filter:

   z(n + 1/2) = (y(n) + y(n+1))/2

5.  Inter Prediction

5.1.  Multiple Reference Frames

   Multiple reference frames are currently implemented as follows.

   o  Use a sliding-window process to keep the N most recent
      reconstructed frames in memory. The value of N is signaled in the
      sequence header.

   o  In the frame header, signal which of these frames shall be active
      for the current frame.

   o  For each CB, signal which of the active frames is to be used for
      MC.

   Combined with re-ordering, this allows for MPEG-1 style B frames.

   A desirable future extension is to allow long-term reference frames
   in addition to the short-term reference frames defined by the
   sliding-window process.

5.2.  Bi-Prediction

   In case of bi-prediction, two reference indices and two motion
   vectors are signaled per CB. In the current version, PB-split is not
   allowed in bi-prediction mode.
   Sub-pixel interpolation is performed
   for each motion vector/reference index separately before averaging
   the two predicted blocks:

   p(x,y) = (p0(x,y) + p1(x,y))/2

5.3.  Improved chroma prediction

   If specified in the sequence header, the chroma prediction, both
   intra and inter, or either, is improved by using the luma
   reconstruction if certain criteria are met. The process is described
   in the separate chroma prediction draft
   [I-D.midtskogen-netvc-chromapred].

5.4.  Reordered Frames

   Frames may be transmitted out of order. Reference frames are
   selected from the sliding window buffer as normal.

5.5.  Interpolated Reference Frames

   A flag is sent in the sequence header indicating that interpolated
   reference frames may be used.

   If a frame is using an interpolated reference frame, it will be the
   first reference in the reference list, and will be interpolated from
   the second and third reference in the list. It is indicated by a
   reference index of -1 and has a frame number equal to that of the
   current frame.

   The interpolated reference is created by a deterministic process
   common to the encoder and decoder, and described in the separate
   IRFVC draft [I-D.davies-netvc-irfvc].

5.6.  Sub-Pixel Interpolation

5.6.1.  Luma Poly-phase Filter

   Inter prediction uses traditional block-based motion compensated
   prediction with quarter pixel resolution. A separable 6-tap poly-
   phase filter is the basic method for doing MC with sub-pixel
   accuracy. The luma filter coefficients are as follows:

   When bi-prediction is enabled in the sequence header:

      1/4 phase: [2,-10,59,17,-5,1]/64

      2/4 phase: [1,-8,39,39,-8,1]/64

      3/4 phase: [1,-5,17,59,-10,2]/64

   When bi-prediction is disabled in the sequence header:

      1/4 phase: [1,-7,55,19,-5,1]/64

      2/4 phase: [1,-7,38,38,-7,1]/64

      3/4 phase: [1,-5,19,55,-7,1]/64

   With reference to Figure 5, a fractional sample value, e.g.
   i0,0
   which has a phase of 1/4 in the horizontal dimension and a phase of
   1/2 in the vertical dimension is calculated as follows:

   a0,j = 2*A-2,j - 10*A-1,j + 59*A0,j + 17*A1,j - 5*A2,j + 1*A3,j

   where j = -2,...,3

   i0,0 = (1*a0,-2 - 8*a0,-1 + 39*a0,0 + 39*a0,1 - 8*a0,2 + 1*a0,3 +
   2048)/4096

   The minimum sub-block size is 8x8.

         +-----+-----+-----+-----+-----+-----+-----+-----+-----+
         |A    |     |     |     |A    |a    |b    |c    |A    |
         |-1,-1|     |     |     | 0,-1| 0,-1| 0,-1| 0,-1| 1,-1|
         +-----+-----+-----+-----+-----+-----+-----+-----+-----+
         |     |     |     |     |     |     |     |     |     |
         |     |     |     |     |     |     |     |     |     |
         +-----+-----+-----+-----+-----+-----+-----+-----+-----+
         |     |     |     |     |     |     |     |     |     |
         |     |     |     |     |     |     |     |     |     |
         +-----+-----+-----+-----+-----+-----+-----+-----+-----+
         |     |     |     |     |     |     |     |     |     |
         |     |     |     |     |     |     |     |     |     |
         +-----+-----+-----+-----+-----+-----+-----+-----+-----+
         |A    |     |     |     |A    |a    |b    |c    |A    |
         |-1,0 |     |     |     | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 |
         +-----+-----+-----+-----+-----+-----+-----+-----+-----+
         |d    |     |     |     |d    |e    |f    |g    |d    |
         |-1,0 |     |     |     | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 |
         +-----+-----+-----+-----+-----+-----+-----+-----+-----+
         |h    |     |     |     |h    |i    |j    |k    |h    |
         |-1,0 |     |     |     | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 |
         +-----+-----+-----+-----+-----+-----+-----+-----+-----+
         |l    |     |     |     |l    |m    |n    |o    |l    |
         |-1,0 |     |     |     | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 |
         +-----+-----+-----+-----+-----+-----+-----+-----+-----+
         |A    |     |     |     |A    |a    |b    |c    |A    |
         |-1,1 |     |     |     | 0,1 | 0,1 | 0,1 | 0,1 | 1,1 |
         +-----+-----+-----+-----+-----+-----+-----+-----+-----+

                      Figure 5: Sub-pixel positions

5.6.2.  Luma Special Filter Position

   For the fractional pixel position having exactly 2 quarter pixel
   offsets in each dimension, a non-separable filter is used to
   calculate the interpolated value.
   With reference to Figure 5, the
   center position j0,0 is calculated as follows:

   j0,0 =

   [0*A-1,-1 + 1*A0,-1 + 1*A1,-1 + 0*A2,-1 +

   1*A-1,0 + 2*A0,0 + 2*A1,0 + 1*A2,0 +

   1*A-1,1 + 2*A0,1 + 2*A1,1 + 1*A2,1 +

   0*A-1,2 + 1*A0,2 + 1*A1,2 + 0*A2,2 + 8]/16

5.6.3.  Chroma Poly-phase Filter

   Chroma interpolation is performed with 1/8 pixel resolution using the
   following poly-phase filter.

      1/8 phase: [-2, 58, 10, -2]/64

      2/8 phase: [-4, 54, 16, -2]/64

      3/8 phase: [-4, 44, 28, -4]/64

      4/8 phase: [-4, 36, 36, -4]/64

      5/8 phase: [-4, 28, 44, -4]/64

      6/8 phase: [-2, 16, 54, -4]/64

      7/8 phase: [-2, 10, 58, -2]/64

5.7.  Motion Vector Coding

5.7.1.  Inter0 and Inter1 Modes

   Inter0 and inter1 modes imply signaling of a motion vector index to
   choose a motion vector from a list of candidate motion vectors with
   associated reference frame index. The list of motion vector
   candidates is derived from at most two different neighbor blocks,
   each having a unique motion vector/reference frame index. Signaling
   of the motion vector index uses 0 or 1 bit, dependent on the number
   of unique motion vector candidates. If the chosen neighbor block is
   coded in bi-prediction mode, the inter0 or inter1 block inherits both
   motion vectors, both reference indices and the bi-prediction property
   of the neighbor block.

   For block sizes less than 64x64, inter0 has only one motion vector
   candidate, and its value is always zero.

   Which neighbor blocks to use for motion vector candidates depends on
   the availability of the neighbor blocks (i.e. whether the neighbor
   blocks have already been coded, belong to the same slice and are not
   outside the frame boundaries). Four different availabilities, U, UR,
   L, and LL, are defined as illustrated in Figure 6. If the neighbor
   block is intra it is considered to be available but with a zero
   motion vector.

                         |           |
                         |     U     |    UR
              -----------+-----------+-----------
                         |           |
                         |  current  |
                    L    |   block   |
                         |           |
                         |           |
              -----------+-----------+
                         |
                         |
                    LL   |
                         |

                Figure 6: Availability of neighbor blocks

   Based on the four availabilities defined above, each of the motion
   vector candidates is derived from one of the possible neighbor blocks
   defined in Figure 7.

              +----+----+      +----+    +----+----+
              | UL | U0 |      | U1 |    | U2 | UR |
              +----+----+------+----+----+----+----+
              | L0 |                          |
              +----+                          |
                   |                          |
                   |                          |
              +----+        current           |
              | L1 |         block            |
              +----+                          |
                   |                          |
                   |                          |
              +----+                          |
              | L2 |                          |
              +----+--------------------------+
              | LL |
              +----+

                   Figure 7: Motion vector candidates

   The choice of motion vector candidates depends on the availability of
   neighbor blocks as shown in Table 1.

          +----+-----+----+-----+---------------------------+
          | U  | UR  | L  | LL  | Motion vector candidates  |
          +----+-----+----+-----+---------------------------+
          | 0  | 0   | 0  | 0   | zero vector               |
          | 1  | 0   | 0  | 0   | U2, zero vector           |
          | 0  | 1   | 0  | 0   | NA                        |
          | 1  | 1   | 0  | 0   | U2, zero vector           |
          | 0  | 0   | 1  | 0   | L2, zero vector           |
          | 1  | 0   | 1  | 0   | U2, L2                    |
          | 0  | 1   | 1  | 0   | NA                        |
          | 1  | 1   | 1  | 0   | U2, L2                    |
          | 0  | 0   | 0  | 1   | NA                        |
          | 1  | 0   | 0  | 1   | NA                        |
          | 0  | 1   | 0  | 1   | NA                        |
          | 1  | 1   | 0  | 1   | NA                        |
          | 0  | 0   | 1  | 1   | L2, zero vector           |
          | 1  | 0   | 1  | 1   | U2, L2                    |
          | 0  | 1   | 1  | 1   | NA                        |
          | 1  | 1   | 1  | 1   | U2, L2                    |
          +----+-----+----+-----+---------------------------+

      Table 1: Motion vector candidates for different availability of
                             neighbor blocks

5.7.2.  Inter2 and Bi-Prediction Modes

   Motion vectors are coded using motion vector prediction. The motion
   vector predictor is defined as the median of the motion vectors from
   three neighbor blocks.
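
   As an illustration only, the median predictor can be sketched in
   Python as follows. The sketch assumes that the median is taken
   separately for the horizontal and vertical components, which this
   document does not state explicitly. The names median3 and
   mv_predictor are hypothetical and are not taken from the reference
   implementation.

```python
def median3(a, b, c):
    """Return the middle value of three integers."""
    return max(min(a, b), min(max(a, b), c))


def mv_predictor(mv_a, mv_b, mv_c):
    """Component-wise median of three (x, y) motion vectors.

    Assumption: the median is applied per component; the draft only
    says "the median of the motion vectors".
    """
    return (
        median3(mv_a[0], mv_b[0], mv_c[0]),
        median3(mv_a[1], mv_b[1], mv_c[1]),
    )


# Example with three hypothetical neighbor vectors.
pred = mv_predictor((4, -2), (0, 0), (8, 6))
```

   With these inputs the predictor takes its horizontal component from
   the first vector and its vertical component from the second, which
   shows that a component-wise median need not equal any single
   neighbor vector.
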
   Definition of the motion vector predictor
   uses the same definition of availability and neighbors as in Figure 6
   and Figure 7 respectively. The three vectors used for median
   filtering depend on the availability of neighbor blocks as shown in
   Table 2. If the neighbor block is coded in bi-prediction mode, only
   the first motion vector (in transmission order), MV0, is used as
   input to the median operator.

     +----+-----+----+-----+--------------------------------------+
     | U  | UR  | L  | LL  | Motion vectors for median filtering  |
     +----+-----+----+-----+--------------------------------------+
     | 0  | 0   | 0  | 0   | 3 x zero vector                      |
     | 1  | 0   | 0  | 0   | U0,U1,U2                             |
     | 0  | 1   | 0  | 0   | NA                                   |
     | 1  | 1   | 0  | 0   | U0,U2,UR                             |
     | 0  | 0   | 1  | 0   | L0,L1,L2                             |
     | 1  | 0   | 1  | 0   | UL,U2,L2                             |
     | 0  | 1   | 1  | 0   | NA                                   |
     | 1  | 1   | 1  | 0   | U0,UR,L2,L0                          |
     | 0  | 0   | 0  | 1   | NA                                   |
     | 1  | 0   | 0  | 1   | NA                                   |
     | 0  | 1   | 0  | 1   | NA                                   |
     | 1  | 1   | 0  | 1   | NA                                   |
     | 0  | 0   | 1  | 1   | L0,L2,LL                             |
     | 1  | 0   | 1  | 1   | U2,L0,LL                             |
     | 0  | 1   | 1  | 1   | NA                                   |
     | 1  | 1   | 1  | 1   | U0,UR,L0                             |
     +----+-----+----+-----+--------------------------------------+

     Table 2: Neighbor blocks used to define motion vector predictor
                         through median filtering

5.7.3.  Motion Vector Direction

   Motion vectors referring to reference frames later in time than the
   current frame are stored with their sign reversed, and these reversed
   values are used for coding and motion vector prediction.

6.  Transforms

   Transforms are applied at the TB or CB level, implying that transform
   sizes range from 4x4 to 128x128. The transforms form an embedded
   structure, meaning that the transform matrix elements of the smaller
   transforms can be extracted from the larger transforms.

7.  Quantization

   For the 32x32, 64x64 and 128x128 transform sizes, only the 16x16 low
   frequency coefficients are quantized and transmitted.
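
   The restriction above can be sketched in Python as follows. This is
   a rough illustration under the assumption that the coefficient array
   is stored with the DC coefficient at index [0][0], so that the low
   frequency coefficients are the top-left 16x16 corner. The function
   names are hypothetical.

```python
def coded_size(n):
    """Side length of the region that is quantized and transmitted.

    For transform sizes of 32 and above, only a 16x16 region is kept;
    smaller transforms are coded in full.
    """
    return 16 if n >= 32 else n


def coded_coefficients(coeffs):
    """Return the low-frequency corner of a square coefficient array.

    Assumption: coeffs[0][0] is the DC coefficient and frequency
    increases with row and column index.
    """
    k = coded_size(len(coeffs))
    return [row[:k] for row in coeffs[:k]]


# A 64x64 array with distinct entries, used only to show the slicing.
block = [[r * 64 + c for c in range(64)] for r in range(64)]
kept = coded_coefficients(block)
```

   All coefficients outside the kept region are treated as zero by this
   sketch, since nothing is transmitted for them.
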

   The 64x64 inverse transform is defined as a 32x32 transform followed
   by duplicating each output sample into a 2x2 block. The 128x128
   inverse transform is defined as a 32x32 transform followed by
   duplicating each output sample into a 4x4 block.

7.1.  Quantization matrices

   A flag is transmitted in the sequence header to indicate whether
   quantization matrices are used. If this flag is true, a 6 bit value
   qmtx_offset is transmitted in the sequence header to indicate matrix
   strength.

   If used, then in dequantization a separate scaling factor is applied
   to each coefficient, so that the dequantized value of a coefficient
   ci at position i is:

         (ci * d(q) * IW(i,c,s,t,q) + 2^(k + 5)) >> (k + 6)

                          Figure 8: Equation 1

   where IW is the scale factor for coefficient position i with size s,
   frame type (intra/inter) t, component (Y, Cb or Cr) c and quantizer
   q; and k=k(s,q) is the dequantization shift. IW has scale 64, that
   is, a weight value of 64 is no different to unweighted
   dequantization.

7.1.1.  Quantization matrix selection

   The current luma qp value qpY and the offset value qmtx_offset
   determine a quantization matrix set by the formula:

        qmlevel = max(0,min(11,((qpY + qmtx_offset) * 12) / 44))

                          Figure 9: Equation 2

   This selects one of the 12 different sets of default quantization
   matrices, with increasing qmlevel indicating increasing flatness.

   For a given value of qmlevel, different weighting matrices are
   provided for all combinations of transform block size, type (intra/
   inter), and component (Y, Cb, Cr). Matrices at low qmlevel are flat
   (constant value 64). Matrices for inter frames have unity DC gain
   (i.e. value 64 at position 0), whereas those for intra frames are
   designed such that the inverse weighting matrix has unity energy gain
   (i.e. normalized sum-squared of the scaling factors is 1).

7.1.2.
Quantization matrix design

   Further details on the quantization matrix and implementation can be
   found in the separate QMTX draft [I-D.davies-netvc-qmtx].

8.  Loop Filtering

8.1.  Deblocking

8.1.1.  Luma deblocking

   Luma deblocking is performed on an 8x8 grid as follows:

   1.  For each vertical edge between two 8x8 blocks, calculate the
       following for each of line 2 and line 5 respectively:

       d = abs(a-b) + abs(c-d),

       where a and b are on the left hand side of the block edge and c
       and d are on the right hand side of the block edge:

       a b | c d

   2.  For each line crossing the vertical edge, perform deblocking if
       and only if all of the following conditions are true:

       *  d2+d5 < beta(QP)

       *  The edge is also a transform block edge

       *  abs(mvx(left)) > 2, or abs(mvx(right)) > 2, or

          abs(mvy(left)) > 2, or abs(mvy(right)) > 2, or

          One of the transform blocks on each side of the edge has non-
          zero coefficients, or

          One of the transform blocks on each side of the edge is coded
          using intra mode.

   3.  If deblocking is performed, calculate a delta value as follows:

       delta = clip((18*(c-b) - 6*(d-a) + 16)/32,tc,-tc),

       where tc is a QP-dependent value.

   4.  Next, modify two pixels on each side of the block edge as
       follows:

       a' = a + delta/2

       b' = b + delta

       c' = c - delta

       d' = d - delta/2

   5.  The same procedure is followed for horizontal block edges.

   The relative positions of the samples, a, b, c, d and the motion
   vectors, MV, are illustrated in Figure 10.

                                   |
                                   |   block edge
                                   |
                           +---+---+---+---+
                           | a | b | c | d |
                           +---+---+---+---+
                                   |
                           mv      |      mv
                             x,left|        x,right
                                   |
                           mv             mv
                             y,left         y,right

              Figure 10: Deblocking filter pixel positions

8.1.2.  Chroma Deblocking

   Chroma deblocking is performed on a 4x4 grid as follows:

   1.
       Deblocking of the edge between two 4x4 blocks is performed if and
       only if:

       *  The pixels on either side of the block edge belong to an
          intra block.

       *  The block edge is also an edge between two transform blocks.

   2.  If deblocking is performed, calculate a delta value as follows:

       delta = clip((4*(c-b) + (d-a) + 4)/8,tc,-tc),

       where tc is a QP-dependent value.

   3.  Next, modify one pixel on each side of the block edge as follows:

       b' = b + delta

       c' = c - delta

8.2.  Constrained Low Pass Filter (CLPF)

   A low-pass filter is applied after the deblocking filter if signaled
   in the sequence header. It can still be switched off for individual
   frames in the frame header. Also signaled in the frame header is
   whether to apply the filter for all qualified 128x128 blocks or to
   transmit a flag for each such block. A super block does not qualify
   if it only contains Inter0 (skip) coding blocks, and no signal is
   transmitted for such blocks.

   The filter is described in the separate CLPF draft
   [I-D.midtskogen-netvc-clpf].

9.  Entropy coding

9.1.  Overview

   The following information is signaled at the sequence level:

   o  Sequence header

   The following information is signaled at the frame level:

   o  Frame header

   The following information is signaled at the CB level:

   o  Super-mode (mode, split, reference index for uni-prediction)

   o  Intra prediction mode

   o  PB-split (none, hor, ver, quad)

   o  TB-split (none or quad)

   o  Reference frame indices for bi-prediction

   o  Motion vector candidate index

   o  Transform coefficients if TB-split=0

   The following information is signaled at the TB level:

   o  CBP (8 combinations of CBPY, CBPU, and CBPV)

   o  Transform coefficients

   The following information is signaled at the PB level:

   o  Motion vector differences

9.2.  Low Level Syntax

9.2.1.
CB Level

   super-mode (inter0/split/inter1/inter2-ref0/intra/inter2-ref1/
               inter2-ref2/inter2-ref3,..)

   if (mode == inter0 || mode == inter1)

      mv_idx (one of up to 2 motion vector candidates)

   else if (mode == INTRA)

      intra_mode (one of up to 8 intra modes)

      tb_split (NONE or QUAD, coded jointly with CBP for tb_split=NONE)

   else if (mode == INTER)

      pb_split (NONE,VER,HOR,QUAD)

      tb_split_and_cbp (NONE or QUAD and CBP)

   else if (mode == BIPRED)

      mvd_x0, mvd_y0 (motion vector difference for first vector)

      mvd_x1, mvd_y1 (motion vector difference for second vector)

      ref_idx0, ref_idx1 (two reference indices)

9.2.2.  PB Level

   if (mode == INTER2 || mode == BIPRED)

      mvd_x, mvd_y (motion vector differences)

9.2.3.  TB Level

   if (mode != INTER0 and tb_split == 1)

      cbp (8 possibilities for CBPY/CBPU/CBPV)

   if (mode != INTER0)

      transform coefficients

9.2.4.  Super Mode

   For each block of size NxN (64>=N>8), the following mutually
   exclusive events are jointly encoded using a single VLC code as
   follows (example using 4 reference frames):

   If there is no interpolated reference frame:

      INTER0         1
      SPLIT          01
      INTER1         001
      INTER2-REF0    0001
      BIPRED         00001
      INTRA          000001
      INTER2-REF1    0000001
      INTER2-REF2    00000001
      INTER2-REF3    00000000

   If there is an interpolated reference frame:

      INTER0         1
      SPLIT          01
      INTER1         001
      BIPRED         0001
      INTRA          00001
      INTER2-REF1    000001
      INTER2-REF2    0000001
      INTER2-REF3    00000001
      INTER2-REF0    00000000

   If fewer than 4 reference frames are used, a shorter VLC table is
   used. If bi-pred is not possible, or split is not possible, they are
   omitted from the table and shorter codes are used for subsequent
   elements.
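
   The codewords in the tables above follow a truncated unary pattern:
   the event at position k is coded as k zeros followed by a one, and
   the last event as zeros only. The Python sketch below reproduces the
   first table and shows how omitting an event shortens the codes of
   the events that follow it. It is a minimal illustration, not the
   reference implementation, and the function name super_mode_codes is
   hypothetical.

```python
def super_mode_codes(events):
    """Map each event, in priority order, to a truncated unary codeword.

    Event k gets k zeros followed by a one, except the last event,
    which gets only zeros because no terminating one is needed.
    """
    last = len(events) - 1
    return {
        event: ("0" * k + "1") if k < last else ("0" * last)
        for k, event in enumerate(events)
    }


# Event order copied from the table for 4 reference frames and no
# interpolated reference frame.
EVENTS = [
    "INTER0", "SPLIT", "INTER1", "INTER2-REF0", "BIPRED",
    "INTRA", "INTER2-REF1", "INTER2-REF2", "INTER2-REF3",
]

codes = super_mode_codes(EVENTS)

# When bi-prediction is not possible, BIPRED is dropped and every
# later event moves up one position in the table.
codes_no_bipred = super_mode_codes([e for e in EVENTS if e != "BIPRED"])
```

   Because the function only depends on the order of its input list, a
   different sorting of the events can be handled by passing the events
   in that order.
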

   Additionally, depending on information from the blocks to the left
   and above (meta data and CBP), a different sorting of the events can
   be used, e.g.:

      SPLIT        1
      INTER1       01
      INTER2-REF0  001
      INTER0       0001
      INTRA        00001
      INTER2-REF1  000001
      INTER2-REF2  0000001
      INTER2-REF3  00000001
      BIPRED       00000000

9.2.5.  CBP

   Calculate code as follows:

   if (tb-split == 0)

     N = 4*CBPV + 2*CBPU + CBPY

   else

     N = 8

   Map the value of N to code through a table lookup:

     code = table[N]

   where the purpose of the table lookup is to sort the different
   values of code according to decreasing probability (typically CBPY=1,
   CBPU=0, CBPV=0 having the highest probability).

   Use a different table depending on the values of CBPY in neighbor
   blocks (left and above).

   Encode the value of code using a systematic VLC code.

9.2.6.  Transform Coefficients

   Transform coefficient coding uses a traditional zig-zag scan pattern
   to convert a 2D array of quantized transform coefficients, coeff, to
   a 1D array of samples.  VLC coding of quantized transform
   coefficients starts from the low frequency end of the 1D array using
   two different modes, level-mode and run-mode, starting in level-mode:

   o  Level-mode

      *  Encode each coefficient, coeff, separately.

      *  Each coefficient is encoded by:

         +  The absolute value, level=abs(coeff), using a VLC code and

         +  If level > 0, the sign bit (sign=0 or sign=1 for coeff>0 and
            coeff<0 respectively).

      *  If coefficient N is zero, switch to run-mode, starting from
         coefficient N+1.

   o  Run-mode

      *  For each non-zero coefficient, encode the combined event of:

         1.  Length of the zero-run, i.e. the number of zeros since the
             last non-zero coefficient.

         2.  Whether or not level=abs(coeff) is greater than 1.

         3.  End of block (EOB) indicating that there are no more
             non-zero coefficients.

      *  Additionally, if level = 1, code the sign bit.

      *  Additionally, if level > 1, define code = 2*(level-2)+sign and
         encode that value.

      *  If the absolute value of coefficient N is larger than 1, switch
         to level-mode, starting from coefficient N+1.

   Example

   Figure 11 illustrates an example where 16 quantized transform
   coefficients are encoded.

      Index:       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      abs(coeff):  2  1  4  1  0  0  1  0  0  3  2  0  0  1  0  0

                    Figure 11: Coefficients to encode

   Table 3 shows the mode and the symbols to be coded for each
   coefficient.

   +--------+-------------+-------------+------------------------------+
   | Index  | abs(coeff)  | Mode        | Encoded symbols              |
   +--------+-------------+-------------+------------------------------+
   | 0      | 2           | level-mode  | level=2,sign                 |
   | 1      | 1           | level-mode  | level=1,sign                 |
   | 2      | 4           | level-mode  | level=4,sign                 |
   | 3      | 1           | level-mode  | level=1,sign                 |
   | 4      | 0           | level-mode  | level=0                      |
   | 5      | 0           | run-mode    |                              |
   | 6      | 1           | run-mode    | (run=1,level=1)              |
   | 7      | 0           | run-mode    |                              |
   | 8      | 0           | run-mode    |                              |
   | 9      | 3           | run-mode    | (run=2,level>1),             |
   |        |             |             | 2*(3-2)+sign                 |
   | 10     | 2           | level-mode  | level=2,sign                 |
   | 11     | 0           | level-mode  | level=0                      |
   | 12     | 0           | run-mode    |                              |
   | 13     | 1           | run-mode    | (run=1,level=1)              |
   | 14     | 0           | run-mode    | EOB                          |
   | 15     | 0           | run-mode    |                              |
   +--------+-------------+-------------+------------------------------+

      Table 3: Transform coefficient encoding for the example above.

10.  High Level Syntax

   High level syntax is currently very simple and rudimentary as the
   primary focus so far has been on compression performance.  It is
   expected to evolve as functionality is added.

10.1.  Sequence Header

   o  Width - 16 bits

   o  Height - 16 bits

   o  Enable/disable PB-split - 1 bit

   o  SB size - 3 bits

   o  Enable/disable TB-split - 1 bit

   o  Number of active reference frames (may go into frame header) - 2
      bits (max 4)

   o  Enable/disable interpolated reference frames - 1 bit

   o  Enable/disable delta qp - 1 bit

   o  Enable/disable deblocking - 1 bit

   o  Constrained low-pass filter (CLPF) enable/disable - 1 bit

   o  Enable/disable block context coding - 1 bit

   o  Enable/disable bi-prediction - 1 bit

   o  Enable/disable quantization matrices - 1 bit

   o  If quantization matrices enabled: quantization matrix offset - 6
      bits

   o  Select 420 or 444 input - 1 bit

   o  Number of reordered frames - 4 bits

   o  Enable/disable chroma intra prediction from luma - 1 bit

   o  Enable/disable chroma inter prediction from luma - 1 bit

   o  Internal frame bitdepth (8, 10 or 12 bits) - 2 bits

   o  Input video bitdepth (8, 10 or 12 bits) - 2 bits

10.2.  Frame Header

   o  Frame type - 1 bit

   o  QP - 8 bits

   o  Identification of active reference frames - num_ref*4 bits

   o  Number of intra modes - 4 bits

   o  Number of active reference frames - 2 bits

   o  Active reference frames - number of active reference frames * 6
      bits

   o  Frame number - 16 bits

   o  If CLPF is enabled in the sequence header: Constrained low-pass
      filter (CLPF) strength - 2 bits (00 = off, 01 = strength 1, 10 =
      strength 2, 11 = strength 4)

   o  If CLPF is enabled in the sequence header: Enable/disable CLPF
      signal for each qualified filter block

11.  IANA Considerations

   This document has no IANA considerations yet.  TBD

12.  Security Considerations

   This document has no security considerations yet.  TBD

13.  Normative References

   [I-D.davies-netvc-irfvc]
              Davies, T., "Interpolated reference frames for video
              coding", draft-davies-netvc-irfvc-00 (work in progress),
              October 2015.

   [I-D.davies-netvc-qmtx]
              Davies, T., "Quantisation matrices for Thor video coding",
              draft-davies-netvc-qmtx-00 (work in progress), March 2016.

   [I-D.midtskogen-netvc-chromapred]
              Midtskogen, S., "Improved chroma prediction",
              draft-midtskogen-netvc-chromapred-02 (work in progress),
              October 2016.

   [I-D.midtskogen-netvc-clpf]
              Midtskogen, S., Fuldseth, A., and M. Zanaty, "Constrained
              Low Pass Filter", draft-midtskogen-netvc-clpf-02 (work in
              progress), April 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

Authors' Addresses

   Arild Fuldseth
   Cisco
   Lysaker
   Norway

   Email: arilfuld@cisco.com

   Gisle Bjontegaard
   Cisco
   Lysaker
   Norway

   Email: gbjonteg@cisco.com

   Steinar Midtskogen
   Cisco
   Lysaker
   Norway

   Email: stemidts@cisco.com

   Thomas Davies
   Cisco
   London
   UK

   Email: thdavies@cisco.com

   Mo Zanaty
   Cisco
   RTP, NC
   USA

   Email: mzanaty@cisco.com