Network Working Group                                        A. Fuldseth
Internet-Draft                                            G. Bjontegaard
Intended status: Standards Track                           S. Midtskogen
Expires: September 19, 2016                                    T. Davies
                                                               M.
Zanaty 7 Cisco 8 March 18, 2016 10 Thor Video Codec 11 draft-fuldseth-netvc-thor-02 13 Abstract 15 This document provides a high-level description of the Thor video 16 codec. Thor is designed to achieve high compression efficiency with 17 moderate complexity, using the well-known hybrid video coding 18 approach of motion-compensated prediction and transform coding. 20 Status of This Memo 22 This Internet-Draft is submitted in full conformance with the 23 provisions of BCP 78 and BCP 79. 25 Internet-Drafts are working documents of the Internet Engineering 26 Task Force (IETF). Note that other groups may also distribute 27 working documents as Internet-Drafts. The list of current Internet- 28 Drafts is at http://datatracker.ietf.org/drafts/current/. 30 Internet-Drafts are draft documents valid for a maximum of six months 31 and may be updated, replaced, or obsoleted by other documents at any 32 time. It is inappropriate to use Internet-Drafts as reference 33 material or to cite them other than as "work in progress." 35 This Internet-Draft will expire on September 19, 2016. 37 Copyright Notice 39 Copyright (c) 2016 IETF Trust and the persons identified as the 40 document authors. All rights reserved. 42 This document is subject to BCP 78 and the IETF Trust's Legal 43 Provisions Relating to IETF Documents 44 (http://trustee.ietf.org/license-info) in effect on the date of 45 publication of this document. Please review these documents 46 carefully, as they describe your rights and restrictions with respect 47 to this document. Code Components extracted from this document must 48 include Simplified BSD License text as described in Section 4.e of 49 the Trust Legal Provisions and are provided without warranty as 50 described in the Simplified BSD License. 52 Table of Contents 54 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 55 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 5 56 2.1. Requirements Language . . . . . . . . . . . . . . . . 
. . 5 57 2.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 58 3. Block Structure . . . . . . . . . . . . . . . . . . . . . . . 6 59 3.1. Super Blocks and Coding Blocks . . . . . . . . . . . . . 6 60 3.2. Special Processing at Frame Boundaries . . . . . . . . . 6 61 3.3. Transform Blocks . . . . . . . . . . . . . . . . . . . . 8 62 3.4. Prediction Blocks . . . . . . . . . . . . . . . . . . . . 8 63 4. Intra Prediction . . . . . . . . . . . . . . . . . . . . . . 8 64 5. Inter Prediction . . . . . . . . . . . . . . . . . . . . . . 9 65 5.1. Multiple Reference Frames . . . . . . . . . . . . . . . . 9 66 5.2. Bi-Prediction . . . . . . . . . . . . . . . . . . . . . . 10 67 5.3. Reordered Frames . . . . . . . . . . . . . . . . . . . . 10 68 5.4. Interpolated Reference Frames . . . . . . . . . . . . . . 10 69 5.5. Sub-Pixel Interpolation . . . . . . . . . . . . . . . . . 10 70 5.5.1. Luma Poly-phase Filter . . . . . . . . . . . . . . . 10 71 5.5.2. Luma Special Filter Position . . . . . . . . . . . . 12 72 5.5.3. Chroma Poly-phase Filter . . . . . . . . . . . . . . 13 73 5.6. Motion Vector Coding . . . . . . . . . . . . . . . . . . 13 74 5.6.1. Inter0 and Inter1 Modes . . . . . . . . . . . . . . . 13 75 5.6.2. Inter2 and Bi-Prediction Modes . . . . . . . . . . . 15 76 5.6.3. Motion Vector Direction . . . . . . . . . . . . . . . 16 77 6. Transforms . . . . . . . . . . . . . . . . . . . . . . . . . 16 78 7. Quantization . . . . . . . . . . . . . . . . . . . . . . . . 16 79 7.1. Quantization matrices . . . . . . . . . . . . . . . . . . 17 80 7.1.1. Quantization matrix selection . . . . . . . . . . . . 17 81 7.1.2. Quantization matrix design . . . . . . . . . . . . . 18 82 8. Loop Filtering . . . . . . . . . . . . . . . . . . . . . . . 18 83 8.1. Deblocking . . . . . . . . . . . . . . . . . . . . . . . 18 84 8.1.1. Luma deblocking . . . . . . . . . . . . . . . . . . . 18 85 8.1.2. Chroma Deblocking . . . . . . . . . . . . . . . . . . 19 86 8.2. 
Constrained Low Pass Filter (CLPF) . . . . . . . . . . . 20 87 9. Entropy coding . . . . . . . . . . . . . . . . . . . . . . . 20 88 9.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 20 89 9.2. Low Level Syntax . . . . . . . . . . . . . . . . . . . . 21 90 9.2.1. CB Level . . . . . . . . . . . . . . . . . . . . . . 21 91 9.2.2. PB Level . . . . . . . . . . . . . . . . . . . . . . 21 92 9.2.3. TB Level . . . . . . . . . . . . . . . . . . . . . . 22 93 9.2.4. Super Mode . . . . . . . . . . . . . . . . . . . . . 22 94 9.2.5. CBP . . . . . . . . . . . . . . . . . . . . . . . . . 23 95 9.2.6. Transform Coefficients . . . . . . . . . . . . . . . 23 96 10. High Level Syntax . . . . . . . . . . . . . . . . . . . . . . 25 97 10.1. Sequence Header . . . . . . . . . . . . . . . . . . . . 25 98 10.2. Frame Header . . . . . . . . . . . . . . . . . . . . . . 26 99 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26 100 12. Security Considerations . . . . . . . . . . . . . . . . . . . 26 101 13. Normative References . . . . . . . . . . . . . . . . . . . . 27 102 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 27 104 1. Introduction 106 This document provides a high-level description of the Thor video 107 codec. Thor is designed to achieve high compression efficiency with 108 moderate complexity, using the well-known hybrid video coding 109 approach of motion-compensated prediction and transform coding. 111 The Thor video codec is a block-based hybrid video codec similar in 112 structure to widespread standards. The high level encoder and 113 decoder structures are illustrated in Figure 1 and Figure 2 114 respectively. 
116 +---+ +-----------+ +-----------+ +--------+ 117 Input--+-->| + |-->| Transform |-->| Quantizer |-->| Entropy| 118 Video | +---+ +-----------+ +-----------+ | Coding | 119 | ^ - | +--------+ 120 | | v | 121 | | +-----------+ v 122 | | | Inverse | Output 123 | | | Transform | Bitstream 124 | | +-----------+ 125 | | | 126 | | v 127 | | +---+ 128 | +------------------------>| + | 129 | | +-------------+ +---+ 130 | | ___| Intra Frame | | 131 | | / | Prediction |<-----+ 132 | | / +-------------+ | 133 | |/ v 134 | \ +-------------+ +---------+ 135 | \ | Inter Frame | | Loop | 136 | \___| Prediction | | Filters | 137 | +-------------+ +---------+ 138 | ^ | 139 | | v 140 | +------------+ +---------------+ 141 | | Motion | | Reconstructed | 142 +----------->| Estimation |<--| Frame Memory | 143 +------------+ +---------------+ 145 Figure 1: Encoder Structure 147 +----------+ +-----------+ 148 Input ------->| Entropy |----->| Inverse | 149 Bitstream | Decoding | | Transform | 150 +----------+ +-----------+ 151 | 152 v 153 +---+ 154 +------------------------>| + | 155 | +-------------+ +---+ 156 | ___| Intra Frame | | 157 | / | Prediction |<-----+ 158 | / +-------------+ | 159 |/ v 160 \ +-------------+ +---------+ 161 \ | Inter Frame | | Loop | 162 \___| Prediction | | Filters | 163 +-------------+ +---------+ 164 ^ |-------------> Output 165 | v Video 166 +--------------+ +---------------+ 167 | Motion | | Reconstructed | 168 | Compensation |<--| Frame Memory | 169 +--------------+ +---------------+ 171 Figure 2: Decoder Structure 173 The remainder of this document is organized as follows. First, some 174 requirements language and terms are defined. Block structures are 175 described in detail, followed by intra-frame prediction techniques, 176 inter-frame prediction techniques, transforms, quantization, loop 177 filters, entropy coding, and finally high level syntax. 179 An open source reference implementation is available at 180 github.com/cisco/thor. 182 2. 
Definitions

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.2.  Terminology

   This document frequently uses the following terms.

   SB:  Super Block - 64x64 or 128x128 block (luma pixels) which can
      be divided into CBs.

   CB:  Coding Block - Subdivision of an SB, down to 8x8 (luma pixels).

   PB:  Prediction Block - Subdivision of a CB into 1, 2 or 4 equal
      blocks.

   TB:  Transform Block - Subdivision of a CB into 1 or 4 equal
      blocks.

3.  Block Structure

3.1.  Super Blocks and Coding Blocks

   Each frame is divided into 64x64 or 128x128 Super Blocks (SB) which
   are processed in raster-scan order.  The SB size is signaled in the
   sequence header.  Each SB can be divided into Coding Blocks (CB)
   using a quad-tree structure.  The smallest allowed CB size is 8x8
   luma pixels.  The four CBs of a larger block are coded/signaled in
   the following order: upleft, downleft, upright, and downright.

   The following modes are signaled at the CB level:

   o  Intra

   o  Inter0 (skip): MV index, no residual information

   o  Inter1 (merge): MV index, residual information

   o  Inter2 (uni-pred): explicit motion information, residual
      information

   o  Inter3 (bi-pred): explicit motion information, residual
      information

3.2.  Special Processing at Frame Boundaries

   At frame boundaries some square blocks might not be complete.  For
   example, for a 1920x1080 resolution, the bottom row would consist of
   rectangular blocks of size 64x56.  Rectangular blocks at frame
   boundaries are handled as follows.  For each rectangular block, send
   one bit to choose between:

   o  A rectangular inter0 block and

   o  Further split.
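
   The "further split" option can be illustrated with a small sketch.
   It assumes that a split is an ordinary quad split of the nominal
   square block, and that sub-blocks lying entirely outside the frame
   are simply dropped.  The function name visible_children and its
   parameters are hypothetical and do not come from the reference
   implementation.

```python
def visible_children(size, vis_w, vis_h):
    """Quad-split a nominal size x size block at a frame boundary.

    vis_w and vis_h give how many luma columns and rows of the block
    lie inside the frame. Returns the (width, height) of each child
    that is at least partly inside the frame.
    """
    half = size // 2
    children = []
    # Coding order taken from the text: upleft, downleft, upright, downright.
    for x0, y0 in ((0, 0), (0, half), (half, 0), (half, half)):
        w = min(half, vis_w - x0)
        h = min(half, vis_h - y0)
        if w > 0 and h > 0:  # drop children completely outside the frame
            children.append((w, h))
    return children
```

   Under these assumptions, a 64x56 block splits into two 32x32 and two
   32x24 blocks, and a 16x8 block splits into two 8x8 blocks.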
242 For the bottom part of a 1920x1080 frame, this implies the following: 244 o For each 64x56 block, transmit one bit to signal a 64x56 inter0 245 block or a split into two 32x32 blocks and two 32x24 blocks. 247 o For each 32x24 block, transmit one bit to signal a 32x24 inter0 248 block or a split into two 16x16 blocks and two 16x8 blocks. 250 o For each 16x8 block, transmit one bit to signal a 16x8 inter0 251 block or a split into two 8x8 blocks. 253 Two examples of handling 64x56 blocks at the bottom row of a 254 1920x1080 frame are shown in Figure 3 and Figure 4 respectively. 256 64 257 +-------------------------------+ 258 | | 259 | | 260 | | 261 | | 262 | | 263 | | 264 | | 265 64 | 56 64x56 | 266 | SKIP | 267 | | 268 | | 269 | | 270 | | 271 - - - - - - - - - + - - - - - - - - - - - - - - - + - - - 272 Frame boundary | 8 | 273 +-------------------------------+ 275 Figure 3: Super block at frame boundary 276 64 277 +---------------+---------------+ 278 | | | 279 | | | 280 | | | 281 | | | 282 | | | 283 | | | 284 | | | 285 64 +---------------+-------+-------+ 286 | | | | 287 | | | | 288 | 32x24 | | | 289 | SKIP +---+---+-------+ 290 | | | | 16x8 | 291 - - - - - - - - - + - - - - - - - +---+---+ - - - + - - - 292 Frame boundary | 8 | | | SKIP | 293 +---------------+---+---+-------+ 295 Figure 4: Coding block at frame boundary 297 3.3. Transform Blocks 299 A coding block (CB) can be divided into four smaller transform blocks 300 (TBs). 302 3.4. Prediction Blocks 304 A coding block (CB) can also be divided into smaller prediction 305 blocks (PBs) for the purpose of motion-compensated prediction. 306 Horizontal, vertical and quad split are used. 308 4. Intra Prediction 310 8 intra prediction modes are used: 312 1. DC 314 2. Vertical (V) 316 3. Horizontal (H) 318 4. Upupright (north-northeast) 320 5. Upupleft (north-northwest) 322 6. Upleft (northwest) 323 7. Upleftleft (west-northwest) 325 8. 
Downleftleft (west-southwest)

   The definitions of the DC, vertical, and horizontal modes are
   straightforward.

   The upleft direction is exactly 45 degrees.

   The upupright, upupleft, and upleftleft directions are equal to
   arctan(1/2) from the horizontal or vertical direction, since they are
   defined by going one pixel horizontally and two pixels vertically (or
   vice versa).

   For the 5 angular intra modes (i.e. the modes other than DC,
   vertical, and horizontal), the pixels of the neighbor blocks are
   filtered before they are used for prediction:

   y(n) = (x(n-1) + 2*x(n) + x(n+1) + 2)/4

   For the angular intra modes that are not 45 degrees, the prediction
   sometimes requires sample values at a half-pixel position.  These
   sample values are determined by an additional filter:

   z(n + 1/2) = (y(n) + y(n+1))/2

5.  Inter Prediction

5.1.  Multiple Reference Frames

   Multiple reference frames are currently implemented as follows.

   o  Use a sliding-window process to keep the N most recent
      reconstructed frames in memory.  The value of N is signaled in the
      sequence header.

   o  In the frame header, signal which of these frames shall be active
      for the current frame.

   o  For each CB, signal which of the active frames is to be used for
      motion compensation (MC).

   Combined with re-ordering, this allows for MPEG-1 style B frames.

   A desirable future extension is to allow long-term reference frames
   in addition to the short-term reference frames defined by the
   sliding-window process.

5.2.  Bi-Prediction

   In case of bi-prediction, two reference indices and two motion
   vectors are signaled per CB.  In the current version, PB-split is not
   allowed in bi-prediction mode.  Sub-pixel interpolation is performed
   for each motion vector/reference index separately before averaging
   the two predicted blocks:

   p(x,y) = (p0(x,y) + p1(x,y))/2

5.3.  Reordered Frames

   Frames may be transmitted out of order.
   Reference frames are selected from the sliding window buffer as
   normal.

5.4.  Interpolated Reference Frames

   A flag is sent in the sequence header indicating that interpolated
   reference frames may be used.

   If a frame is using an interpolated reference frame, it will be the
   first reference in the reference list, and will be interpolated from
   the second and third reference in the list.  It is indicated by a
   reference index of -1 and has a frame number equal to that of the
   current frame.

   The interpolated reference is created by a deterministic process
   common to the encoder and decoder, and described in the separate
   IRFVC draft [I-D.davies-netvc-irfvc].

5.5.  Sub-Pixel Interpolation

5.5.1.  Luma Poly-phase Filter

   Inter prediction uses traditional block-based motion compensated
   prediction with quarter pixel resolution.  A separable 6-tap poly-
   phase filter is the basic method for doing MC with sub-pixel
   accuracy.  The luma filter coefficients are as follows:

   When bi-prediction is enabled in the sequence header:

   1/4 phase: [2,-10,59,17,-5,1]/64

   2/4 phase: [1,-8,39,39,-8,1]/64

   3/4 phase: [1,-5,17,59,-10,2]/64

   When bi-prediction is disabled in the sequence header:

   1/4 phase: [1,-7,55,19,-5,1]/64

   2/4 phase: [1,-7,38,38,-7,1]/64

   3/4 phase: [1,-5,19,55,-7,1]/64

   With reference to Figure 5, a fractional sample value, e.g. i0,0
   which has a phase of 1/4 in the horizontal dimension and a phase of
   1/2 in the vertical dimension, is calculated as follows:

   a0,j = 2*A-2,j - 10*A-1,j + 59*A0,j + 17*A1,j - 5*A2,j + 1*A3,j

   where j = -2,...,3

   i0,0 = (1*a0,-2 - 8*a0,-1 + 39*a0,0 + 39*a0,1 - 8*a0,2 + 1*a0,3 +
   2048)/4096

   The minimum sub-block size is 8x8.
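
   The two-stage calculation of i0,0 can be sketched as follows, using
   the coefficients for the bi-prediction-enabled case.  The function
   name interp_luma_2d, the A[row][column] array layout, the use of an
   arithmetic right shift for the division by 4096, and the absence of
   clipping to the sample range are all assumptions made for
   illustration.  They are not taken from the reference implementation.

```python
# Quarter-pel luma taps (bi-prediction enabled), keyed by phase in 1/4 units.
LUMA_TAPS = {
    1: (2, -10, 59, 17, -5, 1),
    2: (1, -8, 39, 39, -8, 1),
    3: (1, -5, 17, 59, -10, 2),
}


def interp_luma_2d(A, x, y, px, py):
    """Interpolate a sample with fractional phase in both dimensions.

    A is a 2-D list indexed A[row][column]; (x, y) is the integer
    position; px and py are the horizontal and vertical phases (1..3).
    Needs 2 samples of margin before and 3 after in each dimension.
    """
    if (px, py) == (2, 2):
        raise ValueError("the centre position uses the special filter")
    h_taps, v_taps = LUMA_TAPS[px], LUMA_TAPS[py]
    # Horizontal pass over rows y-2 .. y+3, kept at 64x scale (no rounding).
    rows = [
        sum(t * A[y + j][x - 2 + k] for k, t in enumerate(h_taps))
        for j in range(-2, 4)
    ]
    # Vertical pass, then one rounding step for the combined 64*64 scale.
    acc = sum(t * r for t, r in zip(v_taps, rows))
    return (acc + 2048) >> 12
```

   Calling interp_luma_2d(A, x, y, 1, 2) corresponds to the position
   i0,0 in the text (phase 1/4 horizontally, 1/2 vertically).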
438 +-----+-----+-----+-----+-----+-----+-----+-----+-----+ 439 |A | | | |A |a |b |c |A | 440 |-1,-1| | | | 0,-1| 0,-1| 0,-1| 0,-1| 1,-1| 441 +-----+-----+-----+-----+-----+-----+-----+-----+-----+ 442 | | | | | | | | | | 443 | | | | | | | | | | 444 +-----+-----+-----+-----+-----+-----+-----+-----+-----+ 445 | | | | | | | | | | 446 | | | | | | | | | | 447 +-----+-----+-----+-----+-----+-----+-----+-----+-----+ 448 | | | | | | | | | | 449 | | | | | | | | | | 450 +-----+-----+-----+-----+-----+-----+-----+-----+-----+ 451 |A | | | |A |a |b |c |A | 452 |-1,0 | | | | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 | 453 +-----+-----+-----+-----+-----+-----+-----+-----+-----+ 454 |d | | | |d |e |f |g |d | 455 |-1,0 | | | | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 | 456 +-----+-----+-----+-----+-----+-----+-----+-----+-----+ 457 |h | | | |h |i |j |k |h | 458 |-1,0 | | | | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 | 459 +-----+-----+-----+-----+-----+-----+-----+-----+-----+ 460 |l | | | |l |m |n |o |l | 461 |-1,0 | | | | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 | 462 +-----+-----+-----+-----+-----+-----+-----+-----+-----+ 463 |A | | | |A |a |b |c |A | 464 |-1,1 | | | | 0,1 | 0,1 | 0,1 | 0,1 | 1,1 | 465 +-----+-----+-----+-----+-----+-----+-----+-----+-----+ 467 Figure 5: Sub-pixel positions 469 5.5.2. Luma Special Filter Position 471 For the fractional pixel position having exactly 2 quarter pixel 472 offsets in each dimension, a non-separable filter is used to 473 calculate the interpolated value. With reference to Figure 5, the 474 center position j0,0 is calculated as follows: 476 j0,0 = 478 [0*A-1,-1 + 1*A0,-1 + 1*A1,-1 + 0*A2,-1 + 480 1*A-1,0 + 2*A0,0 + 2*A1,0 + 1*A2,0 + 482 1*A-1,1 + 2*A0,1 + 2*A1,1 + 1*A2,1 + 483 0*A-1,2 + 1*A0,2 + 1*A1,2 + 0*A2,2 + 8]/16 485 5.5.3. Chroma Poly-phase Filter 487 Chroma interpolation is performed with 1/8 pixel resolution using the 488 following poly-phase filter. 
   1/8 phase: [-2, 58, 10, -2]/64

   2/8 phase: [-4, 54, 16, -2]/64

   3/8 phase: [-4, 44, 28, -4]/64

   4/8 phase: [-4, 36, 36, -4]/64

   5/8 phase: [-4, 28, 44, -4]/64

   6/8 phase: [-2, 16, 54, -4]/64

   7/8 phase: [-2, 10, 58, -2]/64

5.6.  Motion Vector Coding

5.6.1.  Inter0 and Inter1 Modes

   Inter0 and inter1 modes imply signaling of a motion vector index to
   choose a motion vector from a list of candidate motion vectors with
   associated reference frame index.  The list of motion vector
   candidates is derived from at most two different neighbor blocks,
   each having a unique motion vector/reference frame index.  Signaling
   of the motion vector index uses 0 or 1 bit, depending on the number
   of unique motion vector candidates.  If the chosen neighbor block is
   coded in bi-prediction mode, the inter0 or inter1 block inherits both
   motion vectors, both reference indices and the bi-prediction property
   of the neighbor block.

   For block sizes less than 64x64, inter0 has only one motion vector
   candidate, and its value is always zero.

   Which neighbor blocks to use for motion vector candidates depends on
   the availability of the neighbor blocks (i.e. whether the neighbor
   blocks have already been coded, belong to the same slice and are not
   outside the frame boundaries).  Four different availabilities, U, UR,
   L, and LL, are defined as illustrated in Figure 6.  If the neighbor
   block is intra it is considered to be available but with a zero
   motion vector.

                     |           |
                     |     U     |    UR
          -----------+-----------+-----------
                     |           |
                     |  current  |
               L     |   block   |
                     |           |
                     |           |
          -----------+-----------+
                     |
                     |
               LL    |
                     |

               Figure 6: Availability of neighbor blocks

   Based on the four availabilities defined above, each of the motion
   vector candidates is derived from one of the possible neighbor blocks
   defined in Figure 7.
          +----+----+      +----+    +----+----+
          | UL | U0 |      | U1 |    | U2 | UR |
          +----+----+------+----+----+----+----+
          | L0 |                          |
          +----+                          |
          |                               |
          |                               |
          +----+          current         |
          | L1 |           block          |
          +----+                          |
          |                               |
          +----+                          |
          | L2 |                          |
          +----+--------------------------+
          | LL |
          +----+

                  Figure 7: Motion vector candidates

   The choice of motion vector candidates depends on the availability of
   neighbor blocks as shown in Table 1.

          +----+-----+----+-----+---------------------------+
          | U  | UR  | L  | LL  | Motion vector candidates  |
          +----+-----+----+-----+---------------------------+
          | 0  | 0   | 0  | 0   | zero vector               |
          | 1  | 0   | 0  | 0   | U2, zero vector           |
          | 0  | 1   | 0  | 0   | NA                        |
          | 1  | 1   | 0  | 0   | U2, zero vector           |
          | 0  | 0   | 1  | 0   | L2, zero vector           |
          | 1  | 0   | 1  | 0   | U2, L2                    |
          | 0  | 1   | 1  | 0   | NA                        |
          | 1  | 1   | 1  | 0   | U2, L2                    |
          | 0  | 0   | 0  | 1   | NA                        |
          | 1  | 0   | 0  | 1   | NA                        |
          | 0  | 1   | 0  | 1   | NA                        |
          | 1  | 1   | 0  | 1   | NA                        |
          | 0  | 0   | 1  | 1   | L2, zero vector           |
          | 1  | 0   | 1  | 1   | U2, L2                    |
          | 0  | 1   | 1  | 1   | NA                        |
          | 1  | 1   | 1  | 1   | U2, L2                    |
          +----+-----+----+-----+---------------------------+

      Table 1: Motion vector candidates for different availability of
                              neighbor blocks

5.6.2.  Inter2 and Bi-Prediction Modes

   Motion vectors are coded using motion vector prediction.  The motion
   vector predictor is defined as the median of the motion vectors from
   three neighbor blocks.  Definition of the motion vector predictor
   uses the same definition of availability and neighbors as in Figure 6
   and Figure 7 respectively.  The three vectors used for median
   filtering depend on the availability of neighbor blocks as shown in
   Table 2.  If the neighbor block is coded in bi-prediction mode, only
   the first motion vector (in transmission order), MV0, is used as
   input to the median operator.
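
   Once three neighbor vectors have been selected, the median step can
   be sketched as below.  The text does not say how the median of
   two-dimensional vectors is formed.  This sketch assumes a
   component-wise median, which is common in hybrid codecs but is an
   assumption here.  The names median3 and mv_predictor are
   hypothetical.

```python
def median3(a, b, c):
    """Return the middle value of three numbers."""
    return sorted((a, b, c))[1]


def mv_predictor(mv_a, mv_b, mv_c):
    """Component-wise median of three (x, y) motion vectors.

    Assumption: x and y are filtered independently, so the result
    need not equal any single input vector.
    """
    return (
        median3(mv_a[0], mv_b[0], mv_c[0]),
        median3(mv_a[1], mv_b[1], mv_c[1]),
    )
```

   The coded motion vector difference would then be the actual vector
   minus this predictor, component by component.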
     +----+-----+----+-----+--------------------------------------+
     | U  | UR  | L  | LL  | Motion vectors for median filtering  |
     +----+-----+----+-----+--------------------------------------+
     | 0  | 0   | 0  | 0   | 3 x zero vector                      |
     | 1  | 0   | 0  | 0   | U0,U1,U2                             |
     | 0  | 1   | 0  | 0   | NA                                   |
     | 1  | 1   | 0  | 0   | U0,U2,UR                             |
     | 0  | 0   | 1  | 0   | L0,L1,L2                             |
     | 1  | 0   | 1  | 0   | UL,U2,L2                             |
     | 0  | 1   | 1  | 0   | NA                                   |
     | 1  | 1   | 1  | 0   | U0,UR,L2,L0                          |
     | 0  | 0   | 0  | 1   | NA                                   |
     | 1  | 0   | 0  | 1   | NA                                   |
     | 0  | 1   | 0  | 1   | NA                                   |
     | 1  | 1   | 0  | 1   | NA                                   |
     | 0  | 0   | 1  | 1   | L0,L2,LL                             |
     | 1  | 0   | 1  | 1   | U2,L0,LL                             |
     | 0  | 1   | 1  | 1   | NA                                   |
     | 1  | 1   | 1  | 1   | U0,UR,L0                             |
     +----+-----+----+-----+--------------------------------------+

      Table 2: Neighbor blocks used to define motion vector predictor
                         through median filtering

5.6.3.  Motion Vector Direction

   Motion vectors referring to reference frames later in time than the
   current frame are stored with their sign reversed, and these reversed
   values are used for coding and motion vector prediction.

6.  Transforms

   Transforms are applied at the TB or CB level, implying that transform
   sizes range from 4x4 to 128x128.  The transforms form an embedded
   structure meaning the transform matrix elements of the smaller
   transforms can be extracted from the larger transforms.

7.  Quantization

   For the 32x32, 64x64 and 128x128 transform sizes, only the 16x16 low
   frequency coefficients are quantized and transmitted.

   The 64x64 inverse transform is defined as a 32x32 transform followed
   by duplicating each output sample into a 2x2 block.  The 128x128
   inverse transform is defined as a 32x32 transform followed by
   duplicating each output sample into a 4x4 block.

7.1.  Quantization matrices

   A flag is transmitted in the sequence header to indicate whether
   quantization matrices are used.
   If this flag is true, a 6 bit value qmtx_offset is transmitted in the
   sequence header to indicate matrix strength.

   If used, then in dequantization a separate scaling factor is applied
   to each coefficient, so that the dequantized value of a coefficient
   ci at position i is:

         (ci * d(q) * IW(i,c,s,t,q) + 2^(k + 5)) >> (k + 6)

                           Figure 8: Equation 1

   where IW is the scale factor for coefficient position i with size s,
   frame type (intra/inter) t, component (Y, Cb or Cr) c and quantizer
   q; and k=k(s,q) is the dequantization shift.  IW has scale 64, that
   is, a weight value of 64 is equivalent to unweighted dequantization.

7.1.1.  Quantization matrix selection

   The current luma qp value qpY and the offset value qmtx_offset
   determine a quantization matrix set by the formula:

         qmlevel = max(0,min(11,((qpY + qmtx_offset) * 12) / 44))

                           Figure 9: Equation 2

   This selects one of the 12 different sets of default quantization
   matrices, with increasing qmlevel indicating increasing flatness.

   For a given value of qmlevel, different weighting matrices are
   provided for all combinations of transform block size, type (intra/
   inter), and component (Y, Cb, Cr).  Matrices at low qmlevel are flat
   (constant value 64).  Matrices for inter frames have unity DC gain
   (i.e. value 64 at position 0), whereas those for intra frames are
   designed such that the inverse weighting matrix has unity energy gain
   (i.e. normalized sum-squared of the scaling factors is 1).

7.1.2.  Quantization matrix design

   Further details on the quantization matrix and implementation can be
   found in the separate QMTX draft [I-D.davies-netvc-qmtx].

8.  Loop Filtering

8.1.  Deblocking

8.1.1.  Luma deblocking

   Luma deblocking is performed on an 8x8 grid as follows:

   1.
       For each vertical edge between two 8x8 blocks, calculate the
       following for each of line 2 and line 5 respectively:

       d = abs(a-b) + abs(c-d),

       where a and b are on the left hand side of the block edge and c
       and d are on the right hand side of the block edge:

       a b | c d

   2.  For each line crossing the vertical edge, perform deblocking if
       and only if all of the following conditions are true:

       *  d2+d5 < beta(QP)

       *  The edge is also a transform block edge

       *  abs(mvx(left)) > 2, or abs(mvx(right)) > 2, or

          abs(mvy(left)) > 2, or abs(mvy(right)) > 2, or

          One of the transform blocks on each side of the edge has
          non-zero coefficients, or

          One of the transform blocks on each side of the edge is coded
          using intra mode.

   3.  If deblocking is performed, calculate a delta value as follows:

       delta = clip((18*(c-b) - 6*(d-a) + 16)/32,tc,-tc),

       where tc is a QP-dependent value.

   4.  Next, modify two pixels on each side of the block edge as
       follows:

       a' = a + delta/2

       b' = b + delta

       c' = c - delta

       d' = d - delta/2

   5.  The same procedure is followed for horizontal block edges.

   The relative positions of the samples, a, b, c, d and the motion
   vectors, MV, are illustrated in Figure 10.

                              |
                              | block edge
                              |
                      +---+---+---+---+
                      | a | b | c | d |
                      +---+---+---+---+
                              |
                    mv        |       mv
                      x,left  |         x,right
                              |
                    mv                mv
                      y,left            y,right

              Figure 10: Deblocking filter pixel positions

8.1.2.  Chroma Deblocking

   Chroma deblocking is performed on a 4x4 grid as follows:

   1.  Deblocking of the edge between two 4x4 blocks is performed if
       and only if:

       *  The pixels on either side of the block edge belong to an
          intra block.

       *  The block edge is also an edge between two transform blocks.

   2.  If deblocking is performed, calculate a delta value as follows:

       delta = clip((4*(c-b) + (d-a) + 4)/8,tc,-tc),

       where tc is a QP-dependent value.

   3.
       Next, modify one pixel on each side of the block edge as follows:

       b' = b + delta

       c' = c - delta

8.2.  Constrained Low Pass Filter (CLPF)

   A low-pass filter is applied after the deblocking filter if signaled
   in the sequence header.  It can still be switched off for individual
   frames in the frame header.  Also signaled in the frame header is
   whether to apply the filter for all qualified 128x128 blocks or to
   transmit a flag for each such block.  A super block does not qualify
   if it only contains Inter0 (skip) coding blocks, and no signal is
   transmitted for these blocks.

   The filter is described in the separate CLPF draft
   [I-D.midtskogen-netvc-clpf].

9.  Entropy coding

9.1.  Overview

   The following information is signaled at the sequence level:

   o  Sequence header

   The following information is signaled at the frame level:

   o  Frame header

   The following information is signaled at the CB level:

   o  Super-mode (mode, split, reference index for uni-prediction)

   o  Intra prediction mode

   o  PB-split (none, hor, ver, quad)

   o  TB-split (none or quad)

   o  Reference frame indices for bi-prediction

   o  Motion vector candidate index

   o  Transform coefficients if TB-split=0

   The following information is signaled at the TB level:

   o  CBP (8 combinations of CBPY, CBPU, and CBPV)

   o  Transform coefficients

   The following information is signaled at the PB level:

   o  Motion vector differences

9.2.  Low Level Syntax

9.2.1.  CB Level

   super-mode (inter0/split/inter1/inter2-ref0/intra/inter2-ref1/
   inter2-ref2/inter2-ref3,..)
   if (mode == INTER0 || mode == INTER1)

      mv_idx (one of up to 2 motion vector candidates)

   else if (mode == INTRA)

      intra_mode (one of up to 8 intra modes)

      tb_split (NONE or QUAD, coded jointly with CBP for tb_split=NONE)

   else if (mode == INTER)

      pb_split (NONE,VER,HOR,QUAD)

      tb_split_and_cbp (NONE or QUAD and CBP)

   else if (mode == BIPRED)

      mvd_x0, mvd_y0 (motion vector difference for first vector)

      mvd_x1, mvd_y1 (motion vector difference for second vector)

      ref_idx0, ref_idx1 (two reference indices)

9.2.2.  PB Level

   if (mode == INTER2 || mode == BIPRED)

      mvd_x, mvd_y (motion vector differences)

9.2.3.  TB Level

   if (mode != INTER0 && tb_split == 1)

      cbp (8 possibilities for CBPY/CBPU/CBPV)

   if (mode != INTER0)

      transform coefficients

9.2.4.  Super Mode

   For each block of size NxN (64>=N>8), the following mutually
   exclusive events are jointly encoded using a single VLC code as
   follows (example using 4 reference frames):

   If there is no interpolated reference frame:

      INTER0       1
      SPLIT        01
      INTER1       001
      INTER2-REF0  0001
      BIPRED       00001
      INTRA        000001
      INTER2-REF1  0000001
      INTER2-REF2  00000001
      INTER2-REF3  00000000

   If there is an interpolated reference frame:

      INTER0       1
      SPLIT        01
      INTER1       001
      BIPRED       0001
      INTRA        00001
      INTER2-REF1  000001
      INTER2-REF2  0000001
      INTER2-REF3  00000001
      INTER2-REF0  00000000

   If fewer than 4 reference frames are used, a shorter VLC table is
   used.  If bi-pred is not possible, or split is not possible, they are
   omitted from the table and shorter codes are used for subsequent
   elements.
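
   Both tables above follow the same pattern: event number i (counting
   from zero) is coded as i zeros followed by a one, except for the last
   event, which is coded as zeros only.  The sketch below rebuilds such
   a table from an ordered event list.  It assumes that the shorter
   tables mentioned above use the same pattern over the remaining
   events, which the text implies but does not spell out.  The function
   name unary_vlc is hypothetical.

```python
def unary_vlc(events):
    """Build a truncated-unary code table for an ordered event list.

    Event i gets i zeros followed by a one; the last event gets
    i zeros and no terminating one.
    """
    last = len(events) - 1
    table = {}
    for i, event in enumerate(events):
        table[event] = "0" * i + ("" if i == last else "1")
    return table


# Ordering from the text for the case without an interpolated reference.
NO_INTERP_ORDER = [
    "INTER0", "SPLIT", "INTER1", "INTER2-REF0", "BIPRED",
    "INTRA", "INTER2-REF1", "INTER2-REF2", "INTER2-REF3",
]
```

   For example, removing BIPRED from NO_INTERP_ORDER before calling
   unary_vlc shortens the code for INTRA from 000001 to 00001 under the
   stated assumption.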
   Additionally, depending on information from the blocks to the left
   and above (metadata and CBP), a different sorting of the events can
   be used, e.g.:

      SPLIT        1
      INTER1       01
      INTER2-REF0  001
      INTER0       0001
      INTRA        00001
      INTER2-REF1  000001
      INTER2-REF2  0000001
      INTER2-REF3  00000001
      BIPRED       00000000

9.2.5.  CBP

   Calculate code as follows:

   if (tb-split == 0)

      N = 4*CBPV + 2*CBPU + CBPY

   else

      N = 8

   Map the value of N to code through a table lookup:

      code = table[N]

   where the purpose of the table lookup is to sort the different
   values of code according to decreasing probability (typically CBPY=1,
   CBPU=0, CBPV=0 having the highest probability).

   Use a different table depending on the values of CBPY in neighbor
   blocks (left and above).

   Encode the value of code using a systematic VLC code.

9.2.6.  Transform Coefficients

   Transform coefficient coding uses a traditional zig-zag scan pattern
   to convert a 2D array of quantized transform coefficients, coeff, to
   a 1D array of samples.  VLC coding of quantized transform
   coefficients starts from the low frequency end of the 1D array using
   two different modes, level-mode and run-mode, starting in level-mode:

   o  Level-mode

      *  Encode each coefficient, coeff, separately

      *  Each coefficient is encoded by:

         +  The absolute value, level=abs(coeff), using a VLC code and

         +  If level > 0, the sign bit (sign=0 or sign=1 for coeff>0 and
            coeff<0 respectively).

      *  If coefficient N is zero, switch to run-mode, starting from
         coefficient N+1.

   o  Run-mode

      *  For each non-zero coefficient, encode the combined event of:

         1.  Length of the zero-run, i.e. the number of zeros since the
             last non-zero coefficient.

         2.  Whether or not level=abs(coeff) is greater than 1.

         3.  End of block (EOB) indicating that there are no more
             non-zero coefficients.

      *  Additionally, if level = 1, code the sign bit.
      *  Additionally, if level > 1, define code = 2*(level-2)+sign and
         encode it.

      *  If the absolute value of coefficient N is larger than 1, switch
         to level-mode, starting from coefficient N+1.

   Example

   Figure 11 illustrates an example where 16 quantized transform
   coefficients are encoded.

              4
              |                    3
        2     |                    |  2
        |  1  |  1  0  0  1  0  0  |  |  0  0  1  0  0
        |__|__|__|________|________|__|________|_______

                   Figure 11: Coefficients to encode

   Table 3 shows the mode and the symbols to be coded for each
   coefficient.

   +--------+-------------+-------------+------------------------------+
   | Index  | abs(coeff)  | Mode        | Encoded symbols              |
   +--------+-------------+-------------+------------------------------+
   | 0      | 2           | level-mode  | level=2,sign                 |
   | 1      | 1           | level-mode  | level=1,sign                 |
   | 2      | 4           | level-mode  | level=4,sign                 |
   | 3      | 1           | level-mode  | level=1,sign                 |
   | 4      | 0           | level-mode  | level=0                      |
   | 5      | 0           | run-mode    |                              |
   | 6      | 1           | run-mode    | (run=1,level=1)              |
   | 7      | 0           | run-mode    |                              |
   | 8      | 0           | run-mode    |                              |
   | 9      | 3           | run-mode    | (run=2,level>1),             |
   |        |             |             | 2*(3-2)+sign                 |
   | 10     | 2           | level-mode  | level=2,sign                 |
   | 11     | 0           | level-mode  | level=0                      |
   | 12     | 0           | run-mode    |                              |
   | 13     | 1           | run-mode    | (run=1,level=1)              |
   | 14     | 0           | run-mode    | EOB                          |
   | 15     | 0           | run-mode    |                              |
   +--------+-------------+-------------+------------------------------+

      Table 3: Transform coefficient encoding for the example above.

10.  High Level Syntax

   High level syntax is currently very simple and rudimentary as the
   primary focus so far has been on compression performance.  It is
   expected to evolve as functionality is added.

10.1.  Sequence Header

   o  Width - 16 bits

   o  Height - 16 bits

   o  Enable/disable PB-split - 1 bit

   o  SB size - 3 bits

   o  Enable/disable TB-split - 1 bit

   o  Number of active reference frames (may go into frame header) - 2
      bits (max 4)

   o  Enable/disable interpolated reference frames - 1 bit

   o  Enable/disable delta qp - 1 bit

   o  Enable/disable deblocking - 1 bit

   o  Constrained low-pass filter (CLPF) enable/disable - 1 bit

   o  Enable/disable block context coding - 1 bit

   o  Enable/disable bi-prediction - 1 bit

   o  Enable/disable quantization matrices - 1 bit

   o  If quantization matrices enabled: quantization matrix offset - 6
      bits

10.2.  Frame Header

   o  Frame type - 1 bit

   o  QP - 8 bits

   o  Identification of active reference frames - num_ref*4 bits

   o  Number of intra modes - 4 bits

   o  Number of active reference frames - 2 bits

   o  Active reference frames - number of active reference frames * 6
      bits

   o  Frame number - 16 bits

   o  If CLPF is enabled in the sequence header: Constrained low-pass
      filter (CLPF) strength - 2 bits (00 = off, 01 = strength 1, 10 =
      strength 2, 11 = strength 4)

   o  If CLPF is enabled in the sequence header: Enable/disable CLPF
      signal for each qualified filter block

11.  IANA Considerations

   This document has no IANA considerations yet.  TBD

12.  Security Considerations

   This document has no security considerations yet.  TBD

13.  Normative References

   [I-D.davies-netvc-irfvc]
              Davies, T., "Interpolated reference frames for video
              coding", draft-davies-netvc-irfvc-00 (work in progress),
              October 2015.

   [I-D.davies-netvc-qmtx]
              Davies, T., "Quantisation matrices for Thor video coding",
              draft-davies-netvc-qmtx-00 (work in progress), March 2016.

   [I-D.midtskogen-netvc-clpf]
              Midtskogen, S., Fuldseth, A., and M. Zanaty, "Constrained
              Low Pass Filter", draft-midtskogen-netvc-clpf-01 (work in
              progress), March 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

Authors' Addresses

   Arild Fuldseth
   Cisco
   Lysaker
   Norway

   Email: arilfuld@cisco.com

   Gisle Bjontegaard
   Cisco
   Lysaker
   Norway

   Email: gbjonteg@cisco.com

   Steinar Midtskogen
   Cisco
   Lysaker
   Norway

   Email: stemidts@cisco.com

   Thomas Davies
   Cisco
   London
   UK

   Email: thdavies@cisco.com

   Mo Zanaty
   Cisco
   RTP, NC
   USA

   Email: mzanaty@cisco.com