Network Working Group                                        A. Fuldseth
Internet-Draft                                            G. Bjontegaard
Intended status: Standards Track                           S. Midtskogen
Expires: April 21, 2016                                        T. Davies
                                                               M. Zanaty
                                                                   Cisco
                                                        October 19, 2015


                            Thor Video Codec
                      draft-fuldseth-netvc-thor-01

Abstract

   This document provides a high-level description of the Thor video
   codec.
   Thor is designed to achieve high compression efficiency with
   moderate complexity, using the well-known hybrid video coding
   approach of motion-compensated prediction and transform coding.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 21, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Requirements Language
     2.2.  Terminology
   3.  Block Structure
     3.1.  Super Blocks and Coding Blocks
     3.2.  Special Processing at Frame Boundaries
     3.3.  Transform Blocks
     3.4.  Prediction Blocks
   4.  Intra Prediction
   5.  Inter Prediction
     5.1.  Multiple Reference Frames
     5.2.  Bi-Prediction
     5.3.  Reordered Frames
     5.4.  Interpolated Reference Frames
     5.5.  Sub-Pixel Interpolation
       5.5.1.  Luma Poly-phase Filter
       5.5.2.  Luma Special Filter Position
       5.5.3.  Chroma Poly-phase Filter
     5.6.  Motion Vector Coding
       5.6.1.  Inter0 and Inter1 Modes
       5.6.2.  Inter2 and Bi-Prediction Modes
       5.6.3.  Motion Vector Direction
   6.  Transforms
   7.  Quantization
   8.  Loop Filtering
     8.1.  Deblocking
       8.1.1.  Luma Deblocking
       8.1.2.  Chroma Deblocking
     8.2.  Constrained Low Pass Filter (CLPF1)
   9.  Entropy Coding
     9.1.  Overview
     9.2.  Low Level Syntax
       9.2.1.  CB Level
       9.2.2.  PB Level
       9.2.3.  TB Level
       9.2.4.  Super Mode
       9.2.5.  CBP
       9.2.6.  Transform Coefficients
   10. High Level Syntax
     10.1.  Sequence Header
     10.2.  Frame Header
   11. IANA Considerations
   12. Security Considerations
   13. Acknowledgements
   14. Normative References
   Authors' Addresses

1.  Introduction

   This document provides a high-level description of the Thor video
   codec.  Thor is designed to achieve high compression efficiency with
   moderate complexity, using the well-known hybrid video coding
   approach of motion-compensated prediction and transform coding.

   The Thor video codec is a block-based hybrid video codec similar in
   structure to widespread standards.  The high level encoder and
   decoder structures are illustrated in Figure 1 and Figure 2
   respectively.

              +---+   +-----------+   +-----------+   +--------+
   Input--+-->| + |-->| Transform |-->| Quantizer |-->| Entropy|
   Video  |   +---+   +-----------+   +-----------+   | Coding |
          |     ^ -                         |         +--------+
          |     |                           v              |
          |     |                     +-----------+        v
          |     |                     |  Inverse  |      Output
          |     |                     | Transform |    Bitstream
          |     |                     +-----------+
          |     |                           |
          |     |                           v
          |     |                         +---+
          |     +------------------------>| + |
          |     |      +-------------+    +---+
          |     |   ___| Intra Frame |      |
          |     |  /   | Prediction  |<-----+
          |     | /    +-------------+      |
          |     |/                          v
          |      \     +-------------+ +---------+
          |       \    | Inter Frame | |  Loop   |
          |        \___| Prediction  | | Filters |
          |            +-------------+ +---------+
          |                   ^             |
          |                   |             v
          |            +------------+   +---------------+
          |            |   Motion   |   | Reconstructed |
          +----------->| Estimation |<--| Frame Memory  |
                       +------------+   +---------------+

                      Figure 1: Encoder Structure

                    +----------+      +-----------+
      Input ------->| Entropy  |----->|  Inverse  |
   Bitstream        | Decoding |      | Transform |
                    +----------+      +-----------+
                                            |
                                            v
                                          +---+
                +------------------------>| + |
                |      +-------------+    +---+
                |   ___| Intra Frame |      |
                |  /   | Prediction  |<-----+
                | /    +-------------+      |
                |/                          v
                 \     +-------------+ +---------+
                  \    | Inter Frame | |  Loop   |
                   \___| Prediction  | | Filters |
                       +-------------+ +---------+
                              ^             |-------------> Output
                              |             v               Video
                     +--------------+   +---------------+
                     |    Motion    |   | Reconstructed |
                     | Compensation |<--| Frame Memory  |
                     +--------------+   +---------------+

                      Figure 2: Decoder Structure

   The remainder of this document is organized as follows.  First, some
   requirements language and terms are defined.  Block structures are
   described in detail, followed by intra-frame prediction techniques,
   inter-frame prediction techniques, transforms, quantization, loop
   filters, entropy coding, and finally high level syntax.

   An open source reference implementation is available at
   github.com/cisco/thor.

2.  Definitions

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.2.  Terminology

   This document frequently uses the following terms.

   SB:  Super Block - 64x64 block (luma pixels) which can be divided
      into CBs.

   CB:  Coding Block - Subdivision of an SB, down to 8x8 (luma pixels).

   PB:  Prediction Block - Subdivision of a CB, into 1, 2 or 4 equal
      blocks.

   TB:  Transform Block - Subdivision of a CB, into 1 or 4 equal
      blocks.

3.  Block Structure

3.1.  Super Blocks and Coding Blocks

   Each frame is divided into 64x64 Super Blocks (SB) which are
   processed in raster-scan order.  Each SB can be divided into Coding
   Blocks (CB) using a quad-tree structure.  The smallest allowed CB
   size is 8x8 luma pixels.  The four CBs of a larger block are
   coded/signaled in the following order: upleft, downleft, upright,
   and downright.

   The following modes are signaled at the CB level:

   o  Intra

   o  Inter0 (MV index, no residual information)

   o  Inter1 (MV index, residual information)

   o  Inter2 (explicit motion information, residual information)

   o  Bi-Prediction (explicit motion information, residual information)

3.2.  Special Processing at Frame Boundaries

   At frame boundaries some square blocks might not be complete.  For
   example, for 1920x1080 resolutions, the bottom row would consist of
   rectangular blocks of size 64x56.  Rectangular blocks at frame
   boundaries are handled as follows.  For each rectangular block, send
   one bit to choose between:

   o  A rectangular inter0 block and

   o  Further split.

   For the bottom part of a 1920x1080 frame, this implies the following:

   o  For each 64x56 block, transmit one bit to signal a 64x56 inter0
      block or a split into two 32x32 blocks and two 32x24 blocks.

   o  For each 32x24 block, transmit one bit to signal a 32x24 inter0
      block or a split into two 16x16 blocks and two 16x8 blocks.

   o  For each 16x8 block, transmit one bit to signal a 16x8 inter0
      block or a split into two 8x8 blocks.

   Two examples of handling 64x56 blocks at the bottom row of a
   1920x1080 frame are shown in Figure 3 and Figure 4 respectively.

                                     64
                     +-------------------------------+
                     |                               |
                     |                               |
                     |                               |
                     |                               |
                     |                               |
                     |                               |
                     |                               |
                  64 |  56         64x56             |
                     |             SKIP              |
                     |                               |
                     |                               |
                     |                               |
                     |                               |
   - - - - - - - - - + - - - - - - - - - - - - - - - + - - -
     Frame boundary  |  8                            |
                     +-------------------------------+

                Figure 3: Super block at frame boundary

                                     64
                     +---------------+---------------+
                     |               |               |
                     |               |               |
                     |               |               |
                     |               |               |
                     |               |               |
                     |               |               |
                     |               |               |
                  64 +---------------+-------+-------+
                     |               |       |       |
                     |               |       |       |
                     |     32x24     |       |       |
                     |     SKIP      +---+---+-------+
                     |               |   |   | 16x8  |
   - - - - - - - - - + - - - - - - - +---+---+ - - - + - - -
     Frame boundary  |  8            |   |   | SKIP  |
                     +---------------+---+---+-------+

                Figure 4: Coding block at frame boundary

3.3.  Transform Blocks

   A coding block (CB) can be divided into four smaller transform blocks
   (TBs).

3.4.  Prediction Blocks

   A coding block (CB) can also be divided into smaller prediction
   blocks (PBs) for the purpose of motion-compensated prediction.
   Horizontal, vertical and quad splits are used.

4.  Intra Prediction

   Eight intra prediction modes are used:

   1.  DC

   2.  Vertical (V)

   3.  Horizontal (H)

   4.  Upupright (north-northeast)

   5.  Upupleft (north-northwest)

   6.  Upleft (northwest)

   7.  Upleftleft (west-northwest)

   8.  Downleftleft (west-southwest)

   The definitions of the DC, vertical, and horizontal modes are
   straightforward.

   The upleft direction is exactly 45 degrees.

   The upupright, upupleft, and upleftleft directions are equal to
   arctan(1/2) from the horizontal or vertical direction, since they are
   defined by going one pixel horizontally and two pixels vertically (or
   vice versa).

   For the 5 angular intra modes (i.e. angle different from 90 degrees),
   the pixels of the neighbor blocks are filtered before they are used
   for prediction:

   y(n) = (x(n-1) + 2*x(n) + x(n+1) + 2)/4

   For the angular intra modes that are not 45 degrees, the prediction
   sometimes requires sample values at a half-pixel position.  These
   sample values are determined by an additional filter:

   z(n + 1/2) = (y(n) + y(n+1))/2

5.  Inter Prediction

5.1.  Multiple Reference Frames

   Multiple reference frames are currently implemented as follows.

   o  Use a sliding-window process to keep the N most recent
      reconstructed frames in memory.  The value of N is signaled in the
      sequence header.

   o  In the frame header, signal which of these frames shall be active
      for the current frame.

   o  For each CB, signal which of the active frames is to be used for
      motion compensation (MC).

   Combined with re-ordering, this allows for MPEG-1 style B frames.

   A desirable future extension is to allow long-term reference frames
   in addition to the short-term reference frames defined by the
   sliding-window process.

5.2.  Bi-Prediction

   In case of bi-prediction, two reference indices and two motion
   vectors are signaled per CB.  In the current version, PB-split is not
   allowed in bi-prediction mode.  Sub-pixel interpolation is performed
   for each motion vector/reference index separately before averaging
   the two predicted blocks:

   p(x,y) = (p0(x,y) + p1(x,y))/2

5.3.  Reordered Frames

   Frames may be transmitted out of order.  Reference frames are
   selected from the sliding window buffer as normal.

5.4.  Interpolated Reference Frames

   A flag is sent in the sequence header indicating that interpolated
   reference frames may be used.

   If a frame is using an interpolated reference frame, it will be the
   first reference in the reference list, and will be interpolated from
   the second and third reference in the list.  It is indicated by a
   reference index of -1 and has a frame number equal to that of the
   current frame.

   The interpolated reference is created by a deterministic process
   common to the encoder and decoder, and described in the separate
   IRFVC draft [I-D.davies-netvc-irfvc].

5.5.  Sub-Pixel Interpolation

5.5.1.  Luma Poly-phase Filter

   Inter prediction uses traditional block-based motion compensated
   prediction with quarter pixel resolution.  A separable 6-tap
   poly-phase filter is the basic method for doing MC with sub-pixel
   accuracy.  The luma filter coefficients are as follows:

   1/4 phase: [3,-15,111,37,-10,2]/128

   2/4 phase: [3,-17,78,78,-17,3]/128

   3/4 phase: [2,-10,37,111,-15,3]/128

   With reference to Figure 5, a fractional sample value, e.g. i0,0,
   which has a phase of 1/4 in the horizontal dimension and a phase of
   1/2 in the vertical dimension, is calculated as follows:

   a0,j = 3*A-2,j - 15*A-1,j + 111*A0,j + 37*A1,j - 10*A2,j + 2*A3,j

   where j = -2,...,3

   i0,0 = (3*a0,-2 - 17*a0,-1 + 78*a0,0 + 78*a0,1 - 17*a0,2 + 3*a0,3 +
   8192)/16384

   However, some of the sub-pixel positions have different filters which
   can be non-separable and/or have different filter coefficients.  The
   minimum sub-block size is 8x8.
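
   The separable calculation above can be sketched in Python as
   follows.  This is an illustrative sketch rather than the reference
   implementation: the function names, the 6x6 window layout
   (window[j+2][i+2] holding A(i,j)), and the use of floor division
   for the final normalization are assumptions.  The sub-pixel
   positions that use different filters are not handled.

```python
# Luma poly-phase taps from Section 5.5.1, indexed by quarter-pel phase.
# Each tap set sums to 128.
LUMA_TAPS = {
    1: (3, -15, 111, 37, -10, 2),   # 1/4 phase
    2: (3, -17, 78, 78, -17, 3),    # 2/4 phase
    3: (2, -10, 37, 111, -15, 3),   # 3/4 phase
}


def filter6(samples, phase):
    """Return the unnormalized 6-tap filter sum for the given phase."""
    taps = LUMA_TAPS[phase]
    return sum(t * s for t, s in zip(taps, samples))


def interp_separable(window, phase_x, phase_y):
    """Interpolate one luma sample with non-zero phase in both dimensions.

    window is a 6x6 list of integer samples (assumed layout: rows are
    vertical offsets j = -2..3, columns are horizontal offsets i = -2..3).
    """
    # Horizontal pass: one intermediate value per row, scaled by 128.
    rows = [filter6(row, phase_x) for row in window]
    # Vertical pass over the intermediates, then round and divide by
    # 128*128 = 16384 (floor division is an assumption of this sketch).
    return (filter6(rows, phase_y) + 8192) // 16384


if __name__ == "__main__":
    ramp = [[0, 10, 20, 30, 40, 50] for _ in range(6)]
    # Sample i0,0: 1/4 phase horizontally, 2/4 phase vertically.
    print(interp_separable(ramp, 1, 2))
```

   For a horizontal ramp with A0,0 = 20 and A1,0 = 30, the quarter-pel
   sample lies at 22.5 before rounding, so the sketch returns 23.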

   +-----+-----+-----+-----+-----+-----+-----+-----+-----+
   |A    |     |     |     |A    |a    |b    |c    |A    |
   |-1,-1|     |     |     | 0,-1| 0,-1| 0,-1| 0,-1| 1,-1|
   +-----+-----+-----+-----+-----+-----+-----+-----+-----+
   |     |     |     |     |     |     |     |     |     |
   |     |     |     |     |     |     |     |     |     |
   +-----+-----+-----+-----+-----+-----+-----+-----+-----+
   |     |     |     |     |     |     |     |     |     |
   |     |     |     |     |     |     |     |     |     |
   +-----+-----+-----+-----+-----+-----+-----+-----+-----+
   |     |     |     |     |     |     |     |     |     |
   |     |     |     |     |     |     |     |     |     |
   +-----+-----+-----+-----+-----+-----+-----+-----+-----+
   |A    |     |     |     |A    |a    |b    |c    |A    |
   |-1,0 |     |     |     | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 |
   +-----+-----+-----+-----+-----+-----+-----+-----+-----+
   |d    |     |     |     |d    |e    |f    |g    |d    |
   |-1,0 |     |     |     | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 |
   +-----+-----+-----+-----+-----+-----+-----+-----+-----+
   |h    |     |     |     |h    |i    |j    |k    |h    |
   |-1,0 |     |     |     | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 |
   +-----+-----+-----+-----+-----+-----+-----+-----+-----+
   |l    |     |     |     |l    |m    |n    |o    |l    |
   |-1,0 |     |     |     | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 |
   +-----+-----+-----+-----+-----+-----+-----+-----+-----+
   |A    |     |     |     |A    |a    |b    |c    |A    |
   |-1,1 |     |     |     | 0,1 | 0,1 | 0,1 | 0,1 | 1,1 |
   +-----+-----+-----+-----+-----+-----+-----+-----+-----+

                     Figure 5: Sub-pixel positions

5.5.2.  Luma Special Filter Position

   For the fractional pixel position having exactly 2 quarter pixel
   offsets in each dimension, a non-separable filter is used to
   calculate the interpolated value.  With reference to Figure 5, the
   center position j0,0 is calculated as follows:

   j0,0 =

   [0*A-1,-1 + 1*A0,-1 + 1*A1,-1 + 0*A2,-1 +

    1*A-1,0  + 2*A0,0  + 2*A1,0  + 1*A2,0  +

    1*A-1,1  + 2*A0,1  + 2*A1,1  + 1*A2,1  +

    0*A-1,2  + 1*A0,2  + 1*A1,2  + 0*A2,2  + 8]/16

5.5.3.  Chroma Poly-phase Filter

   Chroma interpolation is performed with 1/8 pixel resolution using the
   following poly-phase filter.

   1/8 phase: [-2, 58, 10, -2]/64

   2/8 phase: [-4, 54, 16, -2]/64

   3/8 phase: [-4, 44, 28, -4]/64

   4/8 phase: [-4, 36, 36, -4]/64

   5/8 phase: [-4, 28, 44, -4]/64

   6/8 phase: [-2, 16, 54, -4]/64

   7/8 phase: [-2, 10, 58, -2]/64

5.6.  Motion Vector Coding

5.6.1.  Inter0 and Inter1 Modes

   Inter0 and inter1 modes imply signaling of a motion vector index to
   choose a motion vector from a list of candidate motion vectors with
   associated reference frame index.  The list of motion vector
   candidates is derived from at most two different neighbor blocks,
   each having a unique motion vector/reference frame index.  Signaling
   of the motion vector index uses 0 or 1 bit, depending on the number
   of unique motion vector candidates.  If the chosen neighbor block is
   coded in bi-prediction mode, the inter0 or inter1 block inherits both
   motion vectors, both reference indices and the bi-prediction property
   of the neighbor block.

   For block sizes less than 64x64, inter0 has only one motion vector
   candidate, and its value is always zero.

   Which neighbor blocks to use for motion vector candidates depends on
   the availability of the neighbor blocks (i.e. whether the neighbor
   blocks have already been coded, belong to the same slice and are not
   outside the frame boundaries).  Four different availabilities, U, UR,
   L, and LL, are defined as illustrated in Figure 6.  If the neighbor
   block is intra it is considered to be available but with a zero
   motion vector.

              |           |
              |     U     |    UR
   -----------+-----------+-----------
              |           |
              |  current  |
        L     |   block   |
              |           |
              |           |
   -----------+-----------+
              |
              |
       LL     |
              |

               Figure 6: Availability of neighbor blocks

   Based on the four availabilities defined above, each of the motion
   vector candidates is derived from one of the possible neighbor blocks
   defined in Figure 7.

   +----+----+      +----+    +----+----+
   | UL | U0 |      | U1 |    | U2 | UR |
   +----+----+------+----+----+----+----+
   | L0 |                          |
   +----+                          |
        |                          |
        |                          |
   +----+         current          |
   | L1 |          block           |
   +----+                          |
        |                          |
   +----+                          |
   | L2 |                          |
   +----+--------------------------+
   | LL |
   +----+

                   Figure 7: Motion vector candidates

   The choice of motion vector candidates depends on the availability of
   neighbor blocks as shown in Table 1.

   +----+-----+----+-----+---------------------------+
   | U  | UR  | L  | LL  | Motion vector candidates  |
   +----+-----+----+-----+---------------------------+
   | 0  | 0   | 0  | 0   | zero vector               |
   | 1  | 0   | 0  | 0   | U2, zero vector           |
   | 0  | 1   | 0  | 0   | NA                        |
   | 1  | 1   | 0  | 0   | U2, zero vector           |
   | 0  | 0   | 1  | 0   | L2, zero vector           |
   | 1  | 0   | 1  | 0   | U2,L2                     |
   | 0  | 1   | 1  | 0   | NA                        |
   | 1  | 1   | 1  | 0   | U2,L2                     |
   | 0  | 0   | 0  | 1   | NA                        |
   | 1  | 0   | 0  | 1   | NA                        |
   | 0  | 1   | 0  | 1   | NA                        |
   | 1  | 1   | 0  | 1   | NA                        |
   | 0  | 0   | 1  | 1   | L2, zero vector           |
   | 1  | 0   | 1  | 1   | U2,L2                     |
   | 0  | 1   | 1  | 1   | NA                        |
   | 1  | 1   | 1  | 1   | U2,L2                     |
   +----+-----+----+-----+---------------------------+

     Table 1: Motion vector candidates for different availability of
                            neighbor blocks

5.6.2.  Inter2 and Bi-Prediction Modes

   Motion vectors are coded using motion vector prediction.  The motion
   vector predictor is defined as the median of the motion vectors from
   three neighbor blocks.  Definition of the motion vector predictor
   uses the same definition of availability and neighbors as in Figure 6
   and Figure 7 respectively.  The three vectors used for median
   filtering depend on the availability of neighbor blocks as shown in
   Table 2.  If the neighbor block is coded in bi-prediction mode, only
   the first motion vector (in transmission order), MV0, is used as
   input to the median operator.
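
   Once the three neighbor vectors have been selected, the predictor
   and the coded difference can be sketched as below.  This is a
   minimal sketch under stated assumptions: the median is taken
   separately for the x and y components (this document only says
   "median"), vectors are (x, y) integer tuples, and all function names
   are hypothetical.

```python
def median3(a, b, c):
    """Return the middle value of three integers."""
    return sorted((a, b, c))[1]


def median_mv(mv_a, mv_b, mv_c):
    """Component-wise median of three (x, y) motion vectors.

    Taking the median per component is an assumption of this sketch.
    """
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


def mv_difference(mv, predictor):
    """Motion vector difference that the encoder would signal."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])


def mv_reconstruct(mvd, predictor):
    """Decoder side: add the signaled difference back to the predictor."""
    return (mvd[0] + predictor[0], mvd[1] + predictor[1])


if __name__ == "__main__":
    pred = median_mv((4, -2), (0, 0), (8, 6))
    mvd = mv_difference((5, 1), pred)
    print(pred, mvd, mv_reconstruct(mvd, pred))
```

   With all neighbors unavailable, the three inputs are zero vectors
   and the predictor is the zero vector, so the difference equals the
   motion vector itself.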

   +----+-----+----+-----+--------------------------------------+
   | U  | UR  | L  | LL  | Motion vectors for median filtering  |
   +----+-----+----+-----+--------------------------------------+
   | 0  | 0   | 0  | 0   | 3 x zero vector                      |
   | 1  | 0   | 0  | 0   | U0,U1,U2                             |
   | 0  | 1   | 0  | 0   | NA                                   |
   | 1  | 1   | 0  | 0   | U0,U2,UR                             |
   | 0  | 0   | 1  | 0   | L0,L1,L2                             |
   | 1  | 0   | 1  | 0   | UL,U2,L2                             |
   | 0  | 1   | 1  | 0   | NA                                   |
   | 1  | 1   | 1  | 0   | U0,UR,L2,L0                          |
   | 0  | 0   | 0  | 1   | NA                                   |
   | 1  | 0   | 0  | 1   | NA                                   |
   | 0  | 1   | 0  | 1   | NA                                   |
   | 1  | 1   | 0  | 1   | NA                                   |
   | 0  | 0   | 1  | 1   | L0,L2,LL                             |
   | 1  | 0   | 1  | 1   | U2,L0,LL                             |
   | 0  | 1   | 1  | 1   | NA                                   |
   | 1  | 1   | 1  | 1   | U0,UR,L0                             |
   +----+-----+----+-----+--------------------------------------+

     Table 2: Neighbor blocks used to define motion vector predictor
                        through median filtering

5.6.3.  Motion Vector Direction

   Motion vectors referring to reference frames later in time than the
   current frame are stored with their sign reversed, and these reversed
   values are used for coding and motion vector prediction.

6.  Transforms

   Transforms are applied at the TB or CB level, implying that transform
   sizes range from 4x4 to 64x64.  The transforms form an embedded
   structure, meaning that the transform matrix elements of the smaller
   transforms can be extracted from the larger transforms.

7.  Quantization

   For the 32x32 and 64x64 transform sizes, only the 16x16 low frequency
   coefficients are quantized and transmitted.

   Quantizer step-size control is not implemented yet, but some sort of
   sub-frame control is desired.

8.  Loop Filtering

8.1.  Deblocking

8.1.1.  Luma Deblocking

   Luma deblocking is performed on an 8x8 grid as follows:

   1.  For each vertical edge between two 8x8 blocks, calculate the
       following for each of line 2 and line 5 respectively:

       d = abs(a-b) + abs(c-d),

       where a and b are on the left-hand side of the block edge and c
       and d are on the right-hand side of the block edge:

       a b | c d

       The result on the left-hand side of the equation is an activity
       measure, denoted d2 for line 2 and d5 for line 5, and is
       distinct from the pixel d.

   2.  For each line crossing the vertical edge, perform deblocking if
       and only if all of the following conditions are true:

       *  d2+d5 < beta(QP)

       *  The edge is also a transform block edge

       *  At least one of the following holds:

          +  abs(mvx(left)) > 2, or abs(mvx(right)) > 2, or
             abs(mvy(left)) > 2, or abs(mvy(right)) > 2

          +  One of the transform blocks on each side of the edge has
             non-zero coefficients

          +  One of the transform blocks on each side of the edge is
             coded using intra mode

   3.  If deblocking is performed, calculate a delta value as follows:

       delta = clip((18*(c-b) - 6*(d-a) + 16)/32,tc,-tc),

       where tc is a QP-dependent value, i.e. delta is limited to the
       range [-tc, tc].

   4.  Next, modify two pixels on each side of the block edge as
       follows:

       a' = a + delta/2

       b' = b + delta

       c' = c - delta

       d' = d - delta/2

   5.  The same procedure is followed for horizontal block edges.

   The relative positions of the samples, a, b, c, d and the motion
   vectors, MV, are illustrated in Figure 8.

                        |
                        |  block edge
                        |
                +---+---+---+---+
                | a | b | c | d |
                +---+---+---+---+
                        |
              mv        |       mv
                x,left  |         x,right
                        |
              mv                mv
                y,left            y,right

              Figure 8: Deblocking filter pixel positions

8.1.2.  Chroma Deblocking

   Chroma deblocking is performed on a 4x4 grid as follows:

   1.  Deblocking of the edge between two 4x4 blocks is performed if
       and only if:

       *  The pixels on either side of the block edge belong to an
          intra block.

       *  The block edge is also an edge between two transform blocks.

   2.  If deblocking is performed, calculate a delta value as follows:

       delta = clip((4*(c-b) + (d-a) + 4)/8,tc,-tc),

       where tc is a QP-dependent value.

   3.  Next, modify one pixel on each side of the block edge as follows:

       b' = b + delta

       c' = c - delta

8.2.  Constrained Low Pass Filter (CLPF1)

   A low-pass filter is applied after the deblocking filter if signaled
   in the sequence header.  It can still be switched off for individual
   frames in the frame header.  Also signaled in the frame header is
   whether to apply the filter for all qualified super blocks or to
   transmit a flag for each such block.

   A super block is qualified if it contains at least one uni-predictive
   coding block having residual information.  Fully bi-predictive super
   blocks are never filtered, nor are uni-predictive super blocks that
   lack residual information, and no signal is transmitted for these
   blocks.

   Each super block to be filtered is divided into its coding blocks
   and processed as follows:

   o  Coding blocks that are bi-predictive or lack residual information
      are unchanged.

   o  Otherwise, every pixel is compared to the adjacent left, right,
      upper, and lower pixels.

   o  If at least three of them are greater, the center pixel is
      increased by one.

   o  If at least three of them are smaller, the center pixel is
      decreased by one.

   o  The modification of the center pixel can be described by the
      following equation where comparisons evaluate to 0 or 1:

      X' = X + (((A>X)+(B>X)+(C>X)+(D>X)) > 2)
             - (((A<X)+(B<X)+(C<X)+(D<X)) > 2)

   The relative positions of the pixel values A, B, C, D, and X are
   shown in Figure 9.  Note that A, B, C, D and X are always the
   unfiltered values.

                             +---+
                             | A |
                         +---+---+---+
                         | B | X | C |
                         +---+---+---+
                             | D |
                             +---+

         Figure 9: Constrained low pass filter pixel positions

9.  Entropy Coding

9.1.  Overview

   The following information is signaled at the sequence level:

   o  Sequence header

   The following information is signaled at the frame level:

   o  Frame header

   The following information is signaled at the CB level:

   o  Super-mode (mode, split, reference index for uni-prediction)

   o  Intra prediction mode

   o  PB-split (none, hor, ver, quad)

   o  TB-split (none or quad)

   o  Reference frame indices for bi-prediction

   o  Motion vector candidate index

   o  Transform coefficients if TB-split=0

   The following information is signaled at the TB level:

   o  CBP (8 combinations of CBPY, CBPU, and CBPV)

   o  Transform coefficients

   The following information is signaled at the PB level:

   o  Motion vector differences

9.2.  Low Level Syntax

9.2.1.  CB Level

   super-mode (inter0/split/inter1/inter2-ref0/intra/inter2-ref1/
               inter2-ref2/inter2-ref3,..)

   if (mode == inter0 || mode == inter1)

      mv_idx (one of up to 2 motion vector candidates)

   else if (mode == INTRA)

      intra_mode (one of up to 8 intra modes)

      tb_split (NONE or QUAD, coded jointly with CBP for tb_split=NONE)

   else if (mode == INTER)

      pb_split (NONE,VER,HOR,QUAD)

      tb_split_and_cbp (NONE or QUAD and CBP)

   else if (mode == BIPRED)

      mvd_x0, mvd_y0 (motion vector difference for first vector)

      mvd_x1, mvd_y1 (motion vector difference for second vector)

      ref_idx0, ref_idx1 (two reference indices)

9.2.2.  PB Level

   if (mode == INTER2 || mode == BIPRED)

      mvd_x, mvd_y (motion vector differences)

9.2.3.  TB Level

   if (mode != INTER0 and tb_split == 1)

      cbp (8 possibilities for CBPY/CBPU/CBPV)

   if (mode != INTER0)

      transform coefficients

9.2.4.  Super Mode

   For each block of size NxN (64>=N>8), the following mutually
   exclusive events are jointly encoded using a single VLC code as
   follows (example using 4 reference frames):

   If there is no interpolated reference frame:

      INTER0       1
      SPLIT        01
      INTER1       001
      INTER2-REF0  0001
      BIPRED       00001
      INTRA        000001
      INTER2-REF1  0000001
      INTER2-REF2  00000001
      INTER2-REF3  00000000

   If there is an interpolated reference frame:

      INTER0       1
      SPLIT        01
      INTER1       001
      BIPRED       0001
      INTRA        00001
      INTER2-REF1  000001
      INTER2-REF2  0000001
      INTER2-REF3  00000001
      INTER2-REF0  00000000

   If fewer than 4 reference frames are used, a shorter VLC table is
   used.  If bi-pred is not possible, or split is not possible, they
   are omitted from the table and shorter codes are used for subsequent
   elements.

   Additionally, depending on information from the blocks to the left
   and above (meta data and CBP), a different sorting of the events can
   be used, e.g.:

      SPLIT        1
      INTER1       01
      INTER2-REF0  001
      INTER0       0001
      INTRA        00001
      INTER2-REF1  000001
      INTER2-REF2  0000001
      INTER2-REF3  00000001
      BIPRED       00000000

9.2.5.  CBP

   Calculate code as follows:

   if (tb-split == 0)

      N = 4*CBPV + 2*CBPU + CBPY

   else

      N = 8

   Map the value of N to code through a table lookup:

      code = table[N]

   where the purpose of the table lookup is to sort the different
   values of code according to decreasing probability (typically CBPY=1,
   CBPU=0, CBPV=0 having the highest probability).

   Use a different table depending on the values of CBPY in neighbor
   blocks (left and above).

   Encode the value of code using a systematic VLC code.

9.2.6.  Transform Coefficients

   Transform coefficient coding uses a traditional zig-zag scan pattern
   to convert a 2D array of quantized transform coefficients, coeff, to
   a 1D array of samples.
   VLC coding of quantized transform coefficients starts from the low
   frequency end of the 1D array using two different modes, level-mode
   and run-mode, starting in level-mode:

   o  Level-mode

      *  Encode each coefficient, coeff, separately

      *  Each coefficient is encoded by:

         +  The absolute value, level=abs(coeff), using a VLC code and

         +  If level > 0, the sign bit (sign=0 or sign=1 for coeff>0 and
            coeff<0 respectively).

      *  If coefficient N is zero, switch to run-mode, starting from
         coefficient N+1.

   o  Run-mode

      *  For each non-zero coefficient, encode the combined event of:

         1.  Length of the zero-run, i.e. the number of zeros since the
             last non-zero coefficient.

         2.  Whether or not level=abs(coeff) is greater than 1.

         3.  End of block (EOB) indicating that there are no more
             non-zero coefficients.

      *  Additionally, if level = 1, code the sign bit.

      *  Additionally, if level > 1, define code = 2*(level-2)+sign and
         encode it.

      *  If the absolute value of coefficient N is larger than 1, switch
         to level-mode, starting from coefficient N+1.

   Example

   Figure 10 illustrates an example where 16 quantized transform
   coefficients are encoded.

         4
                              3
   2     |                       2
      1  |  1        1        |           1
   |     |     0  0     0  0  |  |  0  0     0  0
   |__|__|__|________|________|__|________|_______

                   Figure 10: Coefficients to encode

   Table 3 shows the mode and the symbols to be coded for each
   coefficient.
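
   The switching between level-mode and run-mode described above can
   be sketched as follows.  This is an illustrative sketch, not the
   reference implementation: the symbol tuples and the function name
   are hypothetical, the VLC tables themselves are not modeled, and the
   zero-run is assumed to count the zeros skipped in run-mode since the
   previous coded coefficient (which gives run=1 for the coefficient at
   index 6 in the example).

```python
def symbolize(coeffs):
    """Turn zig-zag-ordered coefficients into a list of symbols to code.

    Symbols are hypothetical tuples: ("level", n), ("sign", bit),
    ("run", zeros, "level=1" or "level>1"), ("code", n) and ("eob",).
    """
    symbols = []
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    last_nz = nonzero[-1] if nonzero else -1
    level_mode, run = True, 0
    for i, c in enumerate(coeffs):
        level, sign = abs(c), int(c < 0)
        if level_mode:
            # Level-mode: every coefficient is coded, zeros included.
            symbols.append(("level", level))
            if level > 0:
                symbols.append(("sign", sign))
            else:
                level_mode, run = False, 0   # a zero switches to run-mode
        elif i > last_nz:
            symbols.append(("eob",))         # no more non-zero coefficients
            break
        elif level == 0:
            run += 1                         # extend the current zero-run
        else:
            symbols.append(("run", run, "level>1" if level > 1 else "level=1"))
            if level > 1:
                symbols.append(("code", 2 * (level - 2) + sign))
                level_mode = True            # large level: back to level-mode
            else:
                symbols.append(("sign", sign))
            run = 0
    return symbols


if __name__ == "__main__":
    example = [2, 1, 4, 1, 0, 0, 1, 0, 0, 3, 2, 0, 0, 1, 0, 0]
    for symbol in symbolize(example):
        print(symbol)
```

   The example input uses the coefficient magnitudes of Figure 10 with
   all signs assumed positive, since the figure does not show signs.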

   +--------+-------------+-------------+------------------------------+
   | Index  | abs(coeff)  | Mode        | Encoded symbols              |
   +--------+-------------+-------------+------------------------------+
   | 0      | 2           | level-mode  | level=2,sign                 |
   | 1      | 1           | level-mode  | level=1,sign                 |
   | 2      | 4           | level-mode  | level=4,sign                 |
   | 3      | 1           | level-mode  | level=1,sign                 |
   | 4      | 0           | level-mode  | level=0                      |
   | 5      | 0           | run-mode    |                              |
   | 6      | 1           | run-mode    | (run=1,level=1)              |
   | 7      | 0           | run-mode    |                              |
   | 8      | 0           | run-mode    |                              |
   | 9      | 3           | run-mode    | (run=2,level>1),             |
   |        |             |             | 2*(3-2)+sign                 |
   | 10     | 2           | level-mode  | level=2,sign                 |
   | 11     | 0           | level-mode  | level=0                      |
   | 12     | 0           | run-mode    |                              |
   | 13     | 1           | run-mode    | (run=1,level=1)              |
   | 14     | 0           | run-mode    | EOB                          |
   | 15     | 0           | run-mode    |                              |
   +--------+-------------+-------------+------------------------------+

     Table 3: Transform coefficient encoding for the example above

10.  High Level Syntax

   High level syntax is currently very simple and rudimentary as the
   primary focus so far has been on compression performance.  It is
   expected to evolve as functionality is added.

10.1.  Sequence Header

   o  Width - 16 bits

   o  Height - 16 bits

   o  Enable/disable PB-split - 1 bit

   o  Enable/disable TB-split - 1 bit

   o  Number of active reference frames (may go into frame header) - 2
      bits (max 4)

   o  Enable/disable deblocking - 1 bit

   o  Enable/disable constrained low-pass filter (CLPF1) - 1 bit

10.2.  Frame Header

   o  Frame type - 1 bit

   o  QP - 8 bits

   o  Identification of active reference frames - num_ref*4 bits

   o  Number of intra modes - 4 bits

   o  Constrained low-pass filter (CLPF1) enable/disable - 1 bit

11.  IANA Considerations

   This document has no IANA considerations yet.  TBD

12.  Security Considerations

   This document has no security considerations yet.  TBD

13.  Acknowledgements

   The authors would like to thank Thomas Davies, Steinar Midtskogen and
   Mo Zanaty for reviewing this document and design, and providing
   constructive feedback.

14.  Normative References

   [I-D.davies-netvc-irfvc]
              Davies, T., "Interpolated reference frames for video
              coding", draft-davies-netvc-irfvc-00 (work in progress),
              October 2015.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

Authors' Addresses

   Arild Fuldseth
   Cisco
   Lysaker
   Norway

   Email: arilfuld@cisco.com


   Gisle Bjontegaard
   Cisco
   Lysaker
   Norway

   Email: gbjonteg@cisco.com


   Steinar Midtskogen
   Cisco
   Lysaker
   Norway

   Email: stemidts@cisco.com


   Thomas Davies
   Cisco
   London
   UK

   Email: thdavies@cisco.com


   Mo Zanaty
   Cisco
   RTP,NC
   USA

   Email: mzanaty@cisco.com