idnits 2.17.1 

draft-fuldseth-netvc-thor-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 3 instances of too long lines in the document, the longest one
     being 30 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (October 31, 2016) is 2734 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'N' is mentioned on line 972, but not defined

  == Outdated reference: A later version (-04) exists of
     draft-midtskogen-netvc-clpf-02


     Summary: 1 error (**), 0 flaws (~~), 4 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                        A. Fuldseth
3	Internet-Draft                                            G. Bjontegaard
4	Intended status: Standards Track                           S. Midtskogen
5	Expires: May 4, 2017                                           T. Davies
6	                                                               M. Zanaty
7	                                                                   Cisco
8	                                                        October 31, 2016

10	                            Thor Video Codec
11	                      draft-fuldseth-netvc-thor-03

13	Abstract

15	   This document provides a high-level description of the Thor video
16	   codec.  Thor is designed to achieve high compression efficiency with
17	   moderate complexity, using the well-known hybrid video coding
18	   approach of motion-compensated prediction and transform coding.

20	Status of This Memo

22	   This Internet-Draft is submitted in full conformance with the
23	   provisions of BCP 78 and BCP 79.

25	   Internet-Drafts are working documents of the Internet Engineering
26	   Task Force (IETF).  Note that other groups may also distribute
27	   working documents as Internet-Drafts.  The list of current Internet-
28	   Drafts is at http://datatracker.ietf.org/drafts/current/.

30	   Internet-Drafts are draft documents valid for a maximum of six months
31	   and may be updated, replaced, or obsoleted by other documents at any
32	   time.  It is inappropriate to use Internet-Drafts as reference
33	   material or to cite them other than as "work in progress."

35	   This Internet-Draft will expire on May 4, 2017.

37	Copyright Notice

39	   Copyright (c) 2016 IETF Trust and the persons identified as the
40	   document authors.  All rights reserved.

42	   This document is subject to BCP 78 and the IETF Trust's Legal
43	   Provisions Relating to IETF Documents
44	   (http://trustee.ietf.org/license-info) in effect on the date of
45	   publication of this document.  Please review these documents
46	   carefully, as they describe your rights and restrictions with respect
47	   to this document.  Code Components extracted from this document must
48	   include Simplified BSD License text as described in Section 4.e of
49	   the Trust Legal Provisions and are provided without warranty as
50	   described in the Simplified BSD License.

52	Table of Contents

54	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
55	   2.  Definitions . . . . . . . . . . . . . . . . . . . . . . . . .   5
56	     2.1.  Requirements Language . . . . . . . . . . . . . . . . . .   5
57	     2.2.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   6
58	   3.  Block Structure . . . . . . . . . . . . . . . . . . . . . . .   6
59	     3.1.  Super Blocks and Coding Blocks  . . . . . . . . . . . . .   6
60	     3.2.  Special Processing at Frame Boundaries  . . . . . . . . .   7
61	     3.3.  Transform Blocks  . . . . . . . . . . . . . . . . . . . .   8
62	     3.4.  Prediction Blocks . . . . . . . . . . . . . . . . . . . .   8
63	   4.  Intra Prediction  . . . . . . . . . . . . . . . . . . . . . .   8
64	   5.  Inter Prediction  . . . . . . . . . . . . . . . . . . . . . .   9
65	     5.1.  Multiple Reference Frames . . . . . . . . . . . . . . . .   9
66	     5.2.  Bi-Prediction . . . . . . . . . . . . . . . . . . . . . .  10
67	     5.3.  Improved chroma prediction  . . . . . . . . . . . . . . .  10
68	     5.4.  Reordered Frames  . . . . . . . . . . . . . . . . . . . .  10
69	     5.5.  Interpolated Reference Frames . . . . . . . . . . . . . .  10
70	     5.6.  Sub-Pixel Interpolation . . . . . . . . . . . . . . . . .  10
71	       5.6.1.  Luma Poly-phase Filter  . . . . . . . . . . . . . . .  10
72	       5.6.2.  Luma Special Filter Position  . . . . . . . . . . . .  12
73	       5.6.3.  Chroma Poly-phase Filter  . . . . . . . . . . . . . .  13
74	     5.7.  Motion Vector Coding  . . . . . . . . . . . . . . . . . .  13
75	       5.7.1.  Inter0 and Inter1 Modes . . . . . . . . . . . . . . .  13
76	       5.7.2.  Inter2 and Bi-Prediction Modes  . . . . . . . . . . .  15
77	       5.7.3.  Motion Vector Direction . . . . . . . . . . . . . . .  16
78	   6.  Transforms  . . . . . . . . . . . . . . . . . . . . . . . . .  16
79	   7.  Quantization  . . . . . . . . . . . . . . . . . . . . . . . .  16
80	     7.1.  Quantization matrices . . . . . . . . . . . . . . . . . .  17
81	       7.1.1.  Quantization matrix selection . . . . . . . . . . . .  17
82	       7.1.2.  Quantization matrix design  . . . . . . . . . . . . .  18
83	   8.  Loop Filtering  . . . . . . . . . . . . . . . . . . . . . . .  18
84	     8.1.  Deblocking  . . . . . . . . . . . . . . . . . . . . . . .  18
85	       8.1.1.  Luma deblocking . . . . . . . . . . . . . . . . . . .  18
86	       8.1.2.  Chroma Deblocking . . . . . . . . . . . . . . . . . .  19
87	     8.2.  Constrained Low Pass Filter (CLPF)  . . . . . . . . . . .  20
88	   9.  Entropy coding  . . . . . . . . . . . . . . . . . . . . . . .  20
89	     9.1.  Overview  . . . . . . . . . . . . . . . . . . . . . . . .  20
90	     9.2.  Low Level Syntax  . . . . . . . . . . . . . . . . . . . .  21
91	       9.2.1.  CB Level  . . . . . . . . . . . . . . . . . . . . . .  21
92	       9.2.2.  PB Level  . . . . . . . . . . . . . . . . . . . . . .  21
93	       9.2.3.  TB Level  . . . . . . . . . . . . . . . . . . . . . .  22
94	       9.2.4.  Super Mode  . . . . . . . . . . . . . . . . . . . . .  22
95	       9.2.5.  CBP . . . . . . . . . . . . . . . . . . . . . . . . .  23
96	       9.2.6.  Transform Coefficients  . . . . . . . . . . . . . . .  23

98	   10. High Level Syntax . . . . . . . . . . . . . . . . . . . . . .  25
99	     10.1.  Sequence Header  . . . . . . . . . . . . . . . . . . . .  25
100	     10.2.  Frame Header . . . . . . . . . . . . . . . . . . . . . .  26
101	   11. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  27
102	   12. Security Considerations . . . . . . . . . . . . . . . . . . .  27
103	   13. Normative References  . . . . . . . . . . . . . . . . . . . .  27
104	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  27

106	1.  Introduction

108	   This document provides a high-level description of the Thor video
109	   codec.  Thor is designed to achieve high compression efficiency with
110	   moderate complexity, using the well-known hybrid video coding
111	   approach of motion-compensated prediction and transform coding.

113	   The Thor video codec is a block-based hybrid video codec similar in
114	   structure to widespread standards.  The high level encoder and
115	   decoder structures are illustrated in Figure 1 and Figure 2
116	   respectively.

118	                  +---+   +-----------+   +-----------+   +--------+
119	       Input--+-->| + |-->| Transform |-->| Quantizer |-->| Entropy|
120	       Video  |   +---+   +-----------+   +-----------+   | Coding |
121	              |     ^ -                         |         +--------+
122	              |     |                           v              |
123	              |     |                     +-----------+        v
124	              |     |                     |  Inverse  |     Output
125	              |     |                     | Transform |    Bitstream
126	              |     |                     +-----------+
127	              |     |                           |
128	              |     |                           v
129	              |     |                         +---+
130	              |     +------------------------>| + |
131	              |     |      +-------------+    +---+
132	              |     |   ___| Intra Frame |      |
133	              |     |  /   | Prediction  |<-----+
134	              |     | /    +-------------+      |
135	              |     |/                          v
136	              |      \     +-------------+  +---------+
137	              |       \    | Inter Frame |  |  Loop   |
138	              |        \___| Prediction  |  | Filters |
139	              |            +-------------+  +---------+
140	              |                   ^             |
141	              |                   |             v
142	              |            +------------+   +---------------+
143	              |            |   Motion   |   | Reconstructed |
144	              +----------->| Estimation |<--| Frame Memory  |
145	                           +------------+   +---------------+

147	                        Figure 1: Encoder Structure

149	                       +----------+      +-----------+
150	        Input  ------->| Entropy  |----->|  Inverse  |
151	      Bitstream        | Decoding |      | Transform |
152	                       +----------+      +-----------+
153	                                               |
154	                                               v
155	                                             +---+
156	                   +------------------------>| + |
157	                   |      +-------------+    +---+
158	                   |   ___| Intra Frame |      |
159	                   |  /   | Prediction  |<-----+
160	                   | /    +-------------+      |
161	                   |/                          v
162	                    \     +-------------+  +---------+
163	                     \    | Inter Frame |  |  Loop   |
164	                      \___| Prediction  |  | Filters |
165	                          +-------------+  +---------+
166	                                 ^             |-------------> Output
167	                                 |             v               Video
168	                        +--------------+   +---------------+
169	                        |     Motion   |   | Reconstructed |
170	                        | Compensation |<--| Frame Memory  |
171	                        +--------------+   +---------------+

173	                        Figure 2: Decoder Structure

175	   The remainder of this document is organized as follows.  First, some
176	   requirements language and terms are defined.  Block structures are
177	   described in detail, followed by intra-frame prediction techniques,
178	   inter-frame prediction techniques, transforms, quantization, loop
179	   filters, entropy coding, and finally high level syntax.

181	   An open source reference implementation is available at
182	   github.com/cisco/thor.

184	2.  Definitions

186	2.1.  Requirements Language

188	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
189	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
190	   document are to be interpreted as described in RFC 2119 [RFC2119].

192	2.2.  Terminology

194	   This document frequently uses the following terms.

196	      SB: Super Block - 64x64 or 128x128 block (luma pixels) which can
197	      be divided into CBs.

199	      CB: Coding Block - Subdivision of a SB, down to 8x8 (luma pixels).

201	      PB: Prediction Block - Subdivision of a CB, into 1, 2 or 4 equal
202	      blocks.

204	      TB: Transform Block - Subdivision of a CB, into 1 or 4 equal
205	      blocks.

207	3.  Block Structure

209	3.1.  Super Blocks and Coding Blocks

211	   Input frames with bitdepths of 8, 10 or 12 are supported.  The
212	   internal bitdepth can be 8, 10 or 12 regardless of input bitdepth.
213	   The bitdepth of the output frames always follows the input frames.
214	   Chroma can be subsampled in both directions (4:2:0) or have full
215	   resolution (4:4:4).

217	   Each frame is divided into 64x64 or 128x128 Super Blocks (SB) which
218	   are processed in raster-scan order.  The SB size is signaled in the
219	   sequence header.  Each SB can be divided into Coding Blocks (CB)
220	   using a quad-tree structure.  The smallest allowed CB size is 8x8
221	   luma pixels.  The four CBs of a larger block are coded/signaled in
222	   the following order: upleft, downleft, upright, and downright.

224	   The following modes are signaled at the CB level:

226	   o  Intra

228	   o  Inter0 (skip): MV index, no residual information

230	   o  Inter1 (merge): MV index, residual information

232	   o  Inter2 (uni-pred): explicit motion information, residual
233	      information

235	   o  Inter3 (bi-pred): explicit motion information, residual
236	      information

238	3.2.  Special Processing at Frame Boundaries

240	   At frame boundaries some square blocks might not be complete.  For
241	   example, for 1920x1080 resolutions, the bottom row would consist of
242	   rectangular blocks of size 64x56.  Rectangular blocks at frame
243	   boundaries are handled as follows.  For each rectangular block, send
244	   one bit to choose between:

246	   o  A rectangular inter0 block and

248	   o  Further split.

250	   For the bottom part of a 1920x1080 frame, this implies the following:

252	   o  For each 64x56 block, transmit one bit to signal a 64x56 inter0
253	      block or a split into two 32x32 blocks and two 32x24 blocks.

255	   o  For each 32x24 block, transmit one bit to signal a 32x24 inter0
256	      block or a split into two 16x16 blocks and two 16x8 blocks.

258	   o  For each 16x8 block, transmit one bit to signal a 16x8 inter0
259	      block or a split into two 8x8 blocks.

261	   Two examples of handling 64x56 blocks at the bottom row of a
262	   1920x1080 frame are shown in Figure 3 and Figure 4 respectively.

264	                                          64
265	                           +-------------------------------+
266	                           |                               |
267	                           |                               |
268	                           |                               |
269	                           |                               |
270	                           |                               |
271	                           |                               |
272	                           |                               |
273	                       64  | 56           64x56            |
274	                           |              SKIP             |
275	                           |                               |
276	                           |                               |
277	                           |                               |
278	                           |                               |
279	         - - - - - - - - - + - - - - - - - - - - - - - - - + - - -
280	         Frame boundary    | 8                             |
281	                           +-------------------------------+

283	                  Figure 3: Super block at frame boundary

284	                                          64
285	                           +---------------+---------------+
286	                           |               |               |
287	                           |               |               |
288	                           |               |               |
289	                           |               |               |
290	                           |               |               |
291	                           |               |               |
292	                           |               |               |
293	                       64  +---------------+-------+-------+
294	                           |               |       |       |
295	                           |               |       |       |
296	                           |     32x24     |       |       |
297	                           |     SKIP      +---+---+-------+
298	                           |               |   |   | 16x8  |
299	         - - - - - - - - - + - - - - - - - +---+---+ - - - + - - -
300	         Frame boundary    | 8             |   |   | SKIP  |
301	                           +---------------+---+---+-------+

303	                 Figure 4: Coding block at frame boundary
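
   The splitting rule of Section 3.2 can be sketched in Python as
   follows.  This is an illustrative sketch only, not the reference
   implementation: the function name split_boundary_block and the
   (width, height) tuples are assumptions made for this example.  Child
   blocks are listed in the upleft, downleft, upright, downright order
   given in Section 3.1, and children that lie entirely outside the
   frame are dropped.

```python
def split_boundary_block(nominal, width, height):
    """Split a block of nominal size `nominal` x `nominal` whose visible
    part is `width` x `height` into the visible parts of its quadrants.

    Hypothetical helper: returns (width, height) tuples in the order
    upleft, downleft, upright, downright.
    """
    half = nominal // 2
    children = []
    for x0 in (0, half):          # left column first, then right column
        for y0 in (0, half):      # upper quadrant first, then lower
            w = min(half, width - x0)
            h = min(half, height - y0)
            if w > 0 and h > 0:   # skip quadrants outside the frame
                children.append((w, h))
    return children


# Bottom row of a 1920x1080 frame with 64x64 super blocks:
print(split_boundary_block(64, 64, 56))  # two 32x32 and two 32x24
print(split_boundary_block(32, 32, 24))  # two 16x16 and two 16x8
print(split_boundary_block(16, 16, 8))   # two 8x8
```

   In a codec, each rectangular child would again be subject to the
   one-bit choice between an inter0 block and a further split.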

305	3.3.  Transform Blocks

307	   A coding block (CB) can be divided into four smaller transform blocks
308	   (TBs).

310	3.4.  Prediction Blocks

312	   A coding block (CB) can also be divided into smaller prediction
313	   blocks (PBs) for the purpose of motion-compensated prediction.
314	   Horizontal, vertical and quad split are used.

316	4.  Intra Prediction

318	   8 intra prediction modes are used:

320	   1.  DC

322	   2.  Vertical (V)

324	   3.  Horizontal (H)

326	   4.  Upupright (north-northeast)

328	   5.  Upupleft (north-northwest)

330	   6.  Upleft (northwest)

331	   7.  Upleftleft (west-northwest)

333	   8.  Downleftleft (west-southwest)

335	   The definitions of DC, vertical, and horizontal modes are
336	   straightforward.

338	   The upleft direction is exactly 45 degrees.

340	   The upupright, upupleft, and upleftleft directions are equal to
341	   arctan(1/2) from the horizontal or vertical direction, since they are
342	   defined by going one pixel horizontally and two pixels vertically (or
343	   vice versa).

345	   For the 5 angular intra modes (i.e. those other than DC, V and H),
346	   the pixels of the neighbor blocks are filtered before they are used
347	   for prediction:

349	   y(n) = (x(n-1) + 2*x(n) + x(n+1) + 2)/4

351	   For the angular intra modes that are not 45 degrees, the prediction
352	   sometimes requires sample values at a half-pixel position.  These
353	   sample values are determined by an additional filter:

355	   z(n + 1/2) = (y(n) + y(n+1))/2
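
   The two filters above can be sketched in Python as follows.  This
   is a minimal sketch under stated assumptions: integer arithmetic
   with truncating division is assumed, the neighbor row is extended by
   repeating its end samples (the text does not specify edge handling),
   and the function names are hypothetical.

```python
def smooth_neighbors(x):
    """Apply y(n) = (x(n-1) + 2*x(n) + x(n+1) + 2)/4 to a list of
    integer neighbor samples.  Edge samples are replicated, which is an
    assumption of this sketch."""
    last = len(x) - 1
    y = []
    for n in range(len(x)):
        left = x[max(n - 1, 0)]
        right = x[min(n + 1, last)]
        y.append((left + 2 * x[n] + right + 2) // 4)
    return y


def half_pel(y, n):
    """Apply z(n + 1/2) = (y(n) + y(n+1))/2 to the filtered samples."""
    return (y[n] + y[n + 1]) // 2


filtered = smooth_neighbors([0, 4, 8])
print(filtered, half_pel(filtered, 0))
```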

357	5.  Inter Prediction

359	5.1.  Multiple Reference Frames

361	   Multiple reference frames are currently implemented as follows.

363	   o  Use a sliding-window process to keep the N most recent
364	      reconstructed frames in memory.  The value of N is signaled in the
365	      sequence header.

367	   o  In the frame header, signal which of these frames shall be active
368	      for the current frame.

370	   o  For each CB, signal which of the active frames is used for MC.

372	   Combined with re-ordering, this allows for MPEG-1 style B frames.

374	   A desirable future extension is to allow long-term reference frames
375	   in addition to the short-term reference frames defined by the
376	   sliding-window process.
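
   The sliding-window process can be sketched in Python as follows.
   This is an illustration only: the class name ReferenceWindow, its
   methods, and the convention that index 0 denotes the most recently
   reconstructed frame are assumptions of this sketch, not part of the
   bitstream definition.

```python
from collections import deque


class ReferenceWindow:
    """Keeps the N most recent reconstructed frames (hypothetical API)."""

    def __init__(self, n):
        # A bounded deque discards the oldest frame automatically.
        self.frames = deque(maxlen=n)

    def push(self, frame):
        """Store a newly reconstructed frame as the most recent one."""
        self.frames.appendleft(frame)

    def active(self, indices):
        """Return the frames selected as active for the current frame."""
        return [self.frames[i] for i in indices]


window = ReferenceWindow(3)
for frame_number in range(5):
    window.push(frame_number)
print(list(window.frames))    # the three most recent frames
print(window.active([0, 2]))  # a subset chosen in the frame header
```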

378	5.2.  Bi-Prediction

380	   In case of bi-prediction, two reference indices and two motion
381	   vectors are signaled per CB.  In the current version, PB-split is not
382	   allowed in bi-prediction mode.  Sub-pixel interpolation is performed
383	   for each motion vector/reference index separately before doing an
384	   average between the two predicted blocks:

386	   p(x,y) = (p0(x,y) + p1(x,y))/2
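
   The averaging step can be sketched in Python as follows.  The
   function name bi_predict is hypothetical, and truncating integer
   division is an assumption of this sketch, since the formula above
   does not state a rounding rule.

```python
def bi_predict(p0, p1):
    """Average two predicted blocks sample by sample:
    p(x,y) = (p0(x,y) + p1(x,y))/2, using truncating division."""
    return [
        [(a + b) // 2 for a, b in zip(row0, row1)]
        for row0, row1 in zip(p0, p1)
    ]


print(bi_predict([[10, 20]], [[30, 41]]))
```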

388	5.3.  Improved chroma prediction

390	   If specified in the sequence header, the chroma prediction, both
391	   intra and inter, or either, is improved by using the luma
392	   reconstruction if certain criteria are met.  The process is described
393	   in a separate draft [I-D.midtskogen-netvc-chromapred].

395	5.4.  Reordered Frames

397	   Frames may be transmitted out of order.  Reference frames are
398	   selected from the sliding window buffer as normal.

400	5.5.  Interpolated Reference Frames

402	   A flag is sent in the sequence header indicating that interpolated
403	   reference frames may be used.

405	   If a frame is using an interpolated reference frame, it will be the
406	   first reference in the reference list, and will be interpolated from
407	   the second and third reference in the list.  It is indicated by a
408	   reference index of -1 and has a frame number equal to that of the
409	   current frame.

411	   The interpolated reference is created by a deterministic process
412	   common to the encoder and decoder, and described in the separate
413	   IRFVC draft [I-D.davies-netvc-irfvc].

415	5.6.  Sub-Pixel Interpolation

417	5.6.1.  Luma Poly-phase Filter

419	   Inter prediction uses traditional block-based motion compensated
420	   prediction with quarter pixel resolution.  A separable 6-tap
421	   poly-phase filter is the basic method for doing MC with sub-pixel
422	   accuracy.  The luma filter coefficients are as follows:

424	   When bi-prediction is enabled in the sequence header:

426	   1/4 phase: [2,-10,59,17,-5,1]/64

428	   2/4 phase: [1,-8,39,39,-8,1]/64

430	   3/4 phase: [1,-5,17,59,-10,2]/64

432	   When bi-prediction is disabled in the sequence header:

434	   1/4 phase: [1,-7,55,19,-5,1]/64

436	   2/4 phase: [1,-7,38,38,-7,1]/64

438	   3/4 phase: [1,-5,19,55,-7,1]/64

440	   With reference to Figure 5, a fractional sample value, e.g. i0,0
441	   which has a phase of 1/4 in the horizontal dimension and a phase of
442	   1/2 in the vertical dimension is calculated as follows:

444	   a0,j = 2*A-2,j - 10*A-1,j + 59*A0,j + 17*A1,j - 5*A2,j + 1*A3,j

446	   where j = -2,...,3

448	   i0,0 = (1*a0,-2 - 8*a0,-1 + 39*a0,0 + 39*a0,1 - 8*a0,2 + 1*a0,3 +
449	   2048)/4096

451	   The minimum sub-block size is 8x8.

453	          +-----+-----+-----+-----+-----+-----+-----+-----+-----+
454	          |A    |     |     |     |A    |a    |b    |c    |A    |
455	          |-1,-1|     |     |     | 0,-1| 0,-1| 0,-1| 0,-1| 1,-1|
456	          +-----+-----+-----+-----+-----+-----+-----+-----+-----+
457	          |     |     |     |     |     |     |     |     |     |
458	          |     |     |     |     |     |     |     |     |     |
459	          +-----+-----+-----+-----+-----+-----+-----+-----+-----+
460	          |     |     |     |     |     |     |     |     |     |
461	          |     |     |     |     |     |     |     |     |     |
462	          +-----+-----+-----+-----+-----+-----+-----+-----+-----+
463	          |     |     |     |     |     |     |     |     |     |
464	          |     |     |     |     |     |     |     |     |     |
465	          +-----+-----+-----+-----+-----+-----+-----+-----+-----+
466	          |A    |     |     |     |A    |a    |b    |c    |A    |
467	          |-1,0 |     |     |     | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 |
468	          +-----+-----+-----+-----+-----+-----+-----+-----+-----+
469	          |d    |     |     |     |d    |e    |f    |g    |d    |
470	          |-1,0 |     |     |     | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 |
471	          +-----+-----+-----+-----+-----+-----+-----+-----+-----+
472	          |h    |     |     |     |h    |i    |j    |k    |h    |
473	          |-1,0 |     |     |     | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 |
474	          +-----+-----+-----+-----+-----+-----+-----+-----+-----+
475	          |l    |     |     |     |l    |m    |n    |o    |l    |
476	          |-1,0 |     |     |     | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 |
477	          +-----+-----+-----+-----+-----+-----+-----+-----+-----+
478	          |A    |     |     |     |A    |a    |b    |c    |A    |
479	          |-1,1 |     |     |     | 0,1 | 0,1 | 0,1 | 0,1 | 1,1 |
480	          +-----+-----+-----+-----+-----+-----+-----+-----+-----+

482	                       Figure 5: Sub-pixel positions
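
   The calculation of i0,0 described in Section 5.6.1 can be sketched
   in Python as follows, using the filters for the case where
   bi-prediction is enabled.  This is a sketch under stated
   assumptions: samples are passed as a dictionary keyed by (x, y)
   with x and y in -2..3, the division by 4096 is done with an
   arithmetic right shift, and clipping to the valid sample range is
   omitted.  The function name interpolate_i00 is hypothetical.

```python
H_QUARTER = (2, -10, 59, 17, -5, 1)  # 1/4 phase, horizontal
V_HALF = (1, -8, 39, 39, -8, 1)      # 2/4 phase, vertical
OFFSETS = range(-2, 4)               # sample offsets -2..3


def interpolate_i00(A):
    """Compute i0,0 from integer samples A[(x, y)], x, y in -2..3."""
    # Horizontal pass: a0,j for each of the six rows j, kept at full
    # precision (scaled by 64).
    a = {}
    for j in OFFSETS:
        a[j] = sum(c * A[(x, j)] for c, x in zip(H_QUARTER, OFFSETS))
    # Vertical pass over the six intermediate values, then normalize
    # by 64*64 = 4096 with rounding offset 2048.
    acc = sum(c * a[j] for c, j in zip(V_HALF, OFFSETS))
    return (acc + 2048) >> 12


flat = {(x, y): 100 for x in OFFSETS for y in OFFSETS}
ramp = {(x, y): 100 + 4 * x for x in OFFSETS for y in OFFSETS}
print(interpolate_i00(flat), interpolate_i00(ramp))
```

   For the horizontal ramp, the result equals the ramp value at a
   horizontal offset of one quarter sample, as expected for a
   quarter-phase filter.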

484	5.6.2.  Luma Special Filter Position

486	   For the fractional pixel position having exactly 2 quarter pixel
487	   offsets in each dimension, a non-separable filter is used to
488	   calculate the interpolated value.  With reference to Figure 5, the
489	   center position j0,0 is calculated as follows:

491	   j0,0 =

493	   [0*A-1,-1 + 1*A0,-1 + 1*A1,-1 + 0*A2,-1 +

495	   1*A-1,0 + 2*A0,0 + 2*A1,0 + 1*A2,0 +

497	   1*A-1,1 + 2*A0,1 + 2*A1,1 + 1*A2,1 +
498	   0*A-1,2 + 1*A0,2 + 1*A1,2 + 0*A2,2 + 8]/16
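
   The non-separable filter for j0,0 can be sketched in Python as
   follows.  As an assumption of this sketch, samples are passed as a
   dictionary keyed by (x, y) with x and y in -1..2, the division by 16
   is done with a right shift, and the function name interpolate_j00 is
   hypothetical.

```python
# Weights from the formula above; rows are y = -1..2, columns x = -1..2.
KERNEL = (
    (0, 1, 1, 0),
    (1, 2, 2, 1),
    (1, 2, 2, 1),
    (0, 1, 1, 0),
)


def interpolate_j00(A):
    """Compute j0,0 from integer samples A[(x, y)], x, y in -1..2."""
    acc = 0
    for row, y in zip(KERNEL, range(-1, 3)):
        for weight, x in zip(row, range(-1, 3)):
            acc += weight * A[(x, y)]
    return (acc + 8) >> 4  # weights sum to 16; 8 is the rounding offset


flat = {(x, y): 100 for x in range(-1, 3) for y in range(-1, 3)}
print(interpolate_j00(flat))
```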

500	5.6.3.  Chroma Poly-phase Filter

502	   Chroma interpolation is performed with 1/8 pixel resolution using the
503	   following poly-phase filter.

505	   1/8 phase: [-2, 58, 10, -2]/64

507	   2/8 phase: [-4, 54, 16, -2]/64

509	   3/8 phase: [-4, 44, 28, -4]/64

511	   4/8 phase: [-4, 36, 36, -4]/64

513	   5/8 phase: [-4, 28, 44, -4]/64

515	   6/8 phase: [-2, 16, 54, -4]/64

517	   7/8 phase: [-2, 10, 58, -2]/64
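
   The chroma filter table can be sketched in Python as follows.  The
   coefficients are those listed above; the rounding offset of 32, the
   use of a right shift for the division by 64, the one-dimensional
   sample list, and the function name chroma_sample are assumptions of
   this sketch.

```python
CHROMA_TAPS = {
    1: (-2, 58, 10, -2),
    2: (-4, 54, 16, -2),
    3: (-4, 44, 28, -4),
    4: (-4, 36, 36, -4),
    5: (-4, 28, 44, -4),
    6: (-2, 16, 54, -4),
    7: (-2, 10, 58, -2),
}


def chroma_sample(samples, pos, phase):
    """Interpolate between samples[pos] and samples[pos + 1] at the
    given phase (0..7, in units of 1/8 sample)."""
    if phase == 0:
        return samples[pos]  # integer position, no filtering
    taps = CHROMA_TAPS[phase]
    acc = sum(c * samples[pos - 1 + k] for k, c in enumerate(taps))
    return (acc + 32) >> 6


print(chroma_sample([0, 8, 16, 24], 1, 4))  # midway between 8 and 16
```

   Each filter sums to 64, and the filter for phase k is the mirror
   image of the filter for phase 8-k.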

519	5.7.  Motion Vector Coding

521	5.7.1.  Inter0 and Inter1 Modes

523	   Inter0 and inter1 modes imply signaling of a motion vector index to
524	   choose a motion vector from a list of candidate motion vectors with
525	   associated reference frame index.  A list of motion vector candidates
526	   is derived from at most two different neighbor blocks, each having a
527	   unique motion vector/reference frame index.  Signaling of the motion
528	   vector index uses 0 or 1 bit, dependent on the number of unique
529	   motion vector candidates.  If the chosen neighbor block is coded in
530	   bi-prediction mode, the inter0 or inter1 block inherits both motion
531	   vectors, both reference indices and the bi-prediction property of the
532	   neighbor block.

534	   For block sizes less than 64x64, inter0 has only one motion vector
535	   candidate, and its value is always zero.

537	   Which neighbor blocks to use for motion vector candidates depends on
538	   the availability of the neighbor blocks (i.e. whether the neighbor
539	   blocks have already been coded, belong to the same slice and are not
540	   outside the frame boundaries).  Four different availabilities, U, UR,
541	   L, and LL, are defined as illustrated in Figure 6.  If the neighbor
542	   block is intra it is considered to be available but with a zero
543	   motion vector.

545	                               |           |
546	                               |     U     |    UR
547	                    -----------+-----------+-----------
548	                               |           |
549	                               |  current  |
550	                         L     |   block   |
551	                               |           |
552	                               |           |
553	                    -----------+-----------+
554	                               |
555	                               |
556	                         LL    |
557	                               |

559	                 Figure 6: Availability of neighbor blocks

561	   Based on the four availabilities defined above, each of the motion
562	   vector candidates is derived from one of the possible neighbor blocks
563	   defined in Figure 7.

565	                  +----+----+      +----+    +----+----+
566	                  | UL | U0 |      | U1 |    | U2 | UR |
567	                  +----+----+------+----+----+----+----+
568	                  | L0 |                          |
569	                  +----+                          |
570	                       |                          |
571	                       |                          |
572	                  +----+        current           |
573	                  | L1 |         block            |
574	                  +----+                          |
575	                       |                          |
576	                  +----+                          |
577	                  | L2 |                          |
578	                  +----+--------------------------+
579	                  | LL |
580	                  +----+

582	                    Figure 7: Motion vector candidates

584	   The choice of motion vector candidates depends on the availability of
585	   neighbor blocks as shown in Table 1.

587	            +----+-----+----+-----+---------------------------+
588	            | U  | UR  | L  | LL  | Motion vector candidates  |
589	            +----+-----+----+-----+---------------------------+
590	            | 0  | 0   | 0  | 0   | zero vector               |
591	            | 1  | 0   | 0  | 0   | U2, zero vector           |
592	            | 0  | 1   | 0  | 0   | NA                        |
593	            | 1  | 1   | 0  | 0   | U2,zero vector            |
594	            | 0  | 0   | 1  | 0   | L2, zero vector           |
595	            | 1  | 0   | 1  | 0   | U2,L2                     |
596	            | 0  | 1   | 1  | 0   | NA                        |
597	            | 1  | 1   | 1  | 0   | U2,L2                     |
598	            | 0  | 0   | 0  | 1   | NA                        |
599	            | 1  | 0   | 0  | 1   | NA                        |
600	            | 0  | 1   | 0  | 1   | NA                        |
601	            | 1  | 1   | 0  | 1   | NA                        |
602	            | 0  | 0   | 1  | 1   | L2, zero vector           |
603	            | 1  | 0   | 1  | 1   | U2,L2                     |
604	            | 0  | 1   | 1  | 1   | NA                        |
605	            | 1  | 1   | 1  | 1   | U2,L2                     |
606	            +----+-----+----+-----+---------------------------+

608	      Table 1: Motion vector candidates for different availability of
609	                              neighbor blocks
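
   Table 1 can be expressed as a lookup keyed by the four
   availabilities, as in the following Python sketch.  The dictionary
   and function names are hypothetical, "zero" stands for the zero
   vector, and combinations marked NA in the table are reported as
   None.  The helper mv_index_bits illustrates the rule of Section
   5.7.1 that the index costs 0 or 1 bit depending on the number of
   unique candidates; representing a candidate as a (motion vector,
   reference index) tuple is an assumption of this sketch.

```python
# Keys are (U, UR, L, LL); values name the candidate blocks of Figure 7.
MV_CANDIDATES = {
    (0, 0, 0, 0): ("zero",),
    (1, 0, 0, 0): ("U2", "zero"),
    (1, 1, 0, 0): ("U2", "zero"),
    (0, 0, 1, 0): ("L2", "zero"),
    (1, 0, 1, 0): ("U2", "L2"),
    (1, 1, 1, 0): ("U2", "L2"),
    (0, 0, 1, 1): ("L2", "zero"),
    (1, 0, 1, 1): ("U2", "L2"),
    (1, 1, 1, 1): ("U2", "L2"),
}


def mv_candidates(u, ur, l, ll):
    """Return the candidate names from Table 1, or None for NA."""
    return MV_CANDIDATES.get((u, ur, l, ll))


def mv_index_bits(resolved):
    """Bits needed for the MV index, given resolved candidates as
    (motion vector, reference index) tuples."""
    return 1 if len(set(resolved)) > 1 else 0


print(mv_candidates(1, 0, 1, 0))
print(mv_index_bits([((0, 0), 0), ((0, 0), 0)]))  # identical candidates
```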

611	5.7.2.  Inter2 and Bi-Prediction Modes

613	   Motion vectors are coded using motion vector prediction.  The motion
614	   vector predictor is defined as the median of the motion vectors from
615	   three neighbor blocks.  Definition of the motion vector predictor
616	   uses the same definition of availability and neighbors as in Figure 6
617	   and Figure 7 respectively.  The three vectors used for median
618	   filtering depend on the availability of neighbor blocks as shown in
619	   Table 2.  If the neighbor block is coded in bi-prediction mode, only
620	   the first motion vector (in transmission order), MV0, is used as
621	   input to the median operator.

623	      +----+-----+----+-----+--------------------------------------+
624	      | U  | UR  | L  | LL  | Motion vectors for median filtering  |
625	      +----+-----+----+-----+--------------------------------------+
626	      | 0  | 0   | 0  | 0   | 3 x zero vector                      |
627	      | 1  | 0   | 0  | 0   | U0,U1,U2                             |
628	      | 0  | 1   | 0  | 0   | NA                                   |
629	      | 1  | 1   | 0  | 0   | U0,U2,UR                             |
630	      | 0  | 0   | 1  | 0   | L0,L1,L2                             |
631	      | 1  | 0   | 1  | 0   | UL,U2,L2                             |
632	      | 0  | 1   | 1  | 0   | NA                                   |
633	      | 1  | 1   | 1  | 0   | U0,UR,L2,L0                          |
634	      | 0  | 0   | 0  | 1   | NA                                   |
635	      | 1  | 0   | 0  | 1   | NA                                   |
636	      | 0  | 1   | 0  | 1   | NA                                   |
637	      | 1  | 1   | 0  | 1   | NA                                   |
638	      | 0  | 0   | 1  | 1   | L0,L2,LL                             |
639	      | 1  | 0   | 1  | 1   | U2,L0,LL                             |
640	      | 0  | 1   | 1  | 1   | NA                                   |
641	      | 1  | 1   | 1  | 1   | U0,UR,L0                             |
642	      +----+-----+----+-----+--------------------------------------+

644	      Table 2: Neighbor blocks used to define motion vector predictor
645	                         through median filtering
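
   As an illustration, the median operation can be sketched in Python
   as below.  The draft does not state whether the median is taken
   jointly or per component; this sketch assumes a component-wise
   median over exactly three vectors.  The names median3 and
   mv_predictor are hypothetical.

```python
def median3(a, b, c):
    """Return the median of three integers."""
    return max(min(a, b), min(max(a, b), c))


def mv_predictor(candidates):
    """Component-wise median of three (x, y) motion vectors.

    Assumption: the median is applied to x and y independently.
    """
    (x0, y0), (x1, y1), (x2, y2) = candidates
    return (median3(x0, x1, x2), median3(y0, y1, y2))


# Row "0 0 0 0" of Table 2: three zero vectors give a zero predictor.
zero_pred = mv_predictor([(0, 0), (0, 0), (0, 0)])

# Three arbitrary neighbor vectors (illustrative values only).
example_pred = mv_predictor([(4, -2), (1, 7), (3, 0)])
```

   For a neighbor coded in bi-prediction mode, only MV0 would be passed
   in as that neighbor's candidate.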

647	5.7.3.  Motion Vector Direction

649	   Motion vectors referring to reference frames later in time than the
650	   current frame are stored with their sign reversed, and these reversed
651	   values are used for coding and motion vector prediction.

653	6.  Transforms

655	   Transforms are applied at the TB or CB level, implying that transform
656	   sizes range from 4x4 to 128x128.  The transforms form an embedded
657	   structure, meaning that the transform matrix elements of the smaller
658	   transforms can be extracted from the larger transforms.

660	7.  Quantization

662	   For the 32x32, 64x64 and 128x128 transform sizes, only the 16x16 low
663	   frequency coefficients are quantized and transmitted.

665	   The 64x64 inverse transform is defined as a 32x32 transform followed
666	   by duplicating each output sample into a 2x2 block.  The 128x128
667	   inverse transform is defined as a 32x32 transform followed by
668	   duplicating each output sample into a 4x4 block.
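
   The sample duplication step can be sketched as follows.  This is a
   minimal Python illustration, assuming the 32x32 inverse transform
   output is available as a list of rows; the function name
   upsample_by_duplication is hypothetical.

```python
def upsample_by_duplication(block, factor):
    """Expand every sample of a 2D block into a factor x factor patch."""
    out = []
    for row in block:
        # Repeat each sample horizontally.
        wide = [sample for sample in row for _ in range(factor)]
        # Repeat the widened row vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


# Small stand-in for a 32x32 transform output.
small = [[1, 2],
         [3, 4]]
doubled = upsample_by_duplication(small, 2)
```

   With factor 2 a 32x32 block becomes 64x64, and with factor 4 it
   becomes 128x128.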

670	7.1.  Quantization matrices

672	   A flag is transmitted in the sequence header to indicate whether
673	   quantization matrices are used.  If this flag is true, a 6-bit value
674	   qmtx_offset is transmitted in the sequence header to indicate matrix
675	   strength.

677	   If used, then in dequantization a separate scaling factor is applied
678	   to each coefficient, so that the dequantized value of a coefficient
679	   ci at position i is:

681	            (ci * d(q) * IW(i,c,s,t,q) + 2^(k + 5)) >> (k + 6)

683	                           Figure 8: Equation 1

685	   where IW is the scale factor for coefficient position i with size s,
686	   frame type (intra/inter) t, component (Y, Cb or Cr) c and quantizer
687	   q; and k=k(s,q) is the dequantization shift.  IW has scale 64, that
688	   is, a weight value of 64 is equivalent to unweighted
689	   dequantization.
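
   Equation 1 can be sketched in Python as below.  The values of d(q)
   and k(s,q) are defined elsewhere, so this sketch takes them as plain
   arguments; the function name dequantize and its parameter names are
   hypothetical.  It also assumes non-negative operands, since the
   draft does not say how the right shift treats negative values.

```python
def dequantize(ci, d_q, iw, k):
    """Weighted dequantization of one coefficient (Equation 1).

    ci  : quantized coefficient at position i
    d_q : dequantization factor d(q)
    iw  : weight IW(i,c,s,t,q), where 64 means "no weighting"
    k   : dequantization shift k(s,q)
    """
    return (ci * d_q * iw + (1 << (k + 5))) >> (k + 6)


# With iw = 64 the extra factor and the extra 6-bit shift cancel out.
flat = dequantize(3, 10, 64, 2)

# Doubling the weight roughly doubles the reconstructed value.
weighted = dequantize(3, 10, 128, 2)
```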

691	7.1.1.  Quantization matrix selection

693	   The current luma qp value qpY and the offset value qmtx_offset
694	   determine a quantization matrix set by the formula:

696	         qmlevel = max(0,min(11,((qpY + qmtx_offset) * 12) / 44))

698	                           Figure 9: Equation 2

700	   This selects one of the 12 different sets of default quantization
701	   matrices, with increasing qmlevel indicating increasing flatness.
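
   Equation 2 translates directly into code.  The sketch below assumes
   integer division for the "/" operator; because the result is clamped
   to the range 0..11, flooring and truncating division give the same
   answer.  The function name qmlevel_for is hypothetical.

```python
def qmlevel_for(qp_y, qmtx_offset):
    """Select the quantization matrix set index (Equation 2)."""
    return max(0, min(11, ((qp_y + qmtx_offset) * 12) // 44))


low = qmlevel_for(0, 0)     # lowest set
mid = qmlevel_for(22, 0)    # 22 * 12 / 44 = 6
high = qmlevel_for(44, 0)   # 12 before clamping, so clamped to 11
```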

703	   For a given value of qmlevel, different weighting matrices are
704	   provided for all combinations of transform block size, type (intra/
705	   inter), and component (Y, Cb, Cr).  Matrices at low qmlevel are flat
706	   (constant value 64).  Matrices for inter frames have unity DC gain
707	   (i.e. value 64 at position 0), whereas those for intra frames are
708	   designed such that the inverse weighting matrix has unity energy gain
709	   (i.e. normalized sum-squared of the scaling factors is 1).

711	7.1.2.  Quantization matrix design

713	   Further details on the quantization matrix and implementation can be
714	   found in the separate QMTX draft [I-D.davies-netvc-qmtx].

716	8.  Loop Filtering

718	8.1.  Deblocking

720	8.1.1.  Luma deblocking

722	   Luma deblocking is performed on an 8x8 grid as follows:

724	   1.  For each vertical edge between two 8x8 blocks, calculate the
725	       following for line 2 and line 5, giving d2 and d5 respectively:

727	       d = abs(a-b) + abs(c-d),

729	       where a and b are on the left-hand side of the block edge and c
730	       and d are on the right-hand side of the block edge:

732	       a b | c d

734	   2.  For each line crossing the vertical edge, perform deblocking if
735	       and only if all of the following conditions are true:

737	       *  d2+d5 < beta(QP)

739	       *  The edge is also a transform block edge

741	       *  abs(mvx(left)) > 2, or abs(mvx(right)) > 2, or

743	          abs(mvy(left)) > 2, or abs(mvy(right)) > 2, or

745	          at least one of the transform blocks on either side of the
746	          edge has non-zero coefficients, or

748	          at least one of the transform blocks on either side of the
749	          edge is coded using intra mode.

751	   3.  If deblocking is performed, calculate a delta value as follows:

753	       delta = clip((18*(c-b) - 6*(d-a) + 16)/32,tc,-tc),

755	       where tc is a QP-dependent value.

757	   4.  Next, modify two pixels on each side of the block edge as
758	       follows:

760	       a' = a + delta/2

762	       b' = b + delta

764	       c' = c - delta

766	       d' = d - delta/2

768	   5.  The same procedure is followed for horizontal block edges.

770	   The relative positions of the samples, a, b, c, d and the motion
771	   vectors, MV, are illustrated in Figure 10.

773	                                    |
774	                                    | block edge
775	                                    |
776	                            +---+---+---+---+
777	                            | a | b | c | d |
778	                            +---+---+---+---+
779	                                    |
780	                           mv       | mv
781	                             x,left |   x,right
782	                                    |
783	                           mv         mv
784	                             y,left     y,right

786	               Figure 10: Deblocking filter pixel positions
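
   The per-line luma filtering can be sketched in Python as follows.
   Several points are assumptions rather than statements of the draft:
   divisions are taken as flooring integer divisions, the correction on
   the right-hand side of the edge is applied with the opposite sign to
   the left-hand side so that the step across the edge shrinks, and tc
   is passed in as a plain argument.  All function names are
   hypothetical.

```python
def clip(x, hi, lo):
    """Clamp x to the range [lo, hi], using the draft's argument order."""
    return max(lo, min(hi, x))


def luma_activity(a, b, c, d):
    """Activity measure for one line across the edge (step 1)."""
    return abs(a - b) + abs(c - d)


def luma_filter_line(a, b, c, d, tc):
    """Filter the four samples a b | c d of one line (steps 3 and 4)."""
    delta = clip((18 * (c - b) - 6 * (d - a) + 16) // 32, tc, -tc)
    half = delta // 2
    # Left side moves toward the right side and vice versa.
    return a + half, b + delta, c - delta, d - half


# A flat step of 8 across the edge, with tc = 3 (illustrative values).
before = (100, 100, 108, 108)
after = luma_filter_line(*before, tc=3)
```

   In this example the step between b and c shrinks from 8 to 2, while
   the activity measure of the unfiltered line is 0, which is the kind
   of smooth-but-stepped signal the threshold test is meant to admit.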

788	8.1.2.  Chroma Deblocking

790	   Chroma deblocking is performed on a 4x4 grid as follows:

792	   1.  Deblocking of the edge between two 4x4 blocks is performed if and
793	       only if:

795	       *  The pixels on either side of the block edge belong to an
796	          intra block.

798	       *  The block edge is also an edge between two transform blocks.

800	   2.  If deblocking is performed, calculate a delta value as follows:

802	       delta = clip((4*(c-b) + (d-a) + 4)/8,tc,-tc),

804	       where tc is a QP-dependent value.

806	   3.  Next, modify one pixel on each side of the block edge as follows:

808	       b' = b + delta

810	       c' = c - delta
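
   A corresponding sketch for one chroma line is given below.  As an
   assumption, the division is a flooring integer division and the
   correction is subtracted on the right-hand side of the edge so that
   b and c move toward each other; tc is passed in as an argument.  The
   function name chroma_filter_line is hypothetical.

```python
def chroma_filter_line(a, b, c, d, tc):
    """Filter one chroma line a b | c d; only b and c are modified."""
    raw = (4 * (c - b) + (d - a) + 4) // 8
    delta = max(-tc, min(tc, raw))  # clip to [-tc, tc]
    return b + delta, c - delta


# A flat step of 8 across the edge, with tc = 2 (illustrative values).
b_new, c_new = chroma_filter_line(60, 60, 68, 68, tc=2)
```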

812	8.2.  Constrained Low Pass Filter (CLPF)

814	   A low-pass filter is applied after the deblocking filter if signaled
815	   in the sequence header.  It can still be switched off for individual
816	   frames in the frame header.  Also signaled in the frame header is
817	   whether to apply the filter for all qualified 128x128 blocks or to
818	   transmit a flag for each such block.  A super block does not qualify
819	   if it contains only Inter0 (skip) coding blocks, and no flag is
820	   transmitted for such blocks.

822	   The filter is described in the separate CLPF draft
823	   [I-D.midtskogen-netvc-clpf].

825	9.  Entropy coding

827	9.1.  Overview

829	   The following information is signaled at the sequence level:

831	   o  Sequence header

833	   The following information is signaled at the frame level:

835	   o  Frame header

837	   The following information is signaled at the CB level:

839	   o  Super-mode (mode, split, reference index for uni-prediction)

841	   o  Intra prediction mode

843	   o  PB-split (none, hor, ver, quad)

845	   o  TB-split (none or quad)

847	   o  Reference frame indices for bi-prediction

849	   o  Motion vector candidate index

851	   o  Transform coefficients if TB-split=0

853	   The following information is signaled at the TB level:

855	   o  CBP (8 combinations of CBPY, CBPU, and CBPV)

857	   o  Transform coefficients

859	   The following information is signaled at the PB level:

861	   o  Motion vector differences

863	9.2.  Low Level Syntax

865	9.2.1.  CB Level

867	    super-mode          (inter0/split/inter1/inter2-ref0/intra/inter2-ref1/inter2-ref2/inter2-ref3,..)

869	    if (mode == INTER0 || mode == INTER1)

871	      mv_idx                    (one of up to 2 motion vector candidates)

873	    else if (mode == INTRA)

875	      intra_mode                (one of up to 8 intra modes)

877	      tb_split                  (NONE or QUAD, coded jointly with CBP for tb_split=NONE)

879	    else if (mode == INTER2)

881	      pb_split          (NONE,VER,HOR,QUAD)

883	      tb_split_and_cbp  (NONE or QUAD and CBP)

885	    else if (mode == BIPRED)

887	      mvd_x0, mvd_y0    (motion vector difference for first vector)

889	      mvd_x1, mvd_y1    (motion vector difference for second vector)

891	      ref_idx0, ref_idx1        (two reference indices)

893	9.2.2.  PB Level

895	       if (mode == INTER2 || mode == BIPRED)

897	         mvd_x, mvd_y              (motion vector differences)

899	9.2.3.  TB Level

901	       if (mode != INTER0 and tb_split == 1)

903	         cbp                       (8 possibilities for CBPY/CBPU/CBPV)

905	       if (mode != INTER0)

907	         transform coefficients

909	9.2.4.  Super Mode

911	   For each block of size NxN (64>=N>8), the following mutually
912	   exclusive events are jointly encoded using a single VLC code as
913	   follows (example using 4 reference frames):

915	   If there is no interpolated reference frame:

917	   INTER0      1
918	   SPLIT       01
919	   INTER1      001
920	   INTER2-REF0 0001
921	   BIPRED      00001
922	   INTRA       000001
923	   INTER2-REF1     0000001
924	   INTER2-REF2     00000001
925	   INTER2-REF3     00000000

927	   If there is an interpolated reference frame:

929	   INTER0      1
930	   SPLIT       01
931	   INTER1      001
932	   BIPRED      0001
933	   INTRA       00001
934	   INTER2-REF1     000001
935	   INTER2-REF2     0000001
936	   INTER2-REF3     00000001
937	   INTER2-REF0 00000000

939	   If fewer than 4 reference frames are used, a shorter VLC table is
940	   used.  If bi-pred is not possible, or split is not possible, they
941	   are omitted from the table and shorter codes are used for subsequent
942	   elements.

944	   Additionally, depending on information from the blocks to the left
945	   and above (meta data and CBP), a different sorting of the events can
946	   be used, e.g.:

948	     SPLIT       1
949	     INTER1      01
950	     INTER2-REF0 001
951	     INTER0      0001
952	     INTRA       00001
953	     INTER2-REF1 000001
954	     INTER2-REF2 0000001
955	     INTER2-REF3 00000001
956	     BIPRED      00000000
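
   The code tables above all follow a truncated unary pattern: event
   number i is coded as i zeros followed by a one, except the last
   event, which is coded as zeros only.  The Python sketch below
   reproduces the first table under that reading.  The names
   truncated_unary, super_mode_code and NO_INTERP_ORDER are
   hypothetical.

```python
def truncated_unary(index, num_events):
    """Return the truncated unary codeword for a 0-based event index."""
    if index < num_events - 1:
        return "0" * index + "1"
    return "0" * (num_events - 1)  # last event needs no terminating 1


# Event order when there is no interpolated reference frame.
NO_INTERP_ORDER = [
    "INTER0", "SPLIT", "INTER1", "INTER2-REF0", "BIPRED",
    "INTRA", "INTER2-REF1", "INTER2-REF2", "INTER2-REF3",
]


def super_mode_code(event, order):
    """Look up the codeword of an event for a given event ordering."""
    return truncated_unary(order.index(event), len(order))


# Removing an event shortens the codes of all later events.
no_bipred = [e for e in NO_INTERP_ORDER if e != "BIPRED"]
intra_full = super_mode_code("INTRA", NO_INTERP_ORDER)
intra_short = super_mode_code("INTRA", no_bipred)
```

   A context-dependent sorting of the events then amounts to passing a
   different order list to the same function.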

958	9.2.5.  CBP

960	   Calculate code as follows:

962	       if (tb_split == 0)

964	         N = 4*CBPV + 2*CBPU + CBPY

966	       else

968	         N = 8

970	   Map the value of N to code through a table lookup:

972	   code = table[N]

974	   where the purpose of the table lookup is to sort the different
975	   values of code according to decreasing probability (typically CBPY=1,
976	   CBPU=0, CBPV=0 having the highest probability).

978	   Use a different table depending on the values of CBPY in neighbor
979	   blocks (left and above).

981	   Encode the value of code using a systematic VLC code.
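
   The first two steps can be sketched in Python as below.  The actual
   lookup tables and the VLC itself are not given in this document, so
   EXAMPLE_TABLE is a made-up permutation used only for illustration,
   and the final VLC step is left out.  The function names are
   hypothetical.

```python
def cbp_symbol(tb_split, cbp_y, cbp_u, cbp_v):
    """Combine tb_split and the three CBP flags into one value N."""
    if tb_split == 0:
        return 4 * cbp_v + 2 * cbp_u + cbp_y
    return 8


# Hypothetical table, NOT taken from the codec: it gives the smallest
# code to N = 1 (CBPY=1, CBPU=0, CBPV=0), the typically most likely case.
EXAMPLE_TABLE = [1, 0, 3, 4, 5, 6, 7, 8, 2]


def cbp_code(tb_split, cbp_y, cbp_u, cbp_v, table=EXAMPLE_TABLE):
    """Map N to a code index through a probability-sorted table."""
    return table[cbp_symbol(tb_split, cbp_y, cbp_u, cbp_v)]


luma_only = cbp_code(0, 1, 0, 0)
```

   Selecting the table from the CBPY values of the left and above
   neighbors would correspond to passing a different table argument.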

983	9.2.6.  Transform Coefficients

985	   Transform coefficient coding uses a traditional zig-zag scan pattern
986	   to convert a 2D array of quantized transform coefficients, coeff, to
987	   a 1D array of samples.  VLC coding of quantized transform
988	   coefficients starts from the low frequency end of the 1D array using
989	   two different modes; level-mode and run-mode, starting in level-mode:

991	   o  Level-mode

993	      *  Encode each coefficient, coeff, separately

995	      *  Each coefficient is encoded by:

997	         +  The absolute value, level=abs(coeff), using a VLC code and

999	         +  If level > 0, the sign bit (sign=0 or sign=1 for coeff>0 and
1000	            coeff<0 respectively).

1002	      *  If coefficient N is zero, switch to run-mode, starting from
1003	         coefficient N+1.

1005	   o  Run-mode

1007	      *  For each non-zero coefficient, encode the combined event of:

1009	         1.  Length of the zero-run, i.e. the number of zeros since the
1010	             last non-zero coefficient.

1012	         2.  Whether or not level=abs(coeff) is greater than 1.

1014	         3.  End of block (EOB) indicating that there are no more non-
1015	             zero coefficients.

1017	      *  Additionally, if level = 1, code the sign bit.

1019	      *  Additionally, if level > 1, encode code = 2*(level-2)+sign.

1021	      *  If the absolute value of coefficient N is larger than 1, switch
1022	         to level-mode, starting from coefficient N+1.

1024	   Example

1026	   Figure 11 illustrates an example where 16 quantized transform
1027	   coefficients are encoded.

1029	                    4
1030	                                         3
1031	              2     |                       2
1032	                 1  |  1        1        |           1
1033	              |     |     0  0     0  0  |  |  0  0     0  0
1034	              |__|__|__|________|________|__|________|_______

1036	                     Figure 11: Coefficients to encode

1038	   Table 3 shows the mode and the symbols to be coded for each
1039	   coefficient.

1041	   +--------+-------------+-------------+------------------------------+
1042	   | Index  | abs(coeff)  | Mode        | Encoded symbols              |
1043	   +--------+-------------+-------------+------------------------------+
1044	   | 0      | 2           | level-mode  | level=2,sign                 |
1045	   | 1      | 1           | level-mode  | level=1,sign                 |
1046	   | 2      | 4           | level-mode  | level=4,sign                 |
1047	   | 3      | 1           | level-mode  | level=1,sign                 |
1048	   | 4      | 0           | level-mode  | level=0                      |
1049	   | 5      | 0           | run-mode    |                              |
1050	   | 6      | 1           | run-mode    | (run=1,level=1)              |
1051	   | 7      | 0           | run-mode    |                              |
1052	   | 8      | 0           | run-mode    |                              |
1053	   | 9      | 3           | run-mode    | (run=2,level>1),             |
1054	   |        |             |             | 2*(3-2)+sign                 |
1055	   | 10     | 2           | level-mode  | level=2, sign                |
1056	   | 11     | 0           | level-mode  | level=0                      |
1057	   | 12     | 0           | run-mode    |                              |
1058	   | 13     | 1           | run-mode    | (run=1,level=1)              |
1059	   | 14     | 0           | run-mode    | EOB                          |
1060	   | 15     | 0           | run-mode    |                              |
1061	   +--------+-------------+-------------+------------------------------+

1063	      Table 3: Transform coefficient encoding for the example above.
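
   The mode switching in the example can be traced with the Python
   sketch below.  It takes an already scanned 1D coefficient list and
   lists the symbols to be coded, without applying any VLC.  Several
   points are assumptions: the run counts only zeros visited in
   run-mode, EOB is emitted at the first run-mode position after the
   last non-zero coefficient, and the tuple layout and the function
   name symbolize are hypothetical.

```python
def symbolize(coeffs):
    """List the level-mode and run-mode symbols for a coefficient array."""
    last_nz = max((i for i, c in enumerate(coeffs) if c != 0), default=-1)
    symbols = []
    mode, run = "level", 0
    for i, coeff in enumerate(coeffs):
        level = abs(coeff)
        sign = 1 if coeff < 0 else 0
        if mode == "level":
            # Absolute value, plus a sign bit only for non-zero levels.
            symbols.append((i, "level", level, sign if level else None))
            if level == 0:
                mode, run = "run", 0
        elif i > last_nz:
            symbols.append((i, "eob"))
            break
        elif level == 0:
            run += 1
        else:
            big = level > 1
            # Sign bit if level is 1, else the combined level/sign code.
            extra = 2 * (level - 2) + sign if big else sign
            symbols.append((i, "run", run, big, extra))
            run = 0
            if big:
                mode = "level"
    return symbols


# Magnitudes from Figure 11, all taken as positive for simplicity.
example = [2, 1, 4, 1, 0, 0, 1, 0, 0, 3, 2, 0, 0, 1, 0, 0]
result = symbolize(example)
```

   Under this reading, the coefficient at index 9 is preceded by two
   zeros visited in run-mode (indices 7 and 8), so its run value is 2.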

1065	10.  High Level Syntax

1067	   High level syntax is currently very simple and rudimentary as the
1068	   primary focus so far has been on compression performance.  It is
1069	   expected to evolve as functionality is added.

1071	10.1.  Sequence Header

1073	   o  Width - 16 bits

1075	   o  Height - 16 bits

1077	   o  Enable/disable PB-split - 1 bit

1079	   o  SB size - 3 bits

1081	   o  Enable/disable TB-split - 1 bit

1083	   o  Number of active reference frames (may go into frame header) - 2
1084	      bits (max 4)

1086	   o  Enable/disable interpolated reference frames - 1 bit

1088	   o  Enable/disable delta qp - 1 bit
1089	   o  Enable/disable deblocking - 1 bit

1091	   o  Constrained low-pass filter (CLPF) enable/disable - 1 bit

1093	   o  Enable/disable block context coding - 1 bit

1095	   o  Enable/disable bi-prediction - 1 bit

1097	   o  Enable/disable quantization matrices - 1 bit

1099	   o  If quantization matrices enabled: quantization matrix offset - 6
1100	      bits

1102	   o  Select 420 or 444 input - 1 bit

1104	   o  Number of reordered frames - 4 bits

1106	   o  Enable/disable chroma intra prediction from luma - 1 bit

1108	   o  Enable/disable chroma inter prediction from luma - 1 bit

1110	   o  Internal frame bitdepth (8, 10 or 12 bits) - 2 bits

1112	   o  Input video bitdepth (8, 10 or 12 bits) - 2 bits

1114	10.2.  Frame Header

1116	   o  Frame type - 1 bit

1118	   o  QP - 8 bits

1120	   o  Identification of active reference frames - num_ref*4 bits

1122	   o  Number of intra modes - 4 bits

1124	   o  Number of active reference frames - 2 bits

1126	   o  Active reference frames - number of active reference frames * 6
1127	      bits

1129	   o  Frame number - 16 bits

1131	   o  If CLPF is enabled in the sequence header: Constrained low-pass
1132	      filter (CLPF) strength - 2 bits (00 = off, 01 = strength 1, 10 =
1133	      strength 2, 11 = strength 4)

1135	   o  If CLPF is enabled in the sequence header: Enable/disable CLPF
1136	      signal for each qualified filter block

1138	11.  IANA Considerations

1140	   This document has no IANA considerations yet.  TBD

1142	12.  Security Considerations

1144	   This document has no security considerations yet.  TBD

1146	13.  Normative References

1148	   [I-D.davies-netvc-irfvc]
1149	              Davies, T., "Interpolated reference frames for video
1150	              coding", draft-davies-netvc-irfvc-00 (work in progress),
1151	              October 2015.

1153	   [I-D.davies-netvc-qmtx]
1154	              Davies, T., "Quantisation matrices for Thor video coding",
1155	              draft-davies-netvc-qmtx-00 (work in progress), March 2016.

1157	   [I-D.midtskogen-netvc-chromapred]
1158	              Midtskogen, S., "Improved chroma prediction", draft-
1159	              midtskogen-netvc-chromapred-02 (work in progress), October
1160	              2016.

1162	   [I-D.midtskogen-netvc-clpf]
1163	              Midtskogen, S., Fuldseth, A., and M. Zanaty, "Constrained
1164	              Low Pass Filter", draft-midtskogen-netvc-clpf-02 (work in
1165	              progress), April 2016.

1167	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1168	              Requirement Levels", BCP 14, RFC 2119,
1169	              DOI 10.17487/RFC2119, March 1997,
1170	              <http://www.rfc-editor.org/info/rfc2119>.

1172	Authors' Addresses

1174	   Arild Fuldseth
1175	   Cisco
1176	   Lysaker
1177	   Norway

1179	   Email: arilfuld@cisco.com

1180	   Gisle Bjontegaard
1181	   Cisco
1182	   Lysaker
1183	   Norway

1185	   Email: gbjonteg@cisco.com

1187	   Steinar Midtskogen
1188	   Cisco
1189	   Lysaker
1190	   Norway

1192	   Email: stemidts@cisco.com

1194	   Thomas Davies
1195	   Cisco
1196	   London
1197	   UK

1199	   Email: thdavies@cisco.com

1201	   Mo Zanaty
1202	   Cisco
1203	   RTP,NC
1204	   USA

1206	   Email: mzanaty@cisco.com