Network Working Group                                          T. Daede
Internet-Draft                                                  Mozilla
Intended status: Informational                                A. Norkin
Expires: September 28, 2017                                     Netflix
                                                          I. Brailovskiy
                                                           Amazon Lab126
                                                          March 27, 2017

             Video Codec Testing and Quality Measurement
                      draft-ietf-netvc-testing-05
Abstract

   This document describes guidelines and procedures for evaluating a
   video codec.  This covers subjective and objective tests, test
   conditions, and materials used for the test.
Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 28, 2017.
Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.
Table of Contents

   1.  Introduction
   2.  Subjective quality tests
     2.1.  Still Image Pair Comparison
     2.2.  Video Pair Comparison
     2.3.  Subjective viewing test
   3.  Objective Metrics
     3.1.  Overall PSNR
     3.2.  Frame-averaged PSNR
     3.3.  PSNR-HVS-M
     3.4.  SSIM
     3.5.  Multi-Scale SSIM
     3.6.  CIEDE2000
     3.7.  VMAF
   4.  Comparing and Interpreting Results
     4.1.  Graphing
     4.2.  BD-Rate
     4.3.  Ranges
   5.  Test Sequences
     5.1.  Sources
     5.2.  Test Sets
       5.2.1.  regression-1
       5.2.2.  objective-2-slow
       5.2.3.  objective-2-fast
       5.2.4.  objective-1.1
       5.2.5.  objective-1-fast
     5.3.  Operating Points
       5.3.1.  Common settings
       5.3.2.  High Latency CQP
       5.3.3.  Low Latency CQP
       5.3.4.  Unconstrained High Latency
       5.3.5.  Unconstrained Low Latency
   6.  Automation
     6.1.  Regression tests
     6.2.  Objective performance tests
     6.3.  Periodic tests
   7.  Informative References
   Authors' Addresses
1.  Introduction

   When developing a video codec, changes and additions to the codec
   need to be decided based on their performance tradeoffs.  In
   addition, measurements are needed to determine when the codec has
   met its performance goals.  This document specifies how the tests
   are to be carried out to ensure valid comparisons when evaluating
   changes under consideration.  Authors of features or changes should
   provide the results of the appropriate test when proposing codec
   modifications.
2.  Subjective quality tests

   Subjective testing is the preferable method of testing video codecs.
   Subjective testing results take priority over objective testing
   results, when available.  Subjective testing is especially
   recommended when taking advantage of psychovisual effects that may
   not be well represented by objective metrics, or when different
   objective metrics disagree.

   Selection of a testing methodology depends on the feature being
   tested and the resources available.  Test methodologies are
   presented in order of increasing accuracy and cost.

   Testing relies on the resources of participants.  For this reason,
   even if the group agrees that a particular test is important, if no
   one volunteers to do it, or if volunteers do not complete it in a
   timely fashion, then that test should be discarded.  This ensures
   that only important tests are done; in particular, the tests that
   are important to participants.
2.1.  Still Image Pair Comparison

   A simple way to determine whether one compressed image is superior
   to another is to visually compare the two and have the viewer judge
   which one has higher quality.  This is used for rapid comparisons
   during development - the viewer may be a developer or user, for
   example.  Because testing is done on still images (keyframes), this
   is only suitable for changes with similar or no effect on other
   frames.  For example, this test may be suitable for an intra de-
   ringing filter, but not for a new inter prediction mode.  For this
   test, the two compressed images should have similar compressed file
   sizes, with one image being no more than 5% larger than the other.
   In addition, at least 5 different images should be compared.
2.2.  Video Pair Comparison

   Video comparisons are necessary when making changes with temporal
   effects, such as changes to inter-frame prediction.  Video pair
   comparisons follow the same procedure as still images.
2.3.  Subjective viewing test

   A subjective viewing test is the preferred method of evaluating
   video quality.  The subjective test should be performed by showing
   the video sequences either consecutively on one screen or
   simultaneously on two screens located side by side.  The testing
   procedure should normally follow the rules described in [BT500] and
   be performed with non-expert test subjects.  The result of the test
   could be (depending on the test procedure) mean opinion scores (MOS)
   or differential mean opinion scores (DMOS).  Normally, confidence
   intervals are also calculated to judge whether the difference
   between two encodings is statistically significant.  In certain
   cases, a viewing test with expert test subjects can be performed,
   for example if the test should evaluate technologies with similar
   performance with respect to a particular artifact (e.g. loop
   filters or motion prediction).  Depending on the setup of the test,
   the output could be a MOS, a DMOS, or the percentage of experts who
   preferred one technology over another.
3.  Objective Metrics

   Objective metrics are used in place of subjective tests to enable
   easy and repeatable experiments.  Most objective metrics have been
   designed to correlate with subjective scores.

   The following descriptions give an overview of the operation of
   each of the metrics.  Because implementation details can sometimes
   vary, the exact implementation is specified in C in the Daala tools
   repository [DAALA-GIT].  Implementations of metrics must directly
   support the input's resolution, bit depth, and sampling format.

   Unless otherwise specified, all of the metrics described below
   apply only to the luma plane, individually by frame.  When applied
   to a video, the scores of the individual frames are averaged to
   create the final score.

   Codecs must output the same resolution, bit depth, and sampling
   format as the input.
3.1.  Overall PSNR

   PSNR is a traditional signal quality metric, measured in decibels.
   It is directly derived from the mean squared error (MSE), or its
   square root (RMSE).  The formula used is:

   20 * log10 ( MAX / RMSE )

   or, equivalently:

   10 * log10 ( MAX^2 / MSE )

   where the error is computed over all the pixels in the video.  This
   is the method used in the dump_psnr.c reference implementation.

   This metric may be applied to both the luma and chroma planes, with
   all planes reported separately.
3.2.  Frame-averaged PSNR

   PSNR can also be calculated per frame, and then the values averaged
   together.  This is reported in the same way as overall PSNR.
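
   The two variants differ whenever per-frame MSE varies over the
   clip.  A minimal sketch of both (Python with NumPy; frames are
   assumed to be 8-bit luma planes, and the names are illustrative,
   not those of the dump_psnr.c tool):

   import numpy as np

   def overall_psnr(ref_frames, test_frames, max_val=255.0):
       # MSE pooled over every pixel of every frame, then one PSNR.
       se, n = 0.0, 0
       for ref, test in zip(ref_frames, test_frames):
           diff = ref.astype(np.float64) - test.astype(np.float64)
           se += np.sum(diff * diff)
           n += diff.size
       return 10.0 * np.log10(max_val ** 2 / (se / n))

   def frame_averaged_psnr(ref_frames, test_frames, max_val=255.0):
       # One PSNR per frame, then the arithmetic mean across frames.
       scores = []
       for ref, test in zip(ref_frames, test_frames):
           mse = np.mean((ref.astype(np.float64)
                          - test.astype(np.float64)) ** 2)
           scores.append(10.0 * np.log10(max_val ** 2 / mse))
       return float(np.mean(scores))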
3.3.  PSNR-HVS-M

   The PSNR-HVS metric performs a DCT transform on 8x8 blocks of the
   image, weights the coefficients, and then calculates the PSNR of
   those coefficients.  Several different sets of weights have been
   considered [PSNRHVS].  The weights used by the dump_psnrhvs.c tool
   in the Daala repository have been found to be the best match to
   real MOS scores.
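
   A structural sketch of the approach (Python with SciPy).  The
   identity weights below are a placeholder rather than the real CSF
   table, and the contrast-masking refinement of PSNR-HVS-M is
   omitted; see dump_psnrhvs.c for the exact computation:

   import numpy as np
   from scipy.fft import dctn

   CSF_WEIGHTS = np.ones((8, 8))  # placeholder for the real weights

   def psnr_hvs_sketch(ref, test, max_val=255.0):
       h, w = ref.shape
       se, n = 0.0, 0
       for y in range(0, h - 7, 8):
           for x in range(0, w - 7, 8):
               # DCT each co-located 8x8 block and weight the
               # coefficient differences before accumulating MSE.
               a = dctn(ref[y:y+8, x:x+8].astype(np.float64),
                        norm="ortho")
               b = dctn(test[y:y+8, x:x+8].astype(np.float64),
                        norm="ortho")
               d = CSF_WEIGHTS * (a - b)
               se += np.sum(d * d)
               n += 64
       return 10.0 * np.log10(max_val ** 2 / (se / n))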
3.4.  SSIM

   SSIM (Structural Similarity Image Metric) is a still image quality
   metric introduced in 2004 [SSIM].  It computes a score for each
   individual pixel, using a window of neighboring pixels.  These
   scores can then be averaged to produce a global score for the
   entire image.  The original paper produces scores ranging between 0
   and 1.

   Because scores saturate as they approach 1, the score is converted
   onto a decibel scale for BD-rate computation:

   -10 * log10 (1 - SSIM)
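
   For instance, an SSIM score of 0.99 maps to 20 dB.  A trivial
   conversion helper (Python; note that a score of exactly 1.0, i.e.
   identical images, has no finite dB value):

   import math

   def ssim_to_db(ssim):
       # -10 * log10(1 - SSIM); higher is better, as with PSNR.
       return -10.0 * math.log10(1.0 - ssim)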
3.5.  Multi-Scale SSIM

   Multi-Scale SSIM is SSIM extended to multiple window sizes [MSSSIM].
   The metric score is converted to decibels in the same way as SSIM.
3.6.  CIEDE2000

   CIEDE2000 is a metric based on CIEDE color distances [CIEDE2000].
   It generates a single score taking into account all three color
   planes.  It does not take into consideration any structural
   similarity or other psychovisual effects.
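
   A sketch of a per-frame score (Python with scikit-image), assuming
   the decoded frames have already been converted to 8-bit RGB; the
   function name is illustrative:

   import numpy as np
   from skimage.color import rgb2lab, deltaE_ciede2000

   def ciede2000_frame_score(ref_rgb, test_rgb):
       # Convert both frames to CIELAB and average the per-pixel
       # CIEDE2000 color distances (lower is better).
       lab_ref = rgb2lab(ref_rgb.astype(np.float64) / 255.0)
       lab_test = rgb2lab(test_rgb.astype(np.float64) / 255.0)
       return float(np.mean(deltaE_ciede2000(lab_ref, lab_test)))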
3.7.  VMAF

   Video Multi-method Assessment Fusion (VMAF) is a full-reference
   perceptual video quality metric that aims to approximate human
   perception of video quality [VMAF].  This metric is focused on
   quality degradation due to compression and rescaling.  VMAF
   estimates the perceived quality score by computing scores from
   multiple quality assessment algorithms and fusing them with a
   support vector machine (SVM).  Currently, three image fidelity
   metrics and one temporal signal have been chosen as features for
   the SVM: Anti-noise SNR (ANSNR), Detail Loss Measure (DLM), Visual
   Information Fidelity (VIF), and the mean co-located pixel
   difference of a frame with respect to the previous frame.

   The quality score from VMAF is used directly to calculate BD-rate,
   without any conversions.
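
   VMAF scores can be obtained in several ways; one convenient route
   is ffmpeg's libvmaf filter, assuming an ffmpeg build configured
   with --enable-libvmaf (the file names below are placeholders):

   import subprocess

   subprocess.run([
       "ffmpeg",
       "-i", "distorted.y4m",   # first input: the decoded test clip
       "-i", "reference.y4m",   # second input: the pristine source
       "-lavfi", "libvmaf",     # logs the pooled VMAF score
       "-f", "null", "-",
   ], check=True)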
4.  Comparing and Interpreting Results

4.1.  Graphing

   When displayed on a graph, bitrate is shown on the X axis, and the
   quality metric is on the Y axis.  For publication, the X axis
   should be linear.  The Y axis metric should be plotted in decibels.
   If the quality metric does not natively report quality in decibels,
   it should be converted as described in the previous section.
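
   A minimal plotting sketch along these lines (Python with
   matplotlib; the data points are invented for illustration):

   import matplotlib.pyplot as plt

   bitrate_kbps = [250, 500, 1000, 2000]   # hypothetical rates
   psnr_db = [32.1, 34.8, 37.5, 40.0]      # hypothetical scores

   plt.plot(bitrate_kbps, psnr_db, marker="o", label="test codec")
   plt.xlabel("Bitrate (kbps)")             # linear X axis
   plt.ylabel("PSNR (dB)")                  # metric in decibels
   plt.legend()
   plt.savefig("rd_curve.png")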
4.2.  BD-Rate

   The Bjontegaard rate difference, also known as BD-rate, allows the
   measurement of the bitrate reduction offered by a codec or codec
   feature, while maintaining the same quality as measured by
   objective metrics.  The rate change is computed as the average
   percent difference in rate over a range of qualities.  Metric score
   ranges are not static - they are calculated either from a range of
   bitrates of the reference codec, or from quantizers of a third,
   anchor codec.  Given a reference codec and a test codec, BD-rate
   values are calculated as follows; a sketch of the computation
   appears after this list:

   o  Rate/distortion points are calculated for the reference and test
      codecs.

      *  At least four points must be computed.  These points should
         be the same quantizers when comparing two versions of the
         same codec.

      *  Additional points outside of the range should be discarded.

   o  The rates are converted into log-rates.

   o  A piecewise cubic Hermite interpolating polynomial is fit to the
      points for each codec to produce functions of log-rate in terms
      of distortion.

   o  Metric score ranges are computed:

      *  If comparing two versions of the same codec, the overlap is
         the intersection of the two curves, bounded by the chosen
         quantizer points.

      *  If comparing dissimilar codecs, a third anchor codec's metric
         scores at fixed quantizers are used directly as the bounds.

   o  The log-rate is numerically integrated over the metric range for
      each curve, using at least 1000 samples and trapezoidal
      integration.

   o  The resulting integrated log-rates are converted back into
      linear rate, and then the percent difference is calculated from
      the reference to the test codec.
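
   The following sketch (Python with SciPy; not a reference
   implementation) mirrors the steps above for the overlap method,
   assuming each codec is summarized as (bitrate, metric-in-dB) pairs
   at the chosen quantizers:

   import numpy as np
   from scipy.interpolate import PchipInterpolator

   def bd_rate_percent(ref_points, test_points, samples=1000):
       # ref_points, test_points: lists of (rate, quality_db) pairs.
       def fit(points):
           # Sort by quality; fit log-rate as a function of quality.
           pairs = sorted((q, np.log10(r)) for r, q in points)
           quality = [q for q, _ in pairs]
           log_rate = [lr for _, lr in pairs]
           return (PchipInterpolator(quality, log_rate),
                   quality[0], quality[-1])

       f_ref, lo_ref, hi_ref = fit(ref_points)
       f_test, lo_test, hi_test = fit(test_points)

       # Overlap of the two quality ranges, bounded by the chosen
       # points.
       lo, hi = max(lo_ref, lo_test), min(hi_ref, hi_test)
       q = np.linspace(lo, hi, samples)

       # Average log-rate of each curve over the shared quality
       # range, via trapezoidal integration.
       avg_ref = np.trapz(f_ref(q), q) / (hi - lo)
       avg_test = np.trapz(f_test(q), q) / (hi - lo)

       # Convert back to linear rate; percent change from reference.
       return (10.0 ** (avg_test - avg_ref) - 1.0) * 100.0

   A negative result indicates that the test codec needs less bitrate
   than the reference to reach the same metric scores.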
4.3.  Ranges

   For individual feature changes in libaom or libvpx, the overlap BD-
   rate method with quantizers 20, 32, 43, and 55 must be used.

   For the final evaluation described in [I-D.ietf-netvc-requirements],
   the quantizers used are 20, 24, 28, 32, 36, 39, 43, 47, 51, and 55.
5.  Test Sequences

5.1.  Sources

   Lossless test clips are preferred for most tests, because the
   structure of compression artifacts in already-compressed clips may
   introduce extra noise in the test results.  However, a large amount
   of content on the internet needs to be recompressed at least once,
   so some sources of this nature are useful.  The encoder should run
   at the same bit depth as the original source.  In addition, metrics
   need to support operation at high bit depth.  If one or more codecs
   in a comparison do not support high bit depth, sources need to be
   converted once before entering the encoder.
5.2.  Test Sets

   Sources are divided into several categories to test different
   scenarios the codec will be required to operate in.  For easier
   comparison, all videos in each set should have the same color
   subsampling, same resolution, and same number of frames.  In
   addition, all test videos must be publicly available for testing
   use, to allow for reproducibility of results.  All current test
   sets are available for download [TESTSEQUENCES].

   Test sequences should be downloaded in whole.  They should not be
   recreated from the original sources.
5.2.1.  regression-1

   This test set is used for basic regression testing.  It contains a
   very small number of clips.

   o  kirlandvga (640x360, 8bit, 4:2:0, 300 frames)

   o  FourPeople (1280x720, 8bit, 4:2:0, 60 frames)

   o  Narrarator (4096x2160, 10bit, 4:2:0, 15 frames)

   o  CSGO (1920x1080, 8bit, 4:4:4, 60 frames)
5.2.2.  objective-2-slow

   This test set is a comprehensive test set, grouped by resolution.
   These test clips were created from originals at [TESTSEQUENCES].
   They have been scaled and cropped to match the resolution of their
   category.  This test set requires compiling with high bit depth
   support.

   4096x2160, 4:2:0, 60 frames:

   o  Netflix_BarScene_4096x2160_60fps_10bit_420_60f

   o  Netflix_BoxingPractice_4096x2160_60fps_10bit_420_60f

   o  Netflix_Dancers_4096x2160_60fps_10bit_420_60f

   o  Netflix_Narrator_4096x2160_60fps_10bit_420_60f

   o  Netflix_RitualDance_4096x2160_60fps_10bit_420_60f

   o  Netflix_ToddlerFountain_4096x2160_60fps_10bit_420_60f

   o  Netflix_WindAndNature_4096x2160_60fps_10bit_420_60f

   o  street_hdr_amazon_2160p

   1920x1080, 4:2:0, 60 frames:

   o  aspen_1080p_60f

   o  crowd_run_1080p50_60f

   o  ducks_take_off_1080p50_60f

   o  guitar_hdr_amazon_1080p

   o  life_1080p30_60f

   o  Netflix_Aerial_1920x1080_60fps_8bit_420_60f

   o  Netflix_Boat_1920x1080_60fps_8bit_420_60f

   o  Netflix_Crosswalk_1920x1080_60fps_8bit_420_60f

   o  Netflix_FoodMarket_1920x1080_60fps_8bit_420_60f

   o  Netflix_PierSeaside_1920x1080_60fps_8bit_420_60f

   o  Netflix_SquareAndTimelapse_1920x1080_60fps_8bit_420_60f

   o  Netflix_TunnelFlag_1920x1080_60fps_8bit_420_60f

   o  old_town_cross_1080p50_60f

   o  pan_hdr_amazon_1080p

   o  park_joy_1080p50_60f

   o  pedestrian_area_1080p25_60f

   o  rush_field_cuts_1080p_60f

   o  rush_hour_1080p25_60f

   o  seaplane_hdr_amazon_1080p

   o  station2_1080p25_60f

   o  touchdown_pass_1080p_60f

   1280x720, 4:2:0, 120 frames:

   o  boat_hdr_amazon_720p

   o  dark720p_120f

   o  FourPeople_1280x720_60_120f

   o  gipsrestat720p_120f

   o  Johnny_1280x720_60_120f

   o  KristenAndSara_1280x720_60_120f

   o  Netflix_DinnerScene_1280x720_60fps_8bit_420_120f

   o  Netflix_DrivingPOV_1280x720_60fps_8bit_420_120f

   o  Netflix_FoodMarket2_1280x720_60fps_8bit_420_120f

   o  Netflix_RollerCoaster_1280x720_60fps_8bit_420_120f

   o  Netflix_Tango_1280x720_60fps_8bit_420_120f

   o  rain_hdr_amazon_720p

   o  vidyo1_720p_60fps_120f

   o  vidyo3_720p_60fps_120f

   o  vidyo4_720p_60fps_120f

   640x360, 4:2:0, 120 frames:

   o  blue_sky_360p_120f

   o  controlled_burn_640x360_120f

   o  desktop2360p_120f

   o  kirland360p_120f

   o  mmstationary360p_120f

   o  niklas360p_120f

   o  rain2_hdr_amazon_360p

   o  red_kayak_360p_120f

   o  riverbed_360p25_120f

   o  shields2_640x360_120f

   o  snow_mnt_640x360_120f

   o  speed_bag_640x360_120f

   o  stockholm_640x360_120f

   o  tacomanarrows360p_120f

   o  thaloundeskmtg360p_120f

   o  water_hdr_amazon_360p

   426x240, 4:2:0, 120 frames:

   o  bqfree_240p_120f

   o  bqhighway_240p_120f

   o  bqzoom_240p_120f

   o  chairlift_240p_120f

   o  dirtbike_240p_120f

   o  mozzoom_240p_120f

   1920x1080, 4:4:4 or 4:2:0, 60 frames:

   o  CSGO_60f.y4m

   o  DOTA2_60f_420.y4m

   o  MINECRAFT_60f_420.y4m

   o  STARCRAFT_60f_420.y4m

   o  EuroTruckSimulator2_60f.y4m

   o  Hearthstone_60f.y4m

   o  wikipedia_420.y4m

   o  pvq_slideshow.y4m
5.2.3.  objective-2-fast

   This test set is a strict subset of objective-2-slow.  It is
   designed for faster runtime.  This test set requires compiling with
   high bit depth support.

   1920x1080, 4:2:0, 60 frames:

   o  aspen_1080p_60f

   o  ducks_take_off_1080p50_60f

   o  life_1080p30_60f

   o  Netflix_Aerial_1920x1080_60fps_8bit_420_60f

   o  Netflix_Boat_1920x1080_60fps_8bit_420_60f

   o  Netflix_FoodMarket_1920x1080_60fps_8bit_420_60f

   o  Netflix_PierSeaside_1920x1080_60fps_8bit_420_60f

   o  Netflix_SquareAndTimelapse_1920x1080_60fps_8bit_420_60f

   o  Netflix_TunnelFlag_1920x1080_60fps_8bit_420_60f

   o  rush_hour_1080p25_60f

   o  seaplane_hdr_amazon_1080p

   o  touchdown_pass_1080p_60f

   1280x720, 4:2:0, 120 frames:

   o  boat_hdr_amazon_720p

   o  dark720p_120f

   o  gipsrestat720p_120f

   o  KristenAndSara_1280x720_60_120f

   o  Netflix_DrivingPOV_1280x720_60fps_8bit_420_60f

   o  Netflix_RollerCoaster_1280x720_60fps_8bit_420_60f

   o  vidyo1_720p_60fps_120f

   o  vidyo4_720p_60fps_120f

   640x360, 4:2:0, 120 frames:

   o  blue_sky_360p_120f

   o  controlled_burn_640x360_120f

   o  kirland360p_120f

   o  niklas360p_120f

   o  rain2_hdr_amazon_360p

   o  red_kayak_360p_120f

   o  riverbed_360p25_120f

   o  shields2_640x360_120f

   o  speed_bag_640x360_120f

   o  thaloundeskmtg360p_120f

   426x240, 4:2:0, 120 frames:

   o  bqfree_240p_120f

   o  bqzoom_240p_120f

   o  dirtbike_240p_120f

   1920x1080, 4:2:0, 60 frames:

   o  DOTA2_60f_420.y4m

   o  MINECRAFT_60f_420.y4m

   o  STARCRAFT_60f_420.y4m

   o  wikipedia_420.y4m
5.2.4.  objective-1.1

   This test set is an old version of objective-2-slow.

   4096x2160, 10bit, 4:2:0, 60 frames:

   o  Aerial (start frame 600)

   o  BarScene (start frame 120)

   o  Boat (start frame 0)

   o  BoxingPractice (start frame 0)

   o  Crosswalk (start frame 0)

   o  Dancers (start frame 120)

   o  FoodMarket

   o  Narrator

   o  PierSeaside

   o  RitualDance

   o  SquareAndTimelapse

   o  ToddlerFountain (start frame 120)

   o  TunnelFlag

   o  WindAndNature (start frame 120)

   1920x1080, 8bit, 4:4:4, 60 frames:

   o  CSGO

   o  DOTA2

   o  EuroTruckSimulator2

   o  Hearthstone

   o  MINECRAFT

   o  STARCRAFT

   o  wikipedia

   o  pvq_slideshow

   1920x1080, 8bit, 4:2:0, 60 frames:

   o  ducks_take_off

   o  life

   o  aspen

   o  crowd_run

   o  old_town_cross

   o  park_joy

   o  pedestrian_area

   o  rush_field_cuts

   o  rush_hour

   o  station2

   o  touchdown_pass

   1280x720, 8bit, 4:2:0, 60 frames:

   o  Netflix_FoodMarket2

   o  Netflix_Tango

   o  DrivingPOV (start frame 120)

   o  DinnerScene (start frame 120)

   o  RollerCoaster (start frame 600)

   o  FourPeople

   o  Johnny

   o  KristenAndSara

   o  vidyo1

   o  vidyo3

   o  vidyo4

   o  dark720p

   o  gipsrecmotion720p

   o  gipsrestat720p

   o  controlled_burn

   o  stockholm

   o  speed_bag

   o  snow_mnt

   o  shields

   640x360, 8bit, 4:2:0, 60 frames:

   o  red_kayak

   o  blue_sky

   o  riverbed

   o  thaloundeskmtgvga

   o  kirlandvga

   o  tacomanarrowsvga

   o  tacomascmvvga

   o  desktop2360p

   o  mmmovingvga

   o  mmstationaryvga

   o  niklasvga
5.2.5.  objective-1-fast

   This is an old version of objective-2-fast.

   1920x1080, 8bit, 4:2:0, 60 frames:

   o  Aerial (start frame 600)

   o  Boat (start frame 0)

   o  Crosswalk (start frame 0)

   o  FoodMarket

   o  PierSeaside

   o  SquareAndTimelapse

   o  TunnelFlag

   1920x1080, 8bit, 4:2:0, 60 frames:

   o  CSGO

   o  EuroTruckSimulator2

   o  MINECRAFT

   o  wikipedia

   1920x1080, 8bit, 4:2:0, 60 frames:

   o  ducks_take_off

   o  aspen

   o  old_town_cross

   o  pedestrian_area

   o  rush_hour

   o  touchdown_pass

   1280x720, 8bit, 4:2:0, 60 frames:

   o  Netflix_FoodMarket2

   o  DrivingPOV (start frame 120)

   o  RollerCoaster (start frame 600)

   o  Johnny

   o  vidyo1

   o  vidyo4

   o  gipsrecmotion720p

   o  speed_bag

   o  shields

   640x360, 8bit, 4:2:0, 60 frames:

   o  red_kayak

   o  riverbed

   o  kirlandvga

   o  tacomascmvvga

   o  mmmovingvga

   o  niklasvga
5.3.  Operating Points

   Four operating modes are defined.  High latency is intended for on-
   demand streaming, one-to-many live streaming, and stored video.
   Low latency is intended for videoconferencing and remote access.
   Both of these modes come in CQP and unconstrained variants.  When
   testing still image sets, such as subset1, high latency CQP mode
   should be used.
5.3.1.  Common settings

   Encoders should be configured to their best settings when being
   compared against each other:

   o  av1: --codec=av1 --ivf --frame-parallel=0 --tile-columns=0
      --cpu-used=0 --threads=1
5.3.2.  High Latency CQP

   High Latency CQP is used for evaluating incremental changes to a
   codec.  This method is well suited to comparing codecs with similar
   coding tools.  It allows codec features with intrinsic frame delay.
   A hypothetical sweep harness using these settings appears after the
   list.

   o  daala: -v=x -b 2

   o  vp9: --end-usage=q --cq-level=x --lag-in-frames=25
      --auto-alt-ref=2

   o  av1: --end-usage=q --cq-level=x --lag-in-frames=25
      --auto-alt-ref=2
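
   The sketch below (Python) combines the common settings above with
   the High Latency CQP flags, sweeping the quantizers from
   Section 4.3.  It assumes an aomenc-style command line; the file
   names are placeholders:

   import subprocess

   for q in (20, 32, 43, 55):
       # One encode per quantizer; output files are later scored with
       # the objective metrics of Section 3.
       subprocess.run([
           "aomenc", "--codec=av1", "--ivf", "--frame-parallel=0",
           "--tile-columns=0", "--cpu-used=0", "--threads=1",
           "--end-usage=q", f"--cq-level={q}",
           "--lag-in-frames=25", "--auto-alt-ref=2",
           "-o", f"out_q{q}.ivf", "input.y4m",
       ], check=True)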
5.3.3.  Low Latency CQP

   Low Latency CQP is used for evaluating incremental changes to a
   codec.  This method is well suited to comparing codecs with similar
   coding tools.  It requires the codec to be set for zero intrinsic
   frame delay.

   o  daala: -v=x

   o  av1: --end-usage=q --cq-level=x --lag-in-frames=0
5.3.4.  Unconstrained High Latency

   The encoder should be run in the best quality mode available, using
   the mode that will provide the best quality per bitrate (VBR or
   constant quality mode).  Lookahead and/or two-pass encoding are
   allowed, if supported.  One parameter is provided to adjust
   bitrate, but the units are arbitrary.  Example configurations
   follow:

   o  x264: --crf=x

   o  x265: --crf=x

   o  daala: -v=x -b 2

   o  av1: --end-usage=q --cq-level=x --lag-in-frames=25
      --auto-alt-ref=2
5.3.5.  Unconstrained Low Latency

   The encoder should be run in the best quality mode available, using
   the mode that will provide the best quality per bitrate (VBR or
   constant quality mode), but no frame delay, buffering, or lookahead
   is allowed.  One parameter is provided to adjust bitrate, but the
   units are arbitrary.  Example configurations follow:

   o  x264: --crf=x --tune zerolatency

   o  x265: --crf=x --tune zerolatency

   o  daala: -v=x

   o  av1: --end-usage=q --cq-level=x --lag-in-frames=0
6.  Automation

   Frequent objective comparisons are extremely beneficial while
   developing a new codec.  Several tools exist to automate the
   process of objective comparisons.  The Compare-Codecs tool allows
   BD-rate curves to be generated for a wide variety of codecs
   [COMPARECODECS].  The Daala source repository contains a set of
   scripts that can be used to automate the various metrics used.  In
   addition, these scripts can be run automatically, utilizing
   distributed computers for fast results, with rd_tool [RD_TOOL].
   This tool can be run via a web interface called AreWeCompressedYet
   [AWCY], or locally.

   Because of computational constraints, several levels of testing are
   specified.
6.1.  Regression tests

   Regression tests are run on a small number of short sequences - the
   regression-1 test set.  The regression tests should include a
   number of different test conditions.  The purpose of regression
   tests is to ensure that bug fixes (and similar patches) do not
   negatively affect performance.  The anchor in regression tests is
   the previous revision of the codec in source control.  Regression
   tests are run in both the high and low latency CQP modes.
6.2.  Objective performance tests

   Changes that are expected to affect the quality of the encode or
   the bitstream should be validated with an objective performance
   test.  The performance tests should be run on a wider set of
   sequences.  The following data should be reported (a sketch of
   aggregating such a report appears at the end of this section):

   o  Identifying information for the encoder used, such as the git
      commit hash.

   o  Command line options to the encoder, configure script, and
      anything else necessary to replicate the experiment.

   o  The name of the test set run (e.g. objective-1).

   o  For both high and low latency CQP modes, and for each objective
      metric:

      *  The BD-rate score, in percent, for each clip.

      *  The average of all BD-rate scores, equally weighted, for each
         resolution category in the test set.

      *  The average of all BD-rate scores for all videos in all
         categories.

   For non-tool contributions, the test set objective-1-fast can be
   substituted.
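
   A hypothetical aggregation helper for such a report (Python), using
   per-clip BD-rate values computed as in Section 4.2; the names are
   illustrative:

   from collections import defaultdict

   def summarize(results):
       # results: list of (category, clip, bd_rate_percent) tuples.
       by_category = defaultdict(list)
       for category, clip, bd in results:
           print(f"{clip}: {bd:+.2f}%")
           by_category[category].append(bd)
       for category, scores in by_category.items():
           # Equally weighted average within each resolution category.
           print(f"{category} average: "
                 f"{sum(scores) / len(scores):+.2f}%")
       all_scores = [s for scores in by_category.values()
                     for s in scores]
       print(f"overall average: "
             f"{sum(all_scores) / len(all_scores):+.2f}%")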
6.3.  Periodic tests

   Periodic tests are run on a wide range of bitrates in order to
   gauge progress over time, as well as detect potential regressions
   missed by other tests.
7.  Informative References

   [AWCY]     Xiph.Org, "Are We Compressed Yet?", 2016.

   [BT500]    ITU-R, "Recommendation ITU-R BT.500-13", 2012.

   [CIEDE2000]
              Yang, Y., Ming, J., and N. Yu, "Color Image Quality
              Assessment Based on CIEDE2000", 2012.

   [COMPARECODECS]
              Alvestrand, H., "Compare Codecs", 2015.

   [DAALA-GIT]
              Xiph.Org, "Daala Git Repository", 2015.

   [DERFVIDEO]
              Terriberry, T., "Xiph.org Video Test Media", n.d.

   [FASTSSIM]
              Chen, M. and A. Bovik, "Fast structural similarity index
              algorithm", 2010.

   [I-D.ietf-netvc-requirements]
              Filippov, A., Norkin, A., and j.
              jose.roberto.alvarez@huawei.com, "Video Codec
              Requirements and Evaluation Methodology",
              draft-ietf-netvc-requirements (work in progress).