Date: | Monday, November 2nd, 15:20 - 16:50; Tuesday, November 3rd, 15:20 - 16:50 |
Location: | Yokohama, Japan |
Chairs: | Adam Roach, Mo Zanaty |
Minutes: | Jonathan Lennox, Nathan Egge, David Benham |
Presenter: | Chairs |
Slides: | https://www.ietf.org/proceedings/94/slides/slides-94-netvc-0.pdf |
The chairs called the working group's attention to the newly formed CELLAR WG, which will be publishing specifications for lossless codecs, based on FLAC and FFV1, as well as a container format based on Matroska.
Presenter: | Jose Alvarez |
Slides: | https://www.ietf.org/proceedings/94/slides/slides-94-netvc-1.pdf |
Drafts: | draft-filippov-netvc-requirements |
Mo Zanaty: for traditional video conferences (with mixes), there aren’t many Intras; but with newer use cases, e.g. PERC, Intra becomes more important. E.g. active speaker switching, not just random entry. Test conditions should reflect this.
Tim Terriberry: Can you propose a number to the list?
Mo: I’ll come up with a number.
Jonathan Lennox: Screencasting can also mean window sharing, which means arbitrary image sizes, including odd sizes.
Peter Thatcher: Desktop sizes go up to 5K now.
Maire Reavy: I'm concerned even 100ms of coding delay will be too high for real-time use. There can be slower modes (for non-real-time use cases) but 30ms (or less) for real-time needs to be the upper bound.
Mo Zanaty: we were thinking about a proper profile for these requirements, first version vs. later. May be time to do that now. On the delay, we should have a mode with zero structural delay. On bit depth, trend towards having higher internal precision apart from input or output formats. Might be good to differentiate these things.
Mo: To be clear, RGB means RGB 4:4:4
Martin Dürst: Useful to be able to play videos faster or slower, is that included in temporal scalability?
Jose: it could be.
Martin: Your list only includes 15, 30, etc. If you want 1.2x, whatever.
Jose: What is the use case?
Martin: If students are reviewing a lecture, can go through faster or slower depending on their needs.
Jose: Are these predictable factors, or infinitely variable?
Martin: Ones I know are 1.2, 1.5, etc.
Thomas Davies (Jabber Room): what does support of HDR mean in practice? If you can do 10/12 bit, then with a mapping function and suitable metadata then you can do some form of HDR.
Jose: Requirement is support for high dynamic range. More bits can do that, but it’s not necessarily linear. We’re not suggesting any particular implementation.
Chairs call for sense of the room regarding whether this draft should be the basis for fulfilling the requirements milestone. Of those in the room who had read the document (approximately 10), all favored adoption. No objections to adoption were noted.
Sense of room is to adopt; to be confirmed on-list
Mo: Does this have everything Jack had in his requirements document?
Tim: This had everything he wanted to see.
Presenter: | Thomas Daede |
Slides: | https://www.ietf.org/proceedings/94/slides/slides-94-netvc-2.pdf |
Drafts: | draft-daede-netvc-testing |
Thomas Davies: What about frame rate? 60 Hz does not require 2x the bit rate of 30 Hz.
Thomas Daede: Currently that's not taken into account; not sure how we want to scale it.
Mo Zanaty: Would it be useful to put the resolutions on the bitrate ranges too?
Thomas Daede: Screencast has all sorts of weird video sizes, I didn’t want to list them all, but when we get our explicit test cases we should list them.
Thomas Davies: square root is okay for framerate.
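Thomas Davies's suggestion above (linear bitrate scaling overstates the cost of higher frame rates; a square-root law is adequate) can be sketched as follows. The function name and the example numbers are illustrative, not from the draft:

```python
import math

def scaled_bitrate(base_bitrate_kbps, base_fps, target_fps):
    """Scale a target bitrate by the square root of the frame-rate
    ratio, so 60 Hz needs about sqrt(2) ~ 1.41x the 30 Hz bitrate
    rather than the 2x a linear rule would give."""
    return base_bitrate_kbps * math.sqrt(target_fps / base_fps)

# 1000 kbps at 30 fps scales to ~1414 kbps at 60 fps
print(round(scaled_bitrate(1000, 30, 60)))
```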
Mo: Unconstrained low latency is unconstrained in every way except structural delay?
Thomas Daede: Yes.
Thomas Davies: Main comment. The draft does not state what the testing is for and what the best test methodology might be for each purpose. Can you clarify the purposes you have in mind? What you might do for final testing is different for what you might do for testing a tool for inclusion, for example. This draft seems aimed at final testing rather than the process of developing the codec. I'm concerned about sucking away a lot of effort into developing fancy 2-pass encoding methods and rate controls that don't improve the fundamental technologies at all, when we are doing tests just for deciding on tools and judging current progress. Really want to constrain adaptivity to make fair comparisons between different codecs, also.
Thomas Daede: If you tell codecs they can’t use certain features, they have to be very careful not to use them. Not sure you can quite make it equal. Newer metrics respond correctly to a lot of these.
Mo: It seems like we have two types of testing, beauty contest vs. constant iteration working in the group. May be useful to have a split in the doc for those two cases.
Thomas: Yes, and a lot of concerns about fairness between codecs don’t apply if you’re testing a tool with a codec.
Mo: We already know you can get outlandish gains with outlandish rate controls. We don’t want that to mar tool selection or candidate selection.
Mo: I thought we were going to weight planes.
Thomas: Yes, but not clear how to weight. Works for PSNR not necessarily others.
Mo: But a weight of 0 for chroma is wrong.
Tim Terriberry: You don’t have to weight, just report all three scores.
Thomas: Doesn’t work great for other than PSNR.
Tim: Some are okay.
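The exchange above distinguishes reporting the three plane scores separately (Tim's suggestion) from collapsing them into one weighted number. A minimal sketch for PSNR, where a weighted combination is at least well defined; the 4:1:1 weights are illustrative only, not an agreed value:

```python
import numpy as np

def plane_psnr(ref, rec, peak=255.0):
    """PSNR of a single plane (luma or chroma), in dB."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def report_psnr(ref_planes, rec_planes, weights=(4, 1, 1)):
    """Report Y/Cb/Cr PSNR separately plus a weighted average.
    The weights here are a placeholder; a chroma weight of 0 would
    be wrong (per Mo), and weighting schemes that make sense for
    PSNR do not necessarily carry over to other metrics."""
    scores = [plane_psnr(r, d) for r, d in zip(ref_planes, rec_planes)]
    w = np.array(weights, dtype=np.float64)
    return scores, float(np.dot(w, scores) / w.sum())
```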
Chairs call for sense of the room regarding whether this draft should be the basis for fulfilling the testing milestone. Of those in the room who had read the document (approximately 10), all favored adoption. No objections to adoption were noted.
Sense of room is to adopt; to be confirmed on-list
Presenter: | Jean-Marc Valin |
Slides: | https://www.ietf.org/proceedings/94/slides/slides-94-netvc-3.pdf |
Drafts: | http://jmvalin.ca/notes/dir_dering.pdf |
Mo Zanaty: Do you have a threshold or aggressiveness for direction determination?
Jean-Marc: The threshold I described depends on the bitrate, runs parallel to the edge.
Mo: So they’re not dependent on the content.
Jean-Marc: No, they’re also dependent on the content. Variance of the block influences the threshold.
Mo: Does direction determination have aggressiveness based on the content, confidence in the determination?
Jean-Marc: No, because conditional replacement filter avoids blurring the image.
Tim Terriberry: We always pick a direction, there’s no non-directional mode.
Steve Botzko: I’m confused what you mean by across? Orthogonal to direction?
Jean-Marc: Not quite orthogonal.
Steve: So it's always horizontal or vertical?
Jean-Marc: Yes, whichever is closer to orthogonal.
Steinar Midtskogen: These bitrates are quite high, do you have numbers for lower bitrates?
Jean-Marc: Even at lower bitrates it’s a clear improvement. At some point it falls apart, just blocks of different colors, you lose directionality.
Steve: So this is part of the prediction mode?
Jean-Marc: It’s run in a loop.
Steve: Any point in using it as a post-filter?
Jean-Marc: I don't think so.
Steinar Midtskogen: Is this off by an order of magnitude?
Jean-Marc: Yes, should be 0.025
Thomas Davies: have you tried running the filter on a subset of frames (e.g. HQ frames in a frame hierarchy)? would reduce complexity. e.g. do it on every 4th frame, to reduce complexity.
Jean-Marc: No, I haven’t; may be worth trying. Current complexity is about 5% of CPU use.
Mo: What was your test set?
Jean-Marc: NTT-short on arewecompressedyet.
Mo: The visual example you showed earlier is part of the test set?
Jean-Marc: No, that’s a separate test set of still images. The curves are on video.
Mo: Would be interesting to see on a class of images.
Jean-Marc: I went over the test set, the filter improves all of them.
Tim Terriberry: Responding to Thomas Davies, running every fourth frame would have some difficulties, would have to keep track of which areas were skipped. Could certainly do it with something like hierarchical P-frames, just run on I and P. Always have the option to disable at the encoder. Disabling on every block would be cheap, because of entropy coding.
Tim: Your B-Frames are skipping lots of superblocks anyway, so we’re already almost doing that.
Jean-Marc: If you’re doing B-Frames.
Mo: Would be good if you and Steinar could combine this with Thor’s low-pass filter, come up with something that’s the best common tool.
Jean-Marc: Definitely more experiments to do there. Thor’s constrained low-pass filter is constrained to changing by only one; works well for high bitrates, less for lower.
Steinar: Result probably depends on what kind of interpolation filter the codec has, so what works in Daala might not work in Thor.
Jean-Marc: Also that Daala has no Intra-prediction, Thor does. Yes, I wouldn’t expect result to look the same for the two codecs.
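The conditional replacement filter discussed above can be sketched in one dimension: a neighboring tap contributes only when its difference from the center pixel is below a threshold, so strong edges pass through unfiltered. The tap weights, offsets, and normalization below are illustrative, not the values in the dir_dering.pdf write-up; Thor's constrained low-pass filter differs in limiting the output change to ±1 per pixel:

```python
def crf_1d(line, threshold, taps=(1, 2, 2, 1), offsets=(-2, -1, 1, 2)):
    """1-D conditional replacement filter sketch: a neighbor contributes
    its difference from the center pixel only when that difference is
    below the threshold, smoothing ringing without blurring across
    edges. (Thor's CLPF instead clamps the change to +/-1.)"""
    total = sum(taps) + 1  # +1 for the center pixel's own weight
    out = []
    for i, c in enumerate(line):
        acc = 0
        for w, o in zip(taps, offsets):
            j = min(max(i + o, 0), len(line) - 1)  # clamp at the borders
            d = line[j] - c
            if abs(d) < threshold:  # the "conditional replacement" test
                acc += w * d
        out.append(c + acc / total)
    return out
```

With a hard edge (e.g. `[0, 0, 100, 100]`) every cross-edge difference exceeds the threshold, so the edge is left untouched, which is the blur-avoidance property Jean-Marc describes.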
Presenter: | Timothy Terriberry |
Slides: | https://www.ietf.org/proceedings/94/slides/slides-94-netvc-4.pdf |
Drafts: | draft-terriberry-codingtools, draft-valin-netvc-pvq, draft-egge-netvc-tdlt, draft-terriberry-netvc-obmc, https://git.xiph.org/?p=daala.git |
See slides for summary of changes and resulting improvements. No discussion recorded in minutes.
Presenter: | Steinar Midtskogen |
Slides: | https://www.ietf.org/proceedings/94/slides/slides-94-netvc-6.pdf |
Drafts: | draft-fuldseth-netvc-thor, draft-davies-netvc-irfvc, draft-midtskogen-netvc-clpf |
See slides for summary of changes and resulting improvements.
Mo Zanaty: Asked for clarification about the SIMD optimizations made. Also asked whether a general-purpose GPU abstraction layer had been considered.
Steinar: Roughly speaking, no.
Chairs requested that Daala and Thor developers agree on common methodology for depicting improvements between the two codecs.
Presenter: | Nathan Egge |
Slides: | https://www.ietf.org/proceedings/94/slides/slides-94-netvc-5.pdf |
Drafts: | draft-egge-netvc-cfl |
See slides for summary of technique and impacts.
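For readers without the slides, the core chroma-from-luma idea is to predict each chroma plane as an affine function of the co-located reconstructed luma. The sketch below fits such a model per block by least squares; it illustrates the general concept only, as the actual draft-egge-netvc-cfl technique operates in the frequency domain in combination with PVQ:

```python
def cfl_params(luma, chroma):
    """Least-squares alpha/beta for a chroma-from-luma predictor
    C ~ alpha * L + beta over one block. Illustrative only; not the
    exact scheme in draft-egge-netvc-cfl."""
    n = len(luma)
    mean_l = sum(luma) / n
    mean_c = sum(chroma) / n
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(luma, chroma))
    var = sum((l - mean_l) ** 2 for l in luma)
    alpha = cov / var if var else 0.0
    beta = mean_c - alpha * mean_l
    return alpha, beta

def cfl_predict(luma, alpha, beta):
    """Predict a chroma block from luma using the fitted parameters."""
    return [alpha * l + beta for l in luma]
```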
Mo Zanaty: Have these been tested with 4:4:4 and RGB content?
Nathan: No.
Chairs re-iterated that we need harmonized approach for displaying performance improvement information.