Re: [codec] A concrete proposal for requirements and testing

Stephen Botzko <stephen.botzko@gmail.com> Sun, 10 April 2011 12:44 UTC

In-Reply-To: <20110410023345.GM30415@audi.shelbyville.oz>
References: <BANLkTimN1VduZ9kR2Mgp_w7=p6V1srHBiQ@mail.gmail.com> <21200823.2625297.1302284060278.JavaMail.root@lu2-zimbra> <BLU0-SMTP11D0135F8FFEEEB308A1E9D0A70@phx.gbl> <4d9f7107.a7fed80a.542d.ffffa087@mx.google.com> <20110409030611.GG30415@audi.shelbyville.oz> <BLU0-SMTP9917A8ABBC14D6FFE833E6D0A90@phx.gbl> <20110410023345.GM30415@audi.shelbyville.oz>
Date: Sun, 10 Apr 2011 08:46:02 -0400
Message-ID: <BANLkTin1pTWfThu1mF=PnBKMz_0_=5f8rw@mail.gmail.com>
From: Stephen Botzko <stephen.botzko@gmail.com>
To: Ron <ron@debian.org>
Cc: codec@ietf.org
Subject: Re: [codec] A concrete proposal for requirements and testing

in-line
Stephen Botzko

On Sat, Apr 9, 2011 at 10:33 PM, Ron <ron@debian.org> wrote:

> On Sat, Apr 09, 2011 at 08:30:27PM -0400, Paul Coverdale wrote:
> > >> > quality has been shown to be good enough for the codec to be
> > >> > useful...
> > >>
> > >> I think that’s where some people have difficulty. There’s been no
> > >> systematic attempt to evaluate Opus against the performance
> > >> requirements given in the codec requirements document (as thin as it
> > >> is) in a controlled and repeatable manner.
> > >
> > >I think that's a bit disingenuous to the careful methodology that was
> > >employed by the developers, and to the people who so far have done
> > >blind and scientific tests of their results, but without dwelling on
> > >that slur:
> >
> > I'm not trying to be disingenuous. I just haven't seen details of the
> > "careful methodology that was employed". There may well be a lot of
> > results floating around behind the scenes, but the only official
> > information that I have seen so far is the slide deck "Opus Testing"
> > presented during IETF80. This did not provide a lot of experimental
> > detail, and the number of subjects was quite small. Anyone familiar with
> > subjective testing knows that the results can be heavily influenced by
> > the methodology employed.
>
> I presume you've seen enough of codec development though to know that you
> don't arrive at the sort of results that Opus has achieved through randomly
> stabbing in the dark.  At every step things have been evaluated in an
> appropriately controlled and repeatable manner.  The HA folk also take
> their testing very seriously, which I believe was described in Prague.


No one is criticizing the developers here.  Let's take that off the table.
People want to see this codec tested/characterized because they think it is
good, not because they think it is poor. There is no need to test a bad
codec.

And no one has accused the testers of not being "serious". However,
repeatable controlled testing means that anyone can re-run the tests [using
the same test methodology] and get the same results.  The documentation
of the procedures was not complete enough to allow this; the Q&A in Prague
made that pretty clear.  And there are traps: small mistakes in methodology
will produce misleading results.  So describing precisely how the speech
samples were prepared, how the tests were conducted, what equipment was
used, etc. actually does matter.

And since the testing can be a lot of work, at least some folks want to see
a test plan developed so the work can be split up.  This is not being
obstructive; it is a belief (based on past experience) that ad-hoc tests are
not sufficient.


> Their complete set of results should be available soon, and my
> understanding at this stage is that they are not notably different from
> the small sample that was available at the time of Prague.
>
>
Will their results be submitted as a contribution to this working group?


> > >There's also been no, even informal, results presented to suggest that
> > >it does not exceed even the most ambitious performance requirement
> > >expectations held at the outset, by a rather surprising margin.
> >
> > We need a test plan before we can conduct any useful tests.
>
> Except we've already established that not all groups are actually
> interested in, or find significant, or satisfactory, the same test plan.
> We don't need consensus from the whole group for further testing to take
> place, only between the people actually planning to conduct, and
> interested in, that test.
>


And many folks are interested in such a plan, and have said so since the
beginning (at least since the Maastricht meeting).  Obviously you disagree,
but you seem to be working very hard to block any progress on a concrete
test plan, repeatedly arguing that it is a waste of time [in a thread which
was specifically started to develop a concrete test plan].  That is very
different from saying you are not interested.


> The group can reach consensus on the relevance of these additional tests
> if and when they ever get conducted and results are presented.
>
> > >Do you have some results to share, that back up the claims of the
> > >people "having difficulty" accepting what has been achieved to date?
> > >
> >
> > Many people have indicated they will be willing to conduct tests, and
> > share results, when a test plan is available.
>
> This has been repeated often, and yet weeks have passed and still there
> is no sign of even a fledgling plan that these people are proposing
> to commit to ...
>
>
> > >While I respect your adherence to processes that are indeed necessary
> > >for the ITU to licence a technology and deploy it in the telephone
> > >network, one of the very reasons for forming this group under the
> > >auspices of the IETF was that the reality of developing and deploying
> > >internet services is quite different.  We don't have just a handful
> > >of companies responsible for signing off on a spec, and making do if
> > >it later proves insufficient - we have millions of them, most of which
> > >haven't even heard of this group yet.  And the best way to engage them
> > >is to provide a proposed standard which they can work to and assess
> > >for their own uses.
> > >
> >
> > ITU doesn't license technologies, that's a matter between individual
> > organizations who may want to implement a specific standard which
> > incorporates IPR. But in any case, the processes they follow are not
> > there because of licensing. They are there to provide a logical
> > progression from determining the requirements, developing solution(s),
> > and testing the solution(s) against those requirements.
>
> Yes, bad choice of overloaded terms, I meant to say "give licence to",
> in the sense of grant its imprimatur to the result.  I wasn't talking
> about IPR there, since we weren't supposed to be talking about that here ;)
>
> When you're about to deploy a stable technology to (many) thousands of
> fixed exchanges, with the reliability demanded of the telephone network,
> that sort of rigour is essential.  Here though we are talking about
> deploying a proposed technology, to many thousands of implementers,
> any or all of which may provide us useful feedback for improving that
> technology further.  And since it's the internet, we can do that and
> not only get away with it, but evolve a superior result faster too.
>

ITU-T codecs are of course used on the internet; many of them are not used
on the public telephone network, and were never intended for that use.  MPEG
also characterizes/verifies its codecs, using somewhat different test
procedures, but still using a unified test plan.  Anyway, framing this
discussion as "ITU against IETF" is disingenuous and not constructive.  We
are all contributing to the IETF by posting here and, AFAIK, the individuals
and the companies they work for have been involved in many IETF WGs for many
years.

One reason for an up-front testing focus is that (despite the "proposed
standard" RFC comments) it is very difficult to upgrade codecs once they
are widely deployed.  Once Opus is released, it will be put in PCs and
appliances like tablets, mobile phones, and VoIP phones.  SIP devices using
it will need to support the first version for many years.

And [assuming success] Opus will be implemented in *silicon*, making the
upgrade situation much more problematic and expensive than for a typical RFC.

So I don't agree with the "since it's the internet we can..." line of
reasoning.


>
> I don't see that the testing which will be conducted by those people
> is any less valuable than the testing which is yet to emerge, from
> people who say they'll probably do some, if only there was a plan.
> We have people who say they can begin tests with thousands of users,
> in real application use cases, tomorrow if we can give them an RFC
> by then.  The complaint that sample spaces are too small is easily
> remedied by unleashing those people to go do that.  You're offering
> a test with ~24 people, they're offering tests on 24 million+.
> Which should we choose?
>
> We have a perfectly good plan.  Everyone go test the things that are
> important to *you*, that's the only way we, and you, can ensure that
> you'll actually be satisfied by the testing regime to undertake.
> When you have results, share them, and we'll assess them.  Many people
> have already satisfied themselves with such tests.  Those who haven't
> really only have themselves to blame for leaving it until the last
> moment to wonder what they should test.  The people who know what they
> want from this codec are satisfied, it's not clear to me what exactly
> we expect to learn from people who don't already have a use case in
> mind, and can't even begin to figure out what sort of test would
> actually satisfy them, or even be meaningful to apply to a codec of
> this calibre.
>
>
Honestly, this is a non-plan.


> If groups of people want to agree on a common regime, that's excellent.
> But if they can't that's no reason to hold up deploying something that
> even to people with golden ears, is basically transparent.
>
>

We are not at WG last call; development is still ongoing.
Framing this as a "last minute" surprise that is holding up publication of
the RFC is incorrect.  Discussions on the need for and nature of
characterization and testing have been happening all along.  The working
group perhaps did not close on it (though honestly, I recall no objections
to the need for characterization in any of the previous meetings; perhaps
people should re-visit the recordings).  But the topic is not new, and I hope
we can do better than simply "agreeing to disagree" and releasing a codec
for wide deployment without the normal verification.


>
> As I suggested previously, we should give people a few more days to
> see if they actually do come up with more test plans, and a sensible
> timeline in which their results will be presented, and if they can't
> do that, then we should just move on to the next phase asap.
>
> Best,
> Ron
>
>