Re: [AVT] Requesting feedback on new CELT codec
From: Randell Jesup <rjesup at wgate.com>
To: "Gregory Maxwell" <gmaxwell at gmail.com>
Cc: avt at ietf.org
Date: Sun, 27 Jul 2008 09:07:14 -0400

"Gregory Maxwell" <gmaxwell at gmail.com> writes:
>Swinging this back on topic a bit more: With typical frame sizes in
>the range of 8ms any overhead is very harmful. CELT does a fairly
>good job of avoiding overhead internally, not wasting more than ~1 bit
>or so per frame.
>
>But this means that the codec's several tunables are not signaled in
>the individual frames: Sample rate, frame size, and stereo/mono must
>be explicitly and correctly negotiated out of band or the codec will
>fail to decode. (Bitrate need not be signaled: the codec can infer the
>bitrate from the frame sizes).
Not a problem so long as you restrict packetization time when using CELT to
less than the least common multiple of the bitrates - assuming you mean by
"infer the bitrate from the frame sizes" that it's inferred from the byte
length of the buffer. If it's inferred from some internal frame-end bit,
then no problem.
In the VoIP space, most packetization (not frame) times seem to be 20-30ms.
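
The aliasing concern can be illustrated with a minimal sketch. It assumes the
receiver knows a fixed set of possible frame sizes in bytes and has only the
packet's byte length to go on. The sizes 40 and 60 are hypothetical and are
not taken from CELT.

```python
from math import gcd


def interpretations(packet_len, frame_sizes):
    """Return every (frame_size, frame_count) that exactly fills packet_len bytes."""
    return [(size, packet_len // size)
            for size in frame_sizes
            if packet_len % size == 0]


def smallest_ambiguous_length(size_a, size_b):
    """Shortest packet length, in bytes, that both frame sizes divide evenly."""
    return size_a * size_b // gcd(size_a, size_b)


# Hypothetical constant-bitrate frame sizes in bytes (assumption, not CELT values).
FRAME_SIZES = [40, 60]

print(interpretations(80, FRAME_SIZES))    # one way to read an 80-byte packet
print(interpretations(120, FRAME_SIZES))   # two ways to read a 120-byte packet
print(smallest_ambiguous_length(40, 60))   # first length where readings collide
```

Packets shorter than the least common multiple of the frame sizes have a
single reading under these assumptions; at that length and beyond, the byte
count alone no longer identifies the frame size.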
>So it's important for CELT today that
>some higher level system takes care of working out these details. Is
>this a requirement which would harm some applications?
A couple of things to consider:
* Media arriving before the answer (usually 200 OK in SIP):
Since the signaling often takes a different path, or is via UDP and the
first transmission can be lost, you may need to decode packets without
having seen the answer yet - or you have to refuse to decode until seeing
the answer, which you can do, but it is sub-optimal. In iLBC, the answer
can force the period to 20 or 30ms, and if the packetization time is 60
or 120ms the offerer can't tell (see above). Normally iLBC is used with
20 or 30ms packetization, so the answerer can infer which is used.
* There may be some applications which store packets (instead of encoding
on the fly) which wouldn't be able to easily use a codec with lots of
negotiated options, or where the answerer (or offerer?) can't force which
options are selected. (For example, in iLBC the answerer can force 30ms
by answering with mode=30.)
* Not unique to CELT, but on a re-negotiation (re-INVITE) during a call,
you have to be careful about changing options when the stream is running
if you can't infer when the change happens from the bit/packet-stream
itself. One solution (though not often used in practice) is to switch
payload types in the re-negotiation. May be worth mentioning as an
advisory in the draft.
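
The first and third items above can be sketched together: a receiver that
keys decoder parameters on the RTP payload type, holds packets whose payload
type has not been negotiated yet, and sees a parameter change as a change of
payload type in the stream itself. This is only a sketch under stated
assumptions. The names CodecParams and Receiver, the payload type numbers and
the parameter values are all hypothetical, and a real receiver would bound
the pending buffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecParams:
    sample_rate: int      # Hz
    frame_samples: int    # samples per frame
    channels: int         # 1 = mono, 2 = stereo


class Receiver:
    def __init__(self):
        self.params_by_pt = {}   # payload type -> CodecParams
        self.pending = []        # (payload type, payload) awaiting negotiation

    def on_negotiated(self, payload_type, params):
        """Record parameters learned from signaling; return buffered packets
        for that payload type, paired with the parameters to decode them."""
        self.params_by_pt[payload_type] = params
        ready = [p for p in self.pending if p[0] == payload_type]
        self.pending = [p for p in self.pending if p[0] != payload_type]
        return [(params, payload) for _, payload in ready]

    def on_packet(self, payload_type, payload):
        """Return (params, payload) if decodable now, else buffer and return None."""
        params = self.params_by_pt.get(payload_type)
        if params is None:
            self.pending.append((payload_type, payload))
            return None
        return (params, payload)


rx = Receiver()
rx.on_packet(96, b"early media")                    # arrives before the answer
rx.on_negotiated(96, CodecParams(48000, 384, 1))    # answer arrives, packet released
rx.on_negotiated(97, CodecParams(48000, 256, 2))    # re-INVITE uses a new payload type
```

Because old and new parameters live under different payload types, packets
still in flight from before the re-negotiation decode with the old settings
and the switch point is visible per packet.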
There may be other issues. The first item is the most important.
I realize in the audio regime this may be anathema (or was), but another
option is a variable encoding size, so that additional items can be included
occasionally. This does mess up frame-length inference from packetization
sizes. Alternatively, you could use a more complex scheme that encodes the
extra tunable information a few bits at a time into the individual frames.
Perhaps too complex in practice, and packet loss could require waiting for
another cycle of packets to go by. If the tunables are a low number of
bits or the encoding has a fair portion of a byte left over per frame, it
may be more viable.
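
A minimal sketch of the "few bits at a time" scheme follows. It assumes each
frame has a couple of spare bits and that the receiver can use the RTP
sequence number to find a frame's position in the cycle. The function and
class names are hypothetical, and sequence-number wraparound is ignored.

```python
def split_config(value, total_bits, bits_per_frame):
    """Split value into chunks of bits_per_frame bits, most significant first."""
    n_chunks = -(-total_bits // bits_per_frame)   # ceiling division
    padded_bits = n_chunks * bits_per_frame
    mask = (1 << bits_per_frame) - 1
    return [(value >> (padded_bits - (i + 1) * bits_per_frame)) & mask
            for i in range(n_chunks)]


class ConfigReassembler:
    def __init__(self, n_chunks, bits_per_frame):
        self.n_chunks = n_chunks
        self.bits_per_frame = bits_per_frame
        self.chunks = {}      # position in cycle -> chunk
        self.cycle = None     # index of the cycle being collected

    def on_frame(self, seq, chunk):
        """Feed one frame's chunk; return the value once a whole cycle has arrived."""
        cycle, pos = divmod(seq, self.n_chunks)
        if cycle != self.cycle:          # new cycle: discard any partial data
            self.cycle, self.chunks = cycle, {}
        self.chunks[pos] = chunk
        if len(self.chunks) < self.n_chunks:
            return None
        value = 0
        for i in range(self.n_chunks):
            value = (value << self.bits_per_frame) | self.chunks[i]
        return value


chunks = split_config(0b101101, total_bits=6, bits_per_frame=2)
rx = ConfigReassembler(n_chunks=len(chunks), bits_per_frame=2)
for seq in (0, 1, 2):
    result = rx.on_frame(seq, chunks[seq % len(chunks)])
```

If any frame of a cycle is lost, the reassembler returns nothing for that
cycle and starts over at the next one, which is the delay cost noted above.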
Note also that there's nothing sacred about allowing multiple frames to be
concatenated in an RTP packet. An RFC could disallow that, and instead
implement its own packetization scheme that includes additional
information (if needed) and avoids the aliasing issue. This may cost you
bandwidth, but could be useful with very small frame sizes.
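
One possible shape for such a packetization scheme is a one-byte header per
packet that carries the tunables and an explicit frame count. The sketch
below is an assumption for illustration only: the bit layout, the lookup
tables and the function names are hypothetical and come from no draft.

```python
SAMPLE_RATES = (32000, 44100, 48000)    # hypothetical table, 2-bit index
FRAME_SAMPLES = (64, 128, 256, 512)     # hypothetical table, 2-bit index


def pack_header(sample_rate, frame_samples, channels, frame_count):
    """Pack stream parameters and the frame count into one byte:
    2 bits rate index | 2 bits frame-size index | 1 bit stereo | 3 bits (count - 1)."""
    if channels not in (1, 2) or not 1 <= frame_count <= 8:
        raise ValueError("unsupported channel count or frame count")
    rate_idx = SAMPLE_RATES.index(sample_rate)
    size_idx = FRAME_SAMPLES.index(frame_samples)
    return bytes([(rate_idx << 6) | (size_idx << 4)
                  | ((channels - 1) << 3) | (frame_count - 1)])


def unpack_header(packet):
    """Return (sample_rate, frame_samples, channels, frame_count) from byte 0."""
    b = packet[0]
    return (SAMPLE_RATES[b >> 6], FRAME_SAMPLES[(b >> 4) & 0x3],
            ((b >> 3) & 0x1) + 1, (b & 0x7) + 1)


header = pack_header(48000, 256, channels=2, frame_count=3)
print(unpack_header(header + b"frame data would follow here"))
```

With the frame count stated explicitly, the receiver no longer infers it
from the byte length, at a cost of one byte per packet rather than per frame.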
--
Randell Jesup, Worldgate (developers of the Ojo videophone), ex-Amiga OS team
rjesup at wgate.com
"The fetters imposed on liberty at home have ever been forged out of the weapons
provided for defence against real, pretended, or imaginary dangers from abroad."
- James Madison, 4th US president (1751-1836)
_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www.ietf.org/mailman/listinfo/avt