
Re: [AVT] Submission and request for feedback on draft-valin-celt-rtp-profile-00.txt



Randell Jesup writes:
[snip]
> So what then was the "one-byte change in bitrate per frame" then?
> I was referring to what happens if 10 packets in a row are lost - can the
> receiver decode a packet that might be 10 bytes smaller or larger?  Does it
> need as input the number of packets lost (if any) since the last decoded
> packet? 

I'll let JM respond to the rest of your ongoing conversation but I
wanted to take a crack at clarifying this as I probably caused the
misunderstanding on this point.

The text "CELT allows for bitrate adjustment in one byte per frame
increments without any signalling requirement or overhead." is referring
to the *resolution* of the instantaneous bitrate, not the rate of
change. 

Stated differently: At the moment a frame of audio is compressed you can
ask the encoder to produce anywhere from zero to hundreds of bytes (well,
the current encoder may fail to produce useful output for lengths <~5
bytes, but that should change). This is in contrast to most communication
codecs which can only encode to a fairly limited number of sizes. The next
frame may be encoded with an entirely different number of bytes without
regard to what any of the prior frames were encoded with or what future
frames may be encoded with.

Whatever space is available will be well used, which is why we recommend
that you never pad CELT: If you want a larger size you can ask the codec
for it and the extra space will be used to improve quality.

A million packets could be lost or otherwise mutilated and the decoder
will happily decode whatever comes in when the stream resumes. The codec
may take 40 milliseconds or so to reach full quality after a loss event
but it is guaranteed to recover quickly and it doesn't matter what the
bitrate did during the loss, nor does the decoder need to know how
much was lost or damaged.

So, for example, the encoder could be encoding at 64kbit/sec and receive
notice from the rate control part of an application (or operating system)
and on the next frame be producing 24kbit/sec with no coordination with
the decoder.  Alternatively, it could change the bitrate by as little as
(sample_rate / frame_samples) * 1 byte * 8 bits/byte, which is about
1.38 kbit/sec for the 44100 sample_rate and 256 frame_samples case, or by
any other amount which results in a whole number of bytes per frame of
encoded audio.
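
To make the arithmetic above concrete, here is a minimal Python sketch.
The function names are made up for illustration and are not part of CELT
or of the draft; the only inputs are the sample rate, the frame size in
samples, and a target bitrate.

```python
def bitrate_step_bps(sample_rate, frame_samples):
    """Bitrate change (bits/sec) from adding or removing one byte per frame."""
    frames_per_second = sample_rate / frame_samples
    return frames_per_second * 8  # 1 byte per frame * 8 bits/byte


def bytes_per_frame(target_bps, sample_rate, frame_samples):
    """Largest whole number of bytes per frame that stays at or under target_bps."""
    # bytes/frame = target_bps / 8 / (sample_rate / frame_samples), rounded down
    return (target_bps * frame_samples) // (8 * sample_rate)


if __name__ == "__main__":
    step = bitrate_step_bps(44100, 256)
    print(f"one-byte step: {step / 1000:.2f} kbit/sec")
    for target in (64000, 24000):
        size = bytes_per_frame(target, 44100, 256)
        print(f"{target} bit/sec -> {size} bytes per frame")
```

For the 44100/256 case this gives a step of roughly 1.38 kbit/sec, and
the 64kbit/sec to 24kbit/sec switch in the example works out to going
from 46 bytes per frame to 17 bytes per frame.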

If a system were to provide a loss-free method of congestion detection,
such as ECN or some kind of queuing delay estimation, the codec could
gracefully and gradually rate adapt to provide the best supportable
quality without any glitches at all. (If congestion is detected via
loss then there will be a quality hit from lost frames, of course.)
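
As a rough illustration of what such gradual adaptation might look like
on the sending side, here is a hypothetical controller in Python. None
of this is part of CELT or the draft: the congestion flag is assumed to
come from the application (for example from ECN marks or a delay
estimate), the back-off factor is arbitrary, the 5 byte floor only echoes
the remark above about very short frames, and the 200 byte ceiling is a
placeholder.

```python
def next_frame_bytes(current_bytes, congested, min_bytes=5, max_bytes=200):
    """Pick the byte budget for the next frame (hypothetical policy).

    Backs off by one eighth when congestion is signalled, otherwise
    grows by a single byte per frame. The decoder is never told.
    """
    if congested:
        proposed = current_bytes * 7 // 8
    else:
        proposed = current_bytes + 1
    # Clamp to the assumed usable range of frame sizes.
    return max(min_bytes, min(max_bytes, proposed))


if __name__ == "__main__":
    size = 46  # about 64 kbit/sec at 44100 Hz with 256-sample frames
    for congested in (False, False, True, False):
        size = next_frame_bytes(size, congested)
        print(size)
```

The point of the sketch is only that each frame's size is chosen
independently by the sender; any policy that yields a whole number of
bytes per frame would do.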

The codec design goal here was to efficiently provide the maximum amount of
rate-adaptability possible for a byte oriented transport. If not for the
need to place multiple CELT frames in a packet, there would be no need
to have any overhead for length coding at all as the RTP transport will
preserve payload length.

If the text were changed to say "CELT can encode to any whole number of
bytes per frame at any time without signalling a change to the decoder"
would this avoid the potential for misunderstanding?

-- 
Greg Maxwell