Re: [AVT] Audio Level Indicators - new versions submitted



Hey Stefan,

Enrico Marocco wrote:
> Stefan Sayer wrote:
>> Why not use the (so far unused) MSB of the audio level in the
>> mixer-to-client extension header as voice activity flag (like the V in
>> client-to-mixer) ?

I guess mainly because we didn't see what use a client would have for
this kind of information. However, I am now thinking that ambitious UI
builders might like to use some kind of content-dependent color
indication (e.g. red bars for noise and green bars for speech).

>> While it is possible for the mixer, it may be impossible for the client 
>> to accurately determine voice activity of the participants (e.g. the 
>> spectral information is lost).
>>
>> The format would then look like this:
>>      0                   1                   2                   3
>>         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>>        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>        |  ID   |  len  |V|  level 1    |V|  level 2    |V|  level 3   ...
>>        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>
>>               Figure 2: Audio level indicators extension format
>>
>> An open question would be whether this is mandatory, or how a mixer 
>> could signal that it is not doing voice activity detection (i.e. V will 
>> be zero all the time).
> 
> Well, I'd say that could be an optional feature negotiated in the SDP
> along with the extension header.

I am wondering whether we need to negotiate that at all. If we assume
that we are trying to solve the use case from my example above, then we
could go for something like this: 0 means that the activity of a
participant corresponds to speech, or that we don't know what it is (and
hence we show green bars), and 1 means that this is most probably noise,
so the UA would show the level bars in red.
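
To make that concrete, here is a rough sketch in Python of how a UA
might read one per-participant octet under this convention. The function
names are made up for illustration, and it assumes the flag sits in the
MSB with the level in the remaining 7 bits, as in Figure 2 above.

```python
from typing import Tuple


def split_level_octet(octet: int) -> Tuple[int, int]:
    """Split one per-participant octet into (flag, level).

    Assumed layout (per Figure 2): MSB is the flag, low 7 bits are the level.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("expected a single octet (0..255)")
    return octet >> 7, octet & 0x7F


def bar_color(octet: int) -> str:
    """Proposed meaning: 0 = speech or unknown (green), 1 = probably noise (red)."""
    flag, _level = split_level_octet(octet)
    return "red" if flag else "green"
```

A mixer that does no detection at all would simply leave the bit at 0,
and the UA would keep drawing green bars without any extra signalling.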

This would however mean that we'd be assigning a meaning to this bit
which is exactly the opposite of the one in the client-to-mixer draft.
Therefore, assuming we'd like to have the same format in both drafts, we
could either reverse the meaning of the bit in the client-to-mixer draft
(Jonathan, is this an option?) or keep it as it is and rely on SDP to
indicate support for voice detection.
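
For comparison, the second option (keep the client-to-mixer meaning,
where a set bit indicates voice, and rely on SDP) might look roughly
like this on the receiving side. Again only a sketch: the function name
and the `vad_negotiated` parameter are hypothetical, standing in for
whatever the SDP negotiation ends up telling the UA.

```python
from typing import Optional


def voice_activity(octet: int, vad_negotiated: bool) -> Optional[bool]:
    """Interpret the MSB as a voice activity flag (1 = voice).

    Returns None when voice detection was not negotiated, because in that
    case a 0 bit says nothing about whether the participant is speaking.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("expected a single octet (0..255)")
    if not vad_negotiated:
        return None
    return bool(octet & 0x80)
```

The difference from the first option is that the UA needs the
out-of-band information to tell "no voice" apart from "no detection".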

How does this sound?

Emil