An open question would be whether this is mandatory, or how a mixer
could signal that it is not doing voice activity detection (i.e. V will
be zero all the time).
Well, I'd say that could be an optional feature negotiated in the SDP
along with the extension header.
I am wondering whether we need to negotiate that at all. If we assume
that we are trying to solve the use case from my example above, then we
could go for something like this: 0 means that a participant's activity
corresponds to speech, or that we don't know what it is (and hence we
show green bars), and 1 means that this is most probably noise, so the
UA would show the level bars in red.
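
To make that concrete, here is a rough sketch of what a UA might do with
such a bit. It assumes, purely for illustration, that each per-participant
entry is one byte with the flag in the most significant bit and the level
in the remaining 7 bits; the function names are made up and not taken
from either draft.

```python
from typing import Tuple


def decode_level_byte(b: int) -> Tuple[int, int]:
    """Split one byte into (flag, level).

    Assumed layout (illustrative only): flag in the top bit,
    level in the low 7 bits.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a single byte value")
    return b >> 7, b & 0x7F


def bar_colour(flag: int) -> str:
    """Map the proposed flag semantics to a level-bar colour.

    0 = speech, or we don't know what it is -> green
    1 = most probably noise                 -> red
    """
    return "red" if flag == 1 else "green"


flag, level = decode_level_byte(0x85)
print(bar_colour(flag), level)
```

With this reading, a mixer that does no voice activity detection simply
leaves the bit at 0 and the UA keeps showing green bars, so nothing has
to be negotiated.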
This would, however, mean that we'd be assigning a meaning to this bit
that is exactly the opposite of the one in the client-to-mixer draft.
Therefore, assuming we'd like to have the same format in both drafts, we
could either reverse the meaning of the bit in the client-to-mixer draft
(Jonathan, is this an option?) or keep it as it is and rely on SDP to
indicate support for voice detection.
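
For comparison, a rough sketch of the second option, where the bit keeps
the opposite sense (1 for speech) and the UA only trusts it when voice
detection support was indicated in SDP. The `vad_negotiated` parameter
and the function name are hypothetical; how that support would actually
be expressed in SDP is not defined here.

```python
def interpret_v(v: int, vad_negotiated: bool) -> str:
    """Interpret the V bit when its meaning depends on SDP negotiation.

    vad_negotiated is a hypothetical flag the UA would derive from the
    SDP exchange. Without it, V is expected to be zero all the time and
    carries no information.
    """
    if not vad_negotiated:
        return "unknown"
    # Sense kept as is: 1 means speech, 0 means no speech detected.
    return "speech" if v == 1 else "no speech"


print(interpret_v(0, vad_negotiated=False))
print(interpret_v(0, vad_negotiated=True))
```

The trade-off is that a 0 here is ambiguous on its own, which is why
this variant needs the SDP indication.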
How does this sound?