Dear Mr. Perkins,

I'd like to follow up on your draft "Guidelines for the use of Variable Bit Rate Audio with Secure RTP" (http://www.ietf.org/id/draft-perkins-avt-srtp-vbr-audio-01.txt), which proposes that "VBR audio codecs that alter the size or spacing of their output according to the characteristics of the input speech signal SHOULD NOT be used with encrypted SRTP sessions." As already expressed in my previous email, I do not agree with such a proposal. Please let me explain why:
1. The decision of which codec to use does not belong in SRTP or with the IETF, just as SRTP should not prescribe that only hardware encryption be used, or that all calls be routed through secure proxies. While these kinds of requirements would boost security, they also come at a cost, and it is not the role of SRTP or the IETF to decide the best trade-off between security and cost.
2. The cost of constant bitrate (CBR) coding is not insignificant: the average bitrate can easily double compared to variable bitrate (VBR). Many situations exist where Internet access is expensive or limited.
3. Many users place limited value on security, as witnessed by the wide use of PSTN and GSM. Such users may be happy with "cheap" security such as encryption, but would rather avoid the inconvenience, higher price or lower quality that comes with more drastic security measures.
4. Combining (2) and (3) suggests that encrypted VBR could be the "sweet spot" for some or even most users. As it happens, most VoIP calls today are made using software-encrypted VBR.
5. The [spot-me] reference in the draft shows that sentences spoken in a controlled environment can be recognized with some degree of accuracy. Given that there were 122 sentences in their "dictionary", and assuming at least 7 words per sentence, that suggests an extracted information rate of less than 1 bit per word, on average. Without major improvements, such a system could thus not be used to recognize natural speech, which has a higher entropy.
6. You may disagree with (5) based on the view that it is very dangerous to downplay the threat from any information leak. But then your proposed method of adding overhang periods to active speech intervals is equally dangerous, as all it does is reduce the size of the leak. My personal view is that it is dangerous to jump to conclusions without further analysis or common sense.
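The estimate in (5) can be checked with a quick calculation. This is only a rough sketch: it assumes the 122 sentences are equally likely and that every sentence is recognized correctly, both of which favor the attacker. The function name is purely illustrative.

```python
import math

def bits_per_word(num_sentences: int, words_per_sentence: int) -> float:
    """Upper bound on the information extracted per word when an attacker
    picks one sentence out of a fixed set of equally likely sentences."""
    # Identifying one of N equally likely sentences yields log2(N) bits.
    bits_per_sentence = math.log2(num_sentences)
    return bits_per_sentence / words_per_sentence

# 122 sentences, at least 7 words each (the figures used in point 5)
rate = bits_per_word(122, 7)
print(f"{rate:.2f} bits per word")
```

With longer sentences or imperfect recognition the rate only goes down, so 7 words per sentence gives the high end of the estimate.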
I applaud any effort to raise awareness about security limitations and risks. But the goal should be to inform developers and users who want to balance security risk and cost as they see fit.
best,
koen.

Quoting Koen Vos <koen.vos at skype.net>:
Hi,

I saw this draft suggesting that variable bitrate (VBR) speech codecs not be used in combination with SRTP:
http://www.ietf.org/id/draft-perkins-avt-srtp-vbr-audio-01.txt

I didn't find much discussion about this in the archive, and wonder how serious the proposal is.

In particular, I would like to know how someone in the future should use a VBR codec in combination with encryption? Or will the IETF decide that what is prevalent today because of its pragmatic trade-off between security and efficiency is simply no longer acceptable tomorrow?

thanks,
koen.