Network Working Group                                         C. Perkins
Internet-Draft                                     University of Glasgow
Intended status: BCP                                           JM. Valin
Expires: October 30, 2011                                   Octasic Inc.
                                                          April 28, 2011


   Guidelines for the use of Variable Bit Rate Audio with Secure RTP
                draft-ietf-avtcore-srtp-vbr-audio-02.txt

Abstract

   This memo discusses potential security issues that arise when using
   variable bit rate audio with the secure RTP profile. Guidelines to
   mitigate these issues are suggested.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 30, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Scenario-Dependent Risk
   3.  Guidelines for use of VBR Audio with SRTP
   4.  Guidelines for use of Voice Activity Detection with SRTP
   5.  Padding the output of VBR codecs
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1. Introduction

   The secure RTP framework (SRTP) [RFC3711] is a widely used framework
   for securing RTP sessions. SRTP provides the ability to encrypt the
   payload of an RTP packet, and optionally add an authentication tag,
   while leaving the RTP header and any header extension in the clear.
   A range of encryption transforms can be used with SRTP, but none of
   the pre-defined encryption transforms use any padding; the RTP and
   SRTP payload sizes match exactly.

   When using SRTP with voice streams compressed using variable bit rate
   (VBR) codecs, the length of the compressed packets will therefore
   depend on the characteristics of the speech signal. This variation
   in packet size will leak a small amount of information about the
   contents of the speech signal. For example, [spot-me] shows that
   known phrases in an encrypted call using the Speex codec in VBR mode
   can be recognised with high accuracy in certain circumstances,
   without breaking the encryption. Other work, referenced from
   [spot-me], has shown that the language spoken in encrypted
   conversations can also be recognised. This is potentially a security
   risk for some applications. How significant these results are and
   how they generalise to other codecs is still an open question. This
   memo discusses ways in which this traffic analysis risk may be
   mitigated.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Scenario-Dependent Risk

   Whether the information leak analysed in [spot-me] is significant
   depends heavily on the application.
   In the worst case, using the rate information to recognize a
   pre-recorded message, knowing the set of all possible messages, would
   lead to near-perfect accuracy. Even when the audio is not
   pre-recorded, there is a real possibility of being able to recognize
   contents from encrypted audio when the dialog is highly structured
   (e.g., when the eavesdropper knows that only a handful of sentences
   are possible) and thus contains little information. At the other
   extreme, recognizing unconstrained conversational speech from the
   rate information alone appears highly unlikely. In fact, such a task
   is already considered a hard problem even when one has access to the
   unencrypted audio.

   In practical SRTP scenarios, it must also be considered how
   significant the information leak is when compared to other
   SRTP-related information, such as the fact that the source and
   destination IP addresses are available.

3. Guidelines for use of VBR Audio with SRTP

   It is the responsibility of the application designer to determine the
   appropriate trade-off between security and bandwidth overhead. As a
   general rule, VBR codecs should be considered safe in the context of
   encrypted one-to-one calls. However, applications that make use of
   pre-recorded messages where the contents of such pre-recorded
   messages may be of any value to an eavesdropper (i.e., messages
   beyond standard greeting messages) SHOULD NOT use codecs in VBR mode.
   IVR applications would be particularly vulnerable since an
   eavesdropper could easily use the rate information to recognize the
   prompts being played out.

   It is safe to use variable rate coding to adapt the output of a voice
   codec to match characteristics of a network channel, for example for
   congestion control purposes, provided this adaptation is done in a
   way that does not expose any information on the speech signal.
   That is, such adaptation is safe if the variation is driven by the
   available network bandwidth, not by the input speech (i.e., if the
   packet sizes and spacing are constant unless the network conditions
   change). VBR speech codecs can safely be used in this fashion with
   SRTP while avoiding leaking information on the contents of the speech
   signal that might be useful for traffic analysis.

4. Guidelines for use of Voice Activity Detection with SRTP

   Many speech codecs employ some form of voice activity detection (VAD)
   to either suppress output frames, or generate some form of lower-rate
   comfort noise frames, during periods when the speaker is not active.
   If VAD is used on an encrypted speech signal, then some information
   about the characteristics of that speech signal can be determined by
   watching the patterns of voice activity. This information leakage is
   less than with VBR coding since there are only two rates possible.

   The information leakage due to VAD in SRTP audio sessions can be much
   reduced if the sender adds an unpredictable "overhang" period to the
   end of active speech intervals, so obscuring their actual length. An
   RTP sender using VAD with encrypted SRTP audio SHOULD insert such an
   overhang period at the end of each talkspurt, delaying the start of
   the silence/comfort noise by a random interval. The length of the
   overhang applied to each talkspurt must be randomly chosen in such a
   way that it is computationally infeasible for an attacker to reliably
   estimate the length of that talkspurt. The audio data comprising the
   overhang period must be packetised and transmitted in RTP packets in
   a manner that is indistinguishable from the other data in the
   talkspurt.

   The overhang period SHOULD have an exponentially-decreasing
   probability distribution function. This ensures a long tail, while
   being easy to compute.
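
   As an illustration only, the overhang selection described above could
   be sketched as follows in Python. The function name, the 300 ms
   half-life, and the 20 ms frame duration are assumptions made for this
   example and are not values taken from this memo. The sketch draws
   from the operating system's cryptographic random source so that the
   chosen overhang is hard for an attacker to predict.

```python
import math
import secrets

# OS-backed cryptographically secure generator (assumption: adequate
# for making the overhang length unpredictable to an observer).
_rng = secrets.SystemRandom()


def overhang_frames(half_life_ms: float = 300.0,
                    frame_ms: float = 20.0) -> int:
    """Pick how many extra frames to send after a talkspurt ends.

    The overhang duration is exponentially distributed with the given
    half-life, i.e. P(overhang > half_life_ms) = 0.5. The result is
    rounded up to a whole number of codec frames.
    """
    if half_life_ms <= 0 or frame_ms <= 0:
        raise ValueError("half_life_ms and frame_ms must be positive")
    # An exponential distribution with rate ln(2)/h has half-life h.
    rate = math.log(2) / half_life_ms
    overhang_ms = _rng.expovariate(rate)
    return math.ceil(overhang_ms / frame_ms)
```

   With these assumed defaults, roughly half of all talkspurts would be
   extended by more than 15 frames (300 ms). The extra frames would
   still need to be encoded and sent exactly like active speech.
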
   It is RECOMMENDED to use an overhang with a "half life" of a few
   hundred milliseconds (this should be sufficient to obscure the
   presence of inter-word pauses and the lengths of single words spoken
   in isolation, for example the digits of a credit card number clearly
   enunciated for an automated system, but not so long as to
   significantly reduce the effectiveness of VAD for detecting listening
   pauses). Despite the overhang (and no matter what its duration is),
   a small amount of information is still leaked about the start time
   of the talkspurt, because an overhang cannot be applied to the start
   of a talkspurt without unacceptably affecting intelligibility. For
   that reason, VAD SHOULD NOT be used in encrypted IVR applications
   where the content of pre-recorded messages may be of any value to an
   eavesdropper.

   The application of a random overhang period to each talkspurt will
   reduce the effectiveness of VAD in SRTP sessions when compared to
   non-SRTP sessions. It is, however, still expected that the use of
   VAD will provide a significant bandwidth saving for many encrypted
   sessions.

5. Padding the output of VBR codecs

   For scenarios where VBR is considered unsafe, the codec SHOULD be
   operated in constant bit rate (CBR) mode. However, if the codec does
   not support CBR, RTP padding SHOULD be used to reduce the information
   leak to an insignificant level. Packets may be padded to a constant
   size ([spot-me] achieves good results by padding to the next multiple
   of 16 octets, but the amount of padding needed to hide the variation
   in packet size will depend on the codec), or may be padded to a size
   that varies with time. In the case where the size of the padded
   packets varies in time, the same concerns as for VAD apply. That is,
   the padding SHOULD NOT be reduced without waiting for a certain
   (random) time.
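
   A minimal sketch of padding to a block boundary is given below,
   assuming the 16-octet multiple reported in [spot-me]. The helper
   names are hypothetical. The final octet carries the padding count,
   which is how RTP padding is laid out in RFC 3550; setting the padding
   bit in the RTP header is left to the caller. The size is rounded to
   the next multiple strictly above the payload length, so every packet
   receives at least one padding octet, in line with the recommendation
   later in this section.

```python
def padded_size(payload_len: int, block: int = 16) -> int:
    """Smallest multiple of `block` strictly greater than payload_len."""
    # The padding count must fit in a single octet, so block <= 255.
    if payload_len < 0 or not 1 <= block <= 255:
        raise ValueError("invalid payload length or block size")
    return (payload_len // block + 1) * block


def pad_payload(payload: bytes, block: int = 16) -> bytes:
    """Append RTP-style padding so the result is a multiple of `block`.

    Between 1 and `block` octets are added. The last octet holds the
    number of padding octets, including itself; the rest are zero.
    """
    pad_len = padded_size(len(payload), block) - len(payload)
    return payload + bytes(pad_len - 1) + bytes([pad_len])
```

   For example, a 3-octet payload grows to 16 octets with a final octet
   of 13, and a 16-octet payload grows to 32 octets. Other block sizes
   may suit other codecs better.
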
   The RECOMMENDED "hold time" is the same as the one for VAD.

   Note that SRTP encrypts the count of the number of octets of padding
   added to a packet, but not the bit in the RTP header that indicates
   that the packet has been padded. For this reason, it is RECOMMENDED
   to add at least one octet of padding to all packets in a media
   stream, so an attacker cannot tell which packets needed padding.

6. Security Considerations

   The security considerations of [RFC3711] apply.

7. IANA Considerations

   No IANA actions are required.

8. Acknowledgements

   This memo is based on the discussion in [spot-me]. ZRTP [RFC6189]
   contains a similar recommendation; the purpose of this memo is to
   highlight these issues to a wider audience, since they are not
   specific to ZRTP. Thanks are due to Phil Zimmermann, Stefan Doehla,
   Mats Naslund, Gregory Maxwell, David McGrew, Mark Baugher, Koen Vos,
   and Ingemar Johansson for their comments and feedback on this memo.

9. References

9.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

9.2. Informative References

   [RFC6189]  Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media
              Path Key Agreement for Unicast Secure RTP", RFC 6189,
              April 2011.

   [spot-me]  Wright, C., Ballard, L., Coull, S., Monrose, F., and G.
              Masson, "Spot me if you can: Uncovering spoken phrases in
              encrypted VoIP conversation", Proceedings of the IEEE
              Symposium on Security and Privacy 2008, May 2008.

Authors' Addresses

   Colin Perkins
   University of Glasgow
   School of Computing Science
   Glasgow G12 8QQ
   UK

   Email: csp@csperkins.org


   Jean-Marc Valin
   Octasic Inc.
   4101 Molson Street, Suite 300
   Montreal, Quebec H1Y 3L1
   Canada

   Email: Jean-Marc.Valin@octasic.com