Network Working Group                                         C. Perkins
Internet-Draft                                     University of Glasgow
Intended status: BCP                                           JM. Valin
Expires: July 2, 2012                                       Octasic Inc.
                                                       December 30, 2011

   Guidelines for the use of Variable Bit Rate Audio with Secure RTP
                draft-ietf-avtcore-srtp-vbr-audio-04.txt

Abstract

   This memo discusses potential security issues that arise when using
   variable bit rate audio with the secure RTP profile. Guidelines to
   mitigate these issues are suggested.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 2, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Scenario-Dependent Risk
   3. Guidelines for use of VBR Audio with SRTP
   4. Guidelines for use of Voice Activity Detection with SRTP
   5. Padding the output of VBR codecs
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgements
   9. References
      9.1. Normative References
      9.2. Informative References
   Authors' Addresses

1. Introduction

   The secure RTP framework (SRTP) [RFC3711] is a widely used framework
   for securing RTP [RFC3550] sessions. SRTP provides the ability to
   encrypt the payload of an RTP packet, and optionally add an
   authentication tag, while leaving the RTP header and any header
   extension in the clear. A range of encryption transforms can be used
   with SRTP, but none of the pre-defined encryption transforms use any
   padding; the RTP and SRTP payload sizes match exactly.

   When using SRTP with voice streams compressed using variable bit rate
   (VBR) codecs, the length of the compressed packets will therefore
   depend on the characteristics of the speech signal. This variation
   in packet size will leak a small amount of information about the
   contents of the speech signal. This is potentially a security risk
   for some applications. For example, [spot-me] shows that known
   phrases in an encrypted call using the Speex codec in VBR mode can be
   recognised with high accuracy in certain circumstances, and [fon-iks]
   shows that approximate transcripts of encrypted VBR calls can be
   derived for some codecs without breaking the encryption. How
   significant these results are, and how they generalise to other
   codecs, is still an open question. This memo discusses ways in which
   such traffic analysis risks may be mitigated.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Scenario-Dependent Risk

   Whether the information leaks and attacks discussed in [spot-me],
   [fon-iks], and similar works are significant is highly dependent on
   the application and use scenario.
   In the worst case, an attacker who knows the set of all possible
   messages could use the rate information to recognize a pre-recorded
   message with near-perfect accuracy. Even when the audio is not
   pre-recorded, there is a real possibility of being able to recognize
   contents from encrypted audio when the dialog is highly structured
   (e.g., when the eavesdropper knows that only a handful of sentences
   are possible), and thus contains little information. Recognizing
   unconstrained conversational speech from the rate information alone
   is unreliable and computationally expensive at present, but does
   appear possible in some circumstances. These attacks are only likely
   to improve over time.

   In practical SRTP scenarios, it is also necessary to consider how
   significant the information leak is when compared to other
   SRTP-related information, such as the fact that the source and
   destination IP addresses are available.

3. Guidelines for use of VBR Audio with SRTP

   It is the responsibility of the application designer to determine the
   appropriate trade-off between security and bandwidth overhead. As a
   general rule, VBR codecs should be considered safe in the context of
   low-value encrypted unstructured calls. However, applications that
   make use of pre-recorded messages where the contents of such
   pre-recorded messages may be of any value to an eavesdropper (i.e.,
   messages beyond standard greeting messages) SHOULD NOT use codecs in
   VBR mode. Interactive voice response (IVR) applications would be
   particularly vulnerable since an eavesdropper could easily use the
   rate information to recognize the prompts being played out.
   Applications conveying highly sensitive unstructured information
   SHOULD NOT use codecs in VBR mode.

   It is safe to use variable rate coding to adapt the output of a voice
   codec to match characteristics of a network channel, for example for
   congestion control purposes, provided this adaptation is done in a
   way that does not expose any information on the speech signal. This
   is the case if the variation is driven by the available network
   bandwidth, not by the input speech (i.e., if the packet sizes and
   spacing are constant unless the network conditions change). VBR
   speech codecs can safely be used in this fashion with SRTP while
   avoiding leaking information on the contents of the speech signal
   that might be useful for traffic analysis.

4. Guidelines for use of Voice Activity Detection with SRTP

   Many speech codecs employ some form of voice activity detection (VAD)
   to either suppress output frames, or generate some form of lower-rate
   comfort noise frames, during periods when the speaker is not active.
   If VAD is used on an encrypted speech signal, then some information
   about the characteristics of that speech signal can be determined by
   watching the patterns of voice activity. This information leakage is
   less than with VBR coding since there are only two rates possible.

   The information leakage due to VAD in SRTP audio sessions can be much
   reduced if the sender adds an unpredictable "overhang" period to the
   end of active speech intervals, thereby obscuring their actual
   length. An RTP sender using VAD with encrypted SRTP audio SHOULD
   insert such an overhang period at the end of each talkspurt, delaying
   the start of the silence/comfort noise by a random interval. The
   length of the overhang applied to each talkspurt must be randomly
   chosen in such a way that it is computationally infeasible for an
   attacker to reliably estimate the length of that talkspurt.
   This may be more important for short talkspurts, since it seems
   easier to distinguish between different single word responses based
   on the exact word length, than to glean meaning from the duration of
   a longer phrase. The audio data comprising the overhang period must
   be packetised and transmitted in RTP packets in a manner that is
   indistinguishable from the other data in the talkspurt.

   The overhang period SHOULD have an exponentially-decreasing
   probability distribution function. This ensures a long tail, while
   being easy to compute. It is RECOMMENDED to use an overhang with a
   "half life" of a few hundred milliseconds (this should be sufficient
   to obscure the presence of inter-word pauses and the lengths of
   single words spoken in isolation, for example the digits of a credit
   card number clearly enunciated for an automated system, but not so
   long as to significantly reduce the effectiveness of VAD for
   detecting listening pauses). Despite the overhang (and no matter
   what its duration is), a small amount of information is still leaked
   about the start time of the talkspurt, because an overhang cannot be
   applied to the start of a talkspurt without unacceptably affecting
   intelligibility. For that reason, VAD SHOULD NOT be used in
   encrypted IVR applications where the content of pre-recorded messages
   may be of any value to an eavesdropper.

   The application of a random overhang period to each talkspurt will
   reduce the effectiveness of VAD in SRTP sessions when compared to
   non-SRTP sessions. It is, however, still expected that the use of
   VAD will provide a significant bandwidth saving for many encrypted
   sessions.

5. Padding the output of VBR codecs

   For scenarios where VBR is considered unsafe, a constant bit rate
   (CBR) codec SHOULD be negotiated and used instead, or the VBR codec
   SHOULD be operated in a CBR mode.
   However, if the codec does not support CBR, RTP padding SHOULD be
   used to reduce the information leak to an insignificant level.
   Packets may be padded to a constant size or to a small range of sizes
   ([spot-me] achieves good results by padding to the next multiple of
   16 octets, but the amount of padding needed to hide the variation in
   packet size will depend on the codec and the sophistication of the
   attacker), or may be padded to a size that varies with time. The
   most secure, and RECOMMENDED, option is to pad all packets throughout
   the call to the same size.

   In the case where the size of the padded packets varies in time, the
   same concerns as for VAD apply. That is, the padding SHOULD NOT be
   reduced without waiting for a certain (random) time. The RECOMMENDED
   "hold time" is the same as the one for VAD.

   Note that SRTP encrypts the count of the number of octets of padding
   added to a packet, but not the bit in the RTP header that indicates
   that the packet has been padded. For this reason, it is RECOMMENDED
   to add at least one octet of padding to all packets in a media
   stream, so an attacker cannot tell which packets needed padding.

6. Security Considerations

   This entire memo is about security. The security considerations of
   [RFC3711] also apply.

7. IANA Considerations

   No IANA actions are required.

8. Acknowledgements

   ZRTP [RFC6189] contains similar recommendations; the purpose of this
   memo is to highlight these issues to a wider audience, since they are
   not specific to ZRTP. Thanks are due to Phil Zimmermann, Stefan
   Doehla, Mats Naslund, Gregory Maxwell, David McGrew, Mark Baugher,
   Koen Vos, Ingemar Johansson, and Stephen Farrell for their comments
   and feedback on this memo.

9. References

9.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

9.2. Informative References

   [RFC6189]  Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media
              Path Key Agreement for Unicast Secure RTP", RFC 6189,
              April 2011.

   [fon-iks]  White, A., Matthews, A., Snow, K., and F. Monrose,
              "Phonotactic Reconstruction of Encrypted VoIP
              Conversations: Hookt on fon-iks", Proceedings of the IEEE
              Symposium on Security and Privacy 2011, May 2011.

   [spot-me]  Wright, C., Ballard, L., Coull, S., Monrose, F., and G.
              Masson, "Spot me if you can: Uncovering spoken phrases in
              encrypted VoIP conversation", Proceedings of the IEEE
              Symposium on Security and Privacy 2008, May 2008.

Authors' Addresses

   Colin Perkins
   University of Glasgow
   School of Computing Science
   Glasgow G12 8QQ
   UK

   Email: csp@csperkins.org

   Jean-Marc Valin
   Octasic Inc.
   4101 Molson Street, Suite 300
   Montreal, Quebec H1Y 3L1
   Canada

   Email: Jean-Marc.Valin@octasic.com
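
   The following Python fragment is a non-normative sketch of the
   overhang guidance in Section 4 and the padding guidance in Section 5.
   It is one possible reading of that text, not a reference
   implementation. The function names, the 300 ms half life, the 20 ms
   frame duration, and the use of the operating system's random source
   are all assumptions made for illustration; the memo itself only asks
   for an exponentially-decreasing distribution with a half life of "a
   few hundred milliseconds". The padding helper rounds up to a
   multiple of 16 octets as in [spot-me] and always adds at least one
   octet.

```python
import math
import secrets

# Assumed values for illustration only; none are specified by the memo.
HALF_LIFE_MS = 300.0  # one reading of "a few hundred milliseconds"
FRAME_MS = 20.0       # hypothetical codec frame duration
BLOCK_OCTETS = 16     # multiple used in [spot-me]; must not exceed 255

_rng = secrets.SystemRandom()  # OS-backed randomness, not a seeded PRNG


def overhang_frames(half_life_ms=HALF_LIFE_MS, frame_ms=FRAME_MS):
    """Extra frames to send as ordinary speech after a talkspurt ends."""
    rate = math.log(2) / half_life_ms     # P(overhang > half life) = 1/2
    overhang_ms = _rng.expovariate(rate)  # exponentially distributed delay
    return math.ceil(overhang_ms / frame_ms)


def padding_octets(payload_len, block=BLOCK_OCTETS):
    """Padding length in 1..block that rounds the payload up to a
    multiple of block. Never zero, so every packet carries padding."""
    return block - (payload_len % block)


def pad_payload(payload, block=BLOCK_OCTETS):
    """Append RTP padding; the final octet carries the padding count."""
    n = padding_octets(len(payload), block)
    return payload + bytes(n - 1) + bytes([n])
```

   In this sketch, a sender would call overhang_frames() once per
   talkspurt and keep encoding and sending frames as normal speech for
   that many frames before switching to silence or comfort noise.
   Padding to a multiple of 16 octets is the weaker of the options in
   Section 5; padding every packet to one constant size remains the
   RECOMMENDED choice and would replace padding_octets() with a fixed
   target length.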