idnits 2.17.1 

draft-ietf-avtcore-srtp-vbr-audio-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (December 30, 2011) is 4500 days in the past.  Is this
     intentional?


  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                         C. Perkins
3	Internet-Draft                                     University of Glasgow
4	Intended status: BCP                                           JM. Valin
5	Expires: July 2, 2012                                       Octasic Inc.
6	                                                       December 30, 2011

8	   Guidelines for the use of Variable Bit Rate Audio with Secure RTP
9	                draft-ietf-avtcore-srtp-vbr-audio-04.txt

11	Abstract

13	   This memo discusses potential security issues that arise when using
14	   variable bit rate audio with the secure RTP profile.  Guidelines to
15	   mitigate these issues are suggested.

17	Status of this Memo

19	   This Internet-Draft is submitted in full conformance with the
20	   provisions of BCP 78 and BCP 79.

22	   Internet-Drafts are working documents of the Internet Engineering
23	   Task Force (IETF).  Note that other groups may also distribute
24	   working documents as Internet-Drafts.  The list of current Internet-
25	   Drafts is at http://datatracker.ietf.org/drafts/current/.

27	   Internet-Drafts are draft documents valid for a maximum of six months
28	   and may be updated, replaced, or obsoleted by other documents at any
29	   time.  It is inappropriate to use Internet-Drafts as reference
30	   material or to cite them other than as "work in progress."

32	   This Internet-Draft will expire on July 2, 2012.

34	Copyright Notice

36	   Copyright (c) 2011 IETF Trust and the persons identified as the
37	   document authors.  All rights reserved.

39	   This document is subject to BCP 78 and the IETF Trust's Legal
40	   Provisions Relating to IETF Documents
41	   (http://trustee.ietf.org/license-info) in effect on the date of
42	   publication of this document.  Please review these documents
43	   carefully, as they describe your rights and restrictions with respect
44	   to this document.  Code Components extracted from this document must
45	   include Simplified BSD License text as described in Section 4.e of
46	   the Trust Legal Provisions and are provided without warranty as
47	   described in the Simplified BSD License.

49	Table of Contents

51	   1.  Introduction
52	   2.  Scenario-Dependent Risk
53	   3.  Guidelines for use of VBR Audio with SRTP
54	   4.  Guidelines for use of Voice Activity Detection with SRTP
55	   5.  Padding the output of VBR codecs
56	   6.  Security Considerations
57	   7.  IANA Considerations
58	   8.  Acknowledgements
59	   9.  References
60	     9.1.  Normative References
61	     9.2.  Informative References
62	   Authors' Addresses

64	1.  Introduction

66	   The secure RTP framework (SRTP) [RFC3711] is a widely used framework
67	   for securing RTP [RFC3550] sessions.  SRTP provides the ability to
68	   encrypt the payload of an RTP packet, and optionally add an
69	   authentication tag, while leaving the RTP header and any header
70	   extension in the clear.  A range of encryption transforms can be used
71	   with SRTP, but none of the pre-defined encryption transforms use any
72	   padding; the RTP and SRTP payload sizes match exactly.

74	   When using SRTP with voice streams compressed using variable bit rate
75	   (VBR) codecs, the length of the compressed packets will therefore
76	   depend on the characteristics of the speech signal.  This variation
77	   in packet size will leak a small amount of information about the
78	   contents of the speech signal.  This is potentially a security risk
79	   for some applications.  For example, [spot-me] shows that known
80	   phrases in an encrypted call using the Speex codec in VBR mode can be
81	   recognised with high accuracy in certain circumstances, and [fon-iks]
82	   shows that approximate transcripts of encrypted VBR calls can be
83	   derived for some codecs without breaking the encryption.  How
84	   significant these results are, and how they generalise to other
85	   codecs, is still an open question.  This memo discusses ways in which
86	   such traffic analysis risks may be mitigated.

88	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
89	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
90	   document are to be interpreted as described in RFC 2119 [RFC2119].

92	2.  Scenario-Dependent Risk

94	   Whether the information leaks and attacks discussed in [spot-me],
95	   [fon-iks], and similar works are significant is highly dependent on
96	   the application and use scenario.  In the worst case, using the rate
97	   information to recognize a pre-recorded message knowing the set of
98	   all possible messages would lead to near-perfect accuracy.  Even when
99	   the audio is not pre-recorded, there is a real possibility of being
100	   able to recognize contents from encrypted audio when the dialog is
101	   highly structured (e.g., when the eavesdropper knows that only a
102	   handful of sentences are possible), and thus contains
103	   little information.  Recognizing unconstrained conversational speech
104	   from the rate information alone is unreliable and computationally
105	   expensive at present, but does appear possible in some circumstances.
106	   These attacks are only likely to improve over time.

108	   In practical SRTP scenarios, it must also be considered how
109	   significant the information leak is when compared to other SRTP-
110	   related information, such as the fact that the source and destination
111	   IP addresses are available.

113	3.  Guidelines for use of VBR Audio with SRTP

115	   It is the responsibility of the application designer to determine the
116	   appropriate trade-off between security and bandwidth overhead.  As a
117	   general rule, VBR codecs should be considered safe in the context of
118	   low-value encrypted unstructured calls.  However, applications that
119	   make use of pre-recorded messages where the contents of such pre-
120	   recorded messages may be of any value to an eavesdropper (i.e.,
121	   messages beyond standard greeting messages) SHOULD NOT use codecs in
122	   VBR mode.  Interactive voice response (IVR) applications would be
123	   particularly vulnerable since an eavesdropper could use the
124	   rate information to easily recognize the prompts being played out.
125	   Applications conveying highly sensitive unstructured information
126	   SHOULD NOT use codecs in VBR mode.

128	   It is safe to use variable rate coding to adapt the output of a voice
129	   codec to match characteristics of a network channel, for example for
130	   congestion control purposes, provided this adaptation is done in a way
131	   that does not expose any information on the speech signal.  That is,
132	   the variation has to be driven by the available network bandwidth, not
133	   by the input speech (i.e., the packet sizes and spacing are constant
134	   unless the network conditions change).  VBR speech codecs can safely
135	   be used in this fashion with SRTP while avoiding leaking information
136	   on the contents of the speech signal that might be useful for traffic
137	   analysis.
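
As an illustration of the distinction drawn above, the following Python sketch selects the codec rate from a network bandwidth estimate only, so that the packet size never depends on the speech input. The bitrate ladder, the 20 ms frame duration, and the function names are assumptions made for this example; they are not taken from this memo or from any particular codec.

```python
# Hypothetical bitrate ladder (bits per second) and frame duration.
BITRATES_BPS = [8000, 16000, 24000, 32000]
FRAME_MS = 20


def select_bitrate(available_bps: int) -> int:
    """Pick the highest configured bitrate that fits the network estimate.

    The speech signal is deliberately not an input: the choice depends
    only on the available bandwidth.
    """
    candidates = [b for b in BITRATES_BPS if b <= available_bps]
    return max(candidates) if candidates else min(BITRATES_BPS)


def frame_octets(bitrate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Size in octets of one fixed-rate frame at the given bitrate."""
    return bitrate_bps * frame_ms // 8000


# Every packet sent while the estimate is 20 kbit/s has the same size.
size = frame_octets(select_bitrate(20000))
```

With this structure, an observer who sees the packet size change learns something about the network path, but nothing about what was said.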

139	4.  Guidelines for use of Voice Activity Detection with SRTP

141	   Many speech codecs employ some form of voice activity detection (VAD)
142	   to either suppress output frames, or generate some form of lower-rate
143	   comfort noise frames, during periods when the speaker is not active.
144	   If VAD is used on an encrypted speech signal, then some information
145	   about the characteristics of that speech signal can be determined by
146	   watching the patterns of voice activity.  This information leakage is
147	   less than with VBR coding since there are only two rates possible.

149	   The information leakage due to VAD in SRTP audio sessions can be much
150	   reduced if the sender adds an unpredictable "overhang" period to the
151	   end of active speech intervals, so obscuring their actual length.  An
152	   RTP sender using VAD with encrypted SRTP audio SHOULD insert such an
153	   overhang period at the end of each talkspurt, delaying the start of
154	   the silence/comfort noise by a random interval.  The length of the
155	   overhang applied to each talkspurt must be randomly chosen in such a
156	   way that it is computationally infeasible for an attacker to reliably
157	   estimate the length of that talkspurt.  This may be more important
158	   for short talkspurts, since it seems easier to distinguish between
159	   different single-word responses based on the exact word length, than
160	   to glean meaning from the duration of a longer phrase.  The audio
161	   data comprising the overhang period must be packetised and
162	   transmitted in RTP packets in a manner that is indistinguishable from
163	   the other data in the talkspurt.

165	   The overhang period SHOULD have an exponentially-decreasing
166	   probability distribution function.  This ensures a long tail, while
167	   being easy to compute.  It is RECOMMENDED to use an overhang with a
168	   "half life" of a few hundred milliseconds (this should be sufficient
169	   to obscure the presence of inter-word pauses and the lengths of
170	   single words spoken in isolation, for example the digits of a credit
171	   card number clearly enunciated for an automated system, but not so
172	   long as to significantly reduce the effectiveness of VAD for
173	   detecting listening pauses).  Despite the overhang (and no matter
174	   what the duration is), there is still a small amount of information
175	   leaked about the start time of the talkspurt due to the fact that we
176	   cannot apply an overhang to the start of a talkspurt without
177	   unacceptably affecting intelligibility.  For that reason, VAD SHOULD
178	   NOT be used in encrypted IVR applications where the content of pre-
179	   recorded messages may be of any value to an eavesdropper.
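
A minimal sketch of how such an overhang might be drawn is shown below. It samples from an exponential distribution using the operating system's cryptographic random source. The 300 ms half life and the 20 ms frame duration are assumed values chosen for illustration (the text above says only "a few hundred milliseconds"), and the function names are hypothetical.

```python
import math
import secrets

HALF_LIFE_MS = 300.0  # assumed; the memo recommends "a few hundred" ms
FRAME_MS = 20         # assumed codec frame duration

# SystemRandom draws from the OS entropy source rather than a seeded PRNG,
# so an observer cannot predict the sampled overhang lengths.
_rng = secrets.SystemRandom()


def overhang_ms(half_life_ms: float = HALF_LIFE_MS) -> float:
    """Sample an overhang duration with an exponential distribution.

    For an exponential distribution P(X > t) = exp(-rate * t), so the
    half life (the point where P = 0.5) gives rate = ln(2) / half_life.
    """
    rate = math.log(2.0) / half_life_ms
    return _rng.expovariate(rate)


def overhang_frames(half_life_ms: float = HALF_LIFE_MS,
                    frame_ms: int = FRAME_MS) -> int:
    """Number of extra full-rate frames to send after speech ends."""
    return math.ceil(overhang_ms(half_life_ms) / frame_ms)
```

A sender would call overhang_frames() once at the end of each talkspurt and keep encoding and sending frames in the same way as active speech until that many frames have been sent.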

181	   The application of a random overhang period to each talkspurt will
182	   reduce the effectiveness of VAD in SRTP sessions when compared to
183	   non-SRTP sessions.  It is, however, still expected that the use of
184	   VAD will provide a significant bandwidth saving for many encrypted
185	   sessions.

187	5.  Padding the output of VBR codecs

189	   For scenarios where VBR is considered unsafe, a constant bit rate
190	   (CBR) codec SHOULD be negotiated and used instead, or the VBR codec
191	   SHOULD be operated in a CBR mode.  However, if the codec does not
192	   support CBR, RTP padding SHOULD be used to reduce the information
193	   leak to an insignificant level.  Packets may be padded to a constant
194	   size or to a small range of sizes ([spot-me] achieves good results by
195	   padding to the next multiple of 16 octets, but the amount of padding
196	   needed to hide the variation in packet size will depend on the codec
197	   and the sophistication of the attacker), or may be padded to a size
198	   that varies with time.  The most secure, and RECOMMENDED, option is
199	   to pad all packets throughout the call to the same size.
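
The two padding strategies described above can be sketched as follows. This is an illustration only: the function names, the example payload sizes, and the 64-octet target are assumptions, and the 16-octet block is the value reported in [spot-me] rather than a recommendation of this memo.

```python
def pad_to_multiple(payload_len: int, block: int = 16) -> int:
    """Round a payload size up to the next multiple of `block` octets."""
    if payload_len < 0 or block <= 0:
        raise ValueError("payload_len must be >= 0 and block must be > 0")
    return ((payload_len + block - 1) // block) * block


def pad_to_constant(payload_len: int, target: int) -> int:
    """Pad every payload to one fixed size for the whole call."""
    if payload_len > target:
        raise ValueError("payload is larger than the fixed target size")
    return target


sizes = [10, 23, 38, 41, 62]                       # example VBR frame sizes
bucketed = [pad_to_multiple(n) for n in sizes]     # [16, 32, 48, 48, 64]
constant = [pad_to_constant(n, 64) for n in sizes] # [64, 64, 64, 64, 64]
```

Padding to a multiple of the block size reduces the number of distinct sizes an observer can see, while padding to a constant size removes the variation entirely at the cost of more overhead.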

201	   In the case where the size of the padded packets varies in time, the
202	   same concerns as for VAD apply.  That is, the padding SHOULD NOT be
203	   reduced without waiting for a certain (random) time.  The RECOMMENDED
204	   "hold time" is the same as the one for VAD.
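
One possible way to implement such a hold time is sketched below. The padded size is raised immediately when a larger packet is needed, but is only lowered after a randomly chosen hold period has expired. The class name, the 300 ms half life, and the use of an exponential distribution for the hold time (mirroring the VAD overhang) are assumptions made for this example.

```python
import math
import secrets


class PaddingLevel:
    """Tracks the current padded packet size with a random hold time."""

    def __init__(self, half_life_ms: float = 300.0, rng=None):
        # rng is injectable for testing; by default use the OS CSPRNG.
        self._rng = rng if rng is not None else secrets.SystemRandom()
        self._rate = math.log(2.0) / half_life_ms
        self.size = 0
        self._release_at = None  # time at which a reduction is allowed

    def update(self, needed: int, now_ms: float) -> int:
        """Return the size to pad to, given the size currently needed."""
        if needed >= self.size:
            # Increases take effect at once and cancel any pending hold.
            self.size = needed
            self._release_at = None
        elif self._release_at is None:
            # First smaller packet: start a random hold, keep the old size.
            self._release_at = now_ms + self._rng.expovariate(self._rate)
        elif now_ms >= self._release_at:
            # Hold expired: the reduction is now allowed.
            self.size = needed
            self._release_at = None
        return self.size
```

The sender would call update() once per outgoing packet, passing the size that packet requires and the current time, and pad the packet to the returned size.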

206	   Note that SRTP encrypts the count of the number of octets of padding
207	   added to a packet, but not the bit in the RTP header that indicates
208	   that the packet has been padded.  For this reason, it is RECOMMENDED
209	   to add at least one octet of padding to all packets in a media
210	   stream, so an attacker cannot tell which packets needed padding.
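
The following sketch shows one way to apply RTP padding as defined in [RFC3550] so that every packet carries at least one padding octet. It operates on the plaintext RTP packet before SRTP protection is applied. The function names and the 16-octet block size are assumptions for illustration; the padding bit position and the convention that the last padding octet holds the padding count (including itself) follow [RFC3550].

```python
RTP_PADDING_BIT = 0x20   # P bit in the first octet of the RTP header
RTP_MIN_HEADER = 12      # fixed RTP header length in octets


def padded_target(packet_len: int, block: int = 16) -> int:
    """Next multiple of `block` strictly greater than packet_len,
    so that at least one octet of padding is always added."""
    return (packet_len // block + 1) * block


def pad_rtp_packet(packet: bytes, target_len: int) -> bytes:
    """Return `packet` with RTP padding appended up to target_len."""
    if len(packet) < RTP_MIN_HEADER:
        raise ValueError("packet is shorter than an RTP header")
    if packet[0] & RTP_PADDING_BIT:
        raise ValueError("packet is already padded")
    pad = target_len - len(packet)
    if not 1 <= pad <= 255:
        # The count is carried in a single octet, so 1..255 octets only.
        raise ValueError("padding must be between 1 and 255 octets")
    first = packet[0] | RTP_PADDING_BIT
    # Zero-filled padding, with the final octet holding the count.
    return bytes([first]) + packet[1:] + bytes(pad - 1) + bytes([pad])
```

Because the padding bit is then set on every packet, its value tells an observer nothing about which packets were shorter than the padded size.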

212	6.  Security Considerations

214	   This entire memo is about security.  The security considerations of
215	   [RFC3711] also apply.

217	7.  IANA Considerations

219	   No IANA actions are required.

221	8.  Acknowledgements

223	   ZRTP [RFC6189] contains similar recommendations; the purpose of this
224	   memo is to highlight these issues to a wider audience, since they are
225	   not specific to ZRTP.  Thanks are due to Phil Zimmermann, Stefan
226	   Doehla, Mats Naslund, Gregory Maxwell, David McGrew, Mark Baugher,
227	   Koen Vos, Ingemar Johansson, and Stephen Farrell for their comments
228	   and feedback on this memo.

230	9.  References

232	9.1.  Normative References

234	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
235	              Requirement Levels", BCP 14, RFC 2119, March 1997.

237	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
238	              Jacobson, "RTP: A Transport Protocol for Real-Time
239	              Applications", STD 64, RFC 3550, July 2003.

241	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
242	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
243	              RFC 3711, March 2004.

245	9.2.  Informative References

247	   [RFC6189]  Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media
248	              Path Key Agreement for Unicast Secure RTP", RFC 6189,
249	              April 2011.

251	   [fon-iks]  White, A., Matthews, A., Snow, K., and F. Monrose,
252	              "Phonotactic Reconstruction of Encrypted VoIP
253	              Conversations: Hookt on fon-iks", Proceedings of the IEEE
254	              Symposium on Security and Privacy 2011, May 2011.

256	   [spot-me]  Wright, C., Ballard, L., Coull, S., Monrose, F., and G.
257	              Masson, "Spot me if you can: Uncovering spoken phrases in
258	              encrypted VoIP conversation", Proceedings of the IEEE
259	              Symposium on Security and Privacy 2008, May 2008.

261	Authors' Addresses

263	   Colin Perkins
264	   University of Glasgow
265	   School of Computing Science
266	   Glasgow  G12 8QQ
267	   UK

269	   Email: csp@csperkins.org

271	   Jean-Marc Valin
272	   Octasic Inc.
273	   4101 Molson Street, Suite 300
274	   Montreal, Quebec  H1Y 3L1
275	   Canada

277	   Email: Jean-Marc.Valin@octasic.com