Network Working Group                                         C. Perkins
Internet-Draft                                     University of Glasgow
Intended status: BCP                                           JM. Valin
Expires: October 30, 2011                                   Octasic Inc.
                                                          April 28, 2011

   Guidelines for the use of Variable Bit Rate Audio with Secure RTP
                draft-ietf-avtcore-srtp-vbr-audio-02.txt

Abstract

   This memo discusses potential security issues that arise when using
   variable bit rate audio with the secure RTP profile.  Guidelines to
   mitigate these issues are suggested.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 30, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Scenario-Dependent Risk
   3.  Guidelines for use of VBR Audio with SRTP
   4.  Guidelines for use of Voice Activity Detection with SRTP
   5.  Padding the output of VBR codecs
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   The secure RTP framework (SRTP) [RFC3711] is a widely used framework
   for securing RTP sessions.  SRTP provides the ability to encrypt the
   payload of an RTP packet, and optionally add an authentication tag,
   while leaving the RTP header and any header extension in the clear.
   A range of encryption transforms can be used with SRTP, but none of
   the pre-defined encryption transforms use any padding; the RTP and
   SRTP payload sizes match exactly.

   When using SRTP with voice streams compressed using variable bit rate
   (VBR) codecs, the length of the compressed packets will therefore
   depend on the characteristics of the speech signal.  This variation
   in packet size will leak a small amount of information about the
   contents of the speech signal.  For example, [spot-me] shows that
   known phrases in an encrypted call using the Speex codec in VBR mode
   can be recognised with high accuracy in certain circumstances,
   without breaking the encryption.  Other work, referenced from
   [spot-me], has shown that the language spoken in encrypted
   conversations can also be recognised.  This is potentially a security
   risk for some applications.  How significant these results are and
   how they generalise to other codecs is still an open question.  This
   memo discusses ways in which this traffic analysis risk may be
   mitigated.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Scenario-Dependent Risk

   Whether the information leak analysed in [spot-me] is significant
   depends heavily on the application.  In the worst case, using the
   rate information to recognize a pre-recorded message, knowing the set
   of all possible messages, would lead to near-perfect accuracy.  Even
   when the audio is not pre-recorded, there is a real possibility of
   being able to recognize contents from encrypted audio when the dialog
   is highly structured (e.g., when the eavesdropper knows that only a
   handful of sentences are possible) and thus contains little
   information.  At the other extreme, recognizing unconstrained
   conversational speech from the rate information alone appears to be
   highly unlikely at best.  In fact, such a task is already considered
   a hard problem even when one has access to the unencrypted audio.
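
   As an illustration of the worst case described above, the following
   minimal sketch matches an observed sequence of encrypted packet sizes
   against a known set of pre-recorded prompts.  The prompt names and
   all packet sizes are invented for illustration and are not
   measurements from any codec; the distance measure is likewise an
   assumption and is far simpler than the technique used in [spot-me].

```python
def closest_message(observed, candidates):
    """Return the name of the candidate whose packet-size sequence is
    closest to the observed one (sum of absolute differences)."""
    def distance(a, b):
        # Treat missing packets as size 0 so that length mismatches
        # are penalised.
        n = max(len(a), len(b))
        a = list(a) + [0] * (n - len(a))
        b = list(b) + [0] * (n - len(b))
        return sum(abs(x - y) for x, y in zip(a, b))

    return min(candidates,
               key=lambda name: distance(observed, candidates[name]))


# Invented payload sizes in octets, one per packet (hypothetical data).
PROMPTS = {
    "enter-pin": [38, 52, 47, 20, 20, 61, 44],
    "balance":   [41, 41, 58, 63, 20, 35, 50],
    "goodbye":   [55, 30, 20, 20, 48, 39, 27],
}

# Sizes an eavesdropper might observe; the payload itself stays
# encrypted and is never inspected.
observed = [41, 40, 58, 62, 20, 36, 50]
print(closest_message(observed, PROMPTS))  # prints "balance"
```

   The sketch uses only packet lengths, which is why encryption alone
   does not prevent this kind of matching when the set of possible
   messages is small and known.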

   In practical SRTP scenarios, it must also be considered how
   significant the information leak is when compared to other SRTP-
   related information, such as the fact that the source and destination
   IP addresses are available.

3.  Guidelines for use of VBR Audio with SRTP

   It is the responsibility of the application designer to determine the
   appropriate trade-off between security and bandwidth overhead.  As a
   general rule, VBR codecs should be considered safe in the context of
   encrypted one-to-one calls.  However, applications that make use of
   pre-recorded messages where the contents of such pre-recorded
   messages may be of any value to an eavesdropper (i.e., messages
   beyond standard greeting messages) SHOULD NOT use codecs in VBR mode.
   Interactive voice response (IVR) applications would be particularly
   vulnerable since an eavesdropper could easily use the rate
   information to recognize the prompts being played out.

   It is safe to use variable rate coding to adapt the output of a voice
   codec to match characteristics of a network channel, for example for
   congestion control purposes, provided this adaptation is done in a
   way that does not expose any information on the speech signal.  That
   is, the variation needs to be driven by the available network
   bandwidth, not by the input speech (i.e., the packet sizes and
   spacing are constant unless the network conditions change).  VBR
   speech codecs can safely be used in this fashion with SRTP while
   avoiding leaking information on the contents of the speech signal
   that might be useful for traffic analysis.
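
   One way to picture network-driven adaptation is a rate selector whose
   only input is the bandwidth estimate.  The sketch below is
   illustrative only: the function name, the 20 ms frame duration, the
   clamping limits, and the 50-octet per-packet overhead (20 octets
   IPv4, 8 UDP, 12 RTP, and a 10-octet SRTP authentication tag) are all
   assumptions, not values taken from this memo.

```python
def payload_octets_per_packet(available_bps, frame_ms=20,
                              overhead_octets=50,
                              min_octets=10, max_octets=160):
    """Pick a fixed payload size from the bandwidth estimate alone.

    No property of the speech signal is an input, so the packet size
    changes only when the network estimate changes.
    """
    # Octets that fit in one packet interval at the available rate.
    budget = (available_bps * frame_ms) // 8000
    # Subtract header overhead, then clamp to the codec's assumed range.
    return max(min_octets, min(max_octets, budget - overhead_octets))


print(payload_octets_per_packet(32000))  # prints 30
```

   The codec would then be asked to produce exactly this many octets for
   every frame until the bandwidth estimate changes.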

4.  Guidelines for use of Voice Activity Detection with SRTP

   Many speech codecs employ some form of voice activity detection (VAD)
   to either suppress output frames, or generate some form of lower-rate
   comfort noise frames, during periods when the speaker is not active.
   If VAD is used on an encrypted speech signal, then some information
   about the characteristics of that speech signal can be determined by
   watching the patterns of voice activity.  This information leakage is
   less than with VBR coding since there are only two rates possible.

   The information leakage due to VAD in SRTP audio sessions can be much
   reduced if the sender adds an unpredictable "overhang" period to the
   end of active speech intervals, thereby obscuring their actual
   length.  An RTP sender using VAD with encrypted SRTP audio SHOULD
   insert such an overhang period at the end of each talkspurt, delaying
   the start of the silence/comfort noise by a random interval.  The
   length of the overhang applied to each talkspurt must be randomly
   chosen in such a way that it is computationally infeasible for an
   attacker to reliably estimate the length of that talkspurt.  The
   audio data comprising the overhang period must be packetised and
   transmitted in RTP packets in a manner that is indistinguishable from
   the other data in the talkspurt.

   The overhang period SHOULD have an exponentially-decreasing
   probability distribution function.  This ensures a long tail, while
   being easy to compute.  It is RECOMMENDED to use an overhang with a
   "half life" of a few hundred milliseconds (this should be sufficient
   to obscure the presence of inter-word pauses and the lengths of
   single words spoken in isolation, for example the digits of a credit
   card number clearly enunciated for an automated system, but not so
   long as to significantly reduce the effectiveness of VAD for
   detecting listening pauses).  Despite the overhang (and no matter
   what the duration is), there is still a small amount of information
   leaked about the start time of the talkspurt due to the fact that we
   cannot apply an overhang to the start of a talkspurt without
   unacceptably affecting intelligibility.  For that reason, VAD SHOULD
   NOT be used in encrypted IVR applications where the content of pre-
   recorded messages may be of any value to an eavesdropper.
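
   A minimal sketch of drawing such an overhang is given below.  It
   assumes a 300 ms half life (one reading of "a few hundred
   milliseconds"), 20 ms frames, and the operating system's
   cryptographic random source; the function name and all parameter
   values are illustrative assumptions.  For an exponential distribution
   the half life is the median, so the rate parameter is ln(2) divided
   by the half life.

```python
import math
import secrets

# Random source backed by the operating system's CSPRNG, so overhang
# lengths are not predictable from earlier ones.
_RNG = secrets.SystemRandom()


def overhang_frames(half_life_ms=300.0, frame_ms=20.0, rng=_RNG):
    """Draw an overhang length, in whole frames, from an exponential
    distribution whose median equals half_life_ms."""
    rate = math.log(2.0) / half_life_ms   # events per millisecond
    overhang_ms = rng.expovariate(rate)
    # Round up to whole frames so the overhang is sent as ordinary
    # packets of the talkspurt.
    return math.ceil(overhang_ms / frame_ms)


# Usage: call once at the end of each talkspurt and keep sending
# speech frames for that many extra frame intervals.
extra = overhang_frames()
```

   Rounding to whole frames quantises the distribution slightly but
   keeps the long tail, since no upper bound is applied.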

   The application of a random overhang period to each talkspurt will
   reduce the effectiveness of VAD in SRTP sessions when compared to
   non-SRTP sessions.  It is, however, still expected that the use of
   VAD will provide a significant bandwidth saving for many encrypted
   sessions.

5.  Padding the output of VBR codecs

   For scenarios where VBR is considered unsafe, the codec SHOULD be
   operated in constant bit rate (CBR) mode.  However, if the codec does
   not support CBR, RTP padding SHOULD be used to reduce the information
   leak to an insignificant level.  Packets may be padded to a constant
   size ([spot-me] achieves good results by padding to the next multiple
   of 16 octets, but the amount of padding needed to hide the variation
   in packet size will depend on the codec), or may be padded to a size
   that varies with time.  In the case where the size of the padded
   packets varies in time, the same concerns as for VAD apply.  That is,
   the padding SHOULD NOT be reduced without waiting for a certain
   (random) time.  The RECOMMENDED "hold time" is the same as the one
   for VAD.
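
   The hold time for a time-varying padded size could be tracked as in
   the sketch below.  The class name, the policy of growing immediately
   while shrinking only after an exponentially distributed hold, and
   the 300 ms half life are assumptions chosen to mirror the VAD
   overhang; this memo does not prescribe a particular mechanism.

```python
import math
import secrets


class PaddedSizeHold:
    """Track the padded packet size: grow at once, shrink only after a
    random hold time has elapsed."""

    def __init__(self, half_life_ms=300.0, rng=None):
        self._rate = math.log(2.0) / half_life_ms
        self._rng = rng or secrets.SystemRandom()
        self.target = 0          # current padded size in octets
        self._deadline = None    # time at which a reduction is allowed
        self._peak = 0           # largest size needed during the hold

    def update(self, needed, now_ms):
        """Return the size to pad to, given the size the next packet
        needs and the current time in milliseconds."""
        if needed >= self.target:
            # Growing (or staying level) cancels any pending reduction.
            self.target = needed
            self._deadline = None
            return self.target
        if self._deadline is None:
            # First smaller packet: start a random hold period.
            self._deadline = now_ms + self._rng.expovariate(self._rate)
            self._peak = needed
        else:
            self._peak = max(self._peak, needed)
        if now_ms >= self._deadline:
            # Hold expired: shrink to the largest size seen meanwhile.
            self.target = self._peak
            self._deadline = None
        return self.target
```

   A sender would call update() once per packet and pad the packet to
   the returned size.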

   Note that SRTP encrypts the count of the number of octets of padding
   added to a packet, but not the bit in the RTP header that indicates
   that the packet has been padded.  For this reason, it is RECOMMENDED
   to add at least one octet of padding to all packets in a media
   stream, so an attacker cannot tell which packets needed padding.
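
   The two padding rules above can be combined as in the following
   sketch, which pads every payload up to the next multiple of a block
   size and always adds at least one octet.  It follows the RTP padding
   layout, in which the final padding octet carries the padding count
   and the P bit (0x20 in the first header octet) marks the packet as
   padded.  The function names and the 16-octet block size are
   illustrative assumptions, and the right block size depends on the
   codec.

```python
def padding_length(payload_len, block=16):
    """Number of padding octets, always between 1 and block, so every
    packet is padded and the P bit carries no information."""
    if not 1 <= block <= 255:
        raise ValueError("block must fit in the one-octet padding count")
    return block - (payload_len % block)


def pad_rtp_payload(payload, block=16):
    """Append RTP padding: zero octets followed by a count octet that
    includes itself.  SRTP encrypts this padding with the payload."""
    n = padding_length(len(payload), block)
    return payload + bytes(n - 1) + bytes([n])


def set_padding_bit(first_header_octet):
    """Set the P bit in the first octet of the RTP header."""
    return first_header_octet | 0x20


padded = pad_rtp_payload(b"\x01" * 37)
print(len(padded), padded[-1])  # prints "48 11"
```

   A payload that is already a multiple of the block size receives a
   full extra block, which is what keeps the padding count at one or
   more for every packet.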

6.  Security Considerations

   The security considerations of [RFC3711] apply.

7.  IANA Considerations

   No IANA actions are required.

8.  Acknowledgements

   This memo is based on the discussion in [spot-me].  ZRTP [RFC6189]
   contains a similar recommendation; the purpose of this memo is to
   highlight these issues to a wider audience, since they are not
   specific to ZRTP.  Thanks are due to Phil Zimmermann, Stefan Doehla,
   Mats Naslund, Gregory Maxwell, David McGrew, Mark Baugher, Koen Vos,
   and Ingemar Johansson for their comments and feedback on this memo.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

9.2.  Informative References

   [RFC6189]  Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media
              Path Key Agreement for Unicast Secure RTP", RFC 6189,
              April 2011.

   [spot-me]  Wright, C., Ballard, L., Coull, S., Monrose, F., and G.
              Masson, "Spot me if you can: Uncovering spoken phrases in
              encrypted VoIP conversation", Proceedings of the IEEE
              Symposium on Security and Privacy 2008, May 2008.

Authors' Addresses

   Colin Perkins
   University of Glasgow
   School of Computing Science
   Glasgow  G12 8QQ
   UK

   Email: csp@csperkins.org

   Jean-Marc Valin
   Octasic Inc.
   4101 Molson Street, Suite 300
   Montreal, Quebec  H1Y 3L1
   Canada

   Email: Jean-Marc.Valin@octasic.com