Network Working Group                                          R. Barnes
Internet-Draft
Intended status: Informational                               B. Schneier
Expires: August 24, 2015                                     C. Jennings
                                                               T. Hardie
                                                             B. Trammell
                                                              C. Huitema
                                                             D. Borkmann
                                                       February 20, 2015

  Confidentiality in the Face of Pervasive Surveillance: A Threat Model
                         and Problem Statement
                draft-iab-privsec-confidentiality-threat-03

Abstract

   Documents published since the initial revelations in 2013 have
   revealed several classes of pervasive surveillance attack on
   Internet communications.  In this document we develop a threat model
   that describes these pervasive attacks.  We start by assuming an
   attacker with an interest in undetected, indiscriminate
   eavesdropping, then expand the threat model with a set of verified
   attacks that have been published.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 24, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

1.  Introduction

   Starting in June 2013, documents released to the press by Edward
   Snowden have revealed several operations undertaken by intelligence
   agencies to exploit Internet communications for intelligence
   purposes.  These attacks were largely based on protocol
   vulnerabilities that were already known to exist.  The attacks were
   nonetheless striking in their pervasive nature, both in terms of the
   amount of Internet communications targeted and in terms of the
   diversity of attack techniques employed.

   To ensure that the Internet can be trusted by users, it is necessary
   for the Internet technical community to address the vulnerabilities
   exploited in these attacks [RFC7258].  The goal of this document is
   to describe more precisely the threats posed by these pervasive
   attacks and, based on those threats, to lay out the problems that
   need to be solved in order to secure the Internet in the face of
   them.

   The remainder of this document is structured as follows.  In
   Section 3, we describe an idealized flow access attacker, one which
   could compromise communications at Internet scale while remaining
   completely undetectable.  In Section 4, we provide a brief summary
   of some attacks that have been disclosed, and use these to expand
   the assumed capabilities of our idealized attacker.  Note that we do
   not attempt to describe all possible attacks, but focus on those
   which result in undetected eavesdropping.  Section 5 describes a
   threat model based on these attacks, focusing on classes of attack
   that have not been a focus of Internet engineering to date.

2.  Terminology

   This document makes extensive use of standard security and privacy
   terminology; see [RFC4949] and [RFC6973].  Terms used from [RFC6973]
   include Eavesdropper, Observer, Initiator, Intermediary, Recipient,
   Attack (in a privacy context), Correlation, Fingerprint, Traffic
   Analysis, and Identifiability (and related terms).  In addition, we
   use a few terms that are specific to the attacks discussed here:

   Flow Access Attack:  An eavesdropping attack in which the packets in
      a traffic stream between two endpoints are eavesdropped upon, but
      in which the attacker does not modify the packets in the traffic
      stream, modify the treatment of packets in the traffic stream
      (e.g., delay, routing), or add or remove packets in the traffic
      stream.  Flow access attacks are undetectable from the endpoints.

   Flow Modification Attack:  An attack which includes both
      eavesdropping (as in a flow access attack) and modification,
      addition, or removal of packets in a traffic stream, or
      modification of the treatment of packets in the traffic stream.
      Flow modification attacks provide more capabilities to the
      attacker, at the cost of possible detection at the endpoints.

   Pervasive Attack:  An attack on Internet communications that makes
      use of access at a large number of points in the network, or
      otherwise provides the attacker with access to a large amount of
      Internet traffic; see [RFC7258].

   Observation:  Information collected directly from communications by
      an eavesdropper or observer.  For example, the knowledge that
      <alice@example.com> sent a message to <bob@example.com>, taken
      from the headers of an observed SMTP message, would be an
      observation.

   Inference:  Information extracted from analysis of information
      collected directly from communications by an eavesdropper or
      observer.
      For example, the knowledge that a given web page was accessed by
      a given IP address, by comparing the size in octets of measured
      network flow records to fingerprints derived from known sizes of
      linked resources on the web servers involved, would be an
      inference.

   Collaborator:  An entity that is a legitimate participant in a
      communication, but who deliberately provides information about
      that interaction to an attacker.

   Unwitting Collaborator:  An entity that is a legitimate participant
      in a communication, and who is the source of information obtained
      by the attacker without the entity's consent or intention,
      because the attacker has exploited some technology used by the
      entity.

   Key Exfiltration:  The transmission of keying material for an
      encrypted communication from a collaborator, deliberately or
      unwittingly, to an attacker.

   Content Exfiltration:  The transmission of the content of a
      communication from a collaborator, deliberately or unwittingly,
      to an attacker.

3.  An Idealized Pervasive Flow Access Attacker

   In considering the threat posed by pervasive surveillance, we begin
   by defining an idealized pervasive flow access attacker.  While this
   attacker is less capable than those which, according to press
   reports, have actually compromised the Internet, as elaborated in
   Section 4, it does set a lower bound on the capabilities of an
   attacker interested in indiscriminate passive surveillance while
   remaining undetectable.  We note that, prior to the Snowden
   revelations in 2013, the assumptions of attacker capability
   presented here would have been considered on the border of paranoia
   outside the network security community.
   Our idealized attacker is an indiscriminate eavesdropper on an
   Internet-attached computer network that:

   o  can observe every packet of all communications at any hop in any
      network path between an initiator and a recipient;

   o  can observe data at rest in any intermediate system between the
      endpoints controlled by the initiator and recipient; and

   o  can share information with other such attackers; but

   o  takes no other action with respect to these communications (i.e.,
      blocking, modification, injection, etc.).

   The techniques available to our ideal attacker are direct
   observation and inference.  Direct observation involves taking
   information directly from eavesdropped communications, e.g., URLs
   identifying content or email addresses identifying individuals taken
   from application-layer headers.  Inference, on the other hand,
   involves analyzing eavesdropped information to derive new
   information from it, e.g., searching for application or behavioral
   fingerprints in observed traffic to derive information about the
   observed individual, in the absence of directly observed sources of
   the same information.  The use of encryption to protect
   confidentiality is generally enough to prevent direct observation of
   unencrypted content, assuming uncompromised encryption
   implementations and key material.  However, it provides less
   complete protection against inference, especially inference based
   only on unprotected portions of communications (e.g., IP and TCP
   headers for TLS [RFC5246]).

3.1.  Information subject to direct observation

   Protocols which do not encrypt their payload make the entire content
   of the communication available to the idealized attacker along their
   path.  Following the advice in [RFC3365], most such protocols have a
   secure variant which encrypts payload for confidentiality, and these
   secure variants are seeing ever-wider deployment.
   A noteworthy exception is DNS [RFC1035], as DNSSEC [RFC4033] does
   not have confidentiality as a requirement.  This implies that, in
   the absence of the changes to the protocol presently under
   development in the DPRIVE working group, all DNS queries and answers
   generated by the activities of any protocol are available to the
   attacker.

   Protocols which imply the storage of some data at rest in
   intermediaries (e.g., SMTP [RFC5321]) leave this data subject to
   observation by an attacker that has compromised these
   intermediaries, unless the data is encrypted end-to-end by the
   application-layer protocol or the implementation uses an encrypted
   store for this data.

3.2.  Information useful for inference

   Inference is information extracted from later analysis of an
   observed or eavesdropped communication, and/or correlation of
   observed or eavesdropped information with information available from
   other sources.  Indeed, most useful inference performed by the
   attacker falls under the rubric of correlation.  The simplest
   example of this is the observation of DNS queries and answers from
   and to a source and the correlation of those with the IP addresses
   with which that source communicates.  This can give access to
   information otherwise not available from encrypted application
   payloads (e.g., the Host: HTTP/1.1 request header when HTTP is used
   with TLS).

   Protocols which encrypt their payload using an application- or
   transport-layer encryption scheme (e.g., TLS) still expose all the
   information in their network- and transport-layer headers to the
   attacker, including source and destination addresses and ports.
   IPsec ESP [RFC4303] further encrypts the transport-layer headers,
   but still leaves IP address information unencrypted; in tunnel mode,
   these addresses correspond to the tunnel endpoints.  Features of the
   cryptographic protocols themselves, e.g., the TLS session
   identifier, may leak information that can be used for correlation
   and inference.  While this information is much less semantically
   rich than the application payload, it can still be useful for
   inferring an individual's activities.

   Inference can also leverage information obtained from sources other
   than direct traffic observation.  Geolocation databases, for
   example, have been developed to map IP addresses to a location, in
   order to provide location-aware services such as targeted
   advertising.  This location information is often of sufficient
   resolution that it can be used to draw further inferences toward
   identifying or profiling an individual.

   Social media provide another source of more or less publicly
   accessible information.  This information can be extremely
   semantically rich, including information about an individual's
   location, associations with other individuals and groups, and
   activities.  Further, this information is generally contributed and
   curated voluntarily by the individuals themselves: it represents
   information which the individuals are not necessarily interested in
   protecting for privacy reasons.  However, correlation of this social
   networking data with information available from direct observation
   of network traffic allows the creation of a much richer picture of
   an individual's activities than either alone.

   We note with some alarm that there is little that can be done at
   protocol design time to limit such correlation by the attacker, and
   that the existence of such data sources in many cases greatly
   complicates the problem of protecting privacy by hardening protocols
   alone.

3.3.  An illustration of an ideal flow access attack

   To illustrate how capable the idealized attacker is even given its
   limitations, we explore the non-anonymity of encrypted IP traffic in
   this section.
   Here we examine in detail some inference techniques for associating
   a set of addresses with an individual, in order to illustrate the
   difficulty of defending communications against our idealized
   attacker.  The basic problem is that information radiated even by
   protocols which have no obvious connection with personal data can be
   correlated with other information to paint a very rich behavioral
   picture, and it takes only one unprotected link in the chain to
   associate that picture with an identity.

3.3.1.  Analysis of IP headers

   Internet traffic can be monitored by tapping Internet links, or by
   installing monitoring tools in Internet routers.  Of course, a
   single link or a single router only provides access to a fraction of
   the global Internet traffic.  However, monitoring a number of high-
   capacity links or a set of routers placed at strategic locations
   provides access to a good sampling of Internet traffic.

   Tools like IPFIX [RFC7011] allow administrators to acquire
   statistics about sequences of packets with some common properties
   that pass through a network device.  The most common set of
   properties used in flow measurement is the "five-tuple" of source
   and destination addresses, protocol type, and source and destination
   ports.  These statistics are commonly used for network engineering,
   but could certainly be used for other purposes.

   Let's assume for a moment that IP addresses can be correlated to
   specific services or specific users.  Analysis of the sequences of
   packets will quickly reveal which users use what services, and also
   which users engage in peer-to-peer connections with other users.
   Analysis of traffic variations over time can be used to detect
   increased activity by particular users or, in the case of peer-to-
   peer connections, increased activity within groups of users.

3.3.2.  Correlation of IP addresses to user identities

   The correlation of IP addresses with specific users can be done in
   various ways.  For example, tools like reverse DNS lookup can be
   used to retrieve the DNS names of servers.  Since the addresses of
   servers tend to be quite stable, and since servers are relatively
   less numerous than users, an attacker could easily maintain its own
   copy of the DNS for well-known or popular servers, to accelerate
   such lookups.

   On the other hand, the reverse lookup of the IP addresses of users
   is generally less informative.  For example, a lookup of the address
   currently used by one author's home network returns a name of the
   form "c-192-000-002-033.hsd1.wa.comcast.net".  This particular type
   of reverse DNS lookup generally reveals only coarse-grained location
   or provider information, equivalent to that available from
   geolocation databases.

   In many jurisdictions, Internet Service Providers (ISPs) are
   required to provide identification on a case-by-case basis of the
   "owner" of a specific IP address for law enforcement purposes.  This
   is a reasonably expedient process for targeted investigations, but
   pervasive surveillance requires something more efficient.  This
   provides an incentive for the attacker to secure the cooperation of
   the ISP in order to automate this correlation.

3.3.3.  Monitoring messaging clients for IP address correlation

   Even if the ISP does not cooperate, user identity can often be
   obtained via inference.  POP3 [RFC1939] and IMAP [RFC3501] are used
   to retrieve mail from mail servers, while a variant of SMTP is used
   to submit messages through mail servers.  IMAP connections originate
   from the client, and typically start with an authentication exchange
   in which the client proves its identity by answering a password
   challenge.
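   As a minimal sketch of this inference, a passive observer that sees
   a cleartext IMAP LOGIN command ("<tag> LOGIN <user> <password>" per
   RFC 3501) can bind the username to the client's source address.
   The observed lines and addresses below are invented for
   illustration; a real attacker would feed in captured packets:

```python
# Sketch: bind usernames seen in cleartext IMAP LOGIN commands
# ("<tag> LOGIN <user> <password>", RFC 3501) to source IP addresses.
# The observed lines and addresses are invented for illustration.

def record_login(src_ip, line, identities):
    """Record a username seen in a cleartext IMAP LOGIN command."""
    parts = line.split()
    if len(parts) >= 3 and parts[1].upper() == "LOGIN":
        identities[src_ip] = parts[2].strip('"')

identities = {}
observed = [
    ("192.0.2.44", 'a001 LOGIN "alice@example.com" "hunter2"'),
    ("192.0.2.57", "a002 CAPABILITY"),
]
for src_ip, line in observed:
    record_login(src_ip, line, identities)

print(identities)  # {'192.0.2.44': 'alice@example.com'}
```

   The same pattern applies to cleartext POP3 USER commands and SIP
   REGISTER requests.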
   The same holds for the SIP protocol [RFC3261] and for many instant
   messaging services operating over the Internet using proprietary
   protocols.

   The username is directly observable if any of these protocols
   operate in cleartext; the username can then be directly associated
   with the source address.

3.3.4.  Retrieving IP addresses from mail headers

   SMTP [RFC5321] requires that each successive SMTP relay adds a
   "Received" header to the mail headers.  The purpose of these headers
   is to enable audit of mail transmission, and perhaps to distinguish
   between regular mail and spam.  Here is an extract from the headers
   of a message recently received from the "perpass" mailing list:

      Received: from 192-000-002-044.zone13.example.org (HELO
         ?192.168.1.100?) (xxx.xxx.xxx.xxx) by
         lvps192-000-002-219.example.net with ESMTPSA
         (DHE-RSA-AES256-SHA encrypted, authenticated);
         27 Oct 2013 21:47:14 +0100
      Message-ID: <526D7BD2.7070908@example.org>
      Date: Sun, 27 Oct 2013 20:47:14 +0000
      From: Some One <some.one@example.org>

   This is the first "Received" header attached to the message by the
   first SMTP relay; for privacy reasons, the field values have been
   anonymized.  We learn here that the message was submitted by "Some
   One" on October 27, from a host behind a NAT (192.168.1.100)
   [RFC1918] that used the IP address 192.0.2.44.  The information
   remained in the message, and is accessible by all recipients of the
   "perpass" mailing list, or indeed by any attacker that sees at least
   one copy of the message.

   An attacker that can observe sufficient email traffic can regularly
   update the mapping between public IP addresses and individual email
   identities.  Even if the SMTP traffic were encrypted on submission
   and relaying, the attacker can still receive a copy of public
   mailing lists like "perpass".

3.3.5.  Tracking address usage with web cookies

   Many web sites only encrypt a small fraction of their transactions.
   A popular pattern is to use HTTPS for the login information, and
   then use a "cookie" to associate following clear-text transactions
   with the user's identity.  Cookies are also used by various
   advertisement services to quickly identify users and serve them with
   "personalized" advertisements.  Such cookies are particularly useful
   if the advertisement services want to keep tracking the user across
   multiple sessions that may use different IP addresses.

   As cookies are sent in clear text, an attacker can build a database
   that associates cookies to IP addresses for non-HTTPS traffic.  If
   the IP address is already identified, the cookie can be linked to
   the user identity.  After that, if the same cookie appears on a new
   IP address, the new IP address can be immediately associated with
   the pre-determined identity.

3.3.6.  Graph-based approaches to address correlation

   An attacker can track traffic from an IP address not yet associated
   with an individual to various public services (e.g., websites, mail
   servers, game servers), and exploit patterns in the observed traffic
   to correlate this address with other addresses that show similar
   patterns.  For example, any two addresses that show connections to
   the same IMAP or webmail services, the same set of favorite
   websites, and game servers at similar times of day may be associated
   with the same individual.  Correlated addresses can then be tied to
   an individual through one of the techniques above, walking the
   "network graph" to expand the set of attributable traffic.

3.3.7.  Tracking of MAC Addresses

   Moving back down the stack, technologies like Ethernet or Wi-Fi use
   MAC addresses to identify link-level destinations.  MAC addresses
   assigned according to IEEE 802 standards are unique to the device.
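   As an aside on how much a single MAC address reveals, the following
   sketch (the address is made up for illustration) splits a MAC
   address into its IEEE-assigned vendor prefix (OUI) and its device-
   specific part, and checks the "locally administered" bit that
   distinguishes randomized addresses from globally unique ones:

```python
# Sketch: split a MAC address into the IEEE-assigned OUI (vendor
# prefix) and the device-specific part, and test the "locally
# administered" bit (bit 1 of the first octet), which is set on
# randomized addresses.  The address below is made up.

def parse_mac(mac):
    octets = [int(part, 16) for part in mac.split(":")]
    oui = ":".join(f"{octet:02x}" for octet in octets[:3])
    device_part = ":".join(f"{octet:02x}" for octet in octets[3:])
    locally_administered = bool(octets[0] & 0x02)
    return oui, device_part, locally_administered

print(parse_mac("00:1b:63:aa:bb:cc"))  # ('00:1b:63', 'aa:bb:cc', False)
```

   An observer with a copy of the public OUI registry can thus learn
   the device vendor from passive observation alone, before any
   higher-layer correlation.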
   If the link is publicly accessible, an attacker can track it.  For
   example, the attacker can track the wireless traffic at public Wi-Fi
   networks.  Simple devices can monitor the traffic and reveal which
   MAC addresses are present.  If the network does not use some form of
   Wi-Fi encryption, or if the attacker can access the decrypted
   traffic, the analysis will also provide the correlation between MAC
   addresses and IP addresses.  Additional monitoring, using the
   techniques exposed in the previous sections, will reveal the
   correlation between MAC addresses, IP addresses, and user identity.

   Given that large-scale databases of the MAC addresses of wireless
   access points, built for geolocation purposes, have been known to
   exist for some time, the attacker could easily build a database
   linking MAC addresses and device or user identities, and use it to
   track the movement of devices and of their owners.

4.  Reported Instances of Large-Scale Attacks

   The situation in reality is more bleak than that suggested by an
   analysis of our idealized attacker.  Through revelations of
   sensitive documents in several media outlets, the Internet community
   has been made aware of several intelligence activities conducted by
   US and UK national intelligence agencies, particularly the US
   National Security Agency (NSA) and the UK Government Communications
   Headquarters (GCHQ).  These documents have revealed methods that
   these agencies use to attack Internet applications and obtain
   sensitive user information.

   First, they have confirmed that these agencies have capabilities in
   line with those of our idealized attacker, through the large-scale
   passive collection of Internet traffic [pass1][pass2][pass3][pass4].
   For example:

   o  The NSA XKEYSCORE system accesses data from multiple access
      points and searches for "selectors" such as email addresses, at
      the scale of tens of terabytes of data per day.

   o  The GCHQ Tempora system appears to have access to around 1,500
      major cables passing through the UK.

   o  The NSA MUSCULAR program tapped cables between data centers
      belonging to major service providers.

   o  Several programs appear to perform wide-scale collection of
      cookies in web traffic and of location data from location-aware
      portable devices such as smartphones.

   However, the capabilities described go beyond those available to our
   idealized attacker, including:

   o  Decryption of TLS-protected Internet sessions [dec1][dec2][dec3].
      For example, the NSA BULLRUN project appears to have had a budget
      of around $250M per year to undermine encryption through multiple
      approaches.

   o  Insertion of NSA devices as a man-in-the-middle of Internet
      transactions [TOR1][TOR2].  For example, the NSA QUANTUM system
      appears to use several different techniques to hijack HTTP
      connections, ranging from DNS response injection to HTTP 302
      redirects.

   o  Direct acquisition of bulk data and metadata from service
      providers [dir1][dir2][dir3].  For example, the NSA PRISM program
      provides the agency with access to many types of user data (e.g.,
      email, chat, VoIP).

   o  Use of implants (covert modifications or malware) to undermine
      security and anonymity features [dec2][TOR1][TOR2].  For example:

      *  The NSA appears to use the QUANTUM man-in-the-middle system to
         direct users to a FOXACID server, which delivers an implant to
         compromise the browser of a user of the Tor anonymous
         communications network.

      *  Implants are apparently available for Cisco, Juniper, Huawei,
         Dell, and HP network elements, provided by the NSA Advanced
         Network Technology group [spiegel1].

      *  Hosts are compromised at botnet scale, using tools built by
         the NSA's Remote Operations Center [spiegel3].

      *  The BULLRUN program mentioned above includes the addition of
         covert modifications to software as one means to undermine
         encryption.

      *  There is also some suspicion that NSA modifications to the
         DUAL_EC_DRBG random number generator were made to ensure that
         keys generated using that generator could be predicted by the
         NSA.  These suspicions have been reinforced by reports that
         RSA Security was paid roughly $10M to make DUAL_EC_DRBG the
         default in their products.

   We use the term "pervasive attack" [RFC7258] to collectively
   describe these operations.  The term "pervasive" is used because the
   attacks are designed to indiscriminately gather as much data as
   possible and to apply selective analysis on targets after the fact.
   This means that all, or nearly all, Internet communications are
   targets for these attacks.  To achieve this scale, the attacks are
   physically pervasive; they affect a large number of Internet
   communications.  They are pervasive in content, consuming and
   exploiting any information revealed by the protocol.  And they are
   pervasive in technology, exploiting many different vulnerabilities
   in many different protocols.

   It's important to note that although the attacks mentioned above
   were executed by the NSA and GCHQ, there are many other
   organizations that can mount pervasive surveillance attacks.
   Because of the resources required to achieve pervasive scale, these
   attacks are most commonly undertaken by nation-state actors.
   For example, the Chinese Internet filtering system known as the
   "Great Firewall of China" uses several techniques that are similar
   to the QUANTUM program, and which have a high degree of
   pervasiveness with regard to the Internet in China.

5.  Threat Model

   Given these disclosures, we must consider a broader threat model.

   Pervasive surveillance aims to collect information across a large
   number of Internet communications, analyzing the collected
   communications to identify information of interest within individual
   communications, or to infer information from correlated
   communications.  This analysis sometimes benefits from decryption of
   encrypted communications and deanonymization of anonymized
   communications.  As a result, these attackers desire both access to
   the bulk of Internet traffic and access to the keying material
   required to decrypt any traffic that has been encrypted.  Note that
   even if the attacker cannot decrypt a communication, its presence
   and the fact that it is encrypted may both be inputs to an analysis.

   The attacks listed above highlight new avenues both for access to
   traffic and for access to relevant encryption keys.  They further
   indicate that the scale of surveillance is sufficient to provide a
   general capability to cross-correlate communications, a threat not
   previously thought to be relevant at the scale of the Internet.

5.1.  Attacker Capabilities

   +--------------------------+-------------------------------------+
   | Attack Class             | Capability                          |
   +--------------------------+-------------------------------------+
   | Passive observation      | Directly capture data in transit    |
   |                          |                                     |
   | Passive inference        | Infer from reduced/encrypted data   |
   |                          |                                     |
   | Active                   | Manipulate / inject data in transit |
   |                          |                                     |
   | Static key exfiltration  | Obtain key material once / rarely   |
   |                          |                                     |
   | Dynamic key exfiltration | Obtain per-session key material     |
   |                          |                                     |
   | Content exfiltration     | Access data at rest                 |
   +--------------------------+-------------------------------------+

   Security analyses of Internet protocols commonly consider two
   classes of attacker: flow access attackers, who can simply listen in
   on communications as they transit the network, and flow modification
   attackers, who can modify or delete packets in addition to simply
   collecting them.

   In the context of pervasive surveillance, these attacks take on an
   even greater significance.  In the past, these attackers were often
   assumed to operate near the edge of the network, where attacks can
   be simpler.  For example, in some LANs, it is simple for any node to
   engage in passive listening to other nodes' traffic or to inject
   packets to accomplish flow modification attacks.  However, as we now
   know, both passive and flow modification attacks are undertaken by
   pervasive attackers closer to the core of the network, greatly
   expanding the scope and capability of the attacker.

   Eavesdropping and observation at a larger scale make passive
   inference attacks easier to carry out: a flow access attacker with
   access to a large portion of the Internet can analyze collected
   traffic to create a much more detailed view of individual behavior
   than an attacker that collects at a single point.
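   A toy sketch of this kind of cross-correlation (the flow records
   below are invented for illustration): an attacker that collects
   flows at several vantage points can attribute an encrypted session
   by pairing it with a cleartext session from the same host observed
   elsewhere:

```python
# Toy sketch of cross-correlation over collected flow records: link an
# encrypted session to an unencrypted one from the same host observed
# at a different vantage point.  Records are invented for illustration.

from collections import defaultdict

records = [
    # (vantage_point, src_ip, dst, encrypted)
    ("tap-1", "192.0.2.44", "mail.example.net:143", False),  # cleartext IMAP
    ("tap-2", "192.0.2.44", "site.example.org:443", True),   # TLS
    ("tap-2", "192.0.2.57", "site.example.org:443", True),
]

by_host = defaultdict(list)
for vantage, src, dst, enc in records:
    by_host[src].append((vantage, dst, enc))

# Any host seen with both encrypted and unencrypted sessions lets the
# attacker attribute the encrypted traffic via the cleartext one.
linkable = {src for src, flows in by_host.items()
            if any(enc for _, _, enc in flows)
            and any(not enc for _, _, enc in flows)}
print(linkable)  # {'192.0.2.44'}
```

   A single-point collector would see only one of the two sessions for
   each host and could not make this link.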
Even the usual claim that encryption defeats flow access attackers is
weakened, since a pervasive flow access attacker can infer
relationships from correlations over large numbers of sessions, e.g.,
pairing encrypted sessions with unencrypted sessions from the same
host, or performing traffic fingerprinting between known and unknown
encrypted sessions.  Reports on the NSA XKEYSCORE system indicate
that it is an example of such an attacker.

A pervasive flow modification attacker likewise has capabilities
beyond those of a localized flow modification attacker.  Flow
modification attacks are often limited by network topology, for
example by a requirement that the attacker be able to see a targeted
session as well as inject packets into it.  A pervasive flow
modification attacker with access at multiple points within the core
of the Internet is able to overcome these topological limitations and
perform attacks over a much broader scope.  Being positioned in the
core of the network rather than at the edge can also enable a
pervasive flow modification attacker to reroute targeted traffic,
amplifying the ability to perform both eavesdropping and traffic
injection.  Pervasive flow modification attackers can also benefit
from pervasive passive collection to identify vulnerable hosts.

While not directly related to pervasiveness, attackers that are in a
position to mount a pervasive flow modification attack are also often
in a position to subvert authentication, a traditional protection
against such attacks.  Authentication in the Internet is often
achieved via trusted third-party authorities such as the Certificate
Authorities (CAs) that provide web sites with authentication
credentials.  An attacker with sufficient resources may also be able
to induce an authority to grant credentials for an identity of the
attacker's choosing.
If the parties to a communication will trust multiple authorities to
certify a specific identity, this attack may be mounted by suborning
any one of the authorities (the proverbial "weakest link").
Subversion of authorities in this way can allow a flow modification
attack to succeed in spite of an authentication check.

Beyond these three classes (observation, inference, and active),
reports on the BULLRUN effort to defeat encryption and the PRISM
effort to obtain data from service providers suggest three more
classes of attack:

o  Static key exfiltration

o  Dynamic key exfiltration

o  Content exfiltration

These attacks all rely on a collaborator providing the attacker with
some information, either keys or data.  These attacks have not
traditionally been considered in scope for the Security
Considerations sections of IETF protocols, as they occur outside the
protocol.

The term "key exfiltration" refers to the transfer of keying material
for an encrypted communication from the collaborator to the attacker.
By "static", we mean that the transfer of keys happens once, or
rarely, typically of a long-lived key.  For example, this case would
cover a web site operator that provides the private key corresponding
to its HTTPS certificate to an intelligence agency.

"Dynamic" key exfiltration, by contrast, refers to attacks in which
the collaborator delivers keying material to the attacker frequently,
e.g., on a per-session basis.  This does not necessarily imply
frequent communications with the attacker; the transfer of keying
material may be virtual.  For example, if an endpoint were modified
in such a way that the attacker could predict the state of its
pseudorandom number generator, then the attacker would be able to
derive per-session keys even without per-session communications.
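The predictable-generator case above can be sketched as follows.  This is a deliberately toy construction (the generator, seed, and key-derivation step are all invented for illustration, not taken from any real implementation): once the attacker knows the generator's state, every per-session key follows without any further contact with the collaborator.

```python
import hashlib

class PredictablePRNG:
    """A toy deterministic generator: anyone who knows the seed can
    reproduce every output it will ever emit."""

    def __init__(self, seed: bytes):
        self.state = seed

    def next_bytes(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            # Advance the state by hashing it; emit the new state.
            self.state = hashlib.sha256(self.state).digest()
            out += self.state
        return out[:n]

def session_key(prng: PredictablePRNG) -> bytes:
    # The subverted endpoint draws 16 bytes per session for its key.
    return prng.next_bytes(16)

# The endpoint and the attacker run the same construction from the
# same seed (obtained once, or predicted) ...
endpoint = PredictablePRNG(seed=b"state-known-to-attacker")
attacker = PredictablePRNG(seed=b"state-known-to-attacker")

# ... so the attacker derives each per-session key independently,
# with no per-session communication: a "virtual" key transfer.
for _ in range(3):
    assert session_key(endpoint) == session_key(attacker)
```

The sketch shows why this counts as dynamic key exfiltration even though no keys cross the wire: the single act of subverting the generator stands in for an ongoing stream of key deliveries.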
Finally, content exfiltration is the attack in which the collaborator
simply provides the attacker with the desired data or metadata.
Unlike the key exfiltration cases, this attack does not require the
attacker to capture the desired data as it flows through the network.
The risk is to data at rest as opposed to data in transit.  This
increases the scope of data that the attacker can obtain, since the
attacker can access historical data; the attacker does not have to
be listening at the time the communication happens.

Exfiltration attacks can be accomplished via attacks against one of
the parties to a communication, i.e., by the attacker stealing the
keys or content rather than the party providing them willingly.  In
these cases, the party may not be aware that they are collaborating,
at least at a human level.  Rather, the subverted technical assets
are "collaborating" with the attacker (by providing keys/content)
without their owner's knowledge or consent.

Any party that has access to encryption keys or unencrypted data can
be a collaborator.  While collaborators are typically the endpoints
of a communication (with encryption securing the links),
intermediaries in an unencrypted communication can also facilitate
content exfiltration attacks as collaborators by providing the
attacker access to those communications.  For example, documents
describing the NSA PRISM program claim that the NSA is able to access
user data directly from servers, where it is stored unencrypted.  In
these cases, the operator of the server would be a collaborator, if
an unwitting one.  By contrast, in the NSA MUSCULAR program, a set of
collaborators enabled attackers to access the cables connecting data
centers used by service providers such as Google and Yahoo.
Because communications among these data centers were not encrypted,
the collaboration by an intermediate entity allowed the NSA to
collect unencrypted user data.

5.2.  Attacker Costs

   +--------------------------+-----------------------------------+
   | Attack Class             | Cost / Risk to Attacker           |
   +--------------------------+-----------------------------------+
   | Passive observation      | Passive data access               |
   |                          |                                   |
   | Passive inference        | Passive data access + processing  |
   |                          |                                   |
   | Active                   | Active data access + processing   |
   |                          |                                   |
   | Static key exfiltration  | One-time interaction              |
   |                          |                                   |
   | Dynamic key exfiltration | Ongoing interaction / code change |
   |                          |                                   |
   | Content exfiltration     | Ongoing, bulk interaction         |
   +--------------------------+-----------------------------------+

Each of the attack types discussed in the previous section entails
certain costs and risks.  These costs differ by attack, and can be
helpful in guiding responses to pervasive attack.

Depending on the attack, the attacker may be exposed to several types
of risk, ranging from simply losing access to arrest or prosecution.
In order for any of these negative consequences to occur, however,
the attacker must first be discovered and identified.  So the primary
risk we focus on here is the risk of discovery and attribution.

A flow access attack is the simplest to mount in some ways.  The base
requirement is that the attacker obtain physical access to a
communications medium and extract communications from it.  For
example, the attacker might tap a fiber-optic cable, acquire a mirror
port on a switch, or listen to a wireless signal.  The need for these
taps to have physical access or proximity to a link exposes the
attacker to the risk that the taps will be discovered.
For example, a fiber tap or mirror port might be discovered by
network operators noticing increased attenuation in the fiber or a
change in switch configuration.  Of course, flow access attacks may
be accomplished with the cooperation of the network operator, in
which case there is a risk that the attacker's interactions with the
network operator will be exposed.

In many ways, the costs and risks for a flow modification attack are
similar to those for a flow access attack, with a few additions.  A
flow modification attacker requires more robust network access than a
flow access attacker, since for example they will often need to
transmit data as well as receive it.  In the wireless example above,
the attacker would need to act as a transmitter as well as a
receiver, greatly increasing the probability that the attacker will
be discovered (e.g., using direction-finding technology).  Flow
modification attacks are also much more observable at higher layers
of the network.  For example, a flow modification attacker that
attempts to use a mis-issued certificate could be detected via
Certificate Transparency [RFC6962].

In terms of raw implementation complexity, flow access attacks
require only enough processing to extract information from the
network and store it.  Flow modification attacks, by contrast, often
depend on winning race conditions to inject packets into active
connections.  So flow modification attacks in the core of the network
require processing hardware that can operate at line speed (roughly
100 Gbps to 1 Tbps in the core) to identify opportunities for attack
and insert attack traffic into high-volume traffic.  Key exfiltration
attacks rely on a flow access attack for access to encrypted data,
with the collaborator providing keys to decrypt the data.
So the attacker undertakes the cost and risk of a flow access attack,
as well as the additional risk of discovery via the interactions that
the attacker has with the collaborator.

In this sense, static exfiltration has a lower risk profile than
dynamic.  In the static case, the attacker need only interact with
the collaborator a small number of times, possibly only once, say to
exchange a private key.  In the dynamic case, the attacker must have
continuing interactions with the collaborator.  As noted above, these
interactions may be real, such as in-person meetings, or virtual,
such as software modifications that render keys available to the
attacker.  Both of these types of interactions introduce a risk that
they will be discovered, e.g., by employees of the collaborator
organization noticing suspicious meetings or suspicious code changes.

Content exfiltration has a similar risk profile to dynamic key
exfiltration.  In a content exfiltration attack, the attacker saves
the cost and risk of conducting a flow access attack.  The risk of
discovery through interactions with the collaborator, however, is
still present, and may be higher.  The content of a communication is
obviously larger than the key used to encrypt it, often by several
orders of magnitude.  So in the content exfiltration case, the
interactions between the collaborator and the attacker need to be
much higher-bandwidth than in the key exfiltration cases, with a
corresponding increase in the risk that this high-bandwidth channel
will be discovered.

It should also be noted that in these latter three exfiltration
cases, the collaborator also undertakes a risk that their
collaboration with the attacker will be discovered.  Thus the
attacker may have to incur additional cost in order to convince the
collaborator to participate in the attack.
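The bandwidth asymmetry between key and content exfiltration noted above can be made concrete with back-of-envelope arithmetic.  The session count and sizes below are assumptions chosen only to illustrate the orders of magnitude involved:

```python
# Illustrative arithmetic only; all quantities are assumed, not
# measured.  One day of exfiltration from a single collaborator.
sessions_per_day = 1_000_000
key_size = 32                 # bytes per exfiltrated session key
avg_content_size = 1_000_000  # bytes of content per session (~1 MB)

# Dynamic key exfiltration moves only keys; content exfiltration
# moves the sessions themselves.
key_channel = sessions_per_day * key_size
content_channel = sessions_per_day * avg_content_size

# The content channel exceeds the key channel by the content/key
# ratio, several orders of magnitude under these assumptions.
ratio = content_channel / key_channel
assert ratio == avg_content_size / key_size
```

With these (hypothetical) numbers the content channel carries tens of thousands of times more data than the key channel, which is the sense in which its discovery risk is correspondingly higher.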
Likewise, the scope of these attacks is limited to cases where the
attacker can convince a collaborator to participate.  If the attacker
is a national government, for example, it may be able to compel
participation within its borders, but have a much more difficult time
recruiting foreign collaborators.

As noted above, the collaborator in an exfiltration attack can be
unwitting; the attacker can steal keys or data to enable the attack.
In some ways, the risks of this approach are similar to the case of
an active collaborator.  In the static case, the attacker needs to
steal information from the collaborator once; in the dynamic case,
the attacker needs a continued presence inside the collaborator's
systems.  The main difference is that the risk in this case is of
automated discovery (e.g., by intrusion detection systems) rather
than discovery by humans.

6.  Security Considerations

This document describes a threat model for pervasive surveillance
attacks.  Mitigations are to be given in a future document.

7.  IANA Considerations

This document has no actions for IANA.

8.  Acknowledgements

Thanks to Dave Thaler for the list of attacks and taxonomy; to
Security Area Directors Stephen Farrell, Sean Turner, and Kathleen
Moriarty for starting and managing the IETF's discussion on pervasive
attack; and to Stephan Neuhaus, Mark Townsley, Chris Inacio,
Evangelos Halepilidis, Bjoern Hoehrmann, Aziz Mohaisen, as well as
the IAB Privacy and Security Program, for their input.

9.  References

9.1.  Normative References

[RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
           Morris, J., Hansen, M., and R. Smith, "Privacy
           Considerations for Internet Protocols", RFC 6973, July
           2013.

9.2.  Informative References

[pass1]    The Guardian, "How the NSA is still harvesting your online
           data", 2013.
[pass2]    The Guardian, "NSA's Prism surveillance program: how it
           works and what it can do", 2013.

[pass3]    The Guardian, "XKeyscore: NSA tool collects 'nearly
           everything a user does on the internet'", 2013.

[pass4]    The Guardian, "How does GCHQ's internet surveillance
           work?", n.d.

[dec1]     The New York Times, "N.S.A. Able to Foil Basic Safeguards
           of Privacy on Web", 2013.

[dec2]     The Guardian, "Project Bullrun - classification guide to
           the NSA's decryption program", 2013.

[dec3]     The Guardian, "Revealed: how US and UK spy agencies defeat
           internet privacy and security", 2013.

[TOR]      The Tor Project, "Tor", 2013.

[TOR1]     Schneier, B., "How the NSA Attacks Tor/Firefox Users With
           QUANTUM and FOXACID", 2013.

[TOR2]     The Guardian, "'Tor Stinks' presentation - read the full
           document", 2013.

[dir1]     The Guardian, "NSA collecting phone records of millions of
           Verizon customers daily", 2013.

[dir2]     The Guardian, "NSA Prism program taps in to user data of
           Apple, Google and others", 2013.

[dir3]     The Guardian, "Sigint - how the NSA collaborates with
           technology companies", 2013.

[secure]   Schneier, B., "NSA surveillance: A guide to staying
           secure", 2013.

[snowden]  Technology Review, "NSA Leak Leaves Crypto-Math Intact but
           Highlights Known Workarounds", 2013.

[spiegel1] Stocker, C., "NSA's Secret Toolbox: Unit Offers Spy
           Gadgets for Every Need", December 2013.

[spiegel3] Schmundt, H., "The Digital Arms Race: NSA Preps America
           for Future Battle", January 2014.

[key-recovery]
           Golle, P., "The Design and Implementation of Protocol-
           Based Hidden Key Recovery", 2003.

[RFC1035]  Mockapetris, P., "Domain names - implementation and
           specification", STD 13, RFC 1035, November 1987.
[RFC1918]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and
           E. Lear, "Address Allocation for Private Internets", BCP
           5, RFC 1918, February 1996.

[RFC1939]  Myers, J. and M. Rose, "Post Office Protocol - Version 3",
           STD 53, RFC 1939, May 1996.

[RFC2015]  Elkins, M., "MIME Security with Pretty Good Privacy
           (PGP)", RFC 2015, October 1996.

[RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
           April 2001.

[RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
           A., Peterson, J., Sparks, R., Handley, M., and E.
           Schooler, "SIP: Session Initiation Protocol", RFC 3261,
           June 2002.

[RFC3365]  Schiller, J., "Strong Security Requirements for Internet
           Engineering Task Force Standard Protocols", BCP 61, RFC
           3365, August 2002.

[RFC3501]  Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
           4rev1", RFC 3501, March 2003.

[RFC3851]  Ramsdell, B., "Secure/Multipurpose Internet Mail
           Extensions (S/MIME) Version 3.1 Message Specification",
           RFC 3851, July 2004.

[RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
           Rose, "DNS Security Introduction and Requirements", RFC
           4033, March 2005.

[RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
           Internet Protocol", RFC 4301, December 2005.

[RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)", RFC
           4303, December 2005.

[RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", RFC
           4306, December 2005.

[RFC4949]  Shirey, R., "Internet Security Glossary, Version 2", RFC
           4949, August 2007.

[RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
           (TLS) Protocol Version 1.2", RFC 5246, August 2008.

[RFC5321]  Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
           October 2008.

[RFC5655]  Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
           Wagner, "Specification of the IP Flow Information Export
           (IPFIX) File Format", RFC 5655, October 2009.

[RFC5750]  Ramsdell, B. and S. Turner, "Secure/Multipurpose Internet
           Mail Extensions (S/MIME) Version 3.2 Certificate
           Handling", RFC 5750, January 2010.

[RFC6120]  Saint-Andre, P., "Extensible Messaging and Presence
           Protocol (XMPP): Core", RFC 6120, March 2011.

[RFC6962]  Laurie, B., Langley, A., and E. Kasper, "Certificate
           Transparency", RFC 6962, June 2013.

[RFC6698]  Hoffman, P. and J. Schlyter, "The DNS-Based Authentication
           of Named Entities (DANE) Transport Layer Security (TLS)
           Protocol: TLSA", RFC 6698, August 2012.

[RFC7011]  Claise, B., Trammell, B., and P. Aitken, "Specification of
           the IP Flow Information Export (IPFIX) Protocol for the
           Exchange of Flow Information", STD 77, RFC 7011, September
           2013.

[RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
           Attack", BCP 188, RFC 7258, May 2014.

Authors' Addresses

Richard Barnes

Email: rlb@ipv.sx

Bruce Schneier

Email: schneier@schneier.com

Cullen Jennings

Email: fluffy@cisco.com

Ted Hardie

Email: ted.ietf@gmail.com

Brian Trammell

Email: ietf@trammell.ch

Christian Huitema

Email: huitema@huitema.net

Daniel Borkmann

Email: dborkman@redhat.com