idnits 2.17.1 

draft-ietf-ipfix-anon-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Sep 2009 rather than the newer Notice from 28 Dec 2009.  (See
     https://trustee.ietf.org/license-info/)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 15 instances of too long lines in the document, the longest
     one being 4 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document date (February 15, 2010) is 5184 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC5610' is defined on line 1587, but no explicit
     reference was found in the text

  == Unused Reference: 'I-D.ietf-ipfix-mediators-problem-statement' is
     defined on line 1618, but no explicit reference was found in the text

  ** Obsolete normative reference: RFC 5101 (Obsoleted by RFC 7011)

  ** Obsolete normative reference: RFC 5102 (Obsoleted by RFC 7012)

  ** Obsolete normative reference: RFC 3330 (Obsoleted by RFC 5735)

  == Outdated reference: A later version (-09) exists of
     draft-ietf-ipfix-mediators-framework-04

  == Outdated reference: A later version (-09) exists of
     draft-ietf-ipfix-mediators-problem-statement-07


     Summary: 5 errors (**), 0 flaws (~~), 6 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	IPFIX Working Group                                            E. Boschi
3	Internet-Draft                                               B. Trammell
4	Intended status: Experimental                             Hitachi Europe
5	Expires: August 19, 2010                               February 15, 2010

7	                     IP Flow Anonymisation Support
8	                      draft-ietf-ipfix-anon-02.txt

10	Abstract

12	   This document describes anonymisation techniques for IP flow data and
13	   the export of anonymised data using the IPFIX protocol.  It
14	   categorizes common anonymisation schemes and defines the parameters
15	   needed to describe them.  It provides guidelines for the
16	   implementation of anonymised data export and storage over IPFIX, and
17	   describes an information model and Options-based method for
18	   anonymisation technique metadata export within the IPFIX protocol or
19	   storage in IPFIX Files.

21	Status of this Memo

23	   This Internet-Draft is submitted to IETF in full conformance with the
24	   provisions of BCP 78 and BCP 79.

26	   Internet-Drafts are working documents of the Internet Engineering
27	   Task Force (IETF), its areas, and its working groups.  Note that
28	   other groups may also distribute working documents as Internet-
29	   Drafts.

31	   Internet-Drafts are draft documents valid for a maximum of six months
32	   and may be updated, replaced, or obsoleted by other documents at any
33	   time.  It is inappropriate to use Internet-Drafts as reference
34	   material or to cite them other than as "work in progress."

36	   The list of current Internet-Drafts can be accessed at
37	   http://www.ietf.org/ietf/1id-abstracts.txt.

39	   The list of Internet-Draft Shadow Directories can be accessed at
40	   http://www.ietf.org/shadow.html.

42	   This Internet-Draft will expire on August 19, 2010.

44	Copyright Notice

46	   Copyright (c) 2010 IETF Trust and the persons identified as the
47	   document authors.  All rights reserved.

49	   This document is subject to BCP 78 and the IETF Trust's Legal
50	   Provisions Relating to IETF Documents
51	   (http://trustee.ietf.org/license-info) in effect on the date of
52	   publication of this document.  Please review these documents
53	   carefully, as they describe your rights and restrictions with respect
54	   to this document.  Code Components extracted from this document must
55	   include Simplified BSD License text as described in Section 4.e of
56	   the Trust Legal Provisions and are provided without warranty as
57	   described in the BSD License.

59	Table of Contents

61	   1.  Introduction
62	     1.1.  IPFIX Protocol Overview
63	     1.2.  IPFIX Documents Overview
64	     1.3.  Anonymisation within the IPFIX Architecture
65	   2.  Terminology
66	   3.  Categorisation of Anonymisation Techniques
67	   4.  Anonymisation of IP Flow Data
68	     4.1.  IP Address Anonymisation
69	       4.1.1.  Truncation
70	       4.1.2.  Reverse Truncation
71	       4.1.3.  Permutation
72	       4.1.4.  Prefix-preserving Pseudonymisation
73	     4.2.  Hardware Address Anonymisation
74	       4.2.1.  Reverse Truncation
75	       4.2.2.  Permutation
76	       4.2.3.  Structured Pseudonymisation
77	     4.3.  Timestamp Anonymisation
78	       4.3.1.  Precision Degradation
79	       4.3.2.  Enumeration
80	       4.3.3.  Random Shifts
81	     4.4.  Counter Anonymisation
82	       4.4.1.  Precision Degradation
83	       4.4.2.  Binning
84	       4.4.3.  Random Noise Addition
85	     4.5.  Anonymisation of Other Flow Fields
86	       4.5.1.  Binning
87	       4.5.2.  Permutation
88	   5.  Parameters for the Description of Anonymisation Techniques
89	     5.1.  Stability
90	     5.2.  Truncation Length
91	     5.3.  Bin Map
92	     5.4.  Permutation
93	     5.5.  Shift Amount
94	   6.  Anonymisation Export Support in IPFIX
95	     6.1.  Anonymisation Options Template
96	     6.2.  Recommended Information Elements for Anonymisation
97	           Metadata
98	       6.2.1.  informationElementIndex
99	       6.2.2.  anonymisationFlags
100	       6.2.3.  anonymisationTechnique
101	   7.  Applying Anonymisation Techniques to IPFIX Export and
102	       Storage
103	     7.1.  Arrangement of Processes in IPFIX Anonymisation
104	     7.2.  IPFIX-Specific Anonymisation Guidelines
105	       7.2.1.  Appropriate Use of Information Elements for
106	               Anonymised Data
107	       7.2.2.  Export of Perimeter-Based Anonymisation Policies
108	       7.2.3.  Anonymisation of Header Data
109	       7.2.4.  Anonymisation of Options Data
110	       7.2.5.  Special-Use Address Space Considerations
111	   8.  Examples
112	   9.  Security Considerations
113	   10. IANA Considerations
114	   11. Acknowledgments
115	   12. References
116	     12.1. Normative References
117	     12.2. Informative References
118	   Authors' Addresses

120	1.  Introduction

122	   The standardisation of an IP flow information export protocol
123	   [RFC5101] and associated representations removes a technical barrier
124	   to the sharing of IP flow data across organizational boundaries and
125	   with network operations, security, and research communities for a
126	   wide variety of purposes.  However, with wider dissemination come
127	   greater risks to the privacy of the users of networks under
128	   measurement, and to the security of those networks.  While it is not
129	   a complete solution to the issues posed by distribution of IP flow
130	   information, anonymisation (i.e., the deletion or transformation of
131	   information that is considered sensitive and could be used to reveal
132	   the identity of subjects involved in a communication) is an important
133	   tool for the protection of privacy within network measurement
134	   infrastructures.

136	   This document presents a mechanism for representing anonymised data
137	   within IPFIX and guidelines for using it.  It begins with a
138	   categorization of anonymisation techniques.  It then describes the
139	   applicability of each technique to commonly anonymisable fields of IP
140	   flow data, organized by information element data type and semantics
141	   as in [RFC5102]; enumerates the parameters required by each of the
142	   applicable anonymisation techniques; and provides guidelines for the
143	   use of each of these techniques in accordance with best practices in
144	   data protection.  Finally, it specifies a mechanism for exporting
145	   anonymised data and binding anonymisation metadata to templates using
146	   IPFIX Options.

148	1.1.  IPFIX Protocol Overview

150	   In the IPFIX protocol, { type, length, value } tuples are expressed
151	   in Templates containing { type, length } pairs, specifying which {
152	   value } fields are present in Data Records conforming to the
153	   Template, giving great flexibility as to what data is transmitted.
154	   Since Templates are sent very infrequently compared with Data
155	   Records, this results in significant bandwidth savings.  Various
156	   different data formats may be transmitted simply by sending new
157	   Templates specifying the { type, length } pairs for the new data
158	   format.  See [RFC5101] for more information.

160	   The IPFIX information model [RFC5102] defines a large number of
161	   standard Information Elements which provide the necessary { type }
162	   information for Templates.  The use of standard elements enables
163	   interoperability among different vendors' implementations.
164	   Additionally, non-standard enterprise-specific elements may be
165	   defined for private use.
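
   The Template mechanism described in this section can be illustrated
   with a minimal sketch.  It is a simplification, not a conforming
   implementation: it omits the 16-octet IPFIX Message Header, padding,
   enterprise-specific and variable-length Information Elements.  The
   function names and the choice of Template ID 256 are assumptions made
   for illustration; the Information Element numbers are those assigned
   in the IPFIX information model.

```python
import struct

# (Information Element ID, field length in octets)
FIELDS = [
    (8, 4),   # sourceIPv4Address
    (12, 4),  # destinationIPv4Address
    (1, 8),   # octetDeltaCount
]
TEMPLATE_ID = 256   # hypothetical choice; Template IDs start at 256
TEMPLATE_SET_ID = 2  # Set ID reserved for Template Sets


def encode_template_set(template_id, fields):
    """Encode one Template Record as a Template Set of { type, length } pairs."""
    record = struct.pack("!HH", template_id, len(fields))
    for ie_id, length in fields:
        record += struct.pack("!HH", ie_id, length)
    # Set Header: Set ID, then total Set length including the header itself.
    return struct.pack("!HH", TEMPLATE_SET_ID, 4 + len(record)) + record


def encode_data_set(template_id, fields, values):
    """Encode one Data Record: values only, in the order the Template gives."""
    body = b"".join(
        value.to_bytes(length, "big")
        for (_, length), value in zip(fields, values)
    )
    # A Data Set carries the Template ID in place of the Set ID.
    return struct.pack("!HH", template_id, 4 + len(body)) + body
```

   In this sketch the Template is sent once, after which each Data
   Record carries only its 16 octets of values.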

167	1.2.  IPFIX Documents Overview

169	   "Specification of the IPFIX Protocol for the Exchange of IP Traffic
170	   Flow Information" [RFC5101] and its associated documents define the
171	   IPFIX Protocol, which provides network engineers and administrators
172	   with access to IP traffic flow information.

174	   "Architecture for IP Flow Information Export" [RFC5470] defines the
175	   architecture for the export of measured IP flow information out of an
176	   IPFIX Exporting Process to an IPFIX Collecting Process, and the basic
177	   terminology used to describe the elements of this architecture, per
178	   the requirements defined in "Requirements for IP Flow Information
179	   Export" [RFC3917].  The IPFIX Protocol document [RFC5101] then covers
180	   the details of the method for transporting IPFIX Data Records and
181	   Templates via a congestion-aware transport protocol from an IPFIX
182	   Exporting Process to an IPFIX Collecting Process.

184	   "Information Model for IP Flow Information Export" [RFC5102]
185	   describes the Information Elements used by IPFIX, including details
186	   on Information Element naming, numbering, and data type encoding.
187	   Finally, "IPFIX Applicability" [RFC5472] describes the various
188	   applications of the IPFIX protocol and their use of information
189	   exported via IPFIX, and relates the IPFIX architecture to other
190	   measurement architectures and frameworks.

192	   Additionally, "Specification of the IPFIX File Format" [RFC5655]
193	   describes a file format based upon the IPFIX Protocol for the storage
194	   of flow data.

196	   This document references the Protocol and Architecture documents for
197	   terminology, and extends the IPFIX Information Model to provide new
198	   Information Elements for anonymisation metadata.  The anonymisation
199	   techniques described herein are equally applicable to the IPFIX
200	   Protocol and data stored in IPFIX Files.

202	1.3.  Anonymisation within the IPFIX Architecture

204	   "Architecture for IP Flow Information Export" [RFC5470] defines the
205	   functions performed in sequence by the various functional blocks in
206	   an IPFIX Device as in the figure below.

208	                    Packet(s) coming into Observation Point(s)
209	                      |                                   |
210	                      v                                   v
211	     +----------------+-------------------------+   +-----+-------+
212	     |          Metering Process on an          |   |             |
213	     |             Observation Point            |   |             |
214	     |                                          |   |             |
215	     |   packet header capturing                |   |             |
216	     |        |                                 |...| Metering    |
217	     |   timestamping                           |   | Process N   |
218	     |        |                                 |   |             |
219	     | +----->+                                 |   |             |
220	     | |      |                                 |   |             |
221	     | |   sampling Si (1:1 in case of no       |   |             |
222	     | |      |          sampling)              |   |             |
223	     | |   filtering Fi (select all when        |   |             |
224	     | |      |          no criteria)           |   |             |
225	     | +------+                                 |   |             |
226	     |        |                                 |   |             |
227	     |        |        Timing out Flows         |   |             |
228	     |        |    Handle resource overloads    |   |             |
229	     +--------|---------------------------------+   +-----|-------+
230	              |                                           |
231	      Flow Records (identified by Observation Domain)  Flow Records
232	              |                                           |
233	              +---------+---------------------------------+
234	                        |
235	   +--------------------|----------------------------------------------+
236	   |                    |     Exporting Process                        |
237	   |+-------------------|-------------------------------------------+  |
238	   ||                   v       IPFIX Protocol                      |  |
239	   ||+-----------------------------+  +----------------------------+|  |
240	   |||Rules for                    |  |Functions                   ||  |
241	   ||| Picking/sending Templates   |  |-Packetise selected Control ||  |
242	   ||| Picking/sending Flow Records|->|  & data Information into   ||  |
243	   ||| Encoding Template & data    |  |  IPFIX export packets.     ||  |
244	   ||| Selecting Flows to export(*)|  |-Handle export errors       ||  |
245	   ||+-----------------------------+  +----------------------------+|  |
246	   |+----------------------------+----------------------------------+  |
247	   |                             |                                     |
248	   |                    exported IPFIX Messages                        |
249	   |                             |                                     |
250	   |                +------------+-----------------+                   |
251	   |                |  Anonymise export packet(*)  |                   |
252	   |                +------------+-----------------+                   |
253	   |                             |                                     |
254	   |                +------------+-----------------+                   |
255	   |                |       Transport  Protocol    |                   |
256	   |                +------------+-----------------+                   |
257	   |                             |                                     |
258	   +-----------------------------+-------------------------------------+
259	                                 |
260	                                 v
261	                    IPFIX export packet to Collector

263	   (*) indicates that the block is optional.

265	                 Figure 1: IPFIX Device functional blocks

267	   Note that, according to the original architecture specification,
268	   IPFIX Message anonymisation is optionally performed as the final
269	   operation before handing the Message to the transport protocol for
270	   export.  While no provision is made in the architecture for
271	   anonymisation metadata as in Section 6, this arrangement does allow
272	   for the message rewriting necessary for comprehensive anonymisation
273	   of IPFIX export as in Section 7.  The IPFIX Mediation framework
274	   [I-D.ietf-ipfix-mediators-framework] and the IPFIX File Format
275	   [RFC5655] expand upon this initial architectural allowance for
276	   anonymisation by adding to the list of places where anonymisation
277	   may be applied.  The former specifies IPFIX Mediators, which rewrite
278	   existing IPFIX Messages, and the latter specifies a method for
279	   storage of IPFIX data in files.

281	   More detail on the applicable architectural arrangements of
282	   anonymisation can be found in Section 7.1.

284	2.  Terminology

286	   Terms used in this document that are defined in the Terminology
287	   section of the IPFIX Protocol [RFC5101] document are to be
288	   interpreted as defined there.

290	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
291	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
292	   "OPTIONAL" in this document are to be interpreted as described in
293	   RFC 2119 [RFC2119].

294	3.  Categorisation of Anonymisation Techniques

296	   Anonymisation modifies a data set in order to protect the identity of
297	   the people or entities described by the data set from disclosure.
298	   With respect to network traffic data, anonymisation generally
299	   attempts to preserve some set of properties of the network traffic
300	   useful for a given application or applications, while ensuring the
301	   data cannot be traced back to the specific networks, hosts, or users
302	   generating the traffic.

304	   Anonymisation may be broadly classified according to two properties:
305	   recoverability and countability.  All anonymisation techniques map
306	   the real space of identifiers or values into a separate, anonymised
307	   space, according to some function.  A technique is said to be
308	   recoverable when the function used is invertible or can otherwise be
309	   reversed and a real identifier can be recovered from a given
310	   replacement identifier.

312	   Countability compares the dimension of the anonymised space (N) to
313	   the dimension of the real space (M), and denotes how the count of
314	   unique values is preserved by the anonymisation function.  If the
315	   anonymised space is smaller than the real space, then the function is
316	   said to generalise the input, mapping more than one input point to
317	   each anonymous value (e.g., as with aggregation).  By definition,
318	   generalisation is not recoverable.

320	   If the dimensions of the anonymised and real spaces are the same,
321	   such that the count of unique values is preserved, then the function
322	   is said to be a direct substitution function.  If the dimension of
323	   the anonymised space is larger, such that each real value maps to a
324	   set of anonymised values, then the function is said to be a set
325	   substitution function.  Note that with set substitution functions,
326	   the sets of anonymised values are not necessarily disjoint.  Either
327	   direct or set substitution functions are said to be one-way if there
328	   exists no practical method for recovering the real data point from an
329	   anonymised one.

331	   This classification is summarised in the table below.

333	   +------------------------+-----------------+------------------------+
334	   | Recoverability /       | Recoverable     | Non-recoverable        |
335	   | Countability           |                 |                        |
336	   +------------------------+-----------------+------------------------+
337	   | N < M                  | N.A.            | Generalisation         |
338	   | N = M                  | Direct          | One-way Direct         |
339	   |                        | Substitution    | Substitution           |
340	   | N > M                  | Set             | One-way Set            |
341	   |                        | Substitution    | Substitution           |
342	   +------------------------+-----------------+------------------------+

344	4.  Anonymisation of IP Flow Data

346	   Due to the restricted semantics of IP flow data, there is a
347	   relatively limited set of specific anonymisation techniques
348	   applicable to flow data, though each falls into one of the broad
349	   categories above.  Each type of field that may commonly appear in a
350	   flow record may have its own applicable specific techniques.

352	   While anonymisation is generally applied at the resolution of single
353	   fields within a flow record, attacks against anonymisation use entire
354	   flows and relationships between hosts and flows within a given data
355	   set.  Therefore, fields which may not necessarily be identifying by
356	   themselves may be anonymised in order to increase the anonymity of
357	   the data set as a whole.

359	   Of all the fields in an IP flow record, IP addresses are the most
360	   likely to be used to directly identify entities in the real world.
361	   Each IP address is associated with an interface on a network host,
362	   and can potentially be identified with a single user.  Additionally,
363	   IP addresses are structured identifiers; that is, partial IP address
364	   prefixes may be used to identify networks just as full IP addresses
365	   identify hosts.  This makes anonymisation of IP addresses
366	   particularly important.

368	   Hardware addresses uniquely identify devices on the network; while
369	   they are not often available in traffic data collected at Layer 3,
370	   and cannot be used to locate devices within the network, some traces
371	   may contain sub-IP data including hardware address data.  Hardware
372	   addresses may be mappable to device serial numbers, and to the
373	   entities or individuals who purchased the devices, when combined with
374	   external databases.  They may also leak via IPv6 addresses in certain
375	   circumstances.  Therefore, hardware address anonymisation is also
376	   important.

378	   Port numbers identify abstract entities (applications) as opposed to
379	   real-world entities, but they can be used to classify hosts and user
380	   behavior.  Passive port fingerprinting, both of well-known and
381	   ephemeral ports, can be used to determine the operating system
382	   running on a host.  Relative data volumes by port can also be used to
383	   determine the host's function (workstation, web server, etc.); this
384	   information can be used to identify hosts and users.

386	   While not identifiers in and of themselves, timestamps and counters
387	   can reveal the behavior of the hosts and users on a network.  Any
388	   given network activity is recognizable by a pattern of relative time
389	   differences and data volumes in the associated sequence of flows,
390	   even without host address information.  Such patterns can therefore
391	   be used to identify hosts and users.  Timestamps and counters are
392	   also vulnerable to traffic injection attacks, where traffic with a
393	   known pattern is injected into a network under measurement, and this
394	   pattern is later identified in the anonymised data set.

396	   The simplest and most extreme form of anonymisation, which can be
397	   applied to any field of a flow record, is black-marker anonymisation,
398	   or complete deletion of a given field.  Note that black-marker
399	   anonymisation is equivalent to simply not exporting the field(s) in
400	   question.

402	   While black-marker anonymisation completely protects the data in the
403	   deleted fields from the risk of disclosure, it also reduces the
404	   utility of the anonymised data set as a whole.  Techniques that
405	   retain some information while reducing (though not eliminating) the
406	   disclosure risk will be extensively discussed in the following
407	   sections; note that the techniques specifically applicable to IP
408	   addresses, hardware addresses, timestamps, ports, and counters will
409	   be discussed in separate sections.

411	4.1.  IP Address Anonymisation

413	   Since IP addresses are the most common identifiers within flow data
414	   that can be used to directly identify a person, organization, or
415	   host, most of the work on flow and trace data anonymisation has gone
416	   into IP address anonymisation techniques.  Indeed, the aim of most
417	   attacks against anonymisation is to recover the map from anonymised
418	   IP addresses to original IP addresses, thereby identifying the
419	   hosts.  There is therefore a wide range of IP address
420	   anonymisation schemes that fit into the following categories.

422	       +------------------------------------+---------------------+
423	       | Scheme                             | Action              |
424	       +------------------------------------+---------------------+
425	       | Truncation                         | Generalisation      |
426	       | Reverse Truncation                 | Generalisation      |
427	       | Permutation                        | Direct Substitution |
428	       | Prefix-preserving Pseudonymisation | Direct Substitution |
429	       +------------------------------------+---------------------+

431	4.1.1.  Truncation

433	   Truncation removes "n" of the least significant bits from an IP
434	   address, replacing them with zeroes.  In effect, it replaces a host
435	   address with a network address for some fixed netblock; for IPv4
436	   addresses, 8-bit truncation corresponds to replacement with a /24
437	   network address.  Truncation is a non-reversible generalisation
438	   scheme.  Note that while truncation is effective for making hosts
439	   non-identifiable, it preserves information which can be used to
440	   identify an organization, a geographic region, a country, or a
441	   continent (or RIR region of responsibility).

443	   Truncation to an address length of 0 is equivalent to black-marker
444	   anonymisation.  Complete removal of IP address information is only
445	   recommended for analysis tasks which have no need to separate flow
446	   data by host or network; e.g., as a first stage to per-application
447	   (port) or time-series total volume analyses.
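
   As an illustration of truncation, the following minimal sketch zeroes
   the "n" least significant bits of an address.  The function name
   truncate_ip is an assumption made for this example and is not defined
   by this document.

```python
import ipaddress


def truncate_ip(address: str, n: int) -> str:
    """Zero the n least significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    width = ip.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address width")
    mask = ((1 << width) - 1) & ~((1 << n) - 1)
    # Rebuild an address of the same family from the masked integer.
    return str(type(ip)(int(ip) & mask))
```

   For example, truncate_ip("192.0.2.77", 8) yields the /24 network
   address, while truncating all 32 bits of an IPv4 address leaves no
   address information at all, as with black-marker anonymisation.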

449	4.1.2.  Reverse Truncation

451	   Reverse truncation removes "n" of the most significant bits from an
452	   IP address, replacing them with zeroes.  Reverse truncation is a non-
453	   reversible generalisation scheme.  Reverse truncation is effective
454	   for making networks unidentifiable, partially or completely removing
455	   information which can be used to identify an organization, a
456	   geographic region, a country, or a continent (or RIR region of
457	   responsibility).  However, it may cause ambiguity when applied to
458	   data collected from more than one network, since it treats all the
459	   hosts with the same address on different networks as if they are the
460	   same host.  It is not particularly useful when publishing data where
461	   the network of origin is known or can be easily guessed by virtue of
462	   the identity of the publisher.

464	   Like truncation, reverse truncation to an address length of 0 is
465	   equivalent to black-marker anonymisation.
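
   Reverse truncation can be sketched in the same way, here zeroing the
   "n" most significant bits.  The function name reverse_truncate_ip is
   an assumption made for this example.

```python
import ipaddress


def reverse_truncate_ip(address: str, n: int) -> str:
    """Zero the n most significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    width = ip.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address width")
    mask = (1 << (width - n)) - 1  # keeps only the low (width - n) bits
    return str(type(ip)(int(ip) & mask))
```

   With n = 24, the addresses 192.0.2.77 and 198.51.100.77 both become
   0.0.0.77, which shows the ambiguity that arises when data from more
   than one network is combined.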

467	4.1.3.  Permutation

469	   Permutation is a direct substitution technique, replacing each IP
470	   address with an address selected from the set of possible IP
471	   addresses, guaranteeing that each anonymised address represents a
472	   unique original address.  The selection function is often random,
473	   though it is not necessarily so.  Permutation does not preserve any
474	   structural information about a network, but it does preserve the
475	   unique count of IP addresses.  Any application that requires more
476	   structure than host-uniqueness will not be able to use permuted IP
477	   addresses.
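
   One possible way to realise a random permutation is to build the
   mapping lazily, assigning each newly seen address a random
   replacement that has not yet been used.  The sketch below assumes
   IPv4 and an in-memory table; the class name RandomIPv4Permutation is
   hypothetical, and other constructions (for example, a keyed cipher
   over the address space) are equally valid.

```python
import ipaddress
import secrets


class RandomIPv4Permutation:
    """Lazily built random one-to-one mapping over IPv4 addresses."""

    def __init__(self):
        self._forward = {}  # real address (int) -> anonymised address (int)
        self._used = set()  # anonymised addresses already assigned

    def anonymise(self, address: str) -> str:
        real = int(ipaddress.IPv4Address(address))
        if real not in self._forward:
            candidate = secrets.randbits(32)
            # Retry on collision so no two real addresses share a replacement.
            while candidate in self._used:
                candidate = secrets.randbits(32)
            self._used.add(candidate)
            self._forward[real] = candidate
        return str(ipaddress.IPv4Address(self._forward[real]))
```

   Because the table is kept for the lifetime of the object, repeated
   occurrences of an address map to the same replacement and the count
   of unique addresses is preserved, but no prefix relationship
   survives.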

479	4.1.4.  Prefix-preserving Pseudonymisation

481	   Prefix-preserving pseudonymisation is a direct substitution
482	   technique, like permutation but further restricted such that the
483	   structure of subnets is preserved at each level while anonymising IP
484	   addresses.  If two real IP addresses match on a prefix of "n" bits,
485	   the two anonymised IP addresses will match on a prefix of "n" bits as
486	   well.  This is useful when relationships among networks must be
487	   preserved for a given analysis task, but introduces structure into
488	   the anonymised data which can be exploited in attacks against the
489	   anonymisation technique.

491	   Scanning in Internet background traffic can cause particular problems
492	   with this technique: if a scanner uses a predictable and known
493	   sequence of addresses, this information can be used to reverse the
494	   substitution.  The low order portion of the address can be left
495	   unanonymised as a partial defense against this attack.
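
   The prefix-preserving property can be illustrated with a bitwise
   construction: each output bit is the input bit, flipped or not
   according to a keyed pseudorandom function of the bits that precede
   it.  The sketch below is an assumption-laden illustration, not a
   scheme specified by this document: it handles IPv4 only, uses
   HMAC-SHA256 as the pseudorandom function, and the names
   prefix_preserving_ipv4 and common_prefix_len are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def prefix_preserving_ipv4(address: str, key: bytes) -> str:
    """Map an IPv4 address so that shared prefixes stay shared."""
    real = int(ipaddress.IPv4Address(address))
    anonymised = 0
    for i in range(32):
        prefix = real >> (32 - i)  # the i most significant bits (0 if i == 0)
        # Include the prefix length so equal values of different length differ.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (real >> (31 - i)) & 1
        anonymised = (anonymised << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(anonymised))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits on which two IPv4 addresses agree."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

   Under a fixed key, two addresses that match on exactly "n" leading
   bits produce anonymised addresses that also match on exactly "n"
   leading bits, since the flip applied at each position depends only on
   the preceding real bits.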

497	4.2.  Hardware Address Anonymisation

499	   Flow data containing sub-IP information can also contain identifying
500	   information in the form of the hardware (MAC) address.  While
501	   hardware address information cannot be used to locate a node within a
502	   network, it can be used to directly uniquely identify a specific
503	   device.  Vendors or organizations within the supply chain may then
504	   have the information necessary to identify the entity or individual
505	   that purchased the device.

507	   Hardware address information is not as structured as IP address
508	   information.  EUI-48 and EUI-64 hardware addresses contain an
509	   Organizationally Unique Identifier (OUI) in the three most
510	   significant bytes of the address; this OUI additionally contains bits
511	   noting whether the address is locally or globally administered.
512	   Beyond this, the address is unstructured, and there is no particular
513	   relationship among the OUIs assigned to a given vendor.

515	   Note that hardware address information also appears within IPv6
516	   addresses, as the EUI-64 address, or EUI-48 address encoded as an
517	   EUI-64 address, is used as the least significant 64 bits of the IPv6
518	   address in the case of link local addressing or stateless
519	   autoconfiguration; the considerations and techniques in this section
520	   may then apply to such IPv6 addresses as well.

522	           +-----------------------------+---------------------+
523	           | Scheme                      | Action              |
524	           +-----------------------------+---------------------+
525	           | Reverse Truncation          | Generalisation      |
526	           | Permutation                 | Direct Substitution |
527	           | Structured Pseudonymisation | Direct Substitution |
528	           +-----------------------------+---------------------+

530	4.2.1.  Reverse Truncation

532	   Reverse truncation removes "n" of the most significant bits from a
533	   MAC address, replacing them with zeroes.  Reverse truncation is a
534	   non-reversible generalisation scheme.  This has the effect of
535	   removing bits of the OUI, which identify manufacturers, before
536	   removing the least significant bits.  Reverse truncation of 24 bits
537	   zeroes out the OUI.

539	   Reverse truncation is effective for making device manufacturers
540	   partially or completely unidentifiable within a dataset.  However, it
541	   may cause ambiguity by introducing the possibility of truncated MAC
542	   address collision.  Also note that the utility of removing
543	   manufacturer information is dubious, and not particularly well-
544	   covered by the literature.

546	   Reverse truncation to an address length of 0 is equivalent to black-
547	   marker anonymisation.
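
As a minimal sketch of reverse truncation, assuming 48-bit MAC
addresses written as colon- or hyphen-separated hexadecimal (the
function name is hypothetical):

```python
def reverse_truncate_mac(mac: str, n: int) -> str:
    """Zero the n most significant bits of a 48-bit MAC address."""
    if not 0 <= n <= 48:
        raise ValueError("n must be between 0 and 48")
    value = int(mac.replace(":", "").replace("-", ""), 16)
    # Keep only the (48 - n) least significant bits.
    value &= (1 << (48 - n)) - 1
    raw = format(value, "012x")
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))
```

For example, truncating 24 bits of 00:1b:44:11:3a:b7 zeroes the OUI
and leaves 00:00:00:11:3a:b7; truncating all 48 bits leaves only
zeroes.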

549	4.2.2.  Permutation

551	   Permutation is a direct substitution technique, replacing each MAC
552	   address with an address selected from the set of possible MAC
553	   addresses, guaranteeing that each anonymised address represents a
554	   unique original address.  The selection function is often random,
555	   though it is not necessarily so.  Permutation does not preserve any
556	   structural information about a network, but it does preserve the
557	   unique count of devices on the network.  Any application that
558	   requires more structure than host-uniqueness will not be able to use
559	   permuted MAC addresses.

561	4.2.3.  Structured Pseudonymisation

563	   Structured pseudonymisation for MAC addresses is a direct
564	   substitution technique, like permutation, but restricted such that
565	   the OUI (the most significant three bytes) is permuted separately
566	   from the node identifier, the remainder.  This is useful when the
567	   uniqueness of OUIs must be preserved for a given analysis task, but
568	   introduces structure into the anonymised data which can be exploited
569	   in attacks against the anonymisation technique.
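
One possible reading of this scheme is sketched below: the OUI and
the node identifier are each passed through their own lazily built
random one-to-one mapping. The class names are hypothetical, and the
use of a single node-identifier mapping shared across all OUIs is an
assumption of this sketch rather than something the text requires.

```python
import secrets


class RandomPermutation:
    """Lazily built random one-to-one mapping over fixed-width integers."""

    def __init__(self, bits: int):
        self.bits = bits
        self.forward = {}
        self.used = set()

    def __call__(self, value: int) -> int:
        if value not in self.forward:
            candidate = secrets.randbits(self.bits)
            while candidate in self.used:  # never reuse an output value
                candidate = secrets.randbits(self.bits)
            self.used.add(candidate)
            self.forward[value] = candidate
        return self.forward[value]


class StructuredMacPseudonymiser:
    """Permute the 24-bit OUI and the 24-bit node identifier separately."""

    def __init__(self):
        self.oui_map = RandomPermutation(24)
        self.node_map = RandomPermutation(24)

    def __call__(self, mac: str) -> str:
        value = int(mac.replace(":", ""), 16)
        oui, node = value >> 24, value & 0xFFFFFF
        out = (self.oui_map(oui) << 24) | self.node_map(node)
        raw = format(out, "012x")
        return ":".join(raw[i:i + 2] for i in range(0, 12, 2))
```

Two devices with the same real OUI receive the same pseudonymous OUI,
so per-vendor counts survive while the vendor itself is hidden.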

571	4.3.  Timestamp Anonymisation

573	   The particular time at which a flow began or ended is not
574	   particularly identifiable information, but it can be used as part of
575	   attacks against other anonymisation techniques or for user profiling.
576	   Precise timestamps can be used in injected-traffic fingerprinting
577	   attacks as well as to identify certain activity by response delay and
578	   size fingerprinting.  Therefore, timestamp information may be
579	   anonymised in order to ensure the protection of the entire dataset.

581	          +-----------------------+----------------------------+
582	          | Scheme                | Action                     |
583	          +-----------------------+----------------------------+
584	          | Precision Degradation | Generalisation             |
585	          | Enumeration           | Direct or Set Substitution |
586	          | Random Shifts         | Direct Substitution        |
587	          +-----------------------+----------------------------+

589	4.3.1.  Precision Degradation

591	   Precision Degradation is a generalisation technique that removes the
592	   most precise components of a timestamp, accounting all events
593	   occurring in each given interval (e.g. one millisecond for
594	   millisecond level degradation) as simultaneous.  This has the effect
595	   of potentially collapsing many timestamps into one.  With this
596	   technique, time precision is reduced and sequencing may be lost, but
597	   the approximate time at which each event occurred is preserved.  The
598	   anonymised data may not be generally useful for applications which
599	   require strict sequencing of flows.
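
As a small illustration, assuming timestamps expressed as integer
milliseconds since an epoch (the function name and units are
assumptions of this sketch), degradation rounds each timestamp down
to the start of its interval:

```python
def degrade_timestamp(ts_ms: int, interval_ms: int) -> int:
    """Round a millisecond timestamp down to the start of its interval."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return ts_ms - (ts_ms % interval_ms)
```

With interval_ms set to 1000, the timestamps 1266235200123 and
1266235200999 both become 1266235200000, so their relative order can
no longer be recovered.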

601	   Note that flow meters with low time precision (e.g. second precision,
602	   or millisecond precision on high-capacity networks) perform the
603	   equivalent of precision degradation anonymisation by their design.

605	   Note also that degradation to a very low precision (e.g. on the order
606	   of minutes, hours, or days) is commonly used in analyses operating on
607	   time-series aggregated data, and may also be described as binning;
608	   though the time scales are longer and applicability more restricted,
609	   this is in principle the same operation.

611	   Precision degradation to infinitely low precision is equivalent to
612	   black-marker anonymisation.  Removal of timestamp information is only
613	   recommended for analysis tasks which have no need to separate flows
614	   in time, for example for counting total volumes or unique occurrences
615	   of other flow keys in an entire dataset.

617	4.3.2.  Enumeration

619	   Enumeration is a substitution function that retains the chronological
620	   order in which events occurred while eliminating time information.
621	   Timestamps are substituted by equidistant timestamps (or numbers)
622	   starting from a randomly chosen start value.  The resulting data is
623	   useful for applications requiring strict sequencing, but not for
624	   those requiring good timing information (e.g. delay- or jitter-
625	   measurement for QoS applications or SLA validation).
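
A minimal sketch of enumeration follows. The function name, the step
size, and the range of the random start value are assumptions; ties
between equal timestamps are broken by input order here, which is one
choice among several.

```python
import secrets


def enumerate_timestamps(timestamps, step=1):
    """Replace timestamps with equidistant values in chronological order."""
    start = secrets.randbelow(1_000_000)  # randomly chosen start value
    # Indices of the input, sorted by timestamp (stable for equal values).
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    result = [0] * len(timestamps)
    for rank, index in enumerate(order):
        result[index] = start + rank * step
    return result
```

The output keeps the ordering of the input but every gap between
consecutive events becomes exactly one step, so delay and jitter
cannot be measured from it.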

627	4.3.3.  Random Shifts

629	   Random time shifts add a random offset to every timestamp within a
630	   dataset.  This reversible substitution technique therefore retains
631	   duration and inter-event interval information as well as
632	   chronological order of flows.  It is primarily intended to defeat
633	   traffic injection fingerprinting attacks.
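
The following sketch assumes a single random offset drawn once per
dataset and applied to every timestamp, which is what allows
durations and inter-event intervals to survive. The function name
and the max_shift bound are hypothetical.

```python
import secrets


def shift_timestamps(timestamps, max_shift):
    """Add one random offset in [-max_shift, max_shift] to all values."""
    offset = secrets.randbelow(2 * max_shift + 1) - max_shift
    return [t + offset for t in timestamps], offset
```

The returned offset is the secret that reverses the shift, so it
would need the same protection as any other anonymisation parameter.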

635	4.4.  Counter Anonymisation

637	   Counters (such as packet and octet volumes per flow) are subject to
638	   fingerprinting and injection attacks against anonymisation, and to
639	   use in user profiling, as timestamps are.  Counter anonymisation can
640	   help defeat these attacks, but is only usable for analysis tasks for
641	   which relative or imprecise magnitudes of activity are useful.

643	   Counter information can also be completely removed, but this is only
644	   recommended for analysis tasks which have no need to evaluate the
645	   removed counter, for example for counting only unique occurrences of
646	   other flow keys.

648	          +-----------------------+----------------------------+
649	          | Scheme                | Action                     |
650	          +-----------------------+----------------------------+
651	          | Precision Degradation | Generalisation             |
652	          | Binning               | Generalisation             |
653	          | Random noise addition | Direct or Set Substitution |
654	          +-----------------------+----------------------------+

656	4.4.1.  Precision Degradation

658	   As with precision degradation in timestamps, precision degradation of
659	   counters removes lower-order bits of the counters, treating all the
660	   counters in a given range as having the same value.  Depending on the
661	   precision reduction, this loses information about the relationships
662	   between sizes of similarly-sized flows, but keeps relative magnitude
663	   information.  Precision degradation to an infinitely low precision is
664	   equivalent to black-marker anonymisation.

666	4.4.2.  Binning

668	   Binning can be seen as a special case of precision degradation; the
669	   operation is identical, except that in precision degradation the
670	   counter ranges are uniform, and in binning they need not be.  For
671	   example, a common counter binning scheme for packet counters could be
672	   to bin values 1-2 together, and 3-infinity together, thereby
673	   separating potentially completely-opened TCP connections from
674	   unopened ones.  Binning schemes are generally chosen to keep
675	   precisely the amount of information required in a counter for a given
676	   analysis task.  Note that, also unlike precision degradation, the bin
677	   label need not be within the bin's range.  Binning counters to a
678	   single bin is equivalent to black-marker anonymisation.
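
The packet-counter example above can be sketched as a lookup over
non-uniform bin boundaries. The function name, the representation of
boundaries as a sorted list of lower bounds, and the string labels
are assumptions of this sketch; counters are assumed to be at least 1.

```python
import bisect


def bin_counter(value, lower_bounds, labels):
    """Map a counter to the label of the bin that contains it.

    lower_bounds holds the sorted lower bounds of every bin except the
    first, so len(labels) == len(lower_bounds) + 1.
    """
    return labels[bisect.bisect_right(lower_bounds, value)]


# Bins from the text: 1-2 packets, and 3 packets or more.
PACKET_LOWER_BOUNDS = [3]
PACKET_LABELS = ["1-2", "3+"]
```

Calling bin_counter(2, PACKET_LOWER_BOUNDS, PACKET_LABELS) yields the
first label, while any count of 3 or more yields the second.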

680	4.4.3.  Random Noise Addition

682	   Random noise addition adds a random amount to a counter in each flow;
683	   this is used to keep relative magnitude information and minimize the
684	   disruption to size relationship information while avoiding
685	   fingerprinting attacks against anonymisation.  Note that there is no
686	   guarantee that random noise addition will maintain ranking order by a
687	   counter among members of a set.  Random noise addition is
688	   particularly useful when the derived analysis data will not be
689	   presented in such a way as to require the lower-order bits of the
690	   counters.
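
A minimal sketch of random noise addition is shown below, assuming
uniformly distributed integer noise bounded by a hypothetical
max_noise parameter and a floor of zero so that counters never go
negative. Neither the distribution nor the bound is specified by the
text.

```python
import random


def add_noise(counter, max_noise, rng=None):
    """Add uniform integer noise in [-max_noise, max_noise] to a counter."""
    rng = rng or random.SystemRandom()
    noisy = counter + rng.randint(-max_noise, max_noise)
    return max(noisy, 0)  # assumption: counters are never negative
```

Two flows whose counters differ by less than twice max_noise may
swap rank after this step, which is the loss of ordering noted above.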

692	4.5.  Anonymisation of Other Flow Fields

694	   Other fields, particularly port numbers and protocol numbers, can be
695	   used to partially identify the applications that generated the
696	   traffic in a given flow trace.  This information can be used in
697	   fingerprinting attacks, and may be of interest on its own (e.g., to
698	   reveal that a certain application with suspected vulnerabilities is
699	   running on a given network).  These fields are generally anonymised
700	   using one of two techniques.

702	                   +-------------+---------------------+
703	                   | Scheme      | Action              |
704	                   +-------------+---------------------+
705	                   | Binning     | Generalisation      |
706	                   | Permutation | Direct Substitution |
707	                   +-------------+---------------------+

709	4.5.1.  Binning

711	   Binning is a generalisation technique mapping a set of potentially
712	   non-uniform ranges into a set of arbitrarily labeled bins.  Common
713	   bin arrangements depend on the field type and the analysis
714	   application.  For example, an IP protocol bin arrangement may
715	   preserve 1, 6, and 17 for ICMP, TCP, and UDP traffic, and bin all
716	   other protocols into a single bin, to mitigate the use of uncommon
717	   protocols in fingerprinting attacks.  Another example arrangement may
718	   bin source and destination ports into low (0-1023) and high (1024-
719	   65535) bins in order to tell service from ephemeral ports without
720	   identifying individual applications.

722	   Binning other flow key fields to a single bin is equivalent to black-
723	   marker anonymisation.  Removal of other flow key information is only
724	   recommended for analysis tasks which have no need to differentiate
725	   flows on the removed keys, for example for total traffic counts or
726	   unique counts of other flow keys.
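
The two example arrangements above can be sketched as follows. The
function names and the bin labels (255 for "other protocol", 0 and
1024 for the port bins) are arbitrary choices made for this
illustration.

```python
OTHER_PROTOCOL = 255  # arbitrary label for the catch-all protocol bin
LOW_PORT_BIN = 0      # arbitrary label for ports 0-1023
HIGH_PORT_BIN = 1024  # arbitrary label for ports 1024-65535


def bin_protocol(protocol: int) -> int:
    """Keep ICMP (1), TCP (6), and UDP (17); bin everything else together."""
    return protocol if protocol in (1, 6, 17) else OTHER_PROTOCOL


def bin_port(port: int) -> int:
    """Reduce a port number to its low or high bin."""
    return LOW_PORT_BIN if port <= 1023 else HIGH_PORT_BIN
```

After binning, a flow to port 443 and a flow to port 22 are
indistinguishable by port, but both remain distinguishable from a
flow to an ephemeral port.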

728	4.5.2.  Permutation

730	   Permutation is a direct substitution technique, replacing each value
731	   with a value selected from the set of possible values, guaranteeing
732	   that each anonymised value represents a unique original value.  This
733	   is used to preserve the count of unique values without preserving
734	   information about, or the ordering of, the values themselves.
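
For a small value space such as 16-bit port numbers, a permutation
can be sketched as a shuffled lookup table. The function name is
hypothetical, and the operating system's random source is used here
as an assumption about what an implementation might choose.

```python
import random


def build_permutation(size, rng=None):
    """Return a list where table[original] is the substituted value."""
    rng = rng or random.SystemRandom()
    table = list(range(size))
    rng.shuffle(table)  # every value appears exactly once
    return table
```

Because the table contains each value exactly once, the number of
distinct ports seen in a dataset is unchanged after substitution,
while the table itself is the secret that must not be exported.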

736	5.  Parameters for the Description of Anonymisation Techniques

738	   This section details the abstract parameters used to describe the
739	   anonymisation techniques examined in the previous section, on a per-
740	   parameter basis.  These parameters and their export safety inform the
741	   design of the IPFIX anonymisation metadata export specified in the
742	   following section.

744	5.1.  Stability

746	   Any given anonymisation technique may be applied with a varying range
747	   of stability.  Stability is important for assessing the comparability
748	   of anonymised information in different data sets, or in the same data
749	   set over different time periods.  In general, stability ranges from
750	   completely stable to completely unstable; however, note that the
751	   completely unstable case is indistinguishable from black-marker
752	   anonymisation.  A completely stable anonymisation will always map a
753	   given value in the real space to the same value in the anonymised
754	   space.  In practice, an anonymisation may also be stable for every
755	   data set published by a particular producer to a particular
756	   consumer, stable for a stated time period within a dataset or across
757	   datasets, or stable only for a single data set.

759	   If no information about stability is available, users of anonymised
760	   data may assume that the techniques used are stable across the entire
761	   dataset, but unstable across datasets.  Note that stability presents
762	   a risk-utility tradeoff, as completely stable anonymisation can be
763	   used for longer-term trend analysis tasks but also presents more risk
764	   of attack given the stable mapping.

766	5.2.  Truncation Length

768	   Truncation and precision degradation are described by the truncation
769	   length, or the amount of data still remaining in the anonymised field
770	   after anonymisation.

772	   Truncation length can be inferred from a given data set, and need not
773	   be specially exported or protected.

775	5.3.  Bin Map

777	   Binning is described by the specification of a bin mapping function.
778	   This function can be generally expressed in terms of an associative
779	   array that maps each point in the original space to a bin, although
780	   from an implementation standpoint most bin functions are much simpler
781	   and more efficient.

783	   Since knowledge of the bin mapping function can be used to partially
784	   deanonymise binned data, depending on the degree of generalisation,
785	   no information about the bin mapping function should be exported.

787	5.4.  Permutation

789	   Like binning, permutation is described by the specification of a
790	   permutation function.  In the general case, this can be expressed in
791	   terms of an associative array that maps each point in the original
792	   space to a point in the anonymised space.  Unlike binning, each point
793	   in the anonymised space must correspond to a single, unique point in
794	   the original space.

796	   Since knowledge of the permutation function can be used to completely
797	   deanonymise permuted data, no information about the permutation
798	   function or its parameters should be exported.

800	5.5.  Shift Amount

802	   Shifting requires an amount to shift each value by.  Since the shift
803	   amount can be used to deanonymise data protected by shifting, no
804	   information about the shift amount should be exported.

806	6.  Anonymisation Export Support in IPFIX

808	   Anonymised data exported via IPFIX SHOULD be annotated with
809	   anonymisation metadata, which details which fields described by which
810	   Templates are anonymised, and provides appropriate information on the
811	   anonymisation techniques used.  This metadata SHOULD be exported in
812	   Data Records described by the recommended Options Templates described
813	   in this section; these Options Templates use the additional
814	   Information Elements described in the following subsection.

816	   Note that fields anonymised using the black-marker (removal)
817	   technique do not require any special metadata support.  Black-marker
818	   anonymised fields SHOULD NOT be exported at all; the absence of the
819	   field in a given Data Set is implicitly declared by not including the
820	   corresponding Information Element in the Template describing that
821	   Data Set.

823	6.1.  Anonymisation Options Template

825	   The Anonymisation Options Template describes anonymisation records,
826	   which allow anonymisation metadata to be exported inline over IPFIX
827	   or stored in an IPFIX File, by binding information about
828	   anonymisation techniques to Information Elements within defined
829	   Templates.  IPFIX Exporting Processes SHOULD export anonymisation
830	   records for any Template describing exported anonymised Data Records;
831	   IPFIX Collecting Processes and processes downstream from them MAY use
832	   anonymisation records to treat anonymised data differently depending
833	   on the applied technique.

835	   An Exporting Process SHOULD export anonymisation records after the
836	   Templates they describe have been exported, and SHOULD export
837	   anonymisation records reliably.

839	   Anonymisation records, like Templates, MUST be handled by Collecting
840	   Processes as scoped to the Transport Session in which they are sent.
841	   While the Stability Class within the anonymisationFlags IE can be
842	   used to declare that a given anonymisation technique's mapping will
843	   remain stable across multiple sessions, each session MUST re-export
844	   the anonymisation records along with the Templates.

846	   +-------------------------+-----------------------------------------+
847	   | IE                      | Description                             |
848	   +-------------------------+-----------------------------------------+
849	   | templateId [scope]      | The Template ID of the Template         |
850	   |                         | containing the Information Element      |
851	   |                         | described by this anonymisation record. |
852	   |                         | This Information Element MUST be        |
853	   |                         | defined as a Scope Field.               |
854	   | informationElementId    | The Information Element identifier of   |
855	   | [scope]                 | the Information Element described by    |
856	   |                         | this anonymisation record.  This        |
857	   |                         | Information Element MUST be defined as  |
858	   |                         | a Scope Field.                          |
859	   | privateEnterpriseNumber | The Private Enterprise Number of the    |
860	   | [scope] [optional]      | enterprise-specific Information Element |
861	   |                         | described by this anonymisation record. |
862	   |                         | This Information Element MUST be        |
863	   |                         | defined as a Scope Field if present.    |
864	   | informationElementIndex | The Information Element index of the    |
865	   | [scope] [optional]      | instance of the Information Element     |
866	   |                         | described by this anonymisation record  |
867	   |                         | identified by the informationElementId  |
868	   |                         | within the Template.  Optional; need    |
869	   |                         | only be present when describing         |
870	   |                         | Templates that have multiple instances  |
871	   |                         | of the same Information Element.  This  |
872	   |                         | Information Element MUST be defined as  |
873	   |                         | a Scope Field if present.  This         |
874	   |                         | Information Element is defined in       |
875	   |                         | Section 6.2, below.                     |
876	   | anonymisationFlags      | Flags describing the mapping stability  |
877	   |                         | and specialized modifications to the    |
878	   |                         | Anonymisation Technique in use.  SHOULD |
879	   |                         | be present.  This Information Element   |
880	   |                         | is defined in Section 6.2, below.       |
881	   | anonymisationTechnique  | The technique used to anonymise the     |
882	   |                         | data.  MUST be present.  This           |
883	   |                         | Information Element is defined in       |
884	   |                         | Section 6.2, below.                     |
885	   +-------------------------+-----------------------------------------+

887	6.2.  Recommended Information Elements for Anonymisation Metadata

889	6.2.1.  informationElementIndex

891	   Description:   A zero-based index of an Information Element
892	      referenced by informationElementId within a Template referenced by
893	      templateId; used to disambiguate scope for templates containing
894	      multiple identical Information Elements.

896	   Abstract Data Type:   unsigned16

898	   ElementId:   TBD3

900	   Status:   Proposed

902	6.2.2.  anonymisationFlags

904	   Description:   A flag word describing specialized modifications to
905	      the anonymisation policy in effect for the anonymisation technique
906	      applied to a referenced Information Element within a referenced
907	      Template.  When flags are clear (0), the normal policy (as
908	      described by anonymisationTechnique) applies without modification.

910	      MSB   14  13  12  11  10   9   8   7   6   5   4   3   2   1  LSB
911	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
912	      |                Reserved                       |LOR|PmA|   SC  |
913	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

915	                            anonymisationFlags IE

917	   +--------+----------+-----------------------------------------------+
918	   | bit(s) | name     | description                                   |
919	   | (LSB = |          |                                               |
920	   | 0)     |          |                                               |
921	   +--------+----------+-----------------------------------------------+
922	   | 0-1    | SC       | Stability Class: see the Stability Class      |
923	   |        |          | table below, and Section 5.1.                 |
924	   | 2      | PmA      | Perimeter Anonymisation: when set (1), source |
925	   |        |          | address Information Elements are interpreted  |
926	   |        |          | as external addresses, and destination        |
927	   |        |          | address Information Elements are interpreted  |
928	   |        |          | as internal addresses, for the purposes of    |
929	   |        |          | associating anonymisationTechnique to         |
930	   |        |          | Information Elements.  MUST NOT be set when   |
931	   |        |          | associated with a non-endpoint (i.e., source- |
932	   |        |          | or destination-) Information Element.  SHOULD |
933	   |        |          | be consistent within a record (i.e., if a     |
934	   |        |          | source- Information Element has this flag     |
935	   |        |          | set, the corresponding destination- element   |
936	   |        |          | SHOULD have this flag set, and vice-versa.)   |
937	   | 3      | LOR      | Low-Order Unchanged: when set (1), the        |
938	   |        |          | low-order bits of the anonymised Information  |
939	   |        |          | Element contain real data.  This modification |
940	   |        |          | is intended for the anonymisation of          |
941	   |        |          | network-level addresses while leaving         |
942	   |        |          | host-level addresses intact in order to       |
943	   |        |          | preserve host-level structure, which could    |
944	   |        |          | otherwise be used to reverse anonymisation.   |
945	   |        |          | MUST NOT be set when associated with a        |
946	   |        |          | truncation-based anonymisationTechnique.      |
947	   | 4-15   | Reserved | Reserved for future use: SHOULD be cleared    |
948	   |        |          | (0) by the Exporting Process and MUST be      |
949	   |        |          | ignored by the Collecting Process.            |
950	   +--------+----------+-----------------------------------------------+

952	      The Stability Class portion of this flags word describes the
953	      stability class of the anonymisation technique applied to a
954	      referenced Information Element within a referenced Template.
955	      Stability classes refer to the stability of the parameters of the
956	      anonymisation technique, and therefore the comparability of the
957	      mapping between the real and anonymised values over time.  This
958	      determines which anonymised datasets may be compared with each
959	      other.  Values are as follows:

961	   +-----+-----+-------------------------------------------------------+
962	   | Bit | Bit | Description                                           |
963	   | 1   | 0   |                                                       |
964	   +-----+-----+-------------------------------------------------------+
965	   | 0   | 0   | Undefined: the Exporting Process makes no             |
966	   |     |     | representation as to how stable the mapping is, or    |
967	   |     |     | over what time period values of this field will       |
968	   |     |     | remain comparable; while the Collecting Process MAY   |
969	   |     |     | assume Session level stability, Session level         |
970	   |     |     | stability is not guaranteed.  Processes SHOULD assume |
971	   |     |     | this is the case in the absence of stability class    |
972	   |     |     | information; this is the default stability class.     |
973	   | 0   | 1   | Session: the Exporting Process will ensure that the   |
974	   |     |     | parameters of the anonymisation technique are stable  |
975	   |     |     | during the Transport Session.  All the values of the  |
976	   |     |     | described Information Element for each Record         |
977	   |     |     | described by the referenced Template within the       |
978	   |     |     | Transport Session are comparable.  The Exporting      |
979	   |     |     | Process SHOULD endeavour to ensure at least this      |
980	   |     |     | stability class.                                      |
981	   | 1   | 0   | Exporter-Collector Pair: the Exporting Process will   |
982	   |     |     | ensure that the parameters of the anonymisation       |
983	   |     |     | technique are stable across Transport Sessions over   |
984	   |     |     | time with the given Collecting Process, but may use   |
985	   |     |     | different parameters for different Collecting         |
986	   |     |     | Processes.  Data exported to different Collecting     |
987	   |     |     | Processes is not comparable.                          |
988	   | 1   | 1   | Stable: the Exporting Process will ensure that the    |
989	   |     |     | parameters of the anonymisation technique are stable  |
990	   |     |     | across Transport Sessions over time, regardless of    |
991	   |     |     | the Collecting Process to which it is sent.           |
992	   +-----+-----+-------------------------------------------------------+

994	   Abstract Data Type:   unsigned16

996	   ElementId:   TBD1

998	   Status:   Proposed
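
As an illustration of the bit layout defined above, a Collecting
Process might decode an anonymisationFlags value along these lines.
The function and dictionary key names are hypothetical; the bit
positions and Stability Class values are taken from the tables in
this section.

```python
STABILITY_CLASSES = {
    0: "Undefined",
    1: "Session",
    2: "Exporter-Collector Pair",
    3: "Stable",
}


def decode_anonymisation_flags(flags: int) -> dict:
    """Split a 16-bit anonymisationFlags word into its defined fields."""
    return {
        "stability_class": STABILITY_CLASSES[flags & 0x0003],  # bits 0-1
        "perimeter": bool(flags & 0x0004),                     # bit 2, PmA
        "low_order_unchanged": bool(flags & 0x0008),           # bit 3, LOR
        # Bits 4-15 are reserved and deliberately ignored.
    }
```

A value of 0x000D, for instance, decodes to Session stability with
both the PmA and LOR flags set.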

1000	6.2.3.  anonymisationTechnique

1002	   Description:   A description of the anonymisation technique applied
1003	      to a referenced Information Element within a referenced Template.
1004	      Each technique may be applicable only to certain Information
1005	      recommended only for certain Information Elements;
1006	      these restrictions are noted in the table below.

1008	   +-------+---------------------------+-----------------+-------------+
1009	   | Value | Description               | Applicable to   | Recommended |
1010	   |       |                           |                 | for         |
1011	   +-------+---------------------------+-----------------+-------------+
1012	   | 0     | Undefined: the Exporting  | all             | all         |
1013	   |       | Process makes no          |                 |             |
1014	   |       | representation as to      |                 |             |
1015	   |       | whether the defined field |                 |             |
1016	   |       | is anonymised or not.     |                 |             |
1017	   |       | While the Collecting      |                 |             |
1018	   |       | Process MAY assume that   |                 |             |
1019	   |       | the field is not          |                 |             |
1020	   |       | anonymised, it is not     |                 |             |
1021	   |       | guaranteed not to be.     |                 |             |
1022	   |       | This is the default       |                 |             |
1023	   |       | anonymisation technique.  |                 |             |
1024	   | 1     | None: the values exported | all             | all         |
1025	   |       | are real.                 |                 |             |
1026	   | 2     | Precision                 | all             | all         |
1027	   |       | Degradation/Truncation:   |                 |             |
1028	   |       | the values exported are   |                 |             |
1029	   |       | anonymised using simple   |                 |             |
1030	   |       | precision degradation or  |                 |             |
1031	   |       | truncation.  The new      |                 |             |
1032	   |       | precision or number of    |                 |             |
1033	   |       | truncated bits is         |                 |             |
1034	   |       | implicit in the exported  |                 |             |
1035	   |       | data, and can be deduced  |                 |             |
1036	   |       | by the Collecting         |                 |             |
1037	   |       | Process.                  |                 |             |
1038	   | 3     | Binning: the values       | all             | all         |
1039	   |       | exported are anonymised   |                 |             |
1040	   |       | into bins.                |                 |             |
1041	   | 4     | Enumeration: the values   | all             | timestamps  |
1042	   |       | exported are anonymised   |                 |             |
1043	   |       | by enumeration.           |                 |             |
1044	   | 5     | Permutation: the values   | all             | identifiers |
1045	   |       | exported are anonymised   |                 |             |
1046	   |       | by random permutation.    |                 |             |
1047	   | 6     | Structured Permutation:   | addresses       |             |
1048	   |       | the values exported are   |                 |             |
1049	   |       | anonymised by random      |                 |             |
1050	   |       | permutation, preserving   |                 |             |
1051	   |       | bit-level structure as    |                 |             |
1052	   |       | appropriate; this         |                 |             |
1053	   |       | represents                |                 |             |
1054	   |       | prefix-preserving IP      |                 |             |
1055	   |       | address anonymisation or  |                 |             |
1056	   |       | structured MAC address    |                 |             |
1057	   |       | anonymisation.            |                 |             |
1058	   | 7     | Reverse Truncation: the   | addresses       |             |
1059	   |       | values exported are       |                 |             |
1060	   |       | anonymised using reverse  |                 |             |
1061	   |       | truncation.  The number   |                 |             |
1062	   |       | of truncated bits is      |                 |             |
1063	   |       | implicit in the exported  |                 |             |
1064	   |       | data, and can be deduced  |                 |             |
1065	   |       | by the Collecting         |                 |             |
1066	   |       | Process.                  |                 |             |
1067	   | 8     | Noise: the values         | non-identifiers | counters    |
1068	   |       | exported are anonymised   |                 |             |
1069	   |       | by adding random noise to |                 |             |
1070	   |       | each value.               |                 |             |
1071	   | 9     | Offset: the values        | all             | timestamps  |
1072	   |       | exported are anonymised   |                 |             |
1073	   |       | by adding a single offset |                 |             |
1074	   |       | to all values.            |                 |             |
1075	   +-------+---------------------------+-----------------+-------------+

1077	   Abstract Data Type:   unsigned16

1079	   ElementId:   TBD2

1081	   Status:   Proposed

1083	7.  Applying Anonymisation Techniques to IPFIX Export and Storage

1085	   When exporting or storing anonymised flow data using IPFIX, certain
1086	   interactions between the IPFIX Protocol and the anonymisation
1087	   techniques in use must be considered; these are treated in the
1088	   subsections below.

1090	7.1.  Arrangement of Processes in IPFIX Anonymisation

1092	   Anonymisation may be applied to IPFIX data at three stages within the
1093	   collection infrastructure: on initial export, at a mediator, or after
1094	   collection, as shown in Figure 2.  Each of these locations has
1095	   specific considerations and applicability.

1097	               +==========================================+
1098	               | Exporting Process                        |
1099	               +==========================================+
1100	                 |                                      |
1101	                 |    (Anonymised at Original Exporter) |
1102	                 V                                      |
1103	               +=============================+          |
1104	               | Mediator                    |          |
1105	               +=============================+          |
1106	                 |                                      |
1107	                 | (Anonymising Mediator)               |
1108	                 V                                      V
1109	               +==========================================+
1110	               | Collecting Process                       |
1111	               +==========================================+
1112	                       |
1113	                       | (Anonymising CP/File Writer)
1114	                       V
1115	               +--------------------+
1116	               | IPFIX File Storage |
1117	               +--------------------+

1119	                Figure 2: Potential Anonymisation Locations

1121	   Anonymisation is generally performed before the wider dissemination
1122	   or repurposing of a flow data set, e.g., adapting operational
1123	   measurement data for research.  Therefore, direct anonymisation of
1124	   flow data on initial export is only applicable in certain restricted
1125	   circumstances: when the Exporting Process is "publishing" data to a
1126	   Collecting Process directly, and the Exporting Process and Collecting
1127	   Process are operated by different entities.  Note that certain
1128	   guidelines in Section 7.2.3 with respect to timestamp anonymisation
1129	   may not apply in this case, as the Collecting Process may be able to
1130	   deduce certain timing information from the time at which each Message
1131	   is received.

1133	   A much more flexible arrangement is to anonymise data within a
1134	   Mediator [I-D.ietf-ipfix-mediators-framework].  Here, original data
1135	   is sent to a Mediator, which performs the anonymisation function and
1136	   re-exports the anonymised data.  Such a Mediator could be located at
1137	   the administrative domain boundary of the initial Exporting Process
1138	   operator, exporting anonymised data to other consumers outside the
1139	   organisation.  In this case, the original Exporter SHOULD use TLS as
1140	   specified in [RFC5101] to secure the channel to the Mediator, and the
1141	   Mediator should follow the guidelines in Section 7.2, to mitigate the
1142	   risk of original data disclosure.

1144	   When data is to be published as an anonymised data set in an IPFIX
1145	   File [RFC5655], the anonymisation may be done at the final Collecting
1146	   Process before storage and dissemination, as well.  In this case, the
1147	   Collector should follow the guidelines in Section 7.2, especially as
1148	   regards File-specific Options in Section 7.2.4.

1150	   In each of these data flows, the anonymisation of records is
1151	   undertaken by an Intermediate Anonymisation Process (IAP); the data
1152	   flows into and out of this IAP are shown in Figure 3 below.

1154	   packets --+                     +- IPFIX Messages -+
1155	             |                     |                  |
1156	             V                     V                  V
1157	   +==================+ +====================+ +=============+
1158	   | Metering Process | | Collecting Process | | File Reader |
1159	   +==================+ +====================+ +=============+
1160	             |      Non-anonymised | Records          |
1161	             V                     V                  V
1162	   +=========================================================+
1163	   |          Intermediate Anonymisation Process (IAP)       |
1164	   +=========================================================+
1165	             | Anonymised     ^            Anonymised |
1166	             | Records        |               Records |
1167	             V                |                       V
1168	   +===================+    Anonymisation      +=============+
1169	   | Exporting Process |<--- Parameters ------>| File Writer |
1170	   +===================+                       +=============+
1171	             |                                        |
1172	             +------------> IPFIX Messages <----------+

1174	          Figure 3: Data flows through the anonymisation process

1176	   Anonymisation parameters must also be available to the Exporting
1177	   Process and/or File Writer in order to ensure header data is also
1178	   appropriately anonymised as in Section 7.2.3.

1180	   Following each of the data flows through the IAP, we describe five
1181	   basic types of anonymisation arrangements within this framework in
1182	   Figure 4.  In addition to the three arrangements described in detail
1183	   above, anonymisation can also be done at a collocated Metering
1184	   Process and File Writer (see section 7.3.2 of [RFC5655]), or at a
1185	   file manipulator (see section 7.3.7 of [RFC5655]).

1187	         +----+  +-----+  +----+
1188	 pkts -> | MP |->| IAP |->| EP |-> Anonymisation on Original Exporter
1189	         +----+  +-----+  +----+
1190	         +----+  +-----+  +----+
1191	 pkts -> | MP |->| IAP |->| FW |-> Anonymising collocated MP/File Writer
1192	         +----+  +-----+  +----+
1193	         +----+  +-----+  +----+
1194	IPFIX -> | CP |->| IAP |->| EP |-> Anonymising Mediator (Masquerading Proxy)
1195	         +----+  +-----+  +----+
1196	         +----+  +-----+  +----+
1197	IPFIX -> | CP |->| IAP |->| FW |-> Anonymising collocated CP/File Writer
1198	         +----+  +-----+  +----+
1199	         +----+  +-----+  +----+
1200	IPFIX -> | FR |->| IAP |->| FW |-> Anonymising file manipulator
1201	 File    +----+  +-----+  +----+

1203	        Figure 4: Possible anonymisation arrangements in the IPFIX
1204	                               architecture

1206	   Note that anonymisation may occur at more than one location within a
1207	   given collection infrastructure, to provide varying levels of
1208	   anonymisation, disclosure risk, or data utility for specific
1209	   purposes.

1211	7.2.  IPFIX-Specific Anonymisation Guidelines

1213	   In implementing and deploying the anonymisation techniques described
1214	   in this document, implementors should note that IPFIX already
1215	   provides features that support anonymised data export, and use these
1216	   where appropriate.  Care must also be taken that data structures
1217	   supporting the operation of the protocol itself do not leak data that
1218	   could be used to reverse the anonymisation applied to the flow data.
1219	   Such data structures may appear in the header, or within the data
1220	   stream itself, especially as options data.  Each of these and their
1221	   impact on specific anonymisation techniques is noted in a separate
1222	   subsection below.

1224	7.2.1.  Appropriate Use of Information Elements for Anonymised Data

1226	   Note, as in Section 6 above, that black-marker anonymised fields
1227	   SHOULD NOT be exported at all; the absence of the field in a given
1228	   Data Set is implicitly declared by not including the corresponding
1229	   Information Element in the Template describing that Data Set.

1231	   When using precision degradation of timestamps, Exporting Processes
1232	   SHOULD export timing information using Information Elements of an
1233	   appropriate precision, as explained in Section 4.5 of [RFC5153].  For
1234	   example, timestamps measured in millisecond-level precision and
1235	   degraded to second-level precision should use flowStartSeconds and
1236	   flowEndSeconds, not flowStartMilliseconds and flowEndMilliseconds.
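
	   As a non-normative illustration of this guideline, the following
	   Python sketch degrades a millisecond-precision flow start time to
	   second-level precision and pairs it with the Information Element of
	   matching precision.  The function name and the return shape are
	   assumptions made for this example only.

```python
from typing import Tuple


def degrade_flow_start(flow_start_ms: int) -> Tuple[str, int]:
    """Truncate a millisecond timestamp to whole seconds.

    Returns the name of the Information Element of matching precision
    together with the degraded value, so that no sub-second digits
    (not even zeroes) are exported.
    """
    return ("flowStartSeconds", flow_start_ms // 1000)


name, value = degrade_flow_start(1266192000123)
print(name, value)
```

	   Exporting the degraded value in flowStartMilliseconds instead would
	   suggest a precision that the data no longer has.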

1238	   When exporting anonymised data and anonymisation metadata, Exporting
1239	   Processes SHOULD ensure that the combination of Information Element
1240	   and declared anonymisation technique are compatible.  Specifically,
1241	   the applicable and recommended Information Element types and
1242	   semantics for each technique are noted in the description of the
1243	   anonymisationTechnique Information Element in Section 6.2.3.  In this
1244	   description, a timestamp is an Information Element with the data type
1245	   dateTimeSeconds, dateTimeMilliseconds, dateTimeMicroseconds, or
1246	   dateTimeNanoseconds; an address is an Information Element with the
1247	   data type ipv4Address, ipv6Address, or macAddress; and an identifier
1248	   is an Information Element with identifier data type semantics.
1249	   Exporting Processes MUST NOT export Anonymisation Options records
1250	   binding techniques to Information Elements to which they are not
1251	   applicable, and SHOULD NOT export Anonymisation Options records
1252	   binding techniques to Information Elements for which they are not
1253	   recommended.
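
	   The compatibility rule above can be sketched as a simple lookup.
	   The following Python fragment is illustrative only: the table rows
	   are transcribed from the anonymisationTechnique description, while
	   the function names and the representation of an Information Element
	   as a set of "kinds" are assumptions of this example.  The
	   recommended column is empty for techniques 6 and 7 in the table; the
	   sketch assumes it equals the applicable column.

```python
# technique value -> (applicable to, recommended for)
# ASSUMPTION: for techniques 6 and 7 the empty "recommended" column is
# treated as identical to the "applicable" column.
TECHNIQUES = {
    2: ("all", "all"),                   # Precision Degradation/Truncation
    3: ("all", "all"),                   # Binning
    4: ("all", "timestamps"),            # Enumeration
    5: ("all", "identifiers"),           # Permutation
    6: ("addresses", "addresses"),       # Structured Permutation
    7: ("addresses", "addresses"),       # Reverse Truncation
    8: ("non-identifiers", "counters"),  # Noise
    9: ("all", "timestamps"),            # Offset
}


def matches(category: str, ie_kinds: set) -> bool:
    """Return True if an Information Element described by ie_kinds
    (e.g. {"addresses", "identifiers"}) falls into the table category."""
    if category == "all":
        return True
    if category == "non-identifiers":
        return "identifiers" not in ie_kinds
    return category in ie_kinds


def check_binding(technique: int, ie_kinds: set) -> str:
    """Classify the binding of a technique to an Information Element."""
    applicable, recommended = TECHNIQUES[technique]
    if not matches(applicable, ie_kinds):
        return "not applicable"    # such a record MUST NOT be exported
    if not matches(recommended, ie_kinds):
        return "not recommended"   # such a record SHOULD NOT be exported
    return "ok"
```

	   For instance, check_binding(7, {"timestamps"}) yields "not
	   applicable", and check_binding(9, {"counters"}) yields "not
	   recommended".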

1255	7.2.2.  Export of Perimeter-Based Anonymisation Policies

1257	   Data collected from a single network may require different
1258	   anonymisation policies for addresses internal and external to the
1259	   network.  For example, internal addresses could be subject to simple
1260	   permutation, while external addresses could be aggregated into
1261	   networks by truncation.  When exporting anonymised perimeter
1262	   bidirectional flow (biflow) data as in section 5.2 of [RFC5103], this
1263	   arrangement may be easily represented by specifying one technique for
1264	   source endpoint information (which represents the external endpoint
1265	   in a perimeter biflow) and one technique for destination endpoint
1266	   information (which represents the internal address in a perimeter
1267	   biflow).

1269	   However, it can also be useful to represent perimeter-based
1270	   anonymisation policies with unidirectional flow (uniflow), or non-
1271	   perimeter biflow data.  In this case, the Perimeter Anonymisation bit
1272	   (bit 2) in the anonymisationFlags Information Element describing the
1273	   anonymised address Information Elements can be set to change the
1274	   meaning of "source" and "destination" of Information Elements to mean
1275	   "external" and "internal" as with perimeter biflows, but only with
1276	   respect to anonymisation policies.
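
	   A Collecting Process might interpret the Perimeter Anonymisation bit
	   roughly as in the following Python sketch.  The bit position (bit 2,
	   0x0004) is taken from the text above; the stability class values
	   are read from the example in Section 8, and the function names are
	   hypothetical.

```python
PERIMETER_BIT = 0x0004    # bit 2: Perimeter Anonymisation
STABILITY_MASK = 0x0003   # ASSUMPTION: bits 0-1 carry the stability class
# Only the stability classes that appear in the Section 8 example are
# named; any other value is reported numerically.
STABILITY_NAMES = {0x0001: "Session", 0x0003: "Stable"}


def describe_flags(flags: int) -> dict:
    """Split an anonymisationFlags value into its components."""
    stability = flags & STABILITY_MASK
    return {
        "perimeter": bool(flags & PERIMETER_BIT),
        "stability": STABILITY_NAMES.get(stability, stability),
    }


def endpoint_role(ie_name: str, flags: int) -> str:
    """With the perimeter bit set, "source" reads as external and
    "destination" as internal, for anonymisation policy purposes only."""
    if not flags & PERIMETER_BIT:
        return "unspecified"
    if ie_name.startswith("source"):
        return "external"
    if ie_name.startswith("destination"):
        return "internal"
    return "unspecified"
```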

1278	7.2.3.  Anonymisation of Header Data

1280	   Each IPFIX Message contains a Message Header; within this Message
1281	   Header are contained two fields which may be used to break certain
1282	   anonymisation techniques: the Export Time, and the Observation Domain
1283	   ID.

1285	   Exporting Processes exporting IPFIX Messages that contain anonymised
1286	   timestamp data SHOULD, where the original Export Time Message Header
1287	   field has some relationship to the anonymised timestamps, anonymise
1288	   the Export Time field using an equivalent technique, if possible.
1289	   Otherwise, relationships between export and flow time could be used
1290	   to partially or totally reverse timestamp anonymisation.
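
	   To make the risk concrete, the following Python sketch assumes that
	   flow end times were anonymised by shifting them back by a single
	   secret amount, and that flows are exported shortly after they end.
	   The function names and the sample values are invented for this
	   illustration.

```python
def estimate_shift(export_time, anonymised_ends):
    """Attacker's view: the smallest gap between the Export Time and
    the anonymised flow end times approximates the secret shift."""
    return min(export_time - t for t in anonymised_ends)


def shift_back(values, shift):
    """Anonymise timestamps by subtracting one offset from all values."""
    return [v - shift for v in values]


true_ends = [1000, 1010, 1018]   # true flow end times, in seconds
export_time = 1020               # true Export Time of the Message
shift = 500                      # secret offset applied to flow times

anon_ends = shift_back(true_ends, shift)
leaked = estimate_shift(export_time, anon_ends)         # 502
safe = estimate_shift(export_time - shift, anon_ends)   # 2
print(leaked, safe)
```

	   With the original Export Time, the estimate (502) is within two
	   seconds of the secret shift (500); with an equivalently shifted
	   Export Time, only the export delay (2) is revealed.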

1292	   The similarity in size between an Observation Domain ID and an IPv4
1293	   address (32 bits) may lead to a temptation to use an IPv4 interface
1294	   address on the Metering or Exporting Process as the Observation
1295	   Domain ID.  If this address bears some relation to the IP addresses
1296	   in the flow data (e.g., shares a network prefix with internal
1297	   addresses) and the IP addresses in the flow data are anonymised in a
1298	   structure-preserving way, then the Observation Domain ID may be used
1299	   to break the IP address anonymisation.  Use of an IPv4 interface
1300	   address on the Metering or Exporting Process as the Observation
1301	   Domain ID is NOT RECOMMENDED in this case.
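
	   An implementation could guard against this configuration with a
	   check along the following lines.  This Python sketch is
	   illustrative; the function name and the way the internal prefix is
	   supplied are assumptions, and the addresses used are documentation
	   addresses.

```python
import ipaddress


def odid_leaks_prefix(observation_domain_id: int, internal_net: str) -> bool:
    """Return True if the 32-bit Observation Domain ID, read as an IPv4
    address, lies inside the given internal network prefix."""
    as_address = ipaddress.IPv4Address(observation_domain_id)
    return as_address in ipaddress.ip_network(internal_net)


# An Observation Domain ID copied from the interface address 192.0.2.17
odid = int(ipaddress.IPv4Address("192.0.2.17"))
print(odid_leaks_prefix(odid, "192.0.2.0/24"))
```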

1303	7.2.4.  Anonymisation of Options Data

1305	   IPFIX uses the Options mechanism to export, among other things,
1306	   metadata about exported flows and the flow collection infrastructure.
1307	   As with the IPFIX Message Header, certain Options recommended in
1308	   [RFC5101] and [RFC5655] containing flow timestamps and network
1309	   addresses of Exporting and Collecting Processes may be used to break
1310	   certain anonymisation techniques; care should be taken while using
1311	   them with anonymised data export and storage.

1313	   The Exporting Process Reliability Statistics Options Template,
1314	   recommended in [RFC5101], contains an Exporting Process ID field,
1315	   which may be an exportingProcessIPv4Address Information Element or an
1316	   exportingProcessIPv6Address Information Element.  If the Exporting
1317	   Process address bears some relation to the IP addresses in the flow
1318	   data (e.g., shares a network prefix with internal addresses) and the
1319	   IP addresses in the flow data are anonymised in a structure-
1320	   preserving way, then the Exporting Process address may be used to
1321	   break the IP address anonymisation.  Exporting Processes exporting
1322	   anonymised data in this situation SHOULD mitigate the risk of attack
1323	   either by omitting Options described by the Exporting Process
1324	   Reliability Statistics Options Template, or by anonymising the
1325	   Exporting Process address using a similar technique to that used to
1326	   anonymise the IP addresses in the exported data.

1328	   Similarly, the Export Session Details Options Template and Message
1329	   Details Options Template specified for the IPFIX File Format
1330	   [RFC5655] may contain the exportingProcessIPv4Address Information
1331	   Element or the exportingProcessIPv6Address Information Element to
1332	   identify an Exporting Process from which a flow record was received,
1333	   and the collectingProcessIPv4Address Information Element or the
1334	   collectingProcessIPv6Address Information Element to identify the
1335	   Collecting Process which received it.  If the Exporting Process or
1336	   Collecting Process address bears some relation to the IP addresses in
1337	   the flow data (e.g., shares a network prefix with internal addresses)
1338	   and the IP addresses in the flow data are anonymised in a structure-
1339	   preserving way, then the Exporting Process or Collecting Process
1340	   address may be used to break the IP address anonymisation.  Since
1341	   these Options Templates are primarily intended for storing IPFIX
1342	   Transport Session data for auditing, replay, and testing purposes, it
1343	   is NOT RECOMMENDED that storage of anonymised data include these
1344	   Options Templates in order to mitigate the risk of attack.

1346	   The Message Details Options Template specified for the IPFIX File
1347	   Format [RFC5655] also contains the collectionTimeMilliseconds
1348	   Information Element.  As with the Export Time Message Header field,
1349	   if the exported flow data contains anonymised timestamp information,
1350	   and the collectionTimeMilliseconds Information Element in a given
1351	   Message has some relationship to the anonymised timestamp
1352	   information, then this relationship can be exploited to reverse the
1353	   timestamp anonymisation.  Since this Options Template is primarily
1354	   intended for storing IPFIX Transport Session data for auditing,
1355	   replay, and testing purposes, it is NOT RECOMMENDED that storage of
1356	   anonymised data include this Options Template in order to mitigate
1357	   the risk of attack.
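
	   A File Writer could screen Options Templates for the Information
	   Elements named in this section before storing anonymised data.  The
	   following Python sketch is one hypothetical way to do so; the
	   function name and the representation of a template as a list of
	   Information Element names are assumptions.

```python
# Information Elements that this section identifies as usable to
# reverse address or timestamp anonymisation.
RISKY_ELEMENTS = {
    "exportingProcessIPv4Address",
    "exportingProcessIPv6Address",
    "collectingProcessIPv4Address",
    "collectingProcessIPv6Address",
    "collectionTimeMilliseconds",
}


def risky_fields(template_fields):
    """Return, in template order, the fields that warrant omitting or
    anonymising the Options described by this template."""
    return [name for name in template_fields if name in RISKY_ELEMENTS]
```

	   A non-empty result indicates that the template should be omitted,
	   or its contents anonymised, as described above.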

1359	   Since the Time Window Options Template specified for the IPFIX File
1360	   Format [RFC5655] refers to the timestamps within the flow data to
1361	   provide partial table of contents information for an IPFIX File, care
1362	   must be taken to ensure that Options described by this template are
1363	   written using the anonymised timestamps instead of the original ones.

1365	7.2.5.  Special-Use Address Space Considerations

1367	   When data containing IP addresses is anonymised for transport or
1368	   storage using IPFIX, and the analysis purpose permits doing so, it
1369	   is recommended to filter out, or to leave unanonymised, data
1370	   containing the special-use IPv4 addresses enumerated in [RFC3330] or
1371	   the special-use IPv6 addresses enumerated in [RFC5153].  Data with
1372	   these addresses (e.g., 0.0.0.0 and 169.254.0.0/16 for link-local
1373	   autoconfiguration in IPv4 space) are often associated with specific,
1374	   well-known behavioral patterns.  Detection of these patterns in
1375	   anonymised data can lead to deanonymisation of these special-use
1376	   addresses, which increases the chance of a complete reversal of
1377	   anonymisation by an attacker, especially of prefix-preserving
1378	   techniques.
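
	   The filtering step can be sketched in Python as follows.  Only the
	   two IPv4 examples named above are listed; a deployment would load
	   the complete special-use lists from the cited references.  The
	   record layout (a dict with "src" and "dst" keys) and the function
	   names are assumptions of this example.

```python
import ipaddress

# Only the examples given in the text, not the complete lists.
SPECIAL_USE = [
    ipaddress.ip_network("0.0.0.0/32"),
    ipaddress.ip_network("169.254.0.0/16"),   # link-local autoconfiguration
]


def is_special_use(address: str) -> bool:
    """Return True if the address falls into a listed special-use block."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in SPECIAL_USE)


def partition(records):
    """Split records into those to anonymise and those that contain a
    special-use address (to be filtered out or left unanonymised)."""
    to_anonymise, special = [], []
    for rec in records:
        if is_special_use(rec["src"]) or is_special_use(rec["dst"]):
            special.append(rec)
        else:
            to_anonymise.append(rec)
    return to_anonymise, special
```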

1380	8.  Examples

1382	   In this example, consider the export or storage of an anonymised IPv4
1383	   dataset from a single network described by a simple template
1384	   containing a timestamp in seconds, a five-tuple, and packet and octet
1385	   counters.  The template describing each record in this dataset is
1386	   shown in Figure 5.

1388	                            1                   2                   3
1389	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1390	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1391	       |          Set ID = 2           |          Length =  40         |
1392	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1393	       |      Template ID = 256        |        Field Count = 8        |
1394	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1395	       |0| flowStartSeconds        150 |       Field Length =  4       |
1396	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1397	       |0| sourceIPv4Address         8 |       Field Length =  4       |
1398	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1399	       |0| destinationIPv4Address   12 |       Field Length =  4       |
1400	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1401	       |0| sourceTransportPort       7 |       Field Length =  2       |
1402	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1403	       |0| destinationTransportPort 11 |       Field Length =  2       |
1404	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1405	       |0| packetDeltaCount          2 |       Field Length =  4       |
1406	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1407	       |0| octetDeltaCount           1 |       Field Length =  4       |
1408	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1409	       |0| protocolIdentifier        4 |       Field Length =  1       |
1410	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1412	                      Figure 5: Example Flow Template
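
	   As a cross-check of the layout in Figure 5, the following Python
	   sketch serialises the same Template Set and confirms the Length of
	   40 octets.  The function name is hypothetical; the field list is
	   copied from the figure.

```python
import struct

# (Information Element ID, field length in octets), as in Figure 5.
FIELDS = [(150, 4), (8, 4), (12, 4), (7, 2), (11, 2), (2, 4), (1, 4), (4, 1)]


def encode_template_set(template_id, fields):
    """Serialise one Template Record inside a Template Set (Set ID 2)."""
    body = struct.pack("!HH", template_id, len(fields))
    for ie_id, length in fields:
        # The enterprise bit (top bit of the ID word) is 0 for all of
        # these Information Elements, so the ID is packed unchanged.
        body += struct.pack("!HH", ie_id, length)
    set_length = 4 + len(body)   # 4-octet Set Header plus the record
    return struct.pack("!HH", 2, set_length) + body


template_set = encode_template_set(256, FIELDS)
print(len(template_set))
```

	   The length is 4 octets of Set Header, 4 octets of Template Record
	   Header, and 8 Field Specifiers of 4 octets each.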

1414	   Suppose that this dataset is anonymised according to the following
1415	   policy:

1417	   o  IP addresses within the network are protected by reverse
1418	      truncation.

1420	   o  IP addresses outside the network are protected by prefix-
1421	      preserving anonymisation.

1423	   o  Octet counts are exported using degraded precision in order to
1424	      provide minimal protection against fingerprinting attacks.

1426	   o  All other fields are exported unanonymised.

1428	   In order to export anonymisation records for this template and
1429	   policy, first, the Anonymisation Options Template shown in Figure 6
1430	   is exported.  For this example, the optional
1431	   privateEnterpriseNumber and informationElementIndex Information
1432	   Elements are omitted, because they are not used.

1434	                              1                   2                   3
1435	          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1436	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1437	         |          Set ID = 3           |          Length =  26         |
1438	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1439	         |      Template ID = 257        |        Field Count = 4        |
1440	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1441	         |    Scope Field Count = 2      |0| templateID              346 |
1442	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1443	         |       Field Length = 2        |0| informationElementId    303 |
1444	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1445	         |       Field Length = 2        |0| anonymisationFlags      339 |
1446	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1447	         |       Field Length = 2        |0| anonymisationTechnique  344 |
1448	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1449	         |       Field Length = 2        |
1450	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1452	             Figure 6: Example Anonymisation Options Template

1454	   Following the Anonymisation Options Template comes a Data Set
1455	   containing Anonymisation Records.  This data set has an entry for
1456	   each Information Element Specifier in Template 256 describing the
1457	   flow records.  This Data Set is shown in Figure 7.  Note that
1458	   sourceIPv4Address and destinationIPv4Address have the Perimeter
1459	   Anonymisation (0x0004) flag set in anonymisationFlags, meaning that
1460	   the source address should be treated as network-external, and the
1461	   destination address as network-internal.

1463	                            1                   2                   3
1464	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1465	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1466	       |          Set ID = 257         |          Length =  68         |
1467	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1468	       |          Template 256         | flowStartSeconds       IE 150 |
1469	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1470	       | no flags               0x0000 | Not Anonymised              1 |
1471	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1472	       |          Template 256         | sourceIPv4Address        IE 8 |
1473	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1474	       | Perimeter, Session SC 0x0005  | Structured Permutation      6 |
1475	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1476	       |          Template 256         | destinationIPv4Address  IE 12 |
1477	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1478	       | Perimeter, Stable     0x0007  | Reverse Truncation          7 |
1479	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1480	       |          Template 256         | sourceTransportPort      IE 7 |
1481	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1482	       | no flags               0x0000 | Not Anonymised              1 |
1483	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1484	       |          Template 256         | dest.TransportPort      IE 11 |
1485	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1486	       | no flags               0x0000 | Not Anonymised              1 |
1487	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1488	       |          Template 256         | packetDeltaCount         IE 2 |
1489	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1490	       | no flags               0x0000 | Not Anonymised              1 |
1491	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1492	       |          Template 256         | octetDeltaCount          IE 1 |
1493	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1494	       | Stable                 0x0003 | Precision Degradation       2 |
1495	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1496	       |          Template 256         | protocolIdentifier       IE 4 |
1497	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1498	       | no flags               0x0000 | Not Anonymised              1 |
1499	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1501	                  Figure 7: Example Anonymisation Records
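
	   The Data Set in Figure 7 can be reproduced with the following Python
	   sketch, which composes each anonymisationFlags value from the flag
	   values named in this example (Perimeter 0x0004, Stable 0x0003) and
	   confirms the Length of 68 octets.  The Session value 0x0001 is
	   inferred from the figure, and the function and constant names are
	   hypothetical.

```python
import struct

PERIMETER = 0x0004   # Perimeter Anonymisation flag
STABLE = 0x0003      # "Stable" as shown for octetDeltaCount
SESSION = 0x0001     # ASSUMPTION: inferred from "Perimeter, Session SC"
NOT_ANONYMISED, PRECISION, STRUCTURED, REVERSE = 1, 2, 6, 7

# (informationElementId, anonymisationFlags, anonymisationTechnique)
RECORDS = [
    (150, 0, NOT_ANONYMISED),               # flowStartSeconds
    (8, PERIMETER | SESSION, STRUCTURED),   # sourceIPv4Address
    (12, PERIMETER | STABLE, REVERSE),      # destinationIPv4Address
    (7, 0, NOT_ANONYMISED),                 # sourceTransportPort
    (11, 0, NOT_ANONYMISED),                # destinationTransportPort
    (2, 0, NOT_ANONYMISED),                 # packetDeltaCount
    (1, STABLE, PRECISION),                 # octetDeltaCount
    (4, 0, NOT_ANONYMISED),                 # protocolIdentifier
]


def encode_anonymisation_records(options_template_id, flow_template_id,
                                 records):
    """Serialise Anonymisation Records into one Data Set whose Set ID
    is the ID of the Anonymisation Options Template."""
    body = b"".join(
        struct.pack("!HHHH", flow_template_id, ie_id, flags, technique)
        for ie_id, flags, technique in records
    )
    return struct.pack("!HH", options_template_id, 4 + len(body)) + body


data_set = encode_anonymisation_records(257, 256, RECORDS)
print(len(data_set))
```

	   Each record occupies 8 octets (four 2-octet fields), so 8 records
	   plus the 4-octet Set Header give 68 octets.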

1503	   Following the Anonymisation Records come the data sets containing the
1504	   anonymised data, exported according to the template in
1505	   Figure 5.

1507	9.  Security Considerations

1509	   This document provides guidelines for exporting metadata about
1510	   anonymised data in IPFIX, or storing metadata about anonymised data
1511	   in IPFIX Files.  It is not intended as a general statement on the
1512	   applicability of specific flow data anonymisation techniques.
1513	   Exporters or publishers of anonymised data must take care that the
1514	   applied anonymisation technique is appropriate for the data source,
1515	   the purpose, and the risk of deanonymisation of a given application.

1517	   We note specifically that anonymisation is not a replacement for
1518	   encryption for confidentiality.  It is only appropriate for
1519	   protecting identifying information in data to be used for purposes in
1520	   which the protected data is irrelevant.  Confidentiality in export is
1521	   best served by using TLS or DTLS as in the Security Considerations
1522	   section of [RFC5101], and in long-term storage by implementation-
1523	   specific protection applied as in the Security Considerations section
1524	   of [RFC5655].  Indeed, confidentiality and anonymisation are not
1525	   mutually exclusive, as encryption for confidentiality may be applied
1526	   to anonymised data export or storage, as well, when the anonymised
1527	   data is not intended for public release.

1529	   When using pseudonymisation techniques that have a mutable mapping,
1530	   there is an inherent tradeoff in the stability of the map between
1531	   long-term comparability and security of the dataset against
1532	   deanonymisation.  In general, deanonymisation attacks are more
1533	   effective given more information, so the longer a given mapping is
1534	   valid, the more information can be applied to deanonymisation.  The
1535	   specific details of this are technique-dependent and therefore out of
1536	   the scope of this document.

1538	   When releasing anonymised data, publishers need to ensure that data
1539	   that could be used in deanonymisation is not leaked through the
1540	   export protocol; guidelines for addressing this risk are provided in
1541	   Section 7.2.

1543	   Note also that the Security Considerations section of [RFC5101]
1544	   applies to the export of anonymised data, and the Security
1545	   Considerations section of [RFC5655] applies to the storage of
1546	   anonymised data or the publication of anonymised traces.

1548	10.  IANA Considerations

1550	   This document specifies the creation of several new IPFIX Information
1551	   Elements in the IPFIX Information Element registry located at
1552	   http://www.iana.org/assignments/ipfix, as defined in Section 6.2
1553	   above.  IANA has assigned the following Information Element numbers
1554	   for their respective Information Elements as specified below:

1556	   o  Information Element number TBD1 for the anonymisationFlags
1557	      Information Element.

1559	   o  Information Element number TBD2 for the anonymisationTechnique
1560	      Information Element.

1562	   o  Information Element number TBD3 for the informationElementIndex
1563	      Information Element.

1565	   [NOTE for IANA: The text TBDn should be replaced with the respective
1566	   assigned Information Element numbers where they appear in this
1567	   document.]

1569	11.  Acknowledgments

1571	   We thank Paul Aitken and John McHugh for their comments and insight,
1572	   and Carsten Schmoll for his review.  Special thanks to the ICT-PRISM
1573	   project for its material support of this work.

1575	12.  References

1577	12.1.  Normative References

1579	   [RFC5101]  Claise, B., "Specification of the IP Flow Information
1580	              Export (IPFIX) Protocol for the Exchange of IP Traffic
1581	              Flow Information", RFC 5101, January 2008.

1583	   [RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
1584	              Meyer, "Information Model for IP Flow Information Export",
1585	              RFC 5102, January 2008.

1587	   [RFC5610]  Boschi, E., Trammell, B., Mark, L., and T. Zseby,
1588	              "Exporting Type Information for IP Flow Information Export
1589	              (IPFIX) Information Elements", RFC 5610, July 2009.

1591	   [RFC5655]  Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
1592	              Wagner, "Specification of the IP Flow Information Export
1593	              (IPFIX) File Format", RFC 5655, October 2009.

1595	   [RFC3330]  IANA, "Special-Use IPv4 Addresses", RFC 3330,
1596	              September 2002.

1598	12.2.  Informative References

1600	   [RFC5103]  Trammell, B. and E. Boschi, "Bidirectional Flow Export
1601	              Using IP Flow Information Export (IPFIX)", RFC 5103,
1602	              January 2008.

1604	   [RFC5472]  Zseby, T., Boschi, E., Brownlee, N., and B. Claise, "IP
1605	              Flow Information Export (IPFIX) Applicability", RFC 5472,
1606	              March 2009.

1608	   [RFC5470]  Sadasivan, G., Brownlee, N., Claise, B., and J. Quittek,
1609	              "Architecture for IP Flow Information Export", RFC 5470,
1610	              March 2009.

1612	   [I-D.ietf-ipfix-mediators-framework]
1613	              Kobayashi, A., Claise, B., and K. Ishibashi, "IPFIX
1614	              Mediation: Framework",
1615	              draft-ietf-ipfix-mediators-framework-04 (work in
1616	              progress), October 2009.

1618	   [I-D.ietf-ipfix-mediators-problem-statement]
1619	              Kobayashi, A., Claise, B., Nishida, H., Sommer, C.,
1620	              Dressler, F., and E. Stephan, "IPFIX Mediation: Problem
1621	              Statement",
1622	              draft-ietf-ipfix-mediators-problem-statement-07 (work in
1623	              progress), December 2009.

1625	   [RFC5153]  Boschi, E., Mark, L., Quittek, J., Stiemerling, M., and P.
1626	              Aitken, "IP Flow Information Export (IPFIX) Implementation
1627	              Guidelines", RFC 5153, April 2008.

1629	   [RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
1630	              "Requirements for IP Flow Information Export (IPFIX)",
1631	              RFC 3917, October 2004.

1633	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1634	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1636	Authors' Addresses

1638	   Elisa Boschi
1639	   Hitachi Europe
1640	   c/o ETH Zurich
1641	   Gloriastrasse 35
1642	   8092 Zurich
1643	   Switzerland

1645	   Phone: +41 44 632 70 57
1646	   Email: elisa.boschi@hitachi-eu.com

1647	   Brian Trammell
1648	   Hitachi Europe
1649	   c/o ETH Zurich
1650	   Gloriastrasse 35
1651	   8092 Zurich
1652	   Switzerland

1654	   Phone: +41 44 632 70 13
1655	   Email: brian.trammell@hitachi-eu.com