idnits 2.17.1 

draft-boschi-ipfix-anon-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (January 12, 2009) is 5576 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 5101 (Obsoleted by RFC 7011)

  ** Obsolete normative reference: RFC 5102 (Obsoleted by RFC 7012)

  == Outdated reference: A later version (-05) exists of
     draft-ietf-ipfix-file-03

  == Outdated reference: A later version (-09) exists of
     draft-ietf-ipfix-mediators-framework-01


     Summary: 2 errors (**), 0 flaws (~~), 4 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	IPFIX Working Group                                            E. Boschi
3	Internet-Draft                                               B. Trammell
4	Intended status: Experimental                             Hitachi Europe
5	Expires: July 16, 2009                                  January 12, 2009

7	                     IP Flow Anonymisation Support
8	                     draft-boschi-ipfix-anon-02.txt

10	Status of this Memo

12	   This Internet-Draft is submitted to IETF in full conformance with the
13	   provisions of BCP 78 and BCP 79.

15	   Internet-Drafts are working documents of the Internet Engineering
16	   Task Force (IETF), its areas, and its working groups.  Note that
17	   other groups may also distribute working documents as Internet-
18	   Drafts.

20	   Internet-Drafts are draft documents valid for a maximum of six months
21	   and may be updated, replaced, or obsoleted by other documents at any
22	   time.  It is inappropriate to use Internet-Drafts as reference
23	   material or to cite them other than as "work in progress."

25	   The list of current Internet-Drafts can be accessed at
26	   http://www.ietf.org/ietf/1id-abstracts.txt.

28	   The list of Internet-Draft Shadow Directories can be accessed at
29	   http://www.ietf.org/shadow.html.

31	   This Internet-Draft will expire on July 16, 2009.

33	Copyright Notice

35	   Copyright (c) 2009 IETF Trust and the persons identified as the
36	   document authors.  All rights reserved.

38	   This document is subject to BCP 78 and the IETF Trust's Legal
39	   Provisions Relating to IETF Documents
40	   (http://trustee.ietf.org/license-info) in effect on the date of
41	   publication of this document.  Please review these documents
42	   carefully, as they describe your rights and restrictions with respect
43	   to this document.

45	Abstract

47	   This document describes anonymisation techniques for IP flow data and
48	   the export of anonymised data using the IPFIX protocol.  It provides
49	   a categorization of common anonymisation schemes and defines the
50	   parameters needed to describe them.  It provides guidelines for the
51	   implementation of anonymised data export and storage over IPFIX, and
52	   describes an Options-based method for anonymisation metadata export
53	   within the IPFIX protocol, providing the basis for the definition of
54	   information models for configuring anonymisation techniques within an
55	   IPFIX Metering or Exporting Process, and for reporting the technique
56	   in use to an IPFIX Collecting Process.

58	Table of Contents

60	   1.  Introduction
61	     1.1.  IPFIX Protocol Overview
62	     1.2.  IPFIX Documents Overview
63	   2.  Terminology
64	   3.  Categorisation of Anonymisation Techniques
65	   4.  Anonymisation of IP Flow Data
66	     4.1.  IP Address Anonymisation
67	       4.1.1.  Truncation
68	       4.1.2.  Random Permutation
69	       4.1.3.  Prefix-preserving Pseudonymisation
70	     4.2.  Timestamp Anonymisation
71	       4.2.1.  Precision Degradation
72	       4.2.2.  Enumeration
73	       4.2.3.  Random Time Shifts
74	     4.3.  Counter Anonymisation
75	       4.3.1.  Precision Degradation
76	       4.3.2.  Binning
77	       4.3.3.  Random Noise Addition
78	     4.4.  Anonymisation of Other Flow Fields
79	   5.  Applying Anonymisation Techniques to IPFIX Export and
80	       Storage
81	     5.1.  Arrangement of Processes in IPFIX Anonymisation
82	     5.2.  IPFIX-Specific Anonymisation Guidelines
83	       5.2.1.  Anonymisation of Header Data
84	       5.2.2.  Anonymisation of Options Data
85	   6.  Parameters for the Description of Anonymisation Techniques
86	   7.  Anonymisation Metadata Support in IPFIX
87	   8.  Security Considerations
88	   9.  IANA Considerations
89	   10. Acknowledgments
90	   11. References
91	     11.1. Normative References
92	     11.2. Informative References
93	   Authors' Addresses

95	1.  Introduction

97	   The standardisation of an IP flow information export protocol
98	   [RFC5101] and associated representations removes a technical barrier
99	   to the sharing of IP flow data across organizational boundaries and
100	   with network operations, security, and research communities for a
101	   wide variety of purposes.  However, with wider dissemination come
102	   greater risks to the privacy of the users of networks under
103	   measurement, and to the security of those networks.  While it is not
104	   a complete solution to the issues posed by distribution of IP flow
105	   information, anonymisation is an important tool for the protection of
106	   privacy within network measurement infrastructures.

108	   This document presents a mechanism for representing anonymised data
109	   within IPFIX and guidelines for using it.  It begins with a
110	   categorization of anonymisation techniques.  It then describes
111	   applicability of each technique to commonly anonymisable fields of IP
112	   flow data, organized by information element data type and semantics
113	   as in [RFC5102]; enumerates the parameters required by each of the
114	   applicable anonymisation techniques; and provides guidelines for the
115	   use of each of these techniques in accordance with best practices in
116	   data protection.  Finally, it specifies a mechanism for exporting
117	   anonymised data and binding anonymisation metadata to templates using
118	   IPFIX Options.

120	1.1.  IPFIX Protocol Overview

122	   In the IPFIX protocol, { type, length, value } tuples are expressed
123	   in templates containing { type, length } pairs, specifying which {
124	   value } fields are present in data records conforming to the
125	   Template, giving great flexibility as to what data is transmitted.
126	   Since Templates are sent very infrequently compared with Data
127	   Records, this results in significant bandwidth savings.  Various
128	   different data formats may be transmitted simply by sending new
129	   Templates specifying the { type, length } pairs for the new data
130	   format.  See [RFC5101] for more information.
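
   As a rough illustration of the template mechanism described above,
   the following Python sketch decodes one data record from a list of
   { type, length } pairs.  It is a simplified model and not the IPFIX
   wire format: Set and Message headers are omitted, and the names
   TEMPLATE and decode_record are hypothetical.

```python
# Hypothetical, simplified template: a list of (type, length) pairs.
# The numeric types are the IANA-assigned identifiers for
# sourceIPv4Address (8), destinationIPv4Address (12) and
# packetDeltaCount (2).
TEMPLATE = [(8, 4), (12, 4), (2, 8)]


def decode_record(template, payload):
    """Split a data record into {type: value} as described by the template."""
    record, offset = {}, 0
    for ie_type, length in template:
        field = payload[offset:offset + length]
        if len(field) != length:
            raise ValueError("payload is shorter than the template describes")
        # Values are carried in network byte order (big-endian).
        record[ie_type] = int.from_bytes(field, "big")
        offset += length
    return record


# One record: 192.0.2.1 -> 198.51.100.7, five packets.
payload = bytes([192, 0, 2, 1]) + bytes([198, 51, 100, 7]) + (5).to_bytes(8, "big")
record = decode_record(TEMPLATE, payload)
```

   Because the record itself carries only values, a different data
   format needs only a different template; the decoding loop is
   unchanged.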

132	   The IPFIX information model [RFC5102] defines a large number of
133	   standard Information Elements which provide the necessary { type }
134	   information for Templates.  The use of standard elements enables
135	   interoperability among different vendors' implementations.
136	   Additionally, non-standard enterprise-specific elements may be
137	   defined for private use.

139	1.2.  IPFIX Documents Overview

141	   "Specification of the IPFIX Protocol for the Exchange of IP Traffic
142	   Flow Information" [RFC5101] and its associated documents define the
143	   IPFIX Protocol, which provides network engineers and administrators
144	   with access to IP traffic flow information.

146	   "Architecture for IP Flow Information Export"
147	   [I-D.ietf-ipfix-architecture] defines the architecture for the export
148	   of measured IP flow information out of an IPFIX Exporting Process to
149	   an IPFIX Collecting Process, and the basic terminology used to
150	   describe the elements of this architecture, per the requirements
151	   defined in "Requirements for IP Flow Information Export" [RFC3917].
152	   The IPFIX Protocol document [RFC5101] then covers the details of the
153	   method for transporting IPFIX Data Records and Templates via a
154	   congestion-aware transport protocol from an IPFIX Exporting Process
155	   to an IPFIX Collecting Process.

157	   "Information Model for IP Flow Information Export" [RFC5102]
158	   describes the Information Elements used by IPFIX, including details
159	   on Information Element naming, numbering, and data type encoding.
160	   Finally, "IPFIX Applicability" [I-D.ietf-ipfix-as] describes the
161	   various applications of the IPFIX protocol and their use of
162	   information exported via IPFIX, and relates the IPFIX architecture to
163	   other measurement architectures and frameworks.

165	   Additionally, the "Specification of the IPFIX File Format"
166	   [I-D.ietf-ipfix-file] describes a file format based upon the IPFIX
167	   Protocol for the storage of flow data.

169	   This document references the Protocol and Architecture documents for
170	   terminology, and extends the IPFIX Information Model to provide new
171	   Information Elements for anonymisation metadata.  The anonymisation
172	   techniques described herein are equally applicable to the IPFIX
173	   Protocol and data stored in IPFIX Files.

175	2.  Terminology

177	   Terms used in this document that are defined in the Terminology
178	   section of the IPFIX Protocol [RFC5101] document are to be
179	   interpreted as defined there.

181	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
182	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
183	   document are to be interpreted as described in RFC 2119 [RFC2119].

185	3.  Categorisation of Anonymisation Techniques

187	   Anonymisation modifies a data set in order to protect the identity of
188	   the people or entities described by the data set from disclosure.

190	   With respect to network traffic data, anonymisation generally
191	   attempts to preserve some set of properties of the network traffic
192	   useful for a given application or applications, while ensuring the
193	   data cannot be traced back to the specific networks, hosts, or users
194	   generating the traffic.

196	   Anonymisation may be broadly classified according to two properties:
197	   recoverability and countability.  All anonymisation techniques map
198	   the real space of identifiers or values into a separate, anonymised
199	   space, according to some function.  A technique is said to be
200	   recoverable when the function used is invertible or can otherwise be
201	   reversed and a real identifier can be recovered from a given
202	   replacement identifier.

204	   Countability compares the dimension of the anonymised space (N) to
205	   the dimension of the real space (M), and denotes how the count of
206	   unique values is preserved by the anonymisation function.  If the
207	   anonymised space is smaller than the real space, then the function is
208	   said to generalise the input, mapping more than one input point to
209	   each anonymous value (e.g., as with aggregation).  By definition,
210	   generalisation is not recoverable.

212	   If the dimensions of the anonymised and real spaces are the same,
213	   such that the count of unique values is preserved, then the function
214	   is said to be a direct substitution function.  If the dimension of
215	   the anonymised space is larger, such that each real value maps to a
216	   set of anonymised values, then the function is said to be a set
217	   substitution function.  Note that with set substitution functions,
218	   the sets of anonymised values are not necessarily disjoint.  Either
219	   direct or set substitution functions are said to be one-way if there
220	   exists no method for recovering the real data point from an
221	   anonymised one.

223	   This classification is summarised in the table below.

225	   +------------------------+-----------------+------------------------+
226	   | Recoverability /       | Recoverable     | Non-recoverable        |
227	   | Countability           |                 |                        |
228	   +------------------------+-----------------+------------------------+
229	   | N < M                  | N.A.            | Generalisation         |
230	   | N = M                  | Direct          | One-way Direct         |
231	   |                        | Substitution    | Substitution           |
232	   | N > M                  | Set             | One-way Set            |
233	   |                        | Substitution    | Substitution           |
234	   +------------------------+-----------------+------------------------+
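
   The table can also be read as a small decision procedure.  The
   following Python sketch is one possible encoding of it; the function
   name and the representation of N and M as plain integers are
   assumptions made for illustration only.

```python
def categorise(n_anonymised, m_real, recoverable):
    """Return the category name for a function mapping a real space of
    dimension m_real into an anonymised space of dimension n_anonymised."""
    if n_anonymised < m_real:
        # Several real values share one anonymised value, so the
        # original cannot be recovered.
        if recoverable:
            raise ValueError("generalisation is by definition not recoverable")
        return "Generalisation"
    if n_anonymised == m_real:
        kind = "Direct Substitution"
    else:
        kind = "Set Substitution"
    return kind if recoverable else "One-way " + kind
```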

236	4.  Anonymisation of IP Flow Data

238	   Due to the restricted semantics of IP flow data, there is a
239	   relatively limited set of specific anonymisation techniques available
240	   for flow data, though each falls into the broad categories above.
241	   Each type of field that may commonly appear in a flow record may have
242	   its own applicable specific techniques.

244	   While anonymisation is generally applied at the resolution of single
245	   fields within a flow record, attacks against anonymisation use entire
246	   flows and relationships between hosts and flows within a given data
247	   set.  Therefore, fields which may not necessarily be identifying by
248	   themselves may be anonymised in order to increase the anonymity of
249	   the data set as a whole.

251	   Of all the fields in an IP flow record, only IP addresses directly
252	   identify entities in the real world.  Each IP address is associated
253	   with an interface on a network host, and can potentially be
254	   identified with a single user.  Additionally, IP addresses are
255	   structured identifiers; that is, partial IP address prefixes may be
256	   used to identify networks just as full IP addresses identify hosts.
257	   This makes anonymisation of IP addresses particularly important.

259	   Port numbers identify abstract entities (applications) as opposed to
260	   real-world entities, but they can be used to classify hosts and user
261	   behavior.  Passive port fingerprinting, both of well-known and
262	   ephemeral ports, can be used to determine the operating system
263	   running on a host.  Relative data volumes by port can also be used to
264	   determine the host's function (workstation, web server, etc.); this
265	   information can be used to identify hosts and users.

267	   While not identifiers in and of themselves, timestamps and counters
268	   can reveal the behavior of the hosts and users on a network.  Any
269	   given network activity is recognizable by a pattern of relative time
270	   differences and data volumes in the associated sequence of flows,
271	   even without host address information.  They can therefore be used to
272	   identify hosts and users.  Timestamps and counters are also
273	   vulnerable to traffic injection attacks, where traffic with a known
274	   pattern is injected into a network under measurement, and this
275	   pattern is later identified in the anonymised data set.

277	   The simplest and most extreme form of anonymisation, which can be
278	   applied to any field of a flow record, is black-marker anonymisation,
279	   or complete deletion of a given field.  Note that black-marker
280	   anonymisation is equivalent to simply not exporting the field(s) in
281	   question.

283	   While black-marker anonymisation completely protects the data in the
284	   deleted fields from the risk of disclosure, it also reduces the
285	   utility of the anonymised data set as a whole.  Techniques that
286	   retain some information while reducing (though not eliminating) the
287	   disclosure risk will be extensively discussed in the following
288	   sections; note that the techniques specifically applicable to IP
289	   addresses, timestamps, and counters will be discussed in separate
290	   sections.

292	4.1.  IP Address Anonymisation

294	   The following table gives an overview of the schemes for IP address
295	   anonymisation described in this document and their categorization.

297	   +-------------------------------+-------------------+---------------+
298	   | Scheme                        | Action            | Reversibility |
299	   +-------------------------------+-------------------+---------------+
300	   | Truncation                    | Generalisation    | N             |
301	   | Random Permutation            | Direct            | Y/N           |
302	   |                               | Substitution      |               |
303	   | Prefix-preserving             | Direct            | Y             |
304	   | Pseudonymisation              | Substitution      |               |
305	   +-------------------------------+-------------------+---------------+

307	   Note that random permutations might be either reversible or not,
308	   depending on the function used.

310	4.1.1.  Truncation

312	   Truncation removes "n" of the least significant bits from an IP
313	   address.  Note that truncating 8 bits of an IPv4 address replaces it
314	   with the corresponding /24 (formerly class C) network address.
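
   A minimal Python sketch of truncation follows.  It assumes that
   addresses are handled as text and that the removed bits are replaced
   with zeros; the function name is hypothetical.

```python
import ipaddress


def truncate_address(address, n):
    """Zero the n least significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    if not 0 <= n <= ip.max_prefixlen:
        raise ValueError("n is out of range for this address family")
    all_ones = (1 << ip.max_prefixlen) - 1
    mask = all_ones ^ ((1 << n) - 1)
    # Rebuild with the same class so an IPv6 result stays IPv6.
    return str(type(ip)(int(ip) & mask))
```

   For example, truncate_address("192.0.2.77", 8) yields "192.0.2.0".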

316	4.1.2.  Random Permutation

318	   Random permutation replaces each IP address with a unique address
319	   randomly selected from the set of possible IP addresses.  The
320	   permutation function is implementable using a hash table to ensure
321	   uniqueness.
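
   The hash table approach mentioned above might be sketched in Python
   as follows.  The class name is hypothetical and the sketch is limited
   to IPv4.  Retaining the table makes the mapping reversible by whoever
   holds it; discarding it after use makes the mapping one-way.

```python
import ipaddress
import secrets


class RandomPermutation:
    """Map each IPv4 address to a unique, randomly chosen IPv4 address."""

    def __init__(self):
        self._forward = {}   # real address (int) -> anonymised address (int)
        self._used = set()   # anonymised addresses already handed out

    def anonymise(self, address):
        real = int(ipaddress.IPv4Address(address))
        if real not in self._forward:
            candidate = secrets.randbits(32)
            # Redraw on collision so that no two real addresses share
            # an anonymised address.
            while candidate in self._used:
                candidate = secrets.randbits(32)
            self._used.add(candidate)
            self._forward[real] = candidate
        return str(ipaddress.IPv4Address(self._forward[real]))
```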

323	4.1.3.  Prefix-preserving Pseudonymisation

325	   Prefix-preserving pseudonymisation preserves the structure of subnets
326	   at each level while anonymising IP addresses.  If two real IP
327	   addresses match on a prefix of "n" bits, the two anonymised IP
328	   addresses will match on a prefix of "n" bits as well.
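
   One way to obtain this property is to decide whether to flip each
   output bit using a keyed function of the input bits that precede it.
   The Python sketch below does so using HMAC-SHA256 as the keyed
   function.  This construction, the key handling, and the function
   names are assumptions chosen for illustration; they are not a scheme
   specified by this document, and the sketch is limited to IPv4.

```python
import hashlib
import hmac
import ipaddress


def prefix_preserving(address, key):
    """Pseudonymise an IPv4 address while preserving shared prefixes."""
    bits = format(int(ipaddress.IPv4Address(address)), "032b")
    out = []
    for i in range(32):
        # The decision to flip bit i depends only on the i bits before
        # it, so addresses sharing a prefix receive identical flips there.
        digest = hmac.new(key, bits[:i].encode("ascii"), hashlib.sha256).digest()
        flip = digest[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_len(a, b):
    """Number of leading bits on which two IPv4 addresses agree."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

   With any fixed key, two addresses that agree on exactly 24 leading
   bits map to pseudonyms that also agree on exactly 24 leading bits.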

330	4.2.  Timestamp Anonymisation

332	   [TODO: introductory text]

334	   +-----------------------+---------------------------+---------------+
335	   | Scheme                | Action                    | Reversibility |
336	   +-----------------------+---------------------------+---------------+
337	   | Precision Degradation | Generalisation            | N             |
338	   | Enumeration           | Direct or Set             | Y             |
339	   |                       | Substitution              |               |
340	   | Random Time Shifts    | Direct Substitution       | Y             |
341	   +-----------------------+---------------------------+---------------+

343	4.2.1.  Precision Degradation

345	   Precision Degradation removes the most precise components of a
346	   timestamp, accounting all events occurring in each given interval
347	   (e.g. one millisecond for millisecond level degradation) as
348	   simultaneous.  This has the effect of potentially collapsing many
349	   timestamps into one.  With this technique time precision is reduced,
350	   and sequencing may be lost, but the approximate time at which each
351	   event occurred is preserved.
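
   As a small Python sketch, assuming timestamps are integers in
   milliseconds and the interval is given in the same unit (the function
   name is hypothetical):

```python
def degrade_timestamp(timestamp_ms, interval_ms):
    """Round a timestamp down to the start of its interval."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return timestamp_ms - (timestamp_ms % interval_ms)
```

   With an interval of 1000, every event within the same second receives
   the same timestamp, so the order of those events is no longer
   visible.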

353	4.2.2.  Enumeration

355	   Enumeration keeps the chronological order in which events occurred
356	   while eliminating time information.  Timestamps are substituted by
357	   equidistant timestamps (or numbers) starting from a randomly chosen
358	   start value.
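
   A possible Python sketch of enumeration follows.  It assumes that
   equal input timestamps should receive equal output values (a direct
   substitution); the function name, the step, and the range of the
   random start value are illustrative choices.

```python
import secrets


def enumerate_timestamps(timestamps, step=1, start=None):
    """Replace timestamps with equidistant values in chronological order."""
    if start is None:
        start = secrets.randbelow(2**32)
    # Rank each distinct timestamp by its position in time.
    rank = {t: i for i, t in enumerate(sorted(set(timestamps)))}
    return [start + step * rank[t] for t in timestamps]
```

   Only the order of events survives: the gaps of 40 and 350 between the
   inputs 10, 50, and 400 both become a gap of one step.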

360	4.2.3.  Random Time Shifts

362	   Random Time Shifts keep the information on how far apart two events
363	   are from each other.  This is achieved by shifting all timestamps by
364	   the same random number.  Note that random time shifts also preserve
365	   chronological order.
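
   In Python, a random time shift might be sketched as below.  The
   function name and the bound on the shift are assumptions; the offset
   is returned so that an operator who retains it can reverse the shift.

```python
import secrets


def shift_timestamps(timestamps, max_shift):
    """Shift every timestamp by one random offset in [-max_shift, max_shift]."""
    if max_shift < 0:
        raise ValueError("max_shift must be non-negative")
    offset = secrets.randbelow(2 * max_shift + 1) - max_shift
    return [t + offset for t in timestamps], offset
```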

367	4.3.  Counter Anonymisation

369	   Counters (such as packet and octet volumes per flow) are subject to
370	   fingerprinting and injection attacks against anonymisation, as
371	   timestamps are, but relative magnitudes of activity can be useful for
372	   certain analysis tasks.  [TODO: more intro text]

373	   +-----------------------+---------------------------+---------------+
374	   | Scheme                | Action                    | Reversibility |
375	   +-----------------------+---------------------------+---------------+
376	   | Precision Degradation | Generalisation            | N             |
377	   | Binning               | Generalisation            | N             |
378	   | Random Noise Addition | Direct or Set             | N             |
379	   |                       | Substitution              |               |
380	   +-----------------------+---------------------------+---------------+

382	4.3.1.  Precision Degradation

384	   As with precision degradation in timestamps, precision degradation of
385	   counters removes lower-order bits of the counters, treating all the
386	   counters in a given range as having the same value.  Depending on the
387	   precision reduction, this loses information about the relationships
388	   between sizes of similarly-sized flows, but keeps relative magnitude
389	   information.
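
   A minimal Python sketch, assuming non-negative integer counters and a
   hypothetical function name:

```python
def degrade_counter(value, n_bits):
    """Clear the n_bits lowest-order bits of a non-negative counter."""
    if value < 0 or n_bits < 0:
        raise ValueError("value and n_bits must be non-negative")
    return (value >> n_bits) << n_bits
```

   With n_bits set to 8, counters of 1300 and 1500 both become 1280,
   while a counter of 70000 becomes 69888 and remains clearly larger.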

391	4.3.2.  Binning

393	   Binning can be seen as a special case of precision degradation; the
394	   operation is identical, except that in precision degradation the
395	   counter ranges are uniform, and in binning they need not be.  For
396	   example, a common counter binning scheme for packet counters could be
397	   to bin values 1-2 together, and 3-infinity together, thereby
398	   separating potentially completely-opened TCP connections from
399	   unopened ones.  Binning schemes are generally chosen to keep
400	   precisely the amount of information required in a counter for a given
401	   analysis task.
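
   The packet counter example above might be sketched in Python as
   follows.  The representative value reported for each bin (here its
   lower bound) and all names are assumptions made for illustration.

```python
import bisect

# Hypothetical bin edges for the example: values 1-2 share one bin and
# values of 3 or more share another.  A counter below 1 gets its own bin.
EDGES = [1, 3]
LABELS = [0, 1, 3]   # value reported for each bin (its lower bound)


def bin_counter(value, edges=EDGES, labels=LABELS):
    """Replace a counter with the representative value of its bin."""
    return labels[bisect.bisect_right(edges, value)]
```

   Unlike precision degradation, the bin widths here are not uniform:
   the first bin covers two values and the last is unbounded.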

403	4.3.3.  Random Noise Addition

405	   Random noise addition adds a random amount to a counter in each flow;
406	   this is used to keep relative magnitude information and minimize the
407	   disruption to size relationship information while avoiding
408	   fingerprinting attacks against anonymisation.
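
   A possible Python sketch follows.  The choice of uniform noise
   bounded by a fraction of the counter, the default fraction, and the
   function name are assumptions; this document does not prescribe a
   noise distribution.

```python
import random


def add_noise(value, max_relative=0.05, rng=None):
    """Add bounded uniform random noise to a non-negative counter."""
    if rng is None:
        rng = random.SystemRandom()
    # Scale the noise with the counter so relative magnitudes survive,
    # but always allow at least +/- 1.
    bound = max(1, int(value * max_relative))
    return max(0, value + rng.randint(-bound, bound))
```

   Keeping the noise small relative to the counter preserves the
   distinction between large and small flows while making exact values
   unreliable as fingerprints.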

410	4.4.  Anonymisation of Other Flow Fields

412	   [TODO: as section 4.1]

414	5.  Applying Anonymisation Techniques to IPFIX Export and Storage

416	   When exporting or storing anonymised flow data using IPFIX, certain
417	   interactions between the IPFIX Protocol and the anonymisation
418	   techniques in use must be considered; these are treated in the
419	   subsections below.

421	5.1.  Arrangement of Processes in IPFIX Anonymisation

423	   Anonymisation may be applied to IPFIX data at three stages within
424	   the collection infrastructure: on initial export, at a mediator, or
425	   after collection, as shown in Figure 1.  Each of these locations has
426	   specific considerations and applicability.

428	                       +--------------------+
429	                       | IPFIX File Storage |
430	                       +--------------------+
431	                         ^
432	                         | (Anonymised after collection)
433	                         |
434	               +=======================================+
435	               | Collecting Process                    |
436	               +=======================================+
437	                 ^                                   ^
438	                 | (Anonymised at mediator)          |
439	                 |                                   |
440	               +=============================+       |
441	               | Mediator                    |       |
442	               +=============================+       |
443	                 ^                                   |
444	                 |    (Anonymised on initial export) |
445	                 |                                   |
446	               +=======================================+
447	               | Exporting Process                     |
448	               +=======================================+

450	                Figure 1: Potential Anonymisation Locations

452	   Anonymisation is generally performed before the wider dissemination
453	   or repurposing of a flow data set, e.g., adapting operational
454	   measurement data for research.  Therefore, direct anonymisation of
455	   flow data on initial export is only applicable in certain restricted
456	   circumstances: when the Exporting Process is "publishing" data to a
457	   Collecting Process directly, and the Exporting Process and Collecting
458	   Process are operated by different entities.  Note that certain
459	   guidelines in Section 5.2.1 with respect to timestamp anonymisation
460	   may not apply in this case, as the Collecting Process may be able to
461	   deduce certain timing information from the time at which each Message
462	   is received.

464	   A much more flexible arrangement is to anonymise data within a
465	   Mediator [I-D.ietf-ipfix-mediators-framework].  Here, original data
466	   is sent to a Mediator, which performs the anonymisation function and
467	   re-exports the anonymised data.  Such a Mediator could be located at
468	   the administrative domain boundary of the initial Exporting Process
469	   operator, exporting anonymised data to other consumers outside the
470	   organisation.  In this case, the original Exporter SHOULD use TLS as
471	   specified in [RFC5101] to secure the channel to the Mediator, and the
472	   Mediator should follow the guidelines in Section 5.2, to mitigate the
473	   risk of original data disclosure.

475	   When data is to be published as an anonymised data set in an IPFIX
476	   File [I-D.ietf-ipfix-file], the anonymisation may be done at the
477	   final Collecting Process before storage and dissemination, as well.
478	   In this case, the Collector should follow the guidelines in
479	   Section 5.2, especially as regards File-specific Options in
480	   Section 5.2.2.

482	   Note that anonymisation may occur at more than one location within a
483	   given collection infrastructure, to provide varying levels of
484	   anonymisation reversal risk and utility for specific purposes.

486	5.2.  IPFIX-Specific Anonymisation Guidelines

488	   In implementing and deploying the anonymisation techniques described
489	   in this document, care must be taken that data structures supporting
490	   the operation of the protocol itself do not leak data that could be
491	   used to reverse the anonymisation applied to the flow data.  Such
492	   data structures may appear in the header, or within the data stream
493	   itself, especially as options data.  Each of these and their impact
494	   on specific anonymisation techniques is noted in a separate
495	   subsection below.

497	5.2.1.  Anonymisation of Header Data

499	   Each IPFIX Message contains a Message Header; within this Message
500	   Header are contained two fields which may be used to break certain
501	   anonymisation techniques: the Export Time, and the Observation Domain
502	   ID.

504	   An Exporting Process that exports anonymised timestamp data whose
505	   values bear some relationship to the original Export Time Message
506	   Header field SHOULD anonymise the Export Time field using an
507	   equivalent technique, if possible.  Otherwise, relationships
508	   between export and flow time could be used to partially or totally
509	   reverse timestamp anonymisation.
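
   As an illustration, when flow timestamps are anonymised with a random
   time shift, an equivalent technique for the header is to apply the
   same shift to the Export Time.  The Python sketch below assumes flow
   timestamps in milliseconds and an Export Time in seconds that must
   fit in 32 bits; the function name is hypothetical.

```python
def shift_message_times(export_time_s, flow_times_ms, offset_s):
    """Apply one shift, given in seconds, to the header and flow timestamps."""
    shifted_export = export_time_s + offset_s
    if not 0 <= shifted_export < 2**32:
        raise ValueError("shifted Export Time does not fit in 32 bits")
    shifted_flows = [t + offset_s * 1000 for t in flow_times_ms]
    return shifted_export, shifted_flows


# Shift a message and its flows back by thirty days.
export_time, flows = shift_message_times(
    1231761700, [1231761600123, 1231761650456], -30 * 86400)
```

   The delay between each flow and the export of its Message is
   unchanged, so the header no longer discloses the offset that was
   applied to the flow timestamps.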

511	   The similarity in size between an Observation Domain ID and an IPv4
512	   address (32 bits) may lead to a temptation to use an IPv4 interface
513	   address on the Metering or Exporting Process as the Observation
514	   Domain ID.  If this address bears some relation to the IP addresses
515	   in the flow data (e.g., shares a network prefix with internal
516	   addresses) and the IP addresses in the flow data are anonymised in a
517	   structure-preserving way, then the Observation Domain ID may be used
518	   to break the IP address anonymisation.  Use of an IPv4 interface
519	   address on the Metering or Exporting Process as the Observation
520	   Domain ID is NOT RECOMMENDED in this case.

522	   [EDITOR'S NOTE: We might want to see if anyone is actually doing this
523	   with IPFIX.  The example comes from other network measurement tools
524	   (e.g.  Argus) which default to using an IPv4 address as a sensor ID.]

526	5.2.2.  Anonymisation of Options Data

528	   IPFIX uses the Options mechanism to export, among other things,
529	   metadata about exported flows and the flow collection infrastructure.
530	   As with the IPFIX Message Header, certain Options recommended in
531	   [RFC5101] and the IPFIX File Format [I-D.ietf-ipfix-file] containing
532	   flow timestamps and network addresses of Exporting and Collecting
533	   Processes may be used to break certain anonymisation techniques; care
534	   should be taken while using them with anonymised data export and
535	   storage.

537	   The Exporting Process Reliability Statistics Options Template,
538	   recommended in [RFC5101], contains an Exporting Process ID field,
539	   which may be an exportingProcessIPv4Address Information Element or an
540	   exportingProcessIPv6Address Information Element.  If the Exporting
541	   Process address bears some relation to the IP addresses in the flow
542	   data (e.g., shares a network prefix with internal addresses) and the
543	   IP addresses in the flow data are anonymised in a structure-
544	   preserving way, then the Exporting Process address may be used to
545	   break the IP address anonymisation.  Exporting Processes exporting
546	   anonymised data in this situation SHOULD mitigate the risk of attack
547	   either by omitting Options described by the Exporting Process
548	   Reliability Statistics Options Template, or by anonymising the
549	   Exporting Process address using a similar technique to that used to
550	   anonymise the IP addresses in the exported data.
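
   The second mitigation can be sketched as follows.  This is an
   illustration only: the draft does not prescribe an algorithm, so the
   toy prefix-preserving function below, the key, and the representation
   of an Options record as a dictionary are all assumptions.  The point
   is that the exportingProcessIPv4Address value passes through the same
   function as the flow addresses, so it is not exported in the clear.

```python
import hashlib
import hmac
import ipaddress

def anonymise_prefix_preserving(addr, key):
    """Toy prefix-preserving IPv4 anonymisation (illustrative, not vetted):
    bit i is flipped according to a keyed hash of the i bits before it."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i in range(32):
        digest = hmac.new(key, bits[:i].encode("ascii"), hashlib.sha256).digest()
        out.append(str(int(bits[i]) ^ (digest[0] & 1)))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))

def common_prefix_length(a, b):
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()

def anonymise_options_record(record, key):
    """Return a copy of an Options record (hypothetical dict layout) with the
    Exporting Process address anonymised like the flow addresses."""
    cleaned = dict(record)
    field = "exportingProcessIPv4Address"
    if field in cleaned:
        cleaned[field] = anonymise_prefix_preserving(cleaned[field], key)
    return cleaned

key = b"example-key-do-not-reuse"   # hypothetical secret
flow_source = "192.0.2.77"          # internal host seen in the flow data
options = {"exportingProcessIPv4Address": "192.0.2.10",
           "exportedMessageTotalCount": 42}

safe = anonymise_options_record(options, key)
anon_source = anonymise_prefix_preserving(flow_source, key)
print(safe["exportingProcessIPv4Address"], anon_source)
```

   Because the function is prefix-preserving, the anonymised Exporting
   Process address shares exactly as many leading bits with the
   anonymised flow addresses as the originals did.  An observer learns
   that the exporter sits near the monitored hosts, but not the real
   prefix.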

552	   Similarly, the Export Session Details Options Template and Message
553	   Details Options Template specified for the IPFIX File Format
554	   [I-D.ietf-ipfix-file] may contain the exportingProcessIPv4Address
555	   Information Element or the exportingProcessIPv6Address Information
556	   Element to identify an Exporting Process from which a flow record was
557	   received, and the collectingProcessIPv4Address Information Element or
558	   the collectingProcessIPv6Address Information Element to identify the
559	   Collecting Process which received it.  If the Exporting Process or
560	   Collecting Process address bears some relation to the IP addresses in
561	   the flow data (e.g., shares a network prefix with internal addresses)
562	   and the IP addresses in the flow data are anonymised in a structure-
563	   preserving way, then the Exporting Process or Collecting Process
564	   address may be used to break the IP address anonymisation.  Since
565	   these Options Templates are primarily intended for storing IPFIX
566	   Transport Session data for auditing, replay, and testing purposes,
567	   including them in stored anonymised data is NOT RECOMMENDED; omitting
568	   them mitigates the risk of attack.
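
   One way to follow this recommendation when writing anonymised data to
   storage is to drop every Options Template that carries one of the
   address Information Elements named above.  The sketch below assumes
   templates are represented as a mapping from Template ID to a list of
   Information Element names; that representation, the function name,
   and the example templates are hypothetical.

```python
ADDRESS_ELEMENTS = {
    "exportingProcessIPv4Address",
    "exportingProcessIPv6Address",
    "collectingProcessIPv4Address",
    "collectingProcessIPv6Address",
}

def drop_address_bearing_templates(options_templates):
    """Keep only Options Templates that contain none of the Exporting or
    Collecting Process address Information Elements."""
    return {
        template_id: elements
        for template_id, elements in options_templates.items()
        if ADDRESS_ELEMENTS.isdisjoint(elements)
    }

templates = {
    # Template IDs and field lists are illustrative only.
    256: ["exportingProcessIPv4Address", "collectingProcessIPv4Address"],
    257: ["templateId", "flowStartMilliseconds"],
}
print(sorted(drop_address_bearing_templates(templates)))  # [257]
```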

570	   The Message Details Options Template specified for the IPFIX File
571	   Format [I-D.ietf-ipfix-file] also contains the
572	   collectionTimeMilliseconds Information Element.  As with the Export
573	   Time Message Header field, if the exported flow data contains
574	   anonymised timestamp information, and the collectionTimeMilliseconds
575	   Information Element in a given Message has some relationship to the
576	   anonymised timestamp information, then this relationship can be
577	   exploited to reverse the timestamp anonymisation.  Since this Options
578	   Template is primarily intended for storing IPFIX Transport Session
579	   data for auditing, replay, and testing purposes, including it in
580	   stored anonymised data is NOT RECOMMENDED; omitting it mitigates the
581	   risk of attack.
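
   To see why the relationship matters, consider the simplest case,
   which is an assumption made here for illustration and not a technique
   named by this document: timestamps anonymised by subtracting a single
   secret offset.  If a Message is collected shortly after its last flow
   ends, the stored collectionTimeMilliseconds value lets an observer
   estimate that offset to within the export delay.  All values below
   are made up.

```python
def estimate_offset_ms(collection_time_ms, anonymised_flow_end_times_ms):
    """Estimate a constant timestamp shift from the clear-text collection time
    and the anonymised flow end times carried in the same Message."""
    return collection_time_ms - max(anonymised_flow_end_times_ms)

secret_offset_ms = 86_400_000            # one day, known only to the anonymiser
real_end_times_ms = [1_000_000, 1_000_500]
anonymised_end_times_ms = [t - secret_offset_ms for t in real_end_times_ms]
collection_time_ms = 1_000_700           # collected 200 ms after the last flow ended

estimate = estimate_offset_ms(collection_time_ms, anonymised_end_times_ms)
recovered = [t + estimate for t in anonymised_end_times_ms]

# The estimate is off only by the 200 ms between flow end and collection.
print(estimate - secret_offset_ms)
```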

583	   Since the Time Window Options Template specified for the IPFIX File
584	   Format [I-D.ietf-ipfix-file] refers to the timestamps within the flow
585	   data to provide partial table of contents information for an IPFIX
586	   File, care must be taken to ensure that Options described by this
587	   template are written using the anonymised timestamps instead of the
588	   original ones.
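
   A minimal sketch of this rule follows, assuming flow records are
   dictionaries keyed by the flowStartMilliseconds and
   flowEndMilliseconds Information Elements, and that the window is
   summarised by two hypothetical keys, window_start_ms and
   window_end_ms.  The fixed-offset shift stands in for whatever
   timestamp anonymisation is actually in use.

```python
def time_window(records):
    """Summarise the time span covered by a list of flow records."""
    return {
        "window_start_ms": min(r["flowStartMilliseconds"] for r in records),
        "window_end_ms": max(r["flowEndMilliseconds"] for r in records),
    }

def shift_timestamps(record, offset_ms):
    """Stand-in timestamp anonymisation (assumed): shift by a fixed offset."""
    shifted = dict(record)
    shifted["flowStartMilliseconds"] -= offset_ms
    shifted["flowEndMilliseconds"] -= offset_ms
    return shifted

original = [
    {"flowStartMilliseconds": 5_000, "flowEndMilliseconds": 6_500},
    {"flowStartMilliseconds": 5_200, "flowEndMilliseconds": 9_000},
]
anonymised = [shift_timestamps(r, 4_000) for r in original]

# Written to the file: the window over the anonymised records only.
print(time_window(anonymised))
```

   Computing the window from the original records instead would store
   the real start and end times next to the anonymised flows.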

590	6.  Parameters for the Description of Anonymisation Techniques

592	   [TODO: see corresponding section of draft-ietf-psamp-sample-tech for
593	   the proposed structure of this section.]

595	7.  Anonymisation Metadata Support in IPFIX

597	   [TODO: Here we'll describe how the information specified above can be
598	   transmitted on the wire using an Options Template.  The idea is to
599	   scope the option to the Template ID and, for each field, to specify
600	   whether it is anonymised and, if so, the output characteristics of
601	   the technique applied.]

603	   [EDITOR'S NOTE: Multiple anonymisation techniques applied to an IE at
604	   the same time are indicated by multiple elements of the same type (in
605	   application order, as in PSAMP).]

607	   [EDITOR'S NOTE: For blackmarking, we'll recommend not exporting the
608	   information at all, following the data protection law principle that
609	   only necessary information should be exported.]

611	8.  Security Considerations

613	   [TODO: write this section.]

615	9.  IANA Considerations

617	   This document contains no actions for IANA.

619	10.  Acknowledgments

621	   We thank Paul Aitken for his comments and insight, and the PRISM
622	   project for its support of this work.

624	11.  References

626	11.1.  Normative References

628	   [RFC5101]  Claise, B., "Specification of the IP Flow Information
629	              Export (IPFIX) Protocol for the Exchange of IP Traffic
630	              Flow Information", RFC 5101, January 2008.

632	   [RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
633	              Meyer, "Information Model for IP Flow Information Export",
634	              RFC 5102, January 2008.

636	11.2.  Informative References

638	   [I-D.ietf-ipfix-as]
639	              Zseby, T., "IPFIX Applicability", draft-ietf-ipfix-as-12
640	              (work in progress), July 2007.

642	   [I-D.ietf-ipfix-architecture]
643	              Sadasivan, G., "Architecture for IP Flow Information
644	              Export", draft-ietf-ipfix-architecture-12 (work in
645	              progress), September 2006.

647	   [I-D.ietf-ipfix-file]
648	              Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
649	              Wagner, "Specification of the IPFIX File Format",
650	              draft-ietf-ipfix-file-03 (work in progress), October 2008.

652	   [I-D.ietf-ipfix-mediators-framework]
653	              Kobayashi, A., Nishida, H., and B. Claise, "IPFIX
654	              Mediation: Framework",
655	              draft-ietf-ipfix-mediators-framework-01 (work in
656	              progress), November 2008.

658	   [RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
659	              "Requirements for IP Flow Information Export (IPFIX)",
660	              RFC 3917, October 2004.

662	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
663	              Requirement Levels", BCP 14, RFC 2119, March 1997.

665	Authors' Addresses

667	   Elisa Boschi
668	   Hitachi Europe
669	   c/o ETH Zurich
670	   Gloriastrasse 35
671	   8092 Zurich
672	   Switzerland

674	   Phone: +41 44 632 70 57
675	   Email: elisa.boschi@hitachi-eu.com

677	   Brian Trammell
678	   Hitachi Europe
679	   c/o ETH Zurich
680	   Gloriastrasse 35
681	   8092 Zurich
682	   Switzerland

684	   Phone: +41 44 632 70 13
685	   Email: brian.trammell@hitachi-eu.com