IPFIX Working Group                                            E. Boschi
Internet-Draft                                               B. Trammell
Intended status: Experimental                                 ETH Zurich
Expires: July 23, 2011                                  January 19, 2011

                     IP Flow Anonymization Support
                      draft-ietf-ipfix-anon-06.txt

Abstract

   This document describes anonymization techniques for IP flow data and
   the export of anonymized data using the IPFIX protocol.  It
   categorizes common anonymization schemes and defines the parameters
   needed to describe them.  It provides guidelines for the
   implementation of anonymized data export and storage over IPFIX, and
   describes an information model and Options-based method for
   anonymization metadata export within the IPFIX protocol or storage in
   IPFIX Files.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 23, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  IPFIX Protocol Overview
     1.2.  IPFIX Documents Overview
     1.3.  Anonymization within the IPFIX Architecture
     1.4.  Supporting Experimentation with Anonymization
   2.  Terminology
   3.  Categorization of Anonymization Techniques
   4.  Anonymization of IP Flow Data
     4.1.  IP Address Anonymization
       4.1.1.  Truncation
       4.1.2.  Reverse Truncation
       4.1.3.  Permutation
       4.1.4.  Prefix-preserving Pseudonymization
     4.2.  MAC Address Anonymization
       4.2.1.  Truncation
       4.2.2.  Reverse Truncation
       4.2.3.  Permutation
       4.2.4.  Structured Pseudonymization
     4.3.  Timestamp Anonymization
       4.3.1.  Precision Degradation
       4.3.2.  Enumeration
       4.3.3.  Random Shifts
     4.4.  Counter Anonymization
       4.4.1.  Precision Degradation
       4.4.2.  Binning
       4.4.3.  Random Noise Addition
     4.5.  Anonymization of Other Flow Fields
       4.5.1.  Binning
       4.5.2.  Permutation
   5.  Parameters for the Description of Anonymization Techniques
     5.1.  Stability
     5.2.  Truncation Length
     5.3.  Bin Map
     5.4.  Permutation
     5.5.  Shift Amount
   6.  Anonymization Export Support in IPFIX
     6.1.  Anonymization Records and the Anonymization Options
           Template
     6.2.  Recommended Information Elements for Anonymization
           Metadata
       6.2.1.  informationElementIndex
       6.2.2.  anonymizationTechnique
       6.2.3.  anonymizationFlags
   7.  Applying Anonymization Techniques to IPFIX Export and
       Storage
     7.1.  Arrangement of Processes in IPFIX Anonymization
     7.2.  IPFIX-Specific Anonymization Guidelines
       7.2.1.  Appropriate Use of Information Elements for
               Anonymized Data
       7.2.2.  Export of Perimeter-Based Anonymization Policies
       7.2.3.  Anonymization of Header Data
       7.2.4.  Anonymization of Options Data
       7.2.5.  Special-Use Address Space Considerations
       7.2.6.  Protecting Out-of-Band Configuration and
               Management Data
   8.  Examples
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgments
   12. References
     12.1. Normative References
     12.2. Informative References
   Authors' Addresses

1.  Introduction

   The standardization of an IP flow information export protocol
   [RFC5101] and associated representations removes a technical barrier
   to the sharing of IP flow data across organizational boundaries and
   with network operations, security, and research communities for a
   wide variety of purposes.  However, with wider dissemination comes
   greater risks to the privacy of the users of networks under
   measurement, and to the security of those networks.  While it is not
   a complete solution to the issues posed by distribution of IP flow
   information, anonymization (i.e., the deletion or transformation of
   information that is considered sensitive and could be used to reveal
   the identity of subjects involved in a communication) is an important
   tool for the protection of privacy within network measurement
   infrastructures.

   This document presents a mechanism for representing anonymized data
   within IPFIX and guidelines for using it.  It is not intended as a
   general statement on the applicability of specific flow data
   anonymization techniques to specific situations, or as a
   recommendation of any particular application of anonymization to flow
   data export.  Exporters or publishers of anonymized data must take
   care that the applied anonymization technique is appropriate for the
   data source, the purpose, and the risk of deanonymization of a given
   application.

   It begins with a categorization of anonymization techniques.  It then
   describes applicability of each technique to commonly anonymizable
   fields of IP flow data, organized by information element data type
   and semantics as in [RFC5102]; enumerates the parameters required by
   each of the applicable anonymization techniques; and provides
   guidelines for the use of each of these techniques in accordance with
   current best practices in data protection.  Finally, it specifies a
   mechanism for exporting anonymized data and binding anonymization
   metadata to Templates and Options Templates using IPFIX Options.

1.1.  IPFIX Protocol Overview

   In the IPFIX protocol, { type, length, value } tuples are expressed
   in Templates containing { type, length } pairs, specifying which {
   value } fields are present in data records conforming to the
   Template, giving great flexibility as to what data is transmitted.
   Since Templates are sent very infrequently compared with Data
   Records, this results in significant bandwidth savings.  Various
   different data formats may be transmitted simply by sending new
   Templates specifying the { type, length } pairs for the new data
   format.  See [RFC5101] for more information.
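
   As a rough illustration of the Template mechanism described above,
   the following Python sketch decodes one fixed-length data record
   using a list of { type, length } pairs.  The function name and the
   sample record are assumptions made for this example, and variable-
   length and enterprise-specific Information Elements are deliberately
   ignored.  The element identifiers used (8, 12, and 1) are the
   IANA-assigned values for sourceIPv4Address, destinationIPv4Address,
   and octetDeltaCount.

```python
import ipaddress
import struct

# A Template is modeled here as an ordered list of (type, length) pairs.
# 8 = sourceIPv4Address, 12 = destinationIPv4Address, 1 = octetDeltaCount.
TEMPLATE = [(8, 4), (12, 4), (1, 8)]


def decode_record(template, payload):
    """Split a fixed-length record into {element id: raw value bytes}."""
    fields = {}
    offset = 0
    for element_id, length in template:
        chunk = payload[offset:offset + length]
        if len(chunk) != length:
            raise ValueError("record shorter than its Template describes")
        fields[element_id] = chunk
        offset += length
    return fields


# Hypothetical record: 192.0.2.1 -> 198.51.100.7, 1500 octets.
record = (ipaddress.IPv4Address("192.0.2.1").packed
          + ipaddress.IPv4Address("198.51.100.7").packed
          + struct.pack("!Q", 1500))
fields = decode_record(TEMPLATE, record)
print(ipaddress.IPv4Address(fields[8]), int.from_bytes(fields[1], "big"))
```

   Because the record itself carries only values, the same bytes cannot
   be interpreted without the Template that describes them.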

   The IPFIX information model [RFC5102] defines a large number of
   standard Information Elements which provide the necessary { type }
   information for Templates.  The use of standard elements enables
   interoperability among different vendors' implementations.
   Additionally, non-standard enterprise-specific elements may be
   defined for private use.

1.2.  IPFIX Documents Overview

   "Specification of the IPFIX Protocol for the Exchange of IP Traffic
   Flow Information" [RFC5101] and its associated documents define the
   IPFIX Protocol, which provides network engineers and administrators
   with access to IP traffic flow information.

   "Architecture for IP Flow Information Export" [RFC5470] defines the
   architecture for the export of measured IP flow information out of an
   IPFIX Exporting Process to an IPFIX Collecting Process, and the basic
   terminology used to describe the elements of this architecture, per
   the requirements defined in "Requirements for IP Flow Information
   Export" [RFC3917].  The IPFIX Protocol document [RFC5101] then covers
   the details of the method for transporting IPFIX Data Records and
   Templates via a congestion-aware transport protocol from an IPFIX
   Exporting Process to an IPFIX Collecting Process.

   "Information Model for IP Flow Information Export" [RFC5102]
   describes the Information Elements used by IPFIX, including details
   on Information Element naming, numbering, and data type encoding.
   Finally, "IPFIX Applicability" [RFC5472] describes the various
   applications of the IPFIX protocol and their use of information
   exported via IPFIX, and relates the IPFIX architecture to other
   measurement architectures and frameworks.

   Additionally, "Specification of the IPFIX File Format" [RFC5655]
   describes a file format based upon the IPFIX Protocol for the storage
   of flow data.

   This document references the Protocol and Architecture documents for
   terminology, and extends the IPFIX Information Model to provide new
   Information Elements for anonymization metadata.  The anonymization
   techniques described herein are equally applicable to the IPFIX
   Protocol and data stored in IPFIX Files.

1.3.  Anonymization within the IPFIX Architecture

   According to [RFC5470], IPFIX Message anonymization is optionally
   performed as the final operation before handing the Message to the
   transport protocol for export.  While no provision is made in the
   architecture for anonymization metadata as in Section 6, this
   arrangement does allow for the rewriting necessary for comprehensive
   anonymization of IPFIX export as in Section 7.  The development of
   the IPFIX Mediation [I-D.ietf-ipfix-mediators-framework] framework
   and the IPFIX File Format [RFC5655] expand upon this initial
   architectural allowance for anonymization by adding to the list of
   places that anonymization may be applied.  The former specifies IPFIX
   Mediators, which rewrite existing IPFIX Messages, and the latter
   specifies a method for storage of IPFIX data in files.

   More detail on the applicable architectural arrangements for
   anonymization can be found in Section 7.1.

1.4.  Supporting Experimentation with Anonymization

   The intended status of this document is Experimental, reflecting the
   experimental nature of anonymization export support.  Research on
   network trace anonymization techniques and attacks against them is
   ongoing.  Indeed, there is increasing evidence that anonymization
   applied to network trace or flow data on its own is insufficient for
   many data protection applications as in [Bur10].  Therefore, this
   document explicitly does not recommend any particular technique or
   implementation thereof.

   The intention of this document is to provide a common basis for
   interoperable exchange of anonymized data, furthering research in
   this area, both on anonymization techniques themselves and on the
   application of anonymized data to network measurement.  To that
   end, the classification in Section 3 and anonymization export support
   in Section 6 can be used to describe and export information even
   about data anonymized using techniques that are unacceptably weak for
   general application to production data sets on their own.

   While the specification herein is designed to be implementation- and
   technique-independent, open research in this area may necessitate
   future updates to the specification.  Assuming the future successful
   application of this specification to anonymized data publication and
   exchange, it may be brought back to the IPFIX working group for
   further development and publication on the standards track.

2.  Terminology

   Terms used in this document that are defined in the Terminology
   section of the IPFIX Protocol [RFC5101] document are to be
   interpreted as defined there.  In addition, this document defines the
   following terms:

   Anonymization Record:   A record, defined by the Anonymization
      Options Template in Section 6.1, that defines the properties of
      the anonymization applied to a single Information Element within a
      single Template or Options Template.

   Anonymized Data Record:   A Data Record within a Data Set containing
      at least one Information Element with anonymized values.  The
      Information Element(s) within the Template or Options Template
      describing this Data Record SHOULD have a corresponding
      Anonymization Record.

   Intermediate Anonymization Process:   An intermediate process which
      takes Data Records and transforms them into Anonymized Data
      Records.

   Note that there is an explicit difference in this document between a
   "Data Set" (which is defined as in [RFC5101]) and a "data set".  When
   in lower case, this term refers to any collection of data (usually,
   within the context of this document, flow or packet data) which may
   contain identifying information and is therefore subject to
   anonymization.

   Note also that when the term Template is used in this document,
   unless otherwise noted, it applies both to Templates and Options
   Templates as defined in [RFC5101].  Specifically, Anonymization
   Records may apply to both Templates and Options Templates.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Categorization of Anonymization Techniques

   Anonymization, as described by this document, is the modification of
   a data set in order to protect the identity of the people or entities
   described by the data set from disclosure.  With respect to network
   traffic data, anonymization generally attempts to preserve some set
   of properties of the network traffic useful for a given application
   or applications, while ensuring the data cannot be traced back to the
   specific networks, hosts, or users generating the traffic.

   Anonymization may be broadly classified according to two properties:
   recoverability and countability.  All anonymization techniques map
   the real space of identifiers or values into a separate, anonymized
   space, according to some function.  A technique is said to be
   recoverable when the function used is invertible or can otherwise be
   reversed and a real identifier can be recovered from a given
   replacement identifier.  Techniques wherein the function used can
   only be reversed using additional information, such as an encryption
   key, or knowledge of injected traffic within the data set, are not
   considered recoverable; "recoverability" as used within this
   categorization does not refer to recoverability under attack.

   Countability compares the dimension of the anonymized space (N) to
   the dimension of the real space (M), and denotes how the count of
   unique values is preserved by the anonymization function.  If the
   anonymized space is smaller than the real space, then the function is
   said to generalize the input, mapping more than one input point to
   each anonymous value (e.g., as with aggregation).  By definition,
   generalization is not recoverable.

   If the dimensions of the anonymized and real spaces are the same,
   such that the count of unique values is preserved, then the function
   is said to be a direct substitution function.  If the dimension of
   the anonymized space is larger, such that each real value maps to a
   set of anonymized values, then the function is said to be a set
   substitution function.  Note that with set substitution functions,
   the sets of anonymized values are not necessarily disjoint.  Either
   direct or set substitution functions are said to be one-way if there
   exists no non-brute force method for recovering the real data point
   from an anonymized one in isolation (i.e., if the only way to recover
   the data point is to attack the anonymized data set as a whole, e.g.,
   through fingerprinting or data injection).

   This classification is summarized in the table below.

   +------------------------+-----------------+------------------------+
   | Recoverability /       | Recoverable     | Non-recoverable        |
   | Countability           |                 |                        |
   +------------------------+-----------------+------------------------+
   | N < M                  | N.A.            | Generalization         |
   | N = M                  | Direct          | One-way Direct         |
   |                        | Substitution    | Substitution           |
   | N > M                  | Set             | One-way Set            |
   |                        | Substitution    | Substitution           |
   +------------------------+-----------------+------------------------+
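
   The table can also be read as a small decision procedure.  The
   following Python sketch is only an illustration of that reading; the
   function name and its arguments are assumptions of this example and
   do not appear elsewhere in this document.

```python
def classify(n_anonymized, m_real, recoverable):
    """Name the category for N (anonymized) versus M (real) space sizes."""
    if n_anonymized < m_real:
        if recoverable:
            raise ValueError("generalization is by definition not recoverable")
        return "Generalization"
    if n_anonymized == m_real:
        kind = "Direct Substitution"
    else:
        kind = "Set Substitution"
    return kind if recoverable else "One-way " + kind
```

   For example, mapping IPv4 addresses onto /24 network addresses gives
   N = 2^24 and M = 2^32, which this sketch reports as Generalization.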

4.  Anonymization of IP Flow Data

   In anonymizing IP flow data as treated by this document, the goal is
   generally two-way address untraceability: to remove the ability to
   assert that endpoint X contacted endpoint Y at time T.  Address
   untraceability is important as IP addresses are the most suitable
   field in IP flow records to identify real-world entities.  Each IP
   address is associated with an interface on a network host, and can
   potentially be identified with a single user.  Additionally, IP
   addresses are structured identifiers; that is, partial IP address
   prefixes may be used to identify networks just as full IP addresses
   identify hosts.  This leads IP flow data anonymization to be
   concerned first and foremost with IP address anonymization.

   Any form of aggregation which combines flows from multiple endpoints
   into a single record (e.g., aggregation by subnetwork, aggregation
   removing addressing completely) may also provide address
   untraceability; however, anonymization by aggregation is out of scope
   for this document.  Additionally of potential interest in this
   problem space but out of scope are anonymization techniques which are
   applied over multiple fields or multiple records in a way which
   introduces dependencies among anonymized fields or records.  This
   document is concerned solely with anonymization techniques applied at
   the resolution of single fields within a flow record.

   Even so, attacks against these anonymization techniques use entire
   flows and relationships between hosts and flows within a given data
   set.  Therefore, fields which may not necessarily be identifying by
   themselves may be anonymized in order to increase the anonymity of
   the data set as a whole.

   Due to the restricted semantics of IP flow data, there is a
   relatively limited set of specific anonymization techniques available
   on flow data, though each falls into the broad categories discussed
   in the previous section.  Each type of field that may commonly appear
   in a flow record may have its own applicable specific techniques.

   As with IP addresses, MAC addresses uniquely identify devices on the
   network; while they are not often available in traffic data collected
   at Layer 3, and cannot be used to locate devices within the network,
   some traces may contain sub-IP data including MAC address data.
   Hardware addresses may be mappable to device serial numbers, and to
   the entities or individuals who purchased the devices, when combined
   with external databases.  MAC addresses are also often used in
   constructing IPv6 addresses (see section 2.5.1 of [RFC4291]), and as
   such may be used to reconstruct the low-order bits of anonymized IPv6
   addresses in certain circumstances.  Therefore, MAC address
   anonymization is also important.

   Port numbers identify abstract entities (applications) as opposed to
   real-world entities, but they can be used to classify hosts and user
   behavior.  Passive port fingerprinting, both of well-known and
   ephemeral ports, can be used to determine the operating system
   running on a host.  Relative data volumes by port can also be used to
   determine the host's function (workstation, web server, etc.); this
   information can be used to identify hosts and users.

   While not identifiers in and of themselves, timestamps and counters
   can reveal the behavior of the hosts and users on a network.  Any
   given network activity is recognizable by a pattern of relative time
   differences and data volumes in the associated sequence of flows,
   even without host address information.  They can therefore be used to
   identify hosts and users.  Timestamps and counters are also
   vulnerable to traffic injection attacks, where traffic with a known
   pattern is injected into a network under measurement, and this
   pattern is later identified in the anonymized data set.

   The simplest and most extreme form of anonymization, which can be
   applied to any field of a flow record, is black-marker anonymization,
   or complete deletion of a given field.  Note that black-marker
   anonymization is equivalent to simply not exporting the field(s) in
   question.

   While black-marker anonymization completely protects the data in the
   deleted fields from the risk of disclosure, it also reduces the
   utility of the anonymized data set as a whole.  Techniques that
   retain some information while reducing (though not eliminating) the
   disclosure risk will be extensively discussed in the following
   sections; note that the techniques specifically applicable to IP
   addresses, timestamps, ports, and counters will be discussed in
   separate sections.

4.1.  IP Address Anonymization

   Since IP addresses are the most common identifiers within flow data
   that can be used to directly identify a person, organization, or
   host, most of the work on flow and trace data anonymization has gone
   into IP address anonymization techniques.  Indeed, the aim of most
   attacks against anonymization is to recover the map from anonymized
   IP addresses to original IP addresses, thereby identifying the
   hosts.  There is therefore a wide range of IP address anonymization
   schemes that fit into the following categories.

       +------------------------------------+---------------------+
       | Scheme                             | Action              |
       +------------------------------------+---------------------+
       | Truncation                         | Generalization      |
       | Reverse Truncation                 | Generalization      |
       | Permutation                        | Direct Substitution |
       | Prefix-preserving Pseudonymization | Direct Substitution |
       +------------------------------------+---------------------+

4.1.1.  Truncation

   Truncation removes "n" of the least significant bits from an IP
   address, replacing them with zeroes.  In effect, it replaces a host
   address with a network address for some fixed netblock; for IPv4
   addresses, 8-bit truncation corresponds to replacement with a /24
   network address.  Truncation is a non-reversible generalization
   scheme.  Note that while truncation is effective for making hosts
   non-identifiable, it preserves information which can be used to
   identify an organization, a geographic region, a country, or a
   continent.

   Truncation to an address length of 0 is equivalent to black-marker
   anonymization.  Complete removal of IP address information is only
   recommended for analysis tasks which have no need to separate flow
   data by host or network; e.g. as a first stage to per-application
   (port) or time-series total volume analyses.
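
   A minimal Python sketch of truncation as described above follows.
   The function name is an assumption of this example; "n" is the number
   of least significant bits removed, so n = 8 on an IPv4 address yields
   the /24 network address.

```python
import ipaddress


def truncate(address, n):
    """Zero the n least significant bits of an IPv4 or IPv6 address."""
    addr = ipaddress.ip_address(address)
    if not 0 <= n <= addr.max_prefixlen:
        raise ValueError("n out of range for this address family")
    # Shifting right and then left by n clears the n low-order mask bits.
    mask = (((1 << addr.max_prefixlen) - 1) >> n) << n
    return type(addr)(int(addr) & mask)
```

   Setting n to the full address length (32 for IPv4) corresponds to
   truncation to an address length of 0, i.e., black-marker
   anonymization.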

4.1.2.  Reverse Truncation

   Reverse truncation removes "n" of the most significant bits from an
   IP address, replacing them with zeroes.  Reverse truncation is a non-
   reversible generalization scheme.  Reverse truncation is effective
   for making networks unidentifiable, partially or completely removing
   information which can be used to identify an organization, a
   geographic region, a country, or a continent (or RIR region of
   responsibility).  However, it may cause ambiguity when applied to
   data collected from more than one network, since it treats all the
   hosts with the same address on different networks as if they are the
   same host.  It is not particularly useful when publishing data where
   the network of origin is known or can be easily guessed by virtue of
   the identity of the publisher.

   Like truncation, reverse truncation to an address length of 0 is
   equivalent to black-marker anonymization.
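
   Reverse truncation can be sketched in the same way.  The function
   name below is an assumption of this example; "n" is the number of
   most significant bits removed.

```python
import ipaddress


def reverse_truncate(address, n):
    """Zero the n most significant bits of an IPv4 or IPv6 address."""
    addr = ipaddress.ip_address(address)
    if not 0 <= n <= addr.max_prefixlen:
        raise ValueError("n out of range for this address family")
    # Keep only the (address length - n) low-order bits.
    mask = (1 << (addr.max_prefixlen - n)) - 1
    return type(addr)(int(addr) & mask)
```

   With n = 24, both 192.0.2.77 and 198.51.100.77 map to 0.0.0.77,
   which illustrates the ambiguity noted above for data collected from
   more than one network.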

4.1.3.  Permutation

   Permutation is a direct substitution technique, replacing each IP
   address with an address selected from the set of possible IP
   addresses, such that each anonymized address represents a unique
   original address.  The selection function is often random, though it
   is not necessarily so.  Permutation does not preserve any structural
   information about a network, but it does preserve the unique count of
   IP addresses.  Any application that requires more structure than
   host-uniqueness will not be able to use permuted IP addresses.

   There are many variations of permutation functions, each of which has
   tradeoffs in performance, security, and guarantees of non-collision;
   evaluating these tradeoffs is implementation independent.  However,
   in general permutation functions applied to anonymization SHOULD be
   difficult to reverse without knowing the parameters (e.g., a secret
   key for HMAC).  Given the relatively small space of IPv4 addresses in
   particular, hash functions applied without additional parameters
   could be reversed through brute force if the hash function is known,
   and SHOULD NOT be used as permutation functions.  Permutation
   functions may guarantee non-collision (i.e., that each anonymized
   address represents a unique original address), but need not; however,
   the probability of collision SHOULD be low.  We nevertheless treat
   even permutations with low but nonzero collision probability as
   direct substitution.  Beyond these guidelines, recommendations for
   specific permutation functions are out of scope for this document.
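
   As one hedged illustration of a keyed function of the kind mentioned
   above, the following Python sketch replaces an IPv4 address with the
   first 32 bits of an HMAC-SHA-256 over that address.  The function
   name and the choice of SHA-256 are assumptions of this example, not
   recommendations.  The mapping is stateless but does not guarantee
   non-collision: collisions become likely once a data set holds on the
   order of 2^16 distinct addresses, so whether the collision
   probability is acceptably low depends on the data set.

```python
import hashlib
import hmac
import ipaddress


def hmac_permute(address, key):
    """Replace an IPv4 address with the first 32 bits of a keyed HMAC."""
    addr = ipaddress.IPv4Address(address)
    digest = hmac.new(key, addr.packed, hashlib.sha256).digest()
    return ipaddress.IPv4Address(digest[:4])
```

   The same key always yields the same mapping, and without the key the
   mapping cannot be rebuilt simply by hashing all 2^32 IPv4 addresses.
   A table-based permutation would guarantee non-collision at the cost
   of keeping state.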

4.1.4.  Prefix-preserving Pseudonymization

   Prefix-preserving pseudonymization is a direct substitution
   technique, like permutation but further restricted such that the
   structure of subnets is preserved at each level while anonymizing IP
   addresses.  If two real IP addresses match on a prefix of "n" bits,
   the two anonymized IP addresses will match on a prefix of "n" bits as
   well.  This is useful when relationships among networks must be
   preserved for a given analysis task, but introduces structure into
   the anonymized data which can be exploited in attacks against the
   anonymization technique.

   Scanning in Internet background traffic can cause particular problems
   with this technique: if a scanner uses a predictable and known
   sequence of addresses, this information can be used to reverse the
   substitution.  The low order portion of the address can be left
   unanonymized as a partial defense against this attack.
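
   The prefix-preservation property can be illustrated with the
   following non-normative Python sketch.  It decides whether to flip
   each bit of a 32-bit address from a keyed hash of the bits above it,
   so that addresses sharing an "n"-bit prefix yield pseudonyms sharing
   an "n"-bit prefix.  The construction, the use of HMAC-SHA256, and
   the function name are assumptions for illustration; this is not a
   recommendation of a specific function.

```python
import hashlib
import hmac


def prefix_preserving_ipv4(address: int, key: bytes) -> int:
    """Pseudonymize a 32-bit address, preserving shared prefix lengths."""
    result = 0
    for i in range(32):
        prefix = address >> (32 - i)      # the i most significant bits
        bit = (address >> (31 - i)) & 1   # the bit at position i
        # The flip decision depends only on the position and the prefix.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        result = (result << 1) | (bit ^ flip)
    return result
```

   Because the first differing bit of two inputs is flipped (or not)
   identically for both, their pseudonyms also differ at exactly that
   bit, so the mapping is one-to-one.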

4.2.  MAC Address Anonymization

   Flow data containing sub-IP information can also contain identifying
   information in the form of the hardware (MAC) address.  While MAC
   address information cannot be used to locate a node within a network,
   it can be used to directly and uniquely identify a specific device.
   Vendors or organizations within the supply chain may then have the
   information necessary to identify the entity or individual that
   purchased the device.

   MAC address information is not as structured as IP address
   information.  EUI-48 and EUI-64 MAC addresses contain an
   Organizationally Unique Identifier (OUI) in the three most
   significant bytes of the address; this OUI additionally contains bits
   noting whether the address is locally or globally administered.
   Beyond this, there is no standard relationship among the OUIs
   assigned to a given vendor.

   Note that MAC address information also appears within IPv6 addresses,
   as the EUI-64 address, or EUI-48 address encoded as an EUI-64
   address, is used as the least significant 64 bits of the IPv6 address
   in the case of link local addressing or stateless autoconfiguration;
   the considerations and techniques in this section may then apply to
   such IPv6 addresses as well.

           +-----------------------------+---------------------+
           | Scheme                      | Action              |
           +-----------------------------+---------------------+
           | Truncation                  | Generalization      |
           | Reverse Truncation          | Generalization      |
           | Permutation                 | Direct Substitution |
           | Structured Pseudonymization | Direct Substitution |
           +-----------------------------+---------------------+

4.2.1.  Truncation

   Truncation removes "n" of the least significant bits from a MAC
   address, replacing them with zeroes.  In effect, it retains bits of
   the OUI, which identifies the manufacturer, while removing the least
   significant bits identifying the particular device.  Truncation of 24
   bits of an EUI-48 or 40 bits of an EUI-64 address zeroes out the
   device identifier while retaining the OUI.

   Truncation is effective for making device manufacturers partially or
   completely identifiable within a dataset while deleting unique host
   identifiers; this can be used to retain and aggregate MAC layer
   behavior by vendor.

   Truncation to an address length of 0 is equivalent to black-marker
   anonymization.
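
   As a non-normative illustration, MAC address truncation as described
   in this section can be sketched in Python as follows.  The function
   name and the representation of addresses as byte strings are
   assumptions of this example.

```python
def truncate_mac(mac: bytes, n: int) -> bytes:
    """Zero the n least significant bits of a MAC address given as bytes."""
    width = len(mac) * 8
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address width in bits")
    value = int.from_bytes(mac, "big")
    value = (value >> n) << n  # drop the low n bits, then restore the width
    return value.to_bytes(len(mac), "big")
```

   For a 6-byte address, n = 24 keeps only the OUI, and n = 48 yields
   an all-zero address.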

4.2.2.  Reverse Truncation

   Reverse truncation removes "n" of the most significant bits from a
   MAC address, replacing them with zeroes.  Reverse truncation is a
   non-reversible generalization scheme.  This has the effect of
   removing bits of the OUI, which identify manufacturers, before
   removing the least significant bits.  Reverse truncation of 24 bits
   zeroes out the OUI.

   Reverse truncation is effective for making device manufacturers
   partially or completely unidentifiable within a dataset.  However, it
   may cause ambiguity by introducing the possibility of truncated MAC
   address collision.  Also note that the utility of removing
   manufacturer information is not particularly well-covered by the
   literature.

   Reverse truncation to an address length of 0 is equivalent to black-
   marker anonymization.

4.2.3.  Permutation

   Permutation is a direct substitution technique, replacing each MAC
   address with an address selected from the set of possible MAC
   addresses, such that each anonymized address represents a unique
   original address.  The selection function is often random, though it
   is not necessarily so.  Permutation does not preserve any structural
   information about a network, but it does preserve the unique count of
   devices on the network.  Any application that requires more structure
   than host-uniqueness will not be able to use permuted MAC addresses.

   There are many variations of permutation functions, each of which has
   tradeoffs in performance, security, and guarantees of non-collision;
   evaluating these tradeoffs is implementation independent.  However,
   in general permutation functions applied to anonymization SHOULD be
   difficult to reverse without knowing the parameters (e.g., a secret
   key for HMAC).  While the EUI-48 space is larger than the IPv4
   address space, hash functions applied without additional parameters
   could be reversed through brute force if the hash function is known,
   and SHOULD NOT be used as permutation functions.  Permutation
   functions may guarantee noncollision (i.e., that each anonymized
   address represents a unique original address), but need not; however,
   the probability of collision SHOULD be low.  We treat even
   permutations with low but nonzero collision probability as direct
   substitution nevertheless.  Beyond these guidelines, recommendations
   for specific permutation functions are out of scope for this
   document.

4.2.4.  Structured Pseudonymization

   Structured pseudonymization for MAC addresses is a direct
   substitution technique, like permutation, but restricted such that
   the OUI (the most significant three bytes) is permuted separately
   from the node identifier, the remainder.  This is useful when the
   uniqueness of OUIs must be preserved for a given analysis task, but
   introduces structure into the anonymized data which can be exploited
   in attacks against the anonymization technique.
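
   The separate treatment of the OUI and the node identifier can be
   illustrated with the following non-normative Python sketch for
   6-byte addresses.  The use of truncated HMAC-SHA256, the "oui" and
   "node" labels, and the function name are assumptions of this
   example.  With only 24 bits per half, collisions become likely once
   a data set holds more than a few thousand distinct values, and the
   sketch makes no attempt to preserve the locally/globally
   administered bits.

```python
import hashlib
import hmac


def structured_pseudonymize_mac(mac: bytes, key: bytes) -> bytes:
    """Pseudonymize the OUI and the node identifier of a 6-byte MAC separately."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte address")
    oui, node = mac[:3], mac[3:]
    # Distinct labels keep the two halves from sharing one mapping.
    new_oui = hmac.new(key, b"oui" + oui, hashlib.sha256).digest()[:3]
    new_node = hmac.new(key, b"node" + node, hashlib.sha256).digest()[:3]
    return new_oui + new_node
```

   Two addresses with the same OUI receive the same pseudonymized OUI,
   which is exactly the structure an attacker may exploit.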

4.3.  Timestamp Anonymization

   The particular time at which a flow began or ended is not
   particularly identifiable information, but it can be used as part of
   attacks against other anonymization techniques or for user profiling,
   e.g. as in [Mur07].  Timestamps can be used in traffic injection
   attacks, which use known information about a set of traffic generated
   or otherwise known by an attacker to recover mappings of other
   anonymized fields, as well as to identify certain activity by
   response delay and size fingerprinting, which compares response sizes
   and inter-flow times in anonymized data to known values.  Note that
   these attacks have been shown to be relatively robust against
   timestamp anonymization techniques (see [Bur10]), so the techniques
   presented in this section are relatively weak and should be used with
   care.

          +-----------------------+----------------------------+
          | Scheme                | Action                     |
          +-----------------------+----------------------------+
          | Precision Degradation | Generalization             |
          | Enumeration           | Direct or Set Substitution |
          | Random Shifts         | Direct Substitution        |
          +-----------------------+----------------------------+

4.3.1.  Precision Degradation

   Precision Degradation is a generalization technique that removes the
   most precise components of a timestamp, accounting all events
   occurring in each given interval (e.g. one millisecond for
   millisecond level degradation) as simultaneous.  This has the effect
   of potentially collapsing many timestamps into one.  With this
   technique time precision is reduced, and sequencing may be lost, but
   the information at which time the event occurred is preserved.  The
   anonymized data may not be generally useful for applications which
   require strict sequencing of flows.

   Note that flow meters with low time precision (e.g. second precision,
   or millisecond precision on high-capacity networks) perform the
   equivalent of precision degradation anonymization by their design.

   Note also that degradation to a very low precision (e.g. on the order
   of minutes, hours, or days) is commonly used in analyses operating on
   time-series aggregated data, and may also be described as binning;
   though the time scales are longer and applicability more restricted,
   this is in principle the same operation.

   Precision degradation to infinitely low precision is equivalent to
   black-marker anonymization.  Removal of timestamp information is only
   recommended for analysis tasks which have no need to separate flows
   in time, for example for counting total volumes or unique occurrences
   of other flow keys in an entire dataset.
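
   As a non-normative illustration, precision degradation of a
   timestamp expressed in integer milliseconds can be sketched in
   Python as follows.  The function name and the millisecond unit are
   assumptions of this example.

```python
def degrade_timestamp(ts_ms: int, interval_ms: int) -> int:
    """Round a millisecond timestamp down to the start of its interval."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return ts_ms - (ts_ms % interval_ms)
```

   With interval_ms = 1000, all events within the same second collapse
   to one timestamp; with interval_ms = 3600000 the operation amounts
   to hourly binning.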

4.3.2.  Enumeration

   Enumeration is a substitution function that retains the chronological
   order in which events occurred while eliminating time information.
   Timestamps are substituted by equidistant timestamps (or numbers)
   starting from a randomly chosen start value.  The resulting data is
   useful for applications requiring strict sequencing, but not for
   those requiring good timing information (e.g. delay- or jitter-
   measurement for quality-of-service (QoS) applications or service-
   level agreement (SLA) validation).

   Note that enumeration is functionally equivalent to precision
   degradation in any environment into which traffic can be regularly
   injected to serve as a clock at the precision of the frequency of the
   injected flows.
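
   As a non-normative illustration, enumeration can be sketched in
   Python as follows.  The function name, the step parameter, and the
   32-bit range of the random start value are assumptions of this
   example.  Events with equal timestamps receive distinct, consecutive
   values in input order.

```python
import random


def enumerate_timestamps(timestamps, step=1, start=None):
    """Replace timestamps by equidistant values that keep chronological order."""
    if start is None:
        # Randomly chosen start value, as described in the text.
        start = random.SystemRandom().randrange(2**32)
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    result = [0] * len(timestamps)
    for rank, index in enumerate(order):
        result[index] = start + rank * step
    return result
```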

4.3.3.  Random Shifts

   Random time shifts add a random offset to every timestamp within a
   dataset.  This reversible substitution technique therefore retains
   duration and inter-event interval information as well as
   chronological order of flows.  Random time shifts are quite weak, and
   relatively easy to reverse in the presence of external knowledge
   about traffic on the measured network.
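
   As a non-normative illustration, a random time shift that applies a
   single secret offset to a whole dataset can be sketched in Python as
   follows.  The function name and the max_shift_ms bound are
   assumptions of this example.

```python
import secrets


def shift_timestamps(timestamps, max_shift_ms):
    """Add one random offset, drawn per dataset, to every timestamp."""
    # Uniform offset in [-max_shift_ms, +max_shift_ms]; it acts as the key
    # of the shift and must be kept secret.
    offset = secrets.randbelow(2 * max_shift_ms + 1) - max_shift_ms
    return [t + offset for t in timestamps], offset
```

   Since every timestamp moves by the same amount, durations and
   inter-event intervals are unchanged, which is also why one known
   event suffices to recover the offset.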

4.4.  Counter Anonymization

   Counters (such as packet and octet volumes per flow) can be used in
   fingerprinting and injection attacks against anonymization, or for
   user profiling, as timestamps can.  Data sets with anonymized
   counters are useful only for analysis tasks for which relative or
   imprecise magnitudes of activity are sufficient.  Counter information
   can also be
   completely removed, but this is only recommended for analysis tasks
   which have no need to evaluate the removed counter, for example for
   counting only unique occurrences of other flow keys.

          +-----------------------+----------------------------+
          | Scheme                | Action                     |
          +-----------------------+----------------------------+
          | Precision Degradation | Generalization             |
          | Binning               | Generalization             |
          | Random noise addition | Direct or Set Substitution |
          +-----------------------+----------------------------+

4.4.1.  Precision Degradation

   As with precision degradation in timestamps, precision degradation of
   counters removes lower-order bits of the counters, treating all the
   counters in a given range as having the same value.  Depending on the
   precision reduction, this loses information about the relationships
   between sizes of similarly-sized flows, but keeps relative magnitude
   information.  Precision degradation to an infinitely low precision is
   equivalent to black-marker anonymization.

4.4.2.  Binning

   Binning can be seen as a special case of precision degradation; the
   operation is identical, except that in precision degradation the
   counter ranges are uniform, and in binning they need not be.  For
   example, consider separating unopened TCP connections from
   potentially opened TCP connections.  Here, packet counters per flow
   would be binned into two bins, one for 1-2 packet flows, and one for
   flows with 3 or more packets.  Binning schemes are generally chosen
   to keep precisely the amount of information required in a counter for
   a given analysis task.  Note that, also unlike precision degradation,
   the bin label need not be within the bin's range.  Binning counters
   to a single bin is equivalent to black-marker anonymization.
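
   As a non-normative illustration, the two-bin packet counter scheme
   from the example above can be sketched in Python as follows.  The
   function name and the bin labels 0 and 1 are assumptions of this
   example; label 0 shows that a bin label need not lie within the
   bin's range.

```python
def bin_packet_count(packets: int) -> int:
    """Bin a per-flow packet count: label 0 for 1-2 packets, 1 for 3 or more."""
    if packets < 1:
        raise ValueError("a flow contains at least one packet")
    return 0 if packets <= 2 else 1
```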

4.4.3.  Random Noise Addition

   Random noise addition adds a random amount to a counter in each flow;
   this is used to keep relative magnitude information and minimize the
   disruption to size relationship information while avoiding
   fingerprinting attacks against anonymization.  Note that there is no
   guarantee that random noise addition will maintain ranking order by a
   counter among members of a set.  Random noise addition is
   particularly useful when the derived analysis data will not be
   presented in such a way as to require the lower-order bits of the
   counters.
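
   As a non-normative illustration, random noise addition can be
   sketched in Python as follows.  The symmetric uniform noise, the
   clamping at zero, and the function and parameter names are
   assumptions of this example; the text above does not prescribe a
   noise distribution.

```python
import random


def add_noise(counter: int, max_noise: int, rng=None) -> int:
    """Add uniform integer noise in [-max_noise, +max_noise] to a counter."""
    rng = rng or random.SystemRandom()
    noisy = counter + rng.randint(-max_noise, max_noise)
    return max(noisy, 0)  # counters cannot be negative
```

   Two counters that differ by less than 2 * max_noise may swap rank
   after noise is added, which is the loss of ranking order noted
   above.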

4.5.  Anonymization of Other Flow Fields

   Other fields, particularly port numbers and protocol numbers, can be
   used to partially identify the applications that generated the
   traffic in a given flow trace.  This information can be used in
   fingerprinting attacks, and may be of interest on its own (e.g., to
   reveal that a certain application with suspected vulnerabilities is
   running on a given network).  These fields are generally anonymized
   using one of two techniques.

                   +-------------+---------------------+
                   | Scheme      | Action              |
                   +-------------+---------------------+
                   | Binning     | Generalization      |
                   | Permutation | Direct Substitution |
                   +-------------+---------------------+

4.5.1.  Binning

   Binning is a generalization technique mapping a set of potentially
   non-uniform ranges into a set of arbitrarily labeled bins.  Common
   bin arrangements depend on the field type and the analysis
   application.  For example, an IP protocol bin arrangement may
   preserve 1, 6, and 17 for ICMP, TCP, and UDP traffic, and bin all
   other protocols into a single bin, to mitigate the use of uncommon
   protocols in fingerprinting attacks.  Another example arrangement may
   bin source and destination ports into low (0-1023) and high (1024-
   65535) bins in order to tell service from ephemeral ports without
   identifying individual applications.
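
   As a non-normative illustration, the two example arrangements above
   can be sketched in Python as follows.  The function names and the
   bin labels (255 for "other protocol", and the strings "low" and
   "high") are arbitrary choices made for this example.

```python
def bin_protocol(proto: int) -> int:
    """Keep protocols 1, 6, and 17; map every other protocol to one bin."""
    if not 0 <= proto <= 255:
        raise ValueError("protocol number out of range")
    return proto if proto in (1, 6, 17) else 255  # 255: arbitrary bin label


def bin_port(port: int) -> str:
    """Bin a port number into "low" (0-1023) or "high" (1024-65535)."""
    if not 0 <= port <= 65535:
        raise ValueError("port number out of range")
    return "low" if port <= 1023 else "high"
```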

   Binning other flow key fields to a single bin is equivalent to black-
   marker anonymization.  Removal of other flow key information is only
   recommended for analysis tasks which have no need to differentiate
   flows on the removed keys, for example for total traffic counts or
   unique counts of other flow keys.

4.5.2.  Permutation

   Permutation is a direct substitution technique, replacing each value
   with a value selected from the set of possible values, such that each
   anonymized value represents a unique original value.  This is used to
   preserve the count of unique values without preserving information
   about, or the ordering of, the values themselves.

   While permutation ideally guarantees that each anonymized value
   represents a unique original value, such may require significant
   state in the Intermediate Anonymization Process.  Therefore,
   permutation may be implemented by hashing for performance reasons,
   with hash functions that may have relatively small collision
   probabilities.  Such techniques are still essentially direct
   substitution techniques, despite the nonzero error probability.

5.  Parameters for the Description of Anonymization Techniques

   This section details the abstract parameters used to describe the
   anonymization techniques examined in the previous section, on a per-
   parameter basis.  These parameters and their export safety inform the
   design of the IPFIX anonymization metadata export specified in the
   following section.

5.1.  Stability

   A stable anonymization will always map a given value in the real
   space to a given value in the anonymized space, while an unstable
   anonymization will change this mapping over time; a completely
   unstable anonymization is essentially indistinguishable from black-
   marker anonymization.  Any given anonymization technique may be
   applied with a varying range of stability.  Stability is important
   for assessing the comparability of anonymized information in
   different data sets, or in the same data set over different time
   periods.  In practice, an anonymization may also be stable for every
   data set published by a particular producer to a particular
   consumer, stable for a stated time period within a dataset or across
   datasets, or stable only for a single data set.

   If no information about stability is available, users of anonymized
   data MAY assume that the techniques used are stable across the entire
   dataset, but unstable across datasets.  Note that stability presents
   a risk-utility tradeoff, as completely stable anonymization can be
   used for longer-term trend analysis tasks but also presents more risk
   of attack given the stable mapping.  Information about the stability
   of a mapping SHOULD be exported along with the anonymized data.

5.2.  Truncation Length

   Truncation and precision degradation are described by the truncation
   length, or the amount of data still remaining in the anonymized field
   after anonymization.

   Truncation length can generally be inferred from a given data set,
   and need not be specially exported or protected.  For bit-level
   truncation, the truncated bits are generally inferable by the least
   significant bit set for an instance of an Information Element
   described by a given Template (or the most significant bit set, in
   the case of reverse truncation).  For precision degradation, the
   truncation is inferable from the maximum precision given.  Note that
   while this inference method is generally applicable, it is data-
   dependent: there is no guarantee that it will recover the exact
   truncation length used to prepare the data.
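
   As a non-normative illustration, the inference described above for
   bit-level truncation can be sketched in Python as follows.  The
   function name and the representation of field values as integers are
   assumptions of this example.  The result is an upper bound on the
   number of truncated bits, since real low-order bits may happen to be
   zero in every observed value.

```python
def infer_truncated_bits(values, width: int) -> int:
    """Count the low-order bits that are zero in every observed value."""
    combined = 0
    for value in values:
        combined |= value
    if combined == 0:
        return width  # no bit is ever set in the observed data
    # Position of the least significant bit set in any value.
    return (combined & -combined).bit_length() - 1
```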

   In the special case of IP address export with variable (per-record)
   truncation, the truncation MAY be expressed by exporting the prefix
   length alongside the address.

5.3.  Bin Map

   Binning is described by the specification of a bin mapping function.
   This function can be generally expressed in terms of an associative
   array that maps each point in the original space to a bin, although
   from an implementation standpoint most bin functions are much simpler
   and more efficient.

   Since the bin map for a bin mapping function is in essence the bin
   mapping key, and can be used to partially deanonymize binned data,
   depending on the degree of generalization, information about the bin
   mapping function SHOULD NOT be exported.

5.4.  Permutation

   Like binning, permutation is described by the specification of a
   permutation function.  In the general case, this can be expressed in
   terms of an associative array that maps each point in the original
   space to a point in the anonymized space.  Unlike binning, each point
   in the anonymized space corresponds to a single, unique point in the
   original space.

   Since the parameters of the permutation function are in essence key-
   like (indeed, for cryptographic permutation functions, they are the
   keys themselves), information about the permutation function or its
   parameters SHOULD NOT be exported.

5.5.  Shift Amount

   Shifting requires an amount to shift each value by.  Since the shift
   amount is the only key to a shift function, and can be used to
   trivially deanonymize data protected by shifting, information about
   the shift amount SHOULD NOT be exported.

6.  Anonymization Export Support in IPFIX

   Anonymized data exported via IPFIX SHOULD be annotated with
   anonymization metadata, which details which fields described by which
   Templates are anonymized, and provides appropriate information on the
   anonymization techniques used.  This metadata SHOULD be exported in
   Data Records described by the recommended Options Templates described
   in this section; these Options Templates use the additional
   Information Elements described in the following subsection.

   Note that fields anonymized using the black-marker (removal)
   technique do not require any special metadata support: black-marker
   anonymized fields SHOULD NOT be exported at all, by omitting the
   corresponding Information Elements from the Template describing the
   Data Set.  In the case where application requirements dictate that a
   black-marker anonymized field must remain in a Template, an Exporting
   Process MAY export black-marker anonymized fields with their native
   length as all-zeros, but only in cases where enough contextual
   information exists within the record to differentiate a black-marker
   anonymized field exported in this way from a real zero value.

6.1.  Anonymization Records and the Anonymization Options Template

   The Anonymization Options Template describes Anonymization Records,
   which allow anonymization metadata to be exported inline over IPFIX
   or stored in an IPFIX File, by binding information about
   anonymization techniques to Information Elements within defined
   Templates or Options Templates.  IPFIX Exporting Processes SHOULD
   export anonymization records for any Template describing exported
   anonymized Data Records; IPFIX Collecting Processes and processes
   downstream from them MAY use anonymization records to treat
   anonymized data differently depending on the applied technique.

   Anonymization Records contain ancillary information bound to a
   Template, so many of the considerations for Templates apply to
   Anonymization Records as well.  First, reliability is important: an
   Exporting Process SHOULD export Anonymization Records after the
   Templates they describe have been exported, and SHOULD export
   anonymization records reliably if supported by the underlying
   transport (i.e., without partial reliability when using SCTP).

   Anonymization Records MUST be handled by Collecting Processes as
   scoped to the Template to which they apply within the Transport
   Session in which they are sent.  When a Template is withdrawn via a
   Template Withdrawal Message or expires during a UDP transport
   session, the accompanying Anonymization Records are withdrawn or
   expire as well, and do not apply to subsequent Templates with the
   same Template ID within the Session unless re-exported.

   The Stability Class within the anonymizationFlags IE can be used to
   declare that a given anonymization technique's mapping will remain
   stable across multiple sessions, but this does not mean that
   anonymization technique information given in the Anonymization
   Records themselves persists across Sessions.  Each new Transport
   Session MUST contain new Anonymization Records for each Template
   describing anonymized Data Sets.

   SCTP per-stream export [I-D.ietf-ipfix-export-per-sctp-stream] may be
   used to ease management of Anonymization Records if appropriate for
   the application.

   The fields of the Anonymization Options template are as follows:

   +-------------------------+-----------------------------------------+
   | IE                      | Description                             |
   +-------------------------+-----------------------------------------+
   | templateId [scope]      | The Template ID of the Template or      |
   |                         | Options Template containing the         |
   |                         | Information Element described by this   |
   |                         | anonymization record.  This Information |
   |                         | Element MUST be defined as a Scope      |
   |                         | Field.                                  |
   | informationElementId    | The Information Element identifier of   |
   | [scope]                 | the Information Element described by    |
   |                         | this anonymization record.  This        |
   |                         | Information Element MUST be defined as  |
   |                         | a Scope Field.  Exporting Processes     |
   |                         | MUST clear the Enterprise bit of the    |
   |                         | informationElementId and Collecting     |
   |                         | Processes SHOULD ignore it; information |
   |                         | about enterprise-specific Information   |
   |                         | Elements is exported via the            |
   |                         | privateEnterpriseNumber Information     |
   |                         | Element.                                |
   | privateEnterpriseNumber | The Private Enterprise Number of the    |
   | [scope] [optional]      | enterprise-specific Information Element |
   |                         | described by this anonymization record. |
   |                         | This Information Element MUST be        |
   |                         | defined as a Scope Field if present.  A |
   |                         | privateEnterpriseNumber of 0 signifies  |
   |                         | that the Information Element is         |
   |                         | IANA-registered.                        |
   | informationElementIndex | The Information Element index of the    |
   | [scope] [optional]      | instance of the Information Element     |
   |                         | described by this anonymization record  |
   |                         | identified by the informationElementId  |
   |                         | within the Template.  Optional; need    |
   |                         | only be present when describing         |
   |                         | Templates that have multiple instances  |
   |                         | of the same Information Element.  This  |
   |                         | Information Element MUST be defined as  |
   |                         | a Scope Field if present.  This         |
   |                         | Information Element is defined in       |
   |                         | Section 6.2, below.                     |
   | anonymizationFlags      | Flags describing the mapping stability  |
   |                         | and specialized modifications to the    |
   |                         | Anonymization Technique in use.  SHOULD |
   |                         | be present.  This Information Element   |
   |                         | is defined in Section 6.2.3, below.     |
   | anonymizationTechnique  | The technique used to anonymize the     |
   |                         | data.  MUST be present.  This           |
   |                         | Information Element is defined in       |
   |                         | Section 6.2.2, below.                   |
   +-------------------------+-----------------------------------------+

6.2.  Recommended Information Elements for Anonymization Metadata

6.2.1.  informationElementIndex

   Description:   A zero-based index of an Information Element
      referenced by informationElementId within a Template referenced by
      templateId; used to disambiguate scope for templates containing
      multiple identical Information Elements.

   Abstract Data Type:   unsigned16

   Data Type Semantics:   identifier

   ElementId:   TBD3

   Status:   Current

6.2.2.  anonymizationTechnique

   Description:   A description of the anonymization technique applied
      to a referenced Information Element within a referenced Template.
      Each technique may be applicable only to certain Information
      Elements and recommended only for certain Information Elements;
      these restrictions are noted in the table below.

   +-------+---------------------------+-----------------+-------------+
   | Value | Description               | Applicable to   | Recommended |
   |       |                           |                 | for         |
   +-------+---------------------------+-----------------+-------------+
   | 0     | Undefined: the Exporting  | all             | all         |
   |       | Process makes no          |                 |             |
   |       | representation as to      |                 |             |
   |       | whether the defined field |                 |             |
   |       | is anonymized or not.     |                 |             |
   |       | While the Collecting      |                 |             |
   |       | Process MAY assume that   |                 |             |
   |       | the field is not          |                 |             |
   |       | anonymized, it is not     |                 |             |
   |       | guaranteed not to be.     |                 |             |
   |       | This is the default       |                 |             |
   |       | anonymization technique.  |                 |             |
   | 1     | None: the values exported | all             | all         |
   |       | are real.                 |                 |             |
   | 2     | Precision                 | all             | all         |
   |       | Degradation/Truncation:   |                 |             |
   |       | the values exported are   |                 |             |
   |       | anonymized using simple   |                 |             |
   |       | precision degradation or  |                 |             |
   |       | truncation.  The new      |                 |             |
   |       | precision or number of    |                 |             |
   |       | truncated bits is         |                 |             |
   |       | implicit in the exported  |                 |             |
   |       | data, and can be deduced  |                 |             |
   |       | by the Collecting         |                 |             |
   |       | Process.                  |                 |             |
   | 3     | Binning: the values       | all             | all         |
   |       | exported are anonymized   |                 |             |
   |       | into bins.                |                 |             |
   | 4     | Enumeration: the values   | all             | timestamps  |
   |       | exported are anonymized   |                 |             |
   |       | by enumeration.           |                 |             |
   | 5     | Permutation: the values   | all             | identifiers |
   |       | exported are anonymized   |                 |             |
   |       | by permutation.           |                 |             |
   | 6     | Structured Permutation:   | addresses       |             |
   |       | the values exported are   |                 |             |
   |       | anonymized by             |                 |             |
   |       | permutation, preserving   |                 |             |
   |       | bit-level structure as    |                 |             |
   |       | appropriate; this         |                 |             |
   |       | represents                |                 |             |
   |       | prefix-preserving IP      |                 |             |
   |       | address anonymization or  |                 |             |
   |       | structured MAC address    |                 |             |
   |       | anonymization.            |                 |             |
   | 7     | Reverse Truncation: the   | addresses       |             |
   |       | values exported are       |                 |             |
   |       | anonymized using reverse  |                 |             |
   |       | truncation.  The number   |                 |             |
   |       | of truncated bits is      |                 |             |
   |       | implicit in the exported  |                 |             |
   |       | data, and can be deduced  |                 |             |
   |       | by the Collecting         |                 |             |
   |       | Process.                  |                 |             |
   | 8     | Noise: the values         | non-identifiers | counters    |
   |       | exported are anonymized   |                 |             |
   |       | by adding random noise to |                 |             |
   |       | each value.               |                 |             |
   | 9     | Offset: the values        | all             | timestamps  |
   |       | exported are anonymized   |                 |             |
   |       | by adding a single offset |                 |             |
   |       | to all values.            |                 |             |
   +-------+---------------------------+-----------------+-------------+

   Abstract Data Type:   unsigned16

   Data Type Semantics:   identifier

   ElementId:   TBD2

   Status:   Current

6.2.3.  anonymizationFlags

   Description:   A flag word describing specialized modifications to
      the anonymization policy in effect for the anonymization technique
      applied to a referenced Information Element within a referenced
      Template.  When flags are clear (0), the normal policy (as
      described by anonymizationTechnique) applies without modification.

      MSB   14  13  12  11  10   9   8   7   6   5   4   3   2   1  LSB
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
      |                Reserved                       |LOR|PmA|   SC  |
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
                            anonymizationFlags IE

   +--------+----------+-----------------------------------------------+
   | bit(s) | name     | description                                   |
   | (LSB = |          |                                               |
   | 0)     |          |                                               |
   +--------+----------+-----------------------------------------------+
   | 0-1    | SC       | Stability Class: see the Stability Class      |
   |        |          | table below, and Section 5.1.                 |
   | 2      | PmA      | Perimeter Anonymization: when set (1),        |
   |        |          | source- Information Elements as described in  |
   |        |          | [RFC5103] are interpreted as external         |
   |        |          | addresses, and destination- Information       |
   |        |          | Elements as described in [RFC5103] are        |
   |        |          | interpreted as internal addresses, for the    |
   |        |          | purposes of associating                       |
   |        |          | anonymizationTechnique to Information         |
   |        |          | Elements only; see Section 7.2.2 for details. |
   |        |          | This bit MUST NOT be set when associated with |
   |        |          | a non-endpoint (i.e., neither source- nor     |
   |        |          | destination-) Information Element.  It SHOULD |
   |        |          | be consistent within a record (i.e., if a     |
   |        |          | source- Information Element has this flag     |
   |        |          | set, the corresponding destination- element   |
   |        |          | SHOULD have this flag set, and vice-versa.)   |
   | 3      | LOR      | Low-Order Unchanged: when set (1), the        |
   |        |          | low-order bits of the anonymized Information  |
   |        |          | Element contain real data.  This modification |
   |        |          | is intended for the anonymization of          |
   |        |          | network-level addresses while leaving         |
   |        |          | host-level addresses intact in order to       |
   |        |          | preserve host-level structure, which could    |
   |        |          | otherwise be used to reverse anonymization.   |
   |        |          | MUST NOT be set when associated with a        |
   |        |          | truncation-based anonymizationTechnique.      |
   | 4-15   | Reserved | Reserved for future use: SHOULD be cleared    |
   |        |          | (0) by the Exporting Process and MUST be      |
   |        |          | ignored by the Collecting Process.            |
   +--------+----------+-----------------------------------------------+

      The Stability Class portion of this flags word describes the
      stability class of the anonymization technique applied to a
      referenced Information Element within a referenced Template.
      Stability classes refer to the stability of the parameters of the
      anonymization technique, and therefore the comparability of the
      mapping between the real and anonymized values over time.  This
      determines which anonymized datasets may be compared with each
      other.  Values are as follows:

   +-----+-----+-------------------------------------------------------+
   | Bit | Bit | Description                                           |
   | 1   | 0   |                                                       |
   +-----+-----+-------------------------------------------------------+
   | 0   | 0   | Undefined: the Exporting Process makes no             |
   |     |     | representation as to how stable the mapping is, or    |
   |     |     | over what time period values of this field will       |
   |     |     | remain comparable; while the Collecting Process MAY   |
   |     |     | assume Session level stability, Session level         |
   |     |     | stability is not guaranteed.  Processes SHOULD assume |
   |     |     | this is the case in the absence of stability class    |
   |     |     | information; this is the default stability class.     |
   | 0   | 1   | Session: the Exporting Process will ensure that the   |
   |     |     | parameters of the anonymization technique are stable  |
   |     |     | during the Transport Session.  All the values of the  |
   |     |     | described Information Element for each Record         |
   |     |     | described by the referenced Template within the       |
   |     |     | Transport Session are comparable.  The Exporting      |
   |     |     | Process SHOULD endeavour to ensure at least this      |
   |     |     | stability class.                                      |
   | 1   | 0   | Exporter-Collector Pair: the Exporting Process will   |
   |     |     | ensure that the parameters of the anonymization       |
   |     |     | technique are stable across Transport Sessions over   |
   |     |     | time with the given Collecting Process, but may use   |
   |     |     | different parameters for different Collecting         |
   |     |     | Processes.  Data exported to different Collecting     |
   |     |     | Processes are not comparable.                         |
   | 1   | 1   | Stable: the Exporting Process will ensure that the    |
   |     |     | parameters of the anonymization technique are stable  |
   |     |     | across Transport Sessions over time, regardless of    |
   |     |     | the Collecting Process to which it is sent.           |
   +-----+-----+-------------------------------------------------------+

   Abstract Data Type:   unsigned16

   Data Type Semantics:   flags

   ElementId:   TBD1

   Status:   Current
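
   As an informal illustration of the bit layout described above, the
   following Python sketch decodes and encodes an anonymizationFlags
   word.  The names AnonFlags, StabilityClass, decode_flags, and
   encode_flags are hypothetical and are not defined by this document.
   The sketch assumes that reserved bits are ignored on decode and
   cleared on encode, following the Reserved row of the table.

```python
from enum import IntEnum
from typing import NamedTuple

SC_MASK = 0x0003   # bits 0-1: Stability Class
PMA_BIT = 0x0004   # bit 2: Perimeter Anonymization
LOR_BIT = 0x0008   # bit 3: Low-Order Unchanged


class StabilityClass(IntEnum):
    UNDEFINED = 0b00
    SESSION = 0b01
    EXPORTER_COLLECTOR_PAIR = 0b10
    STABLE = 0b11


class AnonFlags(NamedTuple):
    """Hypothetical container for the decoded fields."""
    stability: StabilityClass
    perimeter: bool
    low_order_unchanged: bool


def decode_flags(word: int) -> AnonFlags:
    """Split a 16-bit flags word into its fields; bits 4-15 are ignored."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("anonymizationFlags is an unsigned16 value")
    return AnonFlags(
        stability=StabilityClass(word & SC_MASK),
        perimeter=bool(word & PMA_BIT),
        low_order_unchanged=bool(word & LOR_BIT),
    )


def encode_flags(flags: AnonFlags) -> int:
    """Build a flags word with the reserved bits cleared (0)."""
    word = int(flags.stability) & SC_MASK
    if flags.perimeter:
        word |= PMA_BIT
    if flags.low_order_unchanged:
        word |= LOR_BIT
    return word
```

   For instance, decode_flags(0x0004) yields an undefined stability
   class with only the Perimeter Anonymization bit set.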

7.  Applying Anonymization Techniques to IPFIX Export and Storage

   When exporting or storing anonymized flow data using IPFIX, certain
   interactions between the IPFIX Protocol and the anonymization
   techniques in use must be considered; these are treated in the
   subsections below.

7.1.  Arrangement of Processes in IPFIX Anonymization

   Anonymization may be applied to IPFIX data at three stages within the
   collection infrastructure: on initial export, at a mediator, or after
   collection, as shown in Figure 1.  Each of these locations has
   specific considerations and applicability.

               +==========================================+
               | Exporting Process                        |
               +==========================================+
                 |                                      |
                 |    (Anonymized at Original Exporter) |
                 V                                      |
               +=============================+          |
               | Mediator                    |          |
               +=============================+          |
                 |                                      |
                 | (Anonymizing Mediator)               |
                 V                                      V
               +==========================================+
               | Collecting Process                       |
               +==========================================+
                       |
                       | (Anonymizing CP/File Writer)
                       V
               +--------------------+
               | IPFIX File Storage |
               +--------------------+

                Figure 1: Potential Anonymization Locations

   Anonymization is generally performed before the wider dissemination
   or repurposing of a flow data set, e.g., adapting operational
   measurement data for research.  Therefore, direct anonymization of
   flow data on initial export is only applicable in certain restricted
   circumstances: when the Exporting Process (EP) is "publishing" data
   to a Collecting Process (CP) directly, and the Exporting Process and
   Collecting Process are operated by different entities.  Note that
   certain guidelines in Section 7.2.3 with respect to timestamp
   anonymization may not apply in this case, as the Collecting Process
   may be able to deduce certain timing information from the time at
   which each Message is received.

   A much more flexible arrangement is to anonymize data within a
   Mediator [I-D.ietf-ipfix-mediators-framework].  Here, original data
   is sent to a Mediator, which performs the anonymization function and
   re-exports the anonymized data.  Such a Mediator could be located at
   the administrative domain boundary of the initial Exporting Process
   operator, exporting anonymized data to other consumers outside the
   organization.  In this case, the original Exporter SHOULD use TLS
   [RFC5246] as specified in [RFC5101] to secure the channel to the
   Mediator, and the Mediator should follow the guidelines in
   Section 7.2, to mitigate the risk of original data disclosure.

   When data is to be published as an anonymized data set in an IPFIX
   File [RFC5655], the anonymization may be done at the final Collecting
   Process before storage and dissemination, as well.  In this case, the
   Collector should follow the guidelines in Section 7.2, especially as
   regards File-specific Options in Section 7.2.4.

   In each of these data flows, the anonymization of records is
   undertaken by an Intermediate Anonymization Process (IAP); the data
   flows into and out of this IAP are shown in Figure 2 below.

   packets --+                     +- IPFIX Messages -+
             |                     |                  |
             V                     V                  V
   +==================+ +====================+ +=============+
   | Metering Process | | Collecting Process | | File Reader |
   +==================+ +====================+ +=============+
             |      Non-anonymized | Records          |
             V                     V                  V
   +=========================================================+
   |          Intermediate Anonymization Process (IAP)       |
   +=========================================================+
             | Anonymized     ^            Anonymized |
             | Records        |               Records |
             V                |                       V
   +===================+    Anonymization      +=============+
   | Exporting Process |<--- Parameters ------>| File Writer |
   +===================+                       +=============+
             |                                        |
             +------------> IPFIX Messages <----------+

          Figure 2: Data flows through the anonymization process

   Anonymization parameters must also be available to the Exporting
   Process and/or File Writer in order to ensure header data is also
   appropriately anonymized as in Section 7.2.3.

   Following each of the data flows through the IAP, we describe five
   basic types of anonymization arrangements within this framework in
   Figure 3.  In addition to the three arrangements described in detail
   above, anonymization can also be done at a collocated Metering
   Process (MP) and File Writer (FW) (see section 7.3.2 of [RFC5655]),
   or at a file manipulator, which combines a File Writer with a File
   Reader (FR) (see section 7.3.7 of [RFC5655]).

         +----+  +-----+  +----+
 pkts -> | MP |->| IAP |->| EP |-> Anonymization on Original Exporter
         +----+  +-----+  +----+
         +----+  +-----+  +----+
 pkts -> | MP |->| IAP |->| FW |-> Anonymizing collocated MP/File Writer
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | CP |->| IAP |->| EP |-> Anonymizing Mediator (Masq. Proxy)
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | CP |->| IAP |->| FW |-> Anonymizing collocated CP/File Writer
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | FR |->| IAP |->| FW |-> Anonymizing file manipulator
 File    +----+  +-----+  +----+

        Figure 3: Possible anonymization arrangements in the IPFIX
                               architecture

   Note that anonymization may occur at more than one location within a
   given collection infrastructure, to provide varying levels of
   anonymization, disclosure risk, or data utility for specific
   purposes.

7.2.  IPFIX-Specific Anonymization Guidelines

   In implementing and deploying the anonymization techniques described
   in this document, implementors should note that IPFIX already
   provides features that support anonymized data export, and use these
   where appropriate.  Care must also be taken that data structures
   supporting the operation of the protocol itself do not leak data that
   could be used to reverse the anonymization applied to the flow data.
   Such data structures may appear in the header, or within the data
   stream itself, especially as options data.  Each of these and their
   impact on specific anonymization techniques is noted in a separate
   subsection below.

7.2.1.  Appropriate Use of Information Elements for Anonymized Data

   Note, as in Section 6 above, that black-marker anonymized fields
   SHOULD NOT be exported at all; the absence of the field in a given
   Data Set is implicitly declared by not including the corresponding
   Information Element in the Template describing that Data Set.

   When using precision degradation of timestamps, Exporting Processes
   SHOULD export timing information using Information Elements of an
   appropriate precision, as explained in Section 4.5 of [RFC5153].  For
   example, timestamps measured in millisecond-level precision and
   degraded to second-level precision should use flowStartSeconds and
   flowEndSeconds, not flowStartMilliseconds and flowEndMilliseconds.
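
   The timestamp example above can be sketched in Python as follows.
   This is a minimal illustration, assuming a flow record is held as a
   dictionary keyed by Information Element name; the function name
   degrade_record_to_seconds is hypothetical.  The degraded record
   carries flowStartSeconds and flowEndSeconds only, so the exported
   Information Elements match the remaining precision.

```python
from typing import Dict

MS_FIELDS = ("flowStartMilliseconds", "flowEndMilliseconds")


def degrade_record_to_seconds(record: Dict[str, int]) -> Dict[str, int]:
    """Return a copy of the record with timestamps truncated to seconds.

    The millisecond Information Elements are dropped and replaced by the
    second-precision ones, rather than exporting zero-padded milliseconds.
    """
    degraded = {k: v for k, v in record.items() if k not in MS_FIELDS}
    degraded["flowStartSeconds"] = record["flowStartMilliseconds"] // 1000
    degraded["flowEndSeconds"] = record["flowEndMilliseconds"] // 1000
    return degraded
```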

   When exporting anonymized data and anonymization metadata, Exporting
   Processes SHOULD ensure that each Information Element and its
   declared anonymization technique are compatible.  Specifically, the
   applicable and recommended Information Element types and semantics
   for each technique are noted in the description of the
   anonymizationTechnique Information Element in Section 6.2.2.  In this
   description, a timestamp is an Information Element with the data type
   dateTimeSeconds, dateTimeMilliseconds, dateTimeMicroseconds, or
   dateTimeNanoseconds; an address is an Information Element with the
   data type ipv4Address, ipv6Address, or macAddress; and an identifier
   is an Information Element with identifier data type semantics.
   Exporting Processes MUST NOT export Anonymization Options records
   binding techniques to Information Elements to which they are not
   applicable, and SHOULD NOT export Anonymization Options records
   binding techniques to Information Elements for which they are not
   recommended.
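
   The applicability rules above could be checked before export along
   the lines of the following Python sketch.  It is illustrative only:
   the names Technique, RULES, and check_binding are hypothetical, the
   caller is assumed to classify each Information Element into the
   kinds "timestamp", "address", "identifier", and "counter", and the
   empty "Recommended for" cells in the table in Section 6.2.2 are
   assumed here to mean the same as the "Applicable to" column.

```python
from enum import IntEnum
from typing import FrozenSet


class Technique(IntEnum):
    UNDEFINED = 0
    NONE = 1
    PRECISION_DEGRADATION = 2
    BINNING = 3
    ENUMERATION = 4
    PERMUTATION = 5
    STRUCTURED_PERMUTATION = 6
    REVERSE_TRUNCATION = 7
    NOISE = 8
    OFFSET = 9


# technique -> (applicable to, recommended for), transcribed from the table
RULES = {
    Technique.UNDEFINED: ("all", "all"),
    Technique.NONE: ("all", "all"),
    Technique.PRECISION_DEGRADATION: ("all", "all"),
    Technique.BINNING: ("all", "all"),
    Technique.ENUMERATION: ("all", "timestamp"),
    Technique.PERMUTATION: ("all", "identifier"),
    Technique.STRUCTURED_PERMUTATION: ("address", "address"),  # assumption
    Technique.REVERSE_TRUNCATION: ("address", "address"),      # assumption
    Technique.NOISE: ("non-identifier", "counter"),
    Technique.OFFSET: ("all", "timestamp"),
}


def _matches(rule: str, kinds: FrozenSet[str]) -> bool:
    if rule == "all":
        return True
    if rule == "non-identifier":
        return "identifier" not in kinds
    return rule in kinds


def check_binding(technique: Technique, kinds: FrozenSet[str]) -> str:
    """Classify a (technique, Information Element kinds) binding."""
    applicable, recommended = RULES[technique]
    if not _matches(applicable, kinds):
        return "not applicable"    # MUST NOT be exported
    if not _matches(recommended, kinds):
        return "not recommended"   # SHOULD NOT be exported
    return "ok"
```

   For example, binding Offset to an address Information Element is
   reported as "not recommended", while binding Noise to an identifier
   is reported as "not applicable".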

7.2.2.  Export of Perimeter-Based Anonymization Policies

   Data collected from a single network may require different
   anonymization policies for addresses internal and external to the
   network.  For example, internal addresses could be subject to simple
   permutation, while external addresses could be aggregated into
   networks by truncation.  When exporting anonymized perimeter
   bidirectional flow (biflow) data as in section 5.2 of [RFC5103], this
   arrangement may be easily represented by specifying one technique for
   source endpoint information (which represents the external endpoint
   in a perimeter biflow) and one technique for destination endpoint
   information (which represents the internal address in a perimeter
   biflow).

   However, it can also be useful to represent perimeter-based
   anonymization policies with unidirectional flow (uniflow) or
   non-perimeter biflow data.  In this case, the Perimeter Anonymization
   bit (bit 2) in the anonymizationFlags Information Element describing
   the anonymized address Information Elements can be set to change the
   meaning of "source" and "destination" of Information Elements to mean
   "external" and "internal" as with perimeter biflows, but only with
   respect to anonymization policies.

7.2.3.  Anonymization of Header Data

   Each IPFIX Message contains a Message Header; within this Message
   Header are contained two fields which may be used to break certain
   anonymization techniques: the Export Time, and the Observation Domain
   ID.

   When exporting IPFIX Messages containing anonymized timestamp data
   where the original Export Time Message header has some relationship
   to the anonymized timestamps, the Export Time header field SHOULD be
   anonymized so that the Export Time is consistent with the anonymized
   timestamp data.  Otherwise, relationships between export and flow
   time could be used to partially or totally reverse timestamp
   anonymization.  When anonymizing timestamps and the Export Time
   header field, implementations SHOULD avoid times too far in the past
   or future; while [RFC5101] does not make any allowance for Export
   Time error detection, it is sensible that Collecting Processes may
   interpret Messages with seemingly nonsensical Export Times as
   erroneous.  Specific limits are implementation-dependent, but this
   issue may cause interoperability issues when anonymizing the Export
   Time header field.
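
   As a hedged illustration of keeping the Export Time consistent, the
   Python sketch below assumes that flow timestamps were anonymized with
   the Offset technique using a single offset in seconds, and applies
   the same offset to the Export Time field of the 16-byte IPFIX Message
   Header defined in [RFC5101].  The function name shift_export_time is
   hypothetical, and other anonymization techniques would need a
   different treatment.

```python
import struct

# IPFIX Message Header: Version, Length, Export Time, Sequence Number,
# Observation Domain ID (network byte order, 16 octets in total).
HEADER = struct.Struct("!HHIII")


def shift_export_time(message: bytes, offset_seconds: int) -> bytes:
    """Return the Message with Export Time shifted by offset_seconds."""
    version, length, export_time, sequence, domain_id = HEADER.unpack_from(message)
    if version != 10:
        raise ValueError("not an IPFIX Message (version must be 10)")
    shifted = export_time + offset_seconds
    if not 0 <= shifted <= 0xFFFFFFFF:
        raise ValueError("shifted Export Time does not fit in 32 bits")
    new_header = HEADER.pack(version, length, shifted, sequence, domain_id)
    return new_header + message[HEADER.size:]
```

   A caller would pass the same offset that was added to the flow
   timestamps, so that the difference between Export Time and flow time
   is unchanged and reveals nothing about the offset.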

   The similarity in size between an Observation Domain ID and an IPv4
   address (32 bits) may lead to a temptation to use an IPv4 interface
   address on the Metering or Exporting Process as the Observation
   Domain ID.  If this address bears some relation to the IP addresses
   in the flow data (e.g., shares a network prefix with internal
   addresses) and the IP addresses in the flow data are anonymized in a
   structure-preserving way, then the Observation Domain ID may be used
   to break the IP address anonymization.  Use of an IPv4 interface
   address on the Metering or Exporting Process as the Observation
   Domain ID is NOT RECOMMENDED in this case.

7.2.4.  Anonymization of Options Data

   IPFIX uses the Options mechanism to export, among other things,
   metadata about exported flows and the flow collection infrastructure.
   As with the IPFIX Message Header, certain Options recommended in
   [RFC5101] and [RFC5655] containing flow timestamps and network
   addresses of Exporting and Collecting Processes may be used to break
   certain anonymization techniques.  When using these Options along
   with anonymized data export and storage, values within the Options
   which could be used to break the anonymization SHOULD themselves be
   anonymized or omitted.

   The Exporting Process Reliability Statistics Options Template,
   recommended in [RFC5101], contains an Exporting Process ID field,
   which may be an exportingProcessIPv4Address Information Element or an
   exportingProcessIPv6Address Information Element.  If the Exporting
   Process address bears some relation to the IP addresses in the flow
   data (e.g., shares a network prefix with internal addresses) and the
   IP addresses in the flow data are anonymized in a structure-
   preserving way, then the Exporting Process address may be used to
   break the IP address anonymization.  Exporting Processes exporting
   anonymized data in this situation SHOULD mitigate the risk of attack
   either by omitting Options described by the Exporting Process
   Reliability Statistics Options Template, or by anonymizing the
   Exporting Process address using a similar technique to that used to
   anonymize the IP addresses in the exported data.

   Similarly, the Export Session Details Options Template and Message
   Details Options Template specified for the IPFIX File Format
   [RFC5655] may contain the exportingProcessIPv4Address Information
   Element or the exportingProcessIPv6Address Information Element to
   identify an Exporting Process from which a flow record was received,
   and the collectingProcessIPv4Address Information Element or the
   collectingProcessIPv6Address Information Element to identify the
   Collecting Process which received it.  If the Exporting Process or
   Collecting Process address bears some relation to the IP addresses in
   the data set (e.g., shares a network prefix with internal addresses)
   and the IP addresses in the data set are anonymized in a structure-
   preserving way, then the Exporting Process or Collecting Process
   address may be used to break the IP address anonymization.  Since
   these Options Templates are primarily intended for storing IPFIX
   Transport Session data for auditing, replay, and testing purposes, it
   is NOT RECOMMENDED that storage of anonymized data include these
   Options Templates; omitting them mitigates the risk of attack.

   The Message Details Options Template specified for the IPFIX File
   Format [RFC5655] also contains the collectionTimeMilliseconds
   Information Element.  As with the Export Time Message Header field,
   if the exported data set contains anonymized timestamp information,
   and the collectionTimeMilliseconds Information Element in a given
   Message has some relationship to the anonymized timestamp
   information, then this relationship can be exploited to reverse the
   timestamp anonymization.  Since this Options Template is primarily
   intended for storing IPFIX Transport Session data for auditing,
   replay, and testing purposes, it is NOT RECOMMENDED that storage of
   anonymized data include this Options Template; omitting it mitigates
   the risk of attack.

   Since the Time Window Options Template specified for the IPFIX File
   Format [RFC5655] refers to the timestamps within the data set to
   provide partial table of contents information for an IPFIX File,
   Options described by this template SHOULD be written using the
   anonymized timestamps instead of the original ones.

7.2.5.  Special-Use Address Space Considerations

   When anonymizing data containing IP addresses for transport or
   storage using IPFIX, and the analysis purpose permits doing so, it is
   RECOMMENDED to filter out or leave unanonymized data containing the
   special-use IPv4 addresses enumerated in [RFC5735] or the special-use
   IPv6 addresses enumerated in [RFC5156].  Data containing these
   addresses (e.g., 0.0.0.0 and 169.254.0.0/16 for link-local
   autoconfiguration in IPv4 space) are often associated with specific,
   well-known behavioral patterns.  Detection of these patterns in
   anonymized data can lead to deanonymization of these special-use
   addresses, which increases the chance of a complete reversal of
   anonymization by an attacker, especially of prefix-preserving
   techniques.
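
   The filtering step recommended above might be sketched in Python as
   follows.  The prefix list is a deliberately partial subset of the
   special-use IPv4 blocks and is not the complete list from [RFC5735];
   a real deployment would need the full enumeration and the IPv6
   equivalents from [RFC5156].  The names SPECIAL_USE_V4,
   is_special_use_v4, and partition_flows are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Tuple

# Partial, illustrative subset of special-use IPv4 blocks (assumption:
# the deployment extends this to the complete list it needs).
SPECIAL_USE_V4 = tuple(
    ipaddress.ip_network(prefix)
    for prefix in (
        "0.0.0.0/8",           # "this" network
        "127.0.0.0/8",         # loopback
        "169.254.0.0/16",      # link local
        "224.0.0.0/4",         # multicast
        "255.255.255.255/32",  # limited broadcast
    )
)

Flow = Tuple[str, str]  # (source address, destination address)


def is_special_use_v4(address: str) -> bool:
    """True if the IPv4 address falls in one of the listed blocks."""
    ip = ipaddress.IPv4Address(address)
    return any(ip in network for network in SPECIAL_USE_V4)


def partition_flows(flows: Iterable[Flow]) -> Tuple[List[Flow], List[Flow]]:
    """Split flows into (to be anonymized, special-use).

    The second list can then be dropped or exported unanonymized,
    depending on what the analysis purpose permits.
    """
    to_anonymize: List[Flow] = []
    special: List[Flow] = []
    for src, dst in flows:
        if is_special_use_v4(src) or is_special_use_v4(dst):
            special.append((src, dst))
        else:
            to_anonymize.append((src, dst))
    return to_anonymize, special
```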

7.2.6.  Protecting Out-of-Band Configuration and Management Data

   Special care should be taken when exporting or sharing anonymized
   data to avoid information leakage via the configuration or management
   planes of the IPFIX Device containing the Exporting Process or the
   File Writer.  For example, adding noise to counters is useless if the
   receiver can deduce the values in the counters from SNMP information,
   and concealing the network under test is similarly useless if such
   information is available in a configuration document.  As the
   specifics of these concerns are largely implementation- and
   deployment-dependent, specific mitigation is out of scope for this
   draft.  The general ground rule is that information of similar type
   to that anonymized SHOULD NOT be made available to the receiver by
   any means, whether in the Data Records, in IPFIX protocol structures
   such as Message Headers, or out-of-band.

8.  Examples

   In this example, consider the export or storage of an anonymized IPv4
   data set from a single network described by a simple template
   containing a timestamp in seconds, a five-tuple, and packet and octet
   counters.  The template describing each record in this data set is
   shown in Figure 4.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Set ID = 2           |          Length =  40         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Template ID = 256        |        Field Count = 8        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| flowStartSeconds        150 |       Field Length =  4       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| sourceIPv4Address         8 |       Field Length =  4       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| destinationIPv4Address   12 |       Field Length =  4       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| sourceTransportPort       7 |       Field Length =  2       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| destinationTransportPort 11 |       Field Length =  2       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| packetDeltaCount          2 |       Field Length =  4       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| octetDeltaCount           1 |       Field Length =  4       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| protocolIdentifier        4 |       Field Length =  1       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 4: Example Flow Template

   Suppose that this data set is anonymized according to the following
   policy:

   o  IP addresses within the network are protected by reverse
      truncation.

   o  IP addresses outside the network are protected by prefix-
      preserving anonymization.

   o  Octet counts are exported using degraded precision in order to
      provide minimal protection against fingerprinting attacks.

   o  All other fields are exported unanonymized.

   In order to export anonymization records for this template and
   policy, first, the Anonymization Options Template shown in Figure 5
   is exported.  For this example, the optional privateEnterpriseNumber
   and informationElementIndex Information Elements are omitted, because
   they are not used.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Set ID = 3           |          Length =  26         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Template ID = 257        |        Field Count = 4        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Scope Field Count = 2      |0| templateID              145 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Field Length = 2        |0| informationElementId    303 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Field Length = 2        |0| anonymizationFlags     TBD1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Field Length = 2        |0| anonymizationTechnique TBD2 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Field Length = 2        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 5: Example Anonymization Options Template
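
   The Options Template Set in Figure 5 could be built along the
   following lines.  This is a sketch under stated assumptions: the
   function name is hypothetical, and because the Information Element
   numbers TBD1 and TBD2 are not yet assigned, the two constants below
   are arbitrary placeholders used only to make the example run.

```python
import struct

# Placeholders only: TBD1 and TBD2 are unassigned in this document.
PLACEHOLDER_TBD1 = 32001  # stands in for anonymizationFlags
PLACEHOLDER_TBD2 = 32002  # stands in for anonymizationTechnique


def encode_options_template_set(template_id, scope_fields, option_fields):
    """Encode one Options Template Record inside a Set with Set ID 3."""
    fields = list(scope_fields) + list(option_fields)
    record = struct.pack(
        "!HHH", template_id, len(fields), len(scope_fields)
    )
    for ie_id, length in fields:
        record += struct.pack("!HH", ie_id, length)
    # No padding is appended after the record in this sketch.
    return struct.pack("!HH", 3, 4 + len(record)) + record


options_template_set = encode_options_template_set(
    257,
    scope_fields=[(145, 2), (303, 2)],  # templateId, informationElementId
    option_fields=[(PLACEHOLDER_TBD1, 2), (PLACEHOLDER_TBD2, 2)],
)
```

   Without padding, the encoded Set is 26 octets long: a 4-octet Set
   header, 6 octets of record header, and four 4-octet Field
   Specifiers.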

   Following the Anonymization Options Template comes a Data Set
   containing Anonymization Records.  This Data Set has an entry for
   each Information Element Specifier in Template 256 describing the
   flow records.  This Data Set is shown in Figure 6.  Note that
   sourceIPv4Address and destinationIPv4Address have the Perimeter
   Anonymization (0x0004) flag set in anonymizationFlags, meaning that
   the source address should be treated as network-external, and the
   destination address as network-internal.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Set ID = 257         |          Length =  68         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Template 256         | flowStartSeconds       IE 150 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | no flags               0x0000 | Not Anonymized              1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Template 256         | sourceIPv4Address        IE 8 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Perimeter, Session SC  0x0005 | Structured Permutation      6 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Template 256         | destinationIPv4Address  IE 12 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Perimeter, Stable      0x0007 | Reverse Truncation          7 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Template 256         | sourceTransportPort      IE 7 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | no flags               0x0000 | Not Anonymized              1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Template 256         | dest.TransportPort      IE 11 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | no flags               0x0000 | Not Anonymized              1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Template 256         | packetDeltaCount         IE 2 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | no flags               0x0000 | Not Anonymized              1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Template 256         | octetDeltaCount          IE 1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Stable                 0x0003 | Precision Degradation       2 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Template 256         | protocolIdentifier      IE 4  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | no flags               0x0000 | Not Anonymized              1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 6: Example Anonymization Records
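
   The Data Set in Figure 6 can be sketched as one 8-octet record per
   Information Element.  The constant and function names below are
   hypothetical; the numeric flag and technique values are copied from
   Figure 6 and the surrounding text.

```python
import struct

PERIMETER = 0x0004  # Perimeter Anonymization flag
SESSION = 0x0001    # stability values as they appear in Figure 6
STABLE = 0x0003

NOT_ANONYMIZED = 1
PRECISION_DEGRADATION = 2
STRUCTURED_PERMUTATION = 6
REVERSE_TRUNCATION = 7

# (templateId, informationElementId, flags, technique)
ANONYMIZATION_RECORDS = [
    (256, 150, 0, NOT_ANONYMIZED),
    (256, 8, PERIMETER | SESSION, STRUCTURED_PERMUTATION),
    (256, 12, PERIMETER | STABLE, REVERSE_TRUNCATION),
    (256, 7, 0, NOT_ANONYMIZED),
    (256, 11, 0, NOT_ANONYMIZED),
    (256, 2, 0, NOT_ANONYMIZED),
    (256, 1, STABLE, PRECISION_DEGRADATION),
    (256, 4, 0, NOT_ANONYMIZED),
]


def encode_anonymization_records(options_template_id, records):
    """Encode records as a Data Set whose Set ID is the Options Template ID."""
    body = b"".join(struct.pack("!HHHH", *record) for record in records)
    return struct.pack("!HH", options_template_id, 4 + len(body)) + body


record_set = encode_anonymization_records(257, ANONYMIZATION_RECORDS)
```

   Eight records of 8 octets each plus the 4-octet Set header give the
   68-octet Set length shown in Figure 6.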

   Following the Anonymization Records come the data sets containing the
   anonymized data, exported according to the template in Figure 4.
   Bringing it all together, consider an IPFIX Message containing three
   real data records and the necessary templates to export them, shown
   in Figure 7.  (Note that the scale of this message is 8 bytes per
   line, for compactness; lines of dots '. . . . . ' represent shifting
   of the example bit structure for clarity.)

             1         2         3         4         5         6
   0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | 0x000a        | length 135    | export time 1271227717        | msg
  | sequence 0                    | domain 1                      | hdr
  | SetID 2       | length 40     | tid 256       | fields 8      | tmpl
  | IE 150        | length 4      | IE 8          | length 4      | set
  | IE 12         | length 4      | IE 7          | length 2      |
  | IE 11         | length 2      | IE 2          | length 4      |
  | IE 1          | length 4      | IE 4          | length 1      |
  | SetID 256     | length 79     | time 1271227681               | data
  | sip 192.0.2.3                 | dip 198.51.100.7              | set
  | sp 53         | dp 53         | packets 1                     |
  | bytes 74                      | prt 17  | . . . . . . . . . . .
  | time 1271227682               | sip 198.51.100.7              |
  | dip 192.0.2.88                | sp 5091       | dp 80         |
  | packets 60                    | bytes 2896                    |
  | prt 6   | . . . . . . . . . . . . . . . . . . . . . . . . . . .
  | time 1271227683               | sip 198.51.100.7              |
  | dip 203.0.113.9               | sp 5092       | dp 80         |
  | packets 44                    | bytes 2037                    |
  | prt 6   |
  +---------+

                      Figure 7: Example Real Message
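
   The message in Figure 7 can be reproduced with the following
   sketch, which assembles a Message Header, the Template Set, and a
   Data Set holding the three records.  All function names are
   hypothetical, and the sketch adds no padding.

```python
import ipaddress
import struct

TEMPLATE_FIELDS = [
    (150, 4), (8, 4), (12, 4), (7, 2), (11, 2), (2, 4), (1, 4), (4, 1),
]
# One record laid out per the template: 4+4+4+2+2+4+4+1 = 25 octets.
RECORD = struct.Struct("!IIIHHIIB")


def ip(text):
    return int(ipaddress.IPv4Address(text))


def template_set(template_id, fields):
    body = struct.pack("!HH", template_id, len(fields))
    body += b"".join(struct.pack("!HH", ie, ln) for ie, ln in fields)
    return struct.pack("!HH", 2, 4 + len(body)) + body


def data_set(template_id, rows):
    body = b"".join(RECORD.pack(*row) for row in rows)
    return struct.pack("!HH", template_id, 4 + len(body)) + body


def message(export_time, sequence, domain, sets):
    payload = b"".join(sets)
    # Header: version 10, total length, export time, sequence, domain.
    header = struct.pack(
        "!HHIII", 10, 16 + len(payload), export_time, sequence, domain
    )
    return header + payload


ROWS = [
    (1271227681, ip("192.0.2.3"), ip("198.51.100.7"), 53, 53, 1, 74, 17),
    (1271227682, ip("198.51.100.7"), ip("192.0.2.88"), 5091, 80, 60, 2896, 6),
    (1271227683, ip("198.51.100.7"), ip("203.0.113.9"), 5092, 80, 44, 2037, 6),
]

real_message = message(
    1271227717, 0, 1,
    [template_set(256, TEMPLATE_FIELDS), data_set(256, ROWS)],
)
```

   The lengths add up as in Figure 7: a 16-octet Message Header, a
   40-octet Template Set, and a 79-octet Data Set (4 + 3 * 25), for a
   total of 135 octets.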

   The corresponding anonymized message is then shown in Figure 8.  The
   options template set describing Anonymization Records and the
   Anonymization Records themselves are added; IP addresses and byte
   counts are anonymized as declared.

             1         2         3         4         5         6
   0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | 0x000a        | length 233    | export time 1271227717        | msg
  | sequence 0                    | domain 1                      | hdr
  | SetID 2       | length 40     | tid 256       | fields 8      | tmpl
  | IE 150        | length 4      | IE 8          | length 4      | set
  | IE 12         | length 4      | IE 7          | length 2      |
  | IE 11         | length 2      | IE 2          | length 4      |
  | IE 1          | length 4      | IE 4          | length 1      |
  | SetID 3       | length 30     | tid 257       | fields 4      | opt
  | scope 2       | . . . . . . . . . . . . . . . . . . . . . . . . tmpl
  | IE 145        | length 2      | IE 303        | length 2      | set
  | IE TBD1       | length 2      | IE TBD2       | length 2      |
  | SetID 257     | length 68     | . . . . . . . . . . . . . . . . anon
  | tid 256       | IE 150        | flags 0       | tech 1        | recs
  | tid 256       | IE 8          | flags 5       | tech 6        |
  | tid 256       | IE 12         | flags 7       | tech 7        |
  | tid 256       | IE 7          | flags 0       | tech 1        |
  | tid 256       | IE 11         | flags 0       | tech 1        |
  | tid 256       | IE 2          | flags 0       | tech 1        |
  | tid 256       | IE 1          | flags 3       | tech 2        |
  | tid 256       | IE 4          | flags 0       | tech 1        |
  | SetID 256     | length 79     | time 1271227681               | data
  | sip 254.202.119.209           | dip 0.0.0.7                   | set
  | sp 53         | dp 53         | packets 1                     |
  | bytes 100                     | prt 17  | . . . . . . . . . . .
  | time 1271227682               | sip 0.0.0.7                   |
  | dip 254.202.119.6             | sp 5091       | dp 80         |
  | packets 60                    | bytes 2900                    |
  | prt 6   | . . . . . . . . . . . . . . . . . . . . . . . . . . .
  | time 1271227683               | sip 0.0.0.7                   |
  | dip 2.19.199.176              | sp 5092       | dp 80         |
  | packets 44                    | bytes 2000                    |
  | prt 6   |
  +---------+

                Figure 8: Corresponding Anonymized Message

9.  Security Considerations

   This document provides guidelines for exporting metadata about
   anonymized data in IPFIX, or storing metadata about anonymized data
   in IPFIX Files.  It is not intended as a general statement on the
   applicability of specific flow data anonymization techniques.
   Exporters or publishers of anonymized data must take care that the
   applied anonymization technique is appropriate for the data source,
   the purpose, and the risk of deanonymization of a given application.
   Research in anonymization techniques, and techniques for
   deanonymization, is ongoing, and currently "safe" anonymization
   techniques may be rendered unsafe by future developments.

   We note specifically that anonymization is not a replacement for
   encryption for confidentiality.  It is only appropriate for
   protecting identifying information in data to be used for purposes in
   which the protected data is irrelevant.  Confidentiality in export is
   best served by using TLS [RFC5246] or DTLS [RFC4347] as in the
   Security Considerations section of [RFC5101], and in long-term
   storage by implementation-specific protection applied as in the
   Security Considerations section of [RFC5655].  Indeed,
   confidentiality and anonymization are not mutually exclusive, as
   encryption for confidentiality may be applied to anonymized data
   export or storage, as well, when the anonymized data is not intended
   for public release.

   We note as well that care should be taken even with well-anonymized
   data, and anonymized data should still be treated as privacy-
   sensitive.  Anonymization reduces the risk of misuse, but is not a
   complete solution to the problem of protecting end-user privacy in
   network flow trace analysis.

   When using pseudonymization techniques that have a mutable mapping,
   there is an inherent tradeoff in the stability of the map between
   long-term comparability and security of the data set against
   deanonymization.  In general, deanonymization attacks are more
   effective given more information, so the longer a given mapping is
   valid, the more information can be applied to deanonymization.  The
   specific details of this are technique-dependent and therefore out of
   the scope of this document.
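
   The tradeoff can be illustrated with a toy keyed pseudonymization
   whose mapping changes with an epoch counter.  This sketch is an
   assumption made for illustration only: the HMAC-based construction,
   the function name, and the epoch parameter are not techniques
   defined or recommended by this document.

```python
import hashlib
import hmac


def pseudonymize(value, master_key, epoch):
    """Map value to a pseudonym that is stable only within one epoch."""
    # Derive a per-epoch key, so that rotating the epoch changes the map.
    epoch_key = hmac.new(
        master_key, epoch.to_bytes(8, "big"), hashlib.sha256
    ).digest()
    return hmac.new(epoch_key, value, hashlib.sha256).hexdigest()[:16]


key = b"example-key-not-for-production"
same_epoch_a = pseudonymize(b"198.51.100.7", key, epoch=1)
same_epoch_b = pseudonymize(b"198.51.100.7", key, epoch=1)
next_epoch = pseudonymize(b"198.51.100.7", key, epoch=2)
```

   Within one epoch the same input always yields the same pseudonym,
   which preserves comparability; across epochs the pseudonyms differ,
   which limits how much information an attacker can accumulate
   against any single mapping.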

   When releasing anonymized data, publishers need to ensure that data
   that could be used in deanonymization is not leaked through a side
   channel.  The entire workflow (hardware, software, operational
   policies and procedures, etc.) for handling anonymized data must be
   evaluated for risk of data leakage.  While most of these possible
   side channels are out of scope for this document, guidelines for
   reducing the risk of information leakage specific to the IPFIX export
   protocol are provided in Section 7.2.

   Note as well that the Security Considerations section of [RFC5101]
   applies to the export of anonymized data, and the Security
   Considerations section of [RFC5655] to the storage of anonymized
   data, or the publication of anonymized traces.

10.  IANA Considerations

   This document specifies the creation of several new IPFIX Information
   Elements in the IPFIX Information Element registry located at
   http://www.iana.org/assignments/ipfix, as defined in Section 6.2
   above.  IANA has assigned the following Information Element numbers
   for their respective Information Elements as specified below:

   o  Information Element number TBD1 for the anonymizationFlags
      Information Element.

   o  Information Element number TBD2 for the anonymizationTechnique
      Information Element.

   o  Information Element number TBD3 for the informationElementIndex
      Information Element.

   [NOTE for IANA: The text TBDn should be replaced with the respective
   assigned Information Element numbers where they appear in this
   document.  Information Element numbers should be assigned outside the
   NetFlow V9 compatibility range, as these Information Elements are not
   supported by NetFlow V9.]

11.  Acknowledgments

   We thank Paul Aitken and John McHugh for their comments and insight,
   and Carsten Schmoll, Benoit Claise, Lothar Braun, Dan Romascanu,
   Stewart Bryant, and Sean Turner for their reviews.  Special thanks to
   the FP7 PRISM and DEMONS projects for their material support of this
   work.

12.  References

12.1.  Normative References

   [RFC5101]  Claise, B., "Specification of the IP Flow Information
              Export (IPFIX) Protocol for the Exchange of IP Traffic
              Flow Information", RFC 5101, January 2008.

   [RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
              Meyer, "Information Model for IP Flow Information Export",
              RFC 5102, January 2008.

   [RFC5103]  Trammell, B. and E. Boschi, "Bidirectional Flow Export
              Using IP Flow Information Export (IPFIX)", RFC 5103,
              January 2008.

   [RFC5655]  Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
              Wagner, "Specification of the IP Flow Information Export
              (IPFIX) File Format", RFC 5655, October 2009.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5735]  Cotton, M. and L. Vegoda, "Special Use IPv4 Addresses",
              BCP 153, RFC 5735, January 2010.

   [RFC5156]  Blanchet, M., "Special-Use IPv6 Addresses", RFC 5156,
              April 2008.

12.2.  Informative References

   [RFC5470]  Sadasivan, G., Brownlee, N., Claise, B., and J. Quittek,
              "Architecture for IP Flow Information Export", RFC 5470,
              March 2009.

   [RFC5472]  Zseby, T., Boschi, E., Brownlee, N., and B. Claise, "IP
              Flow Information Export (IPFIX) Applicability", RFC 5472,
              March 2009.

   [I-D.ietf-ipfix-mediators-framework]
              Kobayashi, A., Claise, B., Muenz, G., and K. Ishibashi,
              "IPFIX Mediation: Framework",
              draft-ietf-ipfix-mediators-framework-09 (work in
              progress), October 2010.

   [I-D.ietf-ipfix-export-per-sctp-stream]
              Claise, B., Aitken, P., Johnson, A., and G. Muenz, "IPFIX
              Export per SCTP Stream",
              draft-ietf-ipfix-export-per-sctp-stream-08 (work in
              progress), May 2010.

   [RFC5153]  Boschi, E., Mark, L., Quittek, J., Stiemerling, M., and P.
              Aitken, "IP Flow Information Export (IPFIX) Implementation
              Guidelines", RFC 5153, April 2008.

   [RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
              "Requirements for IP Flow Information Export (IPFIX)",
              RFC 3917, October 2004.

   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, February 2006.

   [RFC4347]  Rescorla, E. and N. Modadugu, "Datagram Transport Layer
              Security", RFC 4347, April 2006.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

   [Bur10]    Burkhart, M., Schatzmann, D., Trammell, B., and E. Boschi,
              "The Role of Network Trace Anonymization Under Attack",
              ACM Computer Communications Review, vol. 40, no. 1, pp.
              6-11, January 2010.

   [Mur07]    Murdoch, S. and P. Zielinski, "Sampled Traffic Analysis by
              Internet-Exchange-Level Adversaries", Proceedings of the
              7th Workshop on Privacy Enhancing Technologies, Ottawa,
              Canada, June 2007.

Authors' Addresses

   Elisa Boschi
   Swiss Federal Institute of Technology Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Email: boschie@tik.ee.ethz.ch

   Brian Trammell
   Swiss Federal Institute of Technology Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Phone: +41 44 632 70 13
   Email: trammell@tik.ee.ethz.ch