idnits 2.17.1

draft-ietf-ipfix-anon-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 5 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  -- The document has examples using IPv4 documentation addresses according
     to RFC6890, but does not use any IPv6 documentation addresses. Maybe
     there should be IPv6 examples, too?

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document date (January 19, 2011) is 4818 days in the past. Is this
     intentional?

  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 5101 (Obsoleted by RFC 7011)

  ** Obsolete normative reference: RFC 5102 (Obsoleted by RFC 7012)

  ** Obsolete normative reference: RFC 5735 (Obsoleted by RFC 6890)

  ** Obsolete normative reference: RFC 5156 (Obsoleted by RFC 6890)

  -- Obsolete informational reference (is this intentional?): RFC 4347
     (Obsoleted by RFC 6347)

  -- Obsolete informational reference (is this intentional?): RFC 5246
     (Obsoleted by RFC 8446)

     Summary: 4 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--).
     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

IPFIX Working Group                                            E. Boschi
Internet-Draft                                               B. Trammell
Intended status: Experimental                                 ETH Zurich
Expires: July 23, 2011                                  January 19, 2011


                     IP Flow Anonymization Support
                      draft-ietf-ipfix-anon-06.txt

Abstract

   This document describes anonymization techniques for IP flow data and
   the export of anonymized data using the IPFIX protocol. It
   categorizes common anonymization schemes and defines the parameters
   needed to describe them. It provides guidelines for the
   implementation of anonymized data export and storage over IPFIX, and
   describes an information model and Options-based method for
   anonymization metadata export within the IPFIX protocol or storage in
   IPFIX Files.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 23, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  IPFIX Protocol Overview
     1.2.  IPFIX Documents Overview
     1.3.  Anonymization within the IPFIX Architecture
     1.4.  Supporting Experimentation with Anonymization
   2.  Terminology
   3.  Categorization of Anonymization Techniques
   4.  Anonymization of IP Flow Data
     4.1.  IP Address Anonymization
       4.1.1.  Truncation
       4.1.2.  Reverse Truncation
       4.1.3.  Permutation
       4.1.4.  Prefix-preserving Pseudonymization
     4.2.  MAC Address Anonymization
       4.2.1.  Truncation
       4.2.2.  Reverse Truncation
       4.2.3.  Permutation
       4.2.4.  Structured Pseudonymization
     4.3.  Timestamp Anonymization
       4.3.1.  Precision Degradation
       4.3.2.  Enumeration
       4.3.3.  Random Shifts
     4.4.  Counter Anonymization
       4.4.1.  Precision Degradation
       4.4.2.  Binning
       4.4.3.  Random Noise Addition
     4.5.  Anonymization of Other Flow Fields
       4.5.1.  Binning
       4.5.2.  Permutation
   5.  Parameters for the Description of Anonymization Techniques
     5.1.  Stability
     5.2.  Truncation Length
     5.3.  Bin Map
     5.4.  Permutation
     5.5.  Shift Amount
   6.  Anonymization Export Support in IPFIX
     6.1.  Anonymization Records and the Anonymization Options
           Template
     6.2.  Recommended Information Elements for Anonymization
           Metadata
       6.2.1.  informationElementIndex
       6.2.2.  anonymizationTechnique
       6.2.3.  anonymizationFlags
   7.  Applying Anonymization Techniques to IPFIX Export and
       Storage
     7.1.  Arrangement of Processes in IPFIX Anonymization
     7.2.  IPFIX-Specific Anonymization Guidelines
       7.2.1.  Appropriate Use of Information Elements for
               Anonymized Data
       7.2.2.  Export of Perimeter-Based Anonymization Policies
       7.2.3.  Anonymization of Header Data
       7.2.4.  Anonymization of Options Data
       7.2.5.  Special-Use Address Space Considerations
       7.2.6.  Protecting Out-of-Band Configuration and
               Management Data
   8.  Examples
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgments
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   The standardization of an IP flow information export protocol
   [RFC5101] and associated representations removes a technical barrier
   to the sharing of IP flow data across organizational boundaries and
   with network operations, security, and research communities for a
   wide variety of purposes. However, with wider dissemination come
   greater risks to the privacy of the users of networks under
   measurement, and to the security of those networks. While it is not
   a complete solution to the issues posed by distribution of IP flow
   information, anonymization (i.e., the deletion or transformation of
   information that is considered sensitive and could be used to reveal
   the identity of subjects involved in a communication) is an important
   tool for the protection of privacy within network measurement
   infrastructures.

   This document presents a mechanism for representing anonymized data
   within IPFIX and guidelines for using it. It is not intended as a
   general statement on the applicability of specific flow data
   anonymization techniques to specific situations, or as a
   recommendation of any particular application of anonymization to flow
   data export.
   Exporters or publishers of anonymized data must take
   care that the applied anonymization technique is appropriate for the
   data source, the purpose, and the risk of deanonymization of a given
   application.

   This document begins with a categorization of anonymization
   techniques. It then describes applicability of each technique to
   commonly anonymizable fields of IP flow data, organized by
   information element data type and semantics as in [RFC5102];
   enumerates the parameters required by each of the applicable
   anonymization techniques; and provides guidelines for the use of
   each of these techniques in accordance with current best practices
   in data protection. Finally, it specifies a mechanism for exporting
   anonymized data and binding anonymization metadata to Templates and
   Options Templates using IPFIX Options.

1.1.  IPFIX Protocol Overview

   In the IPFIX protocol, { type, length, value } tuples are expressed
   in Templates containing { type, length } pairs, specifying which {
   value } fields are present in data records conforming to the
   Template, giving great flexibility as to what data is transmitted.
   Since Templates are sent very infrequently compared with Data
   Records, this results in significant bandwidth savings. Various
   different data formats may be transmitted simply by sending new
   Templates specifying the { type, length } pairs for the new data
   format. See [RFC5101] for more information.

   The IPFIX information model [RFC5102] defines a large number of
   standard Information Elements which provide the necessary { type }
   information for Templates. The use of standard elements enables
   interoperability among different vendors' implementations.
   Additionally, non-standard enterprise-specific elements may be
   defined for private use.

1.2.  IPFIX Documents Overview

   "Specification of the IPFIX Protocol for the Exchange of IP Traffic
   Flow Information" [RFC5101] and its associated documents define the
   IPFIX Protocol, which provides network engineers and administrators
   with access to IP traffic flow information.

   "Architecture for IP Flow Information Export" [RFC5470] defines the
   architecture for the export of measured IP flow information out of an
   IPFIX Exporting Process to an IPFIX Collecting Process, and the basic
   terminology used to describe the elements of this architecture, per
   the requirements defined in "Requirements for IP Flow Information
   Export" [RFC3917]. The IPFIX Protocol document [RFC5101] then covers
   the details of the method for transporting IPFIX Data Records and
   Templates via a congestion-aware transport protocol from an IPFIX
   Exporting Process to an IPFIX Collecting Process.

   "Information Model for IP Flow Information Export" [RFC5102]
   describes the Information Elements used by IPFIX, including details
   on Information Element naming, numbering, and data type encoding.
   Finally, "IPFIX Applicability" [RFC5472] describes the various
   applications of the IPFIX protocol and their use of information
   exported via IPFIX, and relates the IPFIX architecture to other
   measurement architectures and frameworks.

   Additionally, "Specification of the IPFIX File Format" [RFC5655]
   describes a file format based upon the IPFIX Protocol for the storage
   of flow data.

   This document references the Protocol and Architecture documents for
   terminology, and extends the IPFIX Information Model to provide new
   Information Elements for anonymization metadata. The anonymization
   techniques described herein are equally applicable to the IPFIX
   Protocol and data stored in IPFIX Files.

1.3.  Anonymization within the IPFIX Architecture

   According to [RFC5470], IPFIX Message anonymization is optionally
   performed as the final operation before handing the Message to the
   transport protocol for export. While no provision is made in the
   architecture for anonymization metadata as in Section 6, this
   arrangement does allow for the rewriting necessary for comprehensive
   anonymization of IPFIX export as in Section 7. The development of
   the IPFIX Mediation [I-D.ietf-ipfix-mediators-framework] framework
   and the IPFIX File Format [RFC5655] expand upon this initial
   architectural allowance for anonymization by adding to the list of
   places that anonymization may be applied. The former specifies IPFIX
   Mediators, which rewrite existing IPFIX Messages, and the latter
   specifies a method for storage of IPFIX data in files.

   More detail on the applicable architectural arrangements for
   anonymization can be found in Section 7.1.

1.4.  Supporting Experimentation with Anonymization

   The intended status of this document is Experimental, reflecting the
   experimental nature of anonymization export support. Research on
   network trace anonymization techniques and attacks against them is
   ongoing. Indeed, there is increasing evidence that anonymization
   applied to network trace or flow data on its own is insufficient for
   many data protection applications, as in [Bur10]. Therefore, this
   document explicitly does not recommend any particular technique or
   implementation thereof.

   The intention of this document is to provide a common basis for
   interoperable exchange of anonymized data, furthering research in
   this area, both on anonymization techniques themselves and on the
   application of anonymized data to network measurement.
   To that end, the classification in Section 3 and anonymization export
   support in Section 6 can be used to describe and export information
   even about data anonymized using techniques that are unacceptably
   weak for general application to production data sets on their own.

   While the specification herein is designed to be implementation- and
   technique-independent, open research in this area may necessitate
   future updates to the specification. Assuming the future successful
   application of this specification to anonymized data publication and
   exchange, it may be brought back to the IPFIX working group for
   further development and publication on the standards track.

2.  Terminology

   Terms used in this document that are defined in the Terminology
   section of the IPFIX Protocol [RFC5101] document are to be
   interpreted as defined there. In addition, this document defines the
   following terms:

   Anonymization Record:   A record, defined by the Anonymization
      Options Template in Section 6.1, that defines the
      properties of the anonymization applied to a single Information
      Element within a single Template or Options Template.

   Anonymized Data Record:   A Data Record within a Data Set containing
      at least one Information Element with anonymized values. The
      Information Element(s) within the Template or Options Template
      describing this Data Record SHOULD have a corresponding
      Anonymization Record.

   Intermediate Anonymization Process:   An intermediate process which
      takes Data Records and transforms them into Anonymized Data
      Records.

   Note that there is an explicit difference in this document between a
   "Data Set" (which is defined as in [RFC5101]) and a "data set".
   When in lower case, this term refers to any collection of data
   (usually, within the context of this document, flow or packet data)
   which may contain identifying information and is therefore subject to
   anonymization.

   Note also that when the term Template is used in this document,
   unless otherwise noted, it applies both to Templates and Options
   Templates as defined in [RFC5101]. Specifically, Anonymization
   Records may apply to both Templates and Options Templates.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Categorization of Anonymization Techniques

   Anonymization, as described by this document, is the modification of
   a data set in order to protect the identity of the people or entities
   described by the data set from disclosure. With respect to network
   traffic data, anonymization generally attempts to preserve some set
   of properties of the network traffic useful for a given application
   or applications, while ensuring the data cannot be traced back to the
   specific networks, hosts, or users generating the traffic.

   Anonymization may be broadly classified according to two properties:
   recoverability and countability. All anonymization techniques map
   the real space of identifiers or values into a separate, anonymized
   space, according to some function. A technique is said to be
   recoverable when the function used is invertible or can otherwise be
   reversed and a real identifier can be recovered from a given
   replacement identifier. "Recoverability" as used within this
   categorization does not refer to recoverability under attack; that
   is, techniques wherein the function used can only be reversed using
   additional information, such as an encryption key, or knowledge of
   injected traffic within the data set, are not considered to be
   recoverable.
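
   As an informal illustration of recoverability (not part of this
   specification), the following Python sketch contrasts an invertible
   mapping with a non-invertible one. The functions reverse_bits32 and
   zero_low_byte are hypothetical toy examples chosen for brevity;
   neither is a recommended anonymization function.

```python
def reverse_bits32(value: int) -> int:
    """Reverse the bit order of a 32-bit value (a toy invertible mapping)."""
    return int(format(value & 0xFFFFFFFF, "032b")[::-1], 2)


def zero_low_byte(value: int) -> int:
    """Clear the low 8 bits of a 32-bit value (a toy non-invertible mapping)."""
    return value & 0xFFFFFF00


a = 0xC0000201  # 192.0.2.1
b = 0xC00002FE  # 192.0.2.254

# Invertible: bit reversal is its own inverse, so applying it a second
# time recovers the original value.
recovered = reverse_bits32(reverse_bits32(a))

# Not invertible: two distinct inputs collapse onto the same output, so
# no function can tell which input produced it.
collide = zero_low_byte(a) == zero_low_byte(b)
```

   In this sketch, "recovered" equals the original value, while the two
   distinct inputs to zero_low_byte yield the same output and so cannot
   be told apart afterwards.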

   Countability compares the dimension of the anonymized space (N) to
   the dimension of the real space (M), and denotes how the count of
   unique values is preserved by the anonymization function. If the
   anonymized space is smaller than the real space, then the function is
   said to generalize the input, mapping more than one input point to
   each anonymous value (e.g., as with aggregation). By definition,
   generalization is not recoverable.

   If the dimensions of the anonymized and real spaces are the same,
   such that the count of unique values is preserved, then the function
   is said to be a direct substitution function. If the dimension of
   the anonymized space is larger, such that each real value maps to a
   set of anonymized values, then the function is said to be a set
   substitution function. Note that with set substitution functions,
   the sets of anonymized values are not necessarily disjoint. Either
   direct or set substitution functions are said to be one-way if there
   exists no non-brute force method for recovering the real data point
   from an anonymized one in isolation (i.e., if the only way to recover
   the data point is to attack the anonymized data set as a whole, e.g.
   through fingerprinting or data injection).

   This classification is summarized in the table below.

   +------------------------+-----------------+------------------------+
   | Recoverability /       | Recoverable     | Non-recoverable        |
   | Countability           |                 |                        |
   +------------------------+-----------------+------------------------+
   | N < M                  | N.A.            | Generalization         |
   | N = M                  | Direct          | One-way Direct         |
   |                        | Substitution    | Substitution           |
   | N > M                  | Set             | One-way Set            |
   |                        | Substitution    | Substitution           |
   +------------------------+-----------------+------------------------+

4.  Anonymization of IP Flow Data

   In anonymizing IP flow data as treated by this document, the goal is
   generally two-way address untraceability: to remove the ability to
   assert that endpoint X contacted endpoint Y at time T. Address
   untraceability is important as IP addresses are the most suitable
   field in IP flow records to identify real-world entities. Each IP
   address is associated with an interface on a network host, and can
   potentially be identified with a single user. Additionally, IP
   addresses are structured identifiers; that is, partial IP address
   prefixes may be used to identify networks just as full IP addresses
   identify hosts. This leads IP flow data anonymization to be
   concerned first and foremost with IP address anonymization.

   Any form of aggregation which combines flows from multiple endpoints
   into a single record (e.g., aggregation by subnetwork, aggregation
   removing addressing completely) may also provide address
   untraceability; however, anonymization by aggregation is out of scope
   for this document. Additionally of potential interest in this
   problem space but out of scope are anonymization techniques which are
   applied over multiple fields or multiple records in a way which
   introduces dependencies among anonymized fields or records. This
   document is concerned solely with anonymization techniques applied at
   the resolution of single fields within a flow record.

   Even so, attacks against these anonymization techniques use entire
   flows and relationships between hosts and flows within a given data
   set. Therefore, fields which may not necessarily be identifying by
   themselves may be anonymized in order to increase the anonymity of
   the data set as a whole.

   Due to the restricted semantics of IP flow data, there is a
   relatively limited set of specific anonymization techniques available
   on flow data, though each falls into the broad categories discussed
   in the previous section. Each type of field that may commonly appear
   in a flow record may have its own applicable specific techniques.

   As with IP addresses, MAC addresses uniquely identify devices on the
   network; while they are not often available in traffic data collected
   at Layer 3, and cannot be used to locate devices within the network,
   some traces may contain sub-IP data including MAC address data.
   Hardware addresses may be mappable to device serial numbers, and to
   the entities or individuals who purchased the devices, when combined
   with external databases. MAC addresses are also often used in
   constructing IPv6 addresses (see section 2.5.1 of [RFC4291]), and as
   such may be used to reconstruct the low-order bits of anonymized IPv6
   addresses in certain circumstances. Therefore, MAC address
   anonymization is also important.

   Port numbers identify abstract entities (applications) as opposed to
   real-world entities, but they can be used to classify hosts and user
   behavior. Passive port fingerprinting, both of well-known and
   ephemeral ports, can be used to determine the operating system
   running on a host. Relative data volumes by port can also be used to
   determine the host's function (workstation, web server, etc.); this
   information can be used to identify hosts and users.

   While not identifiers in and of themselves, timestamps and counters
   can reveal the behavior of the hosts and users on a network. Any
   given network activity is recognizable by a pattern of relative time
   differences and data volumes in the associated sequence of flows,
   even without host address information. They can therefore be used to
   identify hosts and users.
   Timestamps and counters are also
   vulnerable to traffic injection attacks, where traffic with a known
   pattern is injected into a network under measurement, and this
   pattern is later identified in the anonymized data set.

   The simplest and most extreme form of anonymization, which can be
   applied to any field of a flow record, is black-marker anonymization,
   or complete deletion of a given field. Note that black-marker
   anonymization is equivalent to simply not exporting the field(s) in
   question.

   While black-marker anonymization completely protects the data in the
   deleted fields from the risk of disclosure, it also reduces the
   utility of the anonymized data set as a whole. Techniques that
   retain some information while reducing (though not eliminating) the
   disclosure risk will be extensively discussed in the following
   sections; note that the techniques specifically applicable to IP
   addresses, timestamps, ports, and counters will be discussed in
   separate sections.

4.1.  IP Address Anonymization

   Since IP addresses are the most common identifiers within flow data
   that can be used to directly identify a person, organization, or
   host, most of the work on flow and trace data anonymization has gone
   into IP address anonymization techniques. Indeed, the aim of most
   attacks against anonymization is to recover the map from anonymized
   IP addresses to original IP addresses, thereby identifying the
   hosts. There is therefore a wide range of IP address
   anonymization schemes that fit into the following categories.

   +------------------------------------+---------------------+
   | Scheme                             | Action              |
   +------------------------------------+---------------------+
   | Truncation                         | Generalization      |
   | Reverse Truncation                 | Generalization      |
   | Permutation                        | Direct Substitution |
   | Prefix-preserving Pseudonymization | Direct Substitution |
   +------------------------------------+---------------------+

4.1.1.  Truncation

   Truncation removes "n" of the least significant bits from an IP
   address, replacing them with zeroes. In effect, it replaces a host
   address with a network address for some fixed netblock; for IPv4
   addresses, 8-bit truncation corresponds to replacement with a /24
   network address. Truncation is a non-reversible generalization
   scheme. Note that while truncation is effective for making hosts
   non-identifiable, it preserves information which can be used to
   identify an organization, a geographic region, a country, or a
   continent.

   Truncation to an address length of 0 is equivalent to black-marker
   anonymization. Complete removal of IP address information is only
   recommended for analysis tasks which have no need to separate flow
   data by host or network; e.g. as a first stage to per-application
   (port) or time-series total volume analyses.

4.1.2.  Reverse Truncation

   Reverse truncation removes "n" of the most significant bits from an
   IP address, replacing them with zeroes. Reverse truncation is a non-
   reversible generalization scheme. Reverse truncation is effective
   for making networks unidentifiable, partially or completely removing
   information which can be used to identify an organization, a
   geographic region, a country, or a continent (or RIR region of
   responsibility).
   However, it may cause ambiguity when applied to
   data collected from more than one network, since it treats all the
   hosts with the same address on different networks as if they are the
   same host. It is not particularly useful when publishing data where
   the network of origin is known or can be easily guessed by virtue of
   the identity of the publisher.

   Like truncation, reverse truncation to an address length of 0 is
   equivalent to black-marker anonymization.

4.1.3.  Permutation

   Permutation is a direct substitution technique, replacing each IP
   address with an address selected from the set of possible IP
   addresses, such that each anonymized address represents a unique
   original address. The selection function is often random, though it
   is not necessarily so. Permutation does not preserve any structural
   information about a network, but it does preserve the unique count of
   IP addresses. Any application that requires more structure than
   host-uniqueness will not be able to use permuted IP addresses.

   There are many variations of permutation functions, each of which has
   tradeoffs in performance, security, and guarantees of non-collision;
   evaluating these tradeoffs is implementation independent. However,
   in general permutation functions applied to anonymization SHOULD be
   difficult to reverse without knowing the parameters (e.g., a secret
   key for HMAC). Given the relatively small space of IPv4 addresses in
   particular, hash functions applied without additional parameters
   could be reversed through brute force if the hash function is known,
   and SHOULD NOT be used as permutation functions. Permutation
   functions may guarantee noncollision (i.e., that each anonymized
   address represents a unique original address), but need not; however,
   the probability of collision SHOULD be low.
   Nevertheless, we treat even
   permutations with low but nonzero collision probability as direct
   substitution. Beyond these guidelines, recommendations
   for specific permutation functions are out of scope for this
   document.

4.1.4.  Prefix-preserving Pseudonymization

   Prefix-preserving pseudonymization is a direct substitution
   technique, like permutation but further restricted such that the
   structure of subnets is preserved at each level while anonymizing IP
   addresses. If two real IP addresses match on a prefix of "n" bits,
   the two anonymized IP addresses will match on a prefix of "n" bits as
   well. This is useful when relationships among networks must be
   preserved for a given analysis task, but introduces structure into
   the anonymized data which can be exploited in attacks against the
   anonymization technique.

   Scanning in Internet background traffic can cause particular problems
   with this technique: if a scanner uses a predictable and known
   sequence of addresses, this information can be used to reverse the
   substitution. The low order portion of the address can be left
   unanonymized as a partial defense against this attack.

4.2.  MAC Address Anonymization

   Flow data containing sub-IP information can also contain identifying
   information in the form of the hardware (MAC) address. While MAC
   address information cannot be used to locate a node within a network,
   it can be used to directly and uniquely identify a specific device.
   Vendors or organizations within the supply chain may then have the
   information necessary to identify the entity or individual that
   purchased the device.

   MAC address information is not as structured as IP address
   information.
   EUI-48 and EUI-64 MAC addresses contain an
   Organizationally Unique Identifier (OUI) in the three most
   significant bytes of the address; this OUI additionally contains bits
   noting whether the address is locally or globally administered.
   Beyond this, there is no standard relationship among the OUIs
   assigned to a given vendor.

   Note that MAC address information also appears within IPv6 addresses,
   as the EUI-64 address, or EUI-48 address encoded as an EUI-64
   address, is used as the least significant 64 bits of the IPv6 address
   in the case of link local addressing or stateless autoconfiguration;
   the considerations and techniques in this section may then apply to
   such IPv6 addresses as well.

   +-----------------------------+---------------------+
   | Scheme                      | Action              |
   +-----------------------------+---------------------+
   | Truncation                  | Generalization      |
   | Reverse Truncation          | Generalization      |
   | Permutation                 | Direct Substitution |
   | Structured Pseudonymization | Direct Substitution |
   +-----------------------------+---------------------+

4.2.1.  Truncation

   Truncation removes "n" of the least significant bits from a MAC
   address, replacing them with zeroes. In effect, it retains bits of
   the OUI, which identifies the manufacturer, while removing the least
   significant bits identifying the particular device. Truncation of 24
   bits of an EUI-48 or 40 bits of an EUI-64 address zeroes out the
   device identifier while retaining the OUI.

   Truncation is effective for making device manufacturers partially or
   completely identifiable within a dataset while deleting unique host
   identifiers; this can be used to retain and aggregate MAC layer
   behavior by vendor.

   Truncation to an address length of 0 is equivalent to black-marker
   anonymization.

4.2.2.  Reverse Truncation

   Reverse truncation removes "n" of the most significant bits from a
   MAC address, replacing them with zeroes. Reverse truncation is a
   non-reversible generalization scheme. This has the effect of
   removing bits of the OUI, which identify manufacturers, before
   removing the least significant bits. Reverse truncation of 24 bits
   zeroes out the OUI.

   Reverse truncation is effective for making device manufacturers
   partially or completely unidentifiable within a dataset. However, it
   may cause ambiguity by introducing the possibility of truncated MAC
   address collision. Also note that the utility of removing
   manufacturer information is not particularly well-covered by the
   literature.

   Reverse truncation to an address length of 0 is equivalent to black-
   marker anonymization.

4.2.3.  Permutation

   Permutation is a direct substitution technique, replacing each MAC
   address with an address selected from the set of possible MAC
   addresses, such that each anonymized address represents a unique
   original address. The selection function is often random, though it
   is not necessarily so. Permutation does not preserve any structural
   information about a network, but it does preserve the unique count of
   devices on the network. Any application that requires more structure
   than host-uniqueness will not be able to use permuted MAC addresses.

   There are many variations of permutation functions, each of which has
   tradeoffs in performance, security, and guarantees of non-collision;
   evaluating these tradeoffs is implementation independent. However,
   in general permutation functions applied to anonymization SHOULD be
   difficult to reverse without knowing the parameters (e.g., a secret
   key for HMAC).
While the EUI-48 space is larger than the IPv4 619 address space, hash functions applied without additional parameters 620 could be reversed through brute force if the hash function is known, 621 and SHOULD NOT be used as permutation functions. Permutation 622 functions may guarantee noncollision (i.e., that each anonymized 623 address represents a unique original address), but need not; however, 624 the probability of collision SHOULD be low. We nevertheless treat even 625 permutations with low but nonzero collision probability as direct 626 substitution. Beyond these guidelines, recommendations 627 for specific permutation functions are out of scope for this 628 document. 630 4.2.4. Structured Pseudonymization 632 Structured pseudonymization for MAC addresses is a direct 633 substitution technique, like permutation, but restricted such that 634 the OUI (the most significant three bytes) is permuted separately 635 from the node identifier (the remainder). This is useful when the 636 uniqueness of OUIs must be preserved for a given analysis task, but 637 introduces structure into the anonymized data which can be exploited 638 in attacks against the anonymization technique. 640 4.3. Timestamp Anonymization 642 The particular time at which a flow began or ended is not 643 particularly identifiable information, but it can be used as part of 644 attacks against other anonymization techniques or for user profiling, 645 e.g. as in [Mur07]. Timestamps can be used in traffic injection 646 attacks, which use known information about a set of traffic generated 647 or otherwise known by an attacker to recover mappings of other 648 anonymized fields, as well as to identify certain activity by 649 response delay and size fingerprinting, which compares response sizes 650 and inter-flow times in anonymized data to known values.
Note that 651 these attacks have been shown to be relatively robust against 652 timestamp anonymization techniques (see [Bur10]), so the techniques 653 presented in this section are relatively weak and should be used with 654 care. 656 +-----------------------+----------------------------+ 657 | Scheme | Action | 658 +-----------------------+----------------------------+ 659 | Precision Degradation | Generalization | 660 | Enumeration | Direct or Set Substitution | 661 | Random Shifts | Direct Substitution | 662 +-----------------------+----------------------------+ 664 4.3.1. Precision Degradation 666 Precision Degradation is a generalization technique that removes the 667 most precise components of a timestamp, accounting all events 668 occurring in each given interval (e.g. one millisecond for 669 millisecond level degradation) as simultaneous. This has the effect 670 of potentially collapsing many timestamps into one. With this 671 technique time precision is reduced, and sequencing may be lost, but 672 the information at which time the event occurred is preserved. The 673 anonymized data may not be generally useful for applications which 674 require strict sequencing of flows. 676 Note that flow meters with low time precision (e.g. second precision, 677 or millisecond precision on high-capacity networks) perform the 678 equivalent of precision degradation anonymization by their design. 680 Note also that degradation to a very low precision (e.g. on the order 681 of minutes, hours, or days) is commonly used in analyses operating on 682 time-series aggregated data, and may also be described as binning; 683 though the time scales are longer and applicability more restricted, 684 this is in principle the same operation. 686 Precision degradation to infinitely low precision is equivalent to 687 black-marker anonymization. 
Removal of timestamp information is only 688 recommended for analysis tasks which have no need to separate flows 689 in time, for example for counting total volumes or unique occurrences 690 of other flow keys in an entire dataset. 692 4.3.2. Enumeration 694 Enumeration is a substitution function that retains the chronological 695 order in which events occurred while eliminating time information. 696 Timestamps are substituted by equidistant timestamps (or numbers) 697 starting from a randomly chosen start value. The resulting data is 698 useful for applications requiring strict sequencing, but not for 699 those requiring good timing information (e.g. delay- or jitter- 700 measurement for quality-of-service (QoS) applications or service- 701 level agreement (SLA) validation). 703 Note that enumeration is functionally equivalent to precision 704 degradation in any environment into which traffic can be regularly 705 injected to serve as a clock at the precision of the frequency of the 706 injected flows. 708 4.3.3. Random Shifts 710 Random time shifts add a single random offset to every timestamp within a 711 dataset. This reversible substitution technique therefore retains 712 duration and inter-event interval information as well as 713 chronological order of flows. Random time shifts are quite weak, and 714 relatively easy to reverse in the presence of external knowledge 715 about traffic on the measured network. 717 4.4. Counter Anonymization 719 Counters (such as packet and octet volumes per flow) can be used in 720 fingerprinting and injection attacks against anonymization, or for 721 user profiling, just as timestamps can. Data sets with anonymized counters 722 are useful only for analysis tasks for which relative or imprecise 723 magnitudes of activity are useful.
Counter information can also be 724 completely removed, but this is only recommended for analysis tasks 725 which have no need to evaluate the removed counter, for example for 726 counting only unique occurrences of other flow keys. 728 +-----------------------+----------------------------+ 729 | Scheme | Action | 730 +-----------------------+----------------------------+ 731 | Precision Degradation | Generalization | 732 | Binning | Generalization | 733 | Random noise addition | Direct or Set Substitution | 734 +-----------------------+----------------------------+ 736 4.4.1. Precision Degradation 738 As with precision degradation in timestamps, precision degradation of 739 counters removes lower-order bits of the counters, treating all the 740 counters in a given range as having the same value. Depending on the 741 precision reduction, this loses information about the relationships 742 between sizes of similarly-sized flows, but keeps relative magnitude 743 information. Precision degradation to an infinitely low precision is 744 equivalent to black-marker anonymization. 746 4.4.2. Binning 748 Binning can be seen as a special case of precision degradation; the 749 operation is identical, except that in precision degradation the 750 counter ranges are uniform, and in binning they need not be. For 751 example, consider separating unopened TCP connections from 752 potentially opened TCP connections. Here, packet counters per flow 753 would be binned into two bins, one for 1-2 packet flows, and one for 754 flows with 3 or more packets. Binning schemes are generally chosen 755 to keep precisely the amount of information required in a counter for 756 a given analysis task. Note that, also unlike precision degradation, 757 the bin label need not be within the bin's range. Binning counters 758 to a single bin is equivalent to black-marker anonymization. 760 4.4.3.
Random Noise Addition 762 Random noise addition adds a random amount to a counter in each flow; 763 this is used to keep relative magnitude information and minimize the 764 disruption to size relationship information while avoiding 765 fingerprinting attacks against anonymization. Note that there is no 766 guarantee that random noise addition will maintain ranking order by a 767 counter among members of a set. Random noise addition is 768 particularly useful when the derived analysis data will not be 769 presented in such a way as to require the lower-order bits of the 770 counters. 772 4.5. Anonymization of Other Flow Fields 774 Other fields, particularly port numbers and protocol numbers, can be 775 used to partially identify the applications that generated the 776 traffic in a given flow trace. This information can be used in 777 fingerprinting attacks, and may be of interest on its own (e.g., to 778 reveal that a certain application with suspected vulnerabilities is 779 running on a given network). These fields are generally anonymized 780 using one of two techniques. 782 +-------------+---------------------+ 783 | Scheme | Action | 784 +-------------+---------------------+ 785 | Binning | Generalization | 786 | Permutation | Direct Substitution | 787 +-------------+---------------------+ 789 4.5.1. Binning 791 Binning is a generalization technique mapping a set of potentially 792 non-uniform ranges into a set of arbitrarily labeled bins. Common 793 bin arrangements depend on the field type and the analysis 794 application. For example, an IP protocol bin arrangement may 795 preserve 1, 6, and 17 for ICMP, TCP, and UDP traffic, and bin all 796 other protocols into a single bin, to mitigate the use of uncommon 797 protocols in fingerprinting attacks. Another example arrangement may 798 bin source and destination ports into low (0-1023) and high (1024- 799 65535) bins in order to tell service from ephemeral ports without 800 identifying individual applications.
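The two example bin arrangements above can be sketched as follows. This is a minimal illustration, not a recommended implementation: the function names bin_protocol and bin_port, and the bin labels 255, 0, and 1024, are assumptions made for this sketch and are not defined by this document (bin labels are arbitrary in any case).

```python
# Illustrative sketch only: the names and bin labels below are assumptions
# chosen for this example, not values defined by this document.

PRESERVED_PROTOCOLS = frozenset({1, 6, 17})  # ICMP, TCP, UDP
OTHER_PROTOCOL_BIN = 255   # assumed label for the "all other protocols" bin
LOW_PORT_BIN = 0           # assumed label for ports 0-1023
HIGH_PORT_BIN = 1024       # assumed label for ports 1024-65535


def bin_protocol(protocol: int) -> int:
    """Keep ICMP, TCP, and UDP; map every other protocol to a single bin."""
    if not 0 <= protocol <= 255:
        raise ValueError("protocol number out of range")
    if protocol in PRESERVED_PROTOCOLS:
        return protocol
    return OTHER_PROTOCOL_BIN


def bin_port(port: int) -> int:
    """Map a port number to the low (0-1023) or high (1024-65535) bin."""
    if not 0 <= port <= 65535:
        raise ValueError("port number out of range")
    return LOW_PORT_BIN if port <= 1023 else HIGH_PORT_BIN


if __name__ == "__main__":
    # TCP is preserved, GRE (47) falls into the shared bin,
    # port 443 is "low" and port 51234 is "high".
    print(bin_protocol(6), bin_protocol(47), bin_port(443), bin_port(51234))
```

Because both functions are fixed lookups with no secret parameters, they behave as generalization rather than substitution: many original values collapse onto one label and cannot be recovered from the output.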
802 Binning other flow key fields to a single bin is equivalent to black- 803 marker anonymization. Removal of other flow key information is only 804 recommended for analysis tasks which have no need to differentiate 805 flows on the removed keys, for example for total traffic counts or 806 unique counts of other flow keys. 808 4.5.2. Permutation 810 Permutation is a direct substitution technique, replacing each value 811 with a value selected from the set of possible values, such that each 812 anonymized value represents a unique original value. This is used to 813 preserve the count of unique values without preserving information 814 about, or the ordering of, the values themselves. 816 While permutation ideally guarantees that each anonymized value 817 represents a unique original value, such a guarantee may require significant 818 state in the Intermediate Anonymization Process. Therefore, 819 permutation may be implemented by hashing for performance reasons, 820 with hash functions that may have relatively small collision 821 probabilities. Such techniques are still essentially direct 822 substitution techniques, despite the nonzero error probability. 824 5. Parameters for the Description of Anonymization Techniques 826 This section details the abstract parameters used to describe the 827 anonymization techniques examined in the previous section, on a per- 828 parameter basis. These parameters and their export safety inform the 829 design of the IPFIX anonymization metadata export specified in the 830 following section. 832 5.1. Stability 834 A stable anonymization will always map a given value in the real 835 space to a given value in the anonymized space, while an unstable 836 anonymization will change this mapping over time; a completely 837 unstable anonymization is essentially indistinguishable from black- 838 marker anonymization. Any given anonymization technique may be 839 applied with a varying range of stability.
Stability is important 840 for assessing the comparability of anonymized information in 841 different data sets, or in the same data set over different time 842 periods. In practice, an anonymization may also be stable for every 843 data set published by a particular producer to a particular 844 consumer, stable for a stated time period within a dataset or across 845 datasets, or stable only for a single data set. 847 If no information about stability is available, users of anonymized 848 data MAY assume that the techniques used are stable across the entire 849 dataset, but unstable across datasets. Note that stability presents 850 a risk-utility tradeoff, as completely stable anonymization can be 851 used for longer-term trend analysis tasks but also presents more risk 852 of attack given the stable mapping. Information about the stability 853 of a mapping SHOULD be exported along with the anonymized data. 855 5.2. Truncation Length 857 Truncation and precision degradation are described by the truncation 858 length, or the amount of data still remaining in the anonymized field 859 after anonymization. 861 Truncation length can generally be inferred from a given data set, 862 and need not be specially exported or protected. For bit-level 863 truncation, the truncated bits are generally inferable from the least 864 significant bit set for an instance of an Information Element 865 described by a given Template (or the most significant bit set, in 866 the case of reverse truncation). For precision degradation, the 867 truncation is inferable from the maximum precision given. Note that 868 while this inference method is generally applicable, it is data- 869 dependent: there is no guarantee that it will recover the exact 870 truncation length used to prepare the data. 872 In the special case of IP address export with variable (per-record) 873 truncation, the truncation MAY be expressed by exporting the prefix 874 length alongside the address. 876 5.3.
Bin Map 878 Binning is described by the specification of a bin mapping function. 879 This function can be generally expressed in terms of an associative 880 array that maps each point in the original space to a bin, although 881 from an implementation standpoint most bin functions are much simpler 882 and more efficient. 884 Since the bin map for a bin mapping function is in essence the bin 885 mapping key, and can be used to partially deanonymize binned data, 886 depending on the degree of generalization, information about the bin 887 mapping function SHOULD NOT be exported. 889 5.4. Permutation 891 Like binning, permutation is described by the specification of a 892 permutation function. In the general case, this can be expressed in 893 terms of an associative array that maps each point in the original 894 space to a point in the anonymized space. Unlike binning, each point 895 in the anonymized space corresponds to a single, unique point in the 896 original space. 898 Since the parameters of the permutation function are in essence key- 899 like (indeed, for cryptographic permutation functions, they are the 900 keys themselves), information about the permutation function or its 901 parameters SHOULD NOT be exported. 903 5.5. Shift Amount 905 Shifting requires an amount to shift each value by. Since the shift 906 amount is the only key to a shift function, and can be used to 907 trivially deanonymize data protected by shifting, information about 908 the shift amount SHOULD NOT be exported. 910 6. Anonymization Export Support in IPFIX 912 Anonymized data exported via IPFIX SHOULD be annotated with 913 anonymization metadata, which details which fields described by which 914 Templates are anonymized, and provides appropriate information on the 915 anonymization techniques used. 
This metadata SHOULD be exported in 916 Data Records described by the recommended Options Templates described 917 in this section; these Options Templates use the additional 918 Information Elements described in the following subsection. 920 Note that fields anonymized using the black-marker (removal) 921 technique do not require any special metadata support: black-marker 922 anonymized fields SHOULD NOT be exported at all; this is done by omitting the 923 corresponding Information Elements from the Template describing the Data 924 Set. In the case where application requirements dictate that a black- 925 marker anonymized field must remain in a Template, an Exporting 926 Process MAY export black-marker anonymized fields with their native 927 length as all-zeros, but only in cases where enough contextual 928 information exists within the record to differentiate a black-marker 929 anonymized field exported in this way from a real zero value. 931 6.1. Anonymization Records and the Anonymization Options Template 933 The Anonymization Options Template describes Anonymization Records, 934 which allow anonymization metadata to be exported inline over IPFIX 935 or stored in an IPFIX File, by binding information about 936 anonymization techniques to Information Elements within defined 937 Templates or Options Templates. IPFIX Exporting Processes SHOULD 938 export anonymization records for any Template describing exported 939 anonymized Data Records; IPFIX Collecting Processes and processes 940 downstream from them MAY use anonymization records to treat 941 anonymized data differently depending on the applied technique. 943 Anonymization Records contain ancillary information bound to a 944 Template, so many of the considerations for Templates apply to 945 Anonymization Records as well.
First, reliability is important: an 946 Exporting Process SHOULD export Anonymization Records after the 947 Templates they describe have been exported, and SHOULD export 948 anonymization records reliably if supported by the underlying 949 transport (i.e., without partial reliability when using SCTP). 951 Anonymization Records MUST be handled by Collecting Processes as 952 scoped to the Template to which they apply within the Transport 953 Session in which they are sent. When a Template is withdrawn via a 954 Template Withdrawal Message or expires during a UDP transport 955 session, the accompanying Anonymization Records are withdrawn or 956 expire as well, and do not apply to subsequent Templates with the 957 same Template ID within the Session unless re-exported. 959 The Stability Class within the anonymizationFlags IE can be used to 960 declare that a given anonymization technique's mapping will remain 961 stable across multiple sessions, but this does not mean that 962 anonymization technique information given in the Anonymization 963 Records themselves persists across Sessions. Each new Transport 964 Session MUST contain new Anonymization Records for each Template 965 describing anonymized Data Sets. 967 SCTP per-stream export [I-D.ietf-ipfix-export-per-sctp-stream] may be 968 used to ease management of Anonymization Records if appropriate for 969 the application. 971 The fields of the Anonymization Options Template are as follows: 973 +-------------------------+-----------------------------------------+ 974 | IE | Description | 975 +-------------------------+-----------------------------------------+ 976 | templateId [scope] | The Template ID of the Template or | 977 | | Options Template containing the | 978 | | Information Element described by this | 979 | | anonymization record. This Information | 980 | | Element MUST be defined as a Scope | 981 | | Field.
| 982 | informationElementId | The Information Element identifier of | 983 | [scope] | the Information Element described by | 984 | | this anonymization record. This | 985 | | Information Element MUST be defined as | 986 | | a Scope Field. Exporting Processes | 987 | | MUST clear the Enterprise bit of the | 988 | | informationElementId and Collecting | 989 | | Processes SHOULD ignore it; information | 990 | | about enterprise-specific Information | 991 | | Elements is exported via the | 992 | | privateEnterpriseNumber Information | 993 | | Element. | 994 | privateEnterpriseNumber | The Private Enterprise Number of the | 995 | [scope] [optional] | enterprise-specific Information Element | 996 | | described by this anonymization record. | 997 | | This Information Element MUST be | 998 | | defined as a Scope Field if present. A | 999 | | privateEnterpriseNumber of 0 signifies | 1000 | | that the Information Element is | 1001 | | IANA-registered. | 1002 | informationElementIndex | The Information Element index of the | 1003 | [scope] [optional] | instance of the Information Element | 1004 | | described by this anonymization record | 1005 | | identified by the informationElementId | 1006 | | within the Template. Optional; need | 1007 | | only be present when describing | 1008 | | Templates that have multiple instances | 1009 | | of the same Information Element. This | 1010 | | Information Element MUST be defined as | 1011 | | a Scope Field if present. This | 1012 | | Information Element is defined in | 1013 | | Section 6.2.1, below. | 1014 | anonymizationFlags | Flags describing the mapping stability | 1015 | | and specialized modifications to the | 1016 | | Anonymization Technique in use. SHOULD | 1017 | | be present. This Information Element | 1018 | | is defined in Section 6.2.3, below. | 1019 | anonymizationTechnique | The technique used to anonymize the | 1020 | | data. MUST be present.
This | 1021 | | Information Element is defined in | 1022 | | Section 6.2.2, below. | 1023 +-------------------------+-----------------------------------------+ 1025 6.2. Recommended Information Elements for Anonymization Metadata 1027 6.2.1. informationElementIndex 1029 Description: A zero-based index of an Information Element 1030 referenced by informationElementId within a Template referenced by 1031 templateId; used to disambiguate scope for templates containing 1032 multiple identical Information Elements. 1034 Abstract Data Type: unsigned16 1036 Data Type Semantics: identifier 1038 ElementId: TBD3 1040 Status: Current 1042 6.2.2. anonymizationTechnique 1044 Description: A description of the anonymization technique applied 1045 to a referenced Information Element within a referenced Template. 1046 Each technique may be applicable only to certain Information 1047 Elements and recommended only for certain Information Elements; 1048 these restrictions are noted in the table below. 1050 +-------+---------------------------+-----------------+-------------+ 1051 | Value | Description | Applicable to | Recommended | 1052 | | | | for | 1053 +-------+---------------------------+-----------------+-------------+ 1054 | 0 | Undefined: the Exporting | all | all | 1055 | | Process makes no | | | 1056 | | representation as to | | | 1057 | | whether the defined field | | | 1058 | | is anonymized or not. | | | 1059 | | While the Collecting | | | 1060 | | Process MAY assume that | | | 1061 | | the field is not | | | 1062 | | anonymized, it is not | | | 1063 | | guaranteed not to be. | | | 1064 | | This is the default | | | 1065 | | anonymization technique. | | | 1066 | 1 | None: the values exported | all | all | 1067 | | are real. | | | 1068 | 2 | Precision | all | all | 1069 | | Degradation/Truncation: | | | 1070 | | the values exported are | | | 1071 | | anonymized using simple | | | 1072 | | precision degradation or | | | 1073 | | truncation.
The new | | | 1074 | | precision or number of | | | 1075 | | truncated bits is | | | 1076 | | implicit in the exported | | | 1077 | | data, and can be deduced | | | 1078 | | by the Collecting | | | 1079 | | Process. | | | 1080 | 3 | Binning: the values | all | all | 1081 | | exported are anonymized | | | 1082 | | into bins. | | | 1083 | 4 | Enumeration: the values | all | timestamps | 1084 | | exported are anonymized | | | 1085 | | by enumeration. | | | 1086 | 5 | Permutation: the values | all | identifiers | 1087 | | exported are anonymized | | | 1088 | | by permutation. | | | 1089 | 6 | Structured Permutation: | addresses | | 1090 | | the values exported are | | | 1091 | | anonymized by | | | 1092 | | permutation, preserving | | | 1093 | | bit-level structure as | | | 1094 | | appropriate; this | | | 1095 | | represents | | | 1096 | | prefix-preserving IP | | | 1097 | | address anonymization or | | | 1098 | | structured MAC address | | | 1099 | | anonymization. | | | 1100 | 7 | Reverse Truncation: the | addresses | | 1101 | | values exported are | | | 1102 | | anonymized using reverse | | | 1103 | | truncation. The number | | | 1104 | | of truncated bits is | | | 1105 | | implicit in the exported | | | 1106 | | data, and can be deduced | | | 1107 | | by the Collecting | | | 1108 | | Process. | | | 1109 | 8 | Noise: the values | non-identifiers | counters | 1110 | | exported are anonymized | | | 1111 | | by adding random noise to | | | 1112 | | each value. | | | 1113 | 9 | Offset: the values | all | timestamps | 1114 | | exported are anonymized | | | 1115 | | by adding a single offset | | | 1116 | | to all values. | | | 1117 +-------+---------------------------+-----------------+-------------+ 1119 Abstract Data Type: unsigned16 1121 Data Type Semantics: identifier 1123 ElementId: TBD2 1125 Status: Current 1127 6.2.3. 
anonymizationFlags 1129 Description: A flag word describing specialized modifications to 1130 the anonymization policy in effect for the anonymization technique 1131 applied to a referenced Information Element within a referenced 1132 Template. When flags are clear (0), the normal policy (as 1133 described by anonymizationTechnique) applies without modification. 1135 MSB 14 13 12 11 10 9 8 7 6 5 4 3 2 1 LSB 1136 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ 1137 | Reserved |LOR|PmA| SC | 1138 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ 1140 anonymizationFlags IE 1142 +--------+----------+-----------------------------------------------+ 1143 | bit(s) | name | description | 1144 | (LSB = | | | 1145 | 0) | | | 1146 +--------+----------+-----------------------------------------------+ 1147 | 0-1 | SC | Stability Class: see the Stability Class | 1148 | | | table below, and Section 5.1. | 1149 | 2 | PmA | Perimeter Anonymization: when set (1), | 1150 | | | source- Information Elements as described in | 1151 | | | [RFC5103] are interpreted as external | 1152 | | | addresses, and destination- Information | 1153 | | | Elements as described in [RFC5103] are | 1154 | | | interpreted as internal addresses, for the | 1155 | | | purposes of associating | 1156 | | | anonymizationTechnique to Information | 1157 | | | Elements only; see Section 7.2.2 for details. | 1158 | | | This bit MUST NOT be set when associated with | 1159 | | | a non-endpoint (i.e., neither source- nor | 1160 | | | destination-) Information Element. SHOULD be | 1161 | | | consistent within a record (i.e., if a | 1162 | | | source- Information Element has this flag | 1163 | | | set, the corresponding destination- element | 1164 | | | SHOULD have this flag set, and vice-versa.) | 1165 | 3 | LOR | Low-Order Unchanged: when set (1), the | 1166 | | | low-order bits of the anonymized Information | 1167 | | | Element contain real data.
This modification | 1168 | | | is intended for the anonymization of | 1169 | | | network-level addresses while leaving | 1170 | | | host-level addresses intact in order to | 1171 | | | preserve host-level structure, which could | 1172 | | | otherwise be used to reverse anonymization. | 1173 | | | MUST NOT be set when associated with a | 1174 | | | truncation-based anonymizationTechnique. | 1175 | 4-15 | Reserved | Reserved for future use: SHOULD be cleared | 1176 | | | (0) by the Exporting Process and MUST be | 1177 | | | ignored by the Collecting Process. | 1178 +--------+----------+-----------------------------------------------+ 1180 The Stability Class portion of this flags word describes the 1181 stability class of the anonymization technique applied to a 1182 referenced Information Element within a referenced Template. 1183 Stability classes refer to the stability of the parameters of the 1184 anonymization technique, and therefore the comparability of the 1185 mapping between the real and anonymized values over time. This 1186 determines which anonymized datasets may be compared with each 1187 other. Values are as follows: 1189 +-----+-----+-------------------------------------------------------+ 1190 | Bit | Bit | Description | 1191 | 1 | 0 | | 1192 +-----+-----+-------------------------------------------------------+ 1193 | 0 | 0 | Undefined: the Exporting Process makes no | 1194 | | | representation as to how stable the mapping is, or | 1195 | | | over what time period values of this field will | 1196 | | | remain comparable; while the Collecting Process MAY | 1197 | | | assume Session level stability, Session level | 1198 | | | stability is not guaranteed. Processes SHOULD assume | 1199 | | | this is the case in the absence of stability class | 1200 | | | information; this is the default stability class.
| 1201 | 0 | 1 | Session: the Exporting Process will ensure that the | 1202 | | | parameters of the anonymization technique are stable | 1203 | | | during the Transport Session. All the values of the | 1204 | | | described Information Element for each Record | 1205 | | | described by the referenced Template within the | 1206 | | | Transport Session are comparable. The Exporting | 1207 | | | Process SHOULD endeavour to ensure at least this | 1208 | | | stability class. | 1209 | 1 | 0 | Exporter-Collector Pair: the Exporting Process will | 1210 | | | ensure that the parameters of the anonymization | 1211 | | | technique are stable across Transport Sessions over | 1212 | | | time with the given Collecting Process, but may use | 1213 | | | different parameters for different Collecting | 1214 | | | Processes. Data exported to different Collecting | 1215 | | | Processes are not comparable. | 1216 | 1 | 1 | Stable: the Exporting Process will ensure that the | 1217 | | | parameters of the anonymization technique are stable | 1218 | | | across Transport Sessions over time, regardless of | 1219 | | | the Collecting Process to which it is sent. | 1220 +-----+-----+-------------------------------------------------------+ 1222 Abstract Data Type: unsigned16 1224 Data Type Semantics: flags 1226 ElementId: TBD1 1228 Status: Current 1230 7. Applying Anonymization Techniques to IPFIX Export and Storage 1232 When exporting or storing anonymized flow data using IPFIX, certain 1233 interactions between the IPFIX Protocol and the anonymization 1234 techniques in use must be considered; these are treated in the 1235 subsections below. 1237 7.1. Arrangement of Processes in IPFIX Anonymization 1239 Anonymization may be applied to IPFIX data at three stages within the 1240 collection infrastructure: on initial export, at a mediator, or after 1241 collection, as shown in Figure 1. Each of these locations has 1242 specific considerations and applicability. 

    +==========================================+
    |             Exporting Process            |
    +==========================================+
      |                                      |
      |  (Anonymized at Original Exporter)   |
      V                                      |
    +=============================+          |
    |           Mediator          |          |
    +=============================+          |
      |                                      |
      | (Anonymising Mediator)               |
      V                                      V
    +==========================================+
    |            Collecting Process            |
    +==========================================+
            |
            | (Anonymising CP/File Writer)
            V
    +--------------------+
    | IPFIX File Storage |
    +--------------------+

            Figure 1: Potential Anonymization Locations

   Anonymization is generally performed before the wider dissemination
   or repurposing of a flow data set, e.g., adapting operational
   measurement data for research. Therefore, direct anonymization of
   flow data on initial export is only applicable in certain restricted
   circumstances: when the Exporting Process (EP) is "publishing" data
   to a Collecting Process (CP) directly, and the Exporting Process and
   Collecting Process are operated by different entities. Note that
   certain guidelines in Section 7.2.3 with respect to timestamp
   anonymization may not apply in this case, as the Collecting Process
   may be able to deduce certain timing information from the time at
   which each Message is received.

   A much more flexible arrangement is to anonymize data within a
   Mediator [I-D.ietf-ipfix-mediators-framework]. Here, original data
   is sent to a Mediator, which performs the anonymization function and
   re-exports the anonymized data. Such a Mediator could be located at
   the administrative domain boundary of the initial Exporting Process
   operator, exporting anonymized data to other consumers outside the
   organization.
   In this case, the original Exporter SHOULD use TLS [RFC5246] as
   specified in [RFC5101] to secure the channel to the Mediator, and
   the Mediator should follow the guidelines in Section 7.2, to
   mitigate the risk of original data disclosure.

   When data is to be published as an anonymized data set in an IPFIX
   File [RFC5655], the anonymization may be done at the final Collecting
   Process before storage and dissemination, as well. In this case, the
   Collector should follow the guidelines in Section 7.2, especially as
   regards File-specific Options in Section 7.2.4.

   In each of these data flows, the anonymization of records is
   undertaken by an Intermediate Anonymization Process (IAP); the data
   flows into and out of this IAP are shown in Figure 2 below.

   packets --+                     +- IPFIX Messages -+
             |                     |                  |
             V                     V                  V
   +==================+ +====================+ +=============+
   | Metering Process | | Collecting Process | | File Reader |
   +==================+ +====================+ +=============+
             |      Non-anonymized | Records          |
             V                     V                  V
   +=========================================================+
   |        Intermediate Anonymization Process (IAP)         |
   +=========================================================+
             | Anonymized        ^ Anonymized         |
             | Records           | Records            |
             V                   |                    V
   +===================+     Anonymization     +=============+
   | Exporting Process |<--- Parameters ------>| File Writer |
   +===================+                       +=============+
             |                                        |
             +------------> IPFIX Messages <----------+

         Figure 2: Data flows through the anonymization process

   Anonymization parameters must also be available to the Exporting
   Process and/or File Writer in order to ensure header data is also
   appropriately anonymized as in Section 7.2.3.

   Following each of the data flows through the IAP, we describe five
   basic types of anonymization arrangements within this framework in
   Figure 3.
   In addition to the three arrangements described in detail above,
   anonymization can also be done at a collocated Metering Process (MP)
   and File Writer (FW) (see section 7.3.2 of [RFC5655]), or at a file
   manipulator, which combines a File Writer with a File Reader (FR)
   (see section 7.3.7 of [RFC5655]).

            +----+  +-----+  +----+
    pkts -> | MP |->| IAP |->| EP |-> Anonymization on Original Exporter
            +----+  +-----+  +----+
            +----+  +-----+  +----+
    pkts -> | MP |->| IAP |->| FW |-> Anonymising collocated MP/File Writer
            +----+  +-----+  +----+
            +----+  +-----+  +----+
   IPFIX -> | CP |->| IAP |->| EP |-> Anonymising Mediator (Masq. Proxy)
            +----+  +-----+  +----+
            +----+  +-----+  +----+
   IPFIX -> | CP |->| IAP |->| FW |-> Anonymising collocated CP/File Writer
            +----+  +-----+  +----+
            +----+  +-----+  +----+
   IPFIX -> | FR |->| IAP |->| FW |-> Anonymising file manipulator
   File     +----+  +-----+  +----+

      Figure 3: Possible anonymization arrangements in the IPFIX
                             architecture

   Note that anonymization may occur at more than one location within a
   given collection infrastructure, to provide varying levels of
   anonymization, disclosure risk, or data utility for specific
   purposes.

7.2. IPFIX-Specific Anonymization Guidelines

   In implementing and deploying the anonymization techniques described
   in this document, implementors should note that IPFIX already
   provides features that support anonymized data export, and use these
   where appropriate. Care must also be taken that data structures
   supporting the operation of the protocol itself do not leak data that
   could be used to reverse the anonymization applied to the flow data.
   Such data structures may appear in the header, or within the data
   stream itself, especially as options data. Each of these and their
   impact on specific anonymization techniques is noted in a separate
   subsection below.

7.2.1. Appropriate Use of Information Elements for Anonymized Data

   Note, as in Section 6 above, that black-marker anonymized fields
   SHOULD NOT be exported at all; the absence of the field in a given
   Data Set is implicitly declared by not including the corresponding
   Information Element in the Template describing that Data Set.

   When using precision degradation of timestamps, Exporting Processes
   SHOULD export timing information using Information Elements of an
   appropriate precision, as explained in Section 4.5 of [RFC5153]. For
   example, timestamps measured in millisecond-level precision and
   degraded to second-level precision should use flowStartSeconds and
   flowEndSeconds, not flowStartMilliseconds and flowEndMilliseconds.

   When exporting anonymized data and anonymization metadata, Exporting
   Processes SHOULD ensure that each combination of Information Element
   and declared anonymization technique is compatible. Specifically,
   the applicable and recommended Information Element types and
   semantics for each technique are noted in the description of the
   anonymizationTechnique Information Element in Section 6.2.2. In this
   description, a timestamp is an Information Element with the data type
   dateTimeSeconds, dateTimeMilliseconds, dateTimeMicroseconds, or
   dateTimeNanoseconds; an address is an Information Element with the
   data type ipv4Address, ipv6Address, or macAddress; and an identifier
   is an Information Element with identifier data type semantics.
   Exporting Processes MUST NOT export Anonymization Options records
   binding techniques to Information Elements to which they are not
   applicable, and SHOULD NOT export Anonymization Options records
   binding techniques to Information Elements for which they are not
   recommended.

7.2.2. Export of Perimeter-Based Anonymization Policies

   Data collected from a single network may require different
   anonymization policies for addresses internal and external to the
   network. For example, internal addresses could be subject to simple
   permutation, while external addresses could be aggregated into
   networks by truncation. When exporting anonymized perimeter
   bidirectional flow (biflow) data as in section 5.2 of [RFC5103], this
   arrangement may be easily represented by specifying one technique for
   source endpoint information (which represents the external endpoint
   in a perimeter biflow) and one technique for destination endpoint
   information (which represents the internal address in a perimeter
   biflow).

   However, it can also be useful to represent perimeter-based
   anonymization policies with unidirectional flow (uniflow) or
   non-perimeter biflow data. In this case, the Perimeter Anonymization
   bit (bit 2) in the anonymizationFlags Information Element describing
   the anonymized address Information Elements can be set to change the
   meaning of "source" and "destination" of Information Elements to mean
   "external" and "internal" as with perimeter biflows, but only with
   respect to anonymization policies.

7.2.3. Anonymization of Header Data

   Each IPFIX Message contains a Message Header; this Message Header
   contains two fields that may be used to break certain anonymization
   techniques: the Export Time and the Observation Domain ID.

   When exporting IPFIX Messages containing anonymized timestamp data,
   where the original Export Time has some relationship to the
   anonymized timestamps, the Exporting Process SHOULD anonymize the
   Export Time header field so that the Export Time is consistent with
   the anonymized timestamp data.
   Otherwise, relationships between export and flow time could be used
   to partially or totally reverse timestamp anonymization. When
   anonymising timestamps and the Export Time header field, Exporting
   Processes SHOULD avoid times too far in the past or future; while
   [RFC5101] does not make any allowance for Export Time error
   detection, Collecting Processes may reasonably interpret Messages
   with seemingly nonsensical Export Times as erroneous. Specific
   limits are implementation-dependent, but this may cause
   interoperability problems when anonymising the Export Time header
   field.

   The similarity in size between an Observation Domain ID and an IPv4
   address (32 bits) may lead to a temptation to use an IPv4 interface
   address on the Metering or Exporting Process as the Observation
   Domain ID. If this address bears some relation to the IP addresses
   in the flow data (e.g., shares a network prefix with internal
   addresses) and the IP addresses in the flow data are anonymized in a
   structure-preserving way, then the Observation Domain ID may be used
   to break the IP address anonymization. Use of an IPv4 interface
   address on the Metering or Exporting Process as the Observation
   Domain ID is NOT RECOMMENDED in this case.

7.2.4. Anonymization of Options Data

   IPFIX uses the Options mechanism to export, among other things,
   metadata about exported flows and the flow collection infrastructure.
   As with the IPFIX Message Header, certain Options recommended in
   [RFC5101] and [RFC5655] containing flow timestamps and network
   addresses of Exporting and Collecting Processes may be used to break
   certain anonymization techniques. When using these Options along
   with anonymized data export and storage, values within the Options
   which could be used to break the anonymization SHOULD themselves be
   anonymized or omitted.
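
   The prefix relationship that makes an Observation Domain ID
   (Section 7.2.3) or an Exporting or Collecting Process address (this
   section) dangerous can be checked mechanically. The following
   Python sketch is illustrative only: the function name
   shares_internal_prefix and the example prefix are assumptions made
   here, not part of IPFIX, and a real deployment would need to supply
   its own list of internal networks.

```python
import ipaddress


def shares_internal_prefix(value, internal_prefixes):
    """Return True if `value` lies inside any of `internal_prefixes`.

    `value` may be a 32-bit integer (for example an Observation Domain
    ID) or a dotted-quad string (for example an Exporting Process
    address). Both the name and the interface are hypothetical.
    """
    address = ipaddress.IPv4Address(value)
    return any(address in ipaddress.IPv4Network(prefix)
               for prefix in internal_prefixes)


# Assumed deployment: one internal network, and an Observation Domain
# ID that was copied from an interface address on that network.
INTERNAL_PREFIXES = ["198.51.100.0/24"]
leaky_domain_id = int(ipaddress.IPv4Address("198.51.100.1"))
opaque_domain_id = 1

print(shares_internal_prefix(leaky_domain_id, INTERNAL_PREFIXES))
print(shares_internal_prefix(opaque_domain_id, INTERNAL_PREFIXES))
print(shares_internal_prefix("198.51.100.1", INTERNAL_PREFIXES))
```

   A value for which this check succeeds would, under the guidance
   above, be a candidate for anonymization or omission whenever the
   flow addresses are anonymized in a structure-preserving way.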

   The Exporting Process Reliability Statistics Options Template,
   recommended in [RFC5101], contains an Exporting Process ID field,
   which may be an exportingProcessIPv4Address Information Element or an
   exportingProcessIPv6Address Information Element. If the Exporting
   Process address bears some relation to the IP addresses in the flow
   data (e.g., shares a network prefix with internal addresses) and the
   IP addresses in the flow data are anonymized in a structure-
   preserving way, then the Exporting Process address may be used to
   break the IP address anonymization. Exporting Processes exporting
   anonymized data in this situation SHOULD mitigate the risk of attack
   either by omitting Options described by the Exporting Process
   Reliability Statistics Options Template, or by anonymising the
   Exporting Process address using a similar technique to that used to
   anonymize the IP addresses in the exported data.

   Similarly, the Export Session Details Options Template and Message
   Details Options Template specified for the IPFIX File Format
   [RFC5655] may contain the exportingProcessIPv4Address Information
   Element or the exportingProcessIPv6Address Information Element to
   identify an Exporting Process from which a flow record was received,
   and the collectingProcessIPv4Address Information Element or the
   collectingProcessIPv6Address Information Element to identify the
   Collecting Process which received it. If the Exporting Process or
   Collecting Process address bears some relation to the IP addresses in
   the data set (e.g., shares a network prefix with internal addresses)
   and the IP addresses in the data set are anonymized in a structure-
   preserving way, then the Exporting Process or Collecting Process
   address may be used to break the IP address anonymization.
   Since these Options Templates are primarily intended for storing
   IPFIX Transport Session data for auditing, replay, and testing
   purposes, and in order to mitigate the risk of attack, it is NOT
   RECOMMENDED that storage of anonymized data include these Options
   Templates.

   The Message Details Options Template specified for the IPFIX File
   Format [RFC5655] also contains the collectionTimeMilliseconds
   Information Element. As with the Export Time Message Header field,
   if the exported data set contains anonymized timestamp information,
   and the collectionTimeMilliseconds Information Element in a given
   Message has some relationship to the anonymized timestamp
   information, then this relationship can be exploited to reverse the
   timestamp anonymization. Since this Options Template is primarily
   intended for storing IPFIX Transport Session data for auditing,
   replay, and testing purposes, and in order to mitigate the risk of
   attack, it is NOT RECOMMENDED that storage of anonymized data
   include this Options Template.

   Since the Time Window Options Template specified for the IPFIX File
   Format [RFC5655] refers to the timestamps within the data set to
   provide partial table of contents information for an IPFIX File,
   Options described by this template SHOULD be written using the
   anonymized timestamps instead of the original ones.

7.2.5. Special-Use Address Space Considerations

   When anonymising data for transport or storage using IPFIX containing
   anonymized IP addresses, and the analysis purpose permits doing so,
   it is RECOMMENDED to filter out or leave unanonymized data containing
   the special-use IPv4 addresses enumerated in [RFC5735] or the
   special-use IPv6 addresses enumerated in [RFC5156]. Data containing
   these addresses (e.g.
   0.0.0.0 and 169.254.0.0/16 for link-local autoconfiguration in IPv4
   space) are often associated with specific, well-known behavioral
   patterns. Detection of these patterns in anonymized data can lead to
   deanonymization of these special-use addresses, which increases the
   chance of a complete reversal of anonymization by an attacker,
   especially of prefix-preserving techniques.

7.2.6. Protecting Out-of-Band Configuration and Management Data

   Special care should be taken when exporting or sharing anonymized
   data to avoid information leakage via the configuration or management
   planes of the IPFIX Device containing the Exporting Process or the
   File Writer. For example, adding noise to counters is useless if the
   receiver can deduce the values in the counters from SNMP information,
   and concealing the network under test is similarly useless if such
   information is available in a configuration document. As the
   specifics of these concerns are largely implementation- and
   deployment-dependent, specific mitigation is out of scope for this
   document. The general ground rule is that information of similar
   type to that anonymized SHOULD NOT be made available to the receiver
   by any means, whether in the Data Records, in IPFIX protocol
   structures such as Message Headers, or out-of-band.

8. Examples

   In this example, consider the export or storage of an anonymized IPv4
   data set from a single network described by a simple template
   containing a timestamp in seconds, a five-tuple, and packet and octet
   counters. The template describing each record in this data set is
   shown in Figure 4.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Set ID = 2           |          Length = 40          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Template ID = 256       |        Field Count = 8        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| flowStartSeconds        150 |       Field Length = 4        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| sourceIPv4Address         8 |       Field Length = 4        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| destinationIPv4Address   12 |       Field Length = 4        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| sourceTransportPort       7 |       Field Length = 2        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| destinationTransportPort 11 |       Field Length = 2        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| packetDeltaCount          2 |       Field Length = 4        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| octetDeltaCount           1 |       Field Length = 4        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| protocolIdentifier        4 |       Field Length = 1        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 4: Example Flow Template

   Suppose that this data set is anonymized according to the following
   policy:

   o  IP addresses within the network are protected by reverse
      truncation.

   o  IP addresses outside the network are protected by prefix-
      preserving anonymization.

   o  Octet counts are exported using degraded precision in order to
      provide minimal protection against fingerprinting attacks.

   o  All other fields are exported unanonymized.
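
   Two of the techniques named in this policy are simple enough to
   sketch in a few lines. The Python fragment below is a non-normative
   illustration: the function names are hypothetical, and the
   parameters (keeping the 8 least significant address bits, rounding
   octet counts to the nearest 100) are assumptions chosen to agree
   with the values in the example message of Figure 8. Prefix-
   preserving anonymization requires a keyed mapping and is not shown.

```python
import ipaddress


def reverse_truncate(address, keep_bits=8):
    """Zero all but the `keep_bits` least significant bits of an IPv4
    address, removing the network part and keeping the host part."""
    mask = (1 << keep_bits) - 1
    value = int(ipaddress.IPv4Address(address)) & mask
    return str(ipaddress.IPv4Address(value))


def degrade_precision(count, step=100):
    """Round a counter to the nearest multiple of `step`, with halves
    rounded up, using integer arithmetic only."""
    return (count + step // 2) // step * step


# Internal address and octet counts taken from the example records.
print(reverse_truncate("198.51.100.7"))
for octets in (74, 2896, 2037):
    print(octets, "->", degrade_precision(octets))
```

   Integer arithmetic is used for the rounding so that the result does
   not depend on floating-point or round-half-to-even behavior.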

   To export Anonymization Records for this template and policy, the
   Anonymization Options Template shown in Figure 5 is exported first.
   For this example, the optional privateEnterpriseNumber and
   informationElementIndex Information Elements are omitted, because
   they are not used.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Set ID = 3           |          Length = 26          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Template ID = 257       |        Field Count = 4        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Scope Field Count = 2     |0| templateId              145 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Field Length = 2        |0| informationElementId    303 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Field Length = 2        |0| anonymizationFlags     TBD1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Field Length = 2        |0| anonymizationTechnique TBD2 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Field Length = 2        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 5: Example Anonymization Options Template

   Following the Anonymization Options Template comes a Data Set
   containing Anonymization Records. This Data Set has an entry for
   each Information Element Specifier in Template 256 describing the
   flow records. This Data Set is shown in Figure 6. Note that
   sourceIPv4Address and destinationIPv4Address have the Perimeter
   Anonymization (0x0004) flag set in anonymizationFlags, meaning that
   the source address should be treated as network-external, and the
   destination address as network-internal.
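
   The encoding of this Data Set can be sketched with Python's struct
   module. This is an illustration under stated assumptions rather
   than a reference encoder: the constant and function names are
   hypothetical, the flag values follow the anonymizationFlags layout
   (stability class in bits 0-1, Perimeter Anonymization in bit 2), and
   the technique codes are those used in this example. Each record is
   four unsigned 16-bit fields in network byte order, preceded by a
   4-octet Set Header.

```python
import struct

# anonymizationFlags: stability class (bits 0-1) and perimeter bit (bit 2)
SESSION = 0x0001
EXPORTER_COLLECTOR_PAIR = 0x0002
STABLE = 0x0003
PERIMETER = 0x0004

# anonymizationTechnique codes as they appear in this example
NOT_ANONYMIZED = 1
PRECISION_DEGRADATION = 2
STRUCTURED_PERMUTATION = 6
REVERSE_TRUNCATION = 7

FLOW_TEMPLATE_ID = 256      # template being described
OPTIONS_TEMPLATE_ID = 257   # Set ID of the Anonymization Records

# (informationElementId, anonymizationFlags, anonymizationTechnique)
RECORDS = [
    (150, 0, NOT_ANONYMIZED),                          # flowStartSeconds
    (8, PERIMETER | SESSION, STRUCTURED_PERMUTATION),  # sourceIPv4Address
    (12, PERIMETER | STABLE, REVERSE_TRUNCATION),      # destinationIPv4Address
    (7, 0, NOT_ANONYMIZED),                            # sourceTransportPort
    (11, 0, NOT_ANONYMIZED),                           # destinationTransportPort
    (2, 0, NOT_ANONYMIZED),                            # packetDeltaCount
    (1, STABLE, PRECISION_DEGRADATION),                # octetDeltaCount
    (4, 0, NOT_ANONYMIZED),                            # protocolIdentifier
]


def encode_anonymization_records(records):
    """Return the Data Set: a Set Header followed by one 8-octet
    record (templateId, informationElementId, flags, technique) each."""
    body = b"".join(
        struct.pack("!HHHH", FLOW_TEMPLATE_ID, ie, flags, technique)
        for ie, flags, technique in records
    )
    header = struct.pack("!HH", OPTIONS_TEMPLATE_ID, 4 + len(body))
    return header + body


data_set = encode_anonymization_records(RECORDS)
print(len(data_set), hex(PERIMETER | SESSION), hex(PERIMETER | STABLE))
```

   Eight records of 8 octets each plus the 4-octet Set Header give the
   Set length of 68 octets used in this example.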

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Set ID = 257          |          Length = 68          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Template 256          | flowStartSeconds       IE 150 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | no flags               0x0000 | Not Anonymized              1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Template 256          | sourceIPv4Address      IE   8 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Perimeter, Session SC  0x0005 | Structured Permutation      6 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Template 256          | destinationIPv4Address IE  12 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Perimeter, Stable      0x0007 | Reverse Truncation          7 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Template 256          | sourceTransportPort    IE   7 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | no flags               0x0000 | Not Anonymized              1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Template 256          | dest.TransportPort     IE  11 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | no flags               0x0000 | Not Anonymized              1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Template 256          | packetDeltaCount       IE   2 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | no flags               0x0000 | Not Anonymized              1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Template 256          | octetDeltaCount        IE   1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Stable                 0x0003 | Precision Degradation       2 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Template 256          | protocolIdentifier     IE   4 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | no flags               0x0000 | Not Anonymized              1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 6: Example Anonymization Records

   Following the Anonymization Records come the Data Sets containing the
   anonymized data, exported according to the template in Figure 4.
   Bringing it all together, consider an IPFIX Message containing three
   real data records and the necessary templates to export them, shown
   in Figure 7. (Note that the scale of this message is 8 bytes per
   line, for compactness; lines of dots '. . . . . ' represent shifting
   of the example bit structure for clarity.)

              1         2         3         4         5         6
    0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0x000a        | length 135    | export time 1271227717        | msg
   | sequence 0                    | domain 1                      | hdr
   | SetID 2       | length 40     | tid 256       | fields 8      | tmpl
   | IE 150        | length 4      | IE 8          | length 4      | set
   | IE 12         | length 4      | IE 7          | length 2      |
   | IE 11         | length 2      | IE 2          | length 4      |
   | IE 1          | length 4      | IE 4          | length 1      |
   | SetID 256     | length 79     | time 1271227681               | data
   | sip 192.0.2.3                 | dip 198.51.100.7              | set
   | sp 53         | dp 53         | packets 1                     |
   | bytes 74                      | prt 17  | . . . . . . . . . . .
   | time 1271227682               | sip 198.51.100.7              |
   | dip 192.0.2.88                | sp 5091       | dp 80         |
   | packets 60                    | bytes 2896                    |
   | prt 6   | . . . . . . . . . . . . . . . . . . . . . . . . . . .
   | time 1271227683               | sip 198.51.100.7              |
   | dip 203.0.113.9               | sp 5092       | dp 80         |
   | packets 44                    | bytes 2037                    |
   | prt 6   |
   +---------+

                    Figure 7: Example Real Message

   The corresponding anonymized message is then shown in Figure 8.
   The options template set describing Anonymization Records and the
   Anonymization Records themselves are added; IP addresses and byte
   counts are anonymized as declared.

              1         2         3         4         5         6
    0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0x000a        | length 229    | export time 1271227717        | msg
   | sequence 0                    | domain 1                      | hdr
   | SetID 2       | length 40     | tid 256       | fields 8      | tmpl
   | IE 150        | length 4      | IE 8          | length 4      | set
   | IE 12         | length 4      | IE 7          | length 2      |
   | IE 11         | length 2      | IE 2          | length 4      |
   | IE 1          | length 4      | IE 4          | length 1      |
   | SetID 3       | length 26     | tid 257       | fields 4      | opt
   | scope 2       | . . . . . . . . . . . . . . . . . . . . . . . . tmpl
   | IE 145        | length 2      | IE 303        | length 2      | set
   | IE TBD1       | length 2      | IE TBD2       | length 2      |
   | SetID 257     | length 68     | . . . . . . . . . . . . . . . . anon
   | tid 256       | IE 150        | flags 0       | tech 1        | recs
   | tid 256       | IE 8          | flags 5       | tech 6        |
   | tid 256       | IE 12         | flags 7       | tech 7        |
   | tid 256       | IE 7          | flags 0       | tech 1        |
   | tid 256       | IE 11         | flags 0       | tech 1        |
   | tid 256       | IE 2          | flags 0       | tech 1        |
   | tid 256       | IE 1          | flags 3       | tech 2        |
   | tid 256       | IE 4          | flags 0       | tech 1        |
   | SetID 256     | length 79     | time 1271227681               | data
   | sip 254.202.119.209           | dip 0.0.0.7                   | set
   | sp 53         | dp 53         | packets 1                     |
   | bytes 100                     | prt 17  | . . . . . . . . . . .
   | time 1271227682               | sip 0.0.0.7                   |
   | dip 254.202.119.6             | sp 5091       | dp 80         |
   | packets 60                    | bytes 2900                    |
   | prt 6   | . . . . . . . . . . . . . . . . . . . . . . . . . . .
   | time 1271227683               | sip 0.0.0.7                   |
   | dip 2.19.199.176              | sp 5092       | dp 80         |
   | packets 44                    | bytes 2000                    |
   | prt 6   |
   +---------+

              Figure 8: Corresponding Anonymized Message

9. Security Considerations

   This document provides guidelines for exporting metadata about
   anonymized data in IPFIX, or storing metadata about anonymized data
   in IPFIX Files. It is not intended as a general statement on the
   applicability of specific flow data anonymization techniques.
   Exporters or publishers of anonymized data must take care that the
   applied anonymization technique is appropriate for the data source,
   the purpose, and the risk of deanonymization of a given application.
   Research in anonymization techniques, and techniques for
   deanonymization, is ongoing, and currently "safe" anonymization
   techniques may be rendered unsafe by future developments.

   We note specifically that anonymization is not a replacement for
   encryption for confidentiality. It is only appropriate for
   protecting identifying information in data to be used for purposes in
   which the protected data is irrelevant. Confidentiality in export is
   best served by using TLS [RFC5246] or DTLS [RFC4347] as in the
   Security Considerations section of [RFC5101], and in long-term
   storage by implementation-specific protection applied as in the
   Security Considerations section of [RFC5655]. Indeed,
   confidentiality and anonymization are not mutually exclusive, as
   encryption for confidentiality may be applied to anonymized data
   export or storage, as well, when the anonymized data is not intended
   for public release.

   We note as well that care should be taken even with well-anonymized
   data, and anonymized data should still be treated as privacy-
   sensitive. Anonymization reduces the risk of misuse, but is not a
   complete solution to the problem of protecting end-user privacy in
   network flow trace analysis.

   When using pseudonymization techniques that have a mutable mapping,
   there is an inherent tradeoff in the stability of the map between
   long-term comparability and security of the data set against
   deanonymization. In general, deanonymization attacks are more
   effective given more information, so the longer a given mapping is
   valid, the more information can be applied to deanonymization. The
   specific details of this are technique-dependent and therefore out of
   the scope of this document.

   When releasing anonymized data, publishers need to ensure that data
   that could be used in deanonymization is not leaked through a side
   channel. The entire workflow (hardware, software, operational
   policies and procedures, etc.) for handling anonymized data must be
   evaluated for risk of data leakage. While most of these possible
   side channels are out of scope for this document, guidelines for
   reducing the risk of information leakage specific to the IPFIX export
   protocol are provided in Section 7.2.

   Note as well that the Security Considerations section of [RFC5101]
   applies as well to the export of anonymized data, and the Security
   Considerations section of [RFC5655] to the storage of anonymized
   data, or the publication of anonymized traces.

10. IANA Considerations

   This document specifies the creation of several new IPFIX Information
   Elements in the IPFIX Information Element registry located at
   http://www.iana.org/assignments/ipfix, as defined in Section 6.2
   above. IANA has assigned the following Information Element numbers
   for their respective Information Elements as specified below:

   o  Information Element number TBD1 for the anonymizationFlags
      Information Element.

   o  Information Element number TBD2 for the anonymizationTechnique
      Information Element.

   o  Information Element number TBD3 for the informationElementIndex
      Information Element.

   [NOTE for IANA: The text TBDn should be replaced with the respective
   assigned Information Element numbers where they appear in this
   document. Information Element numbers should be assigned outside the
   NetFlow V9 compatibility range, as these Information Elements are not
   supported by NetFlow V9.]

11. Acknowledgments

   We thank Paul Aitken and John McHugh for their comments and insight,
   and Carsten Schmoll, Benoit Claise, Lothar Braun, Dan Romascanu,
   Stewart Bryant, and Sean Turner for their reviews. Special thanks to
   the FP7 PRISM and DEMONS projects for their material support of this
   work.

12. References

12.1. Normative References

   [RFC5101]  Claise, B., "Specification of the IP Flow Information
              Export (IPFIX) Protocol for the Exchange of IP Traffic
              Flow Information", RFC 5101, January 2008.

   [RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
              Meyer, "Information Model for IP Flow Information Export",
              RFC 5102, January 2008.

   [RFC5103]  Trammell, B. and E. Boschi, "Bidirectional Flow Export
              Using IP Flow Information Export (IPFIX)", RFC 5103,
              January 2008.

   [RFC5655]  Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
              Wagner, "Specification of the IP Flow Information Export
              (IPFIX) File Format", RFC 5655, October 2009.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5735]  Cotton, M. and L. Vegoda, "Special Use IPv4 Addresses",
              BCP 153, RFC 5735, January 2010.

   [RFC5156]  Blanchet, M., "Special-Use IPv6 Addresses", RFC 5156,
              April 2008.

12.2. Informative References

   [RFC5470]  Sadasivan, G., Brownlee, N., Claise, B., and J. Quittek,
              "Architecture for IP Flow Information Export", RFC 5470,
              March 2009.
   [RFC5472]  Zseby, T., Boschi, E., Brownlee, N., and B. Claise, "IP
              Flow Information Export (IPFIX) Applicability", RFC 5472,
              March 2009.

   [I-D.ietf-ipfix-mediators-framework]
              Kobayashi, A., Claise, B., Muenz, G., and K. Ishibashi,
              "IPFIX Mediation: Framework",
              draft-ietf-ipfix-mediators-framework-09 (work in
              progress), October 2010.

   [I-D.ietf-ipfix-export-per-sctp-stream]
              Claise, B., Aitken, P., Johnson, A., and G. Muenz, "IPFIX
              Export per SCTP Stream",
              draft-ietf-ipfix-export-per-sctp-stream-08 (work in
              progress), May 2010.

   [RFC5153]  Boschi, E., Mark, L., Quittek, J., Stiemerling, M., and P.
              Aitken, "IP Flow Information Export (IPFIX) Implementation
              Guidelines", RFC 5153, April 2008.

   [RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
              "Requirements for IP Flow Information Export (IPFIX)",
              RFC 3917, October 2004.

   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, February 2006.

   [RFC4347]  Rescorla, E. and N. Modadugu, "Datagram Transport Layer
              Security", RFC 4347, April 2006.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

   [Bur10]    Burkhart, M., Schatzmann, D., Trammell, B., and E. Boschi,
              "The Role of Network Trace Anonymization Under Attack",
              ACM Computer Communication Review, vol. 40, no. 1, pp.
              6-11, January 2010.

   [Mur07]    Murdoch, S. and P. Zielinski, "Sampled Traffic Analysis by
              Internet-Exchange-Level Adversaries", Proceedings of the
              7th Workshop on Privacy Enhancing Technologies, Ottawa,
              Canada, June 2007.
Authors' Addresses

   Elisa Boschi
   Swiss Federal Institute of Technology Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Email: boschie@tik.ee.ethz.ch


   Brian Trammell
   Swiss Federal Institute of Technology Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Phone: +41 44 632 70 13
   Email: trammell@tik.ee.ethz.ch