IPFIX Working Group                                            E. Boschi
Internet-Draft                                               B. Trammell
Intended status: Experimental                             Hitachi Europe
Expires: October 1, 2009                                  March 30, 2009


                     IP Flow Anonymisation Support
                     draft-boschi-ipfix-anon-03.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on October 1, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document describes anonymisation techniques for IP flow data and
   the export of anonymised data using the IPFIX protocol. It provides
   a categorization of common anonymisation schemes and defines the
   parameters needed to describe them. It provides guidelines for the
   implementation of anonymised data export and storage over IPFIX, and
   describes an Options-based method for anonymisation metadata export
   within the IPFIX protocol, providing the basis for the definition of
   information models for configuring anonymisation techniques within an
   IPFIX Metering or Exporting Process, and for reporting the technique
   in use to an IPFIX Collecting Process.

Table of Contents

   1.  Open Issues
   2.  Introduction
     2.1.  IPFIX Protocol Overview
     2.2.  IPFIX Documents Overview
   3.  Terminology
   4.  Categorisation of Anonymisation Techniques
   5.  Anonymisation of IP Flow Data
     5.1.  IP Address Anonymisation
       5.1.1.  Truncation
       5.1.2.  Random Permutation
       5.1.3.  Prefix-preserving Pseudonymisation
     5.2.  Timestamp Anonymisation
       5.2.1.  Precision Degradation
       5.2.2.  Enumeration
       5.2.3.  Random Time Shifts
     5.3.  Counter Anonymisation
       5.3.1.  Precision Degradation
       5.3.2.  Binning
       5.3.3.  Random Noise Addition
     5.4.  Anonymisation of Other Flow Fields
       5.4.1.  Binning
       5.4.2.  Random Permutation
   6.  Parameters for the Description of Anonymisation Techniques
     6.1.  Stability
     6.2.  Truncation Length
     6.3.  Bin Map
     6.4.  Permutation
     6.5.  Shift Amount
   7.  Anonymisation Export Support in IPFIX
     7.1.  Anonymisation Options Template
     7.2.  Recommended Information Elements for Anonymisation Metadata
       7.2.1.  anonymisationStability
       7.2.2.  anonymisationTechnique
       7.2.3.  informationElementIndex
   8.  Applying Anonymisation Techniques to IPFIX Export and Storage
     8.1.  Arrangement of Processes in IPFIX Anonymisation
     8.2.  IPFIX-Specific Anonymisation Guidelines
       8.2.1.  Appropriate Use of Information Elements for Anonymised
               Data
       8.2.2.  Anonymisation of Header Data
       8.2.3.  Anonymisation of Options Data
   9.  Examples
   10. Security Considerations
   11. IANA Considerations
   12. Acknowledgments
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses

1.  Open Issues

   There is not yet a mechanism for exporting information about
   defined-time anonymisation stability.

   The terminology section is incomplete; we should decide which of the
   terms introduced in this document are to be treated as terminology.

   Between "classes" of techniques and "parameters", there may be
   "properties" as well; for example, binning and timestamp
   anonymisation may be "ordered" or not (x>y in real --> x>y in
   anonymised). We should verify that we're splitting these up
   correctly.

   In parallel with this, the anonymisationTechnique values might be
   useful as a bitfield, with properties and classes being represented
   by some set of the bits in the field. We'll have to make sure that
   the properties and classes are exhaustive, if we do this.

   Both anonymisationStability and anonymisationTechnique might benefit
   from the creation of IANA registries; however, in this case, it would
   be very important to ensure that such a registry contains only
   classes and properties of anonymised data, not information about
   specific algorithms.

   Certain technique/IE combinations (e.g. structure-preserving
   counters) don't make any sense; these should be noted in
   "IPFIX-Specific Anonymisation Guidelines".

   Guidelines should be provided for evaluating the anonymisation
   potential of _new_ IEs added to the IANA registry after the
   publication of this draft.

   This document does not cover the anonymisation of sub-IP level
   information, specifically MAC addresses. It should.

2.  Introduction

   The standardisation of an IP flow information export protocol
   [RFC5101] and associated representations removes a technical barrier
   to the sharing of IP flow data across organizational boundaries and
   with network operations, security, and research communities for a
   wide variety of purposes. However, with wider dissemination come
   greater risks to the privacy of the users of networks under
   measurement, and to the security of those networks. While it is not
   a complete solution to the issues posed by distribution of IP flow
   information, anonymisation is an important tool for the protection of
   privacy within network measurement infrastructures.

   This document presents a mechanism for representing anonymised data
   within IPFIX and guidelines for using it. It begins with a
   categorization of anonymisation techniques. It then describes the
   applicability of each technique to commonly anonymisable fields of IP
   flow data, organized by information element data type and semantics
   as in [RFC5102]; enumerates the parameters required by each of the
   applicable anonymisation techniques; and provides guidelines for the
   use of each of these techniques in accordance with best practices in
   data protection. Finally, it specifies a mechanism for exporting
   anonymised data and binding anonymisation metadata to templates using
   IPFIX Options.

2.1.  IPFIX Protocol Overview

   In the IPFIX protocol, { type, length, value } tuples are expressed
   in Templates containing { type, length } pairs, specifying which
   { value } fields are present in Data Records conforming to the
   Template, giving great flexibility as to what data is transmitted.
   Since Templates are sent very infrequently compared with Data
   Records, this results in significant bandwidth savings. Various
   different data formats may be transmitted simply by sending new
   Templates specifying the { type, length } pairs for the new data
   format. See [RFC5101] for more information.

   The IPFIX information model [RFC5102] defines a large number of
   standard Information Elements which provide the necessary { type }
   information for Templates. The use of standard elements enables
   interoperability among different vendors' implementations.
   Additionally, non-standard enterprise-specific elements may be
   defined for private use.

2.2.  IPFIX Documents Overview

   "Specification of the IPFIX Protocol for the Exchange of IP Traffic
   Flow Information" [RFC5101] and its associated documents define the
   IPFIX Protocol, which provides network engineers and administrators
   with access to IP traffic flow information.

   "Architecture for IP Flow Information Export"
   [I-D.ietf-ipfix-architecture] defines the architecture for the export
   of measured IP flow information out of an IPFIX Exporting Process to
   an IPFIX Collecting Process, and the basic terminology used to
   describe the elements of this architecture, per the requirements
   defined in "Requirements for IP Flow Information Export" [RFC3917].
   The IPFIX Protocol document [RFC5101] then covers the details of the
   method for transporting IPFIX Data Records and Templates via a
   congestion-aware transport protocol from an IPFIX Exporting Process
   to an IPFIX Collecting Process.
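
   As a rough illustration of the Template mechanism described in
   Section 2.1, the following Python sketch decodes the value fields of
   one Data Record using a list of { type, length } pairs. It is a
   simplification under stated assumptions: message and set headers,
   variable-length fields, and enterprise-specific elements are ignored,
   and the names TEMPLATE and decode_record are hypothetical. The
   Information Element numbers used (8 sourceIPv4Address, 12
   destinationIPv4Address, 2 packetDeltaCount) are those of the IPFIX
   information model.

```python
import ipaddress

# Hypothetical in-memory Template: (Information Element id, length in octets).
TEMPLATE = [(8, 4), (12, 4), (2, 8)]


def decode_record(template, payload):
    """Split one fixed-length Data Record into {ie_id: raw bytes}."""
    record, offset = {}, 0
    for ie_id, length in template:
        record[ie_id] = payload[offset:offset + length]
        offset += length
    if offset != len(payload):
        raise ValueError("payload length does not match template")
    return record


# Build a sample record in template order: source, destination, packets.
payload = (ipaddress.IPv4Address("192.0.2.1").packed
           + ipaddress.IPv4Address("198.51.100.7").packed
           + (42).to_bytes(8, "big"))

rec = decode_record(TEMPLATE, payload)
src = ipaddress.IPv4Address(rec[8])
dst = ipaddress.IPv4Address(rec[12])
packets = int.from_bytes(rec[2], "big")
```

   The Data Record itself carries only values; the Template supplies the
   types and lengths needed to interpret them.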

   "Information Model for IP Flow Information Export" [RFC5102]
   describes the Information Elements used by IPFIX, including details
   on Information Element naming, numbering, and data type encoding.
   Finally, "IPFIX Applicability" [I-D.ietf-ipfix-as] describes the
   various applications of the IPFIX protocol and their use of
   information exported via IPFIX, and relates the IPFIX architecture to
   other measurement architectures and frameworks.

   Additionally, the "Specification of the IPFIX File Format"
   [I-D.ietf-ipfix-file] describes a file format based upon the IPFIX
   Protocol for the storage of flow data.

   This document references the Protocol and Architecture documents for
   terminology, and extends the IPFIX Information Model to provide new
   Information Elements for anonymisation metadata. The anonymisation
   techniques described herein are equally applicable to the IPFIX
   Protocol and data stored in IPFIX Files.

3.  Terminology

   Terms used in this document that are defined in the Terminology
   section of the IPFIX Protocol [RFC5101] document are to be
   interpreted as defined there.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

4.  Categorisation of Anonymisation Techniques

   Anonymisation modifies a data set in order to protect the identity of
   the people or entities described by the data set from disclosure.
   With respect to network traffic data, anonymisation generally
   attempts to preserve some set of properties of the network traffic
   useful for a given application or applications, while ensuring the
   data cannot be traced back to the specific networks, hosts, or users
   generating the traffic.

   Anonymisation may be broadly classified according to two properties:
   recoverability and countability. All anonymisation techniques map
   the real space of identifiers or values into a separate, anonymised
   space, according to some function. A technique is said to be
   recoverable when the function used is invertible or can otherwise be
   reversed, so that a real identifier can be recovered from a given
   replacement identifier.

   Countability compares the dimension of the anonymised space (N) to
   the dimension of the real space (M), and denotes how the count of
   unique values is preserved by the anonymisation function. If the
   anonymised space is smaller than the real space, then the function is
   said to generalise the input, mapping more than one input point to
   each anonymous value (e.g., as with aggregation). By definition,
   generalisation is not recoverable.

   If the dimensions of the anonymised and real spaces are the same,
   such that the count of unique values is preserved, then the function
   is said to be a direct substitution function. If the dimension of
   the anonymised space is larger, such that each real value maps to a
   set of anonymised values, then the function is said to be a set
   substitution function. Note that with set substitution functions,
   the sets of anonymised values are not necessarily disjoint. Either
   direct or set substitution functions are said to be one-way if there
   exists no method for recovering the real data point from an
   anonymised one.

   This classification is summarised in the table below.

   +------------------------+-----------------+------------------------+
   | Recoverability /       | Recoverable     | Non-recoverable        |
   | Countability           |                 |                        |
   +------------------------+-----------------+------------------------+
   | N < M                  | N.A.            | Generalisation         |
   | N = M                  | Direct          | One-way Direct         |
   |                        | Substitution    | Substitution           |
   | N > M                  | Set             | One-way Set            |
   |                        | Substitution    | Substitution           |
   +------------------------+-----------------+------------------------+

5.  Anonymisation of IP Flow Data

   Due to the restricted semantics of IP flow data, there is a
   relatively limited set of specific anonymisation techniques available
   for flow data, though each falls into the broad categories above.
   Each type of field that may commonly appear in a flow record may have
   its own applicable specific techniques.

   While anonymisation is generally applied at the resolution of single
   fields within a flow record, attacks against anonymisation use entire
   flows and relationships between hosts and flows within a given data
   set. Therefore, fields which may not necessarily be identifying by
   themselves may be anonymised in order to increase the anonymity of
   the data set as a whole.

   Of all the fields in an IP flow record, only IP addresses directly
   identify entities in the real world. Each IP address is associated
   with an interface on a network host, and can potentially be
   identified with a single user. Additionally, IP addresses are
   structured identifiers; that is, partial IP address prefixes may be
   used to identify networks just as full IP addresses identify hosts.
   This makes anonymisation of IP addresses particularly important.

   Port numbers identify abstract entities (applications) as opposed to
   real-world entities, but they can be used to classify hosts and user
   behavior. Passive port fingerprinting, both of well-known and
   ephemeral ports, can be used to determine the operating system
   running on a host. Relative data volumes by port can also be used to
   determine the host's function (workstation, web server, etc.); this
   information can be used to identify hosts and users.

   While not identifiers in and of themselves, timestamps and counters
   can reveal the behavior of the hosts and users on a network. Any
   given network activity is recognizable by a pattern of relative time
   differences and data volumes in the associated sequence of flows,
   even without host address information. Timestamps and counters can
   therefore be used to identify hosts and users. They are also
   vulnerable to traffic injection attacks, where traffic with a known
   pattern is injected into a network under measurement, and this
   pattern is later identified in the anonymised data set.

   The simplest and most extreme form of anonymisation, which can be
   applied to any field of a flow record, is black-marker anonymisation,
   or complete deletion of a given field. Note that black-marker
   anonymisation is equivalent to simply not exporting the field(s) in
   question.

   While black-marker anonymisation completely protects the data in the
   deleted fields from the risk of disclosure, it also reduces the
   utility of the anonymised data set as a whole. Techniques that
   retain some information while reducing (though not eliminating) the
   disclosure risk are discussed in the following sections, with
   separate sections for the techniques applicable to IP addresses,
   timestamps, counters, and other fields such as ports.

5.1.  IP Address Anonymisation

   Since IP addresses are the most common identifiers within flow data
   that can be used to directly identify a person, organization, or
   host, most of the work on flow and trace data anonymisation has gone
   into IP address anonymisation techniques. Indeed, the aim of most
   attacks against anonymisation is to recover the map from anonymised
   IP addresses to original IP addresses, thereby identifying the
   hosts.
   There is therefore a wide range of IP address
   anonymisation schemes that fit into the following categories.

   +------------------------------------+---------------------+
   | Scheme                             | Action              |
   +------------------------------------+---------------------+
   | Truncation                         | Generalisation      |
   | Random Permutation                 | Direct Substitution |
   | Prefix-preserving Pseudonymisation | Direct Substitution |
   +------------------------------------+---------------------+

5.1.1.  Truncation

   Truncation removes "n" of the least significant bits from an IP
   address, replacing them with zeroes. In effect, it replaces a host
   address with a network address for some fixed netblock; for IPv4
   addresses, 8-bit truncation corresponds to replacement with a /24
   network address. Truncation is a non-reversible generalisation
   scheme. Note that while truncation is effective for making hosts
   non-identifiable, it preserves information which can be used to
   identify an organization, a geographic region, a country, or a
   continent (or RIR region of responsibility).

   Truncation to an address length of 0 is equivalent to black-marker
   anonymisation. Removal of IP address information is only recommended
   for analysis tasks which have no need to separate flow data by host
   or network; e.g. as a first stage to per-application (port) or
   time-series total volume analyses.

5.1.2.  Random Permutation

   Random permutation is a direct substitution technique, replacing each
   IP address with an address randomly selected from the set of possible
   IP addresses, guaranteeing that each anonymised address represents a
   unique original address. The random permutation does not preserve
   any structural information about a network, but it does preserve the
   unique count of IP addresses. Any application that requires more
   structure than host-uniqueness will not be able to use randomly
   permuted IP addresses.

5.1.3.  Prefix-preserving Pseudonymisation

   Prefix-preserving pseudonymisation is a direct substitution
   technique, further restricted such that the structure of subnets is
   preserved at each level while anonymising IP addresses. If two real
   IP addresses match on a prefix of "n" bits, the two anonymised IP
   addresses will match on a prefix of "n" bits as well. This is useful
   when relationships among networks must be preserved for a given
   analysis task, but introduces structure into the anonymised data
   which can be exploited in attacks against the anonymisation
   technique.

5.2.  Timestamp Anonymisation

   The particular time at which a flow began or ended is not in itself
   identifying information, but it can be used as part of attacks
   against other anonymisation techniques or for user profiling.
   Precise timestamps can be used in injected-traffic fingerprinting
   attacks [CITE] as well as to identify certain activity by response
   delay and size fingerprinting [CITE]. Therefore, timestamp
   information may be anonymised in order to ensure the protection of
   the entire dataset.

   +-----------------------+----------------------------+
   | Scheme                | Action                     |
   +-----------------------+----------------------------+
   | Precision Degradation | Generalisation             |
   | Enumeration           | Direct or Set Substitution |
   | Random Time Shifts    | Direct Substitution        |
   +-----------------------+----------------------------+

5.2.1.  Precision Degradation

   Precision Degradation is a generalisation technique that removes the
   most precise components of a timestamp, accounting all events
   occurring in each given interval (e.g. one millisecond for
   millisecond level degradation) as simultaneous. This has the effect
   of potentially collapsing many timestamps into one. With this
   technique time precision is reduced, and sequencing may be lost, but
   the approximate time at which each event occurred is preserved.
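
   A minimal Python sketch of precision degradation as described above,
   assuming timestamps are held as integer milliseconds since the epoch
   (an assumption; the function name degrade_timestamp and its
   interval_ms parameter are hypothetical). Each timestamp is truncated
   to the start of its interval, so all events within one interval
   become simultaneous.

```python
def degrade_timestamp(ts_ms, interval_ms):
    """Truncate a millisecond timestamp to the start of its interval."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return ts_ms - (ts_ms % interval_ms)


# Three flow start times, degraded to one-second precision.
starts = [1238400000123, 1238400000987, 1238400001005]
degraded = [degrade_timestamp(t, 1000) for t in starts]
```

   In this example the first two start times fall in the same second
   and collapse to a single value, so their relative order can no longer
   be determined from the anonymised data.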
   The anonymised data may not be generally useful for applications
   which require strict sequencing of flows.

   Note that flow meters with low time precision (e.g. second precision,
   or millisecond precision on high-capacity networks) perform the
   equivalent of precision degradation anonymisation by their design.

   Note also that degradation to a very low precision (e.g. on the order
   of minutes, hours, or days) is commonly used in analyses operating on
   time-series aggregated data, and is referred to as binning; though
   the time scales are longer and applicability more restricted, this is
   in principle the same operation.

   Precision degradation to infinitely low precision is equivalent to
   black-marker anonymisation. Removal of timestamp information is only
   recommended for analysis tasks which have no need to separate flows
   in time, for example for counting total volumes or unique occurrences
   of other flow keys in an entire dataset.

5.2.2.  Enumeration

   Enumeration is a substitution function that retains the chronological
   order in which events occurred while eliminating time information.
   Timestamps are substituted by equidistant timestamps (or numbers)
   starting from a randomly chosen start value. The resulting data is
   useful for applications requiring strict sequencing, but not for
   those requiring good timing information (e.g. delay or jitter
   measurement for QoS applications or SLA validation).

5.2.3.  Random Time Shifts

   Random time shifts add a single random offset to every timestamp
   within a dataset. This reversible substitution technique therefore
   retains duration and inter-event interval information as well as
   the chronological order of flows. It is primarily intended to defeat
   traffic injection fingerprinting attacks.

5.3.  Counter Anonymisation

   Counters (such as packet and octet volumes per flow) can be used in
   fingerprinting and injection attacks against anonymisation, or for
   user profiling, just as timestamps can. Counter anonymisation can
   help defeat these attacks, but the resulting data is only usable for
   analysis tasks for which relative or imprecise magnitudes of activity
   are useful.

   +-----------------------+----------------------------+
   | Scheme                | Action                     |
   +-----------------------+----------------------------+
   | Precision Degradation | Generalisation             |
   | Binning               | Generalisation             |
   | Random Noise Addition | Direct or Set Substitution |
   +-----------------------+----------------------------+

5.3.1.  Precision Degradation

   As with precision degradation in timestamps, precision degradation of
   counters removes lower-order bits of the counters, treating all the
   counters in a given range as having the same value. Depending on the
   precision reduction, this loses information about the relationships
   between sizes of similarly-sized flows, but keeps relative magnitude
   information.

5.3.2.  Binning

   Binning can be seen as a special case of precision degradation; the
   operation is identical, except that in precision degradation the
   counter ranges are uniform, and in binning they need not be. For
   example, a common counter binning scheme for packet counters could be
   to bin values 1-2 together, and 3-infinity together, thereby
   separating potentially completely-opened TCP connections from
   unopened ones. Binning schemes are generally chosen to keep
   precisely the amount of information required in a counter for a given
   analysis task. Note that, also unlike precision degradation, the bin
   label need not be within the bin's range.

   Binning counters to a single bin 0-infinity, or alternately precision
   degradation to infinitely low precision, is equivalent to
   black-marker anonymisation.
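
   The two counter generalisation schemes above can be sketched in
   Python as follows. This is an illustration only: the function names,
   the choice of 8 low-order bits, and the numeric bin labels are
   assumptions, while the bin boundaries follow the packet counter
   example given in the text (1-2 and 3-infinity).

```python
import bisect


def degrade_counter(value, bits):
    """Precision degradation: zero the `bits` lowest-order bits."""
    return (value >> bits) << bits


# Non-uniform bins for a packet counter: 1-2 and 3-infinity.
BIN_LOWER_BOUNDS = [1, 3]  # inclusive lower bound of each bin
BIN_LABELS = [0, 1]        # hypothetical labels; 0 lies outside bin 1-2


def bin_counter(value):
    """Map a packet count (>= 1) to the label of its bin."""
    if value < BIN_LOWER_BOUNDS[0]:
        raise ValueError("packet count below first bin")
    index = bisect.bisect_right(BIN_LOWER_BOUNDS, value) - 1
    return BIN_LABELS[index]
```

   With these definitions, degrade_counter(1500, 8) yields 1280, the
   start of the uniform 256-wide range containing 1500, while
   bin_counter maps counts of 1 and 2 to one label and every larger
   count to the other.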
   Removal of counter information is only
   recommended for analysis tasks which have no need to evaluate the
   removed counter, for example for counting only unique occurrences of
   other flow keys.

5.3.3.  Random Noise Addition

   Random noise addition adds a random amount to a counter in each flow;
   this is used to keep relative magnitude information and minimize the
   disruption to size relationship information while avoiding
   fingerprinting attacks against anonymisation. Note that there is no
   guarantee that random noise addition will maintain ranking order by a
   counter among members of a set. Random noise addition is
   particularly useful when the derived analysis data will not be
   presented in such a way as to require the lower-order bits of the
   counters.

5.4.  Anonymisation of Other Flow Fields

   Other fields, particularly port numbers and protocol numbers, can be
   used to partially identify the applications that generated the
   traffic in a given flow trace. This information can be used in
   fingerprinting attacks, and may be of interest on its own (e.g., to
   reveal that a certain application with suspected vulnerabilities is
   running on a given network). These fields are generally anonymised
   using one of two techniques.

   +--------------------+---------------------+
   | Scheme             | Action              |
   +--------------------+---------------------+
   | Binning            | Generalisation      |
   | Random Permutation | Direct Substitution |
   +--------------------+---------------------+

5.4.1.  Binning

   Binning is a generalisation technique mapping a set of potentially
   non-uniform ranges into a set of arbitrarily labeled bins. Common
   bin arrangements depend on the field type and the analysis
   application.
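
   A bin mapping of this kind might be sketched in Python as shown
   here. The helper make_bin_function, the specific ranges, and the
   labels (including the catch-all label 255) are illustrative
   assumptions rather than part of any specification.

```python
def make_bin_function(bins, default=None):
    """Build a bin mapping function from (low, high, label) ranges.

    Ranges are inclusive and are checked in order; values outside
    every range map to `default`.
    """
    def bin_value(value):
        for low, high, label in bins:
            if low <= value <= high:
                return label
        return default
    return bin_value


# Illustrative arrangement for transport ports: two non-uniform bins
# with arbitrary labels.
bin_port = make_bin_function([(0, 1023, 0), (1024, 65535, 1)])

# Illustrative arrangement for IP protocol numbers: keep three values
# distinct and place every other protocol in one bin labelled 255.
bin_protocol = make_bin_function(
    [(1, 1, 1), (6, 6, 6), (17, 17, 17)], default=255)
```

   Because several original values share each label, the mapping cannot
   be inverted to recover an individual port or protocol number.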
   For example, an IP protocol bin arrangement may
   preserve 1, 6, and 17 for ICMP, TCP, and UDP traffic, and bin all
   other protocols into a single bin, to mitigate the use of uncommon
   protocols in fingerprinting attacks. Another example arrangement may
   bin source and destination ports into low (0-1023) and high
   (1024-65535) bins in order to tell service from ephemeral ports
   without identifying individual applications.

   Binning other flow key fields to a single bin is equivalent to
   black-marker anonymisation. Removal of other flow key information is
   only recommended for analysis tasks which have no need to
   differentiate flows on the removed keys, for example for total
   traffic counts or unique counts of other flow keys.

5.4.2.  Random Permutation

   Random permutation is a direct substitution technique, replacing each
   key value with a value randomly selected from the range of possible
   values, guaranteeing that each anonymised value represents a unique
   original value. This is used to preserve the count of unique flow
   key values without preserving information about the keys themselves.

6.  Parameters for the Description of Anonymisation Techniques

   This section details the abstract parameters used to describe the
   anonymisation techniques examined in the previous section, on a
   per-parameter basis. These parameters and their export safety inform
   the design of the IPFIX anonymisation metadata export specified in
   the following section.

6.1.  Stability

   Any given anonymisation technique may be applied with a varying range
   of stability. Stability is important for assessing the comparability
   of anonymised information in different data sets, or in the same data
   set over different time periods.
   In general, stability ranges from completely stable to completely
   unstable; however, note that the completely unstable case is
   indistinguishable from black-marker anonymisation.  A completely
   stable anonymisation will always map a given value in the real space
   to the same value in the anonymised space.  In practice, an
   anonymisation may also be stable for every data set published by a
   particular producer to a particular consumer, stable for a stated
   time period within a dataset or across datasets, or stable only for a
   single data set.

   If no information about stability is available, users of anonymised
   data may assume that the techniques used are stable across the entire
   dataset, but unstable across datasets.  Note that stability presents
   a risk-utility tradeoff, as completely stable anonymisation can be
   used for longer-term trend analysis tasks but also presents more risk
   of attack given the stable mapping.

6.2.  Truncation Length

   Truncation and precision degradation are described by the truncation
   length, or the amount of data still remaining in the anonymised field
   after anonymisation.

   Truncation length can be inferred from a given data set, and need not
   be specially exported or protected.

6.3.  Bin Map

   Binning is described by the specification of a bin mapping function.
   This function can be generally expressed in terms of an associative
   array that maps each point in the original space to a bin, although
   from an implementation standpoint most bin functions are much simpler
   and more efficient.

   Since knowledge of the bin mapping function can be used to partially
   deanonymise binned data, depending on the degree of generalisation,
   no information about the bin mapping function should be exported.

6.4.  Permutation

   Like binning, permutation is described by the specification of a
   permutation function.
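
   A permutation function of this kind can be sketched as a lookup
   table.  The following is a minimal illustration, not a recommended
   implementation: the function names are hypothetical, the 16-bit port
   space is chosen only as an example, and the use of Python's
   random.SystemRandom as the source of randomness is an assumption.

```python
import random
from typing import List, Optional


def build_permutation(space_size: int,
                      rng: Optional[random.Random] = None) -> List[int]:
    """Return a table in which table[original] is the anonymised value."""
    # Assumption: SystemRandom (OS-provided randomness) is used by default.
    generator = rng if rng is not None else random.SystemRandom()
    table = list(range(space_size))
    generator.shuffle(table)  # shuffles in place, so the table stays a bijection
    return table


def invert_permutation(table: List[int]) -> List[int]:
    """Recover the original values; possible for anyone who holds the table."""
    inverse = [0] * len(table)
    for original, anonymised in enumerate(table):
        inverse[anonymised] = original
    return inverse


# Example: permute the 16-bit port space and anonymise one port.
port_table = build_permutation(65536)
anonymised_port = port_table[443]
```

   The inverse function is included only to show that possession of
   the table is sufficient to undo the substitution.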
   In the general case, this can be expressed in terms of an associative
   array that maps each point in the original space to a point in the
   anonymised space.  Unlike binning, each point in the anonymised space
   must correspond to a single, unique point in the original space.

   Since knowledge of the permutation function can be used to completely
   deanonymise permuted data, no information about the permutation
   function or its parameters should be exported.

6.5.  Shift Amount

   Shifting requires an amount to shift each value by.  Since the shift
   amount can be used to deanonymise data protected by shifting, no
   information about the shift amount should be exported.

7.  Anonymisation Export Support in IPFIX

   Anonymised data exported via IPFIX SHOULD be annotated with
   anonymisation metadata, which details which fields described by which
   Templates are anonymised, and provides appropriate information on the
   anonymisation techniques used.  This metadata SHOULD be exported in
   Data Records described by the recommended Options Templates described
   in this section; these Options Templates use the additional
   Information Elements described in the following subsection.

   Note that fields anonymised using the black-marker (removal)
   technique do not require any special metadata support.  Black-marker
   anonymised fields SHOULD NOT be exported at all; the absence of the
   field in a given Data Set is implicitly declared by not including the
   corresponding Information Element in the Template describing that
   Data Set; exporting "empty" data elements is inefficient and in the
   general case impossible, as many non-counter Information Elements do
   not have semantically distinct null values.

7.1.  Anonymisation Options Template

   The Anonymisation Options Template describes anonymisation records,
   which allow anonymisation metadata to be exported inline over IPFIX
   or stored in an IPFIX File, by binding information about
   anonymisation techniques to Information Elements within defined
   Templates.  IPFIX Exporting Processes SHOULD export anonymisation
   records for any Template describing exported anonymised Data Records;
   IPFIX Collecting Processes and processes downstream from them MAY use
   anonymisation records to treat anonymised data differently depending
   on the applied technique.

   An Exporting Process SHOULD export anonymisation records after the
   Templates they describe have been exported, and SHOULD export
   anonymisation records reliably.

   Anonymisation records, like Templates, MUST be handled by Collecting
   Processes as scoped to the Transport Session in which they are sent.
   While the anonymisationStability IE can be used to declare that a
   given anonymisation technique's mapping will remain stable across
   multiple sessions, each session MUST re-export the anonymisation
   records along with the Templates.

   [EDITOR'S NOTE: Multiple anon. techniques applied on an IE at the
   same time is indicated with multiple elements of the same type (in
   application order as in PSAMP).  Need to verify this is actually
   useful given the defined techniques.]

   +-------------------------+-----------------------------------------+
   | IE                      | Description                             |
   +-------------------------+-----------------------------------------+
   | templateId [scope]      | The Template ID of the Template         |
   |                         | containing the Information Element      |
   |                         | described by this anonymisation record. |
   |                         | This Information Element MUST be        |
   |                         | defined as a Scope Field.               |
   | informationElementId    | The Information Element identifier of   |
   | [scope]                 | the Information Element described by    |
   |                         | this anonymisation record.  This        |
   |                         | Information Element MUST be defined as  |
   |                         | a Scope Field.                          |
   | informationElementIndex | The Information Element index of the    |
   | [scope] [optional]      | instance of the Information Element     |
   |                         | described by this anonymisation record  |
   |                         | identified by the informationElementId  |
   |                         | within the Template.  Optional; need    |
   |                         | only be present when describing         |
   |                         | Templates that have multiple instances  |
   |                         | of the same Information Element.  This  |
   |                         | Information Element MUST be defined as  |
   |                         | a Scope Field if present.  This         |
   |                         | Information Element is defined in       |
   |                         | Section 7.2, below.                     |
   | anonymisationStability  | The stability class of the anonymised   |
   |                         | data.  MUST be present.  This           |
   |                         | Information Element is defined in       |
   |                         | Section 7.2, below.                     |
   | anonymisationTechnique  | The technique used to anonymise the     |
   |                         | data.  MUST be present.  This           |
   |                         | Information Element is defined in       |
   |                         | Section 7.2, below.                     |
   +-------------------------+-----------------------------------------+

7.2.  Recommended Information Elements for Anonymisation Metadata

7.2.1.  anonymisationStability

   Description:  A description of the stability class of the
      anonymisation technique applied to a referenced Information
      Element within a referenced Template.  Stability classes refer to
      the stability of the parameters of the anonymisation technique,
      and therefore the comparability of the mapping between the real
      and anonymised values over time.  This determines which anonymised
      datasets may be compared with each other.

   +-------+-----------------------------------------------------------+
   | Value | Description                                               |
   +-------+-----------------------------------------------------------+
   | 0     | Undefined: the Exporting Process makes no representation  |
   |       | as to how stable the mapping is, or over what time period |
   |       | values of this field will remain comparable; while the    |
   |       | Collecting Process MAY assume Session level stability,    |
   |       | Session level stability is not guaranteed.  This is       |
   |       | equivalent to 0x01 Session level stability while advising |
   |       | the Collecting Process that no special effort has been    |
   |       | made to ensure stability.  Collecting Processes SHOULD    |
   |       | assume this is the case in the absence of stability class |
   |       | information; this is the default stability class.         |
   | 1     | Session: the Exporting Process will ensure that the       |
   |       | parameters of the anonymisation technique are stable      |
   |       | during the Transport Session.  All the values of the      |
   |       | described Information Element for each Record described   |
   |       | by the referenced Template within the Transport Session   |
   |       | are comparable.  The Exporting Process SHOULD endeavour   |
   |       | to ensure at least this stability class.                  |
   | 2     | Exporter-Collector Pair: the Exporting Process will       |
   |       | ensure that the parameters of the anonymisation technique |
   |       | are stable across Transport Sessions over time with the   |
   |       | given Collecting Process, but may use different           |
   |       | parameters for different Collecting Processes.  Data      |
   |       | exported to different Collecting Processes is not         |
   |       | comparable.                                               |
   | 3     | Stable: the Exporting Process will ensure that the        |
   |       | parameters of the anonymisation technique are stable      |
   |       | across Transport Sessions over time, regardless of the    |
   |       | Collecting Process to which it is sent.                   |
   +-------+-----------------------------------------------------------+

   Abstract Data Type:  unsigned8

   ElementId:  TBD1

   Status:  Proposed

7.2.2.  anonymisationTechnique

   Description:  A description of the anonymisation technique applied
      to a referenced Information Element within a referenced Template.

   +-------+-----------------------------------------------------------+
   | Value | Description                                               |
   +-------+-----------------------------------------------------------+
   | 0     | Undefined: the Exporting Process makes no representation  |
   |       | as to whether the defined field is anonymised or not.     |
   |       | While the Collecting Process MAY assume that the field is |
   |       | not anonymised, it is not guaranteed not to be.  This is  |
   |       | the default anonymisation technique.                      |
   | 1     | None: the values exported are real.                       |
   | 2     | Precision Degradation/Truncation: the values exported are |
   |       | anonymised using simple precision degradation or          |
   |       | truncation.  The new precision is implicit in the         |
   |       | exported data, and can be deduced by the Collecting       |
   |       | Process.                                                  |
   | 3     | Binning: the values exported are anonymised into bins.    |
   | 4     | Enumeration: the values exported are anonymised by        |
   |       | enumeration.                                              |
   | 5     | Permutation: the values exported are anonymised by random |
   |       | permutation.                                              |
   | 6     | Prefixed Permutation: the values exported are anonymised  |
   |       | by random permutation, preserving bit-level structure;    |
   |       | this represents prefix-preserving IP address              |
   |       | anonymisation.                                            |
   +-------+-----------------------------------------------------------+

   Abstract Data Type:  unsigned8

   ElementId:  TBD2

   Status:  Proposed

7.2.3.  informationElementIndex

   Description:  A zero-based index of an Information Element
      referenced by informationElementId within a Template referenced by
      templateId; used to disambiguate scope for templates containing
      multiple identical Information Elements.

   Abstract Data Type:  unsigned16

   ElementId:  TBD3

   Status:  Proposed

8.  Applying Anonymisation Techniques to IPFIX Export and Storage

   When exporting or storing anonymised flow data using IPFIX, certain
   interactions between the IPFIX Protocol and the anonymisation
   techniques in use must be considered; these are treated in the
   subsections below.

8.1.  Arrangement of Processes in IPFIX Anonymisation

   Anonymisation may be applied to IPFIX data at three stages within
   the collection infrastructure: on initial export, at a mediator, or
   after collection, as shown in Figure 1.  Each of these locations has
   specific considerations and applicability.

                  +--------------------+
                  | IPFIX File Storage |
                  +--------------------+
                             ^
                             | (Anonymised after collection)
                             |
         +=======================================+
         |          Collecting Process           |
         +=======================================+
           ^                                ^
           | (Anonymised at mediator)       |
           |                                |
         +=============================+    |
         |          Mediator           |    |
         +=============================+    |
           ^                                |
           | (Anonymised on initial export) |
           |                                |
         +=======================================+
         |          Exporting Process            |
         +=======================================+

               Figure 1: Potential Anonymisation Locations

   Anonymisation is generally performed before the wider dissemination
   or repurposing of a flow data set, e.g., adapting operational
   measurement data for research.
   Therefore, direct anonymisation of flow data on initial export is
   only applicable in certain restricted circumstances: when the
   Exporting Process is "publishing" data to a Collecting Process
   directly, and the Exporting Process and Collecting Process are
   operated by different entities.  Note that certain guidelines in
   Section 8.2.2 with respect to timestamp anonymisation may not apply
   in this case, as the Collecting Process may be able to deduce certain
   timing information from the time at which each Message is received.

   A much more flexible arrangement is to anonymise data within a
   Mediator [I-D.ietf-ipfix-mediators-framework].  Here, original data
   is sent to a Mediator, which performs the anonymisation function and
   re-exports the anonymised data.  Such a Mediator could be located at
   the administrative domain boundary of the initial Exporting Process
   operator, exporting anonymised data to other consumers outside the
   organisation.  In this case, the original Exporter SHOULD use TLS as
   specified in [RFC5101] to secure the channel to the Mediator, and the
   Mediator should follow the guidelines in Section 8.2, to mitigate the
   risk of original data disclosure.

   When data is to be published as an anonymised data set in an IPFIX
   File [I-D.ietf-ipfix-file], the anonymisation may be done at the
   final Collecting Process before storage and dissemination, as well.
   In this case, the Collector should follow the guidelines in
   Section 8.2, especially as regards File-specific Options in
   Section 8.2.3.

   Note that anonymisation may occur at more than one location within a
   given collection infrastructure, to provide varying levels of
   anonymisation reversal risk and utility for specific purposes.

8.2.  IPFIX-Specific Anonymisation Guidelines

   In implementing and deploying the anonymisation techniques described
   in this document, care must be taken that data structures supporting
   the operation of the protocol itself do not leak data that could be
   used to reverse the anonymisation applied to the flow data.  Such
   data structures may appear in the header, or within the data stream
   itself, especially as options data.  Each of these and their impact
   on specific anonymisation techniques is noted in a separate
   subsection below.

8.2.1.  Appropriate Use of Information Elements for Anonymised Data

   [TODO: reiterate black-marker guidelines here]

   [TODO: note that precision degradation SHOULD use appropriately-sized
   fields]

8.2.2.  Anonymisation of Header Data

   Each IPFIX Message contains a Message Header; within this Message
   Header are contained two fields which may be used to break certain
   anonymisation techniques: the Export Time, and the Observation Domain
   ID.

   An Exporting Process that exports IPFIX Messages containing
   anonymised timestamp data, where the original Export Time Message
   Header field has some relationship to the anonymised timestamps,
   SHOULD anonymise the Export Time header field using an equivalent
   technique, if possible.  Otherwise, relationships between export and
   flow time could be used to partially or totally reverse timestamp
   anonymisation.

   The similarity in size between an Observation Domain ID and an IPv4
   address (32 bits) may lead to a temptation to use an IPv4 interface
   address on the Metering or Exporting Process as the Observation
   Domain ID.  If this address bears some relation to the IP addresses
   in the flow data (e.g., shares a network prefix with internal
   addresses) and the IP addresses in the flow data are anonymised in a
   structure-preserving way, then the Observation Domain ID may be used
   to break the IP address anonymisation.
   Use of an IPv4 interface address on the Metering or Exporting Process
   as the Observation Domain ID is NOT RECOMMENDED in this case.

8.2.3.  Anonymisation of Options Data

   IPFIX uses the Options mechanism to export, among other things,
   metadata about exported flows and the flow collection infrastructure.
   As with the IPFIX Message Header, certain Options recommended in
   [RFC5101] and the IPFIX File Format [I-D.ietf-ipfix-file] containing
   flow timestamps and network addresses of Exporting and Collecting
   Processes may be used to break certain anonymisation techniques; care
   should be taken while using them with anonymised data export and
   storage.

   The Exporting Process Reliability Statistics Options Template,
   recommended in [RFC5101], contains an Exporting Process ID field,
   which may be an exportingProcessIPv4Address Information Element or an
   exportingProcessIPv6Address Information Element.  If the Exporting
   Process address bears some relation to the IP addresses in the flow
   data (e.g., shares a network prefix with internal addresses) and the
   IP addresses in the flow data are anonymised in a
   structure-preserving way, then the Exporting Process address may be
   used to break the IP address anonymisation.  Exporting Processes
   exporting anonymised data in this situation SHOULD mitigate the risk
   of attack either by omitting Options described by the Exporting
   Process Reliability Statistics Options Template, or by anonymising
   the Exporting Process address using a similar technique to that used
   to anonymise the IP addresses in the exported data.
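
   An implementation might test for the situation described above
   before deciding whether to export the Option.  The following is a
   minimal sketch using Python's standard ipaddress module; the
   function name and the idea of configuring a list of internal
   networks are assumptions made for illustration, and the addresses
   are documentation prefixes.

```python
import ipaddress
from typing import List


def shares_internal_prefix(process_address: str,
                           internal_networks: List[str]) -> bool:
    """Return True if the process address lies inside any internal network."""
    address = ipaddress.ip_address(process_address)
    for network_text in internal_networks:
        network = ipaddress.ip_network(network_text)
        # Only compare addresses and networks of the same IP version.
        if address.version == network.version and address in network:
            return True
    return False


# Hypothetical configuration: prefixes of the addresses seen in the flow data.
internal = ["192.0.2.0/24", "2001:db8::/32"]
exporter_is_related = shares_internal_prefix("192.0.2.17", internal)
```

   When such a check succeeds and the flow addresses are anonymised in
   a structure-preserving way, the Exporting Process would either omit
   the Option or anonymise the address, following the recommendation
   above.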

   Similarly, the Export Session Details Options Template and Message
   Details Options Template specified for the IPFIX File Format
   [I-D.ietf-ipfix-file] may contain the exportingProcessIPv4Address
   Information Element or the exportingProcessIPv6Address Information
   Element to identify an Exporting Process from which a flow record was
   received, and the collectingProcessIPv4Address Information Element or
   the collectingProcessIPv6Address Information Element to identify the
   Collecting Process which received it.  If the Exporting Process or
   Collecting Process address bears some relation to the IP addresses in
   the flow data (e.g., shares a network prefix with internal addresses)
   and the IP addresses in the flow data are anonymised in a
   structure-preserving way, then the Exporting Process or Collecting
   Process address may be used to break the IP address anonymisation.
   Since these Options Templates are primarily intended for storing
   IPFIX Transport Session data for auditing, replay, and testing
   purposes, it is NOT RECOMMENDED that storage of anonymised data
   include these Options Templates in order to mitigate the risk of
   attack.

   The Message Details Options Template specified for the IPFIX File
   Format [I-D.ietf-ipfix-file] also contains the
   collectionTimeMilliseconds Information Element.  As with the Export
   Time Message Header field, if the exported flow data contains
   anonymised timestamp information, and the collectionTimeMilliseconds
   Information Element in a given Message has some relationship to the
   anonymised timestamp information, then this relationship can be
   exploited to reverse the timestamp anonymisation.
   Since this Options Template is primarily intended for storing IPFIX
   Transport Session data for auditing, replay, and testing purposes, it
   is NOT RECOMMENDED that storage of anonymised data include this
   Options Template in order to mitigate the risk of attack.

   Since the Time Window Options Template specified for the IPFIX File
   Format [I-D.ietf-ipfix-file] refers to the timestamps within the flow
   data to provide partial table of contents information for an IPFIX
   File, care must be taken to ensure that Options described by this
   template are written using the anonymised timestamps instead of the
   original ones.

9.  Examples

   [TODO: write this section.]

10.  Security Considerations

   [TODO: write this section.]

11.  IANA Considerations

   This document contains no actions for IANA.

   [EDITOR'S NOTE: creation of anonymisationStability and
   anonymisationTechnique registries may change this.]

12.  Acknowledgments

   We thank Paul Aitken for his comments and insight, and the PRISM
   project for its support of this work.

13.  References

13.1.  Normative References

   [RFC5101]  Claise, B., "Specification of the IP Flow Information
              Export (IPFIX) Protocol for the Exchange of IP Traffic
              Flow Information", RFC 5101, January 2008.

   [RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
              Meyer, "Information Model for IP Flow Information Export",
              RFC 5102, January 2008.

13.2.  Informative References

   [I-D.ietf-ipfix-as]
              Zseby, T., "IPFIX Applicability", draft-ietf-ipfix-as-12
              (work in progress), July 2007.

   [I-D.ietf-ipfix-architecture]
              Sadasivan, G., "Architecture for IP Flow Information
              Export", draft-ietf-ipfix-architecture-12 (work in
              progress), September 2006.

   [I-D.ietf-ipfix-file]
              Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
              Wagner, "Specification of the IPFIX File Format",
              draft-ietf-ipfix-file-03 (work in progress), October 2008.

   [I-D.ietf-ipfix-mediators-framework]
              Kobayashi, A., Nishida, H., and B. Claise, "IPFIX
              Mediation: Framework",
              draft-ietf-ipfix-mediators-framework-02 (work in
              progress), February 2009.

   [RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
              "Requirements for IP Flow Information Export (IPFIX)",
              RFC 3917, October 2004.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

Authors' Addresses

   Elisa Boschi
   Hitachi Europe
   c/o ETH Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Phone: +41 44 632 70 57
   Email: elisa.boschi@hitachi-eu.com


   Brian Trammell
   Hitachi Europe
   c/o ETH Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Phone: +41 44 632 70 13
   Email: brian.trammell@hitachi-eu.com