idnits 2.17.1 draft-boschi-ipfix-anon-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 15. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 469. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 480. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 487. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 493. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year == The document doesn't use any RFC 2119 keywords, yet seems to have RFC 2119 boilerplate text. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (July 14, 2008) is 5764 days in the past. Is this intentional? 
Checking references for intended status: Experimental ---------------------------------------------------------------------------- == Unused Reference: 'I-D.ietf-ipfix-architecture' is defined on line 416, but no explicit reference was found in the text == Unused Reference: 'I-D.ietf-ipfix-reducing-redundancy' is defined on line 421, but no explicit reference was found in the text ** Obsolete normative reference: RFC 5101 (Obsoleted by RFC 7011) ** Obsolete normative reference: RFC 5102 (Obsoleted by RFC 7012) Summary: 3 errors (**), 0 flaws (~~), 4 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 IPFIX Working Group E. Boschi 3 Internet-Draft B. Trammell 4 Intended status: Experimental Hitachi Europe 5 Expires: January 15, 2009 July 14, 2008 7 IP Flow Anonymisation Support 8 draft-boschi-ipfix-anon-01.txt 10 Status of this Memo 12 By submitting this Internet-Draft, each author represents that any 13 applicable patent or other IPR claims of which he or she is aware 14 have been or will be disclosed, and any of which he or she becomes 15 aware will be disclosed, in accordance with Section 6 of BCP 79. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt. 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 
33     This Internet-Draft will expire on January 15, 2009.

35   Abstract

37     This document describes anonymisation techniques for IP flow data.
38     It provides a categorisation of common anonymisation schemes and
39     defines the parameters needed to describe them.  It describes support
40     for anonymisation within the IPFIX protocol, providing the basis for
41     the definition of information models for configuring anonymisation
42     techniques within an IPFIX Metering or Exporting Process, and for
43     reporting the technique in use to an IPFIX Collecting Process.

45   Table of Contents

47     1.  Introduction
48       1.1.  IPFIX Protocol Overview
49       1.2.  IPFIX Documents Overview
50     2.  Terminology
51     3.  Categorisation of Anonymisation Techniques
52     4.  Anonymisation of IP Flow Data
53       4.1.  IP Address Anonymisation
54         4.1.1.  Truncation
55         4.1.2.  Random Permutations
56         4.1.3.  Prefix-preserving Pseudonymisation
57       4.2.  Timestamp Anonymisation
58         4.2.1.  Precision Degradation
59         4.2.2.  Enumeration
60         4.2.3.  Random Time Shifts
61       4.3.  Counter Anonymisation
62         4.3.1.  Precision Degradation
63         4.3.2.  Binning
64         4.3.3.  Random Noise Addition
65       4.4.  Anonymisation of Other Flow Fields
66     5.  Parameters for the Description of Anonymisation Techniques
67     6.  Anonymisation Support in IPFIX
68     7.  Security Considerations
69     8.  IANA Considerations
70     9.  References
71       9.1.  Normative References
72       9.2.  Informative References
73     Authors' Addresses
74     Intellectual Property and Copyright Statements

76   1.  Introduction

78     The standardisation of an IP flow information export protocol
79     [RFC5101] and associated representations removes a technical barrier
80     to the sharing of IP flow data across organizational boundaries and
81     with network operations, security, and research communities for a
82     wide variety of purposes.  However, with wider dissemination come
83     greater risks to the privacy of the users of networks under
84     measurement, and to the security of those networks.  While it is not
85     a complete solution to the issues posed by distribution of IP flow
86     information, anonymisation is an important tool for the protection of
87     privacy within network measurement infrastructures.

89     This document presents a mechanism for representing anonymised data
90     within IPFIX and guidelines for using it.  It begins with a
91     categorisation of anonymisation techniques.  It then describes the
92     applicability of each technique to commonly anonymisable fields of IP
93     flow data, organized by information element data type and semantics
94     as in [RFC5102]; enumerates the parameters required by each of the
95     applicable anonymisation techniques; and provides guidelines for the
96     use of each of these techniques in accordance with best practices in
97     data protection.  Finally, it specifies a mechanism for exporting
98     anonymised data and binding anonymisation metadata to Templates using
99     IPFIX Options.

101  1.1.  IPFIX Protocol Overview

103    In the IPFIX protocol, { type, length, value } tuples are expressed
104    in Templates containing { type, length } pairs, specifying which {
105    value } fields are present in Data Records conforming to the
106    Template, giving great flexibility as to what data is transmitted.
107    Since Templates are sent very infrequently compared with Data
108    Records, this results in significant bandwidth savings.  Various
109    different data formats may be transmitted simply by sending new
110    Templates specifying the { type, length } pairs for the new data
111    format.  See [RFC5101] for more information.

113    The IPFIX information model [RFC5102] defines a large number of
114    standard Information Elements which provide the necessary { type }
115    information for Templates.  The use of standard elements enables
116    interoperability among different vendors' implementations.
117    Additionally, non-standard enterprise-specific elements may be
118    defined for private use.

120  1.2.  IPFIX Documents Overview

122    "Specification of the IPFIX Protocol for the Exchange of IP Traffic
123    Flow Information" [RFC5101] and its associated documents define the
124    IPFIX Protocol, which provides network engineers and administrators
125    with access to IP traffic flow information.

127    "Architecture for IP Flow Information Export" [I-D.ietf-ipfix-arch]
128    defines the architecture for the export of measured IP flow
129    information out of an IPFIX Exporting Process to an IPFIX Collecting
130    Process, and the basic terminology used to describe the elements of
131    this architecture, per the requirements defined in "Requirements for
132    IP Flow Information Export" [RFC3917].  The IPFIX Protocol document
133    [RFC5101] then covers the details of the method for transporting
134    IPFIX Data Records and Templates via a congestion-aware transport
135    protocol from an IPFIX Exporting Process to an IPFIX Collecting
136    Process.
138    "Information Model for IP Flow Information Export" [RFC5102]
139    describes the Information Elements used by IPFIX, including details
140    on Information Element naming, numbering, and data type encoding.
141    Finally, "IPFIX Applicability" [I-D.ietf-ipfix-as] describes the
142    various applications of the IPFIX protocol and their use of
143    information exported via IPFIX, and relates the IPFIX architecture to
144    other measurement architectures and frameworks.

146    This document references the Protocol and Architecture documents for
147    terminology and extends the IPFIX Information Model to provide new
148    Information Elements for anonymisation metadata.

150  2.  Terminology

152    Terms used in this document that are defined in the Terminology
153    section of the IPFIX Protocol [RFC5101] document are to be
154    interpreted as defined there.

156    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
157    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
158    document are to be interpreted as described in RFC 2119 [RFC2119].

160  3.  Categorisation of Anonymisation Techniques

162    Anonymisation modifies a data set in order to protect the identity of
163    the people or entities described by the data set from disclosure.
164    With respect to network traffic data, anonymisation generally
165    attempts to preserve some set of properties of the network traffic
166    useful for a given application or applications, while ensuring the
167    data cannot be traced back to the specific networks, hosts, or users
168    generating the traffic.

170    Anonymisation may be broadly split into three categories:
171    generalisation, reversible substitution, and irreversible
172    substitution.  When generalisation is used, identifying information
173    is grouped in sets, and a single value is used to represent every
174    element of a set.  In effect, this causes multiple records to become
175    indistinguishable, thereby aggregating them together.
       Generalisation is an irreversible
176    operation, in that the information needed to identify a single record
177    from its "generalised value" is lost.

179    Substitution (or pseudonymisation) maps the real space of identifiers
180    or values into a separate, replacement space, using some substitution
181    function.  If the substitution function is invertible or can
182    otherwise be reversed, then the substitution is reversible, and a
183    real identifier can be recovered from a given replacement identifier.
184    Reversible substitution keeps different elements distinguishable from
185    each other: the number of different elements in the real and the
186    replacement space is the same.

188    Irreversible substitution results when a randomising or one-way
189    function is used to map the value space; real identifiers cannot be
190    recovered in an irreversible substitution.  The number of different
191    elements in the real and replacement spaces is not necessarily the
192    same.

194  4.  Anonymisation of IP Flow Data

196    Due to the restricted semantics of IP flow data, there is a
197    relatively limited set of specific anonymisation techniques applicable
198    to flow data, though each falls into one of the broad categories
199    above.  Each type of field that may commonly appear in a flow record
200    may have its own applicable specific techniques.

202    While anonymisation is generally applied at the resolution of single
203    fields within a flow record, attacks against anonymisation use entire
204    flows and relationships between hosts and flows within a given data
205    set.  Therefore, fields which may not necessarily be identifying by
206    themselves may be anonymised in order to increase the anonymity of
207    the data set as a whole.

209    Of all the fields in an IP flow record, only IP addresses directly
210    identify entities in the real world.  Each IP address is associated
211    with an interface on a network host, and can potentially be
212    identified with a single user.
       Additionally, IP addresses are
213    structured identifiers; that is, IP address prefixes may be
214    used to identify networks just as full IP addresses identify hosts.
215    This makes anonymisation of IP addresses particularly important.

217    Port numbers identify abstract entities (applications) as opposed to
218    real-world entities, but they can be used to classify hosts and user
219    behavior.  Passive port fingerprinting, both of well-known and
220    ephemeral ports, can be used to determine the operating system
221    running on a host.  Relative data volumes by port can also be used to
222    determine the host's function (workstation, web server, etc.); this
223    information can be used to identify hosts and users.

225    While not identifiers in and of themselves, timestamps and counters
226    can reveal the behavior of the hosts and users on a network.  Any
227    given network activity is recognizable by a pattern of relative time
228    differences and data volumes in the associated sequence of flows,
229    even without host address information.  Such patterns can therefore
230    be used to identify hosts and users.  Timestamps and counters are also
231    vulnerable to traffic injection attacks, where traffic with a known
232    pattern is injected into a network under measurement, and this
233    pattern is later identified in the anonymised data set.

235    The simplest and most extreme form of anonymisation, which can be
236    applied to any field of a flow record, is black-marker anonymisation,
237    or complete deletion of a given field.  While black-marker
238    anonymisation completely protects the data in the deleted fields from
239    the risk of disclosure, it also reduces the utility of the anonymised
240    data set as a whole.
       Techniques that retain some information while
241    reducing (though not eliminating) the disclosure risk are discussed
242    in the following sections; the techniques specifically applicable to
243    IP addresses, timestamps, and counters are each discussed in a
244    separate section.

246  4.1.  IP Address Anonymisation

248    The following table gives an overview of the schemes for IP address
249    anonymisation described in this document and their categorisation.

251    +----------------------------------+----------------+---------------+
252    | Scheme                           | Action         | Reversibility |
253    +----------------------------------+----------------+---------------+
254    | Truncation                       | Generalisation | N             |
255    | Random Permutation               | Substitution   | Y/N           |
256    | Prefix-preserving                | Substitution   | Y             |
257    | Pseudonymisation                 |                |               |
258    +----------------------------------+----------------+---------------+

260    Note that random permutations might be either reversible or not,
261    depending on the function used.

263  4.1.1.  Truncation

265    Truncation removes (sets to zero) the "n" least significant bits of
266    an IP Address.  For example, truncating 8 bits replaces an IPv4
267    address with the address of its enclosing /24 (former class C)
       network.

269  4.1.2.  Random Permutations

271    When random permutations are used, each IP Address is replaced by its
272    image under a random permutation of the set of possible IP Addresses.
273    The permutation function can be implemented using hash tables.

275  4.1.3.  Prefix-preserving Pseudonymisation

277    Prefix-preserving pseudonymisation preserves the structure of IP
278    Addresses.  If two IP Addresses match on a prefix of "n" bits, their
279    anonymised versions will match on a prefix of "n" bits too.

281  4.2.  Timestamp Anonymisation

283    [TODO: introductory text]

285    +-----------------------+----------------+---------------+
286    | Scheme                | Action         | Reversibility |
287    +-----------------------+----------------+---------------+
288    | Precision Degradation | Generalisation | N             |
289    | Enumeration           | Substitution   | Y             |
290    | Random Shifts         | Substitution   | Y             |
291    +-----------------------+----------------+---------------+

293  4.2.1.  Precision Degradation

295    Precision Degradation removes the most precise components of a
296    timestamp, treating all events occurring in each given interval
297    (e.g. one millisecond for millisecond level degradation) as
298    simultaneous.  This has the effect of potentially collapsing many
299    timestamps into one.  With this technique time precision is reduced,
300    and sequencing may be lost, but the approximate time at which each
301    event happened is kept.

303  4.2.2.  Enumeration

305    Enumeration keeps the chronological order in which events occurred
306    while eliminating time information.  Timestamps are substituted by
307    equidistant timestamps (or numbers) starting from a randomly chosen
308    start value.

310  4.2.3.  Random Time Shifts

312    Random Time Shifts keep the information on how far apart two events
313    are from each other.  This is achieved by shifting all timestamps by
314    the same random offset.  Note that random time shifts also preserve
315    chronological order.

317  4.3.  Counter Anonymisation

319    Counters (such as packet and octet volumes per flow) are subject to
320    fingerprinting and injection attacks against anonymisation, as
321    timestamps are, but relative magnitudes of activity can be useful for
322    certain analysis tasks.
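
   As a non-normative illustration, the three timestamp schemes of
   Section 4.2 might be sketched in Python as follows.  The function
   names, the use of integer millisecond timestamps, the helper
   random_offset(), and the choice of the "secrets" module as the source
   of randomness are all assumptions made for this sketch; none of them
   is defined by this document or by IPFIX.

```python
import secrets


def degrade_precision(timestamps_ms, interval_ms):
    """Precision degradation: map every timestamp to the start of its interval."""
    return [t - (t % interval_ms) for t in timestamps_ms]


def enumerate_timestamps(timestamps_ms, start, step=1):
    """Enumeration: keep only chronological order, using equidistant values."""
    # Rank the records by time; sorted() is stable, so equal timestamps
    # keep their input order (a choice made for this sketch).
    order = sorted(range(len(timestamps_ms)), key=lambda i: timestamps_ms[i])
    result = [0] * len(timestamps_ms)
    for rank, index in enumerate(order):
        result[index] = start + rank * step
    return result


def shift_timestamps(timestamps_ms, offset_ms):
    """Random time shift: add one common offset, preserving all differences."""
    return [t + offset_ms for t in timestamps_ms]


def random_offset(max_abs_ms):
    """Hypothetical helper: one offset drawn from [-max_abs_ms, +max_abs_ms]."""
    return secrets.randbelow(2 * max_abs_ms + 1) - max_abs_ms


if __name__ == "__main__":
    flow_start_times = [1215993600123, 1215993600456, 1215993601789]
    # One-second precision degradation.
    print(degrade_precision(flow_start_times, 1000))
    # Enumeration from a randomly chosen start value.
    print(enumerate_timestamps(flow_start_times, start=random_offset(10**6)))
    # The same random shift (up to one day) applied to every timestamp.
    print(shift_timestamps(flow_start_times, random_offset(86_400_000)))
```

   In this sketch the offset and start value are drawn once per data set
   and passed in by the caller, so that every timestamp in the set is
   treated consistently; how such values are chosen, stored, or
   discarded is left open here.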
       [TODO: more intro text]

324    +-----------------------+----------------+---------------+
325    | Scheme                | Action         | Reversibility |
326    +-----------------------+----------------+---------------+
327    | Precision Degradation | Generalisation | N             |
328    | Binning               | Generalisation | N             |
329    | Random noise addition | Substitution   | N             |
330    +-----------------------+----------------+---------------+

332  4.3.1.  Precision Degradation

334    As with precision degradation in timestamps, precision degradation of
335    counters removes lower-order bits of the counters, treating all the
336    counters in a given range as having the same value.  Depending on the
337    precision reduction, this loses information about the relationships
338    between sizes of similarly-sized flows, but keeps relative magnitude
339    information.

341  4.3.2.  Binning

343    Binning can be seen as a special case of precision degradation; the
344    operation is identical, except that in precision degradation the
345    counter ranges are uniform, and in binning they need not be.  For
346    example, a common counter binning scheme for packet counters could be
347    to bin values 1-2 together, and 3-infinity together, thereby
348    separating potentially completely-opened TCP connections from
349    unopened ones.  Binning schemes are generally chosen to keep
350    precisely the amount of information required in a counter for a given
351    analysis task.

353  4.3.3.  Random Noise Addition

355    Random noise addition adds a random amount to a counter in each flow;
356    this is used to keep relative magnitude information and minimize the
357    disruption to size relationship information while avoiding
358    fingerprinting attacks against anonymisation.

360  4.4.  Anonymisation of Other Flow Fields

362    [TODO: as section 4.1]

364  5.  Parameters for the Description of Anonymisation Techniques

366    [TODO: see corresponding section of draft-ietf-psamp-sample-tech for
367    the proposed structure of this section.]

369  6.  Anonymisation Support in IPFIX

371    [TODO: Here we'll describe how the information specified above can be
372    transmitted on the wire using an option template.  The idea is to
373    scope the option to the Template ID and for each field specify which
374    are anonymised, providing info on the output characteristics of the
375    technique, and which ones aren't.]

377    [EDITOR'S NOTE: The application of multiple anonymisation techniques
378    to an IE at the same time is indicated with multiple elements of the
379    same type (in application order as in PSAMP)]

381    [EDITOR'S NOTE: for blackmarking we'll recommend not to export the
382    information at all following the data protection law principle that
383    only necessary information should be exported.]

385  7.  Security Considerations

387    [TODO: write this section.]

389  8.  IANA Considerations

391    This document contains no actions for IANA.

393  9.  References

395  9.1.  Normative References

397    [RFC5101]  Claise, B., "Specification of the IP Flow Information
398               Export (IPFIX) Protocol for the Exchange of IP Traffic
399               Flow Information", RFC 5101, January 2008.

401    [RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
402               Meyer, "Information Model for IP Flow Information Export",
403               RFC 5102, January 2008.

405  9.2.  Informative References

407    [I-D.ietf-ipfix-arch]
408               Sadasivan, G. and N. Brownlee, "Architecture Model for IP
409               Flow Information Export", draft-ietf-ipfix-arch-02 (work
410               in progress), October 2003.

412    [I-D.ietf-ipfix-as]
413               Zseby, T., "IPFIX Applicability", draft-ietf-ipfix-as-12
414               (work in progress), July 2007.

416    [I-D.ietf-ipfix-architecture]
417               Sadasivan, G., "Architecture for IP Flow Information
418               Export", draft-ietf-ipfix-architecture-12 (work in
419               progress), September 2006.

421    [I-D.ietf-ipfix-reducing-redundancy]
422               Boschi, E., "Reducing Redundancy in IP Flow Information
423               Export (IPFIX) and Packet Sampling (PSAMP) Reports",
424               draft-ietf-ipfix-reducing-redundancy-04 (work in
425               progress), May 2007.
427 [RFC3917] Quittek, J., Zseby, T., Claise, B., and S. Zander, 428 "Requirements for IP Flow Information Export (IPFIX)", 429 RFC 3917, October 2004. 431 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 432 Requirement Levels", BCP 14, RFC 2119, March 1997. 434 Authors' Addresses 436 Elisa Boschi 437 Hitachi Europe 438 c/o ETH Zurich 439 Gloriastrasse 35 440 8092 Zurich 441 Switzerland 443 Phone: +41 44 632 70 57 444 Email: elisa.boschi@hitachi-eu.com 445 Brian Trammell 446 Hitachi Europe 447 c/o ETH Zurich 448 Gloriastrasse 35 449 8092 Zurich 450 Switzerland 452 Phone: +41 44 632 70 13 453 Email: brian.trammell@hitachi-eu.com 455 Full Copyright Statement 457 Copyright (C) The IETF Trust (2008). 459 This document is subject to the rights, licenses and restrictions 460 contained in BCP 78, and except as set forth therein, the authors 461 retain all their rights. 463 This document and the information contained herein are provided on an 464 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 465 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 466 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 467 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 468 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 469 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 471 Intellectual Property 473 The IETF takes no position regarding the validity or scope of any 474 Intellectual Property Rights or other rights that might be claimed to 475 pertain to the implementation or use of the technology described in 476 this document or the extent to which any license under such rights 477 might or might not be available; nor does it represent that it has 478 made any independent effort to identify any such rights. Information 479 on the procedures with respect to rights in RFC documents can be 480 found in BCP 78 and BCP 79. 
482 Copies of IPR disclosures made to the IETF Secretariat and any 483 assurances of licenses to be made available, or the result of an 484 attempt made to obtain a general license or permission for the use of 485 such proprietary rights by implementers or users of this 486 specification can be obtained from the IETF on-line IPR repository at 487 http://www.ietf.org/ipr. 489 The IETF invites any interested party to bring to its attention any 490 copyrights, patents or patent applications, or other proprietary 491 rights that may cover technology that may be required to implement 492 this standard. Please address the information to the IETF at 493 ietf-ipr@ietf.org.