idnits 2.17.1

draft-cheng-ipfix-packet-selector-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (June 22, 2011) is 4663 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: 'RFC3917' on line 188

  == Unused Reference: '1' is defined on line 366, but no explicit reference
     was found in the text

  == Unused Reference: '3' is defined on line 371, but no explicit reference
     was found in the text

  == Unused Reference: '4' is defined on line 373, but no explicit reference
     was found in the text

  == Unused Reference: '5' is defined on line 375, but no explicit reference
     was found in the text

  == Unused Reference: '6' is defined on line 378, but no explicit reference
     was found in the text

  == Unused Reference: '7' is defined on line 380, but no explicit reference
     was found in the text

  ** Obsolete normative reference: RFC 2234 (ref. '2') (Obsoleted by RFC 4234)

  ** Downref: Normative reference to an Informational RFC: RFC 3917 (ref. '3')

  ** Downref: Normative reference to an Informational RFC: RFC 5474 (ref. '4')

  -- Possible downref: Non-RFC (?) normative reference: ref. '7'


     Summary: 5 errors (**), 0 flaws (~~), 7 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    IPFIX Working Group                                         G. Cheng
2    Internet Draft                                               J. Gong
3    Intended status: Standards Track                            W. Zhang
4    Expires: Dec 23, 2011                                          H. Wu
5                                                    Southeast University
6                                                           June 22, 2011

8                       A Composite IP Packet Selector
9                  draft-cheng-ipfix-packet-selector-00.txt

11   Abstract

13      This document specifies a composite IP packet selector for the
14      Metering Process of the IP Flow Information Export (IPFIX)
15      protocol. The composite selector combines a sampling selector that
16      uses a systematic or random sampling technique with a subsequent
17      hash-based filtering selector that computes a hash function over
18      the 5-tuple (source and destination IP address, source and
19      destination port number, protocol). By taking flow sampling into
20      account during packet selection, the composite selector reduces the
21      loss of short flows that occurs with a simple systematic or random
22      sampling selector.

24   Status of this Memo

26      This Internet-Draft is submitted to IETF in full conformance with
27      the provisions of BCP 78 and BCP 79.

29      Internet-Drafts are working documents of the Internet Engineering
30      Task Force (IETF), its areas, and its working groups. Note that
31      other groups may also distribute working documents as
32      Internet-Drafts.

34      Internet-Drafts are draft documents valid for a maximum of six
35      months and may be updated, replaced, or obsoleted by other documents
36      at any time. It is inappropriate to use Internet-Drafts as
37      reference material or to cite them other than as "work in progress."

39      The list of current Internet-Drafts is at
40      http://datatracker.ietf.org/drafts/current/.

42      The list of current Internet-Drafts can be accessed at
43      http://www.ietf.org/1id-abstracts.html

45      The list of Internet-Draft Shadow Directories can be accessed at
46      http://www.ietf.org/shadow.html

48      This Internet-Draft will expire on December 22, 2011.

50   Copyright Notice

52      Copyright (c) 2011 IETF Trust and the persons identified as the
53      document authors. All rights reserved.

55      This document is subject to BCP 78 and the IETF Trust's Legal
56      Provisions Relating to IETF Documents
57      (http://trustee.ietf.org/license-info) in effect on the date of
58      publication of this document. Please review these documents
59      carefully, as they describe your rights and restrictions with respect
60      to this document. Code Components extracted from this document must
61      include Simplified BSD License text as described in Section 4.e of
62      the Trust Legal Provisions and are provided without warranty as
63      described in the Simplified BSD License.

65   Table of Contents

67      1. Introduction
68      2. Terminology
69      3. Composite selector
70         3.1. Architecture
71         3.2. Simple sampling selector
72         3.3. Hash-based filtering selector
73         3.4. Algorithm
74         3.5. Hash Function
75      4. Formal Syntax
76      5. Security Considerations
77      References
78      Acknowledgments
79      Author's Addresses
80   1. Introduction

82      As network data rates increase and the demand for fine-grained
83      traffic measurement grows, sustained capture of network traffic at
84      line rate is difficult to perform even with expensive specialized
85      measurement hardware. Therefore, some form of data reduction at the
86      point of measurement is necessary. This can be achieved by
87      intelligent packet selection through Sampling or Filtering, as well
88      as by the use of aggregation techniques. The motivation for Sampling
89      is to select a representative subset of packets that allows accurate
90      estimates of properties of the unsampled traffic. The
91      motivation for Filtering is to remove all packets that are not of
92      interest. The motivation for aggregation is to combine data and
93      allow compact pre-defined views of the traffic. Flow-based IP
94      traffic measurements combine packet selection and
95      aggregation techniques to capture network traffic at
96      line rate on backbone links.

98      The IPFIX working group briefly describes the systematic and
99      random sampling techniques used for packet selection
100     in the metering process (Section 5.2 of RFC 3917). Used well,
101     packet sampling can efficiently reduce the amount of data
102     to be captured at the observation point. However, simple
103     sampling techniques have an inherent disadvantage in capturing
104     short flows. When every packet is selected with equal probability,
105     long flows with a large number of packets are more likely to be
106     captured than short flows with a relatively small number of packets.
107     Therefore, at a low sampling probability, simple sampling
108     techniques may lead to a serious loss of short flows in
109     flow-based IP traffic measurements. Furthermore, short flows are
110     usually produced by anomalous network events such as DDoS attacks.
111     In short, simple sampling techniques have a high loss rate for
112     short flows, which could otherwise be used to find and analyze
113     anomalous network events.

115     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
116     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
117     this document are to be interpreted as described in RFC 2119.

119  2. Terminology

121     The terminology defined here is fully consistent with all terms
122     listed in RFC 5474 and RFC 5475 but includes additional terms
123     required for the description of the specific filtering selector.

125     In addition, this document defines the following terms:
126     * Filtering: A filter is a Selector that selects a packet
127     deterministically based on the Packet Content, or its treatment, or
128     functions of these occurring in the Selection State. Two examples
129     are:

131     (i) Property Match Filtering: A packet is selected if a specific
132     field in the packet equals a predefined value.

134     (ii) Hash-based Selection: A Hash Function is applied to the Packet
135     Content, and the packet is selected if the result falls in a
136     specified range.

138     * Sampling: A Selector that is not a filter is called a Sampling
139     operation. This reflects the intuitive notion that if the selection
140     of a packet cannot be determined from its content alone, there must
141     be some type of Sampling taking place. Sampling operations can be
142     divided into two subtypes:

144     (i) Content-independent Sampling, which does not use Packet Content
145     in reaching Sampling decisions. Examples include systematic Sampling,
146     and uniform pseudorandom Sampling driven by a pseudorandom number
147     whose generation is independent of Packet Content. Note that in
148     content-independent Sampling, it is not necessary to access the
149     Packet Content in order to make the selection decision.

151     (ii) Content-dependent Sampling, in which the Packet Content is used
152     in reaching selection decisions. An application is pseudorandom
153     selection according to a probability that depends on the contents of
154     a packet field, e.g., Sampling packets with a probability dependent
155     on their TCP/UDP port numbers. Note that this is not a Filter.

157     * Hash Domain: A Hash Domain is a subset of the Packet Content and
158     the packet treatment, viewed as an N-bit string for some positive
159     integer N.

161     * Hash Range: A Hash Range is a set of M-bit strings for some
162     positive integer M that defines the range of values that the result
163     of the hash operation can take.

165     * Hash Function: A Hash Function defines a deterministic mapping
166     from the Hash Domain into the Hash Range.

168     * Hash Selection Range: A Hash Selection Range is a subset of the
169     Hash Range. The packet is selected if the action of the Hash
170     Function on the Hash Domain for the packet yields a result in the
171     Hash Selection Range.

173     * Hash-based Selection: A Hash-based Selection is Filtering
174     specified by a Hash Domain, a Hash Function, a Hash Range, and a
175     Hash Selection Range.

177     * Observed Packet Stream: The Observed Packet Stream is the set of
178     all packets observed at the Observation Point.

180     * Selected Packet Stream: A Selected Packet Stream denotes a set of
181     packets from the Observed Packet Stream that flows past some
182     specified point within the Metering Process. An example of a
183     Selected Packet Stream is the output of the selection process. Note
184     that packets selected from a stream, e.g., by Sampling, do not
185     necessarily possess a property by which they can be distinguished
186     from packets that have not been selected. For this reason, the term
187     "stream" is favored over "flow", which is defined as a set of
188     packets with common properties [RFC3917].

190     * Non Selected Packet Stream: A Non Selected Packet Stream denotes a
191     set of packets from the Observed Packet Stream that does not flow
192     past all specified points within the Metering Process. An example of
193     a Non Selected Packet Stream is the dropped packet stream of the
194     selection process. Note that packets not selected from a stream
195     likewise do not necessarily possess a distinguishing property.

196     * Packet Content: The Packet Content denotes the union of the packet
197     header (which includes link layer, network layer, and other
198     encapsulation headers) and the packet payload. At some Observation
199     Points, the link header information may not be available.

201     * 5-tuple flow information: Basic information in the packet header:
202     source IP address, destination IP address, source port number,
203     destination port number, and protocol.

205  3. Composite selector

207     The composite selector addresses the high loss rate that simple
208     sampling techniques suffer when capturing short flows.
209     It combines a sampling selector that uses a
210     systematic or random sampling technique with a subsequent hash-based
211     filtering selector that computes a hash function over the 5-tuple.
212     This section describes its architecture and each component in detail.

214  3.1. Architecture
215           +----------------------------------------+
216           | +--------+                             |
217           | |        |---Selected Packet Stream ----->
218           | |        |                             |
219           | | simple |                             |
220           | |sampling| Non       +----------+      |
221  Observed | |selector| Selected  |hash-based|      |  Selected
222  Packet---->|        |-Packet--> |filtering |-------> Packet
223  Stream   | |        | Stream    |selector  |      |  Stream
224           | +--------+           +----------+      |
225           |           Composite Selector           |
226           +----------------------------------------+
227              Figure 1: Architecture of a Composite Selector

229     The composite selector consists of two cascaded selectors: a simple
230     sampling selector followed by a specific hash-based filtering
231     selector. The latter takes the Non Selected Packet Stream of the
232     former as its input.
233     In the first stage, the sampling selector uses a simple systematic or
234     random sampling technique to select packets from the Observed Packet
235     Stream. A selected packet is exported; any other packet is
236     forwarded to the filtering selector.

238     In the second stage, the filtering selector computes a hash function
239     over the 5-tuple of each packet received from the sampling
240     selector, and selects each packet whose hash key matches one of the
241     predefined patterns.
242     The input of the composite selector is the Observed Packet Stream,
243     while the output consists of two parts: the packet stream selected
244     in the first stage, and the packets not selected in the first stage
245     but selected in the second stage.

247  3.2. Simple sampling selector

249     A sampling selector aims to select a representative
250     subset of packets. The subset is used to infer knowledge about the
251     whole set of observed packets without processing them all. The
252     selection can depend on packet position, and/or on Packet Content,
253     and/or on (pseudo) random decisions.

255     Because the sampling selector here is the same as the one the IPFIX
256     working group described in RFC 3917, this document does not repeat
257     that description.

259  3.3. Hash-based filtering selector
260     A normal hash-based filtering selector uses a hash function h to map
261     the Packet Content c, or some portion of it, onto a Hash Range R.
262     The packet is selected if h(c) is an element of S, which is a subset
263     of R called the Hash Selection Range.

265     To reduce the high loss rate in the capture of short flows,
266     the hash-based filtering selector here should take flow sampling
267     into account in packet filtering. That is, it computes the hash
268     function over the 5-tuple flow information.

270  3.4. Algorithm

272     First, the algorithm predefines a pattern set, that is, a set
273     of one or more patterns, each of which defines a hash mapping
274     range.

276     On receiving a packet, the filtering selector computes the hash key
277     of the packet's 5-tuple.

279     Then, it selects the packet if the hash key matches any pattern
280     in the set.

282  3.5. Hash Function

284     Because Hash-based Selection is applied, the BOB function MUST
285     be used for packet selection operations in order to be compliant
286     with PSAMP (RFC 5475).

288     If a Hash-based Selection with the BOB function is used with IPv4
289     traffic, the following input bytes MUST be used.
290     - IP identification field
291     - Flags field
292     - Fragment offset
293     - Source IP address
294     - Destination IP address
295     - A configurable number of bytes from the IP payload, starting at
296       a configurable offset

298     Due to the lack of suitable IPv6 packet traces, all candidate Hash
299     Functions in RFC 5476 were evaluated only for IPv4. Due to the IPv6
300     header fields and address structure, it is expected that there is
301     less randomness in IPv6 packet headers than in IPv4 headers.
302     Nevertheless, the randomness of IPv6 traffic has not yet been
303     evaluated sufficiently to get any evidence. In addition to this,
304     IPv6 traffic profiles may change significantly in the future when
305     IPv6 is used by a broader community.

307     If a Hash-based Selection with the BOB function is used with IPv6
308     traffic, the following input bytes MUST be used.

310     - Payload length (2 bytes)
311     - Bytes 10, 11, 14, 15, and 16 of the IPv6 source address
312     - Bytes 10, 11, 14, 15, and 16 of the IPv6 destination address
313     - A configurable number of bytes from the IP payload, starting at
314       a configurable offset. It is recommended to use at least 4 bytes
315       from the IP payload.

317     The payload itself does not change along the path. Even if some
318     routers process some extension headers, they are not going to strip
319     them from the packet. Therefore, the payload length is invariant
320     along the path. Furthermore, it usually differs for different
321     packets. The IPv6 address has 16 bytes. The first part is the
322     network part and contains low variation. The second part is the host
323     part and contains higher variation. Therefore, the second part of
324     the address is used. Nevertheless, the uniformity has not been
325     checked for IPv6 traffic.

327  4. Formal Syntax

329     The following syntax specification uses the augmented Backus-Naur
330     Form (BNF) as described in RFC 2234 [2].

332  5. Security Considerations

334     Security considerations arise from the choice of a Hash Function for
335     Hash-based Selection. These include a number of
336     potential attacks to craft Packet Streams that are
337     disproportionately detected and/or to discover the Hash Function
338     parameters, the vulnerabilities of different Hash Functions to these
339     attacks, and practices to minimize these vulnerabilities.

341     In addition to this, a user can gain knowledge about the start and
342     stop triggers in time-based systematic Sampling, e.g., by sending
343     test packets. This knowledge might allow users to modify their send
344     schedule in a way that their packets are disproportionately selected
345     or not selected.

347     For random Sampling, a cryptographically strong random number
348     generator should be used in order to prevent an adversary from
349     predicting the selection decision.

351     Further security threats can occur when Sampling parameters are
352     configured or communicated to other entities. The configuration and
353     reporting of Sampling parameters are out of scope of this document.
354     Therefore, the security threats that originate from this kind of
355     communication cannot be assessed with the information given in this
356     document.

358     Some of these threats can probably be addressed by keeping
359     configuration information confidential and by authenticating
360     entities that configure Sampling. Nevertheless, a full analysis and
361     assessment of threats for configuration and reporting has to be done
362     if configuration or reporting methods are proposed.

364  References

366     [1] Bradner, S., "The Internet Standards Process -- Revision 3",
367     BCP 9, RFC 2026, October 1996.
368     [2] Crocker, D., Ed., and P. Overell, Ed., "Augmented BNF for Syntax
369     Specifications: ABNF", RFC 2234, Internet Mail Consortium and Demon
370     Internet Ltd, November 1997.
371     [3] Quittek, J., Zseby, T., Claise, B., and S. Zander, "Requirements
372     for IP Flow Information Export (IPFIX)", RFC 3917, October 2004.
373     [4] Duffield, N., Ed., "A Framework for Packet Selection and
374     Reporting", RFC 5474, March 2009.
375     [5] Zseby, T., Molina, M., Duffield, N., Niccolini, S., and F.
376     Raspall, "Sampling and Filtering Techniques for IP Packet Selection",
377     RFC 5475, March 2009.
378     [6] Claise, B., Ed., "Packet Sampling (PSAMP) Protocol
379     Specifications", RFC 5476, March 2009.
380     [7] Sekar, V., Reiter, M., and H. Zhang, "Revisiting the Case
381     for a Minimalist Approach for Network Flow Monitoring", Proc. IMC,
382     November 2010.

384  Acknowledgments

386     This work is materially supported by the National Key Technology
387     Program of China under Grant No. 2008BAH37B04, the National Grand
388     Fundamental Research 973 Program of China under Grant No.
389     2009CB320505, and the National Natural Science Foundation of China
390     under Grant No. 60973123.

392  Author's Addresses

394     Guang Cheng
395     School of Computer Science and Engineering
396     Southeast University
397     Sipailou No.2, Nanjing, P.R.China
398     Phone: +86 25 83794000
399     Email: gcheng@njnet.edu.cn

401     Jian Gong
402     School of Computer Science and Engineering
403     Southeast University
404     Sipailou No.2, Nanjing, P.R.China
405     Phone: +86 25 83794000
406     Email: jgong@njnet.edu.cn

408     Weiwei Zhang
409     School of Computer Science and Engineering
410     Southeast University
411     Sipailou No.2, Nanjing, P.R.China
412     Phone: +86 25 83794000
413     Email: wwzhang@njnet.edu.cn

415     Hua Wu
416     School of Computer Science and Engineering
417     Southeast University
418     Sipailou No.2, Nanjing, P.R.China
419     Phone: +86 25 83794000
420     Email: hwu@njnet.edu.cn
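
Illustrative sketch (not part of the draft listing above). The two-stage
procedure described in Sections 3.1 and 3.4 of the draft can be sketched
roughly as follows. This is a minimal illustration under stated
assumptions, not the authors' implementation: the Packet record, the
function names, the count-based 1-in-N sampling interval, and the single
hash selection range are all hypothetical, and CRC-32 from Python's zlib
module stands in for the BOB function that Section 3.5 requires.

```python
import ipaddress
import struct
import zlib
from typing import Iterable, Iterator, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical packet record holding only the 5-tuple fields."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def five_tuple_hash(pkt: Packet) -> int:
    """Map the 5-tuple to a 32-bit hash key.

    Assumption: CRC-32 is only a stand-in here; the draft mandates BOB.
    """
    data = (
        ipaddress.ip_address(pkt.src_ip).packed
        + ipaddress.ip_address(pkt.dst_ip).packed
        + struct.pack("!HHB", pkt.src_port, pkt.dst_port, pkt.protocol)
    )
    return zlib.crc32(data)


def composite_select(
    packets: Iterable[Packet],
    interval: int,
    selection_range: Tuple[int, int],
) -> Iterator[Tuple[Packet, str]]:
    """Yield (packet, stage) for every packet the composite selector keeps.

    Stage 1 is count-based systematic sampling (every interval-th packet).
    Stage 2 sees only the packets stage 1 did not select and keeps those
    whose hash key lies in the inclusive range (low, high).
    """
    low, high = selection_range
    for index, pkt in enumerate(packets):
        if index % interval == 0:
            yield pkt, "sampling"
        elif low <= five_tuple_hash(pkt) <= high:
            yield pkt, "filtering"
```

Because the hash key depends only on the 5-tuple, stage 2 makes the same
decision for every packet of a flow. A short flow whose key falls in the
selection range is therefore kept in full, even if stage 1 selects none
of its packets.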