IPFIX Working Group G. Cheng Internet Draft J. Gong Intended status: Standards Track w. Zhang Expires: Dec 23,2011 H. Wu Southeast University June 22, 2011 A Composite IP Packet Selector draft-cheng-ipfix-packet-selector-00.txt Abstract This document specifies a composite IP packet selector in Metering Process of the IP Flow Information Export protocol (IPFIX). The composite selector is realized by combining a sampling selector using systematic or random sampling technique followed by a hash- based filtering selector computing the hash function on 5-tuples information (source/ destination IP address, source/destination port number, port). Taking flow sampling into account in packet selection, the designed composite selector could better solve the short-flow lost problem meeting in simple systematic or random sampling selector. Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/. The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html This Internet-Draft will expire on December 22, 2011. Cheng, et al Expires December 23, 2011 [Page 1] Internet-Draft A Composite IP Packet Selector June 2011 Copyright Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Cheng, et al Expires December 23, 2011 [Page 2] Internet-Draft A Composite IP Packet Selector June 2011 Table of Contents 1. Introduction.....................................................4 2. Terminology......................................................4 3. Composite selector...............................................6 3.1. Architecture................................................6 3.2. Simple sampling selector....................................7 3.3. Hash-based filtering selector...............................7 3.4. Algorithm...................................................8 3.5. Hash Function...............................................8 4. Formal Syntax....................................................9 5. Security Considerations..........................................9 References ........................................................10 Acknowledgments....................................................10 Author's Addresses.................................................10 Cheng, et al Expires December 23, 2011 [Page 3] Internet-Draft A Composite IP Packet Selector June 2011 1. Introduction With the network data rates increment and fine-grained traffic measurements need, sustained capture of network traffic at line rate is difficult to perform even with the expensive specialized measurement hardware. Therefore, some form of data reduction at the point of measure is necessary. This can be achieved by an intelligent packet selection through Sampling or Filtering, as well as use of aggregation techniques. The motivation for Sampling is to select a representative subset of packets that allow accurate estimates of properties of the unsampled whole traffic. The motivation for Filtering is to remove all packets that are not of interest. The motivation for aggregation is to combine data and allow compact pre-defined views of the traffic. Flow-based IP traffic measurements synthetically apply packet selection and aggregation techniques to achieve the capture of network traffic at line rate in the backbone link. The IPFIX working group gives a brief description about their systematic and random sampling techniques using for packet selection in metering process (section 5.2 of RFC 3917). With good use of packet sampling method, they could efficiently reduce the data amount to capture at the observation point. However, the simple sampling techniques have a natural disadvantage in the capture of short-flows. With equal probability to select each packet, long- flows with a large number of packets have more opportunity to be captured than short-flows with a relatively small number of packets. Therefore, with a lower sampling probability, the simple sampling techniques here may lead to serious lost of short-flows in flow- based IP traffic measurements. Furthermore, short-flows usually produced by anomalous network events such as DDoS attack. In a word, simple sampling techniques have a high lost rate in the capture of short-flows which could be used to find and analyze network anomalous event. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119. 2. Terminology The terminology defined here is fully consistent with all terms listed in [RFC 5474 and RFC 5475] but includes additional terms required for the description of the specific filtering selector. In addition, this document defines the following terms Cheng, et al Expires December 23, 2011 [Page 4] Internet-Draft A Composite IP Packet Selector June 2011 * Filtering: A filter is a Selector that selects a packet deterministically based on the Packet Content, or its treatment, or functions of these occurring in the Selection State. Two examples are: (i) Property Match Filtering: A packet is selected if a specific field in the packet equals a predefined value. (ii) Hash-based Selection: A Hash Function is applied to the Packet Content, and the packet is selected if the result falls in a specified range. * Sampling: A Selector that is not a filter is called a Sampling operation. This reflects the intuitive notion that if the selection of a packet cannot be determined from its content alone, there must be some type of Sampling taking place. Sampling operations can be divided into two subtypes: (i) Content-independent Sampling, which does not use Packet Content in reaching Sampling decisions. Examples include systematic Sampling, and uniform pseudorandom Sampling driven by a pseudorandom number whose generation is independent of Packet Content. Note that in content independent Sampling, it is not necessary to access the Packet Content in order to make the selection decision. (ii) Content-dependent Sampling, in which the Packet Content is used in reaching selection decisions. An application is pseudorandom selection according to a probability that depends on the contents of a packet field, e.g., Sampling packets with a probability dependent on their TCP/UDP port numbers. Note that this is not a Filter. * Hash Domain: A Hash Domain is a subset of the Packet Content and the packet treatment, viewed as an N-bit string for some positive integer N. * Hash Range: A Hash Range is a set of M-bit strings for some positive integer M that defines the range of values that the result of the hash operation can take. * Hash Function: A Hash Function defines a deterministic mapping from the Hash Domain into the Hash Range. * Hash Selection Range: A Hash Selection Range is a subset of the Hash Range. The packet is selected if the action of the Hash Function on the Hash Domain for the packet yields a result in the Hash Selection Range. Cheng, et al Expires December 23, 2011 [Page 5] Internet-Draft A Composite IP Packet Selector June 2011 * Hash-based Selection: A Hash-based Selection is Filtering specified by a Hash Domain, a Hash Function, a Hash Range, and a Hash Selection Range. * Observed Packet Stream: The Observed Packet Stream is the set of all packets observed at the Observation Point. * Selected Packet Stream: A Selected Packet Stream denotes a set of packets from the Observed Packet Stream that flows past some specified point within the Metering Process. An example of a Selected Packet Stream is the output of the selection process. Note that packets selected from a stream, e.g., by Sampling, do not necessarily possess a property by which they can be distinguished from packets that have not been selected. For this reason, the term "stream" is favored over "flow", which is defined as a set of packets with common properties [RFC3917]. * Non Selected Packet Stream: A Non Selected Packet Stream denotes a set of packets from the Observed Packet Stream that can not flow past all specified point within the Metering Process. An example of a Non Selected Packet Stream is the dropped packet stream of the selection process. Note that packets not selected from a stream. * Packet Content: The Packet Content denotes the union of the packet header (which includes link layer, network layer, and other encapsulation headers) and the packet payload. At some Observation Points, the link header information may not be available. * 5-tuple flow information: Basic information in the packet header: source IP address, destination IP address, source port number, destination port number, and port. 3. Composite selector The composite selector aims at the solution of the high lost rate problem in the capture of short-flows with the simple sampling techniques. It is realized by combining a sampling selector using systematic or random sampling technique followed by a hash-based filtering selector computing a hash function on 5-tuples information. This section detail describes its architecture and each component. 3.1. Architecture Cheng, et al Expires December 23, 2011 [Page 6] Internet-Draft A Composite IP Packet Selector June 2011 +----------------------------------------+ | +--------+ | | | |---Selected Packet Stream -----> | | | | | | simple | | | |sampling| Non +----------+ | Observed | |selector| Selected |hash-based| | Selected Packet---->| |-Packet--> |filtering |-------> Packet Stream | | | Stream |selector | | Stream | +--------+ +----------+ | | Composite Selector | +----------------------------------------+ Figure 1: Architecture of A Composite Selector The composite selector composes two cascaded selector: a simple sampling selector followed by a specific hash-based filtering selector. The latter one takes the non selected packet stream of the previous one as its input. In the first stage, the sampling selector uses simple systematic or random sampling technique to select packets from observed packet stream. If the packet is selected then export it outside, otherwise forward it to the filtering selector. In the second stage, the filtering selector computes a hash function on 5-tuples information of each packet coming from the sampling selector, and selects the packet whose hash key matching the predefined patterns. The input of the composite selector is the observed packet stream while the output composes two parts. One is the selected packet stream in the first stage; the other is the non selected stream of the first stage but selected again in the second stage. 3.2. Simple sampling selector A sampling selector is targeted at the selection of a representative subset of packets. The subset is used to infer knowledge about the whole set of observed packets without processing them all. The selection can depend on packet position, and/or on Packet Content, and/or on (pseudo) random decisions. Because the sampling selector here is the same as what the IPFIX working group described in RFC 3917, the document doesn't repeatedly introduce this part. 3.3. Hash-based filtering selector Cheng, et al Expires December 23, 2011 [Page 7] Internet-Draft A Composite IP Packet Selector June 2011 A normal hash-based filtering selector uses a hash function h to map the Packet Content c, or some portion of it, onto a Hash Range R. The packet is selected if h(c) is an element of S, which is a subset of R called the Hash Selection Range. To solve the high lost rate problem in the capture of short-flows, the hash-based filtering selector here should take flow sampling into account in packet filtering. That is on the basis of 5-tuples flow information to compute a hash function. 3.4. Algorithm First of all, the algorithm should predefine a pattern set - a set of one or more patterns while each pattern definite a hash mapping range. On receiving a packet, the filtering selector computes the hash key of the packet'5-tuple. Then, it selects the packet if the hash key matching any one pattern in the set. 3.5. Hash Function Because applying the hash-based packet Selection, BOB function MUST be used for packet selection operations in order to be compliant with PSAMP (RFC 5475). If a Hash-based Selection with the BOB function is used with IPv4 traffic, the following input bytes MUST be used. - IP identification field - Flags field - Fragment offset - Source IP address - Destination IP address - A configurable number of bytes from the IP payload, starting at a configurable offset Due to the lack of suitable IPv6 packet traces, all candidate Hash Functions in RFC5476 were evaluated only for IPv4. Due to the IPv6 header fields and address structure, it is expected that there is less randomness in IPv6 packet headers than in IPv4 headers. Nevertheless, the randomness of IPv6 traffic has not yet been evaluated sufficiently to get any evidence. In addition to this, IPv6 traffic profiles may change significantly in the future when IPv6 is used by a broader community. If a Hash-based Selection with the BOB function is used with IPv6 traffic, the following input bytes MUST be used. Cheng, et al Expires December 23, 2011 [Page 8] Internet-Draft A Composite IP Packet Selector June 2011 - Payload length (2 bytes) - Byte number 10,11,14,15,16 of the IPv6 source address - Byte number 10,11,14,15,16 of the IPv6 destination address - A configurable number of bytes from the IP payload, starting at a configurable offset. It is recommended to use at least 4 bytes from the IP payload. The payload itself is not changing during the path. Even if some routers process some extension headers, they are not going to strip them from the packet. Therefore, the payload length is invariant along the path. Furthermore, it usually differs for different packets. The IPv6 address has 16 bytes. The first part is the network part and contains low variation. The second part is the host part and contains higher variation. Therefore, the second part of the address is used. Nevertheless, the uniformity has not been checked for IPv6 traffic. 4. Formal Syntax The following syntax specification uses the augmented Backus-Naur Form (BNF) as described in RFC-2234 [2]. 5. Security Considerations Security considerations concerning the choice of a Hash Function for Hash-based Selection. Furthermore, the Hash Function has a number of potential attacks to craft Packet Streams that are disproportionately detected and/or discover the Hash Function parameters, the vulnerabilities of different Hash Functions to these attacks, and practices to minimize these vulnerabilities. In addition to this, a user can gain knowledge about the start and stop triggers in time-based systematic Sampling, e.g., by sending test packets. This knowledge might allow users to modify their send schedule in a way that their packets are disproportionately selected or not selected. For random Sampling, a cryptographically strong random number generator should be used in order to prevent that an advisory can predict the selection decision. Further security threats can occur when Sampling parameters are configured or communicated to other entities. The configuration and reporting of Sampling parameters are out of scope of this document. Therefore, the security threats that originate from this kind of communication cannot be assessed with the information given in this document. Cheng, et al Expires December 23, 2011 [Page 9] Internet-Draft A Composite IP Packet Selector June 2011 Some of these threats can probably be addressed by keeping configuration information confidential and by authenticating entities that configure Sampling. Nevertheless, a full analysis and assessment of threats for configuration and reporting has to be done if configuration or reporting methods are proposed. References [1] Bradner, S., "The Internet Standards Process-Revision 3", BCP 9, RFC 2026, October 1996. [2] Crocker, D. and Overell, P(Editors), "Augmented BNF for Syntax Specifications:ABNF", RFC 2234, Internet Mail Consortium and Demon Internet Ltd, November 1997. [3] J.Quittek, T.Zseby, B.Claise and S.Zander, "Requirements for IP Flow Information Export (IPFIX)", RFC 3917, October 2004 [4] Duffield, N., Ed., "A Framework for Packet Selection and Reporting", RFC 5474, March 2009. [5] Zseby, T., Molina, M., Duffield, D., Niccolini, S., and F. Rapall, "Sampling and Filtering Techniques for IP Packet Selection", RFC 5475, March 2009. [6] Claise, B., Ed., "Packet Sampling (PSAMP) Protocol Specifications", RFC 5476, March 2009. [7] Vyas Sekar, Michael K Reiter, Hui Zhang, "Revisiting the Case for a Minimalist Approach for Network Flow Monitoring", In Proc.IMC, November 2010. Acknowledgments This work is materially supported by the National Key Technology Program of China under Grant No.2008BAH37B04, the National Grand Fundamental Research 973 program of China under Grant No. 2009CB320505, the National Nature Science Foundation of China under Grant No. 60973123. Author's Addresses Guang Cheng School of Computer Science and Engineering Southeast University Sipailou No.2, Nanjing, P.R.China Phone: +86 25 83794000 Email: gcheng@njnet.edu.cn Jian Gong School of Computer Science and Engineering Southeast University Cheng, et al Expires December 23, 2011 [Page 10] Internet-Draft A Composite IP Packet Selector June 2011 Sipailou No.2, Nanjing, P.R.China Phone: +86 25 83794000 Email: jgong@njnet.edu.cn Weiwei Zhang School of Computer Science and Engineering Southeast University Sipailou No.2, Nanjing, P.R.China Phone: +86 25 83794000 Email: wwzhang@njnet.edu.cn Hua Wu School of Computer Science and Engineering Southeast University Sipailou No.2, Nanjing, P.R.China Phone: +86 25 83794000 Email: hwu@njnet.edu.cn Cheng, et al Expires December 23, 2011 [Page 11]