INTERNET-DRAFT                                             Nick Duffield
draft-duffield-framework-papame-01                      Albert Greenberg
                                                   Matthias Grossglauser
Feb 27, 2002                                            Jennifer Rexford
                                                    AT&T Labs - Research

               A Framework for Passive Packet Measurement

Copyright (C) The Internet Society (2001). All Rights Reserved.

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

A wide range of traffic engineering and troubleshooting tasks rely on reliable, timely, and detailed traffic measurements. We describe a passive packet measurement framework that (a) is general enough to serve as the basis for a wide range of operational tasks, and (b) relies on a small set of primitives that facilitate uniform deployment in router interfaces or dedicated measurement devices, even at very high speeds. This document describes the motivation for such a framework through several operational examples, defines the measurement primitives (filtering, sampling, and hashing), and illustrates their use.

1 Motivation

Framework: This document describes a framework for a standard set of capabilities for network elements to sample packets and report on them. One motivation to standardize these capabilities comes from the requirement for measurement-based support for network management and control across multivendor domains.
This requires domain-wide consistency in the types of sampling schemes available, the manner in which the resulting measurements are presented, and consequently, consistency of the interpretation that can be put on them.

Relation to other work: The measurement capabilities are positioned as suppliers of packet samples to higher-level consumers, including both remote collectors and applications, and on-board measurement-based applications. Indeed, development of the standards within the framework described here should take into account the measurement requirements of standards in other IETF WGs, including IPPM and TEWG. Conversely, we expect that aspects of this framework not specifically concerned with the central issue of packet sampling may be able to leverage work in other WGs. The prime example is the format and export of measurement reports, which may leverage the work of IPFIX.

Applications: We first describe several representative operational applications that require traffic measurements at various levels of temporal and spatial granularity.

Example 1: Troubleshooting

A network operator typically monitors aggregate statistics on a per-link basis. Such aggregate statistics may include the total number of packets and bytes, and the number of dropped packets and bytes. These statistics are typically moving averages over relatively long time windows (e.g., 5 minutes), and serve as a coarse-grain indication of the operational health of the network. The most common method of obtaining such measurements is through the appropriate SNMP MIBs (MIB-II and vendor-specific MIBs).

Suppose an operator detects a link that is persistently overloaded and experiences significant packet drop rates. There is a wide range of potential causes: routing parameters (e.g., OSPF link weights) that are poorly adapted to the traffic matrix, e.g., because of a shift in that matrix; a denial of service attack or a flash crowd; or a routing problem (link flapping).
In most cases, aggregate link statistics are not sufficient to distinguish between such causes, or to decide on an appropriate corrective action. For example, if routing over two links is unstable, and the links flap between being overloaded and inactive, this might be averaged out in a 5-minute window, indicating moderate loads on both links.

Hence, the operator must be able to drill down into the traffic on a link, and obtain measurements that are more fine-grained both in space and in time. The operator has to be able to determine how many bytes/packets are generated for each source/destination address, port number, and prefix, or other attributes, such as protocol number, MPLS forwarding equivalence class (FEC), type of service, etc. This allows the operator to pinpoint precisely the nature of the offending traffic. For example, in the case of a DDoS attack, the operator would see a significant fraction of traffic with an identical destination address.

Example 2: Characterizing Demand

Traffic engineering has two goals: optimizing the quality of service provided to customers, and optimizing the use of network resources. This is achieved through network-wide control of routing, traffic classification and differentiation, and resource allocation. Traffic measurements are necessarily part of such a closed control loop. Specifically, the operator has to be able to measure the total network-wide traffic demand at several levels of granularity and time scales. For example, in order to optimize intradomain routing by modifying OSPF link weights or by configuring MPLS tunnels, the volume per ingress-egress pair (the traffic matrix) has to be measured.

At a longer time scale (weeks to months), measurements also drive topology and capacity planning and the management of peering agreements. Topology and capacity planning involves upgrading links and routers and modifying the network topology to be well-adapted to the prevailing traffic pattern.
This includes deciding where new customers should be attached. A natural representation for traffic demand to drive topology and capacity planning is a previous/next-hop AS traffic matrix, which characterizes demand in terms of neighboring ASs. Managing peering agreements, i.e., making strategic decisions about setting up and retiring peering agreements, and modifying the terms of existing ones (e.g., where to interconnect with peers), benefits from a source/destination AS traffic matrix, because the set of neighboring ASs may change as a result of peering management.

Therefore, in general, it is necessary to obtain averages over various time scales of the entire traffic carried by a network domain. The spatial resolution of these averages includes the source and destination IP address, AS, prefix, port number, and the previous and next hop AS with respect to the measurement domain. Furthermore, if a service provider uses multiple service types, it should also be possible to measure these matrices individually per service type.

Example 3: Direct Observation of Network Behavior

In certain circumstances, precise information about the spatial flow of traffic through the network domain is required to detect and diagnose problems and verify correct network behavior. For example, in the case of the overloaded link in Example 1, it would be very helpful to know the precise set of paths that packets traversing this link follow. This would readily reveal a routing problem such as a loop, or a link with a misconfigured weight.

More generally, complex diagnosis scenarios can benefit from measurement of traffic intensities (and other attributes) over a set of paths that is constrained in some way.
For example, if a multihomed customer complains about performance problems on one of the access links from a particular source address prefix, the operator should be able to examine in detail the traffic from that source prefix which also traverses the specified access link towards the customer.

While it is in principle possible to obtain the spatial flow of traffic through auxiliary network state information, e.g., by downloading routing and forwarding tables from routers, this information is often unreliable, outdated, voluminous, and contingent on a network model. For operational purposes, a direct observation of traffic flow is more reliable, as it does not depend on any such auxiliary information. For example, if there were a bug in a router's software, direct observation would allow the operator to diagnose the effect of this bug, while an indirect method would not.

2 Goals

The main goal of this proposal is to define a measurement framework that relies on three canonical primitives: packet sampling, filtering, and hashing. A wide spectrum of applications, including those described in the previous section, is enabled by measurements obtained through combinations of these three primitives. Furthermore, a sampling device based on these measurement primitives is relatively simple, as (a) it requires only minimal per-packet processing, and (b) it requires little (local) memory. Therefore, the proposed framework represents an effective tradeoff between implementation complexity and the range of traffic engineering applications and other operational tasks it enables.

More generally, the following goals motivate the proposed framework:

o Greatly assist a very wide range of applications that can be built on traffic measurement (Section 4), from a very small set of primitives implemented ubiquitously.

o Aim for ubiquity, by including in the minimal set of primitives functions that can be implemented at maximal line rate with minimal additional state.
o Aim for ubiquity, by not forcing tight integration with packet control actions (policing, marking, shaping, queueing).

o Allow for extensibility, which can be applied where needed (depending on the application) for enhanced functionality.

o Aim for flexibility in data export format and options.

o A common data stream must support different applications, teams and organizations (e.g., traffic engineering, marketing, billing) concurrently.

o Allow for flexibility in implementation. In particular, export of local router state information can be decoupled from export of usage information.

o Ease of configuration of sampling and export parameters, e.g., for automated remote reconfiguration in response to measurements.

o Allow transparent interpretation of measurements through inclusion of sampling configuration in the reporting stream.

o Allow robust interpretation of measurements with respect to reports missing due to loss in transport, or omission at the measurement device.

3 Measurement Functionality

3.1 Measurement Information Flow

The framework for passive measurement has three main parts: the selection of packets for measurement, the creation and export of measurement reports, and the content and format of the measurement records.

Because of the increasing number of distinct measurement applications, we believe it is desirable to set up parallel measurement information flows from the stream of packets. Each information flow should consist of independently-configurable pipelines for selecting packets and exporting measurement records. The processing of each measurement information flow should, as far as possible, be independent. However, resource constraints may prevent complete reporting on a packet selected for multiple information flows. In this case, reporting for the packet must be complete for at least one information flow; other information flows need only report that they selected the packet.
The priority amongst information flows to report packets must be configurable.

3.2 Packet Selection

The function of packet selection is to select a subset out of the stream of all packets. Selection may be used to select a subset of packets of interest based on their content, and/or to reduce the rate of packets into the measurement flow regardless of content. Packet selection is performed by combining the measurement primitives described below. In this document we do not set any restrictions on the form these combinations can take.

o Hashing: A hashing function operates on a subset of packet bits and associates the resulting hash with the packet. Bit positions can be excluded from the input to the hashing function by masking. This ability would be used, for example, by applications that require the hash to be independent of packet header fields, such as TTL or header CRC, that may change as the packet passes through the network.

o Filtering: Filtering is accomplished by applying mask/match operations to any combination of bit positions from the packet and the configured hashes. The mask/match operation is configurable independently for each filter. Higher-level interfaces to the mask/match primitive may be used to specify masks and matches for particular fields, for example, for IP addresses and/or TCP/UDP port numbers.

o Sampling: Each sampler will be individually configurable to sample packets with a certain probability p. Examples are probabilistic sampling, in which each packet is selected quasirandomly with probability p, and deterministic sampling, in which packets are sampled periodically with period 1/p. In some sampling schemes, the sampling probability may depend on the packet content. Sampling at full line rate with probability p=1 is not excluded in principle, although resource constraints may not support it in practice.
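The three primitives can be illustrated with a minimal sketch. It is not part of the framework and makes several assumptions: packets are treated as byte strings, SHA-256 stands in for an unspecified hashing function, and all names (masked_hash, mask_match, ProbabilisticSampler, PeriodicSampler, IPV4_INVARIANT_MASK) are hypothetical. A line-rate implementation would use a far cheaper hash.

```python
import hashlib
import random

# Hypothetical mask for a 20-byte IPv4 header: excludes TTL (byte 8) and the
# header checksum (bytes 10-11), which change from hop to hop.
IPV4_INVARIANT_MASK = bytes(0x00 if i in (8, 10, 11) else 0xFF for i in range(20))


def masked_hash(packet: bytes, mask: bytes) -> int:
    """Hash the packet bits selected by `mask`; bits where the mask is 0 are excluded.

    Bytes beyond the end of the mask are ignored. SHA-256 is an arbitrary
    stand-in; the draft does not name a hashing function.
    """
    selected = bytes(b & m for b, m in zip(packet, mask))
    return int.from_bytes(hashlib.sha256(selected).digest()[:4], "big")


def mask_match(data: bytes, mask: bytes, match: bytes) -> bool:
    """Return True if `data` equals `match` on every bit position set in `mask`."""
    if len(data) < len(mask) or len(match) != len(mask):
        return False
    return all((d & m) == (v & m) for d, m, v in zip(data, mask, match))


class ProbabilisticSampler:
    """Selects each packet independently with probability p."""

    def __init__(self, p: float, seed=None):
        self.p = p
        self._rng = random.Random(seed)

    def select(self) -> bool:
        return self._rng.random() < self.p


class PeriodicSampler:
    """Selects every `period`-th packet, i.e. period = 1/p."""

    def __init__(self, period: int):
        if period < 1:
            raise ValueError("period must be at least 1")
        self.period = period
        self._count = 0

    def select(self) -> bool:
        self._count += 1
        if self._count == self.period:
            self._count = 0
            return True
        return False
```

Under these assumptions, two copies of a packet that differ only in TTL and header checksum receive the same hash under IPV4_INVARIANT_MASK, and a filter on the destination address is a mask with 0xFF in bytes 16 to 19 of the header and the desired address in the same positions of the match value.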
In order to be able to function at line rates, each measurement primitive takes as its input only the packet itself, or quantities that have been calculated from the packet previously by other measurement primitives. Router state is not assumed to be available to the measurement primitives.

3.3 Report Generation and Export

Although the primary goal of this draft is to set up a framework for the sampling operations themselves, utilization of the resulting measurements places requirements on the information available for export, and on the methods by which reports are exported. Any scheme that can accommodate the framework described in this section and Section 3.4 is a suitable candidate for the job.

Report preparation involves selecting fields of interest from each sampled packet, then adjoining subsidiary information (e.g., hash values, byte and packet counts, timestamps, etc.) from the selection process and router state information. The router state values may depend on the packet content (e.g., the IP prefix or Autonomous System associated with the destination address in the IP header, the input and output interfaces that carried the packet, etc.). Reports may also include subsidiary quantities calculated as a function of the selected packet and the router state. To simplify the design, some of the subsidiary information and router state may be incorporated when the records are exported, rather than when the packets are selected. However, all such router state information must be included for reporting in a timely manner, in order that it reflects the actual state encountered by the packet.

The device generating the measurement records is configured to transmit the data to one or more collection systems, identified by IP address and port number. Exporting these records to other systems introduces several practical issues that have important implications for the analysis of the data:

o Transport: Two basic modes of transport are possible: unreliable and reliable.
In the unreliable mode, a completed measurement packet from the export module is encapsulated into a UDP packet and sent to the configured address (the collection system). The sending device does not need to keep state about this packet (other than possibly a sequence number to detect lost measurement packets). In the reliable mode, the device exports records via a TCP connection to the collection system. The device must be capable of receiving packets (such as acknowledgments) from the collection system and retransmitting lost packets.

o Export rate: The device should impose a (configurable) limit on the number of measurement records exported per unit time. Otherwise, the measurement device could overload the network and the collection system. This problem would be exacerbated in the reliable transport mode, where the device would retransmit any lost packets (thereby imposing an additional load on the network). At times, the device may generate new records faster than the allowed export rate. In this situation, the device should discard the excess records rather than transmitting them to the collection system. The device may record information (such as sequence numbers, or packet and byte counter values accumulated at the inputs and outputs of a packet selector) to aid the collection system in compensating for the missing data in any subsequent analysis.

o Maximum delay in exporting records: The device may queue measurement records in order to export multiple records in a single packet. However, the device should bound the delay in exporting measurement records, even if the number of records is small. This is important for two reasons. First, having an upper bound on the export delay ensures that the collection system has up-to-date information about the sampled packets. Second, in some scenarios, the device may associate a timestamp with the record(s) at the export stage.
Limiting the delay in exporting the records places a tight bound on the inaccuracy in the timestamp information. The device can impose a (configurable) Maximum Transmission Unit (MTU) size for reports.

o Local Export: Packet reports may also be directly exported to on-board measurement-based applications, for example those that form composite statistics from more than one packet. Local export may be presented through an interface direct to the higher-level applications, i.e., without employing the transport used for off-board export.

3.4 Measurement Record Format

Report export involves bundling one or more measurement records into a packet and sending it to the collection system. The report includes several types of information, such as:

o Per-packet information: The measurement record for each sampled packet includes various header fields (e.g., IP addresses, port numbers, ToS bits, TCP flags, etc.), as well as subsidiary information (e.g., timestamp, input and output links, other router state, hash values, etc.).

o Configuration information: The stream of reports should provide information about the configuration of the measurement flow (e.g., the sampling frequency, the sampling technique and associated parameters, the mask/match filter, etc.). This ensures that the measurement data are self-describing and allows the collection system to analyze the measurement data without a separate feed of the configuration state. Changes in configuration must be immediately reflected in the report stream.

o Aggregate information: The reports should include sufficient information for the collection system to account for discarded measurement records and lost exported packets. For example, the reports could include sequence numbers to enable the collection machine to detect lost reports. The reports could include a count of the number of bytes and packets that matched the filter, or that passed both the filtering and sampling stages.
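As an illustration of how a collection system might use such aggregate information, the following sketch detects gaps in report sequence numbers and derives a scaling factor from a count of filter-matched packets. The function names and the assumption that sequence numbers increase by one per report without wraparound are hypothetical; the draft does not define a report format.

```python
def missing_sequence_numbers(received):
    """Return the sequence numbers absent between the lowest and highest received.

    Assumes the exporter increments the sequence number by one per report and
    that it does not wrap around. Reports lost before the first or after the
    last received report cannot be detected this way.
    """
    seen = set(received)
    if not seen:
        return []
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


def scale_factor(matched_packets: int, records_received: int) -> float:
    """Ratio of packets that matched the filter to records actually received.

    Multiplying a quantity summed over the received records (e.g., bytes) by
    this factor gives a rough estimate for all matched packets, whether the
    shortfall is due to sampling, discarded records, or lost reports.
    """
    if records_received <= 0:
        raise ValueError("no records received")
    return matched_packets / records_received
```

For example, if reports numbered 1, 2, 4 and 7 arrive, reports 3, 5 and 6 are flagged as missing; if the device counted 1000 matched packets and 10 records arrived, each record would stand for roughly 100 packets.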
To conserve storage space and network bandwidth, the device may compress the measurement records as they are stored or exported. Compression should be quite effective since the sampled packets may share many fields in common (especially if the filter focuses on packets with certain values in particular header fields).

3.5 Configuration and Management

All configuration parameters associated with the sampling of packets and export of measurements are to be contained in a MIB. A secure protocol is to be used to access the MIB for reconfiguration and retrieval of the parameters.

4 Applications

We describe a representative set of operational applications enabled by the passive measurement device described in the previous section, by referring back to the examples in Section 1.

Example 1: Troubleshooting

Packet sampling is ideally suited to determine the composition of the traffic (e.g., on a link) in terms of various attributes (source and destination address and port numbers, prefix, protocol number, type of service, etc.). Typically, unfiltered sampling would be used to obtain a coarse-grained view of the traffic on a link. Once the characteristics of an interesting subset of traffic (e.g., a service type, or a source address prefix corresponding to some customer) have been identified, the resolution can be refined by applying a filter that selects this traffic, and by boosting the sampling rate correspondingly. In this way, the traffic can be examined and characterized ("sliced and diced") arbitrarily.

Example 2: Characterizing Demand

Characterizing demand for an entire network domain will likely be achieved by sampling packets on all the ingress links, or some other well-chosen cut set. The sampling rate would typically be chosen relatively low, given that we are interested in averages over longer time scales, e.g., to detect significant systemic shifts in demand not due to random fluctuations.
Some of the subsidiary fields included in reports, such as source and destination AS, and input and output link, will be useful, depending on the spatial granularity of demand characterization.

Example 3: Direct Observation of Network Behavior

Direct observation of the spatial flow of traffic through the domain can be achieved through a method called trajectory sampling, which relies on the hash function to make sampling decisions [DG01]. Specifically, the hash function is computed over a predefined set of fields of the IP packet header and payload. If the hash value for a packet falls within a configurable interval [a,b], then the packet should be sampled; otherwise, it should not be sampled. This feature yields the full paths followed by sampled packets, by ensuring that a packet is sampled on every router it traverses, or on no router at all. This requires that the hash function and the set of packet fields over which it is computed are the same everywhere.

A similar use of hash functions has also been considered for hash-based IP traceback of distributed denial-of-service (DDoS) attacks [SPSJTKS01].

5 References

[B88] R. T. Braden, A Pseudo-Machine for Packet Monitoring and Statistics, Proc. ACM SIGCOMM 1988.

[DG01] N. G. Duffield and M. Grossglauser, Trajectory Sampling for Direct Traffic Observation, IEEE/ACM Trans. on Networking, 9(3), pp. 280-292, June 2001.

[SPSJTKS01] A. C. Snoeren, C. Partridge, L. A. Sanchez, C. E. Jones, F. Tchakountio, S. T. Kent, W. T. Strayer, Hash-Based IP Traceback, Proc. ACM SIGCOMM 2001, San Diego, CA, September 2001.

6 Authors' Addresses

Nicholas G. Duffield
AT&T Labs - Research
Room B-139
180 Park Ave
Florham Park NJ 07932, USA
Phone: +1 973-360-8726
Email: duffield@research.att.com

Albert Greenberg
AT&T Labs - Research
Room A-161
180 Park Ave
Florham Park NJ 07932, USA
Phone: +1 973-360-8730
Email: albert@research.att.com

Matthias Grossglauser
AT&T Labs - Research
Room A-167
180 Park Ave
Florham Park NJ 07932, USA
Phone: +1 973-360-7172
Email: mgross@research.att.com

Jennifer Rexford
AT&T Labs - Research
Room A-169
180 Park Ave
Florham Park NJ 07932, USA
Phone: +1 973-360-8728
Email: jrex@research.att.com

7 Intellectual Property Statement

AT&T Corp. may own intellectual property applicable to this contribution. AT&T is currently reviewing its licensing intent relative to the Intellectual Property and will notify the IETF when AT&T has made a determination of that intent.

8 Full Copyright Statement

Copyright (C) The Internet Society (1999). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.