INTERNET-DRAFT                                            Nick Duffield
draft-duffield-framework-papame-01                     Albert Greenberg
                                                  Matthias Grossglauser
Feb 27, 2002                                           Jennifer Rexford
                                                   AT&T Labs - Research


               A Framework for Passive Packet Measurement


    Copyright (C) The Internet Society (2001).  All Rights Reserved.

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   A wide range of traffic engineering and troubleshooting tasks rely
   on reliable, timely, and detailed traffic measurements. We describe
   a passive packet measurement framework that (a) is general enough
   to serve as the basis for a wide range of operational tasks, and
   (b) relies on a small set of primitives that facilitate uniform
   deployment in router interfaces or dedicated measurement devices,
   even at very high speeds. This document describes the motivation
   for such a framework through several operational examples, defines
   the measurement primitives (filtering, sampling, and hashing), and
   illustrates their use.

1 Motivation

   Framework: This document describes a framework for a standard
   set of capabilities for network elements to sample packets and
   report on them.  One motivation to standardize these capabilities
   comes from the requirement for measurement-based support for
   network management and control across multivendor domains.  This
   requires domain wide consistency in the types of sampling schemes
   available, the manner in which the resulting measurements are
   presented, and consequently, consistency of the interpretation that
   can be put on them. 

   Relation to other work: The measurement capabilities are positioned
   as suppliers of packet samples to higher level consumers, including
   both remote collectors and applications, and on-board
   measurement-based applications.  Indeed, development of the
   standards within the framework described here should take into
   account the measurement requirements of standards in other IETF
   WGs, including IPPM and TEWG. Conversely, we expect that aspects of
   this framework not specifically concerned with the central issue of
   packet sampling may be able to leverage work in other WGs. The
   prime example is the format and export of measurement reports,
   which may leverage the work of IPFIX.

   Applications: We first describe several representative operational
   applications that require traffic measurements at various levels of
   temporal and spatial granularity.

   Example 1: Troubleshooting

   A network operator typically monitors aggregate statistics on a
   per-link basis. Such aggregate statistics may include the total
   number of packets and bytes, and the number of dropped packets and
   bytes. These statistics are typically moving averages over
   relatively long time windows (e.g., 5 minutes), and serve as a
   coarse-grained indication of the operational health of the network.
   The most common method of obtaining such measurements is through
   the appropriate SNMP MIBs (MIB-II and vendor-specific MIBs).

   Suppose an operator detects a link that is persistently overloaded
   and experiences significant packet drop rates. There is a wide
   range of potential causes: routing parameters (e.g., OSPF link
   weights) that are poorly adapted to the traffic matrix, e.g.,
   because of a shift in that matrix; a denial of service attack or a
   flash crowd; or a routing problem (link flapping). In most cases,
   aggregate link statistics are not sufficient to distinguish between
   such causes and to decide on an appropriate corrective action. For
   example, if routing over two links is unstable, and the links flap
   between being overloaded and inactive, this might be averaged out
   in a 5-minute window, indicating moderate loads on both links.

   Hence, the operator must be able to drill down into the traffic on
   a link, and obtain measurements that are more fine-grained both in
   space and in time. The operator has to be able to determine how
   many bytes/packets are generated for each source/destination
   address, port number, and prefix, or other attributes, such as
   protocol number, MPLS forwarding equivalence class (FEC), type of
   service, etc. This allows the operator to pinpoint precisely the
   nature of the offending traffic. For example, in the case of a DDoS
   attack, the operator would see a significant fraction of traffic
   with an identical destination address.


   Example 2: Characterizing Demand

   Traffic engineering has two goals: optimizing the quality of
   service provided to customers, and optimizing the use of network
   resources.  This is achieved through network-wide control of
   routing, traffic classification and differentiation, and resource
   allocation. Traffic measurements are necessarily part of such a
   closed control loop.  Specifically, the operator has to be able to
   measure the total network-wide traffic demand at several levels of
   granularity and time scales.

   For example, in order to optimize intradomain routing by modifying
   OSPF link weights or by configuring MPLS tunnels, the volume per
   ingress-egress pair (the traffic matrix) has to be measured.  At a
   longer time scale (weeks to months), measurements also drive
   topology and capacity planning and the management of peering
   agreements.  Topology and capacity planning involves upgrading
   links and routers and modifying the network topology to be
   well-adapted to the prevailing traffic pattern. This includes
   deciding where new customers should be attached. A natural
   representation for traffic demand to drive topology and capacity
   planning is a previous/next-hop AS traffic matrix, which
   characterizes demand in terms of neighboring ASs.  Managing peering
   agreements, i.e., making strategic decisions about setting up and
   retiring peering agreements, and modifying the terms of existing
   ones (e.g., where to interconnect with peers), benefits from a
   source/destination AS traffic matrix, because the set of
   neighboring ASs may change as a result of peering management.

   Therefore, in general, it is necessary to obtain averages over
   various time scales of the entire traffic carried by a network
   domain.  The spatial resolution of these averages includes the
   source and destination IP address, AS, prefix, port number, and the
   previous and next hop AS with respect to the measurement domain.
   Furthermore, if a service provider uses multiple service types, it
   should also be possible to measure these matrices individually per
   service type.


   Example 3: Direct Observation of Network Behavior

   In certain circumstances, precise information about the spatial
   flow of traffic through the network domain is required to detect
   and diagnose problems and verify correct network behavior.  For
   example, in the case of the overloaded link in Example 1, it would
   be very helpful to know the precise set of paths that packets
   traversing this link follow. This would readily reveal a routing
   problem such as a loop, or a link with a misconfigured weight. More
   generally, complex diagnosis scenarios can benefit from measurement
   of traffic intensities (and other attributes) over a set of paths
   that is constrained in some way. For example, if a multihomed
   customer complains about performance problems on one of the access
   links from a particular source address prefix, the operator should
   be able to examine in detail the traffic from that source prefix
   which also traverses the specified access link towards the
   customer.

   While it is in principle possible to obtain the spatial flow of
   traffic through auxiliary network state information, e.g., by
   downloading routing and forwarding tables from routers, this
   information is often unreliable, outdated, voluminous, and
   contingent on a network model. For operational purposes, a direct
   observation of traffic flow is more reliable, as it does not depend
   on any such auxiliary information. For example, if there were a bug
   in a router's software, direct observation would allow the operator
   to diagnose the effect of this bug, while an indirect method would
   not.

2 Goals

   The main goal of this proposal is to define a measurement framework
   that relies on three canonical primitives: packet sampling,
   filtering, and hashing.  A wide spectrum of applications, including
   those described in the previous section, is enabled by
   measurements obtained through combinations of these three
   primitives.  Furthermore, a sampling device based on these
   measurement primitives is relatively simple, as (a) it requires
   only minimal per-packet processing, and (b) it requires little
   (local) memory. Therefore, the proposed framework represents an
   effective tradeoff between implementation complexity and the range
   of traffic engineering applications and other operational tasks it
   enables.

   More generally, the following goals motivate the proposed framework:

   o Greatly assist a very wide range of applications that can be
   built on traffic measurement (Section 4), from a very small set of
   primitives implemented ubiquitously.

   o Aim for ubiquity, by including in the minimal set of primitives
   functions that can be implemented at maximal line rate with minimal
   additional state.

   o Aim for ubiquity, by not forcing tight integration with packet
   control actions (policing, marking, shaping, queueing).

   o Allow for extensibility, which can be applied where needed
   (depending on the application) for enhanced functionality.

   o Aim for flexibility in data export format and options.

   o A common data stream must support different applications, teams
   and organizations (e.g., traffic engineering, marketing, billing)
   concurrently.

   o Allow for flexibility in implementation.  In particular, export
   of local router state information can be decoupled from export of
   usage information.

   o Ease of configuration of sampling and export parameters, e.g., for
   automated remote reconfiguration in response to measurements.

   o Allow transparent interpretation of measurements through
   inclusion of sampling configuration in the reporting stream.

   o Allow robust interpretation of measurements with respect to
   reports missing due to loss in transport, or omission at the
   measurement device.

3 Measurement Functionality

   3.1 Measurement Information Flow

   The framework for passive measurement has three main parts: the
   selection of packets for measurement, the creation and export of
   measurement reports, and the content and format of the measurement
   records.  Because of the increasing number of distinct measurement
   applications, we believe it is desirable to set up parallel
   measurement information flows from the stream of packets.  Each
   information flow should consist of independently-configurable
   pipelines for selecting packets and exporting measurement records.
   
   The processing of each measurement information flow should, as far
   as possible, be independent. However, resource constraints may
   prevent complete reporting on a packet selected for multiple
   information flows. In this case, reporting for the packet must be
   complete for at least one information flow; other information flows
   need only report that they selected the packet. The priority
   amongst information flows to report packets must be configurable.
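
   The priority rule above can be illustrated with a minimal Python
   sketch. It is not part of the framework: the names InfoFlow,
   build_reports and full_budget are hypothetical, and the convention
   that a lower priority value wins is an assumption made only for
   this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class InfoFlow:
    name: str
    priority: int                      # assumed convention: lower value wins
    selects: Callable[[bytes], bool]   # this flow's packet selection

def build_reports(packet: bytes, flows: List[InfoFlow],
                  full_budget: int) -> Dict[str, str]:
    """Decide, per information flow, how one packet is reported.

    full_budget is a hypothetical stand-in for resource constraints:
    the number of complete reports the device can afford for this packet.
    At least one selecting flow always gets a complete report.
    """
    selected = sorted((f for f in flows if f.selects(packet)),
                      key=lambda f: f.priority)
    budget = max(1, full_budget)
    reports = {}
    for rank, flow in enumerate(selected):
        # Flows beyond the budget only report that they selected the packet.
        reports[flow.name] = "full" if rank < budget else "selected-only"
    return reports

flows = [
    InfoFlow("billing", priority=2, selects=lambda p: True),
    InfoFlow("troubleshooting", priority=1,
             selects=lambda p: p[:1] == b"\x45"),
]
result = build_reports(b"\x45\x00", flows, full_budget=1)
print(result)
```

   With a budget of one complete report, the higher-priority flow
   receives the full report and the other flow records only that it
   selected the packet.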

   3.2 Packet Selection

   The function of packet selection is to select a subset out of the
   stream of all packets.  Selection may be used to select a subset of
   packets of interest based on their content, and/or to reduce the
   rate of packets into the measurement flow regardless of content.
   Packet selection is performed through a combination of the
   measurement primitives described below. In this document we do not
   set any restrictions on the form these combinations can take.

   o Hashing:

   A hashing function operates on a subset of packet bits and
   associates the resulting hash with the packet.  Bit positions can
   be excluded from the input to the hashing function by masking. This
   ability would be used, for example, by applications that require
   the hash to be independent of packet header fields, such as TTL or
   header checksum, that are mutable during the packet's passage
   through the network.

   o Filtering:

   Filtering is accomplished by applying mask/match operations to any
   combination of bit positions from the packet and the configured
   hashes.  The mask/match operation is configurable independently for
   each filter. Higher level interfaces to the mask/match primitive
   may be used to specify masks and matches for particular fields, for
   example, for IP addresses and/or TCP/UDP port numbers.

   o Sampling:

   Each sampler will be individually configurable to sample packets
   with a certain probability p.  Examples are probabilistic sampling,
   in which each packet is selected quasirandomly with probability p,
   and deterministic sampling, in which packets are sampled
   periodically with period 1/p. In some sampling schemes, the
   sampling probability may depend on the packet content. Sampling at
   full line rate with probability p=1 is not excluded in principle,
   although resource constraints may not support it in practice.
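
   The hashing and filtering primitives above can be sketched in a few
   lines of Python. This is an illustration under stated assumptions
   only: CRC32 stands in for an unspecified hash function, the packet
   is a toy 4-byte layout whose first byte plays the role of the TTL,
   and the names masked_hash and mask_match are hypothetical.

```python
import zlib

def masked_hash(packet: bytes, mask: bytes) -> int:
    """Hash only the bit positions whose mask bit is 1.

    CRC32 is a stand-in hash; bytes beyond the end of the mask are ignored.
    """
    masked = bytes(b & m for b, m in zip(packet, mask))
    return zlib.crc32(masked)

def mask_match(data: bytes, mask: bytes, match: bytes) -> bool:
    """True if data equals match on every bit position selected by mask."""
    if len(data) < len(mask) or len(match) != len(mask):
        return False
    return all((d & m) == (v & m) for d, m, v in zip(data, mask, match))

# Toy packet layout (assumed): byte 0 = TTL, bytes 1-3 = invariant fields.
at_hop_1 = bytes([64, 10, 0, 1])
at_hop_2 = bytes([63, 10, 0, 1])        # same packet after one TTL decrement
exclude_ttl = bytes([0x00, 0xFF, 0xFF, 0xFF])

# Masking out the mutable byte makes the hash identical at both hops.
same_hash = (masked_hash(at_hop_1, exclude_ttl)
             == masked_hash(at_hop_2, exclude_ttl))

# Filter on byte 1 only: select packets whose byte 1 equals 10.
byte1_mask = bytes([0x00, 0xFF, 0x00, 0x00])
byte1_is_10 = mask_match(at_hop_1, byte1_mask, bytes([0, 10, 0, 0]))
```

   A filter over configured hashes could be expressed the same way, by
   applying mask_match to the hash value serialized as bytes.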
   
   In order to be able to function at line rates, each measurement
   primitive takes as its input only a packet itself, or quantities
   that have been calculated from the packet previously by other
   measurement primitives. Router state is not assumed to be available
   to the measurement primitives.
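
   As a rough sketch of how samplers might be combined with a filter
   into one selection pipeline, consider the following Python
   fragment. Every stage takes only the packet as input. The function
   names, the seeded pseudo-random generator, and the filter-then-sample
   ordering are assumptions chosen for the example; the framework
   itself does not restrict how primitives are combined.

```python
import random
from typing import Callable, Iterable, Iterator, List

Stage = Callable[[bytes], bool]

def probabilistic_sampler(p: float, seed: int = 0) -> Stage:
    """Select each packet independently with probability p."""
    rng = random.Random(seed)
    return lambda packet: rng.random() < p

def periodic_sampler(period: int) -> Stage:
    """Select every period-th packet offered to this sampler (period = 1/p)."""
    count = 0
    def select(packet: bytes) -> bool:
        nonlocal count
        count += 1
        return count % period == 0
    return select

def select_packets(packets: Iterable[bytes],
                   stages: List[Stage]) -> Iterator[bytes]:
    """Yield packets accepted by every stage, evaluated in order.

    all() short-circuits, so a sampler placed after a filter only
    counts packets that passed the filter.
    """
    for packet in packets:
        if all(stage(packet) for stage in stages):
            yield packet

# Toy packets (assumed layout): byte 0 = class, byte 1 = serial number.
packets = [bytes([i % 4, i]) for i in range(100)]
class_zero: Stage = lambda pkt: pkt[0] == 0
chosen = list(select_packets(packets, [class_zero, periodic_sampler(5)]))
serials = [pkt[1] for pkt in chosen]
```

   Here 25 of the 100 toy packets pass the filter, and the periodic
   sampler keeps every fifth of those.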

   3.3 Report Generation and Export

   Although the primary goal of this draft is to set up a framework
   for the sampling operations themselves, utilization of the
   resulting measurements places requirements on the information
   available for export, and the methods by which reports are
   exported. Any scheme that can accommodate the framework described
   in this section and Section 3.4 is a suitable candidate for the
   job.

   Report preparation involves selecting fields of interest from each
   sampled packet, then adjoining subsidiary information (e.g., hash
   values, byte and packet counts, timestamps, etc.) from the
   selection process and router state information.  The router state
   values may depend on the packet content (e.g., the IP prefix or
   Autonomous System associated with the destination address in the IP
   header, the input and output interfaces that carried the packet,
   etc.).  Reports may also include subsidiary quantities calculated
   as a function of the selected packet and the router state. To
   simplify the design, some of the subsidiary information and router
   state may be incorporated when the records are exported, rather
   than when the packets are selected. However, all such router state
   information must be included for reporting in a timely manner, in
   order that it reflects the actual state encountered by the packet.

   The device generating the measurement records is configured to
   transmit the data to one or more collection systems, identified by
   IP address and port number.  Exporting these records to other
   systems introduces several practical issues that have important
   implications for the analysis of the data:

   o Transport: Two basic modes of transport are possible: unreliable
   and reliable.  In the unreliable mode, a completed measurement
   packet from the export module is encapsulated into a UDP packet and
   sent to the configured address (the collection system).  The
   sending device does not need to keep state about this packet (other
   than possibly a sequence number to detect lost measurement
   packets).  In the reliable mode, the device exports records via a
   TCP connection to the collection system.  The device must be
   capable of receiving packets (such as acknowledgments) from the
   collection system and retransmitting lost packets.

   o Export rate: The device should impose a (configurable) limit on
   the number of measurement records per unit time.  Otherwise, the
   measurement device could overload the network and the collection
   system.  This problem would be exacerbated in the reliable
   transport mode, where the device would retransmit any lost packets
   (thereby imposing an additional load on the network).  At times,
   the device may generate new records faster than the allowed export
   rate.  In this situation, the device should discard the excess
   records rather than transmitting them to the collection system.
   The device may record information (such as sequence numbers, or
   packet and byte counter values accumulated at the inputs and
   outputs of a packet selector) to aid the collection system in
   compensating for the missing data in any subsequent analysis.

   o Maximum delay in exporting records: The device may queue
   measurement records in order to export multiple records in a single
   packet.  However, the device should bound the delay in exporting
   measurement records, even if the number of records is small.  This
   is important for two reasons.  First, having an upper bound on the
   export delay ensures that the collection system has up-to-date
   information about the sampled packets.  Second, in some scenarios,
   the device may associate a timestamp with the record(s) at the
   export stage.  Limiting the delay in exporting the records places a
   tight bound on the inaccuracy in the timestamp information.

   The device can impose a (configurable) Maximum Transmission Unit
   (MTU) size for reports.

   o Local Export: Packet reports may also be exported directly to
   on-board measurement-based applications, for example those that
   form composite statistics from more than one packet. Local export
   may be presented through a direct interface to the higher level
   applications, i.e., without employing the transport used for
   off-board export.
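
   The export-rate and maximum-delay considerations above can be
   sketched as a small Python class. This is a simplified model, not a
   specification: the class name Exporter, its parameters, the
   one-second rate window, and the in-memory "sent" list (standing in
   for a real UDP or TCP transport) are all assumptions made for
   illustration. Time is passed in explicitly so the behavior is
   deterministic.

```python
from typing import List, Tuple

class Exporter:
    """Toy export stage with a rate limit, bounded delay, and sequence numbers."""

    def __init__(self, max_records_per_sec: int, max_delay: float,
                 batch_size: int):
        self.max_records_per_sec = max_records_per_sec
        self.max_delay = max_delay
        self.batch_size = batch_size
        self.seq = 0            # sequence number of the next export packet
        self.discarded = 0      # records dropped by the rate limit so far
        self.queue: List[bytes] = []
        self.oldest = 0.0       # arrival time of the oldest queued record
        self.window_start = 0.0
        self.window_count = 0
        # Each entry: (sequence number, discards so far, records).
        self.sent: List[Tuple[int, int, List[bytes]]] = []

    def offer(self, record: bytes, now: float) -> None:
        if now - self.window_start >= 1.0:      # begin a new rate window
            self.window_start, self.window_count = now, 0
        if self.window_count >= self.max_records_per_sec:
            self.discarded += 1                 # discard the excess record
            return
        self.window_count += 1
        if not self.queue:
            self.oldest = now
        self.queue.append(record)
        self.flush_if_due(now)

    def flush_if_due(self, now: float) -> None:
        """Export when the batch is full or the oldest record is too old."""
        if self.queue and (len(self.queue) >= self.batch_size
                           or now - self.oldest >= self.max_delay):
            self.sent.append((self.seq, self.discarded, self.queue))
            self.seq += 1
            self.queue = []

exp = Exporter(max_records_per_sec=3, max_delay=0.5, batch_size=2)
for t in (0.0, 0.25, 0.5, 0.75):
    exp.offer(b"record", now=t)
exp.flush_if_due(now=1.0)       # in a real device a timer would call this
summary = [(seq, dropped, len(recs)) for seq, dropped, recs in exp.sent]
```

   Carrying the running discard count alongside the sequence number is
   one way to give the collection system what it needs to compensate
   for records that were never exported.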

   3.4 Measurement Record Format

   Report export involves the bundling of one or more measurement
   records and sending a packet to the collection system.  The report
   includes several types of information, such as:

   o Per-packet information: The measurement record for each sampled
   packet includes various header fields (e.g., IP addresses, port
   numbers, ToS bits, TCP flags, etc.), as well as subsidiary
   information (e.g., timestamp, input and output links, other router
   state, hash values, etc.).

   o Configuration information: The stream of reports should provide
   information about the configuration of the measurement flow (e.g.,
   the sampling frequency, the sampling technique and associated
   parameters, the match/mask filter, etc.).  This ensures that the
   measurement data are self-describing and allows the collection
   system to analyze the measurement data without a separate feed of
   the configuration state. Changes in configuration must be
   immediately reflected in the report stream.

   o Aggregate information: The reports should include sufficient
   information for the collection system to account for discarded
   measurement records and lost exported packets.  For example, the
   reports could include sequence numbers to enable the collection
   machine to detect lost reports.  The reports could include a count
   of the number of bytes and packets that matched the filter, or that
   passed both the filtering and sampling stages.
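
   On the collection side, sequence numbers of this kind might be used
   as in the following Python sketch. The function names are
   hypothetical, sequence-number wraparound is ignored, and the
   proportional correction assumes that loss is independent of report
   content, which may not hold in practice.

```python
from typing import Iterable, List

def missing_sequence_numbers(received: Iterable[int]) -> List[int]:
    """Sequence numbers absent between the lowest and highest one seen."""
    seen = set(received)
    if not seen:
        return []
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]

def scale_for_loss(observed_total: float, received: int,
                   expected: int) -> float:
    """Scale a total up in proportion to the fraction of reports that arrived.

    Assumes lost reports resemble received ones on average.
    """
    if received <= 0:
        raise ValueError("no reports received; cannot compensate")
    return observed_total * expected / received

arrived = [0, 1, 3, 4, 7]
lost = missing_sequence_numbers(arrived)
expected = len(arrived) + len(lost)
corrected_bytes = scale_for_loss(500.0, len(arrived), expected)
```

   In this toy case three of eight reports are missing, so a measured
   500 bytes is scaled to an estimate of 800 bytes.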

   To conserve storage space and network bandwidth, the device may
   compress the measurement records as they are stored or exported.
   Compression should be quite effective since the sampled packets may
   share many fields in common (especially if the filter focuses on
   packets with certain values in particular header fields).

   3.5 Configuration and Management

   All configuration parameters associated with the sampling of
   packets and export of measurements are to be contained in a MIB. A
   secure protocol is to be used to access the MIB for
   reconfiguration and retrieval of the parameters.
 
4 Applications

   We describe a representative set of operational applications
   enabled by the passive measurement device described in the previous
   section, by referring back to the examples in Section 1.

   Example 1: Troubleshooting

   Packet sampling is ideally suited to determine the composition of
   the traffic (e.g., on a link) in terms of various attributes
   (source and destination address and port numbers, prefix, protocol
   number, type of service, etc.). Typically, unfiltered sampling
   would be used to obtain a coarse-grained view of the traffic on a
   link. Once the characteristics of an interesting subset of traffic
   (e.g., a service type, or a source address prefix corresponding to
   some customer) have been identified, the resolution can be refined
   by filtering for this traffic, and by boosting the sampling rate
   correspondingly. In this way, the traffic can be examined and
   characterized ("sliced and diced") arbitrarily.
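
   A collector-side drill-down of this kind might look like the
   following Python sketch. The record fields ("src", "dst", "bytes")
   and the function name top_contributors are hypothetical, and the
   addresses are documentation examples rather than measured data.

```python
from collections import Counter
from typing import Dict, List, Tuple

def top_contributors(records: List[Dict], key: str,
                     n: int = 3) -> List[Tuple[str, int]]:
    """Rank the values of one attribute by sampled byte count."""
    totals: Counter = Counter()
    for rec in records:
        totals[rec[key]] += rec["bytes"]
    return totals.most_common(n)

# Made-up sampled packet records for illustration.
records = [
    {"src": "10.0.0.1", "dst": "192.0.2.9", "bytes": 1500},
    {"src": "10.0.0.2", "dst": "192.0.2.9", "bytes": 1500},
    {"src": "10.0.0.3", "dst": "198.51.100.7", "bytes": 40},
    {"src": "10.0.0.4", "dst": "192.0.2.9", "bytes": 1500},
]

# Step 1: coarse view, unfiltered, by destination address.
heaviest_dst = top_contributors(records, "dst", n=1)

# Step 2: refine by keeping only the interesting destination,
# then break that subset down by source address.
subset = [r for r in records if r["dst"] == heaviest_dst[0][0]]
sources = top_contributors(subset, "src")
```

   In a deployment the second step would be realized by reconfiguring
   the filter on the measurement device and raising the sampling
   rate, rather than by post-processing the same records.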

   Example 2: Characterizing Demand

   Characterizing demand for an entire network domain will likely be
   achieved by sampling packets on all the ingress links, or some
   other well-chosen cut set. The sampling rate would typically be
   chosen relatively low, given that we are interested in averages
   over longer time scales, e.g., to detect significant systemic
   shifts in demand not due to random fluctuations.  Some of the
   subsidiary fields included in reports, such as source and
   destination AS, and input and output link, will be useful,
   depending on the spatial granularity of demand characterization.
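
   When packets are sampled independently with probability p, each
   sampled packet can be taken to represent 1/p packets, which gives
   a simple estimate of demand. The Python sketch below applies this
   scaling per ingress/egress pair. The record fields and the function
   name are hypothetical, and the sketch assumes a single uniform
   sampling probability for all records.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def estimate_traffic_matrix(records: List[Dict],
                            p: float) -> Dict[Tuple[str, str], float]:
    """Estimate bytes per (ingress, egress) pair from sampled records.

    Each sampled packet's byte count is scaled by 1/p.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("sampling probability must be in (0, 1]")
    matrix: Dict[Tuple[str, str], float] = defaultdict(float)
    for rec in records:
        matrix[(rec["ingress"], rec["egress"])] += rec["bytes"] / p
    return dict(matrix)

# Made-up sampled records; p = 1/128 is an arbitrary example value.
sampled = [
    {"ingress": "A", "egress": "B", "bytes": 1500},
    {"ingress": "A", "egress": "B", "bytes": 500},
    {"ingress": "A", "egress": "C", "bytes": 40},
]
matrix = estimate_traffic_matrix(sampled, p=1 / 128)
```

   The estimate is noisy for pairs with few sampled packets, which is
   one reason averages over longer time scales are the natural target
   at low sampling rates.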

   Example 3: Direct Observation of Network Behavior

   Direct observation of the spatial flow of traffic through the
   domain can be achieved through a method called trajectory sampling,
   which relies on the hash function to make sampling decisions
   [DG01].  Specifically, the hash function is computed over a
   predefined set of fields of the IP packet header and payload. If
   the hash function for a packet falls within a configurable interval
   [a,b], then the packet should be sampled; otherwise, it should not
   be sampled. This feature yields the full paths followed by sampled
   packets, by ensuring that a packet is sampled on every router it
   traverses, or no router at all.  This requires that the hash
   function and the set of packet fields over which it is computed are
   the same everywhere.
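
   The sampling decision described above can be sketched in Python as
   follows. This is not the hash function of [DG01]: CRC32 is used
   purely as a stand-in, the packet is modeled as a dictionary with
   hypothetical field names, and the choice of invariant fields is an
   assumption made for the example.

```python
import zlib
from typing import Dict, List

HASH_RANGE = 2 ** 32          # zlib.crc32 returns values in [0, 2**32)

def invariant_bytes(packet: Dict) -> bytes:
    """Serialize fields assumed not to change hop by hop (TTL is left out)."""
    fields = (packet["src"], packet["dst"], str(packet["ip_id"]))
    return "|".join(fields).encode() + packet["payload"][:20]

def trajectory_sample(packet: Dict, a: int, b: int) -> bool:
    """Sample the packet if its hash falls within the interval [a, b]."""
    return a <= zlib.crc32(invariant_bytes(packet)) <= b

def path_decisions(packet: Dict, hops: int, a: int, b: int) -> List[bool]:
    """Apply the same decision rule at each hop, decrementing TTL between hops."""
    decisions = []
    for _ in range(hops):
        decisions.append(trajectory_sample(packet, a, b))
        packet = {**packet, "ttl": packet["ttl"] - 1}
    return decisions

packet = {"src": "10.0.0.1", "dst": "192.0.2.9", "ip_id": 7,
          "ttl": 64, "payload": b"example payload"}
# Every router uses the same interval, here roughly a tenth of the range.
decisions = path_decisions(packet, hops=4, a=0, b=HASH_RANGE // 10)
consistent = len(set(decisions)) == 1
```

   Because the hash input excludes the TTL and the interval is the
   same at every hop, the four decisions are identical: the packet is
   either sampled everywhere along its path or nowhere.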

   A similar use of hash functions has also been considered for hash-
   based IP traceback of distributed denial-of-service (DDoS) attacks
   [SPSJTKS01].

5 References

   [B88] R.T. Braden, A pseudo-machine for packet monitoring and
   statistics, in Proc. ACM SIGCOMM 1988.

   [DG01] N. G. Duffield and M. Grossglauser, Trajectory Sampling for
   Direct Traffic Observation, IEEE/ACM Trans. on Networking, 9(3), pp.
   280-292, June 2001.

   [SPSJTKS01] A. C. Snoeren, C. Partridge, L. A. Sanchez, C. E. Jones,
   F. Tchakountio, S. T. Kent, W. T. Strayer, Hash-Based IP Traceback,
   Proc. ACM SIGCOMM 2001, San Diego, CA, September 2001.


6 Authors' Addresses

   Nicholas G. Duffield
   AT&T Labs - Research
   Room B-139
   180 Park Ave
   Florham Park NJ 07932, USA
   Phone: +1 973-360-8726
   Email: duffield@research.att.com

   Albert Greenberg
   AT&T Labs - Research
   Room A-161
   180 Park Ave
   Florham Park NJ 07932, USA
   Phone: +1 973-360-8730
   Email: albert@research.att.com

   Matthias Grossglauser
   AT&T Labs - Research
   Room A-167
   180 Park Ave
   Florham Park NJ 07932, USA
   Phone: +1 973-360-7172
   Email: mgross@research.att.com

   Jennifer Rexford
   AT&T Labs - Research
   Room A-169
   180 Park Ave
   Florham Park NJ 07932, USA
   Phone: +1 973-360-8728
   Email: jrex@research.att.com

7 Intellectual Property Statement

   AT&T Corp. may own intellectual property applicable to this
   contribution. AT&T is currently reviewing its licensing intent
   relative to the Intellectual Property and will notify the IETF when
   AT&T has made a determination of that intent.

8 Full Copyright Statement

   Copyright (C) The Internet Society (1999).  All Rights Reserved.

   This document and translations of it may be copied and furnished to others,
   and derivative works that comment on or otherwise explain it or assist in
   its implementation may be prepared, copied, published and distributed, in
   whole or in part, without restriction of any kind, provided that the above
   copyright notice and this paragraph are included on all such copies and
   derivative works.  However, this document itself may not be modified in any
   way, such as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for copyrights
   defined in the Internet Standards process must be followed, or as required
   to translate it into languages other than English.

   The limited permissions granted above are perpetual and will not be revoked
   by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an "AS IS"
   basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE
   DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
   ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY
   RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
   PARTICULAR PURPOSE.