idnits 2.17.1 

draft-ietf-psamp-sample-tech-10.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 24.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 1734.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1702.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1709.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1715.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 12 instances of lines with non-RFC6890-compliant IPv4
     addresses in the document.  If these are example addresses, they should
     be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == Line 1844 has weird spacing: '...ong int  ub4; ...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 2007) is 6160 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'PSAMP-TECH' is mentioned on line 165, but not defined

  == Missing Reference: 'IPFIX-PROTO' is mentioned on line 323, but not
     defined

  -- Looks like a reference, but probably isn't: '1' on line 1932

  == Missing Reference: 'N' is mentioned on line 651, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 1957

  -- Looks like a reference, but probably isn't: '00' on line 1385

  == Missing Reference: 'FF' is mentioned on line 1385, but not defined

  -- Looks like a reference, but probably isn't: '2' on line 1932

  -- Looks like a reference, but probably isn't: '4' on line 1953

  -- Looks like a reference, but probably isn't: '5' on line 1934

  -- Looks like a reference, but probably isn't: '6' on line 1934

  -- Looks like a reference, but probably isn't: '8' on line 1936

  -- Looks like a reference, but probably isn't: '9' on line 1936

  == Outdated reference: A later version (-13) exists of
     draft-ietf-psamp-framework-11

  == Outdated reference: A later version (-11) exists of
     draft-ietf-psamp-info-06

  == Outdated reference: A later version (-09) exists of
     draft-ietf-psamp-protocol-07


     Summary: 1 error (**), 0 flaws (~~), 10 warnings (==), 17 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	  Internet Draft
3	  Document: <draft-ietf-psamp-sample-tech-10.txt>              T. Zseby
4	  Intended status: Proposed Standard                   Fraunhofer FOKUS
5	  Expires: December 2007                                      M. Molina
6	                                                                  DANTE
7	                                                            N. Duffield
8	                                                     AT&T Labs-Research
9	                                                           S. Niccolini
10	                                                        NEC Europe Ltd.
11	                                                             F. Raspall
12	                                                               EPSC-UPC

14	                                                              June 2007

16	     Sampling and Filtering Techniques for IP Packet Selection

18	  Status of this Memo

20	     By submitting this Internet-Draft, each author represents that
21	     any applicable patent or other IPR claims of which he or she is
22	     aware have been or will be disclosed, and any of which he or she
23	     becomes aware will be disclosed, in accordance with Section 6 of
24	     BCP 79.

26	     Internet-Drafts are working documents of the Internet
27	     Engineering Task Force (IETF), its areas, and its working
28	     groups.  Note that other groups may also distribute working
29	     documents as Internet-Drafts.

31	     Internet-Drafts are draft documents valid for a maximum of six
32	     months and may be updated, replaced, or obsoleted by other
33	     documents at any time.  It is inappropriate to use Internet-
34	     Drafts as reference material or to cite them other than as "work
35	     in progress."

37	     The list of current Internet-Drafts can be accessed at
38	     http://www.ietf.org/ietf/1id-abstracts.txt.

40	     The list of Internet-Draft Shadow Directories can be accessed at
41	     http://www.ietf.org/shadow.html.

43	     This Internet-Draft will expire on December, 2007.

45	  Copyright Notice

47	     Copyright (C) The IETF Trust (2007).

49	  Abstract

51	     This document describes Sampling and Filtering techniques for IP
52	     packet selection. It provides a categorization of schemes and
53	     defines what parameters are needed to describe the most common
54	     selection schemes. Furthermore it shows how techniques can be
55	     combined to build more elaborate packet Selectors. The document
56	     provides the basis for the definition of information models for
57	     configuring selection techniques in Metering Processes and for
58	     reporting the technique in use to a Collector.

60	  Conventions used in this document

62	     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
63	     NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
64	     "OPTIONAL" in this document are to be interpreted as described
65	     in RFC 2119 [RFC2119].

67	  Table of Contents

69	     1.   Introduction.................................................4
70	     2.   PSAMP Documents Overview.....................................4
71	     3.   Terminology..................................................5
72	     3.1     Observation Points, Packet Streams and Packet Content.....5
73	     3.2     Selection Process.........................................6
74	     3.3     Reporting.................................................7
75	     3.4     Metering Process..........................................8
76	     3.5     Exporting Process.........................................8
77	     3.6     PSAMP Device..............................................8
78	     3.7     Collector.................................................9
79	     3.8     Selection Methods.........................................9
80	     4.   Categorization of Packet Selection Techniques...............11
81	     5.   Sampling....................................................13
82	     5.1     Systematic Sampling......................................14
83	     5.2     Random Sampling..........................................15
84	     5.2.1   n-out-of-N Sampling......................................15
85	     5.2.2   Probabilistic Sampling...................................15
86	     5.2.2.1 Uniform Probabilistic Sampling...........................15
87	     5.2.2.2 Non-Uniform Probabilistic Sampling.......................16
88	     5.2.2.3 Non-Uniform Flow State Dependent Sampling................16
89	     5.2.2.4 Configuration of non-uniform probabilistic and flow-
90	              state Sampling..........................................17
91	     6.   Filtering...................................................17
92	     6.1     Property Match Filtering.................................17
93	     6.2     Hash-based Filtering.....................................19
94	     6.2.1   Application Examples for Hash-based Selection............20
95	     6.2.1.1 Approximation of Random Sampling.........................20
96	     6.2.1.2 Trajectory Sampling and Consistent Packet Selection......20
97	     6.2.2   Security Considerations for Hash Functions...............21
98	     6.2.2.1 Vulnerabilities of Hash-based selection without
99	              knowledge of selection outcomes.........................22
100	     6.2.2.2 Vulnerabilities of Hash-based selection using knowledge
101	              of selection outcomes...................................23
102	     6.2.2.3 Vulnerabilities to Replay Attacks........................24
103	     6.2.3   Choice of Hash-Function..................................25
104	     6.2.3.1 Properties of some hash functions........................25
105	     6.2.3.2 Hash Functions for Packet Selection......................26
106	     6.2.3.3 Hash Functions Suitable for Packet Digesting.............27
107	     7.   Parameters for the Description of Selection Techniques......27
108	     7.1     Description of Sampling Techniques.......................28
109	     7.2     Description of Filtering Techniques......................29
110	     8.   Composite Techniques........................................31
111	     8.1     Cascaded Filtering->Sampling or Sampling->Filtering......31
112	     8.2     Stratified Sampling......................................32
113	     9.   Security Considerations.....................................32
114	     10.  Acknowledgements............................................33
115	     11.  IANA Considerations.........................................33
116	     12.  Normative References........................................33
117	     13.  Informative References......................................33
118	     Authors' Addresses...............................................36
119	     Intellectual Property Statement..................................37
120	     Copyright Statement..............................................37
121	     Appendix A: Hash Functions.......................................38
122	     A.1 IP Shift-XOR (IPSX) Hash Function............................38
123	     A.2 BOB Hash Function............................................39

125	  1. Introduction

127	     There are two main drivers for the growth in measurement
128	     infrastructures and their underlying technology. First, network
129	     data rates are increasing, with a concomitant growth in
130	     measurement data. Secondly, the growth is compounded by the
131	     demand by measurement-based applications for increasingly fine
132	     grained traffic measurements. Devices such as routers, which
133	     perform the measurements, require increasingly sophisticated and
134	     resource intensive measurement capabilities, including the
135	     capture of packet headers or even parts of the payload, and
136	     classification for flow analysis. All these factors can lead to
137	     an overwhelming amount of measurement data, resulting in high
138	     demands on resources for measurement, storage, transport and
139	     post processing.

141	     The sustained capture of network traffic at line rate can be
142	     performed by specialized measurement hardware. However, the cost
143	     of the hardware and the measurement infrastructure required to
144	     accommodate the measurements preclude this as a ubiquitous
145	     approach. Instead some form of data reduction at the point of
146	     measurement is necessary. This can be achieved by an intelligent
147	     packet selection through Sampling, Filtering, or aggregation.
148	     The motivation for Sampling is to select a representative subset
149	     of packets that allow accurate estimates of properties of the
150	     unsampled traffic to be formed. The motivation for Filtering is
151	     to remove all packets that are not of interest. Aggregation
152	     combines data and allows compact pre-defined views of the
153	     traffic. Examples of applications that benefit from packet
154	     selection are given in [PSAMP-FW]. Aggregation techniques are
155	     out of scope of this document.

157	  2. PSAMP Documents Overview

159	     [PSAMP-FW]:   "A Framework for Packet Selection and Reporting"
160	                    describes the PSAMP framework for network elements
161	                    to select subsets of packets by statistical and
162	                    other methods, and to export a stream of reports
163	                    on the selected packets to a Collector.

165	     [PSAMP-TECH]: "Sampling and Filtering Techniques for IP Packet
166	                    Selection" (this document) describes the set of
167	                    packet selection techniques supported by PSAMP.

169	     [PSAMP-PROTO]: "Packet Sampling (PSAMP) Protocol Specifications"
170	                    specifies the export of packet information from a
171	                    PSAMP Exporting Process to a PSAMP Colleting
172	                    Process.

174	     [PSAMP-INFO]: "Information Model for Packet Sampling Exports"
175	                    defines an information and data model for PSAMP.

177	  3. Terminology

179	     The PSAMP terminology defined here is fully consistent with all
180	     terms listed in [PSAMP-FW] but includes additional terms
181	     required for the description of packet selection methods. An
182	     architecture overview and possible configurations of PSAMP
183	     elements can be found in [PSAMP-FW]. PSAMP terminology also aims
184	     at consistency with terms used in [RFC3917]. The relationship
185	     between PSAMP and IPFIX terms is described in [PSAMP-FW].

187	  3.1 Observation Points, Packet Streams and Packet Content

189	     * Observation Point

191	        An Observation Point is a location in the network where
192	        packets can be observed. Examples include:

194	          (i)  a line to which a probe is attached;

196	          (ii) a shared medium, such as an Ethernet-based LAN;

198	          (iii) a single port of a router, or set of interfaces
199	                (physical or logical) of a router;

201	          (iv) an embedded measurement subsystem within an interface.

203	        Note that one Observation Point may be a superset of several
204	        other Observation Points.  For example one Observation Point
205	        can be an entire line card.  This would be the superset of the
206	        individual Observation Points at the line card's interfaces.

208	     * Observed Packet Stream
209	        The Observed Packet Stream is the set of all packets observed
210	        at the Observation Point.

212	     * Packet Stream

214	        A packet stream denotes a set of packets that flows past some
215	        specified point within the metering process. An example of a
216	        Packet Stream is the output of the selection process.
217	        Note that packets selected from a stream, e.g. by Sampling, do
218	        not necessarily possess a property by which they can be
219	        distinguished from packets that have not been selected. For
220	        this reason the term "stream" is favored over "flow", which is
221	        defined as set of packets with common properties [RFC3917].

223	     * Packet Content

225	        The packet content denotes the union of the packet header
226	        (which includes link layer, network layer and other
227	        encapsulation headers) and the packet payload.

229	  3.2 Selection Process

231	     * Selection Process

233	        A Selection Process takes the Observed Packet Stream as its
234	        input and selects a subset of that stream as its output.

236	     * Selection State

238	        A Selection Process may maintain state information for use by
239	        the Selection Process. At a given time, the Selection State
240	        may depend on packets observed at and before that time, and
241	        other variables. Examples include:

243	          (i)  sequence numbers of packets at the input of Selectors;

245	          (ii) a timestamp of observation of the packet at the
246	                Observation Point;

248	          (iii) iterators for pseudorandom number generators;

250	          (iv) hash values calculated during selection;

252	          (v)  indicators of whether the packet was selected by a
253	                given Selector;

255	        Selection Processes may change portions of the Selection State
256	        as a result of processing a packet. Selection state for a
257	        packet is to reflect the state after processing the packet.

259	     * Selector

261	        A Selector defines the action of a Selection Process on a
262	        single packet of its input. If selected, the packet becomes an
263	        element of the output Packet Stream.

265	        The Selector can make use of the following information in
266	        determining whether a packet is selected:

268	          (i)  the packet's content;

270	          (ii) information derived from the packet's treatment at the
271	                Observation Point;

273	          (iii) any selection state that may be maintained by the
274	                Selection Process.

276	     * Composite Selector

278	        A Composite Selector is an ordered composition of Selectors,
279	        in which the output Packet Stream issuing from one Selector
280	        forms the input Packet Stream to the succeeding Selector.

282	     * Primitive Selector

284	        A Selector is primitive if it is not a Composite Selector.

286	     * Selection Sequence

288	        From all the packets observed at an Observation Point, only a
289	        few packets are selected by one or more Selectors.  The
290	        Selection Sequence is a unique value per Observation Domain
291	        describing the Observation Point and the Selector IDs through
292	        which the packets are selected.

294	  3.3 Reporting

296	     * Packet Reports

298	        Packet Reports comprise a configurable subset of a packet's
299	        input to the Selection Process, including the packet's
300	        content, information relating to its treatment (for example,
301	        the output interface), and its associated selection state (for
302	        example, a hash of the packet's content)

304	     * Report Interpretation:

306	        Report Interpretation comprises subsidiary information,
307	        relating to one or more packets, that is used for
308	        interpretation of their packet reports. Examples include
309	        configuration parameters of the Selection Process.

311	     * Report Stream:

313	        The Report Stream is the output of a Metering Process,
314	        comprising two distinguished types of information: Packet
315	        Reports, and Report Interpretation.

317	  3.4 Metering Process

319	        A Metering Process selects packets from the Observed Packet
320	        Stream using a Selection Process, and produces as output a
321	        Report Stream concerning the selected packets. The PSAMP
322	        Metering Process can be viewed as analogous to the IPFIX
323	        metering process [IPFIX-PROTO], which produces flow records as
324	        its output.  While the Metering Process definition in this
325	        document specifies the PSAMP definition, the PSAMP protocol
326	        specifications [PSAMP-PROTO] will use the IPFIX Metering
327	        Process definition, which also suits the PSAMP requirements.
328	        The relationship between PSAMP and IPFIX is described more in
329	        [PSAMP-INFO] and [PSAMP-PROTO].

331	  3.5 Exporting Process

333	     * Exporting Process:

335	        An Exporting Process sends, in the form of Export Packet, the
336	        output of one or more Metering Processes to one or more
337	        Collectors.

339	     * Export Packet:

341	        An Export Packet is a combination of Report Interpretation
342	        and/or one or more Packet Reports are bundled by the Exporting
343	        Process into an Export Packet for exporting to a Collector.

345	  3.6 PSAMP Device

347	     * PSAMP Device

349	        A PSAMP Device is a device hosting at least an Observation
350	        Point, a Metering Process and an Exporting Process. Typically,
351	        corresponding Observation Point(s), Metering Process(es) and
352	        Exporting Process(es) are co-located at this device, for
353	        example at a router.

355	  3.7 Collector

357	     * Collector

359	        A Collector receives a Report Stream exported by one or more
360	        Exporting Processes. In some cases, the host of the Metering
361	        and/or Exporting Processes may also serve as the Collector.

363	  3.8 Selection Methods

365	     * Filtering
366	        A filter is a Selector that selects a packet deterministically
367	        based on the Packet Content, or its treatment, or functions of
368	        these occurring in the Selection State.  Two examples are:

370	          (i) Property match filtering: a packet is selected if a
371	                specific field in the packet equals a predefined
372	                value.

374	          (ii) Hash-based selection: a hash function is applied to
375	                the Packet Content, and the packet is selected if the
376	                result falls in a specified range.

378	     * Sampling

380	        A selector that is not a filter is called a sampling
381	        operation.  This reflects the intuitive notion that if the
382	        selection of a packet cannot be determined from its content
383	        alone, there must be some type of sampling taking place.
384	        Sampling operations can be divided into two subtypes:

386	           (i) Content-independent sampling, which does not use
387	                Packet Content in reaching sampling decisions.
388	                Examples include systematic sampling, and uniform
389	                pseudorandom sampling driven by a pseudorandom number
390	                whose generation is independent of Packet Content.
391	                Note that in Content-independent Sampling it is not
392	                necessary to access the Packet Content in order to
393	                make the selection decision.

395	          (ii) Content-dependent sampling, in which the Packet
396	                Content is used in reaching selection decisions.  An
397	                application is pseudorandom selection according to a
398	                probability that depends on the contents of a packet
399	                field, e.g., sampling packets with a probability
400	                dependent on their TCP/UDP port numbers.  Note that
401	                this is not a Filter.

403	     * Hash Domain

405	        A subset of the Packet Content and the packet treatment,
406	        viewed as an N-bit string for some positive integer N.

408	     * Hash Range

410	        A set of M-bit strings for some positive integer M that define
411	        the range of values the result of the hash operation can take.

413	     * Hash Function

415	        A deterministic map from the Hash Domain into the Hash Range.

417	     * Hash Selection Range

419	        A subset of the Hash Range. The packet is selected if the
420	        action of the Hash Function on the Hash Domain for the packet
421	        yields a result in the Hash Selection Range.

423	     * Hash-based Selection

425	        Filtering specified by a Hash Domain, a Hash Function, and
426	        Hash Range and a Hash Selection Range.

428	     * Approximative Selection

430	        Selectors in any of the above categories may be approximated
431	        by operations in the same or another category for the purposes
432	        of implementation. For example, uniform pseudorandom Sampling
433	        may be approximated by Hash-based Selection, using a suitable
434	        Hash Function and Hash Domain. In this case, the closeness of
435	        the approximation depends on the choice of Hash Function and
436	        Hash Domain.

438	     * Population

440	        A Population is a Packet Stream, or a subset of a Packet
441	        Stream. A Population can be considered as a base set from
442	        which packets are selected. An example is all packets in the
443	        Observed Packet Stream that are observed within some specified
444	        time interval.

446	     * Population Size

448	        The Population Size is the number of all packets in the
449	        Population.

451	     * Sample Size

453	        The number of packets selected from the Population by a
454	        Selector.

456	     * Configured Selection Fraction

458	        The Configured Selection Fraction is the ratio of the number
459	        of packets selected by a Selector from an input Population, to
460	        the Population Size, as based on the configured selection
461	        parameters.

463	     * Attained Selection Fraction

465	        The Attained Selection Fraction is the actual ratio of the
466	        number of packets selected by a Selector from an input
467	        Population, to the Population Size.

469	     For some sampling methods the Attained Selection Fraction can
470	     differ from the Configured Selection Fraction due to, for
471	     example, the inherent statistical variability in sampling
472	     decisions of probabilistic Sampling and Hash-based Selection.
473	     Nevertheless, for large Population Sizes and properly configured
474	     Selectors, the Attained Selection Fraction usually approaches
475	     the Configured Selection Fraction.

477	  4. Categorization of Packet Selection Techniques

479	     Packet selection techniques generate a subset of packets from an
480	     Observed Packet Stream at an Observation Point. We distinguish
481	     between Sampling and Filtering.

483	     Sampling is targeted at the selection of a representative subset
484	     of packets. The subset is used to infer knowledge about the
485	     whole set of observed packets without processing them all. The
486	     selection can depend on packet position, and/or on packet
487	     content, and/or on (pseudo) random decisions.

489	     Filtering selects a subset with common properties. This is used
490	     if only a subset of packets is of interest. The properties can
491	     be directly derived from the packet content, or depend on the
492	     treatment given by the router to the packet. Filtering is a
493	     deterministic operation. It depends on packet content or router
494	     treatment. It never depends on packet position or on (pseudo)
495	     random decisions.

497	     Note that a common technique to select packets is to compute a
498	     Hash Function on some bits of the packet header and/or content
499	     and to select it if the Hash Value falls in the Hash Selection
500	     Range. Since hashing is a deterministic operation on the packet
501	     content, it is a Filtering technique according to our
502	     categorization. Nevertheless, Hash Functions are sometimes used
503	     to emulate random Sampling. Depending on the chosen input bits,
504	     the Hash Function and the Hash Selection Range, this technique
505	     can be used to emulate the random selection of packets with a
506	     given probability p. It is also a powerful technique to
507	     consistently select the same packet subset at multiple
508	     Observation Points [DuGr00]

510	     The following table gives an overview of the schemes described
511	     in this document and their categorization. An X in brackets (X)
512	     denotes schemes for which also content-independent variants
513	     exist. It easily can be seen that only schemes with both
514	     properties, content dependence and deterministic selection, are
515	     considered as filters.

517	            Selection Scheme   | Deterministic | Content- | Category
518	                               |  Selection    | dependent|
519	       ------------------------+---------------+----------+----------
520	        Systematic             |       X       |     _    | Sampling
521	        Count-based            |               |          |
522	       ------------------------+---------------+----------+----------
523	        Systematic             |       X       |     -    | Sampling
524	        Time-based             |               |          |
525	       ------------------------+---------------+----------+----------
526	        Random                 |       -       |     -    | Sampling
527	        n-out-of-N             |               |          |
528	       ------------------------+---------------+----------+----------
529	        Random                 |       -       |     -    | Sampling
530	        Uniform probabilistic  |               |          |
531	       ------------------------+---------------+----------+----------
532	        Random                 |       -       |    (X)   | Sampling
533	        Non-uniform probabil.  |               |          |
534	       ------------------------+---------------+----------+----------
535	        Random                 |       -       |    (X)   | Sampling
536	        Non-uniform flow-state |               |          |
537	       ------------------------+---------------+----------+----------
538	        Property Match         |       X       |    (X)   | Filtering
539	        Filtering              |               |          |
540	       ------------------------+---------------+----------+----------
541	        Hash Function          |       X       |     X    | Filtering
542	       ------------------------+---------------+----------+----------

544	     The categorization just introduced is mainly useful for the
545	     definition of an information model describing Primitive
546	     Selectors. More complex selection techniques can be described
547	     through the composition of cascaded Sampling and Filtering
548	     operations. For example, a packet selection that weights the
549	     selection probability on the basis of the packet length can be
550	     described as a cascade of a Filtering and a Sampling scheme.
551	     However, this descriptive approach is not intended to be rigid:
552	     if a common and consolidated selection practice turns out to be
553	     too complex to be described as a composition of the mentioned
554	     building blocks, an ad hoc description can be specified instead
555	     and added as a new scheme to the information model.

557	  5. Sampling

559	     The deployment of Sampling techniques aims at the provisioning
560	     of information about a specific characteristic of the parent
561	     population at a lower cost than a full census would demand. In
562	     order to plan a suitable Sampling strategy it is therefore
563	     crucial to determine the needed type of information and the
564	     desired degree of accuracy in advance.

566	     First of all it is important to know the type of metric that
567	     should be estimated. The metric of interest can range from
568	     simple packet counts [JePP92] up to the estimation of whole
569	     distributions of flow characteristics (e.g. packet
570	     sizes)[ClPB93].

572	     Secondly, the required accuracy of the information and with
573	     this, the confidence that is aimed at, should be known in
574	     advance. For instance for usage-based accounting the required
575	     confidence for the estimation of packet counters can depend on
576	     the monetary value that corresponds to the transfer of one
577	     packet. That means that a higher confidence could be required
578	     for expensive packet flows (e.g. premium IP service) than for
579	     cheaper flows (e.g. best effort). The accuracy requirements for
580	     validating a previously agreed quality can also vary extremely
581	     with the customer demands. These requirements are usually
582	     determined by the service level agreement (SLA).

584	     The Sampling method and the parameters in use must be clearly
585	     communicated to all applications that use the measurement data.
586	     Only with this knowledge a correct interpretation of the
587	     measurement results can be ensured.

589	     Sampling methods can be characterized by the Sampling algorithm,
590	     the trigger type used for starting a Sampling interval and the
591	     length of the Sampling interval. These parameters are described
592	     here in detail. The Sampling algorithm describes the basic
593	     process for selection of samples. In accordance to [AmCa89] and
594	     [ClPB93] we define the following basic Sampling processes:

596	  5.1 Systematic Sampling

598	     Systematic Sampling describes the process of selecting the start
599	     points and the duration of the selection intervals according to
600	     a deterministic function. This can be for instance the periodic
601	     selection of every k-th element of a trace but also the
602	     selection of all packets that arrive at pre-defined points in
603	     time. Even if the selection process does not follow a periodic
604	     function (e.g. if the time between the Sampling intervals varies
605	     over time) we consider this as systematic Sampling as long as
606	     the selection is deterministic.

608	     The use of systematic Sampling always involves the risk of
609	     biasing the results. If the systematics in the Sampling process
610	     resemble systematics in the observed stochastic process
611	     (occurrence of the characteristic of interest in the network),
612	     there is a high probability that the estimation will be biased.
613	     Systematics in the observed process might not be known in
614	     advance.

616	     Here only equally spaced schemes are considered, where triggers
617	     for Sampling are periodic, either in time or in packet count.
618	     All packets occurring in a selection interval (either in time or
619	     packet count) beyond the trigger are selected.

621	     Systematic count-based
622	     In systematic count-based Sampling the start and stop triggers
623	     for the Sampling interval are defined in accordance to the
624	     spatial packet position (packet count).

626	     Systematic time-based
627	     In systematic time-based Sampling time-based start and stop
628	     triggers are used to define the Sampling intervals. All packets
629	     are selected that arrive at the Observation Point within the
630	     time-intervals defined by the start and stop triggers (i.e.

632	     arrival time of the packet is larger than the start time and
633	     smaller than the stop time).

635	     Both schemes are content-independent selection schemes. Content
636	     dependent deterministic Selectors are categorized as filter.

638	  5.2 Random Sampling

640	     Random Sampling selects the starting points of the Sampling
641	     intervals in accordance to a random process. The selection of
642	     elements are independent experiments. With this, unbiased
643	     estimations can be achieved. In contrast to systematic Sampling,
644	     random Sampling requires the generation of random numbers. One
645	     can differentiate two methods of random Sampling:

647	  5.2.1   n-out-of-N Sampling

649	     In n-out-of-N Sampling n elements are selected out of the parent
650	     population that consists of N elements. One example would be to
651	     generate n different random numbers in the range [1,N] and
652	     select all packets which have a packet position equal to one of
653	     the random numbers. For this kind of Sampling the Sample Size n
654	     is fixed.

656	  5.2.2   Probabilistic Sampling

658	     In probabilistic Sampling the decision whether an element is
659	     selected or not is made in accordance to a pre-defined selection
660	     probability. An example would be to flip a coin for each packet
661	     and select all packets for which the coin showed the head. For
662	     this kind of Sampling the Sample Size can vary for different
663	     trials. The selection probability does not necessarily has to be
664	     the same for each packet. Therefore we distinguish between
665	     uniform probabilistic Sampling (with the same selection
666	     probability for all packets) and non-uniform probabilistic
667	     Sampling (where the selection probability can vary for different
668	     packets).

670	  5.2.2.1 Uniform Probabilistic Sampling

672	     For Uniform Probabilistic Sampling packets are selected
673	     independently with a uniform probability p. This Sampling can be
674	     count-driven, and is sometimes referred to as geometric random
675	     Sampling, since the difference in count between successive
676	     selected packets are independent random variables with a
677	     geometric distribution of mean 1/p. A time-driven analog,
678	     exponential random Sampling, has the time between triggers
679	     exponentially distributed.

681	     Both geometric and exponential random Sampling are examples of
682	     what is known as additive random Sampling, defined as Sampling
683	     where the intervals or counts between successive samples are
684	     independent identically distributed random variable.

686	  5.2.2.2 Non-Uniform Probabilistic Sampling

688	     This is a variant of Probabilistic Sampling in which the
689	     Sampling probabilities can depend on the selection process
690	     input. This can be used to weight Sampling probabilities in
691	     order e.g. to boost the chance of Sampling packets that are rare
692	     but are deemed important. Unbiased estimators for quantitative
693	     statistics are recovered by renormalization of sample values;
694	     see [HT52].

696	  5.2.2.3 Non-Uniform Flow State Dependent Sampling

698	     Another type of Sampling that can be classified as probabilistic
699	     Non-Uniform is closely related to the flow concept as defined in
700	     [RFC3917], and it is only used jointly with a flow monitoring
701	     function (IPFIX metering process). Packets are selected,
702	     dependent on a selection state. The point, here, is that the
703	     selection state is determined also by the state of the flow the
704	     packet belongs to and/or by the state of the other flows
705	     currently being monitored by the associated flow monitoring
706	     function. An example for such an algorithm is the "sample and
707	     hold" method described in [EsVa01]:

709	     - If a packet accounts for a flow record that already exists in
710	        the IPFIX flow recording process, it is selected (i.e. the
711	        flow record is updated)
712	     - If a packet doesn't account to any existing flow record, it is
713	        selected with probability p. If it has been selected a new
714	        flow record has to be created.

716	     A further algorithm that fits into the category of non-uniform
717	     flow state dependent Sampling is described in [Moli03].

719	     This type of Sampling is content dependent because the
720	     identification of the flow the packet belongs to requires
721	     analyzing part of the packet content. If the packet is selected,
722	     then it is passed as an input to the IPFIX monitoring function
723	     (this is called "Local Export" in [PSAMP-FW]. Selecting the
724	     packet depending on the state of a flow cache is useful when
725	     memory resources of the flow monitoring function are scarce
726	     (i.e. there is no room to keep all the flows that have been
727	     scheduled for monitoring).

729	  5.2.2.4 Configuration of non-uniform probabilistic and flow-state
730	        Sampling

732	     Many different specific methods can be grouped under the terms
733	     non-uniform probabilistic and flow state Sampling. Dependent on
734	     the Sampling goal and the implemented scheme, a different number
735	     and type of input parameters is required to configure such
736	     scheme.

738	     Some concrete proposals for such methods exist from the research
739	     community (e.g. [EsVa01],[DuLT01],[Moli03]). Some of these
740	     proposals are still in an early stage and need further
741	     investigations to prove their usefulness and applicability. It
742	     is not our aim to indicate preference amongst these methods.
743	     Instead, we only describe here the basic methods and leave the
744	     specification of explicit schemes and their parameters up to
745	     vendors (e.g. as extension of the information model).

747	  6. Filtering

749	     Filtering is the deterministic selection of packets based on the
750	     packet content, the treatment of the packet at the Observation
751	     Point, or deterministic functions of these occurring in the
752	     selection state. The packet is selected if these quantities fall
753	     into a specified range. The role of Filtering, as the word
754	     itself suggest, is to separate all the packets having a certain
755	     property from those not having it. A distinguishing
756	     characteristic from Sampling is that the selection decision does
757	     not depend on the packet position in time or in the space, or on
758	     a random process.
759	     We identify and describe in the following two Filtering
760	     techniques.

762	  6.1 Property Match Filtering

764	     With this Filtering method a packet is selected if specific
765	     fields within the packet and/or properties of the router state
766	     equal a predefined value. Possible filter fields are all IPFIX
767	     flow attributes specified in [IPFIX-INFO]. Further fields can be
768	     defined by vendor specific extensions.

770	     A packet is selected if Field=Value. Masks and ranges are only
771	     supported to the extent to which [IPFIX-INFO] allows them e.g.
772	     by providing explicit fields like the netmasks for source and
773	     destination addresses.

775	     AND operations are possible by concatenating filters, thus
776	     producing a composite selection operation.  In this case, the
777	     ordering in which the filtering happens is implicitly defined
778	     (outer filters come after inner filters).  However, as long as
779	     the concatenation is on filters only, the result of the cascaded
780	     filter is independent from the order, but the order may be
781	     important for implementation purposes, as the first filter will
782	     have to work at a higher rate.  In any case, an implementation
783	     is not constrained to respect the filter ordering, as long as
784	     the result is the same, and it may even implement the composite
785	     filtering in filtering in one single step.

787	     OR operations are not supported with this basic model.  More
788	     sophisticated filters (e.g. supporting bitmasks, ranges or OR
789	     operations etc.) can be realized as vendor specific schemes.

791	     Property match operations should be available for different
792	     protocol portions of the packet header:

794	           (i) the IP header (excluding options in IPv4, stacked
795	                headers in IPv6)

797	          (ii) transport header

799	          (iii) encapsulation headers (e.g. the MPLS label stack, if
800	                present)

802	     When the PSAMP Device offers property match filtering, and, in
803	     its usual capacity other than in performing PSAMP functions,
804	     identifies or processes information from IP, transport or
805	     encapsulation protocols, then the information should be made
806	     available for filtering.  For example, when a PSAMP Device
807	     routes based on destination IP address, that field should be
808	     made available for filtering.  Conversely, a PSAMP Device that
809	     does not route is not expected to be able to locate an IP
810	     address within a packet, or make it available for Filtering,
811	     although it may do so.

813	     Since packet encryption conceals the real values of encrypted
814	     fields, property match filtering must be configurable to ignore
815	     encrypted packets, when detected.

817	     The Selection Process may support filtering based on the
818	     properties of the router state:

820	          (i)  Ingress interface at which packet arrives equals a
821	                specified value

823	          (ii) Egress interface to which packet is routed to equals a
824	                specified value

826	          (iii) Packet violated Access Control List (ACL) on the
827	                router

829	          (iv)  Failed Reverse Path Forwarding (RPF)

831	          (v)  Failed Resource Reservation (RSVP)

833	          (vi)  No route found for the packet

835	          (vii) Origin Border Gateway Protocol (BGP) Autonomous System
836	                (AS) [RFC4271] equals a specified value or lies within
837	                a given range
838	          (viii)Destination BGP AS equals a specified value or lies
839	                within a given range

841	     Router architectural considerations may preclude some
842	     information concerning the packet treatment being available at
843	     line rate for selection of packets.  For example, the Selection
844	     Process may not be implemented in the fast path that is able to
845	     access routing state at line rate.  However, when filtering
846	     follows sampling (or some other selection operation) in a
847	     Composite Selector, the rate of the Packet Stream output from
848	     the sampler and input to the filter may be sufficiently slow
849	     that the filter could select based on routing state.

851	  6.2 Hash-based Filtering

853	     A Hash Function h maps the Packet Content c, or some portion of
854	     it, onto a Hash Range R. The packet is selected if h(c) is an
855	     element of S, which is a subset of R called the Hash Selection
856	     Range. Thus Hash-based Selection is indeed a particular case of
857	     Filtering: the object is selected if c is in inv(h(S)). But for
858	     desirable Hash Functions the inverse image inv(h(S)) will be
859	     extremely complex, and hence h would not be expressible as, say,
860	     a Property Match Filter or a simple combination of these.

862	     Hash-based selection has mainly two types of usage: it offers a
863	     way to approximate random Sampling by using packet content to
864	     generate pseudorandom variates, and a way to consistently select
865	     subsets of packets that share a common property (e.g. at
866	     different Observation Points).

868	     In the following subsections we give more details about them.
869	     However, both usages require that the Hash Functions has two
870	     statistical properties.

872	     First, the Hash Function h must have good mixing properties, in
873	     the sense that small changes in the input (e.g. the flipping of
874	     a single bit) cause large changes in the output (many bits
875	     change). Then any local clump of values of c is spread widely
876	     over R by h, and so the distribution of h(c) is fairly uniform
877	     even if the distribution of c is not. Then the Sampling Fraction
878	     is #S/#R, which can be tuned by choice of S.

880	     The second desirable property depends more closely on the
881	     statistics of the content c. In applications, the content c
882	     comprises a number of distinct fields, c1 ... cm, e.g. source
883	     and destination IP Address, IP identification, and TCP/UDP port
884	     numbers (if present) for a packet. With a Hash Function
885	     satisfying the first properties above, selection decisions will
886	     appear uncorrelated with the contents of any individual field,
887	     if the complementary fields are (i) sufficiently variable
888	     themselves, and (ii) sufficiently uncorrelated with cj.

890	  6.2.1   Application Examples for Hash-based Selection

892	  6.2.1.1 Approximation of Random Sampling

894	     Although pseudorandom number generators with well understood
895	     properties have been developed, they may not be the method of
896	     choice in settings where computational resources are scarce. A
897	     convenient alternative is to use Hash Functions of packet
898	     content as a source of randomness. The hash (suitably
899	     renormalized) is a pseudorandom variate in the interval [0,1].
900	     Other schemes may use packet fields in iterators for
901	     pseudorandom numbers. However, the statistical properties of an
902	     ideal packet selection law (such as independent Sampling for
903	     different packets, or independence on packet content) may not be
904	     exactly rendered by an implementation, but only approximately
905	     so.

907	     Use of packet content to generate pseudorandom variates shares
908	     with Non-uniform Probabilistic Sampling (see Section 3.1.2.2.2
909	     above) the property that selection decisions depend on Packet
910	     Content. However, there is a fundamental difference between the
911	     two. In the former case the content determines pseudorandom
912	     variates. In the latter case the content only determines the
913	     selection probabilities: selection could then proceed e.g by use
914	     of random variates obtained by an independent pseudorandom
915	     number generator.

917	  6.2.1.2 Trajectory Sampling and Consistent Packet Selection.

919	     Trajectory Sampling is the consistent selection of a subset of
920	     packets at either all of a set of Observation Points or none of
921	     them. Trajectory Sampling is realized by Hash-based Selection if
922	     all Observation Points in the set use a common Hash Function,
923	     Hash Domain and selection range. The Hash Domain comprises all
924	     or part of the packet content that is invariant along the packet
925	     path. Fields such as Time-to-Live, which is decremented per hop,
926	     and header CRC, which is recalculated per hop, are thus excluded
927	     from the Hash Domain. The Hash Domain needs to be wider than
928	     just a flow key, if packets are to be selected quasirandomly
929	     within flows.

931	     The trajectory (or path) followed by a packet is reconstructed
932	     from PSAMP reports on it that reach a Collector. Reports on a
933	     given packet originating from different observations points are
934	     associated by matching a label from the reports. The label may
935	     comprise that portion invariant packet content that is reported,
936	     or possibly some digest of the invariant packet content that is
937	     inserted into the packet report at the Observation Point. Such a
938	     digest may be constructed by applying a second Hash Function
939	     (distinct from that used for selection) to the invariant packet
940	     content. The reconstruction of trajectories, and methods for
941	     dealing with possible ambiguities due to label collisions
942	     (identical labels reported for different packets) and potential
943	     loss of reports in transmission, are dealt with in [DuGr00],
944	     [DuGG02] and [DuGr04].

946	     Applications of trajectory Sampling include (i) estimation of
947	     the network path matrix, i.e., the traffic intensities according
948	     to network path, broken down by flow key; (ii) detection of
949	     routing loops, as indicated by self-intersecting trajectories;
950	     (iii) passive performance measurement: prematurely terminating
951	     trajectories indicate packet loss, packet one way delay can be
952	     determined if reports include (synchronized) timestamps of
953	     packet arrival at the Observation Point; (iv) network attack
954	     tracing, of the actual paths taken by attack packets with
955	     spoofed source addresses.

957	  6.2.2   Security Considerations for Hash Functions

959	     A concern for Hash-based Selection is whether some large set of
960	     related packets could be disproportionately sampled, i.e., have
961	     an Attained Sampling Fraction significantly different from the
962	     Configured Sampling Fraction, either (i) through unanticipated
963	     behavior in the Hash Function, or (ii) because the packets had
964	     been deliberately crafted to have this property.

966	     The first point underlines the importance of using a Hash
967	     Function with good mixing properties. The statistical properties
968	     of candidate Hash Functions need to be evaluated, preferably on
969	     packet traces before adoption for hash-based Sampling. However,
970	     hash functions which perform well on typical traffic may not be
971	     sufficiently strong to withstand attacks specifically targeted
972	     against them. As detailed in the following section, only
973	     cryptographic hash functions employing a private parameter
974	     operating in pseudo-random function mode are sufficiently strong
975	     to
976	     withstand the range of conceivable attacks.   For example, fixed
977	     or
978	     variable length inputs could be hashed using a block cipher
979	     (like AES) in cipher-block-chaining mode.  Fixed length inputs
980	     could also be hashed using an iterated cryptographic hash
981	     function (like MD5 or SHA1), with a private initial vector.  For
982	     variable length inputs, iterated cryptographic hash function
983	     (like MD5 or SHA1) should employ private string post-pended to
984	     the data in addition to a private initial vector. For more
985	     details, see the "append-cascade" construction of [BeCK96].

987	     The following assumes that the hash function is public and hence
988	     known to an attacker. An attacker uses its knowledge of the hash
989	     function to craft packets which are then dispatched, either as
990	     the attack itself, or to elicit further information which can be
991	     used to refine the attack. Thus two scenarios are considered. In
992	     the first scenario, the attacker has no knowledge about whether
993	     the crafted packets are selected or not. In the second scenario
994	     the attacker uses some knowledge of sampling outcomes; the means
995	     by which this might be acquired is discussed below. Some attacks
996	     that involve tampering with export packets in transit, as
997	     opposed to attacking the PSAMP device, are discussed in
998	     [GoRe07].

1000	  6.2.2.1 Vulnerabilities of Hash-based selection without knowledge
1001	        of selection outcomes.

1003	     (i) The hash function does not use a private parameter.

1005	     If the selection range is public, an attacker can craft packets
1006	     whose selection properties are known in advance. If the
1007	     selection range is private, an attacker cannot determine whether
1008	     a crafted packet is selected. However by computing the hash on
1009	     different trial crafted packets, and selecting those yielding a
1010	     given hash value, the attacker can construct an arbitrarily
1011	     large set of distinct packets with a common selection
1012	     properties, i.e., packets that will be either all selected or
1013	     all not selected. This can be done whatever the strength of the
1014	     hash function.

1016	     (ii) The hash function is not cryptographically strong.

1018	     If the hash function is not cryptographically strong, it may
1019	     still be possible to construct sequences of distinct packets
1020	     with the common selection property. An example is the standard
1021	     CRC-32 hash function used with a private modulus (but without a
1022	     private string post-pended to the input). It has weak mixing
1023	     properties for low order bits. Consequently, simply by
1024	     incrementing the hash input, one obtains distinct packets whose
1025	     hashes mostly fall in a narrow range, and hence are likely
1026	     commonly selected; see [GoRe07]

1028	     Suitable parameterization of the hash function can make such
1029	     attacks more difficult. For example, post-pending a private
1030	     string to the input before hashing with CRC-32 will give
1031	     stronger mixing properties over all bits of the input. However,
1032	     with a hash function, such as CRC-32, that is not
1033	     cryptographically strong, the possibility of discovering a
1034	     method to construct packet sets with the common selected
1035	     property cannot be ruled out, even when a private modulus or
1036	     post-pended string is used.

1038	  6.2.2.2 Vulnerabilities of Hash-based selection using knowledge of
1039	        selection outcomes.

1041	     Knowledge of the selection outcomes of crafted packets can by
1042	     used by an attacker to more easily construct sets of packets
1043	     which are disproportionately sampled and/or are commonly
1044	     selected. There are several ways an attacker might acquire this
1045	     knowledge:

1047	     (i) Billing Reports: if samples are used for billing purposes,
1048	     then the selection outcomes of packets may be able to be
1049	     inferred by correlating a crafted packet stream with the billing
1050	     reports that it generates. However, the rate at knowledge of
1051	     selection outcomes can be acquired depends on the temporal and
1052	     spatial granularity of the billing reports, being slower the
1053	     more aggregated the reports are.

1055	     (ii) Feedback from an Intrusion Detection System: e.g., a
1056	     botmaster adversary learns if his packets were detected by the
1057	     intrusion detection system by seeing if one of his bots is
1058	     blocked by the network.

1060	     (iii) Observation of the Report Stream: export packets sent
1061	     across a public network may be eavesdropped on by an adversary.
1062	     Encryption of the export packets provides only a partial
1063	     defense, since it may be possible to infer the selection
1064	     outcomes of packets by correlating a crafted packet stream with
1065	     the occurrence (not the content) of packets in the export stream
1066	     that it generates. The rate at which such knowledge could be
1067	     acquired is limited by the temporal resolution at which reports
1068	     can be associated with packets, e.g. due to processing and
1069	     propagation variability, and difficulty in distinguishing report
1070	     on attack packets from those of background traffic, if present.
1071	     The association between packets and their reports on which this
1072	     depends could be removed by padding export packets to a constant
1073	     length and sending them at a constant rate.

1075	     We now turn to attacks that can exploit knowledge of selection
1076	     outcomes. Firstly, with a non-cryptographic hash function,
1077	     knowledge of selection outcomes for a trial stream may be used
1078	     to further craft a packet set with the common selection
1079	     property. This has been demonstrated for the modular hash f(x) =
1080	     a x + b mod k, for private parameters a, b, and k. With sampling
1081	     rate p, knowledge of the sampling outcomes of roughly 2/p is
1082	     sufficient for the attack to succeed, independent of the values
1083	     of a, b and k. With knowledge of the selection outcomes of a
1084	     larger number of packets, the parameters a b and k can be
1085	     determined; see [GoRe07].

1087	     A cryptographic hash function employing a private parameter and
1088	     operating in one of the pseudo-random function modes specified
1089	     above is not vulnerable to these attacks, even if the selection
1090	     range is known.

1092	  6.2.2.3 Vulnerabilities to Replay Attacks

1094	     Since hash-based selection is deterministic, any packet or set
1095	     of packets with known selection properties can be replayed into
1096	     a network and experience the same selection outcomes provide the
1097	     hash function and its parameters are not changed. Repetition of
1098	     a single packet may be noticeable to other measurement methods
1099	     if employed (e.g. collection of flow statistics), whereas a set
1100	     of distinct packets that appears statistically similar to
1101	     regular traffic may be less noticeable.

1103	     Replay attacks may be mitigated by repeated changing of hash
1104	     function parameters. This also prevents attacks that exploit
1105	     knowledge of sampling outcomes, at least if the parameters are
1106	     changed at least as fast as the knowledge can be acquired by an
1107	     attacker. In order to preserve the ability to perform Trajectory
1108	     Sampling, parameter changed would have to be simultaneous (or
1109	     approximately so) across all observation point.

1111	  6.2.3   Choice of Hash-Function

1113	     The specific choice of hash function represents a trade-off
1114	     between complexity and ease of implementation. Ideally, a
1115	     cryptographically strong hash function employing a private
1116	     parameter and operating in pseudorandom function mode as
1117	     specified above would be used, yielding a good emulation a
1118	     random packet selection at a target sampling rate, and giving
1119	     maximal robustness against the attacks described in the previous
1120	     section. However, it is not assumed that all PSAMP devices will
1121	     be capable of applying a cryptographically strong hash function
1122	     to every packet at line rate. For this reason, the hash
1123	     functions listed in this section will be of a weaker variety.
1124	     Future protocol extensions that employ stronger hash functions
1125	     are not precluded.

1127	  6.2.3.1 Properties of some hash functions.

1129	     This document recommends 3 hash functions: IPSX, BOB and CRC-32.
1130	     Specifications of IPSX and BOB are in the appendix; the CRC-32
1131	     function is described in [RFC1071]. None of these hash functions
1132	     is recommended for cryptographic purposes. A comparison of hash-
1133	     functions with regard to execution speed, collision probability,
1134	     uniformity of the distribution of values in the Hash
1135	     Range and the speed of the functions is described in [MoND05].

1137	     (i) Speed: IPSX is simple to implement and was correspondingly
1138	     about an order of magnitude faster to execute per packet than
1139	     BOB or CRC-32.

1141	     (ii) Uniformity: All three hash functions evaluated showed
1142	     relatively poor uniformity with 16 byte input that was drawn
1143	     from only invariant fields in the IP and TCP/UDP headers (i.e.
1144	     header fields that do not change from hop to hop). IPSX is
1145	     inherently limited to 16 bytes. BOB and
1146	     CRC-32 exhibits noticeably better uniformity when 4 or more
1147	     bytes from the payload are also included in the input. Although
1148	     the uniformity has been checked for different traffic traces,
1149	     results cannot be generalized to arbitrary traffic. Since hash-
1150	     based selection is a deterministic function on the packet
1151	     content, it can always be biased towards packets with specific
1152	     attributes. Furthermore, it should be noted that all Hash
1153	     Functions were evaluated only for IPv4.

1155	  6.2.3.2 Hash Functions for Packet Selection

1157	     The BOB function SHOULD be used for packet selection operations.
1158	     Both the parameter (the init value) and the selection range
1159	     should be kept private. Other functions, such as CRC-32 and IPSX
1160	     MAY be used. If CRC-32 is used, the input should first be post-
1161	     pended with a private string that acts as a parameter, and the
1162	     modulus of the CRC should also be kept private.

1164	     Input bytes for the Hash Function need to be invariant along the
1165	     path the packet is traveling. Only with this it is ensured that
1166	     the same packets are selected at different observation points.
1167	     Furthermore they should have a high variability between
1168	     different packets to generate a high variation in the Hash
1169	     Range.

1171	     If a hash-based selection with the BOB function is used with
1172	     IPv4 traffic, the following input bytes MUST be used.
1173	     - IP identification field
1174	     - Flags field
1175	     - Fragment offset
1176	     - Source IP address
1177	     - Destination IP address
1178	     - A configurable number of bytes from the IP payload, starting
1179	        at a configurable offset.

1181	     All investigated Hash Functions were evaluated only for IPv4.
1182	     Due to the IPv6 header fields and address structure it is
1183	     expected that there is less randomness in IPv6 packet headers
1184	     than in IPv4 headers. Nevertheless, the randomness of IPv6
1185	     traffic was not evaluated in the tests mentioned above. In
1186	     addition to this, IPv6 traffic profiles may change significantly
1187	     in future when IPv6 is used by a broader community. If a hash-
1188	     based selection with the BOB function is used with IPv6 traffic,
1189	     the following input bytes MUST be used.
1190	     - Payload length (2 bytes)
1191	     - Byte number 10,11,14,15,16 of the IPv6 source address
1192	     - Byte number 10,11,14,15,16 of the IPv6 destination address
1193	     - A configurable number of bytes from the IP payload, starting
1194	        at a configurable offset. It is recommended to use at least 4
1195	        bytes from the IP payload.

1197	     The payload itself is not changing during the path. Even if some
1198	     routers process some extension headers they are not going to
1199	     strip them from the packet. Therefore the payload length is
1200	     invariant along the path. Furthermore it usually differs for
1201	     different packets. The IPv6 address has 16 bytes. The first part
1202	     is the network part and it contains low variation. The second
1203	     part is the host part and contains higher variation. Therefore
1204	     the second part of the address is used. Nevertheless, the
1205	     uniformity has not been checked for IPv6 traffic.

1207	  6.2.3.3 Hash Functions Suitable for Packet Digesting

1209	     For digesting Packet Content for inclusion in a reported label,
1210	     the most important property is a low collision frequency. A
1211	     secondary requirement is the ability to accept variable length
1212	     input, in order to allow inclusion of maximal amount of packet
1213	     as input. Execution speed is of secondary importance, since the
1214	     digest need only be formed from selected packets.

1216	     For this purpose also the BOB function is recommended. Other
1217	     functions (such as CRC-32) MAY be used. Among the functions
1218	     capable of operating with variable length input BOB and CRC-32
1219	     have the fastest execution, BOB being slightly faster. IPSX is
1220	     not recommended for digesting because it has a significantly
1221	     higher collision rate and takes only a fixed length input.

1223	  7. Parameters for the Description of Selection Techniques

1225	     This section gives an overview of different alternative
1226	     selection schemes and their required parameters. In order to be
1227	     compliant with PSAMP at least one of proposed schemes MUST be
1228	     implemented.

1230	     The decision whether to select a packet or not is based on a
1231	     function which is performed when the packet arrives at the
1232	     selection process. Packet selection schemes differ in the input
1233	     parameters for the selection process and the functions they
1234	     require to do the packet selection. The following table gives an
1235	     overview.

1237	          Scheme       |   input parameters     |     functions
1238	        ---------------+------------------------+-------------------
1239	         systematic    |    packet position     |  packet counter
1240	         count-based   |    Sampling pattern    |
1241	        ---------------+------------------------+-------------------
1242	         systematic    |      arrival time      |  clock or timer
1243	         time-based    |     Sampling pattern   |
1244	        ---------------+------------------------+-------------------
1245	         random        |  packet position       |  packet counter,
1246	         n-out-of-N    |  Sampling pattern      |  random numbers
1247	                       | (random number list)   |
1248	        ---------------+------------------------+-------------------
1249	         uniform       |        Sampling        |  random function
1250	         probabilistic |      probability       |

1252	        ---------------+------------------------+-------------------
1253	         non-uniform   |e.g. packet position,   | selection function,
1254	         probabilistic |  packet content(parts) |  probability calc.
1255	        ---------------+------------------------+-------------------
1256	         non-uniform   |e.g. flow state,        | selection function,
1257	         flow-state    |  packet content(parts) |  probability calc.
1258	        ---------------+------------------------+-------------------
1259	         property      | packet content(parts)  |  filter function or
1260	         match         | or router state        |  state discovery
1261	        ---------------+------------------------+-------------------
1262	         hash-based    |  packet content(parts) |  Hash Function
1263	        ---------------+------------------------+-------------------

1265	  7.1 Description of Sampling Techniques

1267	     In this section we define what elements are needed to describe
1268	     the most common Sampling techniques. Here the selection function
1269	     is pre-defined and given by the Selector ID.

1271	     Sampler Description:
1272	          SELECTOR_ID
1273	          SELECTOR_TYPE
1274	          SELECTOR_PARAMETERS

1276	     Where:

1278	     SELECTOR_ID:
1279	     Unique ID for the packet sampler.

1281	     SELECTOR_TYPE
1282	     For Sampling processes the SELECTOR TYPE defines what Sampling
1283	     algorithm is used.
1284	     Values: Systematic Count-based | Systematic Time-based | Random
1285	     n-out-of-N | Uniform Probabilistic | Non-uniform Probabilistic |
1286	     Non-uniform Flow-state

1288	     SELECTOR_PARAMETERS
1289	     For Sampling processes the SELECTOR PARAMETERS define the input
1290	     parameters for the process. Interval length in systematic
1291	     Sampling means, that all packets that arrive in this interval
1292	     are selected. The spacing parameter defines the spacing in time
1293	     or number of packets between the end of one Sampling interval
1294	     and the start of the next succeeding interval.

1296	     Case n out of N:
1297	        - Population size N, Sample size n

1299	     Case Systematic Time Based:

1301	        - Interval length (in usec), Spacing (in usec)

1303	     Case Systematic Count Based:
1304	        - Interval length(in packets), Spacing (in packets)

1306	     Case Uniform Probabilistic (with equal probability per packet):
1307	        - Sampling probability p

1309	     Case Non-uniform Probabilistic:
1310	        - Calculation function for Sampling probability p (see also
1311	           section .
1312	                   5.2.2.4)

1314	     Case flow state:
1315	        - Information reported for flow state sampling are not
1316	           defined in this document (see also section 5.2.2.4)

1318	  7.2 Description of Filtering Techniques

1320	     In this section we define what elements are needed to describe
1321	     the most common Filtering techniques. The structure closely
1322	     parallels the one presented for the Sampling techniques.

1324	     Filter Description:
1325	          SELECTOR_ID
1326	          SELECTOR_TYPE
1327	          SELECTOR_PARAMETERS

1329	     Where:

1331	     SELECTOR_ID:
1332	     Unique ID for the packet filter. The ID can be calculated under
1333	     consideration of the SELECTION SEQUENCE and a local ID.

1335	     SELECTOR_TYPE
1336	     For Filtering processes the SELECTOR TYPE defines what Filtering
1337	     type is used.
1338	     Values: Matching | Hashing | Router_state

1340	     SELECTOR_PARAMETERS
1341	     For Filtering processes the SELECTOR PARAMETERS define formally
1342	     the common property of the packet being filtered. For the
1343	     filters of type Matching and Hashing the definitions have a lot
1344	     of points in common.

1346	     Values:

1348	     Case Matching
1349	        - Information Element (from [IPFIX-INFO])
1350	        - Value (type in accordance to [IPFIX-INFO])

1352	     In case of multiple match criteria, multiple "case matching"
1353	     have to be bound by a logical AND.

1355	     Case Hashing:
1356	        - Hash Domain (Input bits from packet)
1357	             - <Header type = ipv4>
1358	             - <Input bit specification, header part>
1359	             - <Header type =  ipv6>
1360	             - <Input bit specification, header part>
1361	             - <payload byte number N>
1362	             - <Input bit specification, payload part>
1363	        - Hash Function
1364	             - Hash function name
1365	             - Length of input key (eliminate 0x bytes)
1366	             - Output value (length M and bitmask)
1367	             - Hash Selection Range, as a list of non overlapping
1368	               intervals [start value, end value] where value is in
1369	               [0,2^M-1]
1370	             - Additional parameters dependent on specific Hash
1371	               Function (e.g. hash input bits (seed))

1373	     Notes to input bits for Case Hashing:
1374	        - Input bits can be from header part only, from the payload
1375	           part only or from both.
1376	        - The bit specification, for the header part, can be
1377	           specified for ipv4 or ipv6 only, or both
1378	        - In case of ipv4, the bit specification is a sequence of 20
1379	           Hexadecimal numbers [00,FF] specifying a 20 bytes bitmask
1380	           to be applied to the header.
1381	        - In case of ipv6, it is a sequence of 40 Hexadecimal numbers
1382	           [00,FF] specifying a 40 bytes bitmask to be applied to the
1383	           header
1384	        - The bit specification, for the payload part, is a sequence
1385	           of Hexadecimal numbers [00,FF] specifying the bitmask to be
1386	           applied to the first N bytes of the payload, as specified
1387	           by the previous field. In case the Hexadecimal number
1388	           sequence is longer then N, only the first N numbers are
1389	           considered.
1390	        - In case the payload is shorter than N, the Hash Function
1391	           cannot be applied. Other options, like padding with zeros,
1392	           may be considered in the future.
1393	        - A Hash Function cannot be defined on the options field of
1394	           the ipv4 header, neither on stacked headers of ipv6.
1395	        - The Hash Selection Range defines a range of hash-values
1396	           (out of all possible results of the Hash-Operation). If the
1397	           hash result for a specific packet falls in this range, the
1398	           packet is selected. If the value is outside the range, the
1399	           packet is not selected. E.g. if the selection interval
1400	           specification is [1:3], [6:9] all packets are selected for
1401	           which the hash result is 1,2,3,6,7,8, or 9. In all other
1402	           cases the packet is not selected.

1404	     Case Router State:

1406	        - Ingress interface at which the packet arrives equals a
1407	           specified value
1408	        - Egress interface to which the packet is routed equals a
1409	           specified value
1410	        - Packet violated Access Control List (ACL) on the router
1411	        - Reverse Path Forwarding (RPF) failed for the packet
1412	        - Resource Reservation is insufficient for the packet
1413	        - No route found for the packet
1414	        - Origin AS equals a specified value or lies within a given
1415	           range
1416	        - Destination AS equals a specified value or lies within a
1417	           given range

1419	     Note to Case Router State:
1420	        - All Router state entries can be linked by AND operators

1422	  8. Composite Techniques

1424	     Composite schemes are realized by combining the selector IDs
1425	     into a Selection Sequence. The Selection Sequence contains all
1426	     selector IDs that are applied to the packet stream subsequently.
1427	     Some examples of composite schemes are reported below.

1429	  8.1 Cascaded Filtering->Sampling or Sampling->Filtering

1431	     If a filter precedes a Sampling process the role of Filtering is
1432	     to create a set of "parent populations" from a single stream
1433	     that can then be fed independently to different Sampling
1434	     functions, with different parameters tuned for the population
1435	     itself (e.g. if streams of different intensity result from
1436	     Filtering, it may be good to have different Sampling rates). If
1437	     Filtering follows a Sampling process, the same Sampling Fraction
1438	     and type is applied to the whole stream, independently of the
1439	     relative size of the streams resulting from the Filtering
1440	     function. Moreover, also packets not destined to be selected in
1441	     the Filtering operation will "load" the Sampling function. So,
1442	     in principle, Filtering before Sampling allows a more accurate
1443	     tuning of the Sampling procedure, but if filters are too complex
1444	     to work at full line rate (e.g. because they have to access
1445	     router state information), Sampling before Filtering may be a
1446	     need.

1448	  8.2 Stratified Sampling

1450	     Stratified Sampling is one example for using a composite
1451	     technique. The basic idea behind stratified Sampling is to
1452	     increase the estimation accuracy by using a-priori information
1453	     about correlations of the investigated characteristic with some
1454	     other characteristic that is easier to obtain. The a-priori
1455	     information is used to perform an intelligent grouping of the
1456	     elements of the parent population. In this manner, a higher
1457	     estimation accuracy can be achieved with the same Sample Size or
1458	     the Sample Size can be reduced without reducing the estimation
1459	     accuracy.

1461	     Stratified Sampling divides the Sampling process into multiple
1462	     steps. First, the elements of the parent population are grouped
1463	     into subsets in accordance to a given characteristic. This
1464	     grouping can be done in multiple steps. Then samples are taken
1465	     from each subset.

1467	     The stronger the correlation between the characteristic used to
1468	     divide the parent population (stratification variable) and the
1469	     characteristic of interest (for which an estimate is sought
1470	     after), the easier is the consecutive Sampling process and the
1471	     higher is the stratification gain. For instance, if the dividing
1472	     characteristic were equal to the investigated characteristic,
1473	     each element of the sub-group would be a perfect representative
1474	     of that characteristic. In this case it would be sufficient to
1475	     take one arbitrary element out of each subgroup to get the
1476	     actual distribution of the characteristic in the parent
1477	     population. Therefore stratified Sampling can reduce the costs
1478	     for the Sampling process (i.e. the number of samples needed to
1479	     achieve a given level of confidence).

1481	     For stratified Sampling one has to specify classification rules
1482	     for grouping the elements into subgroups and the Sampling scheme
1483	     that is used within the subgroups. The classification rules can
1484	     be expressed by multiple filters. For the Sampling scheme within
1485	     the subgroups the parameters have to be specified as described
1486	     above. The use of stratified Sampling methods for measurement
1487	     purposes is described for instance in [ClPB93] and [Zseb03].

1489	  9. Security Considerations

1491	     Security considerations concerning the choice of sampling hash
1492	     function have been discussed in Section 6.2.2. That section
1493	     discussed a number of potential attacks to craft packet streams
1494	     which are disproportionately detected and/or discover the hash
1495	     function parameters, the vulnerabilities of different hash
1496	     functions to these attacks, and practices to minimize these
1497	     vulnerabilities. In addition to this a user can gains knowledge
1498	     about the start and stop triggers in time-based systematic
1499	     sampling e.g. by sending test packets. This knowledge might
1500	     allow users to modify their send schedule in a way that their
1501	     packets are disproportionately selected or not selected
1502	     [GoRe07].

1504	     Further security threats can occur if the configuration of
1505	     Sampling parameters or the communication of Sampling parameters
1506	     to the application is corrupted. This document only describes
1507	     Sampling schemes that can be used for packet selection. It
1508	     neither describes a mechanism how those parameters are
1509	     configured nor how these parameters are communicated to the
1510	     application. Therefore the security threats that originate from
1511	     this kind of communication cannot be assessed with the
1512	     information given in this document.
1513	  10. Acknowledgements

1515	     We would like to thank the PSAMP group, especially Benoit Claise
1516	     and Stewart Bryant, for fruitful discussions and for
1517	     proofreading the document. We thank Sharon Goldberg for her
1518	     input on security issues concerning hash-based selection.

1520	  11. IANA Considerations

1522	     This document has no actions for IANA.

1524	  12. Normative References

1526	     [RFC2119]   Bradner, S., Key words for use in RFCs to Indicate
1527	                  Requirement Levels, BCP 14, RFC 2119, March 1997

1529	  13. Informative References

1531	     [AmCa89]    Paul D. Amer, Lillian N. Cassel, "Management of
1532	                  Sampled Real-Time Network Measurements", 14th
1533	                  Conference on Local Computer Networks, October
1534	                  1989, Minneapolis, pages 62-68, IEEE, 1989.

1536	     [BeCK96]    M. Bellare, R. Canetti and H. Krawczyk,
1537	                  "Pseudorandom Functions Revisited: The Cascade
1538	                  Construction and its Concrete Security", Symposium
1539	                  on Foundations of Computer Science, 1996.

1541	     [ClPB93]    K.C. Claffy, George C. Polyzos, Hans-Werner Braun,
1542	                  "Application of Sampling Methodologies to Network
1543	                  Traffic Characterization", Proceedings of ACM
1544	                  SIGCOMM'93, San Francisco, CA, USA, September 13 -
1545	                  17, 1993.

1547	     [DuGG02]    N.G. Duffield, A. Gerber, M. Grossglauser,
1548	                  "Trajectory Engine: A Backend for Trajectory
1549	                  Sampling", IEEE Network Operations and Management
1550	                  Symposium 2002, Florence, Italy, April 15-19, 2002.

1552	     [DuGr00]    N.G. Duffield, M. Grossglauser, "Trajectory
1553	                  Sampling for Direct Traffic Observation",
1554	                  Proceedings of ACM SIGCOMM 2000, Stockholm, Sweden,
1555	                  August 28 - September 1, 2000.

1557	     [DuGr04]    N. G. Duffield and M. Grossglauser "Trajectory
1558	                  Sampling with Unreliable Reporting", Proc IEEE
1559	                  Infocom 2004, Hong Kong, March 2004.

1561	     [DuLT01]    N.G. Duffield, C. Lund, and M. Thorup, "Charging
1562	                  from Sampled Network Usage", ACM Internet
1563	                  Measurement Workshop IMW 2001, San Francisco, USA,
1564	                  November 1-2, 2001.

1566	     [EsVa01]    C. Estan and G. Varghese, "New Directions in
1567	                  Traffic Measurement and Accounting", ACM SIGCOMM
1568	                  Internet Measurement Workshop 2001, San Francisco
1569	                  (CA) Nov. 2001.

1571	     [GoRe07]    S. Goldberg, J. Rexford, "Security Vulnerabilities
1572	                  and Solutions for Packet Sampling", IEEE Sarnoff
1573	                  Symposium, Princeton, NJ, May 2007.

1575	     [HT52]      D.G. Horvitz and D.J. Thompson, "A Generalization
1576	                  of Sampling without replacement from a Finite
1577	                  Universe" J. Amer. Statist. Assoc. Vol. 47, pp.
1578	                  663-685, 1952.

1580	     [IPFIX-INFO] J. Meyer, J. Quittek, S. Bryant "Information Model
1581	                  for IP Flow Information Export", RFC XXXX
1582	                  [Currently Internet Draft, draft-ietf-ipfix-info-
1583	                  15, February 2007].

1585	     [IPFIX-PROTO]B. Claise (Editor) "Specification of the IPFIX
1586	                  Protocol for the Exchange of IP Traffic Flow
1587	                  Information", RFC XXXX. [Currently Internet Draft,
1588	                  draft-ietf-ipfix-protocol-24.txt, November 2006].

1590	     [Jenk97]    B. Jenkins, "Algorithm Alley", Dr. Dobb's Journal,
1591	                  September 1997.
1592	                  http://burtleburtle.net/bob/hash/doobs.html

1594	     [JePP92]    Jonathan Jedwab, Peter Phaal, Bob Pinna, "Traffic
1595	                  Estimation for the Largest Sources on a Network,
1596	                  Using Packet Sampling with Limited Storage", HP
1597	                  technical report, Managemenr, Mathematics and
1598	                  Security Department, HP Laboratories, Bristol,
1599	                  March 1992,
1600	                  http://www.hpl.hp.com/techreports/92/HPL-92-35.html

1602	     [Moli03]    M.Molina, "A scalable and efficient methodology for
1603	                  flow monitoring in the Internet", International
1604	                  Teletraffic Congress (ITC-18), Berlin, Sep. 2003

1606	     [MoND05]    M. Molina, S.Niccolini, N.G.Duffield "A Comparative
1607	                  Experimental Study of Hash Functions Applied to
1608	                  Packet Sampling" International Teletraffic Congress
1609	                  (ITC-19), Beijing, August 2005.

1611	     [PSAMP-FW]  Nick Duffield (Ed.), "A Framework for Packet
1612	                  Selection and Reporting", RFC XXXX [currently
1613	                  Internet Draft draft-ietf-psamp-framework-11, work
1614	                  in progress, May 2007].

1616	     [PSAMP-INFO] T. Dietz, F. Dressler, G. Carle, B. Claise,
1617	                  "Information Model for Packet Sampling Exports",
1618	                  RFC XXXX. [Currently Internet Draft, draft-ietf-
1619	                  psamp-info-06, June 2007]

1621	     [PSAMP-PROTO] B. Claise (Ed.), "Packet Sampling (PSAMP) Protocol
1622	                  Specifications", RFC XXXX. [Currently Internet
1623	                  Draft draft-ietf-psamp-protocol-07.txt, work in
1624	                  progress, October 2006].

1626	     [RFC1071]   R. Braden, D. Borman, C. Partridge, "Computing the
1627	                  Internet Checksum", RFC 1071, Sep. 1988 (updated by
1628	                  RFCs1141 and RFC1624).

1630	     [RFC3917]   J. Quittek, T. Zseby, B. Claise, S. Zander,
1631	                  "Requirements for IP Flow Information Export", RFC
1632	                  3917, October 2004.

1634	     [RFC4271]   Y. Rekhter, T. Li, S. Hares, "A Border Gateway
1635	                  Protocol 4 (BGP-4)", RFC 4271, January 2006.

1637	     [Zseb03]    T. Zseby, "Stratification Strategies for Sampling-
1638	                  based Non-intrusive Measurement of One-way Delay",
1639	                  Proceedings of Passive and Active Measurement
1640	                  Workshop (PAM 20003), La Jolla, CA, USA, pp. 171-
1641	                  179, April 2003.

1643	  Authors' Addresses

1645	     Tanja Zseby
1646	     Fraunhofer Institute for Open Communication Systems
1647	     Kaiserin-Augusta-Allee 31
1648	     10589 Berlin
1649	     Germany
1650	     Phone: +49-30-34 63 7153
1651	     Email: tanja.zseby@fokus.fraunhofer.de

1653	     Maurizio Molina
1654	     DANTE
1655	     City House
1656	     126-130 Hills Road
1657	     Cambridge CB21PQ
1658	     United Kingdom
1659	     Phone: +44 1223 371 300
1660	     Email: maurizio.molina@dante.org.uk

1662	     Nick Duffield
1663	     AT&T Labs - Research
1664	     Room B-139
1665	     180 Park Ave
1666	     Florham Park NJ 07932, USA
1667	     Phone: +1 973-360-8726
1668	     Email: duffield@research.att.com

1670	     Saverio Niccolini
1671	     Network Laboratories, NEC Europe Ltd.
1672	     Kurfuerstenanlage 36
1673	     69115 Heidelberg
1674	     Germany
1675	     Phone: +49-6221-9051118
1676	     Email:  saverio.niccolini@netlab.nec.de

1678	     Fredric Raspall
1679	     EPSC-UPC
1680	     Dept. of Telematics
1681	     Av. del Canal Olimpic, s/n
1682	     Edifici C4
1683	     E-08860 Castelldefels, Barcelona
1684	     Spain
1685	     Email: fredi@entel.upc.es

1687	  Intellectual Property Statement

1689	     The IETF has been notified of intellectual property rights
1690	     claimed in regard to some or all of the specification contained
1691	     in this document. For more information consult the online list
1692	     of claimed rights.

1694	     The IETF takes no position regarding the validity or scope of
1695	     any Intellectual Property Rights or other rights that might be
1696	     claimed to pertain to the implementation or use of the
1697	     technology described in this document or the extent to which any
1698	     license under such rights might or might not be available; nor
1699	     does it represent that it has made any independent effort to
1700	     identify any such rights.  Information on the procedures with
1701	     respect to rights in RFC documents can be found in BCP 78 and
1702	     BCP 79.

1704	     Copies of IPR disclosures made to the IETF Secretariat and any
1705	     assurances of licenses to be made available, or the result of an
1706	     attempt made to obtain a general license or permission for the
1707	     use of such proprietary rights by implementers or users of this
1708	     specification can be obtained from the IETF on-line IPR
1709	     repository at http://www.ietf.org/ipr.

1711	     The IETF invites any interested party to bring to its attention
1712	     any copyrights, patents or patent applications, or other
1713	     proprietary rights that may cover technology that may be
1714	     required to implement this standard. Please address the
1715	     information to the IETF at ietf-ipr@ietf.org.

1717	  Copyright Statement

1719	     Copyright (C) The IETF Trust (2007).

1721	     This document is subject to the rights, licenses and
1722	     restrictions contained in BCP 78, and except as set forth
1723	     therein, the authors retain all their rights.

1725	  Disclaimer

1727	     This document and the information contained herein are provided
1728	     on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
1729	     REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY,
1730	     THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM
1731	     ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
1732	     ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
1733	     INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY
1734	     OR FITNESS FOR A PARTICULAR PURPOSE.

1736	  Appendix A: Hash Functions

1738	  A.1 IP Shift-XOR (IPSX) Hash Function

1740	     The IPSX Hash Function is tailored for acting on IP version 4
1741	     packets. It exploits the structure of IP packet and in
1742	     particular the variability expected to be exhibited within
1743	     different fields of the IP packet in order to furnish a hash
1744	     value with little apparent correlation with individual packet
1745	     fields. Fields from the IPv4 and TCP/UDP headers are used as
1746	     input. The IPSX Hash Function uses a small number of simple
1747	     instructions.

1749	     Input parameters: None

1751	     Built-in parameters: None

1753	     Output: The output of the IPSX is a 16 bit number

1755	     Functioning:
1756	     The functioning can be divided into two parts: input selection,
1757	     which forms are composite input from various portions of the IP
1758	     packet, followed by computation of the hash on the composite.

1760	     Input Selection:
1761	     The raw input is drawn from the first 20 bytes of the IP packet
1762	     header and the first 8 bytes of the IP payload. If IP options
1763	     are not used, the IP header has 20 bytes, and hence the two
1764	     portions adjoin and comprise the first 28 bytes of the IP
1765	     packet. We now use the raw input as 4 32-bit subportions of
1766	     these 28 bytes. We specify the input by bit offsets from the
1767	     start of IP header or payload.

1769	     f1 = bits 32 to 63 of the IP header, comprising the IP
1770	          identification field, flags, and fragment offset.

1772	     f2 = bits 96 to 127 of the IP header, the source IP address.

1774	     f3 = bits 128 to 159 of the IP header, the destination IP
1775	          address.

1777	     f4 = bits 32 to 63 of the IP payload. For a TCP packet, f4
1778	          comprises the TCP sequence number followed by the message
1779	          length. For a UDP packet f4 comprises the UDP checksum.

1781	     Hash Computation:
1782	     The hash is computed from f1, f2, f3 and f4 by a combination of
1783	     XOR (^), right shift (>>) and left shift (<<) operations. The
1784	     intermediate quantities h1, v1, v2 are 32-bit numbers.

1786	            1.    v1 = f1 ^ f2;
1787	            2.    v2 = f3 ^ f4;
1788	            3.    h1 = v1 << 8;
1789	            4.    h1 ^= v1 >> 4;
1790	            5.    h1 ^= v1 >> 12;
1791	            6.    h1 ^= v1 >> 16;
1792	            7.    h1 ^= v2 << 6;
1793	            8.    h1 ^= v2 << 10;
1794	            9.    h1 ^= v2 << 14;
1795	            10.   h1 ^= v2 >> 7;

1797	     The output of the hash is the least significant 16 bits of h1.

1799	  A.2 BOB Hash Function

1801	     The BOB Hash Function is a Hash Function designed for having
1802	     each bit of the input affecting every bit of the return value
1803	     and using both 1-bit and 2-bit deltas to achieve the so called
1804	     avalanche effect [Jenk97]. This function was originally built
1805	     for hash table lookup with fast software implementation.

1807	     Input Parameters:
1808	     The input parameters of such a function are:
1809	     - the length of the input string (key) to be hashed, in bytes.
1810	     The elementary input blocks of Bob hash are the single bytes,
1811	     therefore no padding is needed.
1812	     - an init value (an arbitrary 32-bit number).

1814	     Built in parameters:
1815	     The Bob Hash uses the following built-in parameter:
1816	     - the golden ratio (an arbitrary 32-bit number used in the hash
1817	     function computation: its purpose is to avoid mapping all zeros
1818	     to all zeros);

1820	     Note: the mix sub-function (see mix (a,b,c) macro in the
1821	     reference code in 3.2.4) has a number of parameters governing
1822	     the shifts in the registers. The one presented is not the only
1823	     possible choice.

1825	     It is an open point whether these may be considered additional
1826	     built-in parameters to specify at function configuration.

1828	     Output.
1829	     The output of the BOB function is a 32-bit number. It should be
1830	     specified:
1831	     - A 32 bit mask to apply to the output
1832	     - The selection range as a list of non overlapping intervals
1833	     [start value, end value] where value is in [0,2^32]

1835	     Functioning:
1836	     The hash value is obtained computing first an initialization of
1837	     an internal state (composed of 3 32-bit numbers, called a, b, c
1838	     in the reference code below), then, for each input byte of the
1839	     key the internal state is combined by addition and mixed using
1840	     the mix sub-function. Finally, the internal state mixed one last
1841	     time and the third number of the state (c) is chosen as the
1842	     return value.

1844	     typedef unsigned long int  ub4;   /* unsigned 4-byte quantities
1845	     */
1846	     typedef unsigned      char ub1;   /* unsigned 1-byte quantities
1847	     */

1849	     #define hashsize(n) ((ub4)1<<(n))
1850	     #define hashmask(n) (hashsize(n)-1)

1852	     /* ------------------------------------------------------
1853	       mix -- mix 3 32-bit values reversibly.
1854	       For every delta with one or two bits set, and the deltas of
1855	     all three high bits or all three low bits, whether the original
1856	     value of a,b,c is almost all zero or is uniformly distributed,
1857	       * If mix() is run forward or backward, at least 32 bits in
1858	     a,b,c have at least 1/4 probability of changing.
1859	       * If mix() is run forward, every bit of c will change between
1860	     1/3 and 2/3 of the time.  (Well, 22/100 and 78/100 for some 2-
1861	     bit deltas.) mix() was built out of 36 single-cycle latency
1862	     instructions in a structure that could supported 2x parallelism,
1863	     like so:
1864	             a -= b;
1865	             a -= c; x = (c>>13);
1866	             b -= c; a ^= x;
1867	             b -= a; x = (a<<8);
1868	             c -= a; b ^= x;
1869	             c -= b; x = (b>>13);
1870	             ...
1871	     Unfortunately, superscalar Pentiums and Sparcs can't take
1872	     advantage of that parallelism.  They've also turned some of
1873	     those single-cycle latency instructions into multi-cycle latency
1874	     instructions
1875	     ------------------------------------------------------------*/

1877	       #define mix(a,b,c)  \
1878	       { \
1879	         a -= b; a -= c; a ^= (c>>13); \
1880	         b -= c; b -= a; b ^= (a<<8); \
1881	         c -= a; c -= b; c ^= (b>>13); \
1882	         a -= b; a -= c; a ^= (c>>12);  \
1883	         b -= c; b -= a; b ^= (a<<16); \
1884	         c -= a; c -= b; c ^= (b>>5); \
1885	         a -= b; a -= c; a ^= (c>>3);  \
1886	         b -= c; b -= a; b ^= (a<<10); \
1887	         c -= a; c -= b; c ^= (b>>15); \
1888	       }

1890	       /* -----------------------------------------------------------
1891	     hash() -- hash a variable-length key into a 32-bit value
1892	     k       : the key (the unaligned variable-length array of bytes)
1893	     len     : the length of the key, counting by bytes
1894	     initval : can be any 4-byte value
1895	     Returns a 32-bit value.  Every bit of the key affects every bit
1896	     of the return value.  Every 1-bit and 2-bit delta achieves
1897	     avalanche. About 6*len+35 instructions.

1899	     The best hash table sizes are powers of 2.  There is no need to
1900	     do mod a prime (mod is sooo slow!).  If you need less than 32
1901	     bits, use a bitmask.  For example, if you need only 10 bits, do
1902	     h = (h & hashmask(10));
1903	     In which case, the hash table should have hashsize(10) elements.

1905	     If you are hashing n strings (ub1 **)k, do it like this:
1906	     for (i=0, h=0; i<n; ++i) h = hash( k[i], len[i], h);

1908	     By Bob Jenkins, 1996.  bob_jenkins@burtleburtle.net.  You may
1909	     use this code any way you wish, private, educational, or
1910	     commercial.  It's free. See
1911	     http://burtleburtle.net/bob/hash/evahash.html
1912	     Use for hash table lookup, or anything where one collision in
1913	     2^^32 is acceptable.  Do NOT use for cryptographic purposes.
1914	      ----------------------------------------------------------- */

1916	       ub4 bob_hash(k, length, initval)
1917	       register ub1 *k;        /* the key */
1918	       register ub4  length;   /* the length of the key */
1919	       register ub4  initval;  /* an arbitrary value */
1920	       {
1921	          register ub4 a,b,c,len;
1922	          /* Set up the internal state */
1923	          len = length;
1924	          a = b = 0x9e3779b9; /*the golden ratio; an arbitrary value
1925	     */
1926	          c = initval;         /* another arbitrary value */

1928	     /*------------------------------------ handle most of the key */

1930	          while (len >= 12)
1931	          {
1932	             a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16)
1933	     +((ub4)k[3]<<24));
1934	             b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16)
1935	     +((ub4)k[7]<<24));
1936	             c += (k[8] +((ub4)k[9]<<8)
1937	     +((ub4)k[10]<<16)+((ub4)k[11]<<24));
1938	             mix(a,b,c);
1939	             k += 12; len -= 12;
1940	          }

1942	          /*---------------------------- handle the last 11 bytes */
1943	          c += length;
1944	          switch(len)       /* all the case statements fall through*/
1945	          {
1946	          case 11: c+=((ub4)k[10]<<24);
1947	          case 10: c+=((ub4)k[9]<<16);
1948	          case 9 : c+=((ub4)k[8]<<8);
1949	             /* the first byte of c is reserved for the length */
1950	          case 8 : b+=((ub4)k[7]<<24);
1951	          case 7 : b+=((ub4)k[6]<<16);
1952	          case 6 : b+=((ub4)k[5]<<8);
1953	          case 5 : b+=k[4];
1954	          case 4 : a+=((ub4)k[3]<<24);
1955	          case 3 : a+=((ub4)k[2]<<16);
1956	          case 2 : a+=((ub4)k[1]<<8);
1957	          case 1 : a+=k[0];
1958	            /* case 0: nothing left to add */
1959	          }
1960	          mix(a,b,c);
1961	          /*-------------------------------- report the result */
1962	          return c;
1963	       }